From 62c83c65993dc48e1e2358137df95d0c113b6120 Mon Sep 17 00:00:00 2001 From: CoprDistGit Date: Fri, 5 May 2023 07:35:42 +0000 Subject: automatic import of python-fuzzy-pandas --- python-fuzzy-pandas.spec | 413 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 413 insertions(+) create mode 100644 python-fuzzy-pandas.spec (limited to 'python-fuzzy-pandas.spec') diff --git a/python-fuzzy-pandas.spec b/python-fuzzy-pandas.spec new file mode 100644 index 0000000..fd52b59 --- /dev/null +++ b/python-fuzzy-pandas.spec @@ -0,0 +1,413 @@ +%global _empty_manifest_terminate_build 0 +Name: python-fuzzy-pandas +Version: 0.1 +Release: 1 +Summary: Fuzzy matching in pandas using csvmatch +License: MIT +URL: http://github.com/jsoma/fuzzy_pandas +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/37/1c/e0e1ea616ff1d09a33b53915258dd5e4cf586aed6237358e3312a5c90be6/fuzzy_pandas-0.1.tar.gz +BuildArch: noarch + +Requires: python3-pandas +Requires: python3-csvmatch + +%description +# fuzzy_pandas + +A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas dataframes. 
+ +## Installation + +``` +pip install fuzzy_pandas +``` + +## Usage + +To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as: + +``` +name,location,codename +George Smiley,London,Beggerman +Percy Alleline,London,Tinker +Roy Bland,London,Soldier +Toby Esterhase,Vienna,Poorman +Peter Guillam,Brixton,none +Bill Haydon,London,Tailor +Oliver Lacon,London,none +Jim Prideaux,Slovakia,none +Connie Sachs,Oxford,none +``` + +And another such as: + +``` +Person Name,Location +Maria Andreyevna Ostrakova,Russia +Otto Leipzig,Estonia +George SMILEY,London +Peter Guillam,Brixton +Konny Saks,Oxford +Saul Enderby,London +Sam Collins,Vietnam +Tony Esterhase,Vienna +Claus Kretzschmar,Hamburg +``` + +You can then find which names are in both files: + +```python +import pandas as pd +import fuzzy_pandas as fpd + +df1 = pd.read_csv("data1.csv") +df2 = pd.read_csv("data2.csv") + +matches = fpd.fuzzy_merge(df1, df2, + left_on=['name'], + right_on=['Person Name'], + ignore_case=True, + keep='match') + +print(matches) +``` + +|.|name|Person Name| +|---|---|---| +|0|George Smiley|George SMILEY| +|1|Peter Guillam|Peter Guillam| + +### Options + +Dumping this out of the code itself, apologies for lack of pretty formatting. + +* **left** : DataFrame +* **right** : DataFrame + - Object to merge left with +* **on** : str or list + - Column names to compare. These must be found in both DataFrames. +* **left_on** : str or list + - Column names to compare in the left DataFrame. +* **right_on** : str or list + - Column names to compare in the right DataFrame. +* **left_cols** : list, default None + - List of columns to preserve from the left DataFrame. + - Defaults to `left_on`. +* **right_cols** : list, default None + - List of columns to preserve from the right DataFrame. + - Defaults to `right_on`. +* **method** : str or list, default 'exact' + - Perform a fuzzy match, and an optional specified algorithm. 
+ - Multiple algorithms can be specified which will apply to each field + respectively. + - Options: + * **exact**: exact matches + * **levenshtein**: string distance metric + * **jaro**: string distance metric + * **metaphone**: phonetic matching algorithm + * **bilenko**: prompts for matches +* **threshold** : float or list, default `0.6` + - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively. +* **ignore_case** : bool, default False + - Ignore case (default is case-sensitive) +* **ignore_nonalpha** : bool, default False + - Ignore non-alphanumeric characters +* **ignore_nonlatin** : bool, default False + - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent +* **ignore_order_words** : bool, default False + - Ignore the order words are given in +* **ignore_order_letters** : bool, default False + - Ignore the order the letters are given in, regardless of word order +* **ignore_titles** : bool, default False + - Ignore a predefined list of name titles (such as Mr, Ms, etc) +* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' } +``` + +For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch). + + + +%package -n python3-fuzzy-pandas +Summary: Fuzzy matching in pandas using csvmatch +Provides: python-fuzzy-pandas +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-fuzzy-pandas +# fuzzy_pandas + +A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas dataframes. 
+ +## Installation + +``` +pip install fuzzy_pandas +``` + +## Usage + +To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as: + +``` +name,location,codename +George Smiley,London,Beggerman +Percy Alleline,London,Tinker +Roy Bland,London,Soldier +Toby Esterhase,Vienna,Poorman +Peter Guillam,Brixton,none +Bill Haydon,London,Tailor +Oliver Lacon,London,none +Jim Prideaux,Slovakia,none +Connie Sachs,Oxford,none +``` + +And another such as: + +``` +Person Name,Location +Maria Andreyevna Ostrakova,Russia +Otto Leipzig,Estonia +George SMILEY,London +Peter Guillam,Brixton +Konny Saks,Oxford +Saul Enderby,London +Sam Collins,Vietnam +Tony Esterhase,Vienna +Claus Kretzschmar,Hamburg +``` + +You can then find which names are in both files: + +```python +import pandas as pd +import fuzzy_pandas as fpd + +df1 = pd.read_csv("data1.csv") +df2 = pd.read_csv("data2.csv") + +matches = fpd.fuzzy_merge(df1, df2, + left_on=['name'], + right_on=['Person Name'], + ignore_case=True, + keep='match') + +print(matches) +``` + +|.|name|Person Name| +|---|---|---| +|0|George Smiley|George SMILEY| +|1|Peter Guillam|Peter Guillam| + +### Options + +Dumping this out of the code itself, apologies for lack of pretty formatting. + +* **left** : DataFrame +* **right** : DataFrame + - Object to merge left with +* **on** : str or list + - Column names to compare. These must be found in both DataFrames. +* **left_on** : str or list + - Column names to compare in the left DataFrame. +* **right_on** : str or list + - Column names to compare in the right DataFrame. +* **left_cols** : list, default None + - List of columns to preserve from the left DataFrame. + - Defaults to `left_on`. +* **right_cols** : list, default None + - List of columns to preserve from the right DataFrame. + - Defaults to `right_on`. +* **method** : str or list, default 'exact' + - Perform a fuzzy match, and an optional specified algorithm. 
+ - Multiple algorithms can be specified which will apply to each field + respectively. + - Options: + * **exact**: exact matches + * **levenshtein**: string distance metric + * **jaro**: string distance metric + * **metaphone**: phonetic matching algorithm + * **bilenko**: prompts for matches +* **threshold** : float or list, default `0.6` + - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively. +* **ignore_case** : bool, default False + - Ignore case (default is case-sensitive) +* **ignore_nonalpha** : bool, default False + - Ignore non-alphanumeric characters +* **ignore_nonlatin** : bool, default False + - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent +* **ignore_order_words** : bool, default False + - Ignore the order words are given in +* **ignore_order_letters** : bool, default False + - Ignore the order the letters are given in, regardless of word order +* **ignore_titles** : bool, default False + - Ignore a predefined list of name titles (such as Mr, Ms, etc) +* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' } +``` + +For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch). + + + +%package help +Summary: Development documents and examples for fuzzy-pandas +Provides: python3-fuzzy-pandas-doc +%description help +# fuzzy_pandas + +A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas dataframes. 
+ +## Installation + +``` +pip install fuzzy_pandas +``` + +## Usage + +To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as: + +``` +name,location,codename +George Smiley,London,Beggerman +Percy Alleline,London,Tinker +Roy Bland,London,Soldier +Toby Esterhase,Vienna,Poorman +Peter Guillam,Brixton,none +Bill Haydon,London,Tailor +Oliver Lacon,London,none +Jim Prideaux,Slovakia,none +Connie Sachs,Oxford,none +``` + +And another such as: + +``` +Person Name,Location +Maria Andreyevna Ostrakova,Russia +Otto Leipzig,Estonia +George SMILEY,London +Peter Guillam,Brixton +Konny Saks,Oxford +Saul Enderby,London +Sam Collins,Vietnam +Tony Esterhase,Vienna +Claus Kretzschmar,Hamburg +``` + +You can then find which names are in both files: + +```python +import pandas as pd +import fuzzy_pandas as fpd + +df1 = pd.read_csv("data1.csv") +df2 = pd.read_csv("data2.csv") + +matches = fpd.fuzzy_merge(df1, df2, + left_on=['name'], + right_on=['Person Name'], + ignore_case=True, + keep='match') + +print(matches) +``` + +|.|name|Person Name| +|---|---|---| +|0|George Smiley|George SMILEY| +|1|Peter Guillam|Peter Guillam| + +### Options + +Dumping this out of the code itself, apologies for lack of pretty formatting. + +* **left** : DataFrame +* **right** : DataFrame + - Object to merge left with +* **on** : str or list + - Column names to compare. These must be found in both DataFrames. +* **left_on** : str or list + - Column names to compare in the left DataFrame. +* **right_on** : str or list + - Column names to compare in the right DataFrame. +* **left_cols** : list, default None + - List of columns to preserve from the left DataFrame. + - Defaults to `left_on`. +* **right_cols** : list, default None + - List of columns to preserve from the right DataFrame. + - Defaults to `right_on`. +* **method** : str or list, default 'exact' + - Perform a fuzzy match, and an optional specified algorithm. 
+ - Multiple algorithms can be specified which will apply to each field + respectively. + - Options: + * **exact**: exact matches + * **levenshtein**: string distance metric + * **jaro**: string distance metric + * **metaphone**: phonetic matching algorithm + * **bilenko**: prompts for matches +* **threshold** : float or list, default `0.6` + - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively. +* **ignore_case** : bool, default False + - Ignore case (default is case-sensitive) +* **ignore_nonalpha** : bool, default False + - Ignore non-alphanumeric characters +* **ignore_nonlatin** : bool, default False + - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent +* **ignore_order_words** : bool, default False + - Ignore the order words are given in +* **ignore_order_letters** : bool, default False + - Ignore the order the letters are given in, regardless of word order +* **ignore_titles** : bool, default False + - Ignore a predefined list of name titles (such as Mr, Ms, etc) +* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' } +``` + +For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch). 
+ + + +%prep +%autosetup -n fuzzy-pandas-0.1 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-fuzzy-pandas -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Fri May 05 2023 Python_Bot - 0.1-1 +- Package Spec generated -- cgit v1.2.3