%global _empty_manifest_terminate_build 0
Name:		python-fuzzy-pandas
Version:	0.1
Release:	1
Summary:	Fuzzy matching in pandas using csvmatch
License:	MIT
URL:		http://github.com/jsoma/fuzzy_pandas
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/37/1c/e0e1ea616ff1d09a33b53915258dd5e4cf586aed6237358e3312a5c90be6/fuzzy_pandas-0.1.tar.gz
BuildArch:	noarch

Requires:	python3-pandas
Requires:	python3-csvmatch

%description
# fuzzy_pandas

A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas DataFrames.

## Installation

```
pip install fuzzy_pandas
```

## Usage

To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:

```
name,location,codename
George Smiley,London,Beggerman
Percy Alleline,London,Tinker
Roy Bland,London,Soldier
Toby Esterhase,Vienna,Poorman
Peter Guillam,Brixton,none
Bill Haydon,London,Tailor
Oliver Lacon,London,none
Jim Prideaux,Slovakia,none
Connie Sachs,Oxford,none
```

And another such as:

```
Person Name,Location
Maria Andreyevna Ostrakova,Russia
Otto Leipzig,Estonia
George SMILEY,London
Peter Guillam,Brixton
Konny Saks,Oxford
Saul Enderby,London
Sam Collins,Vietnam
Tony Esterhase,Vienna
Claus Kretzschmar,Hamburg
```

You can then find which names are in both files:

```python
import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

matches = fpd.fuzzy_merge(df1, df2,
                          left_on=['name'],
                          right_on=['Person Name'],
                          ignore_case=True,
                          keep='match')

print(matches)
```

|.|name|Person Name|
|---|---|---|
|0|George Smiley|George SMILEY|
|1|Peter Guillam|Peter Guillam|

### Options

These options are taken straight from the code.

* **left** : DataFrame
* **right** : DataFrame
    - Object to merge `left` with.
* **on** : str or list
    - Column names to compare. These must be found in both DataFrames.
* **left_on** : str or list
    - Column names to compare in the left DataFrame.
* **right_on** : str or list
    - Column names to compare in the right DataFrame.
* **left_cols** : list, default None
    - List of columns to preserve from the left DataFrame.
    - Defaults to `left_on`.
* **right_cols** : list, default None
    - List of columns to preserve from the right DataFrame.
    - Defaults to `right_on`.
* **method** : str or list, default 'exact'
    - Matching algorithm to use; any option other than `exact` performs a fuzzy match.
    - Multiple algorithms can be given as a list; each applies to the corresponding field.
    - Options:
        * **exact**: exact matches
        * **levenshtein**: string distance metric
        * **jaro**: string distance metric
        * **metaphone**: phonetic matching algorithm
        * **bilenko**: interactively prompts you to label matches
* **threshold** : float or list, default `0.6`
    - The threshold for a fuzzy match, as a number between 0 and 1. Multiple numbers are applied to each field respectively.
* **ignore_case** : bool, default False
    - Ignore case (default is case-sensitive).
* **ignore_nonalpha** : bool, default False
    - Ignore non-alphanumeric characters.
* **ignore_nonlatin** : bool, default False
    - Ignore characters from non-Latin alphabets. Accented characters are compared to their unaccented equivalents.
* **ignore_order_words** : bool, default False
    - Ignore the order in which words appear.
* **ignore_order_letters** : bool, default False
    - Ignore the order in which letters appear, regardless of word order.
* **ignore_titles** : bool, default False
    - Ignore a predefined list of name titles (such as Mr, Ms, etc.).
* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }
    - Type of join to perform.

For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).
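For the string-distance methods (`levenshtein`, `jaro`), `threshold` sets how similar two values must be on a scale from 0 to 1, with one threshold per field when a list is given. The sketch below illustrates that idea with a plain Levenshtein distance normalized by the longer string's length. That normalization is an assumption, not necessarily how csvmatch scores matches, and `levenshtein`, `similarity` and `fields_match` are hypothetical helpers written with the standard library only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute (or keep)
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed scoring: 1.0 for identical strings, falling towards 0.0 as edits grow."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def fields_match(left_values, right_values, thresholds) -> bool:
    """A row pair matches only if every field reaches its own threshold."""
    return all(similarity(left, right) >= limit
               for left, right, limit in zip(left_values, right_values, thresholds))


print(similarity("Toby Esterhase", "Tony Esterhase"))
print(fields_match(["Toby Esterhase"], ["Tony Esterhase"], [0.6]))
print(fields_match(["Connie Sachs"], ["Konny Saks"], [0.6]))
```

Under this assumed scoring, "Toby Esterhase" and "Tony Esterhase" (one substitution in 14 characters) clear the default 0.6 threshold, while "Connie Sachs" and "Konny Saks" (edit distance 5 over 12 characters, about 0.58) just miss it.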

%package -n python3-fuzzy-pandas
Summary:	Fuzzy matching in pandas using csvmatch
Provides:	python-fuzzy-pandas
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip

%description -n python3-fuzzy-pandas
# fuzzy_pandas

A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas DataFrames.

## Installation

```
pip install fuzzy_pandas
```

## Usage

To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:

```
name,location,codename
George Smiley,London,Beggerman
Percy Alleline,London,Tinker
Roy Bland,London,Soldier
Toby Esterhase,Vienna,Poorman
Peter Guillam,Brixton,none
Bill Haydon,London,Tailor
Oliver Lacon,London,none
Jim Prideaux,Slovakia,none
Connie Sachs,Oxford,none
```

And another such as:

```
Person Name,Location
Maria Andreyevna Ostrakova,Russia
Otto Leipzig,Estonia
George SMILEY,London
Peter Guillam,Brixton
Konny Saks,Oxford
Saul Enderby,London
Sam Collins,Vietnam
Tony Esterhase,Vienna
Claus Kretzschmar,Hamburg
```

You can then find which names are in both files:

```python
import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

matches = fpd.fuzzy_merge(df1, df2,
                          left_on=['name'],
                          right_on=['Person Name'],
                          ignore_case=True,
                          keep='match')

print(matches)
```

|.|name|Person Name|
|---|---|---|
|0|George Smiley|George SMILEY|
|1|Peter Guillam|Peter Guillam|

### Options

These options are taken straight from the code.

* **left** : DataFrame
* **right** : DataFrame
    - Object to merge `left` with.
* **on** : str or list
    - Column names to compare. These must be found in both DataFrames.
* **left_on** : str or list
    - Column names to compare in the left DataFrame.
* **right_on** : str or list
    - Column names to compare in the right DataFrame.
* **left_cols** : list, default None
    - List of columns to preserve from the left DataFrame.
    - Defaults to `left_on`.
* **right_cols** : list, default None
    - List of columns to preserve from the right DataFrame.
    - Defaults to `right_on`.
* **method** : str or list, default 'exact'
    - Matching algorithm to use; any option other than `exact` performs a fuzzy match.
    - Multiple algorithms can be given as a list; each applies to the corresponding field.
    - Options:
        * **exact**: exact matches
        * **levenshtein**: string distance metric
        * **jaro**: string distance metric
        * **metaphone**: phonetic matching algorithm
        * **bilenko**: interactively prompts you to label matches
* **threshold** : float or list, default `0.6`
    - The threshold for a fuzzy match, as a number between 0 and 1. Multiple numbers are applied to each field respectively.
* **ignore_case** : bool, default False
    - Ignore case (default is case-sensitive).
* **ignore_nonalpha** : bool, default False
    - Ignore non-alphanumeric characters.
* **ignore_nonlatin** : bool, default False
    - Ignore characters from non-Latin alphabets. Accented characters are compared to their unaccented equivalents.
* **ignore_order_words** : bool, default False
    - Ignore the order in which words appear.
* **ignore_order_letters** : bool, default False
    - Ignore the order in which letters appear, regardless of word order.
* **ignore_titles** : bool, default False
    - Ignore a predefined list of name titles (such as Mr, Ms, etc.).
* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }
    - Type of join to perform.

For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).

%package help
Summary:	Development documents and examples for fuzzy-pandas
Provides:	python3-fuzzy-pandas-doc

%description help
# fuzzy_pandas

A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas DataFrames.

## Installation

```
pip install fuzzy_pandas
```

## Usage

To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:

```
name,location,codename
George Smiley,London,Beggerman
Percy Alleline,London,Tinker
Roy Bland,London,Soldier
Toby Esterhase,Vienna,Poorman
Peter Guillam,Brixton,none
Bill Haydon,London,Tailor
Oliver Lacon,London,none
Jim Prideaux,Slovakia,none
Connie Sachs,Oxford,none
```

And another such as:

```
Person Name,Location
Maria Andreyevna Ostrakova,Russia
Otto Leipzig,Estonia
George SMILEY,London
Peter Guillam,Brixton
Konny Saks,Oxford
Saul Enderby,London
Sam Collins,Vietnam
Tony Esterhase,Vienna
Claus Kretzschmar,Hamburg
```

You can then find which names are in both files:

```python
import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

matches = fpd.fuzzy_merge(df1, df2,
                          left_on=['name'],
                          right_on=['Person Name'],
                          ignore_case=True,
                          keep='match')

print(matches)
```

|.|name|Person Name|
|---|---|---|
|0|George Smiley|George SMILEY|
|1|Peter Guillam|Peter Guillam|

### Options

These options are taken straight from the code.

* **left** : DataFrame
* **right** : DataFrame
    - Object to merge `left` with.
* **on** : str or list
    - Column names to compare. These must be found in both DataFrames.
* **left_on** : str or list
    - Column names to compare in the left DataFrame.
* **right_on** : str or list
    - Column names to compare in the right DataFrame.
* **left_cols** : list, default None
    - List of columns to preserve from the left DataFrame.
    - Defaults to `left_on`.
* **right_cols** : list, default None
    - List of columns to preserve from the right DataFrame.
    - Defaults to `right_on`.
* **method** : str or list, default 'exact'
    - Matching algorithm to use; any option other than `exact` performs a fuzzy match.
    - Multiple algorithms can be given as a list; each applies to the corresponding field.
    - Options:
        * **exact**: exact matches
        * **levenshtein**: string distance metric
        * **jaro**: string distance metric
        * **metaphone**: phonetic matching algorithm
        * **bilenko**: interactively prompts you to label matches
* **threshold** : float or list, default `0.6`
    - The threshold for a fuzzy match, as a number between 0 and 1. Multiple numbers are applied to each field respectively.
* **ignore_case** : bool, default False
    - Ignore case (default is case-sensitive).
* **ignore_nonalpha** : bool, default False
    - Ignore non-alphanumeric characters.
* **ignore_nonlatin** : bool, default False
    - Ignore characters from non-Latin alphabets. Accented characters are compared to their unaccented equivalents.
* **ignore_order_words** : bool, default False
    - Ignore the order in which words appear.
* **ignore_order_letters** : bool, default False
    - Ignore the order in which letters appear, regardless of word order.
* **ignore_titles** : bool, default False
    - Ignore a predefined list of name titles (such as Mr, Ms, etc.).
* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }
    - Type of join to perform.

For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).
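The `ignore_*` flags can be thought of as rewriting each value before it is compared. A minimal sketch of that preprocessing follows, assuming a simple fixed order of steps and a short hypothetical `TITLES` list. The real csvmatch title list and normalization rules may differ, and `normalize` is an illustrative helper, not part of fuzzy_pandas.

```python
import unicodedata

# Hypothetical title list; the predefined list used by csvmatch may differ.
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def normalize(value: str, *, ignore_case=False, ignore_nonalpha=False,
              ignore_nonlatin=False, ignore_order_words=False,
              ignore_order_letters=False, ignore_titles=False) -> str:
    """Approximate the ignore_* options by rewriting a value before comparison."""
    if ignore_nonlatin:
        # Split accented letters into base letter + combining mark, then keep only ASCII,
        # which drops the marks and any characters from non-Latin alphabets.
        decomposed = unicodedata.normalize("NFKD", value)
        value = decomposed.encode("ascii", "ignore").decode("ascii")
    if ignore_case:
        value = value.lower()
    if ignore_nonalpha:
        # Keep letters, digits and spaces so word boundaries survive.
        value = "".join(ch for ch in value if ch.isalnum() or ch.isspace())
    words = value.split()
    if ignore_titles:
        words = [w for w in words if w.lower().rstrip(".") not in TITLES]
    if ignore_order_words:
        words = sorted(words)
    value = " ".join(words)
    if ignore_order_letters:
        # Sort all letters together, regardless of which word they came from.
        value = "".join(sorted(value.replace(" ", "")))
    return value


print(normalize("George SMILEY", ignore_case=True) == normalize("George Smiley", ignore_case=True))
print(normalize("Mr. Smiley, George", ignore_case=True, ignore_nonalpha=True,
                ignore_titles=True, ignore_order_words=True))
```

Normalizing first keeps each comparison method simple: `exact` or a distance metric then only ever sees the cleaned strings, which is why `ignore_case=True` alone is enough for "George SMILEY" to match "George Smiley" in the usage example.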

%prep
%autosetup -n fuzzy-pandas-0.1

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-fuzzy-pandas -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot - 0.1-1
- Package Spec generated