automatic import of python-fuzzy-pandas openeuler20.03

author: CoprDistGit <infra@openeuler.org> 2023-05-05 07:35:42 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-05-05 07:35:42 +0000
commit: 62c83c65993dc48e1e2358137df95d0c113b6120 (patch)
tree: bf0d368a7791d6845c96b0fb26da2e9a1a1696b2
parent: 379cb1e33b29f7034e1115cb66277507759c210f (diff)
3 files changed, 415 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..fd48eea 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/fuzzy_pandas-0.1.tar.gz
diff --git a/python-fuzzy-pandas.spec b/python-fuzzy-pandas.spec
new file mode 100644
index 0000000..fd52b59
--- /dev/null
+++ b/python-fuzzy-pandas.spec
@@ -0,0 +1,413 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-fuzzy-pandas
+Version:	0.1
+Release:	1
+Summary:	Fuzzy matching in pandas using csvmatch
+License:	MIT
+URL:		http://github.com/jsoma/fuzzy_pandas
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/37/1c/e0e1ea616ff1d09a33b53915258dd5e4cf586aed6237358e3312a5c90be6/fuzzy_pandas-0.1.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-pandas
+Requires:	python3-csvmatch
+
+%description
+# fuzzy_pandas
+
+A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas dataframes.
+
+## Installation
+
+```
+pip install fuzzy_pandas
+```
+
+## Usage
+
+To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:
+
+```
+name,location,codename
+George Smiley,London,Beggerman
+Percy Alleline,London,Tinker
+Roy Bland,London,Soldier
+Toby Esterhase,Vienna,Poorman
+Peter Guillam,Brixton,none
+Bill Haydon,London,Tailor
+Oliver Lacon,London,none
+Jim Prideaux,Slovakia,none
+Connie Sachs,Oxford,none
+```
+
+And another such as:
+
+```
+Person Name,Location
+Maria Andreyevna Ostrakova,Russia
+Otto Leipzig,Estonia
+George SMILEY,London
+Peter Guillam,Brixton
+Konny Saks,Oxford
+Saul Enderby,London
+Sam Collins,Vietnam
+Tony Esterhase,Vienna
+Claus Kretzschmar,Hamburg
+```
+
+You can then find which names are in both files:
+
+```python
+import pandas as pd
+import fuzzy_pandas as fpd
+
+df1 = pd.read_csv("data1.csv")
+df2 = pd.read_csv("data2.csv")
+
+matches = fpd.fuzzy_merge(df1, df2,
+                          left_on=['name'],
+                          right_on=['Person Name'],
+                          ignore_case=True,
+                          keep='match')
+
+print(matches)
+```
+
+|.|name|Person Name|
+|---|---|---|
+|0|George Smiley|George SMILEY|
+|1|Peter Guillam|Peter Guillam|
+
+### Options
+
+Dumping this out of the code itself, apologies for lack of pretty formatting.
+
+* **left** : DataFrame
+* **right** : DataFrame
+    - Object to merge left with
+* **on** : str or list
+    - Column names to compare. These must be found in both DataFrames.
+* **left_on** : str or list
+    - Column names to compare in the left DataFrame.
+* **right_on** : str or list
+    - Column names to compare in the right DataFrame.
+* **left_cols** : list, default None
+    - List of columns to preserve from the left DataFrame.
+    - Defaults to `left_on`.
+* **right_cols** : list, default None
+    - List of columns to preserve from the right DataFrame. 
+    - Defaults to `right_on`.
+* **method** : str or list, default 'exact'
+    - Perform a fuzzy match, optionally using a specified algorithm.
+    - Multiple algorithms can be specified, which will apply to each field
+    respectively.
+    - Options:
+        * **exact**: exact matches
+        * **levenshtein**: string distance metric
+        * **jaro**: string distance metric
+        * **metaphone**: phonetic matching algorithm
+        * **bilenko**: prompts for matches
+* **threshold** : float or list, default `0.6`
+    - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively.
+* **ignore_case** : bool, default False
+    - Ignore case (default is case-sensitive)
+* **ignore_nonalpha** : bool, default False
+    - Ignore non-alphanumeric characters
+* **ignore_nonlatin** : bool, default False
+    - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent
+* **ignore_order_words** : bool, default False
+    - Ignore the order words are given in
+* **ignore_order_letters** : bool, default False
+    - Ignore the order the letters are given in, regardless of word order
+* **ignore_titles** : bool, default False
+    - Ignore a predefined list of name titles (such as Mr, Ms, etc)
+* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }
+
+
+For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).
+
+
+
+%package -n python3-fuzzy-pandas
+Summary:	Fuzzy matching in pandas using csvmatch
+Provides:	python-fuzzy-pandas
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-fuzzy-pandas
+# fuzzy_pandas
+
+A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas dataframes.
+
+## Installation
+
+```
+pip install fuzzy_pandas
+```
+
+## Usage
+
+To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:
+
+```
+name,location,codename
+George Smiley,London,Beggerman
+Percy Alleline,London,Tinker
+Roy Bland,London,Soldier
+Toby Esterhase,Vienna,Poorman
+Peter Guillam,Brixton,none
+Bill Haydon,London,Tailor
+Oliver Lacon,London,none
+Jim Prideaux,Slovakia,none
+Connie Sachs,Oxford,none
+```
+
+And another such as:
+
+```
+Person Name,Location
+Maria Andreyevna Ostrakova,Russia
+Otto Leipzig,Estonia
+George SMILEY,London
+Peter Guillam,Brixton
+Konny Saks,Oxford
+Saul Enderby,London
+Sam Collins,Vietnam
+Tony Esterhase,Vienna
+Claus Kretzschmar,Hamburg
+```
+
+You can then find which names are in both files:
+
+```python
+import pandas as pd
+import fuzzy_pandas as fpd
+
+df1 = pd.read_csv("data1.csv")
+df2 = pd.read_csv("data2.csv")
+
+matches = fpd.fuzzy_merge(df1, df2,
+                          left_on=['name'],
+                          right_on=['Person Name'],
+                          ignore_case=True,
+                          keep='match')
+
+print(matches)
+```
+
+|.|name|Person Name|
+|---|---|---|
+|0|George Smiley|George SMILEY|
+|1|Peter Guillam|Peter Guillam|
+
+### Options
+
+Dumping this out of the code itself, apologies for lack of pretty formatting.
+
+* **left** : DataFrame
+* **right** : DataFrame
+    - Object to merge left with
+* **on** : str or list
+    - Column names to compare. These must be found in both DataFrames.
+* **left_on** : str or list
+    - Column names to compare in the left DataFrame.
+* **right_on** : str or list
+    - Column names to compare in the right DataFrame.
+* **left_cols** : list, default None
+    - List of columns to preserve from the left DataFrame.
+    - Defaults to `left_on`.
+* **right_cols** : list, default None
+    - List of columns to preserve from the right DataFrame. 
+    - Defaults to `right_on`.
+* **method** : str or list, default 'exact'
+    - Perform a fuzzy match, optionally using a specified algorithm.
+    - Multiple algorithms can be specified, which will apply to each field
+    respectively.
+    - Options:
+        * **exact**: exact matches
+        * **levenshtein**: string distance metric
+        * **jaro**: string distance metric
+        * **metaphone**: phonetic matching algorithm
+        * **bilenko**: prompts for matches
+* **threshold** : float or list, default `0.6`
+    - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively.
+* **ignore_case** : bool, default False
+    - Ignore case (default is case-sensitive)
+* **ignore_nonalpha** : bool, default False
+    - Ignore non-alphanumeric characters
+* **ignore_nonlatin** : bool, default False
+    - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent
+* **ignore_order_words** : bool, default False
+    - Ignore the order words are given in
+* **ignore_order_letters** : bool, default False
+    - Ignore the order the letters are given in, regardless of word order
+* **ignore_titles** : bool, default False
+    - Ignore a predefined list of name titles (such as Mr, Ms, etc)
+* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }
+
+
+For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).
+
+
+
+%package help
+Summary:	Development documents and examples for fuzzy-pandas
+Provides:	python3-fuzzy-pandas-doc
+%description help
+# fuzzy_pandas
+
+A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas dataframes.
+
+## Installation
+
+```
+pip install fuzzy_pandas
+```
+
+## Usage
+
+To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:
+
+```
+name,location,codename
+George Smiley,London,Beggerman
+Percy Alleline,London,Tinker
+Roy Bland,London,Soldier
+Toby Esterhase,Vienna,Poorman
+Peter Guillam,Brixton,none
+Bill Haydon,London,Tailor
+Oliver Lacon,London,none
+Jim Prideaux,Slovakia,none
+Connie Sachs,Oxford,none
+```
+
+And another such as:
+
+```
+Person Name,Location
+Maria Andreyevna Ostrakova,Russia
+Otto Leipzig,Estonia
+George SMILEY,London
+Peter Guillam,Brixton
+Konny Saks,Oxford
+Saul Enderby,London
+Sam Collins,Vietnam
+Tony Esterhase,Vienna
+Claus Kretzschmar,Hamburg
+```
+
+You can then find which names are in both files:
+
+```python
+import pandas as pd
+import fuzzy_pandas as fpd
+
+df1 = pd.read_csv("data1.csv")
+df2 = pd.read_csv("data2.csv")
+
+matches = fpd.fuzzy_merge(df1, df2,
+                          left_on=['name'],
+                          right_on=['Person Name'],
+                          ignore_case=True,
+                          keep='match')
+
+print(matches)
+```
+
+|.|name|Person Name|
+|---|---|---|
+|0|George Smiley|George SMILEY|
+|1|Peter Guillam|Peter Guillam|
+
+### Options
+
+Dumping this out of the code itself, apologies for lack of pretty formatting.
+
+* **left** : DataFrame
+* **right** : DataFrame
+    - Object to merge left with
+* **on** : str or list
+    - Column names to compare. These must be found in both DataFrames.
+* **left_on** : str or list
+    - Column names to compare in the left DataFrame.
+* **right_on** : str or list
+    - Column names to compare in the right DataFrame.
+* **left_cols** : list, default None
+    - List of columns to preserve from the left DataFrame.
+    - Defaults to `left_on`.
+* **right_cols** : list, default None
+    - List of columns to preserve from the right DataFrame. 
+    - Defaults to `right_on`.
+* **method** : str or list, default 'exact'
+    - Perform a fuzzy match, optionally using a specified algorithm.
+    - Multiple algorithms can be specified, which will apply to each field
+    respectively.
+    - Options:
+        * **exact**: exact matches
+        * **levenshtein**: string distance metric
+        * **jaro**: string distance metric
+        * **metaphone**: phonetic matching algorithm
+        * **bilenko**: prompts for matches
+* **threshold** : float or list, default `0.6`
+    - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively.
+* **ignore_case** : bool, default False
+    - Ignore case (default is case-sensitive)
+* **ignore_nonalpha** : bool, default False
+    - Ignore non-alphanumeric characters
+* **ignore_nonlatin** : bool, default False
+    - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent
+* **ignore_order_words** : bool, default False
+    - Ignore the order words are given in
+* **ignore_order_letters** : bool, default False
+    - Ignore the order the letters are given in, regardless of word order
+* **ignore_titles** : bool, default False
+    - Ignore a predefined list of name titles (such as Mr, Ms, etc)
+* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }
+
+
+For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).
+
+
+
+%prep
+%autosetup -n fuzzy-pandas-0.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-fuzzy-pandas -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..3030396
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+70b0e9aa8b147a283bbcf0e2e7f6b61f  fuzzy_pandas-0.1.tar.gz
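
The `sources` entry above pairs a 32-hex-digit digest with the tarball name, which looks like an MD5 checksum in the usual `digest  filename` layout. As a rough sketch under that assumption, a downloaded tarball could be checked against it as follows. The helper names `parse_sources`, `md5_of`, and `verify_sources` are hypothetical and not part of any openEuler tooling.

```python
import hashlib
from pathlib import Path


def parse_sources(text):
    """Parse 'digest  filename' lines into a {filename: digest} dict."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            digest, name = parts
            entries[name] = digest.lower()
    return entries


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sources(sources_path, directory="."):
    """Map each listed file to True/False (digest match) or None (file missing)."""
    listed = parse_sources(Path(sources_path).read_text())
    results = {}
    for name, expected in listed.items():
        target = Path(directory) / name
        results[name] = (md5_of(target) == expected) if target.is_file() else None
    return results
```

Calling `verify_sources("sources")` in a directory that holds both the `sources` file and `fuzzy_pandas-0.1.tar.gz` would return a dict keyed by file name. MD5 is used here only because the digest length suggests it; it guards against corrupted downloads, not deliberate tampering.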