%global _empty_manifest_terminate_build 0
Name:		python-fuzzy-pandas
Version:	0.1
Release:	1
Summary:	Fuzzy matching in pandas using csvmatch
License:	MIT
URL:		http://github.com/jsoma/fuzzy_pandas
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/37/1c/e0e1ea616ff1d09a33b53915258dd5e4cf586aed6237358e3312a5c90be6/fuzzy_pandas-0.1.tar.gz
BuildArch:	noarch

Requires:	python3-pandas
Requires:	python3-csvmatch

%description
# fuzzy_pandas

A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas dataframes.

## Installation

```
pip install fuzzy_pandas
```

## Usage

To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:

```
name,location,codename
George Smiley,London,Beggerman
Percy Alleline,London,Tinker
Roy Bland,London,Soldier
Toby Esterhase,Vienna,Poorman
Peter Guillam,Brixton,none
Bill Haydon,London,Tailor
Oliver Lacon,London,none
Jim Prideaux,Slovakia,none
Connie Sachs,Oxford,none
```

And another such as:

```
Person Name,Location
Maria Andreyevna Ostrakova,Russia
Otto Leipzig,Estonia
George SMILEY,London
Peter Guillam,Brixton
Konny Saks,Oxford
Saul Enderby,London
Sam Collins,Vietnam
Tony Esterhase,Vienna
Claus Kretzschmar,Hamburg
```

You can then find which names are in both files:

```python
import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

matches = fpd.fuzzy_merge(df1, df2,
                          left_on=['name'],
                          right_on=['Person Name'],
                          ignore_case=True,
                          keep='match')

print(matches)
```

|.|name|Person Name|
|---|---|---|
|0|George Smiley|George SMILEY|
|1|Peter Guillam|Peter Guillam|

### Options

Dumping this out of the code itself, apologies for lack of pretty formatting.

* **left** : DataFrame
* **right** : DataFrame
    - Object to merge left with
* **on** : str or list
    - Column names to compare. These must be found in both DataFrames.
* **left_on** : str or list
    - Column names to compare in the left DataFrame.
* **right_on** : str or list
    - Column names to compare in the right DataFrame.
* **left_cols** : list, default None
    - List of columns to preserve from the left DataFrame.
    - Defaults to `left_on`.
* **right_cols** : list, default None
    - List of columns to preserve from the right DataFrame. 
    - Defaults to `right_on`.
* **method** : str or list, default 'exact'
    - The matching algorithm to use.
    - Multiple algorithms can be given as a list, in which case they are applied to each field respectively.
    - Options:
        * **exact**: exact matches
        * **levenshtein**: string distance metric
        * **jaro**: string distance metric
        * **metaphone**: phonetic matching algorithm
        * **bilenko**: prompts for matches
* **threshold** : float or list, default `0.6`
    - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively.
* **ignore_case** : bool, default False
    - Ignore case (default is case-sensitive)
* **ignore_nonalpha** : bool, default False
    - Ignore non-alphanumeric characters
* **ignore_nonlatin** : bool, default False
    - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent
* **ignore_order_words** : bool, default False
    - Ignore the order words are given in
* **ignore_order_letters** : bool, default False
    - Ignore the order the letters are given in, regardless of word order
* **ignore_titles** : bool, default False
    - Ignore a predefined list of name titles (such as Mr, Ms, etc)
* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }
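
To make the `ignore_*` options above more concrete, here is a minimal, standard-library-only sketch of the idea behind `ignore_case`, `ignore_nonalpha` and `ignore_order_words`: both sides are normalized before they are compared. The helpers `normalize` and `find_matches` are hypothetical and are not part of fuzzy_pandas or csvmatch, the value `Guillam, Peter` is made up for illustration, and the real implementation may normalize differently (for example in how it treats spaces).

```python
import re


def normalize(value, ignore_case=False, ignore_nonalpha=False,
              ignore_order_words=False):
    """Hypothetical helper, not part of fuzzy_pandas or csvmatch."""
    if ignore_case:
        value = value.lower()
    if ignore_nonalpha:
        # Assumption: spaces are kept so words can still be told apart.
        value = re.sub(r"[^A-Za-z0-9 ]", "", value)
    if ignore_order_words:
        # Sorting the words makes "a b" and "b a" compare equal.
        value = " ".join(sorted(value.split()))
    return value


def find_matches(left, right, **options):
    """Pair up values that are equal once both sides are normalized."""
    return [(a, b) for a in left for b in right
            if normalize(a, **options) == normalize(b, **options)]


left = ["George Smiley", "Peter Guillam", "Connie Sachs"]
right = ["George SMILEY", "Guillam, Peter", "Konny Saks"]

# With no options only byte-for-byte equal strings would match.
strict = find_matches(left, right)

# With the options enabled, case, punctuation and word order stop mattering.
relaxed = find_matches(left, right, ignore_case=True,
                       ignore_nonalpha=True, ignore_order_words=True)
```

Here `strict` is empty, while `relaxed` pairs the two Smiley spellings and the two Guillam spellings. "Connie Sachs" and "Konny Saks" still differ after normalization, which is the kind of case the non-exact `method` values are meant for.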

For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).


%package -n python3-fuzzy-pandas
Summary:	Fuzzy matching in pandas using csvmatch
Provides:	python-fuzzy-pandas
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-fuzzy-pandas
# fuzzy_pandas

A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas dataframes.

## Installation

```
pip install fuzzy_pandas
```

## Usage

To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:

```
name,location,codename
George Smiley,London,Beggerman
Percy Alleline,London,Tinker
Roy Bland,London,Soldier
Toby Esterhase,Vienna,Poorman
Peter Guillam,Brixton,none
Bill Haydon,London,Tailor
Oliver Lacon,London,none
Jim Prideaux,Slovakia,none
Connie Sachs,Oxford,none
```

And another such as:

```
Person Name,Location
Maria Andreyevna Ostrakova,Russia
Otto Leipzig,Estonia
George SMILEY,London
Peter Guillam,Brixton
Konny Saks,Oxford
Saul Enderby,London
Sam Collins,Vietnam
Tony Esterhase,Vienna
Claus Kretzschmar,Hamburg
```

You can then find which names are in both files:

```python
import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

matches = fpd.fuzzy_merge(df1, df2,
                          left_on=['name'],
                          right_on=['Person Name'],
                          ignore_case=True,
                          keep='match')

print(matches)
```

|.|name|Person Name|
|---|---|---|
|0|George Smiley|George SMILEY|
|1|Peter Guillam|Peter Guillam|

### Options

Dumping this out of the code itself, apologies for lack of pretty formatting.

* **left** : DataFrame
* **right** : DataFrame
    - Object to merge left with
* **on** : str or list
    - Column names to compare. These must be found in both DataFrames.
* **left_on** : str or list
    - Column names to compare in the left DataFrame.
* **right_on** : str or list
    - Column names to compare in the right DataFrame.
* **left_cols** : list, default None
    - List of columns to preserve from the left DataFrame.
    - Defaults to `left_on`.
* **right_cols** : list, default None
    - List of columns to preserve from the right DataFrame. 
    - Defaults to `right_on`.
* **method** : str or list, default 'exact'
    - The matching algorithm to use.
    - Multiple algorithms can be given as a list, in which case they are applied to each field respectively.
    - Options:
        * **exact**: exact matches
        * **levenshtein**: string distance metric
        * **jaro**: string distance metric
        * **metaphone**: phonetic matching algorithm
        * **bilenko**: prompts for matches
* **threshold** : float or list, default `0.6`
    - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively.
* **ignore_case** : bool, default False
    - Ignore case (default is case-sensitive)
* **ignore_nonalpha** : bool, default False
    - Ignore non-alphanumeric characters
* **ignore_nonlatin** : bool, default False
    - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent
* **ignore_order_words** : bool, default False
    - Ignore the order words are given in
* **ignore_order_letters** : bool, default False
    - Ignore the order the letters are given in, regardless of word order
* **ignore_titles** : bool, default False
    - Ignore a predefined list of name titles (such as Mr, Ms, etc)
* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }

For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).


%package help
Summary:	Development documents and examples for fuzzy-pandas
Provides:	python3-fuzzy-pandas-doc
%description help
# fuzzy_pandas

A razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy matching with pandas dataframes.

## Installation

```
pip install fuzzy_pandas
```

## Usage

To borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:

```
name,location,codename
George Smiley,London,Beggerman
Percy Alleline,London,Tinker
Roy Bland,London,Soldier
Toby Esterhase,Vienna,Poorman
Peter Guillam,Brixton,none
Bill Haydon,London,Tailor
Oliver Lacon,London,none
Jim Prideaux,Slovakia,none
Connie Sachs,Oxford,none
```

And another such as:

```
Person Name,Location
Maria Andreyevna Ostrakova,Russia
Otto Leipzig,Estonia
George SMILEY,London
Peter Guillam,Brixton
Konny Saks,Oxford
Saul Enderby,London
Sam Collins,Vietnam
Tony Esterhase,Vienna
Claus Kretzschmar,Hamburg
```

You can then find which names are in both files:

```python
import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

matches = fpd.fuzzy_merge(df1, df2,
                          left_on=['name'],
                          right_on=['Person Name'],
                          ignore_case=True,
                          keep='match')

print(matches)
```

|.|name|Person Name|
|---|---|---|
|0|George Smiley|George SMILEY|
|1|Peter Guillam|Peter Guillam|

### Options

Dumping this out of the code itself, apologies for lack of pretty formatting.

* **left** : DataFrame
* **right** : DataFrame
    - Object to merge left with
* **on** : str or list
    - Column names to compare. These must be found in both DataFrames.
* **left_on** : str or list
    - Column names to compare in the left DataFrame.
* **right_on** : str or list
    - Column names to compare in the right DataFrame.
* **left_cols** : list, default None
    - List of columns to preserve from the left DataFrame.
    - Defaults to `left_on`.
* **right_cols** : list, default None
    - List of columns to preserve from the right DataFrame. 
    - Defaults to `right_on`.
* **method** : str or list, default 'exact'
    - The matching algorithm to use.
    - Multiple algorithms can be given as a list, in which case they are applied to each field respectively.
    - Options:
        * **exact**: exact matches
        * **levenshtein**: string distance metric
        * **jaro**: string distance metric
        * **metaphone**: phonetic matching algorithm
        * **bilenko**: prompts for matches
* **threshold** : float or list, default `0.6`
    - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively.
* **ignore_case** : bool, default False
    - Ignore case (default is case-sensitive)
* **ignore_nonalpha** : bool, default False
    - Ignore non-alphanumeric characters
* **ignore_nonlatin** : bool, default False
    - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent
* **ignore_order_words** : bool, default False
    - Ignore the order words are given in
* **ignore_order_letters** : bool, default False
    - Ignore the order the letters are given in, regardless of word order
* **ignore_titles** : bool, default False
    - Ignore a predefined list of name titles (such as Mr, Ms, etc)
* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }

For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).


%prep
%autosetup -n fuzzy-pandas-0.1

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-fuzzy-pandas -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1-1
- Package Spec generated