%global _empty_manifest_terminate_build 0
Name: python-simple-keyword-clusterer
Version: 1.3
Release: 1
Summary: Extract higher level clusters from keywords
License: MIT
URL: https://github.com/andrea-dagostino/simple_keyword_clusterer
Source0: https://mirrors.aliyun.com/pypi/web/packages/35/30/21f001506df400d86addedb0f235f471a8c584edc96c26b6457bd9ec9082/simple_keyword_clusterer-1.3.tar.gz
BuildArch: noarch
Requires: python3-scikit-learn
Requires: python3-tqdm
Requires: python3-seaborn
Requires: python3-numpy
Requires: python3-nltk
Requires: python3-matplotlib
Requires: python3-pandas
%description
# Simple Keyword Clusterer
A simple machine learning package to cluster keywords into higher-level groups.
Example:
*"Senior Frontend Engineer" --> "Frontend Engineer"*
*"Junior Backend developer" --> "Backend developer"*
___
## Installation
```
pip install simple_keyword_clusterer
```
## Usage
```python
# import the package
from simple_keyword_clusterer import Clusterer
# read your keywords into a list
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()
# instantiate object
clusterer = Clusterer()
# apply clustering
df = clusterer.extract(data)
print(df)
```
## Performance
The algorithm automatically finds the optimal number of clusters based on the best silhouette score.
You can also specify the number of clusters yourself:
```python
# instantiate object
clusterer = Clusterer(n_clusters=4)
# apply clustering
df = clusterer.extract(data)
```
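The silhouette score that drives the automatic cluster selection is simple to compute by hand. As an illustration (a minimal pure-Python sketch for one-dimensional points, independent of this package's internals, which use scikit-learn):

```python
def silhouette_score_1d(points, labels):
    """Mean silhouette coefficient for 1-D points with integer cluster labels."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [q for j, (q, l) in enumerate(zip(points, labels)) if l == lab and j != i]
        if not same:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to points in the same cluster
        a = sum(abs(p - q) for q in same) / len(same)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / sum(1 for l in labels if l == other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated 1-D clusters score close to 1
print(silhouette_score_1d([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # → ~0.85
```

Trying several candidate cluster counts and keeping the one with the highest mean silhouette is the general idea behind the automatic selection.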
For best performance, reduce the variance of the data by keeping it within a single semantic context
(a *job title* keywords file should stay coherent; it shouldn't mix in unrelated items such as *gardening* keywords).
As long as the items are clearly separable, the algorithm should still produce usable output.
## Customization
You can customize the clustering mechanism through two files:
- blacklist.txt
- to_normalize.txt
If the clustering identifies unwanted groups, you can blacklist specific words simply by appending them to the blacklist.txt file.
The to_normalize.txt file contains tuples, each defining a transformation to apply to a keyword. For instance:
```
("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
```
Simply add your tuples to use this functionality.
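To illustrate what such pairs do, here is a hedged sketch of how normalization pairs like these could be applied (the package's actual parsing and matching logic may differ; `normalize_keyword` is a hypothetical helper, not part of the library's API):

```python
import re

# Example pairs mirroring the to_normalize.txt format above
NORMALIZE = [("back end", "backend"), ("front end", "frontend"),
             ("sr", "senior"), ("jr", "junior")]

def normalize_keyword(keyword, pairs=NORMALIZE):
    """Apply each (pattern, replacement) pair as a whole-word, case-insensitive substitution."""
    for pattern, replacement in pairs:
        keyword = re.sub(r"\b" + re.escape(pattern) + r"\b", replacement,
                         keyword, flags=re.IGNORECASE)
    return keyword

print(normalize_keyword("Sr Back End Engineer"))  # → "senior backend Engineer"
```

Normalizing synonyms and abbreviations this way before clustering keeps equivalent keywords in the same group.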
## Dependencies
- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Numpy
- NLTK
- Tqdm
Make sure to download the NLTK English stopwords and the punkt tokenizer models with:
```python
import nltk

nltk.download("stopwords")
nltk.download("punkt")
```
## Contact
If you feel like contacting me, send me an email. You can find my contact information on my [website](https://andreadagostino.com).
%package -n python3-simple-keyword-clusterer
Summary: Extract higher level clusters from keywords
Provides: python-simple-keyword-clusterer
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-simple-keyword-clusterer
# Simple Keyword Clusterer
A simple machine learning package to cluster keywords into higher-level groups.
Example:
*"Senior Frontend Engineer" --> "Frontend Engineer"*
*"Junior Backend developer" --> "Backend developer"*
___
## Installation
```
pip install simple_keyword_clusterer
```
## Usage
```python
# import the package
from simple_keyword_clusterer import Clusterer
# read your keywords into a list
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()
# instantiate object
clusterer = Clusterer()
# apply clustering
df = clusterer.extract(data)
print(df)
```
## Performance
The algorithm automatically finds the optimal number of clusters based on the best silhouette score.
You can also specify the number of clusters yourself:
```python
# instantiate object
clusterer = Clusterer(n_clusters=4)
# apply clustering
df = clusterer.extract(data)
```
For best performance, reduce the variance of the data by keeping it within a single semantic context
(a *job title* keywords file should stay coherent; it shouldn't mix in unrelated items such as *gardening* keywords).
As long as the items are clearly separable, the algorithm should still produce usable output.
## Customization
You can customize the clustering mechanism through two files:
- blacklist.txt
- to_normalize.txt
If the clustering identifies unwanted groups, you can blacklist specific words simply by appending them to the blacklist.txt file.
The to_normalize.txt file contains tuples, each defining a transformation to apply to a keyword. For instance:
```
("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
```
Simply add your tuples to use this functionality.
## Dependencies
- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Numpy
- NLTK
- Tqdm
Make sure to download the NLTK English stopwords and the punkt tokenizer models with:
```python
import nltk

nltk.download("stopwords")
nltk.download("punkt")
```
## Contact
If you feel like contacting me, send me an email. You can find my contact information on my [website](https://andreadagostino.com).
%package help
Summary: Development documents and examples for simple-keyword-clusterer
Provides: python3-simple-keyword-clusterer-doc
%description help
# Simple Keyword Clusterer
A simple machine learning package to cluster keywords into higher-level groups.
Example:
*"Senior Frontend Engineer" --> "Frontend Engineer"*
*"Junior Backend developer" --> "Backend developer"*
___
## Installation
```
pip install simple_keyword_clusterer
```
## Usage
```python
# import the package
from simple_keyword_clusterer import Clusterer
# read your keywords into a list
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()
# instantiate object
clusterer = Clusterer()
# apply clustering
df = clusterer.extract(data)
print(df)
```
## Performance
The algorithm automatically finds the optimal number of clusters based on the best silhouette score.
You can also specify the number of clusters yourself:
```python
# instantiate object
clusterer = Clusterer(n_clusters=4)
# apply clustering
df = clusterer.extract(data)
```
For best performance, reduce the variance of the data by keeping it within a single semantic context
(a *job title* keywords file should stay coherent; it shouldn't mix in unrelated items such as *gardening* keywords).
As long as the items are clearly separable, the algorithm should still produce usable output.
## Customization
You can customize the clustering mechanism through two files:
- blacklist.txt
- to_normalize.txt
If the clustering identifies unwanted groups, you can blacklist specific words simply by appending them to the blacklist.txt file.
The to_normalize.txt file contains tuples, each defining a transformation to apply to a keyword. For instance:
```
("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
```
Simply add your tuples to use this functionality.
## Dependencies
- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Numpy
- NLTK
- Tqdm
Make sure to download the NLTK English stopwords and the punkt tokenizer models with:
```python
import nltk

nltk.download("stopwords")
nltk.download("punkt")
```
## Contact
If you feel like contacting me, send me an email. You can find my contact information on my [website](https://andreadagostino.com).
%prep
%autosetup -n simple_keyword_clusterer-1.3
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-simple-keyword-clusterer -f filelist.lst
%dir %{python3_sitelib}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Tue Jun 20 2023 Python_Bot - 1.3-1
- Package Spec generated