%global _empty_manifest_terminate_build 0
Name: python-simple-keyword-clusterer
Version: 1.3
Release: 1
Summary: Extract higher level clusters from keywords
License: MIT
URL: https://github.com/andrea-dagostino/simple_keyword_clusterer
Source0: https://mirrors.aliyun.com/pypi/web/packages/35/30/21f001506df400d86addedb0f235f471a8c584edc96c26b6457bd9ec9082/simple_keyword_clusterer-1.3.tar.gz
BuildArch: noarch
Requires: python3-scikit-learn
Requires: python3-tqdm
Requires: python3-seaborn
Requires: python3-numpy
Requires: python3-nltk
Requires: python3-matplotlib
Requires: python3-pandas
%description
# Simple Keyword Clusterer
A simple machine learning package to cluster keywords into higher-level groups.
Example:
*"Senior Frontend Engineer" --> "Frontend Engineer"*
*"Junior Backend developer" --> "Backend developer"*
___
## Installation
```
pip install simple_keyword_clusterer
```
## Usage
```python
# import the package
from simple_keyword_clusterer import Clusterer
# read your keywords into a list
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()
# instantiate object
clusterer = Clusterer()
# apply clustering
df = clusterer.extract(data)
print(df)
```
## Performance
The algorithm automatically finds the optimal number of clusters based on the best silhouette score.
You can also specify the number of clusters yourself:
```python
# instantiate object
clusterer = Clusterer(n_clusters=4)
# apply clustering
df = clusterer.extract(data)
```
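The silhouette score that drives the automatic cluster selection is simple to compute by hand. As an illustration (a minimal pure-Python sketch for one-dimensional points, independent of this package's internals, which use scikit-learn):

```python
def silhouette_score_1d(points, labels):
    """Mean silhouette coefficient for 1-D points with integer cluster labels."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [q for j, (q, l) in enumerate(zip(points, labels)) if l == lab and j != i]
        if not same:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to points in the same cluster
        a = sum(abs(p - q) for q in same) / len(same)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / sum(1 for l in labels if l == other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated 1-D clusters score close to 1
print(silhouette_score_1d([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # → ~0.85
```

Trying several candidate cluster counts and keeping the one with the highest mean silhouette is the general idea behind the automatic selection.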
For best performance, reduce the variance of the data by keeping it within a single semantic context
(a *job title* keywords file should stay coherent; it shouldn't mix in unrelated items such as *gardening* keywords).
As long as the items are clearly separable, the algorithm should still produce usable output.
## Customization
You can customize the clustering mechanism through two files:
- blacklist.txt
- to_normalize.txt
If the clustering identifies unwanted groups, you can blacklist specific words simply by appending them to the blacklist.txt file.
The to_normalize.txt file contains tuples, each defining a transformation to apply to a keyword. For instance:
```
("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
```
Simply add your tuples to use this functionality.
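To illustrate what such pairs do, here is a hedged sketch of how normalization pairs like these could be applied (the package's actual parsing and matching logic may differ; `normalize_keyword` is a hypothetical helper, not part of the library's API):

```python
import re

# Example pairs mirroring the to_normalize.txt format above
NORMALIZE = [("back end", "backend"), ("front end", "frontend"),
             ("sr", "senior"), ("jr", "junior")]

def normalize_keyword(keyword, pairs=NORMALIZE):
    """Apply each (pattern, replacement) pair as a whole-word, case-insensitive substitution."""
    for pattern, replacement in pairs:
        keyword = re.sub(r"\b" + re.escape(pattern) + r"\b", replacement,
                         keyword, flags=re.IGNORECASE)
    return keyword

print(normalize_keyword("Sr Back End Engineer"))  # → "senior backend Engineer"
```

Normalizing synonyms and abbreviations this way before clustering keeps equivalent keywords in the same group.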
## Dependencies
- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Numpy
- NLTK
- Tqdm
Make sure to download the NLTK English stopwords and the punkt tokenizer models with:
```python
import nltk

nltk.download("stopwords")
nltk.download("punkt")
```
## Contact
If you feel like contacting me, send me an email. You can find my contact information on my [website](https://andreadagostino.com).
%package -n python3-simple-keyword-clusterer
Summary: Extract higher level clusters from keywords
Provides: python-simple-keyword-clusterer
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-simple-keyword-clusterer
# Simple Keyword Clusterer
A simple machine learning package to cluster keywords into higher-level groups.
Example:
*"Senior Frontend Engineer" --> "Frontend Engineer"*
*"Junior Backend developer" --> "Backend developer"*
___
## Installation
```
pip install simple_keyword_clusterer
```
## Usage
```python
# import the package
from simple_keyword_clusterer import Clusterer
# read your keywords into a list
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()
# instantiate object
clusterer = Clusterer()
# apply clustering
df = clusterer.extract(data)
print(df)
```
## Performance
The algorithm automatically finds the optimal number of clusters based on the best silhouette score.
You can also specify the number of clusters yourself:
```python
# instantiate object
clusterer = Clusterer(n_clusters=4)
# apply clustering
df = clusterer.extract(data)
```
For best performance, reduce the variance of the data by keeping it within a single semantic context
(a *job title* keywords file should stay coherent; it shouldn't mix in unrelated items such as *gardening* keywords).
As long as the items are clearly separable, the algorithm should still produce usable output.
## Customization
You can customize the clustering mechanism through two files:
- blacklist.txt
- to_normalize.txt
If the clustering identifies unwanted groups, you can blacklist specific words simply by appending them to the blacklist.txt file.
The to_normalize.txt file contains tuples, each defining a transformation to apply to a keyword. For instance:
```
("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
```
Simply add your tuples to use this functionality.
## Dependencies
- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Numpy
- NLTK
- Tqdm
Make sure to download the NLTK English stopwords and the punkt tokenizer models with:
```python
import nltk

nltk.download("stopwords")
nltk.download("punkt")
```
## Contact
If you feel like contacting me, send me an email. You can find my contact information on my [website](https://andreadagostino.com).
%package help
Summary: Development documents and examples for simple-keyword-clusterer
Provides: python3-simple-keyword-clusterer-doc
%description help
# Simple Keyword Clusterer
A simple machine learning package to cluster keywords into higher-level groups.
Example:
*"Senior Frontend Engineer" --> "Frontend Engineer"*
*"Junior Backend developer" --> "Backend developer"*
___
## Installation
```
pip install simple_keyword_clusterer
```
## Usage
```python
# import the package
from simple_keyword_clusterer import Clusterer
# read your keywords into a list
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()
# instantiate object
clusterer = Clusterer()
# apply clustering
df = clusterer.extract(data)
print(df)
```
## Performance
The algorithm automatically finds the optimal number of clusters based on the best silhouette score.
You can also specify the number of clusters yourself:
```python
# instantiate object
clusterer = Clusterer(n_clusters=4)
# apply clustering
df = clusterer.extract(data)
```
For best performance, reduce the variance of the data by keeping it within a single semantic context
(a *job title* keywords file should stay coherent; it shouldn't mix in unrelated items such as *gardening* keywords).
As long as the items are clearly separable, the algorithm should still produce usable output.
## Customization
You can customize the clustering mechanism through two files:
- blacklist.txt
- to_normalize.txt
If the clustering identifies unwanted groups, you can blacklist specific words simply by appending them to the blacklist.txt file.
The to_normalize.txt file contains tuples, each defining a transformation to apply to a keyword. For instance:
```
("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
```
Simply add your tuples to use this functionality.
## Dependencies
- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Numpy
- NLTK
- Tqdm
Make sure to download the NLTK English stopwords and the punkt tokenizer models with:
```python
import nltk

nltk.download("stopwords")
nltk.download("punkt")
```
## Contact
If you feel like contacting me, send me an email. You can find my contact information on my [website](https://andreadagostino.com).
%prep
%autosetup -n simple_keyword_clusterer-1.3
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-simple-keyword-clusterer -f filelist.lst
%dir %{python3_sitelib}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Tue Jun 20 2023 Python_Bot - 1.3-1
- Package Spec generated