author | CoprDistGit <infra@openeuler.org> | 2023-06-20 08:34:26 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-06-20 08:34:26 +0000 |
commit | f1fcadcc45ca50f5bd26061f1f3a0cc7ef765cf6 (patch) | |
tree | 701d99fd253977c8f98d273b965641e9aca9c381 | |
parent | 826ee2e188c09d3d844a9741ed344967d94e9a45 (diff) |
automatic import of python-simple-keyword-clusterer (openeuler20.03)
-rw-r--r-- | .gitignore | 1 |
-rw-r--r-- | python-simple-keyword-clusterer.spec | 328 |
-rw-r--r-- | sources | 1 |
3 files changed, 330 insertions, 0 deletions
@@ -0,0 +1 @@
+/simple_keyword_clusterer-1.3.tar.gz
diff --git a/python-simple-keyword-clusterer.spec b/python-simple-keyword-clusterer.spec
new file mode 100644
index 0000000..d15061d
--- /dev/null
+++ b/python-simple-keyword-clusterer.spec
@@ -0,0 +1,328 @@
+%global _empty_manifest_terminate_build 0
+Name: python-simple-keyword-clusterer
+Version: 1.3
+Release: 1
+Summary: Extract higher level clusters from keywords
+License: MIT
+URL: https://github.com/andrea-dagostino/simple_keyword_clusterer
+Source0: https://mirrors.aliyun.com/pypi/web/packages/35/30/21f001506df400d86addedb0f235f471a8c584edc96c26b6457bd9ec9082/simple_keyword_clusterer-1.3.tar.gz
+BuildArch: noarch
+
+Requires: python3-scikit-learn
+Requires: python3-tqdm
+Requires: python3-seaborn
+Requires: python3-numpy
+Requires: python3-nltk
+Requires: python3-matplotlib
+Requires: python3-pandas
+
+%description
+# Simple Keyword Clusterer
+A simple machine learning package to cluster keywords in higher-level groups.
+
+Example:<br>
+*"Senior Frontend Engineer" --> "Frontend Engineer"*<br>
+*"Junior Backend developer" --> "Backend developer"*
+___
+## Installation
+```
+pip install simple_keyword_clusterer
+```
+## Usage
+```python
+# import the package
+from simple_keyword_clusterer import Clusterer
+
+# read your keywords in list
+with open("../my_keywords.txt", "r") as f:
+    data = f.read().splitlines()
+
+# instantiate object
+clusterer = Clusterer()
+
+# apply clustering
+df = clusterer.extract(data)
+
+print(df)
+```
+<img src="https://github.com/Tangelus/simple_keyword_clusterer/raw/master/images/clustering_sample.png" alt="clustering_example" width="600"/>
+
+
+## Performance
+The algorithm will find the optimal number of clusters automatically based on the best Silhouette Score.
+
+You can specify the number of clusters yourself too
+
+```python
+# instantiate object
+clusterer = Clusterer(n_clusters=4)
+
+# apply clustering
+df = clusterer.extract(data)
+```
+
+For best performance, try to reduce the variance of data by providing the same semantic context <br>
+(the *job title* keywords file should remain coherent, in that it shouldn't contain other stuff like *gardening* keywords). <br>
+
+If items are clearly separable, the algorithm should still be able to provide a useable output.
+
+## Customization
+You can customize the clustering mechanism through the files
+- blacklist.txt
+- to_normalize.txt
+
+If you notice that the clustering identifies unwanted groups, you can blacklist certain words simply by appending them in the blacklist.txt file.
+
+The to_normalize.txt file contains tuples that identify a transformation to apply to the keyword. For instance
+```
+("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
+```
+Simply add your tuples to use this functionality.
+
+
+## Dependencies
+- Scikit-learn
+- Pandas
+- Matplotlib
+- Seaborn
+- Numpy
+- NLTK
+- Tqdm
+
+Make sure to download NLTK English stopwords and punctuation with the command
+
+```python
+nltk.download("stopwords")
+nltk.download('punkt')
+```
+
+## Contact
+If you feel like contacting me, do so and send me a mail. You can find my contact information on my [website](https://andreadagostino.com).
+
+
+
+
+%package -n python3-simple-keyword-clusterer
+Summary: Extract higher level clusters from keywords
+Provides: python-simple-keyword-clusterer
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-simple-keyword-clusterer
+# Simple Keyword Clusterer
+A simple machine learning package to cluster keywords in higher-level groups.
+
+Example:<br>
+*"Senior Frontend Engineer" --> "Frontend Engineer"*<br>
+*"Junior Backend developer" --> "Backend developer"*
+___
+## Installation
+```
+pip install simple_keyword_clusterer
+```
+## Usage
+```python
+# import the package
+from simple_keyword_clusterer import Clusterer
+
+# read your keywords in list
+with open("../my_keywords.txt", "r") as f:
+    data = f.read().splitlines()
+
+# instantiate object
+clusterer = Clusterer()
+
+# apply clustering
+df = clusterer.extract(data)
+
+print(df)
+```
+<img src="https://github.com/Tangelus/simple_keyword_clusterer/raw/master/images/clustering_sample.png" alt="clustering_example" width="600"/>
+
+
+## Performance
+The algorithm will find the optimal number of clusters automatically based on the best Silhouette Score.
+
+You can specify the number of clusters yourself too
+
+```python
+# instantiate object
+clusterer = Clusterer(n_clusters=4)
+
+# apply clustering
+df = clusterer.extract(data)
+```
+
+For best performance, try to reduce the variance of data by providing the same semantic context <br>
+(the *job title* keywords file should remain coherent, in that it shouldn't contain other stuff like *gardening* keywords). <br>
+
+If items are clearly separable, the algorithm should still be able to provide a useable output.
+
+## Customization
+You can customize the clustering mechanism through the files
+- blacklist.txt
+- to_normalize.txt
+
+If you notice that the clustering identifies unwanted groups, you can blacklist certain words simply by appending them in the blacklist.txt file.
+
+The to_normalize.txt file contains tuples that identify a transformation to apply to the keyword. For instance
+```
+("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
+```
+Simply add your tuples to use this functionality.
+
+
+## Dependencies
+- Scikit-learn
+- Pandas
+- Matplotlib
+- Seaborn
+- Numpy
+- NLTK
+- Tqdm
+
+Make sure to download NLTK English stopwords and punctuation with the command
+
+```python
+nltk.download("stopwords")
+nltk.download('punkt')
+```
+
+## Contact
+If you feel like contacting me, do so and send me a mail. You can find my contact information on my [website](https://andreadagostino.com).
+
+
+
+
+%package help
+Summary: Development documents and examples for simple-keyword-clusterer
+Provides: python3-simple-keyword-clusterer-doc
+%description help
+# Simple Keyword Clusterer
+A simple machine learning package to cluster keywords in higher-level groups.
+
+Example:<br>
+*"Senior Frontend Engineer" --> "Frontend Engineer"*<br>
+*"Junior Backend developer" --> "Backend developer"*
+___
+## Installation
+```
+pip install simple_keyword_clusterer
+```
+## Usage
+```python
+# import the package
+from simple_keyword_clusterer import Clusterer
+
+# read your keywords in list
+with open("../my_keywords.txt", "r") as f:
+    data = f.read().splitlines()
+
+# instantiate object
+clusterer = Clusterer()
+
+# apply clustering
+df = clusterer.extract(data)
+
+print(df)
+```
+<img src="https://github.com/Tangelus/simple_keyword_clusterer/raw/master/images/clustering_sample.png" alt="clustering_example" width="600"/>
+
+
+## Performance
+The algorithm will find the optimal number of clusters automatically based on the best Silhouette Score.
+
+You can specify the number of clusters yourself too
+
+```python
+# instantiate object
+clusterer = Clusterer(n_clusters=4)
+
+# apply clustering
+df = clusterer.extract(data)
+```
+
+For best performance, try to reduce the variance of data by providing the same semantic context <br>
+(the *job title* keywords file should remain coherent, in that it shouldn't contain other stuff like *gardening* keywords). <br>
+
+If items are clearly separable, the algorithm should still be able to provide a useable output.
+
+## Customization
+You can customize the clustering mechanism through the files
+- blacklist.txt
+- to_normalize.txt
+
+If you notice that the clustering identifies unwanted groups, you can blacklist certain words simply by appending them in the blacklist.txt file.
+
+The to_normalize.txt file contains tuples that identify a transformation to apply to the keyword. For instance
+```
+("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
+```
+Simply add your tuples to use this functionality.
+
+
+## Dependencies
+- Scikit-learn
+- Pandas
+- Matplotlib
+- Seaborn
+- Numpy
+- NLTK
+- Tqdm
+
+Make sure to download NLTK English stopwords and punctuation with the command
+
+```python
+nltk.download("stopwords")
+nltk.download('punkt')
+```
+
+## Contact
+If you feel like contacting me, do so and send me a mail. You can find my contact information on my [website](https://andreadagostino.com).
+
+
+
+
+%prep
+%autosetup -n simple_keyword_clusterer-1.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-simple-keyword-clusterer -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 1.3-1
+- Package Spec generated
@@ -0,0 +1 @@
+4a0e71ee22fbf471beb54734580d6f92 simple_keyword_clusterer-1.3.tar.gz
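
Note on the `to_normalize.txt` customization in the imported package description: the sample tuples there are missing the closing quote after `backend` and `frontend`. The sketch below shows one way such (source, target) pairs could be applied to a keyword before clustering. It is an illustration only, not the package's own code: the names `TO_NORMALIZE` and `normalize_keyword` are hypothetical, and lower-casing plus whole-word matching are assumptions that the description does not confirm.

```python
import re

# Hypothetical normalization pairs, modeled on the to_normalize.txt example
# in the package description (written here with balanced quotes).
TO_NORMALIZE = [
    ("back end", "backend"),
    ("front end", "frontend"),
    ("sr", "senior"),
    ("jr", "junior"),
]


def normalize_keyword(keyword, pairs=TO_NORMALIZE):
    """Lower-case a keyword and replace each source term with its target.

    Illustrative helper only; the package may normalize differently.
    """
    result = keyword.lower()
    for source, target in pairs:
        # Match whole words only, so "sr" does not fire inside another word.
        pattern = r"\b" + re.escape(source) + r"\b"
        result = re.sub(pattern, target, result)
    return result


if __name__ == "__main__":
    for kw in ["Sr Front End Engineer", "Jr Back End developer"]:
        print(normalize_keyword(kw))
```

Whole-word matching is a deliberate choice in this sketch: a plain substring replace would also rewrite short abbreviations such as `sr` wherever they occur inside longer words.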