%global _empty_manifest_terminate_build 0
Name: python-simple-keyword-clusterer
Version: 1.3
Release: 1
Summary: Extract higher level clusters from keywords
License: MIT
URL: https://github.com/andrea-dagostino/simple_keyword_clusterer
Source0: https://mirrors.aliyun.com/pypi/web/packages/35/30/21f001506df400d86addedb0f235f471a8c584edc96c26b6457bd9ec9082/simple_keyword_clusterer-1.3.tar.gz
BuildArch: noarch

Requires: python3-scikit-learn
Requires: python3-tqdm
Requires: python3-seaborn
Requires: python3-numpy
Requires: python3-nltk
Requires: python3-matplotlib
Requires: python3-pandas

%description
# Simple Keyword Clusterer
A simple machine learning package to cluster keywords into higher-level groups.

Example:
*"Senior Frontend Engineer" --> "Frontend Engineer"*
*"Junior Backend developer" --> "Backend developer"*

___

## Installation

```
pip install simple_keyword_clusterer
```

## Usage

```python
# import the package
from simple_keyword_clusterer import Clusterer

# read your keywords into a list
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()

# instantiate the object
clusterer = Clusterer()

# apply clustering
df = clusterer.extract(data)

print(df)
```

## Performance

The algorithm automatically finds the optimal number of clusters based on the best silhouette score. You can also specify the number of clusters yourself:

```python
# instantiate the object with a fixed number of clusters
clusterer = Clusterer(n_clusters=4)

# apply clustering
df = clusterer.extract(data)
```

For best performance, try to reduce the variance of the data by keeping it in a single semantic context (a file of *job title* keywords should remain coherent and not contain unrelated items such as *gardening* keywords).
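The silhouette-based selection of the cluster count can be sketched as follows. This is an illustrative, self-contained example, not the package's internals: the TF-IDF vectorization and the KMeans model are assumptions made for the sketch.

```python
# Sketch: pick the number of clusters that maximizes the silhouette score.
# Assumed approach for illustration only; not the package's implementation.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

keywords = [
    "senior frontend engineer", "junior frontend engineer",
    "senior backend developer", "junior backend developer",
    "data scientist", "senior data scientist",
]

# Turn the keyword strings into numeric vectors.
X = TfidfVectorizer().fit_transform(keywords)

best_k, best_score = None, -1.0
for k in range(2, len(keywords)):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # higher means better-separated clusters
    if score > best_score:
        best_k, best_score = k, score

print(best_k, best_score)
```

With a fixed `n_clusters`, the loop above is skipped and the model is fit once with the requested value.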
If the items are clearly separable, the algorithm should still be able to provide a usable output.

## Customization

You can customize the clustering mechanism through two files:

- blacklist.txt
- to_normalize.txt

If the clustering identifies unwanted groups, you can blacklist certain words simply by appending them to the blacklist.txt file.

The to_normalize.txt file contains tuples that each define a transformation to apply to a keyword. For instance:

```
("back end", "backend"),
("front end", "frontend"),
("sr", "senior"),
("jr", "junior")
```

Simply add your own tuples to use this functionality.

## Dependencies

- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Numpy
- NLTK
- Tqdm

Make sure to download the NLTK English stopwords and the Punkt tokenizer with:

```python
nltk.download("stopwords")
nltk.download("punkt")
```

## Contact

Feel free to reach out by email; you can find my contact information on my [website](https://andreadagostino.com).

%package -n python3-simple-keyword-clusterer
Summary: Extract higher level clusters from keywords
Provides: python-simple-keyword-clusterer
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip

%description -n python3-simple-keyword-clusterer
# Simple Keyword Clusterer
A simple machine learning package to cluster keywords into higher-level groups.

Example:
*"Senior Frontend Engineer" --> "Frontend Engineer"*
*"Junior Backend developer" --> "Backend developer"*

___

## Installation

```
pip install simple_keyword_clusterer
```

## Usage

```python
# import the package
from simple_keyword_clusterer import Clusterer

# read your keywords into a list
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()

# instantiate the object
clusterer = Clusterer()

# apply clustering
df = clusterer.extract(data)

print(df)
```

## Performance

The algorithm automatically finds the optimal number of clusters based on the best silhouette score. You can also specify the number of clusters yourself:

```python
# instantiate the object with a fixed number of clusters
clusterer = Clusterer(n_clusters=4)

# apply clustering
df = clusterer.extract(data)
```

For best performance, try to reduce the variance of the data by keeping it in a single semantic context (a file of *job title* keywords should remain coherent and not contain unrelated items such as *gardening* keywords).
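The silhouette-based selection of the cluster count can be sketched as follows. This is an illustrative, self-contained example, not the package's internals: the TF-IDF vectorization and the KMeans model are assumptions made for the sketch.

```python
# Sketch: pick the number of clusters that maximizes the silhouette score.
# Assumed approach for illustration only; not the package's implementation.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

keywords = [
    "senior frontend engineer", "junior frontend engineer",
    "senior backend developer", "junior backend developer",
    "data scientist", "senior data scientist",
]

# Turn the keyword strings into numeric vectors.
X = TfidfVectorizer().fit_transform(keywords)

best_k, best_score = None, -1.0
for k in range(2, len(keywords)):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # higher means better-separated clusters
    if score > best_score:
        best_k, best_score = k, score

print(best_k, best_score)
```

With a fixed `n_clusters`, the loop above is skipped and the model is fit once with the requested value.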
If the items are clearly separable, the algorithm should still be able to provide a usable output.

## Customization

You can customize the clustering mechanism through two files:

- blacklist.txt
- to_normalize.txt

If the clustering identifies unwanted groups, you can blacklist certain words simply by appending them to the blacklist.txt file.

The to_normalize.txt file contains tuples that each define a transformation to apply to a keyword. For instance:

```
("back end", "backend"),
("front end", "frontend"),
("sr", "senior"),
("jr", "junior")
```

Simply add your own tuples to use this functionality.

## Dependencies

- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Numpy
- NLTK
- Tqdm

Make sure to download the NLTK English stopwords and the Punkt tokenizer with:

```python
nltk.download("stopwords")
nltk.download("punkt")
```

## Contact

Feel free to reach out by email; you can find my contact information on my [website](https://andreadagostino.com).

%package help
Summary: Development documents and examples for simple-keyword-clusterer
Provides: python3-simple-keyword-clusterer-doc

%description help
# Simple Keyword Clusterer
A simple machine learning package to cluster keywords into higher-level groups.

Example:
*"Senior Frontend Engineer" --> "Frontend Engineer"*
*"Junior Backend developer" --> "Backend developer"*

___

## Installation

```
pip install simple_keyword_clusterer
```

## Usage

```python
# import the package
from simple_keyword_clusterer import Clusterer

# read your keywords into a list
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()

# instantiate the object
clusterer = Clusterer()

# apply clustering
df = clusterer.extract(data)

print(df)
```

## Performance

The algorithm automatically finds the optimal number of clusters based on the best silhouette score. You can also specify the number of clusters yourself:

```python
# instantiate the object with a fixed number of clusters
clusterer = Clusterer(n_clusters=4)

# apply clustering
df = clusterer.extract(data)
```

For best performance, try to reduce the variance of the data by keeping it in a single semantic context (a file of *job title* keywords should remain coherent and not contain unrelated items such as *gardening* keywords).
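The silhouette-based selection of the cluster count can be sketched as follows. This is an illustrative, self-contained example, not the package's internals: the TF-IDF vectorization and the KMeans model are assumptions made for the sketch.

```python
# Sketch: pick the number of clusters that maximizes the silhouette score.
# Assumed approach for illustration only; not the package's implementation.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

keywords = [
    "senior frontend engineer", "junior frontend engineer",
    "senior backend developer", "junior backend developer",
    "data scientist", "senior data scientist",
]

# Turn the keyword strings into numeric vectors.
X = TfidfVectorizer().fit_transform(keywords)

best_k, best_score = None, -1.0
for k in range(2, len(keywords)):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # higher means better-separated clusters
    if score > best_score:
        best_k, best_score = k, score

print(best_k, best_score)
```

With a fixed `n_clusters`, the loop above is skipped and the model is fit once with the requested value.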
If the items are clearly separable, the algorithm should still be able to provide a usable output.

## Customization

You can customize the clustering mechanism through two files:

- blacklist.txt
- to_normalize.txt

If the clustering identifies unwanted groups, you can blacklist certain words simply by appending them to the blacklist.txt file.

The to_normalize.txt file contains tuples that each define a transformation to apply to a keyword. For instance:

```
("back end", "backend"),
("front end", "frontend"),
("sr", "senior"),
("jr", "junior")
```

Simply add your own tuples to use this functionality.

## Dependencies

- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Numpy
- NLTK
- Tqdm

Make sure to download the NLTK English stopwords and the Punkt tokenizer with:

```python
nltk.download("stopwords")
nltk.download("punkt")
```

## Contact

Feel free to reach out by email; you can find my contact information on my [website](https://andreadagostino.com).

%prep
%autosetup -n simple_keyword_clusterer-1.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-simple-keyword-clusterer -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Jun 20 2023 Python_Bot - 1.3-1
- Package Spec generated