author    CoprDistGit <infra@openeuler.org>  2023-06-20 08:34:26 +0000
committer CoprDistGit <infra@openeuler.org>  2023-06-20 08:34:26 +0000
commit    f1fcadcc45ca50f5bd26061f1f3a0cc7ef765cf6 (patch)
tree      701d99fd253977c8f98d273b965641e9aca9c381
parent    826ee2e188c09d3d844a9741ed344967d94e9a45 (diff)
automatic import of python-simple-keyword-clusterer (openeuler20.03)
-rw-r--r--  .gitignore                              1
-rw-r--r--  python-simple-keyword-clusterer.spec  328
-rw-r--r--  sources                                 1
3 files changed, 330 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..26a82db 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/simple_keyword_clusterer-1.3.tar.gz
diff --git a/python-simple-keyword-clusterer.spec b/python-simple-keyword-clusterer.spec
new file mode 100644
index 0000000..d15061d
--- /dev/null
+++ b/python-simple-keyword-clusterer.spec
@@ -0,0 +1,328 @@
+%global _empty_manifest_terminate_build 0
+Name: python-simple-keyword-clusterer
+Version: 1.3
+Release: 1
+Summary: Extract higher level clusters from keywords
+License: MIT
+URL: https://github.com/andrea-dagostino/simple_keyword_clusterer
+Source0: https://mirrors.aliyun.com/pypi/web/packages/35/30/21f001506df400d86addedb0f235f471a8c584edc96c26b6457bd9ec9082/simple_keyword_clusterer-1.3.tar.gz
+BuildArch: noarch
+
+Requires: python3-scikit-learn
+Requires: python3-tqdm
+Requires: python3-seaborn
+Requires: python3-numpy
+Requires: python3-nltk
+Requires: python3-matplotlib
+Requires: python3-pandas
+
+%description
+# Simple Keyword Clusterer
+A simple machine learning package to cluster keywords into higher-level groups.
+
+Example:<br>
+*"Senior Frontend Engineer" --> "Frontend Engineer"*<br>
+*"Junior Backend developer" --> "Backend developer"*
+___
+## Installation
+```
+pip install simple_keyword_clusterer
+```
+## Usage
+```python
+# import the package
+from simple_keyword_clusterer import Clusterer
+
+# read your keywords in list
+with open("../my_keywords.txt", "r") as f:
+    data = f.read().splitlines()
+
+# instantiate object
+clusterer = Clusterer()
+
+# apply clustering
+df = clusterer.extract(data)
+
+print(df)
+```
+<img src="https://github.com/Tangelus/simple_keyword_clusterer/raw/master/images/clustering_sample.png" alt="clustering_example" width="600"/>
+
+
+## Performance
+The algorithm automatically finds the optimal number of clusters based on the best silhouette score.
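+
+Under the hood, this kind of automatic selection typically fits the clustering for several candidate cluster counts and keeps the one with the highest silhouette score. Below is a minimal sketch of the idea with scikit-learn (illustrative only, not the package's internal code):
+
+```python
+# Illustrative sketch, not the package's internal code:
+# pick the cluster count that maximizes the silhouette score.
+from sklearn.cluster import KMeans
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.metrics import silhouette_score
+
+keywords = ["senior frontend engineer", "junior frontend engineer",
+            "backend developer", "senior backend developer"]
+
+# Vectorize the keywords (the package's own preprocessing may differ)
+X = TfidfVectorizer().fit_transform(keywords)
+
+best_k, best_score = 2, -1.0
+for k in range(2, X.shape[0]):
+    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
+    score = silhouette_score(X, labels)
+    if score > best_score:
+        best_k, best_score = k, score
+
+print(best_k, best_score)
+```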
+
+You can also specify the number of clusters yourself:
+
+```python
+# instantiate object
+clusterer = Clusterer(n_clusters=4)
+
+# apply clustering
+df = clusterer.extract(data)
+```
+
+For best performance, try to reduce the variance of the data by keeping it within a single semantic context <br>
+(a *job title* keywords file should stay coherent and not mix in unrelated keywords such as *gardening* terms). <br>
+
+If items are clearly separable, the algorithm should still be able to provide a usable output.
+
+## Customization
+You can customize the clustering mechanism through the following files:
+- blacklist.txt
+- to_normalize.txt
+
+If you notice that the clustering identifies unwanted groups, you can blacklist certain words simply by appending them to the blacklist.txt file.
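+
+For example, assuming blacklist.txt lists one term per line (a hypothetical sample below, check the file shipped with the package for its exact format), these entries would keep the corresponding words out of the clusters:
+```
+senior
+junior
+remote
+```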
+
+The to_normalize.txt file contains tuples that define a transformation to apply to the keywords. For instance:
+```
+("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
+```
+Simply add your tuples to use this functionality.
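+
+Conceptually, each pair replaces the first string with the second before clustering. A minimal sketch of the effect (hypothetical helper, not the package's actual API):
+
+```python
+# Hypothetical helper, not the package's actual API: shows what the
+# normalization pairs in to_normalize.txt amount to.
+pairs = [("back end", "backend"), ("front end", "frontend")]
+
+def normalize(keyword, pairs=pairs):
+    keyword = keyword.lower()
+    for old, new in pairs:
+        keyword = keyword.replace(old, new)
+    return keyword
+
+print(normalize("Front end developer"))  # -> "frontend developer"
+```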
+
+
+## Dependencies
+- Scikit-learn
+- Pandas
+- Matplotlib
+- Seaborn
+- Numpy
+- NLTK
+- Tqdm
+
+Make sure to download the NLTK English stopwords and the Punkt tokenizer data with the following commands:
+
+```python
+import nltk
+
+nltk.download("stopwords")
+nltk.download("punkt")
+```
+
+## Contact
+If you'd like to get in touch, feel free to send me an email. You can find my contact information on my [website](https://andreadagostino.com).
+
+
+
+
+%package -n python3-simple-keyword-clusterer
+Summary: Extract higher level clusters from keywords
+Provides: python-simple-keyword-clusterer
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-simple-keyword-clusterer
+# Simple Keyword Clusterer
+A simple machine learning package to cluster keywords into higher-level groups.
+
+Example:<br>
+*"Senior Frontend Engineer" --> "Frontend Engineer"*<br>
+*"Junior Backend developer" --> "Backend developer"*
+___
+## Installation
+```
+pip install simple_keyword_clusterer
+```
+## Usage
+```python
+# import the package
+from simple_keyword_clusterer import Clusterer
+
+# read your keywords in list
+with open("../my_keywords.txt", "r") as f:
+    data = f.read().splitlines()
+
+# instantiate object
+clusterer = Clusterer()
+
+# apply clustering
+df = clusterer.extract(data)
+
+print(df)
+```
+<img src="https://github.com/Tangelus/simple_keyword_clusterer/raw/master/images/clustering_sample.png" alt="clustering_example" width="600"/>
+
+
+## Performance
+The algorithm automatically finds the optimal number of clusters based on the best silhouette score.
+
+You can also specify the number of clusters yourself:
+
+```python
+# instantiate object
+clusterer = Clusterer(n_clusters=4)
+
+# apply clustering
+df = clusterer.extract(data)
+```
+
+For best performance, try to reduce the variance of the data by keeping it within a single semantic context <br>
+(a *job title* keywords file should stay coherent and not mix in unrelated keywords such as *gardening* terms). <br>
+
+If items are clearly separable, the algorithm should still be able to provide a usable output.
+
+## Customization
+You can customize the clustering mechanism through the following files:
+- blacklist.txt
+- to_normalize.txt
+
+If you notice that the clustering identifies unwanted groups, you can blacklist certain words simply by appending them to the blacklist.txt file.
+
+The to_normalize.txt file contains tuples that define a transformation to apply to the keywords. For instance:
+```
+("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
+```
+Simply add your tuples to use this functionality.
+
+
+## Dependencies
+- Scikit-learn
+- Pandas
+- Matplotlib
+- Seaborn
+- Numpy
+- NLTK
+- Tqdm
+
+Make sure to download the NLTK English stopwords and the Punkt tokenizer data with the following commands:
+
+```python
+import nltk
+
+nltk.download("stopwords")
+nltk.download("punkt")
+```
+
+## Contact
+If you'd like to get in touch, feel free to send me an email. You can find my contact information on my [website](https://andreadagostino.com).
+
+
+
+
+%package help
+Summary: Development documents and examples for simple-keyword-clusterer
+Provides: python3-simple-keyword-clusterer-doc
+%description help
+# Simple Keyword Clusterer
+A simple machine learning package to cluster keywords into higher-level groups.
+
+Example:<br>
+*"Senior Frontend Engineer" --> "Frontend Engineer"*<br>
+*"Junior Backend developer" --> "Backend developer"*
+___
+## Installation
+```
+pip install simple_keyword_clusterer
+```
+## Usage
+```python
+# import the package
+from simple_keyword_clusterer import Clusterer
+
+# read your keywords in list
+with open("../my_keywords.txt", "r") as f:
+    data = f.read().splitlines()
+
+# instantiate object
+clusterer = Clusterer()
+
+# apply clustering
+df = clusterer.extract(data)
+
+print(df)
+```
+<img src="https://github.com/Tangelus/simple_keyword_clusterer/raw/master/images/clustering_sample.png" alt="clustering_example" width="600"/>
+
+
+## Performance
+The algorithm automatically finds the optimal number of clusters based on the best silhouette score.
+
+You can also specify the number of clusters yourself:
+
+```python
+# instantiate object
+clusterer = Clusterer(n_clusters=4)
+
+# apply clustering
+df = clusterer.extract(data)
+```
+
+For best performance, try to reduce the variance of the data by keeping it within a single semantic context <br>
+(a *job title* keywords file should stay coherent and not mix in unrelated keywords such as *gardening* terms). <br>
+
+If items are clearly separable, the algorithm should still be able to provide a usable output.
+
+## Customization
+You can customize the clustering mechanism through the following files:
+- blacklist.txt
+- to_normalize.txt
+
+If you notice that the clustering identifies unwanted groups, you can blacklist certain words simply by appending them to the blacklist.txt file.
+
+The to_normalize.txt file contains tuples that define a transformation to apply to the keywords. For instance:
+```
+("back end", "backend), ("front end", "frontend), ("sr", "Senior"), ("jr", "junior")
+```
+Simply add your tuples to use this functionality.
+
+
+## Dependencies
+- Scikit-learn
+- Pandas
+- Matplotlib
+- Seaborn
+- Numpy
+- NLTK
+- Tqdm
+
+Make sure to download the NLTK English stopwords and the Punkt tokenizer data with the following commands:
+
+```python
+import nltk
+
+nltk.download("stopwords")
+nltk.download("punkt")
+```
+
+## Contact
+If you'd like to get in touch, feel free to send me an email. You can find my contact information on my [website](https://andreadagostino.com).
+
+
+
+
+%prep
+%autosetup -n simple_keyword_clusterer-1.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
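+
+# Collect everything installed under usr/lib, usr/lib64, usr/bin and usr/sbin into
+# filelist.lst, and man pages into doclist.lst, for the %files sections below.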
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-simple-keyword-clusterer -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 1.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..df1c4bc
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+4a0e71ee22fbf471beb54734580d6f92 simple_keyword_clusterer-1.3.tar.gz