-rw-r--r-- | .gitignore | 1 |
-rw-r--r-- | python-gpym-tm.spec | 567 |
-rw-r--r-- | sources | 1 |
3 files changed, 569 insertions, 0 deletions
@@ -0,0 +1 @@
+/GPyM_TM-3.0.1.tar.gz
diff --git a/python-gpym-tm.spec b/python-gpym-tm.spec
new file mode 100644
index 0000000..d183ebc
--- /dev/null
+++ b/python-gpym-tm.spec
@@ -0,0 +1,567 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-GPyM-TM
+Version:	3.0.1
+Release:	1
+Summary:	Enables users to perform topic modelling on text
+License:	MIT License
+URL:		https://github.com/jrmazarura/GPM
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/4f/63/1468a4e7e5e6890ddc3cf3879d2edbad7b35b7dc563c0df3280afb406644/GPyM_TM-3.0.1.tar.gz
+BuildArch:	noarch
+
+
+%description
+# [GPyM_TM](https://github.com/jrmazarura/GPM)
+
+**GPyM_TM** is a Python package for topic modelling with either a Dirichlet multinomial mixture model or a Poisson model. Each model is available in a separate class: GSDMM implements the Dirichlet multinomial mixture model, while GPM uses the Poisson model to perform the text clustering. The package is also available on [PyPI](https://pypi.org/project/GPyM-TM/3.0.0/).
+
+## Preamble
+The aim of topic modelling is to extract latent topics from large corpora. GSDMM [1] assumes each document belongs to a single topic, which is a suitable assumption for some short texts. Given an initial number of topics, K, this algorithm clusters documents and extracts the topical structures present within the corpus. If K is set to a high value, the model will also automatically learn the number of clusters.
+
+[1] Yin, J. and Wang, J., 2014, August. A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 233-242).
+
+## Getting Started:
+
+The package is available [online](https://pypi.org/project/GPyM-TM/) for use within Python 3 environments.
+
+The installation can be performed with a standard `pip` install command, as shown below:
+
+`pip install GPyM-TM`
+
+## Prerequisites:
+
+The package has several dependencies, namely:
+
+* numpy
+* random
+* math
+* pandas
+* re
+* nltk
+* gensim
+* scipy
+
+# GSDMM
+
+## Function and class description:
+
+The class is named **GSDMM**, while the function itself is named **DMM**.
+
+The function takes 6 possible arguments: two are required and the remaining 4 are optional.
+
+### The required arguments are:
+
+* **corpus** - the text, cleaned and loaded into Python. That is, the text should be all lowercase, with all punctuation and numbers removed.
+* **nTopics** - the number of topics.
+
+### The optional arguments are:
+
+* **alpha**, **beta** - the distribution-specific parameters. (**The default for both of these parameters is 0.1.**)
+* **nTopWords** - the number of top words per topic. (**The default is 10.**)
+* **iters** - the number of Gibbs sampler iterations. (**The default is 15.**)
+
+## Output:
+
+The function provides several components of output, namely:
+* **psi** - the topic x word matrix.
+* **theta** - the document x topic matrix.
+* **topics** - the top words per topic.
+* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
+* **Final k** - the final number of selected topics.
+* **coherence** - the coherence score, which is a performance measure.
+* **selected_theta**
+* **selected_psi**
+
+# GPM
+
+## Function and class description:
+
+The class is named **GPM**, and the function itself is also named **GPM**.
+
+The function takes 8 possible arguments: two are required and the remaining 6 are optional.
+
+### The required arguments are:
+
+* **corpus** - the text, cleaned and loaded into Python. That is, the text should be all lowercase, with all punctuation and numbers removed.
+* **nTopics** - the number of topics.
+
+### The optional arguments are:
+
+* **alpha**, **beta** and **gam** - the distribution-specific parameters. (**The defaults for these parameters are alpha = 0.001, beta = 0.001 and gam = 0.1 respectively.**)
+* **nTopWords** - the number of top words per topic. (**The default is 10.**)
+* **iters** - the number of Gibbs sampler iterations. (**The default is 15.**)
+* **N** - a parameter used to normalize the document lengths, as required by the Poisson model.
+
+## Output:
+
+The function provides several components of output, namely:
+* **psi** - the topic x word matrix.
+* **theta** - the document x topic matrix.
+* **topics** - the top words per topic.
+* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
+* **Final k** - the final number of selected topics.
+* **coherence** - the coherence score, which is a performance measure.
+* **selected_theta**
+* **selected_psi**
+
+# Example Usage:
+
+A more comprehensive [tutorial](https://github.com/CAIR-ZA/GPyM_TM/blob/master/Tutorial.ipynb) is also available.
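Both classes expect a corpus that is already lowercase with punctuation and numbers stripped. A minimal pre-processing sketch is shown below; the sample documents and the cleaning regex are illustrative assumptions, not part of GPyM_TM itself:

```python
import re

# Illustrative raw documents (not from the package).
raw_docs = [
    "Topic Modelling extracts 10 latent topics!",
    "Short texts often fit a single topic.",
]

corpus = []
for doc in raw_docs:
    doc = doc.lower()                     # lowercase the text
    doc = re.sub(r"[^a-z\s]", " ", doc)   # drop punctuation and numbers
    corpus.append(" ".join(doc.split()))  # collapse repeated whitespace

print(corpus)
```

The resulting list can then be passed as the `corpus` argument when calling `GSDMM.DMM` or `GPM.GPM`.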
+
+### Installation:
+
+Run the following command within a Python command window:
+
+`pip install GPyM-TM`
+
+### Implementation:
+
+Import the package into the relevant Python script, with the following:
+
+`from GPyM_TM import GSDMM`
+`from GPyM_TM import GPM`
+
+> Call the class:
+
+#### Possible examples of calling the GSDMM function are as follows:
+
+`data_DMM = GSDMM.DMM(corpus, nTopics)`
+
+`data_DMM = GSDMM.DMM(corpus, nTopics, alpha = 0.25, beta = 0.15, nTopWords = 12, iters = 5)`
+
+#### Possible examples of calling the GPM function are as follows:
+
+`data_GPM = GPM.GPM(corpus, nTopics)`
+
+`data_GPM = GPM.GPM(corpus, nTopics, alpha = 0.002, beta = 0.03, gam = 0.06, nTopWords = 12, iters = 7, N = 8)`
+
+### Results:
+
+The output obtained for the Dirichlet multinomial mixture model appears as follows:
+
+
+
+The output obtained for the Poisson model appears as follows:
+
+
+
+## Built With:
+
+[Google Colab](https://colab.research.google.com/notebooks/intro.ipynb) - Development environment
+
+[Python](https://www.python.org/) - Programming language of choice
+
+[PyPI](https://pypi.org/) - Distribution
+
+## Authors:
+
+[Jocelyn Mazarura](https://github.com/jrmazarura/GPM)
+
+
+## Co-Authors:
+
+[Alta de Waal](https://github.com/altadewaal)
+
+[Ricardo Marques](https://github.com/RicSalgado)
+
+
+## License:
+
+This project is licensed under the MIT License - see the LICENSE file for details.
+
+
+## Acknowledgments:
+
+University of Pretoria
+
+
+
+
+
+%package -n python3-GPyM-TM
+Summary:	Enables users to perform topic modelling on text
+Provides:	python-GPyM-TM
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-GPyM-TM
+# [GPyM_TM](https://github.com/jrmazarura/GPM)
+
+**GPyM_TM** is a Python package for topic modelling with either a Dirichlet multinomial mixture model or a Poisson model.
Each model is available in a separate class: GSDMM implements the Dirichlet multinomial mixture model, while GPM uses the Poisson model to perform the text clustering. The package is also available on [PyPI](https://pypi.org/project/GPyM-TM/3.0.0/).
+
+## Preamble
+The aim of topic modelling is to extract latent topics from large corpora. GSDMM [1] assumes each document belongs to a single topic, which is a suitable assumption for some short texts. Given an initial number of topics, K, this algorithm clusters documents and extracts the topical structures present within the corpus. If K is set to a high value, the model will also automatically learn the number of clusters.
+
+[1] Yin, J. and Wang, J., 2014, August. A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 233-242).
+
+## Getting Started:
+
+The package is available [online](https://pypi.org/project/GPyM-TM/) for use within Python 3 environments.
+
+The installation can be performed with a standard `pip` install command, as shown below:
+
+`pip install GPyM-TM`
+
+## Prerequisites:
+
+The package has several dependencies, namely:
+
+* numpy
+* random
+* math
+* pandas
+* re
+* nltk
+* gensim
+* scipy
+
+# GSDMM
+
+## Function and class description:
+
+The class is named **GSDMM**, while the function itself is named **DMM**.
+
+The function takes 6 possible arguments: two are required and the remaining 4 are optional.
+
+### The required arguments are:
+
+* **corpus** - the text, cleaned and loaded into Python. That is, the text should be all lowercase, with all punctuation and numbers removed.
+* **nTopics** - the number of topics.
+
+### The optional arguments are:
+
+* **alpha**, **beta** - the distribution-specific parameters. (**The default for both of these parameters is 0.1.**)
+* **nTopWords** - the number of top words per topic. (**The default is 10.**)
+* **iters** - the number of Gibbs sampler iterations. (**The default is 15.**)
+
+## Output:
+
+The function provides several components of output, namely:
+* **psi** - the topic x word matrix.
+* **theta** - the document x topic matrix.
+* **topics** - the top words per topic.
+* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
+* **Final k** - the final number of selected topics.
+* **coherence** - the coherence score, which is a performance measure.
+* **selected_theta**
+* **selected_psi**
+
+# GPM
+
+## Function and class description:
+
+The class is named **GPM**, and the function itself is also named **GPM**.
+
+The function takes 8 possible arguments: two are required and the remaining 6 are optional.
+
+### The required arguments are:
+
+* **corpus** - the text, cleaned and loaded into Python. That is, the text should be all lowercase, with all punctuation and numbers removed.
+* **nTopics** - the number of topics.
+
+### The optional arguments are:
+
+* **alpha**, **beta** and **gam** - the distribution-specific parameters. (**The defaults for these parameters are alpha = 0.001, beta = 0.001 and gam = 0.1 respectively.**)
+* **nTopWords** - the number of top words per topic. (**The default is 10.**)
+* **iters** - the number of Gibbs sampler iterations. (**The default is 15.**)
+* **N** - a parameter used to normalize the document lengths, as required by the Poisson model.
+
+## Output:
+
+The function provides several components of output, namely:
+* **psi** - the topic x word matrix.
+* **theta** - the document x topic matrix.
+* **topics** - the top words per topic.
+* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
+* **Final k** - the final number of selected topics.
+* **coherence** - the coherence score, which is a performance measure.
+* **selected_theta**
+* **selected_psi**
+
+# Example Usage:
+
+A more comprehensive [tutorial](https://github.com/CAIR-ZA/GPyM_TM/blob/master/Tutorial.ipynb) is also available.
+
+### Installation:
+
+Run the following command within a Python command window:
+
+`pip install GPyM-TM`
+
+### Implementation:
+
+Import the package into the relevant Python script, with the following:
+
+`from GPyM_TM import GSDMM`
+`from GPyM_TM import GPM`
+
+> Call the class:
+
+#### Possible examples of calling the GSDMM function are as follows:
+
+`data_DMM = GSDMM.DMM(corpus, nTopics)`
+
+`data_DMM = GSDMM.DMM(corpus, nTopics, alpha = 0.25, beta = 0.15, nTopWords = 12, iters = 5)`
+
+#### Possible examples of calling the GPM function are as follows:
+
+`data_GPM = GPM.GPM(corpus, nTopics)`
+
+`data_GPM = GPM.GPM(corpus, nTopics, alpha = 0.002, beta = 0.03, gam = 0.06, nTopWords = 12, iters = 7, N = 8)`
+
+### Results:
+
+The output obtained for the Dirichlet multinomial mixture model appears as follows:
+
+
+
+The output obtained for the Poisson model appears as follows:
+
+
+
+## Built With:
+
+[Google Colab](https://colab.research.google.com/notebooks/intro.ipynb) - Development environment
+
+[Python](https://www.python.org/) - Programming language of choice
+
+[PyPI](https://pypi.org/) - Distribution
+
+## Authors:
+
+[Jocelyn Mazarura](https://github.com/jrmazarura/GPM)
+
+
+## Co-Authors:
+
+[Alta de Waal](https://github.com/altadewaal)
+
+[Ricardo Marques](https://github.com/RicSalgado)
+
+
+## License:
+
+This project is licensed under the MIT License - see the LICENSE file for details.
+
+
+## Acknowledgments:
+
+University of Pretoria
+
+
+
+
+
+%package help
+Summary:	Development documents and examples for GPyM-TM
+Provides:	python3-GPyM-TM-doc
+%description help
+# [GPyM_TM](https://github.com/jrmazarura/GPM)
+
+**GPyM_TM** is a Python package for topic modelling with either a Dirichlet multinomial mixture model or a Poisson model. Each model is available in a separate class: GSDMM implements the Dirichlet multinomial mixture model, while GPM uses the Poisson model to perform the text clustering. The package is also available on [PyPI](https://pypi.org/project/GPyM-TM/3.0.0/).
+
+## Preamble
+The aim of topic modelling is to extract latent topics from large corpora. GSDMM [1] assumes each document belongs to a single topic, which is a suitable assumption for some short texts. Given an initial number of topics, K, this algorithm clusters documents and extracts the topical structures present within the corpus. If K is set to a high value, the model will also automatically learn the number of clusters.
+
+[1] Yin, J. and Wang, J., 2014, August. A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 233-242).
+
+## Getting Started:
+
+The package is available [online](https://pypi.org/project/GPyM-TM/) for use within Python 3 environments.
+
+The installation can be performed with a standard `pip` install command, as shown below:
+
+`pip install GPyM-TM`
+
+## Prerequisites:
+
+The package has several dependencies, namely:
+
+* numpy
+* random
+* math
+* pandas
+* re
+* nltk
+* gensim
+* scipy
+
+# GSDMM
+
+## Function and class description:
+
+The class is named **GSDMM**, while the function itself is named **DMM**.
+
+The function takes 6 possible arguments: two are required and the remaining 4 are optional.
+
+### The required arguments are:
+
+* **corpus** - the text, cleaned and loaded into Python. That is, the text should be all lowercase, with all punctuation and numbers removed.
+* **nTopics** - the number of topics.
+
+### The optional arguments are:
+
+* **alpha**, **beta** - the distribution-specific parameters. (**The default for both of these parameters is 0.1.**)
+* **nTopWords** - the number of top words per topic. (**The default is 10.**)
+* **iters** - the number of Gibbs sampler iterations. (**The default is 15.**)
+
+## Output:
+
+The function provides several components of output, namely:
+* **psi** - the topic x word matrix.
+* **theta** - the document x topic matrix.
+* **topics** - the top words per topic.
+* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
+* **Final k** - the final number of selected topics.
+* **coherence** - the coherence score, which is a performance measure.
+* **selected_theta**
+* **selected_psi**
+
+# GPM
+
+## Function and class description:
+
+The class is named **GPM**, and the function itself is also named **GPM**.
+
+The function takes 8 possible arguments: two are required and the remaining 6 are optional.
+
+### The required arguments are:
+
+* **corpus** - the text, cleaned and loaded into Python. That is, the text should be all lowercase, with all punctuation and numbers removed.
+* **nTopics** - the number of topics.
+
+### The optional arguments are:
+
+* **alpha**, **beta** and **gam** - the distribution-specific parameters. (**The defaults for these parameters are alpha = 0.001, beta = 0.001 and gam = 0.1 respectively.**)
+* **nTopWords** - the number of top words per topic. (**The default is 10.**)
+* **iters** - the number of Gibbs sampler iterations. (**The default is 15.**)
+* **N** - a parameter used to normalize the document lengths, as required by the Poisson model.
+
+## Output:
+
+The function provides several components of output, namely:
+* **psi** - the topic x word matrix.
+* **theta** - the document x topic matrix.
+* **topics** - the top words per topic.
+* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
+* **Final k** - the final number of selected topics.
+* **coherence** - the coherence score, which is a performance measure.
+* **selected_theta**
+* **selected_psi**
+
+# Example Usage:
+
+A more comprehensive [tutorial](https://github.com/CAIR-ZA/GPyM_TM/blob/master/Tutorial.ipynb) is also available.
+
+### Installation:
+
+Run the following command within a Python command window:
+
+`pip install GPyM-TM`
+
+### Implementation:
+
+Import the package into the relevant Python script, with the following:
+
+`from GPyM_TM import GSDMM`
+`from GPyM_TM import GPM`
+
+> Call the class:
+
+#### Possible examples of calling the GSDMM function are as follows:
+
+`data_DMM = GSDMM.DMM(corpus, nTopics)`
+
+`data_DMM = GSDMM.DMM(corpus, nTopics, alpha = 0.25, beta = 0.15, nTopWords = 12, iters = 5)`
+
+#### Possible examples of calling the GPM function are as follows:
+
+`data_GPM = GPM.GPM(corpus, nTopics)`
+
+`data_GPM = GPM.GPM(corpus, nTopics, alpha = 0.002, beta = 0.03, gam = 0.06, nTopWords = 12, iters = 7, N = 8)`
+
+### Results:
+
+The output obtained for the Dirichlet multinomial mixture model appears as follows:
+
+
+
+The output obtained for the Poisson model appears as follows:
+
+
+
+## Built With:
+
+[Google Colab](https://colab.research.google.com/notebooks/intro.ipynb) - Development environment
+
+[Python](https://www.python.org/) - Programming language of choice
+
+[PyPI](https://pypi.org/) - Distribution
+
+## Authors:
+
+[Jocelyn Mazarura](https://github.com/jrmazarura/GPM)
+
+
+## Co-Authors:
+
+[Alta de Waal](https://github.com/altadewaal)
+
+[Ricardo Marques](https://github.com/RicSalgado)
+
+
+## License:
+
+This project is licensed under the MIT License - see the LICENSE file for details.
+
+
+## Acknowledgments:
+
+University of Pretoria
+
+
+
+
+
+%prep
+%autosetup -n GPyM-TM-3.0.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-GPyM-TM -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 17 2023 Python_Bot <Python_Bot@openeuler.org> - 3.0.1-1
+- Package Spec generated
@@ -0,0 +1 @@
+19b2a9f760eb06b0ded4d10bf5466f38 GPyM_TM-3.0.1.tar.gz