%global _empty_manifest_terminate_build 0
Name:		python-GPyM-TM
Version:	3.0.1
Release:	1
Summary:	The following package enables users to perform text modelling
License:	MIT License
URL:		https://github.com/jrmazarura/GPM
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/4f/63/1468a4e7e5e6890ddc3cf3879d2edbad7b35b7dc563c0df3280afb406644/GPyM_TM-3.0.1.tar.gz
BuildArch:	noarch


%description
# [GPyM_TM](https://github.com/jrmazarura/GPM)

**GPyM_TM** is a Python package to perform topic modelling, either through the use of a Dirichlet multinomial mixture model, or a Poisson model. Each of the above models is available within the package in a separate class, namely GSDMM utilizes the Dirichlet multinomial mixture model, while GPM makes use of the Poisson model to perform the text clustering respectively. The package is also available on [Pypi](https://pypi.org/project/GPyM-TM/3.0.0/).

## Preamble  
The aim of topic modelling is to extract latent topics from large corpora. GSDMM [1] assumes each document belongs to a single topic, which is a suitable assumption for some short texts. Given an initial number of topics, K, this algorithm clusters documents and extracts the topical structures present within the corpus. If K is set to a high value, then the model will also automatically learn the number of clusters.

[1]	Yin, J. and Wang, J., 2014, August. A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 233-242).

## Getting Started:

The package is available [online](https://pypi.org/project/GPyM-TM/) for use within Python 3 enviroments.

The installation can be performed through the use of a standard 'pip' install command, as provided below: 

`pip install GPyM-TM`

## Prerequisites:

The package has several dependencies, namely: 

* numpy
* random
* math
* pandas
* re
* nltk
* gensim
* scipy

# GSDMM

## Function and class description:

The class is named **GSDMM**, while the function itself is named **DMM**.

The function can take 6 possible arguments, two of which are required, and the remaining 4 being optional. 

### The required arguments are: 

* **corpus** - text file, which has been cleaned and loaded into Python. That is, the text should all be lowercase, all punctuation and numbers should have also been removed. 
* **nTopics** - the number of topics.

### The optional requirements are:

* **alpha**, **beta** - these are the distribution specific parameters.(**The defaults for both of these parameters are 0.1.**)
* **nTopWords** - number of top words per a topic.(**The default is 10.**)  
* **iters** - number of Gibbs sampler iterations.(**The default is 15.**)

## Output:

The function provides several components of output, namely:
* **psi** - topic x word matrix.
* **theta** - document x topic matrix.
* **topics** - the top words per topic. 
* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
* **Final k** - the final number of selected topics.
* **coherence** - the coherence score, which is a performance measure.
* **selected_theta**
* **selected_psi**

# GPM

## Function and class description:

The class is named **GPM**, while the function itself is named **GPM**.

The function can take 8 possible arguments, two of which are required, and the remaining 6 being optional. 

### The required arguments are: 

* **corpus** - text file, which has been cleaned and loaded into Python. That is, the text should all be lowercase, all punctuation and numbers should have also been removed. 
* **nTopics** - the number of topics.

### The optional requirements are:

* **alpha**, **beta** and **gam** - these are the distribution specific parameters.(**The defaults for these parameters are alpha = 0.001, beta = 0.001 and gam = 0.1 respectively.**)
* **nTopWords** - number of top words per a topic.(**The default is 10.**)  
* **iters** - number of Gibbs sampler iterations.(**The default is 15.**)
* **N** - this is a parameter used to normalize the document lengths, which is required for the Poisson model.

## Output:

The function provides several components of output, namely:
* **psi** - topic x word matrix.
* **theta** - document x topic matrix.
* **topics** - the top words per topic. 
* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
* **Final k** - the final number of selected topics.
* **coherence** - the coherence score, which is a performance measure.
* **selected_theta**
* **selected_psi**

# Example Usage:

A more comprehensive [tutorial](https://github.com/CAIR-ZA/GPyM_TM/blob/master/Tutorial.ipynb) is also available.

### Installation;

Run the following command within a Python command window:

`pip install GPym_TM`

### Implementation;

Import the package into the relevant python script, with the following: 

`from GSDMM import DMM`
`from GPM import GPM`

> Call the class:

#### Possible examples of calling the GSDMM function are as follows:

`data_DMM = GSDMM.DMM(corpus, nTopics)`

`data_DMM = GSDMM.DMM(corpus, nTopics, alpha = 0.25, beta = 0.15, nTopWords = 12, iters =5)`

#### Possible examples of calling the GPM function are as follows:

`data_GPM = GPM.GPM(corpus, nTopics)`

`data_GPM = GPM.GPM(corpus, nTopics, alpha = 0.002, beta = 0.03, gam = 0.06, nTopWords = 12, iters = 7, N = 8)`

### Results;

The output obtained for the Dirichlet multinomial mixture model appears as follows: 

![Post](/Images/Post.png)

While, the output obtained for the Poisson model appears as follows:

![poisson](/Images/poisson.png)

## Built With:

[Google Collab](https://colab.research.google.com/notebooks/intro.ipynb) - Web framework

[Python](https://www.python.org/) - Programming language of choice

[Pypi](https://pypi.org/) - Distribution

## Authors:

[Jocelyn Mazarura](https://github.com/jrmazarura/GPM)


## Co-Authors:

[Alta de Waal](https://github.com/altadewaal)

[Ricardo Marques](https://github.com/RicSalgado)


## License:

This project is licensed under the MIT License - see the LICENSE file for details.


## Acknowledgments:

University of Pretoria 
![Tuks Logo](/Images/UPlogohighres.jpg)


%package -n python3-GPyM-TM
Summary:	The following package enables users to perform text modelling
Provides:	python-GPyM-TM
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-GPyM-TM
# [GPyM_TM](https://github.com/jrmazarura/GPM)

**GPyM_TM** is a Python package to perform topic modelling, either through the use of a Dirichlet multinomial mixture model, or a Poisson model. Each of the above models is available within the package in a separate class, namely GSDMM utilizes the Dirichlet multinomial mixture model, while GPM makes use of the Poisson model to perform the text clustering respectively. The package is also available on [Pypi](https://pypi.org/project/GPyM-TM/3.0.0/).

## Preamble  
The aim of topic modelling is to extract latent topics from large corpora. GSDMM [1] assumes each document belongs to a single topic, which is a suitable assumption for some short texts. Given an initial number of topics, K, this algorithm clusters documents and extracts the topical structures present within the corpus. If K is set to a high value, then the model will also automatically learn the number of clusters.

[1]	Yin, J. and Wang, J., 2014, August. A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 233-242).

## Getting Started:

The package is available [online](https://pypi.org/project/GPyM-TM/) for use within Python 3 enviroments.

The installation can be performed through the use of a standard 'pip' install command, as provided below: 

`pip install GPyM-TM`

## Prerequisites:

The package has several dependencies, namely: 

* numpy
* random
* math
* pandas
* re
* nltk
* gensim
* scipy

# GSDMM

## Function and class description:

The class is named **GSDMM**, while the function itself is named **DMM**.

The function can take 6 possible arguments, two of which are required, and the remaining 4 being optional. 

### The required arguments are: 

* **corpus** - text file, which has been cleaned and loaded into Python. That is, the text should all be lowercase, all punctuation and numbers should have also been removed. 
* **nTopics** - the number of topics.

### The optional requirements are:

* **alpha**, **beta** - these are the distribution specific parameters.(**The defaults for both of these parameters are 0.1.**)
* **nTopWords** - number of top words per a topic.(**The default is 10.**)  
* **iters** - number of Gibbs sampler iterations.(**The default is 15.**)

## Output:

The function provides several components of output, namely:
* **psi** - topic x word matrix.
* **theta** - document x topic matrix.
* **topics** - the top words per topic. 
* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
* **Final k** - the final number of selected topics.
* **coherence** - the coherence score, which is a performance measure.
* **selected_theta**
* **selected_psi**

# GPM

## Function and class description:

The class is named **GPM**, while the function itself is named **GPM**.

The function can take 8 possible arguments, two of which are required, and the remaining 6 being optional. 

### The required arguments are: 

* **corpus** - text file, which has been cleaned and loaded into Python. That is, the text should all be lowercase, all punctuation and numbers should have also been removed. 
* **nTopics** - the number of topics.

### The optional requirements are:

* **alpha**, **beta** and **gam** - these are the distribution specific parameters.(**The defaults for these parameters are alpha = 0.001, beta = 0.001 and gam = 0.1 respectively.**)
* **nTopWords** - number of top words per a topic.(**The default is 10.**)  
* **iters** - number of Gibbs sampler iterations.(**The default is 15.**)
* **N** - this is a parameter used to normalize the document lengths, which is required for the Poisson model.

## Output:

The function provides several components of output, namely:
* **psi** - topic x word matrix.
* **theta** - document x topic matrix.
* **topics** - the top words per topic. 
* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
* **Final k** - the final number of selected topics.
* **coherence** - the coherence score, which is a performance measure.
* **selected_theta**
* **selected_psi**

# Example Usage:

A more comprehensive [tutorial](https://github.com/CAIR-ZA/GPyM_TM/blob/master/Tutorial.ipynb) is also available.

### Installation;

Run the following command within a Python command window:

`pip install GPym_TM`

### Implementation;

Import the package into the relevant python script, with the following: 

`from GSDMM import DMM`
`from GPM import GPM`

> Call the class:

#### Possible examples of calling the GSDMM function are as follows:

`data_DMM = GSDMM.DMM(corpus, nTopics)`

`data_DMM = GSDMM.DMM(corpus, nTopics, alpha = 0.25, beta = 0.15, nTopWords = 12, iters =5)`

#### Possible examples of calling the GPM function are as follows:

`data_GPM = GPM.GPM(corpus, nTopics)`

`data_GPM = GPM.GPM(corpus, nTopics, alpha = 0.002, beta = 0.03, gam = 0.06, nTopWords = 12, iters = 7, N = 8)`

### Results;

The output obtained for the Dirichlet multinomial mixture model appears as follows: 

![Post](/Images/Post.png)

While, the output obtained for the Poisson model appears as follows:

![poisson](/Images/poisson.png)

## Built With:

[Google Collab](https://colab.research.google.com/notebooks/intro.ipynb) - Web framework

[Python](https://www.python.org/) - Programming language of choice

[Pypi](https://pypi.org/) - Distribution

## Authors:

[Jocelyn Mazarura](https://github.com/jrmazarura/GPM)


## Co-Authors:

[Alta de Waal](https://github.com/altadewaal)

[Ricardo Marques](https://github.com/RicSalgado)


## License:

This project is licensed under the MIT License - see the LICENSE file for details.


## Acknowledgments:

University of Pretoria 
![Tuks Logo](/Images/UPlogohighres.jpg)


%package help
Summary:	Development documents and examples for GPyM-TM
Provides:	python3-GPyM-TM-doc
%description help
# [GPyM_TM](https://github.com/jrmazarura/GPM)

**GPyM_TM** is a Python package to perform topic modelling, either through the use of a Dirichlet multinomial mixture model, or a Poisson model. Each of the above models is available within the package in a separate class, namely GSDMM utilizes the Dirichlet multinomial mixture model, while GPM makes use of the Poisson model to perform the text clustering respectively. The package is also available on [Pypi](https://pypi.org/project/GPyM-TM/3.0.0/).

## Preamble  
The aim of topic modelling is to extract latent topics from large corpora. GSDMM [1] assumes each document belongs to a single topic, which is a suitable assumption for some short texts. Given an initial number of topics, K, this algorithm clusters documents and extracts the topical structures present within the corpus. If K is set to a high value, then the model will also automatically learn the number of clusters.

[1]	Yin, J. and Wang, J., 2014, August. A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 233-242).

## Getting Started:

The package is available [online](https://pypi.org/project/GPyM-TM/) for use within Python 3 enviroments.

The installation can be performed through the use of a standard 'pip' install command, as provided below: 

`pip install GPyM-TM`

## Prerequisites:

The package has several dependencies, namely: 

* numpy
* random
* math
* pandas
* re
* nltk
* gensim
* scipy

# GSDMM

## Function and class description:

The class is named **GSDMM**, while the function itself is named **DMM**.

The function can take 6 possible arguments, two of which are required, and the remaining 4 being optional. 

### The required arguments are: 

* **corpus** - text file, which has been cleaned and loaded into Python. That is, the text should all be lowercase, all punctuation and numbers should have also been removed. 
* **nTopics** - the number of topics.

### The optional requirements are:

* **alpha**, **beta** - these are the distribution specific parameters.(**The defaults for both of these parameters are 0.1.**)
* **nTopWords** - number of top words per a topic.(**The default is 10.**)  
* **iters** - number of Gibbs sampler iterations.(**The default is 15.**)

## Output:

The function provides several components of output, namely:
* **psi** - topic x word matrix.
* **theta** - document x topic matrix.
* **topics** - the top words per topic. 
* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
* **Final k** - the final number of selected topics.
* **coherence** - the coherence score, which is a performance measure.
* **selected_theta**
* **selected_psi**

# GPM

## Function and class description:

The class is named **GPM**, while the function itself is named **GPM**.

The function can take 8 possible arguments, two of which are required, and the remaining 6 being optional. 

### The required arguments are: 

* **corpus** - text file, which has been cleaned and loaded into Python. That is, the text should all be lowercase, all punctuation and numbers should have also been removed. 
* **nTopics** - the number of topics.

### The optional requirements are:

* **alpha**, **beta** and **gam** - these are the distribution specific parameters.(**The defaults for these parameters are alpha = 0.001, beta = 0.001 and gam = 0.1 respectively.**)
* **nTopWords** - number of top words per a topic.(**The default is 10.**)  
* **iters** - number of Gibbs sampler iterations.(**The default is 15.**)
* **N** - this is a parameter used to normalize the document lengths, which is required for the Poisson model.

## Output:

The function provides several components of output, namely:
* **psi** - topic x word matrix.
* **theta** - document x topic matrix.
* **topics** - the top words per topic. 
* **assignments** - the topic numbers of selected topics only, as well as the final topic assignments.
* **Final k** - the final number of selected topics.
* **coherence** - the coherence score, which is a performance measure.
* **selected_theta**
* **selected_psi**

# Example Usage:

A more comprehensive [tutorial](https://github.com/CAIR-ZA/GPyM_TM/blob/master/Tutorial.ipynb) is also available.

### Installation;

Run the following command within a Python command window:

`pip install GPym_TM`

### Implementation;

Import the package into the relevant python script, with the following: 

`from GSDMM import DMM`
`from GPM import GPM`

> Call the class:

#### Possible examples of calling the GSDMM function are as follows:

`data_DMM = GSDMM.DMM(corpus, nTopics)`

`data_DMM = GSDMM.DMM(corpus, nTopics, alpha = 0.25, beta = 0.15, nTopWords = 12, iters =5)`

#### Possible examples of calling the GPM function are as follows:

`data_GPM = GPM.GPM(corpus, nTopics)`

`data_GPM = GPM.GPM(corpus, nTopics, alpha = 0.002, beta = 0.03, gam = 0.06, nTopWords = 12, iters = 7, N = 8)`

### Results;

The output obtained for the Dirichlet multinomial mixture model appears as follows: 

![Post](/Images/Post.png)

While, the output obtained for the Poisson model appears as follows:

![poisson](/Images/poisson.png)

## Built With:

[Google Collab](https://colab.research.google.com/notebooks/intro.ipynb) - Web framework

[Python](https://www.python.org/) - Programming language of choice

[Pypi](https://pypi.org/) - Distribution

## Authors:

[Jocelyn Mazarura](https://github.com/jrmazarura/GPM)


## Co-Authors:

[Alta de Waal](https://github.com/altadewaal)

[Ricardo Marques](https://github.com/RicSalgado)


## License:

This project is licensed under the MIT License - see the LICENSE file for details.


## Acknowledgments:

University of Pretoria 
![Tuks Logo](/Images/UPlogohighres.jpg)


%prep
%autosetup -n GPyM-TM-3.0.1

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-GPyM-TM -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed May 17 2023 Python_Bot <Python_Bot@openeuler.org> - 3.0.1-1
- Package Spec generated