%global _empty_manifest_terminate_build 0
Name: python-cpt
Version: 1.3.3
Release: 1
Summary: Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction
License: MIT
URL: https://github.com/bluesheeptoken/CPT
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b3/ad/3260eba1ab749bef0a707bc6195e0386147da19e70f9577f74286874f2ce/cpt-1.3.3.tar.gz

%description
# CPT

[![Downloads](https://img.shields.io/pypi/dm/CPT)](https://pypi.org/project/cpt/) [![License](https://img.shields.io/pypi/l/cpt.svg)](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt

model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of compatibility with `sklearn`, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method. If needed, the model can be retrained with the same method; this adds new sequences to the model and does not remove the old ones.

### Multithreading

Predictions are launched with OpenMP multithreading by default. They can also be run in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the `OMP_NUM_THREADS` environment variable.

### Pickling

You can pickle the model to save it and load it later with the `pickle` library.

```python
from cpt.cpt import Cpt
import pickle

model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)
unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences found with the noise-reduction technique.

### Tuning

CPT has three meta-parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; you can find an example [here][3]. A minimal manual sketch is shown below.
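As a rough illustration, here is a minimal manual grid search over the three meta-parameters. The parameter names `split_length`, `noise_ratio` and `MBR` are taken from the project documentation and should be checked against the installed `Cpt` signature; the candidate values are purely illustrative.

```python
from itertools import product

from cpt.cpt import Cpt

train = [['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']]
# Hold out the last element of each sequence as the prediction target.
contexts = [sequence[:-1] for sequence in train]
targets = [sequence[-1] for sequence in train]

best_score, best_params = -1.0, None
# Candidate values are illustrative; the keyword names are assumed from the docs.
for split_length, noise_ratio, mbr in product([0, 5], [0.0, 0.2], [0, 5]):
    model = Cpt(split_length=split_length, noise_ratio=noise_ratio, MBR=mbr)
    model.fit(train)
    predictions = model.predict(contexts)
    score = sum(p == t for p, t in zip(predictions, targets)) / len(targets)
    if score > best_score:
        best_score, best_params = score, (split_length, noise_ratio, mbr)

print(best_params, best_score)
```

A real setup would score on sequences held out from training; the training data is reused here only to keep the sketch short.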
## Benchmark

The benchmark was run on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second. Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and therefore the training/testing time, using PageRank on the dataset. The study has been published in the IJIKM journal [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any loss of accuracy.

One of the co-authors of `CPT` has also published `subseq`, another algorithm for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%package -n python3-cpt
Summary: Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction
Provides: python-cpt
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
BuildRequires: python3-cffi
BuildRequires: gcc
BuildRequires: gdb

%description -n python3-cpt
# CPT

[![Downloads](https://img.shields.io/pypi/dm/CPT)](https://pypi.org/project/cpt/) [![License](https://img.shields.io/pypi/l/cpt.svg)](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt

model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of compatibility with `sklearn`, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method. If needed, the model can be retrained with the same method; this adds new sequences to the model and does not remove the old ones.

### Multithreading

Predictions are launched with OpenMP multithreading by default. They can also be run in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the `OMP_NUM_THREADS` environment variable, as shown in the sketch below.
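Here is a minimal sketch of both modes; the thread count of `4` is only an illustrative value.

```python
import os

# OpenMP reads OMP_NUM_THREADS when its runtime starts, so set it before importing cpt.
os.environ['OMP_NUM_THREADS'] = '4'

from cpt.cpt import Cpt

model = Cpt()
model.fit([['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']])

# Default: predictions run in parallel with OpenMP.
print(model.predict([['hello'], ['hello', 'this']]))

# Force single-threaded predictions.
print(model.predict([['hello'], ['hello', 'this']], multithread=False))
```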
### Pickling

You can pickle the model to save it and load it later with the `pickle` library.

```python
from cpt.cpt import Cpt
import pickle

model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)
unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences found with the noise-reduction technique.

### Tuning

CPT has three meta-parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; you can find an example [here][3].

## Benchmark

The benchmark was run on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second. Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and therefore the training/testing time, using PageRank on the dataset. The study has been published in the IJIKM journal [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any loss of accuracy.

One of the co-authors of `CPT` has also published `subseq`, another algorithm for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%package help
Summary: Development documents and examples for cpt
Provides: python3-cpt-doc

%description help
# CPT

[![Downloads](https://img.shields.io/pypi/dm/CPT)](https://pypi.org/project/cpt/) [![License](https://img.shields.io/pypi/l/cpt.svg)](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt

model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of compatibility with `sklearn`, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method. If needed, the model can be retrained with the same method; this adds new sequences to the model and does not remove the old ones.

### Multithreading

Predictions are launched with OpenMP multithreading by default. They can also be run in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the `OMP_NUM_THREADS` environment variable.

### Pickling

You can pickle the model to save it and load it later with the `pickle` library.

```python
from cpt.cpt import Cpt
import pickle

model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)
unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences found with the noise-reduction technique. A short usage sketch is shown below.
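Here is a minimal sketch of these methods on the toy data from the example above. The `0.2` noise ratio and the sequence id `0` are illustrative values, and `find_similar_sequences` is assumed to be called on the model like the other methods.

```python
from cpt.cpt import Cpt

model = Cpt()
model.fit([['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']])

# Items with a low presence in the training sequences, for the given noise ratio.
print(model.compute_noisy_items(0.2))

# Training sequences similar to the given sequence.
print(model.find_similar_sequences(['hello', 'this']))

# The training sequence stored under the given id.
print(model.retrieve_sequence(0))
```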
### Tuning

CPT has three meta-parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; you can find an example [here][3].

## Benchmark

The benchmark was run on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second. Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and therefore the training/testing time, using PageRank on the dataset. The study has been published in the IJIKM journal [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any loss of accuracy.

One of the co-authors of `CPT` has also published `subseq`, another algorithm for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%prep
%autosetup -n cpt-1.3.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-cpt -f filelist.lst
%dir %{python3_sitearch}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue May 30 2023 Python_Bot - 1.3.3-1
- Package Spec generated