author    | CoprDistGit <infra@openeuler.org> | 2023-05-15 04:18:41 +0000
committer | CoprDistGit <infra@openeuler.org> | 2023-05-15 04:18:41 +0000
commit    | 10272366ad9058a6177ce69ee798d583705ff072 (patch)
tree      | a63d7f9c38428e1aaf11e799e84b1149628d1efe
parent    | e6d86582422e66226549fac6c5bf0d630e5b7d71 (diff)
automatic import of python-cpt
-rw-r--r-- | .gitignore      |   1
-rw-r--r-- | python-cpt.spec | 419
-rw-r--r-- | sources         |   1
3 files changed, 421 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
/cpt-1.3.3.tar.gz

diff --git a/python-cpt.spec b/python-cpt.spec
new file mode 100644
index 0000000..3b2e978
--- /dev/null
+++ b/python-cpt.spec
@@ -0,0 +1,419 @@
%global _empty_manifest_terminate_build 0
Name:           python-cpt
Version:        1.3.3
Release:        1
Summary:        Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction
License:        MIT
URL:            https://github.com/bluesheeptoken/CPT
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/b3/ad/3260eba1ab749bef0a707bc6195e0386147da19e70f9577f74286874f2ce/cpt-1.3.3.tar.gz

%description
# CPT

[](https://pypi.org/project/cpt/)
[](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of the sklearn compatibility, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method.

If needed, the model can be retrained with the same method. It adds new sequences to the model and does not remove the old ones.

### Multithreading

By default, predictions are launched with multithreading via OpenMP.

The predictions can also be launched in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the environment variable `OMP_NUM_THREADS`.

### Pickling

You can pickle the model to save it, and load it later via the pickle library.

```python
from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered as `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences with the noise reduction technique.

### Tuning

CPT has 3 meta parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; an example can be found [here][3].

## Benchmark

The benchmark has been made on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second.

Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).
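(Editorial sketch, not part of the upstream README or the generated spec.) To relate the figures above to the multithreading options described earlier, the snippet below times the same `predict` call with and without `multithread=False`, using only the `fit`/`predict` API shown in this README. The toy training data, the 10,000-query batch, and the 4-thread cap are arbitrary assumptions, and the absolute numbers will differ from the FIFA benchmark.

```python
import os

# OpenMP reads OMP_NUM_THREADS when its runtime starts, so set it before
# importing cpt (general OpenMP behaviour; the cap of 4 threads is arbitrary).
os.environ.setdefault("OMP_NUM_THREADS", "4")

import time

from cpt.cpt import Cpt

model = Cpt()
model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']])

# Repeat a toy query many times to get a measurable duration.
queries = [['hello']] * 10_000

start = time.perf_counter()
model.predict(queries)                     # multithreaded (the default)
multi = time.perf_counter() - start

start = time.perf_counter()
model.predict(queries, multithread=False)  # single-threaded
single = time.perf_counter() - start

print(f"multithreaded: {multi:.2f}s, single-threaded: {single:.2f}s")
```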

## Further reading

A study has been made on how to reduce the dataset size, and thus the training / testing time, by using PageRank on the dataset.

The study has been published in the IJIKM review [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any accuracy loss.

One of the co-authors of `CPT` has also published an algorithm, `subseq`, for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%package -n python3-cpt
Summary:        Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction
Provides:       python-cpt
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip
BuildRequires:  python3-cffi
BuildRequires:  gcc
BuildRequires:  gdb
%description -n python3-cpt
# CPT

[](https://pypi.org/project/cpt/)
[](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of the sklearn compatibility, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method.

If needed, the model can be retrained with the same method. It adds new sequences to the model and does not remove the old ones.

### Multithreading

By default, predictions are launched with multithreading via OpenMP.

The predictions can also be launched in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the environment variable `OMP_NUM_THREADS`.

### Pickling

You can pickle the model to save it, and load it later via the pickle library.

```python
from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered as `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.
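(Editorial sketch, not part of the upstream README.) A minimal illustration of the two inspection methods just mentioned, reusing the toy training data from the earlier example; the `noise_ratio` value of 0.2 and the sequence id 0 are arbitrary assumptions, and the exact noise semantics are described in the documentation.

```python
from cpt.cpt import Cpt

model = Cpt()
model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']])

# Items with a low presence across the training sequences are reported as noise
# (0.2 is an arbitrary ratio chosen for this sketch).
print(model.compute_noisy_items(0.2))

# Training sequences are addressed by id; 0 is assumed to be the first sequence
# passed to fit, i.e. ['hello', 'world'].
print(model.retrieve_sequence(0))
```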

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences with the noise reduction technique.

### Tuning

CPT has 3 meta parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; an example can be found [here][3].

## Benchmark

The benchmark has been made on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second.

Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and thus the training / testing time, by using PageRank on the dataset.

The study has been published in the IJIKM review [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any accuracy loss.

One of the co-authors of `CPT` has also published an algorithm, `subseq`, for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%package help
Summary:        Development documents and examples for cpt
Provides:       python3-cpt-doc
%description help
# CPT

[](https://pypi.org/project/cpt/)
[](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of the sklearn compatibility, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method.

If needed, the model can be retrained with the same method. It adds new sequences to the model and does not remove the old ones.

### Multithreading

By default, predictions are launched with multithreading via OpenMP.

The predictions can also be launched in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the environment variable `OMP_NUM_THREADS`.

### Pickling

You can pickle the model to save it, and load it later via the pickle library.
```python
from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered as `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences with the noise reduction technique.

### Tuning

CPT has 3 meta parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; an example can be found [here][3].

## Benchmark

The benchmark has been made on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second.

Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and thus the training / testing time, by using PageRank on the dataset.

The study has been published in the IJIKM review [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any accuracy loss.

One of the co-authors of `CPT` has also published an algorithm, `subseq`, for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%prep
%autosetup -n cpt-1.3.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-cpt -f filelist.lst
%dir %{python3_sitearch}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 1.3.3-1
- Package Spec generated

diff --git a/sources b/sources
@@ -0,0 +1 @@
83b1686f65769ecc2dc8bf35ee3a8e56 cpt-1.3.3.tar.gz
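(Editorial sketch appended after the diff; it is not part of the imported spec or the sources file above.) The README's "Tuning" section points to `sklearn.model_selection` for tuning CPT's three meta parameters; the plain grid search below avoids any sklearn interop and relies only on the `fit`/`predict` API shown in the README. The constructor keywords `split_length`, `noise_ratio` and `MBR`, the candidate values, and the tiny hold-out split are assumptions made for the example, not values taken from this page.

```python
from itertools import product

from cpt.cpt import Cpt

train = [['hello', 'world'],
         ['hello', 'this', 'is', 'me'],
         ['hello', 'me']]

# Hypothetical hold-out split: predict the last element of each held-out sequence.
queries = [['hello', 'this', 'is'], ['hello']]
targets = ['me', 'world']

best_score, best_params = -1.0, None
for split_length, noise_ratio, mbr in product([0, 5], [0.0, 0.1], [0, 5]):
    # Assumed constructor signature; check the CPT documentation for the exact names.
    model = Cpt(split_length=split_length, noise_ratio=noise_ratio, MBR=mbr)
    model.fit(train)
    predictions = model.predict(queries)
    # Score = fraction of hold-out sequences whose next element was predicted correctly.
    score = sum(p == t for p, t in zip(predictions, targets)) / len(targets)
    if score > best_score:
        best_score, best_params = score, {'split_length': split_length,
                                          'noise_ratio': noise_ratio,
                                          'MBR': mbr}

print(best_params, best_score)
```

With real data you would replace the toy split with a proper train/validation split and a larger parameter grid, or switch to `sklearn.model_selection` helpers as suggested in the README.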