author    | CoprDistGit <infra@openeuler.org> | 2023-05-15 04:18:41 +0000
committer | CoprDistGit <infra@openeuler.org> | 2023-05-15 04:18:41 +0000
commit    | 10272366ad9058a6177ce69ee798d583705ff072 (patch)
tree      | a63d7f9c38428e1aaf11e799e84b1149628d1efe
parent    | e6d86582422e66226549fac6c5bf0d630e5b7d71 (diff)
automatic import of python-cpt
-rw-r--r-- | .gitignore      |   1
-rw-r--r-- | python-cpt.spec | 419
-rw-r--r-- | sources         |   1
3 files changed, 421 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
/cpt-1.3.3.tar.gz

diff --git a/python-cpt.spec b/python-cpt.spec
new file mode 100644
index 0000000..3b2e978
--- /dev/null
+++ b/python-cpt.spec
@@ -0,0 +1,419 @@
%global _empty_manifest_terminate_build 0
Name:           python-cpt
Version:        1.3.3
Release:        1
Summary:        Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction
License:        MIT
URL:            https://github.com/bluesheeptoken/CPT
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/b3/ad/3260eba1ab749bef0a707bc6195e0386147da19e70f9577f74286874f2ce/cpt-1.3.3.tar.gz

%description
# CPT

[](https://pypi.org/project/cpt/)
[](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of the sklearn compatibility, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method.

If needed, the model can be retrained with the same method. It adds new sequences to the model and does not remove the old ones.

### Multithreading

By default, predictions are launched with multithreading via OpenMP.

The predictions can also be launched in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the environment variable `OMP_NUM_THREADS`.

### Pickling

You can pickle the model to save it, and load it later via the pickle library.

```python
from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered as `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences with the noise reduction technique.

### Tuning

CPT has 3 meta parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; an example can be found [here][3].

## Benchmark

The benchmark has been made on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second.

Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).
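(Editorial sketch, not part of the upstream README or the generated spec.) To relate the figures above to the multithreading options described earlier, the snippet below times the same `predict` call with and without `multithread=False`, using only the `fit`/`predict` API shown in this README. The toy training data, the 10,000-query batch, and the 4-thread cap are arbitrary assumptions, and the absolute numbers will differ from the FIFA benchmark.

```python
import os

# OpenMP reads OMP_NUM_THREADS when its runtime starts, so set it before
# importing cpt (general OpenMP behaviour; the cap of 4 threads is arbitrary).
os.environ.setdefault("OMP_NUM_THREADS", "4")

import time

from cpt.cpt import Cpt

model = Cpt()
model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']])

# Repeat a toy query many times to get a measurable duration.
queries = [['hello']] * 10_000

start = time.perf_counter()
model.predict(queries)                     # multithreaded (the default)
multi = time.perf_counter() - start

start = time.perf_counter()
model.predict(queries, multithread=False)  # single-threaded
single = time.perf_counter() - start

print(f"multithreaded: {multi:.2f}s, single-threaded: {single:.2f}s")
```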

## Further reading

A study has been made on how to reduce the dataset size, and thus the training / testing time, by using PageRank on the dataset.

The study has been published in the IJIKM review [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any accuracy loss.

One of the co-authors of `CPT` has also published an algorithm, `subseq`, for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%package -n python3-cpt
Summary:        Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction
Provides:       python-cpt
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip
BuildRequires:  python3-cffi
BuildRequires:  gcc
BuildRequires:  gdb
%description -n python3-cpt
# CPT

[](https://pypi.org/project/cpt/)
[](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of the sklearn compatibility, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method.

If needed, the model can be retrained with the same method. It adds new sequences to the model and does not remove the old ones.

### Multithreading

By default, predictions are launched with multithreading via OpenMP.

The predictions can also be launched in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the environment variable `OMP_NUM_THREADS`.

### Pickling

You can pickle the model to save it, and load it later via the pickle library.

```python
from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered as `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.
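(Editorial sketch, not part of the upstream README.) A minimal illustration of the two inspection methods just mentioned, reusing the toy training data from the earlier example; the `noise_ratio` value of 0.2 and the sequence id 0 are arbitrary assumptions, and the exact noise semantics are described in the documentation.

```python
from cpt.cpt import Cpt

model = Cpt()
model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']])

# Items with a low presence across the training sequences are reported as noise
# (0.2 is an arbitrary ratio chosen for this sketch).
print(model.compute_noisy_items(0.2))

# Training sequences are addressed by id; 0 is assumed to be the first sequence
# passed to fit, i.e. ['hello', 'world'].
print(model.retrieve_sequence(0))
```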

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences with the noise reduction technique.

### Tuning

CPT has 3 meta parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; an example can be found [here][3].

## Benchmark

The benchmark has been made on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second.

Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and thus the training / testing time, by using PageRank on the dataset.

The study has been published in the IJIKM review [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any accuracy loss.

One of the co-authors of `CPT` has also published an algorithm, `subseq`, for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%package help
Summary:        Development documents and examples for cpt
Provides:       python3-cpt-doc
%description help
# CPT

[](https://pypi.org/project/cpt/)
[](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of the sklearn compatibility, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method.

If needed, the model can be retrained with the same method. It adds new sequences to the model and does not remove the old ones.

### Multithreading

By default, predictions are launched with multithreading via OpenMP.

The predictions can also be launched in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the environment variable `OMP_NUM_THREADS`.

### Pickling

You can pickle the model to save it, and load it later via the pickle library.
```python
from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered as `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences with the noise reduction technique.

### Tuning

CPT has 3 meta parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; an example can be found [here][3].

## Benchmark

The benchmark has been made on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second.

Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and thus the training / testing time, by using PageRank on the dataset.

The study has been published in the IJIKM review [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any accuracy loss.

One of the co-authors of `CPT` has also published an algorithm, `subseq`, for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%prep
%autosetup -n cpt-1.3.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-cpt -f filelist.lst
%dir %{python3_sitearch}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 1.3.3-1
- Package Spec generated

diff --git a/sources b/sources
@@ -0,0 +1 @@
83b1686f65769ecc2dc8bf35ee3a8e56 cpt-1.3.3.tar.gz
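(Editorial sketch appended after the diff; it is not part of the imported spec or the sources file above.) The README's "Tuning" section points to `sklearn.model_selection` for tuning CPT's three meta parameters; the plain grid search below avoids any sklearn interop and relies only on the `fit`/`predict` API shown in the README. The constructor keywords `split_length`, `noise_ratio` and `MBR`, the candidate values, and the tiny hold-out split are assumptions made for the example, not values taken from this page.

```python
from itertools import product

from cpt.cpt import Cpt

train = [['hello', 'world'],
         ['hello', 'this', 'is', 'me'],
         ['hello', 'me']]

# Hypothetical hold-out split: predict the last element of each held-out sequence.
queries = [['hello', 'this', 'is'], ['hello']]
targets = ['me', 'world']

best_score, best_params = -1.0, None
for split_length, noise_ratio, mbr in product([0, 5], [0.0, 0.1], [0, 5]):
    # Assumed constructor signature; check the CPT documentation for the exact names.
    model = Cpt(split_length=split_length, noise_ratio=noise_ratio, MBR=mbr)
    model.fit(train)
    predictions = model.predict(queries)
    # Score = fraction of hold-out sequences whose next element was predicted correctly.
    score = sum(p == t for p, t in zip(predictions, targets)) / len(targets)
    if score > best_score:
        best_score, best_params = score, {'split_length': split_length,
                                          'noise_ratio': noise_ratio,
                                          'MBR': mbr}

print(best_params, best_score)
```

With real data you would replace the toy split with a proper train/validation split and a larger parameter grid, or switch to `sklearn.model_selection` helpers as suggested in the README.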