%global _empty_manifest_terminate_build 0
Name: python-cpt
Version: 1.3.3
Release: 1
Summary: Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction
License: MIT
URL: https://github.com/bluesheeptoken/CPT
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b3/ad/3260eba1ab749bef0a707bc6195e0386147da19e70f9577f74286874f2ce/cpt-1.3.3.tar.gz

%description
# CPT

[![Downloads](https://img.shields.io/pypi/dm/CPT)](https://pypi.org/project/cpt/) [![License](https://img.shields.io/pypi/l/cpt.svg)](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt

model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of compatibility with `sklearn`, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method. If needed, the model can be retrained with the same method; this adds new sequences to the model and does not remove the old ones.

### Multithreading

Predictions are launched with OpenMP multithreading by default. They can also be run in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the `OMP_NUM_THREADS` environment variable.

### Pickling

You can pickle the model to save it and load it later with the `pickle` library.

```python
from cpt.cpt import Cpt
import pickle

model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)
unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences found with the noise-reduction technique.

### Tuning

CPT has three meta-parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; you can find an example [here][3]. A minimal manual sketch is shown below.
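As a rough illustration, here is a minimal manual grid search over the three meta-parameters. The parameter names `split_length`, `noise_ratio` and `MBR` are taken from the project documentation and should be checked against the installed `Cpt` signature; the candidate values are purely illustrative.

```python
from itertools import product

from cpt.cpt import Cpt

train = [['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']]
# Hold out the last element of each sequence as the prediction target.
contexts = [sequence[:-1] for sequence in train]
targets = [sequence[-1] for sequence in train]

best_score, best_params = -1.0, None
# Candidate values are illustrative; the keyword names are assumed from the docs.
for split_length, noise_ratio, mbr in product([0, 5], [0.0, 0.2], [0, 5]):
    model = Cpt(split_length=split_length, noise_ratio=noise_ratio, MBR=mbr)
    model.fit(train)
    predictions = model.predict(contexts)
    score = sum(p == t for p, t in zip(predictions, targets)) / len(targets)
    if score > best_score:
        best_score, best_params = score, (split_length, noise_ratio, mbr)

print(best_params, best_score)
```

A real setup would score on sequences held out from training; the training data is reused here only to keep the sketch short.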
## Benchmark

The benchmark was run on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second. Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and therefore the training/testing time, using PageRank on the dataset. The study has been published in the IJIKM journal [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any loss of accuracy.

One of the co-authors of `CPT` has also published `subseq`, another algorithm for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%package -n python3-cpt
Summary: Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction
Provides: python-cpt
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
BuildRequires: python3-cffi
BuildRequires: gcc
BuildRequires: gdb

%description -n python3-cpt
# CPT

[![Downloads](https://img.shields.io/pypi/dm/CPT)](https://pypi.org/project/cpt/) [![License](https://img.shields.io/pypi/l/cpt.svg)](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt

model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of compatibility with `sklearn`, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method. If needed, the model can be retrained with the same method; this adds new sequences to the model and does not remove the old ones.

### Multithreading

Predictions are launched with OpenMP multithreading by default. They can also be run in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the `OMP_NUM_THREADS` environment variable, as shown in the sketch below.
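Here is a minimal sketch of both modes; the thread count of `4` is only an illustrative value.

```python
import os

# OpenMP reads OMP_NUM_THREADS when its runtime starts, so set it before importing cpt.
os.environ['OMP_NUM_THREADS'] = '4'

from cpt.cpt import Cpt

model = Cpt()
model.fit([['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']])

# Default: predictions run in parallel with OpenMP.
print(model.predict([['hello'], ['hello', 'this']]))

# Force single-threaded predictions.
print(model.predict([['hello'], ['hello', 'this']], multithread=False))
```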
### Pickling

You can pickle the model to save it and load it later with the `pickle` library.

```python
from cpt.cpt import Cpt
import pickle

model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)
unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences found with the noise-reduction technique.

### Tuning

CPT has three meta-parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; you can find an example [here][3].

## Benchmark

The benchmark was run on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second. Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and therefore the training/testing time, using PageRank on the dataset. The study has been published in the IJIKM journal [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any loss of accuracy.

One of the co-authors of `CPT` has also published `subseq`, another algorithm for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%package help
Summary: Development documents and examples for cpt
Provides: python3-cpt-doc

%description help
# CPT

[![Downloads](https://img.shields.io/pypi/dm/CPT)](https://pypi.org/project/cpt/) [![License](https://img.shields.io/pypi/l/cpt.svg)](https://github.com/bluesheeptoken/CPT/blob/master/LICENSE)

## What is it?

This project is a Cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.

CPT is a sequence prediction model. It is a highly explainable model, specialized in predicting the next element of a sequence over a finite alphabet.

This implementation is based on the following research papers:

- http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
- http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Installation

You can simply use `pip install cpt`.

## Simple example

You can test the model with the following code:

```python
from cpt.cpt import Cpt

model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']
```

For an example of compatibility with `sklearn`, check the [documentation][1].

## Features

### Train

The model can be trained with the `fit` method. If needed, the model can be retrained with the same method; this adds new sequences to the model and does not remove the old ones.

### Multithreading

Predictions are launched with OpenMP multithreading by default. They can also be run in a single thread with the option `multithread=False` in the `predict` method.

You can control the number of threads by setting the `OMP_NUM_THREADS` environment variable.

### Pickling

You can pickle the model to save it and load it later with the `pickle` library.

```python
from cpt.cpt import Cpt
import pickle

model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)
unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

### Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered `noise` (with a low presence in sequences) with `model.compute_noisy_items(noise_ratio)`.

You can retrieve trained sequences with `model.retrieve_sequence(id)`.

You can find similar sequences with `find_similar_sequences(sequence)`.

You cannot yet automatically retrieve all similar sequences found with the noise-reduction technique. A short usage sketch is shown below.
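Here is a minimal sketch of these methods on the toy data from the example above. The `0.2` noise ratio and the sequence id `0` are illustrative values, and `find_similar_sequences` is assumed to be called on the model like the other methods.

```python
from cpt.cpt import Cpt

model = Cpt()
model.fit([['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']])

# Items with a low presence in the training sequences, for the given noise ratio.
print(model.compute_noisy_items(0.2))

# Training sequences similar to the given sequence.
print(model.find_similar_sequences(['hello', 'this']))

# The training sequence stored under the given id.
print(model.retrieve_sequence(0))
```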
### Tuning

CPT has three meta-parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; you can find an example [here][3].

## Benchmark

The benchmark was run on the FIFA dataset; the data can be found on the [SPMF website][4].

Using multithreading, `CPT` was able to perform around 5000 predictions per second. Without multithreading, `CPT` predicted around 1650 sequences per second.

Details on the benchmark can be found [here](benchmark).

## Further reading

A study has been made on how to reduce the dataset size, and therefore the training/testing time, using PageRank on the dataset. The study has been published in the IJIKM journal [here][5]. An overall improvement of 10-40% in prediction time has been observed with this technique, without any loss of accuracy.

One of the co-authors of `CPT` has also published `subseq`, another algorithm for sequence prediction. An implementation can be found [here](https://github.com/bluesheeptoken/subseq).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
[5]: http://www.ijikm.org/Volume14/IJIKMv14p027-044Da5395.pdf

## Support

If you enjoy the project and wish to support me, a [buymeacoffee link](https://www.buymeacoffee.com/louisfrule) is available.

%prep
%autosetup -n cpt-1.3.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-cpt -f filelist.lst
%dir %{python3_sitearch}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue May 30 2023 Python_Bot - 1.3.3-1
- Package Spec generated