field | value | date |
---|---|---|
author | CoprDistGit <infra@openeuler.org> | 2023-04-11 22:41:29 +0000 |
committer | CoprDistGit <infra@openeuler.org> | 2023-04-11 22:41:29 +0000 |
commit | 3fa3207cf2dfdecf46c2ccc1398917051b19b872 | |
tree | 6eb5157f9aa2e3d695f6825f185b0ec0248c8a2f | |
parent | ee246da9d23c0cf7fb8c41e9b5bc1203dffd3047 | |
automatic import of python-divik
-rw-r--r-- | .gitignore | 1 |
-rw-r--r-- | python-divik.spec | 569 |
-rw-r--r-- | sources | 1 |
3 files changed, 571 insertions, 0 deletions
@@ -0,0 +1 @@ +/divik-3.2.2.tar.gz diff --git a/python-divik.spec b/python-divik.spec new file mode 100644 index 0000000..cf76a71 --- /dev/null +++ b/python-divik.spec @@ -0,0 +1,569 @@ +%global _empty_manifest_terminate_build 0 +Name: python-divik +Version: 3.2.2 +Release: 1 +Summary: Divisive iK-means algorithm implementation +License: Apache-2.0 +URL: https://github.com/gmrukwa/divik +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/ee/5f/5955b3a724fa5d93a532c8d1af8535dd6377a4897c3e5f7fb1f352d8e3ba/divik-3.2.2.tar.gz + +Requires: python3-dask-distance +Requires: python3-dask[dataframe] +Requires: python3-gin-config +Requires: python3-h5py +Requires: python3-importlib-metadata +Requires: python3-joblib +Requires: python3-kneed +Requires: python3-matplotlib +Requires: python3-numpy +Requires: python3-pandas +Requires: python3-polyaxon +Requires: python3-scikit-image +Requires: python3-scikit-learn +Requires: python3-scipy +Requires: python3-tqdm + +%description +[](https://www.codefactor.io/repository/github/gmrukwa/divik) +[](https://codeclimate.com/github/gmrukwa/divik/maintainability) + + +[](https://divik.readthedocs.io/en/latest/?badge=latest) + +# divik + +Python implementation of Divisive iK-means (DiviK) algorithm. 
+ +## Tools within this package + +- Clustering at your command line with fit-clusters +- Set of algorithm implementations for unsupervised analyses + - Clustering + - DiviK - hands-free clustering method with built-in feature selection + - K-Means with Dunn method for selecting the number of clusters + - K-Means with GAP index for selecting the number of clusters + - Modular K-Means implementation with custom distance metrics and initializations + - Feature extraction + - PCA with knee-based components selection + - Locally Adjusted RBF Spectral Embedding + - Feature selection + - EXIMS + - Gaussian Mixture Model based data-driven feature selection + - High Abundance And Variance Selector - allows you to select highly variant features above noise level, based on GMM-decomposition + - Outlier based Selector + - Outlier Abundance And Variance Selector - allows you to select highly variant features above noise level, based on outlier detection + - Percentage based Selector - allows you to select highly variant features above noise level with your predefined thresholds for each + - Sampling + - StratifiedSampler - generates samples of fixed number of rows from given dataset + - UniformPCASampler - generates samples of random observations within boundaries of an original dataset, and preserving the rotation of the data + - UniformSampler - generates samples of random observations within boundaries of an original dataset + +## Installation + +### Docker + +The recommended way to use this software is through +[Docker](https://www.docker.com/). This is the most convenient way, if you want +to use `divik` application. 
+ +To install latest stable version use: + +```bash +docker pull gmrukwa/divik +``` + +### Python package + +Prerequisites for installation of base package: + +- Python 3.7 / 3.8 / 3.9 +- compiler capable of compiling the native C code and OpenMP support + +#### Installation of OpenMP for Ubuntu / Debian + +You should have it already installed with GCC compiler, but if somehow +not, try the following: + +```bash +sudo apt-get install libgomp1 +``` + +#### Installation of OpenMP for Mac + +OpenMP is available as part of LLVM. You may need to install it with conda: + +```bash +conda install -c conda-forge "compilers>=1.0.4,!=1.1.0" llvm-openmp +``` + +#### Installation of dependencied on Mac + +You may see messages that some dependencies are invalid for the platform. +It is a [known bug](https://github.com/actions/setup-python/issues/469), +with [a workaround](https://github.com/actions/setup-python/issues/469#issuecomment-1192522949). + +Use: + +```bash +SYSTEM_VERSION_COMPAT=0 pip install divik +``` + +#### DiviK Installation + +Having prerequisites installed, one can install latest base version of the +package: + +```bash +pip install divik +``` + +If you want to have compatibility with +[`gin-config`](https://github.com/google/gin-config), you can install +necessary extras with: + +```bash +pip install divik[gin] +``` + +**Note:** Remember about `\` before `[` and `]` in `zsh` shell. + +You can install all extras with: + +```bash +pip install divik[all] +``` + +## High-Volume Data Considerations + +If you are using DiviK to run the analysis that could fail to fit RAM of your +computer, consider disabling the default parallelism and switch to +[dask](https://dask.org/). It's easy to achieve through configuration: + +- set all parameters named `n_jobs` to `1`; +- set all parameters named `allow_dask` to `True`. 
+ +**Note:** Never set `n_jobs>1` and `allow_dask=True` at the same time, the +computations will freeze due to how `multiprocessing` and `dask` handle +parallelism. + +## Known Issues + +### Segmentation Fault + +It can happen if the he `gamred_native` package (part of `divik` package) was +compiled with different numpy ABI than scikit-learn. This could happen if you +used different set of compilers than the developers of the scikit-learn +package. + +In such a case, a handler is defined to display the stack trace. If the trace +comes from `_matlab_legacy.py`, the most probably this is the issue. + +To resolve the issue, consider following the installation instructions once +again. The exact versions get updated to avoid the issue. + +## Contributing + +Contribution guide will be developed soon. + +Format the code with: + +```bash +isort -m 3 --fgw 3 --tc . +black -t py36 . +``` + +## References + +This software is part of contribution made by [Data Mining Group of Silesian +University of Technology](http://www.zaed.polsl.pl/), rest of which is +published [here](https://github.com/ZAEDPolSl). + +- [Mrukwa, G. and Polanska, J., 2020. DiviK: Divisive intelligent K-means for +hands-free unsupervised clustering in biological big data. *arXiv preprint +arXiv:2009.10706.*][1] + +[1]: https://arxiv.org/abs/2009.10706 + + + +%package -n python3-divik +Summary: Divisive iK-means algorithm implementation +Provides: python-divik +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +BuildRequires: python3-cffi +BuildRequires: gcc +BuildRequires: gdb +%description -n python3-divik +[](https://www.codefactor.io/repository/github/gmrukwa/divik) +[](https://codeclimate.com/github/gmrukwa/divik/maintainability) + + +[](https://divik.readthedocs.io/en/latest/?badge=latest) + +# divik + +Python implementation of Divisive iK-means (DiviK) algorithm. 
+ +## Tools within this package + +- Clustering at your command line with fit-clusters +- Set of algorithm implementations for unsupervised analyses + - Clustering + - DiviK - hands-free clustering method with built-in feature selection + - K-Means with Dunn method for selecting the number of clusters + - K-Means with GAP index for selecting the number of clusters + - Modular K-Means implementation with custom distance metrics and initializations + - Feature extraction + - PCA with knee-based components selection + - Locally Adjusted RBF Spectral Embedding + - Feature selection + - EXIMS + - Gaussian Mixture Model based data-driven feature selection + - High Abundance And Variance Selector - allows you to select highly variant features above noise level, based on GMM-decomposition + - Outlier based Selector + - Outlier Abundance And Variance Selector - allows you to select highly variant features above noise level, based on outlier detection + - Percentage based Selector - allows you to select highly variant features above noise level with your predefined thresholds for each + - Sampling + - StratifiedSampler - generates samples of fixed number of rows from given dataset + - UniformPCASampler - generates samples of random observations within boundaries of an original dataset, and preserving the rotation of the data + - UniformSampler - generates samples of random observations within boundaries of an original dataset + +## Installation + +### Docker + +The recommended way to use this software is through +[Docker](https://www.docker.com/). This is the most convenient way, if you want +to use `divik` application. 
+ +To install latest stable version use: + +```bash +docker pull gmrukwa/divik +``` + +### Python package + +Prerequisites for installation of base package: + +- Python 3.7 / 3.8 / 3.9 +- compiler capable of compiling the native C code and OpenMP support + +#### Installation of OpenMP for Ubuntu / Debian + +You should have it already installed with GCC compiler, but if somehow +not, try the following: + +```bash +sudo apt-get install libgomp1 +``` + +#### Installation of OpenMP for Mac + +OpenMP is available as part of LLVM. You may need to install it with conda: + +```bash +conda install -c conda-forge "compilers>=1.0.4,!=1.1.0" llvm-openmp +``` + +#### Installation of dependencied on Mac + +You may see messages that some dependencies are invalid for the platform. +It is a [known bug](https://github.com/actions/setup-python/issues/469), +with [a workaround](https://github.com/actions/setup-python/issues/469#issuecomment-1192522949). + +Use: + +```bash +SYSTEM_VERSION_COMPAT=0 pip install divik +``` + +#### DiviK Installation + +Having prerequisites installed, one can install latest base version of the +package: + +```bash +pip install divik +``` + +If you want to have compatibility with +[`gin-config`](https://github.com/google/gin-config), you can install +necessary extras with: + +```bash +pip install divik[gin] +``` + +**Note:** Remember about `\` before `[` and `]` in `zsh` shell. + +You can install all extras with: + +```bash +pip install divik[all] +``` + +## High-Volume Data Considerations + +If you are using DiviK to run the analysis that could fail to fit RAM of your +computer, consider disabling the default parallelism and switch to +[dask](https://dask.org/). It's easy to achieve through configuration: + +- set all parameters named `n_jobs` to `1`; +- set all parameters named `allow_dask` to `True`. 
+ +**Note:** Never set `n_jobs>1` and `allow_dask=True` at the same time, the +computations will freeze due to how `multiprocessing` and `dask` handle +parallelism. + +## Known Issues + +### Segmentation Fault + +It can happen if the he `gamred_native` package (part of `divik` package) was +compiled with different numpy ABI than scikit-learn. This could happen if you +used different set of compilers than the developers of the scikit-learn +package. + +In such a case, a handler is defined to display the stack trace. If the trace +comes from `_matlab_legacy.py`, the most probably this is the issue. + +To resolve the issue, consider following the installation instructions once +again. The exact versions get updated to avoid the issue. + +## Contributing + +Contribution guide will be developed soon. + +Format the code with: + +```bash +isort -m 3 --fgw 3 --tc . +black -t py36 . +``` + +## References + +This software is part of contribution made by [Data Mining Group of Silesian +University of Technology](http://www.zaed.polsl.pl/), rest of which is +published [here](https://github.com/ZAEDPolSl). + +- [Mrukwa, G. and Polanska, J., 2020. DiviK: Divisive intelligent K-means for +hands-free unsupervised clustering in biological big data. *arXiv preprint +arXiv:2009.10706.*][1] + +[1]: https://arxiv.org/abs/2009.10706 + + + +%package help +Summary: Development documents and examples for divik +Provides: python3-divik-doc +%description help +[](https://www.codefactor.io/repository/github/gmrukwa/divik) +[](https://codeclimate.com/github/gmrukwa/divik/maintainability) + + +[](https://divik.readthedocs.io/en/latest/?badge=latest) + +# divik + +Python implementation of Divisive iK-means (DiviK) algorithm. 
+ +## Tools within this package + +- Clustering at your command line with fit-clusters +- Set of algorithm implementations for unsupervised analyses + - Clustering + - DiviK - hands-free clustering method with built-in feature selection + - K-Means with Dunn method for selecting the number of clusters + - K-Means with GAP index for selecting the number of clusters + - Modular K-Means implementation with custom distance metrics and initializations + - Feature extraction + - PCA with knee-based components selection + - Locally Adjusted RBF Spectral Embedding + - Feature selection + - EXIMS + - Gaussian Mixture Model based data-driven feature selection + - High Abundance And Variance Selector - allows you to select highly variant features above noise level, based on GMM-decomposition + - Outlier based Selector + - Outlier Abundance And Variance Selector - allows you to select highly variant features above noise level, based on outlier detection + - Percentage based Selector - allows you to select highly variant features above noise level with your predefined thresholds for each + - Sampling + - StratifiedSampler - generates samples of fixed number of rows from given dataset + - UniformPCASampler - generates samples of random observations within boundaries of an original dataset, and preserving the rotation of the data + - UniformSampler - generates samples of random observations within boundaries of an original dataset + +## Installation + +### Docker + +The recommended way to use this software is through +[Docker](https://www.docker.com/). This is the most convenient way, if you want +to use `divik` application. 
+ +To install latest stable version use: + +```bash +docker pull gmrukwa/divik +``` + +### Python package + +Prerequisites for installation of base package: + +- Python 3.7 / 3.8 / 3.9 +- compiler capable of compiling the native C code and OpenMP support + +#### Installation of OpenMP for Ubuntu / Debian + +You should have it already installed with GCC compiler, but if somehow +not, try the following: + +```bash +sudo apt-get install libgomp1 +``` + +#### Installation of OpenMP for Mac + +OpenMP is available as part of LLVM. You may need to install it with conda: + +```bash +conda install -c conda-forge "compilers>=1.0.4,!=1.1.0" llvm-openmp +``` + +#### Installation of dependencied on Mac + +You may see messages that some dependencies are invalid for the platform. +It is a [known bug](https://github.com/actions/setup-python/issues/469), +with [a workaround](https://github.com/actions/setup-python/issues/469#issuecomment-1192522949). + +Use: + +```bash +SYSTEM_VERSION_COMPAT=0 pip install divik +``` + +#### DiviK Installation + +Having prerequisites installed, one can install latest base version of the +package: + +```bash +pip install divik +``` + +If you want to have compatibility with +[`gin-config`](https://github.com/google/gin-config), you can install +necessary extras with: + +```bash +pip install divik[gin] +``` + +**Note:** Remember about `\` before `[` and `]` in `zsh` shell. + +You can install all extras with: + +```bash +pip install divik[all] +``` + +## High-Volume Data Considerations + +If you are using DiviK to run the analysis that could fail to fit RAM of your +computer, consider disabling the default parallelism and switch to +[dask](https://dask.org/). It's easy to achieve through configuration: + +- set all parameters named `n_jobs` to `1`; +- set all parameters named `allow_dask` to `True`. 
+ +**Note:** Never set `n_jobs>1` and `allow_dask=True` at the same time, the +computations will freeze due to how `multiprocessing` and `dask` handle +parallelism. + +## Known Issues + +### Segmentation Fault + +It can happen if the he `gamred_native` package (part of `divik` package) was +compiled with different numpy ABI than scikit-learn. This could happen if you +used different set of compilers than the developers of the scikit-learn +package. + +In such a case, a handler is defined to display the stack trace. If the trace +comes from `_matlab_legacy.py`, the most probably this is the issue. + +To resolve the issue, consider following the installation instructions once +again. The exact versions get updated to avoid the issue. + +## Contributing + +Contribution guide will be developed soon. + +Format the code with: + +```bash +isort -m 3 --fgw 3 --tc . +black -t py36 . +``` + +## References + +This software is part of contribution made by [Data Mining Group of Silesian +University of Technology](http://www.zaed.polsl.pl/), rest of which is +published [here](https://github.com/ZAEDPolSl). + +- [Mrukwa, G. and Polanska, J., 2020. DiviK: Divisive intelligent K-means for +hands-free unsupervised clustering in biological big data. 
*arXiv preprint
+arXiv:2009.10706.*][1]
+
+[1]: https://arxiv.org/abs/2009.10706
+
+
+
+%prep
+%autosetup -n divik-3.2.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-divik -f filelist.lst
+%dir %{python3_sitearch}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 3.2.2-1
+- Package Spec generated
@@ -0,0 +1 @@
+f662d419598db682fcc9f960c4eac15f divik-3.2.2.tar.gz
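
The last hunk adds a `sources` entry that pairs a 32-hex-digit digest with the tarball name. As a minimal sketch, assuming that digest is an MD5 checksum of `divik-3.2.2.tar.gz` (its length suggests MD5, but the commit does not say so), a downloaded tarball could be checked against the entry roughly as follows. The helper names `parse_sources_line` and `matches_sources_entry` are hypothetical and are not part of any openEuler or dist-git tooling.

```python
import hashlib


def parse_sources_line(line):
    """Split a '<digest> <filename>' line into (digest, filename).

    Assumption: one entry per line, digest first, whitespace-separated.
    """
    digest, filename = line.split()
    return digest.lower(), filename


def matches_sources_entry(line, payload):
    """Return True if the MD5 of `payload` (bytes) equals the digest in `line`.

    Assumption: the digest is MD5; swap in another hashlib algorithm
    if the repository uses a different one.
    """
    expected, _ = parse_sources_line(line)
    return hashlib.md5(payload).hexdigest() == expected


if __name__ == "__main__":
    entry = "f662d419598db682fcc9f960c4eac15f divik-3.2.2.tar.gz"
    _, tarball = parse_sources_line(entry)
    # Reading the real tarball is left to the caller, for example:
    #   with open(tarball, "rb") as fh:
    #       ok = matches_sources_entry(entry, fh.read())
    print("would verify:", tarball)
```

Reading the whole file into memory is acceptable for a small source tarball; for large archives, feeding the hash object in chunks would be the safer choice.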