| author | CoprDistGit <infra@openeuler.org> | 2023-05-29 10:02:10 +0000 |
|---|---|---|
| committer | CoprDistGit <infra@openeuler.org> | 2023-05-29 10:02:10 +0000 |
| commit | 5a9ad8f13076435dbc68f779bffc8642779638e0 (patch) | |
| tree | f73874bd12508ff7a6237f8b6764ddf6dbda9dea | |
| parent | af12cf16faa1edf669352c3bf661d75947f58d6d (diff) | |
automatic import of python-pt-datasets
| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-pt-datasets.spec | 589 |
| -rw-r--r-- | sources | 1 |
3 files changed, 591 insertions, 0 deletions
@@ -0,0 +1 @@ +/pt_datasets-0.19.24.tar.gz diff --git a/python-pt-datasets.spec b/python-pt-datasets.spec new file mode 100644 index 0000000..3bad34b --- /dev/null +++ b/python-pt-datasets.spec @@ -0,0 +1,589 @@ +%global _empty_manifest_terminate_build 0 +Name: python-pt-datasets +Version: 0.19.24 +Release: 1 +Summary: Library for loading PyTorch datasets and data loaders. +License: AGPL-3.0 License +URL: https://pypi.org/project/pt-datasets/ +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b4/55/5b154c662ab8bf210d931f899036f100470e8f23605f46d63d70667f27c2/pt_datasets-0.19.24.tar.gz +BuildArch: noarch + +Requires: python3-numpy +Requires: python3-torchvision +Requires: python3-torch +Requires: python3-scikit-learn +Requires: python3-opencv-python +Requires: python3-nltk +Requires: python3-imbalanced-learn +Requires: python3-gdown +Requires: python3-spacy +Requires: python3-pymagnitude-lite +Requires: python3-numba +Requires: python3-umap-learn +Requires: python3-opentsne + +%description +# PyTorch Datasets + +[](https://badge.fury.io/py/pt-datasets) +[](https://www.gnu.org/licenses/agpl-3.0) +[](https://www.python.org/downloads/release/python-3916/) + + + +## Overview + +This repository is meant for easier and faster access to commonly used +benchmark datasets. Using this repository, one can load the datasets in a +ready-to-use fashion for PyTorch models. Additionally, this can be used to load +the low-dimensional features of the aforementioned datasets, encoded using PCA, +t-SNE, or UMAP. + +## Datasets + +- MNIST +- Fashion-MNIST +- EMNIST-Balanced +- CIFAR10 +- SVHN +- MalImg +- AG News +- IMDB +- Yelp +- 20 Newsgroups +- KMNIST +- Wisconsin Diagnostic Breast Cancer +- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net) +- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net) + +_Note on COVID19 datasets: Training models on this is not intended to produce +models for direct clinical diagnosis. 
Please do not use the model output for +self-diagnosis, and seek help from your local health authorities._ + +## Usage + +It is recommended to use a virtual environment to isolate the project dependencies. + +```shell script +$ virtualenv env --python=python3 # we use python 3 +$ pip install pt-datasets # install the package +``` + +We can then use this package for loading ready-to-use data loaders, + +```python +from pt_datasets import load_dataset, create_dataloader + +# load the training and test data +train_data, test_data = load_dataset(name="cifar10") + +# create a data loader for the training data +train_loader = create_dataloader( + dataset=train_data, batch_size=64, shuffle=True, num_workers=1 +) + +... + +# use the data loader for training +model.fit(train_loader, epochs=10) +``` + +We can also encode the dataset features to a lower-dimensional space, + +```python +import seaborn as sns +import matplotlib.pyplot as plt +from pt_datasets import load_dataset, encode_features + +# load the training and test data +train_data, test_data = load_dataset(name="fashion_mnist") + +# get the numpy array of the features +# the encoders can only accept np.ndarray types +train_features = train_data.data.numpy() + +# flatten the tensors +train_features = train_features.reshape( + train_features.shape[0], -1 +) + +# get the labels +train_labels = train_data.targets.numpy() + +# get the class names +classes = train_data.classes + +# encode training features using t-SNE +encoded_train_features = encode_features( + features=train_features, + seed=1024, + encoder="tsne" +) + +# use seaborn styling +sns.set_style("darkgrid") + +# scatter plot each feature w.r.t class +for index in range(len(classes)): + plt.scatter( + encoded_train_features[train_labels == index, 0], + encoded_train_features[train_labels == index, 1], + label=classes[index], + edgecolors="black" + ) +plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5) +plt.show() +``` + + + +## Citation + 
+When using the Malware Image classification dataset, kindly use the following +citations, + +- BibTex + +``` +@article{agarap2017towards, + title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification}, + author={Agarap, Abien Fred}, + journal={arXiv preprint arXiv:1801.00318}, + year={2017} +} +``` + +- MLA + +``` +Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a +deep learning approach using support vector machine (svm) for malware +classification." arXiv preprint arXiv:1801.00318 (2017). +``` + +If you use this library, kindly cite it as, + +``` +@misc{agarap2020pytorch, + author = "Abien Fred Agarap", + title = "{PyTorch} datasets", + howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}", + note = "Accessed: 20xx-xx-xx" +} +``` + +## License + +``` +PyTorch Datasets utility repository +Copyright (C) 2020-2023 Abien Fred Agarap + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published +by the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see <https://www.gnu.org/licenses/>. +``` + + +%package -n python3-pt-datasets +Summary: Library for loading PyTorch datasets and data loaders. 
+Provides: python-pt-datasets +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-pt-datasets +# PyTorch Datasets + +[](https://badge.fury.io/py/pt-datasets) +[](https://www.gnu.org/licenses/agpl-3.0) +[](https://www.python.org/downloads/release/python-3916/) + + + +## Overview + +This repository is meant for easier and faster access to commonly used +benchmark datasets. Using this repository, one can load the datasets in a +ready-to-use fashion for PyTorch models. Additionally, this can be used to load +the low-dimensional features of the aforementioned datasets, encoded using PCA, +t-SNE, or UMAP. + +## Datasets + +- MNIST +- Fashion-MNIST +- EMNIST-Balanced +- CIFAR10 +- SVHN +- MalImg +- AG News +- IMDB +- Yelp +- 20 Newsgroups +- KMNIST +- Wisconsin Diagnostic Breast Cancer +- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net) +- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net) + +_Note on COVID19 datasets: Training models on this is not intended to produce +models for direct clinical diagnosis. Please do not use the model output for +self-diagnosis, and seek help from your local health authorities._ + +## Usage + +It is recommended to use a virtual environment to isolate the project dependencies. + +```shell script +$ virtualenv env --python=python3 # we use python 3 +$ pip install pt-datasets # install the package +``` + +We can then use this package for loading ready-to-use data loaders, + +```python +from pt_datasets import load_dataset, create_dataloader + +# load the training and test data +train_data, test_data = load_dataset(name="cifar10") + +# create a data loader for the training data +train_loader = create_dataloader( + dataset=train_data, batch_size=64, shuffle=True, num_workers=1 +) + +... 
+ +# use the data loader for training +model.fit(train_loader, epochs=10) +``` + +We can also encode the dataset features to a lower-dimensional space, + +```python +import seaborn as sns +import matplotlib.pyplot as plt +from pt_datasets import load_dataset, encode_features + +# load the training and test data +train_data, test_data = load_dataset(name="fashion_mnist") + +# get the numpy array of the features +# the encoders can only accept np.ndarray types +train_features = train_data.data.numpy() + +# flatten the tensors +train_features = train_features.reshape( + train_features.shape[0], -1 +) + +# get the labels +train_labels = train_data.targets.numpy() + +# get the class names +classes = train_data.classes + +# encode training features using t-SNE +encoded_train_features = encode_features( + features=train_features, + seed=1024, + encoder="tsne" +) + +# use seaborn styling +sns.set_style("darkgrid") + +# scatter plot each feature w.r.t class +for index in range(len(classes)): + plt.scatter( + encoded_train_features[train_labels == index, 0], + encoded_train_features[train_labels == index, 1], + label=classes[index], + edgecolors="black" + ) +plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5) +plt.show() +``` + + + +## Citation + +When using the Malware Image classification dataset, kindly use the following +citations, + +- BibTex + +``` +@article{agarap2017towards, + title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification}, + author={Agarap, Abien Fred}, + journal={arXiv preprint arXiv:1801.00318}, + year={2017} +} +``` + +- MLA + +``` +Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a +deep learning approach using support vector machine (svm) for malware +classification." arXiv preprint arXiv:1801.00318 (2017). 
+``` + +If you use this library, kindly cite it as, + +``` +@misc{agarap2020pytorch, + author = "Abien Fred Agarap", + title = "{PyTorch} datasets", + howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}", + note = "Accessed: 20xx-xx-xx" +} +``` + +## License + +``` +PyTorch Datasets utility repository +Copyright (C) 2020-2023 Abien Fred Agarap + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published +by the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see <https://www.gnu.org/licenses/>. +``` + + +%package help +Summary: Development documents and examples for pt-datasets +Provides: python3-pt-datasets-doc +%description help +# PyTorch Datasets + +[](https://badge.fury.io/py/pt-datasets) +[](https://www.gnu.org/licenses/agpl-3.0) +[](https://www.python.org/downloads/release/python-3916/) + + + +## Overview + +This repository is meant for easier and faster access to commonly used +benchmark datasets. Using this repository, one can load the datasets in a +ready-to-use fashion for PyTorch models. Additionally, this can be used to load +the low-dimensional features of the aforementioned datasets, encoded using PCA, +t-SNE, or UMAP. 
+ +## Datasets + +- MNIST +- Fashion-MNIST +- EMNIST-Balanced +- CIFAR10 +- SVHN +- MalImg +- AG News +- IMDB +- Yelp +- 20 Newsgroups +- KMNIST +- Wisconsin Diagnostic Breast Cancer +- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net) +- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net) + +_Note on COVID19 datasets: Training models on this is not intended to produce +models for direct clinical diagnosis. Please do not use the model output for +self-diagnosis, and seek help from your local health authorities._ + +## Usage + +It is recommended to use a virtual environment to isolate the project dependencies. + +```shell script +$ virtualenv env --python=python3 # we use python 3 +$ pip install pt-datasets # install the package +``` + +We can then use this package for loading ready-to-use data loaders, + +```python +from pt_datasets import load_dataset, create_dataloader + +# load the training and test data +train_data, test_data = load_dataset(name="cifar10") + +# create a data loader for the training data +train_loader = create_dataloader( + dataset=train_data, batch_size=64, shuffle=True, num_workers=1 +) + +... 
+ +# use the data loader for training +model.fit(train_loader, epochs=10) +``` + +We can also encode the dataset features to a lower-dimensional space, + +```python +import seaborn as sns +import matplotlib.pyplot as plt +from pt_datasets import load_dataset, encode_features + +# load the training and test data +train_data, test_data = load_dataset(name="fashion_mnist") + +# get the numpy array of the features +# the encoders can only accept np.ndarray types +train_features = train_data.data.numpy() + +# flatten the tensors +train_features = train_features.reshape( + train_features.shape[0], -1 +) + +# get the labels +train_labels = train_data.targets.numpy() + +# get the class names +classes = train_data.classes + +# encode training features using t-SNE +encoded_train_features = encode_features( + features=train_features, + seed=1024, + encoder="tsne" +) + +# use seaborn styling +sns.set_style("darkgrid") + +# scatter plot each feature w.r.t class +for index in range(len(classes)): + plt.scatter( + encoded_train_features[train_labels == index, 0], + encoded_train_features[train_labels == index, 1], + label=classes[index], + edgecolors="black" + ) +plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5) +plt.show() +``` + + + +## Citation + +When using the Malware Image classification dataset, kindly use the following +citations, + +- BibTex + +``` +@article{agarap2017towards, + title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification}, + author={Agarap, Abien Fred}, + journal={arXiv preprint arXiv:1801.00318}, + year={2017} +} +``` + +- MLA + +``` +Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a +deep learning approach using support vector machine (svm) for malware +classification." arXiv preprint arXiv:1801.00318 (2017). 
+``` + +If you use this library, kindly cite it as, + +``` +@misc{agarap2020pytorch, + author = "Abien Fred Agarap", + title = "{PyTorch} datasets", + howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}", + note = "Accessed: 20xx-xx-xx" +} +``` + +## License + +``` +PyTorch Datasets utility repository +Copyright (C) 2020-2023 Abien Fred Agarap + +This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Affero General Public License as published +by the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public License +along with this program. If not, see <https://www.gnu.org/licenses/>. +``` + + +%prep +%autosetup -n pt-datasets-0.19.24 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . 
+
+%files -n python3-pt-datasets -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 0.19.24-1
+- Package Spec generated

diff --git a/sources b/sources
@@ -0,0 +1 @@
+aa75c39a68daa86b4aacf69c6e4fa91d pt_datasets-0.19.24.tar.gz
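The `sources` file added in this commit follows the dist-git convention: one `<md5>  <filename>` line per source tarball, in the same format that `md5sum` emits. A minimal sketch of generating and verifying such a file with coreutils (the filename and contents below are illustrative demo values, not the real pt_datasets tarball from this commit):

```shell
# A dist-git "sources" file records the md5 checksum of each source tarball
# so the build system can fetch and verify it. The demo file below stands in
# for the real pt_datasets-0.19.24.tar.gz.
printf 'demo tarball contents\n' > pt_datasets-demo.tar.gz

# Record the checksum in md5sum's "<md5>  <filename>" format.
md5sum pt_datasets-demo.tar.gz > sources

# Verify the tarball against the recorded checksum (exits nonzero on mismatch).
md5sum -c sources
```

In practice the import tooling regenerates `sources` automatically when a new tarball is uploaded; the commands above only mirror the file format.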