author    CoprDistGit <infra@openeuler.org>  2023-05-29 10:02:10 +0000
committer CoprDistGit <infra@openeuler.org>  2023-05-29 10:02:10 +0000
commit    5a9ad8f13076435dbc68f779bffc8642779638e0 (patch)
tree      f73874bd12508ff7a6237f8b6764ddf6dbda9dea
parent    af12cf16faa1edf669352c3bf661d75947f58d6d (diff)
automatic import of python-pt-datasets
-rw-r--r--  .gitignore              |   1 +
-rw-r--r--  python-pt-datasets.spec | 589 ++++++++++++++++++++++++++++++++++++
-rw-r--r--  sources                 |   1 +
3 files changed, 591 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index e69de29..f5fa271 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/pt_datasets-0.19.24.tar.gz
diff --git a/python-pt-datasets.spec b/python-pt-datasets.spec
new file mode 100644
index 0000000..3bad34b
--- /dev/null
+++ b/python-pt-datasets.spec
@@ -0,0 +1,589 @@
+%global _empty_manifest_terminate_build 0
+Name: python-pt-datasets
+Version: 0.19.24
+Release: 1
+Summary: Library for loading PyTorch datasets and data loaders
+License: AGPL-3.0-or-later
+URL: https://pypi.org/project/pt-datasets/
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b4/55/5b154c662ab8bf210d931f899036f100470e8f23605f46d63d70667f27c2/pt_datasets-0.19.24.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-torchvision
+Requires: python3-torch
+Requires: python3-scikit-learn
+Requires: python3-opencv-python
+Requires: python3-nltk
+Requires: python3-imbalanced-learn
+Requires: python3-gdown
+Requires: python3-spacy
+Requires: python3-pymagnitude-lite
+Requires: python3-numba
+Requires: python3-umap-learn
+Requires: python3-opentsne
+
+%description
+# PyTorch Datasets
+
+[![PyPI version](https://badge.fury.io/py/pt-datasets.svg)](https://badge.fury.io/py/pt-datasets)
+[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
+[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-3916/)
+
+![](assets/term.png)
+
+## Overview
+
+This repository provides easier and faster access to commonly used benchmark
+datasets. With it, one can load the datasets in a ready-to-use form for
+PyTorch models. It can also load low-dimensional encodings of these datasets,
+produced with PCA, t-SNE, or UMAP.
+
+## Datasets
+
+- MNIST
+- Fashion-MNIST
+- EMNIST-Balanced
+- CIFAR10
+- SVHN
+- MalImg
+- AG News
+- IMDB
+- Yelp
+- 20 Newsgroups
+- KMNIST
+- Wisconsin Diagnostic Breast Cancer
+- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net)
+- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net)
+
+_Note on the COVID19 datasets: training models on them is not intended to
+produce models for direct clinical diagnosis. Please do not use model outputs
+for self-diagnosis, and seek help from your local health authorities._
+
+## Usage
+
+It is recommended to use a virtual environment to isolate the project dependencies.
+
+```shell script
+$ virtualenv env --python=python3  # create a Python 3 virtual environment
+$ source env/bin/activate          # activate the virtual environment
+$ pip install pt-datasets          # install the package
+```
+
+We can then use this package for loading ready-to-use data loaders,
+
+```python
+from pt_datasets import load_dataset, create_dataloader
+
+# load the training and test data
+train_data, test_data = load_dataset(name="cifar10")
+
+# create a data loader for the training data
+train_loader = create_dataloader(
+ dataset=train_data, batch_size=64, shuffle=True, num_workers=1
+)
+
+...
+
+# use the data loader for training
+model.fit(train_loader, epochs=10)
+```
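+
+The `model.fit` call above assumes a model object that wraps its own training
+loop. For a plain `torch.nn.Module`, a minimal training loop over the same
+data loader might look like the sketch below; the network, loss, and optimizer
+are illustrative assumptions and not part of `pt_datasets`.
+
+```python
+import torch
+import torch.nn as nn
+
+# illustrative classifier for CIFAR10-sized inputs (3x32x32 images, 10 classes)
+model = nn.Sequential(
+    nn.Flatten(),
+    nn.Linear(3 * 32 * 32, 256),
+    nn.ReLU(),
+    nn.Linear(256, 10),
+)
+criterion = nn.CrossEntropyLoss()
+optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
+
+for epoch in range(10):
+    for features, labels in train_loader:
+        optimizer.zero_grad()
+        outputs = model(features.float())  # assumes image batches of shape (N, 3, 32, 32)
+        loss = criterion(outputs, labels)
+        loss.backward()
+        optimizer.step()
+```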
+
+We can also encode the dataset features to a lower-dimensional space,
+
+```python
+import seaborn as sns
+import matplotlib.pyplot as plt
+from pt_datasets import load_dataset, encode_features
+
+# load the training and test data
+train_data, test_data = load_dataset(name="fashion_mnist")
+
+# get the numpy array of the features
+# the encoders can only accept np.ndarray types
+train_features = train_data.data.numpy()
+
+# flatten the tensors
+train_features = train_features.reshape(
+ train_features.shape[0], -1
+)
+
+# get the labels
+train_labels = train_data.targets.numpy()
+
+# get the class names
+classes = train_data.classes
+
+# encode training features using t-SNE
+encoded_train_features = encode_features(
+ features=train_features,
+ seed=1024,
+ encoder="tsne"
+)
+
+# use seaborn styling
+sns.set_style("darkgrid")
+
+# scatter plot each feature w.r.t class
+for index in range(len(classes)):
+ plt.scatter(
+ encoded_train_features[train_labels == index, 0],
+ encoded_train_features[train_labels == index, 1],
+ label=classes[index],
+ edgecolors="black"
+ )
+plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
+plt.show()
+```
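+
+The overview lists PCA and UMAP alongside t-SNE. Assuming `encode_features`
+exposes them through the same `encoder` argument (an assumption based on the
+overview, not a documented guarantee), switching encoders is a one-argument
+change:
+
+```python
+# same call as above, swapping only the encoder name (assumed value: "umap")
+encoded_train_features = encode_features(
+    features=train_features,
+    seed=1024,
+    encoder="umap",
+)
+```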
+
+![](assets/tsne_fashion_mnist.png)
+
+## Citation
+
+When using the Malware Image (MalImg) classification dataset, kindly use the
+following citations,
+
+- BibTeX
+
+```
+@article{agarap2017towards,
+ title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification},
+ author={Agarap, Abien Fred},
+ journal={arXiv preprint arXiv:1801.00318},
+ year={2017}
+}
+```
+
+- MLA
+
+```
+Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a
+deep learning approach using support vector machine (svm) for malware
+classification." arXiv preprint arXiv:1801.00318 (2017).
+```
+
+If you use this library, kindly cite it as,
+
+```
+@misc{agarap2020pytorch,
+ author = "Abien Fred Agarap",
+ title = "{PyTorch} datasets",
+ howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}",
+ note = "Accessed: 20xx-xx-xx"
+}
+```
+
+## License
+
+```
+PyTorch Datasets utility repository
+Copyright (C) 2020-2023 Abien Fred Agarap
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as published
+by the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Affero General Public License for more details.
+
+You should have received a copy of the GNU Affero General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+```
+
+
+%package -n python3-pt-datasets
+Summary: Library for loading PyTorch datasets and data loaders
+Provides: python-pt-datasets
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-pt-datasets
+# PyTorch Datasets
+
+[![PyPI version](https://badge.fury.io/py/pt-datasets.svg)](https://badge.fury.io/py/pt-datasets)
+[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
+[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-3916/)
+
+![](assets/term.png)
+
+## Overview
+
+This repository provides easier and faster access to commonly used benchmark
+datasets. With it, one can load the datasets in a ready-to-use form for
+PyTorch models. It can also load low-dimensional encodings of these datasets,
+produced with PCA, t-SNE, or UMAP.
+
+## Datasets
+
+- MNIST
+- Fashion-MNIST
+- EMNIST-Balanced
+- CIFAR10
+- SVHN
+- MalImg
+- AG News
+- IMDB
+- Yelp
+- 20 Newsgroups
+- KMNIST
+- Wisconsin Diagnostic Breast Cancer
+- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net)
+- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net)
+
+_Note on the COVID19 datasets: training models on them is not intended to
+produce models for direct clinical diagnosis. Please do not use model outputs
+for self-diagnosis, and seek help from your local health authorities._
+
+## Usage
+
+It is recommended to use a virtual environment to isolate the project dependencies.
+
+```shell script
+$ virtualenv env --python=python3  # create a Python 3 virtual environment
+$ source env/bin/activate          # activate the virtual environment
+$ pip install pt-datasets          # install the package
+```
+
+We can then use this package for loading ready-to-use data loaders,
+
+```python
+from pt_datasets import load_dataset, create_dataloader
+
+# load the training and test data
+train_data, test_data = load_dataset(name="cifar10")
+
+# create a data loader for the training data
+train_loader = create_dataloader(
+ dataset=train_data, batch_size=64, shuffle=True, num_workers=1
+)
+
+...
+
+# use the data loader for training
+model.fit(train_loader, epochs=10)
+```
+
+We can also encode the dataset features to a lower-dimensional space,
+
+```python
+import seaborn as sns
+import matplotlib.pyplot as plt
+from pt_datasets import load_dataset, encode_features
+
+# load the training and test data
+train_data, test_data = load_dataset(name="fashion_mnist")
+
+# get the numpy array of the features
+# the encoders can only accept np.ndarray types
+train_features = train_data.data.numpy()
+
+# flatten the tensors
+train_features = train_features.reshape(
+ train_features.shape[0], -1
+)
+
+# get the labels
+train_labels = train_data.targets.numpy()
+
+# get the class names
+classes = train_data.classes
+
+# encode training features using t-SNE
+encoded_train_features = encode_features(
+ features=train_features,
+ seed=1024,
+ encoder="tsne"
+)
+
+# use seaborn styling
+sns.set_style("darkgrid")
+
+# scatter plot each feature w.r.t class
+for index in range(len(classes)):
+ plt.scatter(
+ encoded_train_features[train_labels == index, 0],
+ encoded_train_features[train_labels == index, 1],
+ label=classes[index],
+ edgecolors="black"
+ )
+plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
+plt.show()
+```
+
+![](assets/tsne_fashion_mnist.png)
+
+## Citation
+
+When using the Malware Image (MalImg) classification dataset, kindly use the
+following citations,
+
+- BibTeX
+
+```
+@article{agarap2017towards,
+ title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification},
+ author={Agarap, Abien Fred},
+ journal={arXiv preprint arXiv:1801.00318},
+ year={2017}
+}
+```
+
+- MLA
+
+```
+Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a
+deep learning approach using support vector machine (svm) for malware
+classification." arXiv preprint arXiv:1801.00318 (2017).
+```
+
+If you use this library, kindly cite it as,
+
+```
+@misc{agarap2020pytorch,
+ author = "Abien Fred Agarap",
+ title = "{PyTorch} datasets",
+ howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}",
+ note = "Accessed: 20xx-xx-xx"
+}
+```
+
+## License
+
+```
+PyTorch Datasets utility repository
+Copyright (C) 2020-2023 Abien Fred Agarap
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as published
+by the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Affero General Public License for more details.
+
+You should have received a copy of the GNU Affero General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+```
+
+
+%package help
+Summary: Development documents and examples for pt-datasets
+Provides: python3-pt-datasets-doc
+%description help
+# PyTorch Datasets
+
+[![PyPI version](https://badge.fury.io/py/pt-datasets.svg)](https://badge.fury.io/py/pt-datasets)
+[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
+[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-3916/)
+
+![](assets/term.png)
+
+## Overview
+
+This repository provides easier and faster access to commonly used benchmark
+datasets. With it, one can load the datasets in a ready-to-use form for
+PyTorch models. It can also load low-dimensional encodings of these datasets,
+produced with PCA, t-SNE, or UMAP.
+
+## Datasets
+
+- MNIST
+- Fashion-MNIST
+- EMNIST-Balanced
+- CIFAR10
+- SVHN
+- MalImg
+- AG News
+- IMDB
+- Yelp
+- 20 Newsgroups
+- KMNIST
+- Wisconsin Diagnostic Breast Cancer
+- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net)
+- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net)
+
+_Note on the COVID19 datasets: training models on them is not intended to
+produce models for direct clinical diagnosis. Please do not use model outputs
+for self-diagnosis, and seek help from your local health authorities._
+
+## Usage
+
+It is recommended to use a virtual environment to isolate the project dependencies.
+
+```shell script
+$ virtualenv env --python=python3  # create a Python 3 virtual environment
+$ source env/bin/activate          # activate the virtual environment
+$ pip install pt-datasets          # install the package
+```
+
+We can then use this package for loading ready-to-use data loaders,
+
+```python
+from pt_datasets import load_dataset, create_dataloader
+
+# load the training and test data
+train_data, test_data = load_dataset(name="cifar10")
+
+# create a data loader for the training data
+train_loader = create_dataloader(
+ dataset=train_data, batch_size=64, shuffle=True, num_workers=1
+)
+
+...
+
+# use the data loader for training
+model.fit(train_loader, epochs=10)
+```
+
+We can also encode the dataset features to a lower-dimensional space,
+
+```python
+import seaborn as sns
+import matplotlib.pyplot as plt
+from pt_datasets import load_dataset, encode_features
+
+# load the training and test data
+train_data, test_data = load_dataset(name="fashion_mnist")
+
+# get the numpy array of the features
+# the encoders can only accept np.ndarray types
+train_features = train_data.data.numpy()
+
+# flatten the tensors
+train_features = train_features.reshape(
+ train_features.shape[0], -1
+)
+
+# get the labels
+train_labels = train_data.targets.numpy()
+
+# get the class names
+classes = train_data.classes
+
+# encode training features using t-SNE
+encoded_train_features = encode_features(
+ features=train_features,
+ seed=1024,
+ encoder="tsne"
+)
+
+# use seaborn styling
+sns.set_style("darkgrid")
+
+# scatter plot each feature w.r.t class
+for index in range(len(classes)):
+ plt.scatter(
+ encoded_train_features[train_labels == index, 0],
+ encoded_train_features[train_labels == index, 1],
+ label=classes[index],
+ edgecolors="black"
+ )
+plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
+plt.show()
+```
+
+![](assets/tsne_fashion_mnist.png)
+
+## Citation
+
+When using the Malware Image (MalImg) classification dataset, kindly use the
+following citations,
+
+- BibTeX
+
+```
+@article{agarap2017towards,
+ title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification},
+ author={Agarap, Abien Fred},
+ journal={arXiv preprint arXiv:1801.00318},
+ year={2017}
+}
+```
+
+- MLA
+
+```
+Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a
+deep learning approach using support vector machine (svm) for malware
+classification." arXiv preprint arXiv:1801.00318 (2017).
+```
+
+If you use this library, kindly cite it as,
+
+```
+@misc{agarap2020pytorch,
+ author = "Abien Fred Agarap",
+ title = "{PyTorch} datasets",
+ howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}",
+ note = "Accessed: 20xx-xx-xx"
+}
+```
+
+## License
+
+```
+PyTorch Datasets utility repository
+Copyright (C) 2020-2023 Abien Fred Agarap
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as published
+by the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Affero General Public License for more details.
+
+You should have received a copy of the GNU Affero General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+```
+
+
+%prep
+%autosetup -n pt-datasets-0.19.24
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pt-datasets -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 0.19.24-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..f64f815
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+aa75c39a68daa86b4aacf69c6e4fa91d pt_datasets-0.19.24.tar.gz