%global _empty_manifest_terminate_build 0
Name:           python-pt-datasets
Version:        0.19.24
Release:        1
Summary:        Library for loading PyTorch datasets and data loaders.
License:        AGPL-3.0 License
URL:            https://pypi.org/project/pt-datasets/
Source0:        https://mirrors.aliyun.com/pypi/web/packages/b4/55/5b154c662ab8bf210d931f899036f100470e8f23605f46d63d70667f27c2/pt_datasets-0.19.24.tar.gz
BuildArch:      noarch

Requires:       python3-numpy
Requires:       python3-torchvision
Requires:       python3-torch
Requires:       python3-scikit-learn
Requires:       python3-opencv-python
Requires:       python3-nltk
Requires:       python3-imbalanced-learn
Requires:       python3-gdown
Requires:       python3-spacy
Requires:       python3-pymagnitude-lite
Requires:       python3-numba
Requires:       python3-umap-learn
Requires:       python3-opentsne

%description
# PyTorch Datasets

[![PyPI version](https://badge.fury.io/py/pt-datasets.svg)](https://badge.fury.io/py/pt-datasets)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-3916/)

![](assets/term.png)

## Overview

This repository is meant for easier and faster access to commonly used
benchmark datasets. Using this repository, one can load the datasets in a
ready-to-use fashion for PyTorch models. Additionally, this can be used to load
the low-dimensional features of the aforementioned datasets, encoded using PCA,
t-SNE, or UMAP.

## Datasets

- MNIST
- Fashion-MNIST
- EMNIST-Balanced
- CIFAR10
- SVHN
- MalImg
- AG News
- IMDB
- Yelp
- 20 Newsgroups
- KMNIST
- Wisconsin Diagnostic Breast Cancer
- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net)
- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net)

_Note on COVID19 datasets: Training models on these datasets is not intended to
produce models for direct clinical diagnosis. Please do not use the model output
for self-diagnosis, and seek help from your local health authorities._

## Usage

It is recommended to use a virtual environment to isolate the project
dependencies.

```shell script
$ virtualenv env --python=python3  # we use python 3
$ source env/bin/activate  # activate the environment
$ pip install pt-datasets  # install the package
```

We can then use this package for loading ready-to-use data loaders,

```python
from pt_datasets import load_dataset, create_dataloader

# load the training and test data
train_data, test_data = load_dataset(name="cifar10")

# create a data loader for the training data
train_loader = create_dataloader(
    dataset=train_data, batch_size=64, shuffle=True, num_workers=1
)

...

# use the data loader for training
model.fit(train_loader, epochs=10)
```

We can also encode the dataset features to a lower-dimensional space,

```python
import seaborn as sns
import matplotlib.pyplot as plt
from pt_datasets import load_dataset, encode_features

# load the training and test data
train_data, test_data = load_dataset(name="fashion_mnist")

# get the numpy array of the features
# the encoders can only accept np.ndarray types
train_features = train_data.data.numpy()

# flatten the tensors
train_features = train_features.reshape(
    train_features.shape[0], -1
)

# get the labels
train_labels = train_data.targets.numpy()

# get the class names
classes = train_data.classes

# encode training features using t-SNE
encoded_train_features = encode_features(
    features=train_features,
    seed=1024,
    encoder="tsne"
)

# use seaborn styling
sns.set_style("darkgrid")

# scatter plot each feature w.r.t class
for index in range(len(classes)):
    plt.scatter(
        encoded_train_features[train_labels == index, 0],
        encoded_train_features[train_labels == index, 1],
        label=classes[index],
        edgecolors="black"
    )
plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
plt.show()
```

![](assets/tsne_fashion_mnist.png)

## Citation

When using the Malware Image classification dataset, kindly use the following
citations,

- BibTeX

```
@article{agarap2017towards,
    title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification},
    author={Agarap, Abien Fred},
    journal={arXiv preprint arXiv:1801.00318},
    year={2017}
}
```

- MLA

```
Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (svm) for malware classification." arXiv preprint arXiv:1801.00318 (2017).
```

If you use this library, kindly cite it as,

```
@misc{agarap2020pytorch,
    author       = "Abien Fred Agarap",
    title        = "{PyTorch} datasets",
    howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}",
    note         = "Accessed: 20xx-xx-xx"
}
```

## License

```
PyTorch Datasets utility repository
Copyright (C) 2020-2023 Abien Fred Agarap

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see .
```

%package -n python3-pt-datasets
Summary:        Library for loading PyTorch datasets and data loaders.
Provides:       python-pt-datasets
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-pt-datasets
# PyTorch Datasets

[![PyPI version](https://badge.fury.io/py/pt-datasets.svg)](https://badge.fury.io/py/pt-datasets)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-3916/)

![](assets/term.png)

## Overview

This repository is meant for easier and faster access to commonly used
benchmark datasets. Using this repository, one can load the datasets in a
ready-to-use fashion for PyTorch models. Additionally, this can be used to load
the low-dimensional features of the aforementioned datasets, encoded using PCA,
t-SNE, or UMAP.

## Datasets

- MNIST
- Fashion-MNIST
- EMNIST-Balanced
- CIFAR10
- SVHN
- MalImg
- AG News
- IMDB
- Yelp
- 20 Newsgroups
- KMNIST
- Wisconsin Diagnostic Breast Cancer
- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net)
- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net)

_Note on COVID19 datasets: Training models on these datasets is not intended to
produce models for direct clinical diagnosis. Please do not use the model output
for self-diagnosis, and seek help from your local health authorities._

## Usage

It is recommended to use a virtual environment to isolate the project
dependencies.

```shell script
$ virtualenv env --python=python3  # we use python 3
$ source env/bin/activate  # activate the environment
$ pip install pt-datasets  # install the package
```

We can then use this package for loading ready-to-use data loaders,

```python
from pt_datasets import load_dataset, create_dataloader

# load the training and test data
train_data, test_data = load_dataset(name="cifar10")

# create a data loader for the training data
train_loader = create_dataloader(
    dataset=train_data, batch_size=64, shuffle=True, num_workers=1
)

...

# use the data loader for training
model.fit(train_loader, epochs=10)
```

We can also encode the dataset features to a lower-dimensional space,

```python
import seaborn as sns
import matplotlib.pyplot as plt
from pt_datasets import load_dataset, encode_features

# load the training and test data
train_data, test_data = load_dataset(name="fashion_mnist")

# get the numpy array of the features
# the encoders can only accept np.ndarray types
train_features = train_data.data.numpy()

# flatten the tensors
train_features = train_features.reshape(
    train_features.shape[0], -1
)

# get the labels
train_labels = train_data.targets.numpy()

# get the class names
classes = train_data.classes

# encode training features using t-SNE
encoded_train_features = encode_features(
    features=train_features,
    seed=1024,
    encoder="tsne"
)

# use seaborn styling
sns.set_style("darkgrid")

# scatter plot each feature w.r.t class
for index in range(len(classes)):
    plt.scatter(
        encoded_train_features[train_labels == index, 0],
        encoded_train_features[train_labels == index, 1],
        label=classes[index],
        edgecolors="black"
    )
plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
plt.show()
```

![](assets/tsne_fashion_mnist.png)

## Citation

When using the Malware Image classification dataset, kindly use the following
citations,

- BibTeX

```
@article{agarap2017towards,
    title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification},
    author={Agarap, Abien Fred},
    journal={arXiv preprint arXiv:1801.00318},
    year={2017}
}
```

- MLA

```
Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (svm) for malware classification." arXiv preprint arXiv:1801.00318 (2017).
```

If you use this library, kindly cite it as,

```
@misc{agarap2020pytorch,
    author       = "Abien Fred Agarap",
    title        = "{PyTorch} datasets",
    howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}",
    note         = "Accessed: 20xx-xx-xx"
}
```

## License

```
PyTorch Datasets utility repository
Copyright (C) 2020-2023 Abien Fred Agarap

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see .
```

%package help
Summary:        Development documents and examples for pt-datasets
Provides:       python3-pt-datasets-doc

%description help
# PyTorch Datasets

[![PyPI version](https://badge.fury.io/py/pt-datasets.svg)](https://badge.fury.io/py/pt-datasets)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-3916/)

![](assets/term.png)

## Overview

This repository is meant for easier and faster access to commonly used
benchmark datasets. Using this repository, one can load the datasets in a
ready-to-use fashion for PyTorch models. Additionally, this can be used to load
the low-dimensional features of the aforementioned datasets, encoded using PCA,
t-SNE, or UMAP.

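
As a rough illustration of what a low-dimensional encoding means, the sketch
below projects a toy feature matrix onto its first principal component using
only the Python standard library. It does not call `pt_datasets`; the function
name `pca_project` and the sample `points` are hypothetical and exist only for
this example, and the library's own PCA encoder may be implemented differently.

```python
import math
import random


def pca_project(features, n_iter=200, seed=1024):
    """Project the rows of `features` onto their first principal component.

    Illustrative helper only (not part of pt_datasets).
    """
    n, d = len(features), len(features[0])
    # center each column so the covariance is measured around the mean
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    # sample covariance matrix of shape (d, d)
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # power iteration converges to the dominant eigenvector of the covariance
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(d)]
    for _ in range(n_iter):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # one-dimensional encoding: dot product of each centered row and the component
    encoded = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
    return encoded, vec


# toy data lying close to the line y = 2x
points = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
encoded, component = pca_project(points)
```

Each two-dimensional point is reduced to a single coordinate along the direction
of greatest variance. t-SNE and UMAP pursue the same goal with non-linear
methods, which is why they are usually preferred for visualization.
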
## Datasets

- MNIST
- Fashion-MNIST
- EMNIST-Balanced
- CIFAR10
- SVHN
- MalImg
- AG News
- IMDB
- Yelp
- 20 Newsgroups
- KMNIST
- Wisconsin Diagnostic Breast Cancer
- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net)
- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net)

_Note on COVID19 datasets: Training models on these datasets is not intended to
produce models for direct clinical diagnosis. Please do not use the model output
for self-diagnosis, and seek help from your local health authorities._

## Usage

It is recommended to use a virtual environment to isolate the project
dependencies.

```shell script
$ virtualenv env --python=python3  # we use python 3
$ source env/bin/activate  # activate the environment
$ pip install pt-datasets  # install the package
```

We can then use this package for loading ready-to-use data loaders,

```python
from pt_datasets import load_dataset, create_dataloader

# load the training and test data
train_data, test_data = load_dataset(name="cifar10")

# create a data loader for the training data
train_loader = create_dataloader(
    dataset=train_data, batch_size=64, shuffle=True, num_workers=1
)

...

# use the data loader for training
model.fit(train_loader, epochs=10)
```

We can also encode the dataset features to a lower-dimensional space,

```python
import seaborn as sns
import matplotlib.pyplot as plt
from pt_datasets import load_dataset, encode_features

# load the training and test data
train_data, test_data = load_dataset(name="fashion_mnist")

# get the numpy array of the features
# the encoders can only accept np.ndarray types
train_features = train_data.data.numpy()

# flatten the tensors
train_features = train_features.reshape(
    train_features.shape[0], -1
)

# get the labels
train_labels = train_data.targets.numpy()

# get the class names
classes = train_data.classes

# encode training features using t-SNE
encoded_train_features = encode_features(
    features=train_features,
    seed=1024,
    encoder="tsne"
)

# use seaborn styling
sns.set_style("darkgrid")

# scatter plot each feature w.r.t class
for index in range(len(classes)):
    plt.scatter(
        encoded_train_features[train_labels == index, 0],
        encoded_train_features[train_labels == index, 1],
        label=classes[index],
        edgecolors="black"
    )
plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
plt.show()
```

![](assets/tsne_fashion_mnist.png)

## Citation

When using the Malware Image classification dataset, kindly use the following
citations,

- BibTeX

```
@article{agarap2017towards,
    title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification},
    author={Agarap, Abien Fred},
    journal={arXiv preprint arXiv:1801.00318},
    year={2017}
}
```

- MLA

```
Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (svm) for malware classification." arXiv preprint arXiv:1801.00318 (2017).
```

If you use this library, kindly cite it as,

```
@misc{agarap2020pytorch,
    author       = "Abien Fred Agarap",
    title        = "{PyTorch} datasets",
    howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}",
    note         = "Accessed: 20xx-xx-xx"
}
```

## License

```
PyTorch Datasets utility repository
Copyright (C) 2020-2023 Abien Fred Agarap

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see .
```

%prep
%autosetup -n pt_datasets-0.19.24

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-pt-datasets -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Thu Jun 08 2023 Python_Bot - 0.19.24-1
- Package Spec generated