%global _empty_manifest_terminate_build 0
Name: python-pt-datasets
Version: 0.19.24
Release: 1
Summary: Library for loading PyTorch datasets and data loaders
License: AGPL-3.0-or-later
URL: https://pypi.org/project/pt-datasets/
Source0: https://mirrors.aliyun.com/pypi/web/packages/b4/55/5b154c662ab8bf210d931f899036f100470e8f23605f46d63d70667f27c2/pt_datasets-0.19.24.tar.gz
BuildArch: noarch
Requires: python3-numpy
Requires: python3-torchvision
Requires: python3-torch
Requires: python3-scikit-learn
Requires: python3-opencv-python
Requires: python3-nltk
Requires: python3-imbalanced-learn
Requires: python3-gdown
Requires: python3-spacy
Requires: python3-pymagnitude-lite
Requires: python3-numba
Requires: python3-umap-learn
Requires: python3-opentsne
%description
# PyTorch Datasets
[![PyPI version](https://badge.fury.io/py/pt-datasets.svg)](https://badge.fury.io/py/pt-datasets)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-3916/)
![](assets/term.png)
## Overview
This library provides easier and faster access to commonly used benchmark
datasets, loading them in a ready-to-use form for PyTorch models. It can also
encode the features of these datasets into a low-dimensional space using PCA,
t-SNE, or UMAP.
## Datasets
- MNIST
- Fashion-MNIST
- EMNIST-Balanced
- CIFAR10
- SVHN
- MalImg
- AG News
- IMDB
- Yelp
- 20 Newsgroups
- KMNIST
- Wisconsin Diagnostic Breast Cancer
- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net)
- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net)
_Note on COVID19 datasets: Models trained on these datasets are not intended for
direct clinical diagnosis. Please do not use model outputs for self-diagnosis;
seek help from your local health authorities instead._
## Usage
It is recommended to use a virtual environment to isolate the project dependencies.
```shell script
$ virtualenv env --python=python3 # we use python 3
$ source env/bin/activate # activate the environment
$ pip install pt-datasets # install the package
```
We can then use this package to load a dataset and create a ready-to-use data loader,
```python
from pt_datasets import load_dataset, create_dataloader
# load the training and test data
train_data, test_data = load_dataset(name="cifar10")
# create a data loader for the training data
train_loader = create_dataloader(
dataset=train_data, batch_size=64, shuffle=True, num_workers=1
)
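# ... define `model` here; it is not part of pt_datasets, and `fit()` is
# assumed to be a user-defined training method (torch.nn.Module has none)
# use the data loader for training
model.fit(train_loader, epochs=10)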
```
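To make the `batch_size` and `shuffle` arguments concrete, here is a minimal pure-Python sketch of how a shuffling data loader partitions sample indices into mini-batches. It is a conceptual illustration, not the implementation used by pt_datasets or PyTorch; the helper `shuffled_batches` is hypothetical, and it assumes the last partial batch is kept, as with PyTorch's `DataLoader` default `drop_last=False`.

```python
import random
from typing import List


def shuffled_batches(num_samples: int, batch_size: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices 0..num_samples-1 into shuffled mini-batches.

    Hypothetical helper for illustration only; it is not how pt_datasets
    or PyTorch implement data loading.
    """
    indices = list(range(num_samples))
    # shuffle=True: visit samples in a random order
    # (a real loader draws a new order each epoch)
    random.Random(seed).shuffle(indices)
    # batch_size: consecutive slices of the permutation; the last batch keeps
    # the remainder, mirroring drop_last=False
    return [
        indices[start:start + batch_size]
        for start in range(0, num_samples, batch_size)
    ]


# CIFAR10 has 50,000 training images: 781 full batches of 64, plus one of 16
batches = shuffled_batches(num_samples=50_000, batch_size=64)
print(len(batches), len(batches[-1]))  # 782 16
```

Every sample appears exactly once per epoch, so the number of batches is the ceiling of `num_samples / batch_size`.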
We can also encode the dataset features to a lower-dimensional space,
```python
import seaborn as sns
import matplotlib.pyplot as plt
from pt_datasets import load_dataset, encode_features
# load the training and test data
train_data, test_data = load_dataset(name="fashion_mnist")
# get the numpy array of the features
# the encoders can only accept np.ndarray types
train_features = train_data.data.numpy()
# flatten the tensors
train_features = train_features.reshape(
    train_features.shape[0], -1
)
# get the labels
train_labels = train_data.targets.numpy()
# get the class names
classes = train_data.classes
# encode training features using t-SNE
encoded_train_features = encode_features(
features=train_features,
seed=1024,
encoder="tsne"
)
# use seaborn styling
sns.set_style("darkgrid")
# scatter plot each feature w.r.t class
for index in range(len(classes)):
    plt.scatter(
        encoded_train_features[train_labels == index, 0],
        encoded_train_features[train_labels == index, 1],
        label=classes[index],
        edgecolors="black"
    )
plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
plt.show()
```
![](assets/tsne_fashion_mnist.png)
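For intuition about what encoding into a lower-dimensional space means, the following minimal sketch implements PCA, one of the listed encoders, with NumPy's SVD. It assumes flattened features in an `(n_samples, n_features)` array, as produced by the reshape step above. `pca_encode` is a hypothetical helper, not the library's `encode_features`, whose preprocessing and defaults may differ; random data stands in for Fashion-MNIST images.

```python
import numpy as np


def pca_encode(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row-wise samples onto their top principal components.

    Hypothetical PCA-via-SVD helper for illustration; pt_datasets'
    encode_features may use a different solver, scaling, or defaults.
    """
    # center each feature column at zero mean
    centered = features - features.mean(axis=0)
    # rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # coordinates of each sample along the first n_components directions
    return centered @ vt[:n_components].T


# stand-in for flattened Fashion-MNIST images: 100 samples of 28 x 28 = 784 pixels
rng = np.random.default_rng(1024)
toy_features = rng.random((100, 784))
encoded = pca_encode(toy_features, n_components=2)
print(encoded.shape)  # (100, 2)
```

PCA is linear and deterministic, which makes it a fast baseline; t-SNE and UMAP are nonlinear and often separate classes more clearly in 2-D plots like the one above, at a higher computational cost.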
## Citation
When using the Malware Image classification dataset (MalImg), kindly use the
following citations,
- BibTeX
```
@article{agarap2017towards,
title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification},
author={Agarap, Abien Fred},
journal={arXiv preprint arXiv:1801.00318},
year={2017}
}
```
- MLA
```
Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a
deep learning approach using support vector machine (SVM) for malware
classification." arXiv preprint arXiv:1801.00318 (2017).
```
If you use this library, kindly cite it as,
```
@misc{agarap2020pytorch,
author = "Abien Fred Agarap",
title = "{PyTorch} datasets",
howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}",
note = "Accessed: 20xx-xx-xx"
}
```
## License
```
PyTorch Datasets utility repository
Copyright (C) 2020-2023 Abien Fred Agarap
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
```
%package -n python3-pt-datasets
Summary: Library for loading PyTorch datasets and data loaders
Provides: python-pt-datasets
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-pt-datasets
# PyTorch Datasets
[![PyPI version](https://badge.fury.io/py/pt-datasets.svg)](https://badge.fury.io/py/pt-datasets)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-3916/)
![](assets/term.png)
## Overview
This library provides easier and faster access to commonly used benchmark
datasets, loading them in a ready-to-use form for PyTorch models. It can also
encode the features of these datasets into a low-dimensional space using PCA,
t-SNE, or UMAP.
## Datasets
- MNIST
- Fashion-MNIST
- EMNIST-Balanced
- CIFAR10
- SVHN
- MalImg
- AG News
- IMDB
- Yelp
- 20 Newsgroups
- KMNIST
- Wisconsin Diagnostic Breast Cancer
- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net)
- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net)
_Note on COVID19 datasets: Models trained on these datasets are not intended for
direct clinical diagnosis. Please do not use model outputs for self-diagnosis;
seek help from your local health authorities instead._
## Usage
It is recommended to use a virtual environment to isolate the project dependencies.
```shell script
$ virtualenv env --python=python3 # we use python 3
$ source env/bin/activate # activate the environment
$ pip install pt-datasets # install the package
```
We can then use this package to load a dataset and create a ready-to-use data loader,
```python
from pt_datasets import load_dataset, create_dataloader
# load the training and test data
train_data, test_data = load_dataset(name="cifar10")
# create a data loader for the training data
train_loader = create_dataloader(
dataset=train_data, batch_size=64, shuffle=True, num_workers=1
)
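# ... define `model` here; it is not part of pt_datasets, and `fit()` is
# assumed to be a user-defined training method (torch.nn.Module has none)
# use the data loader for training
model.fit(train_loader, epochs=10)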
```
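To make the `batch_size` and `shuffle` arguments concrete, here is a minimal pure-Python sketch of how a shuffling data loader partitions sample indices into mini-batches. It is a conceptual illustration, not the implementation used by pt_datasets or PyTorch; the helper `shuffled_batches` is hypothetical, and it assumes the last partial batch is kept, as with PyTorch's `DataLoader` default `drop_last=False`.

```python
import random
from typing import List


def shuffled_batches(num_samples: int, batch_size: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices 0..num_samples-1 into shuffled mini-batches.

    Hypothetical helper for illustration only; it is not how pt_datasets
    or PyTorch implement data loading.
    """
    indices = list(range(num_samples))
    # shuffle=True: visit samples in a random order
    # (a real loader draws a new order each epoch)
    random.Random(seed).shuffle(indices)
    # batch_size: consecutive slices of the permutation; the last batch keeps
    # the remainder, mirroring drop_last=False
    return [
        indices[start:start + batch_size]
        for start in range(0, num_samples, batch_size)
    ]


# CIFAR10 has 50,000 training images: 781 full batches of 64, plus one of 16
batches = shuffled_batches(num_samples=50_000, batch_size=64)
print(len(batches), len(batches[-1]))  # 782 16
```

Every sample appears exactly once per epoch, so the number of batches is the ceiling of `num_samples / batch_size`.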
We can also encode the dataset features to a lower-dimensional space,
```python
import seaborn as sns
import matplotlib.pyplot as plt
from pt_datasets import load_dataset, encode_features
# load the training and test data
train_data, test_data = load_dataset(name="fashion_mnist")
# get the numpy array of the features
# the encoders can only accept np.ndarray types
train_features = train_data.data.numpy()
# flatten the tensors
train_features = train_features.reshape(
    train_features.shape[0], -1
)
# get the labels
train_labels = train_data.targets.numpy()
# get the class names
classes = train_data.classes
# encode training features using t-SNE
encoded_train_features = encode_features(
features=train_features,
seed=1024,
encoder="tsne"
)
# use seaborn styling
sns.set_style("darkgrid")
# scatter plot each feature w.r.t class
for index in range(len(classes)):
    plt.scatter(
        encoded_train_features[train_labels == index, 0],
        encoded_train_features[train_labels == index, 1],
        label=classes[index],
        edgecolors="black"
    )
plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
plt.show()
```
![](assets/tsne_fashion_mnist.png)
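For intuition about what encoding into a lower-dimensional space means, the following minimal sketch implements PCA, one of the listed encoders, with NumPy's SVD. It assumes flattened features in an `(n_samples, n_features)` array, as produced by the reshape step above. `pca_encode` is a hypothetical helper, not the library's `encode_features`, whose preprocessing and defaults may differ; random data stands in for Fashion-MNIST images.

```python
import numpy as np


def pca_encode(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row-wise samples onto their top principal components.

    Hypothetical PCA-via-SVD helper for illustration; pt_datasets'
    encode_features may use a different solver, scaling, or defaults.
    """
    # center each feature column at zero mean
    centered = features - features.mean(axis=0)
    # rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # coordinates of each sample along the first n_components directions
    return centered @ vt[:n_components].T


# stand-in for flattened Fashion-MNIST images: 100 samples of 28 x 28 = 784 pixels
rng = np.random.default_rng(1024)
toy_features = rng.random((100, 784))
encoded = pca_encode(toy_features, n_components=2)
print(encoded.shape)  # (100, 2)
```

PCA is linear and deterministic, which makes it a fast baseline; t-SNE and UMAP are nonlinear and often separate classes more clearly in 2-D plots like the one above, at a higher computational cost.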
## Citation
When using the Malware Image classification dataset (MalImg), kindly use the
following citations,
- BibTeX
```
@article{agarap2017towards,
title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification},
author={Agarap, Abien Fred},
journal={arXiv preprint arXiv:1801.00318},
year={2017}
}
```
- MLA
```
Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a
deep learning approach using support vector machine (SVM) for malware
classification." arXiv preprint arXiv:1801.00318 (2017).
```
If you use this library, kindly cite it as,
```
@misc{agarap2020pytorch,
author = "Abien Fred Agarap",
title = "{PyTorch} datasets",
howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}",
note = "Accessed: 20xx-xx-xx"
}
```
## License
```
PyTorch Datasets utility repository
Copyright (C) 2020-2023 Abien Fred Agarap
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
```
%package help
Summary: Development documents and examples for pt-datasets
Provides: python3-pt-datasets-doc
%description help
# PyTorch Datasets
[![PyPI version](https://badge.fury.io/py/pt-datasets.svg)](https://badge.fury.io/py/pt-datasets)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-3916/)
![](assets/term.png)
## Overview
This library provides easier and faster access to commonly used benchmark
datasets, loading them in a ready-to-use form for PyTorch models. It can also
encode the features of these datasets into a low-dimensional space using PCA,
t-SNE, or UMAP.
## Datasets
- MNIST
- Fashion-MNIST
- EMNIST-Balanced
- CIFAR10
- SVHN
- MalImg
- AG News
- IMDB
- Yelp
- 20 Newsgroups
- KMNIST
- Wisconsin Diagnostic Breast Cancer
- [COVID19 binary classification](https://github.com/lindawangg/COVID-Net)
- [COVID19 multi-classification](https://github.com/lindawangg/COVID-Net)
_Note on COVID19 datasets: Models trained on these datasets are not intended for
direct clinical diagnosis. Please do not use model outputs for self-diagnosis;
seek help from your local health authorities instead._
## Usage
It is recommended to use a virtual environment to isolate the project dependencies.
```shell script
$ virtualenv env --python=python3 # we use python 3
$ source env/bin/activate # activate the environment
$ pip install pt-datasets # install the package
```
We can then use this package to load a dataset and create a ready-to-use data loader,
```python
from pt_datasets import load_dataset, create_dataloader
# load the training and test data
train_data, test_data = load_dataset(name="cifar10")
# create a data loader for the training data
train_loader = create_dataloader(
dataset=train_data, batch_size=64, shuffle=True, num_workers=1
)
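# ... define `model` here; it is not part of pt_datasets, and `fit()` is
# assumed to be a user-defined training method (torch.nn.Module has none)
# use the data loader for training
model.fit(train_loader, epochs=10)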
```
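To make the `batch_size` and `shuffle` arguments concrete, here is a minimal pure-Python sketch of how a shuffling data loader partitions sample indices into mini-batches. It is a conceptual illustration, not the implementation used by pt_datasets or PyTorch; the helper `shuffled_batches` is hypothetical, and it assumes the last partial batch is kept, as with PyTorch's `DataLoader` default `drop_last=False`.

```python
import random
from typing import List


def shuffled_batches(num_samples: int, batch_size: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices 0..num_samples-1 into shuffled mini-batches.

    Hypothetical helper for illustration only; it is not how pt_datasets
    or PyTorch implement data loading.
    """
    indices = list(range(num_samples))
    # shuffle=True: visit samples in a random order
    # (a real loader draws a new order each epoch)
    random.Random(seed).shuffle(indices)
    # batch_size: consecutive slices of the permutation; the last batch keeps
    # the remainder, mirroring drop_last=False
    return [
        indices[start:start + batch_size]
        for start in range(0, num_samples, batch_size)
    ]


# CIFAR10 has 50,000 training images: 781 full batches of 64, plus one of 16
batches = shuffled_batches(num_samples=50_000, batch_size=64)
print(len(batches), len(batches[-1]))  # 782 16
```

Every sample appears exactly once per epoch, so the number of batches is the ceiling of `num_samples / batch_size`.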
We can also encode the dataset features to a lower-dimensional space,
```python
import seaborn as sns
import matplotlib.pyplot as plt
from pt_datasets import load_dataset, encode_features
# load the training and test data
train_data, test_data = load_dataset(name="fashion_mnist")
# get the numpy array of the features
# the encoders can only accept np.ndarray types
train_features = train_data.data.numpy()
# flatten the tensors
train_features = train_features.reshape(
    train_features.shape[0], -1
)
# get the labels
train_labels = train_data.targets.numpy()
# get the class names
classes = train_data.classes
# encode training features using t-SNE
encoded_train_features = encode_features(
features=train_features,
seed=1024,
encoder="tsne"
)
# use seaborn styling
sns.set_style("darkgrid")
# scatter plot each feature w.r.t class
for index in range(len(classes)):
    plt.scatter(
        encoded_train_features[train_labels == index, 0],
        encoded_train_features[train_labels == index, 1],
        label=classes[index],
        edgecolors="black"
    )
plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
plt.show()
```
![](assets/tsne_fashion_mnist.png)
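For intuition about what encoding into a lower-dimensional space means, the following minimal sketch implements PCA, one of the listed encoders, with NumPy's SVD. It assumes flattened features in an `(n_samples, n_features)` array, as produced by the reshape step above. `pca_encode` is a hypothetical helper, not the library's `encode_features`, whose preprocessing and defaults may differ; random data stands in for Fashion-MNIST images.

```python
import numpy as np


def pca_encode(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row-wise samples onto their top principal components.

    Hypothetical PCA-via-SVD helper for illustration; pt_datasets'
    encode_features may use a different solver, scaling, or defaults.
    """
    # center each feature column at zero mean
    centered = features - features.mean(axis=0)
    # rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # coordinates of each sample along the first n_components directions
    return centered @ vt[:n_components].T


# stand-in for flattened Fashion-MNIST images: 100 samples of 28 x 28 = 784 pixels
rng = np.random.default_rng(1024)
toy_features = rng.random((100, 784))
encoded = pca_encode(toy_features, n_components=2)
print(encoded.shape)  # (100, 2)
```

PCA is linear and deterministic, which makes it a fast baseline; t-SNE and UMAP are nonlinear and often separate classes more clearly in 2-D plots like the one above, at a higher computational cost.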
## Citation
When using the Malware Image classification dataset (MalImg), kindly use the
following citations,
- BibTeX
```
@article{agarap2017towards,
title={Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification},
author={Agarap, Abien Fred},
journal={arXiv preprint arXiv:1801.00318},
year={2017}
}
```
- MLA
```
Agarap, Abien Fred. "Towards building an intelligent anti-malware system: a
deep learning approach using support vector machine (SVM) for malware
classification." arXiv preprint arXiv:1801.00318 (2017).
```
If you use this library, kindly cite it as,
```
@misc{agarap2020pytorch,
author = "Abien Fred Agarap",
title = "{PyTorch} datasets",
howpublished = "\url{https://gitlab.com/afagarap/pt-datasets}",
note = "Accessed: 20xx-xx-xx"
}
```
## License
```
PyTorch Datasets utility repository
Copyright (C) 2020-2023 Abien Fred Agarap
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
```
%prep
%autosetup -n pt_datasets-0.19.24
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-pt-datasets -f filelist.lst
%dir %{python3_sitelib}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Thu Jun 08 2023 Python_Bot - 0.19.24-1
- Package Spec generated