Diffstat (limited to 'python-split-folders.spec')
-rw-r--r-- python-split-folders.spec 484
1 files changed, 484 insertions, 0 deletions
diff --git a/python-split-folders.spec b/python-split-folders.spec
new file mode 100644
index 0000000..3f9c931
--- /dev/null
+++ b/python-split-folders.spec
@@ -0,0 +1,484 @@
+%global _empty_manifest_terminate_build 0
+Name: python-split-folders
+Version: 0.5.1
+Release: 1
+Summary: Split folders with files (e.g. images) into training, validation and test (dataset) folders.
+License: MIT
+URL: https://github.com/jfilter/split-folders
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/a7/4c/32d2d49b82ea5baf0ff1a55de88c7fb8a0bf2aab02763c8501b2a51bf55f/split_folders-0.5.1.tar.gz
+BuildArch: noarch
+
+Requires: python3-tqdm
+
+%description
+# `split-folders` [![Build Status](https://img.shields.io/github/workflow/status/jfilter/split-folders/Test)](https://github.com/jfilter/split-folders/actions/workflows/test.yml) [![PyPI](https://img.shields.io/pypi/v/split-folders.svg)](https://pypi.org/project/split-folders/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/split-folders.svg)](https://pypi.org/project/split-folders/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/split-folders)](https://pypistats.org/packages/split-folders)
+
+Split folders with files (e.g. images) into **train**, **validation** and **test** (dataset) folders.
+
+The input folder should have the following format:
+
+```
+input/
+ class1/
+ img1.jpg
+ img2.jpg
+ ...
+ class2/
+ imgWhatever.jpg
+ ...
+ ...
+```
+
+In order to give you this:
+
+```
+output/
+ train/
+ class1/
+ img1.jpg
+ ...
+ class2/
+ imga.jpg
+ ...
+ val/
+ class1/
+ img2.jpg
+ ...
+ class2/
+ imgb.jpg
+ ...
+ test/
+ class1/
+ img3.jpg
+ ...
+ class2/
+ imgc.jpg
+ ...
+```
+
+This should get you started to do some serious deep learning on your data. [Read here](https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set) why it's a good idea to split your data into three different sets.
+
+- Split files into a training set and a validation set (and optionally a test set).
+- Works with any file type.
+- The files get shuffled.
+- A [seed](https://docs.python.org/3/library/random.html#random.seed) makes splits reproducible.
+- Allows randomized [oversampling](https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis) for imbalanced datasets.
+- Optionally group files by prefix.
+- (Should) work on all operating systems.
+
+## Install
+
+This package is Python only and there are no external dependencies.
+
+```bash
+pip install split-folders
+```
+
+Optionally, you may install [tqdm](https://github.com/tqdm/tqdm) to get a progress bar when moving files.
+
+```bash
+pip install split-folders[full]
+```
+
+## Usage
+
+You can use `split-folders` as a Python module or as a Command Line Interface (CLI).
+
+If your dataset is balanced (each class has the same number of samples), choose `ratio`; otherwise choose `fixed`.
+NB: oversampling is turned off by default.
+Oversampling is only applied to the _train_ folder since having duplicates in _val_ or _test_ would be considered cheating.
+
+### Module
+
+```python
+import splitfolders
+
+# Split with a ratio.
+# To only split into training and validation set, pass a two-element tuple to `ratio`, e.g. `(.8, .2)`.
+splitfolders.ratio("input_folder", output="output",
+ seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False) # default values
+
+# Split val/test with a fixed number of items, e.g. `(100, 100)`, for each set.
+# To only split into training and validation set, pass a single number to `fixed`, e.g. `10`.
+# Set 3 values, e.g. `(300, 100, 100)`, to limit the number of training values.
+splitfolders.fixed("input_folder", output="output",
+ seed=1337, fixed=(100, 100), oversample=False, group_prefix=None, move=False) # default values
+```
+
+Occasionally, you may have things that comprise more than a single file (e.g. picture (.png) + annotation (.txt)).
+`splitfolders` lets you split files into equally-sized groups based on their prefix.
+Set `group_prefix` to the length of the group (e.g. `2`).
+But now _all_ files should be part of groups.
+
+Set `move=True` if you want to move the files instead of copying.
+
+### CLI
+
+```
+Usage:
+ splitfolders [--output] [--ratio] [--fixed] [--seed] [--oversample] [--group_prefix] [--move] folder_with_images
+Options:
+ --output path to the output folder. defaults to `output`. Gets created if non-existent.
+ --ratio the ratio to split. e.g. for train/val/test `.8 .1 .1 --` or for train/val `.8 .2 --`.
+ --fixed set the absolute number of items per validation/test set. The remaining items constitute
+ the training set. e.g. for train/val/test `100 100` or for train/val `100`.
+ Set 3 values, e.g. `300 100 100`, to limit the number of training values.
+ --seed set seed value for shuffling the items. defaults to 1337.
+ --oversample enable oversampling of imbalanced datasets, works only with --fixed.
+ --group_prefix split files into equally-sized groups based on their prefix
+ --move move the files instead of copying
+Example:
+ splitfolders --ratio .8 .1 .1 -- folder_with_images
+```
+
+Because of some [Python quirks](https://github.com/jfilter/split-folders/issues/19) you have to add ` --` after the values passed to `--ratio`, before the folder argument.
+
+Instead of the command `splitfolders` you can also use `split_folders` or `split-folders`.
+
+## Development
+
+Install and use [poetry](https://python-poetry.org/).
+
+## Contributing
+
+If you have a **question**, found a **bug** or want to propose a new **feature**, have a look at the [issues page](https://github.com/jfilter/split-folders/issues).
+
+**Pull requests** are especially welcome when they fix bugs or improve the code quality.
+
+## License
+
+MIT
+
+
+%package -n python3-split-folders
+Summary: Split folders with files (e.g. images) into training, validation and test (dataset) folders.
+Provides: python-split-folders
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-split-folders
+# `split-folders` [![Build Status](https://img.shields.io/github/workflow/status/jfilter/split-folders/Test)](https://github.com/jfilter/split-folders/actions/workflows/test.yml) [![PyPI](https://img.shields.io/pypi/v/split-folders.svg)](https://pypi.org/project/split-folders/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/split-folders.svg)](https://pypi.org/project/split-folders/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/split-folders)](https://pypistats.org/packages/split-folders)
+
+Split folders with files (e.g. images) into **train**, **validation** and **test** (dataset) folders.
+
+The input folder should have the following format:
+
+```
+input/
+ class1/
+ img1.jpg
+ img2.jpg
+ ...
+ class2/
+ imgWhatever.jpg
+ ...
+ ...
+```
+
+In order to give you this:
+
+```
+output/
+ train/
+ class1/
+ img1.jpg
+ ...
+ class2/
+ imga.jpg
+ ...
+ val/
+ class1/
+ img2.jpg
+ ...
+ class2/
+ imgb.jpg
+ ...
+ test/
+ class1/
+ img3.jpg
+ ...
+ class2/
+ imgc.jpg
+ ...
+```
+
+This should get you started to do some serious deep learning on your data. [Read here](https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set) why it's a good idea to split your data into three different sets.
+
+- Split files into a training set and a validation set (and optionally a test set).
+- Works with any file type.
+- The files get shuffled.
+- A [seed](https://docs.python.org/3/library/random.html#random.seed) makes splits reproducible.
+- Allows randomized [oversampling](https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis) for imbalanced datasets.
+- Optionally group files by prefix.
+- (Should) work on all operating systems.
+
+## Install
+
+This package is Python only and there are no external dependencies.
+
+```bash
+pip install split-folders
+```
+
+Optionally, you may install [tqdm](https://github.com/tqdm/tqdm) to get a progress bar when moving files.
+
+```bash
+pip install split-folders[full]
+```
+
+## Usage
+
+You can use `split-folders` as a Python module or as a Command Line Interface (CLI).
+
+If your dataset is balanced (each class has the same number of samples), choose `ratio`; otherwise choose `fixed`.
+NB: oversampling is turned off by default.
+Oversampling is only applied to the _train_ folder since having duplicates in _val_ or _test_ would be considered cheating.
+
+### Module
+
+```python
+import splitfolders
+
+# Split with a ratio.
+# To only split into training and validation set, pass a two-element tuple to `ratio`, e.g. `(.8, .2)`.
+splitfolders.ratio("input_folder", output="output",
+ seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False) # default values
+
+# Split val/test with a fixed number of items, e.g. `(100, 100)`, for each set.
+# To only split into training and validation set, pass a single number to `fixed`, e.g. `10`.
+# Set 3 values, e.g. `(300, 100, 100)`, to limit the number of training values.
+splitfolders.fixed("input_folder", output="output",
+ seed=1337, fixed=(100, 100), oversample=False, group_prefix=None, move=False) # default values
+```
+
+Occasionally, you may have things that comprise more than a single file (e.g. picture (.png) + annotation (.txt)).
+`splitfolders` lets you split files into equally-sized groups based on their prefix.
+Set `group_prefix` to the length of the group (e.g. `2`).
+But now _all_ files should be part of groups.
+
+Set `move=True` if you want to move the files instead of copying.
+
+### CLI
+
+```
+Usage:
+ splitfolders [--output] [--ratio] [--fixed] [--seed] [--oversample] [--group_prefix] [--move] folder_with_images
+Options:
+ --output path to the output folder. defaults to `output`. Gets created if non-existent.
+ --ratio the ratio to split. e.g. for train/val/test `.8 .1 .1 --` or for train/val `.8 .2 --`.
+ --fixed set the absolute number of items per validation/test set. The remaining items constitute
+ the training set. e.g. for train/val/test `100 100` or for train/val `100`.
+ Set 3 values, e.g. `300 100 100`, to limit the number of training values.
+ --seed set seed value for shuffling the items. defaults to 1337.
+ --oversample enable oversampling of imbalanced datasets, works only with --fixed.
+ --group_prefix split files into equally-sized groups based on their prefix
+ --move move the files instead of copying
+Example:
+ splitfolders --ratio .8 .1 .1 -- folder_with_images
+```
+
+Because of some [Python quirks](https://github.com/jfilter/split-folders/issues/19) you have to add ` --` after the values passed to `--ratio`, before the folder argument.
+
+Instead of the command `splitfolders` you can also use `split_folders` or `split-folders`.
+
+## Development
+
+Install and use [poetry](https://python-poetry.org/).
+
+## Contributing
+
+If you have a **question**, found a **bug** or want to propose a new **feature**, have a look at the [issues page](https://github.com/jfilter/split-folders/issues).
+
+**Pull requests** are especially welcome when they fix bugs or improve the code quality.
+
+## License
+
+MIT
+
+
+%package help
+Summary: Development documents and examples for split-folders
+Provides: python3-split-folders-doc
+%description help
+# `split-folders` [![Build Status](https://img.shields.io/github/workflow/status/jfilter/split-folders/Test)](https://github.com/jfilter/split-folders/actions/workflows/test.yml) [![PyPI](https://img.shields.io/pypi/v/split-folders.svg)](https://pypi.org/project/split-folders/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/split-folders.svg)](https://pypi.org/project/split-folders/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/split-folders)](https://pypistats.org/packages/split-folders)
+
+Split folders with files (e.g. images) into **train**, **validation** and **test** (dataset) folders.
+
+The input folder should have the following format:
+
+```
+input/
+ class1/
+ img1.jpg
+ img2.jpg
+ ...
+ class2/
+ imgWhatever.jpg
+ ...
+ ...
+```
+
+In order to give you this:
+
+```
+output/
+ train/
+ class1/
+ img1.jpg
+ ...
+ class2/
+ imga.jpg
+ ...
+ val/
+ class1/
+ img2.jpg
+ ...
+ class2/
+ imgb.jpg
+ ...
+ test/
+ class1/
+ img3.jpg
+ ...
+ class2/
+ imgc.jpg
+ ...
+```
+
+This should get you started to do some serious deep learning on your data. [Read here](https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set) why it's a good idea to split your data into three different sets.
+
+- Split files into a training set and a validation set (and optionally a test set).
+- Works with any file type.
+- The files get shuffled.
+- A [seed](https://docs.python.org/3/library/random.html#random.seed) makes splits reproducible.
+- Allows randomized [oversampling](https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis) for imbalanced datasets.
+- Optionally group files by prefix.
+- (Should) work on all operating systems.
+
+## Install
+
+This package is Python only and there are no external dependencies.
+
+```bash
+pip install split-folders
+```
+
+Optionally, you may install [tqdm](https://github.com/tqdm/tqdm) to get a progress bar when moving files.
+
+```bash
+pip install split-folders[full]
+```
+
+## Usage
+
+You can use `split-folders` as a Python module or as a Command Line Interface (CLI).
+
+If your dataset is balanced (each class has the same number of samples), choose `ratio`; otherwise choose `fixed`.
+NB: oversampling is turned off by default.
+Oversampling is only applied to the _train_ folder since having duplicates in _val_ or _test_ would be considered cheating.
+
+### Module
+
+```python
+import splitfolders
+
+# Split with a ratio.
+# To only split into training and validation set, pass a two-element tuple to `ratio`, e.g. `(.8, .2)`.
+splitfolders.ratio("input_folder", output="output",
+ seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False) # default values
+
+# Split val/test with a fixed number of items, e.g. `(100, 100)`, for each set.
+# To only split into training and validation set, pass a single number to `fixed`, e.g. `10`.
+# Set 3 values, e.g. `(300, 100, 100)`, to limit the number of training values.
+splitfolders.fixed("input_folder", output="output",
+ seed=1337, fixed=(100, 100), oversample=False, group_prefix=None, move=False) # default values
+```
+
+Occasionally, you may have things that comprise more than a single file (e.g. picture (.png) + annotation (.txt)).
+`splitfolders` lets you split files into equally-sized groups based on their prefix.
+Set `group_prefix` to the length of the group (e.g. `2`).
+But now _all_ files should be part of groups.
+
+Set `move=True` if you want to move the files instead of copying.
+
+### CLI
+
+```
+Usage:
+ splitfolders [--output] [--ratio] [--fixed] [--seed] [--oversample] [--group_prefix] [--move] folder_with_images
+Options:
+ --output path to the output folder. defaults to `output`. Gets created if non-existent.
+ --ratio the ratio to split. e.g. for train/val/test `.8 .1 .1 --` or for train/val `.8 .2 --`.
+ --fixed set the absolute number of items per validation/test set. The remaining items constitute
+ the training set. e.g. for train/val/test `100 100` or for train/val `100`.
+ Set 3 values, e.g. `300 100 100`, to limit the number of training values.
+ --seed set seed value for shuffling the items. defaults to 1337.
+ --oversample enable oversampling of imbalanced datasets, works only with --fixed.
+ --group_prefix split files into equally-sized groups based on their prefix
+ --move move the files instead of copying
+Example:
+ splitfolders --ratio .8 .1 .1 -- folder_with_images
+```
+
+Because of some [Python quirks](https://github.com/jfilter/split-folders/issues/19) you have to add ` --` after the values passed to `--ratio`, before the folder argument.
+
+Instead of the command `splitfolders` you can also use `split_folders` or `split-folders`.
+
+## Development
+
+Install and use [poetry](https://python-poetry.org/).
+
+## Contributing
+
+If you have a **question**, found a **bug** or want to propose a new **feature**, have a look at the [issues page](https://github.com/jfilter/split-folders/issues).
+
+**Pull requests** are especially welcome when they fix bugs or improve the code quality.
+
+## License
+
+MIT
+
+
+%prep
+%autosetup -n split-folders-0.5.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-split-folders -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed Apr 12 2023 Python_Bot <Python_Bot@openeuler.org> - 0.5.1-1
+- Package Spec generated
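
Note on the `%install` section above: it builds `filelist.lst` by running `find <dir> -type f -printf "/%h/%f\n"` from inside the buildroot for each of `usr/lib`, `usr/lib64`, `usr/bin` and `usr/sbin`, so every regular file is recorded as an absolute install path. The following is a minimal Python sketch of that same walk, intended only to illustrate what ends up in the manifest. The function name `build_filelist` and the demo directory layout are hypothetical and are not part of the spec or of `split-folders`.

```python
import os
import tempfile

# Directories scanned by the spec's %install section, relative to the buildroot.
SCANNED_DIRS = ("usr/lib", "usr/lib64", "usr/bin", "usr/sbin")


def build_filelist(buildroot, subdirs=SCANNED_DIRS):
    """Return absolute install paths for regular files under the given subdirs.

    Hypothetical helper that mirrors `find <dir> -type f -printf "/%h/%f\\n"`
    run from inside the buildroot. Missing directories are skipped, like the
    `if [ -d ... ]` guards in the spec.
    """
    paths = []
    for sub in subdirs:
        top = os.path.join(buildroot, sub)
        if not os.path.isdir(top):
            continue
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # `find -type f` matches regular files only, so skip symlinks.
                if os.path.islink(full) or not os.path.isfile(full):
                    continue
                rel = os.path.relpath(full, buildroot)
                paths.append("/" + rel.replace(os.sep, "/"))
    return paths


if __name__ == "__main__":
    # Demo on a throwaway tree; the layout below is made up for illustration.
    with tempfile.TemporaryDirectory() as root:
        for rel in ("usr/lib/python3/site-packages/splitfolders/__init__.py",
                    "usr/bin/splitfolders"):
            target = os.path.join(root, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            open(target, "w").close()
        for path in sorted(build_filelist(root)):
            print(path)
```

The order in which `find` (and `os.walk`) report entries depends on the filesystem, so the sketch sorts the result before printing; RPM itself does not depend on the order of entries in a `-f` file list.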