From a4db565020a76f8dad25891372b382f4dd67892f Mon Sep 17 00:00:00 2001
From: CoprDistGit
Date: Wed, 10 May 2023 07:45:17 +0000
Subject: automatic import of python-targetran
---
 .gitignore            |    1 +
 python-targetran.spec | 1166 +++++++++++++++++++++++++++++++++++++++++++++++++
 sources               |    1 +
 3 files changed, 1168 insertions(+)
 create mode 100644 python-targetran.spec
 create mode 100644 sources

diff --git a/.gitignore b/.gitignore
index e69de29..0697af8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/targetran-0.12.0.tar.gz
diff --git a/python-targetran.spec b/python-targetran.spec
new file mode 100644
index 0000000..203c13f
--- /dev/null
+++ b/python-targetran.spec
@@ -0,0 +1,1166 @@
%global _empty_manifest_terminate_build 0
Name: python-targetran
Version: 0.12.0
Release: 1
Summary: Target transformation for data augmentation in object detection
License: MIT License
URL: https://github.com/bhky/targetran
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/62/8c/14caab4eb936e8a1901a801ca95cd42c8858e37cf925ac2e65411bfe56c4/targetran-0.12.0.tar.gz
BuildArch: noarch

Requires: python3-opencv-python
Requires: python3-numpy

%description
![logo](logo/targetran_logo.png)

[![ci](https://github.com/bhky/targetran/actions/workflows/ci.yml/badge.svg)](https://github.com/bhky/targetran/actions)
[![License MIT 1.0](https://img.shields.io/badge/license-MIT%201.0-blue.svg)](LICENSE)

# Motivation

[Data augmentation](https://en.wikipedia.org/wiki/Data_augmentation)
is a technique commonly used for training machine learning models in the
computer vision field, where one can increase the amount of image data by
creating transformed copies of the original images.

In the object detection sub-field, the transformation must also be applied
to the target rectangular bounding-boxes. However, such functionality is not
readily available in frameworks such as TensorFlow and PyTorch.

While there are other powerful augmentation tools available, many of those
do not work well with the
[TPU](https://cloud.google.com/tpu)
when accessed from [Google Colab](https://colab.research.google.com/) or
[Kaggle Notebooks](https://www.kaggle.com/code),
which are popular options for many people who do not have their
own hardware resources.

Here comes Targetran to fill the gap.

# What is Targetran?

- A lightweight data augmentation library to assist object detection or
  image classification model training.
- Has a simple Python API to transform both the images and the target rectangular
  bounding-boxes.
- Uses a dataset-idiomatic approach for TensorFlow and PyTorch.
- Can be used with the TPU for acceleration (TensorFlow Dataset only).

![example](docs/example.png)

(Figure produced by the example code [here](examples/local/run_tf_dataset_local_example.py).)

# Table of contents

- [Installation](#installation)
- [Usage](#usage)
  - [Notations](#notations)
  - [Data format](#data-format)
  - [Design principles](#design-principles)
  - [TensorFlow Dataset](#tensorflow-dataset)
  - [PyTorch Dataset](#pytorch-dataset)
  - [Image classification](#image-classification)
  - [Examples](#examples)
- [API](#api)

# Installation

Tested for Python 3.8, 3.9, and 3.10.

The best way to install Targetran with its dependencies is from PyPI:
```shell
python3 -m pip install --upgrade targetran
```
Alternatively, to obtain the latest version from this repository:
```shell
git clone https://github.com/bhky/targetran.git
cd targetran
python3 -m pip install .
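# Optionally, verify the installation; the import should succeed:
python3 -c "import targetran"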
+``` + +# Usage + +## Notations + +- `NDFloatArray`: NumPy float array type, which is an alias to `np.typing.NDArray[np.float_]`. + The values are converted to `np.float32` internally. +- `tf.Tensor`: General TensorFlow Tensor type. The values are converted to `tf.float32` internally. + +## Data format + +For object detection model training, which is the primary usage here, the following data are needed. +- `image_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(height, width, num_channels)`): + - images in channel-last format; + - image sizes can be different. +- `bboxes_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(num_bboxes_per_image, 4)`): + - each `bboxes` array/tensor provides the bounding-boxes associated with an image; + - each single bounding-box is given as `[top_left_x, top_left_y, bbox_width, bbox_height]`; + - empty array/tensor means no bounding-boxes (and labels) for that image. +- `labels_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(num_bboxes_per_image,)`): + - each `labels` array/tensor provides the bounding-box labels associated with an image; + - empty array/tensor means no labels (and bounding-boxes) for that image. + +Some dummy data are created below for illustration. Please note the required format. +```python +import numpy as np + +# Each image could have different sizes, but they must follow the channel-last format, +# i.e., (height, width, num_channels). +image_seq = [np.random.rand(480, 512, 3) for _ in range(3)] + +# The bounding-boxes (bboxes) are given as a sequence of NumPy arrays (or TF tensors). +# Each array represents the bboxes for one corresponding image. +# +# Each bbox is given as [top_left_x, top_left_y, bbox_width, bbox_height]. +# +# In case an image has no bboxes, an empty array should be provided. +bboxes_seq = [ + np.array([ # Image with 2 bboxes. + [214, 223, 10, 11], + [345, 230, 21, 9], + ]), + np.array([]), # Empty array for image with no bboxes. + np.array([ # Image with 3 bboxes. + [104, 151, 22, 10], + [99, 132, 20, 15], + [340, 220, 31, 12], + ]), +] + +# Labels for the bboxes are also given as a sequence of NumPy arrays (or TF tensors). +# The number of bboxes and labels should match. An empty array indicates no bboxes/labels. +labels_seq = [ + np.array([0, 1]), # 2 labels. + np.array([]), # No labels. + np.array([2, 3, 0]), # 3 labels. +] + +# During operation, all the data values will be converted to float32. +``` + +## Design principles + +- Bounding-boxes will always be rectangular with sides parallel to the image frame. +- After transformation, each resulting bounding-box is determined by the smallest + rectangle (with sides parallel to the image frame) enclosing the original transformed bounding-box. +- After transformation, resulting bounding-boxes with their centroids outside the + image frame will be removed, together with the corresponding labels. + +## TensorFlow Dataset + +```python +import tensorflow as tf + +from targetran.tf import ( + seqs_to_tf_dataset, + TFCombineAffine, + TFRandomFlipLeftRight, + TFRandomFlipUpDown, + TFRandomRotate, + TFRandomShear, + TFRandomTranslate, + TFRandomCrop, + TFResize, +) + +# Convert the above data sequences into a TensorFlow Dataset. +# Users can have their own way to create the Dataset, as long as for each iteration +# it returns a tuple of tensors for a single sample: (image, bboxes, labels). +ds = seqs_to_tf_dataset(image_seq, bboxes_seq, labels_seq) + +# The affine transformations can be combined into one operation for better performance. 
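# (Presumably the combination works by composing the individual transformation
# matrices into a single matrix, so the image is only resampled once.)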
# Note that cropping and resizing are not affine and cannot be combined.
# Option (1):
affine_transform = TFCombineAffine(
    [TFRandomRotate(probability=0.8),  # Probability to include each affine transformation step
     TFRandomShear(probability=0.6),   # can be specified, otherwise the default value is used.
     TFRandomTranslate(),              # Thus, the number of selected steps could vary.
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    probability=1.0  # Probability to apply this single combined transformation.
)
# Option (2):
# Alternatively, one can decide the exact number of randomly selected transformations,
# e.g., use only any two of them. This could be a better option because too many
# transformation steps may deform the images too much.
affine_transform = TFCombineAffine(
    [TFRandomRotate(),  # Individual `probability` has no effect in this approach.
     TFRandomShear(),
     TFRandomTranslate(),
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    num_selected_transforms=2,  # Only two steps from the list will be selected.
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],  # Must sum up to 1.0, if given.
    keep_order=True,  # If True, the selected steps must be performed in the given order.
    probability=1.0  # Probability to apply this single combined transformation.
)
# Please refer to the API manual for more parameter options.

# Apply transformations.
auto_tune = tf.data.AUTOTUNE
ds = ds \
    .map(TFRandomCrop(probability=0.5), num_parallel_calls=auto_tune) \
    .map(affine_transform, num_parallel_calls=auto_tune) \
    .map(TFResize((256, 256)), num_parallel_calls=auto_tune)

# In the Dataset `map` call, the parameter `num_parallel_calls` can be set to,
# e.g., tf.data.AUTOTUNE, for better performance. See the TensorFlow Dataset docs.
```
```python
# Batching:
# Since the array/tensor shape of each sample could be different, the conventional
# way of batching may not work. Users will have to consider their own use cases.
# One possibly useful way is padded batching.
ds = ds.padded_batch(batch_size=2, padding_values=-1.0)
```

## PyTorch Dataset

```python
from typing import Optional, Sequence, Tuple

import numpy.typing
from torch.utils.data import Dataset

from targetran.np import (
    CombineAffine,
    RandomFlipLeftRight,
    RandomFlipUpDown,
    RandomRotate,
    RandomShear,
    RandomTranslate,
    RandomCrop,
    Resize,
)
from targetran.utils import Compose

NDFloatArray = numpy.typing.NDArray[numpy.float_]


class PTDataset(Dataset):
    """
    A very simple PyTorch Dataset.
    As per common practice, transforms are done on NumPy arrays.
    """

    def __init__(
            self,
            image_seq: Sequence[NDFloatArray],
            bboxes_seq: Sequence[NDFloatArray],
            labels_seq: Sequence[NDFloatArray],
            transforms: Optional[Compose]
    ) -> None:
        self.image_seq = image_seq
        self.bboxes_seq = bboxes_seq
        self.labels_seq = labels_seq
        self.transforms = transforms

    def __len__(self) -> int:
        return len(self.image_seq)

    def __getitem__(
            self,
            idx: int
    ) -> Tuple[NDFloatArray, NDFloatArray, NDFloatArray]:
        if self.transforms:
            return self.transforms(
                self.image_seq[idx],
                self.bboxes_seq[idx],
                self.labels_seq[idx]
            )
        return (
            self.image_seq[idx],
            self.bboxes_seq[idx],
            self.labels_seq[idx]
        )
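
# A quick optional check (illustration only, using the dummy data from the
# "Data format" section above): without transforms, the raw arrays come back.
raw_ds = PTDataset(image_seq, bboxes_seq, labels_seq, transforms=None)
image, bboxes, labels = raw_ds[0]
print(image.shape, bboxes.shape, labels.shape)  # -> (480, 512, 3) (2, 4) (2,)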

# The affine transformations can be combined into one operation for better performance.
# Note that cropping and resizing are not affine and cannot be combined.
# Option (1):
affine_transform = CombineAffine(
    [RandomRotate(probability=0.8),  # Probability to include each affine transformation step
     RandomShear(probability=0.6),   # can be specified, otherwise the default value is used.
     RandomTranslate(),              # Thus, the number of selected steps could vary.
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    probability=1.0  # Probability to apply this single combined transformation.
)
# Option (2):
# Alternatively, one can decide the exact number of randomly selected transformations,
# e.g., use only any two of them. This could be a better option because too many
# transformation steps may deform the images too much.
affine_transform = CombineAffine(
    [RandomRotate(),  # Individual `probability` has no effect in this approach.
     RandomShear(),
     RandomTranslate(),
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    num_selected_transforms=2,  # Only two steps from the list will be selected.
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],  # Must sum up to 1.0, if given.
    keep_order=True,  # If True, the selected steps must be performed in the given order.
    probability=1.0  # Probability to apply this single combined transformation.
)
# Please refer to the API manual for more parameter options.

# The `Compose` here is similar to that from the torchvision package, except
# that here it also supports callables with multiple inputs and outputs needed
# for object detection tasks, i.e., (image, bboxes, labels).
transforms = Compose([
    RandomCrop(probability=0.5),
    affine_transform,
    Resize((256, 256)),
])

# Convert the above data sequences into a PyTorch Dataset.
# Users can have their own way to create the Dataset, as long as for each iteration
# it returns a tuple of arrays for a single sample: (image, bboxes, labels).
ds = PTDataset(image_seq, bboxes_seq, labels_seq, transforms=transforms)
```
```python
# Batching:
# In PyTorch, it is common to use a Dataset with a DataLoader, which provides
# batching functionality. However, since the array/tensor shape of each sample
# could be different, the default batching may not work. Targetran provides
# a `collate_fn` that helps produce batches of (image_seq, bboxes_seq, labels_seq).
from torch.utils.data import DataLoader
from targetran.utils import collate_fn

data_loader = DataLoader(ds, batch_size=2, collate_fn=collate_fn)
```

## Image classification

While the tools here are primarily designed for object detection tasks, they can
also be used for image classification, where only the images are to be transformed,
e.g., given a dataset that returns `(image, label)` samples, or even only `image` samples.
The `image_only` function can be used to adapt a transformation class for this purpose.

If the dataset returns a tuple `(image, ...)` in each iteration, only the `image`
will be transformed; the parameters that follow, such as `(..., label, weight)`,
will be returned untouched.

If the dataset returns only `image` (not a tuple), then only the transformed `image` will be returned.
```python
from targetran.utils import image_only
```
```python
# TensorFlow.
ds = ds \
    .map(image_only(TFRandomCrop())) \
    .map(image_only(affine_transform)) \
    .map(image_only(TFResize((256, 256)))) \
    .batch(32)  # Conventional batching can be used for the classification setup.
```
```python
# PyTorch.
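# The pattern mirrors the detection setup above, with each transform wrapped
# by `image_only` so that only the image is modified.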
transforms = Compose([
    image_only(RandomCrop()),
    image_only(affine_transform),
    image_only(Resize((256, 256))),
])
ds = PTDataset(..., transforms=transforms)
data_loader = DataLoader(ds, batch_size=32)
```

## Examples

- [Code examples in this repository](examples)
- [Construct a TensorFlow Dataset with Targetran
  and object detection data](https://www.kaggle.com/boscoyung/targetran-example-with-tensorflow-dataset)
  (Kaggle Notebook)
- [Image classification with TensorFlow and Targetran on TPU](https://www.kaggle.com/boscoyung/targetran-tpu-for-image-classification-example)
  (Kaggle Notebook)

# API

See [here](docs/API.md) for API details.


%package -n python3-targetran
Summary: Target transformation for data augmentation in object detection
Provides: python-targetran
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-targetran
![logo](logo/targetran_logo.png)

[![ci](https://github.com/bhky/targetran/actions/workflows/ci.yml/badge.svg)](https://github.com/bhky/targetran/actions)
[![License MIT 1.0](https://img.shields.io/badge/license-MIT%201.0-blue.svg)](LICENSE)

# Motivation

[Data augmentation](https://en.wikipedia.org/wiki/Data_augmentation)
is a technique commonly used for training machine learning models in the
computer vision field, where one can increase the amount of image data by
creating transformed copies of the original images.

In the object detection sub-field, the transformation must also be applied
to the target rectangular bounding-boxes. However, such functionality is not
readily available in frameworks such as TensorFlow and PyTorch.

While there are other powerful augmentation tools available, many of those
do not work well with the
[TPU](https://cloud.google.com/tpu)
when accessed from [Google Colab](https://colab.research.google.com/) or
[Kaggle Notebooks](https://www.kaggle.com/code),
which are popular options for many people who do not have their
own hardware resources.

Here comes Targetran to fill the gap.

# What is Targetran?

- A lightweight data augmentation library to assist object detection or
  image classification model training.
- Has a simple Python API to transform both the images and the target rectangular
  bounding-boxes.
- Uses a dataset-idiomatic approach for TensorFlow and PyTorch.
- Can be used with the TPU for acceleration (TensorFlow Dataset only).

![example](docs/example.png)

(Figure produced by the example code [here](examples/local/run_tf_dataset_local_example.py).)

# Table of contents

- [Installation](#installation)
- [Usage](#usage)
  - [Notations](#notations)
  - [Data format](#data-format)
  - [Design principles](#design-principles)
  - [TensorFlow Dataset](#tensorflow-dataset)
  - [PyTorch Dataset](#pytorch-dataset)
  - [Image classification](#image-classification)
  - [Examples](#examples)
- [API](#api)

# Installation

Tested for Python 3.8, 3.9, and 3.10.

The best way to install Targetran with its dependencies is from PyPI:
```shell
python3 -m pip install --upgrade targetran
```
Alternatively, to obtain the latest version from this repository:
```shell
git clone https://github.com/bhky/targetran.git
cd targetran
python3 -m pip install .
```

# Usage

## Notations

- `NDFloatArray`: NumPy float array type, which is an alias to `np.typing.NDArray[np.float_]`.
  The values are converted to `np.float32` internally.
+- `tf.Tensor`: General TensorFlow Tensor type. The values are converted to `tf.float32` internally. + +## Data format + +For object detection model training, which is the primary usage here, the following data are needed. +- `image_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(height, width, num_channels)`): + - images in channel-last format; + - image sizes can be different. +- `bboxes_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(num_bboxes_per_image, 4)`): + - each `bboxes` array/tensor provides the bounding-boxes associated with an image; + - each single bounding-box is given as `[top_left_x, top_left_y, bbox_width, bbox_height]`; + - empty array/tensor means no bounding-boxes (and labels) for that image. +- `labels_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(num_bboxes_per_image,)`): + - each `labels` array/tensor provides the bounding-box labels associated with an image; + - empty array/tensor means no labels (and bounding-boxes) for that image. + +Some dummy data are created below for illustration. Please note the required format. +```python +import numpy as np + +# Each image could have different sizes, but they must follow the channel-last format, +# i.e., (height, width, num_channels). +image_seq = [np.random.rand(480, 512, 3) for _ in range(3)] + +# The bounding-boxes (bboxes) are given as a sequence of NumPy arrays (or TF tensors). +# Each array represents the bboxes for one corresponding image. +# +# Each bbox is given as [top_left_x, top_left_y, bbox_width, bbox_height]. +# +# In case an image has no bboxes, an empty array should be provided. +bboxes_seq = [ + np.array([ # Image with 2 bboxes. + [214, 223, 10, 11], + [345, 230, 21, 9], + ]), + np.array([]), # Empty array for image with no bboxes. + np.array([ # Image with 3 bboxes. + [104, 151, 22, 10], + [99, 132, 20, 15], + [340, 220, 31, 12], + ]), +] + +# Labels for the bboxes are also given as a sequence of NumPy arrays (or TF tensors). +# The number of bboxes and labels should match. An empty array indicates no bboxes/labels. +labels_seq = [ + np.array([0, 1]), # 2 labels. + np.array([]), # No labels. + np.array([2, 3, 0]), # 3 labels. +] + +# During operation, all the data values will be converted to float32. +``` + +## Design principles + +- Bounding-boxes will always be rectangular with sides parallel to the image frame. +- After transformation, each resulting bounding-box is determined by the smallest + rectangle (with sides parallel to the image frame) enclosing the original transformed bounding-box. +- After transformation, resulting bounding-boxes with their centroids outside the + image frame will be removed, together with the corresponding labels. + +## TensorFlow Dataset + +```python +import tensorflow as tf + +from targetran.tf import ( + seqs_to_tf_dataset, + TFCombineAffine, + TFRandomFlipLeftRight, + TFRandomFlipUpDown, + TFRandomRotate, + TFRandomShear, + TFRandomTranslate, + TFRandomCrop, + TFResize, +) + +# Convert the above data sequences into a TensorFlow Dataset. +# Users can have their own way to create the Dataset, as long as for each iteration +# it returns a tuple of tensors for a single sample: (image, bboxes, labels). +ds = seqs_to_tf_dataset(image_seq, bboxes_seq, labels_seq) + +# The affine transformations can be combined into one operation for better performance. +# Note that cropping and resizing are not affine and cannot be combined. 
# Option (1):
affine_transform = TFCombineAffine(
    [TFRandomRotate(probability=0.8),  # Probability to include each affine transformation step
     TFRandomShear(probability=0.6),   # can be specified, otherwise the default value is used.
     TFRandomTranslate(),              # Thus, the number of selected steps could vary.
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    probability=1.0  # Probability to apply this single combined transformation.
)
# Option (2):
# Alternatively, one can decide the exact number of randomly selected transformations,
# e.g., use only any two of them. This could be a better option because too many
# transformation steps may deform the images too much.
affine_transform = TFCombineAffine(
    [TFRandomRotate(),  # Individual `probability` has no effect in this approach.
     TFRandomShear(),
     TFRandomTranslate(),
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    num_selected_transforms=2,  # Only two steps from the list will be selected.
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],  # Must sum up to 1.0, if given.
    keep_order=True,  # If True, the selected steps must be performed in the given order.
    probability=1.0  # Probability to apply this single combined transformation.
)
# Please refer to the API manual for more parameter options.

# Apply transformations.
auto_tune = tf.data.AUTOTUNE
ds = ds \
    .map(TFRandomCrop(probability=0.5), num_parallel_calls=auto_tune) \
    .map(affine_transform, num_parallel_calls=auto_tune) \
    .map(TFResize((256, 256)), num_parallel_calls=auto_tune)

# In the Dataset `map` call, the parameter `num_parallel_calls` can be set to,
# e.g., tf.data.AUTOTUNE, for better performance. See the TensorFlow Dataset docs.
```
```python
# Batching:
# Since the array/tensor shape of each sample could be different, the conventional
# way of batching may not work. Users will have to consider their own use cases.
# One possibly useful way is padded batching.
ds = ds.padded_batch(batch_size=2, padding_values=-1.0)
```

## PyTorch Dataset

```python
from typing import Optional, Sequence, Tuple

import numpy.typing
from torch.utils.data import Dataset

from targetran.np import (
    CombineAffine,
    RandomFlipLeftRight,
    RandomFlipUpDown,
    RandomRotate,
    RandomShear,
    RandomTranslate,
    RandomCrop,
    Resize,
)
from targetran.utils import Compose

NDFloatArray = numpy.typing.NDArray[numpy.float_]


class PTDataset(Dataset):
    """
    A very simple PyTorch Dataset.
    As per common practice, transforms are done on NumPy arrays.
    """

    def __init__(
            self,
            image_seq: Sequence[NDFloatArray],
            bboxes_seq: Sequence[NDFloatArray],
            labels_seq: Sequence[NDFloatArray],
            transforms: Optional[Compose]
    ) -> None:
        self.image_seq = image_seq
        self.bboxes_seq = bboxes_seq
        self.labels_seq = labels_seq
        self.transforms = transforms

    def __len__(self) -> int:
        return len(self.image_seq)

    def __getitem__(
            self,
            idx: int
    ) -> Tuple[NDFloatArray, NDFloatArray, NDFloatArray]:
        if self.transforms:
            return self.transforms(
                self.image_seq[idx],
                self.bboxes_seq[idx],
                self.labels_seq[idx]
            )
        return (
            self.image_seq[idx],
            self.bboxes_seq[idx],
            self.labels_seq[idx]
        )


# The affine transformations can be combined into one operation for better performance.
# Note that cropping and resizing are not affine and cannot be combined.
# Option (1):
affine_transform = CombineAffine(
    [RandomRotate(probability=0.8),  # Probability to include each affine transformation step
     RandomShear(probability=0.6),   # can be specified, otherwise the default value is used.
     RandomTranslate(),              # Thus, the number of selected steps could vary.
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    probability=1.0  # Probability to apply this single combined transformation.
)
# Option (2):
# Alternatively, one can decide the exact number of randomly selected transformations,
# e.g., use only any two of them. This could be a better option because too many
# transformation steps may deform the images too much.
affine_transform = CombineAffine(
    [RandomRotate(),  # Individual `probability` has no effect in this approach.
     RandomShear(),
     RandomTranslate(),
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    num_selected_transforms=2,  # Only two steps from the list will be selected.
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],  # Must sum up to 1.0, if given.
    keep_order=True,  # If True, the selected steps must be performed in the given order.
    probability=1.0  # Probability to apply this single combined transformation.
)
# Please refer to the API manual for more parameter options.

# The `Compose` here is similar to that from the torchvision package, except
# that here it also supports callables with multiple inputs and outputs needed
# for object detection tasks, i.e., (image, bboxes, labels).
transforms = Compose([
    RandomCrop(probability=0.5),
    affine_transform,
    Resize((256, 256)),
])

# Convert the above data sequences into a PyTorch Dataset.
# Users can have their own way to create the Dataset, as long as for each iteration
# it returns a tuple of arrays for a single sample: (image, bboxes, labels).
ds = PTDataset(image_seq, bboxes_seq, labels_seq, transforms=transforms)
```
```python
# Batching:
# In PyTorch, it is common to use a Dataset with a DataLoader, which provides
# batching functionality. However, since the array/tensor shape of each sample
# could be different, the default batching may not work. Targetran provides
# a `collate_fn` that helps produce batches of (image_seq, bboxes_seq, labels_seq).
from torch.utils.data import DataLoader
from targetran.utils import collate_fn

data_loader = DataLoader(ds, batch_size=2, collate_fn=collate_fn)
```

## Image classification

While the tools here are primarily designed for object detection tasks, they can
also be used for image classification, where only the images are to be transformed,
e.g., given a dataset that returns `(image, label)` samples, or even only `image` samples.
The `image_only` function can be used to adapt a transformation class for this purpose.

If the dataset returns a tuple `(image, ...)` in each iteration, only the `image`
will be transformed; the parameters that follow, such as `(..., label, weight)`,
will be returned untouched.

If the dataset returns only `image` (not a tuple), then only the transformed `image` will be returned.
```python
from targetran.utils import image_only
```
```python
# TensorFlow.
ds = ds \
    .map(image_only(TFRandomCrop())) \
    .map(image_only(affine_transform)) \
    .map(image_only(TFResize((256, 256)))) \
    .batch(32)  # Conventional batching can be used for the classification setup.
```
```python
# PyTorch.
transforms = Compose([
    image_only(RandomCrop()),
    image_only(affine_transform),
    image_only(Resize((256, 256))),
])
ds = PTDataset(..., transforms=transforms)
data_loader = DataLoader(ds, batch_size=32)
```

## Examples

- [Code examples in this repository](examples)
- [Construct a TensorFlow Dataset with Targetran
  and object detection data](https://www.kaggle.com/boscoyung/targetran-example-with-tensorflow-dataset)
  (Kaggle Notebook)
- [Image classification with TensorFlow and Targetran on TPU](https://www.kaggle.com/boscoyung/targetran-tpu-for-image-classification-example)
  (Kaggle Notebook)

# API

See [here](docs/API.md) for API details.


%package help
Summary: Development documents and examples for targetran
Provides: python3-targetran-doc
%description help
![logo](logo/targetran_logo.png)

[![ci](https://github.com/bhky/targetran/actions/workflows/ci.yml/badge.svg)](https://github.com/bhky/targetran/actions)
[![License MIT 1.0](https://img.shields.io/badge/license-MIT%201.0-blue.svg)](LICENSE)

# Motivation

[Data augmentation](https://en.wikipedia.org/wiki/Data_augmentation)
is a technique commonly used for training machine learning models in the
computer vision field, where one can increase the amount of image data by
creating transformed copies of the original images.

In the object detection sub-field, the transformation must also be applied
to the target rectangular bounding-boxes. However, such functionality is not
readily available in frameworks such as TensorFlow and PyTorch.

While there are other powerful augmentation tools available, many of those
do not work well with the
[TPU](https://cloud.google.com/tpu)
when accessed from [Google Colab](https://colab.research.google.com/) or
[Kaggle Notebooks](https://www.kaggle.com/code),
which are popular options for many people who do not have their
own hardware resources.

Here comes Targetran to fill the gap.

# What is Targetran?

- A lightweight data augmentation library to assist object detection or
  image classification model training.
- Has a simple Python API to transform both the images and the target rectangular
  bounding-boxes.
- Uses a dataset-idiomatic approach for TensorFlow and PyTorch.
- Can be used with the TPU for acceleration (TensorFlow Dataset only).

![example](docs/example.png)

(Figure produced by the example code [here](examples/local/run_tf_dataset_local_example.py).)

# Table of contents

- [Installation](#installation)
- [Usage](#usage)
  - [Notations](#notations)
  - [Data format](#data-format)
  - [Design principles](#design-principles)
  - [TensorFlow Dataset](#tensorflow-dataset)
  - [PyTorch Dataset](#pytorch-dataset)
  - [Image classification](#image-classification)
  - [Examples](#examples)
- [API](#api)

# Installation

Tested for Python 3.8, 3.9, and 3.10.

The best way to install Targetran with its dependencies is from PyPI:
```shell
python3 -m pip install --upgrade targetran
```
Alternatively, to obtain the latest version from this repository:
```shell
git clone https://github.com/bhky/targetran.git
cd targetran
python3 -m pip install .
```

# Usage

## Notations

- `NDFloatArray`: NumPy float array type, which is an alias to `np.typing.NDArray[np.float_]`.
  The values are converted to `np.float32` internally.
- `tf.Tensor`: General TensorFlow Tensor type. The values are converted to `tf.float32` internally.
+ +## Data format + +For object detection model training, which is the primary usage here, the following data are needed. +- `image_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(height, width, num_channels)`): + - images in channel-last format; + - image sizes can be different. +- `bboxes_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(num_bboxes_per_image, 4)`): + - each `bboxes` array/tensor provides the bounding-boxes associated with an image; + - each single bounding-box is given as `[top_left_x, top_left_y, bbox_width, bbox_height]`; + - empty array/tensor means no bounding-boxes (and labels) for that image. +- `labels_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(num_bboxes_per_image,)`): + - each `labels` array/tensor provides the bounding-box labels associated with an image; + - empty array/tensor means no labels (and bounding-boxes) for that image. + +Some dummy data are created below for illustration. Please note the required format. +```python +import numpy as np + +# Each image could have different sizes, but they must follow the channel-last format, +# i.e., (height, width, num_channels). +image_seq = [np.random.rand(480, 512, 3) for _ in range(3)] + +# The bounding-boxes (bboxes) are given as a sequence of NumPy arrays (or TF tensors). +# Each array represents the bboxes for one corresponding image. +# +# Each bbox is given as [top_left_x, top_left_y, bbox_width, bbox_height]. +# +# In case an image has no bboxes, an empty array should be provided. +bboxes_seq = [ + np.array([ # Image with 2 bboxes. + [214, 223, 10, 11], + [345, 230, 21, 9], + ]), + np.array([]), # Empty array for image with no bboxes. + np.array([ # Image with 3 bboxes. + [104, 151, 22, 10], + [99, 132, 20, 15], + [340, 220, 31, 12], + ]), +] + +# Labels for the bboxes are also given as a sequence of NumPy arrays (or TF tensors). +# The number of bboxes and labels should match. An empty array indicates no bboxes/labels. +labels_seq = [ + np.array([0, 1]), # 2 labels. + np.array([]), # No labels. + np.array([2, 3, 0]), # 3 labels. +] + +# During operation, all the data values will be converted to float32. +``` + +## Design principles + +- Bounding-boxes will always be rectangular with sides parallel to the image frame. +- After transformation, each resulting bounding-box is determined by the smallest + rectangle (with sides parallel to the image frame) enclosing the original transformed bounding-box. +- After transformation, resulting bounding-boxes with their centroids outside the + image frame will be removed, together with the corresponding labels. + +## TensorFlow Dataset + +```python +import tensorflow as tf + +from targetran.tf import ( + seqs_to_tf_dataset, + TFCombineAffine, + TFRandomFlipLeftRight, + TFRandomFlipUpDown, + TFRandomRotate, + TFRandomShear, + TFRandomTranslate, + TFRandomCrop, + TFResize, +) + +# Convert the above data sequences into a TensorFlow Dataset. +# Users can have their own way to create the Dataset, as long as for each iteration +# it returns a tuple of tensors for a single sample: (image, bboxes, labels). +ds = seqs_to_tf_dataset(image_seq, bboxes_seq, labels_seq) + +# The affine transformations can be combined into one operation for better performance. +# Note that cropping and resizing are not affine and cannot be combined. 
# Option (1):
affine_transform = TFCombineAffine(
    [TFRandomRotate(probability=0.8),  # Probability to include each affine transformation step
     TFRandomShear(probability=0.6),   # can be specified, otherwise the default value is used.
     TFRandomTranslate(),              # Thus, the number of selected steps could vary.
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    probability=1.0  # Probability to apply this single combined transformation.
)
# Option (2):
# Alternatively, one can decide the exact number of randomly selected transformations,
# e.g., use only any two of them. This could be a better option because too many
# transformation steps may deform the images too much.
affine_transform = TFCombineAffine(
    [TFRandomRotate(),  # Individual `probability` has no effect in this approach.
     TFRandomShear(),
     TFRandomTranslate(),
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    num_selected_transforms=2,  # Only two steps from the list will be selected.
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],  # Must sum up to 1.0, if given.
    keep_order=True,  # If True, the selected steps must be performed in the given order.
    probability=1.0  # Probability to apply this single combined transformation.
)
# Please refer to the API manual for more parameter options.

# Apply transformations.
auto_tune = tf.data.AUTOTUNE
ds = ds \
    .map(TFRandomCrop(probability=0.5), num_parallel_calls=auto_tune) \
    .map(affine_transform, num_parallel_calls=auto_tune) \
    .map(TFResize((256, 256)), num_parallel_calls=auto_tune)

# In the Dataset `map` call, the parameter `num_parallel_calls` can be set to,
# e.g., tf.data.AUTOTUNE, for better performance. See the TensorFlow Dataset docs.
```
```python
# Batching:
# Since the array/tensor shape of each sample could be different, the conventional
# way of batching may not work. Users will have to consider their own use cases.
# One possibly useful way is padded batching.
ds = ds.padded_batch(batch_size=2, padding_values=-1.0)
```

## PyTorch Dataset

```python
from typing import Optional, Sequence, Tuple

import numpy.typing
from torch.utils.data import Dataset

from targetran.np import (
    CombineAffine,
    RandomFlipLeftRight,
    RandomFlipUpDown,
    RandomRotate,
    RandomShear,
    RandomTranslate,
    RandomCrop,
    Resize,
)
from targetran.utils import Compose

NDFloatArray = numpy.typing.NDArray[numpy.float_]


class PTDataset(Dataset):
    """
    A very simple PyTorch Dataset.
    As per common practice, transforms are done on NumPy arrays.
    """

    def __init__(
            self,
            image_seq: Sequence[NDFloatArray],
            bboxes_seq: Sequence[NDFloatArray],
            labels_seq: Sequence[NDFloatArray],
            transforms: Optional[Compose]
    ) -> None:
        self.image_seq = image_seq
        self.bboxes_seq = bboxes_seq
        self.labels_seq = labels_seq
        self.transforms = transforms

    def __len__(self) -> int:
        return len(self.image_seq)

    def __getitem__(
            self,
            idx: int
    ) -> Tuple[NDFloatArray, NDFloatArray, NDFloatArray]:
        if self.transforms:
            return self.transforms(
                self.image_seq[idx],
                self.bboxes_seq[idx],
                self.labels_seq[idx]
            )
        return (
            self.image_seq[idx],
            self.bboxes_seq[idx],
            self.labels_seq[idx]
        )


# The affine transformations can be combined into one operation for better performance.
# Note that cropping and resizing are not affine and cannot be combined.
# Option (1):
affine_transform = CombineAffine(
    [RandomRotate(probability=0.8),  # Probability to include each affine transformation step
     RandomShear(probability=0.6),   # can be specified, otherwise the default value is used.
     RandomTranslate(),              # Thus, the number of selected steps could vary.
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    probability=1.0  # Probability to apply this single combined transformation.
)
# Option (2):
# Alternatively, one can decide the exact number of randomly selected transformations,
# e.g., use only any two of them. This could be a better option because too many
# transformation steps may deform the images too much.
affine_transform = CombineAffine(
    [RandomRotate(),  # Individual `probability` has no effect in this approach.
     RandomShear(),
     RandomTranslate(),
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    num_selected_transforms=2,  # Only two steps from the list will be selected.
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],  # Must sum up to 1.0, if given.
    keep_order=True,  # If True, the selected steps must be performed in the given order.
    probability=1.0  # Probability to apply this single combined transformation.
)
# Please refer to the API manual for more parameter options.

# The `Compose` here is similar to that from the torchvision package, except
# that here it also supports callables with multiple inputs and outputs needed
# for object detection tasks, i.e., (image, bboxes, labels).
transforms = Compose([
    RandomCrop(probability=0.5),
    affine_transform,
    Resize((256, 256)),
])

# Convert the above data sequences into a PyTorch Dataset.
# Users can have their own way to create the Dataset, as long as for each iteration
# it returns a tuple of arrays for a single sample: (image, bboxes, labels).
ds = PTDataset(image_seq, bboxes_seq, labels_seq, transforms=transforms)
```
```python
# Batching:
# In PyTorch, it is common to use a Dataset with a DataLoader, which provides
# batching functionality. However, since the array/tensor shape of each sample
# could be different, the default batching may not work. Targetran provides
# a `collate_fn` that helps produce batches of (image_seq, bboxes_seq, labels_seq).
from torch.utils.data import DataLoader
from targetran.utils import collate_fn

data_loader = DataLoader(ds, batch_size=2, collate_fn=collate_fn)
```

## Image classification

While the tools here are primarily designed for object detection tasks, they can
also be used for image classification, where only the images are to be transformed,
e.g., given a dataset that returns `(image, label)` samples, or even only `image` samples.
The `image_only` function can be used to adapt a transformation class for this purpose.

If the dataset returns a tuple `(image, ...)` in each iteration, only the `image`
will be transformed; the parameters that follow, such as `(..., label, weight)`,
will be returned untouched.

If the dataset returns only `image` (not a tuple), then only the transformed `image` will be returned.
```python
from targetran.utils import image_only
```
```python
# TensorFlow.
ds = ds \
    .map(image_only(TFRandomCrop())) \
    .map(image_only(affine_transform)) \
    .map(image_only(TFResize((256, 256)))) \
    .batch(32)  # Conventional batching can be used for the classification setup.
```
```python
# PyTorch.
transforms = Compose([
    image_only(RandomCrop()),
    image_only(affine_transform),
    image_only(Resize((256, 256))),
])
ds = PTDataset(..., transforms=transforms)
data_loader = DataLoader(ds, batch_size=32)
```

## Examples

- [Code examples in this repository](examples)
- [Construct a TensorFlow Dataset with Targetran
  and object detection data](https://www.kaggle.com/boscoyung/targetran-example-with-tensorflow-dataset)
  (Kaggle Notebook)
- [Image classification with TensorFlow and Targetran on TPU](https://www.kaggle.com/boscoyung/targetran-tpu-for-image-classification-example)
  (Kaggle Notebook)

# API

See [here](docs/API.md) for API details.


%prep
%autosetup -n targetran-0.12.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-targetran -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed May 10 2023 Python_Bot - 0.12.0-1
- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..9cf4c65
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+a633235e1b3e002bf349c551bedf41bd targetran-0.12.0.tar.gz
-- 
cgit v1.2.3