-rw-r--r--  .gitignore          |    1
-rw-r--r--  python-mlpug.spec   | 1304
-rw-r--r--  sources             |    1
3 files changed, 1306 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..1de3c82 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/mlpug-0.0.57.tar.gz
diff --git a/python-mlpug.spec b/python-mlpug.spec
new file mode 100644
index 0000000..a7c310e
--- /dev/null
+++ b/python-mlpug.spec
@@ -0,0 +1,1304 @@
+%global _empty_manifest_terminate_build 0
+Name: python-mlpug
+Version: 0.0.57
+Release: 1
+Summary: A machine learning library agnostic framework for model training and evaluation
+License: Apache Software License
+URL: https://github.com/nuhame/mlpug
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/99/a0/fb7ed09cabdf20f37c1ee4f1ac2a74e81c63d02eb06d14080e2134f8bd3b/mlpug-0.0.57.tar.gz
+BuildArch: noarch
+
+Requires: python3-visionscaper-pybase
+Requires: python3-tensorboardX
+
+%description
+# MLPug
+MLPug is a Machine Learning library agnostic framework for model training and evaluation. A lot of the functionality you need to train and evaluate your model is independent of the ML library you're using, e.g. PyTorch or Tensorflow. MLPug provides a single framework with a unified API for all such training and evaluation functionality, independent of the ML library you are using.
+
+**Thus, when switching ML library, you don't have to learn a new training API and you can reuse your own training code with no, or minimal, change! 🤩🎉**
+
+## Dive right in!
+
+### Run the repository examples
+
+You can find the example code [here](mlpug/examples/documentation/).
+How MLPug is used in the examples is explained further [here](#hello-world-with-pytorch).
+
+Clone the MLPug repo:
+
+```
+git clone https://github.com/nuhame/mlpug.git
+```
+
+#### MLPug with PyTorch
+To run the PyTorch examples, first install PyTorch and use Python >= 3.7.
+```
+cd mlpug
+
+# MLPug Hello World example
+python mlpug/examples/documentation/pytorch/hello_world.py
+
+# MLPug Fashion MNIST example
+# Run `fashion_mnist.py -h` for options
+python mlpug/examples/documentation/pytorch/fashion_mnist.py
+```
+
+There are similar [examples for using MLPug with PyTorch/XLA](mlpug/examples/documentation/pytorch/xla) (Training with Pytorch on TPUs).
+
+#### MLPug with Tensorflow
+To run the Tensorflow examples, first install Tensorflow and use Python >= 3.7.
+```
+cd mlpug
+
+# MLPug Hello World example
+# Run hello_world.py or hello_world_not_eager.py
+python mlpug/examples/documentation/tensorflow/hello_world.py
+
+# MLPug Fashion MNIST example
+# Run `fashion_mnist.py -h` for options
+python mlpug/examples/documentation/tensorflow/fashion_mnist.py
+```
+### Use MLPug in your own project
+
+```
+pip install mlpug
+```
+
+```Python
+# Using MLPug with PyTorch
+import mlpug.pytorch as mlp
+```
+
+```Python
+# Using MLPug with PyTorch/XLA (Training with Pytorch on TPUs)
+import mlpug.pytorch.xla as mlp
+```
+
+```Python
+# Using MLPug with Tensorflow
+import mlpug.tensorflow as mlp
+```
+
+# What is MLPug?
+MLPug is a machine learning library agnostic framework for model training and evaluation.
+
+A lot of the functionality you need to train and evaluate your machine learning model is
+independent of the machine learning library you're using, e.g. PyTorch and Tensorflow.
+For instance,
+
+ * checkpoint management,
+ * evaluation of validation set loss and other custom metrics,
+ * progress logging,
+ * progress visualization using Tensorboard,
+ * the use of gradient accumulation to train with large batch sizes using limited GPU memory, etc.
+
+You need such functionality no matter what machine learning framework you are using.
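+
+For example, gradient accumulation done by hand in PyTorch looks roughly like the sketch below (assuming `model`, `optimizer`, `loss_func` and `data_loader` are already defined); this is the generic manual pattern, not MLPug's API:
+
+```python
+# Accumulate gradients over several micro-batches so the effective batch
+# size is accumulation_steps * micro_batch_size, while only one
+# micro-batch occupies GPU memory at a time.
+accumulation_steps = 4
+
+optimizer.zero_grad()
+for step, (inputs, targets) in enumerate(data_loader):
+    loss = loss_func(model(inputs), targets)
+    # Scale the loss so the accumulated gradients average over all micro-batches
+    (loss / accumulation_steps).backward()
+
+    if (step + 1) % accumulation_steps == 0:
+        optimizer.step()
+        optimizer.zero_grad()
+```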
+
+MLPug provides a single framework with a unified API for all such training and evaluation functionality,
+independent of the machine learning library you are using. This also implies that when you switch library
+you can reuse your training code with no, or minimal, changes.
+
+## Supported deep learning libraries
+Currently, MLPug supports the following deep learning/machine learning libraries:
+
+ * PyTorch
+ * PyTorch/XLA (Training with Pytorch on TPUs)
+ * Tensorflow (in development, some features not available yet)
+
+## MLPug focus
+Although MLPug should be able to deal with any training job, its functionality is mostly focused on
+training large models on large datasets using limited hardware (GPU or TPU) resources and memory.
+
+## Almost at version 0.1!
+MLPug is still in development. If you are having trouble using MLPug for your use case, or
+if you have found a bug, please file an issue.
+
+## Contents
+[Installing MLPug](#installing-mlpug) \
+\
+[Hello World](#hello-world) ([PT](#hello-world-with-pytorch) |
+[XLA](#hello-world-with-pytorchxla) |
+[TF](#hello-world-with-tensorflow))
+
+[Feature parity list](#feature-parity-list)
+\
+\
+\
+The following sections are documentation **ToDo's**, but provide insight into MLPug's features: \
+[The `logs` object](#the-logs-object) \
+\
+[Callbacks and the training life cycle](#callbacks-and-the-training-life-cycle) \
+\
+[Progress Logging](#progress-logging) \
+\
+[Model components vs Training model](#model-components-vs-training-model) \
+\
+[Distributed training](#distributed-training) \
+\
+[Checkpoint management](#checkpoint-management) \
+      [Using the CheckpointManager](#using-the-checkpointmanager) \
+      [Using training checkpoints](#using-training-checkpoints) \
+      [Using model checkpoints](#using-model-checkpoints) \
+      [Checkpointing on error or interrupt](#checkpointing-on-error-or-interrupt) \
+\
+[MLPug metric evaluators](#mlpug-metric-evaluators) \
+      [Auxiliary batch training results](#auxiliary-batch-training-results) \
+      [Calculating custom metrics](#calculating-custom-metrics) \
+      [Conditional computation of metrics](#conditional-computation-of-metrics) \
+\
+[Batch chunking, dealing with GPU memory limits](#batch-chunking-dealing-with-gpu-memory-limits) \
+      [Gradient Accumulation](#gradient-accumulation) \
+      [Chunked Metric Computation](#chunked-metric-computation) \
+\
+[Using Tensorboard](#using-tensorboard) \
+      [Tensorboard made easy with AutoTensorboard](#tensorboard-made-easy-with-auto-tensorboard) \
+      [More fine grained control](#more-fine-grained-control) \
+\
+[Learning Rate Scheduling](#learning-rate-scheduling) \
+\
+[Multi GPU training](#multi-gpu-training) \
+\
+[Mixed Precision Training](#mixed-precision-training) \
+\
+[CUDA Memory tools](#cuda-memory-tools) \
+\
+[Using multiple optimizers](#using-multiple-optimizers)
+
+## Installing MLPug
+Please ensure that you are using Python 3.7+.
+
+Install as follows:
+```
+pip install mlpug
+```
+
+### Usage with PyTorch
+To use MLPug with PyTorch, you will need to install it:
+```
+pip install torch torchvision
+```
+
+### Usage with Tensorflow
+To use MLPug with Tensorflow, you will need to install it:
+```
+pip install tensorflow
+```
+
+## Hello World!
+This is the Hello World of training with MLPug. You will see that the usage of MLPug with Pytorch,
+Pytorch/XLA and Tensorflow is very similar.
+
+For details, please see:
+
+ * [pytorch/hello_world.py](mlpug/examples/documentation/pytorch/hello_world.py),
+
+ * [pytorch/xla/hello_world.py](mlpug/examples/documentation/pytorch/xla/hello_world.py),
+
+ * [tensorflow/hello_world.py](mlpug/examples/documentation/tensorflow/hello_world.py) and
+[tensorflow/hello_world_not_eager.py](mlpug/examples/documentation/tensorflow/hello_world_not_eager.py)
+
+You can download and run these examples (for XLA you need to use a TPU on Google Cloud, or use Google Colab).
+
+When reading through the explanation below, you might still have questions about the why and how of
+training with MLPug; however, I will expand the MLPug documentation soon to give you better insight.
+
+### 'Hello World' with PyTorch
+To use MLPug with Pytorch
+```python
+import mlpug.pytorch as mlp
+```
+
+Before we can start training we need an iterable dataset that can provide our training batches.
+
+```python
+training_dataset = torch.utils.data.DataLoader(training_data,
+                                               batch_size=batch_size,
+                                               shuffle=False,
+                                               num_workers=3)
+```
+
+... and a model we want to train
+```python
+classifier = torch.nn.Sequential(
+    torch.nn.Flatten(),
+    torch.nn.Linear(784, 128),
+    torch.nn.ReLU(),
+    torch.nn.Linear(128, 10))
+```
+
+MLPug needs a way to evaluate the loss of the model. One way to do that is to define a `TrainModel` that
+outputs the loss
+```python
+class TrainModel(torch.nn.Module):
+    def __init__(self, classifier):
+        super(TrainModel, self).__init__()
+
+        self.classifier = classifier
+        self.loss_func = torch.nn.CrossEntropyLoss()
+
+    def forward(self, batch_data, evaluate_settings, inference_mode=None):
+        images, true_labels = batch_data
+
+        logits = self.classifier(images)
+        return self.loss_func(logits, true_labels)
+
+train_model = TrainModel(classifier)
+```
+
+To train the model we will also need an optimizer
+```python
+optimizer = torch.optim.Adam(classifier.parameters(), eps=1e-7)
+```
+
+To start training with MLPug, we need to create a `Trainer`, which will be used by a `TrainingManager`.
+```python
+trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer, model_components=classifier)
+```
+
+MLPug uses a callback system that allows you to customize and extend the training functionality.
+The list of callback instances you provide to the `TrainingManager` will be called, through hooks, at
+different stages of the training process.
+```python
+# At minimum you want to log the loss in the training progress
+# By default the batch loss and the moving average of the loss are calculated and logged
+loss_evaluator = mlp.evaluation.MetricEvaluator(trainer=trainer)
+callbacks = [
+    mlp.callbacks.TrainingMetricsLogger(metric_evaluator=loss_evaluator),
+    # Calculate validation loss only once per epoch over the whole dataset
+    mlp.callbacks.TestMetricsLogger(validation_dataset,
+                                    'validation',
+                                    metric_evaluator=loss_evaluator,
+                                    batch_level=False),
+    mlp.callbacks.LogProgress(log_period=progress_log_period,
+                              set_names=['training', 'validation']),
+]
+```
+
+The `TrainingMetricsLogger` and the `TestMetricsLogger` callback instances log training and validation set loss values
+in a `logs` object that is passed through all callbacks during training. The `LogProgress` callback instance logs the
+metric values stored in the received `logs` object.
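+
+As a purely hypothetical sketch of how a custom callback could read from the `logs` object (the base class, hook name, and `logs` structure below are illustrative assumptions, not verified MLPug API; see the MLPug source for the actual callback hooks):
+
+```python
+class ValidationLossAlert(mlp.callbacks.Callback):  # base class name assumed
+    def on_epoch_completed(self, logs):  # hook name assumed
+        # Read back a metric stored earlier by the logger callbacks;
+        # the nested structure of `logs` is assumed for illustration.
+        validation_loss = logs.get("validation", {}).get("loss")
+        if validation_loss is not None and validation_loss > 1.0:
+            print(f"High validation loss: {validation_loss:.3f}")
+        return True  # assumed convention: report success to the manager
+```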
+
+We can now instantiate the `TrainingManager` and pass it the `trainer`.
+```python
+manager = mlp.trainers.TrainingManager(trainer,
+                                       training_dataset,
+                                       num_epochs=num_epochs,
+                                       callbacks=callbacks)
+```
+
+Before we can start training we still have to provide the `train_model` to the trainer.
+```python
+trainer.set_training_model(train_model)
+```
+
+The final step is to actually start training:
+```python
+manager.start_training()
+```
+
+Running `pytorch/hello_world.py` finishes like this:
+```text
+###############################################################################
+Epoch 9/9 READY - Duration 0:00:08
+Moving average:
+training : loss 0.238.
+
+Computed over dataset:
+validation : loss 0.346.
+
+
+
+INFO : TrainingManager::_train : Training completed. All good! ❤️
+
+Using the classifier ...
+real label = 9, predicted label = 9
+```
+
+### 'Hello World' with PyTorch/XLA
+
+The Hello World example with PyTorch/XLA is largely the same as with [PyTorch](#hello-world-with-pytorch). There are only
+two small differences.
+
+To use MLPug with Pytorch/XLA, load the correct backend
+```python
+import mlpug.pytorch.xla as mlp
+```
+
+Load your model on a TPU core:
+```python
+import torch_xla.core.xla_model as xm
+
+...
+
+device = xm.xla_device()
+
+train_model = TrainModel(classifier, device)
+classifier.to(device)
+```
+
+### 'Hello World' with Tensorflow
+Below we will focus only on the minor differences between using MLPug with [PyTorch](#hello-world-with-pytorch) and Tensorflow.
+
+To use MLPug with Tensorflow
+```python
+import mlpug.tensorflow as mlp
+```
+
+The only real difference is that, for Tensorflow, you can specify if the trainer needs to run in eager mode or not.
+If not, you need to specify the input `batch_data_signature`.
+```python
+trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer,
+                                      model_components=classifier,
+                                      eager_mode=True)
+```
+
+```python
+trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer,
+                                      model_components=classifier,
+                                      batch_data_signature=(tf.TensorSpec(shape=(None, 28, 28), dtype=tf.float64),
+                                                            tf.TensorSpec(shape=(None,), dtype=tf.uint8),))
+```
+When you run [tensorflow/hello_world.py](mlpug/examples/documentation/tensorflow/hello_world.py) and
+[tensorflow/hello_world_not_eager.py](mlpug/examples/documentation/tensorflow/hello_world_not_eager.py) you will see
+that when not running in eager mode, training is much faster.
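+
+The speedup comes from Tensorflow compiling the training step into a graph instead of executing it eagerly. A standalone sketch of the mechanism, independent of MLPug:
+
+```python
+import time
+
+import tensorflow as tf
+
+def step(x):
+    # Stand-in for one training step's worth of computation
+    return tf.reduce_sum(tf.square(x))
+
+graph_step = tf.function(step)  # traced and compiled into a graph on first call
+
+x = tf.random.normal((1024, 1024))
+graph_step(x)  # warm-up call triggers tracing
+
+for fn, name in ((step, "eager"), (graph_step, "graph")):
+    start = time.perf_counter()
+    for _ in range(100):
+        fn(x)
+    print(f"{name}: {time.perf_counter() - start:.3f}s")
+```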
+
+Running `tensorflow/hello_world.py` finishes like this:
+```text
+###############################################################################
+Epoch 9/9 READY - Duration 0:00:15
+Moving average:
+training : loss 0.229.
+
+Computed over dataset:
+validation : loss 0.370.
+
+
+
+INFO : TrainingManager::_train : Training completed. All good! ❤️
+
+Using the classifier ...
+real label = 9, predicted label = 9
+```
+
+Running `tensorflow/hello_world_not_eager.py` finishes like this:
+```text
+###############################################################################
+Epoch 9/9 READY - Duration 0:00:06
+Moving average:
+training : loss 0.229.
+
+Computed over dataset:
+validation : loss 0.370.
+
+
+
+INFO : TrainingManager::_train : Training completed. All good! ❤️
+
+Using the classifier ...
+real label = 9, predicted label = 9
+```
+
+Note the difference in epoch duration!
+
+
+## Feature parity list
+
+
+| Feature | PyTorch | PyTorch/XLA | Tensorflow | JAX | Comments |
+|-----------------------------------------------|-------------|-------------|-------------|-------------|----------------------------------|
+| Callbacks and training life cycle | ✓ | ✓ | ✓ | | |
+| Progress Logging | ✓ | ✓ | ✓ | | |
+| Distributed training | ✓ | ✓ | ✓ | | Both multi-GPU and multi-TPU support for PyTorch and TF. TPU training with TF is untested |
+| Distributed evaluation                        | ✓           | ✓           | ✓           |             | Both multi-GPU and multi-TPU support for PyTorch and TF. Evaluation on TPU with TF is untested |
+| Model and training checkpoint management | ✓ | ✓ | ✓ | | |
+| Custom metric evaluation | ✓ | ✓ | ✓ | | |
+| Conditional evaluation of metrics | ✓ | ✓ | ✓ | | |
+| Batch Chunking: gradient accumulation | ✓ | ✓ | ✓ | | |
+| Batch Chunking: chunked evaluation of metrics | ✓ | ✓ | ✓ | | |
+| Tensorboard support | ✓ | ✓ | ✓ | | Might be refactored |
+| Learning Rate scheduling | ✓ | ✓ | ✓ | | Might be refactored |
+| Mixed Precision Training | ✓ | ❌ | ~ | | Should work with TF, but no specific support |
+| Using multiple optimizers | ✓ | ✓ | ✓ | | |
+| Multi-task training | ~ | ~ | ~ | | No support yet, but can be done when only one DataLoader is required |
+
+
+
+
+
+%package -n python3-mlpug
+Summary: A machine learning library agnostic framework for model training and evaluation
+Provides: python-mlpug
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-mlpug
+# MLPug
+MLPug is a Machine Learning library agnostic framework for model training and evaluation. A lot of the functionality you need to train and evaluate your model is independent of the ML library you're using, e.g. PyTorch or Tensorflow. MLPug provides a single framework with a unified API for all such training and evaluation functionality, independent of the ML library you are using.
+
+**Thus, when switching ML library, you don't have to learn a new training API and you can reuse your own training code with no, or minimal, change! 🤩🎉**
+
+## Dive right in!
+
+### Run the repository examples
+
+You can find the example code [here](mlpug/examples/documentation/).
+How MLPug is used in the examples is explained further [here](#hello-world-with-pytorch).
+
+Clone the MLPug repo:
+
+```
+git clone https://github.com/nuhame/mlpug.git
+```
+
+#### MLPug with PyTorch
+To run the PyTorch examples, first install PyTorch and use Python >= 3.7.
+```
+cd mlpug
+
+# MLPug Hello World example
+python mlpug/examples/documentation/pytorch/hello_world.py
+
+# MLPug Fashion MNIST example
+# Run `fashion_mnist.py -h` for options
+python mlpug/examples/documentation/pytorch/fashion_mnist.py
+```
+
+There are similar [examples for using MLPug with PyTorch/XLA](mlpug/examples/documentation/pytorch/xla) (Training with Pytorch on TPUs).
+
+#### MLPug with Tensorflow
+To run the Tensorflow examples, first install Tensorflow and use Python >= 3.7.
+```
+cd mlpug
+
+# MLPug Hello World example
+# Run hello_world.py or hello_world_not_eager.py
+python mlpug/examples/documentation/tensorflow/hello_world.py
+
+# MLPug Fashion MNIST example
+# Run `fashion_mnist.py -h` for options
+python mlpug/examples/documentation/tensorflow/fashion_mnist.py
+```
+### Use MLPug in your own project
+
+```
+pip install mlpug
+```
+
+```Python
+# Using MLPug with PyTorch
+import mlpug.pytorch as mlp
+```
+
+```Python
+# Using MLPug with PyTorch/XLA (Training with Pytorch on TPUs)
+import mlpug.pytorch.xla as mlp
+```
+
+```Python
+# Using MLPug with Tensorflow
+import mlpug.tensorflow as mlp
+```
+
+# What is MLPug?
+MLPug is a machine learning library agnostic framework for model training and evaluation.
+
+A lot of the functionality you need to train and evaluate your machine learning model is
+independent of the machine learning library you're using, e.g. PyTorch and Tensorflow.
+For instance,
+
+ * checkpoint management,
+ * evaluation of validation set loss and other custom metrics,
+ * progress logging,
+ * progress visualization using Tensorboard,
+ * the use of gradient accumulation to train with large batch sizes using limited GPU memory, etc.
+
+You need such functionality no matter what machine learning framework you are using.
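+
+For example, gradient accumulation done by hand in PyTorch looks roughly like the sketch below (assuming `model`, `optimizer`, `loss_func` and `data_loader` are already defined); this is the generic manual pattern, not MLPug's API:
+
+```python
+# Accumulate gradients over several micro-batches so the effective batch
+# size is accumulation_steps * micro_batch_size, while only one
+# micro-batch occupies GPU memory at a time.
+accumulation_steps = 4
+
+optimizer.zero_grad()
+for step, (inputs, targets) in enumerate(data_loader):
+    loss = loss_func(model(inputs), targets)
+    # Scale the loss so the accumulated gradients average over all micro-batches
+    (loss / accumulation_steps).backward()
+
+    if (step + 1) % accumulation_steps == 0:
+        optimizer.step()
+        optimizer.zero_grad()
+```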
+
+MLPug provides a single framework with a unified API for all such training and evaluation functionality,
+independent of the machine learning library you are using. This also implies that when you switch library
+you can reuse your training code with no, or minimal, changes.
+
+## Supported deep learning libraries
+Currently, MLPug supports the following deep learning/machine learning libraries:
+
+ * PyTorch
+ * PyTorch/XLA (Training with Pytorch on TPUs)
+ * Tensorflow (in development, some features not available yet)
+
+## MLPug focus
+Although MLPug should be able to deal with any training job, its functionality is mostly focused on
+training large models on large datasets using limited hardware (GPU or TPU) resources and memory.
+
+## Almost at version 0.1!
+MLPug is still in development. If you are having trouble using MLPug for your use case, or
+if you have found a bug, please file an issue.
+
+## Contents
+[Installing MLPug](#installing-mlpug) \
+\
+[Hello World](#hello-world) ([PT](#hello-world-with-pytorch) |
+[XLA](#hello-world-with-pytorchxla) |
+[TF](#hello-world-with-tensorflow))
+
+[Feature parity list](#feature-parity-list)
+\
+\
+\
+The following sections are documentation **ToDo's**, but provide insight into MLPug's features: \
+[The `logs` object](#the-logs-object) \
+\
+[Callbacks and the training life cycle](#callbacks-and-the-training-life-cycle) \
+\
+[Progress Logging](#progress-logging) \
+\
+[Model components vs Training model](#model-components-vs-training-model) \
+\
+[Distributed training](#distributed-training) \
+\
+[Checkpoint management](#checkpoint-management) \
+      [Using the CheckpointManager](#using-the-checkpointmanager) \
+      [Using training checkpoints](#using-training-checkpoints) \
+      [Using model checkpoints](#using-model-checkpoints) \
+      [Checkpointing on error or interrupt](#checkpointing-on-error-or-interrupt) \
+\
+[MLPug metric evaluators](#mlpug-metric-evaluators) \
+      [Auxiliary batch training results](#auxiliary-batch-training-results) \
+      [Calculating custom metrics](#calculating-custom-metrics) \
+      [Conditional computation of metrics](#conditional-computation-of-metrics) \
+\
+[Batch chunking, dealing with GPU memory limits](#batch-chunking-dealing-with-gpu-memory-limits) \
+      [Gradient Accumulation](#gradient-accumulation) \
+      [Chunked Metric Computation](#chunked-metric-computation) \
+\
+[Using Tensorboard](#using-tensorboard) \
+      [Tensorboard made easy with AutoTensorboard](#tensorboard-made-easy-with-auto-tensorboard) \
+      [More fine grained control](#more-fine-grained-control) \
+\
+[Learning Rate Scheduling](#learning-rate-scheduling) \
+\
+[Multi GPU training](#multi-gpu-training) \
+\
+[Mixed Precision Training](#mixed-precision-training) \
+\
+[CUDA Memory tools](#cuda-memory-tools) \
+\
+[Using multiple optimizers](#using-multiple-optimizers)
+
+## Installing MLPug
+Please ensure that you are using Python 3.7+.
+
+Install as follows:
+```
+pip install mlpug
+```
+
+### Usage with PyTorch
+To use MLPug with PyTorch, you will need to install it:
+```
+pip install torch torchvision
+```
+
+### Usage with Tensorflow
+To use MLPug with Tensorflow, you will need to install it:
+```
+pip install tensorflow
+```
+
+## Hello World!
+This is the Hello World of training with MLPug. You will see that the usage of MLPug with Pytorch,
+Pytorch/XLA and Tensorflow is very similar.
+
+For details, please see:
+
+ * [pytorch/hello_world.py](mlpug/examples/documentation/pytorch/hello_world.py),
+
+ * [pytorch/xla/hello_world.py](mlpug/examples/documentation/pytorch/xla/hello_world.py),
+
+ * [tensorflow/hello_world.py](mlpug/examples/documentation/tensorflow/hello_world.py) and
+[tensorflow/hello_world_not_eager.py](mlpug/examples/documentation/tensorflow/hello_world_not_eager.py)
+
+You can download and run these examples (for XLA you need to use a TPU on Google Cloud, or use Google Colab).
+
+When reading through the explanation below, you might still have questions about the why and how of
+training with MLPug; however, I will expand the MLPug documentation soon to give you better insight.
+
+### 'Hello World' with PyTorch
+To use MLPug with Pytorch
+```python
+import mlpug.pytorch as mlp
+```
+
+Before we can start training we need an iterable dataset that can provide our training batches.
+
+```python
+training_dataset = torch.utils.data.DataLoader(training_data,
+                                               batch_size=batch_size,
+                                               shuffle=False,
+                                               num_workers=3)
+```
+
+... and a model we want to train
+```python
+classifier = torch.nn.Sequential(
+    torch.nn.Flatten(),
+    torch.nn.Linear(784, 128),
+    torch.nn.ReLU(),
+    torch.nn.Linear(128, 10))
+```
+
+MLPug needs a way to evaluate the loss of the model. One way to do that is to define a `TrainModel` that
+outputs the loss
+```python
+class TrainModel(torch.nn.Module):
+    def __init__(self, classifier):
+        super(TrainModel, self).__init__()
+
+        self.classifier = classifier
+        self.loss_func = torch.nn.CrossEntropyLoss()
+
+    def forward(self, batch_data, evaluate_settings, inference_mode=None):
+        images, true_labels = batch_data
+
+        logits = self.classifier(images)
+        return self.loss_func(logits, true_labels)
+
+train_model = TrainModel(classifier)
+```
+
+To train the model we will also need an optimizer
+```python
+optimizer = torch.optim.Adam(classifier.parameters(), eps=1e-7)
+```
+
+To start training with MLPug, we need to create a `Trainer`, which will be used by a `TrainingManager`.
+```python
+trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer, model_components=classifier)
+```
+
+MLPug uses a callback system that allows you to customize and extend the training functionality.
+The list of callback instances you provide to the `TrainingManager` will be called, through hooks, at
+different stages of the training process.
+```python
+# At minimum you want to log the loss in the training progress
+# By default the batch loss and the moving average of the loss are calculated and logged
+loss_evaluator = mlp.evaluation.MetricEvaluator(trainer=trainer)
+callbacks = [
+    mlp.callbacks.TrainingMetricsLogger(metric_evaluator=loss_evaluator),
+    # Calculate validation loss only once per epoch over the whole dataset
+    mlp.callbacks.TestMetricsLogger(validation_dataset,
+                                    'validation',
+                                    metric_evaluator=loss_evaluator,
+                                    batch_level=False),
+    mlp.callbacks.LogProgress(log_period=progress_log_period,
+                              set_names=['training', 'validation']),
+]
+```
+
+The `TrainingMetricsLogger` and the `TestMetricsLogger` callback instances log training and validation set loss values
+in a `logs` object that is passed through all callbacks during training. The `LogProgress` callback instance logs the
+metric values stored in the received `logs` object.
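+
+As a purely hypothetical sketch of how a custom callback could read from the `logs` object (the base class, hook name, and `logs` structure below are illustrative assumptions, not verified MLPug API; see the MLPug source for the actual callback hooks):
+
+```python
+class ValidationLossAlert(mlp.callbacks.Callback):  # base class name assumed
+    def on_epoch_completed(self, logs):  # hook name assumed
+        # Read back a metric stored earlier by the logger callbacks;
+        # the nested structure of `logs` is assumed for illustration.
+        validation_loss = logs.get("validation", {}).get("loss")
+        if validation_loss is not None and validation_loss > 1.0:
+            print(f"High validation loss: {validation_loss:.3f}")
+        return True  # assumed convention: report success to the manager
+```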
+
+We can now instantiate the `TrainingManager` and pass it the `trainer`.
+```python
+manager = mlp.trainers.TrainingManager(trainer,
+                                       training_dataset,
+                                       num_epochs=num_epochs,
+                                       callbacks=callbacks)
+```
+
+Before we can start training we still have to provide the `train_model` to the trainer.
+```python
+trainer.set_training_model(train_model)
+```
+
+The final step is to actually start training:
+```python
+manager.start_training()
+```
+
+Running `pytorch/hello_world.py` finishes like this:
+```text
+###############################################################################
+Epoch 9/9 READY - Duration 0:00:08
+Moving average:
+training : loss 0.238.
+
+Computed over dataset:
+validation : loss 0.346.
+
+
+
+INFO : TrainingManager::_train : Training completed. All good! ❤️
+
+Using the classifier ...
+real label = 9, predicted label = 9
+```
+
+### 'Hello World' with PyTorch/XLA
+
+The Hello World example with PyTorch/XLA is largely the same as with [PyTorch](#hello-world-with-pytorch). There are only
+two small differences.
+
+To use MLPug with Pytorch/XLA, load the correct backend
+```python
+import mlpug.pytorch.xla as mlp
+```
+
+Load your model on a TPU core:
+```python
+import torch_xla.core.xla_model as xm
+
+...
+
+device = xm.xla_device()
+
+train_model = TrainModel(classifier, device)
+classifier.to(device)
+```
+
+### 'Hello World' with Tensorflow
+Below we will focus only on the minor differences between using MLPug with [PyTorch](#hello-world-with-pytorch) and Tensorflow.
+
+To use MLPug with Tensorflow
+```python
+import mlpug.tensorflow as mlp
+```
+
+The only real difference is that, for Tensorflow, you can specify if the trainer needs to run in eager mode or not.
+If not, you need to specify the input `batch_data_signature`.
+```python
+trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer,
+                                      model_components=classifier,
+                                      eager_mode=True)
+```
+
+```python
+trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer,
+                                      model_components=classifier,
+                                      batch_data_signature=(tf.TensorSpec(shape=(None, 28, 28), dtype=tf.float64),
+                                                            tf.TensorSpec(shape=(None,), dtype=tf.uint8),))
+```
+When you run [tensorflow/hello_world.py](mlpug/examples/documentation/tensorflow/hello_world.py) and
+[tensorflow/hello_world_not_eager.py](mlpug/examples/documentation/tensorflow/hello_world_not_eager.py) you will see
+that when not running in eager mode, training is much faster.
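+
+The speedup comes from Tensorflow compiling the training step into a graph instead of executing it eagerly. A standalone sketch of the mechanism, independent of MLPug:
+
+```python
+import time
+
+import tensorflow as tf
+
+def step(x):
+    # Stand-in for one training step's worth of computation
+    return tf.reduce_sum(tf.square(x))
+
+graph_step = tf.function(step)  # traced and compiled into a graph on first call
+
+x = tf.random.normal((1024, 1024))
+graph_step(x)  # warm-up call triggers tracing
+
+for fn, name in ((step, "eager"), (graph_step, "graph")):
+    start = time.perf_counter()
+    for _ in range(100):
+        fn(x)
+    print(f"{name}: {time.perf_counter() - start:.3f}s")
+```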
+
+Running `tensorflow/hello_world.py` finishes like this:
+```text
+###############################################################################
+Epoch 9/9 READY - Duration 0:00:15
+Moving average:
+training : loss 0.229.
+
+Computed over dataset:
+validation : loss 0.370.
+
+
+
+INFO : TrainingManager::_train : Training completed. All good! ❤️
+
+Using the classifier ...
+real label = 9, predicted label = 9
+```
+
+Running `tensorflow/hello_world_not_eager.py` finishes like this:
+```text
+###############################################################################
+Epoch 9/9 READY - Duration 0:00:06
+Moving average:
+training : loss 0.229.
+
+Computed over dataset:
+validation : loss 0.370.
+
+
+
+INFO : TrainingManager::_train : Training completed. All good! ❤️
+
+Using the classifier ...
+real label = 9, predicted label = 9
+```
+
+Note the difference in epoch duration!
+
+
+## Feature parity list
+
+
+| Feature | PyTorch | PyTorch/XLA | Tensorflow | JAX | Comments |
+|-----------------------------------------------|-------------|-------------|-------------|-------------|----------------------------------|
+| Callbacks and training life cycle | ✓ | ✓ | ✓ | | |
+| Progress Logging | ✓ | ✓ | ✓ | | |
+| Distributed training | ✓ | ✓ | ✓ | | Both multi-GPU and multi-TPU support for PyTorch and TF. TPU training with TF is untested |
+| Distributed evaluation                        | ✓           | ✓           | ✓           |             | Both multi-GPU and multi-TPU support for PyTorch and TF. Evaluation on TPU with TF is untested |
+| Model and training checkpoint management | ✓ | ✓ | ✓ | | |
+| Custom metric evaluation | ✓ | ✓ | ✓ | | |
+| Conditional evaluation of metrics | ✓ | ✓ | ✓ | | |
+| Batch Chunking: gradient accumulation | ✓ | ✓ | ✓ | | |
+| Batch Chunking: chunked evaluation of metrics | ✓ | ✓ | ✓ | | |
+| Tensorboard support | ✓ | ✓ | ✓ | | Might be refactored |
+| Learning Rate scheduling | ✓ | ✓ | ✓ | | Might be refactored |
+| Mixed Precision Training | ✓ | ❌ | ~ | | Should work with TF, but no specific support |
+| Using multiple optimizers | ✓ | ✓ | ✓ | | |
+| Multi-task training | ~ | ~ | ~ | | No support yet, but can be done when only one DataLoader is required |
+
+
+
+
+
+%package help
+Summary: Development documents and examples for mlpug
+Provides: python3-mlpug-doc
+%description help
+# MLPug
+MLPug is a Machine Learning library agnostic framework for model training and evaluation. A lot of the functionality you need to train and evaluate your model is independent of the ML library you're using, e.g. PyTorch or Tensorflow. MLPug provides a single framework with a unified API for all such training and evaluation functionality, independent of the ML library you are using.
+
+**Thus, when switching ML library, you don't have to learn a new training API and you can reuse your own training code with no, or minimal, change! 🤩🎉**
+
+## Dive right in!
+
+### Run the repository examples
+
+You can find the example code [here](mlpug/examples/documentation/).
+How MLPug is used in the examples is explained further [here](#hello-world-with-pytorch).
+
+Clone the MLPug repo:
+
+```
+git clone https://github.com/nuhame/mlpug.git
+```
+
+#### MLPug with PyTorch
+To run the PyTorch examples, first install PyTorch and use Python >= 3.7.
+```
+cd mlpug
+
+# MLPug Hello World example
+python mlpug/examples/documentation/pytorch/hello_world.py
+
+# MLPug Fashion MNIST example
+# Run `fashion_mnist.py -h` for options
+python mlpug/examples/documentation/pytorch/fashion_mnist.py
+```
+
+There are similar [examples for using MLPug with PyTorch/XLA](mlpug/examples/documentation/pytorch/xla) (Training with Pytorch on TPUs).
+
+#### MLPug with Tensorflow
+To run the Tensorflow examples, first install Tensorflow and use Python >= 3.7.
+```
+cd mlpug
+
+# MLPug Hello World example
+# Run hello_world.py or hello_world_not_eager.py
+python mlpug/examples/documentation/tensorflow/hello_world.py
+
+# MLPug Fashion MNIST example
+# Run `fashion_mnist.py -h` for options
+python mlpug/examples/documentation/tensorflow/fashion_mnist.py
+```
+### Use MLPug in your own project
+
+```
+pip install mlpug
+```
+
+```Python
+# Using MLPug with PyTorch
+import mlpug.pytorch as mlp
+```
+
+```Python
+# Using MLPug with PyTorch/XLA (Training with Pytorch on TPUs)
+import mlpug.pytorch.xla as mlp
+```
+
+```Python
+# Using MLPug with Tensorflow
+import mlpug.tensorflow as mlp
+```
+
+# What is MLPug?
+MLPug is a machine learning library agnostic framework for model training and evaluation.
+
+A lot of the functionality you need to train and evaluate your machine learning model is
+independent of the machine learning library you're using, e.g. PyTorch and Tensorflow.
+For instance,
+
+ * checkpoint management,
+ * evaluation of validation set loss and other custom metrics,
+ * progress logging,
+ * progress visualization using Tensorboard,
+ * the use of gradient accumulation to train with large batch sizes using limited GPU memory, etc.
+
+You need such functionality no matter what machine learning framework you are using.
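+
+For example, gradient accumulation done by hand in PyTorch looks roughly like the sketch below (assuming `model`, `optimizer`, `loss_func` and `data_loader` are already defined); this is the generic manual pattern, not MLPug's API:
+
+```python
+# Accumulate gradients over several micro-batches so the effective batch
+# size is accumulation_steps * micro_batch_size, while only one
+# micro-batch occupies GPU memory at a time.
+accumulation_steps = 4
+
+optimizer.zero_grad()
+for step, (inputs, targets) in enumerate(data_loader):
+    loss = loss_func(model(inputs), targets)
+    # Scale the loss so the accumulated gradients average over all micro-batches
+    (loss / accumulation_steps).backward()
+
+    if (step + 1) % accumulation_steps == 0:
+        optimizer.step()
+        optimizer.zero_grad()
+```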
+
+MLPug provides a single framework with a unified API for all such training and evaluation functionality,
+independent of the machine learning library you are using. This also implies that when you switch library
+you can reuse your training code with no, or minimal, changes.
+
+## Supported deep learning libraries
+Currently, MLPug supports the following deep learning/machine learning libraries:
+
+ * PyTorch
+ * PyTorch/XLA (Training with Pytorch on TPUs)
+ * Tensorflow (in development, some features not available yet)
+
+## MLPug focus
+Although MLPug should be able to deal with any training job, its functionality is mostly focused on
+training large models on large datasets using limited hardware (GPU or TPU) resources and memory.
+
+## Almost at version 0.1!
+MLPug is still in development. If you are having trouble using MLPug for your use case, or
+if you have found a bug, please file an issue.
+
+## Contents
+[Installing MLPug](#installing-mlpug) \
+\
+[Hello World](#hello-world) ([PT](#hello-world-with-pytorch) |
+[XLA](#hello-world-with-pytorchxla) |
+[TF](#hello-world-with-tensorflow))
+
+[Feature parity list](#feature-parity-list)
+\
+\
+\
+The following sections are documentation **ToDo's**, but provide insight into MLPug's features: \
+[The `logs` object](#the-logs-object) \
+\
+[Callbacks and the training life cycle](#callbacks-and-the-training-life-cycle) \
+\
+[Progress Logging](#progress-logging) \
+\
+[Model components vs Training model](#model-components-vs-training-model) \
+\
+[Distributed training](#distributed-training) \
+\
+[Checkpoint management](#checkpoint-management) \
+      [Using the CheckpointManager](#using-the-checkpointmanager) \
+      [Using training checkpoints](#using-training-checkpoints) \
+      [Using model checkpoints](#using-model-checkpoints) \
+      [Checkpointing on error or interrupt](#checkpointing-on-error-or-interrupt) \
+\
+[MLPug metric evaluators](#mlpug-metric-evaluators) \
+      [Auxiliary batch training results](#auxiliary-batch-training-results) \
+      [Calculating custom metrics](#calculating-custom-metrics) \
+      [Conditional computation of metrics](#conditional-computation-of-metrics) \
+\
+[Batch chunking, dealing with GPU memory limits](#batch-chunking-dealing-with-gpu-memory-limits) \
+      [Gradient Accumulation](#gradient-accumulation) \
+      [Chunked Metric Computation](#chunked-metric-computation) \
+\
+[Using Tensorboard](#using-tensorboard) \
+      [Tensorboard made easy with AutoTensorboard](#tensorboard-made-easy-with-auto-tensorboard) \
+      [More fine grained control](#more-fine-grained-control) \
+\
+[Learning Rate Scheduling](#learning-rate-scheduling) \
+\
+[Multi GPU training](#multi-gpu-training) \
+\
+[Mixed Precision Training](#mixed-precision-training) \
+\
+[CUDA Memory tools](#cuda-memory-tools) \
+\
+[Using multiple optimizers](#using-multiple-optimizers)
+
+## Installing MLPug
+Please ensure that you are using Python 3.7+.
+
+Install as follows:
+```
+pip install mlpug
+```
+
+### Usage with PyTorch
+To use MLPug with PyTorch, you will need to install it:
+```
+pip install torch torchvision
+```
+
+### Usage with Tensorflow
+To use MLPug with Tensorflow, you will need to install it:
+```
+pip install tensorflow
+```
+
+## Hello World!
+This is the Hello World of training with MLPug. You will see that the usage of MLPug with Pytorch,
+Pytorch/XLA and Tensorflow is very similar.
+
+For details, please see:
+
+ * [pytorch/hello_world.py](mlpug/examples/documentation/pytorch/hello_world.py),
+
+ * [pytorch/xla/hello_world.py](mlpug/examples/documentation/pytorch/xla/hello_world.py),
+
+ * [tensorflow/hello_world.py](mlpug/examples/documentation/tensorflow/hello_world.py) and
+[tensorflow/hello_world_not_eager.py](mlpug/examples/documentation/tensorflow/hello_world_not_eager.py)
+
+You can download and run these examples (for XLA you need to use a TPU on Google Cloud, or use Google Colab).
+
+When reading through the explanation below, you might still have questions about the why and how of
+training with MLPug; however, I will expand the MLPug documentation soon to give you better insight.
+
+### 'Hello World' with PyTorch
+To use MLPug with Pytorch
+```python
+import mlpug.pytorch as mlp
+```
+
+Before we can start training we need an iterable dataset that can provide our training batches.
+
+```python
+training_dataset = torch.utils.data.DataLoader(training_data,
+                                               batch_size=batch_size,
+                                               shuffle=False,
+                                               num_workers=3)
+```
+
+... and a model we want to train
+```python
+classifier = torch.nn.Sequential(
+    torch.nn.Flatten(),
+    torch.nn.Linear(784, 128),
+    torch.nn.ReLU(),
+    torch.nn.Linear(128, 10))
+```
+
+MLPug needs a way to evaluate the loss of the model. One way to do that is to define a `TrainModel` that
+outputs the loss
+```python
+class TrainModel(torch.nn.Module):
+    def __init__(self, classifier):
+        super(TrainModel, self).__init__()
+
+        self.classifier = classifier
+        self.loss_func = torch.nn.CrossEntropyLoss()
+
+    def forward(self, batch_data, evaluate_settings, inference_mode=None):
+        images, true_labels = batch_data
+
+        logits = self.classifier(images)
+        return self.loss_func(logits, true_labels)
+
+train_model = TrainModel(classifier)
+```
+
+To train the model we will also need an optimizer
+```python
+optimizer = torch.optim.Adam(classifier.parameters(), eps=1e-7)
+```
+
+To start training with MLPug, we need to create a `Trainer`, which will be used by a `TrainingManager`.
+```python
+trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer, model_components=classifier)
+```
+
+MLPug uses a callback system that allows you to customize and extend the training functionality.
+The list of callback instances you provide to the `TrainingManager` will be called, through hooks, at
+different stages of the training process.
+```python
+# At minimum you want to log the loss in the training progress
+# By default the batch loss and the moving average of the loss are calculated and logged
+loss_evaluator = mlp.evaluation.MetricEvaluator(trainer=trainer)
+callbacks = [
+    mlp.callbacks.TrainingMetricsLogger(metric_evaluator=loss_evaluator),
+    # Calculate validation loss only once per epoch over the whole dataset
+    mlp.callbacks.TestMetricsLogger(validation_dataset,
+                                    'validation',
+                                    metric_evaluator=loss_evaluator,
+                                    batch_level=False),
+    mlp.callbacks.LogProgress(log_period=progress_log_period,
+                              set_names=['training', 'validation']),
+]
+```
+
+The `TrainingMetricsLogger` and the `TestMetricsLogger` callback instances log training and validation set loss values
+in a `logs` object that is passed through all callbacks during training. The `LogProgress` callback instance logs the
+metric values stored in the received `logs` object.
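+
+As a purely hypothetical sketch of how a custom callback could read from the `logs` object (the base class, hook name, and `logs` structure below are illustrative assumptions, not verified MLPug API; see the MLPug source for the actual callback hooks):
+
+```python
+class ValidationLossAlert(mlp.callbacks.Callback):  # base class name assumed
+    def on_epoch_completed(self, logs):  # hook name assumed
+        # Read back a metric stored earlier by the logger callbacks;
+        # the nested structure of `logs` is assumed for illustration.
+        validation_loss = logs.get("validation", {}).get("loss")
+        if validation_loss is not None and validation_loss > 1.0:
+            print(f"High validation loss: {validation_loss:.3f}")
+        return True  # assumed convention: report success to the manager
+```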
+
+We can now instantiate the `TrainingManager` and pass it the `trainer`.
+```python
+manager = mlp.trainers.TrainingManager(trainer,
+                                       training_dataset,
+                                       num_epochs=num_epochs,
+                                       callbacks=callbacks)
+```
+
+Before we can start training we still have to provide the `train_model` to the trainer.
+```python
+trainer.set_training_model(train_model)
+```
+
+The final step is to actually start training:
+```python
+manager.start_training()
+```
+
+Running `pytorch/hello_world.py` finishes like this:
+```text
+###############################################################################
+Epoch 9/9 READY - Duration 0:00:08
+Moving average:
+training : loss 0.238.
+
+Computed over dataset:
+validation : loss 0.346.
+
+
+
+INFO : TrainingManager::_train : Training completed. All good! ❤️
+
+Using the classifier ...
+real label = 9, predicted label = 9
+```
+
+### 'Hello World' with PyTorch/XLA
+
+The Hello World example with PyTorch/XLA is largely the same as with [PyTorch](#hello-world-with-pytorch). There are only
+two small differences.
+
+To use MLPug with Pytorch/XLA, load the correct backend
+```python
+import mlpug.pytorch.xla as mlp
+```
+
+Load your model on a TPU core:
+```python
+import torch_xla.core.xla_model as xm
+
+...
+
+device = xm.xla_device()
+
+train_model = TrainModel(classifier, device)
+classifier.to(device)
+```
+
+### 'Hello World' with Tensorflow
+Below we will focus only on the minor differences between using MLPug with [PyTorch](#hello-world-with-pytorch) and Tensorflow.
+
+To use MLPug with Tensorflow
+```python
+import mlpug.tensorflow as mlp
+```
+
+The only real difference is that, for Tensorflow, you can specify if the trainer needs to run in eager mode or not.
+If not, you need to specify the input `batch_data_signature`.
+```python
+trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer,
+                                      model_components=classifier,
+                                      eager_mode=True)
+```
+
+```python
+trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer,
+                                      model_components=classifier,
+                                      batch_data_signature=(tf.TensorSpec(shape=(None, 28, 28), dtype=tf.float64),
+                                                            tf.TensorSpec(shape=(None,), dtype=tf.uint8),))
+```
+When you run [tensorflow/hello_world.py](mlpug/examples/documentation/tensorflow/hello_world.py) and
+[tensorflow/hello_world_not_eager.py](mlpug/examples/documentation/tensorflow/hello_world_not_eager.py) you will see
+that when not running in eager mode, training is much faster.
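+
+The speedup comes from Tensorflow compiling the training step into a graph instead of executing it eagerly. A standalone sketch of the mechanism, independent of MLPug:
+
+```python
+import time
+
+import tensorflow as tf
+
+def step(x):
+    # Stand-in for one training step's worth of computation
+    return tf.reduce_sum(tf.square(x))
+
+graph_step = tf.function(step)  # traced and compiled into a graph on first call
+
+x = tf.random.normal((1024, 1024))
+graph_step(x)  # warm-up call triggers tracing
+
+for fn, name in ((step, "eager"), (graph_step, "graph")):
+    start = time.perf_counter()
+    for _ in range(100):
+        fn(x)
+    print(f"{name}: {time.perf_counter() - start:.3f}s")
+```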
+
+Running `tensorflow/hello_world.py` finishes like this:
+```text
+###############################################################################
+Epoch 9/9 READY - Duration 0:00:15
+Moving average:
+training : loss 0.229.
+
+Computed over dataset:
+validation : loss 0.370.
+
+
+
+INFO : TrainingManager::_train : Training completed. All good! ❤️
+
+Using the classifier ...
+real label = 9, predicted label = 9
+```
+
+Running `tensorflow/hello_world_not_eager.py` finishes like this:
+```text
+###############################################################################
+Epoch 9/9 READY - Duration 0:00:06
+Moving average:
+training : loss 0.229.
+
+Computed over dataset:
+validation : loss 0.370.
+
+
+
+INFO : TrainingManager::_train : Training completed. All good! ❤️
+
+Using the classifier ...
+real label = 9, predicted label = 9
+```
+
+Note the difference in epoch duration!
+
+
+## Feature parity list
+
+
+| Feature | PyTorch | PyTorch/XLA | Tensorflow | JAX | Comments |
+|-----------------------------------------------|-------------|-------------|-------------|-------------|----------------------------------|
+| Callbacks and training life cycle | ✓ | ✓ | ✓ | | |
+| Progress Logging | ✓ | ✓ | ✓ | | |
+| Distributed training | ✓ | ✓ | ✓ | | Both multi-GPU and multi-TPU support for PyTorch and TF. TPU training with TF is untested |
+| Distributed evaluation                        | ✓           | ✓           | ✓           |             | Both multi-GPU and multi-TPU support for PyTorch and TF. Evaluation on TPU with TF is untested |
+| Model and training checkpoint management | ✓ | ✓ | ✓ | | |
+| Custom metric evaluation | ✓ | ✓ | ✓ | | |
+| Conditional evaluation of metrics | ✓ | ✓ | ✓ | | |
+| Batch Chunking: gradient accumulation | ✓ | ✓ | ✓ | | |
+| Batch Chunking: chunked evaluation of metrics | ✓ | ✓ | ✓ | | |
+| Tensorboard support | ✓ | ✓ | ✓ | | Might be refactored |
+| Learning Rate scheduling | ✓ | ✓ | ✓ | | Might be refactored |
+| Mixed Precision Training | ✓ | ❌ | ~ | | Should work with TF, but no specific support |
+| Using multiple optimizers | ✓ | ✓ | ✓ | | |
+| Multi-task training | ~ | ~ | ~ | | No support yet, but can be done when only one DataLoader is required |
+
+
+
+
+
+%prep
+%autosetup -n mlpug-0.0.57
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-mlpug -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 10 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.57-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..b460c22
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+d4bc06d3ad93f949e27202add5f33aa2 mlpug-0.0.57.tar.gz