-rw-r--r--   .gitignore          1
-rw-r--r--   python-rlax.spec  478
-rw-r--r--   sources             1
3 files changed, 480 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+/rlax-0.1.5.tar.gz
diff --git a/python-rlax.spec b/python-rlax.spec
new file mode 100644
index 0000000..6dc0d72
--- /dev/null
+++ b/python-rlax.spec
@@ -0,0 +1,478 @@
+%global _empty_manifest_terminate_build 0
+Name: python-rlax
+Version: 0.1.5
+Release: 1
+Summary: A library of reinforcement learning building blocks in JAX.
+License: Apache 2.0
+URL: https://github.com/deepmind/rlax
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/9b/49/486a4e55b1300c8f010240f29442d50afee498cef58f2fc73e8ba6cb6b19/rlax-0.1.5.tar.gz
+BuildArch: noarch
+
+Requires: python3-absl-py
+Requires: python3-chex
+Requires: python3-distrax
+Requires: python3-dm-env
+Requires: python3-jax
+Requires: python3-jaxlib
+Requires: python3-numpy
+
+%description
+# RLax
+
+RLax (pronounced "relax") is a library built on top of JAX that exposes
+useful building blocks for implementing reinforcement learning agents. Full
+documentation can be found at
+[rlax.readthedocs.io](https://rlax.readthedocs.io/en/latest/index.html).
+
+## Installation
+
+You can install the latest released version of RLax from PyPI via:
+
+```sh
+pip install rlax
+```
+
+or you can install the latest development version from GitHub:
+
+```sh
+pip install git+https://github.com/deepmind/rlax.git
+```
+
+All RLax code may then be just-in-time compiled for different hardware
+(e.g. CPU, GPU, TPU) using `jax.jit`.
+
+In order to run the `examples/` you will also need to clone the repo and
+install the additional requirements:
+[optax](https://github.com/deepmind/optax),
+[haiku](https://github.com/deepmind/haiku), and
+[bsuite](https://github.com/deepmind/bsuite).
+
+## Content
+
+The operations and functions provided are not complete algorithms, but
+implementations of reinforcement-learning-specific mathematical operations that
+are needed when building fully-functional agents capable of learning:
+
+* Values, including both state- and action-values;
+* Values for non-linear generalizations of the Bellman equations;
+* Return distributions, a.k.a. distributional value functions;
+* General value functions, for cumulants other than the main reward;
+* Policies, via policy gradients in both continuous and discrete action spaces.
+
+The library supports both on-policy and off-policy learning (i.e. learning from
+data sampled from a policy different from the agent's policy).
+
+See file-level and function-level doc-strings for the documentation of these
+functions and for references to the papers that introduced and/or used them.
+
+## Usage
+
+See `examples/` for examples of using some of the functions in RLax to
+implement a few simple reinforcement learning agents, and to demonstrate
+learning on BSuite's version of the Catch environment (a common unit test for
+agent development in the reinforcement learning literature).
+
+Other examples of JAX reinforcement learning agents using `rlax` can be found
+in [bsuite](https://github.com/deepmind/bsuite/tree/master/bsuite/baselines).
+
+## Background
+
+Reinforcement learning studies the problem of a learning system (the *agent*),
+which must learn to interact with the universe it is embedded in (the
+*environment*).
+
+Agent and environment interact on discrete steps. On each step the agent selects
+an *action*, and is provided in return a (partial) snapshot of the state of the
+environment (the *observation*), and a scalar feedback signal (the *reward*).
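+
+As a purely illustrative sketch of this interaction loop (added here, not taken
+from the RLax documentation), the snippet below assumes that BSuite's Catch
+environment is importable as shown and exposes the standard
+`dm_env.Environment` interface; `select_action` is a hypothetical placeholder
+standing in for a learned policy:
+
+```python
+import numpy as np
+from bsuite.environments import catch  # assumed dm_env-style Catch environment
+
+env = catch.Catch()
+num_actions = env.action_spec().num_values
+
+def select_action(observation):
+    # Placeholder policy: ignore the observation and act uniformly at random.
+    del observation
+    return np.random.randint(num_actions)
+
+timestep = env.reset()
+episode_return = 0.0
+while not timestep.last():
+    action = select_action(timestep.observation)  # the agent selects an action
+    timestep = env.step(action)                   # observation and reward come back
+    episode_return += timestep.reward             # accumulate the episode return
+```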
+
+The behaviour of the agent is characterized by a probability distribution over
+actions, conditioned on past observations of the environment (the *policy*).
+The agent seeks a policy that, from any given step, maximises the discounted
+cumulative reward that will be collected from that point onwards (the *return*).
+
+Often the agent's policy or the environment dynamics themselves are stochastic.
+In this case the return is a random variable, and the agent's optimal policy is
+typically specified more precisely as a policy that maximises the expectation
+of the return (the *value*), under the agent's and environment's stochasticity.
+
+## Reinforcement Learning Algorithms
+
+There are three prototypical families of reinforcement learning algorithms:
+
+1. those that estimate the value of states and actions, and infer a policy by
+   *inspection* (e.g. by selecting the action with the highest estimated value);
+2. those that learn a model of the environment (capable of predicting the
+   observations and rewards) and infer a policy via *planning*;
+3. those that parameterize a policy that can be directly *executed*.
+
+In any case, policies, values and models are just functions. In deep
+reinforcement learning such functions are represented by a neural network.
+In this setting, it is common to formulate reinforcement learning updates as
+differentiable pseudo-loss functions (analogously to (un-)supervised learning).
+Under automatic differentiation, the original update rule is recovered.
+
+Note, however, that the updates are only valid if the input data is sampled in
+the correct manner. For example, a policy gradient loss is only valid if the
+input trajectory is an unbiased sample from the current policy, i.e. the data
+are on-policy. The library cannot check or enforce such constraints. Links to
+papers describing how each operation is used are, however, provided in the
+functions' doc-strings.
+
+## Naming Conventions and Developer Guidelines
+
+We define functions and operations for agents interacting with a single stream
+of experience. The JAX construct `vmap` can be used to apply these same
+functions to batches (e.g. to support *replay* and *parallel* data generation).
+
+Many functions consider policies, actions, rewards and values at consecutive
+timesteps in order to compute their outputs. In this case the suffixes `_t` and
+`_tm1` are often used to clarify on which step each input was generated, e.g.:
+
+* `q_tm1`: the action value in the `source` state of a transition.
+* `a_tm1`: the action that was selected in the `source` state.
+* `r_t`: the resulting reward collected in the `destination` state.
+* `discount_t`: the `discount` associated with a transition.
+* `q_t`: the action values in the `destination` state.
+
+Extensive testing is provided for each function. All tests should also verify
+the output of `rlax` functions when compiled to XLA using `jax.jit` and when
+performing batch operations using `jax.vmap`.
+
+## Citing RLax
+
+RLax is part of the [DeepMind JAX Ecosystem]; to cite RLax, please use
+the [DeepMind JAX Ecosystem citation].
+
+[DeepMind JAX Ecosystem]: https://deepmind.com/blog/article/using-jax-to-accelerate-our-research "DeepMind JAX Ecosystem"
+[DeepMind JAX Ecosystem citation]: https://github.com/deepmind/jax/blob/main/deepmind2020jax.txt "Citation"
+
+
+%package -n python3-rlax
+Summary: A library of reinforcement learning building blocks in JAX.
+Provides: python-rlax
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-rlax
+# RLax
+
+RLax (pronounced "relax") is a library built on top of JAX that exposes
+useful building blocks for implementing reinforcement learning agents. Full
+documentation can be found at
+[rlax.readthedocs.io](https://rlax.readthedocs.io/en/latest/index.html).
+
+## Installation
+
+You can install the latest released version of RLax from PyPI via:
+
+```sh
+pip install rlax
+```
+
+or you can install the latest development version from GitHub:
+
+```sh
+pip install git+https://github.com/deepmind/rlax.git
+```
+
+All RLax code may then be just-in-time compiled for different hardware
+(e.g. CPU, GPU, TPU) using `jax.jit`.
+
+In order to run the `examples/` you will also need to clone the repo and
+install the additional requirements:
+[optax](https://github.com/deepmind/optax),
+[haiku](https://github.com/deepmind/haiku), and
+[bsuite](https://github.com/deepmind/bsuite).
+
+## Content
+
+The operations and functions provided are not complete algorithms, but
+implementations of reinforcement-learning-specific mathematical operations that
+are needed when building fully-functional agents capable of learning:
+
+* Values, including both state- and action-values;
+* Values for non-linear generalizations of the Bellman equations;
+* Return distributions, a.k.a. distributional value functions;
+* General value functions, for cumulants other than the main reward;
+* Policies, via policy gradients in both continuous and discrete action spaces.
+
+The library supports both on-policy and off-policy learning (i.e. learning from
+data sampled from a policy different from the agent's policy).
+
+See file-level and function-level doc-strings for the documentation of these
+functions and for references to the papers that introduced and/or used them.
+
+## Usage
+
+See `examples/` for examples of using some of the functions in RLax to
+implement a few simple reinforcement learning agents, and to demonstrate
+learning on BSuite's version of the Catch environment (a common unit test for
+agent development in the reinforcement learning literature).
+
+Other examples of JAX reinforcement learning agents using `rlax` can be found
+in [bsuite](https://github.com/deepmind/bsuite/tree/master/bsuite/baselines).
+
+## Background
+
+Reinforcement learning studies the problem of a learning system (the *agent*),
+which must learn to interact with the universe it is embedded in (the
+*environment*).
+
+Agent and environment interact on discrete steps. On each step the agent selects
+an *action*, and is provided in return a (partial) snapshot of the state of the
+environment (the *observation*), and a scalar feedback signal (the *reward*).
+
+The behaviour of the agent is characterized by a probability distribution over
+actions, conditioned on past observations of the environment (the *policy*).
+The agent seeks a policy that, from any given step, maximises the discounted
+cumulative reward that will be collected from that point onwards (the *return*).
+
+Often the agent's policy or the environment dynamics themselves are stochastic.
+In this case the return is a random variable, and the agent's optimal policy is
+typically specified more precisely as a policy that maximises the expectation
+of the return (the *value*), under the agent's and environment's stochasticity.
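+
+To make the notion of return concrete, the following small example (an
+illustration added here, not part of the upstream text) computes the discounted
+return r1 + gamma*r2 + gamma^2*r3 of a three-step reward sequence:
+
+```python
+import numpy as np
+
+rewards = np.array([0.0, 0.0, 1.0])  # rewards collected on each step
+gamma = 0.99                         # constant discount factor
+
+# Discounted return from the first step: sum_k gamma**k * r_{k+1}.
+discounts = gamma ** np.arange(len(rewards))
+episode_return = np.sum(discounts * rewards)
+print(episode_return)  # ~0.9801
+```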
+
+## Reinforcement Learning Algorithms
+
+There are three prototypical families of reinforcement learning algorithms:
+
+1. those that estimate the value of states and actions, and infer a policy by
+   *inspection* (e.g. by selecting the action with the highest estimated value);
+2. those that learn a model of the environment (capable of predicting the
+   observations and rewards) and infer a policy via *planning*;
+3. those that parameterize a policy that can be directly *executed*.
+
+In any case, policies, values and models are just functions. In deep
+reinforcement learning such functions are represented by a neural network.
+In this setting, it is common to formulate reinforcement learning updates as
+differentiable pseudo-loss functions (analogously to (un-)supervised learning).
+Under automatic differentiation, the original update rule is recovered.
+
+Note, however, that the updates are only valid if the input data is sampled in
+the correct manner. For example, a policy gradient loss is only valid if the
+input trajectory is an unbiased sample from the current policy, i.e. the data
+are on-policy. The library cannot check or enforce such constraints. Links to
+papers describing how each operation is used are, however, provided in the
+functions' doc-strings.
+
+## Naming Conventions and Developer Guidelines
+
+We define functions and operations for agents interacting with a single stream
+of experience. The JAX construct `vmap` can be used to apply these same
+functions to batches (e.g. to support *replay* and *parallel* data generation).
+
+Many functions consider policies, actions, rewards and values at consecutive
+timesteps in order to compute their outputs. In this case the suffixes `_t` and
+`_tm1` are often used to clarify on which step each input was generated, e.g.:
+
+* `q_tm1`: the action value in the `source` state of a transition.
+* `a_tm1`: the action that was selected in the `source` state.
+* `r_t`: the resulting reward collected in the `destination` state.
+* `discount_t`: the `discount` associated with a transition.
+* `q_t`: the action values in the `destination` state.
+
+Extensive testing is provided for each function. All tests should also verify
+the output of `rlax` functions when compiled to XLA using `jax.jit` and when
+performing batch operations using `jax.vmap`.
+
+## Citing RLax
+
+RLax is part of the [DeepMind JAX Ecosystem]; to cite RLax, please use
+the [DeepMind JAX Ecosystem citation].
+
+[DeepMind JAX Ecosystem]: https://deepmind.com/blog/article/using-jax-to-accelerate-our-research "DeepMind JAX Ecosystem"
+[DeepMind JAX Ecosystem citation]: https://github.com/deepmind/jax/blob/main/deepmind2020jax.txt "Citation"
+
+
+%package help
+Summary: Development documents and examples for rlax
+Provides: python3-rlax-doc
+%description help
+# RLax
+
+RLax (pronounced "relax") is a library built on top of JAX that exposes
+useful building blocks for implementing reinforcement learning agents. Full
+documentation can be found at
+[rlax.readthedocs.io](https://rlax.readthedocs.io/en/latest/index.html).
+
+## Installation
+
+You can install the latest released version of RLax from PyPI via:
+
+```sh
+pip install rlax
+```
+
+or you can install the latest development version from GitHub:
+
+```sh
+pip install git+https://github.com/deepmind/rlax.git
+```
+
+All RLax code may then be just-in-time compiled for different hardware
+(e.g. CPU, GPU, TPU) using `jax.jit`.
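+
+As a purely illustrative sketch (not from the RLax documentation), the snippet
+below jit-compiles one such building block; it assumes that `rlax.q_learning`
+computes the Q-learning TD error for a single, unbatched transition:
+
+```python
+import jax
+import jax.numpy as jnp
+import rlax
+
+# Hypothetical values for a single transition.
+q_tm1 = jnp.array([1.0, 2.0, 0.5])  # action values in the source state
+a_tm1 = jnp.array(1)                # action selected in the source state
+r_t = jnp.array(0.5)                # reward collected on arrival
+discount_t = jnp.array(0.99)        # discount for the transition
+q_t = jnp.array([1.5, 0.0, 2.0])    # action values in the destination state
+
+# jax.jit compiles the op with XLA; the uncompiled call behaves identically.
+td_error = jax.jit(rlax.q_learning)(q_tm1, a_tm1, r_t, discount_t, q_t)
+```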
+
+In order to run the `examples/` you will also need to clone the repo and
+install the additional requirements:
+[optax](https://github.com/deepmind/optax),
+[haiku](https://github.com/deepmind/haiku), and
+[bsuite](https://github.com/deepmind/bsuite).
+
+## Content
+
+The operations and functions provided are not complete algorithms, but
+implementations of reinforcement-learning-specific mathematical operations that
+are needed when building fully-functional agents capable of learning:
+
+* Values, including both state- and action-values;
+* Values for non-linear generalizations of the Bellman equations;
+* Return distributions, a.k.a. distributional value functions;
+* General value functions, for cumulants other than the main reward;
+* Policies, via policy gradients in both continuous and discrete action spaces.
+
+The library supports both on-policy and off-policy learning (i.e. learning from
+data sampled from a policy different from the agent's policy).
+
+See file-level and function-level doc-strings for the documentation of these
+functions and for references to the papers that introduced and/or used them.
+
+## Usage
+
+See `examples/` for examples of using some of the functions in RLax to
+implement a few simple reinforcement learning agents, and to demonstrate
+learning on BSuite's version of the Catch environment (a common unit test for
+agent development in the reinforcement learning literature).
+
+Other examples of JAX reinforcement learning agents using `rlax` can be found
+in [bsuite](https://github.com/deepmind/bsuite/tree/master/bsuite/baselines).
+
+## Background
+
+Reinforcement learning studies the problem of a learning system (the *agent*),
+which must learn to interact with the universe it is embedded in (the
+*environment*).
+
+Agent and environment interact on discrete steps. On each step the agent selects
+an *action*, and is provided in return a (partial) snapshot of the state of the
+environment (the *observation*), and a scalar feedback signal (the *reward*).
+
+The behaviour of the agent is characterized by a probability distribution over
+actions, conditioned on past observations of the environment (the *policy*).
+The agent seeks a policy that, from any given step, maximises the discounted
+cumulative reward that will be collected from that point onwards (the *return*).
+
+Often the agent's policy or the environment dynamics themselves are stochastic.
+In this case the return is a random variable, and the agent's optimal policy is
+typically specified more precisely as a policy that maximises the expectation
+of the return (the *value*), under the agent's and environment's stochasticity.
+
+## Reinforcement Learning Algorithms
+
+There are three prototypical families of reinforcement learning algorithms:
+
+1. those that estimate the value of states and actions, and infer a policy by
+   *inspection* (e.g. by selecting the action with the highest estimated value);
+2. those that learn a model of the environment (capable of predicting the
+   observations and rewards) and infer a policy via *planning*;
+3. those that parameterize a policy that can be directly *executed*.
+
+In any case, policies, values and models are just functions. In deep
+reinforcement learning such functions are represented by a neural network.
+In this setting, it is common to formulate reinforcement learning updates as
+differentiable pseudo-loss functions (analogously to (un-)supervised learning).
+Under automatic differentiation, the original update rule is recovered.
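+
+The toy sketch below (an illustration added here, written in plain JAX rather
+than against any particular RLax loss) shows the idea: a squared TD error is
+phrased as a differentiable pseudo-loss, and its gradient recovers a TD(0)-style
+update for a linear state-value function:
+
+```python
+import jax
+import jax.numpy as jnp
+
+def pseudo_loss(params, obs_tm1, r_t, discount_t, obs_t):
+    v_tm1 = jnp.dot(params, obs_tm1)                     # value of the source state
+    v_t = jax.lax.stop_gradient(jnp.dot(params, obs_t))  # bootstrap target (no gradient)
+    td_error = r_t + discount_t * v_t - v_tm1
+    return 0.5 * td_error ** 2                           # grad w.r.t. params is -td_error * obs_tm1
+
+params = jnp.zeros(4)
+obs_tm1, obs_t = jnp.ones(4), jnp.ones(4)
+grads = jax.grad(pseudo_loss)(params, obs_tm1, 1.0, 0.99, obs_t)
+params = params - 0.1 * grads  # one gradient-descent step, i.e. one TD(0) update
+```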
+
+Note, however, that the updates are only valid if the input data is sampled in
+the correct manner. For example, a policy gradient loss is only valid if the
+input trajectory is an unbiased sample from the current policy, i.e. the data
+are on-policy. The library cannot check or enforce such constraints. Links to
+papers describing how each operation is used are, however, provided in the
+functions' doc-strings.
+
+## Naming Conventions and Developer Guidelines
+
+We define functions and operations for agents interacting with a single stream
+of experience. The JAX construct `vmap` can be used to apply these same
+functions to batches (e.g. to support *replay* and *parallel* data generation).
+
+Many functions consider policies, actions, rewards and values at consecutive
+timesteps in order to compute their outputs. In this case the suffixes `_t` and
+`_tm1` are often used to clarify on which step each input was generated, e.g.:
+
+* `q_tm1`: the action value in the `source` state of a transition.
+* `a_tm1`: the action that was selected in the `source` state.
+* `r_t`: the resulting reward collected in the `destination` state.
+* `discount_t`: the `discount` associated with a transition.
+* `q_t`: the action values in the `destination` state.
+
+Extensive testing is provided for each function. All tests should also verify
+the output of `rlax` functions when compiled to XLA using `jax.jit` and when
+performing batch operations using `jax.vmap`.
+
+## Citing RLax
+
+RLax is part of the [DeepMind JAX Ecosystem]; to cite RLax, please use
+the [DeepMind JAX Ecosystem citation].
+
+[DeepMind JAX Ecosystem]: https://deepmind.com/blog/article/using-jax-to-accelerate-our-research "DeepMind JAX Ecosystem"
+[DeepMind JAX Ecosystem citation]: https://github.com/deepmind/jax/blob/main/deepmind2020jax.txt "Citation"
+
+
+%prep
+%autosetup -n rlax-0.1.5
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-rlax -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 17 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.5-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+98337785fddef8188284641e6179823d rlax-0.1.5.tar.gz