%global _empty_manifest_terminate_build 0
Name: python-sru
Version: 2.6.0
Release: 1
Summary: Simple Recurrent Units for Highly Parallelizable Recurrence
License: MIT
URL: https://github.com/taolei87/sru
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/40/ca/7537e0ef8c3361402b1787474f0960521d4de82673ab45c1f11909e1c7a1/sru-2.6.0.tar.gz
BuildArch: noarch
Requires: python3-torch
Requires: python3-ninja
%description
## News
SRU++, a new SRU variant, is released. [[tech report](https://arxiv.org/pdf/2102.12459.pdf)] [[blog](https://www.asapp.com/blog/reducing-the-high-cost-of-training-nlp-models-with-sru/)]
The experimental code and SRU++ implementation are available on [the dev branch](https://github.com/asappresearch/sru/tree/3.0.0-dev/experiments/srupp_experiments), which will be merged into master later.
## About
**SRU** is a recurrent unit that can run over 10 times faster than cuDNN LSTM with no loss of accuracy, as tested on many tasks.
![](https://raw.githubusercontent.com/taolei87/sru/master/imgs/speed.png)
Average processing time of LSTM, conv2d and SRU, tested on GTX 1070
For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves a 10x to 16x speed-up over LSTM and runs as fast as (or faster than) word-level convolution using conv2d.
#### Reference:
Simple Recurrent Units for Highly Parallelizable Recurrence [[paper](https://arxiv.org/abs/1709.02755)]
```
@inproceedings{lei2018sru,
title={Simple Recurrent Units for Highly Parallelizable Recurrence},
author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},
booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
year={2018}
}
```
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute [[paper](https://arxiv.org/pdf/2102.12459)]
```
@article{lei2021srupp,
title={When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute},
author={Tao Lei},
journal={arXiv preprint arXiv:2102.12459},
year={2021}
}
```
## Requirements
- [PyTorch](http://pytorch.org/) >=1.6 recommended
- [ninja](https://ninja-build.org/)
Install requirements via `pip install -r requirements.txt`.
## Installation
#### From source:
SRU can be installed as a regular package via `python setup.py install` or `pip install .`.
#### From PyPI:
`pip install sru`
#### Directly use the source without installation:
Make sure this repo and the CUDA libraries can be found by the system, e.g.
```
export PYTHONPATH=path_to_repo/sru
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
```
## Examples
The usage of SRU is similar to `nn.LSTM`. SRU typically requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).
```python
import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = torch.randn(20, 32, 128).cuda()
input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
          num_layers=2,          # number of stacked RNN layers
          dropout=0.0,           # dropout applied between RNN layers
          bidirectional=False,   # bidirectional RNN
          layer_norm=False,      # apply layer normalization on the output of each layer
          highway_bias=-2,       # initial bias of the highway gate (<= 0)
)
rnn.cuda()

output_states, c_states = rnn(x)  # forward pass
# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)
```
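As a further illustration, below is a minimal sketch of a bidirectional run on CPU; it assumes the forward pass accepts an optional initial state `c0` shaped like the returned `c_states`, and that the package also runs without CUDA:
```python
import torch
from sru import SRU

# same layout as above: (length, batch size, input size)
x = torch.randn(20, 32, 128)

# 2 stacked bidirectional layers; the last output dimension becomes 2 * hidden_size
rnn = SRU(128, 128, num_layers=2, bidirectional=True)

# assumed optional initial state: (layers, batch size, directions * hidden size)
c0 = torch.zeros(2, 32, 2 * 128)

output_states, c_states = rnn(x, c0)
print(output_states.shape)  # expected: torch.Size([20, 32, 256])
print(c_states.shape)       # expected: torch.Size([2, 32, 256])
```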
## Contributing
Please read and follow the [guidelines](CONTRIBUTING.md).
### Other Implementations
[@musyoku](https://github.com/musyoku) has a very nice [SRU implementation](https://github.com/musyoku/chainer-sru) in Chainer.
[@adrianbg](https://github.com/adrianbg) implemented the first [CPU version](https://github.com/taolei87/sru/pull/42).
%package -n python3-sru
Summary: Simple Recurrent Units for Highly Parallelizable Recurrence
Provides: python-sru
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-sru
## News
SRU++, a new SRU variant, is released. [[tech report](https://arxiv.org/pdf/2102.12459.pdf)] [[blog](https://www.asapp.com/blog/reducing-the-high-cost-of-training-nlp-models-with-sru/)]
The experimental code and SRU++ implementation are available on [the dev branch](https://github.com/asappresearch/sru/tree/3.0.0-dev/experiments/srupp_experiments), which will be merged into master later.
## About
**SRU** is a recurrent unit that can run over 10 times faster than cuDNN LSTM with no loss of accuracy, as tested on many tasks.
![](https://raw.githubusercontent.com/taolei87/sru/master/imgs/speed.png)
Average processing time of LSTM, conv2d and SRU, tested on GTX 1070
For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves a 10x to 16x speed-up over LSTM and runs as fast as (or faster than) word-level convolution using conv2d.
#### Reference:
Simple Recurrent Units for Highly Parallelizable Recurrence [[paper](https://arxiv.org/abs/1709.02755)]
```
@inproceedings{lei2018sru,
title={Simple Recurrent Units for Highly Parallelizable Recurrence},
author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},
booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
year={2018}
}
```
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute [[paper](https://arxiv.org/pdf/2102.12459)]
```
@article{lei2021srupp,
title={When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute},
author={Tao Lei},
journal={arXiv preprint arXiv:2102.12459},
year={2021}
}
```
## Requirements
- [PyTorch](http://pytorch.org/) >=1.6 recommended
- [ninja](https://ninja-build.org/)
Install requirements via `pip install -r requirements.txt`.
## Installation
#### From source:
SRU can be installed as a regular package via `python setup.py install` or `pip install .`.
#### From PyPI:
`pip install sru`
#### Directly use the source without installation:
Make sure this repo and the CUDA libraries can be found by the system, e.g.
```
export PYTHONPATH=path_to_repo/sru
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
```
## Examples
The usage of SRU is similar to `nn.LSTM`. SRU typically requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).
```python
import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = torch.randn(20, 32, 128).cuda()
input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
          num_layers=2,          # number of stacked RNN layers
          dropout=0.0,           # dropout applied between RNN layers
          bidirectional=False,   # bidirectional RNN
          layer_norm=False,      # apply layer normalization on the output of each layer
          highway_bias=-2,       # initial bias of the highway gate (<= 0)
)
rnn.cuda()

output_states, c_states = rnn(x)  # forward pass
# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)
```
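As a further illustration, below is a minimal sketch of a bidirectional run on CPU; it assumes the forward pass accepts an optional initial state `c0` shaped like the returned `c_states`, and that the package also runs without CUDA:
```python
import torch
from sru import SRU

# same layout as above: (length, batch size, input size)
x = torch.randn(20, 32, 128)

# 2 stacked bidirectional layers; the last output dimension becomes 2 * hidden_size
rnn = SRU(128, 128, num_layers=2, bidirectional=True)

# assumed optional initial state: (layers, batch size, directions * hidden size)
c0 = torch.zeros(2, 32, 2 * 128)

output_states, c_states = rnn(x, c0)
print(output_states.shape)  # expected: torch.Size([20, 32, 256])
print(c_states.shape)       # expected: torch.Size([2, 32, 256])
```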
## Contributing
Please read and follow the [guidelines](CONTRIBUTING.md).
### Other Implementations
[@musyoku](https://github.com/musyoku) has a very nice [SRU implementation](https://github.com/musyoku/chainer-sru) in Chainer.
[@adrianbg](https://github.com/adrianbg) implemented the first [CPU version](https://github.com/taolei87/sru/pull/42).
%package help
Summary: Development documents and examples for sru
Provides: python3-sru-doc
%description help
## News
SRU++, a new SRU variant, is released. [[tech report](https://arxiv.org/pdf/2102.12459.pdf)] [[blog](https://www.asapp.com/blog/reducing-the-high-cost-of-training-nlp-models-with-sru/)]
The experimental code and SRU++ implementation are available on [the dev branch](https://github.com/asappresearch/sru/tree/3.0.0-dev/experiments/srupp_experiments), which will be merged into master later.
## About
**SRU** is a recurrent unit that can run over 10 times faster than cuDNN LSTM with no loss of accuracy, as tested on many tasks.
![](https://raw.githubusercontent.com/taolei87/sru/master/imgs/speed.png)
Average processing time of LSTM, conv2d and SRU, tested on GTX 1070
For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves a 10x to 16x speed-up over LSTM and runs as fast as (or faster than) word-level convolution using conv2d.
#### Reference:
Simple Recurrent Units for Highly Parallelizable Recurrence [[paper](https://arxiv.org/abs/1709.02755)]
```
@inproceedings{lei2018sru,
title={Simple Recurrent Units for Highly Parallelizable Recurrence},
author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},
booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
year={2018}
}
```
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute [[paper](https://arxiv.org/pdf/2102.12459)]
```
@article{lei2021srupp,
title={When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute},
author={Tao Lei},
journal={arXiv preprint arXiv:2102.12459},
year={2021}
}
```
## Requirements
- [PyTorch](http://pytorch.org/) >=1.6 recommended
- [ninja](https://ninja-build.org/)
Install requirements via `pip install -r requirements.txt`.
## Installation
#### From source:
SRU can be installed as a regular package via `python setup.py install` or `pip install .`.
#### From PyPI:
`pip install sru`
#### Directly use the source without installation:
Make sure this repo and the CUDA libraries can be found by the system, e.g.
```
export PYTHONPATH=path_to_repo/sru
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
```
## Examples
The usage of SRU is similar to `nn.LSTM`. SRU typically requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).
```python
import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = torch.randn(20, 32, 128).cuda()
input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
          num_layers=2,          # number of stacked RNN layers
          dropout=0.0,           # dropout applied between RNN layers
          bidirectional=False,   # bidirectional RNN
          layer_norm=False,      # apply layer normalization on the output of each layer
          highway_bias=-2,       # initial bias of the highway gate (<= 0)
)
rnn.cuda()

output_states, c_states = rnn(x)  # forward pass
# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)
```
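As a further illustration, below is a minimal sketch of a bidirectional run on CPU; it assumes the forward pass accepts an optional initial state `c0` shaped like the returned `c_states`, and that the package also runs without CUDA:
```python
import torch
from sru import SRU

# same layout as above: (length, batch size, input size)
x = torch.randn(20, 32, 128)

# 2 stacked bidirectional layers; the last output dimension becomes 2 * hidden_size
rnn = SRU(128, 128, num_layers=2, bidirectional=True)

# assumed optional initial state: (layers, batch size, directions * hidden size)
c0 = torch.zeros(2, 32, 2 * 128)

output_states, c_states = rnn(x, c0)
print(output_states.shape)  # expected: torch.Size([20, 32, 256])
print(c_states.shape)       # expected: torch.Size([2, 32, 256])
```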
## Contributing
Please read and follow the [guidelines](CONTRIBUTING.md).
### Other Implementations
[@musyoku](https://github.com/musyoku) has a very nice [SRU implementation](https://github.com/musyoku/chainer-sru) in Chainer.
[@adrianbg](https://github.com/adrianbg) implemented the first [CPU version](https://github.com/taolei87/sru/pull/42).
%prep
%autosetup -n sru-2.6.0
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-sru -f filelist.lst
%dir %{python3_sitelib}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Fri May 05 2023 Python_Bot - 2.6.0-1
- Package Spec generated