-rw-r--r--  .gitignore            |   1
-rw-r--r--  python-fastwarc.spec  | 233
-rw-r--r--  sources               |   1
3 files changed, 235 insertions, 0 deletions
@@ -0,0 +1 @@
+/FastWARC-0.14.3.tar.gz
diff --git a/python-fastwarc.spec b/python-fastwarc.spec
new file mode 100644
index 0000000..ebf477e
--- /dev/null
+++ b/python-fastwarc.spec
@@ -0,0 +1,233 @@
+%global _empty_manifest_terminate_build 0
+Name: python-FastWARC
+Version: 0.14.3
+Release: 1
+Summary: A high-performance WARC parsing library for Python written in C++/Cython.
+License: Apache License 2.0
+URL: https://pypi.org/project/FastWARC/
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/a9/b9/72ae33f875cfc04d61103d2431381abd104f39792b1e5d4e38df3b200ba8/FastWARC-0.14.3.tar.gz
+
+Requires: python3-brotli
+Requires: python3-click
+Requires: python3-tqdm
+Requires: python3-pytest
+Requires: python3-pytest-cov
+Requires: python3-lz4
+
+%description
+# FastWARC
+
+FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large parts by [WARCIO](https://github.com/webrecorder/warcio), but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.
+
+FastWARC belongs to the [ChatNoir Resiliparse toolkit](https://github.com/chatnoir-eu/chatnoir-resiliparse/) for fast and robust web data processing.
+
+## Installing FastWARC
+Pre-built FastWARC binaries for most Linux platforms can be installed from PyPi:
+```bash
+pip install fastwarc
+```
+**However:** the Linux binaries are provided *solely for your convenience*. Since they are built on the very old `manylinux` base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, see the next section on how to build FastWARC yourself.
+
+## Building FastWARC From Source
+You can compile FastWARC either from the PyPi source package or directly from this repository, though in any case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:
+```bash
+sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev
+```
+To build and install FastWARC from PyPi, run
+```bash
+pip install --no-binary fastwarc fastwarc
+```
+That's it. If you prefer to build and install directly from this repository instead, run:
+```bash
+pip install -e fastwarc
+```
+To build the wheels without installing them, run:
+```bash
+pip wheel -e fastwarc
+
+# Or:
+pip install build && python -m build --wheel fastwarc
+```
+
+## Usage Instructions
+For detailed usage instructions, please consult the [FastWARC User Manual](https://resiliparse.chatnoir.eu/en/latest/man/fastwarc.html).
+
+## Cite Us
+If you use FastWARC, please consider citing our [OSSYM 2021 abstract paper](https://arxiv.org/abs/2112.03103):
+```bibtex
+@InProceedings{bevendorff:2021,
+ author = {Janek Bevendorff and Martin Potthast and Benno Stein},
+ booktitle = {3rd International Symposium on Open Search Technology (OSSYM 2021)},
+ editor = {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
+ month = oct,
+ publisher = {International Open Search Symposium},
+ site = {CERN, Geneva, Switzerland},
+ title = {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
+ year = 2021
+}
+```
+
+
+%package -n python3-FastWARC
+Summary: A high-performance WARC parsing library for Python written in C++/Cython.
+Provides: python-FastWARC
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+BuildRequires: python3-cffi
+BuildRequires: gcc
+BuildRequires: gdb
+%description -n python3-FastWARC
+# FastWARC
+
+FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large parts by [WARCIO](https://github.com/webrecorder/warcio), but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.
+
+FastWARC belongs to the [ChatNoir Resiliparse toolkit](https://github.com/chatnoir-eu/chatnoir-resiliparse/) for fast and robust web data processing.
+
+## Installing FastWARC
+Pre-built FastWARC binaries for most Linux platforms can be installed from PyPi:
+```bash
+pip install fastwarc
+```
+**However:** the Linux binaries are provided *solely for your convenience*. Since they are built on the very old `manylinux` base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, see the next section on how to build FastWARC yourself.
+
+## Building FastWARC From Source
+You can compile FastWARC either from the PyPi source package or directly from this repository, though in any case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:
+```bash
+sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev
+```
+To build and install FastWARC from PyPi, run
+```bash
+pip install --no-binary fastwarc fastwarc
+```
+That's it. If you prefer to build and install directly from this repository instead, run:
+```bash
+pip install -e fastwarc
+```
+To build the wheels without installing them, run:
+```bash
+pip wheel -e fastwarc
+
+# Or:
+pip install build && python -m build --wheel fastwarc
+```
+
+## Usage Instructions
+For detailed usage instructions, please consult the [FastWARC User Manual](https://resiliparse.chatnoir.eu/en/latest/man/fastwarc.html).
+
+## Cite Us
+If you use FastWARC, please consider citing our [OSSYM 2021 abstract paper](https://arxiv.org/abs/2112.03103):
+```bibtex
+@InProceedings{bevendorff:2021,
+ author = {Janek Bevendorff and Martin Potthast and Benno Stein},
+ booktitle = {3rd International Symposium on Open Search Technology (OSSYM 2021)},
+ editor = {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
+ month = oct,
+ publisher = {International Open Search Symposium},
+ site = {CERN, Geneva, Switzerland},
+ title = {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
+ year = 2021
+}
+```
+
+
+%package help
+Summary: Development documents and examples for FastWARC
+Provides: python3-FastWARC-doc
+%description help
+# FastWARC
+
+FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large parts by [WARCIO](https://github.com/webrecorder/warcio), but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.
+
+FastWARC belongs to the [ChatNoir Resiliparse toolkit](https://github.com/chatnoir-eu/chatnoir-resiliparse/) for fast and robust web data processing.
+
+## Installing FastWARC
+Pre-built FastWARC binaries for most Linux platforms can be installed from PyPi:
+```bash
+pip install fastwarc
+```
+**However:** the Linux binaries are provided *solely for your convenience*. Since they are built on the very old `manylinux` base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, see the next section on how to build FastWARC yourself.
+
+## Building FastWARC From Source
+You can compile FastWARC either from the PyPi source package or directly from this repository, though in any case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:
+```bash
+sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev
+```
+To build and install FastWARC from PyPi, run
+```bash
+pip install --no-binary fastwarc fastwarc
+```
+That's it. If you prefer to build and install directly from this repository instead, run:
+```bash
+pip install -e fastwarc
+```
+To build the wheels without installing them, run:
+```bash
+pip wheel -e fastwarc
+
+# Or:
+pip install build && python -m build --wheel fastwarc
+```
+
+## Usage Instructions
+For detailed usage instructions, please consult the [FastWARC User Manual](https://resiliparse.chatnoir.eu/en/latest/man/fastwarc.html).
+
+## Cite Us
+If you use FastWARC, please consider citing our [OSSYM 2021 abstract paper](https://arxiv.org/abs/2112.03103):
+```bibtex
+@InProceedings{bevendorff:2021,
+ author = {Janek Bevendorff and Martin Potthast and Benno Stein},
+ booktitle = {3rd International Symposium on Open Search Technology (OSSYM 2021)},
+ editor = {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
+ month = oct,
+ publisher = {International Open Search Symposium},
+ site = {CERN, Geneva, Switzerland},
+ title = {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
+ year = 2021
+}
+```
+
+
+%prep
+%autosetup -n FastWARC-0.14.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-FastWARC -f filelist.lst
+%dir %{python3_sitearch}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.14.3-1
+- Package Spec generated
@@ -0,0 +1 @@
+95c9e247ffff14a9a72346b8c394296c FastWARC-0.14.3.tar.gz
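
The generated `%install` scriptlet builds `filelist.lst` by turning every regular file under the buildroot's `usr/lib`, `usr/lib64`, `usr/bin` and `usr/sbin` into an absolute path. The following is a minimal sketch that replays that logic outside of rpmbuild, assuming GNU `find` (for `-printf`) and a hypothetical buildroot layout: the `usr/lib64/python3/site-packages/fastwarc/warc.so` and `usr/bin/fastwarc` files are placeholders, not what this package actually installs.

```shell
#!/bin/sh
# Sketch of the spec's %install file-list step, run against a scratch
# directory instead of a real RPM buildroot. The files created below are
# hypothetical placeholders; GNU find is assumed (for -printf).
set -eu

buildroot=$(mktemp -d)
mkdir -p "$buildroot/usr/lib64/python3/site-packages/fastwarc" "$buildroot/usr/bin"
touch "$buildroot/usr/lib64/python3/site-packages/fastwarc/warc.so"
touch "$buildroot/usr/bin/fastwarc"

cd "$buildroot"
: > filelist.lst    # start from an empty list
for d in usr/lib usr/lib64 usr/bin usr/sbin; do
    # Same guard as the spec: directories that do not exist are skipped.
    if [ -d "$d" ]; then
        # %h is the file's directory as find reached it (relative here) and
        # %f its basename, so the leading "/" yields the installed path.
        find "$d" -type f -printf "/%h/%f\n" >> filelist.lst
    fi
done
sort filelist.lst
```

Running it prints the two absolute paths. In the real build, RPM consumes the resulting list through `%files -n python3-FastWARC -f filelist.lst`; since only `usr/lib*`, `usr/bin` and `usr/sbin` are scanned, files installed elsewhere would not appear in that list.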
