%global _empty_manifest_terminate_build 0
Name:		python-FastWARC
Version:	0.14.3
Release:	1
Summary:	A high-performance WARC parsing library for Python written in C++/Cython.
License:	Apache License 2.0
URL:		https://pypi.org/project/FastWARC/
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/a9/b9/72ae33f875cfc04d61103d2431381abd104f39792b1e5d4e38df3b200ba8/FastWARC-0.14.3.tar.gz

Requires:	python3-brotli
Requires:	python3-click
Requires:	python3-tqdm
Requires:	python3-pytest
Requires:	python3-pytest-cov
Requires:	python3-lz4

%description
# FastWARC
FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large part by [WARCIO](https://github.com/webrecorder/warcio), but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.

FastWARC belongs to the [ChatNoir Resiliparse toolkit](https://github.com/chatnoir-eu/chatnoir-resiliparse/) for fast and robust web data processing.

## Installing FastWARC
Pre-built FastWARC binaries for most Linux platforms can be installed from PyPI:
```bash
pip install fastwarc
```
**However:** the Linux binaries are provided *solely for your convenience*. Since they are built on the very old `manylinux` base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, see the next section on how to build FastWARC yourself.

## Building FastWARC From Source
You can compile FastWARC either from the PyPI source package or directly from this repository, though in either case you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:
```bash
sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev
```
To build and install FastWARC from PyPI, run:
```bash
pip install --no-binary fastwarc fastwarc
```
That's it.
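The pre-built PyPI wheels carry a `manylinux` platform tag in their file name, whereas a wheel compiled locally usually carries a plain tag such as `linux_x86_64`. The following is a minimal sketch for telling the two apart, assuming the standard wheel naming scheme `{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. The functions `wheel_platform_tag` and `is_manylinux_wheel` are hypothetical helpers, not part of FastWARC or pip, and the file names are illustrative only.

```bash
# Hypothetical helpers (not part of FastWARC or pip).
# Wheel file names follow {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
# so the platform tag is the last dash-separated field before ".whl".
wheel_platform_tag() {
    local name
    name=$(basename "$1" .whl)
    echo "${name##*-}"
}

# Succeeds (exit status 0) if the wheel carries a manylinux platform tag,
# as the pre-built binaries downloaded from PyPI do.
is_manylinux_wheel() {
    case "$(wheel_platform_tag "$1")" in
        *manylinux*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with illustrative file names:
wheel_platform_tag FastWARC-0.14.3-cp310-cp310-linux_x86_64.whl  # prints linux_x86_64
is_manylinux_wheel FastWARC-0.14.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl && echo "manylinux build"
```

Only the file name is inspected, so the check works equally well on wheels in a download cache or in the output directory of `pip wheel`.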
If you prefer to build and install directly from this repository instead, run:
```bash
pip install -e fastwarc
```
To build the wheels without installing them, run:
```bash
pip wheel -e fastwarc

# Or:
pip install build && python -m build --wheel fastwarc
```

## Usage Instructions
For detailed usage instructions, please consult the [FastWARC User Manual](https://resiliparse.chatnoir.eu/en/latest/man/fastwarc.html).

## Cite Us
If you use FastWARC, please consider citing our [OSSYM 2021 abstract paper](https://arxiv.org/abs/2112.03103):
```bibtex
@InProceedings{bevendorff:2021,
  author = {Janek Bevendorff and Martin Potthast and Benno Stein},
  booktitle = {3rd International Symposium on Open Search Technology (OSSYM 2021)},
  editor = {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
  month = oct,
  publisher = {International Open Search Symposium},
  site = {CERN, Geneva, Switzerland},
  title = {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
  year = 2021
}
```

%package -n python3-FastWARC
Summary:	A high-performance WARC parsing library for Python written in C++/Cython.
Provides:	python-FastWARC
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
BuildRequires:	python3-cffi
BuildRequires:	gcc
BuildRequires:	gdb

%description -n python3-FastWARC
# FastWARC
FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large part by [WARCIO](https://github.com/webrecorder/warcio), but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.

FastWARC belongs to the [ChatNoir Resiliparse toolkit](https://github.com/chatnoir-eu/chatnoir-resiliparse/) for fast and robust web data processing.
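Since FastWARC accepts GZip, LZ4 and uncompressed input, it can help to check which of these a given file actually is before processing it. The sketch below is one way to do that from the shell, assuming a POSIX system with `od`, `tr` and `gzip`: it reads the first four bytes and compares them with the GZip magic number `1f 8b`, the LZ4 frame-format magic number `04 22 4d 18`, and the ASCII text `WARC` that begins an uncompressed WARC record. The function name `warc_compression` is hypothetical and not part of FastWARC's API.

```bash
# Hypothetical helper (not part of FastWARC): guess how a WARC file is
# compressed from its first four bytes.
warc_compression() {
    local magic
    # od prints the bytes as lowercase hex; strip the spaces and newlines.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$magic" in
        1f8b*)    echo gzip ;;          # GZip member header
        04224d18) echo lz4 ;;           # LZ4 frame format
        57415243) echo uncompressed ;;  # ASCII "WARC"
        *)        echo unknown ;;
    esac
}

# Example: write a small GZip-compressed sample and inspect it.
tmp=$(mktemp)
echo 'WARC/1.1' | gzip -c > "$tmp"
warc_compression "$tmp"   # prints gzip
rm -f "$tmp"
```

The check only looks at the first record's header bytes, so it is a quick heuristic rather than a validation of the whole file.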
## Installing FastWARC
Pre-built FastWARC binaries for most Linux platforms can be installed from PyPI:
```bash
pip install fastwarc
```
**However:** the Linux binaries are provided *solely for your convenience*. Since they are built on the very old `manylinux` base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, see the next section on how to build FastWARC yourself.

## Building FastWARC From Source
You can compile FastWARC either from the PyPI source package or directly from this repository, though in either case you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:
```bash
sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev
```
To build and install FastWARC from PyPI, run:
```bash
pip install --no-binary fastwarc fastwarc
```
That's it.

If you prefer to build and install directly from this repository instead, run:
```bash
pip install -e fastwarc
```
To build the wheels without installing them, run:
```bash
pip wheel -e fastwarc

# Or:
pip install build && python -m build --wheel fastwarc
```

## Usage Instructions
For detailed usage instructions, please consult the [FastWARC User Manual](https://resiliparse.chatnoir.eu/en/latest/man/fastwarc.html).
## Cite Us
If you use FastWARC, please consider citing our [OSSYM 2021 abstract paper](https://arxiv.org/abs/2112.03103):
```bibtex
@InProceedings{bevendorff:2021,
  author = {Janek Bevendorff and Martin Potthast and Benno Stein},
  booktitle = {3rd International Symposium on Open Search Technology (OSSYM 2021)},
  editor = {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
  month = oct,
  publisher = {International Open Search Symposium},
  site = {CERN, Geneva, Switzerland},
  title = {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
  year = 2021
}
```

%package help
Summary:	Development documents and examples for FastWARC
Provides:	python3-FastWARC-doc

%description help
# FastWARC
FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large part by [WARCIO](https://github.com/webrecorder/warcio), but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.

FastWARC belongs to the [ChatNoir Resiliparse toolkit](https://github.com/chatnoir-eu/chatnoir-resiliparse/) for fast and robust web data processing.

## Installing FastWARC
Pre-built FastWARC binaries for most Linux platforms can be installed from PyPI:
```bash
pip install fastwarc
```
**However:** the Linux binaries are provided *solely for your convenience*. Since they are built on the very old `manylinux` base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, see the next section on how to build FastWARC yourself.

## Building FastWARC From Source
You can compile FastWARC either from the PyPI source package or directly from this repository, though in either case you need to install all required build-time dependencies first.
On Ubuntu, this is done as follows:
```bash
sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev
```
To build and install FastWARC from PyPI, run:
```bash
pip install --no-binary fastwarc fastwarc
```
That's it.

If you prefer to build and install directly from this repository instead, run:
```bash
pip install -e fastwarc
```
To build the wheels without installing them, run:
```bash
pip wheel -e fastwarc

# Or:
pip install build && python -m build --wheel fastwarc
```

## Usage Instructions
For detailed usage instructions, please consult the [FastWARC User Manual](https://resiliparse.chatnoir.eu/en/latest/man/fastwarc.html).

## Cite Us
If you use FastWARC, please consider citing our [OSSYM 2021 abstract paper](https://arxiv.org/abs/2112.03103):
```bibtex
@InProceedings{bevendorff:2021,
  author = {Janek Bevendorff and Martin Potthast and Benno Stein},
  booktitle = {3rd International Symposium on Open Search Technology (OSSYM 2021)},
  editor = {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
  month = oct,
  publisher = {International Open Search Symposium},
  site = {CERN, Geneva, Switzerland},
  title = {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
  year = 2021
}
```

%prep
%autosetup -n FastWARC-0.14.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-FastWARC -f filelist.lst
%dir %{python3_sitearch}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Apr 11 2023 Python_Bot - 0.14.3-1
- Package Spec generated