author: CoprDistGit <infra@openeuler.org>, 2023-04-11 10:37:27 +0000
committer: CoprDistGit <infra@openeuler.org>, 2023-04-11 10:37:27 +0000
commit: 5db029e67412a3946c7e392a636c90b224bb4419
tree: a1bae24ba7730d2452efa660781aa3ffc363bf96
parent: 2aaadd09c208e0a35fe47e3b1b6f46ff263235ca
automatic import of python-fastwarc
-rw-r--r--  .gitignore            1
-rw-r--r--  python-fastwarc.spec  233
-rw-r--r--  sources               1
3 files changed, 235 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..fbb4b2a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/FastWARC-0.14.3.tar.gz
diff --git a/python-fastwarc.spec b/python-fastwarc.spec
new file mode 100644
index 0000000..ebf477e
--- /dev/null
+++ b/python-fastwarc.spec
@@ -0,0 +1,233 @@
+%global _empty_manifest_terminate_build 0
+Name: python-FastWARC
+Version: 0.14.3
+Release: 1
+Summary: A high-performance WARC parsing library for Python written in C++/Cython.
+License: Apache License 2.0
+URL: https://pypi.org/project/FastWARC/
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/a9/b9/72ae33f875cfc04d61103d2431381abd104f39792b1e5d4e38df3b200ba8/FastWARC-0.14.3.tar.gz
+
+Requires: python3-brotli
+Requires: python3-click
+Requires: python3-tqdm
+Requires: python3-pytest
+Requires: python3-pytest-cov
+Requires: python3-lz4
+
+%description
+# FastWARC
+
+FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large parts by [WARCIO](https://github.com/webrecorder/warcio), but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.
+
+FastWARC belongs to the [ChatNoir Resiliparse toolkit](https://github.com/chatnoir-eu/chatnoir-resiliparse/) for fast and robust web data processing.
+
+## Installing FastWARC
+Pre-built FastWARC binaries for most Linux platforms can be installed from PyPi:
+```bash
+pip install fastwarc
+```
+**However:** the Linux binaries are provided *solely for your convenience*. Since they are built on the very old `manylinux` base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, see the next section on how to build FastWARC yourself.
+
+## Building FastWARC From Source
+You can compile FastWARC either from the PyPi source package or directly from this repository, though in any case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:
+```bash
+sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev
+```
+To build and install FastWARC from PyPi, run
+```bash
+pip install --no-binary fastwarc fastwarc
+```
+That's it. If you prefer to build and install directly from this repository instead, run:
+```bash
+pip install -e fastwarc
+```
+To build the wheels without installing them, run:
+```bash
+pip wheel -e fastwarc
+
+# Or:
+pip install build && python -m build --wheel fastwarc
+```
+
+## Usage Instructions
+For detailed usage instructions, please consult the [FastWARC User Manual](https://resiliparse.chatnoir.eu/en/latest/man/fastwarc.html).
+
+## Cite Us
+If you use FastWARC, please consider citing our [OSSYM 2021 abstract paper](https://arxiv.org/abs/2112.03103):
+```bibtex
+@InProceedings{bevendorff:2021,
+ author = {Janek Bevendorff and Martin Potthast and Benno Stein},
+ booktitle = {3rd International Symposium on Open Search Technology (OSSYM 2021)},
+ editor = {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
+ month = oct,
+ publisher = {International Open Search Symposium},
+ site = {CERN, Geneva, Switzerland},
+ title = {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
+ year = 2021
+}
+```
+
+
+%package -n python3-FastWARC
+Summary: A high-performance WARC parsing library for Python written in C++/Cython.
+Provides: python-FastWARC
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+BuildRequires: python3-cffi
+BuildRequires: gcc
+BuildRequires: gdb
+%description -n python3-FastWARC
+# FastWARC
+
+FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large parts by [WARCIO](https://github.com/webrecorder/warcio), but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.
+
+FastWARC belongs to the [ChatNoir Resiliparse toolkit](https://github.com/chatnoir-eu/chatnoir-resiliparse/) for fast and robust web data processing.
+
+## Installing FastWARC
+Pre-built FastWARC binaries for most Linux platforms can be installed from PyPi:
+```bash
+pip install fastwarc
+```
+**However:** the Linux binaries are provided *solely for your convenience*. Since they are built on the very old `manylinux` base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, see the next section on how to build FastWARC yourself.
+
+## Building FastWARC From Source
+You can compile FastWARC either from the PyPi source package or directly from this repository, though in any case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:
+```bash
+sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev
+```
+To build and install FastWARC from PyPi, run
+```bash
+pip install --no-binary fastwarc fastwarc
+```
+That's it. If you prefer to build and install directly from this repository instead, run:
+```bash
+pip install -e fastwarc
+```
+To build the wheels without installing them, run:
+```bash
+pip wheel -e fastwarc
+
+# Or:
+pip install build && python -m build --wheel fastwarc
+```
+
+## Usage Instructions
+For detailed usage instructions, please consult the [FastWARC User Manual](https://resiliparse.chatnoir.eu/en/latest/man/fastwarc.html).
+
+## Cite Us
+If you use FastWARC, please consider citing our [OSSYM 2021 abstract paper](https://arxiv.org/abs/2112.03103):
+```bibtex
+@InProceedings{bevendorff:2021,
+ author = {Janek Bevendorff and Martin Potthast and Benno Stein},
+ booktitle = {3rd International Symposium on Open Search Technology (OSSYM 2021)},
+ editor = {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
+ month = oct,
+ publisher = {International Open Search Symposium},
+ site = {CERN, Geneva, Switzerland},
+ title = {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
+ year = 2021
+}
+```
+
+
+%package help
+Summary: Development documents and examples for FastWARC
+Provides: python3-FastWARC-doc
+%description help
+# FastWARC
+
+FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large parts by [WARCIO](https://github.com/webrecorder/warcio), but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.
+
+FastWARC belongs to the [ChatNoir Resiliparse toolkit](https://github.com/chatnoir-eu/chatnoir-resiliparse/) for fast and robust web data processing.
+
+## Installing FastWARC
+Pre-built FastWARC binaries for most Linux platforms can be installed from PyPi:
+```bash
+pip install fastwarc
+```
+**However:** the Linux binaries are provided *solely for your convenience*. Since they are built on the very old `manylinux` base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, see the next section on how to build FastWARC yourself.
+
+## Building FastWARC From Source
+You can compile FastWARC either from the PyPi source package or directly from this repository, though in any case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:
+```bash
+sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev
+```
+To build and install FastWARC from PyPi, run
+```bash
+pip install --no-binary fastwarc fastwarc
+```
+That's it. If you prefer to build and install directly from this repository instead, run:
+```bash
+pip install -e fastwarc
+```
+To build the wheels without installing them, run:
+```bash
+pip wheel -e fastwarc
+
+# Or:
+pip install build && python -m build --wheel fastwarc
+```
+
+## Usage Instructions
+For detailed usage instructions, please consult the [FastWARC User Manual](https://resiliparse.chatnoir.eu/en/latest/man/fastwarc.html).
+
+## Cite Us
+If you use FastWARC, please consider citing our [OSSYM 2021 abstract paper](https://arxiv.org/abs/2112.03103):
+```bibtex
+@InProceedings{bevendorff:2021,
+ author = {Janek Bevendorff and Martin Potthast and Benno Stein},
+ booktitle = {3rd International Symposium on Open Search Technology (OSSYM 2021)},
+ editor = {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
+ month = oct,
+ publisher = {International Open Search Symposium},
+ site = {CERN, Geneva, Switzerland},
+ title = {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
+ year = 2021
+}
+```
+
+
+%prep
+%autosetup -n FastWARC-0.14.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-FastWARC -f filelist.lst
+%dir %{python3_sitearch}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.14.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..f9031cb
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+95c9e247ffff14a9a72346b8c394296c FastWARC-0.14.3.tar.gz
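
The `sources` entry added above pairs a 32-character hex digest with the tarball name. A minimal sketch of checking a downloaded tarball against such an entry follows. It assumes the digest is MD5 (inferred only from its length) and that each entry has the form `<digest> <filename>`. The helper names `parse_sources_line`, `md5_of_file`, and `verify` are hypothetical and not part of any openEuler tooling.

```python
import hashlib
from pathlib import Path


def parse_sources_line(line: str) -> tuple:
    """Split an entry of the assumed form '<md5-hex> <filename>'."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large tarball need not fit in memory."""
    md5 = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify(sources_line: str, directory: Path) -> bool:
    """Return True if the named file in `directory` matches the digest."""
    digest, filename = parse_sources_line(sources_line)
    return md5_of_file(directory / filename) == digest
```

For the entry in this commit, one would call `verify("95c9e247ffff14a9a72346b8c394296c FastWARC-0.14.3.tar.gz", Path("."))` from the directory holding the downloaded tarball.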