%global _empty_manifest_terminate_build 0
Name:		python-warc3-wet-clueweb09
Version:	0.2.5
Release:	1
Summary:	Python library to work with ARC and WARC files, with fixes for ClueWeb09
License:	GPLv2
URL:		https://github.com/seanmacavaney/warc3-clueweb
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/9f/c1/dd817bf57e0274dacb10e0ac868cb6cd70876950cf361c41879c030a2b8b/warc3-wet-clueweb09-0.2.5.tar.gz
BuildArch:	noarch

%description
Note: This is a fork of the original (now dead) warc repository.
Updated to handle problems with the ClueWeb09_ files. Changes are based on this repository_
(which only supports python2)

WARC (Web ARChive) is a file format for storing web crawls.
http://bibnum.bnf.fr/WARC/

This `warc` library makes it very easy to work with WARC files.::

    import warc
    with warc.open("test.warc") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

And WET files.::

    import warc
    with warc.open("test.warc.wet") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

%package -n python3-warc3-wet-clueweb09
Summary:	Python library to work with ARC and WARC files, with fixes for ClueWeb09
Provides:	python-warc3-wet-clueweb09
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip

%description -n python3-warc3-wet-clueweb09
Note: This is a fork of the original (now dead) warc repository.
Updated to handle problems with the ClueWeb09_ files. Changes are based on this repository_
(which only supports python2)

WARC (Web ARChive) is a file format for storing web crawls.
http://bibnum.bnf.fr/WARC/

This `warc` library makes it very easy to work with WARC files.::

    import warc
    with warc.open("test.warc") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

And WET files.::

    import warc
    with warc.open("test.warc.wet") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

%package help
Summary:	Development documents and examples for warc3-wet-clueweb09
Provides:	python3-warc3-wet-clueweb09-doc

%description help
Note: This is a fork of the original (now dead) warc repository.
Updated to handle problems with the ClueWeb09_ files. Changes are based on this repository_
(which only supports python2)

WARC (Web ARChive) is a file format for storing web crawls.
http://bibnum.bnf.fr/WARC/

This `warc` library makes it very easy to work with WARC files.::

    import warc
    with warc.open("test.warc") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

And WET files.::

    import warc
    with warc.open("test.warc.wet") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

%prep
%autosetup -n warc3-wet-clueweb09-0.2.5

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-warc3-wet-clueweb09 -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue May 30 2023 Python_Bot - 0.2.5-1
- Package Spec generated