automatic import of python-archivebox

author: CoprDistGit <infra@openeuler.org> 2023-05-18 04:34:46 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-05-18 04:34:46 +0000
commit: 55743948e8506b6162b51a82d41dde9d7390130b (patch)
tree: 5e31ab2bc646f88f6a125c19e00477983db2dab3
parent: e4a9495e624d936402389671ff6a6fd50920f25a (diff)
3 files changed, 400 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..1aaa463 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/archivebox-0.6.2.tar.gz
diff --git a/python-archivebox.spec b/python-archivebox.spec
new file mode 100644
index 0000000..80638ae
--- /dev/null
+++ b/python-archivebox.spec
@@ -0,0 +1,398 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-archivebox
+Version:	0.6.2
+Release:	1
+Summary:	The self-hosted internet archive.
+License:	MIT
+URL:		https://github.com/ArchiveBox/ArchiveBox
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/b2/01/37fdcb4bd60ec7187aa8196393d667478b0d1c97ba1b6b78cfd7e4501d69/archivebox-0.6.2.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-requests
+Requires:	python3-mypy-extensions
+Requires:	python3-django
+Requires:	python3-django-extensions
+Requires:	python3-dateparser
+Requires:	python3-ipython
+Requires:	python3-youtube-dl
+Requires:	python3-crontab
+Requires:	python3-croniter
+Requires:	python3-w3lib
+Requires:	python3-setuptools
+Requires:	python3-twine
+Requires:	python3-wheel
+Requires:	python3-flake8
+Requires:	python3-ipdb
+Requires:	python3-mypy
+Requires:	python3-django-stubs
+Requires:	python3-sphinx
+Requires:	python3-sphinx-rtd-theme
+Requires:	python3-recommonmark
+Requires:	python3-pytest
+Requires:	python3-bottle
+Requires:	python3-stdeb
+Requires:	python3-django-debug-toolbar
+Requires:	python3-djdt-flamegraph
+Requires:	python3-sonic-client
+
+%description
+<div align="center">
+<img src="https://i.imgur.com/OUmgdlH.png" width="96%" alt="lego">
+</div>
+<br/>
+# Overview
+## Input formats
+ArchiveBox supports many input formats for URLs, including Pocket & Pinboard exports, Browser bookmarks, Browser history, plain text, HTML, markdown, and more!
+*Click these links for instructions on how to prepare your links from these sources:*
+- <img src="https://nicksweeting.com/images/rss.svg" height="22px"/> TXT, RSS, XML, JSON, CSV, SQL, HTML, Markdown, or [any other text-based format...](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Import-a-list-of-URLs-from-a-text-file)
+- <img src="https://nicksweeting.com/images/bookmarks.png" height="22px"/> [Browser history](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) or [browser bookmarks](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) (see instructions for: [Chrome](https://support.google.com/chrome/answer/96816?hl=en), [Firefox](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), [Safari](http://i.imgur.com/AtcvUZA.png), [IE](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows), [Opera](http://help.opera.com/Windows/12.10/en/importexport.html), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive))
+- <img src="https://getpocket.com/favicon.ico" height="22px"/> [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Instapaper](https://www.instapaper.com/user/export), [Shaarli](https://shaarli.readthedocs.io/en/master/Usage/#importexport), [Delicious](https://www.groovypost.com/howto/howto/export-delicious-bookmarks-xml/), [Reddit Saved](https://github.com/csu/export-saved-reddit), [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html), [Unmark.it](http://help.unmark.it/import-export), [OneTab](https://www.addictivetips.com/web/onetab-save-close-all-chrome-tabs-to-restore-export-or-import/), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive)
+```bash
+# archivebox add --help
+archivebox add 'https://example.com/some/page'
+archivebox add < ~/Downloads/firefox_bookmarks_export.html
+archivebox add --depth=1 'https://news.ycombinator.com#2020-12-12'
+echo 'http://example.com' | archivebox add
+echo 'any_text_with [urls](https://example.com) in it' | archivebox add
+# (if using docker add -i when piping stdin)
+echo 'https://example.com' | docker run -v $PWD:/data -i archivebox/archivebox add
+# (if using docker-compose add -T when piping stdin / stdout)
+echo 'https://example.com' | docker-compose run -T archivebox add
+```
+See the [Usage: CLI](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage) page for documentation and examples.
+It also includes a built-in scheduled import feature with `archivebox schedule` and browser bookmarklet, so you can pull in URLs from RSS feeds, websites, or the filesystem regularly/on-demand.
+<br/>
+## Archive Layout
+All of ArchiveBox's state (including the index, snapshot data, and config file) is stored in a single folder called the "ArchiveBox data folder". All `archivebox` CLI commands must be run from inside this folder, and you first create it by running `archivebox init`.
+The on-disk layout is optimized to be easy to browse by hand and durable long-term. The main index is a standard `index.sqlite3` database in the root of the data folder (it can also be exported as static JSON/HTML), and the archive snapshots are organized by date-added timestamp in the `./archive/` subfolder.
+```bash
+./
+    index.sqlite3
+    ArchiveBox.conf
+    archive/
+        1617687755/
+            index.html
+            index.json
+            screenshot.png
+            media/some_video.mp4
+            warc/1617687755.warc.gz
+            git/somerepo.git
+```
+Each snapshot subfolder `./archive/<timestamp>/` includes a static `index.json` and `index.html` describing its contents, and the snapshot extractor outputs are plain files within the folder.
+<br/>
+## Output formats
+Inside each Snapshot folder, ArchiveBox saves these different types of extractor outputs as plain files:
+`./archive/<timestamp>/*`
+- **Index:** `index.html` & `index.json` HTML and JSON index files containing metadata and details
+- **Title**, **Favicon**, **Headers** Response headers, site favicon, and parsed site title
+- **SingleFile:** `singlefile.html` HTML snapshot rendered with headless Chrome using SingleFile
+- **Wget Clone:** `example.com/page-name.html` wget clone of the site with  `warc/<timestamp>.gz`
+- Chrome Headless
+  - **PDF:** `output.pdf` Printed PDF of site using headless chrome
+  - **Screenshot:** `screenshot.png` 1440x900 screenshot of site using headless chrome
+  - **DOM Dump:** `output.html` DOM Dump of the HTML after rendering using headless chrome
+- **Article Text:** `article.html/json` Article text extraction using Readability & Mercury
+- **Archive.org Permalink:** `archive.org.txt` A link to the saved site on archive.org
+- **Audio & Video:** `media/` all audio/video files + playlists, including subtitles & metadata with youtube-dl
+- **Source Code:** `git/` clone of any repository found on github, bitbucket, or gitlab links
+- _More coming soon! See the [Roadmap](https://github.com/ArchiveBox/ArchiveBox/wiki/Roadmap)..._
+It does everything out-of-the-box by default, but you can disable or tweak [individual archive methods](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration) via environment variables / config.
+```bash
+# archivebox config --help
+archivebox config # see all currently configured options
+archivebox config --set SAVE_ARCHIVE_DOT_ORG=False
+archivebox config --set YOUTUBEDL_ARGS='--max-filesize=500m'
+```
+<br/>
+## Static Archive Exporting
+You can export the main index to browse it statically without needing to run a server.
+*Note about large exports: these exports are not paginated, so exporting many URLs or the entire archive at once may be slow. Use the filtering CLI flags on the `archivebox list` command to export specific Snapshots or ranges.*
+```bash
+# archivebox list --help
+archivebox list --html --with-headers > index.html     # export to static html table
+archivebox list --json --with-headers > index.json     # export to json blob
+archivebox list --csv=timestamp,url,title > index.csv  # export to csv spreadsheet
+# (if using docker-compose, add the -T flag when piping)
+docker-compose run -T archivebox list --html --filter-type=search snozzberries > index.html
+```
+The paths in the static exports are relative, so keep the exported files next to your `./archive` folder when backing them up or viewing them.
+<br/>
+## Dependencies
+For better security, easier updating, and to avoid polluting your host system with extra dependencies, **it is strongly recommended to use the official [Docker image](https://github.com/ArchiveBox/ArchiveBox/wiki/Docker)** with everything preinstalled for the best experience.
+To achieve high fidelity archives in as many situations as possible, ArchiveBox depends on a variety of 3rd-party tools and libraries that specialize in extracting different types of content. These optional dependencies used for archiving sites include:
+- `chromium` / `chrome` (for screenshots, PDF, DOM HTML, and headless JS scripts)
+- `node` & `npm` (for readability, mercury, and singlefile)
+- `wget` (for plain HTML, static files, and WARC saving)
+- `curl` (for fetching headers, favicon, and posting to Archive.org)
+- `youtube-dl` (for audio, video, and subtitles)
+- `git` (for cloning git repos)
+- and more as we grow...
+You don't need to install every dependency to use ArchiveBox. ArchiveBox will automatically disable extractors that rely on dependencies that aren't installed, based on what is configured and available in your `$PATH`.
+*If using Docker, you don't have to install any of these manually; all dependencies are set up properly out-of-the-box.*
+However, if you prefer not to use Docker, you *can* install ArchiveBox and its dependencies using your [system package manager](https://github.com/ArchiveBox/ArchiveBox/wiki/Install) or `pip` directly on any Linux/macOS system. Just make sure to keep the dependencies up-to-date and check that ArchiveBox isn't reporting any incompatibility with the versions you install.
+```bash
+# install python3 and archivebox with your system package manager
+# apt/brew/pip/etc install ... (see Quickstart instructions above)
+archivebox setup       # auto install all the extractors and extras
+archivebox --version   # see info and check validity of installed dependencies
+```
+Installing directly on **Windows without Docker or WSL/WSL2/Cygwin is not officially supported**, but some advanced users have reported getting it working.
+
+%package -n python3-archivebox
+Summary:	The self-hosted internet archive.
+Provides:	python-archivebox
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-archivebox
+<div align="center">
+<img src="https://i.imgur.com/OUmgdlH.png" width="96%" alt="lego">
+</div>
+<br/>
+# Overview
+## Input formats
+ArchiveBox supports many input formats for URLs, including Pocket & Pinboard exports, Browser bookmarks, Browser history, plain text, HTML, markdown, and more!
+*Click these links for instructions on how to prepare your links from these sources:*
+- <img src="https://nicksweeting.com/images/rss.svg" height="22px"/> TXT, RSS, XML, JSON, CSV, SQL, HTML, Markdown, or [any other text-based format...](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Import-a-list-of-URLs-from-a-text-file)
+- <img src="https://nicksweeting.com/images/bookmarks.png" height="22px"/> [Browser history](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) or [browser bookmarks](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) (see instructions for: [Chrome](https://support.google.com/chrome/answer/96816?hl=en), [Firefox](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), [Safari](http://i.imgur.com/AtcvUZA.png), [IE](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows), [Opera](http://help.opera.com/Windows/12.10/en/importexport.html), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive))
+- <img src="https://getpocket.com/favicon.ico" height="22px"/> [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Instapaper](https://www.instapaper.com/user/export), [Shaarli](https://shaarli.readthedocs.io/en/master/Usage/#importexport), [Delicious](https://www.groovypost.com/howto/howto/export-delicious-bookmarks-xml/), [Reddit Saved](https://github.com/csu/export-saved-reddit), [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html), [Unmark.it](http://help.unmark.it/import-export), [OneTab](https://www.addictivetips.com/web/onetab-save-close-all-chrome-tabs-to-restore-export-or-import/), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive)
+```bash
+# archivebox add --help
+archivebox add 'https://example.com/some/page'
+archivebox add < ~/Downloads/firefox_bookmarks_export.html
+archivebox add --depth=1 'https://news.ycombinator.com#2020-12-12'
+echo 'http://example.com' | archivebox add
+echo 'any_text_with [urls](https://example.com) in it' | archivebox add
+# (if using docker add -i when piping stdin)
+echo 'https://example.com' | docker run -v $PWD:/data -i archivebox/archivebox add
+# (if using docker-compose add -T when piping stdin / stdout)
+echo 'https://example.com' | docker-compose run -T archivebox add
+```
+See the [Usage: CLI](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage) page for documentation and examples.
+It also includes a built-in scheduled import feature with `archivebox schedule` and browser bookmarklet, so you can pull in URLs from RSS feeds, websites, or the filesystem regularly/on-demand.
+<br/>
+## Archive Layout
+All of ArchiveBox's state (including the index, snapshot data, and config file) is stored in a single folder called the "ArchiveBox data folder". All `archivebox` CLI commands must be run from inside this folder, and you first create it by running `archivebox init`.
+The on-disk layout is optimized to be easy to browse by hand and durable long-term. The main index is a standard `index.sqlite3` database in the root of the data folder (it can also be exported as static JSON/HTML), and the archive snapshots are organized by date-added timestamp in the `./archive/` subfolder.
+```bash
+./
+    index.sqlite3
+    ArchiveBox.conf
+    archive/
+        1617687755/
+            index.html
+            index.json
+            screenshot.png
+            media/some_video.mp4
+            warc/1617687755.warc.gz
+            git/somerepo.git
+```
+Each snapshot subfolder `./archive/<timestamp>/` includes a static `index.json` and `index.html` describing its contents, and the snapshot extractor outputs are plain files within the folder.
+<br/>
+## Output formats
+Inside each Snapshot folder, ArchiveBox saves these different types of extractor outputs as plain files:
+`./archive/<timestamp>/*`
+- **Index:** `index.html` & `index.json` HTML and JSON index files containing metadata and details
+- **Title**, **Favicon**, **Headers** Response headers, site favicon, and parsed site title
+- **SingleFile:** `singlefile.html` HTML snapshot rendered with headless Chrome using SingleFile
+- **Wget Clone:** `example.com/page-name.html` wget clone of the site with  `warc/<timestamp>.gz`
+- Chrome Headless
+  - **PDF:** `output.pdf` Printed PDF of site using headless chrome
+  - **Screenshot:** `screenshot.png` 1440x900 screenshot of site using headless chrome
+  - **DOM Dump:** `output.html` DOM Dump of the HTML after rendering using headless chrome
+- **Article Text:** `article.html/json` Article text extraction using Readability & Mercury
+- **Archive.org Permalink:** `archive.org.txt` A link to the saved site on archive.org
+- **Audio & Video:** `media/` all audio/video files + playlists, including subtitles & metadata with youtube-dl
+- **Source Code:** `git/` clone of any repository found on github, bitbucket, or gitlab links
+- _More coming soon! See the [Roadmap](https://github.com/ArchiveBox/ArchiveBox/wiki/Roadmap)..._
+It does everything out-of-the-box by default, but you can disable or tweak [individual archive methods](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration) via environment variables / config.
+```bash
+# archivebox config --help
+archivebox config # see all currently configured options
+archivebox config --set SAVE_ARCHIVE_DOT_ORG=False
+archivebox config --set YOUTUBEDL_ARGS='--max-filesize=500m'
+```
+<br/>
+## Static Archive Exporting
+You can export the main index to browse it statically without needing to run a server.
+*Note about large exports: these exports are not paginated, so exporting many URLs or the entire archive at once may be slow. Use the filtering CLI flags on the `archivebox list` command to export specific Snapshots or ranges.*
+```bash
+# archivebox list --help
+archivebox list --html --with-headers > index.html     # export to static html table
+archivebox list --json --with-headers > index.json     # export to json blob
+archivebox list --csv=timestamp,url,title > index.csv  # export to csv spreadsheet
+# (if using docker-compose, add the -T flag when piping)
+docker-compose run -T archivebox list --html --filter-type=search snozzberries > index.html
+```
+The paths in the static exports are relative, so keep the exported files next to your `./archive` folder when backing them up or viewing them.
+<br/>
+## Dependencies
+For better security, easier updating, and to avoid polluting your host system with extra dependencies, **it is strongly recommended to use the official [Docker image](https://github.com/ArchiveBox/ArchiveBox/wiki/Docker)** with everything preinstalled for the best experience.
+To achieve high fidelity archives in as many situations as possible, ArchiveBox depends on a variety of 3rd-party tools and libraries that specialize in extracting different types of content. These optional dependencies used for archiving sites include:
+- `chromium` / `chrome` (for screenshots, PDF, DOM HTML, and headless JS scripts)
+- `node` & `npm` (for readability, mercury, and singlefile)
+- `wget` (for plain HTML, static files, and WARC saving)
+- `curl` (for fetching headers, favicon, and posting to Archive.org)
+- `youtube-dl` (for audio, video, and subtitles)
+- `git` (for cloning git repos)
+- and more as we grow...
+You don't need to install every dependency to use ArchiveBox. ArchiveBox will automatically disable extractors that rely on dependencies that aren't installed, based on what is configured and available in your `$PATH`.
+*If using Docker, you don't have to install any of these manually; all dependencies are set up properly out-of-the-box.*
+However, if you prefer not to use Docker, you *can* install ArchiveBox and its dependencies using your [system package manager](https://github.com/ArchiveBox/ArchiveBox/wiki/Install) or `pip` directly on any Linux/macOS system. Just make sure to keep the dependencies up-to-date and check that ArchiveBox isn't reporting any incompatibility with the versions you install.
+```bash
+# install python3 and archivebox with your system package manager
+# apt/brew/pip/etc install ... (see Quickstart instructions above)
+archivebox setup       # auto install all the extractors and extras
+archivebox --version   # see info and check validity of installed dependencies
+```
+Installing directly on **Windows without Docker or WSL/WSL2/Cygwin is not officially supported**, but some advanced users have reported getting it working.
+
+%package help
+Summary:	Development documents and examples for archivebox
+Provides:	python3-archivebox-doc
+%description help
+<div align="center">
+<img src="https://i.imgur.com/OUmgdlH.png" width="96%" alt="lego">
+</div>
+<br/>
+# Overview
+## Input formats
+ArchiveBox supports many input formats for URLs, including Pocket & Pinboard exports, Browser bookmarks, Browser history, plain text, HTML, markdown, and more!
+*Click these links for instructions on how to prepare your links from these sources:*
+- <img src="https://nicksweeting.com/images/rss.svg" height="22px"/> TXT, RSS, XML, JSON, CSV, SQL, HTML, Markdown, or [any other text-based format...](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Import-a-list-of-URLs-from-a-text-file)
+- <img src="https://nicksweeting.com/images/bookmarks.png" height="22px"/> [Browser history](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) or [browser bookmarks](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) (see instructions for: [Chrome](https://support.google.com/chrome/answer/96816?hl=en), [Firefox](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), [Safari](http://i.imgur.com/AtcvUZA.png), [IE](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows), [Opera](http://help.opera.com/Windows/12.10/en/importexport.html), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive))
+- <img src="https://getpocket.com/favicon.ico" height="22px"/> [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Instapaper](https://www.instapaper.com/user/export), [Shaarli](https://shaarli.readthedocs.io/en/master/Usage/#importexport), [Delicious](https://www.groovypost.com/howto/howto/export-delicious-bookmarks-xml/), [Reddit Saved](https://github.com/csu/export-saved-reddit), [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html), [Unmark.it](http://help.unmark.it/import-export), [OneTab](https://www.addictivetips.com/web/onetab-save-close-all-chrome-tabs-to-restore-export-or-import/), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive)
+```bash
+# archivebox add --help
+archivebox add 'https://example.com/some/page'
+archivebox add < ~/Downloads/firefox_bookmarks_export.html
+archivebox add --depth=1 'https://news.ycombinator.com#2020-12-12'
+echo 'http://example.com' | archivebox add
+echo 'any_text_with [urls](https://example.com) in it' | archivebox add
+# (if using docker add -i when piping stdin)
+echo 'https://example.com' | docker run -v $PWD:/data -i archivebox/archivebox add
+# (if using docker-compose add -T when piping stdin / stdout)
+echo 'https://example.com' | docker-compose run -T archivebox add
+```
+See the [Usage: CLI](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage) page for documentation and examples.
+It also includes a built-in scheduled import feature with `archivebox schedule` and browser bookmarklet, so you can pull in URLs from RSS feeds, websites, or the filesystem regularly/on-demand.
+<br/>
+## Archive Layout
+All of ArchiveBox's state (including the index, snapshot data, and config file) is stored in a single folder called the "ArchiveBox data folder". All `archivebox` CLI commands must be run from inside this folder, and you first create it by running `archivebox init`.
+The on-disk layout is optimized to be easy to browse by hand and durable long-term. The main index is a standard `index.sqlite3` database in the root of the data folder (it can also be exported as static JSON/HTML), and the archive snapshots are organized by date-added timestamp in the `./archive/` subfolder.
+```bash
+./
+    index.sqlite3
+    ArchiveBox.conf
+    archive/
+        1617687755/
+            index.html
+            index.json
+            screenshot.png
+            media/some_video.mp4
+            warc/1617687755.warc.gz
+            git/somerepo.git
+```
+Each snapshot subfolder `./archive/<timestamp>/` includes a static `index.json` and `index.html` describing its contents, and the snapshot extractor outputs are plain files within the folder.
+<br/>
+## Output formats
+Inside each Snapshot folder, ArchiveBox saves these different types of extractor outputs as plain files:
+`./archive/<timestamp>/*`
+- **Index:** `index.html` & `index.json` HTML and JSON index files containing metadata and details
+- **Title**, **Favicon**, **Headers** Response headers, site favicon, and parsed site title
+- **SingleFile:** `singlefile.html` HTML snapshot rendered with headless Chrome using SingleFile
+- **Wget Clone:** `example.com/page-name.html` wget clone of the site with  `warc/<timestamp>.gz`
+- Chrome Headless
+  - **PDF:** `output.pdf` Printed PDF of site using headless chrome
+  - **Screenshot:** `screenshot.png` 1440x900 screenshot of site using headless chrome
+  - **DOM Dump:** `output.html` DOM Dump of the HTML after rendering using headless chrome
+- **Article Text:** `article.html/json` Article text extraction using Readability & Mercury
+- **Archive.org Permalink:** `archive.org.txt` A link to the saved site on archive.org
+- **Audio & Video:** `media/` all audio/video files + playlists, including subtitles & metadata with youtube-dl
+- **Source Code:** `git/` clone of any repository found on github, bitbucket, or gitlab links
+- _More coming soon! See the [Roadmap](https://github.com/ArchiveBox/ArchiveBox/wiki/Roadmap)..._
+It does everything out-of-the-box by default, but you can disable or tweak [individual archive methods](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration) via environment variables / config.
+```bash
+# archivebox config --help
+archivebox config # see all currently configured options
+archivebox config --set SAVE_ARCHIVE_DOT_ORG=False
+archivebox config --set YOUTUBEDL_ARGS='--max-filesize=500m'
+```
+<br/>
+## Static Archive Exporting
+You can export the main index to browse it statically without needing to run a server.
+*Note about large exports: these exports are not paginated, so exporting many URLs or the entire archive at once may be slow. Use the filtering CLI flags on the `archivebox list` command to export specific Snapshots or ranges.*
+```bash
+# archivebox list --help
+archivebox list --html --with-headers > index.html     # export to static html table
+archivebox list --json --with-headers > index.json     # export to json blob
+archivebox list --csv=timestamp,url,title > index.csv  # export to csv spreadsheet
+# (if using docker-compose, add the -T flag when piping)
+docker-compose run -T archivebox list --html --filter-type=search snozzberries > index.html
+```
+The paths in the static exports are relative, so keep the exported files next to your `./archive` folder when backing them up or viewing them.
+<br/>
+## Dependencies
+For better security, easier updating, and to avoid polluting your host system with extra dependencies, **it is strongly recommended to use the official [Docker image](https://github.com/ArchiveBox/ArchiveBox/wiki/Docker)** with everything preinstalled for the best experience.
+To achieve high fidelity archives in as many situations as possible, ArchiveBox depends on a variety of 3rd-party tools and libraries that specialize in extracting different types of content. These optional dependencies used for archiving sites include:
+- `chromium` / `chrome` (for screenshots, PDF, DOM HTML, and headless JS scripts)
+- `node` & `npm` (for readability, mercury, and singlefile)
+- `wget` (for plain HTML, static files, and WARC saving)
+- `curl` (for fetching headers, favicon, and posting to Archive.org)
+- `youtube-dl` (for audio, video, and subtitles)
+- `git` (for cloning git repos)
+- and more as we grow...
+You don't need to install every dependency to use ArchiveBox. ArchiveBox will automatically disable extractors that rely on dependencies that aren't installed, based on what is configured and available in your `$PATH`.
+*If using Docker, you don't have to install any of these manually; all dependencies are set up properly out-of-the-box.*
+However, if you prefer not to use Docker, you *can* install ArchiveBox and its dependencies using your [system package manager](https://github.com/ArchiveBox/ArchiveBox/wiki/Install) or `pip` directly on any Linux/macOS system. Just make sure to keep the dependencies up-to-date and check that ArchiveBox isn't reporting any incompatibility with the versions you install.
+```bash
+# install python3 and archivebox with your system package manager
+# apt/brew/pip/etc install ... (see Quickstart instructions above)
+archivebox setup       # auto install all the extractors and extras
+archivebox --version   # see info and check validity of installed dependencies
+```
+Installing directly on **Windows without Docker or WSL/WSL2/Cygwin is not officially supported**, but some advanced users have reported getting it working.
+
+%prep
+%autosetup -n archivebox-0.6.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-archivebox -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Thu May 18 2023 Python_Bot <Python_Bot@openeuler.org> - 0.6.2-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..b26ccc3
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+c55e48402623a9cb4a5ac42fc5c873cf  archivebox-0.6.2.tar.gz
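
The `sources` entry in this commit pairs an MD5 digest with the tarball name. The sketch below shows one way a downloaded tarball might be checked against that entry before building. `verify_source` is a hypothetical helper, not part of the repository or of any openEuler tooling, and it assumes GNU coreutils `md5sum` is available and that each `sources` line has the `<md5>  <filename>` shape shown in the diff.

```shell
#!/bin/sh
# verify_source SOURCES_FILE TARBALL   (hypothetical helper, for illustration)
# Returns 0 if the MD5 recorded for the tarball's basename in SOURCES_FILE
# matches the MD5 of the local file, 1 on mismatch, 2 if no entry is found.
verify_source() {
    sources_file=$1
    tarball=$2
    name=$(basename "$tarball")

    # Look up the digest recorded for this filename (column 2 is the name).
    expected=$(awk -v f="$name" '$2 == f { print $1 }' "$sources_file")
    if [ -z "$expected" ]; then
        echo "no entry for $name in $sources_file" >&2
        return 2
    fi

    # md5sum prints "<digest>  <path>"; keep only the digest.
    actual=$(md5sum "$tarball" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}
```

A typical call, assuming the tarball sits next to `sources`, would be `verify_source sources archivebox-0.6.2.tar.gz && echo OK`.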