%global _empty_manifest_terminate_build 0
Name:		python-S3WebCache
Version:	0.2.2
Release:	1
Summary:	A caching layer that archives web pages (HTML) to Amazon S3
License:	MIT
URL:		https://pypi.org/project/S3WebCache/
Source0:	https://mirrors.aliyun.com/pypi/web/packages/a5/46/ea7dfad1bf73ab6a179b387da3832a086d488e730f4194b20b041b95e6a7/S3WebCache-0.2.2.tar.gz
BuildArch:	noarch

%description
[![Build Status](https://travis-ci.org/wharton/S3WebCache.svg?branch=master)](https://travis-ci.org/wharton/S3WebCache)
[![PyPI version](https://badge.fury.io/py/S3WebCache.svg)](https://badge.fury.io/py/S3WebCache)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# S3 Web Cache

This is a simple package for archiving web pages (HTML) to S3. It acts as a cache, returning the S3 version of a page if one exists; if not, it fetches the URL with [Requests](http://docs.python-requests.org/en/master/) and archives the response in S3.

Our use case is to provide a reusable history of the pages included in a web scrape: an archived version of a particular URL at a moment in time. Since the web is always changing, this allows different research questions to be asked at a later date without losing the original content. Please only use it in this manner if you have obtained permission for the pages you are requesting.

## Quickstart

### Install

`pip install s3webcache`

### Usage

```
from s3webcache import S3WebCache

s3wc = S3WebCache(
    bucket_name="my-bucket",
    aws_access_key_id="...",
    aws_secret_key="...",
    aws_default_region="us-east-1")

request = s3wc.get("https://en.wikipedia.org/wiki/Whole_Earth_Catalog")

if request.success:
    html = request.message
```

If the required AWS credentials are not given, it will fall back to using environment variables.
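A minimal sketch of that fallback, assuming the library defers to boto3 and therefore picks up the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION` variables (the exact variables consulted are an assumption, not confirmed by the upstream docs):

```
from s3webcache import S3WebCache

# Assumes the standard AWS environment variables are already exported:
#   export AWS_ACCESS_KEY_ID=AKIA...
#   export AWS_SECRET_ACCESS_KEY=...
#   export AWS_DEFAULT_REGION=us-east-1
# Which variables are read is an assumption based on boto3's defaults.
s3wc = S3WebCache(bucket_name="my-bucket")
```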
The `.get(url)` operation returns a namedtuple, `Request(success: bool, message: str)`. For successful operations, `.message` contains the URL's content. For unsuccessful operations, `.message` contains error information.
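Since the same field carries either page content or an error description, callers should branch on `success` before using `message`. A short sketch of that pattern (the output file name and the logging choice are illustrative, not part of the library):

```
import logging

from s3webcache import S3WebCache

s3wc = S3WebCache(bucket_name="my-bucket")
result = s3wc.get("https://en.wikipedia.org/wiki/Whole_Earth_Catalog")

if result.success:
    # message holds the page HTML, whether served from S3 or freshly fetched
    with open("whole_earth_catalog.html", "w") as fh:
        fh.write(result.message)
else:
    # message holds error details rather than page content
    logging.error(f"fetch failed: {result.message}")
```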
### Options

S3WebCache() takes the following arguments, with these defaults:

- bucket_name: str
- path_prefix: str = None\
  Subdirectory under which to store URLs. `path_prefix='ht'` will start archiving at the path s3://BUCKETNAME/ht/
- aws_access_key_id: str = None
- aws_secret_key: str = None
- aws_default_region: str = None
- trim_website: bool = False\
  Trim the hostname out of the stored key. The default is to keep the hostname with its dots replaced by underscores, so `https://github.com/wharton/S3WebCache` would be stored as `s3://BUCKETNAME/github_com/wharton/S3WebCache`.\
  Set this to True and it will be stored as `s3://BUCKETNAME/wharton/S3WebCache` (see the key-mapping sketch after this list).
- allow_forwarding: bool = True\
  Will follow HTTP 3xx redirects.
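To make the key layout concrete, the sketch below reimplements the documented mapping from URL to S3 key. It is an illustration of the behavior described above, not the library's actual internals:

```
from urllib.parse import urlparse

def s3_key_for(url, path_prefix=None, trim_website=False):
    # Illustrative reconstruction of the documented key layout.
    parsed = urlparse(url)
    parts = []
    if path_prefix:
        parts.append(path_prefix.strip("/"))
    if not trim_website:
        # Default behavior: keep the hostname, dots replaced by underscores.
        parts.append(parsed.netloc.replace(".", "_"))
    parts.append(parsed.path.lstrip("/"))
    return "/".join(parts)

# s3_key_for("https://github.com/wharton/S3WebCache")
#   -> "github_com/wharton/S3WebCache"
# s3_key_for("https://github.com/wharton/S3WebCache", trim_website=True)
#   -> "wharton/S3WebCache"
# s3_key_for("https://github.com/wharton/S3WebCache", path_prefix="ht")
#   -> "ht/github_com/wharton/S3WebCache"
```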
## TODO

- Add 'update s3 if file is older than...' behavior
- Add transparent compression by default (gzip, lz4, etc)
- Add rate limiting

## Reference

[AWS S3 API documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html)

## License

MIT

## Tests

Run through Travis CI.

%package -n python3-S3WebCache
Summary:	A caching layer that archives web pages (HTML) to Amazon S3
Provides:	python-S3WebCache
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip

%description -n python3-S3WebCache
[![Build Status](https://travis-ci.org/wharton/S3WebCache.svg?branch=master)](https://travis-ci.org/wharton/S3WebCache)
[![PyPI version](https://badge.fury.io/py/S3WebCache.svg)](https://badge.fury.io/py/S3WebCache)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# S3 Web Cache

This is a simple package for archiving web pages (HTML) to S3. It acts as a cache, returning the S3 version of a page if one exists; if not, it fetches the URL with [Requests](http://docs.python-requests.org/en/master/) and archives the response in S3.

Our use case is to provide a reusable history of the pages included in a web scrape: an archived version of a particular URL at a moment in time. Since the web is always changing, this allows different research questions to be asked at a later date without losing the original content. Please only use it in this manner if you have obtained permission for the pages you are requesting.

## Quickstart

### Install

`pip install s3webcache`

### Usage

```
from s3webcache import S3WebCache

s3wc = S3WebCache(
    bucket_name="my-bucket",
    aws_access_key_id="...",
    aws_secret_key="...",
    aws_default_region="us-east-1")

request = s3wc.get("https://en.wikipedia.org/wiki/Whole_Earth_Catalog")

if request.success:
    html = request.message
```

If the required AWS credentials are not given, it will fall back to using environment variables.

The `.get(url)` operation returns a namedtuple, `Request(success: bool, message: str)`. For successful operations, `.message` contains the URL's content. For unsuccessful operations, `.message` contains error information.

### Options

S3WebCache() takes the following arguments, with these defaults:

- bucket_name: str
- path_prefix: str = None\
  Subdirectory under which to store URLs. `path_prefix='ht'` will start archiving at the path s3://BUCKETNAME/ht/
- aws_access_key_id: str = None
- aws_secret_key: str = None
- aws_default_region: str = None
- trim_website: bool = False\
  Trim the hostname out of the stored key. The default is to keep the hostname with its dots replaced by underscores, so `https://github.com/wharton/S3WebCache` would be stored as `s3://BUCKETNAME/github_com/wharton/S3WebCache`.\
  Set this to True and it will be stored as `s3://BUCKETNAME/wharton/S3WebCache`.
- allow_forwarding: bool = True\
  Will follow HTTP 3xx redirects.

## TODO

- Add 'update s3 if file is older than...' behavior
- Add transparent compression by default (gzip, lz4, etc)
- Add rate limiting

## Reference

[AWS S3 API documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html)

## License

MIT

## Tests

Run through Travis CI.

%package help
Summary:	Development documents and examples for S3WebCache
Provides:	python3-S3WebCache-doc

%description help
[![Build Status](https://travis-ci.org/wharton/S3WebCache.svg?branch=master)](https://travis-ci.org/wharton/S3WebCache)
[![PyPI version](https://badge.fury.io/py/S3WebCache.svg)](https://badge.fury.io/py/S3WebCache)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# S3 Web Cache

This is a simple package for archiving web pages (HTML) to S3. It acts as a cache, returning the S3 version of a page if one exists; if not, it fetches the URL with [Requests](http://docs.python-requests.org/en/master/) and archives the response in S3.

Our use case is to provide a reusable history of the pages included in a web scrape: an archived version of a particular URL at a moment in time. Since the web is always changing, this allows different research questions to be asked at a later date without losing the original content. Please only use it in this manner if you have obtained permission for the pages you are requesting.

## Quickstart

### Install

`pip install s3webcache`

### Usage

```
from s3webcache import S3WebCache

s3wc = S3WebCache(
    bucket_name="my-bucket",
    aws_access_key_id="...",
    aws_secret_key="...",
    aws_default_region="us-east-1")

request = s3wc.get("https://en.wikipedia.org/wiki/Whole_Earth_Catalog")

if request.success:
    html = request.message
```

If the required AWS credentials are not given, it will fall back to using environment variables.

The `.get(url)` operation returns a namedtuple, `Request(success: bool, message: str)`. For successful operations, `.message` contains the URL's content. For unsuccessful operations, `.message` contains error information.

### Options

S3WebCache() takes the following arguments, with these defaults:

- bucket_name: str
- path_prefix: str = None\
  Subdirectory under which to store URLs. `path_prefix='ht'` will start archiving at the path s3://BUCKETNAME/ht/
- aws_access_key_id: str = None
- aws_secret_key: str = None
- aws_default_region: str = None
- trim_website: bool = False\
  Trim the hostname out of the stored key. The default is to keep the hostname with its dots replaced by underscores, so `https://github.com/wharton/S3WebCache` would be stored as `s3://BUCKETNAME/github_com/wharton/S3WebCache`.\
  Set this to True and it will be stored as `s3://BUCKETNAME/wharton/S3WebCache`.
- allow_forwarding: bool = True\
  Will follow HTTP 3xx redirects.

## TODO

- Add 'update s3 if file is older than...' behavior
- Add transparent compression by default (gzip, lz4, etc)
- Add rate limiting

## Reference

[AWS S3 API documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html)

## License

MIT

## Tests

Run through Travis CI.

%prep
%autosetup -n S3WebCache-0.2.2

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
# Ship any upstream documentation or example directories with the help package.
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
# Collect the installed files into manifests consumed by the file sections below.
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-S3WebCache -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.2.2-1
- Package Spec generated