author: CoprDistGit <infra@openeuler.org> 2023-05-05 06:20:28 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-05-05 06:20:28 +0000
commit: 1129bedaafcaf025910ab5ca5cbc26ddd47488bd
tree: bf523e548a057a613ea1c280d97443a0eb623ef1 /python-webpreview.spec
parent: d8a11786f0ce99205ec785a9813447d88d09f2b2
automatic import of python-webpreview (openeuler20.03)
Diffstat (limited to 'python-webpreview.spec')
-rw-r--r-- python-webpreview.spec | 548
1 files changed, 548 insertions, 0 deletions
diff --git a/python-webpreview.spec b/python-webpreview.spec
new file mode 100644
index 0000000..887f088
--- /dev/null
+++ b/python-webpreview.spec
@@ -0,0 +1,548 @@
+%global _empty_manifest_terminate_build 0
+Name: python-webpreview
+Version: 1.7.2
+Release: 1
+Summary: Extracts OpenGraph, TwitterCard and Schema properties from a webpage.
+License: MIT
+URL: https://github.com/ludbek/webpreview
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/6e/81/c8ae4f53ba30a3d36b47c128a3e723e1fa6159a7208655283dcaf73f8d05/webpreview-1.7.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-requests
+Requires: python3-beautifulsoup4
+
+%description
+# webpreview
+
+For a given URL, `webpreview` extracts its **title**, **description**, and **image url** using
+[Open Graph](http://ogp.me/), [Twitter Card](https://dev.twitter.com/cards/overview), or
+[Schema](http://schema.org/) meta tags, or, as an alternative, parses it as a generic webpage.
+
+<p>
+ <a href="https://pypi.org/project/webpreview/"><img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/webpreview"></a>
+ <a href="https://pypi.org/project/webpreview/"><img alt="PyPI" src="https://img.shields.io/pypi/v/webpreview?logo=pypi&color=blue"></a>
+ <a href="https://github.com/ludbek/webpreview/actions?query=workflow%3Atest"><img alt="Build status" src="https://img.shields.io/github/workflow/status/ludbek/webpreview/test?label=build&logo=github"></a>
+ <a href="https://codecov.io/gh/ludbek/webpreview"><img alt="Code coverage report" src="https://img.shields.io/codecov/c/github/ludbek/webpreview?logo=codecov"></a>
+</p>
+
+
+## Installation
+
+```shell
+pip install webpreview
+```
+
+## Usage
+
+Use the generic `webpreview` method (added in *v1.7.0*) to parse the page independent of its nature.
+This method fetches a page and tries to extract a *title, description, and a preview image* from it.
+
+It first attempts to parse the values from **Open Graph** properties, then it falls back to
+**Twitter Card** format, and then to **Schema**. If none of these methods succeed in extracting all
+three properties, then the web page's content is parsed using a generic HTML parser.
+
+```python
+>>> from webpreview import webpreview
+
+>>> p = webpreview("https://en.wikipedia.org/wiki/Enrico_Fermi")
+>>> p.title
+'Enrico Fermi - Wikipedia'
+>>> p.description
+'Italian-American physicist (1901–1954)'
+>>> p.image
+'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg'
+
+# Access the parsed fields both as attributes and items
+>>> p["url"] == p.url
+True
+
+# Check if all three of the title, description, and image are in the parsing result
+>>> p.is_complete()
+True
+
+# Provide page content from somewhere else
+>>> content = """
+<html>
+ <head>
+ <title>The Dormouse's story</title>
+ <meta property="og:description" content="A Mad Tea-Party story" />
+ </head>
+ <body>
+ <p class="title"><b>The Dormouse's story</b></p>
+ <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
+ </body>
+</html>
+"""
+
+# The function's invocation won't make any external calls,
+# only relying on the supplied content, unlike the example above
+>>> webpreview("aa.com", content=content)
+WebPreview(url="http://aa.com", title="The Dormouse's story", description="A Mad Tea-Party story")
+```
+
+### Using the command line
+
+When `webpreview` is installed via `pip`, then the accompanying command-line tool is
+installed alongside.
+
+```shell
+$ webpreview https://en.wikipedia.org/wiki/Enrico_Fermi
+title: Enrico Fermi - Wikipedia
+description: Italian-American physicist (1901–1954)
+image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
+
+$ webpreview https://github.com/ --absolute-url
+title: GitHub: Where the world builds software
+description: GitHub is where over 83 million developers shape the future of software, together.
+image: https://github.githubassets.com/images/modules/site/social-cards/github-social.png
+```
+
+### Using compatibility API
+
+Before *v1.7.0*, the package mainly exposed a different set of API methods.
+All of them are supported and may continue to be used.
+
+```python
+# WARNING:
+# The API below is left for BACKWARD COMPATIBILITY ONLY.
+
+from webpreview import web_preview
+title, description, image = web_preview("aurl.com")
+
+# specifying a timeout, which gets passed to requests.get()
+title, description, image = web_preview("a_slow_url.com", timeout=1000)
+
+# passing headers
+headers = {'User-Agent': 'Mozilla/5.0'}
+title, description, image = web_preview("a_slow_url.com", headers=headers)
+
+# pass html content thus avoiding making http call again to fetch content.
+content = """<html><head><title>Dummy HTML</title></head></html>"""
+title, description, image = web_preview("aurl.com", content=content)
+
+# specifying the parser
+# by default webpreview uses 'html.parser'
+title, description, image = web_preview("aurl.com", content=content, parser='lxml')
+```
+
+## Run with Docker
+
+The Docker image can be built and run similarly to the command line.
+The default entry point is the `webpreview` command-line function.
+
+```shell
+$ docker build -t webpreview .
+$ docker run -it --rm webpreview "https://en.m.wikipedia.org/wiki/Enrico_Fermi"
+title: Enrico Fermi - Wikipedia
+description: Enrico Fermi (Italian: [enˈriːko ˈfermi]; 29 September 1901 – 28 November 1954) was an Italian (later naturalized American) physicist and the creator of the world's first nuclear reactor, the Chicago Pile-1. He has been called the "architect of the nuclear age"[1] and the "architect of the atomic bomb".
+image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
+```
+
+*Note*: the built Docker image weighs around 210 MB.
+
+## Testing
+
+```shell
+# Execute the tests
+poetry run pytest webpreview
+
+# OR execute until the first failed test
+poetry run pytest webpreview -x
+```
+
+## Setting up development environment
+
+```shell
+# Install a correct minimal supported version of python
+pyenv install 3.7.13
+
+# Create a virtual environment
+# By default, the project already contains a .python-version file that points
+# to 3.7.13.
+python -m venv .venv
+
+# Install dependencies
+# Poetry will automatically install them into the local .venv
+poetry install
+
+# If you see an error like this:
+# ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
+
+# Then do this:
+.venv/bin/pip install --upgrade setuptools
+```
+
+%package -n python3-webpreview
+Summary: Extracts OpenGraph, TwitterCard and Schema properties from a webpage.
+Provides: python-webpreview
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-webpreview
+# webpreview
+
+For a given URL, `webpreview` extracts its **title**, **description**, and **image url** using
+[Open Graph](http://ogp.me/), [Twitter Card](https://dev.twitter.com/cards/overview), or
+[Schema](http://schema.org/) meta tags, or, as an alternative, parses it as a generic webpage.
+
+<p>
+ <a href="https://pypi.org/project/webpreview/"><img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/webpreview"></a>
+ <a href="https://pypi.org/project/webpreview/"><img alt="PyPI" src="https://img.shields.io/pypi/v/webpreview?logo=pypi&color=blue"></a>
+ <a href="https://github.com/ludbek/webpreview/actions?query=workflow%3Atest"><img alt="Build status" src="https://img.shields.io/github/workflow/status/ludbek/webpreview/test?label=build&logo=github"></a>
+ <a href="https://codecov.io/gh/ludbek/webpreview"><img alt="Code coverage report" src="https://img.shields.io/codecov/c/github/ludbek/webpreview?logo=codecov"></a>
+</p>
+
+
+## Installation
+
+```shell
+pip install webpreview
+```
+
+## Usage
+
+Use the generic `webpreview` method (added in *v1.7.0*) to parse the page independent of its nature.
+This method fetches a page and tries to extract a *title, description, and a preview image* from it.
+
+It first attempts to parse the values from **Open Graph** properties, then it falls back to
+**Twitter Card** format, and then to **Schema**. If none of these methods succeed in extracting all
+three properties, then the web page's content is parsed using a generic HTML parser.
+
+```python
+>>> from webpreview import webpreview
+
+>>> p = webpreview("https://en.wikipedia.org/wiki/Enrico_Fermi")
+>>> p.title
+'Enrico Fermi - Wikipedia'
+>>> p.description
+'Italian-American physicist (1901–1954)'
+>>> p.image
+'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg'
+
+# Access the parsed fields both as attributes and items
+>>> p["url"] == p.url
+True
+
+# Check if all three of the title, description, and image are in the parsing result
+>>> p.is_complete()
+True
+
+# Provide page content from somewhere else
+>>> content = """
+<html>
+ <head>
+ <title>The Dormouse's story</title>
+ <meta property="og:description" content="A Mad Tea-Party story" />
+ </head>
+ <body>
+ <p class="title"><b>The Dormouse's story</b></p>
+ <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
+ </body>
+</html>
+"""
+
+# The function's invocation won't make any external calls,
+# only relying on the supplied content, unlike the example above
+>>> webpreview("aa.com", content=content)
+WebPreview(url="http://aa.com", title="The Dormouse's story", description="A Mad Tea-Party story")
+```
+
+### Using the command line
+
+When `webpreview` is installed via `pip`, then the accompanying command-line tool is
+installed alongside.
+
+```shell
+$ webpreview https://en.wikipedia.org/wiki/Enrico_Fermi
+title: Enrico Fermi - Wikipedia
+description: Italian-American physicist (1901–1954)
+image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
+
+$ webpreview https://github.com/ --absolute-url
+title: GitHub: Where the world builds software
+description: GitHub is where over 83 million developers shape the future of software, together.
+image: https://github.githubassets.com/images/modules/site/social-cards/github-social.png
+```
+
+### Using compatibility API
+
+Before *v1.7.0*, the package mainly exposed a different set of API methods.
+All of them are supported and may continue to be used.
+
+```python
+# WARNING:
+# The API below is left for BACKWARD COMPATIBILITY ONLY.
+
+from webpreview import web_preview
+title, description, image = web_preview("aurl.com")
+
+# specifying a timeout, which gets passed to requests.get()
+title, description, image = web_preview("a_slow_url.com", timeout=1000)
+
+# passing headers
+headers = {'User-Agent': 'Mozilla/5.0'}
+title, description, image = web_preview("a_slow_url.com", headers=headers)
+
+# pass html content thus avoiding making http call again to fetch content.
+content = """<html><head><title>Dummy HTML</title></head></html>"""
+title, description, image = web_preview("aurl.com", content=content)
+
+# specifying the parser
+# by default webpreview uses 'html.parser'
+title, description, image = web_preview("aurl.com", content=content, parser='lxml')
+```
+
+## Run with Docker
+
+The Docker image can be built and run similarly to the command line.
+The default entry point is the `webpreview` command-line function.
+
+```shell
+$ docker build -t webpreview .
+$ docker run -it --rm webpreview "https://en.m.wikipedia.org/wiki/Enrico_Fermi"
+title: Enrico Fermi - Wikipedia
+description: Enrico Fermi (Italian: [enˈriːko ˈfermi]; 29 September 1901 – 28 November 1954) was an Italian (later naturalized American) physicist and the creator of the world's first nuclear reactor, the Chicago Pile-1. He has been called the "architect of the nuclear age"[1] and the "architect of the atomic bomb".
+image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
+```
+
+*Note*: the built Docker image weighs around 210 MB.
+
+## Testing
+
+```shell
+# Execute the tests
+poetry run pytest webpreview
+
+# OR execute until the first failed test
+poetry run pytest webpreview -x
+```
+
+## Setting up development environment
+
+```shell
+# Install a correct minimal supported version of python
+pyenv install 3.7.13
+
+# Create a virtual environment
+# By default, the project already contains a .python-version file that points
+# to 3.7.13.
+python -m venv .venv
+
+# Install dependencies
+# Poetry will automatically install them into the local .venv
+poetry install
+
+# If you see an error like this:
+# ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
+
+# Then do this:
+.venv/bin/pip install --upgrade setuptools
+```
+
+%package help
+Summary: Development documents and examples for webpreview
+Provides: python3-webpreview-doc
+%description help
+# webpreview
+
+For a given URL, `webpreview` extracts its **title**, **description**, and **image url** using
+[Open Graph](http://ogp.me/), [Twitter Card](https://dev.twitter.com/cards/overview), or
+[Schema](http://schema.org/) meta tags, or, as an alternative, parses it as a generic webpage.
+
+<p>
+ <a href="https://pypi.org/project/webpreview/"><img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/webpreview"></a>
+ <a href="https://pypi.org/project/webpreview/"><img alt="PyPI" src="https://img.shields.io/pypi/v/webpreview?logo=pypi&color=blue"></a>
+ <a href="https://github.com/ludbek/webpreview/actions?query=workflow%3Atest"><img alt="Build status" src="https://img.shields.io/github/workflow/status/ludbek/webpreview/test?label=build&logo=github"></a>
+ <a href="https://codecov.io/gh/ludbek/webpreview"><img alt="Code coverage report" src="https://img.shields.io/codecov/c/github/ludbek/webpreview?logo=codecov"></a>
+</p>
+
+
+## Installation
+
+```shell
+pip install webpreview
+```
+
+## Usage
+
+Use the generic `webpreview` method (added in *v1.7.0*) to parse the page independent of its nature.
+This method fetches a page and tries to extract a *title, description, and a preview image* from it.
+
+It first attempts to parse the values from **Open Graph** properties, then it falls back to
+**Twitter Card** format, and then to **Schema**. If none of these methods succeed in extracting all
+three properties, then the web page's content is parsed using a generic HTML parser.
+
+```python
+>>> from webpreview import webpreview
+
+>>> p = webpreview("https://en.wikipedia.org/wiki/Enrico_Fermi")
+>>> p.title
+'Enrico Fermi - Wikipedia'
+>>> p.description
+'Italian-American physicist (1901–1954)'
+>>> p.image
+'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg'
+
+# Access the parsed fields both as attributes and items
+>>> p["url"] == p.url
+True
+
+# Check if all three of the title, description, and image are in the parsing result
+>>> p.is_complete()
+True
+
+# Provide page content from somewhere else
+>>> content = """
+<html>
+ <head>
+ <title>The Dormouse's story</title>
+ <meta property="og:description" content="A Mad Tea-Party story" />
+ </head>
+ <body>
+ <p class="title"><b>The Dormouse's story</b></p>
+ <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
+ </body>
+</html>
+"""
+
+# The function's invocation won't make any external calls,
+# only relying on the supplied content, unlike the example above
+>>> webpreview("aa.com", content=content)
+WebPreview(url="http://aa.com", title="The Dormouse's story", description="A Mad Tea-Party story")
+```
+
+### Using the command line
+
+When `webpreview` is installed via `pip`, then the accompanying command-line tool is
+installed alongside.
+
+```shell
+$ webpreview https://en.wikipedia.org/wiki/Enrico_Fermi
+title: Enrico Fermi - Wikipedia
+description: Italian-American physicist (1901–1954)
+image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
+
+$ webpreview https://github.com/ --absolute-url
+title: GitHub: Where the world builds software
+description: GitHub is where over 83 million developers shape the future of software, together.
+image: https://github.githubassets.com/images/modules/site/social-cards/github-social.png
+```
+
+### Using compatibility API
+
+Before *v1.7.0*, the package mainly exposed a different set of API methods.
+All of them are supported and may continue to be used.
+
+```python
+# WARNING:
+# The API below is left for BACKWARD COMPATIBILITY ONLY.
+
+from webpreview import web_preview
+title, description, image = web_preview("aurl.com")
+
+# specifying a timeout, which gets passed to requests.get()
+title, description, image = web_preview("a_slow_url.com", timeout=1000)
+
+# passing headers
+headers = {'User-Agent': 'Mozilla/5.0'}
+title, description, image = web_preview("a_slow_url.com", headers=headers)
+
+# pass html content thus avoiding making http call again to fetch content.
+content = """<html><head><title>Dummy HTML</title></head></html>"""
+title, description, image = web_preview("aurl.com", content=content)
+
+# specifying the parser
+# by default webpreview uses 'html.parser'
+title, description, image = web_preview("aurl.com", content=content, parser='lxml')
+```
+
+## Run with Docker
+
+The Docker image can be built and run similarly to the command line.
+The default entry point is the `webpreview` command-line function.
+
+```shell
+$ docker build -t webpreview .
+$ docker run -it --rm webpreview "https://en.m.wikipedia.org/wiki/Enrico_Fermi"
+title: Enrico Fermi - Wikipedia
+description: Enrico Fermi (Italian: [enˈriːko ˈfermi]; 29 September 1901 – 28 November 1954) was an Italian (later naturalized American) physicist and the creator of the world's first nuclear reactor, the Chicago Pile-1. He has been called the "architect of the nuclear age"[1] and the "architect of the atomic bomb".
+image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
+```
+
+*Note*: the built Docker image weighs around 210 MB.
+
+## Testing
+
+```shell
+# Execute the tests
+poetry run pytest webpreview
+
+# OR execute until the first failed test
+poetry run pytest webpreview -x
+```
+
+## Setting up development environment
+
+```shell
+# Install a correct minimal supported version of python
+pyenv install 3.7.13
+
+# Create a virtual environment
+# By default, the project already contains a .python-version file that points
+# to 3.7.13.
+python -m venv .venv
+
+# Install dependencies
+# Poetry will automatically install them into the local .venv
+poetry install
+
+# If you see an error like this:
+# ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
+
+# Then do this:
+.venv/bin/pip install --upgrade setuptools
+```
+
+%prep
+%autosetup -n webpreview-1.7.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-webpreview -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 1.7.2-1
+- Package Spec generated
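
The README text embedded in the `%description` sections describes a fallback order: Open Graph first, then Twitter Card, then Schema, and finally a generic HTML parse. The sketch below illustrates that idea using only the Python standard library. It is not the `webpreview` implementation: `MetaCollector`, `SOURCES`, and `extract_preview` are hypothetical names, the exact meta-tag keys are assumptions, and merging fields across formats is inferred only from the README example in which the title comes from `<title>` and the description from `og:description`.

```python
from html.parser import HTMLParser

FIELDS = ("title", "description", "image")

# Assumed (attribute, value) pairs identifying each field, per format,
# listed in the fallback order the README describes.
SOURCES = [
    {f: ("property", "og:" + f) for f in FIELDS},    # Open Graph
    {f: ("name", "twitter:" + f) for f in FIELDS},   # Twitter Card
    {"title": ("itemprop", "name"),                  # Schema
     "description": ("itemprop", "description"),
     "image": ("itemprop", "image")},
]


class MetaCollector(HTMLParser):
    """Collect <meta> contents keyed by (attribute, value), plus <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("content"):
            for key in ("property", "name", "itemprop"):
                if attrs.get(key):
                    # Keep the first occurrence of each key.
                    self.meta.setdefault((key, attrs[key]), attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()


def extract_preview(html):
    """Fill title/description/image, trying each format in turn."""
    parser = MetaCollector()
    parser.feed(html)
    result = dict.fromkeys(FIELDS)
    for source in SOURCES:
        for field, key in source.items():
            if result[field] is None:
                result[field] = parser.meta.get(key)
        if all(result.values()):
            return result  # all three fields found, stop early
    # Generic fallback: <title> text and <meta name="description">.
    if result["title"] is None:
        result["title"] = parser.title
    if result["description"] is None:
        result["description"] = parser.meta.get(("name", "description"))
    return result


if __name__ == "__main__":
    page = """
    <html><head>
      <title>The Dormouse's story</title>
      <meta property="og:description" content="A Mad Tea-Party story" />
    </head><body></body></html>
    """
    print(extract_preview(page))
```

Because the function works on an HTML string and makes no network calls, it mirrors the `content=` usage shown in the README; fetching the page would be a separate step.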