%global _empty_manifest_terminate_build 0
Name: python-webpreview
Version: 1.7.2
Release: 1
Summary: Extracts OpenGraph, TwitterCard and Schema properties from a webpage.
License: MIT
URL: https://github.com/ludbek/webpreview
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/6e/81/c8ae4f53ba30a3d36b47c128a3e723e1fa6159a7208655283dcaf73f8d05/webpreview-1.7.2.tar.gz
BuildArch: noarch
Requires: python3-requests
Requires: python3-beautifulsoup4
%description
# webpreview
For a given URL, `webpreview` extracts its **title**, **description**, and **image url** using
[Open Graph](http://ogp.me/), [Twitter Card](https://dev.twitter.com/cards/overview), or
[Schema](http://schema.org/) meta tags, or, as an alternative, parses it as a generic webpage.
## Installation
```shell
pip install webpreview
```
## Usage
Use the generic `webpreview` method (added in *v1.7.0*) to parse the page independent of its nature.
This method fetches a page and tries to extracts a *title, description, and a preview image* from it.
It first attempts to parse the values from **Open Graph** properties, then it falls back to
**Twitter Card** format, and then to **Schema**. If none of these methods succeed in extracting all
three properties, then the web page's content is parsed using a generic HTML parser.
```python
>>> from webpreview import webpreview
>>> p = webpreview("https://en.wikipedia.org/wiki/Enrico_Fermi")
>>> p.title
'Enrico Fermi - Wikipedia'
>>> p.description
'Italian-American physicist (1901–1954)'
>>> p.image
'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg'
# Access the parsed fields both as attributes and items
>>> p["url"] == p.url
True
# Check if all three of the title, description, and image are in the parsing result
>>> p.is_complete()
True
# Provide page content from somewhere else
>>> content = """
The Dormouse's story
The Dormouse's story
Elsie
"""
# The the function's invocation won't make any external calls,
# only relying on the supplied content, unlike the example above
>>> webpreview("aa.com", content=content)
WebPreview(url="http://aa.com", title="The Dormouse's story", description="A Mad Tea-Party story")
```
### Using the command line
When `webpreview` is installed via `pip`, then the accompanying command-line tool is
installed alongside.
```shell
$ webpreview https://en.wikipedia.org/wiki/Enrico_Fermi
title: Enrico Fermi - Wikipedia
description: Italian-American physicist (1901–1954)
image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
$ webpreview https://github.com/ --absolute-url
title: GitHub: Where the world builds software
description: GitHub is where over 83 million developers shape the future of software, together.
image: https://github.githubassets.com/images/modules/site/social-cards/github-social.png
```
### Using compatibility API
Before *v1.7.0* the package mainly exposed a different set of the API methods.
All of them are supported and may continue to be used.
```python
# WARNING:
# The API below is left for BACKWARD COMPATIBILITY ONLY.
from webpreview import web_preview
title, description, image = web_preview("aurl.com")
# specifing timeout which gets passed to requests.get()
title, description, image = web_preview("a_slow_url.com", timeout=1000)
# passing headers
headers = {'User-Agent': 'Mozilla/5.0'}
title, description, image = web_preview("a_slow_url.com", headers=headers)
# pass html content thus avoiding making http call again to fetch content.
content = """Dummy HTML"""
title, description, image = web_preview("aurl.com", content=content)
# specifing the parser
# by default webpreview uses 'html.parser'
title, description, image = web_preview("aurl.com", content=content, parser='lxml')
```
## Run with Docker
The docker image can be built and ran similarly to the command line.
The default entry point is the `webpreview` command-line function.
```shell
$ docker build -t webpreview .
$ docker run -it --rm webpreview "https://en.m.wikipedia.org/wiki/Enrico_Fermi"
title: Enrico Fermi - Wikipedia
description: Enrico Fermi (Italian: [enˈriːko ˈfermi]; 29 September 1901 – 28 November 1954) was an Italian (later naturalized American) physicist and the creator of the world's first nuclear reactor, the Chicago Pile-1. He has been called the "architect of the nuclear age"[1] and the "architect of the atomic bomb".
image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
```
*Note*: built docker image weighs around 210MB.
## Testing
```shell
# Execute the tests
poetry run pytest webpreview
# OR execute until the first failed test
poetry run pytest webpreview -x
```
## Setting up development environment
```shell
# Install a correct minimal supported version of python
pyenv install 3.7.13
# Create a virtual environment
# By default, the project already contains a .python-version file that points
# to 3.7.13.
python -m venv .venv
# Install dependencies
# Poetry will automatically install them into the local .venv
poetry install
# If you have errors likes this:
ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
# Then do this:
.venv/bin/pip install --upgrade setuptools
```
%package -n python3-webpreview
Summary: Extracts OpenGraph, TwitterCard and Schema properties from a webpage.
Provides: python-webpreview
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-webpreview
# webpreview
For a given URL, `webpreview` extracts its **title**, **description**, and **image url** using
[Open Graph](http://ogp.me/), [Twitter Card](https://dev.twitter.com/cards/overview), or
[Schema](http://schema.org/) meta tags, or, as an alternative, parses it as a generic webpage.
## Installation
```shell
pip install webpreview
```
## Usage
Use the generic `webpreview` method (added in *v1.7.0*) to parse the page independent of its nature.
This method fetches a page and tries to extracts a *title, description, and a preview image* from it.
It first attempts to parse the values from **Open Graph** properties, then it falls back to
**Twitter Card** format, and then to **Schema**. If none of these methods succeed in extracting all
three properties, then the web page's content is parsed using a generic HTML parser.
```python
>>> from webpreview import webpreview
>>> p = webpreview("https://en.wikipedia.org/wiki/Enrico_Fermi")
>>> p.title
'Enrico Fermi - Wikipedia'
>>> p.description
'Italian-American physicist (1901–1954)'
>>> p.image
'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg'
# Access the parsed fields both as attributes and items
>>> p["url"] == p.url
True
# Check if all three of the title, description, and image are in the parsing result
>>> p.is_complete()
True
# Provide page content from somewhere else
>>> content = """
The Dormouse's story
The Dormouse's story
Elsie
"""
# The the function's invocation won't make any external calls,
# only relying on the supplied content, unlike the example above
>>> webpreview("aa.com", content=content)
WebPreview(url="http://aa.com", title="The Dormouse's story", description="A Mad Tea-Party story")
```
### Using the command line
When `webpreview` is installed via `pip`, then the accompanying command-line tool is
installed alongside.
```shell
$ webpreview https://en.wikipedia.org/wiki/Enrico_Fermi
title: Enrico Fermi - Wikipedia
description: Italian-American physicist (1901–1954)
image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
$ webpreview https://github.com/ --absolute-url
title: GitHub: Where the world builds software
description: GitHub is where over 83 million developers shape the future of software, together.
image: https://github.githubassets.com/images/modules/site/social-cards/github-social.png
```
### Using compatibility API
Before *v1.7.0* the package mainly exposed a different set of the API methods.
All of them are supported and may continue to be used.
```python
# WARNING:
# The API below is left for BACKWARD COMPATIBILITY ONLY.
from webpreview import web_preview
title, description, image = web_preview("aurl.com")
# specifing timeout which gets passed to requests.get()
title, description, image = web_preview("a_slow_url.com", timeout=1000)
# passing headers
headers = {'User-Agent': 'Mozilla/5.0'}
title, description, image = web_preview("a_slow_url.com", headers=headers)
# pass html content thus avoiding making http call again to fetch content.
content = """Dummy HTML"""
title, description, image = web_preview("aurl.com", content=content)
# specifing the parser
# by default webpreview uses 'html.parser'
title, description, image = web_preview("aurl.com", content=content, parser='lxml')
```
## Run with Docker
The docker image can be built and ran similarly to the command line.
The default entry point is the `webpreview` command-line function.
```shell
$ docker build -t webpreview .
$ docker run -it --rm webpreview "https://en.m.wikipedia.org/wiki/Enrico_Fermi"
title: Enrico Fermi - Wikipedia
description: Enrico Fermi (Italian: [enˈriːko ˈfermi]; 29 September 1901 – 28 November 1954) was an Italian (later naturalized American) physicist and the creator of the world's first nuclear reactor, the Chicago Pile-1. He has been called the "architect of the nuclear age"[1] and the "architect of the atomic bomb".
image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
```
*Note*: built docker image weighs around 210MB.
## Testing
```shell
# Execute the tests
poetry run pytest webpreview
# OR execute until the first failed test
poetry run pytest webpreview -x
```
## Setting up development environment
```shell
# Install a correct minimal supported version of python
pyenv install 3.7.13
# Create a virtual environment
# By default, the project already contains a .python-version file that points
# to 3.7.13.
python -m venv .venv
# Install dependencies
# Poetry will automatically install them into the local .venv
poetry install
# If you have errors likes this:
ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
# Then do this:
.venv/bin/pip install --upgrade setuptools
```
%package help
Summary: Development documents and examples for webpreview
Provides: python3-webpreview-doc
%description help
# webpreview
For a given URL, `webpreview` extracts its **title**, **description**, and **image url** using
[Open Graph](http://ogp.me/), [Twitter Card](https://dev.twitter.com/cards/overview), or
[Schema](http://schema.org/) meta tags, or, as an alternative, parses it as a generic webpage.
## Installation
```shell
pip install webpreview
```
## Usage
Use the generic `webpreview` method (added in *v1.7.0*) to parse the page independent of its nature.
This method fetches a page and tries to extracts a *title, description, and a preview image* from it.
It first attempts to parse the values from **Open Graph** properties, then it falls back to
**Twitter Card** format, and then to **Schema**. If none of these methods succeed in extracting all
three properties, then the web page's content is parsed using a generic HTML parser.
```python
>>> from webpreview import webpreview
>>> p = webpreview("https://en.wikipedia.org/wiki/Enrico_Fermi")
>>> p.title
'Enrico Fermi - Wikipedia'
>>> p.description
'Italian-American physicist (1901–1954)'
>>> p.image
'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg'
# Access the parsed fields both as attributes and items
>>> p["url"] == p.url
True
# Check if all three of the title, description, and image are in the parsing result
>>> p.is_complete()
True
# Provide page content from somewhere else
>>> content = """
The Dormouse's story
The Dormouse's story
Elsie
"""
# The the function's invocation won't make any external calls,
# only relying on the supplied content, unlike the example above
>>> webpreview("aa.com", content=content)
WebPreview(url="http://aa.com", title="The Dormouse's story", description="A Mad Tea-Party story")
```
### Using the command line
When `webpreview` is installed via `pip`, then the accompanying command-line tool is
installed alongside.
```shell
$ webpreview https://en.wikipedia.org/wiki/Enrico_Fermi
title: Enrico Fermi - Wikipedia
description: Italian-American physicist (1901–1954)
image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
$ webpreview https://github.com/ --absolute-url
title: GitHub: Where the world builds software
description: GitHub is where over 83 million developers shape the future of software, together.
image: https://github.githubassets.com/images/modules/site/social-cards/github-social.png
```
### Using compatibility API
Before *v1.7.0* the package mainly exposed a different set of the API methods.
All of them are supported and may continue to be used.
```python
# WARNING:
# The API below is left for BACKWARD COMPATIBILITY ONLY.
from webpreview import web_preview
title, description, image = web_preview("aurl.com")
# specifing timeout which gets passed to requests.get()
title, description, image = web_preview("a_slow_url.com", timeout=1000)
# passing headers
headers = {'User-Agent': 'Mozilla/5.0'}
title, description, image = web_preview("a_slow_url.com", headers=headers)
# pass html content thus avoiding making http call again to fetch content.
content = """Dummy HTML"""
title, description, image = web_preview("aurl.com", content=content)
# specifing the parser
# by default webpreview uses 'html.parser'
title, description, image = web_preview("aurl.com", content=content, parser='lxml')
```
## Run with Docker
The docker image can be built and ran similarly to the command line.
The default entry point is the `webpreview` command-line function.
```shell
$ docker build -t webpreview .
$ docker run -it --rm webpreview "https://en.m.wikipedia.org/wiki/Enrico_Fermi"
title: Enrico Fermi - Wikipedia
description: Enrico Fermi (Italian: [enˈriːko ˈfermi]; 29 September 1901 – 28 November 1954) was an Italian (later naturalized American) physicist and the creator of the world's first nuclear reactor, the Chicago Pile-1. He has been called the "architect of the nuclear age"[1] and the "architect of the atomic bomb".
image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
```
*Note*: built docker image weighs around 210MB.
## Testing
```shell
# Execute the tests
poetry run pytest webpreview
# OR execute until the first failed test
poetry run pytest webpreview -x
```
## Setting up development environment
```shell
# Install a correct minimal supported version of python
pyenv install 3.7.13
# Create a virtual environment
# By default, the project already contains a .python-version file that points
# to 3.7.13.
python -m venv .venv
# Install dependencies
# Poetry will automatically install them into the local .venv
poetry install
# If you have errors likes this:
ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
# Then do this:
.venv/bin/pip install --upgrade setuptools
```
%prep
%autosetup -n webpreview-1.7.2
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-webpreview -f filelist.lst
%dir %{python3_sitelib}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Fri May 05 2023 Python_Bot - 1.7.2-1
- Package Spec generated