%global _empty_manifest_terminate_build 0
Name:           python-scrapy-requests
Version:        0.2.0
Release:        1
Summary:        Scrapy with requests-html
License:        MIT
URL:            https://github.com/rafyzg/scrapy-requests
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/33/10/76fc04b22ad261867080471d9d18ff45ff6acd41e051f71664b7deda68a1/scrapy-requests-0.2.0.tar.gz
BuildArch:      noarch

Requires:       python3-scrapy
Requires:       python3-requests-html

%description
# scrapy-requests

![PyPI](https://img.shields.io/pypi/v/scrapy-requests)
[![Build Status](https://travis-ci.org/rafyzg/scrapy-requests.svg?branch=main)](https://travis-ci.org/rafyzg/scrapy-requests)
![Codecov](https://img.shields.io/codecov/c/github/rafyzg/scrapy-requests)

Scrapy middleware to handle JavaScript pages asynchronously using requests-html.

requests-html uses pyppeteer to load JavaScript pages and handles user-agent specification for you. It is intuitive and simple to use. [Check out its documentation.](https://github.com/psf/requests-html "requests_html repo")

## Requirements

- Python >= 3.6
- Scrapy >= 2.0
- requests-html

## Installation

```
pip install scrapy-requests
```

## Configuration

Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares.

#### settings.py

```python
TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_requests.RequestsMiddleware': 800
}
```

## Usage

Use scrapy_requests.HtmlRequest instead of scrapy.Request:

```python
from scrapy_requests import HtmlRequest

yield HtmlRequest(url=url, callback=self.parse)
```

The request will be handled by requests_html, and an additional meta variable `page` containing the HTML object will be added to it.

```python
def parse(self, response):
    page = response.request.meta['page']
    page.html.render()
```

## Additional settings

If you would like the page to be rendered by pyppeteer, pass `True` to the `render` keyword parameter.
```python
yield HtmlRequest(url=url, callback=self.parse, render=True)
```

You can choose more specific behaviour for the HTML object. For example, you can set a sleep timer before the page loads and a JS script to execute on load:

```python
script = "document.body.querySelector('.btn').click();"

yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
```

You can pass default settings to the requests-html session, specifying headers, proxies, auth settings, etc. You do this by adding an additional variable to `settings.py`:

```python
DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
    'verify': False,  # Verify SSL certificates
    'mock_browser': True,  # Mock browser user-agent
    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],
}
```

## Notes

Please star this repo if you found it useful.
Feel free to contribute and propose issues & additional features.
License is MIT.

%package -n python3-scrapy-requests
Summary:        Scrapy with requests-html
Provides:       python-scrapy-requests
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-scrapy-requests
Scrapy middleware to handle JavaScript pages asynchronously using requests-html. requests-html uses pyppeteer to load JavaScript pages and handles user-agent specification for you. See the base package description for full usage details.
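Taken together, the reactor, middleware, and session settings described above amount to a `settings.py` along these lines. This is a sketch: the middleware priority of 800 comes from the README, while the exact set of session defaults you enable is up to your project.

```python
# settings.py -- sketch combining the scrapy-requests configuration
# shown in the README; values are illustrative, not mandatory.

# Run Twisted on the asyncio reactor so requests-html/pyppeteer can be awaited.
TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'

# Route downloads through the scrapy-requests middleware.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_requests.RequestsMiddleware': 800,
}

# Optional defaults passed to the underlying requests-html session.
DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
    'verify': False,       # skip SSL certificate verification
    'mock_browser': True,  # send a browser-like user-agent
}
```

With this in place, spiders only need to yield `HtmlRequest` objects; no per-request session setup is required.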
%package help
Summary:        Development documents and examples for scrapy-requests
Provides:       python3-scrapy-requests-doc

%description help
Development documents and examples for scrapy-requests, a Scrapy middleware that handles JavaScript pages asynchronously using requests-html. See the base package description for full usage details.

%prep
%autosetup -n scrapy-requests-0.2.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-scrapy-requests -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed May 31 2023 Python_Bot - 0.2.0-1
- Package Spec generated