| field | value | date |
|---|---|---|
| author | CoprDistGit <infra@openeuler.org> | 2023-05-29 11:48:10 +0000 |
| committer | CoprDistGit <infra@openeuler.org> | 2023-05-29 11:48:10 +0000 |
| commit | 091b2b406381ed31b42ef953b3e94d6843bd96e1 | |
| tree | 43bd8b6d071c52cce93808dd427dac973830723e | |
| parent | 7b69ab4c792f3639b6c487da52cab8eef510bb12 | |
automatic import of python-scrapy-requests
| mode | file | insertions |
|---|---|---|
| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-scrapy-requests.spec | 308 |
| -rw-r--r-- | sources | 1 |

3 files changed, 310 insertions, 0 deletions
@@ -0,0 +1 @@
+/scrapy-requests-0.2.0.tar.gz
diff --git a/python-scrapy-requests.spec b/python-scrapy-requests.spec
new file mode 100644
index 0000000..57ad7cf
--- /dev/null
+++ b/python-scrapy-requests.spec
@@ -0,0 +1,308 @@
+%global _empty_manifest_terminate_build 0
+Name: python-scrapy-requests
+Version: 0.2.0
+Release: 1
+Summary: Scrapy with requests-html
+License: MIT License
+URL: https://github.com/rafyzg/scrapy-requests
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/33/10/76fc04b22ad261867080471d9d18ff45ff6acd41e051f71664b7deda68a1/scrapy-requests-0.2.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-scrapy
+Requires: python3-requests-html
+
+%description
+# scrapy-requests
+
+[](https://travis-ci.org/rafyzg/scrapy-requests)
+
+Scrapy middleware to asynchronously handle JavaScript pages using requests-html.
+
+requests-html uses pyppeteer to load JavaScript pages, and handles user-agent specification for you.
+Using requests-html is very intuitive and simple. [Check out their documentation.](https://github.com/psf/requests-html "requests_html repo")
+
+## Requirements
+- Python >= 3.6
+- Scrapy >= 2.0
+- requests-html
+
+## Installation
+```
+pip install scrapy-requests
+```
+## Configuration
+Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares.
+#### settings.py
+
+```python
+TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+
+DOWNLOADER_MIDDLEWARES = {
+    'scrapy_requests.RequestsMiddleware': 800
+}
+```
+## Usage
+Use scrapy_requests.HtmlRequest instead of scrapy.Request.
+```python
+from scrapy_requests import HtmlRequest
+
+yield HtmlRequest(url=url, callback=self.parse)
+```
+The requests will be handled by requests_html, and the request will carry an additional meta variable `page` containing the HTML object.
+```python
+def parse(self, response):
+    page = response.request.meta['page']
+    page.html.render()
+```
+
+## Additional settings
+
+If you would like the page to be rendered by pyppeteer, pass `True` to the `render` keyword parameter.
+```python
+yield HtmlRequest(url=url, callback=self.parse, render=True)
+```
+You can choose more specific behavior for the HTML object.
+
+For example, you can set a sleep timer before loading the page and execute a JavaScript snippet on load:
+```python
+script = "document.body.querySelector('.btn').click();"
+yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
+```
+
+You can pass default settings to the requests-html session, specifying headers, proxies, auth settings, etc.
+You do this by specifying an additional variable in `settings.py`:
+```python
+DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
+    'verify': False,  # Verify SSL certificates
+    'mock_browser': True,  # Mock browser user-agent
+    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],
+}
+```
+
+## Notes
+Please star this repo if you found it useful.
+
+Feel free to contribute and propose issues & additional features.
+
+License is MIT.
+
+
+
+%package -n python3-scrapy-requests
+Summary: Scrapy with requests-html
+Provides: python-scrapy-requests
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-scrapy-requests
+# scrapy-requests
+
+[](https://travis-ci.org/rafyzg/scrapy-requests)
+
+Scrapy middleware to asynchronously handle JavaScript pages using requests-html.
+
+requests-html uses pyppeteer to load JavaScript pages, and handles user-agent specification for you.
+Using requests-html is very intuitive and simple.
+[Check out their documentation.](https://github.com/psf/requests-html "requests_html repo")
+
+## Requirements
+- Python >= 3.6
+- Scrapy >= 2.0
+- requests-html
+
+## Installation
+```
+pip install scrapy-requests
+```
+## Configuration
+Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares.
+#### settings.py
+
+```python
+TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+
+DOWNLOADER_MIDDLEWARES = {
+    'scrapy_requests.RequestsMiddleware': 800
+}
+```
+## Usage
+Use scrapy_requests.HtmlRequest instead of scrapy.Request.
+```python
+from scrapy_requests import HtmlRequest
+
+yield HtmlRequest(url=url, callback=self.parse)
+```
+The requests will be handled by requests_html, and the request will carry an additional meta variable `page` containing the HTML object.
+```python
+def parse(self, response):
+    page = response.request.meta['page']
+    page.html.render()
+```
+
+## Additional settings
+
+If you would like the page to be rendered by pyppeteer, pass `True` to the `render` keyword parameter.
+```python
+yield HtmlRequest(url=url, callback=self.parse, render=True)
+```
+You can choose more specific behavior for the HTML object.
+
+For example, you can set a sleep timer before loading the page and execute a JavaScript snippet on load:
+```python
+script = "document.body.querySelector('.btn').click();"
+yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
+```
+
+You can pass default settings to the requests-html session, specifying headers, proxies, auth settings, etc.
+You do this by specifying an additional variable in `settings.py`:
+```python
+DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
+    'verify': False,  # Verify SSL certificates
+    'mock_browser': True,  # Mock browser user-agent
+    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],
+}
+```
+
+## Notes
+Please star this repo if you found it useful.
+
+Feel free to contribute and propose issues & additional features.
+
+License is MIT.
+
+
+
+%package help
+Summary: Development documents and examples for scrapy-requests
+Provides: python3-scrapy-requests-doc
+%description help
+# scrapy-requests
+
+[](https://travis-ci.org/rafyzg/scrapy-requests)
+
+Scrapy middleware to asynchronously handle JavaScript pages using requests-html.
+
+requests-html uses pyppeteer to load JavaScript pages, and handles user-agent specification for you.
+Using requests-html is very intuitive and simple. [Check out their documentation.](https://github.com/psf/requests-html "requests_html repo")
+
+## Requirements
+- Python >= 3.6
+- Scrapy >= 2.0
+- requests-html
+
+## Installation
+```
+pip install scrapy-requests
+```
+## Configuration
+Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares.
+#### settings.py
+
+```python
+TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+
+DOWNLOADER_MIDDLEWARES = {
+    'scrapy_requests.RequestsMiddleware': 800
+}
+```
+## Usage
+Use scrapy_requests.HtmlRequest instead of scrapy.Request.
+```python
+from scrapy_requests import HtmlRequest
+
+yield HtmlRequest(url=url, callback=self.parse)
+```
+The requests will be handled by requests_html, and the request will carry an additional meta variable `page` containing the HTML object.
+```python
+def parse(self, response):
+    page = response.request.meta['page']
+    page.html.render()
+```
+
+## Additional settings
+
+If you would like the page to be rendered by pyppeteer, pass `True` to the `render` keyword parameter.
+```python
+yield HtmlRequest(url=url, callback=self.parse, render=True)
+```
+You can choose more specific behavior for the HTML object.
+
+For example, you can set a sleep timer before loading the page and execute a JavaScript snippet on load:
+```python
+script = "document.body.querySelector('.btn').click();"
+yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
+```
+
+You can pass default settings to the requests-html session, specifying headers, proxies, auth settings, etc.
+You do this by specifying an additional variable in `settings.py`:
+```python
+DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
+    'verify': False,  # Verify SSL certificates
+    'mock_browser': True,  # Mock browser user-agent
+    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],
+}
+```
+
+## Notes
+Please star this repo if you found it useful.
+
+Feel free to contribute and propose issues & additional features.
+
+License is MIT.
+
+
+
+%prep
+%autosetup -n scrapy-requests-0.2.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-scrapy-requests -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 0.2.0-1
+- Package Spec generated
@@ -0,0 +1 @@
+9765df2bb5d2abf5ae9f385da65652cc scrapy-requests-0.2.0.tar.gz
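The spec's %install scriptlet builds `filelist.lst` by walking the buildroot with `find -printf`. A minimal sketch of that logic against a throwaway tree (the site-packages path below is hypothetical, and GNU `find` is assumed for `-printf`):

```shell
set -e
tmp=$(mktemp -d)
# Hypothetical installed layout standing in for %{buildroot}
mkdir -p "$tmp/usr/lib/python3.9/site-packages/scrapy_requests"
touch "$tmp/usr/lib/python3.9/site-packages/scrapy_requests/__init__.py"
cd "$tmp"
# Same pattern the spec uses: one absolute path per installed file
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
cat filelist.lst
# prints: /usr/lib/python3.9/site-packages/scrapy_requests/__init__.py
```

The generated list is then fed back to `%files -n python3-scrapy-requests -f filelist.lst`, so every installed file is owned by the package without enumerating paths by hand.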

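The usage pattern in the embedded README (yield an `HtmlRequest`, then read the rendered page from `response.request.meta['page']`) can be sketched without a live crawl. The stub classes below are hypothetical stand-ins, not part of Scrapy or scrapy-requests; they only model the attributes the callback touches:

```python
class StubRequest:
    """Stand-in for scrapy.Request: only the meta dict matters here."""
    def __init__(self, meta):
        self.meta = meta

class StubResponse:
    """Stand-in for scrapy's Response: exposes the originating request."""
    def __init__(self, request):
        self.request = request

def parse(response):
    # Mirrors the README callback: the middleware stores the
    # requests-html HTML object under the 'page' meta key.
    return response.request.meta['page']

response = StubResponse(StubRequest(meta={'page': '<html object>'}))
print(parse(response))  # prints: <html object>
```

In a real spider the middleware populates `page` with a requests-html `HTML` object, so methods like `page.html.render()` become available in the callback.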