author    CoprDistGit <infra@openeuler.org>  2023-05-29 11:48:10 +0000
committer CoprDistGit <infra@openeuler.org>  2023-05-29 11:48:10 +0000
commit    091b2b406381ed31b42ef953b3e94d6843bd96e1 (patch)
tree      43bd8b6d071c52cce93808dd427dac973830723e
parent    7b69ab4c792f3639b6c487da52cab8eef510bb12 (diff)

automatic import of python-scrapy-requests

-rw-r--r--  .gitignore                     1
-rw-r--r--  python-scrapy-requests.spec  308
-rw-r--r--  sources                        1

3 files changed, 310 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index e69de29..37c9caf 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/scrapy-requests-0.2.0.tar.gz
diff --git a/python-scrapy-requests.spec b/python-scrapy-requests.spec
new file mode 100644
index 0000000..57ad7cf
--- /dev/null
+++ b/python-scrapy-requests.spec
@@ -0,0 +1,308 @@
+%global _empty_manifest_terminate_build 0
+Name: python-scrapy-requests
+Version: 0.2.0
+Release: 1
+Summary: Scrapy with requests-html
+License: MIT
+URL: https://github.com/rafyzg/scrapy-requests
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/33/10/76fc04b22ad261867080471d9d18ff45ff6acd41e051f71664b7deda68a1/scrapy-requests-0.2.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-scrapy
+Requires: python3-requests-html
+
+%description
+# scrapy-requests
+![PyPI](https://img.shields.io/pypi/v/scrapy-requests)
+[![Build Status](https://travis-ci.org/rafyzg/scrapy-requests.svg?branch=main)](https://travis-ci.org/rafyzg/scrapy-requests)
+![Codecov](https://img.shields.io/codecov/c/github/rafyzg/scrapy-requests)
+
+Scrapy middleware to asynchronously handle JavaScript pages using requests-html.
+
+requests-html uses pyppeteer to load JavaScript pages and handles user-agent specification for you.
+Using requests-html is intuitive and simple. [Check out their documentation.](https://github.com/psf/requests-html "requests_html repo")
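+
+To get a feel for requests-html on its own, here is a minimal standalone sketch (an assumed usage example, independent of this middleware; note that `render()` downloads a Chromium build on first use):
+```python
+from requests_html import HTMLSession
+
+session = HTMLSession()
+r = session.get('https://example.com')
+r.html.render()  # execute the page's JavaScript via pyppeteer
+print(r.html.find('title', first=True).text)
+```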
+
+## Requirements
+- Python >= 3.6
+- Scrapy >= 2.0
+- requests-html
+
+## Installation
+```
+ pip install scrapy-requests
+```
+## Configuration
+Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares:
+#### settings.py
+
+```python
+TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+
+DOWNLOADER_MIDDLEWARES = {
+    'scrapy_requests.RequestsMiddleware': 800
+}
+```
+## Usage
+Use `scrapy_requests.HtmlRequest` instead of `scrapy.Request`:
+```python
+from scrapy_requests import HtmlRequest
+
+yield HtmlRequest(url=url, callback=self.parse)
+```
+The request will be handled by requests-html, and an additional meta variable `page` containing the HTML object will be added to it.
+```python
+def parse(self, response):
+ page = response.request.meta['page']
+ page.html.render()
+```
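+
+Putting these pieces together, a minimal spider sketch might look like this (the spider name, URL, and CSS selectors are hypothetical; it assumes the `settings.py` changes above are in place):
+```python
+import scrapy
+from scrapy_requests import HtmlRequest
+
+class QuotesSpider(scrapy.Spider):
+    name = 'quotes'  # hypothetical spider name
+
+    def start_requests(self):
+        yield HtmlRequest(url='http://quotes.toscrape.com/', callback=self.parse)
+
+    def parse(self, response):
+        # the middleware attaches the requests-html HTML object under 'page'
+        page = response.request.meta['page']
+        for quote in page.html.find('.quote .text'):
+            yield {'text': quote.text}
+```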
+
+## Additional settings
+
+If you would like the page to be rendered by pyppeteer, pass `True` for the `render` keyword parameter.
+```python
+yield HtmlRequest(url=url, callback=self.parse, render=True)
+```
+You can also request more specific behavior from the HTML object.
+
+For example, you can set a sleep timer before the page loads and have a JS script executed while it loads:
+```python
+script = "document.body.querySelector('.btn').click();"
+yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
+```
+
+You can pass default settings to the requests-html session (headers, proxies, auth settings, etc.)
+by defining an additional variable in `settings.py`:
+```python
+DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
+    'verify': False,       # disable SSL certificate verification
+    'mock_browser': True,  # send a browser-like user-agent
+    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],  # flags passed to pyppeteer's Chromium
+}
+```
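+
+Roughly speaking, these settings are forwarded as keyword arguments when the middleware creates its requests-html session; a sketch of the idea (not the middleware's actual internals):
+```python
+from requests_html import HTMLSession
+
+session = HTMLSession(
+    verify=False,       # skip SSL certificate verification
+    mock_browser=True,  # send a browser-like user-agent
+    browser_args=['--no-sandbox'],
+)
+```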
+
+## Notes
+Please star this repo if you find it useful.
+
+Feel free to contribute, open issues, and propose additional features.
+
+License is MIT.
+
+
+
+
+%package -n python3-scrapy-requests
+Summary: Scrapy with requests-html
+Provides: python-scrapy-requests
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-scrapy-requests
+# scrapy-requests
+![PyPI](https://img.shields.io/pypi/v/scrapy-requests)
+[![Build Status](https://travis-ci.org/rafyzg/scrapy-requests.svg?branch=main)](https://travis-ci.org/rafyzg/scrapy-requests)
+![Codecov](https://img.shields.io/codecov/c/github/rafyzg/scrapy-requests)
+
+Scrapy middleware to asynchronously handle JavaScript pages using requests-html.
+
+requests-html uses pyppeteer to load JavaScript pages and handles user-agent specification for you.
+Using requests-html is intuitive and simple. [Check out their documentation.](https://github.com/psf/requests-html "requests_html repo")
+
+## Requirements
+- Python >= 3.6
+- Scrapy >= 2.0
+- requests-html
+
+## Installation
+```
+ pip install scrapy-requests
+```
+## Configuration
+Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares:
+#### settings.py
+
+```python
+TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+
+DOWNLOADER_MIDDLEWARES = {
+    'scrapy_requests.RequestsMiddleware': 800
+}
+```
+## Usage
+Use `scrapy_requests.HtmlRequest` instead of `scrapy.Request`:
+```python
+from scrapy_requests import HtmlRequest
+
+yield HtmlRequest(url=url, callback=self.parse)
+```
+The request will be handled by requests-html, and an additional meta variable `page` containing the HTML object will be added to it.
+```python
+def parse(self, response):
+ page = response.request.meta['page']
+ page.html.render()
+```
+
+## Additional settings
+
+If you would like the page to be rendered by pyppeteer, pass `True` for the `render` keyword parameter.
+```python
+yield HtmlRequest(url=url, callback=self.parse, render=True)
+```
+You can also request more specific behavior from the HTML object.
+
+For example, you can set a sleep timer before the page loads and have a JS script executed while it loads:
+```python
+script = "document.body.querySelector('.btn').click();"
+yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
+```
+
+You can pass default settings to the requests-html session (headers, proxies, auth settings, etc.)
+by defining an additional variable in `settings.py`:
+```python
+DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
+    'verify': False,       # disable SSL certificate verification
+    'mock_browser': True,  # send a browser-like user-agent
+    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],  # flags passed to pyppeteer's Chromium
+}
+```
+
+## Notes
+Please star this repo if you find it useful.
+
+Feel free to contribute, open issues, and propose additional features.
+
+License is MIT.
+
+
+
+
+%package help
+Summary: Development documents and examples for scrapy-requests
+Provides: python3-scrapy-requests-doc
+%description help
+# scrapy-requests
+![PyPI](https://img.shields.io/pypi/v/scrapy-requests)
+[![Build Status](https://travis-ci.org/rafyzg/scrapy-requests.svg?branch=main)](https://travis-ci.org/rafyzg/scrapy-requests)
+![Codecov](https://img.shields.io/codecov/c/github/rafyzg/scrapy-requests)
+
+Scrapy middleware to asynchronously handle JavaScript pages using requests-html.
+
+requests-html uses pyppeteer to load JavaScript pages and handles user-agent specification for you.
+Using requests-html is intuitive and simple. [Check out their documentation.](https://github.com/psf/requests-html "requests_html repo")
+
+## Requirements
+- Python >= 3.6
+- Scrapy >= 2.0
+- requests-html
+
+## Installation
+```
+ pip install scrapy-requests
+```
+## Configuration
+Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares:
+#### settings.py
+
+```python
+TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+
+DOWNLOADER_MIDDLEWARES = {
+    'scrapy_requests.RequestsMiddleware': 800
+}
+```
+## Usage
+Use `scrapy_requests.HtmlRequest` instead of `scrapy.Request`:
+```python
+from scrapy_requests import HtmlRequest
+
+yield HtmlRequest(url=url, callback=self.parse)
+```
+The request will be handled by requests-html, and an additional meta variable `page` containing the HTML object will be added to it.
+```python
+def parse(self, response):
+ page = response.request.meta['page']
+ page.html.render()
+```
+
+## Additional settings
+
+If you would like the page to be rendered by pyppeteer, pass `True` for the `render` keyword parameter.
+```python
+yield HtmlRequest(url=url, callback=self.parse, render=True)
+```
+You can also request more specific behavior from the HTML object.
+
+For example, you can set a sleep timer before the page loads and have a JS script executed while it loads:
+```python
+script = "document.body.querySelector('.btn').click();"
+yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
+```
+
+You can pass default settings to the requests-html session (headers, proxies, auth settings, etc.)
+by defining an additional variable in `settings.py`:
+```python
+DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
+    'verify': False,       # disable SSL certificate verification
+    'mock_browser': True,  # send a browser-like user-agent
+    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],  # flags passed to pyppeteer's Chromium
+}
+```
+
+## Notes
+Please star this repo if you find it useful.
+
+Feel free to contribute, open issues, and propose additional features.
+
+License is MIT.
+
+
+
+
+%prep
+%autosetup -n scrapy-requests-0.2.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-scrapy-requests -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 0.2.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..36e4bd0
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+9765df2bb5d2abf5ae9f385da65652cc scrapy-requests-0.2.0.tar.gz