| field | value | date |
|---|---|---|
| author | CoprDistGit <infra@openeuler.org> | 2023-05-29 11:48:10 +0000 |
| committer | CoprDistGit <infra@openeuler.org> | 2023-05-29 11:48:10 +0000 |
| commit | 091b2b406381ed31b42ef953b3e94d6843bd96e1 | |
| tree | 43bd8b6d071c52cce93808dd427dac973830723e | |
| parent | 7b69ab4c792f3639b6c487da52cab8eef510bb12 | |
automatic import of python-scrapy-requests
| mode | file | insertions |
|---|---|---|
| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-scrapy-requests.spec | 308 |
| -rw-r--r-- | sources | 1 |

3 files changed, 310 insertions, 0 deletions
@@ -0,0 +1 @@
+/scrapy-requests-0.2.0.tar.gz
diff --git a/python-scrapy-requests.spec b/python-scrapy-requests.spec
new file mode 100644
index 0000000..57ad7cf
--- /dev/null
+++ b/python-scrapy-requests.spec
@@ -0,0 +1,308 @@
+%global _empty_manifest_terminate_build 0
+Name: python-scrapy-requests
+Version: 0.2.0
+Release: 1
+Summary: Scrapy with requests-html
+License: MIT License
+URL: https://github.com/rafyzg/scrapy-requests
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/33/10/76fc04b22ad261867080471d9d18ff45ff6acd41e051f71664b7deda68a1/scrapy-requests-0.2.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-scrapy
+Requires: python3-requests-html
+
+%description
+# scrapy-requests
+
+[](https://travis-ci.org/rafyzg/scrapy-requests)
+
+Scrapy middleware to asynchronously handle JavaScript pages using requests-html.
+
+requests-html uses pyppeteer to load JavaScript pages, and handles user-agent specification for you.
+Using requests-html is very intuitive and simple. [Check out their documentation.](https://github.com/psf/requests-html "requests_html repo")
+
+## Requirements
+- Python >= 3.6
+- Scrapy >= 2.0
+- requests-html
+
+## Installation
+```
+pip install scrapy-requests
+```
+## Configuration
+Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares.
+#### settings.py
+
+```python
+TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+
+DOWNLOADER_MIDDLEWARES = {
+    'scrapy_requests.RequestsMiddleware': 800
+}
+```
+## Usage
+Use scrapy_requests.HtmlRequest instead of scrapy.Request.
+```python
+from scrapy_requests import HtmlRequest
+
+yield HtmlRequest(url=url, callback=self.parse)
+```
+The requests will be handled by requests_html, and the request will carry an additional meta variable `page` containing the HTML object.
+```python
+def parse(self, response):
+    page = response.request.meta['page']
+    page.html.render()
+```
+
+## Additional settings
+
+If you would like the page to be rendered by pyppeteer, pass `True` to the `render` keyword parameter.
+```python
+yield HtmlRequest(url=url, callback=self.parse, render=True)
+```
+You can choose more specific behavior for the HTML object.
+
+For example, you can set a sleep timer before loading the page and execute a JavaScript snippet on load:
+```python
+script = "document.body.querySelector('.btn').click();"
+yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
+```
+
+You can pass default settings to the requests-html session, specifying headers, proxies, auth settings, etc.
+You do this by specifying an additional variable in `settings.py`:
+```python
+DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
+    'verify': False,  # Verify SSL certificates
+    'mock_browser': True,  # Mock browser user-agent
+    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],
+}
+```
+
+## Notes
+Please star this repo if you found it useful.
+
+Feel free to contribute and propose issues & additional features.
+
+License is MIT.
+
+
+
+%package -n python3-scrapy-requests
+Summary: Scrapy with requests-html
+Provides: python-scrapy-requests
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-scrapy-requests
+# scrapy-requests
+
+[](https://travis-ci.org/rafyzg/scrapy-requests)
+
+Scrapy middleware to asynchronously handle JavaScript pages using requests-html.
+
+requests-html uses pyppeteer to load JavaScript pages, and handles user-agent specification for you.
+Using requests-html is very intuitive and simple.
+[Check out their documentation.](https://github.com/psf/requests-html "requests_html repo")
+
+## Requirements
+- Python >= 3.6
+- Scrapy >= 2.0
+- requests-html
+
+## Installation
+```
+pip install scrapy-requests
+```
+## Configuration
+Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares.
+#### settings.py
+
+```python
+TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+
+DOWNLOADER_MIDDLEWARES = {
+    'scrapy_requests.RequestsMiddleware': 800
+}
+```
+## Usage
+Use scrapy_requests.HtmlRequest instead of scrapy.Request.
+```python
+from scrapy_requests import HtmlRequest
+
+yield HtmlRequest(url=url, callback=self.parse)
+```
+The requests will be handled by requests_html, and the request will carry an additional meta variable `page` containing the HTML object.
+```python
+def parse(self, response):
+    page = response.request.meta['page']
+    page.html.render()
+```
+
+## Additional settings
+
+If you would like the page to be rendered by pyppeteer, pass `True` to the `render` keyword parameter.
+```python
+yield HtmlRequest(url=url, callback=self.parse, render=True)
+```
+You can choose more specific behavior for the HTML object.
+
+For example, you can set a sleep timer before loading the page and execute a JavaScript snippet on load:
+```python
+script = "document.body.querySelector('.btn').click();"
+yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
+```
+
+You can pass default settings to the requests-html session, specifying headers, proxies, auth settings, etc.
+You do this by specifying an additional variable in `settings.py`:
+```python
+DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
+    'verify': False,  # Verify SSL certificates
+    'mock_browser': True,  # Mock browser user-agent
+    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],
+}
+```
+
+## Notes
+Please star this repo if you found it useful.
+
+Feel free to contribute and propose issues & additional features.
+
+License is MIT.
+
+
+
+%package help
+Summary: Development documents and examples for scrapy-requests
+Provides: python3-scrapy-requests-doc
+%description help
+# scrapy-requests
+
+[](https://travis-ci.org/rafyzg/scrapy-requests)
+
+Scrapy middleware to asynchronously handle JavaScript pages using requests-html.
+
+requests-html uses pyppeteer to load JavaScript pages, and handles user-agent specification for you.
+Using requests-html is very intuitive and simple. [Check out their documentation.](https://github.com/psf/requests-html "requests_html repo")
+
+## Requirements
+- Python >= 3.6
+- Scrapy >= 2.0
+- requests-html
+
+## Installation
+```
+pip install scrapy-requests
+```
+## Configuration
+Make Twisted use the asyncio event loop, and add RequestsMiddleware to the downloader middlewares.
+#### settings.py
+
+```python
+TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+
+DOWNLOADER_MIDDLEWARES = {
+    'scrapy_requests.RequestsMiddleware': 800
+}
+```
+## Usage
+Use scrapy_requests.HtmlRequest instead of scrapy.Request.
+```python
+from scrapy_requests import HtmlRequest
+
+yield HtmlRequest(url=url, callback=self.parse)
+```
+The requests will be handled by requests_html, and the request will carry an additional meta variable `page` containing the HTML object.
+```python
+def parse(self, response):
+    page = response.request.meta['page']
+    page.html.render()
+```
+
+## Additional settings
+
+If you would like the page to be rendered by pyppeteer, pass `True` to the `render` keyword parameter.
+```python
+yield HtmlRequest(url=url, callback=self.parse, render=True)
+```
+You can choose more specific behavior for the HTML object.
+
+For example, you can set a sleep timer before loading the page and execute a JavaScript snippet on load:
+```python
+script = "document.body.querySelector('.btn').click();"
+yield HtmlRequest(url=url, callback=self.parse, render=True, options={'sleep': 2, 'script': script})
+```
+
+You can pass default settings to the requests-html session, specifying headers, proxies, auth settings, etc.
+You do this by specifying an additional variable in `settings.py`:
+```python
+DEFAULT_SCRAPY_REQUESTS_SETTINGS = {
+    'verify': False,  # Verify SSL certificates
+    'mock_browser': True,  # Mock browser user-agent
+    'browser_args': ['--no-sandbox', '--proxy-server=x.x.x.x:xxxx'],
+}
+```
+
+## Notes
+Please star this repo if you found it useful.
+
+Feel free to contribute and propose issues & additional features.
+
+License is MIT.
+
+
+
+%prep
+%autosetup -n scrapy-requests-0.2.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-scrapy-requests -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 0.2.0-1
+- Package Spec generated
@@ -0,0 +1 @@
+9765df2bb5d2abf5ae9f385da65652cc scrapy-requests-0.2.0.tar.gz
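The spec's %install scriptlet builds `filelist.lst` by walking the buildroot with `find -printf`. A minimal sketch of that logic against a throwaway tree (the site-packages path below is hypothetical, and GNU `find` is assumed for `-printf`):

```shell
set -e
tmp=$(mktemp -d)
# Hypothetical installed layout standing in for %{buildroot}
mkdir -p "$tmp/usr/lib/python3.9/site-packages/scrapy_requests"
touch "$tmp/usr/lib/python3.9/site-packages/scrapy_requests/__init__.py"
cd "$tmp"
# Same pattern the spec uses: one absolute path per installed file
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
cat filelist.lst
# prints: /usr/lib/python3.9/site-packages/scrapy_requests/__init__.py
```

The generated list is then fed back to `%files -n python3-scrapy-requests -f filelist.lst`, so every installed file is owned by the package without enumerating paths by hand.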

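The usage pattern in the embedded README (yield an `HtmlRequest`, then read the rendered page from `response.request.meta['page']`) can be sketched without a live crawl. The stub classes below are hypothetical stand-ins, not part of Scrapy or scrapy-requests; they only model the attributes the callback touches:

```python
class StubRequest:
    """Stand-in for scrapy.Request: only the meta dict matters here."""
    def __init__(self, meta):
        self.meta = meta

class StubResponse:
    """Stand-in for scrapy's Response: exposes the originating request."""
    def __init__(self, request):
        self.request = request

def parse(response):
    # Mirrors the README callback: the middleware stores the
    # requests-html HTML object under the 'page' meta key.
    return response.request.meta['page']

response = StubResponse(StubRequest(meta={'page': '<html object>'}))
print(parse(response))  # prints: <html object>
```

In a real spider the middleware populates `page` with a requests-html `HTML` object, so methods like `page.html.render()` become available in the callback.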