| author    | CoprDistGit <infra@openeuler.org>                | 2023-06-20 04:37:49 +0000 |
|-----------|--------------------------------------------------|---------------------------|
| committer | CoprDistGit <infra@openeuler.org>                | 2023-06-20 04:37:49 +0000 |
| commit    | 7355463ef98ffca19e614050cc8f2df2eaa67f63 (patch) |                           |
| tree      | 04f7d0820b2a88da9427196e4ff6c35c3e23e2b9         |                           |
| parent    | 6013a24d0096cb517c38264cc620b1136c0606ad (diff)  |                           |
automatic import of python-javasoup (openeuler20.03)
| mode       | file                 | lines |
|------------|----------------------|-------|
| -rw-r--r-- | .gitignore           | 1     |
| -rw-r--r-- | python-javasoup.spec | 198   |
| -rw-r--r-- | sources              | 1     |

3 files changed, 200 insertions, 0 deletions
@@ -0,0 +1 @@
+/javasoup-0.8.tar.gz
diff --git a/python-javasoup.spec b/python-javasoup.spec
new file mode 100644
index 0000000..88039c4
--- /dev/null
+++ b/python-javasoup.spec
@@ -0,0 +1,198 @@
+%global _empty_manifest_terminate_build 0
+Name: python-javasoup
+Version: 0.8
+Release: 1
+Summary: Simple Python library that uses Puppeteer to pull HTML from a loaded SPA (REQUIRES NODE, NPM, AND PUPPETEER)
+License: MIT
+URL: https://github.com/R-s0n/javasoup
+Source0: https://mirrors.aliyun.com/pypi/web/packages/14/18/d47fb4266262f513545fed8f0dbb0346813639f4a06edea10d6559ad9de0/javasoup-0.8.tar.gz
+BuildArch: noarch
+
+
+%description
+# JavaSoup - Modern Web Scraping
+
+This library returns the HTML of a Single Page Application (SPA) after the page has loaded. This HTML can then be passed to BeautifulSoup for parsing.
+
+#### !! WARNING: This library requires Node, npm, and Puppeteer to work !!
+##### !! If the method runs and nothing is returned, this is MOST LIKELY the issue !!
+
+##### Install JavaScript Components (on Kali):
+
+`sudo apt-get install -y nodejs npm`
+
+`npm i puppeteer`
+
+### Library Overview
+
+The traditional method of web scraping in Python with requests and BeautifulSoup isn't effective for more modern pages and SPAs. This library dynamically generates a JavaScript file that uses Puppeteer to fully load the page and return the HTML that is dynamically generated in the Document Object Model (DOM).
+
+The primary method, `get_soup`, accepts a full URL as a string and returns the rendered page's HTML as a string.
+
+Typical Workflow (requests/BeautifulSoup):
+
+`res = requests.get('http://example.com')`
+
+`soup = BeautifulSoup(res.text, 'html.parser')`
+
+New Workflow with javasoup:
+
+`import javasoup`
+
+`soup = BeautifulSoup(javasoup.get_soup('http://example.com'), 'html.parser')`
+
+### Execution Process
+
+1. Creates the necessary JavaScript file in the current working directory (MUST HAVE WRITE PRIVILEGES)
+2. Runs that JavaScript file using the URL provided and stores the value returned
+3. Deletes the temporary JavaScript file
+4. Returns the HTML content
+
+### Install
+
+#### After the JavaScript components have been installed, javasoup can be installed through pip:
+
+`pip install javasoup`
+
+%package -n python3-javasoup
+Summary: Simple Python library that uses Puppeteer to pull HTML from a loaded SPA (REQUIRES NODE, NPM, AND PUPPETEER)
+Provides: python-javasoup
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-javasoup
+# JavaSoup - Modern Web Scraping
+
+This library returns the HTML of a Single Page Application (SPA) after the page has loaded. This HTML can then be passed to BeautifulSoup for parsing.
+
+#### !! WARNING: This library requires Node, npm, and Puppeteer to work !!
+##### !! If the method runs and nothing is returned, this is MOST LIKELY the issue !!
+
+##### Install JavaScript Components (on Kali):
+
+`sudo apt-get install -y nodejs npm`
+
+`npm i puppeteer`
+
+### Library Overview
+
+The traditional method of web scraping in Python with requests and BeautifulSoup isn't effective for more modern pages and SPAs. This library dynamically generates a JavaScript file that uses Puppeteer to fully load the page and return the HTML that is dynamically generated in the Document Object Model (DOM).
+
+The primary method, `get_soup`, accepts a full URL as a string and returns the rendered page's HTML as a string.
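+
+To make the approach described in the overview concrete, here is a minimal sketch of how such a wrapper can be implemented: write a temporary Puppeteer script, run it with Node, capture the HTML it prints, then delete the script. The helper name `get_rendered_html`, the temporary-file handling, and the `networkidle0` wait option are illustrative assumptions, not necessarily javasoup's exact internals.
+
+```python
+import os
+import subprocess
+import tempfile
+
+# Minimal Puppeteer script: load the URL passed on the command line,
+# wait for the network to go idle, then print the rendered DOM as HTML.
+PUPPETEER_SCRIPT = """
+const puppeteer = require('puppeteer');
+(async () => {
+  const browser = await puppeteer.launch();
+  const page = await browser.newPage();
+  await page.goto(process.argv[2], { waitUntil: 'networkidle0' });
+  console.log(await page.content());
+  await browser.close();
+})();
+"""
+
+def get_rendered_html(url):
+    """Write a temporary Puppeteer script, run it with Node, return the HTML it prints."""
+    fd, path = tempfile.mkstemp(suffix='.js', dir='.')
+    try:
+        with os.fdopen(fd, 'w') as script:
+            script.write(PUPPETEER_SCRIPT)
+        result = subprocess.run(['node', path, url],
+                                capture_output=True, text=True, check=True)
+        return result.stdout
+    finally:
+        os.remove(path)  # always clean up the temporary JavaScript file
+```
+
+javasoup hides this whole cycle behind `get_soup`, so the caller only deals with a URL going in and an HTML string coming out.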
+
+Typical Workflow (requests/BeautifulSoup):
+
+`res = requests.get('http://example.com')`
+
+`soup = BeautifulSoup(res.text, 'html.parser')`
+
+New Workflow with javasoup:
+
+`import javasoup`
+
+`soup = BeautifulSoup(javasoup.get_soup('http://example.com'), 'html.parser')`
+
+### Execution Process
+
+1. Creates the necessary JavaScript file in the current working directory (MUST HAVE WRITE PRIVILEGES)
+2. Runs that JavaScript file using the URL provided and stores the value returned
+3. Deletes the temporary JavaScript file
+4. Returns the HTML content
+
+### Install
+
+#### After the JavaScript components have been installed, javasoup can be installed through pip:
+
+`pip install javasoup`
+
+%package help
+Summary: Development documents and examples for javasoup
+Provides: python3-javasoup-doc
+%description help
+# JavaSoup - Modern Web Scraping
+
+This library returns the HTML of a Single Page Application (SPA) after the page has loaded. This HTML can then be passed to BeautifulSoup for parsing.
+
+#### !! WARNING: This library requires Node, npm, and Puppeteer to work !!
+##### !! If the method runs and nothing is returned, this is MOST LIKELY the issue !!
+
+##### Install JavaScript Components (on Kali):
+
+`sudo apt-get install -y nodejs npm`
+
+`npm i puppeteer`
+
+### Library Overview
+
+The traditional method of web scraping in Python with requests and BeautifulSoup isn't effective for more modern pages and SPAs. This library dynamically generates a JavaScript file that uses Puppeteer to fully load the page and return the HTML that is dynamically generated in the Document Object Model (DOM).
+
+The primary method, `get_soup`, accepts a full URL as a string and returns the rendered page's HTML as a string.
+
+Typical Workflow (requests/BeautifulSoup):
+
+`res = requests.get('http://example.com')`
+
+`soup = BeautifulSoup(res.text, 'html.parser')`
+
+New Workflow with javasoup:
+
+`import javasoup`
+
+`soup = BeautifulSoup(javasoup.get_soup('http://example.com'), 'html.parser')`
+
+### Execution Process
+
+1. Creates the necessary JavaScript file in the current working directory (MUST HAVE WRITE PRIVILEGES)
+2. Runs that JavaScript file using the URL provided and stores the value returned
+3. Deletes the temporary JavaScript file
+4. Returns the HTML content
+
+### Install
+
+#### After the JavaScript components have been installed, javasoup can be installed through pip:
+
+`pip install javasoup`
+
+%prep
+%autosetup -n javasoup-0.8
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-javasoup -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.8-1
+- Package Spec generated
@@ -0,0 +1 @@
+e0c61442a87fb4e630261ceb00c5dd89 javasoup-0.8.tar.gz