| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-websearch-python.spec | 1303 |
| -rw-r--r-- | sources | 1 |
3 files changed, 1305 insertions, 0 deletions
@@ -0,0 +1 @@ +/websearch-python-1.2.1.tar.gz diff --git a/python-websearch-python.spec b/python-websearch-python.spec new file mode 100644 index 0000000..74cf25c --- /dev/null +++ b/python-websearch-python.spec @@ -0,0 +1,1303 @@ +%global _empty_manifest_terminate_build 0 +Name: python-websearch-python +Version: 1.2.1 +Release: 1 +Summary: Module allowing you to do various searches for links on the Web +License: MIT License +URL: https://github.com/iTeam-S/WebSearch +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/59/8b/937e04dc93938bc0dc811d52b44855f3ce852740f7856fb6fd178f5dc799/websearch-python-1.2.1.tar.gz +BuildArch: noarch + +Requires: python3-BeautifulSoup4 +Requires: python3-requests +Requires: python3-gevent +Requires: python3-flask + +%description +# WebSearch + + + +> Python module allowing you to do various searches for links on the Web. + + +[](https://github.com/iTeam-S/WebSearch/actions/workflows/python-test.yml) +[](https://github.com/iTeam-S/WebSearch/actions/workflows/pip-upload.yml) + +[](https://pypi.org/project/websearch-python/) +[](https://pypi.org/project/websearch-python/) + + + +## Installation + +```s +pip3 install websearch-python +``` +**OR** you can install dev version +```s +pip3 install https://github.com/iTeam-S/WebSearch/archive/refs/heads/main.zip +``` + +## Use + +### Quick Start as Module + +```python +from websearch import WebSearch as web +for page in web('iTeam-$').pages[:2]: + print(page) +``` + +``` +[RESULTS] + + https://iteam-s.mg/ + https://github.com/iTeam-S +``` + + +### Quick Start as Webserver + +```s +# run webserver +websearch --host 0.0.0.0 --port 7845 +``` + +**OR** + +```s +# run webserver +python -m websearch --host 0.0.0.0 --port 7845 +``` + +```s +# requests contents +curl http://0.0.0.0:7845/pages/botoravony+arleme +``` + + ```json + [ + "https://portfolio.iteam-s.mg/?id=2", + "https://portfolio.iteam-s.mg/libs/cv/arleme.pdf", + "https://madagascar.webcup.fr/team-webcup/iteams" + ] +``` + +### Use 
Deployed Version +```s +curl https://websearch-python.herokuapp.com/pages/botoravony+arleme +``` + +__________________________ + +<details> + <summary style='font-size:24'> FULL DOCUMENTATION </summary> + +### Initialization + +```python +from websearch import WebSearch +web = WebSearch('Gaetan Jonathan BAKARY') +``` +You can pass a `list` for multiple keywords. + +```python +web = WebSearch(['Gaetan Jonathan BAKARY', 'iTeam-S']) +``` +You can also specify a `website` as a reference. + +```python +web = WebSearch('Gaetan Jonathan', site='iteam-s.mg') +``` + + +### Webpages results + +```python +from websearch import WebSearch +web = WebSearch('Gaetan Jonathan BAKARY') +webpages = web.pages +for wp in webpages[:5]: + print(wp) +``` + +``` +[RESULTS] + + https://mg.linkedin.com/in/gaetanj + https://portfolio.iteam-s.mg/?u=gaetan + https://github.com/gaetan1903 + https://medium.com/@gaetan1903 + https://gitlab.com/gaetan1903 +``` + + +### Images results + +```python +from websearch import WebSearch +web = WebSearch('Gaetan Jonathan BAKARY') +webimages = web.images +for im in webimages[:5]: + print(im) +``` + +``` +[RESULTS] + + https://tse3.mm.bing.net/th?id=OIP.-K25y8TqkOi9UG_40Ti8bgAAAA + https://tse1.mm.bing.net/th?id=OIP.yJPVcDx6znFSOewLdQBbHgHaJA + https://tse3.mm.bing.net/th?id=OIP.7rO2T_nDAS0bXm4tQ4LKQAHaJA + https://tse2.mm.bing.net/th?id=OIP.IUIEkGQVzYRKaDA7WeeV7QHaEF + https://tse3.explicit.bing.net/th?id=OIP.OmvVnMIVu2ZdNZHZzJK_hgAAAA +``` + + +### PDF results + +```python +from websearch import WebSearch +web = WebSearch('Math 220') +pdfs = web.pdf +for pdf in pdfs[:5]: + print(pdf) +``` + +``` +[RESULTS] + + https://www.coconino.edu/resources/files/pdfs/registration/curriculum/course-outlines/m/mat/mat_220.pdf + https://www.jmu.edu/mathstat/Files/ALEKSmatrix.pdf + https://www.jjc.edu/sites/default/files/Academics/Math/M220%20Master%20Syllabus%20SP18.pdf + https://www.sonoma.edu/sites/www/files/2018-19cat-11math.pdf + 
https://www.svsd.net/cms/lib5/PA01001234/Centricity/Domain/1009/3.3-3.3B-Practice-KEY.pdf +``` + +To skip format verification when searching for attachments, set `verif=False`; it is `True` by default. + +Format verification is described [here](https://github.com/iTeam-S/WebSearch/pull/4) + +```python +from websearch import WebSearch +web = WebSearch('Math 220', verif=False) +``` + + +### DOCX results +```python +from websearch import WebSearch +web = WebSearch('python') +words = web.docx +for word in words[:3]: + print(word) +``` + +``` +[RESULTS] + + https://www.ocr.org.uk/Images/572953-j277-programming-techniques-python.docx + https://www.niu.edu/brown/_pdf/physics374_spring2021/l1-19-21.docx + https://ent2d.ac-bordeaux.fr/disciplines/mathematiques/wp-content/uploads/sites/3/2017/09/de-Scratch-%C3%A0-Python.docx +``` + + +### XLSX results +```python +from websearch import WebSearch +web = WebSearch('datalist') +excels = web.xlsx +for excel in excels[:3]: + print(excel) +``` + +``` +[RESULTS] + + https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/979255/Detailed_Single_Data_List_-_2021-2022.xlsx + https://www.jaist.ac.jp/top/data/list-achievement-research-e.xlsx + https://img1.wsimg.com/blobby/go/bed8f8d7-d6c2-488d-9aa3-5910e18aa8d2/downloads/Datalist.xlsx +``` + + +### PPTX results +```python +from websearch import WebSearch +web = WebSearch('Leadership') +powerpoints = web.pptx +for powerpoint in powerpoints[:3]: + print(powerpoint) +``` + +``` +[RESULTS] + + https://www.plainviewisd.org/cms/lib6/TX01918200/Centricity/Domain/853/Leadership%20Behav.%20Styles.pptx + https://www.yorksandhumberdeanery.nhs.uk/sites/default/files/leadership_activity_and_msf.pptx + https://www.itfglobal.org/sites/default/files/node/resources/files/Stage%203.1%20Powerpoint.pptx +``` + + +### ODT results +```python +from websearch import WebSearch +web = WebSearch('Finance') +documents = web.odt +for doc in documents[:2]: + 
print(doc) +``` + +``` +[RESULTS] + https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/970748/Green_Finance_Report.odt + https://iati.fcdo.gov.uk/iati_documents/3678707.odt + +``` + +### ODS results +```python +from websearch import WebSearch +web = WebSearch('Commerce') +documents = web.ods +for doc in documents[:2]: + print(doc) +``` + +``` +[RESULTS] +http://www.justice.gouv.fr/art_pix/Stat_RSJ_12.7_Civil_Les_tribunaux_de_commerce.ods +https://www.insee.fr/fr/metadonnees/source/fichier/Precision-principaux-indicateurs-crise-sanitaire-2020.ods +``` + +### ODP results +```python +from websearch import WebSearch +web = WebSearch('Renaissance') +documents = web.odp +for doc in documents[:2]: + print(doc) +``` + +``` +[RESULTS] +http://ekladata.com/9sHTcbLYfwbNGKU9cpnZXjlsbfA/17-Art-Renaissance.odp +https://www.college-yvescoppens-malestroit.ac-rennes.fr/sites/college-yvescoppens-malestroit.ac-rennes.fr/IMG/odp/diapo-presentation-voyage-5e.odp +``` + +### KML results +```python +from websearch import WebSearch +web = WebSearch('Madagascar') +maps = web.kml +for map in maps[:3]: + print(map) +``` + +``` +[RESULTS] +http://www.hydrosciences.fr/sierem/kmz_files/MGPLGRA.kml +https://www.ngoaidmap.org/downloads?doc=kml&name=association-intercooperation-madagascar-aim_projects&partners%5B%5D=6160&sectors%5B%5D=1&status=active +https://ngoaidmap.org/downloads?doc=kml&name=nemp-madagascar-cyclone-enawo-response_projects&projects%5B%5D=20655&status=active +``` + +### CUSTOM results + +For extensions not listed above, use the `custom` function + +The second argument is a MIME type, which can be taken from [here](https://developer.mozilla.org/fr/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types) + +```python +from websearch import WebSearch +web = WebSearch('Biologie') +ps_documents = web.custom('ps', 'application/postscript') +for doc in ps_documents[:3]: + print(doc) +``` + +``` +[RESULTS] + +http://irma.math.unistra.fr/~fbertran/Master1_2020_2/L3Court.ps 
+http://jfla.inria.fr/2002/actes/10-michel.ps +https://www.crstra.dz/telechargement/pnr/ps/environnement/fadel-djamel.ps +``` + + +### Webserver + +You can deploy it as a webserver and send an HTTP request + +```s + python -m websearch --host [host] --port [port] + [*] default host : 0.0.0.0 + [*] default port : 7845 +``` +Example for page: + ```s + curl http://<host>:<port>/pages/botoravony+arleme + + + [ + + "https://portfolio.iteam-s.mg/?id=2", + "https://portfolio.iteam-s.mg/libs/cv/arleme.pdf", + "https://madagascar.webcup.fr/team-webcup/iteams" + ] +``` + +Example for image: +```s + curl http://<host>:<port>/images/one+piece + + + [ + "https://tse1.mm.bing.net/th?id=OIP.GlNk7idD3RCI_SYLiVzSBAHaE7", + "https://tse2.mm.bing.net/th?id=OIP.uePUN5rwpB-7wicu1uxQcgHaFj", + "https://tse2.mm.bing.net/th?id=OIP.dwWBU-A_6KPvvEYsL2nhVgHaFc", + "https://tse1.mm.bing.net/th?id=OIP.5M8tKIhIWvbqGO1prhUGfAHaJ4", + ..... + "https://tse4.mm.bing.net/th?id=OIP.uvp3efwHRLDJnUWZ5KLWCwHaE8", + "https://tse3.mm.bing.net/th?id=OIP.d_uUoc-8R13RZ1bb76yhZgHaKp", + "https://tse1.mm.bing.net/th?id=OIP.cBWDvspBM036p6h4DS6RTAHaFj" + ] +``` + +Search by extension: `curl http://<host>:<port>/<extension>/<query>` + +Where `<extension>` is one of the following: + +``` +swf, pdf, ps, dwf, kml, kmz, gpx, hwp, htm, html, xls, xlsx, +ppt, pptx, doc, docx, odp, ods, odt, rtf, svg, tex, txt, text, +bas, c, cc, cpp, cxx, h, hpp, cs, java, pl, py, wml, wap, xml +``` + +Example: +```s + curl http://<host>:<port>/kml/madagascar+antananarivo + + + [ + "https://ifl.francophonelibre.org/atelier/ActionOSMMG2019/wms/kml?layers=ActionOSMMG2019:MG_Antananarivo_pharmacy_point_OSM_20190427" + ] +``` + +You can use the `limit` parameter to limit the number of results +``` + curl http://<host>:<port>/images/one+piece?limit=4 + + + [ + "https://tse1.mm.bing.net/th?id=OIP.GlNk7idD3RCI_SYLiVzSBAHaE7", + "https://tse2.mm.bing.net/th?id=OIP.uePUN5rwpB-7wicu1uxQcgHaFj", + "https://tse2.mm.bing.net/th?id=OIP.dwWBU-A_6KPvvEYsL2nhVgHaFc", + 
"https://tse1.mm.bing.net/th?id=OIP.5M8tKIhIWvbqGO1prhUGfAHaJ4" + ] + +``` + + + </details> + + +_____________________________________________________________________ + + +## Show your support + Give a star 🌟 if this project helped you! + + [](https://www.buymeacoffee.com/gaetan1903) + + +## License + +MIT License + +Copyright (c) 2021 [iTeam-$](https://iteam-s.mg) + + +___________________________________________________________________ + + ## Contributors + + + + + + +%package -n python3-websearch-python +Summary: Module allowing you to do various searches for links on the Web +Provides: python-websearch-python +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-websearch-python +# WebSearch + + + +> Python module allowing you to do various searches for links on the Web. + + +[](https://github.com/iTeam-S/WebSearch/actions/workflows/python-test.yml) +[](https://github.com/iTeam-S/WebSearch/actions/workflows/pip-upload.yml) + +[](https://pypi.org/project/websearch-python/) +[](https://pypi.org/project/websearch-python/) + + + +## Installation + +```s +pip3 install websearch-python +``` +**OR** you can install dev version +```s +pip3 install https://github.com/iTeam-S/WebSearch/archive/refs/heads/main.zip +``` + +## Use + +### Quick Start as Module + +```python +from websearch import WebSearch as web +for page in web('iTeam-$').pages[:2]: + print(page) +``` + +``` +[RESULTS] + + https://iteam-s.mg/ + https://github.com/iTeam-S +``` + + +### Quick Start as Webserver + +```s +# run webserver +websearch --host 0.0.0.0 --port 7845 +``` + +**OR** + +```s +# run webserver +python -m websearch --host 0.0.0.0 --port 7845 +``` + +```s +# requests contents +curl http://0.0.0.0:7845/pages/botoravony+arleme +``` + + ```json + [ + "https://portfolio.iteam-s.mg/?id=2", + "https://portfolio.iteam-s.mg/libs/cv/arleme.pdf", + "https://madagascar.webcup.fr/team-webcup/iteams" + ] +``` + +### Use Deployed Version +```s 
+curl https://websearch-python.herokuapp.com/pages/botoravony+arleme +``` + +__________________________ + +<details> + <summary style='font-size:24'> FULL DOCUMENTATION </summary> + +### Initialization + +```python +from websearch import WebSearch +web = WebSearch('Gaetan Jonathan BAKARY') +``` +You can pass a `list` for multiple keywords. + +```python +web = WebSearch(['Gaetan Jonathan BAKARY', 'iTeam-S']) +``` +You can also specify a `website` as a reference. + +```python +web = WebSearch('Gaetan Jonathan', site='iteam-s.mg') +``` + + +### Webpages results + +```python +from websearch import WebSearch +web = WebSearch('Gaetan Jonathan BAKARY') +webpages = web.pages +for wp in webpages[:5]: + print(wp) +``` + +``` +[RESULTS] + + https://mg.linkedin.com/in/gaetanj + https://portfolio.iteam-s.mg/?u=gaetan + https://github.com/gaetan1903 + https://medium.com/@gaetan1903 + https://gitlab.com/gaetan1903 +``` + + +### Images results + +```python +from websearch import WebSearch +web = WebSearch('Gaetan Jonathan BAKARY') +webimages = web.images +for im in webimages[:5]: + print(im) +``` + +``` +[RESULTS] + + https://tse3.mm.bing.net/th?id=OIP.-K25y8TqkOi9UG_40Ti8bgAAAA + https://tse1.mm.bing.net/th?id=OIP.yJPVcDx6znFSOewLdQBbHgHaJA + https://tse3.mm.bing.net/th?id=OIP.7rO2T_nDAS0bXm4tQ4LKQAHaJA + https://tse2.mm.bing.net/th?id=OIP.IUIEkGQVzYRKaDA7WeeV7QHaEF + https://tse3.explicit.bing.net/th?id=OIP.OmvVnMIVu2ZdNZHZzJK_hgAAAA +``` + + +### PDF results + +```python +from websearch import WebSearch +web = WebSearch('Math 220') +pdfs = web.pdf +for pdf in pdfs[:5]: + print(pdf) +``` + +``` +[RESULTS] + + https://www.coconino.edu/resources/files/pdfs/registration/curriculum/course-outlines/m/mat/mat_220.pdf + https://www.jmu.edu/mathstat/Files/ALEKSmatrix.pdf + https://www.jjc.edu/sites/default/files/Academics/Math/M220%20Master%20Syllabus%20SP18.pdf + https://www.sonoma.edu/sites/www/files/2018-19cat-11math.pdf + 
https://www.svsd.net/cms/lib5/PA01001234/Centricity/Domain/1009/3.3-3.3B-Practice-KEY.pdf +``` + +To skip format verification when searching for attachments, set `verif=False`; it is `True` by default. + +Format verification is described [here](https://github.com/iTeam-S/WebSearch/pull/4) + +```python +from websearch import WebSearch +web = WebSearch('Math 220', verif=False) +``` + + +### DOCX results +```python +from websearch import WebSearch +web = WebSearch('python') +words = web.docx +for word in words[:3]: + print(word) +``` + +``` +[RESULTS] + + https://www.ocr.org.uk/Images/572953-j277-programming-techniques-python.docx + https://www.niu.edu/brown/_pdf/physics374_spring2021/l1-19-21.docx + https://ent2d.ac-bordeaux.fr/disciplines/mathematiques/wp-content/uploads/sites/3/2017/09/de-Scratch-%C3%A0-Python.docx +``` + + +### XLSX results +```python +from websearch import WebSearch +web = WebSearch('datalist') +excels = web.xlsx +for excel in excels[:3]: + print(excel) +``` + +``` +[RESULTS] + + https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/979255/Detailed_Single_Data_List_-_2021-2022.xlsx + https://www.jaist.ac.jp/top/data/list-achievement-research-e.xlsx + https://img1.wsimg.com/blobby/go/bed8f8d7-d6c2-488d-9aa3-5910e18aa8d2/downloads/Datalist.xlsx +``` + + +### PPTX results +```python +from websearch import WebSearch +web = WebSearch('Leadership') +powerpoints = web.pptx +for powerpoint in powerpoints[:3]: + print(powerpoint) +``` + +``` +[RESULTS] + + https://www.plainviewisd.org/cms/lib6/TX01918200/Centricity/Domain/853/Leadership%20Behav.%20Styles.pptx + https://www.yorksandhumberdeanery.nhs.uk/sites/default/files/leadership_activity_and_msf.pptx + https://www.itfglobal.org/sites/default/files/node/resources/files/Stage%203.1%20Powerpoint.pptx +``` + + +### ODT results +```python +from websearch import WebSearch +web = WebSearch('Finance') +documents = web.odt +for doc in documents[:2]: + 
print(doc) +``` + +``` +[RESULTS] + https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/970748/Green_Finance_Report.odt + https://iati.fcdo.gov.uk/iati_documents/3678707.odt + +``` + +### ODS results +```python +from websearch import WebSearch +web = WebSearch('Commerce') +documents = web.ods +for doc in documents[:2]: + print(doc) +``` + +``` +[RESULTS] +http://www.justice.gouv.fr/art_pix/Stat_RSJ_12.7_Civil_Les_tribunaux_de_commerce.ods +https://www.insee.fr/fr/metadonnees/source/fichier/Precision-principaux-indicateurs-crise-sanitaire-2020.ods +``` + +### ODP results +```python +from websearch import WebSearch +web = WebSearch('Renaissance') +documents = web.odp +for doc in documents[:2]: + print(doc) +``` + +``` +[RESULTS] +http://ekladata.com/9sHTcbLYfwbNGKU9cpnZXjlsbfA/17-Art-Renaissance.odp +https://www.college-yvescoppens-malestroit.ac-rennes.fr/sites/college-yvescoppens-malestroit.ac-rennes.fr/IMG/odp/diapo-presentation-voyage-5e.odp +``` + +### KML results +```python +from websearch import WebSearch +web = WebSearch('Madagascar') +maps = web.kml +for map in maps[:3]: + print(map) +``` + +``` +[RESULTS] +http://www.hydrosciences.fr/sierem/kmz_files/MGPLGRA.kml +https://www.ngoaidmap.org/downloads?doc=kml&name=association-intercooperation-madagascar-aim_projects&partners%5B%5D=6160&sectors%5B%5D=1&status=active +https://ngoaidmap.org/downloads?doc=kml&name=nemp-madagascar-cyclone-enawo-response_projects&projects%5B%5D=20655&status=active +``` + +### CUSTOM results + +For extensions not listed above, use the `custom` function + +The second argument is a MIME type, which can be taken from [here](https://developer.mozilla.org/fr/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types) + +```python +from websearch import WebSearch +web = WebSearch('Biologie') +ps_documents = web.custom('ps', 'application/postscript') +for doc in ps_documents[:3]: + print(doc) +``` + +``` +[RESULTS] + +http://irma.math.unistra.fr/~fbertran/Master1_2020_2/L3Court.ps 
+http://jfla.inria.fr/2002/actes/10-michel.ps +https://www.crstra.dz/telechargement/pnr/ps/environnement/fadel-djamel.ps +``` + + +### Webserver + +You can deploy it as a webserver and send an HTTP request + +```s + python -m websearch --host [host] --port [port] + [*] default host : 0.0.0.0 + [*] default port : 7845 +``` +Example for page: + ```s + curl http://<host>:<port>/pages/botoravony+arleme + + + [ + + "https://portfolio.iteam-s.mg/?id=2", + "https://portfolio.iteam-s.mg/libs/cv/arleme.pdf", + "https://madagascar.webcup.fr/team-webcup/iteams" + ] +``` + +Example for image: +```s + curl http://<host>:<port>/images/one+piece + + + [ + "https://tse1.mm.bing.net/th?id=OIP.GlNk7idD3RCI_SYLiVzSBAHaE7", + "https://tse2.mm.bing.net/th?id=OIP.uePUN5rwpB-7wicu1uxQcgHaFj", + "https://tse2.mm.bing.net/th?id=OIP.dwWBU-A_6KPvvEYsL2nhVgHaFc", + "https://tse1.mm.bing.net/th?id=OIP.5M8tKIhIWvbqGO1prhUGfAHaJ4", + ..... + "https://tse4.mm.bing.net/th?id=OIP.uvp3efwHRLDJnUWZ5KLWCwHaE8", + "https://tse3.mm.bing.net/th?id=OIP.d_uUoc-8R13RZ1bb76yhZgHaKp", + "https://tse1.mm.bing.net/th?id=OIP.cBWDvspBM036p6h4DS6RTAHaFj" + ] +``` + +Search by extension: `curl http://<host>:<port>/<extension>/<query>` + +Where `<extension>` is one of the following: + +``` +swf, pdf, ps, dwf, kml, kmz, gpx, hwp, htm, html, xls, xlsx, +ppt, pptx, doc, docx, odp, ods, odt, rtf, svg, tex, txt, text, +bas, c, cc, cpp, cxx, h, hpp, cs, java, pl, py, wml, wap, xml +``` + +Example: +```s + curl http://<host>:<port>/kml/madagascar+antananarivo + + + [ + "https://ifl.francophonelibre.org/atelier/ActionOSMMG2019/wms/kml?layers=ActionOSMMG2019:MG_Antananarivo_pharmacy_point_OSM_20190427" + ] +``` + +You can use the `limit` parameter to limit the number of results +``` + curl http://<host>:<port>/images/one+piece?limit=4 + + + [ + "https://tse1.mm.bing.net/th?id=OIP.GlNk7idD3RCI_SYLiVzSBAHaE7", + "https://tse2.mm.bing.net/th?id=OIP.uePUN5rwpB-7wicu1uxQcgHaFj", + "https://tse2.mm.bing.net/th?id=OIP.dwWBU-A_6KPvvEYsL2nhVgHaFc", + 
"https://tse1.mm.bing.net/th?id=OIP.5M8tKIhIWvbqGO1prhUGfAHaJ4" + ] + +``` + + + </details> + + +_____________________________________________________________________ + + +## Show your support + Give a star 🌟 if this project helped you! + + [](https://www.buymeacoffee.com/gaetan1903) + + +## License + +MIT License + +Copyright (c) 2021 [iTeam-$](https://iteam-s.mg) + + +___________________________________________________________________ + + ## Contributors + + + + + + +%package help +Summary: Development documents and examples for websearch-python +Provides: python3-websearch-python-doc +%description help +# WebSearch + + + +> Python module allowing you to do various searches for links on the Web. + + +[](https://github.com/iTeam-S/WebSearch/actions/workflows/python-test.yml) +[](https://github.com/iTeam-S/WebSearch/actions/workflows/pip-upload.yml) + +[](https://pypi.org/project/websearch-python/) +[](https://pypi.org/project/websearch-python/) + + + +## Installation + +```s +pip3 install websearch-python +``` +**OR** you can install dev version +```s +pip3 install https://github.com/iTeam-S/WebSearch/archive/refs/heads/main.zip +``` + +## Use + +### Quick Start as Module + +```python +from websearch import WebSearch as web +for page in web('iTeam-$').pages[:2]: + print(page) +``` + +``` +[RESULTS] + + https://iteam-s.mg/ + https://github.com/iTeam-S +``` + + +### Quick Start as Webserver + +```s +# run webserver +websearch --host 0.0.0.0 --port 7845 +``` + +**OR** + +```s +# run webserver +python -m websearch --host 0.0.0.0 --port 7845 +``` + +```s +# requests contents +curl http://0.0.0.0:7845/pages/botoravony+arleme +``` + + ```json + [ + "https://portfolio.iteam-s.mg/?id=2", + "https://portfolio.iteam-s.mg/libs/cv/arleme.pdf", + "https://madagascar.webcup.fr/team-webcup/iteams" + ] +``` + +### Use Deployed Version +```s +curl https://websearch-python.herokuapp.com/pages/botoravony+arleme +``` + +__________________________ + +<details> + <summary 
style='font-size:24'> FULL DOCUMENTATION </summary> + +### Initialization + +```python +from websearch import WebSearch +web = WebSearch('Gaetan Jonathan BAKARY') +``` +You can pass a `list` for multiple keywords. + +```python +web = WebSearch(['Gaetan Jonathan BAKARY', 'iTeam-S']) +``` +You can also specify a `website` as a reference. + +```python +web = WebSearch('Gaetan Jonathan', site='iteam-s.mg') +``` + + +### Webpages results + +```python +from websearch import WebSearch +web = WebSearch('Gaetan Jonathan BAKARY') +webpages = web.pages +for wp in webpages[:5]: + print(wp) +``` + +``` +[RESULTS] + + https://mg.linkedin.com/in/gaetanj + https://portfolio.iteam-s.mg/?u=gaetan + https://github.com/gaetan1903 + https://medium.com/@gaetan1903 + https://gitlab.com/gaetan1903 +``` + + +### Images results + +```python +from websearch import WebSearch +web = WebSearch('Gaetan Jonathan BAKARY') +webimages = web.images +for im in webimages[:5]: + print(im) +``` + +``` +[RESULTS] + + https://tse3.mm.bing.net/th?id=OIP.-K25y8TqkOi9UG_40Ti8bgAAAA + https://tse1.mm.bing.net/th?id=OIP.yJPVcDx6znFSOewLdQBbHgHaJA + https://tse3.mm.bing.net/th?id=OIP.7rO2T_nDAS0bXm4tQ4LKQAHaJA + https://tse2.mm.bing.net/th?id=OIP.IUIEkGQVzYRKaDA7WeeV7QHaEF + https://tse3.explicit.bing.net/th?id=OIP.OmvVnMIVu2ZdNZHZzJK_hgAAAA +``` + + +### PDF results + +```python +from websearch import WebSearch +web = WebSearch('Math 220') +pdfs = web.pdf +for pdf in pdfs[:5]: + print(pdf) +``` + +``` +[RESULTS] + + https://www.coconino.edu/resources/files/pdfs/registration/curriculum/course-outlines/m/mat/mat_220.pdf + https://www.jmu.edu/mathstat/Files/ALEKSmatrix.pdf + https://www.jjc.edu/sites/default/files/Academics/Math/M220%20Master%20Syllabus%20SP18.pdf + https://www.sonoma.edu/sites/www/files/2018-19cat-11math.pdf + https://www.svsd.net/cms/lib5/PA01001234/Centricity/Domain/1009/3.3-3.3B-Practice-KEY.pdf +``` + +To skip format verification when searching for attachments, set `verif=False`; it is 
`True` by default. + +Format verification is described [here](https://github.com/iTeam-S/WebSearch/pull/4) + +```python +from websearch import WebSearch +web = WebSearch('Math 220', verif=False) +``` + + +### DOCX results +```python +from websearch import WebSearch +web = WebSearch('python') +words = web.docx +for word in words[:3]: + print(word) +``` + +``` +[RESULTS] + + https://www.ocr.org.uk/Images/572953-j277-programming-techniques-python.docx + https://www.niu.edu/brown/_pdf/physics374_spring2021/l1-19-21.docx + https://ent2d.ac-bordeaux.fr/disciplines/mathematiques/wp-content/uploads/sites/3/2017/09/de-Scratch-%C3%A0-Python.docx +``` + + +### XLSX results +```python +from websearch import WebSearch +web = WebSearch('datalist') +excels = web.xlsx +for excel in excels[:3]: + print(excel) +``` + +``` +[RESULTS] + + https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/979255/Detailed_Single_Data_List_-_2021-2022.xlsx + https://www.jaist.ac.jp/top/data/list-achievement-research-e.xlsx + https://img1.wsimg.com/blobby/go/bed8f8d7-d6c2-488d-9aa3-5910e18aa8d2/downloads/Datalist.xlsx +``` + + +### PPTX results +```python +from websearch import WebSearch +web = WebSearch('Leadership') +powerpoints = web.pptx +for powerpoint in powerpoints[:3]: + print(powerpoint) +``` + +``` +[RESULTS] + + https://www.plainviewisd.org/cms/lib6/TX01918200/Centricity/Domain/853/Leadership%20Behav.%20Styles.pptx + https://www.yorksandhumberdeanery.nhs.uk/sites/default/files/leadership_activity_and_msf.pptx + https://www.itfglobal.org/sites/default/files/node/resources/files/Stage%203.1%20Powerpoint.pptx +``` + + +### ODT results +```python +from websearch import WebSearch +web = WebSearch('Finance') +documents = web.odt +for doc in documents[:2]: + print(doc) +``` + +``` +[RESULTS] + https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/970748/Green_Finance_Report.odt + 
https://iati.fcdo.gov.uk/iati_documents/3678707.odt + +``` + +### ODS results +```python +from websearch import WebSearch +web = WebSearch('Commerce') +documents = web.ods +for doc in documents[:2]: + print(doc) +``` + +``` +[RESULTS] +http://www.justice.gouv.fr/art_pix/Stat_RSJ_12.7_Civil_Les_tribunaux_de_commerce.ods +https://www.insee.fr/fr/metadonnees/source/fichier/Precision-principaux-indicateurs-crise-sanitaire-2020.ods +``` + +### ODP results +```python +from websearch import WebSearch +web = WebSearch('Renaissance') +documents = web.odp +for doc in documents[:2]: + print(doc) +``` + +``` +[RESULTS] +http://ekladata.com/9sHTcbLYfwbNGKU9cpnZXjlsbfA/17-Art-Renaissance.odp +https://www.college-yvescoppens-malestroit.ac-rennes.fr/sites/college-yvescoppens-malestroit.ac-rennes.fr/IMG/odp/diapo-presentation-voyage-5e.odp +``` + +### KML results +```python +from websearch import WebSearch +web = WebSearch('Madagascar') +maps = web.kml +for map in maps[:3]: + print(map) +``` + +``` +[RESULTS] +http://www.hydrosciences.fr/sierem/kmz_files/MGPLGRA.kml +https://www.ngoaidmap.org/downloads?doc=kml&name=association-intercooperation-madagascar-aim_projects&partners%5B%5D=6160&sectors%5B%5D=1&status=active +https://ngoaidmap.org/downloads?doc=kml&name=nemp-madagascar-cyclone-enawo-response_projects&projects%5B%5D=20655&status=active +``` + +### CUSTOM results + +For extensions not listed above, use the `custom` function + +The second argument is a MIME type, which can be taken from [here](https://developer.mozilla.org/fr/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types) + +```python +from websearch import WebSearch +web = WebSearch('Biologie') +ps_documents = web.custom('ps', 'application/postscript') +for doc in ps_documents[:3]: + print(doc) +``` + +``` +[RESULTS] + +http://irma.math.unistra.fr/~fbertran/Master1_2020_2/L3Court.ps +http://jfla.inria.fr/2002/actes/10-michel.ps +https://www.crstra.dz/telechargement/pnr/ps/environnement/fadel-djamel.ps +``` + + +### Webserver + +You can deploy it as a webserver 
and send an HTTP request + +```s + python -m websearch --host [host] --port [port] + [*] default host : 0.0.0.0 + [*] default port : 7845 +``` +Example for page: + ```s + curl http://<host>:<port>/pages/botoravony+arleme + + + [ + + "https://portfolio.iteam-s.mg/?id=2", + "https://portfolio.iteam-s.mg/libs/cv/arleme.pdf", + "https://madagascar.webcup.fr/team-webcup/iteams" + ] +``` + +Example for image: +```s + curl http://<host>:<port>/images/one+piece + + + [ + "https://tse1.mm.bing.net/th?id=OIP.GlNk7idD3RCI_SYLiVzSBAHaE7", + "https://tse2.mm.bing.net/th?id=OIP.uePUN5rwpB-7wicu1uxQcgHaFj", + "https://tse2.mm.bing.net/th?id=OIP.dwWBU-A_6KPvvEYsL2nhVgHaFc", + "https://tse1.mm.bing.net/th?id=OIP.5M8tKIhIWvbqGO1prhUGfAHaJ4", + ..... + "https://tse4.mm.bing.net/th?id=OIP.uvp3efwHRLDJnUWZ5KLWCwHaE8", + "https://tse3.mm.bing.net/th?id=OIP.d_uUoc-8R13RZ1bb76yhZgHaKp", + "https://tse1.mm.bing.net/th?id=OIP.cBWDvspBM036p6h4DS6RTAHaFj" + ] +``` + +Search by extension: `curl http://<host>:<port>/<extension>/<query>` + +Where `<extension>` is one of the following: + +``` +swf, pdf, ps, dwf, kml, kmz, gpx, hwp, htm, html, xls, xlsx, +ppt, pptx, doc, docx, odp, ods, odt, rtf, svg, tex, txt, text, +bas, c, cc, cpp, cxx, h, hpp, cs, java, pl, py, wml, wap, xml +``` + +Example: +```s + curl http://<host>:<port>/kml/madagascar+antananarivo + + + [ + "https://ifl.francophonelibre.org/atelier/ActionOSMMG2019/wms/kml?layers=ActionOSMMG2019:MG_Antananarivo_pharmacy_point_OSM_20190427" + ] +``` + +You can use the `limit` parameter to limit the number of results +``` + curl http://<host>:<port>/images/one+piece?limit=4 + + + [ + "https://tse1.mm.bing.net/th?id=OIP.GlNk7idD3RCI_SYLiVzSBAHaE7", + "https://tse2.mm.bing.net/th?id=OIP.uePUN5rwpB-7wicu1uxQcgHaFj", + "https://tse2.mm.bing.net/th?id=OIP.dwWBU-A_6KPvvEYsL2nhVgHaFc", + "https://tse1.mm.bing.net/th?id=OIP.5M8tKIhIWvbqGO1prhUGfAHaJ4" + ] + +``` + + + </details> + + +_____________________________________________________________________ + + +## Show your 
support + Give a star 🌟 if this project helped you! + + [](https://www.buymeacoffee.com/gaetan1903) + + +## License + +MIT License + +Copyright (c) 2021 [iTeam-$](https://iteam-s.mg) + + +___________________________________________________________________ + + ## Contributors + + + + + + +%prep +%autosetup -n websearch-python-1.2.1 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-websearch-python -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 1.2.1-1 +- Package Spec generated @@ -0,0 +1 @@ +d024be2b68985f76662e10fa28c7e0ae websearch-python-1.2.1.tar.gz |
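The `sources` hunk that closes this diff pairs a 32-hex-digit digest with the tarball name, which looks like an MD5 checksum entry. As a minimal sketch, assuming that two-column `<digest> <filename>` layout and a tarball already downloaded next to the script, such an entry could be checked as follows. The helper names `parse_sources_line`, `md5_of` and `verify` are hypothetical and not part of any packaging tool.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def parse_sources_line(line: str) -> Tuple[str, str]:
    """Split a '<digest> <filename>' line into (lowercase digest, filename)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(line: str, directory: Path = Path(".")) -> bool:
    """Check that the file named in a sources line matches its digest."""
    digest, filename = parse_sources_line(line)
    return md5_of(directory / filename) == digest


if __name__ == "__main__":
    # Entry copied from the `sources` hunk above; the tarball must exist locally.
    entry = "d024be2b68985f76662e10fa28c7e0ae websearch-python-1.2.1.tar.gz"
    _, name = parse_sources_line(entry)
    if Path(name).exists():
        print("checksum ok" if verify(entry) else "checksum mismatch")
    else:
        print(f"{name} not found; download it before verifying")
```

Reading in chunks keeps memory use flat for large tarballs; MD5 is used here only because the entry appears to be an MD5 digest, not because it is a strong integrity check.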
