author | CoprDistGit <infra@openeuler.org> | 2023-05-29 10:12:13 +0000
committer | CoprDistGit <infra@openeuler.org> | 2023-05-29 10:12:13 +0000
commit | efad75ada41d1764ca65bffefd1785fd8b173629 (patch)
tree | 27ef47154e320c1f774abb277cb200dd2c0016ba
parent | a439100f00b4f5150e20349a5eb250fda1a1e6f1 (diff)
automatic import of python-urlfinderlib
-rw-r--r-- | .gitignore | 1
-rw-r--r-- | python-urlfinderlib.spec | 260
-rw-r--r-- | sources | 1
3 files changed, 262 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+/urlfinderlib-0.18.6.tar.gz
diff --git a/python-urlfinderlib.spec b/python-urlfinderlib.spec
new file mode 100644
index 0000000..3e5cef6
--- /dev/null
+++ b/python-urlfinderlib.spec
@@ -0,0 +1,260 @@
+%global _empty_manifest_terminate_build 0
+Name: python-urlfinderlib
+Version: 0.18.6
+Release: 1
+Summary: Library to find URLs and check their validity.
+License: Apache 2.0
+URL: https://github.com/ace-ecosystem/urlfinderlib
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/f7/43/bb555dc65a18849062bc69f494b90bb47da0d4553c41f747d70b693c08b9/urlfinderlib-0.18.6.tar.gz
+BuildArch: noarch
+
+Requires: python3-icalendar
+Requires: python3-idna
+Requires: python3-lxml
+Requires: python3-pytest
+Requires: python3-pytest-cov
+Requires: python3-magic
+Requires: python3-tld
+Requires: python3-validators
+
+%description
+# urlfinderlib
+
+This is a Python (3.6+) library for finding URLs in documents and checking their validity.
+
+## Supported Documents
+
+Extracts URLs from the following types of documents:
+
+* Binary files (finds URLs within strings)
+* CSV files
+* HTML files
+* iCalendar/vCalendar files
+* PDF files
+* Text files (ASCII or UTF-8)
+* XML files
+
+Every extracted URL is validated such that it contains a domain with a valid TLD (or a valid IP address) and does not contain any invalid characters.
+
+## URL Permutations
+
+This was originally written to accommodate finding both valid and obfuscated or slightly malformed URLs used by malicious actors and using them as indicators of compromise (IOCs). As such, the extracted URLs will also include the following permutations:
+
+* URL with any Unicode characters in its domain
+* URL with any Unicode characters converted to its IDNA equivalent
+
+For both domain variations, the following permutations are also returned:
+
+* URL with its path %-encoded
+* URL with its path %-decoded
+* URL with encoded HTML entities in its path
+* URL with decoded HTML entities in its path
+* URL with its path %-decoded and HTML entities decoded
+
+## Child URLs
+
+This library also attempts to extract or decode child URLs found in the paths of URLs. The following formats are supported:
+
+* Barracuda protected URLs
+* Base64-encoded URLs found within the URL's path
+* Google redirect URLs
+* Mandrill/Mailchimp redirect URLs
+* Outlook Safe Links URLs
+* Proofpoint protected URLs
+* URLs found in the URL's path query parameters
+
+## Basic usage
+
+    from urlfinderlib import find_urls
+
+    with open('/path/to/file', 'rb') as f:
+        print(find_urls(f.read()))
+
+### base_url Parameter
+
+If you are trying to find URLs inside of an HTML file, the paths in the URLs are often relative to their location on the server hosting the HTML. You can use the *base_url* parameter in this case to extract these "relative" URLs.
+
+    from urlfinderlib import find_urls
+
+    with open('/path/to/file', 'rb') as f:
+        print(find_urls(f.read(), base_url='http://example.com'))
+
+
+%package -n python3-urlfinderlib
+Summary: Library to find URLs and check their validity.
+Provides: python-urlfinderlib
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-urlfinderlib
+# urlfinderlib
+
+This is a Python (3.6+) library for finding URLs in documents and checking their validity.
+
+## Supported Documents
+
+Extracts URLs from the following types of documents:
+
+* Binary files (finds URLs within strings)
+* CSV files
+* HTML files
+* iCalendar/vCalendar files
+* PDF files
+* Text files (ASCII or UTF-8)
+* XML files
+
+Every extracted URL is validated such that it contains a domain with a valid TLD (or a valid IP address) and does not contain any invalid characters.
+
+## URL Permutations
+
+This was originally written to accommodate finding both valid and obfuscated or slightly malformed URLs used by malicious actors and using them as indicators of compromise (IOCs). As such, the extracted URLs will also include the following permutations:
+
+* URL with any Unicode characters in its domain
+* URL with any Unicode characters converted to its IDNA equivalent
+
+For both domain variations, the following permutations are also returned:
+
+* URL with its path %-encoded
+* URL with its path %-decoded
+* URL with encoded HTML entities in its path
+* URL with decoded HTML entities in its path
+* URL with its path %-decoded and HTML entities decoded
+
+## Child URLs
+
+This library also attempts to extract or decode child URLs found in the paths of URLs. The following formats are supported:
+
+* Barracuda protected URLs
+* Base64-encoded URLs found within the URL's path
+* Google redirect URLs
+* Mandrill/Mailchimp redirect URLs
+* Outlook Safe Links URLs
+* Proofpoint protected URLs
+* URLs found in the URL's path query parameters
+
+## Basic usage
+
+    from urlfinderlib import find_urls
+
+    with open('/path/to/file', 'rb') as f:
+        print(find_urls(f.read()))
+
+### base_url Parameter
+
+If you are trying to find URLs inside of an HTML file, the paths in the URLs are often relative to their location on the server hosting the HTML. You can use the *base_url* parameter in this case to extract these "relative" URLs.
+
+    from urlfinderlib import find_urls
+
+    with open('/path/to/file', 'rb') as f:
+        print(find_urls(f.read(), base_url='http://example.com'))
+
+
+%package help
+Summary: Development documents and examples for urlfinderlib
+Provides: python3-urlfinderlib-doc
+%description help
+# urlfinderlib
+
+This is a Python (3.6+) library for finding URLs in documents and checking their validity.
+
+## Supported Documents
+
+Extracts URLs from the following types of documents:
+
+* Binary files (finds URLs within strings)
+* CSV files
+* HTML files
+* iCalendar/vCalendar files
+* PDF files
+* Text files (ASCII or UTF-8)
+* XML files
+
+Every extracted URL is validated such that it contains a domain with a valid TLD (or a valid IP address) and does not contain any invalid characters.
+
+## URL Permutations
+
+This was originally written to accommodate finding both valid and obfuscated or slightly malformed URLs used by malicious actors and using them as indicators of compromise (IOCs). As such, the extracted URLs will also include the following permutations:
+
+* URL with any Unicode characters in its domain
+* URL with any Unicode characters converted to its IDNA equivalent
+
+For both domain variations, the following permutations are also returned:
+
+* URL with its path %-encoded
+* URL with its path %-decoded
+* URL with encoded HTML entities in its path
+* URL with decoded HTML entities in its path
+* URL with its path %-decoded and HTML entities decoded
+
+## Child URLs
+
+This library also attempts to extract or decode child URLs found in the paths of URLs. The following formats are supported:
+
+* Barracuda protected URLs
+* Base64-encoded URLs found within the URL's path
+* Google redirect URLs
+* Mandrill/Mailchimp redirect URLs
+* Outlook Safe Links URLs
+* Proofpoint protected URLs
+* URLs found in the URL's path query parameters
+
+## Basic usage
+
+    from urlfinderlib import find_urls
+
+    with open('/path/to/file', 'rb') as f:
+        print(find_urls(f.read()))
+
+### base_url Parameter
+
+If you are trying to find URLs inside of an HTML file, the paths in the URLs are often relative to their location on the server hosting the HTML. You can use the *base_url* parameter in this case to extract these "relative" URLs.
+
+    from urlfinderlib import find_urls
+
+    with open('/path/to/file', 'rb') as f:
+        print(find_urls(f.read(), base_url='http://example.com'))
+
+
+%prep
+%autosetup -n urlfinderlib-0.18.6
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-urlfinderlib -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 0.18.6-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+3c5a6a19c2becb6b69b1699875c8367f urlfinderlib-0.18.6.tar.gz
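For reference, the `sources` entry above pairs an MD5 digest with the imported tarball. The following is a minimal sketch, not part of the commit, of how that digest could be checked against a local copy of the tarball before a rebuild; the file name and path are taken from the `sources` line and assumed to exist in the current directory:

    import hashlib

    # Values copied from the "sources" file in this commit.
    EXPECTED_MD5 = "3c5a6a19c2becb6b69b1699875c8367f"
    TARBALL = "urlfinderlib-0.18.6.tar.gz"  # assumed to be downloaded locally

    def md5sum(path):
        # Read the file in chunks so large tarballs need not fit in memory.
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()

    actual = md5sum(TARBALL)
    print("checksum OK" if actual == EXPECTED_MD5 else "checksum mismatch: " + actual)

MD5 is used here only because that is what the `sources` file records; it identifies the tarball rather than providing cryptographic assurance.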