-rw-r--r-- .gitignore 1
-rw-r--r-- python-py-tlsh.spec 387
-rw-r--r-- sources 1
3 files changed, 389 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..353e711 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/py-tlsh-4.7.2.tar.gz
diff --git a/python-py-tlsh.spec b/python-py-tlsh.spec
new file mode 100644
index 0000000..30d189d
--- /dev/null
+++ b/python-py-tlsh.spec
@@ -0,0 +1,387 @@
+%global _empty_manifest_terminate_build 0
+Name: python-py-tlsh
+Version: 4.7.2
+Release: 1
+Summary: TLSH (C++ Python extension)
+License: Apache or BSD
+URL: https://github.com/trendmicro/tlsh
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/ba/5b/4d860cffd3e6e7afb277e159d97e11583fc3b611d22355799364dff325f1/py-tlsh-4.7.2.tar.gz
+BuildArch: noarch
+
+
+%description
+# TLSH - C++ extension for Python
+
+[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library.
+Given a byte stream with a minimum length of 50 bytes
+TLSH generates a hash value which can be used for similarity comparisons.
+Similar objects will have similar hash values which allows for
+the detection of similar objects by comparing their hash values. Note that
+the byte stream should have a sufficient amount of complexity. For example,
+a byte stream of identical bytes will not generate a hash value.
+
+## What's new in py-tlsh 4.7.2
+This Python module supersedes the python-tlsh package on PyPI.
+The improvement in 4.7.2 is that additional functions were added to the Python module:
+* lvalue
+* q1ratio
+* q2ratio
+* checksum
+* bucket_value
+* is_valid
+
+The improvements in 4.5.0 were:
+* fixed this package so that it works on Windows
+* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
+* fixed the q3=0 divide by zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)
+
+## Usage
+
+```python
+import tlsh
+
+tlsh.hash(data)
+```
+
+Note that data must be bytes, not a string.
+This is because TLSH is for binary data and binary data can contain a NULL (zero) byte.
+
+In default mode, the data must contain at least 50 bytes to generate a hash value, and
+it must have a certain amount of randomness.
+To get the hash value of a file, try
+
+```python
+tlsh.hash(open(file, 'rb').read())
+```
+
+Note: the open statement has opened the file in binary mode.
+
+## Example
+```python
+import tlsh
+
+h1 = tlsh.hash(data)
+h2 = tlsh.hash(similar_data)
+score = tlsh.diff(h1, h2)
+
+h3 = tlsh.Tlsh()
+with open('file', 'rb') as f:
+    for buf in iter(lambda: f.read(512), b''):
+        h3.update(buf)
+    h3.final()
+# this assertion is stating that the distance between a TLSH and itself must be zero
+assert h3.diff(h3) == 0
+score = h3.diff(h1)
+```
+
+## Extra Options
+
+The `diffxlen` function removes the file length component of the tlsh header from the comparison.
+
+```python
+tlsh.diffxlen(h1, h2)
+```
+
+If a file with a repeating pattern is compared to a file with only a single instance of the pattern,
+then the difference will be increased if the file length is included.
+But by using the `diffxlen` function, the file length will be removed from consideration.
+
+## Backwards Compatibility Options
+
+If you use the "conservative" option, then the data must contain at least 256 bytes.
+For example,
+
+```python
+import os
+tlsh.conservativehash(os.urandom(256))
+```
+
+should generate a hash, but
+
+```python
+tlsh.conservativehash(os.urandom(100))
+```
+
+will generate TNULL because the input is shorter than 256 bytes.
+
+If you need to generate old-style hashes (without the "T1" prefix), then use
+
+```python
+tlsh.oldhash(os.urandom(100))
+```
+
+
+The old and conservative options may be combined:
+
+```python
+tlsh.oldconservativehash(os.urandom(500))
+```
+
+%package -n python3-py-tlsh
+Summary: TLSH (C++ Python extension)
+Provides: python-py-tlsh
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-py-tlsh
+# TLSH - C++ extension for Python
+
+[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library.
+Given a byte stream with a minimum length of 50 bytes
+TLSH generates a hash value which can be used for similarity comparisons.
+Similar objects will have similar hash values which allows for
+the detection of similar objects by comparing their hash values. Note that
+the byte stream should have a sufficient amount of complexity. For example,
+a byte stream of identical bytes will not generate a hash value.
+
+## What's new in py-tlsh 4.7.2
+This Python module supersedes the python-tlsh package on PyPI.
+The improvement in 4.7.2 is that additional functions were added to the Python module:
+* lvalue
+* q1ratio
+* q2ratio
+* checksum
+* bucket_value
+* is_valid
+
+The improvements in 4.5.0 were:
+* fixed this package so that it works on Windows
+* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
+* fixed the q3=0 divide by zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)
+
+## Usage
+
+```python
+import tlsh
+
+tlsh.hash(data)
+```
+
+Note that data must be bytes, not a string.
+This is because TLSH is for binary data and binary data can contain a NULL (zero) byte.
+
+In default mode, the data must contain at least 50 bytes to generate a hash value, and
+it must have a certain amount of randomness.
+To get the hash value of a file, try
+
+```python
+tlsh.hash(open(file, 'rb').read())
+```
+
+Note: the open statement has opened the file in binary mode.
+
+## Example
+```python
+import tlsh
+
+h1 = tlsh.hash(data)
+h2 = tlsh.hash(similar_data)
+score = tlsh.diff(h1, h2)
+
+h3 = tlsh.Tlsh()
+with open('file', 'rb') as f:
+    for buf in iter(lambda: f.read(512), b''):
+        h3.update(buf)
+    h3.final()
+# this assertion is stating that the distance between a TLSH and itself must be zero
+assert h3.diff(h3) == 0
+score = h3.diff(h1)
+```
+
+## Extra Options
+
+The `diffxlen` function removes the file length component of the tlsh header from the comparison.
+
+```python
+tlsh.diffxlen(h1, h2)
+```
+
+If a file with a repeating pattern is compared to a file with only a single instance of the pattern,
+then the difference will be increased if the file length is included.
+But by using the `diffxlen` function, the file length will be removed from consideration.
+
+## Backwards Compatibility Options
+
+If you use the "conservative" option, then the data must contain at least 256 bytes.
+For example,
+
+```python
+import os
+tlsh.conservativehash(os.urandom(256))
+```
+
+should generate a hash, but
+
+```python
+tlsh.conservativehash(os.urandom(100))
+```
+
+will generate TNULL because the input is shorter than 256 bytes.
+
+If you need to generate old-style hashes (without the "T1" prefix), then use
+
+```python
+tlsh.oldhash(os.urandom(100))
+```
+
+
+The old and conservative options may be combined:
+
+```python
+tlsh.oldconservativehash(os.urandom(500))
+```
+
+%package help
+Summary: Development documents and examples for py-tlsh
+Provides: python3-py-tlsh-doc
+%description help
+# TLSH - C++ extension for Python
+
+[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library.
+Given a byte stream with a minimum length of 50 bytes
+TLSH generates a hash value which can be used for similarity comparisons.
+Similar objects will have similar hash values which allows for
+the detection of similar objects by comparing their hash values. Note that
+the byte stream should have a sufficient amount of complexity. For example,
+a byte stream of identical bytes will not generate a hash value.
+
+## What's new in py-tlsh 4.7.2
+This Python module supersedes the python-tlsh package on PyPI.
+The improvement in 4.7.2 is that additional functions were added to the Python module:
+* lvalue
+* q1ratio
+* q2ratio
+* checksum
+* bucket_value
+* is_valid
+
+The improvements in 4.5.0 were:
+* fixed this package so that it works on Windows
+* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
+* fixed the q3=0 divide by zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)
+
+## Usage
+
+```python
+import tlsh
+
+tlsh.hash(data)
+```
+
+Note that data must be bytes, not a string.
+This is because TLSH is for binary data and binary data can contain a NULL (zero) byte.
+
+In default mode, the data must contain at least 50 bytes to generate a hash value, and
+it must have a certain amount of randomness.
+To get the hash value of a file, try
+
+```python
+tlsh.hash(open(file, 'rb').read())
+```
+
+Note: the open statement has opened the file in binary mode.
+
+## Example
+```python
+import tlsh
+
+h1 = tlsh.hash(data)
+h2 = tlsh.hash(similar_data)
+score = tlsh.diff(h1, h2)
+
+h3 = tlsh.Tlsh()
+with open('file', 'rb') as f:
+    for buf in iter(lambda: f.read(512), b''):
+        h3.update(buf)
+    h3.final()
+# this assertion is stating that the distance between a TLSH and itself must be zero
+assert h3.diff(h3) == 0
+score = h3.diff(h1)
+```
+
+## Extra Options
+
+The `diffxlen` function removes the file length component of the tlsh header from the comparison.
+
+```python
+tlsh.diffxlen(h1, h2)
+```
+
+If a file with a repeating pattern is compared to a file with only a single instance of the pattern,
+then the difference will be increased if the file length is included.
+But by using the `diffxlen` function, the file length will be removed from consideration.
+
+## Backwards Compatibility Options
+
+If you use the "conservative" option, then the data must contain at least 256 bytes.
+For example,
+
+```python
+import os
+tlsh.conservativehash(os.urandom(256))
+```
+
+should generate a hash, but
+
+```python
+tlsh.conservativehash(os.urandom(100))
+```
+
+will generate TNULL because the input is shorter than 256 bytes.
+
+If you need to generate old-style hashes (without the "T1" prefix), then use
+
+```python
+tlsh.oldhash(os.urandom(100))
+```
+
+
+The old and conservative options may be combined:
+
+```python
+tlsh.oldconservativehash(os.urandom(500))
+```
+
+%prep
+%autosetup -n py-tlsh-4.7.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-py-tlsh -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed Apr 12 2023 Python_Bot <Python_Bot@openeuler.org> - 4.7.2-1
+- Package Spec generated
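
Note on the `%description` text in the spec above: it names three input conditions for hashing. The data must be bytes, it must be at least 50 bytes long in default mode (256 with the "conservative" option), and it must not be a run of identical bytes. A minimal sketch of a pre-check for those conditions follows. The helper `tlsh_input_problem` and its constants are hypothetical and not part of py-tlsh, and the library's actual complexity test is not reproduced here, only the identical-bytes case the description mentions.

```python
from typing import Optional

MIN_LEN_DEFAULT = 50        # minimum length stated for default mode
MIN_LEN_CONSERVATIVE = 256  # minimum length stated for the "conservative" option


def tlsh_input_problem(data: object, conservative: bool = False) -> Optional[str]:
    """Return a reason the input is unlikely to hash, or None if none is found.

    Hypothetical helper, not part of py-tlsh. It checks only the conditions
    named in the package description.
    """
    if not isinstance(data, bytes):
        return "data must be bytes, not " + type(data).__name__
    minimum = MIN_LEN_CONSERVATIVE if conservative else MIN_LEN_DEFAULT
    if len(data) < minimum:
        return f"need at least {minimum} bytes, got {len(data)}"
    if len(set(data)) == 1:
        # The description's own example: a stream of identical bytes.
        return "all bytes are identical, so the stream has no complexity"
    return None
```

Calling this before `tlsh.hash(data)` would give a readable reason in the cases where, per the description, no hash value is generated.
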
diff --git a/sources b/sources
new file mode 100644
index 0000000..9f2fa76
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+a293ed098b90bbf2cf5e7e31b7d3c267 py-tlsh-4.7.2.tar.gz
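
The `sources` entry above pairs a digest with the tarball name. Assuming the 32-hex-digit value is an MD5 digest (the file itself does not say), a downloaded tarball could be checked along these lines. `parse_sources_line` and `md5_matches` are illustrative names, not part of any packaging tool.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def parse_sources_line(line: str) -> Tuple[str, str]:
    """Split a '<digest> <filename>' line into (digest, filename)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_matches(path: Path, expected_hex: str, chunk_size: int = 65536) -> bool:
    """Hash the file in chunks and compare with the expected hex digest.

    Assumes the digest is MD5; swap in another hashlib constructor if not.
    """
    md5 = hashlib.md5()
    with path.open("rb") as f:
        # Same chunked-read idiom as the README example in the spec.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest() == expected_hex.lower()
```

Reading in chunks keeps memory use flat for large tarballs, which is the same reason the README example feeds `Tlsh.update` in 512-byte pieces.
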