%global _empty_manifest_terminate_build 0
Name: python-py-tlsh
Version: 4.7.2
Release: 1
Summary: TLSH (C++ Python extension)
License: Apache or BSD
URL: https://github.com/trendmicro/tlsh
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/ba/5b/4d860cffd3e6e7afb277e159d97e11583fc3b611d22355799364dff325f1/py-tlsh-4.7.2.tar.gz
BuildArch: noarch

%description
# TLSH - C++ extension for Python

[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library. Given a byte stream with a minimum length of 50 bytes, TLSH generates a hash value that can be used for similarity comparisons. Similar objects have similar hash values, so similar objects can be detected by comparing their hash values. Note that the byte stream should have a sufficient amount of complexity. For example, a byte stream of identical bytes will not generate a hash value.

## What's new in py-tlsh 4.7.2

This Python module supersedes the python-tlsh package on PyPI.

Version 4.7.2 adds the following functions to the Python module:
* lvalue
* q1ratio
* q2ratio
* checksum
* bucket_value
* is_valid

The improvements in 4.5.0 were:
* fixed this package so that it works on Windows
* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
* fixed the q3=0 divide-by-zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)

## Usage

```python
import tlsh

tlsh.hash(data)
```

Note that `data` must be bytes, not a string. This is because TLSH is for binary data, and binary data can contain a NULL (zero) byte.

In default mode, the data must contain at least 50 bytes and have a certain amount of randomness to generate a hash value.

To get the hash value of a file, try:

```python
tlsh.hash(open(file, 'rb').read())
```

Note that the `open` call opens the file in binary mode.
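
Because `tlsh.hash` expects bytes, text and file contents have to be converted before hashing. The following is a minimal sketch of two helpers for that step. The names `as_bytes` and `read_file_bytes` are hypothetical and not part of py-tlsh, and UTF-8 is an assumed encoding.

```python
from pathlib import Path
from typing import Union


def as_bytes(data: Union[str, bytes, bytearray], encoding: str = "utf-8") -> bytes:
    """Return data as bytes, encoding str input (UTF-8 is an assumption)."""
    if isinstance(data, str):
        return data.encode(encoding)
    return bytes(data)


def read_file_bytes(path: Union[str, Path]) -> bytes:
    """Read a whole file in binary mode and close it afterwards."""
    with open(path, "rb") as f:
        return f.read()
```

The result of either helper can then be passed to `tlsh.hash`, for example `tlsh.hash(read_file_bytes(path))`.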
## Example

```python
import tlsh

# data and similar_data are bytes objects
h1 = tlsh.hash(data)
h2 = tlsh.hash(similar_data)
score = tlsh.diff(h1, h2)

h3 = tlsh.Tlsh()
with open('file', 'rb') as f:
    for buf in iter(lambda: f.read(512), b''):
        h3.update(buf)
    h3.final()
# this assertion is stating that the distance between a TLSH and itself must be zero
assert h3.diff(h3) == 0
score = h3.diff(h1)
```

## Extra Options

The `diffxlen` function removes the file length component of the TLSH header from the comparison.

```python
tlsh.diffxlen(h1, h2)
```

If a file with a repeating pattern is compared to a file with only a single instance of the pattern, the difference is larger when the file length is included. Using the `diffxlen` function removes the file length from consideration.

## Backwards Compatibility Options

If you use the "conservative" option, then the data must contain at least 256 bytes. For example,

```python
import os

tlsh.conservativehash(os.urandom(256))
```

should generate a hash, but

```python
tlsh.conservativehash(os.urandom(100))
```

will generate TNULL because the input is shorter than 256 bytes.

If you need to generate old-style hashes (without the "T1" prefix), then use

```python
tlsh.oldhash(os.urandom(100))
```

The old and conservative options may be combined:

```python
tlsh.oldconservativehash(os.urandom(500))
```

%package -n python3-py-tlsh
Summary: TLSH (C++ Python extension)
Provides: python-py-tlsh
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip

%description -n python3-py-tlsh
# TLSH - C++ extension for Python

[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library. Given a byte stream with a minimum length of 50 bytes, TLSH generates a hash value that can be used for similarity comparisons. Similar objects have similar hash values, so similar objects can be detected by comparing their hash values.
Note that the byte stream should have a sufficient amount of complexity. For example, a byte stream of identical bytes will not generate a hash value.

## What's new in py-tlsh 4.7.2

This Python module supersedes the python-tlsh package on PyPI.

Version 4.7.2 adds the following functions to the Python module:
* lvalue
* q1ratio
* q2ratio
* checksum
* bucket_value
* is_valid

The improvements in 4.5.0 were:
* fixed this package so that it works on Windows
* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
* fixed the q3=0 divide-by-zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)

## Usage

```python
import tlsh

tlsh.hash(data)
```

Note that `data` must be bytes, not a string. This is because TLSH is for binary data, and binary data can contain a NULL (zero) byte.

In default mode, the data must contain at least 50 bytes and have a certain amount of randomness to generate a hash value.

To get the hash value of a file, try:

```python
tlsh.hash(open(file, 'rb').read())
```

Note that the `open` call opens the file in binary mode.

## Example

```python
import tlsh

# data and similar_data are bytes objects
h1 = tlsh.hash(data)
h2 = tlsh.hash(similar_data)
score = tlsh.diff(h1, h2)

h3 = tlsh.Tlsh()
with open('file', 'rb') as f:
    for buf in iter(lambda: f.read(512), b''):
        h3.update(buf)
    h3.final()
# this assertion is stating that the distance between a TLSH and itself must be zero
assert h3.diff(h3) == 0
score = h3.diff(h1)
```

## Extra Options

The `diffxlen` function removes the file length component of the TLSH header from the comparison.

```python
tlsh.diffxlen(h1, h2)
```

If a file with a repeating pattern is compared to a file with only a single instance of the pattern, the difference is larger when the file length is included. Using the `diffxlen` function removes the file length from consideration.
## Backwards Compatibility Options

If you use the "conservative" option, then the data must contain at least 256 bytes. For example,

```python
import os

tlsh.conservativehash(os.urandom(256))
```

should generate a hash, but

```python
tlsh.conservativehash(os.urandom(100))
```

will generate TNULL because the input is shorter than 256 bytes.

If you need to generate old-style hashes (without the "T1" prefix), then use

```python
tlsh.oldhash(os.urandom(100))
```

The old and conservative options may be combined:

```python
tlsh.oldconservativehash(os.urandom(500))
```

%package help
Summary: Development documents and examples for py-tlsh
Provides: python3-py-tlsh-doc

%description help
# TLSH - C++ extension for Python

[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library. Given a byte stream with a minimum length of 50 bytes, TLSH generates a hash value that can be used for similarity comparisons. Similar objects have similar hash values, so similar objects can be detected by comparing their hash values. Note that the byte stream should have a sufficient amount of complexity. For example, a byte stream of identical bytes will not generate a hash value.

## What's new in py-tlsh 4.7.2

This Python module supersedes the python-tlsh package on PyPI.

Version 4.7.2 adds the following functions to the Python module:
* lvalue
* q1ratio
* q2ratio
* checksum
* bucket_value
* is_valid

The improvements in 4.5.0 were:
* fixed this package so that it works on Windows
* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
* fixed the q3=0 divide-by-zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)

## Usage

```python
import tlsh

tlsh.hash(data)
```

Note that `data` must be bytes, not a string. This is because TLSH is for binary data, and binary data can contain a NULL (zero) byte.
In default mode, the data must contain at least 50 bytes and have a certain amount of randomness to generate a hash value.

To get the hash value of a file, try:

```python
tlsh.hash(open(file, 'rb').read())
```

Note that the `open` call opens the file in binary mode.

## Example

```python
import tlsh

# data and similar_data are bytes objects
h1 = tlsh.hash(data)
h2 = tlsh.hash(similar_data)
score = tlsh.diff(h1, h2)

h3 = tlsh.Tlsh()
with open('file', 'rb') as f:
    for buf in iter(lambda: f.read(512), b''):
        h3.update(buf)
    h3.final()
# this assertion is stating that the distance between a TLSH and itself must be zero
assert h3.diff(h3) == 0
score = h3.diff(h1)
```

## Extra Options

The `diffxlen` function removes the file length component of the TLSH header from the comparison.

```python
tlsh.diffxlen(h1, h2)
```

If a file with a repeating pattern is compared to a file with only a single instance of the pattern, the difference is larger when the file length is included. Using the `diffxlen` function removes the file length from consideration.

## Backwards Compatibility Options

If you use the "conservative" option, then the data must contain at least 256 bytes. For example,

```python
import os

tlsh.conservativehash(os.urandom(256))
```

should generate a hash, but

```python
tlsh.conservativehash(os.urandom(100))
```

will generate TNULL because the input is shorter than 256 bytes.
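
The length thresholds described above (50 bytes in default mode, 256 bytes in conservative mode) can be checked before hashing. This is a minimal sketch; `meets_minimum_length` is a hypothetical helper that is not part of py-tlsh, and it checks length only, not the randomness requirement.

```python
# Minimum input lengths taken from the text above.
DEFAULT_MIN_BYTES = 50
CONSERVATIVE_MIN_BYTES = 256


def meets_minimum_length(data: bytes, conservative: bool = False) -> bool:
    """Return True if data is long enough for the chosen mode (length check only)."""
    minimum = CONSERVATIVE_MIN_BYTES if conservative else DEFAULT_MIN_BYTES
    return len(data) >= minimum
```

A 100-byte input passes the default check but not the conservative one, which matches the TNULL case described above.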
If you need to generate old-style hashes (without the "T1" prefix), then use

```python
tlsh.oldhash(os.urandom(100))
```

The old and conservative options may be combined:

```python
tlsh.oldconservativehash(os.urandom(500))
```

%prep
%autosetup -n py-tlsh-4.7.2

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-py-tlsh -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed Apr 12 2023 Python_Bot - 4.7.2-1
- Package Spec generated