| field | value | date |
|---|---|---|
| author | CoprDistGit <infra@openeuler.org> | 2023-04-12 01:10:19 +0000 |
| committer | CoprDistGit <infra@openeuler.org> | 2023-04-12 01:10:19 +0000 |
| commit | a3aae790d0b9d9aa30680d507a59d3ab97474434 | |
| tree | e723a5d017aa5f263127237f952531614ba7e122 /python-py-tlsh.spec | |
| parent | e7bb64762a75794fe58ba1e3034015a256e03a73 | |
automatic import of python-py-tlsh
Diffstat (limited to 'python-py-tlsh.spec')
| -rw-r--r-- | python-py-tlsh.spec | 387 |
1 files changed, 387 insertions, 0 deletions
```diff
diff --git a/python-py-tlsh.spec b/python-py-tlsh.spec
new file mode 100644
index 0000000..30d189d
--- /dev/null
+++ b/python-py-tlsh.spec
@@ -0,0 +1,387 @@
+%global _empty_manifest_terminate_build 0
+Name: python-py-tlsh
+Version: 4.7.2
+Release: 1
+Summary: TLSH (C++ Python extension)
+License: Apache or BSD
+URL: https://github.com/trendmicro/tlsh
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/ba/5b/4d860cffd3e6e7afb277e159d97e11583fc3b611d22355799364dff325f1/py-tlsh-4.7.2.tar.gz
+BuildArch: noarch
+
+
+%description
+# TLSH - C++ extension for Python
+
+[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library.
+Given a byte stream with a minimum length of 50 bytes
+TLSH generates a hash value which can be used for similarity comparisons.
+Similar objects will have similar hash values which allows for
+the detection of similar objects by comparing their hash values. Note that
+the byte stream should have a sufficient amount of complexity. For example,
+a byte stream of identical bytes will not generate a hash value.
+
+## What's new in py-tlsh 4.7.2
+This Python module supercedes the python-tlsh package on PyPi.
+The improvements in 4.7.2, are that we added additional functions to Python
+* lvalue
+* q1ratio
+* q2ratio
+* checksum
+* bucket_value
+* is_valid
+
+The improvements 4.5.0 were:
+* fixed this package so that it works on Windows
+* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
+* fixed the q3=0 divide by zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)
+
+## Usage
+
+```python
+import tlsh
+
+tlsh.hash(data)
+```
+
+Note data needs to be bytes - not a string.
+This is because TLSH is for binary data and binary data can contain a NULL (zero) byte.
+
+In default mode the data must contain at least 50 bytes to generate a hash value and that
+it must have a certain amount of randomness.
+To get the hash value of a file, try
+
+```python
+tlsh.hash(open(file, 'rb').read())
+```
+
+Note: the open statement has opened the file in binary mode.
+
+## Example
+```python
+import tlsh
+
+h1 = tlsh.hash(data)
+h2 = tlsh.hash(similar_data)
+score = tlsh.diff(h1, h2)
+
+h3 = tlsh.Tlsh()
+with open('file', 'rb') as f:
+    for buf in iter(lambda: f.read(512), b''):
+        h3.update(buf)
+    h3.final()
+# this assertion is stating that the distance between a TLSH and itself must be zero
+assert h3.diff(h3) == 0
+score = h3.diff(h1)
+```
+
+## Extra Options
+
+The `diffxlen` function removes the file length component of the tlsh header from the comparison.
+
+```python
+tlsh.diffxlen(h1, h2)
+```
+
+If a file with a repeating pattern is compared to a file with only a single instance of the pattern,
+then the difference will be increased if the file lenght is included.
+But by using the `diffxlen` function, the file length will be removed from consideration.
+
+## Backwards Compatibility Options
+
+If you use the "conservative" option, then the data must contain at least 256 characters.
+For example,
+
+```python
+import os
+tlsh.conservativehash(os.urandom(256))
+```
+
+should generate a hash, but
+
+```python
+tlsh.conservativehash(os.urandom(100))
+```
+
+will generate TNULL as it is less than 256 bytes.
+
+If you need to generate old style hashes (without the "T1" prefix) then use
+
+```python
+tlsh.oldhash(os.urandom(100))
+```
+
+
+The old and conservative options may be combined:
+
+```python
+tlsh.oldconservativehash(os.urandom(500))
+```
+
+%package -n python3-py-tlsh
+Summary: TLSH (C++ Python extension)
+Provides: python-py-tlsh
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-py-tlsh
+# TLSH - C++ extension for Python
+
+[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library.
+Given a byte stream with a minimum length of 50 bytes
+TLSH generates a hash value which can be used for similarity comparisons.
+Similar objects will have similar hash values which allows for
+the detection of similar objects by comparing their hash values. Note that
+the byte stream should have a sufficient amount of complexity. For example,
+a byte stream of identical bytes will not generate a hash value.
+
+## What's new in py-tlsh 4.7.2
+This Python module supercedes the python-tlsh package on PyPi.
+The improvements in 4.7.2, are that we added additional functions to Python
+* lvalue
+* q1ratio
+* q2ratio
+* checksum
+* bucket_value
+* is_valid
+
+The improvements 4.5.0 were:
+* fixed this package so that it works on Windows
+* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
+* fixed the q3=0 divide by zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)
+
+## Usage
+
+```python
+import tlsh
+
+tlsh.hash(data)
+```
+
+Note data needs to be bytes - not a string.
+This is because TLSH is for binary data and binary data can contain a NULL (zero) byte.
+
+In default mode the data must contain at least 50 bytes to generate a hash value and that
+it must have a certain amount of randomness.
+To get the hash value of a file, try
+
+```python
+tlsh.hash(open(file, 'rb').read())
+```
+
+Note: the open statement has opened the file in binary mode.
+
+## Example
+```python
+import tlsh
+
+h1 = tlsh.hash(data)
+h2 = tlsh.hash(similar_data)
+score = tlsh.diff(h1, h2)
+
+h3 = tlsh.Tlsh()
+with open('file', 'rb') as f:
+    for buf in iter(lambda: f.read(512), b''):
+        h3.update(buf)
+    h3.final()
+# this assertion is stating that the distance between a TLSH and itself must be zero
+assert h3.diff(h3) == 0
+score = h3.diff(h1)
+```
+
+## Extra Options
+
+The `diffxlen` function removes the file length component of the tlsh header from the comparison.
+
+```python
+tlsh.diffxlen(h1, h2)
+```
+
+If a file with a repeating pattern is compared to a file with only a single instance of the pattern,
+then the difference will be increased if the file lenght is included.
+But by using the `diffxlen` function, the file length will be removed from consideration.
+
+## Backwards Compatibility Options
+
+If you use the "conservative" option, then the data must contain at least 256 characters.
+For example,
+
+```python
+import os
+tlsh.conservativehash(os.urandom(256))
+```
+
+should generate a hash, but
+
+```python
+tlsh.conservativehash(os.urandom(100))
+```
+
+will generate TNULL as it is less than 256 bytes.
+
+If you need to generate old style hashes (without the "T1" prefix) then use
+
+```python
+tlsh.oldhash(os.urandom(100))
+```
+
+
+The old and conservative options may be combined:
+
+```python
+tlsh.oldconservativehash(os.urandom(500))
+```
+
+%package help
+Summary: Development documents and examples for py-tlsh
+Provides: python3-py-tlsh-doc
+%description help
+# TLSH - C++ extension for Python
+
+[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library.
+Given a byte stream with a minimum length of 50 bytes
+TLSH generates a hash value which can be used for similarity comparisons.
+Similar objects will have similar hash values which allows for
+the detection of similar objects by comparing their hash values. Note that
+the byte stream should have a sufficient amount of complexity. For example,
+a byte stream of identical bytes will not generate a hash value.
+
+## What's new in py-tlsh 4.7.2
+This Python module supercedes the python-tlsh package on PyPi.
+The improvements in 4.7.2, are that we added additional functions to Python
+* lvalue
+* q1ratio
+* q2ratio
+* checksum
+* bucket_value
+* is_valid
+
+The improvements 4.5.0 were:
+* fixed this package so that it works on Windows
+* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
+* fixed the q3=0 divide by zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)
+
+## Usage
+
+```python
+import tlsh
+
+tlsh.hash(data)
+```
+
+Note data needs to be bytes - not a string.
+This is because TLSH is for binary data and binary data can contain a NULL (zero) byte.
+
+In default mode the data must contain at least 50 bytes to generate a hash value and that
+it must have a certain amount of randomness.
+To get the hash value of a file, try
+
+```python
+tlsh.hash(open(file, 'rb').read())
+```
+
+Note: the open statement has opened the file in binary mode.
+
+## Example
+```python
+import tlsh
+
+h1 = tlsh.hash(data)
+h2 = tlsh.hash(similar_data)
+score = tlsh.diff(h1, h2)
+
+h3 = tlsh.Tlsh()
+with open('file', 'rb') as f:
+    for buf in iter(lambda: f.read(512), b''):
+        h3.update(buf)
+    h3.final()
+# this assertion is stating that the distance between a TLSH and itself must be zero
+assert h3.diff(h3) == 0
+score = h3.diff(h1)
+```
+
+## Extra Options
+
+The `diffxlen` function removes the file length component of the tlsh header from the comparison.
+
+```python
+tlsh.diffxlen(h1, h2)
+```
+
+If a file with a repeating pattern is compared to a file with only a single instance of the pattern,
+then the difference will be increased if the file lenght is included.
+But by using the `diffxlen` function, the file length will be removed from consideration.
+
+## Backwards Compatibility Options
+
+If you use the "conservative" option, then the data must contain at least 256 characters.
+For example,
+
+```python
+import os
+tlsh.conservativehash(os.urandom(256))
+```
+
+should generate a hash, but
+
+```python
+tlsh.conservativehash(os.urandom(100))
+```
+
+will generate TNULL as it is less than 256 bytes.
+
+If you need to generate old style hashes (without the "T1" prefix) then use
+
+```python
+tlsh.oldhash(os.urandom(100))
+```
+
+
+The old and conservative options may be combined:
+
+```python
+tlsh.oldconservativehash(os.urandom(500))
+```
+
+%prep
+%autosetup -n py-tlsh-4.7.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-py-tlsh -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed Apr 12 2023 Python_Bot <Python_Bot@openeuler.org> - 4.7.2-1
+- Package Spec generated
```
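The package description in this spec states three input constraints for TLSH: the data must be bytes, it must be at least 50 bytes long (256 with the "conservative" option), and a stream of identical bytes produces no hash. A minimal sketch of a pre-flight check built on those statements follows. The helper names `looks_hashable` and `hash_or_none` are hypothetical and not part of py-tlsh; the distinct-byte test is only a necessary condition, not the library's actual complexity rule; and treating `"TNULL"` as the failure value for `tlsh.hash` is an assumption extended from what the description says about `conservativehash`.

```python
DEFAULT_MIN_LEN = 50        # "minimum length of 50 bytes" in default mode
CONSERVATIVE_MIN_LEN = 256  # minimum quoted for the "conservative" option


def looks_hashable(data, conservative=False):
    """Cheap checks derived from the description; passing them does not
    guarantee that TLSH will produce a hash."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("TLSH input must be bytes, not str")
    min_len = CONSERVATIVE_MIN_LEN if conservative else DEFAULT_MIN_LEN
    if len(data) < min_len:
        return False
    # A stream of identical bytes will not generate a hash value.
    return len(set(data)) > 1


def hash_or_none(data, conservative=False):
    """Return a TLSH digest string, or None when no hash is available."""
    if not looks_hashable(data, conservative):
        return None
    import tlsh  # imported lazily; needs the py-tlsh extension installed

    hash_fn = tlsh.conservativehash if conservative else tlsh.hash
    digest = hash_fn(bytes(data))
    # Assumption: an empty string or "TNULL" signals that no hash was made.
    if not digest or digest == "TNULL":
        return None
    return digest
```

Inputs that fail the cheap checks return `None` without importing `tlsh`, so the guard can be exercised on a machine where the extension is not installed.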

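The `%install` scriptlet in this spec builds `filelist.lst` by running `find DIR -type f -printf "/%h/%f\n"` inside the build root for each of `usr/lib`, `usr/lib64`, `usr/bin` and `usr/sbin`. The following is a rough Python equivalent, offered only to illustrate what ends up in that list. The function name `list_installed_files` and its parameters are hypothetical, and the output is sorted, whereas `find` emits entries in directory order.

```python
import os


def list_installed_files(buildroot,
                         subdirs=("usr/lib", "usr/lib64", "usr/bin", "usr/sbin"),
                         suffix=""):
    """Collect regular files under the given subdirectories of buildroot
    as absolute, root-relative paths such as "/usr/bin/tool"."""
    entries = []
    for sub in subdirs:
        top = os.path.join(buildroot, sub)
        if not os.path.isdir(top):
            continue  # mirrors the `if [ -d ... ]` guard in the scriptlet
        for dirpath, _dirnames, filenames in os.walk(top):
            rel_dir = os.path.relpath(dirpath, buildroot).replace(os.sep, "/")
            for name in filenames:
                full = os.path.join(dirpath, name)
                # `find -type f` matches regular files only, not symlinks.
                if os.path.islink(full) or not os.path.isfile(full):
                    continue
                entries.append("/" + rel_dir + "/" + name + suffix)
    return sorted(entries)
```

Calling it with `subdirs=("usr/share/man",)` and `suffix=".gz"` would approximate the `doclist.lst` step, which appends `.gz` to each man page path.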