%global _empty_manifest_terminate_build 0
Name:           python-smaz-py3
Version:        1.1.2
Release:        1
Summary:        Small string compression using smaz, supports Python 3.
License:        BSD
URL:            https://github.com/originell/smaz-py3
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/01/e5/c7672eeb7e969d1ffa05bbdacab25c877b27eeddfa5e7e45b757b39fce5d/smaz-py3-1.1.2.tar.gz

%description
# smaz-py3

Small string compression using the [_smaz_](https://github.com/antirez/smaz) compression algorithm.

This library wraps the original C code, so it should be quite fast. It also has a test suite that uses [hypothesis](https://hypothesis.readthedocs.io/en/latest/)-based property testing, a fancy way of saying that the tests are run with randomly generated strings covering most of Unicode, to better guard against edge cases.

## Why do I need this?

You are working with tons of short strings (text messages, URLs, ...) and want to save space.
According to the original code and notes, it achieves the best compression with English strings (up to 50%) that do not contain a ton of numbers. However, other languages might work as well (allegedly still up to 30%).

Note that in certain cases compression can actually increase the size. Keep that in mind and run some tests on your own data first. Measuring size is explained in the example below as well.

## How do I use this?

Let's install:

```sh
$ pip install smaz-py3
```

_Note_: the `-py3` is important. There is an original release, kudos to Benjamin Sergeant, but it does not work with Python 3+.

Now, a usage example.

```python
import smaz

# First we compress our example sentence.
compressed = smaz.compress("The quick brown fox jumps over the lazy dog.")
# The output is raw bytes, as can be seen in the decompress() call below.

# Now, we decompress these raw bytes again. This should return our example sentence.
decompressed = smaz.decompress(b'H\x00\xfeq&\x83\xfek^sA)\xdc\xfa\x00\xfej&-<\x95\xe7\r\x0b\x89\xdbG\x18\x06;n')
# This does not fail, which means we have successfully compressed and decompressed
# without damaging anything.
assert decompressed == "The quick brown fox jumps over the lazy dog."
```

How much did we compress?

```python
# First, we get the actual byte size of our example string.
original_size = len("The quick brown fox jumps over the lazy dog.".encode("utf-8"))  # 44 bytes
# As `compressed` is already raw bytes, we can call len() on it directly.
compressed_size = len(compressed)  # 31 bytes
compression_ratio = 1 - compressed_size / original_size  # 0.295
```

So we saved about 30% (0.295 \* 100 and some rounding 😉). If the compression ratio were below 0, compression would actually have increased the size of the string. Yes, this can happen. Again, smaz works best on _small_ strings.

### A small note about NULL bytes

Currently, `smaz-py3` does not support compressing strings that contain NULL bytes (`\x00`):

```python
>>> import smaz
>>> smaz.compress("The quick brown fox\x00 jumps over the lazy dog.")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: embedded null character
```

My reasoning is that in most scenarios you want to strip those out beforehand anyway. If you think this is wrong, please open an [issue on GitHub](https://github.com/originell/smaz-py3). I am happy about further input!

## Migrating from Python 2 `smaz`

If you have been using the [Python 2 `smaz` library](https://pypi.org/project/smaz/), this Python 3 version exposes the same API, so it is a drop-in replacement.

**Important**: While developing this extension, I think I found a bug in the original library. Using Python 2.7.16:

```python
>>> import smaz
>>> smaz.compress("The quick brown fox jumps over the lazy dog.")
'H'  # this is wrong.
>>> small = smaz.compress("The quick brown fox jumps over the lazy dog.")
>>> smaz.decompress(small)
'The'  # information lost.
```

So, if you are upgrading from it, please make sure you are not affected. `smaz-py3` is not prone to this bug.

Behind the scenes, smaz's compressed output can contain NULL bytes. However, when converting from C back to a Python string object, NULL marks the end of the string. The above sentence, compressed, has a NULL byte right after the `H` (`H\x00\xfeq…`). That's why the output stops right there. Again, `smaz-py3` is not affected by this; I mostly noticed it because I got lucky in choosing this example sentence.

## Credits

Credit where credit is due: first to [antirez's SMAZ compression](https://github.com/antirez/smaz) and to the [original Python 2 wrapper](https://pypi.org/project/smaz/) by Benjamin Sergeant.

%package -n python3-smaz-py3
Summary:        Small string compression using smaz, supports Python 3.
Provides:       python-smaz-py3
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip
BuildRequires:  python3-cffi
BuildRequires:  gcc
BuildRequires:  gdb

%description -n python3-smaz-py3
# smaz-py3

Small string compression using the [_smaz_](https://github.com/antirez/smaz) compression algorithm.

This library wraps the original C code, so it should be quite fast. It also has a test suite that uses [hypothesis](https://hypothesis.readthedocs.io/en/latest/)-based property testing, a fancy way of saying that the tests are run with randomly generated strings covering most of Unicode, to better guard against edge cases.

## Why do I need this?

You are working with tons of short strings (text messages, URLs, ...) and want to save space.
According to the original code and notes, it achieves the best compression with English strings (up to 50%) that do not contain a ton of numbers. However, other languages might work as well (allegedly still up to 30%).

Note that in certain cases compression can actually increase the size. Keep that in mind and run some tests on your own data first.
Measuring size is explained in the example below as well.

## How do I use this?

Let's install:

```sh
$ pip install smaz-py3
```

_Note_: the `-py3` is important. There is an original release, kudos to Benjamin Sergeant, but it does not work with Python 3+.

Now, a usage example.

```python
import smaz

# First we compress our example sentence.
compressed = smaz.compress("The quick brown fox jumps over the lazy dog.")
# The output is raw bytes, as can be seen in the decompress() call below.

# Now, we decompress these raw bytes again. This should return our example sentence.
decompressed = smaz.decompress(b'H\x00\xfeq&\x83\xfek^sA)\xdc\xfa\x00\xfej&-<\x95\xe7\r\x0b\x89\xdbG\x18\x06;n')
# This does not fail, which means we have successfully compressed and decompressed
# without damaging anything.
assert decompressed == "The quick brown fox jumps over the lazy dog."
```

How much did we compress?

```python
# First, we get the actual byte size of our example string.
original_size = len("The quick brown fox jumps over the lazy dog.".encode("utf-8"))  # 44 bytes
# As `compressed` is already raw bytes, we can call len() on it directly.
compressed_size = len(compressed)  # 31 bytes
compression_ratio = 1 - compressed_size / original_size  # 0.295
```

So we saved about 30% (0.295 \* 100 and some rounding 😉). If the compression ratio were below 0, compression would actually have increased the size of the string. Yes, this can happen. Again, smaz works best on _small_ strings.

### A small note about NULL bytes

Currently, `smaz-py3` does not support compressing strings that contain NULL bytes (`\x00`):

```python
>>> import smaz
>>> smaz.compress("The quick brown fox\x00 jumps over the lazy dog.")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: embedded null character
```

My reasoning is that in most scenarios you want to strip those out beforehand anyway. If you think this is wrong, please open an [issue on GitHub](https://github.com/originell/smaz-py3).
I am happy about further input!

## Migrating from Python 2 `smaz`

If you have been using the [Python 2 `smaz` library](https://pypi.org/project/smaz/), this Python 3 version exposes the same API, so it is a drop-in replacement.

**Important**: While developing this extension, I think I found a bug in the original library. Using Python 2.7.16:

```python
>>> import smaz
>>> smaz.compress("The quick brown fox jumps over the lazy dog.")
'H'  # this is wrong.
>>> small = smaz.compress("The quick brown fox jumps over the lazy dog.")
>>> smaz.decompress(small)
'The'  # information lost.
```

So, if you are upgrading from it, please make sure you are not affected. `smaz-py3` is not prone to this bug.

Behind the scenes, smaz's compressed output can contain NULL bytes. However, when converting from C back to a Python string object, NULL marks the end of the string. The above sentence, compressed, has a NULL byte right after the `H` (`H\x00\xfeq…`). That's why the output stops right there. Again, `smaz-py3` is not affected by this; I mostly noticed it because I got lucky in choosing this example sentence.

## Credits

Credit where credit is due: first to [antirez's SMAZ compression](https://github.com/antirez/smaz) and to the [original Python 2 wrapper](https://pypi.org/project/smaz/) by Benjamin Sergeant.

%package help
Summary:        Development documents and examples for smaz-py3
Provides:       python3-smaz-py3-doc

%description help
# smaz-py3

Small string compression using the [_smaz_](https://github.com/antirez/smaz) compression algorithm.

This library wraps the original C code, so it should be quite fast. It also has a test suite that uses [hypothesis](https://hypothesis.readthedocs.io/en/latest/)-based property testing, a fancy way of saying that the tests are run with randomly generated strings covering most of Unicode, to better guard against edge cases.

## Why do I need this?

You are working with tons of short strings (text messages, URLs, ...) and want to save space.
According to the original code and notes, it achieves the best compression with English strings (up to 50%) that do not contain a ton of numbers. However, other languages might work as well (allegedly still up to 30%).

Note that in certain cases compression can actually increase the size. Keep that in mind and run some tests on your own data first. Measuring size is explained in the example below as well.

## How do I use this?

Let's install:

```sh
$ pip install smaz-py3
```

_Note_: the `-py3` is important. There is an original release, kudos to Benjamin Sergeant, but it does not work with Python 3+.

Now, a usage example.

```python
import smaz

# First we compress our example sentence.
compressed = smaz.compress("The quick brown fox jumps over the lazy dog.")
# The output is raw bytes, as can be seen in the decompress() call below.

# Now, we decompress these raw bytes again. This should return our example sentence.
decompressed = smaz.decompress(b'H\x00\xfeq&\x83\xfek^sA)\xdc\xfa\x00\xfej&-<\x95\xe7\r\x0b\x89\xdbG\x18\x06;n')
# This does not fail, which means we have successfully compressed and decompressed
# without damaging anything.
assert decompressed == "The quick brown fox jumps over the lazy dog."
```

How much did we compress?

```python
# First, we get the actual byte size of our example string.
original_size = len("The quick brown fox jumps over the lazy dog.".encode("utf-8"))  # 44 bytes
# As `compressed` is already raw bytes, we can call len() on it directly.
compressed_size = len(compressed)  # 31 bytes
compression_ratio = 1 - compressed_size / original_size  # 0.295
```

So we saved about 30% (0.295 \* 100 and some rounding 😉). If the compression ratio were below 0, compression would actually have increased the size of the string. Yes, this can happen. Again, smaz works best on _small_ strings.
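
Since the compressed result can end up larger than the input, one option is to keep the smaz output only when it actually saves space. The snippet below is a minimal sketch of that idea and is not part of the `smaz-py3` API: the helpers `pack()` and `unpack()` and the one-byte `RAW`/`COMPRESSED` flag format are hypothetical. The compress and decompress functions are passed in as arguments, so the sketch runs without `smaz` installed.

```python
from typing import Callable

# Hypothetical one-byte flags (not part of smaz-py3) telling unpack() how the
# payload that follows was stored.
RAW = b"\x00"
COMPRESSED = b"\x01"


def pack(text: str, compress: Callable[[str], bytes]) -> bytes:
    """Prefix a flag byte and keep the compressed form only if it is smaller."""
    raw = text.encode("utf-8")
    try:
        compressed = compress(text)
    except ValueError:
        # smaz-py3 raises ValueError for strings with NULL bytes; store them raw.
        return RAW + raw
    if len(compressed) < len(raw):
        return COMPRESSED + compressed
    # Compression did not shrink the data, so store the plain UTF-8 bytes.
    return RAW + raw


def unpack(blob: bytes, decompress: Callable[[bytes], str]) -> str:
    """Inverse of pack(): look at the flag byte and decode the payload."""
    flag, payload = blob[:1], blob[1:]
    if flag == COMPRESSED:
        return decompress(payload)
    if flag == RAW:
        return payload.decode("utf-8")
    raise ValueError("unknown flag byte")
```

With `smaz-py3` installed, this would be called as `pack(text, smaz.compress)` and `unpack(blob, smaz.decompress)`. The flag byte costs one extra byte, so in the worst case the stored value is one byte larger than the plain UTF-8 encoding.
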
### A small note about NULL bytes

Currently, `smaz-py3` does not support compressing strings that contain NULL bytes (`\x00`):

```python
>>> import smaz
>>> smaz.compress("The quick brown fox\x00 jumps over the lazy dog.")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: embedded null character
```

My reasoning is that in most scenarios you want to strip those out beforehand anyway. If you think this is wrong, please open an [issue on GitHub](https://github.com/originell/smaz-py3). I am happy about further input!

## Migrating from Python 2 `smaz`

If you have been using the [Python 2 `smaz` library](https://pypi.org/project/smaz/), this Python 3 version exposes the same API, so it is a drop-in replacement.

**Important**: While developing this extension, I think I found a bug in the original library. Using Python 2.7.16:

```python
>>> import smaz
>>> smaz.compress("The quick brown fox jumps over the lazy dog.")
'H'  # this is wrong.
>>> small = smaz.compress("The quick brown fox jumps over the lazy dog.")
>>> smaz.decompress(small)
'The'  # information lost.
```

So, if you are upgrading from it, please make sure you are not affected. `smaz-py3` is not prone to this bug.

Behind the scenes, smaz's compressed output can contain NULL bytes. However, when converting from C back to a Python string object, NULL marks the end of the string. The above sentence, compressed, has a NULL byte right after the `H` (`H\x00\xfeq…`). That's why the output stops right there. Again, `smaz-py3` is not affected by this; I mostly noticed it because I got lucky in choosing this example sentence.

## Credits

Credit where credit is due: first to [antirez's SMAZ compression](https://github.com/antirez/smaz) and to the [original Python 2 wrapper](https://pypi.org/project/smaz/) by Benjamin Sergeant.
%prep
%autosetup -n smaz-py3-1.1.2

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-smaz-py3 -f filelist.lst
%dir %{python3_sitearch}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue May 30 2023 Python_Bot - 1.1.2-1
- Package Spec generated