| author    | CoprDistGit <infra@openeuler.org>                | 2023-04-10 16:31:05 +0000 |
|-----------|--------------------------------------------------|---------------------------|
| committer | CoprDistGit <infra@openeuler.org>                | 2023-04-10 16:31:05 +0000 |
| commit    | 90925f8cd6a9bc03a13e889cd9cd999462526e3b (patch)  |                           |
| tree      | fdf5c21403046dbb325cfcdc19215b867fa5804f          |                           |
| parent    | c4d2b1888921de04ca8adb6d6d26c0028cee4166 (diff)   |                           |
automatic import of python-tdigest
| -rw-r--r-- | .gitignore          |   1 |
| -rw-r--r-- | python-tdigest.spec | 417 |
| -rw-r--r-- | sources             |   1 |
3 files changed, 419 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+/tdigest-0.5.2.2.tar.gz
diff --git a/python-tdigest.spec b/python-tdigest.spec
new file mode 100644
index 0000000..17eb80f
--- /dev/null
+++ b/python-tdigest.spec
@@ -0,0 +1,417 @@
+%global _empty_manifest_terminate_build 0
+Name: python-tdigest
+Version: 0.5.2.2
+Release: 1
+Summary: T-Digest data structure
+License: MIT
+URL: https://github.com/CamDavidsonPilon/tdigest
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/dd/34/7e2f78d1ed0af7d0039ab2cff45b6bf8512234b9f178bb21713084a1f2f0/tdigest-0.5.2.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-accumulation-tree
+Requires: python3-pyudorandom
+Requires: python3-pytest
+Requires: python3-pytest-timeout
+Requires: python3-pytest-cov
+Requires: python3-numpy
+
+%description
+# tdigest
+### Efficient percentile estimation of streaming or distributed data
+[](https://badge.fury.io/py/tdigest)
+[](https://travis-ci.org/CamDavidsonPilon/tdigest)
+
+
+This is a Python implementation of Ted Dunning's [t-digest](https://github.com/tdunning/t-digest) data structure. The t-digest data structure is designed around computing accurate estimates from either streaming or distributed data. These estimates are percentiles, quantiles, trimmed means, etc. Two t-digests can be added, making the data structure ideal for map-reduce settings, and can be serialized into much less than 10kB (instead of storing the entire list of data).
+
+See a blog post about it here: [Percentile and Quantile Estimation of Big Data: The t-Digest](http://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest)
+
+
+### Installation
+*tdigest* is compatible with both Python 2 and Python 3.
+
+```
+pip install tdigest
+```
+
+### Usage
+
+#### Update the digest sequentially
+
+```
+from tdigest import TDigest
+from numpy.random import random
+
+digest = TDigest()
+for x in range(5000):
+    digest.update(random())
+
+print(digest.percentile(15))  # about 0.15, as 0.15 is the 15th percentile of the Uniform(0,1) distribution
+```
+
+#### Update the digest in batches
+
+```
+another_digest = TDigest()
+another_digest.batch_update(random(5000))
+print(another_digest.percentile(15))
+```
+
+#### Sum two digests to create a new digest
+
+```
+sum_digest = digest + another_digest
+sum_digest.percentile(30)  # about 0.3
+```
+
+#### Serializing a digest to a dict or JSON
+
+You can use the `to_dict()` method to turn a TDigest object into a standard Python dictionary.
+```
+digest = TDigest()
+digest.update(1)
+digest.update(2)
+digest.update(3)
+print(digest.to_dict())
+```
+Or you can get only a list of Centroids with `centroids_to_list()`.
+```
+digest.centroids_to_list()
+```
+
+Similarly, you can restore a Python dict of digest values with `update_from_dict()`. Centroids are merged with any existing ones in the digest.
+For example, make a fresh digest and restore values from a Python dictionary.
+```
+digest = TDigest()
+digest.update_from_dict({'K': 25, 'delta': 0.01, 'centroids': [{'c': 1.0, 'm': 1.0}, {'c': 1.0, 'm': 2.0}, {'c': 1.0, 'm': 3.0}]})
+```
+
+`K` and `delta` values are optional, or you can provide only a list of centroids with `update_centroids_from_list()`.
+```
+digest = TDigest()
+digest.update_centroids_from_list([{'c': 1.0, 'm': 1.0}, {'c': 1.0, 'm': 2.0}, {'c': 1.0, 'm': 3.0}])
+```
+
+If you want to serialize with other tools like JSON, you can first convert the digest with `to_dict()`.
+```
+json.dumps(digest.to_dict())
+```
+
+Alternatively, define a custom encoder function to provide as the `default` parameter to the standard `json` module.
+```
+def encoder(digest_obj):
+    return digest_obj.to_dict()
+```
+Then pass the encoder function as the `default` parameter.
+```
+json.dumps(digest, default=encoder)
+```
+
+
+### API
+
+`TDigest.`
+
+ - `update(x, w=1)`: update the tdigest with value `x` and weight `w`.
+ - `batch_update(x, w=1)`: update the tdigest with values in array `x` and weight `w`.
+ - `compress()`: compress the underlying data structure to shrink its memory footprint without hurting accuracy. Good to perform after adding many values.
+ - `percentile(p)`: return the `p`th percentile. Example: `p=50` is the median.
+ - `cdf(x)`: return the value of the CDF at `x`.
+ - `trimmed_mean(p1, p2)`: return the mean of the data set, excluding values below the `p1` percentile and above the `p2` percentile.
+ - `to_dict()`: return a Python dictionary of the TDigest and internal Centroid values.
+ - `update_from_dict(dict_values)`: update the TDigest object from serialized dictionary values.
+ - `centroids_to_list()`: return a Python list of the TDigest object's internal Centroid values.
+ - `update_centroids_from_list(list_values)`: update centroids from a Python list.
+
+
+
+
+
+
+
+
+
+%package -n python3-tdigest
+Summary: T-Digest data structure
+Provides: python-tdigest
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-tdigest
+# tdigest
+### Efficient percentile estimation of streaming or distributed data
+[](https://badge.fury.io/py/tdigest)
+[](https://travis-ci.org/CamDavidsonPilon/tdigest)
+
+
+This is a Python implementation of Ted Dunning's [t-digest](https://github.com/tdunning/t-digest) data structure. The t-digest data structure is designed around computing accurate estimates from either streaming or distributed data. These estimates are percentiles, quantiles, trimmed means, etc. Two t-digests can be added, making the data structure ideal for map-reduce settings, and can be serialized into much less than 10kB (instead of storing the entire list of data).
+
+See a blog post about it here: [Percentile and Quantile Estimation of Big Data: The t-Digest](http://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest)
+
+
+### Installation
+*tdigest* is compatible with both Python 2 and Python 3.
+
+```
+pip install tdigest
+```
+
+### Usage
+
+#### Update the digest sequentially
+
+```
+from tdigest import TDigest
+from numpy.random import random
+
+digest = TDigest()
+for x in range(5000):
+    digest.update(random())
+
+print(digest.percentile(15))  # about 0.15, as 0.15 is the 15th percentile of the Uniform(0,1) distribution
+```
+
+#### Update the digest in batches
+
+```
+another_digest = TDigest()
+another_digest.batch_update(random(5000))
+print(another_digest.percentile(15))
+```
+
+#### Sum two digests to create a new digest
+
+```
+sum_digest = digest + another_digest
+sum_digest.percentile(30)  # about 0.3
+```
+
+#### Serializing a digest to a dict or JSON
+
+You can use the `to_dict()` method to turn a TDigest object into a standard Python dictionary.
+```
+digest = TDigest()
+digest.update(1)
+digest.update(2)
+digest.update(3)
+print(digest.to_dict())
+```
+Or you can get only a list of Centroids with `centroids_to_list()`.
+```
+digest.centroids_to_list()
+```
+
+Similarly, you can restore a Python dict of digest values with `update_from_dict()`. Centroids are merged with any existing ones in the digest.
+For example, make a fresh digest and restore values from a Python dictionary.
+```
+digest = TDigest()
+digest.update_from_dict({'K': 25, 'delta': 0.01, 'centroids': [{'c': 1.0, 'm': 1.0}, {'c': 1.0, 'm': 2.0}, {'c': 1.0, 'm': 3.0}]})
+```
+
+`K` and `delta` values are optional, or you can provide only a list of centroids with `update_centroids_from_list()`.
+```
+digest = TDigest()
+digest.update_centroids_from_list([{'c': 1.0, 'm': 1.0}, {'c': 1.0, 'm': 2.0}, {'c': 1.0, 'm': 3.0}])
+```
+
+If you want to serialize with other tools like JSON, you can first convert the digest with `to_dict()`.
+```
+json.dumps(digest.to_dict())
+```
+
+Alternatively, define a custom encoder function to provide as the `default` parameter to the standard `json` module.
+```
+def encoder(digest_obj):
+    return digest_obj.to_dict()
+```
+Then pass the encoder function as the `default` parameter.
+```
+json.dumps(digest, default=encoder)
+```
+
+
+### API
+
+`TDigest.`
+
+ - `update(x, w=1)`: update the tdigest with value `x` and weight `w`.
+ - `batch_update(x, w=1)`: update the tdigest with values in array `x` and weight `w`.
+ - `compress()`: compress the underlying data structure to shrink its memory footprint without hurting accuracy. Good to perform after adding many values.
+ - `percentile(p)`: return the `p`th percentile. Example: `p=50` is the median.
+ - `cdf(x)`: return the value of the CDF at `x`.
+ - `trimmed_mean(p1, p2)`: return the mean of the data set, excluding values below the `p1` percentile and above the `p2` percentile.
+ - `to_dict()`: return a Python dictionary of the TDigest and internal Centroid values.
+ - `update_from_dict(dict_values)`: update the TDigest object from serialized dictionary values.
+ - `centroids_to_list()`: return a Python list of the TDigest object's internal Centroid values.
+ - `update_centroids_from_list(list_values)`: update centroids from a Python list.
+
+
+
+
+
+
+
+
+
+%package help
+Summary: Development documents and examples for tdigest
+Provides: python3-tdigest-doc
+%description help
+# tdigest
+### Efficient percentile estimation of streaming or distributed data
+[](https://badge.fury.io/py/tdigest)
+[](https://travis-ci.org/CamDavidsonPilon/tdigest)
+
+
+This is a Python implementation of Ted Dunning's [t-digest](https://github.com/tdunning/t-digest) data structure. The t-digest data structure is designed around computing accurate estimates from either streaming or distributed data. These estimates are percentiles, quantiles, trimmed means, etc. Two t-digests can be added, making the data structure ideal for map-reduce settings, and can be serialized into much less than 10kB (instead of storing the entire list of data).
+
+See a blog post about it here: [Percentile and Quantile Estimation of Big Data: The t-Digest](http://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest)
+
+
+### Installation
+*tdigest* is compatible with both Python 2 and Python 3.
+
+```
+pip install tdigest
+```
+
+### Usage
+
+#### Update the digest sequentially
+
+```
+from tdigest import TDigest
+from numpy.random import random
+
+digest = TDigest()
+for x in range(5000):
+    digest.update(random())
+
+print(digest.percentile(15))  # about 0.15, as 0.15 is the 15th percentile of the Uniform(0,1) distribution
+```
+
+#### Update the digest in batches
+
+```
+another_digest = TDigest()
+another_digest.batch_update(random(5000))
+print(another_digest.percentile(15))
+```
+
+#### Sum two digests to create a new digest
+
+```
+sum_digest = digest + another_digest
+sum_digest.percentile(30)  # about 0.3
+```
+
+#### Serializing a digest to a dict or JSON
+
+You can use the `to_dict()` method to turn a TDigest object into a standard Python dictionary.
+```
+digest = TDigest()
+digest.update(1)
+digest.update(2)
+digest.update(3)
+print(digest.to_dict())
+```
+Or you can get only a list of Centroids with `centroids_to_list()`.
+```
+digest.centroids_to_list()
+```
+
+Similarly, you can restore a Python dict of digest values with `update_from_dict()`. Centroids are merged with any existing ones in the digest.
+For example, make a fresh digest and restore values from a Python dictionary.
+```
+digest = TDigest()
+digest.update_from_dict({'K': 25, 'delta': 0.01, 'centroids': [{'c': 1.0, 'm': 1.0}, {'c': 1.0, 'm': 2.0}, {'c': 1.0, 'm': 3.0}]})
+```
+
+`K` and `delta` values are optional, or you can provide only a list of centroids with `update_centroids_from_list()`.
+```
+digest = TDigest()
+digest.update_centroids_from_list([{'c': 1.0, 'm': 1.0}, {'c': 1.0, 'm': 2.0}, {'c': 1.0, 'm': 3.0}])
+```
+
+If you want to serialize with other tools like JSON, you can first convert the digest with `to_dict()`.
+```
+json.dumps(digest.to_dict())
+```
+
+Alternatively, define a custom encoder function to provide as the `default` parameter to the standard `json` module.
+```
+def encoder(digest_obj):
+    return digest_obj.to_dict()
+```
+Then pass the encoder function as the `default` parameter.
+```
+json.dumps(digest, default=encoder)
+```
+
+
+### API
+
+`TDigest.`
+
+ - `update(x, w=1)`: update the tdigest with value `x` and weight `w`.
+ - `batch_update(x, w=1)`: update the tdigest with values in array `x` and weight `w`.
+ - `compress()`: compress the underlying data structure to shrink its memory footprint without hurting accuracy. Good to perform after adding many values.
+ - `percentile(p)`: return the `p`th percentile. Example: `p=50` is the median.
+ - `cdf(x)`: return the value of the CDF at `x`.
+ - `trimmed_mean(p1, p2)`: return the mean of the data set, excluding values below the `p1` percentile and above the `p2` percentile.
+ - `to_dict()`: return a Python dictionary of the TDigest and internal Centroid values.
+ - `update_from_dict(dict_values)`: update the TDigest object from serialized dictionary values.
+ - `centroids_to_list()`: return a Python list of the TDigest object's internal Centroid values.
+ - `update_centroids_from_list(list_values)`: update centroids from a Python list.
+
+
+
+
+
+
+
+
+
+%prep
+%autosetup -n tdigest-0.5.2.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-tdigest -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon Apr 10 2023 Python_Bot <Python_Bot@openeuler.org> - 0.5.2.2-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+07637824cb88ef904bb5dade8e7408d1 tdigest-0.5.2.2.tar.gz
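The README packaged above documents `compress()` and `trimmed_mean()` in its API list but does not show them in a code example. A minimal usage sketch, assuming the built `python3-tdigest` package installs the upstream `tdigest` module; the printed value is approximate:

```
from random import random

from tdigest import TDigest

digest = TDigest()
for _ in range(10000):
    digest.update(random())

# Compress the internal centroid structure after many updates;
# per the API notes above, this does not hurt accuracy.
digest.compress()

# Mean of the data between the 5th and 95th percentiles;
# for Uniform(0,1) samples this should be close to 0.5.
print(digest.trimmed_mean(5, 95))
```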