automatic import of python-signal-processing-algorithms

author: CoprDistGit <infra@openeuler.org> 2023-04-12 02:10:22 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-04-12 02:10:22 +0000
commit: 39c352b37cd8b9708c7bf2e56b809a4cefe367f5 (patch)
tree: c15e1e9759b7989a30dba3a5ce52cacb348daee2
parent: 840cced2b1f3c7e4a5a16a4142da1db22ed12e8a (diff)
3 files changed, 499 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..22cd54d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/signal-processing-algorithms-2.0.1.tar.gz
diff --git a/python-signal-processing-algorithms.spec b/python-signal-processing-algorithms.spec
new file mode 100644
index 0000000..8a9c622
--- /dev/null
+++ b/python-signal-processing-algorithms.spec
@@ -0,0 +1,497 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-signal-processing-algorithms
+Version:	2.0.1
+Release:	1
+Summary:	Signal Processing Algorithms from MongoDB
+License:	Apache-2.0
+URL:		https://github.com/mongodb/signal-processing-algorithms
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/79/b9/c212922a7f0d367be1834a3e7c6341594060d3ff37cd8ec95b0a2459efc6/signal-processing-algorithms-2.0.1.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-more-itertools
+Requires:	python3-numpy
+Requires:	python3-scipy
+Requires:	python3-structlog
+Requires:	python3-typing-extensions
+
+%description
+# Signal Processing Algorithms
+
+A suite of algorithms implementing [Energy
+Statistics](https://en.wikipedia.org/wiki/Energy_distance), [E-Divisive with
+Means](https://arxiv.org/pdf/1306.4933.pdf) and [Generalized ESD Test for
+Outliers](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35h3.htm) in python. See [The Use
+of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration
+System](https://dl.acm.org/doi/abs/10.1145/3358960.3375791) and [Creating a Virtuous Cycle in
+Performance Testing at MongoDB](https://dl.acm.org/doi/pdf/10.1145/3427921.3450234) for background
+on the development and use of this library.
+
+## Getting Started - Users
+```
+pip install signal-processing-algorithms
+```
+
+## Getting Started - Developers
+
+Getting the code:
+
+```
+$ git clone git@github.com:mongodb/signal-processing-algorithms.git
+$ cd signal-processing-algorithms
+```
+
+Installation
+```
+$ pip install poetry
+$ poetry install
+```
+Testing/linting:
+```
+$ poetry run pytest
+```
+
+Running the slow tests:
+```
+$ poetry run pytest --runslow
+```
+
+**Some of the larger tests can take a significant amount of time (more than 2 hours).**
+
+## Energy statistics
+[Energy Statistics](https://en.wikipedia.org/wiki/Energy_distance) is the statistical concept of Energy Distance 
+and can be used to measure how similar/different two distributions are.
+
+For statistical samples from two random variables X and Y:
+x1, x2, ..., xn and y1, y2, ..., yn
+
+E-Statistic is defined as:
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=E_{n,m}(X,Y):=2A-B-C" target="_blank"><img src="https://latex.codecogs.com/gif.latex?E_{n,m}(X,Y):=2A-B-C" title="E_{n,m}(X,Y):=2A-B-C" /></a>
+
+where,
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=A:={\frac&space;{1}{nm}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{m}\|x_{i}-y_{j}\|,B:={\frac&space;{1}{n^{2}}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{n}\|x_{i}-x_{j}\|,C:={\frac&space;{1}{m^{2}}}\sum&space;_{i=1}^{m}\sum&space;_{j=1}^{m}\|y_{i}-y_{j}\|" target="_blank"><img src="https://latex.codecogs.com/gif.latex?A:={\frac&space;{1}{nm}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{m}\|x_{i}-y_{j}\|,B:={\frac&space;{1}{n^{2}}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{n}\|x_{i}-x_{j}\|,C:={\frac&space;{1}{m^{2}}}\sum&space;_{i=1}^{m}\sum&space;_{j=1}^{m}\|y_{i}-y_{j}\|" title="A:={\frac {1}{nm}}\sum _{i=1}^{n}\sum _{j=1}^{m}\|x_{i}-y_{j}\|,B:={\frac {1}{n^{2}}}\sum _{i=1}^{n}\sum _{j=1}^{n}\|x_{i}-x_{j}\|,C:={\frac {1}{m^{2}}}\sum _{i=1}^{m}\sum _{j=1}^{m}\|y_{i}-y_{j}\|" /></a>
+
+T-statistic is defined as: 
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" target="_blank"><img src="https://latex.codecogs.com/gif.latex?T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" title="T={\frac {nm}{n+m}}E_{{n,m}}(X,Y)" /></a>
+
+E-coefficient of inhomogeneity is defined as:
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=H=\frac{2E||X-Y||&space;-&space;E||X-X'||&space;-&space;E||Y-Y'||}{2E||X-Y||}" target="_blank"><img src="https://latex.codecogs.com/gif.latex?H=\frac{2E||X-Y||&space;-&space;E||X-X'||&space;-&space;E||Y-Y'||}{2E||X-Y||}" title="H=\frac{2E||X-Y|| - E||X-X'|| - E||Y-Y'||}{2E||X-Y||}" /></a>
+
+
+```
+from signal_processing_algorithms.energy_statistics import energy_statistics
+from some_module import series1, series2
+
+# To get Energy Statistics of the distributions.
+es = energy_statistics.get_energy_statistics(series1, series2)
+
+# To get Energy Statistics and permutation test results of the distributions.
+es_with_probabilities = energy_statistics.get_energy_statistics_and_probabilities(series1, series2, permutations=100)
+
+```
+
+## Intro to E-Divisive
+
+Detecting distributional changes in a series of numerical values can be surprisingly difficult. Simple systems based on thresholds or
+ mean values can be yield false positives due to outliers in the data, and will fail to detect changes in the noise
+ profile of the series you are analyzing.
+ 
+One robust way of detecting many of the changes missed by other approaches is to use [E-Divisive with Means](https://arxiv.org/pdf/1306.4933.pdf), an energy
+ statistic based approach that compares the expected distance (Euclidean norm) between samples of two portions of the
+ series with the expected distance between samples within those portions.
+ 
+That is to say, assuming that the two portions can each be modeled as i.i.d. samples drawn from distinct random variables
+ (X for the first portion, Y for the second portion), you would expect the E-statistic to be non-zero if there is a
+ difference between the two portions: 
+ 
+ <a href="https://www.codecogs.com/eqnedit.php?latex=E_{n,m}(X,Y):=2A-B-C" target="_blank"><img src="https://latex.codecogs.com/gif.latex?E_{n,m}(X,Y):=2A-B-C" title="E_{n,m}(X,Y):=2A-B-C" /></a>
+ where A, B and C are as defined in the Energy Statistics above.
+
+One can prove that <a href="https://www.codecogs.com/eqnedit.php?latex={E_{n,m}(X,Y)\geq&space;0}" target="_blank"><img src="https://latex.codecogs.com/gif.latex?{E_{n,m}(X,Y)\geq&space;0}" title="{E_{n,m}(X,Y)\geq 0}" /></a> and that the corresponding population value is zero if and only if X and Y have the same distribution. Under this null hypothesis the test statistic
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" target="_blank"><img src="https://latex.codecogs.com/gif.latex?T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" title="T={\frac {nm}{n+m}}E_{{n,m}}(X,Y)" /></a>
+
+converges in distribution to a quadratic form of independent standard normal random variables. Under the alternative hypothesis T tends to infinity. This makes it possible to construct a consistent statistical test, the energy test for equal distributions
+  
+Thus for a series Z of length L,
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=Z&space;=&space;\{Z_{1},&space;...,&space;Z_{\tau}&space;,&space;...&space;,&space;Z_{L}\},&space;X&space;=\{Z_{1},...,Z_{\tau}\},&space;Y=\{Z_{\tau&plus;1}\,...,Z_{L}\}" target="_blank"><img src="https://latex.codecogs.com/gif.latex?Z&space;=&space;\{Z_{1},&space;...,&space;Z_{\tau}&space;,&space;...&space;,&space;Z_{L}\},&space;X&space;=\{Z_{1},...,Z_{\tau}\},&space;Y=\{Z_{\tau&plus;1}\,...,Z_{L}\}" title="Z = \{Z_{1}, ..., Z_{\tau} , ... , Z_{L}\}, X =\{Z_{1},...,Z_{\tau}\}, Y=\{Z_{\tau+1}\,...,Z_{L}\}" /></a>
+
+we find the most likely change point by solving the following for &tau; such that it has the maximum T statistic value.
+
+### Multiple Change Points
+
+The algorithm for finding multiple change points is also simple.
+
+Assuming you have some k known change points:
+1. Partition the series into segments between/around these change points.
+2. Find the maximum value of our divergence metric _within_ each partition.
+3. Take the maximum of the maxima we have just found --> this is our k+1th change point.
+4. Return to step 1 and continue until reaching your stopping criterion.
+
+### Stopping Criterion
+
+In this package we have implemented a permutation based test as a stopping criterion:
+
+After step 3 of the multiple change point procedure above, randomly permute all of the data _within_ each cluster, and
+ find the most likely change point for this permuted data using the procedure laid out above. 
+ 
+After performing this operation z times, count the number of
+ permuted change points z' that have higher divergence metrics than the change point you calculated with un-permuted data.
+ The significance level of your change point is thus z'/(z+1). 
+
+We allow users to configure a permutation tester with `pvalue`
+ and `permutations` representing the significance cutoff for algorithm termination and permutations to perform for each
+ test, respectively.
+ 
+### Example
+```
+from signal_processing_algorithms.energy_statistics import energy_statistics
+from some_module import series
+
+change_points = energy_statistics.e_divisive(series, pvalue=0.01, permutations=100)
+```
+
+
+
+%package -n python3-signal-processing-algorithms
+Summary:	Signal Processing Algorithms from MongoDB
+Provides:	python-signal-processing-algorithms
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-signal-processing-algorithms
+# Signal Processing Algorithms
+
+A suite of algorithms implementing [Energy
+Statistics](https://en.wikipedia.org/wiki/Energy_distance), [E-Divisive with
+Means](https://arxiv.org/pdf/1306.4933.pdf) and [Generalized ESD Test for
+Outliers](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35h3.htm) in python. See [The Use
+of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration
+System](https://dl.acm.org/doi/abs/10.1145/3358960.3375791) and [Creating a Virtuous Cycle in
+Performance Testing at MongoDB](https://dl.acm.org/doi/pdf/10.1145/3427921.3450234) for background
+on the development and use of this library.
+
+## Getting Started - Users
+```
+pip install signal-processing-algorithms
+```
+
+## Getting Started - Developers
+
+Getting the code:
+
+```
+$ git clone git@github.com:mongodb/signal-processing-algorithms.git
+$ cd signal-processing-algorithms
+```
+
+Installation
+```
+$ pip install poetry
+$ poetry install
+```
+Testing/linting:
+```
+$ poetry run pytest
+```
+
+Running the slow tests:
+```
+$ poetry run pytest --runslow
+```
+
+**Some of the larger tests can take a significant amount of time (more than 2 hours).**
+
+## Energy statistics
+[Energy Statistics](https://en.wikipedia.org/wiki/Energy_distance) is the statistical concept of Energy Distance 
+and can be used to measure how similar/different two distributions are.
+
+For statistical samples from two random variables X and Y:
+x1, x2, ..., xn and y1, y2, ..., yn
+
+E-Statistic is defined as:
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=E_{n,m}(X,Y):=2A-B-C" target="_blank"><img src="https://latex.codecogs.com/gif.latex?E_{n,m}(X,Y):=2A-B-C" title="E_{n,m}(X,Y):=2A-B-C" /></a>
+
+where,
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=A:={\frac&space;{1}{nm}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{m}\|x_{i}-y_{j}\|,B:={\frac&space;{1}{n^{2}}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{n}\|x_{i}-x_{j}\|,C:={\frac&space;{1}{m^{2}}}\sum&space;_{i=1}^{m}\sum&space;_{j=1}^{m}\|y_{i}-y_{j}\|" target="_blank"><img src="https://latex.codecogs.com/gif.latex?A:={\frac&space;{1}{nm}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{m}\|x_{i}-y_{j}\|,B:={\frac&space;{1}{n^{2}}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{n}\|x_{i}-x_{j}\|,C:={\frac&space;{1}{m^{2}}}\sum&space;_{i=1}^{m}\sum&space;_{j=1}^{m}\|y_{i}-y_{j}\|" title="A:={\frac {1}{nm}}\sum _{i=1}^{n}\sum _{j=1}^{m}\|x_{i}-y_{j}\|,B:={\frac {1}{n^{2}}}\sum _{i=1}^{n}\sum _{j=1}^{n}\|x_{i}-x_{j}\|,C:={\frac {1}{m^{2}}}\sum _{i=1}^{m}\sum _{j=1}^{m}\|y_{i}-y_{j}\|" /></a>
+
+T-statistic is defined as: 
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" target="_blank"><img src="https://latex.codecogs.com/gif.latex?T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" title="T={\frac {nm}{n+m}}E_{{n,m}}(X,Y)" /></a>
+
+E-coefficient of inhomogeneity is defined as:
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=H=\frac{2E||X-Y||&space;-&space;E||X-X'||&space;-&space;E||Y-Y'||}{2E||X-Y||}" target="_blank"><img src="https://latex.codecogs.com/gif.latex?H=\frac{2E||X-Y||&space;-&space;E||X-X'||&space;-&space;E||Y-Y'||}{2E||X-Y||}" title="H=\frac{2E||X-Y|| - E||X-X'|| - E||Y-Y'||}{2E||X-Y||}" /></a>
+
+
+```
+from signal_processing_algorithms.energy_statistics import energy_statistics
+from some_module import series1, series2
+
+# To get Energy Statistics of the distributions.
+es = energy_statistics.get_energy_statistics(series1, series2)
+
+# To get Energy Statistics and permutation test results of the distributions.
+es_with_probabilities = energy_statistics.get_energy_statistics_and_probabilities(series1, series2, permutations=100)
+
+```
+
+## Intro to E-Divisive
+
+Detecting distributional changes in a series of numerical values can be surprisingly difficult. Simple systems based on thresholds or
+ mean values can be yield false positives due to outliers in the data, and will fail to detect changes in the noise
+ profile of the series you are analyzing.
+ 
+One robust way of detecting many of the changes missed by other approaches is to use [E-Divisive with Means](https://arxiv.org/pdf/1306.4933.pdf), an energy
+ statistic based approach that compares the expected distance (Euclidean norm) between samples of two portions of the
+ series with the expected distance between samples within those portions.
+ 
+That is to say, assuming that the two portions can each be modeled as i.i.d. samples drawn from distinct random variables
+ (X for the first portion, Y for the second portion), you would expect the E-statistic to be non-zero if there is a
+ difference between the two portions: 
+ 
+ <a href="https://www.codecogs.com/eqnedit.php?latex=E_{n,m}(X,Y):=2A-B-C" target="_blank"><img src="https://latex.codecogs.com/gif.latex?E_{n,m}(X,Y):=2A-B-C" title="E_{n,m}(X,Y):=2A-B-C" /></a>
+ where A, B and C are as defined in the Energy Statistics above.
+
+One can prove that <a href="https://www.codecogs.com/eqnedit.php?latex={E_{n,m}(X,Y)\geq&space;0}" target="_blank"><img src="https://latex.codecogs.com/gif.latex?{E_{n,m}(X,Y)\geq&space;0}" title="{E_{n,m}(X,Y)\geq 0}" /></a> and that the corresponding population value is zero if and only if X and Y have the same distribution. Under this null hypothesis the test statistic
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" target="_blank"><img src="https://latex.codecogs.com/gif.latex?T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" title="T={\frac {nm}{n+m}}E_{{n,m}}(X,Y)" /></a>
+
+converges in distribution to a quadratic form of independent standard normal random variables. Under the alternative hypothesis T tends to infinity. This makes it possible to construct a consistent statistical test, the energy test for equal distributions
+  
+Thus for a series Z of length L,
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=Z&space;=&space;\{Z_{1},&space;...,&space;Z_{\tau}&space;,&space;...&space;,&space;Z_{L}\},&space;X&space;=\{Z_{1},...,Z_{\tau}\},&space;Y=\{Z_{\tau&plus;1}\,...,Z_{L}\}" target="_blank"><img src="https://latex.codecogs.com/gif.latex?Z&space;=&space;\{Z_{1},&space;...,&space;Z_{\tau}&space;,&space;...&space;,&space;Z_{L}\},&space;X&space;=\{Z_{1},...,Z_{\tau}\},&space;Y=\{Z_{\tau&plus;1}\,...,Z_{L}\}" title="Z = \{Z_{1}, ..., Z_{\tau} , ... , Z_{L}\}, X =\{Z_{1},...,Z_{\tau}\}, Y=\{Z_{\tau+1}\,...,Z_{L}\}" /></a>
+
+we find the most likely change point by solving the following for &tau; such that it has the maximum T statistic value.
+
+### Multiple Change Points
+
+The algorithm for finding multiple change points is also simple.
+
+Assuming you have some k known change points:
+1. Partition the series into segments between/around these change points.
+2. Find the maximum value of our divergence metric _within_ each partition.
+3. Take the maximum of the maxima we have just found --> this is our k+1th change point.
+4. Return to step 1 and continue until reaching your stopping criterion.
+
+### Stopping Criterion
+
+In this package we have implemented a permutation based test as a stopping criterion:
+
+After step 3 of the multiple change point procedure above, randomly permute all of the data _within_ each cluster, and
+ find the most likely change point for this permuted data using the procedure laid out above. 
+ 
+After performing this operation z times, count the number of
+ permuted change points z' that have higher divergence metrics than the change point you calculated with un-permuted data.
+ The significance level of your change point is thus z'/(z+1). 
+
+We allow users to configure a permutation tester with `pvalue`
+ and `permutations` representing the significance cutoff for algorithm termination and permutations to perform for each
+ test, respectively.
+ 
+### Example
+```
+from signal_processing_algorithms.energy_statistics import energy_statistics
+from some_module import series
+
+change_points = energy_statistics.e_divisive(series, pvalue=0.01, permutations=100)
+```
+
+
+
+%package help
+Summary:	Development documents and examples for signal-processing-algorithms
+Provides:	python3-signal-processing-algorithms-doc
+%description help
+# Signal Processing Algorithms
+
+A suite of algorithms implementing [Energy
+Statistics](https://en.wikipedia.org/wiki/Energy_distance), [E-Divisive with
+Means](https://arxiv.org/pdf/1306.4933.pdf) and [Generalized ESD Test for
+Outliers](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35h3.htm) in python. See [The Use
+of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration
+System](https://dl.acm.org/doi/abs/10.1145/3358960.3375791) and [Creating a Virtuous Cycle in
+Performance Testing at MongoDB](https://dl.acm.org/doi/pdf/10.1145/3427921.3450234) for background
+on the development and use of this library.
+
+## Getting Started - Users
+```
+pip install signal-processing-algorithms
+```
+
+## Getting Started - Developers
+
+Getting the code:
+
+```
+$ git clone git@github.com:mongodb/signal-processing-algorithms.git
+$ cd signal-processing-algorithms
+```
+
+Installation
+```
+$ pip install poetry
+$ poetry install
+```
+Testing/linting:
+```
+$ poetry run pytest
+```
+
+Running the slow tests:
+```
+$ poetry run pytest --runslow
+```
+
+**Some of the larger tests can take a significant amount of time (more than 2 hours).**
+
+## Energy statistics
+[Energy Statistics](https://en.wikipedia.org/wiki/Energy_distance) is the statistical concept of Energy Distance 
+and can be used to measure how similar/different two distributions are.
+
+For statistical samples from two random variables X and Y:
+x1, x2, ..., xn and y1, y2, ..., yn
+
+E-Statistic is defined as:
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=E_{n,m}(X,Y):=2A-B-C" target="_blank"><img src="https://latex.codecogs.com/gif.latex?E_{n,m}(X,Y):=2A-B-C" title="E_{n,m}(X,Y):=2A-B-C" /></a>
+
+where,
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=A:={\frac&space;{1}{nm}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{m}\|x_{i}-y_{j}\|,B:={\frac&space;{1}{n^{2}}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{n}\|x_{i}-x_{j}\|,C:={\frac&space;{1}{m^{2}}}\sum&space;_{i=1}^{m}\sum&space;_{j=1}^{m}\|y_{i}-y_{j}\|" target="_blank"><img src="https://latex.codecogs.com/gif.latex?A:={\frac&space;{1}{nm}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{m}\|x_{i}-y_{j}\|,B:={\frac&space;{1}{n^{2}}}\sum&space;_{i=1}^{n}\sum&space;_{j=1}^{n}\|x_{i}-x_{j}\|,C:={\frac&space;{1}{m^{2}}}\sum&space;_{i=1}^{m}\sum&space;_{j=1}^{m}\|y_{i}-y_{j}\|" title="A:={\frac {1}{nm}}\sum _{i=1}^{n}\sum _{j=1}^{m}\|x_{i}-y_{j}\|,B:={\frac {1}{n^{2}}}\sum _{i=1}^{n}\sum _{j=1}^{n}\|x_{i}-x_{j}\|,C:={\frac {1}{m^{2}}}\sum _{i=1}^{m}\sum _{j=1}^{m}\|y_{i}-y_{j}\|" /></a>
+
+T-statistic is defined as: 
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" target="_blank"><img src="https://latex.codecogs.com/gif.latex?T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" title="T={\frac {nm}{n+m}}E_{{n,m}}(X,Y)" /></a>
+
+E-coefficient of inhomogeneity is defined as:
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=H=\frac{2E||X-Y||&space;-&space;E||X-X'||&space;-&space;E||Y-Y'||}{2E||X-Y||}" target="_blank"><img src="https://latex.codecogs.com/gif.latex?H=\frac{2E||X-Y||&space;-&space;E||X-X'||&space;-&space;E||Y-Y'||}{2E||X-Y||}" title="H=\frac{2E||X-Y|| - E||X-X'|| - E||Y-Y'||}{2E||X-Y||}" /></a>
+
+
+```
+from signal_processing_algorithms.energy_statistics import energy_statistics
+from some_module import series1, series2
+
+# To get Energy Statistics of the distributions.
+es = energy_statistics.get_energy_statistics(series1, series2)
+
+# To get Energy Statistics and permutation test results of the distributions.
+es_with_probabilities = energy_statistics.get_energy_statistics_and_probabilities(series1, series2, permutations=100)
+
+```
+
+## Intro to E-Divisive
+
+Detecting distributional changes in a series of numerical values can be surprisingly difficult. Simple systems based on thresholds or
+ mean values can be yield false positives due to outliers in the data, and will fail to detect changes in the noise
+ profile of the series you are analyzing.
+ 
+One robust way of detecting many of the changes missed by other approaches is to use [E-Divisive with Means](https://arxiv.org/pdf/1306.4933.pdf), an energy
+ statistic based approach that compares the expected distance (Euclidean norm) between samples of two portions of the
+ series with the expected distance between samples within those portions.
+ 
+That is to say, assuming that the two portions can each be modeled as i.i.d. samples drawn from distinct random variables
+ (X for the first portion, Y for the second portion), you would expect the E-statistic to be non-zero if there is a
+ difference between the two portions: 
+ 
+ <a href="https://www.codecogs.com/eqnedit.php?latex=E_{n,m}(X,Y):=2A-B-C" target="_blank"><img src="https://latex.codecogs.com/gif.latex?E_{n,m}(X,Y):=2A-B-C" title="E_{n,m}(X,Y):=2A-B-C" /></a>
+ where A, B and C are as defined in the Energy Statistics above.
+
+One can prove that <a href="https://www.codecogs.com/eqnedit.php?latex={E_{n,m}(X,Y)\geq&space;0}" target="_blank"><img src="https://latex.codecogs.com/gif.latex?{E_{n,m}(X,Y)\geq&space;0}" title="{E_{n,m}(X,Y)\geq 0}" /></a> and that the corresponding population value is zero if and only if X and Y have the same distribution. Under this null hypothesis the test statistic
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" target="_blank"><img src="https://latex.codecogs.com/gif.latex?T={\frac&space;{nm}{n&plus;m}}E_{{n,m}}(X,Y)" title="T={\frac {nm}{n+m}}E_{{n,m}}(X,Y)" /></a>
+
+converges in distribution to a quadratic form of independent standard normal random variables. Under the alternative hypothesis T tends to infinity. This makes it possible to construct a consistent statistical test, the energy test for equal distributions
+  
+Thus for a series Z of length L,
+
+<a href="https://www.codecogs.com/eqnedit.php?latex=Z&space;=&space;\{Z_{1},&space;...,&space;Z_{\tau}&space;,&space;...&space;,&space;Z_{L}\},&space;X&space;=\{Z_{1},...,Z_{\tau}\},&space;Y=\{Z_{\tau&plus;1}\,...,Z_{L}\}" target="_blank"><img src="https://latex.codecogs.com/gif.latex?Z&space;=&space;\{Z_{1},&space;...,&space;Z_{\tau}&space;,&space;...&space;,&space;Z_{L}\},&space;X&space;=\{Z_{1},...,Z_{\tau}\},&space;Y=\{Z_{\tau&plus;1}\,...,Z_{L}\}" title="Z = \{Z_{1}, ..., Z_{\tau} , ... , Z_{L}\}, X =\{Z_{1},...,Z_{\tau}\}, Y=\{Z_{\tau+1}\,...,Z_{L}\}" /></a>
+
+we find the most likely change point by solving the following for &tau; such that it has the maximum T statistic value.
+
+### Multiple Change Points
+
+The algorithm for finding multiple change points is also simple.
+
+Assuming you have some k known change points:
+1. Partition the series into segments between/around these change points.
+2. Find the maximum value of our divergence metric _within_ each partition.
+3. Take the maximum of the maxima we have just found --> this is our k+1th change point.
+4. Return to step 1 and continue until reaching your stopping criterion.
+
+### Stopping Criterion
+
+In this package we have implemented a permutation based test as a stopping criterion:
+
+After step 3 of the multiple change point procedure above, randomly permute all of the data _within_ each cluster, and
+ find the most likely change point for this permuted data using the procedure laid out above. 
+ 
+After performing this operation z times, count the number of
+ permuted change points z' that have higher divergence metrics than the change point you calculated with un-permuted data.
+ The significance level of your change point is thus z'/(z+1). 
+
+We allow users to configure a permutation tester with `pvalue`
+ and `permutations` representing the significance cutoff for algorithm termination and permutations to perform for each
+ test, respectively.
+ 
+### Example
+```
+from signal_processing_algorithms.energy_statistics import energy_statistics
+from some_module import series
+
+change_points = energy_statistics.e_divisive(series, pvalue=0.01, permutations=100)
+```
+
+
+
+%prep
+%autosetup -n signal-processing-algorithms-2.0.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-signal-processing-algorithms -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed Apr 12 2023 Python_Bot <Python_Bot@openeuler.org> - 2.0.1-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..adc7eab
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+50d5d08861eed86435cd58b6d7fa53a3  signal-processing-algorithms-2.0.1.tar.gz
author	CoprDistGit <infra@openeuler.org>	2023-04-12 02:10:22 +0000
committer	CoprDistGit <infra@openeuler.org>	2023-04-12 02:10:22 +0000
commit	39c352b37cd8b9708c7bf2e56b809a4cefe367f5 (patch)
tree	c15e1e9759b7989a30dba3a5ce52cacb348daee2
parent	840cced2b1f3c7e4a5a16a4142da1db22ed12e8a (diff)