%global _empty_manifest_terminate_build 0
Name:           python-TwoSampleHC
Version:        0.2.1
Release:        1
Summary:        Two-sample Higher Criticism
License:        MIT License
URL:            https://github.com/alonkipnis/TwoSampleHC
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/4e/51/70605976fb6eb917e428e582849525fb91aefbd642d8d43abc2db127a823/TwoSampleHC-0.2.1.tar.gz
BuildArch:      noarch

Requires:       python3-numpy
Requires:       python3-scipy
Requires:       python3-pandas

%description
# TwoSampleHC -- Higher Criticism Test between Two Frequency Tables

This package provides an adaptation of the Donoho-Jin-Tukey Higher-Criticism
(HC) test to frequency tables. This adaptation uses a binomial allocation
model for the number of occurrences of each feature in two samples, each of
which is associated with a frequency table. The exact binomial test associated
with each feature yields a P-value. The HC statistic combines these P-values
into a global test against the null hypothesis that the two tables are two
realizations of the same data-generating mechanism. This test is particularly
useful in identifying non-null effects under weak and sparse alternatives,
i.e., when the difference between the tables is due to a few features, and the
evidence each such feature provides is relatively weak.

More details and applications can be found in:

[1] Alon Kipnis. (2019). Higher Criticism for Discriminating Word Frequency Tables and Testing Authorship.
[2] David Donoho and Alon Kipnis. (2020). Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare and Weak Perturbations.
[3] Alon Kipnis. (2021). Log-Chisquared P-values under Rare and Weak Departures.

## Example:
```
from TwoSampleHC import HC, two_sample_pvals  # assumed import path
import numpy as np

N = 1000   # number of features
n = 5 * N  # number of samples

P = 1 / np.arange(1, N + 1)  # Zipf base distribution
P = P / P.sum()

ep = 0.03   # fraction of features to perturb
mu = 0.005  # intensity of perturbation

TH = np.random.rand(N) < ep
Q = P.copy()
Q[TH] += mu
Q = Q / np.sum(Q)

smp_P = np.random.multinomial(n, P)  # sample from P
smp_Q = np.random.multinomial(n, Q)  # sample from Q

pv = two_sample_pvals(smp_Q, smp_P)   # binomial P-values
hc = HC(pv)
hc_val, p_th = hc.HCstar(alpha=0.25)  # small-sample Higher Criticism test

print("TV distance between P and Q: ", 0.5 * np.sum(np.abs(P - Q)))
print("Higher-Criticism score for testing P == Q: ", hc_val)
# (HC score rarely goes above 2.5 if P == Q)
```

%package -n python3-TwoSampleHC
Summary:        Two-sample Higher Criticism
Provides:       python-TwoSampleHC
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-TwoSampleHC
# TwoSampleHC -- Higher Criticism Test between Two Frequency Tables

This package provides an adaptation of the Donoho-Jin-Tukey Higher-Criticism
(HC) test to frequency tables. This adaptation uses a binomial allocation
model for the number of occurrences of each feature in two samples, each of
which is associated with a frequency table. The exact binomial test associated
with each feature yields a P-value. The HC statistic combines these P-values
into a global test against the null hypothesis that the two tables are two
realizations of the same data-generating mechanism. This test is particularly
useful in identifying non-null effects under weak and sparse alternatives,
i.e., when the difference between the tables is due to a few features, and the
evidence each such feature provides is relatively weak.

More details and applications can be found in:

[1] Alon Kipnis. (2019). Higher Criticism for Discriminating Word Frequency Tables and Testing Authorship.
[2] David Donoho and Alon Kipnis. (2020). Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare and Weak Perturbations.
[3] Alon Kipnis. (2021). Log-Chisquared P-values under Rare and Weak Departures.

## Example:
```
from TwoSampleHC import HC, two_sample_pvals  # assumed import path
import numpy as np

N = 1000   # number of features
n = 5 * N  # number of samples

P = 1 / np.arange(1, N + 1)  # Zipf base distribution
P = P / P.sum()

ep = 0.03   # fraction of features to perturb
mu = 0.005  # intensity of perturbation

TH = np.random.rand(N) < ep
Q = P.copy()
Q[TH] += mu
Q = Q / np.sum(Q)

smp_P = np.random.multinomial(n, P)  # sample from P
smp_Q = np.random.multinomial(n, Q)  # sample from Q

pv = two_sample_pvals(smp_Q, smp_P)   # binomial P-values
hc = HC(pv)
hc_val, p_th = hc.HCstar(alpha=0.25)  # small-sample Higher Criticism test

print("TV distance between P and Q: ", 0.5 * np.sum(np.abs(P - Q)))
print("Higher-Criticism score for testing P == Q: ", hc_val)
# (HC score rarely goes above 2.5 if P == Q)
```

%package help
Summary:        Development documents and examples for TwoSampleHC
Provides:       python3-TwoSampleHC-doc

%description help
# TwoSampleHC -- Higher Criticism Test between Two Frequency Tables

This package provides an adaptation of the Donoho-Jin-Tukey Higher-Criticism
(HC) test to frequency tables. This adaptation uses a binomial allocation
model for the number of occurrences of each feature in two samples, each of
which is associated with a frequency table. The exact binomial test associated
with each feature yields a P-value. The HC statistic combines these P-values
into a global test against the null hypothesis that the two tables are two
realizations of the same data-generating mechanism. This test is particularly
useful in identifying non-null effects under weak and sparse alternatives,
i.e., when the difference between the tables is due to a few features, and the
evidence each such feature provides is relatively weak.

More details and applications can be found in:

[1] Alon Kipnis. (2019). Higher Criticism for Discriminating Word Frequency Tables and Testing Authorship.
[2] David Donoho and Alon Kipnis. (2020). Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare and Weak Perturbations.
[3] Alon Kipnis. (2021). Log-Chisquared P-values under Rare and Weak Departures.
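
The mechanics behind `two_sample_pvals` and `HC` used in the example below can be sketched directly with NumPy and SciPy: each feature's pair of counts is tested with an exact binomial test against the null allocation rate, and the resulting P-values are combined into the HC statistic. The following is a minimal sketch, assuming `scipy.stats.binomtest` is available (SciPy >= 1.7); the helper names are illustrative and are not the package's API.

```
import numpy as np
from scipy.stats import binomtest

def binomial_allocation_pvals(smp1, smp2):
    # Under the null, each of the x + y occurrences of a feature falls into
    # sample 1 with probability n1 / (n1 + n2); the exact (two-sided)
    # binomial test of this allocation yields one P-value per feature.
    n1, n2 = smp1.sum(), smp2.sum()
    rate = n1 / (n1 + n2)
    return np.array([
        binomtest(int(x), int(x + y), rate).pvalue if x + y > 0 else 1.0
        for x, y in zip(smp1, smp2)
    ])

def hc_statistic(pvals, gamma=0.25):
    # Donoho-Jin HC statistic: the largest standardized gap between the
    # sorted P-values and the uniform distribution, taken over the smallest
    # gamma-fraction of the P-values.
    pv = np.sort(np.clip(pvals, 1e-12, 1 - 1e-12))
    n = len(pv)
    ii = np.arange(1, n + 1)
    z = np.sqrt(n) * (ii / n - pv) / np.sqrt(pv * (1 - pv))
    return z[: max(1, int(gamma * n))].max()
```

With the two sampled tables from the example, `hc_statistic(binomial_allocation_pvals(smp_Q, smp_P))` plays the same role as the packaged test; the package's `HCstar` additionally applies a small-sample adjustment.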
## Example:
```
from TwoSampleHC import HC, two_sample_pvals  # assumed import path
import numpy as np

N = 1000   # number of features
n = 5 * N  # number of samples

P = 1 / np.arange(1, N + 1)  # Zipf base distribution
P = P / P.sum()

ep = 0.03   # fraction of features to perturb
mu = 0.005  # intensity of perturbation

TH = np.random.rand(N) < ep
Q = P.copy()
Q[TH] += mu
Q = Q / np.sum(Q)

smp_P = np.random.multinomial(n, P)  # sample from P
smp_Q = np.random.multinomial(n, Q)  # sample from Q

pv = two_sample_pvals(smp_Q, smp_P)   # binomial P-values
hc = HC(pv)
hc_val, p_th = hc.HCstar(alpha=0.25)  # small-sample Higher Criticism test

print("TV distance between P and Q: ", 0.5 * np.sum(np.abs(P - Q)))
print("Higher-Criticism score for testing P == Q: ", hc_val)
# (HC score rarely goes above 2.5 if P == Q)
```

%prep
%autosetup -n TwoSampleHC-0.2.1

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-TwoSampleHC -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue May 30 2023 Python_Bot - 0.2.1-1
- Package Spec generated