%global _empty_manifest_terminate_build 0
Name:           python-topn
Version:        0.0.7
Release:        1
Summary:        This package boosts a group-wise nlargest sort
License:        MIT
URL:            https://github.com/ParticularMiner/topn
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/67/06/82733b9a88ad6120dca0b88045909211654aaeb882804730a6dfe804966c/topn-0.0.7.tar.gz
BuildArch:      noarch

%description
# topn

Cython utility functions to be used instead of pandas' `SeriesGroupBy` `nlargest()` function (since [pandas does it so slowly](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.SeriesGroupBy.nlargest.html)).

Contains 3 functions:

1. `awesome_topn()`,
2. `awesome_hstack_topn()`,
3. `awesome_hstack()`: (for CSR matrices only; at least twice as fast as `scipy.sparse.hstack` in scipy version 1.6.1)

See [Short Description](#desc) for details.

This is how it may be done with pandas:

```python
import pandas as pd
import numpy as np

r = np.array([0, 1, 2, 1, 2, 3, 2])
c = np.array([1, 1, 0, 3, 1, 2, 3])
d = np.array([0.3, 0.2, 0.1, 1.0, 0.9, 0.4, 0.6])

rcd = pd.DataFrame({'r': r, 'c': c, 'd': d})
rcd
```

   r  c    d
0  0  1  0.3
1  1  1  0.2
2  2  0  0.1
3  1  3  1.0
4  2  1  0.9
5  3  2  0.4
6  2  3  0.6

```python
ntop = 2
```

```python
rcd.set_index('c').groupby('r')['d'].nlargest(ntop).reset_index().sort_values(['r', 'd'], ascending=[True, False])
```

   r  c    d
0  0  1  0.3
1  1  3  1.0
2  1  1  0.2
3  2  1  0.9
4  2  3  0.6
5  3  2  0.4
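
For reference, the group-wise selection that the pandas expression above performs can be sketched in plain Python. This is only an illustration of the semantics, not how the package is implemented; the helper name `topn_reference` is hypothetical, and ties are broken by the larger column index purely as a side effect of tuple comparison.

```python
import heapq
from collections import defaultdict


def topn_reference(r, c, d, ntop):
    """Return (row, col, value) triples holding the ntop largest values per row."""
    groups = defaultdict(list)
    for row, col, val in zip(r, c, d):
        groups[row].append((val, col))
    out = []
    for row in sorted(groups):
        # heapq.nlargest returns the selected items in descending order
        for val, col in heapq.nlargest(ntop, groups[row]):
            out.append((row, col, val))
    return out


r = [0, 1, 2, 1, 2, 3, 2]
c = [1, 1, 0, 3, 1, 2, 3]
d = [0.3, 0.2, 0.1, 1.0, 0.9, 0.4, 0.6]
print(topn_reference(r, c, d, 2))
```

The printed triples correspond row for row to the table above.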

## Usage

```python
from topn import awesome_topn

o_r, o_c, o_d = awesome_topn(r, c, d, ntop, n_jobs=7)

pd.DataFrame({'r': o_r, 'c': o_c, 'd': o_d})
```

   r  c    d
0  0  1  0.3
1  1  3  1.0
2  1  1  0.2
3  2  1  0.9
4  2  3  0.6
5  3  2  0.4
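
The `n_rows` option documented under Short Description returns a CSR index pointer array in place of the row indices. A minimal sketch of that relationship, in plain Python and assuming the row indices are already in ascending order as in the output above (the helper `rows_to_indptr` is hypothetical and not part of the package):

```python
def rows_to_indptr(rows, n_rows):
    """Convert ascending row indices into a CSR index pointer of length n_rows + 1."""
    indptr = [0] * (n_rows + 1)
    for row in rows:
        indptr[row + 1] += 1        # count the entries in each row
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]  # running total gives each row's end offset
    return indptr


o_r = [0, 1, 1, 2, 2, 3]
print(rows_to_indptr(o_r, 4))  # [0, 1, 3, 5, 6]
```

Row `i` then occupies positions `indptr[i]` up to, but not including, `indptr[i + 1]` of the column and data arrays.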

Alternatively, if one had a matrix encoding the above data:

```python
from scipy.sparse import csr_matrix

csr = csr_matrix((d, (r, c)), shape=(4, 4))
```

then one could use the function `awesome_hstack_topn()` instead:

```python
from topn import awesome_hstack_topn

topn_matrix = awesome_hstack_topn([csr], ntop=ntop)

o_r, o_c = topn_matrix.nonzero()
o_d = topn_matrix.data

pd.DataFrame({'r': o_r, 'c': o_c, 'd': o_d})
```

   r  c    d
0  0  1  0.3
1  1  3  1.0
2  1  1  0.2
3  2  1  0.9
4  2  3  0.6
5  3  2  0.4
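
What `awesome_hstack_topn()` computes can be pictured as two steps: shift each block's column indices by the total width of the blocks to its left, then keep the `ntop` largest entries of every row. The sketch below shows this on plain dictionaries rather than CSR matrices; the function name and the data layout are assumptions made for illustration, not the package's implementation.

```python
import heapq


def hstack_topn_reference(blocks, ntop):
    """blocks: list of (n_cols, rows), where rows[i] maps column -> value for row i."""
    n_rows = len(blocks[0][1])
    stacked = [[] for _ in range(n_rows)]
    offset = 0
    for n_cols, rows in blocks:
        for i, row in enumerate(rows):
            for col, val in row.items():
                # shift the column index past the blocks already stacked
                stacked[i].append((val, col + offset))
        offset += n_cols
    # keep the ntop largest entries of each row, in descending order
    return [
        [(col, val) for val, col in heapq.nlargest(ntop, entries)]
        for entries in stacked
    ]


left = (2, [{0: 0.5, 1: 0.1}, {1: 0.7}])
right = (3, [{2: 0.9}, {0: 0.2, 1: 0.8}])
print(hstack_topn_reference([left, right], 2))
```

Here the entry in column 2 of `right` ends up in column 4 of the stacked result, because `left` is two columns wide.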

## Short Description

Contains 3 functions:

1. `awesome_topn()`,
2. `awesome_hstack_topn()`,
3. `awesome_hstack()`

```python
def awesome_topn(r, c, d, ntop, n_rows=-1, n_jobs=1):
    """
    r, c, and d are 1D numpy arrays all of the same length N.
    This function will return arrays rn, cn, and dn of length n <= N such
    that the set of triples {(rn[i], cn[i], dn[i]) : 0 <= i < n} is a subset
    of {(r[j], c[j], d[j]) : 0 <= j < N} and that for every distinct value
    x = rn[i], dn[i] is among the first ntop existing largest d[j]'s whose
    r[j] = x.

    Input:
        r and c: two 1D integer arrays of the same length
        d: 1D array of single or double precision floating point type of
            the same length as r or c
        ntop: maximum number of largest d's returned for each distinct
            value of r
        n_rows: an int. If > -1 it will replace output rn with Rn, the
            index pointer array for the compressed sparse row (CSR) matrix
            whose elements are {C[rn[i], cn[i]] = dn[i] : 0 <= i < n}. This
            matrix will have its number of rows = n_rows. Thus the length
            of Rn is n_rows + 1
        n_jobs: number of threads, must be >= 1

    Output:
        (rn, cn, dn) where rn, cn, dn are all arrays as described above, or
        (Rn, cn, dn) where Rn is described above
    """


def awesome_hstack_topn(blocks, ntop, sort=True, use_threads=False, n_jobs=1):
    """
    Returns, in CSR format, the matrix formed by horizontally stacking the
    sequence of CSR matrices in parameter 'blocks', with only the largest ntop
    elements of each row returned. Also, each row will be sorted in
    descending order only when
    ntop < total number of columns in blocks or sort=True,
    otherwise the rows will be unsorted.

    :param blocks: list of CSR matrices to be stacked horizontally.
    :param ntop: int. The maximum number of elements to be returned for
        each row.
    :param sort: bool. Each row of the returned matrix will be sorted in
        descending order only when
        ntop < total number of columns in blocks or sort=True,
        otherwise the rows will be unsorted.
    :param use_threads: bool. Will use the multi-threaded versions of this
        routine if True otherwise the single threaded version will be used.
        In multi-core systems setting this to True can lead to acceleration.
    :param n_jobs: int. When use_threads=True, denotes the number of
        threads that are to be spawned by the multi-threaded routines.
        Recommended value is number of cores minus one.

    Output:
        (scipy.sparse.csr_matrix) matrix in CSR format
    """


def awesome_hstack(blocks, use_threads=False, n_jobs=1):
    """
    Returns, in CSR format, the matrix formed by horizontally stacking the
    sequence of CSR matrices in parameter blocks.

    :param blocks: list of CSR matrices to be stacked horizontally.
    :param use_threads: bool. Will use the multi-threaded versions of this
        routine if True otherwise the single threaded version will be used.
        In multi-core systems setting this to True can lead to acceleration.
    :param n_jobs: int. When use_threads=True, denotes the number of
        threads that are to be spawned by the multi-threaded routines.
        Recommended value is number of cores minus one.

    Output:
        (scipy.sparse.csr_matrix) matrix in CSR format
    """
```

%package -n python3-topn
Summary:        This package boosts a group-wise nlargest sort
Provides:       python-topn
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-topn
# topn

Cython utility functions to be used instead of pandas' `SeriesGroupBy` `nlargest()` function (since [pandas does it so slowly](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.SeriesGroupBy.nlargest.html)).

Contains 3 functions:

1. `awesome_topn()`,
2. `awesome_hstack_topn()`,
3. `awesome_hstack()`: (for CSR matrices only; at least twice as fast as `scipy.sparse.hstack` in scipy version 1.6.1)

See [Short Description](#desc) for details.

This is how it may be done with pandas:

```python
import pandas as pd
import numpy as np

r = np.array([0, 1, 2, 1, 2, 3, 2])
c = np.array([1, 1, 0, 3, 1, 2, 3])
d = np.array([0.3, 0.2, 0.1, 1.0, 0.9, 0.4, 0.6])

rcd = pd.DataFrame({'r': r, 'c': c, 'd': d})
rcd
```

   r  c    d
0  0  1  0.3
1  1  1  0.2
2  2  0  0.1
3  1  3  1.0
4  2  1  0.9
5  3  2  0.4
6  2  3  0.6

```python
ntop = 2
```

```python
rcd.set_index('c').groupby('r')['d'].nlargest(ntop).reset_index().sort_values(['r', 'd'], ascending=[True, False])
```

   r  c    d
0  0  1  0.3
1  1  3  1.0
2  1  1  0.2
3  2  1  0.9
4  2  3  0.6
5  3  2  0.4

## Usage

```python
from topn import awesome_topn

o_r, o_c, o_d = awesome_topn(r, c, d, ntop, n_jobs=7)

pd.DataFrame({'r': o_r, 'c': o_c, 'd': o_d})
```

   r  c    d
0  0  1  0.3
1  1  3  1.0
2  1  1  0.2
3  2  1  0.9
4  2  3  0.6
5  3  2  0.4

Alternatively, if one had a matrix encoding the above data:

```python
from scipy.sparse import csr_matrix

csr = csr_matrix((d, (r, c)), shape=(4, 4))
```

then one could use the function `awesome_hstack_topn()` instead:

```python
from topn import awesome_hstack_topn

topn_matrix = awesome_hstack_topn([csr], ntop=ntop)

o_r, o_c = topn_matrix.nonzero()
o_d = topn_matrix.data

pd.DataFrame({'r': o_r, 'c': o_c, 'd': o_d})
```

   r  c    d
0  0  1  0.3
1  1  3  1.0
2  1  1  0.2
3  2  1  0.9
4  2  3  0.6
5  3  2  0.4

## Short Description

Contains 3 functions:

1. `awesome_topn()`,
2. `awesome_hstack_topn()`,
3. `awesome_hstack()`

```python
def awesome_topn(r, c, d, ntop, n_rows=-1, n_jobs=1):
    """
    r, c, and d are 1D numpy arrays all of the same length N.
    This function will return arrays rn, cn, and dn of length n <= N such
    that the set of triples {(rn[i], cn[i], dn[i]) : 0 <= i < n} is a subset
    of {(r[j], c[j], d[j]) : 0 <= j < N} and that for every distinct value
    x = rn[i], dn[i] is among the first ntop existing largest d[j]'s whose
    r[j] = x.

    Input:
        r and c: two 1D integer arrays of the same length
        d: 1D array of single or double precision floating point type of
            the same length as r or c
        ntop: maximum number of largest d's returned for each distinct
            value of r
        n_rows: an int. If > -1 it will replace output rn with Rn, the
            index pointer array for the compressed sparse row (CSR) matrix
            whose elements are {C[rn[i], cn[i]] = dn[i] : 0 <= i < n}. This
            matrix will have its number of rows = n_rows. Thus the length
            of Rn is n_rows + 1
        n_jobs: number of threads, must be >= 1

    Output:
        (rn, cn, dn) where rn, cn, dn are all arrays as described above, or
        (Rn, cn, dn) where Rn is described above
    """


def awesome_hstack_topn(blocks, ntop, sort=True, use_threads=False, n_jobs=1):
    """
    Returns, in CSR format, the matrix formed by horizontally stacking the
    sequence of CSR matrices in parameter 'blocks', with only the largest ntop
    elements of each row returned. Also, each row will be sorted in
    descending order only when
    ntop < total number of columns in blocks or sort=True,
    otherwise the rows will be unsorted.

    :param blocks: list of CSR matrices to be stacked horizontally.
    :param ntop: int. The maximum number of elements to be returned for
        each row.
    :param sort: bool. Each row of the returned matrix will be sorted in
        descending order only when
        ntop < total number of columns in blocks or sort=True,
        otherwise the rows will be unsorted.
    :param use_threads: bool. Will use the multi-threaded versions of this
        routine if True otherwise the single threaded version will be used.
        In multi-core systems setting this to True can lead to acceleration.
    :param n_jobs: int. When use_threads=True, denotes the number of
        threads that are to be spawned by the multi-threaded routines.
        Recommended value is number of cores minus one.

    Output:
        (scipy.sparse.csr_matrix) matrix in CSR format
    """


def awesome_hstack(blocks, use_threads=False, n_jobs=1):
    """
    Returns, in CSR format, the matrix formed by horizontally stacking the
    sequence of CSR matrices in parameter blocks.

    :param blocks: list of CSR matrices to be stacked horizontally.
    :param use_threads: bool. Will use the multi-threaded versions of this
        routine if True otherwise the single threaded version will be used.
        In multi-core systems setting this to True can lead to acceleration.
    :param n_jobs: int. When use_threads=True, denotes the number of
        threads that are to be spawned by the multi-threaded routines.
        Recommended value is number of cores minus one.

    Output:
        (scipy.sparse.csr_matrix) matrix in CSR format
    """
```

%package help
Summary:        Development documents and examples for topn
Provides:       python3-topn-doc

%description help
# topn

Cython utility functions to be used instead of pandas' `SeriesGroupBy` `nlargest()` function (since [pandas does it so slowly](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.SeriesGroupBy.nlargest.html)).

Contains 3 functions:

1. `awesome_topn()`,
2. `awesome_hstack_topn()`,
3. `awesome_hstack()`: (for CSR matrices only; at least twice as fast as `scipy.sparse.hstack` in scipy version 1.6.1)

See [Short Description](#desc) for details.

This is how it may be done with pandas:

```python
import pandas as pd
import numpy as np

r = np.array([0, 1, 2, 1, 2, 3, 2])
c = np.array([1, 1, 0, 3, 1, 2, 3])
d = np.array([0.3, 0.2, 0.1, 1.0, 0.9, 0.4, 0.6])

rcd = pd.DataFrame({'r': r, 'c': c, 'd': d})
rcd
```

   r  c    d
0  0  1  0.3
1  1  1  0.2
2  2  0  0.1
3  1  3  1.0
4  2  1  0.9
5  3  2  0.4
6  2  3  0.6

```python
ntop = 2
```

```python
rcd.set_index('c').groupby('r')['d'].nlargest(ntop).reset_index().sort_values(['r', 'd'], ascending=[True, False])
```

   r  c    d
0  0  1  0.3
1  1  3  1.0
2  1  1  0.2
3  2  1  0.9
4  2  3  0.6
5  3  2  0.4

## Usage

```python
from topn import awesome_topn

o_r, o_c, o_d = awesome_topn(r, c, d, ntop, n_jobs=7)

pd.DataFrame({'r': o_r, 'c': o_c, 'd': o_d})
```

   r  c    d
0  0  1  0.3
1  1  3  1.0
2  1  1  0.2
3  2  1  0.9
4  2  3  0.6
5  3  2  0.4

Alternatively, if one had a matrix encoding the above data:

```python
from scipy.sparse import csr_matrix

csr = csr_matrix((d, (r, c)), shape=(4, 4))
```

then one could use the function `awesome_hstack_topn()` instead:

```python
from topn import awesome_hstack_topn

topn_matrix = awesome_hstack_topn([csr], ntop=ntop)

o_r, o_c = topn_matrix.nonzero()
o_d = topn_matrix.data

pd.DataFrame({'r': o_r, 'c': o_c, 'd': o_d})
```

   r  c    d
0  0  1  0.3
1  1  3  1.0
2  1  1  0.2
3  2  1  0.9
4  2  3  0.6
5  3  2  0.4

## Short Description

Contains 3 functions:

1. `awesome_topn()`,
2. `awesome_hstack_topn()`,
3. `awesome_hstack()`

```python
def awesome_topn(r, c, d, ntop, n_rows=-1, n_jobs=1):
    """
    r, c, and d are 1D numpy arrays all of the same length N.
    This function will return arrays rn, cn, and dn of length n <= N such
    that the set of triples {(rn[i], cn[i], dn[i]) : 0 <= i < n} is a subset
    of {(r[j], c[j], d[j]) : 0 <= j < N} and that for every distinct value
    x = rn[i], dn[i] is among the first ntop existing largest d[j]'s whose
    r[j] = x.

    Input:
        r and c: two 1D integer arrays of the same length
        d: 1D array of single or double precision floating point type of
            the same length as r or c
        ntop: maximum number of largest d's returned for each distinct
            value of r
        n_rows: an int. If > -1 it will replace output rn with Rn, the
            index pointer array for the compressed sparse row (CSR) matrix
            whose elements are {C[rn[i], cn[i]] = dn[i] : 0 <= i < n}. This
            matrix will have its number of rows = n_rows. Thus the length
            of Rn is n_rows + 1
        n_jobs: number of threads, must be >= 1

    Output:
        (rn, cn, dn) where rn, cn, dn are all arrays as described above, or
        (Rn, cn, dn) where Rn is described above
    """


def awesome_hstack_topn(blocks, ntop, sort=True, use_threads=False, n_jobs=1):
    """
    Returns, in CSR format, the matrix formed by horizontally stacking the
    sequence of CSR matrices in parameter 'blocks', with only the largest ntop
    elements of each row returned. Also, each row will be sorted in
    descending order only when
    ntop < total number of columns in blocks or sort=True,
    otherwise the rows will be unsorted.

    :param blocks: list of CSR matrices to be stacked horizontally.
    :param ntop: int. The maximum number of elements to be returned for
        each row.
    :param sort: bool. Each row of the returned matrix will be sorted in
        descending order only when
        ntop < total number of columns in blocks or sort=True,
        otherwise the rows will be unsorted.
    :param use_threads: bool. Will use the multi-threaded versions of this
        routine if True otherwise the single threaded version will be used.
        In multi-core systems setting this to True can lead to acceleration.
    :param n_jobs: int. When use_threads=True, denotes the number of
        threads that are to be spawned by the multi-threaded routines.
        Recommended value is number of cores minus one.

    Output:
        (scipy.sparse.csr_matrix) matrix in CSR format
    """


def awesome_hstack(blocks, use_threads=False, n_jobs=1):
    """
    Returns, in CSR format, the matrix formed by horizontally stacking the
    sequence of CSR matrices in parameter blocks.

    :param blocks: list of CSR matrices to be stacked horizontally.
    :param use_threads: bool. Will use the multi-threaded versions of this
        routine if True otherwise the single threaded version will be used.
        In multi-core systems setting this to True can lead to acceleration.
    :param n_jobs: int. When use_threads=True, denotes the number of
        threads that are to be spawned by the multi-threaded routines.
        Recommended value is number of cores minus one.

    Output:
        (scipy.sparse.csr_matrix) matrix in CSR format
    """
```

%prep
%autosetup -n topn-0.0.7

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-topn -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Apr 25 2023 Python_Bot - 0.0.7-1
- Package Spec generated