Diffstat (limited to 'python-simba.spec')
-rw-r--r--  python-simba.spec  300
1 file changed, 300 insertions, 0 deletions
diff --git a/python-simba.spec b/python-simba.spec
new file mode 100644
index 0000000..ec1ce68
--- /dev/null
+++ b/python-simba.spec
@@ -0,0 +1,300 @@
+%global _empty_manifest_terminate_build 0
+Name: python-simba
+Version: 0.1.1
+Release: 1
+Summary: Semantic similarity measures from Babylon Health
+License: Proprietary
+URL: https://github.com/babylonhealth/simba
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b6/11/d28277e11d32bb9452eb23e9f1f1b811874142b0f38ddc8eb02862a643ce/simba-0.1.1.tar.gz
+BuildArch: noarch
+
+
+%description
+# simba :lion:
+
+Similarity measures from Babylon Health.
+
+## Installation
+
+```bash
+$ pip install simba
+```
+
+You can also check out this repository and install it from the root folder:
+```bash
+$ pip install .
+```
+
+Many of the similarity measures in simba rely on pre-trained embeddings.
+If you don't have your own encoding logic already, you can register your
+embedding files to use them easily with simba, as long as they're in the
+standard text format for word vectors (as described [here](https://fasttext.cc/docs/en/english-vectors.html)).
+For example, if you want to use fastText vectors that you've saved to `/path/to/fasttext`,
+you can just run
+```bash
+$ simba embs register --name fasttext --path /path/to/fasttext
+```
+and simba will recognise them under the name `fasttext`.
+
+You can do something similar for word-frequency files (like [these](https://github.com/PrincetonML/SIF/blob/master/auxiliary_data/enwiki_vocab_min200.txt)):
+```bash
+$ simba freqs register --name wiki --path /path/to/wiki/counts
+```
+
+## Usage
+```python
+from simba.similarities import dynamax_jaccard
+from simba.core import embed
+
+sentences = ('The king has returned', 'Change is good')
+
+# Assuming you've registered fasttext embeddings as described above
+x, y = embed([s.split() for s in sentences], embedding='fasttext')
+sim = dynamax_jaccard(x, y)
+```
+There are more examples in the `examples` directory, including a comparison of
+different similarity metrics on a dataset of sentence pairs.
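+
+For a quick comparison on a single pair, a minimal sketch (assuming the `fasttext` embeddings registered above; the measures are picked from the table below) might look like this:
+```python
+# Minimal sketch: compare a few measures from simba.similarities on one pair.
+# Assumes embeddings were registered under the name 'fasttext' as shown above.
+from simba.core import embed
+from simba.similarities import avg_cosine, max_jaccard, dynamax_jaccard
+
+sentences = ('The king has returned', 'Change is good')
+x, y = embed([s.split() for s in sentences], embedding='fasttext')
+
+for name, measure in [('avg_cosine', avg_cosine),
+                      ('max_jaccard', max_jaccard),
+                      ('dynamax_jaccard', dynamax_jaccard)]:
+    print(name, measure(x, y))
+```
+Since these measures all live in `simba.similarities` and take the same embedded pair, swapping one for another should only require changing the import.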
+
+## Similarity Measures
+
+This library contains implementations of the following methods in `simba.similarities`.
+Please consider citing the corresponding papers in your work if you find them useful.
+
+| Method | Description | Paper |
+| - | - | - |
+| `avg_cosine` | Average vector compared with cosine similarity | - |
+| `batch_avg_pca` | Average vector with principal component removal | [1] |
+| `fbow_jaccard_factory` | Factory method for general fuzzy bag-of-words given a universe matrix | [2] |
+| `max_jaccard` | Max-pooled vectors compared with Jaccard coefficient | [2] |
+| `dynamax_{jaccard, otsuka, dice}` | DynaMax using Jaccard, Otsuka-Ochiai, and Dice coefficients | [2] |
+| `gaussian_correction_{tic, aic}` | Takeuchi and Akaike Information Criteria (TIC and AIC) for Gaussian likelihood | [3] |
+| `spherical_gaussian_correction_{tic, aic}` | TIC and AIC for spherical Gaussian likelihood | [3] |
+| `von_mises_correction_{tic, aic}` | TIC and AIC for von Mises-Fisher likelihood | [3] |
+| `avg_{pearson, spearman, kendall}` | Average vector compared with Pearson, Spearman, and Kendall correlation | [4] |
+| `max_spearman` | Max-pooled vectors compared with Spearman correlation | [5] |
+| `cka_factory` | Factory method for general Centered Kernel Alignment (CKA) | [5] |
+| `cka_{linear, gaussian}`| CKA with linear and Gaussian kernels | [5] |
+| `dcorr` | CKA with distance kernel (distance correlation) | [5] |
+
+Papers:
+1. [Arora et al., ICLR 2017. *A Simple but Tough-to-Beat Baseline for Sentence Embeddings*](https://openreview.net/forum?id=SyK00v5xx)
+2. [Zhelezniak et al., ICLR 2019. *Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors*](https://openreview.net/forum?id=SkxXg2C5FX)
+3. [Vargas et al., ICML 2019. *Model Comparison for Semantic Grouping*](http://proceedings.mlr.press/v97/vargas19a.html)
+4. [Zhelezniak et al., NAACL-HLT 2019. *Correlation Coefficients and Semantic Textual Similarity*](https://www.aclweb.org/anthology/N19-1100/)
+5. [Zhelezniak et al., EMNLP-IJCNLP 2019. *Correlations between Word Vector Sets*](https://arxiv.org/abs/1910.02902)
+
+## Contact
+* [April Shen](https://github.com/apriltuesday)
+* [Sasho Savkov](https://github.com/savkov)
+* [Vitalii Zhelezniak](https://github.com/ironvital)
+
+%package -n python3-simba
+Summary: Semantic similarity measures from Babylon Health
+Provides: python-simba
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-simba
+# simba :lion:
+
+Similarity measures from Babylon Health.
+
+## Installation
+
+```bash
+$ pip install simba
+```
+
+You can also check out this repository and install it from the root folder:
+```bash
+$ pip install .
+```
+
+Many of the similarity measures in simba rely on pre-trained embeddings.
+If you don't have your own encoding logic already, you can register your
+embedding files to use them easily with simba, as long as they're in the
+standard text format for word vectors (as described [here](https://fasttext.cc/docs/en/english-vectors.html)).
+For example, if you want to use fastText vectors that you've saved to `/path/to/fasttext`,
+you can just run
+```bash
+$ simba embs register --name fasttext --path /path/to/fasttext
+```
+and simba will recognise them under the name `fasttext`.
+
+You can do something similar for word-frequency files (like [these](https://github.com/PrincetonML/SIF/blob/master/auxiliary_data/enwiki_vocab_min200.txt)):
+```bash
+$ simba freqs register --name wiki --path /path/to/wiki/counts
+```
+
+## Usage
+```python
+from simba.similarities import dynamax_jaccard
+from simba.core import embed
+
+sentences = ('The king has returned', 'Change is good')
+
+# Assuming you've registered fasttext embeddings as described above
+x, y = embed([s.split() for s in sentences], embedding='fasttext')
+sim = dynamax_jaccard(x, y)
+```
+There are more examples in the `examples` directory, including a comparison of
+different similarity metrics on a dataset of sentence pairs.
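+
+For a quick comparison on a single pair, a minimal sketch (assuming the `fasttext` embeddings registered above; the measures are picked from the table below) might look like this:
+```python
+# Minimal sketch: compare a few measures from simba.similarities on one pair.
+# Assumes embeddings were registered under the name 'fasttext' as shown above.
+from simba.core import embed
+from simba.similarities import avg_cosine, max_jaccard, dynamax_jaccard
+
+sentences = ('The king has returned', 'Change is good')
+x, y = embed([s.split() for s in sentences], embedding='fasttext')
+
+for name, measure in [('avg_cosine', avg_cosine),
+                      ('max_jaccard', max_jaccard),
+                      ('dynamax_jaccard', dynamax_jaccard)]:
+    print(name, measure(x, y))
+```
+Since these measures all live in `simba.similarities` and take the same embedded pair, swapping one for another should only require changing the import.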
+
+## Similarity Measures
+
+This library contains implementations of the following methods in `simba.similarities`.
+Please consider citing the corresponding papers in your work if you find them useful.
+
+| Method | Description | Paper |
+| - | - | - |
+| `avg_cosine` | Average vector compared with cosine similarity | - |
+| `batch_avg_pca` | Average vector with principal component removal | [1] |
+| `fbow_jaccard_factory` | Factory method for general fuzzy bag-of-words given a universe matrix | [2] |
+| `max_jaccard` | Max-pooled vectors compared with Jaccard coefficient | [2] |
+| `dynamax_{jaccard, otsuka, dice}` | DynaMax using Jaccard, Otsuka-Ochiai, and Dice coefficients | [2] |
+| `gaussian_correction_{tic, aic}` | Takeuchi and Akaike Information Criteria (TIC and AIC) for Gaussian likelihood | [3] |
+| `spherical_gaussian_correction_{tic, aic}` | TIC and AIC for spherical Gaussian likelihood | [3] |
+| `von_mises_correction_{tic, aic}` | TIC and AIC for von Mises-Fisher likelihood | [3] |
+| `avg_{pearson, spearman, kendall}` | Average vector compared with Pearson, Spearman, and Kendall correlation | [4] |
+| `max_spearman` | Max-pooled vectors compared with Spearman correlation | [5] |
+| `cka_factory` | Factory method for general Centered Kernel Alignment (CKA) | [5] |
+| `cka_{linear, gaussian}`| CKA with linear and Gaussian kernels | [5] |
+| `dcorr` | CKA with distance kernel (distance correlation) | [5] |
+
+Papers:
+1. [Arora et al., ICLR 2017. *A Simple but Tough-to-Beat Baseline for Sentence Embeddings*](https://openreview.net/forum?id=SyK00v5xx)
+2. [Zhelezniak et al., ICLR 2019. *Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors*](https://openreview.net/forum?id=SkxXg2C5FX)
+3. [Vargas et al., ICML 2019. *Model Comparison for Semantic Grouping*](http://proceedings.mlr.press/v97/vargas19a.html)
+4. [Zhelezniak et al., NAACL-HLT 2019. *Correlation Coefficients and Semantic Textual Similarity*](https://www.aclweb.org/anthology/N19-1100/)
+5. [Zhelezniak et al., EMNLP-IJCNLP 2019. *Correlations between Word Vector Sets*](https://arxiv.org/abs/1910.02902)
+
+## Contact
+* [April Shen](https://github.com/apriltuesday)
+* [Sasho Savkov](https://github.com/savkov)
+* [Vitalii Zhelezniak](https://github.com/ironvital)
+
+%package help
+Summary: Development documents and examples for simba
+Provides: python3-simba-doc
+%description help
+# simba :lion:
+
+Similarity measures from Babylon Health.
+
+## Installation
+
+```bash
+$ pip install simba
+```
+
+You can also check out this repository and install it from the root folder:
+```bash
+$ pip install .
+```
+
+Many of the similarity measures in simba rely on pre-trained embeddings.
+If you don't have your own encoding logic already, you can register your
+embedding files to use them easily with simba, as long as they're in the
+standard text format for word vectors (as described [here](https://fasttext.cc/docs/en/english-vectors.html)).
+For example, if you want to use fastText vectors that you've saved to `/path/to/fasttext`,
+you can just run
+```bash
+$ simba embs register --name fasttext --path /path/to/fasttext
+```
+and simba will recognise them under the name `fasttext`.
+
+You can do something similar for word-frequency files (like [these](https://github.com/PrincetonML/SIF/blob/master/auxiliary_data/enwiki_vocab_min200.txt)):
+```bash
+$ simba freqs register --name wiki --path /path/to/wiki/counts
+```
+
+## Usage
+```python
+from simba.similarities import dynamax_jaccard
+from simba.core import embed
+
+sentences = ('The king has returned', 'Change is good')
+
+# Assuming you've registered fasttext embeddings as described above
+x, y = embed([s.split() for s in sentences], embedding='fasttext')
+sim = dynamax_jaccard(x, y)
+```
+There are more examples in the `examples` directory, including a comparison of
+different similarity metrics on a dataset of sentence pairs.
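+
+For a quick comparison on a single pair, a minimal sketch (assuming the `fasttext` embeddings registered above; the measures are picked from the table below) might look like this:
+```python
+# Minimal sketch: compare a few measures from simba.similarities on one pair.
+# Assumes embeddings were registered under the name 'fasttext' as shown above.
+from simba.core import embed
+from simba.similarities import avg_cosine, max_jaccard, dynamax_jaccard
+
+sentences = ('The king has returned', 'Change is good')
+x, y = embed([s.split() for s in sentences], embedding='fasttext')
+
+for name, measure in [('avg_cosine', avg_cosine),
+                      ('max_jaccard', max_jaccard),
+                      ('dynamax_jaccard', dynamax_jaccard)]:
+    print(name, measure(x, y))
+```
+Since these measures all live in `simba.similarities` and take the same embedded pair, swapping one for another should only require changing the import.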
+
+## Similarity Measures
+
+This library contains implementations of the following methods in `simba.similarities`.
+Please consider citing the corresponding papers in your work if you find them useful.
+
+| Method | Description | Paper |
+| - | - | - |
+| `avg_cosine` | Average vector compared with cosine similarity | - |
+| `batch_avg_pca` | Average vector with principal component removal | [1] |
+| `fbow_jaccard_factory` | Factory method for general fuzzy bag-of-words given a universe matrix | [2] |
+| `max_jaccard` | Max-pooled vectors compared with Jaccard coefficient | [2] |
+| `dynamax_{jaccard, otsuka, dice}` | DynaMax using Jaccard, Otsuka-Ochiai, and Dice coefficients | [2] |
+| `gaussian_correction_{tic, aic}` | Takeuchi and Akaike Information Criteria (TIC and AIC) for Gaussian likelihood | [3] |
+| `spherical_gaussian_correction_{tic, aic}` | TIC and AIC for spherical Gaussian likelihood | [3] |
+| `von_mises_correction_{tic, aic}` | TIC and AIC for von Mises-Fisher likelihood | [3] |
+| `avg_{pearson, spearman, kendall}` | Average vector compared with Pearson, Spearman, and Kendall correlation | [4] |
+| `max_spearman` | Max-pooled vectors compared with Spearman correlation | [5] |
+| `cka_factory` | Factory method for general Centered Kernel Alignment (CKA) | [5] |
+| `cka_{linear, gaussian}`| CKA with linear and Gaussian kernels | [5] |
+| `dcorr` | CKA with distance kernel (distance correlation) | [5] |
+
+Papers:
+1. [Arora et al., ICLR 2017. *A Simple but Tough-to-Beat Baseline for Sentence Embeddings*](https://openreview.net/forum?id=SyK00v5xx)
+2. [Zhelezniak et al., ICLR 2019. *Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors*](https://openreview.net/forum?id=SkxXg2C5FX)
+3. [Vargas et al., ICML 2019. *Model Comparison for Semantic Grouping*](http://proceedings.mlr.press/v97/vargas19a.html)
+4. [Zhelezniak et al., NAACL-HLT 2019. *Correlation Coefficients and Semantic Textual Similarity*](https://www.aclweb.org/anthology/N19-1100/)
+5. [Zhelezniak et al., EMNLP-IJCNLP 2019. *Correlations between Word Vector Sets*](https://arxiv.org/abs/1910.02902)
+
+## Contact
+* [April Shen](https://github.com/apriltuesday)
+* [Sasho Savkov](https://github.com/savkov)
+* [Vitalii Zhelezniak](https://github.com/ironvital)
+
+%prep
+%autosetup -n simba-0.1.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-simba -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 17 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.1-1
+- Package Spec generated