Diffstat (limited to 'python-pyserini.spec')
-rw-r--r-- python-pyserini.spec 347
1 files changed, 347 insertions, 0 deletions
diff --git a/python-pyserini.spec b/python-pyserini.spec
new file mode 100644
index 0000000..8ecbf6c
--- /dev/null
+++ b/python-pyserini.spec
@@ -0,0 +1,347 @@
+%global _empty_manifest_terminate_build 0
+Name: python-pyserini
+Version: 0.21.0
+Release: 1
+Summary: A Python toolkit for reproducible information retrieval research with sparse and dense representations
+License: Apache Software License
+URL: https://github.com/castorini/pyserini
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/20/d2/d62af52f076f6b94f03dc082a35314b54deaeae28edfd799f8a0b692aade/pyserini-0.21.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-Cython
+Requires: python3-numpy
+Requires: python3-pandas
+Requires: python3-pyjnius
+Requires: python3-scikit-learn
+Requires: python3-scipy
+Requires: python3-tqdm
+Requires: python3-transformers
+Requires: python3-sentencepiece
+Requires: python3-nmslib
+Requires: python3-onnxruntime
+Requires: python3-lightgbm
+Requires: python3-spacy
+Requires: python3-pyyaml
+
+%description
+Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
+Retrieval using sparse representations is provided via integration with our group's [Anserini](http://anserini.io/) IR toolkit, which is built on Lucene.
+Retrieval using dense representations is provided via integration with Facebook's [Faiss](https://github.com/facebookresearch/faiss) library.
+
+Pyserini is primarily designed to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture.
+Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections.
+
+## Installation
+
+Install via PyPI:
+
+```
+pip install pyserini
+```
+
+Pyserini requires Python 3.8+ and Java 11 (due to its dependency on [Anserini](http://anserini.io/)).
+
+Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature.
+A `pip` installation will automatically pull in the [🤗 Transformers library](https://github.com/huggingface/transformers) to satisfy the package requirements.
+Pyserini also depends on [PyTorch](https://pytorch.org/) and [Faiss](https://github.com/facebookresearch/faiss), but since these packages may require platform-specific custom configuration, they are _not_ explicitly listed in the package requirements.
+We leave the installation of these packages to you.
+Refer to documentation in [our repo](https://github.com/castorini/pyserini/) for additional details.
+
+## Usage
+
+The `LuceneSearcher` class provides the entry point for sparse retrieval using bag-of-words representations.
+Pyserini provides a number of pre-built indexes for common collections, which it automatically downloads and stores in `~/.cache/pyserini/indexes/`.
+Here's how to use a pre-built index for the [MS MARCO passage ranking task](http://www.msmarco.org/) and issue a query interactively (using BM25 ranking):
+
+```python
+from pyserini.search.lucene import LuceneSearcher
+
+searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')
+hits = searcher.search('what is a lobster roll?')
+
+for i in range(0, 10):
+    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')
+```
+
+The results should be as follows:
+
+```
+ 1 7157707 11.00830
+ 2 6034357 10.94310
+ 3 5837606 10.81740
+ 4 7157715 10.59820
+ 5 6034350 10.48360
+ 6 2900045 10.31190
+ 7 7157713 10.12300
+ 8 1584344 10.05290
+ 9 533614 9.96350
+10 6234461 9.92200
+```
+
+The `FaissSearcher` class provides the entry point for dense retrieval, and its usage is quite similar to `LuceneSearcher`.
+The only additional thing we need to specify for dense retrieval is the query encoder.
+
+```python
+from pyserini.search.faiss import FaissSearcher, TctColBertQueryEncoder
+
+encoder = TctColBertQueryEncoder('castorini/tct_colbert-msmarco')
+searcher = FaissSearcher.from_prebuilt_index(
+    'msmarco-passage-tct_colbert-hnsw',
+    encoder
+)
+hits = searcher.search('what is a lobster roll')
+
+for i in range(0, 10):
+    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')
+```
+
+The results should be as follows:
+
+```
+ 1 7157710 70.53742
+ 2 7157715 70.50040
+ 3 7157707 70.13804
+ 4 6034350 69.93666
+ 5 6321969 69.62683
+ 6 4112862 69.34587
+ 7 5515474 69.21354
+ 8 7157708 69.08416
+ 9 6321974 69.06841
+10 2920399 69.01737
+```
+
+For complete documentation, please refer to [our repo](https://github.com/castorini/pyserini/).
+
+
+%package -n python3-pyserini
+Summary: A Python toolkit for reproducible information retrieval research with sparse and dense representations
+Provides: python-pyserini
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-pyserini
+Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
+Retrieval using sparse representations is provided via integration with our group's [Anserini](http://anserini.io/) IR toolkit, which is built on Lucene.
+Retrieval using dense representations is provided via integration with Facebook's [Faiss](https://github.com/facebookresearch/faiss) library.
+
+Pyserini is primarily designed to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture.
+Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections.
+
+## Installation
+
+Install via PyPI:
+
+```
+pip install pyserini
+```
+
+Pyserini requires Python 3.8+ and Java 11 (due to its dependency on [Anserini](http://anserini.io/)).
+
+Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature.
+A `pip` installation will automatically pull in the [🤗 Transformers library](https://github.com/huggingface/transformers) to satisfy the package requirements.
+Pyserini also depends on [PyTorch](https://pytorch.org/) and [Faiss](https://github.com/facebookresearch/faiss), but since these packages may require platform-specific custom configuration, they are _not_ explicitly listed in the package requirements.
+We leave the installation of these packages to you.
+Refer to documentation in [our repo](https://github.com/castorini/pyserini/) for additional details.
+
+## Usage
+
+The `LuceneSearcher` class provides the entry point for sparse retrieval using bag-of-words representations.
+Pyserini provides a number of pre-built indexes for common collections, which it automatically downloads and stores in `~/.cache/pyserini/indexes/`.
+Here's how to use a pre-built index for the [MS MARCO passage ranking task](http://www.msmarco.org/) and issue a query interactively (using BM25 ranking):
+
+```python
+from pyserini.search.lucene import LuceneSearcher
+
+searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')
+hits = searcher.search('what is a lobster roll?')
+
+for i in range(0, 10):
+    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')
+```
+
+The results should be as follows:
+
+```
+ 1 7157707 11.00830
+ 2 6034357 10.94310
+ 3 5837606 10.81740
+ 4 7157715 10.59820
+ 5 6034350 10.48360
+ 6 2900045 10.31190
+ 7 7157713 10.12300
+ 8 1584344 10.05290
+ 9 533614 9.96350
+10 6234461 9.92200
+```
+
+The `FaissSearcher` class provides the entry point for dense retrieval, and its usage is quite similar to `LuceneSearcher`.
+The only additional thing we need to specify for dense retrieval is the query encoder.
+
+```python
+from pyserini.search.faiss import FaissSearcher, TctColBertQueryEncoder
+
+encoder = TctColBertQueryEncoder('castorini/tct_colbert-msmarco')
+searcher = FaissSearcher.from_prebuilt_index(
+    'msmarco-passage-tct_colbert-hnsw',
+    encoder
+)
+hits = searcher.search('what is a lobster roll')
+
+for i in range(0, 10):
+    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')
+```
+
+The results should be as follows:
+
+```
+ 1 7157710 70.53742
+ 2 7157715 70.50040
+ 3 7157707 70.13804
+ 4 6034350 69.93666
+ 5 6321969 69.62683
+ 6 4112862 69.34587
+ 7 5515474 69.21354
+ 8 7157708 69.08416
+ 9 6321974 69.06841
+10 2920399 69.01737
+```
+
+For complete documentation, please refer to [our repo](https://github.com/castorini/pyserini/).
+
+
+%package help
+Summary: Development documents and examples for pyserini
+Provides: python3-pyserini-doc
+%description help
+Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
+Retrieval using sparse representations is provided via integration with our group's [Anserini](http://anserini.io/) IR toolkit, which is built on Lucene.
+Retrieval using dense representations is provided via integration with Facebook's [Faiss](https://github.com/facebookresearch/faiss) library.
+
+Pyserini is primarily designed to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture.
+Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections.
+
+## Installation
+
+Install via PyPI:
+
+```
+pip install pyserini
+```
+
+Pyserini requires Python 3.8+ and Java 11 (due to its dependency on [Anserini](http://anserini.io/)).
+
+Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature.
+A `pip` installation will automatically pull in the [🤗 Transformers library](https://github.com/huggingface/transformers) to satisfy the package requirements.
+Pyserini also depends on [PyTorch](https://pytorch.org/) and [Faiss](https://github.com/facebookresearch/faiss), but since these packages may require platform-specific custom configuration, they are _not_ explicitly listed in the package requirements.
+We leave the installation of these packages to you.
+Refer to documentation in [our repo](https://github.com/castorini/pyserini/) for additional details.
+
+## Usage
+
+The `LuceneSearcher` class provides the entry point for sparse retrieval using bag-of-words representations.
+Pyserini provides a number of pre-built indexes for common collections, which it automatically downloads and stores in `~/.cache/pyserini/indexes/`.
+Here's how to use a pre-built index for the [MS MARCO passage ranking task](http://www.msmarco.org/) and issue a query interactively (using BM25 ranking):
+
+```python
+from pyserini.search.lucene import LuceneSearcher
+
+searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')
+hits = searcher.search('what is a lobster roll?')
+
+for i in range(0, 10):
+    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')
+```
+
+The results should be as follows:
+
+```
+ 1 7157707 11.00830
+ 2 6034357 10.94310
+ 3 5837606 10.81740
+ 4 7157715 10.59820
+ 5 6034350 10.48360
+ 6 2900045 10.31190
+ 7 7157713 10.12300
+ 8 1584344 10.05290
+ 9 533614 9.96350
+10 6234461 9.92200
+```
+
+The `FaissSearcher` class provides the entry point for dense retrieval, and its usage is quite similar to `LuceneSearcher`.
+The only additional thing we need to specify for dense retrieval is the query encoder.
+
+```python
+from pyserini.search.faiss import FaissSearcher, TctColBertQueryEncoder
+
+encoder = TctColBertQueryEncoder('castorini/tct_colbert-msmarco')
+searcher = FaissSearcher.from_prebuilt_index(
+    'msmarco-passage-tct_colbert-hnsw',
+    encoder
+)
+hits = searcher.search('what is a lobster roll')
+
+for i in range(0, 10):
+    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')
+```
+
+The results should be as follows:
+
+```
+ 1 7157710 70.53742
+ 2 7157715 70.50040
+ 3 7157707 70.13804
+ 4 6034350 69.93666
+ 5 6321969 69.62683
+ 6 4112862 69.34587
+ 7 5515474 69.21354
+ 8 7157708 69.08416
+ 9 6321974 69.06841
+10 2920399 69.01737
+```
+
+For complete documentation, please refer to [our repo](https://github.com/castorini/pyserini/).
+
+
+%prep
+%autosetup -n pyserini-0.21.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pyserini -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 10 2023 Python_Bot <Python_Bot@openeuler.org> - 0.21.0-1
+- Package Spec generated
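
Note on the `%install` scriptlet: the `find ... -printf "/%h/%f\n"` calls build the `filelist.lst` and `doclist.lst` manifests that the `%files` sections consume via `-f`. As a rough illustration of what ends up in those lists, here is a minimal Python sketch of the same walk. It is not part of the spec; the names `PAYLOAD_DIRS`, `list_files`, `build_filelist`, and `build_doclist` are hypothetical, and the output is sorted for readability, which `find` does not do.

```python
import os
import tempfile

# Directories scanned by the %install scriptlet, relative to the buildroot.
PAYLOAD_DIRS = ("usr/lib", "usr/lib64", "usr/bin", "usr/sbin")


def list_files(buildroot, subdir, suffix=""):
    """Return '/'-rooted paths of regular files under buildroot/subdir."""
    base = os.path.join(buildroot, subdir)
    if not os.path.isdir(base):  # mirrors: if [ -d usr/lib ]; then
        return []
    found = []
    for dirpath, _dirs, names in os.walk(base):
        for name in names:
            full = os.path.join(dirpath, name)
            # find -type f matches regular files only, so skip symlinks.
            if os.path.isfile(full) and not os.path.islink(full):
                rel = os.path.relpath(full, buildroot).replace(os.sep, "/")
                found.append("/" + rel + suffix)
    return sorted(found)  # sorted for stable output; find itself does not sort


def build_filelist(buildroot):
    """Collect the entries the scriptlet appends to filelist.lst."""
    entries = []
    for subdir in PAYLOAD_DIRS:
        entries.extend(list_files(buildroot, subdir))
    return entries


def build_doclist(buildroot):
    """Collect the entries the scriptlet appends to doclist.lst."""
    # The scriptlet appends ".gz", presumably because man pages are
    # compressed after %install finishes.
    return list_files(buildroot, "usr/share/man", suffix=".gz")


if __name__ == "__main__":
    # Build a throwaway buildroot with one module and one man page.
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "usr/lib/python3/site-packages/pyserini")
        man = os.path.join(root, "usr/share/man/man1")
        os.makedirs(pkg)
        os.makedirs(man)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        open(os.path.join(man, "example.1"), "w").close()
        print("\n".join(build_filelist(root)))
        print("\n".join(build_doclist(root)))
```

Because every regular file under those directories is listed explicitly, the `%files` sections do not need to enumerate the package contents by hand; the trade-off is that anything an upstream release installs there is packaged without review.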