-rw-r--r-- .gitignore 1
-rw-r--r-- python-rank-bm25.spec 290
-rw-r--r-- sources 1
3 files changed, 292 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..af92794 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/rank_bm25-0.2.2.tar.gz
diff --git a/python-rank-bm25.spec b/python-rank-bm25.spec
new file mode 100644
index 0000000..ff6ac5c
--- /dev/null
+++ b/python-rank-bm25.spec
@@ -0,0 +1,290 @@
+%global _empty_manifest_terminate_build 0
+Name: python-rank-bm25
+Version: 0.2.2
+Release: 1
+Summary: Various BM25 algorithms for document ranking
+License: Apache-2.0
+URL: https://github.com/dorianbrown/rank_bm25
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/fc/0a/f9579384aa017d8b4c15613f86954b92a95a93d641cc849182467cf0bb3b/rank_bm25-0.2.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-pytest
+
+%description
+
+# Rank-BM25: A two line search engine
+
+![Build Status](https://github.com/dorianbrown/rank_bm25/workflows/pytest/badge.svg)
+[![PyPI version](https://badge.fury.io/py/rank-bm25.svg)](https://badge.fury.io/py/rank-bm25)
+[![DOI](https://zenodo.org/badge/166720547.svg)](https://zenodo.org/badge/latestdoi/166720547)
+
+A collection of algorithms for querying a set of documents and returning the ones most relevant to the query. The most common use case for these algorithms is, as you might have guessed, to create search engines.
+
+So far the algorithms that have been implemented are:
+- [x] Okapi BM25
+- [x] BM25L
+- [x] BM25+
+- [ ] BM25-Adpt
+- [ ] BM25T
+
+These algorithms were taken from [this paper](http://www.cs.otago.ac.nz/homepages/andrew/papers/2014-2.pdf), which gives a nice overview of each method, and also benchmarks them against each other. A nice inclusion is that they compare different kinds of preprocessing like stemming vs no-stemming, stopword removal or not, etc. Great read if you're new to the topic.
+
+## Installation
+The easiest way to install this package is through `pip`, using
+```bash
+pip install rank_bm25
+```
+If you want to be sure you're getting the newest version, you can install it directly from GitHub with
+```bash
+pip install git+ssh://git@github.com/dorianbrown/rank_bm25.git
+```
+
+## Usage
+For this example we'll be using the `BM25Okapi` algorithm, but the others are used in pretty much the same way.
+
+### Initializing
+
+First thing to do is create an instance of the BM25 class, which reads in a corpus of text and does some indexing on it:
+```python
+from rank_bm25 import BM25Okapi
+
+corpus = [
+ "Hello there good man!",
+ "It is quite windy in London",
+ "How is the weather today?"
+]
+
+tokenized_corpus = [doc.split(" ") for doc in corpus]
+
+bm25 = BM25Okapi(tokenized_corpus)
+# <rank_bm25.BM25Okapi at 0x1047881d0>
+```
+Note that this package doesn't do any text preprocessing. If you want to do things like lowercasing, stopword removal, stemming, etc., you need to do it yourself.
+
+The only requirement is that the class receives a list of lists of strings, which are the document tokens.
+
+### Ranking of documents
+
+Now that we've created our document index, we can give it queries and see which documents are the most relevant:
+```python
+query = "windy London"
+tokenized_query = query.split(" ")
+
+doc_scores = bm25.get_scores(tokenized_query)
+# array([0. , 0.93729472, 0. ])
+```
+Note that we also need to tokenize the query and apply the same preprocessing steps we applied to the documents, so that the comparison is apples-to-apples.
+
+Instead of getting the document scores, you can also just retrieve the best documents with
+```python
+bm25.get_top_n(tokenized_query, corpus, n=1)
+# ['It is quite windy in London']
+```
+And that's pretty much it!
+
+
+
+
+%package -n python3-rank-bm25
+Summary: Various BM25 algorithms for document ranking
+Provides: python-rank-bm25
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-rank-bm25
+
+# Rank-BM25: A two line search engine
+
+![Build Status](https://github.com/dorianbrown/rank_bm25/workflows/pytest/badge.svg)
+[![PyPI version](https://badge.fury.io/py/rank-bm25.svg)](https://badge.fury.io/py/rank-bm25)
+[![DOI](https://zenodo.org/badge/166720547.svg)](https://zenodo.org/badge/latestdoi/166720547)
+
+A collection of algorithms for querying a set of documents and returning the ones most relevant to the query. The most common use case for these algorithms is, as you might have guessed, to create search engines.
+
+So far the algorithms that have been implemented are:
+- [x] Okapi BM25
+- [x] BM25L
+- [x] BM25+
+- [ ] BM25-Adpt
+- [ ] BM25T
+
+These algorithms were taken from [this paper](http://www.cs.otago.ac.nz/homepages/andrew/papers/2014-2.pdf), which gives a nice overview of each method, and also benchmarks them against each other. A nice inclusion is that they compare different kinds of preprocessing like stemming vs no-stemming, stopword removal or not, etc. Great read if you're new to the topic.
+
+## Installation
+The easiest way to install this package is through `pip`, using
+```bash
+pip install rank_bm25
+```
+If you want to be sure you're getting the newest version, you can install it directly from GitHub with
+```bash
+pip install git+ssh://git@github.com/dorianbrown/rank_bm25.git
+```
+
+## Usage
+For this example we'll be using the `BM25Okapi` algorithm, but the others are used in pretty much the same way.
+
+### Initializing
+
+First thing to do is create an instance of the BM25 class, which reads in a corpus of text and does some indexing on it:
+```python
+from rank_bm25 import BM25Okapi
+
+corpus = [
+ "Hello there good man!",
+ "It is quite windy in London",
+ "How is the weather today?"
+]
+
+tokenized_corpus = [doc.split(" ") for doc in corpus]
+
+bm25 = BM25Okapi(tokenized_corpus)
+# <rank_bm25.BM25Okapi at 0x1047881d0>
+```
+Note that this package doesn't do any text preprocessing. If you want to do things like lowercasing, stopword removal, stemming, etc., you need to do it yourself.
+
+The only requirement is that the class receives a list of lists of strings, which are the document tokens.
+
+### Ranking of documents
+
+Now that we've created our document index, we can give it queries and see which documents are the most relevant:
+```python
+query = "windy London"
+tokenized_query = query.split(" ")
+
+doc_scores = bm25.get_scores(tokenized_query)
+# array([0. , 0.93729472, 0. ])
+```
+Note that we also need to tokenize the query and apply the same preprocessing steps we applied to the documents, so that the comparison is apples-to-apples.
+
+Instead of getting the document scores, you can also just retrieve the best documents with
+```python
+bm25.get_top_n(tokenized_query, corpus, n=1)
+# ['It is quite windy in London']
+```
+And that's pretty much it!
+
+
+
+
+%package help
+Summary: Development documents and examples for rank-bm25
+Provides: python3-rank-bm25-doc
+%description help
+
+# Rank-BM25: A two line search engine
+
+![Build Status](https://github.com/dorianbrown/rank_bm25/workflows/pytest/badge.svg)
+[![PyPI version](https://badge.fury.io/py/rank-bm25.svg)](https://badge.fury.io/py/rank-bm25)
+[![DOI](https://zenodo.org/badge/166720547.svg)](https://zenodo.org/badge/latestdoi/166720547)
+
+A collection of algorithms for querying a set of documents and returning the ones most relevant to the query. The most common use case for these algorithms is, as you might have guessed, to create search engines.
+
+So far the algorithms that have been implemented are:
+- [x] Okapi BM25
+- [x] BM25L
+- [x] BM25+
+- [ ] BM25-Adpt
+- [ ] BM25T
+
+These algorithms were taken from [this paper](http://www.cs.otago.ac.nz/homepages/andrew/papers/2014-2.pdf), which gives a nice overview of each method, and also benchmarks them against each other. A nice inclusion is that they compare different kinds of preprocessing like stemming vs no-stemming, stopword removal or not, etc. Great read if you're new to the topic.
+
+## Installation
+The easiest way to install this package is through `pip`, using
+```bash
+pip install rank_bm25
+```
+If you want to be sure you're getting the newest version, you can install it directly from GitHub with
+```bash
+pip install git+ssh://git@github.com/dorianbrown/rank_bm25.git
+```
+
+## Usage
+For this example we'll be using the `BM25Okapi` algorithm, but the others are used in pretty much the same way.
+
+### Initializing
+
+First thing to do is create an instance of the BM25 class, which reads in a corpus of text and does some indexing on it:
+```python
+from rank_bm25 import BM25Okapi
+
+corpus = [
+ "Hello there good man!",
+ "It is quite windy in London",
+ "How is the weather today?"
+]
+
+tokenized_corpus = [doc.split(" ") for doc in corpus]
+
+bm25 = BM25Okapi(tokenized_corpus)
+# <rank_bm25.BM25Okapi at 0x1047881d0>
+```
+Note that this package doesn't do any text preprocessing. If you want to do things like lowercasing, stopword removal, stemming, etc., you need to do it yourself.
+
+The only requirement is that the class receives a list of lists of strings, which are the document tokens.
+
+### Ranking of documents
+
+Now that we've created our document index, we can give it queries and see which documents are the most relevant:
+```python
+query = "windy London"
+tokenized_query = query.split(" ")
+
+doc_scores = bm25.get_scores(tokenized_query)
+# array([0. , 0.93729472, 0. ])
+```
+Note that we also need to tokenize the query and apply the same preprocessing steps we applied to the documents, so that the comparison is apples-to-apples.
+
+Instead of getting the document scores, you can also just retrieve the best documents with
+```python
+bm25.get_top_n(tokenized_query, corpus, n=1)
+# ['It is quite windy in London']
+```
+And that's pretty much it!
+
+
+
+
+%prep
+%autosetup -n rank_bm25-0.2.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-rank-bm25 -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.2.2-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..e17adb1
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+48562f27ad8795c3097bff5fec1721eb rank_bm25-0.2.2.tar.gz
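
The `sources` entry above pairs a 32-character hex digest with the tarball name. That length matches an MD5 digest, so the sketch below assumes MD5 and a `<digest> <filename>` layout. Neither is stated in the commit. The helper names `parse_sources_line` and `md5_matches` are hypothetical and not part of any packaging tool. This is one possible way to check a downloaded tarball against the recorded digest before building:

```python
import hashlib
from pathlib import Path


def parse_sources_line(line: str) -> tuple[str, str]:
    """Split a '<hex digest> <filename>' line into (digest, filename).

    Assumption: the two fields are separated by whitespace, as in the
    single-line `sources` file shown in the diff.
    """
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_matches(path: Path, expected_hex: str) -> bool:
    """Return True if the MD5 digest of the file at `path` equals `expected_hex`.

    Assumption: the digest in `sources` is MD5 (inferred from its length only).
    """
    hasher = hashlib.md5()
    with path.open("rb") as fh:
        # Read in chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_hex.lower()
```

A caller would read the line from `sources`, pass it to `parse_sources_line`, and then call `md5_matches(Path(filename), digest)` on the downloaded file. A `False` result means the tarball differs from the one the spec was generated against, or that the digest is not MD5 after all.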