automatic import of python-fuzzy-search [openeuler20.03]

author: CoprDistGit <infra@openeuler.org> 2023-06-20 09:55:44 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-06-20 09:55:44 +0000
commit: c512a2cb191f1b732d54e09554f4e7cae7b50542
tree: bf8cc4f780cf9f463773fbfe8fae0a067e6202dc
parent: 8a280396539eb5036d8dc208fdcf253cf5fc0318
3 files changed, 584 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..c60cb20 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/fuzzy_search-1.6.0.tar.gz
diff --git a/python-fuzzy-search.spec b/python-fuzzy-search.spec
new file mode 100644
index 0000000..da4f1ac
--- /dev/null
+++ b/python-fuzzy-search.spec
@@ -0,0 +1,582 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-fuzzy-search
+Version:	1.6.0
+Release:	1
+Summary:	Tool for fuzzy searching in texts with historical language use and OCR/HTR errors
+License:	MIT
+URL:		https://github.com/marijnkoolen/fuzzy-search
+Source0:	https://mirrors.aliyun.com/pypi/web/packages/c6/e5/58b98ae002bc5561bc7dff5cc8fb813cb2030193744f52601136f6c8104b/fuzzy_search-1.6.0.tar.gz
+BuildArch:	noarch
+
+
+%description
+# fuzzy-search
+Fuzzy search module for searching lists of words in low quality OCR and HTR text.
+
+Project page on PyPI: [https://pypi.org/project/fuzzy-search/](https://pypi.org/project/fuzzy-search/)
+
+## Installing
+
+```commandline
+pip install -U fuzzy-search
+```
+
+## Usage
+
+```python
+from fuzzy_search.fuzzy_phrase_searcher import FuzzyPhraseSearcher
+from fuzzy_search.fuzzy_phrase_model import PhraseModel
+
+# higher matching thresholds for higher-quality OCR/HTR (higher precision; recall should be good anyway)
+# lower matching thresholds for lower-quality OCR/HTR (higher recall, since missed matches are the main problem there)
+config = {
+    "char_match_threshold": 0.8,
+    "ngram_threshold": 0.6,
+    "levenshtein_threshold": 0.8,
+    "ignorecase": False,
+    "ngram_size": 3,
+    "skip_size": 0,
+}
+
+# initialize a new searcher instance with the config
+fuzzy_searcher = FuzzyPhraseSearcher(config)
+
+# create a list of domain keywords and phrases
+domain_phrases = [
+    # terms for the chair and attendants of a meeting
+    "PRAESIDE",
+    "PRAESENTIBUS",
+    # some weekdays in Latin
+    "Veneris", 
+    "Mercurii",
+    # some date phrase where any date in January 1725 should match
+    "den .. Januarii 1725"
+]
+
+# create a PhraseModel object from the domain phrases
+phrase_model = PhraseModel(phrases=domain_phrases)
+
+# register the phrase model with the searcher
+fuzzy_searcher.index_phrase_model(phrase_model)
+
+# take some example texts: meetings of the Dutch States General in January 1725
+text1 = "ie Veucris den 5. Januaris 1725. PR&ASIDE, Den Heere Bentinck. PRASENTIEBUS, De Heeren Jan Welderen , van Dam, Torck , met een extraordinaris Gedeputeerde uyt de Provincie van Gelderlandt. Van Maasdam , vanden Boeizelaar , Raadtpenfionaris van Hoornbeeck , met een extraordinaris Gedeputeerde uyt de Provincie van Hollandt ende Welt-Vrieslandt. Velters, Ockere , Noey; van Hoorn , met een extraordinaris Gedeputeerde uyt de Provincie van Zeelandt. Van Renswoude , van Voor{t. Van Schwartzenbergh, vander Waayen, Vegilin Van I{elmuden. Van Iddekinge ‚ van Tamminga."
+
+text2 = "Mercuri: den 10. Jangarii , | 1725. ia PRESIDE, Den Heere an Iddekinge. PRA&SENTIBUS, De Heeren /an Welderen , van Dam, van Wynbergen, Torck, met een extraordinaris Gedeputeerde uyt de Provincie van Gelderland. Van Maasdam , Raadtpenfionaris van Hoorn=beeck. Velters, Ockerfe, Noey. Taats van Amerongen, van Renswoude. Vander Waasen , Vegilin, ’ Bentinck, van I(elmaden. Van Tamminga."
+
+```
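The `ngram_size` and `skip_size` settings in the config above control how texts are broken into short character fragments. As a rough illustration only, assuming `skip_size` is the maximum number of characters a fragment may skip over (so `skip_size: 0` gives plain character n-grams), the hypothetical helper `char_skipgrams` below generates such skipgrams in plain Python. It is a sketch, not the library's own code.

```python
from itertools import combinations
from typing import List


def char_skipgrams(text: str, ngram_size: int, skip_size: int) -> List[str]:
    """Hypothetical helper: character skipgrams of length ngram_size that
    may skip over up to skip_size characters (an assumed reading of the config)."""
    grams = []
    for start in range(len(text)):
        # the window may reach skip_size characters beyond a plain n-gram
        window = text[start:start + ngram_size + skip_size]
        # each skipgram keeps the window's first character and picks the
        # remaining ngram_size - 1 characters from the rest, in order
        for rest in combinations(window[1:], ngram_size - 1):
            grams.append(window[0] + "".join(rest))
    return grams


# skip_size 0 reduces to ordinary character trigrams
print(char_skipgrams("Veneris", 3, 0))  # ['Ven', 'ene', 'ner', 'eri', 'ris']
# allowing one skipped character adds gapped variants
print(char_skipgrams("abcd", 3, 1))  # ['abc', 'abd', 'acd', 'bcd']
```

Gapped fragments let a phrase still share fragments with an OCR string in which a character was inserted or dropped.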
+
+The `find_matches` method returns match objects:
+
+```python
+# look for matches in the first example text
+for match in fuzzy_searcher.find_matches(text1):
+    print(match)
+```
+
+Printing the matches directly yields the following output:
+```python
+Match(phrase: "Veneris", variant: "Veneris",string: "Veucris", offset: 3)
+Match(phrase: "den .. Januarii 1725", variant: "den .. Januarii 1725",string: "den 5. Januaris 1725.", offset: 11)
+Match(phrase: "PRAESIDE", variant: "PRAESIDE",string: "PR&ASIDE,", offset: 33)
+Match(phrase: "PRAESENTIBUS", variant: "PRAESENTIBUS",string: "PRASENTIEBUS,", offset: 63)
+```
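The `offset` in each match appears to be the 0-based character position at which the matched string starts in the input text. A quick plain-Python check against the opening of `text1` (copied from the example above) is consistent with that reading:

```python
# the first part of text1 from the example above
text1_start = "ie Veucris den 5. Januaris 1725. PR&ASIDE, Den Heere Bentinck. PRASENTIEBUS,"

# matched strings and offsets as printed by find_matches
printed = [
    ("Veucris", 3),
    ("den 5. Januaris 1725.", 11),
    ("PR&ASIDE,", 33),
    ("PRASENTIEBUS,", 63),
]

for string, offset in printed:
    # slicing at the reported offset recovers the matched string
    print(offset, text1_start[offset:offset + len(string)] == string)
```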
+
+Alternatively, each match object's `json()` method returns a dictionary with all information about the match, including its similarity scores:
+
+```python
+# look for matches in the first example text
+for match in fuzzy_searcher.find_matches(text1):
+    print(match.json())
+```
+
+This yields more detailed output:
+
+```js
+{'match_keyword': 'Veneris', 'match_term': 'Veneris', 'match_string': 'Veucris', 'match_offset': 3, 'char_match': 0.7142857142857143, 'ngram_match': 0.625, 'levenshtein_distance': 0.7142857142857143}
+{'match_keyword': 'den .. Januarii 1725', 'match_term': 'den .. Januarii 1725', 'match_string': 'den 5. Januaris 1725', 'match_offset': 11, 'char_match': 0.9, 'ngram_match': 0.8095238095238095, 'levenshtein_distance': 0.9}
+{'match_keyword': 'PRAESIDE', 'match_term': 'PRAESIDE', 'match_string': 'PR&ASIDE', 'match_offset': 33, 'char_match': 0.875, 'ngram_match': 0.6666666666666666, 'levenshtein_distance': 0.75}
+{'match_keyword': 'PRAESENTIBUS', 'match_term': 'PRAESENTIBUS', 'match_string': 'PRASENTIEBUS', 'match_offset': 63, 'char_match': 1.0, 'ngram_match': 0.7692307692307693, 'levenshtein_distance': 0.8333333333333334}
+```
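The `char_match` and `ngram_match` scores can be approximated in plain Python. The sketch below is a reconstruction, not the library's code. It assumes `char_match` is the share of the phrase's characters (counted with multiplicity) that also occur in the matched string, and `ngram_match` is the share of the phrase's character bigrams, padded with `#` at both ends, that also occur in it. Under those assumptions it reproduces the scores printed above.

```python
from collections import Counter


def char_match(phrase: str, string: str) -> float:
    """Share of the phrase's characters (with multiplicity) found in string."""
    overlap = Counter(phrase) & Counter(string)
    return sum(overlap.values()) / len(phrase)


def padded_bigrams(text: str) -> Counter:
    """Character bigrams of text with '#' added at the start and end."""
    padded = f"#{text}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def ngram_match(phrase: str, string: str) -> float:
    """Share of the phrase's padded bigrams that also occur in string."""
    phrase_grams = padded_bigrams(phrase)
    overlap = phrase_grams & padded_bigrams(string)
    return sum(overlap.values()) / sum(phrase_grams.values())


for phrase, string in [("Veneris", "Veucris"), ("PRAESIDE", "PR&ASIDE")]:
    print(phrase, char_match(phrase, string), ngram_match(phrase, string))
# Veneris 0.7142857142857143 0.625
# PRAESIDE 0.875 0.6666666666666666
```

Note that the bigram overlap is computed on the phrase's fragments, so a matched string with extra trailing characters does not gain anything from them.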
+
+Running the searcher on the second text:
+
+```python
+# look for matches in the second example text
+for match in fuzzy_searcher.find_candidates(text2):
+    print(match.json())
+```
+
+This yields the following output:
+
+```js
+{'phrase': 'Veneris', 'variant': 'Veneris', 'string': 'Veucris', 'offset': 3, 'match_scores': {'char_match': 0.7142857142857143, 'ngram_match': 0.625, 'levenshtein_similarity': 0.7142857142857143}}
+{'phrase': 'den .. Januarii 1725', 'variant': 'den .. Januarii 1725', 'string': 'den 5. Januaris 1725.', 'offset': 11, 'match_scores': {'char_match': 0.95, 'ngram_match': 0.7619047619047619, 'levenshtein_similarity': 0.8571428571428572}}
+{'phrase': 'PRAESIDE', 'variant': 'PRAESIDE', 'string': 'PR&ASIDE,', 'offset': 33, 'match_scores': {'char_match': 0.875, 'ngram_match': 0.5555555555555556, 'levenshtein_similarity': 0.6666666666666667}}
+{'phrase': 'PRAESENTIBUS', 'variant': 'PRAESENTIBUS', 'string': 'PRASENTIEBUS,', 'offset': 63, 'match_scores': {'char_match': 1.0, 'ngram_match': 0.6923076923076923, 'levenshtein_similarity': 0.7692307692307692}}
+```
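The Levenshtein scores in both listings (named `levenshtein_distance` in the first one, although they are similarities) are consistent with one minus the edit distance divided by the length of the longer string. The sketch below computes that with a standard dynamic-programming edit distance; it illustrates the formula and is not the library's implementation. It also shows why the scores in the second listing are lower: its matched strings keep trailing punctuation such as `,` or `.`, which costs one extra edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string (assumed normalisation)."""
    return 1 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein_similarity("PRAESIDE", "PR&ASIDE"))   # 0.75
print(levenshtein_similarity("PRAESIDE", "PR&ASIDE,"))  # 0.6666666666666667
```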
+
+## Matches as Web Annotations
+
+If texts are passed to `find_matches` as dictionaries with an identifier, the resulting matches
+include the text identifier and can generate Web Annotation representations:
+
+```python
+# create a dictionary for the second text and add an identifier
+text2_with_id = {
+    "text": text2,
+    "id": "urn:republic:3783_0076:page=151:para=4"
+}
+matches = fuzzy_searcher.find_matches(text2_with_id)
+
+import json
+
+# use json.dumps to pretty print the first match as Web Annotation
+print(json.dumps(matches[0].as_web_anno(), indent=2))
+```
+
+Output:
+
+```json
+{
+  "@context": "http://www.w3.org/ns/anno.jsonld",
+  "id": "cca6740d-e584-4322-b517-67d92e0e508a",
+  "type": "Annotation",
+  "motivation": "classifying",
+  "created": "2020-12-08T10:22:26.838154",
+  "generator": {
+    "id": "https://github.com/marijnkoolen/fuzzy-search",
+    "type": "Software",
+    "name": "FuzzySearcher"
+  },
+  "target": {
+    "source": "urn:republic:3783_0076:page=151:para=4",
+    "selector": {
+      "type": "TextPositionSelector",
+      "start": 0,
+      "end": 8
+    }
+  },
+  "body": {
+    "type": "Dataset",
+    "value": {
+      "match_phrase": "Mercurii",
+      "match_variant": "Mercurii",
+      "match_string": "Mercuri:",
+      "phrase_metadata": {
+        "phrase": "Mercurii"
+      }
+    }
+  }
+}
+```
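The annotation above follows the W3C Web Annotation data model: the `target` names the source text by its identifier and locates the match with a `TextPositionSelector`, whose `start` and `end` are character positions in that text (with `end` exclusive). As a hedged illustration, the hypothetical helper `make_web_anno` below builds a similar dictionary using only the standard library and checks that the selector slices the matched string back out of the start of `text2`. It does not call or reproduce `as_web_anno()`, and the fields the library emits may differ between versions.

```python
import json
import uuid
from datetime import datetime


def make_web_anno(source_id: str, text: str, start: int, end: int, phrase: str) -> dict:
    """Hypothetical helper: a minimal W3C Web Annotation for one match."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": str(uuid.uuid4()),
        "type": "Annotation",
        "motivation": "classifying",
        "created": datetime.now().isoformat(),
        "target": {
            "source": source_id,
            # start/end are character positions in the source; end is exclusive
            "selector": {"type": "TextPositionSelector", "start": start, "end": end},
        },
        "body": {
            "type": "Dataset",
            "value": {"match_phrase": phrase, "match_string": text[start:end]},
        },
    }


text2_start = "Mercuri: den 10. Jangarii , | 1725."
anno = make_web_anno("urn:republic:3783_0076:page=151:para=4", text2_start, 0, 8, "Mercurii")
selector = anno["target"]["selector"]
print(text2_start[selector["start"]:selector["end"]])  # Mercuri:
print(json.dumps(anno["body"], indent=2))
```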
+
+[HTML docs](html_docs/index.html)
+
+
+## Documentation To Do
+
+- adding variant phrases and distractors
+- multiple searchers and searching in the context of other matches
+
+
+
+
+%package -n python3-fuzzy-search
+Summary:	Tool for fuzzy searching in texts with historical language use and OCR/HTR errors
+Provides:	python-fuzzy-search
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-fuzzy-search
+# fuzzy-search
+Fuzzy search module for searching lists of words in low quality OCR and HTR text.
+
+Project page on PyPI: [https://pypi.org/project/fuzzy-search/](https://pypi.org/project/fuzzy-search/)
+
+## Installing
+
+```commandline
+pip install -U fuzzy-search
+```
+
+## Usage
+
+```python
+from fuzzy_search.fuzzy_phrase_searcher import FuzzyPhraseSearcher
+from fuzzy_search.fuzzy_phrase_model import PhraseModel
+
+# higher matching thresholds for higher-quality OCR/HTR (higher precision; recall should be good anyway)
+# lower matching thresholds for lower-quality OCR/HTR (higher recall, since missed matches are the main problem there)
+config = {
+    "char_match_threshold": 0.8,
+    "ngram_threshold": 0.6,
+    "levenshtein_threshold": 0.8,
+    "ignorecase": False,
+    "ngram_size": 3,
+    "skip_size": 0,
+}
+
+# initialize a new searcher instance with the config
+fuzzy_searcher = FuzzyPhraseSearcher(config)
+
+# create a list of domain keywords and phrases
+domain_phrases = [
+    # terms for the chair and attendants of a meeting
+    "PRAESIDE",
+    "PRAESENTIBUS",
+    # some weekdays in Latin
+    "Veneris", 
+    "Mercurii",
+    # some date phrase where any date in January 1725 should match
+    "den .. Januarii 1725"
+]
+
+# create a PhraseModel object from the domain phrases
+phrase_model = PhraseModel(phrases=domain_phrases)
+
+# register the phrase model with the searcher
+fuzzy_searcher.index_phrase_model(phrase_model)
+
+# take some example texts: meetings of the Dutch States General in January 1725
+text1 = "ie Veucris den 5. Januaris 1725. PR&ASIDE, Den Heere Bentinck. PRASENTIEBUS, De Heeren Jan Welderen , van Dam, Torck , met een extraordinaris Gedeputeerde uyt de Provincie van Gelderlandt. Van Maasdam , vanden Boeizelaar , Raadtpenfionaris van Hoornbeeck , met een extraordinaris Gedeputeerde uyt de Provincie van Hollandt ende Welt-Vrieslandt. Velters, Ockere , Noey; van Hoorn , met een extraordinaris Gedeputeerde uyt de Provincie van Zeelandt. Van Renswoude , van Voor{t. Van Schwartzenbergh, vander Waayen, Vegilin Van I{elmuden. Van Iddekinge ‚ van Tamminga."
+
+text2 = "Mercuri: den 10. Jangarii , | 1725. ia PRESIDE, Den Heere an Iddekinge. PRA&SENTIBUS, De Heeren /an Welderen , van Dam, van Wynbergen, Torck, met een extraordinaris Gedeputeerde uyt de Provincie van Gelderland. Van Maasdam , Raadtpenfionaris van Hoorn=beeck. Velters, Ockerfe, Noey. Taats van Amerongen, van Renswoude. Vander Waasen , Vegilin, ’ Bentinck, van I(elmaden. Van Tamminga."
+
+```
+
+The `find_matches` method returns match objects:
+
+```python
+# look for matches in the first example text
+for match in fuzzy_searcher.find_matches(text1):
+    print(match)
+```
+
+Printing the matches directly yields the following output:
+```python
+Match(phrase: "Veneris", variant: "Veneris",string: "Veucris", offset: 3)
+Match(phrase: "den .. Januarii 1725", variant: "den .. Januarii 1725",string: "den 5. Januaris 1725.", offset: 11)
+Match(phrase: "PRAESIDE", variant: "PRAESIDE",string: "PR&ASIDE,", offset: 33)
+Match(phrase: "PRAESENTIBUS", variant: "PRAESENTIBUS",string: "PRASENTIEBUS,", offset: 63)
+```
+
+Alternatively, each match object's `json()` method returns a dictionary with all information about the match, including its similarity scores:
+
+```python
+# look for matches in the first example text
+for match in fuzzy_searcher.find_matches(text1):
+    print(match.json())
+```
+
+This yields more detailed output:
+
+```js
+{'match_keyword': 'Veneris', 'match_term': 'Veneris', 'match_string': 'Veucris', 'match_offset': 3, 'char_match': 0.7142857142857143, 'ngram_match': 0.625, 'levenshtein_distance': 0.7142857142857143}
+{'match_keyword': 'den .. Januarii 1725', 'match_term': 'den .. Januarii 1725', 'match_string': 'den 5. Januaris 1725', 'match_offset': 11, 'char_match': 0.9, 'ngram_match': 0.8095238095238095, 'levenshtein_distance': 0.9}
+{'match_keyword': 'PRAESIDE', 'match_term': 'PRAESIDE', 'match_string': 'PR&ASIDE', 'match_offset': 33, 'char_match': 0.875, 'ngram_match': 0.6666666666666666, 'levenshtein_distance': 0.75}
+{'match_keyword': 'PRAESENTIBUS', 'match_term': 'PRAESENTIBUS', 'match_string': 'PRASENTIEBUS', 'match_offset': 63, 'char_match': 1.0, 'ngram_match': 0.7692307692307693, 'levenshtein_distance': 0.8333333333333334}
+```
+
+Running the searcher on the second text:
+
+```python
+# look for matches in the second example text
+for match in fuzzy_searcher.find_candidates(text2):
+    print(match.json())
+```
+
+This yields the following output:
+
+```js
+{'phrase': 'Veneris', 'variant': 'Veneris', 'string': 'Veucris', 'offset': 3, 'match_scores': {'char_match': 0.7142857142857143, 'ngram_match': 0.625, 'levenshtein_similarity': 0.7142857142857143}}
+{'phrase': 'den .. Januarii 1725', 'variant': 'den .. Januarii 1725', 'string': 'den 5. Januaris 1725.', 'offset': 11, 'match_scores': {'char_match': 0.95, 'ngram_match': 0.7619047619047619, 'levenshtein_similarity': 0.8571428571428572}}
+{'phrase': 'PRAESIDE', 'variant': 'PRAESIDE', 'string': 'PR&ASIDE,', 'offset': 33, 'match_scores': {'char_match': 0.875, 'ngram_match': 0.5555555555555556, 'levenshtein_similarity': 0.6666666666666667}}
+{'phrase': 'PRAESENTIBUS', 'variant': 'PRAESENTIBUS', 'string': 'PRASENTIEBUS,', 'offset': 63, 'match_scores': {'char_match': 1.0, 'ngram_match': 0.6923076923076923, 'levenshtein_similarity': 0.7692307692307692}}
+```
+
+## Matches as Web Annotations
+
+If texts are passed to `find_matches` as dictionaries with an identifier, the resulting matches
+include the text identifier and can generate Web Annotation representations:
+
+```python
+# create a dictionary for the second text and add an identifier
+text2_with_id = {
+    "text": text2,
+    "id": "urn:republic:3783_0076:page=151:para=4"
+}
+matches = fuzzy_searcher.find_matches(text2_with_id)
+
+import json
+
+# use json.dumps to pretty print the first match as Web Annotation
+print(json.dumps(matches[0].as_web_anno(), indent=2))
+```
+
+Output:
+
+```json
+{
+  "@context": "http://www.w3.org/ns/anno.jsonld",
+  "id": "cca6740d-e584-4322-b517-67d92e0e508a",
+  "type": "Annotation",
+  "motivation": "classifying",
+  "created": "2020-12-08T10:22:26.838154",
+  "generator": {
+    "id": "https://github.com/marijnkoolen/fuzzy-search",
+    "type": "Software",
+    "name": "FuzzySearcher"
+  },
+  "target": {
+    "source": "urn:republic:3783_0076:page=151:para=4",
+    "selector": {
+      "type": "TextPositionSelector",
+      "start": 0,
+      "end": 8
+    }
+  },
+  "body": {
+    "type": "Dataset",
+    "value": {
+      "match_phrase": "Mercurii",
+      "match_variant": "Mercurii",
+      "match_string": "Mercuri:",
+      "phrase_metadata": {
+        "phrase": "Mercurii"
+      }
+    }
+  }
+}
+```
+
+[HTML docs](html_docs/index.html)
+
+
+## Documentation To Do
+
+- adding variant phrases and distractors
+- multiple searchers and searching in the context of other matches
+
+
+
+
+%package help
+Summary:	Development documents and examples for fuzzy-search
+Provides:	python3-fuzzy-search-doc
+%description help
+# fuzzy-search
+Fuzzy search module for searching lists of words in low quality OCR and HTR text.
+
+Project page on PyPI: [https://pypi.org/project/fuzzy-search/](https://pypi.org/project/fuzzy-search/)
+
+## Installing
+
+```commandline
+pip install -U fuzzy-search
+```
+
+## Usage
+
+```python
+from fuzzy_search.fuzzy_phrase_searcher import FuzzyPhraseSearcher
+from fuzzy_search.fuzzy_phrase_model import PhraseModel
+
+# higher matching thresholds for higher-quality OCR/HTR (higher precision; recall should be good anyway)
+# lower matching thresholds for lower-quality OCR/HTR (higher recall, since missed matches are the main problem there)
+config = {
+    "char_match_threshold": 0.8,
+    "ngram_threshold": 0.6,
+    "levenshtein_threshold": 0.8,
+    "ignorecase": False,
+    "ngram_size": 3,
+    "skip_size": 0,
+}
+
+# initialize a new searcher instance with the config
+fuzzy_searcher = FuzzyPhraseSearcher(config)
+
+# create a list of domain keywords and phrases
+domain_phrases = [
+    # terms for the chair and attendants of a meeting
+    "PRAESIDE",
+    "PRAESENTIBUS",
+    # some weekdays in Latin
+    "Veneris", 
+    "Mercurii",
+    # some date phrase where any date in January 1725 should match
+    "den .. Januarii 1725"
+]
+
+# create a PhraseModel object from the domain phrases
+phrase_model = PhraseModel(phrases=domain_phrases)
+
+# register the phrase model with the searcher
+fuzzy_searcher.index_phrase_model(phrase_model)
+
+# take some example texts: meetings of the Dutch States General in January 1725
+text1 = "ie Veucris den 5. Januaris 1725. PR&ASIDE, Den Heere Bentinck. PRASENTIEBUS, De Heeren Jan Welderen , van Dam, Torck , met een extraordinaris Gedeputeerde uyt de Provincie van Gelderlandt. Van Maasdam , vanden Boeizelaar , Raadtpenfionaris van Hoornbeeck , met een extraordinaris Gedeputeerde uyt de Provincie van Hollandt ende Welt-Vrieslandt. Velters, Ockere , Noey; van Hoorn , met een extraordinaris Gedeputeerde uyt de Provincie van Zeelandt. Van Renswoude , van Voor{t. Van Schwartzenbergh, vander Waayen, Vegilin Van I{elmuden. Van Iddekinge ‚ van Tamminga."
+
+text2 = "Mercuri: den 10. Jangarii , | 1725. ia PRESIDE, Den Heere an Iddekinge. PRA&SENTIBUS, De Heeren /an Welderen , van Dam, van Wynbergen, Torck, met een extraordinaris Gedeputeerde uyt de Provincie van Gelderland. Van Maasdam , Raadtpenfionaris van Hoorn=beeck. Velters, Ockerfe, Noey. Taats van Amerongen, van Renswoude. Vander Waasen , Vegilin, ’ Bentinck, van I(elmaden. Van Tamminga."
+
+```
+
+The `find_matches` method returns match objects:
+
+```python
+# look for matches in the first example text
+for match in fuzzy_searcher.find_matches(text1):
+    print(match)
+```
+
+Printing the matches directly yields the following output:
+```python
+Match(phrase: "Veneris", variant: "Veneris",string: "Veucris", offset: 3)
+Match(phrase: "den .. Januarii 1725", variant: "den .. Januarii 1725",string: "den 5. Januaris 1725.", offset: 11)
+Match(phrase: "PRAESIDE", variant: "PRAESIDE",string: "PR&ASIDE,", offset: 33)
+Match(phrase: "PRAESENTIBUS", variant: "PRAESENTIBUS",string: "PRASENTIEBUS,", offset: 63)
+```
+
+Alternatively, each match object's `json()` method returns a dictionary with all information about the match, including its similarity scores:
+
+```python
+# look for matches in the first example text
+for match in fuzzy_searcher.find_matches(text1):
+    print(match.json())
+```
+
+This yields more detailed output:
+
+```js
+{'match_keyword': 'Veneris', 'match_term': 'Veneris', 'match_string': 'Veucris', 'match_offset': 3, 'char_match': 0.7142857142857143, 'ngram_match': 0.625, 'levenshtein_distance': 0.7142857142857143}
+{'match_keyword': 'den .. Januarii 1725', 'match_term': 'den .. Januarii 1725', 'match_string': 'den 5. Januaris 1725', 'match_offset': 11, 'char_match': 0.9, 'ngram_match': 0.8095238095238095, 'levenshtein_distance': 0.9}
+{'match_keyword': 'PRAESIDE', 'match_term': 'PRAESIDE', 'match_string': 'PR&ASIDE', 'match_offset': 33, 'char_match': 0.875, 'ngram_match': 0.6666666666666666, 'levenshtein_distance': 0.75}
+{'match_keyword': 'PRAESENTIBUS', 'match_term': 'PRAESENTIBUS', 'match_string': 'PRASENTIEBUS', 'match_offset': 63, 'char_match': 1.0, 'ngram_match': 0.7692307692307693, 'levenshtein_distance': 0.8333333333333334}
+```
+
+Running the searcher on the second text:
+
+```python
+# look for matches in the second example text
+for match in fuzzy_searcher.find_candidates(text2):
+    print(match.json())
+```
+
+This yields the following output:
+
+```js
+{'phrase': 'Veneris', 'variant': 'Veneris', 'string': 'Veucris', 'offset': 3, 'match_scores': {'char_match': 0.7142857142857143, 'ngram_match': 0.625, 'levenshtein_similarity': 0.7142857142857143}}
+{'phrase': 'den .. Januarii 1725', 'variant': 'den .. Januarii 1725', 'string': 'den 5. Januaris 1725.', 'offset': 11, 'match_scores': {'char_match': 0.95, 'ngram_match': 0.7619047619047619, 'levenshtein_similarity': 0.8571428571428572}}
+{'phrase': 'PRAESIDE', 'variant': 'PRAESIDE', 'string': 'PR&ASIDE,', 'offset': 33, 'match_scores': {'char_match': 0.875, 'ngram_match': 0.5555555555555556, 'levenshtein_similarity': 0.6666666666666667}}
+{'phrase': 'PRAESENTIBUS', 'variant': 'PRAESENTIBUS', 'string': 'PRASENTIEBUS,', 'offset': 63, 'match_scores': {'char_match': 1.0, 'ngram_match': 0.6923076923076923, 'levenshtein_similarity': 0.7692307692307692}}
+```
+
+## Matches as Web Annotations
+
+If texts are passed to `find_matches` as dictionaries with an identifier, the resulting matches
+include the text identifier and can generate Web Annotation representations:
+
+```python
+# create a dictionary for the second text and add an identifier
+text2_with_id = {
+    "text": text2,
+    "id": "urn:republic:3783_0076:page=151:para=4"
+}
+matches = fuzzy_searcher.find_matches(text2_with_id)
+
+import json
+
+# use json.dumps to pretty print the first match as Web Annotation
+print(json.dumps(matches[0].as_web_anno(), indent=2))
+```
+
+Output:
+
+```json
+{
+  "@context": "http://www.w3.org/ns/anno.jsonld",
+  "id": "cca6740d-e584-4322-b517-67d92e0e508a",
+  "type": "Annotation",
+  "motivation": "classifying",
+  "created": "2020-12-08T10:22:26.838154",
+  "generator": {
+    "id": "https://github.com/marijnkoolen/fuzzy-search",
+    "type": "Software",
+    "name": "FuzzySearcher"
+  },
+  "target": {
+    "source": "urn:republic:3783_0076:page=151:para=4",
+    "selector": {
+      "type": "TextPositionSelector",
+      "start": 0,
+      "end": 8
+    }
+  },
+  "body": {
+    "type": "Dataset",
+    "value": {
+      "match_phrase": "Mercurii",
+      "match_variant": "Mercurii",
+      "match_string": "Mercuri:",
+      "phrase_metadata": {
+        "phrase": "Mercurii"
+      }
+    }
+  }
+}
+```
+
+[HTML docs](html_docs/index.html)
+
+
+## Documentation To Do
+
+- adding variant phrases and distractors
+- multiple searchers and searching in the context of other matches
+
+
+
+
+%prep
+%autosetup -n fuzzy_search-1.6.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-fuzzy-search -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 1.6.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..2dc912e
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+7b61aba61478daa2efeacbbd806d1484  fuzzy_search-1.6.0.tar.gz