 .gitignore           |   1 +
 python-fuzzycat.spec | 287 +++++++++++++++++++++++++++++++++++++++++++++++++++
 sources              |   1 +
 3 files changed, 289 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index e69de29..b14167a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/fuzzycat-0.1.23.tar.gz
diff --git a/python-fuzzycat.spec b/python-fuzzycat.spec
new file mode 100644
index 0000000..7566f9b
--- /dev/null
+++ b/python-fuzzycat.spec
@@ -0,0 +1,287 @@
+%global _empty_manifest_terminate_build 0
+Name: python-fuzzycat
+Version: 0.1.23
+Release: 1
+Summary: Fuzzy matching utilities for scholarly metadata
+License: MIT License
+URL: https://github.com/miku/fuzzycat
+Source0: https://mirrors.aliyun.com/pypi/web/packages/ff/91/19a75e56b496384ca5635de7640cc50ef0de75853b1ce3c758ea6e85cdb0/fuzzycat-0.1.23.tar.gz
+BuildArch: noarch
+
+Requires: python3-dynaconf
+Requires: python3-elasticsearch
+Requires: python3-elasticsearch-dsl
+Requires: python3-fatcat-openapi-client
+Requires: python3-ftfy
+Requires: python3-glom
+Requires: python3-grobid-tei-xml
+Requires: python3-jellyfish
+Requires: python3-pyyaml
+Requires: python3-regex
+Requires: python3-requests
+Requires: python3-thefuzz
+Requires: python3-toml
+Requires: python3-unidecode
+Requires: python3-zstandard
+Requires: python3-ipython
+Requires: python3-isort
+Requires: python3-mypy
+Requires: python3-pylint
+Requires: python3-pytest
+Requires: python3-pytest-cov
+Requires: python3-twine
+Requires: python3-yapf
+
+%description
+![https://pypi.org/project/fuzzycat/](https://img.shields.io/pypi/v/fuzzycat?style=flat-square)
+This Python library contains routines for finding near-duplicate bibliographic
+entities (primarily research papers), and estimating whether two metadata
+records describe the same work (or variations of the same work). Some routines
+are designed to work "offline" with batches of billions of sorted metadata
+records, and others are designed to work "online" making queries against hosted
+web services and catalogs.
+`fuzzycat` was originally developed by Martin Czygan at the Internet Archive,
+and is used in the construction of a [citation
+graph](https://gitlab.com/internetarchive/refcat) and to identify duplicate
+records in the [fatcat.wiki](https://fatcat.wiki) catalog and
+[scholar.archive.org](https://scholar.archive.org) search index.
+**DISCLAIMER:** this tool is still under development, as indicated by the "0"
+major version. The interface, semantics, and behavior are likely to be tweaked.
+## Quickstart
+Inside a `virtualenv` (or similar), install with [pip](https://pypi.org/project/pip/):
+```
+pip install fuzzycat
+```
+The `fuzzycat.simple` module contains high-level helpers which query Internet
+Archive hosted services:
+ import elasticsearch
+ from fuzzycat.simple import *
+ es_client = elasticsearch.Elasticsearch("https://search.fatcat.wiki:443")
+ # parses reference using GROBID (at https://grobid.qa.fatcat.wiki),
+ # then queries Elasticsearch (at https://search.fatcat.wiki),
+ # then scores candidates against latest catalog record fetched from
+ # https://api.fatcat.wiki
+ best_match = closest_fuzzy_unstructured_match(
+ """Cunningham HB, Weis JJ, Taveras LR, Huerta S. Mesh migration following abdominal hernia repair: a comprehensive review. Hernia. 2019 Apr;23(2):235-243. doi: 10.1007/s10029-019-01898-9. Epub 2019 Jan 30. PMID: 30701369.""",
+ es_client=es_client)
+ print(best_match)
+ # FuzzyReleaseMatchResult(status=<Status.EXACT: 'exact'>, reason=<Reason.DOI: 'doi'>, release={...})
+ # same as above, but without the GROBID parsing, and returns multiple results
+ matches = close_fuzzy_biblio_matches(
+ dict(
+ title="Mesh migration following abdominal hernia repair: a comprehensive review",
+ first_author="Cunningham",
+ year=2019,
+ journal="Hernia",
+ ),
+ es_client=es_client,
+ )
+A CLI tool is included for processing records in UNIX stdin/stdout pipelines:
+ # print usage
+ python -m fuzzycat
+## Features and Use-Cases
+The [refcat project](https://gitlab.com/internetarchive/refcat) builds on top
+of this library to build a citation graph by processing billions of structured
+and unstructured reference records extracted from scholarly papers (note: for
+performance-critical parts, some code has been ported to Go, though the test
+suite is shared between the Python and Go implementations).
+Automated imports of metadata records into the fatcat catalog use fuzzycat to
+filter new metadata which look like duplicates of existing records from other
+sources.
+In conjunction with standard command-line tools (like `sort`), fatcat bulk
+metadata snapshots can be clustered and reduced into groups to flag duplicate
+records for merging.
+Extracted reference strings from any source (webpages, books, papers, wikis,
+databases, etc.) can be resolved against the fatcat catalog of scholarly papers.
+## Support and Acknowledgements
+Work on this software received support from the Andrew W. Mellon Foundation
+through multiple phases of the ["Ensuring the Persistent Access of Open Access
+Journal Literature"](https://mellon.org/grants/grants-database/advanced-search/?amount-low=&amount-high=&year-start=&year-end=&city=&state=&country=&q=%22Ensuring+the+Persistent+Access%22&per_page=25) project (see [original announcement](http://blog.archive.org/2018/03/05/andrew-w-mellon-foundation-awards-grant-to-the-internet-archive-for-long-tail-journal-preservation/)).
+Additional acknowledgements [at fatcat.wiki](https://fatcat.wiki/about).
+
+%package -n python3-fuzzycat
+Summary: Fuzzy matching utilities for scholarly metadata
+Provides: python-fuzzycat
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-fuzzycat
+![https://pypi.org/project/fuzzycat/](https://img.shields.io/pypi/v/fuzzycat?style=flat-square)
+This Python library contains routines for finding near-duplicate bibliographic
+entities (primarily research papers), and estimating whether two metadata
+records describe the same work (or variations of the same work). Some routines
+are designed to work "offline" with batches of billions of sorted metadata
+records, and others are designed to work "online" making queries against hosted
+web services and catalogs.
+`fuzzycat` was originally developed by Martin Czygan at the Internet Archive,
+and is used in the construction of a [citation
+graph](https://gitlab.com/internetarchive/refcat) and to identify duplicate
+records in the [fatcat.wiki](https://fatcat.wiki) catalog and
+[scholar.archive.org](https://scholar.archive.org) search index.
+**DISCLAIMER:** this tool is still under development, as indicated by the "0"
+major version. The interface, semantics, and behavior are likely to be tweaked.
+## Quickstart
+Inside a `virtualenv` (or similar), install with [pip](https://pypi.org/project/pip/):
+```
+pip install fuzzycat
+```
+The `fuzzycat.simple` module contains high-level helpers which query Internet
+Archive hosted services:
+ import elasticsearch
+ from fuzzycat.simple import *
+ es_client = elasticsearch.Elasticsearch("https://search.fatcat.wiki:443")
+ # parses reference using GROBID (at https://grobid.qa.fatcat.wiki),
+ # then queries Elasticsearch (at https://search.fatcat.wiki),
+ # then scores candidates against latest catalog record fetched from
+ # https://api.fatcat.wiki
+ best_match = closest_fuzzy_unstructured_match(
+ """Cunningham HB, Weis JJ, Taveras LR, Huerta S. Mesh migration following abdominal hernia repair: a comprehensive review. Hernia. 2019 Apr;23(2):235-243. doi: 10.1007/s10029-019-01898-9. Epub 2019 Jan 30. PMID: 30701369.""",
+ es_client=es_client)
+ print(best_match)
+ # FuzzyReleaseMatchResult(status=<Status.EXACT: 'exact'>, reason=<Reason.DOI: 'doi'>, release={...})
+ # same as above, but without the GROBID parsing, and returns multiple results
+ matches = close_fuzzy_biblio_matches(
+ dict(
+ title="Mesh migration following abdominal hernia repair: a comprehensive review",
+ first_author="Cunningham",
+ year=2019,
+ journal="Hernia",
+ ),
+ es_client=es_client,
+ )
+A CLI tool is included for processing records in UNIX stdin/stdout pipelines:
+ # print usage
+ python -m fuzzycat
+## Features and Use-Cases
+The [refcat project](https://gitlab.com/internetarchive/refcat) builds on top
+of this library to build a citation graph by processing billions of structured
+and unstructured reference records extracted from scholarly papers (note: for
+performance-critical parts, some code has been ported to Go, though the test
+suite is shared between the Python and Go implementations).
+Automated imports of metadata records into the fatcat catalog use fuzzycat to
+filter new metadata which look like duplicates of existing records from other
+sources.
+In conjunction with standard command-line tools (like `sort`), fatcat bulk
+metadata snapshots can be clustered and reduced into groups to flag duplicate
+records for merging.
+Extracted reference strings from any source (webpages, books, papers, wikis,
+databases, etc.) can be resolved against the fatcat catalog of scholarly papers.
+## Support and Acknowledgements
+Work on this software received support from the Andrew W. Mellon Foundation
+through multiple phases of the ["Ensuring the Persistent Access of Open Access
+Journal Literature"](https://mellon.org/grants/grants-database/advanced-search/?amount-low=&amount-high=&year-start=&year-end=&city=&state=&country=&q=%22Ensuring+the+Persistent+Access%22&per_page=25) project (see [original announcement](http://blog.archive.org/2018/03/05/andrew-w-mellon-foundation-awards-grant-to-the-internet-archive-for-long-tail-journal-preservation/)).
+Additional acknowledgements [at fatcat.wiki](https://fatcat.wiki/about).
+
+%package help
+Summary: Development documents and examples for fuzzycat
+Provides: python3-fuzzycat-doc
+%description help
+![https://pypi.org/project/fuzzycat/](https://img.shields.io/pypi/v/fuzzycat?style=flat-square)
+This Python library contains routines for finding near-duplicate bibliographic
+entities (primarily research papers), and estimating whether two metadata
+records describe the same work (or variations of the same work). Some routines
+are designed to work "offline" with batches of billions of sorted metadata
+records, and others are designed to work "online" making queries against hosted
+web services and catalogs.
+`fuzzycat` was originally developed by Martin Czygan at the Internet Archive,
+and is used in the construction of a [citation
+graph](https://gitlab.com/internetarchive/refcat) and to identify duplicate
+records in the [fatcat.wiki](https://fatcat.wiki) catalog and
+[scholar.archive.org](https://scholar.archive.org) search index.
+**DISCLAIMER:** this tool is still under development, as indicated by the "0"
+major version. The interface, semantics, and behavior are likely to be tweaked.
+## Quickstart
+Inside a `virtualenv` (or similar), install with [pip](https://pypi.org/project/pip/):
+```
+pip install fuzzycat
+```
+The `fuzzycat.simple` module contains high-level helpers which query Internet
+Archive hosted services:
+ import elasticsearch
+ from fuzzycat.simple import *
+ es_client = elasticsearch.Elasticsearch("https://search.fatcat.wiki:443")
+ # parses reference using GROBID (at https://grobid.qa.fatcat.wiki),
+ # then queries Elasticsearch (at https://search.fatcat.wiki),
+ # then scores candidates against latest catalog record fetched from
+ # https://api.fatcat.wiki
+ best_match = closest_fuzzy_unstructured_match(
+ """Cunningham HB, Weis JJ, Taveras LR, Huerta S. Mesh migration following abdominal hernia repair: a comprehensive review. Hernia. 2019 Apr;23(2):235-243. doi: 10.1007/s10029-019-01898-9. Epub 2019 Jan 30. PMID: 30701369.""",
+ es_client=es_client)
+ print(best_match)
+ # FuzzyReleaseMatchResult(status=<Status.EXACT: 'exact'>, reason=<Reason.DOI: 'doi'>, release={...})
+ # same as above, but without the GROBID parsing, and returns multiple results
+ matches = close_fuzzy_biblio_matches(
+ dict(
+ title="Mesh migration following abdominal hernia repair: a comprehensive review",
+ first_author="Cunningham",
+ year=2019,
+ journal="Hernia",
+ ),
+ es_client=es_client,
+ )
+A CLI tool is included for processing records in UNIX stdin/stdout pipelines:
+ # print usage
+ python -m fuzzycat
+## Features and Use-Cases
+The [refcat project](https://gitlab.com/internetarchive/refcat) builds on top
+of this library to build a citation graph by processing billions of structured
+and unstructured reference records extracted from scholarly papers (note: for
+performance-critical parts, some code has been ported to Go, though the test
+suite is shared between the Python and Go implementations).
+Automated imports of metadata records into the fatcat catalog use fuzzycat to
+filter new metadata which look like duplicates of existing records from other
+sources.
+In conjunction with standard command-line tools (like `sort`), fatcat bulk
+metadata snapshots can be clustered and reduced into groups to flag duplicate
+records for merging.
+Extracted reference strings from any source (webpages, books, papers, wikis,
+databases, etc.) can be resolved against the fatcat catalog of scholarly papers.
+## Support and Acknowledgements
+Work on this software received support from the Andrew W. Mellon Foundation
+through multiple phases of the ["Ensuring the Persistent Access of Open Access
+Journal Literature"](https://mellon.org/grants/grants-database/advanced-search/?amount-low=&amount-high=&year-start=&year-end=&city=&state=&country=&q=%22Ensuring+the+Persistent+Access%22&per_page=25) project (see [original announcement](http://blog.archive.org/2018/03/05/andrew-w-mellon-foundation-awards-grant-to-the-internet-archive-for-long-tail-journal-preservation/)).
+Additional acknowledgements [at fatcat.wiki](https://fatcat.wiki/about).
+
+%prep
+%autosetup -n fuzzycat-0.1.23
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-fuzzycat -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.23-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..a61e891
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+c71c31be6f7a156c320c878de60d5214 fuzzycat-0.1.23.tar.gz