summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorCoprDistGit <infra@openeuler.org>2023-05-05 09:26:43 +0000
committerCoprDistGit <infra@openeuler.org>2023-05-05 09:26:43 +0000
commit0c875aac2d881cdf66d55d00915c3d2cdd6438cc (patch)
treea11d9b28f934d163b43b9254567723df3c0e55aa
parent00dadb45576d679ad19ae3fac29813595f63f3b4 (diff)
automatic import of python-ahocorasick-rsopeneuler20.03
-rw-r--r--.gitignore1
-rw-r--r--python-ahocorasick-rs.spec683
-rw-r--r--sources1
3 files changed, 685 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..b8ac522 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/ahocorasick_rs-0.14.0.tar.gz
diff --git a/python-ahocorasick-rs.spec b/python-ahocorasick-rs.spec
new file mode 100644
index 0000000..d70aa69
--- /dev/null
+++ b/python-ahocorasick-rs.spec
@@ -0,0 +1,683 @@
+%global _empty_manifest_terminate_build 0
+Name: python-ahocorasick-rs
+Version: 0.14.0
+Release: 1
+Summary: Search a string for multiple substrings at once
+License: Apache 2.0
+URL: https://github.com/G-Research/ahocorasick_rs
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/00/3b/a44d5ff4347bffa859f92a5cc7137e49658cf1c5d37c3b69e5413c135023/ahocorasick_rs-0.14.0.tar.gz
+
+
+%description
+# ahocorasick_rs: Quickly search for multiple substrings at once
+
+`ahocorasick_rs` allows you to search for multiple substrings ("patterns") in a given string ("haystack") using variations of the [Aho-Corasick algorithm](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm).
+
+In particular, it's implemented as a wrapper of the Rust [`aho-corasick`](https://docs.rs/aho-corasick/) library, and provides a faster alternative to the [`pyahocorasick`](https://pyahocorasick.readthedocs.io/) library.
+
+The specific use case is searching for large numbers of patterns (in the thousands) where the Rust library's DFA-based state machine allows for faster matching.
+
+Found any problems or have any questions? [File an issue on the GitHub project](https://github.com/G-Research/ahocorasick_rs).
+
+* [Quickstart](#quickstart)
+* [Additional configuration](#configuration)
+* [Implementation details](#implementation)
+* [Benchmarks](#benchmarks)
+
+## Quickstart <a name="quickstart"></a>
+
+The `ahocorasick_rs` library allows you to search for multiple strings ("patterns") within a haystack.
+For example, let's install the library:
+
+```shell-session
+$ pip install ahocorasick-rs
+```
+
+Then, we can construct a `AhoCorasick` object:
+
+```python
+>>> import ahocorasick_rs
+>>> patterns = ["hello", "world", "fish"]
+>>> haystack = "this is my first hello world. hello!"
+>>> ac = ahocorasick_rs.AhoCorasick(patterns)
+```
+
+`AhoCorasick.find_matches_as_indexes()` returns a list of tuples, each tuple being:
+
+1. The index of the found pattern inside the list of patterns.
+2. The start index of the pattern inside the haystack.
+3. The end index of the pattern inside the haystack.
+
+```python
+>>> ac.find_matches_as_indexes(haystack)
+[(0, 17, 22), (1, 23, 28), (0, 30, 35)]
+>>> patterns[0], patterns[1], patterns[0]
+('hello', 'world', 'hello')
+>>> haystack[17:22], haystack[23:28], haystack[30:35]
+('hello', 'world', 'hello')
+```
+
+`find_matches_as_strings()` returns a list of found patterns:
+
+```python
+>>> ac.find_matches_as_strings(haystack)
+['hello', 'world', 'hello']
+```
+
+## Additional configuration <a name="configuration"></a>
+
+### Match kind
+
+There are three ways you can configure matching in cases where multiple patterns overlap.
+For a more in-depth explanation, see the [underlying Rust library's documentation of matching](https://docs.rs/aho-corasick/latest/aho_corasick/enum.MatchKind.html).
+
+Assume we have this starting point:
+
+```python
+>>> from ahocorasick_rs import AhoCorasick, MatchKind
+```
+
+#### `Standard` (the default)
+
+This returns the pattern that matches first, semantically-speaking.
+This is the default matching pattern.
+
+```python
+>>> ac AhoCorasick(["disco", "disc", "discontent"])
+>>> ac.find_matches_as_strings("discontent")
+['disc']
+>>> ac = AhoCorasick(["b", "abcd"])
+>>> ac.find_matches_as_strings("abcdef")
+['b']
+```
+
+In this case `disc` will match before `disco` or `discontent`.
+
+Similarly, `b` will match before `abcd` because it ends earlier in the haystack than `abcd` does:
+
+```python
+>>> ac = AhoCorasick(["b", "abcd"])
+>>> ac.find_matches_as_strings("abcdef")
+['b']
+```
+
+#### `LeftmostFirst`
+
+This returns the leftmost-in-the-haystack matching pattern that appears first in _the list of given patterns_.
+That means the order of patterns makes a difference:
+
+```python
+>>> ac = AhoCorasick(["disco", "disc"], matchkind=MatchKind.LeftmostFirst)
+>>> ac.find_matches_as_strings("discontent")
+['disco']
+>>> ac = AhoCorasick(["disc", "disco"], matchkind=MatchKind.LeftmostFirst)
+['disc']
+```
+
+Here we see `abcd` matched first, because it starts before `b`:
+
+```python
+>>> ac = AhoCorasick(["b", "abcd"], matchkind=MatchKind.LeftmostFirst)
+>>> ac.find_matches_as_strings("abcdef")
+['abcd']
+```
+##### `LeftmostLongest`
+
+This returns the leftmost-in-the-haystack matching pattern that is longest:
+
+```python
+>>> ac = AhoCorasick(["disco", "disc", "discontent"], matchkind=MatchKind.LeftmostLongest)
+>>> ac.find_matches_as_strings("discontent")
+['discontent']
+```
+
+### Overlapping matches
+
+You can get all overlapping matches, instead of just one of them, but only if you stick to the default matchkind, `MatchKind.Standard`:
+
+```python
+>>> from ahocorasick_rs import AhoCorasick
+>>> patterns = ["winter", "onte", "disco", "discontent"]
+>>> ac = AhoCorasick(patterns)
+>>> ac.find_matches_as_strings("discontent", overlapping=True)
+['disco', 'onte', 'discontent']
+```
+
+### Trading memory for speed
+
+If you use ``find_matches_as_strings()``, there are two ways strings can be constructed: from the haystack, or by caching the patterns on the object.
+The former takes more work, the latter uses more memory if the patterns would otherwise have been garbage-collected.
+You can control the behavior by using the `store_patterns` keyword argument to `AhoCorasick()`.
+
+* ``AhoCorasick(..., store_patterns=None)``: The default.
+ Use a heuristic (currently, whether the total of pattern string lengths is less than 4096 characters) to decide whether to store patterns or not.
+* ``AhoCorasick(..., store_patterns=True)``: Keep references to the patterns, potentially speeding up ``find_matches_as_strings()`` at the cost of using more memory.
+ If this uses large amounts of memory this might actually slow things down due to pressure on the CPU memory cache, and/or the performance benefit might be overwhelmed by the algorithm's search time.
+* ``AhoCorasick(..., store_patterns=False)``: Don't keep references to the patterns, saving some memory but potentially slowing down ``find_matches_as_strings()``, especially when there are only a small number of patterns and you are searching a small haystack.
+
+### Algorithm implementations: trading construction speed, memory, and performance
+
+You can choose the type of underlying automaton to use, with different performance tradeoffs.
+
+The underlying Rust library supports [four choices](https://docs.rs/aho-corasick/latest/aho_corasick/struct.AhoCorasickBuilder.html#method.kind), which are exposed:
+
+* `None` uses a heuristic to choose the "best" Aho-Corasick implementation for the given patterns.
+* `Implementation.NoncontiguousNFA`: A noncontiguous NFA is the fastest to be built, has moderate memory usage and is typically the slowest to execute a search.
+* `Implementation.ContiguousNFA`: A contiguous NFA is a little slower to build than a noncontiguous NFA, has excellent memory usage and is typically a little slower than a DFA for a search.
+* `Implementation.DFA`: A DFA is very slow to build, uses exorbitant amounts of memory, but will typically execute searches the fastest.
+
+The default choice is `Implementation.DFA` since expensive setup compensated by fast batch operations is the standard Python tradeoff.
+
+```python
+>>> from ahocorasick_rs import AhoCorasick, Implementation
+>>> ac = AhoCorasick(["disco", "disc"], implementation=Implementation.NoncontiguousNFA)
+```
+
+## Implementation details <a name="implementation"></a>
+
+* Matching releases the GIL, to enable concurrency.
+* Not all features from the underlying library are exposed; if you would like additional features, please [file an issue](https://github.com/g-research/ahocorasick_rs/issues/new) or submit a PR.
+
+## Benchmarks <a name="benchmarks"></a>
+
+As with any benchmark, real-world results will differ based on your particular situation.
+If performance is important to your application, measure the alternatives yourself!
+
+### Longer strings and many patterns
+
+This benchmark matches ~4,000 patterns against lines of text that are ~700 characters long.
+Each line matches either zero (90%) or one pattern (10%).
+
+Higher is better; `ahocorasick_rs` is much faster in both cases.
+
+| `find_matches_as_strings` or equivalent | Operations per second |
+|-----------------------------------------|---------------------:|
+| `ahocorasick_rs` longest matching | `436,000` |
+| `pyahocorasick` longest matching | `65,000` |
+| `ahocorasick_rs` overlapping matching | `329,000` |
+| `pyahocorasick` overlapping matching | `76,000` |
+
+### Shorter strings and few patterns
+
+This benchmarks matches ~10 patterns against lines of text that are ~70 characters long.
+Each line matches ~5 patterns.
+
+Higher is better; again, `ahocorasick_rs` is faster for both, though with a smaller margin.
+
+| `find_matches_as_strings` or equivalent | Operations per second |
+|-----------------------------------------|------------------------:|
+| `ahocorasick_rs` longest matching | `1,930,000` |
+| `pyahocorasick` longest matching | `1,120,000` |
+| `ahocorasick_rs` overlapping matching | `1,250,000` |
+| `pyahocorasick` overlapping matching | `880,000` |
+
+
+
+
+%package -n python3-ahocorasick-rs
+Summary: Search a string for multiple substrings at once
+Provides: python-ahocorasick-rs
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+BuildRequires: python3-cffi
+BuildRequires: gcc
+BuildRequires: gdb
+%description -n python3-ahocorasick-rs
+# ahocorasick_rs: Quickly search for multiple substrings at once
+
+`ahocorasick_rs` allows you to search for multiple substrings ("patterns") in a given string ("haystack") using variations of the [Aho-Corasick algorithm](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm).
+
+In particular, it's implemented as a wrapper of the Rust [`aho-corasick`](https://docs.rs/aho-corasick/) library, and provides a faster alternative to the [`pyahocorasick`](https://pyahocorasick.readthedocs.io/) library.
+
+The specific use case is searching for large numbers of patterns (in the thousands) where the Rust library's DFA-based state machine allows for faster matching.
+
+Found any problems or have any questions? [File an issue on the GitHub project](https://github.com/G-Research/ahocorasick_rs).
+
+* [Quickstart](#quickstart)
+* [Additional configuration](#configuration)
+* [Implementation details](#implementation)
+* [Benchmarks](#benchmarks)
+
+## Quickstart <a name="quickstart"></a>
+
+The `ahocorasick_rs` library allows you to search for multiple strings ("patterns") within a haystack.
+For example, let's install the library:
+
+```shell-session
+$ pip install ahocorasick-rs
+```
+
+Then, we can construct a `AhoCorasick` object:
+
+```python
+>>> import ahocorasick_rs
+>>> patterns = ["hello", "world", "fish"]
+>>> haystack = "this is my first hello world. hello!"
+>>> ac = ahocorasick_rs.AhoCorasick(patterns)
+```
+
+`AhoCorasick.find_matches_as_indexes()` returns a list of tuples, each tuple being:
+
+1. The index of the found pattern inside the list of patterns.
+2. The start index of the pattern inside the haystack.
+3. The end index of the pattern inside the haystack.
+
+```python
+>>> ac.find_matches_as_indexes(haystack)
+[(0, 17, 22), (1, 23, 28), (0, 30, 35)]
+>>> patterns[0], patterns[1], patterns[0]
+('hello', 'world', 'hello')
+>>> haystack[17:22], haystack[23:28], haystack[30:35]
+('hello', 'world', 'hello')
+```
+
+`find_matches_as_strings()` returns a list of found patterns:
+
+```python
+>>> ac.find_matches_as_strings(haystack)
+['hello', 'world', 'hello']
+```
+
+## Additional configuration <a name="configuration"></a>
+
+### Match kind
+
+There are three ways you can configure matching in cases where multiple patterns overlap.
+For a more in-depth explanation, see the [underlying Rust library's documentation of matching](https://docs.rs/aho-corasick/latest/aho_corasick/enum.MatchKind.html).
+
+Assume we have this starting point:
+
+```python
+>>> from ahocorasick_rs import AhoCorasick, MatchKind
+```
+
+#### `Standard` (the default)
+
+This returns the pattern that matches first, semantically-speaking.
+This is the default matching pattern.
+
+```python
+>>> ac AhoCorasick(["disco", "disc", "discontent"])
+>>> ac.find_matches_as_strings("discontent")
+['disc']
+>>> ac = AhoCorasick(["b", "abcd"])
+>>> ac.find_matches_as_strings("abcdef")
+['b']
+```
+
+In this case `disc` will match before `disco` or `discontent`.
+
+Similarly, `b` will match before `abcd` because it ends earlier in the haystack than `abcd` does:
+
+```python
+>>> ac = AhoCorasick(["b", "abcd"])
+>>> ac.find_matches_as_strings("abcdef")
+['b']
+```
+
+#### `LeftmostFirst`
+
+This returns the leftmost-in-the-haystack matching pattern that appears first in _the list of given patterns_.
+That means the order of patterns makes a difference:
+
+```python
+>>> ac = AhoCorasick(["disco", "disc"], matchkind=MatchKind.LeftmostFirst)
+>>> ac.find_matches_as_strings("discontent")
+['disco']
+>>> ac = AhoCorasick(["disc", "disco"], matchkind=MatchKind.LeftmostFirst)
+['disc']
+```
+
+Here we see `abcd` matched first, because it starts before `b`:
+
+```python
+>>> ac = AhoCorasick(["b", "abcd"], matchkind=MatchKind.LeftmostFirst)
+>>> ac.find_matches_as_strings("abcdef")
+['abcd']
+```
+##### `LeftmostLongest`
+
+This returns the leftmost-in-the-haystack matching pattern that is longest:
+
+```python
+>>> ac = AhoCorasick(["disco", "disc", "discontent"], matchkind=MatchKind.LeftmostLongest)
+>>> ac.find_matches_as_strings("discontent")
+['discontent']
+```
+
+### Overlapping matches
+
+You can get all overlapping matches, instead of just one of them, but only if you stick to the default matchkind, `MatchKind.Standard`:
+
+```python
+>>> from ahocorasick_rs import AhoCorasick
+>>> patterns = ["winter", "onte", "disco", "discontent"]
+>>> ac = AhoCorasick(patterns)
+>>> ac.find_matches_as_strings("discontent", overlapping=True)
+['disco', 'onte', 'discontent']
+```
+
+### Trading memory for speed
+
+If you use ``find_matches_as_strings()``, there are two ways strings can be constructed: from the haystack, or by caching the patterns on the object.
+The former takes more work, the latter uses more memory if the patterns would otherwise have been garbage-collected.
+You can control the behavior by using the `store_patterns` keyword argument to `AhoCorasick()`.
+
+* ``AhoCorasick(..., store_patterns=None)``: The default.
+ Use a heuristic (currently, whether the total of pattern string lengths is less than 4096 characters) to decide whether to store patterns or not.
+* ``AhoCorasick(..., store_patterns=True)``: Keep references to the patterns, potentially speeding up ``find_matches_as_strings()`` at the cost of using more memory.
+ If this uses large amounts of memory this might actually slow things down due to pressure on the CPU memory cache, and/or the performance benefit might be overwhelmed by the algorithm's search time.
+* ``AhoCorasick(..., store_patterns=False)``: Don't keep references to the patterns, saving some memory but potentially slowing down ``find_matches_as_strings()``, especially when there are only a small number of patterns and you are searching a small haystack.
+
+### Algorithm implementations: trading construction speed, memory, and performance
+
+You can choose the type of underlying automaton to use, with different performance tradeoffs.
+
+The underlying Rust library supports [four choices](https://docs.rs/aho-corasick/latest/aho_corasick/struct.AhoCorasickBuilder.html#method.kind), which are exposed:
+
+* `None` uses a heuristic to choose the "best" Aho-Corasick implementation for the given patterns.
+* `Implementation.NoncontiguousNFA`: A noncontiguous NFA is the fastest to be built, has moderate memory usage and is typically the slowest to execute a search.
+* `Implementation.ContiguousNFA`: A contiguous NFA is a little slower to build than a noncontiguous NFA, has excellent memory usage and is typically a little slower than a DFA for a search.
+* `Implementation.DFA`: A DFA is very slow to build, uses exorbitant amounts of memory, but will typically execute searches the fastest.
+
+The default choice is `Implementation.DFA` since expensive setup compensated by fast batch operations is the standard Python tradeoff.
+
+```python
+>>> from ahocorasick_rs import AhoCorasick, Implementation
+>>> ac = AhoCorasick(["disco", "disc"], implementation=Implementation.NoncontiguousNFA)
+```
+
+## Implementation details <a name="implementation"></a>
+
+* Matching releases the GIL, to enable concurrency.
+* Not all features from the underlying library are exposed; if you would like additional features, please [file an issue](https://github.com/g-research/ahocorasick_rs/issues/new) or submit a PR.
+
+## Benchmarks <a name="benchmarks"></a>
+
+As with any benchmark, real-world results will differ based on your particular situation.
+If performance is important to your application, measure the alternatives yourself!
+
+### Longer strings and many patterns
+
+This benchmark matches ~4,000 patterns against lines of text that are ~700 characters long.
+Each line matches either zero (90%) or one pattern (10%).
+
+Higher is better; `ahocorasick_rs` is much faster in both cases.
+
+| `find_matches_as_strings` or equivalent | Operations per second |
+|-----------------------------------------|---------------------:|
+| `ahocorasick_rs` longest matching | `436,000` |
+| `pyahocorasick` longest matching | `65,000` |
+| `ahocorasick_rs` overlapping matching | `329,000` |
+| `pyahocorasick` overlapping matching | `76,000` |
+
+### Shorter strings and few patterns
+
+This benchmarks matches ~10 patterns against lines of text that are ~70 characters long.
+Each line matches ~5 patterns.
+
+Higher is better; again, `ahocorasick_rs` is faster for both, though with a smaller margin.
+
+| `find_matches_as_strings` or equivalent | Operations per second |
+|-----------------------------------------|------------------------:|
+| `ahocorasick_rs` longest matching | `1,930,000` |
+| `pyahocorasick` longest matching | `1,120,000` |
+| `ahocorasick_rs` overlapping matching | `1,250,000` |
+| `pyahocorasick` overlapping matching | `880,000` |
+
+
+
+
+%package help
+Summary: Development documents and examples for ahocorasick-rs
+Provides: python3-ahocorasick-rs-doc
+%description help
+# ahocorasick_rs: Quickly search for multiple substrings at once
+
+`ahocorasick_rs` allows you to search for multiple substrings ("patterns") in a given string ("haystack") using variations of the [Aho-Corasick algorithm](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm).
+
+In particular, it's implemented as a wrapper of the Rust [`aho-corasick`](https://docs.rs/aho-corasick/) library, and provides a faster alternative to the [`pyahocorasick`](https://pyahocorasick.readthedocs.io/) library.
+
+The specific use case is searching for large numbers of patterns (in the thousands) where the Rust library's DFA-based state machine allows for faster matching.
+
+Found any problems or have any questions? [File an issue on the GitHub project](https://github.com/G-Research/ahocorasick_rs).
+
+* [Quickstart](#quickstart)
+* [Additional configuration](#configuration)
+* [Implementation details](#implementation)
+* [Benchmarks](#benchmarks)
+
+## Quickstart <a name="quickstart"></a>
+
+The `ahocorasick_rs` library allows you to search for multiple strings ("patterns") within a haystack.
+For example, let's install the library:
+
+```shell-session
+$ pip install ahocorasick-rs
+```
+
+Then, we can construct a `AhoCorasick` object:
+
+```python
+>>> import ahocorasick_rs
+>>> patterns = ["hello", "world", "fish"]
+>>> haystack = "this is my first hello world. hello!"
+>>> ac = ahocorasick_rs.AhoCorasick(patterns)
+```
+
+`AhoCorasick.find_matches_as_indexes()` returns a list of tuples, each tuple being:
+
+1. The index of the found pattern inside the list of patterns.
+2. The start index of the pattern inside the haystack.
+3. The end index of the pattern inside the haystack.
+
+```python
+>>> ac.find_matches_as_indexes(haystack)
+[(0, 17, 22), (1, 23, 28), (0, 30, 35)]
+>>> patterns[0], patterns[1], patterns[0]
+('hello', 'world', 'hello')
+>>> haystack[17:22], haystack[23:28], haystack[30:35]
+('hello', 'world', 'hello')
+```
+
+`find_matches_as_strings()` returns a list of found patterns:
+
+```python
+>>> ac.find_matches_as_strings(haystack)
+['hello', 'world', 'hello']
+```
+
+## Additional configuration <a name="configuration"></a>
+
+### Match kind
+
+There are three ways you can configure matching in cases where multiple patterns overlap.
+For a more in-depth explanation, see the [underlying Rust library's documentation of matching](https://docs.rs/aho-corasick/latest/aho_corasick/enum.MatchKind.html).
+
+Assume we have this starting point:
+
+```python
+>>> from ahocorasick_rs import AhoCorasick, MatchKind
+```
+
+#### `Standard` (the default)
+
+This returns the pattern that matches first, semantically-speaking.
+This is the default matching pattern.
+
+```python
+>>> ac AhoCorasick(["disco", "disc", "discontent"])
+>>> ac.find_matches_as_strings("discontent")
+['disc']
+>>> ac = AhoCorasick(["b", "abcd"])
+>>> ac.find_matches_as_strings("abcdef")
+['b']
+```
+
+In this case `disc` will match before `disco` or `discontent`.
+
+Similarly, `b` will match before `abcd` because it ends earlier in the haystack than `abcd` does:
+
+```python
+>>> ac = AhoCorasick(["b", "abcd"])
+>>> ac.find_matches_as_strings("abcdef")
+['b']
+```
+
+#### `LeftmostFirst`
+
+This returns the leftmost-in-the-haystack matching pattern that appears first in _the list of given patterns_.
+That means the order of patterns makes a difference:
+
+```python
+>>> ac = AhoCorasick(["disco", "disc"], matchkind=MatchKind.LeftmostFirst)
+>>> ac.find_matches_as_strings("discontent")
+['disco']
+>>> ac = AhoCorasick(["disc", "disco"], matchkind=MatchKind.LeftmostFirst)
+['disc']
+```
+
+Here we see `abcd` matched first, because it starts before `b`:
+
+```python
+>>> ac = AhoCorasick(["b", "abcd"], matchkind=MatchKind.LeftmostFirst)
+>>> ac.find_matches_as_strings("abcdef")
+['abcd']
+```
+##### `LeftmostLongest`
+
+This returns the leftmost-in-the-haystack matching pattern that is longest:
+
+```python
+>>> ac = AhoCorasick(["disco", "disc", "discontent"], matchkind=MatchKind.LeftmostLongest)
+>>> ac.find_matches_as_strings("discontent")
+['discontent']
+```
+
+### Overlapping matches
+
+You can get all overlapping matches, instead of just one of them, but only if you stick to the default matchkind, `MatchKind.Standard`:
+
+```python
+>>> from ahocorasick_rs import AhoCorasick
+>>> patterns = ["winter", "onte", "disco", "discontent"]
+>>> ac = AhoCorasick(patterns)
+>>> ac.find_matches_as_strings("discontent", overlapping=True)
+['disco', 'onte', 'discontent']
+```
+
+### Trading memory for speed
+
+If you use ``find_matches_as_strings()``, there are two ways strings can be constructed: from the haystack, or by caching the patterns on the object.
+The former takes more work, the latter uses more memory if the patterns would otherwise have been garbage-collected.
+You can control the behavior by using the `store_patterns` keyword argument to `AhoCorasick()`.
+
+* ``AhoCorasick(..., store_patterns=None)``: The default.
+ Use a heuristic (currently, whether the total of pattern string lengths is less than 4096 characters) to decide whether to store patterns or not.
+* ``AhoCorasick(..., store_patterns=True)``: Keep references to the patterns, potentially speeding up ``find_matches_as_strings()`` at the cost of using more memory.
+ If this uses large amounts of memory this might actually slow things down due to pressure on the CPU memory cache, and/or the performance benefit might be overwhelmed by the algorithm's search time.
+* ``AhoCorasick(..., store_patterns=False)``: Don't keep references to the patterns, saving some memory but potentially slowing down ``find_matches_as_strings()``, especially when there are only a small number of patterns and you are searching a small haystack.
+
+### Algorithm implementations: trading construction speed, memory, and performance
+
+You can choose the type of underlying automaton to use, with different performance tradeoffs.
+
+The underlying Rust library supports [four choices](https://docs.rs/aho-corasick/latest/aho_corasick/struct.AhoCorasickBuilder.html#method.kind), which are exposed:
+
+* `None` uses a heuristic to choose the "best" Aho-Corasick implementation for the given patterns.
+* `Implementation.NoncontiguousNFA`: A noncontiguous NFA is the fastest to be built, has moderate memory usage and is typically the slowest to execute a search.
+* `Implementation.ContiguousNFA`: A contiguous NFA is a little slower to build than a noncontiguous NFA, has excellent memory usage and is typically a little slower than a DFA for a search.
+* `Implementation.DFA`: A DFA is very slow to build, uses exorbitant amounts of memory, but will typically execute searches the fastest.
+
+The default choice is `Implementation.DFA` since expensive setup compensated by fast batch operations is the standard Python tradeoff.
+
+```python
+>>> from ahocorasick_rs import AhoCorasick, Implementation
+>>> ac = AhoCorasick(["disco", "disc"], implementation=Implementation.NoncontiguousNFA)
+```
+
+## Implementation details <a name="implementation"></a>
+
+* Matching releases the GIL, to enable concurrency.
+* Not all features from the underlying library are exposed; if you would like additional features, please [file an issue](https://github.com/g-research/ahocorasick_rs/issues/new) or submit a PR.
+
+## Benchmarks <a name="benchmarks"></a>
+
+As with any benchmark, real-world results will differ based on your particular situation.
+If performance is important to your application, measure the alternatives yourself!
+
+### Longer strings and many patterns
+
+This benchmark matches ~4,000 patterns against lines of text that are ~700 characters long.
+Each line matches either zero (90%) or one pattern (10%).
+
+Higher is better; `ahocorasick_rs` is much faster in both cases.
+
+| `find_matches_as_strings` or equivalent | Operations per second |
+|-----------------------------------------|---------------------:|
+| `ahocorasick_rs` longest matching | `436,000` |
+| `pyahocorasick` longest matching | `65,000` |
+| `ahocorasick_rs` overlapping matching | `329,000` |
+| `pyahocorasick` overlapping matching | `76,000` |
+
+### Shorter strings and few patterns
+
+This benchmarks matches ~10 patterns against lines of text that are ~70 characters long.
+Each line matches ~5 patterns.
+
+Higher is better; again, `ahocorasick_rs` is faster for both, though with a smaller margin.
+
+| `find_matches_as_strings` or equivalent | Operations per second |
+|-----------------------------------------|------------------------:|
+| `ahocorasick_rs` longest matching | `1,930,000` |
+| `pyahocorasick` longest matching | `1,120,000` |
+| `ahocorasick_rs` overlapping matching | `1,250,000` |
+| `pyahocorasick` overlapping matching | `880,000` |
+
+
+
+
+%prep
+%autosetup -n ahocorasick-rs-0.14.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-ahocorasick-rs -f filelist.lst
+%dir %{python3_sitearch}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.14.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..d064a83
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+c14bb19d76fb30b31783a55809d9dec0 ahocorasick_rs-0.14.0.tar.gz