author:    CoprDistGit <infra@openeuler.org>  2023-04-11 15:25:26 +0000
committer: CoprDistGit <infra@openeuler.org>  2023-04-11 15:25:26 +0000
commit:    c12a7daa9cd0bea2d662f5e94f7e6b6accd281f1 (patch)
tree:      7858e92ee14bd149083e190e512110677170a707
parent:    2eb44a4c9e4136d3a61b1932616d483a091b4bb8 (diff)

automatic import of python-drain3

-rw-r--r--  .gitignore          |    1
-rw-r--r--  python-drain3.spec  | 1512
-rw-r--r--  sources             |    1
3 files changed, 1514 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..dfb5fc4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/drain3-0.9.11.tar.gz
diff --git a/python-drain3.spec b/python-drain3.spec
new file mode 100644
index 0000000..c64ee5d
--- /dev/null
+++ b/python-drain3.spec
@@ -0,0 +1,1512 @@
+%global _empty_manifest_terminate_build 0
+Name: python-drain3
+Version: 0.9.11
+Release: 1
+Summary: Persistent & streaming log template miner
+License: MIT
+URL: https://github.com/IBM/Drain3
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/dc/83/4da2d3a11b5e0edf1a4f4c0c2dd42126d2eb1f31c733967edd3dfac1af94/drain3-0.9.11.tar.gz
+BuildArch: noarch
+
+
+%description
+# Drain3
+
+## Introduction
+
+Drain3 is an online log template miner that can extract templates (clusters) from a stream of log messages in a timely
+manner. It employs a parse tree with fixed depth to guide the log group search process, which effectively avoids
+constructing a very deep and unbalanced tree.
+
+Drain3 continuously learns on-the-fly and extracts log templates from raw log entries.
+
+#### Example:
+
+For the input:
+
+```
+connected to 10.0.0.1
+connected to 192.168.0.1
+Hex number 0xDEADBEAF
+user davidoh logged in
+user eranr logged in
+```
+
+Drain3 extracts the following templates:
+
+```
+ID=1 : size=2 : connected to <:IP:>
+ID=2 : size=1 : Hex number <:HEX:>
+ID=3 : size=2 : user <:*:> logged in
+```
+
+Full sample program output:
+
+```
+Starting Drain3 template miner
+Checking for saved state
+Saved state not found
+Drain3 started with 'FILE' persistence
+Starting training mode. Reading from std-in ('q' to finish)
+> connected to 10.0.0.1
+Saving state of 1 clusters with 1 messages, 528 bytes, reason: cluster_created (1)
+{"change_type": "cluster_created", "cluster_id": 1, "cluster_size": 1, "template_mined": "connected to <:IP:>", "cluster_count": 1}
+Parameters: [ExtractedParameter(value='10.0.0.1', mask_name='IP')]
+> connected to 192.168.0.1
+{"change_type": "none", "cluster_id": 1, "cluster_size": 2, "template_mined": "connected to <:IP:>", "cluster_count": 1}
+Parameters: [ExtractedParameter(value='192.168.0.1', mask_name='IP')]
+> Hex number 0xDEADBEAF
+Saving state of 2 clusters with 3 messages, 584 bytes, reason: cluster_created (2)
+{"change_type": "cluster_created", "cluster_id": 2, "cluster_size": 1, "template_mined": "Hex number <:HEX:>", "cluster_count": 2}
+Parameters: [ExtractedParameter(value='0xDEADBEAF', mask_name='HEX')]
+> user davidoh logged in
+Saving state of 3 clusters with 4 messages, 648 bytes, reason: cluster_created (3)
+{"change_type": "cluster_created", "cluster_id": 3, "cluster_size": 1, "template_mined": "user davidoh logged in", "cluster_count": 3}
+Parameters: []
+> user eranr logged in
+Saving state of 3 clusters with 5 messages, 644 bytes, reason: cluster_template_changed (3)
+{"change_type": "cluster_template_changed", "cluster_id": 3, "cluster_size": 2, "template_mined": "user <:*:> logged in", "cluster_count": 3}
+Parameters: [ExtractedParameter(value='eranr', mask_name='*')]
+> q
+Training done. Mined clusters:
+ID=1 : size=2 : connected to <:IP:>
+ID=2 : size=1 : Hex number <:HEX:>
+ID=3 : size=2 : user <:*:> logged in
+```
+
+This project is an upgrade of the original [Drain](https://github.com/logpai/logparser/blob/master/logparser/Drain)
+project by LogPAI from Python 2.7 to Python 3.6 or later, with additional features and bug fixes.
+
+Read more about Drain in the following paper:
+
+- Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R.
+ Lyu. [Drain: An Online Log Parsing Approach with Fixed Depth Tree](http://jiemingzhu.github.io/pub/pjhe_icws2017.pdf),
+ Proceedings of the 24th International Conference on Web Services (ICWS), 2017.
+
+A Drain3 use case is presented in this blog post:
+[Use open source Drain3 log-template mining project to monitor for network outages](https://developer.ibm.com/blogs/how-mining-log-templates-can-help-ai-ops-in-cloud-scale-data-centers).
+
+#### New features
+
+- [**Persistence**](#persistence). Save and load Drain state into an [Apache Kafka](https://kafka.apache.org)
+ topic, [Redis](https://redis.io/) or a file.
+- **Streaming**. Support for feeding Drain with messages one-by-one.
+- [**Masking**](#masking). Replace some message parts (e.g. numbers, IPs, emails) with wildcards. This improves the
+  accuracy of template mining.
+- [**Packaging**](#installation). As a pip package.
+- [**Configuration**](#configuration). Support for configuring Drain3 using an `.ini` file or a configuration object.
+- [**Memory efficiency**](#memory-efficiency). Decreased memory footprint of internal data structures and a cache to
+  control the maximum memory consumed (thanks to @StanislawSwierc).
+- [**Inference mode**](#training-vs-inference-modes). In case you want to separate the training and inference phases,
+  Drain3 provides a function for *fast* matching against already-learned clusters (templates) only, without the use of
+  regular expressions.
+- [**Parameter extraction**](#parameter-extraction). Accurate extraction of the variable parts of a log message as an
+  ordered list, based on its mined template and the defined masking instructions (thanks to @Impelon).
+
+#### Expected Input and Output
+
+Although Drain3 can ingest full raw log messages, template mining accuracy can be improved if you feed it
+only the unstructured free-text portion of log messages, by first removing structured parts like timestamp, hostname,
+severity, etc.
+
+The output is a dictionary with the following fields:
+
+- `change_type` - indicates whether a new template was identified, an existing template was changed, or the message was
+  added to an existing cluster.
+- `cluster_id` - sequential ID of the cluster that the log belongs to.
+- `cluster_size` - the size (message count) of the cluster that the log belongs to.
+- `cluster_count` - number of clusters seen so far.
+- `template_mined` - the latest template of the above cluster.
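+
+For instance, a minimal sketch of consuming these fields from the result of `add_log_message()` (default
+configuration, no persistence):
+
+```python
+from drain3 import TemplateMiner
+
+template_miner = TemplateMiner()  # in-memory state, no persistence handler
+
+for log_line in ["connected to 10.0.0.1", "connected to 192.168.0.1"]:
+    result = template_miner.add_log_message(log_line)
+    # result is a dict carrying the fields described above
+    print(result["change_type"], result["cluster_id"],
+          result["cluster_size"], result["template_mined"])
+```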
+
+## Configuration
+
+Drain3 is configured using [configparser](https://docs.python.org/3.4/library/configparser.html). By default, the config
+filename is `drain3.ini` in the working directory. It can also be configured by passing
+a [TemplateMinerConfig](drain3/template_miner_config.py) object to the [TemplateMiner](drain3/template_miner.py)
+constructor.
+
+Primary configuration parameters:
+
+- `[DRAIN]/sim_th` - similarity threshold. If the percentage of similar tokens for a log message is below this number,
+  a new log cluster will be created (default 0.4).
+- `[DRAIN]/depth` - max depth levels of log clusters. Minimum is 2 (default 4).
+- `[DRAIN]/max_children` - max number of children of an internal node (default 100).
+- `[DRAIN]/max_clusters` - max number of tracked clusters (unlimited by default). When this number is reached, the model
+  starts replacing old clusters with new ones according to the LRU cache eviction policy.
+- `[DRAIN]/extra_delimiters` - delimiters to apply when splitting a log message into words, in addition to whitespace
+  (default none). Format is a Python list, e.g. `['_', ':']`.
+- `[MASKING]/masking` - masking instructions, in JSON format (default "").
+- `[MASKING]/mask_prefix` & `[MASKING]/mask_suffix` - the wrapping of identified parameters in templates. By default,
+  `<` and `>` respectively.
+- `[SNAPSHOT]/snapshot_interval_minutes` - time interval between new snapshots (default 1).
+- `[SNAPSHOT]/compress_state` - whether to compress the state before saving it. This can be useful when using Kafka
+  persistence.
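+
+The same parameters can also be set in code. A minimal sketch, assuming the `TemplateMinerConfig` attribute names
+mirror the `.ini` keys above (e.g. `drain_sim_th` for `[DRAIN]/sim_th`):
+
+```python
+from drain3 import TemplateMiner
+from drain3.template_miner_config import TemplateMinerConfig
+
+config = TemplateMinerConfig()
+config.load("drain3.ini")         # optional: start from an .ini file
+config.drain_sim_th = 0.4         # similarity threshold
+config.drain_depth = 4            # max depth levels of log clusters
+config.drain_max_children = 100   # max children of an internal node
+config.drain_max_clusters = 1024  # bound memory via LRU eviction
+
+template_miner = TemplateMiner(config=config)
+```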
+
+## Masking
+
+This feature allows masking specific variable parts of a log message with keywords, prior to passing it to Drain. A
+well-defined masking can improve template mining accuracy.
+
+Template parameters that do not match any custom mask in the preliminary masking phase are replaced with `<*>` by the
+Drain core.
+
+Use a list of regular expressions in the configuration file with the format `{'regex_pattern', 'mask_with'}` to set
+custom masking.
+
+For example, the following masking instructions in `drain3.ini` will mask IP addresses and integers:
+
+```
+[MASKING]
+masking = [
+ {"regex_pattern":"((?<=[^A-Za-z0-9])|^)(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})((?=[^A-Za-z0-9])|$)", "mask_with": "IP"},
+ {"regex_pattern":"((?<=[^A-Za-z0-9])|^)([\\-\\+]?\\d+)((?=[^A-Za-z0-9])|$)", "mask_with": "NUM"},
+ ]
+```
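+
+Masking can also be set up programmatically. A sketch, assuming the `MaskingInstruction` class in `drain3.masking`
+takes a regex pattern plus a mask name:
+
+```python
+from drain3 import TemplateMiner
+from drain3.masking import MaskingInstruction
+from drain3.template_miner_config import TemplateMinerConfig
+
+config = TemplateMinerConfig()
+# Same NUM instruction as in the .ini example above
+config.masking_instructions.append(
+    MaskingInstruction(r"((?<=[^A-Za-z0-9])|^)([\-\+]?\d+)((?=[^A-Za-z0-9])|$)", "NUM"))
+
+template_miner = TemplateMiner(config=config)
+result = template_miner.add_log_message("took 5 ms")
+print(result["template_mined"])  # expected: took <NUM> ms
+```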
+
+## Persistence
+
+The persistence feature saves and loads a snapshot of the Drain3 state in a (compressed) JSON format. This feature adds
+restart resiliency to Drain, allowing it to continue operating and to retain learned knowledge across restarts.
+
+Drain3 state includes the search tree and all the clusters that were identified up until snapshot time.
+
+The snapshot also persists the number of log messages matched to each cluster, and its `cluster_id`.
+
+An example of a snapshot:
+
+```json
+{
+ "clusters": [
+ {
+ "cluster_id": 1,
+ "log_template_tokens": [
+ "aa",
+ "aa",
+ "<*>"
+ ],
+ "py/object": "drain3_core.LogCluster",
+ "size": 2
+ },
+ {
+ "cluster_id": 2,
+ "log_template_tokens": [
+ "My",
+ "IP",
+ "is",
+ "<IP>"
+ ],
+ "py/object": "drain3_core.LogCluster",
+ "size": 1
+ }
+ ]
+}
+```
+
+This example snapshot persists two clusters with the templates:
+
+`["aa", "aa", "<*>"]` - occurs twice
+
+`["My", "IP", "is", "<IP>"]` - occurs once
+
+Snapshots are created on the following events:
+
+- `cluster_created` - on any new template
+- `cluster_template_changed` - on any update of a template
+- `periodic` - after n minutes from the last snapshot. This is intended to save cluster sizes even if no new template
+  was identified.
+
+Drain3 currently supports the following persistence modes:
+
+- **Kafka** - The snapshot is saved in a dedicated topic used only for snapshots - the last message in this topic is the
+  last snapshot that will be loaded after restart. For Kafka persistence, you need to provide `topic_name`. You may
+  also provide other `kwargs` that are supported by `kafka.KafkaConsumer` and `kafka.Producer`, e.g. `bootstrap_servers`
+  to change the Kafka endpoint (default is `localhost:9092`).
+
+- **Redis** - The snapshot is saved to a key in a Redis database (contributed by @matabares).
+
+- **File** - The snapshot is saved to a file.
+
+- **Memory** - The snapshot is saved to an in-memory object.
+
+- **None** - No persistence.
+
+Drain3 persistence can be easily extended to another medium / database by inheriting
+the [PersistenceHandler](drain3/persistence_handler.py) class, as sketched below.
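+
+A minimal sketch of such a custom handler, assuming `PersistenceHandler` declares `save_state(state)` and
+`load_state()`, where `state` is the snapshot bytes:
+
+```python
+from drain3 import TemplateMiner
+from drain3.persistence_handler import PersistenceHandler
+
+
+class SimpleFilePersistence(PersistenceHandler):
+    """Hypothetical handler that keeps the snapshot bytes in a plain file."""
+
+    def __init__(self, path):
+        self.path = path
+
+    def save_state(self, state):
+        with open(self.path, "wb") as f:
+            f.write(state)
+
+    def load_state(self):
+        try:
+            with open(self.path, "rb") as f:
+                return f.read()
+        except FileNotFoundError:
+            return None  # no snapshot saved yet
+
+
+template_miner = TemplateMiner(SimpleFilePersistence("drain3_state.bin"))
+```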
+
+## Training vs. Inference modes
+
+In some use cases, it is required to separate the training and inference phases.
+
+In the training phase you should call `template_miner.add_log_message(log_line)`. This will match the log line against an
+existing cluster (if similarity is above the threshold) or create a new cluster. It may also change the template of an
+existing cluster.
+
+In inference mode you should call `template_miner.match(log_line)`. This will match the log line against previously learned
+clusters only. No new clusters are created and templates of existing clusters are not changed. A match to an existing
+cluster has to be perfect, otherwise `None` is returned. You can use the persistence option to load previously trained
+clusters before inference.
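+
+A sketch of both phases, assuming the matched cluster object exposes its template via `get_template()`:
+
+```python
+from drain3 import TemplateMiner
+
+template_miner = TemplateMiner()
+
+# Training phase: clusters may be created or updated
+for line in ["user alice logged in", "user bob logged in"]:
+    template_miner.add_log_message(line)
+
+# Inference phase: read-only matching against the learned clusters
+cluster = template_miner.match("user carol logged in")
+if cluster is None:
+    print("no matching cluster")
+else:
+    print(cluster.get_template())  # expected: user <*> logged in
+```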
+
+## Memory efficiency
+
+This feature limits the max memory used by the model. It is particularly important for large and possibly unbounded log
+streams. This feature is controlled by the `max_clusters` parameter, which sets the max number of clusters/templates
+tracked by the model. When the limit is reached, new templates start to replace the old ones according to the Least
+Recently Used (LRU) eviction policy. This makes the model adapt quickly to the most recent templates in the log stream.
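+
+For instance, a sketch that caps the model at two clusters; `drain_max_clusters` and the `drain.clusters` accessor
+are assumptions, not part of the documentation above:
+
+```python
+from drain3 import TemplateMiner
+from drain3.template_miner_config import TemplateMinerConfig
+
+config = TemplateMinerConfig()
+config.drain_max_clusters = 2  # keep at most 2 templates (LRU eviction)
+
+template_miner = TemplateMiner(config=config)
+for line in ["alpha started", "beta started", "gamma started"]:  # 3 distinct templates
+    template_miner.add_log_message(line)
+
+# Only the 2 most recently used clusters remain tracked
+print(len(list(template_miner.drain.clusters)))  # expected: 2
+```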
+
+## Parameter Extraction
+
+Drain3 supports retrieving an ordered list of the variables in a log message, after its template was mined. Each parameter
+is accompanied by the name of the mask that was matched, or `*` for the catch-all mask.
+
+Parameter extraction is performed by generating a regular expression that matches the template and then applying it to
+the log message. When `exact_matching` is enabled (the default), the generated regex includes the regular expression
+defined in the relevant masking instructions. If there are multiple masking instructions with the same name, either match
+can satisfy the regex. It is possible to disable exact matching so that every variable is matched against a
+non-whitespace character sequence. This may improve performance at the expense of accuracy.
+
+Parameter extraction regexes generated per template are cached by default, to improve performance. You can control the
+cache size with the `[MASKING]/parameter_extraction_cache_capacity` configuration parameter.
+
+Sample usage:
+
+```python
+result = template_miner.add_log_message(log_line)
+params = template_miner.extract_parameters(
+ result["template_mined"], log_line, exact_matching=True)
+```
+
+For the input `"user johndoe logged in 11 minutes ago"`, the template would be:
+
+```
+"user <:*:> logged in <:NUM:> minutes ago"
+```
+
+... and the extracted parameters:
+
+```
+[
+ ExtractedParameter(value='johndoe', mask_name='*'),
+ ExtractedParameter(value='11', mask_name='NUM')
+]
+```
+
+## Installation
+
+Drain3 is available from [PyPI](https://pypi.org/project/drain3). To install, use `pip`:
+
+```
+pip3 install drain3
+```
+
+Note: If you decide to use Kafka or Redis persistence, you should install the relevant client library explicitly, since
+it is declared as an extra (optional) dependency, with either:
+
+```
+pip3 install kafka-python
+```
+
+-- or --
+
+```
+pip3 install redis
+```
+
+## Examples
+
+In order to run the examples directly from the repository, you need to install the dependencies. You can do that
+using *pipenv* by executing the following command (assuming pipenv is already installed):
+
+```shell
+python3 -m pipenv sync
+```
+
+#### Example 1 - `drain_stdin_demo`
+
+Run [examples/drain_stdin_demo.py](examples/drain_stdin_demo.py) from the root folder of the repository by:
+
+```
+python3 -m pipenv run python -m examples.drain_stdin_demo
+```
+
+This example runs Drain3 on input from stdin and persists to Kafka, to a file, or not at all.
+
+Change the `persistence_type` variable in the example to select the persistence mode.
+
+Enter several log lines using the command line. Press `q` to end online learn-and-match mode.
+
+Next, the demo switches to match (inference) only mode, in which no new clusters are trained and input is matched against
+previously trained clusters only. Press `q` again to finish execution.
+
+#### Example 2 - `drain_bigfile_demo`
+
+Run [examples/drain_bigfile_demo](examples/drain_bigfile_demo.py) from the root folder of the repository by:
+
+```
+python3 -m pipenv run python -m examples.drain_bigfile_demo
+```
+
+This example downloads a real-world log file (of an SSH server), processes all its lines, and then prints the resulting
+clusters, prefix tree, and performance statistics.
+
+#### Sample config file
+
+An example `drain3.ini` file with masking instructions can be found in the [examples](examples) folder as well.
+
+## Contributing
+
+Our project welcomes external contributions. Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for further details.
+
+## Change Log
+
+##### v0.9.11
+
+* Fixed possible DivideByZero error when the profiler is enabled - [Issue #65](https://github.com/IBM/Drain3/issues/65).
+
+##### v0.9.10
+
+* Fixed compatibility issue with Python 3.10 caused by removal of `KeysView`.
+
+##### v0.9.9
+
+* Added support for accurate log message parameter extraction in a new function - `extract_parameters()`. The
+ function `get_parameter_list()` is deprecated (Thanks to *@Impelon*).
+* Refactored `AbstractMaskingInstruction` as a base class for `RegexMaskingInstruction`, allowing the introduction of
+  other types of masking mechanisms.
+
+##### v0.9.8
+
+* Added a `full_search_strategy` option to `TemplateMiner.match()` and `Drain.match()`. See more info at
+ Issue [#48](https://github.com/IBM/Drain3/issues/48).
+* Added an option to disable parameterization of tokens that contain digits in
+ configuration: `TemplateMinerConfig.parametrize_numeric_tokens`
+* Loading a Drain snapshot now restores only the cluster state, not configuration parameters. This improves backwards
+ compatibility when introducing new Drain configuration parameters.
+
+##### v0.9.7
+
+* Fixed bug in original Drain: log clusters were created multiple times for log messages with fewer tokens
+ than `max_node_depth`.
+* Changed the `depth` property name to the more descriptive `max_node_depth`, as Drain always subtracts 2 from the `depth`
+  argument value. Also added a `log_cluster_depth` property to reflect the original value of the depth argument (Breaking Change).
+* Restricted `depth` param to minimum sensible value of 3.
+* Added log cluster count to nodes in `Drain.print_tree()`
+* Added optional log cluster details to `Drain.print_tree()`
+
+##### v0.9.6
+
+* Fix issue https://github.com/IBM/Drain3/issues/38: unnecessary update of the LRU cache when `max_clusters` is used
+  (thanks *@StanislawSwierc*).
+
+##### v0.9.5
+
+* Added: `TemplateMiner.match()` function for fast matching against existing clusters only.
+
+##### v0.9.4
+
+* Added: `TemplateMiner.get_parameter_list()` function to extract template parameters from a raw log message (thanks to
+  *@cwyalpha*).
+* Added an option to customize the mask wrapper - instead of the default `<*>`, `<NUM>` etc., you can select any wrapper
+  prefix or suffix by overriding `TemplateMinerConfig.mask_prefix` and `TemplateMinerConfig.mask_suffix`.
+* Fixed: the config `.ini` file is always read from the same folder as the source file in demos and tests (thanks *@RobinMaas95*).
+
+##### v0.9.3
+
+* Fixed: comparison of type int with type str in function `add_seq_to_prefix_tree` #28 (bug introduced in v0.9.1)
+
+##### v0.9.2
+
+* Updated jsonpickle version
+* Keys of the `id_to_cluster` dict are now persisted by jsonpickle as `int` instead of `str`, to avoid key type
+  conversion on snapshot load, which caused some issues.
+* Added cachetools dependency to `setup.py`.
+
+##### v0.9.1
+
+* Added option to configure `TemplateMiner` using a configuration object (without `.ini` file).
+* Support for `print_tree()` to a file/stream.
+* Added `MemoryBufferPersistence`
+* Added unit tests for state save/load.
+* Bug fix: missing type-conversion in state loading, introduced in v0.9.0
+* Refactor: Drain prefix tree keys are now of type `str` also for 1st level
+ (was `int` before), for type consistency.
+
+##### v0.9.0
+
+* Decrease memory footprint of the main data structures.
+* Added `max_clusters` option to limit the number of tracked clusters.
+* Changed cluster identifier type from str to int
+* Added more unit tests and CI
+
+##### v0.8.6
+
+* Added `extra_delimiters` configuration option to Drain
+
+##### v0.8.5
+
+* Profiler improvements
+
+##### v0.8.4
+
+* Masking speed improvement
+
+##### v0.8.3
+
+* Fix: profiler state after load from snapshot
+
+##### v0.8.2
+
+* Fixed snapshot backward compatibility to v0.7.9
+
+##### v0.8.1
+
+* Bugfix in profiling configuration read
+
+##### v0.8.0
+
+* Added time profiling support (disabled by default)
+* Added cluster ID to snapshot reason log (credit: @boernd)
+* Minor readability and documentation improvements in Drain
+
+##### v0.7.9
+
+* Fix: `KafkaPersistence` now also accepts `bootstrap_servers` as a kwarg.
+
+##### v0.7.8
+
+* Using `kafka-python` package instead of `kafka` (newer).
+* Added support for specifying additional configuration as `kwargs` in Kafka persistence handler.
+
+##### v0.7.7
+
+* Corrected default Drain config values.
+
+##### v0.7.6
+
+* Improvement in config file handling (Note: new sections were added instead of the `DEFAULT` section)
+
+##### v0.7.5
+
+* Made Kafka and Redis optional requirements
+
+
+
+%package -n python3-drain3
+Summary: Persistent & streaming log template miner
+Provides: python-drain3
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-drain3
+Persistent & streaming log template miner (same description as the python-drain3 source package above).
+
+
+%package help
+Summary: Development documents and examples for drain3
+Provides: python3-drain3-doc
+%description help
+Development documents and examples for drain3 (same description as the python-drain3 source package above).
+
+
+%prep
+%autosetup -n drain3-0.9.11
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-drain3 -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.9.11-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..9af2c1c
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+1f5016656c1d9c66d17222f7ae27e684 drain3-0.9.11.tar.gz