diff options
author | CoprDistGit <infra@openeuler.org> | 2023-05-10 08:18:29 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-05-10 08:18:29 +0000 |
commit | 76100fd50c92ef4c4b5ac23edb4600af64156e4e (patch) | |
tree | d57f380bb53cbded98c2d850e1cefdee072043ae | |
parent | 460388a85bbd8aa713d25eb27f43cc4ae5f73ad3 (diff) |
automatic import of python-carling
-rw-r--r-- | .gitignore | 1 | ||||
-rw-r--r-- | python-carling.spec | 749 | ||||
-rw-r--r-- | sources | 1 |
3 files changed, 751 insertions, 0 deletions
@@ -0,0 +1 @@ +/carling-0.3.5.tar.gz diff --git a/python-carling.spec b/python-carling.spec new file mode 100644 index 0000000..5ac823b --- /dev/null +++ b/python-carling.spec @@ -0,0 +1,749 @@ +%global _empty_manifest_terminate_build 0 +Name: python-carling +Version: 0.3.5 +Release: 1 +Summary: Useful transforms for supporting apache beam pipelines. +License: Apache-2.0 +URL: https://github.com/mc-digital/carling +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/61/ff/8fa7ff0dce0f7f87488ffa0f8f5b3b6d57dd8ff7f6cd0f7d47dd4808aab3/carling-0.3.5.tar.gz +BuildArch: noarch + +Requires: python3-apache-beam +Requires: python3-deepdiff + +%description +# Carling + +[](https://github.com/mc-digital/carling/actions?query=workflow%3ACI) +[](https://pypi.org/project/carling/) +[](https://pypi.org/project/carling/) +[](https://github.com/mc-digital/carling/blob/main/LICENSE) + +Via [Wikipedia](<https://en.wikipedia.org/wiki/Carling_(sailing)>): + +> Carlings are pieces of timber laid fore and aft under the deck of a ship, from one beam to another. +> They serve as a foundation for the whole body of the ship. + +Useful transforms for supporting our apache beam pipelines. + +## Mapping transform utils + +#### `carling.Label` + +Labels all elements. + +#### `carling.Select` + +Removes all columns which are not specified in `*keys`. + +#### `carling.Project` + +Transforms each element into a tuple of values of the specified columns. + +#### `carling.IndexBy` + +Transforms each element `V` into a tuple `(K, V)`. + +`K` is the projection of `V` by `*keys`, which is equal to the tuple +produced by the `Project` transform. + +#### `carling.Stringify` + +Transforms each element into its JSON representation. + +#### `carling.IndexBySingle` + +Transforms each element `V` into a tuple `(K, V)`. + +The difference between `IndexBySingle(key)` and `IndexBy(key)` with a single +key is as follows: + +- `IndexBySingle` produces the index as a plain value. +- `IndexBy` produces the index as a single-element tuple. + +#### `carling.RenameFromTo` + +Rename columns according to `from_to_key_mapping`. + +#### `carling.Exclude` + +Removes all columns specified in `*keys`. + +## Grouping transform utils + +Generic grouping transform utils + +#### `carling.UniqueOnly` + +Produces elements that are the only elements per key after deduplication. + +Given a `PCollection` of `(K, V)`, +this transform produces the collection of all `V`s that do not share +the same corresponding `K`s with any other elements after deduplicating +all equivalent `(K, V)` pairs. + +This transform is equivalent to `SingletonOnly` with `apache_beam.Distinct`. + +`[(1, "A"), (2, "B1"), (2, "B2"), (3, "C"), (3, "C"), (4, "A")]` will be +transformed into `["A", "C", "A"]`. + +#### `carling.SingletonOnly` + +Produces elements that are the only elements per key. + +Given a `PCollection` of `(K, V)`, +this transform produces the collection of all `V`s that do not share +the same corresponding `K`s with any other elements. + +`[(1, "A"), (2, "B1"), (2, "B2"), (3, "C"), (3, "C"), (4, "A")]` will be +transformed into `["A", "A"]`. + +#### `carling.Intersection` + +Produces the intersection of given `PCollection`s. + +Given a list of `PCollection`s, +this transform produces every element that appears in all collections of +the list. +Elements are deduplicated before taking the intersection. + +#### `carling.FilterByKey` + +Filters elements by their keys. + +The constructor receives one or more `PCollection`s of `K`s, +which are regarded as key lists. +Given a `PCollection` of `(K, V)`, +this transform discards all elements with `K`s that do not appear +in the key lists. + +If multiple collections are given to the constructor, +this transform treats the intersection of them as the key list. + +#### `carling.FilterByKeyUsingSideInput` + +Filters a single collection by a single lookup collection, using a common key. + +Given: - a `PCollection` (lookup_entries) of `(V)`, as a lookup collection - a `PCollection` (pcoll) of `(V)`, as values to be filtered - a common key (filter_key) + +A dictionary called `filter_dict` - is created by mapping the value of `filter_key` +for each entry in `lookup_entries` to True. + +Then, for each item in pcoll, the value associated with `filter_key` checkd against +`filter_dict`, and if it is found, the entry passes through. Otherwise, the entry is +discarded. + +Note: `lookup_entries` will be used as a **side input**, so care +must be taken regarding the size of the `lookup_entries` + +#### `carling.DifferencePerKey` + +Produces the difference per key between two `PCollection`s. + +Given two `PCollection`s of `V`, +this transform indexes the collections by the specified keys `primary_keys`, +compares corresponding two `V` lists for every `K`, +and produces the difference per `K`. +If there is no difference, this transform produces nothing. + +Two `V` lists are considered to be different if the numbers of elements +differ or two elements of the lists with a same index differ +at one of the specified columns `columns`. + +#### `carling.SortedSelectPerKey` + +- Groups items by a set of `keys` -- column names per row +- Emits the "MAX" _value_ for each collection as defined by the `key_fn` +- Can emit "MIN" by passing `reverse=True` kwarg + +#### `carling.PartitionRowsContainingNone` + +Emits two tagged PCollections: + +- Default (`result[None]`): Rows are guaranteed not to have any `None` values +- `result["contains_none"]`: Rows for which at least one column had a `None` value + +## Categorical + +#### `carling.CreateCategoricalDicts` + +For a set of columnular data inputs this function takes: + + - cat_cols: + + Type: `[str]` + + An array of "categorical" columns + + - existing_dicts: + + Type: `PCollection[(string, string, int)]` + + Rows of tuples of type: + (column, previously_seen_value, mapped_unique_int) + + Mapping a set of "previously seen" keys to unique int values for each + column. + Not optional. + If none exist, pass an empty PCollection. + +It then creates a transform which takes a pcollection and + + - looks at the input pcoll for unseen values in each categorical column + - creates new unique integers for each distinct unseen value, starting at max(previous value for column)+1 + - ammends the existing mappings with (col, unseen_value, new_unique_int) + +Output is: + + - Type: `PCollection[(string, string, int)]` + +This is useful for preparing data to be trained by eg. LightGBM + +#### `carling.ReplaceCategoricalColumns` + +- Utilizes the "categorical dictionary rows" generated by `CreateCategoricalDicts` which is a list of pairs of type of `(column, value,unique_int)`. + +- Replaces each column with the appropriate value found in the mapping. + +## Test Utils + +#### `carling.test_utils.pprint_equal_to` + +This module contains the `equal_to` function from apache beam, but adapted to output results using pretty print. Reading the results as a large, unformatted string makes it harder to pick out what changed/is missing. + +## General Util + +#### `carling.LogSample` + +Print items of the given `PCollection` to the log. + +`LogSample` prints the JSON representations of the input items to the Python's +standard logging system. + +To avoid too much log entries being printed, `LogSample` limits the number of +logged items. The constructor parameter `n` determines the limit. + +By default, `LogSample` prints logs with the `INFO` log level. The constructor +parameter `level` determines the level. + +#### `carling.ReifyMultiValueOption` + +Prepares multi-value delimited options. Useful in contexts where +you want to create a multi-value option in a template environment. + +- inputs: + - delimited string option + - optional delimiter string (default is "|") + +* output: + - Type: `PCollection[str]` + + +%package -n python3-carling +Summary: Useful transforms for supporting apache beam pipelines. +Provides: python-carling +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-carling +# Carling + +[](https://github.com/mc-digital/carling/actions?query=workflow%3ACI) +[](https://pypi.org/project/carling/) +[](https://pypi.org/project/carling/) +[](https://github.com/mc-digital/carling/blob/main/LICENSE) + +Via [Wikipedia](<https://en.wikipedia.org/wiki/Carling_(sailing)>): + +> Carlings are pieces of timber laid fore and aft under the deck of a ship, from one beam to another. +> They serve as a foundation for the whole body of the ship. + +Useful transforms for supporting our apache beam pipelines. + +## Mapping transform utils + +#### `carling.Label` + +Labels all elements. + +#### `carling.Select` + +Removes all columns which are not specified in `*keys`. + +#### `carling.Project` + +Transforms each element into a tuple of values of the specified columns. + +#### `carling.IndexBy` + +Transforms each element `V` into a tuple `(K, V)`. + +`K` is the projection of `V` by `*keys`, which is equal to the tuple +produced by the `Project` transform. + +#### `carling.Stringify` + +Transforms each element into its JSON representation. + +#### `carling.IndexBySingle` + +Transforms each element `V` into a tuple `(K, V)`. + +The difference between `IndexBySingle(key)` and `IndexBy(key)` with a single +key is as follows: + +- `IndexBySingle` produces the index as a plain value. +- `IndexBy` produces the index as a single-element tuple. + +#### `carling.RenameFromTo` + +Rename columns according to `from_to_key_mapping`. + +#### `carling.Exclude` + +Removes all columns specified in `*keys`. + +## Grouping transform utils + +Generic grouping transform utils + +#### `carling.UniqueOnly` + +Produces elements that are the only elements per key after deduplication. + +Given a `PCollection` of `(K, V)`, +this transform produces the collection of all `V`s that do not share +the same corresponding `K`s with any other elements after deduplicating +all equivalent `(K, V)` pairs. + +This transform is equivalent to `SingletonOnly` with `apache_beam.Distinct`. + +`[(1, "A"), (2, "B1"), (2, "B2"), (3, "C"), (3, "C"), (4, "A")]` will be +transformed into `["A", "C", "A"]`. + +#### `carling.SingletonOnly` + +Produces elements that are the only elements per key. + +Given a `PCollection` of `(K, V)`, +this transform produces the collection of all `V`s that do not share +the same corresponding `K`s with any other elements. + +`[(1, "A"), (2, "B1"), (2, "B2"), (3, "C"), (3, "C"), (4, "A")]` will be +transformed into `["A", "A"]`. + +#### `carling.Intersection` + +Produces the intersection of given `PCollection`s. + +Given a list of `PCollection`s, +this transform produces every element that appears in all collections of +the list. +Elements are deduplicated before taking the intersection. + +#### `carling.FilterByKey` + +Filters elements by their keys. + +The constructor receives one or more `PCollection`s of `K`s, +which are regarded as key lists. +Given a `PCollection` of `(K, V)`, +this transform discards all elements with `K`s that do not appear +in the key lists. + +If multiple collections are given to the constructor, +this transform treats the intersection of them as the key list. + +#### `carling.FilterByKeyUsingSideInput` + +Filters a single collection by a single lookup collection, using a common key. + +Given: - a `PCollection` (lookup_entries) of `(V)`, as a lookup collection - a `PCollection` (pcoll) of `(V)`, as values to be filtered - a common key (filter_key) + +A dictionary called `filter_dict` - is created by mapping the value of `filter_key` +for each entry in `lookup_entries` to True. + +Then, for each item in pcoll, the value associated with `filter_key` checkd against +`filter_dict`, and if it is found, the entry passes through. Otherwise, the entry is +discarded. + +Note: `lookup_entries` will be used as a **side input**, so care +must be taken regarding the size of the `lookup_entries` + +#### `carling.DifferencePerKey` + +Produces the difference per key between two `PCollection`s. + +Given two `PCollection`s of `V`, +this transform indexes the collections by the specified keys `primary_keys`, +compares corresponding two `V` lists for every `K`, +and produces the difference per `K`. +If there is no difference, this transform produces nothing. + +Two `V` lists are considered to be different if the numbers of elements +differ or two elements of the lists with a same index differ +at one of the specified columns `columns`. + +#### `carling.SortedSelectPerKey` + +- Groups items by a set of `keys` -- column names per row +- Emits the "MAX" _value_ for each collection as defined by the `key_fn` +- Can emit "MIN" by passing `reverse=True` kwarg + +#### `carling.PartitionRowsContainingNone` + +Emits two tagged PCollections: + +- Default (`result[None]`): Rows are guaranteed not to have any `None` values +- `result["contains_none"]`: Rows for which at least one column had a `None` value + +## Categorical + +#### `carling.CreateCategoricalDicts` + +For a set of columnular data inputs this function takes: + + - cat_cols: + + Type: `[str]` + + An array of "categorical" columns + + - existing_dicts: + + Type: `PCollection[(string, string, int)]` + + Rows of tuples of type: + (column, previously_seen_value, mapped_unique_int) + + Mapping a set of "previously seen" keys to unique int values for each + column. + Not optional. + If none exist, pass an empty PCollection. + +It then creates a transform which takes a pcollection and + + - looks at the input pcoll for unseen values in each categorical column + - creates new unique integers for each distinct unseen value, starting at max(previous value for column)+1 + - ammends the existing mappings with (col, unseen_value, new_unique_int) + +Output is: + + - Type: `PCollection[(string, string, int)]` + +This is useful for preparing data to be trained by eg. LightGBM + +#### `carling.ReplaceCategoricalColumns` + +- Utilizes the "categorical dictionary rows" generated by `CreateCategoricalDicts` which is a list of pairs of type of `(column, value,unique_int)`. + +- Replaces each column with the appropriate value found in the mapping. + +## Test Utils + +#### `carling.test_utils.pprint_equal_to` + +This module contains the `equal_to` function from apache beam, but adapted to output results using pretty print. Reading the results as a large, unformatted string makes it harder to pick out what changed/is missing. + +## General Util + +#### `carling.LogSample` + +Print items of the given `PCollection` to the log. + +`LogSample` prints the JSON representations of the input items to the Python's +standard logging system. + +To avoid too much log entries being printed, `LogSample` limits the number of +logged items. The constructor parameter `n` determines the limit. + +By default, `LogSample` prints logs with the `INFO` log level. The constructor +parameter `level` determines the level. + +#### `carling.ReifyMultiValueOption` + +Prepares multi-value delimited options. Useful in contexts where +you want to create a multi-value option in a template environment. + +- inputs: + - delimited string option + - optional delimiter string (default is "|") + +* output: + - Type: `PCollection[str]` + + +%package help +Summary: Development documents and examples for carling +Provides: python3-carling-doc +%description help +# Carling + +[](https://github.com/mc-digital/carling/actions?query=workflow%3ACI) +[](https://pypi.org/project/carling/) +[](https://pypi.org/project/carling/) +[](https://github.com/mc-digital/carling/blob/main/LICENSE) + +Via [Wikipedia](<https://en.wikipedia.org/wiki/Carling_(sailing)>): + +> Carlings are pieces of timber laid fore and aft under the deck of a ship, from one beam to another. +> They serve as a foundation for the whole body of the ship. + +Useful transforms for supporting our apache beam pipelines. + +## Mapping transform utils + +#### `carling.Label` + +Labels all elements. + +#### `carling.Select` + +Removes all columns which are not specified in `*keys`. + +#### `carling.Project` + +Transforms each element into a tuple of values of the specified columns. + +#### `carling.IndexBy` + +Transforms each element `V` into a tuple `(K, V)`. + +`K` is the projection of `V` by `*keys`, which is equal to the tuple +produced by the `Project` transform. + +#### `carling.Stringify` + +Transforms each element into its JSON representation. + +#### `carling.IndexBySingle` + +Transforms each element `V` into a tuple `(K, V)`. + +The difference between `IndexBySingle(key)` and `IndexBy(key)` with a single +key is as follows: + +- `IndexBySingle` produces the index as a plain value. +- `IndexBy` produces the index as a single-element tuple. + +#### `carling.RenameFromTo` + +Rename columns according to `from_to_key_mapping`. + +#### `carling.Exclude` + +Removes all columns specified in `*keys`. + +## Grouping transform utils + +Generic grouping transform utils + +#### `carling.UniqueOnly` + +Produces elements that are the only elements per key after deduplication. + +Given a `PCollection` of `(K, V)`, +this transform produces the collection of all `V`s that do not share +the same corresponding `K`s with any other elements after deduplicating +all equivalent `(K, V)` pairs. + +This transform is equivalent to `SingletonOnly` with `apache_beam.Distinct`. + +`[(1, "A"), (2, "B1"), (2, "B2"), (3, "C"), (3, "C"), (4, "A")]` will be +transformed into `["A", "C", "A"]`. + +#### `carling.SingletonOnly` + +Produces elements that are the only elements per key. + +Given a `PCollection` of `(K, V)`, +this transform produces the collection of all `V`s that do not share +the same corresponding `K`s with any other elements. + +`[(1, "A"), (2, "B1"), (2, "B2"), (3, "C"), (3, "C"), (4, "A")]` will be +transformed into `["A", "A"]`. + +#### `carling.Intersection` + +Produces the intersection of given `PCollection`s. + +Given a list of `PCollection`s, +this transform produces every element that appears in all collections of +the list. +Elements are deduplicated before taking the intersection. + +#### `carling.FilterByKey` + +Filters elements by their keys. + +The constructor receives one or more `PCollection`s of `K`s, +which are regarded as key lists. +Given a `PCollection` of `(K, V)`, +this transform discards all elements with `K`s that do not appear +in the key lists. + +If multiple collections are given to the constructor, +this transform treats the intersection of them as the key list. + +#### `carling.FilterByKeyUsingSideInput` + +Filters a single collection by a single lookup collection, using a common key. + +Given: - a `PCollection` (lookup_entries) of `(V)`, as a lookup collection - a `PCollection` (pcoll) of `(V)`, as values to be filtered - a common key (filter_key) + +A dictionary called `filter_dict` - is created by mapping the value of `filter_key` +for each entry in `lookup_entries` to True. + +Then, for each item in pcoll, the value associated with `filter_key` checkd against +`filter_dict`, and if it is found, the entry passes through. Otherwise, the entry is +discarded. + +Note: `lookup_entries` will be used as a **side input**, so care +must be taken regarding the size of the `lookup_entries` + +#### `carling.DifferencePerKey` + +Produces the difference per key between two `PCollection`s. + +Given two `PCollection`s of `V`, +this transform indexes the collections by the specified keys `primary_keys`, +compares corresponding two `V` lists for every `K`, +and produces the difference per `K`. +If there is no difference, this transform produces nothing. + +Two `V` lists are considered to be different if the numbers of elements +differ or two elements of the lists with a same index differ +at one of the specified columns `columns`. + +#### `carling.SortedSelectPerKey` + +- Groups items by a set of `keys` -- column names per row +- Emits the "MAX" _value_ for each collection as defined by the `key_fn` +- Can emit "MIN" by passing `reverse=True` kwarg + +#### `carling.PartitionRowsContainingNone` + +Emits two tagged PCollections: + +- Default (`result[None]`): Rows are guaranteed not to have any `None` values +- `result["contains_none"]`: Rows for which at least one column had a `None` value + +## Categorical + +#### `carling.CreateCategoricalDicts` + +For a set of columnular data inputs this function takes: + + - cat_cols: + + Type: `[str]` + + An array of "categorical" columns + + - existing_dicts: + + Type: `PCollection[(string, string, int)]` + + Rows of tuples of type: + (column, previously_seen_value, mapped_unique_int) + + Mapping a set of "previously seen" keys to unique int values for each + column. + Not optional. + If none exist, pass an empty PCollection. + +It then creates a transform which takes a pcollection and + + - looks at the input pcoll for unseen values in each categorical column + - creates new unique integers for each distinct unseen value, starting at max(previous value for column)+1 + - ammends the existing mappings with (col, unseen_value, new_unique_int) + +Output is: + + - Type: `PCollection[(string, string, int)]` + +This is useful for preparing data to be trained by eg. LightGBM + +#### `carling.ReplaceCategoricalColumns` + +- Utilizes the "categorical dictionary rows" generated by `CreateCategoricalDicts` which is a list of pairs of type of `(column, value,unique_int)`. + +- Replaces each column with the appropriate value found in the mapping. + +## Test Utils + +#### `carling.test_utils.pprint_equal_to` + +This module contains the `equal_to` function from apache beam, but adapted to output results using pretty print. Reading the results as a large, unformatted string makes it harder to pick out what changed/is missing. + +## General Util + +#### `carling.LogSample` + +Print items of the given `PCollection` to the log. + +`LogSample` prints the JSON representations of the input items to the Python's +standard logging system. + +To avoid too much log entries being printed, `LogSample` limits the number of +logged items. The constructor parameter `n` determines the limit. + +By default, `LogSample` prints logs with the `INFO` log level. The constructor +parameter `level` determines the level. + +#### `carling.ReifyMultiValueOption` + +Prepares multi-value delimited options. Useful in contexts where +you want to create a multi-value option in a template environment. + +- inputs: + - delimited string option + - optional delimiter string (default is "|") + +* output: + - Type: `PCollection[str]` + + +%prep +%autosetup -n carling-0.3.5 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-carling -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Wed May 10 2023 Python_Bot <Python_Bot@openeuler.org> - 0.3.5-1 +- Package Spec generated @@ -0,0 +1 @@ +4c9692013b312e3cfaa0811454e021a4 carling-0.3.5.tar.gz |