From ad8308f634d8c09760c9c2d25bf69abcb7043f89 Mon Sep 17 00:00:00 2001 From: CoprDistGit Date: Thu, 8 Jun 2023 20:45:52 +0000 Subject: automatic import of python-servicex-databinder --- python-servicex-databinder.spec | 682 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 682 insertions(+) create mode 100644 python-servicex-databinder.spec (limited to 'python-servicex-databinder.spec') diff --git a/python-servicex-databinder.spec b/python-servicex-databinder.spec new file mode 100644 index 0000000..8103d3f --- /dev/null +++ b/python-servicex-databinder.spec @@ -0,0 +1,682 @@ +%global _empty_manifest_terminate_build 0 +Name: python-servicex-databinder +Version: 0.4.1 +Release: 1 +Summary: ServiceX data management using a configuration file +License: BSD 3-clause +URL: https://github.com/kyungeonchoi/ServiceXDataBinder +Source0: https://mirrors.aliyun.com/pypi/web/packages/90/81/a0ad5b2ff865f382d8880c211b5e6bb6423e419dc41be66db00d76e9fe30/servicex_databinder-0.4.1.tar.gz +BuildArch: noarch + +Requires: python3-servicex +Requires: python3-tcut-to-qastle +Requires: python3-nest-asyncio +Requires: python3-tqdm +Requires: python3-pyarrow +Requires: python3-backoff +Requires: python3-func-adl-servicex + +%description +# ServiceX DataBinder + +

Release v0.4.1

+ +[![PyPI version](https://badge.fury.io/py/servicex-databinder.svg)](https://badge.fury.io/py/servicex-databinder) + +`servicex-databinder` is a user-analysis data management package using a single configuration file. +Samples with external data sources (e.g. `RucioDID` or `XRootDFiles`) utilize ServiceX to deliver user-selected columns with optional row filtering. + + +The following table shows the ServiceX transformers supported by DataBinder: + +| Input format | Code generator | Transformer | Output format | +| :--- | :---: | :---: | :---: | +| ROOT Ntuple | func-adl | `uproot` | `root` or `parquet` | +| ATLAS Release 21 xAOD | func-adl | `atlasr21`| `root` | + + + + +## Prerequisite +- [Access to a ServiceX instance](https://servicex.readthedocs.io/en/latest/user/getting-started/) +- Python 3.7+ + +## Installation +```shell +pip install servicex-databinder +``` + +## Configuration file + +A single YAML configuration file holds all the information DataBinder needs. + +The [following example configuration file](config_minimum.yaml) contains minimal fields. You can also download the [`servicex-opendata.yaml`](servicex-opendata.yaml) file to your working directory (rename it to `servicex.yaml`) and run DataBinder for OpenData without an access token. + +```yaml +General: + ServiceXName: servicex-opendata + OutputFormat: parquet + +Sample: + - Name: ggH125_ZZ4lep + XRootDFiles: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\ + /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root" + Tree: mini + Columns: lep_pt, lep_eta +``` + +The `General` block requires two mandatory options (`ServiceXName` and `OutputFormat`) as in the example above. + +The input dataset for each Sample can be defined by `RucioDID`, `XRootDFiles`, or `LocalPath`. + +A ServiceX query can be constructed with either TCut syntax or func-adl. +- Options for TCut syntax: `Filter` [1] and `Columns` +- Option for func-adl expression: `FuncADL` + +[1] `Filter` works only for scalar-type TBranches. 
+ +The output format can be either `Apache parquet` or `ROOT ntuple` for the `uproot` backend. Only the `ROOT ntuple` format is supported for the `xAOD` backend. + + +The following options are available: + + +| Option for `General` block | Description | DataType | +|:--------:|:------:|:------| +| `ServiceXName`* | ServiceX backend name in your `servicex.yaml` file
| `String` | +| `OutputDirectory` | Path to the directory for ServiceX delivered files | `String` | +| `OutputFormat`* | Output file format of ServiceX delivered data (`parquet` or `root` for `uproot` / `root` for `xaod`) | `String` | +| `WriteOutputDict` | Name of an output yaml file containing a nested Python dictionary of output file paths (located in the `OutputDirectory`) | `String` | +| `IgnoreServiceXCache` | Ignore the existing ServiceX cache and force new ServiceX requests | `Boolean` | +

*Mandatory options

+ +| Option for `Sample` block | Description |DataType | +|:--------:|:------:|:------| +| `Name` | Sample name defined by the user |`String` | +| `RucioDID` | Rucio Data Identifier (DID) for a given sample;
Can be multiple DIDs separated by comma |`String` | +| `XRootDFiles` | XRootD files (e.g. `root://`) for a given sample;
Can be multiple files separated by comma |`String` | +| `Tree` | Name of the input ROOT `TTree`;
Can be multiple `TTree`s separated by comma (`uproot` ONLY) |`String` | +| `Filter` | Selection in the TCut syntax, e.g. `jet_pt > 10e3 && jet_eta < 2.0` (TCut ONLY) |`String` | +| `Columns` | List of columns (or branches) to be delivered; multiple columns separated by commas (TCut ONLY) |`String` | +| `FuncADL` | func-adl expression for a given sample |`String` | +| `LocalPath` | Path to local files used directly (NO ServiceX transformation) | `String` | + + + + + +A config file can be simplified by using a `Definition` block. You can define placeholders under the `Definition` block, which will replace all matching placeholders in the values of the `Sample` block. Note that placeholders must start with `DEF_`. + +You can source each Sample using different ServiceX transformers. +The default transformer is set by `type` in `servicex.yaml`, but `Transformer` in the `General` block overrides it if present, and `Transformer` in each `Sample` overrides any previous transformer selection. + +The [following example configuration](config_maximum.yaml) shows how to use each option. + +```yaml +General: + ServiceXName: servicex-uc-af + Transformer: uproot + OutputFormat: root + OutputDirectory: /Users/kchoi/data_for_MLstudy + WriteOutputDict: fileset_ml_study + IgnoreServiceXCache: False + +Sample: + - Name: Signal + RucioDID: user.kchoi:user.kchoi.signalA, + user.kchoi:user.kchoi.signalB, + user.kchoi:user.kchoi.signalC + Tree: nominal + FuncADL: DEF_ttH_nominal_query + - Name: Background1 + XRootDFiles: DEF_ggH_input + Tree: mini + Filter: lep_n>2 + Columns: lep_pt, lep_eta + - Name: Background2 + Transformer: atlasr21 + RucioDID: DEF_Zee_input + FuncADL: DEF_Zee_query + - Name: Background3 + LocalPath: /Users/kchoi/Work/data/background3 + +Definition: + DEF_ttH_nominal_query: "Where(lambda e: e.met_met>150e3). 
\ + Select(lambda event: {'el_pt': event.el_pt, 'jet_e': event.jet_e, \ + 'jet_pt': event.jet_pt, 'met_met': event.met_met})" + DEF_ggH_input: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\ + /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root" + DEF_Zee_input: "mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.\ + merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00" + DEF_Zee_query: "SelectMany('lambda e: e.Jets(\"AntiKt4EMTopoJets\")'). \ + Where('lambda j: (j.pt() / 1000) > 30'). \ + Select('lambda j: j.pt() / 1000.0'). \ + AsROOTTTree('junk.root', 'my_tree', [\"JetPt\"])" +``` + + +## Deliver data + +```python +from servicex_databinder import DataBinder +sx_db = DataBinder('.yml') +out = sx_db.deliver() +``` + +The `deliver()` function returns a nested Python dictionary of the delivered files. + + +The input configuration can also be passed as a Python dictionary. + +Delivered Samples and files in the `OutputDirectory` are always synced with the DataBinder config file. + + + +## Error handling + +```python +failed_requests = sx_db.get_failed_requests() +``` + +If any ServiceX requests fail, `deliver()` prints the number of failed requests along with the Sample name, the Tree (if present), and the input dataset of each. You can get a full list of failed samples and their error messages from the `get_failed_requests()` function. If a message is unclear, browse `Logs` on the ServiceX instance webpage for details. + +## Useful tools + +### Create Rucio container for multiple DIDs + +ServiceX currently generates one request per Rucio DID. +A physics analysis often needs to process hundreds of DIDs. +In such cases, the script (`scripts/create_rucio_container.py`) can be used to create one Rucio container per Sample from a yaml file. +An example yaml file (`scripts/rucio_dids_example.yaml`) is included. 
+ +Here is the usage of the script: + +```shell +usage: create_rucio_containers.py [-h] [--dry-run DRY_RUN] + infile container_name version + +Create Rucio containers from multiple DIDs + +positional arguments: + infile yaml file contains Rucio DIDs for each Sample + container_name e.g. user.kchoi:user.kchoi..Sample.v1 + version e.g. user.kchoi:user.kchoi.fcnc_ana.Sample. + +optional arguments: + -h, --help show this help message and exit + --dry-run DRY_RUN Run without creating new Rucio container + +``` + +## Acknowledgements + +Support for this work was provided by the U.S. Department of Energy, Office of High Energy Physics under Grant No. DE-SC0007890. + + +%package -n python3-servicex-databinder +Summary: ServiceX data management using a configuration file +Provides: python-servicex-databinder +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-servicex-databinder +# ServiceX DataBinder + +

Release v0.4.1

+ +[![PyPI version](https://badge.fury.io/py/servicex-databinder.svg)](https://badge.fury.io/py/servicex-databinder) + +`servicex-databinder` is a user-analysis data management package using a single configuration file. +Samples with external data sources (e.g. `RucioDID` or `XRootDFiles`) utilize ServiceX to deliver user-selected columns with optional row filtering. + + +The following table shows the ServiceX transformers supported by DataBinder: + +| Input format | Code generator | Transformer | Output format | +| :--- | :---: | :---: | :---: | +| ROOT Ntuple | func-adl | `uproot` | `root` or `parquet` | +| ATLAS Release 21 xAOD | func-adl | `atlasr21`| `root` | + + + + +## Prerequisite +- [Access to a ServiceX instance](https://servicex.readthedocs.io/en/latest/user/getting-started/) +- Python 3.7+ + +## Installation +```shell +pip install servicex-databinder +``` + +## Configuration file + +A single YAML configuration file holds all the information DataBinder needs. + +The [following example configuration file](config_minimum.yaml) contains minimal fields. You can also download the [`servicex-opendata.yaml`](servicex-opendata.yaml) file to your working directory (rename it to `servicex.yaml`) and run DataBinder for OpenData without an access token. + +```yaml +General: + ServiceXName: servicex-opendata + OutputFormat: parquet + +Sample: + - Name: ggH125_ZZ4lep + XRootDFiles: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\ + /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root" + Tree: mini + Columns: lep_pt, lep_eta +``` + +The `General` block requires two mandatory options (`ServiceXName` and `OutputFormat`) as in the example above. + +The input dataset for each Sample can be defined by `RucioDID`, `XRootDFiles`, or `LocalPath`. + +A ServiceX query can be constructed with either TCut syntax or func-adl. +- Options for TCut syntax: `Filter` [1] and `Columns` +- Option for func-adl expression: `FuncADL` + +[1] `Filter` works only for scalar-type TBranches. 
+ +The output format can be either `Apache parquet` or `ROOT ntuple` for the `uproot` backend. Only the `ROOT ntuple` format is supported for the `xAOD` backend. + + +The following options are available: + + +| Option for `General` block | Description | DataType | +|:--------:|:------:|:------| +| `ServiceXName`* | ServiceX backend name in your `servicex.yaml` file
| `String` | +| `OutputDirectory` | Path to the directory for ServiceX delivered files | `String` | +| `OutputFormat`* | Output file format of ServiceX delivered data (`parquet` or `root` for `uproot` / `root` for `xaod`) | `String` | +| `WriteOutputDict` | Name of an output yaml file containing a nested Python dictionary of output file paths (located in the `OutputDirectory`) | `String` | +| `IgnoreServiceXCache` | Ignore the existing ServiceX cache and force new ServiceX requests | `Boolean` | +

*Mandatory options

+ +| Option for `Sample` block | Description |DataType | +|:--------:|:------:|:------| +| `Name` | Sample name defined by the user |`String` | +| `RucioDID` | Rucio Data Identifier (DID) for a given sample;
Can be multiple DIDs separated by comma |`String` | +| `XRootDFiles` | XRootD files (e.g. `root://`) for a given sample;
Can be multiple files separated by comma |`String` | +| `Tree` | Name of the input ROOT `TTree`;
Can be multiple `TTree`s separated by comma (`uproot` ONLY) |`String` | +| `Filter` | Selection in the TCut syntax, e.g. `jet_pt > 10e3 && jet_eta < 2.0` (TCut ONLY) |`String` | +| `Columns` | List of columns (or branches) to be delivered; multiple columns separated by commas (TCut ONLY) |`String` | +| `FuncADL` | func-adl expression for a given sample |`String` | +| `LocalPath` | Path to local files used directly (NO ServiceX transformation) | `String` | + + + + + +A config file can be simplified by using a `Definition` block. You can define placeholders under the `Definition` block, which will replace all matching placeholders in the values of the `Sample` block. Note that placeholders must start with `DEF_`. + +You can source each Sample using different ServiceX transformers. +The default transformer is set by `type` in `servicex.yaml`, but `Transformer` in the `General` block overrides it if present, and `Transformer` in each `Sample` overrides any previous transformer selection. + +The [following example configuration](config_maximum.yaml) shows how to use each option. + +```yaml +General: + ServiceXName: servicex-uc-af + Transformer: uproot + OutputFormat: root + OutputDirectory: /Users/kchoi/data_for_MLstudy + WriteOutputDict: fileset_ml_study + IgnoreServiceXCache: False + +Sample: + - Name: Signal + RucioDID: user.kchoi:user.kchoi.signalA, + user.kchoi:user.kchoi.signalB, + user.kchoi:user.kchoi.signalC + Tree: nominal + FuncADL: DEF_ttH_nominal_query + - Name: Background1 + XRootDFiles: DEF_ggH_input + Tree: mini + Filter: lep_n>2 + Columns: lep_pt, lep_eta + - Name: Background2 + Transformer: atlasr21 + RucioDID: DEF_Zee_input + FuncADL: DEF_Zee_query + - Name: Background3 + LocalPath: /Users/kchoi/Work/data/background3 + +Definition: + DEF_ttH_nominal_query: "Where(lambda e: e.met_met>150e3). 
\ + Select(lambda event: {'el_pt': event.el_pt, 'jet_e': event.jet_e, \ + 'jet_pt': event.jet_pt, 'met_met': event.met_met})" + DEF_ggH_input: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\ + /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root" + DEF_Zee_input: "mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.\ + merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00" + DEF_Zee_query: "SelectMany('lambda e: e.Jets(\"AntiKt4EMTopoJets\")'). \ + Where('lambda j: (j.pt() / 1000) > 30'). \ + Select('lambda j: j.pt() / 1000.0'). \ + AsROOTTTree('junk.root', 'my_tree', [\"JetPt\"])" +``` + + +## Deliver data + +```python +from servicex_databinder import DataBinder +sx_db = DataBinder('.yml') +out = sx_db.deliver() +``` + +The `deliver()` function returns a nested Python dictionary of the delivered files. + + +The input configuration can also be passed as a Python dictionary. + +Delivered Samples and files in the `OutputDirectory` are always synced with the DataBinder config file. + + + +## Error handling + +```python +failed_requests = sx_db.get_failed_requests() +``` + +If any ServiceX requests fail, `deliver()` prints the number of failed requests along with the Sample name, the Tree (if present), and the input dataset of each. You can get a full list of failed samples and their error messages from the `get_failed_requests()` function. If a message is unclear, browse `Logs` on the ServiceX instance webpage for details. + +## Useful tools + +### Create Rucio container for multiple DIDs + +ServiceX currently generates one request per Rucio DID. +A physics analysis often needs to process hundreds of DIDs. +In such cases, the script (`scripts/create_rucio_container.py`) can be used to create one Rucio container per Sample from a yaml file. +An example yaml file (`scripts/rucio_dids_example.yaml`) is included. 
+ +Here is the usage of the script: + +```shell +usage: create_rucio_containers.py [-h] [--dry-run DRY_RUN] + infile container_name version + +Create Rucio containers from multiple DIDs + +positional arguments: + infile yaml file contains Rucio DIDs for each Sample + container_name e.g. user.kchoi:user.kchoi..Sample.v1 + version e.g. user.kchoi:user.kchoi.fcnc_ana.Sample. + +optional arguments: + -h, --help show this help message and exit + --dry-run DRY_RUN Run without creating new Rucio container + +``` + +## Acknowledgements + +Support for this work was provided by the U.S. Department of Energy, Office of High Energy Physics under Grant No. DE-SC0007890. + + +%package help +Summary: Development documents and examples for servicex-databinder +Provides: python3-servicex-databinder-doc +%description help +# ServiceX DataBinder + +

Release v0.4.1

+ +[![PyPI version](https://badge.fury.io/py/servicex-databinder.svg)](https://badge.fury.io/py/servicex-databinder) + +`servicex-databinder` is a user-analysis data management package using a single configuration file. +Samples with external data sources (e.g. `RucioDID` or `XRootDFiles`) utilize ServiceX to deliver user-selected columns with optional row filtering. + + +The following table shows the ServiceX transformers supported by DataBinder: + +| Input format | Code generator | Transformer | Output format | +| :--- | :---: | :---: | :---: | +| ROOT Ntuple | func-adl | `uproot` | `root` or `parquet` | +| ATLAS Release 21 xAOD | func-adl | `atlasr21`| `root` | + + + + +## Prerequisite +- [Access to a ServiceX instance](https://servicex.readthedocs.io/en/latest/user/getting-started/) +- Python 3.7+ + +## Installation +```shell +pip install servicex-databinder +``` + +## Configuration file + +A single YAML configuration file holds all the information DataBinder needs. + +The [following example configuration file](config_minimum.yaml) contains minimal fields. You can also download the [`servicex-opendata.yaml`](servicex-opendata.yaml) file to your working directory (rename it to `servicex.yaml`) and run DataBinder for OpenData without an access token. + +```yaml +General: + ServiceXName: servicex-opendata + OutputFormat: parquet + +Sample: + - Name: ggH125_ZZ4lep + XRootDFiles: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\ + /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root" + Tree: mini + Columns: lep_pt, lep_eta +``` + +The `General` block requires two mandatory options (`ServiceXName` and `OutputFormat`) as in the example above. + +The input dataset for each Sample can be defined by `RucioDID`, `XRootDFiles`, or `LocalPath`. + +A ServiceX query can be constructed with either TCut syntax or func-adl. +- Options for TCut syntax: `Filter` [1] and `Columns` +- Option for func-adl expression: `FuncADL` + +[1] `Filter` works only for scalar-type TBranches. 
+ +The output format can be either `Apache parquet` or `ROOT ntuple` for the `uproot` backend. Only the `ROOT ntuple` format is supported for the `xAOD` backend. + + +The following options are available: + + +| Option for `General` block | Description | DataType | +|:--------:|:------:|:------| +| `ServiceXName`* | ServiceX backend name in your `servicex.yaml` file
| `String` | +| `OutputDirectory` | Path to the directory for ServiceX delivered files | `String` | +| `OutputFormat`* | Output file format of ServiceX delivered data (`parquet` or `root` for `uproot` / `root` for `xaod`) | `String` | +| `WriteOutputDict` | Name of an output yaml file containing a nested Python dictionary of output file paths (located in the `OutputDirectory`) | `String` | +| `IgnoreServiceXCache` | Ignore the existing ServiceX cache and force new ServiceX requests | `Boolean` | +

*Mandatory options

+ +| Option for `Sample` block | Description |DataType | +|:--------:|:------:|:------| +| `Name` | Sample name defined by the user |`String` | +| `RucioDID` | Rucio Data Identifier (DID) for a given sample;
Can be multiple DIDs separated by comma |`String` | +| `XRootDFiles` | XRootD files (e.g. `root://`) for a given sample;
Can be multiple files separated by comma |`String` | +| `Tree` | Name of the input ROOT `TTree`;
Can be multiple `TTree`s separated by comma (`uproot` ONLY) |`String` | +| `Filter` | Selection in the TCut syntax, e.g. `jet_pt > 10e3 && jet_eta < 2.0` (TCut ONLY) |`String` | +| `Columns` | List of columns (or branches) to be delivered; multiple columns separated by commas (TCut ONLY) |`String` | +| `FuncADL` | func-adl expression for a given sample |`String` | +| `LocalPath` | Path to local files used directly (NO ServiceX transformation) | `String` | + + + + + +A config file can be simplified by using a `Definition` block. You can define placeholders under the `Definition` block, which will replace all matching placeholders in the values of the `Sample` block. Note that placeholders must start with `DEF_`. + +You can source each Sample using different ServiceX transformers. +The default transformer is set by `type` in `servicex.yaml`, but `Transformer` in the `General` block overrides it if present, and `Transformer` in each `Sample` overrides any previous transformer selection. + +The [following example configuration](config_maximum.yaml) shows how to use each option. + +```yaml +General: + ServiceXName: servicex-uc-af + Transformer: uproot + OutputFormat: root + OutputDirectory: /Users/kchoi/data_for_MLstudy + WriteOutputDict: fileset_ml_study + IgnoreServiceXCache: False + +Sample: + - Name: Signal + RucioDID: user.kchoi:user.kchoi.signalA, + user.kchoi:user.kchoi.signalB, + user.kchoi:user.kchoi.signalC + Tree: nominal + FuncADL: DEF_ttH_nominal_query + - Name: Background1 + XRootDFiles: DEF_ggH_input + Tree: mini + Filter: lep_n>2 + Columns: lep_pt, lep_eta + - Name: Background2 + Transformer: atlasr21 + RucioDID: DEF_Zee_input + FuncADL: DEF_Zee_query + - Name: Background3 + LocalPath: /Users/kchoi/Work/data/background3 + +Definition: + DEF_ttH_nominal_query: "Where(lambda e: e.met_met>150e3). 
\ + Select(lambda event: {'el_pt': event.el_pt, 'jet_e': event.jet_e, \ + 'jet_pt': event.jet_pt, 'met_met': event.met_met})" + DEF_ggH_input: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\ + /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root" + DEF_Zee_input: "mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.\ + merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00" + DEF_Zee_query: "SelectMany('lambda e: e.Jets(\"AntiKt4EMTopoJets\")'). \ + Where('lambda j: (j.pt() / 1000) > 30'). \ + Select('lambda j: j.pt() / 1000.0'). \ + AsROOTTTree('junk.root', 'my_tree', [\"JetPt\"])" +``` + + +## Deliver data + +```python +from servicex_databinder import DataBinder +sx_db = DataBinder('.yml') +out = sx_db.deliver() +``` + +The `deliver()` function returns a nested Python dictionary of the delivered files. + + +The input configuration can also be passed as a Python dictionary. + +Delivered Samples and files in the `OutputDirectory` are always synced with the DataBinder config file. + + + +## Error handling + +```python +failed_requests = sx_db.get_failed_requests() +``` + +If any ServiceX requests fail, `deliver()` prints the number of failed requests along with the Sample name, the Tree (if present), and the input dataset of each. You can get a full list of failed samples and their error messages from the `get_failed_requests()` function. If a message is unclear, browse `Logs` on the ServiceX instance webpage for details. + +## Useful tools + +### Create Rucio container for multiple DIDs + +ServiceX currently generates one request per Rucio DID. +A physics analysis often needs to process hundreds of DIDs. +In such cases, the script (`scripts/create_rucio_container.py`) can be used to create one Rucio container per Sample from a yaml file. +An example yaml file (`scripts/rucio_dids_example.yaml`) is included. 
+ +Here is the usage of the script: + +```shell +usage: create_rucio_containers.py [-h] [--dry-run DRY_RUN] + infile container_name version + +Create Rucio containers from multiple DIDs + +positional arguments: + infile yaml file contains Rucio DIDs for each Sample + container_name e.g. user.kchoi:user.kchoi..Sample.v1 + version e.g. user.kchoi:user.kchoi.fcnc_ana.Sample. + +optional arguments: + -h, --help show this help message and exit + --dry-run DRY_RUN Run without creating new Rucio container + +``` + +## Acknowledgements + +Support for this work was provided by the U.S. Department of Energy, Office of High Energy Physics under Grant No. DE-SC0007890. + + +%prep +%autosetup -n servicex_databinder-0.4.1 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-servicex-databinder -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Thu Jun 08 2023 Python_Bot - 0.4.1-1 +- Package Spec generated -- cgit v1.2.3