Diffstat (limited to 'python-servicex-databinder.spec')
-rw-r--r-- python-servicex-databinder.spec | 682
1 file changed, 682 insertions, 0 deletions
diff --git a/python-servicex-databinder.spec b/python-servicex-databinder.spec
new file mode 100644
index 0000000..8103d3f
--- /dev/null
+++ b/python-servicex-databinder.spec
@@ -0,0 +1,682 @@
+%global _empty_manifest_terminate_build 0
+Name: python-servicex-databinder
+Version: 0.4.1
+Release: 1
+Summary: ServiceX data management using a configuration file
+License: BSD 3-clause
+URL: https://github.com/kyungeonchoi/ServiceXDataBinder
+Source0: https://mirrors.aliyun.com/pypi/web/packages/90/81/a0ad5b2ff865f382d8880c211b5e6bb6423e419dc41be66db00d76e9fe30/servicex_databinder-0.4.1.tar.gz
+BuildArch: noarch
+
+Requires: python3-servicex
+Requires: python3-tcut-to-qastle
+Requires: python3-nest-asyncio
+Requires: python3-tqdm
+Requires: python3-pyarrow
+Requires: python3-backoff
+Requires: python3-func-adl-servicex
+
+%description
+# ServiceX DataBinder
+
+<p align="right"> Release v0.4.1 </p>
+
+[![PyPI version](https://badge.fury.io/py/servicex-databinder.svg)](https://badge.fury.io/py/servicex-databinder)
+
+`servicex-databinder` is a user-analysis data management package using a single configuration file.
+Samples with external data sources (e.g. `RucioDID` or `XRootDFiles`) utilize ServiceX to deliver user-selected columns with optional row filtering.
+<!-- to interact with ServiceX instance to make ServiceX request(s) and manage ServiceX delivered data from a single configuration file. -->
+
+The following table shows the ServiceX transformers supported by DataBinder:
+
+| Input format | Code generator | Transformer | Output format |
+| :--- | :---: | :---: | :---: |
+| ROOT Ntuple | func-adl | `uproot` | `root` or `parquet` |
+| ATLAS Release 21 xAOD | func-adl | `atlasr21`| `root` |
+<!-- | ROOT Ntuple | python function | `python`| -->
+
+<!-- [`ServiceX`](https://github.com/ssl-hep/ServiceX) is a scalable HEP event data extraction, transformation and delivery system.
+
+['ServiceX Client library'](https://github.com/ssl-hep/ServiceX_frontend) provides -->
+
+## Prerequisite
+- [Access to a ServiceX instance](https://servicex.readthedocs.io/en/latest/user/getting-started/)
+- Python 3.7+
+
+## Installation
+```shell
+pip install servicex-databinder
+```
+
+## Configuration file
+
+The configuration file is a YAML file that contains all the information DataBinder needs.
+
+The [following example configuration file](config_minimum.yaml) contains the minimal set of fields. You can also download the [`servicex-opendata.yaml`](servicex-opendata.yaml) file into your working directory, rename it to `servicex.yaml`, and run DataBinder on OpenData without an access token.
+
+```yaml
+General:
+ ServiceXName: servicex-opendata
+ OutputFormat: parquet
+
+Sample:
+ - Name: ggH125_ZZ4lep
+ XRootDFiles: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\
+ /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
+ Tree: mini
+ Columns: lep_pt, lep_eta
+```
+
+The `General` block requires two mandatory options (`ServiceXName` and `OutputFormat`), as in the example above.
+
+The input dataset for each Sample can be defined by `RucioDID`, `XRootDFiles`, or `LocalPath`.
+
+A ServiceX query can be constructed with either TCut syntax or func-adl.
+- Options for TCut syntax: `Filter`<sup>1</sup> and `Columns`
+- Option for func-adl expression: `FuncADL`
+
+&nbsp; &nbsp; &nbsp; <sup>1</sup> `Filter` works only on scalar-type TBranches.
+
+The output format can be either `Apache parquet` or `ROOT ntuple` for the `uproot` backend. Only the `ROOT ntuple` format is supported for the `xAOD` backend.
+
+
+The following options are available:
+
+<!-- `General` block: -->
+| Option for `General` block | Description | DataType |
+|:--------:|:------:|:------|
+| `ServiceXName`* | ServiceX backend name in your `servicex.yaml` file <br> | `String` |
+| `OutputDirectory` | Path to the directory for ServiceX delivered files | `String` |
+| `OutputFormat`* | Output file format of ServiceX delivered data (`parquet` or `root` for `uproot` / `root` for `xaod`) | `String` |
+| `WriteOutputDict` | Name of an output yaml file containing a nested Python dictionary of output file paths (located in the `OutputDirectory`) | `String` |
+| `IgnoreServiceXCache` | Ignore the existing ServiceX cache and force to make ServiceX requests | `Boolean` |
+<p align="right"> *Mandatory options</p>
+
+| Option for `Sample` block | Description |DataType |
+|:--------:|:------:|:------|
+| `Name` | sample name defined by a user |`String` |
+| `RucioDID` | Rucio Dataset Id (DID) for a given sample; <br> Can be multiple DIDs separated by comma |`String` |
+| `XRootDFiles` | XRootD files (e.g. `root://`) for a given sample; <br> Can be multiple files separated by comma |`String` |
+| `Tree` | Name of the input ROOT `TTree`; <br> Can be multiple `TTree`s separated by comma (`uproot` ONLY) |`String` |
+| `Filter` | Selection in the TCut syntax, e.g. `jet_pt > 10e3 && jet_eta < 2.0` (TCut ONLY) |`String` |
+| `Columns` | List of columns (or branches) to be delivered; multiple columns separated by commas (TCut ONLY) |`String` |
+| `FuncADL` | func-adl expression for a given sample |`String` |
+| `LocalPath` | Path to local files, used directly (NO ServiceX transformation) | `String` |
+
+ <!-- Options exclusively for TCut syntax (CANNOT combine with the option `FuncADL`) -->
+
+ <!-- Option for func-adl expression (CANNOT combine with the options `Filter` and `Columns`) -->
+
+A config file can be simplified by using the `Definition` block. Placeholders defined under the `Definition` block replace all matching placeholders in the values of the `Sample` block. Note that placeholder names must start with `DEF_`.
+
+You can source each Sample using a different ServiceX transformer.
+The default transformer is set by `type` in `servicex.yaml`; `Transformer` in the `General` block overrides it if present, and `Transformer` in an individual `Sample` overrides any previous transformer selection.
+
+The [following example configuration](config_maximum.yaml) shows how to use each option.
+
+```yaml
+General:
+ ServiceXName: servicex-uc-af
+ Transformer: uproot
+ OutputFormat: root
+ OutputDirectory: /Users/kchoi/data_for_MLstudy
+ WriteOutputDict: fileset_ml_study
+ IgnoreServiceXCache: False
+
+Sample:
+ - Name: Signal
+ RucioDID: user.kchoi:user.kchoi.signalA,
+ user.kchoi:user.kchoi.signalB,
+ user.kchoi:user.kchoi.signalC
+ Tree: nominal
+ FuncADL: DEF_ttH_nominal_query
+ - Name: Background1
+ XRootDFiles: DEF_ggH_input
+ Tree: mini
+ Filter: lep_n>2
+ Columns: lep_pt, lep_eta
+ - Name: Background2
+ Transformer: atlasr21
+ RucioDID: DEF_Zee_input
+ FuncADL: DEF_Zee_query
+ - Name: Background3
+ LocalPath: /Users/kchoi/Work/data/background3
+
+Definition:
+ DEF_ttH_nominal_query: "Where(lambda e: e.met_met>150e3). \
+ Select(lambda event: {'el_pt': event.el_pt, 'jet_e': event.jet_e, \
+ 'jet_pt': event.jet_pt, 'met_met': event.met_met})"
+ DEF_ggH_input: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\
+ /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
+ DEF_Zee_input: "mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.\
+ merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00"
+ DEF_Zee_query: "SelectMany('lambda e: e.Jets(\"AntiKt4EMTopoJets\")'). \
+ Where('lambda j: (j.pt() / 1000) > 30'). \
+ Select('lambda j: j.pt() / 1000.0'). \
+ AsROOTTTree('junk.root', 'my_tree', [\"JetPt\"])"
+```
+
+
+## Deliver data
+
+```python
+from servicex_databinder import DataBinder
+sx_db = DataBinder('<CONFIG>.yml')
+out = sx_db.deliver()
+```
+
+The function `deliver()` returns a Python nested dictionary that contains delivered files.
+<!-- - for `uproot` backend and `parquet` output format: `out['<SAMPLE>']['<TREE>'] = [ List of output parquet files ]`
+- for `uproot` backend and `root` output format: `out['<SAMPLE>'] = [ List of output root files ]`
+- for `xAOD` backend: `out['<SAMPLE>'] = [ List of output root files ]` -->
+
+The input configuration can also be passed as a Python dictionary.
+
+Delivered Samples and files in the `OutputDirectory` are always synced with the DataBinder config file.
+
+<!-- ## Currently available
+- Dataset as Rucio DID + Input file format is ROOT TTree + ServiceX delivers output in parquet format
+- Dataset as Rucio DID + Input file format is ATLAS xAOD + ServiceX delivers output in ROOT TTree format
+- Dataset as XRootD + Input file format is ROOT TTree + ServiceX delivers output in parquet format -->
+
+## Error handling
+
+```python
+failed_requests = sx_db.get_failed_requests()
+```
+
+If any ServiceX request fails, `deliver()` prints the number of failed requests and, for each one, the Sample name, the Tree (if present), and the input dataset. You can get the full list of failed samples and their error messages with the `get_failed_requests()` function. If the message is unclear, browse the `Logs` on the ServiceX instance webpage for details.
+
+## Useful tools
+
+### Create Rucio container for multiple DIDs
+
+The current ServiceX generates one request per Rucio DID.
+It's often the case that a physics analysis needs to process hundreds of DIDs.
+In such cases, the script (`scripts/create_rucio_container.py`) can be used to create one Rucio container per Sample from a yaml file.
+An example yaml file (`scripts/rucio_dids_example.yaml`) is included.
+
+Here is the usage of the script:
+
+```shell
+usage: create_rucio_containers.py [-h] [--dry-run DRY_RUN]
+ infile container_name version
+
+Create Rucio containers from multiple DIDs
+
+positional arguments:
+ infile yaml file contains Rucio DIDs for each Sample
+ container_name e.g. user.kchoi:user.kchoi.<container-name>.Sample.v1
+ version e.g. user.kchoi:user.kchoi.fcnc_ana.Sample.<version>
+
+optional arguments:
+ -h, --help show this help message and exit
+ --dry-run DRY_RUN Run without creating new Rucio container
+
+```
+
+## Acknowledgements
+
+Support for this work was provided by the U.S. Department of Energy, Office of High Energy Physics, under Grant No. DE-SC0007890.
+
+
+%package -n python3-servicex-databinder
+Summary: ServiceX data management using a configuration file
+Provides: python-servicex-databinder
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-servicex-databinder
+# ServiceX DataBinder
+
+<p align="right"> Release v0.4.1 </p>
+
+[![PyPI version](https://badge.fury.io/py/servicex-databinder.svg)](https://badge.fury.io/py/servicex-databinder)
+
+`servicex-databinder` is a user-analysis data management package using a single configuration file.
+Samples with external data sources (e.g. `RucioDID` or `XRootDFiles`) utilize ServiceX to deliver user-selected columns with optional row filtering.
+<!-- to interact with ServiceX instance to make ServiceX request(s) and manage ServiceX delivered data from a single configuration file. -->
+
+The following table shows the ServiceX transformers supported by DataBinder:
+
+| Input format | Code generator | Transformer | Output format |
+| :--- | :---: | :---: | :---: |
+| ROOT Ntuple | func-adl | `uproot` | `root` or `parquet` |
+| ATLAS Release 21 xAOD | func-adl | `atlasr21`| `root` |
+<!-- | ROOT Ntuple | python function | `python`| -->
+
+<!-- [`ServiceX`](https://github.com/ssl-hep/ServiceX) is a scalable HEP event data extraction, transformation and delivery system.
+
+['ServiceX Client library'](https://github.com/ssl-hep/ServiceX_frontend) provides -->
+
+## Prerequisite
+- [Access to a ServiceX instance](https://servicex.readthedocs.io/en/latest/user/getting-started/)
+- Python 3.7+
+
+## Installation
+```shell
+pip install servicex-databinder
+```
+
+## Configuration file
+
+The configuration file is a YAML file that contains all the information DataBinder needs.
+
+The [following example configuration file](config_minimum.yaml) contains the minimal set of fields. You can also download the [`servicex-opendata.yaml`](servicex-opendata.yaml) file into your working directory, rename it to `servicex.yaml`, and run DataBinder on OpenData without an access token.
+
+```yaml
+General:
+ ServiceXName: servicex-opendata
+ OutputFormat: parquet
+
+Sample:
+ - Name: ggH125_ZZ4lep
+ XRootDFiles: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\
+ /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
+ Tree: mini
+ Columns: lep_pt, lep_eta
+```
+
+The `General` block requires two mandatory options (`ServiceXName` and `OutputFormat`), as in the example above.
+
+The input dataset for each Sample can be defined by `RucioDID`, `XRootDFiles`, or `LocalPath`.
+
+A ServiceX query can be constructed with either TCut syntax or func-adl.
+- Options for TCut syntax: `Filter`<sup>1</sup> and `Columns`
+- Option for func-adl expression: `FuncADL`
+
+&nbsp; &nbsp; &nbsp; <sup>1</sup> `Filter` works only on scalar-type TBranches.
+
+The output format can be either `Apache parquet` or `ROOT ntuple` for the `uproot` backend. Only the `ROOT ntuple` format is supported for the `xAOD` backend.
+
+
+The following options are available:
+
+<!-- `General` block: -->
+| Option for `General` block | Description | DataType |
+|:--------:|:------:|:------|
+| `ServiceXName`* | ServiceX backend name in your `servicex.yaml` file <br> | `String` |
+| `OutputDirectory` | Path to the directory for ServiceX delivered files | `String` |
+| `OutputFormat`* | Output file format of ServiceX delivered data (`parquet` or `root` for `uproot` / `root` for `xaod`) | `String` |
+| `WriteOutputDict` | Name of an output yaml file containing a nested Python dictionary of output file paths (located in the `OutputDirectory`) | `String` |
+| `IgnoreServiceXCache` | Ignore the existing ServiceX cache and force to make ServiceX requests | `Boolean` |
+<p align="right"> *Mandatory options</p>
+
+| Option for `Sample` block | Description |DataType |
+|:--------:|:------:|:------|
+| `Name` | sample name defined by a user |`String` |
+| `RucioDID` | Rucio Dataset Id (DID) for a given sample; <br> Can be multiple DIDs separated by comma |`String` |
+| `XRootDFiles` | XRootD files (e.g. `root://`) for a given sample; <br> Can be multiple files separated by comma |`String` |
+| `Tree` | Name of the input ROOT `TTree`; <br> Can be multiple `TTree`s separated by comma (`uproot` ONLY) |`String` |
+| `Filter` | Selection in the TCut syntax, e.g. `jet_pt > 10e3 && jet_eta < 2.0` (TCut ONLY) |`String` |
+| `Columns` | List of columns (or branches) to be delivered; multiple columns separated by commas (TCut ONLY) |`String` |
+| `FuncADL` | func-adl expression for a given sample |`String` |
+| `LocalPath` | Path to local files, used directly (NO ServiceX transformation) | `String` |
+
+ <!-- Options exclusively for TCut syntax (CANNOT combine with the option `FuncADL`) -->
+
+ <!-- Option for func-adl expression (CANNOT combine with the options `Filter` and `Columns`) -->
+
+A config file can be simplified by using the `Definition` block. Placeholders defined under the `Definition` block replace all matching placeholders in the values of the `Sample` block. Note that placeholder names must start with `DEF_`.
+
+You can source each Sample using a different ServiceX transformer.
+The default transformer is set by `type` in `servicex.yaml`; `Transformer` in the `General` block overrides it if present, and `Transformer` in an individual `Sample` overrides any previous transformer selection.
+
+The [following example configuration](config_maximum.yaml) shows how to use each option.
+
+```yaml
+General:
+ ServiceXName: servicex-uc-af
+ Transformer: uproot
+ OutputFormat: root
+ OutputDirectory: /Users/kchoi/data_for_MLstudy
+ WriteOutputDict: fileset_ml_study
+ IgnoreServiceXCache: False
+
+Sample:
+ - Name: Signal
+ RucioDID: user.kchoi:user.kchoi.signalA,
+ user.kchoi:user.kchoi.signalB,
+ user.kchoi:user.kchoi.signalC
+ Tree: nominal
+ FuncADL: DEF_ttH_nominal_query
+ - Name: Background1
+ XRootDFiles: DEF_ggH_input
+ Tree: mini
+ Filter: lep_n>2
+ Columns: lep_pt, lep_eta
+ - Name: Background2
+ Transformer: atlasr21
+ RucioDID: DEF_Zee_input
+ FuncADL: DEF_Zee_query
+ - Name: Background3
+ LocalPath: /Users/kchoi/Work/data/background3
+
+Definition:
+ DEF_ttH_nominal_query: "Where(lambda e: e.met_met>150e3). \
+ Select(lambda event: {'el_pt': event.el_pt, 'jet_e': event.jet_e, \
+ 'jet_pt': event.jet_pt, 'met_met': event.met_met})"
+ DEF_ggH_input: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\
+ /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
+ DEF_Zee_input: "mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.\
+ merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00"
+ DEF_Zee_query: "SelectMany('lambda e: e.Jets(\"AntiKt4EMTopoJets\")'). \
+ Where('lambda j: (j.pt() / 1000) > 30'). \
+ Select('lambda j: j.pt() / 1000.0'). \
+ AsROOTTTree('junk.root', 'my_tree', [\"JetPt\"])"
+```
+
+
+## Deliver data
+
+```python
+from servicex_databinder import DataBinder
+sx_db = DataBinder('<CONFIG>.yml')
+out = sx_db.deliver()
+```
+
+The function `deliver()` returns a Python nested dictionary that contains delivered files.
+<!-- - for `uproot` backend and `parquet` output format: `out['<SAMPLE>']['<TREE>'] = [ List of output parquet files ]`
+- for `uproot` backend and `root` output format: `out['<SAMPLE>'] = [ List of output root files ]`
+- for `xAOD` backend: `out['<SAMPLE>'] = [ List of output root files ]` -->
+
+The input configuration can also be passed as a Python dictionary.
+
+Delivered Samples and files in the `OutputDirectory` are always synced with the DataBinder config file.
+
+<!-- ## Currently available
+- Dataset as Rucio DID + Input file format is ROOT TTree + ServiceX delivers output in parquet format
+- Dataset as Rucio DID + Input file format is ATLAS xAOD + ServiceX delivers output in ROOT TTree format
+- Dataset as XRootD + Input file format is ROOT TTree + ServiceX delivers output in parquet format -->
+
+## Error handling
+
+```python
+failed_requests = sx_db.get_failed_requests()
+```
+
+If any ServiceX request fails, `deliver()` prints the number of failed requests and, for each one, the Sample name, the Tree (if present), and the input dataset. You can get the full list of failed samples and their error messages with the `get_failed_requests()` function. If the message is unclear, browse the `Logs` on the ServiceX instance webpage for details.
+
+## Useful tools
+
+### Create Rucio container for multiple DIDs
+
+The current ServiceX generates one request per Rucio DID.
+It's often the case that a physics analysis needs to process hundreds of DIDs.
+In such cases, the script (`scripts/create_rucio_container.py`) can be used to create one Rucio container per Sample from a yaml file.
+An example yaml file (`scripts/rucio_dids_example.yaml`) is included.
+
+Here is the usage of the script:
+
+```shell
+usage: create_rucio_containers.py [-h] [--dry-run DRY_RUN]
+ infile container_name version
+
+Create Rucio containers from multiple DIDs
+
+positional arguments:
+ infile yaml file contains Rucio DIDs for each Sample
+ container_name e.g. user.kchoi:user.kchoi.<container-name>.Sample.v1
+ version e.g. user.kchoi:user.kchoi.fcnc_ana.Sample.<version>
+
+optional arguments:
+ -h, --help show this help message and exit
+ --dry-run DRY_RUN Run without creating new Rucio container
+
+```
+
+## Acknowledgements
+
+Support for this work was provided by the U.S. Department of Energy, Office of High Energy Physics, under Grant No. DE-SC0007890.
+
+
+%package help
+Summary: Development documents and examples for servicex-databinder
+Provides: python3-servicex-databinder-doc
+%description help
+# ServiceX DataBinder
+
+<p align="right"> Release v0.4.1 </p>
+
+[![PyPI version](https://badge.fury.io/py/servicex-databinder.svg)](https://badge.fury.io/py/servicex-databinder)
+
+`servicex-databinder` is a user-analysis data management package using a single configuration file.
+Samples with external data sources (e.g. `RucioDID` or `XRootDFiles`) utilize ServiceX to deliver user-selected columns with optional row filtering.
+<!-- to interact with ServiceX instance to make ServiceX request(s) and manage ServiceX delivered data from a single configuration file. -->
+
+The following table shows the ServiceX transformers supported by DataBinder:
+
+| Input format | Code generator | Transformer | Output format |
+| :--- | :---: | :---: | :---: |
+| ROOT Ntuple | func-adl | `uproot` | `root` or `parquet` |
+| ATLAS Release 21 xAOD | func-adl | `atlasr21`| `root` |
+<!-- | ROOT Ntuple | python function | `python`| -->
+
+<!-- [`ServiceX`](https://github.com/ssl-hep/ServiceX) is a scalable HEP event data extraction, transformation and delivery system.
+
+['ServiceX Client library'](https://github.com/ssl-hep/ServiceX_frontend) provides -->
+
+## Prerequisite
+- [Access to a ServiceX instance](https://servicex.readthedocs.io/en/latest/user/getting-started/)
+- Python 3.7+
+
+## Installation
+```shell
+pip install servicex-databinder
+```
+
+## Configuration file
+
+The configuration file is a YAML file that contains all the information DataBinder needs.
+
+The [following example configuration file](config_minimum.yaml) contains the minimal set of fields. You can also download the [`servicex-opendata.yaml`](servicex-opendata.yaml) file into your working directory, rename it to `servicex.yaml`, and run DataBinder on OpenData without an access token.
+
+```yaml
+General:
+ ServiceXName: servicex-opendata
+ OutputFormat: parquet
+
+Sample:
+ - Name: ggH125_ZZ4lep
+ XRootDFiles: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\
+ /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
+ Tree: mini
+ Columns: lep_pt, lep_eta
+```
+
+The `General` block requires two mandatory options (`ServiceXName` and `OutputFormat`), as in the example above.
+
+The input dataset for each Sample can be defined by `RucioDID`, `XRootDFiles`, or `LocalPath`.
+
+A ServiceX query can be constructed with either TCut syntax or func-adl.
+- Options for TCut syntax: `Filter`<sup>1</sup> and `Columns`
+- Option for func-adl expression: `FuncADL`
+
+&nbsp; &nbsp; &nbsp; <sup>1</sup> `Filter` works only on scalar-type TBranches.
+
+The output format can be either `Apache parquet` or `ROOT ntuple` for the `uproot` backend. Only the `ROOT ntuple` format is supported for the `xAOD` backend.
+
+
+The following options are available:
+
+<!-- `General` block: -->
+| Option for `General` block | Description | DataType |
+|:--------:|:------:|:------|
+| `ServiceXName`* | ServiceX backend name in your `servicex.yaml` file <br> | `String` |
+| `OutputDirectory` | Path to the directory for ServiceX delivered files | `String` |
+| `OutputFormat`* | Output file format of ServiceX delivered data (`parquet` or `root` for `uproot` / `root` for `xaod`) | `String` |
+| `WriteOutputDict` | Name of an output yaml file containing a nested Python dictionary of output file paths (located in the `OutputDirectory`) | `String` |
+| `IgnoreServiceXCache` | Ignore the existing ServiceX cache and force to make ServiceX requests | `Boolean` |
+<p align="right"> *Mandatory options</p>
+
+| Option for `Sample` block | Description |DataType |
+|:--------:|:------:|:------|
+| `Name` | sample name defined by a user |`String` |
+| `RucioDID` | Rucio Dataset Id (DID) for a given sample; <br> Can be multiple DIDs separated by comma |`String` |
+| `XRootDFiles` | XRootD files (e.g. `root://`) for a given sample; <br> Can be multiple files separated by comma |`String` |
+| `Tree` | Name of the input ROOT `TTree`; <br> Can be multiple `TTree`s separated by comma (`uproot` ONLY) |`String` |
+| `Filter` | Selection in the TCut syntax, e.g. `jet_pt > 10e3 && jet_eta < 2.0` (TCut ONLY) |`String` |
+| `Columns` | List of columns (or branches) to be delivered; multiple columns separated by commas (TCut ONLY) |`String` |
+| `FuncADL` | func-adl expression for a given sample |`String` |
+| `LocalPath` | Path to local files, used directly (NO ServiceX transformation) | `String` |
+
+ <!-- Options exclusively for TCut syntax (CANNOT combine with the option `FuncADL`) -->
+
+ <!-- Option for func-adl expression (CANNOT combine with the options `Filter` and `Columns`) -->
+
+A config file can be simplified by using the `Definition` block. Placeholders defined under the `Definition` block replace all matching placeholders in the values of the `Sample` block. Note that placeholder names must start with `DEF_`.
+
+You can source each Sample using a different ServiceX transformer.
+The default transformer is set by `type` in `servicex.yaml`; `Transformer` in the `General` block overrides it if present, and `Transformer` in an individual `Sample` overrides any previous transformer selection.
+
+The [following example configuration](config_maximum.yaml) shows how to use each option.
+
+```yaml
+General:
+ ServiceXName: servicex-uc-af
+ Transformer: uproot
+ OutputFormat: root
+ OutputDirectory: /Users/kchoi/data_for_MLstudy
+ WriteOutputDict: fileset_ml_study
+ IgnoreServiceXCache: False
+
+Sample:
+ - Name: Signal
+ RucioDID: user.kchoi:user.kchoi.signalA,
+ user.kchoi:user.kchoi.signalB,
+ user.kchoi:user.kchoi.signalC
+ Tree: nominal
+ FuncADL: DEF_ttH_nominal_query
+ - Name: Background1
+ XRootDFiles: DEF_ggH_input
+ Tree: mini
+ Filter: lep_n>2
+ Columns: lep_pt, lep_eta
+ - Name: Background2
+ Transformer: atlasr21
+ RucioDID: DEF_Zee_input
+ FuncADL: DEF_Zee_query
+ - Name: Background3
+ LocalPath: /Users/kchoi/Work/data/background3
+
+Definition:
+ DEF_ttH_nominal_query: "Where(lambda e: e.met_met>150e3). \
+ Select(lambda event: {'el_pt': event.el_pt, 'jet_e': event.jet_e, \
+ 'jet_pt': event.jet_pt, 'met_met': event.met_met})"
+ DEF_ggH_input: "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\
+ /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
+ DEF_Zee_input: "mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.\
+ merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00"
+ DEF_Zee_query: "SelectMany('lambda e: e.Jets(\"AntiKt4EMTopoJets\")'). \
+ Where('lambda j: (j.pt() / 1000) > 30'). \
+ Select('lambda j: j.pt() / 1000.0'). \
+ AsROOTTTree('junk.root', 'my_tree', [\"JetPt\"])"
+```
+
+
+## Deliver data
+
+```python
+from servicex_databinder import DataBinder
+sx_db = DataBinder('<CONFIG>.yml')
+out = sx_db.deliver()
+```
+
+The function `deliver()` returns a Python nested dictionary that contains delivered files.
+<!-- - for `uproot` backend and `parquet` output format: `out['<SAMPLE>']['<TREE>'] = [ List of output parquet files ]`
+- for `uproot` backend and `root` output format: `out['<SAMPLE>'] = [ List of output root files ]`
+- for `xAOD` backend: `out['<SAMPLE>'] = [ List of output root files ]` -->
+
+The input configuration can also be passed as a Python dictionary.
+
+Delivered Samples and files in the `OutputDirectory` are always synced with the DataBinder config file.
+
+<!-- ## Currently available
+- Dataset as Rucio DID + Input file format is ROOT TTree + ServiceX delivers output in parquet format
+- Dataset as Rucio DID + Input file format is ATLAS xAOD + ServiceX delivers output in ROOT TTree format
+- Dataset as XRootD + Input file format is ROOT TTree + ServiceX delivers output in parquet format -->
+
+## Error handling
+
+```python
+failed_requests = sx_db.get_failed_requests()
+```
+
+If any ServiceX request fails, `deliver()` prints the number of failed requests and, for each one, the Sample name, the Tree (if present), and the input dataset. You can get the full list of failed samples and their error messages with the `get_failed_requests()` function. If the message is unclear, browse the `Logs` on the ServiceX instance webpage for details.
+
+## Useful tools
+
+### Create Rucio container for multiple DIDs
+
+The current ServiceX generates one request per Rucio DID.
+It's often the case that a physics analysis needs to process hundreds of DIDs.
+In such cases, the script (`scripts/create_rucio_container.py`) can be used to create one Rucio container per Sample from a yaml file.
+An example yaml file (`scripts/rucio_dids_example.yaml`) is included.
+
+Here is the usage of the script:
+
+```shell
+usage: create_rucio_containers.py [-h] [--dry-run DRY_RUN]
+ infile container_name version
+
+Create Rucio containers from multiple DIDs
+
+positional arguments:
+ infile yaml file contains Rucio DIDs for each Sample
+ container_name e.g. user.kchoi:user.kchoi.<container-name>.Sample.v1
+ version e.g. user.kchoi:user.kchoi.fcnc_ana.Sample.<version>
+
+optional arguments:
+ -h, --help show this help message and exit
+ --dry-run DRY_RUN Run without creating new Rucio container
+
+```
+
+## Acknowledgements
+
+Support for this work was provided by the U.S. Department of Energy, Office of High Energy Physics, under Grant No. DE-SC0007890.
+
+
+%prep
+%autosetup -n servicex_databinder-0.4.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-servicex-databinder -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Thu Jun 08 2023 Python_Bot <Python_Bot@openeuler.org> - 0.4.1-1
+- Package Spec generated