Diffstat (limited to 'python-shelephant.spec')
-rw-r--r-- | python-shelephant.spec | 1170 |
1 files changed, 1170 insertions, 0 deletions
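The README embedded in this spec describes restartable copying: only files that are missing or only partially copied are transferred again, because their checksum does not match the precomputed one. Below is a minimal Python sketch of that skip decision. It is not *shelephant*'s own implementation. It assumes SHA-256 checksums stored as hex strings in the same order as the file list, which is consistent with the README's example values (they are the SHA-256 digests of the strings `foo` and `bar`). The helper names `sha256_of` and `files_to_transfer` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_transfer(files: list[str], checksums: list[str], dest: Path) -> list[str]:
    """Return the files that still need copying to `dest` (hypothetical helper).

    A file is skipped only if it exists in `dest` and its checksum matches the
    precomputed one; missing or partially copied files stay in the list,
    in their original order.
    """
    if len(files) != len(checksums):
        raise ValueError("every file needs exactly one checksum")
    pending = []
    for name, expected in zip(files, checksums):
        target = dest / name
        if not target.is_file() or sha256_of(target) != expected:
            pending.append(name)
    return pending
```

For example, if `foo.txt` was copied completely and the copy of `bar.txt` was interrupted, only `bar.txt` is returned; once it has been copied in full, the list is empty. Keeping the input order mirrors the README's point that the order in the YAML file can be customised before copying.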
diff --git a/python-shelephant.spec b/python-shelephant.spec
new file mode 100644
index 0000000..9f980c3
--- /dev/null
+++ b/python-shelephant.spec
@@ -0,0 +1,1170 @@
+%global _empty_manifest_terminate_build 0
+Name: python-shelephant
+Version: 0.21.8
+Release: 1
+Summary: Simple dataset management
+License: MIT License
+URL: https://pypi.org/project/shelephant/
+Source0: https://mirrors.aliyun.com/pypi/web/packages/5e/d8/54b1087d508a895fca47f2523a58a4d998aaaacd90ca8a83a8603c507034/shelephant-0.21.8.tar.gz
+BuildArch: noarch
+
+Requires: python3-click
+Requires: python3-numpy
+Requires: python3-platformdirs
+Requires: python3-prettytable
+Requires: python3-pyyaml
+Requires: python3-tqdm
+
+%description
+# shelephant
+
+[](https://github.com/tdegeus/shelephant/actions)
+[](https://shelephant.readthedocs.io/?badge=latest)
+[](https://anaconda.org/conda-forge/shelephant)
+[](https://pypi.org/project/shelephant/)
+
+Command-line arguments with a memory (stored in YAML-files).
+
+**Documentation: https://shelephant.readthedocs.io**
+
+# Contents
+
+<!-- MarkdownTOC -->
+
+- [Overview](#overview)
+    - [Hallmark feature: Copy with restart](#hallmark-feature-copy-with-restart)
+    - [Command-line tools](#command-line-tools)
+        - [File information](#file-information)
+        - [File operations](#file-operations)
+        - [YAML file operations](#yaml-file-operations)
+- [Disclaimer](#disclaimer)
+- [Getting shelephant](#getting-shelephant)
+    - [Using conda](#using-conda)
+    - [Using PyPI](#using-pypi)
+    - [From source](#from-source)
+- [Detailed examples](#detailed-examples)
+    - [Get files from remote, allowing restarts](#get-files-from-remote-allowing-restarts)
+        - [Avoid recomputing checksums](#avoid-recomputing-checksums)
+    - [Send files to host](#send-files-to-host)
+        - [Basic copy](#basic-copy)
+        - [Restart](#restart)
+
+<!-- /MarkdownTOC -->
+
+# Overview
+
+## Hallmark feature: Copy with restart
+
+*shelephant* presents you with a way to copy files (from a remote, using SSH) in two steps:
+1. Collect a list of files that should be copied in a YAML-file,
+   allowing you to **review and customise** the copy operation
+   (e.g. by *changing the order* and making last-minute manual changes).
+2. Perform the copy, efficiently skipping files that are identical.
+
+Typical workflow:
+
+```bash
+# Collect files to copy & compute their checksum (e.g. on the remote system)
+# - creates "shelephant_dump.yaml"
+shelephant_dump *.hdf5
+# - reads "shelephant_dump.yaml"
+# - creates "shelephant_checksum.yaml"
+shelephant_checksum
+
+# Combine all needed info (locally)
+# - reads "shelephant_dump.yaml" and "shelephant_checksum.yaml"
+# - creates "shelephant_hostinfo.yaml"
+shelephant_hostinfo --host myhost --prefix /some/path --files --checksum
+
+# Copy from remote (can be restarted at any time; files that are already identical are skipped)
+# - reads "shelephant_hostinfo.yaml"
+shelephant_get
+```
+
+> * The filenames can be customised.
+> * To copy *to* a remote system, use `shelephant_send`.
+> * For details, see the help of the respective commands, e.g. `shelephant_dump --help`.
+> * *shelephant* works for both local and remote copy actions.
+
+## Command-line tools
+
+### File information
+
+* `shelephant_dump`: list filenames in a YAML file.
+* `shelephant_checksum`: get the checksums of files listed in a YAML file.
+* `shelephant_hostinfo`: collect host information (from a remote system).
+
+### File operations
+
+* `shelephant_get`: copy from remote, based on earlier stored information.
+* `shelephant_send`: copy to remote, based on earlier stored information.
+* `shelephant_rm`: remove files listed in a YAML file.
+* `shelephant_cp`: copy files listed in a YAML file.
+* `shelephant_mv`: move files listed in a YAML file.
+
+### YAML file operations
+
+* `shelephant_extract`: isolate a (number of) field(s) in a (new) YAML file.
+* `shelephant_merge`: merge two YAML-files.
+* `shelephant_parse`: parse a YAML-file and print it to the screen.
+
+# Disclaimer
+
+This library is free to use under the [MIT license](https://github.com/tdegeus/shelephant/blob/master/LICENSE). Any additions are very much appreciated, in terms of suggested functionality, code, documentation, testimonials, word-of-mouth advertisement, etc. Bug reports or feature requests can be filed on [GitHub](https://github.com/tdegeus/shelephant). As always, the code comes with no guarantee. None of the developers can be held responsible for possible mistakes.
+
+Download: [.zip file](https://github.com/tdegeus/shelephant/zipball/master) | [.tar.gz file](https://github.com/tdegeus/shelephant/tarball/master).
+
+(c - [MIT](https://github.com/tdegeus/shelephant/blob/master/LICENSE)) T.W.J. de Geus (Tom) | tom@geus.me | www.geus.me | [github.com/tdegeus/shelephant](https://github.com/tdegeus/shelephant)
+
+# Getting shelephant
+
+## Using conda
+
+```bash
+conda install -c conda-forge shelephant
+```
+
+This will also download and install all necessary dependencies.
+
+## Using PyPI
+
+```bash
+pip install shelephant
+```
+
+This will also download and install the necessary Python modules.
+
+## From source
+
+```bash
+# Download shelephant
+git clone https://github.com/tdegeus/shelephant.git
+cd shelephant
+
+# Install
+python -m pip install .
+```
+
+This will also download and install the necessary Python modules.
+
+
+# Detailed examples
+
+## Get files from remote, allowing restarts
+
+Suppose that we want to copy all `*.txt` files
+from a certain directory `/path/where/files/are/stored/on/remote` on a remote host `hostname`.
+
+First step, collect information *on the host*:
+
+```bash
+# connect to the host
+ssh hostname
+
+# go to the relevant location on the host
+cd "/path/where/files/are/stored/on/remote"
+
+# list files to copy
+shelephant_dump -o files_to_copy.yaml *.txt
+
+# optional but useful: get the checksums of the files to copy
+shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
+
+# disconnect
+exit # or press Ctrl + D
+```
+
+Second step, copy files to the *local system*, collecting everything in a single place:
+
+```bash
+# go to the relevant location on the local system
+# (often this is a new directory)
+cd "/path/where/to/copy/to"
+
+# get the file-information compiled on the host
+# and store it in a (temporary) local file;
+# note that all paths refer to the remote system,
+# and that these files are copied using secure copy (scp)
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/files/are/stored/on/remote" \
+    --files "files_to_copy.yaml" \
+    --checksum "files_checksum.yaml"
+
+# finally, get the files using secure copy
+# (the files are stored relative to the path of 'remote_info.yaml',
+# identically to how they are relative to 'files_to_copy.yaml' on remote)
+shelephant_get remote_info.yaml
+```
+
+> If you use the default filenames for `shelephant_dump` (`shelephant_dump.yaml`) and
+> `shelephant_checksum` (`shelephant_checksum.yaml`) remotely,
+> you can also specify `--files` and `--checksum` without an argument.
+
+A benefit of having computed the checksums on the host
+is that `shelephant_get` can be stopped and restarted:
+**only files that do not exist locally, or that were only partially copied
+(whose checksum does not match the remotely computed checksum), will be copied;
+all fully copied files will be skipped**.
+
+Let's further illustrate with a complete example. On the host, suppose that we have:
+```none
+/path/where/files/are/stored/on/remote
+- foo.txt
+- bar.txt
+```
+
+This will give `files_to_copy.yaml`:
+
+```yaml
+- foo.txt
+- bar.txt
+```
+and `files_checksum.yaml` (for example):
+
+```yaml
+- 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
+- fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
+```
+
+This information will be collected in `remote_info.yaml`:
+```yaml
+host: hostname
+root: /path/where/files/are/stored/on/remote
+files:
+    - foo.txt
+    - bar.txt
+checksum:
+    - 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
+    - fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
+```
+
+`shelephant_get` will now copy `foo.txt` and `bar.txt` relative to the directory of
+`remote_info.yaml`
+(in this case in the same folder as `remote_info.yaml`).
+It will skip any file whose filename and checksum match those of the target.
+
+### Avoid recomputing checksums
+
+Suppose that we want to restart multiple times, or that we
+update the files present on the remote after copying them initially.
+In that case, we can use previously computed
+checksums to avoid recomputing them
+(which can be costly for large files).
+
+First step, update information *on the host*:
+
+```bash
+# connect to the host
+ssh hostname
+
+# go to the relevant location on the host
+cd "/path/where/files/are/stored/on/remote"
+
+# collect the information computed in a previous run
+shelephant_hostinfo -o precomputed_checksums.yaml -f files_to_copy.yaml -c files_checksum.yaml
+
+# list files to copy
+shelephant_dump -o files_to_copy.yaml *.txt
+
+# get the checksum of the files to copy, where possible reading precomputed values
+shelephant_checksum -o files_checksum.yaml files_to_copy.yaml -l precomputed_checksums.yaml
+
+# disconnect
+exit # or press Ctrl + D
+```
+
+Second step, copy files to the *local system*, collecting everything in a single place:
+
+```bash
+# go to the relevant location on the local system
+# (often this is a new directory)
+cd "/path/where/to/copy/to"
+
+# collect the information computed in a previous run
+shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
+
+# list files currently present locally
+shelephant_dump -o files_present.yaml *.txt
+
+# get the checksum of the files present locally, where possible reading precomputed values
+shelephant_checksum -o files_checksum.yaml files_present.yaml -l precomputed_checksums.yaml
+
+# combine local files and checksums
+shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
+
+# get the file-information compiled on the host [as before]
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/files/are/stored/on/remote" \
+    --files "files_to_copy.yaml" \
+    --checksum "files_checksum.yaml"
+
+# get the files using secure copy
+# use the precomputed checksums instead of computing them
+shelephant_get remote_info.yaml --local "precomputed_checksums.yaml"
+```
+
+## Send files to host
+
+### Basic copy
+
+Suppose that we want to copy all `*.txt` files
+from a certain local directory `/path/where/files/are/stored/locally`
+to a remote host `hostname`.
+
+First, we will collect information *locally*:
+
+```bash
+# go to the relevant location (locally)
+cd /path/where/files/are/stored/locally
+
+# list files to copy
+shelephant_dump -o files_to_copy.yaml *.txt
+```
+
+Then, we will specify some basic information about the host:
+
+```bash
+# specify basic information about the host
+# and store it in a (temporary) local file
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/to/copy/to/on/remote"
+```
+
+Now we can copy the files:
+```bash
+shelephant_send files_to_copy.yaml remote_info.yaml
+```
+
+### Restart
+
+Suppose that copying was interrupted before completing.
+We can avoid recopying by again using the checksums.
+We therefore need to know which files are already present remotely
+and which checksums they have.
+To do so:
+
+```bash
+# connect to the host
+ssh hostname
+
+# go to the relevant location on the host
+cd "/path/where/to/copy/to/on/remote"
+
+# list the files already present on the remote
+shelephant_dump -o files_to_copy.yaml *.txt
+
+# get their checksums
+shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
+
+# disconnect
+exit # or press Ctrl + D
+```
+
+Now we will complement the basic host-info:
+```bash
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/to/copy/to/on/remote" \
+    --files "files_to_copy.yaml" \
+    --checksum "files_checksum.yaml"
+```
+
+And restart the partial copy:
+```bash
+shelephant_send files_to_copy.yaml remote_info.yaml
+```
+
+
+%package -n python3-shelephant
+Summary: Simple dataset management
+Provides: python-shelephant
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-shelephant
+# shelephant
+
+[](https://github.com/tdegeus/shelephant/actions)
+[](https://shelephant.readthedocs.io/?badge=latest)
+[](https://anaconda.org/conda-forge/shelephant)
+[](https://pypi.org/project/shelephant/)
+
+Command-line arguments with a memory (stored in YAML-files).
+
+**Documentation: https://shelephant.readthedocs.io**
+
+# Contents
+
+<!-- MarkdownTOC -->
+
+- [Overview](#overview)
+    - [Hallmark feature: Copy with restart](#hallmark-feature-copy-with-restart)
+    - [Command-line tools](#command-line-tools)
+        - [File information](#file-information)
+        - [File operations](#file-operations)
+        - [YAML file operations](#yaml-file-operations)
+- [Disclaimer](#disclaimer)
+- [Getting shelephant](#getting-shelephant)
+    - [Using conda](#using-conda)
+    - [Using PyPI](#using-pypi)
+    - [From source](#from-source)
+- [Detailed examples](#detailed-examples)
+    - [Get files from remote, allowing restarts](#get-files-from-remote-allowing-restarts)
+        - [Avoid recomputing checksums](#avoid-recomputing-checksums)
+    - [Send files to host](#send-files-to-host)
+        - [Basic copy](#basic-copy)
+        - [Restart](#restart)
+
+<!-- /MarkdownTOC -->
+
+# Overview
+
+## Hallmark feature: Copy with restart
+
+*shelephant* presents you with a way to copy files (from a remote, using SSH) in two steps:
+1. Collect a list of files that should be copied in a YAML-file,
+   allowing you to **review and customise** the copy operation
+   (e.g. by *changing the order* and making last-minute manual changes).
+2. Perform the copy, efficiently skipping files that are identical.
+
+Typical workflow:
+
+```bash
+# Collect files to copy & compute their checksum (e.g. on the remote system)
+# - creates "shelephant_dump.yaml"
+shelephant_dump *.hdf5
+# - reads "shelephant_dump.yaml"
+# - creates "shelephant_checksum.yaml"
+shelephant_checksum
+
+# Combine all needed info (locally)
+# - reads "shelephant_dump.yaml" and "shelephant_checksum.yaml"
+# - creates "shelephant_hostinfo.yaml"
+shelephant_hostinfo --host myhost --prefix /some/path --files --checksum
+
+# Copy from remote (can be restarted at any time; files that are already identical are skipped)
+# - reads "shelephant_hostinfo.yaml"
+shelephant_get
+```
+
+> * The filenames can be customised.
+> * To copy *to* a remote system, use `shelephant_send`.
+> * For details, see the help of the respective commands, e.g. `shelephant_dump --help`.
+> * *shelephant* works for both local and remote copy actions.
+
+## Command-line tools
+
+### File information
+
+* `shelephant_dump`: list filenames in a YAML file.
+* `shelephant_checksum`: get the checksums of files listed in a YAML file.
+* `shelephant_hostinfo`: collect host information (from a remote system).
+
+### File operations
+
+* `shelephant_get`: copy from remote, based on earlier stored information.
+* `shelephant_send`: copy to remote, based on earlier stored information.
+* `shelephant_rm`: remove files listed in a YAML file.
+* `shelephant_cp`: copy files listed in a YAML file.
+* `shelephant_mv`: move files listed in a YAML file.
+
+### YAML file operations
+
+* `shelephant_extract`: isolate a (number of) field(s) in a (new) YAML file.
+* `shelephant_merge`: merge two YAML-files.
+* `shelephant_parse`: parse a YAML-file and print it to the screen.
+
+# Disclaimer
+
+This library is free to use under the [MIT license](https://github.com/tdegeus/shelephant/blob/master/LICENSE). Any additions are very much appreciated, in terms of suggested functionality, code, documentation, testimonials, word-of-mouth advertisement, etc. Bug reports or feature requests can be filed on [GitHub](https://github.com/tdegeus/shelephant). As always, the code comes with no guarantee. None of the developers can be held responsible for possible mistakes.
+
+Download: [.zip file](https://github.com/tdegeus/shelephant/zipball/master) | [.tar.gz file](https://github.com/tdegeus/shelephant/tarball/master).
+
+(c - [MIT](https://github.com/tdegeus/shelephant/blob/master/LICENSE)) T.W.J. de Geus (Tom) | tom@geus.me | www.geus.me | [github.com/tdegeus/shelephant](https://github.com/tdegeus/shelephant)
+
+# Getting shelephant
+
+## Using conda
+
+```bash
+conda install -c conda-forge shelephant
+```
+
+This will also download and install all necessary dependencies.
+
+## Using PyPI
+
+```bash
+pip install shelephant
+```
+
+This will also download and install the necessary Python modules.
+
+## From source
+
+```bash
+# Download shelephant
+git clone https://github.com/tdegeus/shelephant.git
+cd shelephant
+
+# Install
+python -m pip install .
+```
+
+This will also download and install the necessary Python modules.
+
+
+# Detailed examples
+
+## Get files from remote, allowing restarts
+
+Suppose that we want to copy all `*.txt` files
+from a certain directory `/path/where/files/are/stored/on/remote` on a remote host `hostname`.
+
+First step, collect information *on the host*:
+
+```bash
+# connect to the host
+ssh hostname
+
+# go to the relevant location on the host
+cd "/path/where/files/are/stored/on/remote"
+
+# list files to copy
+shelephant_dump -o files_to_copy.yaml *.txt
+
+# optional but useful: get the checksums of the files to copy
+shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
+
+# disconnect
+exit # or press Ctrl + D
+```
+
+Second step, copy files to the *local system*, collecting everything in a single place:
+
+```bash
+# go to the relevant location on the local system
+# (often this is a new directory)
+cd "/path/where/to/copy/to"
+
+# get the file-information compiled on the host
+# and store it in a (temporary) local file;
+# note that all paths refer to the remote system,
+# and that these files are copied using secure copy (scp)
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/files/are/stored/on/remote" \
+    --files "files_to_copy.yaml" \
+    --checksum "files_checksum.yaml"
+
+# finally, get the files using secure copy
+# (the files are stored relative to the path of 'remote_info.yaml',
+# identically to how they are relative to 'files_to_copy.yaml' on remote)
+shelephant_get remote_info.yaml
+```
+
+> If you use the default filenames for `shelephant_dump` (`shelephant_dump.yaml`) and
+> `shelephant_checksum` (`shelephant_checksum.yaml`) remotely,
+> you can also specify `--files` and `--checksum` without an argument.
+
+A benefit of having computed the checksums on the host
+is that `shelephant_get` can be stopped and restarted:
+**only files that do not exist locally, or that were only partially copied
+(whose checksum does not match the remotely computed checksum), will be copied;
+all fully copied files will be skipped**.
+
+Let's further illustrate with a complete example. On the host, suppose that we have:
+```none
+/path/where/files/are/stored/on/remote
+- foo.txt
+- bar.txt
+```
+
+This will give `files_to_copy.yaml`:
+
+```yaml
+- foo.txt
+- bar.txt
+```
+and `files_checksum.yaml` (for example):
+
+```yaml
+- 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
+- fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
+```
+
+This information will be collected in `remote_info.yaml`:
+```yaml
+host: hostname
+root: /path/where/files/are/stored/on/remote
+files:
+    - foo.txt
+    - bar.txt
+checksum:
+    - 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
+    - fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
+```
+
+`shelephant_get` will now copy `foo.txt` and `bar.txt` relative to the directory of
+`remote_info.yaml`
+(in this case in the same folder as `remote_info.yaml`).
+It will skip any file whose filename and checksum match those of the target.
+
+### Avoid recomputing checksums
+
+Suppose that we want to restart multiple times, or that we
+update the files present on the remote after copying them initially.
+In that case, we can use previously computed
+checksums to avoid recomputing them
+(which can be costly for large files).
+
+First step, update information *on the host*:
+
+```bash
+# connect to the host
+ssh hostname
+
+# go to the relevant location on the host
+cd "/path/where/files/are/stored/on/remote"
+
+# collect the information computed in a previous run
+shelephant_hostinfo -o precomputed_checksums.yaml -f files_to_copy.yaml -c files_checksum.yaml
+
+# list files to copy
+shelephant_dump -o files_to_copy.yaml *.txt
+
+# get the checksum of the files to copy, where possible reading precomputed values
+shelephant_checksum -o files_checksum.yaml files_to_copy.yaml -l precomputed_checksums.yaml
+
+# disconnect
+exit # or press Ctrl + D
+```
+
+Second step, copy files to the *local system*, collecting everything in a single place:
+
+```bash
+# go to the relevant location on the local system
+# (often this is a new directory)
+cd "/path/where/to/copy/to"
+
+# collect the information computed in a previous run
+shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
+
+# list files currently present locally
+shelephant_dump -o files_present.yaml *.txt
+
+# get the checksum of the files present locally, where possible reading precomputed values
+shelephant_checksum -o files_checksum.yaml files_present.yaml -l precomputed_checksums.yaml
+
+# combine local files and checksums
+shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
+
+# get the file-information compiled on the host [as before]
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/files/are/stored/on/remote" \
+    --files "files_to_copy.yaml" \
+    --checksum "files_checksum.yaml"
+
+# get the files using secure copy
+# use the precomputed checksums instead of computing them
+shelephant_get remote_info.yaml --local "precomputed_checksums.yaml"
+```
+
+## Send files to host
+
+### Basic copy
+
+Suppose that we want to copy all `*.txt` files
+from a certain local directory `/path/where/files/are/stored/locally`
+to a remote host `hostname`.
+
+First, we will collect information *locally*:
+
+```bash
+# go to the relevant location (locally)
+cd /path/where/files/are/stored/locally
+
+# list files to copy
+shelephant_dump -o files_to_copy.yaml *.txt
+```
+
+Then, we will specify some basic information about the host:
+
+```bash
+# specify basic information about the host
+# and store it in a (temporary) local file
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/to/copy/to/on/remote"
+```
+
+Now we can copy the files:
+```bash
+shelephant_send files_to_copy.yaml remote_info.yaml
+```
+
+### Restart
+
+Suppose that copying was interrupted before completing.
+We can avoid recopying by again using the checksums.
+We therefore need to know which files are already present remotely
+and which checksums they have.
+To do so:
+
+```bash
+# connect to the host
+ssh hostname
+
+# go to the relevant location on the host
+cd "/path/where/to/copy/to/on/remote"
+
+# list the files already present on the remote
+shelephant_dump -o files_to_copy.yaml *.txt
+
+# get their checksums
+shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
+
+# disconnect
+exit # or press Ctrl + D
+```
+
+Now we will complement the basic host-info:
+```bash
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/to/copy/to/on/remote" \
+    --files "files_to_copy.yaml" \
+    --checksum "files_checksum.yaml"
+```
+
+And restart the partial copy:
+```bash
+shelephant_send files_to_copy.yaml remote_info.yaml
+```
+
+
+%package help
+Summary: Development documents and examples for shelephant
+Provides: python3-shelephant-doc
+%description help
+# shelephant
+
+[](https://github.com/tdegeus/shelephant/actions)
+[](https://shelephant.readthedocs.io/?badge=latest)
+[](https://anaconda.org/conda-forge/shelephant)
+[](https://pypi.org/project/shelephant/)
+
+Command-line arguments with a memory (stored in YAML-files).
+
+**Documentation: https://shelephant.readthedocs.io**
+
+# Contents
+
+<!-- MarkdownTOC -->
+
+- [Overview](#overview)
+    - [Hallmark feature: Copy with restart](#hallmark-feature-copy-with-restart)
+    - [Command-line tools](#command-line-tools)
+        - [File information](#file-information)
+        - [File operations](#file-operations)
+        - [YAML file operations](#yaml-file-operations)
+- [Disclaimer](#disclaimer)
+- [Getting shelephant](#getting-shelephant)
+    - [Using conda](#using-conda)
+    - [Using PyPI](#using-pypi)
+    - [From source](#from-source)
+- [Detailed examples](#detailed-examples)
+    - [Get files from remote, allowing restarts](#get-files-from-remote-allowing-restarts)
+        - [Avoid recomputing checksums](#avoid-recomputing-checksums)
+    - [Send files to host](#send-files-to-host)
+        - [Basic copy](#basic-copy)
+        - [Restart](#restart)
+
+<!-- /MarkdownTOC -->
+
+# Overview
+
+## Hallmark feature: Copy with restart
+
+*shelephant* presents you with a way to copy files (from a remote, using SSH) in two steps:
+1. Collect a list of files that should be copied in a YAML-file,
+   allowing you to **review and customise** the copy operation
+   (e.g. by *changing the order* and making last-minute manual changes).
+2. Perform the copy, efficiently skipping files that are identical.
+
+Typical workflow:
+
+```bash
+# Collect files to copy & compute their checksum (e.g. on the remote system)
+# - creates "shelephant_dump.yaml"
+shelephant_dump *.hdf5
+# - reads "shelephant_dump.yaml"
+# - creates "shelephant_checksum.yaml"
+shelephant_checksum
+
+# Combine all needed info (locally)
+# - reads "shelephant_dump.yaml" and "shelephant_checksum.yaml"
+# - creates "shelephant_hostinfo.yaml"
+shelephant_hostinfo --host myhost --prefix /some/path --files --checksum
+
+# Copy from remote (can be restarted at any time; files that are already identical are skipped)
+# - reads "shelephant_hostinfo.yaml"
+shelephant_get
+```
+
+> * The filenames can be customised.
+> * To copy *to* a remote system, use `shelephant_send`.
+> * For details, see the help of the respective commands, e.g. `shelephant_dump --help`.
+> * *shelephant* works for both local and remote copy actions.
+
+## Command-line tools
+
+### File information
+
+* `shelephant_dump`: list filenames in a YAML file.
+* `shelephant_checksum`: get the checksums of files listed in a YAML file.
+* `shelephant_hostinfo`: collect host information (from a remote system).
+
+### File operations
+
+* `shelephant_get`: copy from remote, based on earlier stored information.
+* `shelephant_send`: copy to remote, based on earlier stored information.
+* `shelephant_rm`: remove files listed in a YAML file.
+* `shelephant_cp`: copy files listed in a YAML file.
+* `shelephant_mv`: move files listed in a YAML file.
+
+### YAML file operations
+
+* `shelephant_extract`: isolate a (number of) field(s) in a (new) YAML file.
+* `shelephant_merge`: merge two YAML-files.
+* `shelephant_parse`: parse a YAML-file and print it to the screen.
+
+# Disclaimer
+
+This library is free to use under the [MIT license](https://github.com/tdegeus/shelephant/blob/master/LICENSE). Any additions are very much appreciated, in terms of suggested functionality, code, documentation, testimonials, word-of-mouth advertisement, etc. Bug reports or feature requests can be filed on [GitHub](https://github.com/tdegeus/shelephant). As always, the code comes with no guarantee. None of the developers can be held responsible for possible mistakes.
+
+Download: [.zip file](https://github.com/tdegeus/shelephant/zipball/master) | [.tar.gz file](https://github.com/tdegeus/shelephant/tarball/master).
+
+(c - [MIT](https://github.com/tdegeus/shelephant/blob/master/LICENSE)) T.W.J. de Geus (Tom) | tom@geus.me | www.geus.me | [github.com/tdegeus/shelephant](https://github.com/tdegeus/shelephant)
+
+# Getting shelephant
+
+## Using conda
+
+```bash
+conda install -c conda-forge shelephant
+```
+
+This will also download and install all necessary dependencies.
+
+## Using PyPI
+
+```bash
+pip install shelephant
+```
+
+This will also download and install the necessary Python modules.
+
+## From source
+
+```bash
+# Download shelephant
+git clone https://github.com/tdegeus/shelephant.git
+cd shelephant
+
+# Install
+python -m pip install .
+```
+
+This will also download and install the necessary Python modules.
+
+
+# Detailed examples
+
+## Get files from remote, allowing restarts
+
+Suppose that we want to copy all `*.txt` files
+from a certain directory `/path/where/files/are/stored/on/remote` on a remote host `hostname`.
+
+First step, collect information *on the host*:
+
+```bash
+# connect to the host
+ssh hostname
+
+# go to the relevant location on the host
+cd "/path/where/files/are/stored/on/remote"
+
+# list files to copy
+shelephant_dump -o files_to_copy.yaml *.txt
+
+# optional but useful: get the checksums of the files to copy
+shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
+
+# disconnect
+exit # or press Ctrl + D
+```
+
+Second step, copy files to the *local system*, collecting everything in a single place:
+
+```bash
+# go to the relevant location on the local system
+# (often this is a new directory)
+cd "/path/where/to/copy/to"
+
+# get the file-information compiled on the host
+# and store it in a (temporary) local file;
+# note that all paths refer to the remote system,
+# and that these files are copied using secure copy (scp)
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/files/are/stored/on/remote" \
+    --files "files_to_copy.yaml" \
+    --checksum "files_checksum.yaml"
+
+# finally, get the files using secure copy
+# (the files are stored relative to the path of 'remote_info.yaml',
+# identically to how they are relative to 'files_to_copy.yaml' on remote)
+shelephant_get remote_info.yaml
+```
+
+> If you use the default filenames for `shelephant_dump` (`shelephant_dump.yaml`) and
+> `shelephant_checksum` (`shelephant_checksum.yaml`) remotely,
+> you can also specify `--files` and `--checksum` without an argument.
+
+A benefit of having computed the checksums on the host
+is that `shelephant_get` can be stopped and restarted:
+**only files that do not exist locally, or that were only partially copied
+(whose checksum does not match the remotely computed checksum), will be copied;
+all fully copied files will be skipped**.
+
+Let's further illustrate with a complete example. On the host, suppose that we have:
+```none
+/path/where/files/are/stored/on/remote
+- foo.txt
+- bar.txt
+```
+
+This will give `files_to_copy.yaml`:
+
+```yaml
+- foo.txt
+- bar.txt
+```
+and `files_checksum.yaml` (for example):
+
+```yaml
+- 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
+- fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
+```
+
+This information will be collected in `remote_info.yaml`:
+```yaml
+host: hostname
+root: /path/where/files/are/stored/on/remote
+files:
+    - foo.txt
+    - bar.txt
+checksum:
+    - 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
+    - fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
+```
+
+`shelephant_get` will now copy `foo.txt` and `bar.txt` relative to the directory of
+`remote_info.yaml`
+(in this case in the same folder as `remote_info.yaml`).
+It will skip any file whose filename and checksum match those of the target.
+
+### Avoid recomputing checksums
+
+Suppose that we want to restart multiple times, or that we
+update the files present on the remote after copying them initially.
+In that case, we can use previously computed
+checksums to avoid recomputing them
+(which can be costly for large files).
+
+First step, update information *on the host*:
+
+```bash
+# connect to the host
+ssh hostname
+
+# go to the relevant location on the host
+cd "/path/where/files/are/stored/on/remote"
+
+# collect the previously computed information
+shelephant_hostinfo -o precomputed_checksums.yaml -f files_to_copy.yaml -c files_checksum.yaml
+
+# list files to copy
+shelephant_dump -o files_to_copy.yaml *.txt
+
+# get the checksums of the files to copy, reading precomputed values where possible
+shelephant_checksum -o files_checksum.yaml files_to_copy.yaml -l precomputed_checksums.yaml
+
+# disconnect
+exit # or press Ctrl + D
+```
+
+Second step, copy files to the *local system*, collecting everything in a single place:
+
+```bash
+# go to the relevant location on the local system
+# (often this is a new directory)
+cd "/path/where/to/copy/to"
+
+# collect the previously computed information
+shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
+
+# list files currently present locally
+shelephant_dump -o files_present.yaml *.txt
+
+# get the checksums of the local files, reading precomputed values where possible
+shelephant_checksum -o files_checksum.yaml files_present.yaml -l precomputed_checksums.yaml
+
+# combine local files and checksums
+shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
+
+# get the file information compiled on the host [as before]
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/files/are/stored/on/remote" \
+    --files "files_to_copy.yaml" \
+    --checksum "files_checksum.yaml"
+
+# get the files using secure copy,
+# using the precomputed checksums instead of recomputing them
+shelephant_get remote_info.yaml --local "precomputed_checksums.yaml"
+```
+
+## Send files to host
+
+### Basic copy
+
+Suppose that we want to copy all `*.txt` files
+from a certain local directory `/path/where/files/are/stored/locally`
+to a remote host `hostname`.
+
+First, we collect information *locally*:
+
+```bash
+# go to the relevant location (locally)
+cd /path/where/files/are/stored/locally
+
+# list files to copy
+shelephant_dump -o files_to_copy.yaml *.txt
+```
+
+Then, we specify some basic information about the host:
+
+```bash
+# specify basic information about the host
+# and store it in a (temporary) local file
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/to/copy/to/on/remote"
+```
+
+Now we can copy the files:
+```bash
+shelephant_send files_to_copy.yaml remote_info.yaml
+```
+
+### Restart
+
+Suppose that copying was interrupted before it completed.
+We can avoid recopying files by again using checksums.
+For this, we need to know which files are already present on the host
+and what their checksums are.
+To collect this information:
+
+```bash
+# connect to the host
+ssh hostname
+
+# go to the relevant location on the host
+cd "/path/where/to/copy/to/on/remote"
+
+# list the files already present on the host
+shelephant_dump -o files_to_copy.yaml *.txt
+
+# get the checksums of these files
+shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
+
+# disconnect
+exit # or press Ctrl + D
+```
+
+Now we complement the basic host information:
+```bash
+shelephant_hostinfo \
+    -o remote_info.yaml \
+    --host "hostname" \
+    --prefix "/path/where/to/copy/to/on/remote" \
+    --files "files_to_copy.yaml" \
+    --checksum "files_checksum.yaml"
+```
+
+And restart the partial copy:
+```bash
+shelephant_send files_to_copy.yaml remote_info.yaml
+```
+
+
+%prep
+%autosetup -n shelephant-0.21.8
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-shelephant -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.21.8-1
+- Package Spec generated