%global _empty_manifest_terminate_build 0
Name:		python-pydbr
Version:	0.0.7
Release:	1
Summary:	Databricks client SDK with command line client for Databricks REST APIs
License:	MIT License
URL:		https://github.com/ivangeorgiev/pydbr
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/09/c6/618f1b2cacaa50ebae807f4e862bd61e8f32b5ca0d44fb2422ce74156274/pydbr-0.0.7.tar.gz
BuildArch:	noarch

Requires:	python3-click
Requires:	python3-requests

%description
# pydbr

Databricks client SDK for Python with a command line interface for the Databricks REST APIs.

## Introduction

The pydbr (short for Python-Databricks) package provides a Python SDK for the Databricks REST API:

* dbfs
* workspace
* jobs
* runs

The package also comes with a CLI, which is very helpful in automation.

## Installation

```bash
$ pip install pydbr
```

## Databricks CLI

The Databricks command line client provides a convenient way to interact with a Databricks cluster from the command line. A popular use of this approach is in automation tasks, such as DevOps pipelines or third-party workflow managers.

You can call the Databricks CLI using the convenient shell command `pydbr`:

```bash
$ pydbr --help
```

or using the Python module:

```bash
$ python -m pydbr.cli --help
```

To connect to the Databricks cluster, you can supply arguments at the command line:

* `--bearer-token`
* `--url`
* `--cluster-id`

Alternatively, you can define environment variables. Command line arguments take precedence.

```bash
export DATABRICKS_URL='https://westeurope.azuredatabricks.net/'
export DATABRICKS_BEARER_TOKEN='dapixyz89u9ufsdfd0'
export DATABRICKS_CLUSTER_ID='1234-456778-abc234'
export DATABRICKS_ORG_ID='87287878293983984'
```

### DBFS

#### List DBFS items

```bash
# List items on DBFS
pydbr dbfs ls --json-indent 3 FileStore/movielens
```

```json
[
   {
      "path": "/FileStore/movielens/ml-latest-small",
      "is_dir": true,
      "file_size": 0,
      "is_file": false,
      "human_size": "0 B"
   }
]
```

#### Download file from DBFS

```bash
# Download a file and print it to STDOUT
pydbr dbfs get ml-latest-small/movies.csv
```

#### Download directory from DBFS

```bash
# Recursively download an entire directory and store it locally
pydbr dbfs get -o ml-local ml-latest-small
```

### Workspace

A Databricks workspace contains notebooks and other items.

#### List workspace

```bash
####################
# List workspace

# Default path is root - '/'
$ pydbr workspace ls

# Leading '/' is added automatically
$ pydbr workspace ls 'Users'

# Space-indented JSON output with the given number of spaces
$ pydbr workspace --json-indent 4 ls

# Custom indent string
$ pydbr workspace ls --json-indent='>'
```

#### Export items from Databricks workspace

```bash
#####################
# Export workspace items

# Export everything in source format using defaults: format=SOURCE, path=/
pydbr workspace export -o ./.dev/export

# Export everything in DBC format
pydbr workspace export -f DBC -o ./.dev/export

# When the path is a folder, the export is recursive
pydbr workspace export -o ./.dev/export-utils 'Utils'

# Export a single item
pydbr workspace export -o ./.dev/GetML 'Utils/Download MovieLens.py'
```

### Runs

This command group implements the [`jobs/runs` Databricks REST API](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit).
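In automation it is common to script this command group end to end. The following is a minimal illustrative sketch (not part of pydbr) that drives the documented CLI commands from Python: it submits a notebook, then polls `runs get` until the run leaves its active life-cycle states. The notebook path and polling interval are placeholders.

```python
import json
import subprocess
import time

# `runs submit` prints e.g. {"run_id": 4}
out = subprocess.check_output(
    ["pydbr", "runs", "submit", "Utils/Download MovieLens"], text=True)
run_id = json.loads(out)["run_id"]

while True:
    meta = json.loads(subprocess.check_output(
        ["pydbr", "runs", "get", str(run_id)], text=True))
    state = meta["state"]["life_cycle_state"]
    if state not in ("PENDING", "RUNNING", "TERMINATING"):
        break
    time.sleep(10)  # poll every 10 seconds

print(state, meta["state"].get("result_state"))
```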
#### Submit a notebook

Implements: [https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit)

```bash
$ pydbr runs submit "Utils/Download MovieLens"
```

```
{"run_id": 4}
```

You can retrieve the run information using `runs get`:

```bash
$ pydbr runs get 4 -i 3
```

If you need to pass parameters, use the `--parameters` or `-p` option and specify JSON text:

```bash
$ pydbr runs submit -p '{"run_tag":"20250103"}' "Utils/Download MovieLens"
```

You can also refer to parameters in a JSON file:

```bash
$ pydbr runs submit -p '@params.json' "Utils/Download MovieLens"
```

You can use the parameters in the notebook, and they also appear in the run metadata:

```bash
pydbr runs get-output -i 3 8
```

```json
{
   "notebook_output": {
      "result": "Downloaded files (tag: 20250103): README.txt, links.csv, movies.csv, ratings.csv, tags.csv",
      "truncated": false
   },
   "error": null,
   "metadata": {
      "job_id": 8,
      "run_id": 8,
      "creator_user_name": "your.name@gmail.com",
      "number_in_job": 1,
      "original_attempt_run_id": null,
      "state": {
         "life_cycle_state": "TERMINATED",
         "result_state": "SUCCESS",
         "state_message": ""
      },
      "schedule": null,
      "task": {
         "notebook_task": {
            "notebook_path": "/Utils/Download MovieLens",
            "base_parameters": {
               "run_tag": "20250103"
            }
         }
      },
      "cluster_spec": {
         "existing_cluster_id": "xxxx-yyyyyy-zzzzzz"
      },
      "cluster_instance": {
         "cluster_id": "xxxx-yyyyyy-zzzzzzzz",
         "spark_context_id": "8734983498349834"
      },
      "overriding_parameters": null,
      "start_time": 1592067357734,
      "setup_duration": 0,
      "execution_duration": 11000,
      "cleanup_duration": 0,
      "trigger": null,
      "run_name": "pydbr-1592067355",
      "run_page_url": "https://westeurope.azuredatabricks.net/?o=89349849834#job/8/run/1",
      "run_type": "SUBMIT_RUN"
   }
}
```

#### Get run metadata

Implements: [Databricks REST runs/get](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-get)

```bash
$ pydbr runs get -i 3 6
```

```json
{
   "job_id": 6,
   "run_id": 6,
   "creator_user_name": "your.name@gmail.com",
   "number_in_job": 1,
   "original_attempt_run_id": null,
   "state": {
      "life_cycle_state": "TERMINATED",
      "result_state": "SUCCESS",
      "state_message": ""
   },
   "schedule": null,
   "task": {
      "notebook_task": {
         "notebook_path": "/Utils/Download MovieLens"
      }
   },
   "cluster_spec": {
      "existing_cluster_id": "xxxx-yyyyy-zzzzzz"
   },
   "cluster_instance": {
      "cluster_id": "xxxx-yyyyy-zzzzzz",
      "spark_context_id": "783487348734873873"
   },
   "overriding_parameters": null,
   "start_time": 1592062497162,
   "setup_duration": 0,
   "execution_duration": 11000,
   "cleanup_duration": 0,
   "trigger": null,
   "run_name": "pydbr-1592062494",
   "run_page_url": "https://westeurope.azuredatabricks.net/?o=398348734873487#job/6/run/1",
   "run_type": "SUBMIT_RUN"
}
```

#### List Runs

Implements: [Databricks REST runs/list](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-list)

```bash
$ pydbr runs ls
```

To get only the runs for a particular job:

```bash
# List runs for the job with job-id=4
$ pydbr runs ls 4 -i 3
```

```json
{
   "runs": [
      {
         "job_id": 4,
         "run_id": 4,
         "creator_user_name": "your.name@gmail.com",
         "number_in_job": 1,
         "original_attempt_run_id": null,
         "state": {
            "life_cycle_state": "PENDING",
            "state_message": ""
         },
         "schedule": null,
         "task": {
            "notebook_task": {
               "notebook_path": "/Utils/Download MovieLens"
            }
         },
         "cluster_spec": {
            "existing_cluster_id": "xxxxx-yyyy-zzzzzzz"
         },
         "cluster_instance": {
            "cluster_id": "xxxxx-yyyy-zzzzzzz"
         },
         "overriding_parameters": null,
         "start_time": 1592058826123,
         "setup_duration": 0,
         "execution_duration": 0,
         "cleanup_duration": 0,
         "trigger": null,
         "run_name": "pydbr-1592058823",
         "run_page_url": "https://westeurope.azuredatabricks.net/?o=abcdefghasdf#job/4/run/1",
         "run_type": "SUBMIT_RUN"
      }
   ],
   "has_more": false
}
```
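The listing is plain JSON, so it is easy to post-process. A small illustrative sketch (using only the documented CLI) that prints a one-line status summary for every run of job 4:

```python
import json
import subprocess

listing = json.loads(subprocess.check_output(
    ["pydbr", "runs", "ls", "4"], text=True))
for run in listing.get("runs", []):
    state = run["state"]
    # result_state is absent while the run is still PENDING/RUNNING
    print(run["run_id"], state["life_cycle_state"], state.get("result_state", "-"))
```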
"execution_duration": 0, "cleanup_duration": 0, "trigger": null, "run_name": "pydbr-1592058823", "run_page_url": "https://westeurope.azuredatabricks.net/?o=abcdefghasdf#job/4/run/1", "run_type": "SUBMIT_RUN" } ], "has_more": false } ``` #### Export run Implements: [Databricks REST runs/export](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-export) ```bash $ pydbr runs export --content-only 4 > .dev/run-view.html ``` #### Get run output Implements: [Databricks REST runs/get-output](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-get-output) ```bash $ pydbr runs get-output -i 3 6 ``` ```json { "notebook_output": { "result": "Downloaded files: README.txt, links.csv, movies.csv, ratings.csv, tags.csv", "truncated": false }, "error": null, "metadata": { "job_id": 5, "run_id": 5, "creator_user_name": "your.name@gmail.com", "number_in_job": 1, "original_attempt_run_id": null, "state": { "life_cycle_state": "TERMINATED", "result_state": "SUCCESS", "state_message": "" }, "schedule": null, "task": { "notebook_task": { "notebook_path": "/Utils/Download MovieLens" } }, "cluster_spec": { "existing_cluster_id": "xxxx-yyyyy-zzzzzzz" }, "cluster_instance": { "cluster_id": "xxxx-yyyyy-zzzzzzz", "spark_context_id": "8973498743973498" }, "overriding_parameters": null, "start_time": 1592062147101, "setup_duration": 1000, "execution_duration": 11000, "cleanup_duration": 0, "trigger": null, "run_name": "pydbr-1592062135", "run_page_url": "https://westeurope.azuredatabricks.net/?o=89798374987987#job/5/run/1", "run_type": "SUBMIT_RUN" } } ``` To get only the exit output: ```bash $ pydbr runs get-output -r 6 ``` ``` Downloaded files: README.txt, links.csv, movies.csv, ratings.csv, tags.csv ``` ## Python Client SDK for Databricks REST APIs To implement your own Databricks REST API client, you can use the Python Client SDK for Databricks REST APIs. ### Create Databricks connection ```python # Get Databricks workspace connection dbc = pydbr.connect( bearer_token='dapixyzabcd09rasdf', url='https://westeurope.azuredatabricks.net') ``` ### DBFS ```python # Get list of items at path /FileStore dbc.dbfs.ls('/FileStore') # Check if file or directory exists dbc.dbfs.exists('/path/to/heaven') # Make a directory and it's parents dbc.dbfs.mkdirs('/path/to/heaven') # Delete a directory recusively dbc.dbfs.rm('/path', recursive=True) # Download file block starting 1024 with size 2048 dbc.dbfs.read('/data/movies.csv', 1024, 2048) # Download entire file dbc.dbfs.read_all('/data/movies.csv') ``` ### Databricks workspace ```python # List root workspace directory dbc.workspace.ls('/') # Check if workspace item exists dbc.workspace.exists('/explore') # Check if workspace item is a directory dbc.workspace.is_directory('/') # Export notebook in default (SOURCE) format dbc.workspace.export('/my_notebook') # Export notebook in HTML format dbc.workspace.export('/my_notebook', 'HTML') ``` ## Build and publish ```bash pip install wheel twine python setup.py sdist bdist_wheel python -m twine upload dist/* ``` %package -n python3-pydbr Summary: Databricks client SDK with command line client for Databricks REST APIs Provides: python-pydbr BuildRequires: python3-devel BuildRequires: python3-setuptools BuildRequires: python3-pip %description -n python3-pydbr # pydbr Databricks client SDK for Python with command line interface for Databricks REST APIs. 
## Build and publish

```bash
pip install wheel twine
python setup.py sdist bdist_wheel
python -m twine upload dist/*
```

%package -n python3-pydbr
Summary:	Databricks client SDK with command line client for Databricks REST APIs
Provides:	python-pydbr
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-pydbr
# pydbr

Databricks client SDK for Python with a command line interface for the Databricks REST APIs.

## Introduction

The pydbr (short for Python-Databricks) package provides a Python SDK for the Databricks REST API:

* dbfs
* workspace
* jobs
* runs

The package also comes with a CLI, which is very helpful in automation.

## Installation

```bash
$ pip install pydbr
```

## Databricks CLI

The Databricks command line client provides a convenient way to interact with a Databricks cluster from the command line. A popular use of this approach is in automation tasks, such as DevOps pipelines or third-party workflow managers.

You can call the Databricks CLI using the convenient shell command `pydbr`:

```bash
$ pydbr --help
```

or using the Python module:

```bash
$ python -m pydbr.cli --help
```

To connect to the Databricks cluster, you can supply arguments at the command line:

* `--bearer-token`
* `--url`
* `--cluster-id`

Alternatively, you can define environment variables. Command line arguments take precedence.

```bash
export DATABRICKS_URL='https://westeurope.azuredatabricks.net/'
export DATABRICKS_BEARER_TOKEN='dapixyz89u9ufsdfd0'
export DATABRICKS_CLUSTER_ID='1234-456778-abc234'
export DATABRICKS_ORG_ID='87287878293983984'
```

### DBFS

#### List DBFS items

```bash
# List items on DBFS
pydbr dbfs ls --json-indent 3 FileStore/movielens
```

```json
[
   {
      "path": "/FileStore/movielens/ml-latest-small",
      "is_dir": true,
      "file_size": 0,
      "is_file": false,
      "human_size": "0 B"
   }
]
```

#### Download file from DBFS

```bash
# Download a file and print it to STDOUT
pydbr dbfs get ml-latest-small/movies.csv
```

#### Download directory from DBFS

```bash
# Recursively download an entire directory and store it locally
pydbr dbfs get -o ml-local ml-latest-small
```

### Workspace

A Databricks workspace contains notebooks and other items.

#### List workspace

```bash
####################
# List workspace

# Default path is root - '/'
$ pydbr workspace ls

# Leading '/' is added automatically
$ pydbr workspace ls 'Users'

# Space-indented JSON output with the given number of spaces
$ pydbr workspace --json-indent 4 ls

# Custom indent string
$ pydbr workspace ls --json-indent='>'
```

#### Export items from Databricks workspace

```bash
#####################
# Export workspace items

# Export everything in source format using defaults: format=SOURCE, path=/
pydbr workspace export -o ./.dev/export

# Export everything in DBC format
pydbr workspace export -f DBC -o ./.dev/export

# When the path is a folder, the export is recursive
pydbr workspace export -o ./.dev/export-utils 'Utils'

# Export a single item
pydbr workspace export -o ./.dev/GetML 'Utils/Download MovieLens.py'
```

### Runs

This command group implements the [`jobs/runs` Databricks REST API](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit).

#### Submit a notebook

Implements: [https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit)

```bash
$ pydbr runs submit "Utils/Download MovieLens"
```

```
{"run_id": 4}
```

You can retrieve the run information using `runs get`:

```bash
$ pydbr runs get 4 -i 3
```

If you need to pass parameters, use the `--parameters` or `-p` option and specify JSON text:

```bash
$ pydbr runs submit -p '{"run_tag":"20250103"}' "Utils/Download MovieLens"
```

You can also refer to parameters in a JSON file:

```bash
$ pydbr runs submit -p '@params.json' "Utils/Download MovieLens"
```
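The `@file` form is convenient when the parameters are generated by a script. For example (plain standard library, nothing pydbr-specific), you can build `params.json` and then call the command shown above:

```python
import datetime
import json

# Write the parameters file consumed by: pydbr runs submit -p '@params.json' ...
params = {"run_tag": datetime.date.today().strftime("%Y%m%d")}
with open("params.json", "w") as fh:
    json.dump(params, fh)
```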
You can use the parameters in the notebook, and they also appear in the run metadata:

```bash
pydbr runs get-output -i 3 8
```

```json
{
   "notebook_output": {
      "result": "Downloaded files (tag: 20250103): README.txt, links.csv, movies.csv, ratings.csv, tags.csv",
      "truncated": false
   },
   "error": null,
   "metadata": {
      "job_id": 8,
      "run_id": 8,
      "creator_user_name": "your.name@gmail.com",
      "number_in_job": 1,
      "original_attempt_run_id": null,
      "state": {
         "life_cycle_state": "TERMINATED",
         "result_state": "SUCCESS",
         "state_message": ""
      },
      "schedule": null,
      "task": {
         "notebook_task": {
            "notebook_path": "/Utils/Download MovieLens",
            "base_parameters": {
               "run_tag": "20250103"
            }
         }
      },
      "cluster_spec": {
         "existing_cluster_id": "xxxx-yyyyyy-zzzzzz"
      },
      "cluster_instance": {
         "cluster_id": "xxxx-yyyyyy-zzzzzzzz",
         "spark_context_id": "8734983498349834"
      },
      "overriding_parameters": null,
      "start_time": 1592067357734,
      "setup_duration": 0,
      "execution_duration": 11000,
      "cleanup_duration": 0,
      "trigger": null,
      "run_name": "pydbr-1592067355",
      "run_page_url": "https://westeurope.azuredatabricks.net/?o=89349849834#job/8/run/1",
      "run_type": "SUBMIT_RUN"
   }
}
```

#### Get run metadata

Implements: [Databricks REST runs/get](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-get)

```bash
$ pydbr runs get -i 3 6
```

```json
{
   "job_id": 6,
   "run_id": 6,
   "creator_user_name": "your.name@gmail.com",
   "number_in_job": 1,
   "original_attempt_run_id": null,
   "state": {
      "life_cycle_state": "TERMINATED",
      "result_state": "SUCCESS",
      "state_message": ""
   },
   "schedule": null,
   "task": {
      "notebook_task": {
         "notebook_path": "/Utils/Download MovieLens"
      }
   },
   "cluster_spec": {
      "existing_cluster_id": "xxxx-yyyyy-zzzzzz"
   },
   "cluster_instance": {
      "cluster_id": "xxxx-yyyyy-zzzzzz",
      "spark_context_id": "783487348734873873"
   },
   "overriding_parameters": null,
   "start_time": 1592062497162,
   "setup_duration": 0,
   "execution_duration": 11000,
   "cleanup_duration": 0,
   "trigger": null,
   "run_name": "pydbr-1592062494",
   "run_page_url": "https://westeurope.azuredatabricks.net/?o=398348734873487#job/6/run/1",
   "run_type": "SUBMIT_RUN"
}
```
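Note that `start_time` is epoch milliseconds and the `*_duration` fields are also in milliseconds, so the metadata is easy to post-process. A small illustrative sketch (again driving the documented CLI):

```python
import json
import subprocess
from datetime import datetime, timezone

meta = json.loads(subprocess.check_output(
    ["pydbr", "runs", "get", "6"], text=True))
started = datetime.fromtimestamp(meta["start_time"] / 1000, tz=timezone.utc)
total_ms = (meta["setup_duration"]
            + meta["execution_duration"]
            + meta["cleanup_duration"])
print(f"started {started:%Y-%m-%d %H:%M:%S} UTC, took {total_ms / 1000:.1f}s")
```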
#### List Runs

Implements: [Databricks REST runs/list](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-list)

```bash
$ pydbr runs ls
```

To get only the runs for a particular job:

```bash
# List runs for the job with job-id=4
$ pydbr runs ls 4 -i 3
```

```json
{
   "runs": [
      {
         "job_id": 4,
         "run_id": 4,
         "creator_user_name": "your.name@gmail.com",
         "number_in_job": 1,
         "original_attempt_run_id": null,
         "state": {
            "life_cycle_state": "PENDING",
            "state_message": ""
         },
         "schedule": null,
         "task": {
            "notebook_task": {
               "notebook_path": "/Utils/Download MovieLens"
            }
         },
         "cluster_spec": {
            "existing_cluster_id": "xxxxx-yyyy-zzzzzzz"
         },
         "cluster_instance": {
            "cluster_id": "xxxxx-yyyy-zzzzzzz"
         },
         "overriding_parameters": null,
         "start_time": 1592058826123,
         "setup_duration": 0,
         "execution_duration": 0,
         "cleanup_duration": 0,
         "trigger": null,
         "run_name": "pydbr-1592058823",
         "run_page_url": "https://westeurope.azuredatabricks.net/?o=abcdefghasdf#job/4/run/1",
         "run_type": "SUBMIT_RUN"
      }
   ],
   "has_more": false
}
```

#### Export run

Implements: [Databricks REST runs/export](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-export)

```bash
$ pydbr runs export --content-only 4 > .dev/run-view.html
```

#### Get run output

Implements: [Databricks REST runs/get-output](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-get-output)

```bash
$ pydbr runs get-output -i 3 6
```

```json
{
   "notebook_output": {
      "result": "Downloaded files: README.txt, links.csv, movies.csv, ratings.csv, tags.csv",
      "truncated": false
   },
   "error": null,
   "metadata": {
      "job_id": 5,
      "run_id": 5,
      "creator_user_name": "your.name@gmail.com",
      "number_in_job": 1,
      "original_attempt_run_id": null,
      "state": {
         "life_cycle_state": "TERMINATED",
         "result_state": "SUCCESS",
         "state_message": ""
      },
      "schedule": null,
      "task": {
         "notebook_task": {
            "notebook_path": "/Utils/Download MovieLens"
         }
      },
      "cluster_spec": {
         "existing_cluster_id": "xxxx-yyyyy-zzzzzzz"
      },
      "cluster_instance": {
         "cluster_id": "xxxx-yyyyy-zzzzzzz",
         "spark_context_id": "8973498743973498"
      },
      "overriding_parameters": null,
      "start_time": 1592062147101,
      "setup_duration": 1000,
      "execution_duration": 11000,
      "cleanup_duration": 0,
      "trigger": null,
      "run_name": "pydbr-1592062135",
      "run_page_url": "https://westeurope.azuredatabricks.net/?o=89798374987987#job/5/run/1",
      "run_type": "SUBMIT_RUN"
   }
}
```

To get only the exit output:

```bash
$ pydbr runs get-output -r 6
```

```
Downloaded files: README.txt, links.csv, movies.csv, ratings.csv, tags.csv
```

## Python Client SDK for Databricks REST APIs

To implement your own Databricks REST API client, you can use the Python client SDK.

### Create Databricks connection

```python
import pydbr

# Get a Databricks workspace connection
dbc = pydbr.connect(
    bearer_token='dapixyzabcd09rasdf',
    url='https://westeurope.azuredatabricks.net')
```

### DBFS

```python
# Get the list of items at path /FileStore
dbc.dbfs.ls('/FileStore')

# Check if a file or directory exists
dbc.dbfs.exists('/path/to/heaven')

# Make a directory and its parents
dbc.dbfs.mkdirs('/path/to/heaven')

# Delete a directory recursively
dbc.dbfs.rm('/path', recursive=True)

# Download a 2048-byte file block starting at offset 1024
dbc.dbfs.read('/data/movies.csv', 1024, 2048)

# Download an entire file
dbc.dbfs.read_all('/data/movies.csv')
```
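For large files, `read()` lets you fetch the content in blocks instead of holding everything in memory with `read_all()`. The loop below is a sketch under an explicit assumption: that `read()` returns the raw bytes of the requested block (empty past end of file). If the SDK returns the REST response object instead, unwrap it accordingly.

```python
import pydbr

dbc = pydbr.connect(
    bearer_token='dapixyzabcd09rasdf',
    url='https://westeurope.azuredatabricks.net')

CHUNK = 1024 * 1024  # 1 MiB per request

with open('movies.csv', 'wb') as fh:
    offset = 0
    while True:
        # assumed to return the bytes of the block, empty at end of file
        block = dbc.dbfs.read('/data/movies.csv', offset, CHUNK)
        if not block:
            break
        fh.write(block)
        offset += len(block)
```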
### Databricks workspace

```python
# List the root workspace directory
dbc.workspace.ls('/')

# Check if a workspace item exists
dbc.workspace.exists('/explore')

# Check if a workspace item is a directory
dbc.workspace.is_directory('/')

# Export a notebook in the default (SOURCE) format
dbc.workspace.export('/my_notebook')

# Export a notebook in HTML format
dbc.workspace.export('/my_notebook', 'HTML')
```

## Build and publish

```bash
pip install wheel twine
python setup.py sdist bdist_wheel
python -m twine upload dist/*
```

%package help
Summary:	Development documents and examples for pydbr
Provides:	python3-pydbr-doc
%description help
# pydbr

Databricks client SDK for Python with a command line interface for the Databricks REST APIs.

## Introduction

The pydbr (short for Python-Databricks) package provides a Python SDK for the Databricks REST API:

* dbfs
* workspace
* jobs
* runs

The package also comes with a CLI, which is very helpful in automation.

## Installation

```bash
$ pip install pydbr
```

## Databricks CLI

The Databricks command line client provides a convenient way to interact with a Databricks cluster from the command line. A popular use of this approach is in automation tasks, such as DevOps pipelines or third-party workflow managers.

You can call the Databricks CLI using the convenient shell command `pydbr`:

```bash
$ pydbr --help
```

or using the Python module:

```bash
$ python -m pydbr.cli --help
```

To connect to the Databricks cluster, you can supply arguments at the command line:

* `--bearer-token`
* `--url`
* `--cluster-id`

Alternatively, you can define environment variables. Command line arguments take precedence.

```bash
export DATABRICKS_URL='https://westeurope.azuredatabricks.net/'
export DATABRICKS_BEARER_TOKEN='dapixyz89u9ufsdfd0'
export DATABRICKS_CLUSTER_ID='1234-456778-abc234'
export DATABRICKS_ORG_ID='87287878293983984'
```
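In CI jobs, the same variables are typically injected from a secret store rather than an interactive shell. A minimal illustration of doing that from Python and invoking the CLI via `subprocess` (values are placeholders):

```python
import os
import subprocess

env = dict(
    os.environ,
    DATABRICKS_URL="https://westeurope.azuredatabricks.net/",
    DATABRICKS_BEARER_TOKEN="dapixyz89u9ufsdfd0",
    DATABRICKS_CLUSTER_ID="1234-456778-abc234",
)
subprocess.run(["pydbr", "workspace", "ls"], env=env, check=True)
```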
### DBFS

#### List DBFS items

```bash
# List items on DBFS
pydbr dbfs ls --json-indent 3 FileStore/movielens
```

```json
[
   {
      "path": "/FileStore/movielens/ml-latest-small",
      "is_dir": true,
      "file_size": 0,
      "is_file": false,
      "human_size": "0 B"
   }
]
```

#### Download file from DBFS

```bash
# Download a file and print it to STDOUT
pydbr dbfs get ml-latest-small/movies.csv
```

#### Download directory from DBFS

```bash
# Recursively download an entire directory and store it locally
pydbr dbfs get -o ml-local ml-latest-small
```

### Workspace

A Databricks workspace contains notebooks and other items.

#### List workspace

```bash
####################
# List workspace

# Default path is root - '/'
$ pydbr workspace ls

# Leading '/' is added automatically
$ pydbr workspace ls 'Users'

# Space-indented JSON output with the given number of spaces
$ pydbr workspace --json-indent 4 ls

# Custom indent string
$ pydbr workspace ls --json-indent='>'
```

#### Export items from Databricks workspace

```bash
#####################
# Export workspace items

# Export everything in source format using defaults: format=SOURCE, path=/
pydbr workspace export -o ./.dev/export

# Export everything in DBC format
pydbr workspace export -f DBC -o ./.dev/export

# When the path is a folder, the export is recursive
pydbr workspace export -o ./.dev/export-utils 'Utils'

# Export a single item
pydbr workspace export -o ./.dev/GetML 'Utils/Download MovieLens.py'
```

### Runs

This command group implements the [`jobs/runs` Databricks REST API](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit).

#### Submit a notebook

Implements: [https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit)

```bash
$ pydbr runs submit "Utils/Download MovieLens"
```

```
{"run_id": 4}
```

You can retrieve the run information using `runs get`:

```bash
$ pydbr runs get 4 -i 3
```

If you need to pass parameters, use the `--parameters` or `-p` option and specify JSON text:

```bash
$ pydbr runs submit -p '{"run_tag":"20250103"}' "Utils/Download MovieLens"
```

You can also refer to parameters in a JSON file:

```bash
$ pydbr runs submit -p '@params.json' "Utils/Download MovieLens"
```

You can use the parameters in the notebook, and they also appear in the run metadata:

```bash
pydbr runs get-output -i 3 8
```

```json
{
   "notebook_output": {
      "result": "Downloaded files (tag: 20250103): README.txt, links.csv, movies.csv, ratings.csv, tags.csv",
      "truncated": false
   },
   "error": null,
   "metadata": {
      "job_id": 8,
      "run_id": 8,
      "creator_user_name": "your.name@gmail.com",
      "number_in_job": 1,
      "original_attempt_run_id": null,
      "state": {
         "life_cycle_state": "TERMINATED",
         "result_state": "SUCCESS",
         "state_message": ""
      },
      "schedule": null,
      "task": {
         "notebook_task": {
            "notebook_path": "/Utils/Download MovieLens",
            "base_parameters": {
               "run_tag": "20250103"
            }
         }
      },
      "cluster_spec": {
         "existing_cluster_id": "xxxx-yyyyyy-zzzzzz"
      },
      "cluster_instance": {
         "cluster_id": "xxxx-yyyyyy-zzzzzzzz",
         "spark_context_id": "8734983498349834"
      },
      "overriding_parameters": null,
      "start_time": 1592067357734,
      "setup_duration": 0,
      "execution_duration": 11000,
      "cleanup_duration": 0,
      "trigger": null,
      "run_name": "pydbr-1592067355",
      "run_page_url": "https://westeurope.azuredatabricks.net/?o=89349849834#job/8/run/1",
      "run_type": "SUBMIT_RUN"
   }
}
```

#### Get run metadata

Implements: [Databricks REST runs/get](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-get)

```bash
$ pydbr runs get -i 3 6
```

```json
{
   "job_id": 6,
   "run_id": 6,
   "creator_user_name": "your.name@gmail.com",
   "number_in_job": 1,
   "original_attempt_run_id": null,
   "state": {
      "life_cycle_state": "TERMINATED",
      "result_state": "SUCCESS",
      "state_message": ""
   },
   "schedule": null,
   "task": {
      "notebook_task": {
         "notebook_path": "/Utils/Download MovieLens"
      }
   },
   "cluster_spec": {
      "existing_cluster_id": "xxxx-yyyyy-zzzzzz"
   },
   "cluster_instance": {
      "cluster_id": "xxxx-yyyyy-zzzzzz",
      "spark_context_id": "783487348734873873"
   },
   "overriding_parameters": null,
   "start_time": 1592062497162,
   "setup_duration": 0,
   "execution_duration": 11000,
   "cleanup_duration": 0,
   "trigger": null,
   "run_name": "pydbr-1592062494",
   "run_page_url": "https://westeurope.azuredatabricks.net/?o=398348734873487#job/6/run/1",
   "run_type": "SUBMIT_RUN"
}
```

#### List Runs

Implements: [Databricks REST runs/list](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-list)

```bash
$ pydbr runs ls
```

To get only the runs for a particular job:

```bash
# List runs for the job with job-id=4
$ pydbr runs ls 4 -i 3
```

```json
{
   "runs": [
      {
         "job_id": 4,
         "run_id": 4,
         "creator_user_name": "your.name@gmail.com",
         "number_in_job": 1,
         "original_attempt_run_id": null,
         "state": {
            "life_cycle_state": "PENDING",
            "state_message": ""
         },
         "schedule": null,
         "task": {
            "notebook_task": {
               "notebook_path": "/Utils/Download MovieLens"
            }
         },
         "cluster_spec": {
            "existing_cluster_id": "xxxxx-yyyy-zzzzzzz"
         },
         "cluster_instance": {
            "cluster_id": "xxxxx-yyyy-zzzzzzz"
         },
         "overriding_parameters": null,
         "start_time": 1592058826123,
         "setup_duration": 0,
         "execution_duration": 0,
         "cleanup_duration": 0,
         "trigger": null,
         "run_name": "pydbr-1592058823",
         "run_page_url": "https://westeurope.azuredatabricks.net/?o=abcdefghasdf#job/4/run/1",
         "run_type": "SUBMIT_RUN"
      }
   ],
   "has_more": false
}
```

#### Export run

Implements: [Databricks REST runs/export](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-export)

```bash
$ pydbr runs export --content-only 4 > .dev/run-view.html
```
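Combining `runs ls` with `runs export` gives a simple way to archive the rendered view of every run of a job. An illustrative sketch (job id and output directory are placeholders):

```python
import json
import pathlib
import subprocess

outdir = pathlib.Path(".dev/run-views")
outdir.mkdir(parents=True, exist_ok=True)

runs = json.loads(subprocess.check_output(
    ["pydbr", "runs", "ls", "4"], text=True))["runs"]
for run in runs:
    html = subprocess.check_output(
        ["pydbr", "runs", "export", "--content-only", str(run["run_id"])],
        text=True)
    (outdir / f"run-{run['run_id']}.html").write_text(html)
```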
#### Get run output

Implements: [Databricks REST runs/get-output](https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-get-output)

```bash
$ pydbr runs get-output -i 3 6
```

```json
{
   "notebook_output": {
      "result": "Downloaded files: README.txt, links.csv, movies.csv, ratings.csv, tags.csv",
      "truncated": false
   },
   "error": null,
   "metadata": {
      "job_id": 5,
      "run_id": 5,
      "creator_user_name": "your.name@gmail.com",
      "number_in_job": 1,
      "original_attempt_run_id": null,
      "state": {
         "life_cycle_state": "TERMINATED",
         "result_state": "SUCCESS",
         "state_message": ""
      },
      "schedule": null,
      "task": {
         "notebook_task": {
            "notebook_path": "/Utils/Download MovieLens"
         }
      },
      "cluster_spec": {
         "existing_cluster_id": "xxxx-yyyyy-zzzzzzz"
      },
      "cluster_instance": {
         "cluster_id": "xxxx-yyyyy-zzzzzzz",
         "spark_context_id": "8973498743973498"
      },
      "overriding_parameters": null,
      "start_time": 1592062147101,
      "setup_duration": 1000,
      "execution_duration": 11000,
      "cleanup_duration": 0,
      "trigger": null,
      "run_name": "pydbr-1592062135",
      "run_page_url": "https://westeurope.azuredatabricks.net/?o=89798374987987#job/5/run/1",
      "run_type": "SUBMIT_RUN"
   }
}
```

To get only the exit output:

```bash
$ pydbr runs get-output -r 6
```

```
Downloaded files: README.txt, links.csv, movies.csv, ratings.csv, tags.csv
```

## Python Client SDK for Databricks REST APIs

To implement your own Databricks REST API client, you can use the Python client SDK.

### Create Databricks connection

```python
import pydbr

# Get a Databricks workspace connection
dbc = pydbr.connect(
    bearer_token='dapixyzabcd09rasdf',
    url='https://westeurope.azuredatabricks.net')
```
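To avoid hard-coding credentials, you can feed `connect()` from the same documented `DATABRICKS_*` environment variables the CLI uses:

```python
import os
import pydbr

dbc = pydbr.connect(
    bearer_token=os.environ["DATABRICKS_BEARER_TOKEN"],
    url=os.environ["DATABRICKS_URL"])
```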
### DBFS

```python
# Get the list of items at path /FileStore
dbc.dbfs.ls('/FileStore')

# Check if a file or directory exists
dbc.dbfs.exists('/path/to/heaven')

# Make a directory and its parents
dbc.dbfs.mkdirs('/path/to/heaven')

# Delete a directory recursively
dbc.dbfs.rm('/path', recursive=True)

# Download a 2048-byte file block starting at offset 1024
dbc.dbfs.read('/data/movies.csv', 1024, 2048)

# Download an entire file
dbc.dbfs.read_all('/data/movies.csv')
```

### Databricks workspace

```python
# List the root workspace directory
dbc.workspace.ls('/')

# Check if a workspace item exists
dbc.workspace.exists('/explore')

# Check if a workspace item is a directory
dbc.workspace.is_directory('/')

# Export a notebook in the default (SOURCE) format
dbc.workspace.export('/my_notebook')

# Export a notebook in HTML format
dbc.workspace.export('/my_notebook', 'HTML')
```

## Build and publish

```bash
pip install wheel twine
python setup.py sdist bdist_wheel
python -m twine upload dist/*
```

%prep
%autosetup -n pydbr-0.0.7

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-pydbr -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot - 0.0.7-1
- Package Spec generated