diff --git a/python-dask-sql.spec b/python-dask-sql.spec
new file mode 100644
index 0000000..7fcf21a
--- /dev/null
+++ b/python-dask-sql.spec
@@ -0,0 +1,311 @@
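+# Do not abort the build if the generated file manifest turns out empty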
+%global _empty_manifest_terminate_build 0
+Name: python-dask-sql
+Version: 2023.4.0
+Release: 1
+Summary: SQL query layer for Dask
+License: MIT
+URL: https://github.com/dask-contrib/dask-sql/
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/95/ba/82ec4a5f7e766f66c22b3a5d447a458fe09702a0e965a978b8cea422dff1/dask_sql-2023.4.0.tar.gz
+
+
+%global _description %{expand:
+## Example
+For this example, we load some data from disk and query it with a SQL command from our Python code.
+Any pandas or Dask dataframe can be used as input, and ``dask-sql`` understands a wide range of formats (CSV, Parquet, JSON, ...) and locations (S3, HDFS, GCS, ...).
+```python
+import dask.dataframe as dd
+from dask_sql import Context
+# Create a context to hold the registered tables
+c = Context()
+# Load the data and register it in the context
+# This gives the table a name that we can use in queries
+df = dd.read_csv("...")
+c.create_table("my_data", df)
+# Now execute a SQL query; return_futures=False computes the result directly
+result = c.sql("""
+ SELECT
+ my_data.name,
+ SUM(my_data.x)
+ FROM
+ my_data
+ GROUP BY
+ my_data.name
+""", return_futures=False)
+# Show the result
+print(result)
+```
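+By default (without `return_futures=False`), `c.sql` returns a lazy Dask dataframe, so nothing is computed until you explicitly ask for it. Here is a minimal sketch of that lazy workflow; the table name and contents are made up for illustration:
+```python
+import pandas as pd
+from dask_sql import Context
+
+c = Context()
+# A plain pandas dataframe works as input just as well as a Dask one
+c.create_table("people", pd.DataFrame({"name": ["a", "b", "a"], "x": [1, 2, 3]}))
+
+# Without return_futures=False, the result is a lazy Dask dataframe ...
+lazy = c.sql("SELECT name, SUM(x) AS total FROM people GROUP BY name")
+# ... which is only evaluated once we call compute()
+print(lazy.compute())
+```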
+## Quickstart
+Have a look at the [documentation](https://dask-sql.readthedocs.io/en/latest/) or try the example notebook on [binder](https://mybinder.org/v2/gh/dask-contrib/dask-sql-binder/main?urlpath=lab).
+> `dask-sql` is still under development and does not yet understand all SQL commands, but it already covers a large fraction.
+> We are actively looking for feedback, improvements and contributors!
+## Installation
+`dask-sql` can be installed via `conda` (preferred) or `pip`, or set up from source in a development environment.
+### With `conda`
+Create a new conda environment or use an existing one:
+```
+conda create -n dask-sql
+conda activate dask-sql
+```
+Install the package from the `conda-forge` channel:
+```
+conda install dask-sql -c conda-forge
+```
+### With `pip`
+You can install the package with:
+```
+pip install dask-sql
+```
+### For development
+If you want the newest (unreleased) `dask-sql` version, or if you plan to develop `dask-sql` itself, you can also install the package from source:
+```
+git clone https://github.com/dask-contrib/dask-sql.git
+```
+Create a new conda environment with the development dependencies:
+```
+conda env create -f continuous_integration/environment-3.9-dev.yaml
+```
+Using `pip` instead of `conda` for the environment setup is not recommended.
+After that, you can install the package in development mode:
+```
+pip install -e ".[dev]"
+```
+The Rust DataFusion bindings are built as part of the `pip install`.
+If changes are made to the Rust source in `dask_planner/`, another build/install must be run to recompile the bindings:
+```
+python setup.py build install
+```
+This repository uses [pre-commit](https://pre-commit.com/) hooks. To install them, call:
+```
+pre-commit install
+```
+## Testing
+You can run the tests (after installation) with:
+```
+pytest tests
+```
+GPU-specific tests require additional dependencies specified in `continuous_integration/gpuci/environment.yaml`.
+These can be added to the development environment by running
+```
+conda env update -n dask-sql -f continuous_integration/gpuci/environment.yaml
+```
+GPU-specific tests can then be run with:
+```
+pytest tests -m gpu --rungpu
+```
+## SQL Server
+`dask-sql` comes with a small test implementation of a SQL server.
+Instead of rebuilding a full ODBC driver, we re-use the [presto wire protocol](https://github.com/prestodb/presto/wiki/HTTP-Protocol).
+So far this is only a starting point and is missing important concepts, such as authentication.
+You can test the SQL Presto server by running (after installation):
+```
+dask-sql-server
+```
+or by using the prebuilt Docker image:
+```
+docker run --rm -it -p 8080:8080 nbraun/dask-sql
+```
+in one terminal. This will spin up a server on port 8080 (by default)
+that looks like a normal Presto database to any Presto client.
+You can test this, for example, with the default [presto client](https://prestosql.io/docs/current/installation/cli.html):
+```
+presto --server localhost:8080
+```
+Now you can fire simple SQL queries (as no data is loaded by default):
+```
+=> SELECT 1 + 1;
+```
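+Since the server speaks the Presto wire protocol over plain HTTP, you can also query it without a dedicated client. Below is a minimal sketch using `requests`; the endpoint, header, and field names follow the protocol description linked above, and error handling is omitted:
+```python
+import requests
+
+# Submit the query text to the Presto statement endpoint
+result = requests.post(
+    "http://localhost:8080/v1/statement",
+    data="SELECT 1 + 1",
+    headers={"X-Presto-User": "test"},
+).json()
+
+# Keep following nextUri until the server has no more pages for us
+while True:
+    if "data" in result:
+        print(result["data"])
+    if "nextUri" not in result:
+        break
+    result = requests.get(result["nextUri"]).json()
+```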
+}
+
+%description %_description
+
+%package -n python3-dask-sql
+Summary: SQL query layer for Dask
+Provides: python-dask-sql
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+BuildRequires: python3-cffi
+BuildRequires: gcc
+BuildRequires: gdb
+%description -n python3-dask-sql %_description
+
+%package help
+Summary: Development documents and examples for dask-sql
+Provides: python3-dask-sql-doc
+%description help %_description
+
+%prep
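+# Unpack the source tarball and apply any patches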
+%autosetup -n dask-sql-2023.4.0
+
+%build
+%py3_build
+
+%install
+%py3_install
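+# Copy any doc/example directories shipped in the source tree into the package docdir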
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
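+# Generate the file list for the main package from everything installed into the buildroot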
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
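+# rpmbuild compresses man pages during the build, hence the .gz suffix here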
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-dask-sql -f filelist.lst
+%dir %{python3_sitearch}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 2023.4.0-1
+- Package Spec generated