From b315d911bf23de72e5a3a08907071e119fcb5174 Mon Sep 17 00:00:00 2001 From: CoprDistGit Date: Fri, 21 Apr 2023 16:46:19 +0000 Subject: automatic import of python-modin --- .gitignore | 1 + python-modin.spec | 1628 ++++++++++++++++++++++++++--------------------------- sources | 2 +- 3 files changed, 813 insertions(+), 818 deletions(-) diff --git a/.gitignore b/.gitignore index 9350351..c85e006 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,2 @@ /modin-0.19.0.tar.gz +/modin-0.20.0.tar.gz diff --git a/python-modin.spec b/python-modin.spec index 05e339e..6448126 100644 --- a/python-modin.spec +++ b/python-modin.spec @@ -1,11 +1,11 @@ %global _empty_manifest_terminate_build 0 Name: python-modin -Version: 0.19.0 +Version: 0.20.0 Release: 1 Summary: Modin: Make your pandas code run faster by changing one line of code. License: Apache 2 URL: https://github.com/modin-project/modin -Source0: https://mirrors.nju.edu.cn/pypi/web/packages/f0/21/d8756af2ce7441a043415ff65e08d7ed28af213dfff8a918c99dfd356af4/modin-0.19.0.tar.gz +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/88/4b/e99be23c29463b14a28aea8043787adac11804406cae24c2af5f15c2b158/modin-0.20.0.tar.gz BuildArch: noarch Requires: python3-pandas @@ -35,277 +35,275 @@ Requires: python3-pyparsing Requires: python3-unidist[mpi] %description -

-

Scale your pandas workflows by changing one line of code

- -
- -|

Dev Community & Support

|

Forums

|

Socials

|

Docs

| -|:---: | :---: | :---: | :---: | -| [![Slack](https://img.shields.io/badge/Slack-4A154B?style=for-the-badge&logo=slack&logoColor=white)](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) | [![Stack Overflow](https://img.shields.io/badge/-Stackoverflow-FE7A16?style=for-the-badge&logo=stack-overflow&logoColor=white)](https://stackoverflow.com/questions/tagged/modin) | Twitter Follow | | - -
- -

- - - -PyPI version - -

- -### What is Modin? - -Modin is a drop-in replacement for [pandas](https://github.com/pandas-dev/pandas). While pandas is -single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your -cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs -[out of memory](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html). - -By simply replacing the import statement, Modin offers users effortless speed and scale for their pandas workflows: - - - -In the GIFs below, Modin (left) and pandas (right) perform *the same pandas operations* on a 2GB dataset. The only difference between the two notebook examples is the import statement. - - - - - - - - - - - - - - -
- -The charts below show the speedup you get by replacing pandas with Modin based on the examples above. The example notebooks can be found [here](examples/jupyter). To learn more about the speedups you could get with Modin and try out some examples on your own, check out our [10-minute quickstart guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) to try out some examples on your own! - - - -### Installation - -#### From PyPI - -Modin can be installed with `pip` on Linux, Windows and MacOS: - -```bash -pip install modin[all] # (Recommended) Install Modin with all of Modin's currently supported engines. -``` - -If you want to install Modin with a specific engine, we recommend: - -```bash -pip install modin[ray] # Install Modin dependencies and Ray. -pip install modin[dask] # Install Modin dependencies and Dask. -pip install modin[unidist] # Install Modin dependencies and Unidist to run on Unidist -``` - -Modin automatically detects which engine(s) you have installed and uses that for scheduling computation. - -#### From conda-forge - -Installing from [conda forge](https://github.com/conda-forge/modin-feedstock) using `modin-all` -will install Modin and four engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask), -[Unidist](https://github.com/modin-project/unidist) and [HDK](https://github.com/intel-ai/hdk). - -```bash -conda install -c conda-forge modin-all -``` - -Each engine can also be installed individually (and also as a combination of several engines): - -```bash -conda install -c conda-forge modin-ray # Install Modin dependencies and Ray. -conda install -c conda-forge modin-dask # Install Modin dependencies and Dask. -conda install -c conda-forge modin-unidist # Install Modin dependencies and Unidist. -conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK. -``` - -To speed up conda installation we recommend using libmamba solver. 
To do this install it in a base environment: - -```bash -conda install -n base conda-libmamba-solver -``` - -and then use it during istallation either like: - -```bash -conda install -c conda-forge modin-ray modin-hdk --experimental-solver=libmamba -``` - -or starting from conda 22.11 and libmamba solver 22.12 versions: - -```bash -conda install -c conda-forge modin-ray modin-hdk --solver=libmamba -``` - -#### Choosing a Compute Engine - -If you want to choose a specific compute engine to run on, you can set the environment -variable `MODIN_ENGINE` and Modin will do computation with that engine: - -```bash -export MODIN_ENGINE=ray # Modin will use Ray -export MODIN_ENGINE=dask # Modin will use Dask -export MODIN_ENGINE=unidist # Modin will use Unidist -``` - -If you want to choose the Unidist engine, you should set the additional environment -variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: - -```bash -export UNIDIST_BACKEND=mpi # Unidist will use MPI backend -``` - -This can also be done within a notebook/interpreter before you import Modin: - -```python -import modin.config as modin_cfg -import unidist.config as unidist_cfg - -modin_cfg.Engine.put("ray") # Modin will use Ray -modin_cfg.Engine.put("dask") # Modin will use Dask - -modin_cfg.Engine.put('unidist') # Modin will use Unidist -unidist_cfg.Backend.put('mpi') # Unidist will use MPI backend -``` - -Check [this Modin docs section](https://modin.readthedocs.io/en/latest/development/using_hdk.html) for HDK engine setup. - -_Note: You should not change the engine after your first operation with Modin as it will result in undefined behavior._ - -#### Which engine should I use? - -On Linux, MacOS, and Windows you can install and use either Ray, Dask or Unidist. There is no knowledge required -to use either of these engines as Modin abstracts away all of the complexity, so feel -free to pick either! 
- -On Linux you also can choose [HDK](https://modin.readthedocs.io/en/latest/development/using_hdk.html), which is an experimental -engine based on [HDK](https://github.com/intel-ai/hdk) and included in the -[Intel® Distribution of Modin](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html), -which is a part of [Intel® oneAPI AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html). - -### Pandas API Coverage - -

- -| pandas Object | Modin's Ray Engine Coverage | Modin's Dask Engine Coverage | Modin's Unidist Engine Coverage | -|-------------------|:------------------------------------------------------------------------------------:|:---------------:|:---------------:| -| `pd.DataFrame` | | | | -| `pd.Series` | | | -| `pd.read_csv` | ✅ | ✅ | ✅ | -| `pd.read_table` | ✅ | ✅ | ✅ | -| `pd.read_parquet` | ✅ | ✅ | ✅ | -| `pd.read_sql` | ✅ | ✅ | ✅ | -| `pd.read_feather` | ✅ | ✅ | ✅ | -| `pd.read_excel` | ✅ | ✅ | ✅ | -| `pd.read_json` | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | -| `pd.read_` | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | - -

-Some pandas APIs are easier to implement than others, so if something is missing feel -free to open an issue! - -### More about Modin - -For the complete documentation on Modin, visit our [ReadTheDocs](https://modin.readthedocs.io/en/latest/index.html) page. - -#### Scale your pandas workflow by changing a single line of code. - -_Note: In local mode (without a cluster), Modin will create and manage a local (Dask or Ray) cluster for the execution._ - -To use Modin, you do not need to specify how to distribute the data, or even know how many -cores your system has. In fact, you can continue using your previous -pandas notebooks while experiencing a considerable speedup from Modin, even on a single -machine. Once you've changed your import statement, you're ready to use Modin just like -you would with pandas! - -#### Faster pandas, even on your laptop - - - -The `modin.pandas` DataFrame is an extremely light-weight parallel DataFrame. -Modin transparently distributes the data and computation so that you can continue using the same pandas API -while working with more data faster. Because it is so light-weight, -Modin provides speed-ups of up to 4x on a laptop with 4 physical cores. - -In pandas, you are only able to use one core at a time when you are doing computation of -any kind. With Modin, you are able to use all of the CPU cores on your machine. Even with a -traditionally synchronous task like `read_csv`, we see large speedups by efficiently -distributing the work across your entire machine. - -```python -import modin.pandas as pd - -df = pd.read_csv("my_dataset.csv") -``` - -#### Modin can handle the datasets that pandas can't - -Often data scientists have to switch between different tools -for operating on datasets of different sizes. Processing large dataframes with pandas -is slow, and pandas does not support working with dataframes that are too large to fit -into the available memory. 
As a result, pandas workflows that work well -for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size -of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably -work with hundreds of GBs without worrying about substantial slowdown or memory errors. -With [cluster](https://modin.readthedocs.io/en/latest/getting_started/using_modin/using_modin_cluster.html) -and [out of core](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html) -support, Modin is a DataFrame library with both great single-node performance and high -scalability in a cluster. - -#### Modin Architecture - -We designed [Modin's architecture](https://modin.readthedocs.io/en/latest/development/architecture.html) -to be modular so we can plug in different components as they develop and improve: - -Modin's architecture - -### Other Resources - -#### Getting Started with Modin - -- [Documentation](https://modin.readthedocs.io/en/latest/) -- [10-min Quickstart Guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) -- [Examples and Tutorials](https://modin.readthedocs.io/en/latest/getting_started/examples.html) -- [Videos and Blogposts](https://modin.readthedocs.io/en/latest/getting_started/examples.html#talks-podcasts) -- [Benchmarking Modin](https://modin.readthedocs.io/en/latest/usage_guide/benchmarking.html) - -#### Modin Community - -- [Slack](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) -- [Discourse](https://discuss.modin.org) -- [Twitter](https://twitter.com/modin_project) -- [Mailing List](https://groups.google.com/g/modin-dev) -- [GitHub Issues](https://github.com/modin-project/modin/issues) -- [StackOverflow](https://stackoverflow.com/questions/tagged/modin) - -#### Learn More about Modin - -- [Frequently Asked Questions (FAQs)](https://modin.readthedocs.io/en/latest/getting_started/faq.html) -- 
[Troubleshooting Guide](https://modin.readthedocs.io/en/latest/getting_started/troubleshooting.html) -- [Development Guide](https://modin.readthedocs.io/en/latest/development/index.html) -- Modin is built on many years of research and development at UC Berkeley. Check out these selected papers to learn more about how Modin works: - - [Flexible Rule-Based Decomposition and Metadata Independence in Modin](https://people.eecs.berkeley.edu/~totemtang/paper/Modin.pdf) (VLDB 2021) - - [Dataframe Systems: Theory, Architecture, and Implementation](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-193.pdf) (PhD Dissertation 2021) - - [Towards Scalable Dataframe Systems](https://arxiv.org/pdf/2001.00888.pdf) (VLDB 2020) - -#### Getting Involved - -***`modin.pandas` is currently under active development. Requests and contributions are welcome!*** - -For more information on how to contribute to Modin, check out the -[Modin Contribution Guide](https://modin.readthedocs.io/en/latest/development/contributing.html). - -### License - -[Apache License 2.0](LICENSE) - - +

+

Scale your pandas workflows by changing one line of code

+ +
+ +|

Dev Community & Support

|

Forums

|

Socials

|

Docs

| +|:---: | :---: | :---: | :---: | +| [![Slack](https://img.shields.io/badge/Slack-4A154B?style=for-the-badge&logo=slack&logoColor=white)](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) | [![Stack Overflow](https://img.shields.io/badge/-Stackoverflow-FE7A16?style=for-the-badge&logo=stack-overflow&logoColor=white)](https://stackoverflow.com/questions/tagged/modin) | Twitter Follow | | + +
+ +

+ + + +PyPI version + +

+ +### What is Modin? + +Modin is a drop-in replacement for [pandas](https://github.com/pandas-dev/pandas). While pandas is +single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your +cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs +[out of memory](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html). + +By simply replacing the import statement, Modin offers users effortless speed and scale for their pandas workflows: + + + +In the GIFs below, Modin (left) and pandas (right) perform *the same pandas operations* on a 2GB dataset. The only difference between the two notebook examples is the import statement. + + + + + + + + + + + + + + +
+ +The charts below show the speedup you get by replacing pandas with Modin based on the examples above. The example notebooks can be found [here](examples/jupyter). To learn more about the speedups you could get with Modin and try out some examples on your own, check out our [10-minute quickstart guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) to try out some examples on your own! + + + +### Installation + +#### From PyPI + +Modin can be installed with `pip` on Linux, Windows and MacOS: + +```bash +pip install "modin[all]" # (Recommended) Install Modin with all of Modin's currently supported engines. +``` + +If you want to install Modin with a specific engine, we recommend: + +```bash +pip install "modin[ray]" # Install Modin dependencies and Ray. +pip install "modin[dask]" # Install Modin dependencies and Dask. +pip install "modin[unidist]" # Install Modin dependencies and Unidist. +``` + +Modin automatically detects which engine(s) you have installed and uses that for scheduling computation. + +#### From conda-forge + +Installing from [conda forge](https://github.com/conda-forge/modin-feedstock) using `modin-all` +will install Modin and four engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask), +[Unidist](https://github.com/modin-project/unidist) and [HDK](https://github.com/intel-ai/hdk). + +```bash +conda install -c conda-forge modin-all +``` + +Each engine can also be installed individually (and also as a combination of several engines): + +```bash +conda install -c conda-forge modin-ray # Install Modin dependencies and Ray. +conda install -c conda-forge modin-dask # Install Modin dependencies and Dask. +conda install -c conda-forge modin-unidist # Install Modin dependencies and Unidist. +conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK. +``` + +To speed up conda installation we recommend using libmamba solver. 
To do this install it in a base environment: + +```bash +conda install -n base conda-libmamba-solver +``` + +and then use it during installation either like: + +```bash +conda install -c conda-forge modin-ray modin-hdk --experimental-solver=libmamba +``` + +or starting from conda 22.11 and libmamba solver 22.12 versions: + +```bash +conda install -c conda-forge modin-ray modin-hdk --solver=libmamba +``` + +#### Choosing a Compute Engine + +If you want to choose a specific compute engine to run on, you can set the environment +variable `MODIN_ENGINE` and Modin will do computation with that engine: + +```bash +export MODIN_ENGINE=ray # Modin will use Ray +export MODIN_ENGINE=dask # Modin will use Dask +export MODIN_ENGINE=unidist # Modin will use Unidist +``` + +If you want to choose the Unidist engine, you should set the additional environment +variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: + +```bash +export UNIDIST_BACKEND=mpi # Unidist will use MPI backend +``` + +This can also be done within a notebook/interpreter before you import Modin: + +```python +import modin.config as modin_cfg +import unidist.config as unidist_cfg + +modin_cfg.Engine.put("ray") # Modin will use Ray +modin_cfg.Engine.put("dask") # Modin will use Dask + +modin_cfg.Engine.put('unidist') # Modin will use Unidist +unidist_cfg.Backend.put('mpi') # Unidist will use MPI backend +``` + +Check [this Modin docs section](https://modin.readthedocs.io/en/latest/development/using_hdk.html) for HDK engine setup. + +_Note: You should not change the engine after your first operation with Modin as it will result in undefined behavior._ + +#### Which engine should I use? + +On Linux, MacOS, and Windows you can install and use Ray, Dask or Unidist. There is no knowledge required +to use any of these engines as Modin abstracts away all of the complexity, so feel +free to pick any of them! 
+ +On Linux you also can choose [HDK](https://modin.readthedocs.io/en/latest/development/using_hdk.html), which is an experimental +engine based on [HDK](https://github.com/intel-ai/hdk) and included in the +[Intel® Distribution of Modin](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html), +which is a part of [Intel® oneAPI AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html). + +### Pandas API Coverage + +

+ +| pandas Object | Modin's Ray Engine Coverage | Modin's Dask Engine Coverage | Modin's Unidist Engine Coverage | +|-------------------|:------------------------------------------------------------------------------------:|:---------------:|:---------------:| +| `pd.DataFrame` | | | | +| `pd.Series` | | | +| `pd.read_csv` | ✅ | ✅ | ✅ | +| `pd.read_table` | ✅ | ✅ | ✅ | +| `pd.read_parquet` | ✅ | ✅ | ✅ | +| `pd.read_sql` | ✅ | ✅ | ✅ | +| `pd.read_feather` | ✅ | ✅ | ✅ | +| `pd.read_excel` | ✅ | ✅ | ✅ | +| `pd.read_json` | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | +| `pd.read_` | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | + +

+Some pandas APIs are easier to implement than others, so if something is missing feel +free to open an issue! + +### More about Modin + +For the complete documentation on Modin, visit our [ReadTheDocs](https://modin.readthedocs.io/en/latest/index.html) page. + +#### Scale your pandas workflow by changing a single line of code. + +_Note: In local mode (without a cluster), Modin will create and manage a local (Dask or Ray) cluster for the execution._ + +To use Modin, you do not need to specify how to distribute the data, or even know how many +cores your system has. In fact, you can continue using your previous +pandas notebooks while experiencing a considerable speedup from Modin, even on a single +machine. Once you've changed your import statement, you're ready to use Modin just like +you would with pandas! + +#### Faster pandas, even on your laptop + + + +The `modin.pandas` DataFrame is an extremely light-weight parallel DataFrame. +Modin transparently distributes the data and computation so that you can continue using the same pandas API +while working with more data faster. Because it is so light-weight, +Modin provides speed-ups of up to 4x on a laptop with 4 physical cores. + +In pandas, you are only able to use one core at a time when you are doing computation of +any kind. With Modin, you are able to use all of the CPU cores on your machine. Even with a +traditionally synchronous task like `read_csv`, we see large speedups by efficiently +distributing the work across your entire machine. + +```python +import modin.pandas as pd + +df = pd.read_csv("my_dataset.csv") +``` + +#### Modin can handle the datasets that pandas can't + +Often data scientists have to switch between different tools +for operating on datasets of different sizes. Processing large dataframes with pandas +is slow, and pandas does not support working with dataframes that are too large to fit +into the available memory. 
As a result, pandas workflows that work well +for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size +of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably +work with hundreds of GBs without worrying about substantial slowdown or memory errors. +With [cluster](https://modin.readthedocs.io/en/latest/getting_started/using_modin/using_modin_cluster.html) +and [out of core](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html) +support, Modin is a DataFrame library with both great single-node performance and high +scalability in a cluster. + +#### Modin Architecture + +We designed [Modin's architecture](https://modin.readthedocs.io/en/latest/development/architecture.html) +to be modular so we can plug in different components as they develop and improve: + +Modin's architecture + +### Other Resources + +#### Getting Started with Modin + +- [Documentation](https://modin.readthedocs.io/en/latest/) +- [10-min Quickstart Guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) +- [Examples and Tutorials](https://modin.readthedocs.io/en/latest/getting_started/examples.html) +- [Videos and Blogposts](https://modin.readthedocs.io/en/latest/getting_started/examples.html#talks-podcasts) +- [Benchmarking Modin](https://modin.readthedocs.io/en/latest/usage_guide/benchmarking.html) + +#### Modin Community + +- [Slack](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) +- [Discourse](https://discuss.modin.org) +- [Twitter](https://twitter.com/modin_project) +- [Mailing List](https://groups.google.com/g/modin-dev) +- [GitHub Issues](https://github.com/modin-project/modin/issues) +- [StackOverflow](https://stackoverflow.com/questions/tagged/modin) + +#### Learn More about Modin + +- [Frequently Asked Questions (FAQs)](https://modin.readthedocs.io/en/latest/getting_started/faq.html) +- 
[Troubleshooting Guide](https://modin.readthedocs.io/en/latest/getting_started/troubleshooting.html) +- [Development Guide](https://modin.readthedocs.io/en/latest/development/index.html) +- Modin is built on many years of research and development at UC Berkeley. Check out these selected papers to learn more about how Modin works: + - [Flexible Rule-Based Decomposition and Metadata Independence in Modin](https://people.eecs.berkeley.edu/~totemtang/paper/Modin.pdf) (VLDB 2021) + - [Dataframe Systems: Theory, Architecture, and Implementation](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-193.pdf) (PhD Dissertation 2021) + - [Towards Scalable Dataframe Systems](https://arxiv.org/pdf/2001.00888.pdf) (VLDB 2020) + +#### Getting Involved + +***`modin.pandas` is currently under active development. Requests and contributions are welcome!*** + +For more information on how to contribute to Modin, check out the +[Modin Contribution Guide](https://modin.readthedocs.io/en/latest/development/contributing.html). + +### License + +[Apache License 2.0](LICENSE) %package -n python3-modin @@ -315,558 +313,554 @@ BuildRequires: python3-devel BuildRequires: python3-setuptools BuildRequires: python3-pip %description -n python3-modin -

-

Scale your pandas workflows by changing one line of code

- -
- -|

Dev Community & Support

|

Forums

|

Socials

|

Docs

| -|:---: | :---: | :---: | :---: | -| [![Slack](https://img.shields.io/badge/Slack-4A154B?style=for-the-badge&logo=slack&logoColor=white)](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) | [![Stack Overflow](https://img.shields.io/badge/-Stackoverflow-FE7A16?style=for-the-badge&logo=stack-overflow&logoColor=white)](https://stackoverflow.com/questions/tagged/modin) | Twitter Follow | | - -
- -

- - - -PyPI version - -

- -### What is Modin? - -Modin is a drop-in replacement for [pandas](https://github.com/pandas-dev/pandas). While pandas is -single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your -cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs -[out of memory](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html). - -By simply replacing the import statement, Modin offers users effortless speed and scale for their pandas workflows: - - - -In the GIFs below, Modin (left) and pandas (right) perform *the same pandas operations* on a 2GB dataset. The only difference between the two notebook examples is the import statement. - - - - - - - - - - - - - - -
- -The charts below show the speedup you get by replacing pandas with Modin based on the examples above. The example notebooks can be found [here](examples/jupyter). To learn more about the speedups you could get with Modin and try out some examples on your own, check out our [10-minute quickstart guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) to try out some examples on your own! - - - -### Installation - -#### From PyPI - -Modin can be installed with `pip` on Linux, Windows and MacOS: - -```bash -pip install modin[all] # (Recommended) Install Modin with all of Modin's currently supported engines. -``` - -If you want to install Modin with a specific engine, we recommend: - -```bash -pip install modin[ray] # Install Modin dependencies and Ray. -pip install modin[dask] # Install Modin dependencies and Dask. -pip install modin[unidist] # Install Modin dependencies and Unidist to run on Unidist -``` - -Modin automatically detects which engine(s) you have installed and uses that for scheduling computation. - -#### From conda-forge - -Installing from [conda forge](https://github.com/conda-forge/modin-feedstock) using `modin-all` -will install Modin and four engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask), -[Unidist](https://github.com/modin-project/unidist) and [HDK](https://github.com/intel-ai/hdk). - -```bash -conda install -c conda-forge modin-all -``` - -Each engine can also be installed individually (and also as a combination of several engines): - -```bash -conda install -c conda-forge modin-ray # Install Modin dependencies and Ray. -conda install -c conda-forge modin-dask # Install Modin dependencies and Dask. -conda install -c conda-forge modin-unidist # Install Modin dependencies and Unidist. -conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK. -``` - -To speed up conda installation we recommend using libmamba solver. 
To do this install it in a base environment: - -```bash -conda install -n base conda-libmamba-solver -``` - -and then use it during istallation either like: - -```bash -conda install -c conda-forge modin-ray modin-hdk --experimental-solver=libmamba -``` - -or starting from conda 22.11 and libmamba solver 22.12 versions: - -```bash -conda install -c conda-forge modin-ray modin-hdk --solver=libmamba -``` - -#### Choosing a Compute Engine - -If you want to choose a specific compute engine to run on, you can set the environment -variable `MODIN_ENGINE` and Modin will do computation with that engine: - -```bash -export MODIN_ENGINE=ray # Modin will use Ray -export MODIN_ENGINE=dask # Modin will use Dask -export MODIN_ENGINE=unidist # Modin will use Unidist -``` - -If you want to choose the Unidist engine, you should set the additional environment -variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: - -```bash -export UNIDIST_BACKEND=mpi # Unidist will use MPI backend -``` - -This can also be done within a notebook/interpreter before you import Modin: - -```python -import modin.config as modin_cfg -import unidist.config as unidist_cfg - -modin_cfg.Engine.put("ray") # Modin will use Ray -modin_cfg.Engine.put("dask") # Modin will use Dask - -modin_cfg.Engine.put('unidist') # Modin will use Unidist -unidist_cfg.Backend.put('mpi') # Unidist will use MPI backend -``` - -Check [this Modin docs section](https://modin.readthedocs.io/en/latest/development/using_hdk.html) for HDK engine setup. - -_Note: You should not change the engine after your first operation with Modin as it will result in undefined behavior._ - -#### Which engine should I use? - -On Linux, MacOS, and Windows you can install and use either Ray, Dask or Unidist. There is no knowledge required -to use either of these engines as Modin abstracts away all of the complexity, so feel -free to pick either! 
- -On Linux you also can choose [HDK](https://modin.readthedocs.io/en/latest/development/using_hdk.html), which is an experimental -engine based on [HDK](https://github.com/intel-ai/hdk) and included in the -[Intel® Distribution of Modin](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html), -which is a part of [Intel® oneAPI AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html). - -### Pandas API Coverage - -

- -| pandas Object | Modin's Ray Engine Coverage | Modin's Dask Engine Coverage | Modin's Unidist Engine Coverage | -|-------------------|:------------------------------------------------------------------------------------:|:---------------:|:---------------:| -| `pd.DataFrame` | | | | -| `pd.Series` | | | -| `pd.read_csv` | ✅ | ✅ | ✅ | -| `pd.read_table` | ✅ | ✅ | ✅ | -| `pd.read_parquet` | ✅ | ✅ | ✅ | -| `pd.read_sql` | ✅ | ✅ | ✅ | -| `pd.read_feather` | ✅ | ✅ | ✅ | -| `pd.read_excel` | ✅ | ✅ | ✅ | -| `pd.read_json` | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | -| `pd.read_` | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | - -

-Some pandas APIs are easier to implement than others, so if something is missing feel -free to open an issue! - -### More about Modin - -For the complete documentation on Modin, visit our [ReadTheDocs](https://modin.readthedocs.io/en/latest/index.html) page. - -#### Scale your pandas workflow by changing a single line of code. - -_Note: In local mode (without a cluster), Modin will create and manage a local (Dask or Ray) cluster for the execution._ - -To use Modin, you do not need to specify how to distribute the data, or even know how many -cores your system has. In fact, you can continue using your previous -pandas notebooks while experiencing a considerable speedup from Modin, even on a single -machine. Once you've changed your import statement, you're ready to use Modin just like -you would with pandas! - -#### Faster pandas, even on your laptop - - - -The `modin.pandas` DataFrame is an extremely light-weight parallel DataFrame. -Modin transparently distributes the data and computation so that you can continue using the same pandas API -while working with more data faster. Because it is so light-weight, -Modin provides speed-ups of up to 4x on a laptop with 4 physical cores. - -In pandas, you are only able to use one core at a time when you are doing computation of -any kind. With Modin, you are able to use all of the CPU cores on your machine. Even with a -traditionally synchronous task like `read_csv`, we see large speedups by efficiently -distributing the work across your entire machine. - -```python -import modin.pandas as pd - -df = pd.read_csv("my_dataset.csv") -``` - -#### Modin can handle the datasets that pandas can't - -Often data scientists have to switch between different tools -for operating on datasets of different sizes. Processing large dataframes with pandas -is slow, and pandas does not support working with dataframes that are too large to fit -into the available memory. 
As a result, pandas workflows that work well -for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size -of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably -work with hundreds of GBs without worrying about substantial slowdown or memory errors. -With [cluster](https://modin.readthedocs.io/en/latest/getting_started/using_modin/using_modin_cluster.html) -and [out of core](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html) -support, Modin is a DataFrame library with both great single-node performance and high -scalability in a cluster. - -#### Modin Architecture - -We designed [Modin's architecture](https://modin.readthedocs.io/en/latest/development/architecture.html) -to be modular so we can plug in different components as they develop and improve: - -Modin's architecture - -### Other Resources - -#### Getting Started with Modin - -- [Documentation](https://modin.readthedocs.io/en/latest/) -- [10-min Quickstart Guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) -- [Examples and Tutorials](https://modin.readthedocs.io/en/latest/getting_started/examples.html) -- [Videos and Blogposts](https://modin.readthedocs.io/en/latest/getting_started/examples.html#talks-podcasts) -- [Benchmarking Modin](https://modin.readthedocs.io/en/latest/usage_guide/benchmarking.html) - -#### Modin Community - -- [Slack](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) -- [Discourse](https://discuss.modin.org) -- [Twitter](https://twitter.com/modin_project) -- [Mailing List](https://groups.google.com/g/modin-dev) -- [GitHub Issues](https://github.com/modin-project/modin/issues) -- [StackOverflow](https://stackoverflow.com/questions/tagged/modin) - -#### Learn More about Modin - -- [Frequently Asked Questions (FAQs)](https://modin.readthedocs.io/en/latest/getting_started/faq.html) -- 
[Troubleshooting Guide](https://modin.readthedocs.io/en/latest/getting_started/troubleshooting.html) -- [Development Guide](https://modin.readthedocs.io/en/latest/development/index.html) -- Modin is built on many years of research and development at UC Berkeley. Check out these selected papers to learn more about how Modin works: - - [Flexible Rule-Based Decomposition and Metadata Independence in Modin](https://people.eecs.berkeley.edu/~totemtang/paper/Modin.pdf) (VLDB 2021) - - [Dataframe Systems: Theory, Architecture, and Implementation](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-193.pdf) (PhD Dissertation 2021) - - [Towards Scalable Dataframe Systems](https://arxiv.org/pdf/2001.00888.pdf) (VLDB 2020) - -#### Getting Involved - -***`modin.pandas` is currently under active development. Requests and contributions are welcome!*** - -For more information on how to contribute to Modin, check out the -[Modin Contribution Guide](https://modin.readthedocs.io/en/latest/development/contributing.html). - -### License - -[Apache License 2.0](LICENSE) - - +

+Scale your pandas workflows by changing one line of code

+ +
+
+| Dev Community & Support | Forums | Socials | Docs |
+|:---: | :---: | :---: | :---: |
+| [![Slack](https://img.shields.io/badge/Slack-4A154B?style=for-the-badge&logo=slack&logoColor=white)](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) | [![Stack Overflow](https://img.shields.io/badge/-Stackoverflow-FE7A16?style=for-the-badge&logo=stack-overflow&logoColor=white)](https://stackoverflow.com/questions/tagged/modin) | Twitter Follow | |
+
+
+ +

+ + + +PyPI version + +

+ +### What is Modin? + +Modin is a drop-in replacement for [pandas](https://github.com/pandas-dev/pandas). While pandas is +single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your +cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs +[out of memory](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html). + +By simply replacing the import statement, Modin offers users effortless speed and scale for their pandas workflows: + + + +In the GIFs below, Modin (left) and pandas (right) perform *the same pandas operations* on a 2GB dataset. The only difference between the two notebook examples is the import statement. + + + + + + + + + + + + + + +
+ +The charts below show the speedup you get by replacing pandas with Modin based on the examples above. The example notebooks can be found [here](examples/jupyter). To learn more about the speedups you could get with Modin and try out some examples on your own, check out our [10-minute quickstart guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) to try out some examples on your own! + + + +### Installation + +#### From PyPI + +Modin can be installed with `pip` on Linux, Windows and MacOS: + +```bash +pip install "modin[all]" # (Recommended) Install Modin with all of Modin's currently supported engines. +``` + +If you want to install Modin with a specific engine, we recommend: + +```bash +pip install "modin[ray]" # Install Modin dependencies and Ray. +pip install "modin[dask]" # Install Modin dependencies and Dask. +pip install "modin[unidist]" # Install Modin dependencies and Unidist. +``` + +Modin automatically detects which engine(s) you have installed and uses that for scheduling computation. + +#### From conda-forge + +Installing from [conda forge](https://github.com/conda-forge/modin-feedstock) using `modin-all` +will install Modin and four engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask), +[Unidist](https://github.com/modin-project/unidist) and [HDK](https://github.com/intel-ai/hdk). + +```bash +conda install -c conda-forge modin-all +``` + +Each engine can also be installed individually (and also as a combination of several engines): + +```bash +conda install -c conda-forge modin-ray # Install Modin dependencies and Ray. +conda install -c conda-forge modin-dask # Install Modin dependencies and Dask. +conda install -c conda-forge modin-unidist # Install Modin dependencies and Unidist. +conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK. +``` + +To speed up conda installation we recommend using libmamba solver. 
To do this install it in a base environment: + +```bash +conda install -n base conda-libmamba-solver +``` + +and then use it during installation either like: + +```bash +conda install -c conda-forge modin-ray modin-hdk --experimental-solver=libmamba +``` + +or starting from conda 22.11 and libmamba solver 22.12 versions: + +```bash +conda install -c conda-forge modin-ray modin-hdk --solver=libmamba +``` + +#### Choosing a Compute Engine + +If you want to choose a specific compute engine to run on, you can set the environment +variable `MODIN_ENGINE` and Modin will do computation with that engine: + +```bash +export MODIN_ENGINE=ray # Modin will use Ray +export MODIN_ENGINE=dask # Modin will use Dask +export MODIN_ENGINE=unidist # Modin will use Unidist +``` + +If you want to choose the Unidist engine, you should set the additional environment +variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: + +```bash +export UNIDIST_BACKEND=mpi # Unidist will use MPI backend +``` + +This can also be done within a notebook/interpreter before you import Modin: + +```python +import modin.config as modin_cfg +import unidist.config as unidist_cfg + +modin_cfg.Engine.put("ray") # Modin will use Ray +modin_cfg.Engine.put("dask") # Modin will use Dask + +modin_cfg.Engine.put('unidist') # Modin will use Unidist +unidist_cfg.Backend.put('mpi') # Unidist will use MPI backend +``` + +Check [this Modin docs section](https://modin.readthedocs.io/en/latest/development/using_hdk.html) for HDK engine setup. + +_Note: You should not change the engine after your first operation with Modin as it will result in undefined behavior._ + +#### Which engine should I use? + +On Linux, MacOS, and Windows you can install and use either Ray, Dask or Unidist. There is no knowledge required +to use either of these engines as Modin abstracts away all of the complexity, so feel +free to pick either! 
+ +On Linux you also can choose [HDK](https://modin.readthedocs.io/en/latest/development/using_hdk.html), which is an experimental +engine based on [HDK](https://github.com/intel-ai/hdk) and included in the +[Intel® Distribution of Modin](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html), +which is a part of [Intel® oneAPI AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html). + +### Pandas API Coverage + +

+ +| pandas Object | Modin's Ray Engine Coverage | Modin's Dask Engine Coverage | Modin's Unidist Engine Coverage | +|-------------------|:------------------------------------------------------------------------------------:|:---------------:|:---------------:| +| `pd.DataFrame` | | | | +| `pd.Series` | | | +| `pd.read_csv` | ✅ | ✅ | ✅ | +| `pd.read_table` | ✅ | ✅ | ✅ | +| `pd.read_parquet` | ✅ | ✅ | ✅ | +| `pd.read_sql` | ✅ | ✅ | ✅ | +| `pd.read_feather` | ✅ | ✅ | ✅ | +| `pd.read_excel` | ✅ | ✅ | ✅ | +| `pd.read_json` | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | +| `pd.read_` | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | + +

+Some pandas APIs are easier to implement than others, so if something is missing feel +free to open an issue! + +### More about Modin + +For the complete documentation on Modin, visit our [ReadTheDocs](https://modin.readthedocs.io/en/latest/index.html) page. + +#### Scale your pandas workflow by changing a single line of code. + +_Note: In local mode (without a cluster), Modin will create and manage a local (Dask or Ray) cluster for the execution._ + +To use Modin, you do not need to specify how to distribute the data, or even know how many +cores your system has. In fact, you can continue using your previous +pandas notebooks while experiencing a considerable speedup from Modin, even on a single +machine. Once you've changed your import statement, you're ready to use Modin just like +you would with pandas! + +#### Faster pandas, even on your laptop + + + +The `modin.pandas` DataFrame is an extremely light-weight parallel DataFrame. +Modin transparently distributes the data and computation so that you can continue using the same pandas API +while working with more data faster. Because it is so light-weight, +Modin provides speed-ups of up to 4x on a laptop with 4 physical cores. + +In pandas, you are only able to use one core at a time when you are doing computation of +any kind. With Modin, you are able to use all of the CPU cores on your machine. Even with a +traditionally synchronous task like `read_csv`, we see large speedups by efficiently +distributing the work across your entire machine. + +```python +import modin.pandas as pd + +df = pd.read_csv("my_dataset.csv") +``` + +#### Modin can handle the datasets that pandas can't + +Often data scientists have to switch between different tools +for operating on datasets of different sizes. Processing large dataframes with pandas +is slow, and pandas does not support working with dataframes that are too large to fit +into the available memory. 
As a result, pandas workflows that work well +for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size +of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably +work with hundreds of GBs without worrying about substantial slowdown or memory errors. +With [cluster](https://modin.readthedocs.io/en/latest/getting_started/using_modin/using_modin_cluster.html) +and [out of core](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html) +support, Modin is a DataFrame library with both great single-node performance and high +scalability in a cluster. + +#### Modin Architecture + +We designed [Modin's architecture](https://modin.readthedocs.io/en/latest/development/architecture.html) +to be modular so we can plug in different components as they develop and improve: + +Modin's architecture + +### Other Resources + +#### Getting Started with Modin + +- [Documentation](https://modin.readthedocs.io/en/latest/) +- [10-min Quickstart Guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) +- [Examples and Tutorials](https://modin.readthedocs.io/en/latest/getting_started/examples.html) +- [Videos and Blogposts](https://modin.readthedocs.io/en/latest/getting_started/examples.html#talks-podcasts) +- [Benchmarking Modin](https://modin.readthedocs.io/en/latest/usage_guide/benchmarking.html) + +#### Modin Community + +- [Slack](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) +- [Discourse](https://discuss.modin.org) +- [Twitter](https://twitter.com/modin_project) +- [Mailing List](https://groups.google.com/g/modin-dev) +- [GitHub Issues](https://github.com/modin-project/modin/issues) +- [StackOverflow](https://stackoverflow.com/questions/tagged/modin) + +#### Learn More about Modin + +- [Frequently Asked Questions (FAQs)](https://modin.readthedocs.io/en/latest/getting_started/faq.html) +- 
[Troubleshooting Guide](https://modin.readthedocs.io/en/latest/getting_started/troubleshooting.html) +- [Development Guide](https://modin.readthedocs.io/en/latest/development/index.html) +- Modin is built on many years of research and development at UC Berkeley. Check out these selected papers to learn more about how Modin works: + - [Flexible Rule-Based Decomposition and Metadata Independence in Modin](https://people.eecs.berkeley.edu/~totemtang/paper/Modin.pdf) (VLDB 2021) + - [Dataframe Systems: Theory, Architecture, and Implementation](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-193.pdf) (PhD Dissertation 2021) + - [Towards Scalable Dataframe Systems](https://arxiv.org/pdf/2001.00888.pdf) (VLDB 2020) + +#### Getting Involved + +***`modin.pandas` is currently under active development. Requests and contributions are welcome!*** + +For more information on how to contribute to Modin, check out the +[Modin Contribution Guide](https://modin.readthedocs.io/en/latest/development/contributing.html). + +### License + +[Apache License 2.0](LICENSE) %package help Summary: Development documents and examples for modin Provides: python3-modin-doc %description help -

-Scale your pandas workflows by changing one line of code

- -
-
-| Dev Community & Support | Forums | Socials | Docs |
-|:---: | :---: | :---: | :---: |
-| [![Slack](https://img.shields.io/badge/Slack-4A154B?style=for-the-badge&logo=slack&logoColor=white)](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) | [![Stack Overflow](https://img.shields.io/badge/-Stackoverflow-FE7A16?style=for-the-badge&logo=stack-overflow&logoColor=white)](https://stackoverflow.com/questions/tagged/modin) | Twitter Follow | |
-
-
- -

- - - -PyPI version - -

- -### What is Modin? - -Modin is a drop-in replacement for [pandas](https://github.com/pandas-dev/pandas). While pandas is -single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your -cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs -[out of memory](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html). - -By simply replacing the import statement, Modin offers users effortless speed and scale for their pandas workflows: - - - -In the GIFs below, Modin (left) and pandas (right) perform *the same pandas operations* on a 2GB dataset. The only difference between the two notebook examples is the import statement. - - - - - - - - - - - - - - -
- -The charts below show the speedup you get by replacing pandas with Modin based on the examples above. The example notebooks can be found [here](examples/jupyter). To learn more about the speedups you could get with Modin and try out some examples on your own, check out our [10-minute quickstart guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) to try out some examples on your own! - - - -### Installation - -#### From PyPI - -Modin can be installed with `pip` on Linux, Windows and MacOS: - -```bash -pip install modin[all] # (Recommended) Install Modin with all of Modin's currently supported engines. -``` - -If you want to install Modin with a specific engine, we recommend: - -```bash -pip install modin[ray] # Install Modin dependencies and Ray. -pip install modin[dask] # Install Modin dependencies and Dask. -pip install modin[unidist] # Install Modin dependencies and Unidist to run on Unidist -``` - -Modin automatically detects which engine(s) you have installed and uses that for scheduling computation. - -#### From conda-forge - -Installing from [conda forge](https://github.com/conda-forge/modin-feedstock) using `modin-all` -will install Modin and four engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask), -[Unidist](https://github.com/modin-project/unidist) and [HDK](https://github.com/intel-ai/hdk). - -```bash -conda install -c conda-forge modin-all -``` - -Each engine can also be installed individually (and also as a combination of several engines): - -```bash -conda install -c conda-forge modin-ray # Install Modin dependencies and Ray. -conda install -c conda-forge modin-dask # Install Modin dependencies and Dask. -conda install -c conda-forge modin-unidist # Install Modin dependencies and Unidist. -conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK. -``` - -To speed up conda installation we recommend using libmamba solver. 
To do this install it in a base environment: - -```bash -conda install -n base conda-libmamba-solver -``` - -and then use it during installation either like: - -```bash -conda install -c conda-forge modin-ray modin-hdk --experimental-solver=libmamba -``` - -or starting from conda 22.11 and libmamba solver 22.12 versions: - -```bash -conda install -c conda-forge modin-ray modin-hdk --solver=libmamba -``` - -#### Choosing a Compute Engine - -If you want to choose a specific compute engine to run on, you can set the environment -variable `MODIN_ENGINE` and Modin will do computation with that engine: - -```bash -export MODIN_ENGINE=ray # Modin will use Ray -export MODIN_ENGINE=dask # Modin will use Dask -export MODIN_ENGINE=unidist # Modin will use Unidist -``` - -If you want to choose the Unidist engine, you should set the additional environment -variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: - -```bash -export UNIDIST_BACKEND=mpi # Unidist will use MPI backend -``` - -This can also be done within a notebook/interpreter before you import Modin: - -```python -import modin.config as modin_cfg -import unidist.config as unidist_cfg - -modin_cfg.Engine.put("ray") # Modin will use Ray -modin_cfg.Engine.put("dask") # Modin will use Dask - -modin_cfg.Engine.put('unidist') # Modin will use Unidist -unidist_cfg.Backend.put('mpi') # Unidist will use MPI backend -``` - -Check [this Modin docs section](https://modin.readthedocs.io/en/latest/development/using_hdk.html) for HDK engine setup. - -_Note: You should not change the engine after your first operation with Modin as it will result in undefined behavior._ - -#### Which engine should I use? - -On Linux, MacOS, and Windows you can install and use either Ray, Dask or Unidist. There is no knowledge required -to use either of these engines as Modin abstracts away all of the complexity, so feel -free to pick either! 
- -On Linux you also can choose [HDK](https://modin.readthedocs.io/en/latest/development/using_hdk.html), which is an experimental -engine based on [HDK](https://github.com/intel-ai/hdk) and included in the -[Intel® Distribution of Modin](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html), -which is a part of [Intel® oneAPI AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html). - -### Pandas API Coverage - -

- -| pandas Object | Modin's Ray Engine Coverage | Modin's Dask Engine Coverage | Modin's Unidist Engine Coverage | -|-------------------|:------------------------------------------------------------------------------------:|:---------------:|:---------------:| -| `pd.DataFrame` | | | | -| `pd.Series` | | | -| `pd.read_csv` | ✅ | ✅ | ✅ | -| `pd.read_table` | ✅ | ✅ | ✅ | -| `pd.read_parquet` | ✅ | ✅ | ✅ | -| `pd.read_sql` | ✅ | ✅ | ✅ | -| `pd.read_feather` | ✅ | ✅ | ✅ | -| `pd.read_excel` | ✅ | ✅ | ✅ | -| `pd.read_json` | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | -| `pd.read_` | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | - -

-Some pandas APIs are easier to implement than others, so if something is missing feel -free to open an issue! - -### More about Modin - -For the complete documentation on Modin, visit our [ReadTheDocs](https://modin.readthedocs.io/en/latest/index.html) page. - -#### Scale your pandas workflow by changing a single line of code. - -_Note: In local mode (without a cluster), Modin will create and manage a local (Dask or Ray) cluster for the execution._ - -To use Modin, you do not need to specify how to distribute the data, or even know how many -cores your system has. In fact, you can continue using your previous -pandas notebooks while experiencing a considerable speedup from Modin, even on a single -machine. Once you've changed your import statement, you're ready to use Modin just like -you would with pandas! - -#### Faster pandas, even on your laptop - - - -The `modin.pandas` DataFrame is an extremely light-weight parallel DataFrame. -Modin transparently distributes the data and computation so that you can continue using the same pandas API -while working with more data faster. Because it is so light-weight, -Modin provides speed-ups of up to 4x on a laptop with 4 physical cores. - -In pandas, you are only able to use one core at a time when you are doing computation of -any kind. With Modin, you are able to use all of the CPU cores on your machine. Even with a -traditionally synchronous task like `read_csv`, we see large speedups by efficiently -distributing the work across your entire machine. - -```python -import modin.pandas as pd - -df = pd.read_csv("my_dataset.csv") -``` - -#### Modin can handle the datasets that pandas can't - -Often data scientists have to switch between different tools -for operating on datasets of different sizes. Processing large dataframes with pandas -is slow, and pandas does not support working with dataframes that are too large to fit -into the available memory. 
As a result, pandas workflows that work well -for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size -of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably -work with hundreds of GBs without worrying about substantial slowdown or memory errors. -With [cluster](https://modin.readthedocs.io/en/latest/getting_started/using_modin/using_modin_cluster.html) -and [out of core](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html) -support, Modin is a DataFrame library with both great single-node performance and high -scalability in a cluster. - -#### Modin Architecture - -We designed [Modin's architecture](https://modin.readthedocs.io/en/latest/development/architecture.html) -to be modular so we can plug in different components as they develop and improve: - -Modin's architecture - -### Other Resources - -#### Getting Started with Modin - -- [Documentation](https://modin.readthedocs.io/en/latest/) -- [10-min Quickstart Guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) -- [Examples and Tutorials](https://modin.readthedocs.io/en/latest/getting_started/examples.html) -- [Videos and Blogposts](https://modin.readthedocs.io/en/latest/getting_started/examples.html#talks-podcasts) -- [Benchmarking Modin](https://modin.readthedocs.io/en/latest/usage_guide/benchmarking.html) - -#### Modin Community - -- [Slack](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) -- [Discourse](https://discuss.modin.org) -- [Twitter](https://twitter.com/modin_project) -- [Mailing List](https://groups.google.com/g/modin-dev) -- [GitHub Issues](https://github.com/modin-project/modin/issues) -- [StackOverflow](https://stackoverflow.com/questions/tagged/modin) - -#### Learn More about Modin - -- [Frequently Asked Questions (FAQs)](https://modin.readthedocs.io/en/latest/getting_started/faq.html) -- 
[Troubleshooting Guide](https://modin.readthedocs.io/en/latest/getting_started/troubleshooting.html) -- [Development Guide](https://modin.readthedocs.io/en/latest/development/index.html) -- Modin is built on many years of research and development at UC Berkeley. Check out these selected papers to learn more about how Modin works: - - [Flexible Rule-Based Decomposition and Metadata Independence in Modin](https://people.eecs.berkeley.edu/~totemtang/paper/Modin.pdf) (VLDB 2021) - - [Dataframe Systems: Theory, Architecture, and Implementation](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-193.pdf) (PhD Dissertation 2021) - - [Towards Scalable Dataframe Systems](https://arxiv.org/pdf/2001.00888.pdf) (VLDB 2020) - -#### Getting Involved - -***`modin.pandas` is currently under active development. Requests and contributions are welcome!*** - -For more information on how to contribute to Modin, check out the -[Modin Contribution Guide](https://modin.readthedocs.io/en/latest/development/contributing.html). - -### License - -[Apache License 2.0](LICENSE) - - +

+Scale your pandas workflows by changing one line of code

+ +
+
+| Dev Community & Support | Forums | Socials | Docs |
+|:---: | :---: | :---: | :---: |
+| [![Slack](https://img.shields.io/badge/Slack-4A154B?style=for-the-badge&logo=slack&logoColor=white)](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) | [![Stack Overflow](https://img.shields.io/badge/-Stackoverflow-FE7A16?style=for-the-badge&logo=stack-overflow&logoColor=white)](https://stackoverflow.com/questions/tagged/modin) | Twitter Follow | |
+
+
+ +

+ + + +PyPI version + +

+ +### What is Modin? + +Modin is a drop-in replacement for [pandas](https://github.com/pandas-dev/pandas). While pandas is +single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your +cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs +[out of memory](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html). + +By simply replacing the import statement, Modin offers users effortless speed and scale for their pandas workflows: + + + +In the GIFs below, Modin (left) and pandas (right) perform *the same pandas operations* on a 2GB dataset. The only difference between the two notebook examples is the import statement. + + + + + + + + + + + + + + +
+ +The charts below show the speedup you get by replacing pandas with Modin based on the examples above. The example notebooks can be found [here](examples/jupyter). To learn more about the speedups you could get with Modin and try out some examples on your own, check out our [10-minute quickstart guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) to try out some examples on your own! + + + +### Installation + +#### From PyPI + +Modin can be installed with `pip` on Linux, Windows and MacOS: + +```bash +pip install "modin[all]" # (Recommended) Install Modin with all of Modin's currently supported engines. +``` + +If you want to install Modin with a specific engine, we recommend: + +```bash +pip install "modin[ray]" # Install Modin dependencies and Ray. +pip install "modin[dask]" # Install Modin dependencies and Dask. +pip install "modin[unidist]" # Install Modin dependencies and Unidist. +``` + +Modin automatically detects which engine(s) you have installed and uses that for scheduling computation. + +#### From conda-forge + +Installing from [conda forge](https://github.com/conda-forge/modin-feedstock) using `modin-all` +will install Modin and four engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask), +[Unidist](https://github.com/modin-project/unidist) and [HDK](https://github.com/intel-ai/hdk). + +```bash +conda install -c conda-forge modin-all +``` + +Each engine can also be installed individually (and also as a combination of several engines): + +```bash +conda install -c conda-forge modin-ray # Install Modin dependencies and Ray. +conda install -c conda-forge modin-dask # Install Modin dependencies and Dask. +conda install -c conda-forge modin-unidist # Install Modin dependencies and Unidist. +conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK. +``` + +To speed up conda installation we recommend using libmamba solver. 
To do this install it in a base environment: + +```bash +conda install -n base conda-libmamba-solver +``` + +and then use it during installation either like: + +```bash +conda install -c conda-forge modin-ray modin-hdk --experimental-solver=libmamba +``` + +or starting from conda 22.11 and libmamba solver 22.12 versions: + +```bash +conda install -c conda-forge modin-ray modin-hdk --solver=libmamba +``` + +#### Choosing a Compute Engine + +If you want to choose a specific compute engine to run on, you can set the environment +variable `MODIN_ENGINE` and Modin will do computation with that engine: + +```bash +export MODIN_ENGINE=ray # Modin will use Ray +export MODIN_ENGINE=dask # Modin will use Dask +export MODIN_ENGINE=unidist # Modin will use Unidist +``` + +If you want to choose the Unidist engine, you should set the additional environment +variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: + +```bash +export UNIDIST_BACKEND=mpi # Unidist will use MPI backend +``` + +This can also be done within a notebook/interpreter before you import Modin: + +```python +import modin.config as modin_cfg +import unidist.config as unidist_cfg + +modin_cfg.Engine.put("ray") # Modin will use Ray +modin_cfg.Engine.put("dask") # Modin will use Dask + +modin_cfg.Engine.put('unidist') # Modin will use Unidist +unidist_cfg.Backend.put('mpi') # Unidist will use MPI backend +``` + +Check [this Modin docs section](https://modin.readthedocs.io/en/latest/development/using_hdk.html) for HDK engine setup. + +_Note: You should not change the engine after your first operation with Modin as it will result in undefined behavior._ + +#### Which engine should I use? + +On Linux, MacOS, and Windows you can install and use either Ray, Dask or Unidist. There is no knowledge required +to use either of these engines as Modin abstracts away all of the complexity, so feel +free to pick either! 
+ +On Linux you also can choose [HDK](https://modin.readthedocs.io/en/latest/development/using_hdk.html), which is an experimental +engine based on [HDK](https://github.com/intel-ai/hdk) and included in the +[Intel® Distribution of Modin](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html), +which is a part of [Intel® oneAPI AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html). + +### Pandas API Coverage + +

+ +| pandas Object | Modin's Ray Engine Coverage | Modin's Dask Engine Coverage | Modin's Unidist Engine Coverage | +|-------------------|:------------------------------------------------------------------------------------:|:---------------:|:---------------:| +| `pd.DataFrame` | | | | +| `pd.Series` | | | +| `pd.read_csv` | ✅ | ✅ | ✅ | +| `pd.read_table` | ✅ | ✅ | ✅ | +| `pd.read_parquet` | ✅ | ✅ | ✅ | +| `pd.read_sql` | ✅ | ✅ | ✅ | +| `pd.read_feather` | ✅ | ✅ | ✅ | +| `pd.read_excel` | ✅ | ✅ | ✅ | +| `pd.read_json` | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | [✳️](https://github.com/modin-project/modin/issues/554) | +| `pd.read_` | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | [✴️](https://modin.readthedocs.io/en/latest/supported_apis/io_supported.html) | + +

+Some pandas APIs are easier to implement than others, so if something is missing feel +free to open an issue! + +### More about Modin + +For the complete documentation on Modin, visit our [ReadTheDocs](https://modin.readthedocs.io/en/latest/index.html) page. + +#### Scale your pandas workflow by changing a single line of code. + +_Note: In local mode (without a cluster), Modin will create and manage a local (Dask or Ray) cluster for the execution._ + +To use Modin, you do not need to specify how to distribute the data, or even know how many +cores your system has. In fact, you can continue using your previous +pandas notebooks while experiencing a considerable speedup from Modin, even on a single +machine. Once you've changed your import statement, you're ready to use Modin just like +you would with pandas! + +#### Faster pandas, even on your laptop + + + +The `modin.pandas` DataFrame is an extremely light-weight parallel DataFrame. +Modin transparently distributes the data and computation so that you can continue using the same pandas API +while working with more data faster. Because it is so light-weight, +Modin provides speed-ups of up to 4x on a laptop with 4 physical cores. + +In pandas, you are only able to use one core at a time when you are doing computation of +any kind. With Modin, you are able to use all of the CPU cores on your machine. Even with a +traditionally synchronous task like `read_csv`, we see large speedups by efficiently +distributing the work across your entire machine. + +```python +import modin.pandas as pd + +df = pd.read_csv("my_dataset.csv") +``` + +#### Modin can handle the datasets that pandas can't + +Often data scientists have to switch between different tools +for operating on datasets of different sizes. Processing large dataframes with pandas +is slow, and pandas does not support working with dataframes that are too large to fit +into the available memory. 
As a result, pandas workflows that work well +for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size +of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably +work with hundreds of GBs without worrying about substantial slowdown or memory errors. +With [cluster](https://modin.readthedocs.io/en/latest/getting_started/using_modin/using_modin_cluster.html) +and [out of core](https://modin.readthedocs.io/en/latest/getting_started/why_modin/out_of_core.html) +support, Modin is a DataFrame library with both great single-node performance and high +scalability in a cluster. + +#### Modin Architecture + +We designed [Modin's architecture](https://modin.readthedocs.io/en/latest/development/architecture.html) +to be modular so we can plug in different components as they develop and improve: + +Modin's architecture + +### Other Resources + +#### Getting Started with Modin + +- [Documentation](https://modin.readthedocs.io/en/latest/) +- [10-min Quickstart Guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) +- [Examples and Tutorials](https://modin.readthedocs.io/en/latest/getting_started/examples.html) +- [Videos and Blogposts](https://modin.readthedocs.io/en/latest/getting_started/examples.html#talks-podcasts) +- [Benchmarking Modin](https://modin.readthedocs.io/en/latest/usage_guide/benchmarking.html) + +#### Modin Community + +- [Slack](https://join.slack.com/t/modin-project/shared_invite/zt-yvk5hr3b-f08p_ulbuRWsAfg9rMY3uA) +- [Discourse](https://discuss.modin.org) +- [Twitter](https://twitter.com/modin_project) +- [Mailing List](https://groups.google.com/g/modin-dev) +- [GitHub Issues](https://github.com/modin-project/modin/issues) +- [StackOverflow](https://stackoverflow.com/questions/tagged/modin) + +#### Learn More about Modin + +- [Frequently Asked Questions (FAQs)](https://modin.readthedocs.io/en/latest/getting_started/faq.html) +- 
[Troubleshooting Guide](https://modin.readthedocs.io/en/latest/getting_started/troubleshooting.html) +- [Development Guide](https://modin.readthedocs.io/en/latest/development/index.html) +- Modin is built on many years of research and development at UC Berkeley. Check out these selected papers to learn more about how Modin works: + - [Flexible Rule-Based Decomposition and Metadata Independence in Modin](https://people.eecs.berkeley.edu/~totemtang/paper/Modin.pdf) (VLDB 2021) + - [Dataframe Systems: Theory, Architecture, and Implementation](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-193.pdf) (PhD Dissertation 2021) + - [Towards Scalable Dataframe Systems](https://arxiv.org/pdf/2001.00888.pdf) (VLDB 2020) + +#### Getting Involved + +***`modin.pandas` is currently under active development. Requests and contributions are welcome!*** + +For more information on how to contribute to Modin, check out the +[Modin Contribution Guide](https://modin.readthedocs.io/en/latest/development/contributing.html). + +### License + +[Apache License 2.0](LICENSE) %prep -%autosetup -n modin-0.19.0 +%autosetup -n modin-0.20.0 %build %py3_build @@ -906,5 +900,5 @@ mv %{buildroot}/doclist.lst . %{_docdir}/* %changelog -* Mon Apr 10 2023 Python_Bot - 0.19.0-1 +* Fri Apr 21 2023 Python_Bot - 0.20.0-1 - Package Spec generated diff --git a/sources b/sources index 27c0a2b..734f868 100644 --- a/sources +++ b/sources @@ -1 +1 @@ -c8238e5bdc0ad1fbfd2c8bdfd5d6e8d7 modin-0.19.0.tar.gz +85d94e6bba8453fa7d6307125377009e modin-0.20.0.tar.gz -- cgit v1.2.3