summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorCoprDistGit <infra@openeuler.org>2023-04-11 12:22:00 +0000
committerCoprDistGit <infra@openeuler.org>2023-04-11 12:22:00 +0000
commitb012f0a1c7d4c410ea24ec0b417fe035e8d4e879 (patch)
treece1398290f9a8ae79cc034c62fdb69b07b28c72f
parentc4a7fefc198800774d1bfa304442404ef7cbc8ff (diff)
automatic import of python-bcpandas
-rw-r--r--.gitignore1
-rw-r--r--python-bcpandas.spec1143
-rw-r--r--sources1
3 files changed, 1145 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..3641e65 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/bcpandas-2.4.0.tar.gz
diff --git a/python-bcpandas.spec b/python-bcpandas.spec
new file mode 100644
index 0000000..a01d3c8
--- /dev/null
+++ b/python-bcpandas.spec
@@ -0,0 +1,1143 @@
+%global _empty_manifest_terminate_build 0
+Name: python-bcpandas
+Version: 2.4.0
+Release: 1
+Summary: High-level wrapper around BCP for high performance data transfers between pandas and SQL Server. No knowledge of BCP required!!
+License: MIT
+URL: https://github.com/yehoshuadimarsky/bcpandas
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/e0/5c/a9fe97994ca3046384482aef7200b9c41f7b0fca3a3f623c375da50e9dec/bcpandas-2.4.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-pandas
+Requires: python3-pyodbc
+Requires: python3-sqlalchemy
+
+%description
+# bcpandas
+
+[![PyPI version](https://img.shields.io/pypi/v/bcpandas.svg)](https://pypi.org/project/bcpandas/)
+[![Conda-Forge version](https://img.shields.io/conda/vn/conda-forge/bcpandas.svg)](https://anaconda.org/conda-forge/bcpandas)
+[![GitHub license](https://img.shields.io/github/license/yehoshuadimarsky/bcpandas.svg)](https://github.com/yehoshuadimarsky/bcpandas/blob/master/LICENSE)
+[![CI](https://github.com/yehoshuadimarsky/bcpandas/actions/workflows/CI.yml/badge.svg?branch=master)](https://github.com/yehoshuadimarsky/bcpandas/actions/workflows/CI.yml)
+[![codecov](https://codecov.io/gh/yehoshuadimarsky/bcpandas/branch/master/graph/badge.svg)](https://codecov.io/gh/yehoshuadimarsky/bcpandas)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![PyPI pyversions](https://img.shields.io/pypi/pyversions/bcpandas.svg)](https://pypi.python.org/pypi/bcpandas/)
+[![Awesome Badges](https://img.shields.io/badge/badges-awesome-green.svg)](https://github.com/Naereen/badges)
+[![Downloads](https://static.pepy.tech/personalized-badge/bcpandas?period=total&units=international_system&left_color=grey&right_color=orange&left_text=pypi%20total%20downloads)](https://pepy.tech/project/bcpandas)
+[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/yehoshuadimarsky/bcpandas/master.svg)](https://results.pre-commit.ci/latest/github/yehoshuadimarsky/bcpandas/master)
+
+High-level wrapper around BCP for high performance data transfers between pandas and SQL Server. No
+knowledge of BCP required!! (pronounced _BEE-CEE-Pandas_)
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+
+- [Quickstart](#quickstart)
+- [IMPORTANT - Read vs. Write](#important---read-vs-write)
+- [Benchmarks](#benchmarks)
+ - [to_sql](#to_sql)
+ - [Why not just use the new pandas `method='multi'`?](#why-not-just-use-the-new-pandas-methodmulti)
+ - [read_sql](#read_sql)
+- [Requirements](#requirements)
+ - [Database](#database)
+ - [Python User](#python-user)
+- [Installation](#installation)
+- [Usage](#usage)
+ - [Credential/Connection object](#credentialconnection-object)
+ - [Recommended Usage](#recommended-usage)
+- [Known Issues](#known-issues)
+- [Background](#background)
+ - [Existing Solutions](#existing-solutions)
+ - [Design and Scope](#design-and-scope)
+- [Testing](#testing)
+ - [Testing Requirements](#testing-requirements)
+ - [What Is Tested?](#what-is-tested)
+ - [Testing Implementation](#testing-implementation)
+- [CI/CD](#cicd)
+- [Contributing](#contributing)
+- [GitHub Stars Trend](#github-stars-trend)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+## Quickstart
+
+```python
+In [1]: import pandas as pd
+ ...: import numpy as np
+ ...:
+ ...: from bcpandas import SqlCreds, to_sql
+
+In [2]: creds = SqlCreds(
+ ...: 'my_server',
+ ...: 'my_db',
+ ...: 'my_username',
+ ...: 'my_password'
+ ...: )
+
+In [3]: df = pd.DataFrame(
+ ...: data=np.ndarray(shape=(10, 6), dtype=int),
+ ...: columns=[f"col_{x}" for x in range(6)]
+ ...: )
+
+In [4]: df
+Out[4]:
+ col_0 col_1 col_2 col_3 col_4 col_5
+0 4128860 6029375 3801155 5570652 6619251 7536754
+1 4849756 7536751 4456552 7143529 7471201 7012467
+2 6029433 6881357 6881390 7274595 6553710 3342433
+3 6619228 7733358 6029427 6488162 6357104 6553710
+4 7536737 7077980 6422633 7536732 7602281 2949221
+5 6357104 7012451 6750305 7536741 7340124 7274610
+6 7340141 6226036 7274612 7077999 6881387 6029428
+7 6619243 6226041 6881378 6553710 7209065 6029415
+8 6881378 6553710 7209065 7536743 7274588 6619248
+9 6226030 7209065 6619231 6881380 7274612 3014770
+
+In [5]: to_sql(df, 'my_test_table', creds, index=False, if_exists='replace')
+
+In [6]: df2 = pd.read_sql_table(table_name='my_test_table', con=creds.engine)
+
+In [7]: df2
+Out[7]:
+ col_0 col_1 col_2 col_3 col_4 col_5
+0 4128860 6029375 3801155 5570652 6619251 7536754
+1 4849756 7536751 4456552 7143529 7471201 7012467
+2 6029433 6881357 6881390 7274595 6553710 3342433
+3 6619228 7733358 6029427 6488162 6357104 6553710
+4 7536737 7077980 6422633 7536732 7602281 2949221
+5 6357104 7012451 6750305 7536741 7340124 7274610
+6 7340141 6226036 7274612 7077999 6881387 6029428
+7 6619243 6226041 6881378 6553710 7209065 6029415
+8 6881378 6553710 7209065 7536743 7274588 6619248
+9 6226030 7209065 6619231 6881380 7274612 3014770
+```
+
+## IMPORTANT - Read vs. Write
+
+The big speedup benefit of bcpandas is in the `to_sql` function, as the benchmarks below show.
+However, the bcpandas `read_sql` function actually performs **slower** than the pandas equivalent.
+Therefore, the bcpandas `read_sql` function was deprecated in v5.0 and has now been removed in
+v6.0+. To read data **from** SQL to pandas, use the native pandas method `pd.read_sql_table` or
+`pd.read_sql_query`.
+
+## Benchmarks
+
+See figures below. All code is in the `/benchmarks` directory. To run the benchmarks, from the root
+directory of this repository, run `python benchmarks/benchmark.py main --help` and fill in the
+command line options that are presented.
+
+Running this will output
+
+1. PNG image of the graph
+2. JSON file of the benchmark data
+3. JSON file with the environment details of the machine that was used to generate it
+
+### to_sql
+
+> I didn't bother including the pandas non-`multiinsert` version here because it just takes way too
+> long
+
+![to_sql benchmark graph](benchmarks/tosql_benchmark.png)
+
+#### Why not just use the new pandas [`method='multi'`](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method)?
+
+1. Because it is still much slower
+2. Because you are forced to set the `chunksize` parameter to a very small number for it to work -
+ generally a bit less then `2100/<number of columns>`. This is because SQL Server can only accept
+ up to 2100 parameters in a query. See
+ [here](https://stackoverflow.com/questions/50689082/to-sql-pyodbc-count-field-incorrect-or-syntax-error)
+ and [here](https://github.com/mkleehammer/pyodbc/issues/217) for more discussion on this, and
+ the recommendation to use a bulk insert tool such as BCP. It seems that SQL Server simply didn't
+ design the regular `INSERT` statement to support huge amounts of data.
+
+### read_sql
+
+As you can see, pandas native clearly wins here
+
+![read_sql benchmark graph](benchmarks/readsql_benchmark.png)
+
+## Requirements
+
+### Database
+
+Any version of Microsoft SQL Server. Can be installed on-prem, in the cloud, on a VM, or one of the
+Azure versions.
+
+### Python User
+
+- [BCP](https://docs.microsoft.com/en-us/sql/tools/bcp-utility) Utility
+- Microsoft ODBC Driver **11, 13, 13.1, or 17** for SQL Server.
+ ([Microsoft Docs](https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server))
+ See the
+ [pyodbc docs](https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-Windows)
+ for more on different driver versions.
+- Python >= 3.7
+- `pandas` >= 0.19
+- `sqlalchemy` >= 1.0
+- `pyodbc` as the
+ [supported DBAPI](https://docs.sqlalchemy.org/en/13/dialects/mssql.html#module-sqlalchemy.dialects.mssql.pyodbc)
+
+## Installation
+
+| Source | Command |
+| :----: | :-------------------------------------: |
+| PyPI | `pip install bcpandas` |
+| Conda | `conda install -c conda-forge bcpandas` |
+
+## Usage
+
+1. Create creds (see next section)
+2. Replace any `df.to_sql(...)` in your code with `bcpandas.to_sql(df, ...)`
+
+That's it!
+
+### Credential/Connection object
+
+Bcpandas requires a `bcpandas.SqlCreds` object in order to use it, and also a `sqlalchemy.Engine`.
+The user has 2 options when constructing it.
+
+1. Create the bcpandas `SqlCreds` object with just the minimum attributes needed (server, database,
+ username, password), and bcpandas will create a full `Engine` object from this. It will use
+ `pyodbc`, `sqlalchemy`, and the Microsoft ODBC Driver for SQL Server, and will store it in the
+ `.engine` attribute.
+
+ ```python
+ In [1]: from bcpandas import SqlCreds
+
+ In [2]: creds = SqlCreds('my_server', 'my_db', 'my_username', 'my_password')
+
+ In [3]: creds.engine
+ Out[3]: Engine(mssql+pyodbc:///?odbc_connect=Driver={ODBC Driver 17 for SQL Server};Server=tcp:my_server,1433;Database=my_db;UID=my_username;PWD=my_password)
+
+ ```
+
+2. Pass a full `Engine` object to the bcpandas `SqlCreds` object, and bcpandas will attempt to
+ parse out the server, database, username, and password to pass to the command line utilities. If
+ a DSN is used, this will fail.
+
+ (continuing example above)
+
+ ```python
+ In [4]: creds2 = SqlCreds.from_engine(creds.engine)
+
+ In [5]: creds2.engine
+ Out[5]: Engine(mssql+pyodbc:///?odbc_connect=Driver={ODBC Driver 17 for SQL Server};Server=tcp:my_server,1433;Database=my_db;UID=my_username;PWD=my_password)
+
+ In [6]: creds2
+ Out[6]: SqlCreds(server='my_server', database='my_db', username='my_username', with_krb_auth=False, engine=Engine(mssql+pyodbc:///?odbc_connect=Driver={ODBC Driver 17 for SQL Server};Server=tcp:my_server,1433;Database=my_db;UID=my_username;PWD=my_password), password=[REDACTED])
+ ```
+
+### Recommended Usage
+
+| Feature | Pandas native | BCPandas |
+| ------------------------------------------------- | :----------------: | :----------------: |
+| Super speed | :x: | :white_check_mark: |
+| Good for simple data types like numbers and dates | :x: | :white_check_mark: |
+| Handle messy string data | :white_check_mark: | :x: |
+
+> built with the help of https://www.tablesgenerator.com/markdown_tables# and
+> https://gist.github.com/rxaviers/7360908
+
+## Known Issues
+
+Here are some caveats and limitations of bcpandas.
+
+- Bcpandas has been tested with all ASCII characters 32-127. Unicode characters beyond that range
+ have not been tested.
+- An empty string (`""`) in the dataframe becomes `NULL` in the SQL database instead of remaining
+ an empty string.
+- Because bcpandas first outputs to CSV, it needs to use several specific characters to create the
+ CSV, including a _delimiter_ and a _quote character_. Bcpandas attempts to use characters that
+ are not present in the dataframe for this, going through the possilbe delimiters and quote
+ characters specified in `constants.py`. If all possible characters are present in the dataframe
+ and bcpandas cannot find both a delimiter and quote character to use, it will throw an error.
+ - The BCP utility does **not** ignore delimiter characters when surrounded by quotes, unlike
+ CSVs - see
+ [here](https://docs.microsoft.com/en-us/sql/relational-databases/import-export/specify-field-and-row-terminators-sql-server#characters-supported-as-terminators)
+ in the Microsoft docs.
+- ~~If there is a NaN/Null in the last column of the dataframe it will throw an error. This is due
+ to a BCP issue. See my issue with Microsoft about this
+ [here](https://github.com/MicrosoftDocs/sql-docs/issues/2689).~~ This doesn't seem to be a
+ problem based on the tests.
+
+## Background
+
+Writing data from pandas DataFrames to a SQL database is very slow using the built-in `to_sql`
+method, even with the newly introduced
+[`execute_many`](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method)
+option. For Microsoft SQL Server, a far far faster method is to use the BCP utility provided by
+Microsoft. This utility is a command line tool that transfers data to/from the database and flat
+text files.
+
+This package is a wrapper for seamlessly using the bcp utility from Python using a pandas
+DataFrame. Despite the IO hits, the fastest option by far is saving the data to a CSV file in the
+file system and using the bcp utility to transfer the CSV file to SQL Server. **Best of all, you
+don't need to know anything about using BCP at all!**
+
+### Existing Solutions
+
+> Much credit is due to `bcpy` for the original idea and for some of the code that was adopted and
+> changed.
+
+<details>
+ <summary>bcpy</summary>
+
+[bcpy](https://github.com/titan550/bcpy) has several flaws:
+
+- No support for reading from SQL, only writing to SQL
+- A convoluted, overly class-based internal design
+- Scope a bit too broad - deals with pandas as well as flat files This repository aims to fix and
+improve on `bcpy` and the above issues by making the design choices described earlier.
+</details>
+
+### Design and Scope
+
+The _**only**_ scope of `bcpandas` is to read and write between a pandas DataFrame and a Microsoft
+SQL Server database. That's it. We do _**not**_ concern ourselves with reading existing flat files
+to/from SQL - that introduces _way_ to much complexity in trying to parse and decode the various
+parts of the file, like delimiters, quote characters, and line endings. Instead, to read/write an
+exiting flat file, just import it via pandas into a DataFrame, and then use `bcpandas`.
+
+The big benefit of this is that we get to precicely control all the finicky parts of the text file
+when we write/read it to a local file and then in the BCP utility. This lets us set library-wide
+defaults (maybe configurable in the future) and work with those.
+
+For now, we are using the non-XML BCP format file type. In the future, XML format files may be
+added.
+
+## Testing
+
+### Testing Requirements
+
+- Docker Desktop installed, either of the Linux or Windows runtimes, doesn't matter
+- pytest
+- hypothesis
+- pytest-cov (coverage.py)
+
+### What Is Tested?
+
+We take testing very seriously here. In order to rely on a library like this in production, it
+**MUST** be ruthlessly tested, which thankfully it is. Here is a partial list of what has been
+tested so far. Pull Requests welcome!
+
+- Data types: All ASCII characters 32-127 (using the Hypothesis library, see below). Unicode
+ characters beyond that range have not been tested.
+- `numpy.NaN`, `None`
+- `numpy.inf` (fails, as expected)
+- Empty dataframe (nothing happens, database not modified)
+- Duplicate column names (raises error)
+- Database columns that are missing from the dataframe, are out of order, or both (passes)
+- Extra dataframe columns that aren't in database, when `if_exists="append"` specified (fails)
+
+### Testing Implementation
+
+- Testing uses `pytest`.
+- To test for all possible data types, we use the `hypothesis` library, instead of trying to come
+ up with every single case on our own.
+- `pytest-cov` (which uses `coverage.py` under the hood) is used to measure code coverage. This is
+ then uploaded to [codecov.io](https://codecov.io/gh/yehoshuadimarsky/bcpandas) as part of the
+ CI/CD process (see below).
+- In order to spin up a local SQL Server during testing, we use Docker. Specifically, we run one of
+ the images that Microsoft provides that already have SQL Server fully installed, all we have to
+ do is use the image to run a container. Here are the links to the
+ [Linux versions](https://hub.docker.com/_/microsoft-mssql-server) and the Windows versions -
+ [Express](https://hub.docker.com/r/microsoft/mssql-server-windows-express/) and
+ [Developer](https://hub.docker.com/r/microsoft/mssql-server-windows-developer).
+ - When running the tests, we can specify a specific Docker image to use, by invoking the custom
+ command line option called `--mssql-docker-image`. For example:
+ ```bash
+ pytest bcpandas/tests --mssql-docker-image mcr.microsoft.com/mssql/server:2019-latest
+ ```
+- Instead of using the `subprocess` library to control Docker manually, we use the elegant
+ `docker-py` library which works very nicely. A `DockerDB` Python class is defined in
+ `bcpandas/tests/utils.py` and it wraps up all the Docker commands and functionality needed to use
+ SQL Server into one class. This class is used in `conftest.py` in the core bcpandas tests, and in
+ the `benchmarks/` directory for both the benchmarks code as well as the legacy tests for
+ `read_sql`.
+
+## CI/CD
+
+Github Actions is used for CI/CD, although it is still somewhat a work in progress.
+
+## Contributing
+
+Please, all contributions are very welcome!
+
+I will attempt to use the `pandas` docstring style as detailed
+[here](https://pandas.pydata.org/pandas-docs/stable/development/contributing_docstring.html).
+
+## GitHub Stars Trend
+
+[![Stargazers over time](https://starchart.cc/yehoshuadimarsky/bcpandas.svg)](https://starchart.cc/yehoshuadimarsky/bcpandas)
+
+
+%package -n python3-bcpandas
+Summary: High-level wrapper around BCP for high performance data transfers between pandas and SQL Server. No knowledge of BCP required!!
+Provides: python-bcpandas
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-bcpandas
+# bcpandas
+
+[![PyPI version](https://img.shields.io/pypi/v/bcpandas.svg)](https://pypi.org/project/bcpandas/)
+[![Conda-Forge version](https://img.shields.io/conda/vn/conda-forge/bcpandas.svg)](https://anaconda.org/conda-forge/bcpandas)
+[![GitHub license](https://img.shields.io/github/license/yehoshuadimarsky/bcpandas.svg)](https://github.com/yehoshuadimarsky/bcpandas/blob/master/LICENSE)
+[![CI](https://github.com/yehoshuadimarsky/bcpandas/actions/workflows/CI.yml/badge.svg?branch=master)](https://github.com/yehoshuadimarsky/bcpandas/actions/workflows/CI.yml)
+[![codecov](https://codecov.io/gh/yehoshuadimarsky/bcpandas/branch/master/graph/badge.svg)](https://codecov.io/gh/yehoshuadimarsky/bcpandas)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![PyPI pyversions](https://img.shields.io/pypi/pyversions/bcpandas.svg)](https://pypi.python.org/pypi/bcpandas/)
+[![Awesome Badges](https://img.shields.io/badge/badges-awesome-green.svg)](https://github.com/Naereen/badges)
+[![Downloads](https://static.pepy.tech/personalized-badge/bcpandas?period=total&units=international_system&left_color=grey&right_color=orange&left_text=pypi%20total%20downloads)](https://pepy.tech/project/bcpandas)
+[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/yehoshuadimarsky/bcpandas/master.svg)](https://results.pre-commit.ci/latest/github/yehoshuadimarsky/bcpandas/master)
+
+High-level wrapper around BCP for high performance data transfers between pandas and SQL Server. No
+knowledge of BCP required!! (pronounced _BEE-CEE-Pandas_)
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+
+- [Quickstart](#quickstart)
+- [IMPORTANT - Read vs. Write](#important---read-vs-write)
+- [Benchmarks](#benchmarks)
+ - [to_sql](#to_sql)
+ - [Why not just use the new pandas `method='multi'`?](#why-not-just-use-the-new-pandas-methodmulti)
+ - [read_sql](#read_sql)
+- [Requirements](#requirements)
+ - [Database](#database)
+ - [Python User](#python-user)
+- [Installation](#installation)
+- [Usage](#usage)
+ - [Credential/Connection object](#credentialconnection-object)
+ - [Recommended Usage](#recommended-usage)
+- [Known Issues](#known-issues)
+- [Background](#background)
+ - [Existing Solutions](#existing-solutions)
+ - [Design and Scope](#design-and-scope)
+- [Testing](#testing)
+ - [Testing Requirements](#testing-requirements)
+ - [What Is Tested?](#what-is-tested)
+ - [Testing Implementation](#testing-implementation)
+- [CI/CD](#cicd)
+- [Contributing](#contributing)
+- [GitHub Stars Trend](#github-stars-trend)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+## Quickstart
+
+```python
+In [1]: import pandas as pd
+ ...: import numpy as np
+ ...:
+ ...: from bcpandas import SqlCreds, to_sql
+
+In [2]: creds = SqlCreds(
+ ...: 'my_server',
+ ...: 'my_db',
+ ...: 'my_username',
+ ...: 'my_password'
+ ...: )
+
+In [3]: df = pd.DataFrame(
+ ...: data=np.ndarray(shape=(10, 6), dtype=int),
+ ...: columns=[f"col_{x}" for x in range(6)]
+ ...: )
+
+In [4]: df
+Out[4]:
+ col_0 col_1 col_2 col_3 col_4 col_5
+0 4128860 6029375 3801155 5570652 6619251 7536754
+1 4849756 7536751 4456552 7143529 7471201 7012467
+2 6029433 6881357 6881390 7274595 6553710 3342433
+3 6619228 7733358 6029427 6488162 6357104 6553710
+4 7536737 7077980 6422633 7536732 7602281 2949221
+5 6357104 7012451 6750305 7536741 7340124 7274610
+6 7340141 6226036 7274612 7077999 6881387 6029428
+7 6619243 6226041 6881378 6553710 7209065 6029415
+8 6881378 6553710 7209065 7536743 7274588 6619248
+9 6226030 7209065 6619231 6881380 7274612 3014770
+
+In [5]: to_sql(df, 'my_test_table', creds, index=False, if_exists='replace')
+
+In [6]: df2 = pd.read_sql_table(table_name='my_test_table', con=creds.engine)
+
+In [7]: df2
+Out[7]:
+ col_0 col_1 col_2 col_3 col_4 col_5
+0 4128860 6029375 3801155 5570652 6619251 7536754
+1 4849756 7536751 4456552 7143529 7471201 7012467
+2 6029433 6881357 6881390 7274595 6553710 3342433
+3 6619228 7733358 6029427 6488162 6357104 6553710
+4 7536737 7077980 6422633 7536732 7602281 2949221
+5 6357104 7012451 6750305 7536741 7340124 7274610
+6 7340141 6226036 7274612 7077999 6881387 6029428
+7 6619243 6226041 6881378 6553710 7209065 6029415
+8 6881378 6553710 7209065 7536743 7274588 6619248
+9 6226030 7209065 6619231 6881380 7274612 3014770
+```
+
+## IMPORTANT - Read vs. Write
+
+The big speedup benefit of bcpandas is in the `to_sql` function, as the benchmarks below show.
+However, the bcpandas `read_sql` function actually performs **slower** than the pandas equivalent.
+Therefore, the bcpandas `read_sql` function was deprecated in v5.0 and has now been removed in
+v6.0+. To read data **from** SQL to pandas, use the native pandas method `pd.read_sql_table` or
+`pd.read_sql_query`.
+
+## Benchmarks
+
+See figures below. All code is in the `/benchmarks` directory. To run the benchmarks, from the root
+directory of this repository, run `python benchmarks/benchmark.py main --help` and fill in the
+command line options that are presented.
+
+Running this will output
+
+1. PNG image of the graph
+2. JSON file of the benchmark data
+3. JSON file with the environment details of the machine that was used to generate it
+
+### to_sql
+
+> I didn't bother including the pandas non-`multiinsert` version here because it just takes way too
+> long
+
+![to_sql benchmark graph](benchmarks/tosql_benchmark.png)
+
+#### Why not just use the new pandas [`method='multi'`](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method)?
+
+1. Because it is still much slower
+2. Because you are forced to set the `chunksize` parameter to a very small number for it to work -
+ generally a bit less then `2100/<number of columns>`. This is because SQL Server can only accept
+ up to 2100 parameters in a query. See
+ [here](https://stackoverflow.com/questions/50689082/to-sql-pyodbc-count-field-incorrect-or-syntax-error)
+ and [here](https://github.com/mkleehammer/pyodbc/issues/217) for more discussion on this, and
+ the recommendation to use a bulk insert tool such as BCP. It seems that SQL Server simply didn't
+ design the regular `INSERT` statement to support huge amounts of data.
+
+### read_sql
+
+As you can see, pandas native clearly wins here
+
+![read_sql benchmark graph](benchmarks/readsql_benchmark.png)
+
+## Requirements
+
+### Database
+
+Any version of Microsoft SQL Server. Can be installed on-prem, in the cloud, on a VM, or one of the
+Azure versions.
+
+### Python User
+
+- [BCP](https://docs.microsoft.com/en-us/sql/tools/bcp-utility) Utility
+- Microsoft ODBC Driver **11, 13, 13.1, or 17** for SQL Server.
+ ([Microsoft Docs](https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server))
+ See the
+ [pyodbc docs](https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-Windows)
+ for more on different driver versions.
+- Python >= 3.7
+- `pandas` >= 0.19
+- `sqlalchemy` >= 1.0
+- `pyodbc` as the
+ [supported DBAPI](https://docs.sqlalchemy.org/en/13/dialects/mssql.html#module-sqlalchemy.dialects.mssql.pyodbc)
+
+## Installation
+
+| Source | Command |
+| :----: | :-------------------------------------: |
+| PyPI | `pip install bcpandas` |
+| Conda | `conda install -c conda-forge bcpandas` |
+
+## Usage
+
+1. Create creds (see next section)
+2. Replace any `df.to_sql(...)` in your code with `bcpandas.to_sql(df, ...)`
+
+That's it!
+
+### Credential/Connection object
+
+Bcpandas requires a `bcpandas.SqlCreds` object in order to use it, and also a `sqlalchemy.Engine`.
+The user has 2 options when constructing it.
+
+1. Create the bcpandas `SqlCreds` object with just the minimum attributes needed (server, database,
+ username, password), and bcpandas will create a full `Engine` object from this. It will use
+ `pyodbc`, `sqlalchemy`, and the Microsoft ODBC Driver for SQL Server, and will store it in the
+ `.engine` attribute.
+
+ ```python
+ In [1]: from bcpandas import SqlCreds
+
+ In [2]: creds = SqlCreds('my_server', 'my_db', 'my_username', 'my_password')
+
+ In [3]: creds.engine
+ Out[3]: Engine(mssql+pyodbc:///?odbc_connect=Driver={ODBC Driver 17 for SQL Server};Server=tcp:my_server,1433;Database=my_db;UID=my_username;PWD=my_password)
+
+ ```
+
+2. Pass a full `Engine` object to the bcpandas `SqlCreds` object, and bcpandas will attempt to
+ parse out the server, database, username, and password to pass to the command line utilities. If
+ a DSN is used, this will fail.
+
+ (continuing example above)
+
+ ```python
+ In [4]: creds2 = SqlCreds.from_engine(creds.engine)
+
+ In [5]: creds2.engine
+ Out[5]: Engine(mssql+pyodbc:///?odbc_connect=Driver={ODBC Driver 17 for SQL Server};Server=tcp:my_server,1433;Database=my_db;UID=my_username;PWD=my_password)
+
+ In [6]: creds2
+ Out[6]: SqlCreds(server='my_server', database='my_db', username='my_username', with_krb_auth=False, engine=Engine(mssql+pyodbc:///?odbc_connect=Driver={ODBC Driver 17 for SQL Server};Server=tcp:my_server,1433;Database=my_db;UID=my_username;PWD=my_password), password=[REDACTED])
+ ```
+
+### Recommended Usage
+
+| Feature | Pandas native | BCPandas |
+| ------------------------------------------------- | :----------------: | :----------------: |
+| Super speed | :x: | :white_check_mark: |
+| Good for simple data types like numbers and dates | :x: | :white_check_mark: |
+| Handle messy string data | :white_check_mark: | :x: |
+
+> built with the help of https://www.tablesgenerator.com/markdown_tables# and
+> https://gist.github.com/rxaviers/7360908
+
+## Known Issues
+
+Here are some caveats and limitations of bcpandas.
+
+- Bcpandas has been tested with all ASCII characters 32-127. Unicode characters beyond that range
+ have not been tested.
+- An empty string (`""`) in the dataframe becomes `NULL` in the SQL database instead of remaining
+ an empty string.
+- Because bcpandas first outputs to CSV, it needs to use several specific characters to create the
+ CSV, including a _delimiter_ and a _quote character_. Bcpandas attempts to use characters that
+ are not present in the dataframe for this, going through the possilbe delimiters and quote
+ characters specified in `constants.py`. If all possible characters are present in the dataframe
+ and bcpandas cannot find both a delimiter and quote character to use, it will throw an error.
+ - The BCP utility does **not** ignore delimiter characters when surrounded by quotes, unlike
+ CSVs - see
+ [here](https://docs.microsoft.com/en-us/sql/relational-databases/import-export/specify-field-and-row-terminators-sql-server#characters-supported-as-terminators)
+ in the Microsoft docs.
+- ~~If there is a NaN/Null in the last column of the dataframe it will throw an error. This is due
+ to a BCP issue. See my issue with Microsoft about this
+ [here](https://github.com/MicrosoftDocs/sql-docs/issues/2689).~~ This doesn't seem to be a
+ problem based on the tests.
+
+## Background
+
+Writing data from pandas DataFrames to a SQL database is very slow using the built-in `to_sql`
+method, even with the newly introduced
+[`execute_many`](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method)
+option. For Microsoft SQL Server, a far far faster method is to use the BCP utility provided by
+Microsoft. This utility is a command line tool that transfers data to/from the database and flat
+text files.
+
+This package is a wrapper for seamlessly using the bcp utility from Python using a pandas
+DataFrame. Despite the IO hits, the fastest option by far is saving the data to a CSV file in the
+file system and using the bcp utility to transfer the CSV file to SQL Server. **Best of all, you
+don't need to know anything about using BCP at all!**
+
+### Existing Solutions
+
+> Much credit is due to `bcpy` for the original idea and for some of the code that was adopted and
+> changed.
+
+<details>
+ <summary>bcpy</summary>
+
+[bcpy](https://github.com/titan550/bcpy) has several flaws:
+
+- No support for reading from SQL, only writing to SQL
+- A convoluted, overly class-based internal design
+- Scope a bit too broad - deals with pandas as well as flat files This repository aims to fix and
+improve on `bcpy` and the above issues by making the design choices described earlier.
+</details>
+
+### Design and Scope
+
+The _**only**_ scope of `bcpandas` is to read and write between a pandas DataFrame and a Microsoft
+SQL Server database. That's it. We do _**not**_ concern ourselves with reading existing flat files
+to/from SQL - that introduces _way_ to much complexity in trying to parse and decode the various
+parts of the file, like delimiters, quote characters, and line endings. Instead, to read/write an
+exiting flat file, just import it via pandas into a DataFrame, and then use `bcpandas`.
+
+The big benefit of this is that we get to precicely control all the finicky parts of the text file
+when we write/read it to a local file and then in the BCP utility. This lets us set library-wide
+defaults (maybe configurable in the future) and work with those.
+
+For now, we are using the non-XML BCP format file type. In the future, XML format files may be
+added.
+
+## Testing
+
+### Testing Requirements
+
+- Docker Desktop installed, either of the Linux or Windows runtimes, doesn't matter
+- pytest
+- hypothesis
+- pytest-cov (coverage.py)
+
+### What Is Tested?
+
+We take testing very seriously here. In order to rely on a library like this in production, it
+**MUST** be ruthlessly tested, which thankfully it is. Here is a partial list of what has been
+tested so far. Pull Requests welcome!
+
+- Data types: All ASCII characters 32-127 (using the Hypothesis library, see below). Unicode
+ characters beyond that range have not been tested.
+- `numpy.NaN`, `None`
+- `numpy.inf` (fails, as expected)
+- Empty dataframe (nothing happens, database not modified)
+- Duplicate column names (raises error)
+- Database columns that are missing from the dataframe, are out of order, or both (passes)
+- Extra dataframe columns that aren't in database, when `if_exists="append"` specified (fails)
+
+### Testing Implementation
+
+- Testing uses `pytest`.
+- To test for all possible data types, we use the `hypothesis` library, instead of trying to come
+ up with every single case on our own.
+- `pytest-cov` (which uses `coverage.py` under the hood) is used to measure code coverage. This is
+ then uploaded to [codecov.io](https://codecov.io/gh/yehoshuadimarsky/bcpandas) as part of the
+ CI/CD process (see below).
+- In order to spin up a local SQL Server during testing, we use Docker. Specifically, we run one of
+ the images that Microsoft provides that already have SQL Server fully installed, all we have to
+ do is use the image to run a container. Here are the links to the
+ [Linux versions](https://hub.docker.com/_/microsoft-mssql-server) and the Windows versions -
+ [Express](https://hub.docker.com/r/microsoft/mssql-server-windows-express/) and
+ [Developer](https://hub.docker.com/r/microsoft/mssql-server-windows-developer).
+ - When running the tests, we can specify a specific Docker image to use, by invoking the custom
+ command line option called `--mssql-docker-image`. For example:
+ ```bash
+ pytest bcpandas/tests --mssql-docker-image mcr.microsoft.com/mssql/server:2019-latest
+ ```
+- Instead of using the `subprocess` library to control Docker manually, we use the elegant
+ `docker-py` library which works very nicely. A `DockerDB` Python class is defined in
+ `bcpandas/tests/utils.py` and it wraps up all the Docker commands and functionality needed to use
+ SQL Server into one class. This class is used in `conftest.py` in the core bcpandas tests, and in
+ the `benchmarks/` directory for both the benchmarks code as well as the legacy tests for
+ `read_sql`.
+
+## CI/CD
+
+Github Actions is used for CI/CD, although it is still somewhat a work in progress.
+
+## Contributing
+
+Please, all contributions are very welcome!
+
+I will attempt to use the `pandas` docstring style as detailed
+[here](https://pandas.pydata.org/pandas-docs/stable/development/contributing_docstring.html).
+
+## GitHub Stars Trend
+
+[![Stargazers over time](https://starchart.cc/yehoshuadimarsky/bcpandas.svg)](https://starchart.cc/yehoshuadimarsky/bcpandas)
+
+
+%package help
+Summary: Development documents and examples for bcpandas
+Provides: python3-bcpandas-doc
+%description help
+# bcpandas
+
+[![PyPI version](https://img.shields.io/pypi/v/bcpandas.svg)](https://pypi.org/project/bcpandas/)
+[![Conda-Forge version](https://img.shields.io/conda/vn/conda-forge/bcpandas.svg)](https://anaconda.org/conda-forge/bcpandas)
+[![GitHub license](https://img.shields.io/github/license/yehoshuadimarsky/bcpandas.svg)](https://github.com/yehoshuadimarsky/bcpandas/blob/master/LICENSE)
+[![CI](https://github.com/yehoshuadimarsky/bcpandas/actions/workflows/CI.yml/badge.svg?branch=master)](https://github.com/yehoshuadimarsky/bcpandas/actions/workflows/CI.yml)
+[![codecov](https://codecov.io/gh/yehoshuadimarsky/bcpandas/branch/master/graph/badge.svg)](https://codecov.io/gh/yehoshuadimarsky/bcpandas)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![PyPI pyversions](https://img.shields.io/pypi/pyversions/bcpandas.svg)](https://pypi.python.org/pypi/bcpandas/)
+[![Awesome Badges](https://img.shields.io/badge/badges-awesome-green.svg)](https://github.com/Naereen/badges)
+[![Downloads](https://static.pepy.tech/personalized-badge/bcpandas?period=total&units=international_system&left_color=grey&right_color=orange&left_text=pypi%20total%20downloads)](https://pepy.tech/project/bcpandas)
+[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/yehoshuadimarsky/bcpandas/master.svg)](https://results.pre-commit.ci/latest/github/yehoshuadimarsky/bcpandas/master)
+
+High-level wrapper around BCP for high performance data transfers between pandas and SQL Server. No
+knowledge of BCP required!! (pronounced _BEE-CEE-Pandas_)
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+
+- [Quickstart](#quickstart)
+- [IMPORTANT - Read vs. Write](#important---read-vs-write)
+- [Benchmarks](#benchmarks)
+ - [to_sql](#to_sql)
+ - [Why not just use the new pandas `method='multi'`?](#why-not-just-use-the-new-pandas-methodmulti)
+ - [read_sql](#read_sql)
+- [Requirements](#requirements)
+ - [Database](#database)
+ - [Python User](#python-user)
+- [Installation](#installation)
+- [Usage](#usage)
+ - [Credential/Connection object](#credentialconnection-object)
+ - [Recommended Usage](#recommended-usage)
+- [Known Issues](#known-issues)
+- [Background](#background)
+ - [Existing Solutions](#existing-solutions)
+ - [Design and Scope](#design-and-scope)
+- [Testing](#testing)
+ - [Testing Requirements](#testing-requirements)
+ - [What Is Tested?](#what-is-tested)
+ - [Testing Implementation](#testing-implementation)
+- [CI/CD](#cicd)
+- [Contributing](#contributing)
+- [GitHub Stars Trend](#github-stars-trend)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+## Quickstart
+
+```python
+In [1]: import pandas as pd
+ ...: import numpy as np
+ ...:
+ ...: from bcpandas import SqlCreds, to_sql
+
+In [2]: creds = SqlCreds(
+ ...: 'my_server',
+ ...: 'my_db',
+ ...: 'my_username',
+ ...: 'my_password'
+ ...: )
+
+In [3]: df = pd.DataFrame(
+ ...: data=np.ndarray(shape=(10, 6), dtype=int),
+ ...: columns=[f"col_{x}" for x in range(6)]
+ ...: )
+
+In [4]: df
+Out[4]:
+ col_0 col_1 col_2 col_3 col_4 col_5
+0 4128860 6029375 3801155 5570652 6619251 7536754
+1 4849756 7536751 4456552 7143529 7471201 7012467
+2 6029433 6881357 6881390 7274595 6553710 3342433
+3 6619228 7733358 6029427 6488162 6357104 6553710
+4 7536737 7077980 6422633 7536732 7602281 2949221
+5 6357104 7012451 6750305 7536741 7340124 7274610
+6 7340141 6226036 7274612 7077999 6881387 6029428
+7 6619243 6226041 6881378 6553710 7209065 6029415
+8 6881378 6553710 7209065 7536743 7274588 6619248
+9 6226030 7209065 6619231 6881380 7274612 3014770
+
+In [5]: to_sql(df, 'my_test_table', creds, index=False, if_exists='replace')
+
+In [6]: df2 = pd.read_sql_table(table_name='my_test_table', con=creds.engine)
+
+In [7]: df2
+Out[7]:
+ col_0 col_1 col_2 col_3 col_4 col_5
+0 4128860 6029375 3801155 5570652 6619251 7536754
+1 4849756 7536751 4456552 7143529 7471201 7012467
+2 6029433 6881357 6881390 7274595 6553710 3342433
+3 6619228 7733358 6029427 6488162 6357104 6553710
+4 7536737 7077980 6422633 7536732 7602281 2949221
+5 6357104 7012451 6750305 7536741 7340124 7274610
+6 7340141 6226036 7274612 7077999 6881387 6029428
+7 6619243 6226041 6881378 6553710 7209065 6029415
+8 6881378 6553710 7209065 7536743 7274588 6619248
+9 6226030 7209065 6619231 6881380 7274612 3014770
+```
+
+## IMPORTANT - Read vs. Write
+
+The big speedup benefit of bcpandas is in the `to_sql` function, as the benchmarks below show.
+However, the bcpandas `read_sql` function actually performs **slower** than the pandas equivalent.
+Therefore, the bcpandas `read_sql` function was deprecated in v5.0 and has now been removed in
+v6.0+. To read data **from** SQL to pandas, use the native pandas method `pd.read_sql_table` or
+`pd.read_sql_query`.
+
+## Benchmarks
+
+See figures below. All code is in the `/benchmarks` directory. To run the benchmarks, from the root
+directory of this repository, run `python benchmarks/benchmark.py main --help` and fill in the
+command line options that are presented.
+
+Running this will output
+
+1. PNG image of the graph
+2. JSON file of the benchmark data
+3. JSON file with the environment details of the machine that was used to generate it
+
+### to_sql
+
+> I didn't bother including the pandas non-`multiinsert` version here because it just takes way too
+> long
+
+![to_sql benchmark graph](benchmarks/tosql_benchmark.png)
+
+#### Why not just use the new pandas [`method='multi'`](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method)?
+
+1. Because it is still much slower
+2. Because you are forced to set the `chunksize` parameter to a very small number for it to work -
+ generally a bit less then `2100/<number of columns>`. This is because SQL Server can only accept
+ up to 2100 parameters in a query. See
+ [here](https://stackoverflow.com/questions/50689082/to-sql-pyodbc-count-field-incorrect-or-syntax-error)
+ and [here](https://github.com/mkleehammer/pyodbc/issues/217) for more discussion on this, and
+ the recommendation to use a bulk insert tool such as BCP. It seems that SQL Server simply didn't
+ design the regular `INSERT` statement to support huge amounts of data.
+
+### read_sql
+
+As you can see, pandas native clearly wins here
+
+![read_sql benchmark graph](benchmarks/readsql_benchmark.png)
+
+## Requirements
+
+### Database
+
+Any version of Microsoft SQL Server. Can be installed on-prem, in the cloud, on a VM, or one of the
+Azure versions.
+
+### Python User
+
+- [BCP](https://docs.microsoft.com/en-us/sql/tools/bcp-utility) Utility
+- Microsoft ODBC Driver **11, 13, 13.1, or 17** for SQL Server.
+ ([Microsoft Docs](https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server))
+ See the
+ [pyodbc docs](https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-Windows)
+ for more on different driver versions.
+- Python >= 3.7
+- `pandas` >= 0.19
+- `sqlalchemy` >= 1.0
+- `pyodbc` as the
+ [supported DBAPI](https://docs.sqlalchemy.org/en/13/dialects/mssql.html#module-sqlalchemy.dialects.mssql.pyodbc)
+
+## Installation
+
+| Source | Command |
+| :----: | :-------------------------------------: |
+| PyPI | `pip install bcpandas` |
+| Conda | `conda install -c conda-forge bcpandas` |
+
+## Usage
+
+1. Create creds (see next section)
+2. Replace any `df.to_sql(...)` in your code with `bcpandas.to_sql(df, ...)`
+
+That's it!
+
+### Credential/Connection object
+
+Bcpandas requires a `bcpandas.SqlCreds` object in order to use it, and also a `sqlalchemy.Engine`.
+The user has 2 options when constructing it.
+
+1. Create the bcpandas `SqlCreds` object with just the minimum attributes needed (server, database,
+ username, password), and bcpandas will create a full `Engine` object from this. It will use
+ `pyodbc`, `sqlalchemy`, and the Microsoft ODBC Driver for SQL Server, and will store it in the
+ `.engine` attribute.
+
+ ```python
+ In [1]: from bcpandas import SqlCreds
+
+ In [2]: creds = SqlCreds('my_server', 'my_db', 'my_username', 'my_password')
+
+ In [3]: creds.engine
+ Out[3]: Engine(mssql+pyodbc:///?odbc_connect=Driver={ODBC Driver 17 for SQL Server};Server=tcp:my_server,1433;Database=my_db;UID=my_username;PWD=my_password)
+
+ ```
+
+2. Pass a full `Engine` object to the bcpandas `SqlCreds` object, and bcpandas will attempt to
+ parse out the server, database, username, and password to pass to the command line utilities. If
+ a DSN is used, this will fail.
+
+ (continuing example above)
+
+ ```python
+ In [4]: creds2 = SqlCreds.from_engine(creds.engine)
+
+ In [5]: creds2.engine
+ Out[5]: Engine(mssql+pyodbc:///?odbc_connect=Driver={ODBC Driver 17 for SQL Server};Server=tcp:my_server,1433;Database=my_db;UID=my_username;PWD=my_password)
+
+ In [6]: creds2
+ Out[6]: SqlCreds(server='my_server', database='my_db', username='my_username', with_krb_auth=False, engine=Engine(mssql+pyodbc:///?odbc_connect=Driver={ODBC Driver 17 for SQL Server};Server=tcp:my_server,1433;Database=my_db;UID=my_username;PWD=my_password), password=[REDACTED])
+ ```
+
+### Recommended Usage
+
+| Feature | Pandas native | BCPandas |
+| ------------------------------------------------- | :----------------: | :----------------: |
+| Super speed | :x: | :white_check_mark: |
+| Good for simple data types like numbers and dates | :x: | :white_check_mark: |
+| Handle messy string data | :white_check_mark: | :x: |
+
+> built with the help of https://www.tablesgenerator.com/markdown_tables# and
+> https://gist.github.com/rxaviers/7360908
+
+## Known Issues
+
+Here are some caveats and limitations of bcpandas.
+
+- Bcpandas has been tested with all ASCII characters 32-127. Unicode characters beyond that range
+ have not been tested.
+- An empty string (`""`) in the dataframe becomes `NULL` in the SQL database instead of remaining
+ an empty string.
+- Because bcpandas first outputs to CSV, it needs to use several specific characters to create the
+ CSV, including a _delimiter_ and a _quote character_. Bcpandas attempts to use characters that
+ are not present in the dataframe for this, going through the possilbe delimiters and quote
+ characters specified in `constants.py`. If all possible characters are present in the dataframe
+ and bcpandas cannot find both a delimiter and quote character to use, it will throw an error.
+ - The BCP utility does **not** ignore delimiter characters when surrounded by quotes, unlike
+ CSVs - see
+ [here](https://docs.microsoft.com/en-us/sql/relational-databases/import-export/specify-field-and-row-terminators-sql-server#characters-supported-as-terminators)
+ in the Microsoft docs.
+- ~~If there is a NaN/Null in the last column of the dataframe it will throw an error. This is due
+ to a BCP issue. See my issue with Microsoft about this
+ [here](https://github.com/MicrosoftDocs/sql-docs/issues/2689).~~ This doesn't seem to be a
+ problem based on the tests.
+
+## Background
+
+Writing data from pandas DataFrames to a SQL database is very slow using the built-in `to_sql`
+method, even with the newly introduced
+[`execute_many`](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method)
+option. For Microsoft SQL Server, a far far faster method is to use the BCP utility provided by
+Microsoft. This utility is a command line tool that transfers data to/from the database and flat
+text files.
+
+This package is a wrapper for seamlessly using the bcp utility from Python using a pandas
+DataFrame. Despite the IO hits, the fastest option by far is saving the data to a CSV file in the
+file system and using the bcp utility to transfer the CSV file to SQL Server. **Best of all, you
+don't need to know anything about using BCP at all!**
+
+### Existing Solutions
+
+> Much credit is due to `bcpy` for the original idea and for some of the code that was adopted and
+> changed.
+
+<details>
+ <summary>bcpy</summary>
+
+[bcpy](https://github.com/titan550/bcpy) has several flaws:
+
+- No support for reading from SQL, only writing to SQL
+- A convoluted, overly class-based internal design
+- Scope a bit too broad - deals with pandas as well as flat files This repository aims to fix and
+improve on `bcpy` and the above issues by making the design choices described earlier.
+</details>
+
+### Design and Scope
+
+The _**only**_ scope of `bcpandas` is to read and write between a pandas DataFrame and a Microsoft
+SQL Server database. That's it. We do _**not**_ concern ourselves with reading existing flat files
+to/from SQL - that introduces _way_ to much complexity in trying to parse and decode the various
+parts of the file, like delimiters, quote characters, and line endings. Instead, to read/write an
+exiting flat file, just import it via pandas into a DataFrame, and then use `bcpandas`.
+
+The big benefit of this is that we get to precicely control all the finicky parts of the text file
+when we write/read it to a local file and then in the BCP utility. This lets us set library-wide
+defaults (maybe configurable in the future) and work with those.
+
+For now, we are using the non-XML BCP format file type. In the future, XML format files may be
+added.
+
+## Testing
+
+### Testing Requirements
+
+- Docker Desktop installed, either of the Linux or Windows runtimes, doesn't matter
+- pytest
+- hypothesis
+- pytest-cov (coverage.py)
+
+### What Is Tested?
+
+We take testing very seriously here. In order to rely on a library like this in production, it
+**MUST** be ruthlessly tested, which thankfully it is. Here is a partial list of what has been
+tested so far. Pull Requests welcome!
+
+- Data types: All ASCII characters 32-127 (using the Hypothesis library, see below). Unicode
+ characters beyond that range have not been tested.
+- `numpy.NaN`, `None`
+- `numpy.inf` (fails, as expected)
+- Empty dataframe (nothing happens, database not modified)
+- Duplicate column names (raises error)
+- Database columns that are missing from the dataframe, are out of order, or both (passes)
+- Extra dataframe columns that aren't in database, when `if_exists="append"` specified (fails)
+
+### Testing Implementation
+
+- Testing uses `pytest`.
+- To test for all possible data types, we use the `hypothesis` library, instead of trying to come
+ up with every single case on our own.
+- `pytest-cov` (which uses `coverage.py` under the hood) is used to measure code coverage. This is
+ then uploaded to [codecov.io](https://codecov.io/gh/yehoshuadimarsky/bcpandas) as part of the
+ CI/CD process (see below).
+- In order to spin up a local SQL Server during testing, we use Docker. Specifically, we run one of
+ the images that Microsoft provides that already have SQL Server fully installed, all we have to
+ do is use the image to run a container. Here are the links to the
+ [Linux versions](https://hub.docker.com/_/microsoft-mssql-server) and the Windows versions -
+ [Express](https://hub.docker.com/r/microsoft/mssql-server-windows-express/) and
+ [Developer](https://hub.docker.com/r/microsoft/mssql-server-windows-developer).
+ - When running the tests, we can specify a specific Docker image to use, by invoking the custom
+ command line option called `--mssql-docker-image`. For example:
+ ```bash
+ pytest bcpandas/tests --mssql-docker-image mcr.microsoft.com/mssql/server:2019-latest
+ ```
+- Instead of using the `subprocess` library to control Docker manually, we use the elegant
+ `docker-py` library which works very nicely. A `DockerDB` Python class is defined in
+ `bcpandas/tests/utils.py` and it wraps up all the Docker commands and functionality needed to use
+ SQL Server into one class. This class is used in `conftest.py` in the core bcpandas tests, and in
+ the `benchmarks/` directory for both the benchmarks code as well as the legacy tests for
+ `read_sql`.
+
+## CI/CD
+
+Github Actions is used for CI/CD, although it is still somewhat a work in progress.
+
+## Contributing
+
+Please, all contributions are very welcome!
+
+I will attempt to use the `pandas` docstring style as detailed
+[here](https://pandas.pydata.org/pandas-docs/stable/development/contributing_docstring.html).
+
+## GitHub Stars Trend
+
+[![Stargazers over time](https://starchart.cc/yehoshuadimarsky/bcpandas.svg)](https://starchart.cc/yehoshuadimarsky/bcpandas)
+
+
+%prep
+%autosetup -n bcpandas-2.4.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-bcpandas -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 2.4.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..4636353
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+f06bcaf9931b2daeb898de70b2b9f7b8 bcpandas-2.4.0.tar.gz