-rw-r--r--  .gitignore             |   1
-rw-r--r--  python-pd-helper.spec  | 385
-rw-r--r--  sources                |   1
3 files changed, 387 insertions, 0 deletions
@@ -0,0 +1 @@
+/pd_helper-1.0.0.tar.gz
diff --git a/python-pd-helper.spec b/python-pd-helper.spec
new file mode 100644
index 0000000..a24de3d
--- /dev/null
+++ b/python-pd-helper.spec
@@ -0,0 +1,385 @@
+%global _empty_manifest_terminate_build 0
+Name: python-pd-helper
+Version: 1.0.0
+Release: 1
+Summary: A helpful script to optimize a Pandas DataFrame.
+License: MIT License
+URL: https://github.com/justinhchae/pd-helper
+Source0: https://mirrors.aliyun.com/pypi/web/packages/91/90/e3db69d9c398cecc805a93885b8494974a7f1f579a5a62340148379be1d5/pd_helper-1.0.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-pandas
+Requires: python3-numpy
+Requires: python3-tqdm
+Requires: python3-shortuuid
+
+%description
+# pd-helper
+
+ A helpful package to streamline Pandas DataFrame optimization.
+
+ Running the optimizer can reduce DataFrame memory usage by 50-75%.
+
+ Automatically assign an appropriate dtype to each column with **helper**.
+
+ Generate a random DataFrame of controlled random variables for testing with **maker**.
+
+## Install
+```bash
+pip install pd-helper
+```
+
+## Basic Usage to Iterate over DataFrame
+```python
+from pd_helper.maker import MakeData
+from pd_helper.helper import optimize
+faker = MakeData()
+
+if __name__ == "__main__":
+    # MakeData() generates a fake dataframe, convenient for testing
+    df = faker.make_df()
+    df = optimize(df)
+```
+## Better Usage With Multiprocessing
+```python
+from pd_helper.maker import MakeData
+from pd_helper.helper import optimize
+faker = MakeData()
+
+if __name__ == "__main__":
+    # MakeData() generates a fake dataframe, convenient for testing
+    df = faker.make_df()
+    df = optimize(df, enable_mp=True)
+```
+
+## Specify Special Mappings
+```python
+from pd_helper.maker import MakeData
+from pd_helper.helper import optimize
+faker = MakeData()
+
+if __name__ == "__main__":
+    # MakeData() generates a fake dataframe, convenient for testing
+    df = faker.make_df()
+    special_mappings = {'string': ['object_id'],
+                        'category': ['item_name']}
+
+    # special_mappings override the optimize() ruleset; listed columns are returned with the mapped dtypes.
+    df = optimize(df,
+                  enable_mp=True,
+                  special_mappings=special_mappings
+                  )
+```
+
+
+## Sample Results with Helper
+
+```text
+Starting with 175.63 MB memory.
+
+After optmization.
+
+Ending with 65.33 MB memory.
+```
+
+## Generating a Randomly Imperfect DataFrame with Maker
+
+ Maker provides a class, MakeData(), to generate a table of made-up records.
+
+ Each row is an event where an item was retrieved.
+
+ It offers options to make the table imperfectly random in various ways.
+
+ Sample table below:
+
+| | Retrieved Date | Item Name | Retrieved | Condition | Sector |
+| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
+| Example | 2019-01-01, 2019-03-04 | Toaster, Lighter | True, False | Junk, Excellent | 1, 2 |
+| Data Type | String | String | String | String | Integer |
+
+
+## References
+
+* Pandas Categorical: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html>
+
+* Pandas Pickle: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html>
+
+* Pandas CSV: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html>
+
+* Pandas Datetime: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html>
+
+### TODO
+
+* Improve efficiency of iterating on DataFrame.
+
+* Allow user to toggle logging.
+
+* Provide tools for imputing missing data.
+
+
+
+
+%package -n python3-pd-helper
+Summary: A helpful script to optimize a Pandas DataFrame.
+Provides: python-pd-helper
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-pd-helper
+# pd-helper
+
+ A helpful package to streamline Pandas DataFrame optimization.
+
+ Running the optimizer can reduce DataFrame memory usage by 50-75%.
+
+ Automatically assign an appropriate dtype to each column with **helper**.
+
+ Generate a random DataFrame of controlled random variables for testing with **maker**.
+
+## Install
+```bash
+pip install pd-helper
+```
+
+## Basic Usage to Iterate over DataFrame
+```python
+from pd_helper.maker import MakeData
+from pd_helper.helper import optimize
+faker = MakeData()
+
+if __name__ == "__main__":
+    # MakeData() generates a fake dataframe, convenient for testing
+    df = faker.make_df()
+    df = optimize(df)
+```
+## Better Usage With Multiprocessing
+```python
+from pd_helper.maker import MakeData
+from pd_helper.helper import optimize
+faker = MakeData()
+
+if __name__ == "__main__":
+    # MakeData() generates a fake dataframe, convenient for testing
+    df = faker.make_df()
+    df = optimize(df, enable_mp=True)
+```
+
+## Specify Special Mappings
+```python
+from pd_helper.maker import MakeData
+from pd_helper.helper import optimize
+faker = MakeData()
+
+if __name__ == "__main__":
+    # MakeData() generates a fake dataframe, convenient for testing
+    df = faker.make_df()
+    special_mappings = {'string': ['object_id'],
+                        'category': ['item_name']}
+
+    # special_mappings override the optimize() ruleset; listed columns are returned with the mapped dtypes.
+    df = optimize(df,
+                  enable_mp=True,
+                  special_mappings=special_mappings
+                  )
+```
+
+
+## Sample Results with Helper
+
+```text
+Starting with 175.63 MB memory.
+
+After optmization.
+
+Ending with 65.33 MB memory.
+```
+
+## Generating a Randomly Imperfect DataFrame with Maker
+
+ Maker provides a class, MakeData(), to generate a table of made-up records.
+
+ Each row is an event where an item was retrieved.
+
+ It offers options to make the table imperfectly random in various ways.
+
+ Sample table below:
+
+| | Retrieved Date | Item Name | Retrieved | Condition | Sector |
+| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
+| Example | 2019-01-01, 2019-03-04 | Toaster, Lighter | True, False | Junk, Excellent | 1, 2 |
+| Data Type | String | String | String | String | Integer |
+
+
+## References
+
+* Pandas Categorical: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html>
+
+* Pandas Pickle: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html>
+
+* Pandas CSV: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html>
+
+* Pandas Datetime: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html>
+
+### TODO
+
+* Improve efficiency of iterating on DataFrame.
+
+* Allow user to toggle logging.
+
+* Provide tools for imputing missing data.
+
+
+
+
+%package help
+Summary: Development documents and examples for pd-helper
+Provides: python3-pd-helper-doc
+%description help
+# pd-helper
+
+ A helpful package to streamline Pandas DataFrame optimization.
+
+ Running the optimizer can reduce DataFrame memory usage by 50-75%.
+
+ Automatically assign an appropriate dtype to each column with **helper**.
+
+ Generate a random DataFrame of controlled random variables for testing with **maker**.
+
+## Install
+```bash
+pip install pd-helper
+```
+
+## Basic Usage to Iterate over DataFrame
+```python
+from pd_helper.maker import MakeData
+from pd_helper.helper import optimize
+faker = MakeData()
+
+if __name__ == "__main__":
+    # MakeData() generates a fake dataframe, convenient for testing
+    df = faker.make_df()
+    df = optimize(df)
+```
+## Better Usage With Multiprocessing
+```python
+from pd_helper.maker import MakeData
+from pd_helper.helper import optimize
+faker = MakeData()
+
+if __name__ == "__main__":
+    # MakeData() generates a fake dataframe, convenient for testing
+    df = faker.make_df()
+    df = optimize(df, enable_mp=True)
+```
+
+## Specify Special Mappings
+```python
+from pd_helper.maker import MakeData
+from pd_helper.helper import optimize
+faker = MakeData()
+
+if __name__ == "__main__":
+    # MakeData() generates a fake dataframe, convenient for testing
+    df = faker.make_df()
+    special_mappings = {'string': ['object_id'],
+                        'category': ['item_name']}
+
+    # special_mappings override the optimize() ruleset; listed columns are returned with the mapped dtypes.
+    df = optimize(df,
+                  enable_mp=True,
+                  special_mappings=special_mappings
+                  )
+```
+
+
+## Sample Results with Helper
+
+```text
+Starting with 175.63 MB memory.
+
+After optmization.
+
+Ending with 65.33 MB memory.
+```
+
+## Generating a Randomly Imperfect DataFrame with Maker
+
+ Maker provides a class, MakeData(), to generate a table of made-up records.
+
+ Each row is an event where an item was retrieved.
+
+ It offers options to make the table imperfectly random in various ways.
+
+ Sample table below:
+
+| | Retrieved Date | Item Name | Retrieved | Condition | Sector |
+| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
+| Example | 2019-01-01, 2019-03-04 | Toaster, Lighter | True, False | Junk, Excellent | 1, 2 |
+| Data Type | String | String | String | String | Integer |
+
+
+## References
+
+* Pandas Categorical: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html>
+
+* Pandas Pickle: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html>
+
+* Pandas CSV: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html>
+
+* Pandas Datetime: <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html>
+
+### TODO
+
+* Improve efficiency of iterating on DataFrame.
+
+* Allow user to toggle logging.
+
+* Provide tools for imputing missing data.
+
+
+
+
+%prep
+%autosetup -n pd_helper-1.0.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pd-helper -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.0-1
+- Package Spec generated
@@ -0,0 +1 @@
+94d0e1ee5ebbcec038bfd5adfc91ec97 pd_helper-1.0.0.tar.gz
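The packaged README says **helper** shrinks memory by picking tighter dtypes per column, with `special_mappings` overriding its rules for named columns. pd_helper's internals are not part of this diff, so the following is only a minimal sketch of that general technique in plain pandas, assuming two common rules: downcast integer and float columns with `pd.to_numeric(..., downcast=...)`, and convert string columns to `category` when they have few distinct values. The names `shrink_dtypes`, `memory_mb`, and `max_category_ratio` (and its 0.5 default) are hypothetical illustrations, not pd_helper's API.

```python
from typing import Dict, List, Optional

import numpy as np
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total footprint in MB; deep=True also counts Python string objects."""
    return float(df.memory_usage(deep=True).sum()) / 1024 ** 2


def shrink_dtypes(
    df: pd.DataFrame,
    special_mappings: Optional[Dict[str, List[str]]] = None,
    max_category_ratio: float = 0.5,
) -> pd.DataFrame:
    """Hypothetical dtype optimizer; NOT pd_helper's actual implementation.

    special_mappings mirrors the README's {dtype: [column, ...]} shape:
    listed columns are cast directly and skip the rules below.
    """
    out = df.copy()
    forced = {col: dtype
              for dtype, cols in (special_mappings or {}).items()
              for col in cols}

    for col in out.columns:
        series = out[col]
        if col in forced:
            out[col] = series.astype(forced[col])
        elif pd.api.types.is_bool_dtype(series):
            continue  # bool already uses 1 byte per value
        elif pd.api.types.is_integer_dtype(series):
            # Smallest signed integer type that holds every value (e.g. int8).
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # float32 when pandas judges the cast close enough; else float64 stays.
            out[col] = pd.to_numeric(series, downcast="float")
        elif series.dtype == object or isinstance(series.dtype, pd.StringDtype):
            # Assumed rule: few distinct strings -> store int codes + categories once.
            if series.nunique() / max(len(series), 1) <= max_category_ratio:
                out[col] = series.astype("category")
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100_000
    df = pd.DataFrame({
        "sector": rng.integers(1, 10, size=n),
        "score": rng.random(n),
        "condition": rng.choice(["Junk", "Fair", "Excellent"], size=n),
        "object_id": [f"id-{i}" for i in range(n)],
    })
    small = shrink_dtypes(df, special_mappings={"string": ["object_id"]})
    print(small.dtypes)
    print(f"{memory_mb(df):.2f} MB -> {memory_mb(small):.2f} MB")
```

Measuring with `memory_usage(deep=True)` matters here: without `deep=True`, pandas counts only the 8-byte pointers of object columns, which hides most of the savings from converting repetitive strings to `category`.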