-rw-r--r-- .gitignore              |   1
-rw-r--r-- python-csv-dataset.spec | 632
-rw-r--r-- sources                 |   1
3 files changed, 634 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+/csv-dataset-3.5.0.tar.gz
diff --git a/python-csv-dataset.spec b/python-csv-dataset.spec
new file mode 100644
index 0000000..0b1716d
--- /dev/null
+++ b/python-csv-dataset.spec
@@ -0,0 +1,632 @@
+%global _empty_manifest_terminate_build 0
+Name: python-csv-dataset
+Version: 3.5.0
+Release: 1
+Summary: csv-dataset helps to read csv files and create descriptive and efficient input pipelines for deep learning in a streaming fashion
+License: MIT
+URL: https://github.com/kaelzhang/python-csv-dataset
+Source0: https://mirrors.aliyun.com/pypi/web/packages/e9/14/d504b2a84cb0ebcec98b5fdbefa59dfcd2e4cd5526f1bb81ee53fecb294e/csv-dataset-3.5.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-common-decorators
+
+%description
+# csv-dataset
+
+`CsvDataset` helps to read a csv file and create descriptive and efficient input pipelines for deep learning.
+
+`CsvDataset` iterates the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.
+
+## Install
+
+```sh
+$ pip install csv-dataset
+```
+
+## Usage
+
+Suppose we have a csv file whose absolute path is `filepath`:
+
+```csv
+open_time,open,high,low,close,volume
+1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
+1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
+1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
+1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
+1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
+1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
+...
+```
+
+```py
+from csv_dataset import (
+    Dataset,
+    CsvReader
+)
+
+dataset = Dataset(
+    CsvReader(
+        filepath,
+        float,
+        # Skip the first column (open_time) and pick the next five
+        indexes=[1, 2, 3, 4, 5],
+        header=True
+    )
+).window(3, 1).batch(2)
+
+for element in dataset:
+    print(element)
+```
+
+The following shows the output of one iteration:
+
+```sh
+[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
+  [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]
+
+ [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
+  [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
+
+...
+```
+
+### Dataset(reader: AbstractReader)
+
+#### dataset.window(size: int, shift: int = None, stride: int = 1) -> self
+
+Defines the window size, shift and stride.
+
+The default window size is `1`, which means the dataset is not windowed.
+
+**Parameter explanation**
+
+Suppose we have a raw data set
+
+```
+[ 1 2 3 4 5 6 7 8 9 ... ]
+```
+
+And the following shows the windows produced by `(size=4, shift=3, stride=2)`:
+
+```
+        |------------- size:4 --------------|
+        |- stride:2 -|                      |
+        |            |                      |
+win 0:  [ 1          3          5          7 ] --------|-----
+                                                    shift:3
+win 1:  [ 4          6          8         10 ] --------|-----
+
+win 2:  [ 7          9         11         13 ]
+
+...
+```
+
+#### dataset.batch(batch: int) -> self
+
+Defines the batch size.
+
+The default batch size of the dataset is `1`, which means each batch contains a single window.
+
+If batch is `2`:
+
+```
+batch 0: [[ 1  3  5  7 ]
+          [ 4  6  8 10 ]]
+
+batch 1: [[ 7  9 11 13 ]
+          [ 10 12 14 16 ]]
+
+...
+```
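+
+To make the window and batch arithmetic above concrete, here is a minimal, self-contained sketch (not part of this package) that reproduces the same layout with plain NumPy. The assumption that `shift` defaults to `size` when omitted is ours, purely for illustration:
+
+```py
+import numpy as np
+
+def windows(data, size, shift=None, stride=1):
+    # One window covers (size - 1) * stride + 1 consecutive elements,
+    # picking every `stride`-th one; consecutive windows start `shift`
+    # elements apart.
+    # Assumption (not specified above): shift defaults to size.
+    shift = size if shift is None else shift
+    span = (size - 1) * stride + 1
+    start = 0
+    while start + span <= len(data):
+        yield data[start:start + span:stride]
+        start += shift
+
+data = np.arange(1, 20)  # [ 1 2 3 ... 19 ]
+wins = list(windows(data, size=4, shift=3, stride=2))
+print(wins[0])  # [1 3 5 7]
+print(wins[1])  # [ 4  6  8 10]
+
+# Batching stacks consecutive windows, two per batch:
+batches = [np.stack(wins[i:i + 2]) for i in range(0, len(wins) - 1, 2)]
+print(batches[0])  # [[ 1  3  5  7]
+                   #  [ 4  6  8 10]]
+```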
+
+#### dataset.get() -> Optional[np.ndarray]
+
+Gets the data of the next batch, or `None` if there is no more data.
+
+#### dataset.reset() -> self
+
+Resets the dataset.
+
+#### dataset.read(amount: int, reset_buffer: bool = False)
+
+Reads multiple batches at a time.
+
+- **amount** the maximum length of data to read
+- **reset_buffer** if `True`, the dataset will discard the data of the previous window in the buffer
+
+If we `reset_buffer`, the next read will not use existing data in the buffer, and the result will have no overlap with the last read.
+
+#### dataset.reset_buffer() -> None
+
+Resets the buffer, so that the next read will have no overlap with the last one.
+
+#### dataset.lines_need(reads: int) -> int
+
+Calculates and returns how many lines of the underlying data are needed to read `reads` times.
+
+#### dataset.max_reads(max_lines: int) -> int | None
+
+Calculates how many reads `max_lines` lines can afford.
+
+#### dataset.max_reads() -> int | None
+
+Calculates how many reads the current reader can afford.
+
+If `max_lines` of the current reader is unset, it returns `None`.
+
+### CsvReader(filepath, dtype, indexes, **kwargs)
+
+- **filepath** `str` absolute path of the csv file
+- **dtype** `Callable` the data type used to convert each column; only `float` or `int` should be used
+- **indexes** `List[int]` column indexes to pick from the lines of the csv file
+- **kwargs**
+  - **header** `bool = False` whether the header line of the csv file should be skipped
+  - **splitter** `str = ','` the column splitter of the csv file
+  - **normalizer** `List[NormalizerProtocol]` a list of normalizers, one to normalize each column of data. A `NormalizerProtocol` must contain two methods: `normalize(float) -> float` to normalize the given datum, and `restore(float) -> float` to restore the normalized datum. See the sketch after this list.
+  - **max_lines** `int = -1` the maximum number of lines of the csv file to be read. Defaults to `-1`, which means no limit.
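+
+As an illustration, a hypothetical min-max normalizer satisfying `NormalizerProtocol` might look like this. It is a sketch; the class name and bounds are our own, not part of the package:
+
+```py
+class MinMaxNormalizer:
+    # A hypothetical NormalizerProtocol implementation (illustration
+    # only): maps values from [lo, hi] into [0, 1] and restores them.
+
+    def __init__(self, lo: float, hi: float):
+        self._lo = lo
+        self._hi = hi
+
+    def normalize(self, value: float) -> float:
+        return (value - self._lo) / (self._hi - self._lo)
+
+    def restore(self, value: float) -> float:
+        return value * (self._hi - self._lo) + self._lo
+```
+
+One such normalizer would be supplied per picked column, e.g. `normalizer=[MinMaxNormalizer(7000.0, 7200.0), ...]` for the five columns selected in the Usage example (the bounds here are made up).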
+
+#### reader.reset()
+
+Resets the reader position.
+
+#### property reader.max_lines
+
+Gets `max_lines`.
+
+#### setter reader.max_lines = lines
+
+Changes `max_lines`.
+
+#### reader.readline() -> list
+
+Returns the converted value of the next line.
+
+#### property reader.lines
+
+Returns the number of lines that have been read.
+
+## License
+
+[MIT](LICENSE)
+
+
+%package -n python3-csv-dataset
+Summary: csv-dataset helps to read csv files and create descriptive and efficient input pipelines for deep learning in a streaming fashion
+Provides: python-csv-dataset
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-csv-dataset
+# csv-dataset
+
+`CsvDataset` helps to read a csv file and create descriptive and efficient input pipelines for deep learning.
+
+`CsvDataset` iterates the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.
+
+## Install
+
+```sh
+$ pip install csv-dataset
+```
+
+## Usage
+
+Suppose we have a csv file whose absolute path is `filepath`:
+
+```csv
+open_time,open,high,low,close,volume
+1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
+1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
+1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
+1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
+1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
+1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
+...
+```
+
+```py
+from csv_dataset import (
+    Dataset,
+    CsvReader
+)
+
+dataset = Dataset(
+    CsvReader(
+        filepath,
+        float,
+        # Skip the first column (open_time) and pick the next five
+        indexes=[1, 2, 3, 4, 5],
+        header=True
+    )
+).window(3, 1).batch(2)
+
+for element in dataset:
+    print(element)
+```
+
+The following shows the output of one iteration:
+
+```sh
+[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
+  [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]
+
+ [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
+  [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
+
+...
+```
+
+### Dataset(reader: AbstractReader)
+
+#### dataset.window(size: int, shift: int = None, stride: int = 1) -> self
+
+Defines the window size, shift and stride.
+
+The default window size is `1`, which means the dataset is not windowed.
+
+**Parameter explanation**
+
+Suppose we have a raw data set
+
+```
+[ 1 2 3 4 5 6 7 8 9 ... ]
+```
+
+And the following shows the windows produced by `(size=4, shift=3, stride=2)`:
+
+```
+        |------------- size:4 --------------|
+        |- stride:2 -|                      |
+        |            |                      |
+win 0:  [ 1          3          5          7 ] --------|-----
+                                                    shift:3
+win 1:  [ 4          6          8         10 ] --------|-----
+
+win 2:  [ 7          9         11         13 ]
+
+...
+```
+
+#### dataset.batch(batch: int) -> self
+
+Defines the batch size.
+
+The default batch size of the dataset is `1`, which means each batch contains a single window.
+
+If batch is `2`:
+
+```
+batch 0: [[ 1  3  5  7 ]
+          [ 4  6  8 10 ]]
+
+batch 1: [[ 7  9 11 13 ]
+          [ 10 12 14 16 ]]
+
+...
+```
+
+#### dataset.get() -> Optional[np.ndarray]
+
+Gets the data of the next batch, or `None` if there is no more data.
+
+#### dataset.reset() -> self
+
+Resets the dataset.
+
+#### dataset.read(amount: int, reset_buffer: bool = False)
+
+Reads multiple batches at a time.
+
+- **amount** the maximum length of data to read
+- **reset_buffer** if `True`, the dataset will discard the data of the previous window in the buffer
+
+If we `reset_buffer`, the next read will not use existing data in the buffer, and the result will have no overlap with the last read.
+
+#### dataset.reset_buffer() -> None
+
+Resets the buffer, so that the next read will have no overlap with the last one.
+
+#### dataset.lines_need(reads: int) -> int
+
+Calculates and returns how many lines of the underlying data are needed to read `reads` times.
+
+#### dataset.max_reads(max_lines: int) -> int | None
+
+Calculates how many reads `max_lines` lines can afford.
+
+#### dataset.max_reads() -> int | None
+
+Calculates how many reads the current reader can afford.
+
+If `max_lines` of the current reader is unset, it returns `None`.
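+
+As a sanity check of the relationship between reads and lines, the window layout illustrated earlier suggests the following arithmetic. This is a sketch under our own assumption that one read consumes `batch` consecutive windows; it is not the package's implementation:
+
+```py
+def lines_need(reads, batch=2, size=3, shift=1, stride=1):
+    # Assumed model: the last of the reads * batch windows starts
+    # (n - 1) * shift lines in, and every window spans
+    # (size - 1) * stride + 1 lines.
+    n = reads * batch
+    return (n - 1) * shift + (size - 1) * stride + 1
+
+# With the Usage example's window(3, 1).batch(2), the first batch
+# holds windows over rows 1-3 and 2-4, so one read needs 4 csv rows:
+print(lines_need(1))  # 4
+```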
+
+### CsvReader(filepath, dtype, indexes, **kwargs)
+
+- **filepath** `str` absolute path of the csv file
+- **dtype** `Callable` the data type used to convert each column; only `float` or `int` should be used
+- **indexes** `List[int]` column indexes to pick from the lines of the csv file
+- **kwargs**
+  - **header** `bool = False` whether the header line of the csv file should be skipped
+  - **splitter** `str = ','` the column splitter of the csv file
+  - **normalizer** `List[NormalizerProtocol]` a list of normalizers, one to normalize each column of data. A `NormalizerProtocol` must contain two methods: `normalize(float) -> float` to normalize the given datum, and `restore(float) -> float` to restore the normalized datum.
+  - **max_lines** `int = -1` the maximum number of lines of the csv file to be read. Defaults to `-1`, which means no limit.
+
+#### reader.reset()
+
+Resets the reader position.
+
+#### property reader.max_lines
+
+Gets `max_lines`.
+
+#### setter reader.max_lines = lines
+
+Changes `max_lines`.
+
+#### reader.readline() -> list
+
+Returns the converted value of the next line.
+
+#### property reader.lines
+
+Returns the number of lines that have been read.
+
+## License
+
+[MIT](LICENSE)
+
+
+%package help
+Summary: Development documents and examples for csv-dataset
+Provides: python3-csv-dataset-doc
+%description help
+# csv-dataset
+
+`CsvDataset` helps to read a csv file and create descriptive and efficient input pipelines for deep learning.
+
+`CsvDataset` iterates the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.
+
+## Install
+
+```sh
+$ pip install csv-dataset
+```
+
+## Usage
+
+Suppose we have a csv file whose absolute path is `filepath`:
+
+```csv
+open_time,open,high,low,close,volume
+1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
+1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
+1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
+1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
+1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
+1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
+...
+```
+
+```py
+from csv_dataset import (
+    Dataset,
+    CsvReader
+)
+
+dataset = Dataset(
+    CsvReader(
+        filepath,
+        float,
+        # Skip the first column (open_time) and pick the next five
+        indexes=[1, 2, 3, 4, 5],
+        header=True
+    )
+).window(3, 1).batch(2)
+
+for element in dataset:
+    print(element)
+```
+
+The following shows the output of one iteration:
+
+```sh
+[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
+  [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]
+
+ [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
+  [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
+
+...
+```
+
+### Dataset(reader: AbstractReader)
+
+#### dataset.window(size: int, shift: int = None, stride: int = 1) -> self
+
+Defines the window size, shift and stride.
+
+The default window size is `1`, which means the dataset is not windowed.
+
+**Parameter explanation**
+
+Suppose we have a raw data set
+
+```
+[ 1 2 3 4 5 6 7 8 9 ... ]
+```
+
+And the following shows the windows produced by `(size=4, shift=3, stride=2)`:
+
+```
+        |------------- size:4 --------------|
+        |- stride:2 -|                      |
+        |            |                      |
+win 0:  [ 1          3          5          7 ] --------|-----
+                                                    shift:3
+win 1:  [ 4          6          8         10 ] --------|-----
+
+win 2:  [ 7          9         11         13 ]
+
+...
+```
+
+#### dataset.batch(batch: int) -> self
+
+Defines the batch size.
+
+The default batch size of the dataset is `1`, which means each batch contains a single window.
+
+If batch is `2`:
+
+```
+batch 0: [[ 1  3  5  7 ]
+          [ 4  6  8 10 ]]
+
+batch 1: [[ 7  9 11 13 ]
+          [ 10 12 14 16 ]]
+
+...
+```
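+
+The `for element in dataset` loop from the Usage example can also be driven manually with `dataset.get()` (documented below), which returns `None` once the data is exhausted. This is a sketch; we assume iteration is equivalent to calling `get()` until it yields `None`:
+
+```py
+dataset.reset()
+
+while True:
+    element = dataset.get()
+    if element is None:
+        break       # no more batches
+    print(element)  # one batch per call, same as the for-loop
+```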
+
+#### dataset.get() -> Optional[np.ndarray]
+
+Gets the data of the next batch, or `None` if there is no more data.
+
+#### dataset.reset() -> self
+
+Resets the dataset.
+
+#### dataset.read(amount: int, reset_buffer: bool = False)
+
+Reads multiple batches at a time.
+
+- **amount** the maximum length of data to read
+- **reset_buffer** if `True`, the dataset will discard the data of the previous window in the buffer
+
+If we `reset_buffer`, the next read will not use existing data in the buffer, and the result will have no overlap with the last read.
+
+#### dataset.reset_buffer() -> None
+
+Resets the buffer, so that the next read will have no overlap with the last one.
+
+#### dataset.lines_need(reads: int) -> int
+
+Calculates and returns how many lines of the underlying data are needed to read `reads` times.
+
+#### dataset.max_reads(max_lines: int) -> int | None
+
+Calculates how many reads `max_lines` lines can afford.
+
+#### dataset.max_reads() -> int | None
+
+Calculates how many reads the current reader can afford.
+
+If `max_lines` of the current reader is unset, it returns `None`.
+
+### CsvReader(filepath, dtype, indexes, **kwargs)
+
+- **filepath** `str` absolute path of the csv file
+- **dtype** `Callable` the data type used to convert each column; only `float` or `int` should be used
+- **indexes** `List[int]` column indexes to pick from the lines of the csv file
+- **kwargs**
+  - **header** `bool = False` whether the header line of the csv file should be skipped
+  - **splitter** `str = ','` the column splitter of the csv file
+  - **normalizer** `List[NormalizerProtocol]` a list of normalizers, one to normalize each column of data. A `NormalizerProtocol` must contain two methods: `normalize(float) -> float` to normalize the given datum, and `restore(float) -> float` to restore the normalized datum.
+  - **max_lines** `int = -1` the maximum number of lines of the csv file to be read. Defaults to `-1`, which means no limit.
+
+#### reader.reset()
+
+Resets the reader position.
+
+#### property reader.max_lines
+
+Gets `max_lines`.
+
+#### setter reader.max_lines = lines
+
+Changes `max_lines`.
+
+#### reader.readline() -> list
+
+Returns the converted value of the next line.
+
+#### property reader.lines
+
+Returns the number of lines that have been read.
+
+## License
+
+[MIT](LICENSE)
+
+
+%prep
+%autosetup -n csv-dataset-3.5.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-csv-dataset -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 3.5.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+1a491fcbb5bc59bd8848f805d03a0ff9 csv-dataset-3.5.0.tar.gz