author CoprDistGit <infra@openeuler.org> 2023-06-20 04:21:03 +0000
committer CoprDistGit <infra@openeuler.org> 2023-06-20 04:21:03 +0000
commit 45bba2334bf0f0286d83a562859d01d61b3953ee (patch)
tree 91473c3edaedd69eea1d84a6a8d57e8f59d89661 /python-csv-dataset.spec
parent 6028109393276b402fa3fdab4d375688d2d2d386 (diff)
automatic import of python-csv-dataset (openeuler20.03)
Diffstat (limited to 'python-csv-dataset.spec')
-rw-r--r-- python-csv-dataset.spec 632
1 file changed, 632 insertions, 0 deletions
diff --git a/python-csv-dataset.spec b/python-csv-dataset.spec
new file mode 100644
index 0000000..0b1716d
--- /dev/null
+++ b/python-csv-dataset.spec
@@ -0,0 +1,632 @@
+%global _empty_manifest_terminate_build 0
+Name: python-csv-dataset
+Version: 3.5.0
+Release: 1
+Summary: csv-dataset helps to read csv files and create descriptive and efficient input pipelines for deep learning in a streaming fashion
+License: MIT
+URL: https://github.com/kaelzhang/python-csv-dataset
+Source0: https://mirrors.aliyun.com/pypi/web/packages/e9/14/d504b2a84cb0ebcec98b5fdbefa59dfcd2e4cd5526f1bb81ee53fecb294e/csv-dataset-3.5.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-common-decorators
+
+%description
+[![](https://travis-ci.org/kaelzhang/python-csv-dataset.svg?branch=master)](https://travis-ci.org/kaelzhang/python-csv-dataset)
+[![](https://codecov.io/gh/kaelzhang/python-csv-dataset/branch/master/graph/badge.svg)](https://codecov.io/gh/kaelzhang/python-csv-dataset)
+[![](https://img.shields.io/pypi/v/csv-dataset.svg)](https://pypi.org/project/csv-dataset/)
+[![](https://img.shields.io/pypi/l/csv-dataset.svg)](https://github.com/kaelzhang/python-csv-dataset)
+
+# csv-dataset
+
+`CsvDataset` helps to read a csv file and create descriptive and efficient input pipelines for deep learning.
+
+`CsvDataset` iterates over the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.
+
+## Install
+
+```sh
+$ pip install csv-dataset
+```
+
+## Usage
+
+Suppose we have a csv file whose absolute path is `filepath`:
+
+```csv
+open_time,open,high,low,close,volume
+1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
+1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
+1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
+1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
+1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
+1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
+...
+```
+
+```py
+from csv_dataset import (
+ Dataset,
+ CsvReader
+)
+
+dataset = Dataset(
+ CsvReader(
+ filepath,
+ float,
+        # Skip the first column (open_time) and pick only the following columns
+ indexes=[1, 2, 3, 4, 5],
+ header=True
+ )
+).window(3, 1).batch(2)
+
+for element in dataset:
+ print(element)
+```
+
+The following shows the output of a single iteration:
+
+```sh
+[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
+ [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+ [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]
+
+ [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+ [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
+ [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
+
+...
+```
+
+### Dataset(reader: AbstractReader)
+
+#### dataset.window(size: int, shift: int = None, stride: int = 1) -> self
+
+Defines the window size, shift and stride.
+
+The default window size is `1`, which means the dataset is not windowed.
+
+**Parameter explanation**
+
+Suppose we have a raw data set
+
+```
+[ 1 2 3 4 5 6 7 8 9 ... ]
+```
+
+And the following shows the windows produced by `(size=4, shift=3, stride=2)`:
+
+```
+ |-------------- size:4 --------------|
+ |- stride:2 -| |
+ | | |
+win 0: [ 1 3 5 7 ] --------|-----
+ shift:3
+win 1: [ 4 6 8 10 ] --------|-----
+
+win 2: [ 7 9 11 13 ]
+
+...
+```
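+
+To make these parameters concrete, below is a minimal pure-Python sketch (an illustration, not the library's implementation; it assumes `shift` defaults to `size` when unset) that reproduces the windows above:
+
+```py
+def windows(data, size, shift=None, stride=1):
+    # `stride` is the gap between elements inside a window;
+    # `shift` is the gap between the starts of consecutive windows.
+    shift = size if shift is None else shift  # assumed default
+    span = (size - 1) * stride + 1  # raw elements one window covers
+    start = 0
+    while start + span <= len(data):
+        yield data[start:start + span:stride]
+        start += shift
+
+data = list(range(1, 14))  # [1, 2, ..., 13]
+for win in windows(data, size=4, shift=3, stride=2):
+    print(win)
+# [1, 3, 5, 7]
+# [4, 6, 8, 10]
+# [7, 9, 11, 13]
+```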
+
+#### dataset.batch(batch: int) -> self
+
+Defines the batch size.
+
+The default batch size of the dataset is `1`, which means each batch contains a single window.
+
+If the batch size is `2`:
+
+```
+batch 0: [[ 1 3 5 7 ]
+ [ 4 6 8 10 ]]
+
+batch 1: [[ 7 9 11 13 ]
+ [ 10 12 14 16 ]]
+
+...
+```
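+
+Batching then simply groups consecutive windows. Continuing the sketch above (again an illustration; whether an incomplete trailing batch is dropped is an assumption):
+
+```py
+from itertools import islice
+
+def batches(wins, batch):
+    # Group consecutive windows into lists of `batch` windows.
+    it = iter(wins)
+    while True:
+        group = list(islice(it, batch))
+        if len(group) < batch:
+            break  # drop an incomplete trailing batch (assumed)
+        yield group
+
+data = list(range(1, 17))  # [1, 2, ..., 16]
+for b in batches(windows(data, size=4, shift=3, stride=2), batch=2):
+    print(b)
+# [[1, 3, 5, 7], [4, 6, 8, 10]]
+# [[7, 9, 11, 13], [10, 12, 14, 16]]
+```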
+
+#### dataset.get() -> Optional[np.ndarray]
+
+Gets the data of the next batch.
+
+#### dataset.reset() -> self
+
+Resets the dataset.
+
+#### dataset.read(amount: int, reset_buffer: bool = False)
+
+Reads multiple batches at a time.
+
+- **amount** the maximum length of data the dataset will read
+- **reset_buffer** if `True`, the dataset discards the data of the previous window held in the buffer
+
+If `reset_buffer` is `True`, the next read will not reuse existing data in the buffer, so the result will have no overlap with the previous read.
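+
+For example, assuming the `dataset` built in the Usage section above, a hedged sketch:
+
+```py
+# Read at most 10 batches' worth of data in a single call
+# (interpreting `amount` as documented above).
+chunk = dataset.read(10)
+
+# The next read should have no overlap with the previous one:
+next_chunk = dataset.read(10, reset_buffer=True)
+```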
+
+#### dataset.reset_buffer() -> None
+
+Resets the buffer so that the next read has no overlap with the previous one.
+
+#### dataset.lines_need(reads: int) -> int
+
+Calculates and returns how many lines of the underlying data are needed to read `reads` times.
+
+#### dataset.max_reads(max_lines: int) -> int | None
+
+Calculates how many reads `max_lines` lines can support.
+
+#### dataset.max_reads() -> int | None
+
+Calculates how many reads the current reader can support.
+
+If `max_lines` of the current reader is unset, it returns `None`.
+
+### CsvReader(filepath, dtype, indexes, **kwargs)
+
+- **filepath** `str` absolute path of the csv file
+- **dtype** `Callable` data type converter; only `float` or `int` should be used for this argument.
+- **indexes** `List[int]` column indexes to pick from the lines of the csv file
+- **kwargs**
+  - **header** `bool = False` whether the reader should skip the header line
+  - **splitter** `str = ','` the column separator of the csv file
+  - **normalizer** `List[NormalizerProtocol]` list of normalizers, one for each column of data. A `NormalizerProtocol` must contain two methods: `normalize(float) -> float` to normalize the given datum, and `restore(float) -> float` to restore the normalized datum. See the sketch below.
+  - **max_lines** `int = -1` max lines of the csv file to be read; `-1` means no limit
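+
+As an illustration, a minimal normalizer satisfying `NormalizerProtocol` might look like the following (a hypothetical `MinMaxNormalizer`, not shipped with the library):
+
+```py
+class MinMaxNormalizer:
+    """Scale a column from [lower, upper] to [0, 1] and back."""
+
+    def __init__(self, lower: float, upper: float):
+        self._lower = lower
+        self._span = upper - lower
+
+    def normalize(self, datum: float) -> float:
+        # Map a raw value into [0, 1]
+        return (datum - self._lower) / self._span
+
+    def restore(self, datum: float) -> float:
+        # Invert `normalize`
+        return datum * self._span + self._lower
+```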
+
+#### reader.reset()
+
+Resets the reader position.
+
+#### property reader.max_lines
+
+Gets `max_lines`
+
+#### setter reader.max_lines = lines
+
+Changes `max_lines`
+
+#### reader.readline() -> list
+
+Returns the converted value of the next line
+
+#### property reader.lines
+
+Returns the number of lines that have been read.
+
+## License
+
+[MIT](LICENSE)
+
+
+
+
+%package -n python3-csv-dataset
+Summary: csv-dataset helps to read csv files and create descriptive and efficient input pipelines for deep learning in a streaming fashion
+Provides: python-csv-dataset
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-csv-dataset
+[![](https://travis-ci.org/kaelzhang/python-csv-dataset.svg?branch=master)](https://travis-ci.org/kaelzhang/python-csv-dataset)
+[![](https://codecov.io/gh/kaelzhang/python-csv-dataset/branch/master/graph/badge.svg)](https://codecov.io/gh/kaelzhang/python-csv-dataset)
+[![](https://img.shields.io/pypi/v/csv-dataset.svg)](https://pypi.org/project/csv-dataset/)
+[![](https://img.shields.io/pypi/l/csv-dataset.svg)](https://github.com/kaelzhang/python-csv-dataset)
+
+# csv-dataset
+
+`CsvDataset` helps to read a csv file and create descriptive and efficient input pipelines for deep learning.
+
+`CsvDataset` iterates over the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.
+
+## Install
+
+```sh
+$ pip install csv-dataset
+```
+
+## Usage
+
+Suppose we have a csv file whose absolute path is `filepath`:
+
+```csv
+open_time,open,high,low,close,volume
+1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
+1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
+1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
+1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
+1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
+1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
+...
+```
+
+```py
+from csv_dataset import (
+ Dataset,
+ CsvReader
+)
+
+dataset = Dataset(
+ CsvReader(
+ filepath,
+ float,
+        # Skip the first column (open_time) and pick only the following columns
+ indexes=[1, 2, 3, 4, 5],
+ header=True
+ )
+).window(3, 1).batch(2)
+
+for element in dataset:
+ print(element)
+```
+
+The following shows the output of a single iteration:
+
+```sh
+[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
+ [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+ [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]
+
+ [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+ [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
+ [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
+
+...
+```
+
+### Dataset(reader: AbstractReader)
+
+#### dataset.window(size: int, shift: int = None, stride: int = 1) -> self
+
+Defines the window size, shift and stride.
+
+The default window size is `1`, which means the dataset is not windowed.
+
+**Parameter explanation**
+
+Suppose we have a raw data set
+
+```
+[ 1 2 3 4 5 6 7 8 9 ... ]
+```
+
+And the following shows the windows produced by `(size=4, shift=3, stride=2)`:
+
+```
+ |-------------- size:4 --------------|
+ |- stride:2 -| |
+ | | |
+win 0: [ 1 3 5 7 ] --------|-----
+ shift:3
+win 1: [ 4 6 8 10 ] --------|-----
+
+win 2: [ 7 9 11 13 ]
+
+...
+```
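+
+To make these parameters concrete, below is a minimal pure-Python sketch (an illustration, not the library's implementation; it assumes `shift` defaults to `size` when unset) that reproduces the windows above:
+
+```py
+def windows(data, size, shift=None, stride=1):
+    # `stride` is the gap between elements inside a window;
+    # `shift` is the gap between the starts of consecutive windows.
+    shift = size if shift is None else shift  # assumed default
+    span = (size - 1) * stride + 1  # raw elements one window covers
+    start = 0
+    while start + span <= len(data):
+        yield data[start:start + span:stride]
+        start += shift
+
+data = list(range(1, 14))  # [1, 2, ..., 13]
+for win in windows(data, size=4, shift=3, stride=2):
+    print(win)
+# [1, 3, 5, 7]
+# [4, 6, 8, 10]
+# [7, 9, 11, 13]
+```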
+
+#### dataset.batch(batch: int) -> self
+
+Defines the batch size.
+
+The default batch size of the dataset is `1`, which means each batch contains a single window.
+
+If the batch size is `2`:
+
+```
+batch 0: [[ 1 3 5 7 ]
+ [ 4 6 8 10 ]]
+
+batch 1: [[ 7 9 11 13 ]
+ [ 10 12 14 16 ]]
+
+...
+```
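+
+Batching then simply groups consecutive windows. Continuing the sketch above (again an illustration; whether an incomplete trailing batch is dropped is an assumption):
+
+```py
+from itertools import islice
+
+def batches(wins, batch):
+    # Group consecutive windows into lists of `batch` windows.
+    it = iter(wins)
+    while True:
+        group = list(islice(it, batch))
+        if len(group) < batch:
+            break  # drop an incomplete trailing batch (assumed)
+        yield group
+
+data = list(range(1, 17))  # [1, 2, ..., 16]
+for b in batches(windows(data, size=4, shift=3, stride=2), batch=2):
+    print(b)
+# [[1, 3, 5, 7], [4, 6, 8, 10]]
+# [[7, 9, 11, 13], [10, 12, 14, 16]]
+```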
+
+#### dataset.get() -> Optional[np.ndarray]
+
+Gets the data of the next batch.
+
+#### dataset.reset() -> self
+
+Resets the dataset.
+
+#### dataset.read(amount: int, reset_buffer: bool = False)
+
+Reads multiple batches at a time.
+
+- **amount** the maximum length of data the dataset will read
+- **reset_buffer** if `True`, the dataset discards the data of the previous window held in the buffer
+
+If `reset_buffer` is `True`, the next read will not reuse existing data in the buffer, so the result will have no overlap with the previous read.
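+
+For example, assuming the `dataset` built in the Usage section above, a hedged sketch:
+
+```py
+# Read at most 10 batches' worth of data in a single call
+# (interpreting `amount` as documented above).
+chunk = dataset.read(10)
+
+# The next read should have no overlap with the previous one:
+next_chunk = dataset.read(10, reset_buffer=True)
+```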
+
+#### dataset.reset_buffer() -> None
+
+Resets the buffer so that the next read has no overlap with the previous one.
+
+#### dataset.lines_need(reads: int) -> int
+
+Calculates and returns how many lines of the underlying data are needed to read `reads` times.
+
+#### dataset.max_reads(max_lines: int) -> int | None
+
+Calculates how many reads `max_lines` lines can support.
+
+#### dataset.max_reads() -> int | None
+
+Calculates how many reads the current reader can support.
+
+If `max_lines` of the current reader is unset, it returns `None`.
+
+### CsvReader(filepath, dtype, indexes, **kwargs)
+
+- **filepath** `str` absolute path of the csv file
+- **dtype** `Callable` data type converter; only `float` or `int` should be used for this argument.
+- **indexes** `List[int]` column indexes to pick from the lines of the csv file
+- **kwargs**
+  - **header** `bool = False` whether the reader should skip the header line
+  - **splitter** `str = ','` the column separator of the csv file
+  - **normalizer** `List[NormalizerProtocol]` list of normalizers, one for each column of data. A `NormalizerProtocol` must contain two methods: `normalize(float) -> float` to normalize the given datum, and `restore(float) -> float` to restore the normalized datum. See the sketch below.
+  - **max_lines** `int = -1` max lines of the csv file to be read; `-1` means no limit
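+
+As an illustration, a minimal normalizer satisfying `NormalizerProtocol` might look like the following (a hypothetical `MinMaxNormalizer`, not shipped with the library):
+
+```py
+class MinMaxNormalizer:
+    """Scale a column from [lower, upper] to [0, 1] and back."""
+
+    def __init__(self, lower: float, upper: float):
+        self._lower = lower
+        self._span = upper - lower
+
+    def normalize(self, datum: float) -> float:
+        # Map a raw value into [0, 1]
+        return (datum - self._lower) / self._span
+
+    def restore(self, datum: float) -> float:
+        # Invert `normalize`
+        return datum * self._span + self._lower
+```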
+
+#### reader.reset()
+
+Resets the reader position.
+
+#### property reader.max_lines
+
+Gets `max_lines`
+
+#### setter reader.max_lines = lines
+
+Changes `max_lines`
+
+#### reader.readline() -> list
+
+Returns the converted value of the next line
+
+#### property reader.lines
+
+Returns the number of lines that have been read.
+
+## License
+
+[MIT](LICENSE)
+
+
+
+
+%package help
+Summary: Development documents and examples for csv-dataset
+Provides: python3-csv-dataset-doc
+%description help
+[![](https://travis-ci.org/kaelzhang/python-csv-dataset.svg?branch=master)](https://travis-ci.org/kaelzhang/python-csv-dataset)
+[![](https://codecov.io/gh/kaelzhang/python-csv-dataset/branch/master/graph/badge.svg)](https://codecov.io/gh/kaelzhang/python-csv-dataset)
+[![](https://img.shields.io/pypi/v/csv-dataset.svg)](https://pypi.org/project/csv-dataset/)
+[![](https://img.shields.io/pypi/l/csv-dataset.svg)](https://github.com/kaelzhang/python-csv-dataset)
+
+# csv-dataset
+
+`CsvDataset` helps to read a csv file and create descriptive and efficient input pipelines for deep learning.
+
+`CsvDataset` iterates over the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.
+
+## Install
+
+```sh
+$ pip install csv-dataset
+```
+
+## Usage
+
+Suppose we have a csv file whose absolute path is `filepath`:
+
+```csv
+open_time,open,high,low,close,volume
+1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
+1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
+1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
+1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
+1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
+1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
+...
+```
+
+```py
+from csv_dataset import (
+ Dataset,
+ CsvReader
+)
+
+dataset = Dataset(
+ CsvReader(
+ filepath,
+ float,
+        # Skip the first column (open_time) and pick only the following columns
+ indexes=[1, 2, 3, 4, 5],
+ header=True
+ )
+).window(3, 1).batch(2)
+
+for element in dataset:
+ print(element)
+```
+
+The following shows the output of a single iteration:
+
+```sh
+[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
+ [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+ [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]
+
+ [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
+ [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
+ [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
+
+...
+```
+
+### Dataset(reader: AbstractReader)
+
+#### dataset.window(size: int, shift: int = None, stride: int = 1) -> self
+
+Defines the window size, shift and stride.
+
+The default window size is `1`, which means the dataset is not windowed.
+
+**Parameter explanation**
+
+Suppose we have a raw data set
+
+```
+[ 1 2 3 4 5 6 7 8 9 ... ]
+```
+
+And the following shows the windows produced by `(size=4, shift=3, stride=2)`:
+
+```
+ |-------------- size:4 --------------|
+ |- stride:2 -| |
+ | | |
+win 0: [ 1 3 5 7 ] --------|-----
+ shift:3
+win 1: [ 4 6 8 10 ] --------|-----
+
+win 2: [ 7 9 11 13 ]
+
+...
+```
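+
+To make these parameters concrete, below is a minimal pure-Python sketch (an illustration, not the library's implementation; it assumes `shift` defaults to `size` when unset) that reproduces the windows above:
+
+```py
+def windows(data, size, shift=None, stride=1):
+    # `stride` is the gap between elements inside a window;
+    # `shift` is the gap between the starts of consecutive windows.
+    shift = size if shift is None else shift  # assumed default
+    span = (size - 1) * stride + 1  # raw elements one window covers
+    start = 0
+    while start + span <= len(data):
+        yield data[start:start + span:stride]
+        start += shift
+
+data = list(range(1, 14))  # [1, 2, ..., 13]
+for win in windows(data, size=4, shift=3, stride=2):
+    print(win)
+# [1, 3, 5, 7]
+# [4, 6, 8, 10]
+# [7, 9, 11, 13]
+```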
+
+#### dataset.batch(batch: int) -> self
+
+Defines the batch size.
+
+The default batch size of the dataset is `1`, which means each batch contains a single window.
+
+If the batch size is `2`:
+
+```
+batch 0: [[ 1 3 5 7 ]
+ [ 4 6 8 10 ]]
+
+batch 1: [[ 7 9 11 13 ]
+ [ 10 12 14 16 ]]
+
+...
+```
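+
+Batching then simply groups consecutive windows. Continuing the sketch above (again an illustration; whether an incomplete trailing batch is dropped is an assumption):
+
+```py
+from itertools import islice
+
+def batches(wins, batch):
+    # Group consecutive windows into lists of `batch` windows.
+    it = iter(wins)
+    while True:
+        group = list(islice(it, batch))
+        if len(group) < batch:
+            break  # drop an incomplete trailing batch (assumed)
+        yield group
+
+data = list(range(1, 17))  # [1, 2, ..., 16]
+for b in batches(windows(data, size=4, shift=3, stride=2), batch=2):
+    print(b)
+# [[1, 3, 5, 7], [4, 6, 8, 10]]
+# [[7, 9, 11, 13], [10, 12, 14, 16]]
+```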
+
+#### dataset.get() -> Optional[np.ndarray]
+
+Gets the data of the next batch.
+
+#### dataset.reset() -> self
+
+Resets the dataset.
+
+#### dataset.read(amount: int, reset_buffer: bool = False)
+
+Reads multiple batches at a time.
+
+- **amount** the maximum length of data the dataset will read
+- **reset_buffer** if `True`, the dataset discards the data of the previous window held in the buffer
+
+If `reset_buffer` is `True`, the next read will not reuse existing data in the buffer, so the result will have no overlap with the previous read.
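+
+For example, assuming the `dataset` built in the Usage section above, a hedged sketch:
+
+```py
+# Read at most 10 batches' worth of data in a single call
+# (interpreting `amount` as documented above).
+chunk = dataset.read(10)
+
+# The next read should have no overlap with the previous one:
+next_chunk = dataset.read(10, reset_buffer=True)
+```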
+
+#### dataset.reset_buffer() -> None
+
+Resets the buffer so that the next read has no overlap with the previous one.
+
+#### dataset.lines_need(reads: int) -> int
+
+Calculates and returns how many lines of the underlying data are needed to read `reads` times.
+
+#### dataset.max_reads(max_lines: int) -> int | None
+
+Calculates how many reads `max_lines` lines can support.
+
+#### dataset.max_reads() -> int | None
+
+Calculates how many reads the current reader can support.
+
+If `max_lines` of the current reader is unset, it returns `None`.
+
+### CsvReader(filepath, dtype, indexes, **kwargs)
+
+- **filepath** `str` absolute path of the csv file
+- **dtype** `Callable` data type converter; only `float` or `int` should be used for this argument.
+- **indexes** `List[int]` column indexes to pick from the lines of the csv file
+- **kwargs**
+  - **header** `bool = False` whether the reader should skip the header line
+  - **splitter** `str = ','` the column separator of the csv file
+  - **normalizer** `List[NormalizerProtocol]` list of normalizers, one for each column of data. A `NormalizerProtocol` must contain two methods: `normalize(float) -> float` to normalize the given datum, and `restore(float) -> float` to restore the normalized datum. See the sketch below.
+  - **max_lines** `int = -1` max lines of the csv file to be read; `-1` means no limit
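+
+As an illustration, a minimal normalizer satisfying `NormalizerProtocol` might look like the following (a hypothetical `MinMaxNormalizer`, not shipped with the library):
+
+```py
+class MinMaxNormalizer:
+    """Scale a column from [lower, upper] to [0, 1] and back."""
+
+    def __init__(self, lower: float, upper: float):
+        self._lower = lower
+        self._span = upper - lower
+
+    def normalize(self, datum: float) -> float:
+        # Map a raw value into [0, 1]
+        return (datum - self._lower) / self._span
+
+    def restore(self, datum: float) -> float:
+        # Invert `normalize`
+        return datum * self._span + self._lower
+```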
+
+#### reader.reset()
+
+Resets the reader position.
+
+#### property reader.max_lines
+
+Gets `max_lines`
+
+#### setter reader.max_lines = lines
+
+Changes `max_lines`
+
+#### reader.readline() -> list
+
+Returns the converted value of the next line
+
+#### property reader.lines
+
+Returns the number of lines that have been read.
+
+## License
+
+[MIT](LICENSE)
+
+
+
+
+%prep
+%autosetup -n csv-dataset-3.5.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-csv-dataset -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 3.5.0-1
+- Package Spec generated