%global _empty_manifest_terminate_build 0
Name:		python-csv-dataset
Version:	3.5.0
Release:	1
Summary:	csv-dataset helps to read csv files and create descriptive and efficient input pipelines for deep learning in a streaming fashion
License:	MIT
URL:		https://github.com/kaelzhang/python-csv-dataset
Source0:	https://mirrors.aliyun.com/pypi/web/packages/e9/14/d504b2a84cb0ebcec98b5fdbefa59dfcd2e4cd5526f1bb81ee53fecb294e/csv-dataset-3.5.0.tar.gz
BuildArch:	noarch

Requires:	python3-numpy
Requires:	python3-common-decorators

%description
[![](https://travis-ci.org/kaelzhang/python-csv-dataset.svg?branch=master)](https://travis-ci.org/kaelzhang/python-csv-dataset)
[![](https://codecov.io/gh/kaelzhang/python-csv-dataset/branch/master/graph/badge.svg)](https://codecov.io/gh/kaelzhang/python-csv-dataset)
[![](https://img.shields.io/pypi/v/csv-dataset.svg)](https://pypi.org/project/csv-dataset/)
[![](https://img.shields.io/pypi/l/csv-dataset.svg)](https://github.com/kaelzhang/python-csv-dataset)

# csv-dataset

`CsvDataset` helps to read a csv file and create descriptive and efficient input pipelines for deep learning.

`CsvDataset` iterates the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.

## Install

```sh
$ pip install csv-dataset
```

## Usage

Suppose we have a csv file whose absolute path is `filepath`:

```csv
open_time,open,high,low,close,volume
1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
...
```

```py
from csv_dataset import (
    Dataset,
    CsvReader
)

dataset = Dataset(
    CsvReader(
        filepath,
        float,
        # Abandon the first column and only pick the following
        indexes=[1, 2, 3, 4, 5],
        header=True
    )
).window(3, 1).batch(2)

for element in dataset:
    print(element)
```

The following shows the output of the first print:

```sh
[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
  [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]

 [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
  [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
...
```

### Dataset(reader: AbstractReader)

#### dataset.window(size: int, shift: int = None, stride: int = 1) -> self

Defines the window size, shift and stride.

The default window size is `1`, which means the dataset has no window.

**Parameter explanation**

Suppose we have a raw data set

```
[ 1  2  3  4  5  6  7  8  9  ... ]
```

And the following is a window of `(size=4, shift=3, stride=2)`

```
       |-------------- size:4 --------------|
       |- stride:2 -|
       |      |      |      |
win 0: [ 1    3      5      7 ]
--------|----- shift:3
win 1:     [ 4    6      8      10 ]
--------|-----
win 2:         [ 7    9      11     13 ]
...
```

#### dataset.batch(batch: int) -> self

Defines the batch size.

The default batch size of the dataset is `1`, which means it is single-batch.

If batch is `2`:

```
batch 0: [[ 1   3   5   7  ]
          [ 4   6   8   10 ]]

batch 1: [[ 7   9   11  13 ]
          [ 10  12  14  16 ]]
...
```
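To make the two diagrams above concrete, here is a minimal standalone sketch over plain Python lists; `window_indices` is a hypothetical helper for illustration only, not part of the csv-dataset API:

```py
# Reproduces the window and batch diagrams above.
# `window_indices` is a hypothetical helper, not part of csv-dataset.
def window_indices(win: int, size: int, shift: int, stride: int) -> list:
    start = win * shift
    return [start + i * stride for i in range(size)]

data = list(range(1, 20))  # [ 1  2  3  ... ]

# Windows of (size=4, shift=3, stride=2)
windows = [
    [data[i] for i in window_indices(win, 4, 3, 2)]
    for win in range(4)
]
print(windows[0])  # [1, 3, 5, 7]
print(windows[2])  # [7, 9, 11, 13]

# batch(2) then groups consecutive windows two at a time
batches = [windows[i:i + 2] for i in range(0, len(windows), 2)]
print(batches[0])  # [[1, 3, 5, 7], [4, 6, 8, 10]]
```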
#### dataset.get() -> Optional[np.ndarray]

Gets the data of the next batch.

#### dataset.reset() -> self

Resets the dataset.

#### dataset.read(amount: int, reset_buffer: bool = False)

Reads multiple batches at a time.

- **amount** the maximum length of data the dataset will read
- **reset_buffer** if `True`, the dataset will reset the data of the previous window in the buffer

If we `reset_buffer`, then the next read will not use the existing data in the buffer, and the result will have no overlap with the last read.

#### dataset.reset_buffer() -> None

Resets the buffer, so that the next read will have no overlap with the last one.

#### dataset.lines_need(reads: int) -> int

Calculates and returns how many lines of the underlying data are needed to read `reads` times.

#### dataset.max_reads(max_lines: int) -> int | None

Calculates how many reads `max_lines` lines could afford.

#### dataset.max_reads() -> int | None

Calculates how many reads the current reader could afford. If `max_lines` of the current reader is unset, it returns `None`.

### CsvReader(filepath, dtype, indexes, **kwargs)

- **filepath** `str` absolute path of the csv file
- **dtype** `Callable` data type. We should only use `float` or `int` for this argument.
- **indexes** `List[int]` column indexes to pick from the lines of the csv file
- **kwargs**
  - **header** `bool = False` whether we should skip reading the header line
  - **splitter** `str = ','` the column splitter of the csv file
  - **normalizer** `List[NormalizerProtocol]` list of normalizers, one for each column of data. A `NormalizerProtocol` should contain two methods, `normalize(float) -> float` to normalize the given datum and `restore(float) -> float` to restore the normalized datum; see the sketch after this list.
  - **max_lines** `int = -1` max lines of the csv file to be read. Defaults to `-1`, which means no limit.
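For illustration, a minimal class that satisfies `NormalizerProtocol`; the name `MinMaxNormalizer` and its bounds are hypothetical, not part of csv-dataset:

```py
# A hypothetical normalizer satisfying NormalizerProtocol
# (class name and bounds are illustrative, not part of csv-dataset).
class MinMaxNormalizer:
    def __init__(self, lo: float, hi: float):
        self._lo = lo
        self._hi = hi

    def normalize(self, datum: float) -> float:
        # Scale a raw value into [0, 1]
        return (datum - self._lo) / (self._hi - self._lo)

    def restore(self, datum: float) -> float:
        # Map a normalized value back to the original range
        return datum * (self._hi - self._lo) + self._lo
```

Since `normalizer` takes one entry per picked column, a reader for the five columns in the Usage example would presumably receive a list of five such normalizers.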
#### reader.reset()

Resets the reader position.

#### property reader.max_lines

Gets `max_lines`.

#### setter reader.max_lines = lines

Changes `max_lines`.

#### reader.readline() -> list

Returns the converted value of the next line.

#### property reader.lines

Returns the number of lines that have been read.

## License

[MIT](LICENSE)

%package -n python3-csv-dataset
Summary:	csv-dataset helps to read csv files and create descriptive and efficient input pipelines for deep learning in a streaming fashion
Provides:	python-csv-dataset
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip

%description -n python3-csv-dataset
[![](https://travis-ci.org/kaelzhang/python-csv-dataset.svg?branch=master)](https://travis-ci.org/kaelzhang/python-csv-dataset)
[![](https://codecov.io/gh/kaelzhang/python-csv-dataset/branch/master/graph/badge.svg)](https://codecov.io/gh/kaelzhang/python-csv-dataset)
[![](https://img.shields.io/pypi/v/csv-dataset.svg)](https://pypi.org/project/csv-dataset/)
[![](https://img.shields.io/pypi/l/csv-dataset.svg)](https://github.com/kaelzhang/python-csv-dataset)

# csv-dataset

`CsvDataset` helps to read a csv file and create descriptive and efficient input pipelines for deep learning.

`CsvDataset` iterates the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.

## Install

```sh
$ pip install csv-dataset
```

## Usage

Suppose we have a csv file whose absolute path is `filepath`:

```csv
open_time,open,high,low,close,volume
1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
...
```

```py
from csv_dataset import (
    Dataset,
    CsvReader
)

dataset = Dataset(
    CsvReader(
        filepath,
        float,
        # Abandon the first column and only pick the following
        indexes=[1, 2, 3, 4, 5],
        header=True
    )
).window(3, 1).batch(2)

for element in dataset:
    print(element)
```

The following shows the output of the first print:

```sh
[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
  [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]

 [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
  [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
...
```

### Dataset(reader: AbstractReader)

#### dataset.window(size: int, shift: int = None, stride: int = 1) -> self

Defines the window size, shift and stride.

The default window size is `1`, which means the dataset has no window.

**Parameter explanation**

Suppose we have a raw data set

```
[ 1  2  3  4  5  6  7  8  9  ... ]
```

And the following is a window of `(size=4, shift=3, stride=2)`

```
       |-------------- size:4 --------------|
       |- stride:2 -|
       |      |      |      |
win 0: [ 1    3      5      7 ]
--------|----- shift:3
win 1:     [ 4    6      8      10 ]
--------|-----
win 2:         [ 7    9      11     13 ]
...
```

#### dataset.batch(batch: int) -> self

Defines the batch size.

The default batch size of the dataset is `1`, which means it is single-batch.

If batch is `2`:

```
batch 0: [[ 1   3   5   7  ]
          [ 4   6   8   10 ]]

batch 1: [[ 7   9   11  13 ]
          [ 10  12  14  16 ]]
...
```

#### dataset.get() -> Optional[np.ndarray]

Gets the data of the next batch.

#### dataset.reset() -> self

Resets the dataset.

#### dataset.read(amount: int, reset_buffer: bool = False)

Reads multiple batches at a time.

- **amount** the maximum length of data the dataset will read
- **reset_buffer** if `True`, the dataset will reset the data of the previous window in the buffer

If we `reset_buffer`, then the next read will not use the existing data in the buffer, and the result will have no overlap with the last read.

#### dataset.reset_buffer() -> None

Resets the buffer, so that the next read will have no overlap with the last one.

#### dataset.lines_need(reads: int) -> int

Calculates and returns how many lines of the underlying data are needed to read `reads` times.

#### dataset.max_reads(max_lines: int) -> int | None

Calculates how many reads `max_lines` lines could afford.

#### dataset.max_reads() -> int | None

Calculates how many reads the current reader could afford. If `max_lines` of the current reader is unset, it returns `None`.
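The following hypothetical session shows how `read`, `reset_buffer` and `lines_need` interact, reusing the `dataset` from the Usage section; the concrete numbers are illustrative:

```py
# Hypothetical session reusing `dataset` from the Usage section;
# the concrete numbers are illustrative.
dataset.reset()                 # start over from the first record

first = dataset.read(4)         # read up to 4 units of data at once
dataset.reset_buffer()          # discard buffered data of the last window
second = dataset.read(4)        # has no overlap with `first`

print(dataset.lines_need(10))   # csv lines required to read 10 times
print(dataset.max_reads())      # None unless reader.max_lines is set
```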
### CsvReader(filepath, dtype, indexes, **kwargs)

- **filepath** `str` absolute path of the csv file
- **dtype** `Callable` data type. We should only use `float` or `int` for this argument.
- **indexes** `List[int]` column indexes to pick from the lines of the csv file
- **kwargs**
  - **header** `bool = False` whether we should skip reading the header line
  - **splitter** `str = ','` the column splitter of the csv file
  - **normalizer** `List[NormalizerProtocol]` list of normalizers, one for each column of data. A `NormalizerProtocol` should contain two methods, `normalize(float) -> float` to normalize the given datum and `restore(float) -> float` to restore the normalized datum.
  - **max_lines** `int = -1` max lines of the csv file to be read. Defaults to `-1`, which means no limit.

#### reader.reset()

Resets the reader position.

#### property reader.max_lines

Gets `max_lines`.

#### setter reader.max_lines = lines

Changes `max_lines`.

#### reader.readline() -> list

Returns the converted value of the next line.

#### property reader.lines

Returns the number of lines that have been read.

## License

[MIT](LICENSE)

%package help
Summary:	Development documents and examples for csv-dataset
Provides:	python3-csv-dataset-doc

%description help
[![](https://travis-ci.org/kaelzhang/python-csv-dataset.svg?branch=master)](https://travis-ci.org/kaelzhang/python-csv-dataset)
[![](https://codecov.io/gh/kaelzhang/python-csv-dataset/branch/master/graph/badge.svg)](https://codecov.io/gh/kaelzhang/python-csv-dataset)
[![](https://img.shields.io/pypi/v/csv-dataset.svg)](https://pypi.org/project/csv-dataset/)
[![](https://img.shields.io/pypi/l/csv-dataset.svg)](https://github.com/kaelzhang/python-csv-dataset)

# csv-dataset

`CsvDataset` helps to read a csv file and create descriptive and efficient input pipelines for deep learning.

`CsvDataset` iterates the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.

## Install

```sh
$ pip install csv-dataset
```

## Usage

Suppose we have a csv file whose absolute path is `filepath`:

```csv
open_time,open,high,low,close,volume
1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
...
```

```py
from csv_dataset import (
    Dataset,
    CsvReader
)

dataset = Dataset(
    CsvReader(
        filepath,
        float,
        # Abandon the first column and only pick the following
        indexes=[1, 2, 3, 4, 5],
        header=True
    )
).window(3, 1).batch(2)

for element in dataset:
    print(element)
```

The following shows the output of the first print:

```sh
[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
  [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]

 [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
  [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
...
```

### Dataset(reader: AbstractReader)

#### dataset.window(size: int, shift: int = None, stride: int = 1) -> self

Defines the window size, shift and stride.

The default window size is `1`, which means the dataset has no window.

**Parameter explanation**

Suppose we have a raw data set

```
[ 1  2  3  4  5  6  7  8  9  ... ]
```

And the following is a window of `(size=4, shift=3, stride=2)`

```
       |-------------- size:4 --------------|
       |- stride:2 -|
       |      |      |      |
win 0: [ 1    3      5      7 ]
--------|----- shift:3
win 1:     [ 4    6      8      10 ]
--------|-----
win 2:         [ 7    9      11     13 ]
...
```

#### dataset.batch(batch: int) -> self

Defines the batch size.

The default batch size of the dataset is `1`, which means it is single-batch.

If batch is `2`:

```
batch 0: [[ 1   3   5   7  ]
          [ 4   6   8   10 ]]

batch 1: [[ 7   9   11  13 ]
          [ 10  12  14  16 ]]
...
```
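Instead of iterating, the pipeline can also be driven manually with `get()`, which is documented below. A minimal sketch reusing the `dataset` from the Usage section; the None check and the shape comment are inferred from the settings above:

```py
# Minimal sketch reusing `dataset` from the Usage section;
# get() is documented below.
while True:
    batch = dataset.get()   # Optional[np.ndarray]
    if batch is None:       # presumably None once the csv is exhausted
        break
    print(batch.shape)      # (2, 3, 5): batch=2, window=3, 5 columns picked
```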
#### dataset.get() -> Optional[np.ndarray]

Gets the data of the next batch.

#### dataset.reset() -> self

Resets the dataset.

#### dataset.read(amount: int, reset_buffer: bool = False)

Reads multiple batches at a time.

- **amount** the maximum length of data the dataset will read
- **reset_buffer** if `True`, the dataset will reset the data of the previous window in the buffer

If we `reset_buffer`, then the next read will not use the existing data in the buffer, and the result will have no overlap with the last read.

#### dataset.reset_buffer() -> None

Resets the buffer, so that the next read will have no overlap with the last one.

#### dataset.lines_need(reads: int) -> int

Calculates and returns how many lines of the underlying data are needed to read `reads` times.

#### dataset.max_reads(max_lines: int) -> int | None

Calculates how many reads `max_lines` lines could afford.

#### dataset.max_reads() -> int | None

Calculates how many reads the current reader could afford. If `max_lines` of the current reader is unset, it returns `None`.

### CsvReader(filepath, dtype, indexes, **kwargs)

- **filepath** `str` absolute path of the csv file
- **dtype** `Callable` data type. We should only use `float` or `int` for this argument.
- **indexes** `List[int]` column indexes to pick from the lines of the csv file
- **kwargs**
  - **header** `bool = False` whether we should skip reading the header line
  - **splitter** `str = ','` the column splitter of the csv file
  - **normalizer** `List[NormalizerProtocol]` list of normalizers, one for each column of data. A `NormalizerProtocol` should contain two methods, `normalize(float) -> float` to normalize the given datum and `restore(float) -> float` to restore the normalized datum.
  - **max_lines** `int = -1` max lines of the csv file to be read. Defaults to `-1`, which means no limit.

#### reader.reset()

Resets the reader position.

#### property reader.max_lines

Gets `max_lines`.

#### setter reader.max_lines = lines

Changes `max_lines`.

#### reader.readline() -> list

Returns the converted value of the next line.

#### property reader.lines

Returns the number of lines that have been read.

## License

[MIT](LICENSE)

%prep
%autosetup -n csv-dataset-3.5.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-csv-dataset -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Jun 20 2023 Python_Bot - 3.5.0-1
- Package Spec generated