%global _empty_manifest_terminate_build 0
Name:		python-migbq
Version:	0.0.85
Release:	1
Summary:	Read Microsoft SQL Server table data and upload it to BigQuery
License:	MIT
URL:		https://github.com/jo8937/migbq
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/67/17/843d78730cc91a64ec9c6bb997176d929fff87e07fcadb2ba45c6a5e5c19/migbq-0.0.85.tar.gz
BuildArch:	noarch

Requires:	python3-ujson
Requires:	python3-peewee
Requires:	python3-pymssql
Requires:	python3-peewee-mssql
Requires:	python3-concurrent-log-handler
Requires:	python3-pyyaml
Requires:	python3-google-cloud-bigquery
Requires:	python3-Jinja2
Requires:	python3-httplib2

%description
# migbq

rdbms-to-bigquery-data-loader

## Requirements

* Python
  - CPython 2.7.x
* RDBMS (below, DB)
  - Microsoft SQL Server
  - MySQL (in development)
* Table spec
  - every table must have a numeric primary key field
* DB user grants
  - SELECT, INSERT, UPDATE, CREATE
  - access to the DB's metadata (the [INFORMATION_SCHEMA] database)
  - some metadata tables are created in the source RDBMS
  - (if you do not want tables created in the source DB, you can use sqlite: fork this project and edit the source)
* Google Cloud SDK
  - the Google Cloud SDK must be installed
  - https://cloud.google.com/sdk/downloads
  - https://cloud.google.com/sdk/gcloud/reference/auth/login
* pymssql / FreeTDS
  - http://www.pymssql.org/en/stable/

## Install

```
export PYMSSQL_BUILD_WITH_BUNDLED_FREETDS=1
pip install migbq
```

## Usage

### Write a configuration file

* configuration is similar to embulk ( http://www.embulk.org )

### Example

#### General config file

* config.yml

```yml
in:
  type: mssql
  host: localhost
  user: USER
  password: PASSWORD
  port: 1433
  database: DATABASE
  tables:
    - tbl
    - tbl2
    - tbl3
  batch_size: 50000
  temp_csv_path: /temp/pymig_csv
  temp_csv_path_complete: /temp/pymig_csv_complete
out:
  type: bigquery
  project: GCP_PROJECT
  dataset: BQ_DATASET
```

#### Jinja2 template

* config.j2.yml
  - only environment variables are available as template variables
  - the file extension must be **.j2.yml**

```yml
in:
  type: mssql
  {% include "mssql-connect.yml" %}
  tables:
    - tbl
    - tbl2
    - tbl3
  batch_size: 50000
  temp_csv_path: /temp/pymig_csv
  temp_csv_path_complete: /temp/pymig_csv_complete
out:
  type: bigquery
  project: {{ env.GCP_PROJECT }}
  dataset: BQ_DATASET
```

### Run

#### (1) Execute

```bash
migbq run config.yml
```

#### (2) Check job completion

```bash
migbq check config.yml
```

#### (3) Check that table counts match

```bash
migbq sync config.yml
```

* primary-key-based count check

### Run forever

* you can run migbq from crontab
* migbq holds an exclusive process lock, so a crontab entry every minute is safe
* you must schedule both **run** and **check**

## Description

### run command

**[1]** Select the RDBMS table metadata

- get the table's primary key name from the RDBMS metadata tables
- get the column names and types from the RDBMS metadata tables

**[2]** Select the RDBMS primary key value range

- get the table's min / max PK

**[3]** Select data in the primary key range

- select rows between the min PK and min PK + batch_size

```sql
select * from tbl where 0 < idx and idx <= 100;
```

- create the file **pymig-tbl-idx-1-100**
- gzip the CSV

**[4]** Upload the CSV file to BigQuery

- uploads directly to the BigQuery table, not via GCS (quota-exceeded errors can occur)

**[5]** Repeat 1~4 until the max primary key is passed.

For example, with batch_size 100 and a max PK of 321, the RDBMS queries run like this:

```sql
select * from tbl where 0 < idx and idx <= 100;
select * from tbl where 100 < idx and idx <= 200;
select * from tbl where 200 < idx and idx <= 300;
select * from tbl where 300 < idx and idx <= 400;
-- end
```

### check command

* checks whether each BigQuery job id has finished
* retries failed jobs
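As a rough sketch of what the check step boils down to (not migbq's actual code; the project and job id below are just the placeholders used elsewhere in this document), polling a BigQuery load job with the google-cloud-bigquery client looks like this:

```python
# Minimal sketch of the "check" idea: look up a load job by its jobId and
# record whether it finished or failed. Real job ids would come from the
# migrationmetadatalog table; this one is a placeholder.
from google.cloud import bigquery

def check_job(project, job_id):
    client = bigquery.Client(project=project)
    job = client.get_job(job_id)            # fetch the current job state from the API
    if job.state != "DONE":
        return None                         # still running; check again later
    if job.error_result:
        return ("ERROR", job.error_result)  # failed load: candidate for a retry
    return ("DONE", job.output_rows)        # i.e. statistics.load.outputRows

print(check_job("GCP_PROJECT", "job-adf132f31rf3f"))
```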
### Log file of the program

* the log file is created in the [log] subdirectory of the config file's directory

### Pid file of the program

* the pid file ensures only one process per command; it is created in the directory below and held with an exclusive file lock

```
/tmp
```

### Load metadata tables

#### META: migrationmetadata

* a row is written each time a 'select' runs

| field name | type | description | sample value | etc |
| ---: | --- | --- | --- | --- |
| tableName | STRING | target [tableName] | tbl | Primary Key |
| firstPk | INTEGER | [tableName]'s min primary key value | 1 | |
| lastPk | INTEGER | [tableName]'s max primary key value | 123 | |
| currentPk | STRING | [tableName]'s last completed primary key value | 20 | |
| regDate | DATETIME | insert date of this row | 2017-11-29 01:02:03 | |
| modDate | DATETIME | firstPk / lastPk modification date | 2017-11-29 01:02:03 | |
| endDate | DATETIME | date when currentPk reached lastPk | 2017-11-29 11:22:33 | |
| pkName | STRING | [tableName]'s primary key name | idx | |
| rowCnt | INTEGER | [tableName]'s count(*) | 123 | |
| pageTokenCurrent | STRING | not used yet | tbl | |
| pageTokenNext | STRING | not used yet | tbl | |

#### LOG: migrationmetadatalog

* sequence
  - run: a row is inserted into this table when 'select [tableName]' is executed
  - run: the row is updated when the BigQuery jobId is created
  - check: the row's jobComplete and checkComplete are updated when the BigQuery jobId check finishes

| field name | type | description | sample value | etc |
| ---: | --- | --- | --- | --- |
| idx | BIGINT | PK | 1 | Primary Key, Auto Increment |
| tableName | STRING | [tableName] | tbl | Primary Key |
| regDate | DATETIME | row insert date | 2017-11-29 01:02:03 | |
| endDate | DATETIME | date when the jobId reached 'DONE' | 2017-11-29 11:22:33 | |
| pkName | STRING | [tableName]'s primary key name | idx | |
| cnt | INTEGER | BigQuery API: statistics.load.outputRows | 123 | |
| pkUpper | INTEGER | upper bound of each 'select': [pkName] <= [pkUpper] | 100 | |
| pkLower | INTEGER | lower bound of each 'select': [pkName] > [pkLower] | 0 | |
| pkCurrent | INTEGER | same as pkUpper | 99 | |
| jobId | STRING | jobId of the BigQuery upload job | job-adf132f31rf3f | |
| errorMessage | STRING | written when the jobId check result is 'ERROR' | ERROR:bigquery quota exceed | |
| checkComplete | INTEGER | set by the check command | 1 | |
| jobComplete | INTEGER | jobId check result from the check command: success=1, fail=-1 | 1 | |
| pageToken | STRING | reserved for miscellaneous use | | |
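Since migbq depends on peewee, the META table above can be pictured as a peewee model. The sketch below only mirrors the documented column names; the concrete field types and backend are assumptions, not the model class actually shipped in migbq:

```python
# Illustrative peewee model of the migrationmetadata table described above.
# Field types are guesses from the documentation; migbq's real model may differ.
from peewee import Model, CharField, BigIntegerField, DateTimeField, SqliteDatabase

db = SqliteDatabase(":memory:")  # placeholder backend; migbq targets MSSQL/MySQL

class MigrationMetadata(Model):
    tableName = CharField(primary_key=True)
    firstPk = BigIntegerField(null=True)
    lastPk = BigIntegerField(null=True)
    currentPk = CharField(null=True)
    regDate = DateTimeField(null=True)
    modDate = DateTimeField(null=True)
    endDate = DateTimeField(null=True)
    pkName = CharField(null=True)
    rowCnt = BigIntegerField(null=True)
    pageTokenCurrent = CharField(null=True)
    pageTokenNext = CharField(null=True)

    class Meta:
        database = db
        table_name = "migrationmetadata"

db.create_tables([MigrationMetadata])
```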
## Roadmap

* parallel loading is not supported yet.

%package -n python3-migbq
Summary:	Read Microsoft SQL Server table data and upload it to BigQuery
Provides:	python-migbq
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip

%description -n python3-migbq
# migbq

rdbms-to-bigquery-data-loader

## Requirements

* Python
  - CPython 2.7.x
* RDBMS (below, DB)
  - Microsoft SQL Server
  - MySQL (in development)
* Table spec
  - every table must have a numeric primary key field
* DB user grants
  - SELECT, INSERT, UPDATE, CREATE
  - access to the DB's metadata (the [INFORMATION_SCHEMA] database)
  - some metadata tables are created in the source RDBMS
  - (if you do not want tables created in the source DB, you can use sqlite: fork this project and edit the source)
* Google Cloud SDK
  - the Google Cloud SDK must be installed
  - https://cloud.google.com/sdk/downloads
  - https://cloud.google.com/sdk/gcloud/reference/auth/login
* pymssql / FreeTDS
  - http://www.pymssql.org/en/stable/

## Install

```
export PYMSSQL_BUILD_WITH_BUNDLED_FREETDS=1
pip install migbq
```

## Usage

### Write a configuration file

* configuration is similar to embulk ( http://www.embulk.org )

### Example

#### General config file

* config.yml

```yml
in:
  type: mssql
  host: localhost
  user: USER
  password: PASSWORD
  port: 1433
  database: DATABASE
  tables:
    - tbl
    - tbl2
    - tbl3
  batch_size: 50000
  temp_csv_path: /temp/pymig_csv
  temp_csv_path_complete: /temp/pymig_csv_complete
out:
  type: bigquery
  project: GCP_PROJECT
  dataset: BQ_DATASET
```

#### Jinja2 template

* config.j2.yml
  - only environment variables are available as template variables
  - the file extension must be **.j2.yml**

```yml
in:
  type: mssql
  {% include "mssql-connect.yml" %}
  tables:
    - tbl
    - tbl2
    - tbl3
  batch_size: 50000
  temp_csv_path: /temp/pymig_csv
  temp_csv_path_complete: /temp/pymig_csv_complete
out:
  type: bigquery
  project: {{ env.GCP_PROJECT }}
  dataset: BQ_DATASET
```

### Run

#### (1) Execute

```bash
migbq run config.yml
```

#### (2) Check job completion

```bash
migbq check config.yml
```

#### (3) Check that table counts match

```bash
migbq sync config.yml
```

* primary-key-based count check

### Run forever

* you can run migbq from crontab
* migbq holds an exclusive process lock, so a crontab entry every minute is safe
* you must schedule both **run** and **check**

## Description

### run command

**[1]** Select the RDBMS table metadata

- get the table's primary key name from the RDBMS metadata tables
- get the column names and types from the RDBMS metadata tables

**[2]** Select the RDBMS primary key value range

- get the table's min / max PK

**[3]** Select data in the primary key range

- select rows between the min PK and min PK + batch_size

```sql
select * from tbl where 0 < idx and idx <= 100;
```

- create the file **pymig-tbl-idx-1-100**
- gzip the CSV

**[4]** Upload the CSV file to BigQuery

- uploads directly to the BigQuery table, not via GCS (quota-exceeded errors can occur)

**[5]** Repeat 1~4 until the max primary key is passed.

For example, with batch_size 100 and a max PK of 321, the RDBMS queries run like this:

```sql
select * from tbl where 0 < idx and idx <= 100;
select * from tbl where 100 < idx and idx <= 200;
select * from tbl where 200 < idx and idx <= 300;
select * from tbl where 300 < idx and idx <= 400;
-- end
```
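In other words, the loop simply walks fixed-size primary key windows. A minimal Python sketch of that iteration (select_range and upload_to_bigquery are hypothetical placeholders, not real migbq functions):

```python
# Sketch of the PK-range batching described in steps [1]-[5].
def migrate_table(min_pk, max_pk, batch_size, select_range, upload_to_bigquery):
    lower = min_pk - 1                     # first window is (min_pk-1, min_pk-1+batch_size]
    while lower < max_pk:
        upper = lower + batch_size
        rows = select_range(lower, upper)  # SELECT * FROM tbl WHERE lower < idx AND idx <= upper
        upload_to_bigquery(rows)           # write a gzipped CSV, then start a BigQuery load job
        lower = upper                      # the last window may extend past max_pk

# With batch_size=100 and max pk 321 this produces the ranges
# (0,100], (100,200], (200,300], (300,400], matching the queries above.
migrate_table(1, 321, 100,
              select_range=lambda lo, hi: [],         # stub
              upload_to_bigquery=lambda rows: None)   # stub
```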
### check command

* checks whether each BigQuery job id has finished
* retries failed jobs

### Log file of the program

* the log file is created in the [log] subdirectory of the config file's directory

### Pid file of the program

* the pid file ensures only one process per command; it is created in the directory below and held with an exclusive file lock

```
/tmp
```
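The exclusive lock can be pictured with a small fcntl sketch; the lock file name below is an illustrative placeholder rather than the exact file migbq creates:

```python
# Sketch of an exclusive, non-blocking pid/lock file: the mechanism that makes
# it safe to schedule migbq from crontab every minute without overlapping runs.
import fcntl
import os
import sys

lock_path = "/tmp/migbq-example.lock"  # example path only
lock_file = open(lock_path, "w")
try:
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fail fast if already held
except IOError:
    sys.exit("another process is already running this command")

lock_file.write(str(os.getpid()))
lock_file.flush()
# ... do the actual work while holding the lock ...
```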
### Load metadata tables

#### META: migrationmetadata

* a row is written each time a 'select' runs

| field name | type | description | sample value | etc |
| ---: | --- | --- | --- | --- |
| tableName | STRING | target [tableName] | tbl | Primary Key |
| firstPk | INTEGER | [tableName]'s min primary key value | 1 | |
| lastPk | INTEGER | [tableName]'s max primary key value | 123 | |
| currentPk | STRING | [tableName]'s last completed primary key value | 20 | |
| regDate | DATETIME | insert date of this row | 2017-11-29 01:02:03 | |
| modDate | DATETIME | firstPk / lastPk modification date | 2017-11-29 01:02:03 | |
| endDate | DATETIME | date when currentPk reached lastPk | 2017-11-29 11:22:33 | |
| pkName | STRING | [tableName]'s primary key name | idx | |
| rowCnt | INTEGER | [tableName]'s count(*) | 123 | |
| pageTokenCurrent | STRING | not used yet | tbl | |
| pageTokenNext | STRING | not used yet | tbl | |

#### LOG: migrationmetadatalog

* sequence
  - run: a row is inserted into this table when 'select [tableName]' is executed
  - run: the row is updated when the BigQuery jobId is created
  - check: the row's jobComplete and checkComplete are updated when the BigQuery jobId check finishes

| field name | type | description | sample value | etc |
| ---: | --- | --- | --- | --- |
| idx | BIGINT | PK | 1 | Primary Key, Auto Increment |
| tableName | STRING | [tableName] | tbl | Primary Key |
| regDate | DATETIME | row insert date | 2017-11-29 01:02:03 | |
| endDate | DATETIME | date when the jobId reached 'DONE' | 2017-11-29 11:22:33 | |
| pkName | STRING | [tableName]'s primary key name | idx | |
| cnt | INTEGER | BigQuery API: statistics.load.outputRows | 123 | |
| pkUpper | INTEGER | upper bound of each 'select': [pkName] <= [pkUpper] | 100 | |
| pkLower | INTEGER | lower bound of each 'select': [pkName] > [pkLower] | 0 | |
| pkCurrent | INTEGER | same as pkUpper | 99 | |
| jobId | STRING | jobId of the BigQuery upload job | job-adf132f31rf3f | |
| errorMessage | STRING | written when the jobId check result is 'ERROR' | ERROR:bigquery quota exceed | |
| checkComplete | INTEGER | set by the check command | 1 | |
| jobComplete | INTEGER | jobId check result from the check command: success=1, fail=-1 | 1 | |
| pageToken | STRING | reserved for miscellaneous use | | |

## Roadmap

* parallel loading is not supported yet.

%package help
Summary:	Development documents and examples for migbq
Provides:	python3-migbq-doc

%description help
# migbq

rdbms-to-bigquery-data-loader

## Requirements

* Python
  - CPython 2.7.x
* RDBMS (below, DB)
  - Microsoft SQL Server
  - MySQL (in development)
* Table spec
  - every table must have a numeric primary key field
* DB user grants
  - SELECT, INSERT, UPDATE, CREATE
  - access to the DB's metadata (the [INFORMATION_SCHEMA] database)
  - some metadata tables are created in the source RDBMS
  - (if you do not want tables created in the source DB, you can use sqlite: fork this project and edit the source)
* Google Cloud SDK
  - the Google Cloud SDK must be installed
  - https://cloud.google.com/sdk/downloads
  - https://cloud.google.com/sdk/gcloud/reference/auth/login
* pymssql / FreeTDS
  - http://www.pymssql.org/en/stable/
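As a rough illustration of why the grants and INFORMATION_SCHEMA access above are needed, a primary key lookup with pymssql could look like the following sketch (connection details are the placeholders from the example config; the query is an assumption, not necessarily the one migbq runs):

```python
# Sketch: find a table's primary key column via INFORMATION_SCHEMA using pymssql.
# Connection parameters are illustrative placeholders.
import pymssql

conn = pymssql.connect(server="localhost", port=1433, user="USER",
                       password="PASSWORD", database="DATABASE")
cur = conn.cursor()
cur.execute("""
    SELECT kcu.COLUMN_NAME
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
    JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
      ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
    WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY' AND tc.TABLE_NAME = %s
""", ("tbl",))
row = cur.fetchone()
print(row[0] if row else "no primary key found")  # e.g. 'idx'
conn.close()
```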
## Install

```
export PYMSSQL_BUILD_WITH_BUNDLED_FREETDS=1
pip install migbq
```

## Usage

### Write a configuration file

* configuration is similar to embulk ( http://www.embulk.org )

### Example

#### General config file

* config.yml

```yml
in:
  type: mssql
  host: localhost
  user: USER
  password: PASSWORD
  port: 1433
  database: DATABASE
  tables:
    - tbl
    - tbl2
    - tbl3
  batch_size: 50000
  temp_csv_path: /temp/pymig_csv
  temp_csv_path_complete: /temp/pymig_csv_complete
out:
  type: bigquery
  project: GCP_PROJECT
  dataset: BQ_DATASET
```

#### Jinja2 template

* config.j2.yml
  - only environment variables are available as template variables
  - the file extension must be **.j2.yml**

```yml
in:
  type: mssql
  {% include "mssql-connect.yml" %}
  tables:
    - tbl
    - tbl2
    - tbl3
  batch_size: 50000
  temp_csv_path: /temp/pymig_csv
  temp_csv_path_complete: /temp/pymig_csv_complete
out:
  type: bigquery
  project: {{ env.GCP_PROJECT }}
  dataset: BQ_DATASET
```

### Run

#### (1) Execute

```bash
migbq run config.yml
```

#### (2) Check job completion

```bash
migbq check config.yml
```

#### (3) Check that table counts match

```bash
migbq sync config.yml
```

* primary-key-based count check

### Run forever

* you can run migbq from crontab
* migbq holds an exclusive process lock, so a crontab entry every minute is safe
* you must schedule both **run** and **check**

## Description

### run command

**[1]** Select the RDBMS table metadata

- get the table's primary key name from the RDBMS metadata tables
- get the column names and types from the RDBMS metadata tables

**[2]** Select the RDBMS primary key value range

- get the table's min / max PK

**[3]** Select data in the primary key range

- select rows between the min PK and min PK + batch_size

```sql
select * from tbl where 0 < idx and idx <= 100;
```

- create the file **pymig-tbl-idx-1-100**
- gzip the CSV

**[4]** Upload the CSV file to BigQuery

- uploads directly to the BigQuery table, not via GCS (quota-exceeded errors can occur)

**[5]** Repeat 1~4 until the max primary key is passed.

For example, with batch_size 100 and a max PK of 321, the RDBMS queries run like this:

```sql
select * from tbl where 0 < idx and idx <= 100;
select * from tbl where 100 < idx and idx <= 200;
select * from tbl where 200 < idx and idx <= 300;
select * from tbl where 300 < idx and idx <= 400;
-- end
```

### check command

* checks whether each BigQuery job id has finished
* retries failed jobs

### Log file of the program

* the log file is created in the [log] subdirectory of the config file's directory

### Pid file of the program

* the pid file ensures only one process per command; it is created in the directory below and held with an exclusive file lock

```
/tmp
```

### Load metadata tables

#### META: migrationmetadata

* a row is written each time a 'select' runs

| field name | type | description | sample value | etc |
| ---: | --- | --- | --- | --- |
| tableName | STRING | target [tableName] | tbl | Primary Key |
| firstPk | INTEGER | [tableName]'s min primary key value | 1 | |
| lastPk | INTEGER | [tableName]'s max primary key value | 123 | |
| currentPk | STRING | [tableName]'s last completed primary key value | 20 | |
| regDate | DATETIME | insert date of this row | 2017-11-29 01:02:03 | |
| modDate | DATETIME | firstPk / lastPk modification date | 2017-11-29 01:02:03 | |
| endDate | DATETIME | date when currentPk reached lastPk | 2017-11-29 11:22:33 | |
| pkName | STRING | [tableName]'s primary key name | idx | |
| rowCnt | INTEGER | [tableName]'s count(*) | 123 | |
| pageTokenCurrent | STRING | not used yet | tbl | |
| pageTokenNext | STRING | not used yet | tbl | |

#### LOG: migrationmetadatalog

* sequence
  - run: a row is inserted into this table when 'select [tableName]' is executed
  - run: the row is updated when the BigQuery jobId is created
  - check: the row's jobComplete and checkComplete are updated when the BigQuery jobId check finishes

| field name | type | description | sample value | etc |
| ---: | --- | --- | --- | --- |
| idx | BIGINT | PK | 1 | Primary Key, Auto Increment |
| tableName | STRING | [tableName] | tbl | Primary Key |
| regDate | DATETIME | row insert date | 2017-11-29 01:02:03 | |
| endDate | DATETIME | date when the jobId reached 'DONE' | 2017-11-29 11:22:33 | |
| pkName | STRING | [tableName]'s primary key name | idx | |
| cnt | INTEGER | BigQuery API: statistics.load.outputRows | 123 | |
| pkUpper | INTEGER | upper bound of each 'select': [pkName] <= [pkUpper] | 100 | |
| pkLower | INTEGER | lower bound of each 'select': [pkName] > [pkLower] | 0 | |
| pkCurrent | INTEGER | same as pkUpper | 99 | |
| jobId | STRING | jobId of the BigQuery upload job | job-adf132f31rf3f | |
| errorMessage | STRING | written when the jobId check result is 'ERROR' | ERROR:bigquery quota exceed | |
| checkComplete | INTEGER | set by the check command | 1 | |
| jobComplete | INTEGER | jobId check result from the check command: success=1, fail=-1 | 1 | |
| pageToken | STRING | reserved for miscellaneous use | | |

## Roadmap

* parallel loading is not supported yet.

%prep
%autosetup -n migbq-0.0.85

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-migbq -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Thu May 18 2023 Python_Bot - 0.0.85-1
- Package Spec generated