%global _empty_manifest_terminate_build 0
Name:		python-gcgc
Version:	1.0.0
Release:	1
Summary:	GCGC is a preprocessing library for biological sequence model development
License:	MIT
URL:		http://gcgc.trenthauck.com/
Source0:	https://mirrors.aliyun.com/pypi/web/packages/07/80/a45a6f4dfdd9dfcb4a2f6c505478dcdf50eb45fdefabbf9ff10b444e5147/gcgc-1.0.0.tar.gz
BuildArch:	noarch

Requires:	python3-pydantic
Requires:	python3-importlib-metadata
Requires:	python3-pytest
Requires:	python3-black
Requires:	python3-mypy
Requires:	python3-mypy-extensions
Requires:	python3-pycodestyle
Requires:	python3-pydocstyle
Requires:	python3-pytest-cov
Requires:	python3-mkdocs
Requires:	python3-mkdocs-material
Requires:	python3-phmdoctest
Requires:	python3-mkdocstrings
Requires:	python3-commitizen
Requires:	python3-pygments
Requires:	python3-isort
Requires:	python3-pylint
Requires:	python3-twine
Requires:	python3-biopython
Requires:	python3-tokenizers
Requires:	python3-datasets
Requires:	python3-setuptools-scm

%description
# GCGC

> GCGC is a tool for feature processing on biological sequences.

[![](https://github.com/tshauck/gcgc/workflows/Run%20Tests%20and%20Lint/badge.svg)](https://github.com/tshauck/gcgc/actions?query=workflow%3A%22Run+Tests+and+Lint%22)
[![](https://img.shields.io/pypi/v/gcgc.svg)](https://pypi.python.org/pypi/gcgc)
[![code style black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

## Installation

GCGC is primarily intended to be used as part of a larger workflow inside
Python.

To install via pip:

```sh
$ pip install gcgc
```

If you'd like to use the code that integrates gcgc's tokenizers with common
third-party libraries, either install those libraries separately or use gcgc's
extras.

```sh
$ pip install 'gcgc[pytorch,hf]'
```

## Documentation

The GCGC documentation, including examples, is at
[gcgc.trenthauck.com](http://gcgc.trenthauck.com).

### Quick Start

The easiest way to get started is to import the kmer tokenizer, configure it,
then start tokenizing.

```python
from gcgc import KmerTokenizer

kmer_tokenizer = KmerTokenizer(alphabet="unambiguous_dna")
encoded = kmer_tokenizer.encode("ATCG")
print(encoded)
```

Sample output:

```
[1, 6, 7, 8, 5, 2]
```

The output starts with the "bos" (beginning-of-sequence) token, ends with the
"eos" (end-of-sequence) token, and holds the four nucleotide tokens in between.
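
To make the mapping concrete, here is a minimal pure-Python sketch that
reproduces the sample output from the vocabulary printed at the end of this
Quick Start. It is not gcgc's implementation: the helper name `sketch_encode`
is hypothetical, and treating `?` as the fallback for unknown characters is an
assumption.

```python
# Token-to-id table copied from the vocab sample output in this Quick Start.
STOI = {"|": 0, ">": 1, "<": 2, "#": 3, "?": 4, "G": 5, "A": 6, "T": 7, "C": 8}


def sketch_encode(sequence, stoi=STOI, bos=">", eos="<", unknown="?"):
    """Hypothetical helper: wrap the sequence in bos/eos and look up each id."""
    tokens = [bos, *sequence, eos]
    # Assumption: characters missing from the table fall back to the "?" id.
    return [stoi.get(token, stoi[unknown]) for token in tokens]


print(sketch_encode("ATCG"))  # [1, 6, 7, 8, 5, 2]
```

The first and last ids (1 and 2) are the ids of `>` and `<` in the table, which
is why they bracket the four nucleotide ids.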

You can also go the other way and decode the integers back into string tokens.

```python
from gcgc import KmerTokenizer

kmer_tokenizer = KmerTokenizer(alphabet="unambiguous_dna")
decoded = kmer_tokenizer.decode(kmer_tokenizer.encode("ATCG"))
print(decoded)
```

Sample output:

```
['>', 'A', 'T', 'C', 'G', '<']
```
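
Decoding can be pictured as the reverse lookup. The sketch below inverts the
token-to-id table from the vocab sample output in this Quick Start; the name
`sketch_decode` is hypothetical and the code only illustrates the idea, not
how gcgc implements `decode`.

```python
# Token-to-id table copied from the vocab sample output in this Quick Start.
STOI = {"|": 0, ">": 1, "<": 2, "#": 3, "?": 4, "G": 5, "A": 6, "T": 7, "C": 8}

# Invert the table so that each id maps back to its token.
ITOS = {index: token for token, index in STOI.items()}


def sketch_decode(token_ids, itos=ITOS):
    """Hypothetical helper: map each integer id back to its string token."""
    return [itos[token_id] for token_id in token_ids]


print(sketch_decode([1, 6, 7, 8, 5, 2]))  # ['>', 'A', 'T', 'C', 'G', '<']
```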

The kmer tokenizer also exposes its vocab, which maps each token to its integer
id.

```python
from gcgc import KmerTokenizer

kmer_tokenizer = KmerTokenizer(alphabet="unambiguous_dna")
print(kmer_tokenizer.vocab.stoi)
```

Sample output:

```
{'|': 0, '>': 1, '<': 2, '#': 3, '?': 4, 'G': 5, 'A': 6, 'T': 7, 'C': 8}
```
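
The single-letter nucleotide entries in this vocab appear to correspond to
k-mers of length 1. As a general illustration of k-mer splitting (not gcgc's
own code), the sketch below lists the substrings a sliding window would
produce; `sketch_kmers`, `kmer_length`, and `stride` are hypothetical names
chosen for this example.

```python
def sketch_kmers(sequence, kmer_length=1, stride=1):
    """Hypothetical helper: list the k-mers read from left to right."""
    if kmer_length < 1 or stride < 1:
        raise ValueError("kmer_length and stride must be positive")
    # Last index at which a full-length window still fits in the sequence.
    last_start = len(sequence) - kmer_length
    return [
        sequence[start : start + kmer_length]
        for start in range(0, last_start + 1, stride)
    ]


print(sketch_kmers("ATCG"))                 # ['A', 'T', 'C', 'G']
print(sketch_kmers("ATCG", kmer_length=2))  # ['AT', 'TC', 'CG']
```

A vocab for longer k-mers would need one entry per distinct k-mer, so it grows
quickly with the window length.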


%package -n python3-gcgc
Summary:	GCGC is a preprocessing library for biological sequence model development
Provides:	python-gcgc
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-gcgc
GCGC is a tool for feature processing on biological sequences. It is primarily
intended to be used as part of a larger workflow inside Python.


%package help
Summary:	Development documents and examples for gcgc
Provides:	python3-gcgc-doc
%description help
Development documents and examples for gcgc, a tool for feature processing on
biological sequences.


%prep
%autosetup -n gcgc-1.0.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-gcgc -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri Jun 09 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.0-1
- Package Spec generated