%global _empty_manifest_terminate_build 0
Name:		python-instancelib
Version:	0.4.9.1
Release:	1
Summary:	A typed dataset abstraction toolkit for machine learning projects
License:	GNU LGPL v3
URL:		https://pypi.org/project/instancelib/
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/6e/3d/7ee9dccc7fa94386539a2f528f8f2916f5061dbf82cc41a18497345578f7/instancelib-0.4.9.1.tar.gz
BuildArch:	noarch

Requires:	python3-numpy
Requires:	python3-pandas
Requires:	python3-h5py
Requires:	python3-scikit-learn
Requires:	python3-openpyxl
Requires:	python3-xlrd
Requires:	python3-tqdm
Requires:	python3-more-itertools
Requires:	python3-typing-extensions
Requires:	python3-gensim
Requires:	python3-tables

%description
`instancelib` provides a **generic architecture** for datasets.
© Michiel Bron, 2021
## Quick tour
**Load dataset**: Load the dataset into an environment
```python
import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                  data_cols=["fulltext"],
                                  label_cols=["label"])
ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances
ins = ds[20] # Get the instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for instance 20, if any
ins_labels = labels.get_labels(ins) # Get the labels assigned to instance 20
```
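
The access pattern in the tour above (instances looked up by identifier, labels kept in a separate store) can be illustrated with a small, self-contained sketch. This is not instancelib's implementation: `ToyInstance`, `ToyLabels`, and `set_labels` are hypothetical stand-ins invented for illustration, and only the names `data`, `vector`, `labelset`, and `get_labels` mirror the tour.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Sequence, Set


@dataclass(frozen=True)
class ToyInstance:
    """Hypothetical stand-in for an instance: identifier, raw data, optional vector."""
    identifier: int
    data: str
    vector: Optional[Sequence[float]] = None


@dataclass
class ToyLabels:
    """Hypothetical stand-in for a label store, kept separate from the instances."""
    labelset: FrozenSet[str]
    _assigned: Dict[int, Set[str]] = field(default_factory=dict)

    def set_labels(self, ins: ToyInstance, *labels: str) -> None:
        # Reject labels that are not part of the allowed labelset.
        unknown = set(labels) - self.labelset
        if unknown:
            raise ValueError(f"labels not in labelset: {sorted(unknown)}")
        self._assigned.setdefault(ins.identifier, set()).update(labels)

    def get_labels(self, ins: ToyInstance) -> FrozenSet[str]:
        # Unlabeled instances yield an empty set rather than an error.
        return frozenset(self._assigned.get(ins.identifier, set()))


# A plain dict plays the role of the dict-like dataset: identifier -> instance.
ds = {20: ToyInstance(20, "some full text"), 21: ToyInstance(21, "more text")}
labels = ToyLabels(labelset=frozenset({"relevant", "irrelevant"}))
labels.set_labels(ds[20], "relevant")

ins = ds[20]
ins_labels = labels.get_labels(ins)    # frozenset({'relevant'})
unlabeled = labels.get_labels(ds[21])  # empty frozenset: nothing assigned yet
```

Keeping labels outside the instance objects, as sketched here, means the same instances can be paired with different label stores without copying the data.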
**Dataset manipulation**: Split the dataset into a train set and a test set
```python
train, test = text_env.train_test_split(ds, train_size=0.70)
print(20 in train) # May be True or False, because the split is sampled randomly
```
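
Why membership of key `20` is not predictable can be seen in a minimal sketch of a random split over identifier keys. The helper `split_keys` and its `seed` parameter are hypothetical and only illustrate the idea; how `train_test_split` is implemented inside instancelib is not shown here.

```python
import random
from typing import Hashable, Iterable, List, Optional, Tuple


def split_keys(keys: Iterable[Hashable],
               train_size: float,
               seed: Optional[int] = None) -> Tuple[List[Hashable], List[Hashable]]:
    """Shuffle the keys and cut them into a train part and a test part."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    shuffled = list(keys)
    # A dedicated Random object keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


train_keys, test_keys = split_keys(range(100), train_size=0.70, seed=42)
print(len(train_keys), len(test_keys))      # 70 30
print(20 in train_keys or 20 in test_keys)  # True: every key lands in exactly one part
```

Fixing a seed, as in this sketch, is one way to make the outcome of a membership check repeatable between runs.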
**Train a model**:
```python
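from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer
# Bag-of-words counts, reweighted by TF-IDF, fed to a Naive Bayes classifier
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])
model = il.SkLearnDataClassifier.build(pipeline, text_env) # Wrap the pipeline for this environment
model.fit_provider(train, labels) # Fit on the training instances and their labels
predictions = model.predict(test) # Predict labels for the test instances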
```
## Installation
See [installation.md](docs/installation.md) for an extended installation guide.
| Method | Instructions |
|--------|--------------|
| `pip` | Install from [PyPI](https://pypi.org/project/instancelib/) via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .`, or run `python setup.py install` locally. |
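
After either installation method, one way to confirm that the package is visible to the interpreter is to query its metadata. The sketch below uses the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `installed_version` is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("instancelib")
print(found if found is not None else "instancelib is not installed")
```

The distribution name passed in is the one used on PyPI, here `instancelib`.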
## Documentation
Full documentation of the latest version is provided at [https://instancelib.readthedocs.org](https://instancelib.readthedocs.org).
## Example usage
See [usage.py](usage.py) for an example of how the package can be used.
## Releases
`instancelib` is officially released through [PyPI](https://pypi.org/project/instancelib/).
See [CHANGELOG.md](CHANGELOG.md) for a full overview of the changes for each version.
## Citation
```bibtex
@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}
```
## Library usage
This library is used in the following projects:
- [python-allib](https://github.com/mpbron/allib). A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
- [text_explainability](https://marcelrobeer.github.io/text_explainability/). A generic explainability architecture for explaining text machine learning models.
- [text_sensitivity](https://marcelrobeer.github.io/text_sensitivity/). Sensitivity testing (fairness & robustness) for text machine learning models.
## Maintenance
### Contributors
- [Michiel Bron](https://www.uu.nl/staff/MPBron) (`@mpbron`)
### Todo
Tasks yet to be done:
* Implement support for ONNX models
* Implement support for Python DataLoaders
* Make the external dataset interface more user-friendly
* Redesign LabelProvider to support more attribute levels
* Add CI/CD tests

%package -n python3-instancelib
Summary:	A typed dataset abstraction toolkit for machine learning projects
Provides:	python-instancelib
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-instancelib
`instancelib` provides a **generic architecture** for datasets.
© Michiel Bron, 2021

%package help
Summary:	Development documents and examples for instancelib
Provides:	python3-instancelib-doc
%description help
`instancelib` provides a **generic architecture** for datasets.
© Michiel Bron, 2021

%prep
%autosetup -n instancelib-0.4.9.1

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-instancelib -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.4.9.1-1
- Package Spec generated