%global _empty_manifest_terminate_build 0
Name:           python-instancelib
Version:        0.4.9.1
Release:        1
Summary:        A typed dataset abstraction toolkit for machine learning projects
License:        GNU LGPL v3
URL:            https://pypi.org/project/instancelib/
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/6e/3d/7ee9dccc7fa94386539a2f528f8f2916f5061dbf82cc41a18497345578f7/instancelib-0.4.9.1.tar.gz
BuildArch:      noarch

Requires:       python3-numpy
Requires:       python3-pandas
Requires:       python3-h5py
Requires:       python3-scikit-learn
Requires:       python3-openpyxl
Requires:       python3-xlrd
Requires:       python3-tqdm
Requires:       python3-more-itertools
Requires:       python3-typing-extensions
Requires:       python3-gensim
Requires:       python3-tables

%description
`instancelib` provides a **generic architecture** for datasets.

© Michiel Bron, 2021

## Quick tour

**Load dataset**: Load the dataset in an environment

```python
import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                 data_cols=["fulltext"],
                                 label_cols=["label"])

ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances

ins = ds[20] # Get instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for 20 if any
ins_labels = labels.get_labels(ins)
```

**Dataset manipulation**: Divide the dataset into a train and a test set

```python
train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train) # May be true or false, because of random sampling
```

**Train a model**:

```python
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
```

## Installation

See [installation.md](docs/installation.md) for an extended installation guide.

| Method | Instructions |
|--------|--------------|
| `pip` | Install from [PyPI](https://pypi.org/project/instancelib/) via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .` or locally run `python setup.py install`. |

## Documentation

Full documentation of the latest version is provided at [https://instancelib.readthedocs.org](https://instancelib.readthedocs.org).

## Example usage

See [usage.py](usage.py) for an example of how the package can be used.

## Releases

`instancelib` is officially released through [PyPI](https://pypi.org/project/instancelib/).

See [CHANGELOG.md](CHANGELOG.md) for a full overview of the changes for each version.

## Citation

```bibtex
@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}
```

## Library usage

This library is used in the following projects:

- [python-allib](https://github.com/mpbron/allib). A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
- [text_explainability](https://marcelrobeer.github.io/text_explainability/). A generic explainability architecture for explaining text machine learning models.
- [text_sensitivity](https://marcelrobeer.github.io/text_sensitivity/). Sensitivity testing (fairness & robustness) for text machine learning models.

## Maintenance

### Contributors

- [Michiel Bron](https://www.uu.nl/staff/MPBron) (`@mpbron`)

### Todo

Tasks yet to be done:

* Implement support for ONNX models
* Implement support for Python DataLoaders
* Make the external dataset interface more user friendly
* Redesign LabelProvider to support more attribute levels
* CI/CD tests

%package -n python3-instancelib
Summary:        A typed dataset abstraction toolkit for machine learning projects
Provides:       python-instancelib
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-instancelib
`instancelib` provides a **generic architecture** for datasets.

© Michiel Bron, 2021

## Quick tour

**Load dataset**: Load the dataset in an environment

```python
import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                 data_cols=["fulltext"],
                                 label_cols=["label"])

ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances

ins = ds[20] # Get instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for 20 if any
ins_labels = labels.get_labels(ins)
```

**Dataset manipulation**: Divide the dataset into a train and a test set

```python
train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train) # May be true or false, because of random sampling
```

**Train a model**:

```python
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
```

## Installation

See
[installation.md](docs/installation.md) for an extended installation guide.

| Method | Instructions |
|--------|--------------|
| `pip` | Install from [PyPI](https://pypi.org/project/instancelib/) via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .` or locally run `python setup.py install`. |

## Documentation

Full documentation of the latest version is provided at [https://instancelib.readthedocs.org](https://instancelib.readthedocs.org).

## Example usage

See [usage.py](usage.py) for an example of how the package can be used.

## Releases

`instancelib` is officially released through [PyPI](https://pypi.org/project/instancelib/).

See [CHANGELOG.md](CHANGELOG.md) for a full overview of the changes for each version.

## Citation

```bibtex
@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}
```

## Library usage

This library is used in the following projects:

- [python-allib](https://github.com/mpbron/allib). A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
- [text_explainability](https://marcelrobeer.github.io/text_explainability/). A generic explainability architecture for explaining text machine learning models.
- [text_sensitivity](https://marcelrobeer.github.io/text_sensitivity/). Sensitivity testing (fairness & robustness) for text machine learning models.

## Maintenance

### Contributors

- [Michiel Bron](https://www.uu.nl/staff/MPBron) (`@mpbron`)

### Todo

Tasks yet to be done:

* Implement support for ONNX models
* Implement support for Python DataLoaders
* Make the external dataset interface more user friendly
* Redesign LabelProvider to support more attribute levels
* CI/CD tests

%package help
Summary:        Development documents and examples for instancelib
Provides:       python3-instancelib-doc

%description help
`instancelib` provides a **generic architecture** for datasets.

© Michiel Bron, 2021

## Quick tour

**Load dataset**: Load the dataset in an environment

```python
import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                 data_cols=["fulltext"],
                                 label_cols=["label"])

ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances

ins = ds[20] # Get instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for 20 if any
ins_labels = labels.get_labels(ins)
```

**Dataset manipulation**: Divide the dataset into a train and a test set

```python
train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train) # May be true or false, because of random sampling
```

**Train a model**:

```python
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
```

## Installation

See [installation.md](docs/installation.md) for an extended installation guide.

| Method | Instructions |
|--------|--------------|
| `pip` | Install from [PyPI](https://pypi.org/project/instancelib/) via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .` or locally run `python setup.py install`. |

## Documentation

Full documentation of the latest version is provided at [https://instancelib.readthedocs.org](https://instancelib.readthedocs.org).

## Example usage

See [usage.py](usage.py) for an example of how the package can be used.

## Releases

`instancelib` is officially released through [PyPI](https://pypi.org/project/instancelib/).

See [CHANGELOG.md](CHANGELOG.md) for a full overview of the changes for each version.

## Citation

```bibtex
@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}
```

## Library usage

This library is used in the following projects:

- [python-allib](https://github.com/mpbron/allib). A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
- [text_explainability](https://marcelrobeer.github.io/text_explainability/). A generic explainability architecture for explaining text machine learning models.
- [text_sensitivity](https://marcelrobeer.github.io/text_sensitivity/). Sensitivity testing (fairness & robustness) for text machine learning models.

## Maintenance

### Contributors

- [Michiel Bron](https://www.uu.nl/staff/MPBron) (`@mpbron`)

### Todo

Tasks yet to be done:

* Implement support for ONNX models
* Implement support for Python DataLoaders
* Make the external dataset interface more user friendly
* Redesign LabelProvider to support more attribute levels
* CI/CD tests

%prep
%autosetup -n instancelib-0.4.9.1

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-instancelib -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot - 0.4.9.1-1
- Package Spec generated