Diffstat (limited to 'python-pytolemaic.spec')
| -rw-r--r-- | python-pytolemaic.spec | 787 |
1 files changed, 787 insertions, 0 deletions
diff --git a/python-pytolemaic.spec b/python-pytolemaic.spec
new file mode 100644
index 0000000..d33744e
--- /dev/null
+++ b/python-pytolemaic.spec
@@ -0,0 +1,787 @@
+%global _empty_manifest_terminate_build 0
+Name: python-pytolemaic
+Version: 0.15.4
+Release: 1
+Summary: Package for ML model analysis
+License: Free To Use But Restricted
+URL: https://github.com/broundal/Pytolemaic
+Source0: https://mirrors.aliyun.com/pypi/web/packages/c1/7c/697a8443642d286a28b81d4b0790a22f249faac224927ae533c92caf9ae3/pytolemaic-0.15.4.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-pandas
+Requires: python3-scipy
+Requires: python3-scikit-learn
+Requires: python3-lime
+Requires: python3-matplotlib
+Requires: python3-xgboost
+
+%description
+# Pytolemaic
+
+## What is Pytolemaic
+The Pytolemaic package analyzes your model and dataset and measures their quality.
+
+The package supports classification/regression models built for tabular datasets (e.g. sklearn's regressors/classifiers),
+but will also support custom-made models as long as they implement sklearn's API.
+
+The package is intended for personal use and comes with no guarantees.
+I hope you will find it useful. I will appreciate any feedback you have.
+
+## Install
+```
+pip install pytolemaic
+```
+
+## Basic usage
+The estimator and data splits below are whatever you trained and split yourself; a minimal
+setup sketch is shown after the feature list.
+```
+from pytolemaic import PyTrust
+
+pytrust = PyTrust(model=estimator,
+                  xtrain=xtrain, ytrain=ytrain,
+                  xtest=xtest, ytest=ytest)
+
+# run all analyses and print insights
+insights = pytrust.insights()
+print("\n".join(insights))
+
+# run analyses and plot graphs
+pytrust.plot()
+```
+
+## Supported features
+The package contains the following functionalities:
+
+#### On model creation
+- **Dataset Analysis**: Analysis aimed at detecting issues in the dataset.
+- **Sensitivity Analysis**: Calculation of feature importance for a given model, either via sensitivity to feature values or sensitivity to missing values.
+- **Vulnerability report**: Based on the feature sensitivity, we measure the model's vulnerability with respect to imputation, leakage, and the number of features.
+- **Scoring report**: Reports the model's score on test data with a confidence interval.
+- **Separation quality**: Measures whether the train and test data come from the same distribution.
+- **Overall quality**: Provides overall quality measures.
+
+#### On prediction
+- **Prediction uncertainty**: Provides an uncertainty measure for a given model's prediction.
+- **Lime explanation**: Provides a Lime explanation for a sample of interest.
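+
+As referenced above, here is a minimal setup sketch for the Basic usage snippet. It is
+illustrative only and not part of the package: it assumes scikit-learn's iris toy dataset
+and a RandomForestClassifier, but any sklearn-compatible estimator and numpy train/test
+splits work the same way.
+```
+from sklearn.datasets import load_iris
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import train_test_split
+
+from pytolemaic import PyTrust
+
+# toy data and a simple sklearn-compatible estimator (illustrative choices)
+x, y = load_iris(return_X_y=True)
+xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3, random_state=0)
+
+estimator = RandomForestClassifier(n_estimators=100, random_state=0)
+estimator.fit(xtrain, ytrain)
+
+# hand the fitted model and the splits to PyTrust, as in the Basic usage snippet above
+pytrust = PyTrust(model=estimator,
+                  xtrain=xtrain, ytrain=ytrain,
+                  xtest=xtest, ytest=ytest)
+print("\n".join(pytrust.insights()))
+```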
+
+## How to use:
+
+Get started by calling the help() function (*Recommended!*):
+```
+    from pytolemaic import help
+    supported_keys = help()
+    # or
+    help(key='basic usage')
+```
+
+Example of performing all available analyses with PyTrust:
+```
+    from pytolemaic import PyTrust
+
+    pytrust = PyTrust(
+        model=estimator,
+        xtrain=xtrain, ytrain=ytrain,
+        xtest=xtest, ytest=ytest)
+
+    # run all analyses and get a list of distilled insights
+    insights = pytrust.insights()
+    print("\n".join(insights))
+
+    # run all analyses and plot all graphs
+    pytrust.plot()
+
+    # print all data gathered ('report' is any of the report objects shown in the next snippet)
+    from pprint import pprint
+    pprint(report.to_dict(printable=True))
+```
+
+If you need only a specific analysis (usually to save time):
+```
+    # dataset analysis report
+    dataset_analysis_report = pytrust.dataset_analysis_report
+
+    # feature sensitivity report
+    sensitivity_report = pytrust.sensitivity_report
+
+    # model's performance report
+    scoring_report = pytrust.scoring_report
+
+    # overall model quality report
+    quality_report = pytrust.quality_report
+
+    # with any of the above reports
+    report = <desired report>
+    print("\n".join(report.insights()))
+
+    report.plot()                           # plot graphs
+    pprint(report.to_dict(printable=True))  # export the report as a dictionary
+    pprint(report.to_dict_meaning())        # print documentation for the above dictionary
+```
+
+Analysis of predictions:
+```
+    # estimate the uncertainty of a prediction
+    uncertainty_model = pytrust.create_uncertainty_model()
+
+    # explain a prediction with Lime
+    lime_explainer = pytrust.create_lime_explainer()
+```
+
+Examples on toy datasets can be found in [/examples/toy_examples/](./examples/toy_examples/).
+Examples on 'real-life' datasets can be found in [/examples/interesting_examples/](./examples/interesting_examples/).
+
+## Output examples:
+
+#### Sensitivity Analysis:
+
+ - The sensitivity of each feature (\[0,1\], normalized to a sum of 1):
+
+```
+    'sensitivity_report': {
+        'method': 'shuffled',
+        'sensitivities': {
+            'age': 0.12395,
+            'capital-gain': 0.06725,
+            'capital-loss': 0.02465,
+            'education': 0.05769,
+            'education-num': 0.13765,
+            ...
+        }
+    }
+```
+
+ - Simple statistics on the feature sensitivity:
+```
+    'shuffle_stats_report': {
+        'n_features': 14,
+        'n_low': 1,
+        'n_zero': 0
+    }
+```
+
+ - Naive vulnerability scores (\[0,1\], lower is better):
+
+   - **Imputation**: sensitivity of the model to missing values.
+   - **Leakage**: chance of the model having leaking features.
+   - **Too many features**: whether the model is based on too many features.
+
+```
+    'vulnerability_report': {
+        'imputation': 0.35,
+        'leakage': 0,
+        'too_many_features': 0.14
+    }
+```
+
+#### Scoring report
+
+For a given metric, the score and confidence interval (CI) are calculated:
+```
+'recall': {
+    'ci_high': 0.763,
+    'ci_low': 0.758,
+    'ci_ratio': 0.023,
+    'metric': 'recall',
+    'value': 0.760,
+},
+'auc': {
+    'ci_high': 0.909,
+    'ci_low': 0.907,
+    'ci_ratio': 0.022,
+    'metric': 'auc',
+    'value': 0.907
+}
+```
+
+Additionally, the separation quality measures the quality of the score based on the separability (AUC score) between the train and test sets.
+
+A value of 1 means the test set has the same distribution as the train set; a value of 0 means the test set has a fundamentally different distribution.
+```
+    'separation_quality': 0.00611
+```
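+
+The separation quality above is described as an AUC-based separability measure between the
+train and test sets. As a rough illustration of that idea (not pytolemaic's actual
+implementation or formula), one could train a "domain classifier" to tell train rows from
+test rows; the helper name and the mapping from AUC to a quality in \[0,1\] below are
+assumptions made for this sketch:
+```
+import numpy as np
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import cross_val_score
+
+def separation_quality_sketch(xtrain, xtest):
+    # hypothetical helper, for illustration only
+    x = np.vstack([xtrain, xtest])
+    origin = np.concatenate([np.zeros(len(xtrain)), np.ones(len(xtest))])  # 0 = train, 1 = test
+
+    # if a classifier cannot distinguish train from test rows (AUC ~ 0.5),
+    # the two sets look alike and the separation quality should be close to 1
+    clf = RandomForestClassifier(n_estimators=100, random_state=0)
+    auc = cross_val_score(clf, x, origin, cv=5, scoring='roc_auc').mean()
+
+    # assumed mapping for this sketch: AUC 0.5 -> quality 1.0, AUC 1.0 -> quality 0.0
+    return float(np.clip(2.0 * (1.0 - auc), 0.0, 1.0))
+```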
+
+Combining the above measures into a single number, we provide the overall quality of the model/dataset.
+
+A higher quality value (\[0,1\]) means a better dataset/model.
+```
+quality_report : {
+    'model_quality_report': {
+        'model_loss': 0.24,
+        'model_quality': 0.41,
+        'vulnerability_report': {...}},
+
+    'test_quality_report': {
+        'ci_ratio': 0.023,
+        'separation_quality': 0.006,
+        'test_set_quality': 0},
+
+    'train_quality_report': {
+        'train_set_quality': 0.85,
+        'vulnerability_report': {...}}
+}
+```
+
+#### Prediction uncertainty
+
+The module can be used to yield an uncertainty measure for predictions.
+```
+    uncertainty_model = pytrust.create_uncertainty_model(method='confidence')
+    predictions = uncertainty_model.predict(x_pred)      # same as model.predict(x_pred)
+    uncertainty = uncertainty_model.uncertainty(x_pred)  # uncertainty measure per prediction
+```
+
+#### Lime explanation
+
+The module can be used to produce Lime explanations for a sample of interest.
+```
+    explainer = pytrust.create_lime_explainer()
+    explainer.explain(sample)  # returns a dictionary
+    explainer.plot(sample)     # produces a graphical explanation
+```
+
+%package -n python3-pytolemaic
+Summary: Package for ML model analysis
+Provides: python-pytolemaic
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-pytolemaic
+Pytolemaic analyzes classification/regression models built for tabular datasets
+(e.g. sklearn's regressors/classifiers) and measures the quality of both the model
+and the dataset. See the main package description above for the full feature list
+and usage examples.
+
+%package help
+Summary: Development documents and examples for pytolemaic
+Provides: python3-pytolemaic-doc
+%description help
+Development documents and examples for pytolemaic. See the main package description
+above for an overview of the package and its usage.
+
+%prep
+%autosetup -n pytolemaic-0.15.4
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pytolemaic -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri Jun 09 2023 Python_Bot <Python_Bot@openeuler.org> - 0.15.4-1
+- Package Spec generated
