%global _empty_manifest_terminate_build 0

Name:           python-pytolemaic
Version:        0.15.4
Release:        1
Summary:        Package for ML model analysis
License:        Free To Use But Restricted
URL:            https://github.com/broundal/Pytolemaic
Source0:        https://mirrors.aliyun.com/pypi/web/packages/c1/7c/697a8443642d286a28b81d4b0790a22f249faac224927ae533c92caf9ae3/pytolemaic-0.15.4.tar.gz
BuildArch:      noarch

Requires:       python3-numpy
Requires:       python3-pandas
Requires:       python3-scipy
Requires:       python3-scikit-learn
Requires:       python3-lime
Requires:       python3-matplotlib
Requires:       python3-xgboost

%description
![PyPI - Version](https://img.shields.io/pypi/v/pytolemaic?color=brightgreen) ![Unittests](https://github.com/Broundal/Pytolemaic/workflows/Unittests/badge.svg?branch=master) ![PyPI - License](https://img.shields.io/pypi/l/pytolemaic?color=orange)

# Pytolemaic

## What is Pytolemaic

The Pytolemaic package analyzes your model and dataset and measures their quality.

The package supports classification/regression models built for tabular datasets (e.g. sklearn's regressors/classifiers), and will also support custom-made models as long as they implement sklearn's API.

The package is intended for personal use and comes with no guarantees. I hope you will find it useful; I will appreciate any feedback you have.

## Install

```
pip install pytolemaic
```

## Basic usage

```
from pytolemaic import PyTrust

pytrust = PyTrust(model=estimator,
                  xtrain=xtrain, ytrain=ytrain,
                  xtest=xtest, ytest=ytest)

# run all analyses and print insights
insights = pytrust.insights()
print("\n".join(insights))

# run all analyses and plot graphs
pytrust.plot()
```

## Supported features

The package contains the following functionalities:

#### On model creation

- **Dataset analysis**: Analysis aimed at detecting issues in the dataset.
- **Sensitivity analysis**: Calculation of feature importance for a given model, either via sensitivity to feature value or via sensitivity to missing values.
- **Vulnerability report**: Based on the feature sensitivity, measures the model's vulnerability with respect to imputation, leakage, and number of features.
- **Scoring report**: Reports the model's score on the test data with a confidence interval.
- **Separation quality**: Measures whether the train and test data come from the same distribution.
- **Overall quality**: Provides overall quality measures.

#### On prediction

- **Prediction uncertainty**: Provides an uncertainty measure for a given model's prediction.
- **Lime explanation**: Provides a Lime explanation for a sample of interest.
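For a fully self-contained starting point, here is a minimal sketch on a toy sklearn dataset (the dataset and estimator choices are illustrative; any estimator implementing sklearn's fit/predict API should work):

```
# minimal end-to-end sketch on a toy tabular dataset
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from pytolemaic import PyTrust

x, y = load_breast_cancer(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=0)

# fit any sklearn-API estimator on the train split
estimator = RandomForestClassifier(random_state=0).fit(xtrain, ytrain)

pytrust = PyTrust(model=estimator,
                  xtrain=xtrain, ytrain=ytrain,
                  xtest=xtest, ytest=ytest)

print("\n".join(pytrust.insights()))
```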
## How to use

Get started by calling the help() function (*recommended!*):

```
from pytolemaic import help

supported_keys = help()
# or
help(key='basic usage')
```

Example of performing all available analyses with PyTrust:

```
from pytolemaic import PyTrust

pytrust = PyTrust(model=estimator,
                  xtrain=xtrain, ytrain=ytrain,
                  xtest=xtest, ytest=ytest)

# run all analyses and get a list of distilled insights
insights = pytrust.insights()
print("\n".join(insights))

# run all analyses and plot all graphs
pytrust.plot()

# print all data gathered (`report` is any of the report objects listed below)
from pprint import pprint
pprint(report.to_dict(printable=True))
```

If you need only a specific analysis (usually to save time):

```
# dataset analysis report
dataset_analysis_report = pytrust.dataset_analysis_report

# feature sensitivity report
sensitivity_report = pytrust.sensitivity_report

# model's performance report
scoring_report = pytrust.scoring_report

# overall model's quality report
quality_report = pytrust.quality_report

# with any of the above reports:
print("\n".join(report.insights()))
report.plot()                           # plot graphs
pprint(report.to_dict(printable=True))  # export report as a dictionary
pprint(report.to_dict_meaning())        # print documentation for the above dictionary
```

Analysis of predictions:

```
# estimate the uncertainty of a prediction
uncertainty_model = pytrust.create_uncertainty_model()

# explain a prediction with Lime
lime_explainer = pytrust.create_lime_explainer()
```

Examples on toy datasets can be found in [/examples/toy_examples/](./examples/toy_examples/).
Examples on 'real-life' datasets can be found in [/examples/interesting_examples/](./examples/interesting_examples/).

## Output examples

#### Sensitivity analysis

- The sensitivity of each feature (\[0,1\], normalized to a sum of 1):

```
'sensitivity_report': {
    'method': 'shuffled',
    'sensitivities': {
        'age': 0.12395,
        'capital-gain': 0.06725,
        'capital-loss': 0.02465,
        'education': 0.05769,
        'education-num': 0.13765,
        ...
    }
}
```

- Simple statistics on the feature sensitivity:

```
'shuffle_stats_report': {
    'n_features': 14,
    'n_low': 1,
    'n_zero': 0
}
```

- Naive vulnerability scores (\[0,1\], lower is better):
  - **Imputation**: sensitivity of the model to missing values.
  - **Leakage**: chance of the model having leaking features.
  - **Too many features**: whether the model is based on too many features.

```
'vulnerability_report': {
    'imputation': 0.35,
    'leakage': 0,
    'too_many_features': 0.14
}
```

#### Scoring report

For each given metric, the score and its confidence interval (CI) are calculated:

```
'recall': {
    'ci_high': 0.763,
    'ci_low': 0.758,
    'ci_ratio': 0.023,
    'metric': 'recall',
    'value': 0.760
},
'auc': {
    'ci_high': 0.909,
    'ci_low': 0.907,
    'ci_ratio': 0.022,
    'metric': 'auc',
    'value': 0.907
}
```

Additionally, the separation quality measures the quality of the score based on the separability (AUC score) between the train and test sets. A value of 1 means the test set has the same distribution as the train set; a value of 0 means the test set has a fundamentally different distribution.

```
'separation_quality': 0.00611
```
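Train/test separability of this kind is commonly estimated via "adversarial validation". A minimal sketch of the idea (illustration only; not Pytolemaic's exact implementation, and the AUC-to-quality mapping shown is just one possible convention):

```
# Illustration: label train rows 0 and test rows 1, then check how well a
# classifier tells them apart. AUC near 0.5 means the sets are indistinguishable.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def separability_auc(xtrain, xtest):
    x = np.vstack([xtrain, xtest])
    y = np.concatenate([np.zeros(len(xtrain)), np.ones(len(xtest))])
    clf = RandomForestClassifier(random_state=0)
    return cross_val_score(clf, x, y, cv=5, scoring='roc_auc').mean()

auc = separability_auc(xtrain, xtest)
quality = 1.0 - 2.0 * abs(auc - 0.5)  # 1.0: indistinguishable, 0.0: fully separable
```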
Combining the above measures into a single number, we provide the overall quality of the model/dataset. A higher quality value (\[0,1\]) means a better dataset/model.

```
'quality_report': {
    'model_quality_report': {
        'model_loss': 0.24,
        'model_quality': 0.41,
        'vulnerability_report': {...}},
    'test_quality_report': {
        'ci_ratio': 0.023,
        'separation_quality': 0.006,
        'test_set_quality': 0},
    'train_quality_report': {
        'train_set_quality': 0.85,
        'vulnerability_report': {...}}
}
```

#### Prediction uncertainty

The module can be used to yield an uncertainty measure for predictions:

```
uncertainty_model = pytrust.create_uncertainty_model(method='confidence')
predictions = uncertainty_model.predict(x_pred)      # same as model.predict(x_pred)
uncertainty = uncertainty_model.uncertainty(x_pred)
```

#### Lime explanation

The module can be used to produce Lime explanations for a sample of interest:

```
explainer = pytrust.create_lime_explainer()
explainer.explain(sample)  # returns a dictionary
explainer.plot(sample)     # produces a graphical explanation
```

%package -n python3-pytolemaic
Summary:        Package for ML model analysis
Provides:       python-pytolemaic
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-pytolemaic
![PyPI - Version](https://img.shields.io/pypi/v/pytolemaic?color=brightgreen) ![Unittests](https://github.com/Broundal/Pytolemaic/workflows/Unittests/badge.svg?branch=master) ![PyPI - License](https://img.shields.io/pypi/l/pytolemaic?color=orange)

# Pytolemaic

## What is Pytolemaic

The Pytolemaic package analyzes your model and dataset and measures their quality.

The package supports classification/regression models built for tabular datasets (e.g. sklearn's regressors/classifiers), and will also support custom-made models as long as they implement sklearn's API.

The package is intended for personal use and comes with no guarantees. I hope you will find it useful; I will appreciate any feedback you have.

## Install

```
pip install pytolemaic
```

## Basic usage

```
from pytolemaic import PyTrust

pytrust = PyTrust(model=estimator,
                  xtrain=xtrain, ytrain=ytrain,
                  xtest=xtest, ytest=ytest)

# run all analyses and print insights
insights = pytrust.insights()
print("\n".join(insights))

# run all analyses and plot graphs
pytrust.plot()
```

## Supported features

The package contains the following functionalities:

#### On model creation

- **Dataset analysis**: Analysis aimed at detecting issues in the dataset.
- **Sensitivity analysis**: Calculation of feature importance for a given model, either via sensitivity to feature value or via sensitivity to missing values.
- **Vulnerability report**: Based on the feature sensitivity, measures the model's vulnerability with respect to imputation, leakage, and number of features.
- **Scoring report**: Reports the model's score on the test data with a confidence interval.
- **Separation quality**: Measures whether the train and test data come from the same distribution.
- **Overall quality**: Provides overall quality measures.

#### On prediction

- **Prediction uncertainty**: Provides an uncertainty measure for a given model's prediction (see the sketch after this list).
- **Lime explanation**: Provides a Lime explanation for a sample of interest.
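As a quick preview of the prediction-time features, a typical flow looks like this, using only the API documented in the sections below (`pytrust` is assumed to be already constructed; `x_pred` stands for a batch of new samples):

```
# preview of the prediction-time workflow (detailed below)
uncertainty_model = pytrust.create_uncertainty_model()
predictions = uncertainty_model.predict(x_pred)      # same predictions as the base model
uncertainty = uncertainty_model.uncertainty(x_pred)  # per-sample uncertainty estimate

lime_explainer = pytrust.create_lime_explainer()
explanation = lime_explainer.explain(x_pred[0])      # dictionary explaining one sample
```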
## How to use

Get started by calling the help() function (*recommended!*):

```
from pytolemaic import help

supported_keys = help()
# or
help(key='basic usage')
```

Example of performing all available analyses with PyTrust:

```
from pytolemaic import PyTrust

pytrust = PyTrust(model=estimator,
                  xtrain=xtrain, ytrain=ytrain,
                  xtest=xtest, ytest=ytest)

# run all analyses and get a list of distilled insights
insights = pytrust.insights()
print("\n".join(insights))

# run all analyses and plot all graphs
pytrust.plot()

# print all data gathered (`report` is any of the report objects listed below)
from pprint import pprint
pprint(report.to_dict(printable=True))
```

If you need only a specific analysis (usually to save time):

```
# dataset analysis report
dataset_analysis_report = pytrust.dataset_analysis_report

# feature sensitivity report
sensitivity_report = pytrust.sensitivity_report

# model's performance report
scoring_report = pytrust.scoring_report

# overall model's quality report
quality_report = pytrust.quality_report

# with any of the above reports:
print("\n".join(report.insights()))
report.plot()                           # plot graphs
pprint(report.to_dict(printable=True))  # export report as a dictionary
pprint(report.to_dict_meaning())        # print documentation for the above dictionary
```

Analysis of predictions:

```
# estimate the uncertainty of a prediction
uncertainty_model = pytrust.create_uncertainty_model()

# explain a prediction with Lime
lime_explainer = pytrust.create_lime_explainer()
```

Examples on toy datasets can be found in [/examples/toy_examples/](./examples/toy_examples/).
Examples on 'real-life' datasets can be found in [/examples/interesting_examples/](./examples/interesting_examples/).

## Output examples

#### Sensitivity analysis

- The sensitivity of each feature (\[0,1\], normalized to a sum of 1):

```
'sensitivity_report': {
    'method': 'shuffled',
    'sensitivities': {
        'age': 0.12395,
        'capital-gain': 0.06725,
        'capital-loss': 0.02465,
        'education': 0.05769,
        'education-num': 0.13765,
        ...
    }
}
```

- Simple statistics on the feature sensitivity:

```
'shuffle_stats_report': {
    'n_features': 14,
    'n_low': 1,
    'n_zero': 0
}
```

- Naive vulnerability scores (\[0,1\], lower is better):
  - **Imputation**: sensitivity of the model to missing values.
  - **Leakage**: chance of the model having leaking features.
  - **Too many features**: whether the model is based on too many features.

```
'vulnerability_report': {
    'imputation': 0.35,
    'leakage': 0,
    'too_many_features': 0.14
}
```

#### Scoring report

For each given metric, the score and its confidence interval (CI) are calculated:

```
'recall': {
    'ci_high': 0.763,
    'ci_low': 0.758,
    'ci_ratio': 0.023,
    'metric': 'recall',
    'value': 0.760
},
'auc': {
    'ci_high': 0.909,
    'ci_low': 0.907,
    'ci_ratio': 0.022,
    'metric': 'auc',
    'value': 0.907
}
```
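One standard way to obtain such confidence intervals is bootstrapping over the test set. A minimal sketch (illustration only; Pytolemaic's exact method may differ, and `ci_ratio` is the package's own CI-width measure):

```
# Illustration: a 95% bootstrap confidence interval for recall
import numpy as np
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
y_pred = estimator.predict(xtest)

scores = []
for _ in range(1000):
    idx = rng.integers(0, len(ytest), size=len(ytest))  # resample with replacement
    scores.append(recall_score(ytest[idx], y_pred[idx]))

ci_low, ci_high = np.percentile(scores, [2.5, 97.5])
```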
Additionally, the separation quality measures the quality of the score based on the separability (AUC score) between the train and test sets. A value of 1 means the test set has the same distribution as the train set; a value of 0 means the test set has a fundamentally different distribution.

```
'separation_quality': 0.00611
```

Combining the above measures into a single number, we provide the overall quality of the model/dataset. A higher quality value (\[0,1\]) means a better dataset/model.

```
'quality_report': {
    'model_quality_report': {
        'model_loss': 0.24,
        'model_quality': 0.41,
        'vulnerability_report': {...}},
    'test_quality_report': {
        'ci_ratio': 0.023,
        'separation_quality': 0.006,
        'test_set_quality': 0},
    'train_quality_report': {
        'train_set_quality': 0.85,
        'vulnerability_report': {...}}
}
```

#### Prediction uncertainty

The module can be used to yield an uncertainty measure for predictions:

```
uncertainty_model = pytrust.create_uncertainty_model(method='confidence')
predictions = uncertainty_model.predict(x_pred)      # same as model.predict(x_pred)
uncertainty = uncertainty_model.uncertainty(x_pred)
```

#### Lime explanation

The module can be used to produce Lime explanations for a sample of interest:

```
explainer = pytrust.create_lime_explainer()
explainer.explain(sample)  # returns a dictionary
explainer.plot(sample)     # produces a graphical explanation
```

%package help
Summary:        Development documents and examples for pytolemaic
Provides:       python3-pytolemaic-doc

%description help
![PyPI - Version](https://img.shields.io/pypi/v/pytolemaic?color=brightgreen) ![Unittests](https://github.com/Broundal/Pytolemaic/workflows/Unittests/badge.svg?branch=master) ![PyPI - License](https://img.shields.io/pypi/l/pytolemaic?color=orange)

# Pytolemaic

## What is Pytolemaic

The Pytolemaic package analyzes your model and dataset and measures their quality.

The package supports classification/regression models built for tabular datasets (e.g. sklearn's regressors/classifiers), and will also support custom-made models as long as they implement sklearn's API.

The package is intended for personal use and comes with no guarantees. I hope you will find it useful; I will appreciate any feedback you have.

## Install

```
pip install pytolemaic
```

## Basic usage

```
from pytolemaic import PyTrust

pytrust = PyTrust(model=estimator,
                  xtrain=xtrain, ytrain=ytrain,
                  xtest=xtest, ytest=ytest)

# run all analyses and print insights
insights = pytrust.insights()
print("\n".join(insights))

# run all analyses and plot graphs
pytrust.plot()
```

## Supported features

The package contains the following functionalities:

#### On model creation

- **Dataset analysis**: Analysis aimed at detecting issues in the dataset.
- **Sensitivity analysis**: Calculation of feature importance for a given model, either via sensitivity to feature value or via sensitivity to missing values (sketched below).
- **Vulnerability report**: Based on the feature sensitivity, measures the model's vulnerability with respect to imputation, leakage, and number of features.
- **Scoring report**: Reports the model's score on the test data with a confidence interval.
- **Separation quality**: Measures whether the train and test data come from the same distribution.
- **Overall quality**: Provides overall quality measures.

#### On prediction

- **Prediction uncertainty**: Provides an uncertainty measure for a given model's prediction.
- **Lime explanation**: Provides a Lime explanation for a sample of interest.
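The 'shuffled' sensitivity mentioned in the model-creation list above is conceptually similar to permutation importance. A minimal sketch of the idea (illustration only; not the package's actual implementation):

```
# Illustration: shuffle one feature at a time, measure the score drop,
# then normalize so the sensitivities sum to 1.
import numpy as np

def shuffled_sensitivity(estimator, xtest, ytest, seed=0):
    rng = np.random.default_rng(seed)
    base = estimator.score(xtest, ytest)
    drops = []
    for j in range(xtest.shape[1]):
        x_shuffled = xtest.copy()
        x_shuffled[:, j] = rng.permutation(x_shuffled[:, j])  # break feature j
        drops.append(max(base - estimator.score(x_shuffled, ytest), 0.0))
    drops = np.asarray(drops)
    return drops / drops.sum() if drops.sum() > 0 else drops
```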
## How to use

Get started by calling the help() function (*recommended!*):

```
from pytolemaic import help

supported_keys = help()
# or
help(key='basic usage')
```

Example of performing all available analyses with PyTrust:

```
from pytolemaic import PyTrust

pytrust = PyTrust(model=estimator,
                  xtrain=xtrain, ytrain=ytrain,
                  xtest=xtest, ytest=ytest)

# run all analyses and get a list of distilled insights
insights = pytrust.insights()
print("\n".join(insights))

# run all analyses and plot all graphs
pytrust.plot()

# print all data gathered (`report` is any of the report objects listed below)
from pprint import pprint
pprint(report.to_dict(printable=True))
```

If you need only a specific analysis (usually to save time):

```
# dataset analysis report
dataset_analysis_report = pytrust.dataset_analysis_report

# feature sensitivity report
sensitivity_report = pytrust.sensitivity_report

# model's performance report
scoring_report = pytrust.scoring_report

# overall model's quality report
quality_report = pytrust.quality_report

# with any of the above reports:
print("\n".join(report.insights()))
report.plot()                           # plot graphs
pprint(report.to_dict(printable=True))  # export report as a dictionary
pprint(report.to_dict_meaning())        # print documentation for the above dictionary
```

Analysis of predictions:

```
# estimate the uncertainty of a prediction
uncertainty_model = pytrust.create_uncertainty_model()

# explain a prediction with Lime
lime_explainer = pytrust.create_lime_explainer()
```

Examples on toy datasets can be found in [/examples/toy_examples/](./examples/toy_examples/).
Examples on 'real-life' datasets can be found in [/examples/interesting_examples/](./examples/interesting_examples/).

## Output examples

#### Sensitivity analysis

- The sensitivity of each feature (\[0,1\], normalized to a sum of 1):

```
'sensitivity_report': {
    'method': 'shuffled',
    'sensitivities': {
        'age': 0.12395,
        'capital-gain': 0.06725,
        'capital-loss': 0.02465,
        'education': 0.05769,
        'education-num': 0.13765,
        ...
    }
}
```

- Simple statistics on the feature sensitivity:

```
'shuffle_stats_report': {
    'n_features': 14,
    'n_low': 1,
    'n_zero': 0
}
```

- Naive vulnerability scores (\[0,1\], lower is better):
  - **Imputation**: sensitivity of the model to missing values.
  - **Leakage**: chance of the model having leaking features.
  - **Too many features**: whether the model is based on too many features.

```
'vulnerability_report': {
    'imputation': 0.35,
    'leakage': 0,
    'too_many_features': 0.14
}
```
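To build intuition for the imputation score above: one naive way to probe a model's vulnerability to missing values is to impute each feature with a constant and measure the score drop. A minimal sketch (illustration only; not Pytolemaic's actual implementation):

```
# Illustration: replace each feature with its train-set mean and
# measure how much the model's test score degrades.
import numpy as np

def imputation_sensitivity(estimator, xtrain, xtest, ytest):
    base = estimator.score(xtest, ytest)
    drops = []
    for j in range(xtest.shape[1]):
        x_imputed = xtest.copy()
        x_imputed[:, j] = xtrain[:, j].mean()  # naive mean imputation
        drops.append(max(base - estimator.score(x_imputed, ytest), 0.0))
    return np.asarray(drops)
```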
#### Scoring report

For each given metric, the score and its confidence interval (CI) are calculated:

```
'recall': {
    'ci_high': 0.763,
    'ci_low': 0.758,
    'ci_ratio': 0.023,
    'metric': 'recall',
    'value': 0.760
},
'auc': {
    'ci_high': 0.909,
    'ci_low': 0.907,
    'ci_ratio': 0.022,
    'metric': 'auc',
    'value': 0.907
}
```

Additionally, the separation quality measures the quality of the score based on the separability (AUC score) between the train and test sets. A value of 1 means the test set has the same distribution as the train set; a value of 0 means the test set has a fundamentally different distribution.

```
'separation_quality': 0.00611
```

Combining the above measures into a single number, we provide the overall quality of the model/dataset. A higher quality value (\[0,1\]) means a better dataset/model.

```
'quality_report': {
    'model_quality_report': {
        'model_loss': 0.24,
        'model_quality': 0.41,
        'vulnerability_report': {...}},
    'test_quality_report': {
        'ci_ratio': 0.023,
        'separation_quality': 0.006,
        'test_set_quality': 0},
    'train_quality_report': {
        'train_set_quality': 0.85,
        'vulnerability_report': {...}}
}
```

#### Prediction uncertainty

The module can be used to yield an uncertainty measure for predictions:

```
uncertainty_model = pytrust.create_uncertainty_model(method='confidence')
predictions = uncertainty_model.predict(x_pred)      # same as model.predict(x_pred)
uncertainty = uncertainty_model.uncertainty(x_pred)
```

#### Lime explanation

The module can be used to produce Lime explanations for a sample of interest:

```
explainer = pytrust.create_lime_explainer()
explainer.explain(sample)  # returns a dictionary
explainer.plot(sample)     # produces a graphical explanation
```

%prep
%autosetup -n pytolemaic-0.15.4

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-pytolemaic -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri Jun 09 2023 Python_Bot <Python_Bot@openeuler.org> - 0.15.4-1
- Package Spec generated