%global _empty_manifest_terminate_build 0
Name:		python-monotonic-binning
Version:	0.0.1
Release:	1
Summary:	Monotonic Variable Binning by WOE
License:	MIT License
URL:		https://github.com/jstephenj14/Monotonic-WOE-Binning-Algorithm
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/c8/3f/af5dfe5546d0be72a528f3f8a174a9b21a88f6fd701f82de84c14ecc7928/monotonic_binning-0.0.1.tar.gz
BuildArch:	noarch


%description
# Monotonic-WOE-Binning-Algorithm

_This algorithm is based on the excellent paper by Mironchyk and Tchistiakov (2017), "Monotone optimal binning algorithm for credit risk modeling"._

### How to use

1. Install monotonic_binning: `pip install -i https://test.pypi.org/simple/ monotonic-binning`
2. Import monotonic_woe_binning: `from monotonic_binning import monotonic_woe_binning as bin`
3. Use `fit` on the training dataset and `transform` on the test dataset to bin variables.

### Demo Run Details

The `demo_run.py` file available under `tests/` uses German credit card data from [Penn State's online course](https://online.stat.psu.edu/stat508/resource/analysis/gcd) and gives an overview of how to use the package.

### Summary of Monotonic WOE 

The weight-of-evidence (WOE) method of evaluating the strength of predictors is underrated in the field of analytics.
While it is standard fare in credit risk modelling, it is under-utilized in other settings, even though its formulation is
generic enough for use in other domains. The WOE method primarily aims to bin variables into buckets that deliver the most
information to a potential classification model. WOE binning methods commonly measure the effectiveness of such bins using
Information Value (IV). For a more detailed introduction to WOE and IV, [this article](http://ucanalytics.com/blogs/information-value-and-weight-of-evidencebanking-case/)
is a useful read.
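The per-bin WOE and the resulting IV can be computed directly from bin counts. The sketch below is illustrative only and is not this package's API: the function name `woe_iv` is hypothetical, and it assumes the common convention WOE = ln(share of non-defaulters / share of defaulters), under which a higher WOE means a lower default rate. It also assumes every bin contains at least one defaulter and one non-defaulter, since otherwise the logarithm is undefined.

```python
import math


def woe_iv(goods, bads):
    """Return per-bin WOE values and the total Information Value (IV).

    goods[i] and bads[i] are the counts of non-defaulters and defaulters in bin i.
    Assumed convention: WOE_i = ln(share of all goods in bin i / share of all bads in bin i),
    so a higher WOE means a lower default rate.
    """
    total_good = sum(goods)
    total_bad = sum(bads)
    woes = []
    iv = 0.0
    for g, b in zip(goods, bads):
        dist_good = g / total_good
        dist_bad = b / total_bad
        # Raises an error if a bin has no goods or no bads.
        woe = math.log(dist_good / dist_bad)
        woes.append(woe)
        # Each bin adds (share of goods - share of bads) * WOE, which is never negative.
        iv += (dist_good - dist_bad) * woe
    return woes, iv


woes, iv = woe_iv([100, 200, 300], [50, 40, 30])  # made-up counts
print([round(w, 3) for w in woes], round(iv, 3))
```

With these made-up counts the WOE values rise from about -0.92 to 0 to about 0.69 across the three bins, and the IV is roughly 0.40.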

In the world of credit risk modelling, regulatory oversight often requires that the variables that go into models
are split into bins that

- have weight of evidence (WOE) values with a monotonic relationship to the 1/0 target variable (for example, loan default vs. no default),
- are reasonably sized and large enough to be representative of population segments, and
- maximize the IV of the given variable in the process.
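The first two requirements can be checked mechanically for a candidate set of bins ordered by the variable. In this hypothetical helper, `satisfies_constraints` and its `min_share` threshold (the minimum fraction of all observations per bin) are assumptions for illustration, not part of this package. It tests monotonicity on default rates rather than on WOE: under the convention WOE = ln(share of non-defaulters / share of defaulters), WOE is an increasing function of each bin's ratio of non-defaulters to defaulters, so the two orderings are exact mirror images, and this form avoids taking the logarithm of zero. The third requirement, maximizing IV, is an optimization goal rather than a pass/fail test, so it is left out.

```python
def satisfies_constraints(goods, bads, min_share=0.05):
    """Check the monotonicity and bin-size requirements for bins ordered by the variable.

    goods[i] and bads[i] are the counts of non-defaulters and defaulters in bin i.
    min_share is a hypothetical threshold: each bin must hold at least this
    fraction of all observations. Bins are assumed to be non-empty.
    """
    n_total = sum(goods) + sum(bads)
    # Size requirement: every bin is large enough to represent a population segment.
    big_enough = all((g + b) / n_total >= min_share for g, b in zip(goods, bads))
    # Monotonicity requirement, checked on default rates; WOE moves in the opposite direction.
    rates = [b / (g + b) for g, b in zip(goods, bads)]
    pairs = list(zip(rates, rates[1:]))
    increasing = all(r1 < r2 for r1, r2 in pairs)
    decreasing = all(r1 > r2 for r1, r2 in pairs)
    return big_enough and (increasing or decreasing)


print(satisfies_constraints([100, 200, 300], [50, 40, 30]))  # True (made-up counts)
print(satisfies_constraints([100, 100, 100], [10, 30, 20]))  # False: rates rise, then fall
```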

To illustrate the constraints of such a problem, consider a simple dataset containing age and a default indicator (1 if defaulted, 0 if not).
The following is a possible scenario in which the variable is binned into three groups such that their WOE values increase monotonically
as the ages of customers increase.

<a href="https://drive.google.com/uc?export=view&id=10NHDsJQbZRgO3QQGK2dMkoAmzJxtQR_A"><img src="https://drive.google.com/uc?export=view&id=10NHDsJQbZRgO3QQGK2dMkoAmzJxtQR_A" style="width: 500px; max-width: 100%; height: auto" title="WOE Table" /></a>

WOE is defined so that as the WOE value increases, the default rate decreases. We can therefore infer
that younger customers are more likely to default than older customers.
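One way to see why a higher WOE means a lower default rate, assuming the convention where $g_i$ and $b_i$ are the non-defaulters and defaulters in bin $i$ and $G$ and $B$ are the overall totals (symbols introduced here for illustration):

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{g_i / G}{b_i / B}\right)
              = \ln\frac{g_i}{b_i} + \ln\frac{B}{G},
\qquad
\text{default rate}_i = \frac{b_i}{g_i + b_i} = \frac{1}{1 + g_i / b_i}
```

The term $\ln(B/G)$ is the same for every bin, so WOE rises exactly when $g_i / b_i$ rises, and that in turn lowers the default rate.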

Arriving at optimal bin cutoffs that meet all three requirements discussed earlier is a non-trivial exercise. Several statistical packages
provide this type of optimal discretization of interval variables; R's [smbinning package](https://cran.r-project.org/web/packages/smbinning/smbinning.pdf)
and SAS's [proc transreg](https://statcompute.wordpress.com/2017/09/24/granular-monotonic-binning-in-sas/) are two such examples. To my knowledge, Python options for this problem are fairly sparse.
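To give a feel for the problem, here is a deliberately simplified sketch that enforces only the monotonicity requirement by pooling adjacent bins that break a decreasing default-rate trend, in the style of the pool-adjacent-violators algorithm. This is a swapped-in technique for illustration, not the Mironchyk and Tchistiakov algorithm this package implements: it ignores IV and minimum bin size, assumes default rates should decrease as the variable increases, and assumes every starting bin is non-empty. The function name `merge_to_monotone` and its `edges` argument (upper cutoffs of pre-computed fine bins) are hypothetical.

```python
def merge_to_monotone(edges, goods, bads):
    """Pool adjacent bins until default rates strictly decrease from bin to bin.

    edges[i] is the (hypothetical) upper cutoff of bin i; bins are ordered by the variable.
    Returns a list of [upper_cutoff, goods, bads] for the merged bins.
    """
    bins = [[e, g, b] for e, g, b in zip(edges, goods, bads)]
    i = 0
    while i < len(bins) - 1:
        _, g1, b1 = bins[i]
        e2, g2, b2 = bins[i + 1]
        if b2 / (g2 + b2) >= b1 / (g1 + b1):
            # Violation: pool bin i + 1 into bin i and keep the later cutoff.
            bins[i] = [e2, g1 + g2, b1 + b2]
            del bins[i + 1]
            # The pooled rate can exceed bin i's old rate, so recheck the previous pair.
            i = max(i - 1, 0)
        else:
            i += 1
    return bins


# Made-up fine bins by age: upper cutoffs, non-defaulters, defaulters.
print(merge_to_monotone([20, 30, 40, 50], [50, 80, 60, 120], [25, 20, 30, 10]))
# [[20, 50, 25], [40, 140, 50], [50, 120, 10]]
```

Every merge removes one bin, so the loop always terminates; a full solution would also have to trade these merges off against the bin-size and IV requirements, which is where the real difficulty lies.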

This package is an attempt to complement comprehensive packages such as [scorecardpy](https://github.com/ShichenXie/scorecardpy) with the ability to bin variables with monotonic WOE.




%package -n python3-monotonic-binning
Summary:	Monotonic Variable Binning by WOE
Provides:	python-monotonic-binning
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-monotonic-binning
# Monotonic-WOE-Binning-Algorithm

_This algorithm is based on the excellent paper by Mironchyk and Tchistiakov (2017), "Monotone optimal binning algorithm for credit risk modeling"._

### How to use

1. Install monotonic_binning: `pip install -i https://test.pypi.org/simple/ monotonic-binning`
2. Import monotonic_woe_binning: `from monotonic_binning import monotonic_woe_binning as bin`
3. Use `fit` on the training dataset and `transform` on the test dataset to bin variables.

### Demo Run Details

The `demo_run.py` file available under `tests/` uses German credit card data from [Penn State's online course](https://online.stat.psu.edu/stat508/resource/analysis/gcd) and gives an overview of how to use the package.

### Summary of Monotonic WOE 

The weight-of-evidence (WOE) method of evaluating the strength of predictors is underrated in the field of analytics.
While it is standard fare in credit risk modelling, it is under-utilized in other settings, even though its formulation is
generic enough for use in other domains. The WOE method primarily aims to bin variables into buckets that deliver the most
information to a potential classification model. WOE binning methods commonly measure the effectiveness of such bins using
Information Value (IV). For a more detailed introduction to WOE and IV, [this article](http://ucanalytics.com/blogs/information-value-and-weight-of-evidencebanking-case/)
is a useful read.

In the world of credit risk modelling, regulatory oversight often requires that the variables that go into models
are split into bins that

- have weight of evidence (WOE) values with a monotonic relationship to the 1/0 target variable (for example, loan default vs. no default),
- are reasonably sized and large enough to be representative of population segments, and
- maximize the IV of the given variable in the process.

To illustrate the constraints of such a problem, consider a simple dataset containing age and a default indicator (1 if defaulted, 0 if not).
The following is a possible scenario in which the variable is binned into three groups such that their WOE values increase monotonically
as the ages of customers increase.

<a href="https://drive.google.com/uc?export=view&id=10NHDsJQbZRgO3QQGK2dMkoAmzJxtQR_A"><img src="https://drive.google.com/uc?export=view&id=10NHDsJQbZRgO3QQGK2dMkoAmzJxtQR_A" style="width: 500px; max-width: 100%; height: auto" title="WOE Table" /></a>

WOE is defined so that as the WOE value increases, the default rate decreases. We can therefore infer
that younger customers are more likely to default than older customers.

Arriving at optimal bin cutoffs that meet all three requirements discussed earlier is a non-trivial exercise. Several statistical packages
provide this type of optimal discretization of interval variables; R's [smbinning package](https://cran.r-project.org/web/packages/smbinning/smbinning.pdf)
and SAS's [proc transreg](https://statcompute.wordpress.com/2017/09/24/granular-monotonic-binning-in-sas/) are two such examples. To my knowledge, Python options for this problem are fairly sparse.

This package is an attempt to complement comprehensive packages such as [scorecardpy](https://github.com/ShichenXie/scorecardpy) with the ability to bin variables with monotonic WOE.




%package help
Summary:	Development documents and examples for monotonic-binning
Provides:	python3-monotonic-binning-doc
%description help
# Monotonic-WOE-Binning-Algorithm

_This algorithm is based on the excellent paper by Mironchyk and Tchistiakov (2017), "Monotone optimal binning algorithm for credit risk modeling"._

### How to use

1. Install monotonic_binning: `pip install -i https://test.pypi.org/simple/ monotonic-binning`
2. Import monotonic_woe_binning: `from monotonic_binning import monotonic_woe_binning as bin`
3. Use `fit` on the training dataset and `transform` on the test dataset to bin variables.

### Demo Run Details

The `demo_run.py` file available under `tests/` uses German credit card data from [Penn State's online course](https://online.stat.psu.edu/stat508/resource/analysis/gcd) and gives an overview of how to use the package.

### Summary of Monotonic WOE 

The weight-of-evidence (WOE) method of evaluating the strength of predictors is underrated in the field of analytics.
While it is standard fare in credit risk modelling, it is under-utilized in other settings, even though its formulation is
generic enough for use in other domains. The WOE method primarily aims to bin variables into buckets that deliver the most
information to a potential classification model. WOE binning methods commonly measure the effectiveness of such bins using
Information Value (IV). For a more detailed introduction to WOE and IV, [this article](http://ucanalytics.com/blogs/information-value-and-weight-of-evidencebanking-case/)
is a useful read.

In the world of credit risk modelling, regulatory oversight often requires that the variables that go into models
are split into bins that

- have weight of evidence (WOE) values with a monotonic relationship to the 1/0 target variable (for example, loan default vs. no default),
- are reasonably sized and large enough to be representative of population segments, and
- maximize the IV of the given variable in the process.

To illustrate the constraints of such a problem, consider a simple dataset containing age and a default indicator (1 if defaulted, 0 if not).
The following is a possible scenario in which the variable is binned into three groups such that their WOE values increase monotonically
as the ages of customers increase.

<a href="https://drive.google.com/uc?export=view&id=10NHDsJQbZRgO3QQGK2dMkoAmzJxtQR_A"><img src="https://drive.google.com/uc?export=view&id=10NHDsJQbZRgO3QQGK2dMkoAmzJxtQR_A" style="width: 500px; max-width: 100%; height: auto" title="WOE Table" /></a>

WOE is defined so that as the WOE value increases, the default rate decreases. We can therefore infer
that younger customers are more likely to default than older customers.

Arriving at optimal bin cutoffs that meet all three requirements discussed earlier is a non-trivial exercise. Several statistical packages
provide this type of optimal discretization of interval variables; R's [smbinning package](https://cran.r-project.org/web/packages/smbinning/smbinning.pdf)
and SAS's [proc transreg](https://statcompute.wordpress.com/2017/09/24/granular-monotonic-binning-in-sas/) are two such examples. To my knowledge, Python options for this problem are fairly sparse.

This package is an attempt to complement comprehensive packages such as [scorecardpy](https://github.com/ShichenXie/scorecardpy) with the ability to bin variables with monotonic WOE.




%prep
%autosetup -n monotonic-binning-0.0.1

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-monotonic-binning -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.1-1
- Package Spec generated