%global _empty_manifest_terminate_build 0
Name:		python-regressionpack
Version:	1.1.1
Release:	1
Summary:	Library for making regression with errorbars a walk in the park.
License:	GNU Lesser General Public License v3 (LGPLv3)
URL:		https://pypi.org/project/regressionpack/
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/c6/7d/73183fe070b2f14f6cb14d6458e3250eca6a1339c642dc2f736347d12d97/regressionpack-1.1.1.tar.gz
BuildArch:	noarch

Requires:	python3-numpy
Requires:	python3-scipy
Requires:	python3-matplotlib

%description
# regressionpack

A library of higher-level regression functions that also include error bars computed using a provided confidence interval.

Available regressions so far include:

* [Linear](https://en.wikipedia.org/wiki/Linear_regression)
* [Polynomial](https://en.wikipedia.org/wiki/Polynomial_regression)
* GenericCurveFit
* CosineFit
* Exponential
* Logistic
* SystemIdentification

# Versions

* 0.1.4: Added the Scaler class in regressionpack.utilities, and a built-in rescale in the Linear class (also accessible from Polynomial).
* 0.1.5: Patched an error in utilities.FFTGuess. The number of points fed to np.linspace was a float instead of an int.
* 0.1.6: Added missing type hints in all classes and utilities. Now using nptyping for numpy.ndarray type hints.
* 1.0.0: This package has been used long enough without incident and is ready to graduate to 1.0.0!
* 1.0.2: Added Gaussian fit.
* 1.0.3: Added an option to silence the warning in FFTGuess. The warning is enabled by default.
* 1.0.4: Fixed a glitch in FFTGuess caused by the interpolation misbehaving when the x values are not strictly ascending.
* 1.0.5: Not regression-related, but added a class to perform the Jonckheere-Terpstra test on groups of samples.
* 1.0.6: Fixed a problem introduced in version 1.0.5: the type annotations used were not compatible with Python below 3.10 and broke the package.
* 1.0.7: Removed Jonckheere-Terpstra; it will get its own package.
* 1.1.0: Removed the dependency on nptyping. Uses numpy 1.21.5, which supports typing, instead.

## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

### Prerequisites

This package was developed using:

* python 3.8.3
* numpy 1.18.1
* scipy 1.4.1
* nptyping 1.3

While it may still work with older versions, I did not take the time to verify.

### Installing

```
pip install regressionpack
```

Note that this will also install numpy 1.18.1 and scipy 1.4.1 if they are not already present. Once installation is done, you may use the package by importing it this way:

```python
import regressionpack
```

## Example applications

Everything below needs these imports to work:

```python
from regressionpack import Linear, Polynomial, GenericCurveFit, CosineFit, Exponential
import numpy as np
from matplotlib import pyplot as plt
```

### Using the classes

All the classes are meant to be used the same way, with occasional minor differences. Most of the differences appear at instantiation, as some models require more input parameters than others. Refer to the relevant model's example for details.

```python
L = Linear(xdata, ydata)
L.Fit()    # Performs the fitting
L.Eval(x)  # Evaluates the model using the provided x data. The shape must be identical to the xdata, except along the FitDim.
```
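The examples further down also rely on a few other members exposed by the fitted models. The snippet below is a minimal, self-contained sketch using only names that appear elsewhere in this README (exact shapes and signatures may vary between models):

```python
import numpy as np
from regressionpack import Linear

# Small synthetic dataset (illustrative only)
xdata = np.linspace(0, 1, 50).reshape((-1, 1))
ydata = 2*xdata + 0.1*np.random.randn(*xdata.shape)

L = Linear(xdata, ydata)
L.Fit()

print(L.Beta, L.BetaFitError)       # fitted parameters and their error bars
yf  = L.Eval(xdata)                 # model evaluated at the fit points
dyf = L.EvalFitError(xdata)         # fit (confidence) error band
dyp = L.EvalPredictionError(xdata)  # prediction error band for new observations
print(L.AdjR2)                      # adjusted R^2 of the fit
```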
### Linear regression

This is the most generic multivariate linear regression. You can use it to fit any data with any base functions you choose. This was built from the [wikipedia](https://en.wikipedia.org/wiki/Linear_regression) page; you may notice some similarity in the nomenclature/variable names used in the source code.

```python
x = np.linspace(-7,6,1000)
_x1 = x.reshape((-1,1))  # A plain linear base
_x2 = np.cos(_x1)        # A cosine with frequency 1 base
_x3 = np.exp(_x1)        # An exponential base
X = np.hstack((_x1, _x2, _x3))  # Put the base vectors together

Y = 3*_x1 + 10*_x2 + .1*_x3       # Create the result
Y += np.random.randn(*Y.shape)*2  # Add some noise

L = Linear(X, Y)
L.Fit()
YF = L.Eval(X).flatten()
dYF = L.EvalFitError(X).flatten()
dYFp = L.EvalPredictionError(X).flatten()

fig = plt.figure()
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.plot(x, Y, '.k', label='data')
plt.plot(x, YF, '--b', label='fit')
plt.fill_between(x, YF-dYF, YF+dYF, color='k', alpha=.3, label='Fit error')
plt.fill_between(x, YF-dYFp, YF+dYFp, color='b', alpha=.3, label='Prediction error')
plt.legend()

# Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)
frac = np.mean((Y.flatten() >= YF-dYFp) & (Y.flatten() <= YF + dYFp))
print(frac)
```

#### Polynomial regression

Using the previously shown Linear class, the specific case of polynomial regression was implemented. This is done by creating a Vandermonde matrix of the input data before using the Linear class.

Here we show an example where we measure the photocurrent of a photodiode over the whole C-band (1525-1575 nm) at multiple optical powers. The goal is then to obtain the wavelength-dependent responsivity.

```python
wavelengths = np.linspace(1525,1575,5001).reshape((-1,1))
power = np.linspace(1e-6,1e-3,30).reshape((1,-1))

# Generate a photocurrent curve with wavelength dependence (5 nm ripples) and nonlinear behavior, plus a dark current of 2 nA
photocurrents = (0.1*power - 2*power**2) * (1 + 0.1 * np.cos(2*np.pi/5 * wavelengths)) + 2e-9

# Generate the data for fitting
# Add 1% of relative noise stdev
# and 10 pA of absolute noise stdev
Y = photocurrents*(1 + np.random.randn(*photocurrents.shape)*0.01) + np.random.randn(*photocurrents.shape) * 10e-12

# Repeat the power axis
X = np.repeat(power, Y.shape[0], axis=0)
X *= (1 + .01*np.random.randn(*X.shape))  # 1% relative noise stdev
X += np.random.randn(*X.shape)*100e-9     # 100 nW noise

# Fit using a second order polynomial
P = Polynomial(X, Y, 2, 1)
P.Fit()

plt.figure()
plt.grid(True)
plt.xlabel('Wavelength [nm]')
plt.ylabel('Responsivity [A/W]')
plt.plot(wavelengths, P.Beta[:,1], 'k')
plt.fill_between(wavelengths.flatten(), P.Beta[:,1] - P.BetaFitError[:,1], P.Beta[:,1] + P.BetaFitError[:,1], color='blue', alpha=0.7)

# Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)
YF = P.Eval(X)
dYFp = P.EvalPredictionError(X)
frac = np.mean((Y >= YF-dYFp) & (Y <= YF + dYFp))
print(frac)
print(np.min(P.AdjR2))
```

#### X-parameter rescaling

In order to avoid numerical imprecision problems, $X$ should be rescaled to

$$\xi = \frac{X - \mu_X}{\sigma_X}$$

where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of $X$. Note that when $\sigma_X = 0$, the matching value of $\xi$ should be 1 instead. This is the case for the first column in a polynomial regression (all values are ones).
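For illustration, this rescaling can be written in plain numpy as the sketch below. It only captures the idea; the package ships its own Scaler class in regressionpack.utilities, whose exact interface is not shown here.

```python
import numpy as np

def rescale_columns(X: np.ndarray):
    """Return xi = (X - mean) / std column-wise, with constant columns mapped to 1."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    xi = np.ones_like(X, dtype=float)   # constant columns (e.g. the all-ones column) stay at 1
    nz = sigma != 0
    xi[:, nz] = (X[:, nz] - mu[nz]) / sigma[nz]
    return xi, mu, sigma
```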
Reminder:

$$ Y = X\beta + \epsilon $$

$$ \hat{Y} = X\beta $$

$$ \beta' = (\xi^T \xi)^{-1} \xi^T Y $$

$$ \beta = (X^T X)^{-1} X^T Y $$

We already know that $X$ is badly scaled, while $\xi$ is well scaled. Using this, we could attempt to recover the original $\beta$ with

$$ \beta = ((\xi^T \xi)^{-1} \xi^T X)^{-1} \beta' $$

But keep in mind that everything hinges on that new matrix being decently scaled for the inversion to work properly. Also keep in mind that even if this works, we might still get scaling problems when doing the forward computation to get back $\hat{Y}$.

##### First example

Here we use a badly scaled polynomial and fit it with and without rescaling. It makes a huge difference: the unscaled fit is completely lost, while the scaled fit is spot on.

```python
from regressionpack.utilities import MatMul

all_coefs = [1, -5989, 11956034, -7956067976]

# Create some badly scaled data
nb = 100
X = np.zeros((nb,4))
x = np.linspace(1990, 2000, nb)
X[:,0] = 1
X[:,1] = x
X[:,2] = x**2
X[:,3] = x**3
coefs = np.reshape(list(reversed(all_coefs)), (4,1))
Y = MatMul(X, coefs)
y = Y.flatten()
y += np.random.randn(*y.shape) * np.std(y)*0.1

L_uns = Polynomial(x, y, 3)
L_uns.Fit()
YF_uns = L_uns.Eval(x)
dYF_uns = L_uns.EvalPredictionError(x)

L_sca = Polynomial(x, y, 3, rescale=True)
L_sca.Fit()
YF_sca = L_sca.Eval(x)
dYF_sca = L_sca.EvalPredictionError(x)

plt.plot(x, Y, '.k', label='Original')
plt.plot(x, YF_uns, '--r', label='Unscaled Fit')
plt.fill_between(x, YF_uns - dYF_uns, YF_uns + dYF_uns, color='r', alpha=0.3)
plt.plot(x, YF_sca, '-.g', label='Scaled Fit')
plt.fill_between(x, YF_sca - dYF_sca, YF_sca + dYF_sca, color='g', alpha=0.3)

L_sca.Beta/coefs.flatten() - 1
```

##### Second example

Here we use data with an even worse scaling to see how it goes. While the rescaling does improve the result, it won't do magic.

```python
# Create some problematic data.
# The x is badly conditioned in that its spread is small compared to its mean.
# This may cause numerical precision problems in the Vandermonde matrix.
x = np.linspace(1525, 1565, 101)*1e-9
X = np.zeros((101,5))
for i in range(5):
    X[:,i] = x**i

coefs = np.reshape([56854106.8500000, -147284967000000., 1.43076500000000e+20, -6.17700000000000e+25, 1.00000000000000e+31], (-1,1))
y = MatMul(X, coefs)
y += np.random.randn(*y.shape) * .03*np.std(y)
y = y.flatten()

# Perform polynomial fit
P_unsc = Polynomial(x, y, 4, rescale=False)
P_unsc.Fit()
yf_p = P_unsc.Eval(x)
dyf_p = P_unsc.EvalPredictionError(x)

# Perform scaled fit
P_sca = Polynomial(x, y, 4, rescale=True)
P_sca.Fit()
yf_sp = P_sca.Eval(x)
dyf_sp = P_sca.EvalPredictionError(x)

# Plot results
fig, axes = plt.subplots(2, 1, sharex=True)
[axe.grid(True) for axe in axes]
axes[0].set_ylabel('Unscaled')
axes[1].set_ylabel('Scaled')
axes[1].set_xlabel('$x$')

# Unscaled
x2 = x * 1e9
axes[0].plot(x2, y, '.k')
axes[0].plot(x2, yf_p, '.-r')
axes[0].fill_between(x2, yf_p-dyf_p, yf_p+dyf_p, color='r', alpha=0.3)

# Scaled
axes[1].plot(x2, y, '.k')
axes[1].plot(x2, yf_sp, '.-r')
axes[1].fill_between(x2, yf_sp-dyf_sp, yf_sp+dyf_sp, color='r', alpha=0.3)
plt.show()

# Ratio to real coefficients
P_sca.Beta / coefs.flatten() - 1
```

### GenericCurveFit

A simple way of wrapping up a custom fit function and its error bar calculations, all based on scipy.optimize.curve_fit.
Example using a custom class with inheritance:

```python
class myCustomFit(GenericCurveFit):

    def FitFunc(self, x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
        return a * x + np.exp(-b*x**2) + np.sin(c*x)

    def Jacobian(self, x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
        out = np.zeros((x.shape[0],) + (3,))
        out[:,0] = x
        out[:,1] = -x**2 * np.exp(-b*x**2)
        out[:,2] = x*np.cos(c*x)
        return out

    def __init__(self, x:np.ndarray, y:np.ndarray, p0:np.ndarray=None, bounds=(-np.inf, np.inf), confidenceInterval:float=0.95, simult:bool=False, **kwargs):
        super(myCustomFit, self).__init__(x, y, self.FitFunc, self.Jacobian, p0, bounds, confidenceInterval, simult, **kwargs)

# Example using standalone functions directly in the GenericCurveFit:
def func(x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
    return a * x + np.exp(-b*x**2) + np.sin(c*x)

def jac(x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
    out = np.zeros((x.shape[0],) + (3,))
    out[:,0] = x
    out[:,1] = -x**2 * np.exp(-b*x**2)
    out[:,2] = x*np.cos(c*x)
    return out

# Change this value to use the custom class or the standalone functions. The result should be the same.
useClass = True

# Generate some data
x = np.linspace(-3,5,3000)
p = (.1,.2,6.2)
y = func(x, *p)
y += np.random.randn(*y.shape) * 0.3

# Pick between the class and the standalone functions
if useClass:
    GCF = myCustomFit(x, y, (1,1,7))
else:
    GCF = GenericCurveFit(x, y, func, jac, (1,1,7))

# Perform the fitting
GCF.Fit()

# Evaluate at a greater resolution
xx = np.linspace(x.min(), x.max(), 1000)
yy = GCF.Eval(xx)
dyy = GCF.EvalPredictionError(xx)

# Show results and compare
plt.plot(x, y, '.', label='data')
plt.plot(xx, yy, '--', label='Fit')
plt.fill_between(xx, yy-dyy, yy+dyy, alpha=0.3)

# Check what fraction of the data is within the prediction interval
yf = GCF.Eval(x)
dyf = GCF.EvalPredictionError(x)
print("Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)")
frac = np.mean((y <= yf + dyf) & (y >= yf - dyf))
print(frac)

print("Compare parameters with results")
print(p)
print(GCF.Beta)
print(GCF.BetaFitError)
print(GCF.AdjR2)
```

#### Cosine fit

This cosine fit inherits from GenericCurveFit. It also uses the Fourier transform of the data to make the initial guess of the fit parameters. The amplitude $A$ is approximated by the highest-amplitude peak of the FFT (excluding the DC offset), the frequency $\omega$ is the one associated with that peak, the phase shift $\phi$ is the angle of the complex FFT component, and the constant is the DC offset. Using those as initial guesses, we proceed with least-squares fitting on the objective function:

$$\hat{y} = A \cos{\left( \omega x + \phi \right)} + b$$

To increase speed, improve the results, and enable the computation of the error bars, one must provide the Jacobian. Inheriting from GenericCurveFit allows us to implement the cosine fit in a few lines of code (feel free to look in CosineFit.py). Ok, I cheated a little bit: a key component here is the educated guess of the initial fit parameters. The function that does this (FFTGuess) actually takes more lines than this entire class. Also, since the FFT can be efficiently run over large multidimensional datasets and provides quite a good estimate, I judged it deserved its own spot among the utility functions of this package.
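For intuition, a hand-rolled version of such an FFT-based guess for uniformly spaced data could look like the sketch below. This is only an illustration of the idea, not the package's actual FFTGuess (which is more general and handles multidimensional data; its signature is not shown here).

```python
import numpy as np

def naive_cosine_guess(x: np.ndarray, y: np.ndarray):
    """Rough (A, omega, phi, b) guess for A*cos(omega*x + phi) + b, assuming uniformly spaced x."""
    n = x.size
    dx = x[1] - x[0]
    Y = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(n, d=dx)      # cycles per unit of x
    b = Y[0].real / n                     # DC offset
    k = 1 + np.argmax(np.abs(Y[1:]))      # strongest non-DC peak
    A = 2 * np.abs(Y[k]) / n              # amplitude of that peak
    omega = 2 * np.pi * freqs[k]          # angular frequency
    phi = np.angle(Y[k]) - omega * x[0]   # phase, referenced back to x[0]
    return A, omega, phi, b
```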
```python
x = np.linspace(-3,8,100)
y = 3*np.cos(4*x + .66) - 2
y += np.random.randn(*y.shape)*0.3

CF = CosineFit(x, y)
CF.Fit()
yf = CF.Eval(x)
dyf = CF.EvalPredictionError(x)

print("Parameters vs fit results:")
print((3, 4, .66, -2))
print(CF.Beta)
print(CF.BetaFitError)

plt.figure()
plt.plot(x, y, '.k')
plt.plot(x, yf, '--r')
plt.fill_between(x, yf-dyf, yf+dyf, color='b', alpha=0.3)

print("Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)")
factor = np.mean((y <= yf + dyf) & (y >= yf - dyf))
print(factor)
```

#### Exponential

```python
x = np.linspace(-3,5)
y = 2 * np.exp(2.4*x) + 3
y += np.random.randn(*y.shape)*3
plt.plot(x, y, '.')

E = Exponential(x, y)
E.Fit()

xx = np.linspace(x.min(), x.max(), 1000)
yy = E.Eval(xx)
dyy = E.EvalPredictionError(xx)

plt.plot(xx, yy, '--')
plt.fill_between(xx, yy-dyy, yy+dyy, alpha=0.3)
plt.yscale('log')
print(E.Beta)
print(E.BetaFitError)
```

#### System Identification

Let's analyze a simple spring-mass-damper system with transfer function

$$ G(s) = \frac{\frac{c}{m}s + \frac{k}{m}}{s^2 + \frac{c}{m}s + \frac{k}{m}} $$

Let's say that for some reason the values of $c$, $m$ and $k$ are unknown to us, but we can measure the input $r(t)$ and output $c(t)$.

```python
from regressionpack import SystemIdentification
from scipy.signal import TransferFunction, lsim, impulse
from scipy.special import erf

# Define some constants
m = 500
k = 3.2e6
c = 10e3
h_dip = 12.8e-3
nb = 1000
T = 2
t = np.linspace(0, T, nb, endpoint=False)

# The transfer function
num = [c/m, k/m]
den = [1, c/m, k/m]
G = TransferFunction(num, den)

# The measured input and output values (with noise)
r = h_dip * (erf(30*(t-T/4))+1)
_, c, _ = lsim(G, r, t)
r += np.random.randn(*r.shape)*1e-4
c += np.random.randn(*c.shape)*1e-4

# Extracting the transfer function
SI = SystemIdentification(t, r, c, 2, 3)
SI.Fit()

r_imp = np.zeros_like(r)
r_imp[0] = 2*nb/T
_, c_imp_theo = impulse(G, T=t)
c_imp_esti = SI.ForcedResponse(t, r_imp)
c_imp_err = SI.ForcedResponsePredictionError(t, r_imp)

# Show results
fig, axes = plt.subplots(1, 2, sharex=True)
#[ax.grid(True) for ax in axes]
axes[0].plot(t, r, label='$r(t)$')
axes[0].plot(t, c, label='$c(t)$')
axes[0].legend()
axes[1].plot(t, c_imp_theo, '-k', label='$g(t)$')
axes[1].plot(t, c_imp_esti, '--r', label=r'$\hat{g}(t)$')
axes[1].fill_between(t, c_imp_esti - c_imp_err, c_imp_esti + c_imp_err, color='r', alpha=0.3)
axes[1].legend()
print(G, SI.TransferFunction)
```

#### Gaussian

```python
from regressionpack import Gaussian

a, b, c, d = 4, 3, 2, 1
x = np.linspace(-3,3,100)
y = a * np.exp(-b * (x-c)**2) + d + np.random.randn(x.size)*0.1
plt.plot(x, y, '.k')

g = Gaussian(x, y)
g.Fit()
yf = g.Eval(x)
dyf = g.EvalPredictionError(x)
plt.plot(x, yf, '-b')
plt.fill_between(x, yf-dyf, yf+dyf, color='b', alpha=0.3)
```

#### Ellipse

Not finished yet :(

## Built With

* [Python](https://www.python.org/) - The language
* [numpy](https://numpy.org/) - the numeric library
* [scipy](https://docs.scipy.org/) - the scientific library

## Contributing

Contact me and discuss your ideas and ambitions.

## Authors

* **FusedSilica** - *Initial work*

## License

This project is licensed under the GNU LGPLv3 License - see the [LICENSE.md](LICENSE.md) file for details

%package -n python3-regressionpack
Summary:	Library for making regression with errorbars a walk in the park.
Provides:	python-regressionpack

BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip

%description -n python3-regressionpack
# regressionpack

A library of higher-level regression functions that also include error bars computed using a provided confidence interval.

Available regressions so far include:

* [Linear](https://en.wikipedia.org/wiki/Linear_regression)
* [Polynomial](https://en.wikipedia.org/wiki/Polynomial_regression)
* GenericCurveFit
* CosineFit
* Exponential
* Logistic
* SystemIdentification

# Versions

* 0.1.4: Added the Scaler class in regressionpack.utilities, and a built-in rescale in the Linear class (also accessible from Polynomial).
* 0.1.5: Patched an error in utilities.FFTGuess. The number of points fed to np.linspace was a float instead of an int.
* 0.1.6: Added missing type hints in all classes and utilities. Now using nptyping for numpy.ndarray type hints.
* 1.0.0: This package has been used long enough without incident and is ready to graduate to 1.0.0!
* 1.0.2: Added Gaussian fit.
* 1.0.3: Added an option to silence the warning in FFTGuess. The warning is enabled by default.
* 1.0.4: Fixed a glitch in FFTGuess caused by the interpolation misbehaving when the x values are not strictly ascending.
* 1.0.5: Not regression-related, but added a class to perform the Jonckheere-Terpstra test on groups of samples.
* 1.0.6: Fixed a problem introduced in version 1.0.5: the type annotations used were not compatible with Python below 3.10 and broke the package.
* 1.0.7: Removed Jonckheere-Terpstra; it will get its own package.
* 1.1.0: Removed the dependency on nptyping. Uses numpy 1.21.5, which supports typing, instead.

## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

### Prerequisites

This package was developed using:

* python 3.8.3
* numpy 1.18.1
* scipy 1.4.1
* nptyping 1.3

While it may still work with older versions, I did not take the time to verify.

### Installing

```
pip install regressionpack
```

Note that this will also install numpy 1.18.1 and scipy 1.4.1 if they are not already present. Once installation is done, you may use the package by importing it this way:

```python
import regressionpack
```

## Example applications

Everything below needs these imports to work:

```python
from regressionpack import Linear, Polynomial, GenericCurveFit, CosineFit, Exponential
import numpy as np
from matplotlib import pyplot as plt
```

### Using the classes

All the classes are meant to be used the same way, with occasional minor differences. Most of the differences appear at instantiation, as some models require more input parameters than others. Refer to the relevant model's example for details.

```python
L = Linear(xdata, ydata)
L.Fit()    # Performs the fitting
L.Eval(x)  # Evaluates the model using the provided x data. The shape must be identical to the xdata, except along the FitDim.
```

### Linear regression

This is the most generic multivariate linear regression. You can use it to fit any data with any base functions you choose. This was built from the [wikipedia](https://en.wikipedia.org/wiki/Linear_regression) page; you may notice some similarity in the nomenclature/variable names used in the source code.
```python
x = np.linspace(-7,6,1000)
_x1 = x.reshape((-1,1))  # A plain linear base
_x2 = np.cos(_x1)        # A cosine with frequency 1 base
_x3 = np.exp(_x1)        # An exponential base
X = np.hstack((_x1, _x2, _x3))  # Put the base vectors together

Y = 3*_x1 + 10*_x2 + .1*_x3       # Create the result
Y += np.random.randn(*Y.shape)*2  # Add some noise

L = Linear(X, Y)
L.Fit()
YF = L.Eval(X).flatten()
dYF = L.EvalFitError(X).flatten()
dYFp = L.EvalPredictionError(X).flatten()

fig = plt.figure()
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.plot(x, Y, '.k', label='data')
plt.plot(x, YF, '--b', label='fit')
plt.fill_between(x, YF-dYF, YF+dYF, color='k', alpha=.3, label='Fit error')
plt.fill_between(x, YF-dYFp, YF+dYFp, color='b', alpha=.3, label='Prediction error')
plt.legend()

# Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)
frac = np.mean((Y.flatten() >= YF-dYFp) & (Y.flatten() <= YF + dYFp))
print(frac)
```

#### Polynomial regression

Using the previously shown Linear class, the specific case of polynomial regression was implemented. This is done by creating a Vandermonde matrix of the input data before using the Linear class.

Here we show an example where we measure the photocurrent of a photodiode over the whole C-band (1525-1575 nm) at multiple optical powers. The goal is then to obtain the wavelength-dependent responsivity.

```python
wavelengths = np.linspace(1525,1575,5001).reshape((-1,1))
power = np.linspace(1e-6,1e-3,30).reshape((1,-1))

# Generate a photocurrent curve with wavelength dependence (5 nm ripples) and nonlinear behavior, plus a dark current of 2 nA
photocurrents = (0.1*power - 2*power**2) * (1 + 0.1 * np.cos(2*np.pi/5 * wavelengths)) + 2e-9

# Generate the data for fitting
# Add 1% of relative noise stdev
# and 10 pA of absolute noise stdev
Y = photocurrents*(1 + np.random.randn(*photocurrents.shape)*0.01) + np.random.randn(*photocurrents.shape) * 10e-12

# Repeat the power axis
X = np.repeat(power, Y.shape[0], axis=0)
X *= (1 + .01*np.random.randn(*X.shape))  # 1% relative noise stdev
X += np.random.randn(*X.shape)*100e-9     # 100 nW noise

# Fit using a second order polynomial
P = Polynomial(X, Y, 2, 1)
P.Fit()

plt.figure()
plt.grid(True)
plt.xlabel('Wavelength [nm]')
plt.ylabel('Responsivity [A/W]')
plt.plot(wavelengths, P.Beta[:,1], 'k')
plt.fill_between(wavelengths.flatten(), P.Beta[:,1] - P.BetaFitError[:,1], P.Beta[:,1] + P.BetaFitError[:,1], color='blue', alpha=0.7)

# Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)
YF = P.Eval(X)
dYFp = P.EvalPredictionError(X)
frac = np.mean((Y >= YF-dYFp) & (Y <= YF + dYFp))
print(frac)
print(np.min(P.AdjR2))
```

#### X-parameter rescaling

In order to avoid numerical imprecision problems, $X$ should be rescaled to

$$\xi = \frac{X - \mu_X}{\sigma_X}$$

where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of $X$. Note that when $\sigma_X = 0$, the matching value of $\xi$ should be 1 instead. This is the case for the first column in a polynomial regression (all values are ones).

Reminder:

$$ Y = X\beta + \epsilon $$

$$ \hat{Y} = X\beta $$

$$ \beta' = (\xi^T \xi)^{-1} \xi^T Y $$

$$ \beta = (X^T X)^{-1} X^T Y $$

We already know that $X$ is badly scaled, while $\xi$ is well scaled.
Using this, we could attempt to recover the original $\beta$ with

$$ \beta = ((\xi^T \xi)^{-1} \xi^T X)^{-1} \beta' $$

But keep in mind that everything hinges on that new matrix being decently scaled for the inversion to work properly. Also keep in mind that even if this works, we might still get scaling problems when doing the forward computation to get back $\hat{Y}$.

##### First example

Here we use a badly scaled polynomial and fit it with and without rescaling. It makes a huge difference: the unscaled fit is completely lost, while the scaled fit is spot on.

```python
from regressionpack.utilities import MatMul

all_coefs = [1, -5989, 11956034, -7956067976]

# Create some badly scaled data
nb = 100
X = np.zeros((nb,4))
x = np.linspace(1990, 2000, nb)
X[:,0] = 1
X[:,1] = x
X[:,2] = x**2
X[:,3] = x**3
coefs = np.reshape(list(reversed(all_coefs)), (4,1))
Y = MatMul(X, coefs)
y = Y.flatten()
y += np.random.randn(*y.shape) * np.std(y)*0.1

L_uns = Polynomial(x, y, 3)
L_uns.Fit()
YF_uns = L_uns.Eval(x)
dYF_uns = L_uns.EvalPredictionError(x)

L_sca = Polynomial(x, y, 3, rescale=True)
L_sca.Fit()
YF_sca = L_sca.Eval(x)
dYF_sca = L_sca.EvalPredictionError(x)

plt.plot(x, Y, '.k', label='Original')
plt.plot(x, YF_uns, '--r', label='Unscaled Fit')
plt.fill_between(x, YF_uns - dYF_uns, YF_uns + dYF_uns, color='r', alpha=0.3)
plt.plot(x, YF_sca, '-.g', label='Scaled Fit')
plt.fill_between(x, YF_sca - dYF_sca, YF_sca + dYF_sca, color='g', alpha=0.3)

L_sca.Beta/coefs.flatten() - 1
```

##### Second example

Here we use data with an even worse scaling to see how it goes. While the rescaling does improve the result, it won't do magic.

```python
# Create some problematic data.
# The x is badly conditioned in that its spread is small compared to its mean.
# This may cause numerical precision problems in the Vandermonde matrix.
x = np.linspace(1525, 1565, 101)*1e-9
X = np.zeros((101,5))
for i in range(5):
    X[:,i] = x**i

coefs = np.reshape([56854106.8500000, -147284967000000., 1.43076500000000e+20, -6.17700000000000e+25, 1.00000000000000e+31], (-1,1))
y = MatMul(X, coefs)
y += np.random.randn(*y.shape) * .03*np.std(y)
y = y.flatten()

# Perform polynomial fit
P_unsc = Polynomial(x, y, 4, rescale=False)
P_unsc.Fit()
yf_p = P_unsc.Eval(x)
dyf_p = P_unsc.EvalPredictionError(x)

# Perform scaled fit
P_sca = Polynomial(x, y, 4, rescale=True)
P_sca.Fit()
yf_sp = P_sca.Eval(x)
dyf_sp = P_sca.EvalPredictionError(x)

# Plot results
fig, axes = plt.subplots(2, 1, sharex=True)
[axe.grid(True) for axe in axes]
axes[0].set_ylabel('Unscaled')
axes[1].set_ylabel('Scaled')
axes[1].set_xlabel('$x$')

# Unscaled
x2 = x * 1e9
axes[0].plot(x2, y, '.k')
axes[0].plot(x2, yf_p, '.-r')
axes[0].fill_between(x2, yf_p-dyf_p, yf_p+dyf_p, color='r', alpha=0.3)

# Scaled
axes[1].plot(x2, y, '.k')
axes[1].plot(x2, yf_sp, '.-r')
axes[1].fill_between(x2, yf_sp-dyf_sp, yf_sp+dyf_sp, color='r', alpha=0.3)
plt.show()

# Ratio to real coefficients
P_sca.Beta / coefs.flatten() - 1
```

### GenericCurveFit

A simple way of wrapping up a custom fit function and its error bar calculations, all based on scipy.optimize.curve_fit.
Example using a custom class with inheritance:

```python
class myCustomFit(GenericCurveFit):

    def FitFunc(self, x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
        return a * x + np.exp(-b*x**2) + np.sin(c*x)

    def Jacobian(self, x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
        out = np.zeros((x.shape[0],) + (3,))
        out[:,0] = x
        out[:,1] = -x**2 * np.exp(-b*x**2)
        out[:,2] = x*np.cos(c*x)
        return out

    def __init__(self, x:np.ndarray, y:np.ndarray, p0:np.ndarray=None, bounds=(-np.inf, np.inf), confidenceInterval:float=0.95, simult:bool=False, **kwargs):
        super(myCustomFit, self).__init__(x, y, self.FitFunc, self.Jacobian, p0, bounds, confidenceInterval, simult, **kwargs)

# Example using standalone functions directly in the GenericCurveFit:
def func(x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
    return a * x + np.exp(-b*x**2) + np.sin(c*x)

def jac(x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
    out = np.zeros((x.shape[0],) + (3,))
    out[:,0] = x
    out[:,1] = -x**2 * np.exp(-b*x**2)
    out[:,2] = x*np.cos(c*x)
    return out

# Change this value to use the custom class or the standalone functions. The result should be the same.
useClass = True

# Generate some data
x = np.linspace(-3,5,3000)
p = (.1,.2,6.2)
y = func(x, *p)
y += np.random.randn(*y.shape) * 0.3

# Pick between the class and the standalone functions
if useClass:
    GCF = myCustomFit(x, y, (1,1,7))
else:
    GCF = GenericCurveFit(x, y, func, jac, (1,1,7))

# Perform the fitting
GCF.Fit()

# Evaluate at a greater resolution
xx = np.linspace(x.min(), x.max(), 1000)
yy = GCF.Eval(xx)
dyy = GCF.EvalPredictionError(xx)

# Show results and compare
plt.plot(x, y, '.', label='data')
plt.plot(xx, yy, '--', label='Fit')
plt.fill_between(xx, yy-dyy, yy+dyy, alpha=0.3)

# Check what fraction of the data is within the prediction interval
yf = GCF.Eval(x)
dyf = GCF.EvalPredictionError(x)
print("Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)")
frac = np.mean((y <= yf + dyf) & (y >= yf - dyf))
print(frac)

print("Compare parameters with results")
print(p)
print(GCF.Beta)
print(GCF.BetaFitError)
print(GCF.AdjR2)
```

#### Cosine fit

This cosine fit inherits from GenericCurveFit. It also uses the Fourier transform of the data to make the initial guess of the fit parameters. The amplitude $A$ is approximated by the highest-amplitude peak of the FFT (excluding the DC offset), the frequency $\omega$ is the one associated with that peak, the phase shift $\phi$ is the angle of the complex FFT component, and the constant is the DC offset. Using those as initial guesses, we proceed with least-squares fitting on the objective function:

$$\hat{y} = A \cos{\left( \omega x + \phi \right)} + b$$

To increase speed, improve the results, and enable the computation of the error bars, one must provide the Jacobian. Inheriting from GenericCurveFit allows us to implement the cosine fit in a few lines of code (feel free to look in CosineFit.py). Ok, I cheated a little bit: a key component here is the educated guess of the initial fit parameters. The function that does this (FFTGuess) actually takes more lines than this entire class. Also, since the FFT can be efficiently run over large multidimensional datasets and provides quite a good estimate, I judged it deserved its own spot among the utility functions of this package.
```python
x = np.linspace(-3,8,100)
y = 3*np.cos(4*x + .66) - 2
y += np.random.randn(*y.shape)*0.3

CF = CosineFit(x, y)
CF.Fit()
yf = CF.Eval(x)
dyf = CF.EvalPredictionError(x)

print("Parameters vs fit results:")
print((3, 4, .66, -2))
print(CF.Beta)
print(CF.BetaFitError)

plt.figure()
plt.plot(x, y, '.k')
plt.plot(x, yf, '--r')
plt.fill_between(x, yf-dyf, yf+dyf, color='b', alpha=0.3)

print("Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)")
factor = np.mean((y <= yf + dyf) & (y >= yf - dyf))
print(factor)
```

#### Exponential

```python
x = np.linspace(-3,5)
y = 2 * np.exp(2.4*x) + 3
y += np.random.randn(*y.shape)*3
plt.plot(x, y, '.')

E = Exponential(x, y)
E.Fit()

xx = np.linspace(x.min(), x.max(), 1000)
yy = E.Eval(xx)
dyy = E.EvalPredictionError(xx)

plt.plot(xx, yy, '--')
plt.fill_between(xx, yy-dyy, yy+dyy, alpha=0.3)
plt.yscale('log')
print(E.Beta)
print(E.BetaFitError)
```

#### System Identification

Let's analyze a simple spring-mass-damper system with transfer function

$$ G(s) = \frac{\frac{c}{m}s + \frac{k}{m}}{s^2 + \frac{c}{m}s + \frac{k}{m}} $$

Let's say that for some reason the values of $c$, $m$ and $k$ are unknown to us, but we can measure the input $r(t)$ and output $c(t)$.

```python
from regressionpack import SystemIdentification
from scipy.signal import TransferFunction, lsim, impulse
from scipy.special import erf

# Define some constants
m = 500
k = 3.2e6
c = 10e3
h_dip = 12.8e-3
nb = 1000
T = 2
t = np.linspace(0, T, nb, endpoint=False)

# The transfer function
num = [c/m, k/m]
den = [1, c/m, k/m]
G = TransferFunction(num, den)

# The measured input and output values (with noise)
r = h_dip * (erf(30*(t-T/4))+1)
_, c, _ = lsim(G, r, t)
r += np.random.randn(*r.shape)*1e-4
c += np.random.randn(*c.shape)*1e-4

# Extracting the transfer function
SI = SystemIdentification(t, r, c, 2, 3)
SI.Fit()

r_imp = np.zeros_like(r)
r_imp[0] = 2*nb/T
_, c_imp_theo = impulse(G, T=t)
c_imp_esti = SI.ForcedResponse(t, r_imp)
c_imp_err = SI.ForcedResponsePredictionError(t, r_imp)

# Show results
fig, axes = plt.subplots(1, 2, sharex=True)
#[ax.grid(True) for ax in axes]
axes[0].plot(t, r, label='$r(t)$')
axes[0].plot(t, c, label='$c(t)$')
axes[0].legend()
axes[1].plot(t, c_imp_theo, '-k', label='$g(t)$')
axes[1].plot(t, c_imp_esti, '--r', label=r'$\hat{g}(t)$')
axes[1].fill_between(t, c_imp_esti - c_imp_err, c_imp_esti + c_imp_err, color='r', alpha=0.3)
axes[1].legend()
print(G, SI.TransferFunction)
```

#### Gaussian

```python
from regressionpack import Gaussian

a, b, c, d = 4, 3, 2, 1
x = np.linspace(-3,3,100)
y = a * np.exp(-b * (x-c)**2) + d + np.random.randn(x.size)*0.1
plt.plot(x, y, '.k')

g = Gaussian(x, y)
g.Fit()
yf = g.Eval(x)
dyf = g.EvalPredictionError(x)
plt.plot(x, yf, '-b')
plt.fill_between(x, yf-dyf, yf+dyf, color='b', alpha=0.3)
```

#### Ellipse

Not finished yet :(

## Built With

* [Python](https://www.python.org/) - The language
* [numpy](https://numpy.org/) - the numeric library
* [scipy](https://docs.scipy.org/) - the scientific library

## Contributing

Contact me and discuss your ideas and ambitions.

## Authors

* **FusedSilica** - *Initial work*

## License

This project is licensed under the GNU LGPLv3 License - see the [LICENSE.md](LICENSE.md) file for details

%package help
Summary:	Development documents and examples for regressionpack
Provides:	python3-regressionpack-doc
%description help
# regressionpack

A library of higher-level regression functions that also include error bars computed using a provided confidence interval.
Available regressions so far include:

* [Linear](https://en.wikipedia.org/wiki/Linear_regression)
* [Polynomial](https://en.wikipedia.org/wiki/Polynomial_regression)
* GenericCurveFit
* CosineFit
* Exponential
* Logistic
* SystemIdentification

# Versions

* 0.1.4: Added the Scaler class in regressionpack.utilities, and a built-in rescale in the Linear class (also accessible from Polynomial).
* 0.1.5: Patched an error in utilities.FFTGuess. The number of points fed to np.linspace was a float instead of an int.
* 0.1.6: Added missing type hints in all classes and utilities. Now using nptyping for numpy.ndarray type hints.
* 1.0.0: This package has been used long enough without incident and is ready to graduate to 1.0.0!
* 1.0.2: Added Gaussian fit.
* 1.0.3: Added an option to silence the warning in FFTGuess. The warning is enabled by default.
* 1.0.4: Fixed a glitch in FFTGuess caused by the interpolation misbehaving when the x values are not strictly ascending.
* 1.0.5: Not regression-related, but added a class to perform the Jonckheere-Terpstra test on groups of samples.
* 1.0.6: Fixed a problem introduced in version 1.0.5: the type annotations used were not compatible with Python below 3.10 and broke the package.
* 1.0.7: Removed Jonckheere-Terpstra; it will get its own package.
* 1.1.0: Removed the dependency on nptyping. Uses numpy 1.21.5, which supports typing, instead.

## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

### Prerequisites

This package was developed using:

* python 3.8.3
* numpy 1.18.1
* scipy 1.4.1
* nptyping 1.3

While it may still work with older versions, I did not take the time to verify.

### Installing

```
pip install regressionpack
```

Note that this will also install numpy 1.18.1 and scipy 1.4.1 if they are not already present. Once installation is done, you may use the package by importing it this way:

```python
import regressionpack
```

## Example applications

Everything below needs these imports to work:

```python
from regressionpack import Linear, Polynomial, GenericCurveFit, CosineFit, Exponential
import numpy as np
from matplotlib import pyplot as plt
```

### Using the classes

All the classes are meant to be used the same way, with occasional minor differences. Most of the differences appear at instantiation, as some models require more input parameters than others. Refer to the relevant model's example for details.

```python
L = Linear(xdata, ydata)
L.Fit()    # Performs the fitting
L.Eval(x)  # Evaluates the model using the provided x data. The shape must be identical to the xdata, except along the FitDim.
```

### Linear regression

This is the most generic multivariate linear regression. You can use it to fit any data with any base functions you choose. This was built from the [wikipedia](https://en.wikipedia.org/wiki/Linear_regression) page; you may notice some similarity in the nomenclature/variable names used in the source code.
```python
x = np.linspace(-7,6,1000)
_x1 = x.reshape((-1,1))  # A plain linear base
_x2 = np.cos(_x1)        # A cosine with frequency 1 base
_x3 = np.exp(_x1)        # An exponential base
X = np.hstack((_x1, _x2, _x3))  # Put the base vectors together

Y = 3*_x1 + 10*_x2 + .1*_x3       # Create the result
Y += np.random.randn(*Y.shape)*2  # Add some noise

L = Linear(X, Y)
L.Fit()
YF = L.Eval(X).flatten()
dYF = L.EvalFitError(X).flatten()
dYFp = L.EvalPredictionError(X).flatten()

fig = plt.figure()
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.plot(x, Y, '.k', label='data')
plt.plot(x, YF, '--b', label='fit')
plt.fill_between(x, YF-dYF, YF+dYF, color='k', alpha=.3, label='Fit error')
plt.fill_between(x, YF-dYFp, YF+dYFp, color='b', alpha=.3, label='Prediction error')
plt.legend()

# Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)
frac = np.mean((Y.flatten() >= YF-dYFp) & (Y.flatten() <= YF + dYFp))
print(frac)
```

#### Polynomial regression

Using the previously shown Linear class, the specific case of polynomial regression was implemented. This is done by creating a Vandermonde matrix of the input data before using the Linear class.

Here we show an example where we measure the photocurrent of a photodiode over the whole C-band (1525-1575 nm) at multiple optical powers. The goal is then to obtain the wavelength-dependent responsivity.

```python
wavelengths = np.linspace(1525,1575,5001).reshape((-1,1))
power = np.linspace(1e-6,1e-3,30).reshape((1,-1))

# Generate a photocurrent curve with wavelength dependence (5 nm ripples) and nonlinear behavior, plus a dark current of 2 nA
photocurrents = (0.1*power - 2*power**2) * (1 + 0.1 * np.cos(2*np.pi/5 * wavelengths)) + 2e-9

# Generate the data for fitting
# Add 1% of relative noise stdev
# and 10 pA of absolute noise stdev
Y = photocurrents*(1 + np.random.randn(*photocurrents.shape)*0.01) + np.random.randn(*photocurrents.shape) * 10e-12

# Repeat the power axis
X = np.repeat(power, Y.shape[0], axis=0)
X *= (1 + .01*np.random.randn(*X.shape))  # 1% relative noise stdev
X += np.random.randn(*X.shape)*100e-9     # 100 nW noise

# Fit using a second order polynomial
P = Polynomial(X, Y, 2, 1)
P.Fit()

plt.figure()
plt.grid(True)
plt.xlabel('Wavelength [nm]')
plt.ylabel('Responsivity [A/W]')
plt.plot(wavelengths, P.Beta[:,1], 'k')
plt.fill_between(wavelengths.flatten(), P.Beta[:,1] - P.BetaFitError[:,1], P.Beta[:,1] + P.BetaFitError[:,1], color='blue', alpha=0.7)

# Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)
YF = P.Eval(X)
dYFp = P.EvalPredictionError(X)
frac = np.mean((Y >= YF-dYFp) & (Y <= YF + dYFp))
print(frac)
print(np.min(P.AdjR2))
```

#### X-parameter rescaling

In order to avoid numerical imprecision problems, $X$ should be rescaled to

$$\xi = \frac{X - \mu_X}{\sigma_X}$$

where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of $X$. Note that when $\sigma_X = 0$, the matching value of $\xi$ should be 1 instead. This is the case for the first column in a polynomial regression (all values are ones).

Reminder:

$$ Y = X\beta + \epsilon $$

$$ \hat{Y} = X\beta $$

$$ \beta' = (\xi^T \xi)^{-1} \xi^T Y $$

$$ \beta = (X^T X)^{-1} X^T Y $$

We already know that $X$ is badly scaled, while $\xi$ is well scaled.
Using this, we could attempt to recover the original $\beta$ with

$$ \beta = ((\xi^T \xi)^{-1} \xi^T X)^{-1} \beta' $$

But keep in mind that everything hinges on that new matrix being decently scaled for the inversion to work properly. Also keep in mind that even if this works, we might still get scaling problems when doing the forward computation to get back $\hat{Y}$.

##### First example

Here we use a badly scaled polynomial and fit it with and without rescaling. It makes a huge difference: the unscaled fit is completely lost, while the scaled fit is spot on.

```python
from regressionpack.utilities import MatMul

all_coefs = [1, -5989, 11956034, -7956067976]

# Create some badly scaled data
nb = 100
X = np.zeros((nb,4))
x = np.linspace(1990, 2000, nb)
X[:,0] = 1
X[:,1] = x
X[:,2] = x**2
X[:,3] = x**3
coefs = np.reshape(list(reversed(all_coefs)), (4,1))
Y = MatMul(X, coefs)
y = Y.flatten()
y += np.random.randn(*y.shape) * np.std(y)*0.1

L_uns = Polynomial(x, y, 3)
L_uns.Fit()
YF_uns = L_uns.Eval(x)
dYF_uns = L_uns.EvalPredictionError(x)

L_sca = Polynomial(x, y, 3, rescale=True)
L_sca.Fit()
YF_sca = L_sca.Eval(x)
dYF_sca = L_sca.EvalPredictionError(x)

plt.plot(x, Y, '.k', label='Original')
plt.plot(x, YF_uns, '--r', label='Unscaled Fit')
plt.fill_between(x, YF_uns - dYF_uns, YF_uns + dYF_uns, color='r', alpha=0.3)
plt.plot(x, YF_sca, '-.g', label='Scaled Fit')
plt.fill_between(x, YF_sca - dYF_sca, YF_sca + dYF_sca, color='g', alpha=0.3)

L_sca.Beta/coefs.flatten() - 1
```

##### Second example

Here we use data with an even worse scaling to see how it goes. While the rescaling does improve the result, it won't do magic.

```python
# Create some problematic data.
# The x is badly conditioned in that its spread is small compared to its mean.
# This may cause numerical precision problems in the Vandermonde matrix.
x = np.linspace(1525, 1565, 101)*1e-9
X = np.zeros((101,5))
for i in range(5):
    X[:,i] = x**i

coefs = np.reshape([56854106.8500000, -147284967000000., 1.43076500000000e+20, -6.17700000000000e+25, 1.00000000000000e+31], (-1,1))
y = MatMul(X, coefs)
y += np.random.randn(*y.shape) * .03*np.std(y)
y = y.flatten()

# Perform polynomial fit
P_unsc = Polynomial(x, y, 4, rescale=False)
P_unsc.Fit()
yf_p = P_unsc.Eval(x)
dyf_p = P_unsc.EvalPredictionError(x)

# Perform scaled fit
P_sca = Polynomial(x, y, 4, rescale=True)
P_sca.Fit()
yf_sp = P_sca.Eval(x)
dyf_sp = P_sca.EvalPredictionError(x)

# Plot results
fig, axes = plt.subplots(2, 1, sharex=True)
[axe.grid(True) for axe in axes]
axes[0].set_ylabel('Unscaled')
axes[1].set_ylabel('Scaled')
axes[1].set_xlabel('$x$')

# Unscaled
x2 = x * 1e9
axes[0].plot(x2, y, '.k')
axes[0].plot(x2, yf_p, '.-r')
axes[0].fill_between(x2, yf_p-dyf_p, yf_p+dyf_p, color='r', alpha=0.3)

# Scaled
axes[1].plot(x2, y, '.k')
axes[1].plot(x2, yf_sp, '.-r')
axes[1].fill_between(x2, yf_sp-dyf_sp, yf_sp+dyf_sp, color='r', alpha=0.3)
plt.show()

# Ratio to real coefficients
P_sca.Beta / coefs.flatten() - 1
```

### GenericCurveFit

A simple way of wrapping up a custom fit function and its error bar calculations, all based on scipy.optimize.curve_fit.
Example using a custom class with inheritance:

```python
class myCustomFit(GenericCurveFit):

    def FitFunc(self, x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
        return a * x + np.exp(-b*x**2) + np.sin(c*x)

    def Jacobian(self, x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
        out = np.zeros((x.shape[0],) + (3,))
        out[:,0] = x
        out[:,1] = -x**2 * np.exp(-b*x**2)
        out[:,2] = x*np.cos(c*x)
        return out

    def __init__(self, x:np.ndarray, y:np.ndarray, p0:np.ndarray=None, bounds=(-np.inf, np.inf), confidenceInterval:float=0.95, simult:bool=False, **kwargs):
        super(myCustomFit, self).__init__(x, y, self.FitFunc, self.Jacobian, p0, bounds, confidenceInterval, simult, **kwargs)

# Example using standalone functions directly in the GenericCurveFit:
def func(x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
    return a * x + np.exp(-b*x**2) + np.sin(c*x)

def jac(x:np.ndarray, a:float, b:float, c:float) -> np.ndarray:
    out = np.zeros((x.shape[0],) + (3,))
    out[:,0] = x
    out[:,1] = -x**2 * np.exp(-b*x**2)
    out[:,2] = x*np.cos(c*x)
    return out

# Change this value to use the custom class or the standalone functions. The result should be the same.
useClass = True

# Generate some data
x = np.linspace(-3,5,3000)
p = (.1,.2,6.2)
y = func(x, *p)
y += np.random.randn(*y.shape) * 0.3

# Pick between the class and the standalone functions
if useClass:
    GCF = myCustomFit(x, y, (1,1,7))
else:
    GCF = GenericCurveFit(x, y, func, jac, (1,1,7))

# Perform the fitting
GCF.Fit()

# Evaluate at a greater resolution
xx = np.linspace(x.min(), x.max(), 1000)
yy = GCF.Eval(xx)
dyy = GCF.EvalPredictionError(xx)

# Show results and compare
plt.plot(x, y, '.', label='data')
plt.plot(xx, yy, '--', label='Fit')
plt.fill_between(xx, yy-dyy, yy+dyy, alpha=0.3)

# Check what fraction of the data is within the prediction interval
yf = GCF.Eval(x)
dyf = GCF.EvalPredictionError(x)
print("Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)")
frac = np.mean((y <= yf + dyf) & (y >= yf - dyf))
print(frac)

print("Compare parameters with results")
print(p)
print(GCF.Beta)
print(GCF.BetaFitError)
print(GCF.AdjR2)
```

#### Cosine fit

This cosine fit inherits from GenericCurveFit. It also uses the Fourier transform of the data to make the initial guess of the fit parameters. The amplitude $A$ is approximated by the highest-amplitude peak of the FFT (excluding the DC offset), the frequency $\omega$ is the one associated with that peak, the phase shift $\phi$ is the angle of the complex FFT component, and the constant is the DC offset. Using those as initial guesses, we proceed with least-squares fitting on the objective function:

$$\hat{y} = A \cos{\left( \omega x + \phi \right)} + b$$

To increase speed, improve the results, and enable the computation of the error bars, one must provide the Jacobian. Inheriting from GenericCurveFit allows us to implement the cosine fit in a few lines of code (feel free to look in CosineFit.py). Ok, I cheated a little bit: a key component here is the educated guess of the initial fit parameters. The function that does this (FFTGuess) actually takes more lines than this entire class. Also, since the FFT can be efficiently run over large multidimensional datasets and provides quite a good estimate, I judged it deserved its own spot among the utility functions of this package.
```python
x = np.linspace(-3,8,100)
y = 3*np.cos(4*x + .66) - 2
y += np.random.randn(*y.shape)*0.3

CF = CosineFit(x, y)
CF.Fit()
yf = CF.Eval(x)
dyf = CF.EvalPredictionError(x)

print("Parameters vs fit results:")
print((3, 4, .66, -2))
print(CF.Beta)
print(CF.BetaFitError)

plt.figure()
plt.plot(x, y, '.k')
plt.plot(x, yf, '--r')
plt.fill_between(x, yf-dyf, yf+dyf, color='b', alpha=0.3)

print("Check that the fraction of points inside the prediction interval matches the confidence interval (should be around 0.95)")
factor = np.mean((y <= yf + dyf) & (y >= yf - dyf))
print(factor)
```

#### Exponential

```python
x = np.linspace(-3,5)
y = 2 * np.exp(2.4*x) + 3
y += np.random.randn(*y.shape)*3
plt.plot(x, y, '.')

E = Exponential(x, y)
E.Fit()

xx = np.linspace(x.min(), x.max(), 1000)
yy = E.Eval(xx)
dyy = E.EvalPredictionError(xx)

plt.plot(xx, yy, '--')
plt.fill_between(xx, yy-dyy, yy+dyy, alpha=0.3)
plt.yscale('log')
print(E.Beta)
print(E.BetaFitError)
```

#### System Identification

Let's analyze a simple spring-mass-damper system with transfer function

$$ G(s) = \frac{\frac{c}{m}s + \frac{k}{m}}{s^2 + \frac{c}{m}s + \frac{k}{m}} $$

Let's say that for some reason the values of $c$, $m$ and $k$ are unknown to us, but we can measure the input $r(t)$ and output $c(t)$.

```python
from regressionpack import SystemIdentification
from scipy.signal import TransferFunction, lsim, impulse
from scipy.special import erf

# Define some constants
m = 500
k = 3.2e6
c = 10e3
h_dip = 12.8e-3
nb = 1000
T = 2
t = np.linspace(0, T, nb, endpoint=False)

# The transfer function
num = [c/m, k/m]
den = [1, c/m, k/m]
G = TransferFunction(num, den)

# The measured input and output values (with noise)
r = h_dip * (erf(30*(t-T/4))+1)
_, c, _ = lsim(G, r, t)
r += np.random.randn(*r.shape)*1e-4
c += np.random.randn(*c.shape)*1e-4

# Extracting the transfer function
SI = SystemIdentification(t, r, c, 2, 3)
SI.Fit()

r_imp = np.zeros_like(r)
r_imp[0] = 2*nb/T
_, c_imp_theo = impulse(G, T=t)
c_imp_esti = SI.ForcedResponse(t, r_imp)
c_imp_err = SI.ForcedResponsePredictionError(t, r_imp)

# Show results
fig, axes = plt.subplots(1, 2, sharex=True)
#[ax.grid(True) for ax in axes]
axes[0].plot(t, r, label='$r(t)$')
axes[0].plot(t, c, label='$c(t)$')
axes[0].legend()
axes[1].plot(t, c_imp_theo, '-k', label='$g(t)$')
axes[1].plot(t, c_imp_esti, '--r', label=r'$\hat{g}(t)$')
axes[1].fill_between(t, c_imp_esti - c_imp_err, c_imp_esti + c_imp_err, color='r', alpha=0.3)
axes[1].legend()
print(G, SI.TransferFunction)
```

#### Gaussian

```python
from regressionpack import Gaussian

a, b, c, d = 4, 3, 2, 1
x = np.linspace(-3,3,100)
y = a * np.exp(-b * (x-c)**2) + d + np.random.randn(x.size)*0.1
plt.plot(x, y, '.k')

g = Gaussian(x, y)
g.Fit()
yf = g.Eval(x)
dyf = g.EvalPredictionError(x)
plt.plot(x, yf, '-b')
plt.fill_between(x, yf-dyf, yf+dyf, color='b', alpha=0.3)
```

#### Ellipse

Not finished yet :(

## Built With

* [Python](https://www.python.org/) - The language
* [numpy](https://numpy.org/) - the numeric library
* [scipy](https://docs.scipy.org/) - the scientific library

## Contributing

Contact me and discuss your ideas and ambitions.
## Authors

* **FusedSilica** - *Initial work*

## License

This project is licensed under the GNU LGPLv3 License - see the [LICENSE.md](LICENSE.md) file for details

%prep
%autosetup -n regressionpack-1.1.1

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-regressionpack -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed May 10 2023 Python_Bot - 1.1.1-1
- Package Spec generated