author     CoprDistGit <infra@openeuler.org>  2023-05-05 12:49:26 +0000
committer  CoprDistGit <infra@openeuler.org>  2023-05-05 12:49:26 +0000
commit     9e24ec97814a2798b7141d7b64eff16d89d81727 (patch)
tree       4e50bf9b8a7e6d1aa8807805710b8c7d421e2b96
parent     1891e86409a1c6e904bd6b56861e7d6bd77e79c9 (diff)

automatic import of python-c-lasso (openeuler20.03)

 -rw-r--r--  .gitignore             1
 -rw-r--r--  python-c-lasso.spec  1625
 -rw-r--r--  sources                1
 3 files changed, 1627 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index e69de29..9299d85 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/c-lasso-1.0.11.tar.gz
diff --git a/python-c-lasso.spec b/python-c-lasso.spec
new file mode 100644
index 0000000..76fc7ad
--- /dev/null
+++ b/python-c-lasso.spec
@@ -0,0 +1,1625 @@
+%global _empty_manifest_terminate_build 0
+Name: python-c-lasso
+Version: 1.0.11
+Release: 1
+Summary: Algorithms for constrained Lasso problems
+License: MIT
+URL: https://github.com/Leo-Simpson/CLasso
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/2f/2b/e668407260df3d2779b12d77eb85aa065cec19dad57e368a519d949c293f/c-lasso-1.0.11.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-h5py
+Requires: python3-scipy
+Requires: python3-sphinx
+Requires: python3-sphinx-gallery
+Requires: python3-sphinx-rtd-theme
+Requires: python3-numpydoc
+Requires: python3-matplotlib
+Requires: python3-pandas
+Requires: python3-pytest
+Requires: python3-pytest-cov
+
+%description
+[![arXiv](https://img.shields.io/badge/arXiv-2011.00898-b31b1b.svg)](https://arxiv.org/abs/2011.00898)
+[![DOI](https://joss.theoj.org/papers/10.21105/joss.02844/status.svg)](https://doi.org/10.21105/joss.02844)
+
+<img src="https://i.imgur.com/2nGwlux.png" alt="c-lasso" height="145" align="right"/>
+
+# c-lasso: a Python package for constrained sparse regression and classification
+
+
+c-lasso is a Python package that enables sparse and robust linear regression and classification with linear equality
+constraints on the model parameters. For detailed info, one can check the [documentation](https://c-lasso.readthedocs.io/en/latest/).
+
+The forward model is assumed to be:
+
+<img src="https://latex.codecogs.com/gif.latex?y=X\beta&plus;\sigma\epsilon\qquad\text{s.t.}\qquad&space;C\beta=0" title="y=X\beta+\sigma\epsilon\qquad\text{s.t.}\qquad C\beta=0" />
+
+Here, y and X are given outcome and predictor data. The vector y can be continuous (for regression) or binary (for classification). C is a general constraint matrix. The vector &beta; comprises the unknown coefficients and &sigma; an
+unknown scale.
+
+The package handles several different estimators for inferring &beta; (and &sigma;), including
+the constrained Lasso, the constrained scaled Lasso, sparse Huber M-estimation with linear equality constraints, and regularized Support Vector Machines.
+Several different algorithmic strategies, including path and proximal splitting algorithms, are implemented to solve
+the underlying convex optimization problems.
+
+We also include two model selection strategies for determining the sparsity of the model parameters: k-fold cross-validation and stability selection.
+
+This package is intended to fill the gap between popular Python tools such as [scikit-learn](https://scikit-learn.org/stable/), which cannot solve sparse constrained problems, and general-purpose optimization solvers, which do not scale well or are inaccurate for the considered problems (see [benchmarks](./benchmark/README.md)). In its current stage, however, c-lasso is not yet compatible with the scikit-learn API but is rather a stand-alone tool.
+
+Below we show several use cases of the package, including an application of sparse *log-contrast*
+regression tasks for *compositional* microbiome data.
+
+The code builds on results from several papers which can be found in the [References](#references). We also refer to the accompanying [JOSS paper submission](https://github.com/Leo-Simpson/c-lasso/blob/master/paper/paper.md), also available on [arXiv](https://arxiv.org/pdf/2011.00898.pdf).
+
+## Table of Contents
+
+* [Installation](#installation)
+* [Regression and classification problems](#regression-and-classification-problems)
+* [Getting started](#getting-started)
+* [Log-contrast regression for microbiome data](#log-contrast-regression-for-microbiome-data)
+* [Optimization schemes](#optimization-schemes)
+
+
+* [References](#references)
+
+
+## Installation
+
+c-lasso is available on pip. You can install the package
+in the shell using
+
+```shell
+pip install c-lasso
+```
+To use the c-lasso package in Python, type
+
+```python
+
+from classo import classo_problem
+# auxiliary functions such as random_data or csv_to_np can also be imported
+```
+
+The `c-lasso` package depends on the following Python packages:
+
+- `numpy`;
+- `matplotlib`;
+- `scipy`;
+- `pandas`;
+- `pytest` (for tests)
+
+## Regression and classification problems
+
+The c-lasso package can solve six different types of estimation problems:
+four regression-type and two classification-type formulations.
+
+#### [R1] Standard constrained Lasso regression:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;||&space;X\beta-y&space;||^2&space;&plus;&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
+
+This is the standard Lasso problem with linear equality constraints on the &beta; vector.
+The objective function combines Least-Squares for model fitting with l1 penalty for sparsity.
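+
+As a quick numerical check (not part of the package API; the helper below is hypothetical), the [R1] objective at a candidate &beta; can be evaluated directly:
+
+```python
+import numpy as np
+
+# Hypothetical helper: value of the [R1] objective at a candidate beta.
+# The constraint C @ beta == 0 is enforced by the solvers, not penalized here.
+def r1_objective(X, y, beta, lam):
+    return np.sum((X @ beta - y) ** 2) + lam * np.sum(np.abs(beta))
+```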
+
+#### [R2] Constrained sparse Huber regression:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;h_{\rho}(X\beta-y&space;)&space;&plus;&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
+
+This regression problem uses the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss) as objective function
+for robust model fitting with l1 and linear equality constraints on the &beta; vector. The parameter &rho;=1.345.
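+
+For intuition, a minimal NumPy sketch of the Huber function h<sub>&rho;</sub> used above (assuming the normalization that is quadratic for |r| &le; &rho; and linear beyond; the helper is hypothetical):
+
+```python
+import numpy as np
+
+# Quadratic near zero, linear in the tails; value and slope match at |r| = rho.
+def huber(r, rho=1.345):
+    a = np.abs(r)
+    return np.where(a <= rho, a ** 2, 2 * rho * a - rho ** 2)
+
+# The [R2] data-fit term is then huber(X @ beta - y).sum()
+```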
+
+#### [R3] Constrained scaled Lasso regression:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\frac{||&space;X\beta&space;-&space;y||^2}{\sigma}&space;&plus;&space;\frac{n}{2}&space;\sigma&plus;&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \frac{|| X\beta - y||^2}{\sigma} + \frac{n}{2} \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
+
+This formulation is similar to [R1] but allows for joint estimation of the (constrained) &beta; vector and
+the standard deviation &sigma; in a concomitant fashion (see [References](#references) [4,5] for further info).
+This is the default problem formulation in c-lasso.
+
+#### [R4] Constrained sparse Huber regression with concomitant scale estimation:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\left(&space;h_{\rho}&space;\left(&space;\frac{&space;X\beta&space;-&space;y}{\sigma}&space;\right)&plus;&space;n&space;\right)&space;\sigma&plus;&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \left( h_{\rho} \left( \frac{ X\beta - y}{\sigma} \right)+ n \right) \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
+
+This formulation combines [R2] and [R3] to allow robust joint estimation of the (constrained) &beta; vector and
+the scale &sigma; in a concomitant fashion (see [References](#references) [4,5] for further info).
+
+#### [C1] Constrained sparse classification with Square Hinge loss:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d}&space;\sum_{i=1}^n&space;l(y_i&space;x_i^\top&space;\beta)&space;&plus;&space;\lambda&space;\left\lVert&space;\beta\right\rVert_1&space;\qquad&space;s.t.&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d} \sum_{i=1}^n l(y_i x_i \beta) + \lambda \left\lVert \beta\right\rVert_1 \qquad s.t. \qquad C\beta = 0" />
+
+where the x<sub>i</sub> are the rows of X and l is defined as:
+
+<img src="https://latex.codecogs.com/gif.latex?l(r)&space;=&space;\begin{cases}&space;(1-r)^2&space;&&space;if&space;\quad&space;r&space;\leq&space;1&space;\\&space;0&space;&if&space;\quad&space;r&space;\geq&space;1&space;\end{cases}" title="l(r) = \begin{cases} (1-r)^2 & if \quad r \leq 1 \\ 0 &if \quad r \geq 1 \end{cases}" />
+
+This formulation is similar to [R1] but adapted for classification tasks using the Square Hinge loss
+with (constrained) sparse &beta; vector estimation.
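+
+A minimal NumPy sketch of this loss (hypothetical helper, matching the case definition above):
+
+```python
+import numpy as np
+
+# Squared hinge: (1 - r)^2 for r <= 1, zero for r >= 1.
+def squared_hinge(r):
+    return np.maximum(0.0, 1.0 - r) ** 2
+
+# The [C1] data-fit term is then squared_hinge(y * (X @ beta)).sum()
+```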
+
+#### [C2] Constrained sparse classification with Huberized Square Hinge loss:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d}&space;\sum_{i=1}^n&space;l_{\rho}(y_i&space;x_i^\top\beta)&space;&plus;&space;\lambda&space;\left\lVert&space;\beta\right\rVert_1&space;\qquad&space;s.t.&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d} \sum_{i=1}^n l_{\rho}(y_i x_i\beta) + \lambda \left\lVert \beta\right\rVert_1 \qquad s.t. \qquad C\beta = 0" />
+
+where the x<sub>i</sub> are the rows of X and l<sub>ρ</sub> is defined as:
+
+<img src="https://latex.codecogs.com/gif.latex?l_{\rho}(r)&space;=&space;\begin{cases}&space;(1-r)^2&space;&if&space;\quad&space;\rho&space;\leq&space;r&space;\leq&space;1&space;\\&space;(1-\rho)(1&plus;\rho-2r)&space;&&space;if&space;\quad&space;r&space;\leq&space;\rho&space;\\&space;0&space;&if&space;\quad&space;r&space;\geq&space;1&space;\end{cases}" title="l_{\rho}(r) = \begin{cases} (1-r)^2 &if \quad \rho \leq r \leq 1 \\ (1-\rho)(1+\rho-2r) & if \quad r \leq \rho \\ 0 &if \quad r \geq 1 \end{cases}" />
+
+
+This formulation is similar to [C1] but uses the Huberized Square Hinge loss for robust classification
+with (constrained) sparse &beta; vector estimation.
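+
+And a corresponding sketch of the Huberized version (again hypothetical, following the three cases above; &rho; < 1 is assumed so the quadratic region is non-empty):
+
+```python
+import numpy as np
+
+# Linear for r <= rho, quadratic on [rho, 1], zero for r >= 1;
+# continuous at both joins since (1 - rho)^2 = (1 - rho)(1 + rho - 2*rho).
+def huberized_squared_hinge(r, rho):
+    r = np.asarray(r, dtype=float)
+    out = np.where(r >= 1.0, 0.0, (1.0 - r) ** 2)
+    return np.where(r <= rho, (1.0 - rho) * (1.0 + rho - 2.0 * r), out)
+```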
+
+
+## Getting started
+
+#### Basic example
+
+We begin with a basic example that shows how to run c-lasso on synthetic data. This example and the next one can also be found in the notebook 'Synthetic data Notebook.ipynb'.
+
+The c-lasso package includes
+the routine ```random_data``` that allows you to generate problem instances using normally distributed data.
+
+```python
+m, d, d_nonzero, k, sigma = 100, 200, 5, 1, 0.5
+(X, C, y), sol = random_data(m, d, d_nonzero, k, sigma, zerosum=True, seed=1)
+```
+This code snippet generates a problem instance with a sparse &beta; in dimension
+d=200 (sparsity d_nonzero=5). The design matrix X comprises m=100 samples drawn i.i.d. from a standard normal
+distribution. The constraint matrix C has dimension k x d. The noise level is &sigma;=0.5.
+The input ```zerosum=True``` implies that C is the all-ones vector and C&beta;=0. The m-dimensional outcome vector y
+and the regression vector &beta; are then generated to satisfy the given constraints.
+
+Next we can define a default c-lasso problem instance with the generated data:
+```python
+problem = classo_problem(X, y, C)
+```
+You can look at the generated problem instance by typing:
+
+```python
+print(problem)
+```
+
+This gives you a summary of the form:
+
+```
+FORMULATION: R3
+
+MODEL SELECTION COMPUTED:
+ Stability selection
+
+STABILITY SELECTION PARAMETERS:
+ numerical_method : not specified
+ method : first
+ B = 50
+ q = 10
+ percent_nS = 0.5
+ threshold = 0.7
+ lamin = 0.01
+ Nlam = 50
+```
+As we have not specified any problem, algorithm, or model selection settings, this problem instance
+represents the *default* settings for a c-lasso instance:
+- The problem is of regression type and uses formulation [R3], i.e. with concomitant scale estimation.
+- The *default* optimization scheme is the path algorithm (see [Optimization schemes](#optimization-schemes) for further info).
+- For model selection, stability selection at a theoretically derived &lambda; value is used (see [References](#references) [4] for details). Stability selection comprises a relatively large number of parameters. For a description of the settings, we refer to the more advanced examples below and the API.
+
+You can solve the corresponding c-lasso problem instance using
+
+```python
+problem.solve()
+```
+
+After completion, the results of the optimization and model selection routines
+can be visualized using
+
+```python
+print(problem.solution)
+```
+
+The command shows the running time(s) for the c-lasso problem instance and the selected variables for stability selection:
+
+```
+STABILITY SELECTION :
+ Selected variables : 7 63 148 164 168
+ Running time : 1.546s
+
+```
+
+Here, we only used stability selection as the *default* model selection strategy.
+The command also allows you to inspect the computed stability profile for all variables
+at the theoretical &lambda;:
+
+![1.StabSel](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/basic/StabSel.png)
+
+
+The refitted &beta; values on the selected support are also displayed in the next plot
+
+![beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/basic/beta.png)
+
+
+#### Advanced example
+
+In the next example, we show how one can specify different aspects of the problem
+formulation and model selection strategy.
+
+```python
+m, d, d_nonzero, k, sigma = 100, 200, 5, 0, 0.5
+(X, C, y), sol = random_data(m, d, d_nonzero, k, sigma, zerosum = True, seed = 4)
+problem = classo_problem(X, y, C)
+problem.formulation.huber = True
+problem.formulation.concomitant = False
+problem.model_selection.CV = True
+problem.model_selection.LAMfixed = True
+problem.model_selection.PATH = True
+problem.model_selection.StabSelparameters.method = 'max'
+problem.model_selection.CVparameters.seed = 1
+problem.model_selection.LAMfixedparameters.rescaled_lam = True
+problem.model_selection.LAMfixedparameters.lam = .1
+
+problem.solve()
+print(problem)
+
+print(problem.solution)
+
+```
+
+Results:
+```
+ FORMULATION: R2
+
+ MODEL SELECTION COMPUTED:
+ Lambda fixed
+ Path
+ Cross Validation
+ Stability selection
+
+ LAMBDA FIXED PARAMETERS:
+ numerical_method = Path-Alg
+ rescaled lam : True
+ threshold = 0.09
+ lam = 0.1
+ theoretical_lam = 0.224
+
+ PATH PARAMETERS:
+ numerical_method : Path-Alg
+ lamin = 0.001
+ Nlam = 80
+
+
+ CROSS VALIDATION PARAMETERS:
+ numerical_method : Path-Alg
+ one-SE method : True
+ Nsubset = 5
+ lamin = 0.001
+ Nlam = 80
+
+
+ STABILITY SELECTION PARAMETERS:
+ numerical_method : Path-Alg
+ method : max
+ B = 50
+ q = 10
+ percent_nS = 0.5
+ threshold = 0.7
+ lamin = 0.01
+ Nlam = 50
+
+ LAMBDA FIXED :
+ Selected variables : 17 59 123
+ Running time : 0.104s
+
+ PATH COMPUTATION :
+ Running time : 0.638s
+
+ CROSS VALIDATION :
+ Selected variables : 16 17 57 59 64 73 74 76 93 115 123 134 137 181
+ Running time : 2.1s
+
+ STABILITY SELECTION :
+ Selected variables : 17 59 76 123 137
+ Running time : 6.062s
+
+```
+
+
+![2.StabSel](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/StabSel.png)
+
+![2.StabSel-beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/StabSel-beta.png)
+
+![2.CV-beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/CVbeta.png)
+
+![2.CV-graph](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/CV.png)
+
+![2.LAM-beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/beta.png)
+
+![2.Path](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/Beta-path.png)
+
+
+## Log-contrast regression for microbiome data
+
+In [the accompanying notebook](./examples/example-notebook.ipynb) we study several microbiome data sets. We showcase two examples below.
+
+#### BMI prediction using the COMBO dataset
+
+We first consider the [COMBO data set](./examples/COMBO_data) and show how to predict Body Mass Index (BMI) from microbial genus abundances and two non-compositional covariates using "filtered_data".
+
+```python
+import numpy as np
+from classo import csv_to_np, classo_problem, clr
+
+# Load microbiome and covariate data X
+X0 = csv_to_np('COMBO_data/complete_data/GeneraCounts.csv', begin = 0).astype(float)
+X_C = csv_to_np('COMBO_data/CaloriData.csv', begin = 0).astype(float)
+X_F = csv_to_np('COMBO_data/FatData.csv', begin = 0).astype(float)
+
+# Load BMI measurements y
+y = csv_to_np('COMBO_data/BMI.csv', begin = 0).astype(float)[:, 0]
+labels = csv_to_np('COMBO_data/complete_data/GeneraPhylo.csv').astype(str)[:, -1]
+
+
+# Normalize/transform data
+y = y - np.mean(y) #BMI data (n = 96)
+X_C = X_C - np.mean(X_C, axis = 0) #Covariate data (Calorie)
+X_F = X_F - np.mean(X_F, axis = 0) #Covariate data (Fat)
+X0 = clr(X0, 1 / 2).T
+
+# Set up design matrix and zero-sum constraints for 45 genera
+X = np.concatenate((X0, X_C, X_F, np.ones((len(X0), 1))), axis = 1) # Joint microbiome and covariate data and offset
+label = np.concatenate([labels, np.array(['Calorie', 'Fat', 'Bias'])])
+C = np.ones((1, len(X[0])))
+C[0, -1], C[0, -2], C[0, -3] = 0., 0., 0.
+
+
+# Set up c-lasso problem
+problem = classo_problem(X, y, C, label = label)
+
+
+# Use stability selection with theoretical lambda [Combettes & Müller, 2020b]
+problem.model_selection.StabSelparameters.method = 'lam'
+problem.model_selection.StabSelparameters.threshold_label = 0.5
+
+# Use formulation R3
+problem.formulation.concomitant = True
+
+problem.solve()
+print(problem)
+print(problem.solution)
+
+# Use formulation R4
+problem.formulation.huber = True
+problem.formulation.concomitant = True
+
+problem.solve()
+print(problem)
+print(problem.solution)
+
+```
+
+![3.Stability profile R3](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R3-StabSel.png)
+
+![3.Beta solution R3](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R3-StabSel-beta.png)
+
+![3.Stability profile R4](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R4-StabSel.png)
+
+![3.Beta solution R4](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R4-StabSel-beta.png)
+
+
+<!---
+<img src="https://i.imgur.com/8tFmM8T.png" alt="Central Park Soil Microbiome" height="250" align="right"/>
+#### pH prediction using the Central Park soil dataset
+The next microbiome example considers the [Central Park Soil dataset](./examples/pH_data) from [Ramirez et al.](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2014.1988). The sample locations are shown in the Figure on the right.)
+-->
+
+#### pH prediction using the 88 soils dataset
+
+The next microbiome example considers the [88 soils dataset](./examples/pH_data) from [Lauber et al., 2009](https://pubmed.ncbi.nlm.nih.gov/19502440/).
+
+The task is to predict the pH of the soil from microbial abundance data. A similar analysis is available
+in [Tree-Aggregated Predictive Modeling of Microbiome Data](https://www.biorxiv.org/content/10.1101/2020.09.01.277632v1)
+with Central Park soil data from [Ramirez et al.](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2014.1988).
+
+Code to run this application is available in [the accompanying notebook](./examples/example-notebook.ipynb) under `pH data`. Below is a summary of a c-lasso problem instance (using the R3 formulation).
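+
+For orientation, a hedged sketch of the settings that would produce the summary below; data loading is omitted, and X, y, C are assumed to come from the notebook's pH-data preprocessing:
+
+```python
+problem = classo_problem(X, y, C)
+problem.formulation.concomitant = True                    # formulation R3
+problem.model_selection.LAMfixed = True                   # solve at a fixed (theoretical) lambda
+problem.model_selection.PATH = True                       # compute the full lambda path
+problem.model_selection.StabSelparameters.method = 'lam'  # stability selection at theoretical lambda
+problem.solve()
+print(problem)
+print(problem.solution)
+```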
+
+```
+FORMULATION: R3
+
+MODEL SELECTION COMPUTED:
+ Lambda fixed
+ Path
+ Stability selection
+
+LAMBDA FIXED PARAMETERS:
+ numerical_method = Path-Alg
+ rescaled lam : True
+ threshold = 0.004
+ lam : theoretical
+ theoretical_lam = 0.2182
+
+PATH PARAMETERS:
+ numerical_method : Path-Alg
+ lamin = 0.001
+ Nlam = 80
+
+
+STABILITY SELECTION PARAMETERS:
+ numerical_method : Path-Alg
+ method : lam
+ B = 50
+ q = 10
+ percent_nS = 0.5
+ threshold = 0.7
+ lam = theoretical
+ theoretical_lam = 0.3085
+```
+
+The c-lasso estimation results are summarized below:
+
+```
+LAMBDA FIXED :
+ Sigma = 0.198
+ Selected variables : 14 18 19 39 43 57 62 85 93 94 104 107
+ Running time : 0.008s
+
+ PATH COMPUTATION :
+ Running time : 0.12s
+
+ STABILITY SELECTION :
+ Selected variables : 2 12 15
+ Running time : 0.287s
+```
+
+![Ex4.1](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-Beta-path.png)
+
+![Ex4.2](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-Sigma-path.png)
+
+![Ex4.3](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-StabSel.png)
+
+![Ex4.4](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-StabSel-beta.png)
+
+![Ex4.5](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-beta.png)
+
+
+## Optimization schemes
+
+The available problem formulations [R1-C2] require different algorithmic strategies for
+efficiently solving the underlying optimization problem. We have implemented four
+algorithms (with provable convergence guarantees) that vary in generality and are not
+necessarily applicable to all problems. For each problem type, c-lasso has a default algorithm
+setting that proved to be the fastest in our numerical experiments.
+
+### Path algorithms (Path-Alg)
+This is the default algorithm for non-concomitant problems [R1,R2,C1,C2].
+The algorithm uses the fact that the solution path along &lambda; is piecewise
+affine (as shown, e.g., in [1]). When Least-Squares is used as objective function,
+we derive a novel efficient procedure that allows us to also derive the
+solution for the concomitant problem [R3] along the path with little extra computational overhead.
+
+### Projected primal-dual splitting method (P-PDS):
+This algorithm is derived from [2] and belongs to the class of
+proximal splitting algorithms. It extends the classical Forward-Backward (FB)
+(aka proximal gradient descent) algorithm to handle an additional linear equality constraint
+via projection. In the absence of a linear constraint, the method reduces to FB.
+This method can solve problem [R1]. For the Huber problem [R2],
+P-PDS can solve the mean-shift formulation of the problem (see [6]).
+
+### Projection-free primal-dual splitting method (PF-PDS):
+This algorithm is a special case of an algorithm proposed in [3] (Eq.4.5) and also belongs to the class of
+proximal splitting algorithms. The algorithm does not require projection operators
+which may be beneficial when C has a more complex structure. In the absence of a linear constraint,
+the method reduces to the Forward-Backward-Forward scheme. This method can solve problem [R1].
+For the Huber problem [R2], PF-PDS can solve the mean-shift formulation of the problem (see [6]).
+
+### Douglas-Rachford-type splitting method (DR)
+This algorithm is the most general one and can solve all regression problems
+[R1-R4]. It is based on Douglas-Rachford splitting in a higher-dimensional product space.
+It makes use of the proximity operators of the perspective of the LS objective (see [4,5]).
+The Huber problem with concomitant scale [R4] is reformulated as a scaled Lasso problem
+with the mean shift (see [6]) and is thus solved in (n + d) dimensions.
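+
+The solver used for each model-selection task can be changed through the numerical_method fields that appear in the printed summaries above. A hedged sketch (the accepted value strings are assumed to match the abbreviations in these headings):
+
+```python
+problem = classo_problem(X, y, C)
+# Per-task solver choice; 'Path-Alg' appears in the summaries above,
+# 'DR' follows the abbreviation used for Douglas-Rachford splitting.
+problem.model_selection.StabSelparameters.numerical_method = 'Path-Alg'
+problem.model_selection.LAMfixedparameters.numerical_method = 'DR'
+problem.solve()
+```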
+
+
+
+## References
+
+* [1] B. R. Gaines, J. Kim, and H. Zhou, [Algorithms for Fitting the Constrained Lasso](https://www.tandfonline.com/doi/abs/10.1080/10618600.2018.1473777?journalCode=ucgs20), J. Comput. Graph. Stat., vol. 27, no. 4, pp. 861–871, 2018.
+
+* [2] L. Briceno-Arias and S.L. Rivera, [A Projected Primal–Dual Method for Solving Constrained Monotone Inclusions](https://link.springer.com/article/10.1007/s10957-018-1430-2?shared-article-renderer), J. Optim. Theory Appl., vol. 180, Issue 3, March 2019.
+
+* [3] P. L. Combettes and J.C. Pesquet, [Primal-Dual Splitting Algorithm for Solving Inclusions with Mixtures of Composite, Lipschitzian, and Parallel-Sum Type Monotone Operators](https://arxiv.org/pdf/1107.0081.pdf), Set-Valued and Variational Analysis, vol. 20, pp. 307-330, 2012.
+
+* [4] P. L. Combettes and C. L. Müller, [Perspective M-estimation via proximal decomposition](https://arxiv.org/abs/1805.06098), Electronic Journal of Statistics, 2020, [Journal version](https://projecteuclid.org/euclid.ejs/1578452535)
+
+* [5] P. L. Combettes and C. L. Müller, [Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applications](https://arxiv.org/abs/1903.01050), Statistics in Bioscience, 2020.
+
+* [6] A. Mishra and C. L. Müller, [Robust regression with compositional covariates](https://arxiv.org/abs/1909.04990), arXiv, 2019.
+
+* [7] S. Rosset and J. Zhu, [Piecewise linear regularized solution paths](https://projecteuclid.org/euclid.aos/1185303996), Ann. Stat., vol. 35, no. 3, pp. 1012–1030, 2007.
+
+* [8] J. Bien, X. Yan, L. Simpson, and C. L. Müller, [Tree-Aggregated Predictive Modeling of Microbiome Data](https://www.biorxiv.org/content/10.1101/2020.09.01.277632v1), biorxiv, 2020.
+
+
+
+
+
+
+%package -n python3-c-lasso
+Summary: Algorithms for constrained Lasso problems
+Provides: python-c-lasso
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-c-lasso
+[![arXiv](https://img.shields.io/badge/arXiv-2011.00898-b31b1b.svg)](https://arxiv.org/abs/2011.00898)
+[![DOI](https://joss.theoj.org/papers/10.21105/joss.02844/status.svg)](https://doi.org/10.21105/joss.02844)
+
+<img src="https://i.imgur.com/2nGwlux.png" alt="c-lasso" height="145" align="right"/>
+
+# c-lasso: a Python package for constrained sparse regression and classification
+
+
+c-lasso is a Python package that enables sparse and robust linear regression and classification with linear equality
+constraints on the model parameters. For detailed info, one can check the [documentation](https://c-lasso.readthedocs.io/en/latest/).
+
+The forward model is assumed to be:
+
+<img src="https://latex.codecogs.com/gif.latex?y=X\beta&plus;\sigma\epsilon\qquad\text{s.t.}\qquad&space;C\beta=0" title="y=X\beta+\sigma\epsilon\qquad\text{s.t.}\qquad C\beta=0" />
+
+Here, y and X are given outcome and predictor data. The vector y can be continuous (for regression) or binary (for classification). C is a general constraint matrix. The vector &beta; comprises the unknown coefficients and &sigma; an
+unknown scale.
+
+The package handles several different estimators for inferring &beta; (and &sigma;), including
+the constrained Lasso, the constrained scaled Lasso, sparse Huber M-estimation with linear equality constraints, and regularized Support Vector Machines.
+Several different algorithmic strategies, including path and proximal splitting algorithms, are implemented to solve
+the underlying convex optimization problems.
+
+We also include two model selection strategies for determining the sparsity of the model parameters: k-fold cross-validation and stability selection.
+
+This package is intended to fill the gap between popular Python tools such as [scikit-learn](https://scikit-learn.org/stable/), which cannot solve sparse constrained problems, and general-purpose optimization solvers, which do not scale well or are inaccurate for the considered problems (see [benchmarks](./benchmark/README.md)). In its current stage, however, c-lasso is not yet compatible with the scikit-learn API but is rather a stand-alone tool.
+
+Below we show several use cases of the package, including an application of sparse *log-contrast*
+regression tasks for *compositional* microbiome data.
+
+The code builds on results from several papers which can be found in the [References](#references). We also refer to the accompanying [JOSS paper submission](https://github.com/Leo-Simpson/c-lasso/blob/master/paper/paper.md), also available on [arXiv](https://arxiv.org/pdf/2011.00898.pdf).
+
+## Table of Contents
+
+* [Installation](#installation)
+* [Regression and classification problems](#regression-and-classification-problems)
+* [Getting started](#getting-started)
+* [Log-contrast regression for microbiome data](#log-contrast-regression-for-microbiome-data)
+* [Optimization schemes](#optimization-schemes)
+
+
+* [References](#references)
+
+
+## Installation
+
+c-lasso is available on pip. You can install the package
+in the shell using
+
+```shell
+pip install c-lasso
+```
+To use the c-lasso package in Python, type
+
+```python
+
+from classo import classo_problem
+# auxiliary functions such as random_data or csv_to_np can also be imported
+```
+
+The `c-lasso` package depends on the following Python packages:
+
+- `numpy`;
+- `matplotlib`;
+- `scipy`;
+- `pandas`;
+- `pytest` (for tests)
+
+## Regression and classification problems
+
+The c-lasso package can solve six different types of estimation problems:
+four regression-type and two classification-type formulations.
+
+#### [R1] Standard constrained Lasso regression:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;||&space;X\beta-y&space;||^2&space;&plus;&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
+
+This is the standard Lasso problem with linear equality constraints on the &beta; vector.
+The objective function combines Least-Squares for model fitting with l1 penalty for sparsity.
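+
+As a quick numerical check (not part of the package API; the helper below is hypothetical), the [R1] objective at a candidate &beta; can be evaluated directly:
+
+```python
+import numpy as np
+
+# Hypothetical helper: value of the [R1] objective at a candidate beta.
+# The constraint C @ beta == 0 is enforced by the solvers, not penalized here.
+def r1_objective(X, y, beta, lam):
+    return np.sum((X @ beta - y) ** 2) + lam * np.sum(np.abs(beta))
+```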
+
+#### [R2] Constrained sparse Huber regression:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;h_{\rho}(X\beta-y&space;)&space;&plus;&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
+
+This regression problem uses the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss) as objective function
+for robust model fitting with l1 and linear equality constraints on the &beta; vector. The parameter &rho;=1.345.
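+
+For intuition, a minimal NumPy sketch of the Huber function h<sub>&rho;</sub> used above (assuming the normalization that is quadratic for |r| &le; &rho; and linear beyond; the helper is hypothetical):
+
+```python
+import numpy as np
+
+# Quadratic near zero, linear in the tails; value and slope match at |r| = rho.
+def huber(r, rho=1.345):
+    a = np.abs(r)
+    return np.where(a <= rho, a ** 2, 2 * rho * a - rho ** 2)
+
+# The [R2] data-fit term is then huber(X @ beta - y).sum()
+```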
+
+#### [R3] Constrained scaled Lasso regression:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\frac{||&space;X\beta&space;-&space;y||^2}{\sigma}&space;&plus;&space;\frac{n}{2}&space;\sigma&plus;&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \frac{|| X\beta - y||^2}{\sigma} + \frac{n}{2} \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
+
+This formulation is similar to [R1] but allows for joint estimation of the (constrained) &beta; vector and
+the standard deviation &sigma; in a concomitant fashion (see [References](#references) [4,5] for further info).
+This is the default problem formulation in c-lasso.
+
+#### [R4] Constrained sparse Huber regression with concomitant scale estimation:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\left(&space;h_{\rho}&space;\left(&space;\frac{&space;X\beta&space;-&space;y}{\sigma}&space;\right)&plus;&space;n&space;\right)&space;\sigma&plus;&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \left( h_{\rho} \left( \frac{ X\beta - y}{\sigma} \right)+ n \right) \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
+
+This formulation combines [R2] and [R3] to allow robust joint estimation of the (constrained) &beta; vector and
+the scale &sigma; in a concomitant fashion (see [References](#references) [4,5] for further info).
+
+#### [C1] Constrained sparse classification with Square Hinge loss:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d}&space;\sum_{i=1}^n&space;l(y_i&space;x_i^\top&space;\beta)&space;&plus;&space;\lambda&space;\left\lVert&space;\beta\right\rVert_1&space;\qquad&space;s.t.&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d} \sum_{i=1}^n l(y_i x_i \beta) + \lambda \left\lVert \beta\right\rVert_1 \qquad s.t. \qquad C\beta = 0" />
+
+where the x<sub>i</sub> are the rows of X and l is defined as:
+
+<img src="https://latex.codecogs.com/gif.latex?l(r)&space;=&space;\begin{cases}&space;(1-r)^2&space;&&space;if&space;\quad&space;r&space;\leq&space;1&space;\\&space;0&space;&if&space;\quad&space;r&space;\geq&space;1&space;\end{cases}" title="l(r) = \begin{cases} (1-r)^2 & if \quad r \leq 1 \\ 0 &if \quad r \geq 1 \end{cases}" />
+
+This formulation is similar to [R1] but adapted for classification tasks using the Square Hinge loss
+with (constrained) sparse &beta; vector estimation.
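+
+A minimal NumPy sketch of this loss (hypothetical helper, matching the case definition above):
+
+```python
+import numpy as np
+
+# Squared hinge: (1 - r)^2 for r <= 1, zero for r >= 1.
+def squared_hinge(r):
+    return np.maximum(0.0, 1.0 - r) ** 2
+
+# The [C1] data-fit term is then squared_hinge(y * (X @ beta)).sum()
+```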
+
+#### [C2] Constrained sparse classification with Huberized Square Hinge loss:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d}&space;\sum_{i=1}^n&space;l_{\rho}(y_i&space;x_i^\top\beta)&space;&plus;&space;\lambda&space;\left\lVert&space;\beta\right\rVert_1&space;\qquad&space;s.t.&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d} \sum_{i=1}^n l_{\rho}(y_i x_i\beta) + \lambda \left\lVert \beta\right\rVert_1 \qquad s.t. \qquad C\beta = 0" />
+
+where the x<sub>i</sub> are the rows of X and l<sub>ρ</sub> is defined as:
+
+<img src="https://latex.codecogs.com/gif.latex?l_{\rho}(r)&space;=&space;\begin{cases}&space;(1-r)^2&space;&if&space;\quad&space;\rho&space;\leq&space;r&space;\leq&space;1&space;\\&space;(1-\rho)(1&plus;\rho-2r)&space;&&space;if&space;\quad&space;r&space;\leq&space;\rho&space;\\&space;0&space;&if&space;\quad&space;r&space;\geq&space;1&space;\end{cases}" title="l_{\rho}(r) = \begin{cases} (1-r)^2 &if \quad \rho \leq r \leq 1 \\ (1-\rho)(1+\rho-2r) & if \quad r \leq \rho \\ 0 &if \quad r \geq 1 \end{cases}" />
+
+
+This formulation is similar to [C1] but uses the Huberized Square Hinge loss for robust classification
+with (constrained) sparse &beta; vector estimation.
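+
+And a corresponding sketch of the Huberized version (again hypothetical, following the three cases above; &rho; < 1 is assumed so the quadratic region is non-empty):
+
+```python
+import numpy as np
+
+# Linear for r <= rho, quadratic on [rho, 1], zero for r >= 1;
+# continuous at both joins since (1 - rho)^2 = (1 - rho)(1 + rho - 2*rho).
+def huberized_squared_hinge(r, rho):
+    r = np.asarray(r, dtype=float)
+    out = np.where(r >= 1.0, 0.0, (1.0 - r) ** 2)
+    return np.where(r <= rho, (1.0 - rho) * (1.0 + rho - 2.0 * r), out)
+```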
+
+
+## Getting started
+
+#### Basic example
+
+We begin with a basic example that shows how to run c-lasso on synthetic data. This example and the next one can also be found in the notebook 'Synthetic data Notebook.ipynb'.
+
+The c-lasso package includes
+the routine ```random_data``` that allows you to generate problem instances using normally distributed data.
+
+```python
+m, d, d_nonzero, k, sigma = 100, 200, 5, 1, 0.5
+(X, C, y), sol = random_data(m, d, d_nonzero, k, sigma, zerosum=True, seed=1)
+```
+This code snippet generates a problem instance with a sparse &beta; in dimension
+d=200 (sparsity d_nonzero=5). The design matrix X comprises m=100 samples drawn i.i.d. from a standard normal
+distribution. The constraint matrix C has dimension k x d. The noise level is &sigma;=0.5.
+The input ```zerosum=True``` implies that C is the all-ones vector and C&beta;=0. The m-dimensional outcome vector y
+and the regression vector &beta; are then generated to satisfy the given constraints.
+
+Next we can define a default c-lasso problem instance with the generated data:
+```python
+problem = classo_problem(X, y, C)
+```
+You can look at the generated problem instance by typing:
+
+```python
+print(problem)
+```
+
+This gives you a summary of the form:
+
+```
+FORMULATION: R3
+
+MODEL SELECTION COMPUTED:
+ Stability selection
+
+STABILITY SELECTION PARAMETERS:
+ numerical_method : not specified
+ method : first
+ B = 50
+ q = 10
+ percent_nS = 0.5
+ threshold = 0.7
+ lamin = 0.01
+ Nlam = 50
+```
+As we have not specified any problem, algorithm, or model selection settings, this problem instance
+represents the *default* settings for a c-lasso instance:
+- The problem is of regression type and uses formulation [R3], i.e. with concomitant scale estimation.
+- The *default* optimization scheme is the path algorithm (see [Optimization schemes](#optimization-schemes) for further info).
+- For model selection, stability selection at a theoretically derived &lambda; value is used (see [References](#references) [4] for details). Stability selection comprises a relatively large number of parameters. For a description of the settings, we refer to the more advanced examples below and the API.
+
+You can solve the corresponding c-lasso problem instance using
+
+```python
+problem.solve()
+```
+
+After completion, the results of the optimization and model selection routines
+can be visualized using
+
+```python
+print(problem.solution)
+```
+
+The command shows the running time(s) for the c-lasso problem instance and the selected variables for stability selection:
+
+```
+STABILITY SELECTION :
+ Selected variables : 7 63 148 164 168
+ Running time : 1.546s
+
+```
+
+Here, we only used stability selection as the *default* model selection strategy.
+The command also allows you to inspect the computed stability profile for all variables
+at the theoretical &lambda;:
+
+![1.StabSel](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/basic/StabSel.png)
+
+
+The refitted &beta; values on the selected support are also displayed in the next plot
+
+![beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/basic/beta.png)
+
+
+#### Advanced example
+
+In the next example, we show how one can specify different aspects of the problem
+formulation and model selection strategy.
+
+```python
+m, d, d_nonzero, k, sigma = 100, 200, 5, 0, 0.5
+(X, C, y), sol = random_data(m, d, d_nonzero, k, sigma, zerosum = True, seed = 4)
+problem = classo_problem(X, y, C)
+problem.formulation.huber = True
+problem.formulation.concomitant = False
+problem.model_selection.CV = True
+problem.model_selection.LAMfixed = True
+problem.model_selection.PATH = True
+problem.model_selection.StabSelparameters.method = 'max'
+problem.model_selection.CVparameters.seed = 1
+problem.model_selection.LAMfixedparameters.rescaled_lam = True
+problem.model_selection.LAMfixedparameters.lam = .1
+
+problem.solve()
+print(problem)
+
+print(problem.solution)
+
+```
+
+Results:
+```
+ FORMULATION: R2
+
+ MODEL SELECTION COMPUTED:
+ Lambda fixed
+ Path
+ Cross Validation
+ Stability selection
+
+ LAMBDA FIXED PARAMETERS:
+ numerical_method = Path-Alg
+ rescaled lam : True
+ threshold = 0.09
+ lam = 0.1
+ theoretical_lam = 0.224
+
+ PATH PARAMETERS:
+ numerical_method : Path-Alg
+ lamin = 0.001
+ Nlam = 80
+
+
+ CROSS VALIDATION PARAMETERS:
+ numerical_method : Path-Alg
+ one-SE method : True
+ Nsubset = 5
+ lamin = 0.001
+ Nlam = 80
+
+
+ STABILITY SELECTION PARAMETERS:
+ numerical_method : Path-Alg
+ method : max
+ B = 50
+ q = 10
+ percent_nS = 0.5
+ threshold = 0.7
+ lamin = 0.01
+ Nlam = 50
+
+ LAMBDA FIXED :
+ Selected variables : 17 59 123
+ Running time : 0.104s
+
+ PATH COMPUTATION :
+ Running time : 0.638s
+
+ CROSS VALIDATION :
+ Selected variables : 16 17 57 59 64 73 74 76 93 115 123 134 137 181
+ Running time : 2.1s
+
+ STABILITY SELECTION :
+ Selected variables : 17 59 76 123 137
+ Running time : 6.062s
+
+```
+
+
+![2.StabSel](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/StabSel.png)
+
+![2.StabSel-beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/StabSel-beta.png)
+
+![2.CV-beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/CVbeta.png)
+
+![2.CV-graph](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/CV.png)
+
+![2.LAM-beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/beta.png)
+
+![2.Path](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/Beta-path.png)
+
+
+## Log-contrast regression for microbiome data
+
+In [the accompanying notebook](./examples/example-notebook.ipynb) we study several microbiome data sets. We showcase two examples below.
+
+#### BMI prediction using the COMBO dataset
+
+We first consider the [COMBO data set](./examples/COMBO_data) and show how to predict Body Mass Index (BMI) from microbial genus abundances and two non-compositional covariates using "filtered_data".
+
+```python
+import numpy as np
+from classo import csv_to_np, classo_problem, clr
+
+# Load microbiome and covariate data X
+X0 = csv_to_np('COMBO_data/complete_data/GeneraCounts.csv', begin = 0).astype(float)
+X_C = csv_to_np('COMBO_data/CaloriData.csv', begin = 0).astype(float)
+X_F = csv_to_np('COMBO_data/FatData.csv', begin = 0).astype(float)
+
+# Load BMI measurements y
+y = csv_to_np('COMBO_data/BMI.csv', begin = 0).astype(float)[:, 0]
+labels = csv_to_np('COMBO_data/complete_data/GeneraPhylo.csv').astype(str)[:, -1]
+
+
+# Normalize/transform data
+y = y - np.mean(y) #BMI data (n = 96)
+X_C = X_C - np.mean(X_C, axis = 0) #Covariate data (Calorie)
+X_F = X_F - np.mean(X_F, axis = 0) #Covariate data (Fat)
+X0 = clr(X0, 1 / 2).T
+
+# Set up design matrix and zero-sum constraints for 45 genera
+X = np.concatenate((X0, X_C, X_F, np.ones((len(X0), 1))), axis = 1) # Joint microbiome and covariate data and offset
+label = np.concatenate([labels, np.array(['Calorie', 'Fat', 'Bias'])])
+C = np.ones((1, len(X[0])))
+C[0, -1], C[0, -2], C[0, -3] = 0., 0., 0.
+
+
+# Set up c-lasso problem
+problem = classo_problem(X, y, C, label = label)
+
+
+# Use stability selection with theoretical lambda [Combettes & Müller, 2020b]
+problem.model_selection.StabSelparameters.method = 'lam'
+problem.model_selection.StabSelparameters.threshold_label = 0.5
+
+# Use formulation R3
+problem.formulation.concomitant = True
+
+problem.solve()
+print(problem)
+print(problem.solution)
+
+# Use formulation R4
+problem.formulation.huber = True
+problem.formulation.concomitant = True
+
+problem.solve()
+print(problem)
+print(problem.solution)
+
+```
+
+![3.Stability profile R3](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R3-StabSel.png)
+
+![3.Beta solution R3](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R3-StabSel-beta.png)
+
+![3.Stability profile R4](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R4-StabSel.png)
+
+![3.Beta solution R4](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R4-StabSel-beta.png)
+
+
+<!---
+<img src="https://i.imgur.com/8tFmM8T.png" alt="Central Park Soil Microbiome" height="250" align="right"/>
+#### pH prediction using the Central Park soil dataset
+The next microbiome example considers the [Central Park Soil dataset](./examples/pH_data) from [Ramirez et al.](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2014.1988). The sample locations are shown in the Figure on the right.)
+-->
+
+#### pH prediction using the 88 soils dataset
+
+The next microbiome example considers the [88 soils dataset](./examples/pH_data) from [Lauber et al., 2009](https://pubmed.ncbi.nlm.nih.gov/19502440/).
+
+The task is to predict the pH of the soil from microbial abundance data. A similar analysis is available
+in [Tree-Aggregated Predictive Modeling of Microbiome Data](https://www.biorxiv.org/content/10.1101/2020.09.01.277632v1)
+with Central Park soil data from [Ramirez et al.](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2014.1988).
+
+Code to run this application is available in [the accompanying notebook](./examples/example-notebook.ipynb) under `pH data`. Below is a summary of a c-lasso problem instance (using the R3 formulation).
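+
+For orientation, a hedged sketch of the settings that would produce the summary below; data loading is omitted, and X, y, C are assumed to come from the notebook's pH-data preprocessing:
+
+```python
+problem = classo_problem(X, y, C)
+problem.formulation.concomitant = True                    # formulation R3
+problem.model_selection.LAMfixed = True                   # solve at a fixed (theoretical) lambda
+problem.model_selection.PATH = True                       # compute the full lambda path
+problem.model_selection.StabSelparameters.method = 'lam'  # stability selection at theoretical lambda
+problem.solve()
+print(problem)
+print(problem.solution)
+```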
+
+```
+FORMULATION: R3
+
+MODEL SELECTION COMPUTED:
+ Lambda fixed
+ Path
+ Stability selection
+
+LAMBDA FIXED PARAMETERS:
+ numerical_method = Path-Alg
+ rescaled lam : True
+ threshold = 0.004
+ lam : theoretical
+ theoretical_lam = 0.2182
+
+PATH PARAMETERS:
+ numerical_method : Path-Alg
+ lamin = 0.001
+ Nlam = 80
+
+
+STABILITY SELECTION PARAMETERS:
+ numerical_method : Path-Alg
+ method : lam
+ B = 50
+ q = 10
+ percent_nS = 0.5
+ threshold = 0.7
+ lam = theoretical
+ theoretical_lam = 0.3085
+```
+
+The c-lasso estimation results are summarized below:
+
+```
+LAMBDA FIXED :
+ Sigma = 0.198
+ Selected variables : 14 18 19 39 43 57 62 85 93 94 104 107
+ Running time : 0.008s
+
+ PATH COMPUTATION :
+ Running time : 0.12s
+
+ STABILITY SELECTION :
+ Selected variables : 2 12 15
+ Running time : 0.287s
+```
+
+![Ex4.1](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-Beta-path.png)
+
+![Ex4.2](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-Sigma-path.png)
+
+![Ex4.3](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-StabSel.png)
+
+![Ex4.4](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-StabSel-beta.png)
+
+![Ex4.5](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-beta.png)
+
+
+## Optimization schemes
+
+The available problem formulations [R1-C2] require different algorithmic strategies for
+efficiently solving the underlying optimization problem. We have implemented four
+algorithms (with provable convergence guarantees) that vary in generality and are not
+necessarily applicable to all problems. For each problem type, c-lasso has a default algorithm
+setting that proved to be the fastest in our numerical experiments.
+
+### Path algorithms (Path-Alg)
+This is the default algorithm for non-concomitant problems [R1,R2,C1,C2].
+The algorithm uses the fact that the solution path along &lambda; is piecewise
+affine (as shown, e.g., in [1]). When Least-Squares is used as objective function,
+we derive a novel efficient procedure that allows us to also derive the
+solution for the concomitant problem [R3] along the path with little extra computational overhead.
+
+### Projected primal-dual splitting method (P-PDS):
+This algorithm is derived from [2] and belongs to the class of
+proximal splitting algorithms. It extends the classical Forward-Backward (FB)
+(aka proximal gradient descent) algorithm to handle an additional linear equality constraint
+via projection. In the absence of a linear constraint, the method reduces to FB.
+This method can solve problem [R1]. For the Huber problem [R2],
+P-PDS can solve the mean-shift formulation of the problem (see [6]).
+
+### Projection-free primal-dual splitting method (PF-PDS):
+This algorithm is a special case of an algorithm proposed in [3] (Eq.4.5) and also belongs to the class of
+proximal splitting algorithms. The algorithm does not require projection operators
+which may be beneficial when C has a more complex structure. In the absence of a linear constraint,
+the method reduces to the Forward-Backward-Forward scheme. This method can solve problem [R1].
+For the Huber problem [R2], PF-PDS can solve the mean-shift formulation of the problem (see [6]).
+
+### Douglas-Rachford-type splitting method (DR)
+This algorithm is the most general one and can solve all regression problems
+[R1-R4]. It is based on Douglas-Rachford splitting in a higher-dimensional product space.
+It makes use of the proximity operators of the perspective of the LS objective (see [4,5]).
+The Huber problem with concomitant scale [R4] is reformulated as a scaled Lasso problem
+with the mean shift (see [6]) and is thus solved in (n + d) dimensions.
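+
+The solver used for each model-selection task can be changed through the numerical_method fields that appear in the printed summaries above. A hedged sketch (the accepted value strings are assumed to match the abbreviations in these headings):
+
+```python
+problem = classo_problem(X, y, C)
+# Per-task solver choice; 'Path-Alg' appears in the summaries above,
+# 'DR' follows the abbreviation used for Douglas-Rachford splitting.
+problem.model_selection.StabSelparameters.numerical_method = 'Path-Alg'
+problem.model_selection.LAMfixedparameters.numerical_method = 'DR'
+problem.solve()
+```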
+
+
+
+## References
+
+* [1] B. R. Gaines, J. Kim, and H. Zhou, [Algorithms for Fitting the Constrained Lasso](https://www.tandfonline.com/doi/abs/10.1080/10618600.2018.1473777?journalCode=ucgs20), J. Comput. Graph. Stat., vol. 27, no. 4, pp. 861–871, 2018.
+
+* [2] L. Briceno-Arias and S.L. Rivera, [A Projected Primal–Dual Method for Solving Constrained Monotone Inclusions](https://link.springer.com/article/10.1007/s10957-018-1430-2?shared-article-renderer), J. Optim. Theory Appl., vol. 180, Issue 3, March 2019.
+
+* [3] P. L. Combettes and J.C. Pesquet, [Primal-Dual Splitting Algorithm for Solving Inclusions with Mixtures of Composite, Lipschitzian, and Parallel-Sum Type Monotone Operators](https://arxiv.org/pdf/1107.0081.pdf), Set-Valued and Variational Analysis, vol. 20, pp. 307-330, 2012.
+
+* [4] P. L. Combettes and C. L. Müller, [Perspective M-estimation via proximal decomposition](https://arxiv.org/abs/1805.06098), Electronic Journal of Statistics, 2020, [Journal version](https://projecteuclid.org/euclid.ejs/1578452535)
+
+* [5] P. L. Combettes and C. L. Müller, [Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applications](https://arxiv.org/abs/1903.01050), Statistics in Bioscience, 2020.
+
+* [6] A. Mishra and C. L. Müller, [Robust regression with compositional covariates](https://arxiv.org/abs/1909.04990), arXiv, 2019.
+
+* [7] S. Rosset and J. Zhu, [Piecewise linear regularized solution paths](https://projecteuclid.org/euclid.aos/1185303996), Ann. Stat., vol. 35, no. 3, pp. 1012–1030, 2007.
+
+* [8] J. Bien, X. Yan, L. Simpson, and C. L. Müller, [Tree-Aggregated Predictive Modeling of Microbiome Data](https://www.biorxiv.org/content/10.1101/2020.09.01.277632v1), biorxiv, 2020.
+
+
+
+
+
+
+%package help
+Summary: Development documents and examples for c-lasso
+Provides: python3-c-lasso-doc
+%description help
+[![arXiv](https://img.shields.io/badge/arXiv-2011.00898-b31b1b.svg)](https://arxiv.org/abs/2011.00898)
+[![DOI](https://joss.theoj.org/papers/10.21105/joss.02844/status.svg)](https://doi.org/10.21105/joss.02844)
+
+<img src="https://i.imgur.com/2nGwlux.png" alt="c-lasso" height="145" align="right"/>
+
+# c-lasso: a Python package for constrained sparse regression and classification
+
+
+c-lasso is a Python package that enables sparse and robust linear regression and classification with linear equality
+constraints on the model parameters. For detailed info, one can check the [documentation](https://c-lasso.readthedocs.io/en/latest/).
+
+The forward model is assumed to be:
+
+<img src="https://latex.codecogs.com/gif.latex?y=X\beta&plus;\sigma\epsilon\qquad\text{s.t.}\qquad&space;C\beta=0" title="y=X\beta+\sigma\epsilon\qquad\text{s.t.}\qquad C\beta=0" />
+
+Here, y and X are given outcome and predictor data. The vector y can be continuous (for regression) or binary (for classification). C is a general constraint matrix. The vector &beta; comprises the unknown coefficients and &sigma; an
+unknown scale.
+
+The package handles several different estimators for inferring &beta; (and &sigma;), including
+the constrained Lasso, the constrained scaled Lasso, sparse Huber M-estimation with linear equality constraints, and regularized Support Vector Machines.
+Several different algorithmic strategies, including path and proximal splitting algorithms, are implemented to solve
+the underlying convex optimization problems.
+
+We also include two model selection strategies for determining the sparsity of the model parameters: k-fold cross-validation and stability selection.
+
+This package is intended to fill the gap between popular Python tools such as [scikit-learn](https://scikit-learn.org/stable/), which cannot solve sparse constrained problems, and general-purpose optimization solvers, which do not scale well or are inaccurate for the considered problems (see [benchmarks](./benchmark/README.md)). In its current stage, however, c-lasso is not yet compatible with the scikit-learn API but is rather a stand-alone tool.
+
+Below we show several use cases of the package, including an application of sparse *log-contrast*
+regression tasks for *compositional* microbiome data.
+
+The code builds on results from several papers which can be found in the [References](#references). We also refer to the accompanying [JOSS paper submission](https://github.com/Leo-Simpson/c-lasso/blob/master/paper/paper.md), also available on [arXiv](https://arxiv.org/pdf/2011.00898.pdf).
+
+## Table of Contents
+
+* [Installation](#installation)
+* [Regression and classification problems](#regression-and-classification-problems)
+* [Getting started](#getting-started)
+* [Log-contrast regression for microbiome data](#log-contrast-regression-for-microbiome-data)
+* [Optimization schemes](#optimization-schemes)
+
+
+* [References](#references)
+
+
+## Installation
+
+c-lasso is available on pip. You can install the package
+in the shell using
+
+```shell
+pip install c-lasso
+```
+To use the c-lasso package in Python, type
+
+```python
+
+from classo import classo_problem
+# auxiliary functions such as random_data or csv_to_np can also be imported
+```
+
+The `c-lasso` package depends on the following Python packages:
+
+- `numpy`;
+- `matplotlib`;
+- `scipy`;
+- `pandas`;
+- `pytest` (for tests)
+
+## Regression and classification problems
+
+The c-lasso package can solve six different types of estimation problems:
+four regression-type and two classification-type formulations.
+
+#### [R1] Standard constrained Lasso regression:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;||&space;X\beta-y&space;||^2&space;&plus;&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
+
+This is the standard Lasso problem with linear equality constraints on the &beta; vector.
+The objective function combines Least-Squares for model fitting with l1 penalty for sparsity.
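+
+As a quick numerical check (not part of the package API; the helper below is hypothetical), the [R1] objective at a candidate &beta; can be evaluated directly:
+
+```python
+import numpy as np
+
+# Hypothetical helper: value of the [R1] objective at a candidate beta.
+# The constraint C @ beta == 0 is enforced by the solvers, not penalized here.
+def r1_objective(X, y, beta, lam):
+    return np.sum((X @ beta - y) ** 2) + lam * np.sum(np.abs(beta))
+```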
+
+#### [R2] Constrained sparse Huber regression:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;h_{\rho}(X\beta-y&space;)&space;&plus;&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
+
+This regression problem uses the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss) as objective function
+for robust model fitting with l1 and linear equality constraints on the &beta; vector. The parameter &rho;=1.345.
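+
+For intuition, a minimal NumPy sketch of the Huber function h<sub>&rho;</sub> used above (assuming the normalization that is quadratic for |r| &le; &rho; and linear beyond; the helper is hypothetical):
+
+```python
+import numpy as np
+
+# Quadratic near zero, linear in the tails; value and slope match at |r| = rho.
+def huber(r, rho=1.345):
+    a = np.abs(r)
+    return np.where(a <= rho, a ** 2, 2 * rho * a - rho ** 2)
+
+# The [R2] data-fit term is then huber(X @ beta - y).sum()
+```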
+
+#### [R3] Constrained scaled Lasso regression:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\frac{||&space;X\beta&space;-&space;y||^2}{\sigma}&space;&plus;&space;\frac{n}{2}&space;\sigma&plus;&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \frac{|| X\beta - y||^2}{\sigma} + \frac{n}{2} \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
+
+This formulation is similar to [R1] but allows for joint estimation of the (constrained) &beta; vector and
+the standard deviation &sigma; in a concomitant fashion (see [References](#references) [4,5] for further info).
+This is the default problem formulation in c-lasso.
+
+#### [R4] Constrained sparse Huber regression with concomitant scale estimation:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\left(&space;h_{\rho}&space;\left(&space;\frac{&space;X\beta&space;-&space;y}{\sigma}&space;\right)&plus;&space;n&space;\right)&space;\sigma&plus;&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \left( h_{\rho} \left( \frac{ X\beta - y}{\sigma} \right)+ n \right) \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
+
+This formulation combines [R2] and [R3] to allow robust joint estimation of the (constrained) &beta; vector and
+the scale &sigma; in a concomitant fashion (see [References](#references) [4,5] for further info).
+
+#### [C1] Constrained sparse classification with Square Hinge loss:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d}&space;\sum_{i=1}^n&space;l(y_i&space;x_i^\top&space;\beta)&space;&plus;&space;\lambda&space;\left\lVert&space;\beta\right\rVert_1&space;\qquad&space;s.t.&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d} \sum_{i=1}^n l(y_i x_i \beta) + \lambda \left\lVert \beta\right\rVert_1 \qquad s.t. \qquad C\beta = 0" />
+
+where the x<sub>i</sub> are the rows of X and l is defined as:
+
+<img src="https://latex.codecogs.com/gif.latex?l(r)&space;=&space;\begin{cases}&space;(1-r)^2&space;&&space;if&space;\quad&space;r&space;\leq&space;1&space;\\&space;0&space;&if&space;\quad&space;r&space;\geq&space;1&space;\end{cases}" title="l(r) = \begin{cases} (1-r)^2 & if \quad r \leq 1 \\ 0 &if \quad r \geq 1 \end{cases}" />
+
+This formulation is similar to [R1] but adapted for classification tasks using the Square Hinge loss
+with (constrained) sparse &beta; vector estimation.
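+
+A minimal NumPy sketch of this loss (hypothetical helper, matching the case definition above):
+
+```python
+import numpy as np
+
+# Squared hinge: (1 - r)^2 for r <= 1, zero for r >= 1.
+def squared_hinge(r):
+    return np.maximum(0.0, 1.0 - r) ** 2
+
+# The [C1] data-fit term is then squared_hinge(y * (X @ beta)).sum()
+```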
+
+#### [C2] Constrained sparse classification with Huberized Square Hinge loss:
+
+<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d}&space;\sum_{i=1}^n&space;l_{\rho}(y_i&space;x_i^\top\beta)&space;&plus;&space;\lambda&space;\left\lVert&space;\beta\right\rVert_1&space;\qquad&space;s.t.&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d} \sum_{i=1}^n l_{\rho}(y_i x_i\beta) + \lambda \left\lVert \beta\right\rVert_1 \qquad s.t. \qquad C\beta = 0" />
+
+where the x<sub>i</sub> are the rows of X and l<sub>ρ</sub> is defined as:
+
+<img src="https://latex.codecogs.com/gif.latex?l_{\rho}(r)&space;=&space;\begin{cases}&space;(1-r)^2&space;&if&space;\quad&space;\rho&space;\leq&space;r&space;\leq&space;1&space;\\&space;(1-\rho)(1&plus;\rho-2r)&space;&&space;if&space;\quad&space;r&space;\leq&space;\rho&space;\\&space;0&space;&if&space;\quad&space;r&space;\geq&space;1&space;\end{cases}" title="l_{\rho}(r) = \begin{cases} (1-r)^2 &if \quad \rho \leq r \leq 1 \\ (1-\rho)(1+\rho-2r) & if \quad r \leq \rho \\ 0 &if \quad r \geq 1 \end{cases}" />
+
+
+This formulation is similar to [C1] but uses the Huberized Square Hinge loss for robust classification
+with (constrained) sparse &beta; vector estimation.
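+
+A minimal sketch of how the classification formulations can be selected, assuming the formulation object exposes a ```classification``` flag (an assumption; the flag does not appear in the examples below) and using the data generator introduced in the Getting started section:
+
+```python
+import numpy as np
+from classo import classo_problem, random_data
+
+(X, C, y), sol = random_data(100, 200, 5, 1, 0.5, zerosum=True, seed=1)
+y = np.sign(y)                             # binary +/-1 labels for classification
+
+problem = classo_problem(X, y, C)
+problem.formulation.classification = True  # assumed flag: [C1], Square Hinge loss
+problem.formulation.huber = True           # with huber: [C2], Huberized Square Hinge
+problem.solve()
+print(problem.solution)
+```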
+
+
+## Getting started
+
+#### Basic example
+
+We begin with a basic example that shows how to run c-lasso on synthetic data. This example and the next one can be found in the notebook 'Synthetic data Notebook.ipynb'.
+
+The c-lasso package includes
+the routine ```random_data``` that allows you to generate problem instances using normally distributed data.
+
+```python
+from classo import classo_problem, random_data
+
+m, d, d_nonzero, k, sigma = 100, 200, 5, 1, 0.5
+(X, C, y), sol = random_data(m, d, d_nonzero, k, sigma, zerosum=True, seed=1)
+```
+This code snippet generates a problem instance with a sparse &beta; in dimension
+d=200 (sparsity d_nonzero=5). The design matrix X comprises m=100 samples drawn from an i.i.d. standard normal
+distribution. The constraint matrix C has dimension k x d (here a single constraint, k=1). The noise level is &sigma;=0.5.
+The input ```zerosum=True``` implies that C is the all-ones vector and C&beta;=0. The m-dimensional outcome vector y
+and the regression vector &beta; are then generated so as to satisfy the given constraints.
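+
+A quick sanity check of the returned objects under the stated conventions (the array shapes are inferred from the text above, not guaranteed):
+
+```python
+import numpy as np
+
+print(X.shape, C.shape, y.shape)   # expected: (100, 200) (1, 200) (100,)
+print(np.count_nonzero(sol))       # 5, the d_nonzero ground-truth coefficients
+print(np.allclose(C @ sol, 0))     # True: the true beta satisfies C beta = 0
+```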
+
+Next we can define a default c-lasso problem instance with the generated data:
+```python
+problem = classo_problem(X, y, C)
+```
+You can look at the generated problem instance by typing:
+
+```python
+print(problem)
+```
+
+This gives you a summary of the form:
+
+```
+FORMULATION: R3
+
+MODEL SELECTION COMPUTED:
+ Stability selection
+
+STABILITY SELECTION PARAMETERS:
+ numerical_method : not specified
+ method : first
+ B = 50
+ q = 10
+ percent_nS = 0.5
+ threshold = 0.7
+ lamin = 0.01
+ Nlam = 50
+```
+As we have not specified any problem, algorithm, or model selection settings, this problem instance
+represents the *default* settings for a c-lasso instance:
+- The problem is of regression type and uses formulation [R3], i.e. with concomitant scale estimation.
+- The *default* optimization scheme is the path algorithm (see [Optimization schemes](#optimization-schemes) for further info).
+- For model selection, stability selection at a theoretically derived &lambda; value is used (see [Reference](#references) [4] for details). Stability selection comprises a relatively large number of parameters. For a description of the settings, we refer to the more advanced examples below and the API; a sketch for overriding them follows this list.
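+
+The printed parameter names map onto attributes of ```problem.model_selection.StabSelparameters```; a minimal sketch for overriding them (attribute names beyond ```method```, which appears in the advanced example below, are assumed to match the printout):
+
+```python
+problem.model_selection.StabSelparameters.B = 100          # number of resamples
+problem.model_selection.StabSelparameters.q = 15           # variables drawn per resample
+problem.model_selection.StabSelparameters.threshold = 0.8  # stricter selection frequency
+```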
+
+You can solve the corresponding c-lasso problem instance using
+
+```python
+problem.solve()
+```
+
+After completion, the results of the optimization and model selection routines
+can be visualized using
+
+```python
+print(problem.solution)
+```
+
+The command shows the running time(s) for the c-lasso problem instance and the variables selected by stability selection:
+
+```
+STABILITY SELECTION :
+ Selected variables : 7 63 148 164 168
+ Running time : 1.546s
+
+```
+
+Here, stability selection was used as the *default* model selection strategy.
+The command also allows you to inspect the computed stability profile for all variables
+at the theoretical &lambda;:
+
+![1.StabSel](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/basic/StabSel.png)
+
+
+The refitted &beta; values on the selected support are displayed in the next plot:
+
+![beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/basic/beta.png)
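+
+Beyond the plots, the selection can also be retrieved programmatically from the solution object printed above; a hypothetical sketch (the ```StabSel.selected_param``` attribute name is an assumption):
+
+```python
+import numpy as np
+
+mask = problem.solution.StabSel.selected_param   # assumed boolean mask of length d
+print(np.where(mask)[0])                         # e.g. [  7  63 148 164 168]
+```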
+
+
+#### Advanced example
+
+In the next example, we show how one can specify different aspects of the problem
+formulation and model selection strategy.
+
+```python
+from classo import classo_problem, random_data
+
+m, d, d_nonzero, k, sigma = 100, 200, 5, 0, 0.5
+(X, C, y), sol = random_data(m, d, d_nonzero, k, sigma, zerosum = True, seed = 4)
+problem = classo_problem(X, y, C)
+problem.formulation.huber = True
+problem.formulation.concomitant = False
+problem.model_selection.CV = True
+problem.model_selection.LAMfixed = True
+problem.model_selection.PATH = True
+problem.model_selection.StabSelparameters.method = 'max'
+problem.model_selection.CVparameters.seed = 1
+problem.model_selection.LAMfixedparameters.rescaled_lam = True
+problem.model_selection.LAMfixedparameters.lam = .1
+
+problem.solve()
+print(problem)
+
+print(problem.solution)
+
+```
+
+Results:
+```
+ FORMULATION: R2
+
+ MODEL SELECTION COMPUTED:
+ Lambda fixed
+ Path
+ Cross Validation
+ Stability selection
+
+ LAMBDA FIXED PARAMETERS:
+ numerical_method = Path-Alg
+ rescaled lam : True
+ threshold = 0.09
+ lam = 0.1
+ theoretical_lam = 0.224
+
+ PATH PARAMETERS:
+ numerical_method : Path-Alg
+ lamin = 0.001
+ Nlam = 80
+
+
+ CROSS VALIDATION PARAMETERS:
+ numerical_method : Path-Alg
+ one-SE method : True
+ Nsubset = 5
+ lamin = 0.001
+ Nlam = 80
+
+
+ STABILITY SELECTION PARAMETERS:
+ numerical_method : Path-Alg
+ method : max
+ B = 50
+ q = 10
+ percent_nS = 0.5
+ threshold = 0.7
+ lamin = 0.01
+ Nlam = 50
+
+ LAMBDA FIXED :
+ Selected variables : 17 59 123
+ Running time : 0.104s
+
+ PATH COMPUTATION :
+ Running time : 0.638s
+
+ CROSS VALIDATION :
+ Selected variables : 16 17 57 59 64 73 74 76 93 115 123 134 137 181
+ Running time : 2.1s
+
+ STABILITY SELECTION :
+ Selected variables : 17 59 76 123 137
+ Running time : 6.062s
+
+```
+
+
+![2.StabSel](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/StabSel.png)
+
+![2.StabSel-beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/StabSel-beta.png)
+
+![2.CV-beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/CVbeta.png)
+
+![2.CV-graph](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/CV.png)
+
+![2.LAM-beta](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/beta.png)
+
+![2.Path](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/advanced/Beta-path.png)
+
+
+## Log-contrast regression for microbiome data
+
+In [the accompanying notebook](./examples/example-notebook.ipynb), we study several microbiome data sets. We showcase two examples below.
+
+#### BMI prediction using the COMBO dataset
+
+We first consider the [COMBO data set](./examples/COMBO_data) and show how to predict Body Mass Index (BMI) from microbial genus abundances and two non-compositional covariates using "filtered_data".
+
+```python
+from classo import csv_to_np, classo_problem, clr
+import numpy as np
+
+# Load microbiome and covariate data X
+X0 = csv_to_np('COMBO_data/complete_data/GeneraCounts.csv', begin = 0).astype(float)
+X_C = csv_to_np('COMBO_data/CaloriData.csv', begin = 0).astype(float)
+X_F = csv_to_np('COMBO_data/FatData.csv', begin = 0).astype(float)
+
+# Load BMI measurements y
+y = csv_to_np('COMBO_data/BMI.csv', begin = 0).astype(float)[:, 0]
+labels = csv_to_np('COMBO_data/complete_data/GeneraPhylo.csv').astype(str)[:, -1]
+
+
+# Normalize/transform data
+y = y - np.mean(y) #BMI data (n = 96)
+X_C = X_C - np.mean(X_C, axis = 0) #Covariate data (Calorie)
+X_F = X_F - np.mean(X_F, axis = 0) #Covariate data (Fat)
+X0 = clr(X0, 1 / 2).T
+
+# Set up design matrix and zero-sum constraints for 45 genera
+X = np.concatenate((X0, X_C, X_F, np.ones((len(X0), 1))), axis = 1) # Joint microbiome and covariate data and offset
+label = np.concatenate([labels, np.array(['Calorie', 'Fat', 'Bias'])])
+C = np.ones((1, len(X[0])))
+C[0, -1], C[0, -2], C[0, -3] = 0., 0., 0.
+
+
+# Set up the c-lasso problem
+problem = classo_problem(X, y, C, label = label)
+
+
+# Use stability selection with theoretical lambda [Combettes & Müller, 2020b]
+problem.model_selection.StabSelparameters.method = 'lam'
+problem.model_selection.StabSelparameters.threshold_label = 0.5
+
+# Use formulation R3
+problem.formulation.concomitant = True
+
+problem.solve()
+print(problem)
+print(problem.solution)
+
+# Use formulation R4
+problem.formulation.huber = True
+problem.formulation.concomitant = True
+
+problem.solve()
+print(problem)
+print(problem.solution)
+
+```
+
+![3.Stability profile R3](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R3-StabSel.png)
+
+![3.Beta solution R3](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R3-StabSel-beta.png)
+
+![3.Stability profile R4](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R4-StabSel.png)
+
+![3.Beta solution R4](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/exampleFilteredCOMBO/R4-StabSel-beta.png)
+
+
+<!---
+<img src="https://i.imgur.com/8tFmM8T.png" alt="Central Park Soil Microbiome" height="250" align="right"/>
+#### pH prediction using the Central Park soil dataset
+The next microbiome example considers the [Central Park Soil dataset](./examples/pH_data) from [Ramirez et al.](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2014.1988). The sample locations are shown in the figure on the right.
+-->
+
+#### pH prediction using the 88 soils dataset
+
+The next microbiome example considers the [88 soils dataset](./examples/pH_data) from [Lauber et al., 2009](https://pubmed.ncbi.nlm.nih.gov/19502440/).
+
+The task is to predict soil pH from microbial abundance data. A similar analysis is available
+in [Tree-Aggregated Predictive Modeling of Microbiome Data](https://www.biorxiv.org/content/10.1101/2020.09.01.277632v1)
+with Central Park soil data from [Ramirez et al.](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2014.1988).
+
+Code to run this application is available in [the accompanying notebook](./examples/example-notebook.ipynb) under `pH data`. Below is a summary of a c-lasso problem instance (using the R3 formulation).
+
+```
+FORMULATION: R3
+
+MODEL SELECTION COMPUTED:
+ Lambda fixed
+ Path
+ Stability selection
+
+LAMBDA FIXED PARAMETERS:
+ numerical_method = Path-Alg
+ rescaled lam : True
+ threshold = 0.004
+ lam : theoretical
+ theoretical_lam = 0.2182
+
+PATH PARAMETERS:
+ numerical_method : Path-Alg
+ lamin = 0.001
+ Nlam = 80
+
+
+STABILITY SELECTION PARAMETERS:
+ numerical_method : Path-Alg
+ method : lam
+ B = 50
+ q = 10
+ percent_nS = 0.5
+ threshold = 0.7
+ lam = theoretical
+ theoretical_lam = 0.3085
+```
+
+The c-lasso estimation results are summarized below:
+
+```
+LAMBDA FIXED :
+ Sigma = 0.198
+ Selected variables : 14 18 19 39 43 57 62 85 93 94 104 107
+ Running time : 0.008s
+
+PATH COMPUTATION :
+ Running time : 0.12s
+
+STABILITY SELECTION :
+ Selected variables : 2 12 15
+ Running time : 0.287s
+```
+
+![Ex4.1](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-Beta-path.png)
+
+![Ex4.2](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-Sigma-path.png)
+
+![Ex4.3](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-StabSel.png)
+
+![Ex4.4](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-StabSel-beta.png)
+
+![Ex4.5](https://github.com/Leo-Simpson/c-lasso/blob/master/figures/examplePH/R3-beta.png)
+
+
+## Optimization schemes
+
+The available problem formulations [R1-C2] require different algorithmic strategies for
+efficiently solving the underlying optimization problem. We have implemented four
+algorithms (with provable convergence guarantees) that vary in generality and are not
+necessarily applicable to all problems. For each problem type, c-lasso has a default algorithm
+setting that proved to be the fastest in our numerical experiments.
+
+### Path algorithms (Path-Alg)
+This is the default algorithm for non-concomitant problems [R1,R2,C1,C2].
+The algorithm uses the fact that the solution path along &lambda; is piecewise-affine
+(as shown, e.g., in [1]). When Least-Squares is used as the objective function,
+we derive a novel efficient procedure that also yields the solution of the
+concomitant problem [R3] along the path with little extra computational overhead.
+
+### Projected primal-dual splitting method (P-PDS):
+This algorithm is derived from [2] and belongs to the class of
+proximal splitting algorithms. It extends the classical Forward-Backward (FB)
+(aka proximal gradient descent) algorithm to handle an additional linear equality constraint
+via projection. In the absence of a linear constraint, the method reduces to FB.
+This method can solve problem [R1]. For the Huber problem [R2],
+P-PDS can solve the mean-shift formulation of the problem (see [6]).
+
+### Projection-free primal-dual splitting method (PF-PDS):
+This algorithm is a special case of an algorithm proposed in [3] (Eq.4.5) and also belongs to the class of
+proximal splitting algorithms. The algorithm does not require projection operators
+which may be beneficial when C has a more complex structure. In the absence of a linear constraint,
+the method reduces to the Forward-Backward-Forward scheme. This method can solve problem [R1].
+For the Huber problem [R2], PF-PDS can solve the mean-shift formulation of the problem (see [6]).
+
+### Douglas-Rachford-type splitting method (DR)
+This algorithm is the most general algorithm and can solve all regression problems
+[R1-R4]. It is based on Douglas-Rachford splitting in a higher-dimensional product space
+and makes use of the proximity operators of the perspective of the LS objective (see [4,5]).
+The Huber problem with concomitant scale [R4] is reformulated as a scaled Lasso problem
+with mean shift (see [6]) and is thus solved in (n + d) dimensions.
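+
+Each model selection routine exposes its solver through the ```numerical_method``` field shown in the parameter printouts above. A sketch of overriding the defaults (the ```PATHparameters``` name is assumed by analogy with ```StabSelparameters``` and ```CVparameters``` from the advanced example; 'DR' and 'Path-Alg' are the scheme names from this section):
+
+```python
+problem.model_selection.StabSelparameters.numerical_method = 'DR'
+problem.model_selection.PATHparameters.numerical_method = 'Path-Alg'
+```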
+
+
+
+## References
+
+* [1] B. R. Gaines, J. Kim, and H. Zhou, [Algorithms for Fitting the Constrained Lasso](https://www.tandfonline.com/doi/abs/10.1080/10618600.2018.1473777?journalCode=ucgs20), J. Comput. Graph. Stat., vol. 27, no. 4, pp. 861–871, 2018.
+
+* [2] L. Briceno-Arias and S.L. Rivera, [A Projected Primal–Dual Method for Solving Constrained Monotone Inclusions](https://link.springer.com/article/10.1007/s10957-018-1430-2?shared-article-renderer), J. Optim. Theory Appl., vol. 180, Issue 3, March 2019.
+
+* [3] P. L. Combettes and J.C. Pesquet, [Primal-Dual Splitting Algorithm for Solving Inclusions with Mixtures of Composite, Lipschitzian, and Parallel-Sum Type Monotone Operators](https://arxiv.org/pdf/1107.0081.pdf), Set-Valued and Variational Analysis, vol. 20, pp. 307-330, 2012.
+
+* [4] P. L. Combettes and C. L. Müller, [Perspective M-estimation via proximal decomposition](https://arxiv.org/abs/1805.06098), Electronic Journal of Statistics, 2020, [Journal version](https://projecteuclid.org/euclid.ejs/1578452535)
+
+* [5] P. L. Combettes and C. L. Müller, [Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applications](https://arxiv.org/abs/1903.01050), Statistics in Bioscience, 2020.
+
+* [6] A. Mishra and C. L. Müller, [Robust regression with compositional covariates](https://arxiv.org/abs/1909.04990), arXiv, 2019.
+
+* [7] S. Rosset and J. Zhu, [Piecewise linear regularized solution paths](https://projecteuclid.org/euclid.aos/1185303996), Ann. Stat., vol. 35, no. 3, pp. 1012–1030, 2007.
+
+* [8] J. Bien, X. Yan, L. Simpson, and C. L. Müller, [Tree-Aggregated Predictive Modeling of Microbiome Data](https://www.biorxiv.org/content/10.1101/2020.09.01.277632v1), biorxiv, 2020.
+
+
+
+
+
+
+%prep
+%autosetup -n c-lasso-1.0.11
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-c-lasso -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.11-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..91bf2a4
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+e6c7ca5c6456bf96865da173e72fb7b9 c-lasso-1.0.11.tar.gz