DPC (DPP) Screening Methods for Group Lasso

  • Group Lasso is a widely used sparse modeling technique to identify important groups of variables.

  • For Group Lasso, the DPC screening rule is developed as an extension of the DPP method for standard Lasso.

  • DPC can be integrated with any existing solver for Group Lasso, and the rule is very easy to implement. The code will be available soon.

Publications

Notice that the Enhanced DPP (EDPP) rule to appear in JMLR is an improved version of the EDPP rule in the NIPS paper; they are NOT the same.

The EDPP rule in the NIPS paper corresponds to "Improvement 1" in the journal version. The EDPP rule introduced below is the one from the journal version.

Formulation of Group Lasso

Let ${\bf y}$ denote the $N$-dimensional response vector and ${\bf X} = [{\bf x}_1, {\bf x}_2, \ldots, {\bf x}_p]$ be the $N \times p$ feature matrix. Let $\lambda \geq 0$ be the regularization parameter. With the group information available, the Group Lasso problem is formulated as the following optimization problem:

 \inf_{\beta \in \mathbb{R}^p} \frac{1}{2}\left\|{\bf y} - \sum_{g=1}^G {\bf X}_g \beta_g \right\|_2^2 + \lambda \sum_{g=1}^G \sqrt{n_g}\, \|\beta_g\|_2,

where ${\bf X}_g \in \mathbb{R}^{N \times n_g}$ is the submatrix of features in the $g$-th group, $\beta_g$ is the corresponding block of coefficients, and $n_g$ is the number of features in group $g$.
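To make the notation concrete, here is a minimal Python sketch of the objective above. The setup is assumed: `X` is the $N \times p$ design matrix, `y` the response vector, `groups` a hypothetical list of column-index arrays (one per group), and `lam` the regularization parameter.

```python
import numpy as np

def group_lasso_objective(X, y, beta, groups, lam):
    """0.5 * ||y - X beta||_2^2 + lam * sum_g sqrt(n_g) * ||beta_g||_2."""
    residual = y - X @ beta
    penalty = sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)
    return 0.5 * residual @ residual + lam * penalty
```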

The dual problem of Group Lasso is

 \sup_{\theta} \quad \left\{ \frac{1}{2}\|{\bf y}\|_2^2 - \frac{\lambda^2}{2}\left\| \theta - \frac{{\bf y}}{\lambda} \right\|_2^2 \,:\, \|{\bf X}_g^{T}\theta\|_2 \leq \sqrt{n_g},\; g = 1, 2, \ldots, G \right\}.

We denote the primal and dual optimal solutions of Group Lasso by $\beta^*(\lambda)$ and $\theta^*(\lambda)$, respectively; both depend on the value of $\lambda$. The two are related by

 {\bf y} = \sum_{g=1}^G {\bf X}_g \beta_g^*(\lambda) + \lambda \theta^*(\lambda) = {\bf X}\beta^*(\lambda) + \lambda \theta^*(\lambda).

Moreover, it is easy to see that the dual optimal solution is the projection of $\frac{{\bf y}}{\lambda}$ onto the dual feasible set, which is the intersection of a set of ellipsoids.
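The relation above gives a cheap way to recover $\theta^*(\lambda)$ from a primal solution and to verify dual feasibility numerically. A sketch under the same assumed setup as before (the tolerance `tol` is an arbitrary choice):

```python
import numpy as np

def dual_from_primal(X, y, beta, lam, groups, tol=1e-8):
    """Recover theta*(lam) = (y - X beta*(lam)) / lam and check dual feasibility."""
    theta = (y - X @ beta) / lam
    feasible = all(np.linalg.norm(X[:, g].T @ theta) <= np.sqrt(len(g)) + tol
                   for g in groups)
    return theta, feasible
```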

DPC (EDPP) Rule for Group Lasso

Let us define

 \lambda_{\rm max} = \max_g \frac{\|{\bf X}_g^T {\bf y}\|_2}{\sqrt{n_g}}.

It is well-known that $\forall \lambda \in [\lambda_{\rm max}, \infty)$, we have

 \beta^*(\lambda) = 0, \quad \theta^*(\lambda) = \frac{{\bf y}}{\lambda}.

In other words, the Group Lasso problem admits closed-form solutions when $\lambda \geq \lambda_{\rm max}$.
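Computing $\lambda_{\rm max}$ is a single pass over the groups. A sketch, again under the assumed `X`, `y`, `groups` setup:

```python
import numpy as np

def lambda_max(X, y, groups):
    """Smallest lambda at which beta*(lambda) is guaranteed to be all zero."""
    return max(np.linalg.norm(X[:, g].T @ y) / np.sqrt(len(g)) for g in groups)
```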

Let $\lambda_0 \in (0, \lambda_{\rm max}]$ and $\lambda \in (0, \lambda_0]$. We define

 {\bf X}_* = {\rm argmax}_{{\bf X}_g} \frac{\|{\bf X}_g^T {\bf y}\|_2}{\sqrt{n_g}},

 {\bf v}_1(\lambda_0) = \left\{ \begin{array}{ll} \frac{{\bf y}}{\lambda_0} - \theta^*(\lambda_0), & {\rm if}\hspace{2mm} \lambda_0 \in (0, \lambda_{\rm max}), \\ {\bf X}_* {\bf X}_*^T {\bf y}, & {\rm if}\hspace{2mm} \lambda_0 = \lambda_{\rm max}, \end{array}\right.

 {\bf v}_2(\lambda, \lambda_0) = \frac{{\bf y}}{\lambda} - \theta^*(\lambda_0),

 {\bf v}_2^{\perp}(\lambda, \lambda_0) = {\bf v}_2(\lambda, \lambda_0) - \frac{\langle {\bf v}_1(\lambda_0), {\bf v}_2(\lambda, \lambda_0) \rangle}{\|{\bf v}_1(\lambda_0)\|_2^2}\, {\bf v}_1(\lambda_0).
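These definitions translate directly into code. A sketch, where `theta0` stands for $\theta^*(\lambda_0)$, `lam_max` is computed as above, and the helper name `edpp_vectors` is hypothetical:

```python
import numpy as np

def edpp_vectors(X, y, groups, lam, lam0, theta0, lam_max):
    """Compute v1(lam0), v2(lam, lam0), and v2_perp(lam, lam0)."""
    if lam0 < lam_max:
        v1 = y / lam0 - theta0
    else:
        # lam0 == lam_max: v1 = X_* X_*^T y, where X_* attains lambda_max
        g_star = max(groups,
                     key=lambda g: np.linalg.norm(X[:, g].T @ y) / np.sqrt(len(g)))
        v1 = X[:, g_star] @ (X[:, g_star].T @ y)
    v2 = y / lam - theta0
    v2_perp = v2 - (v1 @ v2) / (v1 @ v1) * v1
    return v1, v2, v2_perp
```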
DPC (EDPP) Screening Rule for Group Lasso
  • For the Group Lasso problem, suppose we are given a sequence of parameter values $\lambda_{\rm max} = \lambda_0 > \lambda_1 > \ldots > \lambda_{\mathcal{K}}$. Then for any integer $0 \leq k < \mathcal{K}$, we have $\beta_g^*(\lambda_{k+1}) = 0$, i.e., the entire $g$-th group can be discarded, if $\beta^*(\lambda_k)$ is known and the following holds (a sketch of this check appears after the rule):

 \left\| {\bf X}_g^T \left( \frac{{\bf y} - {\bf X}\beta^*(\lambda_k)}{\lambda_k} + \frac{1}{2}{\bf v}_2^{\perp}(\lambda_{k+1}, \lambda_k) \right) \right\|_2 < \sqrt{n_g} - \frac{1}{2}\|{\bf v}_2^{\perp}(\lambda_{k+1}, \lambda_k)\|_2 \, \|{\bf X}_g\|_2.
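A sketch of this check, returning one boolean per group; `edpp_screen` is a hypothetical helper, $\|{\bf X}_g\|_2$ is computed as a spectral norm, and the setup is assumed as before:

```python
import numpy as np

def edpp_screen(X, y, groups, beta_k, lam_k, v2_perp):
    """Return one boolean per group: True means beta_g*(lam_{k+1}) = 0 is guaranteed."""
    theta_k = (y - X @ beta_k) / lam_k       # dual optimum at lam_k
    half = 0.5 * np.linalg.norm(v2_perp)
    discard = []
    for g in groups:
        Xg = X[:, g]
        lhs = np.linalg.norm(Xg.T @ (theta_k + 0.5 * v2_perp))
        rhs = np.sqrt(len(g)) - half * np.linalg.norm(Xg, 2)  # spectral norm of X_g
        discard.append(lhs < rhs)
    return discard
```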
A Few More Comments on DPC (EDPP)
  • To start from $\lambda_{\rm max}$, it is worthwhile to note that $\theta^*(\lambda_{\rm max}) = \frac{{\bf y}}{\lambda_{\rm max}}$ and $\beta^*(\lambda_{\rm max}) = 0$.

  • At the $k^{th}$ step, suppose that $\beta^*(\lambda_k)$ is known. To determine the zero groups of $\beta^*(\lambda_{k+1})$, DPC (EDPP) needs to compute ${\bf v}_2^{\perp}(\lambda_{k+1}, \lambda_k)$, which depends on ${\bf v}_2(\lambda_{k+1}, \lambda_k)$ and ${\bf v}_1(\lambda_k)$. Both require $\theta^*(\lambda_k)$, which can be computed as $\theta^*(\lambda_k) = \frac{{\bf y} - {\bf X}\beta^*(\lambda_k)}{\lambda_k}$.

  • After DPC (EDPP) identifies the zero groups of $\beta^*(\lambda_{k+1})$, the corresponding features can be removed from the optimization, and you can apply your favourite solver to solve for the remaining coefficients of $\beta^*(\lambda_{k+1})$. Then go to the next step, until the Group Lasso problems at all given parameter values are solved; the sketch below puts these steps together.
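Here is a sketch of the full sequential procedure. It reuses the hypothetical `lambda_max`, `edpp_vectors`, and `edpp_screen` helpers from above, and `solver` is a placeholder for any Group Lasso solver applied to the reduced problem (it should return the coefficients of the kept columns):

```python
import numpy as np

def pathwise_group_lasso(X, y, groups, lambdas, solver):
    """EDPP screening along a decreasing sequence lambdas[0] = lambda_max > ... > lambda_K."""
    p = X.shape[1]
    lam_max = lambdas[0]
    beta = np.zeros(p)                       # beta*(lambda_max) = 0
    path = [beta.copy()]
    for lam_k, lam_next in zip(lambdas[:-1], lambdas[1:]):
        theta_k = (y - X @ beta) / lam_k     # theta*(lambda_k)
        _, _, v2_perp = edpp_vectors(X, y, groups, lam_next, lam_k, theta_k, lam_max)
        drop = edpp_screen(X, y, groups, beta, lam_k, v2_perp)
        kept = [g for g, d in zip(groups, drop) if not d]
        beta = np.zeros(p)
        if kept:
            cols = np.concatenate(kept)
            # remap kept group indices into the reduced design matrix
            offsets = np.cumsum([0] + [len(g) for g in kept[:-1]])
            local = [np.arange(o, o + len(g)) for o, g in zip(offsets, kept)]
            beta[cols] = solver(X[:, cols], y, local, lam_next)
        path.append(beta.copy())
    return path
```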