DPC (DPP) Screening Methods for Group Lasso

  • Group Lasso is a widely used sparse modeling technique to identify important groups of variables.

  • For Group Lasso, the DPC screening rule is developed as an extension of the DPP method for standard Lasso.

  • DPC can be integrated with any existing solver for Group Lasso, and the rule is very easy to implement. The code will be available soon.

Publications

Notice that the Enhanced DPP (EDPP) rule to appear in JMLR is an improved version of the EDPP rule in the NIPS paper; they are NOT the same.

The EDPP rule in the NIPS paper corresponds to "Improvement 1" in the journal version. The EDPP rule introduced below is the one from the journal version.

Formulation of Group Lasso

Let ${\bf y}$ denote the $N$-dimensional response vector and ${\bf X} = [{\bf x}_1, {\bf x}_2, \ldots, {\bf x}_p]$ be the $N \times p$ feature matrix. Let $\lambda \geq 0$ be the regularization parameter. With the group information available, the Group Lasso problem is formulated as the following optimization problem:

 \inf_{\beta \in \mathbb{R}^p} \frac{1}{2}\left\|{\bf y} - \sum_{g=1}^G {\bf X}_g \beta_g \right\|_2^2 + \lambda \sum_{g=1}^G \sqrt{n_g}\, \|\beta_g\|_2,

where ${\bf X}_g \in \mathbb{R}^{N \times n_g}$ is the submatrix of features in the $g$-th group, $\beta_g$ is the corresponding block of coefficients, and $n_g$ is the number of features in group $g$.
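To make the notation concrete, here is a minimal Python sketch of the objective above. The setup is assumed: `X` is the $N \times p$ design matrix, `y` the response vector, `groups` a hypothetical list of column-index arrays (one per group), and `lam` the regularization parameter.

```python
import numpy as np

def group_lasso_objective(X, y, beta, groups, lam):
    """0.5 * ||y - X beta||_2^2 + lam * sum_g sqrt(n_g) * ||beta_g||_2."""
    residual = y - X @ beta
    penalty = sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)
    return 0.5 * residual @ residual + lam * penalty
```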

The dual problem of Group Lasso is

 \sup_{\theta} \quad \left\{ \frac{1}{2}\|{\bf y}\|_2^2 - \frac{\lambda^2}{2}\left\| \theta - \frac{{\bf y}}{\lambda} \right\|_2^2 \,:\, \|{\bf X}_g^{T}\theta\|_2 \leq \sqrt{n_g},\; g = 1, 2, \ldots, G \right\}.

We denote the primal and dual optimal solutions of Group Lasso by $\beta^*(\lambda)$ and $\theta^*(\lambda)$, respectively; both depend on the value of $\lambda$. The two are related by

 {\bf y} = \sum_{g=1}^G {\bf X}_g \beta_g^*(\lambda) + \lambda \theta^*(\lambda) = {\bf X}\beta^*(\lambda) + \lambda \theta^*(\lambda).

Moreover, it is easy to see that the dual optimal solution is the projection of $\frac{{\bf y}}{\lambda}$ onto the dual feasible set, which is the intersection of a set of ellipsoids.
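The relation above gives a cheap way to recover $\theta^*(\lambda)$ from a primal solution and to verify dual feasibility numerically. A sketch under the same assumed setup as before (the tolerance `tol` is an arbitrary choice):

```python
import numpy as np

def dual_from_primal(X, y, beta, lam, groups, tol=1e-8):
    """Recover theta*(lam) = (y - X beta*(lam)) / lam and check dual feasibility."""
    theta = (y - X @ beta) / lam
    feasible = all(np.linalg.norm(X[:, g].T @ theta) <= np.sqrt(len(g)) + tol
                   for g in groups)
    return theta, feasible
```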

DPC (EDPP) Rule for Group Lasso

Let us define

 \lambda_{\rm max} = \max_g \frac{\|{\bf X}_g^T {\bf y}\|_2}{\sqrt{n_g}}.

It is well-known that $\forall \lambda \in [\lambda_{\rm max}, \infty)$, we have

 \beta^*(\lambda) = 0, \quad \theta^*(\lambda) = \frac{{\bf y}}{\lambda}.

In other words, the Group Lasso problem admits closed-form solutions when $\lambda \geq \lambda_{\rm max}$.
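Computing $\lambda_{\rm max}$ is a single pass over the groups. A sketch, again under the assumed `X`, `y`, `groups` setup:

```python
import numpy as np

def lambda_max(X, y, groups):
    """Smallest lambda at which beta*(lambda) is guaranteed to be all zero."""
    return max(np.linalg.norm(X[:, g].T @ y) / np.sqrt(len(g)) for g in groups)
```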

Let $\lambda_0 \in (0, \lambda_{\rm max}]$ and $\lambda \in (0, \lambda_0]$. We define

 {\bf X}_* = {\rm argmax}_{{\bf X}_g} \frac{\|{\bf X}_g^T {\bf y}\|_2}{\sqrt{n_g}},

 {\bf v}_1(\lambda_0) = \left\{ \begin{array}{ll} \frac{{\bf y}}{\lambda_0} - \theta^*(\lambda_0), & {\rm if}\hspace{2mm} \lambda_0 \in (0, \lambda_{\rm max}), \\ {\bf X}_* {\bf X}_*^T {\bf y}, & {\rm if}\hspace{2mm} \lambda_0 = \lambda_{\rm max}, \end{array}\right.

 {\bf v}_2(\lambda, \lambda_0) = \frac{{\bf y}}{\lambda} - \theta^*(\lambda_0),

 {\bf v}_2^{\perp}(\lambda, \lambda_0) = {\bf v}_2(\lambda, \lambda_0) - \frac{\langle {\bf v}_1(\lambda_0), {\bf v}_2(\lambda, \lambda_0) \rangle}{\|{\bf v}_1(\lambda_0)\|_2^2}\, {\bf v}_1(\lambda_0).
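These definitions translate directly into code. A sketch, where `theta0` stands for $\theta^*(\lambda_0)$, `lam_max` is computed as above, and the helper name `edpp_vectors` is hypothetical:

```python
import numpy as np

def edpp_vectors(X, y, groups, lam, lam0, theta0, lam_max):
    """Compute v1(lam0), v2(lam, lam0), and v2_perp(lam, lam0)."""
    if lam0 < lam_max:
        v1 = y / lam0 - theta0
    else:
        # lam0 == lam_max: v1 = X_* X_*^T y, where X_* attains lambda_max
        g_star = max(groups,
                     key=lambda g: np.linalg.norm(X[:, g].T @ y) / np.sqrt(len(g)))
        v1 = X[:, g_star] @ (X[:, g_star].T @ y)
    v2 = y / lam - theta0
    v2_perp = v2 - (v1 @ v2) / (v1 @ v1) * v1
    return v1, v2, v2_perp
```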
DPC (EDPP) Screening Rule for Group Lasso
  • For the Group Lasso problem, suppose we are given a sequence of parameter values $\lambda_{\rm max} = \lambda_0 > \lambda_1 > \ldots > \lambda_{\mathcal{K}}$. Then for any integer $0 \leq k < \mathcal{K}$, we have $\beta_g^*(\lambda_{k+1}) = 0$, i.e., the entire $g$-th group can be discarded, if $\beta^*(\lambda_k)$ is known and the following holds (a sketch of this check appears after the rule):

 \left\| {\bf X}_g^T \left( \frac{{\bf y} - {\bf X}\beta^*(\lambda_k)}{\lambda_k} + \frac{1}{2}{\bf v}_2^{\perp}(\lambda_{k+1}, \lambda_k) \right) \right\|_2 < \sqrt{n_g} - \frac{1}{2}\|{\bf v}_2^{\perp}(\lambda_{k+1}, \lambda_k)\|_2 \, \|{\bf X}_g\|_2.
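A sketch of this check, returning one boolean per group; `edpp_screen` is a hypothetical helper, $\|{\bf X}_g\|_2$ is computed as a spectral norm, and the setup is assumed as before:

```python
import numpy as np

def edpp_screen(X, y, groups, beta_k, lam_k, v2_perp):
    """Return one boolean per group: True means beta_g*(lam_{k+1}) = 0 is guaranteed."""
    theta_k = (y - X @ beta_k) / lam_k       # dual optimum at lam_k
    half = 0.5 * np.linalg.norm(v2_perp)
    discard = []
    for g in groups:
        Xg = X[:, g]
        lhs = np.linalg.norm(Xg.T @ (theta_k + 0.5 * v2_perp))
        rhs = np.sqrt(len(g)) - half * np.linalg.norm(Xg, 2)  # spectral norm of X_g
        discard.append(lhs < rhs)
    return discard
```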
A Few More Comments on DPC (EDPP)
  • To start from $\lambda_{\rm max}$, it is worthwhile to note that $\theta^*(\lambda_{\rm max}) = \frac{{\bf y}}{\lambda_{\rm max}}$ and $\beta^*(\lambda_{\rm max}) = 0$.

  • At the $k^{th}$ step, suppose that $\beta^*(\lambda_k)$ is known. To determine the zero groups of $\beta^*(\lambda_{k+1})$, DPC (EDPP) needs to compute ${\bf v}_2^{\perp}(\lambda_{k+1}, \lambda_k)$, which depends on ${\bf v}_2(\lambda_{k+1}, \lambda_k)$ and ${\bf v}_1(\lambda_k)$. Both require $\theta^*(\lambda_k)$, which can be computed as $\theta^*(\lambda_k) = \frac{{\bf y} - {\bf X}\beta^*(\lambda_k)}{\lambda_k}$.

  • After DPC (EDPP) identifies the zero groups of $\beta^*(\lambda_{k+1})$, the corresponding features can be removed from the optimization, and you can apply your favourite solver to solve for the remaining coefficients of $\beta^*(\lambda_{k+1})$. Then go to the next step, until the Group Lasso problems at all given parameter values are solved; the sketch below puts these steps together.
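Here is a sketch of the full sequential procedure. It reuses the hypothetical `lambda_max`, `edpp_vectors`, and `edpp_screen` helpers from above, and `solver` is a placeholder for any Group Lasso solver applied to the reduced problem (it should return the coefficients of the kept columns):

```python
import numpy as np

def pathwise_group_lasso(X, y, groups, lambdas, solver):
    """EDPP screening along a decreasing sequence lambdas[0] = lambda_max > ... > lambda_K."""
    p = X.shape[1]
    lam_max = lambdas[0]
    beta = np.zeros(p)                       # beta*(lambda_max) = 0
    path = [beta.copy()]
    for lam_k, lam_next in zip(lambdas[:-1], lambdas[1:]):
        theta_k = (y - X @ beta) / lam_k     # theta*(lambda_k)
        _, _, v2_perp = edpp_vectors(X, y, groups, lam_next, lam_k, theta_k, lam_max)
        drop = edpp_screen(X, y, groups, beta, lam_k, v2_perp)
        kept = [g for g, d in zip(groups, drop) if not d]
        beta = np.zeros(p)
        if kept:
            cols = np.concatenate(kept)
            # remap kept group indices into the reduced design matrix
            offsets = np.cumsum([0] + [len(g) for g in kept[:-1]])
            local = [np.arange(o, o + len(g)) for o, g in zip(offsets, kept)]
            beta[cols] = solver(X[:, cols], y, local, lam_next)
        path.append(beta.copy())
    return path
```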