DPC: Introduction

Here is a brief introduction to the package.

Basic Idea

Data with various structures and scales arise in almost every aspect of daily life, and it is always desirable to extract patterns from such data and build interpretable models with high prediction accuracy. One popular technique for identifying important explanatory features is sparse regularization. The intuition behind sparse modeling is the observation that many real-world data sets with complex structures and billions of variables can usually be well explained by a few of the most relevant explanatory features. However, for big data, where the dimensionality of the feature space and the number of samples are extremely large, solving sparse models remains challenging because we may not even be able to load the data matrix into main memory. There is therefore an increasingly urgent need for nontraditional sparse modeling techniques that address the challenges posed by big data.

The idea of “screening” has been shown to be very promising for solving the Lasso on large-scale problems. Essentially, screening for the Lasso aims to quickly identify the “inactive features”, i.e., those with zero components in the solution, and then remove them from the optimization. We can then solve the Lasso on a reduced feature matrix, which may lead to substantial savings in computational cost and memory usage. We develop novel screening methods based on Dual Projection onto Convex sets (DPC). Advantages of DPC include exact (safe) identification of inactive features and substantial savings in computational cost and memory usage.
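As a concrete illustration of the screening idea, when screening starts from λ_max (the smallest λ at which the all-zero solution is optimal), the basic rule for the Lasso reduces to a simple closed-form test on each feature. The sketch below is illustrative only, not the package's implementation; the function name and the synthetic demonstration data are our own.

```python
import numpy as np

def basic_dpp_screen(X, y, lam):
    """Boolean mask of features the basic screening test can safely discard
    for the Lasso:  min_b 0.5*||y - X b||^2 + lam*||b||_1.
    True -> the feature is guaranteed to have a zero coefficient.
    """
    corr = X.T @ y
    # lam_max is the smallest lam for which the all-zero solution is optimal;
    # there the dual optimum is known in closed form: theta = y / lam_max.
    lam_max = np.max(np.abs(corr))
    if lam >= lam_max:
        return np.ones(X.shape[1], dtype=bool)  # every feature is inactive
    # Discard x_j when |x_j^T y| / lam_max < 1 - ||x_j|| * ||y|| * (1/lam - 1/lam_max).
    # The right-hand side bounds |x_j^T theta(lam)| strictly below 1, so the
    # KKT conditions force the j-th coefficient to be zero.
    radius = np.linalg.norm(y) * (1.0 / lam - 1.0 / lam_max)
    col_norms = np.linalg.norm(X, axis=0)
    return np.abs(corr) / lam_max < 1.0 - col_norms * radius

# Illustrative use on synthetic data: screen first, then solve on the survivors.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))
y = rng.standard_normal(100)
lam_max = np.max(np.abs(X.T @ y))
mask = basic_dpp_screen(X, y, 0.95 * lam_max)
X_reduced = X[:, ~mask]  # solve the Lasso on this smaller matrix
```

Because the discarded columns are provably inactive, solving the Lasso on `X_reduced` gives exactly the same nonzero coefficients as solving on the full matrix; the rule is most effective for λ close to λ_max.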
For the DPC (Dual Projection onto Convex set) rules, C (the convex set) refers to the dual feasible set of the sparse model. For example, the dual feasible set of the Lasso is a polytope. Therefore, the DPC rule for the Lasso is named DPP in our paper: “Lasso Screening Rules via Dual Polytope Projection”.

List of our screening methods