Post-selection Inference for Forward Stepwise and Least Angle Regression (PDF)

Pages: 96
Release year: 2014
File size: 2.45 MB
Language: English

Preview: Post-selection Inference for Forward Stepwise and Least Angle Regression

Post-selection Inference for Forward Stepwise and Least Angle Regression
Ryan Tibshirani and Rob Tibshirani
Carnegie Mellon University and Stanford University
Joint work with Jonathan Taylor and Richard Lockhart
September 2014
(1/45)

[Slides 2-4/45: photo-matching results from picadilo.com for Ryan Tibshirani (CMU; PhD student of Taylor, 2011) and Rob Tibshirani (Stanford), with top match scores of 81%, 71%, and 69%.]

Conclusion
Confidence (the strength of evidence) matters!
(5/45)

Outline
- Setup and basic question
- Quick review of least angle regression and the covariance test
- A new framework for inference after selection
- Application to forward stepwise and least angle regression
- Application of these and related ideas to other problems
(6/45)

Setup and basic question
- Given an outcome vector y ∈ R^n and a predictor matrix X ∈ R^{n×p}, we consider the usual linear regression setup:
      y = Xβ* + σε,
  where β* ∈ R^p are unknown coefficients to be estimated, and the components of the noise vector ε ∈ R^n are i.i.d. N(0,1).
- Main question: if we apply least angle or forward stepwise regression, how can we compute valid p-values and confidence intervals?
(7/45)

Forward stepwise regression
- This procedure enters predictors one at a time, choosing the predictor that most decreases the residual sum of squares at each stage.
- Defining RSS as the residual sum of squares for the model containing k predictors, and RSS_null as the residual sum of squares before the kth predictor was added, we can form the usual statistic
      R_k = (RSS_null − RSS) / σ²
  (with σ assumed known), and compare it to a χ²_1 distribution.
(8/45)

Simulated example: naive forward stepwise
Setup: n = 100, p = 10, true model null.
[Figure: observed forward stepwise test statistics plotted against χ² quantiles on 1 df.]
The test is too liberal: for nominal size 5%, the actual type I error is 39% (reproduced in the sketch below).
(Yes, Larry, one can get proper p-values by sample splitting, but that is messy and loses power.)
(9/45)
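The liberal behavior on the last slide is easy to reproduce. Below is a minimal Python sketch, not taken from the slides; the helper name first_step_stat, the seed, and the replication count are arbitrary choices. It draws data from the null model (β* = 0, σ = 1), fits the single best predictor exactly as forward stepwise does in its first step, and compares R_1 = (RSS_null − RSS)/σ² to χ²_1.

```python
import numpy as np
from scipy import stats

def first_step_stat(y, X, sigma=1.0):
    """Chi-squared statistic for the first forward stepwise step:
    R_1 = (RSS_null - RSS)/sigma^2, maximized over all predictors."""
    yc = y - y.mean()
    rss_null = yc @ yc                      # RSS of the intercept-only model
    best_rss = np.inf
    for j in range(X.shape[1]):
        xj = X[:, j] - X[:, j].mean()
        beta = (xj @ yc) / (xj @ xj)        # univariate least squares fit
        rss = np.sum((yc - beta * xj) ** 2)
        best_rss = min(best_rss, rss)
    return (rss_null - best_rss) / sigma ** 2

rng = np.random.default_rng(0)
n, p, reps = 100, 10, 2000
hits = 0
for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)              # null model: beta* = 0, sigma = 1
    if stats.chi2.sf(first_step_stat(y, X), df=1) < 0.05:
        hits += 1
print(f"empirical type I error at nominal 5%: {hits / reps:.3f}")
```

Because the first step maximizes the drop in RSS over all p candidate predictors, R_1 is the maximum of p quantities that are each marginally χ²_1, so the naive χ²_1 reference is too light-tailed and the empirical rejection rate lands far above 5%, consistent with the 39% reported on the slide.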
Quick review of LAR and the covariance test
Least angle regression (LAR) is a method for constructing the path of solutions for the lasso:
      min_{β₀, β} Σ_i (y_i − β₀ − Σ_j x_ij β_j)² + λ · Σ_j |β_j|
LAR is a more democratic version of forward stepwise regression; its steps are as follows (see the sketch after this slide):
- Find the predictor most correlated with the outcome.
- Move the parameter vector in the least squares direction until some other predictor has as much correlation with the current residual.
- This new predictor is added to the active set, and the procedure is repeated.
- Optional ("lasso mode"): if a non-zero coefficient hits zero, that predictor is dropped from the active set, and the process is restarted.
(10/45)
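The steps above are implemented in standard software. As one illustration, here is a short sketch using scikit-learn's lars_path on made-up data; the coefficient values and seed are arbitrary, and this uses scikit-learn's implementation rather than the authors' own code.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Illustrative data (made-up values): two true signals, the rest pure noise
rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:2] = [3.0, -2.0]
y = X @ beta_star + rng.standard_normal(n)

# method="lar" runs plain least angle regression (predictors only enter);
# method="lasso" adds the optional dropping step, tracing the lasso path
alphas, active, coefs = lars_path(X, y, method="lar")
print("active set at end of path:", active)   # entry order under plain LAR
print("knots:", len(alphas), "| coefficient path shape:", coefs.shape)
```

Switching to method="lasso" corresponds to the "lasso mode" described on the slide, where a coefficient that hits zero is dropped from the active set and the path continues.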
