Table of Contents
1. Software subsamp
2. How to install the R package subsamp, which provides subsamp()
3. How to run subsamp()
4. References
In subsamp(), the default is m=100, and "q" is named "qnum." To view the multi-panel plot (the W-plot) for confirming or selecting "s," set "wplot=TRUE" in the call to subsamp().
install.packages("subsamp") # install subsamp package
library(subsamp) # load subsamp package
detach(package:subsamp) # unload subsamp package
library(subsamp)
?subsamp # To see the help file
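The installation step above can be made safe to re-run by installing only when the package is missing. The following is a minimal sketch using base R only; the helper names is_installed() and load_subsamp() are hypothetical and are not part of subsamp.

```r
# Hypothetical helper (not part of subsamp): TRUE if a package is installed.
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Hypothetical wrapper: install subsamp only when it is missing, then load it.
# Nothing is downloaded until load_subsamp() is actually called.
load_subsamp <- function() {
  if (!is_installed("subsamp")) {
    install.packages("subsamp")
  }
  library(subsamp)
}

# "stats" ships with R, so this check never triggers an installation.
is_installed("stats")
```

Calling load_subsamp() at the top of a script then replaces the separate install.packages() and library() lines.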
n <- 80; p <- 100
set.seed(2017)
x <- matrix(rnorm(n*p),n)
coefs <- c(.1, .5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 5)
y <- x[, 1:length(coefs)]%*%coefs + rnorm(n)
d <- data.frame(y,x)
One can first try standard linear regression with lm():
lm(y~x, d)
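With p = 100 predictors and only n = 80 observations, least squares cannot estimate every coefficient, which is why a variable-selection step is needed. The sketch below, using base R only, rebuilds the simulated data and counts the coefficients that lm() leaves as NA; the names fit and n_missing are illustrative.

```r
# Rebuild the simulated data from the example above.
n <- 80; p <- 100
set.seed(2017)
x <- matrix(rnorm(n * p), n)
coefs <- c(.1, .5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 5)
y <- x[, 1:length(coefs)] %*% coefs + rnorm(n)
d <- data.frame(y, x)

# Full least-squares fit: an intercept plus 100 predictors, but only 80 rows.
fit <- lm(y ~ ., data = d)

# The design matrix has rank at most n, so the surplus coefficients are NA.
n_missing <- sum(is.na(coef(fit)))
n_missing  # number of coefficients that could not be estimated
fit$rank   # numerical rank of the fitted design matrix (at most n)
```

Because the fit is saturated, its residuals carry no information for choosing among the predictors.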
Next, we run our SWA with the default values of m and wplot, and with a value of s smaller than the ideal one (in practice, the ideal value is unknown):
subsamp(x, y, s=8, qnum=10) #1st run
which leads to the following output:
Call:
lm(formula = y ~ X10 + X9 + X8 + X7 + X82 + X47 + X5, data = Data.final)

Residuals:
    Min      1Q  Median      3Q     Max
-4.7524 -2.0673 -0.1295  1.3865  6.2082

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.1205     0.3183   0.379   0.7061
X10           5.1715     0.3214  16.090  < 2e-16 ***
X9            4.0475     0.3197  12.658  < 2e-16 ***
X8            3.4370     0.3354  10.249 1.03e-15 ***
X7            2.6745     0.3318   8.062 1.18e-11 ***
X82           0.5581     0.3245   1.720   0.0898 .
X47           0.5766     0.3478   1.658   0.1016
X5            1.5756     0.3529   4.465 2.91e-05 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 2.67 on 72 degrees of freedom
Multiple R-squared: 0.9102, Adjusted R-squared: 0.9015
F-statistic: 104.2 on 7 and 72 DF, p-value: < 2.2e-16

Next, we try a bigger s and include a W-plot for diagnostics:
#png("swa-ex1.png") # Uncomment this line to save the resulting plot to a PNG file
subsamp(x, y, s=10, qnum=10, wplot=TRUE) #2nd run
#dev.off() # Uncomment this line as well to close the PNG device and write the file
which outputs:
Call:
lm(formula = y ~ X10 + X8 + X9 + X7 + X6 + X5 + X17 + X73, data = Data.final)

Residuals:
    Min      1Q  Median      3Q     Max
-4.2751 -1.3235  0.0921  1.1333  3.5947

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.1025     0.2175   0.471   0.6389
X10           5.0646     0.2153  23.524  < 2e-16 ***
X8            3.4123     0.2280  14.963  < 2e-16 ***
X9            3.7985     0.2189  17.354  < 2e-16 ***
X7            2.7257     0.2237  12.182  < 2e-16 ***
X6            1.8814     0.2042   9.215 9.42e-14 ***
X5            1.4865     0.2334   6.368 1.65e-08 ***
X17          -0.3072     0.2068  -1.485   0.1418
X73           0.3280     0.1803   1.819   0.0731 .
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 1.805 on 71 degrees of freedom
Multiple R-squared: 0.9595, Adjusted R-squared: 0.955
F-statistic: 210.4 on 8 and 71 DF, p-value: < 2.2e-16

We look for a value of s from which the plot is `stabilized.' By `stabilized,' we mean that the upper-arm sets in the sub-graphs of the W-plot are similar from this value of s onward. Here, the diagnostic weights plot indicates that s=10 is a good choice.
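Since the data are simulated, the variables kept in the two runs can also be checked against the true model, in which only X1 to X10 have nonzero coefficients. The sketch below copies the variable names from the two printed lm() calls; summarize_selection() is a hypothetical helper and not part of subsamp.

```r
# True signal variables: the simulation puts nonzero coefficients on X1..X10.
true_vars <- paste0("X", 1:10)

# Variables appearing in the two lm() calls printed above.
run1 <- c("X10", "X9", "X8", "X7", "X82", "X47", "X5")        # s = 8
run2 <- c("X10", "X8", "X9", "X7", "X6", "X5", "X17", "X73")  # s = 10

# Hypothetical helper: split a selection into hits, false picks, and misses.
summarize_selection <- function(selected, truth) {
  list(hits        = intersect(selected, truth),
       false_picks = setdiff(selected, truth),
       missed      = setdiff(truth, selected))
}

res1 <- summarize_selection(run1, true_vars)
res2 <- summarize_selection(run2, true_vars)

length(res1$hits)  # 5 true variables recovered with s = 8
length(res2$hits)  # 6 true variables recovered with s = 10
res2$missed        # "X1" "X2" "X3" "X4"
```

The four variables still missed with s=10 are those with the smallest true coefficients (0.1, 0.5, 1, and 1.5).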