by Alex Gold
Last Updated July 02, 2018 01:19 AM

I'm trying to bootstrap non-parametric prediction errors for a model I'm building. I've seen a few resources that suggest that the proper procedure is the following, with the input matrix $X$, response vector $y$, an input matrix to be predicted $X_p$, and $N$ equal to the number of resamples (e.g. 10,000):

For each resample $i = 1, \dots, N$:

- Resample the rows of $X$ with replacement to get $X_i$
- Retrain the model $m_i$ on $X_i$ and predict the responses: $\hat{y}_i = m_i(X_p)$
- Generate the residual vector $\epsilon_i = y - \hat{y}_i$
- Resample (shuffle) the residuals to yield $\epsilon^*_i$, then form the bootstrap estimates $\hat{y}^b_i = \hat{y}_i + \epsilon^*_i$

Then the bootstrap point estimate and prediction intervals can be computed directly from the resulting $N$ bootstrap estimates (e.g. their mean and quantiles).
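A minimal sketch of the procedure above, in Python with NumPy, may make the steps easier to follow. Several things here are assumptions rather than part of the sources: the helper names `fit_knn` and `bootstrap_predictions` are hypothetical, the k-nearest-neighbour regressor is only a stand-in for whatever non-OLS model is being used, the rows of $X$ are resampled together with the matching entries of $y$, and the residuals are computed on the original training data so that they have the same length as $y$.

```python
import numpy as np

def fit_knn(X, y, k=5):
    """Hypothetical stand-in model: k-nearest-neighbour regression."""
    def predict(X_new):
        # Pairwise squared distances, shape (n_new, n_train)
        d = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def bootstrap_predictions(X, y, X_p, fit, n_resamples=1000, seed=0):
    """Return an (n_resamples, len(X_p)) array of bootstrap estimates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_resamples, len(X_p)))
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)   # step 1: resample rows with replacement
        predict = fit(X[idx], y[idx])      # step 2: retrain the model m_i
        y_hat = predict(X_p)               #         and predict at X_p
        resid = y - predict(X)             # step 3: residuals (on training data, an assumption)
        # step 4: draw one resampled residual per prediction point and add it
        eps_star = rng.choice(resid, size=len(X_p), replace=True)
        draws[i] = y_hat + eps_star
    return draws

# Usage on synthetic data (made up purely for illustration)
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
X_p = np.array([[2.0], [5.0], [8.0]])

draws = bootstrap_predictions(X, y, X_p, fit_knn, n_resamples=500)
point = draws.mean(axis=0)                                  # bootstrap estimate
lower, upper = np.quantile(draws, [0.025, 0.975], axis=0)   # 95% interval
```

Dropping the `eps_star` term from the last line of the loop leaves only the spread caused by refitting the model, which is a convenient way to compare the two kinds of interval empirically.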

So my question is: why is step 4 (adding the resampled residuals) necessary? My intuition is that it has to do with the fact that I'm computing a prediction interval rather than a confidence interval, but I haven't found a good resource on this.
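One way to write that intuition down, as a rough sketch only, is to assume an additive noise model $y = f(x) + \varepsilon$. The symbols $f$, $\hat{f}$, and $\varepsilon_{\text{new}}$ are introduced here for illustration and do not come from the sources. Under that assumption, the error of a prediction at a new point $x_p$ splits into two parts:

$$
y_{\text{new}} - \hat{f}(x_p) = \underbrace{f(x_p) - \hat{f}(x_p)}_{\text{uncertainty in the fitted model}} + \underbrace{\varepsilon_{\text{new}}}_{\text{noise in the new observation}}
$$

Refitting on resampled rows only varies $\hat{f}$, so it would seem to capture just the first term, which is what a confidence interval for $f(x_p)$ describes. Adding a resampled residual looks like a stand-in for the second term, which a prediction interval for $y_{\text{new}}$ also has to cover.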

Thanks!

Sources: Slides 12-15 https://www.emse.fr/~roustant/Documents/Bootstrap_Conf_and_Pred_Intervals.pdf

Bootstrap prediction interval (This would be perfect if I happened to be using OLS, but I'm not...)
