What are bias and variance for a resampling scheme (as opposed to bias and variance for a model)?

by joern   Last Updated June 12, 2019 19:19 PM

I am trying to understand how the bias and variance of a resampling scheme such as cross-validation (CV) differ from the bias and variance of a model (such as a linear model or a tree).

Which of the following is true (if any)?

  1. The bias and variance of CV are the bias and variance of the accuracy estimate obtained by using CV as the resampling scheme. They therefore tell us about the quality of that estimate: the variance describes its precision (like the standard error of an estimator in statistics), and the bias describes whether the estimate is centered on the true but unknown out-of-sample error (for example, bias = 0 if the sample is large enough).

  2. Because of 1.: CV itself can have small bias and small variance even though the model's prediction error has large bias and variance. CV then just estimates that error well; the fact that the estimated error itself is large is a separate matter.

If 1. and 2. are true, why is a large variance of a resampling scheme a problem? I can only imagine that the resulting confidence intervals are wide, which might make it hard to detect differences between models.

  3. No, 1. and 2. are wrong. Cross-validation influences the estimated model, so statements about the bias of CV become statements about the model itself. If I say "CV has low bias", this means that models obtained through CV have low bias. The same holds for variance: if "CV has high variance", then the model has high variance.
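
To make reading 1 concrete, here is how I would write it down. The notation is my own assumption, not taken from a reference: D is a training sample of size n, \hat{f}_D is the model fitted on D, Err(\hat{f}_D) is its true out-of-sample error, and \widehat{Err}_{CV}(D) is the CV estimate computed from D. Expectations are taken over repeated draws of D.

```latex
\begin{align}
\operatorname{Bias}_{\mathrm{CV}}
  &= \mathbb{E}_D\!\left[\widehat{\mathrm{Err}}_{\mathrm{CV}}(D)\right]
   - \mathbb{E}_D\!\left[\mathrm{Err}(\hat{f}_D)\right], \\
\operatorname{Var}_{\mathrm{CV}}
  &= \mathbb{E}_D\!\left[\left(\widehat{\mathrm{Err}}_{\mathrm{CV}}(D)
   - \mathbb{E}_D\!\left[\widehat{\mathrm{Err}}_{\mathrm{CV}}(D)\right]\right)^2\right].
\end{align}
```

Under this reading, both quantities are properties of the error estimate, not of \hat{f}_D. Under reading 3, they would instead be the bias and variance of \hat{f}_D itself.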

I understand the bias-variance decomposition and the bias-variance dilemma in the context of model selection, and that we never observe bias and variance directly in real data. We have to estimate the error, for example using CV.
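
To illustrate what I mean by reading 1, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the data-generating process (y = 2x + Gaussian noise, x uniform on [0, 1]), the model (a one-feature least-squares line), and the helper names `draw_sample`, `fit_line`, `true_mse`, `cv_mse` and `simulate`. Because the data-generating process is known, the true out-of-sample error of each fitted line can be computed exactly and compared with the k-fold CV estimate over many repeated samples.

```python
import random
import statistics

# Hypothetical data-generating process: y = SLOPE * x + N(0, NOISE_SD^2), x ~ U(0, 1)
SLOPE, NOISE_SD = 2.0, 1.0


def draw_sample(n, rng):
    """Draw n (x, y) pairs from the assumed process."""
    xs = [rng.random() for _ in range(n)]
    ys = [SLOPE * x + rng.gauss(0.0, NOISE_SD) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def true_mse(a, b):
    """Exact out-of-sample MSE of the line (a, b) under the assumed process.

    Uses E[x] = 1/2 and E[x^2] = 1/3 for x ~ U(0, 1).
    """
    d = SLOPE - b
    return NOISE_SD ** 2 + d * d / 3.0 - a * d + a * a


def cv_mse(xs, ys, k):
    """k-fold cross-validation estimate of the MSE (interleaved folds)."""
    n = len(xs)
    sq_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        a, b = fit_line(train_x, train_y)
        sq_errors.extend((ys[i] - a - b * xs[i]) ** 2 for i in test_idx)
    return statistics.fmean(sq_errors)


def simulate(n=30, k=5, reps=2000, seed=0):
    """Return (bias of CV estimate, variance of CV estimate, mean true error)."""
    rng = random.Random(seed)
    cv_estimates, true_errors = [], []
    for _ in range(reps):
        xs, ys = draw_sample(n, rng)
        cv_estimates.append(cv_mse(xs, ys, k))
        # True error of the model fitted on the full sample
        true_errors.append(true_mse(*fit_line(xs, ys)))
    mean_true = statistics.fmean(true_errors)
    bias = statistics.fmean(cv_estimates) - mean_true
    variance = statistics.variance(cv_estimates)
    return bias, variance, mean_true


if __name__ == "__main__":
    bias, variance, mean_true = simulate()
    print(f"mean true error:         {mean_true:.4f}")
    print(f"bias of CV estimate:     {bias:+.4f}")
    print(f"variance of CV estimate: {variance:.4f}")
```

In this sketch, the bias and variance belong to the number that CV reports, not to the fitted line. The variance measures how much a single CV estimate would change if I had drawn a different sample of the same size, which is the quantity I have in mind when I ask why high variance of a resampling scheme is a problem.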

Thank you,

Kind regards Joern

Edit: Typos
