Can Mathematical Solutions Help Scientific Stagnation?
Happy to discuss at length but the term over fitting is too squishy. It is simply applied when someone thinks a process yields results they think are suboptimal.
Your example is showing an early stopping algorithm but I guarantee you people will still accuse of overfitting, rightfully so, even if you use early stopping. Often this occurs when yoi jave a poorly specified loss function or a badly chosen data set that learning the validation set still doesn't generalize.
I find this passage to be clarifiying for describing why overfitting is an underspecified, and unhelpful, term.
This came from Bob Carpenter on the Stan mailing list:
It’s not overfitting so much as model misspecification.
If your model is correct, “overfitting” is impossible. In its usual form, “overfitting” comes from using too weak of a prior distribution.
One might say that “weakness” of a prior distribution is not precisely defined. Then again, neither is “overfitting.” They’re the same thing.
P.S. In response to some discussion in comments: One way to define overfitting is when you have a complicated statistical procedure that gives worse predictions, on average, than a simpler procedure.
Or, since we’re all Bayesians here, we can rephrase: Overfitting is when you have a complicated model that gives worse predictions, on average, than a simpler model.
I’m assuming full Bayes here, not posterior modes or whatever.
Anyway, yes, overfitting can happen. And it happens when the larger model has too weak a prior. After all, the smaller model can be viewed as a version of the larger model, just with a very strong prior that restricts some parameters to be exactly zero