Estimating predictive performance is a critical part of real-world statistical modeling. This covers both model selection (finding the best model architecture, hyperparameters, and features) and model assessment (gauging the business impact of a model once it is deployed).
Research digest: what does cross-validation really estimate?
A new paper by Bates, Hastie, and Tibshirani reminds us that estimating a model's predictive performance is tricky. For linear models at least, cross-validation does not estimate the generalization error of the specific model fitted on your data, as you might assume; instead, it tracks the average error of models fitted on other hypothetical datasets drawn from the same distribution. How much does this matter for data science in practice?
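To make the distinction concrete, here is a minimal simulation sketch (my own illustration, not code from the paper), assuming a simple Gaussian linear model; the helper `make_data` and all parameters are hypothetical. It compares the K-fold CV estimate with the error of the one specific model fitted on the full dataset, approximated on a large fresh test set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 100, 10
beta = rng.normal(size=p)  # true coefficients of the simulated model

def make_data(n_samples):
    """Draw (X, y) from a fixed linear model with Gaussian noise."""
    X = rng.normal(size=(n_samples, p))
    y = X @ beta + rng.normal(size=n_samples)
    return X, y

X, y = make_data(n)

# 10-fold CV estimate of the mean squared prediction error
cv_mse = -cross_val_score(
    LinearRegression(), X, y, cv=10, scoring="neg_mean_squared_error"
).mean()

# Error of the *specific* model fitted on all of (X, y),
# approximated with a large independent test set
model = LinearRegression().fit(X, y)
X_test, y_test = make_data(100_000)
true_mse = np.mean((y_test - model.predict(X_test)) ** 2)

print(f"CV estimate: {cv_mse:.3f} | error of this fitted model: {true_mse:.3f}")
```

Repeating this over many simulated datasets would show the CV estimate hovering around the average error across datasets rather than tracking the error of each particular fitted model, which is the gap the paper analyzes.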