Abstract:
Cross-validation (CV), while extensively used for model comparison in data science, may have three major weaknesses. First, the regular 10-fold CV is often unstable in its choice of the best model among the candidates. Second, the CV practice of singling out one candidate based on the total prediction errors over the different folds conveys no sensible information on how much one can trust the apparent winner; relatedly, the popular one-standard-error rule turns out to be questionable. Lastly, when only one data-splitting ratio is considered, regardless of its choice, it may work very poorly in some situations. In this work, to address these shortcomings, we propose a new averaging-and-voting-based version of cross-validation for better comparison results. Simulations and real data are used to illustrate the superiority of the new approach over traditional CV methods. This talk is based on joint work with Zishu Zhan.
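To make the instability point concrete, here is a minimal sketch (a hypothetical illustration, not the method proposed in the talk): it repeats a standard 10-fold CV comparison between two candidate models under different random fold assignments and counts how often each one "wins" by total prediction error. The data, candidate models, and number of repetitions are all arbitrary choices for demonstration.

```python
# Minimal sketch of 10-fold CV instability in model selection:
# the same comparison, rerun with different random fold assignments,
# can crown a different "best" model each time.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (arbitrary sizes, chosen for illustration only)
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)
candidates = {"lasso": Lasso(alpha=0.1), "ridge": Ridge(alpha=1.0)}

wins = {name: 0 for name in candidates}
for seed in range(100):                       # 100 independent 10-fold splits
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    # Total squared prediction error over the 10 folds for each candidate
    errors = {name: -cross_val_score(model, X, y, cv=cv,
                                     scoring="neg_mean_squared_error").sum()
              for name, model in candidates.items()}
    wins[min(errors, key=errors.get)] += 1    # smallest total error "wins"

print(wins)  # the apparent winner typically flips across repetitions
```

Running this shows that the winner can depend heavily on the random fold assignment alone, which is the instability the talk's averaging-and-voting approach aims to address.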
About the Speaker:
Dr. Yuhong Yang is a Professor at the Yau Mathematical Sciences Center. He received his Ph.D. in statistics from Yale University in 1996. His research interests include model selection, model averaging, multi-armed bandit problems, causal inference, high-dimensional data analysis, and machine learning. He has published in journals across several fields, including the Annals of Statistics, JASA, IEEE Transactions on Information Theory, IEEE Signal Processing Magazine, Journal of Econometrics, Journal of Machine Learning Research, and International Journal of Forecasting. He is a recipient of the US NSF CAREER Award and a fellow of the Institute of Mathematical Statistics. He is included in Stanford University's list of the top 2% most cited scientists in the world.