From In Vitro to In Vivo AI Evaluation
Kawin Ethayarajh (Stanford University)
Colloquium
Tuesday, April 18, 2023, 3:30 pm
Abstract
AI can fail spectacularly in the wild despite passing evaluation in the lab. Why? The in vitro evaluation done in research is divorced from the complexity of the real world in which AI models are eventually deployed. This disconnect has had catastrophic effects, including systemic discrimination and the loss of billions of dollars. By making interdisciplinary connections, Kawin's work develops in vivo evaluation paradigms for AI that bridge the gap between research and reality. In this talk, he will discuss: (1) how we can create datasets that are as difficult as the underlying tasks we want to solve; (2) how we can measure and incorporate more context into the representations of AI models; and (3) why it is important to track the hidden costs of making predictions. With in vivo evaluation, we can be better assured that the progress we make on paper translates to progress in the real world.