Computer Vision Seminar
Language and Vision: Learning Knowledge About the World
Online images and text provide rich records of human lives, events, and activities. In this talk, I will survey some of our recent attempts to learn various aspects of commonsense knowledge from naturally existing multi-modal web data. More concretely, I will present our work on (1) learning relative physical knowledge (e.g., are elephants bigger than butterflies?), (2) learning visual entailment (e.g., a horse that is "eating" is likely to be "standing", while a cat that is "eating" is likely to be "sitting"), and (3) learning the prototypical event structure of common life scenarios (e.g., in a "wedding", "exchanging vows" typically occurs before "cutting a cake", and "dancing" happens last). A recurring theme across these projects is the use of naturally existing multi-modal web data to recover implicit knowledge, and how such knowledge can help improve related downstream tasks.
Yejin Choi is an assistant professor in the Computer Science & Engineering Department at the University of Washington. Her recent research focuses on language grounding, integrating language and vision, and modeling nonliteral meaning in text. She was named one of IEEE AI's 10 to Watch in 2015 and was a co-recipient of the Marr Prize at ICCV 2013. Her work on detecting deceptive reviews, predicting literary success, and learning to interpret connotation has been featured by numerous media outlets, including NBC News New York, NPR, The New York Times, and Bloomberg Businessweek. She received her Ph.D. in Computer Science from Cornell University.