Multimodal Character Representation for Visual Story Understanding
Add to Google Calendar
Recently, several efforts have been made towards building story understanding benchmarks. As characters are the key component around which the story events unfold, character representations are crucial for deep story understanding such as their names, appearances, and relations to other characters. As a step towards endowing systems with a richer understanding of characters in a given narrative, this thesis develops new techniques that rely on the vision, audio and language channels to address three important challenges: i) speaker recognition and identification, ii) character representation and embedding, and iii) temporal modeling of character relations. We show that our approach improves systems ability to understand narratives, which is measured using several tasks such as their ability to answer questions about stories on several benchmarks.