Toward the Who and Where of Action Recognition
Action recognition has been actively studied in computer vision for more than two decades. Recent action recognition systems are adept at classifying web videos within a closed world of action categories. But next-generation cognitive systems will require far more than action classification. Full action recognition requires not only classifying the action, but also localizing it and potentially even finely segmenting its boundaries. It also requires attending not only to human action but to the actions of other agents in the environment, such as animals or vehicles. In this talk, I will describe our recent work in moving toward these more rigorous aspects of action recognition. Our work is the first effort in the computer vision community to jointly consider various types of actors performing various actions. We consider seven actor types and eight action types in three action understanding problems: single-label action classification, multi-label action classification, and actor-action joint semantic segmentation. We propose graduated strata of models for these problems and analyze the performance of each on all three. The talk will discuss these models in detail, along with the results and a new dataset that we released to support these more rigorous action understanding problems. This talk covers work appearing in CVPR 2015 as well as new material.