Faculty Candidate Seminar
Visual Recognition in the Three-Dimensional World
Add to Google Calendar
The ability to interpret the semantic of objects and actions, their individual geometric attributes as well as their spatial and temporal relationships within the environment is essential for an intelligent visual system and extremely valuable in numerous applications. In visual recognition, the problem of categorizing generic objects is a highly challenging one. Single objects vary in appearances and shapes under various photometric (e.g. illumination) and geometric (e.g. scale, view point, occlusion, etc.) transformations. Largely due to the difficulty of this problem, most of the current research in object categorization has focused on modeling object classes in single (or nearly single) views. But our world is fundamentally 3D and it is crucial that we design models and algorithms that can handle such appearance and pose variability. In the first part of the talk I introduce a novel framework for learning and recognizing 3D object categories and their poses. Our approach is to capture a compact model of an object category by linking together diagnostic parts of the objects from different viewing points. The resulting model is a summarization of both the appearance and geometry information of the object class. Unlike earlier attempts for 3D object categorization, our framework requires minimal supervision and has the ability to synthesize unseen views of an object category. Our results on categorization show superior performances to state-of-the-art algorithms on the largest dataset up to date. In the second part, I present a new framework for modeling the overall geometrical and temporal organization of scenes. This is done by learning the typical distribution of spatial and temporal relationships among elements in scenes. Our model is extremely compact and can be learned in an unsupervised fashion. Experiments demonstrate that the added ability of modeling such spatial and temporal relationships is useful in several recognition tasks, such as scene/object categorization and human action classification. I conclude the talk with final remarks on the relevance of object recognition, pose estimation, and temporal and spatial reasoning as key ingredients for a coherent geometrical interpretation of images and videos.