Communications and Signal Processing Seminar

A New Embedding Framework to Design Consistent Surrogate Losses

Rafael M. FrongilloAssistant ProfessorUniversity of Colorado Boulder
WHERE:
1008 EECS BuildingMap
SHARE:

Abstract:  Statistical learning problems are often given by a target loss function: one seeks to learn a hypothesis which minimizes the target loss over the true data distribution. For discrete prediction tasks like classification, ranking, and structured prediction, however, empirically minimizing the target loss directly is computationally intractable. Instead, one typically minimizes a surrogate loss which is convex, and therefore efficiently minimized. A link function then translates back to the target problem. To ensure the surrogate and link properly correspond to the target problem, the surrogate must be consistent: minimizing the surrogate loss over enough data should also minimize the target loss.

Yet how should one design such a consistent convex surrogate? Many such surrogates are known for specific problems, like binary classification, but from there our knowledge drops off quickly. In particular, the literature lacks general techniques to design surrogates for any discrete target loss.

In this talk, I will discuss a natural approach to design convex surrogates via an embedding. In this approach, one embeds each of the finitely many predictions (e.g. rankings) as a point in R^d, assigns the original loss values to these points, and “convexifies” the loss in some way to obtain a surrogate. We will see a strong connection between this approach and polyhedral (piecewise-linear convex) surrogate losses: every discrete loss is embedded by some polyhedral loss, and every polyhedral loss embeds some discrete loss. Moreover, an embedding gives rise to a consistent link function as well as linear surrogate regret bounds. These results are constructive, as we will illustrate with several examples.

Bio: Rafael (Raf) Frongillo is an Assistant Professor of Computer Science at the University of Colorado Boulder. His research lies at the interface between theoretical machine learning and economics, primarily focusing on information elicitation mechanisms, which incentivize humans or algorithms to predict accurately. Before Boulder, Raf was a postdoc at the Center for Research on Computation and Society at Harvard University and at Microsoft Research New York. He received his PhD in Computer Science at UC Berkeley, advised by Christos Papadimitriou and supported by the NDSEG Fellowship.

***Event will take place in hybrid format. The location for in-person attendance will be room 1008 EECS.   Attendance will also be available via Zoom.

Join Zoom Meeting https: https://umich.zoom.us/j/91414297851

Meeting ID: 914 1429 7851

Passcode: XXXXXX (Will be sent via e-mail to attendees)

Zoom Passcode information is also available upon request to Michele Feldkamp (careymrz@umich.edu).

This seminar will be recorded and posted to the CSP Seminar events page.

 

Faculty Host

Ambuj TewariProfessor, StatisticsUniversity of Michigan