Communications and Signal Processing Seminar

Transformers as Support Vector Machines

Samet Oymak, Assistant Professor, Electrical Engineering & Computer Science, University of Michigan, College of Engineering
3427 EECS Building

Abstract: Recent advances in language modeling, such as ChatGPT, have had a revolutionary impact within a short timeframe. These large language models are based on the transformer architecture, which uses the self-attention mechanism as its central component. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex optimization dynamics. In this talk, we establish a formal equivalence between the optimization geometry of the attention layer and a linear hard-margin SVM problem that separates the optimal tokens within the input sequence from non-optimal tokens (e.g., selecting the most relevant words within a sentence). Through this, we characterize the inductive bias of 1-layer transformers optimized with gradient descent and prove that optimization of the attention weights converges in direction to a max-margin token separator that minimizes either a nuclear-norm or a Frobenius-norm objective, depending on the parameterization. If time permits, I will further discuss the role of the MLP layer in attention and the importance of flat minima. Finally, I will demonstrate the practical validity of our hard-margin SVM equivalence via numerical experiments. Our findings on the attention mechanism inspire a new perspective, interpreting multilayer transformers as a hierarchy of SVMs that separates and selects optimal tokens.


Bio: Samet Oymak is an assistant professor of Electrical Engineering and Computer Science at the University of Michigan. Prior to UMich, he was with the ECE department at the University of California, Riverside. He has also spent time in industry as a researcher and did a postdoc at UC Berkeley as a Simons Fellow. He obtained his PhD from Caltech in 2015, for which he received a Charles Wilts Prize for the best departmental thesis. He is also a recipient of an NSF CAREER award and multiple industry faculty research awards. Website:

*** This event will take place in a hybrid format. The location for in-person attendance will be room 3427 EECS. Attendance will also be available via Zoom.

Join Zoom Meeting https:

Meeting ID: 991 0245 1525

Passcode: XXXXXX (Will be sent via e-mail to attendees)

Zoom Passcode information is also available upon request to Sher Nickrand ([email protected]).

This seminar will be recorded and posted to the CSP Seminar website.

See the full seminar by Professor Oymak

Faculty Host

Qing Qu, Assistant Professor, University of Michigan, Electrical and Computer Engineering