Communications and Signal Processing Seminar

(CANCELED) Discrete Optimization for Adversarial Attacks on Large Language Models

Zico Kolter
Associate Professor of Computer Science, Carnegie Mellon University
Chief Scientist of AI Research, Bosch Center for AI (BCAI), Pittsburgh Office
3427 EECS Building

Abstract: In this talk, I’ll discuss our recent work on adversarial attacks against public large language models (LLMs), such as ChatGPT and Bard. At a high level, the attacks look for “adversarial suffix” strings that cause these models to ignore their guardrails and answer potentially harmful user queries. This talk will focus specifically on the optimization aspects of this problem, where the task involves a relatively unstructured optimization over discrete objects (the tokens in the adversarial suffix). I will outline the challenges of this problem from an optimization standpoint and describe the main features of our method, which combines gradient-based information with greedy search. I will also point to potential future directions for research in such optimization settings and discuss the broader implications for LLM robustness.

Bio:  Zico Kolter is an Associate Professor in the Computer Science Department at Carnegie Mellon University, and also serves as chief scientist of AI research for the Bosch Center for Artificial Intelligence. His work spans the intersection of machine learning and optimization, with a large focus on developing more robust and rigorous methods in deep learning. In addition, he has worked in a number of application areas, highlighted by work on sustainability and smart energy systems. He is a recipient of the DARPA Young Faculty Award, a Sloan Fellowship, and best paper awards at NeurIPS, ICML (honorable mention), AISTATS (test of time), IJCAI, KDD, and PESGM.

*** The event will take place in a hybrid format. The location for in-person attendance will be

Faculty Host

Liyue Shen
Assistant Professor, University of Michigan, Electrical and Computer Engineering