Faculty Candidate Seminar
Fair and Reliable Machine Learning for High-Stakes Applications: Approaches Using Information Theory
This event is free and open to the public.
Abstract: How do we make machine learning (ML) algorithms fair, reliable, and lawful? This question is particularly important today as ML enters high-stakes applications such as hiring and education, where it can adversely affect people's lives with respect to gender, race, etc., and can also violate anti-discrimination laws. When it comes to resolving legal disputes, or even informing policies and interventions, merely identifying bias or disparity in a model's decisions is insufficient. We need to dig deeper and identify and explain the sources of that disparity. For example, disparities in hiring that can be explained by an occupational necessity (code-writing skills for software engineering) may be exempt by law, but disparity arising from an aptitude test may not be (Griggs v. Duke Power Co., 1971). This leads us to a question that bridges the fields of fairness, explainability, and law: How can we identify and explain the sources of disparity in ML models, e.g., did the disparity arise entirely from critical occupational necessities? In this talk, I propose the first systematic measure of "non-exempt disparity," i.e., the bias that cannot be explained by occupational necessities. To arrive at this measure, I adopt a rigorous axiomatic approach that brings together concepts from information theory, in particular an emerging body of work called Partial Information Decomposition, with tools from causal inference.
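The abstract does not spell out the decomposition itself, but a standard motivating example for why Partial Information Decomposition goes beyond ordinary mutual information is the XOR construction: each input alone carries zero information about the output, yet together they determine it completely (pure synergy). The sketch below is illustrative only; the function and variable names are our own, not from the talk.

```python
import math
from collections import defaultdict

def mutual_information(joint, x_idx, y_idx):
    """Mutual information I(X; Y) in bits, from a joint pmf over outcome tuples.

    joint maps outcome tuples to probabilities; x_idx and y_idx select which
    coordinates of each tuple form X and Y respectively.
    """
    px, py, pxy = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        x = tuple(outcome[i] for i in x_idx)
        y = tuple(outcome[i] for i in y_idx)
        px[x] += p
        py[y] += p
        pxy[(x, y)] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# X1, X2 independent fair bits; Y = X1 XOR X2.
joint = {(x1, x2, x1 ^ x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}

i1 = mutual_information(joint, (0,), (2,))     # I(X1; Y)     = 0 bits
i2 = mutual_information(joint, (1,), (2,))     # I(X2; Y)     = 0 bits
i12 = mutual_information(joint, (0, 1), (2,))  # I(X1, X2; Y) = 1 bit
```

Because the pairwise mutual informations are zero while the joint one is a full bit, no measure built only from pairwise terms can attribute this dependence; PID introduces unique, redundant, and synergistic components precisely to account for such cases.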
Another direction of my research concerns reliability in computing, which has led to an emerging interdisciplinary area called "coded computing." Towards the end of the talk, I will also provide an overview of some of my results on coded computing for reliability, which address long-standing computational challenges in large-scale distributed machine learning (namely errors, stragglers, faults, and failures) using tools from coding theory, optimization, and queueing theory.
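To make the straggler-mitigation idea concrete, here is a minimal sketch of the classic coded-computing trick on matrix-vector multiplication, using a (3, 2) MDS-style code: split the matrix row-wise between two workers and give a third worker the sum, so any two of the three results recover the full product. The names and the toy matrices are our own illustration, not the speaker's implementation.

```python
# Straggler-tolerant A @ x with a (3, 2) code: A is split row-wise into A1, A2,
# and a third worker computes with the parity block A1 + A2. Any 2 of the 3
# worker results suffice, so one straggler can be ignored.

def matvec(rows, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]

def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

def vec_sub(u, v):
    return [a - b for a, b in zip(u, v)]

A1 = [[1, 2], [3, 4]]   # first half of A's rows
A2 = [[5, 6], [7, 8]]   # second half of A's rows
x = [1, 1]

# Encoded tasks assigned to three workers.
tasks = {
    "w1": A1,
    "w2": A2,
    "w3": [vec_add(r1, r2) for r1, r2 in zip(A1, A2)],  # parity: A1 + A2
}

# Suppose worker w2 straggles: recover A2 @ x from the other two results.
y1 = matvec(tasks["w1"], x)      # A1 @ x        = [3, 7]
y3 = matvec(tasks["w3"], x)      # (A1 + A2) @ x = [14, 22]
y2_recovered = vec_sub(y3, y1)   # A2 @ x        = [11, 15]
full_result = y1 + y2_recovered  # A @ x         = [3, 7, 11, 15]
```

The same principle generalizes to (n, k) MDS codes, where any k of n workers suffice, trading extra computation for resilience to the n − k slowest or failed workers.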
Bio: Sanghamitra Dutta (B.Tech., IIT Kharagpur) is a Ph.D. candidate at Carnegie Mellon University, USA. Her research interests revolve around machine learning, information theory, and statistics. She is currently focused on addressing emerging reliability issues in machine learning concerning fairness, explainability, and law, with recent publications at AAAI’20 and ICML’20 (also featured in New Scientist). In her prior work, she has examined problems in reliable computing, proposing novel algorithmic solutions for large-scale distributed machine learning in the presence of faults and failures, using tools from coding theory (an emerging area called "coded computing"). Her results on coded computing address problems that have been open for several decades and have received substantial attention across communities (published at IEEE Transactions on Information Theory ’19, ’20, NeurIPS’16, AISTATS’18, IEEE BigData’18, ICML Workshop Spotlight ’19, ISIT’17, ’18, and Proceedings of the IEEE ’20, along with two pending patents). She is a recipient of the 2020 CyLab Presidential Fellowship, 2019 K&L Gates Presidential Fellowship, 2019 Axel Berny Presidential Graduate Fellowship, 2017 Tan Endowed Graduate Fellowship, 2016 Prabhu and Poonam Goel Graduate Fellowship, the 2015 Best Undergraduate Project Award at IIT Kharagpur, and the 2014 HONDA Young Engineer and Scientist Award. She has also pursued research internships at IBM Research and Dataminr.