Communications and Signal Processing Seminar

Variations on a theme of Watkins

Vivek Borkar, Professor, Indian Institute of Technology, Mumbai, India

Abstract: The talk will begin with an interpretation and derivation of Watkins’ Q-learning algorithm from a stochastic approximation point of view, highlighting in particular the ‘ODE’ (for ‘Ordinary Differential Equations’) approach. Then I shall introduce a ‘prospect theoretic’ version, which is a variant of the one studied by Yun Shen in his Ph.D. thesis. This factors in a behavioral aspect. Specifically, one passes the future rewards through an S-shaped nonlinearity that models risk-aversion for rewards and risk-seeking for losses, relative to a reference point. The traditional convergence arguments no longer work; nevertheless, one can say quite a bit about the algorithm’s qualitative behavior. If time permits, I shall speak briefly about a variant of the popular DQN algorithm that is theoretically better justified and, in preliminary studies, has given improved performance.
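For readers unfamiliar with the ingredients named in the abstract, the following is a minimal illustrative sketch, not the speaker's algorithm. It shows the standard tabular Q-learning update written as a stochastic approximation step, plus an optional S-shaped transform of the future-value term. The function names, the Kahneman-Tversky style power form of the nonlinearity, the parameter values (0.88 and 2.25), and the choice of applying the transform to the discounted future value are all assumptions made here for illustration; the talk's variant may differ in each of these respects.

```python
import numpy as np


def s_shaped(x, ref=0.0, alpha=0.88, loss_weight=2.25):
    """Hypothetical S-shaped value function around a reference point `ref`.

    Concave above the reference point (risk-averse for gains) and
    convex below it (risk-seeking for losses). The power form and the
    default parameters are illustrative assumptions.
    """
    d = np.asarray(x, dtype=float) - ref
    gains = np.maximum(d, 0.0) ** alpha
    losses = -loss_weight * np.maximum(-d, 0.0) ** alpha
    return np.where(d >= 0, gains, losses)


def q_update(q, s, a, r, s_next, step, gamma=0.9, transform=None):
    """One tabular Q-learning step: q[s, a] += step * (target - q[s, a]).

    With transform=None this is the standard Watkins update. Passing a
    transform applies it to the future-value term, which is one possible
    reading of "passing future rewards through a nonlinearity" (assumed).
    """
    future = q[s_next].max()
    if transform is not None:
        future = float(transform(future))
    q[s, a] += step * (r + gamma * future - q[s, a])
    return q


# Usage: two states, two actions, one observed transition (0, 1) -> 1.
q = np.zeros((2, 2))
q_update(q, s=0, a=1, r=1.0, s_next=1, step=0.5)
q_update(q, s=0, a=1, r=1.0, s_next=1, step=0.5, transform=s_shaped)
```

In the standard case the step sizes decrease over time and the iterates track the solution of a limiting ODE, which is the viewpoint the abstract refers to; the sketch above uses a fixed step only to keep the example short.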

Speaker Bio: Vivek Borkar obtained his B.Tech. (EE) from the Indian Institute of Technology Bombay, M.S. (Systems and Control) from Case Western Reserve University, and Ph.D. (EECS) from the University of California, Berkeley, in 1976, 1977, and 1980, respectively. He has held positions at the TIFR Centre for Applicable Mathematics and the Indian Institute of Science in Bangalore, and at the Tata Institute of Fundamental Research and IIT Bombay in Mumbai. Since July 2020, he has been an Emeritus Fellow at IIT Bombay with an S. S. Bhatnagar Fellowship from the Council of Scientific and Industrial Research, Government of India. He is a Fellow of IEEE, AMS, TWAS and various science and engineering academies in India. He has won several awards in India and was an invited speaker at ICM 2006. His research interests are in the broad domains of stochastic optimization and control, spanning theory, algorithms, and applications.

Join Zoom Meeting

Meeting ID: 975 9857 1292

Passcode: XXXXXX (Will be sent via email to attendees)

See full seminar by Professor Borkar