Statistical Learning Methods in Modern AI Conference: RL Session Speakers
Michael R. Kosorok
Professor Michael R. Kosorok is the W.R. Kenan, Jr. Distinguished Professor, Department of Biostatistics, and Professor, Department of Statistics and Operations Research, at the University of North Carolina at Chapel Hill. Michael’s expertise is in biostatistics, data science, machine learning, artificial intelligence, and precision health. He is an expert on the theoretical properties underlying data analysis methods, especially in the areas of empirical processes and semiparametric inference, and is the author of a book on the topic. He also has expertise in the application of biostatistics and data science to human health research, including cancer, cystic fibrosis, diabetes, and other health areas. He has pioneered machine learning and data mining tools for precision health and has co-edited a book (with Erica E. M. Moodie) on the topic. He is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the American Association for the Advancement of Science.
Title: Recent Machine Learning Developments for Multiple Outcomes in Precision Health
Abstract: Precision health is the science of data-driven decision support for improving health at the individual and population levels. It includes precision medicine and precision public health and encompasses all health-related challenges that can benefit from a precision operations approach. This framework strives to develop study design, data collection, and analysis tools for discovering empirically valid solutions that optimize outcomes in both the short and long term, including settings with multiple competing outcomes. In this presentation, we will discuss two approaches to handling such outcomes. The first utilizes patient preferences about the relative importance of two competing outcomes. The second utilizes expert opinion when the experts may be subject to error, and involves an interesting new type of inverse reinforcement learning. The ideas will be illustrated with applications in mental health.
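To make the first approach concrete, here is a minimal hypothetical sketch (invented function and variable names; not the speaker's actual method): each patient's two competing outcomes are combined using a preference weight, and a regression-based rule recommends the treatment with the larger predicted composite outcome.

```python
# Hypothetical sketch: a preference-weighted composite outcome driving an
# individualized treatment rule. All names are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_preference_weighted_rule(X, A, Y1, Y2, w):
    """X: covariates (n, p); A: binary treatment in {0, 1};
    Y1, Y2: competing outcomes (larger is better); w: preference weights in [0, 1]."""
    Y = w * Y1 + (1.0 - w) * Y2            # composite outcome per patient
    features = np.column_stack([X, w, A])  # include w so the rule can depend on it
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(features, Y)                 # regression estimate of Q(x, w, a)
    return model

def recommend(model, X, w):
    # Recommend the treatment with the larger predicted composite outcome.
    q0 = model.predict(np.column_stack([X, w, np.zeros(len(X))]))
    q1 = model.predict(np.column_stack([X, w, np.ones(len(X))]))
    return (q1 > q0).astype(int)
```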
Susan Murphy is Professor of Statistics at Harvard University, Radcliffe Alumnae Professor at the Radcliffe Institute, Harvard University, and Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences. Her lab works on clinical trial designs and online learning algorithms for sequential decision making, in particular in the area of digital health. She developed the micro-randomized trial for use in constructing mobile health interventions, which is in use across a broad range of health-related areas. She is a 2013 MacArthur Fellow and a member of the National Academy of Sciences and the National Academy of Medicine, both of the US National Academies. She is a Past-President of IMS and of the Bernoulli Society and a former editor of the Annals of Statistics. She is a past recipient of the R. A. Fisher Award from COPSS and was awarded the Guy Medal in Silver from the RSS.
Title: We used RL; but did it work?
Abstract: Reinforcement learning provides an attractive suite of online learning methods for personalizing interventions in digital health. However, after a reinforcement learning algorithm has been run in a clinical study, how do we assess whether personalization occurred? We might find users for whom the algorithm appears to have learned in which contexts the user is more responsive to a particular intervention. But could this have happened completely by chance? We discuss some first approaches to addressing these questions.
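As one illustration of what "completely by chance" can mean here, the sketch below (hypothetical names; it also ignores the temporal dependence a real analysis of an online algorithm must handle) permutes a user's context labels to break any learned context-action association, yielding a null distribution for an observed personalization statistic.

```python
# Hypothetical per-user permutation test for personalization: did the
# algorithm's intervention probabilities really track context?
import numpy as np

def personalization_stat(contexts, action_probs):
    """Association between a binary context (0/1 array) and the algorithm's
    probability of sending the intervention; larger means more personalized."""
    return abs(action_probs[contexts == 1].mean()
               - action_probs[contexts == 0].mean())

def permutation_pvalue(contexts, action_probs, n_perm=10_000, seed=0):
    rng = np.random.default_rng(seed)
    observed = personalization_stat(contexts, action_probs)
    null = np.array([
        personalization_stat(rng.permutation(contexts), action_probs)
        for _ in range(n_perm)
    ])
    # How often does chance alone produce as strong an association?
    return (1 + (null >= observed).sum()) / (1 + n_perm)
```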
Rui Song is a professor in the Department of Statistics at North Carolina State University. Her current research interests include machine learning, causal inference, precision health, and financial econometrics. Her research has been supported by the National Science Foundation (NSF), with her as sole principal investigator.
Title: Statistical Inference for Online Decision Making via Stochastic Gradient Descent
Abstract: Online decision making aims to learn the optimal decision rule by making personalized decisions and updating the decision rule recursively. It has become easier than before with the help of big data, but new challenges also come along. Since the decision rule should be updated once per step, an offline update that uses all the historical data is inefficient in both computation and storage. To this end, we propose a completely online algorithm that can make decisions and update the decision rule online via stochastic gradient descent. It is not only computationally efficient but also supports a wide range of parametric reward models. Focusing on the statistical inference of online decision making, we establish the asymptotic normality of the parameter estimator produced by our algorithm and of the online inverse probability weighted value estimator used to estimate the optimal value. Online plugin estimators for the variances of the parameter and value estimators are also provided and shown to be consistent, so that interval estimation and hypothesis testing are possible with our method. The proposed algorithm and theoretical results are tested by simulations and a real data application to news article recommendation.
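A minimal sketch of this style of fully online procedure, assuming a linear reward model per action and epsilon-greedy exploration (illustrative only; not the authors' exact algorithm): each step makes a decision, takes one stochastic gradient step on the chosen action's model, and stores no historical data.

```python
# Illustrative online decision maker: epsilon-greedy actions plus one SGD
# step per observation; computation and storage stay O(dim) per step.
import numpy as np

class OnlineSGDDecisionMaker:
    def __init__(self, n_actions, dim, lr=0.05, eps=0.1, seed=0):
        self.theta = np.zeros((n_actions, dim))  # one linear reward model per action
        self.lr, self.eps = lr, eps
        self.rng = np.random.default_rng(seed)

    def act(self, x):
        # Epsilon-greedy keeps action probabilities bounded away from zero,
        # which an inverse probability weighted value estimator requires.
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.theta)))
        return int(np.argmax(self.theta @ x))

    def update(self, x, a, r):
        # One SGD step on the squared error of the chosen action's model.
        grad = (self.theta[a] @ x - r) * x
        self.theta[a] -= self.lr * grad
```

An online inverse probability weighted value estimate can then be accumulated alongside these updates by weighting each observed reward by the (known) probability of the action actually taken.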
Tony Qin is Principal Research Scientist and Director of the Decision Intelligence group at DiDi AI Labs, working on core problems in ridesharing marketplace optimization. Prior to DiDi, he was a research scientist in supply chain and inventory optimization at Walmart Global E-commerce. Tony received his Ph.D. in Operations Research from Columbia University. His research interests span optimization and machine learning, with a particular focus on reinforcement learning and its applications in operational optimization, digital marketing, and smart transportation. He has published in top-tier conferences and journals in machine learning and optimization, served on the program committees of NeurIPS, ICML, AAAI, IJCAI, and KDD, and refereed for top journals including PAMI and JMLR. He and his team received the INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice in 2019 and were selected for the NeurIPS 2018 Best Demo Awards. Tony holds more than 10 US patents in intelligent transportation, supply chain, and recommendation systems.
Title: Ride-hailing Marketplace Optimization: Reinforcement Learning Approaches
Abstract: With the rising prevalence of smartphones in daily life, online ride-hailing platforms, led by companies such as DiDi, Uber, and Lyft, have emerged as a viable solution for providing more timely and personalized transportation service. By connecting potential mobility requests to available drivers, these platforms also allow idle vehicle capacity to be utilized more effectively to meet the growing need for on-demand transportation. In this talk, we will describe our research on order dispatching and vehicle repositioning optimization for ride-hailing. We will first present offline reinforcement learning methods and results from a series of real-world field experiments. We will then discuss simulation-based evaluation and our experience hosting the KDD Cup last year. Finally, we will discuss our latest development of an on-policy framework that unifies order dispatching and vehicle repositioning.
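As a rough illustration of how a learned value function can drive order dispatching (a sketch under assumed inputs; not DiDi's production system): each driver-order pair is scored by the immediate reward plus the discounted value of the order's destination state minus the value of the driver's current state, and a bipartite matching selects the assignment.

```python
# Illustrative value-based dispatching: score driver-order pairs by their
# advantage under a learned state-value function V, then match.
import numpy as np
from scipy.optimize import linear_sum_assignment

def dispatch(driver_states, orders, V, gamma=0.9):
    """driver_states: list of hashable states; orders: list of
    (immediate_reward, destination_state) pairs; V: dict state -> value."""
    score = np.zeros((len(driver_states), len(orders)))
    for i, s in enumerate(driver_states):
        for j, (reward, dest) in enumerate(orders):
            # Advantage of serving this order versus staying in state s.
            score[i, j] = reward + gamma * V.get(dest, 0.0) - V.get(s, 0.0)
    rows, cols = linear_sum_assignment(-score)  # maximize total score
    return [(i, j) for i, j in zip(rows, cols) if score[i, j] > 0]
```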
Dr. Linglong Kong is an associate professor in the Department of Mathematical and Statistical Sciences at the University of Alberta, where he holds a Canada Research Chair in Statistical Learning. He has published more than 50 peer-reviewed manuscripts, including in the top journals AOS, JASA, and JRSSB and the top conferences ICML, ICDM, AAAI, and IJCAI. Currently, Linglong serves as an associate editor of the Journal of the American Statistical Association, the International Journal of Imaging Systems and Technology, and the Canadian Journal of Statistics; as a member of the Board of Directors of the Statistical Society of Canada and of the Western North American Region of The International Biometric Society; and as past program chair of the ASA Statistics in Imaging Section and program chair-elect of the ASA Statistical Computing Section. His research interests include statistical machine learning, high-dimensional data analysis, neuroimaging data analysis, robust statistics, and quantile regression.
Title: Exploring the Robustness of Distributional Reinforcement Learning against Noisy State Observations
Abstract: In real scenarios, the state observations an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training. In this paper, we study the training robustness of distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return. First, we propose a general State-Noisy Markov Decision Process (SNMDP) that incorporates both random and adversarial state observation noises, within which the convergence and contraction of both the expectation-based and the distributional Bellman operators can be derived. Beyond the SNMDP, we further theoretically characterize the impact of more flexible state noises on temporal-difference (TD) learning by establishing more rigorous sufficient conditions for convergence. Moreover, we analyze the sensitivity of the estimated parameters to flexible state noises by leveraging the influence function. Finally, extensive experiments on a suite of games show that distributional RL enjoys better training robustness than its expectation-based counterpart across various state observation noises.
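A toy sketch of the setting studied (illustrative only; not the paper's experimental setup): a tabular quantile-based distributional TD learner whose state observations are randomly corrupted before each update.

```python
# Toy distributional TD learning with random state-observation noise:
# each state's return distribution is represented by a set of quantiles.
import numpy as np

n_states, n_quantiles, gamma, lr = 5, 11, 0.9, 0.05
taus = (np.arange(n_quantiles) + 0.5) / n_quantiles  # quantile midpoints
Z = np.zeros((n_states, n_quantiles))                # per-state quantile estimates
rng = np.random.default_rng(0)

def observe(s, flip_prob=0.1):
    # Random observation noise: occasionally report an arbitrary wrong state.
    return int(rng.integers(n_states)) if rng.random() < flip_prob else s

for _ in range(20_000):
    s = int(rng.integers(n_states))
    s_next = (s + 1) % n_states          # deterministic cycle dynamics
    r = float(s == n_states - 1)         # reward 1 when leaving the last state
    obs, obs_next = observe(s), observe(s_next)
    # Quantile-regression TD: move each quantile toward a sampled target
    # (a subgradient step on the pinball loss).
    target = r + gamma * Z[obs_next, rng.integers(n_quantiles)]
    indicator = (target < Z[obs]).astype(float)
    Z[obs] += lr * (taus - indicator)

print(Z.mean(axis=1))  # expected returns implied by the learned distributions
```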