Meet Our Speakers
S. Kevin Zhou
Dr. S. Kevin Zhou obtained his PhD degree from the University of Maryland, College Park. Currently he is a professor and executive dean of the School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China (USTC), and an adjunct professor at the Institute of Computing Technology, Chinese Academy of Sciences, and the Chinese University of Hong Kong (CUHK), Shenzhen. Prior to this, he was a principal expert and a senior R&D director at Siemens Healthcare Research. Dr. Zhou has published 240+ book chapters and peer-reviewed journal and conference papers, registered 140+ granted patents, written two research monographs, and edited three books. The two most recent books he led as editor are “Deep Learning for Medical Image Analysis” (SK Zhou, H Greenspan, DG Shen, Eds.) and “Handbook of Medical Image Computing and Computer Assisted Intervention” (SK Zhou, D Rueckert, G Fichtinger, Eds.). He has won multiple awards including the R&D 100 Award (the “Oscar of Invention”), Siemens Inventor of the Year, and the UMD ECE Distinguished Alumni Award. He has been a program co-chair for MICCAI 2020, an editorial board member for IEEE Transactions on Medical Imaging and Medical Image Analysis, and an area chair for AAAI, CVPR, ICCV, MICCAI, and NeurIPS. He has been elected as a treasurer and board member of the MICCAI Society, an advisory board member of MONAI (Medical Open Network for AI), and a fellow of AIMBE, IEEE, and NAI (National Academy of Inventors).
Title: Traits and Trends of AI in Medical Imaging
Abstract: Artificial intelligence and deep learning technologies have become prevalent in solving medical imaging tasks. In this talk, we first review the traits that characterize medical images, such as multiple modalities, heterogeneous and isolated data, sparse and noisy labels, and imbalanced samples. We then point out the necessity of a paradigm shift from “small task, big data” to “big task, small data”. Finally, we illustrate the trends of AI technologies in medical imaging and present a multitude of algorithms that attempt to address various aspects of “big task, small data”:
- Annotation-efficient methods that tackle medical image analysis without many labelled instances, including one-shot or label-free inference approaches.
- Universal models that learn “common + specific” feature representations for multi-domain tasks to unleash the potential of “bigger data”, formed by integrating multiple datasets associated with the tasks of interest into a single resource.
- “Deep learning + knowledge modeling” approaches, which combine machine learning with domain knowledge to enable state-of-the-art performance on many tasks in medical image reconstruction, recognition, segmentation, and parsing.
Yaodong Yang
Dr. Yaodong Yang is a machine learning researcher with ten years of working experience in both academia and industry. Currently, he is an assistant professor at Peking University. His research focuses on reinforcement learning and multi-agent systems. He has maintained a track record of more than forty publications at top conferences and journals, along with the Best System Paper Award at CoRL 2020 and the Best Blue-Sky Paper Award at AAMAS 2021. Before joining Peking University, he was an assistant professor at King’s College London. Before KCL, he was a principal research scientist at Huawei U.K., where he headed the multi-agent systems team in London. Before Huawei, he was a senior research manager at AIG, working on AI applications in finance. He holds a Ph.D. degree from University College London, an M.Sc. degree from Imperial College London, and a bachelor’s degree from the University of Science and Technology of China.
Title: Training a Population of Agents
Abstract: Recent advances in multi-agent reinforcement learning have seen the introduction of a new learning paradigm that revolves around population-based training. The idea is to consider the structure of games not at the micro-level of individual actions, but at the meta-level of which agent to train against for any given game or situation. A typical framework for population-based training is the Policy Space Response Oracle (PSRO) method, where, at each iteration, a new RL agent is discovered as the best response to a Nash mixture of agents from the opponent populations. PSRO methods can provably converge to Nash, correlated, and coarse correlated equilibria in N-player games; in particular, they have shown remarkable performance in solving large-scale zero-sum games. In this tutorial, I will introduce the basic idea of PSRO methods, the necessity of using PSRO methods in solving real-world games such as Chess, the recent results on solving N-player games and mean-field games, and how to promote behavioral diversity during PSRO training. Finally, I will introduce a meta-PSRO framework named Neural Auto-Curricula, in which the AI learns to learn a PSRO-like solution algorithm purely from data.
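For readers new to the paradigm, the sketch below outlines one iteration of a PSRO-style loop for a two-player game in Python. It is a minimal illustration in my own notation: the payoff estimation, meta-solver, and best-response oracle are passed in as placeholder callables and are not the speaker's actual implementation.

def psro(init_policy, estimate_payoffs, solve_meta_game, best_response, iterations=10):
    """Schematic Policy Space Response Oracle (PSRO) loop for a two-player game.

    estimate_payoffs(populations) -> empirical payoff tensor between the two populations
    solve_meta_game(payoffs)      -> a mixture (e.g. Nash) over each population
    best_response(player, opponent_population, opponent_mixture) -> new RL policy
    """
    populations = [[init_policy], [init_policy]]        # one growing population per player
    for _ in range(iterations):
        payoffs = estimate_payoffs(populations)         # simulate all pairs of policies
        meta_strategies = solve_meta_game(payoffs)      # meta-game solution per player
        for player in (0, 1):
            opponent = 1 - player
            # Train a new agent as an (approximate) best response to the
            # opponent's meta-strategy mixture, then add it to the population.
            new_policy = best_response(player, populations[opponent], meta_strategies[opponent])
            populations[player].append(new_policy)
    return populations, solve_meta_game(estimate_payoffs(populations))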
Andrea Zanette
Dr. Andrea Zanette is a postdoctoral scholar in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley, working primarily with Martin Wainwright on the foundations of Reinforcement Learning, a subarea of Artificial Intelligence that deals with decision making under uncertainty. Andrea completed his PhD (2017-2021) in the Institute for Computational and Mathematical Engineering (ICME) at Stanford University, advised by Professors Emma Brunskill and Mykel J. Kochenderfer. During his candidacy he also worked with Alessandro Lazaric from Facebook Artificial Intelligence Research and Alekh Agarwal from Microsoft Research. His PhD dissertation proposed algorithms to tackle modern Reinforcement Learning challenges such as exploration, function approximation, adaptivity, and learning from offline data; it was awarded the Gene Golub Doctoral Dissertation Award. Before starting his PhD, Andrea was a master’s student in the same department (2015-2017). He has a background in mechanical engineering, and worked in the civil construction sector and for M3E, developing high-performance linear algebra software. He also spent some time at the von Karman Institute for Fluid Dynamics, a NATO-affiliated international research establishment. More information about him can be found at https://azanette.com.
Title: Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Abstract: Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to several key advantages compared to the state of the art. The algorithm can operate when the Bellman evaluation operator is closed with respect to the action value function of the actor’s policies; this is a more general setting than the low-rank MDP model. Despite the added generality, the procedure is computationally tractable as it involves the solution of a sequence of second-order programs. We prove an upper bound on the suboptimality gap of the policy returned by the procedure that depends on the data coverage of any arbitrary, possibly data dependent comparator policy. The achievable guarantee is complemented with a minimax lower bound that is matching up to logarithmic factors.
Xiaocheng Tang
Dr. Xiaocheng Tang is a senior staff research scientist at DiDi Labs and an engineering manager in the autonomous vehicle team, working on core decision-making problems in autonomous driving and the ride-hailing marketplace.
Since his graduate studies, Dr. Tang has been actively engaged in analyzing and designing practical intelligent algorithms at the intersection of machine learning, optimization, and, most recently, RL and control. His work on the joint optimization of order dispatching and repositioning via reinforcement learning won a Best Demo Award at NeurIPS 2018.
He and his team received the Daniel H. Wagner Prize for Excellence in Operations Research Practice at INFORMS 2019. His work on AutoML in collaboration with UCLA won an Outstanding Paper Award at ICLR 2021.
He has also served on the program committees of conferences and as a reviewer for journals, including NeurIPS, ICML, ICLR, AAAI, and JMLR.
Title: Discussion on Andrea Zanette’s Talk
Lerrel Pinto
Lerrel Pinto is an Assistant Professor of Computer Science at NYU. His research interests focus on machine learning and computer vision for robots. He received a PhD degree from CMU in 2019; prior to that he received an MS degree from CMU in 2016, and a B.Tech in Mechanical Engineering from IIT-Guwahati. His work on large-scale robot learning received the Best Student Paper award at ICRA 2016 and a Best Paper finalist award at IROS 2019. Several of his works have been featured in popular media such as The Wall Street Journal, TechCrunch, MIT Tech Review, Wired, and BuzzFeed among others. His recent work can be found on www.lerrelpinto.com.
Title: Rethinking Representations for Robotics
Abstract: Even with the substantial progress we have seen in robot learning, we are nowhere near general-purpose robots that can operate in the real world we live in. There are two fundamental reasons for this. First, robots need to build concise representations from high-dimensional sensory observations, often without access to explicit sources of supervision. Second, unlike standard supervised learning, they need to solve long-horizon decision-making problems. In this talk, I’ll propose a recipe for general-purpose robot learning that combines ideas of self-supervision for representation learning with ideas in RL, adaptation, and imitation for decision making.
Xinrun Wang
Xinrun Wang is currently a Research Assistant Professor in the School of Computer Science and Engineering (SCSE) at the Nanyang Technological University (NTU), Singapore. He obtained his PhD degree in computer science from NTU in 2020, under the supervision of Associate Professor Bo An. His research interests include algorithmic game theory, reinforcement learning and multi-agent reinforcement learning.
Title: Discussion on Lerrel Pinto’s talk
Bin Dong
Dr. Bin Dong is an associate professor at the Beijing International Center for Mathematical Research at Peking University. He received his B.S. from Peking University in 2003, M.Sc. from the National University of Singapore in 2005, and Ph.D. from the University of California, Los Angeles in 2009. Dr. Dong has made important contributions to mathematical modeling and algorithmic design in image processing and data analysis. In particular, Dr. Dong and his collaborators have established a profound connection between two mathematical branches (PDE-based and wavelet-based approaches) that had been independently developed in the imaging field for nearly 30 years, changing some established assumptions and understandings about these two approaches and broadening their scope of application. These theoretical studies also led to a mathematical understanding of convolutional neural networks (CNNs), where Dr. Dong’s team explored structural similarities between numerical schemes for differential equations and CNNs. Such understanding has led to further developments in modeling and algorithmic design in machine learning and imaging, and has opened up new and exciting applications of machine learning in scientific computing. In 2014, he received the Qiu Shi Outstanding Young Scholar Award from the Hong Kong Qiu Shi Science and Technologies Foundation. He is also an invited speaker at ICM 2022.
Title: Some Applications of Deep Reinforcement Learning to Imaging and Numerical PDEs
Abstract: Deep reinforcement learning (DRL) has become one of the most popular and fastest-developing fields of artificial intelligence. DRL methods are particularly effective for sequential decision-making and for solving problems that are combinatorial in nature. Successful examples include game playing, combinatorial optimization, etc. In recent years, DRL has started to make an impact in computational imaging and scientific computing. In this talk, I will go over our work on designing DRL methods for solving PDEs, denoising images, and devising adaptive scanning strategies for CT imaging.
Siwei Lyu
Siwei Lyu is a SUNY Empire Innovation Professor at the Department of Computer Science and Engineering of the University at Buffalo, State University of New York. Before joining UB, Dr. Lyu was an Assistant Professor from 2008 to 2014, a tenured Associate Professor from 2014 to 2019, and a Full Professor from 2019 to 2020 at the Department of Computer Science, University at Albany, State University of New York. Dr. Lyu received his Ph.D. degree in Computer Science from Dartmouth College in 2005, and his M.S. degree in Computer Science in 2000 and B.S. degree in Information Science in 1997, both from Peking University, China. Dr. Lyu’s research interests include computer vision and machine learning. Dr. Lyu has published over 170 refereed journal and conference papers. He is the recipient of the IEEE Signal Processing Society Best Paper Award (2011), the National Science Foundation CAREER Award (2010), SUNY Albany’s Presidential Award for Excellence in Research and Creative Activities (2017), the SUNY Chancellor’s Award for Excellence in Research and Creative Activities (2018), a Google Faculty Research Award (2019), and the IEEE Region 1 Technological Innovation (Academic) Award (2021). Dr. Lyu is a Fellow of IEEE.
Title: Image-to-Image Transform (I2IT) with Deep Reinforcement Learning
Abstract: Many computer vision problems, such as registration, in-painting, generation, and style transfer, can be defined as learning an image-to-image transform (I2IT). Currently, the most effective I2IT solutions are based on a one-step framework that generates images in a single run of a deep learning (DL) model. Directly learning I2IT with these DL models is challenging, due to the abundance of local minima and poor generalization caused by overfitting. In addition, the learned models often have intrinsically high complexity, and the optimal parameters (e.g., stage number and scale factor) have to be determined in a subjective and ad hoc manner. We explore solving I2IT problems by leveraging recent advances in deep reinforcement learning (DRL). The key idea is to decompose the monolithic learning process into small steps handled by a lighter-weight CNN, with the aim of progressively improving the quality of the model. We propose a new DRL framework for I2IT problems, called soft actor-executor-critic (SAEC), to handle high-dimensional continuous state and action spaces. We demonstrate experimentally the effectiveness and robustness of this framework on different I2IT tasks: neural style transfer, face in-painting, image synthesis, and deformable image registration.
Xin Wang
Dr. Xin Wang is currently a Senior Machine Learning Scientist at Keya Medical, Seattle, USA. He received his Ph.D. degree in Computer Science from the University at Albany, State University of New York in 2015. His research interests are in artificial intelligence, machine learning, medical image computing, computer vision, and media forensics. He is a senior member of IEEE.
Title: Discussion on Bin Dong’s talk.
Chengchun Shi
Dr. Chengchun Shi is an Assistant Professor in Data Science at the London School of Economics and Political Science. At this early stage of his career, he has over 10 first-author peer-reviewed articles accepted by highly ranked statistical journals: AOS, JRSSB, and JASA. He also has papers published at highly ranked machine learning conferences: ICML and NeurIPS. Starting from this year, he is serving as an associate editor of JRSSB and the Journal of Nonparametric Statistics. His research focuses on developing statistical learning methods for reinforcement learning and the analysis of complex data, with applications to healthcare, ridesharing, and neuroimaging. He was the recipient of the Royal Statistical Society Research Prize in 2021. He also received IMS travel awards in two consecutive years.
Title: Statistical inference in reinforcement learning
Abstract: Reinforcement learning (RL) is concerned with how intelligent agents take actions in a given environment to maximize the cumulative reward they receive. In healthcare, applying RL algorithms could assist patients in improving their health status. In ride-sharing platforms, applying RL algorithms could increase drivers’ income and customer satisfaction. RL has arguably been one of the most vibrant research frontiers in machine learning over the last few years. Nevertheless, statistics as a field, as opposed to computer science, has only recently begun to engage with reinforcement learning in depth and in breadth. In today’s talk, I will discuss some of my recent work on developing statistical inferential tools for reinforcement learning, with applications to mobile health and ridesharing companies. The talk will cover several different papers published in highly ranked statistical journals (JASA & JRSSB) and top machine learning conferences (ICML).
Zhengling Qi
Zhengling Qi is an assistant professor in the Department of Decision Sciences, George Washington University. He received his PhD from the Department of Statistics and Operations Research at the University of North Carolina, Chapel Hill. Before UNC, he studied at the University of Michigan, Ann Arbor, and Fudan University, Shanghai. Zhengling’s research interests include statistical reinforcement learning, non-convex optimization, and causal inference. So far he has published several papers in highly ranked journals such as JASA, Mathematics of Operations Research, and the SIAM Journal on Optimization.
Title: Discussion on Chengchun Shi’s talk.
Xun Huan
Dr. Xun Huan is an Assistant Professor of Mechanical Engineering at the University of Michigan, where he leads the Uncertainty Quantification and Scientific Machine Learning Group and is affiliated with the Michigan Institute for Computational Discovery and Engineering (MICDE), the Michigan Institute for Data Science (MIDAS), Precision Health, and Applied Physics. Dr. Huan received a Ph.D. in Computational Science and Engineering from MIT and was a postdoctoral researcher at MIT and Sandia National Laboratories. His research interests revolve around methods for optimal experimental design, Bayesian analysis, inverse reinforcement learning, and physics-aware data-driven modeling.
Title: Bayesian Sequential Optimal Experimental Design for Nonlinear Systems via Policy Gradient
Abstract: Experiments are indispensable for learning and developing models in engineering and science. A careful design of these often-expensive data-acquisition opportunities can be immensely beneficial. Simulation-based optimal experimental design, while leveraging a predictive model, provides a framework to systematically quantify and maximize the value of experiments. Here we focus on optimally designing a finite number of sequential experiments. We formulate this sequential optimal experimental design (sOED) problem as a finite-horizon partially observable Markov decision process (POMDP) in a Bayesian setting and with information-theoretic utilities, where the policy is both adaptive to newly collected data and forward-looking to future consequences. We parameterize the policy and value functions by neural networks thus adopting an actor-critic approach, and derive and prove the policy gradient expression. The policy is then improved using gradient estimates produced from simulated design and observation sequences. Our approach is built to handle continuous random variables, non-Gaussian posteriors, and expensive nonlinear forward models. The method is validated on a linear-Gaussian benchmark, and its advantages over batch and greedy designs are demonstrated through a contaminant source inversion problem in a convection-diffusion field. This is joint work with Wanggang Shen.
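As a rough mathematical sketch of the formulation described above (notation mine, not necessarily the speaker's): writing $I_k = (d_0, y_0, \dots, d_{k-1}, y_{k-1})$ for the history of designs and observations, the policy $\pi$ maps $I_k$ to the next design $d_k$, and a common information-theoretic objective over a horizon of $N$ experiments is

$$ \max_{\pi}\ U(\pi) \;=\; \mathbb{E}_{\theta,\, y_{0:N-1} \mid \pi}\!\left[\, D_{\mathrm{KL}}\!\big( p(\theta \mid I_N)\,\big\|\, p(\theta) \big) \,\right], $$

i.e., the expected information gain from the prior to the final posterior. The actor-critic approach parameterizes $\pi$ (and the corresponding value function) with neural networks and ascends a policy-gradient estimate of $\nabla U$ computed from simulated design-observation sequences.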
Shikai Luo
Dr. Shikai Luo graduated from the School of Mathematics of Nankai University in 2011. In 2016, he earned his Ph.D. degree from North Carolina State University. His research focuses on causal inference and reinforcement learning. He has published many papers in the Journal of the American Statistical Association, Journal of Machine Learning Research, Electronic Journal of Statistics, Statistical Methodology, NeuroImage, CIKM, etc. He has successively worked at Quantlab Financial, DiDi, Tencent, and ByteDance. From 2016 to 2018, he used deep reinforcement learning at Quantlab Financial to develop high-, middle-, and low-frequency trading strategies. From 2018 to 2020, he worked on using causal inference and reinforcement learning to optimize and evaluate core operating strategies. Dr. Luo joined ByteDance in 2021, where he is mainly responsible for Toutiao’s user growth algorithms.
Title: Discussion on Xun Huan’s talk.
Jiayu Zhou
Dr. Jiayu Zhou is currently an Associate Professor in the Department of Computer Science and Engineering at Michigan State University. He received his Ph.D. degree in computer science from Arizona State University in 2014. Dr. Zhou has broad research interests in large-scale machine learning, data mining, and biomedical informatics, with a focus on transfer and multi-task learning. His research has been funded by the National Science Foundation, the National Institutes of Health, and the Office of Naval Research, and he has published more than 100 peer-reviewed journal and conference papers in data mining and machine learning. Dr. Zhou is a recipient of the National Science Foundation CAREER Award (2018). His papers received the Best Student Paper Award at the 2014 IEEE International Conference on Data Mining (ICDM), the Best Student Paper Award at the 2016 International Symposium on Biomedical Imaging (ISBI), and the Best Paper Award at the 2016 IEEE International Conference on Big Data (BigData).
Title: Advancements in Artificial Intelligence for Neurodegenerative Diseases
Abstract: There are over 9.9 million new cases of dementia each year worldwide, implying one new case every 3.2 seconds. The growing aging population will further increase this number. The learning tasks involved in modeling neurodegenerative diseases face unique challenges from both data sparseness and complicated non-linear interactions between feature variables and target responses. In this talk, I will present my recent research on artificial intelligence approaches for decoding the mechanisms of aging and neurodegenerative diseases using insights jointly derived from data and knowledge. For learning tasks from imaging markers, we developed the subspace network, an efficient deep modeling approach for non-linear multi-task learning from small datasets. Each layer of the subspace network performs multi-task learning to improve upon the predictions from the previous layer by sketching a low-dimensional subspace to perform knowledge transfer among learning tasks. Empirical results demonstrate that the subspace network quickly picks up the correct parameter subspaces and outperforms the state of the art in predicting neurodegenerative clinical scores using information in brain imaging. For learning with language markers, we introduce a novel reinforcement learning framework to train a dialogue agent that conducts efficient conversations with elderly subjects and identifies early dementia. The agent is trained to sketch a disease-specific lexical probability distribution, and thus to converse in a way that maximizes diagnosis accuracy and minimizes the number of conversation turns. The results show that, while using only a few turns of conversation, our framework can significantly outperform state-of-the-art supervised learning approaches.
Fei Wang
Dr. Fei Wang is an Associate Professor in the Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, Cornell University. He received his PhD degree from the Department of Automation, Tsinghua University in 2008, and is the recipient of the national excellent doctoral thesis award. Dr. Wang’s current major research interest is AI for health data science. He has published close to 300 papers at the top venues of related areas, such as ICML, KDD, NIPS, CVPR, AAAI, IJCAI, JAMA Internal Medicine, Annals of Internal Medicine, Lancet Digital Health, Science Translational Medicine, etc. His papers have received over 18,000 citations so far, with an H-index of 66. His (or his students’) papers have won eight best paper (or nomination) awards at top international conferences on data mining and medical informatics. His team won the championship of the NIPS/Kaggle Challenge on Classification of Clinically Actionable Genetic Mutations in 2017 and the Parkinson’s Progression Markers Initiative data challenge organized by the Michael J. Fox Foundation in 2016. Dr. Wang is the recipient of the NSF CAREER Award in 2018, as well as the inaugural research leadership award at the IEEE International Conference on Health Informatics (ICHI) 2019. Dr. Wang is the past chair of the Knowledge Discovery and Data Mining working group in the American Medical Informatics Association (AMIA). Dr. Wang is a fellow of AMIA and a distinguished member of ACM.
Title: Discussion on Jiayu Zhou’s Talk
Jim Dai
Jim Dai is the Leon C. Welch Professor of Engineering in the School of Operations Research and Information Engineering at Cornell University. He is also the Dean of the School of Data Science at the Chinese University of Hong Kong, Shenzhen. From 1990 to 2012, he was on the faculty of the Georgia Institute of Technology. Jim Dai received his BA and MA in Mathematics from Nanjing University and his Ph.D. in Mathematics from Stanford University. His research area is applied probability, focusing on stochastic processing networks and, more recently, reinforcement learning. His research awards include the Erlang Prize and the ACM SIGMETRICS Achievement Award. He served as the Editor-in-Chief of Mathematics of Operations Research from 2012 to 2019.
Title: Scalable Deep Reinforcement Learning for Ride-Hailing
Abstract: Ride-hailing services, such as Didi Chuxing, Lyft, and Uber, arrange thousands of cars to meet ride requests throughout the day. We consider a Markov decision process (MDP) model of a ride-hailing service system, framing it as a reinforcement learning (RL) problem. The simultaneous control of many agents (cars) presents a challenge for the MDP optimization because the action space grows exponentially with the number of cars. We propose a special decomposition of the MDP actions that sequentially assigns tasks to the drivers. The new action structure resolves the scalability problem and enables the use of deep RL algorithms for control policy optimization. We demonstrate the benefit of our proposed decomposition with numerical experiments in a ride-hailing model motivated by Didi Chuxing data. This is joint work with Jiekun Feng and Mark Gluzman.
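To make the action decomposition concrete, here is a minimal sketch in Python (my own illustrative code, not the authors' implementation, with feasible_tasks as a hypothetical helper): rather than choosing a joint assignment for the whole fleet at once, drivers are processed one at a time, so every atomic decision lives in a small per-driver action space.

def assign_tasks_sequentially(env_state, drivers, per_driver_policy):
    """Schematic decomposition of a joint dispatch/reposition action.

    Instead of searching the joint action space, whose size grows
    exponentially with the number of cars, assign a task to one driver
    at a time, conditioning on the assignments already made in this epoch.
    """
    assignments = {}
    for driver in drivers:                                            # fixed ordering over drivers
        candidates = env_state.feasible_tasks(driver, assignments)    # hypothetical helper
        task = per_driver_policy(env_state, driver, assignments, candidates)
        assignments[driver] = task                                    # later drivers see this choice
    return assignments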
Linglong Kong
Dr. Linglong Kong is an associate professor in the Department of Mathematical and Statistical Sciences at the University of Alberta. He is a Canada Research Chair in Statistical Learning. He has published more than 50 peer-reviewed manuscripts, including in the top journals AOS, JASA, and JRSSB and the top conferences NeurIPS, ICML, ICDM, AAAI, and IJCAI. Currently, he is serving as an associate editor of the Journal of the American Statistical Association, the International Journal of Imaging Systems and Technology, and the Canadian Journal of Statistics, a guest editor of Frontiers in Neuroscience, a member of the Board of Directors of the Statistical Society of Canada and of the Western North American Region of the International Biometric Society, and the ASA Statistical Computing Section program chair. Linglong served as a guest editor of the Canadian Journal of Statistics and as the ASA Statistical Imaging Section program chair. His research interests include functional and neuroimaging data analysis, statistical machine learning, robust statistics and quantile regression, reinforcement learning, and artificial intelligence for smart health.
Title: Damped Anderson Mixing for Deep Reinforcement Learning and Applications
Abstract: Deep reinforcement learning (RL) has been widely used in a variety of challenging tasks, from game playing to robot navigation. However, sample inefficiency and slow convergence, i.e., the required number of interactions with the environment and the training time being impractically high, remain challenging problems in RL. To address these issues, we propose a general acceleration method for deep RL algorithms built on Anderson mixing, an effective approach to accelerating the iterates of fixed-point problems. Specifically, we provide deeper insights into acceleration schemes for policy iteration by establishing a connection between Anderson mixing and quasi-Newton methods and by proving that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. We further propose a stabilization strategy by introducing a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator that allow both faster convergence and more stable behavior. The effectiveness of our proposed method is evaluated on a variety of Atari games. Experimental results show that our proposed method enhances the convergence, stability, and performance of state-of-the-art deep RL algorithms.
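As background for readers unfamiliar with Anderson mixing, the sketch below shows a generic damped Anderson acceleration of a fixed-point iteration x = g(x) in NumPy. It is a textbook-style illustration of the mixing idea only, not the paper's algorithm, which embeds the scheme into deep RL together with the regularization and MellowMax operator described above.

import numpy as np

def damped_anderson_fixed_point(g, x0, m=5, beta=0.5, iters=100, tol=1e-8):
    """Schematic damped Anderson mixing for the fixed-point problem x = g(x).

    g    : the fixed-point map (e.g. an approximate Bellman backup)
    m    : memory size (number of past iterates mixed together)
    beta : damping factor in (0, 1]; beta = 1 recovers undamped Anderson mixing
    """
    xs = [np.asarray(x0, dtype=float)]
    gs = [np.asarray(g(x0), dtype=float)]
    for _ in range(iters):
        hist = min(len(xs), m)
        X = np.stack(xs[-hist:], axis=1)            # past iterates as columns
        G = np.stack(gs[-hist:], axis=1)            # their images under g
        F = G - X                                   # residuals f_i = g(x_i) - x_i
        if hist == 1:
            alpha = np.array([1.0])
        else:
            # Minimize ||F @ alpha|| subject to sum(alpha) = 1 by eliminating
            # the last coefficient: alpha = e_last + C @ gamma.
            C = np.vstack([np.eye(hist - 1), -np.ones((1, hist - 1))])
            gamma, *_ = np.linalg.lstsq(F @ C, -F[:, -1], rcond=None)
            alpha = np.append(gamma, 1.0 - gamma.sum())
        x_new = (1.0 - beta) * (X @ alpha) + beta * (G @ alpha)   # damped mixture
        if np.linalg.norm(x_new - xs[-1]) < tol:
            return x_new
        xs.append(x_new)
        gs.append(np.asarray(g(x_new), dtype=float))
    return xs[-1]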
Statistical Learning Methods in Modern AI Conference: RL Session Speakers
Michael R. Kosorok
Professor Michael R. Kosorok is the W.R. Kenan, Jr. Distinguished Professor, Department of Biostatistics, and Professor, Department of Statistics and Operations Research, at the University of North Carolina at Chapel Hill. Michael’s expertise is in biostatistics, data science, machine learning, artificial intelligence, and precision health. He is an expert on the theoretical properties underlying data analysis methods, especially in the areas of empirical processes and semiparametric inference, and is the author of a book on the topic. He also has expertise in the application of biostatistics and data science to human health research, including cancer, cystic fibrosis, diabetes, and other health areas. He has pioneered machine learning and data mining tools for precision health and has co-edited a book (with Erica E. M. Moodie) on the topic. He is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the American Association for the Advancement of Science.
Title: Recent Machine Learning Developments for Multiple Outcomes in Precision Health
Abstract: Precision health is the science of data-driven decision support for improving health at the individual and population levels. It includes precision medicine and precision public health and encompasses all health-related challenges that can benefit from a precision-operations approach. This framework strives to develop study design, data collection, and analysis tools to discover empirically valid solutions that optimize outcomes in both the short and long term, including settings with multiple competing outcomes. In this presentation, we will discuss two approaches to doing this. In the first approach, we utilize patient preferences about the relative importance of the two outcomes. In the second approach, we utilize expert opinion when the experts may be subject to error. The second example involves an interesting new type of inverse reinforcement learning. The ideas will be illustrated with applications in mental health.
Susan Murphy
Susan Murphy is Professor of Statistics at Harvard University, Radcliffe Alumnae Professor at the Radcliffe Institute, Harvard University, and Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences. Her lab works on clinical trial designs and online learning algorithms for sequential decision making, in particular in the area of digital health. She developed the micro-randomized trial for use in constructing mobile health interventions, which is in use across a broad range of health-related areas. She is a 2013 MacArthur Fellow, a member of the National Academy of Sciences and the National Academy of Medicine, both of the US National Academies. She is a Past-President of IMS and of the Bernoulli Society and a former editor of the Annals of Statistics. She is a prior recipient of the R.A. Fisher Award from COPSS and was awarded the Guy Medal in Silver from the RSS.
Title: We used RL; but did it work?
Abstract: Reinforcement learning provides an attractive suite of online learning methods for personalizing interventions in digital health. However, after a reinforcement learning algorithm has been run in a clinical study, how do we assess whether personalization occurred? We might find users for whom it appears that the algorithm has indeed learned in which contexts the user is more responsive to a particular intervention. But could this have happened completely by chance? We discuss some first approaches to addressing these questions.
Rui Song
Rui Song is a professor in the Department of Statistics at North Carolina State University. Her current research interests include machine learning, causal inference, precision health, and financial econometrics. Her research has been supported, with her as sole principal investigator, by the National Science Foundation (NSF).
Title: Statistical Inference for Online Decision Making via Stochastic Gradient Descent
Abstract: Online decision making aims to learn the optimal decision rule by making personalized decisions and updating the decision rule recursively. It has become easier than before with the help of big data, but new challenges come along with it. Since the decision rule should be updated once per step, an offline update that uses all the historical data is inefficient in computation and storage. To this end, we propose a completely online algorithm that can make decisions and update the decision rule online via stochastic gradient descent. It is not only efficient but also supports all kinds of parametric reward models. Focusing on the statistical inference of online decision making, we establish the asymptotic normality of the parameter estimator produced by our algorithm and of the online inverse-probability-weighted value estimator we use to estimate the optimal value. Online plug-in estimators for the variances of the parameter and value estimators are also provided and shown to be consistent, so that interval estimation and hypothesis testing are possible using our method. The proposed algorithm and theoretical results are tested by simulations and a real data application to news article recommendation.
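To illustrate the "decide, observe, update once per step, store nothing" idea, here is a toy Python sketch; it is my own simplification, assuming a linear reward model and epsilon-greedy exploration purely for illustration, and it omits the inference machinery (plug-in variance estimators, value estimation) that the talk is actually about.

import numpy as np

def online_decision_making(contexts, get_reward, n_actions=2, lr=0.05, eps=0.1, seed=0):
    """Toy online decision making with one SGD update per individual.

    contexts  : list of feature vectors x_t (illustrative linear reward model only)
    get_reward: callable (x, a) -> observed reward for individual x under action a
    """
    rng = np.random.default_rng(seed)
    d = len(contexts[0])
    theta = np.zeros((n_actions, d))               # one parameter vector per action
    for x in contexts:
        x = np.asarray(x, dtype=float)
        if rng.random() < eps:                     # forced exploration
            a = rng.integers(n_actions)
        else:                                      # act greedily w.r.t. current estimate
            a = int(np.argmax(theta @ x))
        r = get_reward(x, a)
        grad = (theta[a] @ x - r) * x              # squared-loss gradient for the chosen arm
        theta[a] -= lr * grad                      # single SGD step; no historical data stored
    return theta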
Tony Qin
Tony Qin is a Principal Research Scientist and Director of the Decision Intelligence group at DiDi AI Labs, working on core problems in ridesharing marketplace optimization. Prior to DiDi, he was a research scientist in supply chain and inventory optimization at Walmart Global E-commerce. Tony received his Ph.D. in Operations Research from Columbia University. His research interests span optimization and machine learning, with a particular focus on reinforcement learning and its applications in operational optimization, digital marketing, and smart transportation. He has published in top-tier conferences and journals in machine learning and optimization, served on the program committees of NeurIPS, ICML, AAAI, IJCAI, and KDD, and been a referee for top journals including PAMI and JMLR. He and his team received the INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice in 2019 and were selected for the NeurIPS 2018 Best Demo Awards. Tony holds more than 10 US patents in intelligent transportation, supply chain, and recommendation systems.
Title: Ride-hailing Marketplace Optimization: Reinforcement Learning Approaches
Abstract: With the rising prevalence of smart mobile phones in our daily life, online ride-hailing platforms have emerged as a viable solution for providing more timely and personalized transportation service, led by companies such as DiDi, Uber, and Lyft. These platforms also allow idle vehicle capacity to be utilized more effectively to meet the growing need for on-demand transportation, by connecting potential mobility requests to available drivers. In this talk, we will describe our research on order dispatching and vehicle repositioning optimization for ride-hailing. We will first show offline reinforcement learning methods and results from a series of real-world field experiments. We will then talk about simulation evaluation and our experience in hosting the KDD Cup last year. Finally, we will discuss our latest development of an on-policy framework that unifies order dispatching and vehicle repositioning.
Linglong Kong
Dr. Linglong Kong is an associate professor in the Department of Mathematical and Statistical Sciences at the University of Alberta. He is a Canada Research Chair in Statistical Learning. He has published more than 50 peer-reviewed manuscripts, including in the top journals AOS, JASA, and JRSSB and the top conferences ICML, ICDM, AAAI, and IJCAI. Currently, Linglong is serving as an associate editor of the Journal of the American Statistical Association, the International Journal of Imaging Systems and Technology, and the Canadian Journal of Statistics, a member of the Board of Directors of the Statistical Society of Canada and of the Western North American Region of the International Biometric Society, the past ASA Statistical Imaging Section program chair, and the ASA Statistical Computing Section program chair-elect. His research interests include statistical machine learning, high-dimensional data analysis, neuroimaging data analysis, robust statistics, and quantile regression.
Title: Exploring the Robustness of Distributional Reinforcement Learning against Noisy State Observations
Abstract: In real scenarios, the state observations that an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training. In this paper, we study the training robustness of distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return. Firstly, we propose a general State-Noisy Markov Decision Process (SNMDP) to incorporate both random and adversarial state observation noise, within which the convergence and contraction of both the expectation-based and the distributional Bellman operators can be derived. Beyond the State-Noisy MDP, we further theoretically characterize the impact of more flexible state noise on Temporal-Difference (TD) learning by establishing more rigorous sufficient conditions for convergence. Moreover, we analyze the sensitivity to the estimated parameters of flexible state noise by leveraging the influence function. Finally, extensive experiments on a suite of games show that distributional RL enjoys better training robustness than its expectation-based counterpart across various state observation noises.
Houssam Nassif
Houssam Nassif is a Principal Applied Scientist at Amazon, where he established and leads Amazon’s adaptive testing framework, researching, deploying, and evangelizing bandits, with forays into reinforcement learning, causality, and diversity. Houssam started his career as a wet-lab biologist before switching to computer science, earning his PhD in Artificial Intelligence from the University of Wisconsin–Madison. His early research spans biomedical informatics, statistical relational learning, and uplift modeling. Since joining Amazon in 2013, Houssam has been passionate about adaptive experimentation. He helped launch 27 business products across Amazon, Google, and Cisco, which generated $1.5 billion in incremental yearly revenue. Houssam has published over 25 peer-reviewed papers in leading ML and biomedical informatics journals and conferences, and organized AISTATS’15. His work has been recognized with four paper awards, including from RecSys and KDD.
Title: Solving Inverse Reinforcement Learning, Bootstrapping Bandits
Abstract: This talk discusses three different ways we leveraged reward signals to inform recommendation. In Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions (ICML’20), we use deep energy-based policies to recover the true reward function in an inverse reinforcement learning setting. We uniquely identify the reward function by assuming the existence of an anchor action with known reward, for example a do-nothing action with zero reward. In Decoupling Learning Rates Using Empirical Bayes (under review, arXiv), we devise an empirical Bayes formulation that extracts an unbiased prior in hindsight from an experiment’s early reward signals. We apply this empirical prior to warm-start bandit recommendations and speed up convergence. In Seeker: Real-Time Interactive Search (KDD’19), we introduce a recommender system that adaptively refines search rankings in real time through user interactions in the form of likes and dislikes. We extend Boltzmann bandit exploration to adapt to the interactively changing embedding space and to factor in the uncertainty of the reward estimates.
Zhaoran Wang
Zhaoran Wang is an assistant professor at Northwestern University, working at the interface of machine learning, statistics, and optimization. He is the recipient of the AISTATS (Artificial Intelligence and Statistics Conference) notable paper award, ASA (American Statistical Association) best student paper in statistical learning and data mining, INFORMS (Institute for Operations Research and the Management Sciences) best student paper finalist in data mining, Microsoft Ph.D. Fellowship, Simons-Berkeley/J.P. Morgan AI Research Fellowship, Amazon Machine Learning Research Award, and NSF CAREER Award.
Title: Is Pessimism Provably Efficient for Offline RL?
Abstract: Coupled with powerful function approximators such as deep neural networks, reinforcement learning (RL) achieves tremendous empirical successes. However, its theoretical understanding lags behind. In particular, it remains unclear how to provably attain the optimal policy with a finite regret or sample complexity. In the offline setting, we aim to learn the optimal policy based on a dataset collected a priori. Due to the lack of active interaction with the environment, we suffer from insufficient coverage of the dataset. To maximally exploit the dataset, we propose a pessimistic least-squares value iteration algorithm, which achieves a minimax-optimal sample complexity.
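One schematic way to write the pessimism principle behind such an algorithm (my notation; the exact construction in the talk may differ) is to subtract an uncertainty quantifier $\Gamma_h$ from the fitted Bellman target at every step $h$ of the value iteration:

$$ \hat{Q}_h(s,a) \;=\; \min\Big\{ \big(\widehat{\mathbb{B}}_h \hat{V}_{h+1}\big)(s,a) \;-\; \Gamma_h(s,a),\; H-h+1 \Big\}^{+}, \qquad \hat{V}_h(s) \;=\; \max_{a}\, \hat{Q}_h(s,a), $$

where $\widehat{\mathbb{B}}_h$ is the least-squares estimate of the Bellman operator computed from the offline data and $\Gamma_h$ upper-bounds its estimation error, so the learned policy avoids actions whose values the dataset cannot certify.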
Zhuoran Yang
Zhuoran Yang is a final-year Ph.D. student in the Department of Operations Research and Financial Engineering at Princeton University, advised by Professor Jianqing Fan and Professor Han Liu. Before attending Princeton, he obtained a Bachelor of Mathematics degree from Tsinghua University. His research interests lie at the interface of machine learning, statistics, and optimization. The primary goal of his research is to design a new generation of machine learning algorithms for large-scale and multi-agent decision-making problems, with both statistical and computational guarantees. He is also interested in the application of learning-based decision-making algorithms to real-world problems arising in robotics, personalized medicine, and computational social science.
Title: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Abstract: The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural networks, and indeed there have been many empirical successes that have exploited such combinations in large-scale applications. There are profound challenges, however, in developing a theory to support this enterprise, most notably the need to take into consideration the exploration-exploitation tradeoff at the core of RL in conjunction with the computational and statistical tradeoffs that arise in modern function-approximation-based learning systems. We approach these challenges by studying an optimistic modification of the least-squares value iteration algorithm, in the context of the action-value function represented by a kernel function or an overparameterized neural network. We establish both polynomial runtime complexity and polynomial sample complexity for this algorithm, without additional assumptions on the data-generating model. In particular, we prove that the algorithm incurs a sublinear regret which is independent of the number of states, a result which exhibits clearly the benefit of function approximation in RL.
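In contrast with the pessimistic offline construction sketched earlier, a hedged sketch of the optimistic modification (again in my own notation, not necessarily the talk's exact form): the fitted action-value is inflated by an exploration bonus $b_h$ derived from the kernel or neural-tangent feature representation,

$$ \hat{Q}_h(s,a) \;=\; \min\Big\{ \big(\widehat{\mathbb{B}}_h \hat{V}_{h+1}\big)(s,a) \;+\; b_h(s,a),\; H \Big\}, $$

so that poorly explored state-action pairs appear attractive and get visited; bounding the cumulative bonuses is what yields a sublinear regret that is independent of the number of states.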
Guanjie Zheng
Guanjie Zheng is an assistant professor at the John Hopcroft Center, Shanghai Jiao Tong University. His research interests lie in reinforcement learning (RL) and spatio-temporal data mining. His recent work focuses on how to learn optimal strategies for city-level traffic coordination from multi-modal data. He has published more than 20 papers at top-tier conferences such as KDD, WWW, AAAI, ICDE, and CIKM.
Title: Improving Urban Traffic Signal Control via Reinforcement Learning
Abstract: Increasingly available city data and advanced learning techniques have empowered people to improve the efficiency of our city functions. Among these, improving urban transportation efficiency is one of the most prominent topics. Recent studies have proposed using reinforcement learning (RL) for traffic signal control. Different from traditional transportation approaches, which rely heavily on prior knowledge, RL can learn directly from feedback. On the other hand, without a careful model design, existing RL methods typically take a long time to converge, and the learned models may not be able to adapt to dynamic traffic scenarios. In this talk, we will cover three essential aspects of using reinforcement learning to attack traffic signal control problems: (1) a typical solution framework; (2) state and reward design with connections to transportation theory; (3) communication and cooperation among multiple intersections. These considerations will help us build an effective and scalable reinforcement learning algorithm for city-level urban traffic signal control.
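As a deliberately simplified illustration of the state and reward design question in point (2), the Python sketch below uses per-lane queue lengths plus the current phase as the state and the negative total queue length as the reward; this is a common choice in the literature, not necessarily the exact design advocated in the talk, and the incoming_lanes / queue_length / current_phase API is hypothetical.

def observe_state(intersection):
    """State: queue length of each incoming lane plus the current signal phase."""
    queues = [lane.queue_length() for lane in intersection.incoming_lanes]  # hypothetical API
    return queues + [intersection.current_phase]

def compute_reward(intersection):
    """Reward: negative total queue length, so clearing waiting vehicles is rewarded."""
    return -sum(lane.queue_length() for lane in intersection.incoming_lanes)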
Keith Ross
Dr. Keith Ross has been the Dean of Engineering and Computer Science at NYU Shanghai since 2013. Previously he was a professor at NYU Tandon/Poly (10 years), University of Pennsylvania (13 years), and Eurecom Institute in France (5 years). He received a Ph.D. in Computer and Control Engineering from The University of Michigan. He is an ACM Fellow and an IEEE Fellow. His current research interests are in deep and tabular reinforcement learning. He has also worked in Internet privacy, peer-to-peer networking, Internet measurement, stochastic modeling of computer networks, queuing theory, and Markov decision processes. He is the co-author of the most popular textbook on computer networking. At NYU Shanghai he has been teaching Machine Learning, Reinforcement Learning, and Introduction to Computer Programming.
Title: Recent Advances in Sample Efficient DRL
Abstract: The performance of a DRL algorithm can be measured along many dimensions including: asymptotic performance; sample efficiency; computational efficiency; and simplicity and elegance. In this talk we will discuss two recent research projects in DRL algorithmic design. The first project is a new algorithm for on-policy DRL with safety constraints (spotlight paper at NeurIPS 2020); the second project is a highly sample-efficient off-policy DRL algorithm for environments with continuous action spaces (conference paper at ICLR 2021).
Mengdi Wang
Mengdi Wang is an associate professor in the Department of Electrical Engineering and the Center for Statistics and Machine Learning at Princeton University. She is also affiliated with the Department of Computer Science and is a visiting research scientist at DeepMind. Her research focuses on data-driven stochastic optimization and applications in machine learning and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF CAREER Award in 2017, the Google Faculty Award in 2017, and the MIT Tech Review 35-Under-35 Innovation Award (China region) in 2018. She serves as an associate editor for Operations Research and Mathematics of Operations Research, as an area chair for ICML, NeurIPS, and AISTATS, and is on the editorial board of the Journal of Machine Learning Research. Her research is supported by NSF, AFOSR, NIH, ONR, Google, Microsoft C3.ai DTI, and FinUP.
Title: Compressive state representation learning towards small-data RL applications
Abstract: In this talk we survey recent advances on the statistical efficiency and regret of reinforcement learning (RL) when good state representations are available. Motivated by the RL theory, we discuss what good state representations for RL should be and how to find compact state embeddings from high-dimensional Markov state trajectories. In the spirit of diffusion maps for dynamical systems, we propose an efficient method for learning a low-dimensional state embedding that captures the process’s dynamics. The state embedding can be used to cluster states into metastable sets, predict future dynamics, and enable generalizable downstream machine learning and reinforcement learning tasks. We demonstrate applications of the approach in games, clinical pathway optimization, single-cell biology, and the identification of gene markers for drug discovery.
Wenbin Lu
Dr. Wenbin Lu is Professor of Statistics at North Carolina State University. He obtained his Ph.D. from the Department of Statistics at Columbia University in 2003. His research interests include biostatistics, high-dimensional data analysis, statistical and machine learning methods for precision medicine, and network data analysis. He has published more than 100 papers in a variety of statistical journals, including Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society (Series B), Annals of Statistics, and Journal of Machine Learning Research. His research is partly funded by several grants from the National Institutes of Health. He is an Associate Editor for Biostatistics, Biometrics, and Statistica Sinica, and a fellow of the American Statistical Association.
Title: Jump Q-Learning for Optimal Interval-Valued Treatment Decision Rule
Abstract: An individualized decision rule (IDR) is a decision function that assigns each individual a given treatment based on his or her observed characteristics. Most of the existing work in the literature considers settings with binary or finitely many treatment options. In this work, we focus on the continuous treatment setting and propose jump Q-learning to develop an individualized interval-valued decision rule (I2DR) that maximizes the expected outcome. Unlike IDRs that recommend a single treatment, the proposed I2DR yields an interval of treatment options for each individual, making it more flexible to implement in practice. To derive an optimal I2DR, our jump Q-learning method estimates the conditional mean of the response given the treatment and the covariates (the Q-function) via jump penalized regression, and derives the corresponding optimal I2DR based on the estimated Q-function. The regressor is allowed to be either linear, for clear interpretation, or a deep neural network, to model complex treatment-covariate interactions. To implement jump Q-learning, we develop a searching algorithm based on dynamic programming that efficiently computes the Q-function. Statistical properties of the resulting I2DR are established when the Q-function is either a piecewise or a continuous function over the treatment space. We further develop a procedure to infer the mean outcome under the estimated optimal policy. Extensive simulations and a real data application to a warfarin study are conducted to demonstrate the empirical validity of the proposed I2DR.
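A schematic way to write the jump penalized regression step (my notation; the exact estimator and penalty in the work may differ): over candidate partitions $\mathcal{P}$ of the treatment range into intervals, one minimizes

$$ \min_{\mathcal{P},\,\{q_I\}}\ \sum_{I \in \mathcal{P}} \sum_{i=1}^{n} \big(Y_i - q_I(X_i)\big)^2\, \mathbb{1}\{A_i \in I\} \;+\; \gamma_n\, |\mathcal{P}|, $$

where $q_I$ is the fitted Q-function on interval $I$ (linear or a deep neural network) and the penalty $\gamma_n |\mathcal{P}|$ controls the number of jumps; the dynamic-programming search is over the interval partitions, and the resulting I2DR recommends, for each covariate value $x$, the interval(s) $I$ with the largest fitted value $q_I(x)$.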
Guanhua Chen
Dr. Guanhua Chen is an Assistant Professor of Biostatistics and Medical Informatics at the University of Wisconsin-Madison. He got his Ph.D. from the University of North Carolina at Chapel Hill in 2014 under the direction of Professor Michael R. Kosorok. His research focuses on developing statistical learning methods for clinical and biomedical research, with a particular emphasis on the discovery of complex patterns in omics data and electronic health record data to advance precision medicine.
Title: Discussion on Wenbin Lu’s Talk
Dong Zhang
Mr. Dong Zhang is a research assistant at Western University, Canada. He obtained his master’s degree from the Department of Biomedical Engineering at Western in 2020 and his bachelor’s degree in Automation from Northwestern Polytechnical University in 2018. His research focuses on medical image processing, machine learning, and artificial intelligence. He has published several papers in highly regarded conferences and journals in medical image analysis.
Title: Deep reinforcement learning in medical object detection and segmentation
Abstract: Medical object detection and segmentation are crucial pre-processing steps in the clinical workflow for diagnosis and therapy planning. With deep reinforcement learning (DRL) being one of the newest artificial intelligence techniques, how can we leverage it to improve medical object detection and segmentation performance? In this talk, I will introduce studies in which we applied DRL to two challenging and representative medical object detection and segmentation tasks: 1) sequential-conditional reinforcement learning for vertebral body detection and segmentation, which models the spine anatomy with DRL; 2) a weakly-supervised teacher-student network for liver tumor segmentation from non-enhanced images, which transfers knowledge from enhanced images with DRL. The experiments indicate that our methods are effective and outperform state-of-the-art deep learning methods. Overall, our studies improve object detection and segmentation accuracy and offer researchers a novel DRL-based approach in medical image analysis.
Shuo Li
Dr. Shuo Li is a pioneer in conducting multi-disciplinary research on imaging-centered medical data analytics to enable artificial intelligence (AI) in healthcare. His current research focuses on the development of AI systems to solve the most challenging clinical and fundamental data analytics problems in radiology, urology, surgery, rehabilitation, and cancer, with an emphasis on innovations in learning schemes (e.g., regression learning, deep learning, reinforcement learning). Dr. Li has a significant amount of influence and a strong research reputation internationally. He is a committee member of multiple highly influential conferences and societies. He is most notable for serving on the prestigious board of directors of the MICCAI Society (2015-2023), where he is also the general chair for the MICCAI 2022 conference. He has over 200 publications, has acted as the editor for six Springer books, and serves as an associate editor for several prestigious journals in the field. Throughout his career, he has received several awards from GE, various institutes, and international organizations.
Title: Discussion on Dong Zhang’s Talk
Peng Wei
Peng Wei is an assistant professor in the Department of Mechanical and Aerospace Engineering at George Washington University, with courtesy appointments in the Electrical and Computer Engineering Department and the Computer Science Department. By contributing to the intersection of control, optimization, machine learning, and artificial intelligence, he develops autonomy and decision support tools for aeronautics, aviation, and aerial robotics. His current focus is on the safety, efficiency, and scalability of decision-making systems in complex, uncertain, and dynamic environments. Recent applications include: Air Traffic Control/Management (ATC/M), airline operations, UAS Traffic Management (UTM), eVTOL Urban Air Mobility (UAM), and Autonomous Drone Racing (ADR). Prof. Wei leads the Intelligent Aerospace Systems Lab (IASL). He is an associate editor for the AIAA Journal of Aerospace Information Systems. He received his Ph.D. degree in Aerospace Engineering from Purdue University in 2013 and his bachelor’s degree in Automation from Tsinghua University in 2007.
Title: Deep Multi-Agent Reinforcement Learning for Autonomous Urban Air Mobility
Abstract: Urban Air Mobility (UAM) is an envisioned air transportation concept, where intelligent flying machines could safely and efficiently transport passengers and cargo within urban areas by rising above traffic congestion on the ground. How can we design and build a real-time, trustworthy, safety-critical autonomous UAM separation assurance tool to enable large-scale flight operations in high-density, dynamic, and complex urban airspace environments? In this talk, the speaker will present studies that address this critical research challenge using multi-agent reinforcement learning and attention networks.
Zhiyuan Liu
Dr. Zhiyuan (Terry) Liu is currently a Professor and Vice Dean in the School of Transportation at Southeast University, Nanjing, China. He received his PhD degree from the National University of Singapore (NUS). From 2012 to 2015, he was a lecturer at Monash University, Australia. In 2018, he was a visiting scholar in the School of Mathematics and Statistics, University of Melbourne. His research interests include transportation data analysis, transportation network modelling, public transport, and intelligent transport systems. In these areas, he has published more than 100 SCI/SSCI papers. He is an associate editor of IET Intelligent Transport Systems and the ASCE Journal of Transportation Engineering, and also serves on the editorial boards of three international journals: Transportation Research Part E, Transportation Research Record, and the Journal of Transport and Land Use.
Title: Urban Transport Simulation Using Reinforcement Learning
Abstract: Simulation technology has been widely used in the field of transportation. However, existing simulation packages mainly focus on one aspect of the transport system, providing only a macroscopic or microscopic view of the analysis. Integrating macro and micro simulation is of considerable significance for urban transport studies. This study therefore addresses the next generation of urban transport simulation, in which artificial intelligence (AI) techniques, especially reinforcement learning (RL), form a key backbone. The new simulation platform will also support advanced traffic applications such as vehicle-road collaborative systems (SVIS), automated driving systems, and multi-agent simulation. Compared with traditional mathematical modeling and optimization methods, RL-based simulation has great advantages in traffic modeling and simulation. A new generation of traffic simulation software based on reinforcement learning will be better able to create or reconstruct traffic scenarios with high precision. By integrating all the transport sub-systems with high precision and compatibility, this new platform can also be regarded as a transport digital twin platform, a concept introduced in the talk.
Yuxi Li
Yuxi Li, author of the 150-page Deep Reinforcement Learning: An Overview (https://bit.ly/2AidXm1), is writing a book about reinforcement learning applications. He is the lead guest editor for a Machine Learning special issue and a lead co-chair for an ICML 2019 workshop and a 2020 virtual workshop, all on reinforcement learning for real life. He was a co-organizer of the AI Frontiers Conference in Silicon Valley in 2017 and 2018. He has published refereed papers at venues such as NIPS, AISTATS, and INFOCOM. He serves as a TPC member/reviewer for conferences and journals such as AAAI 2019-2021, ACM Computing Surveys, TKDD, and PLOS ONE. He obtained his PhD in computer science from the University of Alberta and was a postdoc there. He was an associate professor in China and a senior data scientist in the US. He founded attain.ai in Canada.
Title: Reinforcement Learning Applications
Abstract: What is the most exciting AI news in recent years? AlphaGo! What are the key techniques behind AlphaGo? Deep learning and reinforcement learning (RL)! What are the application areas for RL? A lot! In fact, besides games, RL has been making tremendous achievements in diverse areas like recommender systems and robotics. In this talk, we will introduce RL briefly, present several RL applications, and discuss issues for successfully applying RL in real-life scenarios.
Yanhua Li
Prof. Yanhua Li received two Ph.D. degrees: one in computer science from the University of Minnesota at Twin Cities in 2013, and one in electrical engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2009. He joined the Department of Computer Science at Worcester Polytechnic Institute (WPI) as an assistant professor in fall 2015. His research interests are artificial intelligence and data science, with applications in smart cities in many contexts, including spatial-temporal data analytics, urban planning, and optimization. Recently, Dr. Li has focused on developing data-driven approaches to inversely learn and influence the decision-making strategies of urban travelers, who take public transit, taxis, shared bikes, etc. Dr. Li is a recipient of NSF CAREER and CRII Awards. (http://www.wpi.edu/~yli15/)
Title: Decision Analysis from Human-Generated Spatial-Temporal Data
Abstract: With the fast development of mobile sensing and information technology, large volumes of human-generated spatio-temporal data (HSTD) are increasingly collected, including taxi GPS trajectories, passenger trip data from automated fare collection (AFC) devices on buses and trains, and working traces from the emerging gig-economy services, such as food delivery (DoorDash, Postmates), and everyday tasks (TaskRabbit). Such HSTD capture unique decision-making strategies of the “data generators” (e.g., gig-workers, taxi drivers). Harnessing HSTD to characterize unique decision-making strategies of human agents has transformative potential in many applications, including promoting individual well-being of gig-workers, and improving service quality and revenue of transportation service providers. In this talk, I will introduce a spatial-temporal imitation learning framework for inversely learning and “imitating” the decision-making strategies of human agents from their HSTD, and present our recent works on analyzing taxi drivers’ passenger-seeking strategies and public transit travelers’ route choice strategies. Moreover, I will discuss key design challenges in spatial-temporal imitation learning, and outline various future applications in targeted training, incentive, and planning mechanisms that enhance the well-being of urban dwellers and society in terms of income level, travel and living convenience.
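As a toy stand-in for the inverse-learning idea (far simpler than the spatial-temporal imitation learning framework in the talk), the sketch below recovers a hidden route-choice strategy from simulated trajectories by estimating the empirical action distribution at each grid cell. The trajectories, grid, and hidden strategy are all synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "HSTD": trajectories of an agent on a grid who, at each cell, moves
# right (action 0) or up (action 1) according to a hidden decision strategy.
def simulate_trajectory(steps=8):
    x, y, pairs = 0, 0, []
    for _ in range(steps):
        p_right = 0.8 if y < 2 else 0.3      # the hidden strategy we want to recover
        a = 0 if rng.random() < p_right else 1
        pairs.append(((x, y), a))
        x, y = (x + 1, y) if a == 0 else (x, y + 1)
    return pairs

data = [pair for _ in range(500) for pair in simulate_trajectory()]

# Simplest possible imitation: estimate P(action | state) from the observed choices.
counts = np.zeros((9, 9, 2))                 # counts[x, y, action]
for (x, y), a in data:
    counts[x, y, a] += 1
p_right_hat = counts[..., 0] / counts.sum(axis=-1).clip(min=1)

print("estimated P(move right) at y=0:", np.round(p_right_hat[:4, 0], 2))
print("estimated P(move right) at y=3:", np.round(p_right_hat[:4, 3], 2))
```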
Haipeng Chen
Haipeng Chen is a postdoc in the Computer Science Department, Harvard University. Before that, he did his first postdoc in the Computer Science Department at Dartmouth College and obtained his PhD from the Interdisciplinary Graduate School, Nanyang Technological University, Singapore, in 2018. His research lies in the general areas of artificial intelligence, including machine learning, data mining, and algorithmic game theory, as well as their applications toward social good. He was the winner of the 2017 Microsoft Malmo Collaborative AI Challenge and runner-up for the Innovation Demonstration Award at IJCAI’19. He has published multiple papers in top conferences such as AAAI, IJCAI, AAMAS, UAI, KDD, and ICDM. He serves as a program committee member for top AI conferences such as NeurIPS, ICLR, AAAI, IJCAI, and AAMAS, and is a co-organizer of the ICLR 2021 workshop on Synthetic Data Generation.
Title: Discussion on Yanhua Li’s Talk
Liam Paull
Liam Paull is an assistant professor at l’Université de Montréal, the head of the Montreal Robotics and Embodied AI Lab (REAL), and holder of a Canada AI Chair. His lab focuses on robotics problems including building representations of the world (such as for simultaneous localization and mapping), modeling uncertainty, and building better workflows to teach robotic agents new tasks (such as through simulation or demonstration). Prior to this, Liam was a research scientist at MIT CSAIL, where he led the TRI-funded autonomous car project. He was also a postdoc in the marine robotics lab at MIT, where he worked on SLAM for underwater robots. He obtained his PhD from the University of New Brunswick in 2013, where he worked on robust and adaptive planning for underwater vehicles. He is a co-founder and director of the Duckietown Foundation, which is dedicated to making engaging robotics learning experiences accessible to everyone. The Duckietown class was originally taught at MIT, but the platform is now used at numerous institutions worldwide.
Title: Training Robotics in Simulators
Abstract: Reinforcement learning is an appealing approach to developing robot capabilities: it is flexible and general. However, there are particular challenges in training RL agents on real, physically embodied systems. For example, RL training tends to be quite inefficient and performing rollouts on a real robot system is expensive, real-world environments do not automatically reset, and real-world environments do not necessarily provide an explicit reward signal to the agent. To overcome these challenges, training agents in simulators is appealing. However, the new problem becomes ensuring that an agent trained in a simulator generalizes to the real environment, the so-called sim2real problem. In this talk we will present two paradigms for tackling the sim2real problem, which we refer to as “Learn to Transfer” and “Learn to Generalize”. We will also outline some future directions that we are pursuing at the Montreal Robotics and Embodied AI Lab (REAL) in this area. Finally, I will briefly describe our AI Driving Olympics project in connection with the problem of robotics benchmarking and sim2real transfer.
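One common ingredient of the “Learn to Generalize” paradigm is domain randomization. The toy sketch below (an illustration, not from the talk) tunes a controller gain either on a single nominal simulator or across randomized physical parameters, and shows why the randomized version tends to transfer better when the “real” parameter differs from the nominal one. The lane-keeping model, friction parameter, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "simulator": a 1-D lane-keeping task where the effect of the control gain
# depends on an unknown physical parameter (call it friction).
def rollout_error(gain, friction, steps=50):
    offset, total = 1.0, 0.0
    for _ in range(steps):
        offset = (1.0 - gain * friction) * offset + rng.normal(scale=0.01)
        total += offset ** 2
    return total / steps

gains = np.linspace(0.1, 2.0, 20)

# "Tune on one nominal simulator": pick the gain that works best for friction = 0.5.
nominal_gain = min(gains, key=lambda g: rollout_error(g, friction=0.5))

# "Learn to generalize" via domain randomization: pick the gain that works best
# on average across a whole distribution of plausible frictions.
frictions = rng.uniform(0.5, 1.5, size=20)
randomized_gain = min(gains, key=lambda g: np.mean([rollout_error(g, f) for f in frictions]))

# The "real world" has a friction value the nominal simulator never saw.
real_friction = 1.3
print("nominal-sim gain    :", round(float(nominal_gain), 2),
      "-> real-world error:", rollout_error(nominal_gain, real_friction))
print("randomized-sim gain :", round(float(randomized_gain), 2),
      "-> real-world error:", rollout_error(randomized_gain, real_friction))
```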
Nick Rhinehart
Nick Rhinehart is a Postdoctoral Scholar in the Electrical Engineering and Computer Science Department at the University of California, Berkeley with Sergey Levine. His work focuses on fundamental and applied research in machine learning and computer vision for behavioral forecasting and control in complex environments, with an emphasis on imitation learning, reinforcement learning, and deep learning methods. Applications of his work include autonomous navigation, robotic manipulation, and first-person video. He received a Ph.D. in Robotics from Carnegie Mellon University with Kris Kitani, and B.S. and B.A. degrees in Engineering and Computer Science from Swarthmore College. Nick’s work has been honored with a Best Paper Award at the ICML 2019 Workshop on AI for Autonomous Driving and a Best Paper Honorable Mention Award at ICCV 2017. His work has been published at a variety of top-tier venues in machine learning, computer vision, and robotics, including AAMAS, CoRL, CVPR, ECCV, ICCV, ICLR, ICML, ICRA, NeurIPS, and PAMI. You can learn more about his work at https://people.eecs.berkeley.edu/~nrhinehart/.
Title: Jointly Forecasting and Controlling Behavior by Learning From High-Dimensional Data
Abstract: A primary goal of many scientific and engineering disciplines is to develop accurate predictive models. Predictive models are also critical to human intelligence, as they enable us to plan behaviors by reasoning about how actions affect the world around us. These models are especially useful when they can accurately predict the future behaviors of other agents, which enables planning in their presence. In this talk, I will describe some of my research on developing learning-based models to jointly perform forecasting, planning, and control in a unified framework that draws inspiration from concepts in Imitation Learning and Reinforcement Learning. I will show how these models can be learned to make accurate predictions and decisions in the presence of rich perceptual input, and demonstrate their application to single- and multi-agent settings in first-person video, robotic manipulation, and autonomous navigation.
Nathan Kallus
Nathan Kallus is an Assistant Professor in the School of Operations Research and Information Engineering and Cornell Tech at Cornell University. Nathan’s research interests include personalization; optimization, especially under uncertainty; causal inference; sequential decision making; credible and robust inference; and algorithmic fairness. He holds a PhD in Operations Research from MIT as well as a BA in Mathematics and a BS in Computer Science both from UC Berkeley. Before coming to Cornell, Nathan was a Visiting Scholar at USC’s Department of Data Sciences and Operations and a Postdoctoral Associate at MIT’s Operations Research and Statistics group.
Title: Statistically Efficient Offline Reinforcement Learning
Abstract: Offline reinforcement learning (RL), wherein one uses existing off-policy data to evaluate and learn new policies, is crucial in applications where experimentation is limited and simulation unreliable, such as medicine. But offline RL is also notoriously difficult because the similarity between the trajectories observed and those generated by any proposed policy diminishes exponentially as the horizon grows, a phenomenon known as the curse of horizon, which has severely limited the application of offline RL whenever horizons are moderate to long or even infinite. To understand this limitation, we study the statistical efficiency limits of two central tasks in offline reinforcement learning: estimating policy value and policy gradient from off-policy data. This reveals that the curse is insurmountable without leveraging Markov structure — and as such plagues the standard doubly-robust estimators — but may be overcome in Markov and stationary settings. We develop the first estimators achieving the efficiency limits in finite- and infinite-horizon MDPs using a meta-algorithm we term Double Reinforcement Learning (DRL). We provide favorable guarantees for DRL and for off-policy policy optimization via ascending our efficiently estimated policy gradient.
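To make the “curse of horizon” concrete, here is a minimal, self-contained sketch of plain per-trajectory importance sampling on a toy two-state MDP: the importance weights are products over the horizon, so their variance grows exponentially with horizon length, which is exactly the failure mode that motivates estimators like Double Reinforcement Learning. The MDP, policies, and numbers are invented for illustration and are not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(policy, horizon=20):
    """Simulate a toy 2-state, 2-action MDP and return states, actions, rewards."""
    s, states, actions, rewards = 0, [], [], []
    for _ in range(horizon):
        a = int(rng.binomial(1, policy[s]))     # policy[s] = P(a = 1 | s)
        r = 1.0 if (s == 1 and a == 1) else 0.0
        s_next = a if rng.random() < 0.8 else 1 - a
        states.append(s); actions.append(a); rewards.append(r)
        s = s_next
    return states, actions, rewards

behavior = np.array([0.5, 0.5])   # logging policy: uniform actions
target   = np.array([0.9, 0.9])   # policy we want to evaluate offline

def is_estimate(n_traj=2000, horizon=20, gamma=0.95):
    """Plain per-trajectory importance sampling: weights multiply over the horizon."""
    values, weights = [], []
    for _ in range(n_traj):
        states, actions, rewards = rollout(behavior, horizon)
        w = 1.0
        for s, a in zip(states, actions):
            pi_t = target[s] if a == 1 else 1 - target[s]
            pi_b = behavior[s] if a == 1 else 1 - behavior[s]
            w *= pi_t / pi_b                    # product over steps -> exponential variance
        ret = sum(gamma ** t * r for t, r in enumerate(rewards))
        values.append(w * ret)
        weights.append(w)
    return np.mean(values), np.std(weights)

value, weight_spread = is_estimate()
print(f"IS estimate of target-policy value: {value:.3f}")
print(f"std of importance weights (grows with horizon): {weight_spread:.1f}")
```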
Chengchun Shi
Chengchun Shi is an Assistant Professor of Data Science at the London School of Economics and Political Science. His research interests include: (i) statistical methods in reinforcement learning; (ii) statistical analysis of complex data. Despite being early in his career, he has over 10 papers published or accepted at the Annals of Statistics, the Journal of the American Statistical Association, the Journal of the Royal Statistical Society (Series B), the Journal of Machine Learning Research, and the International Conference on Machine Learning. Before joining LSE, he obtained his PhD from North Carolina State University.
Title: Discussion on Nathan Kallus’s Talk
Eric Laber
Eric Laber is the Goodnight Distinguished Professor and Faculty Scholar in the Department of Statistics at NC State University. He joined NC State after completing his PhD at the University of Michigan in 2011. His research focuses on methods development for data-driven decision making, with applications in precision public health, defense, sports/e-sports, and inventory management. He is also passionate about K-12 STEM outreach, and served as director of research translation and engagement for the College of Sciences at NC State from 2016 to 2019. You can learn more about his research and outreach at: Laber-Labs.com.
Title: Partially Observable Markov Decision Processes as a Model for Chronic Illness
Abstract: Observational longitudinal studies are a common means to study treatment efficacy and safety in chronic mental illness. In many such studies, treatment changes may be initiated by either the patient or their clinician and can thus vary widely across patients in their timing, number, and type. Indeed, in the observational longitudinal pathway of the STEP-BD study of bipolar depression, one of the motivations for this work, no two patients have the same treatment history even after coarsening clinic visits to a weekly time-scale. Estimation of an optimal treatment regime using such data is challenging as one cannot naively pool together patients with the same treatment history, as is required by methods based on inverse probability weighting, nor is it possible to apply backwards induction over the decision points, as is done in Q-learning and its variants. Thus, additional structure is needed to effectively pool information across patients and within a patient over time. Current scientific theory for many chronic mental illnesses maintains that a patient’s disease status can be conceptualized as transitioning among a small number of discrete states. We use this theory to inform the construction of a partially observable Markov decision process model of patient health trajectories wherein observed health outcomes are dictated by a patient’s latent health state. Using this model, we derive an estimator of an optimal treatment regime under two common paradigms for quantifying long-term patient health. The finite sample performance of the proposed estimator is demonstrated through a series of simulation experiments and application to the observational pathway of the STEP-BD study. We find that the proposed method provides high-quality estimates of an optimal treatment strategy in settings where existing approaches cannot be applied without ad hoc modifications.
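As a toy illustration of the latent-state idea (not the estimator from the talk), the following sketch performs the belief update of a partially observable Markov decision process over a handful of hypothetical latent health states. The transition and observation probabilities are made up for illustration.

```python
import numpy as np

# Toy latent-state model: 3 unobserved health states, observed symptom score in {0, 1, 2}.
# Transition probabilities depend on treatment a in {0, 1}; all numbers are invented.
T = np.array([
    [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]],    # a = 0: P(s' | s)
    [[0.8, 0.15, 0.05], [0.6, 0.3, 0.1], [0.3, 0.4, 0.3]],  # a = 1: P(s' | s)
])
O = np.array([[0.7, 0.2, 0.1],    # P(observed score | latent state)
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

def belief_update(belief, action, observation):
    """Bayes filter over latent health states: predict with T, correct with O."""
    predicted = belief @ T[action]
    corrected = predicted * O[:, observation]
    return corrected / corrected.sum()

belief = np.array([1 / 3, 1 / 3, 1 / 3])     # start uncertain about the latent state
for action, obs in [(1, 0), (1, 1), (0, 2)]:
    belief = belief_update(belief, action, obs)
    print("belief over latent states:", np.round(belief, 3))
```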
Fanyou Wu
Fanyou Wu is a Ph.D. candidate in the Department of Forestry and Natural Resources at Purdue University. His research focuses on applications of machine learning in forestry and transportation, and he has published several papers in those fields. He has also won many championships and runner-up finishes in machine learning competitions, including the JDD competition (2019), the IJCAI Adversarial AI Challenge (2019), and the KDD Cup (2020).
Title: KDD Cup 2020 RL Track Winners Presentation – Part II
Abstract: Machine learning competitions are often considered bridges between industry and research, and top solutions often become state-of-the-art methods in the real world. In this presentation, I want to share some experiences from those competitions, using the vehicle dispatching task in the KDD Cup RL track as the main example. The vehicle dispatching system has always been one of the most critical problems for online taxi-hailing platforms, which must adapt their operation and management strategy to demand and supply dynamics. In the KDD competition, my team used a single-agent deep reinforcement learning approach for vehicle repositioning, deploying idle vehicles to specific locations in anticipation of future demand at the destination. A globally pruned action space, which encompasses a set of discrete actions, is used in this approach. It benefits drivers by avoiding travel to distant outskirts where there are few order requests. In addition, my team designed a simulator using the Julia programming language, which delivers more than a tenfold speedup over the Python simulator implementation.
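The snippet below is a rough, hypothetical sketch of the “globally pruned action space” idea described above: an idle vehicle’s repositioning choices are restricted to a few nearby, high-demand grid cells. The grid, demand numbers, and scoring rule are placeholders, and a simple greedy rule stands in for the learned DRL policy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 10x10 city grid with a predicted demand-supply gap per cell.
GRID = 10
demand_gap = rng.poisson(3, size=(GRID, GRID)).astype(float)

def pruned_actions(cell, k=5, max_dist=3):
    """Keep only nearby, high-scoring destination cells: a globally pruned action set
    avoids dispatching idle vehicles to distant outskirts with few requests."""
    x, y = cell
    candidates = []
    for i in range(max(0, x - max_dist), min(GRID, x + max_dist + 1)):
        for j in range(max(0, y - max_dist), min(GRID, y + max_dist + 1)):
            travel_cost = abs(i - x) + abs(j - y)
            score = round(float(demand_gap[i, j] - 0.5 * travel_cost), 1)
            candidates.append(((i, j), score))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:k]          # the discrete action set the policy chooses from

def reposition(cell):
    """Stand-in policy: pick the highest-scoring action from the pruned set.
    A DRL policy would instead score actions with a learned value function."""
    return pruned_actions(cell)[0][0]

idle_vehicle_cell = (0, 0)
print("pruned actions:", pruned_actions(idle_vehicle_cell))
print("reposition to :", reposition(idle_vehicle_cell))
```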
Yansheng Wang
Yansheng Wang is currently a first-year Ph.D. candidate in the School of Computer Science and Engineering at Beihang University. He works on crowd intelligence, spatial crowdsourcing, and reinforcement learning. He has published several papers in highly refereed conferences and journals such as ICDE, AAAI, and Neurocomputing. He was the team leader of the champion team in the KDD Cup 2020 RL track.
Title: KDD Cup 2020 RL Track Winners Presentation – Part I
Abstract: The development of the sharing economy and the mobile Internet has stimulated an explosion of real-world dynamic ridesharing applications, in which order dispatching is vital to ridesharing platforms. Given a dynamic input of orders and available drivers, order dispatching aims to assign drivers to suitable orders with the objective of maximizing overall platform revenue. In this talk, I will introduce two types of reinforcement learning (RL) based approaches to the problem. First, I will present an adaptive batch-based approach, where RL is applied to decide the batch sizes. Then I will elaborate on a fixed batch-based approach, where we use RL to guide the in-batch matching decisions; this is also the champion solution of the order dispatching task in the KDD Cup 2020 RL track. Finally, I will highlight some remaining research challenges in RL-based order dispatching.
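As a minimal illustration of fixed-batch dispatching (not the champion solution itself), the sketch below solves one batch as a driver-order assignment problem. The edge score here is just negative pickup distance; an RL-guided dispatcher would add a learned long-term value term to each edge. Drivers, orders, and scores are randomly generated placeholders.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# One dispatching batch: 4 idle drivers and 3 open orders, located on a unit square.
drivers = rng.random((4, 2))
orders = rng.random((3, 2))

def edge_score(driver, order):
    """Score of assigning a driver to an order. Here: an immediate-reward proxy
    (negative pickup distance). An RL-guided dispatcher would add a learned
    long-term value term, e.g. V(order destination) - V(driver location)."""
    return -np.linalg.norm(driver - order)

scores = np.array([[edge_score(d, o) for o in orders] for d in drivers])

# Maximize total batch score (Hungarian algorithm on the negated score matrix).
rows, cols = linear_sum_assignment(-scores)
for d, o in zip(rows, cols):
    print(f"driver {d} -> order {o}  (score {scores[d, o]:.3f})")
```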
Tony Qin
Tony Qin is a Principal Research Scientist and Director of the reinforcement learning group at DiDi AI Labs, working on core problems in ridesharing marketplace optimization. Prior to DiDi, he was a research scientist in supply chain and inventory optimization at Walmart Global E-commerce. Tony received his Ph.D. in Operations Research from Columbia University. His research interests span optimization and machine learning, with a particular focus on reinforcement learning and its applications in operational optimization, digital marketing, and smart transportation. He has published in, and served as a program committee member for, numerous top-tier conferences and journals in machine learning and optimization. He and his team received the INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice in 2019 and were selected for the NeurIPS 2018 Best Demo Awards. Tony holds more than 10 US patents in intelligent transportation and e-commerce systems.
Title: Deep Reinforcement Learning in a Ride-sharing Marketplace
Abstract: With the rising prevalence of smartphones in our daily life, online ride-hailing platforms led by companies such as DiDi, Uber, and Lyft have emerged as a viable solution for more timely and personalized transportation service. These platforms also allow idle vehicle capacity to be utilized more effectively to meet the growing need for on-demand transportation by connecting potential mobility requests to eligible drivers. In this talk, we will describe our line of research on ride-hailing marketplace optimization at DiDi, in particular order dispatching and vehicle repositioning. We will show the development of the spatiotemporal contextual value network and how it is used in order dispatching policy generation and in decision-time planning for vehicle repositioning.
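A rough sketch of the general idea of a spatiotemporal value function for dispatching is given below: a tabular value per (zone, time bucket) is fitted from historical trips with TD(0)-style updates and then used to score driver-order pairs. All zones, trips, and numbers are invented; the system described in the talk uses a learned contextual value network rather than this toy table.

```python
import numpy as np

# Tabular spatiotemporal value V(zone, time bucket), learned offline from completed
# trips with TD(0)-style updates, then used to score dispatch candidates.
n_zones, n_buckets, gamma, alpha = 6, 4, 0.9, 0.1
V = np.zeros((n_zones, n_buckets))

# Each historical trip: (origin zone, start bucket, fare, destination zone, end bucket).
trips = [(0, 0, 5.0, 3, 1), (3, 1, 8.0, 5, 2), (2, 0, 4.0, 1, 1),
         (1, 1, 6.0, 4, 3), (5, 2, 7.0, 0, 3)] * 200

for o, t0, fare, d, t1 in trips:
    # TD(0): value at the origin moves toward fare + discounted value at the destination.
    target = fare + gamma ** (t1 - t0) * V[d, t1]
    V[o, t0] += alpha * (target - V[o, t0])

def dispatch_score(driver_zone, driver_bucket, fare, dest_zone, dest_bucket):
    """Advantage-style score: immediate fare plus discounted destination value,
    minus the value of the driver's current (zone, time bucket)."""
    return (fare + gamma ** (dest_bucket - driver_bucket) * V[dest_zone, dest_bucket]
            - V[driver_zone, driver_bucket])

print(np.round(V, 2))
print("score of one candidate assignment:", round(dispatch_score(0, 0, 5.0, 3, 1), 2))
```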
Michael R. Kosorok
Michael R. Kosorok, Ph.D., is the W.R. Kenan, Jr. Distinguished Professor of Biostatistics and Professor of Statistics and Operations Research at the University of North Carolina at Chapel Hill. He received his PhD in Biostatistics from the University of Washington in 1991. He is an internationally known biostatistician and a prominent expert in data science, machine learning, and precision medicine. He is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the American Association for the Advancement of Science. He has published over 170 peer-reviewed articles, written a major text on the theoretical foundations of empirical processes and semiparametric inference (Kosorok, 2008, Springer), and co-edited (with Erica E.M. Moodie, 2016, ASA-SIAM) a research monograph on dynamic treatment regimes and precision medicine.
Title: Off-Policy Reinforcement Learning for Estimation of Optimal Treatment Regime
Abstract: In this presentation, we introduce off-policy reinforcement learning in the context of estimating an optimal treatment regime over a finite sequence of decision times. We discuss dynamic treatment regimes in this context, along with backward induction, Q-learning, and A-learning.
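To make the backward-induction flavor of Q-learning concrete, here is a small synthetic two-stage example with linear Q-functions: the stage-2 Q-function is fit first, its maximized value becomes the pseudo-outcome for the stage-1 regression, and the estimated regime picks the treatment with the larger Q-value at each stage. The data-generating model and all coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic two-stage trial: covariate X_t, randomized treatment A_t in {-1, +1}, final outcome Y.
X1 = rng.normal(size=n)
A1 = rng.choice([-1, 1], size=n)
X2 = 0.5 * X1 + 0.3 * A1 + rng.normal(scale=0.5, size=n)
A2 = rng.choice([-1, 1], size=n)
Y = X2 + A2 * (0.8 * X2 - 0.2) + A1 * (0.5 * X1 + 0.1) + rng.normal(scale=0.5, size=n)

def stage_features(x, a):
    # Linear working model with treatment interaction: Q(x, a) = b0 + b1*x + a*(b2 + b3*x)
    return np.column_stack([np.ones_like(x), x, a, a * x])

def fit_q(features, y):
    """Least-squares fit of a linear Q-function; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return coef

# Stage 2: regress Y on (X2, A2), then maximize over a in {-1, +1}.
beta2 = fit_q(stage_features(X2, A2), Y)
q2 = lambda x, a: stage_features(x, np.full_like(x, a)) @ beta2
v2 = np.maximum(q2(X2, -1), q2(X2, 1))      # optimal stage-2 value (pseudo-outcome)

# Stage 1: backward induction -- regress the stage-2 pseudo-outcome on (X1, A1).
beta1 = fit_q(stage_features(X1, A1), v2)
q1 = lambda x, a: stage_features(x, np.full_like(x, a)) @ beta1

# Estimated optimal regime: at each stage, choose the treatment with the larger Q-value.
x_new = np.array([0.7])
stage1 = 1 if q1(x_new, 1)[0] > q1(x_new, -1)[0] else -1
stage2 = 1 if q2(x_new, 1)[0] > q2(x_new, -1)[0] else -1
print("stage-1 decision for x = 0.7:", stage1)
print("stage-2 decision for x = 0.7:", stage2)
```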
Presentation Slides: 09/24/2020 Michael Presentation Slides
Susan Murphy
Susan Murphy is Professor of Statistics at Harvard University, Radcliffe Alumnae Professor at the Radcliffe Institute, Harvard University, and Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences. Her lab works on clinical trial designs and online learning algorithms for developing personalized mobile health interventions. She is a 2013 MacArthur Fellow, a member of the National Academy of Sciences and the National Academy of Medicine, both of the US National Academies. She is currently President of the Institute of Mathematical Statistics.
Title: Intelligent Pooling: Practical Thompson Sampling for mHealth
Abstract: In mobile health (mHealth), smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to make these sequential treatment decisions optimally. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In particular, individuals who are in the same context can exhibit differential responses to treatments, yet only a limited amount of data is available for learning on any one individual. To address these challenges, we generalize Thompson sampling bandit algorithms to develop Intelligent Pooling. Intelligent Pooling uses empirical Bayes methods to update each user’s degree of personalization while making use of available data on other users to speed up learning. In this talk we discuss associated computational challenges.
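The sketch below is a rough stand-in for the idea rather than the algorithm presented in the talk: per-user Thompson sampling with Beta posteriors, where pooled counts from other users act as a shared prior so that each individual’s learning is sped up by population data. The user count, arm probabilities, and pooling weight are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_arms, horizon = 5, 2, 200
# Unknown per-user success probabilities for withholding (arm 0) vs. sending (arm 1) a prompt.
true_p = rng.uniform(0.2, 0.8, size=(n_users, n_arms))

# Per-user Beta posteriors; pooled counts act as a shared prior so that data from
# other users speeds up learning for each individual (a rough stand-in for the
# empirical-Bayes pooling in Intelligent Pooling).
successes = np.zeros((n_users, n_arms))
failures = np.zeros((n_users, n_arms))
pool_strength = 0.1          # how strongly the shared prior is weighted

for t in range(horizon):
    user = t % n_users
    pooled_s = successes.sum(axis=0) * pool_strength
    pooled_f = failures.sum(axis=0) * pool_strength
    # Thompson sampling: draw a plausible success rate per arm, act greedily on the draw.
    theta = rng.beta(1 + successes[user] + pooled_s, 1 + failures[user] + pooled_f)
    arm = int(np.argmax(theta))
    reward = rng.random() < true_p[user, arm]
    successes[user, arm] += reward
    failures[user, arm] += 1 - reward

print("posterior mean success rate per user and arm:")
print(np.round((1 + successes) / (2 + successes + failures), 2))
```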
Bo An
Bo An is a President’s Council Chair Associate Professor in Computer Science and Engineering at Nanyang Technological University, Singapore. He received his Ph.D. degree in Computer Science from the University of Massachusetts, Amherst. His current research interests include artificial intelligence, multiagent systems, computational game theory, reinforcement learning, and optimization. Dr. An was the recipient of the 2010 IFAAMAS Victor Lesser Distinguished Dissertation Award, an Operational Excellence Award from the Commander, First Coast Guard District of the United States, the 2012 INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice, and the 2018 Nanyang Research Award (Young Investigator). His publications won the Best Innovative Application Paper Award at AAMAS’12 and the Innovative Application Award at IAAI’16. He was invited to give an Early Career Spotlight talk at IJCAI’17. He led the team HogRider, which won the 2017 Microsoft Collaborative AI Challenge. He was named to IEEE Intelligent Systems’ “AI’s 10 to Watch” list for 2018. He is PC Co-Chair of AAMAS’20. He is a member of the editorial board of JAIR and an associate editor of JAAMAS, IEEE Intelligent Systems, and ACM TIST. He was elected to the board of directors of IFAAMAS and is a senior member of AAAI.
Title: Reinforcement Learning in Competitive Environment
Abstract: In some complex domains with strategic interaction, reinforcement learning has been used successfully to learn efficient policies. This talk will discuss the key techniques behind these successes and their applications in domains including games, e-commerce, and urban planning.