Meet Our Speakers

YouTube Video Link
Bilibili Video Link

Houssam Nassif

Houssam Nassif is a Principal Applied Scientist at Amazon, where he established and leads Amazon’s adaptive testing framework, researching, deploying, and evangelizing bandits, with forays into reinforcement learning, causality, and diversity. Houssam started his career as a wet-lab biologist before switching to computer science, earning his PhD in Artificial Intelligence from the University of Wisconsin–Madison. His early research spans biomedical informatics, statistical relational learning, and uplift modeling. Since joining Amazon in 2013, Houssam has been passionate about adaptive experimentation. He helped launch 27 business products across Amazon, Google, and Cisco, which generated $1.5 billion in incremental yearly revenue. Houssam has published over 25 peer-reviewed papers in leading ML and biomedical informatics journals and conferences, and organized AISTATS’15. His work has been recognized with 4 paper awards, including from RecSys and KDD.

Title: Solving Inverse Reinforcement Learning, Bootstrapping Bandits

Abstract: This talk discusses three different ways we leveraged reward signals to inform recommendation. In Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions (ICML’20), we use deep energy-based policies to recover the true reward function in an Inverse Reinforcement Learning setting. We uniquely identify the reward function by assuming the existence of an anchor action with known reward, for example, a do-nothing action with zero reward. In Decoupling Learning Rates Using Empirical Bayes (under review, arXiv), we devise an Empirical Bayes formulation that extracts an unbiased prior in hindsight from an experiment’s early reward signals. We apply this empirical prior to warm-start bandit recommendations and speed up convergence. In Seeker: Real-Time Interactive Search (KDD’19), we introduce a recommender system that adaptively refines search rankings in real time, through user interactions in the form of likes and dislikes. We extend Boltzmann bandit exploration to adapt to the interactively changing embedding space, and to factor in the uncertainty of the reward estimates.
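
The basic Boltzmann (softmax) exploration rule that Seeker extends can be illustrated in a few lines; the uncertainty weighting and temperature below are placeholders, not the adaptive scheme from the paper.

```python
import numpy as np

def boltzmann_select(reward_est, reward_std, temperature=1.0, rng=None):
    """Pick an item index via Boltzmann (softmax) exploration.

    reward_est : estimated mean reward per candidate item
    reward_std : uncertainty of each estimate (illustrative: more uncertain
                 items get a boosted, more exploratory score)
    """
    rng = rng or np.random.default_rng()
    scores = (np.asarray(reward_est) + np.asarray(reward_std)) / temperature
    scores -= scores.max()                       # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return rng.choice(len(probs), p=probs)

# Example: three candidate items with different estimates and uncertainties.
print(boltzmann_select([0.2, 0.5, 0.4], [0.3, 0.05, 0.2]))
```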

YouTube Video Link
Bilibili Video Link

Zhaoran Wang

Zhaoran Wang is an assistant professor at Northwestern University, working at the interface of machine learning, statistics, and optimization. He is the recipient of the AISTATS (Artificial Intelligence and Statistics Conference) notable paper award, ASA (American Statistical Association) best student paper in statistical learning and data mining, INFORMS (Institute for Operations Research and the Management Sciences) best student paper finalist in data mining, Microsoft Ph.D. Fellowship, Simons-Berkeley/J.P. Morgan AI Research Fellowship, Amazon Machine Learning Research Award, and NSF CAREER Award.

Title: Is Pessimism Provably Efficient for Offline RL?

Abstract: Coupled with powerful function approximators such as deep neural networks, reinforcement learning (RL) achieves tremendous empirical successes. However, its theoretical understanding lags behind. In particular, it remains unclear how to provably attain the optimal policy with a finite regret or sample complexity. In the offline setting, we aim to learn the optimal policy based on a dataset collected a priori. Due to a lack of active interactions with the environment, we suffer from insufficient coverage of the dataset. To maximally exploit the dataset, we propose a pessimistic least-squares value iteration algorithm, which achieves a minimax-optimal sample complexity.
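
For intuition, here is a tabular caricature of the pessimism principle: estimate the model from the offline dataset, subtract a count-based uncertainty penalty from the reward, and run backward value iteration. The talk’s algorithm applies the same idea to least-squares value iteration with function approximation; the c/sqrt(n) penalty and clipping below are illustrative.

```python
import numpy as np

def pessimistic_value_iteration(dataset, n_states, n_actions, horizon, c=1.0):
    """dataset: list of (s, a, r, s_next) transitions collected offline."""
    counts = np.zeros((n_states, n_actions))
    r_sum = np.zeros((n_states, n_actions))
    p_counts = np.zeros((n_states, n_actions, n_states))
    for s, a, r, s_next in dataset:
        counts[s, a] += 1
        r_sum[s, a] += r
        p_counts[s, a, s_next] += 1

    n = np.maximum(counts, 1)
    r_hat = r_sum / n
    p_hat = p_counts / n[:, :, None]
    bonus = c / np.sqrt(n)                      # uncertainty penalty

    V = np.zeros(n_states)
    Q = np.zeros((horizon, n_states, n_actions))
    for h in reversed(range(horizon)):
        Q[h] = r_hat - bonus + p_hat @ V        # pessimistic Bellman backup
        Q[h] = np.clip(Q[h], 0.0, horizon - h)  # keep values in a valid range
        V = Q[h].max(axis=1)
    return Q
```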

YouTube Video Link
Bilibili Video Link

Zhuoran Yang

Zhuoran Yang is a final-year Ph.D. student in the Department of Operations Research and Financial Engineering at Princeton University, advised by Professor Jianqing Fan and Professor Han Liu. Before attending Princeton, he obtained a Bachelor of Mathematics degree from Tsinghua University. His research interests lie at the interface of machine learning, statistics, and optimization. The primary goal of his research is to design a new generation of machine learning algorithms for large-scale and multi-agent decision-making problems, with both statistical and computational guarantees. He is also interested in the application of learning-based decision-making algorithms to real-world problems that arise in robotics, personalized medicine, and computational social science.

Title: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

Abstract: The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural networks, and indeed there have been many empirical successes that have exploited such combinations in large-scale applications. There are profound challenges, however, in developing a theory to support this enterprise, most notably the need to take into consideration the exploration-exploitation tradeoff at the core of RL in conjunction with the computational and statistical tradeoffs that arise in modern function-approximation-based learning systems. We approach these challenges by studying an optimistic modification of the least-squares value iteration algorithm, in the context of the action-value function represented by a kernel function or an overparameterized neural network. We establish both polynomial runtime complexity and polynomial sample complexity for this algorithm, without additional assumptions on the data-generating model. In particular, we prove that the algorithm incurs a sublinear regret that is independent of the number of states, a result that clearly exhibits the benefit of function approximation in RL.
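
To make the optimism principle concrete, the sketch below shows a single backward step of the linear-feature (LSVI-UCB-style) variant: a ridge regression fit plus an exploration bonus that is large where the data poorly covers the queried state-action features. The kernel and neural settings discussed in the talk replace the fixed feature map; the bonus coefficient beta here is a placeholder.

```python
import numpy as np

def optimistic_q(phi_data, targets, phi_query, lam=1.0, beta=1.0, v_max=None):
    """One optimistic least-squares value-iteration step.

    phi_data  : (n, d) features of observed state-action pairs at this step
    targets   : (n,)   regression targets r + V_next(s')
    phi_query : (m, d) features of state-action pairs to evaluate
    """
    d = phi_data.shape[1]
    Lambda = phi_data.T @ phi_data + lam * np.eye(d)
    w = np.linalg.solve(Lambda, phi_data.T @ targets)
    Lambda_inv = np.linalg.inv(Lambda)
    # Exploration bonus: grows where the data poorly covers the query features.
    bonus = beta * np.sqrt(np.einsum("md,dk,mk->m", phi_query, Lambda_inv, phi_query))
    q = phi_query @ w + bonus
    return np.minimum(q, v_max) if v_max is not None else q
```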

YouTube Video Link
Bilibili Video Link

Guanjie Zheng

Guanjie Zheng is an assistant professor at the John Hopcroft Center, Shanghai Jiao Tong University. His research interests lie in reinforcement learning (RL) and spatio-temporal data mining. His recent work focuses on how to learn optimal strategies for city-level traffic coordination from multi-modal data. He has published more than 20 papers at top-tier conferences such as KDD, WWW, AAAI, ICDE, and CIKM.

Title: Improving Urban Traffic Signal Control via Reinforcement Learning

Abstract: Increasingly available city data and advanced learning techniques have empowered people to improve the efficiency of our city functions. Among them, improving urban transportation efficiency is one of the most prominent topics. Recent studies have proposed to use reinforcement learning (RL) for traffic signal control. Different from traditional transportation approaches, which rely heavily on prior knowledge, RL can learn directly from the feedback. On the other hand, without careful model design, existing RL methods typically take a long time to converge, and the learned models may not be able to adapt to dynamic traffic scenarios. In this talk, we will cover three essential aspects of using reinforcement learning to tackle traffic signal control problems: (1) the typical solution framework; (2) state and reward design with connections to transportation theory; (3) communication and cooperation among multiple intersections. These considerations will help us build an effective and scalable reinforcement learning algorithm for city-level urban traffic signal control.
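
To make aspect (2) concrete, a common state and reward design for a single signalized intersection uses per-lane queue lengths plus the current phase as the state and a pressure-style penalty as the reward; the exact features and reward used in the work discussed here may differ.

```python
import numpy as np

N_PHASES = 4  # e.g., NS-through, NS-left, EW-through, EW-left

def build_state(queue_lengths, current_phase):
    """State = incoming-lane queue lengths + one-hot encoding of the phase."""
    phase_one_hot = np.eye(N_PHASES)[current_phase]
    return np.concatenate([np.asarray(queue_lengths, dtype=float), phase_one_hot])

def reward(incoming_queues, outgoing_queues):
    """Pressure-style reward: penalize upstream queues relative to downstream
    ones; simpler designs just use -sum(incoming_queues)."""
    return -(np.sum(incoming_queues) - np.sum(outgoing_queues))

# Example: 8 incoming lanes, phase 2 currently green.
s = build_state([3, 0, 5, 2, 1, 0, 4, 2], current_phase=2)
r = reward([3, 0, 5, 2, 1, 0, 4, 2], [1, 2, 0, 1, 0, 3, 1, 0])
```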

YouTube Video Link
Bilibili Video Link

Keith Ross

Dr. Keith Ross has been the Dean of Engineering and Computer Science at NYU Shanghai since 2013. Previously he was a professor at NYU Tandon/Poly (10 years), the University of Pennsylvania (13 years), and the Eurecom Institute in France (5 years). He received a Ph.D. in Computer and Control Engineering from the University of Michigan. He is an ACM Fellow and an IEEE Fellow. His current research interests are in deep and tabular reinforcement learning. He has also worked in Internet privacy, peer-to-peer networking, Internet measurement, stochastic modeling of computer networks, queuing theory, and Markov decision processes. He is the co-author of the most popular textbook on computer networking. At NYU Shanghai he has been teaching Machine Learning, Reinforcement Learning, and Introduction to Computer Programming.

Title: Recent Advances in Sample Efficient DRL

Abstract: The performance of a DRL algorithm can be measured along many dimensions including: asymptotic performance; sample efficiency; computational efficiency; and simplicity and elegance. In this talk we will discuss two recent research projects in DRL algorithmic design. The first project is a new algorithm for on-policy DRL with safety constraints (spotlight paper at NeurIPS 2020); the second project is a highly sample-efficient off-policy DRL algorithm for environments with continuous action spaces (conference paper at ICLR 2021).

YouTube Video Link
Bilibili Video Link

Mengdi Wang

Mengdi Wang is an associate professor in the Department of Electrical Engineering and the Center for Statistics and Machine Learning at Princeton University. She is also affiliated with the Department of Computer Science and is a visiting research scientist at DeepMind. Her research focuses on data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF CAREER Award in 2017, the Google Faculty Award in 2017, and the MIT Tech Review 35-Under-35 Innovation Award (China region) in 2018. She serves as an associate editor for Operations Research and Mathematics of Operations Research, as an area chair for ICML, NeurIPS, and AISTATS, and is on the editorial board of the Journal of Machine Learning Research. Her research is supported by NSF, AFOSR, NIH, ONR, Google, Microsoft C3.ai DTI, and FinUP.

Title: Compressive state representation learning towards small-data RL applications

Abstract: In this talk we survey recent advances on the statistical efficiency and regret of reinforcement learning (RL) when good state representations are available. Motivated by the RL theory, we discuss what good state representations for RL should look like and how to find compact state embeddings from high-dimensional Markov state trajectories. In the spirit of diffusion maps for dynamical systems, we propose an efficient method for learning a low-dimensional state embedding and capturing the process’s dynamics. The state embedding can be used to cluster states into metastable sets, predict future dynamics, and enable generalizable downstream machine learning and reinforcement learning tasks. We demonstrate applications of the approach in games, clinical pathway optimization, single-cell biology, and the identification of gene markers for drug discovery.
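
A minimal version of this pipeline, in the spirit of diffusion maps, factorizes the empirical transition matrix of an observed state trajectory and clusters the resulting coordinates into metastable sets; the method described in the talk is more refined, but the sketch conveys the idea.

```python
import numpy as np
from sklearn.cluster import KMeans

def embed_states(trajectory, n_states, dim=3, n_clusters=4):
    """trajectory: sequence of integer state labels observed over time."""
    # Empirical transition matrix estimated from consecutive state pairs.
    P = np.zeros((n_states, n_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        P[s, s_next] += 1
    P = P / np.maximum(P.sum(axis=1, keepdims=True), 1)

    # Truncated SVD of the transition matrix gives low-dimensional coordinates.
    U, svals, _ = np.linalg.svd(P)
    embedding = U[:, :dim] * svals[:dim]

    # Cluster embedded states into (approximately) metastable sets.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
    return embedding, labels
```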

YouTube Video Link
Bilibili Video Link

Wenbin Lu

Dr. Wenbin Lu is Professor of Statistics at North Carolina State University. He obtained his Ph.D. from the Department of Statistics at Columbia University in 2003. His research interests include biostatistics, high-dimensional data analysis, statistical and machine learning methods for precision medicine, and network data analysis. He has published more than 100 papers in a variety of statistical journals, including Biometrika, the Journal of the American Statistical Association, the Journal of the Royal Statistical Society (Series B), the Annals of Statistics, and the Journal of Machine Learning Research. His research is partly funded by several grants from the National Institutes of Health. He is an Associate Editor for Biostatistics, Biometrics, and Statistica Sinica, and a fellow of the American Statistical Association.

Title: Jump Q-Learning for Optimal Interval-Valued Treatment Decision Rule

Abstract: An individualized decision rule (IDR) is a decision function that assigns each individual a given treatment based on his/her observed characteristics. Most of the existing works in the literature consider settings with binary or finitely many treatment options. In this work, we focus on the continuous treatment setting and propose jump Q-learning to develop an individualized interval-valued decision rule (I2DR) that maximizes the expected outcome. Unlike IDRs that recommend a single treatment, the proposed I2DR yields an interval of treatment options for each individual, making it more flexible to implement in practice. To derive an optimal I2DR, our jump Q-learning method estimates the conditional mean of the response given the treatment and the covariates (the Q-function) via jump penalized regression, and derives the corresponding optimal I2DR based on the estimated Q-function. The regressor is allowed to be either linear, for clear interpretation, or a deep neural network, to model complex treatment-covariate interactions. To implement jump Q-learning, we develop a search algorithm based on dynamic programming that efficiently computes the Q-function. Statistical properties of the resulting I2DR are established when the Q-function is either a piecewise or a continuous function over the treatment space. We further develop a procedure to infer the mean outcome under the estimated optimal policy. Extensive simulations and a real data application to a warfarin study are conducted to demonstrate the empirical validity of the proposed I2DR.
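
As a toy illustration of how an estimated Q-function yields an interval-valued rule, one can report the range of doses on a grid whose estimated value falls within a tolerance of the best dose. The actual jump Q-learning estimator fits the Q-function by jump-penalized regression and partitions the dose range via dynamic programming, which this sketch does not attempt.

```python
import numpy as np

def interval_rule(q_hat, covariates, dose_grid, tol=0.05):
    """Return a [low, high] dose recommendation for one individual.

    q_hat     : callable (covariates, dose) -> estimated mean outcome
    dose_grid : 1-D grid of candidate doses spanning the treatment range
    tol       : doses within `tol` of the best value are deemed near-optimal
    """
    dose_grid = np.asarray(dose_grid)
    values = np.array([q_hat(covariates, d) for d in dose_grid])
    near_optimal = dose_grid[values >= values.max() - tol]
    return near_optimal.min(), near_optimal.max()
```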

YouTube Video Link
Bilibili Video Link

Guanhua Chen

Dr. Guanhua Chen is an Assistant Professor of Biostatistics and Medical Informatics at the University of Wisconsin-Madison. He received his Ph.D. from the University of North Carolina at Chapel Hill in 2014 under the direction of Professor Michael R. Kosorok. His research focuses on developing statistical learning methods for clinical and biomedical research, with a particular emphasis on the discovery of complex patterns in omics data and electronic health record data to advance precision medicine.

Title: Discussion on Wenbin Lu’s Talk

YouTube Video Link
Bilibili Video Link

Dong Zhang

Mr. Dong Zhang is a research assistant at Western University, Canada. He obtained his master’s degree from the Department of Biomedical Engineering at Western in 2020 and his bachelor’s degree in Automation from Northwestern Polytechnical University in 2018. His research focuses on medical image processing, machine learning, and artificial intelligence. He has published several papers in highly regarded conferences and journals in medical image analysis.

Title: Deep reinforcement learning in medical object detection and segmentation

Abstract: Medical object detection and segmentation are crucial pre-processing steps in the clinical workflow for diagnosis and therapy planning. With deep reinforcement learning (DRL) emerging as one of the newest classes of artificial intelligence algorithms, how can we leverage it to improve medical object detection and segmentation performance? In this talk, I will introduce studies in which we applied DRL to two challenging and representative medical object detection and segmentation tasks: 1) sequential-conditional reinforcement learning for vertebral body detection and segmentation, which models the spine anatomy with DRL; and 2) a weakly-supervised teacher-student network for liver tumor segmentation from non-enhanced images, which transfers knowledge from enhanced images with DRL. Experiments indicate that our methods are effective and outperform state-of-the-art deep learning methods. Overall, our studies improve object detection and segmentation accuracy and offer researchers a novel DRL-based approach in medical image analysis.

YouTube Video Link
Bilibili Video Link

Shuo Li

Dr. Shuo Li is a pioneer in conducting multi-disciplinary research for imaging-centered medical data analytics to enable artificial intelligence (AI) in healthcare. His current research focuses on the development of AI systems to solve the most challenging clinical and fundamental data analytics problems in radiology, urology, surgery, rehabilitation, and cancer, with an emphasis on innovations in learning schemes (e.g., regression learning, deep learning, reinforcement learning). Dr. Li has significant international influence and research reputation. He is a committee member of multiple highly influential conferences and societies. He is most notable for serving on the prestigious board of directors of the MICCAI Society (2015-2023); he is also the general chair of the MICCAI 2022 conference. He has over 200 publications, has acted as the editor of six Springer books, and serves as an associate editor for several prestigious journals in the field. Throughout his career, he has received several awards from GE, various institutes, and international organizations.

Title: Discussion on Dong Zhang’s Talk

YouTube Video Link
Bilibili Video Link

Peng Wei

Peng Wei is an assistant professor in the Department of Mechanical and Aerospace Engineering at George Washington University, with courtesy appointments in the Electrical and Computer Engineering and Computer Science Departments. By contributing to the intersection of control, optimization, machine learning, and artificial intelligence, he develops autonomy and decision support tools for aeronautics, aviation, and aerial robotics. His current focus is on the safety, efficiency, and scalability of decision-making systems in complex, uncertain, and dynamic environments. Recent applications include: Air Traffic Control/Management (ATC/M), Airline Operations, UAS Traffic Management (UTM), eVTOL Urban Air Mobility (UAM), and Autonomous Drone Racing (ADR). Prof. Wei leads the Intelligent Aerospace Systems Lab (IASL). He is an associate editor for the AIAA Journal of Aerospace Information Systems. He received his Ph.D. in Aerospace Engineering from Purdue University in 2013 and his bachelor’s degree in Automation from Tsinghua University in 2007.

Title: Deep Multi-Agent Reinforcement Learning for Autonomous Urban Air Mobility

Abstract: Urban Air Mobility (UAM) is an envisioned air transportation concept, where intelligent flying machines could safely and efficiently transport passengers and cargo within urban areas by rising above traffic congestion on the ground. How can we design and build a real-time, trustworthy, safety-critical autonomous UAM separation assurance tool to enable large-scale flight operations in high-density, dynamic and complex urban airspace environments? In this talk the speaker will present studies to address this critical research challenge using multi-agent reinforcement learning and attention networks.

YouTube Video Link
Bilibili Video Link

Zhiyuan Liu

Dr. Zhiyuan (Terry) Liu is a Professor and Vice Dean in the School of Transportation at Southeast University, Nanjing, China. He received his PhD degree from the National University of Singapore (NUS). From 2012 to 2015, he was a lecturer at Monash University, Australia. In 2018, he was a visiting scholar in the School of Mathematics and Statistics, University of Melbourne. His research interests include transportation data analysis, transportation network modelling, public transport, and intelligent transport systems. In these areas, he has published more than 100 SCI/SSCI papers. He is an associate editor of IET Intelligent Transport Systems and the ASCE Journal of Transportation Engineering, and also serves on the editorial boards of three international journals: Transportation Research Part E, Transportation Research Record, and the Journal of Transport and Land Use.

Title: Urban Transport Simulation Using Reinforcement Learning

Abstract: Simulation technology has been widely used in the field of transportation. However, existing simulation packages mainly focus on one aspect of the transport system, providing only a macro- or microscopic view of the analysis. Integration of macro and micro simulation is of considerable significance for urban transport studies. Thus, this study addresses the next generation of urban transport simulation, in which artificial intelligence (AI) techniques, especially reinforcement learning (RL), form a key backbone. The new simulation platform will also support advanced traffic applications such as vehicle-road collaborative systems (SVIS), automatic driving systems, and multi-agent simulation. Compared with traditional mathematical modeling and optimization methods, RL-based simulation has great advantages in traffic modeling and simulation. The new generation of traffic simulation software based on reinforcement learning will be better able to create or reconstruct traffic scenarios with high precision. By integrating all the transport sub-systems with high precision and compatibility, this new platform can also be regarded as a transport digital twin platform, a concept introduced in the talk.

YouTube Video Link
Bilibili Video Link

Yuxi Li

Yuxi Li, author of the 150-page Deep Reinforcement Learning: An Overview, at https://bit.ly/2AidXm1, is writing a book about reinforcement learning applications. He is the lead guest editor for a Machine Learning special issue, and the lead co-chair for an ICML 2019 workshop and a 2020 virtual workshop, all on reinforcement learning for real life. He was a co-organizer of the AI Frontiers Conference in Silicon Valley in 2017 and 2018. He has published refereed papers at venues such as NIPS, AISTATS, and INFOCOM. He serves as a TPC member/reviewer for conferences and journals such as AAAI 2019-2021, ACM Computing Surveys, TKDD, and PLOS ONE. He obtained his PhD in computer science from the University of Alberta and was a postdoc there. He was an associate professor in China and a senior data scientist in the US. He founded attain.ai in Canada.

Title: Reinforcement Learning Applications

Abstract: What is the most exciting AI news in recent years? AlphaGo! What are the key techniques behind AlphaGo? Deep learning and reinforcement learning (RL)! What are the application areas for RL? A lot! In fact, besides games, RL has been making tremendous achievements in diverse areas such as recommender systems and robotics. In this talk, we will introduce RL briefly, present several RL applications, and discuss issues for successfully applying RL in real-life scenarios.

YouTube Video Link
Bilibili Video Link

Yanhua Li

Prof. Yanhua Li received two Ph.D. degrees, in computer science from the University of Minnesota, Twin Cities, in 2013, and in electrical engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2009. He joined the Department of Computer Science at Worcester Polytechnic Institute (WPI) as an assistant professor in fall 2015. His research interests are artificial intelligence and data science, with applications in smart cities in many contexts, including spatial-temporal data analytics, urban planning, and optimization. Recently, Dr. Li has focused on developing data-driven approaches to inversely learn and influence the decision-making strategies of urban travelers, who take public transit, taxis, shared bikes, etc. Dr. Li is a recipient of NSF CAREER and CRII Awards. (http://www.wpi.edu/~yli15/)

Title: Decision Analysis from Human-Generated Spatial-Temporal Data

Abstract: With the fast development of mobile sensing and information technology, large volumes of human-generated spatio-temporal data (HSTD) are increasingly collected, including taxi GPS trajectories, passenger trip data from automated fare collection (AFC) devices on buses and trains, and working traces from the emerging gig-economy services, such as food delivery (DoorDash, Postmates), and everyday tasks (TaskRabbit). Such HSTD capture unique decision-making strategies of the “data generators” (e.g., gig-workers, taxi drivers). Harnessing HSTD to characterize unique decision-making strategies of human agents has transformative potential in many applications, including promoting individual well-being of gig-workers, and improving service quality and revenue of transportation service providers. In this talk, I will introduce a spatial-temporal imitation learning framework for inversely learning and “imitating” the decision-making strategies of human agents from their HSTD, and present our recent works on analyzing taxi drivers’ passenger-seeking strategies and public transit travelers’ route choice strategies. Moreover, I will discuss key design challenges in spatial-temporal imitation learning, and outline various future applications in targeted training, incentive, and planning mechanisms that enhance the well-being of urban dwellers and society in terms of income level, travel and living convenience.

YouTube Video Link
Bilibili Video Link

Haipeng Chen

Haipeng Chen is a postdoc in the Computer Science Department at Harvard University. Before that, he did his first postdoc in the Computer Science Department at Dartmouth College and obtained his PhD from the Interdisciplinary Graduate School, Nanyang Technological University, Singapore, in 2018. His research lies in the general areas of artificial intelligence, including machine learning, data mining, and algorithmic game theory, as well as their applications towards social good. He was the winner of the 2017 Microsoft Malmo Collaborative AI Challenge and runner-up for the Innovation Demonstration Award at IJCAI’19. He has published multiple papers in top conferences such as AAAI, IJCAI, AAMAS, UAI, KDD, and ICDM. He serves as a program committee member for top AI conferences such as NeurIPS, ICLR, AAAI, IJCAI, and AAMAS, and is a co-organizer of the ICLR 2021 workshop on Synthetic Data Generation.

Title: Discussion on Yanhua Li’s Talk

YouTube Video Link
Bilibili Video Link

Liam Paull

Liam Paull is an assistant professor at l’Université de Montréal and the head of the Montreal Robotics and Embodied AI Lab (REAL), and holds a Canada AI Chair. His lab focuses on robotics problems including building representations of the world (such as for simultaneous localization and mapping), modeling of uncertainty, and building better workflows to teach robotic agents new tasks (such as through simulation or demonstration). Prior to this, Liam was a research scientist at MIT CSAIL, where he led the TRI-funded autonomous car project. He was also a postdoc in the marine robotics lab at MIT, where he worked on SLAM for underwater robots. He obtained his PhD from the University of New Brunswick in 2013, where he worked on robust and adaptive planning for underwater vehicles. He is a co-founder and director of the Duckietown Foundation, which is dedicated to making engaging robotics learning experiences accessible to everyone. The Duckietown class was originally taught at MIT, but the platform is now used at numerous institutions worldwide.

Title: Training Robotics in Simulators

Abstract: Reinforcement learning is an appealing approach to developing robot capabilities. It is flexible and general. However, there are some particular challenges with respect to training RL agents on real, physically embodied systems. For example: RL training tends to be quite inefficient and performing rollouts on a real robot system is expensive, real-world environments don’t automatically reset, and real-world environments don’t necessarily provide a reward signal to the agent explicitly. To overcome these challenges, training agents in simulators is appealing. However, the new problem becomes ensuring that an agent trained in a simulator generalizes to the real environment, the so-called sim2real problem. In this talk we will present two paradigms for tackling the sim2real problem, which we refer to as “Learn to Transfer” and “Learn to Generalize”. We will also outline some future directions that we are pursuing in the Montreal Robotics and Embodied AI Lab (REAL). Finally, I will also briefly describe our AI Driving Olympics project in connection to the problem of robotics benchmarking and “sim2real” transfer.
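
A common instance of the “Learn to Generalize” paradigm is domain randomization: resample simulator parameters at every episode so the policy cannot overfit to a single simulated world. The wrapper below assumes a hypothetical simulator exposing a set_parameters() method; it sketches the idea rather than the lab’s implementation.

```python
import random

class DomainRandomizedEnv:
    """Wrap a simulator and resample physics parameters at every reset."""

    def __init__(self, sim, param_ranges):
        self.sim = sim                      # hypothetical simulator object
        self.param_ranges = param_ranges    # e.g. {"friction": (0.5, 1.5)}

    def reset(self):
        sampled = {name: random.uniform(lo, hi)
                   for name, (lo, hi) in self.param_ranges.items()}
        self.sim.set_parameters(**sampled)  # hypothetical simulator API
        return self.sim.reset()

    def step(self, action):
        return self.sim.step(action)

# Usage sketch: train any RL agent on DomainRandomizedEnv(sim, ranges) so it
# sees many simulated dynamics and, ideally, transfers better to the real robot.
```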

YouTube Video Link
Bilibili Video Link

Nick Rhinehart

Nick Rhinehart is a Postdoctoral Scholar in the Electrical Engineering and Computer Science Department at the University of California, Berkeley with Sergey Levine. His work focuses on fundamental and applied research in machine learning and computer vision for behavioral forecasting and control in complex environments, with an emphasis on imitation learning, reinforcement learning, and deep learning methods. Applications of his work include autonomous navigation, robotic manipulation, and first-person video. He received a Ph.D. in Robotics from Carnegie Mellon University with Kris Kitani, and B.S. and B.A. degrees in Engineering and Computer Science from Swarthmore College. Nick’s work has been honored with a Best Paper Award at the ICML 2019 Workshop on AI for Autonomous Driving and a Best Paper Honorable Mention Award at ICCV 2017. His work has been published at a variety of top-tier venues in machine learning, computer vision, and robotics, including AAMAS, CoRL, CVPR, ECCV, ICCV, ICLR, ICML, ICRA, NeurIPS, and PAMI. You can learn more about his work at https://people.eecs.berkeley.edu/~nrhinehart/.

Title: Jointly Forecasting and Controlling Behavior by Learning From High-Dimensional Data

Abstract: A primary goal of many scientific and engineering disciplines is to develop accurate predictive models. Predictive models are also critical to human intelligence, as they enable us to plan behaviors by reasoning about how actions affect the world around us. These models are especially useful when they can accurately predict the future behaviors of other agents, which enables planning in their presence. In this talk, I will describe some of my research on developing learning-based models to jointly perform forecasting, planning, and control in a unified framework that draws inspiration from concepts in Imitation Learning and Reinforcement Learning. I will show how these models can be learned to make accurate predictions and decisions in the presence of rich perceptual input, and demonstrate their application to single- and multi-agent settings in first-person video, robotic manipulation, and autonomous navigation.

YouTube Video Link
Bilibili Video Link

Nathan Kallus

Nathan Kallus is an Assistant Professor in the School of Operations Research and Information Engineering and Cornell Tech at Cornell University. Nathan’s research interests include personalization; optimization, especially under uncertainty; causal inference; sequential decision making; credible and robust inference; and algorithmic fairness. He holds a PhD in Operations Research from MIT as well as a BA in Mathematics and a BS in Computer Science both from UC Berkeley. Before coming to Cornell, Nathan was a Visiting Scholar at USC’s Department of Data Sciences and Operations and a Postdoctoral Associate at MIT’s Operations Research and Statistics group.

Title: Statistically Efficient Offline Reinforcement Learning

Abstract: Offline reinforcement learning (RL), wherein one uses existing off-policy data to evaluate and learn new policies, is crucial in applications where experimentation is limited and simulation unreliable, such as medicine. But offline RL is also notoriously difficult because the similarity between the trajectories observed and those generated by any proposed policy diminishes exponentially as horizon grows, known as the curse of horizon, which has severely limited the application of offline RL whenever horizons are moderate to long or even infinite. To understand this limitation, we study the statistical efficiency limits of two central tasks in offline reinforcement learning: estimating policy value and policy gradient from off-policy data. This reveals that the curse is insurmountable without leveraging Markov structure — and as such plagues the standard doubly-robust estimators — but may be overcome in Markov and stationary settings. We develop the first estimators achieving the efficiency limits in finite- and infinite-horizon MDPs using a meta-algorithm we term Double Reinforcement Learning (DRL). We provide favorable guarantees for DRL and for off-policy policy optimization via ascending our efficiently-estimated policy gradient.
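
For context, the standard trajectory-wise (per-decision) doubly robust estimator mentioned above combines importance ratios with an estimated Q-function; its variance grows exponentially with the horizon, which is exactly the curse that Double Reinforcement Learning overcomes by exploiting Markov structure. A minimal sketch:

```python
import numpy as np

def doubly_robust_value(trajectories, q_hat, v_hat, gamma=1.0):
    """Average per-decision doubly robust estimate of a target policy's value.

    Each trajectory is a list of (s, a, r, rho_t) tuples, where rho_t is the
    per-step importance ratio pi_target(a|s) / pi_behavior(a|s).
    q_hat(s, a), v_hat(s): estimated Q and V functions of the target policy.
    """
    estimates = []
    for traj in trajectories:
        total, rho_cum = 0.0, 1.0
        for t, (s, a, r, rho_t) in enumerate(traj):
            total += gamma ** t * rho_cum * v_hat(s)          # model term
            rho_cum *= rho_t                                   # cumulative ratio
            total += gamma ** t * rho_cum * (r - q_hat(s, a))  # correction term
        estimates.append(total)
    return float(np.mean(estimates))
```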

YouTube Video Link
Bilibili Video Link

Chengchun Shi

Chengchun Shi is Assistant Professor of Data Science at the London School of Economics and Political Science. His research interests include: (i) statistical methods in reinforcement learning; (ii) statistical analysis of complex data. Despite being early in his career, he has over 10 papers published or accepted at the Annals of Statistics, the Journal of the American Statistical Association, the Journal of the Royal Statistical Society (Series B), the Journal of Machine Learning Research, and the International Conference on Machine Learning. Before joining LSE, he obtained his PhD at North Carolina State University.

Title: Discussion on Nathan Kallus’s Talk

YouTube Video Link
Bilibili Video Link

Eric Laber

Eric Laber is the Goodnight Distinguished Professor and Faculty Scholar in the Department of Statistics at NC State University. He joined NC State after completing his PhD at the University of Michigan in 2011. His research focuses on methods development for data-driven decision making, with applications in precision public health, defense, sports/e-sports, and inventory management. He is also passionate about K-12 STEM outreach. He served as director of research translation and engagement for the College of Sciences at NC State from 2016 to 2019. You can learn more about his research and outreach at Laber-Labs.com.

Title: Partially Observable Markov Decision Processes as a Model for Chronic Illness

Abstract: Observational longitudinal studies are a common means to study treatment efficacy and safety in chronic mental illness. In many such studies, treatment changes may be initiated by either the patient or by their clinician and can thus vary widely across patients in their timing, number, and type. Indeed, in the observational longitudinal pathway of the STEP-BD study of bipolar depression, one of the motivations for this work, no two patients have the same treatment history even after coarsening clinic visits to a weekly time-scale. Estimation of an optimal treatment regime using such data is challenging as one cannot naively pool together patients with the same treatment history, as is required by methods based on inverse probability weighting, nor is it possible to apply backwards induction over the decision points, as is done in Q-learning and its variants. Thus, additional structure is needed to effectively pool information across patients and within a patient over time. Current scientific theory for many chronic mental illnesses maintains that a patient’s disease status can be conceptualized as transitioning among a small number of discrete states. We use this theory to inform the construction of a partially observable Markov decision process model of patient health trajectories wherein observed health outcomes are dictated by a patient’s latent health state. Using this model, we derive an estimator of an optimal treatment regime under two common paradigms for quantifying long-term patient health. The finite sample performance of the proposed estimator is demonstrated through a series of simulation experiments and application to the observational pathway of the STEP-BD study. We find that the proposed method provides high-quality estimates of an optimal treatment strategy in settings where existing approaches cannot be applied without ad hoc modifications.
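
The latent-state view rests on a standard discrete belief update: given the current belief over latent health states, the treatment taken, and the newly observed outcome, Bayes’ rule yields the updated belief. A minimal sketch, with the transition and observation models assumed known for illustration:

```python
import numpy as np

def belief_update(belief, action, observation, transition, emission):
    """One Bayes filter step over discrete latent health states.

    belief     : (S,) current probability over latent states
    transition : (A, S, S) transition[a, s, s'] = P(s' | s, a)
    emission   : (S, O)   emission[s', o]      = P(o | s')
    """
    predicted = belief @ transition[action]          # predict next latent state
    updated = predicted * emission[:, observation]   # weight by observation likelihood
    return updated / updated.sum()                   # renormalize
```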

YouTube Video Link
Bilibili Video Link

Fanyou Wu

Fanyou Wu is a Ph.D. candidate in the Department of Forestry and Natural Resources at Purdue University. His research focuses on the application of machine learning in forestry and transportation, and he has published several papers in those fields. He has also won many championships and runner-up finishes in machine-learning-related competitions, including the JDD competition (2019), the IJCAI Adversarial AI Challenge (2019), and the KDD Cup (2020).

Title: KDD Cup 2020 RL Track Winners Presentation – Part II

Abstract: Machine learning competitions are often considered bridges between industry and research, and leading solutions often become state-of-the-art methods in the real world. In this presentation, I want to share some experience from those competitions, using the vehicle dispatching task in the KDD RL track as the main example. The vehicle dispatching system has always been one of the most critical problems for online taxi-hailing platforms in adapting operation and management strategies to demand and supply dynamics. In the KDD competition, my team used a single-agent deep reinforcement learning approach for vehicle repositioning, deploying idle vehicles to specific locations to anticipate future demand at the destination. A globally pruned action space, which encompasses a set of discrete actions, is used in this approach. It benefits drivers by avoiding travel to distant outskirts where there are few order requests. In addition, my team designed a simulator in the Julia programming language, which runs more than ten times faster than the Python simulator implementation.
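
A globally pruned action space amounts to action masking: candidate reposition cells outside the pruned set are excluded before the greedy choice. A minimal sketch, with the construction of the mask left as an assumption about the problem setup:

```python
import numpy as np

def choose_reposition(q_values, valid_mask):
    """q_values : (n_actions,) Q estimates for candidate reposition cells
    valid_mask : (n_actions,) boolean, False for pruned cells (e.g., cells
                 too far away or in low-demand outskirts)."""
    masked = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked))

# Example: 5 candidate cells, two pruned from the global action space.
action = choose_reposition(np.array([0.3, 0.9, 0.1, 0.7, 0.2]),
                           np.array([True, False, True, True, False]))
```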

YouTube Video Link
Bilibili Video Link

Yansheng Wang

Yansheng Wang is currently a first-year Ph.D. candidate in the School of Computer Science and Engineering at Beihang University. He works on crowd intelligence, spatial crowdsourcing, and reinforcement learning. He has published several papers in highly refereed conferences and journals such as ICDE, AAAI, and Neurocomputing. He is the team leader of the champion team in the KDD Cup 2020 RL track.

Title: KDD Cup 2020 RL Track Winners Presentation – Part I

Abstract: The development of the sharing economy and the mobile Internet has stimulated an explosion of real-world dynamic ridesharing applications. Among them, order dispatching is vital to ridesharing platforms. Given dynamic input of orders and available drivers, order dispatching aims to assign drivers to suitable orders with the objective of maximizing the overall platform revenue. In this talk, I will introduce two types of reinforcement learning (RL) based approaches to solve the problem. First, I will present an adaptive batch-based approach, where RL is applied to decide the batch sizes. Then I will elaborate on a fixed batch-based approach, where we use RL to guide the in-batch matching decisions, which is also the champion solution of the order dispatching task in the KDD Cup 2020 RL track. Finally, I will highlight some open research challenges in RL-based order dispatching.

YouTube Video Link
Bilibili Video Link

Tony Qin

Tony Qin is Principal Research Scientist and Director of the reinforcement learning group at DiDi AI Labs, working on core problems in ridesharing marketplace optimization. Prior to DiDi, he was a research scientist in supply chain and inventory optimization at Walmart Global E-commerce. Tony received his Ph.D. in Operations Research from Columbia University. His research interests span optimization and machine learning, with a particular focus on reinforcement learning and its applications in operational optimization, digital marketing, and smart transportation. He has published in, and served as a program committee member for, numerous top-tier conferences and journals in machine learning and optimization. He and his team received the INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice in 2019 and were selected for the NeurIPS 2018 Best Demo Awards. Tony holds more than 10 US patents in intelligent transportation and e-commerce systems.

Title: Deep Reinforcement Learning in a Ride-sharing Marketplace

Abstract: With the rising prevalence of smartphones in our daily life, online ride-hailing platforms have emerged as a viable solution to provide more timely and personalized transportation services, led by companies such as DiDi, Uber, and Lyft. These platforms also allow idle vehicle capacity to be utilized more effectively to meet the growing need for on-demand transportation, by connecting potential mobility requests to eligible drivers. In this talk, we will describe our line of research on ride-hailing marketplace optimization at DiDi, in particular, order dispatching and vehicle repositioning. We will show the development of the spatiotemporal contextual value network and how it is used in order dispatching policy generation and in decision-time planning for vehicle repositioning.
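
A common pattern in this line of work, sketched below under illustrative assumptions, scores each driver-order pair by its immediate fare plus the discounted change in a learned spatiotemporal value, then solves a bipartite assignment for each dispatch window; this is a hedged sketch of the pattern, not DiDi’s production system.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dispatch(drivers, orders, value_fn, gamma=0.95):
    """drivers: list of (location, time); orders: list of (origin, dest, fare, dt).
    value_fn(location, time): learned spatiotemporal value estimate."""
    score = np.zeros((len(drivers), len(orders)))
    for i, (loc, t) in enumerate(drivers):
        for j, (origin, dest, fare, dt) in enumerate(orders):
            # Advantage of serving this order: fare plus discounted value of
            # the destination, minus the value of the driver's current spot.
            score[i, j] = fare + gamma ** dt * value_fn(dest, t + dt) - value_fn(loc, t)
    rows, cols = linear_sum_assignment(-score)      # maximize total score
    return [(int(i), int(j)) for i, j in zip(rows, cols) if score[i, j] > 0]
```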

YouTube Video Link
Bilibili Video Link

Michael R. Kosorok

Michael R. Kosorok, Ph.D., the W.R. Kenan, Jr. Distinguished Professor of Biostatistics and Professor of Statistics and Operations Research at the University of North Carolina at Chapel Hill, received his PhD in Biostatistics from the University of Washington in 1991. He is an internationally known biostatistician and a prominent expert in data science, machine learning, and precision medicine. He is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the American Association for the Advancement of Science. He has published over 170 peer-reviewed articles, written a major text on the theoretical foundations of empirical processes and semiparametric inference (Kosorok, 2008, Springer), and co-edited (with Erica E.M. Moodie, 2016, ASA-SIAM) a research monograph on dynamic treatment regimens and precision medicine.

Title: Off-Policy Reinforcement Learning for Estimation of Optimal Treatment Regime

Abstract: In this presentation, we introduce off-policy reinforcement learning in the context of estimating an optimal treatment regime for a finite sequence of decision times. We introduce and discuss dynamic treatment regimes in this context, backward induction, Q-learning and A-learning.
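
For a two-stage regime, backward-induction Q-learning can be sketched directly: fit the stage-2 Q-function, replace the observed outcome with its maximized predicted value, then fit the stage-1 Q-function on that pseudo-outcome. The linear models and features below are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def two_stage_q_learning(X1, A1, X2, A2, Y):
    """X1, X2: stage-1/2 covariate matrices; A1, A2: binary treatments; Y: outcome."""
    # Stage 2: regress the final outcome on stage-2 history and treatment.
    H2 = np.column_stack([X1, A1, X2, A2, A2 * X2[:, 0]])   # illustrative features
    q2 = LinearRegression().fit(H2, Y)

    # Pseudo-outcome: predicted outcome under the best stage-2 treatment.
    def q2_pred(a2):
        a2_col = np.full(len(Y), a2)
        H = np.column_stack([X1, A1, X2, a2_col, a2_col * X2[:, 0]])
        return q2.predict(H)
    pseudo_y = np.maximum(q2_pred(0), q2_pred(1))

    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    H1 = np.column_stack([X1, A1, A1 * X1[:, 0]])
    q1 = LinearRegression().fit(H1, pseudo_y)
    # The estimated rule at each stage recommends the treatment with the
    # larger predicted Q-value for the individual's history.
    return q1, q2
```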

Presentation Slides: 09/24/2020 Michael Presentation Slides

YouTube Video Link

Susan Murphy

Susan Murphy is Professor of Statistics at Harvard University, Radcliffe Alumnae Professor at the Radcliffe Institute, Harvard University, and Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences. Her lab works on clinical trial designs and online learning algorithms for developing personalized mobile health interventions. She is a 2013 MacArthur Fellow, a member of the National Academy of Sciences and the National Academy of Medicine, both of the US National Academies. She is currently President of the Institute of Mathematical Statistics.

Title: Intelligent Pooling: Practical Thompson Sampling for mHealth

Abstract: In mobile health (mHealth), smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In particular, individuals who are in the same context can exhibit differential responses to treatments, yet only a limited amount of data is available for learning on any one individual. To address these challenges we generalize Thompson sampling bandit algorithms to develop Intelligent Pooling. Intelligent Pooling uses empirical Bayes methods to update each user’s degree of personalization while making use of available data on other users to speed up learning. In this talk we discuss associated computational challenges.
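
The core of pooling across users can be sketched with a Gaussian-Gaussian Thompson sampling step: each user’s treatment-effect parameter is shrunk toward a population prior (fit by empirical Bayes from all users’ data), and the decision is made by sampling from the user’s posterior. The hyperparameters below are placeholders, not the Intelligent Pooling algorithm itself.

```python
import numpy as np

def pooled_thompson_action(user_rewards, mu_pop, tau2, sigma2, rng=None):
    """Decide whether to deliver a treatment for one user at one decision time.

    user_rewards : rewards this user received when previously treated
    mu_pop, tau2 : population prior mean/variance of the treatment effect,
                   estimated by empirical Bayes from all users (pooling)
    sigma2       : observation noise variance
    """
    rng = rng or np.random.default_rng()
    n = len(user_rewards)
    # Gaussian-Gaussian posterior for this user's treatment effect,
    # shrunk toward the population mean when the user has little data.
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu_pop / tau2 + np.sum(user_rewards) / sigma2)
    sampled_effect = rng.normal(post_mean, np.sqrt(post_var))
    return int(sampled_effect > 0)   # treat if the sampled effect is positive
```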

YouTube Video Link

Bo An

Bo An is a President’s Council Chair Associate Professor in Computer Science and Engineering at Nanyang Technological University, Singapore. He received his Ph.D. degree in Computer Science from the University of Massachusetts, Amherst. His current research interests include artificial intelligence, multiagent systems, computational game theory, reinforcement learning, and optimization. Dr. An was the recipient of the 2010 IFAAMAS Victor Lesser Distinguished Dissertation Award, an Operational Excellence Award from the Commander, First Coast Guard District of the United States, the 2012 INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice, and the 2018 Nanyang Research Award (Young Investigator). His publications won the Best Innovative Application Paper Award at AAMAS’12 and the Innovative Application Award at IAAI’16. He was invited to give an Early Career Spotlight talk at IJCAI’17. He led the team HogRider, which won the 2017 Microsoft Collaborative AI Challenge. He was named to IEEE Intelligent Systems’ “AI’s 10 to Watch” list for 2018. He is PC Co-Chair of AAMAS’20. He is a member of the editorial board of JAIR and an Associate Editor of JAAMAS, IEEE Intelligent Systems, and ACM TIST. He was elected to the board of directors of IFAAMAS and is a senior member of AAAI.

Title: Reinforcement Learning in Competitive Environment

Abstract: For some complex domains with strategic interaction, reinforcement learning has been successfully used to learn efficient policies. This talk will discuss the key techniques behind these successes and their applications in domains including games, e-commerce, and urban planning.