Time series experiments, in which experimental units receive a sequence of treatments over time, are prevalent in technology companies, including ride-sharing platforms and trading companies. These companies frequently use such experiments for A/B testing, to evaluate the performance of a newly developed policy, product, or treatment relative to a baseline control. Many existing solutions require that the experimental environment be fully observed so that the collected data satisfy the Markov assumption. This condition, however, is often violated in real-world scenarios. This gap between theoretical assumptions and practical realities challenges the reliability of existing approaches and calls for more rigorous investigation of A/B testing procedures.
In this paper, we study the optimal experimental design for A/B testing in partially observable environments. We introduce a controlled (vector) autoregressive moving average model to capture a rich class of partially observable environments. Within this framework, we derive closed-form expressions, referred to as efficiency indicators, to assess the statistical efficiency of various sequential experimental designs in estimating the average treatment effect (ATE). A key innovation of our approach is the introduction of a weak signal assumption, which significantly simplifies the computation of the asymptotic mean squared errors of ATE estimators in time series experiments. We then develop two data-driven algorithms to estimate the optimal design: one based on constrained optimization, and the other on reinforcement learning. We demonstrate the superior performance of our designs using a dispatch simulator and two real datasets from a ride-sharing company.
About the Speaker:
Chengchun Shi is an Associate Professor at the London School of Economics and Political Science. He serves as an associate editor of JRSSB, JASA (T & M) and the Journal of Nonparametric Statistics. His research focuses on developing statistical learning methods in reinforcement learning, with applications to healthcare, ride-sharing, video-sharing and neuroimaging. He received the Royal Statistical Society Research Prize in 2021 and the IMS Tweedie Award in 2024.