Statistical Inference with Online Debiasing

A framework for controlled variable selection with adaptively collected data

Preamble: Modern data collection, experimentation and modeling are often interactive in nature. The analyst reacts to the previously collected data and to the feedback and rewards received, and then adapts the model and the data collection policy accordingly. For example, clinical trials are often run in phases, where the data from each phase shape the design of future phases. In E-commerce, algorithms collect data by eliciting feedback from users (e.g., their ratings of a product or their purchase behavior), which is ultimately used to improve the recommendation algorithms and so affects future data. In reinforcement learning, agents interact with an environment, taking an action and receiving a reward at each round. Based on the rewards, they adjust their action policy, which in turn affects the future data collected from the environment.

In many such applications, adaptive data collection is carried out for objectives related to, but different from, statistical inference. In E-commerce and reinforcement learning, algorithms aim to minimize the revenue lost to pure experimentation. In other applications, such as clinical trials, data is an expensive commodity, so practitioners may choose to include the samples that are a priori deemed most informative.
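
A toy simulation illustrates why such objective-driven collection complicates inference. The sketch below is an illustrative assumption, not taken from the paper: two arms share the same true mean 0 with N(0, 1) rewards, and a greedy policy always pulls the arm with the higher running sample mean, as a revenue-seeking algorithm might. Averaged over many replications, the per-arm sample means come out negative, whereas a fixed round-robin schedule leaves them centered near 0. The function `sample_mean_bias` and all parameter values are hypothetical.

```python
import numpy as np

def sample_mean_bias(policy, n_rounds=50, n_reps=20000, seed=0):
    """Average per-arm sample mean for two arms whose true means are both 0.

    policy: "greedy" pulls the arm with the larger running sample mean;
            "round_robin" alternates arms regardless of the observed data.
    """
    rng = np.random.default_rng(seed)
    # One initial pull of each arm, with N(0, 1) rewards.
    sums = rng.standard_normal((n_reps, 2))
    counts = np.ones((n_reps, 2))
    rows = np.arange(n_reps)
    for t in range(n_rounds):
        if policy == "greedy":
            arm = np.argmax(sums / counts, axis=1)  # data-dependent choice
        elif policy == "round_robin":
            arm = np.full(n_reps, t % 2)  # fixed, data-independent schedule
        else:
            raise ValueError(f"unknown policy: {policy}")
        sums[rows, arm] += rng.standard_normal(n_reps)
        counts[rows, arm] += 1
    # Sample mean of each arm in each replication, averaged over everything.
    return float((sums / counts).mean())

greedy_bias = sample_mean_bias("greedy")
uniform_bias = sample_mean_bias("round_robin")
print(f"greedy: {greedy_bias:.3f}, round robin: {uniform_bias:.3f}")
```

The only difference between the two runs is whether the choice of arm depends on past rewards; that dependence is what turns an unbiased estimator into a biased one.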

Adaptive data collection induces correlation among samples and bias in the estimates, posing major obstacles to statistical inference. Another source of bias is the curse of dimensionality. In the high-dimensional regime, where the number of model parameters exceeds the sample size, point estimators are necessarily biased, since the data lie in a lower-dimensional space than the parameters they are used to estimate. The debiasing framework was developed to mitigate this dimensionality-induced bias and to provide valid statistical measures, such as p-values and confidence intervals, for low-dimensional components of a high-dimensional model.
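
For a linear model y = X theta0 + noise, one standard instance of this framework is the debiased Lasso: starting from the Lasso estimate theta_L, it adds the correction theta_d = theta_L + (1/n) M X^T (y - X theta_L), where M approximates the inverse of the sample covariance Sigma_hat = X^T X / n, and it forms intervals theta_d[i] +/- 1.96 sigma sqrt((M Sigma_hat M^T)[i, i] / n). The sketch below is a minimal illustration under stated assumptions, not the online debiasing method: the rows of X are assumed i.i.d. with identity covariance, so M = I serves as the decorrelating matrix (in general M must be estimated, e.g. by nodewise Lasso regressions or a convex program); the noise level sigma is treated as known; and the function names, penalty level and problem sizes are hypothetical choices.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso via cyclic coordinate descent on (1/2n)||y - X theta||^2 + lam ||theta||_1."""
    n, p = X.shape
    theta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # astype returns a copy, so y is left untouched
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * theta[j]  # remove coordinate j from the fit
            z = X[:, j] @ resid / n
            theta[j] = soft_threshold(z, lam) / col_sq[j]
            resid -= X[:, j] * theta[j]  # add the updated coordinate back
    return theta

def debiased_lasso(X, y, lam, sigma, M=None, z=1.96):
    """Debiased Lasso estimates and approximate 95% confidence intervals."""
    n, p = X.shape
    theta_l = lasso_cd(X, y, lam)
    if M is None:
        M = np.eye(p)  # assumption: rows of X have identity covariance
    theta_d = theta_l + M @ X.T @ (y - X @ theta_l) / n
    sigma_hat = X.T @ X / n
    se = sigma * np.sqrt(np.diag(M @ sigma_hat @ M.T) / n)
    return theta_l, theta_d, theta_d - z * se, theta_d + z * se

# Toy high-dimensional example: p > n with 5 nonzero coefficients.
rng = np.random.default_rng(1)
n, p, s, sigma = 200, 300, 5, 1.0
theta0 = np.zeros(p)
theta0[:s] = 1.0
X = rng.standard_normal((n, p))
y = X @ theta0 + sigma * rng.standard_normal(n)
lam = sigma * np.sqrt(2 * np.log(p) / n)

theta_l, theta_d, lo, hi = debiased_lasso(X, y, lam, sigma)
print("Lasso on support:   ", np.round(theta_l[:s], 2))
print("Debiased on support:", np.round(theta_d[:s], 2))
print("CI coverage:", np.mean((lo <= theta0) & (theta0 <= hi)))
```

Coordinate descent is used only to keep the sketch dependent on NumPy alone; any Lasso solver with the same objective scaling could be substituted. The Lasso estimates on the support are visibly shrunk toward 0, while the debiased estimates are roughly centered on the true values.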

We propose online debiasing, a novel approach built on the debiasing framework that copes with both sources of bias: dimensionality and adaptivity in data collection. In our paper, we focus on two concrete settings of adaptively collected data: (i) time series models and (ii) batched data collection.

This website provides an overview of online debiasing for high-dimensional adaptively collected data, accompanied by the source code for its implementation.

Paper

Yash Deshpande, Adel Javanmard, Mohammad Mehrabi, Online Debiasing for Adaptively Collected High-dimensional Data with Applications to Time Series Analysis, 2019.

People

Related Papers