Algorithm

Consider the linear regression model with samples $(y_1,x_1), (y_2,x_2), \dots, (y_n,x_n)$ satisfying:

 $$y_t = x_t^T \theta_0 + \epsilon_t, \quad \epsilon_t \stackrel{\rm iid}{\sim} {\sf N}(0,\sigma^2).$$


Here $\theta_0\in \mathbb{R}^p$ is an unknown parameter vector relating the covariates $x_t$ to the response $y_t$, and the $\epsilon_t$ are noise terms. The covariates $x_t$ may be collected adaptively, so $x_t$ can be correlated with the prior covariates $x_1, x_2, \dotsc, x_{t-1}$.

Goal: We aim at ex post statistical inference on the individual model parameters $\theta_{0,i}$, in the form of frequentist p-values and confidence intervals.

Method

Below we give a brief explanation of the online debiasing method. For more details and discussion, we refer to our paper.

Denote by $\widehat{\theta}^{\sf L} = \widehat{\theta}^{\sf L}(y,X;\lambda)$ the Lasso estimator

 $$\widehat{\theta}^{\sf L} = \arg\min_{\theta} \Big\{\frac{1}{2n} \|y-X\theta\|_2^2 + \lambda \|\theta\|_1 \Big\}.$$


The online debiased estimator $\widehat{\theta}^{\sf on} = \widehat{\theta}^{\sf on}(y,X;(M_t)_{t\le n}, \lambda)$ takes the form

 $$\widehat{\theta}^{\sf on} := \widehat{\theta}^{\sf L} + \frac{1}{n}\sum_{t=1}^n M_t x_t (y_t - x_t^T\widehat{\theta}^{\sf L}).$$


The term 'online' comes from the crucial constraint of predictability imposed on the sequence $(M_t)_{t\le n}$: there exists a filtration $(\mathfrak{F}_t)_{t\ge 0}$ such that (1) the noise terms $\epsilon_t$ are adapted to $(\mathfrak{F}_t)_{t\ge 0}$, with $\epsilon_t$ independent of $\mathfrak{F}_s$ for $s< t$, and (2) the sequences $(x_t)_{t\ge 1}$ and $(M_t)_{t\ge 1}$ are predictable with respect to $(\mathfrak{F}_t)_{t\ge 0}$, that is, $x_t$ and $M_t$ are $\mathfrak{F}_{t-1}$-measurable.
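
As a minimal sketch of the estimator above, the following NumPy function applies the correction term to a Lasso estimate that has already been computed. The function name `online_debias` and the array layout (the matrices $M_t$ stacked into an array of shape `(n, p, p)`) are assumptions made for illustration and are not taken from the accompanying code. The function does not check predictability; supplying matrices $M_t$ that depend only on past data is left to the caller.

```python
import numpy as np

def online_debias(theta_lasso, X, y, M):
    """Return theta_L + (1/n) * sum_t M_t x_t (y_t - x_t^T theta_L).

    theta_lasso: (p,) Lasso estimate; X: (n, p); y: (n,);
    M: (n, p, p) with M[t] playing the role of M_t (hypothetical layout).
    """
    n = X.shape[0]
    residuals = y - X @ theta_lasso              # y_t - x_t^T theta_L, shape (n,)
    Mx = np.einsum("tij,tj->ti", M, X)           # M_t x_t for every t, shape (n, p)
    correction = (Mx * residuals[:, None]).sum(axis=0) / n
    return theta_lasso + correction
```

When every $M_t$ equals the identity, the correction reduces to $\frac{1}{n} X^T (y - X\widehat{\theta}^{\sf L})$, which is a convenient sanity check.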

How to choose decorrelating matrices?

We focus on time series as an important application of adaptively collected data and consider a vector autoregression (VAR) model. By a proper change of variables, a VAR($d$) model can be represented as a linear regression with $dp^2$ parameters and sample size $n = T-d$, where $T$ is the time horizon over which we observe the time series and $d$ is the lag. Our primary interest is in the high-dimensional VAR($d$) model, where the number of model parameters $dp^2$ exceeds the sample size $n = T-d$.
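
One way to picture this change of variables: write the VAR($d$) model as $z_t = A_1 z_{t-1} + \dots + A_d z_{t-d} + \text{noise}$ with $z_t \in \mathbb{R}^p$ (this notation is assumed here for illustration). Each time point after the first $d$ then gives one regression sample whose covariate stacks the $d$ previous observations, so each of the $p$ response coordinates has $dp$ coefficients, for $dp^2$ in total. The sketch below builds such a design; the name `var_design` and the lag ordering are illustrative choices, not the accompanying code's interface.

```python
import numpy as np

def var_design(Z, d):
    """Turn a (T, p) time series into regression data for a VAR(d) model.

    Returns X of shape (T - d, d * p) and Y of shape (T - d, p).
    """
    T, p = Z.shape
    if not 0 < d < T:
        raise ValueError("need 0 < d < T")
    # The row for (0-based) time t = d, ..., T-1 stacks z_{t-1}, z_{t-2}, ..., z_{t-d}.
    X = np.hstack([Z[d - lag : T - lag] for lag in range(1, d + 1)])
    Y = Z[d:]                                    # responses z_t
    return X, Y
```

For example, a series with $T = 10$, $p = 2$ and $d = 3$ yields `X` of shape `(7, 6)` and `Y` of shape `(7, 2)`, that is, $n = T - d = 7$ samples for $dp^2 = 12$ parameters.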

The proposed online debiasing provides valid statistical significance measures for the model parameters (the time-invariant matrices in the VAR($d$) model).

To construct decorrelating matrices $M_t$ that satisfy the predictability condition, we proceed as follows:

  • For given positive integers $r_0, r_1, \dotsc, r_{K-1}$ that add up to $n$, we partition the time horizon into episodes $E_0, E_1, \dotsc, E_{K-1}$, with $E_\ell$ of length $r_\ell$, and let $\widehat{\Sigma}^{(\ell)}$ be the sample covariance of the features $x_t$ in the first $\ell$ episodes.

 



  • At the beginning of each episode $\ell$, we compute a decorrelating matrix $M^{(\ell)}$ from the previous data points and use it to debias the samples arriving in the current episode; that is, $M_t = M^{(\ell)}$ for $t\in E_{\ell}$. The details of this step are summarized below.

1. For $a = 1,2,\dotsc, dp$, construct $m^\ell_a\in \mathbb{R}^{dp}$ by solving the following optimization problem:

 $$\begin{aligned} \underset{m}{\text{minimize}} \quad & m^T \widehat{\Sigma}^{(\ell)} m \\ \text{subject to} \quad & \|\widehat{\Sigma}^{(\ell)} m - e_a\|_\infty \le \mu, \quad \|m\|_1\le L, \end{aligned}$$

where $e_a$ is the standard basis vector with one at the $a$-th position and zeros everywhere else.

2. Set $M^{(\ell)} = (m^\ell_1, m^\ell_2, \dotsc, m^\ell_{dp})^T$. In words, stack the constructed vectors $m^\ell_a$ as rows of $M^{(\ell)}$.
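
The two steps above can be sketched with a general-purpose solver. The code below uses SciPy's SLSQP method, which is an assumption made for illustration (the accompanying code may use a different solver), and the names `decorrelating_row` and `decorrelating_matrix` are hypothetical. The $\ell_1$ constraint is made linear by splitting $m = u - v$ with $u, v \ge 0$ and requiring $\sum_j (u_j + v_j) \le L$. The sample covariance is assumed to be symmetric.

```python
import numpy as np
from scipy.optimize import minimize

def decorrelating_row(Sigma, a, mu, L):
    """Minimize m^T Sigma m s.t. ||Sigma m - e_a||_inf <= mu and ||m||_1 <= L."""
    q = Sigma.shape[0]
    e_a = np.zeros(q)
    e_a[a] = 1.0                                  # a is 0-based here
    D = np.hstack([np.eye(q), -np.eye(q)])        # m = D @ z with z = [u; v] >= 0
    A = Sigma @ D                                 # Sigma m = A @ z

    def objective(z):
        m = D @ z
        return m @ Sigma @ m

    def gradient(z):
        return 2.0 * D.T @ (Sigma @ (D @ z))      # valid for symmetric Sigma

    constraints = [
        {"type": "ineq", "fun": lambda z: mu - (A @ z - e_a), "jac": lambda z: -A},
        {"type": "ineq", "fun": lambda z: mu + (A @ z - e_a), "jac": lambda z: A},
        {"type": "ineq", "fun": lambda z: L - z.sum(),
         "jac": lambda z: -np.ones((1, 2 * q))},
    ]
    res = minimize(objective, np.zeros(2 * q), jac=gradient, method="SLSQP",
                   bounds=[(0.0, None)] * (2 * q), constraints=constraints)
    if not res.success:                           # e.g. mu too small to be feasible
        raise RuntimeError(res.message)
    return D @ res.x

def decorrelating_matrix(Sigma, mu, L):
    """Stack the solutions for a = 0, ..., q-1 as the rows of M."""
    return np.vstack([decorrelating_row(Sigma, a, mu, L)
                      for a in range(Sigma.shape[0])])
```

With the identity matrix as covariance the solution is $m = (1-\mu)\, e_a$, provided $L \ge 1-\mu$, which gives a quick check of the solver setup. For large $dp$ a dedicated quadratic-programming solver would likely be a better fit than SLSQP.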

Our analysis in the paper suggests choosing episode lengths $r_\ell\sim \alpha^\ell$. The code can take the parameter $\alpha$ as an input.
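
To make the episode schedule concrete, here is a small sketch that generates geometric episode lengths and maps a time index to its episode. The text only specifies $r_\ell\sim \alpha^\ell$; the rounding rule, the initial length `r0`, and the truncation of the last episode so that the lengths sum to $n$ are assumptions made for illustration, as are the function names.

```python
def episode_lengths(n, alpha, r0=1):
    """Lengths r_l close to r0 * alpha**l, truncated so that they sum to n."""
    if n <= 0 or alpha < 1 or r0 <= 0:
        raise ValueError("need n > 0, alpha >= 1, r0 > 0")
    lengths, used, ell = [], 0, 0
    while used < n:
        r = max(1, round(r0 * alpha ** ell))
        r = min(r, n - used)          # the last episode takes whatever remains
        lengths.append(r)
        used += r
        ell += 1
    return lengths

def episode_of(t, lengths):
    """Index l of the episode containing time t (1-based), so M_t = M^(l)."""
    end = 0
    for ell, r in enumerate(lengths):
        end += r
        if t <= end:
            return ell
    raise ValueError("t exceeds the time horizon")
```

For instance, `episode_lengths(20, 2)` gives `[1, 2, 4, 8, 5]`, and `episode_of(4, [1, 2, 4, 8, 5])` returns `2`, so the sample at $t = 4$ would be debiased with $M^{(2)}$.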