Alleviating Linear Ecological Bias and Optimal Design with Subsample Data

Abstract

We illustrate that combining ecological data with subsample data in situations in which a linear model is appropriate provides two main benefits. First, by including the individ- ual level subsample data, the biases that are associated with linear ecological inference can be eliminated. Second, available ecological data can be used to design optimal subsampling schemes that maximize information about parameters. We present an application of this methodology to the classic problem of estimating the effect of a college degree on wages, showing that small, optimally chosen subsamples can be combined with ecological data to generate precise estimates relative to a simple random subsample.

Publication
Journal of the Royal Statistical Society, A