Generalised Linear Models Incorporating Population Level Information: An Empirical Likelihood Based Approach

Abstract

Regression coefficients specify the partial effect of a regressor on the dependent variable. Sometimes the bivariate, or limited multivariate, relationship of that regressor with the dependent variable is known from population-level data. We show here that such population-level data can be used to reduce the variance and bias of estimates of those regression coefficients obtained from sample survey data. These improvements are achieved by constrained maximum likelihood estimation (MLE), whose statistical properties are described first. The method constrains the weighted sum of all the covariate-specific associations (partial effects) of the regressors on the dependent variable to equal the overall population association of one or more regressors. We refer to those regressors whose bivariate or limited multivariate relationships with the dependent variable are constrained by population data as being “directly constrained.” Our study investigates the improvements in the estimation of directly constrained variables, and also the improvements in the estimation of other regressor variables that may be correlated with the directly constrained variables and are thus “indirectly constrained” by the population data. The example application is to the marital fertility of black versus white women. The difference between white and black women’s rates of marital fertility, available from population-level data, gives the overall association of race with fertility. The constrained MLE that uses this information both provides a far more powerful statistical test of the partial effect of being black and purges the test of a bias that would otherwise distort the estimated magnitude of this effect. We find only trivial reductions, however, in the standard errors of the parameters for indirectly constrained regressors.
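The constraint described above can be made concrete with a small numerical sketch. The code below is an illustration only, not the authors' implementation: it assumes a logistic model, takes the constraint to be "the mean fitted probability within a group equals a known population rate", and solves the problem with SciPy's general-purpose SLSQP optimiser rather than an empirical likelihood procedure. The function name `fit_constrained_logit`, the synthetic data, and the value of `pop_rate` are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_constrained_logit(X, y, group, pop_rate):
    """Logistic MLE subject to an equality constraint (illustrative assumption):
    the mean fitted probability among rows where `group` is True equals `pop_rate`."""

    def mean_neg_loglik(beta):
        eta = X @ beta
        # log(1 + exp(eta)) evaluated stably via logaddexp
        return np.mean(np.logaddexp(0.0, eta) - y * eta)

    def constraint(beta):
        p = 1.0 / (1.0 + np.exp(-(X[group] @ beta)))
        return p.mean() - pop_rate  # SLSQP drives this to zero

    beta0 = np.zeros(X.shape[1])
    result = minimize(
        mean_neg_loglik,
        beta0,
        method="SLSQP",
        constraints=[{"type": "eq", "fun": constraint}],
        options={"ftol": 1e-9, "maxiter": 500},
    )
    return result.x


# Synthetic survey sample (all values hypothetical).
rng = np.random.default_rng(0)
n = 2000
g = rng.random(n) < 0.3                      # directly constrained regressor
z = rng.normal(size=n) + 0.5 * g             # regressor correlated with g
X = np.column_stack([np.ones(n), g.astype(float), z])
true_beta = np.array([-1.0, 0.8, 0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

pop_rate = 0.45                              # assumed known population rate for the group
beta_hat = fit_constrained_logit(X, y, g, pop_rate)
fitted_group_rate = (1.0 / (1.0 + np.exp(-(X[g] @ beta_hat)))).mean()
```

In this sketch `g` plays the role of a directly constrained regressor, while `z` is only indirectly constrained through its correlation with `g`. After fitting, `fitted_group_rate` should match `pop_rate` up to the optimiser's tolerance.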

Publication
Center for Statistics and Social Sciences