Evaluating Variance Estimators for Respondent-Driven Sampling

Abstract

Respondent-Driven Sampling (RDS) is a network-based method for sampling hard-to-reach populations that is widely used by public health agencies and researchers worldwide. Estimation of population characteristics from RDS data is challenging due to the unobserved population network, and multiple point and variance estimators have been proposed. Research evaluating these estimators has been limited and largely focused on point estimation; this analysis is the first evaluation of multiple variance estimators currently in use. We evaluated the performance of RDS variance estimators via simulations of RDS on synthetic networked populations constructed from 40 RDS surveys of injection drug users in the United States. In these simulations, average design effects (DEs) were lower and average 95% confidence interval (CI) coverage percentages were higher than suggested in previous work: typical DE range = 1–3; average 95% CI coverage = 93%. However, DE and CI coverage vary across the 40 sets of simulations, suggesting that the characteristics of a given study should be evaluated to assess estimator performance. We also found that simulation results are sensitive to whether sampling is conducted with replacement and the approach used to create CIs. We conclude that CI coverage rates and DEs are often acceptable but not perfect and that RDS estimates are usually reliable in scenarios where RDS assumptions are met. While RDS estimation performed reasonably well, we found strong evidence that the simple random sample variance estimator and corresponding CIs significantly underestimate variance and should not be used to analyze RDS data.
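
The abstract's two performance measures can be illustrated with a minimal sketch. It assumes the conventional definitions, which the abstract does not spell out: the design effect is the variance of an estimator across repeated simulated samples divided by the variance a simple random sample of the same size would give for a proportion, and CI coverage is the share of simulated intervals that contain the true population value. The function names and toy inputs below are hypothetical and are not taken from the study.

```python
from statistics import pvariance


def design_effect(estimates, true_p, n):
    """Empirical design effect for a proportion (assumed definition).

    estimates: point estimates from repeated simulated samples
    true_p:    known population proportion in the synthetic population
    n:         sample size of each simulated sample
    """
    srs_variance = true_p * (1 - true_p) / n  # variance under simple random sampling
    return pvariance(estimates) / srs_variance


def ci_coverage(intervals, true_p):
    """Fraction of (lower, upper) intervals that contain the true proportion."""
    covered = sum(1 for lower, upper in intervals if lower <= true_p <= upper)
    return covered / len(intervals)


# Toy inputs for illustration only; a real evaluation would use many simulated samples.
toy_estimates = [0.25, 0.35, 0.30, 0.30]
toy_intervals = [(0.20, 0.40), (0.31, 0.50), (0.10, 0.30), (0.05, 0.25)]

de = design_effect(toy_estimates, true_p=0.30, n=100)
coverage = ci_coverage(toy_intervals, true_p=0.30)
```

Under these definitions, a design effect above 1 means the sampling design yields more variable estimates than simple random sampling of the same size, and a nominal 95% interval is performing as intended when its coverage is close to 0.95.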

Publication
Journal of Survey Statistics and Methodology