Respondent-Driven Sampling: An Assessment of Current Methodology

Abstract

Respondent-driven sampling (RDS) employs a variant of a link-tracing network sampling strategy to collect data from hard-to-reach populations. By tracing the links in the underlying social network, the process exploits the social structure to expand the sample and reduce its dependence on the initial (convenience) sample.

The current estimators of population averages make strong assumptions in order to treat the data as a probability sample. We evaluate three critical sensitivities of the estimators: (1) to bias induced by the initial sample, (2) to uncontrollable features of respondent behavior, and (3) to the without-replacement structure of sampling.

Our analysis indicates: (1) that the convenience sample of seeds can induce bias, and the number of sample waves typically used in RDS is likely insufficient for the type of nodal mixing required to obtain the reputed asymptotic unbiasedness; (2) that preferential referral behavior by respondents leads to bias; (3) that when a substantial fraction of the target population is sampled the current estimators can have substantial bias.

This paper sounds a cautionary note for the users of RDS. While current RDS methodology is powerful and clever, the favorable statistical properties claimed for the current estimates are shown to be heavily dependent on often unrealistic assumptions. We recommend ways to improve the methodology.

Publication
Sociological Methodology