**Title**: Confounding by Network Structure and the Replication Crisis

**Abstract**:

Researchers across the health and social sciences generally assume that observations are independent, even while relying on convenience samples that draw subjects from one or a small number of communities, schools, hospitals, or similar settings. A paradigmatic example is the Framingham Heart Study (FHS). Many limitations of such convenience samples are well known, but statistical dependence due to social network ties has not previously been addressed. Furthermore, although it is well known in human genetics that the covariance structure among subjects can produce spurious and biased association estimates (specifically through cryptic relatedness and population stratification), this kind of structural confounding has not received attention outside that community. We introduce the concept of confounding by network structure and show that, in addition to causing anticonservative variance estimates, network dependence can bias associations away from the null. Using a statistical test that we adapted from one developed for spatial autocorrelation, we test for network dependence and for possible confounding by network structure in a selection of the thousands of influential papers published using FHS data. The results suggest that some findings from decades of research on coronary heart disease, other health outcomes, and peer influence using FHS data could be biased and anticonservative because of unacknowledged network dependence. We conclude with a brief discussion of how to recover valid statistical and causal inference with social network data.
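The abstract does not name the spatial autocorrelation test that was adapted. A classic candidate is Moran's I, in which the spatial weights matrix can be replaced by a network adjacency matrix. The Python sketch below assumes that substitution and uses a permutation reference distribution; it illustrates the general idea of testing for network dependence in a single variable and is not necessarily the authors' adapted test. The function names `morans_i`, `permutation_test`, and `adjacency_from_edges` are hypothetical, and the sketch does not test for confounding by network structure.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers: Moran's I computed on a network adjacency matrix is an
# assumed stand-in for the adapted test, not the talk's actual method.


def adjacency_from_edges(n: int, edges: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Build a symmetric 0/1 adjacency matrix from undirected ties."""
    adj = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return adj


def morans_i(x: Sequence[float], adj: Sequence[Sequence[float]]) -> float:
    """Moran's I for values x, with adj[i][j] the tie weight between i and j.

    The diagonal of adj is assumed to be zero (no self-ties).
    """
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    total_weight = sum(sum(row) for row in adj)
    if total_weight == 0:
        raise ValueError("network has no ties")
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("x is constant; Moran's I is undefined")
    # Weighted cross-products of deviations between tied subjects.
    num = sum(
        adj[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
        if adj[i][j] != 0
    )
    return (n / total_weight) * (num / denom)


def permutation_test(
    x: Sequence[float],
    adj: Sequence[Sequence[float]],
    n_perm: int = 999,
    seed: int = 0,
) -> Tuple[float, float]:
    """One-sided permutation test for positive network autocorrelation.

    The network stays fixed while values are shuffled across nodes, so the
    reference distribution reflects no association with network position.
    """
    rng = random.Random(seed)
    observed = morans_i(x, adj)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, adj) >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Two disconnected friendship triangles whose members share an outcome.
    adj = adjacency_from_edges(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)])
    i_obs, p = permutation_test([1, 1, 1, 0, 0, 0], adj)
    print(i_obs)  # 1.0
    print(p)
```

Under the permutation null, Moran's I has expectation -1/(n-1), so values well above that indicate that tied subjects resemble each other more than chance would predict. Dependence of this kind, when it is present in both exposures and outcomes, is what can make standard errors computed under an independence assumption anticonservative.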

I am an Assistant Professor in the Department of Biostatistics at Johns Hopkins University.

My research is in causal inference and epidemiologic methods. Broadly, I am interested in developing methods for, and describing the behavior of, traditional statistical machinery when standard assumptions are not met. I have worked on characterizing the bias that results from misclassification, i.e., violations of the assumption that variables were measured accurately. I have also worked on semiparametric estimation of instrumental variables models, because these models are useful for addressing certain violations of “no unmeasured confounding” assumptions.

Currently, a major focus of my work is the analysis of social and other network data. I am working on methods for statistical inference when observations are dependent, with a dependence structure informed by network topology rather than Euclidean topology. I am also working on how to identify causal effects when treatments exhibit interference (that is, when one subject’s treatment may affect other subjects’ outcomes) and outcomes exhibit contagion.

I am a member of the causal inference working group.