Big Data

For Better Hearts



An interview with Angela Wood on statistical methodology


An interview with Angela Wood, Senior Lecturer in Biostatistics at the University of Cambridge, WP5 co-lead and Case Study 5 lead of BigData@Heart.


1) Could you please briefly introduce yourself and your work?

I am a Senior Lecturer in Biostatistics at the University of Cambridge, where I am a senior member of both the MRC/BHF Cardiovascular Epidemiology Unit and the NIHR Blood and Transplant Research Unit in Donor Health and Genomics.

My research interests are centered on the development and application of statistical methods for advancing epidemiological research. I have focused on developing statistical methodology for handling measurement error, using repeated measures of risk factors, missing data problems, multiple imputation, risk prediction and meta-analysis for observational studies. I have developed statistical methods and led analyses for major population resources to advance the study of cardiovascular disease, including analyses of the 2.5 million-participant Emerging Risk Factors Collaboration and EPIC-CVD (the world’s largest genomic case-cohort study of incident CVD).

2) Has your work on genomics changed clinical practice or is that still on the horizon?

Genomics is a relatively new research area for me. My interest arises from the emerging methodology to identify causal risk factors and potential drug targets from sets of high-dimensional and correlated phenotypes.

3) Could you elaborate on mendelian randomization and its role in BigData@Heart? How has this method changed research approaches? What are the shortcomings/potential pitfalls?

To address important medical questions, such as determining the causes of acute coronary syndrome (ACS), atrial fibrillation (AF) and heart failure (HF), or assessing the impact of a drug target, we have to answer questions of cause and effect. The optimal way to address these questions is through appropriate study design, such as the use of prospective randomized trials. However, randomized trials are expensive and time-consuming, especially when the outcome is rare or requires a long follow-up period to be observed. In addition, we are often interested in many risk factors that cannot all be randomly allocated, for practical or ethical reasons.

Classical epidemiology has focused on addressing scientific hypotheses using observational data. Instead of intervening on the risk factor, disease outcomes of individuals with different levels of the risk factor are compared. However, interpreting these associations as a causal relationship relies on untestable and usually implausible assumptions, such as the absence of unmeasured confounding and of reverse causation. This has led to several high-profile cases where a risk factor has been widely promoted as an important factor in disease prevention based on observational data, only to be later discredited when evidence from randomized trials did not support a causal interpretation.

Mendelian randomization (MR) is an alternative approach to assessing causality. Genetic variants are used as instrumental variables for risk factors to make inferences about causal effects based on observational data. It can be a reliable way of assessing the causal nature of many risk factors for a range of disease outcomes, and it avoids problems with potential confounding and reverse causality. Like all statistical methods, MR critically relies on a number of assumptions, such as the absence of any association between the instrument (genetic variants) and the outcome (ACS, AF or HF) other than that which operates through the risk factor of interest. In many MR studies, the prime suspect for violation of this assumption is the genetic variants acting on other risk factors (called pleiotropy), although there are now extended MR approaches which can deal with such problems.
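To make the idea concrete, here is a minimal simulated sketch of the simplest MR estimator, the Wald ratio (the gene-outcome association divided by the gene-exposure association). All data, variable names and effect sizes below are invented purely for illustration and are not drawn from BigData@Heart analyses:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Simulated setting: variant g raises risk factor x; an unmeasured
# confounder u affects both x and outcome y; x itself has NO causal
# effect on y (true causal effect = 0).
g = rng.integers(0, 3, size=n).astype(float)  # genotype coded 0/1/2 alleles
u = rng.normal(size=n)                        # unmeasured confounder
x = 0.5 * g + u + rng.normal(size=n)          # risk factor (exposure)
y = u + rng.normal(size=n)                    # outcome

def ols_slope(a, b):
    """Slope from a simple linear regression of b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

# Naive observational estimate: biased away from zero by confounding
naive = ols_slope(x, y)

# Wald ratio: gene-outcome slope / gene-exposure slope, close to the
# true causal effect of zero because g is independent of u
wald = ols_slope(g, y) / ols_slope(g, x)

print(f"naive estimate {naive:.3f}, MR Wald ratio {wald:.3f}")
```

The naive regression of outcome on exposure picks up the confounder's contribution, while the instrumented estimate does not, which is the core appeal of MR for observational data.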

Perhaps the greatest pitfall of MR currently is its over-use and the subsequent over-interpretation of results. Genetic tool variants for thousands of risk factors and diseases are becoming increasingly available, and it is "easy" to perform high-throughput MR analyses with unreliable tools.

4) One of the broad objectives of the project is to improve disease definitions and to further sub-phenotype CVDs into molecularly well-defined and well-characterised groups, as well as to discover and validate drug targets for ACS, AF and HF. Given the differences between ACS, AF and HF, how tailored do the approaches need to be? Or is there a "one size fits all" approach?

The backbone of the analysis strategies for ACS, AF and HF will actually be quite similar; however, the details of the analyses will become quite tailored! For example, current treatments for HF are fairly heterogeneous depending on classification and symptoms, whereas treatments for AF focus mainly on restoring a normal heart rhythm. Approaches will also depend on the availability of samples/data/consortia with fatal/non-fatal incident cases.

5) The creation of genetic scores can often look like a black box to people unfamiliar with the methods. Could you describe how you go about weighing the impact of genes and the design of a genetic score comprised of several genes?

Quite simply, a genetic score for a risk factor, say X, can be considered as the linear predictor (beta_hat*G) of the risk factor based on coefficients (beta) estimated from a regression model of X on the genes, G (e.g., in a linear model for a continuous variable, X = beta*G + error). The weighting of the impact of genes relates to the estimated regression coefficients from the regression model. Designing a genetic score is analogous to selecting variables in a regression model.
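This construction can be sketched in a few lines on simulated data. The sample size, number of variants and per-allele effect sizes below are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 1000 individuals, 5 genetic variants coded 0/1/2
n, p = 1000, 5
G = rng.integers(0, 3, size=(n, p)).astype(float)

# Hypothetical true per-allele effects on a continuous risk factor X
beta_true = np.array([0.30, 0.15, -0.10, 0.05, 0.20])
X = G @ beta_true + rng.normal(scale=1.0, size=n)

# Estimate the weights: regress X on G (with an intercept) by least squares
G1 = np.column_stack([np.ones(n), G])
beta_hat = np.linalg.lstsq(G1, X, rcond=None)[0]

# The genetic score is the linear predictor beta_hat * G
score = G1 @ beta_hat

print("estimated weights:", np.round(beta_hat[1:], 2))
```

In practice the weights usually come from a large published genome-wide association study rather than being re-estimated in the analysis sample, but the score itself is exactly this weighted sum of allele counts.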

6) Genetic data is being enriched with proteomics, lipidomics and metabolomics. Similarly to constructing a genetic tool with several genes, do you think it will be feasible and potentially even more impactful to bring together the multi-layers of omics in genetic analyses?

Scientists have been moving forward from using one gene to multiple genes, one protein to multiple proteins and now from one omics platform to multiple platforms. This is an exciting scientific era for etiological understanding and for more general methodology development. Findings are beginning to emerge showing specific networks and biological pathways from gene to protein to metabolomics to disease, helping us understand disease mechanisms and identify potential new drug targets (as well as help deprioritize drug targets or identify possible side effects).  

7) Case study 5 focuses on two very different approaches: investigating existing or novel drug targets related to cardiovascular disease, and the impact of SNPs on iron deficiency in CHF. Where do you see the most potential for these two approaches to complement each other?

Addressing these questions together may help us identify genetic markers that predispose individuals to the development of iron deficiency and/or CHF, or that predict the improvement in cardiac function and clinical outcomes following iron treatment.

Published on: 06/25/2018