Big Data

For Better Hearts


Interview with Amitava Banerjee

Dr Amitava Banerjee 

Associate Professor in Clinical Data Science and Honorary Consultant Cardiologist

University College London Hospitals and Barts Health NHS Trusts






Interview with Amitava Banerjee about the award-winning publication: Banerjee A et al. Estimating excess 1-year mortality from COVID-19 according to underlying conditions and age in England: a rapid analysis using NHS health records in 3.8 million adults. Lancet (12 May 2020). The paper won the Impact of the Year award from Health Data Research UK, the UK National Institute for Health Data Science.

Background: Amitava Banerjee is Associate Professor in Clinical Data Science at University College London, and Honorary Consultant Cardiologist at University College London Hospitals and Barts Health NHS Trusts. Ami is a researcher, educator and clinician with interests spanning data science, cardiovascular disease, global health, training and evidence-based healthcare. In the BigData@Heart project, Ami is the co-lead for Work Package 2, “Disease understanding and outcomes definition”.

Ami qualified at Oxford Medical School and trained in Oxford, Newcastle, Hull, and London. He has a master’s in public health from Harvard, completed an internship at the World Health Organisation, and holds a DPhil in epidemiology from Oxford. He works across two busy tertiary care settings with both inpatient and outpatient commitments, and subspecialises in heart failure. His clinical work very much informs his research and vice versa, whether in the evaluation of medical technology or the ethics of large-scale use of patient data.

How are you holding up during the pandemic?

I am OK but very busy. I have increased my clinical work since the outbreak while still trying to maintain our research output. In addition, both my wife and I were infected, but fortunately we both had a mild clinical course. So, life remains ever changing and exciting.

Can you tell us about your paper, and why it was awarded? 

At a very early phase in the pandemic (mid-March 2020), we were already seeing that most COVID-19-related deaths were occurring in older people with underlying medical conditions. So, our team (including Professor Harry Hemingway, Professor Spiros Denaxas and several UCL colleagues involved in the BigData@Heart consortium) knew it would be important to quantify the impact of age and comorbidities on risk of infection to help inform the UK government’s efforts to stop the spread of the disease.

At the time we did the analysis, the UK had reported only 1,950 cases and 81 deaths from coronavirus (COVID-19) infection, and only people with single disease risks were identified as high-risk individuals. We were convinced that background risk of mortality was a major driver of pandemic deaths and that electronic health records could help us to demonstrate the importance of targeted measures to dampen the spread of the pandemic.

So, we used NHS health records from 3.8 million adults to estimate the excess number of deaths over 1 year under different COVID-19 incidence rates and differing mortality impacts. We estimated both the short- and long-term impact of the COVID-19 emergency on overall population mortality.

We predicted that a mitigation strategy aimed at slowing the spread of coronavirus would still result in at least 35,000 to 70,000 excess deaths over one year in the UK if the population infection rate reached 10%. Our study therefore demonstrated the need for more stringent measures at a population level, as well as efforts to target those at highest risk with a range of preventive interventions, to avoid not just immediate deaths but also long-term excess deaths.

What is excess mortality and why is it important?

Excess mortality is defined as actual deaths from all causes minus the background level of deaths. This approach attempts to avoid the miscounting that comes from under-reporting of COVID-19-related deaths, and it also captures deaths from other health conditions left untreated. We used it to analyse the potential social and economic consequences of the pandemic and the putative impact of different intervention strategies.
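As a minimal illustration of the definition above, excess mortality is a simple subtraction. The function name and the numbers in the example are hypothetical, chosen only to show the arithmetic; they are not figures from the study.

```python
def excess_deaths(observed_all_cause: float, expected_background: float) -> float:
    """Excess deaths = observed all-cause deaths minus the expected background level."""
    return observed_all_cause - expected_background


# Hypothetical illustration only: 12,000 deaths observed where 10,000 were expected.
print(excess_deaths(12_000, 10_000))
```

Because the count uses deaths from all causes, it does not depend on whether an individual death was correctly attributed to COVID-19.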

Can you explain how a big data approach was important for your work? 

The thread running through my research career to date is the application of routine data to improve knowledge and care of cardiovascular disease (CVD) through epidemiology, informatics and global health approaches. This special interest prepared me to access available big data resources for this project.

Given the urgency of the situation, we did not wait for perfect data but used what was available. We were also able to complete the analysis within three days. We used population-based linked primary and secondary care electronic health records in England (HDR UK - CALIBER) to report the prevalence of underlying conditions, as defined by Public Health England COVID-19 guidelines, in 3,862,012 individuals aged ≥30 years from 1997 to 2017. We then used previously validated phenotypes to estimate the 1-year mortality in each condition, and developed simple models of excess COVID-19-related deaths assuming a relative risk (RR) for the impact of the pandemic (compared to background mortality) of 1.2, 1.5 and 2.0.


We found that 20.0% of the population were at risk: 13.7% were aged >70 years and 6.3% were aged ≤70 years with ≥1 underlying condition (cardiovascular disease, diabetes, steroid therapy, severe obesity, chronic kidney disease and chronic obstructive pulmonary disease). Multimorbidity was common (10.1%). The 1-year mortality in the at-risk population was 4.46%. Age and underlying conditions combined to influence background risk, which varied markedly across groups (5.9% in those aged >70 years, 8.6% for COPD and 13.1% in those with ≥3 conditions).

We then calculated that with an infection rate of 0.001% of the UK population (suppression scenario) there would be minimal excess deaths (3 and 7 excess deaths at relative risk, RR, 1.5 and 2.0 respectively), but with an infection rate of 10% (mitigation scenario) the model's estimated excess deaths increased to 13,791, 34,479 and 68,957 (at RR 1.2, 1.5 and 2.0 respectively). We also developed an online, public, prototype risk calculator for excess death estimation. Thus, we provided the public, researchers, and policy makers with a simple model to estimate the excess mortality over 1 year from COVID-19, based on underlying conditions at different ages.
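The figures above are consistent with a model in which excess deaths scale linearly with the infection rate and with (RR − 1). The sketch below assumes that form; the function name `scale_excess_deaths` is hypothetical, and the calculation only rescales the reported mitigation estimate at RR 2.0 rather than reproducing the study's underlying analysis.

```python
def scale_excess_deaths(reference_excess, ref_rate, ref_rr, rate, rr):
    """Rescale a reference estimate, assuming (as a simplification) that
    excess deaths are proportional to the infection rate and to (RR - 1)."""
    return reference_excess * (rate / ref_rate) * ((rr - 1.0) / (ref_rr - 1.0))


# Reference point taken from the text:
# 68,957 excess deaths at a 10% infection rate and RR 2.0.
REF = dict(reference_excess=68957, ref_rate=0.10, ref_rr=2.0)

for rate in (0.10, 0.00001):  # 10% (mitigation) and 0.001% (suppression)
    for rr in (1.2, 1.5, 2.0):
        estimate = scale_excess_deaths(rate=rate, rr=rr, **REF)
        print(f"infection rate {rate:.3%}, RR {rr}: {estimate:.1f} excess deaths")
```

Under this assumption, the rescaled mitigation values fall within one death of the reported 13,791 and 34,479, and the suppression values round to the reported 3 and 7.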

In hindsight, how accurate were your estimates? 

As of 23 June, there were at least 42,927 COVID-19 excess deaths in the UK, 95% of which occurred in people over the age of 70 years or with comorbidities. Therefore, our model was in the right ballpark. It also remains very clear that suppressing infection rates is the most important means to reduce excess deaths. I think this analysis is a good example of why it is so important to work towards harmonized phenotype definitions and to prepare the data for similar analyses before they are needed (as we are doing in BigData@Heart), so we can respond quickly and correctly. In addition, researchers, clinicians and health services need nationwide access to NHS data, so we can continue to learn how to improve patient care during our daily work, and especially during health emergencies like the COVID-19 pandemic we are currently experiencing.

Published on: 07/30/2020