Big Data for Better Hearts


BigData@Heart latest publications

Desiderata for the development of next-generation electronic health record phenotype libraries

Published: 11 September 2021, GigaScience

Authors: Martin Chapman, Shahzad Mumtaz, Luke V Rasmussen, Andreas Karwath, Georgios V Gkoutos, Chuang Gao, Dan Thayer, Jennifer A Pacheco, Helen Parkinson, Rachel L Richesson, Emily Jefferson, Spiros Denaxas, Vasa Curcin.



High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling.


A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices.


We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing.


There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.




The social licence for data-intensive health research: towards co-creation, public value and trust 

Published: 10 August 2021, BMC Medical Ethics

Authors: Sam H. A. Muller, Shona Kalkman, Ghislaine J. M. W. van Thiel, Menno Mostert, Johannes J. M. van Delden.



Background

The rise of Big Data-driven health research challenges the assumed contribution of medical research to the public good, raising questions about whether the status of such research as a common good should be taken for granted, and how public trust can be preserved. Scandals arising out of sharing data during medical research have pointed out that going beyond the requirements of law may be necessary for sustaining trust in data-intensive health research. We propose building upon the use of a social licence for achieving such ethical governance.

Main text

We performed a narrative review of the social licence as presented in the biomedical literature. We used a systematic search and selection process, followed by a critical conceptual analysis. The systematic search resulted in nine publications. Our conceptual analysis aims to clarify how societal permission can be granted to health research projects which rely upon the reuse and/or linkage of health data. These activities may be morally demanding. For these types of activities, a moral legitimation, beyond the limits of law, may need to be sought in order to preserve trust. Our analysis indicates that a social licence encourages us to recognise a broad range of stakeholder interests and perspectives in data-intensive health research. This is especially true for patients contributing data. Incorporating such a practice paves the way towards an ethical governance, based upon trust. Public engagement that involves patients from the start is called for to strengthen this social licence.


Conclusions

There are several merits to using the concept of social licence as a guideline for ethical governance. Firstly, it fits the novel scale of data-related risks; secondly, it focuses attention on trustworthiness; and finally, it offers co-creation as a way forward. Greater trust can be achieved in the governance of data-intensive health research by highlighting strategic dialogue with both patients contributing the data, and the public in general. This should ultimately contribute to a more ethical practice of governance.


Published on: 11/05/2021