Egenis · Research

New Directions in Genomics: Philosophical Issues in Bioinformatics (2008-2011)

Sabina Leonelli

Email: s.leonelli@exeter.ac.uk


Within genomics, new technologies have appeared with the potential to revolutionize traditional means of data collection and dissemination. Sequencing techniques and microarray experiments are only two examples of methods allowing biologists to acquire data on a scale and at a speed never witnessed before. Bioinformatic tools such as online databases and bio-ontologies provide the means to store these masses of data and make them retrievable for future research. These tools help not only with data circulation but also with data analysis: contemporary bioinformatics has the ambitious goal of enabling the integration of available knowledge about organisms, thus fostering the elaboration of new research hypotheses to be tested and even new discoveries.

Bioinformaticians are so successful in pursuing these goals that many biologists decide which topic and organism to work on on the basis of the quality of the relevant databases. As the number of journals, institutes and training programmes devoted to bioinformatic resources testifies, research sponsors and biologists have seized upon these novel opportunities very rapidly, without, however, developing long-term strategies for managing their impact on research practices. Now that bioinformatics has matured and become a backbone for research at the bench, the time has come for a systematic assessment of what is going on. Systematic philosophical analysis allows the evaluation of how bioinformatics challenges traditional notions of what counts as theories, evidence and data in biology, thus ultimately shedding light on what bioinformaticians mean by ‘designing ontologies’ and ‘integrating knowledge’ through databases.


Subproject 1: Bio-ontologies

My starting point is a philosophical evaluation of the epistemic status of bio-ontologies - that is, the networks of terms used to classify data for dissemination through digital databases. Bio-ontologies aim to enable collaboration across research cultures. Because of this pragmatic motivation, bioinformaticians refer to terms used in bio-ontologies as de facto standards, to be adopted or rejected depending on how helpful they prove towards conducting research on the bench. The characterisation of bio-ontologies as standards does not, however, clarify the implications of choosing specific definitions (to the exclusion of others) to describe the phenomena at hand. Further, it masks their normative role in determining what counts as a research object and in retrieving what is known about it.

I examine the interpretive steps involved in the construction and use of bio-ontologies, and particularly the ways in which concepts are chosen and applied to classify, display and further experimental results (both at the level of gene products and at higher levels of biological organisation). I also compare the development and applications of bio-ontologies in the medical and the biological realm, focusing particularly on (1) how bio-ontologies are used to classify human and non-human data (as for instance in the case of ontologies used to classify data acquired through clinical trials, versus the Gene Ontology, used to classify data from well-established model organisms such as yeast, fruit-flies, thale cress and mice) and (2) how useful bio-ontologies are as taxonomies for Genome-Wide Association Studies (GWAS).

This study aims to provide additional criteria to assess the quality, reliability and future impact of bioinformatic resources, thus helping practitioners to use those resources with full awareness of the ontological and epistemic commitments involved in developing and using tools such as bio-ontologies. By comparing the bioinformatic handling of human and non-human data, the study also aims to identify factors that make the classification and sharing of human data different and more complex than the classification and sharing of data about model organisms.

Subproject 2: Data-driven Research

What does it mean for research to be based on empirical evidence? This question, one of the oldest within the philosophy of science, is being reformulated and reconsidered within contemporary biological and biomedical science. In these areas, technological innovation and shifting ideas about what counts as evidence have transformed current practices of data collection. In particular, the activity of data gathering appears to have acquired relative independence from other scientific activities such as hypothesis-testing, theorisation and explanation. Up to the second half of the 20th century, biological data were largely produced as evidence to support a specific experimental hypothesis. Thanks to high-throughput technologies such as sequencing and microarray analysis, the activity of data gathering has become increasingly automated and technology-driven, resulting in the production of billions of data-points in need of a biological interpretation. Evidence-based medicine has fostered a similar shift of attention towards data collection within biomedical research, by placing data obtained through clinical trials at the top of the hierarchy of evidence. Massive research efforts are being devoted to the dissemination of data, in the hope that they can be used to generate new insights. Several commentators have argued that the extraction of knowledge from automatically generated data may constitute a new approach to scientific method, described as ‘data-driven’.

This project examines the characteristics of data-driven research and its significance for future research from the perspectives of philosophical, historical and social studies of science. The aim of the project is to reach an understanding of how data collection and use affect the production of scientific knowledge, and of the role played by theory and hypotheses in this process. If data-driven research constitutes a distinctive mode of knowledge production, how can it be characterised, and how innovative is it with respect to existing or past scientific practices? What is the role of theoretical assumptions and hypotheses within research practices that are currently referred to as data-driven, and what are the relationships more generally between data-driven and hypothesis-driven research?

Project update

Results from this project are being further pursued within the framework of a broader project.


Leonelli, S. and Ankeny, R.A. (2012) Re-Thinking Organisms: The Epistemic Impact of Databases on Model Organism Biology. Studies in History and Philosophy of Biological and Biomedical Sciences.

Leonelli, S. (ed.) (2012) Special issue ‘Data-driven research in the biological and biomedical sciences’. Studies in History and Philosophy of Biological and Biomedical Sciences.

Leonelli, S. (2010) The Commodification of Knowledge Exchange: Governing the Circulation of Biological Data. In: Radder, H. (ed.) The Commodification of Academic Research: Science and the Modern University. University of Pittsburgh Press.

Leonelli, S. (2010) Documenting the Emergence of Bio-Ontologies: Or, Why Researching Bioinformatics Requires HPSSB. History and Philosophy of the Life Sciences, 32(1): 105-126.

Leonelli, S. (2009) On the Locality of Data and Claims About Phenomena. Philosophy of Science 76(2).

Leonelli, S. (2009) The Role of Bio-Ontologies In Data-Driven Research: A Philosophical Perspective. Proceedings of the International Conference for Biomedical Ontologies.

De Regt, H., Leonelli, S. and Eigner, K. (eds.) (2009) Scientific Understanding: Philosophical Perspectives. University of Pittsburgh Press.

Leonelli, S. (2009) The Impure Nature of Biological Knowledge. In: De Regt, H. et al. (eds.) Scientific Understanding: Philosophical Perspectives. University of Pittsburgh Press.

Leonelli, S. (2008) Bio-Ontologies as Tools for Integration in Biology. Biological Theory, 3(1).

Leonelli, S. (2008) Circulating Evidence Across Research Contexts: The Locality of Data and Claims in Model Organism Biology, LSE Working Papers on the Nature of Evidence: How Well Do ‘Facts’ Travel?, 25.