Data flows in genomic and environmental science: Replication, durability and metrology
Ruth McNally, Adrian Mackenzie
Affiliated staff: Jennifer Tomomitsu, Allison Hui
Funded by: e-Science Institute
In the life sciences, next generation sequencing (NGS) and environmental networked sensors (ENS) epitomise very different data ‘topographies’. NGS and ENS not only designate different sources of data (sequencers, sensors), but also send very different flows of data across platforms, instruments, repositories, centres, applications and publications. NGS and ENS also delineate the interface between application domain experts (genomic scientists, environmental scientists and other scientific experts) and technical domain experts (computer scientists, software and electronic engineers) differently. This project explores problems of replication, durability, and metrology in NGS and ENS data flows and identifies emerging devices and collaborative practices to enable e-science work in these areas.
Firstly, this project will contribute to the challenge of developing a richly described sociology of data (as identified by A. Szalay, Johns Hopkins University) that takes into account problems of replicability, durability and metrology, and offers an empirically grounded typology of the so-called ‘data deluge’. In doing this, it will develop an awareness of the problems, obstacles, friction points and gaps that hinder the transformation or reshaping of data flows in data intensive sciences. Secondly, it will develop an awareness of alternative ways of thinking about data flows in genomics and environmental sciences. Thirdly, it will identify practices and devices in the conduct of e-science that sustain collaborative development. Finally, it will develop an alternative socio-technical model that opens up new avenues for interdisciplinary collaboration with high throughput data flows.
Methods
A series of three workshops has provided opportunities to collaboratively explore data flows within NGS and ENS. The first workshop focused on next generation sequencing for genomics and the second on environmental networked sensors, involving participants from government agencies, international academic research groups, and specialist software and informatics companies. Both workshops provided opportunities for domain and technical scientists to present how their work addresses data flows. We also designed and constructed a visual-based method to engage participants around common topics of interest. The final workshop explores re-presenting descriptions of the data flows in these two fields to a new group of domain scientists, in order to explore and validate images of the future challenges and promises of data flows.
Mackenzie, A., McNally, R., Tomomitsu, J. and Hui, A. 2011. Understanding the 'intensive' in 'data intensive': Data flows in Next Generation Sequencing and Environmental Networked Sensors. International Journal of Digital Curation. Under review.
Mackenzie, A., McNally, R., Tomomitsu, J. and Hui, A. 2011. Data flows in genomic and environmental science. Final Report on mini-theme at e-Science Institute.
Further information, including all of the workshop presentations, is available at the mini-theme Wiki on the website of the e-Science Institute. From this Wiki it is also possible to access a recording of the public lecture on this project, delivered by McNally and Mackenzie at the e-Science Institute, Edinburgh, on Tuesday 28th June 2011.