Genetic Information and the Data Avalanche in Biology
The human genome comprises 6 billion pairs of its 4 constituent units (“bases”) of DNA. It programs the development of an exquisitely sculpted organism with 100 trillion cells arranged into a myriad of organs, muscles and bones, as well as a brain with 100 billion neurons, each with an average of 10,000 connections.
It is 57 years since Watson and Crick solved the structure of DNA. The field gathered speed in the mid-1970s with the introduction of DNA cloning and sequencing, which enabled decoding of the information, much of which, surprisingly, lay outside conventional genes and was assumed (incorrectly) to be junk. These technologies became progressively more sophisticated over the ensuing years, to the point that by 1990 it was feasible to consider decoding the entire genome. The first draft sequence was completed ten years later. It involved thousands of DNA sequencing machines and cost several billion dollars. It was a tour de force, whose attempt, let alone achievement, was inconceivable just 20 years earlier.
However, this was just the beginning, and the pace of change since has been dizzying. Over the past few years a beautiful intersection of nanotechnologies, optical technologies and DNA technologies has revolutionized DNA sequencing. The volume of data has exploded, and the cost is dropping like a stone. The latest generation of automated machines can generate 200 billion bases (“gigabases”) of DNA sequence per run (over 60 human genome equivalents) in just a few days.
The cost of sequencing a human genome is now less than $10,000. The amount of DNA sequence being deposited in databases is growing 5- to 10-fold per annum, which makes Moore’s Law of computing look lethargic. New technologies are coming, such as reading sequence by electrical charge disturbance as molecules pass through nanopores, and it is expected that sequencing costs will fall much further in the near future.
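As a rough illustration of these comparisons, the sketch below works through the arithmetic. Only the 200-gigabase run output and the 5- to 10-fold annual growth come from the text above; the haploid genome size of ~3.1 gigabases and the 18-month Moore's Law doubling period are my assumptions for the sketch.

```python
# Illustrative arithmetic for the claims above. Assumed figures (not from
# the article): a haploid human reference genome of ~3.1 gigabases, and
# Moore's Law approximated as a doubling every 18 months.

HUMAN_GENOME_GBASES = 3.1  # haploid reference genome, in gigabases (assumed)
RUN_OUTPUT_GBASES = 200.0  # sequencer output per run, from the text

print(f"Human genome equivalents per run: "
      f"~{RUN_OUTPUT_GBASES / HUMAN_GENOME_GBASES:.0f}")

def cumulative_growth(annual_factor: float, years: int) -> float:
    """Fold increase after `years` of compounding at a fixed annual factor."""
    return annual_factor ** years

YEARS = 5
moore = cumulative_growth(2 ** (12 / 18), YEARS)  # 18-month doubling time
print(f"Over {YEARS} years, Moore's Law yields ~{moore:.0f}x growth,")
print(f"versus {cumulative_growth(5, YEARS):,.0f}x at 5-fold/yr "
      f"and {cumulative_growth(10, YEARS):,.0f}x at 10-fold/yr.")
```

On these assumptions, even the conservative 5-fold annual growth outpaces Moore's Law by more than two orders of magnitude over five years, which is the sense in which Moore's Law looks "lethargic" by comparison.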
This information avalanche is transforming our understanding of biology. Sooner rather than later, every genome of scientific or practical interest will be sequenced. Individual genome sequencing will soon become standard in medicine. This will be expanded by data concerning the epigenome (contextual chemical changes to DNA and the proteins around which it is wrapped) and the transcriptome (the repertoire of RNAs produced from the genome), both of which are far more complex than the genome itself.
This is creating huge challenges for present and future global collaboration, and a dependency on network connectivity between researchers, clinicians and HPC facilities well beyond that which is available today. These challenges include a massive increase in data storage, which will soon rise to exabytes, and then zettabytes, as well as requirements for data repositories that are scalable in both disk capacity and I/O performance. There will be equal challenges (and opportunities) associated with the development and integration of new software and visualization tools to store and interrogate this data deluge: the “Fourth Paradigm” enunciated by Jim Gray. More exciting is the emerging realization that evolution discovered the power of advanced information systems well before we did: the human genome is the most sophisticated zip file of hardware and software specification yet known, with many lessons to be learnt for both biology and IT.
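To make the storage trajectory concrete, here is a back-of-envelope sketch. The per-genome figure of ~100 GB of raw sequence data (reads plus quality scores at roughly 30x coverage) is an assumption for illustration, not a figure from the text, as are the population sizes.

```python
# Back-of-envelope storage estimate for routine individual genome sequencing.
# The ~100 GB raw-data footprint per genome is an assumption for this sketch,
# not a figure from the article.

GB = 10 ** 9   # gigabyte
EB = 10 ** 18  # exabyte

RAW_BYTES_PER_GENOME = 100 * GB  # assumed raw footprint per individual

for people in (1_000_000, 100_000_000, 7_000_000_000):
    total_bytes = people * RAW_BYTES_PER_GENOME
    print(f"{people:>13,} genomes -> {total_bytes / EB:>7.2f} EB")
```

On these assumptions, a million genomes already approaches an exabyte-scale archive, and population-scale sequencing nears a zettabyte, before the even larger epigenomic and transcriptomic datasets are counted.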