Metagenomics and High Performance Computing
It has become a cliché to say that the biological sciences have become information sciences. Vastly increased volumes of experimentally acquired genomic and proteomic data hint at rich new insights in many areas of the biological sciences, but the demands that their analysis places on computing are just as great. This is one of the many reasons why scientists from the more traditional areas of high performance computing have been attracted into biology. However, the character of this computing has changed: it has moved away from simulation, on which much of our high performance computing expertise is based.
This talk describes my move from
traditional computational simulation to high performance bioinformatics. The
motivation comes from global climate modeling and the very large
contribution that the microbial biology of the ocean makes to the carbon
dioxide budget in ocean models. Current microbial models incorporated into
ocean models presume knowledge of the organisms present and their metabolism.
In reality, recent "metagenomic" ocean surveys have shown that most
organisms are not known or understood, nor do we know about their spatial and
temporal distribution. So, how would we use this new information to evaluate
the performance of current models or build new ones?
Metagenomics is the study of microbial
communities in situ. Over 99% of microbes in the ocean cannot be studied in the
lab, because they do not survive when separated from the symbiotic community
they live in. Their genomes must be acquired together and teased apart with new
computational algorithms. I will discuss work on sequence-based and
similarity-based algorithms that categorize the mixed fragments of DNA for assembly into
complete genomes. Comparison of these genome fragments and complete genomes can
be performed through multiple alignment algorithms. Both of these algorithmic
tasks are now overwhelming our high performance computing capability and point
the way to fertile new fields for algorithm developers.
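
As a rough illustration of what sequence-based categorization of mixed DNA fragments can look like, the sketch below assigns each fragment to the reference whose k-mer composition it most resembles. This is an assumption made for illustration, not the method presented in the talk: the function names (kmer_profile, cosine, assign_bins), the choice of cosine similarity, and the default k are all hypothetical, and real metagenomic binning tools are considerably more sophisticated.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies over all 4**k DNA k-mers.

    k-mers containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_bins(fragments, references, k=4):
    """Map each fragment id to the name of the most compositionally similar reference.

    fragments and references are dicts of {name: DNA string}.
    """
    ref_profiles = {name: kmer_profile(s, k) for name, s in references.items()}
    bins = {}
    for frag_id, seq in fragments.items():
        profile = kmer_profile(seq, k)
        bins[frag_id] = max(
            ref_profiles, key=lambda name: cosine(profile, ref_profiles[name])
        )
    return bins
```

Calling assign_bins with a dictionary of fragments and a dictionary of reference sequences returns one label per fragment. Even this toy version shows why the task strains computing resources: every fragment is compared against every reference, and each profile has 4**k entries.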