Metagenomics and High Performance Computing

It has become a cliché to say that the biological sciences have become information sciences. Vastly increased volumes of experimentally acquired genomic and proteomic data hint at rich new insights in many areas of biology, but the computing demands of analyzing them are just as great. This is one of the many reasons why scientists from the more traditional areas of high performance computing have been attracted to biology. However, the character of this computing has changed: it has moved away from simulation, on which much of our high performance computing expertise is based.

This talk discusses my journey from traditional computational simulation into high performance bioinformatics. The motivation comes from global climate modeling and the very large contribution that the microbial biology of the ocean makes to the carbon dioxide budget in ocean models. Current microbial models incorporated into ocean models presume knowledge of the organisms present and their metabolism. In reality, recent "metagenomic" ocean surveys have shown that most organisms are neither known nor understood, and that we do not know their spatial and temporal distribution. So how can we use this new information to evaluate the performance of current models or to build new ones?

Metagenomics is the study of microbial communities in situ. Over 99% of microbes in the ocean cannot be studied in the lab, because they cannot survive once separated from the symbiosis of their community. Their genomes must therefore be acquired together and teased apart with new computational algorithms. I will discuss work on sequence-based and similarity-based algorithms that categorize the mixed fragments of DNA for assembly into complete genomes. These genome fragments and complete genomes can then be compared using multiple alignment algorithms. Both of these algorithmic tasks are now overwhelming our high performance computing capability and point the way to fertile new fields for algorithm developers.
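
To make the idea of sequence-based categorization concrete, here is a minimal sketch of one common approach: describe each DNA fragment by the frequencies of its short words (k-mers) and assign it to the reference whose profile it most resembles. This is an illustration only and not necessarily the method presented in the talk. The word length of 4, the cosine similarity measure, and the names kmer_profile, cosine and assign_bin are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies of seq in a fixed order.

    k=4 (tetranucleotides) is an assumption made for this sketch.
    Words containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_bin(fragment, references, k=4):
    """Return the name of the reference whose k-mer profile is closest to the fragment's.

    references is a hypothetical dict mapping a bin name to a representative sequence.
    """
    profile = kmer_profile(fragment, k)
    return max(
        references,
        key=lambda name: cosine(profile, kmer_profile(references[name], k)),
    )


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not real genomic data.
    refs = {
        "at_rich": "ATATATTTAAATATTATAAATTTATATAT",
        "gc_rich": "GCGCGGCCGCGCCGGCGCGGCC",
    }
    print(assign_bin("ATATATTTAAAT", refs))
```

Each fragment is profiled independently and the vector has a fixed length of 4^k entries, so the work spreads naturally across many processors. The cost grows with both the number of fragments and the number of candidate bins, which is one way such tasks come to strain computing capacity at survey scale.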