Petasort: How to Sort 1PB of Data in Under a Day


MapReduce forms the basis of a great deal of the computation done at Google. We are constantly striving to scale MapReduce-based programs up to larger problem sizes, and we use sorting as a stress test for the MapReduce framework. This talk will discuss the issues and challenges we encountered when scaling our sorting benchmark (and the systems it relies upon) up to a 1PB input. We will describe the "Google way" of solving large problems quickly by layering paranoid software on inexpensive (possibly unreliable) hardware.