Petasort:
How to Sort 1PB of Data in Under a Day
MapReduce forms the basis of a great deal of the computation done at Google. We
are constantly striving to scale MapReduce-based programs up to larger problem
sizes, and we use sorting as a stress test for the MapReduce framework. This talk
will discuss the issues and challenges we encountered when scaling our sorting
benchmark (and the systems it relies upon) up to a 1PB input. We will describe
the "Google way" of solving large problems quickly by layering
paranoid software on inexpensive (possibly unreliable) hardware.
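
For a sense of scale, the target in the title implies a minimum sustained aggregate rate. The sketch below is illustrative arithmetic only, not a figure from the talk; it assumes a decimal petabyte (10^15 bytes) and a full 24-hour day, and the names in it are hypothetical.

```python
# Back-of-the-envelope rate implied by "1PB in under a day".
# Assumptions (not from the talk): 1 PB = 10**15 bytes, a day = 24 hours.
PETABYTE = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def required_throughput(total_bytes: int, seconds: int) -> float:
    """Return the average bytes per second needed to process total_bytes in seconds."""
    return total_bytes / seconds


rate = required_throughput(PETABYTE, SECONDS_PER_DAY)
print(f"{rate / 1e9:.1f} GB/s")  # prints "11.6 GB/s"
```

A sort must at least read its whole input and write its whole output, so this rate is a floor for each of those phases across the cluster, before any intermediate data movement is counted.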