Session 1 (09:00 - 10:30)
Keynote: Things that go bump in the night (and how to
sleep through them)
Speaker: Michael Christian, Infrastructure Resilience, Yahoo!
There is a widespread belief in our industry that our data-centers are supremely reliable, capable of providing us with true five-9s service, and that by hosting our platforms in these expensive, massively redundant locations, we will be safe from all ills. As a result, business continuity planning is often approached from a classic DR "Airplane Into Building" perspective, where insufficient thought, energy, planning, and practice are put into a solution never expected to be used.
The truth is that data-centers DO fail, sometimes for the oddest of reasons. This is not a knock against the infrastructure designers of our industry, who have created some of the most efficient, redundant, and innovative buildings in history; it is merely a statement of fact. As redundancy is added, complexity increases, adding more links to a system at the mercy of the weakest. Something as simple as a border router failure can effectively knock an entire building of compute offline, regardless of how many generators it has.
By starting under the assumption that a particular data-center WILL fail at some point, it becomes significantly easier to build platforms that can be quickly and transparently shifted from location to location. Where designing for the unthinkable leads to poorly thought-out solutions, designing for the everyday leads to usable and indeed useful tooling. Highly available internet platforms are not nearly as technically difficult as they are culturally difficult.
I'll interleave a history of massive outages during the last decade with proven
recovery solutions and strategies, with the hope of swaying our collective
industry from a disaster insurance approach to a truly always-on
philosophy. This talk is loosely based on Chapter 17 of O'Reilly's Web
Operations, by the same author/speaker.
- Ant Rowstron (Microsoft Research, Cambridge), Dushyanth Narayanan
(Microsoft Research, Cambridge), Austin Donnelly (Microsoft Research,
Cambridge), Greg O'Shea (Microsoft Research, Cambridge) and Andrew Douglas
(Microsoft Research, Cambridge). Nobody ever got fired for using
Coffee break (10:30 - 11:00)
Session 2 (11:00 - 12:30)
Keynote: Programming and Debugging Large-Scale Data
Speaker: Christopher Olston, Google
This talk gives an overview of my former team's work on large-scale
data processing at Yahoo! Research. The talk begins by introducing two
data processing systems we helped develop: PIG, a dataflow programming
environment and Hadoop-based runtime, and NOVA, a workflow manager for
Pig/Hadoop. The bulk of the talk focuses on debugging, and looks at
what can be done before, during and after execution of a data
* Pig's automatic EXAMPLE DATA GENERATOR is used before running a
Pig job to get a feel for what it will do, enabling certain kinds of
mistakes to be caught early and cheaply. The algorithm behind the
example generator performs a combination of sampling and synthesis to
balance several key factors---realism, conciseness and
completeness---of the example data it produces.
* INSPECTOR GADGET is a framework for creating custom tools that
monitor Pig job execution. We implemented a dozen user-requested
tools, ranging from data integrity checks to crash cause investigation
to performance profiling, each in just a few hundred lines of code.
* IBIS is a system that collects metadata about what happened during
data processing, for post-hoc analysis. The metadata is collected from
multiple sub-systems (e.g. Nova, Pig, Hadoop) that deal with data and
processing elements at different granularities (e.g. tables vs.
records; relational operators vs. reduce task attempts) and offer
disparate ways of querying it. IBIS integrates this metadata and
presents a uniform and powerful query interface to users.
Bio: Christopher Olston is a staff research scientist at Google, working on
structured data. He previously worked at Yahoo! (principal research
scientist) and Carnegie Mellon (assistant professor). He holds
computer science degrees from Stanford (2003 Ph.D., M.S.; funded by
NSF and Stanford fellowships) and UC Berkeley (B.S. with highest
Olston just started at Google in November 2011, so he hasn't done
anything there yet. At Yahoo, Olston co-created Apache Pig, which is
used for large-scale data processing by LinkedIn, Netflix, Salesforce,
Twitter, Yahoo and others, and is offered by Amazon as a cloud
service. Olston gave the 2011 Symposium on Cloud Computing keynote,
and won the 2009 SIGMOD best paper award. During his flirtation with
academia, Olston taught undergrad and grad courses at Berkeley,
Carnegie Mellon and Stanford, and signed several Ph.D. dissertations.
- Malte Schwarzkopf (University of Cambridge) and Steven Hand (University of
Cambridge). Bringing the cloud down to earth
Lunch break (12:30 - 14:00)
Session 3 (14:00 - 15:30)
- Nathan Backman (Brown University), Ugur Cetintemel (Brown University) and
Rodrigo Fonseca (Brown University). Managing Parallelism for Stream
Processing in the Cloud
- Masoud Saeida Ardekani (UPMC - LIP6), Marek Zawirski (INRIA & UPMC - LIP6),
Pierre Sutra (INRIA & UPMC - LIP6) and Marc Shapiro (INRIA & UPMC -
LIP6). The Space Complexity of Transactional Interactive Reads
- Panel: Topic TBA