Starting Points in Distributed Systems, Pt. I: Background

This is the first part of a three-part miniseries about building a conceptual foundation in distributed systems through independent study. In this series, I sketch out the map that I wish I’d had when I started studying last year, drawing from my own experience and currently available resources. My hope is that this informal guide will assist fellow students on their own journeys.

Background: my experience

During the second half of 2013 I undertook an independent study of distributed computer systems, with a particular focus on distributed databases. Like many others–or so I imagine–I started with a few conversations with a knowledgeable friend and a few citations to follow: Leslie Lamport’s “Time, Clocks, and the Ordering of Events in a Distributed System”, the Dynamo paper, and a recommended FAQ or two about something called the CAP theorem. Over the following months, I expanded my knowledge by branching outward from those starting points. I built a conceptual foundation in the field primarily through a process of reading papers, articles, blog posts, and product documentation, and by following the citation chains between sources. In addition to internal citations, I also found more personal direction through helpful fellow attendees of conferences and meet-ups, and more programmatic direction through university course websites and reading lists.

Although my academic training in another field prepared me to approach a new field by tracing webs of citations and joining networks of students, researchers, and teachers, I still found this to be an especially challenging process. In particular, I often felt a need for more integrative and intermediate resources for students. By volume, most of the resources on distributed systems that I found fell into one of two categories: i) documentation and marketing presentations of a particular software product or service, or ii) formal papers by academic researchers and/or industry practitioners. Working with resources of the first type, I had difficulty identifying broader contexts for particular systems, their relationships and interconnections, and their conceptual foundations. In papers of the second type, I struggled with unfamiliar formalisms and the high level of assumed background in computer science and mathematics. I found myself wanting an overarching structure to help integrate concepts and approaches across systems.

Since I started learning in the spring of 2013, new resources have been created to help students, both by simplifying the task of assembling and navigating the relevant literature, and by offering an accessible introduction to distributed systems as a field. To really engage the distributed systems literature, you’ll still need venture into the deep, dark, forest of formal papers for and by specialists, but a decent map or two can dramatically facilitate your progress.

In that spirit, the rest of this series describes where I would tell a friend–or my past self–to start today. I begin here, with a few prerequisites, continue next time with an orientation in, and survey of part of the distributed systems landscape, and conclude with a third post, in which I sketch out an exploratory method for continuing the journey.

Prerequisites, or a prelude

This may go without saying, but to make substantial progress towards understanding distributed systems, you’ll need at least a working knowledge of computer programming, systems, and networking.

For help building that knowledge in computer networking, I can personally recommend the Computer Networks course on Coursera offered by the University of Washington’s David Wetherall, Arvind Krishnamurthy, and John Zahorjan. This free course is taught “at the upper-undergraduate level.” I found the lectures, problem sets, and exams to be challenging but accessible. Since this course focuses in large part on the Internet and web applications, you’ll actually be learning quite a bit about distributed systems and some key algorithms already. In that sense, you might think of this, or a similar course, as a prelude to your more extended study.

If you push through to formal computer science papers in later parts of your studies, you’ll also want to be able to read and think with formal proofs and descriptions of algorithms. When I started, I wasn’t anywhere near the level I needed to be on to work through papers by researchers like Nancy Lynch and Leslie Lamport. I found that I could learn quite a bit from the introductory and concluding sections of papers, but I needed to do some substantial catch-up work in mathematics to be able to (start to) follow the meatier sections.

If you’re in a similar position, I can personally recommend Daniel J. Velleman’s How to Prove It: A Structured Approach (2nd ed., ISBN: 978-0521675994) on proofs and Tom Jenkyns and Ben Stephenson’s Fundamentals of Discrete Math for Computer Science: A Problem-Solving Primer (ISBN: 978-1447140689) on algorithms. Unlike the overwhelming majority of resources I’ll be mentioning in these posts, these aren’t free, but they are relatively inexpensive paperbacks that I consider to have been excellent investments.

In the next post, we’ll start to better orient ourselves in the distributed systems landscape and take on a survey.