Jeff Dean is one of the most influential figures in computer science and artificial intelligence; his insights have guided the development of large-scale distributed systems and the advancement of machine learning.

On Machine Learning and Artificial Intelligence

  1. "In recent years, I think machine learning has really changed our expectations of what we think of computers being able to do." [1]
  2. "Every time we scale things up, things get better. All of a sudden, new capabilities emerge or the accuracy of some problem reaches a threshold, where before it was kind of unusable and now all of a sudden it becomes usable, and that enables new kinds of things." [1][2]
  3. "If you think back 10 or 15 years ago, speech recognition kind of worked, but it wasn't really seamless — it made lots of errors. Computers didn't really understand images from the pixel level of what was in that image." [1]
  4. "We now have computers that can see and sense and that's a completely different ball game." [2]
  5. "Increasing scale, larger scale use of compute resources, specialized compute... larger and more interesting and richer data sets, larger scale machine learning models, all scaling all of those things tends to deliver better results and that's been true for the last 10 to 15 years." [2]
  6. "If you can't understand what's in information then it's going to be very difficult to organize it." [3]
  7. "They've learned they can take a whole bunch of subsystems, some of which may be machine learned, and replace it with a much more general end-to-end machine learning piece. Often when you have lots of complicated subsystems there's usually a lot of complicated code to stitch them all together. It's nice if you can replace all that with data and very simple algorithms." [3]
  8. "By increasing the size of the model it can remember not just the obvious things but it can remember the subtle kinds of patterns that occur maybe in only a tiny fraction of the examples in the data set." [3]
  9. "The reason neural networks didn't take off in the 90s was a lack of computational power and a lack of large interesting data sets." [3]
  10. "One of the things that's really happened in the last 5 or 6 years, that has caused machine learning to really take off, is that we now have enough computational power, and large enough and interesting real-world datasets, to solve problems that previously we weren't able to solve in any other way: problems in computer vision, speech recognition and language understanding." [4]
  11. "There have been a number of surprising results in the last 5 years or so in machine learning— things that I didn't think computers could really do that all of a sudden they can now do." [4]
  12. "Recently, advances in machine learning have mostly come in areas where we have large labeled datasets: problems where we have the inputs and then the desired outputs for those problems." [4]
  13. "We think there's going to be hundreds of thousands of organizations impacted by machine learning in the next 5 or 10 years." [4]
  14. "Very simple techniques, when you have a lot of data, work incredibly well." [5]
  15. "If you only have 10 examples of something, it's going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that's the kind of scale where you should really start thinking about these kinds of techniques." [5]
  16. "There's a lot of work in machine learning systems that is not actually machine learning." [5]
  17. "Supervised learning works so well when you have the right data set, but ultimately unsupervised learning is going to be a really important component in building really intelligent systems - if you look at how humans learn, it's almost entirely unsupervised." [5]
  18. "We want to build systems that can generalize to a new task. Being able to do things with much less data and with much less computation is going to be interesting and important." [5]
  19. "I think true artificial general intelligence would be a system that is able to perform human-level reasoning, understanding, and accomplishing of complicated tasks. We are clearly not there yet." [6]
  20. "Bigger model, more data, better results,’ which has been sort of relatively true for the last 12 or 15 years." [7]
  21. "The models that we have are capable of doing really interesting things. They can't solve every problem; they can solve a growing set of problems year over year because the models get better, we have better algorithmic improvements that show us how to train larger models with the same compute cost, more capable models." [7]
  22. "Current: solution = machine learning expertise + data + computation. Future: solution = data + 100X computation." [8]
  23. "A way to get to more data efficient algorithms is to train one large model which can do 1000 different things. Then adding 1001st thing doesn't require much data, it can use representations it already learned for previous 1000 tasks." [8]

On Systems and Infrastructure

  1. "I've always liked code that runs fast." [5]
  2. "In a lot of these areas, from machine translation to search quality, you're always trying to balance what you can do computationally with each query." [5]
  3. "Some things are easier to parrellelize than others. It's pretty easy to train up 100 models and pick the best one. If you want to train one big model but do it on hundreds of machines, that's a lot harder to parallelize." [5]
  4. "Traditional systems I think like operating systems compilers memory allocators and so on they don't make extensive use of machine learning today uh but that should change and is starting to change." [9]
  5. "We know parallelism is a really good solution for a lot of kinds of large scale problems." [9]
  6. "In general I think we need to use much more compute and much more parallelism." [9]
  7. A key design lesson is to plan for inevitable failures in large-scale systems, such as disk failures and server outages. [10]
  8. He provided a widely cited list of "Numbers Everyone Should Know", giving latency figures for common computer operations that are crucial for back-of-the-envelope performance engineering; the figures are reproduced after this list. [10]
  9. "A typical first year for a new cluster" involves numerous failures, including overheating, PDU failures, and rack failures, emphasizing the need for fault-tolerant design. [10]
  10. "Often when you have lots of complicated subsystems there's usually a lot of complicated code to stitch them all together. It's nice if you can replace all that with data and very simple algorithms." [3]
  10. When building large-scale systems, it's important to decompose them into services. [11]
  11. At large scale, failures are common, so you need to minimize them and have fast recovery mechanisms. [12]
  12. "We needed about a million times as much processing power to get them to really start to work well on real problems that you might sort of care about." [13]
  13. The evolution of computing is shifting to both very small devices and large, consolidated computing farms. [14]
  14. When designing world-wide systems, a key challenge is the automatic and dynamic placement of data and computation to minimize latency and cost under various constraints. [14]
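
The easy half of item 3, training many independent models and keeping the best, is embarrassingly parallel because the runs share no state. A toy Python sketch (the training function is a simulated stand-in, not a real model; the hyperparameter and metric are invented for illustration):

    import math
    import random
    from concurrent.futures import ProcessPoolExecutor

    def train_and_score(seed):
        """Simulates training one independent model; returns (score, lr)."""
        rng = random.Random(seed)
        lr = 10 ** rng.uniform(-4, -1)  # hypothetical hyperparameter draw
        # Fake validation metric that peaks near lr = 10**-2.5.
        score = -abs(math.log10(lr) + 2.5) + rng.gauss(0, 0.1)
        return score, lr

    if __name__ == "__main__":
        # The 100 runs share no state, so they parallelize trivially
        # across processes (or, at larger scale, across machines).
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(train_and_score, range(100)))
        best_score, best_lr = max(results)
        print(f"best lr = {best_lr:.4f} (score {best_score:.3f})")

The hard case he contrasts this with, one model spread across hundreds of machines, requires keeping parameters and gradients consistent across workers, which is a coordination problem rather than a scheduling one.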
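For reference, the "Numbers Everyone Should Know" slide from item 8, as it appeared in his 2009-era talks [10][14], reads approximately:

    L1 cache reference                           0.5 ns
    Branch mispredict                              5 ns
    L2 cache reference                             7 ns
    Mutex lock/unlock                            100 ns
    Main memory reference                        100 ns
    Compress 1K bytes with Zippy              10,000 ns
    Send 2K bytes over 1 Gbps network         20,000 ns
    Read 1 MB sequentially from memory       250,000 ns
    Round trip within same datacenter        500,000 ns
    Disk seek                             10,000,000 ns
    Read 1 MB sequentially from network   10,000,000 ns
    Read 1 MB sequentially from disk      30,000,000 ns
    Send packet CA->Netherlands->CA      150,000,000 ns

These figures support quick back-of-the-envelope estimates: reading 1 GB sequentially is roughly 1,000 x 30 ms, about 30 s, from disk, but only about 0.25 s from memory.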

On Innovation and Research

  1. "It's nice to have short-term to medium-term things that we can apply and see real change in our products, but also have longer-term, five to 10 year goals that we're working toward." [5]
  2. "As a society I think we are going to be much better off by having machines that can work in conjunction with humans to do things more efficiently and even better in some cases. That will enable humans to do things that they do better than machines." [5]
  3. "I find that often when you bring people together to work on problems who have different kinds of expertise, you often end up with something that was better than if you brought in a more homogenous group of people with more narrow expertise in one area." [6]
  4. "The machine learning community moves really really fast." [3]
  5. "I worry policymakers are not putting enough attention on what we should be planning for 10 years down the road." [5]
  6. "The combination of people that have those two different kinds of skills [large-scale computing and machine learning] coming together to work on problems often leads to significant advances that people with just machine learning skills or just large-scale computing skills often could not achieve on their own." [6]
  7. He co-founded the Google Brain project in 2011 to focus on making progress towards intelligent machines. [15]
  8. He was a co-creator of distillation, a technique for transferring knowledge from a large neural network to a smaller, more efficient one; a sketch of the distillation loss follows this list. [15]
  9. He was instrumental in transitioning Google Translate to a neural machine translation system, significantly improving quality. [15]
  10. A significant part of his work focuses on applying AI to problems that can benefit billions of people in socially beneficial ways. [15]
  11. The development of specialized hardware like Tensor Processing Units (TPUs) was crucial for advancing machine learning by providing the necessary computational power with a different kind of computation. [8]
  12. He named the system built for training early large-scale neural networks "DistBelief", in part a pun on "disbelief", because many people were skeptical the approach would work. [13]
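
The distillation technique in item 8 trains a small "student" network to match the temperature-softened output distribution of a large "teacher", blended with the ordinary hard-label loss. A minimal NumPy sketch of that loss, following the recipe in the original distillation paper (the temperature T and mixing weight alpha below are illustrative choices, not prescribed values):

    import numpy as np

    def softmax(logits, T=1.0):
        z = logits / T
        z -= z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soft part: cross-entropy against the teacher's temperature-softened
        # distribution, scaled by T^2 so its gradients keep a magnitude
        # comparable to the hard part (as in the original paper).
        p_teacher = softmax(teacher_logits, T)
        p_student = softmax(student_logits, T)
        soft = -(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean() * T * T
        # Hard part: ordinary cross-entropy with the true labels at T = 1.
        p = softmax(student_logits)
        hard = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
        return alpha * soft + (1 - alpha) * hard

    # Example: a batch of 4 examples with 10 classes.
    teacher = np.random.randn(4, 10) * 3.0  # e.g., logits from a big ensemble
    student = np.random.randn(4, 10)
    labels = np.array([3, 1, 0, 7])
    print(distillation_loss(student, teacher, labels))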

Learn more:

  1. Google Chief Scientist Jeff Dean discusses AI and ML during Rice visit | Computer Science
  2. Exciting Trends in Machine Learning - Jeff Dean (Google) - YouTube
  3. Jeff Dean on Large-Scale Deep Learning at Google - High Scalability
  4. Jeff Dean on machine learning, part 1: surveying the landscape | Google Cloud Blog
  5. Jeff Dean Quotes - BrainyQuote
  6. Q+A With Jeff Dean: The Brain Behind Google's Artificial Intelligence - Forbes
  7. Google's Jeff Dean on the Coming Era of Virtual Engineers | Sequoia Capital
  8. Jeff Dean's Talk on Large-Scale Deep Learning | Pavel Surmenok
  9. Jeff Dean: AI will Reshape Chip Design — NeurIPS 2024 - YouTube
  10. Jeff Dean: Design Lessons and Advice from Building Large Scale Distributed Systems
  11. Building Software Systems at Google and Lessons Learned
  12. Jeff Dean (Google Research & Deep Mind): Advances in ML for Systems and Systems for ML | MLSys 2024 - YouTube
  13. Decoding Google Gemini with Jeff Dean - YouTube
  14. Designs, Lessons and Advice from Building Large Distributed Systems - CS@Cornell
  15. Jeffrey Dean - Google Research