Chapter 4. GEOFFREY HINTON
In the past when AI has been overhyped—including backpropagation in the 1980s—people were expecting it to do great things, and it didn’t actually do things as great as they hoped. Today, it’s already done great things, so it can’t possibly all be just hype.
EMERITUS DISTINGUISHED PROFESSOR OF COMPUTER SCIENCE, UNIVERSITY OF TORONTO
VICE PRESIDENT & ENGINEERING FELLOW, GOOGLE
Geoffrey Hinton is sometimes known as the Godfather of Deep Learning, and he has been the driving force behind some of its key technologies, such as backpropagation, Boltzmann machines, and the Capsules neural network. In addition to his roles at Google and the University of Toronto, he is also Chief Scientific Advisor of the Vector Institute for Artificial Intelligence.
MARTIN FORD: You’re most famous for working on the backpropagation algorithm. Could you explain what backpropagation is?
GEOFFREY HINTON: The best way to explain it is by explaining what it isn’t. When most people think about neural networks, there’s an obvious algorithm for training them: Imagine you have a network that has layers of neurons, and you have an input at the bottom layer, and an output at the top layer. Each neuron has a weight associated with each connection. What each neuron does is look at the neurons in the layer below and it multiplies the activity of a neuron in the layer below by the weight, then adds all that up and gives an output that’s a function of that sum. By adjusting the weights on the connections, you can get networks that do anything you like, such as looking at a picture of a cat and labeling it as a cat.
The question is, how should you adjust the weights so that the network does what you want? There’s a very simple algorithm that will actually work but is incredibly slow—it’s a dumb mutation algorithm—where you start with random weights on all the connections, and you give your network a set of examples and see how well it works. You then take one of those weights, and you change it a little bit, and now you give it another set of examples to see if it works better or worse than it did before. If it works better than it did before, you keep the change you made. If it works worse than it did before, you don’t keep that change, or perhaps you change the weight in the opposite direction. Then you take another weight, and you do the same thing.
You have to go around all of the weights, and for each weight, you have to measure how well the network does on a set of examples, with each weight having to be updated multiple times. It is an incredibly slow algorithm, but it works, and it’ll do whatever you want.
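To make that procedure concrete, here is a minimal Python sketch of the “dumb” perturbation algorithm described above. It is purely illustrative and not from the interview: the tiny one-layer network, the random data, and the step size are all assumptions made up for the example.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical task: map 4-dimensional inputs to 2-dimensional targets with one weight matrix.
X = rng.normal(size=(20, 4))
Y = rng.normal(size=(20, 2))

def loss(W):
    """How badly the network does on the set of examples (mean squared error)."""
    return np.mean((X @ W - Y) ** 2)

W = rng.normal(size=(4, 2))      # start with random weights on all the connections
step = 0.01

for _ in range(100_000):         # visit weights one at a time, over and over
    i, j = rng.integers(4), rng.integers(2)
    before = loss(W)
    W[i, j] += step              # nudge a single weight a little bit
    if loss(W) >= before:        # no improvement: undo and try the opposite direction
        W[i, j] -= 2 * step
        if loss(W) >= before:    # still no improvement: put the weight back
            W[i, j] += step

print(loss(W))                   # the error creeps down, one weight at a time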
Backpropagation is basically a way of achieving the same thing. It’s a way of tinkering with the weights so that the network does what you want, but unlike the dumb algorithm, it’s much, much faster. It’s faster by a factor of how many weights there are in the network. If you’ve got a network with a billion weights, backpropagation is going to be a billion times faster than the dumb algorithm.
The dumb algorithm works by adjusting one of the weights slightly and then measuring how well the network does. For evolution, that’s what you’ve got to do, because the process that takes you from your genes to the finished product depends on the environment you’re in. There’s no way you can predict exactly what the phenotype will look like from the genotype, or how successful the phenotype will be, because that depends on what’s going on in the world.
In a neural net, however, the process takes you from the input and the weights to how successful you are in producing the right output. You have control over that whole process because it’s all going on inside the neural net; you know all the weights that are involved. Backpropagation makes use of all that by sending information backward through the net. Using the fact that it knows all the weights, it can compute in parallel, for every single weight in the network, whether you should make it a little bit bigger or smaller to improve the output.
The difference is that in evolution, you measure the effect of a change, and in backpropagation, you compute what the effect would be of making a change, and you can do that for all the weights at once with no interference. With backpropagation you can adjust the weights rapidly because you can give it a few examples, then backpropagate the discrepancies between what it said and what it should have said, and now you can figure out how to change all of the weights simultaneously to make all of them a little bit better. You still need to do the process a number of times, but it’s much faster than the evolutionary approach.
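For contrast, here is an equally minimal sketch of what backpropagation does instead: one forward pass records all the intermediate activities, and one backward pass computes, for every weight at once, which way it should be nudged. The tiny two-layer network, the squared-error loss, and the learning rate are again illustrative assumptions, not anything specified in the interview.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))                  # inputs
Y = rng.normal(size=(20, 2))                  # target outputs

W1 = 0.1 * rng.normal(size=(4, 8))            # input -> hidden weights
W2 = 0.1 * rng.normal(size=(8, 2))            # hidden -> output weights
lr = 0.05

for _ in range(2000):
    # Forward pass: because we know every weight, we can record every intermediate value.
    H = np.tanh(X @ W1)                       # hidden activities
    out = H @ W2                              # what the network said
    err = out - Y                             # discrepancy from what it should have said

    # Backward pass: the chain rule gives the gradient for all weights in parallel.
    dW2 = H.T @ err
    dH = (err @ W2.T) * (1.0 - H ** 2)        # send the error back through the tanh units
    dW1 = X.T @ dH

    # Adjust every weight a little bit, all at once, to reduce the error.
    W1 -= lr * dW1 / len(X)
    W2 -= lr * dW2 / len(X)

print(np.mean(err ** 2))                      # falls far faster than the one-weight-at-a-time loop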
MARTIN FORD: The backpropagation algorithm was originally created by David Rumelhart, correct? And you took that work forward?
GEOFFREY HINTON: Lots of different people invented different versions of backpropagation before David Rumelhart. They were mainly independent inventions, and it’s something I feel I’ve got too much credit for. I’ve seen things in the press that say I invented backpropagation, and that’s completely wrong. It’s one of these rare cases when an academic feels he’s got too much credit for something! My main contribution was to show how you can use it for learning distributed representations, so I’d like to set the record straight on that.
In 1981, I was a postdoc in San Diego, California and David Rumelhart came up with the basic idea of backpropagation, so it’s his invention. Myself and Ronald Williams worked with him on formulating it properly. We got it working, but we didn’t do anything particularly impressive with it, and we didn’t publish anything. After that, I went off to Carnegie Mellon and worked on the Boltzmann machine, which I think of as a much more interesting idea, even though it doesn’t work as well. Then in 1984, I went back and tried backpropagation again so I could compare it with the Boltzmann machine, and discovered it actually worked much better, so I started communicating with David Rumelhart again.
What got me really excited about backpropagation was what I called the family trees task, where you could show that backpropagation can learn distributed representations. I had been interested in the brain having distributed representations since high school, and finally, we had an efficient way to learn them! If you gave it a problem where I input two words and it has to output the third word that goes with them, it would learn distributed representations for the words, and those distributed representations would capture the meanings of the words.
Back in the mid-1980s, when computers were very slow, I used a simple example where you would have a family tree, and I would tell you about relationships within that family tree. I would tell you things like Charlotte’s mother is Victoria, so I would say Charlotte and mother, and the correct answer is Victoria. I would also say Charlotte and father, and the correct answer is James. Once I’ve said those two things, because it’s a very regular family tree with no divorces, you could use conventional AI to infer using your knowledge of family relations that Victoria must be the spouse of James because Victoria is Charlotte’s mother and James is Charlotte’s father. The neural net could infer that too, but it didn’t do it by using rules of inference, it did it by learning a bunch of features for each person. Victoria and Charlotte would both be a bunch of separate features, and then by using interactions between those vectors of features, that would cause the output to be the features for the correct person. From the features for Charlotte and from the features for mother, it could derive the features for Victoria, and when you trained it, it would learn to do that. The most exciting thing was that for these different words, it would learn these feature vectors, and it was learning distributed representations of words.
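As an illustration of the kind of network described above, the following hypothetical PyTorch sketch gives each person and each relationship a small learned feature vector and trains the network to output the third term, for example Charlotte plus mother giving Victoria. The vocabulary, vector sizes, and training details are assumptions chosen for brevity; this is not the original 1986 model.

import torch
import torch.nn as nn
import torch.nn.functional as F

people = ["Charlotte", "Victoria", "James"]          # tiny illustrative vocabulary
relations = ["mother", "father"]
triples = [("Charlotte", "mother", "Victoria"),      # "Charlotte's mother is Victoria"
           ("Charlotte", "father", "James")]         # "Charlotte's father is James"

person_emb = nn.Embedding(len(people), 6)            # a learned feature vector per person
relation_emb = nn.Embedding(len(relations), 6)       # ...and per relationship
hidden = nn.Linear(12, 12)
readout = nn.Linear(12, len(people))                 # scores over the possible answers

params = (list(person_emb.parameters()) + list(relation_emb.parameters())
          + list(hidden.parameters()) + list(readout.parameters()))
opt = torch.optim.SGD(params, lr=0.1)

for _ in range(500):
    for person, relation, answer in triples:
        # Combine the features for the person and the relationship...
        x = torch.cat([person_emb(torch.tensor(people.index(person))),
                       relation_emb(torch.tensor(relations.index(relation)))])
        # ...and try to produce the features of the correct third person.
        logits = readout(torch.tanh(hidden(x)))
        loss = F.cross_entropy(logits.unsqueeze(0),
                               torch.tensor([people.index(answer)]))
        opt.zero_grad()
        loss.backward()
        opt.step()

# After training, person_emb.weight holds the distributed representations
# (feature vectors) that the network has learned for each person.
print(person_emb.weight.detach())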
We submitted a paper to Nature in 1986 that had this example of backpropagation learning distributed features of words, and I talked to one of the referees of the paper, and that was what got him really excited about it, that this system was learning these distributed representations. He was a psychologist, and he understood that having a learning algorithm that could learn representations of things was a big breakthrough. My contribution was not discovering the backpropagation algorithm, that was something Rumelhart had pretty much figured out, it was showing that backpropagation would learn these distributed representations, and that was what was interesting to psychologists, and eventually, to AI people.
Quite a few years later, in the early 1990s, Yoshua Bengio rediscovered the same kind of network, but at a time when computers were faster. Yoshua was applying it to language: he would take real text, use a few words as context, and then try to predict the next word. He showed that the neural network was pretty good at that and that it would discover these distributed representations of words. It made a big impact because the backpropagation algorithm could learn representations and you didn’t have to put them in by hand. People like Yann LeCun had been doing that in computer vision for a while. He was showing that backpropagation would learn good filters for processing visual input in order to make good decisions, and that was a bit more obvious because we knew the brain did things like that. The fact that backpropagation would learn distributed representations that captured the meanings and the syntax of words was a big breakthrough.
MARTIN FORD: Is it correct to say that at that time using neural networks was still not a primary thrust of AI research? It’s only quite recently this has come to the forefront.
GEOFFREY HINTON: There’s some truth to that, but you also need to make a distinction between AI and machine learning on the one hand, and psychology on the other hand. Once backpropagation became popular in 1986, a lot of psychologists got interested in it, and they didn’t really lose their interest in it, they kept believing that it was an interesting algorithm, maybe not what the brain did, but an interesting way of developing representations. Occasionally, you see the idea that there were only a few people working on it, but that’s not true. In psychology, lots of people stayed interested in it. What happened in AI was that in the late 1980s, Yann LeCun got something impressive working for recognizing handwritten digits, and there were various other moderately impressive applications of backpropagation from things like speech recognition to predicting credit card fraud. However, the proponents of backpropagation thought it was going to do amazing things, and they probably did oversell it. It didn’t really live up to the expectations we had for it. We thought it was going to be amazing, but actually, it was just pretty good.
In the early 1990s, other machine learning methods turned out to work better than backpropagation on small datasets and required fewer things to be fiddled with to get them to work well. In particular, something called the support vector machine did better at recognizing handwritten digits than backpropagation, and handwritten digits had been a classic example of backpropagation doing something really well. Because of that, the machine learning community really lost interest in backpropagation. They decided that there was too much fiddling involved, that it didn’t work well enough to be worth all that fiddling, and that it was hopeless to think that just from the inputs and outputs you could learn multiple layers of hidden representations, where each layer would be a whole bunch of feature detectors that represent the input in a particular way.
The idea of backpropagation was that you’d learn lots of layers, and then you’d be able to do amazing things, but we had great difficulty learning more than a few layers, and we couldn’t do amazing things. The general consensus among statisticians and people in AI was that we were wishful thinkers. We thought that just from the inputs and outputs, you should be able to learn all these weights; and that was just unrealistic. You were going to have to wire in lots of knowledge to make anything work.
That was the view of people in computer vision until 2012. Most people in computer vision thought this stuff was crazy. Even though Yann LeCun sometimes got systems working better than the best computer vision systems, they still thought it was crazy; it wasn’t the right way to do vision. They even rejected papers by Yann, even though his systems worked better than the best computer vision systems on particular problems, because the referees thought it was the wrong way to do things. That’s a lovely example of scientists saying, “We’ve already decided what the answer has to look like, and anything that doesn’t look like the answer we believe in is of no interest.”
In the end, science won out, and two of my students won a big public competition, and they won it dramatically. They got almost half the error rate of the best computer vision systems, and they were using mainly techniques developed in Yann LeCun’s lab but mixed in with a few of our own techniques as well.
MARTIN FORD: This was the ImageNet competition?
GEOFFREY HINTON: Yes, and what happened then was what should happen in science. One method that people used to think of as complete nonsense had now worked much better than the method they believed in, and within two years, they all switched. So, for things like object classification, nobody would dream of trying to do it without using a neural network now.
MARTIN FORD: This was back in 2012, I believe. Was that the inflection point for deep learning?
GEOFFREY HINTON: For computer vision, that was the inflection point. For speech, the inflection point was a few years earlier. Two different graduate students at Toronto showed in 2009 that you could make a better speech recognizer using deep learning. They went as interns to IBM and Microsoft, and a third student took their system to Google. The basic system that they had built was developed further, and over the next few years, all these companies’ labs converted to doing speech recognition using neural nets. Initially, it was just using neural networks for the frontend of their system, but eventually, it was using neural nets for the whole system. Many of the best people in speech recognition had switched to believing in neural networks before 2012, but the big public impact was in 2012, when the vision community, almost overnight, got turned on its head and this crazy approach turned out to win.
MARTIN FORD: If you read the press now, you get the impression that neural networks and deep learning are equivalent to artificial intelligence—that it’s the whole field.
GEOFFREY HINTON: For most of my career, there was artificial intelligence, which meant the logic-based idea of making intelligent systems by putting in rules that allowed them to process symbol strings. People believed that’s what intelligence was, and that’s how they were going to make artificial intelligence. They thought intelligence consists of processing symbol strings according to rules, they just had to figure out what the symbol strings were and what the rules were, and that was AI. Then there was this other thing that wasn’t AI at all, and that was neural networks. It was an attempt to make intelligence by mimicking how the brain learns.
Notice that standard AI wasn’t particularly interested in learning. In the 1970s, they would always say that learning’s not the point. You have to figure out what the rules are and what the symbolic expressions they’re manipulating are, and we can worry about learning later. Why? Because the main point is reasoning. Until you’ve figured out how it does reasoning, there’s no point thinking about learning. The logic-based people were interested in symbolic reasoning, whereas the neural network-based people were interested in learning, perception, and motor control. They’re trying to solve different problems, and we believe that reasoning is something that evolutionarily comes very late in people, and it’s not the way to understand the basics of how the brain works. It’s built on top of something that’s designed for something else.
What’s happened now is that industry and government use “AI” to mean deep learning, and so you get some really paradoxical things. In Toronto, we’ve received a lot of money from the industry and government for setting up the Vector Institute, which does basic research into deep learning, but also helps the industry do deep learning better and educates people in deep learning. Of course, other people would like some of this money, and another university claimed they had more people doing AI than in Toronto and produced citation figures as evidence. That’s because they used classical AI. They used citations of conventional AI to say they should get some of this money for deep learning, and so this confusion in the meaning of AI is quite serious. It would be much better if we just didn’t use the term “AI.”
MARTIN FORD: Do you really think that AI should just be focused on neural networks and that everything else is irrelevant?
GEOFFREY HINTON: I think we should say that the general idea of AI is making intelligent systems that aren’t biological, they are artificial, and they can do clever things. Then there’s what AI came to mean over a long period, which is what’s sometimes called good old-fashioned AI: representing things using symbolic expressions. For most academics—at least, the older academics—that’s what AI means: that commitment to manipulating symbolic expressions as a way to achieve intelligence.
I think that old-fashioned notion of AI is just wrong. I think they’re making a very naive mistake. They believe that if you have symbols coming in and you have symbols coming out, then it must be symbols in-between all the way. What’s in-between is nothing like strings of symbols, it’s big vectors of neural activity. I think the basic premise of conventional AI is just wrong.
MARTIN FORD: You gave an interview toward the end of 2017 where you said that you were suspicious of the backpropagation algorithm and that it needed to be thrown out and we needed to start from scratch. (https://www.axios.com/artificial-intelligence-pioneer-says-we-need-to-start-over-1513305524-f619efbd-9db0-4947-a9b2-7a4c310a28fe.html) That created a lot of disturbance, so I wanted to ask what you meant by that.
GEOFFREY HINTON: The problem was that the context of the conversation wasn’t properly reported. I was talking about trying to understand the brain, and I was raising the issue that backpropagation may not be the right way to understand the brain. We don’t know for sure, but there are some reasons now for believing that the brain might not use backpropagation. I said that if the brain doesn’t use backpropagation, then whatever the brain is using would be an interesting candidate for artificial systems. I didn’t at all mean that we should throw out backpropagation. Backpropagation is the mainstay of all the deep learning that works, and I don’t think we should get rid of it.
MARTIN FORD: Presumably, it could be refined going forward?
GEOFFREY HINTON: There’s going to be all sorts of ways of improving it, and there may well be other algorithms that are not backpropagation that also work, but I don’t think we should stop doing backpropagation. That would be crazy.
MARTIN FORD: How did you become interested in artificial intelligence? What was the path that took you to your focus on neural networks?
GEOFFREY HINTON: My story begins at high school, where I had a friend called Inman Harvey who was a very good mathematician who got interested in the idea that the brain might work like a hologram.
MARTIN FORD: A hologram being a three-dimensional representation?
GEOFFREY HINTON: Well, the important thing about a proper hologram is that if you take a hologram and you cut it in half, you do not get half the picture, but instead you get a fuzzy picture of the whole scene. In a hologram, information about the scene is distributed across the whole hologram, which is very different from what we’re used to. It’s very different from a photograph, where if you cut out a piece of a photograph you lose the information about what was in that piece of the photograph, it doesn’t just make the whole photograph go fuzzier.
Inman was interested in the idea that human memory might work like that, where an individual neuron is not responsible for storing an individual memory. He suggested that what’s happening in the brain is that you adjust the connection strengths between neurons across the whole brain to store each memory, and that it’s basically a distributed representation. At that time, holograms were an obvious example of distributed representation.
People misunderstand what’s meant by distributed representation, but what I think it means is you’re trying to represent some things—maybe concepts—and each concept is represented by activity in a whole bunch of neurons, and each neuron is involved in the representations of many different concepts. It’s very different from a one-to-one mapping between neurons and concepts. That was the first thing that got me interested in the brain. We were also interested in how brains might learn things by adjusting connection strengths, and so I’ve been interested in that basically the whole time.
MARTIN FORD: When you were at high school? Wow. So how did your thinking develop when you went to university?
GEOFFREY HINTON: One of the things I studied at university was physiology. I was excited by physiology because I wanted to learn how the brain worked. Toward the end of the course they told us how neurons send action potentials. There were experiments done on the giant squid axon, figuring out how an action potential propagated along an axon, and it turned out that was how the brain worked. It was rather disappointing to discover, however, that they didn’t have any kind of computational model of how things were represented or learned.
After that, I switched to psychology, thinking they would tell me how the brain worked, but this was at Cambridge, and at that time it was still recovering from behaviorism, so psychology was largely about rats in boxes. There were some cognitive psychologists around then, but they were fairly non-computational, and I didn’t really get much sense that they were ever going to figure out how the brain worked.
During the psychology course, I did a project on child development. I was looking at children between the ages of two and five, and how the way that they attend to different perceptual properties changes as they develop. The idea is that when they’re very young, they’re mainly interested in color and texture, but as they get older, they become more interested in shape. I conducted an experiment where I would show the children three objects, of which one was the odd one out, for example, two yellow circles and a red circle. I trained the children to point at the odd one out, something that even very young children can learn to do.
I’d also train them on two yellow triangles and one yellow circle, and then they’d have to point at the circle because that was the odd one out on shape. Once they’d been trained on simple examples where there was a clear odd one out, I would then give them a test example like a yellow triangle, a yellow circle, and a red circle. The idea was that if they were more interested in color than shape, then the odd one out would be the red circle, but if they were more interested in shape than color, then the odd one out would be the yellow triangle. That was all well and good, and a couple of the children pointed at either the yellow triangle that was a different shape or the red circle that was a different color. I remember, though, that when I first did the test with one bright five-year-old, he pointed at the red circle, and he said, “You’ve painted that one the wrong color.”
The model that I was trying to corroborate was a very dumb, vague model that said, “when they’re little, children attend more to color and as they get bigger, they attend more to shape.” It’s an incredibly primitive model that doesn’t say how anything works, it’s just a slight change in emphasis from color to shape. Then, I was confronted by this kid who looks at them and says, “You’ve painted that one the wrong color.” Here’s an information processing system that has learned what the task is from the training examples, and because he thinks there should be an odd one out, he realizes there isn’t a single odd one out, and that I must have made a mistake, and the mistake was probably that I painted that one the wrong color.
Nothing in the model of children that I was testing allowed for that level of complexity at all. This was hugely more complex than any of the models in psychology. It was an information processing system that was smart and could figure out what was going on, and for me, that was the end of psychology. The models they had were hopelessly inadequate compared with the complexity of what they were dealing with.
MARTIN FORD: After leaving the field of psychology, how did you end up going into artificial intelligence?
GEOFFREY HINTON: Well, before I moved into the world of AI, I became a carpenter, and whilst I enjoyed it, I wasn’t an expert at it. During that time, I met a really good carpenter, and it was highly depressing, so because of that I went back to academia.
MARTIN FORD: Well, given the other path that opened up for you, it’s probably a good thing that you weren’t a great carpenter!
GEOFFREY HINTON: Following my attempt at carpentry, I worked as a research assistant on a psychology project trying to understand how language develops in very young children, and how it is influenced by social class. I was responsible for creating a questionnaire that would assess the attitude of the mother toward their child’s language development. I cycled out to a very poor suburb of Bristol, and I knocked on the door of the first mother I was due to talk to. She invited me in and gave me a cup of tea, and then I asked her my first question, which was: “What’s your attitude towards your child’s use of language?” She replied, “If he uses language, we hit him.” So that was pretty much it for my career as a social psychologist.
After that, I went into AI and became a graduate student in artificial intelligence at the University of Edinburgh. My adviser was a very distinguished scientist called Christopher Longuet-Higgins, who’d initially been a professor of chemistry at Cambridge and had then switched fields to artificial intelligence. He was very interested in how the brain might work—and in particular, studying things like holograms. He had realized that computer modeling was the way to understand the brain, and he was working on that, and that’s why I originally signed up with him. Unfortunately for me, about the same time that I signed up with him, he changed his mind. He decided that these neural models were not the way to understand intelligence, and that the actual way to understand intelligence was to try to understand language.
It’s worth remembering that at the time, there were some impressive models—using symbol processing—of systems that could talk about arrangements of blocks. An American professor of computer science called Terry Winograd wrote a very nice thesis that showed how you could get a computer to understand some language and to answer questions, and it would actually follow commands. You could say to it, “put the block that’s in the blue box on top of the red cube,” and it would understand and do that. It was only in a simulation, but it would understand the sentence. That impressed Christopher Longuet-Higgins a lot, and he wanted me to work on that, but I wanted to keep working on neural networks.
Now, Christopher was a very honorable guy, but we completely disagreed on what I should do. I kept refusing to do what he said, but he kept me on anyway. I continued my work on neural networks, and eventually, I did a thesis on neural networks, though at the time, neural networks didn’t work very well and there was a consensus that they were just nonsense.
MARTIN FORD: When was this in relation to Marvin Minsky and Seymour Papert’s Perceptrons book?
GEOFFREY HINTON: This was in the early ‘70s, and Minsky and Papert’s book came out in the late ‘60s. Almost everybody in artificial intelligence thought that was the end of neural networks. They thought that trying to understand intelligence by studying neural networks was like trying to understand intelligence by studying transistors; it just wasn’t the way to do it. They thought intelligence was all about programs, and you had to understand what programs the brain was using.
These two paradigms were completely different: they were trying to solve different problems, and they used completely different methods and different kinds of mathematics. Back then, it wasn’t at all clear which was going to be the winning paradigm. It’s still not clear to some people today.
What was interesting was that some of the people most associated with logic actually believed in the neural net paradigm. The biggest examples are John von Neumann and Alan Turing, who both thought that big networks of simulated neurons were a good way to study intelligence and figure out how those things work. However, the dominant approach in AI was symbol processing inspired by logic. In logic, you take symbol strings and alter them to arrive at new symbol strings, and people thought that must be how reasoning works.
They thought neural nets were far too low-level, and that they were all about implementation, just like how transistors are the implementation layer in a computer. They didn’t think you could understand intelligence by looking at how the brain is implemented, they thought you could only understand it by looking at intelligence in itself, and that’s what the conventional AI approach was.
I think it was disastrously wrong, as we’re now seeing. The success of deep learning is showing that the neural net paradigm is actually far more successful than the logic-based paradigm, but back then in the 1970s, that was not what people thought.
MARTIN FORD: I’ve seen a lot of articles in the press suggesting deep learning is being overhyped, and this hype could lead to disappointment and then less investment, and so forth. I’ve even seen the phrase “AI Winter” being used. Is that a real fear? Is this potentially a dead end, or do you think that neural networks are the future of AI?
GEOFFREY HINTON: In the past when AI has been overhyped—including backpropagation in the 1980s—people were expecting it to do great things, and it didn’t actually do things as great as they hoped. Today, it’s already done great things, so it can’t possibly all be just hype. It’s how your cell phone recognizes speech, it’s how a computer can recognize things in photos, and it’s how Google does machine translation. Hype means you’re making big promises, and you’re not going to live up to them, but if you’ve already achieved them, that’s clearly not hype.
I occasionally see an advertisement on the web that says it’s going to be a 19.9 trillion-dollar industry. That seems like rather a big number, and that might be hype, but the idea that it’s a multi-billion-dollar industry clearly isn’t hype, because multiple people have put billions of dollars into it and it’s worked for them.
MARTIN FORD: Do you believe the best strategy going forward is to continue to invest exclusively in neural networks? Some people still believe in symbolic AI, and they think there’s potentially a need for a hybrid approach that incorporates both deep learning and more traditional approaches. Would you be open to that, or do you think the field should focus only on neural networks?
GEOFFREY HINTON: I think big vectors of neural activities interacting with each other is how the brain works, and it’s how AI is going to work. We should definitely try and figure out how the brain does reasoning, but I think that’s going to come fairly late compared with other things.
I don’t believe hybrid systems are the answer. Let’s use the car industry as an analogy. There are some good things about a petrol engine, like you can carry a lot of energy in a small tank, but there are also some really bad things about petrol engines. Then there are electric motors, which have a lot to be said in their favor compared with petrol engines. Some people in the car industry agreed that electric motors were making progress and then said they’d make a hybrid system and use the electric motor to inject the petrol into the engine. That’s how people in conventional AI are thinking. They have to admit that deep learning is doing amazing things, and they want to use deep learning as a kind of low-level servant to provide them with what they need to make their symbolic reasoning work. It’s just an attempt to hang on to the view they already have, without really comprehending that they’re being swept away.
MARTIN FORD: Thinking more in terms of the future of the field, I know your latest project is something you’re calling Capsules, which I believe is inspired by the columns in the brain. Do you feel that it’s important to study the brain and be informed by that, and to incorporate those insights into what you’re doing with neural networks?
GEOFFREY HINTON: Capsules is a combination of half a dozen different ideas, and it’s complicated and speculative. So far, it’s had some small successes, but it’s not guaranteed to work. It’s probably too early to talk about that in detail, but yes, it is inspired by the brain.
When people talk about using neuroscience in neural networks, most people have a very naive idea of science. If you’re trying to understand the brain, there’s going to be some basic principles, and there’s going to be a whole lot of details. What we’re after is the basic principles, and we expect the details all to be very different if we use different kinds of hardware. The hardware we have in graphics processor units (GPUs) is very different from the hardware in the brain, and one might expect lots of differences, but we can still look for principles. An example of a principle is that most of the knowledge in your brain comes from learning, it doesn’t come from people telling you facts that you then store as facts.
With conventional AI, people thought that you have this big database of facts. You also have some rules of inference. If I want to give you some knowledge, what I do is simply express one of these facts in some language and then transplant it into your head, and now you have the knowledge. That’s completely different from what happens in neural networks: You have a whole lot of parameters in your head, that is, weights of connections between neurons, and I have a whole lot of weights of connections between the neurons in my head, and there’s no way that you can give me your connection strengths. Anyway, they wouldn’t be any use to me because my neural network’s not exactly the same as yours. What you have to do is somehow convey information about how you are working so that I can work the same way, and you do that by giving me examples of inputs and outputs.
For example, if you look at a tweet from Donald Trump, it’s a big mistake to think that what Trump is doing is conveying facts. That’s not what he’s doing. What Trump is doing is saying that given a particular situation, here’s a way you might choose to respond. A Trump follower can then see the situation, they can see how Trump thinks they ought to respond, and they can learn to respond the same way as Trump. It’s not that some proposition is being conveyed from Trump to the follower, it’s that a way of reacting to things has been conveyed by example. That’s very different from a system that has a big store of facts, and you can copy facts from one system to another.
MARTIN FORD: Is it true that the vast majority of applications of deep learning rely heavily on labeled data, or what’s called supervised learning, and that we still need to solve unsupervised learning?
GEOFFREY HINTON: That’s not entirely true. There’s a lot of reliance on labeled data, but there are some subtleties in what counts as labeled data. For example, if I give you a big string of text and I ask you to try and predict the next word, then I’m using the next word as a label of what the right answer is, given the previous words. In that sense, it’s labeled, but I didn’t need an extra label over and above the data. If I give you an image and you want to recognize cats, then I need to give you a label “cat,” and the label “cat” is not part of the image. I’m having to create these extra labels, and that’s hard work.
If I’m just trying to predict what happens next, that’s supervised learning because what happens next acts as the label, but I don’t need to add extra labels. There’s this thing in between unlabeled data and labeled data, which is predicting what comes next.
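A small illustration of that point, with an arbitrary sentence and context window: when you build training pairs from raw text, the “label” for each example is simply the word that actually came next, so no human labeling is needed.

text = "the quick brown fox jumps over the lazy dog".split()
context_size = 3

# Every (context, next word) pair is a training example, and the label is
# simply the word that actually followed in the raw text.
pairs = [(tuple(text[i:i + context_size]), text[i + context_size])
         for i in range(len(text) - context_size)]

for context, label in pairs:
    print(context, "->", label)
# e.g. ('the', 'quick', 'brown') -> fox : the label came for free from the data.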
MARTIN FORD: If you look at the way a child learns, though, it’s mostly wandering around the environment and learning in a very unsupervised way.
GEOFFREY HINTON: Going back to what I just said, the child is wandering around the environment trying to predict what happens next. Then when what happens next comes along, that event is labeled to tell it whether it got it right or not. The point is, with both those terms, “supervised” and “unsupervised,” it’s not clear how you apply them to predicting what happens next.
There’s a nice clear case of supervised learning, which is that I give you an image and I give you the label “cat,” then you have to say it’s a cat, then there’s a nice clear case of unsupervised learning, which is if I give you a bunch of images, and you have to build representations of what’s going on in the images. Finally, there’s something that doesn’t fall simply into either camp, which is if I give you a sequence of images and you have to predict the next image. It’s not clear in that case whether you should call that supervised learning or unsupervised learning, and that causes a lot of confusion.
MARTIN FORD: Would you view solving a general form of unsupervised learning as being one of the primary obstacles that needs to be overcome?
GEOFFREY HINTON: Yes. But in that sense, one form of unsupervised learning is predicting what happens next, and my point is that you can apply supervised learning algorithms to do that.
MARTIN FORD: What do you think about AGI, and how would you define that? I would take it to mean human-level artificial intelligence, namely an AI that can reason in a general way, like a human. Is that your definition, or would you say it’s something else?
GEOFFREY HINTON: I’m happy with that definition, but I think people have various assumptions of what the future’s going to look like. People think that we’re going to get individual AIs that get smarter and smarter, but I think there are two things wrong with that picture. One is that deep learning, or neural networks, are going to get much better than us at some things, while they’re still quite a lot worse than us at other things. It’s not like they’re going to get uniformly better at everything. They’re going to be much better, for example, at interpreting medical images, while they’re still a whole lot worse at reasoning about them. In that sense, it’s not going to be uniform.
The second thing that’s wrong is that people always think about it as individual AIs, and they ignore the social aspect of it. Just for pure computational reasons, making very advanced intelligence is going to involve making communities of intelligent systems because a community can see much more data than an individual system. If it’s all a question of seeing a lot of data, then we’re going to have to distribute that data across lots of different intelligent systems and have them communicate with one another so that between them, as a community, they can learn from a huge amount of data. That means that in the future, the community aspect of it is going to be essential.
MARTIN FORD: Do you envision it as being an emergent property of connected intelligences on the internet?
GEOFFREY HINTON: No, it’s the same with people. The reason that you know most of what you know is not because you yourself extracted that information from data, it’s because other people, over many years, have extracted information from data. They then gave you training experiences that allowed you to get to the same understanding without having to do the raw extraction from data. I think it’ll be like that with artificial intelligence too.
MARTIN FORD: Do you think AGI, whether it’s an individual system or a group of systems that interact, is feasible?
GEOFFREY HINTON: Oh, yes. I mean OpenAI already has something that plays quite sophisticated computer games as a team.
MARTIN FORD: When do you think it might be feasible for an artificial intelligence, or a group of AIs that come together, to have the same reasoning, intelligence, and capability as a human being?
GEOFFREY HINTON: If you go for reasoning, I think that’s going to be one of the things we get really good at later on, but it’s going to be quite a long time before big neural networks are really as good as people at reasoning. That being said, they’ll be better at all sorts of other things before we get to that point.
MARTIN FORD: What about for a holistic AGI, though, where a computer system’s intelligence is as good as a person?
GEOFFREY HINTON: I think there’s a presupposition that the way AIs can develop is by making individuals that are general-purpose robots like you see on Star Trek. If your question is, “When are we going to get a Commander Data?”, then I don’t think that’s how things are going to develop. I don’t think we’re going to get single, general-purpose things like that. I also think, in terms of general reasoning capacity, it’s not going to happen for quite a long time.
MARTIN FORD: Think of it in terms of passing the Turing test, and not for five minutes but for two hours, so that you can have a wide-ranging conversation that’s as good as a human being. Is that feasible, whether it’s one system or some community of systems?
GEOFFREY HINTON: I think there’s a reasonable probability that it will happen somewhere between 10 and 100 years from now. I think there’s a very small probability it’ll happen before the end of the next decade, and I think there’s also a big probability that humanity gets wiped out by other things before those 100 years have passed.
MARTIN FORD: Do you mean through other existential threats like a nuclear war or a plague?
GEOFFREY HINTON: Yes, I think so. In other words, I think there are two existential threats that are much bigger than AI. One is global nuclear war, and the other is a disgruntled graduate student in a molecular biology lab making a virus that’s extremely contagious, extremely lethal, and has a very long incubation time. I think that’s what people should be worried about, not ultra-intelligent systems.
MARTIN FORD: Some people, such as Demis Hassabis at DeepMind, do believe that they can build the kind of system that you’re saying you don’t think is going to come into existence. How do you view that? Do you think that it is a futile task?
GEOFFREY HINTON: No, I view that as Demis and me having different predictions about the future.
MARTIN FORD: Let’s talk about the potential risks of AI. One particular challenge that I’ve written about is the potential impact on the job market and the economy. Do you think that all of this could cause a new Industrial Revolution and completely transform the job market? If so, is that something we need to worry about, or is that another thing that’s perhaps overhyped?
GEOFFREY HINTON: If you can dramatically increase productivity and make more goodies to go around, that should be a good thing. Whether or not it turns out to be a good thing depends entirely on the social system, and doesn’t depend at all on the technology. People are looking at the technology as if the technological advances are a problem. The problem is in the social systems, and whether we’re going to have a social system that shares fairly, or one that focuses all the improvement on the 1% and treats the rest of the people like dirt. That’s nothing to do with technology.
MARTIN FORD: That problem comes about, though, because a lot of jobs could be eliminated—in particular, jobs that are predictable and easily automated. One social response to that is a basic income. Is that something that you agree with?
GEOFFREY HINTON: Yes, I think a basic income is a very sensible idea.
MARTIN FORD: Do you think, then, that policy responses are required to address this? Some people take a view that we should just let it play out, but that’s perhaps irresponsible.
GEOFFREY HINTON: I moved to Canada because it has a higher taxation rate and because I think taxes done right are good things. What governments ought to do is put mechanisms in place so that when people act in their own self-interest, it helps everybody. High taxation is one such mechanism: when people get rich, everybody else gets helped by the taxes. I certainly agree that there’s a lot of work to be done in making sure that AI benefits everybody.
MARTIN FORD: What about some of the other risks that you would associate with AI, such as weaponization?
GEOFFREY HINTON: Yes, I am concerned by some of the things that President Putin has said recently. I think people should be very active now in trying to get the international community to treat weapons that can kill people without a person in the loop the same way as they treat chemical warfare and weapons of mass destruction.
MARTIN FORD: Would you favor some kind of a moratorium on that type of research and development?
GEOFFREY HINTON: You’re not going to get a moratorium on that type of research, just as you haven’t had a moratorium on the development of nerve agents, but you do have international mechanisms in place that have stopped them being widely used.
MARTIN FORD: What about other risks, beyond the military weapon use? Are there other issues, like privacy and transparency?
GEOFFREY HINTON: I think using it to manipulate elections and to manipulate voters is worrying. Cambridge Analytica was set up by Bob Mercer who was a machine learning person, and you’ve seen that Cambridge Analytica did a lot of damage. We have to take that seriously.
MARTIN FORD: Do you think that there’s a place for regulation?
GEOFFREY HINTON: Yes, lots of regulation. It’s a very interesting issue, but I’m not an expert on it, so I don’t have much to offer.
MARTIN FORD: What about the global arms race in general AI? Do you think it’s important that one country doesn’t get too far ahead of the others?
GEOFFREY HINTON: What you’re talking about is global politics. For a long time, Britain was a dominant nation, and they didn’t behave very well, and then it was America, and they didn’t behave very well, and if it becomes the Chinese, I don’t expect them to behave very well.
MARTIN FORD: Should we have some form of industrial policy? Should the United States and other Western governments focus on AI and make it a national priority?
GEOFFREY HINTON: There are going to be huge technological developments, and countries would be crazy not to try and keep up with that, so obviously, I think there should be a lot of investment in it. That seems common sense to me.
MARTIN FORD: Overall, are you optimistic about all of this? Do you think that the rewards from AI are going to outweigh the downsides?
GEOFFREY HINTON: I hope the rewards will outweigh the downsides, but I don’t know whether they will, and that’s an issue with social systems, not with the technology.
MARTIN FORD: There’s an enormous talent shortage in AI, and everyone’s hiring. Is there any advice you would give to a young person who wants to get into this field, anything that might help attract more people and enable them to become experts in AI and deep learning?
GEOFFREY HINTON: I’m worried that there may not be enough people who are critical of the basics. The idea of Capsules is to say, maybe some of the basic ways we’re doing things aren’t the best way of doing things, and we should cast a wider net. We should think about alternatives to some of the very basic assumptions we’re making. The one piece of advice I give people is that if you have intuitions that what people are doing is wrong and that there could be something better, you should follow your intuitions.
You’re quite likely to be wrong, but unless people follow the intuitions when they have them about how to change things radically, we’re going to get stuck. One worry is that I think the most fertile source of genuinely new ideas is graduate students being well advised in a university. They have the freedom to come up with genuinely new ideas, and they learn enough so that they’re not just repeating history, and we need to preserve that. People doing a master’s degree and then going straight into the industry aren’t going to come up with radically new ideas. I think you need to sit and think for a few years.
MARTIN FORD: There seems to be a hub of deep learning coalescing in Canada. Is that just random, or is there something special about Canada that helped with that?
GEOFFREY HINTON: The Canadian Institute for Advanced Research (CIFAR) provided funding for basic research in high-risk areas, and that was very important. There was also a lot of good luck in that both Yann LeCun, who was briefly my postdoc, and Yoshua Bengio were also in Canada. The three of us could form a collaboration that was very fruitful, and CIFAR funded that collaboration. This was at a time when all of us would have been a bit isolated in a fairly hostile environment (the environment for deep learning was fairly hostile until quite recently), so it was very helpful to have this funding that allowed us to spend quite a lot of time with each other in small meetings, where we could really share unpublished ideas.
MARTIN FORD: So, it was a strategic investment on the part of the Canadian government to keep deep learning alive?
GEOFFREY HINTON: Yes. Basically, the Canadian government is significantly investing in advanced deep learning by spending half a million dollars a year, which is pretty efficient for something that’s going to turn into a multi-billion-dollar industry.
MARTIN FORD: Speaking of Canadians, do you have any interaction with your fellow faculty member, Jordan Peterson? It seems like there’s all kinds of disruption coming out of the University of Toronto...
GEOFFREY HINTON: Ha! Well, all I’ll say about that is that he’s someone who doesn’t know when to keep his mouth shut.
GEOFFREY HINTON received his undergraduate degree from King’s College, Cambridge, and his PhD in Artificial Intelligence from the University of Edinburgh in 1978. After five years as a faculty member at Carnegie Mellon University, he became a fellow of the Canadian Institute for Advanced Research and moved to the Department of Computer Science at the University of Toronto, where he is now an Emeritus Distinguished Professor. He is also a Vice President & Engineering Fellow at Google and Chief Scientific Advisor of the Vector Institute for Artificial Intelligence.
Geoff was one of the researchers who introduced the backpropagation algorithm and the first to use backpropagation for learning word embeddings. His other contributions to neural network research include Boltzmann machines, distributed representations, time-delay neural nets, mixtures of experts, variational learning and deep learning. His research group in Toronto made seminal breakthroughs in deep learning that revolutionized speech recognition and object classification.
Geoff is a fellow of the UK Royal Society, a foreign member of the US National Academy of Engineering, and a foreign member of the American Academy of Arts and Sciences. His awards include the David E. Rumelhart Prize, the IJCAI Award for Research Excellence, the Killam Prize for Engineering, the IEEE Frank Rosenblatt Medal, the IEEE James Clerk Maxwell Gold Medal, the NEC C&C Award, the BBVA Award, and the NSERC Herzberg Gold Medal, which is Canada’s top award in science and engineering.