You may be wondering why machine learning took so long to catch on. After all, Arthur Samuel created his revolutionary checkers program in 1959. Machine learning was poised to become the dominant form of artificial intelligence. It had the wind at its back.
What happened is that machine learning took a backseat to other innovations such as the symbolic approach. Not until the late 1980s and early 1990s did researchers start thinking again about machine learning.
The rise and fall and rise again of machine learning is both sad and interesting. It shows how a small handful of researchers were instrumental in building out the field.
In 1958, Cornell professor Frank Rosenblatt created an early version of an artificial neural network. Its basic units, which he called perceptrons, played the role that nodes or artificial neurons play in later networks, and he tied them together to create a complex form of machine intelligence.
Rosenblatt thought that these perceptrons were the most promising path to AI. He created a machine called the Mark I Perceptron. It tied together thousands of these perceptrons into a neural network. It had small cameras and was designed to learn how to tell the difference between two images. Unfortunately, it took thousands of tries, and even then the Mark I had a hard time distinguishing even basic images.
While Rosenblatt was working on his Mark I Perceptron, MIT professor Marvin Minsky was pushing hard for a symbolic approach. Minsky and Rosenblatt debated passionately about which was the better approach to AI. The debates were almost like family arguments. They had attended the same high school and had known each other for decades.
In 1969, Minsky co-authored a book with Seymour Papert called Perceptrons: An Introduction to Computational Geometry. In it he argued decisively against Rosenblatt’s perceptron approach to AI, proving that a single-layer perceptron cannot compute some simple logical functions. Sadly, a few years after the book was published, Rosenblatt died in a boating accident. Without Rosenblatt to defend perceptrons, much of the funding for this approach dried up.
Minsky later dedicated the work to his one-time rival, but it was too late. Perceptrons and artificial neural networks languished for nearly a decade.
One reason that Rosenblatt’s Mark I Perceptron fell short is that it did not include a hidden layer, a key component that enables artificial neural networks to solve more challenging problems.
Without a hidden layer, Rosenblatt’s perceptron was limited to solving linear problems: the two groups it was asked to tell apart had to be separable by a straight line. A machine of this kind classifies by drawing a straight line between the two groups, for example with dogs on one side of the line and cats on the other.
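To make the straight-line idea concrete, here is a minimal sketch of a single perceptron that learns to separate two groups. It is an illustration only, not a reconstruction of Rosenblatt’s machine: the two features (body weight and ear length), the sample values, the learning rate, and the function names are all made up for this example.

```python
# Minimal perceptron sketch: learns a straight line separating two groups.
# All data, feature names, and settings here are hypothetical.

def predict(weights, bias, point):
    """Return 1 if the point falls on the positive side of the line, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, point))
    return 1 if total > 0 else 0

def train_perceptron(samples, labels, epochs=100, learning_rate=0.1):
    """Apply the perceptron learning rule until a full pass makes no mistakes."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for point, label in zip(samples, labels):
            error = label - predict(weights, bias, point)
            if error != 0:
                mistakes += 1
                # Nudge the line toward the misclassified point.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, point)]
                bias += learning_rate * error
        if mistakes == 0:
            break
    return weights, bias

# Made-up features: (body weight in kg, ear length in cm); 1 = dog, 0 = cat.
samples = [(30.0, 12.0), (25.0, 10.0), (35.0, 14.0),
           (4.0, 6.0), (5.0, 7.0), (3.5, 5.5)]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, p) for p in samples]
print(predictions)
```

Because these two made-up groups can be split by a straight line, the learning rule settles on weights that put every dog on one side and every cat on the other. If the groups could not be separated this way, the loop would simply run out of epochs without ever reaching a pass with zero mistakes.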
A hidden layer enables the network to work on nonlinear problems. So if you wanted to determine different breeds of dogs, you could have each layer break down the problem into different outputs. The first layer could look at the nose. The second layer could look at the eyes. Each of these layers could estimate the probability that your dog belongs to a certain breed.
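A small logical example shows what a hidden layer buys you. The sketch below uses exclusive-or (XOR), a function chosen here purely for illustration because no single straight line separates its outputs. The weights are picked by hand rather than learned, and the names `unit` and `xor_network` are hypothetical.

```python
# A two-layer network of threshold units that computes XOR.
# Weights and biases are hand-picked for illustration, not learned.

def step(value):
    """Threshold activation: fire (1) only when the input is positive."""
    return 1 if value > 0 else 0

def unit(inputs, weights, bias):
    """One perceptron-style unit: weighted sum followed by a threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_network(a, b):
    # Hidden layer: two units, each drawing its own straight line.
    h_or = unit((a, b), (1, 1), -0.5)    # fires if a OR b is 1
    h_and = unit((a, b), (1, 1), -1.5)   # fires only if a AND b are both 1
    # Output unit: fires when OR is on but AND is off.
    return unit((h_or, h_and), (1, -1), -0.5)

results = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, output in results.items():
    print(inputs, output)
```

Each hidden unit still draws only a straight line, but the output unit combines the two lines into a region that no single line could describe.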
One of the main challenges with this multilayer neural network was that it was difficult to get each layer to teach what it learned to the next layer. In the mid-1980s, Carnegie Mellon professor Geoff Hinton showed how multilayer networks of neurons could be trained efficiently. He introduced a new way to train each hidden layer so that the network could accumulate more knowledge as information passed through it.
This addition enabled his artificial neural network to tackle much more complicated challenges. However, these early artificial neural networks continued to struggle; they were slow, having to review a problem several times before becoming “smart” enough to solve it.
Later, in the 1990s, Hinton started working in a new field called deep learning, an approach that includes many more hidden layers between the input and output layers. The added layers give the artificial neural network a greater capacity to learn. Deep learning also relies on training techniques such as backpropagation, which passes the error measured at the output backward through the network so that each node can adjust its connections.
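The idea behind backpropagation can be sketched in a few dozen lines. This is a toy illustration under assumed settings, not the code used by any of the researchers named here: the network has one hidden layer of three sigmoid units, the training data is the XOR truth table, and the learning rate, random seed, and number of steps are arbitrary choices.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Return the hidden activations and the network output for one input."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def mean_squared_error(params, data):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)

def train_step(params, data, learning_rate):
    """One full-batch gradient descent step computed with backpropagation."""
    w_hidden, b_hidden, w_out, b_out = params
    new_w_hidden = [list(ws) for ws in w_hidden]
    new_b_hidden = list(b_hidden)
    new_w_out = list(w_out)
    new_b_out = b_out
    scale = learning_rate / len(data)
    for x, target in data:
        hidden, output = forward(params, x)
        # Error signal at the output unit.
        delta_out = 2 * (output - target) * output * (1 - output)
        for j, h in enumerate(hidden):
            # Send the error backward through the old output weight to unit j.
            delta_hidden = delta_out * w_out[j] * h * (1 - h)
            new_w_out[j] -= scale * delta_out * h
            new_b_hidden[j] -= scale * delta_hidden
            for i, xi in enumerate(x):
                new_w_hidden[j][i] -= scale * delta_hidden * xi
        new_b_out -= scale * delta_out
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out

# Assumed setup: 2 inputs, 3 hidden units, 1 output, fixed seed.
rng = random.Random(0)
n_hidden = 3
params = (
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    rng.uniform(-1, 1),
)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

initial_loss = mean_squared_error(params, data)
for _ in range(2000):
    params = train_step(params, data, learning_rate=0.5)
final_loss = mean_squared_error(params, data)
print(initial_loss, final_loss)
```

The key line is the one that computes `delta_hidden`: the hidden unit's share of the blame is the output error scaled by the weight that connects the two. Repeating that step layer by layer is what lets a network with many hidden layers be trained from a single error measurement at the output. Note how many passes over the same four examples the loop makes, which echoes the slowness of early networks.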
Deep learning networks also use clustering to help identify patterns. Clustering enables the network to create categories and then sort the new information into these categories. For example, suppose you wanted to use a deep learning network to distinguish cats from other animals. You could load a million photos of various animals into the network, and the network would cluster them into groups of photos that showed animals with similar characteristics. Then, each time you loaded a photo into the network, it would add the photo to the relevant group or discard it as not being a photo of an animal.
It can do this without actually knowing anything about cats. In fact, it won’t even understand the label “cats” until a human gives it this information. Instead it will just group images with similar characteristics. There might be a grouping of pixels that looks like whiskers. Or even a grouping of pixels that looks like a tail. When the network sees these groupings it will cluster them together. Then a human might come in and label this cluster as “cats.”
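The group-first, label-later workflow can be sketched with a simple clustering routine. The text does not say which clustering method a deep learning network would use, so this example substitutes plain k-means as a stand-in. The two-number "image features" (a whisker score and a tail score), the starting centroids, and the label names are all hypothetical.

```python
# Toy k-means clustering: group first, let a human name the groups afterward.
# Feature values and names are invented for illustration.

def distance_sq(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: distance_sq(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to groups and re-centering groups."""
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

# Made-up features per photo: (whisker score, tail score).
photos = [(0.90, 0.80), (0.80, 0.90), (0.85, 0.75),
          (0.10, 0.20), (0.20, 0.10), (0.15, 0.25)]

# Start from two of the photos; the algorithm has no labels at this point.
centroids, groups = k_means(photos, [photos[0], photos[3]])

# Only now does a human look at the clusters and name them.
cluster_names = {0: "cats", 1: "not cats"}

new_photo = (0.88, 0.82)
new_label = cluster_names[nearest(new_photo, centroids)]
print(new_label)
```

Notice that `k_means` never sees the word "cats". The names in `cluster_names` are attached only after the groups already exist, which mirrors the human labeling step described above.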