
I have worked with several organizations over the years helping them implement machine learning, often after failed attempts to do so on their own. It is no surprise that the organizations that succeed generally do everything right and those that fail often do so as a result of making common mistakes. In this post, I present machine learning dos and don’ts to increase your chances of achieving a successful launch of your machine learning initiative.

Do Start by Asking Relevant, Compelling Questions

Before you even start to introduce machine learning to your organization, you need to find a way to connect your organization’s business needs to machine-learning technology. Otherwise, you are likely to put a program in place and assemble a team with the requisite technical expertise only to find them spending all their time just playing around with the technology.

To avoid this common mistake, take the following steps:

  1. Educate C-level executives and all managers on the topic of machine learning with an emphasis on use cases that are relevant to the industry in which your organization operates. Case studies are a great way to get leadership thinking about practical applications of machine learning in the organization.
  2. Schedule regular meetings, encouraging participants to bring to the meeting relevant and compelling questions they are struggling to answer, problems they are trying to solve, or business insights that would make the organization more competitive. 
  3. Create a list of problems, questions, and desired insights; prioritize the items on the list; and then consider which technology would be the most effective for addressing each item. Flag items on the list which are likely to benefit from machine learning.
  4. Create another list of processes or procedures that may benefit from machine-learning-driven automation — tasks such as preventing unauthorized access to the organization’s information system or weeding out unqualified job applicants.

Keep in mind that the best technology isn't necessarily machine learning. Your organization may be able to answer most questions and solve most problems and gain valuable insights with the use of a data warehouse and good business intelligence (BI) software. It may not need a dedicated machine-learning team.

Don't Mix Training Data with Test Data

Machine learning often involves supervised learning — feeding the machine labeled data, so the machine can learn the connection between the labels and the data inputs. A common mistake is to mix some of the training data into the test data, which is often tempting when the availability of relevant data is limited. To avoid this mistake, before you engage in supervised learning, create two separate data sets: a training data set, which contains inputs paired with their correct labels, and a test data set, which contains inputs the machine has never seen, reserved for evaluating how well the machine learned.

If, after training the machine, you mix some of your training data in with your test data, you won't have a clear picture of how well the machine performed on the test. It would be like giving students a sheet of paper with some of the test questions and their correct answers just before they take the test. The test results wouldn't accurately represent what they had learned or where they were struggling.
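To make this concrete, here is a minimal sketch of holding out test data before training; it uses scikit-learn's train_test_split, and the features, labels, and logistic regression model are synthetic stand-ins for whatever your project actually uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 1,000 examples, 10 features, binary labels.
rng = np.random.default_rng(42)
features = rng.normal(size=(1000, 10))
labels = (features[:, 0] + features[:, 1] > 0).astype(int)

# Hold out 20% of the data for testing; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
print("Accuracy on unseen test data:", model.score(X_test, y_test))
```

Because the model never sees the held-out examples during training, the resulting score is an honest measure of what it learned.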

Do Know Your Algorithms and Functions

Algorithms and functions are the engines that drive machine learning, and data is the fuel. Although the machine does the learning and ultimately creates the model that the computer follows to perform the desired task, it is up to you to construct a “brain” that enables the learning process. The building blocks you have to work with are algorithms (the step-by-step procedures, such as gradient descent and backpropagation, that drive the learning) and functions (the mathematical operations, such as activation and cost functions, that the network performs on its inputs).

When you are building a machine that can learn, you need to be familiar with a wide variety of algorithms and functions, so you will know which ones to choose and how to arrange them.

Do Manage Expectations

After training a machine, you may be tempted to show it off — to demonstrate that your model can actually do something useful and perhaps even amazing. You collect your test data and schedule a presentation to demonstrate the power and precision of your new machine-learning model.

Whoa! This irrational exuberance can end in disaster, maybe not during the presentation but afterward, when someone in your audience uses the model and it misses the mark.

You can avoid the potential embarrassment by running your new model on test data first, so the machine can adjust the model, if necessary, to improve its accuracy. Several rounds of testing (with different test data) and adjustments may be required before your model is ready for prime time.

Of course, there are other pitfalls that you would be wise to avoid when starting out with machine learning, but by steering clear of the common pitfalls covered in this post, you will be far ahead of the game!

In my previous article, Neural Network Backpropagation, I explained the basics of how an artificial neural network learns. The cost function calculates how wrong the network's answer is, and then backpropagation goes back through the various layers of the neural network changing weights and biases to adjust the strengths of the connections between neurons and the calculations within certain neurons. I compare the process to that of turning the dials on a complex sound system to optimize the quality of sound being emitted by the speakers.

In that article, I focus mostly on how fine-tuning a neural network with backpropagation improves its accuracy — on arriving at the correct answer. Although coming up with the correct answer is certainly the top priority, backpropagation must also adjust the weights and biases to reduce the output that’s driving the wrong answers.

Learning as an Iterative Process

Keep in mind that the artificial neural network makes adjustments for every item in the training data set. In my previous post, I used an example of an artificial neural network for identifying dog breeds. To properly train the network to identify ten different dog breeds, the initial training data set would need at least one picture of a dog of each breed: German shepherd, Labrador retriever, Rottweiler, beagle, bulldog, golden retriever, Great Dane, poodle, Doberman, and dachshund. 

This is a very small dataset, but what is important is that it contains at least one picture of each breed. If you fed the network only the picture of the beagle, you would end up with a neural network that classifies every image of a dog as a beagle. You need to feed the network more pictures — pictures of dogs of different breeds and pictures of different dogs of the same breed.

You would follow up the initial training with a test data set — a separate collection of at least one picture of a dog of each breed. During this test, you would feed a picture of a dog to the neural network without telling it the dog's breed and then check its answer. If the answer was wrong, you would correct the neural network, and it would use backpropagation to make additional adjustments. You would then feed it the next picture, and continue the process, working through the entire set of test data.

Even after testing, the neural network continues to learn as you feed it larger volumes of data. With every picture you feed into the network, it uses backpropagation to make tiny adjustments to weights and biases, fine-tuning its ability to distinguish between different breeds.
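In code, this iterative learning is simply a loop over the training examples, repeated for several passes (epochs). The sketch below is not a full image classifier; it trains a single-layer softmax classifier on synthetic stand-in data, making one small weight-and-bias adjustment per example.

```python
import numpy as np

# Synthetic stand-in for the dog-breed example: 3 classes, 4 features per "picture".
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.digitize(X[:, 0], bins=[-0.5, 0.5])  # labels 0, 1, or 2

W = rng.normal(scale=0.01, size=(4, 3))  # small random starting weights
b = np.zeros(3)                          # biases start at zero

for epoch in range(100):                 # repeated passes over the training set
    for xi, yi in zip(X, y):             # one tiny adjustment per example
        logits = xi @ W + b
        p = np.exp(logits - logits.max())
        p /= p.sum()                     # the network's probability for each class
        grad = p.copy()
        grad[yi] -= 1.0                  # how far each output is from the truth
        W -= 0.05 * np.outer(xi, grad)   # nudge weights toward lower cost
        b -= 0.05 * grad                 # nudge biases toward lower cost

accuracy = np.mean(np.argmax(X @ W + b, axis=1) == y)
print("training accuracy:", accuracy)
```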

Overall Improvement

While the example from my previous post focused on improving the network's ability to identify a picture of a beagle, you want the network to achieve the same degree of accuracy for each and every breed. To achieve this feat, the neural network needs to make some trade-offs. 

As you feed the network a diversity of pictures, it becomes a little less accurate in identifying beagles, so that it can do a better job identifying pictures of Labrador retrievers, Rottweilers, German shepherds, poodles, and so on. Your network tries to find the optimal weights and biases to minimize the cost regardless of the breed shown in the picture. The settings may not be the best for any one dog, but having well-balanced settings enables the network to make fewer mistakes, resulting in more accurate classification among different breeds of dogs overall.

The cost function, gradient descent, and backpropagation all work together to make this magic happen.

Creating a Model

Although machine learning is a continuous process of optimizing accuracy, the goal of training and testing a neural network is to create a model — the product of the entire machine learning process that can be used to make predictions. Creating a model is actually a joint effort of humans and machines. A human being builds the artificial neural network and establishes hyperparameters to set the stage for the learning process; the machine then takes over, adjusting parameters, including weights and biases, to develop the desired skill.

As you begin to build your own neural networks, keep in mind that the process often involves some experimentation. Your first attempts may involve small experiments to test outcomes followed by adjustments to hyperparameters. In a sense, both you and the machine are involved in a learning process.

In my previous article, The Neural Network Cost Function, I describe the cost function and highlight the essential role it plays in machine learning. With the cost function, the machine pays a price for every mistake it makes. This provides the machine with a sort of incentive or motivation to learn; the machine's goal is to minimize the cost by becoming increasingly accurate.

Unfortunately, the cost function tells the network only how wrong it is; it doesn't provide a way for the network to become less wrong. This is where gradient descent comes into play. Gradient descent is an optimization algorithm that minimizes the cost by repeatedly and gradually moving the weights in the direction opposite the slope of the cost curve, that is, downhill rather than uphill.

During the learning process, the neural network adjusts the weights of the connections between neurons, giving input from some neurons more or less emphasis than inputs from other neurons. This is how the machine learns. With gradient descent, the neural network adjusts the initial weights a tiny bit at a time in the direction opposite of the steepest incline. The neural network performs this adjustment iteratively, continually pushing the weight down the slope toward the point at which it can no longer be moved downhill. This point is called the local minimum and is the point at which the machine pays the lowest cost for errors because it has achieved optimum accuracy.
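To see the mechanics stripped down to one dial, here is a sketch with a made-up one-dimensional cost curve whose minimum sits at w = 3; the loop repeatedly steps the weight a small amount in the direction opposite the slope.

```python
def cost(w):
    return (w - 3.0) ** 2        # toy cost curve; its minimum is at w = 3

def slope(w):
    return 2.0 * (w - 3.0)       # derivative (slope) of the cost curve at w

w = 0.0                          # arbitrary starting weight
learning_rate = 0.1              # size of each tiny adjustment
for step in range(100):
    w -= learning_rate * slope(w)  # move opposite the slope, downhill

print(w)  # very close to 3.0, where the cost can no longer move downhill
```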

For example, suppose you are building a machine that can look at a picture of a dog and tell what breed it is. You would place a cost function at the output layer that would signal all the nodes in the hidden layer telling them how wrong the output was. The nodes in the hidden layer would then use gradient descent to move their outputs in the direction opposite of the steepest incline in an attempt to minimize the cost.

As the nodes make adjustments, they monitor the cost function to see whether the cost has increased or decreased and by how much, so they can determine whether the adjustments were beneficial or not. During this process, the neural network is learning from its mistakes. As the machine becomes more accurate and more confident in its output, the overall cost is diminished.

For example, suppose the neural network's output layer has five neurons representing five dog breeds — German shepherd, Doberman, poodle, beagle, and dachshund. The output neuron for the Doberman indicates a 40% probability the picture is of a Doberman; the German shepherd neuron is 35% sure it's a German shepherd; the poodle neuron is 25% sure it's a poodle; and the beagle and dachshund neurons each indicate a certainty of 15% the picture is of one of their breeds.

You already decided that you want the machine to be 90% certain in its analysis, so these numbers are not very good.

To improve the machine's accuracy, you can combine the cost function with gradient descent. With the cost function, the machine calculates the difference between each wrong answer and each correct answer and then averages them. So let’s say it was a picture of a Doberman. That means you want to nudge the network in a few places:

  * Nudge the Doberman output up by 0.60, from 0.40 toward the correct answer of 1.00.
  * Nudge the German shepherd output down by 0.35, from 0.35 toward 0.
  * Nudge the poodle output down by 0.25.
  * Nudge the beagle output down by 0.15.
  * Nudge the dachshund output down by 0.15.

Then you average all your nudges to get an overall sense of how accurate your network is at identifying different dog breeds:

(+0.60 – 0.35 – 0.25 – 0.15 – 0.15)/5 = –0.30/5 = –0.06

But remember this is just one training example. The machine repeats this process on numerous pictures of dogs of different breeds:

(0.01 – 0.6 – 0.32 + 0.16 – 0.25)/5 = –1.00/5 = –0.20

(0.7 – 0.3 + 0.12 – 0.05 – 0.12)/5 = 0.35/5 = 0.07
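You can check these averages with a few lines of code; the nudge values below are the ones from the three examples above.

```python
examples = [
    [+0.60, -0.35, -0.25, -0.15, -0.15],  # the Doberman picture
    [+0.01, -0.60, -0.32, +0.16, -0.25],  # a second training picture
    [+0.70, -0.30, +0.12, -0.05, -0.12],  # a third training picture
]
for nudges in examples:
    print(round(sum(nudges) / len(nudges), 2))  # -0.06, -0.2, 0.07
```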

With each iteration, the neural network calculates the cost and adjusts the weights, moving the network closer and closer to zero cost — the point at which the network has achieved optimum accuracy and you are confident in its output.

As you can see, the cost function and gradient descent make a powerful combination in machine learning, not only telling the machine when it has made a mistake and how far off it was, but also providing guidance on which direction to tune the network to increase the accuracy of its output.

Machines often learn the same way humans do — by making mistakes and paying the price for doing so. For example, when you’re first learning to drive, you merge onto the highway and are driving 55 mph in a 65 mph zone. Other drivers are beeping at you, passing you on the left and right, giving you dirty looks, and making rude gestures. You get the message and start driving the speed limit. Cars are still passing you on the left and right, and their drivers appear to be annoyed. You start driving 75 mph to blend in with the traffic. You are rewarded by feeling the excitement of driving faster and by reaching your destination more quickly. Soon, you are so comfortable driving 75 mph that you start driving 80 mph. One day, you hear a siren, and you see a state trooper’s car close behind you with a flashing red light. You get pulled over and issued a ticket for $200, so you slow it down and now routinely drive about 5 to 9 mph over the speed limit.

During this entire scenario, you learn through a process of trial and error by paying for your mistakes. You pay by being embarrassed for driving too slowly or you pay by getting pulled over and issued a warning or ticket or by getting into or causing an accident. You also learn by being rewarded, but since this article is about the cost function, I won’t get into that.

Machine Learning with the Cost Function

With machine learning, your goal is to make your machine as accurate as possible — whether the machine’s purpose is to make predictions, identify patterns in medical images, or drive a car. One way to improve accuracy in machine learning is to use a neural network cost function — a mathematical operation that compares the network’s output (the predicted answer) to the targeted output (the correct answer) to determine the accuracy of the machine. 

In other words, the cost function tells the network how wrong it was, so the network can make adjustments to be less wrong (and more right) in the future. As a result, the network pays for its mistakes and learns by trial and error. The cost is higher when the network is making bad or sloppy classifications or predictions — typically early in its training phase. 
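Many different cost functions exist; mean squared error is a common choice and is easy to sketch. The five predicted values below are hypothetical network outputs, and the target encodes the one correct answer.

```python
import numpy as np

def mse_cost(predicted, target):
    """Mean squared error: the average of the squared gaps between
    the network's answers and the correct answers."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean((predicted - target) ** 2)

# A sloppy prediction pays a high cost; a confident, correct one pays almost none.
print(mse_cost([0.40, 0.35, 0.25, 0.15, 0.15], [1, 0, 0, 0, 0]))  # ~0.118
print(mse_cost([0.95, 0.02, 0.01, 0.01, 0.01], [1, 0, 0, 0, 0]))  # ~0.0006
```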

What Does the Machine Learn?

Machines learn different lessons depending on the model. In a simple linear regression model, the machine learns the relationship between an independent variable and a dependent variable; for example, the relationship between the size of a home and its cost. With linear regression, the relationship can be graphed as a straight line.

During the learning process, the machine can adjust the model in several ways. It can move the line up or down, left or right, or change the line’s slope, so that it more accurately represents the relationship between home size and cost. The resulting model is what the machine learns. It can then use this model to predict the cost of a home when provided with the home’s size.
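As a minimal sketch of this idea, the following code fits a straight line to made-up home sizes and prices; the fitted slope and intercept are the model the machine has learned.

```python
import numpy as np

# Hypothetical training data: home sizes (square feet) and sale prices (dollars).
sizes = np.array([1200, 1500, 1800, 2100, 2400, 3000])
prices = np.array([240_000, 295_000, 348_000, 405_000, 462_000, 580_000])

# Learn the line: price = slope * size + intercept (ordinary least squares).
slope, intercept = np.polyfit(sizes, prices, deg=1)

# The fitted line is the model; use it to predict the price of a 2,000 sq ft home.
print(slope * 2000 + intercept)
```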

Cost Function Limitation

The cost function has one major limitation — it does not tell the machine what to adjust, by how much, or in which direction. It only indicates the accuracy of the output. For the machine to be able to make the necessary adjustments, the cost function must be combined with another function that provides the necessary guidance, such as gradient descent, which just happens to be the subject of my next post.

Artificial neural networks learn through a combination of functions, weights, and biases. Each neuron receives weighted inputs from the outside world or from other neurons, adds bias to the sum of the weighted inputs, and then executes a function on the total to produce an output. During the learning process, the neural network adjusts weights across the entire network to increase its overall accuracy in performing its task, such as deciding how likely it is that a certain credit card transaction is fraudulent.

Imagine weights and biases as dials on a sound system. Just as you can turn the dials to control the volume, balance, and tone to produce the desired sound quality, the machine can adjust its dials (weights and biases) to fine-tune its accuracy. (For more about functions, weights, and bias, see my previous article, Functions, Weights, and Bias in Artificial Neural Networks.)
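In code, a single neuron's computation is just a weighted sum plus a bias, passed through a function; this sketch uses the sigmoid function and made-up inputs, weights, and bias.

```python
import numpy as np

def sigmoid(total):
    return 1.0 / (1.0 + np.exp(-total))   # squashes any number into (0, 1)

def neuron(inputs, weights, bias):
    total = np.dot(inputs, weights) + bias  # sum of weighted inputs, plus bias
    return sigmoid(total)                   # function executed on the total

# Hypothetical transaction features and one neuron's current dial settings.
print(neuron(inputs=[0.8, 0.2, 0.5], weights=[1.5, -2.0, 0.7], bias=0.1))
```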

Setting Random Weights and Biases

When you’re setting up an artificial neural network, you have to start somewhere. You could start by cranking the dials all the way up or all the way down, but then you would have too much symmetry in the network, making it more difficult for the network to learn. Specifically, if neighboring nodes in the hidden layers of the neural network are connected to the same inputs and those connections have identical weights, the learning algorithm adjusts all of those weights in exactly the same way, so the neurons never differentiate and the model will be stuck — no learning will occur.

Instead, you want to assign different values to the weights — typically small values, close to zero but not zero. (By default, the bias in each neuron is set to zero. The network can then dial each bias up or down during the learning process to make additional adjustments.)

In the absence of any prior knowledge, a plausible solution is to assign totally random values to the weights, typically by drawing them from a uniform or normal (Gaussian) distribution.

For now, just think of random values as unrelated weights between zero and one but closer to zero. What’s important is that these random values provide a starting point that enables the network to adjust weights up and down to improve the artificial neural network’s accuracy. The network can also make adjustments by dialing the bias within each neuron up or down.
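Here is what that starting point might look like in NumPy; the layer sizes are hypothetical, the weights are small random values close to zero, and the biases start at zero.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical layer: 784 inputs feeding 128 hidden neurons.
weights = rng.uniform(low=0.0, high=0.1, size=(784, 128))  # small values near zero
biases = np.zeros(128)                                     # biases start at zero

print(weights.min(), weights.max())  # all values between 0 and 0.1
```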

The Difference between Deterministic and Non-Deterministic Algorithms

For an artificial neural network to learn, it requires a machine learning algorithm — a process or set of procedures that enables the machine to create a model that can process the data input in a way that achieves the network’s desired objective. Algorithms come in two types:

  * Deterministic algorithms follow the same steps every time and always produce the same output for a given input.
  * Non-deterministic algorithms incorporate randomness or approximation, so they may take different paths and arrive at different (though acceptable) outputs for the same input.

As a rule of thumb, use deterministic algorithms to solve problems with concrete answers, such as determining which route is shortest in a GPS program. Use non-deterministic algorithms when an approximate answer is good enough and too much processing power and time would be required for the computer to arrive at a more accurate answer or solution.

An artificial neural network uses a non-deterministic algorithm, so the network can experiment with different approaches and then adjust accordingly to optimize its accuracy.

What Happens During the Learning Process?

Suppose you are training an artificial neural network to distinguish among different dog breeds. As you feed your training data (pictures of dogs labeled with their breeds) into the network, it adjusts the weights and biases to identify a relationship between each picture and label (dog breed), and it begins to distinguish between different breeds. Early in training, it may be a little unsure whether the dog in a certain picture is one breed or another. It may indicate that it’s 40% sure it’s a beagle, 30% sure it’s a dachshund, 20% sure it’s a Doberman, and 10% sure it’s a German shepherd.

Suppose it is a dachshund. You correct the machine; it adjusts the weights and biases and tries again. This time, the machine indicates that it’s 80% sure it’s a dachshund and 20% sure it’s a beagle. You tell the machine it is correct, and no further adjustment is needed. (Of course, the machine may need to make further adjustments later if it makes another mistake.)

The good news is that during the machine learning process, the artificial neural network does most of the heavy lifting. It turns the dials up and down to make the necessary adjustments. You just need to make sure that you give it a good starting point by assigning random weights and that you continue to feed it relevant input to enable it to make further adjustments.

Data science, artificial intelligence (AI), and machine learning (ML) are very complex fields. Amidst this complexity, it is easy to lose sight of the fundamental challenges to executing a data science initiative. In this article, I take a step back to focus less on the inner workings of AI and ML and more on the artificial intelligence challenges that often lead to mistakes and failed attempts at weaving data science into an organization's fabric. In the process, I explain how to overcome these key challenges.

Embrace Data Science

The term "data science" is often misinterpreted. People tend to place too much emphasis on "data" and too little on "science." It is important to realize that data science is rooted in science. It is, or at least should be, exploratory. As you begin a data science program, place data science methodology at the forefront:

  1. Observe. Examine your existing data to identify any problems with the data (such as missing data, irrelevant or outdated data, and erroneous data) and to develop a deeper understanding of the data you have. 
  2. Ask interesting questions related to business goals, objectives, or outcomes. Nurture a culture of curiosity in your organization. Encourage personnel at all levels to ask questions and challenge long-held beliefs.
  3. Gather relevant data. Your organization may not have all the data it needs to answer certain questions or solve specific problems. Develop ways to capture the needed data or acquire it from external source(s).
  4. Prepare your data. Data may need to be loaded into your data warehouse or data lake, cleaned, and aggregated prior to analysis.
  5. Develop your model. This is where AI and ML come into play. Your model will extract valuable insights from the data.
  6. Evaluate and adjust the model as necessary. You may need to experiment with multiple models or versions of a model to find out what works best.
  7. Deploy the model and repeat the process. Deliver the model to the people in your organization who will use it to inform their decisions, then head back to Step 1 to continue the data science process.

Get Large Volumes of Relevant Data

Even the most basic artificial neural networks require large volumes of relevant data to enable learning. While human beings often learn from one or two exposures to new data or experiences, modern neural networks are far less efficient. They may require hundreds or thousands of relevant inputs to fine-tune the parameters (weights and biases) to the degree at which the network's performance is acceptable.

To overcome this limitation, AI experts have developed a new type of artificial neural network called a capsule network — a compact group of neurons that can extract more learning from smaller data sets. As of this writing, these networks are still very much in the experimental phase for most organizations.

Until capsule networks prove themselves or some other innovation enables neural networks to learn from smaller data sets, plan on needing a lot of high-quality, relevant data.

If you are lacking the data you need, consider obtaining data from external sources. Free data sources include government databases, such as the US Census Bureau database and the CIA World Factbook; medical databases, such as Healthdata.gov and the NHS Health and Social Care Information Centre; Amazon Web Services public datasets; Google Public Data Explorer; Google Finance; the National Climatic Data Center; The New York Times; and university data centers. Many organizations that collect data, including Acxiom, IRI, and Nielsen, make their data available for purchase. As long as you can figure out which data will be helpful, you can usually find a source.

Separate Training and Test Data

There are two approaches to machine learning — supervised and unsupervised learning. With supervised learning, you need two data sets — a training data set and a testing data set. The training data set contains inputs and labels. For example, you feed the network a picture of an elephant and tell it, "This is an elephant." Then, you feed it a picture of a giraffe and tell it, "This is a giraffe." After training, you switch to the testing data set, which contains unlabeled inputs. For example, you feed the network a picture of an elephant, and the network tells you, "It's an elephant." If the network makes a mistake, you feed it the correct answer, and it makes adjustments to improve its accuracy.

Sometimes when a data science team is unable to acquire the volume of data it needs to train its artificial neural network, the team mixes some of its training data with its test data. This workaround is a big no-no; it is the equivalent of giving students a test and providing them with the answers. In such a case, the test results would be a poor reflection of the students' knowledge. In the same way, an artificial neural network relies on quality testing to sharpen its skills.

The moral of this story is this: Don’t mix test data with training data. Keep them separate.

Carefully Choose Training and Test Data

When choosing training and test data for machine learning, select data that is representative of the task that the machine will ultimately be required to perform. If the training or test data is too easy, for example, the machine will struggle later with more challenging tasks. Imagine teaching students to multiply. Suppose you teach them multiplication tables up to 12 x 12 and then put problems on the test such as 35 x 84. They’re not going to perform very well. In the same way, training and test data should be as challenging as what the machine will ultimately be required to handle.

Also, avoid the common mistake of introducing bias when selecting data. For example, if you’re developing a model to predict how people will vote in a national election and you feed the machine training data that contains voting data only from conservative, older men living in Wyoming, your model will do a poor job of predicting the outcome.

Don't Assume Machine Learning Is the Best Tool for the Job

Machine learning is a powerful tool, but it’s not always the best tool for answering a question or solving a problem. Depending on the nature of the question or problem, other options may lead to better, faster outcomes; as noted earlier, for example, a data warehouse paired with good business intelligence (BI) software can answer many questions and surface valuable insights without machine learning.

As you introduce data science, artificial intelligence, and machine learning to your organization, remain aware of the key challenges you face, and avoid getting too wrapped up in the technologies and toolkits. Focus on areas that contribute far more to success, such as asking interesting questions and using your human brain to approach problems logically. Artificial intelligence and machine learning are powerful tools. Master the tools; do not let them master you.

Artificial intelligence and organizations are not always a great fit. While many organizations use artificial intelligence to answer specific questions and solve specific problems, they often overlook its potential as a tool for exploration and innovation — to look for patterns in data that they probably would not have noticed on their own. In these organizations, the focus is on supervised learning — training machines to recognize associations between inputs and labels or between independent variables and the dependent variable they influence. These organizations spend less time, if they spend any time at all, on unsupervised learning — feeding an artificial neural network large volumes of data to find out what the machine discovers in that data.

Observe and Question

With supervised learning, data scientists are primarily engaged in a form of programming, but instead of writing specific instructions in computer code, they develop algorithms that enable machines to learn how to perform specific tasks on their own — after a period of training and testing. Many data science teams today focus almost exclusively on toolkits and languages at the expense of data science methodology and governance.

Data science encompasses much more than merely training machines to perform specific tasks. To achieve the full potential of data science, organizations should place the emphasis on science and apply the scientific method to their data:

  1. Observe
  2. Question
  3. Research
  4. Hypothesize
  5. Experiment
  6. Test
  7. Draw conclusions
  8. Report

Note that the first step in the scientific method is to observe. This step is often overlooked by data science teams. They start using the data to drive their supervised machine learning projects before they fully understand that data.

A better approach is exploratory data analysis (EDA) — an approach to analyzing data sets that involves summarizing their main characteristics, typically through data visualizations. The purpose of EDA is to find out what the data reveals before doing any formal modeling or hypothesis testing.

Unsupervised learning is an excellent tool for conducting EDA, because it can analyze volumes of data far beyond what humans can analyze, it looks at the data objectively, and it provides a unique perspective on that data, often revealing insights that data science team members would never have thought to look for.
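A first pass at EDA can be as simple as the following pandas sketch; customers.csv is a hypothetical file standing in for whatever data set you are trying to understand.

```python
import pandas as pd

# Hypothetical data set; substitute whatever data you are observing.
df = pd.read_csv("customers.csv")

print(df.shape)           # how many rows and columns you actually have
print(df.isna().sum())    # missing values, column by column
print(df.describe())      # ranges, means, and likely outliers at a glance
df.hist(figsize=(10, 8))  # quick visual summary of every numeric column
```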

Note that the second step in the scientific method is to question. Unfortunately, many organizations disregard this step, usually because they have a deeply ingrained control culture — an environment in which leadership makes decisions and employees implement those decisions. Such organizations would be wise to change from a control culture to a culture of curiosity — one in which personnel on all levels of the organization ask interesting questions and challenge long-held beliefs.

Nurturing a Culture of Curiosity

People are naturally curious, but in some organizations, employees are discouraged from asking questions or challenging long-held beliefs. In organizations such as these, changing the culture is the first and most challenging step toward taking an exploratory approach to artificial intelligence, but it is a crucial first step. After all, without compelling questions, your organization will not be able to reap the benefits of the business insights and innovations necessary to remain competitive.

In one of my previous posts, Asking Data Science Questions, I present a couple of ways to encourage personnel to start asking interesting questions: educate leadership on machine learning use cases relevant to your industry, and schedule regular meetings at which participants bring the compelling questions they are struggling to answer.

Another way to encourage curiosity is to reward personnel for asking interesting questions and, more importantly, avoid discouraging them from doing so. Simply providing public recognition to an employee who asked a question that led to a valuable insight is often enough to encourage that employee and others to keep asking great questions.

The takeaway here is that you should avoid the temptation to look at artificial intelligence as just another project. You don’t want your data science teams merely producing reports on customer engagement, for example. You want them to also look for patterns in data that might point the way to innovative new ideas or to problems you weren’t aware of and would never think to look for.

In my previous article Artificial Neural Networks Regression and Classification, I introduced the three types of problems that machine learning is generally used to solve: classification, regression, and clustering.

In that article, I focus on solving classification and regression problems. In this article, I turn my attention to neural network clustering problems — problems that can be solved by identifying common patterns among inputs.

Clustering has numerous applications in a wide variety of fields. Later in this article, I describe a few business examples, such as segmenting customers by loyalty and deciding where to place new stores.

Recognizing the Limitations of Supervised Learning

Unlike classification and regression problems, which employ supervised learning, clustering problems rely on unsupervised learning. With supervised learning, you have clearly labeled data or categories that you are trying to match inputs to. For example, you may want to classify homes by price or classify transactions as fraudulent or honest.

Unfortunately, supervised learning is not always an option. For example, if you do not have clearly labeled data or know the categories into which you want to sort the data inputs, you cannot engage your artificial neural network in supervised learning. In other applications, you may not be interested in classifying your data into categories created by humans; instead, you want to see how your neural network clusters the data to call your attention to patterns you may never have thought to look for.

In such cases, unsupervised learning is the better choice. With unsupervised learning, you let the neural network cluster your data into different groups.

Considering Business Clustering Problems

One of the more interesting applications of clustering is its use by large retailers to decide whom to invite to their loyalty programs or when to offer promotions. With unsupervised learning, the machine may identify three clusters of customers — loyal, somewhat loyal, and not loyal. (The not loyal customers always buy from whichever retailer offers the lowest price.) Knowing these clusters, the large retailers create strategies to try to elevate somewhat loyal customers to loyal customers. Or they could invite their loyal customers to participate in special promotions.
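As a rough sketch of the idea, here is k-means clustering (a classic unsupervised algorithm, used here in place of a neural network for brevity) grouping synthetic customers into three clusters based on two made-up loyalty signals.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic customer data: annual spend and share of purchases made with us.
rng = np.random.default_rng(1)
customers = np.column_stack([
    rng.uniform(100, 5000, size=300),  # annual spend in dollars
    rng.uniform(0.0, 1.0, size=300),   # share of purchases made with this retailer
])

# Ask for three clusters; the algorithm decides where the boundaries fall.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1).fit(customers)
print(kmeans.labels_[:10])      # cluster assignment for the first ten customers
print(kmeans.cluster_centers_)  # the center of each discovered cluster
```

Whether those three clusters correspond to loyal, somewhat loyal, and not loyal customers is for a human analyst to interpret.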

Other companies use clustering to decide where to place new stores. For example, a seller of athletic footwear may feed demographic and sales data into an artificial neural network to find locations that have the highest concentration of active runners or locations where customers allocate a higher percentage of their income to outdoor recreation.

Choosing the Right Approach for the Problem You Are Trying to Solve

When you decide to use machine learning to solve a problem, what is most important is that you choose the right approach for the problem you are trying to solve. Classification is great when you know what you are looking for and can teach the machine the relationship between inputs and labels or between independent variables and a dependent variable. Clustering is a more powerful tool for gaining insight — for seeing things in a different way, a way you may never have considered — or for analyzing a high volume of unlabeled data. After all, there is much more unlabeled (unstructured) data available than there is labeled (structured) data.

When you’re trying to decide which approach to take — classification, regression, or clustering — first ask yourself what problem you’re trying to solve or what question you need to answer. Then ask yourself whether the problem or question is something that can best be addressed with classification, regression, or clustering. Finally, ask yourself whether the data you have is labeled or unlabeled. By answering these questions, you should have a clearer idea of which approach to take: classification or regression (with supervised learning) or clustering (with unsupervised learning).

Unlike human beings, who often learn for the intrinsic value of knowing something, machine learning is almost always purpose-driven. Your job as the machine's developer is to determine what that purpose is before you start development. You then need to decide on the capability that will serve that purpose best. Ask yourself, “Am I looking at a classification problem, a regression problem, or a clustering problem?” Those are the three things artificial neural networks do best: classification, regression, and clustering. Here’s how you choose:

1. Classification is best when you need to assign inputs to known (labeled) categories. There are two types of classification: binary classification, which sorts inputs into one of two classes (for example, fraudulent or honest transactions), and multiclass classification, which sorts inputs into one of three or more classes (for example, ten different dog breeds).

2. Regression is best when you need to predict a continuous response value — a variable that can take on any value between its minimum and maximum value; for example, if you need a system that can predict the value of a home based on certain criteria, such as square footage, location, number of bedrooms and bathrooms, and so on.

3. Clustering is the right choice when you want to identify patterns in the data and have no idea what those patterns may be; for example, if you want to identify patterns among loyal, somewhat loyal, and not loyal customers.

In this article, you gain a deeper understanding of classification and regression. In my next article, I focus on clustering problems. But first, let's take a look at how the approach to machine learning differs based on the type of problem you are trying to solve.

Supervised Versus Unsupervised Learning

Classification and regression problems involve supervised learning — using training data to teach the machine (the artificial neural network) how to associate inputs with outputs. For example, you may feed the machine a picture of a cat and tell it, "This is a cat." You feed it a picture of a dog and tell it, "This is a dog." Then, you feed the machine test data; for example, a picture of a cat without telling the machine what the animal in the picture is, and the machine should be able to tell you it's a cat. If the machine gives the incorrect answer, you correct it, and the machine makes adjustments to improve its accuracy.

Clustering problems are in the realm of unsupervised learning. You feed the machine data inputs without labels, and the machine identifies common patterns among the inputs without labeling those patterns. 

For more about supervised and unsupervised learning, see my previous post Supervised Versus Unsupervised Learning.

Solving Classification Problems

Classification is one of the most common ways to use an artificial neural network. For example, credit card companies use classification to detect and prevent fraudulent transactions. The human trainer will feed the machine an example of a fraudulent transaction and tell the machine, "This is fraud." The trainer then feeds the machine an example of an honest transaction and tells the machine, "This is not fraud." As the trainer feeds more and more labeled data into the machine, it learns the patterns in the data that distinguish fraudulent transactions from honest transactions.

The machine may be set up with three output nodes (one for each class). If a transaction is highly characteristic of fraud, the Fraud neuron fires to cancel the transaction and suspend the card. If a transaction is less characteristic of fraud, the Maybe Fraud neuron fires to notify the cardholder of suspicious activity. If the transaction is even less characteristic of fraud, the Not Fraud neuron fires and the transaction is processed. 
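Acting on those three outputs might look something like the following sketch; the probabilities and the three actions are made up for illustration, not taken from any real fraud system.

```python
def handle_transaction(p_fraud, p_maybe_fraud, p_not_fraud):
    """Act on whichever of the three output neurons fires most strongly."""
    strongest = max(p_fraud, p_maybe_fraud, p_not_fraud)
    if strongest == p_fraud:
        return "cancel the transaction and suspend the card"
    if strongest == p_maybe_fraud:
        return "process, but notify the cardholder of suspicious activity"
    return "process the transaction normally"

# Hypothetical network outputs for one transaction.
print(handle_transaction(p_fraud=0.70, p_maybe_fraud=0.20, p_not_fraud=0.10))
```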

Solving Regression Problems

In regression problems, the machine tries to come up with an approximation based on the data input. During the training session, instead of showing the machine how inputs are connected to labels, you show the machine the connection between a known outcome and the variables that impact that outcome. For example, the amount of time it takes to drive home from work varies depending on weather conditions, traffic conditions, and the time of day.

A stock price predictor would be another example of machine learning used to solve a regression problem. The stock price would be the dependent variable and would be driven by a host of independent variables, including earnings, profits, future estimated earnings, a change of management, accounting errors or scandals, and so forth.

One way to look at the difference between classification and regression is that with classification the output requires a class label, whereas with regression the output is an approximation or likelihood.

In my next article, I examine an entirely different type of problem — one that can be solved not by classification or regression but by clustering.

In a previous article, What Is Machine Learning?, I define machine learning as "the science of getting computers to perform tasks they weren't specifically programmed to do." So what is deep learning? Deep learning is a subset of machine learning (ML), which is a subset of artificial intelligence (AI).

A Brief History Lesson

In 1958, Cornell professor Frank Rosenblatt created an early version of an artificial neural network composed of interconnected perceptrons. Like the nodes in modern artificial neural networks, a perceptron takes in binary inputs and performs a calculation on those inputs to produce an output. Note that with a perceptron, both the inputs and outputs are binary — for example, zero/one, on/off, in/out.
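A perceptron is simple enough to express in a few lines of code. This sketch uses made-up weights and a made-up threshold; the point is that both the inputs and the output are strictly binary.

```python
def perceptron(inputs, weights, threshold):
    """Rosenblatt-style perceptron: binary inputs in, binary output out."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Hypothetical example: three binary inputs with hand-picked weights.
print(perceptron(inputs=[1, 0, 1], weights=[0.6, 0.4, 0.3], threshold=0.5))  # 1
print(perceptron(inputs=[0, 1, 0], weights=[0.6, 0.4, 0.3], threshold=0.5))  # 0
```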

Rosenblatt's machine, the Mark I Perceptron, had small cameras and was designed to learn how to tell the difference between two images. Unfortunately, it took thousands of tries, and even then the Mark I had difficulty distinguishing even basic images. In other words, the Mark I Perceptron wasn't a very good student. It could not develop a skill that is relatively easy for humans to learn.

The Mark I Perceptron had a couple flaws — it had only one layer of perceptrons, and the perceptrons were equipped with binary functions. As a result, this artificial neural network could solve only linear problems and had no easy and effective way to adjust the strength of the connections between neurons, which is required for learning to take place.

These problems were solved primarily by the introduction of hidden layers in the mid-1980s by Carnegie Mellon professor Geoff Hinton and by replacing binary functions with the sigmoid function, which increased the variation in outputs while limiting those outputs to values between zero and one.

These additions enabled the artificial neural network to tackle much more complicated challenges. However, these early artificial neural networks continued to struggle; they were slow, having to review a problem several times before becoming "smart" enough to solve it. 

Later, in the 1990s, Hinton started working in a new field called deep learning — an approach that added many more hidden layers between the input and output layers of the neural network.

A Black Box

The hidden layers of a deep neural network function like a black box, swirling together computation and data to find answers and solutions. No human knows exactly how the network arrives at its decisions. For example, in 2012, Google’s Brain team wanted to see how a deep learning neural network might perceive video data. Developers fed 10 million random images from YouTube videos into a network that had over 1 billion neural connections running on 16,000 processors. They didn’t label any of the data. So the network didn’t know what it meant to be a cat, a human being, or a car. Instead, the network just looked through the images and came up with its own clusters.

It found that many of the videos contained a very similar cluster of features. A human being would recognize this cluster as the face of a cat, but to the neural network it was just a very common pattern that appeared in many of the videos. In a sense, it invented its own interpretation of a cat. After performing this exercise, the network was able to identify a cat in an image 74.8% of the time.

While it is certainly intriguing to see an artificial neural network recognize objects without ever being trained to do so, the real mystery is how the network accomplishes such a feat. We know that the machine adjusts the strengths of the connections between neurons, but we cannot describe the "thought processes" in a way that explains how the machine reaches its conclusions.

The black box nature of hidden layers is important to keep in mind when designing artificial neural networks, because you may be "flying blind" when you're developing your initial design. Success depends a great deal on taking an empirical approach — trying different arrangements of neurons, starting with different weights and biases, trying different activation functions, and then looking at the results and making adjustments.
