In a previous post, “Building a Top-Notch Data Science Team,” I recommended creating a small team of three to five individuals that includes at least one research lead, a data analyst, and a project manager. The research lead is primarily responsible for asking compelling and relevant questions; the data analyst gathers and analyzes the data to answer those questions; and the project manager is mostly in charge of gaining access to the data the analyst needs and communicating the team’s findings to stakeholders in the organization.
However, even if you follow my advice and build the A-Team of data science teams, you have no guarantee that your team will deliver the goods. People make mistakes, and certain dysfunctional group dynamics can compromise the team’s output. In this post, I describe several common data science team pitfalls and offer guidance on how to avoid them.
Reaching Consensus Too Quickly
In most organizations, people naturally try to reach consensus, but in data science, reaching consensus too soon is usually a sign that everyone on the data science team shares a common misunderstanding. You want your team to explore new ideas, uncover problems, and discover or create new opportunities. Everyone on the data science team should be comfortable arguing about how to interpret the data. Meetings should be more like uncomfortable family dinners than church services. You want the team to talk, explore, and even annoy one another. This type of exchange is much more likely to uncover new ideas.
To keep your team from reaching consensus too quickly, take the following precautions:
The flip side of reaching consensus too quickly is wandering: spending too much time seeking answers to the wrong questions. In many ways, this is a much more difficult challenge than premature consensus or groupthink. You don’t want to stifle innovation by focusing only on delivery, but the team must still deliver the knowledge and insights the organization needs.
To avoid this trap, take the following precautions:
I once worked on a project for a large home improvement retailer that was trying to determine whether its customers were homeowners or professional remodelers. The research lead asked some very interesting questions. What items are professionals more likely to purchase? Are there times when a professional is more likely to shop? Are professionals more likely to make large purchases?
Yet every few weeks the team produced dozens of narrowly defined reports with small, uninteresting conclusions that ignored the big question: were the company’s customers homeowners or professional remodelers? It was as though they were looking through the keyhole of a glass door.
Starting with Untested Assumptions
New business ventures often fail because the business plan was based on untested assumptions — unrealistically high demand, unforeseen competition, failure to account for certain costs, and so on. Data science projects can fail for the same reasons — for example, assuming the data available is complete and accurate, accepting an opinion as fact or an estimate as accurate, or assuming that the analysis results obtained today will reflect future conditions.
To avoid this pitfall, take the following precautions:
Remember, accepting a statement as fact is tempting and easy. Challenging and testing a statement requires some effort. Everyone on the data science team should have a skeptical mind. Don’t accept any statement as fact unless you have solid data to back it up.
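As one small illustration of testing an assumption rather than accepting it, the sketch below checks the claim that "the data is complete" by measuring how often each field is actually missing. The record layout, the field names (customer_id, purchase_total), and the helper missing_rate_by_field are all hypothetical, invented for this example; a real data set would need its own definition of what counts as "missing."

```python
from collections import Counter


def missing_rate_by_field(records, fields):
    """Return the fraction of records in which each field is missing or empty."""
    if not records:
        raise ValueError("no records to check")
    missing = Counter()
    for record in records:
        for field in fields:
            value = record.get(field)
            # Treat absent keys, None, and empty strings as missing (an assumption).
            if value is None or value == "":
                missing[field] += 1
    return {field: missing[field] / len(records) for field in fields}


# Hypothetical purchase records, made up purely for illustration.
sample = [
    {"customer_id": "a1", "purchase_total": 120.0},
    {"customer_id": "a2", "purchase_total": None},
    {"customer_id": "", "purchase_total": 35.5},
    {"customer_id": "a4", "purchase_total": 80.0},
]

rates = missing_rate_by_field(sample, ["customer_id", "purchase_total"])
for field, rate in rates.items():
    print(f"{field}: {rate:.0%} missing")
```

A check like this does not prove the data is accurate, only that the "complete" part of the assumption has been looked at. What it gives the team is a number to argue about in place of an opinion.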
Measuring the Wrong Thing
One untested assumption people commonly make is that certain metrics are related. For example, a company may track the average amount of time per day a user spends on a social site and assume that more time spent equals greater user satisfaction. However, more time spent may be an indication of user dissatisfaction, perhaps because the site is difficult to navigate. Likewise, a developer of an anti-spam utility may look at the total number of incoming email messages blocked as a positive sign that the utility is working well when, in fact, the utility is wrongly identifying acceptable email messages as spam.
To avoid this pitfall, question your assumptions, as explained in the previous section. Don’t assume that certain metrics reflect one another. Ask yourself, “What else could possibly be the cause for a change in the measurement?”
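The anti-spam example can be made concrete with a rough sketch. It assumes the team has a hand-labeled sample of messages, each recorded as a pair (is_spam, was_blocked). That data layout and the function name spam_filter_report are hypothetical. The point is only that the raw count of blocked messages says nothing on its own, while precision and the false positive rate show how many acceptable messages were caught by mistake.

```python
def spam_filter_report(messages):
    """Summarize filter quality from (is_spam, was_blocked) boolean pairs."""
    blocked = wrongly_blocked = legitimate = 0
    for is_spam, was_blocked in messages:
        if not is_spam:
            legitimate += 1
        if was_blocked:
            blocked += 1
            if not is_spam:
                wrongly_blocked += 1
    return {
        # The metric the developer was tempted to celebrate.
        "blocked": blocked,
        # Share of blocked messages that really were spam.
        "precision": (blocked - wrongly_blocked) / blocked if blocked else None,
        # Share of acceptable messages that were blocked by mistake.
        "false_positive_rate": wrongly_blocked / legitimate if legitimate else None,
    }


# Hypothetical labeled sample, made up purely for illustration.
labeled = [
    (True, True), (True, True), (True, True),  # spam, blocked
    (True, False),                             # spam, delivered
    (False, True), (False, True),              # acceptable mail, blocked
    (False, False), (False, False),            # acceptable mail, delivered
]

report = spam_filter_report(labeled)
print(report)
```

In this invented sample the filter blocks five messages, which sounds productive, yet half of the acceptable mail never reaches the inbox. The same count of blocked messages could come from a very good filter or a very bad one.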
Be Curious and Skeptical
These are just a few of the many pitfalls that data science teams may be susceptible to. You can avoid most pitfalls by engaging in a persistent cycle of discovery. Be curious and skeptical. Look at the data honestly. Teams generally get into trouble when they start making assumptions and using the data to support those assumptions instead of using it to test them. As long as everyone on the team is dedicated to pursuing the truth, the team should have no trouble staying on track.