Building a data science team is not as simple as hiring a database administrator and a few data analysts. You want to democratize your data — you want the organization’s data and the tools for analyzing it in the hands of everyone in the organization. You want your entire organization to think about your data in creative and interesting ways and put the newly acquired information and insights into action.
Yet, your organization should have a small data science team that’s focused exclusively on extracting knowledge and insights from the organization’s data. Approach data science as a team endeavor — small groups of people with different backgrounds experimenting with the organization’s data to extract knowledge and insights.
Keep the team small (three to five members, max). You need to fill the following three positions:
In the following sections, I describe these roles in greater detail.
Note: When building a data science team, you’re essentially breaking down the role of data scientist into three separate positions. Finding a single individual who knows the business, understands the data, is familiar with analytical tools and techniques, and is an effective project manager is often an insurmountable challenge. Creating a team enables you to distribute the workload while ensuring that the data is examined from different perspectives.
The research lead has three areas of responsibility:
The research lead should be someone from the business side — someone who knows the industry in which the business operates, the business itself, and the unique intelligence needs of the business. He or she must recognize the role that the data science team plays in supporting the organization’s strategic initiatives and enabling data-driven decision-making at all levels.
A good research lead is curious, skeptical, and innovative. Specialized training is not required. In fact, a child could fill this role. For example, Edward Land invented the Polaroid instant camera to answer an interesting question asked by his three-year-old daughter. When they were on vacation in New Mexico, after he took a picture with a conventional camera, his daughter asked, “Why do we have to wait for the picture?”
Asking compelling, sometimes obvious, questions sounds easy, but it’s not. Such questions only seem easy and obvious after someone else asks them.
Of course, asking compelling questions is something everyone in your organization should be doing. Certainly everyone on the data science team should be involved in the process. However, having one person in charge of questions provides the team with some direction.
Maintaining separation between the people asking the questions and the people looking for possible answers is also beneficial. Otherwise, you’re likely to encounter a conflict of interest; for example, if the people in charge of answering questions are working with a small data set, they may be inclined to limit the scope of their questions to the available data. A research lead, on the other hand, is more likely to think outside that box and ask questions that can’t be answered with the current data. Such questions would challenge the team to capture other data or procure data from a third-party provider.
Your data science team should have one to three data analysts to work with the research lead to answer questions, discover solutions to problems, and use data in creative ways to support the organization’s operations and strategy. Responsibilities of a data analyst include the following:
Note: The data analyst on the team should be familiar with software development. Many of the best data visualization tools require some software coding.
The primary purpose of a project manager is to protect the data science team from increasing demands placed on it from the rest of the organization. For example, I once worked for an organization that had a very creative data science team. They were coming up with new and interesting ways to use the company’s vast credit card data. During the first few months, the data science team was mostly left alone to explore the data. As their insights became more interesting, the rest of the organization became more curious. Departments started calling on team members to give presentations. These meetings increased interest across the organization, which led to even more meetings. After a few months, some people on the data science team were in meetings for up to twenty hours a week! They shifted roles from analysts to presenters.
As a result, the team spent much less time analyzing data. The same departments who were requesting the meetings started asking why output from the data science team was dwindling.
An effective product manager serves as a shield to protect the team from too many meetings and as a bulldozer to break down barriers to the data. In this role, the project manager has the following responsibilities:
Working together, the research lead, analysts, and project manager function as a well-oiled machine — asking and answering questions, uncovering solutions to problems, developing creative ways to use the organization’s data to further its competitive strategy, and working with other groups and individuals throughout the organization to implement data-driven changes.