Building a world-class data-science team

Let’s begin with the origins and end with practical tools.

Data science isn’t about special people in special places. It’s about teams.

We’ve all witnessed the wave of innovations that has washed over business models of late. These innovations didn’t surface as the ideas of individuals. The architecture of businesses, business interactions, data collections, and the use of information is so complex that a single individual in a mid- or large-size company wouldn’t have the knowledge to understand all elements required to make the idea a practical reality.

Also, it’s long been proven that heterogeneity enhances group brainstorming. More diverse groups produce better ideas. This concept is especially important when we’re designing data-science teams.

A part of the whole

You’ve probably been told you need to hire one of two individuals. The first is an astute data developer with a grounded understanding of Python, SQL and data storage (PostgreSQL), and the Unix and Linux command line (mainly to run and schedule cron jobs); Python data libraries (Pandas, Scrapy, Keras, Matplotlib, TensorFlow, Bokeh, Scikit-learn, etc.); Flask, Bottle, or Django to host the analysis of the database as a RESTful API on AWS or Azure; and, of course, AngularJS for presenting results and D3.js to create data visualizations.

If, for some reason, you botch the hiring of the astute data developer, you only have one other alternative—to hire a data academic. This is a theorist who pontificates about changing the world with data but whose experience rarely ventures outside the educational setting and has few practical applications. The data academic understands core statistics, categorical data analysis, applying statistics with R (multiple linear regressions, qualitative predictors, linear discriminant analysis, resampling methods like k-fold cross-validation, hyperplanes, hierarchical clustering), sequential data models (Markov models, hidden Markov models, linear dynamical systems), Bayesian model averaging, and machine-learning probabilistic theory. You hope some of this learning is connected to causality.

Are these two roles important for a data-science team? Of course. If you, by chance, hire both these roles, do you have a data-science team? No, you do not.

Let’s begin with the origins of data science and, from there, we’ll lead into the critical capabilities required to build a world-class data-science team.

From there to here

The foundation of data science originated with five key areas:

  1. Computer science: the study of computation and information
  2. Data technology: data generated by humans and machines
  3. Visualizations: graphical representation of information and data
  4. Statistics: methodologies to gather, review, analyze, and draw conclusions from data
  5. Mathematics: the science of the logic of shape, quantity, and arrangement

Computer science evolved from Turing machines to cybernetics and information theory by the 1900s. Tree-based methods and graph algorithms surfaced in the 1960s. By the 1970s, computer programming and text or string searches popped up. Data mining, data classification, and similar methods pushed us into the early 2000s.

Data technology began before the 1800s with binary logic and Boolean algebra with punch cards. IBM introduced the first computers in the 1940s as DBMS matured. Removable disks with relational DBMS followed into the 1960s. By the mid-1970s, desktops, SQL, and object-oriented programming were the norm. In early 2001, statistical modeling started to emerge, balancing the stochastic data model by using algorithmic models and treating data mechanisms as unknowns.

Visualizations arose prior to the 1800s with cartography and astronomical mapping of charts. Line and bar charts came out in the 1800s, and statistical graphics were depicted by the mid-1800s. The box plot was created in the 1970s, and word or tag clouds started to form in 1992.

Statistics entered the 1800s with theories of correlation, probability, and Bayes’ theorem. In the 1900s, the concepts of regression, time series, and least squares made the rounds. The 1900s introduced the foundation of modern statistics with the hypothesis and design of experiments. By the mid-1960s, we had Bayesian methods, stochastic methods, and more complex time-series methods such as survival analysis and grouping time-series data. Through the 1980s, more developments occurred in Markov simulation and computational statistics, allowing us to better understand the interface between statistics and computer science. By the late 1990s, decision science, pattern recognition, and machine learning were starting to take shape.

Mathematics entered the 1800s with calculus and logarithms. Next, the Newton-Raphson method introduced optimization techniques. By the 1930s, the military had started to adopt theories for manufacturing and communications. The 1960s were booming with networks, automation, scheduling, and assignment problems, which have only matured in recent years.

Understanding the origins of data science helps demystify it and allows you to develop a concrete capability in your company.

Data-science capabilities

Finding success with data science comes down to four factors: people, data, tools, and security.

The most important elements of your data-science team are the people and the capabilities they enable. Next, to get insights—even with the best people—we ultimately need data and access to data. Usually, data is siloed across teams, departments, and systems, making gaining access difficult. Assuming we have the people and access to the data, next, we need tools. Performing analytics necessitates computational and data-storage resources. Fortunately, today we have many open-source options that are more than adequate. Lastly, data security and privacy protection are crucial as data becomes more centralized. With this convenience comes access—which, in the wrong hands, creates risk.

With this understanding of the origins of data science, it’s fascinating to see the mix of conventional capabilities aligned with the less traditional data-science skills that are required for success. Let’s cover examples of data-science capabilities and complementary data-science team skills that are found within world-class data-science teams.

Data-science capabilities and team skills

  • Stakeholder management: business-relationships management, project management
  • Storytelling ability: executive presence, presentation skills
  • Business communications: clear and timely communication, governance
  • Consulting: need analysis, solutions aligned to goals
  • Problem-solving: lean six-sigma, agile
  • Topical analytics techniques: statistics, root-cause analysis, statistical-process control, value-stream mapping, flows
  • Domain expertise: knowledge of the data, who’s using it and for what purpose
  • Business analysis: experience evaluating and modeling business cases

The ultimate success of a data-science team depends on how well expectations are managed. When expectations are met, the data-science team will be viewed as impactful. Conversely, a weak perception of delivery is a significant reason why data-science teams eventually get disbanded: they focus on what’s cool, not what’s most impactful for the business.

The hidden art of storytelling

It’s idealistic to believe data-science teams can find value in data from day 1, but, eventually, they’ll connect data to new insights. However, often that data is layered across hundreds or thousands of sources, and the team might be months or years away from collecting it all. Most data-science teams begin with a simple set of questions. These questions are challenging but tangible to answer. This approach also limits the data set required to be integrated into an initial proof-of-concept. Sample questions might include some of the following:

  • Which applications in our portfolio have the most significant security risk?
  • Why is the Durham, NC location the most profitable?
  • What type of patient visit will be the costliest next quarter?
  • Is antibody A or antibody B more likely to achieve FDA approval?
  • Which drone should we bring in first for preventive maintenance?

Building a world-class data-science capability isn’t about individuals; it’s about assembling your team. It’s crucial to ensure that essential data-science capabilities and data-science skills are part of your team design. To tap into the power of data science, we require teams to not only extract insights from data but also tell a compelling story. Quite often, we’re left with a lot of data, confusing insights, and no story. Make sure that the team you build can tell a story.

Peter is a technology executive with over 20 years of experience, dedicated to driving innovation, digital transformation, leadership, and data in business. He helps organizations connect strategy to execution to maximize company performance. He has been recognized for Digital Innovation by CIO 100, MIT Sloan, Computerworld, and the Project Management Institute. As Managing Director at OROCA Innovations, Peter leads the CXO advisory services practice, driving digital strategies. Peter was honored as an MIT Sloan CIO Leadership Award Finalist in 2015 and is a regular contributor to CIO.com on innovation. Peter has led businesses through complex changes, including the adoption of data-first approaches for portfolio management, lean six sigma for operational excellence, departmental transformations, process improvements, maximizing team performance, designing new IT operating models, digitizing platforms, leading large-scale mission-critical technology deployments, product management, agile methodologies, and building high-performance teams. As Chief Information Officer, Peter was responsible for Connecticut’s Health Insurance Exchange’s (HIX) industry-leading digital platform transforming consumerism and retail-oriented services for the health insurance industry. Peter championed the Connecticut marketplace digital implementation with a transformational cloud-based SaaS platform and mobile application recognized as a 2014 PMI Project of the Year Award finalist, CIO 100, and awards for best digital services, API, and platform. He also received a lifetime achievement award for leadership and digital transformation, honored as a 2016 Computerworld Premier 100 IT Leader. Peter is the author of Learning Intelligence: Expand Thinking. Absorb Alternative. Unlock Possibilities (2017), which Marshall Goldsmith, author of the New York Times No. 1 bestseller Triggers, calls "a must-read for any leader wanting to compete in the innovation-powered landscape of today." 
Peter also authored The Power of Blockchain for Healthcare: How Blockchain Will Ignite The Future of Healthcare (2017), the first book to explore the vast opportunities for blockchain to transform the patient experience. Peter has a B.S. in C.I.S from Bentley University and an MBA from Quinnipiac University, where he graduated Summa Cum Laude. He earned his PMP® in 2001 and is a certified Six Sigma Master Black Belt, Masters in Business Relationship Management (MBRM) and Certified Scrum Master. As a Commercial Rated Aviation Pilot and Master Scuba Diver, Peter understands firsthand how to anticipate change and lead boldly.