The Missing Aspect of Data Science

Hello there, my name’s Gary Short. If you’re a follower of Black Marble news, you’ll already know that I’ve recently joined the company as Head of Data Science, with the task of creating a flourishing Data Science Practice within the company.

And that is all I plan to say about that. 

With the traditional “first post introduction” out of the way, I’d like to spend the rest of this post talking about something far more interesting… data science, and in particular an aspect of data science that I don’t hear a lot of people talking about: critical thinking. Head over to Wikipedia (no, not now, after you’ve finished reading this!) and look up the definition of data science, and you’ll see it’s defined as the intersection of a lot of Cool Stuff™.

Whilst all of these are valid, and I agree that they are all part of what makes a good data scientist, I do believe that this definition is missing the key aspect of critical thinking. So what is critical thinking? Well, if we pop over to Wikipedia again (not yet! What’s wrong with you people?), we can see that critical thinking is defined thusly:

Different sources define critical thinking variously as:

  • "reasonable reflective thinking focused on deciding what to believe or do"[2]
  • "the intellectually disciplined process of actively and skillfully conceptualizing, applying, analyzing, synthesizing, or evaluating information gathered from, or generated by, observation, experience, reflection, reasoning, or communication, as a guide to belief and action"[4]
  • "purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based"[5]
  • "includes a commitment to using reason in the formulation of our beliefs"[6]

Umm, yeah, well that’s clear then, isn’t it? Well no, not really. When I see something defined like this, and by “like this” I mean having several definitions, I immediately decide that the reason there isn’t one clear definition is that the definition is context-sensitive; that is, it very much depends on the point of view of the observer. Since we’re going to be focusing on critical thinking from the point of view of the data scientist, I don’t think it helps us to layer on yet another definition. It will be much more productive if we just jump in and look at critical thinking by example rather than by definition. So, for the remainder of this post, and in a number of future posts, we’ll do just that.

For us data scientists, critical thinking can best be thought of as the process of examining the significance and meaning of the claims made by the results of our statistical analysis. That’s all well and good, but what exactly is a claim? Well, we programmers are used to dealing with claims-based authentication, where a “claim” is made about a person or an organisation and a system must check the veracity of that claim. In much the same way, in data science a claim may arise from the output of an analysis, and we must ascertain the veracity of that claim.

Let’s take a recent example from the news. Here we’re told that a “Joint study shows extent of Scottish under-employment”, and that:

The extent of under-employment in Scotland has been revealed in a study published by the Scottish government. An analysis jointly prepared with the STUC shows more than 250,000 workers want to work longer hours. That is a rise of 80,000 on 2008, before the downturn got under way. That makes 256,000 people, or more than 10% of the entire workforce.

Critical thinking will teach us to examine each of the claims in this article and to ask questions about them. So let’s do that now, by way of example:

Firstly, the article claims “The extent of under-employment in Scotland has been revealed”. Has it? The article doesn’t say how the figures were obtained, so we can’t draw our own conclusions with regard to sample bias.

The article claims that “more than 250,000 workers want to work longer hours”. Do these people actually want to work longer hours, or do they want to earn more money for the full-time work they already do? Do they want to work longer hours, or do they want to move from part-time work into full-time work?
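We can also ask whether the quoted numbers hang together. Here is a minimal sketch of that sort of sanity check, using only the figures quoted above. The variable names are my own, and treating 256,000 as the exact count behind “more than 250,000” is an assumption on my part, not something the article confirms.

```python
# Figures as quoted in the article. Treating 256,000 as the exact
# headline count is an assumption made for this illustration.
underemployed_now = 256_000
rise_since_2008 = 80_000
claimed_min_share = 0.10  # "more than 10% of the entire workforce"

# What the article implies, but does not state, about 2008.
underemployed_2008 = underemployed_now - rise_since_2008

# The rise expressed relative to the implied 2008 baseline.
relative_rise = rise_since_2008 / underemployed_2008

# If 256,000 is MORE than 10% of the workforce, the workforce
# must be SMALLER than this figure.
implied_max_workforce = underemployed_now / claimed_min_share

print(f"Implied 2008 figure: {underemployed_2008:,}")
print(f"Relative rise since 2008: {relative_rise:.1%}")
print(f"Workforce implied by the 10% claim: under {implied_max_workforce:,.0f}")
```

So the article is implicitly claiming a 2008 figure of 176,000, a rise of roughly 45% on that figure, and a total workforce of fewer than about 2.56 million people. Each of these is a claim in its own right, and each can be checked against an independent source.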

After we have examined such explicit claims, we can also examine the implicit claims of the article. An item such as this, on the BBC news site, comes with an implied claim that there is no bias, just a straight reporting of facts. We can apply critical thinking here too, and ask ourselves such questions as, do the Scottish Government and the STUC gain from maximising or minimising the reported number? If they do, is there evidence, in the article or elsewhere, that they have done such a thing?

The article quotes from the research, without linking to it, and from a spokesman for one of the report’s authors, but there is no balancing quote from the opposition. So we should ask ourselves: are these figures undisputed, so that no balancing quote is required? If not, is the journalist showing bias here by excluding an opposing view, or did the supporters of an opposing view decline to comment? If the former, does this journalist have a history of doing so, or is this a one-off? If the latter, who holds the opposing view, and were they given enough time to formulate a response? The answers to these questions will go a long way towards helping us to contextualise the output of this analysis and to give it the appropriate weight in our decision-making process.

As you can see, critical thinking is a key aspect of data science. Applied correctly, it allows us to take full advantage of the output of our analysis and to interpret it properly, whether for our end audience or for our own benefit when consuming other data scientists’ output. In future blog posts we’ll delve further into this fascinating aspect of data science, and I hope you’ll join me for those posts. Until then, keep crunching those numbers!