Have you ever noticed that purported victims tend to refer to the number of human casualties, while the perceived perpetrators often talk only about the number of people deceased? This is just one example of how much context matters in numbers, and of why data analysis needs to be validated. Data has always been a valuable part of journalism. What is, for instance, a sports report without data? Or imagine crime reporting without data. Indeed: incomprehensible. Although data has been used in journalism for many years, data journalism is becoming more popular, and certainly more important. News these days draws on many sources: tweets from victims, eyewitness accounts, vox pops, and so on. But - good news for data journalists - this is exactly why data journalism is so important. Gathering information, filtering data and visualizing what is happening beyond what the eye can see are all becoming more relevant. In today’s global economy, everything is invisibly connected, from the sandwich you eat and the Coca-Cola you buy to other people and you. The language of this network is data: small pieces of information, often irrelevant on their own, but when you take a step back you see that all the interconnections form one big structure of data that you can use to your advantage. But how do you know your data analysis is valid?

"Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: 'There are three kinds of lies: lies, damned lies, and statistics.'" – Mark Twain


Extraordinary evidence: Correlating data

Numbers themselves have no meaning without context: 95% is an impressive number when it describes how many customers agree on a new sandwich, but it is a serious problem when it describes how reliably an airplane engine works. Interpreting results is, therefore, a crucial step. Poor interpretation of data and other misconceptions about statistics have produced many examples of extraordinary results. Take, for instance, Andrew Wakefield. In “The Lancet” he published an article on vaccinations as a possible cause of autism. During his press conference he claimed that vaccinations do indeed cause autism, and the media picked up the story with screaming headlines in newspapers throughout the United Kingdom. The dataset, however, was far too small to support such a claim, and so a myth was born. The heedless repetition of the results of Wakefield’s study by journalists all over Britain - probably just following up on what the competition had already written - had grave ramifications. This could easily have been prevented had the journalists been more critical in their assessment of the study and of the press conference that followed. Another study, in contrast, concluded that patients with a bloodstream infection who are prayed for can go home from the hospital significantly earlier. Small detail: the praying was done four years later. Because the researchers used a huge dataset, it is, however, not really strange that they found this correlation: split a large dataset into two groups, compare enough variables, and something will correlate significantly by chance alone, unless you tie your interpretation to strict selection criteria and a framework fixed in advance. And these are just two of the numerous examples where the validation of data analysis goes wrong…
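To see why a large dataset split into two groups almost always yields "something significant", here is a minimal Python sketch. It is an illustration only and does not reproduce either study: the patients, the coin-flip grouping and the outcomes are all simulated, and the function names and default values (two_sided_p, count_chance_findings, 2000 patients, 100 outcomes) are made up for this example. The test used is a simple normal approximation for a difference in means.

```python
import random
import statistics
from math import erfc, sqrt


def two_sided_p(group_a, group_b):
    """Approximate two-sided p-value for a difference in means (normal approximation)."""
    standard_error = sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    z = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return erfc(abs(z) / sqrt(2))


def count_chance_findings(n_patients=2000, n_outcomes=100, alpha=0.05, seed=0):
    """Count outcomes that look 'significant' although the grouping is pure chance."""
    rng = random.Random(seed)
    # Hypothetical setup: every patient lands in one of two groups by coin flip.
    in_first_group = [rng.random() < 0.5 for _ in range(n_patients)]
    findings = 0
    for _ in range(n_outcomes):
        # Each outcome is random noise, generated independently of the grouping.
        values = [rng.gauss(0, 1) for _ in range(n_patients)]
        first = [v for v, g in zip(values, in_first_group) if g]
        second = [v for v, g in zip(values, in_first_group) if not g]
        if two_sided_p(first, second) < alpha:
            findings += 1
    return findings


if __name__ == "__main__":
    print(count_chance_findings(), "of 100 unrelated outcomes look significant")
```

With a significance threshold of 0.05, roughly five in every hundred unrelated outcomes can be expected to pass the test by chance alone; the exact count depends on the seed. A journalist who is shown only the outcome that passed has no way of telling it apart from a real effect, which is why it matters to ask how many comparisons were made.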



Check, check, double check

As the examples show, validating your data analysis is indispensable. Data collected from multiple sources needs to be analysed with great care. Data gathered for another project may have been collected in a setting, or for a purpose, that differs from yours. You need to fully understand all assumptions, their strengths and weaknesses, and the definitions of the terms used before you interpret the results and use them in your story. One unmeasured factor can change an effect completely. Therefore, do not forget to double check that your data is complete. The real value of data lies in a valid analysis that makes it applicable. An empirical and logical train of thought remains important when you evaluate the results of an analysis carried out by a computer program. This is where things often go wrong, with dire consequences for the validity of the story. You might wonder what the key is to transforming data into something meaningful - something everyone can understand and relate to. How do you measure what you want to measure? Well, just keep reading.


Turning numbers into stories

For you - as a data journalist - it is important to develop two key skills: analysing and writing. You need to know how to interpret (existing) data, and to examine critically the context in which you interpret it. It is, therefore, a matter of turning numbers into stories the right way, and for that the 5 W’s have never been more important. Keep in mind that the rendition of the 5 W’s below relates directly to the topic of this blog; it requires a broader perspective and application when you use it in a story. For your convenience, the 5 W’s are summarized below so you can print them, hang them above your bed and dream about them until you know them by heart:

       Who: This is actually the most important W. Where did the data come from? Is this source reliable? Keep in mind you cannot blindly trust the accuracy of supplied data. Therefore, try to find more information about the source in question and check the transparency of the source.

       What: This refers to what you are trying to say in your story. Think about which points you want to get across. Keep in mind that your job is to bridge the gap between raw data and the target audience. For yourself, it is important to analyse all the data in order to get a clear overview of what it stands for. In addition, do not present results or numbers that do not properly fit your topic. In this case less is more - only publish valid results and give them context.

       When: How old is your data? Consider its relevance. A dataset on Internet behavior in the 1990s is no longer relevant or applicable today. Everything has its expiry date.

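       Where: Where was the data collected, and does that fit the story you want to tell? A key part of data journalism is the ability to ‘mash up’ different datasets to create a new story, which only works when the datasets can sensibly be combined.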

       Why: This is the hardest question. Data journalism is not always well suited to correlating diverse datasets in order to produce a cause-and-effect analysis: a correlation on its own does not tell you why something happened.



Measurement as a drug in the business world

These days everything is about data, analytics and measurements. Statements such as “If you cannot measure it, you cannot manage it” show how pervasive the scientific method has become in everyday life and in ordinary business processes. This way of putting things, with its ever-increasing emphasis on the value of measured research, shows eerie similarities with religion. Although journalism is not just about measurements, data analysis has become a big part of it. And, of course, we cannot put a metric on everything, nor can we measure quality in only one way. Through the scientific method everything is defined or coded to make it measurable. It is no coincidence that a high grade is related to a good understanding of the subject matter. A valid data analysis, however, is not just about data based on measurements. The meaning is found in the interpretation of the data - the analysis. Do you still measure what you want to measure? Data journalism needs to go one step further. Indeed, go the extra mile. All these data with significant results are great, but what do they really say? And that is exactly what your job is all about: translating numbers into a story that makes them comprehensible for everyone. Find out whether it is really the vaccinations that cause autism, or the prayers that cure people faster. Or is it just another example of how well the total number of political action committees (US) correlates with the number of people who died by falling out of their wheelchairs…?