How to Turn Data into a Content Marketing Campaign
Lies, damned lies, and statistics. This pithy quote sums up most people’s attitude to data. It’s either untrustworthy, unreliable, or just plain boring.
But in the right hands data can be turned into stories. Stories that captivate. Stories that excite. Stories that get links from top tier publications.
This post will explore how to find great data online, how to use that data to spark an idea, and perhaps most importantly, how not to be wrong when analysing the data.
We’ll also take a brief look at data visualisation techniques, but that really deserves a whole post of its own. Probably from someone with more graphic design experience.
There’s a lot of data online. It’s estimated that Google, Amazon, Microsoft and Facebook store at least 1.2 million terabytes between them. But that’s just the tip of the iceberg. The amount of data on the entire internet is thought to be in the order of zettabytes. That’s a one with 21 zeroes.
To even begin to comprehend how much information this is, imagine streaming a Netflix movie this size. With a broadband speed of 100 Gbps, this would take you 2,535 years. That’s a lot of popcorn.
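If you'd like to check the popcorn maths yourself, the figure can be reproduced in a few lines. This is a rough sketch that assumes a zettabyte is exactly 10^21 bytes, a connection that holds a constant 100 Gbps, and a year of 365.25 days. The function and variable names are purely illustrative.

```python
# Rough check of the streaming estimate.
# Assumptions: 1 zettabyte = 10**21 bytes, a constant 100 Gbps connection,
# and a 365.25-day year.
ZETTABYTE_BYTES = 10**21
SPEED_BITS_PER_SECOND = 100 * 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def streaming_years(size_bytes, bits_per_second):
    """Return the years needed to transfer size_bytes at a constant speed."""
    seconds = size_bytes * 8 / bits_per_second  # 8 bits in a byte
    return seconds / SECONDS_PER_YEAR


years = streaming_years(ZETTABYTE_BYTES, SPEED_BITS_PER_SECOND)
print(f"{years:,.0f} years")
```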
So, finding the right data is tricky. Thankfully an excellent marketer has put a guide together for you.
For most of us, the first step is likely Google. Many datasets can be found this way, especially those from governments or public bodies. There are a few things to keep in mind when Googling for data:
- Use the word “historical” when searching for older data
- Use the word “data” in your search
- Keep your initial search broad (crime data London) to see the options, then explore the different sources (data.london.gov, Met Police, ONS etc.). I prefer to do this in separate tabs, keeping the main search results page open too
- Once you know what you’re looking for, use an advanced Google search to limit results to .xls or .xlsx files or to government websites, for example with the filetype: operator (filetype:xls, filetype:xlsx, filetype:pdf etc.)
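The filetype: operator above can be combined with your search terms into a single query string. The sketch below is only an illustration: build_query is a hypothetical helper, and the site: operator (which restricts results to one domain) is my assumption for how you might handle the government websites case.

```python
def build_query(terms, filetype=None, site=None):
    """Combine search terms with optional filetype: and site: operators."""
    parts = [terms]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        # Assumption: site: is used here to limit results to one domain
        parts.append(f"site:{site}")
    return " ".join(parts)


query = build_query("crime data London", filetype="xlsx", site="gov.uk")
print(query)
```

Paste the printed string straight into the search box, then swap the file type or domain to explore other sources.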
Google also has a separate search engine specifically for datasets, which is definitely worth a visit.
The best part is that the archive of all the datasets is available to view at any time, saving you the trouble of hunting through your inbox for the email you know mentioned where walruses like to hang out.
Using Data to Spark an Idea
We’ve written about ideation before. This process, which I like to call data-driven ideation, is slightly different.
Whereas Tom’s method involves coming up with an idea, then looking for sources to support it, I like to turn it on its head. I look for datasets or data-driven articles relevant to my client’s niche, then think of questions these datasets could answer.
When thinking of questions to answer, there are two things to keep in mind. First, has the question been answered before? If it hasn’t, great. Second, if it has, would your answer add anything to the conversation?
Once you have your data source, more often than not you’ll have to do some manipulation to get it how you want it. Governments in particular seem to delight in the awkward Excel spreadsheet formatted the way you wouldn’t expect. Like using columns when rows would make more sense…
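As a small illustration of the columns-versus-rows problem, here is a minimal sketch that flips a table so that each column becomes a row. The figures are invented purely for the example, and a real spreadsheet would of course need loading first.

```python
# Made-up table where each year is a column (not real data)
wide_rows = [
    ["area", "Camden", "Hackney"],
    ["2019", 10, 7],
    ["2020", 12, 8],
    ["2021", 9, 11],
]

# zip(*rows) pairs up the first items of every row, then the second, and so on,
# which swaps rows and columns
transposed = [list(column) for column in zip(*wide_rows)]

for row in transposed:
    print(row)
```

In Sheets or Excel, the TRANSPOSE formula does the same job.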
As content marketers/data analysts, the humble average is the metric we’re most likely to work with. But did you know there is more than one type of average?
The mean is what most people mean when they refer to the average. It’s calculated by adding all the numbers up and dividing by the total number of values. Or by using the AVERAGE formula in Sheets or Excel. It doesn’t cope well with outliers, so if your data is skewed, please move along.
The median is the middle value of a dataset and is calculated by ordering the numbers and finding the middle value. You can also use the MEDIAN formula in your spreadsheet tool of choice.
This measure is more suited to skewed datasets. Say I’m looking for a house in London and I want to know the average price of one. All the oligarchs buying penthouses will skew the mean so high it’ll just depress me. But the median takes into account the much larger number of slightly more reasonably priced houses, leading to a hopefully less depressing number.
Finally, we come to the mode. This is the most common value in a dataset. If your dataset is numerical, then you can use the MODE formula. If it’s the most common text you’re trying to find, then the formula is slightly more involved, but still accessible.
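To see how differently the three averages behave, here is a short sketch using Python's built-in statistics module rather than spreadsheet formulas. The house prices are invented for illustration, with one oligarch-sized penthouse thrown in to skew things.

```python
import statistics

# Invented house prices (not real data), including one very expensive outlier
prices = [300_000, 350_000, 350_000, 400_000, 450_000, 5_000_000]

mean_price = statistics.mean(prices)      # dragged upwards by the penthouse
median_price = statistics.median(prices)  # middle of the ordered values
mode_price = statistics.mode(prices)      # most common value

print(f"Mean:   {mean_price:,.0f}")
print(f"Median: {median_price:,.0f}")
print(f"Mode:   {mode_price:,.0f}")

# mode also works on text values
property_types = ["flat", "house", "flat", "terrace"]
common_type = statistics.mode(property_types)
print(f"Most common type: {common_type}")
```

The mean lands well above a million, while the median stays close to what most of the houses actually cost.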
Outliers are results that are either much larger or much smaller than you would expect based on the rest of the data. They can really mess up your analysis if you’re not careful.
If outliers are:
- A measurement error or data entry error, you should correct the error if possible. If you can’t fix it, remove that observation because you know it’s incorrect.
- Not a part of the population you’re studying (because of unusual properties or conditions), you can remove the outlier.
- A natural part of the population you’re studying, you shouldn’t remove it.
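Before deciding what to do with an outlier, you have to spot it. One common rule of thumb, which is my addition rather than anything prescribed above, is to flag values more than 1.5 times the interquartile range below the first quartile or above the third. The sketch below only flags candidates, using made-up numbers. Whether to fix, remove or keep them still depends on the rules above.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up measurements with one suspiciously large value
readings = [12, 14, 15, 15, 16, 18, 19, 95]
outliers = flag_outliers(readings)
print(outliers)
```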
Accounting for Population Size
A lot of data-driven content involves ranking areas by one or more metrics, like pint prices or spider sightings. It’s an excellent way to get coverage in a lot of different local publications.
But there are pitfalls when ranking areas by metrics that could be affected by population size, as this oft-referenced XKCD comic highlights.
To remove this issue, we use per capita measurements. This is essentially a fancy Latin way of saying divide your metric by the population of the area it refers to, giving the metric per person.
In most cases, unless you’re dealing with silly numbers like GDP or national debt, this will give you a tiny number. So, to make it more manageable, the convention is to multiply by 100,000. This gives you the metric per 100,000 people in that area. And voila, the largest place no longer wins every time.
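Here is the per capita calculation as a minimal sketch. The place names and figures are invented, and per_100k is just an illustrative helper name.

```python
def per_100k(count, population):
    """Return the metric per 100,000 people."""
    return count * 100_000 / population


# Invented areas: (number of sightings, population)
areas = {
    "Bigtown": (5_000, 1_000_000),
    "Smallville": (600, 50_000),
}

rates = {name: per_100k(count, pop) for name, (count, pop) in areas.items()}

top_by_raw_count = max(areas, key=lambda name: areas[name][0])
top_by_rate = max(rates, key=rates.get)

print(rates)
print("Highest raw count:", top_by_raw_count)
print("Highest per 100,000:", top_by_rate)
```

Bigtown wins on raw numbers simply because more people live there, but Smallville comes out on top once population is accounted for.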
How Not to be Wrong
There are many ways of being wrong, but only one way of being right. Here we’ll look at some of the most common pitfalls in data analysis and how you can avoid them.
- Picking out a data range that supports a point of view, while ignoring the larger trend
- Saying something about a larger group based on a non-representative sample
- Using percentage change for small numbers. This is misleading because a tiny absolute difference can look dramatic (going from 1 to 3 is a 200% increase)
- Correlation doesn’t equal causation. Even if we don’t say something is causing something else, putting two trends next to each other encourages readers to draw that conclusion
- Avoid unnecessary precision: reporting numbers to several decimal places can be deceptive if one number in the calculation is an estimate
- Don’t confuse the percentage point difference (40% – 30% = 10 percentage points) with percentage change (40% to 30% is a 25% decrease)
- In general, take care when using percentage change with values that are already percentages. This can introduce other errors
- Record your steps, including clearly where you got the data from. Hidden sections of websites can be difficult to find
- Make sure you’re dividing by the right number when calculating percentages or ratios
- Standardise dates, including breaking them up into day/month/year if necessary
- Don’t type values in by hand; use formulas wherever possible. Entering data by hand introduces mistakes
- Spot check your data after making large changes
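The difference between percentage points and percentage change trips up a lot of people, so here it is as a short sketch using the 40% and 30% figures from the list above. The function names are illustrative only.

```python
def percentage_point_difference(old_pct, new_pct):
    """Difference between two percentages, in percentage points."""
    return new_pct - old_pct


def percentage_change(old, new):
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100


points = percentage_point_difference(40, 30)
change = percentage_change(40, 30)
print(f"{points} percentage points, {change}% change")

# Small numbers make percentage change look dramatic
small_change = percentage_change(1, 3)
print(f"1 to 3 is a {small_change}% increase")
```

Note that percentage_change divides by the old value, which is the "right number" to divide by when describing change from a starting point.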
Data-driven content should be rigorous and accurate. Journalists aren’t going to cover, let alone link to, something where the numbers have been fudged. Being as careful as possible is the best way to prevent this.
Choosing how to visualise the data you’ve so carefully collected and analysed is one of the most important parts of the process. After all, most people are turned off by spreadsheets. Making your data beautiful is another step towards the coverage you want.
What story you want to tell is the main driver behind what chart you choose. For example, a bar chart is best for displaying the number of items in each category, but a line chart is the choice for showing how the data has changed over time.
There are also other, fancier charts if you want to be bolder. Choropleth maps use colour to visualise values over a geographical area, while Sankey diagrams show the transfer of something (energy, money etc.) from one place to another.
For an excellent guide to different chart types and what they’re used for, visit the Data Visualisation Catalogue.
The guide above will help you turn boring old numbers into an exciting content campaign.
To sum up the process, first, find a great dataset library to go back to again and again. Then, when looking for ideas, find relevant datasets and see if they spark ideas or questions to answer.
Getting the data ready to analyse is probably the most boring, but most important, part of the whole process. Use all the Excel/Sheets hacks you know to make analysis as quick and easy as possible.
Finally, choose a suitable visualisation technique and let the links roll in. Well, you’ll have to outreach it first, but that’s a story for another day.