The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents.
H.P. Lovecraft, The Call of Cthulhu
Big Data has been on everybody’s minds and lips over the past couple of months, including ours. It’s a topic we have decided to explore extensively, and this article continues everything my dear colleague Sorin Staicu and I have written on Big Data, data hygiene and, ultimately, data inaccuracy.
Few things are as scary as the potential of the information we are idly sitting on. Hidden under a mountain of gazillionbytes of data we have been childishly generating and gathering are boatloads of valuable, yet still elusive, insights.
The thing with this monstrous pile of information we are storing is that it can fuel our freedom or our demise. It can be our saving grace or Lovecraft’s immortal Cthulhu waiting to awaken from its slumber and release madness upon the world.
Truth be told, the way we use our Big Data might well be the scariest, most terrifying story we have the power to create. It goes beyond Terminators and Cylons, and all the AI-infused sci-fi stories we have ever told.
Because it’s so real that it will hurt every pixel, every inch of screen, and every single bit of technology on which we aim to lay the foundations of our future.
Sounds ultra-dramatic, perhaps, but hear me out.
From Big Data to Bad Data
Over the past couple of months, Sorin and I have approached the subject of Big Data from different angles.
We have discussed Big Data from the data hygiene perspective and from the perspective of Dataism, and Sorin has plunged into the Big Data Pyramid, a fascinating military-based approach to using data to your business’s advantage. If this article is your entry point into our Big Data series, we highly recommend reading those pieces as well; they will give you a clearer idea of where all of this is going.
The importance of Big Data has been tackled quite extensively (and there is still a lot more to discuss!). What we need to discuss now is the impact of bad, or dirty, data.
How do you go from Big Data to bad data?
This might be surprising, but it is incredibly easy to spoil your data. You don’t even have to do much, really:
- You can leave data out of your database, whether deliberately or by accident
- You can feed the wrong fields of your database with data that does not belong there
- You can feed your database with data that hasn’t been normalized (meaning it doesn’t follow the formats defined by the system of record your organization enforces)
- You can feed your database with duplicate data
- You can feed your database with false or erroneous data
- You can feed your database with data that has been poorly entered (typos, spelling issues and variations, formatting problems, and so on)
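To make the list above more concrete, here is a minimal Python sketch of an audit that flags several of these problems in records that are already stored: missing fields, duplicates, non-normalized values, and malformed entries. The field names, the allowed country codes, and the `audit` function are hypothetical, chosen only for illustration; a real audit would follow your own schema. Wrong-field and outright false data usually need business rules or cross-checks against other sources, which this sketch does not attempt.

```python
import re
from collections import Counter

# Hypothetical schema: every record is expected to carry these fields.
REQUIRED_FIELDS = ("name", "email", "country")
# Hypothetical normalization rule: countries stored as two-letter codes.
ALLOWED_COUNTRIES = {"US", "GB", "RO", "DE"}
# Deliberately simple shape check for emails, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def audit(records):
    """Return (record_index, problem) pairs for a list of dict records."""
    problems = []
    # Count emails after trimming and lowercasing, so duplicates that differ
    # only in case or stray whitespace are still caught.
    seen = Counter(
        r.get("email", "").strip().lower() for r in records if r.get("email")
    )
    for i, r in enumerate(records):
        for field in REQUIRED_FIELDS:
            if not r.get(field):
                problems.append((i, f"missing {field}"))
        email = r.get("email", "")
        if email and not EMAIL_RE.match(email.strip()):
            problems.append((i, "malformed email"))
        if email and seen[email.strip().lower()] > 1:
            problems.append((i, "duplicate email"))
        country = r.get("country")
        if country and country not in ALLOWED_COUNTRIES:
            problems.append((i, f"non-normalized country {country!r}"))
    return problems


records = [
    {"name": "Ana Pop", "email": "ana@example.com", "country": "RO"},
    {"name": "Ana Pop", "email": "ANA@example.com ", "country": "Romania"},
    {"name": "", "email": "bob@example", "country": "US"},
]
for index, problem in audit(records):
    print(index, problem)
```

Even on three records, the sketch surfaces a duplicate hidden by capitalization, a country typed as free text instead of a code, a missing name, and an email without a domain suffix.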
In short, the way your data is structured and fed into your database is crucial.
Look at it this way: if you were to bake a cake today, you would need all the ingredients at hand. What’s more, you would want to make sure the butter is not spoiled, and that the chocolate hasn’t been sitting in the sun for three months. The road from a good cake to a bad cake can be amazingly short: it’s enough for one of your ingredients to be contaminated, and the entire cake will be ruined.
The same goes for your data. You might have all the ingredients, and you might aim for a great result (gaining important insights about your business). But if one of the ingredients is spoiled, the result will be poor at best.
The road from Big Data to Bad Data is way too short, way too easy to walk, and way too dangerous to ignore.
From the Ground Up – Data Accuracy
Data accuracy is the foundation upon which you have to build your structures. You can have the most intricate deep-learning machines at your fingertips, but if you feed them inaccurate data, it’s all for nothing.
Or worse, even.
Take Tay, for example. In 2016, Microsoft released a chatbot on Twitter and called it, quite innocently, Tay. Tay was a lonely robot in a man-made world, built to learn from and react to everything it saw on Twitter. As a result, its tweets were a mere reflection of whatever it assimilated via social media.
The issue is that, well, not everything on social media is right or politically correct.
So, just 16 hours after its birth, Tay (an acronym standing for Thinking about You) was shut down for tweeting racist messages.
Tay’s story might seem funny, but it’s a pretty clear example of just how badly things can go wrong with Big Data. Microsoft’s chatbot might not have harmed anyone in the process (or not directly, at least), but it showed pretty well what an AI fed with poor data can do.
The reality of inaccurate data goes far deeper than tweets – it can take root deep in the trenches of your business, at its very core.
Some of the most common effects of allowing yourself to work with bad data include:
- Wasting time trying to find insights
- Finding insights that block your growth and take your organization on a wrong path
- Building marketing campaigns that limit your visibility
- Driving up your operational costs and hurting your efficiency
Basically, every single aspect of your business can be affected by inaccurate data – and even more so when you don’t realize it is inaccurate. Just imagine the kinds of disasters you could bring upon your business if you started a marketing campaign based on flawed data, if you launched a new product based on inaccurate data, or if you opened a new office based on incorrect data.
The doomsday scenarios could go on and on. Suffice it to say that inaccurate data can make all of your work fade into nothing: inefficiency, lack of profitability, and, ultimately, bankruptcy.
There is a monster of inaccurate data missing its crossed t’s and dotted i’s, hiding under your mountain of perfectly prim and proper data.
How Do You Root Out the Bad Data Monster?
Bad data is far more common than many would think.
At this point, organizations estimate that approximately 22% of their data is flawed. You’d think that, with Big Data growing in importance, this number would have improved over time. But the sad truth is that the share of inaccurate data has only kept growing: the more data we collect, the more of it is flawed in one way or another.
So, do you just let the monster survive under the mountain while you live in fear of either falling behind on the benefits of Big Data or continuing to work with data that is risky?
Neither, actually. The good news about bad data is that you can make bad data great again.
Or, more accurately, you can hygienize it. In fact, on Monday Sorin is going to publish an article on this exact topic: Data Hygiene: Pillars and Pitfalls. So stay tuned for our extensive how-to piece on data hygiene.
One essential way to do this is to implement a data collection workflow that is impeccable in every respect, from the way you build the actual collection tool (e.g. a form or a survey) to the way you store and filter the data it gathers. 123FormBuilder can help with that, and it can help you lay a healthy foundation for your Big Data from here onwards.
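As an illustration of what "impeccable at collection time" can mean, here is a minimal Python sketch that normalizes and validates a single form submission before it is stored, so duplicates and non-normalized values never reach the database in the first place. It is a generic sketch, not 123FormBuilder’s API: `clean_submission`, `store`, the field names, and the country mapping are all hypothetical, and a plain list stands in for the database.

```python
import re

# Simple shape check for emails, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Hypothetical mapping from free-text country names to the stored code.
COUNTRY_CODES = {"romania": "RO", "united states": "US", "germany": "DE"}


def clean_submission(form):
    """Normalize one submission and return (record, errors)."""
    errors = []
    name = " ".join(form.get("name", "").split())  # collapse stray whitespace
    email = form.get("email", "").strip().lower()  # one canonical email form
    country_raw = form.get("country", "").strip()
    if not name:
        errors.append("name is required")
    if not EMAIL_RE.match(email):
        errors.append("email looks invalid")
    # Accept a known country name or an already valid code, in any case.
    country = COUNTRY_CODES.get(country_raw.lower(), country_raw.upper())
    if country not in COUNTRY_CODES.values():
        errors.append(f"unknown country: {country_raw!r}")
    return {"name": name, "email": email, "country": country}, errors


def store(db, form):
    """Append the cleaned record to db unless it is invalid or a duplicate."""
    record, errors = clean_submission(form)
    if errors:
        return errors
    if any(r["email"] == record["email"] for r in db):
        return ["duplicate email"]
    db.append(record)
    return []


db = []
print(store(db, {"name": "  Ana   Pop ", "email": " Ana@Example.com", "country": "romania"}))
print(store(db, {"name": "Ana Pop", "email": "ANA@example.com", "country": "RO"}))
print(db)
```

The design choice here is to reject or normalize at the door rather than clean up later: the second submission is refused as a duplicate, and the stored record already uses one consistent format for names, emails, and countries.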
In addition, you should:
- Assign specific people to handle your company’s data (under GDPR, many organizations are required to appoint a Data Protection Officer)
- Make sure the correct data reaches the correct people in due time
- Create or use data collection and management systems that fit your specific business requirements; there is no one-size-fits-all solution
Aside from all this, keep in mind that you can also use one of the many data quality solutions out there. These tools are built to filter your data and help you pull out the weeds.
Lovecraft might have believed that the most merciful thing about mankind is that we cannot correlate our data.
But what if we build machines powerful enough to work with our data in an objective way? What if we only feed the “good” Big Data into the machine?
The possibilities are endless.
And no matter how much we like a good Halloween story, not all these possibilities end up in disaster, turmoil, and the complete annihilation of everything we are as a species.
Some of it might actually push us forward – as organizations and as human beings.