Tips and techniques to prepare your data.

Data visualization and reporting is the hidden gem of the big data movement. It does not matter how much data you collect, or how big, sophisticated, thorough, or detailed it is. If you can’t turn that data into actionable information in front of the people who need it, it’s worthless.

Organizations pull data from systems like their ERP, CRM, and financial databases to use for reporting. They begin creating reports on data no one has seen before. The data users come to understand the data, and ideas begin rolling. But this leads the data users to ask for more and more data, and for increasingly complex scenarios to figure out how they want to operate. It’s up to the data stewards (the data geeks) of the organization to be able to say YES, we can deliver it.

You must have control of your data and the relationships within it so it stays manageable. There are numerous statistics out there claiming that 80 or 90% of all data collected is never used, and that in three years we’ll have ten times the volume of data we have today. So here’s what to do about it.

Follow the 5 C’s of data visualization to prepare your data and set the stage for an eye-opening data visualization.


Capture

I can Google my way to a connection to almost any data source I have within 15 minutes. However, I must be able to replicate that connection for every individual who needs the data. We’ve got to be able to scale. That gets harder and more costly when the data idea is a complex one. I do not want to be the bottleneck for others, and I do not want everyone hitting production and live transactional systems directly.

Instead, I want a limited set of individuals to perform the capture and acquisition and then point people to the result. This apparent extra layer insulates all downstream reporting from future migrations of the source systems and protects my transactional systems from a barrage of queries.
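As a rough illustration, here’s a minimal sketch of that single capture layer in Python, assuming a SQL source and a SQLite file as the shared repository. The database names, the orders table, and its columns are hypothetical stand-ins for your own systems.

```python
# Minimal capture sketch: one scheduled job copies data out of the live
# transactional system into a shared repository, so analysts query the
# repository instead of production. The file names and the "orders"
# table are hypothetical placeholders.
import sqlite3

SOURCE_DB = "erp_production.db"   # hypothetical live transactional system
REPO_DB = "data_repository.db"    # hypothetical shared reporting repository

def capture_orders():
    src = sqlite3.connect(SOURCE_DB)
    repo = sqlite3.connect(REPO_DB)
    try:
        # One controlled read of the production system.
        rows = src.execute(
            "SELECT order_id, customer_id, amount, order_date FROM orders"
        ).fetchall()
        repo.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
            "amount REAL, order_date TEXT)"
        )
        # Land the data where everyone downstream can reach it.
        repo.executemany(
            "INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows
        )
        repo.commit()
    finally:
        src.close()
        repo.close()

if __name__ == "__main__":
    capture_orders()
```

Analysts point their tools at the repository, and when the ERP migrates, only this one job has to change.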

Clean

We’ve spoken about the volumes of data available. It’s overwhelming. As volume increases, the overall quality of the incoming data degrades. So it’s up to the systems, in this case the data repository, to be the authority on quality. More data generated with few rules enforced at data entry leads to incomplete items that need filling in and cleansing. Web traffic data arrives in volume and is often irrelevant to you, so you must seamlessly and continuously filter it.

Before data visualization happens, a clean set of data is imperative. A dedicated effort should focus on cleaning the captured repository. If this simple rule is followed, downstream questions and doubts about the final results drop dramatically. The data users can then make educated decisions about what changes need to be made, which is the core reason people look at data in the first place.
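For example, here’s a minimal cleaning sketch using pandas. The region and user_agent columns, and the rule that bot traffic is the irrelevant part, are hypothetical assumptions chosen to illustrate the idea.

```python
# Minimal cleaning sketch: fill in incomplete items, filter out
# irrelevant web traffic, and drop duplicates from repeated ingestion.
# Column names and filter rules are hypothetical.
import pandas as pd

def clean_web_traffic(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Fill in incomplete items rather than leaving gaps for report users.
    df["region"] = df["region"].fillna("Unknown")
    # Continuously filter out rows that are irrelevant to you.
    df = df[~df["user_agent"].str.contains("bot", case=False, na=False)]
    # Drop exact duplicates introduced by repeated loads.
    return df.drop_duplicates()

sample = pd.DataFrame({
    "region": ["East", None, "West", "West"],
    "user_agent": ["Chrome", "Googlebot", "Safari", "Safari"],
})
print(clean_web_traffic(sample))
```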

Combine

Experience shows that great data visualizations come from pairing data from varied sources. The hardest way to combine data, however, is to do it across platforms. Once you’ve captured data in a common repository, you have removed the barriers to combining sources. In fact, you’ve created a single new source, and you’ve cut the technical challenge of every source you add beyond that, because the number of cross-platform relationships otherwise grows quadratically.

Think about it: if you combine 2 sources, that’s 1 relationship to manage. With 3 you must manage 3 relationships; with 4 it’s up to 6; with 5, 10. In general, n sources mean n(n-1)/2 pairwise relationships. It’s best to stand up your repository once and remove that redundant technical fumbling, fast, because your sources won’t stop at 5.
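Once everything lives in the common repository, pairing sources reduces to a single join on a shared key. Here’s a minimal sketch with pandas; the CRM and ERP frames and their customer_id key are hypothetical.

```python
# Minimal combine sketch: two sources already landed in the common
# repository pair up with one join. The frames and the shared
# customer_id key are hypothetical.
import pandas as pd

crm = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["Enterprise", "SMB", "SMB"],
})
erp = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "revenue": [120_000.0, 8_500.0, 14_200.0],
})

# One relationship to manage: the shared customer_id key.
combined = crm.merge(erp, on="customer_id", how="inner")
print(combined)
```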

Calculate

Calculations seem to take on a mind of their own in every reporting system. And similar to the clean data conversation above, calculations defined once and used in many places should be the backbone of the end solution.

You would not want every report in the organization separately calculating the margin percent on a sale after removing sales tax. That approach is prone to simple order-of-operations errors, localization errors, and more. Centralized calculations performed prior to data visualization allow ad hoc analysis to render real results that can be acted on quickly, and they increase confidence in the data system, its results, and its findings.
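Taking the margin example above, here’s a minimal sketch of a calculation defined once and reused everywhere. The flat tax rate and parameter names are hypothetical assumptions.

```python
# Minimal sketch of a centralized calculation: margin percent is defined
# once, here, and every report calls this function instead of re-deriving
# it. The flat 8% tax rate is a hypothetical assumption.
def margin_percent(sale_amount: float, cost: float, tax_rate: float = 0.08) -> float:
    """Margin % on a sale after removing sales tax, defined in one place."""
    net_sale = sale_amount / (1 + tax_rate)  # strip sales tax first
    return (net_sale - cost) / net_sale * 100

# Every downstream report reuses the same assumption-laden definition.
print(f"{margin_percent(sale_amount=108.0, cost=75.0):.1f}%")  # 25.0%
```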

There’s nothing like hearing, “It was easy to find the metrics we were looking for, and when we found them, all the assumptions were taken into account.”

Control

Control is what you gain by applying the other C’s to your data before visualization. Control can be a word that gets misconstrued, but having control over your data lets you be more agile in your decision-making process, and ultimately more responsive to a data-hungry business. That kind of control leads to good data visualization.