Is your data too fast and too unreliable?
Remember when 128 MB of RAM was all you'd ever need to store every scrap of information available to you? Today, data is being generated at an incredible rate: every day, we create 2.5 quintillion bytes of new data. And companies are adding new data sources all the time in the hope of improving their Business Intelligence (BI).
Business leaders likewise require high velocity, demanding access to the data they need when they need it. If your company relies on traditional methods of warehousing data, however, speed is likely an issue. Every Director of IT knows that each source integration into the reporting back end, data warehouse and front end can take weeks or even months. By the time IT completes the integration, it’s highly likely that a newer, better source has taken its place. And the disgruntled business user gets a report in three months rather than three hours.
Both IT and business leaders want data faster, preferably right this instant. That's why adoption of self-service data visualization tools such as Qlik and Tableau has skyrocketed. Everyone from the CEO to the data scientist to Bob in IT wants to slice and dice the data in order to make smarter decisions.
But there is a catch to each and every one of these self-service data visualizers: in order for the output from these tools to be truly meaningful, the data must be comprehensive, trustworthy, clearly understood, and actionable in addition to being readily available. Put simply, data visualizations are only as good as the data that goes into them.
The first step towards data governance
In the past, the only way to control the vast volumes of corporate data was to meticulously build a traditional data warehouse, a technology first developed in the late 1980s, to ensure that data quality and consistency were governed and maintained. However, as any IT engineer will tell you, building a traditional data warehouse is a long, costly, and risky process. Even the father of data warehousing, Barry Devlin, recognizes there’s room for a different approach these days.
Full data governance: the Discovery Hub
In a recent whitepaper, Devlin describes the need for a 21st-century approach to data governance, what he calls a "Discovery Hub," which is a place “where core business data can be cleansed, reconciled and documented prior to making it available to business users.” According to Devlin, “Its design allows and encourages business users to make it the trusted foundation in their discovery processes.” Devlin points out that traditional data warehouses offer no help in three key Business Intelligence (BI) areas:
- Poor data quality: the long-standing, pervasive issues of poor data quality and documentation
- Lack of data consistency: the lack of data consistency across sources
- Lack of expertise: the analysis by business users of data beyond the boundaries of their knowledge and competence
How does a Discovery Hub differ from a traditional data warehouse? It is a data store where all core business information can be cleansed, reconciled, and made available as a consistent resource for business users. The structure is simpler, and the contents are narrower and cleaner than those of a traditional data warehouse. The goal is to be the "single source of truth," meaning that when quality issues arise or bad data is found, the error can be corrected once in the hub for all users.
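To make the "correct it once" idea concrete, here is a minimal Python sketch. It is an illustration only, not Devlin's design or any product's API: the `DiscoveryHub` class, its `load`, `correct`, and `view` methods, and the sample customer record are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryHub:
    """Hypothetical in-memory stand-in for a governed hub (illustration only)."""
    records: dict = field(default_factory=dict)

    def load(self, key, value):
        # Cleanse on the way in: trim whitespace and normalise capitalisation.
        self.records[key] = value.strip().title()

    def correct(self, key, value):
        # A fix is applied once, in the hub itself.
        self.records[key] = value

    def view(self, key):
        # Every consumer reads from the hub instead of keeping a private copy.
        return self.records[key]


hub = DiscoveryHub()
hub.load("customer:42", "  acme gmbH ")

# Two hypothetical consumers that both read through the hub.
def dashboard():
    return hub.view("customer:42")

def monthly_report():
    return hub.view("customer:42")

before_fix = dashboard()                 # the cleansed, but still wrong, value
hub.correct("customer:42", "ACME GmbH")  # one correction in one place
after_fix = (dashboard(), monthly_report())
```

Because neither consumer holds its own copy, the single call to `correct` is enough for both to return the repaired name on their next read.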
Key aspects of the Discovery Hub
- Governance: The Discovery Hub provides a single, consistent, and managed source of all internal core business information in a governed data discovery environment.
- Business-defined sources: Business owners identify a single source for each element; IT collaborates to implement.
- Storage: A Discovery Hub is also a consistent, managed location to store other internally and externally sourced data.
- Historical views: It stores ongoing historical snapshots of this core business information.
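Two of the aspects above, business-defined sources and historical views, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `SOURCE_OF_RECORD` mapping, the `erp` and `crm` feed names, the helper functions, and all figures are hypothetical and do not come from the whitepaper.

```python
import copy
from datetime import date

# Hypothetical mapping: business owners name one authoritative source per element.
SOURCE_OF_RECORD = {"revenue": "erp", "customer_name": "crm"}

# Hypothetical snapshot store: snapshot date -> copy of the core data on that date.
snapshots = {}


def build_core(feeds):
    """Take each element only from its designated source, ignoring other feeds."""
    return {element: feeds[source][element]
            for element, source in SOURCE_OF_RECORD.items()}


def take_snapshot(core, as_of):
    """Store an independent copy so later changes cannot rewrite history."""
    snapshots[as_of] = copy.deepcopy(core)


# Both feeds carry both elements, but they disagree (sample values only).
feeds = {
    "erp": {"revenue": 1200, "customer_name": "ACME (billing alias)"},
    "crm": {"revenue": 1150, "customer_name": "ACME GmbH"},
}

core = build_core(feeds)
take_snapshot(core, date(2024, 1, 31))

# A month later the ERP figure changes; the January snapshot is left untouched.
feeds["erp"]["revenue"] = 1300
core = build_core(feeds)
take_snapshot(core, date(2024, 2, 29))
```

The mapping is the part business owners would decide; the code that applies it is the part IT would implement. Keeping dated copies lets a user compare the current view with any earlier one.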
A Discovery Hub is a place where data quality and consistency are maintained and fully governed, a place that acts as a central repository so the right person can access the right data at the right time. Your data keeps pace with your needs, and all data is properly and consistently governed automatically.