Data warehousing: building on a solid foundation

Plenty of textbooks, papers, and dedicated blogs cover data warehousing. Methodologies, tools, data structures and schemas, query languages, ETL (extract, transform, load) processes, and the development and maintenance of enterprise data warehouses are all widely discussed. I will not revisit these aspects in this post.

Keys to success: data quality and putting business in charge

I view two conditions as essential to a successful corporate-wide data warehousing effort. They matter all the more because a commonly cited estimate holds that some 70 to 80% of such endeavors fail to deliver a satisfactory return on investment, fail to achieve the hoped-for end-user engagement, or both. Frequently, the mad rush to get something out of the warehouse quickly works against the systematic approach needed to put a solid foundation in place. The old saw that there is never time to do something right but always time to do it again applies only in part, because the cost of a failed warehouse effort is very high and usually means there is no second chance.

The first requirement for this foundation, then, is attention to data quality in the myriad source systems meant to feed an enterprise warehouse. While this is intuitively obvious, it is equally obvious that not everyone does it well, and that some put in a half-hearted effort at best. The excuses vary, and usually have something to do with the hernia-inducing "culture change" deemed necessary to break down silos. It is also unglamorous work, and it takes place not on the front end of the data warehouse but behind the scenes. Many, out of sheer weariness at dealing with various immovable corporate objects, opt for dumping data wholesale into the warehouse, hoping that technology will eventually sort matters out. The predictable result of doing too little, while laying the foundation, to achieve the required data quality at the source (where by quality I mean fitness for a specific purpose, including the required accuracy) is that potential end users reject wholesale such a flawed implementation of the long-promised "one source of truth" vision for corporate data.
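
Because quality here means fitness for a specific purpose, the same source column can be good enough for one use and not for another. The sketch below illustrates that idea under loudly stated assumptions: the `QualityRequirement` thresholds, the discharge-disposition codes, and the sample values are all hypothetical, and a real profiling effort would check far more than missing and invalid values.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class QualityRequirement:
    """Hypothetical per-purpose quality thresholds (illustrative only)."""
    purpose: str
    max_missing_rate: float  # tolerated share of null values
    max_error_rate: float    # tolerated share of present values failing validation


def profile(values: Sequence[Optional[str]],
            is_valid: Callable[[str], bool]) -> tuple[float, float]:
    """Return (missing_rate, error_rate) for one source column."""
    if not values:
        raise ValueError("cannot profile an empty column")
    present = [v for v in values if v is not None]
    missing_rate = (len(values) - len(present)) / len(values)
    invalid = sum(1 for v in present if not is_valid(v))
    error_rate = invalid / len(present) if present else 0.0
    return missing_rate, error_rate


def fit_for_purpose(values: Sequence[Optional[str]],
                    is_valid: Callable[[str], bool],
                    req: QualityRequirement) -> bool:
    """A column is fit for a purpose only if it meets that purpose's thresholds."""
    missing_rate, error_rate = profile(values, is_valid)
    return missing_rate <= req.max_missing_rate and error_rate <= req.max_error_rate


# Hypothetical discharge-disposition column from a hypothetical source system.
VALID_CODES = {"HOME", "SNF", "AMA", "EXPIRED"}
dispositions = ["HOME", "SNF", None, "home", "HOME", "AMA", "SNF", "HOME"]


def is_valid_code(code: str) -> bool:
    return code in VALID_CODES


# Same data, two purposes with different (assumed) tolerances.
readmission_reporting = QualityRequirement("readmission reporting", 0.02, 0.01)
census_trending = QualityRequirement("census trending", 0.20, 0.20)

for req in (readmission_reporting, census_trending):
    print(req.purpose, fit_for_purpose(dispositions, is_valid_code, req))
```

With one missing value and one lowercase code out of eight rows, the column passes the looser census threshold but fails the stricter readmission one, which is the point: the fix belongs at the source, before the data reach the warehouse.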

The second, even more fundamental requirement, without which the data warehousing effort will quickly head for the growing waste heap of underutilized and ultimately rejected technical "solutions", is recognizing that data are there to answer questions. Occasionally these are research questions, but mostly they are business questions. A successful data warehouse is an enabler: it is there to help us learn and do our jobs better and more collaboratively. Unfortunately, the term warehouse evokes passive storage, and no doubt the default perception of a warehouse appliance is a repository of historical data and not much more. That is not the complete picture, however. Just as with personal data living in folders on my PC or in my (hopefully private) cloud, what I value is ease of retrieval, because I am thinking in terms of actionable data. And unless your lifelong ambition is archiving for its own sake, I believe you should be thinking along those lines too.

For an organization, this means that before any warehousing project is undertaken, it is paramount to be clear about the enterprise strategy, how it maps to tactical and operational goals, how these cascade into key performance indicators, measures, and targets, and how these in turn point to the specific data to be collected, profiled, cleansed, and warehoused. This clear understanding of the subject matter, which is how to run one's business, together with the coherent, tightly integrated goals framework derived from it, is the foundation on which a data warehouse can be built with reduced risk and without turning into a resource sinkhole.
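
One way to picture this cascade is as a small data structure in which every warehoused data element must be traceable back through a KPI to a goal and, ultimately, to the strategy. The sketch below is a hypothetical illustration only: the `Strategy`, `Goal`, and `KPI` classes, the healthcare goals, the targets, and the field names are all invented for the example, not taken from any particular methodology or tool.

```python
from dataclasses import dataclass, field


@dataclass
class KPI:
    name: str
    target: float
    data_elements: set[str]  # hypothetical source fields needed to compute this measure


@dataclass
class Goal:
    name: str
    kpis: list[KPI] = field(default_factory=list)


@dataclass
class Strategy:
    name: str
    goals: list[Goal] = field(default_factory=list)


def warehouse_scope(strategy: Strategy) -> set[str]:
    """Data elements that at least one KPI in the goals framework depends on."""
    return {
        element
        for goal in strategy.goals
        for kpi in goal.kpis
        for element in kpi.data_elements
    }


def trace(strategy: Strategy, element: str) -> list[str]:
    """Explain why an element is in scope: which goal and KPI need it."""
    return [
        f"{goal.name} -> {kpi.name}"
        for goal in strategy.goals
        for kpi in goal.kpis
        if element in kpi.data_elements
    ]


# Hypothetical example; every name and target below is illustrative.
strategy = Strategy("Protect operating margin", [
    Goal("Reduce avoidable readmissions", [
        KPI("30-day readmission rate", 0.12,
            {"encounter.patient_id", "encounter.admit_date", "encounter.discharge_date"}),
    ]),
    Goal("Shorten the revenue cycle", [
        KPI("Days from claim submission to payment", 30.0,
            {"claim.submitted_date", "claim.paid_date"}),
    ]),
])

# Everything a source system could offer, including data no KPI needs.
available = warehouse_scope(strategy) | {"parking.badge_scans", "cafeteria.menu_item"}

print(sorted(available - warehouse_scope(strategy)))  # candidates to leave out
print(trace(strategy, "claim.paid_date"))
```

The set difference makes the "what data to ignore" question explicit: anything a source system offers that no KPI depends on is, at least initially, a candidate to leave out of the warehouse.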

Only such a grasp of the business essentials can tell us which data to focus on to steer the corporate ship with economy of effort and on a proper heading and, just as important, which data to ignore. The alternative is a succession of haphazard efforts to warehouse data of unclear value, treating storage as an end in itself rather than as a means to actionable retrieval. Lack of business clarity also guarantees fuzzy deliverables and scope creep and, eventually, blown budgets, schedule overruns, poor adoption, and general disillusionment. Treating a warehouse as an IT project, rather than as a business-driven solution with enterprise-wide engagement from business-savvy people, leads to a scatter-gun or, worse, kitchen-sink approach to data collection instead of a laser-focused one. This happens when there is no understanding that data differ in their relative value (or lack of it) to the enterprise. It also means throwing more IT resources at the problem as it gets worse, when what is needed is better business guidance to narrow the scope and focus on key data, most likely a subset of all the data available. For a perspective on big data crunching, see an earlier post. Or, find out more about data quality.

At a time when every healthcare executive seems mindful of reduced reimbursements, dwindling revenues, growing expenses, and thinner margins, it is crucial that they focus both on data quality and on whether the data to be warehoused actually help them run their business better in terms of measurable outcomes. The degree of executive involvement can make the difference between success and failure of a data warehousing effort. Success starts with a properly developed business model, which leads to a realistic roadmap and a carefully populated warehouse that can bring great value to the enterprise.
