Data Quality: The Misguided Quest for the Truth
The truth -- it’s what everyone wants. Give it to me straight. I can handle it. The truth is a noble pursuit -- the quest for the God Particle or the meaning of life. The quest for truth in science, philosophy, and literature is one thing that separates man from animal. In the data management world, however, this same quest separates success from failure -- just not in the way you might think.
It is widely accepted that between 50% and 80% of all data warehouse projects fail. As someone whose company provides a data warehouse for the financial industry, I find this an embarrassing problem, and one that we, as data professionals, have spent too little time trying to solve.
There are a number of reasons for the high failure rate of data warehouse projects, but I believe one of the primary culprits is the misguided quest for the truth. That is, we confuse data quality -- which we can all agree is of the utmost importance -- with truth. Equating quality with truth leads us to commit inordinate amounts of time and resources to a futile quest, because truth depends on context, and context changes from user to user.
If we instead accept that truth can be determined only in context, then we are free to change the equation from “quality equals truth” to “quality equals factually correct in accordance with the definitions of the data.”
With this radically simplified approach to quality, we can make the whole QA process far more efficient by dividing it into two steps: one prior to loading the data and the other when reading (interpreting) it. This lets us verify that the data is factually correct before loading, while supporting multiple definitions or interpretations of the data when it is read. Supporting multiple definitions of the same concept grows more important as industries become more complex, a complexity that is reflected in the data organizations work with.
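The two-step split can be sketched in a few lines of code. This is only an illustration of the idea, not anything from a real product; all names (validate_before_load, read_with_context, the field names) are hypothetical:

```python
def validate_before_load(rows):
    """Step 1, before loading: admit only factually correct rows --
    required fields present, quantities numeric. No interpretation
    of meaning is applied at this stage."""
    clean = []
    for row in rows:
        if row.get("security_id") and isinstance(row.get("quantity"), (int, float)):
            clean.append(row)
    return clean

def read_with_context(rows, interpret):
    """Step 2, at read time: interpretation is a function supplied by
    the reader, so each department brings its own definition to the
    same underlying facts."""
    return [interpret(row) for row in rows]

rows = validate_before_load([
    {"security_id": "BOND1", "quantity": 100, "price": 99.5, "accrued": 1.2},
    {"security_id": None, "quantity": 50},  # rejected: no identifier, a bad fact
])

# One reader's context decides what "value" means; another reader could
# pass a different function over the exact same loaded rows.
values = read_with_context(rows, lambda r: r["quantity"] * r["price"])
```

The key design point is that step 1 never encodes a department's definition; it only guards facts, leaving every definition to step 2.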
For example, a bank’s risk department may define market value differently from the way the wealth management department does. Each has its own definition, or context, for how to arrive at that number: one may include accrued interest for fixed income assets, while the other does not. Both figures are true. Nonetheless, we waste months or even years building data warehouses that offer just one of these departments -- or perhaps neither -- the truth as they define it.
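To make the two contexts concrete, here is a minimal sketch of both definitions applied to one loaded position. The field names and numbers are assumed for illustration only:

```python
# One set of facts, loaded once and validated for factual correctness.
position = {"quantity": 1_000, "clean_price": 98.75, "accrued_interest": 0.42}

def market_value_risk(p):
    # Risk department's context: clean price only, accrued interest excluded.
    return p["quantity"] * p["clean_price"]

def market_value_wealth(p):
    # Wealth management's context: dirty price, accrued interest included.
    return p["quantity"] * (p["clean_price"] + p["accrued_interest"])
```

Both functions read the same stored facts; neither is "the" market value, and neither forces its definition on the other department.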
Instead of determining a single overriding truth, we need to evolve with the business and design systems that let users find their own truth as defined by their context. So if the quest for a single version of the truth should really be context-focused, how do you make this a reality in today’s data warehouse? As with most problems, once disassembled and analyzed, there is a logical process for capturing this more realistic understanding of quality.
Specifically, companies need to move away from the old data warehouse extract/transform/load (ETL) protocol to ELT, in which transformation -- and context -- follows loading. ELT has a big-data-like feel to it: data professionals can bring in data more quickly and easily, allowing each department to interpret the data in the way that adds the most value to its business. Moving the transformation layer effectively addresses the challenge of multiple versions of the truth, because it lets strategy take the lead over data management tactics, and strategy always beats tactics.
ELT is also much better for today’s dynamic data environments and makes it easier to store data from multiple sources, to support multiple definitions of the truth, and to support multiple interpretations of the same fact. For example, you can have multiple definitions of market value, one based simply on underlying securities and another including such arcane details as unsettled accrued interest, restitutions, and so on.
There is much to be excited about in data management, with an incredible amount of innovation around getting more out of data. But when planting a forest, it can be distracting to focus on the trees; likewise, focusing too narrowly on a single truth in the data can endanger the whole data warehouse project.
Jonas Olsson is the CEO and founder of Graz, a provider of data warehouse and business intelligence software built specifically for the needs of investment managers, insurers, and banks worldwide. Olsson founded Graz in 2000 as an IT services firm, and transitioned it ...