Sunday, February 10, 2008

Defining and managing the lifecycle of your data

We are in the process of helping to define the data lifecycle for two very different clients.

Data life cycle management (DLM) is a policy-based approach to managing the flow of a company's data throughout its life cycle: from creation and initial storage to the point where it reaches end of life and is deleted. DLM manages the processes involved, typically organising data into separate tiers according to specified policies, and automating data migration from one tier to another based on those policies. For example, a typical rule is that newer data, and data that must be accessed more frequently, is stored on faster but more expensive storage media, while less critical data is stored on cheaper but slower media.
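A tiering rule of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Record fields, the tier names ("fast", "cheap", "delete") and every threshold below are hypothetical and would come from the client's own policy in practice.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    """A hypothetical unit of data tracked by the policy."""
    name: str
    created: date
    reads_last_30_days: int


def choose_tier(record, today, hot_days=90, hot_reads=100,
                retention_days=7 * 365):
    """Return the tier a record belongs in under an illustrative policy.

    All thresholds are assumptions made for this sketch:
    - older than retention_days: end of life, so the record is deleted
    - created within hot_days, or read at least hot_reads times in the
      last 30 days: fast, more expensive storage
    - everything else: cheaper, slower storage
    """
    age = today - record.created
    if age > timedelta(days=retention_days):
        return "delete"
    if age <= timedelta(days=hot_days) or record.reads_last_30_days >= hot_reads:
        return "fast"
    return "cheap"


if __name__ == "__main__":
    today = date(2008, 2, 10)
    sample = [
        Record("new_orders", date(2008, 1, 1), 0),
        Record("popular_archive", date(2006, 1, 1), 500),
        Record("quiet_archive", date(2006, 1, 1), 3),
        Record("expired_logs", date(2000, 1, 1), 500),
    ]
    for record in sample:
        print(record.name, choose_tier(record, today))
```

Running a function like this on a schedule and moving each record to the tier it returns is one simple way to automate migration between tiers. Checking the retention limit first means data past its end of life is deleted however often it is still being read, which is a design choice rather than a requirement.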


I read Krish Krishnan's introduction to the issue of the data lifecycle. He included a reference to Bill Inmon's work on Information Lifecycle Management for Data Warehousing. This was fortunate because, like so many others in the BI industry, I have always been an admirer of Inmon's work. He recently published work on Data Warehousing 2.0, tackling industry trends, unstructured data and the data lifecycle.

DMReview interviewed Bill here. They asked why we need Data Warehouse 2.0, and his response was ...

There are two reasons for DW 2.0 - the first is for the integrity of the definition because I feel there are too many definitions floating around. The second reason is the need for a vision for the future of data warehousing, which I believe a lot of people in the industry have wrong. It came from confusion and from vendors trying to sell products. There were people building transactional systems they were calling a data warehouse; people building federated versions of a data warehouse; people building data marts that they were calling a data warehouse. Those are just some of the renditions.


When asked how to convince a CEO of the benefits of a data warehouse, he replied ...

The whole subject of return on investment has vexed me. I have seen people take a macro approach. They say, "Okay, the data warehouse was installed, and the stock price of the corporation began to rise." I don't think data warehousing is particularly relevant to a measure like that. I started looking at a micro level and said, "Consider two companies. One company has a data warehouse, one company doesn't. What are the different information capabilities of these two companies?"

The second major case is speed of information. Once the data warehouse is built, the ability to get the information quickly in the hands of the right person in the corporation is greatly facilitated. With DW 2.0, there's also the issue of the data lifecycle, and by recognizing that, you can cut the cost of the data warehouse dramatically.


We have long known that only about 20% of our data is stored in a structured environment. BI vendors have realised that it is time to address the other 80%, which is unstructured or semi-structured. Doing so provides further insight into our systems and a better understanding of our business and its profitability.

Business Objects recently acquired Inxight, whose products analyse your unstructured text to give you a more complete view of your business.
