The Data Life Cycle

Like you and me, data has a life cycle, beginning with its acquisition and ending with its purging. The middle of this life cycle is when the data is most useful, similar to our lives once we enter adolescence and our parents begin to imagine some relief from the financial and emotional stress we have placed on them. Eventually, the data is no longer useful, but it can still tell a story, so it is put in an archive to be called upon when there is some interest. Ultimately, there comes a point when it has only historical context, and it is purged.

The Beginning – Data Capture

To start its life cycle, data must enter our infrastructure, be it our consciousness or our enterprise infrastructure. The primary methods for this are:

  1. Data acquisition – this is the gathering and taking in of data from outside the processing or storage center. This data can be purchased from other organizations or can be physically collected by technicians.
  2. Data entry – the physical entry of data, gathered from outside or generated from within, into the processing or storage center.
  3. Signal capture – gathering data from sensors and devices by data loggers and IoT devices.

All of these methods have particular data governance challenges. Data acquired from outside an organization may come with contractual or legal agreements. Data entered from within may have reliability or integrity issues; think in terms of bias. Data from signal capture may have quality issues.

Infancy – Data Maintenance

Once collected, data must be nurtured and processed into a state that will make it useful. No value is derived in the data maintenance stage, so it is a net expense. During this stage the data is enriched by being munged, cleaned, transported, related, and run through extract, transform, and load (ETL) steps for later use. Data governance in this period is concerned with how the data is handled, documented, and manipulated.
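To make the maintenance stage a little more concrete, here is a minimal sketch of an extract, clean, transform, and load pass using only the Python standard library. Everything in it is an assumption made for illustration: the sensor export, its column names, the Fahrenheit-to-Celsius rule, and the in-memory list standing in for a warehouse. A real pipeline would follow the organization's own governance rules for what gets dropped and how it is documented.

```python
import csv
import io

# Hypothetical raw sensor export; the columns and values are made up for illustration.
RAW_CSV = """sensor_id,reading,unit
 A1 , 12.5 ,C
A2,,C
a1,12.5,C
B7,not-a-number,C
B7,55.4,F
"""

def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Trim whitespace, normalise IDs, and drop rows without a numeric reading."""
    cleaned = []
    for row in rows:
        try:
            reading = float(row["reading"].strip())
        except ValueError:
            continue  # missing or non-numeric reading
        cleaned.append({
            "sensor_id": row["sensor_id"].strip().upper(),
            "reading": reading,
            "unit": row["unit"].strip(),
        })
    return cleaned

def transform(rows):
    """Convert Fahrenheit readings to Celsius and remove exact duplicates."""
    seen = set()
    result = []
    for row in rows:
        reading = row["reading"]
        if row["unit"] == "F":
            reading = round((reading - 32) * 5 / 9, 2)
        key = (row["sensor_id"], reading)
        if key not in seen:
            seen.add(key)
            result.append({"sensor_id": row["sensor_id"], "reading_c": reading})
    return result

def load(rows, store):
    """Append the prepared rows to a store (a plain list stands in for a database)."""
    store.extend(rows)
    return store

warehouse = load(transform(clean(extract(RAW_CSV))), [])
print(warehouse)
```

Of the five raw rows, two are dropped for lacking a usable reading and one is removed as a duplicate, which is exactly the kind of handling that governance documentation in this stage should record.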

Early Childhood – Data Synthesis (development of information)

Now in a useful state, data is analyzed for value. It is subjected to logic, algorithms, equations, and computations to see if value can be extracted or if it will support the original reason it was collected. This area is handled by subject and content specialists for the topic at hand.

Adolescence – Data Usage

At this stage data is useful and begins to generate value. It is applied to tasks of interest, such as making predictions, supporting decisions, or evaluating risks, and begins to align with the purpose of the organization that collected it. Governance at this stage focuses on proper and permitted use of the data.

Adulthood – Data Broadcasting

Once realizations are made, the data and the findings are generally shared with similar organizations or clients. After sharing, the data can no longer be quietly corrected in place, so data governance rules will have to dictate how corrections and retractions are made, as well as how to handle those affected by any errors or omissions.

Mature Adulthood – Data Archival

At some point, after many discussions and uses, an organization has realized all the value it can from a set of data and moves on; the data has lived its useful life. At this point the data is stored in a manner that allows it to be called back into use or interrogated by auditors should the need arise. It is neither maintained nor cared for; it is just at rest.

Death – Data Purging

This is the end of life for data. It is removed from the organization entirely with its archive deleted. The challenge of this phase is to ensure that all of the data has been deleted and there are no fragments or extra copies in circulation or hiding.
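One way to look for extra copies after a purge is to record a content hash of each data set before deleting it and then scan storage for files with the same hash. The sketch below is only an illustration under stated assumptions: the directory layout, file names, and sample payload are hypothetical, and the demo runs inside a temporary directory. It finds byte-identical copies only; fragments, re-exported versions, and backups held elsewhere would need other checks.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_copies(root, purged_digests):
    """List files under root whose content matches a purged data set."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and sha256_of(path) in purged_digests
    )

# Demo in a throwaway directory; all names and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    payload = b"drill_hole,depth_m\nDH-001,120.5\n"
    (root / "archive").mkdir()
    (root / "scratch").mkdir()
    (root / "archive" / "survey.csv").write_bytes(payload)
    (root / "scratch" / "survey_copy.csv").write_bytes(payload)
    (root / "scratch" / "notes.txt").write_bytes(b"unrelated")

    # Record the digest before purging, then delete the "official" archive copy.
    purged = {hashlib.sha256(payload).hexdigest()}
    (root / "archive" / "survey.csv").unlink()

    # Anything still matching the digest is a copy the purge missed.
    leftovers = [p.relative_to(root).as_posix() for p in find_copies(root, purged)]
    print(leftovers)
```

Here the scan reports the forgotten copy in the scratch folder while ignoring the unrelated notes file.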

Final Thoughts

Data may not experience each of these phases, and in reality much data is not destroyed. More often than not, data is retained in the event that it may be useful again. I find this quite often in my line of work, where so much effort was put into collecting, transforming, modeling, analyzing, and storing the data that it is just too difficult to let go. As a data administrator and engineer in the mining industry I have worked with some truly great statisticians and scientists who hold on to their data for dear life, and I understand why. In our industry, they changed history and made the future, paving the way for the digital processes that we are using now. But at some point, the volumes of data have to be dealt with, and the organization must move on. Finding the proper balance, to ensure all value has been extracted and that just enough is kept, is one of the challenges of a well-managed data governance program.

If you found this post interesting, or you have experience with managing data life cycles, please let me know and leave a comment below or find me on Twitter.
