The myth that humans use only 10 percent of their brains persists. Even after years of neuroscientists dismissing it as an unfounded hypothesis, many people still believe we have untapped potential that we have yet to fully use. But make no mistake: when it comes to data inside organizations, this is no myth. Organizations are filled with so-called dark data: data the organization has already paid for, collected, and stored in various systems and data stores, but is not currently using, analyzing, or even accessing.
The continued data explosion across organizations and the increased pace of business have resulted in a scramble to become even more data-driven. Enterprises started storing a plethora of data at significant expense. Today, all this data lives in silos and legacy data stores across the enterprise. As a result, most organizations use only a fraction of the data they have already paid to collect and store. It is a potential goldmine just waiting to be tapped. This is the dark data opportunity.
When teams attempt to answer difficult questions or improve the way they work, they often avoid the challenge of seeking out and analyzing unfamiliar datasets. This data is particularly difficult to access when the organizational structure or the data architecture gets in the way. In addition, the cost of processing dark data discovered in legacy systems tends to be high.
One of the most important reasons certain data is not made available to analytics tools is the fear of security breaches, along with concerns about compliance with privacy legislation. Dark data is therefore expensive in terms of both real costs and an opportunity cost that is unfortunately hard to measure. It is clear that organizations must be able to, at the very least, access their dark data.
A one-time attempt to drive value from dark data will not help organizations in a sustainable, repeatable way. It cannot be a one-off process, because data continues to multiply exponentially. Organizations must invest in building a repeatable process that continuously makes data readily accessible and consistent.
There are three key imperatives for shedding light on dark data:
Drive speed and flexibility by automating ingestion, transformation, cleansing, and governance, and by adopting agile development practices.
The days of long waterfall development processes have ended. Organizations need to be able to gather requirements and execute on them quickly and iteratively. Combining agile development practices with visual development and pre-built components can dramatically accelerate the delivery of trusted data.
Leverage data lake technology with data stewardship practices for maximum trust.
Using technologies like Hadoop, enterprises can store and process any kind of data cost-effectively and at scale. Combining Hadoop with a collaborative, holistic data stewardship program can ensure that raw data is quickly curated into fit-for-purpose datasets for different users, with data quality issues identified automatically through scorecards.
Utilize metadata and data intelligence to increase visibility and improve security and governance.
Technologies that recognize the meaning of data can help organizations spot patterns and detect potential liabilities in their data. Data security intelligence and other new metadata intelligence capabilities use machine intelligence to understand data and apply the necessary remediation.
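To make the third imperative a little more concrete, here is a minimal sketch in Python of what recognizing the meaning of data can look like at its very simplest: scanning the values in each column and flagging columns that appear to hold sensitive information. Everything in it is an assumption made for illustration, including the `classify_columns` function, the two regular expressions, the 80 percent threshold, and the sample rows. Commercial data intelligence products rely on far richer techniques than pattern matching.

```python
import re

# Hypothetical patterns, for illustration only. Real detection needs far
# more robust rules, validation, and context than two regular expressions.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_columns(rows, threshold=0.8):
    """Return {column: label} for columns whose non-empty values mostly
    match one of the sensitive-data patterns above."""
    # Pivot the row dictionaries into per-column lists of values.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    flagged = {}
    for name, values in columns.items():
        # Ignore missing values so sparse columns can still be classified.
        present = [str(v) for v in values if v not in (None, "")]
        if not present:
            continue
        for label, pattern in PATTERNS.items():
            hits = sum(1 for v in present if pattern.fullmatch(v))
            if hits / len(present) >= threshold:
                flagged[name] = label
                break
    return flagged


# Made-up sample standing in for a table found in a legacy data store.
sample_rows = [
    {"id": 1, "contact": "ana@example.com", "tax_ref": "123-45-6789", "note": "renewal due"},
    {"id": 2, "contact": "li@example.org", "tax_ref": "987-65-4321", "note": ""},
    {"id": 3, "contact": "sam@example.net", "tax_ref": None, "note": "call back"},
]

print(classify_columns(sample_rows))
```

On this sample, the sketch flags `contact` as email addresses and `tax_ref` as Social Security-style numbers, and leaves `id` and `note` alone. Metadata of this kind could then feed the masking, access control, or other remediation steps described above.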
The dark data imperative is not just about the potentially game-changing insight that data can yield; it is also about ending the breathtakingly inefficient and expensive practice of enterprise-level data hoarding. There has never been a better opportunity to apply data management technology and practices to shed light on dark data and unleash new levels of competitiveness and profitability for the organization.
The author is Principal Product Marketing Manager, Big Data, Informatica