Data integration: A base for enterprise architecture
DI enables general integration

Ramendra Mandal

Data integration (DI) software not only supports data warehousing, but also provides capabilities for general integration. DI can pull huge amounts of dissimilar data from any number of disparate sources; rapidly transport, transform and cleanse it; and integrate it so that it appears to have come from a single source.

Gartner once reported that "large enterprises should create a central competency center to reduce the time and cost required to integrate application systems." More and more, companies are turning to DI software as the foundation, or integration competency center, upon which their overall enterprise data architectures reside.

This may surprise those who think of DI only in terms of data warehousing, yet DI provides capabilities and advantages that are simply not available from any other single integration technology. These include the ability to perform complex transformations, focus on data quality and profiling, quickly move terabytes of data on a schedule or in response to events, leverage rich metadata capabilities, use codeless integration and utilize adaptive integration to dynamically keep pace with changing information requirements and environments.

Moreover, DI marries well with other integration technologies, such as enterprise application integration (EAI) and enterprise information integration (EII). Its unique attributes can be leveraged by them, and vice versa, in order to integrate, visualize and track any type of data, in any quantity, to and from any platform, on a schedule or in response to events, for any enterprise data requirement.

This said, why should data integration software be the platform upon which to layer other integration technologies? The answer lies in how today's typical enterprise data architecture encompasses a variety of integration requirements, and in how those requirements can be best met to enable true business agility.

It's all about data

Business intelligence applications rank high among those requiring this functionality, which explains why DI is currently so closely associated with data warehousing. Without DI there could be no unified views of business data across multiple systems; no single views of customers and suppliers to drive customer relationship management (CRM) operations; and no trusted source of cleansed and normalized data for business intelligence and corporate performance management (CPM) applications.

However, an Informatica World 2003 survey of senior IT executives found that 87 percent use DI solutions for general integration, not just data warehousing. A recent TDWI survey reached the same conclusion, with 80 percent of respondents using DI solutions beyond data warehousing, for purposes such as data migration, application integration and reference data projects.

DI is equally essential for migrating data among systems, replicating and synchronizing large amounts of data across databases, meeting data-interchange requirements such as HIPAA in healthcare and SWIFT in financial services, or consolidating ERP and other data onto a single platform in the wake of a merger or acquisition. DI's ability to move massive volumes of data quickly, while performing data transformations and data-quality operations, comes into play in all of these scenarios.

There are also comparatively new arenas that require DI functionality: business activity monitoring (BAM), zero-latency enterprise (ZLE), and reference data hub initiatives. All of these initiatives require some sort of real-time integration component. DI is often pigeonholed as a batch technology because batch was normally required to support traditional data warehousing. As we will discuss, however, DI can also source transactional data and integrate it in real time to enable the real-time enterprise.

Data integration: architectural advantages

DI software will scale linearly and can transparently leverage parallel-processing technologies for enhanced performance and application server technologies for dynamic balancing of workloads. DI software is also quickly deployed and inherently easy to maintain, as it does not require hand coding.

Web services have emerged as a significant DI trend. Within a DI environment, plug-and-play support for Web services enables DI solutions to adapt quickly to a company's existing and new Web services processes. For example, an order application can invoke DI's transformation and integration functionality via Web services, while a DI solution can in turn invoke external Web services to receive data or to leverage external functionality (e.g., invoking an external workflow manager).

Leveraging metadata

Metadata is also important in a business intelligence and analytics context. Giga states, "When combined with enlightened data management practices and business acumen, metadata-driven design is making possible significant benefits in terms of reuse, productivity improvements and reduced coordination costs."

The total integration imperative

BAM is the new generation of business intelligence, capable of providing real-time dashboards and scorecards that enable users to keep their fingers on the pulse of enterprise events as they happen. As Gartner states, "Data is presented to the BAM recipient in much the same way as car performance data is presented to a driver via the dashboard of a car." Everything that DI brings to business intelligence (the rapid transformations, the consistent data quality, the single views and the sourcing from numerous systems) is required by BAM. A DI solution can integrate real-time data sourced from real-time data feeds such as EAI message queues. The real-time data can be transformed, de-duplicated and inserted into a real-time data repository by the DI solution as it emerges from the messaging pipeline.
The DI solution also manages the immense amount of metadata that is integral to the effective use of data by BAM or any other business intelligence application.

Enterprise information hubs, whether they are called ZLE hubs, reference data repositories or real-time customer information stores, require a similar synergy. Unlike BAM, these constructs are essentially operational in nature: they are single places for people and applications to find clean, trusted, consolidated enterprise information. Data is sourced from a wide variety of enterprise transaction systems, some of it in real time and some not, and is transformed, cleansed and aggregated by the DI solution. The real-time data is integrated from EAI or other real-time message queues. The data in these hubs is therefore an amalgamation of real-time data and historical data, with the real-time data kept continually up to date, often through DI-driven changed data capture.

Once in the hub's repository, or "hot cache" as it is sometimes called, data can be enriched by data that is already there or by new data flowing through the hub. It can also be pushed or pulled out to applications and users via EAI-enabled publish/subscribe messaging. The consolidated data can also be used to populate downstream data marts and warehouses, and to feed BAM, traditional business intelligence, data mining and other analytic applications.
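The hub pattern described above can be illustrated with a minimal Python sketch. It assumes a hub seeded with cleansed historical records, an in-memory queue of change events standing in for an EAI message feed, and subscriber callbacks standing in for publish/subscribe messaging. The names `ChangeEvent`, `InformationHub` and `cleanse`, the event shape and the cleansing rules are all hypothetical; they are not taken from any DI product.

```python
from dataclasses import dataclass, field
from queue import Empty, Queue
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from a source system (hypothetical shape)."""
    op: str                                   # "upsert" or "delete"
    key: str                                  # business key, e.g. a customer id
    data: dict = field(default_factory=dict)  # changed fields; empty for deletes


def cleanse(record: dict) -> dict:
    """Illustrative cleansing rules: trim strings, lowercase e-mail addresses."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if isinstance(cleaned.get("email"), str):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


class InformationHub:
    """Toy hub: historical data plus change events, published to subscribers."""

    def __init__(self, historical: Dict[str, dict]) -> None:
        # Seed the repository with cleansed historical records.
        self.store: Dict[str, dict] = {k: cleanse(v) for k, v in historical.items()}
        self.subscribers: List[Callable[[str, Optional[dict]], None]] = []

    def subscribe(self, callback: Callable[[str, Optional[dict]], None]) -> None:
        self.subscribers.append(callback)

    def apply(self, event: ChangeEvent) -> None:
        if event.op == "delete":
            if self.store.pop(event.key, None) is None:
                return  # nothing was stored under this key, so nothing to publish
            current = None
        elif event.op == "upsert":
            # Enrich the stored record with the cleansed incoming fields.
            current = {**self.store.get(event.key, {}), **cleanse(event.data)}
            if current == self.store.get(event.key):
                return  # duplicate change after cleansing: skip it
            self.store[event.key] = current
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
        for notify in self.subscribers:
            notify(event.key, current)

    def drain(self, feed: Queue) -> int:
        """Apply every event currently queued; return how many were read."""
        read = 0
        while True:
            try:
                event = feed.get_nowait()
            except Empty:
                return read
            self.apply(event)
            read += 1


# Usage: one historical customer, then three changes arriving on the feed.
hub = InformationHub({"c1": {"name": "Acme Ltd ", "email": "Sales@Acme.example"}})
published = []
hub.subscribe(lambda key, record: published.append((key, record)))

feed = Queue()
feed.put(ChangeEvent("upsert", "c1", {"email": " SALES@acme.example "}))  # duplicate once cleansed
feed.put(ChangeEvent("upsert", "c2", {"name": "Globex"}))
feed.put(ChangeEvent("delete", "c1"))
hub.drain(feed)
print(hub.store)
print(published)
```

The sketch keeps everything in one process for readability. In practice the feed would be a message queue and the store a database, but the flow is the same: cleanse each change, merge it into what the hub already holds, drop duplicates and publish what actually changed.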

With these emerging applications, it is necessary not just to integrate real-time data, but to integrate it in real time. A DI solution can bring numerous performance features to bear, from real-time pipelining to leveraging parallel processing, to do just that.

The author is the Country Manager, Informatica India.