Knowledge Portfolio
Introduction
Business Process Integration Framework
Data Integration
Some Issues
Enterprise Information Integration (EII)
EAI vs. Information Integration
Data Integration vs. Information Integration
Information Integration Solutions
Challenges
Markets Merging
Metadata Management
Best Practices for Information Integration
Conclusion
Sources and Links
Introduction
A decade ago, Michael Hammer pushed the idea that process reengineering was the next business revolution. He promised companies that if they overhauled their business processes, they would become more efficient. Most businesses that bought into the argument laid off a lot of people, but the expected efficiency eluded them. This was partly because it was hard to get everybody who was left to agree to change their work processes, and partly because, even if they did, companies had no mechanism to enforce the changes. Today, the concept is back in the guise of business process management (BPM) software, which provides a way to monitor and/or enforce efficient business practices. BPM software does this by extracting data from a company's business applications and doing one of two things with it: tracking how the information is used to perform a task, so that an existing business process can be mapped, or escorting the data through a set of tasks to ensure that a business process is being followed.
BPM is defined as follows:
Supporting business processes using methods, techniques, and software to design, enact, control, and analyze operational processes involving humans, organizations, applications, documents and other sources of information.
Business Process Integration Framework
There are four distinct forms of integration. An example of an integration framework for an organization is shown below.
Fig. 1 Information Integration framework (layers shown: Supporting Processes, Applications, Data)
1. Business-process management coordinates processes across application and possibly enterprise boundaries, such as those involved in a supply-chain relationship. Web services and their derivatives are becoming important here.
2. Supporting-process integration brings together all supporting processes to reduce redundancy.
3. Application integration, in which applications that do similar or complementary things communicate with each other, is typically focused on data transformation and message queuing, increasingly in the XML (Extensible Markup Language) domain.
4. Information integration, wherein complementary data are brought together either physically (through warehousing tools) or logically, makes it possible to write applications that use all the relevant data in the enterprise, even if the data are not directly under their control. A typical example would be a new customer relationship application that combines the relational call log with the speech-to-text translation of the call itself.
Although the four models of integration are complementary, in this portfolio, I will focus on data integration in an organization.
Data Integration
Business today increasingly demands a unified view of information. Massive amounts of data are already stored throughout the business, but in a fragmented and disintegrated manner. In order to be efficient and responsive, business users need the ability to use this data transparently, wherever and however it resides, without concern for its timeliness, consistency or security. They require real, useful information. They demand that the IT infrastructure take care of all the details of how the data can be integrated into the information they need.
This demand for integration of information is not easily satisfied today. Some of the required functionality does not exist. Other enabling technology is still emerging. Available functionality is spread over many products in different technology categories. Businesses typically use the traditional integration approaches of extract, transform and load (ETL) and replication. These approaches, here called data placement or consolidation, integrate information by physically consolidating it locally in advance of using it. An emerging approach, known as distributed access or enterprise information integration (EII), enables users to obtain direct access to data in its original locations.
Enterprise Information Integration (EII)
The workhorse of data integration has been ETL tools. They were created to extract the information, transform it into a consolidated view, and then load it into a data warehouse in batch mode. The data volumes involved were generally large, the load cycles long, and the information in the data warehouse was typically a day to a week old. For synchronizing data across operational systems, operational data stores were created, which enabled the real-time update of information.
But the problem with each of these solutions was the need to physically move large volumes of data from source systems to multiple consolidated data stores including the data warehouse, distributed data marts, operational data stores, and analytical multi-dimensional databases. While these consolidated data sources continue to be important to organizations, latencies and inconsistencies are pretty much a given with such an architecture.
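The batch extract, transform and load cycle described in this section can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the two source systems are plain Python lists, the warehouse is an in-memory SQLite database, and every table, field, and function name is hypothetical. The point it tries to show is that the consolidated copy is only as fresh as the last batch run.

```python
import sqlite3

# Hypothetical source extracts: two operational systems holding customer data
# in different shapes. All names and values are illustrative assumptions.
CRM_ROWS = [
    {"cust_id": "C1", "full_name": "Ada Lovelace", "country": "uk"},
    {"cust_id": "C2", "full_name": "Alan Turing", "country": "UK"},
]
BILLING_ROWS = [
    {"customer": "C1", "amount_cents": 12500},
    {"customer": "C1", "amount_cents": 7500},
    {"customer": "C2", "amount_cents": 3000},
]


def extract():
    """Extract: take a snapshot of every source system."""
    return list(CRM_ROWS), list(BILLING_ROWS)


def transform(crm_rows, billing_rows):
    """Transform: reconcile formats and build one consolidated row per customer."""
    totals = {}
    for row in billing_rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount_cents"]
    return [
        (r["cust_id"], r["full_name"], r["country"].upper(),
         totals.get(r["cust_id"], 0) / 100)
        for r in crm_rows
    ]


def load(conn, rows):
    """Load: replace the warehouse table with the freshly consolidated batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_summary "
        "(cust_id TEXT PRIMARY KEY, name TEXT, country TEXT, total_billed REAL)"
    )
    conn.execute("DELETE FROM customer_summary")
    conn.executemany("INSERT INTO customer_summary VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_batch(conn):
    """One full ETL cycle; in practice this would run on a schedule."""
    load(conn, transform(*extract()))


warehouse = sqlite3.connect(":memory:")
run_batch(warehouse)
```

Any change made in a source system after `run_batch` returns is invisible in `customer_summary` until the next cycle, which is the latency the section describes.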
EAI vs. Information Integration
The batch ETL solutions of the past were not capable of meeting the real-time integration needs of the new breed of online systems. Information that is days old is not acceptable for real-time solutions. While the ETL tools continue to serve a valuable function in organizations, they became the stepchild of integration.
The newer Enterprise Application Integration (EAI) solutions came along and solved the data latency problem by synchronizing changes across systems in real time. However, EAI less adequately addressed the need to aggregate and consolidate data and information across the enterprise. EAI can effectively move data among systems in real time, but does not define an aggregated view of the data objects or business entities.
For example, a customer service representative on the phone needs to be able to answer a customer’s question in real time, without having to figure out which system is involved. This requires the ability to make a query across distributed data sources as if they were a single database. EAI does not address this problem at all. Enterprise Information Integration (EII) does.
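The customer service scenario above can be sketched as a virtual, aggregated view that queries the underlying systems only when asked. This is a simplified illustration, not how any particular EII product works: the classes `CrmSystem`, `OrderSystem`, and `CustomerView`, along with their fields and data, are hypothetical stand-ins for separate databases or services.

```python
# Hypothetical source systems; in practice these would be separate databases
# or services. Names, fields, and values are assumptions for illustration.
class CrmSystem:
    def __init__(self):
        self._customers = {"C1": {"name": "Ada Lovelace", "phone": "555-0100"}}

    def get_customer(self, cust_id):
        return self._customers.get(cust_id)


class OrderSystem:
    def __init__(self):
        self._orders = [
            {"order_id": "O1", "cust_id": "C1", "status": "shipped"},
            {"order_id": "O2", "cust_id": "C1", "status": "pending"},
        ]

    def orders_for(self, cust_id):
        return [o for o in self._orders if o["cust_id"] == cust_id]

    def set_status(self, order_id, status):
        for o in self._orders:
            if o["order_id"] == order_id:
                o["status"] = status


class CustomerView:
    """Virtual aggregated view: data stays in the sources and is fetched per query."""

    def __init__(self, crm, orders):
        self._crm = crm
        self._orders = orders

    def lookup(self, cust_id):
        customer = self._crm.get_customer(cust_id)
        if customer is None:
            return None
        return {
            "cust_id": cust_id,
            "name": customer["name"],
            "phone": customer["phone"],
            "orders": [
                {"order_id": o["order_id"], "status": o["status"]}
                for o in self._orders.orders_for(cust_id)
            ],
        }


crm, orders = CrmSystem(), OrderSystem()
view = CustomerView(crm, orders)
```

The representative calls `view.lookup("C1")` without knowing which system holds which field. Because nothing is copied in advance, a status change made through `orders.set_status(...)` shows up in the very next lookup.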
Data Integration vs. Information Integration
Along with the movement to real time, the need to integrate different kinds of information has also become more important. For example, a Web site or portal could aggregate data from multiple databases and synchronize updates to all of them, as well as present unstructured content such as graphics, audio and video. While data integration generally focuses on structured data managed by databases, the term information integration includes both structured data and unstructured electronic media. Enterprise Content Management (ECM) provides these capabilities, and may also provide some application integration and workflow capabilities, duplicating integration services already provided by other technologies in the infrastructure. Some of the emerging EII tools will handle both structured and unstructured information. Further, most also provide metadata management solutions in more open repositories. In other words, EII is fast becoming the part of the infrastructure that manages information across the enterprise.
Information Integration Solutions
Information integration can be used for the following kinds of applications:
Creating a single view of a customer or other business entity
Enterprise data inventory and management
Real-time reporting and analysis, and creating management dashboards
Updating a data warehouse
Creating a virtual data warehouse
Updating common information across information sources
Creating portal applications containing both structured and unstructured data from disparate systems
Integrating unstructured data, including documents, audio, video and other electronic media, into applications
Providing an infrastructure for enterprise information management, including all forms of digital media
Information integration simplifies the creation of all these applications by making the information accessible and manageable as if it came from a single data source.
Challenges
The long-term goal of information integration research is to build systems that are able to provide seamless access to a multitude of independently developed, heterogeneous data sources. These systems should have the following capabilities:
integrate sources at scale (hundreds of thousands of sources)
support automated discovery of new data sources
be easy to configure, manage and maintain
protect data privacy
incorporate structured, semi-structured, text, multimedia, and data streams, and possibly inconsistent data
provide flexible querying and exploration of the sources and the data
adapt in the presence of unreliable sources
support secure data access
Markets Merging
Enterprise information integration is an emerging market sector. EII combines the data aggregation capabilities of the old ETL tools with the access to real-time information that EAI provides. The market includes the extract, transform, and load (ETL) tools popular for batch data synchronization, as well as the emerging EII and ECM solutions. Many of the ECM solutions have also added workflow and business process management, so those markets are overlapping as well. Moreover, some EAI vendors are adding EII capabilities because their customers are demanding them. Yet another category of solutions is integrated platforms that will do it all. The bottom line is that most large organizations will at some point need it all. The question is how to make it all work together, and how to leverage the analysis, design and implementation from one integration project to the next. A good place to start is metadata management.
Metadata Management
In addition to providing real-time access to aggregated information, EII provides an infrastructure for integrated enterprise data management. While the graphical EAI data mapping tools are easy to use and speed the integration process, the information they capture is valuable corporate information required for enterprise data quality management. The analysis required to capture the metadata that drives data transformations is the same analysis required for enterprise information management.
In an ideal world, the metadata repository would manage the data that is used by the transformation engine. In the real world, semantic metadata is in multiple places, and not centrally managed. Metadata management is an issue for long-term quality of distributed information. An Aberdeen research report states that database administration costs now dominate the TCO (total cost of ownership) of applications below the 500-user level, and they continue to increase in importance for all sizes of applications (“Enterprise Information Integration: The New Way to Leverage E-information”, July 2003, Aberdeen Group). This is a pressing need that EAI tools were not designed to address, but one that the enterprise needs to address to ensure long-term quality of corporate information. After all, it is of little value to provide real-time access to inaccurate information. Merely creating a canonical data format will not be sufficient to create long-term value.
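The idea of a metadata repository feeding the transformation engine can be sketched as follows. This is a toy illustration under loud assumptions: the repository is a plain in-memory dictionary, and the system names, field names, and converters are all hypothetical. What it tries to show is that the engine itself contains no system-specific knowledge; everything it knows comes from managed metadata.

```python
# A toy metadata repository: for each source system, how its fields map onto
# the canonical model. System names, field names, and converters are
# hypothetical and only illustrate the idea.
METADATA_REPOSITORY = {
    "crm": {
        "cust_no": {"canonical": "customer_id", "convert": str},
        "nm": {"canonical": "name", "convert": str.strip},
    },
    "billing": {
        "customer": {"canonical": "customer_id", "convert": str},
        "bal_cents": {"canonical": "balance", "convert": lambda cents: cents / 100},
    },
}


def to_canonical(system, record, repository=METADATA_REPOSITORY):
    """Generic transformation engine: system-specific knowledge comes from metadata."""
    mappings = repository[system]
    unknown = set(record) - set(mappings)
    if unknown:
        # Unmapped fields signal metadata that is missing from the repository.
        raise KeyError(f"no metadata for {system} fields: {sorted(unknown)}")
    return {
        mappings[field]["canonical"]: mappings[field]["convert"](value)
        for field, value in record.items()
    }
```

For instance, `to_canonical("billing", {"customer": "42", "bal_cents": 1250})` yields a record keyed by `customer_id` and `balance`. Adding a new source system then means adding repository entries, not engine code, which is why the quality of those entries matters so much.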
Best Practices for Information Integration
As with most technologies, success depends more on how you use it than which product you buy. Managing enterprise metadata as a valuable corporate asset will go a long way toward creating long-term value, agility, and reuse from integration efforts. The ultimate value of real-time information access will depend largely on the accuracy of the information itself. There are a number of ways an organization can improve accuracy, increase reuse, and maximize the investment made in discovering and defining aggregated data definitions:
Conduct design reviews: A metadata model represents an aggregated definition of data from different systems in a canonical format. The only way to ensure the common definition is correct is to have the model verified by all the stakeholders: those who have knowledge of each of the systems, and those who need to utilize and integrate the data.
Create an enterprise metadata repository: To ensure long-term value, an enterprise metadata repository, based on standards, provides a platform for storing, accessing and managing metadata, and for accessing information across the organization. It is the Rosetta stone for disparate enterprise data. The repository can grow over time, on a project-by-project basis. However, it needs to be actively managed to ensure integrity and data quality, and to maximize reuse.
Manage metadata at an enterprise level: It is not sufficient to simply create canonical data formats. The work of researching, defining and verifying the intent and meaning of data in systems, which forms the foundation for integration, needs to be managed and leveraged. It represents a considerable investment and a valuable and reusable resource for the organization. While different projects may work with different data, a centralized group, such as an integration competency center, can track and manage how the metadata is used across projects.
Move toward semantically rich metadata: The more meaning the metadata contains, the less work the programmers need to do. Semantically rich metadata enables electronic transactions to be implemented across systems without needing to add application or database code to ensure the integrity of the data. It is the key to enabling e-commerce faster and cheaper than ever before. The standards bodies, including the OMG and the W3C, are currently working on semantic metadata standards. The Semantic Web is an example of such an effort.
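To make the last practice concrete, here is a minimal sketch of metadata that carries enough meaning for one generic routine to enforce data integrity, so that no per-application checking code is needed. It deliberately uses a plain dictionary and does not follow any OMG or W3C standard; the `ORDER_METADATA` fields, rules, and the `validate` helper are hypothetical.

```python
# Hypothetical semantic metadata for a canonical "order" record. The field
# names, units, and rules are assumptions chosen for illustration.
ORDER_METADATA = {
    "quantity": {"type": int, "minimum": 1},
    "currency": {"type": str, "allowed": {"USD", "EUR", "GBP"}},
    "unit_price": {"type": float, "minimum": 0.0, "unit": "currency units"},
}


def validate(record, metadata):
    """Return a list of integrity problems found by applying the metadata rules."""
    problems = []
    for field, rules in metadata.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rules["type"]):
            problems.append(f"{field}: expected {rules['type'].__name__}")
            continue
        if "minimum" in rules and value < rules["minimum"]:
            problems.append(f"{field}: below minimum {rules['minimum']}")
        if "allowed" in rules and value not in rules["allowed"]:
            problems.append(f"{field}: not one of {sorted(rules['allowed'])}")
    return problems
```

A record such as `{"quantity": 2, "currency": "EUR", "unit_price": 9.5}` passes with an empty problem list. Tightening a rule means editing the metadata once, and every system that validates against it picks up the change.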
Metadata management is another old idea that is new again. In the very early days of distributed systems, organizations started their efforts by defining an enterprise data dictionary. Unfortunately, many of these efforts failed because they were too large in scope, and there was no underlying integration infrastructure to automatically convert data from one format to another. The data dictionaries gathered dust on the shelves until the efforts were dispersed.
Fast-forward to the present, when companies are creating metadata throughout the organization, but failing to manage it on an enterprise level. These efforts may succeed in the short run, but fail to meet future needs. Creating an enterprise information architecture and strategy, and managing aggregated enterprise data at an enterprise level, will ultimately provide those companies willing to make the investment with the largest long-term ROI and the greatest agility for meeting new business requirements. While there are many integration technologies and solutions in the market, the emerging EII tools provide an excellent foundation for consolidating and aggregating data, and for providing accurate and timely enterprise information to whoever needs it, when they need it, and where they need it.
Sources:
http://www.qilinsoft.com.cn/library/White_Papers/Next_Generation_Business_Process_Integration.pdf
http://tmitwww.tm.tue.nl/staff/wvdaalst/Publications/p222.pdf
Useful Links:
http://www.javaworld.com/javaworld/jw-08-2002/jw-0809-eai.html - This link provides some detailed information on data integration.
http://csdl.computer.org/comp/proceedings/hicss/2002/1435/09/14350290.pdf - This link defines a generic integration framework and describes the key steps in extending this framework to a logical and then a physical architecture.
