Data Warehouse


CIF

Source: Information Management Magazine, December 1999, Claudia Imhoff

The Corporate Information Factory (CIF) is a logical architecture whose purpose is to deliver business intelligence and business management capabilities driven by data provided from business operations. The CIF has proven to be a stable and enduring technical architecture for enterprises of any size that want to build strategic and tactical decision support systems (DSSs). The CIF consists of producers of data and consumers of information. Figure 1 shows all the components found within the Corporate Information Factory architecture.

Business Operations:

Business operations are the family of systems (e.g., operational, reporting, ERP) from which the rest of the CIF inherits its characteristics. These are the core operational systems that run the day-to-day business processes and that are usually accessed through Application Program Interfaces (APIs). The success or failure of the CIF depends heavily on these operational systems to supply the richness of data needed to understand customers and to provide the history needed to judge the health of the business.

Business Intelligence:

Business Intelligence (BI) consists of the ability to analyze data and information used in strategic decision support. These systems are major consumers of data and are composed of various BI applications as well as the repository of historical data from which these applications are created. The main components of BI are the data warehouse, data marts, data delivery and decision support interfaces (DSI), and the processes for “getting data in” and “getting information out.” The data marts, exploration warehouse, and data mining warehouse are subsets or derived collections of the data found in the data warehouse, formatted for their particular function or department.

Business Management:

Business Management enables a corporation to act in a tactical fashion upon the intelligence obtained from the strategic decision support systems. The Operational Data Store (ODS) is considered a major consumer of operational data. The sources are the same ones used for BI, except that in this case the data from the operational systems updates the ODS. The old data is overwritten by the new data and little or no history is retained; the history is stored in the data warehouse. Thus, the ODS is an integrated, cleaned, dynamic (or updatable), and current set of data for these tactical decision-making activities. The ODS is accessible from anywhere in the organization and should not be built to support any single operational application.
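The contrast between the ODS and the data warehouse can be sketched in a few lines of Python. This is only an illustration under assumed names: `ods`, `warehouse`, `apply_update`, and the customer fields are hypothetical and not part of the CIF itself. The ODS keeps one current record per key and overwrites it, while the warehouse appends every version with a load timestamp.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for the two stores.
ods = {}        # key -> current record only (updatable, no history)
warehouse = []  # append-only list of timestamped versions (history)


def apply_update(customer_id, record, loaded_at=None):
    """Apply one update coming from an operational system to both stores."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    # ODS: the new data overwrites the old data for this key.
    ods[customer_id] = dict(record)
    # Warehouse: nothing is overwritten; every version is kept.
    warehouse.append({"customer_id": customer_id, "loaded_at": loaded_at, **record})


apply_update(42, {"city": "Denver"})
apply_update(42, {"city": "Boulder"})

print(ods[42]["city"])  # only the current value survives in the ODS
print(len(warehouse))   # both versions are retained in the warehouse
```

A real ODS would also integrate and clean the incoming data before the overwrite; that step is left out here to keep the update-versus-append difference visible.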

Bill Inmon and Ralph Kimball are widely regarded as the pioneers of data warehousing. The two leading approaches to data warehouse architecture are Inmon’s Corporate Information Factory (CIF) and Kimball’s Data Warehouse Bus (BUS). This paper briefly discusses the differences and similarities of these approaches.

W. H. Inmon’s Approach

According to Bill Inmon, who is considered the father of data warehousing, “A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management’s decisions” (Inmon, 2001). In contrast to an operational system, where data is organized by operational application, a data warehouse organizes data by business subject. The data in a data warehouse usually comes from diverse data sources. During the ETL process, inconsistencies in the source data are removed and data elements are standardized before the data is stored in the data warehouse. The data in a data warehouse is time variant in nature because it contains historical data. Inmon proposes a top-down approach to create a centralized Enterprise Data Warehouse using traditional database modeling techniques (ER model), where the data is stored in 3NF. For the development of this large data warehouse, Inmon suggests an iterative, spiral development method, in which small parts of the relational database are added to the data warehouse on each iteration. This approach preserves the granularity of the data and provides maximum flexibility to create new optimized dimensional data marts according to the current requirements of an enterprise. The data warehouse acts as the data source for the new data marts (Jukic, 2006).
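The integration step described above, removing inconsistencies and standardizing data elements during ETL, can be illustrated with a minimal sketch. Everything here is assumed for the example: the two source systems (`billing_rows`, `crm_rows`), their field names, and the target layout are hypothetical. Each source encodes gender and balance differently, and the transform functions map both onto one standard customer record.

```python
# Hypothetical extracts from two operational sources with inconsistent encodings.
billing_rows = [{"cust": "C1", "gender": "M", "balance_cents": 12500}]
crm_rows = [{"customer_id": "C2", "sex": "female", "balance": "80.25"}]

# One standard encoding for the integrated "customer" subject area.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_billing(row):
    """Transform a billing row: rename keys, convert cents to currency units."""
    return {
        "customer_id": row["cust"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "balance": row["balance_cents"] / 100,
    }


def from_crm(row):
    """Transform a CRM row: normalize the gender label, parse the balance."""
    return {
        "customer_id": row["customer_id"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "balance": float(row["balance"]),
    }


# Load step: both sources end up in one consistent, subject-oriented structure.
customers = [from_billing(r) for r in billing_rows] + [from_crm(r) for r in crm_rows]

for customer in customers:
    print(customer)
```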

R. Kimball’s Approach

Kimball’s data warehousing architecture, known as the Data Warehouse Bus (BUS), uses a bottom-up technique to create dimensional data marts for specific business processes. Dimensional data marts are created using dimensional data modeling, a modeling technique that violates normalization rules and is unique to data warehousing. The data marts are populated from a staging area, where data is held at the lowest grain needed to populate the tables (Ponniah, 2001, p.137). In the BUS architecture, the data warehouse bus integrates the data marts to form the data warehouse. Dimensional modeling focuses on ease of end-user access and provides a high level of query performance. A popular design used by Kimball for dimensional modeling is the star schema, comprising fact tables and dimension tables (Kimball, 2008). The fact table is typically narrow but holds a large number of rows containing the factual or additive values, and the dimension tables hold the descriptive data for the dimensions (Star Schemas, “IBM”, 2008). Kimball suggests the concept of “conformed dimensions,” dimensions that are shared between fact tables, to deal with data replication. Kimball recommends a four-step dimensional design process for the development of a data warehouse, with emphasis on keeping the data at the lowest level of granularity possible.
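A star schema is easier to picture with a small example. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names (`dim_date`, `dim_product`, `fact_sales`), columns, and sample rows are all hypothetical and chosen only for illustration. The fact table holds additive measures plus foreign keys, and the query joins it to the dimensions to aggregate by descriptive attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two dimension tables (descriptive data) and one fact table (additive measures).
conn.executescript("""
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     TEXT,
    year      INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT,
    category    TEXT
);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
""")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
    (20240101, "2024-01-01", "January", 2024),
    (20240201, "2024-02-01", "February", 2024),
])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Widget", "Hardware"),
    (2, "Gadget", "Hardware"),
    (3, "Manual", "Books"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20240101, 1, 2, 20.0),
    (20240101, 3, 1, 15.0),
    (20240201, 2, 1, 30.0),
    (20240201, 1, 3, 30.0),
])

# A typical star-join: aggregate the facts, grouped by dimension attributes.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()

for month, category, total in rows:
    print(month, category, total)
```

In this picture, a conformed dimension would be a table such as `dim_date` reused unchanged by a second fact table (for example, a hypothetical `fact_inventory`), so that both data marts can be queried along the same dates.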

Key Differences in Approach

The methods proposed by Inmon and Kimball differ in design and architecture. In Kimball’s vision, a data warehouse is the union of data marts with conformed dimensions, whereas in Inmon’s view a data warehouse is a normalized, enterprise-level data store. Inmon uses a top-down approach to create a normalized enterprise-level data warehouse, while Kimball uses a bottom-up approach to create departmental data marts for selected business processes. Inmon focuses on the ER modeling technique, and the data loaded into the data warehouse is in 3NF. Kimball focuses on multidimensional database design and uses the star schema to create a denormalized dimensional model.

Key Similarities/Agreements in Approach

Both Inmon’s and Kimball’s approaches give importance to the time attribute of the data. In Inmon’s approach the time attributes may be spread across different normalized tables, whereas in Kimball’s approach of dimensional modeling the time attributes are grouped together as a time dimension (Beslin, “tdwi”, 2008). Both approaches use an ETL process to build the data warehouse. The data extracted from different data sources has to be integrated, optimized, and transformed before it is loaded into the database. Both Kimball and Inmon share the view that stand-alone data marts are of marginal use for an enterprise-wide data warehouse.
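To make the idea of grouping time attributes into a single time dimension concrete, here is a minimal sketch that generates one row per calendar day. The function name `build_time_dimension` and the chosen columns are assumptions for illustration; real time dimensions usually carry many more attributes, such as fiscal periods and holiday flags.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per day from start to end (inclusive)."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # e.g. 20240101
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "iso_weekday": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows


dim_time = build_time_dimension(date(2024, 1, 1), date(2024, 1, 7))

for row in dim_time:
    print(row)
```

Fact rows then only need to store `date_key`; every other time attribute is looked up in this one table instead of being spread across several normalized tables.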

Which Approach is a Better Design

The two leading methodologies for designing the data warehouse are Inmon’s Corporate Information Factory (CIF) and Kimball’s Data Warehouse Bus (BUS). Choosing a data warehousing approach depends on many factors, such as user requirements, data sources, the level of granularity required, the resources available to build the data warehouse, and the methods used to analyze the data. For a data warehouse built from ERP systems, like Oracle eBusiness Suite, Kimball’s approach of building data marts is more suitable (Kiriti, 2007). Inmon’s approach is suited to cases that need an enterprise-level data warehouse where transactions are modeled in 3NF, like the pre-built data warehouse solutions from Oracle for industries such as telecom (Kiriti, 2007). I think the best approach is a combination of the two methodologies and architectures, a hybrid approach, with extended customization to meet the scope of the project.