CIF

Source :  Information Management Magazine, December 1999 Claudia Imhoff

The Corporate Information Factory (CIF) is a logical architecture whose purpose is to deliver business intelligence and business management capabilities driven by data provided from business operations. The CIF has proven to be a stable and enduring technical architecture for any size enterprise desiring to build strategic and tactical decision support systems (DSSs). The CIF consists of producers of data and consumers of information. Figure 1 shows all the components found within the Corporate Information Factory architecture.

Business Operations:

Are the family of systems ( e.g Operational, reporting, ERP) from which the rest of the CIF inherits its characteristics. These are the core operational systems that run the day-to-day business processes and that are accessed usually through Application Program Interfaces (APIs). The success or failure of CIF depends heavily on these operational systems to supply the richness in data needed to understand customers and to provide the history needed to judge the health of business.

Business Intelligence:

BI consists of the ability to analyze data and information used in strategic decision support. These systems are major consumers of data and are composed of various BI applications as well as the repository of historical data from which these applications are created. The main components of BI are the data warehouse, data marts, data delivery and decision support interfaces ( DSI), and the processes for “getting data in” and “getting information out.” The data marts, Exploration warehouse and Data mining warehouse are the subsets  or derived collections of the data found in the  data warehouse, formatted for their particular function or department.

Business Management:

Business Management enables  corporation to act in a tactical fashion upon the intelligence obtained from the strategic decision support systems. Operational Data Store (ODS) is considered a major consumer of operational data. The sources are same one we use for BI, except that in this case the data form the operational systems updates the ODS. The old data is overwritten by the new data and little or or no history is retained, the history is stored in data warehouse. Thus, the ODS is an integrated, cleaned, dynamic(or updatable), and the current set of data for these tactical decision making activities.The ODS is accessible from anywhere in the organization and should not support any single operational application.

Inmon and Kimball are the cofounders of Data warehousing. The two leading approaches of data warehouse architecture are Inmon’s Corporate Information Factory (CIF) and Kimball’s Data Warehouse Bus (BUS). This paper briefly discusses the differences and similarities of these approaches.

W.H Inmon’s Approach

According to Bill Inmon who is considered as the father of Data Warehouse, “A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management’s decisions (Inmon, 2001).” Contrary to an operational system where data is stored by operational applications, in data warehouse, data is stored by business subjects. The data in a data warehouse usually comes from diverse data sources. Using ETL process the inconsistencies in the source data is removed and data elements are standardized before storing the data into a data warehouse. The data in a data warehouse is time variant in nature as it contains historical data. Inmon proposes a top-down model approach to create a centralized Enterprise Data Warehouse using traditional database modeling techniques (ER Model), where the data is stored in 3NF.  For the development of this large data warehouse, Inmon suggests a spiral development iterative method, where small parts of relational database are added to the data warehouse on iteration. This approach permits granularity of data and provides maximum flexibility to create new optimized dimensional data marts according to the current requirements of an enterprise. The data warehouse acts as data source for the new data marts (Jukic, 2006).

R.Kimball’s Approach

Kimball’s Data Warehousing architecture, known as, Data Warehouse Bus (BUS), uses bottom-up technique to create dimensional data marts for specific business process. Dimensional data marts are created using Dimensional Data Modeling, a modeling technique which violates normalization rules and is unique to data warehousing.  The data to the data mart is populated from a staging area, where data is at the lowest grain to populate tables (Ponniah, 2001, p.137). The integration of data marts to create the Data warehouse is achieved by the data warehouse bus in the BUS architecture. Dimensional modeling focuses on ease of end user accessibility and provides a high level of performance to the data warehouse. A popular design used by Kimball for dimensional modeling is the Star Schema, comprising fact tables and dimension tables (Kimball, 2008).  The fact table contains very less number of rows and represents the factual or additive values and the dimension table holds the descriptive data for the dimensions (Star Schemas,”IBM”, 2008). Kimball’s suggests the concept of ‘Conformed Dimensions,’ dimensions which are shared between fact tables, to deal data replication.  Kimball recommends a Four-step Dimensional design process for the development of data warehouse with emphasis to keep the granularity of data to the lowest level possible.

Key Differences in Approach

Methods proposed by Inmon and Kimball have its own differences in design and architecture. In Kimball’s vision a Data Warehouse is union of data marts with conformed dimensions whereas in Inmon’s view a Data warehouse is normalized enterprise level data storage. Inmon’s used top-down approach to create a normalized enterprise level data warehouse while Kimball used bottom-up approach to create departmental data marts on selected business process. Inmon focuses on ER modeling technique and the data loaded to data warehouse is in 3NF. Kimball focuses on multidimensional database design and uses star schema to create denormalized dimensional model.

Key Similarities/Agreements in Approach

Both Inmon’s and Kimball’s approaches give importance to time attribute of the data. In Inmon’s approach the time attribute related to a database may be spread in different normalized tables where as in Kimmon’s approach of dimensional modeling time attributes are grouped together as time dimension (Beslin,”tdwi”, 2008). Both approaches use ETL process to develop the data warehouse. The extracted data from different data sources has to be integrated, optimized and transformed before loading to the database. Both Kimball and Inmon share the view that stand-alone data marts are of marginal use for enterprise wide data warehouse.

Which Approach is a Better Design

The two leading methodologies for designing the data warehouse are Inmon’s Corporate Information Factory (CIF) and Kimball’s Data Warehouse Bus (BUS). Choosing a data warehousing approach depends on many factors like user requirements, data sources, level of granularity required, resources u have to build the data warehouse and the methods used to analyze the data. For data warehouse built from ERP systems, like Oracle eBusiness Suite, Kimball’s approach to build data marts is a more suitable approach (Kiriti, 2007). Inmon’s approach is suited if u need an enterprise level data warehouse where transactions are modeled to 3NF,like the pre-built data warehouse solutions for industries like telecom from Oracle (Kiriti, 2007). I think best approach will be a combination of methodologies and architecture, a hybrid approach, with extended customization to meet the scope of the project.

A database management system , DBMS, is a computer software designed to assist in maintaing and utilizing large collections of data. DBMSes may use any of a variety of database models, such as the network model or relational model.

Advantages of DBMS

* Data independence – DBMS provides an abstract view of data that hides the details of data representation and storage.

* Efficient data access – DBMS utilizes a variety of techniques to store and retrieve data efficiently.

* Data integrity & security – DBMS can enforce integrity constraints and access controls.

* Data administration – When several users share the data, centralizing the administration of data can offer significant improvements.

* Concurrent access, crash recovery- DBMS provides concurrent access and also  protects the users from the effects of  system failures

* Reduced application development time – DBMS applications are likely to be more robust as many important tasks are handled by DBMS itself.

*  So why not use them always?

  • Can be expensive, complicated to set up and maintain
  • DBMS is optimized software for certain types of workloads, its performance may not be adequate for all applications
  • An application may need to manipulate data in ways not supported by query language
  • This cost & complexity must be offset by need

1:M (One to Many)Relationship

Should be the norm in any relational database design. A one-to-many relationship is the most common type of relationship. In this type of relationship, a row in table A can have many matching rows in table B, but a row in table B can have only one matching row in table A.For example, the Painters and Painting tables have a one-to-many relationship: each painter produces many paintings, but each paintings comes from only one painter.

1:1 (One to One)Relationship

One entity can be related to only one other entity, and vice versa. Sometimes means that entity components were not defined properly, Could indicate that two entities actually belong in the same table. In a one-to-one relationship, a row in table A can have no more than one matching row in table B, and vice versa. A one-to-one relationship is created if both of the related columns are primary keys or have unique constraints.

This type of relationship is not common because most information related in this way would be all in one table. You might use a one-to-one relationship to:

  • Divide a table with many columns.
  • Isolate part of a table for security reasons.
  • Store data that is short-lived and could be easily deleted by simply deleting the table.
  • Store information that applies only to a subset of the main table.

M:N (Many to Many)Relationship

Can’t be accurately implemented in relational database
Result every row in table 1 links to every row in table 2 – typically an absurdity

Solution
Add a transaction (activity) table
Produce multiple 1:M relationships

For example, the Authors table and the Titles table have a many-to-many relationship that is defined by a one-to-many relationship from each of these tables to the TitleAuthors table. The primary key of the TitleAuthors table is the combination of the au_id column (the authors table’s primary key) and the title_id column (the Titles table’s primary key).

There are several common types of databases; each type of database has its own data model (how the data is structured). They include;  Flat Model, Hierarchical Model, Network Model, Relational Model ,Dimensional Model and Object Oriented Model.

  1. Flat Model
    • The flat model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, columns for name and password that might be used as a part of a system security database.
  2. Hierarchical Model
    • Database resembles a tree like structure.
    • Implies a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.
    • To create links between the records, the hierarchical model uses Parent Child Relationships.
    • This structure allows one 1:N relationship between two types of data.
    • One limitation of the hierarchical model is its inability to efficiently represent redundancy in data.
    • E.g. : Used in IBM’s IMS  DBMS
  3. Network Model
    • The network model permitted the modeling of many-to-many relationships in data
    • The network model organizes data using two fundamental constructs, called records and sets. Records contain fields . Sets  define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.
    • The network model is able to represent redundancy in data more efficiently than in the hierarchical model.
    • E.g. : Used in IDS and IDMS
  4. Relational Model
    • The basic data structure of the relational model is the table, where information about a particular entity (say, an employee) is represented in columns and rows (also called tuples).
    • The columns enumerate the various attributes of the entity (the employee’s name, address or phone number, for example), and a row is an actual instance of the entity (a specific employee) that is represented by the relation.
    • All relations (and, thus, tables) in a relational database have to adhere to some basic rules to qualify as relations. First, the ordering of columns is immaterial in a table. Second, there can’t be identical tuples or rows in a table. And third, each tuple will contain a single value for each of its attributes.
    • E.g. : IBM’s DB2, Informix, Oracle, Sybase, Microsoft’s Access, FoxBase, Paradox, Tandem ..
  5. Object Oriented Model
    • Recent development, Stores data and methods to act on the data
    • E.g. : Objectstore and Versant
  6. Dimensional Model
    • The dimensional model is a specialized adaptation of the relational model used to represent data in data warehouses in a way that data can be easily summarized using OLAP queries.
    • In the dimensional model, a database consists of a single large table of facts that are described using dimensions and measures. A dimension provides the context of a fact and is used in queries to group related facts together.
    • The dimensional model is often implemented on top of the relational model using a star schema, consisting of one table containing the facts and surrounding tables containing the dimensions.

Some definitions of database –

“An organized collection of logically related data” McFadden – 1999

“A database is a self describing collection of integrated records” Kroenke 2002

Database is a shared organized collection of logically related persistent data.

A database defines a structure for storing information. Databases are typically organized into tables, which are collections of related items. You can think of a table as a grid of columns and rows.

Welcome to my blog “mydbaworld”. In this blog am planning to post the basic concepts and terminologies related to database and data warehouse. I also like to go through some programming languages and interesting topics realted to databases. :)… Stay Tuned…