Tech

All you wanted to know about OLAP

CIOL Bureau

21 Dec 2006 00:00 IST

Updated On 21 Dec 2006 07:04 IST

New Update

By: Anadi Misra and Anubhav Verma

Advertisment

Relational databases have been functional as life-lines for information workers for decades now. They deliver results no doubt, but it is also a fact that it is not easy (if not tough) to query and analyze information stored in them, on the fly and with respect to multiple metrics. You would run queries for analysis that would measure one or more KPI (Key Performance Indicators) for your business and this would be against a particular metric. For example it would either be sales data for a period of time or in a region or for a product. How about analyzing the information of say sales figure for a product with respect to a region over a period of time, or say sales figures for the same product in a particular outlet of a particular region in a time period. Well as for just a relational design, you might have to make sure that your sales table has all these columns. And if they are not there, adding columns and values to them would be the only solution left.

It is places like these where using OLAP (On Line Analytical Processing) can give you all the information you want while keeping multiple variables in your queries. In the previous part of the series, we talked about how data warehouses can help information access and knowledge discovery, and subsequently your business analytics.

Advertisment

Applies To: CIOs, CTOs

USP: OLAP systems basics, benefits and how to go about implementation

Primary Link: www.olap.com

Google Keywords: hypercubing, FASMI, OLAP

In this part of the series, we will take you through the basic concepts related to OLAP, how to go about it, what do you need for such an implementation and a simple demo of how it can be done using the Business Intelligence Development Studio provided with MS SQL Server 2005.

What is OLAP?

OLAP is a technique that enables you to analyze data with respect to multiple metrics that are termed as 'Dimensions'. As it is a part of BI implementation you need not worry about where the data is coming from (source) or how is it presented (the format). In an OLAP setup, the entire data is categorized into two types: Dimensions and Measures. Measures represent the data that are measurable or countable in numeric terms. For example, the number of units sold, the inventory stock, number of employees, and gallons of oil are all examples of what can be included as a measure. Dimensions on the other hand are non-numeric quantities but carry relevance in that they form one of the aspects with respect to which other factors are to be measured. Examples are sales area or region, type of product etc.

Advertisment

Multiple dimensions are organized in an OLAP system to give what is called a 'Multidimensional Cube' or simply a 'Cube'. A cube is a data-structure that stores measures along predefined dimensions. The dimensions of a cube are further organized in hierarchies. For example, daily figures aggregate to monthly sales, monthly sales aggregate to give the annual sales revenue. In cubes, each of these measures (also called variables) have multiple dimensions (also called axes).

Decision makers can query OLAP databases using Cubes having each metric along its dimensions

For example, the sales revenue measure can have dimensions of salesperson, sales area, and outlet. Once defined, the cube pre-calculates and aggregates values for these measures at each hierarchy of these dimensions. And this pre-calculation is what makes OLAP quick. OLAP's speed is one of the major characteristics associated with such a system.

Advertisment

Taking a step further

Now that you know about OLAP, cubes, dimensions and measures let us look at some more jargon that form an integral part of OLAP implementations. We exemplified cubes with three dimensions only in the preceding section. Although a 'cube' has just three dimensions by geometrical definition, an OLAP Cube can have more than one dimension. How many it has, depends on the complexity of your table structure as well as your information workers' need. Such a cube is termed as a 'Hypercube'. OLAP systems have an OLAP server and OLAP client software.

An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multidimensional data structures. The server houses all the cubes. OLAP clients are applications that request multidimensional data from the OLAP server for manipulation, calculations or analysis.

Measures and Dimensions are not the only part of a cube or hypercube. The measures stored in a cube can be used as inputs in expressions for calculating values for further variables on the fly. These expressions are termed as 'Formulas' and the resultant variables are termed as 'Calculated Members'.

The most distinguishing characteristic for identifying a calculated member is that it is always a resultant of existing measures in a Cube. What all these facilitate in the end is termed as an Ad-hoc Analysis of data. Ad-hoc analysis allows users generate new queries without defining anything in advance. This is the most important resultant of an OLAP system, as ad-hoc queries do not exist in the purely relational data model.

Advertisment

Star or Snowflake Models

You do not specifically need a data warehouse for OLAP. Your existing relational models can also act as a source for cubes and hyper cubes. A Snow Flake schema is a special, de-normalized data model used by relational databases to provide a multidimensional data structure for OLAP applications.

Several dimension tables surround the fact table. A fact table is a table that record the metrics of a KPI and provides independent attributes using which different dimensions can be analyzed. This model is central to the implementation.

The Physical Model of our database will contain several tables, with fields connected with relationships

Advertisment

Our sample database

For our sample, we have a holistic multi-location multi-store supply chain. This lets us address several entities like clients, products, orders, outlets and time. A real-life deployment would be more complex, involving more parameters. In our database, we have tables for these entities: customer, order, outlet, product and sales. The order table has basic fields like order number, date the order was made, customer number and product code, and quantity. Then we have a product table with the product number, name and unit price. Likewise there are tables for customers and outlets.

First steps

Let's implement a cube in this database step-by-step. The first step is to get a database with tables and columns. These tables are a low-level form of relational databases. Our retail company wants to view monthly and weekly reports. Someone feeds in data into a spreadsheet as rows and columns.

These relational databases only provide limited capability. If you want the performance report for a particular time frame or for some specific customer, product and location then you have to pull in all the data and work on it all over again.

Advertisment

Creating these reports according to different requirements creates a lot of work and number of sheets increase with every additional relationship added. The structured tables from a database can be changed into a star schema which lets you interconnect the different fields in it. These tables are linked together using their inter-dependencies. For example, our order table is dependent upon customer and product tables for customer id and product id respectively. Now the sales table we created depends on the product, customer, outlet and order tables. In our case, the dimensions invariably are Customer, Product, Order and Outlet while the sole fact table is the Sales table.

Cubing

On top of the relational database we will build a cube for multi dimensional analyses. As a general procedure you can take time and product on two axes and as the third axis either you can put employee or outlet. This in turn provides the data based on performance of an outlet for products in any given time frame. If you choose to have the product performance over time for employee then instead of changing the whole table, only one coordinate has to be changed. These dimensions can have many different parameters and thousands of entries. In a cube, there are three faces and each of these faces depicts a relation between two of its dimensions.

Putting up a fourth dimension would let you make a hypercube for advanced relationships and queries. Instead of creating a domain restricted to months or years, it is more useful to create a larger 'time' domain. This domain will invariably include all time domains for which you need to track the reports. Further, the hierarchy system is to be put in place for every product in the section it comes under. The locations can be categorized for region, zone, country, and so on as this helps in finding the relevant data when queried. The only problem faced while defining these hierarchies are dependencies between them. These dependencies can also cascade. As the data grows, it becomes easier for reports to have hierarchical patterns. This would let you look at the whole picture instead of in small subsets.

The aggregate data from these hierarchical systems can be easily obtained as the data is stored based on data hierarchy. The attributes in these dimensions are to be defined like the product size, weight, etc. These attributes help in narrowing the approach and making cube more versatile, by adding as much content as it can. There are some calculations that would still exist and will require inputs from existing measures. For example, over here you might need calculating product revenues depending upon taxes and VAT. In such cases you need to correlate two dimensions for calculative analysis depending on a derived measure. With all the structures defined you can go ahead with deployment of the cube using the specific tool that your database or warehouse vendor provides.

You can also use temporal calculations for some special cases. Using temporal calculations, cases can be formulated for checking out on factors such as what would be the effect on sales given one of the measures carried a different value than what you have currently. You will also be able to compare two particular entities using some calculations. And finally using conditional soothing i.e. predicting trends based on assumed values of measures, and market situations, you will be able to get the market trend for coming months.

Building a cube for OLAP is not an isolated exercise as we have outlined just the concepts behind cubing. In the real world, you would be first running processes to cleanup and get all the data into one universal format, followed by defining a model that better suits the OLAP paradigm in case your existing model does not suffice. Whatever be the processes involved, if carried out meticulously, it is one exercise that would not only change but also ease the way information workers look and execute queries for information retrieval or even analysis.

Source: PCQuest

tech-news