Performance in Reports (Part II)

Hello there!

Today I’m going to talk about the second part of the series “Performance in Reports”. If you would like to read the first part just click on this link “Report Performance (Part I)“.

I believe that nowadays I always see the query that Microstrategy creates when creating a new report. It recommend you to do so.

Sometimes you see that the query delays so much, and you don’t know what is going on.

Let’s believe that you have created all indexes needed to perform that query and still you want to improve the performance. Have you already heard about aggregated fact table?

Let’s analyse this simple database below:

pic1

As you can see, we have 2 hierarchies and 1 fact table here. Inside hierarchy Employee we have two dimension tables: employee and language. Two tables in geography hierarchy: country and continent. The fact Salary was created using the lowest level of the hierarchies: employee_id and country_id. That facts are commonly called Base Facts, just because it uses the base (lowest level) of your hierarchies.

So, if you want to display language_DESC and continent_DESC in a report? Microstrategy needs a few more steps to accomplish that, as you can see below:

select language_DESC, continent_DESC from Salary_fact a1
inner join employee a2 on (a1.employee_ID = a2.employee_ID)
inner join language a3 on (a1.language_ID = a3.language_ID)
inner join country a4 on (country_ID = a1.country_ID)
inner join continent a5 on (a5.continent_ID = a1.continent_ID)

It needs to go to Employee and Country tables because Salary doesn’t have a bridge to language and continent, only to employee and country.

Now, look at this datagram below:

pic2

I have added another fact called Salary_Fact_Agg that contains language_ID and continent_ID. Now, Microstrategy reduce the steps to get those columns.

select language_DESC, continent_DESC from Salary_fact_Agg a1
inner join language a2 on (a1.language_ID = a2.language_ID)
inner join continent a3 on (a1.continent_ID = a3.continent_ID)

That will make your query run faster than the first one.If you reduce the amount of joins that is required to get the data that you desired, you will increase performance of your report. So, if you change the granularity of your facts aggregating them in a higher level, you create an aggregated fact table.

 PROS

  1. Performance – The amount of joins is reduced and your query is executed faster.

CONS

  1. Database size – Creating more facts will increase the size of your database;
  2. ETL complexity – ETL needs to maintain another fact;

An advice:

Only create aggregated fact table when you notice that you have a lot of queries that are using a higher level dimension than the base facts that you have and the performance isn’t that good. Don’t agg facts at the beginning of your data warehouse creation. You have to test the base facts first and see if it’s really required an aggregate fact table to resolve performance problems.

 

Hope it helps.

God bless you.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s