So that’s it you have downloaded PowerBI Desktop and are now ready to start your first PowerBI report. Congratulations!
Like any architect you need to get the foundations of your PowerBI report right.
There is a lot of training material on the official PowerBI Training page to learn about PowerBI to get you started, however we’d like to give you some Key Best Practices to build PowerBI reports.
Rule #1: Import data with maximum level of detail
One day the CEO of my company asked me how a new product we just launched was doing. I extracted the Sales Worldwide vs. the Global Forecast we had, made a simple comparison and handed it over to him. He got his answer: the launch was doing below expectations, and then fired me with some more questions: “can I get the performance by market?” I went back to my desk and re-extracted the data with markets as a level of detail, re-did the analysis and handed it over to him. I thought that was the end, until one minute later he asked: “how about the performance by color? and I would also like to know which color is doing well in which market!”
I should have known better, and be prepared for these natural questions. Now imagine that instead of going back to my desk at each question and coming back to him, I had extracted all the data directly with maximum level of detail (by market and by color). I could have answered his questions straight away, and not only it would have saved both of us time, it would have had blown his mind!
Be smarter than me and make sure you always extract your data with maximum detail before you import it into PowerBI.
After all PowerBI is a true BI tool, you are not limited by the amount of data you can put in it so why limit yourself by importing data already aggregated? Import it with maximum detail (unaggregated) and let PowerBI aggregate it for you to let you explore all the levels of details you need.
So take the time to prepare your data to make sure you will import it with the maximum level of detail available to you, this will prove a valuable investment!
Rule #2: Embrace the Flat File Format!
PowerBI works best with “flat files” tables as an input. This can be destabilizing for most Excel Users who are comfortable with the “matrix” format in Excel. First let’s clarify what is a Matrix and a Flat File on a simple example of Sales by Product and by Date:
|Date 1||Date 2|
Flat File Format
|Product 1||Date 1||1000|
|Product 1||Date 2||1100|
|Product 2||Date 1||500|
|Product 2||Date 2||450|
Before uploading your Data in PowerBI, make sure that all your data is in Flat File Format. If you have it in Matrix format, you might need to convert it to Flat File before uploading it into PowerBI.
Why? Because you will see it’s much better to have “1 column per type information”. In the Matrix format above you have the same information (Sales) in several columns. The only difference between the columns is the date. With PowerBI it’s much more efficient to have everything in 1 column and if you need to access different dates you’ll use the famous CALCULATE + SUM formula (equivalent of SUMIF for our Excel fans)
Sales_ProductX_MonthX = CALCULATE ( SUM (Sales[Sales], Product = “Product X”, Date = Date X )
Whereas going from Flat File to Matrix format is easy with Excel (just do a Pivot Table from the Flat File), going from Matrix to Flat File is surprisingly not straightforward, and we will cover this topic in a dedicated post.
Rule #3: Separate Measures and Attributes
So now you have imported some flat files. You therefore have a bunch of tables, with a bunch of columns (fields) in them. It’s important to make the distinction between the three types of columns (fields) you can have: Keys, Measures and Attributes
- Keys (or ID): this is what identifies a record and make it unique. For example if you have a table with Sales by Product and Date, Product and Date are the Keys because the combination of a Product and a Date is what defines a unique record. Sales will be a measure.
- Measures (or KPI): a measure is essentially a field that can be “measured”, so usually with a numerical data type. For example Sales, Orders, Forecasts, Budget… would all be “measurable” fields. PowerBI usually automatically detect them and display they with a little Sigma sign next to their description
- Attributes: All other fields non measurable are called Attributes. For example it could be the Product Line, Size, Color, the Country or Geographical Area of a Store.
OK easy, so why am I bothering you with this? You will see that it’s best to keep your Measures and Attributes separated as much as possible. That means you should have only Key + Measures tables (KM), and Keys + Attributes (KA) tables:
- Key+Measures tables (KM tables): Do not import Attributes with Measures. We recommend that you avoid mixing them with attributes in the same table. Just prepare a table with the Keys identifying your record (Product, Date, Store code…for example) and the measures associated to this Key (Sales for example). Avoid polluting this table with attributes such as Product Line, Store Country etc… and import them separately
- Key+Attributes tables (KA tables or “Master Data”): We recommend that you store your Attributes separately in what we will call Master Data tables, and that you connect in the Data Model the Keys of the KM and KA tables together
Let’s take a simple example where we want to analyze our Sales and Stock by Product Line and Country:
Poor Data Model: Attributes mixed with Measures
So here you see that you have the Prod. Line and Country information in both Sales and Stock tables.
Nothing wrong at this point, but if you are going to build your model in that way, it is very important to be clear on what PowerBI can and can not do.
What PowerBI can do:
- breakdown your Sales by Prod. Line and Country using the Prod. Line and Country attributes from the Sales table
- breakdown your Stock by Prod. Line and Country using the Prod. Line and Country attributes from the Stock table
What PowerBI can NOT do:
- breakdown your Sales by Prod. Line and Country using the Prod. Line and Country attributes from the Stock table. If you try to do so the same value will be returned for all Prod. Lines or Countries because PowerBI will have no clue how to break it down
- breakdown your Stock by Prod. Line and County using the Prod. Line and Country attributes from the Sales table. Same here, the value returned is the sum
What does THAT mean? That means that as long as you want to work with either Sales or Stock information INDEPENDENTLY you are totally fine:
- if you want to look at Sales and break them down by Prod. Line and Country, use the Prod. Line and Country attributes from the Sales table
- vice versa if you want to look at Stock and break it down by Prod. Line and Country use the Prod. Line and Country attributes from the Stock table
But that’s quite limited don’t you think? What if you want to COMBINE Sales and Stock info by Prod. Line and Country, for example to calculate Stock Coverage by Prod. Line and Country with the below measure:
StockCoverage = SUM (Stock[Stock]) / SUM(Sales[Sales])
This measure will return a correct result if we do not try to break it down by Prod. Line or Country. PowerBI will just do the sum of Stock and Sales in both tables and divide both.
However if you try to break this measure down by Prod. Line or Country, PowerBI will only break the Stock by Prod. Line and the Sales by Country. When breaking down the Stock by Country and the Sales by Prod. Line he will use the total Stock and total Sales and this is totally wrong.
You have to be careful here, because PowerBI will return you something which could “look right” at first sight, but which is “actually wrong”:
So how do we get it right? By storing the attributes separately in a Master Data table and liking the Master Data to the Measures tables
Great Data Model: Attributes separated in Master Data tables
So we’ve separated our Attributes here, and we make a link in the Data Model between the Keys of both the Measures tables (Sales and Stock) and the Master Data tables (Product Master Data and Store Master Data)
Now thanks to the link we made between the KM and KA tables, PowerBI knows how to breakdown both the Sales and Stock by Prod. Lines and Country, using the attributes from the Master Data tables. You can now use your Stock Coverage Measure safely, and the below breakdown is correct.
By using this technique you will be able to use PowerBI not only to explore your Sales and Stock data INDEPENDENTLY but to actually CROSS information between the two. WOW!
Bonus rules: how to build your Master Data tables
Now you know that for making the most of PowerBI you are going to have to maintain some Master Data tables to store all your attributes. Here are a few rules for these:
- MasterData Rule #1: Have 1 Master Data table per Key in your KM tables.
- MasterData Rule #2: Have all the possible Keys in the Master Data. In our example above, that means that each single Product need to be in the Product Master Data and each single Store in the Store Master Data. If you miss some Products or some Stores, then PowerBI will not be able to calculate on them.
- MasterData Rule #3: No Duplicate in the Master Data tables. You need to have 1 unique record per Key. Otherwise the link will not work in your Data Model.
Conclusion of this long post: if you build your models using data with Maximum detail, import your data as Flat File, and Separate your Measures from your Attributes then link KM and KA tables together you will enjoy the full PowerBI experience and avoid many mistakes. We will come back in future posts on other side advantages of using these 3 rules.
And you, do you have any other Best Practice to share when building PowerBI models?