Time Series Forecasting: Don’t forget Seasonality!

Why is Seasonality important

Seasonality is in everything we do, we even considering unconsciously, for example when we leave for work early to beat morning traffic, or when we book our Summer holidays early to avoid peak prices.

If you are running a Winter attire business, you might only sell a few pieces during summer but your sales might boom when the weather gets colder, therefore requiring additional resources such as Inventory, Staff Availability… etc.

Seasonality is therefore a very important component of Planning and especially in Forecasting.

How to calculate Seasonality

Seasonality is usually calculated using the Time Series Decomposition Method.

This method assumes the Signal can be broken down in 3  components:

  • The Trend: is your signal flat? increasing or decreasing?
  • The Seasonality: does your signal show peaks and drops at specific Time periods (for example peak of Sales for Christmas in December)
  • The Noise: this is the part of the signal that can not be explained. If the Signal is well decomposed, the Noise component should be a process of mean = 0
  • Sometimes a Cycle component is also added. We will assume there is no Cycle going forward

 Multiplicative Time Series Decomposition

Signal(t) = Trend(t) * Seasonality(t) * Noise (t)

Additive Time Series Decomposition

Signal(t) = Trend(t) *+Seasonality(t) + Noise (t)

At AnalystMaster we generally prefer to consider Times Series as Multiplicative (and that is what we will use going forward). In that case, Seasonality for each time period could be seen as a weight and the Sum of Seasonality for all components is equal to 1 or 100%.

 Calculate Seasonality in Excel

You can also use the attached Excel Model to calculate your Monthly Seasonality in Excel.

This model works with 24 months of Historical data

  1. It first evaluates the Trend using a Centred Moving Average (only possible from time bucket 6 to 18), and extrapolate this trend linearly for time buckets 1 to 24.
  2. Then the trend is removed (we divide the original signal by it as we consider the Time Series as Multiplicative), in order to leave only the Seasonality and Noise as the only components of the Time Serie.
  3. Finally, Seasonality coefficients from both time periods are averaged

Calculate Seasonality with R

R is a great tool to calculate the Seasonality of a Time Series. You can use the following piece of code to read a monthly time serie from a data.csv file and return the Seasonality coefficients.

R will return you the Seasonal, Trend and Random components from the Multiplicative Time Series decomposition.

library(gdata)
> library(forecast)
> library(tseries)
> mydata = read.csv(“data.csv”)
> signal<-ts(signal ,start = c(2014,1), end = c(2017,6), frequency = 12)
> seascoef<-decompose(signal, type=”multiplicative”)

season_r

 The next levels

Including  Seasonality in your planning will dramatically improve your planning accuracy. However you might find out that only considering the Seasonality at Monthly level is not good enough and that you need to also include seasonality at a more detailed level to maximize your planning accuracy.

Seasonality exists also at a more detailed level: weekly, daily, hourly… for example

  • Weekly seasonality: if you are a Retailer: although December is a peak month, not all the December weeks are equal. The seasonality is much stronger on the last week before Christmas and failing to anticipate this can result in shortages of capacity
  • Daily seasonality: if you own a shop or a restaurant, Seasonality is usually stronger on some days of the week: Saturday for example
  • Hourly seasonality: peak times also vary hour by hour. If you are running a call centre you need to plan your capacity accordingly. Or if you are going shopping at Harrods’s, you might want to go when the store is less crowded according to the chart below which is available on Google

 

harrods

Sometimes Seasonality can even be more complex as it does not necessarily follow a regular Month-Week-Day-Hour pattern.

A well-known example by Retailers is Chinese New Year, which follows the Chinese Moon calendar and will therefore fall on a different week and month every year. Easter or Ramadan are also moving holidays and seasonality can be hard to evaluate.

chinese

How far should I go?

Seasonality is important and should be included in your forecasting activities, however you need to keep it at a level which is both relevant for your activity and simple enough to implement. For example if you plan Production at a Monthly level, keep your signal at a Monthly level and evaluate Seasonality at this level too. If you are planning a Warehouse Capacity at Weekly level, then get your Signal and Seasonality at Weekly level, and if you are planning how to staff a Call Centre or a Shop on an Hourly basis, then plan and measure Seasonality hour by hour. But do not introduce unnecessary complexity in getting a signal at hour level if you only expect a monthly plan

AnalystMaster

Football Predictions for Premier League, La Liga and Serie A

Let’s kick off 2018 with our latest Football Predictions!

Premier League Week 22

On the contrary to Bing, we prefer to bet on Brighton vs. Bournemouth in a game that we expect to be close.

After 3 consecutive games without winning, we believe than Manchester United will continue to have a difficult time at Everton in a game too close to call.

Finally in the Arsenal – Chelsea derby we believe that the Gunners will make good use of their Home Advantage and we give them slightly favourable odds.

Premier League Week 22 AnalystMaster Bing
Home Away H D A H D A
Brighton Bournemouth 45% 28% 27% 22% 28% 50%
Burnley Liverpool 32% 25% 43% 29% 21% 50%
Leicester Huddersfield 52% 24% 24% 58% 28% 14%
Stoke Newcastle 40% 25% 35% 46% 36% 18%
Everton Man United 37% 25% 39% 12% 25% 63%
Southampton Crystal Palace 45% 28% 28% 58% 28% 14%
West Ham West Brom 39% 28% 33% 40% 27% 33%
Swansea Tottenham 21% 24% 54% 17% 16% 67%
Man City Watford 66% 18% 16% 89% 5% 6%
Arsenal Chelsea 42% 25% 33% 32% 25% 43%

La Liga Week 18

Despite benefiting from the Home Advantage, we believe that Valencia will be challenged by Girona, with a win % of 49%, much lower than Bing’s 81%.

We agree with Bing that the Leganes-Socidedad game is too close to call, with each side having equal winning probabilities.

La Liga Week 18 AnalystMaster Bing
Home Away H D A H D A
Ath Madrid Getafe 50% 29% 21% 58% 28% 14%
Valencia Girona 49% 24% 27% 81% 15% 4%
Las Palmas Eibar 43% 24% 34% 45% 23% 32%
Sevilla Betis 64% 21% 14% 89% 5% 6%
Leganes Real Sociedad 36% 27% 36% 25% 39% 36%
Barcelona Levante 72% 18% 10% 89% 5% 6%
Ath Bilbao Alaves 46% 30% 24% 58% 28% 14%
Vilarreal La Coruna 63% 21% 16% 89% 5% 6%
Celta Vigo Real Madrid 25% 23% 51% 12% 25% 63%
Malaga Espanyol 44% 25% 31% 64% 20% 16%

Serie A Week 20

We have some differences here with Bing, as we predict the games Chievo vs. Udinese and Fiorentina vs. Inter Milan too close too call

Genoa-Sassuolo is also expected to be a very close game, with the home side having slightly favourable odds.

As usual our odds are lower than Bing for the Napoli, AC Milan, Roma and Juventus games, although we both agree that a win from these top teams is the most likely outcome.

Serie A Week 20 AnalystMaster Bing
Home Away H D A H D A
Chievo Udinese 36% 25% 39% 56% 28% 16%
Fiorentina Inter 37% 26% 37% 8% 16% 76%
Torino Bologna 42% 26% 32% 56% 28% 16%
Benevento Sampdoria 29% 23% 48% 17% 16% 67%
Genoa Sassuolo 40% 25% 35% 25% 39% 36%
Spal Lazio 27% 23% 50% 17% 16% 67%
Napoli Verona 67% 19% 14% 89% 5% 6%
AC Milan Crotone 59% 23% 18% 89% 5% 6%
Roma Atalanta 52% 24% 24% 89% 5% 6%
Cagliari Juventus 24% 21% 56% 8% 16% 76%

Happy New Year 2018 everyone!!

AnalystMaster

Year End Football Predictions

 

Premier League Week 21

The notable difference with Bing Predicts is on the Watford-Swansea and Huddersfield-Burnley games that we predict too close to call, whereas Bing seems to see a pretty clear victory of the Home side for both games.

Bing Predictions for the last 3 games on Dec 31st are not available at this time.

Premier League Week 21 AnalystMaster Bing
Home Away H D A H D A
Chelsea Stoke 64% 20% 16% 81% 15% 4%
Bournemouth Everton 45% 26% 29% 55% 31% 14%
Liverpool Leicester 55% 24% 22% 45% 25% 30%
Watford Swansea 37% 26% 38% 52% 25% 23%
Huddersfield Burnley 37% 29% 34% 46% 36% 18%
Newcastle Brighton 39% 30% 30% 55% 31% 14%
Man United Southampton 64% 23% 13% 89% 5% 6%
Crystal Palace Man City 18% 21% 61% N/A N/A N/A
West Brom Arsenal 35% 26% 39% N/A N/A N/A
Tottenham West Ham 64% 21% 15% N/A N/A N/A

Serie A Week 19

Overall our predictions are similar to Bing, with the notable exception of the Bologna-Udinese game which we think will be very close whereas Bing foresees an easy Bologna victory. We also believe that the game between Inter and Lazio will be very challenging to predict as well, Inter only having a 43% victory odd vs. 55% for Bing.

As usual we believe the 89% that Bing predicts for Sampdoria, Atalanta and Roma wins is too high and our victory probability is in the 60s %

Serie A Week 19 AnalystMaster Bing
Home Away H D A H D A
Crotone Napoli 17% 22% 61% 17% 16% 67%
Fiorentina Milan 47% 23% 30% 55% 31% 14%
Bologna Udinese 36% 25% 40% 81% 15% 4%
Benevento Chievo 28% 25% 48% 12% 25% 63%
Sampdoria SPAL 63% 21% 16% 89% 5% 6%
Atalanta Cagliari 60% 23% 17% 89% 5% 6%
Roma Sassuolo 60% 22% 18% 89% 5% 6%
Torino Genoa 45% 27% 28% 42% 28% 30%
Inter Lazio 43% 22% 35% 55% 31% 14%
Verona Juventus 17% 20% 64% 17% 16% 67%

We hope you all enjoy these last 2017 games, we will come back strong in 2018 with more Football articles and predictions!

Happy New Year everyone!

AnalystMaster

AnalystMaster 1-0 BingPredicts and Boxing Day Predictions

BingPredicts challenge results

Last week we challenged BingPredicts over their Football Predictions over Premier League, La Liga, Ligue 1 and Serie A predictions.

AnalystMaster wins the challenge with 49% of correct predictions vs. 44% for BingPredicts.

In Particular AnalystMaster correctly predicted Barcelona’s win over Real Madrid in El Clasico, or better anticipated the difficulties of Man United at Leicester

Boxing Day Predictions

You will find below our predictions and the ones from BingPredicts for Boxing Day.

They are very similar, we only disagree on the Watford-Leicester game where we predict a tough game with a Leicester win being more likely, whereas Bing bets on a Watford victory.

We also foresee Arsenal having a tough time at Crystal Palace in the last game too close to call, where Bing sees an easy Arsenal win.

Happy Boxing day everybody!

AnalystMaster

Premier League Week 20 AnalystMaster Bing
Home Away H D A H D A
Tottenham Southampton 60% 24% 16% 56% 28% 16%
Man United Burnley 62% 24% 15% 89% 5% 6%
Watford Leicester 32% 23% 45% 56% 28% 16%
Chelsea Brighton 59% 25% 16% 50% 42% 8%
West Brom Everton 44% 27% 30% 64%
Bournemouth West Ham 46% 24% 30% 48% 24% 28%
Huddersfield Stoke 48% 26% 26% 64%
Liverpool Swansea 62% 24% 14% 52% 28% 30%
Newcastle Man City 16% 22% 62% 17% 16% 67%
Crystal Palace Arsenal 36% 25% 39% 17% 16% 67%

Football Predictions: AnalystMaster vs. BingPredicts

Challenging Bing Predicts on Football Predictions

Bing Predicts prides itself at being great in forecasting the outcome of some events. A quick look at its wikipedia page shows that it predicted the winners from American Idol or The Voice with more than 80% accuracy, US elections with more than 90% accuracy and that they did a perfect 15 out of 15 predictions in the 2014 World Cup! Well done Bing Predicts!

At AnalystMaster we also have our Predicting algorithms. We decided to challenge Bing Predicts on the next European Football games by comparing our Predictions with theirs. Although our predictions match on most games, there are few significant differences including “El Clasico” between Real Madrid and Barcelona, Leicester vs. Manchester United or the next Bayern Munich and Juventus games.

See you next week for analysing the results!

AnalystMaster

Premier League Week 19

The first thing that strikes when looking at Bing Predicts forecasts is that it sometimes gives very important winning probabilities.

The Leicester vs. Man United game is a good example. Bing Predicts foresees an easy Man United victory (75%) whereas we think Jose Mouriho’s team will face a much tougher opposition from a team who just took Pep Guardiola’s Man City to penalty kicks in League Cup this week and will be benefiting from the Home Advantage.

Premier League Week 19 AnalystMaster Bing
Home Away H D A H D A
Arsenal Liverpool 48% 21% 31% 58% 28% 14%
Everton Chelsea 32% 24% 43% 17% 16% 67%
Stoke West Brom 33% 28% 39% 48%
Southampton Huddersfield 50% 25% 25% 42%
Swansea Crystal Palace 33% 30% 37% 34% 31% 35%
Man City Bournemouth 66% 19% 14% 89% 5% 6%
West Ham Newcastle 45% 27% 28% 55% 31% 14%
Brighton Watford 33% 26% 41% 29% 28% 43%
Burnley Tottenham 35% 28% 37% 8% 16% 76%
Leicester Man United 36% 26% 39% 6% 19% 75%

La Liga Week 17

Our algorithm disagrees with Bing Predicts regarding “El Clasico”. Bing Predicts bets on an clear victory from Real Madrid (56%), probably giving a lot of weight to the Home Advantage.

This could be surprising to who followed the two teams this year: Real Madrid has been struggling this year and is currently 4th of La Liga so far with 31 pts and losing to Tottenham in Champions League, whereas Barcelona has been outrageously dominating the championship with 42 pts and no defeat so far.

Our algorithm on the other side predicts that this game will be too close to call, giving Barcelona a narrow edge at 39% vs. 37% for a Read Madrid win.

La Liga Week 17 AnalystMaster Bing
Home Away H D A H D A
Levante Leganes 35% 28% 37%
Getafe Las Palmas 58% 24% 18% 81% 15% 4%
Real Sociedad Sevilla 50% 23% 27% 29% 28% 43%
Eibar Girona 38% 25% 37% 32% 35% 33%
Alaves Malaga 46% 26% 28% 33% 38% 29%
Betis Ath Bilbao 48% 23% 29% 34% 30% 36%
Espanyol Ath Madrid 28% 28% 45% 29% 28% 43%
Real Madrid Barcelona 37% 24% 39% 56% 28% 16%
Valencia Villarreal 62% 22% 17% 81%
La Coruna Celta Vigo 32% 24% 44% 29% 28% 43%

Serie A Week 18

Again in Serie A Bing Predicts gives very high probabilities to wins of Udinese (89%), Genoa (81%) or Napoli (89%) which seems quite bold for a game which is in the end quite random. Although we also believe these teams will win, we would not be ready to bet on victories with these odds.

We also predicts that Juventus and AC Milan will have a tougher opposition than Bing Predicts thinks with a winning probability of 42% and 45% vs. a bigger margin of 64% for Bing

Serie A Week 18 AnalystMaster Bing
Home Away H D A H D A
Chievo Bologna 41% 26% 33% 31% 36% 33%
Cagliari Fiorentina 34% 25% 41% 6% 19% 75%
Lazio Crotone 65% 20% 16% 89% 5% 6%
SPAL 2013 Torino 36% 25% 39% 25% 29% 46%
Genoa Benevento 54% 24% 23% 81% 15% 4%
Udinese Verona 46% 24% 30% 89% 5% 6%
Sassuolo Inter Milan 23% 27% 50% 17% 16% 67%
Napoli Sampdoria 59% 22% 19% 89% 5% 6%
AC Milan Atalanta 45% 27% 28% 64%
Juventus Roma 42% 24% 34% 64% 20% 16%

Ligue 1 Week 19

Here too Bing Predicts very high odds on some games, especially the 89% for Monaco vs. Rennes. Our algorithm also largely favours a Monaco win, but gives Rennes (who has been doing quite well in the last few games) more chances to draw or to even win that game.

We also think Amiens will exploit their Home Advantage to challenge Ranieri’s FC Nantes at home, and that Montpellier will bring a much tougher opposition to Bordeaux that Bing Predicts seems to believe.

Ligue 1 Week 19 AnalystMaster Bing
Home Away H D A H D A
Angers Dijon 42% 25% 33% 27% 33% 40%
Lille Nice 43% 28% 29% 56%
Toulouse Lyon 25% 24% 51% 21% 33% 46%
Amiens Nantes 48% 27% 25% 35% 23% 42%
Guingamp St Etienne 55% 27% 18% 42% 25% 33%
Marseille Troyes 61% 21% 18% 55% 31% 14%
Monaco Rennes 68% 18% 14% 89% 5% 6%
Paris SG Caen 75% 16% 9% 89% 5% 6%
Bordeaux Montpellier 39% 29% 32% 55% 31% 14%
Metz Strasbourg 30% 24% 46% 29% 21% 50%

Bundesliga Week 18

Our algorithm predicts a tough time for Bayern Munich at Leverkusen, with a game too close too call. This prediction sounds quite bold when we look at the rankings today: Bayern Munich is at the top of the Bundesliga with 42 pts and Leverkusen at a distance in 4th place with 28 pts. Betting on a Munich win or at least a draw would sound like the reasonable prediction, however we have chosen to leave the algorithm’s results as is as we think that the fact that it gives this game almost equal odds is interesting in itself.

Bing has yet to provide its predictions for Bundesliga matches at this time.

Bundesliga Week 18 AnalystMaster Bing
Home Away H D A H D A
Leverkusen Bayern Munich 39% 24% 37%
Werder Bremen Hoffenheim 39% 24% 37%
Frankfurt Freiburg 52% 22% 26%
Stuttgart Herta Berlin 42% 29% 28%
Augsburg Hamburg 53% 23% 24%
Hannover Mainz 58% 22% 20%
Leipzig Schalke 04 49% 23% 27%
Koln M’gladbach 39% 24% 37%
Dortmund Wolfsburg 56% 22% 23%

 

Forecasting with PowerBI… really?

So PowerBI just added a Forecasting feature? As a Forecasting specialist, I got really interested in this new feature.

What all organizations are looking for is not only a tool to explore the past, but also to get insight about the future. No wonder it’s a great marketing tool for all BI platforms to come with a Forecasting tool that can provide insights from your data about predicting the future.

This feature will allow you to build sleek charts like this one, but how good is it really? Can we trust PowerBI for forecasting future trends?

8ec980b1-822c-4f72-bde7-13944329d088

How it works?

First let’s take a look at what we have. Follow this link to understand how to activate the PowerBI Forecasting feature. Now you can have PowerBI forecasting a Time Serie for you. Great but wouldn’t it be nice to understand HOW PowerBI forecasts it? What is the model used in background?

The PowerBI team answered this question in this post where they explained that the algorithm was the R ETS(A,A,A) or ETS(A,A,N), which is an Exponential Smoothing algorithm that you can choose to make Seasonal or not (seasonality will be calculated in an Additive not a Multiplicative way). I will refer you to this paper for more information about R and this ETS algorithm.

Back to basics: Forecasting 101

We will dedicate some more posts on Forecasting Time Series in this blog, so stay tuned!

What you should know in the meantime is that if you want to get a good forecast you pretty much need to go through all the following operations:

  1. Clean your data set from outliers:  you might want to remove exceptional values (think zero sales because you were out of stock, or an exceptional sale due to a promotion for example) from the baseline as theses values are not relevant and risk of getting your forecasting algorithm off-track
  2. Pick a forecasting algorithm and the values of its parameters: there are a bunch of methods you can use to forecast a Time Series: Time Series Decomposition, Moving Averages, Exponential Smoothing, Holt-Winters, ARIMA, Croston… to name a few. Each method comes with some parameters values that have to be defined and this is equally important. It’s much better to use a simple method with relevant parameters than a sophisticated algorithm with an absurd set of parameters.

How to best clean your data, pick the best algorithm and set of parameters for it are important and difficult questions that Forecasters have to answer. There is often no absolute answer and the algorithm that works best today will not necessarily be the best one tomorrow. So should you change every time you forecast or should you give more points to stability? Should you let the algorithm pick himself the parameters to use or should you fix them based on your knowledge of the business? Again important questions!

Actually these questions are so important that we touch here the core of what Forecasting really is: STRATEGIC! Forecasts drive Strategic decisions and Tactics.

As such Forecasts are usually reviewed at the highest possible level in Organizations. And the most important thing when presenting a forecast to a top executive is to be able to JUSTIFY how you built it: does it makes sense based on past trends and seasonality? The worst possible answer to justify your forecast is: “the machine did it”…

PowerBI Forecasting

The problem of PowerBI’s Forecasting module (as we are writing these lines), is that it answers all these strategic questions for you and leaves you with very little FLEXIBILITY to chose methods and parameters

  • data cleaning: PowerBI won’t really do it. At most it will perform “Linear extrapolation” between missing data points
  • you can pick if your algorithm should be Seasonal or not (thank you!), but the way to calculate seasonality will be Additive and nothing else (sorry but I actually usually prefer to consider Seasonality as Multiplicative…)
  • The algorithm will be the R ETS additive one, with linear trends and nothing else. If you want to use a different method or non-linear trends you can forget it
  • Parameters of the ETS are automatically picked (this is usually done by doing back testing and minimizing the error for the 1st time period forecasted). This can be quite dangerous because it’s not so hard to be accurate for the 1st time period, many methods and parameters can provide good forecasts for that. But what Executives and Top Management will likely look at is more the medium to long term forecast, and the parameter that minimizes the first month error is not necessarily the one that will provide the best medium to long term forecast.

Don’t get me wrong the R ETS algorithm is quite good and will provide excellent results in many cases. But I also know a lot of cases on which it is not appropriate, will give you absurd forecasts and you will be in difficulty when having to explain them to your Management. So for all these reasons it’s important to understand how this Forecasting PowerBI “Black Box” works, what it can do and can not do, and most importantly be careful and not trust it blindly like “PowerBI is so smart and tells me this so it must be true…”

Bottom Line: 

  • It’s great that PowerBI works on Forecasting features (we insist that these are still in beta version as we write and will be for sure improved in the near future). We are also happy to see that they use R which comes with awesome packages for Time Series analysis and forecasting, so this also goes in the right direction
  • However keep in mind that Forecasting is a highly strategic activity and being able to justify how you calculate your forecast is as important as the forecast itself. Therefore a little bit more control in the choice of the algorithms and parameters would be highly appreciated instead of having a “Black Box”. Why not also provide very simple “What if?” tools such as boxes for Forecasters to build a forecast based on simple strategic inputs from their management like Trend: +10%, Seasonality: 12 months… Come on PowerBI team, you know how much we love you on this blog so… challenge accepted?

AnalystMaster

Too big for Excel…

We have seen in our PowerBI Best Practice Article that it is best to work with data at maximum level of detail. Well the problem is that sometimes it can be a lot of data… like really a lot…

One day I wanted to extract a lot of information from an ERP system. Not only the data was way above 1m rows Excel capacity, but I was also limited by how much information I could extract from this ERP at at time.

Crazy isn’t it? You know the information you need is out there but it’s LOCKED… What is the point of owning a Ferrari if you’re only going to drive it in first gear?

So what can you do? If you have read this article, you know that PowerBI is the right tool for you. We will show you how to use it depending on how much resources you may have or support from our IT department.

Regarding your data, your have essentially 3 options:

  • Use PowerBI to connect directly to the database
  • Make it fit
  • Work around it

Use PowerBI to connect directly to the database

Good news #1: we know a very cool tool: PowerBI who is capable of handling large amount of data, this is the way to go!

Good news #2: it is wired with tons of connectors to big databases (SAP, BW, Hadoop…)

If you get sufficient support from your IT department you should be able to activate these connectors and work without any issue with your large dataset

However if you are reading this blog, we do not expect you to get this kind of resources or too much support from your IT department. Your are left with the last 2 options:

Make it fit

At the other end of the spectrum, you can obviously try to make your data fit by reducing the amount of information you bring into PowerBI. This can essentially be done by

  • filtering: by filtering out some of your data, based on some attributes or combinations of attributes that you are less likely to need later you can make your dataset smaller and eventually make it fit within your capacity requirements. You can also limit your time horizon
  • aggregating: by giving up strategically on some levels of details that will be less likely to be useful in your analysis later on. For example instead of brining data at a daily or weekly level of detail you might consider monthly.

In our opinion, the Make It Fit option should be activated only in last resort. If you go for it, since you will obviously lose some detail in your information, be strategic about it. Think wisely about which levels of details are less likely to add value to your analysis and take decision on which data to leave out accordingly

Thankfully even with limited IT resources, we do not take NO for an answer and there is a last option:

Work around it

This sounds absurd: we don’t have the tools to manage (extract or store) our data with maximum level of detail, but we don’t give hope that we will import it into PowerBI anyway. How? by slicing it and re-aggregating it later!

  • Slicing: First you need to slice your data into slices that fit within your extraction capacities. For this you can refer to the filtering strategy of the “Make it Fit” section.
    Define some filters so that your extraction is manageable and run them one after the other so that you get the full data set.
    Obviously the less slices the better, so don’t necessarily limit yourself with the storage capacities of Excel or Access. Just focus on your extractions capacities and Go with Text (.TXT) files for Storage if necessary
  • Re-aggregating: this can be managed with PowerBI using the Append Query function in the Query Editor
    This works seamlessly if you have a few slices to re-aggregate however if you have many you might want to run a batch to manage this aggregation outside PowerBI

Back to my ERP extraction problem, unfortunately there was no direct connector for PowerBI that I could use so I got the IT department to extract the past information for me into a giant text file, and to run a script every day to extract the new daily data and drop it into another TXT file. All TXT files were in a folder that I could access.

I was left with a bunch of TXT files with the same format, the only thing I had to do is aggregate them together. A quick Python script later (that I’m giving to you below) and I was done. I managed to get all my data with maximum detail out of this ERP and play with it in PowerBI!

Never give up!

AnalystMaster

Appendix: how to aggregate text files using Python

Text files are not limited in row capacity and could be a efficient low cost way to store large datasets.

One of the best tool to work with text files is Python.

If you do not have Python you can download it for free from the Official Python website

Here we suppose that:

  • you have a file called Startfile.txt at the path Path/Startfile.txt which contains the headers (and only the headers) of all your text files (which are all the same and in the same order)
  • that all the text files that you want to append together are in a folder called Path/Files ; and that they are the only text files in this folder
  • you have a file called Finalfile.txt that will contain the results of your appending of all the text files containing your data. This file is at the path Path/Finalfile.txt

What you need to do:

  1. open your notepad and copy paste the script below
  2. replace Path by the actual path where your files will be stored (and only these files), create the Startfile and drop your other text files in the Path/Files folder
  3. save this script as a .py extension.
  4. Run it, that’s it! You can now open your Finalfile.txt or connect it to PowerBI to read your data

import glob

g = open(‘Path/Finalfile.txt’,’w’)

g.truncate

s = open(‘Path/Startfile.txt’,’r’)

g.write(s.read())

g.close()

g =open(‘Path/Finalfile.txt’,’a’)

path= ‘Path/Files/*.txt’

file=glob.glob(path)

for fle in file:

with open(fle) as f:

text = f.read().splitlines(True)

g.writelines(text[1:])

g.close()