How we helped Unilever forecast their sales

Sales forecasting is one of the hottest topics nowadays. It enables companies to make informed business decisions and predict short-term and long-term performance. Moreover, sales forecasting gives insight into how a company should manage its workforce, cash flow, resources and marketing strategies, and how it should plan future growth.

We are working on several sales forecasting tools for Unilever Belgium.
The project has two main goals, which apply to 14 different product categories:

  • Predict the daily sales amount of a certain product for the next 2 weeks.
  • Predict the monthly sales amount of a certain product for the next year.

Data analysis

Discovering what is in the data is not the most difficult task, but it is time-consuming. The first step in this kind of problem is data exploration: understanding the data we are dealing with and its characteristics. In our case, we spent a lot of time on this step because the available data were of poor quality. There were many missing values in specific attributes, the distributions were skewed, and we did not have enough feature information to build an accurate model.

First, we filtered out products that do not have enough history (historical data is crucial for this type of project). We discovered that some products had only started being sold in the last month, which is too short for the model to find the right pattern and predict accurately for these products. Second, we treated missing values with different techniques: filling with the mean value or interpolating for numeric features, and adding “unknown” as a new category for categorical features. Lastly, the step that improved our predictive model most drastically was feature engineering. Since this is a time series problem, we created new features from previous sales information (called lag features in time series analysis). These features give the model information about the recent past and let it detect whether sales are decreasing or increasing.

During data analysis, we also detected trends for different product categories. For instance, specific products sell more during the summer, which indicates that temperature is an important factor for some products. We therefore took seasonality into account in our model and added new features based on it.
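The history filter and the lag features described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used in the project: the field names (product_id, date, sales), the lag values and the min_history threshold are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def add_lag_features(rows, lags=(1, 7, 14), min_history=30):
    """Drop products with too little history and add lagged sales values.

    `rows` is a list of dicts with the (hypothetical) keys
    "product_id", "date" (a datetime.date) and "sales".
    """
    # Index each product's sales by calendar day.
    by_product = defaultdict(dict)
    for row in rows:
        by_product[row["product_id"]][row["date"]] = row["sales"]

    out = []
    for product_id, sales_by_date in by_product.items():
        if len(sales_by_date) < min_history:
            continue  # not enough history to learn a pattern from
        for day in sorted(sales_by_date):
            features = {
                "product_id": product_id,
                "date": day,
                "sales": sales_by_date[day],
            }
            for lag in lags:
                # Sales `lag` days earlier; None if that day is not in the data.
                past_day = day - timedelta(days=lag)
                features[f"sales_lag_{lag}"] = sales_by_date.get(past_day)
            out.append(features)
    return out
```

Looking up the lagged value by calendar date, rather than by row position, keeps the features correct when a product has days without any record.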

The monthly predictive model follows the same strategy as the daily forecasting model, but uses different lag features. In this model, we step back 2 years and collect data from the same season as the month we are predicting. For instance, if we are predicting for January 2018, 6 new features are created using data from February 2017, January 2017, December 2016, February 2016, January 2016 and December 2015. Because the data are seasonal, these features let the model learn how sales behaved in the same season in past years, find the right pattern in the data and produce a prediction based on that information.
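To make the month selection concrete, here is a small sketch that lists the months used for the seasonal lag features of a given target month. The function name and its parameters are hypothetical and chosen only for illustration.

```python
def seasonal_lag_months(year, month, years_back=2):
    """Return (year, month) pairs around the target month in earlier years.

    For each of the previous `years_back` years, this yields the following
    month, the same month and the preceding month, in that order.
    """
    months = []
    for back in range(1, years_back + 1):
        for offset in (1, 0, -1):
            # Count months from year 0 so that year boundaries roll over.
            index = (year - back) * 12 + (month - 1) + offset
            months.append((index // 12, index % 12 + 1))
    return months


# January 2018 gives Feb 2017, Jan 2017, Dec 2016, Feb 2016, Jan 2016, Dec 2015.
print(seasonal_lag_months(2018, 1))
```

The monthly sales of each returned month then become one feature for the target month.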

Forecasting model

This is clearly a regression problem, since we have to predict continuous numeric values. We considered different models, such as LSTM (Long Short-Term Memory, a type of neural network), tree-based models and gradient boosting. Gradient boosting models ended up giving us the best results.

There are 14 different categories with more than 100 products. Buying behavior differs by category: for example, people tend to buy ice cream in warm weather, while sales of household care products are not related to temperature. Because of these differences, you can’t simply merge the behaviors of all 14 categories into one model, so we decided to build a specific model for each category. Moreover, we have built a script that identifies new categories, trains the models and forecasts automatically. The best score achieved is an RMSE of 0.6, which corresponds to a coefficient of determination (R² score) of about 0.70, meaning the model explains roughly 70% of the variance in sales.
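The per-category setup and the two metrics mentioned above can be sketched as follows. This is an illustration under assumptions, not the project's script: the "category" field and the train_per_category helper are hypothetical, and the fit argument stands in for whatever training routine is used (gradient boosting in our case).

```python
import math
from collections import defaultdict


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def train_per_category(rows, fit):
    """Train one model per category found in the data.

    `fit` is any callable that takes the rows of a single category and
    returns a trained model. Categories are discovered from the data, so
    a category that appears for the first time gets its own model too.
    """
    by_category = defaultdict(list)
    for row in rows:
        by_category[row["category"]].append(row)
    return {category: fit(cat_rows) for category, cat_rows in by_category.items()}
```

Because the categories are read from the data rather than hard-coded, rerunning the training on new data is enough to pick up a newly added category.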

Tool for business users

The main users of our tool are not data scientists, so they need an easy-to-use interface. That’s why we use Power BI to show the predicted values to Unilever employees. In the near future, we’ll be working on a dashboard that provides an overview of both the daily and monthly forecasts.