Industry: FMCG
Client: Unilever
Technology: Python, Gradient Boosting, LSTM, Power BI

Forecasting sales across 14 product categories

Unilever Belgium needed to predict sales volumes across 14 product categories — daily forecasts for the next two weeks, and monthly forecasts for the next year. We built category-specific forecasting models that account for seasonal patterns, promotional effects, and product-level trends, with results served through Power BI dashboards for the commercial team.

CHALLENGE

Forecasting across 14 product categories

Unilever Belgium wanted to forecast sales at product level across 14 categories, with two distinct time horizons: daily predictions for the next two weeks (for operational planning) and monthly predictions for the next year (for strategic planning). The existing process lacked the granularity and automation the commercial team needed.

DATA

Working with imperfect data

The available sales data presented several challenges that had to be solved before any model could be trained.

Gaps and inconsistencies

The dataset contained numerous missing values, skewed distributions, and limited feature information. Some products lacked sufficient historical data entirely. We filtered those out, applied interpolation and mean-value filling for numeric gaps, and introduced an "unknown" category for missing categorical features.
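The cleaning steps above can be sketched in pandas. The column names and the minimum-history threshold are illustrative assumptions, not the actual schema:

```python
import pandas as pd
import numpy as np

# Hypothetical sales frame; column names are illustrative, not the real schema.
df = pd.DataFrame({
    "product_id": ["A", "A", "A", "B", "B", "B"],
    "units_sold": [10.0, np.nan, 14.0, np.nan, 3.0, np.nan],
    "pack_size":  [1.0, np.nan, 1.0, 2.0, 2.0, np.nan],
    "segment":    ["ice cream", None, "ice cream", None, "tea", "tea"],
})

# Interpolate the target within each product; limit_direction="both" also
# fills leading/trailing gaps with the nearest valid value.
df["units_sold"] = df.groupby("product_id")["units_sold"].transform(
    lambda s: s.interpolate(limit_direction="both")
)

# Mean-value filling for remaining numeric gaps.
df["pack_size"] = df["pack_size"].fillna(df["pack_size"].mean())

# Missing categorical features become an explicit "unknown" level.
df["segment"] = df["segment"].fillna("unknown")

# Drop products with too little history to model (threshold is made up).
history = df.groupby("product_id")["units_sold"].count()
df = df[df["product_id"].isin(history[history >= 3].index)]
```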

Feature engineering

We created lag features from previous sales periods to capture trends, and identified seasonal patterns in the data — for example, certain product categories show consistent spikes during summer months. For monthly forecasting, the model looks back two years to pick up seasonal signals from previous cycles.
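A minimal sketch of the lag and seasonality features for a monthly series, assuming illustrative column names. The 24-month lag is what lets the monthly model look back across two previous seasonal cycles:

```python
import pandas as pd

# Illustrative monthly series for one product (values are placeholders).
sales = pd.DataFrame({
    "month": pd.period_range("2021-01", periods=30, freq="M"),
    "units_sold": range(30),
})

# Lag features: last month, same month last year, same month two years ago.
for lag in (1, 12, 24):
    sales[f"lag_{lag}"] = sales["units_sold"].shift(lag)

# A simple seasonal indicator, e.g. for categories that spike in summer.
sales["is_summer"] = sales["month"].dt.month.isin([6, 7, 8]).astype(int)
```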

APPROACH

Category-specific models

We framed the problem as a regression task and tested several model architectures, from LSTM neural networks to tree-based models such as gradient boosting.

One model per category

Consumer behaviour varies significantly across product categories, so we built a separate model for each of the 14 categories rather than forcing a single model to generalise across all of them. This gave each model the room to learn category-specific patterns.
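The one-model-per-category idea reduces to a simple training loop. This is a sketch with scikit-learn's gradient boosting; the `category` and `units_sold` column names are assumptions:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def train_per_category(df: pd.DataFrame, feature_cols: list[str]) -> dict:
    """Fit one gradient-boosting model per product category.

    Assumes `df` has a `category` column, the listed feature columns,
    and a `units_sold` target -- illustrative names, not the real schema.
    """
    models = {}
    for category, group in df.groupby("category"):
        model = GradientBoostingRegressor(random_state=0)
        model.fit(group[feature_cols], group["units_sold"])
        models[category] = model
    return models
```

Because each model sees only its own category's rows, it is free to learn patterns (e.g. summer spikes) that would be averaged away in a single global model.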

Automated retraining

We built automated scripts that detect new categories, train models, and generate forecasts — so the system scales without manual intervention when the product portfolio changes.
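The detection step might look like the following sketch: compare the categories present in the data against the trained models and fit only the missing ones. Function and column names are hypothetical; a production version would run on a schedule, persist models, and regenerate forecasts afterwards:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def retrain_new_categories(df: pd.DataFrame, models: dict,
                           feature_cols: list[str]) -> dict:
    """Fit models only for categories that do not have one yet.

    `category` and `units_sold` are illustrative column names.
    """
    for category, group in df.groupby("category"):
        if category in models:
            continue  # already trained; nothing to do
        model = GradientBoostingRegressor(random_state=0)
        model.fit(group[feature_cols], group["units_sold"])
        models[category] = model
    return models
```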

RESULTS

What the models delivered

The best-performing models achieved an RMSE of 0.6 and an R² of 0.70, meaning they explained roughly 70% of the variance in actual sales. That is a strong result for product-level retail forecasting, where consumer noise typically caps explained variance well below the headline numbers seen in cleaner forecasting domains. Results were delivered through Power BI dashboards designed for the commercial team, not data scientists.
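For reference, both metrics are straightforward to compute. The numbers below are toy values purely to show the calculation, not the project's actual predictions:

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy actuals and predictions (illustrative only).
y_true = np.array([2.0, 3.5, 4.0, 5.5, 6.0])
y_pred = np.array([2.4, 3.0, 4.3, 5.0, 6.5])

# RMSE: root of the mean squared error.
rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# R²: fraction of variance in the actuals explained by the predictions.
r2 = r2_score(y_true, y_pred)
```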