Seasonal Decomposition of Sales Data using SARIMAX

Time series forecasting is a cornerstone of modern analytics, especially for e-commerce, retail, and inventory management. Raw sales data often contains trends, seasonality, and irregular fluctuations that can be hard to model with simple techniques. In this article, we’ll perform a detailed seasonal decomposition and build a robust SARIMAX model in Python.

Understanding SARIMAX

SARIMAX stands for Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors. It extends ARIMA to handle seasonality and allows external variables (exogenous features) to improve predictions. SARIMAX is well suited to data with clear seasonal patterns, such as retail sales that spike every holiday season.
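The statsmodels SARIMAX documentation describes the model as a regression on the exogenous variables whose errors follow a seasonal ARIMA process. Leaving out the optional deterministic trend term, it can be written as:

```latex
y_t = \beta^\top x_t + u_t, \qquad
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\,u_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
```

Here B is the backshift (lag) operator (B y_t = y_{t-1}), x_t holds the exogenous variables, phi_p and theta_q are the non-seasonal AR and MA polynomials of degree p and q, Phi_P and Theta_Q are their seasonal counterparts in B^s, (1 - B)^d and (1 - B^s)^D are the non-seasonal and seasonal differencing operators, and epsilon_t is white noise. The order=(p,d,q) and seasonal_order=(P,D,Q,s) arguments used later map directly onto these symbols.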

Loading Sales Data

We’ll use Python’s pandas to load and inspect our sales dataset.


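import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset (expects a 'date' column and a 'sales' column)
sales_data = pd.read_csv("sales_data.csv", parse_dates=['date'], index_col='date')

# Sort by date and enforce a regular daily frequency; statsmodels uses this
# frequency to date the forecasts. Missing days would appear as NaN and must
# be filled before the decomposition step.
sales_data = sales_data.sort_index().asfreq('D')

# Preview the first rows
print(sales_data.head())

# Plot the sales
plt.figure(figsize=(12,5))
plt.plot(sales_data['sales'], label='Sales')
plt.title("Daily Sales")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.show()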
        

This plot will give us an idea of trends, seasonality, and potential anomalies in our data.
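The loading code assumes a sales_data.csv file with a date column and a sales column. To follow along without real data, a hypothetical stand-in can be generated with the standard library. The function name, the upward trend, the 30-day sine seasonality, and the noise level below are all illustrative assumptions, not properties of any real dataset.

```python
import csv
import math
import random
from datetime import date, timedelta


def make_synthetic_sales(path, days=730, start=date(2022, 1, 1), seed=0):
    """Write a hypothetical daily sales CSV with 'date' and 'sales' columns.

    The series is trend * 30-day seasonality * noise, so it stays strictly
    positive and suits a multiplicative decomposition with period=30.
    """
    rng = random.Random(seed)
    rows = []
    for t in range(days):
        trend = 200 + 0.2 * t                                 # slow upward drift
        seasonal = 1 + 0.25 * math.sin(2 * math.pi * t / 30)  # 30-day cycle
        noise = rng.gauss(1.0, 0.05)                          # multiplicative noise
        sales = max(trend * seasonal * noise, 1.0)            # keep values positive
        rows.append((start + timedelta(days=t), round(sales, 2)))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "sales"])
        for day, sales in rows:
            writer.writerow([day.isoformat(), sales])
    return rows
```

Calling make_synthetic_sales("sales_data.csv") once writes two years of daily rows that the pandas code above can load unchanged.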

Seasonal Decomposition

Using statsmodels, we can separate the time series into trend, seasonal, and residual components:


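from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose the time series. The multiplicative model
# (sales = trend * seasonal * residual) requires strictly positive values;
# use model='additive' if the series contains zeros.
# period=30 approximates a monthly cycle in daily data.
decomposition = seasonal_decompose(sales_data['sales'], model='multiplicative', period=30)

# Plot components
decomposition.plot()
plt.show()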
        

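This decomposition helps us understand the long-term trend in sales, the repeating seasonal pattern around it, and the irregular residual fluctuations that remain once both are accounted for.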
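For intuition, here is a from-scratch sketch of classical multiplicative decomposition in plain Python. It follows the moving-average approach that seasonal_decompose is based on: a centered moving average for the trend, per-phase averages of the detrended ratios (normalized to average 1) for the seasonal component, and the leftover ratio as the residual. The function names are hypothetical, and edge handling is simplified: trend and residual are None wherever the moving-average window does not fit.

```python
from typing import List, Optional, Tuple


def centered_moving_average(x: List[float], period: int) -> List[Optional[float]]:
    """Centered MA over one period; a 2 x period MA when period is even."""
    if period % 2 == 1:
        weights = [1.0 / period] * period
    else:
        # Even period: half weight on both ends keeps the window centered.
        weights = [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]
    half = len(weights) // 2
    trend: List[Optional[float]] = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))
    return trend


def multiplicative_decompose(
    x: List[float], period: int
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Classical multiplicative decomposition: x = trend * seasonal * resid."""
    if len(x) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    if any(v <= 0 for v in x):
        raise ValueError("multiplicative model needs strictly positive values")
    trend = centered_moving_average(x, period)

    # Average the detrended ratios x / trend for each position in the cycle.
    ratios: List[List[float]] = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(x, trend)):
        if level is not None:
            ratios[t % period].append(value / level)
    raw = [sum(r) / len(r) for r in ratios]

    # Normalize so the seasonal factors average to 1 over one cycle.
    mean_raw = sum(raw) / period
    factors = [s / mean_raw for s in raw]
    seasonal = [factors[t % period] for t in range(len(x))]

    # Residual: what is left after dividing out trend and seasonality.
    resid = [
        None if level is None else value / (level * s)
        for value, level, s in zip(x, trend, seasonal)
    ]
    return trend, seasonal, resid
```

Applied to list(sales_data['sales']) with period=30, the output should be close to decomposition.trend, decomposition.seasonal and decomposition.resid, which statsmodels returns as pandas Series with NaN rather than None at the edges.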

Building the SARIMAX Model

We’ll create a SARIMAX model to capture trend and seasonality, then forecast future sales.


from statsmodels.tsa.statespace.sarimax import SARIMAX

# Define the SARIMAX model.
# Non-seasonal order (p,d,q) = (1,1,1); seasonal order (P,D,Q,s) = (1,1,1,30),
# where s=30 approximates a monthly cycle in daily data.
model = SARIMAX(
    sales_data['sales'],
    order=(1,1,1),
    seasonal_order=(1,1,1,30),
    enforce_stationarity=False,
    enforce_invertibility=False
)

# Fit the model
results = model.fit(disp=False)

# Model summary
print(results.summary())
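The d=1 and D=1 entries in these orders tell SARIMAX to model the series after applying (1 - B) and (1 - B^s): a first difference and a lag-s seasonal difference. SARIMAX handles this internally; the plain-Python sketch below (the function name is hypothetical) applies the same operators explicitly, for intuition only, to show why they remove a linear trend and a repeating seasonal pattern.

```python
from typing import List


def difference(x: List[float], d: int = 1, D: int = 1, s: int = 30) -> List[float]:
    """Apply (1 - B)^d (1 - B^s)^D to x; the result is d + D*s values shorter."""
    y = list(x)
    for _ in range(D):
        # Seasonal difference: y_t - y_{t-s}
        y = [y[t] - y[t - s] for t in range(s, len(y))]
    for _ in range(d):
        # First difference: y_t - y_{t-1}
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y
```

For a series made of a linear trend plus a pattern that repeats every s steps, the seasonal difference leaves a constant (the trend slope times s), and the additional first difference leaves all zeros. Applied explicitly like this, each seasonal difference shortens the series by s observations.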
        

Forecasting Future Sales

Once the model is trained, we can forecast future sales and visualize predictions alongside actual values:


# Forecast the next 60 days
forecast = results.get_forecast(steps=60)
forecast_mean = forecast.predicted_mean
forecast_ci = forecast.conf_int()  # 95% interval by default (alpha=0.05)

# Plot forecast
plt.figure(figsize=(12,5))
plt.plot(sales_data['sales'], label='Observed')
plt.plot(forecast_mean, label='Forecast', color='orange')
plt.fill_between(forecast_ci.index,
                 forecast_ci.iloc[:,0],
                 forecast_ci.iloc[:,1], color='orange', alpha=0.2)
plt.title("SARIMAX Forecast")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.show()
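The shaded band comes from conf_int(), whose default alpha=0.05 gives a 95% interval. Under the model's Gaussian assumption, each bound is the point forecast plus or minus a standard normal quantile times the forecast standard error (statsmodels exposes those standard errors as forecast.se_mean). A minimal stdlib sketch of that arithmetic, with a hypothetical function name and made-up inputs:

```python
from statistics import NormalDist
from typing import List, Tuple


def normal_interval(
    mean: List[float], se: List[float], alpha: float = 0.05
) -> List[Tuple[float, float]]:
    """Two-sided (1 - alpha) interval: mean -/+ z * se for each forecast step."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return [(m - z * s, m + z * s) for m, s in zip(mean, se)]


# Made-up example: two forecast steps with growing uncertainty.
bounds = normal_interval([100.0, 110.0], [5.0, 10.0])
```

Because the standard error grows with the forecast horizon, the band typically widens the further the forecast reaches into the future.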
        

Model Diagnostics

To validate our model, we inspect residuals. Ideally, residuals should behave like white noise:


# Plot diagnostics
results.plot_diagnostics(figsize=(12,8))
plt.show()
        

If residuals exhibit autocorrelation or patterns, the model may need refinement, such as adjusting the ARIMA or seasonal parameters.
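A common numeric check for white-noise residuals is the Ljung-Box statistic, Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k), where r_k is the lag-k sample autocorrelation of the residuals; a large Q relative to a chi-square distribution points to leftover autocorrelation. statsmodels provides this test, with p-values, as acorr_ljungbox in statsmodels.stats.diagnostic. The plain-Python sketch below, with hypothetical function names, only shows how the statistic itself is computed.

```python
from typing import List


def autocorrelation(x: List[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (denominator uses all n terms)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(v * v for v in dev)
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / denom


def ljung_box_q(resid: List[float], lags: int = 10) -> float:
    """Ljung-Box Q statistic over lags 1..lags."""
    n = len(resid)
    return n * (n + 2) * sum(
        autocorrelation(resid, k) ** 2 / (n - k) for k in range(1, lags + 1)
    )
```

A strongly patterned series such as [1, -1, 1, -1, ...] produces a very large Q, while independent noise stays near the number of lags. For the fitted model, the residuals are available as results.resid; with seasonal differencing, the first d + D*s of them largely reflect model initialization and are often excluded.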

Adding Exogenous Variables

Sales may be affected by external factors like promotions or holidays. We can include these as exogenous variables in SARIMAX:


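# Example exogenous variable: holiday flag
# (assumes holidays.csv has a 'date' column and a 0/1 flag column)
exog = pd.read_csv("holidays.csv", parse_dates=['date'], index_col='date')

# SARIMAX needs exactly one exog row per observation, aligned with the sales index.
# Dates missing from holidays.csv are assumed to be non-holidays (0).
exog = exog.reindex(sales_data.index, fill_value=0)

model_exog = SARIMAX(
    sales_data['sales'],
    exog=exog,
    order=(1,1,1),
    seasonal_order=(1,1,1,30),
    enforce_stationarity=False,
    enforce_invertibility=False
)

results_exog = model_exog.fit(disp=False)
print(results_exog.summary())

# Forecasting with this model requires future exog values for every step, e.g.
# results_exog.get_forecast(steps=60, exog=future_exog), where future_exog is a
# placeholder for a DataFrame of future flags.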
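Once exog is part of the model, forecasting 60 days ahead needs the holiday flag for each of those 60 future dates. A stdlib sketch of building such flags follows; the function name and the holiday dates in the example are hypothetical, and in practice the flags must use the same source and encoding as holidays.csv.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def future_holiday_flags(
    last_observed: date, steps: int, holidays: Iterable[date]
) -> List[Tuple[date, int]]:
    """Return (date, flag) for each of the `steps` days after `last_observed`."""
    holiday_set = set(holidays)
    rows = []
    for i in range(1, steps + 1):
        day = last_observed + timedelta(days=i)
        rows.append((day, 1 if day in holiday_set else 0))
    return rows


# Hypothetical example: 10 days after 2023-12-20, with Christmas as a holiday.
flags = future_holiday_flags(date(2023, 12, 20), 10, [date(2023, 12, 25)])
```

Before forecasting, these pairs would be turned into a DataFrame indexed by date, with the same column name and order as the exog used for fitting.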
        

Conclusion

SARIMAX allows us to account for both seasonality and external factors when forecasting sales. By performing seasonal decomposition first, we gain insights into trends and patterns, helping us select appropriate model parameters. This approach is particularly valuable for retail and e-commerce businesses with complex seasonal sales patterns.