ARIMA (AutoRegressive Integrated Moving Average) is a popular time series forecasting method that models a time series as a combination of autoregressive (AR) and moving average (MA) components, with the added ability to handle non-stationary time series through differencing.
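In the usual backshift notation, where $B y_t = y_{t-1}$, an ARIMA(p, d, q) model can be written as shown below. Here p is the number of AR terms ($\phi_i$), d is the number of differences, and q is the number of MA terms ($\theta_j$). This uses one common sign convention; some texts write the MA polynomial with minus signs.

$$
\Big(1 - \sum_{i=1}^{p} \phi_i B^i\Big)(1 - B)^d \, y_t = c + \Big(1 + \sum_{j=1}^{q} \theta_j B^j\Big)\varepsilon_t
$$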

Let’s consider an example: forecasting the monthly sales of a particular product, given its sales over the past 2 years (January 2019 to December 2020). Here is the data:

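| Month  | Sales |
|--------|-------|
| Jan-19 | 100   |
| Feb-19 | 120   |
| Mar-19 | 125   |
| Apr-19 | 130   |
| May-19 | 140   |
| Jun-19 | 145   |
| Jul-19 | 150   |
| Aug-19 | 155   |
| Sep-19 | 160   |
| Oct-19 | 170   |
| Nov-19 | 180   |
| Dec-19 | 200   |
| Jan-20 | 210   |
| Feb-20 | 220   |
| Mar-20 | 230   |
| Apr-20 | 240   |
| May-20 | 250   |
| Jun-20 | 260   |
| Jul-20 | 270   |
| Aug-20 | 280   |
| Sep-20 | 290   |
| Oct-20 | 300   |
| Nov-20 | 310   |
| Dec-20 | 320   |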
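The ADF code in this example reads the data from `sales_data.csv`, a file the text does not provide. Here is a minimal sketch that recreates it from the table. It assumes the file has exactly two columns: `Month` as `Jan-19`-style strings and `Sales` as integers.

```python
import pandas as pd

# Month labels in the same 'Jan-19' style as the table (Jan-2019 .. Dec-2020)
months = pd.date_range("2019-01-01", periods=24, freq="MS").strftime("%b-%y")

# Sales figures copied from the table, in month order
sales = [100, 120, 125, 130, 140, 145, 150, 155, 160, 170, 180, 200,
         210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320]

data = pd.DataFrame({"Month": months, "Sales": sales})

# Write the CSV that the ADF code loads (no extra index column)
data.to_csv("sales_data.csv", index=False)
```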

We want to build an ARIMA model to forecast the sales for the next 6 months (Jan-21 to Jun-21).

First, we need to check whether the time series is stationary. The AR and MA parts of an ARIMA model assume a stationary series, and the differencing step (the “I” in ARIMA) is there to produce one. A stationary time series has a constant mean and variance, and an autocorrelation that depends only on the lag between observations, not on when they occur. We can check for stationarity using a statistical test such as the Augmented Dickey-Fuller (ADF) test. Here’s the Python code to perform the test:

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Load the data
data = pd.read_csv('sales_data.csv')

# Convert the Month column (e.g. 'Jan-19') to datetime format
data['Month'] = pd.to_datetime(data['Month'], format='%b-%y')

# Set the Month column as the index
data.set_index('Month', inplace=True)

# Perform the ADF test (null hypothesis: the series has a unit root)
result = adfuller(data['Sales'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
```

The output of the ADF test is:

ADF Statistic: -1.781072297230788
p-value: 0.3949925809654875

The p-value is greater than 0.05, so we fail to reject the null hypothesis that the time series is non-stationary. This suggests that we need to difference the time series to achieve stationarity.
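As an informal cross-check on the test result (not a replacement for it), the sketch below compares the average level of sales in each year with the average month-over-month change in each year. For this data, the yearly mean level rises sharply from 2019 to 2020, while the mean monthly change stays close to 10 in both years. That pattern is what differencing is meant to exploit.

```python
import pandas as pd

sales = pd.Series(
    [100, 120, 125, 130, 140, 145, 150, 155, 160, 170, 180, 200,
     210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320],
    index=pd.date_range("2019-01-01", periods=24, freq="MS"),
)

# Mean sales level per calendar year: a stationary series would show similar values
level_means = sales.groupby(sales.index.year).mean()

# Mean month-over-month change per calendar year (first difference, leading NaN dropped)
diff = sales.diff().dropna()
diff_means = diff.groupby(diff.index.year).mean()

print(level_means.round(2))
print(diff_means.round(2))
```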

To difference the time series, we can take the first difference (i.e., the difference between each observation and the previous observation). Here’s the Python code to do that:

```python
# Take the first difference (the first row becomes NaN, so drop it)
data_diff = data.diff().dropna()

# Perform the ADF test again on the differenced series
result = adfuller(data_diff['Sales'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
```

The output of the ADF test is:

ADF Statistic: -4.249799122750079
p-value: 0.0005452049051191423

The p-value is now well below 0.05, so we reject the null hypothesis of a unit root and treat the differenced series as stationary. Next, we need to determine the appropriate values for the ARIMA parameters p, d, and q. The p parameter