Modelling and Forecasting Climate Time Series with State-Space Model

This study modelled and estimated climatic data using the state-space model. Specifically, the study set out to identify the pattern of the trend movement, i.e., increase or decrease in the occurrence of climatic change; to use the univariate Kalman filter to compute the likelihood function for climatic projections; to model the climatic dataset using the state-space model; and to assess the forecasting power of state-space models. The data used for the work comprise temperature and rainfall for the period January 1991 to December 2017. The data were tested for normality: the Shapiro-Wilk, Anderson-Darling and Kolmogorov-Smirnov tests all showed that the variables are not normally distributed. The work uses a breaking-trend regression model fitted to the climatic data to estimate the slopes, which show how much increase in the climatic variables has been recorded from the initial time of data collection until the present. Diagnostic investigations were carried out by checking for correlations and for periodicity in the residuals. The results show significant autocorrelation in the residuals, indicating the presence of underlying noise terms that are not accounted for. By treating the residuals as an autoregressive moving average (ARMA) process, we can obtain their spectral density; the parametric spectral estimate shows underlying periodic patterns in the monthly data, which leads to a discussion of the need to treat climatic data as a structural time series model. We select appropriate models by comparing goodness of fit via the Akaike information criterion (AIC). Parameters are estimated and accompanied by measures of precision.


Introduction
State-space models (SSMs) have been used widely in several areas of applied statistics. In particular, linear state-space models have desirable properties and vast potential for time series modelling that incorporates latent processes. When a model can be put in linear state-space form, the algorithm most commonly used to predict the latent process, the state, is the Kalman filter. The Kalman filter computes, at each time t = 1, 2, ..., the optimal estimator of the state vector based on the information available up to time t; its success lies in the fact that it is an online estimation procedure.
This forecasting structure, whose performance has recently been compared with numerous other forecasting methods across thousands of time series [1], adapts to underlying changes in series dynamics and automatically revises forecasts as new observations arrive. In line with the above, this study adopts the state-space model to analyse and forecast the temperature and rainfall data in Nigeria.
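As an illustration of this recursion, a minimal univariate Kalman filter can be sketched as follows. The model, parameter values and simulated data below are illustrative assumptions, not the paper's fitted model:

```python
import numpy as np

def kalman_filter(y, phi, q, r, x0=0.0, p0=1.0):
    """One-dimensional Kalman filter for the linear state-space model
        x_t = phi * x_{t-1} + w_t,  w_t ~ N(0, q)   (state equation)
        y_t = x_t + v_t,            v_t ~ N(0, r)   (observation equation)
    Returns the filtered state means and variances at each time t."""
    x, p = x0, p0
    means, variances = [], []
    for obs in y:
        # Predict: project the state one step ahead
        x_pred = phi * x
        p_pred = phi * p * phi + q
        # Update: correct the prediction with the new observation
        k = p_pred / (p_pred + r)          # Kalman gain
        x = x_pred + k * (obs - x_pred)
        p = (1.0 - k) * p_pred
        means.append(x)
        variances.append(p)
    return np.array(means), np.array(variances)

# Example: filter a noisy AR(1) signal
rng = np.random.default_rng(0)
true_x = np.zeros(100)
for t in range(1, 100):
    true_x[t] = 0.8 * true_x[t - 1] + rng.normal(0, 1)
y = true_x + rng.normal(0, 1, size=100)
m, v = kalman_filter(y, phi=0.8, q=1.0, r=1.0)
```

Because the recursion processes one observation at a time, new data simply extend the loop, which is what makes the filter an online procedure.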
The purpose of this study is to model and estimate the climate using the state-space model. The specific objectives are to: identify the pattern of the trend movement, i.e., increase or decrease in the occurrence of climatic change; use the univariate Kalman filter to compute the likelihood function for climatic projections; model the climatic dataset using the state-space model; and assess the forecasting power of state-space models.
Shamshad et al. [2] compared an artificial neural network multilayer perceptron (ANN-MLP) with the automatic exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA) models for forecasting the main weather parameters of Lahore, Pakistan. Models were built from average monthly maximum and minimum temperature, relative humidity, wind speed and precipitation amount, using thirty years of data (1987-2016). They divided the data into a training set (1987-2016) and a test set (2017-2018) to assess the efficiency and reliability of the models against the performance criteria of the estimates. The research explained briefly how various learning methods can be used to formulate an ANN-MLP, with the most appropriate model and network configuration chosen on forecast performance. The MAE (mean absolute error), RMSE (root-mean-square error), ME (mean error) and MASE (mean absolute scaled error) all suggested better results for the ANN-MLP. Afsar et al. [3] investigated variability in temperature and precipitation in the Gilgit-Baltistan region, using regression and stochastic models to produce temperature and rainfall predictions. They observed that precipitation was prolonged as temperature increased, and a decrease in the amount of precipitation from 2007 to 2011 alongside an increase in the monthly average maximum temperature. They considered an AR(1) model ideally suited to temperature forecasting.
Faisal and Ghaffar [4] applied the Thiessen polygon technique to compute the area-weighted rainfall (AWR) for 56 stations in Pakistan over 50 years (1961-2010), using month-to-month precipitation records of the fifty-six stations, storm measurements for the fifty-year period and a standard precipitation-area relationship. Yusof et al. [5] categorised rainfall amounts into seven classes (extremely wet to extremely dry) to analyse dry and wet events using Peninsular Malaysia data. They used the standardized precipitation index (SPI) and modelled the best-fit distribution to reflect the rainfall; in comparison with the Gamma and Weibull distributions, the lognormal distribution was found to match the daily rainfall in the area better.
Extreme temperature and rainfall events in Pakistan for the period 1965-2009 were examined by Zahid and Rasul [6] to quantify their frequency. They used an F-test to examine the country's minimum and maximum extreme temperature events and pointed out that certain extreme events are increasing all over the country. Regarding extreme rainfall events, they used the K-S method at a 95 per cent confidence level and concluded that the southern half of Pakistan faces more wet spells due to global warming and climate change. In the same vein, rainfall data in Queensland, Australia, including climate indices, monthly rainfall and temperature, were surveyed by Abbot and Marohasy [7], who applied an ANN to predict monthly rainfall and suggested there is scope for improvement in this design. An ANN analysis to forecast monthly temperature and rainfall from 76 stations in Turkey for the period 1975-2006, based on knowledge from neighbouring stations, was also carried out by Bilgili and Sahin [8]. They divided the 76 measuring stations into training and test sets; the fitted model was satisfactory because the errors were within reasonable limits.

State-Space Model
A state-space representation in control engineering is a mathematical model of a physical system as a set of input, output and state variables related by first-order differential equations or difference equations. State variables are variables whose values evolve over time in a way that depends on their values at any given time and on the externally imposed values of the input variables. The values of the output variables depend on the values of the state variables.
"State-space" is the Euclidean space, where the variables on the axes are the variables of the body. The state of the system within that space can be expressed as a vector. The state-space method is characterized by significant algebraization of general system theory, which makes it possible to use Kronecker vector-matrix structures. The ability of these structures can be efficiently used to study systems with modulation or without modulation. The state-space representation (also called the "time-domain method") gives a convenient and compact way to model and analyze systems with multiple inputs and outputs. With p and q outputs, we would otherwise have to write down q × p Laplace transforms to encode all the information about a system.

State-Space Representation
In the time domain, a system can be described in general by a set of linear differential and algebraic equations, i.e., the state-space model

ẋ(t) = A x(t) + B u(t)    (1)
y(t) = C x(t) + D u(t)    (2)

where x is the vector of state variables, u is the vector of inputs (manipulated variables), y is the vector of outputs (controlled/measured variables), and A, B, C and D are constant matrices of appropriate dimensions.
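The state and output equations can be illustrated with a short simulation of their discrete-time analogue; the matrices and step input below are illustrative assumptions:

```python
import numpy as np

# Discrete-time analogue of the state-space equations (1) and (2):
#   x[t+1] = A x[t] + B u[t]        (state equation)
#   y[t]   = C x[t] + D u[t]        (output equation)
def simulate(A, B, C, D, u_seq, x0):
    x = np.asarray(x0, dtype=float)
    outputs = []
    for u in u_seq:
        outputs.append(C @ x + D @ u)   # output depends on current state
        x = A @ x + B @ u               # state evolves to the next step
    return np.array(outputs)

A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
u = [np.array([1.0])] * 20   # unit step input
y = simulate(A, B, C, D, u, x0=[0.0, 0.0])
```

For these (stable) matrices the output rises monotonically towards the steady-state value C (I − A)⁻¹ B u.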
Taking the Laplace transform of Equations (1) and (2) under zero initial conditions and re-arranging, the system transfer function matrix G(s), which relates the inputs u to the outputs y, follows from the state model equations (1) and (2) as

G(s) = C (sI − A)⁻¹ B + D.

The graphical representation of the SSM was developed based on graph theory principles. Two major types are used, the first being the signal-flow-graph (SFG) type.
The first type is easier to use and hence has been adopted by many researchers [9][10], and it has also been adopted in this research for the same reason. Each SFG consists of directed branches interconnected at nodes. In the state-space model, the nodes represent the variables (signals), and the branches connecting the nodes indicate that these nodes are related. Each branch is assigned a numerical value or a function, which quantifies the relationship between the two variables in terms of a gain factor or a transfer function.
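The transfer-function relation G(s) = C (sI − A)⁻¹ B + D can be checked numerically; the toy two-state system below is an illustrative assumption:

```python
import numpy as np

# Numerical check of G(s) = C (sI - A)^{-1} B + D for a toy two-state system.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

def G(s):
    # Solve (sI - A) z = B instead of forming the inverse explicitly
    return (C @ np.linalg.solve(s * np.eye(2) - A, B) + D)[0, 0]

# For these matrices the transfer function works out to G(s) = 1 / (s^2 + 3s + 2)
```

Evaluating G at a few values of s and comparing with the closed form confirms the relation.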

Identification and estimation of a state-space model process
Before the analysis can proceed, identification of the state-space model is carried out as follows. To accommodate data properties such as temporal correlation and periodic behaviour, a periodic state-space model is proposed. The measurement equation has the form

Y_{s,n} = D_{s,n} β + X_{s,n} + e_{s,n},

and the state equation has the form

X_{s,n} = μ_s + φ_s (X_{s−1,n} − μ_{s−1}) + ε_{s,n},

where s is the season of the year with s = 1, 2, ..., S; n is the year with n = 1, 2, ..., N; Y_{s,n} represents the time series observation in the s-th season of the n-th year, i.e., the [S(n − 1) + s]-th observation of the series; β = (β_1, β_2, ..., β_S) are the unknown parameters representing the fixed effects in the model; D_{s,n} is a 1 × S design matrix of known values; the measurement error e_{s,n} is a white noise disturbance with var(e_{s,n}) = σ_e²; μ_s is the mean of the state process X_{s,n} for the s-th season; φ_s is the autoregressive parameter for season s; and ε_{s,n} is the state white noise disturbance. The means, standard deviations and variances of the different series obtained for each sector are examined, the series with the smaller variance being the more efficient; their skewness and kurtosis are also determined, along with the existence of stationarity, unit roots and long-memory properties.
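The periodic model can be illustrated with a short simulation; the seasonal means, AR parameters and noise scales below are illustrative assumptions, and the fixed-effects term D_{s,n}β is absorbed into the seasonal means for simplicity:

```python
import numpy as np

# Sketch of a periodic state-space model of the assumed form:
#   Y[s,n] = X[s,n] + e[s,n]                                (measurement)
#   X[s,n] = mu[s] + phi[s] * (X_prev - mu_prev) + eps      (periodic AR(1) state)
# with S seasons per year and N years (12 months x 27 years, as in 1991-2017).
rng = np.random.default_rng(42)
S, N = 12, 27
mu = 25 + 3 * np.sin(2 * np.pi * np.arange(S) / S)   # assumed seasonal means
phi = np.full(S, 0.5)                                # one AR parameter per season
sigma_e, sigma_w = 0.5, 1.0                          # measurement / state noise

Y = np.zeros(S * N)
x_prev, mu_prev = mu[0], mu[0]
for t in range(S * N):
    s = t % S                                        # season index for time t
    x = mu[s] + phi[s] * (x_prev - mu_prev) + rng.normal(0, sigma_w)
    Y[t] = x + rng.normal(0, sigma_e)
    x_prev, mu_prev = x, mu[s]
```

Averaging the simulated series month by month recovers the seasonal pattern, which is the property the design matrix and seasonal means are meant to capture.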

Univariate Kalman Filter for the computation of the likelihood function for climatic projection
Estimation of the parameters that specify the state-space model is very important for analyzing the various components of the time series model. Let Θ = {µ₀, Σ₀, Φ, Q, R, Υ, Γ} denote the vector of unknown parameters, containing the elements of the initial mean µ₀ and covariance Σ₀, the transition matrix Φ, the state and observation covariance matrices Q and R, and the input matrices Υ and Γ; these are estimated by maximum likelihood. For a time series model in which the observations y₁, y₂, ..., yₙ are not independent, the likelihood is built from conditional probability density functions, so the joint density is written as

L(Θ) = ∏_{t=1}^{n} p(y_t | Y_{t−1}),

where p(y_t | Y_{t−1}) is the distribution of y_t conditional on the information set at time t − 1, Y_{t−1} = {y₁, y₂, ..., y_{t−1}}. Maximum likelihood is used under the assumption that the initial state is normal, X₀ ∼ N(µ₀, Σ₀), and that the errors V₁, V₂, ..., Vₙ and W₁, W₂, ..., Wₙ are jointly normal and uncorrelated, with W_t ∼ N(0, Q) and V_t ∼ N(0, R). Hence the likelihood is derived using the innovations

ε_t = y_t − E(y_t | Y_{t−1}),

which are independent normal with E(ε_t) = 0 and covariance matrix Σ_t; both depend on the parameter vector Θ. Using the Kalman filter for a given Θ, maximisation proceeds as follows:
1. Select initial starting values for the parameters, Θ₀.
2. For Θ₀, compute the likelihood L(y; Θ₀) using the Kalman filter.
3. Apply a numerical optimization algorithm to L(y; Θ₀).
4. Repeat this process until the value of Θ corresponding to the maximum likelihood is found.
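The four-step procedure above can be sketched for a simple univariate special case. The model, its parameterisation and the scipy-based optimiser are illustrative assumptions, not the paper's estimation code:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, y):
    """Innovations (prediction-error) form of the Gaussian log-likelihood for
        x_t = phi x_{t-1} + w_t,  w_t ~ N(0, q);   y_t = x_t + v_t,  v_t ~ N(0, r).
    theta = (phi, log q, log r); variances are parameterised in logs so the
    optimiser can search over the whole real line."""
    phi, q, r = theta[0], np.exp(theta[1]), np.exp(theta[2])
    x, p = 0.0, 1.0                        # simple initial state assumption
    ll = 0.0
    for obs in y:
        x_pred, p_pred = phi * x, phi * p * phi + q
        f = p_pred + r                     # innovation variance Sigma_t
        eps = obs - x_pred                 # innovation epsilon_t
        ll += -0.5 * (np.log(2 * np.pi * f) + eps**2 / f)
        k = p_pred / f                     # Kalman gain
        x, p = x_pred + k * eps, (1 - k) * p_pred
    return -ll

# Steps 1-4: simulate data, pick starting values, optimise numerically
rng = np.random.default_rng(1)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.7 * x[t - 1] + rng.normal()
y = x + rng.normal(size=300)

res = minimize(neg_log_likelihood, x0=[0.5, 0.0, 0.0], args=(y,),
               method="Nelder-Mead")
phi_hat = res.x[0]
```

The optimiser repeatedly evaluates the Kalman-filter likelihood until it converges, exactly as in steps 1 to 4 above.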

Forecasting with Kalman filter
An m-period-ahead forecast of the state vector can be calculated from the recursion

ξ̂_{t+m|t} = F^m ξ̂_{t|t}.

The error of this forecast is found by subtracting ξ̂_{t+m|t} from ξ_{t+m}, from which the mean squared error of the forecast follows. These results can also be used to describe m-period-ahead forecasts of the observed vector y_{t+m}, provided that {x_t} is deterministic. Applying the law of iterated expectations to

E(y_{t+m} | ξ_t, ξ_{t−1}, ..., y_t, y_{t−1}, ...) = E(A x_{t+m} + H ξ_{t+m} + W_{t+m} | ξ_t, ξ_{t−1}, ..., y_t, y_{t−1}, ...)

results in

ŷ_{t+m|t} = E(y_{t+m} | y_t, y_{t−1}, ...) = A x_{t+m} + H ξ̂_{t+m|t},

with forecast error y_{t+m} − ŷ_{t+m|t} and corresponding mean squared error.

Table 1 shows the values of the various components of the spectral analysis for temperature. The numbers in parentheses, (d, D, s, M, T), are defined as follows: d is the regular differencing order; D is the seasonal differencing order; s is the number of seasons (ignored if D is 0); M is 1 if the mean is subtracted, 0 otherwise; T is 1 if the trend is subtracted, 0 otherwise. Thus (0, 0, 12, 1, 0) indicates that there is no regular differencing, the seasonal differencing order is zero while the number of seasons is 12, the mean is subtracted, and the trend is not subtracted. Figures 3-6 show that there exists an underlying periodic component in the residuals obtained by fitting the smoothed and filtered data to the observations. Looking at the monthly series corresponding to the fitted data, the first peak is at a frequency ω = 0.389, corresponding to a period of 50 months, and the second peak is at a frequency of ω = 0.35, with a period of approximately 12 months. Therefore the structural model needs to be extended to contain seasonal and cyclic components. Table 2 shows the values of the various components of the spectral analysis for rainfall.
The numbers in parentheses, (d, D, s, M, T), are defined as before, and (0, 0, 12, 1, 0) again indicates that there is no regular differencing, the seasonal differencing order is zero while the number of seasons is 12, the mean is subtracted, and the trend is not subtracted. Figures 7-10 show that there exists an underlying periodic component in the residuals obtained by fitting the smoothed and filtered data to the observations. Looking at the monthly series corresponding to the fitted data, the first peak is at a frequency ω = 0.25, corresponding to a period of 50 months, and the second peak is at a frequency of ω = 0.23, with a period of approximately 12 months. Therefore the structural model needs to be extended to contain seasonal and cyclic components.
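The m-period-ahead forecast recursion described above can be sketched for the univariate case, where the transition matrix F reduces to a scalar AR parameter φ; the parameter values below are illustrative assumptions:

```python
import numpy as np

def forecast(x_filt, p_filt, phi, q, r, m):
    """m-step-ahead forecasts for the univariate model
        x_t = phi x_{t-1} + w_t,  y_t = x_t + v_t.
    Returns the forecast means and the mean squared errors of the
    corresponding observation forecasts."""
    x, p = x_filt, p_filt
    xs, mses = [], []
    for _ in range(m):
        x = phi * x                 # state forecast: phi^k times the filtered state
        p = phi * p * phi + q       # state MSE accumulates state noise at each step
        xs.append(x)
        mses.append(p + r)          # observation MSE adds the measurement noise
    return np.array(xs), np.array(mses)

# Forecast 12 months ahead from an assumed filtered state
x_fore, mse = forecast(x_filt=2.0, p_filt=0.5, phi=0.8, q=1.0, r=1.0, m=12)
```

As expected for a stationary AR state, the point forecasts decay geometrically towards the mean while the forecast MSE grows towards its long-run limit.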

State-space model
From Table 3, the maximum likelihood estimates of the AR coefficients are 1.271 and −0.274, respectively, and the roots of the characteristic equation φ(z) = 1 − 1.271z + 0.274z² = 0 are approximately 1.004 and 3.635. Since the moduli of these roots are greater than 1, the AR component is covariance stationary.
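The stationarity check can be reproduced directly by computing the roots of the characteristic polynomial from the reported coefficients:

```python
import numpy as np

# Characteristic polynomial of the fitted AR(2) component:
#   phi(z) = 1 - 1.271 z + 0.274 z^2
# Covariance stationarity requires every root to lie outside the unit circle.
coeffs = [0.274, -1.271, 1.0]            # highest power first for np.roots
roots = np.roots(coeffs)
stationary = bool(np.all(np.abs(roots) > 1.0))
```

The smaller root lies only just outside the unit circle, so the AR component is close to the stationarity boundary.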
The variance of the permanent component is 0.000000 and the variance of the transitory component is 1.107763; hence the variance of the transitory component is far higher than that of the permanent component, and the ratio of the permanent-component variance to the stationary-component variance is 0.000. This shows that the stationary (transitory) component, rather than the permanent component, accounts for essentially all of the variation in temperature.
For the rainfall series, the moduli of the corresponding characteristic roots are likewise greater than 1, so the AR component is covariance stationary. The variance of the permanent component is 0.0000000 and the variance of the transitory component is 2.648753; hence the variance of the transitory component is far higher than that of the permanent component, and the ratio of the permanent-component variance to the stationary-component variance is 0.000. This shows that the stationary (transitory) component, rather than the permanent component, accounts for essentially all of the variation in rainfall.

Forecasting
The in-sample performance of the state-space model for forecasting the rainfall and temperature series appears more favourable than that of the model identified by Shittu and Yemitan [11], as is evident in Tables 5 and 6.
Given the relative accuracy of the model by Shittu and Yemitan [11], the improvement achieved by the Kalman filter method is mainly a result of its built-in specification for updating the estimates with the latest available information. Hence, the rainfall and temperature series exhibit elements of time-varying or regime-switching behaviour over the study period, which is a signal that the climatic data may be best estimated using non-linear time series methodologies.

Conclusion
This study modelled and estimated climatic data using the state-space model. Specifically, the study set out to identify the pattern of the trend movement, to model the dataset using the state-space model and to evaluate the forecasting power of state-space models.
The data used for the study comprise temperature and rainfall for the period January 1991 to December 2017, and were tested for normality. The study showed that the average temperature is 27.3 °C with a standard deviation of 1.87 °C; the maximum temperature is 31.5 °C and the minimum is 23.3 °C. The average rainfall is 94.5 mm with a standard deviation of 86.3 mm; the maximum rainfall is 314.3 mm and the minimum is 0.2 mm. The Shapiro-Wilk, Anderson-Darling and Kolmogorov-Smirnov tests of normality all showed that the variables are not normally distributed. The plots of the monthly temperature and rainfall series both show an underlying trend as well as possible seasonal and cyclic patterns.
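The normality screening can be illustrated with scipy; the synthetic right-skewed rainfall series below is an assumption standing in for the actual data:

```python
import numpy as np
from scipy import stats

# Illustrative normality screening with the three tests used in the study.
# The data here are synthetic: a right-skewed gamma sample standing in for
# 324 monthly rainfall observations (27 years x 12 months).
rng = np.random.default_rng(7)
rainfall = rng.gamma(shape=1.2, scale=80.0, size=324)

sw_stat, sw_p = stats.shapiro(rainfall)                     # Shapiro-Wilk
ad_result = stats.anderson(rainfall, dist="norm")           # Anderson-Darling
ks_stat, ks_p = stats.kstest(stats.zscore(rainfall), "norm")  # Kolmogorov-Smirnov

non_normal = sw_p < 0.05
```

For skewed data such as rainfall, all three tests reject normality, consistent with the study's finding.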
Diagnostic investigations were carried out by checking for correlations and for periodicity in the residuals. The results show significant autocorrelation in the residuals, indicating the presence of underlying noise terms that are not accounted for. Also, the parametric spectral estimate shows underlying periodic patterns in the monthly data, which leads to the need to treat climatic data as a structural time series model. We selected the appropriate models by considering their goodness of fit and comparing AIC values. Parameters were estimated and accompanied by measures of precision. An important aspect of fitting structural models is capturing the underlying changes in the unobserved trend component.
For state-space models with correlated errors, further work can be done by taking into consideration the correlation between the error terms in the state equation when considering structural models comprising trend, seasonal, cycle and autoregressive components. Such results are important because they show how well the Kalman filter for correlated errors extracts the unobserved components compared with the case in which the errors are not correlated.