Goodness of Fit Test of an Autocorrelated Time Series Cubic Smoothing Spline Model

We investigated the finite-sample properties and the goodness of fit of cubic smoothing spline selection methods, namely the Generalized Maximum Likelihood (GML), Generalized Cross-Validation (GCV) and Mallows' Cp criterion (MCP) estimators, for time-series observations when autocorrelation is present in the error term of the model. The Monte Carlo study considered 1,000 replications with six sample sizes (30, 60, 120, 240, 480 and 960), four degrees of autocorrelation (0.1, 0.3, 0.5 and 0.9) and three smoothing parameters (λGML = 0.07271685, λGCV = 0.005146929, λMCP = 0.7095105). The cubic smoothing spline selection methods were also applied to a real-life dataset. The predictive mean square error, R-square and adjusted R-square criteria for assessing finite-sample properties and goodness of fit among competing models showed that the performance of the estimators is affected by changes in the sample sizes and autocorrelation levels of the simulated and real-life datasets. The study concluded that the Generalized Cross-Validation estimator provides a better fit for autocorrelated time series observations. It is recommended because the GCV works well at all four autocorrelation levels and provides the best fit for time-series observations at all sample sizes considered. This study can be applied to nonparametric regression, nonparametric forecasting, spatial, survival and econometric observations. DOI:10.46481/jnsps.2021.265


Introduction
A cubic spline is the most widely recognized example of the smoothing spline regression model. It is a piecewise cubic function that interpolates a set of observation points and ensures smoothness of the fitted curve [1]. It consists of piecewise third-degree polynomials that pass through a set of points and has continuous first and second derivatives, i.e. continuity of order (d − 1), where d is the polynomial degree [2]. The model with truncated power basis function b(t) transforms the variables t_i and fits a model using these transformed variables, which adds non-linearity to the model and enables the spline to fit smooth and flexible non-linear cubic curves. It is assumed that the points (t_i, y_i) and (t_{i+1}, y_{i+1}) are connected by a cubic polynomial S_i(t) = a_i t³ + b_i t² + c_i t + d_i that is valid for t_i ≤ t ≤ t_{i+1}, for i = 1, 2, . . . , n − 1 [3]. The interpolation function is derived by first finding the coefficients a_i, b_i, c_i, d_i for each of the cubic pieces. For n points there are n − 1 cubic functions to find, and each cubic function requires four coefficients. Therefore we have a total of 4(n − 1) unknowns, which implies that 4(n − 1) independent equations are required. First, each cubic function must pass through the observations on its left and on its right: S_i(t_i) = y_i and S_i(t_{i+1}) = y_{i+1}, for i = 1, 2, · · · , n − 1 (1). Equation (1) produces 2(n − 1) conditions. Then, we need each cubic piece to join as smoothly as possible with its neighbours, so we constrain the splines to have continuous first and second derivatives at the interior observations i = 1, 2, . . . , n − 1. Finally, S_i(t) is completed by choosing additional boundary conditions to be satisfied.
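As a concrete check of the conditions above, the sketch below fits a natural cubic spline to a small set of hypothetical points using SciPy (an illustration under assumed data, not the paper's own code): the spline passes through every observation, and the natural boundary condition makes the second derivative vanish at the endpoints.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical observation points (t_i, y_i); any increasing t grid works.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.7, 0.8, 1.9, 3.2])

# A natural cubic spline: n - 1 cubic pieces with continuous first and
# second derivatives at interior knots, and S''(t) = 0 at both endpoints.
spline = CubicSpline(t, y, bc_type="natural")

# The spline interpolates every observation exactly ...
assert np.allclose(spline(t), y)

# ... and satisfies the "natural" endpoint constraint S''(t_1) = S''(t_n) = 0
# (the second argument of the call is the derivative order).
assert abs(spline(t[0], 2)) < 1e-9
assert abs(spline(t[-1], 2)) < 1e-9
```

SciPy solves the same 4(n − 1) linear equations described in the text internally, so the example verifies the counting argument rather than re-deriving it.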
A typical set of final constraints assumes that the second derivatives are zero at the endpoints; this implies that the curve behaves like a straight line at the endpoints, written as S''(t_1) = S''(t_n) = 0. There exist a few studies on the goodness-of-fit test for nonlinear regression models in the literature; these studies can be grouped into penalized smoothing, polynomial regression and smoothing spline test statistics, and nonparametric regression models. [4] proposed a test statistic for testing the goodness of fit of an m-th order polynomial regression model. The test statistic is ∫₀¹ [μ̂_λ^(m)(t)]² dt, where μ̂_λ^(m) is the m-th order derivative of an m-th order smoothing spline estimator of the regression function μ and λ is its associated smoothing parameter. The large-sample properties of the test statistic are obtained under both the null and alternative hypotheses. [5] describes a goodness-of-fit technique for testing the parametric form of the regression function and the variance in a parametric nonlinear regression model. [6] proposed likelihood and restricted likelihood ratio tests for goodness of fit of a nonlinear regression model, using a first-order Taylor approximation around the maximum likelihood estimator of the regression parameter; the null and alternative hypotheses are modelled nonparametrically using penalized splines. [7] applied computationally efficient bootstrap techniques to assess the performance of goodness-of-fit statistics and observed that, in general, the power and type-one error of the goodness-of-fit statistics depend on the model being scrutinized. [8] considered a smoothing-based test statistic and approximated its null distribution using a bootstrap methodology, proposing a goodness-of-fit test for examining parametric covariance functions against general nonparametric alternatives for both sparsely observed longitudinal data and densely observed functional data.
[9] offered a goodness-of-fit test for nonparametric regression models with a linear smoother structure by detecting statistical dependence between the estimated error terms and the covariates using the Hilbert-Schmidt Independence Criterion (HSIC). The bootstrap is used to obtain p-values and to demonstrate the appropriate type-one error and power of the test through Monte Carlo simulation. It is clear from the existing literature that the goodness of fit of smoothing splines for time series observations has not been investigated so far. This paper presents a goodness-of-fit test for time series observations using three classical cubic spline nonparametric regression functions. In section two, the cubic smoothing spline is discussed; smoothing parameter selection methods such as Generalized Cross-Validation, Generalized Maximum Likelihood and Mallows' Cp criterion, together with the performance evaluation criteria, are also addressed in that section. The simulation results are given in section three, while section four presents the real-life dataset results. Finally, a discussion of findings and the conclusion are presented in section five.

Cubic Smoothing Spline
The spline smoothing model is written as

y_i = f(t_i) + ε_i, i = 1, 2, . . . , n,

where y_i is the response/dependent variable, f is an unknown smoothing function, t_i is the independent/predictor variable and ε_i is a zero-mean autocorrelated stationary process [10]. The general cubic spline function is given as

f(t) = at³ + bt² + ct + d,

where a, b, c and d are real coefficients with a ≠ 0, t is the independent variable and ε is the error term; the degrees of freedom are k − d − 1 (k is the number of knots and d is the degree of the cubic spline). The cubic smoothing spline estimate f̂ of the function f is defined as the minimizer (over the class of twice-differentiable functions) of

Σ_{i=1}^{n} {y_i − f(t_i)}² + λ ∫ f''(t)² dt,

where:
1. λ > 0 is a smoothing parameter;
2. the first part of the expression is the residual sum of squares, measuring goodness of fit to the observations;
3. the second term is a roughness penalty, which is large when the integrated squared second derivative of the regression function f(t) is large;
4. if λ approaches 0, then f̂(t) simply interpolates the observations;
5. if λ is very large, then f̂(t) will be chosen so that f''(t) is everywhere 0, which yields an overall linear least-squares fit to the observations.
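The limiting behaviour in points 4 and 5 can be illustrated numerically. SciPy's UnivariateSpline parameterizes smoothness through a residual bound s rather than through λ itself, but the two extremes behave analogously (hypothetical data; a sketch, not the study's simulation code):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=t.size)

# s plays a role analogous to the smoothing parameter: s = 0 forces
# interpolation (the lambda -> 0 limit), while a large s trades fit
# for smoothness (approaching the heavily-penalized limit).
interp = UnivariateSpline(t, y, k=3, s=0.0)       # rough, zero residual
smooth = UnivariateSpline(t, y, k=3, s=float(t.size))  # heavily smoothed

# The s = 0 fit reproduces every observation; the smoothed fit does not.
assert np.allclose(interp(t), y)
assert smooth.get_residual() > interp.get_residual()
```

The residual sum of squares reported by `get_residual()` corresponds to the first term of the penalized criterion above.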
If the values of f are fixed at f(t_1), . . . , f(t_n), the roughness ∫_a^b f''(t)² dt is minimized by a natural cubic spline, and this solution can be written in terms of basis functions.

Selection of the Smoothing Parameter

The smoothing parameter in cubic spline smoothing controls the smoothness of the fitted curve. To estimate the optimal value of the smoothing parameter λ, three selection criteria are considered and compared in this study: Generalized Cross-Validation (GCV), Generalized Maximum Likelihood (GML) and Mallows' Cp (MCP). The Generalized Cross-Validation (GCV) selection method was suggested by [11,12] as a substitute for Cross-Validation (CV), which is the most popular technique for selecting the complexity of statistical models. The basic principle of cross-validation is to leave out the data points one at a time and to choose the value of λ under which the remaining data best predict the missing points [13,14]. To be precise, let ĝ_λ^(−i) be the smoothing spline computed from all the data except (t_i, y_i), using the value λ for the smoothing parameter. The cross-validation choice of λ is then the value of λ which minimizes the cross-validation score

CV(λ) = n⁻¹ Σ_{i=1}^{n} {y_i − ĝ_λ^(−i)(t_i)}². (8)

Equation (8) is similar to the criterion for model estimation in regression generally [15]. Define a matrix A(λ) by f̂_λ = A(λ)y, so that A(λ) maps the observations to their fitted values; the CV score can then be rewritten in terms of the ordinary residuals as

CV(λ) = n⁻¹ Σ_{i=1}^{n} {(y_i − f̂_λ(t_i)) / (1 − A_ii(λ))}². (10)

[12,16,17] also suggest the use of a related criterion, called Generalized Cross-Validation, obtained from (10) by replacing A_ii(λ) by its average value n⁻¹ tr A(λ); this gives the score below.
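The leave-one-out principle behind equation (8) can be sketched directly by refitting the spline n times, once per held-out point (hypothetical data and candidate values; UnivariateSpline's residual bound s stands in for the smoothing parameter λ):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 30)
y = np.cos(2 * np.pi * t) + rng.normal(scale=0.2, size=t.size)

def cv_score(s):
    """Ordinary leave-one-out cross-validation score for smoothing level s:
    refit without point i, predict the held-out point, average the errors."""
    errs = []
    for i in range(len(t)):
        mask = np.arange(len(t)) != i
        fit = UnivariateSpline(t[mask], y[mask], k=3, s=s)
        errs.append((y[i] - fit(t[i])) ** 2)
    return float(np.mean(errs))

# Pick the candidate smoothing level with the smallest CV score.
candidates = [0.1, 0.5, 1.0, 2.0, 5.0]
best = min(candidates, key=cv_score)
```

This brute-force refitting is exactly what the influence-matrix identity (10) avoids: the same score is obtained from a single fit using the diagonal of A(λ).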
where RSS(λ) is the residual sum of squares. In their study, [12] likewise give theoretical arguments to show that Generalized Cross-Validation should, asymptotically, pick an optimal value of λ in the sense of minimizing the average squared error at the design points. The published practical examples bear out its good performance [18]. The Generalized Cross-Validation technique is well known for its optimal properties [19]. Generalized Cross-Validation is an adjusted form of Cross-Validation, a traditional technique for estimating the smoothing parameter: if there exists an n × n influence matrix S(λ) with the property ŷ = S(λ)y, the CV score is obtained from the ordinary residuals by dividing them by 1 − (S(λ))_ii, and the GCV score replaces the factor 1 − (S(λ))_ii with the mean value 1 − n⁻¹ trace S(λ). Thus, dividing the residual sum of squares by {1 − n⁻¹ trace S(λ)}², the GCV smoothing method is written mathematically as

GCV(λ) = n⁻¹ RSS(λ) / {1 − n⁻¹ trace S(λ)}²,

where n is the number of data pairs (x_i, y_i), λ refers to the smoothing parameter and (S(λ))_ii is the i-th diagonal element of the smoother matrix.

Generalized Maximum Likelihood (GML) selection method: [20] proposed the GML technique for correlated data with one smoothing parameter. In a bivariate model, two smoothing parameters must be estimated simultaneously along with the covariance parameters. Following a similar derivation, GML is given as

GML(λ) = y'W(I − S(λ))y / [det⁺{(I − S(λ))}]^{1/(n−m)},

where det⁺ is the product of the n − m nonzero eigenvalues of (I − S(λ)), λ is the smoothing parameter, W is the correlation structure, S(λ) is the smoother matrix, n = n₁ + n₂ is the number of pairs of measurements/observations and m is the number of zero eigenvalues.

Mallows' Cp criterion (MCP) selection method was developed by [21] to estimate the fit of a regression model based on Ordinary Least Squares.
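Because the GCV score needs only the fitted values and the trace of the smoother matrix, it can be sketched with a discrete second-difference penalty standing in for the cubic spline's curvature penalty (a Whittaker-type linear smoother on hypothetical data; an illustration of the formula, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
t = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=n)

# Discrete analogue of the smoothing spline: fitted values are
# y_hat = S(lam) y with S(lam) = (I + lam * D'D)^(-1), where D is the
# second-difference matrix playing the role of the curvature penalty.
D = np.diff(np.eye(n), n=2, axis=0)
P = D.T @ D

def gcv(lam):
    """GCV(lam) = (RSS/n) / (1 - trace(S)/n)^2 for the linear smoother S."""
    S = np.linalg.inv(np.eye(n) + lam * P)   # smoother (influence) matrix
    resid = y - S @ y
    rss = float(resid @ resid)
    return (rss / n) / (1.0 - np.trace(S) / n) ** 2

# Minimize the GCV score over a grid of candidate smoothing parameters.
lams = 10.0 ** np.arange(-2, 5)
lam_gcv = min(lams, key=gcv)
```

The structure of the computation (one fit per λ, then a trace correction) is identical for the true cubic smoothing spline; only the penalty matrix differs.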
It is applied in model selection situations where several subsets of explanatory variables can predict an outcome, to locate the best model associated with a subset of independent variables. The smaller the value of Cp, the more accurate the model generally is; the Cp score is written mathematically as

MCP(λ) = n⁻¹ RSS(λ) + 2σ̂² n⁻¹ trace S(λ) − σ̂²,

where n is the number of observations, λ is the smoothing parameter, (S(λ))_ii is the i-th diagonal element of the smoother matrix and σ̂² is an estimate of the error variance. The assumption underlying the application of the Generalized Cross-Validation (GCV), Generalized Maximum Likelihood (GML) and Mallows' Cp criterion (MCP) is that the observations must be well represented by the model.
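Mallows' Cp for a linear smoother can be sketched with the same discrete stand-in penalty used for GCV above (hypothetical data; σ² is treated as known here for simplicity, whereas in practice it must be estimated):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
t = np.linspace(0, 1, n)
sigma = 0.2
y = np.sin(2 * np.pi * t) + rng.normal(scale=sigma, size=n)

# Second-difference penalty as a stand-in for the spline curvature penalty.
D = np.diff(np.eye(n), n=2, axis=0)
P = D.T @ D

def mallows_cp(lam, sigma2):
    """Cp(lam) = RSS/n + 2*sigma2*trace(S)/n - sigma2 for the smoother
    S(lam) = (I + lam * D'D)^(-1); trace(S) acts as effective df."""
    S = np.linalg.inv(np.eye(n) + lam * P)
    resid = y - S @ y
    return float(resid @ resid) / n + 2.0 * sigma2 * np.trace(S) / n - sigma2

# Choose the smoothing parameter with the smallest Cp on a grid.
lams = 10.0 ** np.arange(-2, 5)
lam_cp = min(lams, key=lambda lam: mallows_cp(lam, sigma ** 2))
```

Compared with GCV, the trade-off is explicit: the first term rewards fit, the trace term charges for effective degrees of freedom.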

Simulation Study
In this section, a simulation study is performed to assess the performance of the three cubic smoothing spline estimators, namely Generalized Cross-Validation (GCV), Generalized Maximum Likelihood (GML) and Mallows' Cp criterion (MCP), when autocorrelation is present in the error term. Before the results were computed, datasets for the different simulation combinations were generated using codes written in [22]. The data generation procedure, with an accompanying explanation, is presented in Table 1.
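A data-generating step of the kind described in Table 1 can be sketched as follows (the smooth mean function and scale are assumptions for illustration, since the paper's exact mean function is not reproduced here; the errors follow a stationary AR(1) process at the study's autocorrelation levels):

```python
import numpy as np

def simulate_series(n, rho, sigma=1.0, seed=0):
    """Simulate y_i = f(t_i) + e_i where e is a stationary AR(1) process
    with autocorrelation rho. The sine mean function is an assumed stand-in."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 1, n)
    f = np.sin(2 * np.pi * t)                    # assumed smooth mean function
    e = np.empty(n)
    # Draw the first error from the stationary distribution of the AR(1).
    e[0] = rng.normal(scale=sigma / np.sqrt(1 - rho ** 2))
    for i in range(1, n):
        e[i] = rho * e[i - 1] + rng.normal(scale=sigma)
    return t, f + e

# One replication at each of the study's sample sizes and rho levels;
# the full study repeats this 1,000 times per combination.
for n in (30, 60, 120, 240, 480, 960):
    for rho in (0.1, 0.3, 0.5, 0.9):
        t, y = simulate_series(n, rho)
```

Seeding the generator per replication keeps each of the 1,000 replications reproducible.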

Performance Evaluation Criteria
A comparative analysis was performed to test the performance and goodness of fit of the three cubic spline estimation methods (i.e. Generalized Cross-Validation (GCV), Generalized Maximum Likelihood (GML) and Mallows' Cp criterion (MCP)) in the presence of autocorrelated errors. The predictive mean squared error of a smoothing or curve-fitting procedure, according to [22,23,24], is the expected value of the squared difference between the fitted value given by the predictive function f̂(x_i) and the value of the observed function f(x_i). It is used to assess the performance and quality of explanatory variables or smoothing techniques such as Cross-Validation, Generalized Cross-Validation, Generalized Maximum Likelihood and so on. The Predictive Mean Square Error (PMSE) is written mathematically as

PMSE = n⁻¹ Σ_{i=1}^{n} {f(x_i) − f̂(x_i)}²,

where f(x_i) is the observed value and f̂(x_i) is the fitted/predicted value. The Predictive Mean Square Error is usually decomposed into two parts: the first part is the sum of squared biases of the fitted observations, while the second is the sum of variances of the fitted observations. Based on each estimate of the parameter, the methods were ranked according to their performance on the criterion. The evaluation of methods was carried out at two levels, using individual measures and the totality across criteria. For the first level, the ranks were added for each technique over all settings, and the estimation methods were then ranked by this total. The smoothing procedure with the smallest total was adjudged the most preferred method and the one with the largest total the least preferred. These ranks were added together over all the criteria to determine how each estimator performs for each parameter in the model. The best estimator in terms of the model was identified by further adding all the ranks over the model's parameters. An estimator is ranked as the best if it has the minimum sum of ranks.
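The two evaluation criteria used in this section can be sketched as small helper functions (illustrative implementations of the standard formulas, not the study's own code):

```python
import numpy as np

def pmse(f_true, f_hat):
    """Predictive mean square error: average squared difference between
    the observed function values f(x_i) and the fitted values f_hat(x_i)."""
    f_true = np.asarray(f_true, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    return float(np.mean((f_true - f_hat) ** 2))

def adjusted_r2(r2, n, p):
    """Adjusted R-square for n observations and p model parameters."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# A fit that is off by a constant 0.1 everywhere has PMSE 0.1^2 = 0.01,
# and a raw R-square of 0.99 with n = 30, p = 4 adjusts down to about 0.9884.
truth = np.array([1.0, 2.0, 3.0])
print(pmse(truth, truth + 0.1))   # ≈ 0.01
print(adjusted_r2(0.99, 30, 4))   # ≈ 0.9884
```

The adjusted R-square penalizes parameter count, which is what makes it suitable for comparing smoothers of differing effective complexity.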
Here the groups' totals were used, which give identical results in terms of ranks to using the mean levels. The consequences might differ if the median of the groups had been used; the disadvantage of the median is that any further work on these ranks would be at least slightly more complex mathematically than with the mean. The goodness of fit of the smoothing methods explains how well the methods fit the simulated and real-life data; it also summarizes the differences between the observed values and the estimated/predicted values. The adjusted R-square was used to determine the best-fitting smoothing methods. It is written mathematically as

Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1),

where n is the number of observations and p is the number of parameters. Table 2 presents the summary fit results of the cubic spline regression model and the model performance criteria, namely the predictive mean square error (PMSE) and the multiple and adjusted R-square, based on six sample sizes (T = 30, 60, 120, 240, 480 and 960) and autocorrelation level ρ = 0.1. The results revealed that all the coefficients of the smoothing methods' parameters were significant (p < 0.001, p < 0.01 and p < 0.05). The adjusted R-square results indicated that GCV had the highest values at all sample sizes (T = 30, 60, 120, 240, 480 and 960) at ρ = 0.1, with adjusted R-squares of 0.9963, 0.9979, 0.9971, 0.9992, 0.9986 and 0.9804 respectively. It can be inferred from these results that the GCV smoothing method provides the best fit to the time-series observations at sample sizes T = 30, 60, 120, 240, 480 and 960 and ρ = 0.1. Tables 3, 4 and 5 show the predictive mean square error (PMSE), R-square and adjusted R-square simulation results of GCV, GML and MCP for autocorrelation levels 0.3, 0.5 and 0.9 and sample sizes 30, 60, 120, 240, 480 and 960.
The results indicated that the adjusted R-square values for GCV were greater than those for GML and MCP; this is an indication that the cubic smoothing spline chosen by Generalized Cross-Validation possesses the best model fit. Figures 1 to 6 clearly show comparisons of the behaviour of the cubic smoothing splines selected by GCV, GML and MCP for sample sizes 30, 60, 120, 240, 480 and 960, respectively.

Simulation Result
The data generation procedure of Table 1:
Step 1: Obtain n, n_repl and ρ — the sample size of the simulated dataset, the number of replications and the autocorrelation level, respectively.
Step 2: Decide on x_i and y_i — read the simulated sample data (x_i, y_i) for i = 1, . . . , T and each point's error, then sum all the PMSEs to obtain the corresponding GCV, GML and MCP scores for the given values of n, λ and ρ.

Application to Real-life data
In this section, the performance of the cubic smoothing spline selection methods on a real-life dataset, the federal government capital expenditure (in billion naira) in Nigeria between 1981 and 2019 sourced from [25], is presented as our example. This series has 39 observations of the expenditure; the cubic spline was fitted for the mean function (i.e. f ∈ W) with a first-order autoregressive process AR(1) for the disturbance. The Generalized Cross-Validation (GCV) method was used in the example because it was found to perform better than the other competing cubic smoothing spline models in the simulation study presented in section three. The cubic smoothing spline curve presented in Figure 7 shows that the observed data are very close to the estimated curve. This provides great insight into the cubic smoothing spline selection method whose model produces the best fit for the time-series observations, and validates the generalized cross-validation (GCV) cubic spline selection method as the preferred model for time series observations.

Discussion and Conclusion
This paper presented the goodness-of-fit test for time series observations using three cubic spline nonparametric regression functions. A simulation study and a real-life dataset on total federal government capital expenditure (in billion naira) between 1981 and 2019 in Nigeria were used to demonstrate how the three classical cubic smoothing spline selection methods perform when a time series dataset possesses autocorrelation in its error term. In the general structure of the simulated results, it was observed that increases in the sample size and changes in the level of autocorrelation disturbance affect the performance of the three cubic smoothing spline methods (see Tables 2-5 and Figures 1-6). The adjusted R-square results indicated that the GCV had the highest value of 0.9992 at n = 240 and ρ = 0.1, closely followed by the GML and MCP. It was discovered that the generalized cross-validation (GCV) smoothing method provides the best model fit and proved to be more efficient than the other smoothing methods for the simulated time-series observations with autocorrelation levels ρ = 0.1, 0.3, 0.5 and 0.9 in the error term, at all sample sizes considered. Our findings also revealed that the GCV smoothing spline estimator out-performed the other competing selection methods for time series observations disturbed with the four autocorrelation levels (ρ = 0.1, 0.3, 0.5 and 0.9). The GCV is a smoothing spline method fitted without defect or shortcoming under the cubic spline functional form, with the highest adjusted R-squares of 0.9992, 0.9297 and 0.8974 at ρ = 0.1, 0.3 and 0.5, for n = 240 and 480 respectively. This finding is corroborated by [13,26,27,28], who found that GCV was fairly better when compared to GML for n = 64, that GCV was distinctly superior for n = 128, and that for n = 32 GCV was better for smaller σ² while the comparison was close for larger σ².
Other findings recommended generalized cross-validation as the best method for penalized spline smoothing parameter estimation, and that GCV-spline determines appropriate amounts of smoothing for fMRI time series.