Modeling and Forecasting the Third Wave of Covid-19 Incidence Rate in Nigeria Using Vector Autoregressive Model Approach

Modeling the onset of a pandemic is important for drawing inferences and putting control measures in place. In this study, we used the vector autoregressive (VAR) model to model and forecast the number of confirmed Covid-19 cases and deaths in Nigeria, taking into account the relationship between the two variables. Before the VAR model was fitted, a co-integration test was performed. An autocorrelation test and a heteroscedasticity test were also performed; they showed no autocorrelation at lags 3 and 4 and no heteroscedasticity. According to the findings of the study, the number of Covid-19 cases and deaths is on the rise. A VAR model with lag 4 was used to forecast the number of cases and deaths. The forecast shows a steady increase in the number of deaths over time, but a minor drop in the number of confirmed Covid-19 cases. DOI:10.46481/jnsps.2021.431


Introduction
Coronavirus disease (Covid-19) has been declared a global pandemic. The disease, a form of severe acute respiratory syndrome, was first identified in late December 2019 in the city of Wuhan, China, and was declared a pandemic by [1] on the 11th of March 2020 after infecting over 118,000 people globally. The first case in Nigeria was detected on February 27, 2020, and was confirmed at the Virology Laboratory of the Lagos State University Teaching Hospital. The virus has been a major concern for physicians, community health experts, and researchers in all fields. Many international public and community health initiatives are under way, and research institutes around the globe are rapidly studying the biology and pathogenesis of the virus.
The virus, which has spread at an exponential rate all over the world, has strained the healthcare systems of many countries. The Covid-19 pandemic is one of the worst pandemics mankind has faced since the Spanish flu pandemic of 1918, which caused the deaths of about 50 million people at a time when the world's population was around 2 billion. The economic and social disruption triggered by the pandemic is overwhelming. Disease prevention and control efforts depend on guidance from disease prediction, and effective short-term forecasting models play a pivotal role in strategic planning for the public health system. Prediction models indicate the severity and trend of the pandemic under different control strategies [2]. According to [3], the medium-term harm from the measures taken to prevent the spread of Covid-19 will come through the effect of the virus on the socioeconomic determinants of health and through its consequences for the next generation.
In Nigeria, the first confirmed case of Covid-19 was documented on the 27th of February 2020 in Lagos State, according to [2]. After this index case, the number of confirmed cases rose steadily, and the first death was reported on the 22nd of March 2020. Because of the rapid escalation in the number of cases, the Federal Government enforced a total lockdown in Lagos State, Ogun State, and Abuja. Some states that were not covered by the federal lockdown had lockdowns enforced by their state governments to curtail the spread of the virus. Since the outbreak of Covid-19, many studies have been carried out across scientific disciplines to reduce the spread or control the increasing trend of the disease. To manage and understand the epidemic, various approaches to estimation, modeling, and forecasting have been introduced.
Based on five deep learning approaches, [4] conducted a comparative analysis to forecast the number of new and recovered Covid-19 cases. Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), simple Recurrent Neural Network (RNN), Variational Auto-Encoder (VAE), and Bidirectional LSTM (BiLSTM) algorithms were used for the global prediction of Covid-19 cases from a small amount of data. Their research is based on daily confirmed and recovered cases from six countries: China, Spain, Italy, Australia, the United States, and France. When the performance of each model was tested, VAE was found to forecast more accurately than the other models.
To forecast Covid-19 cases and deaths, [5] proposed a statistical time series approach to model and forecast the short-term behavior of Covid-19. They assumed a multiplicative trend, aiming to capture the persistence of the two predicted variables (number of cases and mortality rate) as well as their uncertainty. The proposed time series model showed good accuracy in both its forecasts and its uncertainty estimates as additional data were collected. In a study by [6], the effect of total lockdown on the Covid-19 prevalence rate and death rate in China was investigated, and it was concluded that lockdown is effective in lowering the incidence rate and mortality rate.
The spread of Covid-19 and the deaths resulting from coronavirus infection were forecast by [7], who used a time series model to forecast the number of confirmed and recovered coronavirus cases. The error distributions were taken from the family of two-piece scale mixtures of normal (TP-SMN) distributions, with the best-fitting member selected. The chosen model was used to forecast the global number of infections and fatalities caused by Covid-19. The study [8] examined epidemic data with a focus on Covid-19 and found that lockdown measures in Italy, Spain, and China, as well as the closure of firms in Hubei that provided non-essential services, were beneficial.
In a study of Covid-19 in Spain and Italy, [9] applied two simple mathematical epidemiological models and observed that log-linear regression yielded a better result and a basic estimate of the daily incidence for both countries. [10] studied the gender-based Covid-19 prevalence rate and death rate in Nigeria, adopting a Wilcoxon signed-rank test to examine the disparity in the sex distribution of the daily prevalence. In the work of [11], the autoregressive integrated moving average (ARIMA) model was adopted to forecast the Covid-19 incidence rate in India, where an increasing trend in the number of coronavirus cases was observed.
The vector autoregressive model and the co-integrated vector autoregressive model are time series models for multivariate time series data. Much research has adopted these models, which take into account the linear dependence among the variables. For example, [12] adopted the co-integrated vector autoregressive model to model wind speed along with selected meteorological variables.
In [13], a time series analysis based on the VAR model was used to investigate the impact of environmental pollution on mortality in Nigeria. The data set passed the stationarity test, indicating that a VAR model was appropriate. The study found that environmental pollution has a considerable impact on mortality in Nigeria.
A study conducted in the United States by [14] used a VAR model to predict the Covid-19 prevalence rate in the U.S. The study concluded that the pandemic would worsen without active control.
In [15], vector autoregressive integrated moving average (VARIMA) analysis was used to establish the relationship between the number of deaths due to Covid-19 and the number of new cases of Covid-19 in the country. The AICC was used to select the best model after it fulfilled all the assumptions.
Modeling the outbreak of a pandemic is pertinent to inference and to the implementation of policies. In this paper, we adopted the vector autoregressive model to model and forecast the number of Covid-19 cases and deaths in Nigeria.
In the next section (Section 2), we describe the data used in the analyses, the VAR model, and the analysis plan for the research. The Results section (Section 3) provides the prediction results from VAR modeling and an internal validation and evaluation of the model. Section 4 discusses the model performance, further improvement, and comparison with other models.

Data source
The data used for this study are daily data on the number of Covid-19 cases and deaths, obtained from https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.xlsx. The methodology used in this paper is described below.
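As an illustration of how the two series could be extracted, the following minimal sketch filters one country from text in the Our World in Data layout. It assumes the workbook has been exported to CSV and that the columns `location`, `date`, `total_cases`, and `total_deaths` are present; which columns the study actually used is not stated, so the default column names are an assumption. The function name `extract_country_series` and the sample rows are hypothetical.

```python
import csv
import io


def extract_country_series(csv_text, country="Nigeria",
                           columns=("total_cases", "total_deaths")):
    """Return (dates, rows) for one country from OWID-style CSV text.

    Empty cells are treated as 0.0, which is an assumption of this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    dates, rows = [], []
    for record in reader:
        if record["location"] != country:
            continue
        dates.append(record["date"])
        rows.append([float(record[c]) if record[c] else 0.0 for c in columns])
    return dates, rows


# Made-up rows in the same layout; the values are illustrative only.
SAMPLE = """location,date,total_cases,total_deaths
Nigeria,2021-01-01,10,
Nigeria,2021-01-02,12,1
Ghana,2021-01-02,7,0
Nigeria,2021-01-03,15,1
"""

dates, rows = extract_country_series(SAMPLE)
```

Each element of `rows` is one observation of the bivariate series (cases, deaths) used in the analysis that follows.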

Vector Autoregressive Model
Consider a k-dimensional vector autoregressive model of order 2,
$$y_t = \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + a_t, \tag{1}$$
where $a_t \sim N(0, \Sigma)$, and $\Phi_1$ and $\Phi_2$ are $k \times k$ matrices of unknown coefficients. Subtracting $y_{t-1}$ from both sides of equation (1), and adding and subtracting $\Phi_2 y_{t-1}$ on the right-hand side, we have
$$\Delta y_t = \Phi y_{t-1} - \Phi_2 \Delta y_{t-1} + a_t, \qquad \Phi = \Phi_1 + \Phi_2 - I_k, \tag{2}$$
where $\Delta y_t = y_t - y_{t-1}$. Equation (2) is referred to as the vector error correction model, otherwise called the co-integrated vector autoregressive model. Suppose we have a multivariate variable consisting of variables P and Q; according to [16], equation (2) can be written in matrix form as
$$\begin{pmatrix} \Delta P_t \\ \Delta Q_t \end{pmatrix} = \Phi \begin{pmatrix} P_{t-1} \\ Q_{t-1} \end{pmatrix} - \Phi_2 \begin{pmatrix} \Delta P_{t-1} \\ \Delta Q_{t-1} \end{pmatrix} + \begin{pmatrix} a_{1t} \\ a_{2t} \end{pmatrix}.$$
In the formulation of the vector error correction model in equation (2), there are three cases of interest:
1. Rank(Φ) = 0 implies that y_t is not co-integrated, and the vector error correction model in equation (2) reduces to a vector autoregressive model in the differenced series Δy_t.
2. Rank(Φ) = k, where k is the total number of variables, implies that y_t contains no unit root. That is, y_t is stationary and I(0).
3. 0 < Rank(Φ) = m < k implies that there is at least one stationary linear combination of the variables, which is the co-integrating relation. In this case, Φ = αβ′, where α is the vector of adjustment coefficients and β is the co-integrating vector.
For equation (2) to hold, Φ should be of a reduced rank [17].
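The algebra linking the VAR(2) form to the error correction form can be checked numerically. The sketch below assumes the error correction form Δy_t = Φ y_{t−1} − Φ_2 Δy_{t−1} + a_t with Φ = Φ_1 + Φ_2 − I, which follows from subtracting y_{t−1} from both sides of equation (1). The 2×2 coefficient matrices, the starting values, and the helper names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def vec_add(*vectors):
    """Element-wise sum of several vectors."""
    return [sum(values) for values in zip(*vectors)]


def vec_sub(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


# Hypothetical coefficient matrices for a bivariate VAR(2).
PHI1 = [[0.5, 0.1], [0.2, 0.3]]
PHI2 = [[0.2, 0.0], [0.1, 0.1]]
# Phi = Phi1 + Phi2 - I, the long-run matrix of the error correction form.
PHI = [[PHI1[i][j] + PHI2[i][j] - (1.0 if i == j else 0.0) for j in range(2)]
       for i in range(2)]


def var2_step(y_lag1, y_lag2, shock):
    """y_t from the VAR(2) form: Phi1 y_{t-1} + Phi2 y_{t-2} + a_t."""
    return vec_add(mat_vec(PHI1, y_lag1), mat_vec(PHI2, y_lag2), shock)


def vecm_step(y_lag1, y_lag2, shock):
    """Delta y_t from the error correction form."""
    delta_lag1 = vec_sub(y_lag1, y_lag2)
    return vec_add(mat_vec(PHI, y_lag1),
                   [-v for v in mat_vec(PHI2, delta_lag1)],
                   shock)


y_lag1, y_lag2, shock = [3.0, 1.0], [2.0, 4.0], [0.1, -0.2]
y_now = var2_step(y_lag1, y_lag2, shock)
delta_direct = vec_sub(y_now, y_lag1)            # y_t - y_{t-1} computed directly
delta_vecm = vecm_step(y_lag1, y_lag2, shock)    # same quantity via the VECM form
```

The two routes give the same Δy_t, so the error correction form is a reparameterization of the VAR(2) model and not a different model.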

Augmented Dickey Fuller test (ADF)
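The augmented Dickey-Fuller (ADF) test was used to test the stationarity of the data. The hypotheses are $H_0: \phi_1 = 1$ vs $H_1: \phi_1 < 1$. The ADF test statistic is given as
$$\text{ADF} = \frac{\hat{\phi}_1 - 1}{\operatorname{SE}(\hat{\phi}_1)},$$
where $\hat{\phi}_1$ is the least-squares estimate of $\phi_1$ and $\operatorname{SE}(\hat{\phi}_1)$ is its standard error.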
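A minimal sketch of the idea behind the test is given below. It computes the basic Dickey-Fuller t-statistic from the regression Δy_t = c + ρ y_{t−1} + e_t, where ρ = φ_1 − 1, without the lagged-difference terms that the augmented version adds; that simplification, the simulated series, and the function name are assumptions of this illustration and not the study's implementation.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-statistic for rho in: dy_t = c + rho * y_{t-1} + e_t (no augmentation lags)."""
    x = y[:-1]
    dy = [curr - prev for prev, curr in zip(y[:-1], y[1:])]
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    rho = sxy / sxx
    intercept = mean_dy - rho * mean_x
    rss = sum((di - intercept - rho * xi) ** 2 for xi, di in zip(x, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return rho / std_err


# Simulated series: a random walk (unit root) and a stationary AR(1) with phi = 0.5.
random.seed(0)
shocks = [random.gauss(0.0, 1.0) for _ in range(500)]
random_walk, stationary = [0.0], [0.0]
for e in shocks:
    random_walk.append(random_walk[-1] + e)
    stationary.append(0.5 * stationary[-1] + e)

stat_walk = dickey_fuller_stat(random_walk)
stat_stationary = dickey_fuller_stat(stationary)
```

The statistic is compared with Dickey-Fuller critical values (about −2.86 at the 5% level when a constant is included), not with the usual t-table. A strongly negative value, as for the stationary series, leads to rejection of the unit-root null.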

Akaike Information Criterion
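The AIC for selecting the order of the underlying vector autoregressive model, VAR(p), is given as
$$\text{AIC} = n \ln\left(\frac{1}{n}\sum_{t=1}^{n} \hat{a}_t^{2}\right) + 2p,$$
where n is the size of the sample, p is the number of parameters, and $a_t$ is the error term.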

Final Prediction Error (FPE)
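The FPE for selecting the order of the underlying vector autoregressive model, VAR(p), is given as
$$\text{FPE} = \det\left(\frac{1}{N}\sum_{t=1}^{N} e(t, \hat{\theta}_N)\, e(t, \hat{\theta}_N)^{T}\right)\left(\frac{1 + d/N}{1 - d/N}\right),$$
where N is the number of values in the estimation data set, e(t) is the n-by-1 vector of prediction errors, $\hat{\theta}_N$ represents the estimated parameters, and d is the number of estimated parameters.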

Hannan-Quinn Information Criterion (HQIC)
The HQIC for selecting the order of the underlying vector autoregressive model, VAR(p), is given as
$$\text{HQIC} = -2 L_{max} + 2k \ln(\ln n),$$
where $L_{max}$ is the log-likelihood, k is the number of parameters, and n is the number of observations.
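To make the three selection criteria concrete, the sketch below evaluates them for a single residual series. It assumes scalar (single-equation) forms: AIC = n ln(σ̂²) + 2p, FPE = σ̂²(1 + d/N)/(1 − d/N), and HQIC = −2 L_max + 2k ln(ln n) with a Gaussian log-likelihood. In the multivariate case the residual variance would be replaced by the determinant of the residual covariance matrix. The function names and the residual values are hypothetical.

```python
import math


def residual_variance(residuals):
    """Maximum-likelihood variance estimate (divides by n, not n - 1)."""
    return sum(e * e for e in residuals) / len(residuals)


def aic(residuals, n_params):
    """AIC in least-squares form: n * ln(sigma^2) + 2 * p."""
    n = len(residuals)
    return n * math.log(residual_variance(residuals)) + 2 * n_params


def fpe(residuals, n_params):
    """Final prediction error: sigma^2 * (1 + d/N) / (1 - d/N)."""
    n = len(residuals)
    return residual_variance(residuals) * (1 + n_params / n) / (1 - n_params / n)


def hqic(residuals, n_params):
    """Hannan-Quinn criterion using the maximized Gaussian log-likelihood."""
    n = len(residuals)
    sigma2 = residual_variance(residuals)
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return -2 * log_lik + 2 * n_params * math.log(math.log(n))


# Hypothetical residuals from one fitted model with one estimated parameter.
example_residuals = [1.0, -1.0, 1.0, -1.0]
example_aic = aic(example_residuals, 1)
example_fpe = fpe(example_residuals, 1)
example_hqic = hqic(example_residuals, 1)
```

For lag selection, each criterion is computed for every candidate lag order and the order with the smallest value is chosen.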

Lagrange Multiplier Statistic
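The Autoregressive Conditional Heteroscedastic (ARCH) Lagrange Multiplier (LM) test is used to test the hypothesis of homoscedasticity. It involves regressing the squared residuals of the conditional mean equation, which may be an autoregressive or moving average model, on a constant and their own lagged values. For example, for an ARMA(1,1) conditional mean equation, the squared residuals are regressed on their lags and the test statistic is
$$LM = T R^{2},$$
where T is the sample size and $R^2$ is the R-squared of this auxiliary regression.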
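The statistic T·R² can be illustrated with a single lag, in which case the R-squared of the auxiliary regression equals the squared correlation between the squared residuals and their first lag. The sketch below makes that one-lag simplification; the simulated ARCH(1) series, its parameters, and the function name are assumptions of the illustration.

```python
import math
import random


def arch_lm_stat(residuals):
    """LM = T * R^2 from regressing e_t^2 on a constant and e_{t-1}^2 (one lag)."""
    squared = [e * e for e in residuals]
    x, y = squared[:-1], squared[1:]
    t = len(y)
    mean_x, mean_y = sum(x) / t, sum(y) / t
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # squared residuals are constant: no ARCH signal at all
    r_squared = sxy * sxy / (sxx * syy)
    return t * r_squared


# Simulated ARCH(1) residuals: variance_t = 0.5 + 0.5 * e_{t-1}^2 (hypothetical values).
random.seed(1)
arch_residuals, previous = [], 0.0
for _ in range(2000):
    variance = 0.5 + 0.5 * previous * previous
    previous = math.sqrt(variance) * random.gauss(0.0, 1.0)
    arch_residuals.append(previous)

lm_arch = arch_lm_stat(arch_residuals)
lm_constant = arch_lm_stat([1.0, -1.0] * 10)
```

Under the null hypothesis of no ARCH effect, the one-lag statistic is approximately chi-square with 1 degree of freedom, so values above 3.84 lead to rejection at the 5% level.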

Residual Autocorrelation
The Box-Pierce statistic proposed by [18] is used to test for autocorrelation in the residuals. $H_0$: No autocorrelation up to order k vs $H_1$: There is autocorrelation up to order k. The test statistic is given as
$$Q = n \sum_{j=1}^{k} r_j^{2},$$
where n is the sample size and $r_j$ is the autocorrelation at lag j.
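The Box-Pierce statistic, n times the sum of the squared sample autocorrelations up to lag k, can be computed directly, as in the minimal sketch below. The function names and the alternating example series are hypothetical and chosen so that the result can be checked by hand.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def box_pierce(x, max_lag):
    """Box-Pierce Q = n * sum of squared autocorrelations for lags 1..max_lag."""
    n = len(x)
    return n * sum(autocorrelation(x, j) ** 2 for j in range(1, max_lag + 1))


# An alternating series is strongly (negatively) autocorrelated at lag 1:
# r_1 = -99/100, so Q(1) = 100 * 0.99**2 = 98.01.
alternating = [1, -1] * 50
q_lag1 = box_pierce(alternating, 1)
```

Under the null hypothesis, Q is compared with a chi-square distribution whose degrees of freedom depend on k; a large Q, as in this example, indicates residual autocorrelation.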

Normality Test
The Jarque-Bera test was used to decide whether the disturbances of the error correction model are Gaussian distributed. The test measures the discrepancy between the skewness and kurtosis of a variable and those of the Gaussian distribution.
$H_0$: The variable is normally distributed vs $H_1$: The variable is not normally distributed. The test statistic is given as
$$JB = \frac{M - p}{6}\left(S^{2} + \frac{(L - 3)^{2}}{4}\right),$$
where M = number of observations, p = number of estimated parameters, S = skewness, L = kurtosis.
The null hypothesis is rejected if the p-value ≤ the level of significance.
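As a worked illustration, the sketch below computes the Jarque-Bera statistic ((M − p)/6)(S² + (L − 3)²/4) from the sample skewness S and kurtosis L. The function name and the two-point example data are hypothetical; the example is chosen so that S = 0 and L = 1, which makes the value easy to verify by hand.

```python
def jarque_bera(x, n_params=0):
    """Jarque-Bera statistic: (M - p)/6 * (S^2 + (L - 3)^2 / 4)."""
    m = len(x)
    mean = sum(x) / m
    m2 = sum((v - mean) ** 2 for v in x) / m   # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / m   # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / m   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return (m - n_params) / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


# Symmetric two-point data: skewness 0, kurtosis 1, so JB = 60/6 * (0 + 4/4) = 10.
two_point = [-1.0, 1.0] * 30
jb_value = jarque_bera(two_point)
```

Under the null hypothesis of normality, the statistic is approximately chi-square with 2 degrees of freedom, so a value of 10 exceeds the 5% critical value of about 5.99 and normality would be rejected for this series.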

Time plot
The first step in the study of the data is to generate the time plot of the variables. Figure 1 is a time plot of the number of confirmed Covid-19 cases and the number of Covid-19-related deaths in Nigeria. The graph shows an increasing trend in the number of confirmed Covid-19 cases and deaths in the country, with the first wave occurring in early 2020, the second wave in late 2020, and the third wave, the largest, in early 2021. There seems to be no decrease in the pandemic.
Table 1 gives descriptive summary statistics of the number of confirmed Covid-19 cases and Covid-19 deaths. A high level of variation is observed in the data. The differences among the three measures of central tendency (mean, median, and mode) for the number of confirmed Covid-19 cases indicate a departure from normality; the same holds for the number of Covid-19 deaths. This departure from normality can be ignored given the reasonably large sample size.

Stationarity test
From Table 2, the null hypothesis of non-stationarity could not be rejected for the number of confirmed Covid-19 cases, as the p-value is greater than the 0.05 level of significance; hence the need for differencing.
$H_0$: The series is non-stationary vs $H_1$: The series is stationary. MacKinnon approximate p-value for Z(t) = 0.9970
From Table 3, the null hypothesis of non-stationarity was rejected after taking the third difference, as the p-value is less than the 0.05 level of significance. The series is therefore stationary after differencing.
$H_0$: The series is non-stationary vs $H_1$: The series is stationary. MacKinnon approximate p-value for Z(t) = 0.0000
From Table 4, the null hypothesis of non-stationarity was rejected for the number of Covid-19 deaths, as the p-value of 0.0002 is less than the 0.05 level of significance, which means the series is stationary.
$H_0$: The series is non-stationary vs $H_1$: The series is stationary. MacKinnon approximate p-value for Z(t) = 0.0002
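Differencing, the transformation applied to a series that fails the stationarity test, can be sketched as follows. The function name and the example values are hypothetical; the order of differencing is left as a parameter.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: d_t = x_t - x_{t-1}."""
    result = list(series)
    for _ in range(order):
        result = [curr - prev for prev, curr in zip(result[:-1], result[1:])]
    return result


# Illustrative cumulative counts (made-up values with a quadratic trend).
cumulative = [1, 4, 9, 16, 25]
first_diff = difference(cumulative, order=1)    # removes the level
second_diff = difference(cumulative, order=2)   # removes the linear trend as well
```

Each round of differencing shortens the series by one observation, and the stationarity test is then repeated on the differenced series.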

Lag selection and model estimation
A lag order of 4 was selected based on the information criteria, which attain their minimum at the fourth lag.
Applying equation (2) to the Covid-19 data, the models for the number of deaths and the number of cases were estimated and are presented in equations (12) and (13) respectively, with the model summaries presented in Tables 6 and 7.
$$\text{Number of deaths} = -0.0008647\,y_{t-1} + 0.0024392\,y_{t-2} + 0.0011357\,y_{t-3} + 0.0012504\,y_{t-4} \tag{12}$$
$$\text{Number of cases} = 2.924165\,y_{t-1} - 4.285465\,y_{t-2} + 1.410195\,y_{t-3} - 0.280125\,y_{t-4} \tag{13}$$
From Table 6, it is evident that for every increase in the number of deaths, there is a 0.0012504, 0.0024392 and 0.0011357 upsurge in the number of reported ...
Table 8 gives the normality test of the disturbances, which shows that they are not normally distributed. However, with sufficiently large sample sizes, violation of the normality assumption should not cause any problem [19]. Table 9 reports the heteroscedasticity test for the number of confirmed Covid-19 cases and deaths in Nigeria. The test result shows that the null hypothesis of no ARCH effect was not rejected. That is, there is no ARCH effect. Table 10 gives the Box-Pierce statistic for the autocorrelation test over the 4 lags, which shows no autocorrelation across the four lags.

Lagrange Multiplier Test for Autoregressive Conditional Heteroscedasticity (ARCH) and Autocorrelation
$H_0$: no ARCH effects vs $H_1$: ARCH(p) disturbance
$H_0$: no autocorrelation at lag order

Forecast Precision
Figure 2 shows a forecast of the number of Covid-19 cases and deaths for the next year (365 days), with 95% confidence intervals. The graph shows a sharp rise in Covid-19 mortality alongside a slight decrease in the number of confirmed cases. This is the expected future course of the current pandemic.
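Multi-step forecasts from a VAR(p) model are obtained by iterating the fitted equations forward, feeding each forecast back in as a lagged value. The sketch below shows this recursion for point forecasts only, without intercepts or confidence intervals. The coefficient matrix and starting values are hypothetical and are not the estimates reported in this paper.

```python
def var_forecast(coefs, history, steps):
    """Iterate a VAR(p) forward.

    coefs   : list [Phi_1, ..., Phi_p] of k x k matrices (lists of rows)
    history : list of k-vectors, most recent observation last (length >= p)
    steps   : number of periods to forecast
    """
    p = len(coefs)
    k = len(history[-1])
    path = [list(v) for v in history[-p:]]
    forecasts = []
    for _ in range(steps):
        nxt = [0.0] * k
        for lag, phi in enumerate(coefs, start=1):
            past = path[-lag]  # value observed or forecast `lag` periods earlier
            for i in range(k):
                nxt[i] += sum(phi[i][j] * past[j] for j in range(k))
        path.append(nxt)
        forecasts.append(nxt)
    return forecasts


# Hypothetical bivariate VAR(1): each variable decays by half every period.
example_forecast = var_forecast([[[0.5, 0.0], [0.0, 0.5]]], [[2.0, 4.0]], steps=2)
```

For a 365-day horizon the same recursion would be run with `steps=365`. Forecast uncertainty accumulates with the horizon, which is why interval forecasts widen over time.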

Conclusion
The surge of Covid-19 has crippled the health care system in Nigeria and in other parts of the world. The need to model and study the incidence rate of Covid-19 cannot be overemphasized, as it is pertinent to concrete decision-making. There has been a sharp rise in the number of Covid-19 cases and deaths, as depicted by the time plot. The time plot of the number of cases and deaths shows an "S" shape, which indicates the growth of the pandemic. A vector autoregressive model of lag 4 was adopted and used to forecast the number of cases and deaths. An autocorrelation test and a heteroscedasticity test were carried out; they showed no autocorrelation at lag 3 and lag 4 and no heteroscedasticity. A Jarque-Bera test of normality of the disturbances of the vector autoregressive model indicated a departure from normality. However, this result can be ignored for a reasonably large sample size of at least 30, according to [19]. The forecast also reveals an upward trend in the number of deaths with a slight decrease in the number of infections. This indicates that the spread may slow in the future while mortality from the pandemic remains high. However, deaths may also be attributable to other factors, such as age or underlying ailments that Covid-19 could have triggered. This can be an area of further research.