Jackknife Kibria-Lukman M-Estimator: Simulation and Application

The ordinary least square (OLS) method is very efficient in estimating the regression parameters in a linear regression model under classical assumptions. If the model contains outliers, the performance of the OLS estimator becomes imprecise. Multicollinearity is another issue that can reduce the performance of the OLS estimator. This study proposed the Robust Jackknife Kibria-Lukman (RJKL) estimator based on the M-estimator to deal with multicollinearity and outliers. We examine the superiority of the estimator over existing estimators using theoretical proofs and Monte Carlo simulations. We put the estimator to the test once more using real-world data. We observed that the estimator performs better than the existing estimators. DOI:10.46481/jnsps.2022.664


Introduction
The regression model is commonly used in many disciplines to analyze data. The model's analysis is a form of predictive modelling technique which statistically examines the relationship between two different sets of variables. The first variable is called the dependent, also known as the target variable. The second variable is called the independent, also known as the predictors because they are usually more than one. The model is frequently used for forecasting. Mathematically, the general regression model includes; an n × 1 vector of observations referred to as dependent variable labelled y, a known full column rank of n × p standardize and centered independent variables labelled X, a p × 1 vector of unknown parameters labelled β and an n × 1 vector of disturbances labelled ε. ε is assumed to be normally distributed with E (ε) = 0 and dispersion matrix Cov (ε) = σ 2 I. The model is mathematically written as The Ordinary Least Square (OLS) method is very efficient in estimating the regression parameters in a linear regression model under classical assumptions. The Gauss Markov theorem establishes this fact. The theorem stated the OLS estimator has the best linear unbiased estimator (BLUE) with minimum vari-251 ance in the class of all unbiased linear estimators. However, if the dataset for the regression analysis contains outliers, the performance of the OLS estimator becomes imprecise [1][2][3]. The OLS estimator of β is given by β = S −1 X y (2) where S = X X. When outliers are present, robust regression is used, which gives better results than the OLS method [4][5][6][7]. The M-estimation approach is the most common robust regression method which is used to handle outlier in the y-direction [8]. It is a generalization to maximum likelihood estimation in context of location models. The means that it is nearly as efficient as the OLS. The approach involves minimizing residual function rather than minimizing the number of squared errors as the objective. Generally, the approach considers the likelihood function of β and and by replacing the OLS criterion with a robust criterion, Mestimator of β is The purpose of this study is to propose an estimator that solves the problem of multicollinearity and outlier in linear regression model. We investigate the superiority of the estimator through theoretical comparison, simulations and practical application.

Existing Shrinkage Estimators
A popular shrinkage estimator is the ridge estimator (RE), developed by [9] and expressed as: where W(k) = (S + kI) −1 S and k > 0. However, RE can be sensitive to outliers in the y-direction. Silvapulle [10] combined the advantage of the ridge estimator and the M-estimator to form the Ridge M-estimator (RME), expressed as follows: Kibria and Lukman [11] recently proposed an estimator called the Kibria-Lukman (KL) estimator, the estimator is expressed as: where M (k) = (S + kI) −1 (S − kI) = I − 2k(S + kI) −1 .

A New Robust Estimator
Using the same approach as [10,17,3], we combined the JKL estimator in (9) and the M-estimator to form the Robust Jackknife Kibria-Lukman (RJKL) estimator. It is obvious that the presence of outliers in the y-direction will reduced the efficiency of the JKL estimator. Thus, we defined the RJKL estimator as follows where k > 0.
The canonical form of model (1) is written as where Z = XT, α = T β and T is the ortogonal matrix whose columns contains the eigenvectors of X X. Then where λ 1 , λ 2 , . . . , λ p > 0 are the ordered eigenvalues of X X. Let α M be an M-defined by the solution of the M-estimating equations ϕ(e i /s)z i = 0 where e i = y i − z i α M , s is an estimator of scale for the errors and ϕ(.) is some suitably chosen function [18]. Thus, the estimators presented in (2)(3)(4)(5)(6)(7)(8)(9) can be written in canonical form as follows: where The organization of this article is as follows. The theoretical comparison among estimators is given in section 2.2. Robust choice of the biasing parameters are discussed in section 3, a simulation study conducted to evaluate the performance of the proposed estimator in section 4 and a real life data was analyzed in Section 6 to illustrate the finding of the paper. Section 7 ends with some concluding remarks. 252

Superiority of the RJKL Estimator
The Bias of an estimator β is expressed in equation (20), its Mean Squared Error Matrix (MSEM) in equation (21) and its Scalar Mean Squared Error (MSE) in equation (22).
where D β is the variance matrix of β, E β is the expectation of β and tr (A) is the trace of a matrix, A. Also, for α = Additionally, for two estimators β 1 and β 2 , β 1 is said to be superior to β 2 with respect to the MSEM criterion, if and only if, The converse is not true. We also made use of the following lemmas for the theoretical comparison: Lemma 2.1 [19] For some vector, α and a positive definite matrix, A (that is, A > 0); A − αα ≥ 0 if and only if αA −1 α ≤ 1.
Lemma 2.2 [20] Let β 1 and β 2 be two competing estimators of β. Suppose that the difference between the covariance of the two estimators, 1. MS EM β j and d j denote the mean squared error matrix and bias vector of β j respectively for j = 1, 2.
We used the MSE to prove the superiority of the RJKL. The Mean Squared Error of the OLS estimator and the M-Estimator is expressed in equation (23) and (24), respectively.
where Ω ii = Cov α M . The mean squared error matrix and the scalar mean squared error of the JKL is defined as follows: where Bias α JKL = I − 2k(Λ + kI) −1 2 I + 2k(Λ + kI) −1 − I α. The corresponding mean square error matrix and the scalar mean square error for the RJKL estimator is: We only consider the JKL and the robust estimators in the theoretical comparison and we impose the following conditions to present the main theorems: Proof: The difference between the MSE of the RJKL and the JKL estimator from (28) and (26) is For Superiority of the RJKL estimator over the robust KL estimator The scalar mean square error of the KL estimator is Proof: The difference between the MSE of the RJKL and the KL estimator from (28) and (30) is Consequently,

Superiority of the RJKL estimator over the Ridge M-estimator
The scalar mean square error of the Ridge M-estimator is The difference between the MSE of the RJKL and the Ridge M-estimator from (28) and (32) is Consequently, Proof: The difference between the MSE of the RJKL and the M estimator from (28) and (24) is Consequently, 6 . Notice that the right hand expression after the addition sign is positive, this follows from bias squared.

Robust choice of the biasing parameter
It is customary to use the optimization procedure to obtain the biasing parameter of an estimator [3]. This is done by minimizing equation (28), which is rewritten in (35), with respect to k.
This can be obtained by setting Proceeding with (36) will yield a rather complex estimation for k. Thus, we propose to use the robust version of the biasing parameter used for the jackknife KL estimator [15]. The biasing parameter used for the jackknife KL estimator is presented in (37).k This parameter (37) is the squared root of harmonic mean of the biasing parameter used for ridge estimator. The robust equivalence of the parameter,k iŝ whereÂ 2 is given by Huber [8] aŝ With the assumption that α M ∼ N(α,A 2 Λ −1 ). And, this holds since n with the scale estimate s o .

Monte Carlo Simulation Study
We adopt the Monte Carlo simulation design [21,22] to observe the superiority of the Robust Jackknife KL estimator. The design was also recently adopted in related studies [23][24][25][26][27][28]. R programming language was used for the simulation.
The following equation is used to generate the predictors: where ρ 2 denotes the correlation between independent variables and z i j are pseudo-random numbers from the standard normal distribution. The coefficients β 1 , β 2 , . . . , β p are selected as the normalized Eigenvectors corresponding to the largest eigenvalue of X X such that β β = 1. This is a common restriction in simulation studies of this kind [3,[25][26][29][30]. The dependent variable determined by where the ε i s are independently generated from N(0, σ 2 ). We consider number of independent variables, p = 3 and p = 7. We introduced 10%, 20% and 30% outlier into each sample size considered in the simulation study [31]. The other specifications considered in the simulation design is as follows: As the standard deviation, σ and the degree of multicollinearity, ρ increases, the mean square error, MSEs of the estimators, α, α M , α K , α M K , α KL , α M KL , α JKL and α RJKL also increases. The mean square error, MSEs of the estimators,  KL estimator performs better than other non-robust estimator as argued by Ugwuowo et al. [15]. The proposed estimator in this study, α RJKL performs much better than the existing estimators considered in this study. That is, the Robust Jackknife Kibria Lukman estimator has the smallest mean square error.

Real-Life Application
We adopted the Hussein data for the data analysis Hussein and Zari [32] were the first to originally adopt the Hussein data and was recently used to test the performance of a proposed two parameter estimator in the presence of outlier [3]. The data contains 31 observations and three independent vari- ables. Details description can be found in [32,3]. The data contains multicollinearity with its variance inflation factor, VIF>10 and about 19.4% outliers in the y-direction at observations 12, 14, 15, 16, 30 and 31. Hence, it is suffi-cient to use the data in this study. The output of the analysis is presented in Table 9. The RJKL estimator has the smallest mean square value. Thus, the RJKL estimator performed better. The percentage MSE  Though, the intercept value of OLS estimator is the highest. We noticed a sharp reduction in the intercept for other estimators, especially the jackknife KL and the proposed.

Conclusion
We introduced a new robust estimator for the linear regression model in this study and named it the Robust Jackknife Kibria Lukman (RJKL) Estimator. The RJKL estimator was proposed to handle outlier and multicollinearity together. The estimator was formed by grafting the M-estimator into the Jackknife Kibria Lukman (JKL) estimator. We presented theorems that state the necessary conditions for the new estimator to perform better than the JKL estimator and other existing robust estimators discussed. We observed a good performance in the simulation study of the new estimator and the real-life data analysis. Both results supports the efficiency of the new estimator.