Statistical Modelling by Topological Maps of Kohonen for Classification of the Physicochemical Quality of Surface Waters of the Inaouen Watershed Under Matlab

Self-organizing maps (SOMs), an artificial intelligence approach developed by Kohonen, can be used to model and solve environmental problems. In this study, we present a classification strategy based on a self-organizing map (SOM) artificial neural network to classify the physicochemical parameters of the Inaouen watershed. Classifying the samples with the self-organizing map yielded five classes. Classes 2 and 3 are distinguished by low concentrations of sodium Na (mg/l), potassium K (mg/l), magnesium Mg (mg/l), calcium Ca (mg/l), sulfates SO4 (mg/l), and total dissolved solids TDS (mg/l). Classes 1 and 4 show slightly higher values of bicarbonate HCO3 (mg/l), total dissolved solids TDS (mg/l), total alkalinity CaCO3 (mg/l), magnesium Mg (mg/l), calcium Ca (mg/l), and electrical conductivity Cond (µS/cm). Class 5 has exceptionally high values for all parameters except dissolved oxygen D.O. (mg/l) and nitrate NO3 (mg/l). The results suggest that classification by Kohonen's self-organizing maps (SOM) is an effective tool for understanding and displaying the spatial distribution of the physicochemical quality of water.


Introduction
Self-organizing maps (SOM) are artificial neural network techniques based on unsupervised learning algorithms [1]. Because of their classification capabilities and visualization performance, they have been used successfully in environmental fields (soil, water, air, etc.). Shen et al. [2] studied the groundwater chemistry and quality of a multilayered groundwater system in a Northwest China coal area, using an approach based on the SOM technique combined with multivariate statistical tools. Bigdeli et al. [3] applied the self-organizing map (SOM) technique and K-means clustering to describe geochemical anomalies in the Moalleman district, northeast Iran. Santos et al. [4] investigated the regional hydrogeochemical spatialization and controls of the Serra Geral aquifer system in southern Brazil, using Kohonen's self-organizing maps (SOM) in combination with k-means clustering. Amiri et al. [5] conducted a spatio-temporal assessment of groundwater quality in a coastal aquifer, based on linear discriminant analysis (LDA) and Kohonen's self-organizing maps (SOM).
The goal of this study is to use a statistical technique (SOM) to display and analyze the spatial distribution of water samples and their physicochemical properties across the Inaouen watershed. SOM-based hierarchical classification (SOM-CHA) is used to identify the distinct classes and to detect the spatial variations of the physicochemical characteristics of the watershed.

Sample Location and Description
The Oued Inaouène watershed is located in northeastern Morocco, between the Middle Atlas and the Pre-Rif, and covers approximately 5109 km² (Figure 1). Because of its geographic location, the watershed has a Mediterranean climate with oceanic influences, strong seasonal variations and marked rainfall irregularities [6]. With an average annual rainfall of roughly 600 mm/year, the region is characterized by rainfall that varies strongly from month to month. The rainfall distribution reveals two bands: the first below 500 meters, with an annual rainfall of roughly 800 mm, and the second between 500 and 1000 meters, with an annual rainfall of 800 to 1500 mm. The rainy season in the study region lasts from November to April, with December and January being the wettest months. July and August, on the other hand, are the driest months of the year. The Oued Inaouène watershed is distinguished by a lithological contrast between its two banks. The right bank belongs to the Pre-Rif, whose marly outcrops support sparse vegetation. The left bank corresponds to the northern boundary of the Middle Atlas, with outcrops ranging from the Paleozoic formations of Tazzeka to the Triassic formations [6].

Data Source
The dataset for this study consists of 16 physicochemical characteristics measured on 100 surface water samples taken in the Inaouen watershed between 2014 and 2015. The water samples were collected, transported, and stored by our team according to the protocol of the National Office of Drinking Water.
A portion of the analyses was completed on-site, while the remainder was completed by our team at the CURI laboratory in Fez [7].

Self-Organizing Maps (SOM)
Kohonen, who was looking for a technique to represent large multidimensional data sets, was the first to introduce topological maps. To do this, Kohonen uses machine learning [8] to divide data into "similar" groups whose neighborhood structure can be represented in a discrete low-dimensional (1, 2, or 3D) space called a "topological map" [9]. Topological map methods allow data to be projected onto a low-dimensional space while revealing the inherent structure of the data. As a result, the SOM technique preserves both the topology of the data and the distance relationships between them (Figure 2). SOM maps are a type of artificial neural network trained by unsupervised learning [10,11,12]. In these networks, the samples (input vectors) are presented to a grid of d neurons (also called nodes or units) [13]. The parameter d (the map's size) is fixed ahead of time. The input vectors are connected to each grid neuron via d synapses carrying d weight vectors w. The vector w represents the map neuron in data space and is also known as the prototype or referent vector [14]. The referent closest to a given data point is called the Best Matching Unit (BMU). The quality of a self-organizing map is assessed through its quantization and topology-preservation capabilities. The topographic error (Te) and the quantization error (Qe) are commonly used to validate the SOM classification [15,16].
Quantization error Qe (or resolution measure): The average quantization error, defined as the average distance between the data and their referents (BMUs), measures how closely the map is deployed on the data, that is, the degree of quantization [1,14]. The lower the value of Qe, the better the quality of the SOM. It is expressed as follows:

$$Q_e = \frac{1}{N} \sum_{k=1}^{N} \left\| x^{(k)} - w_{x^{(k)}} \right\|$$

where N is the number of data, $x^{(k)}$ is the k-th individual, and $w_{x^{(k)}}$ is the BMU of the individual $x^{(k)}$.

Topographic error Te [16] (or measure of topology preservation): This criterion assesses how well the SOM preserves the topology of the data set [17]. It is the proportion of data for which the two nearest referents do not correspond to adjacent map units [18]. In contrast to the quantization error, Te takes the structure of the SOM map into account [19].
The criterion counts the observations for which the first winning neuron ($c_i$) and the second winning neuron ($s_i$) are not neighbours on the map. The second winning neuron of an observation is the neuron whose referent vector is closest to this observation after that of the first winning neuron [20]. The topographic error is determined as follows:

$$T_e = \frac{1}{N} \sum_{i=1}^{N} u(x_i), \qquad u(x_i) = \begin{cases} 1 & \text{if } \| r_{c_i} - r_{s_i} \| > 1 \\ 0 & \text{otherwise} \end{cases}$$

where N is the number of observations and $r_c$ and $r_s$ are respectively the locations of neuron c and neuron s on the map (the threshold of 1 corresponds to a lattice with unit spacing). The topology is perfectly preserved when this criterion is 0.

The topological map has several advantages over the linear and classification methods [21] usually used to extract groups of collected samples, such as Principal Component Analysis (PCA) [22], Correspondence Analysis (C.A.) and Hierarchical Clustering (H.C.) [23]. The limitations of these methods are well known. For example, each of them suffers a strong distortion when there are non-linear relationships between the variables [24].
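The two quality criteria can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the Matlab code used in the study: the map is taken to be a rectangular lattice with unit spacing, two units are treated as adjacent when their lattice distance is at most 1, and the function names and toy values are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_best_units(x, weights):
    """Indices of the closest (BMU) and second-closest prototypes to x."""
    order = sorted(range(len(weights)), key=lambda j: dist(x, weights[j]))
    return order[0], order[1]

def quantization_error(data, weights):
    """Qe: mean distance between each sample and its BMU."""
    total = sum(dist(x, weights[two_best_units(x, weights)[0]]) for x in data)
    return total / len(data)

def topographic_error(data, weights, positions):
    """Te: share of samples whose BMU and second BMU are not adjacent.

    Assumption: unit-spaced rectangular lattice, adjacency = distance <= 1.
    """
    bad = 0
    for x in data:
        c, s = two_best_units(x, weights)
        if dist(positions[c], positions[s]) > 1.0:
            bad += 1
    return bad / len(data)

# Toy 1 x 3 map in a 2-D data space (hypothetical values). Unit 2 sits
# between units 0 and 1 in data space, so the map is "folded".
positions = [(0, 0), (0, 1), (0, 2)]
weights = [(0.0, 0.0), (1.0, 0.0), (0.2, 0.0)]
data = [(0.0, 0.0), (1.0, 0.0), (0.05, 0.0)]

qe = quantization_error(data, weights)
te = topographic_error(data, weights, positions)
print(round(qe, 4), round(te, 4))
```

In this toy case, two of the three samples have a BMU and a second BMU that are two lattice steps apart, so Te is 2/3 even though Qe is small. This is the situation the topographic error is meant to expose.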

Hierarchical Clustering by SOM (SOM-CHA)
Like other data analysis methods, of which it is a part, the SOM-CHA classification aims to obtain a simple schematic representation [25]. It consists in calculating a matrix expressing the mutual distances between the points to be classified, which are the nodes of the map, and then, based on this matrix, grouping together the closest points. This method allows the construction of a hierarchical tree [26], which reveals several possible partitions, where each point is assigned to one of the groups of a given partition. The choice of the best partition is made once the hierarchical classification is completed [27].
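The grouping of map nodes described above can be sketched in Python as follows. This is a naive illustration, not the authors' implementation: it assumes Ward's criterion with Euclidean distance, stops at a chosen number of groups instead of building the full tree, and uses a hypothetical function name and made-up prototype vectors.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion (naive version).

    Each cluster is stored as (member indices, centroid). At every step the
    pair whose merge least increases the within-cluster sum of squares is
    merged: cost = |A||B| / (|A| + |B|) * ||c_A - c_B||^2.
    """
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * u + nb * v) / (na + nb) for u, v in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    # Turn the list of clusters into one label per input point.
    labels = [0] * len(points)
    for label, (members, _) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical prototype vectors of six map nodes (two variables each).
prototypes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
labels = ward_clusters(prototypes, n_clusters=3)
print(labels)
```

The pairwise search is cubic in the number of nodes, which is acceptable for a map of about a hundred units. A library routine would be preferable for larger maps.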

Algorithm: Kohonen maps (SOM) [16]
Let $(w_1^t, \dots, w_N^t) \in (\mathbb{R}^n)^N$ be the neurons at iteration t, taken in the vector space $\mathbb{R}^n$. We denote by $V(w_j)$ the set of neurons neighbouring $w_j$ on the Kohonen map. By definition, $w_j \in V(w_j)$. Let $(X_1, \dots, X_K) \in (\mathbb{R}^n)^K$ be a cloud of points. We use a sequence of positive real numbers $(\alpha_t)$ satisfying $\sum_{t \geq 0} \alpha_t^2 < \infty$ and $\sum_{t \geq 0} \alpha_t = \infty$.

Initialization: The neurons $w_1^0, \dots, w_N^0$ are distributed in the space $\mathbb{R}^n$ in a regular way according to the shape of their neighbourhood, and $t \leftarrow 0$.

Closest neuron: We choose a point $X_i$ of the cloud at random; then we define the neuron $w_{k^*}^t$ so that:

$$\left\| w_{k^*}^t - X_i \right\| = \min_{1 \leq j \leq N} \left\| w_j^t - X_i \right\|$$

Update: Each neuron $w_j^t \in V(w_{k^*}^t)$ is moved towards the chosen point, and then $t \leftarrow t + 1$:

$$w_j^{t+1} = w_j^t + \alpha_t \left( X_i - w_j^t \right)$$

As long as the algorithm has not converged, return to the closest neuron step. The update step can be modified to improve the convergence speed [28]:

$$w_j^{t+1} = w_j^t + \alpha_t \, h\!\left(w_j^t, w_{k^*}^t\right) \left( X_i - w_j^t \right)$$

where h is a function with values in the interval [0,1] that equals 1 when $w_j^t = w_{k^*}^t$ and decreases as the distance between the two neurons increases. A typical function is:

$$h_{ck}(t) = \exp\!\left( - \frac{\left\| r_c - r_k \right\|^2}{2 \sigma^2(t)} \right)$$

where $r_c$ and $r_k$ are respectively the locations of neuron c and neuron k on the map, and $\sigma(t)$ is the radius of the neighbourhood at iteration t of the learning process.

Kohonen maps are used in data analysis to project a point cloud non-linearly into a two-dimensional space using a rectangular neighbourhood. They are also used to perform unsupervised classification by clustering the neurons where the points are concentrated. The edges connecting the neurons (the vertices of the Kohonen map) are either shortened, indicating that two neurons are neighbours, or stretched, indicating a separation between classes.
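The training loop with a Gaussian neighbourhood can be sketched in Python as follows. This is a minimal sketch rather than the Matlab implementation used in the study, and it rests on several assumptions: prototypes are initialized from randomly drawn samples instead of a regular layout, the learning rate and radius decay linearly over a fixed number of iterations (a common practical shortcut that replaces the convergence test and does not satisfy the summability conditions on the learning-rate sequence), and the function name and parameters are hypothetical.

```python
import math
import random

def train_som(data, rows, cols, n_iter=2000, alpha0=0.5, sigma0=None, seed=0):
    """Online SOM training on a rectangular grid (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialization (assumption): prototypes copied from random samples.
    weights = [list(rng.choice(data)) for _ in positions]
    for t in range(n_iter):
        frac = t / n_iter
        alpha = alpha0 * (1.0 - frac)            # decreasing learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
        x = rng.choice(data)                     # random point of the cloud
        # Closest neuron: index of the best matching unit.
        k = min(range(len(weights)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])))
        # Update: every neuron moves towards x, weighted by its grid
        # distance to the best matching unit.
        for j, w in enumerate(weights):
            d2 = ((positions[j][0] - positions[k][0]) ** 2
                  + (positions[j][1] - positions[k][1]) ** 2)
            h = math.exp(-d2 / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += alpha * h * (x[i] - w[i])
    return weights, positions

# Toy data (hypothetical): two Gaussian blobs in two dimensions.
rng = random.Random(1)
data = ([(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
        + [(rng.gauss(1, 0.1), rng.gauss(1, 0.1)) for _ in range(50)])
weights, positions = train_som(data, rows=3, cols=3, n_iter=500)
print(len(weights), positions[0], positions[-1])
```

Because each update is a convex combination of a prototype and a sample, the prototypes stay inside the range of the data throughout training.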

Classification of Surface Water Samples by SOM
The concept of the SOM algorithm is to perform a nonlinear classification of complex datasets by recognizing similar patterns. In this work, the input layer consists of vectors representing individuals, each of which contains 16 components corresponding to the 16 physicochemical parameters of the surface waters studied. The output layer is composed of 100 neurons (10 rows × 10 columns). This size was chosen for the output map because it minimizes the two error criteria (Qe=0.268 and Te=0.03).
In the SOM component planes of the data set, two types of colors can be distinguished: dark red cells represent high values, while blue cells represent low values (Figure 3) [16]. Variables whose planes show similar colors are positively correlated; this can be illustrated between the variables Bicarbonates (HCO3), Chlorides (Cl), Magnesium (Mg), Sodium

Principal Component Analysis (PCA)
The PCA results are shown on the score plot formed by the two components PC1 and PC2, which are regarded as the most informative since they account for most of the variance. In our case, PC1 and PC2 contribute 51.1% and 11.2% of the total variance, respectively. The subspace PC1 vs PC2 therefore explains 62.3% of the total variance. Figure 4 illustrates the circle of correlations between the variables in the factor plane (PC1 × PC2).
The correlation circle between variables in the factorial diagram (PC1 × PC2) shows that the variables Mg, HCO3, CaCO3, K, SO4, Na, Cl, NH3, and TDS are positively correlated with the PC1 axis, with coefficients above 0.6. However, dissolved oxygen (D.O.) is negatively correlated with the PC2 axis, with a coefficient exceeding 0.6 in absolute value. The elements Mg, HCO3, CaCO3, K, SO4, Na, Cl, NH3

SOM-CHA Hierarchical Classification
Once the Kohonen map is obtained, we apply a hierarchical classification based on the Ward method [29,30] and the Euclidean distance. The hierarchical classification by SOM groups the cells of the SOM map according to the physicochemical parameters of the Inaouen watershed. The dendrogram obtained by SOM-CHA suggests that the 100 neurons should be grouped into five classes (Figure 5).
The first class contains 13 samples and represents 13% of the total database. It includes waters with slightly elevated average values of HCO3 (105.39 mg/l), TDS (100.54 mg/l), CaCO3 (86.38 mg/l), Mg (4.41 mg/l), Ca (24.69 mg/l) and electrical conductivity (201.08 µS/cm), and low values of NH3 (31.23 µg/l) and Na (11.60 mg/l).
The second class includes the largest number of samples (48) and represents 48% of the total database. It is characterized mainly by low concentrations of chemical elements such as Na (9.00 mg/l), K (1.02 mg/l), SO4 (4.61 mg/l) and TDS (58.33 mg/l), and by a high dissolved oxygen concentration (6.24 mg/l).
The third class contains 13 samples and represents 13% of the total database. It is characterized mainly by low concentrations of SO4 (4.87 mg/l), Cl (3.19 mg/l), Na (4.91 mg/l), Mg (1.81 mg/l), K (0.84 mg/l), Ca (9.95 mg/l), NH3 (17.69 mg/l) and TDS (43.38 mg/l), and by very high concentrations of P (225.39 mg/l) and dissolved oxygen (6.30 mg/l).
The fourth class contains 23 samples and represents 23% of the database. It is characterized by moderately high concentrations of chemical elements: HCO3 (117.76 mg/l), Cl (47.98 mg/l), Mg (4.85 mg/l), CaCO3 (96.52 mg/l), Ca (29.83 mg/l), Na (38.90
Table 1. Basic statistics (minimum, mean, maximum) of the physicochemical parameters for the whole database and for classes 1, 2, 3, 4 and 5.

The U-Matrix Classification
For the U-matrix classification map, the hexagonal topology was chosen to achieve a better resolution and speedier results. By comparison, the rectangular topology needed far fewer neurons to reach a small quantization and topographic error. Figure 12 shows the U-matrix obtained with the selected SOM parameters. The classification via the BMUs in the U-matrix is considered almost exact and produces a high-quality, very smooth mapping.
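As an illustration of what a U-matrix contains, the following Python sketch computes, for each map unit, the mean distance between its prototype and those of its grid neighbours. It is a simplified sketch, not the procedure used in the study: it assumes a rectangular grid with a 4-neighbourhood (not the hexagonal topology chosen here), prototypes stored row by row, and a hypothetical function name and toy values.

```python
import math

def u_matrix(weights, rows, cols):
    """Mean prototype distance from each unit to its grid neighbours.

    Assumption: rectangular grid, 4-neighbourhood, units stored row by row.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Keep only the neighbours that fall inside the grid.
            neigh = [(r + dr, c + dc)
                     for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            w = weights[r * cols + c]
            total = sum(dist(w, weights[nr * cols + nc]) for nr, nc in neigh)
            umat[r][c] = total / len(neigh)
    return umat

# Hypothetical 1 x 4 map: two tight groups of prototypes separated by a gap.
weights = [(0.0,), (0.1,), (5.0,), (5.1,)]
umat = u_matrix(weights, rows=1, cols=4)
print([round(v, 3) for v in umat[0]])
```

High values mark units that lie on a boundary between groups of similar prototypes, while low values mark the interior of a group. In the toy map the two middle units carry the large values because they face each other across the gap.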

Conclusion
Statistical analysis based on Kohonen's self-organizing map (SOM) approach was applied to a database of 16 physicochemical parameters measured on 100 samples of the surface waters of the Inaouen watershed between February 2014 and December 2015. It highlighted the positive and negative correlations between the physicochemical parameters studied. The hierarchical classification of the SOM map (SOM-CHA) detected spatial variations from one source to another, identifying the physicochemical behaviour of the waters of the Inaouen watershed. This differentiation is probably related to the geological nature of the land crossed, the difference in altitude and the domestic discharges of neighbouring municipalities.