Variation Between DDC and SCAMSMA for Clustering of Wireless Multipath Waves in Indoor and Semi-Urban Channel Scenarios

Article history: Received: 08 September, 2020 Accepted: 07 October, 2020 Online: 20 November, 2020


Introduction
The European Cooperation in Science and Technology (COST) 2100 Channel Model (C2CM) [1]-[4] can reproduce the properties of multiple-input multiple-output (MIMO) wireless propagation channels. A multipath component (MPC) is characterized by its delay (τ), angle of departure (Azimuth of Departure (AoD), Elevation of Departure (EoD)), and angle of arrival (Azimuth of Arrival (AoA), Elevation of Arrival (EoA)). Groups of multipath components with similar delays and angles form a multipath cluster, which characterizes the C2CM.
Clustering is a crucial step in analyzing wireless multipaths. The channel model built from the clusters can be used to study, understand, and improve the attributes, performance, and efficiency of a communications system. Several channel models and measurements reveal that multipaths arrive in clusters. The accuracy of a channel model therefore depends strongly on how precisely the wireless propagation multipaths are clustered: inaccurate clustering leads to incorrect channel models and, in turn, to degraded performance.
Clustering finds the underlying structure of the data and groups similar data together [5]-[16]. In our previous works [17]-[20], Simultaneous Clustering and Model Selection Matrix Affinity (SCAMSMA) [21] and Deep Divergence-Based Clustering (DDC) [22] were used to cluster the dataset [23,24] generated by C2CM. In this work, the clustering accuracies of SCAMSMA and DDC are compared. The main contributions of this study are that (1) it quantifies the variation in the performance of SCAMSMA and DDC in clustering the COST 2100 dataset; and (2) it shows that SCAMSMA and DDC differ significantly in their accuracy in clustering the wireless multipaths. The paper is organized as follows. Section 2 presents the dataset generated by C2CM. Section 3 describes the clustering approaches. Section 4 explains the ANOVA used. Section 5 defines the Jaccard index. Section 6 discusses the ANOVA results. Section 7 concludes the work.
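A channel impulse response (CIR) from the base station (BS) to the mobile station (MS) antennas is the combination of the MPCs from all active multipath clusters and is given as

$$h(t, \tau, \Omega^{BS}, \Omega^{MS}) = \sum_{k \in K} \sum_{p} \alpha_{k,p}\, \delta(\tau - \tau_{k,p})\, \delta(\Omega^{BS} - \Omega^{BS}_{k,p})\, \delta(\Omega^{MS} - \Omega^{MS}_{k,p}) \qquad (1)$$

where $K$ is the set of visible cluster indexes, $\alpha_{k,p}$ is the complex amplitude of the $p$th MPC in the $k$th cluster, $\tau_{k,p}$ is its delay, $\Omega^{BS}_{k,p}$ is its direction of departure (AoD, EoD), and $\Omega^{MS}_{k,p}$ is its direction of arrival (AoA, EoA).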
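The superposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MPC` class, its field names, and the helper functions are hypothetical and are not part of the C2CM implementation or of the dataset format [23,24].

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class MPC:
    """One multipath component (hypothetical container, names are illustrative)."""
    amplitude: complex  # complex amplitude alpha_{k,p}
    delay: float        # delay tau in seconds
    aod: float          # azimuth of departure in radians
    eod: float          # elevation of departure in radians
    aoa: float          # azimuth of arrival in radians
    eoa: float          # elevation of arrival in radians


def visible_mpcs(clusters: Dict[int, List[MPC]], visible: Set[int]) -> List[MPC]:
    """Collect the MPCs of every visible cluster (the sum over k in K and over p)."""
    return [mpc for k in sorted(visible) for mpc in clusters.get(k, [])]


def total_power(mpcs: List[MPC]) -> float:
    """Sum of |alpha|^2 over the given MPCs."""
    return sum(abs(m.amplitude) ** 2 for m in mpcs)


# Made-up values, for illustration only.
example_clusters = {
    1: [MPC(1 + 0j, 1e-7, 0.1, 0.0, 1.2, 0.0), MPC(0.5j, 2e-7, 0.2, 0.0, 1.1, 0.0)],
    2: [MPC(2 + 0j, 5e-7, 2.0, 0.1, 0.3, 0.1)],
}
active = visible_mpcs(example_clusters, {1})
```

Only the clusters whose index is in the visible set contribute, so cluster 2 is ignored in this example.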
The dataset [23,24], generated by the C2CM, consists of two indoor channel scenarios at 5.3 GHz and six semi-urban channel scenarios at 285 MHz.

Clustering Approaches
The dataset generated by C2CM is clustered using SCAMSMA and DDC. SCAMSMA simultaneously determines the number of clusters and the membership of the clusters. DDC determines the membership of the clusters, and the cluster count is then derived from the membership of the multipaths to their clusters. Both approaches were originally used to cluster images, and this is the first time that they are applied to cluster multipaths.
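Deriving the cluster count from the membership can be sketched as follows. The sketch assumes that the clustering output is a matrix of soft assignments (one row per multipath, one column per candidate cluster) and that each multipath is assigned to its highest-scoring cluster; both function names are hypothetical.

```python
from typing import List, Sequence


def hard_labels(soft: Sequence[Sequence[float]]) -> List[int]:
    """Assign each multipath to the cluster with the largest soft score."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in soft]


def cluster_count(labels: Sequence[int]) -> int:
    """Number of distinct clusters that actually received members."""
    return len(set(labels))


# Made-up soft assignments for four multipaths and three candidate clusters.
soft_assignments = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
]
labels = hard_labels(soft_assignments)
```

In this example the third candidate cluster receives no members, so the derived cluster count is smaller than the number of columns.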
SCAMSMA [21] begins by formulating an affinity matrix $C_{calc}$ using the self-expression method, in which a given dataset $X$ is represented as $XC_{calc}$ as follows:

$$\min_{C} \|C\|_1 \quad \text{s.t.} \quad X = XC,\ \operatorname{diag}(C) = 0 \qquad (2)$$

where $\|\cdot\|_1$ is the $\ell_1$ norm, which returns the sum of the absolute values of all elements, and $\operatorname{diag}(\cdot)$ denotes the diagonal entries of the matrix. The solution of (2) corresponds to $C_{calc}$. An ideal affinity matrix is then introduced as the sum over all clusters of the outer products $z_k \circ z_k$, where the entry of $z_k$ for a point is 1 if the point belongs to the $k$th cluster and 0 otherwise, and $\circ$ represents the vector outer product. By denoting $W = -C_{calc}$, the clustering problem is expressed in terms of the Frobenius inner product $\langle \cdot, \cdot \rangle$, an all-ones vector $e_M$ of size $R$, and the number of clusters $K$. SCAMSMA simultaneously solves for the number of clusters and the membership of the clusters.

DDC [22] optimizes a loss function based on information-theoretic measures. The loss function is defined by

$$L = \frac{1}{k} \sum_{i=1}^{k-1} \sum_{j>i} \frac{\alpha_i^T K_{hid}\, \alpha_j}{\sqrt{\alpha_i^T K_{hid}\, \alpha_i \; \alpha_j^T K_{hid}\, \alpha_j}} + \operatorname{triu}(AA^T) + \frac{1}{k} \sum_{i=1}^{k-1} \sum_{j>i} \frac{m_i^T K_{hid}\, m_j}{\sqrt{m_i^T K_{hid}\, m_i \; m_j^T K_{hid}\, m_j}}$$

where $k$ is the number of clusters, $K_{hid}$ is the kernel similarity matrix, $\alpha$ is the soft cluster assignment, $A$ is the cluster assignment matrix, $\operatorname{triu}(AA^T)$ is the upper triangular part of $AA^T$, and $m$ is the simplex corner assignment. The clusters are represented by their probability density functions, and the divergence measures the dissimilarity between clusters. The divergence builds on two fundamental objectives: the separation between clusters and the compactness within clusters, as shown in Figure 2. DDC explicitly exploits knowledge about the geometry of the output space during the optimization. It supports end-to-end learning, does not require hand-crafted feature design, and does not need a pre-training phase.

Figure 2: Fundamental objectives of divergence: separation between clusters and compactness within clusters [22]

Analysis of Variance (ANOVA)
One-way ANOVA [25,26] is used to determine whether the data from several groups of a factor have a common mean.
One-way ANOVA determines whether the different groups of an independent variable have different effects on the response variable. The anova1 function of the Statistics and Machine Learning Toolbox of MATLAB returns the p-value for a balanced one-way ANOVA and also displays box plots of the observations in each group. The function tests the hypothesis that the samples in all groups are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same.
One-way ANOVA is a simple, special case of the linear model, which can be expressed as

$$y_{ij} = \alpha_j + \varepsilon_{ij}$$

where $y_{ij}$ is an observation, $i$ is the observation number, and $j$ indexes the groups of the predictor variable. All $y_{ij}$ are independent, $\alpha_j$ is the population mean of the $j$th group, and $\varepsilon_{ij}$ is the random error, which is independent and normally distributed with zero mean and constant variance.
The equality of column means for the data in matrix y is tested using the MATLAB function anova1(y), where each column is a different group and all columns have the same number of observations. ANOVA tests the hypothesis that all group means are equal against the alternative hypothesis that at least one group mean differs from the others:

$$H_0: \alpha_1 = \alpha_2 = \dots = \alpha_k \qquad H_1: \text{not all group means are equal} \qquad (6)$$

ANOVA tests for differences in the group means by partitioning the total variation in the data into two components: the variation of the group means from the overall mean and the variation of the observations in each group from their group mean estimates. That is, ANOVA partitions the total sum of squares ($SST$) into the sum of squares due to the between-groups effect ($SSR$) and the sum of squared errors ($SSE$):

$$\underbrace{\sum_{i} \sum_{j} (y_{ij} - \bar{y}_{..})^2}_{SST} = \underbrace{\sum_{j} n_j (\bar{y}_{.j} - \bar{y}_{..})^2}_{SSR} + \underbrace{\sum_{i} \sum_{j} (y_{ij} - \bar{y}_{.j})^2}_{SSE}$$

where $n_j$ is the sample size of the $j$th group, $j = 1, 2, \dots, k$, $\bar{y}_{.j}$ is the mean of the $j$th group, and $\bar{y}_{..}$ is the overall mean.

ANOVA compares the variation between groups with the variation within groups. If the ratio of between-group variation to within-group variation is significantly high, then the group means are significantly different from each other. This ratio is measured by a test statistic that follows an F-distribution with $(k - 1, N - k)$ degrees of freedom:

$$F = \frac{SSR/(k-1)}{SSE/(N-k)} = \frac{MSR}{MSE} \sim F_{k-1,\,N-k}$$

where $MSR$ is the mean squared treatment, $MSE$ is the mean squared error, $k$ is the number of groups, and $N$ is the total number of observations. If the p-value for the F-statistic is smaller than the significance level (0.05), the test rejects the null hypothesis that all group means are equal and concludes that at least one of the group means is different from the others. The p-value is derived by anova1 from the cumulative distribution function (CDF) of the F-distribution. The p-value is correct if the errors $\varepsilon_{ij}$ are independent, normally distributed, and have constant variance.
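The partition of the sum of squares and the F-statistic can be checked with a short pure-Python sketch. It is not the anova1 implementation; the function name and the sample values are hypothetical, and the p-value step is omitted because it needs the F-distribution CDF (for instance, `scipy.stats.f.sf(F, k - 1, N - k)` if SciPy is available).

```python
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, float, float, float]:
    """Return (SST, SSR, SSE, F) for a one-way layout with one sequence per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups variation: group means around the overall mean.
    ssr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups variation: observations around their own group mean.
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # Total variation: observations around the overall mean.
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)

    msr = ssr / (k - 1)        # mean squared treatment
    mse = sse / (n_total - k)  # mean squared error
    return sst, ssr, sse, msr / mse


# Made-up observations for three groups; these are not the Jaccard indices of the paper.
sst, ssr, sse, f_stat = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

For these values the group means are 2, 3, and 6, so most of the total variation comes from the differences between groups, and the F-statistic is correspondingly large.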

Clustering Validity Index
The performance of a clustering approach is measured using a clustering validity index. The study uses the Jaccard index, which measures the similarity between the reference data and the calculated data. For the number of clusters, the Jaccard index is calculated as

$$J_C = \frac{C_{11}}{C_{11} + C_{10} + C_{01}}$$

where $C_{11}$ is the number of clusters that are present in both the calculated clusters and the reference clusters, $C_{10}$ is the number of clusters that are present in the calculated clusters but not in the reference clusters, and $C_{01}$ is the number of clusters that are present in the reference clusters but not in the calculated clusters. Here $|\cdot|$ refers to cardinality, $C_k \in C$, and $K = |C|$ is the number of multipath clusters.

For the membership of the clusters, the Jaccard index is calculated as

$$J_M = \frac{M_{11}}{M_{11} + M_{10} + M_{01}}$$

where $M_{11}$ is the number of members that are present in both the calculated clusters and the reference clusters, $M_{10}$ is the number of members that are present in the calculated clusters but not in the reference clusters, and $M_{01}$ is the number of members that are present in the reference clusters but not in the calculated clusters.

A Jaccard index of one means that the calculated multipath clusters are the same as the reference multipath clusters, or that the membership of the calculated multipath clusters is the same as the membership of the reference multipath clusters. A Jaccard index of zero, on the other hand, means that none of the calculated multipath clusters equals a reference multipath cluster, or that none of the memberships of the calculated multipath clusters equals a membership of the reference multipath clusters.
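The three counts in the Jaccard index map directly onto set operations, as the following sketch illustrates. It assumes that clusters or members can be represented as hashable identifiers; how the paper matches calculated items to reference items is not specified here, so the function name, the identifiers, and the convention for two empty sets are assumptions.

```python
from typing import Hashable, Set


def jaccard_index(calculated: Set[Hashable], reference: Set[Hashable]) -> float:
    """Jaccard index n11 / (n11 + n10 + n01) between two sets of identifiers."""
    n11 = len(calculated & reference)  # present in both
    n10 = len(calculated - reference)  # only in the calculated set
    n01 = len(reference - calculated)  # only in the reference set
    denominator = n11 + n10 + n01
    # Assumption: two empty sets are treated as identical.
    return n11 / denominator if denominator else 1.0


# Made-up identifiers, for illustration only.
calculated_ids = {"a", "b", "c"}
reference_ids = {"b", "c", "d"}
score = jaccard_index(calculated_ids, reference_ids)
```

Two of the four distinct identifiers are shared here, so the index lies halfway between the extreme values of zero and one discussed above.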

Result and Discussion
The clustering accuracy of SCAMSMA and DDC is shown in Table 1 for the indoor scenarios and Table 2 for the semi-urban scenarios. The performance of the clustering approaches in both indoor and semi-urban scenarios is evaluated on the number of clusters and the membership of clusters. The means of the Jaccard indices are compared using ANOVA, and the results are illustrated using box plots. The box plots and p-values are generated using the anova1 function of MATLAB. The box plots display the range of the Jaccard indices and can be used to visualize the means. Values of p < 0.05 indicate that the means of SCAMSMA and DDC are significantly different.

The box plots of the Jaccard indices of the number of clusters of SCAMSMA and DDC for the indoor scenarios are shown in Figure 3. The p-value is 0.0388, which indicates a significant difference in the means of the clustering accuracies. This difference can be seen in the figure, where the box plot of DDC is higher than that of SCAMSMA. DDC is more accurate than SCAMSMA in clustering multipaths by 19.77%. Figure 4 presents the box plots of the Jaccard indices of the membership of clusters of SCAMSMA and DDC for the indoor scenarios. The p-value is 0.0996, which indicates no significant difference in the means of the Jaccard indices. This similarity can be seen in the figure, where the box plots are almost on the same level (∼ 0.8). DDC has higher accuracy in clustering multipaths than SCAMSMA by only 8.19%.
The box plots of the Jaccard indices of the number of clusters of SCAMSMA and DDC for the semi-urban scenarios are illustrated in Figure 5. The p-value is 0.7419, which suggests that the means of SCAMSMA and DDC are similar (0.0112 vs. 0.0102). The figure shows that the box plots are on the same level near the horizontal axis (∼ 0). SCAMSMA is more accurate in this case by 9.80%. Figure 6 displays the box plots of the Jaccard indices of the membership of clusters of SCAMSMA and DDC for the semi-urban scenarios. The p-value is 0.0139, which indicates that the means of SCAMSMA and DDC are significantly different. The figure shows that the box plot of DDC is higher than that of SCAMSMA. DDC has a higher clustering accuracy by 34.49%.
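The reported percentages appear to be relative differences between the mean Jaccard indices, taken with respect to the lower mean. This reading is an assumption, but it reproduces the figure quoted for the number of clusters in the semi-urban scenarios:

```latex
\frac{0.0112 - 0.0102}{0.0102} = \frac{0.0010}{0.0102} \approx 0.0980 = 9.80\%
```

Because both means are close to zero, a difference of only 0.0010 in the Jaccard index already corresponds to a relative difference of almost ten percent.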
DDC shows consistency in its clustering performance, with higher accuracy in clustering multipaths in all channel scenarios except for the number of clusters in semi-urban scenarios, where SCAMSMA has a slight advantage of 0.0010 in the mean Jaccard index. The means of SCAMSMA and DDC are significantly different (p < 0.05) for the number of clusters in indoor scenarios and for the membership of clusters in semi-urban scenarios, but not for the membership of clusters in indoor scenarios or the number of clusters in semi-urban scenarios. Lastly, based on accuracy, the clustering approaches can be used in indoor scenarios but not in semi-urban scenarios, which is consistent with the measurements done in an indoor environment [27].

Conclusion
This work compares the clustering accuracy of SCAMSMA and DDC in clustering wireless propagation multipaths generated by C2CM. The Jaccard index is used as the performance metric of the clustering approaches. Results show that the cluster-wise Jaccard index differs significantly between SCAMSMA and DDC for indoor scenarios, while the membership-wise Jaccard index does not. Conversely, for semi-urban scenarios the cluster-wise Jaccard index does not differ significantly between the clustering approaches, while the membership-wise Jaccard index does. Based on accuracy, the clustering approaches can be used in indoor scenarios. However, a better multipath clustering method should be used for semi-urban scenarios. For future work, the results will be compared with other clustering approaches to determine the best performance in terms of clustering wireless multipaths in indoor and semi-urban environments.

Conflict of Interest
The authors declare no conflict of interest.