Comparison of K-Means and Fuzzy C-Means Algorithms on Simplification of 3D Point Cloud Based on Entropy Estimation

In this article we present a method for simplifying 3D point clouds based on the Shannon entropy. This simplification technique is a hybrid technique combining clustering and iterative computation. Our main objective in this paper is to apply our method to different 3D point clouds. In the clustering phase we use two different algorithms, K-means and Fuzzy C-means, and then compare the results obtained.


Introduction
Modern 3D scanners are acquisition tools that keep improving in resolution and acquisition speed. The point clouds obtained from the digitization of real objects can therefore be very dense, which leads to significant data redundancy. This problem must be solved and an optimal point cloud must be found. Optimizing the number of points reduces the reconstruction computation.
The problem of simplifying a point cloud can be formalized as follows: given a set of points X sampling a surface S, find a set of points X' with |X'| ≤ |X| such that the surface S' sampled by X' is close to S, where |X| denotes the cardinality of the set X. This objective requires defining a measure of geometric error between the original and simplified surfaces, for which the method resorts to estimating global or local properties of the original surface. There are two main categories of point sampling algorithms: sub-sampling algorithms and resampling algorithms. Sub-sampling algorithms produce a simplified sample that is a subset of the original point cloud, while resampling algorithms rely on estimating the properties of the sampled surface to compute new relevant points.
In the literature, the categories of simplification algorithms have been applied according to three main simplification schemes. The first method is simplification by selection or calculation of points representing subsets of the initial sample. This method consists of decomposing the initial set into small areas, each of which is represented by a single point in the simplified sample [1][2][3][4]. The methods of this category are distinguished by the criteria defining the areas and their construction.
The second method is iterative simplification. The principle of iterative simplification is to remove points from the initial sample incrementally, according to geometric or topological criteria that locally measure the redundancy of the data [5][6][7][8][9][10].
This paper presents a hybrid simplification technique based on entropy estimation [19] and a clustering algorithm [20].
It is organized as follows: in section 2, we recall some density function estimators. In section 3, we present the clustering algorithms. In section 4, we present our 3D point cloud simplification algorithm based on Shannon entropy [21]. Section 5 shows the results and validation. Finally, we present the conclusion.

Defining the Estimation of Density Function and Entropy
There are two families of methods for density estimation: parametric and nonparametric. We focus on nonparametric methods, which include the kernel density estimator, also known as the Parzen-Rosenblatt method [22,23], and the K nearest neighbors (K-NN) method [24]. In this article we use only the K-NN estimator.

The K Nearest Neighbors Estimator
The k nearest neighbors (K-NN) algorithm [24] is a nonparametric method for estimating the probability density function. The degree of smoothing is controlled by an integer k, the number of nearest neighbors, generally chosen proportional to the sample size N. For each point x, the density estimate is defined from the distances between x and the points of the sample: the distances $r_i$ (with $i = 1, \ldots, k, \ldots, N$) are sorted in ascending order.
The K-NN estimator in dimension d can be defined as follows:

$$\hat{f}(x) = \frac{k}{N \, V_k(x)} = \frac{k}{N \, C_d \, r_k(x)^d} \quad (1)$$

where $r_k(x)$ is the distance from $x$ to the $k$-th nearest point, $V_k(x) = C_d \, r_k(x)^d$ is the volume of the sphere of radius $r_k(x)$, and $C_d$ is the volume of the unit sphere in dimension $d$. The number $k$ must be adjusted as a function of the size $N$ of the available sample in order to respect the constraints that ensure the convergence of the estimator. For $N$ observations, $k$ can be calculated as follows:

$$k = k_0 \sqrt{N}$$

By respecting this adjustment rule, the estimator is certain to converge as $N$ increases indefinitely, whatever the value of $k_0$.
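As an illustration, the K-NN estimator of Equation (1) can be sketched in a few lines of Python (a brute-force sketch for small samples; the function name and the pairwise-distance approach are our own choices, not the paper's):

```python
import numpy as np
from math import gamma, pi

def knn_density(points, k):
    """K-NN density estimate f(x) = k / (N * C_d * r_k(x)^d),
    evaluated at every sample point (brute force, O(N^2))."""
    N, d = points.shape
    C_d = pi ** (d / 2) / gamma(d / 2 + 1)    # volume of the unit sphere in R^d
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise distances
    r_k = np.sort(dist, axis=1)[:, k]         # column 0 is the point itself
    return k / (N * C_d * r_k ** d)
```

Points lying in dense regions get a small $r_k(x)$ and hence a high estimated density, which is what the simplification step later exploits.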

Defining Entropy
Claude Shannon introduced the concept of entropy, associated with a discrete random variable X, as a basic concept in information theory [21]. A probability distribution $p = (p_1, p_2, \ldots, p_N)$ is associated with the realizations of X. The Shannon entropy is calculated using the following formula:

$$H(X) = -\sum_{i=1}^{N} p_i \log p_i \quad (2)$$

Entropy measures the uncertainty associated with a random variable. Therefore, the realization of a rare event provides more information about the phenomenon than the realization of a frequent event.
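Equation (2) translates directly into code (a trivial sketch; zero-probability terms are dropped, following the convention $0 \log 0 = 0$):

```python
import numpy as np

def shannon_entropy(p):
    """H(X) = -sum_i p_i log p_i (natural logarithm)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # 0 * log 0 is taken as 0
    return float(-(p * np.log(p)).sum())
```

The uniform distribution maximizes the entropy, while a degenerate distribution (one certain outcome) has zero entropy, matching the remark above about rare versus frequent events.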

Clustering Algorithms
Given a set of N observations described by d attributes, the objective of clustering is to structure the data into homogeneous classes.
Clustering is an unsupervised classification. The objective is to group points into clusters or classes so that the data within a cluster are as similar as possible. Two types of approaches are possible: hierarchical and non-hierarchical [25].
In this article, we concentrate on the non-hierarchical approach, as represented by the Fuzzy C-Means (FCM) algorithm [26,20] and the K-means (KM) algorithm [27].

Fuzzy C-Means Clustering Algorithm
Fuzzy c-means is a data clustering technique wherein each data point belongs to a cluster to some degree specified by a membership grade. This technique was originally introduced by J.C. Dunn [20] and improved by J.C. Bezdek [26] as an improvement on earlier clustering methods. It provides a method for grouping data points that populate some multidimensional space into a specified number of different clusters. Let $X = \{x_1, x_2, \ldots, x_N\}$ be the data set to be analysed and $V = \{v_1, v_2, \ldots, v_c\}$ the set of cluster centers in the p-dimensional space $R^p$, where N is the number of objects, p the number of features, and c the number of partitions or clusters. FCM is a clustering method allowing each data point to belong to multiple clusters with varying degrees of membership.
FCM is based on the minimization of the following objective function:

$$J_m(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^m \, D_{ijA}^2 \quad (3)$$

where $D_{ijA}^2$ is the distance between the $i$-th feature vector and the centroid of the $j$-th cluster, computed as a squared inner-product distance norm in Equation (4):

$$D_{ijA}^2 = \| x_i - v_j \|_A^2 = (x_i - v_j)^T A \, (x_i - v_j) \quad (4)$$

In the objective function in Equation (3), $U = [u_{ij}]$ is a fuzzy partition matrix computed from the data set X, and $m > 1$ is the fuzzy partition matrix exponent controlling the degree of fuzzy overlap. Fuzzy overlap refers to how fuzzy the boundaries between clusters are, that is, the number of data points that have significant membership in more than one cluster.
The objective function is minimized subject to the following constraints:

$$u_{ij} \in [0, 1], \qquad \sum_{j=1}^{c} u_{ij} = 1 \ \ \forall i, \qquad 0 < \sum_{i=1}^{N} u_{ij} < N \ \ \forall j \quad (5)$$

FCM performs the following steps during clustering:
1. Randomly initialize the cluster membership values $u_{ij}$.
2. Calculate the cluster centres:

$$v_j = \frac{\sum_{i=1}^{N} u_{ij}^m \, x_i}{\sum_{i=1}^{N} u_{ij}^m}$$

3. Update $u_{ij}$ according to the following:

$$u_{ij}^{(t+1)} = \frac{1}{\sum_{l=1}^{c} \left( D_{ijA} / D_{ilA} \right)^{2/(m-1)}}$$

where $t$ is the iteration number. Steps 2 and 3 are repeated until the memberships stop changing significantly.
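The FCM steps above can be sketched in a few lines of NumPy (a minimal illustration with Euclidean distances, i.e. $A = I$ in Equation (4); the function name, convergence tolerance, and iteration cap are our own choices):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal FCM sketch following steps 1-3 above, iterated to convergence."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                 # step 1: random memberships
    for _ in range(n_iter):
        Um = U ** m
        V = Um.T @ X / Um.sum(axis=0)[:, None]        # step 2: cluster centres
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)                         # avoid division by zero
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # step 3: membership update
        if np.abs(U_new - U).max() < tol:
            return U_new, V
        U = U_new
    return U, V
```

Each row of `U` sums to 1, so every point spreads its membership across all c clusters, in contrast with the hard assignments of K-Means below.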

K-Means Clustering Algorithm
The K-Means algorithm (KM) iteratively computes the cluster centroids in order to minimize the sum of the distances with respect to the specified measure. The objective of the K-Means algorithm is to minimize an objective function known as the squared error function, given in Equation (6):

$$J = \sum_{j=1}^{c} \sum_{x_i \in S_j} \| x_i - v_j \|^2 \quad (6)$$

where $S_j$ is the set of points assigned to cluster $j$ and $v_j$ is its centre. For c clusters, the K-Means algorithm iteratively minimizes the sum of the distances of each object to its cluster centre; the goal is to reach a minimum value of this sum by moving objects between the clusters. The steps of K-means are as follows: 1. Centroids of c clusters are chosen from X randomly.
2. Distances between data points and cluster centroids are calculated.
3. Each data point is assigned to the cluster whose centroid is close to it.
4. Cluster centroids are updated using the formula in Equation (7):

$$v_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i \quad (7)$$

5. Distances from the updated cluster centroids are recalculated.
6. If no data point is assigned to a new cluster, the execution of algorithm is stopped, otherwise the steps from 3 to 5 are repeated taking into consideration probable movements of data points between the clusters.
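Steps 1-6 can be sketched as a compact Lloyd-style iteration (an illustrative sketch; the function name and the empty-cluster handling are our own choices):

```python
import numpy as np

def k_means(X, c, n_iter=100, seed=0):
    """Minimal K-Means following steps 1-6 above."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]  # step 1: centroids from X
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)   # step 2
        new_labels = D.argmin(axis=1)                               # step 3
        V = np.array([X[new_labels == j].mean(axis=0)               # step 4 (Eq. 7)
                      if np.any(new_labels == j) else V[j]
                      for j in range(c)])                           # step 5
        if np.array_equal(new_labels, labels):                      # step 6: stop
            break
        labels = new_labels
    return labels, V
```

Unlike FCM, every point receives a hard assignment to exactly one cluster at each iteration.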

Evaluation of the Simplified Meshes
In order to give a theoretical evaluation of the simplification method, we rely on the mean error, max error, and RMS (root mean square) error metrics used by Cignoni et al. [28], who measured the Hausdorff distance between the approximation and the original model. The Hausdorff distance is defined as follows: let X and Y be two non-empty subsets of a metric space (M, d); their Hausdorff distance is

$$d_H(X, Y) = \max \left\{ \sup_{x \in X} \inf_{y \in Y} d(x, y), \ \sup_{y \in Y} \inf_{x \in X} d(x, y) \right\}$$

In our experiments we use the symmetric Hausdorff distance calculated with the Metro software tool [28]. In order to calculate the approximation error, we reconstruct the models from the point clouds. Several reconstruction techniques [29] exist in the literature to create a 3D model from a set of points.
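For finite point sets, the symmetric Hausdorff distance above can be computed by brute force (a sketch for small clouds only; the actual experiments use the Metro tool, which measures mesh-to-mesh distances):

```python
import numpy as np

def hausdorff(X, Y):
    """Symmetric Hausdorff distance between two finite point sets."""
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    return max(D.min(axis=1).max(),   # sup_x inf_y d(x, y)
               D.min(axis=0).max())   # sup_y inf_x d(x, y)
```

Note that the two directed terms generally differ, which is why both are taken before the outer max.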
In the next section, we present our simplification approach based on the estimation of Shannon entropy and a clustering algorithm. We first use the FCM algorithm, then replace it with the KM algorithm, and compare the results obtained from the use of the two algorithms in our simplification method.

We presented previously two types of nonparametric estimators, the K-NN estimator and the Parzen estimator. Each type has advantages and disadvantages. For the Parzen estimator, the choice of bandwidth has a strong impact on the quality of the estimated density [30]. For this reason, we use the K-NN estimator to estimate the density function.

We now propose an algorithm to simplify a dense 3D point cloud. This algorithm is based on the entropy estimation algorithm: it estimates the entropy contribution of each 3D point of X and decides whether to eliminate or to keep the point, using the K-NN estimator for the estimation. Moreover, it relies on a clustering algorithm (KM or FCM) to subdivide the point cloud X into clusters in order to minimize the computation time. The procedure is as follows:
1. Subdivide the point cloud X into clusters using KM or FCM.
2. For each cluster, estimate the entropy contribution of each point using the K-NN estimator.
3. According to this estimate, decide whether to eliminate or to keep the point.
End for.
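A speculative end-to-end sketch of this procedure is given below, in Python (the paper's implementation is in MATLAB). The clustering phase uses a plain K-Means; the elimination rule, keeping in each cluster the fraction `keep_ratio` of points with the highest information content $-\log \hat{f}(x)$ under the K-NN estimate, is our own assumption, since the paper's exact decision criterion is not reproduced here.

```python
import numpy as np
from math import gamma, pi

def knn_information(P, k):
    """Per-point information content -log f(x) under the K-NN density estimate."""
    N, d = P.shape
    C_d = pi ** (d / 2) / gamma(d / 2 + 1)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    r_k = np.sort(D, axis=1)[:, min(k, N - 1)]
    dens = k / (N * C_d * np.fmax(r_k, 1e-12) ** d)
    return -np.log(dens)              # high in sparse regions, low where redundant

def simplify_cloud(X, c=2, k=3, keep_ratio=0.5, seed=0):
    """Cluster X, then keep the most informative points of each cluster."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(50):               # clustering phase (plain K-Means)
        labels = np.linalg.norm(X[:, None] - V[None], axis=2).argmin(axis=1)
        V = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                      else V[j] for j in range(c)])
    keep = np.zeros(len(X), dtype=bool)
    for j in range(c):                # entropy-based decision per cluster
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue
        if len(idx) == 1:
            keep[idx] = True
            continue
        info = knn_information(X[idx], min(k, len(idx) - 1))
        n_keep = max(1, int(np.ceil(keep_ratio * len(idx))))
        keep[idx[np.argsort(info)[::-1][:n_keep]]] = True
    return X[keep]
```

Working cluster by cluster keeps every K-NN search local, which is the computation-time motivation stated above.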

Results and Discussion
To validate the efficiency of the two clustering algorithms FCM and KM in our simplification method, we use 3D models that represent real objects, such as Max Planck (fig. 1,b) and Atene (fig. 1,a). Fig. (2,a) and (2,b) show simplification results on various point clouds using the FCM algorithm; fig. (3,a) and (3,b) show the corresponding results using the KM algorithm. In this section, we validate the effectiveness of our proposed method by comparing the original and simplified point clouds and, accordingly, the original and simplified meshes.
Thereafter, we compare the original mesh with the one created from the simplified point cloud. To reconstruct the mesh, we use the Ball Pivoting method [31,29] or the method of A.M. Hsaini et al. [32]. Then, to measure the quality of the obtained meshes, we compute the quality of the triangles using the compactness formula proposed by Gueziec [33]:

$$c = \frac{4 \sqrt{3} \, a}{\sum_{i=1}^{3} l_i^2}$$

where $l_i$ are the lengths of the edges of the triangle and $a$ is its area. Note that this measure equals 1 for an equilateral triangle and 0 for a triangle whose vertices are collinear. According to [34], a triangle is of acceptable quality if $c \geq 0.6$. In figures 4 and 5, we present the triangle compactness histograms of the two meshes. In each figure, the first line presents the mesh reconstructed from the original point cloud, the second line the one from the point cloud simplified with the FCM algorithm, and the third line the one from the point cloud simplified with the KM algorithm. Note that the evaluation of the mesh quality is achieved through the compactness of the triangles.
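The compactness measure is straightforward to compute per triangle (a small sketch; the function name is our own and the vertex ordering does not matter):

```python
import numpy as np

def compactness(p1, p2, p3):
    """Gueziec compactness c = 4*sqrt(3)*a / (l1^2 + l2^2 + l3^2):
    1 for an equilateral triangle, 0 when the vertices are collinear."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    l2 = sum(np.sum((u - v) ** 2) for u, v in ((p1, p2), (p2, p3), (p3, p1)))
    a = 0.5 * np.linalg.norm(np.cross(p2 - p1, p3 - p1))  # triangle area
    return 4.0 * np.sqrt(3.0) * a / l2
```

Applying this to every triangle of a mesh and histogramming the values yields the compactness histograms of figures 4 and 5.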
According to [34], a mesh is compact if the percentage of its triangles with compactness $c \geq 0.6$ is greater than or equal to 50%. Based on the histograms in figures 4 and 5, we observe that the surfaces obtained from the simplified point clouds are compact surfaces.
Table 1 also shows that the use of the KM and FCM algorithms preserves the compactness of the surfaces. However, the compactness obtained with the FCM algorithm is greater than that obtained with KM for both surfaces. Concerning the number of vertices obtained after simplification, we note that this number is higher in the case of FCM for both models.
It is interesting to note that using FCM produces better results in terms of speed, whereas KM is slower. Tables 3 and 4 show the numerical results obtained by implementing the two algorithms FCM and KM in the simplification method; the main results are the average error, maximal error, and root mean square (RMS) error. Figures 6 and 7 present the differences between the original and simplified meshes using the Hausdorff distance, displayed as a red-green-blue map: in our case red means zero error and blue means high error. Table 3 presents the evaluation of the approximation error for the Max-Planck model, and table 4 presents the same errors for the Atene model. The method using FCM does not reach a high level of simplification and records, in general, the worst results in terms of error, as shown in figures 6.a and 7.a. By contrast, it produces the best results when speed is needed (see table 2). As expected, the KM algorithm yields good results in table 1 in terms of average error, max error, and RMS error; however, it records, in general, the worst results in terms of calculation speed.
We have implemented our simplification method in MATLAB. The calculations are performed on a machine with an i3 CPU at 3.4 GHz and 2 GB of RAM.

Conclusion
This work presents a brief overview of two clustering algorithms, K-means and Fuzzy C-means, together with an empirical comparison of their use. These clustering algorithms are integrated into our method for simplifying 3D point clouds. We have compared the computation time and the precision of the simplified meshes.
From the point of view of accuracy, the results show that K-means gives the best results in terms of error. As for calculation time, the use of the Fuzzy C-means algorithm makes the simplification faster.

Acknowledgment
The Max Planck and Atene models used in this paper are the courtesy of AIM@SHAPE shape repository.