Texture Based Image Retrieval Using Semivariogram and Various Distance Measures

Article history: Received: 03 January, 2021 Accepted: 09 February, 2021 Online: 28 February, 2021


Introduction
Image capture began about a century and a half ago, and image collections have grown rapidly over the past thirty years with the spread of digital technology and internet usage, leading to extremely large databases. In many applications, extracting similar images from a large database is essential. Two approaches are currently used to search an image collection: text-based and content-based. Text-based search uses metadata such as keywords and image descriptions to retrieve similar images; its dependence on such metadata is the main disadvantage of this approach. In contrast, content-based search uses properties of the image itself (color, texture, and shape) to find similar images in a large database. This paper is an extension of work originally presented in a conference [1].
Many researchers have combined basic image features to retrieve the images most comparable to the query image. In [2], color and texture features are used together, with texture information obtained by applying wavelets. The wavelet transform decomposes an image into orthogonal components for better localization of its spatial information. Color information is extracted with a color histogram and texture information with wavelets.
These works show that retrieval rates improve when multiple features, multiple algorithms, and steps such as preprocessing, classification, and segmentation are combined. In this work, a CBIR system is designed using the texture feature. In an image retrieval system, all three features (color, texture, and shape) play an important role [3]. Texture identification is an important part of research in image retrieval and pattern recognition. Based on real-life observations, textures are categorized as regular or irregular and as directional or non-directional. In general, texture is characterized by two main approaches: structural and statistical.
1) Structural approach: texture is analyzed based on the arrangement of texture pixels in a regular or repetitive pattern. 2) Statistical approach: texture is analyzed by studying numerical data of the image; it provides a relative estimate of the arrangement of intensities in an image region [4]. The most commonly used and widely accepted statistical approach for texture classification is the GLCM method [5]. Recently developed methods for texture classification and analysis include block truncation coding [6], Markov random fields [7], which construct image patches using neighboring pixels, local energy histograms [8], hidden Markov models [9], and shearlets with linear regression [10]. Many texture algorithms and methods have been applied to the Brodatz database.
In this paper, the geostatistical measures called the semivariogram and the robust semivariogram are applied to images to extract texture information from image databases. The semivariogram method is mainly applied to remote sensing images to characterize texture. A distance metric computed from the absolute differences between the semivariograms of two images determines the class of a test image. Various distance measures are compared to verify effective retrieval of images.

Motivation
The main purpose of an image retrieval system is to return the images that interest the user. This requires a system that can find similar images in a huge image data set. The search is based on the content of the image.
In this study, a new CBIR system based on the semivariogram is proposed. In general, the semivariogram is applied to remote sensing images in geostatistics. These images are captured by satellites and have a high spatial resolution, and the semivariogram is one of the best methods for analysing the characteristics of such data. Like remote sensing applications, a CBIR system works with a large image database. These databases contain many categories of images with non-uniform distributions of pixel intensities and dissimilar backgrounds. Since the semivariogram is a statistical approach, it works well for characterising the texture intensities of such images. This motivates adapting the semivariogram method to the CBIR system. The main focus of the work is to use a minimal feature set from each image and to reduce the computational time required for the entire image database.

Organisation of the paper
The remainder of this article is organized as follows. Section 2 discusses the related work and Section 3 gives the mathematical model of the proposed system. In Section 4, feature extraction is discussed with an example. Section 5 gives the distance metrics for similarity measurement. Section 6 explains the experimental design and results. Section 7 gives the performance evaluation, followed by the conclusion in Section 8.

Related Work
Using the discrete Wiener-Levy process, [11] shows that the semivariogram captures the spatial correlation of pixels better than the covariance method. The Wiener-Levy process is a mathematical representation of Brownian motion. Covariance fails to extract spatial information in the presence of distortion, whereas the semivariogram strongly characterizes spatial features. In geostatistics, feature analysis is done by fitting the experimental semivariogram to a theoretical semivariogram model, which examines the structure and continuity of the spatial relationship in a random field. Two fitting methods are used in [12]: manual fitting and automatic model fitting. Automatic model fitting uses methods such as maximum likelihood or least squares [13]. In manual fitting, the model is chosen by visual inspection of the experimental semivariogram. For a given waveband, the semivariogram reproduces the pattern clearly and provides a strong description of the pattern in an image.
In general, remote sensing images used for land consolidation have a high spatial resolution. This limits pixel-based analysis and land cover classification based on spectral behaviour. To overcome this problem, texture is extracted using either a fixed window size or a window size that varies with the semivariogram result. Texture classification in microwave images [14] is performed using the semivariogram method. Statistical parameters are determined and used as image features: first- and second-order element difference, first- and second-order element inverse difference, entropy, and uniformity.
An accurate object-oriented classification of QuickBird images is presented in [15], using a set of semivariogram texture features (STF) based on the mean square root pair difference (SRPD). Parameters such as direction, lag distance, and moving window size are considered in the semivariance analysis to classify the texture feature.
A new set of 2-D rotated complex wavelet filters (RCWF) is designed to improve retrieval accuracy using complex wavelet filter coefficients. It produces strong texture information for oriented images in six different directions. The filters are non-separable and oriented, which improves the classification of oriented textures. The authors also combine the dual-tree RCWF and DFWCT to extract texture features in 12 different directions [16].
A new algorithm based on the Gabor filter and the EHD (Edge Histogram Descriptor) has been proposed for texture feature extraction. The two are fused to improve system performance. The Gabor filter and EHD are applied to each of the 40 images, and six bins are then taken from each image. This improves the efficiency of the EHD. As a result, the feature dimension is reduced to 40X6, as mentioned in [17].
Perceptual texture features are considered for the representation and retrieval of images. These features are contrast, directionality, coarseness, and busyness, where busyness is given less importance than the other three. All four perceptual features are used in the experiment, and the results show appreciable scores for retrieval on the Brodatz image database [18]. Another image retrieval system uses a combination of color and texture features. The color feature is the color autocorrelation of the HSV color model, and the texture features are BDIP (block difference of inverse probabilities) and BVLC (block variation of local correlation coefficients). These two texture features effectively measure texture smoothness. The image is divided into 2X2 blocks and the moments are computed in the wavelet transform domain. This gives better accuracy than conventional texture extraction methods [19].
Palm print retrieval using texture features is proposed in [20]. Texture energy is used to identify the global and local texture features of the palm print. Four masks defined in that work extract line segments in the horizontal and vertical directions, which are referred to as the local feature. The local feature provides detailed information and the global feature provides the overall properties of the palm print. The steps involved in the global feature calculation are image alignment, energy computation, boundary tracing, center of gravity calculation, and mapping of the key points.
Content-based image retrieval has been an active research area for the past two decades. Many researchers have proposed ways to enhance system performance by extracting texture features. The most used texture extraction methods include GLCM, the covariance method, and the co-occurrence matrix. In this work, the semivariogram method is applied for texture extraction. The semivariogram is widely used to analyse patterns in remote sensing images. Content-based image retrieval databases contain distinct images with different types of patterns, and the function of the semivariogram is to find the spatial distance between the patterns in an image. The semivariogram approach and spectral distortion measures are applied to image texture retrieval in [21]. That experiment is conducted using the Brodatz texture database, with spectral distortion methods combined with a semivariogram to retrieve similar images, and its effectiveness is also tested on the University of Illinois at Urbana-Champaign texture database. The semivariogram method does not need the mean value of the function to estimate the spatial relationship. Unlike the GLCM and co-occurrence methods, the semivariogram methodology does not require the probability distribution of the random function.
A hierarchical retrieval system using visual contents (color, shape, and texture) is proposed in [22]. A fusion of histogram gradients, the adaptive tetrolet transform, and the auto color correlogram is applied to extract image features and make image retrieval more effective.
The proposed method estimates pattern similarity from the distance between pixels, which plays an important role in recognizing similar images in a large database.

Mathematical Model for Semivariogram and Robust semivariogram
Variance is a measure of the spread of the numbers in a data set. It measures how far each number in the set is from the mean. A small variance indicates that the data points are close to the mean, and a high variance indicates that they are far from it. The variogram is a statistical measure that can be represented graphically in a manner that characterizes the spatial continuity, i.e. the roughness, of a data set. The semivariogram is a standard geostatistical parameter used to find the spatial relationship between the values at two locations as a function of the distance between them. The lag distance h is the separation between paired locations in an image.
Let Z, x, and h be a random function, a spatial location, and a lag distance in the sampling space, respectively. The variogram of the random function is defined as:
$$2\gamma(h) = \mathrm{Var}\left[Z(x) - Z(x+h)\right]$$
where γ(h) is called the semivariogram of the random function. Let z(x_i) be the pixel value, i.e. the intensity, at location x_i of the image, i = 1, 2, ..., n, where n is the sampling size. The experimental semivariogram of the random function is expressed as in [26]:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[z(x_i) - z(x_i+h)\right]^2$$
where N(h) is the number of pairs of locations separated by the lag distance h. The robust semivariogram is introduced because the semivariogram is not robust with respect to skewness and distorted information. The robust semivariogram, denoted γ_R(h), is defined as in [23], [24]:
$$\gamma_R(h) = \frac{\left[\frac{1}{N(h)} \sum_{i=1}^{N(h)} \left|z(x_i) - z(x_i+h)\right|^{1/2}\right]^4}{2\left(0.457 + \frac{0.494}{N(h)}\right)}$$

Extraction of Texture Feature of an Image
The texture feature can be extracted using the semivariogram. The block diagram of the proposed method is shown in fig 1. Consider the image shown in fig 2. To extract texture information, the image is resized to 640X640 pixels and then divided into 9 sub-images of size 215X215. The lag distance h is taken as 20. The semivariogram is calculated for each sub-image, which results in a feature vector of size 20X9. The semivariogram features and robust semivariogram features are illustrated in fig 3 and fig 4. The shape of the variogram near the origin is linked to the smoothness of the phenomenon: if the curve is continuous and differentiable, it indicates a smooth image; if it is not differentiable, it indicates a rough image. This process is applied to all the images in the query database and the search database, and the features are stored for computing the similarity distance.
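The definitions and the block-wise extraction above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`semivariogram`, `robust_semivariogram`, `block_features`) are hypothetical, pixel pairs are taken along rows and columns only, lags 1 to 20 are evaluated so that each sub-image contributes 20 values, and the input is assumed to be a grayscale array that has already been resized. Because 640 is not divisible by 3, the sketch splits the image into nine nearly equal blocks instead of exact 215X215 tiles.

```python
import numpy as np


def _pair_differences(z, h):
    """Differences between pixels separated by lag h along rows and columns.

    Using only horizontal and vertical pairs is an assumption of this sketch.
    """
    d_rows = z[:, h:] - z[:, :-h]  # horizontal pairs
    d_cols = z[h:, :] - z[:-h, :]  # vertical pairs
    return np.concatenate([d_rows.ravel(), d_cols.ravel()])


def semivariogram(z, max_lag=20):
    """Experimental semivariogram gamma(h) for h = 1..max_lag."""
    z = np.asarray(z, dtype=float)
    gamma = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        d = _pair_differences(z, h)
        gamma[h - 1] = np.mean(d ** 2) / 2.0  # 1 / (2 N(h)) * sum of squares
    return gamma


def robust_semivariogram(z, max_lag=20):
    """Robust semivariogram gamma_R(h) for h = 1..max_lag."""
    z = np.asarray(z, dtype=float)
    gamma = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        root = np.sqrt(np.abs(_pair_differences(z, h)))
        n_pairs = root.size
        gamma[h - 1] = root.mean() ** 4 / (2.0 * (0.457 + 0.494 / n_pairs))
    return gamma


def block_features(image, grid=3, max_lag=20, estimator=semivariogram):
    """Split the image into grid x grid blocks and stack one curve per block.

    Returns an array of shape (max_lag, grid * grid), e.g. 20 x 9.
    """
    image = np.asarray(image, dtype=float)
    row_idx = np.array_split(np.arange(image.shape[0]), grid)
    col_idx = np.array_split(np.arange(image.shape[1]), grid)
    curves = [
        estimator(image[r[0]:r[-1] + 1, c[0]:c[-1] + 1], max_lag)
        for r in row_idx
        for c in col_idx
    ]
    return np.stack(curves, axis=1)
```

Calling `block_features(img)` on a 640X640 array returns a 20X9 matrix, and passing `estimator=robust_semivariogram` gives the robust variant with the same shape.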

Distance Metrics for similarity measurements
Euclidean distance: If S(x1, y1) and T(x2, y2) are two feature points of the image, then the Euclidean distance between the two points is given by
$$d_E(S, T) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
Canberra distance: The Canberra distance is a numerical measure of the distance between pairs of points in a vector space. The Canberra distance (CBD) between two vectors s and t is given by
$$d_{CB}(s, t) = \sum_{i=1}^{n} \frac{|s_i - t_i|}{|s_i| + |t_i|}$$
Manhattan distance: If S(x1, y1) and T(x2, y2) are two points, then the Manhattan distance between S and T is given by
$$d_M(S, T) = |x_1 - x_2| + |y_1 - y_2|$$
Chord distance: It is used to find the correlation between two images. If a and b are the feature arrays of the two images, then the chord distance, in its commonly used form, is given by
$$d_C(a, b) = \sqrt{2 - 2\,\frac{\sum_{i} a_i b_i}{\lVert a \rVert_2 \, \lVert b \rVert_2}}$$
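The four measures can be sketched for feature vectors of any length as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the 20X9 feature matrices are assumed to be flattened before comparison, Canberra terms with a zero denominator are skipped by convention, and the chord distance follows the commonly used cosine-based form, which is an assumption about the variant intended here.

```python
import numpy as np


def _as_vector(v):
    """Flatten a feature array (e.g. a 20 x 9 matrix) into a 1-D float vector."""
    return np.asarray(v, dtype=float).ravel()


def euclidean(s, t):
    s, t = _as_vector(s), _as_vector(t)
    return float(np.sqrt(np.sum((s - t) ** 2)))


def manhattan(s, t):
    s, t = _as_vector(s), _as_vector(t)
    return float(np.sum(np.abs(s - t)))


def canberra(s, t):
    s, t = _as_vector(s), _as_vector(t)
    num = np.abs(s - t)
    den = np.abs(s) + np.abs(t)
    mask = den > 0  # terms of the form 0/0 are skipped (a common convention)
    return float(np.sum(num[mask] / den[mask]))


def chord(a, b):
    a, b = _as_vector(a), _as_vector(b)
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clamp at zero to guard against tiny negative values from rounding.
    return float(np.sqrt(max(0.0, 2.0 - 2.0 * cosine)))
```

Note that the chord distance is undefined when either vector is all zeros, so a caller would need to handle perfectly flat feature vectors separately.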

Experimental Design and Discussion

Experiment with three databases using Euclidean distance
The image retrieval experiment is conducted using natural and object-based image databases, namely Corel 1K, Corel 10K, and COIL 100. The images in these databases are used to investigate the performance of the proposed method. The Corel 1K database contains 10 categories, such as people, beach, bus, dinosaur, and roses, and each category has 100 images. Each image is of size 256X384 or 384X256. The database is divided into a query database of 300 images, i.e. 30 images from each category, and a search database of 700 images, i.e. 70 images from each category.
The proposed method is also examined using the Corel 10K database [6]. This database contains 10000 natural images in 100 categories. Each image is of size 126X187 or 187X126. The images in this database have dissimilar and unstructured patterns. Some of the categories are dolls, doors, sunset, and mountains. Each category has 100 images, which are divided into a query database of 3000 images, i.e. 30 images from each class, and a search database of 7000 images, i.e. 70 images from each class.
The COIL 100 dataset is also used to investigate the performance of the proposed image retrieval system. This database contains 7100 images in 100 categories. Each image is of size 128X128. The database consists of object-based images such as a hammer, car, toy, and fruits with a uniform background, and these images are very similar in pattern. It is divided into a query database of 1000 images, i.e. 10 images from each class, and a search database of 6100 images, i.e. 61 images from each class.
The system is designed to show the top ten retrievals of similar images for each query image. Similarity matching is based on the shortest distance score, measured using the Euclidean distance between the query image and each database image. If the system retrieves images similar to the query, we say that it has retrieved the target images; otherwise, it has failed to retrieve them. The performance of the system is examined using average precision and average recall. Table 1 shows the precision and recall values for the three databases using the semivariogram method with the lag distance set to 20.
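The top-ten matching step described above can be sketched as a nearest-neighbour ranking. This is a minimal sketch assuming the features of the search database are already stored in one array; the function name `retrieve_top_k` and its parameters are hypothetical, not taken from the paper.

```python
import numpy as np


def retrieve_top_k(query_feat, db_feats, k=10):
    """Return indices and distances of the k database images closest to the query.

    query_feat: feature array of the query image (any shape, e.g. 20 x 9).
    db_feats:   array whose first axis indexes the database images.
    """
    q = np.asarray(query_feat, dtype=float).ravel()
    db = np.asarray(db_feats, dtype=float)
    db = db.reshape(db.shape[0], -1)  # one flattened feature vector per image
    dists = np.sqrt(np.sum((db - q) ** 2, axis=1))  # Euclidean distance
    order = np.argsort(dists, kind="stable")[:k]  # smallest distance first
    return order, dists[order]
```

The returned indices can be mapped back to image labels to check how many of the ten retrieved images belong to the query class.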
The retrieval rates are also computed on the complete Corel 1K dataset using four distance metrics. The retrieval test is done using the semivariogram and robust semivariogram methods with the lag distance h set to 20, and the distance measures are compared for both methods. Table 2 and Table 3 show the average precision and average recall values for the four distance measures. Examples of retrieval with these measures are shown in fig 12, fig 13, fig 14, and fig 15. Fig 11 shows the query images.

Experiment with lag distance 'h' for the Corel-1k database using Euclidean distance
The semivariogram is a function of the lag distance h, and the dissimilarity it measures increases with h. If the distance is too large, it fails to identify the patterns of the image. In this experiment, the lag distances chosen are 20 and 30. As the distance increases, textural information is lost; if the distance is small, strong textural information can be extracted. Table 4 shows the time taken to calculate the semivariogram and robust semivariogram for a single image. Table 5 shows the precision and recall values for retrieval on the Corel 1K database with lag distances 20 and 30. From Table 5 we can see that the system performs better at the lower lag distance. From Table 4, we can see that as the distance increases, computation time increases and the amount of information decreases. A smaller distance yields more information and retrieves more similar images, i.e. the lower the distance, the higher the continuity. For the lag distance h = 20, the proposed method results in effective retrieval of similar images.
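A timing comparison of this kind can be sketched as below. This is only an illustration under assumptions: the helper names are hypothetical, pixel pairs are taken along rows and columns, and lags 1 to the chosen maximum are evaluated, so a larger maximum lag means more lag values to compute. Measured times depend on the machine and will not reproduce Table 4.

```python
import time

import numpy as np


def semivariogram(z, max_lag):
    """Experimental semivariogram for lags 1..max_lag (row and column pairs)."""
    z = np.asarray(z, dtype=float)
    gamma = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        d = np.concatenate([
            (z[:, h:] - z[:, :-h]).ravel(),  # horizontal pairs
            (z[h:, :] - z[:-h, :]).ravel(),  # vertical pairs
        ])
        gamma[h - 1] = np.mean(d ** 2) / 2.0
    return gamma


def time_semivariogram(image, max_lag):
    """Return the semivariogram curve and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    gamma = semivariogram(image, max_lag)
    elapsed = time.perf_counter() - start
    return gamma, elapsed
```

Running `time_semivariogram(img, 20)` and `time_semivariogram(img, 30)` on the same image gives the two curves and their timings; the first 20 values of the longer curve coincide with the shorter one.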

Retrieval results
The feature vectors are stored and tested with four distance metrics. The results show that the Euclidean distance is best suited for retrieving similar images from the image database, giving better retrieval than the other three distance measures. The robust semivariogram performs better than the semivariogram method because the semivariogram is not robust against the distribution and distortion of the pixel values. Results are tabulated for the top ten retrieved images. For the Dinosaurs, Rose, and Horse classes, all ten of the top ten retrieved images belong to the query class. The results of the semivariogram method for the image databases are tabulated in Table 1. The top ten retrieved images for the COIL 100, Corel 1K, and Corel 10K databases are shown in fig 6, fig 8, and fig 10 respectively. The query samples for these retrievals are shown in fig 5, fig 7, and fig 9. For the COIL 100 database, the proposed technique is effective in retrieving similar images for all the categories present in the database. The semivariogram method also works well for the Corel 1K and Corel 10K databases, which contain highly dissimilar image patterns.

Performance Evaluation and comparison
In content-based image retrieval, most researchers use precision and recall as the evaluation metrics. Precision is the proportion of relevant images retrieved over the total number of images retrieved, and recall is the proportion of relevant images retrieved over the total number of relevant images in the database. Precision and recall are defined as
$$\text{Precision} = \frac{R}{R_1}, \qquad \text{Recall} = \frac{R}{R_2}$$
where R is the number of relevant images retrieved, R1 is the total number of images retrieved, and R2 is the total number of relevant images in the database. The proposed method gives an efficiency of 77% for the Corel 1K database, 32.3% for the Corel 10K database, and 85.8% for the COIL 100 database, as shown in Table 1.
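As a worked illustration of these two definitions, the sketch below computes precision and recall for one query from the class labels of the retrieved images. The function name and arguments are hypothetical. With ten retrieved images and 70 relevant images per category in the Corel 1K search database, recall for a single query cannot exceed 10/70, roughly 0.143, even when precision is 1.

```python
def precision_recall(retrieved_labels, query_label, n_relevant_in_db):
    """Precision R/R1 and recall R/R2 for a single query.

    retrieved_labels: class labels of the retrieved images (R1 = their count).
    query_label:      class label of the query image.
    n_relevant_in_db: number of images of the query class in the database (R2).
    """
    r = sum(1 for label in retrieved_labels if label == query_label)
    r1 = len(retrieved_labels)
    precision = r / r1 if r1 else 0.0
    recall = r / n_relevant_in_db if n_relevant_in_db else 0.0
    return precision, recall
```

Averaging these per-query values over all query images gives the average precision and average recall reported for a database.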
An experiment is conducted to investigate which distance measure is best suited to measuring the closeness of the query image to the database images. We compare the performance of the CBIR system in terms of precision and recall values using the Euclidean, Canberra, Manhattan, and Chord distances. The tabulated results show that the Euclidean distance is best suited for retrieving images from a large database with the proposed approach, whereas the Chord distance shows very low performance compared to the Euclidean, Canberra, and Manhattan distances.

Comparison with existing methods
The proposed method is compared with some existing methods on the Corel 1K database. The proposed method performs better than the existing CBIR methods in [12]-[25]. It yields average precision values of 0.77 and 0.78 for the semivariogram and the robust semivariogram, respectively. Table 6 shows the comparison with existing methods on the Corel 1K database.
For the Corel 10K database, the experiments are conducted in the same way as for the Corel 1K database. In this work, 100 categories are used to check the system performance, whereas the work in [26] uses 20 image categories. As in the experimental set-up of [27], [28], the number of retrieved images is set to 10. Table 7 compares the proposed method with existing methods on the Corel 1K and Corel 10K databases.
The results on the COIL 100 database are compared with existing methods in Table 8. The earlier schemes show good results by considering only a few categories of the COIL 100 database. In our work, the results are tabulated for all the categories present in the COIL 100 database.
Table 7 and Table 8 show the average precision values for the best retrievals of image categories. The authors of [29], [30] present system performance that gives the best results for some categories of the image database. The number of categories does not play a role in system performance, which mainly depends on the right choice of feature and similarity measurement. The proposed work is tested on all the categories of the image databases and is capable of retrieving the most similar images for all the categories, except for the Corel 10K database.

Conclusion
The semivariogram and robust semivariogram techniques are applied to color images for feature extraction and retrieval from a large database. These techniques give promising texture features for the images. Both methods are analysed and compared using different distance metrics. The experiment is tested on three databases containing images of natural scenes, color, texture, and objects. The results are computed in terms of precision and recall. The precision rate achieved is 85% for the COIL 100 database and 76% for the Corel 1K database. Improved performance is observed with the Euclidean distance measure over the other three distance measures. The experimental results show that the semivariogram is effective for content-based image retrieval and that its use is not limited to remote sensing images.