Image Tag Recommendation based on Ranked Categorical Nearest Neighbors and Weighted Tag Features

Article history: Received: 16 October, 2020; Accepted: 24 November, 2020; Online: 16 December, 2020

Abstract
The huge number of images on image sharing websites poses challenges for the classification and retrieval of images. On many image sharing websites, users can assign tags to an image that describe its contextual and visual content. However, ambiguous or incorrect tags appear frequently and degrade the performance of an image retrieval system. Thus, assigning appropriate tags to images plays a very important role in image retrieval and classification. In this paper, the ITR-WTF image tag recommendation method is proposed, which explores tags from the ranked nearest neighbors of each category. For a given input image, the method first determines the neighbors among the training images of each category and ranks them according to their distance from the input image. In the second step, a weight is assigned to each tag based on the vote from each neighbor. Finally, the weighted tag frequency is determined to recommend appropriate tags for the given image. The experimentation is done on two datasets: a self-generated dataset and NUS-WIDE. The proposed ITR-WTF method achieves better results than the existing tag recommendation methods.


Introduction
The rapid development of advanced technology and the high usage of social media have created a large repository of images, which poses many challenges for an image retrieval system. On many image sharing social websites, users upload images for faster communication or to find people with the same interests. The images on these websites are associated with tags assigned by the users. The tags describe the visual content of the images along with context information such as the location and time at which the images were captured. These tags are used for indexing during image search. Thus, tags play a very important role in tag-based image retrieval and classification.
Users often assign tags that are imprecise or irrelevant to the image. According to a survey, only 50% of Flickr tags describe the content of the images [1]. Irrelevant tags degrade the performance of classification and image retrieval systems. Hence, an algorithm that assigns correct tags to images is needed: by suggesting proper tags, it improves the accuracy of the image retrieval system. Tag suggestion also reduces the cost of manual annotation and the number of spelling mistakes.
Several studies have addressed tag recommendation using visual content, tags and metadata. Still, the performance of tag recommendation is not satisfactory because of personalized tags, as shown in Figure 1. Such tags may not describe the content of the image and may instead reflect the user's perspective. For example, the tags Denmark and 2011 do not describe the visual content of the image shown in Figure 1.
In this paper, the ITR-WTF image tag recommendation method is proposed, with the following contributions:
• Ranked visual neighbors of each category are obtained for each feature.
• The tag score is calculated by combining the weighted tf-idf value with the votes from the ranked categorical nearest neighbors to recommend the top k tags.
The paper is organized as follows. Existing methods for tag recommendation are described in Section 2. Section 3 explains the proposed ITR-WTF method for image tag recommendation. Sections 4 and 5 describe the metrics used for evaluating the performance of the proposed method and the datasets used for experimentation, respectively. Section 6 describes the experimental results. Finally, Section 7 gives the conclusion and future work.

Work Done
The methods for tag recommendation fall into three categories: classification-based, semantic and nearest neighbor-based.
In classification-based methods, the features of the images are extracted and classified into a category using a classification algorithm. Training the classifier over multiple categories results in a multi-class problem. Finally, the tags are recommended based on the category of the input image. In [2], a method was proposed for labeling images. The images were segmented into regions and the salient regions were identified. Based on particle swarm optimization, the SVDD was trained to assign labels to the images by assigning more weight to the salient regions. In [3], a method was proposed for image annotation using a probability and weight based SVM classifier. In [4], given an image with a tag, the method identifies the related and unrelated images using majority voting from SVMs. In [5], a system named 'SheepDog' was developed which identifies suitable groups for the inclusion of photos and suggests appropriate tags to users on the Flickr dataset.
In semantic methods, the tags are recommended based on the joint distribution of image and tag features. A method was proposed in [6] for image annotation using the KCCA framework, constructing a semantic space in which the correlation between image features and tag features is built. In [7], an approach for retagging social images with diverse semantics was presented. Both the relevance of a tag and its semantic compensation to the already determined tags were fused to determine the final tag list for a given image. A method was proposed in [8] to recommend tags for geotagged images using a unified subspace that correlates the textual and visual features. A hyper-graph-based method was proposed for tag-based image retrieval using image features and tag features simultaneously [9].
The nearest neighbor-based tag recommendation methods are very popular due to their effectiveness. They are based on the assumption that images with similar features tend to have the same tags. Given an input image, such a method determines the k nearest neighbors by combining various features using either early or late fusion, and calculates the tag relevance score by collecting votes for a tag from the nearest neighbors. The advantages are that the method is scalable and that no model building is needed, since predictions are made directly from the training data. A tag recommendation method using a random walk on a bipartite graph, constructed from weighted user and image nearest neighbors, was proposed in [10]. In [11], a Bayesian image annotation model based on semantic nearest neighbors was proposed. A method for image annotation was proposed in [12] using a variation of the traditional kNN algorithm that defines a matrix describing the relationship between labels and images. In [13], given an image, similar images were determined using k nearest neighbors; a tag graph was then created from the tags of the neighbors and clustered to assign labels to the image. A personalized image tag recommendation method based on a neighbor voting scheme was proposed in [14]; it builds a tripartite graph to represent the relationship between users, tags and images. The VS-KNN method was proposed in [15] for image labeling; it explores image features and label features simultaneously as a maximum a posteriori estimation. In [16], given an image with label l, the method identifies the images labeled with l using kNN and denotes them by the set S. The labels are then assigned to the images by calculating the similarity between the input image and the set S. A tag relevance method that assigns a weight to each neighbor based on distance using kNN was proposed in [17].
A method was proposed in [18] to suggest tags for an image based on visual features and tag correlations using a neighbor voting scheme. A photo tagging method using the history of the users was proposed in [19]. The method finds the geographical, visual and time neighbors of a given image and recommends tags by accumulating votes for each tag from the three types of neighbors. In [20], a method was proposed to suggest tags for images with and without labels. The method first identifies a set of k images using feature-based k nearest neighbors. It then assigns or recommends tags based on the difference between the tag frequency count in the entire database and in the k neighbors. The image annotation algorithm in [21] identifies rank-based and weight-based nearest neighbors and suggests labels for an input image using a probabilistic model.
In this paper, the proposed method recommends tags based on the nearest neighbor method. Compared with the classification and semantic based method, the nearest neighbor method is popular due to its effectiveness and scalability. Also, it does not require any training.
However, the performance of the existing nearest neighbor-based tag recommendation methods depends on the number of neighbors k and may suffer because neighbors receive equal weight and vote irrespective of their class. This observation motivated us to develop a method which improves the performance of tag recommendation. The method first determines the weighted categorical nearest neighbors and then suggests tags based on the ranked categorical neighbors and the weighted tag frequency.

Research Methodology
In this section, the proposed ITR-WTF tag recommendation algorithm is described. The main objective of the proposed algorithm is to improve the accuracy of tag recommendation by identifying distance-based nearest neighbors and ranking the tags by combining the image score and the tag score.

ITR-WTF Tag Recommendation Method
The block diagram of the proposed method for tag recommendation is shown in Figure 2. The proposed method consists of three main modules: feature representation, classification and tag recommendation. Features play a very important role in image representation. Several researchers have worked on feature extraction methods for image retrieval using color, texture and shape features. Extracting effective features and representing them efficiently is very important.
Color is the most used feature in an image retrieval system. Using color features, a human can recognize most images and objects included in the images. Also, the color features are invariant to scaling, translation and rotation of an image. Another important feature is texture. Texture measures look for visual patterns in images and how they are spatially defined.
During the training phase, the features are extracted using color moments and the wavelet packet transform [22,23]. The first, second and third moments are extracted as color features in the L*a*b* color space by segmenting an image into two-by-two sub-blocks along with a central sub-block of the same size, resulting in 9 features for each sub-block (three moments for each of the three channels). For texture feature extraction using the wavelet packet transform, an image is decomposed into sub-bands up to level three using the Daubechies wavelet. The energy and standard deviation of each band are determined as texture features using Eq. 1 and 2. Each feature has a different range of values. To prevent features with larger ranges from dominating, the features are normalized to the range 0 to 1 using min-max normalization.
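The color feature extraction and normalization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the image has already been converted to L*a*b* and is given as an H x W x 3 NumPy array, it assumes the third moment is taken as the cube root of the mean cubed deviation (a common convention for color moments), and the function names are hypothetical.

```python
import numpy as np

def block_moments(block):
    """Mean, standard deviation and skewness per channel (9 values)."""
    pixels = block.reshape(-1, block.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    # Assumed convention: cube root of the third central moment.
    skew = np.cbrt((centered ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])

def color_moment_features(lab_image):
    """Concatenate moments of the four quadrants plus a central block."""
    h, w = lab_image.shape[:2]
    hh, hw = h // 2, w // 2
    blocks = [
        lab_image[:hh, :hw], lab_image[:hh, hw:],
        lab_image[hh:, :hw], lab_image[hh:, hw:],
        # Central block of the same size as one quadrant.
        lab_image[h // 4: h // 4 + hh, w // 4: w // 4 + hw],
    ]
    return np.concatenate([block_moments(b) for b in blocks])

def min_max_normalize(features):
    """Scale each column of an (n_images, n_features) matrix to [0, 1]."""
    features = np.asarray(features, dtype=np.float64)
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (features - lo) / span
```

With five sub-blocks and nine moments per sub-block, the color descriptor in this sketch has 45 dimensions. In practice the minimum and maximum would be computed on the training images and reused for test images.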
There are two methods for combining features: early and late fusion. In early fusion, the individual features are combined before the image similarity score is calculated. In late fusion, an image similarity score is determined for each feature, and the individual scores are combined to calculate the final score. Late fusion has a higher computational cost [24]. For this reason, the early fusion technique is used to determine the image similarity score.
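The difference between the two fusion strategies can be illustrated with a short sketch. The Euclidean distance and the equal default weighting in the late-fusion variant are assumptions made for illustration only; the function names are hypothetical.

```python
import numpy as np

def early_fusion_distance(color_a, texture_a, color_b, texture_b):
    """Concatenate the features first, then compute a single distance."""
    a = np.concatenate([color_a, texture_a])
    b = np.concatenate([color_b, texture_b])
    return float(np.linalg.norm(a - b))

def late_fusion_distance(color_a, texture_a, color_b, texture_b, w_color=0.5):
    """Compute one distance per feature type, then combine the scores."""
    d_color = np.linalg.norm(color_a - color_b)
    d_texture = np.linalg.norm(texture_a - texture_b)
    # Assumed combination rule: weighted sum of the per-feature distances.
    return float(w_color * d_color + (1.0 - w_color) * d_texture)
```

Early fusion needs one distance computation per image pair, whereas late fusion needs one per feature type plus a combination step, which is consistent with the higher cost noted above.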

Nearest neighbor:
After feature extraction, the nearest neighbor classifier determines the neighbors of a test image. In this classifier, the neighbors for each class/category are determined as shown in Figure 3 and combined to form the final set of neighbors. The harmonic mean of the distances to the neighbors of each category is determined, and the category with the smallest mean is predicted [25]. To improve the effectiveness of the feature-based neighbors, a weight is assigned to each neighbor so that nearby training images receive more weight and training images that are farther away receive less weight.
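A minimal sketch of category-wise neighbor search with a harmonic-mean decision rule is given below. It assumes Euclidean distances on the fused feature vectors and interprets the harmonic mean as being taken over the distances to the k neighbors of each category; the function names and the small `eps` guard are illustrative choices, not part of the original description.

```python
import numpy as np

def categorical_neighbors(query, train_x, train_y, k=3):
    """Return {category: [(train_index, distance), ...]} holding the k closest
    training images of every category, sorted by increasing distance."""
    train_x = np.asarray(train_x, dtype=np.float64)
    train_y = np.asarray(train_y)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=np.float64), axis=1)
    neighbors = {}
    for category in np.unique(train_y):
        idx = np.flatnonzero(train_y == category)
        order = idx[np.argsort(dists[idx])][:k]
        neighbors[category] = [(int(i), float(dists[i])) for i in order]
    return neighbors

def predict_category(neighbors, eps=1e-12):
    """Pick the category whose neighbor distances have the smallest harmonic mean."""
    def harmonic_mean(pairs):
        d = np.array([dist for _, dist in pairs]) + eps  # eps avoids division by zero
        return len(d) / np.sum(1.0 / d)
    return min(neighbors, key=lambda c: harmonic_mean(neighbors[c]))
```

Because neighbors are collected per category, every category contributes candidates even when one category dominates the region around the query, which is the property that distinguishes this scheme from a single global k nearest neighbor search.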

ITR-WTF Model:
Once the feature-based neighbors are determined, the method assigns an importance to each tag based on the image similarity score. The importance term denotes the importance of a neighbor image in predicting tag t according to their visual similarity; the associated indicator is equal to one if tag t is associated with the image and zero otherwise. Finally, the score of tag t is calculated from these quantities, where Nt represents the number of images associated with t and NN represents the total number of nearest neighbors.

Algorithm
Step 2: Calculate the distance between TV and TFVj.
Step 3: Merge the nearest neighbors obtained using color and texture features.
Step 4: Assign an importance to each tag based on the image similarity score.
Step 5: Calculate the w-tfidf of each tag.
Step 6: Rank the tags according to the w-tfidf values and select the tags with the top k values.
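Steps 4 to 6 can be sketched as follows. The exact weighting and w-tfidf formulas are not recoverable from the text here, so this is only one plausible instantiation under stated assumptions: each neighbor votes with weight 1/rank, Nt is counted among the neighbors, and the inverse-frequency factor is log(1 + NN / Nt). The function name and data layout are hypothetical.

```python
import math
from collections import defaultdict

def recommend_tags(ranked_neighbors, image_tags, top_k=5):
    """ranked_neighbors: list of image ids, closest first (all categories merged).
    image_tags: dict mapping image id -> set of tags."""
    nn = len(ranked_neighbors)
    votes = defaultdict(float)   # rank-weighted vote per tag
    counts = defaultdict(int)    # N_t: number of neighbors that carry the tag
    for rank, image_id in enumerate(ranked_neighbors, start=1):
        weight = 1.0 / rank      # assumed weighting: closer neighbors count more
        for tag in image_tags[image_id]:
            votes[tag] += weight
            counts[tag] += 1
    # Assumed w-tfidf form: weighted vote times a smoothed inverse frequency.
    scores = {
        tag: votes[tag] * math.log(1.0 + nn / counts[tag])
        for tag in votes
    }
    # Sort by decreasing score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]
```

A call such as `recommend_tags([1, 2, 3], {1: {"sea", "boat"}, 2: {"sea"}, 3: {"sky"}}, top_k=2)` returns the two highest scoring tags among the three neighbors. Other weightings, for example weights derived from distances rather than ranks, fit the same structure.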

Performance Metric
To evaluate the performance of the proposed method, we use NDCG, Precision, Recall and F1-score.
Given an image with ranked tag list T1, T2, . . . , Tn, the NDCG is computed as
$$N_n = Z_n \sum_{i=1}^{n} \frac{2^{r(i)} - 1}{\log(1 + i)}$$
where r(i) is the relevance level of the i-th tag and Zn is a normalization constant chosen so that the optimal ranking's NDCG score is 1. After computing the NDCG measure of each image's tag list, we average them to obtain an overall performance evaluation of the tag ranking method.
rel(i) is a binary indicator, which is equal to one if the i th tag in the ranking list is relevant to an input image, and zero otherwise.
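The four metrics can be computed as in the sketch below. It assumes the widely used NDCG gain (2^r - 1) with a logarithmic discount log(1 + i), normalizes by the ideal ordering of the same relevance levels, and uses base 2 for the logarithm as an arbitrary choice (the base cancels in the normalized score). Precision, Recall and F1-score use the standard set-based definitions with binary relevance. Function names are hypothetical.

```python
import math

def ndcg(relevance, log_base=2.0):
    """relevance: relevance levels r(i) of the recommended tags, in ranked order."""
    def dcg(levels):
        return sum((2 ** r - 1) / math.log(1 + i, log_base)
                   for i, r in enumerate(levels, start=1))
    ideal = dcg(sorted(relevance, reverse=True))  # plays the role of 1 / Z_n
    return dcg(relevance) / ideal if ideal > 0 else 0.0

def precision_recall_f1(recommended, relevant):
    """recommended: ranked list of tags; relevant: set of ground-truth tags."""
    hits = sum(1 for tag in recommended if tag in relevant)  # sum of rel(i)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1
```

Per-image scores would then be averaged over all test images to obtain the overall figures.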

Dataset
Two datasets are used: a self-generated dataset and NUS-WIDE. The self-generated dataset consists of images collected from the Flickr image sharing website belonging to different categories such as fish, actor, aeroplane, butterfly and autumn; each category consists of 300 images. The images are resized so that the maximum width or height is 320 pixels. Eight categories of images from the NUS-WIDE dataset are used for experimentation. The images are divided into training and testing groups using 10-fold cross validation.
For the self-generated dataset, the tags associated with the images are collected from Flickr using the public API. However, some tags do not describe the image content. Therefore, tags related to year, camera and brands are excluded from the tag list.
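A 10-fold split of the kind mentioned above can be generated as in the following sketch. The shuffling seed and the function name are illustrative assumptions; the paper does not state how the folds were drawn.

```python
import numpy as np

def k_fold_indices(n_images, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each image is tested exactly once."""
    order = np.random.default_rng(seed).permutation(n_images)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]
```

For a category of 300 images, each fold in this sketch holds out 30 images for testing and keeps 270 for training.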
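The tag cleaning step can be sketched as a simple filter. The brand and camera list below is purely illustrative (the actual exclusion list is not given in the text), and the year pattern assumes four-digit years starting with 19 or 20.

```python
import re

# Illustrative stop list only; the list used in the experiments is not specified.
CAMERA_AND_BRAND_TAGS = {"canon", "nikon", "sony", "iphone", "dslr"}
YEAR_PATTERN = re.compile(r"^(19|20)\d{2}$")

def clean_tags(tags):
    """Lowercase tags, drop years, camera and brand names, and remove duplicates."""
    cleaned = []
    for tag in tags:
        t = tag.strip().lower()
        if not t or YEAR_PATTERN.match(t) or t in CAMERA_AND_BRAND_TAGS:
            continue
        if t not in cleaned:  # keep the first occurrence, preserve order
            cleaned.append(t)
    return cleaned
```

For example, `clean_tags(["Autumn", "2011", "Canon", "forest", "forest "])` keeps only `autumn` and `forest`.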

Experimental Results
The performance of traditional k nearest neighbor and of the category-based nearest neighbor is shown in Tables 1 and 2, respectively. From Table 1 and Table 2, it is observed that the nearest neighbor based on each category performs better than traditional k nearest neighbor.
Figure 4 shows that the NDCG value of the tag recommendation result is high when the number of recommended tags is 5 for the self-generated dataset and 15 for the NUS-WIDE dataset. Table 3 shows the performance of the existing and proposed tag recommendation algorithms. The proposed tag recommendation algorithm performs better, as it recommends tags with a higher NDCG score.
Tagvoting method [18]: In this method, the feature based similar images are determined and the tags are recommended to an input image based on the frequency of tags that appeared in k visually similar images. The method assigns a uniform weight to each neighbor.
TagProp method [20]: In the TagProp method, weights are assigned to each visual neighbor of a query image. The weights are assigned using rank-based and distance-based schemes.
NVote method [19]: In the NVote method, the tags are recommended based on the difference between the global and local tag frequency, with equal weight assigned to each neighbor.
From Table 3, it is observed that the nearest neighbor-based methods for tag recommendation depend on the value of k, the number of neighbors. The existing methods first identify the uniform or weighted neighbors and then consider only tag information for tag voting, which affects the accuracy of tag recommendation. The proposed method improves the accuracy of tag recommendation by combining the image score and the tag score.
Table 4 shows the tag recommendation results obtained using the proposed algorithm on the self-generated dataset. The initial tags do not describe the entire image content. The proposed algorithm recommends relevant tags for the images. For the second image in Table 4, the initial tags do not cover the trees, forest and path between the trees, which are added by the proposed algorithm.
Therefore, the effectiveness of the proposed method ITR-WTF for tag recommendation is demonstrated using examples in Table 4.

Conclusion and future work
In this paper, a method is proposed for image tag recommendation that identifies ranked neighbors from each category. The method improves the accuracy of tag recommendation by combining the tag frequency score and the weighted similarity score of the nearest neighbor images of each category. The experimentation is done on two datasets: a self-generated dataset and NUS-WIDE. The effectiveness of the proposed method is demonstrated in the experimental results.
Future work will focus on: i) exploring the relationship between tags obtained from the nearest neighbors; ii) developing a more optimized approach that works on large datasets; iii) exploring the metadata associated with the images.