Evaluating the Impact of Semantic Gaps on Estimating the Similarity Using Arabic Wordnet

Article history: Received: 30 August, 2020 Accepted: 21 October, 2020 Online: 26 October, 2020


Introduction
In Natural Language Processing (NLP) applications, a common task is to estimate the semantic similarity among words [1]. Lexical resources, such as bilingual and multilingual dictionaries, thesauri, lexical ontologies (wordnets), and machine translation services, are widely used to estimate the similarity [2]. For instance, various tasks in natural language processing, knowledge engineering, and computational linguistics have exploited the lexical and semantic knowledge encoded in the English WordNet (EnWN) [3,4], including sense disambiguation, information retrieval, text summarization, and question answering [5]-[6].
EnWN has been expanded to provide multilingual knowledge in many wordnet projects [7]-[8]. The Arabic WordNet (ArWN) [9] has extended EnWN by translating English synsets. However, English synsets that have no translation in Arabic introduce semantic gaps in ArWN's semantic structure. For instance, the meaning of a synset that contains a single polysemous word is difficult to determine by direct translation; more evidence is required to disambiguate it [10]-[11]. Thus, similarity measures designed for English (i.e., English-based similarity measures) may not be equally effective when applied over resources in other languages; in this work we consider the Arabic language. Experimental findings in [12] showed that ArWN has limited coverage of lexical and semantic knowledge compared to EnWN. Further attempts have been made to improve the content of ArWN [9], [13]-[14]. However, resolving the semantic gaps was not considered. The performance of different similarity measures over ArWN was studied in [15,16]. However, no explicit configuration was stated for calculating the similarity scores, and no explanation was given of how some semantic similarity scores were obtained.
In [17], a preliminary study was conducted to examine the impact of the semantic gaps on estimating the semantic similarity scores using ArWN. The study examined the impact of improving the semantic structure of ArWN on estimating the similarity between Arabic synsets. The semantic gaps were identified and analyzed. New synsets in Arabic were then added to ArWN and mapped to their corresponding synsets in English using an interactive cross-lingual mapping approach [18]. The impact of the enriched ArWN was studied in a semantic similarity experiment using only one English-based semantic similarity measure.
In this paper we extend the previous work presented in [17]; a large-scale experiment is conducted to further examine the degree to which wordnet-based applications can be influenced by improving their semantic structure, mainly considering ArWN. In particular, the main contributions of this work can be summarized as follows.
(ii) Study the extent to which the semantic similarity measures developed for Arabic-based applications perform well compared to English-based similarity measures. A comprehensive comparison between the similarity measures over the different configurations is provided, for both EnWN and ArWN.
The similarity scores obtained from the different measures, in the different settings, are compared to a standard benchmark for Arabic word pairs obtained from the AWSS dataset [23]. Two measures, the Pearson correlation and the Mean Squared Error, are used to quantify the performance of the similarity measures. The reported values indicate the importance of the semantic evidence obtained from the enrichment process and its significant effect on estimating the semantic similarity between words. In addition, the results show that Arabic-based measures perform competitively compared to English-based measures.
The rest of this paper is organized as follows. Section 2 overviews related work on building wordnets and the development of wordnet-based semantic similarity measures. Section 3 describes the approach used to evaluate the impact of semantic gaps on estimating the similarity over ArWN. Section 4 discusses the experiments conducted: the benchmark dataset, the performance measures, and the obtained results. Finally, Section 5 draws some conclusions and outlines future work.

Related works
This section provides an overview of the construction of wordnets and the ArWN contents, and presents the wordnet-based semantic similarity measures that are used in the experiment.

Wordnets overview
Wordnets, also known as lexical ontologies [24], are considered to be a resource of lexical and semantic knowledge, which organize natural language words (lexicons) into synsets. A synset is a collection of synonym words that express one meaning in a specific context (i.e., concept) [3,25].
In wordnets, words are arranged in a lexical database. Words can have several senses, and each sense of a given word is identified by a number and its part-of-speech type. For instance, the sense village#n#2 indicates the second (#2) nominal (#n) sense of the word "village". Words are linked through lexical relations, for example, antonymy and synonymy relations. A word that has more than one meaning is called a polysemous word and can be a member of several synsets. Otherwise, it is called a monosemous word, which is a member of a single synset. For example, the word "village" has three noun senses as defined in EnWN, which are indicated in the following set of synsets: {{village#n#1, small-town#n#1, settlement#n#2}, {village#n#2, hamlet#n#3}, {Greenwich-village#n#1, village#n#3}}.
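The word#pos#number notation used above can be split mechanically. The following is a minimal sketch; the `WordSense` type and the `parse_sense` helper are names introduced here for illustration and are not part of any wordnet API.

```python
from typing import NamedTuple


class WordSense(NamedTuple):
    """A sense label such as village#n#2, split into its three parts."""
    word: str
    pos: str      # part-of-speech tag, e.g. "n" for noun
    number: int   # sense number of the word for that part of speech


def parse_sense(label: str) -> WordSense:
    # Split from the right so that the word itself may contain hyphens.
    word, pos, number = label.rsplit("#", 2)
    return WordSense(word, pos, int(number))


print(parse_sense("village#n#2"))
```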
Synsets are related by semantic relations. The hypernymy and hyponymy relations are considered to be the key semantic relations that form the semantic structure in wordnets. Hypernymy is the inverse of hyponymy. For instance, in Figure 1 the synset {settlement#n#6} is a hypernym of the synset {village#n#2, hamlet#n#3}, while the synset {village#n#2, hamlet#n#3} is a hyponym of the synset {settlement#n#6}. Further, definitions (glosses) are attached to synsets to convey their meaning. For example, the word sense village#n#2 is defined as "a settlement smaller than a town" 1.
The HyperTree of a given synset (i.e., word sense) is defined as the sequence of synsets linked by hypernymy relations, which connect the synset with its ancestor synsets up to the root node. The function HyperTrees(word) produces the set of HyperTrees to which a given word belongs. Figure 1 shows an excerpt of nominal HyperTrees in English and their correspondences in Arabic 2.
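As an illustration of this definition, the sketch below follows hypernymy links from a sense up to the root. The `HYPERNYM` table is a simplified toy fragment made up for this example, not actual EnWN or ArWN content, and it assumes a single hypernym per sense (a sense with several hypernyms would yield several HyperTrees). Depth is counted in nodes, so that the root has depth 1.

```python
# Toy hypernymy links (child -> parent); hypothetical, for illustration only.
HYPERNYM = {
    "poodle#n#1": "dog#n#1",
    "dog#n#1": "animal#n#1",
    "animal#n#1": "entity#n#1",
}


def hyper_tree(sense, hypernym=HYPERNYM):
    """Return the senses from `sense` up to the root, following hypernyms."""
    path = [sense]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    return path


def depth(sense, hypernym=HYPERNYM):
    """Number of nodes on the HyperTree of `sense`; the root has depth 1."""
    return len(hyper_tree(sense, hypernym))


print(hyper_tree("poodle#n#1"), depth("poodle#n#1"))
```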
The Inter-Lingual Index [7], which is considered to be language-independent, has been defined to establish links between different wordnets. For instance, near-equivalence and equivalence semantic relations are used to link synsets from the individual wordnets to the Inter-Lingual Index. Wordnets for several languages have been developed under the guidance of the Global WordNet Association 3, which seeks to organize the creation and linking of wordnets. Further, the Open Multilingual WordNet project [31] offers access to open wordnets in a number of languages, which are all connected to the latest version of EnWN (v3.0) 4.

Arabic wordnet contents
In the construction of ArWN [9], the extend method was adopted. English synsets were translated into Arabic, and the structure of EnWN (v2.0) was inherited by ArWN. The ArWN (v2.0) release 5 contains 11,296 synsets formed from 23,841 Arabic words, including broken plurals, named entities, and roots. Twenty-two types of semantic relations connect the synsets, forming 161,705 semantic links. In comparison with EnWN, which contains 147,306 words (117,659 synsets) 6, one can observe that ArWN has limited coverage in terms of semantic relations and lexicons [12]. To this end, many attempts have been made to enhance the quality of ArWN by expanding its lexical coverage [13,9] or its semantic relations [32,14] using different approaches. The work in [32] was released under the Lexical Markup Framework. However, the public release of ArWN ignores the synsets that are not linked to EnWN [31]. Nevertheless, the synsets (semantic gaps) resolved in this work will be made publicly available 7. In future work we plan to compile an XML format of the enhanced ArWN structure, to enable researchers to use ArWN in different applications.

Wordnet-based similarity measures
Estimating the semantic similarity between concepts has been extensively studied in linguistics, philosophy, and information theory [2,15]. It is a common and crucial task in many NLP applications, such as text summarization, word sense disambiguation, entailment, and machine translation, among many others [33]-[6], [34,35].
The semantic similarity between words is estimated by measuring the similarity between the concepts (synsets) associated with the words [2]. Given two words, one can calculate the semantic similarity by exploiting a wordnet (i.e., a lexical knowledge base). The lexical and semantic knowledge in wordnets has been used in many semantic similarity measures, which were originally designed and evaluated over EnWN (English-based measures) [36,2].
Four broad categories of similarity measures are defined in [15]: path-based similarity measures [2,16,19,20,21,22]; information content similarity measures [37,38]; feature-based similarity measures [39]; and hybrid similarity measures [40,41]. Few works have been concerned with similarity for Arabic: the AWSS measure [22] and the Aldiery measure [16]. These have mainly adapted measures constructed for English. In particular, the Li measure [2] was adapted, which is a path-based measure that considers the depth of concepts in the HyperTrees, the distance between the two compared concepts, and the depth of the least common subsumer (lcs) that subsumes the two compared concepts. Note that these measures require tuning of weighting parameters [22,16]. In this regard, several preliminary experiments are necessary to find the weights that provide the optimal values.
An attempt to investigate the performance of the similarity measures over ArWN was conducted in [15]. The authors studied the performance of seven measures, including the AWSS measure [22]. All measures were applied over 40 word pairs selected from the AWSS dataset [23], which are also used as the benchmark dataset in this work. The experimental findings in [15] showed that the WuP measure [20] has the best performance in estimating the semantic similarity between Arabic word pairs. The experiments in [16] also introduced an Arabic-based similarity measure (the Aldiery measure) that is competitive with the WuP measure.
Figure 2: The adopted approach overview
In [17], the impact of enhancing the HyperTrees on the WuP measure was further studied. This work adopts and extends that experimental configuration and further examines the impact of the enhanced semantic structure of ArWN on six measures, including English-based and Arabic-based path-based measures; further details are provided in Section 4.
Recall that, for given concepts $c_i$ and $c_j$, the function $Sim_m(c_i, c_j)$ calculates the semantic similarity between $c_i$ and $c_j$, where $m$ indicates the name of the measure. The measures used in the experiment are described next.
1. Path measure [19] computes the semantic similarity from the shortest path between the two concepts, found by counting the edges (hypernymy relations) between them. The Path measure, which is considered the pioneering similarity measure, is defined in equation (1).
$$Sim_{path}(c_i, c_j) = \frac{1}{len(c_i, c_j)} \qquad (1)$$
where the length function $len(c_i, c_j)$ returns the length of the shortest path between $c_i$ and $c_j$ in the wordnet semantic hierarchy. For example, in Figure 1, $len(hill\#2, mountain\#1) = 3$ and $Sim_{path}(hill\#2, mountain\#1) = 0.333$.

2. WuP measure [20] calculates the similarity by computing the distance between the two concepts and the maximum depth of the least common subsumer (lcs) that subsumes the two concepts under evaluation. The WuP measure is defined in equation (2).
$$Sim_{WuP}(c_i, c_j) = \frac{2 \cdot d(lcs(c_i, c_j))}{d(c_i) + d(c_j)} \qquad (2)$$
where $d(c_i)$ is the depth of the concept $c_i$ using edge counting in the semantic hierarchy, $lcs(c_i, c_j)$ is the least common subsumer of $c_i$ and $c_j$, and $d(lcs(c_i, c_j))$ is the maximum length between the lcs of $c_i$ and $c_j$ and the root of the hierarchy, where $d(entity) = 1$. For example, in Figure 1, $d(hill\#2) = 7$, $d(mountain\#1) = 7$, $d(lcs(hill\#2, mountain\#1)) = 6$, and $Sim_{WuP}(hill\#2, mountain\#1) = 0.857$.
3. Lch measure [21] uses the length of the shortest path between the two concepts and the maximum depth of the semantic hierarchy of a given part-of-speech type. The Lch measure is defined in equation (3).
$$Sim_{Lch}(c_i, c_j) = -\log\frac{len(c_i, c_j)}{2 \cdot maxDepth_{pos}} \qquad (3)$$
where $maxDepth_{pos}$ is the maximum depth of the hypernymy structure for a given part of speech. For instance, $maxDepth_n$ is 20 and 15 in EnWN and ArWN, respectively. For example, in Figure 1, $Sim_{Lch}(hill\#2, mountain\#1) = -\log(3 / (2 \cdot 20)) = 2.590$.
Note that the Lch scores reported in Section 4 are normalized into the range 0 to 1 by dividing the Lch scores by 3.688; hence, $Sim_{Lch}(hill\#2, mountain\#1) = 0.702$.

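4. Li measure [2] computes the similarity using a non-linear function of the shortest path length between the concepts and the minimum depth of the concepts in the semantic hierarchy. The Li measure is defined in equation (4).
$$Sim_{Li}(c_i, c_j) = e^{-\alpha l} \cdot \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} \qquad (4)$$
where $l = len(c_i, c_j)$ and $h = d(lcs(c_i, c_j))$; these assignments reproduce the example value below. Note that the parameters $\alpha$ and $\beta$ must be tuned manually for good performance. The optimal parameters reported in [2] are $\alpha = 0.2$ and $\beta = 0.6$. For example, $Sim_{Li}(hill\#2, mountain\#1) = 0.548$.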
5. AWSS measure [22] is an Arabic-based measure that adapts the Li measure to compute semantic similarity, modifying the depth and length computation to suit ArWN [23]. The AWSS measure is defined in equation (5).
www.astesj.com
6. Aldiery measure [16] is an Arabic-based measure that also adapts the Li measure to compute semantic similarity, modifying the depth and length computation to suit ArWN. The Aldiery measure is defined in equation (6).
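The worked values quoted for the pair (hill#2, mountain#1) can be checked numerically. The sketch below is not taken from the tooling used in the experiment; the closed forms are the commonly cited definitions of the Path, WuP, Lch, and Li measures, assumed here because they reproduce the quoted scores (0.333, 0.857, 2.590, 0.702, and 0.548). The natural logarithm for Lch, the use of the same length value 3 in the Li form, and all function names are assumptions of this illustration. AWSS and Aldiery are omitted because their exact forms are not stated in the text.

```python
import math


def sim_path(length):
    """Path measure: inverse of the shortest path length."""
    return 1.0 / length


def sim_wup(depth_i, depth_j, depth_lcs):
    """WuP measure: twice the lcs depth over the sum of concept depths."""
    return 2.0 * depth_lcs / (depth_i + depth_j)


def sim_lch(length, max_depth):
    """Lch measure (natural logarithm assumed)."""
    return -math.log(length / (2.0 * max_depth))


def sim_lch_normalized(length, max_depth, divisor=3.688):
    """Lch score divided by the constant quoted in the text."""
    return sim_lch(length, max_depth) / divisor


def sim_li(length, depth_lcs, alpha=0.2, beta=0.6):
    """Li measure: exponential decay in length times tanh of the lcs depth."""
    return math.exp(-alpha * length) * math.tanh(beta * depth_lcs)


# Values quoted for (hill#2, mountain#1): len = 3, depths 7 and 7, lcs depth 6.
print(round(sim_path(3), 3))
print(round(sim_wup(7, 7, 6), 3))
print(round(sim_lch(3, 20), 3))
print(round(sim_lch_normalized(3, 20), 3))
print(round(sim_li(3, 6), 3))
```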
Note that the similarity functions defined above take either words or word senses as parameters. In the first case, the similarity function returns the highest similarity score over all possible combinations of word senses of the two given words. In the second case, it returns the similarity score between the two given senses.
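The word-level case can be sketched as a maximum over all sense pairs. In the example below, `word_similarity` and `toy_sense_sim` are hypothetical names, and the sense labels and scores in `TOY_SCORES` are made up for illustration; any sense-level measure could be passed in place of the lookup table.

```python
from itertools import product


def word_similarity(senses_a, senses_b, sense_sim):
    """Highest score over all sense pairs of two words (None if no senses)."""
    scores = [sense_sim(a, b) for a, b in product(senses_a, senses_b)]
    return max(scores, default=None)


# Hypothetical sense-level scores, for illustration only.
TOY_SCORES = {
    ("w1#n#1", "w2#n#1"): 0.20,
    ("w1#n#1", "w2#n#2"): 0.75,
    ("w1#n#2", "w2#n#1"): 0.40,
    ("w1#n#2", "w2#n#2"): 0.10,
}


def toy_sense_sim(a, b):
    return TOY_SCORES[(a, b)]


best = word_similarity(["w1#n#1", "w1#n#2"], ["w2#n#1", "w2#n#2"], toy_sense_sim)
print(best)
```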
In addition, the six measures defined in equations (1)-(6) are path-based measures. This study focuses on the impact of the structure without interference from other semantic evidence, such as features extracted from corpora, which depend on the quality of the corpus used as well as on the availability of resources in Arabic.
On the other hand, the Path, WuP, and Lch measures are considered linear path-based measures, while the Li measure is a non-linear path-based measure. AWSS and Aldiery are also non-linear path-based measures, which are derived from Li and purposely developed for Arabic.
Observe that no weights need to be tuned for the Path, WuP, and Lch measures, while the other measures require finding the optimal values of their weights. The four English-based measures and the two Arabic-based measures are selected because they achieved good performance against other measures [22,16], and in order to compare the performance of the measures using an Arabic benchmark dataset.

Evaluating the impact of semantic gaps on estimating the similarity
This section presents the approach used to evaluate the impact of enhancing the structure of ArWN on estimating the semantic similarity. Figure 2 illustrates the main phases of the approach, which are explained as follows.
In total, [17] reported that 5,493 (69%) of the 7,960 nominal synsets in ArWN have at least one semantic gap. In particular, compared to the structure of EnWN, the semantic gaps result from 88 synsets that are missing in ArWN.
The frequency distribution of the semantic gaps in ArWN is reported in Table 1, where "Semantic Gaps" refers to the number of synsets that have the reported frequency, and "Freq" indicates the number of HyperTrees that have at least one semantic gap. For instance, the first column reports one English synset ({"physical-entity#1"}) that has no correspondence in Arabic and introduces 4,525 semantic gaps in ArWN, while the 8th column indicates two synsets ({"armed-service#1",. . . }, and {"health-care-provider#1",...}), each of which introduces 30 semantic gaps in ArWN. The last column reports the totals.
2. HyperTrees Improvement. In this phase, the ICLM Web application [18] is used to fill the identified semantic gaps. ICLM is a semi-automatic matching approach that supports feedback provided by multiple users. In ICLM, the number of users asked to perform each mapping task is estimated based on the lexical characterization of the concepts under evaluation, i.e., on an estimate of the ambiguity conveyed by the concepts involved in the mappings [42], with the assumption that as the difficulty of the selection task increases, agreement among more users is required.
The candidate matches of the source concepts in Arabic to the English target concepts are automatically computed using a lexical-based disambiguation algorithm [43]. The study in [42] recommended combining lexical resources, as this improves the quality of translations and provides valuable support for candidate match retrieval in cross-lingual ontology matching problems. Accordingly, translations of the missing synsets are collected by combining lexical knowledge from different external resources. English synset translation was [...]
The difficulty of a mapping selection task, which determines the number of users asked to perform the task, is estimated using lexical characteristics of the concepts under evaluation: ambiguity of lexicalization, synonym richness, and uncertainty in the selection step. The mapping tasks are validated by users based on a CAUTIOUS strategy. The task difficulty is estimated as Low, Mid, or High. One, three, or five users are asked to perform the Low, Mid, or High tasks, respectively.
In [17], ten users (bilingual speakers) were asked to validate the mapping tasks, that is, to fill a semantic gap in ArWN and accordingly define a new link with EnWN, hence importing the semantic relations among the concepts. The ten most frequent semantic gaps are listed in Table 2. As a result, 94% of the identified gaps are resolved; that is, more than 98% of the HyperTrees are filled in.
Observe that some concepts are hard to resolve, and more evidence is needed. For example, the meaning of the synsets {mechanism#3}, {attache#1}, and {climber#1}, each of which contains a single polysemous word, is hard to determine with direct translation and no context [42]; for this reason, users did not reach an agreement in the validation task. Note that the semantic gaps for every word sense in the benchmark dataset used in the experiment are resolved.
3. Calculate Similarity. In this phase, the similarity measures defined in Section 2.3 are applied over ArWN and EnWN using the Arabic benchmark dataset (AWSS dataset [23]).

Experiment
The conducted experiment aims at studying the efficacy of the semantic evidence in ArWN. In particular, the experiment focuses on the improvement of hypernymy relations in the semantic structure of ArWN, and studies the extent to which the semantic structure of ArWN affects measuring the semantic similarity between concepts. This section reports and discusses the results obtained from running a set of configurations for measuring the semantic similarity scores over ArWN and EnWN. The following sections present the tool used to calculate the semantic similarity scores, the benchmark dataset, and the measures used to evaluate the performance of the structure improvement, and discuss the obtained results.

Similarity Measure Tools
Significant efforts are being made to develop similarity measure tools that consume ArWN content, for example, the Java ArWN API 12. This application consumes Arabic words with diacritics (vocalized), whereas the benchmark dataset in this experiment contains unvocalized word pairs (without diacritics). If Arabic words are vocalized, as in the work done in [16,15], then their senses are defined in advance. The experiment's configuration DS (see Section 4.4) studies the effect of determining the word senses on the similarity scores.
To avoid predefined senses, the similarity scores in this experiment are obtained using the WS4J online application 13. In computing the scores, WS4J uses the semantic structure of EnWN (v3.0), which is used to measure the similarity scores between Arabic words. Note that, in this experiment, the Arabic senses under evaluation have the same structure as their corresponding senses in English, as the semantic gaps in ArWN have been resolved and linked to EnWN (v3.0). The similarity scores between the Arabic concepts are then measured using their corresponding concepts in EnWN. In addition, WS4J provides the description of all HyperTrees of the words under evaluation. The HyperTrees returned for EnWN are validated to obtain the HyperTrees of the Arabic words with semantic gaps, as depicted in Figure 1. This information is necessary, for instance, to measure the similarity scores in the uHT configuration; details are provided in Section 4.4.

Benchmark dataset
Similar to the work performed in [15,16], the AWSS benchmark [22] is used in this experiment. The obtained similarity scores are compared with the human judgments obtained from the AWSS dataset [23]. The AWSS dataset contains 70 nominal Arabic word pairs, divided into three similarity levels: Low, Medium, and High. The 40 word pairs selected and used in this experiment, which are also used in [15,16], are listed in Table 3. Note that some words in the benchmark dataset are not covered in ArWN. For instance, the words " " stove, " " wizard, and " " magician are not covered in ArWN; hence, the 3rd and 38th word pairs are not covered in the experiment. The words " " smile and " " Gem are also not covered in ArWN; the words " " and " " are used instead, respectively, to measure the similarity scores.

Performance Measures
The obtained similarity scores are evaluated against the human ratings benchmark (HR), which consists of human-judged similarity scores of Arabic nominal word pairs obtained from the AWSS dataset.
Two measures are used to quantify the performance of the obtained similarity scores. The Pearson correlation coefficient (r) measures the strength of the linear relationship between the obtained similarity scores and HR; the Mean Squared Error (MSE) is the average squared difference between the similarity scores and HR. The best performance is indicated by the smallest MSE value and an r value close to 1. A negative r value means that the obtained scores increase as the HR ratings decrease. In addition, the similarity scores are compared to the performance results reported in [15,16], which are listed in Table 4.
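Both performance measures follow directly from their textbook definitions. The sketch below is a plain-Python illustration; the function names are introduced here, and the score lists in the usage lines are made-up numbers, not values from the benchmark or from the result tables.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def mse(xs, ys):
    """Mean squared error: average squared difference between paired values."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up similarity scores and human ratings, for illustration only.
scores = [0.10, 0.40, 0.80, 0.90]
ratings = [0.05, 0.50, 0.70, 0.95]
print(pearson_r(scores, ratings), mse(scores, ratings))
```

A measure that ranks the pairs like the human raters but on a shifted scale would still obtain a high r while its MSE grows, which is why the two values are read together.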

Experimental settings
Six path-based semantic similarity measures, defined in equations (1)-(6), are applied over the Arabic word pairs benchmark dataset described in Section 4.2. The similarity measures are applied over ArWN and EnWN using the following configurations to quantify the efficiency of the ArWN structure enrichment:
1. UnDefined Senses (uDS): calculates the semantic similarity between given words without determining their senses. In this setting, which is considered the default setting of the similarity measures, the similarity measure returns the maximum score obtained from all possible combinations of the senses of the given words.
2. Defined Senses (DS): calculates the semantic similarity between given word senses (i.e., the senses are determined in advance). Extending the work in [17], the senses of each word pair under evaluation are determined based on a majority vote (consensus) approach. Similar to the tasks of filling the semantic gaps [17,18] (see Section 3), the CAUTIOUS strategy is adopted, where users are not asked to decide among word pairs that share the same words.
3. [...] translations defined in the benchmark dataset. In wnTrans, the maximum similarity score is selected, such that ArWN and EnWN cover the Arabic word and its translation in English, respectively. Otherwise, the default setting uDS is applied.
4. Upper Bound (UB): calculates the semantic similarity between given word senses, such that UB selects the sense pair that maximizes the correlation r values and minimizes the MSE values with respect to the HR ratings (benchmark dataset). UB indicates the optimal scores for the considered experimental settings.
5. Unimproved HyperTrees (uHT): calculates the semantic similarity using ArWN while ignoring the structure enhancement. That is, the semantic gaps are considered in calculating the similarity scores.
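The effect of the uHT setting on concept depths can be illustrated with a toy example. The two root-first HyperTrees below are hypothetical (only {physical-entity#1} is named in the text as a synset without Arabic correspondence), and the helper names `drop_gaps` and `lcs_depth` are introduced for this sketch only.

```python
def drop_gaps(hyper_tree, gaps):
    """Return the HyperTree without the synsets that are semantic gaps."""
    return [synset for synset in hyper_tree if synset not in gaps]


def lcs_depth(tree_a, tree_b):
    """Depth of the deepest shared ancestor of two root-first HyperTrees."""
    depth = 0
    for a, b in zip(tree_a, tree_b):
        if a != b:
            break
        depth += 1
    return depth


# Hypothetical root-first HyperTrees in the improved structure (iHT).
tree_a = ["entity#1", "physical-entity#1", "object#1", "sense-a#1"]
tree_b = ["entity#1", "physical-entity#1", "object#1", "sense-b#1"]
gaps = {"physical-entity#1"}

# Depths of both senses and of their lcs, with and without the gap synset.
iht = (len(tree_a), len(tree_b), lcs_depth(tree_a, tree_b))
uht_a, uht_b = drop_gaps(tree_a, gaps), drop_gaps(tree_b, gaps)
uht = (len(uht_a), len(uht_b), lcs_depth(uht_a, uht_b))
print(iht, uht)
```

In this toy case a single shared gap lowers every depth by one, including the depth of the lcs, which is the quantity that depth-based measures rely on.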

Results & Discussion
Tables 5, 6, 7, and 8 report the semantic similarity scores of the six similarity measures obtained by applying the uDS, DS, wnTrans, and UB configurations over ArWN, respectively. Two variants, uHT and iHT, are considered. The tables also list the Arabic senses and their corresponding senses in English that are used to obtain the similarity scores. Table 9 reports the semantic similarity scores obtained by applying the uDS, DS, and UB configurations over EnWN. English-based [...]
Observe that the word senses are selected differently depending on the applied configuration. For example, for the word boy " ": in Table 5, in the uDS setting, the selected sense is (Sabiy 1, juvenile#1) 14; in the DS (Table 6) and wnTrans (Table 7) settings, the selected sense is (walad 1, boy#1); and in UB [...] lad [...]. For example, the word " " has one sense in ArWN, "tawoqiyE 1", which is mapped to "endorsement#5" in EnWN, while none of the five senses of the word "signature" in EnWN is mapped to ArWN. Note that 28 of the 40 word pairs have at least one missing corresponding sense in EnWN when considering the uHT setting, for example, the similarity scores of the 21st word pair (Hill " "; mountain " "), which is also illustrated in [...]
The performance measures r and MSE are reported for every configuration at the bottom of Tables 5, 6, 7, and 8, including the performance for each similarity level. Observe that the r values show that iHT achieves better performance than uHT, while the MSE values indicate that uHT has a smaller difference in similarity scores than iHT with respect to the HR ratings. In fact, the MSE values are strongly influenced by the semantic gaps in uHT. Note that when the HyperTrees of two senses have the same semantic gaps, the depth of the lcs is reduced, which decreases the similarity scores. This gives a smaller difference in similarity scores with respect to the HR ratings.
In particular, this happens for the MSE values at the mid similarity level. For example, in row 10, the HyperTrees of the word pair (Glass; Diamond) have {physical entity#1} as a semantic gap. That is, d(glass#2) = 9, d(diamond#2) = 8, and d(lcs(glass#2, diamond#2)) = 3, while d(kuwb 1) = 8, d(AlomAs 1) = 7, and d(lcs(kuwb 1, AlomAs 1)) = 2.
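To make the effect concrete, the depth values quoted for this word pair can be substituted into the commonly cited WuP form, $2 \cdot d(lcs) / (d(c_i) + d(c_j))$, which is assumed here. The resulting scores are computed for illustration and are not copied from the result tables.

```latex
\begin{align*}
Sim_{WuP}(\text{glass\#2}, \text{diamond\#2}) &= \frac{2 \cdot 3}{9 + 8} = \frac{6}{17} \approx 0.353 \\
Sim_{WuP}(\text{kuwb 1}, \text{AlomAs 1}) &= \frac{2 \cdot 2}{8 + 7} = \frac{4}{15} \approx 0.267
\end{align*}
```

Under this assumption, the shared gap lowers the score of a mid-level pair, which is consistent with the reduced lcs depth described above.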
Furthermore, the wnTrans configuration scored the worst performance, due to the low Arabic word coverage. A significant finding is that the richness of ArWN content, in terms of the coverage of lexical and semantic relations, has a strong effect on evaluating the semantic similarity between concepts.
Performance measures in [15,16], presented in Table 4, showed that the WuP measure scored the best MSE value, 0.0165, with 0.94 for r; comparatively, the Aldiery measure obtained the values 0.96 [...] In [15,16], semantic similarity scores were reported to be equal to zero for the word pairs in rows 1-9, which are at the low similarity level, and the word pair in row 21 was considered as not covered in ArWN; hence, this increased the r values and reduced the MSE values. However, no explanation is provided.
Overall, the reported performance values show that the enhancement of the semantic structure has a strong effect on estimating the semantic similarity between concepts. Observe that word pairs at the low and mid similarity levels give better r values than those at the high similarity level, while word pairs at the high similarity level give better MSE values. In other words, the similarity measures obtained the best correlation values when the concepts are not similar. For both ArWN and EnWN, the r and MSE measures indicate that the best performance is achieved when word senses are determined in advance, i.e., in the DS configuration. However, it is important to distinguish the approach used to define the senses; in this work, a consensus-based approach is used.
On the other hand, the user-feedback-based approach, i.e., the ICLM application adopted to fill the semantic gaps, shows its effectiveness in selecting the senses, such that the scores obtained in DS are close to the optimal scores achieved with the upper bound setting UB. Further, the Arabic-based Aldiery measure performs better than AWSS, and it also provides a competitive performance in comparison to the WuP measure.

Conclusion & Future Work
Six path-based similarity measures, including English-based and Arabic-based measures, are applied over ArWN and EnWN to examine the effect of improving the lexical and semantic coverage on wordnet-based semantic similarity measures. Two variants of the ArWN structure, uHT and iHT, are considered in the experiment to evaluate the impact of filling the semantic gaps on estimating the semantic similarity. The efficacy of the improved structure is examined by experiments in the context of semantic similarity. The semantic similarity scores for a benchmark dataset, consisting of human ratings for 40 Arabic nominal word pairs, are calculated over ArWN and EnWN in different configurations (uDS, DS, wnTrans, and UB). The obtained performance values indicate the importance of the semantic evidence gained with the enrichment process and its significant effect on estimating the semantic similarity between concepts. Moreover, when considering Arabic-based measures, the experimental results showed that the Aldiery measure performs better than the AWSS measure. In addition, the Aldiery measure provided a competitive performance in comparison to the English-based WuP measure. Finally, the resolved semantic gaps of the new structure are made publicly available. As a future direction, we plan to compile an XML format of the new structure and to integrate it with available ArWN resources (i.e., the ArWN release available at the Open Multilingual WordNet [31]). It is also interesting to study the effect of the semantic gaps on NLP applications, for instance, question answering, similar to the work presented in [44], and word sense disambiguation [33,35] in the context of Arabic.