Comparative Study of Semantic and Keyword Based Search Engines

ARTICLE INFO
Article history: Received: 30 September, 2019; Accepted: 28 December, 2019; Online: 15 January, 2020

ABSTRACT
The volume of data on the Web grows day by day, which makes it difficult to find relevant information. Search engines are among the most successful tools for retrieving information from the Web. They help users find information on the internet; however, locating the exact information within the massive amount of data available on the Web is not an easy task. Semantic Web technology focuses on metadata rather than syntax, which allows semantic search engines to search for the meaning of keywords instead of their syntax. Consequently, the performance of conventional search engines can be improved by raising the accuracy of the information returned for a search query. In this paper, syntactic-based and semantic-based search engines are surveyed, a comprehensive comparison between the two is presented, and finally their technologies are compared and discussed.


Introduction
The Web is one of the most important technologies, allowing users to access a huge and varied body of information from different locations in the world [1]. This information, which is stored on servers, is typically unstructured or semi-structured data [2,3]. The amount of data on the Web is increasing tremendously, which creates obstacles such as the difficulty of finding relevant information or discovering exact knowledge on the Web [4].
The first generation of Web search engines, such as AltaVista, indexed the contents of Web pages. The second generation, for instance Google, considered the links to and from a Web page as a means of determining relevance. Both generations were mostly syntactic, meaning that they depended on the keywords as text in their queries. Searching for information of interest in most current search engines returns hundreds of thousands of results, most of which are not relevant to what the user meant to find [5,6]. Hence, to address such difficulties, semantic search is intended to depend on the meaning of the keywords instead of their literal text [7].
Semantic Search Engines (SSEs) use Semantic Web (SW) technology, which makes them intelligent enough to retrieve related information on the Web [8]. The SW aims at providing information as a formal, well-defined, compatible, and sharable knowledge base that can be processed by machines [3]. Ontology plays an important role in SW technology: it is known as the backbone of the SW structure and is the vital element of the SW infrastructure [7]. Web Ontology Language (OWL) and Resource Description Framework Schema (RDFS) are World Wide Web Consortium (W3C) recommendations for data representation models that provide the foundations for ontology descriptions [4]. Ontologies provide distinct descriptions of their information; as a result, they are used in numerous fields and applications, since their knowledge representation is understandable and processable by software agents and systems [8]. An ontology is a collection of semantically related concepts built on a limited number of predefined relations and terms of a domain. These terms and concepts can be represented visually to ease the representation of both syntactic and semantic data [9]. On the Web, once abstract data is distributed across several knowledge bases, ontologies are the sole means of providing common ground for interpreting the shared senses of the key terms of a domain. Hence, the development of ontologies attracts significant attention [10].

Bzar Khidir Hussan, Bzar.hussan@epu.edu.iq
Transforming syntactic search engines into semantic ones is not an easy task, since in the latter the search results rely on the meaning of the query keywords; the search engine therefore has to understand the keywords semantically in order to retrieve relevant information. To do so, a new layer could be added to the so-called syntactic search engines [11].

ASTESJ, ISSN: 2415-6698. Advances in Science, Technology and Engineering Systems Journal, Vol. 5, No. 1, 106-111 (2020), www.astesj.com. Special Issue on Multidisciplinary Sciences and Engineering.

This paper presents the Web and SW technologies along with keyword and semantic search engines. Keyword Search Engines (KSEs) are presented in Section 2, and Section 3 introduces SSEs. In Section 4, the technologies of SSEs are presented, and Section 5 describes the most common SSEs. In Section 6, a comprehensive study of the literature is given and a comparison of the works is presented. Section 7 discusses the studied systems. Finally, suggestions and conclusions close the paper.

Keyword Search Engines
Conventional search engines are very helpful for finding information on the internet and returning results in a reasonable time, but they suffer from the fact that they do not know the meaning of the terms and expressions used in Web pages or the relationships between them [12,13]. Surveys show that users who seek information on the Web do not find accurate results in the first set of URLs returned, because of the increasing number of links on Web pages. Moreover, one word may have several meanings and several words may have the same meaning; in such cases, a search for a particular word may produce confusing results, and the user will not get what he or she was looking for [2,14].
Search Engine Optimization (SEO) is used to find alternative search terms that people enter into search engines while looking for similar subjects [15]. SEO practitioners look for additional keywords, which are used to achieve better rankings in search engines. Once a new keyword is found, they expand on it to find similar keywords [16]. Keyword suggestion tools usually aid the process of finding similar keywords, as in [17], where substitute keywords are suggested for the query. Many techniques are used with keyword search engines, such as identifying the core of the keyword, researching related search terms, creating a list of main terms and long-tail keywords, and using the Google AdWords Keyword Planner [18].
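The synonym problem described above can be illustrated with a minimal sketch of literal keyword matching over an inverted index. The corpus, the document ids, and the function names (`build_index`, `keyword_search`) are hypothetical and chosen only for illustration; real keyword engines add ranking, stemming, and much more.

```python
from collections import defaultdict

# Hypothetical mini-corpus; ids and texts are invented for illustration.
DOCS = {
    1: "new treatment for seasonal flu",
    2: "how to cure a cold at home",
    3: "apple releases a new laptop",
}

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of documents that contain every query token literally."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

INDEX = build_index(DOCS)
print(keyword_search(INDEX, "cure"))       # matches only the document with the literal word
print(keyword_search(INDEX, "treatment"))  # the related document is found only under its own word
```

Because matching is purely literal, a query for "cure" never reaches the document about "treatment", even though a reader would consider the two related.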

Semantic Search Engines
Semantic Search Engines (SSEs) are intelligent engines that search for keywords based on their meaning [19]. In addition, they aim to return results that are related to the meaning of the searched keywords. SSEs use ontologies to achieve meaningful retrievals and highly accurate results [12]. The SW is considered Web 3.0, an extension of the current Web in which information is represented and linked in the form of HTML, OWL, and RDF files [5]. SSEs are distinguished by having several types of relational links among various types of resources, instead of a single relation between resources. There are many examples of SSEs, such as Hakia, DuckDuckGo, and Swoogle. The methods used to store information within SW technology make it possible to answer complex queries given to a search engine [12,17].

Technologies of Semantic Search Engines
SSEs are essentially based on a set of technologies or methods that make them effective [20]. These technologies are sometimes called the SW layers, and they involve tools such as inference engines, rule languages, and annotation tools. One of the main technologies used in SSEs is the ontology, which can be expressed in the form of RDF, RDFS, and OWL [21]. The technologies used in the structure of an SSE are briefly described below:

Unicode and URI
This is the base level of SW technology, used to identify resources and determine their location. Unicode standardizes the character encoding used by machines, while the URI identifies each resource by a unique name [22].

Extensible Markup Language (XML)
XML is a subset of Standard Generalized Markup Language (SGML) that represents data in a machine-readable markup form. The language is widely used on the Web because it is a simple, flexible text format whose structure is used to describe data. XML also meets the challenges of e-business and electronic publishing. In addition, it plays a very important role in exchanging different kinds of data on the Web [23].
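As a small illustration of how XML structure describes data in a machine-readable way, the following sketch parses a tiny document with Python's standard `xml.etree.ElementTree` module. The catalog content and the `parse_books` helper are hypothetical sample data and names, not taken from the cited works.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names and values are invented.
XML_TEXT = """
<catalog>
  <book id="b1"><title>Sample Title A</title><year>2004</year></book>
  <book id="b2"><title>Sample Title B</title><year>2008</year></book>
</catalog>
"""

def parse_books(xml_text):
    """Turn each <book> element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": book.get("id"),                 # attribute value
            "title": book.findtext("title"),      # text of the child element
            "year": int(book.findtext("year")),
        }
        for book in root.findall("book")
    ]

BOOKS = parse_books(XML_TEXT)
for entry in BOOKS:
    print(entry["id"], entry["title"], entry["year"])
```

The markup tells a program where each field starts and ends, but it does not say what "title" or "year" mean; that is the gap the higher SW layers are meant to fill.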

Resource Description Framework (RDF)
RDF is the primary layer used in the SW. It is very important for representing data that can be processed by machines [17]. The method used to identify resources and express the relationships among them is called the graph model. The simplest modeling language used in this layer is RDF Schema, which is used to create relations and descriptions of resources [24].
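The graph model can be sketched as a set of (subject, predicate, object) triples with simple pattern matching. This is a plain-Python illustration under stated assumptions, not an RDF library: the `http://example.org/` namespace, the predicate names, and the `match` function are all hypothetical.

```python
# Each statement is a (subject, predicate, object) triple.
# The namespace and all resource names are hypothetical.
EX = "http://example.org/"

TRIPLES = {
    (EX + "Hakia", EX + "type", EX + "SearchEngine"),
    (EX + "Swoogle", EX + "type", EX + "SearchEngine"),
    (EX + "Swoogle", EX + "indexes", EX + "SemanticWebDocument"),
}

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Which resources are typed as search engines?
ENGINES = [t[0] for t in match(TRIPLES, p=EX + "type", o=EX + "SearchEngine")]
print(ENGINES)
```

Every resource and relation is named by a URI, so statements from different sources can be merged into one graph simply by taking the union of their triples.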

Ontology Vocabulary
Ontologies are used to describe SW data and to provide a uniform way for different resources to communicate easily and understand each other. An ontology provides a common grammar and vocabulary for published data, specifically for the description of the semantic data that the ontology represents [18].
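One way to picture what a shared vocabulary enables is a tiny class hierarchy with subclass reasoning. The sketch below is a simplification under stated assumptions: each class has at most one parent, and the class names and the `ancestors` and `is_a` helpers are hypothetical; real ontology languages such as OWL are far richer.

```python
# Hypothetical single-parent class hierarchy (child -> parent).
SUBCLASS_OF = {
    "Laptop": "Computer",
    "Computer": "Device",
    "Apple": "Fruit",
}

def ancestors(cls, subclass_of):
    """Follow parent links upward and return the chain of superclasses."""
    seen = []
    # Stop at the root or if a parent repeats (guards against cycles).
    while cls in subclass_of and subclass_of[cls] not in seen:
        cls = subclass_of[cls]
        seen.append(cls)
    return seen

def is_a(cls, target, subclass_of):
    """True if cls equals target or target is one of its superclasses."""
    return cls == target or target in ancestors(cls, subclass_of)

print(ancestors("Laptop", SUBCLASS_OF))
print(is_a("Laptop", "Device", SUBCLASS_OF))
print(is_a("Apple", "Device", SUBCLASS_OF))
```

With such a hierarchy, a query about devices can also retrieve resources described only as laptops, which a purely literal keyword match would miss.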

Logic, Proof and Trust
These are the last layers of the SW layer cake and follow the ontology layer. They are used to check and resolve consistency problems and to establish the trustworthiness of the SW. In addition, they address redundancy, such as duplicated concepts and data [25].

Common Semantic Search Engines
SW technology has given rise to a new generation of the Web that uses new methods to find the results that best match the searcher's intent. Many engines depend on the semantic approach [17]. In the following sections, the most common SSEs are presented.

Hakia
Hakia is considered one of the common SSEs; it provides results that are relevant to the concepts behind the words rather than to the keywords themselves. In other words, it does not depend only on keywords but uses the concepts of entire phrases, such as questions or sentences. One of the most important characteristics of Hakia is its ability to provide results based on equivalent concepts such as "cure=treat" or "cons=disadvantages" [12]. The search results are divided into classes such as Web, News, and Video; they can also be sorted by date or relevance. The technology used in Hakia is OntoSem, which is a linguistic database [19]. Words are categorized into different senses using QDEX (a query indexing technique), which serves as an infrastructure for indexing data with the Semantic Rank algorithm. This algorithm uses ontology and fuzzy logic to search for all possible requests [18].
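The idea of matching on equivalent concepts can be sketched as query expansion over small equivalence groups. This is only an illustration of the general idea using the two pairs quoted above; it is not Hakia's QDEX or Semantic Rank implementation, and the documents and the `expand` and `concept_search` functions are hypothetical.

```python
# Equivalence groups taken from the examples in the text.
CONCEPTS = [
    {"cure", "treat"},
    {"cons", "disadvantages"},
]

# Hypothetical documents, invented for illustration.
DOCS = {
    "d1": "ways to treat a headache",
    "d2": "disadvantages of remote work",
    "d3": "history of the bicycle",
}

def expand(term):
    """Return the equivalence group of a term, or the term alone."""
    for group in CONCEPTS:
        if term in group:
            return group
    return {term}

def concept_search(query, docs):
    """Return ids of documents where each query term, or an equivalent, appears."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = []
    for doc_id, text in sorted(docs.items()):
        words = set(text.lower().split())
        if all(expand(term) & words for term in terms):
            hits.append(doc_id)
    return hits

print(concept_search("cure headache", DOCS))
print(concept_search("cons", DOCS))
```

Here the query "cure headache" reaches the document that only uses the word "treat", which a literal keyword match would not return.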

Kngine
Kngine divides the search results into two types: documents and images. This engine searches for information related to the search term, which means that it searches for the concepts behind the words [12]. It is a very intelligent engine because its retrievals are related to the nature of the question. For example, in a search for a city, the expected results relate to the city's location, events, weather, and history [20].

DuckDuckGo
DuckDuckGo has many features that distinguish it from other SSEs. When a keyword is searched, the results include many related retrievals; that is, the engine provides different answers, and the searcher can choose the one that matches his or her intent [17]. For example, a search for the term Apple returns many answers such as fruit, computer, bank, etc. [26]. DuckDuckGo is also distinguished from other semantic search engines in that it gives all users the same results when they search for the same term. In addition, it draws on many other websites, including Wikipedia, Bing, and Yahoo [27].

No. | Semantic Search Engines (SSE) | Keyword Search Engines (KSE)
1 | An approach to achieve related and accurate information for user queries [12]. | Considered traditional engines, which return results depending on the context of queries [18].
2 | It depends on stop words and punctuation marks. These marks affect the search results [16]. | It does not depend on stop words and punctuation marks, and the results are not accurate [21].
3 | OWL and RDF are the base languages used for creating Web pages [19]. | HTML, XML, and CSS are the base languages used for creating Web pages [12].
4 | Seeks to provide accuracy in the returned information by understanding the meaning of the keywords in relation to what the seeker desires [20]. | Searches exactly on the words in the website, as determined by the searcher [22].
5 | Tries to access the relations among the main words by using the ontology [22]. | Tries to expand the query by using keywords instead of using the methodology [25].

Sensebot
The search process analyzes Web pages and defines the keywords based on semantic concepts. Sensebot provides many documents that are related to the search term and summarizes their content to give the best answer to the searcher [25]. The summary conveys the main idea of the topic related to the searcher's query and is coherent to the searcher. It also saves time by providing the best topic related to the search term along with references to the right resources. The engine tries to understand the whole concept of a sentence in order to give a suitable answer [18]. The searcher does not need to open many pages to meet his or her requirements [20].
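Query-focused summarization of this kind can be approximated, very roughly, by picking the sentences that mention the query terms most often. The sketch below is a naive extractive baseline under stated assumptions and is not Sensebot's actual method; the stop-word list, the sample text, and the `summarize` function are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in", "on", "for", "are"}

def summarize(text, query, max_sentences=1):
    """Return the sentences that mention the query terms most often."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    query_terms = {
        w for w in re.findall(r"[a-z]+", query.lower()) if w not in STOP_WORDS
    }

    def score(sentence):
        words = Counter(re.findall(r"[a-z]+", sentence.lower()))
        return sum(words[term] for term in query_terms)

    # Rank by score (ties broken by position), then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]

TEXT = (
    "Ontologies describe concepts in a domain. "
    "Search engines rank pages by links. "
    "An ontology links concepts through relations."
)
print(summarize(TEXT, "ontology concepts"))
```

A semantic engine would go further than counting literal terms, for example by treating "ontologies" and "ontology" as the same concept, which this baseline does not do.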

Swoogle
Swoogle is an intelligent SSE that searches for the meaning of words instead of their syntax. In addition, it is considered a crawler-based indexing and retrieval system [24]. The structure of Swoogle is divided into four main components: 1) metadata creation, 2) Semantic Web Document (SWD) discovery, 3) data analysis, and 4) the interface [27]. At the back end, SWD discovery builds the database of SW documents using a hybrid approach: it uses URL addresses, URLs from conventional search, and analysis of SWDs to generate new URI candidates, which are then used to find SWDs on the Web. The indexing part indexes SWDs based on their metadata. The formats used in this engine are RDF/XML and N-Triples [16]. In addition, other languages are used, such as OWL, DAML, RDFS, and RDF. The analysis part of the engine creates metadata that describes documents in order to classify them as Semantic Web Ontologies (SWOs) or Semantic Web Databases (SWDBs) [19]. The last part of the system, called services, is the engine interface; it provides data services based on ontologies at the term level [27].

Literature Review
The SW is an important subject, and many authors work in this area. Current work focuses on the features and techniques used by SSEs. Sahu et al. [12] compared four search engines according to their performance and concluded that Google offers the best features. In addition, it gives better results in most cases compared with the other search engines because it uses semantic queries. Finally, they concluded that Google and Yahoo are developed every day and Bing every month, while Ask.com has become very old. Ranked from best to worst, the order is Google, Yahoo, and Bing.
Shah et al. [22] compared different SSEs using the RDF technique in their approach, applying several classification criteria to the SSEs. In addition, they discuss some of the technologies used and evaluate their performance. The advantages and disadvantages of each search engine are also presented. The paper analyzes the search engines' technologies and how a researcher can reduce the flaws of each engine. Finally, each engine is discussed in terms of which is better or most suitable for a given purpose, along with the quality of results in SSEs and how they need to improve over time.
Malve and Chawan [18] conclude that SSEs are better and have many advantages over KSEs in terms of the accuracy of the presented results. The search process in SSEs depends on semantic queries. In the SW, users have more assurance of obtaining accurate information and of getting answers based on the meaning of the searched words rather than on page rank algorithms and keywords. In addition, the main differences between KSEs and SSEs are presented. They also give a clear idea of the techniques used in the SW that enable the user to obtain the best information.
Qureshi et al. [19] focus on exploring the different dimensions of SW search. They use a pyramid model to test these dimensions in their study of SSEs. SSEs are still in their developing stage, and there are few resources in this field. In addition, many exploratory searches are carried out by querying various devices and different semantic search engines whose records were stored previously. The related materials used in semantic search can be obtained from many authors. Each search engine is assessed against the pyramid of a standard SSE. The authors also present a comparative analysis of different emerging search engines based on the pyramid, which shows the requirements of the search engines.
Jain et al. [17] concluded that Web 2.0 search engines differ from SW search engines because Web 2.0 search engines are unable to answer the user's query directly. The reason is that Web 2.0 search engines work with information that is unstructured in nature, while Web 3.0 (the SW) uses the RDF format to give information a more suitable structure, which helps semantic search engines such as Falcon, Swoogle, and SWSE understand the data and give more efficient results to the user. Web 3.0 deals only with data structured in RDF or OWL formats. In addition, Web 2.0 also contains a large library of data linked together in semi-structured formats such as CSV and XML. Web 2.0 data can be rearranged into OWL format, which SSEs can exploit to expand the area of data search. Finally, the authors concluded that SSE techniques are used in the crawling, indexing, ranking, and result formation processes.
Jagtap et al. [20] briefly survey many kinds of SSEs that use different methods to search for information in response to a user query. Furthermore, they compare intelligent SSEs and their techniques with search engines that offer high recall but low accuracy. The identification of users, inaccurate queries, and crawler efficiency, as well as the tools used in SSEs to search for information on websites, are discussed as well. In addition, they note that the development of SSEs is efficient and uses technology to answer users' complex queries. The authors also give a short overview of the best SSEs, which use different approaches and methods to present a unique search experience to users. Searching the internet today is a challenge because most complex questions go unanswered, while SSEs present suitable answers to users' queries.
Chitre [21] presents some SSEs that use different approaches and methods to deliver a distinctive search experience to users. In addition, the search process on the internet today makes it challenging to predict efficient answers to user queries that suit their meaning. The author shows how SSEs can achieve better performance and overcome the limitations of KSEs.

Discussion
In this section, the reviewed search engines are discussed. As shown in Table 1, the main difference between SSEs and KSEs is that KSEs are based mostly on conventional technologies such as HTML and XML, while SSEs use Semantic Web technologies such as RDF and OWL. As given in the table, all of the systems use RDF, OWL, and other Semantic Web technologies. As is evident from Table 2, the common advantage of the SSEs is that all of them retrieve accurate results for the query, while the common disadvantage is that indexing large chunks of data is a challenge. KSEs are simple to implement and most users know how to use them, while SSEs are more complex systems and need Semantic Web technology to be implemented. As shown in Table 3, most of the SSEs are not yet common among the public, because lay users need good knowledge to use them, which makes them complex for such users. Giant search engine companies such as Google and Bing have gradually started to include Semantic Web technology in their search results, so that the SW concept becomes more familiar to ordinary users.

Conclusion
The current Web offers an easy way to share information online, which makes the amount of data on the Web grow steadily. Search engines help users find information on the Web. There are two types of search engines: Keyword Search Engines (KSEs) and Semantic Search Engines (SSEs). KSEs are considered the base search engines of the Web, but they cannot find exact and accurate information for user queries because they depend on the syntax of the keywords. SSEs solve this problem by looking into the meaning of the keywords and retrieving semantically related results. SSEs depend on Semantic Web technology, which helps them understand concepts and helps machines understand and process information. In this paper, an overview of Web technology is given, search engines and their types are presented, and a comprehensive comparison of the two most common types of search engines, KSEs and SSEs, is provided. Finally, a wide-ranging discussion of each of the reviewed systems, with their technologies, techniques, methods, pros, and cons, is presented in detail.