Text Mining Techniques for Sentiment Analysis of Arabic Dialects: Literature Review

Article history: Received: 28 November, 2020 Accepted: 06 February, 2021 Online: 16 February, 2021
Social media attracts a large number of users around the world. People use social media sites for many reasons: expressing opinions and ideas, sharing their diaries with others, communicating with family and friends, building new social relationships, and learning and sharing knowledge. Written text is one of the most common forms of communication on these sites. People write in different languages, and because the use of social networking sites has grown worldwide, the amount of text and data it produces is large. This data is a valuable source of information that has attracted business owners, companies, government institutions, and researchers and data scientists. Researchers and data scientists have invested increasing effort in investigating and analyzing Arabic text, and most of that effort has targeted Modern Standard Arabic. On social media, however, most Arab users tend to write in their dialects, which generates a massive amount of dialectal Arabic text. Research on the dialectal form of Arabic is still limited, although it has been increasing recently. This literature review explores the approaches and methods used for Sentiment Analysis of Arabic Dialects text.


Introduction
Social media sites have become very popular, and their popularity is increasing day by day. Many people now prefer to spend their time using applications on smart devices and the Internet, and social media probably accounts for the majority of this usage. People use social media for various reasons such as online shopping, learning, communication, expressing opinions and ideas, and sharing their diaries. They tend to express their opinions, thoughts, and feelings, and to comment on the topics posted on social media, using their dialects. Dialects are the informal form of the language. Each country of the Arab world has its own dialect, and each dialect has many sub-dialects. According to [1], the population of the Arab world prefers to use dialects in daily communication; Arabic dialects are increasingly used online and in social media, and they are also used in TV shows and radio programs. As social media usage has increased sharply, the amount of data it generates has increased as well. The author of [2] states that, because of the large amount of natural-language data generated online every day, there is a great need to process this kind of data. This data has attracted company owners, marketers, government institutions, and scientists and researchers.

Arabic Language Background
Arabic is one of the most widely spoken languages in the world. According to [5], Arabic is the fifth most common language and is spoken by more than 420 million people. Arabic has its own distinctive features: its alphabet consists of 28 letters, and it is written from right to left. The author of [6] mentions that, based on statistics presented by Wikipedia in 2018, Arabic is the official language of 25 nations and has approximately 380 million speakers. The author of [7] notes that Arabic is one of the Semitic languages and that the shape of an Arabic letter changes according to its position in the word. According to [1], Arabic letters are also used in the Malay, Urdu, and Persian languages. Arabic has three forms: Classical Arabic, which is the language of the Holy Qur'an, Modern Standard Arabic, and Dialectal Arabic. Arabic has attracted researchers because of its increased use over the Internet. The author of [7] states that, according to official statistics, the number of Arabic users on social media is increasing year after year, and that this increase has resulted in a massive amount of Arabic data generated online every day. Hence, there is an increasing need for powerful tools and effective approaches for processing Arabic text, whether in Modern Standard Arabic or in the Arabic Dialects.

Modern Standard Arabic
Modern Standard Arabic (MSA) is the standard form of the Arabic language that is used in formal papers, schoolbooks, education, TV news, newspapers, street signs, etc. MSA has a standard written form, whereas the dialects do not. As mentioned earlier, Arabic has attracted researchers because of its increased use over the Internet, and most of the research that targets Arabic focuses on MSA. According to [8], NLP tools and applications are mostly based on MSA. MSA is closer to Classical Arabic than the Arabic dialects are. The systematic review in [9], which covers NLP for Arabic, likewise reports that most of the research concentrates on MSA.

Arabic Dialect
The Arab world consists of 22 countries. Each of these countries has its own Arabic dialect, which its population uses in daily conversation. The author of [1] mentions some of the most common Arabic dialects, such as Levantine Arabic, Egyptian Arabic, Gulf Arabic, and North African Arabic, alongside many other dialects spoken by the Arab population. The same source states that the Arab population prefers to use dialects in daily communication, that Arabic dialects are increasingly used online on social media sites, and that they also appear in TV shows and radio programs. The main dialect of each country can be divided into further sub-dialects. The same source also states that Arabic dialects consist of both Arabic and non-Arabic words. For example, Gulf people traveled to India and Iran, and European traders came to the Gulf countries after the discovery of oil; both factors introduced non-Arabic words into Gulf dialects. The authors of [10] studied Arabic dialects and described Gulf Arabic as the language of the population of the Gulf Cooperation Council countries, although it differs slightly from one of these countries to another. The author of [1] mentions that the limited number of software and NLP tools for dialects has resulted in a limited body of work that studies and analyzes them. However, the systematic review in [9] presents several research papers that have studied Arabic dialects. In the field of basic language analysis, these include ADAM, an Analyzer for Dialectal Arabic Morphology covering the Egyptian and Levantine dialects, and CALIMA, a morphological analyzer for the Egyptian dialect.
The review in [9] also presents work that concentrates on building resources: Curras, a dataset of the Palestinian dialect that consists of 56,000 morphologically annotated tokens; DART, a dataset of around 25,000 Arabic tweets; ArabicWeb16, a dataset of 10.8 TB of Arabic dialects; CALYOU, a dataset of the Algerian dialect; NileULex, an Arabic sentiment lexicon of the Egyptian and Levantine dialects in addition to Modern Standard Arabic; and TSAC, a sentiment analysis dataset for the Tunisian dialect.

Machine Learning
Machine Learning involves constructing systems and models that improve with experience. In other words, it involves machines and systems that can program themselves to learn and acquire the knowledge needed for better performance. The most common machine learning methods are Classification, Clustering, Regression, Deep Learning and Neural Networks, Transfer Learning, Word Embeddings, Natural Language Processing, Dimensionality Reduction, Reinforcement Learning, and Ensemble Methods. Deep Learning and Neural Networks, Transfer Learning, Word Embeddings, and Natural Language Processing are explained further below.

Deep Learning and Neural Networks
Deep Learning and Neural Networks are considered a revolutionary approach in the Machine Learning domain. The author of [6] explains that an Artificial Neural Network is used for complex problem solving because it functions in a way similar to the neural networks in the human brain. Deep Neural Networks are known for their accuracy and outstanding performance, and Deep Learning approaches are increasingly used for NLP tasks, including Arabic NLP. For example, [6] used nine Deep Learning models for text categorization, combined them with Word Embeddings approaches, and evaluated performance and accuracy. The results showed that all nine Deep Learning models achieved very good performance and high accuracy, and that the use of Word Embeddings improved both further. Additionally, the authors of [11] investigated different Deep Learning models for Arabic Dialects text classification. They concentrated on the Egyptian, Levantine, and Gulf dialects and reported that a bidirectional LSTM performed better than the other Deep Learning models for the Egyptian-Gulf pair, while LSTM performed better for the other dialect pairs.

Transfer Learning
Transfer learning involves taking a task or model that has already been learned and transferring the learned knowledge, with improvements, to a new task or model. The author of [12] defines transfer learning as the process of using data from a source domain to solve problems in another domain, where the target problem is related to the source domain but different from it. The author of [13] presented a model in which Transfer Learning is used effectively when multiple source domains are used to solve problems in multiple target domains. The authors of [14] used Transfer Learning for ANLP as an extension of a word embeddings model. They investigated the effectiveness of an extension applied to the skip-gram model that incorporates lemmas and makes efficient use of the word2vec word embedding model. They reported that the extended model performed better than word2vec and fastText on the Arabic word similarity task.

Word Embeddings
Word embeddings are an emerging area built on distributed word representations, that is, representing words as vectors in space. Word embedding models are either monolingual or bilingual, and monolingual models are the more common. The authors of [15] state that monolingual word embedding models can be used for word order and morphology, while bilingual word embedding models can be used for machine translation and parallel sentence extraction. They explain that bilingual word embedding models are vector representations of two languages that are mapped into the same space. Word embeddings have been implemented and used for NLP purposes. The author of [16] states that word embeddings use semantic features to represent words as vectors and that they are most commonly used in NLP for classification and sentiment analysis. According to [17], the most common word embedding methods are Word2Vec and GloVe. The authors of [18] mention four Arabic word embeddings: CBOW, GloVe, the skip-gram model, and the Arabic part of the Polyglot word embeddings. They evaluated these four models on a benchmark and reported that the CBOW model achieved the best performance, while the Polyglot model achieved the lowest. The authors of [5] define word embeddings as vectors used to represent words in a continuous space in order to find relations between them, and they presented AraVec, an open-source word embeddings project for the ANLP field. The authors of [19] enlarged the informative content of the training sentences through efficient adaptations to word embedding tools, which improved accuracy and performance. They were also able to use a single embedding space to represent disparate dialects.
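To illustrate what "representing words as vectors in space" makes possible, the following minimal sketch compares words by cosine similarity. The three-dimensional vectors and the transliterated keys are invented for illustration and are not taken from AraVec or any other model cited above; real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption): values are made up for illustration.
toy_embeddings = {
    "kitab": [0.9, 0.1, 0.0],    # "book"
    "kutub": [0.8, 0.2, 0.1],    # "books"
    "sayyara": [0.1, 0.9, 0.3],  # "car"
}


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word`."""
    target = embeddings[word]
    scored = [
        (cosine_similarity(target, vector), other)
        for other, vector in embeddings.items()
        if other != word
    ]
    return max(scored)[1]


nearest = most_similar("kitab", toy_embeddings)
```

With these toy values, `nearest` is "kutub", since the two book-related vectors point in almost the same direction while the vector for "sayyara" does not.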

Natural Language Processing (NLP)
Language is the means of communication between people, and it helps us understand the world around us. The languages that people speak around the world are known as natural languages. Natural Language Processing (NLP) involves the use of computers to understand and deal with natural languages. The authors of [70] define NLP as a branch of Artificial Intelligence and Computer Science that studies the interactions between human natural languages and computers and that covers both natural language understanding and generation. They mention that the growing volume of information in natural-language form has increased the need to understand and process it. The authors of [2] agree that the massive amount of natural-language data generated online every day has increased the need for processing this kind of data. They describe NLP as the automatic analysis, understanding, and presentation of human natural languages.

Arabic Natural Language Processing (ANLP)
ANLP is short for Arabic Natural Language Processing, and it involves the automatic analysis and processing of Arabic natural language. As mentioned earlier, Arabic has three main forms: Classical Arabic, Modern Standard Arabic (MSA), and Arabic Dialects (AD). ANLP tools are expected to handle all three forms. However, Classical Arabic is rarely targeted by researchers, as it is the Arabic form of the Holy Qur'an, and tools and techniques are used for MSA far more often than for AD. According to [10], using ANLP tools for AD may be hard because of the nature of AD and the phonological and morphological differences between MSA and AD.

Research and Work on Arabic Dialects
Recently, Arabic Dialects (AD) have attracted researchers. As stated in [10], the need to analyze, classify, and process Arabic dialects is increasing because the volume of dialectal text is growing, especially on social media. The authors state that the work done on MSA is extensive compared with the work on AD, which is limited and mostly targets the Egyptian and Saudi dialects. Nevertheless, a growing number of studies target AD. Research conducted on AD covers basic language analysis, building resources, language identification, and semantic-level analysis. One of the most common examples of semantic-level analysis is Sentiment Analysis.

Basic Language Analysis
Basic language analysis for Arabic Dialects involves Orthographic Analysis, Morphological Analysis, and Syntactical Analysis. The sections below briefly describe each type of basic language analysis.

Arabic Dialects Orthographic Analysis
Arabic dialects have no standard orthography, so the same word can be written in two or more ways, which poses challenges for NLP tools. According to [8], MSA and AD differ phonologically, and because there is no standard written form for AD, dialectal text is usually spelled phonetically, which makes it difficult to analyze and process. Researchers have worked on orthographic analysis for Arabic. In [20], the author introduced CODA, a Conventional Orthography for Dialectal Arabic, which offers a computational model that can be used for AD. In [21], the author provided a conventional orthography for Tunisian Arabic that is based on CODA. In [10], the author introduced the Gumar Corpus, a Gulf dialects corpus of 110 million words. The corpus was annotated, and the authors presented guidelines for standard orthographic analysis.

Arabic Dialects Morphological Analysis
Arabic is recognized as a morphologically rich language. The author of [22] defines morphology as the science of deriving a word's branches from the word's source. The author of [8] explains that the morphology of MSA differs from that of AD: the grammar and even the word stems may differ. The morphology of Arabic dialects attracted researchers early on. In [23], the author introduced MAGEAD, an Arabic morphological analyzer that is also an online morphological generator. In [24], the author presented an accurate morphological analyzer for the Egyptian dialect that extends the Egyptian Colloquial Arabic Lexicon. In [25], the author constructed a lexicon for the Tunisian dialect and proposed an approach for its morphological analysis. Researchers have also built morphologically annotated corpora. In [26], the author constructed a morphologically annotated corpus of the Emirati dialect that consists of about 200,000 words.

Arabic Dialects Syntactical Analysis
Dialects differ syntactically. The syntax of Arabic dialects is affected by many factors, the most common of which is the influence of foreign languages. Syntactic analysis for Arabic dialects has been addressed in several research papers. The authors of [27] explored the difficulties of syntactic analysis for Arabic dialects and proposed an approach for constructing a treebank for the Tunisian dialect. In [28], the author proposed a method that integrates syntactic analysis with morphological tagging for automatic diacritization of Arabic; the method works by improving the prediction of case and related features. In [29], the author presented guidelines for the syntactic annotation of the Quranic Arabic dependency treebank, which is part of the Quranic Arabic Corpus. In [30], the author proposed CamelParser, a syntactic dependency analysis system for Arabic that can also be used for morphological disambiguation.

Building Resources
Researchers worldwide have made great efforts to collect corpora for Modern Standard Arabic (MSA). As interest in Dialectal Arabic has grown, considerable effort has also gone into collecting corpora for Arabic Dialects.

Modern Standard Arabic Corpus Resources
Researchers are increasingly attracted to the analysis of the Arabic language. One of the most important efforts in the field of ANLP is building resources for Arabic, and most of the resources built so far are for MSA. According to [8], almost all available Arabic datasets are in MSA. Several studies have created MSA corpora. In [6], the author created two corpora of Modern Standard Arabic text, SANAD and NADiA, from Arabic news articles and released them as open source for further research. In [31], the author constructed an MSA corpus, collected from newswire documents, that is manually annotated at the sentence level. In [32], the author presented AWATIF, an MSA corpus labeled at the sentence level for Sentiment Analysis. In [33], the author created an MSA corpus from online newspapers.

Arabic Dialect Corpus Resources
Dialectal Arabic covers all the dialects used by the population of the Arab world. Arabic dialects can be categorized by region and similarity into: (1) Gulf dialects, which include the dialects spoken by the people of the Arab Gulf; (2) the Egyptian dialect; (3) Levantine dialects, which include the dialects spoken in Palestine, Jordan, Syria, and Lebanon; and (4) North African dialects, which include the dialects spoken in Morocco, Algeria, Libya, and Tunisia. According to [8], social media websites are one of the most valuable sources of AD, as people tend to express their thoughts and opinions in writing using their dialects. The author of [34] states that although Arabic is widely used online, the available Arabic datasets are still limited. Internet World Stats statistics show that Arabic is the fourth most common language used across the Internet. Recently, researchers have invested effort in ANLP, especially in creating Arabic corpora that can benefit research in this domain. In [10], the author constructed the Gumar Corpus, a corpus of Gulf dialects made up of 110 million words collected from 1,200 forum novels. The author of [26] used the Gumar Corpus to collect a corpus of the Emirati dialect that consists of around 200,000 words. In [8], the author created a Dialectal Arabic dataset that includes the Gulf, Egyptian, Levantine, and North African dialects. In [34], the author presented BRAD 2.0, an extension of the BRAD 1.0 corpus. BRAD 1.0 is a dataset of Arabic book reviews that can be used for Sentiment Analysis and Machine Learning. BRAD 2.0 is much larger and consists of more than 600,000 Arabic book reviews written in both Modern Standard Arabic and Dialectal Arabic.
The Arabic dialects in the BRAD 2.0 dataset are Gulf, Egyptian, and Levantine. In [35], the author constructed a corpus of MSA and the Saudi dialect from Twitter, annotated it manually, and released it to the research community. The authors named it the AraSenTi-Tweet corpus; 2.2 million tweets were collected, and 17,573 tweets remained after annotation. In [36], the author constructed a corpus of Arabic dialects from Twitter, Facebook, and newspaper comments. The corpus covers the Gulf, Egyptian, North African, Levantine, and Iraqi dialects. Twitter texts are classified by either seed words or coordinate points, while comments from Facebook and newspapers are classified by the nationality of the page owner and the country of the newspaper, respectively. The authors also presented an online game that is used for text annotation. In [37], the author constructed two corpora, the News Corpus (NC) and the Arts Corpus (AC), both of which consist of Arabic dialect texts from Facebook that can be used for Sentiment Analysis. The studies above show that researchers are increasingly interested in building resources for Arabic Dialects.

In this literature review, the author targeted research papers published from 2014 onward. The databases used are IEEE, Springer, ScienceDirect, ACM, and WorldCat. The following keywords were used to collect the research papers:
• "Arabic Dialects" and "lexicon"
• "Arabic Dialects" and "dataset"
• "Arabic Dialects" and "corpus"
The inclusion criteria for research papers:
• Must involve constructing resources (dataset / corpus / lexicon) for Arabic Dialects.
• Must be for Arabic Dialect texts only.
• Must be published in the period from 2014 onward.
• Must be published in a journal or conference.

Table 1 illustrates some of the research and studies that involve constructing a dataset for Arabic Dialects texts. Table 2 below illustrates some of the research and studies that involve constructing a Lexicon for Arabic Dialects texts.

Table 2 (entries):
• Entry preceding [53]: the basic lexicon is manually collected and annotated, then a synonym set and an antonym set are used for automatic expansion of the lexicon.
• [53]: 25086 words; dialect lexicon; Algerian dialect; constructs a dialect lexicon, then merges two lexicons (a dialect lexicon and a sentiment lexicon).
• [54]: AIPSeLEX, 3632 idioms/proverbs; websites and books; MSA and Egyptian dialect; collected and annotated manually at sentence level.

Arabic Dialect lexicon
A great deal of research has been conducted on creating lexicons for English text that can be used in the NLP domain, while only a limited number of research papers have considered creating lexicons for Arabic, whether in its Modern Standard Arabic form or its dialectal forms. The author of [37] defines a lexicon as a set of lexemes used for text classification. In [49], the author created a lexicon for MSA that was used for text classification and reached a classification accuracy of around 97%. In [37], the author developed a lexicon that can be used for Sentiment Analysis. In [50], the author used five datasets for lexicon construction, all of which consist of annotated tweets collected from Twitter. The generated lexicon is dynamic, as it is updated automatically to include new words. Table 2 below illustrates some of the research and studies that involve constructing a Lexicon for Arabic Dialects texts.

Language Identification: Arabic Dialect Identification
Language Identification is the automatic identification of a language from speech or text. Researchers are increasingly interested in exploring approaches for dialect identification. Arabic Dialect Identification is the automatic identification of a dialect from either dialectal text or dialectal speech. Some of the studies in this domain are mentioned below. The authors of [55] identified and classified Arabic dialect text from 25 cities of the Arab world. The results were promising: the developed system reached an accuracy of 67.9% for sentences of about 7 words and 90% for inputs of 16 words. Additionally, the author of [71] used an annotated dataset of online newspaper content to train classifiers for the identification of Arabic dialects. The proposed system determines whether a given sentence is in Modern Standard Arabic or in the Gulf, Levantine, Egyptian, Iraqi, or Maghrebi dialect.
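As a rough illustration of text-based dialect identification, the sketch below trains a character-bigram Naive Bayes classifier with add-one smoothing. This is a generic technique chosen for illustration, not the method of [55] or [71]; the six training phrases, the labels, and the function names `char_ngrams`, `train`, and `identify` are assumptions made for this example, and a usable system would need a large annotated corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Character n-grams of the text, padded with one space on each side."""
    padded = f" {text.strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=2):
    """Count character n-grams and documents per dialect label."""
    counts = defaultdict(Counter)
    docs = Counter()
    for text, label in samples:
        counts[label].update(char_ngrams(text, n))
        docs[label] += 1
    vocab = {gram for counter in counts.values() for gram in counter}
    return counts, docs, vocab


def identify(text, model, n=2):
    """Return the label with the highest smoothed log-probability."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label in counts:
        total = sum(counts[label].values())
        score = math.log(docs[label] / total_docs)  # class prior
        for gram in char_ngrams(text, n):
            # Add-one (Laplace) smoothing keeps unseen n-grams from zeroing the score.
            score += math.log((counts[label][gram] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training phrases (assumption): a few common greetings and requests per dialect.
TOY_SAMPLES = [
    ("ازيك عامل ايه", "Egyptian"),
    ("انا عايز اروح", "Egyptian"),
    ("شلونك شخبارك", "Gulf"),
    ("ابغى اروح", "Gulf"),
    ("كيفك شو اخبارك", "Levantine"),
    ("بدي روح", "Levantine"),
]

model = train(TOY_SAMPLES)
```

Calling `identify("شلونك", model)` returns "Gulf" on this toy data, because only the Gulf phrases contain most of that word's bigrams. Character n-grams are used here because dialectal spelling is not standardized, so sub-word features tend to be more robust than whole words.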

Semantic-level Analysis
Semantic-level analysis includes Machine Translation and Sentiment Analysis. This literature survey concentrates on Sentiment Analysis for Arabic dialects.

Sentiment Analysis Literature
One of the most common applications of NLP is Sentiment Analysis (SA). SA classifies text according to whether the opinions it expresses are positive or negative; in some cases, the text is classified as positive, negative, or neutral. The author of [38] describes Sentiment Analysis as the classification of text based on its polarity or emotion. According to [17], people now tend to express their thoughts, ideas, and opinions about products, services, and so on through websites, blogs, social media, and many other web channels. This massive body of user-generated content has attracted NLP researchers. The author of [34] notes that Sentiment Analysis is important for investigating public attitudes toward products or services and that it can also be used to explore wider public opinion. The author of [56] agrees that online websites and applications are now a valuable source of opinions that can benefit business owners and service providers, as well as customers who want to explore public reviews of different products or facilities. In the Arab world, people usually use their dialect in daily life rather than MSA. They also most commonly express their ideas, opinions, and thoughts on the web in their dialect, which generates a massive amount of dialectal Arabic text that poses a challenge for ANLP researchers.

Sentiment Analysis Approaches
Different approaches are used for Sentiment Analysis: the lexicon-based approach, in which a lexicon is used; the machine learning approach; and, in some cases, a combination of the two. The author of [7] likewise lists the lexicon-based approach, the machine learning approach, and a hybrid approach that mixes both.

In this literature review, the author targeted research papers published from 2014 onward. The databases used are IEEE, Springer, ScienceDirect, ACM, and WorldCat. The following keywords were used to collect the research papers:
• "Arabic Dialects" and "Sentiment Analysis"
• "Arabic Dialects" and "Sentiment Analysis" and "approach"
The inclusion criteria for research papers:
• Must involve a Sentiment Analysis experimental study.
• Must be for Arabic Dialect texts only.
• Must be published in the period from 2014 onward.
• Must be published in a journal or conference.

Table 3 below presents a comparative summary of the different approaches that have been used so far in recent research and studies for Sentiment Analysis of Arabic Dialects.
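To make the lexicon-based approach described above concrete, the following minimal sketch sums the polarity of the words found in a small sentiment lexicon and flips the polarity of a word that directly follows a negation particle. The lexicon entries, the negator list, and the function name `lexicon_sentiment` are hypothetical and chosen only for illustration; the lexicons built in the surveyed studies contain thousands of entries.

```python
# Hypothetical mini-lexicon (assumption): word -> polarity score.
TOY_LEXICON = {
    "حلو": 1,    # nice
    "ممتاز": 1,  # excellent
    "جميل": 1,   # beautiful
    "وحش": -1,   # bad
    "زفت": -1,   # awful
}

# Hypothetical negation particles (assumption) that flip the next word's polarity.
NEGATORS = {"مش", "مو", "ما"}


def lexicon_sentiment(text, lexicon=TOY_LEXICON, negators=NEGATORS):
    """Classify text as positive, negative, or neutral from summed word polarity."""
    score = 0
    negate = False
    for token in text.split():
        if token in negators:
            negate = True
            continue
        polarity = lexicon.get(token, 0)
        if polarity:
            score += -polarity if negate else polarity
        negate = False  # a negator only affects the token right after it
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `lexicon_sentiment("الفيلم حلو")` returns "positive", while `lexicon_sentiment("الفيلم مش حلو")` returns "negative" because the negator flips the polarity of the following word. A hybrid system would typically combine such lexicon scores with a trained classifier.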

Applications of Sentiment Analysis
Companies, government authorities, and institutions all show great interest in Sentiment Analysis. The author of [64] explains in detail some of its most common applications. In business, Sentiment Analysis can be used to analyze consumer reviews; such implementations can be seen in Google Product Search and on Amazon websites. Business owners and companies value the information retrieved from Sentiment Analysis because it can positively affect their production and help them apply required improvements. Sentiment Analysis can also be used in business for online advertising and commerce and for monitoring brand reputation. In politics, it can be used to monitor public opinion about government practices and the services provided. In finance, it can be used to monitor financial situations and avoid financial risks. These are some of the applications in which Sentiment Analysis can be used effectively.

Sentiment Analysis of Arabic Dialect
A large body of Sentiment Analysis work has targeted the English language, and research on Sentiment Analysis for Arabic is increasing as well. The author of [56] states that only limited work has been conducted on Arabic Sentiment Analysis, for reasons such as the morphological complexity of Arabic, the need for pre-processing, feature representation, the elimination of spam opinions, and the handling of negation in Arabic. The authors explain that Arabic has a complex morphology in which, for example, words with different meanings may share the same root. Sentiment Analysis for Arabic dialects has been addressed by several research papers, some of which are described below. The authors of [7] automatically extracted opinions written in MSA and the Egyptian dialect from social media and classified their sentiment as either positive or negative. In [50], the author built a hybrid system that can be used for Sentiment Analysis of Arabic. The system offered high accuracy and performance, as its lexicon was generated from five datasets and is updated automatically to include new words. In [65], the author used the freely available OCA corpus and generated the ARMD corpus, both of which are for the analysis of movie reviews. The authors applied supervised and unsupervised approaches to Sentiment Analysis separately and then combined them. They reported that the hybrid approach, in which supervised and unsupervised methods are used together, offered the best results in terms of precision, recall, and F-measure. According to [22], the most common classifiers for Sentiment Analysis of Arabic are Support Vector Machine and Naïve Bayes.
The authors found that the hybrid approach to Sentiment Analysis gave the most precise results at both the document level and the sentence level. In [1], the author proposed a rule-based stemmer for Gulf dialects. The stemmer performed better than other algorithms and showed acceptable accuracy. In [38], the author created a web-based tool for Sentiment Analysis of Arabic text. The tool was developed in the R language and showed good performance in terms of accuracy. To perform Sentiment Analysis for Arabic dialects, some important steps should be taken into consideration, such as Pre-Processing and Feature Extraction.

Pre-Processing
Pre-processing is a critical step; it is sometimes referred to as normalization and involves transforming words into their standard form. In [38], the author defines pre-processing as the process of cleaning data to reduce errors and improve Sentiment Analysis performance. In [34], the author mentioned that pre-processing the dataset allows classifiers to learn from it efficiently. In [16], the authors stated that pre-processing is essential for Arabic Natural Language Processing applications such as sentiment analysis and summarization. They explained that pre-processing for Dialectal Arabic involves the following steps: tokenization, removing diacritics, removing non-Arabic words and letters, removing punctuation, and replacing the Arabic letters (أ، إ، /، آ), (ة), (ئ، ي) and (ؤ) with (ا), (ه), (ي) and (و), respectively. [56] also described the pre-processing steps: tokenization, removal of non-Arabic words, normalization, stop-word removal, and light stemming. In [7], the authors stated that pre-processing involves tokenization, which splits the text into separate words; normalization, which returns letters to a single form; stop-word removal; and, finally, word stemming.
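
The pre-processing steps reported for [16] can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed papers: the function name `preprocess`, the use of whitespace tokenization as the last step, and the exact Unicode ranges are assumptions made for illustration. The letter mapping covers the alef variants, ta marbuta, ya with hamza, and waw with hamza; stop-word removal and stemming, mentioned in [56] and [7], are left out.

```python
import re

# Arabic diacritics (U+064B..U+0652); tatweel (U+0640) is not a diacritic
# but is stripped here as well (an assumption of this sketch).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Anything that is not a basic Arabic letter (U+0621..U+064A) or whitespace:
# Latin words, digits, punctuation, emoji, and so on.
NON_ARABIC = re.compile("[^\u0621-\u064A\\s]")

# Letter normalization: map variant forms to one base letter.
LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0629": "\u0647",  # ta marbuta            -> ha
    "\u0626": "\u064A",  # ya with hamza         -> ya
    "\u0624": "\u0648",  # waw with hamza        -> waw
})


def preprocess(text):
    """Return a list of normalized Arabic tokens from raw text."""
    text = DIACRITICS.sub("", text)       # remove diacritics
    text = NON_ARABIC.sub(" ", text)      # remove non-Arabic words, letters, punctuation
    text = text.translate(LETTER_MAP)     # normalize letter variants
    return text.split()                   # whitespace tokenization


# Usage: a diacritized word, a Latin word, and a word ending in ta marbuta.
tokens = preprocess("\u0623\u064E\u062D\u0652\u0645\u064E\u062F likes \u0645\u062F\u0631\u0633\u0629!")
```

Here the Latin word and the exclamation mark are dropped, the diacritics are stripped, and the two remaining tokens are normalized. Tokenizing after cleaning rather than before is a design choice of this sketch; the order of steps can differ between studies.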

Feature Extraction
Sentiment Analysis involves text classification, which requires selecting and extracting text features. Features are the classifier's input. In [39], the author stated that feature selection is the process of extracting the features that affect the classification process. In [38], the author explained how features can be used to analyze raw data. In [66], the author stated that features include part of speech, frequency, opinion words, and negation. In [38], the author stated that the most commonly used features are N-grams, which are frequency (term presence) features, and that the most commonly used N-gram type is the unigram, followed by the bigram and the trigram. In [34], the authors stated that bigram features consist of two words that commonly occur together. They also mentioned that bigram tokens can be used effectively for negation detection in both MSA and AD. In [55], the author extracted word n-grams and character n-grams and used them as features for AD identification.
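
As a rough illustration of the word and character n-gram features discussed above, the following Python sketch builds unigram, bigram, and trigram counts from a token list. The function names (`word_ngrams`, `char_ngrams`, `ngram_features`) and the English placeholder tokens are assumptions for illustration only and do not come from the cited studies.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return all contiguous character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_features(tokens, max_n=3):
    """Count word n-grams for n = 1..max_n (unigrams, bigrams, trigrams)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(tokens, n))
    return counts


# Usage with placeholder tokens: the bigram "not good" keeps the negation
# attached to the opinion word, which the unigram "good" alone would lose.
features = ngram_features(["not", "good", "film"], max_n=2)
```

The resulting counter holds one entry per n-gram and can serve as a sparse feature vector for a classifier. A unigram-only model would see "good" and "not" as unrelated features, which is one reason bigrams are reported as useful for negation detection.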

Machine Learning Approaches for Sentiment Analysis of Arabic Dialects
Machine Learning techniques have been widely used for Sentiment Analysis in many languages, including Arabic. In [22], the author stated that machine learning techniques can be used for sentiment analysis of Arabic text and that SVM performed well on this task. In [40], the authors applied machine learning approaches to Sentiment Analysis of Arabic dialects. They used different classifiers for sentiment classification of a labeled dataset and reported that the PA and RR classifiers gave the best results in terms of accuracy, recall, F-measure, and precision. However, in [6] the author stated that the recent use of Deep Learning approaches for NLP tasks has given better performance and results. Table 3 above presents a comparative summary of the different approaches used so far in recent studies for Sentiment Analysis of Arabic Dialects.
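
To make the supervised setting concrete, the sketch below trains a small binary classifier on bag-of-words features. It assumes that "PA" in [40] refers to a Passive-Aggressive classifier and implements the standard PA-I online update in plain Python; it is not the implementation or the data used in [40]. The class name, the parameter `C`, and the toy English tokens are illustrative assumptions.

```python
from collections import Counter


class PassiveAggressive:
    """Binary PA-I classifier over sparse feature dicts; labels are +1 / -1."""

    def __init__(self, C=1.0):
        self.C = C    # aggressiveness cap on each update step
        self.w = {}   # feature -> weight

    def score(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def fit(self, data, epochs=5):
        for _ in range(epochs):
            for x, y in data:
                # Hinge loss: zero when the example is classified with margin >= 1.
                loss = max(0.0, 1.0 - y * self.score(x))
                norm_sq = sum(v * v for v in x.values())
                if loss > 0.0 and norm_sq > 0.0:
                    # PA-I step size: smallest correction, capped at C.
                    tau = min(self.C, loss / norm_sq)
                    for f, v in x.items():
                        self.w[f] = self.w.get(f, 0.0) + tau * y * v
        return self


# Toy labeled data (hypothetical): +1 = positive, -1 = negative.
train = [
    (Counter(["good", "film"]), 1),
    (Counter(["bad", "film"]), -1),
    (Counter(["good", "service"]), 1),
    (Counter(["bad", "service"]), -1),
]
model = PassiveAggressive(C=1.0).fit(train)
label = model.predict(Counter(["good", "food"]))
```

The model only updates its weights when an example is misclassified or falls inside the margin, which is what makes the method "passive" on correct examples and "aggressive" on errors. In practice the feature dictionaries would come from pre-processed dialect text rather than placeholder tokens.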

Sentiment Analysis for Arabic Dialects Challenges
In [22], the authors mentioned several challenges encountered when working with ANLP, such as the complexity of the language. Moreover, less research has been done on the Arabic language than on English. In [10], the author showed that Dialectal Arabic does not have a standard orthography. In [3], the author described how the absence of a standard written form for Dialectal Arabic results in a lack of NLP tools for Arabic dialects. In [1], the author mentioned similar challenges: Arabic dialects have no standard written format and have complicated morphological forms. In [7], the authors agreed that the Arabic language is complex in nature: in MSA each word has a root, and finding the root of a word is not easy and may reduce accuracy. Moreover, Dialectal Arabic represents the language of different regions, so each dialect has its own collection of words, which adds further challenges to processing and analyzing Dialectal Arabic. In [4], the author mentioned several challenges in dealing with the Arabic language. First, diacritics may change the meaning of a word. Second, negation in Arabic may be more challenging than in English, where negation is mostly expressed using a prefix. Third, dialectal Arabic text may contain spelling errors, since there is no standard written form for Arabic dialects.

Conclusion
Social media attracts people all around the world. Due to the increased use of social media, a massive amount of written text is generated daily. This text is considered a valuable source of information that has attracted business owners, companies, and government institutions, as well as researchers and data scientists. Natural Language Processing (NLP) is an important field of science that involves studying and analyzing natural language texts. Increasing effort has been devoted to investigating and analyzing the Modern Standard form of the Arabic language as well as the Arabic dialects. This literature review explored research on constructing resources for Arabic dialects and investigated the approaches and methods used for Sentiment Analysis of Arabic dialect text, focusing on machine learning and lexicon-based approaches.