Q-Learning versus SVM Study for Green Context-Aware Multimodal ITS Stations

Article history: Received: 15 August 2018; Accepted: 07 October 2018; Online: 16 October 2018


Introduction
The Intelligent Transportation System (ITS) network is one of the most important concerns for countries in the era of modern smart cities. Transport networks are the arteries that carry a continuous chain of business, social, cultural, and tourism activities. They support the national economy by serving the production and service sectors and by creating employment opportunities. New ITS construction projects therefore take a leading place in sustainable development programs, which aim at higher rates of growth by providing passenger transport between all stations or cities of a proposed development. Governments focus on planning and implementing future stations that support national development through intelligent decision mechanisms. Today, an ITS solution is crucial for managing traffic on a country's road network in a smart way. The main expected outcome of using ITS is a smooth, smart, and sustainable means of transport.
Moreover, distributing passengers over all the available means of transportation (such as buses, cars, trains, planes, and ships) can balance the load between them and help to solve traffic jam problems. This aim can be achieved by implementing intelligent systems that have clear information about statistics such as the currently available means of transport, road status, passengers per day, and pollution emitted per trip. Such systems can predict the long-term behavior of a city or state transportation system and recommend future planning for new smart cities. Another important issue is climate change due to transportation activities. These changes are serious: they are driven by the large carbon emissions of the different means of transportation and have dangerous consequences for human life. This is an incentive to optimize and reduce the CO2 emissions of both public and private transport, which leads to the concept of Green-ITS.
The Context Awareness (CA) approach can enrich ITS services because it can adapt the dynamics of the control system. This brings potential advantages that influence the control strategies used to manage the transportation system, especially in multimodal traffic management. A multimodal traffic station is a station where different means of transportation, such as metro, train, and bus, cross.
This paper is an extension of the work originally presented at the ICECTA conference [1].
In this work, we improve our previous implementation of advanced ITS traffic flow management [2]. The aim of the previous paper was to choose the best among different means of transportation. We considered the recommendation of the best of the three means available in a multimodal station where metro, bus, and train cross. This recommendation is based on user and system parameters at any time (t) for any passenger who wants to reach a specific destination using these means. We used a reward-based Q-learning approach to choose the best means of transport available in multimodal stations. In the second phase of this work, we propose using Support Vector Machines (SVM) because of their reputation for classification accuracy and prediction speed compared with the previous approach.
The foundations of SVM were developed by Vapnik in 1995 [3]. SVMs have become popular in many applications because of their attractive features and their promising performance in empirical tests [4].
In summary, the goal of this work is to select the best of the different routes available between two stations. The selection is carried out by classifying collected ITS traces of all the possible routes, including combinations of different means of transportation, at any time (t). Moreover, we consider the influence of new parameters that were missing in the previous work [2]:
• The passenger waiting time at the starting station until the means of transport arrives.
• The passenger waiting time at each connection between the same or different means of transport on the way to a specific destination.
• The CO2 pollution measured for each trip.
Furthermore, the performance evaluation in this work is based on real traffic statistics gathered in real time from the French SNCF Transilien [5] and Vianavigo [6] sites. The data cover working days and weekend days of the transportation network of the Paris area during a one-month testing period.
The paper is structured as follows. Section 2 introduces the related work on ITS decision-making analysis and management using Machine Learning (ML) tools. Section 3 presents the reinforcement Q-learning model and then the SVM model for context-aware ITS multimodal stations. Section 4 evaluates the proposed models and compares their results for selected use cases on working days and weekend days, through extensive simulations based on real-time statistics collected from the French SNCF Transilien site [5]. Finally, Section 5 concludes and highlights the perspectives of this work.

Work Background
In the ITS domain, an Intelligent Agent (IA) has been defined as an agent that receives percepts from the environment and takes smart actions based on them [7]. In our study, the IA is the transportation station. This agent can take decisions based on the reactions gathered from passengers. Once these reactions are received, a context-aware learning system is used to optimize the decisions taken. The agent then proposes accurate guidance to all passengers crossing this IA (i.e., the multimodal station). Hereafter, we highlight relevant work on intelligent transportation prediction systems based on reported reactions.
The following literature review is organized in two sections. The first reviews proposals based on reinforcement and Q-learning techniques [8][9][10][11][12][13][14]. The second reviews proposals based on SVM techniques in ITS. For our problem context, we found no related work that uses the SVM learning algorithm to detect the travel time of each means of transport and hence predict the best one at each time (t). As will be seen, most of the works in that section use SVM to detect the travel time of buses and cars on highways. In these works, the model anticipates the vehicle arrival time from the history of previously traveled routes. Their main weakness is that they do not consider backup routes, either as parallel routes or for emergency situations caused by traffic jams. They are nevertheless a good solution for ITS tracking systems.

Q-Learning based Prediction Systems
The work in [9,10] proposed a reinforcement learning model for traffic control, for the case of classical pre-timed signal control systems with independent control agents. The authors implemented the learning system of the control agent as a closed-loop controller driven by reward reactions. The agent interacts with the environment until it converges to a control policy, that is, an optimal mapping between the environment's state and the optimal control action.
In another work [11], the authors considered the reinforcement learning approach in a decentralized system. They proposed an adaptive real-time agent for traffic signal control that is able to minimize the total travel time of vehicles. Based on the accumulated reward, the Q-learning agent learns the optimal mapping between the environment's state on one side and the corresponding control action on the other.
The Q-learning model for traffic control in [11] addresses time-varying and stochastic traffic flow, while the authors of [12] investigated the problem of agent-based self-optimization against multiple policies. They proposed a distributed W-learning reinforcement learning model. The purpose of this idea, applied in self-organized traffic control systems, is to improve performance when multiple policies are deployed simultaneously.
The work in [13] proposed a traffic flow prediction model based on multi-agent reinforcement learning. This model targets traffic signal control across multiple crossroads.
Finally, the authors of [14] concentrated on the coordination between control agents in order to adapt road traffic signals. They proposed an independent control agent mode alongside an integrated mode combined with reinforcement learning. This model is capable of solving the agents' communication problems. Their results indicate that the proposed model reduces the delay in connection travel time.

SVM based Classification Systems
SVM is a supervised machine learning tool based on simple principles that originate from statistical learning theory [3]. Its simplicity comes from applying simple linear methods to the data [15].
The accuracy and simplicity of SVM models are the key to many contributions based on classification and regression.
In this part, we present the main research directions that used SVMs in the ITS domain. The first direction focused on predicting trip time. The authors of [16][17][18] introduced different SVM-based models to predict travel time on highway routes, while the authors of [19] proposed a travel time prediction model using SVM in urban transport networks.
Other contributions considered SVM for bus travel time prediction. In [20], an SVM-based hybrid model was introduced for this prediction. The technique combines the classical SVM, the Grubbs test method, and an adaptive algorithm.
In [21], a new bus travel time prediction model for multiple dynamic routes is proposed. The model uses SVM with the Kalman filtering technique.
The authors of [22] proposed a prediction model for the flow of rail transit passengers in the city of Beijing using Support Vector Regression (SVR) to improve rail transportation management. They used Partial Least Squares Regression (PLSR) to resolve multicollinearity among the dependent variables.
The last direction in our study covers traffic sign detection or prediction.
In [23][24][25][26], the authors developed automatic systems for traffic sign detection or recognition using SVM. In [26], the authors used SVM as a recognition module to analyze keep-clear signs in order to enhance traffic management in cities.
Another work [27] used SVM for automatic video-based vehicle detection, as part of an ITS-based automatic traffic surveillance system.
In [28], the authors proposed the design of a device for long-haul buses. The device issues driving safety warnings based on SVM, by classifying lane departure and forward collision events.
To the best of our knowledge, SVM has not been used in any ITS-based model to nominate the best means of transport or route for passengers to reach a specific destination at time (t) in the context of multimodal traffic. The following section describes the proposed model and both learning algorithms.

Green Transport Decision System Model (GTDS)
Passengers usually arrive at the Transport Station (TS) in order to reach a certain destination. They therefore have to decide on a route Ri, which may include one or more means of transport, at time (t). Here, a Green Transport Decision System (GTDS) model is proposed to facilitate passenger travel. The GTDS model nominates the best Ri for passengers based on Q-learning, and its nominations rely on the passengers' reward reactions. Compared with Q-learning, SVM is reputed to be faster and more accurate. Therefore, in this paper SVM is proposed for the second phase of this work. Figure 1 shows the main inputs of the model and the collected reactions used as system feedback. Stations that apply the GTDS model should have multiple routes to the same destination, so that different nominations can be provided in different situations. Accordingly, the proposed model is applied to a use case with Gare de Paris Montparnasse as the departure station and Gare de Paris Nord as the arrival station. The model is simulated for different days of the week (both weekend days and working days). The studied period consists of 25 quarters, from 7:00 am until 1:00 pm. This period is intended to contain all the expected traffic conditions, from rush hours to situations free of traffic jams.
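One way to read the figure of 25 quarters is sketched below. It assumes (this is an interpretation, not something stated in the paper) that the period is sampled every 15 minutes with both the 7:00 and the 13:00 boundaries included. The function name `quarter_slots` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

def quarter_slots(start="07:00", end="13:00", step_minutes=15):
    """Return the time slots from start to end, both endpoints included."""
    current = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    slots = []
    while current <= stop:
        slots.append(current.strftime("%H:%M"))
        current += timedelta(minutes=step_minutes)
    return slots

slots = quarter_slots()
# Six hours hold 24 quarter-hour intervals, hence 25 slot boundaries.
print(len(slots), slots[0], slots[-1])  # 25 07:00 13:00
```

Under this reading, each slot is one state (t) of the decision system.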
The choice of the best Ri is affected by many parameters; the most relevant ones are included in the proposed model and illustrated in Figure 1. The first factor is the passengers' travel duration to reach the destination. This factor takes into account how well each means of transportation respects its scheduled times, as shown in Figure 2. The curves in this figure were obtained from the history of each means of transportation by collecting several observations. As indicated, the metro is more frequent and comfortable than the other means of transport; the train ranks second and the bus last. For this parameter, which reflects respect of the arrival time, we rely on statistics that give a wide view of how closely the means of transport follow their scheduled arrival times. This insight comes from the published SNCF statistics and the real-time schedules of the means of transport on the online site [5].
As is clear from Figure 2, the metro is the best means of transport at respecting arrival times, even though it may exceed its scheduled arrival times during rush hours. Therefore, when a passenger comes to the station and checks in to see the best means of transport to destination D, the decision system inside the station uses the reactions from station D about previous respect of arrival schedules to update the immediate reward function r when the passenger takes a means of transport. By doing so, the decision system first tries to optimize its future decisions using the experience accumulated along the day.
Figure 2: Ratio of respected arrival times for the three main means of transportation. The metro ranks first, followed by the local train, and then the bus, which has the worst on-time arrival ratio.
The second factor is the travel cost, which can affect the system decisions. Most passengers prefer the cheapest means of transport.
The third factor is the waiting time, if any, before catching the means of transport.
The fourth factor, for trips that use several means of transport, is the connection time.
The fifth factor is the comfortability coefficient of using Ri at any time (t). This coefficient reflects the passengers' degree of satisfaction with their current experience of a route Ri. It is influenced by the number of free seats available in rush hours compared with the number of passengers who intend to use the route. Passengers are therefore more interested in the more comfortable means of transport. The comfortability reaction of the passengers is shown in Figure 3. All the mentioned parameters are functions of time, even the ticket price: in some countries, the ticket pricing plan changes over the days of the week (i.e., working days and days off). At present, most countries have become concerned about CO2 emissions because of their catastrophic effects on the world climate.
In this contribution, we add the CO2 parameter, which affects the selection of the means of transport and hence of the best route. In France, the government encourages people to use public transport to decrease CO2 emissions and hence air pollution. Moreover, during days of high air pollution, France encourages the use of public transport by offering it for free and restricts car traffic in the Paris area according to even or odd plate numbers.
Figure 3: Passengers' comfortability feedback/reactions. A comfort coefficient equal to one (comfort coef = 1) represents high passenger satisfaction, while the lowest level, zero, represents unsatisfying trips.

Reinforcement Learning Model
Using reinforcement learning, any TS can learn from its experience to guide passengers and thus optimize its future decisions. This experience takes the form of rewards and sanctions for prior decisions. The model is initialized at each TS using the collected passenger reactions. When a passenger uses Ri to reach a destination, this Ri is rewarded or sanctioned according to the following parameters: the travel time to the destination, the passengers' comfortability reaction, and the travel cost of Ri. Consequently, each TS tunes its nominations from its gained experience.
In this paper, the TS decision problem is formulated as a Markov Decision Problem (MDP) that can be solved by the reinforcement learning algorithm. The TS elects the best Ri for reaching a destination, biased by its experience, at each hour of the day or at another period chosen according to the traffic. The system then evaluates itself by assessing the decision taken as a reward or a sanction. The MDP is defined as a tuple [i, t, a, r]:
• i: the object considered in this study, i.e., the route Ri.
• t: the set of system states, which in this model are the studied time shift periods during the day.
• a: the set of decisions taken by the TS. In this work, the decision of the TS is to nominate the best Ri that meets the passengers' requirements and expectations. Only six possible routes between the proposed TSs are considered in our use case.
• r: the immediate reward that the passenger gets from taking decision a.
$Q_i(t, a)$ is a state-action matrix that stores the gathered reward or sanction for each state-action pair of a route Ri at time (t). For example, this matrix gives the predicted reward of taking an action (a) after using the route Ri at time (t). The updating function of $Q_i(t, a)$ can be defined by:
$$Q_i(t, a) \leftarrow (1 - \alpha)\, Q_i(t, a) + \alpha \left[ r_i + \gamma \max_{a'} Q_i(t + 1, a') \right] \tag{1}$$
where:
• $\alpha$: the learning rate, representing how fast the Q-values vary in response to the dynamic user reactions.
• $\gamma$: the discount factor, which weights the future reward against the immediate one. The higher its value, the more the future reward counts compared with the immediate one.
• $r_i$: the predicted immediate reward when choosing Ri at time (t).
• $\max_{a'} Q_i(t + 1, a')$: the maximum predicted future reward, over any decision action $a'$, once the system reaches the state (t + 1).
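The update described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard tabular Q-learning rule; it is not the authors' Matlab implementation. For brevity the route index i and the nominated action a are collapsed into a single table `q[t][a]`, the values of `ALPHA` and `GAMMA` are arbitrary, and the names `q_update` and `best_route` are hypothetical.

```python
N_ROUTES = 6   # six candidate routes, as in the use case
N_SLOTS = 25   # day quarters, used as states
ALPHA = 0.5    # learning rate (assumed value)
GAMMA = 0.5    # discount factor (assumed value)

def q_update(q, t, a, reward, alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update for state t and action a."""
    # The last slot has no successor state, so its future value is taken as 0.
    future = max(q[t + 1]) if t + 1 < len(q) else 0.0
    q[t][a] = (1 - alpha) * q[t][a] + alpha * (reward + gamma * future)
    return q[t][a]

def best_route(q, t):
    """Return the index of the route with the highest Q-value at slot t."""
    return max(range(len(q[t])), key=lambda a: q[t][a])

q = [[0.0] * N_ROUTES for _ in range(N_SLOTS)]
q[1][2] = 1.0  # pretend route 2 already earned a reward in the next slot
new_value = q_update(q, t=0, a=4, reward=2.0)
print(new_value)         # 0.5 * 0 + 0.5 * (2.0 + 0.5 * 1.0) = 1.25
print(best_route(q, 0))  # 4
```

A station would call `q_update` each time a passenger reaction arrives and `best_route` when a passenger asks for a nomination.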
Furthermore, once a passenger arrives at the desired destination via Ri after the scheduled arrival time, the TS gets a sanction reaction. If all the routes are delayed, a bonus is given to the routes with the minimum delay. The reward function uses the following terms:
• mt: the reference trip duration over all Ri at time (t), i.e., the minimum travel time.
• B1: the travel time incurred by taking Ri.
• B2: the ticket price of Ri relative to a mean value (mc) equal to one. The ticket pricing plan is assumed to be fixed during the week days.
• B3 and B4: bonus values for the Ri with the minimum waiting and connection times, respectively. These parameters are relative across all the means of transport and take values from 1 for the shortest times down to 0.5 for the longest times.
• B5: the passengers' satisfaction reaction, as a percentage, reflecting the comfortability of using Ri at time (t). The default value is 1, as shown in Figure 3.
• B6: a bonus added to the routes with low delays relative to the others; conversely, the Ri with the highest delay gets no bonus.
• B7: an indicator of green means of transport, which are preferred because they cause less pollution. This parameter is a bonus added to the Ri whose means of transport emit little CO2.
www.astesj.com
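The rule for B3 and B4 (1 for the shortest time, 0.5 for the longest) can be illustrated as follows. The text does not say how intermediate times are scored, so the linear interpolation used here is an assumption, as are the function name `time_bonus` and the sample waiting times.

```python
def time_bonus(times, best=1.0, worst=0.5):
    """Map each route's time to a bonus: shortest -> best, longest -> worst.

    Intermediate times are interpolated linearly (an assumption).
    """
    lo, hi = min(times), max(times)
    if hi == lo:
        # All routes tie, so every route gets the full bonus.
        return [best for _ in times]
    return [best - (best - worst) * (t - lo) / (hi - lo) for t in times]

# Hypothetical waiting times, in minutes, for the six routes at one quarter.
waiting_minutes = [2, 5, 8, 4, 10, 6]
b3 = time_bonus(waiting_minutes)
print(b3)  # [1.0, 0.8125, 0.625, 0.875, 0.5, 0.75]
```

The same helper could be applied to connection times to obtain B4.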
Passengers' reactions and feedback about the means of transport are collected in real time as reports. Each passenger is supposed to send a reaction through a dedicated simple mobile application that supplies the Decision System Algorithm DSA(t) shown in Algorithm 1.

Support Vector Machine Classification
SVM, a supervised machine learning tool, is used to classify linear and non-linear dynamic systems. SVM decisions can address both classification and regression problems. The best classification is obtained by finding the hyperplane that separates the data without error and maximizes the distance to the closest vectors. In this work, SVM is used to distinguish between two classes. Class one contains the best route(s) to a certain destination, while the other class contains the alternative routes available to the same destination.
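The aim is to find a separator, linear or non-linear, between the two classes. From a theoretical point of view, linear SVM matches our scenario because there are only two classes. Assume that the route data set (i.e., the collected traces) is D:
$$D = \{(x_1, y_1), \ldots, (x_l, y_l)\}, \quad x_i \in \mathbb{R}^n, \quad y_i \in \{-1, +1\} \tag{3}$$
where:
• x is the set of training vectors.
• n is the number of day quarters under study, equal to 25 periods.
• y is the set of class labels.
The hyperplane is:
$$\langle w, x \rangle + b = 0 \tag{4}$$
Considering a canonical hyperplane [4], the parameters w and b are constrained by:
$$\min_{i} \left| \langle w, x_i \rangle + b \right| = 1 \tag{5}$$
The following constraints must hold for a separating hyperplane in canonical form, which is the most fitting assumption for non-linear classification:
$$y_i \left[ \langle w, x_i \rangle + b \right] \geq 1, \quad i = 1, \ldots, l \tag{6}$$
The distance d(w, b; x) of a point x from the hyperplane (w, b) is:
$$d(w, b; x) = \frac{\left| \langle w, x \rangle + b \right|}{\lVert w \rVert} \tag{7}$$
For the optimal hyperplane, the margin $\rho$ has to be maximized subject to the constraints of (6). It is given by:
$$\rho(w, b) = \frac{2}{\lVert w \rVert} \tag{8}$$
So the optimal hyperplane, i.e., the data separator, is the one that minimizes:
$$\Phi(w) = \frac{1}{2} \lVert w \rVert^2 \tag{9}$$
The solution of the optimization problem (9) under the constraints of (6) is given by the saddle point of the Lagrange function:
$$\Phi(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^2 - \sum_{i=1}^{l} \alpha_i \left( y_i \left[ \langle w, x_i \rangle + b \right] - 1 \right) \tag{10}$$
where $\alpha_i$ are the Lagrange multipliers. The Lagrangian must be minimized with respect to w and b and maximized with respect to $\alpha \geq 0$. The dual problem is given by:
$$\max_{\alpha} W(\alpha) = \max_{\alpha} \left( \min_{w, b} \Phi(w, b, \alpha) \right) \tag{11}$$
The minimum of the Lagrangian $\Phi$ with respect to w and b is given by:
$$\frac{\partial \Phi}{\partial b} = 0 \Rightarrow \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad \frac{\partial \Phi}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{l} \alpha_i y_i x_i \tag{12}$$
So, from (10), (11), and (12), the dual problem is expressed by:
$$\max_{\alpha} W(\alpha) = \max_{\alpha} \left( -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_{k=1}^{l} \alpha_k \right) \tag{13}$$
and the solution of this problem is:
$$\alpha^* = \arg\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle - \sum_{k=1}^{l} \alpha_k \tag{14}$$
where $\alpha_i \geq 0$ for $i = 1, \ldots, l$ and $\sum_{j=1}^{l} \alpha_j y_j = 0$. Solving (14) under its constraints gives the Lagrange multipliers, and the optimal separating hyperplane is then obtained as:
$$w^* = \sum_{i=1}^{l} \alpha_i y_i x_i, \qquad b^* = -\frac{1}{2} \langle w^*, x_r + x_s \rangle \tag{15}$$
where $x_r$ and $x_s$ are support vectors, one from each class, satisfying:
$$\alpha_r, \alpha_s > 0, \quad y_r = -1, \quad y_s = 1 \tag{16}$$
and the hard classifier is:
$$f(x) = \operatorname{sgn}\left( \langle w^*, x \rangle + b^* \right) \tag{17}$$
The classification of the input vectors, which hold the route parameters, can also be treated as a non-linear problem, which fits our case as well, depending on the distribution of the input vectors. For the general form, suppose that SVM maps the input vector x into a high-dimensional feature space z. The optimization problem becomes:
$$\alpha^* = \arg\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{k=1}^{l} \alpha_k \tag{18}$$
where K is the kernel function that performs the non-linear mapping into feature space, with the constraints:
$$0 \leq \alpha_i \leq C, \quad i = 1, \ldots, l, \qquad \sum_{j=1}^{l} \alpha_j y_j = 0 \tag{19}$$
where C is a regularization parameter that represents the upper bound on the permissible values of the multipliers $\alpha_i$.
Solving (18) with its constraints in (19) gives the Lagrange multipliers and a hard classifier:
$$f(x) = \operatorname{sgn}\left( \sum_{i \in SVs} \alpha_i y_i K(x_i, x) + b^* \right) \tag{20}$$
where:
$$\langle w^*, x \rangle = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x), \qquad b^* = -\frac{1}{2} \sum_{i=1}^{l} \alpha_i y_i \left[ K(x_i, x_r) + K(x_i, x_s) \right] \tag{21}$$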
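To make the hard classifier concrete, the following Python sketch evaluates the usual kernel form of the SVM decision rule, the sign of a weighted sum of kernel values plus a bias, on a toy two-point problem. It assumes that the Lagrange multipliers have already been obtained by a solver and that labels are encoded as -1 and +1, the usual SVM convention. The toy data and the function names `bias` and `classify` are hypothetical, and a linear kernel is used so that the numbers can be checked by hand.

```python
def linear_kernel(u, v):
    """Plain inner product; any other kernel function could be passed instead."""
    return sum(ui * vi for ui, vi in zip(u, v))

def bias(alphas, labels, svs, x_r, x_s, kernel):
    """b = -1/2 * sum_i alpha_i * y_i * [K(x_i, x_r) + K(x_i, x_s)]."""
    return -0.5 * sum(a * y * (kernel(sv, x_r) + kernel(sv, x_s))
                      for a, y, sv in zip(alphas, labels, svs))

def classify(x, alphas, labels, svs, b, kernel):
    """Hard classifier: sign of sum_i alpha_i * y_i * K(x_i, x) + b."""
    score = sum(a * y * kernel(sv, x)
                for a, y, sv in zip(alphas, labels, svs)) + b
    return 1 if score >= 0 else -1

# Toy problem: one support vector per class, two units apart on the first axis.
svs = [(0.0, 0.0), (2.0, 0.0)]
labels = [-1, 1]
alphas = [0.5, 0.5]  # multipliers for this toy case, i.e. w = (1, 0)
b = bias(alphas, labels, svs, svs[0], svs[1], linear_kernel)

print(b)                                                            # -1.0
print(classify((1.5, 3.0), alphas, labels, svs, b, linear_kernel))   # 1
print(classify((0.2, -1.0), alphas, labels, svs, b, linear_kernel))  # -1
```

In the route setting, each input vector would hold the parameters of one route at one day quarter, and the +1 class would correspond to the best route(s).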

Performance Validation
This section evaluates the performance of the proposed GTDS model using the Q-learning and SVM learning methods. Matlab 2012 is used for the simulations under the following conditions:
• The use case under study is the Paris region in France. The departure station is Gare de Paris Montparnasse and the arrival station is Gare de Paris Nord, two of the best-known stations in this region.
• Many routes are available between the source and destination stations; we processed and filtered them to keep only six routes with close results, for simple comparisons.
• The time shift under study is divided into two sets. Set 1 represents the two weekend days (Saturday and Sunday), and set 2 represents the two selected working days, Monday and Friday.
• The time shift under study runs from 7:00 AM to 1:00 PM.
• The ticket price of Ri is taken as unity, and its value is adjusted relative to the means of transport used. The route cost is assumed to be the same for all routes, which fits the case in France, where pricing depends on dividing the destinations into zones.
• The traces of travel duration, waiting time, and connection time are gathered from the real-time schedules of the official SNCF Transilien site [5].
• The passengers' comfortability reactions to the means of transport under study follow the chart in Figure 3. They combine many parameters, such as capacity, design satisfaction, rate of flow (periodicity), and ease of catching each means.
• The actual CO2 emission of each Ri is measured in grams per person and gathered from the Vianavigo site [6].
• Number of stops: a dynamic parameter whose value depends on the route, which may combine different means of transport.
The simulations are divided into two parts: part one concerns the SVM learning algorithm, and part two the Q-learning algorithm. In part one, the traces for the chosen source and destination were gathered from [5] over two weeks, as a proof of concept. We used the first week's traces as the training set and the second week's traces as the testing set.
After the validation of SVM, the results obtained on the testing set are shown in Figure 4 and Figure 5, which give the SVM accuracy for the weekend days and the working days, respectively.
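The accuracy reported for the testing set is the share of routes whose predicted class matches the actual one. A minimal sketch of that computation is given below; the labels are invented for illustration (1 for a best route, 0 for an alternative route) and are not taken from the collected traces.

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted class equals the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels for the six routes at one day quarter.
actual = [0, 0, 1, 0, 0, 0]
predicted = [0, 0, 1, 0, 1, 0]

acc = accuracy(predicted, actual)
print(round(acc, 3))      # 0.833
print(round(1 - acc, 3))  # 0.167
```

With six candidate routes per quarter, a single misclassified route corresponds to an error of one in six.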
As illustrated in Figure 4 and Figure 5, the SVM classification accuracy is good on weekend days but lower on working days. This is expected, because working days may have rush hours that influence the choice of the best route. Despite the fluctuations in the results, the system performs steadily, as the error remains small, close to one in six.
Figure 4: SVM testing accuracy during the tested weekend days. The X-axis represents the time shift from 7:00 AM to 1:00 PM used as the testing period.
Figure 5: SVM testing accuracy during the tested working days. The X-axis represents the time shift from 7:00 AM to 1:00 PM used as the testing period.
The second simulation part concerns the Q-learning results for choosing the best route Ri. The same traces and system settings are used for the Q-learning simulation. This simulation considers two use cases:
A) Weekend days: this case concerns the delay of each means of transport on one of the weekend days (Saturday), as shown in Figure 6. The bus has the largest delay compared with the metro and the train in the same day quarter, although roads do not easily suffer from congestion during weekends. Moreover, the metro statistics are not far from those of the bus because of its limited circulation on weekend days.
B) Working days: this case concerns the traffic situation of one working day, Friday. The busiest rush hours of the day are simulated. Figure 7 shows the delay of each means of transport. As is clear from the figure, the bus has the largest delay compared with the metro and the train in the same day quarter, because of traffic jams on normal roads.
The results in Figure 8 and Figure 9 show the Q-values, which stand for the rewards of the optional routes in each day quarter, on Saturday and Friday respectively.
As is clear from the figures, the best route decision changes with the day quarter and the route conditions. These routes are composed of two or more different means of transport. The figures therefore illustrate that the routes recommended over the studied hours depend on the conditions and delays of each route, which combines connections between the three means of transport (metro, bus, and train). Figure 10 and Figure 11 show the best two routes suggested by the two proposed machine learning systems (SVM and Q-learning) for Saturday and Friday, respectively. As these figures show, the decisions of the two learning algorithms differ, for three reasons: the learning techniques differ, the input parameters are weighted differently, and SVM is a supervised learning model whereas Q-learning is based on reinforcement learning.
The main purpose of these two figures is to clarify the criteria behind each algorithm's output and to show that each algorithm may suit certain applications. If the ITS system is stable, with no fluctuations, SVM would be a good fit. In contrast, Q-learning could be suitable for infrequent cases such as accidents. This is because Q-learning depends on the currently collected data in addition to its decision history from the previous day, while SVM depends on the training sets.

Conclusions
This paper investigated route selection in ITS multimodal stations using a context-aware decision model based on machine learning tools. We used two different models: Q-learning as a reinforcement model and SVM as a supervised model. In the first phase, we built the Q-learning model to select the best of six studied routes. We then used the SVM model, first to classify the proposed routes, second to calculate the model's accuracy, and third to predict the best solution for upcoming routes. The simulation results for both approaches indicated the advantages and disadvantages of each technique, as well as when (time of day or day of the week) and where (according to traffic status) each can be used in ITS stations.
Q-learning is the better choice when the ITS system is not stable and is subject to many disturbances, while the SVM model is better suited when the ITS system is stable and the variations of the data sets over time are steady. That is why we recommend hybrid solutions for such cases when searching for the optimal classification over different time intervals and with green parameters. On the positive side, ITS will enable the Green Transport Decision System (GTDS) to decide the best route based on context-sensitive information, to improve the efficiency of multimodal station usage, and to reduce environmental impact by including CO2 effects.
In future work, it will be more realistic to focus on a three-dimensional model that considers time in addition to user location. This spatio-temporal model can be applied in many applications. The difficulty is obtaining such statistics, which are subject to country regulations on security and privacy. However, it can provide a new generation of traffic flow modeling that adds dynamic, predictive, and adaptive control for multimodal traffic.