Decision Making System for Improving Firewall Rule Anomaly Based on Evidence and Behavior

Introduction
A firewall is an indispensable system for today's computer networks. It plays an important role in controlling access to various resources on the networks, for example, networking devices, databases, and web servers. Besides, it can also prevent attacks and intrusions by malicious users from untrusted networks. Basically, a firewall is governed by a set of instructions called rules. The number of firewall rules depends on the complexity of each organization's policies. As the number of rules increases, the number of anomalous rules grows as well. Anomalies arise when two or more rules overlap but have different decisions. Five types are defined by [1]: the shadowing, correlation, generalization, redundancy, and irrelevancy anomalies. Recently, a new kind of anomaly has been advanced, namely the semantics loss of rules [2]. Rule anomalies have a great impact on the overall performance of a firewall; that is, they decrease the processing speed of rule verification and matching. Therefore, reducing the number of anomalies improves the speed of rule verification as well. Analyzing, managing, and resolving anomalies are major problems in firewall research and attract much interest. The first major researcher on anomalies was Al-Shaer, who presented the five types of anomalous rules and also invented an effective method for detecting anomalies, called the finite state diagram [3]. Later, several researchers contributed various methods for detecting anomalies; for example, in [4] the authors revealed a powerful algorithm to relieve the root cause of anomalies, called SDD. However, it supports rules of only one type, either "an acceptable status" or "an unacceptable status", appearing on the firewall. Next, in [5] they proposed a method to resolve anomaly problems effectively by using the firewall decision diagram (FDD), and this is also the prototype of much later research.
The propositional logic model was presented by [6]; the authors claimed that their model could remove anomalies, leading to a decrease in the number of rules without changing the policy. Next, in [7] the authors demonstrated an anomaly management framework that encourages systematic detection and resolution of firewall policy anomalies based on average risk values, called FAME. The risk values are calculated from the Common Vulnerability Scoring System (CVSS) [8], which considers vulnerabilities only point by point rather than from an overall perspective. The next proposed solutions for analyzing and managing firewall policies were Lumeta [9] and Fang [10]. These are tools used to analyze firewall rules, but they cannot completely verify misconfigured policy settings. In [11] the authors contributed an algorithm for detecting and resolving conflicts in packet filtering; however, the algorithm can only detect certain specific conflicts. Firewall rule optimization based on Service-Grouping was proposed by [12]. The basis of this technique is resolving conflict segments by grouping rules according to their work behaviors. The authors claimed that the processing time and number of packet hits are better than those of the traditional and FIREMAN [13] firewalls. FPQE [14] is an automated system for resolving rule anomalies that does not require any admin intervention. It uses automatic rule removal in the case of redundancy and contradiction anomalies, and automatic rule permutation against shadowing and correlation. Besides, some techniques allow the firewall to automatically detect and analyze conflicting rules, such as [15] and [16], but they are not based on real tangible evidence. In most methods, the burden of resolving rule conflicts is instead left to the administrator's discretion.
This paper contributes a model for optimizing firewall rule anomalies by applying Bayesian probability together with the evidence of each rule, i.e., the frequency of packets matching against rules, the evidence of creating rules, the expertise of rule creators, and the protocol priority. The model provides guidance, in the form of probability values, for firewall administrators resolving rule anomalies, so that administrators can be confident that their decisions are more accurate because they are based on actual evidence. This paper is organized as follows: Section 2 overviews the background and related work. Section 3 presents the key contributions. Section 4 articulates our system design. Section 5 addresses the implementation details and evaluations. Section 6 concludes this paper.

Rule Definition and Anomaly
Generally, a firewall rule consists of two parts: the condition part and the decision part. Let R be a firewall rule, C a condition part, and A a decision part; a firewall rule then has the format:

R = C → A (1)

In fact, firewalls always have more than a single rule. Therefore, equation (1) needs to be revised to:

R_i = C_i → A_i (2)

where C_i and A_i are the condition and decision of rule R_i (any firewall rule), with i ∈ [1, n], and n is a non-negative integer. Let D(f_k) denote the finite domain of positive integers for field f_k. For example, the domain of the source and destination address in an IP packet is [0, 2^32 − 1] (D(f_1) and D(f_2)), of the source and destination port is [0, 2^16 − 1] (D(f_3) and D(f_4)), and of the protocol is [0, 2^8 − 1] (D(f_5)). C_i defines a set of packet fields over the fields f_1 through f_d, specified as f_1 ∈ S_1 ∧ f_2 ∈ S_2 ∧ … ∧ f_d ∈ S_d, where each S_k is a subset of D(f_k). A_i is either accept or deny for each rule. If all conditions in C_i are true, the decision is either accept or deny, as specified by the administrators. Given p as an IP packet over the d fields f_1, ..., f_d, p is a tuple (p_1, p_2, ..., p_d) where each p_k (1 ≤ k ≤ d) is an element of D(f_k). An IP packet (p_1, p_2, ..., p_d) matches R_i if and only if the condition p_1 ∈ S_1 ∧ p_2 ∈ S_2 ∧ … ∧ p_d ∈ S_d holds. A set of rules (R_1, …, R_n) is valid when there is at least one rule in the set matching against p. To make sure that firewall rules work properly, the condition of the final rule in the firewall is usually specified as f_1 ∈ D(f_1) ∧ … ∧ f_d ∈ D(f_d), so that every packet must be matched, as shown by R_3; this is called the implicit rule. The set of rules below shows an example of three rules over three condition fields: R_1 and R_2 are redundant because any packet can match both rules, which have the same action (accept). Furthermore, R_1 and R_2 also conflict with R_3 because both R_1 and R_2 are subsets of R_3 while having different actions. One typical solution to resolve such conflicts is for the firewall to choose the rule that matches the packet being considered first, called the first-match approach. Firewall rule anomalies can be classified into six types [1], [2], with each anomaly represented by a theorem:
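As a concrete illustration of the matching condition and the first-match approach described above, the following minimal Python sketch evaluates a packet against an ordered rule list. The field names and rule values are illustrative, not taken from the paper's tables:

```python
# Minimal sketch of first-match rule evaluation (illustrative field
# names and rule values are our own, not from the paper).
def matches(packet, rule):
    """A packet matches a rule iff every field value lies in the rule's set."""
    return all(packet[f] in allowed for f, allowed in rule["cond"].items())

def first_match(packet, rules):
    """Return the decision of the first matching rule (first-match approach)."""
    for rule in rules:
        if matches(packet, rule):
            return rule["action"]
    return "deny"  # implicit final rule: matches every packet

rules = [
    {"cond": {"dport": range(80, 86), "proto": {6}}, "action": "accept"},
    {"cond": {"dport": range(0, 65536), "proto": {6, 17}}, "action": "deny"},
]
packet = {"dport": 80, "proto": 6}
print(first_match(packet, rules))  # first rule matches, so "accept"
```

Note that the two rules above overlap with different actions, which is exactly the kind of conflict the anomaly theorems below classify.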

Shadow anomaly:
R_j is shadowed by R_i if and only if their intersection is equal to R_j and they have different actions, as illustrated in Figure 1(a).
R is the database of all rules, and R_i is the rule executed before R_j.
Correlation anomaly: R_i and R_j in R are correlated if their intersection is not equal to ∅, R_i − R_j ≠ ∅, R_j − R_i ≠ ∅, and they have different actions, as represented in Figure 1(b). R is the database of all rules, and R_i is the rule executed before R_j.
Generalization anomaly: R_i is generalized by R_j if and only if their intersection is equal to R_i and they have different actions (Figure 1(c)), where R_i is the rule matched before R_j.

Redundancy anomaly:
R_j is redundant to R_i if and only if their intersection is not equal to ∅ and they have the same actions (Figure 1(d)).
where p is an IP packet executed by the firewall.
Semantics loss anomaly: The semantics loss, introduced by [2], occurs when R_i and R_j are merged into R_k, whereby the meaning of both old rules is changed or replaced by a new meaning. This anomaly is mostly caused by redundant rules, as shown in Figure 1(f).

Min-Max Feature Scaling
Min-Max feature scaling [17] (also known as data normalization) is a standard method used to adjust the range of data. Since the ranges of data values may be very different, it is a necessary step in data preprocessing before further processing. It is normally used to resize any data range into the range [0, 1], called unity-based normalization. It can also normalize a finite range of values in the dataset between any arbitrary points a and b with the following equation:

x′ = a + (x − min)(b − a) / (max − min) (8)

where x′ denotes the normalized value of the value x ∈ [min, max] being considered; min and max denote the minimum and maximum of the measurement range; and a and b are the minimum and maximum of the target range to be scaled.
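Equation (8) can be sketched directly in Python; the worked numbers below reuse the FPM example that appears later in this paper (2,125 matches scaled over the range [1,200, 5,000]):

```python
def min_max_scale(x, x_min, x_max, a=0.0, b=1.0):
    """Rescale x from [x_min, x_max] into the target range [a, b] (Eq. 8)."""
    return a + (x - x_min) * (b - a) / (x_max - x_min)

# FPM example from Section 4: 2,125 matches, range [1,200, 5,000] -> [0, 1]
print(round(min_max_scale(2125, 1200, 5000), 3))  # 0.243
```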

Bayes' Theorem
Bayes' theorem [18] (also known as Bayes' rule) is a formula that describes how to update the probabilities of hypotheses when given evidence. It is a useful tool for calculating conditional probabilities. Bayes' theorem can be defined as follows.
Let A_1, A_2, …, A_n be events that partition the sample space S, i.e., A_1 ∪ … ∪ A_n = S and A_i ∩ A_j = ∅ when i ≠ j, and let B be an event on that space for which P(B) > 0. Then Bayes' theorem is:

P(A_i | B) = P(B | A_i) P(A_i) / Σ_{j=1}^{n} P(B | A_j) P(A_j) (9)

This formula can be used to reverse conditional probabilities: if we know the probabilities of the events A_j and the conditional probabilities P(B | A_j), j = 1, …, n, the formula can be used to compute the conditional probabilities P(A_i | B).
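A small numerical sketch of equation (9); the priors and likelihoods are hypothetical numbers of ours, not values from the paper:

```python
def bayes(prior, likelihood):
    """Posterior P(A_i | B) via Bayes' rule over a partition A_1..A_n (Eq. 9)."""
    evidence = sum(p * l for p, l in zip(prior, likelihood))  # P(B)
    return [p * l / evidence for p, l in zip(prior, likelihood)]

# Hypothetical two-event example: P(A_1)=0.6, P(A_2)=0.4,
# P(B|A_1)=0.2, P(B|A_2)=0.5.
posterior = bayes(prior=[0.6, 0.4], likelihood=[0.2, 0.5])
print([round(p, 3) for p in posterior])  # [0.375, 0.625]
```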

Moving Average (MA)
A moving average (MA) [17] is a widely used indicator for analyzing data trends. It helps smooth out the data by filtering out the disturbance of short-term fluctuations. There are two basic types of moving average that are popular and widely used, namely the Simple Moving Average (SMA) and the Exponential Moving Average (EMA). The SMA calculates the average of the last n data points, where n represents the number of periods to average:

SMA_n = (p_1 + p_2 + p_3 + ... + p_n) / n (10)

where SMA_n is the average over period n, and n is the number of periods. The EMA is a weighted average of the last n data points, where the weighting decreases exponentially with each previous data point per period. In other words, the formula gives greater weight to more recent data. The formula for the exponential moving average is the following:

EMA_t = V_t × (s / (1 + d)) + EMA_{t−1} × (1 − s / (1 + d)) (11)

where EMA_t is today's EMA, V_t is today's value, EMA_{t−1} is yesterday's EMA, s is the smoothing factor, and d is the number of days.
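Equations (10) and (11) can be sketched as follows. Seeding the EMA with an initial SMA is a common convention and an assumption on our part; the sample series reuses the first hours of the paper's FPM example from Section 4:

```python
def sma(values, n):
    """Simple moving average of the last n values (Eq. 10)."""
    return sum(values[-n:]) / n

def ema(values, d, s=2.0):
    """Exponential moving average over d periods with smoothing s (Eq. 11)."""
    k = s / (1 + d)
    avg = sma(values[:d], d)       # seed with an initial SMA (our convention)
    for v in values[d:]:
        avg = v * k + avg * (1 - k)
    return avg

hourly = [1300, 1500, 1200, 1300, 1400, 1500, 1800]  # first hours of the FPM series
print(round(sma(hourly[:5], 5)))  # 1340
```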

Converting an IP Address to a non-negative Integer
The Internet Protocol address (also known as IP address) is a unique address that networking devices such as routers, switches, and computers use to identify themselves and communicate with other devices in computer networks. An IPv4 address (IP version 4) is 32 bits, giving an address space ranging from 0 to 2^32 − 1. It is usually divided into four parts, each part (8 bits = an octet) separated by a dot. An IPv4 address w.x.y.z can be converted to a non-negative integer with the following equation:

IPv4′ = w × 2^24 + x × 2^16 + y × 2^8 + z (12)

where IPv4′ is the converted IP address; for example, 1.2.3.4 will be converted to 1 × 2^24 + 2 × 2^16 + 3 × 2^8 + 4 = 16,909,060.
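Equation (12) is a straightforward base-256 conversion; a minimal sketch:

```python
def ip_to_int(addr):
    """Convert a dotted-quad IPv4 address to a non-negative integer (Eq. 12)."""
    w, x, y, z = (int(o) for o in addr.split("."))
    return (w << 24) | (x << 16) | (y << 8) | z

print(ip_to_int("1.2.3.4"))  # 16909060
```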

Arithmetic Mean and Kappa Statistics
We use the arithmetic mean (x̄) to evaluate the administrators' satisfaction with the proposed firewall and use Cohen's kappa coefficient (κ̂) [19] to measure the inter-rater reliability, as in the following equations:

x̄ = (1/n) Σ_{i=1}^{n} x_i (13)

where x̄ is the average (or arithmetic mean), n is the number of terms (e.g., the number of items or numbers being averaged), and x_i is the value of each individual item in the list of numbers being averaged.

κ̂ = (p̄_o − p̄_e) / (1 − p̄_e) (14)

where p̄_o denotes the relative observed agreement among raters and p̄_e denotes the hypothetical probability of chance agreement, as given in equations (15) and (16),
in which appear the number of raters, the subjects of the assessment, the rating categories (e.g., most satisfied, very satisfied, ..., least satisfied), and the observed cell frequencies.
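For the two-rater case, equations (14)-(16) reduce to the familiar Cohen's kappa computation; the sketch below uses hypothetical ratings on a five-level satisfaction scale, not the paper's Table 5 data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items (Eqs. 14-16)."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n   # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2   # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings on a 5-level satisfaction scale (not the paper's data):
a = [5, 4, 4, 3, 5, 2, 4, 3]
b = [5, 4, 3, 3, 5, 2, 4, 4]
print(round(cohens_kappa(a, b), 3))  # 0.652
```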

Key Contributions
When rule anomalies occur on firewalls, the decision-making power to resolve them mainly depends on the administrator's discretion. However, the decisions made often result in errors or loopholes in the existing rules if admins cannot entirely understand the relationships between conflicting rules. Therefore, it is necessary to develop a decision support system that assists admins' decision-making during real-time anomaly detection. The system consists of four procedures: 1) preparing the various information needed before processing; 2) analyzing and detecting rule abnormalities with the Path Selection Tree (PST); 3) calculating the Bayesian probability of each rule based on the frequency of packets matched against rules, the evidence of creating rules, the expertise of rule creators, and the protocol priority, to help admins decide before optimizing the rules; 4) lastly, optimizing anomalous or conflicting rules based on the probability.

The System Design
There are four steps in the system design as shown in Figure 2.

The conditions (C_i) and decision (A_i) of each rule:
Referring to R_i in equation (2), in general the members of C_i have five fields (f_1 ∧ ... ∧ f_5), where f_1 = source IP address (SIP), f_2 = destination IP address (DIP), f_3 = source port (SP), f_4 = destination port (DP), and f_5 = protocol (PRO), as shown in Table 1. According to R_1 in Table 1, the preparation process of firewall rules begins with converting the IP addresses of f_1 and f_2 into a range of positive integers by equation (12). Hence, f_1 and f_2 are then converted into the following numbers:

Calculating the Probability of the Extra Fields of Each Rule:
To determine the probability of each rule in this model, four additional fields are added: the frequency of packets matching against rules (FPM), evidence of creating rules (ECR), expertise of the rule creator (ERC), and protocol priority (PRI). Let P(e_1), P(e_2), P(e_3), and P(e_4) be the probabilities of FPM, ECR, ERC, and PRI respectively. Therefore, the sum of the probabilities of rule R_i is given by equation (17):

P(R_i) = P(e_1) + P(e_2) + P(e_3) + P(e_4) (17)

where P(R_i) is the probability of R_i. For example, the information of the extra fields of R_1 is shown in Table 2. From R_1 in Table 2, the matching rate (FPM: e_1) between packets and R_1 is equal to 2,125 times. e_2 (ECR), e_3 (ERC), and e_4 (PRI) are 3, 2, and 4 respectively, as explained in more detail in the next section. These four extra fields are converted into probability form, with values in the range 0.0 to 1.0 (e_1′ to e_4′), by the Min-Max feature scaling of equation (8).
In the case of FPM: it is the frequency of packets matching any rule on the firewall; the counting process starts from the time at which a rule was created and continues until the present time. For example, if the maximum and minimum numbers of matches of any rules in the firewall are 5,000 and 1,200 times respectively, then e_1′ here is equal to (2,125 − 1,200) / (5,000 − 1,200) ≈ 0.243, where m = 2,125, min = 1,200, max = 5,000, a = 0.0, and b = 1.0. However, recording e_1 in the firewall requires equations (10) and (11) to smooth the data, since the recorded data may swing because of network attacks, user behaviors, network usage during rush hours, etc. The period for calculating data with the EMA method depends on the suitability for each organization. For this research, e_1 is recorded every hour per day, as in the following example. Given e_1 of R_1 for each hour of a day: 1300, 1500, 1200, 1300, 1400, 1500, 1800, 4500, 6000, 6300, 5500, 1000, 2400, 2800, 2600, 2600, 2400, 1900, 1500, 1200, 1000, 800, 700, 600 times, the EMA of e_1 can be calculated using the five hours in the past, seeded with the SMA of the 5th hour = (1300 + 1500 + 1200 + 1300 + 1400) / 5 = 1340. After calculating the EMA for the twenty-four hours as shown in Figure 3, the results are averaged with the SMA again to find the value of each day, which is recorded as e_1′ in Table 2.
In the case of ECR: it refers to documents or pieces of paper used to confirm that the rules were approved before being created. In this paper, for example, the evidence for creating rules is divided into four levels: there is no evidence of approval, a firewall administrator is the approver, the head of the department is the approver, or approval is made by the owner of the organization. The weight of the evidence is assigned according to the priority of the document approver: no evidence = 0, an administrator = 1, a head of department = 2, and an owner of the organization = 3. When the weight of the obtained document is calculated by the Min-Max equation (8), the result is e_2′. If the owner of the organization approved the creation of rule R_1, the result of the calculation is (3 − 0) / (3 − 0) = 1.0.
In the case of ERC: similar to the evidence for creating rules, the expertise in creating rules is also divided into four levels: newbie admins, normal admins with sufficient expertise, professional admins, and very expert administrators. Newbie admins are those who have recently been assigned to configure the firewall system and have the least experience. After configuring the firewall for a while, with at least 3-5 years of working experience, they become more proficient and are called normal admins. Those with substantial experience and training in firewall customization, with 5-10 years of experience, are professional administrators. Finally, those who have received extensive training and certificates about firewalls are considered very expert admins. Statistically, those who are highly skilled can analyze and design firewall rules so as to minimize errors. This paper therefore assigns the following expertise weights for e_3: newbie = 0, normal = 1, professional = 2, and very expert administrator = 3. If a professional admin created rule R_1, the result is calculated as (2 − 0) / (3 − 0) ≈ 0.667.
In the case of PRI: the protocols communicating on computer networks are always prioritized; for example, video conferencing must be smooth throughout the meeting, whereas sending electronic mail does not need to be sent or received immediately. Protocol prioritization can be done depending on the policies of each organization. In this research, prioritization of the protocols is based on the 3GPP QoS Class Identifier (QCI) categories [20], where IP multimedia signalling has the highest priority (1 = highest) and chat, FTP, and P2P have the lowest priority (9 = lowest). From e_4 in Table 2, the rule concerns a teleconference application with a priority of 4. When processed into probability form using the Min-Max scaling, the result is (6 − 1) / (9 − 1) = 0.625, where m = 6 (teleconference = 4, reversed), min = 1, max = 9, a = 0.0, and b = 1.0. Notice that the calculated protocol priorities must always be reversed, i.e., mapped from 9 to 1 and from 1 to 9; for example, the priority of e_4 is reversed from 4 to 6. Last, Tables 3 and 4 represent examples of firewall rules containing all the anomalies previously mentioned, and these rules are processed in the next step. When the extra fields of each rule pass through the data preparation process, they produce the results e_k → e_k′.
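The four normalizations of this step can all be reproduced with equation (8); the values below follow the paper's R_1 example, and the reversal 10 − priority for PRI is our reading of the reversed 1..9 scale:

```python
# Sketch of Step 1's extra-field normalization for R1, using the paper's
# example values: FPM = 2,125 in [1,200, 5,000]; ECR = 3 (owner) of [0, 3];
# ERC = 2 (professional) of [0, 3]; PRI = 4, reversed to 6 on the 1..9 scale.
def min_max(m, lo, hi, a=0.0, b=1.0):
    return a + (m - lo) * (b - a) / (hi - lo)

e1 = min_max(2125, 1200, 5000)    # FPM
e2 = min_max(3, 0, 3)             # ECR: approved by the owner
e3 = min_max(2, 0, 3)             # ERC: professional admin
e4 = min_max((9 + 1) - 4, 1, 9)   # PRI: priority 4 reversed to 6 (assumed 10 - p)
print([round(e, 3) for e in (e1, e2, e3, e4)])  # [0.243, 1.0, 0.667, 0.625]
```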

Analyzing and Detecting Anomalies (Step 2)
In this phase, the rules from the first step are used to build a tree structure, called the Path Selection Tree (PST), to analyze the anomalies. The algorithm begins with the creation of the root node of the PST. After that, field f_1 of the first rule is created as the first node on the tree, namely S_1, as shown in Figure 4(a). This node records the source IP addresses of R_1 as <R_1:[1, 100]>, where f_1 ∈ [1, 100]. The next node (D_1) stores the range of destination IP addresses (f_2) of R_1, ranging from 1 to 100. Next is the node that records the source port range from 0 to 65535, called SP_1. The next node, DP_1, contains the group of destination ports f_4 between 80 and 85 (<R_1:[80, 85]>). The final field f_5 of R_1 is stored in the protocol node P_1, and the decision of R_1 terminates the path. In the next order, the second rule R_2 is imported into the PST, as illustrated in Figure 4(b). Firstly, f_1 of R_2 ⊂ f_1 of R_1, thus R_2(f_1) uses the same route as R_1(f_1) and also records <R_2:[10, 50]> into the S_1 node. Likewise, R_2(f_2) ⊂ R_1(f_2), so it is also recorded in the same node (D_1 = <R_1:[1, 100], R_2:[20, 60]>) and travels over the same route as R_1. Similarly for R_2(f_3): it is equal to R_1(f_3), hence R_2(f_3) is appended to the SP_1 node as <R_1, R_2:[0, 65535]>. In the case of R_2(f_4) and R_1(f_4), R_2(f_4) is a subset of R_1(f_4), so the data of DP_1 is updated to <R_1:[80, 85], R_2:[80, 80]>, and likewise P_1 is updated to <R_1, R_2:{6, 17}>. On the other hand, the decisions of R_1 and R_2 are not the same, so their decision paths must be separated from each other, where <R_1> = 1 and <R_2> = 0. Inserting R_3 into the PST (Figure 4(c)) is not much different from inserting R_2; it differs slightly in the position of the protocol level in the tree. Since R_3(f_4) is a superset of R_1(f_4) and R_2(f_4), some destination ports of R_3(f_4) have to be separated into another node of the tree, namely DP_2, which stores the destination ports ranging from 86 to 90 (R_3(f_4) − R_1(f_4)), i.e., <R_3:[86, 90]>.
The remaining destination ports are combined with DP_1 in the first path together with R_1 and R_2 as <R_1, R_3:[80, 85], R_2:[80, 80]>. The decision of R_3 is deny in both paths, where <R_3> = 0. The remaining firewall rules (R_4, R_5, R_6) are processed like the previous rules (R_1, R_2, R_3). Once all rules have been inserted into the PST, the result is as shown in Figure 5.
In the process of checking for rule anomalies, the algorithm uses the information recorded on each node to detect anomalies by taking the Cartesian product of all the nodes separated from the protocol layer (P) and looking back from the bottom to the root. Each resulting group of rules is then checked against the anomaly theorems (equations (5)-(7)) to find out what kind of anomalies they contain. For example, in equation (18) of group 1, (R_1, R_5) has the same decision (decision = 1), so it is examined by equation (6); the result of the examination is a redundancy anomaly. As a further example, equation (19) consists of (R_2, R_3), (R_2, R_4), (R_2, R_6), (R_3, R_4), (R_3, R_6), and (R_4, R_6); since every pair of rules has the same decision, all are examined by equation (6). Pairs with different decisions are examined by equation (5): (R_3, R_5) = generalization (5), (R_5, R_6) = generalization (5).
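The pairwise check that each PST path enables can be sketched as follows. The rule encoding and range values are our own simplification of the anomaly theorems of Section 2, not the paper's implementation:

```python
# Hedged sketch of the pairwise anomaly check performed per PST path.
# A rule is a condition box (one (lo, hi) interval per field) plus a decision.
def relation(a, b):
    """Relation of condition box a to box b: subset/superset/overlap/disjoint."""
    if all(x[0] >= y[0] and x[1] <= y[1] for x, y in zip(a, b)):
        return "subset"
    if all(x[0] <= y[0] and x[1] >= y[1] for x, y in zip(a, b)):
        return "superset"
    if all(max(x[0], y[0]) <= min(x[1], y[1]) for x, y in zip(a, b)):
        return "overlap"
    return "disjoint"

def classify(a, b):
    """a precedes b in the rule order; return the anomaly type, if any."""
    rel = relation(b["cond"], a["cond"])
    if rel == "disjoint":
        return None
    if a["dec"] == b["dec"]:
        return "redundancy"
    if rel == "subset":
        return "shadowing"       # later rule fully covered, different decision
    if rel == "superset":
        return "generalization"  # later rule is more general
    return "correlation"

r1 = {"cond": [(80, 85), (6, 6)], "dec": 1}
r2 = {"cond": [(80, 80), (6, 6)], "dec": 0}
print(classify(r1, r2))  # r2 is a subset of r1 with a different decision -> "shadowing"
```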
Loss of rule meaning always arises from redundant rules; thus all members of equations (18), (19), and (21) may exhibit semantics loss as well.

Calculating Probability of Each Path of PST (Step 3)
The PST obtained from the previous steps is used to calculate the probability of each path, in order to advise administrators in making decisions about optimizing firewall rules effectively, with the following steps. Let R be a firewall rule, e an attribute field of a rule, and S the sample space; then the conditional probability of R given e is P(R | e) = P(R ∩ e) / P(e), as in equation (24) and shown in Figure 6.
P(e) = Σ_{i=1}^{n} P(e | R_i) P(R_i) (28)

where e is any property considered when computing P(R) on the firewall. Finally, we can substitute this into Bayes' rule (equation (9)) from above to obtain an alternative version of Bayes' rule, which is used heavily in Bayesian inference. From the examples of the property (extra) fields in Table 4, there are four fields per rule, and the probabilities P(R_1′) through P(R_6′) are computed case by case.
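A hedged sketch of how per-rule scores might be normalized into path probabilities: the sums of equation (17) are divided by their total so that the scores over a conflicting path behave like posteriors. The exact weighting in the paper's implementation may differ, and R_2's field values below are hypothetical:

```python
# Sketch of Step 3: normalize each rule's summed field probabilities (Eq. 17)
# over the rules of one conflicting path, then advise on the strongest rule.
rules = {
    "R1": [0.243, 1.0, 0.667, 0.625],   # e1'..e4' for R1 (from the earlier example)
    "R2": [0.320, 0.667, 1.0, 0.625],   # hypothetical values for R2
}
totals = {r: sum(e) for r, e in rules.items()}
z = sum(totals.values())
prob = {r: t / z for r, t in totals.items()}
best = max(prob, key=prob.get)
print(best)  # the rule with the higher probability is considered first
```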

Optimizing Rule Anomalies (Last step)
Anomalies occurring in firewall rules have different solutions; for example, the redundancy anomaly is solved by merging the rules together. However, this method may result in semantics loss instead. Other anomalies, such as shadowing, correlation, and generalization, should not use the merging method because their decisions are different. Sometimes administrators choose to resolve problems by swapping rules, but they are not sure what will happen in the future. Therefore, this research uses the probability calculated for each rule to help administrators decide how to proceed with anomalies so as to achieve maximum efficiency and reasonableness. For example, on path number 1 of Figure 5, R_1 and R_5 are redundant. If admins decide to combine the two rules, the result is a new merged rule R_1′: given the property fields of R_1 (e_1 = 2500, e_2 = 1, e_3 = 2, e_4 = 6), the merging operator for rules with the same decision keeps the maximum value of each field via a max() function. In the same way, (R_2, R_3), (R_2, R_4), ..., and (R_3, R_6), which are redundancy conflicts, can be resolved by combining rules like (R_1, R_5). The methods for resolving the remaining anomalies (shadowing, correlation, and generalization) can be carried out in three ways: merging, swapping, and removing rules. Nevertheless, admins must be highly skilled and aware of the consequences; almost all researchers do not recommend using these methods and often push the burden onto the discretion of administrators instead. If the admins choose one of the three methods, they can do so by checking the probabilities of each rule: the rule with the highest probability offers the best opportunity to resolve the anomaly effectively. For example, (R_1, R_2) is a shadowing anomaly. If admins need to delete, merge, or swap rules, they should give priority to R_2 rather than R_1, because R_2 has a higher probability (P(R_1) = 0.157, P(R_2) = 0.194), as shown in Figure 9.
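Merging two redundant rules as described, spanning both conditions and keeping the maximum of each property field, can be sketched as follows (R_1's property values come from the example above; R_5's values and both condition ranges are hypothetical):

```python
# Hedged sketch of merging two redundant rules (same decision): the merged
# rule spans both conditions and keeps the maximum of each property field.
def merge_redundant(a, b):
    assert a["dec"] == b["dec"], "merge applies only to same-decision rules"
    cond = [(min(x[0], y[0]), max(x[1], y[1]))
            for x, y in zip(a["cond"], b["cond"])]
    props = [max(p, q) for p, q in zip(a["props"], b["props"])]
    return {"cond": cond, "dec": a["dec"], "props": props}

r1 = {"cond": [(80, 85)], "dec": 1, "props": [2500, 1, 2, 6]}
r5 = {"cond": [(80, 90)], "dec": 1, "props": [1800, 3, 1, 6]}  # hypothetical
merged = merge_redundant(r1, r5)
print(merged["props"])  # [2500, 3, 2, 6]
```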
Updating the properties of R_1 and R_2 is not necessary in the case of swapping or deleting rules, but in the case of merging, the following details apply.

Implementation and Performance Evaluation
The PST is implemented with the k-ary tree structure (also known as an m-ary or k-way tree), so the processing speed to build the tree is O(n), where n is the number of nodes of the given k-ary tree. The number of levels of the k-ary tree is L, and the depth of the k-ary tree in the worst case is N − 1, where N is the number of nodes in the tree. The k-ary tree can also be stored in breadth-first order as an implicit data structure; in a pointer-based representation, each node has an internal array for storing pointers to each of its children. So, the space complexity of the k-ary tree structure is O(k × n). Traversing the k-ary tree is very similar to binary tree traversal, and the worst-case traversal time complexity is O(n). In practice, the PST was developed on an Intel Core i7 2.3 GHz with 8 GB RAM. The software used includes Python version 3.7 (64-bit), Graphviz [21], and NetworkX version 1.11 [22], running on the Linux kernel (version 4.4). The proposed model is illustrated in Figure 10. In this paper, we used x̄ (equation (13)) to evaluate the satisfaction of the firewall administrators in resolving firewall rule anomalies with both the traditional firewall (no recommendation system) and our proposed firewall (recommendation system with probability). The confidence test consists of ten scenarios for resolving anomalies, and the total number of testers (firewall experts) is five, as shown in Table 5.
Referring to Table 5, the average (x̄) confidence of the five administrators in resolving the ten rule anomaly scenarios based on their own skills with the traditional firewall is 2.68, whereas the average confidence with our proposed firewall, whose recommendation system is based on probability, is 4.16; thus the confidence rate of the proposed firewall increased by 29.6% over the conventional firewall. To evaluate the reliability between raters, we applied the Kappa statistic [19] of equation (14) to the data in Table 5. The inter-rater reliability for the conventional firewall is 0.379, which corresponds to fair agreement, as shown in Table 6. The inter-rater reliability for the proposed firewall is 0.510 (moderate agreement), meaning that the reliability increased significantly, from 37.9% to 51%. Remarks: the level of satisfaction is divided into five levels (5, 4, 3, 2, and 1), with 5 the highest confidence.

Conclusion
In practice, fixing firewall rule anomalies is quite complex and depends on the administrator's perspective and experience; correcting one mistake may lead to other anomalies. For example, resolving a redundancy anomaly may introduce semantics loss instead. In order to reduce the impact of errors and help administrators resolve anomalies, this paper has designed and developed a system to assist administrators' decision-making by using probability together with four additional features of rules: the frequency of matching between packets and rules, the evidence of creating rules, the expertise of the rule creator, and the protocol priority. For each rule, the probability is calculated from these features. If the probability of a rule is high, the rule has a high priority, and when rules in the firewall conflict, the rule with the higher probability value is always considered first. As a result of system testing, administrators can make more accurate decisions about conflicting rules in the firewall. For the overall efficiency of the system, the time complexity of creating the PST is O(n), the searching time over the PST is O(n), and the space complexity is O(k × n). However, the system still has a limitation regarding the establishment of the tree structure: whenever any rule anomaly is resolved, the whole PST must be reconstructed. In the evaluation of confidence in resolving firewall rule anomalies, the firewall we have proposed on the basis of probability obtains a confidence value 29.6% higher than the traditional firewall, and the inter-rater reliability of the proposed firewall is at moderate agreement (0.51), an increase of 13.1% over the traditional firewall.