Logic Error Detection System based on Structure Pattern and Error Degree

The importance of programming skills has increased with advances in information and communication technology (ICT). However, the di ﬃ culty of learning programming is a major problem for novices. Therefore, we propose a logic error detection algorithm based on structure patterns, which are an index of similarity based on abstract syntax trees, and error degree, which is a measure of appropriateness for feedback. We deﬁne structure patterns and error degree and present the proposed algorithm. In addition, we develop a Logic Error Detector (LED) Application Programming Interface (API) based on the proposed algorithm. An implementation of the proposed algorithm is used in experiments using actual data from an e-learning system. The results show that the proposed algorithm can accurately detect logic errors in many programs solving problems in the Introduction to Programming set.


Introduction
The Fourth Industrial Revolution has increased the demand for engineers [1] [2] [3]. Until recently, computer science was learned by university students, often in engineering fields. Because of a shortage of engineers in many countries, attempts have been undertaken to introduce computer science into secondary school education.
A study suggested that the high school ICT curriculum should focus on computer science [4]. A report by the Royal Society of the United Kingdom emphasized the importance of computer science skills in primary education [5]. The report pointed out problems in ICT education, such as those in class content and the loss of opportunities for students. In addition, it proposed to redefine ICT into three areas, namely computer science, information technology, and digital literacy, among which computer science is a particularly important area. The movement to implement programming education is accelerating globally.
For introducing programming in school education, many problems have been identified and various teaching methods have been proposed [6] [7]. A report summarizing three years of compulsory programming education in the UK published in 2014 found that it was patchy and fragile [8]. The report briefly discussed the difficulty of programming education.
A student's inherent interest in computer programming affects learning. Learners with an interest in programming can effectively learn in large groups with a single teacher, as is done in the university. Such learners can solve problems by investigating them or asking questions. However, traditional teaching methods may be unsuitable for novices and learners who are not interested in programming. One of the most effective ways for novices to learn is one-on-one learning with a teacher. This is especially true for programming because programs that accomplish a given task can be written in various ways. Teachers must thus strictly evaluate the program written by a learner.
Programming includes complex language constructs and requires logical thinking, similar to that used in mathematics. Programmers must consider algorithms, memory space, execution time, and code size. However, exhaustive and quantitative evaluation of programs by a human teacher is difficult. In addition, programs may not work as expecting due to various bugs. It is difficult even for experienced teachers to deal with every bug.
To improve this situation, it is necessary to assign a knowledgeable teacher to several novice programmers. However, this is not possible in classes that introduce programming due to a lack of human resources. To solve this problem, methods for estimating the difficulty of programming problems using cluster analysis and problem recommendation algorithms have been proposed [9] [10]. In addition, attempts have been made to identify early crisis situations experienced by novice programmers using clustering [11]. However, many studies considered only optimizing human resources and presenting a course to users. The quality of teaching still depends on the actual teacher. To solve this problem comprehensively, an intellectual tutoring system is required.
Online judge systems are environments in which programming can be learned independently [12]. An online judge is an online execution environment that executes and verifies source code submitted by users. Users can improve their programming skills by solving problems in various categories (e.g., syntax, semantics, algorithms, data structures). In addition to automatic verification, an online judge provides a quantitative performance evaluation (execution time, memory usage, etc.) of the program. Although online judges satisfy certain requirements of an autonomous learning environment, it is extremely difficult for novice programmers to continue learning using online judges because they do not provide concrete feedback. Originally, online judges were created for programming contests, in which a solution is either accepted or rejected. Thus, programmers can only recognize that their program has an error, not which type of error (e.g., logic error, compile error, run-time error). For such a program, which is "executable for any input", but "does not satisfy the specifications of the problem", the judgment does not provide useful support for the user. Therefore, novice programmers tend to give up if their logic errors cannot be debugged.
There are various approaches to programming education, including methods that use program execution traces and program dependence graphs. Here, we propose a logic error detection algorithm based on structure patterns, error degree, and abstract syntax trees. A structure pattern is a list generated from the corresponding abstract syntax tree and is an index of similarity at the structure level. An error degree is a value generated from a result of the detection algorithm and is an index of similarity at the characteristic level. The goal of the proposed algorithm is to provide optimal and personalized feedback to individuals. The proposed algorithm provides personalized feedback based on filtering using structure patterns, error degree, and the characteristics of abstract syntax trees.
In this paper, an algorithm for detecting logic errors and the corresponding API, which can be used to construct intelligent coding editors, are presented. The results of quantitative and qualitative experiments are shown. The quantitative experiments are used to verify the detection range of logic errors. The results show that the proposed algorithm can detect logic errors in many cases. The qualitative experiments are used to determine the detection accuracy and evaluate the appropriateness of feedback based on the detection results. The proposed algorithm is shown to detect logic errors and provide appropriate feedback in many cases. In some cases, generated feedback was inappropriate for learning support. This paper is an extension of the work originally presented in "2018 9th International Conference on Awareness Science and Technology (iCAST)" titled "Logic Error Detection Algorithm for Novice Programmers based on Structure Pattern and Error Degree" [13].
The remainder of this paper is organized as follows. Section 2 presents related research. Section 3 describes the logic error detection algorithm. Section 4 describes the LED API. Section 5 presents experiments that use data from an online judge system. Section 6 shows and discusses the experimental results and suggests possible improvements. Finally, Section 7 summarizes the main conclusions.

Related Research
The goal of the present study is to construct an autonomous learning environment for novice programmers. However, autonomous learning of programming has not been clearly defined.
Learner autonomy has been mainly considered in the field of foreign language education by researchers such as Holec [14] [15]. Dickinson proposed the concept of autonomous learning, where learners do not rely on teachers, but are themselves responsible for the results, and make all decisions related to learning [16]. Therefore, autonomous learning can be broadly defined as self-learning controlled by the learner.
This definition of autonomous learning assumes that the learning environment is appropriate. However, there is a shortage of teachers in programming education. In the present study, autonomous learning of programming is defined as autonomous learning in an environment without an actual teacher.
Thus far, a number of approaches to support novice programmers have been proposed. In this paper, we present methods that have different targets and structures, and require different educational resources.
Wu et al. designed practice teaching and assessment methods around an online judge system and proposed resolution methods for existing problems in hybrid learning [17]. Semi-automatic evaluation methods that focus on reducing the burden on teachers have also been proposed [18]. In this approach, students are grouped according to the structure of their source code submitted to the online judge, and guidance is provided for each group. Using a network of online judge users, a hybrid method that allows other users to be asked for guidance has been proposed [19]. This approach uses a clustering result based on the similarity of source code. In addition, a method that uses solutions and a language model based on long short-term memory (LSTM) networks for bug detection has been proposed [20].
Fault localization with program execution traces is a well-known technique for detecting logic errors. Using this concept, Su et al. constructed a Moodle-based online learning environment to provide feedback for logic errors [21]. In a study related to this technique, the accuracy of logic error detection using only 20% of the source code and the effectiveness of feedback using a lightweight spectrumbased algorithm were verified [22].
There are approaches that target specific logic errors to support novices. These approaches focus primarily on conditional branches and detect common logic errors, such as careless mistakes in conditional expressions, priority violations, and infinite loops [23] [24].
Targeted at advanced novice programmers, FrenchPress [25] is an eclipse plugin that focuses on programming style and supports programmers who have not yet assimilated the object-oriented paradigm.
In the present study, logic errors are detected by comparing a source code that may include logic errors with correct source codes stored in an online judge system. A comparison of source code is generally performed to detect code clones (i.e., similar or identical fragments of code) [26] [27] [28]. The following four types of method are commonly used to detect code clones.
Text-based methods directly compare source code as text. This type of comparison includes a lot of unique information, such as www.astesj.com 2 variable names. Token-based methods compare tokens obtained by a lexical analysis of source code. Since this type of comparison is not affected by unique information, such as variable names, it has relatively high accuracy. Tree-based methods compare code based on abstract syntax trees obtained by a syntax analyzer. Since parts unrelated to the meaning of the program are omitted from the syntax tree, the structural information of the source code is used for comparison. Semantic-based methods compare code using various information obtained from a parser. A program dependence graph is primarily used. It is possible to detect discontinuous code clones, which are difficult to find using other methods. The results obtained from text or token comparison are not necessarily consistent with the structure of the source code. Therefore, it is difficult to generate accurate feedback when using text-and token-based methods for detecting logic errors. Since the program dependency graph used in semantic-based approaches has information on data dependency and control dependency, it is possible to perform a comparison based on semantics rather than the description of source code. However, in the isomorphism judgment of graphs, although the semantics of the programs are the same, it is considered difficult to detect logic errors that differ only in internal values. A comparison using abstract syntax trees is based on the structural information of source code and thus can be performed at the syntax level, considering for example if statements or for statements. Therefore, this method is considered to be suitable for detecting logic errors by a feedback algorithm.

Logic Error Detection Algorithm
In the present study, two kinds of source code, namely "wrong code" and "correct code", are used to demonstrate the proposed algorithm. The wrong code is code submitted to an online judge and judged as "Wrong Answer". It is the target for detecting logic errors by the proposed algorithm. The correct code is code judged as "Accepted" and stored in the online judge system. It is the comparison target for detecting logic errors by the proposed algorithm.

Logic Error
A logic error is a bug in a program that triggers erroneous behavior but does not cause abnormal termination. In other words, logic errors induce behavior not intended by the programmer. Programs that contain logic errors are valid programs that can be compiled and executed. To debug a logic error, the user must determine the cause of the error by tracing variable information during execution using a number of test cases. Table 1 shows some common examples of logic error.

General Approach of Proposed Algorithm
In the present study, we propose a logic error detection algorithm based on a comparison between wrong and correct code. The proposed algorithm uses a comparison method based on an abstract syntax tree, which is a data structure in which the grammatical structure of source code is represented by finite labeled oriented trees. An abstract syntax tree is recursively defined by a minimum unit, called a syntax, generated by a parser. For example, if and for statements are recursively defined by a number of constructs. In the present study, a construct that is recursively defined by other constructs is denoted as a nonterminal construct, and a construct that is not recursively defined by other constructs is denoted as a terminal construct. Examples of nonterminal constructs are if statements and conditional expressions, and examples of terminal constructs are character strings and numeric values. Specifically, the algorithm compares abstract syntax trees of the target source code by applying the following steps recursively.
1. If the information on terminal constructs of the same kind is not different, it is not judged as a logic error.
2. If the information on terminal constructs of the same kind is different, it is judged as a logic error.
3. If the constructs are the same kind of nonterminal construct, they are compared to each other. 4. In other cases, a logic error is judged to have occurred.
The algorithm detects constructs that are logic errors. Algorithm 1 shows the pseudocode of the logic error detection algorithm. The proposed algorithm can compare compilable source code. However, because of the characteristics of abstract syntax trees, a comparison of source code with different tree shapes generates detection results that are unsuitable for detecting logic errors. In addition, even if the shapes of the trees are the same, a comparison of code with different node meanings will generate incorrect detection results. Therefore, the proposed algorithm has two phases.
In the first phase, an appropriate comparison target is selected. The proposed algorithm compares only two source codes and detects logic errors. However, the user may attempt to solve a problem using various approaches. For example, a loop can be implemented using a for statement or a while statement. If the detection algorithm is applied to the source code of programs that have quite different structures, incorrect results will be obtained. Figures 1 and 2 show examples of a comparison of source code that uses different algorithms. A suitable comparison target is required to detect logic errors. www.astesj.com

Algorithm 1 Logic Error Detection
Require: x = expression in wrong code, x n = n-th expression of x y = expression in correct code, y n = n-th expression of y Ensure: E = an array of Logic Error : Array 1: function Detection(x, y) Connect E and Detection(x i , y i ) together Step 3   13 Step 4 18: end function In the second phase, the best detection result is selected. The proposed algorithm generates feedback for the user based on a comparison of the source code. However, the algorithm that the user employs to solve a problem is independent of the source code structure. For example, how conditional branches and libraries are used depends on the user's characteristics. If the detection algorithm is applied to the source code of programs with different algorithms, we may provide feedback while ignoring the characteristics of the corresponding user. Figures 1 and 3 show examples of a comparison of source code that uses different loop constructions. Therefore, we need to give feedback based on correct codes that has the same characteristics as those of the target codes.

Structure Pattern
In the present study, structure patterns are used in the first phase. A structure pattern is a list generated from the corresponding abstract syntax tree and is an index of similarity at the structure level. Moreover, it is core data obtained by a depth-first search of the abstract syntax tree and includes a list of expressions and block statements. The generated list contains expressions such as a variable declaration and assignment, and control structures such as for and if statements. Therefore, a structure pattern can be considered to express the general form of source code. Figure 4 shows a sample source code. Figure 5 shows an example of an abstract syntax tree parsed from the sample source code. Table 2 shows an example of a structure pattern generated from the sample source code.  Programs with the same structure patterns have the same order and kind of constructs. Therefore, we can obtain appropriate comparison targets by filtering correct code using the structure pattern. For example, the structure patterns for the source code in Figures  1 and 2 are shown in Tables 3 and 4, respectively. As shown, appropriate correct codes for comparison can be obtained by filtering according to the structure pattern.

Error Degree
In the present study, the error degree is used in the second phase. An error degree is a value generated from a result of the detection algorithm and is an index of similarity at the characteristic level. Moreover, it is a total value obtained by assigning a weight according to the type of detected logic error and can be expressed as (1).
x = the number of logic errors w = weight of logic error n = the number of logic error types The error degree is zero when the wrong code is compared with itself. An increase in error degree indicates that the source code has characteristics different from those of the wrong code. Therefore, we can obtain optimal feedback by comparing the wrong code with all versions of correct code and by filtering the detection results using the error degree. Table 5 shows the weighting rules. The weight depends on the logic error type in the source code. For example, errors at the element level, such as terminal constructs www.astesj.com 5 and operators, have lower weight, whereas errors at the syntax level, such as the type of syntax and its presence in the code, have higher weight. Errors related to the number of constructs are weighted in proportion to this number. The error degree compensates for the weaknesses of filtering by structure pattern. For example, Figures 6 and 7 show the results of logic error detection for the source code of programs with the same structure patterns presented in Figure 1 and Figure 3. As shown in these figures, more accurate feedback can be obtained by calculating the error degree from the detection result.

Extended Detection Algorithm
The purpose of the proposed algorithm is to support autonomous learning by detecting logic errors in a wrong code and providing the corresponding feedback. The detection algorithm provides feedback through the following steps.
1. Analyze the judged source code and generate a structure pattern.
2. From the data stored in the online judge system, extract correct codes with the same structure pattern.
3. Compare all of the extracted correct codes with the wrong code, and calculate the error degree based on the detection results.
4. Provide information obtained based on the detection result with the lowest error degree as feedback for the user.
Algorithm 2 shows the pseudocode of the extended logic error detection algorithm.
4 Implementation of Algorithm API Section 3 described the proposed logic error detection algorithm for general online judge systems. In the present study, we also developed an LED web API for the proposed algorithm that is compatible with Aizu Online Judge (AOJ) [29].

Aizu Online Judge
AOJ is an online judge system developed and operated by the University of Aizu. An online judge system is a service that compiles and executes programs submitted by users and automatically evaluates their accuracy and performance using various test cases and verification machines. At present, AOJ contains source code from 60,000 users and 4 million judgments. AOJ contains various problems, such as those used in programming contests (e.g., ACM-ICPC) and those specific to a certain topic, including Introduction to Programming (ITP), "Algorithm and Data Structures", and related libraries. For these problems, limit values are set for CPU time and memory usage, and the user can practice by solving the problem under these limitations.
The user obtains a verdict by submitting the answer code for each question to the judge system. The judge system instantly provides feedback (accepted/rejected) based on various test cases for the problem, together with the CPU time and memory usage. The type of failure (Time Limit Exceeded, Memory Limit Exceeded, Runtime Error, etc.) is also given in the feedback. Users can debug their code based on the feedback and resubmit code as many times as desired. In the present study, the aim is to detect logic errors, so we only cover programs corresponding to "Accepted" and "Wrong Answer".
AOJ has an API for accessing a part of its database. The API can be used for developing various tools. For example, there are submitted source code, submission history including log of trial and error.

Logic Error Detector API
In the present study, we developed an LED that uses source codes stored in AOJ. Furthermore, by implementing peripheral functions, we developed as a web API for the LED.

Algorithm 2 Extended Logic Error Detection
Require: X = submitted code Y = array of accepted programs, Y n = n-th accepted program Ensure: F = Feedback 1: function Extended Detection(X) Step 1 2: R = empty array 3: p attern ⇐ structure pattern of X Step 2 4: Y ⇐ all the correct codes submitted for a given problem as X

5:
Y ⇐ Y that have the same structure pattern as p attern Step 3 6: for i = 1 to number of Y do return F ⇐ detection result with the lowest error degree in R Step 5 10: end function From the viewpoint of data availability and practicality, source codes in Java language are analyzed and utilized for the LED API. There are three main reasons for selecting Java. Firstly, it is often adopted as an introduction to object-oriented programming languages for novice programmers. Secondly, it is the second most frequently used language in AOJ. The use of stored data makes it possible to detect logic errors more accurately, so there must be a sufficient number of comparison objects. Thirdly, an open-source library called JavaParser can be used to convert source code written in Java into an abstract syntax tree.
The LED API is implemented using docker containers that serve as a web server, an application server, and a database server. Figure  8 shows an architecture diagram of the LED API.
The web container, based on Nginx, is responsible for front-end processing and acts as a reverse proxy. At the front-end, we implemented tools for research and simple editors using Vue.js and trial of API is possible. The reverse proxy controls access to the application container. Only the web container is exposed to the outside world via the API to prevent attacks on the application and database containers.
The application container is responsible for the back-end processing by the LED, which is the application that implements the proposed algorithm. Since only the host can connect to the database container, it can be used together with the reverse proxy of the web container to prevent attacks on the database container.
LED is a Java application developed using Spring Boot based on Spring Framework. MySQL and Hibernate, as an object-relational mapper, are adopted. JavaParser is used for parsing source code written in Java. Figure 9 shows a class diagram of LED. The SourceCode class is used to represent source codes submitted to and stored in AOJ. It has a many-one relation with the Problem, Status, and StructurePattern classes. It includes information on the ID of the user who submitted the code, the ID of the judge, and the submitted code. The StructurePattern class represents the structure pattern of source code for each program. It has an one-to-many relationship with the SourceCode and PatternComponent classes.

Class Composition
The PatternComponent class is used to represent the type and order of syntax in the structure pattern. It has many-to-one and composition relationships with the StructurePattern class. This is because the structure pattern includes the type and order of syntax, and there are cases where the source code of multiple programs has the same structure patterns, but the pattern component completely depends on the structure pattern.
The ErrorReport class is used to represent the comparison result www.astesj.com Figure 9: LED class diagram of the logic error detection algorithm and is also a feedback class of the LED API. It has a many-to-one relationship with the Source-Code and Problem classes. It has many-to-one and composition relationships with the ErrorComponent class. It includes information on wrong codes in which logic errors were detected and correct codes used to generate the detection result selected by filtering using the error degree. The ErrorComponent class is used to represent details of the detected logic error. It has the location, content, class of syntax, and error degree information of the logic error detected in the wrong code. Figure 10 shows the relationship between classes and objects. The flow of logic error detection using the LED API is as follows.

The user submits a source code that is judged as "Wrong
Answer".
2. The wrong code is analyzed and its structure pattern is generated.
3. The structure pattern of the wrong code is searched for in the database.
4. Correct codes with the same structure pattern as that of the wrong code are searched for in the database and used as candidate codes for detection.

5.
Using LED, all candidate codes are respectively compared to the wrong code and an error report is generated.
6. The error report with the lowest error degree is provided to the user as feedback. Figure 10: LED sequence diagram www.astesj.com 8 We conducted quantitative and qualitative experiments using the LED API and AOJ. The LED API is first used to detect logic errors in wrong code submitted by a user who has never solved the problem. Therefore, in this experiment, source codes judged to be "Wrong answer" in AOJ were used as wrong codes.

Basic Dataset from AOJ
AOJ has a problem set for novices called ITP. In this experiment, all problems in ITP were used. The dataset consists of source code that satisfies the following conditions.
• Submitted to solve the target problems.
• Written in Java.
• Consists only of the main method.
• Considered not to be created by the same user. Tables 6 and 7 show the details of the dataset. "Problem Id" is the problem id of the experimental target. "Usable Programs" is the number of available programs. "Structure Pattern" is the number of structure patterns. "Common Pattern Programs" is the number of programs with the most common structure pattern.

Experiment 1: Quantitative Method
This subsection describes experiments conducted using quantitative methods.
AOJ stores all the source codes submitted by users. Therefore, there are correct and incorrect answers to certain problems submitted by a given user. By comparing correct codes and the wrong code submitted by a given user, it is possible to obtain reliable feedback F necessary for debugging. From the inclusion relation of feedback F and feedback F' obtained using the LED API, we measure what portion of logic errors can be detected by the proposed algorithm.
In this experiment, we evaluated whether the following statements are true: • The LED API can be used to detect all logic errors.
• The LED API can be used to detect some logic errors.
For evaluating the first statement, we measured whether F' completely includes F. The code debugged based on information provided by feedback F has already been accepted by AOJ. Therefore, when feedback F' completely includes feedback F, it can be regarded that all the logic errors were detected.
For evaluating the second statement, we measured whether F' contains some of F. During debugging, a user may modify not only the algorithm, but also variable names, the order of processing, etc. Therefore, when some of feedback F are included in feedback F', it can be regarded that some logic errors were detected.
The following procedures were applied to each target problem.
1. Apply the proposed algorithm to correct code and wrong code submitted by a given user and obtain feedback F.

2.
Generate feedback F' from the wrong code using the LED API.
3. Determine the inclusion relation of F and F'.
The dataset consists of source code that satisfies the following conditions.
• Correct code and wrong code that submitted by the same user.
• These two programs have the same structure patterns.

Experiment 2: Qualitative method
This subsection describes experiments conducted using qualitative methods.
In Experiment 1, the accuracy of logic error detection was investigated using a quantitative method. However, the experiment only measured the detection range, and it is not known whether the feedback is sufficient for debugging. Therefore, we manually debugged a code using feedback from the LED API and verified whether the wrong code was acceptable by AOJ. In this experiment, we evaluated whether the following statements are true: • The LED API can be used to correctly detect logic errors.
• The provided feedback is appropriate for supporting learning.
The following procedures were applied to each target problem.
1. Randomly select a wrong code in AOJ.

Detect logic errors using the LED API.
3. Debug the wrong code manually using feedback and verify whether the debugged code is acceptable by AOJ.
www.astesj.com 9 Print "Hello World" to standard output. ITP1_1_B Calculate the cube of a given integer x ITP1_1_C Calculate the area and perimeter of a given rectangle. ITP1_1_D Read an integer S [second] and convert it to h:m:s.

ITP1_2_A
Branch on Condition Print small/large/equal relation of given two integers a and b.

ITP1_2_B
Read three integers a, b and c, and print "Yes" if a < b < c, otherwise "No". ITP1_2_C Read integers and print them in ascending order. ITP1_2_D Read a rectangle and a circle, and determine whether the circle is arranged inside the rectangle. ITP1_3_A Repetitive Processing Print "Hello World" 1000 times. ITP1_3_B Read an integer x and print it as is for multiple test cases. ITP1_3_C Read two integers x and y, and print them in ascending order for multiple test cases. ITP1_3_D Read three integers a, b, and c, and print the number of divisors of c in [a, b]. ITP1_4_A Computation Read two integers a and b, and calculate a / b in different data types. ITP1_4_B Calculate the area and circumference of a circle for given radius r.

ITP1_4_C
Read two integers, a and b, and an operator op, and then print the value of a op b.

ITP1_4_D
Read a sequence of n integers a i (i=1,2,...n), and print the minimum value, maximum value, and sum of the sequence. ITP1_5_A Structured Program I Draw a rectangle whose height is H cm and width is W cm. ITP1_5_B Draw a frame whose height is H cm and width is W cm.

ITP1_5_C
Draw a chessboard whose height is H cm and width is W cm. ITP1_5_D Structured programming without goto statement. ITP1_6_A Array Read a sequence and print it in reverse order. ITP1_6_B For given a deck of cards, find missing cards. ITP1_6_C Count the number of elements in a three-dimensional array. ITP1_6_D Read a n×m matrix A and a m×1 vector b, and print their product Ab.

ITP1_7_A
Structured Program II Read a list of student scores and evaluate the grade for each student. ITP1_7_B Find the number of combinations of integers that satisfy 1) three distinct integers from 1 to n, and 2) sum of the integers is x.

ITP1_7_C
Read the number of rows r, columns c, and a table of r×c elements, and print a new table that includes the total sum for each row and column. ITP1_7_D Read a n×m matrix A and a m×l matrix B, and print their product. ITP1_8_A Character Convert uppercase/lowercase letters to lowercase/uppercase. ITP1_8_B Read an integer and print the sum of its digits. ITP1_8_C Count the number of appearances for each letter (ignoring case). ITP1_8_D Find a pattern p in ring-shaped text s. ITP1_9_A String Read a word W and text T, and print the number of times that word W appears in text T.

ITP1_9_B
Read a deck (a string) and shuffle it, and print the final state (a string).

ITP1_9_C
Read a sequence of Taro and Hanako cards and report the final scores of the game. ITP1_9_D Perform a sequence of commands including reverse and replace operations.

ITP1_10_B
For the given two sides of a triangle a and b and the angle C between them, calculate the area, circumference, and height of the triangle. ITP1_10_C Calculate the standard deviation. ITP1_10_D Read two n-dimensional vectors x and y, and calculate Minkowski's distance, where p=1,2,3, ∞.

ITP1_11_A
Structure and Class Simulate rolling a dice. ITP1_11_B Print the integer on the right side face of the rolled die. ITP1_11_C Read two dice and determine whether they are identical. ITP1_11_D Read n dice and determine whether they are all different.

Results and Discussion
In this section, we present and discuss the results of each experiment.
6.1 Results of Experiment 1 Table 8 shows the results of Experiment 1. "Problem Id" is the problem id of the experimental target. "All Users" is the number of users who submitted source code. "Usable Users" is the number www.astesj.com of users who submitted both correct and wrong code with the same structure patterns (and thus logic errors can be detected using the LED API). "Whole Success Users" is the number of users for whom all logic errors were detected using the LED API. In other words, this is the result of the first statement. "Partial Success Users" is the number of users for whom some logic errors were detected using the LED API. In other words, this is the result of the second statement. "Whole Success Ratio" is the ratio of Whole Success Users to Usable Users. "Partial Success Ratio" is the ratio of Partial Success Users to Usable Users. "Difference Success Ratio" is the difference between Whole Success Rate and Partial Success Rate. The first item is the complete detection rate of logic errors. The results of the experiment show that all logic errors were detected for many problems. Overall, we succeeded in detecting logic errors in more than 70% of programs.
The second item is the partial detection rate of logic errors. The results of the experiments show that a higher detection ratio for partial logic errors was achieved for many problems. Overall, we succeeded in detecting logic errors in more than 80% of programs. Table 9 shows the results of Experiment 2. The first item is the detection accuracy of logic errors. Considering the features of the evaluation method, it was considered that logic errors could not be correctly detected when the source code could be modified only by factors outside the feedback range. The results of the experiment show that we successfully detected logic errors in 100% of test cases. Therefore, the proposed detection algorithm has sufficient ability to detect logic errors in wrong code.

Results of Experiment 2
The second item is related to the appropriateness of feedback. The experimental results show that we succeeded in providing appropriate feedback in 80.4% of the test cases. This evaluation item is based on the qualitative method used to verify the appropriateness for supporting learning. Therefore, we should discuss the appropriateness of the judgment.

Discussion
In this subsection, we discuss the experimental results. Some ideas for improvement based on the experimental results are also given.

Discussion for Experiment 1
The results of Experiment 1 confirm that the proposed detection algorithm can support users who attempt to solve problems. We also confirmed that the detection algorithm can detect all the logic errors in a wrong code in many cases. However, results for statement 2 indicate that feedback may contain syntax that is unrelated to the algorithm. Therefore, it is necessary to improve detection accuracy by improving the detection algorithm.

Discussion for Experiment 2
The results of Experiment 2 confirm that logic errors can be detected by the proposed detection algorithm. In addition, we confirmed that the detection algorithm can provide appropriate feedback in many cases. Therefore, it is necessary to generalize some specific cases and to improve the algorithm.

Discussion for All Experiments
The results of the two experiments confirmed that the proposed algorithm can accurately detect logic errors and provide appropriate feedback. The proposed algorithm detects logic errors using source codes stored in the online judge system. To increase accuracy, we filter the codes using structure patterns and error degree. Therefore, detection accuracy depends on the diversity of source code stored in the online judge system. In the present study, although we only considered codes written in Java, the proposed algorithm can be applied to all context-free languages. There is more source code written in languages such as C and C ++, and thus logic errors can be detected with higher accuracy.

Improvements
In this subsubsection, we suggest some improvements based on the experimental results.
Based on the results of Experiment 1, we suggest the following two improvements. One improvement is the removal of elements not related to logic errors detected by the detection algorithm. For example, elements with the same meaning in different formats, such as variable names and logical expressions, can be removed because they are not related to logic errors. Another improvement is the use of a comparison method that is independent of the structure pattern (i.e., generalization of the algorithm). In Experiment 1, a rule regarding the structure pattern as a condition of the dataset was used. This rule was used because the LED API does not have a function for comparing source codes with different structure patterns. Therefore, we suggest generalizing the algorithm by unifying the variable declaration part and determining the format of the calculation formula. This generalization would make it possible to compare source codes with different structure patterns without changing the meaning of the algorithm.
Based on the results of Experiment 2, we suggest the following three improvements. The first improvement is a comparison method of conditional branching. The conditional branching algorithm strongly depends on the logical expression that is employed by the user. In particular, filtering by structure pattern is not effective for problems that require multiple standard outputs using conditional branching. Therefore, we suggest a comparison based on a logical expression and feedback regarding the processing order. The second improvement is a comparison method of binary expressions. Because binary expressions are parsed using certain rules, a comparison between binary expressions from different structures are not effective. Therefore, we suggest making the error degree more accurate by implementing special logic to compare binary expressions. The third improvement is a review of the weighting when the type of syntax is the same but the elements are different. As an example, method call expressions can be very different, even using the same kind of syntax.
www.astesj.com However, even with such a comparison, the error degree is a very small numerical value. In particular, standard output methods and various library methods are sometimes compared, so that optimal feedback cannot be generated in some cases. Therefore, we suggest measuring to what extent the syntax has the same meaning by using the namespace and arguments. We also suggest detecting the parent syntax itself as a logic error, such as when only variable names are detected as logic errors.

Conclusion
This study proposed a logic error detection algorithm based on structure patterns and error degree to support novice programmers. The proposed algorithm uses data in an online judge system to detect logic errors in wrong code and provides feedback for supporting problem solving. In addition, we developed the LED web API based on the proposed algorithm. The LED API is compatible with the online judge system, so it can promote autonomous learning with the corresponding environment. Two experiments were conducted to verified the accuracy of the proposed algorithm. The experimental results show that the proposed algorithm can accurately detect logic errors and provide appropriate feedback. The proposed algorithm detected logic errors in programs for all the problems in the Introduction to Programming 1 set with high accuracy. In addition, the detection accuracy depends on the amount of data in the online judge system.
Although we constructed the autonomous learning environment and detection algorithm in Java, the proposed algorithm can be applied to various programming languages, such as C/C++, Python, and Ruby. Although the presented LED API is oriented to detec logic errors, the algorithm is capable of suggesting possible solutions. So, our future works include the extension of the API after conducting further experiments.