Modeling an Energy Consumption System with Partial-Value Data Associations

Many existing system modeling techniques based on statistical modeling, data mining and machine learning share a shortcoming: they build variable relations for the full ranges of variable values using one model, although certain variable relations may hold for only some but not all variable values. The Partial-Value Association Discovery (PVAD) algorithm, a new multivariate analysis algorithm, overcomes this shortcoming by learning both full-value and partial-value relations of system variables from system data. Our research used the PVAD algorithm to model variable relations of energy consumption by learning full- and partial-value variable relations from data. The PVAD algorithm was applied to energy consumption data obtained from a building at Arizona State University (ASU). The full- and partial-value variable associations of building energy consumption from the PVAD algorithm are compared with variable relations from a decision tree algorithm applied to the same data to show the advantages of the PVAD algorithm in modeling the energy consumption system.


Introduction
Our research is an extension of work originally presented at the 2018 IEEE ICCAR Conference [1]. Many complex systems, such as energy consumption systems and transportation systems, involve both engineered and non-engineered system factors. For example, the energy consumption system of a building involves both engineered system factors (e.g., AC equipment, pumps for water use, lighting systems, computers, and network equipment) and non-engineered system factors (e.g., social/behavioral factors such as occupants' activities, and environmental/natural factors such as outside climate), which are intertwined to drive the energy consumption and demand of the building [2][3][4]. As another example, the transportation system involves both engineered system factors (e.g., the transportation infrastructure including highways, streets and roads, and traffic control mechanisms such as traffic lights) and non-engineered system factors (e.g., social/behavioral factors such as traffic flows, drivers, pedestrians, and car accidents, as well as natural/environmental factors such as weather conditions).
Although models of engineered systems may be available, models of mixed-factor systems are usually not available because the interconnectivities and interdependencies of many engineered and non-engineered system factors are unknown. A complete, accurate system model, which clearly defines relations of system variables including interconnectivities and interdependencies of engineered and non-engineered system factors, is highly desirable for many applications. For example, variable relations of energy consumption are required to enable the accurate estimation of energy consumption/demand and the close alignment of energy production with energy demand to achieve energy production efficiency and energy use reduction.
Utility/energy companies currently rely heavily on past data of base, average and peak electricity loads to project energy production/supply. This statistical projection is done without adequate and accurate models of energy consumption systems [3]. Power plants often generate enough power to satisfy base loads, and cover the difference between peak and base loads, sudden demand surges or any gap between energy supply and demand through their excess production capacities or by procuring from other energy sources [5,6]. Historical data lack critical real-time features (e.g., because of the lag effect of historical data and the lack of finer levels and finer divisions in time and space) for the accurate projection and estimation of energy demand and consumption. Without adequate and accurate models of energy consumption systems, it is extremely difficult to obtain an accurate projection and estimation of energy demand and consumption. As a result, energy has to be produced in excess in order to meet a potential rise in demand. Excess energy production is a significant cause of waste and inefficiency. Even with current technologies to obtain dynamic data of energy consumption systems in real time, the lack of adequate and accurate energy system models renders real-time dynamic system data useless for closely aligning energy production with energy demand to achieve energy production efficiency and energy use reduction. The ultimate energy efficiency through smart energy production and use will enable a shift from the existing code-, standard- and experience-based forecasting approach to a more dynamic, real-time and smart technology environment. Such an environment relies on real-time data, models and analytics for the real-time, accurate estimation of energy consumption, and on smart technologies that align energy production closely with energy demand for energy use reduction and energy production efficiency.
Many statistical modeling, data mining and machine learning techniques for system modeling, including decision trees, regression analysis, artificial neural networks, and Bayesian networks, have been used to analyze and model the energy consumption and efficiency of equipment, homes and buildings [7][8][9][10][11][12][13][14][15][16]. Many of these techniques share a shortcoming: they build variable relations for the full ranges of variable values using one model, although certain variable relations may hold for only some but not all variable values. The PVAD algorithm, a new multivariate analysis algorithm, overcomes this shortcoming by learning both full-value and partial-value relations of system variables from system data. Our research used the PVAD algorithm to model variable relations of energy consumption by learning full- and partial-value variable relations from data. The PVAD algorithm was applied to building energy consumption data at ASU.

Shortcomings of existing techniques of system modeling from data
Existing methods of learning system models from data include statistical analysis [17][18][19][20][21][22][23][24] and data mining techniques [23][24][25][26][27][28][29][30][31][32]. With system modeling from data, classification and prediction can be performed to explain or find relations among system variables. Depending on the nature of the data, several statistical techniques can be used to analyze data, such as parametric, nonparametric and logistic regression. For example, logistic regression can be applied when modeling categorical dependent variables [17,21,22]. In addition to decision and regression trees [23,24], random forest and support vector machine are also considered [25-28, 29, 31]. However, the above methods assume that the role of each variable in a variable relation is known (i.e., which variable is independent and which is dependent) and that a variable plays only one role, either independent or dependent, in one layer of variable relations. Once a variable is treated as an independent variable, it can no longer be used as a dependent variable. This is a major disadvantage when the role of a variable is not known, or when multiple layers of variable relations are required in which a variable plays different roles in different variable relations at different layers.
Bayesian networks [23,24,35,36,37], structural equation models [33,34] and reverse engineering methods [38][39][40][41][42][43][44][45][46][47] are among the few remaining options that can provide system modeling without prior knowledge of variable roles. However, those techniques discover only variable relations for the full ranges of all variable values instead of relations for specific values only. This can be seen from Fisher's Iris data set [48], in which the classification of the target variable (Plant Type) using the independent variables works for only two values of the target variable (Iris Versicolor and Iris Virginica) but not for the third target value, Iris Setosa. For such data, where variable relations hold for partial ranges of variable values only or different variable relations hold for different ranges of variable values, a model of the same variable relations for all variable values does not fit all data values well; that is, the model explains or represents the whole data set poorly.
The PVAD algorithm was developed as a new system modeling technique [49][50][51] to overcome the above shortcomings. Variable value associations can be used to construct associative networks as multi-layer structural system models. The application of the PVAD-based system modeling technique is an integral part of our research on energy consumption systems.

The energy consumption data and the PVAD application
The PVAD algorithm is presented in detail in [49][50][51]. This section shows the PVAD application to data of energy consumption collected from an ASU building in January 2013 for modeling energy consumption. There was a data sample every 15 minutes. The data set has 2976 data records or instances. Each data record contains four numeric values for the consumption of electricity (E), cooling (C), heating (H), and air temperature (A), respectively, as well as TimeStamp (T). T is important because changes of T are associated with changes in presence and activities of occupants and changes of E, C and H.
To apply the PVAD algorithm, in Step 1 the numeric variables A, H, C and E were transformed into categorical variables as shown in Figures (1)-(4). More details of Step 1 are in [1].
Step 2.1 generated candidate 1-to-1 associations of partial variable values, x = a → y = b, where x = a is the conditional variable value (CV) and y = b is the associative variable value (AV), and computed the co-occurrence ratio (cr) of each candidate association as follows:
$$cr(x = a \rightarrow y = b) = \frac{N_{x=a,\,y=b}}{N_{x=a}} \qquad (1)$$
where N_{x=a, y=b} is the number of instances containing both x = a and y = b, and N_{x=a} is the number of instances containing x = a. If cr is greater than or equal to the parameter α, we had an established association. For example, Table (1) shows 1-to-1 associations having CV: C = High together with their respective cr values, with α = 0.8. In addition to parameter α, two other parameters, β and γ, are also needed. β is used to remove associations whose number of supporting instances (the instances containing the variable values in the numerator of equation 1) is smaller than β. γ is used to remove an association with a common CV or AV that appears in more than a fraction γ of the data set. In this example, β is set to 50 while α and γ are set to 0.8 and 0.95, respectively.
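The 1-to-1 step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes cr is the number of instances containing both the CV and the AV divided by the number of instances containing the CV, and it reads the γ filter as dropping any association whose CV or AV alone appears in more than a fraction γ of the instances. The function name `one_to_one_associations` and the record layout (one dict per instance) are hypothetical.

```python
from collections import Counter
from itertools import permutations

def one_to_one_associations(records, alpha=0.8, beta=50, gamma=0.95):
    """Return {(cv, av): cr} for the established 1-to-1 associations.

    records: list of dicts mapping variable name -> categorical value.
    cv and av are (variable, value) pairs taken from different variables.
    """
    n = len(records)
    single = Counter()  # number of instances containing one variable value
    pair = Counter()    # number of instances containing an ordered (cv, av) pair
    for rec in records:
        items = list(rec.items())
        single.update(items)
        pair.update(permutations(items, 2))

    established = {}
    for (cv, av), support in pair.items():
        # gamma (assumed reading): skip values that are too common in the data set
        if single[cv] / n > gamma or single[av] / n > gamma:
            continue
        # beta: require enough supporting instances
        if support < beta:
            continue
        cr = support / single[cv]  # assumed form of the co-occurrence ratio
        if cr >= alpha:
            established[(cv, av)] = cr
    return established
```

With toy data in which 103 of 128 instances containing C=High also contain H=Low, the sketch reports cr = 103/128 = 0.8046875 for C=High → H=Low, which matches the arithmetic used in the worked example of Step 2.2.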
Step 2.2 uses two methods, YFM1 and YFM2, to examine and establish p-to-q associations, X = A → Y = B, where X and Y represent multiple variables. For example, using #5, #6 and #8 in Table (1), we applied YFM1, which considers all combinations of the AVs covered in those associations to find 1-to-q associations, where q > 1. To find established 1-to-2 associations, we first computed the number of instances containing the CV, N_{C=High} = 103 ÷ 0.8046875 = 128. Then we considered all possible combinations of two-variable AVs from the established 1-to-1 associations:
1. C=High → E=Medium, H=Low (from #5 and #6)
2. C=High → E=Medium, A=High (from #5 and #8)
3. C=High → H=Low, A=High (from #6 and #8)
For each 1-to-2 candidate association above, N_CommonSubset, the number of instances in the common subset of the supporting instances, was computed to calculate cr for the 1-to-2 association. The results are given in Table (2). In this case, C=High → H=Low, A=High is the only established association.
… immediately to the next line if the AV of that association is the same as one picked in the previous step. For example, #2 has AV: T=5:45 PM to 11 PM, which represents the timestamp. Since the AV of the association also represents the timestamp (T=12:15 PM to 5:30 PM), we skip to #3 without examining the intersection of the instances. 2ii) Generate a 2-to-1 association if n_intersection ≥ 50. Table (3) lists n_intersection and the corresponding cr values. Following the same procedure, other p-to-q associations were generated by YFM1 and YFM2.
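The YFM1 step for 1-to-2 candidates can be sketched as follows. This is a hedged illustration, not the published procedure: it assumes that cr of a 1-to-2 candidate is the size of the common subset of supporting instances divided by the number of instances containing the CV. The function name `one_to_two_candidates`, the instance-id sets and the toy numbers in the usage note are hypothetical and are not taken from Table (2).

```python
from itertools import combinations

def one_to_two_candidates(supporting, n_cv, alpha=0.8):
    """Evaluate every pair of AVs that share the same CV.

    supporting: {av_label: set of instance ids supporting CV -> av_label}
    n_cv: number of instances containing the CV
    Returns {(av1, av2): (common_subset_size, cr, is_established)}.
    """
    results = {}
    for av1, av2 in combinations(sorted(supporting), 2):
        # instances that support both 1-to-1 associations at once
        common = supporting[av1] & supporting[av2]
        cr = len(common) / n_cv  # assumed form of cr for a 1-to-2 candidate
        results[(av1, av2)] = (len(common), cr, cr >= alpha)
    return results
```

For example, with n_cv = 10 and supporting sets E=Medium: {0..7}, H=Low: {1..9} and A=High: {2..9}, only the pair (A=High, H=Low) reaches cr = 0.8 and is established; the other two pairs reach 0.6 and 0.7.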
Step 3 generalized and consolidated variable associations of partial values into associations of full value ranges wherever the partial-value associations covered the full value range of the same variable.
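One possible reading of the consolidation in Step 3 is sketched below. The details of Step 3 are given in [49][50][51] and may differ; this sketch simply assumes that a full-value association x → y is reported when the established partial-value associations from x to y use every categorical value of x as a CV. The function name `consolidate` and the `domains` argument are hypothetical.

```python
from collections import defaultdict

def consolidate(associations, domains):
    """Find variable pairs whose partial-value associations cover a full range.

    associations: iterable of ((x, a), (y, b)) pairs, meaning x = a -> y = b
    domains: {variable: set of all categorical values of that variable}
    Returns the set of (x, y) pairs for which every value of x appears as a CV.
    """
    covered = defaultdict(set)
    for (x, a), (y, _b) in associations:
        covered[(x, y)].add(a)
    return {pair for pair, values in covered.items() if values == domains[pair[0]]}
```

Under this assumption, associations H=Low → E=Low and H=Medium → E=Medium would be consolidated into a full-value relation H → E when Low and Medium are the only values of H, while a lone C=High → A=High would remain a partial-value association.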

Results of the PVAD Algorithm
Tables (4)-(5) list the most specific association(s) in each group of the associations with the same AV. Table (6) lists the most generic association(s) in each group of the associations with the same AV. Variable relations for energy consumption are revealed by each association in Tables (4)-(6). In Tables (4)-(6), there are groups that give similar associations. For example, the associations in Group 1 and Group 2 in Table (6) are similar. For the groups with similar associations, we marked only one group using the symbol ^ in the group # column. Most of the associations in Tables (4)-(6) involve C=Low (low cooling consumption) in the CV or AV, because most instances in the data set (2848 of 2976) contain C=Low, as the data was collected in January. Since C=Low is so common in the data set, it can be dropped when interpreting the associations.
The associative network of the energy consumption system model shown in Figure (5) was constructed using the associations in the groups marked with ^ in Table (6). Figure (5) shows the factors associated with the high, medium and low air temperatures (from the associations with A as the AV), the factors associated with the medium and low heating consumption (from the associations with H as the AV), and the factors associated with the medium and low electricity consumption (from the associations with E as the AV).
The associations in Table (6) and Figure (5) show that the associations of T, E, C, H and A differ in different value ranges of these variables. This illustrates that the PVAD algorithm can discover the full/partial-value variable relations that exist in many real-world systems.
Figure 5. The most generic associations in the groups marked by ^ in Table (6) represented in an associative network.

Comparison of the PVAD algorithm with some data mining techniques
We considered two of the existing data mining techniques to compare with the PVAD algorithm: association rule and decision tree.

Comparison with the association rule technique
The association rule technique first uses the Apriori algorithm to determine frequent item sets that satisfy the minimum support [23][24]. Then each frequent item set is broken up into all possible combinations of association rules, which are evaluated to see if any of them satisfy the minimum support and confidence. For a large data set, the number of frequent item sets and of candidate association rules derived from them can be enormous, requiring huge amounts of computer memory and computation time. When the association rule technique was applied to the energy consumption data, there were too many frequent item sets, and consequently too many association rules, to be listed in this paper. While the performance of the association rule technique was hindered by the data size, the search space of associations in the PVAD algorithm is narrowed down by YFM1 and YFM2, along with parameters α, β and γ.
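The growth in candidate rules can be made concrete with a short sketch. It assumes the usual convention that each candidate rule splits a frequent item set into a non-empty antecedent and a non-empty consequent, so an item set with k items yields 2^k − 2 candidates. The function name `candidate_rules` and the item labels are illustrative only.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Enumerate every antecedent -> consequent split of one frequent item set."""
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):  # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules
```

A single frequent item set that contains one value for each of the five variables T, E, C, H and A already produces 2^5 − 2 = 30 candidate rules, before minimum support and confidence are checked.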

Comparison with the decision tree technique
The decision tree is a data mining technique that learns decision rules expressing relations of the dependent variable y with independent variables x in a directed acyclic graph [23][24]. The Weka software was used to construct decision trees from the energy consumption system data. Weka offers different algorithms for constructing a decision tree, such as ID3 [52] and J48 [53]. The latter is an extended version of ID3 with additional features such as handling missing values and continuous attribute value ranges. It also uses pruning to address the over-fitting problem that decision trees are prone to. The pruning process requires the computation of the expected error rate. If the error rate of a subtree is greater than that of a leaf node, the subtree is pruned and replaced by the leaf node.
In our research, ID3 was used for the comparison with the PVAD algorithm because ID3 produces comparable results with associations produced by the PVAD algorithm. Leaf nodes produced by ID3 are pure in that the class labels of instances are the same in each leaf node. The purity of leaf node corresponds to AV in associations from the PVAD algorithm having the same variable value. The PVAD algorithm produces all associations up to N-to-1 associations, where N+1 is the number of variables. In other words, the PVAD algorithm can generate the longest CVs and find the AV that they are associated with. The combination of CVs corresponds to the path from the root of a decision tree down to a leaf node.
Because the decision tree technique requires the identification of one dependent variable (the target variable) and independent variables (attribute variables) for each decision tree, five decision trees need to be constructed, one for each of the five variables as the dependent variable. Tables (7)-(10) list decision rules produced by one of the five ID3 trees.
Although the decision rules from the decision trees appear to have the same form as associations from the PVAD algorithm, a decision rule has a different meaning from an association from the PVAD algorithm. A decision rule derived from the root of a decision tree to a leaf node of the decision tree represents a frequent item set, with instances in the leaf node having the values of the target variable and the attribute variables in the decision rule. This is why a path in one decision tree is also present in another tree even though different decision trees have different target variables. For example, the variable values in E=Medium, A=High, H=Medium, C=Low, T=12:15 PM to 5:30 PM are found in all four decision trees. Note that the energy consumption data set has only five variables. Redundant paths across different decision trees can be found more often for larger data sets with more variables. This means wasted computation time and space and difficulty in sorting out results from a number of decision trees. There are decision rules in Tables (7)-(10) that are not found in associations of the PVAD algorithm because the frequent item sets for those decision rules were eliminated in the process of forming associations. Hence, the PVAD algorithm has an advantage over the decision tree technique because the PVAD algorithm discovers associations rather than frequent item sets. There is another difference between the decision tree technique and the PVAD algorithm. Each step of constructing a decision tree splits a data subset for data homogeneity by comparing splits that use only one variable and its values, rather than combinations of multiple variables, because of the large number of combinations and the enormous computation costs. Hence, the resulting decision tree contains decision rules that consider only one variable at a time and may miss decision rules that could be generated if multiple variables and their values were considered and compared at a time.
In contrast, the PVAD algorithm examines one to multiple variables at a time and does not miss any associations that exist. The PVAD algorithm thus has an advantage over the decision tree technique: it does not miss any established associations, and it uses YFM1 and YFM2 to cut down the computation costs.
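The single-variable split selection discussed above can be sketched with the information gain criterion that ID3 uses. This is a minimal illustration and not Weka's implementation; the function names and the toy records are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, target):
    """Entropy reduction of `target` after splitting on one attribute."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[attribute], []).append(rec[target])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([rec[target] for rec in records]) - remainder

def best_split(records, attributes, target):
    """Pick the single attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))
```

Each candidate split is scored on one attribute in isolation, which is the one-variable-at-a-time behavior described above. Scoring combinations of attributes would require enumerating every attribute subset, which is where the computation cost grows quickly.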
Moreover, the decision tree algorithm requires the identification of the dependent variable (the target variable) and the independent variables (the attribute variables), although there may be no a priori knowledge of which variable is dependent or independent. This is why five decision trees, with one decision tree taking each of the five variables as the target variable, had to be constructed for the energy consumption data. The PVAD algorithm does not require the distinction between dependent and independent variables but discovers variable value relations and the role of each variable in each variable value relation. Furthermore, the PVAD algorithm can generate p-to-q associations with q > 1, which the decision tree technique cannot generate because a decision tree is constructed for only one target variable and produces only p-to-1 decision rules. Given the differences between the PVAD algorithm and the decision tree technique, the results of the PVAD algorithm are not comparable to the results of the decision tree technique. As discussed in Section 2, the PVAD algorithm overcomes shortcomings of existing statistical analysis and data mining techniques and produces partial/full-value associations that cannot be produced by other existing techniques.

Conclusion
Our research used the PVAD algorithm to learn and build the system model of energy consumption from data, and especially to learn relations of variables for both full and partial value ranges. The resulting partial-value associations of variables in the energy consumption system model reveal variable relations for partial value ranges that require not one but different models of variable relations over the full value ranges of the variables. This finding shows that the PVAD algorithm has the advantage and capability of discovering variable relations for building a multi-layer, structural system model. Hence, the PVAD-based system modeling technique can be useful in many fields to learn system models from data. The advantages of the PVAD algorithm over existing data mining, machine learning and statistical analysis techniques were also demonstrated by comparing the PVAD algorithm, and its results from the application to the energy consumption data, with the association rule technique and the decision tree technique.