Distribution of Bit Patterns in Binary Sequence Generated Over Sub Extension Field

The distribution of bit patterns is an important measure for checking the randomness of a sequence. The authors of this paper observed this crucial property in a binary sequence generated by using a primitive polynomial, the trace function, and the Legendre symbol defined over the sub extension field. Considering the sub extension field opens a new dimension in sequence generation research, whereas all of the authors' previous works focused on the prime field. In terms of the distribution of bit patterns, this work has a notable outcome: the binary sequence defined over the sub extension field holds a much better (close to uniform) bit distribution than the previous binary sequence defined over the prime field. Furthermore, the authors theoretically prove the distribution of bit patterns property in this paper.


Introduction
In this IoT era, we communicate with each other through the internet, so secure communication is a major concern. Symmetric cryptosystems (Advanced Encryption Standard (AES) [1]) and asymmetric cryptosystems (Rivest Shamir Adleman (RSA) [2] and Elliptic Curve Cryptography (ECC) [3]) are used to establish secure communication. Pseudo-random numbers are a crucial part of these cryptosystems. More specifically, a pseudo-random number generator is used to generate the keys (public key, private key, session key, and so on). A good pseudo-random number generator is essential for producing pseudo-random numbers that have the randomness property along with other good statistical properties. Consequently, the security of these cryptosystems depends directly on the randomness of the sequence, and it is necessary to evaluate the randomness of a sequence before using it in any cryptosystem. Two crucial properties, namely the linear complexity [4] and the distribution of bit patterns, are well-known measures for checking the randomness of a pseudo-random sequence. In this work, the authors restrict the discussion to the distribution of bit patterns.
The most renowned pseudo-random number generators are the Mersenne Twister (MT) [5], Blum-Blum-Shub (BBS) [6], the Legendre sequence [7,8], and the M-sequence [9]. The former two (MT and BBS) are well known for their applications rather than for their theoretical aspects. On the other hand, the M-sequence and the Legendre sequence are prominent geometric sequences from the theoretical point of view. As a result, the authors were drawn to the pseudo-random sequence generation research area by the theoretical prospects of the M-sequence and the Legendre sequence.
The Legendre sequence [7,8] is generated by applying the Legendre symbol over an odd characteristic field. It has a long period and high linear complexity, and its distribution of bit patterns is known to be close to uniform [10,11]. On the other hand, the M-sequence [9] is generated by a linear recurrence relation over a finite field. It has the maximum period but the minimum linear complexity, and it is well known for its uniform distribution of bit patterns [12]. Our previous work on a geometric sequence [13] combines the features of the Legendre sequence and the M-sequence. As mentioned previously, linear complexity and distribution of bit patterns are the important measures for evaluating the randomness of a sequence. Regarding the linear complexity, our previous sequence [13] always possesses a high value. However, its distribution of bit patterns does not reach the level of the Legendre sequence and the M-sequence. Hence, there is scope to improve the distribution of bit patterns of our previous sequence.
The trace calculation is an important step in our sequence generation procedure. In the case of the prime field $F_p$, the trace function maps an element of the extension field $F_{q^M}$ to an element of $F_p$, so the trace takes one of p possible values, {0, 1, ..., p − 1}. On the other hand, in the case of the sub extension field $F_q$, the trace function maps an element of $F_{q^M}$ to an element of $F_q$, so the trace takes one of q possible values. Here $M = m/m'$, $q = p^{m'}$, and $m'$ is one of the factors of m. From the theoretical perspective, more variation in the trace values contributes to a more balanced appearance of the bits (0 and 1) in a sequence. This is one of the main reasons for considering the sub extension field during the sequence generation procedure in order to improve the distribution of bit patterns of our previous sequence [13]. The detailed improvement obtained by utilizing the sub extension field is presented in the result and discussion section of this paper.
Recently, the authors started to consider the sub extension field during the sequence generation procedure, which is a new dimension of our research on the generation of pseudo-random sequences (our previous works on the binary sequence [13] and the multi-value sequence [14,15] considered the prime field). Our recent works on the binary sequence [16] and the multi-value sequence [17] experimentally observed the linear complexity and autocorrelation properties, respectively. As mentioned previously, the distribution of bit patterns is an important measure for evaluating the randomness of a sequence. Thus, this paper considers the distribution of bit patterns in a binary sequence generated over the sub extension field.
The Legendre sequence and the M-sequence are the basis of the sequence research area. Their properties are already proven, and therefore many researchers have been attracted to those sequences. As mentioned previously, our sequence is also generated from the ideas of the Legendre sequence and the M-sequence. Consequently, the authors expected that its properties could be theoretically proven, and this turned out to be the case (as shown in a later section of this paper). This is one of the contributions of this paper. Moreover, the authors compare the binary sequence defined over the sub extension field with their previous binary sequence in terms of the distribution of bit patterns. According to the comparison, the binary sequence defined over the sub extension field holds a much better (close to uniform) distribution of bit patterns than the previous binary sequence [13]. Finding this improvement by considering the sub extension field is the major contribution of this paper.
The authors of this paper observed the distribution of bit patterns in a binary sequence generated by a primitive polynomial, the trace function, and the Legendre symbol over the sub extension field. In brief, the sequence generation procedure is as follows: first, a primitive polynomial over the odd characteristic field $F_p$ is used to generate a maximum length vector sequence of elements in $F_{q^M}$; then the trace function maps the elements of the extension field $F_{q^M}$ to elements of the sub extension field $F_q$; and finally the Legendre symbol binarizes the elements of $F_q$ into a binary sequence. The authors already observed the period, autocorrelation, cross-correlation, and linear complexity properties of this binary sequence in their previous works [16,17]. Thus, this paper focuses on the distribution of bit patterns. In brief, the authors count the number of appearances of each n-bit pattern (where $1 \le n \le m/m'$). After observing many experimental results, the authors found that the number of appearances of each bit pattern is related to the number of zeros contained in the bit pattern. Furthermore, the authors theoretically prove the equation for the distribution of bit patterns and compare the result with their previous work [13].
Throughout this paper, p and q denote an odd prime number and its power $q = p^{m'}$, respectively, where m is a positive integer that denotes the extension degree and $m'$ is one of the factors of m. In addition, $M = m/m'$, and $F_q^*$ denotes the multiplicative group of $F_q$, that is, $F_q^* = F_q - \{0\}$.

Preliminaries
This section briefly explains some fundamental concepts of finite field theory such as group, field, primitive polynomial, trace function, Legendre symbol, and dual bases. Then, the binary sequence is introduced along with its period and distribution of bit patterns properties.

Group
A group is a non-empty set G with a binary operation • on its elements, denoted as ⟨G, •⟩, which satisfies the following axioms.
• Closure For ∀ a, b ∈ G, the result of a • b also exists in G and it is uniquely given.
• Associativity For ∀ a, b, c ∈ G, (a • b) • c = a • (b • c).
• Identity element There exists an element e ∈ G such that ∀ a ∈ G, a • e = e • a = a.
• Inverse element For ∀ a ∈ G, there exists an element $a^{-1}$ ∈ G such that $a • a^{-1} = a^{-1} • a = e$.
Group generator For a given group G, if there is an element g ∈ G such that for any a ∈ G there exists an integer i with $a = g^i$, then g is called a generator of G.
Order of a group The order of a group G often denoted as #G is the number of elements in the group G.
Cyclic group A group G is cyclic if there exists at least one generator g ∈ G, and it is denoted as G = ⟨g⟩.
From the definition of a cyclic group, each element of a cyclic group can be generated by iterating the group operation on the generator g, as shown in Figure 1.
Multiplicative group A cyclic group is called multiplicative if its group operation is written in the same way as multiplication.

Field
A field ⟨F, +, ·⟩ is a set F with two binary operations denoted by + and ·, such that
• F is a commutative group with respect to addition (+) having identity element 0.
• Let F* = F − {0}. Then F* is a commutative group with respect to multiplication (·), where every element has a multiplicative inverse in F*.
• For all a, b, c ∈ F the distributive law holds, i.e. a·(b+c) = a·b+a·c and (b+c)·a = b·a+c·a.
Sub field Let $F_1$ be a subset of a field F. Then $F_1$ is called a sub field of F if it obeys the laws of a field with respect to the field operations inherited from F. In addition, if $F_1 \neq F$, then $F_1$ is a proper sub field of F.
Prime field Let p be a prime. The ring of integers modulo p is a finite field of characteristic p and order p; it is denoted as $F_p$ and called a prime field.
Extension field A subset $F_0$ of a field F that is itself a field under the operations of F is called a sub field of F. In this case, F is called an extension field of $F_0$. An extension field of degree m of a prime field $F_p$ can be represented as an m-dimensional vector space over $F_p$, so each of its elements is a vector with m coordinates in $F_p$. Let this m-th extension field be denoted as $F_{q^M}$. The order of the extension field $F_{q^M}$ is $p^m$ (here $q = p^{m'}$ and $M = m/m'$).
In brief, the prime field $F_p$ is a sub field of the sub extension field $F_q$, and $F_q$ is in turn a sub field of the extension field $F_{q^M}$, as shown in Figure 2.

Primitive Polynomial
Consider a polynomial f(x) of degree m over the prime field $F_p$. If it cannot be factorized into polynomials of smaller degree over $F_p$, it is called an irreducible polynomial. For an irreducible f(x), let e be the smallest positive integer such that $x^e - 1$ is divisible by f(x) over $F_p$; it is known that e is a factor of $q^M - 1 = p^m - 1$. Then f(x) is called a primitive polynomial when e is equal to $q^M - 1$. Its zero ω belongs to the extension field $F_{q^M}$ and is a primitive element of $F_{q^M}$, that is, it generates every non-zero element of $F_{q^M}$ as a power $ω^i$ (for $i = 0, 1, 2, \ldots, q^M - 2$).
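The definition above can be checked numerically for small parameters. The following is a minimal sketch, not the authors' implementation: the function names (`mul_by_x`, `order_of_x`, `is_primitive`) and the low-degree-first coefficient layout are assumptions made for illustration. It computes the smallest e with $x^e \equiv 1 \pmod{f(x)}$ and compares it with $p^m - 1$.

```python
def mul_by_x(state, f_low, p):
    """Multiply a residue by x modulo the monic polynomial
    x^m + f_low[m-1] x^(m-1) + ... + f_low[0] over F_p.
    Residues are lists of m coefficients, lowest degree first."""
    top = state[-1]                    # coefficient that would become x^m
    shifted = [0] + state[:-1]         # plain multiplication by x
    # replace x^m by -(f_low[m-1] x^(m-1) + ... + f_low[0])
    return [(s - top * c) % p for s, c in zip(shifted, f_low)]


def order_of_x(f_low, p):
    """Smallest e >= 1 such that x^e = 1 modulo f(x) over F_p."""
    if f_low[0] % p == 0:
        raise ValueError("f(0) must be non-zero, otherwise x is not invertible")
    one = [1] + [0] * (len(f_low) - 1)
    state, e = mul_by_x(one, f_low, p), 1
    while state != one:
        state = mul_by_x(state, f_low, p)
        e += 1
    return e


def is_primitive(f_low, p):
    """f is primitive exactly when x has the maximal order p^m - 1."""
    return order_of_x(f_low, p) == p ** len(f_low) - 1


# x^4 + x + 1 over F_2 (the polynomial used later for the M-sequence example)
print(order_of_x([1, 1, 0, 0], 2), is_primitive([1, 1, 0, 0], 2))
# x^4 + x^3 + x^2 + x + 1 over F_2 is irreducible but not primitive
print(order_of_x([1, 1, 1, 1], 2), is_primitive([1, 1, 1, 1], 2))
```

The exhaustive loop is only practical for small $p^m$; for larger fields one would factor $p^m - 1$ instead of stepping through every power.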

Trace Function
This work utilizes the trace function to map an element X of the extension field $F_{q^M}$ to an element x of the sub extension field $F_q$ as
$$x = \mathrm{Tr}(X) = \sum_{i=0}^{M-1} X^{q^i}.$$
A crucial point is that the above trace can take an arbitrary value in $F_q$, and the trace function is linear over the sub extension field $F_q$:
$$\mathrm{Tr}(aX + bY) = a\,\mathrm{Tr}(X) + b\,\mathrm{Tr}(Y),$$
where $a, b \in F_q$ and $X, Y \in F_{q^M}$.

Legendre Symbol
The Legendre symbol $(a/q)_2$ is used to check the quadratic residuosity of an arbitrary element a in $F_q$. It is defined as
$$\left(a/q\right)_2 = a^{(q-1)/2} = \begin{cases} 0 & \text{if } a = 0, \\ 1 & \text{if } a \text{ is a QR}, \\ p - 1 & \text{if } a \text{ is a QNR}. \end{cases}$$
Here, QR and QNR stand for Quadratic Residue and Quadratic Non-Residue, respectively. A non-zero element a is called a QR if it has a square root in $F_q$; otherwise a is called a QNR. In this paper, the Legendre symbol is used for translating the vector sequence generated by the trace function over $F_q$ into a binary sequence. The QR's and QNR's in $F_q$ hold the following important property. Every non-zero element of $F_q$ is a root of $x^{q-1} - 1$, and there are no repeated roots. This polynomial is factorized as
$$x^{q-1} - 1 = \left(x^{(q-1)/2} - 1\right)\left(x^{(q-1)/2} + 1\right).$$
It is thus found that the numbers of QR's and QNR's in $F_q^*$ are the same, namely (q − 1)/2 each. These numbers are an important part of the proof of the theorem in a later section of this paper.
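As a small illustration of the property above, the following sketch evaluates the Legendre symbol by Euler's criterion in the special case $q = p$ (a prime field), where ordinary modular arithmetic suffices. The helper name `legendre` is hypothetical; the general case over $F_q$ needs extension-field arithmetic and is not covered here.

```python
def legendre(a, p):
    """Euler's criterion in the prime field F_p:
    a^((p-1)/2) mod p is 0 for a = 0, 1 for a QR, and p - 1 for a QNR."""
    return pow(a % p, (p - 1) // 2, p)


p = 23
values = [legendre(a, p) for a in range(1, p)]
num_qr = values.count(1)        # quadratic residues among 1..p-1
num_qnr = values.count(p - 1)   # quadratic non-residues among 1..p-1
print(num_qr, num_qnr)          # both equal (p - 1) / 2 = 11
```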

Dual Bases
The dual bases play an important role in proving the theorem shown in this paper. They are defined as follows: Let $F_q$ be a finite field and $F_{q^M}$ be a finite extension of $F_q$. Then two bases $A = \{\alpha_0, \alpha_1, \ldots, \alpha_{M-1}\}$ and $B = \{\beta_0, \beta_1, \ldots, \beta_{M-1}\}$ of $F_{q^M}$ over $F_q$ are said to be dual (or complementary) bases if
$$\mathrm{Tr}(\alpha_i \beta_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

Binary Sequence and Its Properties
This paper introduces a binary sequence along with its period and distribution of bit patterns properties as follows.

Generation Procedure and Period
Let ω be a primitive element in the extension field $F_{q^M}$, where $M = m/m'$, m is a composite number that denotes the extension degree of the primitive polynomial, and $m'$ is one of the factors of m. Then, by utilizing the trace function and the Legendre symbol, a binary sequence S is generated as follows:
$$S = \{s_i\}, \quad s_i = f_2\!\left(\left(\mathrm{Tr}(\omega^i)/q\right)_2\right),$$
where $i = 0, 1, 2, \ldots, \lambda - 1, \ldots$, $s_i \in \{0, 1\}$, and $f_2(\cdot)$ is a mapping function that translates the sequence of values 0, 1, and p − 1 generated by the Legendre symbol into a pseudo-random binary sequence. This mapping function is defined as follows:
$$f_2(x) = \begin{cases} 0 & \text{if } x = 0 \text{ or } x = 1, \\ 1 & \text{if } x = p - 1. \end{cases}$$
After observing many experimental results, the authors derive the equation for the period λ of the binary sequence as
$$\lambda = \frac{2(q^M - 1)}{q - 1}.$$
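The generation procedure can be sketched for small parameters as follows. This is an illustrative sketch under several assumptions rather than the authors' implementation: the primitive polynomial is found by brute force (the paper does not say which polynomial is used, so the bit string may differ from the listed examples by the choice of ω), elements of $F_{q^M}$ are stored as coefficient tuples over $F_p$, and all function names are hypothetical. The trace is computed as $\sum_j (\omega^i)^{q^j}$, and the Legendre symbol is read off from the discrete logarithm of the trace value.

```python
from itertools import product


def mul_by_x(state, f_low, p):
    """Multiply by x modulo x^m + f_low[m-1] x^(m-1) + ... + f_low[0] over F_p."""
    top = state[-1]
    shifted = (0,) + state[:-1]
    return tuple((s - top * c) % p for s, c in zip(shifted, f_low))


def power_table(f_low, p):
    """Return [x^0, x^1, ...] up to (not including) the first return to 1."""
    one = (1,) + (0,) * (len(f_low) - 1)
    table, state = [one], mul_by_x(one, f_low, p)
    while state != one:
        table.append(state)
        state = mul_by_x(state, f_low, p)
    return table


def find_primitive(p, m):
    """Brute-force search for a monic primitive polynomial of degree m over F_p."""
    for f_low in product(range(p), repeat=m):
        if f_low[0] != 0 and len(power_table(f_low, p)) == p ** m - 1:
            return f_low
    raise ValueError("no primitive polynomial found")


def binary_sequence(p, m, m_sub, length=None):
    """Bits s_i = f_2(Legendre(Tr(omega^i))) with q = p^m_sub and M = m / m_sub."""
    q, M = p ** m_sub, m // m_sub
    order = q ** M - 1                          # size of the multiplicative group
    exp = power_table(find_primitive(p, m), p)  # exp[i] = omega^i
    log = {v: i for i, v in enumerate(exp)}
    zero = (0,) * m
    period = 2 * order // (q - 1)
    seq = []
    for i in range(length or period):
        # Tr(omega^i) = sum over j < M of (omega^i)^(q^j), added coordinate-wise
        conj = [exp[(i * q ** j) % order] for j in range(M)]
        t = tuple(sum(c) % p for c in zip(*conj))
        if t == zero:
            seq.append(0)                       # trace 0 -> symbol 0 -> bit 0
        else:
            e = (log[t] * (q - 1) // 2) % order  # exponent of t^((q-1)/2)
            seq.append(0 if e == 0 else 1)      # QR (t^.. = 1) -> 0, QNR -> 1
    return seq


s = binary_sequence(5, 4, 2)
print(len(s), "".join(map(str, s)))
```

The lookup tables keep the sketch short but need memory proportional to $q^M$, so this approach only suits small examples such as $p = 5$, $m = 4$, $m' = 2$.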

Distribution of Bit Patterns
From the viewpoint of security, the distribution of bit patterns is as important as the linear complexity. If a sequence holds a uniform distribution of bit patterns, it becomes difficult to guess the next bit after observing the previous bit patterns. For example, assume a binary sequence having a period of 12, $S_{12}$ = {1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0}. If we observe the 1-bit patterns in this sequence, we find that 1 and 0 are uniformly distributed; in other words, 1 and 0 appear the same number of times. However, when we check the 2-bit patterns of $S_{12}$, we find only two types of patterns (10 and 01). In this case, we can easily predict the next bits after observing the previous patterns. For example, given the sub-sequence {1, 0, 1, 0, $x_5$, $x_6$} of $S_{12}$, we can easily guess $x_5 = 1$ and $x_6 = 0$. Therefore, it is also essential to evaluate the distribution of bit patterns of a sequence to confirm its randomness. In other words, the uniformity of the distribution contributes to the randomness from the viewpoint of unpredictability.
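The counting in this example can be reproduced with a short sketch. The helper `pattern_counts` is a hypothetical name, and counting windows cyclically over one period is an assumption about how the patterns are tallied.

```python
from collections import Counter


def pattern_counts(seq, n):
    """Count every n-bit window over one period, wrapping around cyclically."""
    length = len(seq)
    return Counter(
        "".join(str(seq[(i + k) % length]) for k in range(n))
        for i in range(length)
    )


s12 = [1, 0] * 6                 # the period-12 sequence 1, 0, 1, 0, ...
print(pattern_counts(s12, 1))    # 1-bit patterns: balanced
print(pattern_counts(s12, 2))    # 2-bit patterns: only 10 and 01 occur
```

The patterns 00 and 11 never occur, which is exactly what makes the next bit predictable.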

Distribution of Bit Patterns in Binary Sequence
In this section, we introduce the bit distribution of the binary sequence generated over the sub extension field. In addition, the bit distributions of the M-sequence and the Legendre sequence are also introduced here. Throughout this section, $b^{(n)}$, $Z(b^{(n)})$, and $D_{S_\lambda}(b^{(n)})$ denote a bit pattern of length n, the number of 0's in $b^{(n)}$, and the number of appearances of $b^{(n)}$ in $S_\lambda$, respectively.

Bit Distribution of M-sequence
The M-sequence [9] is generated by a linear recurrence relation over a finite field. The M-sequence has the maximum period and a uniform distribution of bit patterns, except for the case $Z(b^{(n)}) = n$, but it has the minimum linear complexity. Let $f(x) = x^4 + x + 1$ be a primitive polynomial over $F_2$; then, using the linear recurrence relation, an M-sequence of period 15 is obtained as follows.
The distribution of n-bit patterns in (9) is shown in Table 1, where 1 ≤ n ≤ m. In the case of the M-sequence, every pattern except the all-zero pattern appears the same number of times. For example, when n = 3, every pattern appears 2 times (except the 000 pattern). In other words, the patterns are uniformly distributed. Every M-sequence has this good distribution of bit patterns.
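A short sketch can reproduce this behaviour for $f(x) = x^4 + x + 1$. The recurrence $s_{k+4} = s_{k+1} + s_k \pmod 2$ corresponds to this polynomial; the initial state (1, 0, 0, 0) and the function names are assumptions, so the output may be a cyclic shift of the sequence in (9).

```python
from collections import Counter


def m_sequence(taps, state, length):
    """Binary linear recurrence s[k+m] = sum(s[k+t] for t in taps) mod 2."""
    s = list(state)
    m = len(state)
    while len(s) < length:
        s.append(sum(s[-m + t] for t in taps) % 2)
    return s[:length]


def pattern_counts(seq, n):
    """Count n-bit windows cyclically over one period."""
    length = len(seq)
    return Counter(
        "".join(str(seq[(i + k) % length]) for k in range(n))
        for i in range(length)
    )


# x^4 + x + 1  ->  s[k+4] = s[k+1] + s[k]; assumed initial state 1,0,0,0
seq = m_sequence(taps=(0, 1), state=(1, 0, 0, 0), length=15)
print("".join(map(str, seq)))
# every non-zero 3-bit pattern occurs twice, 000 occurs once
print(sorted(pattern_counts(seq, 3).items()))
```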

Bit Distribution of Legendre Sequence
The Legendre sequence [7,8] is generated by applying the Legendre symbol over an odd characteristic field. The Legendre sequence has a long period and high linear complexity, and its distribution of bit patterns is close to uniform. Let p = 23; then the Legendre sequence of period 23 is as follows.

Bit Distribution of the Proposed Binary Sequence
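Let $S_\lambda$ be a binary sequence having a period of λ. Again, let $b^{(n)}$, $Z(b^{(n)})$, and $D_{S_\lambda}(b^{(n)})$ denote a bit pattern of length n, the number of 0's in $b^{(n)}$, and the number of appearances of $b^{(n)}$ in $S_\lambda$, respectively. Then, the distribution of bit patterns in the binary sequence defined over the sub extension field is given by the following theorem.
$$D_{S_\lambda}(b^{(n)}) = q^{M-n}\left(\frac{q-1}{2}\right)^{n-Z(b^{(n)})-1}\left(\frac{q+1}{2}\right)^{Z(b^{(n)})}, \quad 0 \le Z(b^{(n)}) < n, \tag{11a}$$
$$D_{S_\lambda}(b^{(n)}) = \frac{2}{q-1}\left(q^{M-n}\left(\frac{q+1}{2}\right)^{n} - 1\right), \quad Z(b^{(n)}) = n. \tag{11b}$$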
Let ω be a primitive element in the extension field $F_{q^M}$, where $M = m/m'$, m is a composite number that denotes the extension degree of the primitive polynomial, and $m'$ is one of the factors of m. Then, utilizing the trace function and the Legendre symbol, one period of a binary sequence is generated as follows.
$$S_\lambda = \{s_i\}, \quad s_i = f_2\!\left(\left(\mathrm{Tr}(\omega^i)/q\right)_2\right), \quad i = 0, 1, \ldots, \lambda - 1. \tag{12}$$
Here λ is the period of the sequence, given by
$$\lambda = \frac{2(q^M - 1)}{q - 1}.$$
At first, a primitive polynomial is used, then the trace value is calculated, then the Legendre symbol outputs zero, QR, or QNR in $F_q$, and finally the sequence coefficient $s_i$ is given by the mapping function $f_2(\cdot)$. The authors observe the distribution of n-bit patterns in the binary sequence, where n satisfies $1 \le n \le m/m'$. The distribution of n-bit patterns is evaluated by observing the consecutive sequence coefficients $(s_i, s_{i+1}, \ldots, s_{i+(n-1)})$. In particular,
$$s_{i+t} = f_2\!\left(\left(\mathrm{Tr}(\omega^i \cdot \omega^t)/q\right)_2\right), \quad t = 0, 1, \ldots, n - 1.$$
By observing the above sequence coefficients, the distribution of bit patterns $D_{S_\lambda}$ is determined by the following trace values.
$$\mathrm{Tr}(\omega^i \cdot \omega^0),\ \mathrm{Tr}(\omega^i \cdot \omega^1),\ \ldots,\ \mathrm{Tr}(\omega^i \cdot \omega^{n-1}).$$
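Let $A = \{\alpha_0, \alpha_1, \ldots, \alpha_{M-1}\}$ be a basis of $F_{q^M}$ over $F_q$ and ω be a primitive element. With this basis, $\omega^i$ is represented as
$$\omega^i = \sum_{j=0}^{M-1} a_{i,j}\,\alpha_j, \quad a_{i,j} \in F_q. \tag{16}$$
Again, let $B = \{\omega^0, \omega^1, \ldots, \omega^{n-1}, \beta_n, \ldots, \beta_{M-1}\}$ be the dual basis of A in $F_{q^M}$ over $F_q$. Then we also have
$$\mathrm{Tr}(\alpha_j \cdot \omega^t) = \begin{cases} 1 & \text{if } j = t, \\ 0 & \text{if } j \neq t, \end{cases} \quad 0 \le t \le n - 1.$$
Since A and B are dual to each other, $\mathrm{Tr}(\omega^i \cdot \omega^t)$ is calculated as follows.
$$\mathrm{Tr}(\omega^i \cdot \omega^t) = \mathrm{Tr}\!\left(\sum_{j=0}^{M-1} a_{i,j}\,\alpha_j\,\omega^t\right) = \sum_{j=0}^{M-1} a_{i,j}\,\mathrm{Tr}(\alpha_j\,\omega^t) = a_{i,t}.$$
Therefore, by using the dual basis, the trace values that determine the distribution of bit patterns $D_{S_\lambda}(b^{(n)})$ become
$$a_{i,0},\ a_{i,1},\ \ldots,\ a_{i,n-1}.$$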

Relation Between the Sequence Coefficients With the Trace Values and Legendre Symbol Calculation
Depending on the three different types of trace values (0, QR, and QNR), the Legendre symbol outputs three different values (0, 1, and p − 1), and finally the mapping function outputs 0 or 1 as the sequence coefficient $s_i$. This dependency between the trace and the Legendre symbol is as follows.
Trace value        Legendre symbol    $s_i$
0                  0                  0
QR in $F_q^*$      1                  0
QNR in $F_q^*$     p − 1              1
According to the above table, the sequence coefficient 0 comes from two cases: one is the Tr = 0 case and the other is the QR in $F_q^*$ case. To treat these two cases separately, let us write $0_{\mathrm{T}}$ for a zero that comes from Tr = 0 and $0_{\mathrm{Q}}$ for a zero that comes from a QR. In addition, 1 comes from the QNR in $F_q^*$ case. Thus the above table can be further modified as follows.
Trace value        Legendre symbol    $s_i$
0                  0                  $0_{\mathrm{T}}$
QR in $F_q^*$      1                  $0_{\mathrm{Q}}$
QNR in $F_q^*$     p − 1              1
Let the number of $0_{\mathrm{T}}$ be denoted by u, and let $T_{u,n}$ denote the number of bit patterns including u times $0_{\mathrm{T}}$ and $Z(b^{(n)}) - u$ times $0_{\mathrm{Q}}$. Thus, $T_n$ can be considered as
$$T_n = \sum_{u=0}^{Z(b^{(n)})} T_{u,n}.$$
In the following section, the distribution of bit patterns in the binary sequence defined over the sub extension field is theoretically proven.

Proof of (11a)
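The period of the binary sequence is given by
$$\lambda = \frac{2(q^M - 1)}{q - 1}.$$
After rewriting the above equation we obtain
$$q^M - 1 = \frac{q - 1}{2}\,\lambda.$$
Since one full cycle of $\omega^i$ (i = 0, 1, ..., $q^M - 2$) contains (q − 1)/2 periods of the sequence, the above relation becomes as follows for the distribution of bit patterns.
$$D_{S_{q^M-1}}(b^{(n)}) = \frac{q - 1}{2}\,D_{S_\lambda}(b^{(n)}). \tag{21}$$
Thus, we must consider two sequence lengths, $S_{q^M-1}$ and $S_\lambda$. Hence, we observe the distribution of bit patterns in $S_{q^M-1}$ as $D_{S_{q^M-1}}(b^{(n)})$ and in $S_\lambda$ as $D_{S_\lambda}(b^{(n)})$.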
In the previous section, we explained that an n-bit pattern $b^{(n)}$ is determined by the coefficients $a_{i,0}, a_{i,1}, \ldots, a_{i,n-1}$. On the other hand, the remaining M − n coefficients $a_{i,n}, a_{i,n+1}, \ldots, a_{i,M-1}$ of $\omega^i$, which is given by (16), do not affect the pattern. The number of combinations of $a_{i,n}, a_{i,n+1}, \ldots, a_{i,M-1}$ is $q^{M-n}$. It should be noted that $\omega^i$ runs over all of the non-zero elements of the extension field $F_{q^M}$.
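As mentioned previously, when the trace value is 0 or a QR, the sequence coefficient becomes a zero of the first kind $0_{\mathrm{T}}$ (from Tr = 0) or a zero of the second kind $0_{\mathrm{Q}}$ (from a QR), respectively. If the trace value is a QNR, the sequence coefficient becomes 1. Let u, with $0 \le u \le Z(b^{(n)})$, denote the number of zeros in $b^{(n)}$ that come from Tr = 0; then the other $Z(b^{(n)}) - u$ zeros come from QR's, and the $n - Z(b^{(n)})$ ones come from QNR's. Writing $Z = Z(b^{(n)})$ for brevity, and recalling that $F_q$ contains one zero element, (q − 1)/2 QR's, and (q − 1)/2 QNR's, the number of combinations for the n-bit patterns with Z < n zeros is given as follows.
$$T_{u,n} = \binom{n}{Z}\binom{Z}{u}\left(\frac{q-1}{2}\right)^{Z-u}\left(\frac{q-1}{2}\right)^{n-Z} q^{M-n}.$$
Furthermore, $T_n$ can be derived as
$$T_n = \sum_{u=0}^{Z} T_{u,n} = \binom{n}{Z}\,q^{M-n}\left(\frac{q-1}{2}\right)^{n-Z}\sum_{u=0}^{Z}\binom{Z}{u}\left(\frac{q-1}{2}\right)^{Z-u}.$$
According to the above equation, $T_n$ can be calculated from Z. In addition, there are $\binom{n}{Z}$ possible bit patterns that have the same Z. To calculate $D_{S_{q^M-1}}(b^{(n)})$ for each $b^{(n)}$, $T_n$ needs to be divided by $\binom{n}{Z}$.
$$D_{S_{q^M-1}}(b^{(n)}) = \frac{T_n}{\binom{n}{Z}} = q^{M-n}\left(\frac{q-1}{2}\right)^{n-Z}\sum_{u=0}^{Z}\binom{Z}{u}\left(\frac{q-1}{2}\right)^{Z-u}. \tag{24}$$
By using the binomial theorem, $\sum_{u=0}^{Z}\binom{Z}{u}\left(\frac{q-1}{2}\right)^{Z-u} = \left(\frac{q-1}{2} + 1\right)^{Z} = \left(\frac{q+1}{2}\right)^{Z}$, so (24) can be rewritten as
$$D_{S_{q^M-1}}(b^{(n)}) = q^{M-n}\left(\frac{q-1}{2}\right)^{n-Z}\left(\frac{q+1}{2}\right)^{Z}. \tag{27}$$
From (21), $D_{S_\lambda}(b^{(n)})$ holds the relation
$$D_{S_\lambda}(b^{(n)}) = \frac{2}{q-1}\,D_{S_{q^M-1}}(b^{(n)}).$$
Therefore, using (27), $D_{S_\lambda}(b^{(n)})$ is given by
$$D_{S_\lambda}(b^{(n)}) = q^{M-n}\left(\frac{q-1}{2}\right)^{n-Z-1}\left(\frac{q+1}{2}\right)^{Z}.$$
Thus, the first part (11a) is proven.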

Proof of (11b)
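Let us consider the case $Z(b^{(n)}) = n$, that is, the all-zero pattern. Each pattern with Z < n zeros appears $q^{M-n}\left(\frac{q-1}{2}\right)^{n-Z}\left(\frac{q+1}{2}\right)^{Z}$ times in $S_{q^M-1}$, so the total number of appearances of the n-bit patterns except the all-zero pattern is given as follows:
$$\sum_{Z=0}^{n-1}\binom{n}{Z}\,q^{M-n}\left(\frac{q-1}{2}\right)^{n-Z}\left(\frac{q+1}{2}\right)^{Z} = q^{M-n}\left(q^{n} - \left(\frac{q+1}{2}\right)^{n}\right).$$
Since $S_{q^M-1}$ contains $q^M - 1$ patterns in total, the distribution of the all-zero pattern becomes
$$D_{S_{q^M-1}}(b^{(n)}) = (q^M - 1) - q^{M-n}\left(q^{n} - \left(\frac{q+1}{2}\right)^{n}\right) = q^{M-n}\left(\frac{q+1}{2}\right)^{n} - 1,$$
$$D_{S_\lambda}(b^{(n)}) = \frac{2}{q-1}\left(q^{M-n}\left(\frac{q+1}{2}\right)^{n} - 1\right).$$
Thus, the second part (11b) is proven. In addition, the theorem in (11) is also proven.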
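As a worked check of the counting argument, the closed forms below follow from the three possible trace outcomes (0 and QR both give bit 0, QNR gives bit 1) together with $\lambda = 2(q^M - 1)/(q - 1)$. They are a sketch derived from those definitions, not a verbatim copy of (11), and Z abbreviates $Z(b^{(n)})$.

```latex
\begin{align*}
D_{S_\lambda}(b^{(n)}) &= q^{M-n}\left(\tfrac{q-1}{2}\right)^{n-Z-1}\left(\tfrac{q+1}{2}\right)^{Z},
  & 0 \le Z < n, \\
D_{S_\lambda}(b^{(n)}) &= \tfrac{2}{q-1}\left(q^{M-n}\left(\tfrac{q+1}{2}\right)^{n} - 1\right),
  & Z = n.
\end{align*}
% Check with p = 5, m = 4, m' = 2, so q = 25, M = 2, \lambda = 52, and n = 2:
\begin{align*}
Z = 0 &: \quad 25^{0}\cdot 12^{1}\cdot 13^{0} = 12, \\
Z = 1 &: \quad 25^{0}\cdot 12^{0}\cdot 13^{1} = 13, \\
Z = 2 &: \quad \tfrac{2}{24}\left(25^{0}\cdot 13^{2} - 1\right) = 14, \\
\text{total} &: \quad 12 + 2\cdot 13 + 14 = 52 = \lambda .
\end{align*}
```

The total over all four 2-bit patterns equals the period, as it must when every position of one period starts exactly one pattern.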

Result and Discussion
This section explains the distribution of bit patterns in the binary sequence generated over the sub extension field based on some experimental results. Then, a comparison between the binary sequence defined over the sub extension field and our previous geometric sequence [13] is presented in terms of the distribution of bit patterns. Here, $H_{wt}$ denotes the Hamming weight.

Experimental Results
Let us consider the distribution of bit patterns in the binary sequence generated over the sub extension field in the following examples.
Example 1 Let p = 5, m = 4, and $m'$ = 2. Then the sequence, which has a period of 52, is as follows, and its distribution of n-bit patterns is shown in Table 5.
10110010111110010100001100}. (32)
Example 2 Let p = 3, m = 6, and $m'$ = 2. Then the sequence, which has a period of 182, is as follows, and its distribution of n-bit patterns is shown in Table 6.
Example 3 Let p = 7, m = 9, and $m'$ = 3. Then the sequence, which has a period of 235986, is as follows, and its distribution of n-bit patterns is shown in Table 7.

Observation
It was found that the experimental results explicitly support (11). In addition, the number of appearances of each bit pattern is related to the number of zeros contained in the bit pattern. Moreover, the number of appearances increases as $Z(b^{(n)})$ increases. To confirm this, let us check Example 2 with n = 3.
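The expected counts for this check can be evaluated with a few lines of code. The sketch below assumes the closed form that follows from the counting argument in the proof (a zero bit arises from (q + 1)/2 trace values, a one bit from (q − 1)/2, the remaining coefficients are free, and the all-zero element is excluded); the function name `expected_count` is hypothetical and the values are computed from that formula, not copied from Table 6.

```python
from math import comb


def expected_count(p, m, m_sub, n, z):
    """Appearances in one period of an n-bit pattern that contains z zeros."""
    q, M = p ** m_sub, m // m_sub
    # count over the full cycle of length q^M - 1
    full = q ** (M - n) * ((q - 1) // 2) ** (n - z) * ((q + 1) // 2) ** z
    if z == n:
        full -= 1                      # the zero element is not a power of omega
    # one full cycle holds (q - 1) / 2 periods; the division is exact
    return 2 * full // (q - 1)


p, m, m_sub, n = 3, 6, 2, 3            # Example 2 with n = 3
counts = [expected_count(p, m, m_sub, n, z) for z in range(n + 1)]
print(counts)                          # counts for Z = 0, 1, 2, 3
print(sum(comb(n, z) * c for z, c in enumerate(counts)))  # equals the period
```

Under this formula the count grows with Z, from 16 for the all-one pattern to 31 for the all-zero pattern, and the weighted total equals the period 182.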

Comparison With Our Previous Work
By combining the features of the M-sequence and the Legendre sequence, our previous work [13] proposed a geometric sequence, namely the NTU (Nogami-Tada-Uehara) sequence. According to our previous research, the NTU sequence always holds a long period, low correlation, and high linear complexity, which are important considerations for using a sequence in cryptographic applications. Another crucial consideration before utilizing a sequence in any secure application is to judge its randomness, for which we need to evaluate its distribution of bit patterns. After experimental observation, it was found that the bit patterns of the NTU sequence are not uniformly distributed. In other words, in the binary NTU sequence there is a large difference between the numbers of appearances of 0 and 1. To overcome this drawback, instead of the prime field (which is used in the NTU sequence generation procedure), this work uses the sub extension field during the sequence generation procedure. As a result, the distribution of bit patterns becomes close to uniform. This comparison is shown in Table 8 and Table 9. It should be noted that the NTU sequence is controlled by 2 parameters (p and m), whereas the sequence over the sub extension field is controlled by 3 parameters (p, m, and $m'$). Therefore, it is not possible to compare these two sequences at the same length (in other words, the same period λ). The authors kept the difference in length as small as possible.
One of the most notable outcomes of this comparison is that the NTU sequence holds a larger difference between the numbers of appearances of the 'all zero' and 'all one' patterns. In other words, it also confirms the non-uniform distribution of bit patterns. On the other hand, the sequence defined over the sub extension field reduces this difference and makes the distribution close to uniform. This comparison is shown graphically in Figure 3.
Table 8: Comparison in bit distribution between the sub field binary sequence and NTU sequence -I.
n    $H_{wt}(b^{(n)})$    $D_{S_{182}}(b^{(n)})$    %    $D_{NTU_{242}}(b^{(n)})$    %
Table 9: Comparison in bit distribution between the sub field binary sequence and NTU sequence -II.

Conclusion
In this paper, the authors observed the distribution of bit patterns in a binary sequence defined over the sub extension field. The number of appearances of each bit pattern is related to the number of zeros contained in the bit pattern. Furthermore, the authors theoretically proved the distribution of bit patterns property. In addition, they compared the binary sequence defined over the sub extension field with their previous binary sequence in terms of the distribution of bit patterns. According to the comparison results, the binary sequence generated over the sub extension field holds a much better (close to uniform) distribution of bit patterns than the previous binary sequence. As future work, the authors would like to consider an efficient implementation to enhance the usability of the proposed sequence as a Cryptographically Secure Pseudo Random Number Generator (CSPRNG).