Encoding of Terms in EMB-Based Mealy FSMs

Abstract: A method is proposed for implementing FPGA-based Mealy finite state machines (FSMs). The main goal of the method is to reduce the number of look-up table (LUT) elements and the number of logic levels in FSM circuits. To achieve this, the direct dependence of input memory functions and FSM output functions on FSM inputs and state variables must be eliminated. The method is based on encoding the terms that correspond to rows of direct structure tables. In this approach, only the terms depend on FSM inputs and state variables; all other functions depend on variables representing the terms. The method belongs to the group of structural decomposition methods. The set of terms is divided into classes such that each class corresponds to a single-level LUT-based circuit. An embedded memory block (EMB) generates the codes of both the classes and the terms as elements of these classes. The joint use of LUTs and an EMB reduces the chip area occupied by the FSM circuit compared with its LUT-based counterpart. A simple sequential algorithm is proposed for partitioning the set of terms into a given number of classes. The method is based on representing an FSM by a state transition table, but it can be used with any known form of FSM specification. An example of synthesis is shown. The efficiency of the proposed method was investigated using a library of standard benchmarks, and the method was compared with several other known design methods. The investigations show that the proposed method gives better results than the other methods discussed. It produces FSM circuits with three levels of logic and regular interconnections.


Introduction
The Mealy finite state machine (FSM) model is widely used in designing control units of modern digital systems [1][2][3]. Many problems are connected with optimizing the characteristics of control units [4,5]. One of the most important is hardware reduction [6,7].
In this article, we propose a synthesis method that leads to an FSM circuit implemented as a network of EMBs and LUTs. The method is based on the structural decomposition [17] of the FSM circuit.

Background of Mealy FSMs
The logic circuit of a Mealy FSM is represented by the following systems of Boolean functions [1]:

$$\Phi = \Phi(T, X); \tag{2}$$

$$Y = Y(T, X). \tag{3}$$
In (2) and (3), $\Phi = \{D_1, \ldots, D_R\}$ is a set of input memory functions, $T = \{T_1, \ldots, T_R\}$ is a set of state variables, $X = \{x_1, \ldots, x_L\}$ is a set of input variables, and $Y = \{y_1, \ldots, y_N\}$ is a set of output functions. To find systems (2) and (3), the behaviour of the FSM must be specified. In this article, we use a state transition table (STT) to represent a Mealy FSM. An STT contains information about the transitions between internal states $a_m \in A$, where $A = \{a_1, \ldots, a_M\}$ is a set of states [8]. An STT has the following columns: $a_m$ is the current state; $a_s$ is the state of transition; $X_h$ is a conjunction of input variables (or their complements) determining the transition $\langle a_m, a_s \rangle$; $Y_h$ is a collection of output functions (COF) generated during the transition $\langle a_m, a_s \rangle$; $h$ is the number of the transition ($h \in \{1, \ldots, H\}$). For example, consider a Mealy FSM $S_1$ represented by an STT (Table 1).
The following sets and their parameters can be derived from Table 1: $A = \{a_1, \ldots, a_{12}\}$, $M = 12$, $X = \{x_1, \ldots, x_7\}$, $L = 7$, $Y = \{y_1, \ldots, y_{11}\}$, $N = 11$. There are $H = 20$ rows in Table 1. To find the sets $\Phi$ and $T$, the states $a_m \in A$ must be encoded by binary codes $K(a_m)$ with $R$ bits. This is the step of state assignment [8]. Let us use the minimum number of state variables:

$$R = \lceil \log_2 M \rceil. \tag{4}$$

In the discussed case, $R = 4$. This gives the sets $T = \{T_1, \ldots, T_4\}$ and $\Phi = \{D_1, \ldots, D_4\}$. As follows from the set $\Phi$, we use D flip-flops to implement the register (RG). To obtain functions (2) and (3), the STT must be turned into a direct structure table (DST) [1] of the Mealy FSM. To do this, three columns are added to the STT: $K(a_m)$ is the code of the current state; $K(a_s)$ is the code of the state of transition; $\Phi_h$ is the collection of input memory functions equal to 1 to replace $K(a_m)$ by $K(a_s)$.
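The arithmetic of the state assignment step can be checked with a minimal Python sketch. The function names are illustrative and not taken from the article; the "trivial" encoding, in which state $a_m$ receives the binary representation of $m - 1$, is only one possible assignment.

```python
def state_code_width(num_states: int) -> int:
    """Minimum number of state variables R needed to encode M states."""
    return max(1, (num_states - 1).bit_length())


def trivial_state_codes(num_states: int) -> dict[str, str]:
    """Illustrative assignment: K(a_m) is m - 1 written with R bits."""
    width = state_code_width(num_states)
    return {f"a{m}": format(m - 1, f"0{width}b") for m in range(1, num_states + 1)}


codes = trivial_state_codes(12)       # M = 12 for FSM S_1
print(state_code_width(12))           # 4
print(codes["a1"], codes["a12"])      # 0000 1011
```

Any other assignment with the same code width would serve equally well for the purpose of counting state variables.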
Each row of the DST corresponds to a product term $F_h$ ($h \in \{1, \ldots, H\}$). The term $F_h$ is the following conjunction:

$$F_h = \left( \bigwedge_{r=1}^{R} T_r^{l_{mr}} \right) X_h. \tag{5}$$

The first member of (5) is the conjunction $A_m$ of state variables corresponding to the code $K(a_m)$ of the state $a_m \in A$ from the $h$-th row of the DST. Here $l_{mr} \in \{0, 1\}$, $T_r^0 = \bar{T}_r$, $T_r^1 = T_r$ ($r \in \{1, \ldots, R\}$). The symbol $l_{mr}$ stands for the value of the $r$-th bit of $K(a_m)$.
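As an illustration of how a term is assembled from a state code and an input conjunction, the following Python sketch builds the literal list of $A_m$ and appends the literals of $X_h$. The textual notation (`~` for a complement, `&` for conjunction) and the sample row are assumptions made for this sketch only; the row is not taken from Table 1.

```python
def state_conjunction(code: str) -> list[str]:
    """Literals of A_m: bit 1 gives T_r, bit 0 gives the complement ~T_r."""
    return [f"T{r}" if bit == "1" else f"~T{r}" for r, bit in enumerate(code, start=1)]


def product_term(code: str, input_literals: list[str]) -> str:
    """F_h written as the product of the literals of A_m and X_h."""
    return " & ".join(state_conjunction(code) + input_literals)


# Hypothetical DST row: current state code 0100, input conjunction x1 & ~x2.
print(product_term("0100", ["x1", "~x2"]))
# ~T1 & T2 & ~T3 & ~T4 & x1 & ~x2
```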
The functions (2) and (3) depend on the terms (5). System (2) determines the block of input memory functions (BIMF); system (3) determines the block of output functions (BOF). State codes are kept in the RG. This determines a Mealy FSM $U_1$ (Figure 1). The pulse Start loads the code $K(a_1)$ of the initial state $a_1 \in A$ into the RG. The pulse Clock allows the content of the RG to change.

Implementing Mealy FSMs with FPGAs
Each block of FSM $U_1$ can be implemented using either LUTs or EMBs. We call a block of LUTs a LUTer and a block of EMBs an EMBer. In the simplest case, we have a LUT-based FSM $U_1$ (Figure 2). Let an FSM circuit be represented by $I$ Boolean functions. In the case of $U_1$, $I = R + N$. Let the following condition hold:

$$L(f_i) \le S_L \quad (i \in \{1, \ldots, I\}). \tag{6}$$

In (6), the symbol $L(f_i)$ stands for the number of literals in a sum-of-products (SOP) form of $f_i$, and $S_L$ is the number of LUT inputs. In this case, there are exactly $I$ LUTs in the circuit of $U_1$. If condition (6) is violated, then some functions must be decomposed. To do this, different methods of functional decomposition are used [18][19][20]. This leads to multi-level circuits with complex interconnections. Multi-level LUTer circuits consume more energy and are slower than their single-level counterparts.
Using EMBs in FSM design is very important. It decreases the chip area occupied by the FSM circuit, as well as the number of interconnections [21][22][23]. In turn, this reduces both the power consumption and the propagation time compared with LUT-based counterparts. For this reason, there are many EMB-based methods of Mealy FSM synthesis [10,16].
Let condition (7) hold. In this case, a single EMB is enough to implement the circuit of $U_1$. This leads to FSM $U_2$ (Figure 3). If condition (7) is violated, then the EMBer is implemented as a network of EMBs. This makes sense as long as the following conditions hold: If condition (8) is violated, then methods of structural decomposition [16,17] can be used to diminish the values of $L(f_i)$.
As a rule, the method of replacement of input variables is used [1,10]. In this case, the variables $x_l \in X$ are replaced by variables $p_g \in P = \{p_1, \ldots, p_G\}$. In many practical cases, $G \le 3$ [2]. Our analysis of standard benchmarks [16] supports this statement. In this case, the following three systems of Boolean functions (SBFs) represent the FSM circuit:

$$P = P(T, X); \tag{10}$$

$$\Phi = \Phi(T, P); \tag{11}$$

$$Y = Y(T, P). \tag{12}$$

As a rule, system (10) is implemented by LUTs [10,21]. Systems (11) and (12) are implemented by EMBs. This leads to Mealy FSM $U_3$ (Figure 4). To find system (10), it is necessary: (1) to construct the set $P$; (2) to execute the replacement $X \to P$; (3) to encode the states and (4) to construct the table of LUTerP. To find systems (11) and (12), the initial DST of $U_1$ must be transformed. The transformation is reduced to: (1) the replacement of $x_l \in X$ by $p_g \in P$ and (2) the replacement of the column $X_h$ by the column $P_h$.
Let us use the symbol U i (S j ) to show that the model U i is used to synthesize an FSM circuit starting from the STT of FSM S j . Let us find the system (10) for FSM U 3 (S 1 ).
As follows from Table 1, there are transitions that depend on a single variable $x_l \in X$ or on two variables. Therefore, $G = 2$, which gives $P = \{p_1, p_2\}$. There are $M = 12$ states, so $R = 4$. Let us encode the states of $S_1$ in the trivial way: $K(a_1) = 0000, \ldots, K(a_{12}) = 1011$. The replacement $X \to P$ is represented by Table 2. It is constructed using the rules from [1].
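The value of $G$ can be obtained mechanically once the inputs tested in each state are known. The sketch below assumes the usual rule that $G$ equals the largest number of distinct input variables examined in the transitions from any single state; the article states only the outcome for Table 1, so this rule, the function name and the sample data are assumptions, and the data are not the rows of Table 1.

```python
def replacement_variable_count(transitions: dict[str, list[set[str]]]) -> int:
    """G: largest number of distinct inputs tested in transitions from one state.

    transitions maps a state name to the input-variable sets of its X_h conjunctions.
    """
    return max(
        (len(set().union(*conditions)) for conditions in transitions.values()),
        default=0,
    )


# Hypothetical STT fragment (not Table 1).
sample = {
    "a1": [{"x1"}, {"x1"}],                      # one variable tested
    "a2": [{"x2", "x3"}, {"x2"}, {"x2", "x3"}],  # two variables tested
    "a3": [set()],                               # unconditional transition
}
print(replacement_variable_count(sample))  # 2
```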
After minimizing, we can find the following equations: Obviously, a proper state assignment could diminish the number of arguments in functions (10). These methods are discussed in [1,10].
Let the following condition take place: In this case, a single EMB is enough to implement the circuit of the EMBer of FSM $U_3$. There are other methods of structural decomposition [10]. For example, there are such methods as: (1) the encoding of collections of output functions; (2) the encoding of terms of the DST; (3) the transformation of object codes. In this article, we discuss the use of term encoding in EMB-based Mealy FSMs. This method was used in FSMs implemented with programmable logic arrays [1]. It has never been used in FPGA-based design.
Let us explain this approach. Let us encode a term $F_h$ by a binary code $K(F_h)$ with $R_H$ bits, where

$$R_H = \lceil \log_2 H \rceil. \tag{15}$$

Let us use variables $z_r \in Z$ for the encoding, where $|Z| = R_H$. Let us construct the following SBFs:

$$Z = Z(T, X); \tag{16}$$

$$\Phi = \Phi(Z); \tag{17}$$

$$Y = Y(Z). \tag{18}$$

Let condition (19) hold, and let condition (7) be violated. In this case, we propose the FSM $U_4$ (Figure 5). In this FSM, the EMB implements system (16), the LUTerΦ implements system (17) and the LUTerY implements system (18). Let the following condition hold:

$$R_H \le S_L. \tag{20}$$

In this case, there are $R + N$ LUTs in the FSM circuit, and both LUTers have only a single level of LUTs. However, if condition (20) is violated, functional decomposition must be applied to functions (17) and (18). In this article, we discuss the case when condition (20) is violated. We also impose an additional condition: only a single EMB can be used. This restriction can arise when the other EMBs are taken for implementing other parts of a digital system.
As a rule, it is very important to choose state codes that minimize the values of $L(f_i)$ [8]. There are many methods of state assignment targeting FPGA-based design [17][18][19][20][21][24,25]. There is an opinion that JEDI [8] is the best of them [4]. In the case of $U_4$, however, state codes have no influence on the hardware amount. Therefore, we do not analyze state assignment methods in this article.

Main Idea of Proposed Method
Let a Mealy FSM be represented by an STT with $H$ rows. Let only a single EMB be available to implement the FSM circuit. Let the FPGA chip have LUTs with $S_L$ inputs. Let the terms $F_h$ ($h \in \{1, \ldots, H\}$) form a set $F = \{F_1, \ldots, F_H\}$. Let us use the encoding of terms $F_h \in F$ to reduce the number of LUTs in the FSM circuit.
Let us find the value of $K$ for the given STT and value of $S_L$, where

$$K = \left\lceil \frac{H}{2^{S_L}} \right\rceil. \tag{21}$$

Let us discuss the case when $K > 1$. It means that $R_H > S_L$. Therefore, both LUTerΦ and LUTerY of $U_4$ are represented by multi-level circuits.
In this article, we propose a method that allows: (1) diminishing the number of LUTs in comparison with the equivalent FSM $U_4$ and (2) regularizing the interconnections. The method is based on dividing the initial STT into $K$ sub-tables with up to $2^{S_L}$ rows each. Let us illustrate this method using the STT of $S_1$ (Table 1).
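For the running example, the number of sub-tables follows directly from the number of STT rows and the LUT input count. The Python sketch below reproduces this arithmetic for $H = 20$ and $S_L = 3$, the values used later in the article; the function names are illustrative.

```python
def term_code_width(num_terms: int) -> int:
    """R_H: number of bits needed to give each of the H terms a unique code."""
    return max(1, (num_terms - 1).bit_length())


def class_count(num_terms: int, lut_inputs: int) -> int:
    """K: number of sub-tables with at most 2**S_L rows each."""
    return -(-num_terms // 2 ** lut_inputs)  # ceiling division


H, S_L = 20, 3
print(term_code_width(H))        # 5, which exceeds S_L = 3
print(class_count(H, S_L))       # 3
```

Since $R_H = 5 > S_L = 3$, a single-level implementation of the $U_4$ LUTers is not possible for these values, which is the situation the proposed method addresses.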
Let us use an EMB such that condition (7) is violated for $S_1$. Let the EMB have the configuration $\langle S_A, t_F \rangle$ such that conditions (22) and (23) are true. Condition (22) shows that a single EMB is enough to implement SBF (16). Condition (23) shows that it is not possible to implement the FSM circuit using a single EMB.
Let us find a partition $\Pi_F = \{F^1, \ldots, F^K\}$ of the set $F$ such that the following condition holds:

$$|F^k| \le 2^{S_L} \quad (k \in \{1, \ldots, K\}). \tag{24}$$

Let there be $H_k$ elements in the set $F^k$. The value of $R_k$ is determined as

$$R_k = \lceil \log_2 H_k \rceil. \tag{25}$$

Each class $F^k \in \Pi_F$ determines sets $Y^k \subseteq Y$ and $A^k \subseteq A$. The set $A^k$ includes the states of transition written in the rows of the STT corresponding to the class $F^k \in \Pi_F$. The set $Y^k$ includes the output functions written in the same rows. Let us find a partition $\Pi_F$ satisfying relations (26) and (27). In (26) and (27), $i \ne j$ and $i, j \in \{1, \ldots, K\}$.
Let us encode the term $F_h \in F^k$ by a binary code $C(F_h)$ with $R_k$ bits. Let us use variables $z_r \in Z$ for the encoding. These variables are the same for all classes $F^k \in \Pi_F$. To distinguish the classes, let us encode the classes $F^k \in \Pi_F$ by binary codes $C(F^k)$ with $R_C = \lceil \log_2 K \rceil$ bits. Let us use the variables $v_r \in V$ to encode the classes, where $|V| = R_C$. Now, the code $K(F_h)$ is represented as $K(F_h) = C(F^k) * C(F_h)$, where $*$ is a sign of concatenation. Of course there is Let the following condition take place: In this case, some functions $D_r \in \Phi$ and $y_n \in Y$ can be implemented by the EMB. Let them form sets $\Phi_E$ and $Y_E$, respectively. LUTs must then be used for implementing the remaining functions.
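The two-part structure of a term code can be sketched in Python as follows. The sketch assumes that class and term codes are simply the binary representations of zero-based indices; in the article the actual codes are chosen to minimize the number of literals, so the index-based codes and the function names here are purely illustrative.

```python
def class_code_width(num_classes: int) -> int:
    """R_C: number of bits needed to distinguish K classes."""
    return max(1, (num_classes - 1).bit_length())


def full_term_code(class_index: int, term_index: int, r_c: int, r_k: int) -> str:
    """K(F_h) as the concatenation of a class code and a code inside the class."""
    return format(class_index, f"0{r_c}b") + format(term_index, f"0{r_k}b")


r_c = class_code_width(3)                  # K = 3 classes
print(r_c)                                 # 2
print(full_term_code(1, 5, r_c, 3))        # 01101
```

The first $R_C$ bits correspond to the variables $v_r \in V$ and the remaining bits to the variables $z_r \in Z$, which are shared by all classes.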
Using these preliminaries, we propose the model of Mealy FSM $U_5$ (Figure 6). In FSM $U_5$, the EMB generates functions (16) and the following SBFs: The LUTer$k$ ($k \in \{1, \ldots, K\}$) generates functions: The LUTerΦY implements functions $D_r \in \Phi_L$ and $y_n \in Y_L$ where In (36) and (37), the superscript $k$ means that the corresponding function is generated by LUTer$k$. The symbol $C_{rk}$ ($C_{nk}$) denotes a Boolean variable equal to 1 if and only if $D_r \in \Phi_L^k$ ($y_n \in Y_L^k$). Also, functions $D_r \in \Phi_E$ enter LUTerΦY. Each function requires a flip-flop, so it uses a single LUT. The symbol $V_k$ stands for the conjunction corresponding to $C(F^k)$:

$$V_k = \bigwedge_{r=1}^{R_C} v_r^{l_{kr}}. \tag{38}$$

In (38), $l_{kr}$ is the value of the $r$-th bit of $C(F^k)$. If relations (26) and (27) hold, the number of LUTs in LUTer1 to LUTer$K$ is minimized.
Assuming that a Mealy FSM $S$ is represented by an STT, we propose the following design method for FSM $U_5$: 1. Creating the partition $\Pi_F$ corresponding to (26) and (27). The number of LUTs in $U_5$ is mostly determined by the partition $\Pi_F$. Let us discuss how to find the partition $\Pi_F$.
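The role of LUTerΦY can be illustrated by a small Python sketch. It assumes that each final function is the disjunction, over the classes that produce it, of the class-select conjunction $V_k$ and the partial function generated by LUTer$k$; the exact form of (36) and (37) is not reproduced here, so this structure, the class codes and the function names are assumptions.

```python
def class_select(v_bits: tuple[int, ...], class_code: tuple[int, ...]) -> int:
    """V_k: equals 1 only when the class variables v_r match the code C(F^k)."""
    return int(tuple(v_bits) == tuple(class_code))


def combine(v_bits, partial_values: dict[int, int], class_codes: dict[int, tuple]) -> int:
    """OR over classes k of (V_k AND value of the partial function from LUTer k)."""
    return int(any(
        class_select(v_bits, class_codes[k]) and value
        for k, value in partial_values.items()
    ))


# Hypothetical class codes for K = 3 and a function produced by classes 2 and 3.
codes = {1: (0, 0), 2: (0, 1), 3: (1, 0)}
partials = {2: 1, 3: 1}
print(combine((0, 1), partials, codes))  # 1: class 2 is active and its partial is 1
print(combine((0, 0), partials, codes))  # 0: class 1 does not produce this function
```

Because at most one $V_k$ equals 1 at a time, only the partial function of the active class contributes to the result.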

Constructing Partition of the Set of Terms
The problem is formulated as follows. It is necessary to find the partition $\Pi_F$ with $K$ blocks such that relations (26) and (27) hold. The value of $K$ is determined by (21).
In this article, we propose a simple sequential algorithm for solving this problem. We characterize each term $F_h \in F$ by two sets. The set $Y(F_h) \subseteq Y$ includes the output functions written in the $h$-th row of the STT. The set $A(F_h) \subset A$ includes the state of transition $a_s \in A$ from the $h$-th row of the STT. If $F_h \in F^k$, then $y_n \in Y^k$ for each $y_n \in Y(F_h)$ and $a_s \in A^k$. Of course, the set $\Phi^k$ is determined by the codes $K(a_s)$ of states $a_s \in A^k$.
We use two evaluations in this algorithm. The evaluation $N(F_h, Y^k)$ determines how many new output functions will be added to $Y^k$ by including $F_h$ in $F^k$. We determine these evaluations as the following: There are $\Delta_Z$ insignificant assignments of variables $z_r \in Z$:

$$\Delta_Z = K \cdot 2^{S_L} - H. \tag{41}$$

They can be used for minimizing functions (34) and (35). We propose to distribute the terms evenly among the $K$ groups. This corresponds to the vector $\Delta = \langle \Delta_1, \Delta_2, \ldots, \Delta_K \rangle$. Therefore, each class $F^k \in \Pi_F$ includes $H_k$ elements, where

$$H_k = 2^{S_L} - \Delta_k. \tag{42}$$

There are two stages in generating each block $F^k \in \Pi_F$. Let $k - 1$ blocks be constructed. At the first stage, we choose the basic element (BE) $F_h \in F^*$, where $F^* = F \setminus (F^1 \cup \ldots \cup F^{k-1})$. The term $F_h$ is a BE of $F^k$ if it satisfies the following relation: If the condition (43) is true for terms $F_i$ and $F_j$, then we choose the term $F_j$ where $i < j$.
The second stage has $H_k - 1$ steps. At each step, we choose the next element of $F^k$. To do this, we use the following approach. Let us form a set P( If more than a single term satisfies (44), then we choose the term with the following property: If there are several terms with the property (45), we choose the term with the smallest value of $h$. Next, we set $P(F^k) = \emptyset$ and eliminate the term $F_h$ from $F^*$.
The construction of $F^k$ is terminated if: (1) all terms are already distributed ($F^* = \emptyset$) or (2) there are $H_k$ elements in $F^k \in \Pi_F$. Let us discuss an example of creating the partition $\Pi_F$ for Mealy FSM $S_1$. Let $S_L = 3$. Using (21) gives $K = 3$. Using (41) gives $\Delta_Z = 24 - 20 = 4$. Let us form the vector $\Delta = \langle 2, 1, 1 \rangle$. It gives $H_1 = 6$ and $H_2 = H_3 = 7$. The process is shown in Table 3.
Let us explain the columns of Table 3. The terms $F_h$ are listed in the column $h$. The column $N(F_h)$ contains the numbers of output functions in the terms $F_h$. The basic elements of $F^1$ and $F^2$ are shown in columns BE1 and BE2, respectively. The symbol I stands for (39), the symbol II for (40). The sign ⊕ means that a particular term is chosen as a basic element. The sign "-" means that $F_h \notin F^*$. The sign "+" means that the corresponding term is included into the class $F^k$. The terms $F_h \in F^k$ are listed in the row $F^k$ in the order of their selection. The output functions $y_n \in Y^k$ are listed in the row $Y^k$, the states $a_s \in A^k$ in the row $A^k$. We determine the evaluation (40) only for terms with equal values of (39).
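The class sizes used in this example can be reproduced with a few lines of Python. The sketch assumes that each class holds $2^{S_L}$ code combinations minus the unused combinations assigned to it by the vector $\Delta$; the function names are illustrative.

```python
def unused_codes(num_terms: int, lut_inputs: int, num_classes: int) -> int:
    """Delta_Z: code combinations left over after all H terms are encoded."""
    return num_classes * 2 ** lut_inputs - num_terms


def class_sizes(lut_inputs: int, delta: list[int]) -> list[int]:
    """H_k for each class, assuming H_k = 2**S_L - Delta_k."""
    return [2 ** lut_inputs - d for d in delta]


H, S_L, K = 20, 3, 3
print(unused_codes(H, S_L, K))        # 4
sizes = class_sizes(S_L, [2, 1, 1])
print(sizes, sum(sizes))              # [6, 7, 7] 20
```

The sizes sum to $H = 20$, so every term is placed in exactly one class.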
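The two-stage procedure can be sketched in Python as a greedy algorithm. Several details are assumptions, because the selection relations are not reproduced above: the basic element is taken to be the remaining term with the most output functions, the next element is the one adding the fewest new outputs to the class, ties are broken first by whether a new state of transition is added and then by the smaller $h$. The data in the example are hypothetical and are not the rows of Table 1.

```python
def partition_terms(terms: dict[int, tuple[set[str], str]], sizes: list[int]) -> list[list[int]]:
    """Greedy sequential partition of terms into classes of the given sizes.

    terms maps h to (set of output functions, state of transition).
    """
    remaining = dict(terms)
    classes = []
    for size in sizes:
        if not remaining:
            break
        # Stage 1 (assumed rule): basic element has the most outputs, then smaller h.
        be = min(remaining, key=lambda h: (-len(remaining[h][0]), h))
        outputs, states = set(remaining[be][0]), {remaining[be][1]}
        members = [be]
        del remaining[be]
        # Stage 2: add terms bringing the fewest new outputs, then fewest new states.
        while len(members) < size and remaining:
            nxt = min(
                remaining,
                key=lambda h: (len(remaining[h][0] - outputs), remaining[h][1] not in states, h),
            )
            outputs |= remaining[nxt][0]
            states.add(remaining[nxt][1])
            members.append(nxt)
            del remaining[nxt]
        classes.append(members)
    return classes


# Hypothetical terms: h -> (outputs, state of transition).
demo = {
    1: ({"y1", "y2"}, "a2"),
    2: ({"y1"}, "a2"),
    3: ({"y3"}, "a3"),
    4: ({"y3", "y4"}, "a3"),
    5: ({"y2"}, "a4"),
}
print(partition_terms(demo, [3, 2]))  # [[1, 2, 5], [4, 3]]
```

Terms are listed in each class in the order of their selection, mirroring the way the rows $F^k$ of Table 3 are described.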

Example of Synthesis
In Section 5, we found the partition $\Pi_F$ for the discussed example. Let us use an EMB with the configuration $\langle 11, 7 \rangle$. Therefore, $S_A = 11$ and $t_F = 7$. For FSM $S_1$, $L + R = 11$, so condition (22) holds. There are $H = 20$ terms and $S_L = 3$. Using (21) gives $K = 3$, so $R_C = 2$ and $V = \{v_1, v_2\}$. Obviously, $R_1 = R_2 = R_3 = 3$. Also, $R_H = 5$. Because $N + R = 15$, condition (23) holds. Therefore, it is possible to use the model $U_5$ for FSM $S_1$. Let us design the FSM $U_5(S_1)$.
Let us execute a state assignment that reduces the numbers of elements in the sets $\Phi^k \subseteq \Phi$. One of the possible solutions is shown in Figure 7.
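The parameter checks of this paragraph can be collected in a short Python sketch. The readings of conditions (22) and (23) used here (the address width must cover $L + R$, and the output width is smaller than $N + R$) are assumptions inferred from the numbers quoted in the text, as the conditions themselves are not reproduced above.

```python
# Parameters of FSM S_1 and of the EMB configuration, as quoted in the text.
L, R, N, H, S_L = 7, 4, 11, 20, 3
S_A, t_F = 11, 7

K = -(-H // 2 ** S_L)                   # ceiling division: number of classes
R_C = max(1, (K - 1).bit_length())      # bits of a class code
R_H = max(1, (H - 1).bit_length())      # bits of a term code
fits_address = L + R <= S_A             # assumed reading of condition (22)
too_narrow = t_F < N + R                # assumed reading of condition (23)
spare_outputs = t_F - R_H               # EMB outputs left after the term code

print(K, R_C, R_H, fits_address, too_narrow, spare_outputs)
# 3 2 5 True True 2
```

The two spare outputs match the value $\Delta_t = 2$ reported later in the example.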
Using Figure 7 and the sets $A^1$ to $A^3$ gives the sets $\Phi^1$ to $\Phi^3$. They are the following: Using Table 1 and the codes from Figure 7, we can construct the direct structure table of FSM $U_5(S_1)$. It is Table 4. To construct the transformed DST, it is necessary to find the codes $C(F_h)$ and $C(F^k)$.

Let us construct the sets $Y_E$, $\Phi_E$, $Y_L$ and $\Phi_L$. To do this, we should find the value of $\Delta_t$. There are $R_H = 5$ and $t_F = 7$. Using (30) gives $\Delta_t = 2$. We should eliminate functions $D_r \in \Phi$ and $y_n \in Y$ which belong to $K$ corresponding sets. In the discussed case, there is $D_2$, Let us construct the systems of Boolean functions showing the dependence of functions $D_r^k \in \Phi_L^k$ and $y_n^k \in Y_L^k$ on the terms $F_h \in F^k$ ($k \in \{1, \ldots, K\}$). To do this, we use the DST (Table 4) and the classes $F^k \in \Pi_F$. We could find the following systems: Let us encode the terms $F_h \in F^k$ in such a manner that there is a minimum number of literals in systems (46) to (48). We could get the codes shown in Figure 8.

Using the system (46) and the Karnaugh map (Figure 8a), we could form the following system: The system (49) represents the circuit of LUTer1. It includes 4 LUTs and has 9 interconnections with the EMB.
Using the system (47) and Karnaugh map (Figure 8b), we could form the following system: The system (50) represents the circuit of LUTer2. It includes 4 LUTs and has 9 interconnections with the EMB.
Using the system (48) and Karnaugh map (Figure 8c), we could form the following system: The system (51) represents the circuit of LUTer3. It includes 5 LUTs and has 10 interconnections with the EMB.
As follows from the system (52), it is necessary to transform the equations for D 1 and D 3 . But we can avoid it using the following approach. There is L(D 2 1 ) = 2. Let us multiply it by v 2 . It gives D 1 = v 2 (z 2 ∨ z 2z3 ). Now, we could represent D 1 as D 1 = D 2 1 ∨ v 1 D 3 1 with L(D 1 ) = 3. So, a single LUT is now enough for implementing the function D 1 . The same could be done for y 1 . But it is necessary to apply the rules of functional decomposition for functions D 3 and y 9 . For example, there are two LUTs in the circuit for D 3 (Figure 9).

Figure 9. Implementing the function D 3 .

The equation D 3 is represented as
The equation for y 9 will be the following: f 2 ∨ v 1 y 3 9 . Here f 2 =v 1 v 2 y 1 9 . Therefore, there are two LUTs in the circuit of y 9 . To find the systems (16) and (31), it is necessary to transform the DST of Mealy FSM U 5 . To transform the DST, it is necessary to delete the columns a s , K(a s ), Y h and Φ h . They are replaced by the following columns: C(F k ), C(F h ), V h , Z h , Y Eh and Φ Eh . The column V h includes the variables v r ∈ V equal to 1 in the code C(F k ) from the h-th row of the transformed DST. The column Z h includes the variables z r ∈ Z equal to 1 in the code C(F h ) of the term F h (h ∈ {1, . . . , H}). The column Y Eh includes the functions y n ∈ Y E generated during the h-th transition of the FSM. The column Φ Eh contains the variables D r ∈ Φ E equal to 1 in the h-th row of the initial DST. In the discussed case, there is Y E = ∅. So, the column Y Eh is absent in the transformed table of Mealy FSM U 5 (S 1 ) (Table 5).
To implement the functions Z(T, X), V(T, X), Φ E (T, X) and Y E (T, X), it is necessary to construct the table of the EMB. It contains the following columns: K(a m ), X, Z, V, Φ E , Y E , q. The addresses of cells are determined by concatenations of K(a m ) and X. The table includes H E rows:
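The addressing scheme of the EMB table can be sketched in Python. The sketch assumes that a cell address is the state code followed by the values of all input variables, read as one binary number, so that the table has one row per combination of these bits; this row count and the function names are assumptions, since the formula for the number of rows is not reproduced above.

```python
def emb_address(state_code: str, input_values: str) -> int:
    """Cell address: K(a_m) concatenated with the values of x_1..x_L."""
    return int(state_code + input_values, 2)


def emb_rows(num_inputs: int, num_state_bits: int) -> int:
    """Number of cells addressed by L + R address bits (assumed table size)."""
    return 2 ** (num_inputs + num_state_bits)


print(emb_address("0000", "0000000"))   # 0
print(emb_address("1011", "1111111"))   # 1535
print(emb_rows(7, 4))                   # 2048
```

With $L = 7$ and $R = 4$ the address has 11 bits, which matches the address width $S_A = 11$ of the EMB configuration used in the example.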

Experimental Results
To investigate the efficiency of proposed method, we use standard benchmarks from the library [11]. The library includes 48 benchmarks taken from the design practice. The benchmarks are rather simple, but they are very often used by different studies to compare new and known results [26]. The benchmarks are represented in KISS2 format. The characteristics of benchmarks are shown in Table 7.
We used our CAD tool K2F [26] to translate KISS2-based files into VHDL-based FSM models. Next, the Active-HDL environment was used to synthesize and simulate the FSMs. To obtain the FSM circuits, we used the Xilinx CAD tool Vivado [27]. The Virtex-7 FPGA chip XC7VX690TFFG1761-2 [28] was used as the target platform. The chip includes LUTs with 6 inputs and EMBs with configurations from $\langle 15, 1 \rangle$ to $\langle 9, 64 \rangle$. We presume that only a single EMB is available to implement an FSM circuit. As follows from Table 7, condition (7) holds for 33 benchmark FSMs (about 69% of all benchmarks). Therefore, only a single EMB is necessary to implement an FSM circuit for these benchmarks. We mark this situation by the sign "+" in the column "EMB" of Table 7. We also show in this column the pairs $\langle S_A, t_F \rangle$ corresponding to the configuration required to implement the circuit with a single EMB. The further research was conducted for the remaining 15 benchmarks, for which condition (7) is violated.
Three discussed methods (U 1 , U 3 and U 4 ) were taken to compare with our approach (U 5 ). The results are shown in Table 8 (the number of LUTs in FSM circuits), Table 9 (the operating frequency) and Table 10 (the consumed energy). To design FSM U 1 , a single EMB was used to implement a part of FSM circuit. We do not know which part of a circuit was implemented as an EMB. It is up to Vivado and cannot be directly specified by a designer.
Tables 8-10 are organized in the same way. The rows are marked by the names of benchmarks, the columns by design methods. The rows "Total" contain the sums of the values from the corresponding columns. The summarized characteristics of $U_5$-based FSMs are taken as 100%. The rows "Percentage" show the summarized characteristics as a percentage relative to the $U_5$-based FSMs. To design all circuits, we use the AUTO mode of Vivado.
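The "Percentage" row can be computed from the "Total" row as in the following Python sketch. The totals used here are placeholders chosen only to show the calculation; they are not values from Tables 8-10, and the function name is illustrative.

```python
def percentage_row(totals: dict[str, float], reference: str = "U5") -> dict[str, float]:
    """Express each column total as a percentage of the reference column total."""
    ref = totals[reference]
    return {method: round(100 * value / ref, 1) for method, value in totals.items()}


# Placeholder totals, NOT taken from the article's tables.
placeholder_totals = {"U1": 150, "U3": 120, "U4": 200, "U5": 100}
print(percentage_row(placeholder_totals))
# {'U1': 150.0, 'U3': 120.0, 'U4': 200.0, 'U5': 100.0}
```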
As follows from Table 8, the $U_5$-based FSMs require fewer LUTs than their counterparts based on other FSM models. The savings are: (1) 23% relative to $U_1$; (2) 4% relative to $U_3$; (3) 45% relative to $U_4$. Therefore, for these benchmarks the $U_4$-based FSMs require the largest number of LUTs. This is because condition (20) is violated for all considered $U_4$-based benchmarks, which results in multi-level circuits implementing functions (17) and (18).

As follows from Table 9, the $U_5$-based FSMs have the highest operating frequency among the investigated FSMs. We think that this is due to the smaller number of logic levels and inter-level connections. However, we cannot prove this statement because Vivado does not show these details about the implemented circuits. The gain in operating frequency is: (1) 32.6% relative to $U_1$; (2) 44.3% relative to $U_3$; (3) 27.8% relative to $U_4$. The lowest frequency is observed for $U_3$-based FSMs. This is connected with the rather large number of inputs. Because $L + R \gg S_L$, the circuit of LUTerP is multi-level. For the discussed benchmarks, the number of logic levels in $U_3$-based FSMs is higher than for FSMs produced by the other investigated methods.
As follows from Table 10, the $U_5$-based FSMs consume less energy than their counterparts based on other FSM models. The savings are: (1) 39.8% relative to $U_1$; (2) 51% relative to $U_3$; (3) 11.4% relative to $U_4$. This is because $U_5$-based FSM circuits have fewer LUTs and, therefore, fewer interconnections than the other investigated FSMs. Interconnections are known to be responsible for up to 70% of energy losses in FPGA-based circuits [26]. The results shown in Table 10 give the total power in Watts. The total power consists of individual components such as static power, I/O, signals, LUT as Logic, F7/F8 Muxes, BUFG, registers and others. Furthermore, the frequency has a very strong impact on the power consumption. Therefore, our approach produces better results for FSMs whose circuits cannot be implemented as a single EMB. Of course, this conclusion is true only for the benchmarks [11] and the device XC7VX690TFFG1761-2. It is almost impossible to draw a similar conclusion for the general case.

Conclusions
Contemporary FPGA devices include a lot of look-up table elements. This allows the implementation of a very complex digital system using only a single chip. But LUTs have a rather small number of inputs ($S_L$ does not exceed 6). This value is considered to be optimal [6]. Such a limitation leads to multi-level circuits representing, for example, sequential blocks of digital systems. To design multi-level circuits, methods of functional decomposition are used. But these blocks can also be synthesized using different methods of structural decomposition. As our studies [26] show, structural decomposition can lead to FSM circuits with better characteristics than their counterparts based on functional decomposition.
The aim of this article is to present a novel method of logic synthesis targeting Mealy FSMs implemented with LUTs and a configurable EMB. It is a method of structural decomposition based on encoding the product terms of the Boolean functions representing FSM logic circuits. The essence of our approach is splitting the set of terms in a way that minimizes the number of LUTs in FSM circuits. The proposed method is technology dependent because it takes into account the number of inputs of LUT elements.
The experiments conducted using the Xilinx CAD tool Vivado 2019.1 show that the proposed approach reduces the number of LUTs, the propagation time and the consumed energy in comparison with FSM circuits based on known methods of term encoding.
There are three directions in our future research. The first is connected with developing design methods targeting FPGA chips of Intel (Altera). The second direction is connected with using our approach in real devices such as a PDMS micro-optofluidic chip [29,30]. The last direction targets sequential blocks represented by Moore FSMs.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: