High speed wide fan-in designs using clock controlled dual keeper domino logic circuits

A. Anita Angeline | V. S. Kanchana Bhaaskaran

School of Electronics Engineering, VIT University, Chennai, India.

Correspondence
V. S. Kanchana Bhaaskaran, School of Electronics Engineering, VIT University, Chennai, India. Email: vskanchana@ieee.org

Clock Controlled Dual Keeper Domino logic structures (CCDD_1 and CCDD_2) for achieving high-speed performance with low power consumption and a good noise margin are proposed in this paper. The keeper control circuit comprises an additional PMOS keeper transistor controlled by the clock and foot node voltage. This control mechanism offers abrupt conditional control of the keeper circuit and reduces the contention current, leading to high-speed performance. The keeper transistor arrangement also reduces the loop gain associated with the feedback circuitry. Hence, the circuits offer less delay variability. The design and simulation of various wide fan-in designs using 180 nm CMOS technology validate the proposed CCDD_1 and CCDD_2 designs, which offer an increased speed performance of 7.2% and 8.5%, respectively, over a conventional domino logic structure. The noise gain margin analysis shows good robustness of the CCDD structures when compared with a conventional domino logic circuit configuration. A Monte Carlo simulation for 2,000 runs under statistical process variations demonstrates that the proposed CCDD circuits offer a significantly reduced delay variability factor.

KEYWORDS
dynamic circuits, domino logic, leakage power in dynamic circuits, contention current, noise gain margin, process variability

1 | INTRODUCTION

In the deep sub-micron (DSM) regime, where lower power consumption and high-speed performance are sought, the domino logic style has had a profound impact among circuit design styles [1]. The speed advantage of the domino logic style is realized through the use of a single pre-charge p-type metal oxide semiconductor (PMOS) transistor as the pull-up network (PUN), and n-type metal oxide semiconductor (NMOS) evaluation transistors in the pull-down network (PDN). The keeper transistor and a static complementary metal oxide semiconductor (CMOS) inverter form the additional functional circuit components. The keeper transistor in the domino logic circuit is used to replenish the charge lost at the dynamic node. This charge degradation is predominantly due to noise, charge sharing among the neighboring nodes, the leakage current, and power and ground noise [1,2]. Upsizing the keeper transistor helps in retaining the charge for a longer time even in the presence of internal and external noise. However, this results in an increased contention current because the keeper transistor tries to hold the dynamic node at logic HIGH even as the PDN tries to discharge it. This degrades the evaluation speed. The static inverter at the output node of a domino logic circuit provides a non-inverted output with increased drive strength [3,4].

With the evolution of lower technology nodes operating at reduced supply and threshold voltages, the leakage current components have become major impeding factors. The sub-threshold leakage current $I_{\text{sub}}$ and the gate oxide leakage current $I_{\text{G}}$ have been identified as the major components of the leakage current in domino logic styles [2,3]. These components make a dynamic logic circuit highly sensitive, such that even a small input noise decreases the robustness of the circuit [4]. This issue is more dominant in wide fan-in domino circuits employed in tag comparators, multiplexers, register files, SRAM pre-decoder gates, and programmable encoders, among other devices. Such systems incur an exorbitant leakage power owing to the multiple leakage paths available to the ground. It should also be noted that the states of the clock and the input combinations have a profound impact on the leakage mechanisms of the transistors and on the robustness, even when striving for an increase in speed [5–7]. A PDN modification helps offer a higher speed of operation. Furthermore, reducing the switching activity at the output node caused by the pre-charge and evaluation operations decreases the dynamic power consumption.

Studies seeking solutions to issues related to an increased leakage current, the upsizing of the keeper transistor, the consequent contention current, an increased noise margin and resultant high delay, and charge sharing are on the rise, and various counteractive measures have been adopted [1–4]. Modifications in the keeper circuitry [8–16], the pull down structure (PDN) [3], or a combination of the two [15–19], have been presented in the literature. Static switching mechanisms have also been employed in domino logic circuits to reduce the transitions at the output node. This reduces the dynamic power dissipation and hence the total power consumption [20–23]. The modification of a domino logic circuit aims at improving the robustness and speed performance of the circuit [24–29].

The proposed Clock Controlled Dual Keeper Domino logic structures (CCDD_1 and CCDD_2) comprise a modified keeper circuit enabled by a delayed strobing signal from the footer transistor circuit. The footer transistor is operated after a delay introduced by a set of inverters, during which time the footer node accumulates the dc voltage $V_{\text{foot}}$. The keeper control circuit is operated with input from the clock and the strobing voltage $V_{\text{foot}}$ in both the proposed CCDD_1 and CCDD_2 designs, which were designed to offer abrupt control of the keeper circuit. This in effect reduces the contention between the keeper device and the PDN at the commencement of the evaluation phase. In addition, the stacked keeper transistors and delayed enabling of the footer transistor prevent any direct discharge from the dynamic node, which reduces the leakage power dissipation and additionally facilitates good noise robustness. A reduced loop gain, and hence a reduced delay variability, is also realized.

The validation of the proposed circuits is carried out through simulation using wide fan-in gates such as a multiplexer and tag comparator, which remain critical modules in the data paths of processors. To identify the optimal leakage state for the proposed circuits, a leakage current analysis was conducted for different input combinations. The statistical variation of the process parameters significantly affects both the delay and the delay deviations for lower technology nodes [30]. Hence, Monte Carlo simulations were performed to evaluate the delay variability for 2,000 runs on the CCDD_1 and CCDD_2 structures.

The remainder of this paper is organized as follows. Section 2 provides a review of previous studies, with a focus on existing domino logic styles. Section 3 describes the proposed circuit architecture and design methodology. Section 4 focuses on wide fan-in design applications using the proposed logic, and elaborates on the simulation results and a comparative performance analysis of the clock controlled dual keeper domino logic circuits against existing domino logic styles. Section 5 provides some concluding remarks regarding this research.

2 | CONVENTIONAL DOMINO LOGIC

The PUN of a conventional domino logic style consists of a single pre-charge transistor $M_{\text{pre}}$ controlled by the clock signal. The PDN consists of evaluation transistors. Traditionally, two variants are available: (a) a structure with the footer transistor, as shown in Figure 1A, and (b) a footer-less structure, as shown in Figure 1B. Footer-less domino logic offers a high speed, whereas footed domino logic offers a reduced leakage power [1,2]. The dynamic node is fed to the static inverter, which in effect yields a non-inverted output. The static CMOS inverter drives the successive stages more efficiently with good driving strength. Both domino logic structures comprise a PMOS keeper transistor ($M_{\text{K}}$) controlled by the static inverter output, OUT.

To discuss the operation in brief, consider Figure 1A. During the pre-charge phase, when the clock is LOW, the transistor $M_{\text{pre}}$ is ON, and the dynamic node is pre-charged to the supply voltage $V_{\text{DD}}$. This causes the static CMOS inverter output, OUT, to become LOW. When the clock is HIGH, and IN is applied to the PDN, the evaluation phase commences with the charge on the dynamic node retained or discharged depending on the TRUE/FALSE condition of the PDN.

When the dynamic node needs to retain its HIGH logic state for a longer duration during the evaluation phase, the charge at the dynamic node may tend to be discharged owing to the various leakage current paths available at the node, as well as the charge sharing across nodes [1]. Here, note that the leakage current is primarily due to the sub-threshold and gate oxide leakage of the devices. This is overcome by the PMOS keeper transistor, which conducts while the dynamic node is HIGH (that is, while the output OUT is LOW) and replenishes the charge at the dynamic node. However, when the inputs are TRUE, the PDN attempts to discharge the dynamic node, whereas the keeper transistor tries to retain it at HIGH (contention), which degrades the speed of the evaluation. The upsizing of the keeper transistor improves the robustness, reducing the impact of the leakage current, although at the cost of a reduced speed and increased power consumption.
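The pre-charge and evaluation behavior described above can be captured in a purely logical sketch. This is only an idealized, switch-level illustration (no voltages, delays, or leakage are modeled), and the function names `domino_cycle` and `keeper_contends` are hypothetical names chosen for this example.

```python
def domino_cycle(clk: int, pdn_true: bool, dyn_prev: int) -> tuple[int, int]:
    """Return (dynamic_node, OUT) for one clock phase of a footed domino gate."""
    if clk == 0:
        dyn = 1            # pre-charge: M_pre pulls the dynamic node to V_DD
    elif pdn_true:
        dyn = 0            # evaluation with a conducting PDN: the node discharges
    else:
        dyn = dyn_prev     # evaluation with the PDN off: the keeper holds the level
    out = 1 - dyn          # static inverter gives the non-inverted gate output
    return dyn, out


def keeper_contends(clk: int, pdn_true: bool, dyn_prev: int) -> bool:
    """True when the keeper is still ON while the PDN starts to discharge.

    The PMOS keeper gate is driven by OUT, so it conducts while the dynamic
    node is HIGH (OUT LOW). Contention arises if the PDN turns on in that state.
    """
    keeper_on = dyn_prev == 1
    return clk == 1 and pdn_true and keeper_on


# Pre-charge followed by an evaluation in which the PDN is TRUE.
dyn, out = domino_cycle(clk=0, pdn_true=True, dyn_prev=0)
print("pre-charge:", dyn, out)
print("contention at start of evaluation:", keeper_contends(1, True, dyn))
dyn, out = domino_cycle(clk=1, pdn_true=True, dyn_prev=dyn)
print("evaluation:", dyn, out)
```

The `keeper_contends` check makes the trade-off explicit: in the conventional structure the keeper is always ON at the moment a TRUE evaluation begins, which is the contention that the modified styles try to avoid.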

The various domino logic styles found in the literature focus on one or more of the following:

1. Reduced leakage current
2. Counteracting the leakage current
3. Faster switching at the output node
4. Increased noise margin
5. Low power consumption

These attributes are achieved through the modification or reengineering of the keeper circuit or PDN. The modification of the keeper circuit is focused on improving the robustness and reducing the evaluation delay. This is achieved by enabling the keeper after a small delay during the evaluation phase or through an abrupt control of the keeper transistor, or by having varied strength during the early and later evaluation phases. By modifying the PDN, the noise robustness is increased through a reduction in the leakage current, and a high-speed evaluation is facilitated. A brief introduction of the widely discussed circuit styles, summarized in Table 1, is presented below.

### 2.1 Modification of the keeper circuit

In this type of circuit, the keeper circuit intended for retaining the charge of the dynamic node is abruptly controlled, as detailed below.

#### 2.1.1 High-speed domino logic

The high-speed domino (HSD) logic circuit [8], shown in Figure 2A, comprises a buffer built from appropriately sized static CMOS inverters, and a PMOS device \( M_{P1} \). During the evaluation phase, the keeper transistor \( M_K \) is initially kept OFF for the duration of the delay incurred by the buffer circuit (\( D_1 \) and \( D_2 \)). Thus, at the onset of the evaluation phase, the contention between the PDN and the keeper transistor is reduced. After the delay, the transistor \( M_{N1} \) retains the charge at the dynamic node during the rest of the evaluation phase.

#### 2.1.2 Controlled strong PMOS keeper circuit

As shown in Figure 2B, the controlled strong PMOS keeper circuit logic comprises an additional control circuit to operate the keeper transistor. During the evaluation phase, when any of the inputs is HIGH, the control circuit cuts off the keeper device \( M_K \) and the dynamic node is discharged through the PDN. However, a possible charge sharing constraint with the internal nodes of the control circuit during the evaluation phase exists in this type of logic.

#### 2.1.3 Grounded PMOS keeper technique

In this logic, a conventional keeper transistor is replaced with \( M_{K1} \) and \( M_{K2} \), as shown in Figure 2C, the sum of whose lengths makes up that of a single keeper device, resulting in a reduced loop gain even when maintaining the same aspect ratio as the original keeper device. Hence, the delay variability is reduced.

**TABLE 1** Existing structural modifications of domino logic circuits

<table>
<thead>
<tr>
<th>Domino logic circuit types</th>
<th>Modified keeper circuit</th>
<th>Modified PDN</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-speed domino logic [8]</td>
<td>Delayed enabling of keeper in evaluation phase</td>
<td>–</td>
</tr>
<tr>
<td>Controlled strong PMOS keeper [9]</td>
<td>Additional circuit for keeper control and strong keeper device</td>
<td>–</td>
</tr>
<tr>
<td>Grounded PMOS keeper [12]</td>
<td>A second PMOS keeper with gate grounded</td>
<td>–</td>
</tr>
<tr>
<td>HSCD domino logic [3]</td>
<td>–</td>
<td>1) Delayed enabling of footer 2) Additional NMOS discharge path</td>
</tr>
<tr>
<td>Conditional evaluation domino (CEDL) [3]</td>
<td>–</td>
<td>1) Delayed enabling of footer 2) Additional discharge path</td>
</tr>
</tbody>
</table>

Previous discussions [8,9,12] have indicated the impact of the keeper circuitry on reducing the contention current and creating a high-speed circuit. However, in a high-speed domino logic circuit [8], during the early evaluation phase, the dynamic node is floating, leading to exorbitant power consumption. In the controlled strong PMOS keeper circuit, the generation of a keeper control signal with an additional PDN increases the power consumption and area. It should be noted that, in these domino structures, a high-speed performance is achieved through a silicon area penalty, which may result in increased power consumption.

2.2 | Modification in the PDN

A high-speed clock delayed domino (HSCD) and conditional evaluation domino logic (CEDL) [3], shown in Figure 3A and B, have the footer transistor \( M_{N1} \) enabled after a certain delay. This results in a reduction of the leakage power. The delayed enabling of the footer transistor develops an increased footer node voltage \( V_{\text{foot}} \) at the node \( N_f \). Furthermore, an additional discharge path is provided through \( M_{N2} \), which is controlled by the footer node voltage \( V_{\text{foot}} \) or the output of the design. However, certain size constraints need to be maintained for a correct evaluation, which creates a significant challenge when designing wide fan-in gates.

Thus, it was observed that a precise control of the keeper circuit [9,12], and the incorporation of an additional discharge path, lead to an improvement in speed [3,5]. Retaining the dynamic node at HIGH for a prolonged duration is carried out by upsizing the keeper transistor, and/or by accurately controlling the keeper transistor [9–14]. Trade-offs do exist in the designs in terms of the power consumption, speed, and robustness, which require an enhanced design. Thus, there is an explicit need for low-leakage power dissipation, high-speed, and robust domino circuits, which has led to the design of novel domino logic circuit design topologies.

3 | PROPOSED CLOCK CONTROLLED DUAL KEEPER DOMINO LOGIC CIRCUITS

Clock controlled dual keeper domino logic structures (CCDD_1 and CCDD_2) reduce the contention current, which accounts for the increased delay. The proposed CCDD domino logic circuits offer high-speed operation owing to the modifications listed in Table 2.

The inclusion of an additional keeper transistor and the abrupt conditional control of the keeper transistor facilitate an easy discharge of the dynamic node, and thus a high-speed performance is achieved. The proposed principle of abrupt conditional control of the additional keeper transistor is achieved using two circuit configurations, as shown in Figures 4 and 5. The circuit configurations are discussed in the following sub-sections.

3.1 | Clock controlled dual keeper domino with AND keeper control (CCDD_1)

The proposed CCDD_1 design using AND keeper control is shown in Figure 4. It consists of an additional PMOS keeper
transistor $M_{K2}$ along with $M_{K1}$. The transistor $M_{K2}$ is controlled using the clock and footer node voltage $V_{foot}$.

During the pre-charge phase, with the clock $CLK$ set to LOW, $M_{pre}$ is enabled, and the dynamic node charges to $V_{DD}$. During the evaluation phase, with the clock $CLK$ HIGH, the PDN is set to evaluate. However, a delay of $2 \times T_{P_{inv}}$ is set by the inverters $I_1$ and $I_2$, so that the footer $M_{N1}$ is operated, and the evaluation enabled, only after $2 \times T_{P_{inv}}$. The sizing of both the footer device and the devices in the PDN determines $V_{foot}$, which is set to slightly above $V_{th}$ of the NMOS device.

When this foot node voltage $V_{foot}$ and clock $CLK$ are both HIGH, the output of the AND gate becomes HIGH, placing the additional keeper $M_{K2}$ in its cut-off region of operation. Hence, during the initial $2 \times T_{P_{inv}}$ time of the evaluation phase, the PDN is deactivated and the dynamic node is prevented from discharging. After $2 \times T_{P_{inv}}$, the keeper circuitry consisting of $M_{K1}$ and $M_{K2}$ stops conducting, which prevents contention from occurring.

In contrast, if the PDN is evaluated as FALSE, \(V_{\text{foot}}\) will remain LOW and connected to GND through \(M_{K1}\). To consider the state of \(V_{\text{foot}}\) at the start of the evaluation phase, note that during the evaluation of the preceding cycle, irrespective of the logic state of the PDN, the voltage \(V_{\text{foot}}\) remains LOW. Hence, the node \(V_{\text{foot}}\) never reaches a floating state. Furthermore, in the subsequent cycle, when the PDN happens to be TRUE, \(V_{\text{foot}}\) changes to a voltage above the threshold voltage, as discussed above. Additionally, the input signals are applied from another stage of the dynamic circuits, or in other words, in phase with the evaluation phase (\(CLK\) being HIGH). Hence, \(V_{\text{foot}}\) remains LOW. If the PDN inputs are LOW, \(V_{\text{foot}}\) is LOW, causing \(M_{K2}\) to be ON, thereby retaining the charge at the dynamic node.

**TABLE 2** Structural modifications in clock controlled dual keeper domino circuits

<table>
<thead>
<tr>
<th>Proposed domino styles</th>
<th>Keeper circuit modification</th>
<th>PDN modification</th>
</tr>
</thead>
<tbody>
<tr>
<td>Clock controlled dual keeper domino with AND keeper control (CCDD_1)</td>
<td>Additional keeper control through AND gate with clock and $V_{foot}$</td>
<td>Delayed enabling of footer transistor</td>
</tr>
<tr>
<td>Clock controlled dual keeper domino logic with T-gate keeper control (CCDD_2)</td>
<td>Additional keeper control through transmission gate with clock and $V_{foot}$</td>
<td>Delayed enabling of footer transistor</td>
</tr>
</tbody>
</table>

**FIGURE 3** (A) HSCD domino logic [3] and (B) conditional evaluation domino logic (CEDL) [3]

**FIGURE 4** Clock controlled dual keeper domino logic with AND keeper control (CCDD_1) (Proposal 1)

3.2 | Clock controlled dual keeper domino with T-gate keeper control (CCDD_2)

CCDD_2 realizes the keeper control of device \(M_{K2}\) using \(V_{\text{foot}}\) and the CLK signals applied to TG1. As can be observed, it employs a reduced number of devices, incurring a lower silicon area along with a reduced contamination delay. Furthermore, the transition delay of a two-input NAND gate is given as \(2R_nC\), which is approximately twice the delay of the transmission gate, \(R_nC\), where \(R_n\) is the resistance of the NMOS device and \(C\) is the capacitance from diffusion and routing. It should also be noted that the delay incurred by a NAND gate is edge dependent, whereas that of a T-gate is determined only by the effective parallel resistance of the \(P\) and \(N\) devices forming the T-gate.

**FIGURE 5** Clock controlled domino logic with transmission gate keeper control (CCDD_2) (Proposal 2)

3.3 | CCDD_1 and CCDD_2 modeling

In CCDD structures, the delayed enabling of the footer transistor and the control mechanism employing two keeper transistors \(M_{K1}\) and \(M_{K2}\) play major roles in reducing the delay and the delay variability. The delayed clocking reduces the contention between the keeper transistor and the PDN. In addition, the foot node voltage \(V_{\text{foot}}\) plays a vital role in triggering the AND/T-gate keeper control mechanisms.

However, certain constraints should be applied to ensure that the inclusion of the additional circuit does not impact the delay metrics. The delayed enabling of the footer, that is, the delayed evaluation phase clock at the footer gate, creates a dc voltage \(V_{\text{foot}}\) at the drain of the footer. The value of \(V_{\text{foot}}\) must be at least equal to the threshold voltage \(V_{\text{th,AND}}\) of the NMOS, as given by

\[
V_{\text{foot}} \geq V_{\text{th,AND}}. \tag{1}
\]

The delay \(T_{\text{buf}}\) incurred by the buffer circuit equals the delay incurred by the two inverters, \(2 \times T_{p_{\text{inv}}}\), where \(T_{p_{\text{inv}}}\) is the delay of a single inverter. Here, \(T_{p_{\text{inv}}}\) is inversely proportional to the \(W/L\) ratio of the devices, that is, \(T_{p_{\text{inv}}} \propto 1/(W/L)\). During the delay interval, certain conditions, namely, a PDN of TRUE, \(V_{\text{foot}}\) of HIGH, and \(CLK\) of HIGH, generate a logic HIGH at the output of the AND gate in the keeper control circuit, and force \(M_{K2}\) to be cut off. This prevents the dynamic node from being charged to \(V_{\text{DD}}\) and is the reason for the reduced contention. After a delay equal to \(2 \times T_{p_{\text{inv}}}\), with a PDN of TRUE, the discharge from the dynamic node is facilitated owing to less contention because the keeper circuit is now disabled. The extrinsic capacitance offered by the footer device on the buffer circuit is also considered in determining the delay exerted by the buffer circuit. It should be noted that \(V_{\text{foot}}\) acts as the source biasing for the PDN transistors, which increases the threshold voltage of the PDN transistors owing to the stacking effect. Hence, the upsizing of the footer imparts \(V_{\text{foot}}\) because it is influenced by the decrease in \(V_{\text{th}}\) across the footer. Hence, the appropriate \(W/L\) ratio of the inverters in the buffer circuit and the footer device determines the delay at the beginning of the evaluation phase. It should also be noted that this in effect leads to an increase in the threshold voltage \(V_{\text{th}}\), as indicated in (2), and reduces the leakage current, as shown in (3):

\[
V_{\text{th}} = V_{\text{t0}} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right) \tag{2}
\]

and

\[
I_{\text{SUBTH}} = \mu_N C_{\text{ox}} \left( \frac{W_N}{L_N} \right) V_t^2 \times \left[ e^{\frac{V_{gs} - V_{\text{th}}}{n V_t}} \right] \times \left[ 1 - e^{\frac{-V_{ds}}{V_t}} \right]. \tag{3}
\]

Here, \(V_{\text{t0}}\) is the threshold voltage at a zero bulk bias; \(V_{sb}\) is the source to bulk voltage; \(\gamma\) is the body effect coefficient; \(\phi_s\) is the surface potential; \(\mu_N\) is the electron carrier mobility; \(C_{\text{ox}}\) is the gate capacitance per unit area; \(W_N\) is the channel width; \(L_N\) is the channel length; \(V_{gs}\) and \(V_{ds}\) are the gate to source and drain to source voltages; \(V_t\) is the thermal voltage; and \(V_{\text{th}}\) is the threshold voltage. The parameter \(n\) is the sub-threshold swing coefficient of the transistor, as defined by \(n = 1 + (C_p/C_{ox})\), where \(C_p\) is the depletion channel region capacitance per unit area.
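The effect of a raised foot node on the threshold voltage and the sub-threshold leakage can be illustrated numerically. The sketch below assumes the standard textbook forms of the body-effect expression, \(V_{th} = V_{t0} + \gamma(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s})\), and of the sub-threshold current, \(I = \mu C_{ox}(W/L)V_t^2 e^{(V_{gs}-V_{th})/(nV_t)}(1 - e^{-V_{ds}/V_t})\). All device values (`V_T0`, `GAMMA`, `PHI_S`, `MU_COX`, the \(W/L\) ratio, and `n`) are hypothetical placeholders, not parameters of the 180 nm process used in this work.

```python
import math


def threshold_voltage(v_t0: float, gamma: float, phi_s: float, v_sb: float) -> float:
    """Body-effect model: V_th rises as the source-to-bulk voltage increases."""
    return v_t0 + gamma * (math.sqrt(phi_s + v_sb) - math.sqrt(phi_s))


def subthreshold_current(mu_cox: float, w_over_l: float, v_gs: float,
                         v_ds: float, v_th: float,
                         n: float = 1.5, v_t: float = 0.0259) -> float:
    """Sub-threshold drain current (A) for an NMOS device, textbook form."""
    return (mu_cox * w_over_l * v_t ** 2
            * math.exp((v_gs - v_th) / (n * v_t))
            * (1.0 - math.exp(-v_ds / v_t)))


# Hypothetical device values, chosen only to show the trend.
V_T0, GAMMA, PHI_S = 0.45, 0.4, 0.8   # V, V^0.5, V
MU_COX = 300e-6                       # A/V^2
V_DD = 1.8                            # V, supply used in the simulations

for v_foot in (0.0, 0.1, 0.2):
    # An OFF PDN transistor with its gate at 0 V and its source raised to
    # V_foot sees V_sb = V_foot and V_gs = -V_foot.
    v_th = threshold_voltage(V_T0, GAMMA, PHI_S, v_sb=v_foot)
    i_sub = subthreshold_current(MU_COX, 2.0, v_gs=-v_foot,
                                 v_ds=V_DD - v_foot, v_th=v_th)
    print(f"V_foot={v_foot:.1f} V  V_th={v_th:.3f} V  I_sub={i_sub:.3e} A")
```

Under these assumptions, both the higher threshold voltage and the negative gate-to-source voltage reduce the leakage exponentially as \(V_{foot}\) rises, which is the stacking effect referred to above.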

In conventional domino circuits, the variation in the keeper current owing to the positive feedback loop gain (4) associated with the keeper circuit aggravates the current (delay) variability at the dynamic node [12]. However, the current (delay) variability can be reduced by reducing the gain of the keeper feedback loop, which is given by (4):

\[
T = A_{inv}g_{mKeeper}Z_{dyn}, \tag{4}
\]

where \(A_{inv}\) is the inverter gain, \(g_{mKeeper}\) is the trans-conductance of the keeper device, and \(Z_{dyn}\) is the impedance at the dynamic node. The reduction is realized by incorporating two keeper transistors, \(M_{K1}\) and \(M_{K2}\), in series, which in effect reduces the trans-conductance of the keeper transistor \(M_{K1}\) in the feedback loop by a factor of \((1 + G_{mk1}R)\). The nominal sizing of the transistors using (5) ensures the robustness of the circuit.

\[
\left(\frac{W}{L}\right)_K = \left(\frac{W}{L}\right)_{K1} + \left(\frac{W}{L}\right)_{K2} \tag{5}
\]
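A short numerical sketch may help to show how a series keeper device lowers the feedback loop gain. It assumes that the second keeper transistor behaves as a source-degeneration resistance \(R\) for \(M_{K1}\), so that the effective trans-conductance becomes \(g_m/(1 + g_m R)\). The values of `A_INV`, `GM_K1`, `Z_DYN`, and `R_K2` are hypothetical and serve only to illustrate the trend.

```python
def loop_gain(a_inv: float, gm_keeper: float, z_dyn: float) -> float:
    """Keeper feedback loop gain T = A_inv * g_mKeeper * Z_dyn."""
    return a_inv * gm_keeper * z_dyn


def degenerated_gm(gm_k1: float, r_series: float) -> float:
    """Effective trans-conductance of M_K1 with a series resistance R."""
    return gm_k1 / (1.0 + gm_k1 * r_series)


# Hypothetical small-signal values (not taken from the simulations).
A_INV = 10.0      # inverter gain
GM_K1 = 50e-6     # S, keeper trans-conductance
Z_DYN = 200e3     # ohm, impedance at the dynamic node
R_K2 = 20e3       # ohm, assumed on-resistance of the series keeper M_K2

t_single = loop_gain(A_INV, GM_K1, Z_DYN)
t_dual = loop_gain(A_INV, degenerated_gm(GM_K1, R_K2), Z_DYN)
print(f"single keeper loop gain: {t_single:.1f}")
print(f"dual keeper loop gain:   {t_dual:.1f}")
print(f"reduction factor:        {t_single / t_dual:.2f}")
```

With these placeholder numbers, \(g_m R = 1\), so the loop gain is halved; a larger series resistance would reduce it further at the cost of a weaker keeper.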

Furthermore, an additional degree of freedom is available for modifying the delay variability, namely, a reduction in the \(M_{K1}\) device width. The decrease in the width of the keeper device in the feedback loop also decreases the dynamic node capacitance. This enhances the speed, as indicated in (6) and (7):

\[
C_{eval} = C_{gd}^{pre} + C_{gd}^{PDN} + C_{gd}^{inv} + C_{gd}^{k1} \tag{6}
\]

and

\[
T_{del} = \frac{C_{eval} \cdot V_{DD}}{2I_{Dsat}}, \tag{7}
\]

where \(T_{del}\) is the delay; \(C_{eval}\) is the total capacitance at the dynamic node during the evaluation, which depends on the gate drain capacitance of the pre-charge transistor \(C_{gd}^{pre}\), PDN \(C_{gd}^{PDN}\), static inverter \(C_{gd}^{inv}\), and keeper transistor \(C_{gd}^{k1}\); and \(I_{Dsat}\) is the saturation current.
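As a worked illustration of the first-order delay expression \(T_{del} = C_{eval} V_{DD} / (2 I_{Dsat})\), the sketch below sums four capacitance contributions at the dynamic node and compares two keeper widths. All capacitance and current values are hypothetical; only \(V_{DD} = 1.8\) V matches the simulation setup of this work.

```python
def dynamic_node_capacitance(c_pre: float, c_pdn: float,
                             c_inv: float, c_k1: float) -> float:
    """Total evaluation capacitance: pre-charge, PDN, inverter, and keeper terms."""
    return c_pre + c_pdn + c_inv + c_k1


def evaluation_delay(c_eval: float, v_dd: float, i_dsat: float) -> float:
    """First-order delay T_del = C_eval * V_DD / (2 * I_Dsat), in seconds."""
    return c_eval * v_dd / (2.0 * i_dsat)


V_DD = 1.8          # V
I_DSAT = 100e-6     # A, hypothetical discharge current

# Hypothetical capacitances in farads; only the keeper term differs.
wide_keeper = dynamic_node_capacitance(2e-15, 40e-15, 3e-15, c_k1=2e-15)
narrow_keeper = dynamic_node_capacitance(2e-15, 40e-15, 3e-15, c_k1=1e-15)

for label, c_eval in (("wide M_K1", wide_keeper), ("narrow M_K1", narrow_keeper)):
    t_del = evaluation_delay(c_eval, V_DD, I_DSAT)
    print(f"{label}: C_eval={c_eval * 1e15:.1f} fF  T_del={t_del * 1e12:.1f} ps")
```

In a wide fan-in gate the PDN term dominates, so the delay benefit of narrowing the keeper is modest in this simple model; the larger benefit of the narrower device is the reduced loop gain discussed above.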

4 | VALIDATION OF CLOCK CONTROLLED DUAL KEEPER DOMINO DESIGNS

The proposed circuit styles are validated through an implementation of wide fan-in gates as follows:

1. A performance comparison of various wide fan-in gates using CCDD_1 and CCDD_2 designs against the conventional domino logic style.

2. Analysis of CCDD_1 and CCDD_2 against the widely discussed domino logic architectures found in the literature [3,5,6].

4.1 | Wide fan-in gates

Wide fan-in gates such as tag comparators, multiplexers, and register files are vital elements of processor designs. Hence, studies on wide fan-in domino gates with lower power consumption even when operating at high speed are increasing [19]. Validation of the proposed clock controlled dual keeper domino designs is accomplished through a simulation and transient analysis of various wide fan-in circuits used in processors, such as a wide fan-in OR gate, multiplexer, and tag comparator. A comparison of the circuit parameters, including the operating speed, and an analysis of the power consumption are conducted.

4.1.1 | 128 input OR gate

Figure 6A shows a generic design of an \(N\)-input OR gate (here, a 128-input OR gate) using clock controlled dual keeper domino logic. The figure shows the AND and T-gate structures employed for the proposed CCDD_1 and CCDD_2 styles. Figure 6B and C show the transients of the input signal and the output nodes for the two styles, respectively. During the pre-charge phase, the dynamic node is charged to \(V_{DD}\) through the pre-charge transistor \(M_{pre}\), as shown in Figure 6A. The evaluation phase starts when the clock reaches HIGH, as shown in Figure 6B. However, owing to the delay incurred by the clock signal in reaching the gate of the footer device, the voltage \(V_{foot}\) initially accumulates at the footer node when one of the inputs switches to HIGH. The \(V_{foot}\) signal, which is HIGH when applied to the AND or T-gate keeper control circuit, depending on the design proposal, cuts off the keeper transistor \(M_{K2}\), which in effect prevents any contention current from flowing through the dual keeper arrangement. This state continues until the gate of \(M_n\) is driven by the delayed clock signal \(CLK\). After the delay incurred by the two inverter stages (the buffer), the footer transistor \(M_n\) is enabled. This in effect provides a discharge path from the dynamic node to the ground. Thus, the dynamic node becomes LOW with reduced contention when the logic is TRUE. This LOW transition at the dynamic node is reflected as a HIGH output at the OUT node.
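The keeper control described above reduces to a small truth table. The sketch below is a purely logical idealization: the AND gate (CCDD_1) and the T-gate (CCDD_2) are both modeled as the Boolean function CLK AND \(V_{foot}\), which is an assumption made for this illustration, and the function names are hypothetical.

```python
def pdn_conducts(inputs: list[bool]) -> bool:
    """Wide OR pull-down network: any HIGH input opens a discharge path."""
    return any(inputs)


def keeper_control(clk: bool, v_foot_high: bool) -> bool:
    """Idealized AND / T-gate keeper control signal: CLK AND V_foot."""
    return clk and v_foot_high


def mk2_is_on(clk: bool, v_foot_high: bool) -> bool:
    """M_K2 is a PMOS device, so a HIGH control signal cuts it off."""
    return not keeper_control(clk, v_foot_high)


# 128-input OR gate with a single HIGH input.
inputs = [False] * 127 + [True]
# While the footer is still off, the foot node rises only if the PDN conducts.
v_foot_early_eval = pdn_conducts(inputs)

scenarios = [
    ("pre-charge (CLK LOW)", False, False),
    ("early evaluation, PDN TRUE", True, v_foot_early_eval),
    ("evaluation, PDN FALSE", True, False),
]
for label, clk, v_foot in scenarios:
    state = "ON" if mk2_is_on(clk, v_foot) else "OFF"
    print(f"{label}: M_K2 {state}")
```

The only case in which \(M_{K2}\) is cut off is the early evaluation interval with a TRUE PDN, which is exactly when a conducting keeper would otherwise contend with the discharge.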

4.1.2 | 8 × 1 Multiplexer

Figure 7A shows an 8 × 1 multiplexer using the CCDD_1 and CCDD_2 configurations, which has eight PDN paths, one for each input. The select signals are denoted as $S_0$, $S_1$, and $S_2$, and the eight input lines are denoted as $D_0$ through $D_7$. The PDN comprises NMOS devices in series for the data and select signals. As discussed in Section 3, the keeper circuit comprises an additional PMOS transistor $M_{K2}$ controlled by the clock and $V_{foot}$ using an AND/T-gate configuration.

Figure 7B and C depict the simulation waveforms obtained for an $8 \times 1$ multiplexer using the CCDD_1 and CCDD_2 configurations. During the pre-charge phase, with the CLK signal at LOW, device $M_{pre}$ is operated, and the dynamic node is charged to HIGH, making the output LOW. To discuss the operation of the circuit, states $A$ and $B$ during the initial evaluation phase are depicted in Figure 7B. In state $A$, input $D_1 = $ LOW and the select signals are $S_0 S_1 S_2 = 001$, and the dynamic node is retained at HIGH, which makes the output LOW. In state $B$, $D_1 = $ HIGH, and the dynamic node tends to discharge and the foot node voltage $V_{foot}$ accumulates owing to the delayed enabling of the footer transistor. With $V_{foot}$ and CLK at HIGH, the keeper control circuit yields a HIGH output, which places the additional keeper transistor $M_{K2}$ in a cut-off state. This reduces the contention between the PDN and the keeper transistor, and facilitates a fast discharge of the dynamic node.

4.1.3 | 40-bit tag comparator

The tag comparator employed in cache memory is another primary block of the microprocessor, and plays a major role in applications requiring fast performance [20]. Hence, to meet the imperative needs of reduced delay and power consumption of tag comparators in deep sub-micron (DSM) technologies under relentless scaling, high fan-in domino circuits can be applied. Figure 8 shows the design of an n-bit comparator using the CCDD_1 and CCDD_2 structures. When the input addresses $A[n:0]$ and $B[n:0]$ are identical, the dynamic node is retained at HIGH, and the output becomes LOW and indicates a HIT condition. However, if the addresses $A[n:0]$ and $B[n:0]$ do not match, the PDN path is enabled. During the initial evaluation phase, $V_{foot}$ accumulates and cuts off the keeper circuit. After the delay incurred by the buffer, the footer transistor $M_n$ is enabled and the discharge occurs without contention between the keeper transistor and the PDN. Hence, a mismatch in the address makes the output HIGH, indicating a MISS condition. Domino circuits can therefore be a preferred logic design methodology for such blocks owing to their high-speed operation.
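The HIT/MISS behavior of the comparator can be sketched at the logic level. The example below assumes that each bit position contributes an XOR-type pull-down path, so that any mismatching bit discharges the dynamic node; the function names and the sample tag values are hypothetical.

```python
def tag_mismatch(a: int, b: int, width: int = 40) -> bool:
    """True if any of the lowest `width` bits of the two tags differ."""
    mask = (1 << width) - 1
    return ((a ^ b) & mask) != 0


def comparator_output(a: int, b: int, width: int = 40) -> str:
    """Evaluate the domino comparator: dynamic node HIGH means HIT."""
    dyn = 0 if tag_mismatch(a, b, width) else 1   # a mismatch enables the PDN
    out = 1 - dyn                                 # static inverter
    return "MISS" if out == 1 else "HIT"


stored_tag = 0xABCDEF0123                     # hypothetical 40-bit tag
print(comparator_output(stored_tag, stored_tag))       # identical addresses
print(comparator_output(stored_tag, stored_tag ^ 1))   # one bit differs
```

A single differing bit is enough to produce a MISS, which is why the comparator behaves as a wide fan-in OR of per-bit mismatch signals.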

4.1.4 | Simulation results and comparison

The simulations of the circuits are carried out using 180 nm technology node libraries. The size of each transistor is set to minimum, with a supply voltage of $V_{DD} = 1.8$ V at $27^\circ$C, forming the simulation environment, which is used for all analyses made in this work. The keeper transistors are also set to their minimum process width, chosen to reduce the nodal capacitance values. A simulation of the above-mentioned wide fan-in gates was conducted using a conventional domino logic style and the CCDD_1 and CCDD_2 structures. The average power is calculated by applying all possible sets of input combinations.

Figure 9 shows the lower power consumption and higher speed of operation of the clock controlled dual keeper domino logic structures (CCDD_1 and CCDD_2) as compared to conventional domino logic circuits. The simulation of a 128-input OR gate using the CCDD_1 and CCDD_2 structures shows a reduced delay of 599.8 ps and 692.6 ps, respectively, as compared to 746 ps for the conventional domino logic circuit. The power consumptions of the CCDD_1 and CCDD_2 structures, namely, 89.9 μW and 87.2 μW, respectively, are also found to be lower than that of a conventional circuit.

### 4.2 | CCDD_1 and CCDD_2 vs existing domino architectures

To demonstrate and evaluate the advantages of the CCDD styles, a 64-input OR gate was simulated, and the performance metrics were compared with those of the HSD, controlled strong PMOS keeper domino logic, HSCD, and CEDL domino logic styles. Furthermore, the impacts of process variations on the power and delay metrics were analyzed using a Monte Carlo simulation for 2,000 runs. Table 3 shows the power consumption when considering all combinations of inputs. The delay and power delay product were also found to be lower than those of the existing domino logic structures. Figure 10 shows the reduced power consumption and reduced delay of the proposed styles in comparison with the existing styles.
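The power delay product (PDP) can be tabulated directly from the average power and delay values reported in Table 3. The sketch below takes the PDP to be the simple product of average power and delay, which is an assumption about how the metric is formed; the dictionary `TABLE_3` and the helper `pdp_fj` are illustrative names.

```python
# (average power in uW, delay in ps) for the 64-input OR gate, from Table 3.
TABLE_3 = {
    "Conventional domino": (46.4, 412.0),
    "HSD": (56.1, 455.3),
    "Controlled strong PMOS": (161.1, 514.0),
    "HSCD": (350.2, 406.2),
    "CEDL": (179.2, 470.6),
    "CCDD_1": (51.2, 362.5),
    "CCDD_2": (47.2, 302.5),
}


def pdp_fj(power_uw: float, delay_ps: float) -> float:
    """Power delay product in femtojoules (1 uW x 1 ps = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3


# Print the styles ordered from lowest to highest PDP.
for style, (power, delay) in sorted(TABLE_3.items(), key=lambda kv: pdp_fj(*kv[1])):
    print(f"{style:24s} {pdp_fj(power, delay):8.2f} fJ")
```

With these values, CCDD_2 and CCDD_1 give about 14.28 fJ and 18.56 fJ, respectively, against about 19.12 fJ for the conventional domino circuit.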

Figure 11 shows the power consumption and delay incurred by the proposed styles under various wide fan-in conditions. It can be seen that the power and delay overheads of the proposed styles are greater for lower fan-in conditions owing to the additional keeper control circuitry. The faster performance of the proposed designs relative to the conventional domino logic circuit is shown in Figure 12. During the evaluation phase, with a HIGH input, the dynamic node discharges with reduced contention owing to the keeper control circuit, which consists of an AND/T-gate configuration, and makes the output HIGH. The CCDD_1 and CCDD_2 structures offer 7% and 15% faster LOW to HIGH transitions than the conventional domino logic design, respectively.

The unity noise gain (UNG) is a key parameter that defines the robustness of the circuit [2,15]. The noise gain margin is defined as the dc voltage that, when applied to all inputs, produces an output voltage of the same amplitude, as stated in (8):

\[
\text{UNG} = \{V_{\text{in}} : V_{\text{noise}} = V_{\text{out}}\} \tag{8}
\]
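One way to locate the unity noise gain point numerically is to bisect on the input amplitude until the output amplitude equals it. The following is a minimal sketch under stated assumptions: `output_amplitude` stands for any routine that returns the output amplitude for a given input noise amplitude (in practice a circuit simulation), it is assumed to lie below the input for small noise and above it for large noise, and the piecewise-linear `toy_gate` response is entirely hypothetical and not derived from the CCDD circuits.

```python
from typing import Callable


def unity_noise_gain(
    output_amplitude: Callable[[float], float],
    v_dd: float,
    tol: float = 1e-4,
) -> float:
    """Bisect for the input amplitude at which output amplitude equals input."""
    lo, hi = 0.0, v_dd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if output_amplitude(mid) < mid:
            lo = mid  # noise is still attenuated: the UNG point lies higher
        else:
            hi = mid  # noise is amplified: the UNG point lies lower
    return 0.5 * (lo + hi)


def toy_gate(v_noise: float, v_dd: float = 1.8) -> float:
    """Hypothetical response: no output below 0.6 V, then a gain of 4, clamped."""
    return min(v_dd, max(0.0, 4.0 * (v_noise - 0.6)))


if __name__ == "__main__":
    print(f"UNG of the toy gate: {unity_noise_gain(toy_gate, 1.8):.3f} V")
```

For the toy response the crossing lies at 0.8 V, since 4(v - 0.6) = v gives v = 0.8. A larger value indicates that a higher input noise amplitude is needed before the noise propagates to the output at full amplitude.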

**Table 3** Power and delay of various domino logic circuits

<table>
<thead>
<tr>
<th>Domino Logic Style</th>
<th>Average Power (μW)</th>
<th>Delay (ps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conventional domino [1]</td>
<td>46.4</td>
<td>412</td>
</tr>
<tr>
<td>HSD [8]</td>
<td>56.1</td>
<td>455.3</td>
</tr>
<tr>
<td>Controlled strong PMOS [9]</td>
<td>161.1</td>
<td>514</td>
</tr>
<tr>
<td>HSCD [3]</td>
<td>350.2</td>
<td>406.2</td>
</tr>
<tr>
<td>CEDL [3]</td>
<td>179.2</td>
<td>470.6</td>
</tr>
<tr>
<td>Clock controlled dual keeper domino, Proposed 1 (CCDD_1)</td>
<td>51.2</td>
<td>362.5</td>
</tr>
<tr>
<td>Clock controlled dual keeper domino, Proposed 2 (CCDD_2)</td>
<td>47.2</td>
<td>302.5</td>
</tr>
</tbody>
</table>

The CCDD_1 and CCDD_2 structures were observed to be more noise tolerant. The circuits were found to be immune to noise magnitudes of 1.04 V and 1.28 V, respectively, which reflects the noise immunity of the proposed structures. Figure 13 shows the UNGM of the proposed styles in comparison with that of the existing styles.

At lower technology nodes, the statistical parameter variations of the devices have a profound impact on circuit design and operation. In a conventional domino keeper circuit, a fast NMOS–slow PMOS corner accounts for the increased leakage current, and a slow NMOS–fast PMOS corner accounts for the excessive contention current. Figure 14 demonstrates the lower delay variations of the clock controlled dual keeper domino logic structures (CCDD_1 and CCDD_2) in the slow NMOS–fast PMOS corner condition.

Variations in process parameters such as the threshold voltage and oxide thickness lead to uncertainty and undesirable changes in the delay characteristics of the circuit. Hence, to evaluate the effects of statistical process variation on the delay, Monte Carlo simulations of the CCDD_1 and CCDD_2 structures were conducted for 2,000 runs on a 64-input OR gate. A threshold voltage variation with a Gaussian distribution of 3σ was considered. The simulation yielded a lower mean delay than that of the conventional domino logic, as indicated in Table 4. It was also observed that CCDD_1 and CCDD_2 are superior in terms of the delay deviation. Figure 15 shows the delay distribution curve of the 64-input OR circuit for 1,000 runs, which reveals the lower delay and more tightly controlled delay variations of the CCDD_1 and CCDD_2 structures relative to the conventional domino logic circuit. The reduced variability factor (σ/μ) of 8.4% and 9.8% for the CCDD_1 and CCDD_2 structures, respectively, illustrates the robustness of the circuits.
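The variability factor σ/μ can be illustrated with a small Monte Carlo sketch. Everything about the delay model here is an assumption made only for illustration: the alpha-power-law expression stands in for a circuit simulation, and the nominal threshold voltage, the 3σ spread of 10%, the exponent, and the scaling constant are hypothetical values that are not taken from the paper. Only the supply voltage of 1.8 V and the 2,000 runs follow the text.

```python
import random
import statistics
from typing import List, Sequence


def variability_factor(delays: Sequence[float]) -> float:
    """Delay variability factor: sample standard deviation over mean."""
    return statistics.stdev(delays) / statistics.mean(delays)


def monte_carlo_delays(
    runs: int,
    vth_nom: float = 0.45,           # hypothetical nominal threshold voltage (V)
    three_sigma_frac: float = 0.10,  # hypothetical 3-sigma spread: 10% of nominal
    v_dd: float = 1.8,               # supply voltage used in the paper (V)
    alpha: float = 1.3,              # hypothetical alpha-power-law exponent
    k_ps: float = 300.0,             # hypothetical delay scaling constant (ps)
    seed: int = 1,
) -> List[float]:
    """Sample gate delays under a Gaussian threshold voltage variation."""
    rng = random.Random(seed)
    sigma_vth = vth_nom * three_sigma_frac / 3.0
    delays = []
    for _ in range(runs):
        vth = rng.gauss(vth_nom, sigma_vth)
        # Alpha-power-law delay estimate; a stand-in for a SPICE run.
        delays.append(k_ps * v_dd / (v_dd - vth) ** alpha)
    return delays


if __name__ == "__main__":
    samples = monte_carlo_delays(2000)
    print(f"mean delay  = {statistics.mean(samples):.1f} ps")
    print(f"sigma / mu  = {100.0 * variability_factor(samples):.2f} %")
```

In an actual flow, the sampled delays would come from the circuit simulator, and only `variability_factor` would be applied to them.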

To summarize, the simulation results demonstrate that the abrupt control of the keeper circuit reduces contention, thereby increasing the speed. Furthermore, the lower delay variability indicates that the proposed CCDD_1 and CCDD_2 structures are more tolerant to process variations.

### CONCLUSION

This paper presented novel clock controlled dual keeper domino logic structures (CCDD_1 and CCDD_2) with additional PMOS keeper transistor structures. The keeper circuit is controlled by the clock and foot node voltage using two different configurations, namely, AND and T-gate keeper control structures. The proposed designs offer precise control of the keeper circuitry and realize a reduced contention current, which improves the speed of the circuit. Wide fan-in modules, namely, a 128-input OR gate, an 8 × 1 multiplexer, and 40-bit tag comparator circuits, were implemented using the CCDD approaches and analyzed for enhanced speed performance with reduced power consumption. The simulation of the clock controlled dual keeper domino logic structures (CCDD_1 and CCDD_2) with an AND and a T-gate configuration for a 64-input OR gate shows that the circuits consume 51.2 μW and 47.2 μW of power, respectively. The clock controlled dual keeper domino logic structure with a T-gate as the additional keeper control circuit (CCDD_2) demonstrates a delay value of 302.6 ps, which is much less than that of the AND-gate controlled structure (CCDD_1). The lower delay variability of 8.4% and 9.8% for the clock controlled dual keeper domino logic structures (CCDD_1 and CCDD_2), obtained over 2,000 runs using Monte Carlo simulations, validates that both structures are process-variation tolerant circuits.

ORCID

A. Anita Angeline https://orcid.org/0000-0002-5603-0976
V. S. Kanchana Bhaaskaran https://orcid.org/0000-0002-3819-1952

REFERENCES


5. F. Frustaci et al., High-performance noise-tolerant circuit techniques for CMOS dynamic logic, IET Circuits Devices Syst. 2 (2008), no. 6, 537–548.


27. J. Wang et al., Low power and high performance dynamic CMOS XOR/NOR gate design, Microelectron. Eng. 88 (2011), no. 8, 2781–2784.


AUTHOR BIOGRAPHIES

A. Anita Angeline received a BE degree in Electronics and Instrumentation Engineering and an ME degree in Applied Electronics from Karunya Institute of Technology, Coimbatore, India, in 1999 and 2004, respectively. She is currently working as an Assistant Professor and is pursuing a PhD at VIT, Chennai, India. Her research areas include the design of high-speed dynamic logic structures with process-variation tolerance.

V. S. Kanchana Bhaaskaran is a professor at the School of Electronics Engineering and Dean of Academics at VIT Chennai, India. She obtained an undergraduate degree in electronics and communication engineering from the Institution of Engineers (India), Calcutta, India, an MS degree in Systems and Information from Birla Institute of Technology and Sciences, Pilani, India, and a PhD from VIT Chennai. She has more than 35 years of industry, research, and teaching experience, serving with the Department of Employment and Training, the government of Tamil Nadu, IIT Madras, Salem Cooperative Sugar Mills’ Polytechnic College, SSN College of Engineering, and VIT University. Her specializations include low-power VLSI circuit designs, microprocessor architectures, and linear integrated circuits. She has published approximately 100 papers in international journals and conferences, and has three patents published. She is a reviewer for international peer-reviewed journals and conferences. She is also a Fellow at the Institution of Engineers (India), a Fellow at the Institution of Electronics and Telecommunication Engineers, a Lifetime Member of the Indian Society for Technical Education, and a Senior Member of the Institute of Electrical and Electronics Engineers, Inc., USA.