## UNIVERSITY OF WATERLOO

Faculty of Engineering

E&CE 438: Digital Integrated Circuits

# Sequential 4-bit Adder Design Report 

Prepared by:
Ian Hung (iXXXXXX), 99XXXXXX
Annette Lo (aXXXXXX), 99XXXXXX
Pamela Torres (pXXXXXX), 99XXXXXX

August 8, 2003

## Table of Contents

List of Figures
List of Tables
1.0 Introduction
1.1 Project Requirements
1.2 Functionality of Binary Adder
2.0 Design Methodology
2.1 Adder Architecture Selection
2.2 Flip-flop Architecture Selection
3.0 Optimization Techniques
3.1 Sizing
4.0 Pre-layout Simulation Results
5.0 Layout of Sequential Four-Bit Adder
6.0 Post-layout Simulation Results
7.0 Conclusions
References
Appendix

## List of Figures

Figure 1: Sequential 4-bit Adder Layout
Figure 2: Complementary Static CMOS Full Adder
Figure 3: Pre-layout Simulation Response to Signal A and B
Figure 4: Symmetrical Implementation of 1-bit Adder
Figure 5: Post-layout Simulation Response to Signal A and B
Figure A-1: Characterization of Setup Time (Pre-layout)
Figure A-2: Characterization of Delay Time (Pre-layout)
Figure A-3: Characterization of Setup Time (Post-layout)
Figure A-4: Characterization of Delay Time (Post-layout)
Figure A-5: Test Bench for 4-Bit Adder
Figure A-6: Schematic of 4-Bit Adder
Figure A-7: Schematic of Transmission Gate Flip Flop
Figure A-8: Schematic of 1-Bit Adder
Figure A-9: Schematic of AND Gate
Figure A-10: Schematic of Buffer
Figure A-11: Schematic of Inverter
Figure A-12: Layout of 1-bit Adder
Figure A-13: Layout of Flip Flop
Figure A-14: Layout of AND Gate
Figure A-15: Layout of Buffer
Figure A-16: Layout of Inverter
Figure A-17: Layout of 4-Bit Adder

## List of Tables

Table 1: Truth Table for Full Adder
Table 2: Comparison of Advantages and Disadvantages of Static Adders
Table 3: Design Criteria
Table 4: Comparison of Various Flip Flops
Table 5: Final Transistor Widths Sizes

### 1.0 Introduction

The adder is one of the most fundamental arithmetic operators used in the datapaths of microprocessors and signal processors. Since the adder is usually the speed-limiting element within a datapath, its speed and power have drastic impacts on the overall performance of a system. Thus, the main goal of an integrated circuit designer is to optimize the design of the adder. Circuit optimization includes manipulation of transistor sizes and circuit topology to maximize speed.

### 1.1 Project Requirements

The purpose of the project is to design a sequential 4-bit adder that performs correct 4-bit additions with a clock period of less than 1000 ps. Other constraints are rise and fall times of 100 ps and a load capacitance of 20 fF. The main objective is to maximize the figure of merit (FOM), which is calculated using the following equation:

$$
\mathrm{FOM}=\frac{\text{Frequency (GHz)}}{\text{Power } (\mu\mathrm{W})}
$$

### 1.2 Functionality of Binary Adder

Table 1 illustrates the basic operation of a binary adder, where A and B are the adder inputs, $\mathrm{C}_{\mathrm{i}}$ is the carry input, S is the sum output, and $\mathrm{C}_{\mathrm{o}}$ is the carry out.

Table 1: Truth Table for Full Adder [1]

| A | B | $\mathrm{C}_{\mathrm{i}}$ | S | $\mathrm{C}_{\mathrm{o}}$ |
| :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 |

The Boolean expressions for S and $\mathrm{C}_{\mathrm{o}}$ can be expressed as

$$
\begin{aligned}
\mathrm{S} &= \mathrm{A} \oplus \mathrm{B} \oplus \mathrm{C}_{\mathrm{i}} \\
&= \mathrm{A}\mathrm{B}'\mathrm{C}_{\mathrm{i}}' + \mathrm{A}'\mathrm{B}\mathrm{C}_{\mathrm{i}}' + \mathrm{A}'\mathrm{B}'\mathrm{C}_{\mathrm{i}} + \mathrm{A}\mathrm{B}\mathrm{C}_{\mathrm{i}} \\
&= \mathrm{A}\mathrm{B}\mathrm{C}_{\mathrm{i}} + \mathrm{C}_{\mathrm{o}}'\left(\mathrm{A}+\mathrm{B}+\mathrm{C}_{\mathrm{i}}\right) \\
\mathrm{C}_{\mathrm{o}} &= \mathrm{A}\mathrm{B} + \mathrm{B}\mathrm{C}_{\mathrm{i}} + \mathrm{A}\mathrm{C}_{\mathrm{i}}
\end{aligned}
$$
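As a quick sanity check, the expressions above can be evaluated exhaustively against Table 1. The following is an illustrative sketch only: Python is used for convenience, and the function names `full_adder` and `sum_from_carry` are hypothetical, not part of the original design flow.

```python
from itertools import product


def full_adder(a, b, ci):
    """Return (S, Co) using the sum-of-products forms given above."""
    na, nb, nci = 1 - a, 1 - b, 1 - ci
    s = (a & nb & nci) | (na & b & nci) | (na & nb & ci) | (a & b & ci)
    co = (a & b) | (b & ci) | (a & ci)
    return s, co


def sum_from_carry(a, b, ci):
    """Alternative form S = A.B.Ci + Co'.(A + B + Ci), which reuses Co."""
    _, co = full_adder(a, b, ci)
    return (a & b & ci) | ((1 - co) & (a | b | ci))


# Walk through all eight rows of the truth table.
for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    assert s == a ^ b ^ ci                # matches the XOR form
    assert s == sum_from_carry(a, b, ci)  # matches the carry-reuse form
    assert 2 * co + s == a + b + ci       # (Co, S) is the 2-bit sum
    print(a, b, ci, "->", s, co)
```

The carry-reuse form of S is the one that lets a complementary CMOS adder share the carry circuit with the sum circuit.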

### 2.0 Design Methodology

The sequential 4-bit adder consists of a 1-bit full adder, five flip-flops, and one AND gate. Figure 1 illustrates the general layout of the 4-bit adder.


Figure 1: Sequential 4-bit Adder Layout
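The behaviour of a sequential adder built around a single full adder can be sketched in a few lines of Python. This is a behavioural illustration under stated assumptions, not a description of the actual schematic: it assumes the operands arrive one bit per clock cycle with the least significant bit first, that one flip-flop holds the carry between cycles, and that the AND gate masks the stored carry with a hypothetical active-low clear signal (`clear_n`) at the start of each word. The input and output registers are omitted.

```python
def full_adder(a, b, ci):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ ci
    co = (a & b) | (b & ci) | (a & ci)
    return s, co


def serial_add(a_bits, b_bits):
    """Add two equal-length bit lists (LSB first), one bit per clock cycle."""
    carry_ff = 0                      # carry flip-flop state
    sum_bits = []
    for cycle, (a, b) in enumerate(zip(a_bits, b_bits)):
        clear_n = 0 if cycle == 0 else 1  # assumed control: clear carry on first bit
        ci = carry_ff & clear_n           # assumed role of the AND gate
        s, co = full_adder(a, b, ci)
        sum_bits.append(s)
        carry_ff = co                     # captured on the next clock edge
    return sum_bits, carry_ff


def to_bits(value, width=4):
    """Integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


bits, carry = serial_add(to_bits(3), to_bits(5))
print(from_bits(bits), carry)  # 3 + 5 = 8, no carry out of bit 3
```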

### 2.1 Adder Architecture Selection

The three main adder designs considered were complementary static CMOS, mirror, and transmission-gate based. Mainly static implementations were considered, since dynamic circuits consume more power due to the charging and discharging of load capacitors and the clock, and consequently result in a lower figure of merit. Moreover, since dynamic circuits are ratioless, optimization is more tedious because simple manipulation of NMOS and PMOS transistor sizes has no effect.

Table 2 summarizes the advantages and disadvantages of the considered static designs. The complementary static CMOS adder essentially translates the Boolean equations above directly into complementary CMOS circuitry, while the mirror and transmission-gate based designs require fewer transistors. However, the mirror and transmission-gate designs were not attempted due to their complexity, as they incorporate multiplexers and/or XORs. The complementary static CMOS design was ultimately chosen for its simple layout and its easily identifiable data path. Figure 2 depicts the chosen adder design.


Figure 2: Complementary Static CMOS Full Adder

Table 2: Comparison of Advantages and Disadvantages of Static Adders [1]

| Adder Types | Advantages | Disadvantages |
| :---: | :---: | :---: |
| Complementary Static CMOS | - Logical effort reduced to 2 due to carry-generation circuit design on the smaller PMOS stack <br> - NMOS and PMOS transistors connected to $\mathrm{C}_{\mathrm{i}}$ placed close to the output of the gate, causing capacitances of internal nodes in the transistor chain to be precharged/discharged in advance | - Tall PMOS transistor stacks present in both the sum and carry circuits <br> - Intrinsic load capacitance of $\mathrm{C}_{\mathrm{o}}$ is large and consists of two diffusion, six gate, and wiring capacitances <br> - Extra delay due to two inverting stages in the carry-generation circuit <br> - Sum generation requires one extra, less critical logic stage <br> - Moderate number of transistors (28) |
| Mirror | - Small number of transistors (24) <br> - NMOS and PMOS chains completely symmetric, resulting in a maximum of two series transistors in the carry-generation circuit and a logical effort of 2 at each input <br> - Transistors connected to $\mathrm{C}_{\mathrm{i}}$ placed closest to the output of the gate | - Boolean expressions of S and $\mathrm{C}_{\mathrm{o}}$ more difficult to identify in the circuitry <br> - Capacitances include two internal gate capacitances and six gate capacitances in the connecting adder; the most critical issue is minimizing the capacitance at node $\mathrm{C}_{\mathrm{o}}{}^{\prime}$ <br> - Requires an additional inverter to recover the value of S, increasing the number of transistors to 26 |
| Transmission Gate Based | - Small number of transistors (24) <br> - NMOS and PMOS chains completely symmetric, resulting in a maximum of two series transistors in the carry-generation circuit and a logical effort of 2 at each input <br> - Transistors connected to $\mathrm{C}_{\mathrm{i}}$ placed closest to the output of the gate | - Boolean expressions of S and $\mathrm{C}_{\mathrm{o}}$ more difficult to identify in the circuitry <br> - Capacitances include two internal gate capacitances and six gate capacitances in the connecting adder; the most critical issue is minimizing the capacitance at node $\mathrm{C}_{\mathrm{o}}{}^{\prime}$ <br> - Requires an additional inverter to recover the value of S, increasing the number of transistors to 26 |

### 2.2 Flip-flop Architecture Selection

The flip-flop is a critical component of the circuit, as it forms the basis for the sequential design of the 4-bit adder. Since the signal exiting the full adder passes through four stages of flip-flops, a poorly chosen component may greatly constrain the performance of the overall circuit. Hence, an important part of the design effort is to select a flip-flop that consumes little power while propagating the signal as quickly as possible (i.e. it must operate at least as fast as the combinational circuit processes its inputs). Table 3 summarizes the essential characteristics of the flip-flop. The setup and hold times, during which the data must be valid before and after the clock transition, must be minimized, and the propagation delay for the data to be copied to the output should also be reduced. Note that for a circuit to exhibit memory, it must be bi-stable. Connecting two inverters back to back makes a simple bi-stable circuit. It is also crucial that the two inverters be regenerative.

Table 3: Design Criteria [2]

| Desirable Characteristics | Undesirable Characteristics |
| :---: | :---: |
| Small clock load | Positive setup time |
| Short direct path | Sensitivity to clock slope and skew |
| Reduced node swing | Dynamic (floating nodes) |
| Low-power feedback | Dynamic Master latch |
| Pulsed design |  |
| Optimization of both Master and Slave latch |  |

The pros and cons of several different registers are listed in Table 4 below.

Although static logic was chosen for the adder, dynamic logic was chosen for the flip-flop. Dynamic logic has several advantages over static logic: it avoids duplicating the logic in both an N-tree and a P-tree as in standard CMOS; it is typically used in very high performance applications; it yields very simple sequential memory circuits (amenable to synchronous logic); it achieves high density; and, in some cases, it consumes less power. However, dynamic logic has two drawbacks: problems with clock synchronization and timing, and a more difficult design process.

For our design, the dynamic transmission-gate edge-triggered register was chosen because its performance satisfies our design criteria: it offers several desirable characteristics, such as a speed advantage and low static power consumption, and it can be clocked at high frequencies since there is very little delay in the latch elements. High density is also achieved, since dynamic logic uses fewer transistors and is ratioless (i.e. a fixed ratio in size between pull-up and pull-down structures is not required for proper operation); thus, less area is needed than with static logic for the same performance. Furthermore, the master-slave approach in the dynamic transmission-gate design avoids the built-in "race condition" during evaluation. These numerous advantages of the transmission-gate flip-flop outweigh the costs of increased design time, increased operational complexity, and decreased operational margin.

Table 4: Comparison of Various Flip Flops

| Register Types | Advantages | Disadvantages |
| :---: | :---: | :---: |
| Multiplexer based latch | - Simplicity of design | - Output is level triggered and susceptible to voltage variations |
| Master-slave edge-triggered register |  | - Large load on the clock circuitry due to the large number of transmission gates |
| Master-slave edge-triggered register [simplified] | - Less load on the clock circuitry | - Design is more challenging because the source driving the input must overpower the feedback inverter <br> - Reverse conduction path inherent in the design |
| Dynamic transmission-gate edge-triggered register | - Uses fewer transistors than static circuitry <br> - Does not suffer from the static power consumption of ratioed logic <br> - Enables higher-frequency performance | - Data must be refreshed; otherwise it will be lost due to leakage <br> - Clock overlap can cause problems in the dynamic register <br> - Leakage current can degrade the stored signal <br> - Dynamic power is high |
| $\mathbf{C}^{2}$ MOS Register [single edge] | - Insensitive to clock overlap if clock rise and fall times are sufficiently small | - Direct path between the input and output exists during the clock transition if the rise and fall times of the clock are not short <br> - Existence of "race condition" |
| $\mathbf{C}^{2}$ MOS Register [dual edge] | - Low-power feedback <br> - Locally generated second phase <br> - Data throughput is doubled | - Poor driving capability <br> - Constrains the overall clock frequency of the circuit |
| TSPC Register | - Only one clock is used, so the problems of clock overlap and skew are eliminated <br> - Load on the clock circuitry similar to that of a conventional transmission-gate or $\mathrm{C}^{2} \mathrm{MOS}$ register | - Number of transistors higher than $\mathrm{C}^{2} \mathrm{MOS}$ <br> - At times the output node floats; if the output drives a transmission gate, charge sharing can occur |

### 3.0 Optimization Techniques

After the adder, flip-flop and AND gate were each implemented, their functionality was verified separately. Once each design block performed acceptably, the blocks were cascaded to form a 4-bit adder, as in Figure 1. In simulation with $\mathrm{V}_{\mathrm{dd}}$ at 3.3 V and output capacitive loads of 20 fF, the original design could only operate at a frequency of 100 MHz. Since the project requires an operating frequency of at least 1 GHz, the original design needed to be changed.

The average dynamic power dissipated by digital circuits can be expressed as $\mathrm{P}_{\text{avg}}=\mathrm{C}_{\mathrm{L}} \mathrm{V}_{\mathrm{dd}}^{2} \mathrm{f}$, where $\mathrm{C}_{\mathrm{L}}$ represents the total load capacitance, $\mathrm{V}_{\mathrm{dd}}$ is the supply voltage and f is the frequency of signal transitions. From the equation, it can be seen that dynamic power is independent of the typical device parameters. So, in order to decrease power consumption, it is mainly the switching frequency that must be reduced, since the load capacitance and the supply voltage are held constant by the project description (in particular, $\mathrm{V}_{\mathrm{dd}}$ is set to 3.3 V and $\mathrm{C}_{\mathrm{L}}$ is 20 fF).
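To give a sense of scale, the expression can be evaluated for the project's load conditions. This is a rough sketch, assuming the 20 fF output load completes one full charge/discharge cycle per clock period at 1 GHz; it covers only that output load and not the internal node or clock capacitances of the circuit. The function name is illustrative.

```python
def dynamic_power(c_load_f, v_dd_v, freq_hz):
    """Average dynamic power P = C_L * V_dd^2 * f, in watts."""
    return c_load_f * v_dd_v ** 2 * freq_hz


# Project conditions: C_L = 20 fF, V_dd = 3.3 V; f = 1 GHz is the target clock rate.
p_load = dynamic_power(20e-15, 3.3, 1e9)
print(f"{p_load * 1e6:.1f} uW per 20 fF load at 1 GHz")
```

Under these assumptions a single output load accounts for roughly 218 µW, and halving the switching frequency halves that figure.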

In analyzing the full adder circuit, as design blocks are cascaded together, the configuration resembles a chain of transmission gates. The time constant of a chain of $n$ transmission gates can be estimated as $\mathrm{C}\,\mathrm{R}_{\mathrm{eq}}\,n(n+1)/2$; in other words, the delay is proportional to $n^{2}$. So, to break the chain, one optimization technique is to insert buffers. A buffer comprises two cascaded inverters; consequently, four transistors are used: two NMOS and two PMOS, with their gate capacitances in parallel. To reduce the delay, a buffer was added between the carry out of the 1-bit adder and the flip-flop.
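The benefit of breaking the chain can be illustrated numerically. The sketch below is a simplified model, not the simulation used in this project: the values of `r_eq`, `c` and the buffer delay `t_buf` are placeholders chosen only for illustration, and each buffer is modelled as a fixed added delay.

```python
import math


def chain_time_constant(n, r_eq, c):
    """Time constant of n cascaded transmission gates: C * R_eq * n(n+1)/2."""
    return c * r_eq * n * (n + 1) / 2


def buffered_time_constant(n, segment, r_eq, c, t_buf):
    """Same chain split into segments of at most `segment` gates,
    with one buffer (fixed delay t_buf) between consecutive segments."""
    n_segments = math.ceil(n / segment)
    total = 0.0
    remaining = n
    for _ in range(n_segments):
        length = min(segment, remaining)
        total += chain_time_constant(length, r_eq, c)
        remaining -= length
    return total + max(n_segments - 1, 0) * t_buf


# Placeholder values (assumptions): R_eq = 10 kOhm, C = 2 fF, t_buf = 50 ps.
R_EQ, C_NODE, T_BUF = 10e3, 2e-15, 50e-12
unbuffered = chain_time_constant(8, R_EQ, C_NODE)
buffered = buffered_time_constant(8, 4, R_EQ, C_NODE, T_BUF)
print(f"unbuffered: {unbuffered * 1e12:.0f} ps, buffered: {buffered * 1e12:.0f} ps")
```

With these placeholder numbers, an 8-gate chain drops from 720 ps to 450 ps when split in two, because the quadratic term is paid on two short segments rather than one long one.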

A clock buffer was also incorporated in the schematic simulation to account for non-ideal clock inputs. Testing showed that the clock buffer resolved a problem in which an information bit was not properly captured when the clock transition occurred simultaneously with a transition of the input signal.

### 3.1 Sizing

Initially, the widths of the NMOS transistors were left at the default 800 nm, and the widths of the PMOS transistors were arbitrarily set 2-3 times larger than the NMOS, depending on their location within each design block. Parametric analysis was then used to vary the transistor widths in each design block, and it was found that increasing the widths generally decreased the rise times at the cost of increased power consumption, offsetting the benefit of reduced delay. After a lengthy analysis, it was concluded that the sizing of the components should be kept minimal (i.e. at 800 nm, or multiples of 800 nm when transistors were in series).

When several transistors were in series, their widths were set to $n$ times 800 nm, where $n$ is the number of series devices. This is because several devices in series, each with an effective channel length $L_{\text{eff}}$, can be viewed as a single device with a channel length equal to the combined channel lengths of the separate series devices [3]. For example, in the adder, a single device of channel length $3 L_{\text{eff}}$ can be used to model the behaviour of three series transistors, each with channel length $L_{\text{eff}}$. This is valid assuming there is no skew in the rising gate voltages of the three N pull-down devices, and that the source/drain junctions between the three devices are simple zero-resistance connections.
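The series-sizing rule can be written as a one-line calculation. In this sketch the 800 nm NMOS base width comes from the text, while the 1.6 µm PMOS base width is inferred from the adder entries in Table 5; the function name is illustrative.

```python
NMOS_BASE_NM = 800    # default NMOS width used in the design
PMOS_BASE_NM = 1600   # assumed PMOS base width, inferred from the adder sizes in Table 5


def series_width_nm(n_series, base_width_nm):
    """Width for each of n series transistors: n times the base width,
    compensating for the n-fold increase in effective channel length."""
    return n_series * base_width_nm


for n in (1, 2, 3):
    print(n, series_width_nm(n, NMOS_BASE_NM), series_width_nm(n, PMOS_BASE_NM))
```

For stacks of one, two and three devices this reproduces the adder widths of 800 nm, 1.6 µm and 2.4 µm for NMOS, and 1.6 µm, 3.2 µm and 4.8 µm for PMOS.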

Table 5 lists the final sizes of the transistors in each design block.

Table 5: Final Transistor Widths Sizes

| Design Block | NMOS Width Sizes | PMOS Width Sizes |
| :---: | :--- | :--- |
| Adder | $800\ \mathrm{nm}$, $1.6\ \mu\mathrm{m}$, $2.4\ \mu\mathrm{m}$ | $1.6\ \mu\mathrm{m}$, $3.2\ \mu\mathrm{m}$, $4.8\ \mu\mathrm{m}$ |
| Flip Flop | $800\ \mathrm{nm}$ | $800\ \mathrm{nm}$, $2.4\ \mu\mathrm{m}$ (for inverters) |
| And-gate | $800\ \mathrm{nm}$ | $800\ \mathrm{nm}$, $1.6\ \mu\mathrm{m}$ |
| Buffer | $720\ \mathrm{nm}$ | $1.8\ \mu\mathrm{m}$ |
| Inverter | $720\ \mathrm{nm}$ | $1.8\ \mu\mathrm{m}$ |

### 4.0 Pre-layout Simulation Results

| Parameter | Value |
| :--- | :--- |
| Flip-flop setup time | 80.0866 ps |
| Flip-flop delay time | 304.3 ps |
| Maximum frequency | 1.429 GHz |
| Worst-case power consumption | 2.241 mW |
| Worst-case delay | 1.354 ns |
| PDP (delay × power) | 3.03 pJ |

Figure 3 illustrates the pre-layout simulation response to signals A and B, where A is a 0011 signal and B is a 0101 signal.


Figure 3: Pre-layout Simulation Response to Signal A and B @ 1.429GHz

### 5.0 Layout of Sequential Four-Bit Adder

The layout stage transforms the schematic design into the actual physical sizes and orientations of the transistor circuit. A modular approach was used for the layout of the 4-bit adder. First, the most commonly sized NMOS and PMOS transistors were created ($W_{n}=800\ \mathrm{nm}$ and $W_{p}=1.6\ \mu\mathrm{m}$). These layouts were then used as templates and resized to the various widths required by the design blocks. A common layout methodology was also adopted. In the layout, active shapes for both NMOS and PMOS devices were placed horizontally, the polysilicon strips for gates and the metal drain connections were run vertically, and power bussing was run horizontally across the top and bottom of the layout. Furthermore, different metal layers were used for connections so that wires could cross (i.e. most of the pin routes were made with metal1, while $\mathrm{V}_{\mathrm{dd}}$ and $\mathrm{V}_{\mathrm{ss}}$ were routed with metal2), N and P source region extensions were diffused to $\mathrm{V}_{\mathrm{ss}}$ and $\mathrm{V}_{\mathrm{dd}}$ respectively, and the output wires were run horizontally for easy connection to neighboring circuits. Using different metal layers also helps isolate the signal lines from noise on the supply lines, as they are on different layers.

For the 1-bit adder layout, extra care was taken to make the schematic more symmetrical than those of the other design blocks in order to facilitate layout, as this design block required the largest number of transistors. Figure 4 illustrates the symmetrical rearrangement of the adder block.


Figure 4: Symmetrical Implementation of 1-bit Adder

After all the connections were made in each design block, the layout was optimized. This was achieved by compacting the cells horizontally (i.e. decreasing the space between individual cells) and then vertically. New routing schemes were often discovered during the optimization process.

Once all the design blocks were laid out, each block was extracted and a layout-versus-schematic (LVS) check was performed. After verifying that the extracted cells had the same terminals and netlists as the original schematics, the entire 4-bit adder was created. When the entire layout was finished, it was extracted and another LVS check was performed. The layouts of the individual design blocks, as well as of the entire 4-bit adder, can be found in the appendix.

### 6.0 Post-layout Simulation Results

| Parameter | Value |
| :--- | :--- |
| Flip-flop setup time | 86.3819 ps |
| Flip-flop delay time | 406.709 ps |
| Maximum frequency | 1.136 GHz |
| Worst-case power consumption | 2.332 mW |
| Worst-case delay | 1.311 ns |
| PDP (delay × power) | 3.057 pJ |

Figure 5 illustrates the post-layout simulation response to signals A and B, where A is a 0011 signal and B is a 0101 signal.


Figure 5: Post-layout Simulation Response to Signal A and B @ 1.136GHz
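The flip-flop figures reported in Sections 4.0 and 6.0 can be related to the clock period with a simple timing budget. The sketch below rests on two assumptions that this report does not state explicitly: that the usual single-cycle constraint (clock period ≥ flip-flop delay + logic delay + setup time) applies, and that "Flip-Flop Delay Time" is the clock-to-output delay. The function name is illustrative.

```python
def logic_budget_ps(f_max_ghz, t_setup_ps, t_ff_delay_ps):
    """Time left for combinational logic within one clock period, in ps."""
    period_ps = 1000.0 / f_max_ghz  # a 1 GHz clock has a 1000 ps period
    return period_ps - t_setup_ps - t_ff_delay_ps


# Values taken from the pre-layout and post-layout result listings.
pre = logic_budget_ps(1.429, 80.0866, 304.3)
post = logic_budget_ps(1.136, 86.3819, 406.709)
print(f"pre-layout logic budget:  {pre:.1f} ps")
print(f"post-layout logic budget: {post:.1f} ps")
```

Under these assumptions, the roughly 100 ps increase in flip-flop delay after layout accounts for a large share of the longer post-layout clock period.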

### 7.0 Conclusions

From the adopted design of the 4-bit adder, the pre-layout simulation yielded:

| Parameter | Value |
| :--- | :--- |
| Maximum frequency | 1.429 GHz |
| Worst-case power consumption | 2.241 mW |
| Worst-case delay | 1.354 ns |
| PDP (delay × power) | 3.03 pJ |

The FOM is calculated as:

$$
\mathrm{FOM}=\frac{\text{Frequency (GHz)}}{\text{Power } (\mu\mathrm{W})}
$$

The Figure of Merit for the pre-layout simulation design is $1.429 / 2241=0.000638$.

The post-layout simulation yielded:

| Parameter | Value |
| :--- | :--- |
| Maximum frequency | 1.136 GHz |
| Worst-case power consumption | 2.332 mW |
| Worst-case delay | 1.311 ns |
| PDP (delay × power) | 3.057 pJ |

The Figure of Merit for the post-layout simulation design is $1.136 / 2332=0.000487$.

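It can be concluded that the pre-layout and post-layout simulations yielded broadly similar results: after layout, the maximum frequency dropped from 1.429 GHz to 1.136 GHz (about 20%), power rose by about 4%, and the figure of merit fell from 0.000638 to 0.000487, while the design still met the 1 GHz operating-frequency requirement.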
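The figures quoted in this section can be reproduced with a short calculation. The sketch below uses only the frequency, power and delay values reported above; the function names are illustrative.

```python
def figure_of_merit(freq_ghz, power_mw):
    """FOM = frequency (GHz) / power (uW)."""
    return freq_ghz / (power_mw * 1000.0)


def pdp_pj(delay_ns, power_mw):
    """Power-delay product; ns * mW gives pJ directly."""
    return delay_ns * power_mw


results = {
    "pre-layout": {"freq_ghz": 1.429, "power_mw": 2.241, "delay_ns": 1.354},
    "post-layout": {"freq_ghz": 1.136, "power_mw": 2.332, "delay_ns": 1.311},
}

for name, r in results.items():
    fom = figure_of_merit(r["freq_ghz"], r["power_mw"])
    pdp = pdp_pj(r["delay_ns"], r["power_mw"])
    print(f"{name}: FOM = {fom:.6f}, PDP = {pdp:.3f} pJ")
```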

## References

[1] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice Hall, Upper Saddle River, NJ, 1996.

[2] V. Stojanovic, Latch and Flip Flop Design, http://www-classes.usc.edu/engr/ee-s/577bb/lect.10.2up.pdf

[3] R. W. Knepper, Dynamic Logic Circuits and Registers, http://people.bu.edu/rknepper/sc571/chapter5_ckts_C.ppt

## Appendix



Figure A-1: Characterization of Setup Time (Pre-layout)


Figure A-2: Characterization of Delay Time (Pre-layout)


Figure A-3: Characterization of Setup Time (Post-layout)


Figure A-4: Characterization of Delay Time (Post-layout)


Figure A-5: Test Bench for 4-Bit Adder


Figure A-6: Schematic of 4-Bit Adder


Figure A-7: Schematic of Transmission Gate Flip Flop


Figure A-8: Schematic of 1-Bit Adder


Figure A-9: Schematic of AND Gate


Figure A-10: Schematic of Buffer


Figure A-11: Schematic of Inverter


Figure A-12: Layout of 1-bit Adder


Figure A-13: Layout of Flip Flop


Figure A-14: Layout of AND Gate


Figure A-15: Layout of Buffer


Figure A-16: Layout of Inverter


Figure A-17: Layout of 4-Bit Adder

