Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Privacy Policy | Terms of Service | Contact Siksha Sarovar | About Siksha Sarovar

v4.0.9 · PWA
Siksha Sarovar logo
Siksha Sarovar
Your Learning Universe

Siksha Sarovar is a free e-learning platform for coding courses, BCA university notes and competitive exam preparation. Optional Google sign-in saves your learning progress across devices.

Initializing knowledge base…
Compiling modules 0%

Unit 2: Carry-Look-Ahead, Serial Adder & ALU Design

Lesson 8 of 20 in the free Digital Electronics-II notes on Siksha Sarovar, written by Rohit Jangra.

7.1 Why ripple-carry is the wrong adder for fast circuits

In an N-bit ripple-carry adder, Cout of stage i must propagate through every stage from 0 to i. So the worst-case carry path passes through 2 × N gates (two per full adder for carry generation). For N = 32 bits this is 64 gate delays — about 320 ns in old TTL. A modern CPU runs its adder in well under a nanosecond — we need a fundamentally different scheme.

7.2 The "generate" and "propagate" insight

For each bit position i, define two cheap signals:

    G_i  =  A_i AND B_i        // this stage GENERATES a carry no matter what
    P_i  =  A_i XOR B_i        // this stage PROPAGATES an incoming carry

Then the carry-out of any stage is

    C_{i+1} = G_i OR (P_i AND C_i)

The key trick: unrolling this recursion exposes every carry directly in terms of the primary inputs — no rippling!

    C_1 = G_0 + P_0·C_0
    C_2 = G_1 + P_1·G_0 + P_1·P_0·C_0
    C_3 = G_2 + P_2·G_1 + P_2·P_1·G_0 + P_2·P_1·P_0·C_0
    C_4 = G_3 + P_3·G_2 + P_3·P_2·G_1 + P_3·P_2·P_1·G_0 + P_3·P_2·P_1·P_0·C_0

Every carry now needs only 2 gate delays to settle — one for the AND tree, one for the OR — regardless of how many stages back the originating carry came from.

7.3 The 4-bit carry-look-ahead adder

A 4-bit CLA block has:

  • Four P / G generators (one per bit) — 1 gate delay.
  • A look-ahead block that computes C1, C2, C3, C4 in parallel — 2 gate delays.
  • Four final XORs that combine P_i with C_i to produce S_i — 1 gate delay.

Total worst-case delay: 4 gate delays, independent of width, for one CLA block.

7.4 Scaling beyond 4 bits — hierarchical CLA

To build a 16-bit or 64-bit adder, cascade 4-bit CLA blocks. Two levels of "block generate" / "block propagate" signals let the upper carries fly across the whole word in just a few more gate delays — that is how a 64-bit adder runs in ~6-7 gate levels.

The standard 4-bit CLA chip is the 74LS181 / 74181 ALU slice, and the look-ahead carry generator chip is the 74LS182. Cascading 74182s with 74181s gives you a fast 16-bit or 32-bit ALU out of two chips per slice.

7.5 The serial adder — fewest gates, slowest

The exact opposite of CLA: use one full adder plus a 1-bit carry flip-flop, and serially shift A and B into it bit-by-bit (LSB first):

Each clock cycle: the FA reads one bit from each operand and the stored carry, writes the Sum bit back into the shift register, and latches the new Cout into the carry flip-flop. After N clocks the sum is complete.

PropertyRipple parallel adderSerial adderCLA
GatesMany (4N)Very few (1 FA + 1 FF)Many (more than ripple)
Time2N gate delaysN clock cycles~ 4 gate delays
Use caseDefault for small NWristwatches, smart cardsHigh-speed CPUs

7.6 What an ALU actually is

An Arithmetic and Logic Unit is a combinational block that, given two operands and a "function select" code, can produce either an arithmetic result or a logical result.

The standard textbook example is the 74181 4-bit ALU, which has four function-select inputs (S3 S2 S1 S0) plus a Mode pin (M):

  • When M = 0 it is in arithmetic mode — sixteen functions like ADD, SUB, INC, DEC, NEG, …
  • When M = 1 it is in logic mode — sixteen functions like AND, OR, XOR, NAND, …

That gives 32 distinct operations from one chip, all in ~22 ns. The 74181 was the heart of every minicomputer ALU through the 1970s.

74181 partial function table (a sample of the 32)

MS3 S2 S1 S0Arithmetic (M=0)Logical (M=1)
0000ANOT A
0001A OR BNOT (A OR B)
0011minus 1logical 0
0110A minus B minus 1A XOR B
1001A plus BA XNOR B
1111Alogical 1

(There are 32 entries in total — the chip is essentially a 32-function ROM realised in TTL.)

7.7 Elementary ALU design — the one-bit slice approach

To design an N-bit ALU from scratch, build a 1-bit slice and then replicate it N times.

To go from 4 bits to 16 bits, cascade four slices; to go faster, replace ripple carry with CLA. That is conceptually the entire ALU of a simple microprocessor.

7.8 Worked example — choosing the right adder

Design a 32-bit unsigned adder that must complete in 5 ns. Which architecture?
  • Ripple carry: 32 stages × ~0.5 ns / stage ≈ 16 ns. Too slow.
  • Serial adder: 32 clock cycles. Way too slow.
  • Hierarchical CLA (4-bit blocks, 2 levels): ~ 4 × 0.5 ns = 2 ns. ✅

So the only architecture that hits the budget is a CLA. This is exactly the reasoning a CPU designer applies — and is why every modern CPU uses some form of CLA or Kogge-Stone tree adder.