GATE CSE · Computer Organization Architecture

Pipeline Processor – GATE CSE Computer Organization Architecture Practice Questions & PYQs

Master topic for Pipeline Processor. Includes Pipelining, Performance & Parallelism.

110 questions · 20 PYQs · 0 AI practice · GATE CSE 2027

Q41GATE 2008MCQ

Which of the following are NOT true in a pipelined processor? I. Bypassing can handle all RAW hazards II. Register renaming can eliminate all register carried WAR hazards III. Control hazard penalties can be eliminated by dynamic branch prediction

Q42GATE 2008MCQ

Delayed branching can help in the handling of control hazards. For all delayed conditional branch instructions, irrespective of whether the condition evaluates to true or false

📖 Explanation

In pipelined architectures, delayed branching reduces the cost of control hazards by using a branch delay slot. When a conditional branch is fetched, the instruction immediately following it in memory enters the pipeline before the branch condition is resolved. Rather than squashing that instruction, the architecture makes it part of the branch's behaviour: the instruction in the delay slot always executes, and the branch takes effect only after it. The instruction following the conditional branch in memory is therefore executed irrespective of whether the branch condition evaluates to true or false.

Q43GATE 2008MCQ

The performance of a pipelined processor suffers if:

📖 Explanation

Pipeline performance relies on maintaining a constant throughput of one instruction per cycle through balanced stages and uninterrupted flow.
If pipeline stages have unequal delays $T_i$ , the clock cycle time is constrained by $T_{clk} \geq \max(T_i)$ , causing idle time in faster stages and reducing efficiency.
Data dependencies create hazards that necessitate pipeline stalls or "bubbles" to ensure correctness, which prevents continuous instruction execution.
Structural hazards occur when resource contention arises from multiple instructions sharing hardware components, forcing the processor to delay access and stall the pipeline.
Since uneven delays, data hazards, and resource contention each force the processor to stall or operate inefficiently, all these factors collectively degrade overall performance.

Q44GATE 2008MCQ

Delayed branching can help in the handling of control hazards. The following code is to run on a pipelined processor with one branch delay slot:

I1: ADD R2 $\leftarrow$ R7 + R8
I2: SUB R4 $\leftarrow$ R5 - R6
I3: ADD R1 $\leftarrow$ R2 + R3
I4: STORE Memory[R4] $\leftarrow$ R1
BRANCH to Label if R1 == 0

Which of the instructions I1, I2, I3 or I4 can legitimately occupy the delay slot without any other program modification?

📖 Explanation

The branch condition relies on $R1$ , which is computed in $I_3$ , so $I_3$ cannot be moved to the delay slot without causing a data hazard.
$I_1$ ( $R2 \leftarrow R7 + R8$ ) writes $R2$, which $I_3$ reads, and $I_2$ ( $R4 \leftarrow R5 - R6$ ) writes $R4$, which $I_4$ reads, so neither can be moved past its consumer into the delay slot.
$I_4$ ( $Memory[R4] \leftarrow R1$ ) depends on the results of $I_2$ and $I_3$ , but it does not modify $R1$ or influence the branch condition itself.
Placing $I_4$ in the delay slot ensures $I_1$ , $I_2$ , and $I_3$ complete before the branch, maintaining correct state for both the branch evaluation and the store operation.
Thus, $I_4$ is the only instruction that can safely occupy the delay slot.
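
The dependency argument above can be checked mechanically. The following is a minimal sketch, assuming a simple register-level model in which each instruction is a tuple of name, destination register and source registers; the encoding and the helper name `can_fill_delay_slot` are illustrative and not part of the original question.

```python
# Each entry: (name, destination register or None, source registers).
# This encoding is an illustrative model of the code in the question.
program = [
    ("I1", "R2", {"R7", "R8"}),
    ("I2", "R4", {"R5", "R6"}),
    ("I3", "R1", {"R2", "R3"}),
    ("I4", None, {"R4", "R1"}),   # STORE writes memory, not a register
]
branch_reads = {"R1"}             # BRANCH to Label if R1 == 0


def can_fill_delay_slot(index, program, branch_reads):
    """True if program[index] can move past all later instructions and the branch."""
    _, dest, srcs = program[index]
    if dest is not None and dest in branch_reads:
        return False                       # the branch needs this result
    for _, later_dest, later_srcs in program[index + 1:]:
        if dest is not None and dest in later_srcs:
            return False                   # RAW: a later instruction reads our result
        if later_dest is not None and later_dest in srcs:
            return False                   # WAR: a later instruction overwrites our input
        if dest is not None and dest == later_dest:
            return False                   # WAW: the final write order would change
    return True


candidates = [program[i][0] for i in range(len(program))
              if can_fill_delay_slot(i, program, branch_reads)]
print(candidates)  # ['I4']
```

The model ignores memory dependencies, which is safe here only because the sequence contains a single memory operation.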

Q45GATE 2008MCQ

A non pipelined single cycle processor operating at 100 MHz is converted into a synchronous pipelined processor with five stages requiring 2.5 nsec, 1.5 nsec, 2 nsec, 1.5 nsec and 2.5 nsec, respectively. The delay of the latches is 0.5 nsec. The speedup of the pipeline processor for a large number of instructions is:

📖 Explanation

The non-pipelined cycle time $T_{non}$ is determined by the processor frequency:
$T_{non} = \frac{1}{100 \text{ MHz}} = 10 \text{ ns}$ .

The pipelined clock cycle time $T_{pipe}$ is determined by the maximum stage delay plus the latch delay:
$T_{pipe} = \max(2.5, 1.5, 2, 1.5, 2.5) + 0.5 = 2.5 + 0.5 = 3 \text{ ns}$ .

For a large number of instructions, the speedup $S$ is the ratio of the cycle times:
$S = \frac{T_{non}}{T_{pipe}} = \frac{10 \text{ ns}}{3 \text{ ns}} \approx 3.33$ .
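
As a quick numeric check of the calculation above, here is a short sketch; the variable names are illustrative.

```python
stage_delays_ns = [2.5, 1.5, 2.0, 1.5, 2.5]
latch_delay_ns = 0.5
clock_mhz = 100

t_non_ns = 1e3 / clock_mhz                           # 100 MHz -> 10 ns per instruction
t_pipe_ns = max(stage_delays_ns) + latch_delay_ns    # slowest stage plus latch
speedup = t_non_ns / t_pipe_ns
print(round(speedup, 2))  # 3.33
```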

Q46GATE 2007MCQ

The floating point unit of a processor using a design D takes 2t cycles compared to t cycles taken by the fixed point unit. There are two more design suggestions $D_1$ and $D_2$ . $D_1$ uses 30% more cycles for fixed point unit but 30% less cycles for floating point unit as compared to design D. $D_2$ uses 40% less cycles for fixed point unit but 10% more cycles for floating point unit as compared to design D. For a given program which has 80% fixed point operations and 20% floating point operations, which of the following ordering reflects the relative performances of three designs? ( $D_i$ > $D_j$ denotes that $D_i$ is faster than $D_j$ )

📖 Explanation

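For a program with $80\%$ fixed-point and $20\%$ floating-point operations, the average number of cycles per operation is the weighted sum of the two unit latencies. In $D_1$ the fixed-point unit takes $1.3t$ and the floating-point unit takes $0.7 \times 2t = 1.4t$; in $D_2$ they take $0.6t$ and $1.1 \times 2t = 2.2t$. The averages are: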

Design $D$ : $T_D = 0.8(t) + 0.2(2t) = 1.2t$
Design $D_1$ : $T_{D_1} = 0.8(1.3t) + 0.2(1.4t) = 1.04t + 0.28t = 1.32t$
Design $D_2$ : $T_{D_2} = 0.8(0.6t) + 0.2(2.2t) = 0.48t + 0.44t = 0.92t$
Since performance is inversely proportional to cycle count, we compare the results: $0.92t < 1.2t < 1.32t$ .
This indicates that $T_{D_2} < T_D < T_{D_1}$ , which corresponds to the performance ordering $D_2 > D > D_1$ .
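
The same weighted averages can be reproduced with exact fractions. This is a small sketch with $t$ normalised to 1; the helper name `avg_cycles` is illustrative.

```python
from fractions import Fraction as F


def avg_cycles(fixed, floating, fixed_share=F(8, 10), float_share=F(2, 10)):
    """Weighted average cycles per operation (in units of t)."""
    return fixed_share * fixed + float_share * floating


designs = {
    "D":  avg_cycles(F(1), F(2)),
    "D1": avg_cycles(F(13, 10), F(2) * F(7, 10)),    # +30% fixed, -30% floating
    "D2": avg_cycles(F(6, 10), F(2) * F(11, 10)),    # -40% fixed, +10% floating
}
ranking = sorted(designs, key=designs.get)           # fewest cycles first = fastest first
print(" > ".join(ranking))  # D2 > D > D1
```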

Q47GATE 2007MCQ

A processor takes 12 cycles to complete an instruction $I$ . The corresponding pipelined processor uses 6 stages with the execution times of 3, 2, 5, 4, 6 and 2 cycles respectively. What is the asymptotic speedup assuming that a very large number of instructions are to be executed?

📖 Explanation

The non-pipelined execution time is $T_{non} = 12$ cycles. In a pipelined processor, the clock cycle time ( $T_{clk}$ ) is determined by the stage with the maximum execution time, meaning $T_{clk} = \max(3, 2, 5, 4, 6, 2) = 6$ cycles. For a very large number of instructions, the asymptotic speedup ( $S_{\infty }$ ) is calculated as the ratio of the non-pipelined instruction execution time to the pipeline clock cycle time. Applying this, $S_{\infty } = \frac{T_{non}}{T_{clk}} = \frac{12}{6} = 2$ .

Q48GATE 2007MCQ

Consider a pipelined processor with the following four stages: IF: Instruction Fetch ID: Instruction Decode and Operand Fetch EX: Execute WB: Write Back The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions? ADD R2, R1, R0 (R2 $\leftarrow$ R1 + R0) MUL R4, R3, R2 (R4 $\leftarrow$ R3 * R2) SUB R6, R5, R4 (R6 $\leftarrow$ R5 - R4)

📖 Explanation

Let's trace the execution of the instructions cycle by cycle, considering operand forwarding and variable EX stage latencies:

I1: ADD R2, R1, R0 (EX stage is 1 cycle)
- CC1: I1.IF
- CC2: I1.ID
- CC3: I1.EX (R2 is available for forwarding at the end of CC3)
- CC4: I1.WB
I2: MUL R4, R3, R2 (EX stage is 3 cycles)
- CC2: I2.IF (starts after I1.IF)
- CC3: I2.ID (needs R2. I1 produces R2 at the end of CC3 and forwards it to I2's EX stage in CC4. No stall.)
- CC4: I2.EX (1/3)
- CC5: I2.EX (2/3)
- CC6: I2.EX (3/3) (R4 is available for forwarding at the end of CC6)
- CC7: I2.WB
I3: SUB R6, R5, R4 (EX stage is 1 cycle)
- CC3: I3.IF (starts after I2.IF)
- CC4: I3.ID (needs R4. R4 is being computed by I2.EX and will be ready at the end of CC6.)
- Since R4 is not available from I2 until the end of CC6, I3's EX stage must stall.
- CC5: I3.STALL (waiting for R4)
- CC6: I3.STALL (waiting for R4)
- CC7: I3.EX (R4 is forwarded from I2's EX stage that completed at the end of CC6)
- CC8: I3.WB

The last instruction (I3) completes its Write Back stage at clock cycle 8.

The final answer is $\boxed{8}$
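
The cycle-by-cycle trace can be reproduced with a small timing model. The sketch below assumes an in-order pipeline with one instruction per stage, and forwarding that lets a consumer start EX in the cycle after its producer finishes EX; the function name `total_cycles` and the tuple encoding are illustrative.

```python
def total_cycles(program, ex_latency):
    """program: list of (opcode, dest, sources). Returns the cycle of the last WB."""
    ex_done = {}          # register -> cycle in which its producer finishes EX
    prev_ex_end = 0
    prev_wb = 0
    for i, (op, dest, srcs) in enumerate(program):
        earliest = i + 3                       # IF in cycle i+1, ID in cycle i+2
        operands_ready = max((ex_done.get(r, 0) for r in srcs), default=0) + 1
        ex_start = max(earliest, prev_ex_end + 1, operands_ready)
        ex_end = ex_start + ex_latency[op] - 1
        wb = max(ex_end, prev_wb) + 1          # WB follows EX and stays in order
        if dest is not None:
            ex_done[dest] = ex_end
        prev_ex_end, prev_wb = ex_end, wb
    return prev_wb


latency = {"ADD": 1, "SUB": 1, "MUL": 3}
sequence = [
    ("ADD", "R2", ["R1", "R0"]),
    ("MUL", "R4", ["R3", "R2"]),
    ("SUB", "R6", ["R5", "R4"]),
]
print(total_cycles(sequence, latency))  # 8
```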

Q49GATE 2007MCQ

Data forwarding techniques can be used to speed up the operation in presence of data dependencies. Consider the following replacements of LHS with RHS. i. $R1\rightarrow Loc, Loc\rightarrow R2 \; \equiv \; R1\rightarrow R2, R1 \rightarrow Loc$ ii. $R1\rightarrow Loc, Loc\rightarrow R2 \; \equiv \; R1\rightarrow R2$ iii. $R1\rightarrow Loc, R2 \rightarrow Loc \; \equiv \; R1\rightarrow Loc$ iv. $R1\rightarrow Loc, R2 \rightarrow Loc \; \equiv \; R2\rightarrow Loc$ In which of the following options, will the result of executing the RHS be the same as executing the LHS irrespective of the instructions that follow ?

📖 Explanation

For equivalence i: The LHS performs $R1 \to Loc$ followed by $Loc \to R2$ , resulting in $Loc = R1_{old}$ and $R2 = R1_{old}$ . The RHS performs $R1 \to R2$ followed by $R1 \to Loc$ , resulting in $R2 = R1_{old}$ and $Loc = R1_{old}$ . Both match.
For equivalence ii: The LHS sets $Loc = R1_{old}$ and $R2 = R1_{old}$ , but the RHS only sets $R2 = R1_{old}$ . $Loc$ is not updated in the RHS, so they differ if $Loc$ is read later.
For equivalence iii: The LHS updates $Loc$ to $R2_{old}$ , whereas the RHS leaves $Loc = R1_{old}$ . They are not equivalent.
For equivalence iv: The LHS performs $R1 \to Loc$ then $R2 \to Loc$ , leaving $Loc = R2_{old}$ . The RHS performs $R2 \to Loc$ , also leaving $Loc = R2_{old}$ . Both match.
Therefore, i and iv are the correct equivalent pairs.
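
The four replacements can be spot-checked by running each side on a small machine state. This is a sketch, assuming arbitrary distinct starting values for R1, R2 and Loc; a single test state illustrates the argument but is not a proof.

```python
def run(ops, state):
    """Apply a list of (source, destination) moves to a copy of the state."""
    state = dict(state)
    for src, dst in ops:
        state[dst] = state[src]
    return state


start = {"R1": 10, "R2": 20, "Loc": 30}   # arbitrary distinct values (assumption)

pairs = {
    "i":   ([("R1", "Loc"), ("Loc", "R2")], [("R1", "R2"), ("R1", "Loc")]),
    "ii":  ([("R1", "Loc"), ("Loc", "R2")], [("R1", "R2")]),
    "iii": ([("R1", "Loc"), ("R2", "Loc")], [("R1", "Loc")]),
    "iv":  ([("R1", "Loc"), ("R2", "Loc")], [("R2", "Loc")]),
}
equivalent = [name for name, (lhs, rhs) in pairs.items()
              if run(lhs, start) == run(rhs, start)]
print(equivalent)  # ['i', 'iv']
```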

Q50GATE 2006MCQ

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement $X = (S - R * (P + Q))/T$ is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

\begin{array}{lll} \text{ADD} & \text{$R5,R0,R1$} & \text{$;R5$} \leftarrow \text{R0 + R1} \\ \text{MUL}& \text{$R6,R2,R5$} & \text{$;R6$} \leftarrow \text{R2 * R5} \\ \text{SUB} & \text{$R5,R3,R6$} & \text{$;R5$} \leftarrow \text{R3 -R6} \\ \text{DIV} &\text{$R6,R5,R4$} & \text{$;R6$} \leftarrow \text{R5/R4} \\ \text{STORE} &\text{$R6,X$}& \text{$;X$} \leftarrow \text{R6} \\ \end{array}

The IF, ID and WB stages take 1 clock cycle each. The EX stage takes 1 clock cycle each for the ADD, SUB and STORE operations, and 3 clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is

📖 Explanation

The pipeline dependencies and timing, given EX-to-ID forwarding, are:

$I_1$ (ADD) completes EX at $T_3$ .
$I_2$ (MUL) depends on $I_1$ , starts EX at $T_4$ , and finishes EX at $T_6$ (3 cycles).
$I_3$ (SUB) depends on $I_2$ , starts EX at $T_7$ , and finishes EX at $T_7$ (1 cycle).
$I_4$ (DIV) depends on $I_3$ , starts EX at $T_8$ , and finishes EX at $T_{10}$ (3 cycles).
$I_5$ (STORE) depends on $I_4$ , starts EX at $T_{11}$ , and completes WB at $T_{12}$ (1 cycle).

The sequence completes in 12 clock cycles.
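
Because every instruction here depends on the one before it, the EX stages run back to back, and the total can be cross-checked as fill time plus the sum of EX latencies plus the final write-back. The sketch below relies on that chain property, so it does not apply to sequences with independent instructions; the variable names are illustrative.

```python
ex_cycles = {"ADD": 1, "MUL": 3, "SUB": 1, "DIV": 3, "STORE": 1}
sequence = ["ADD", "MUL", "SUB", "DIV", "STORE"]

fill = 2                                         # IF and ID of the first instruction
execute = sum(ex_cycles[op] for op in sequence)  # EX stages run one after another
drain = 1                                        # WB of the last instruction
total = fill + execute + drain
print(total)  # 12
```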

Q51GATE 2006MCQ

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement $X = (S - R * (P + Q))/T$ is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

\begin{array}{lll} \text{ADD} & \text{$R5,R0,R1$} & \text{$;R5$} \leftarrow \text{R0 + R1} \\ \text{MUL}& \text{$R6,R2,R5$} & \text{$;R6$} \leftarrow \text{R2 * R5} \\ \text{SUB} & \text{$R5,R3,R6$} & \text{$;R5$} \leftarrow \text{R3 -R6} \\ \text{DIV} &\text{$R6,R5,R4$} & \text{$;R6$} \leftarrow \text{R5/R4} \\ \text{STORE} &\text{$R6,X$}& \text{$;X$} \leftarrow \text{R6} \\ \end{array}

The number of Read-After-Write (RAW) dependencies, Write-After-Read( WAR) dependencies, and Write-After-Write (WAW) dependencies in the sequence of instructions are, respectively,

📖 Explanation

Let the instructions be $I_1: ADD \ R5, R0, R1$ , $I_2: MUL \ R6, R2, R5$ , $I_3: SUB \ R5, R3, R6$ , $I_4: DIV \ R6, R5, R4$ , and $I_5: STORE \ R6, X$ .
RAW dependencies occur when an instruction reads a register written by a previous instruction: $I_1 \to I_2$ ( $R5$ ), $I_2 \to I_3$ ( $R6$ ), $I_3 \to I_4$ ( $R5$ ), and $I_4 \to I_5$ ( $R6$ ), totaling 4.
WAR dependencies occur when an instruction writes to a register that was read by a previous instruction: $I_2 \to I_3$ ( $R5$ read by $I_2$ , written by $I_3$ ) and $I_3 \to I_4$ ( $R6$ read by $I_3$ , written by $I_4$ ), totaling 2.
WAW dependencies occur when two instructions write to the same register: $I_1 \to I_3$ ( $R5$ ) and $I_2 \to I_4$ ( $R6$ ), totaling 2.
The number of RAW, WAR, and WAW dependencies are 4, 2, and 2, respectively.
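
The counts can be reproduced by a single pass over the sequence. The sketch below follows the counting convention used in the explanation above: each read is paired with its most recent writer, and each write with the previous writer and with the readers since that writer. The function name and the tuple encoding are illustrative.

```python
program = [
    ("I1", {"R5"}, {"R0", "R1"}),   # ADD
    ("I2", {"R6"}, {"R2", "R5"}),   # MUL
    ("I3", {"R5"}, {"R3", "R6"}),   # SUB
    ("I4", {"R6"}, {"R5", "R4"}),   # DIV
    ("I5", {"X"},  {"R6"}),         # STORE writes memory location X
]


def count_dependencies(program):
    last_writer = {}     # location -> index of its most recent writer
    readers = {}         # location -> instructions that read it since that write
    raw = war = waw = 0
    for idx, (_, writes, reads) in enumerate(program):
        raw += sum(1 for loc in reads if loc in last_writer)
        for loc in writes:
            war += len(readers.get(loc, []))   # earlier readers of the old value
            if loc in last_writer:
                waw += 1
            last_writer[loc] = idx
            readers[loc] = []
        for loc in reads:
            readers.setdefault(loc, []).append(idx)
    return raw, war, waw


print(count_dependencies(program))  # (4, 2, 2)
```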

Q52GATE 2006MCQ

A CPU has five-stages pipeline and runs at 1GHz frequency. Instruction fetch happens in the first stage of the pipeline. A conditional branch instruction computes the target address and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instructions following a conditional branch until the branch outcome is known. A program executes $10^{9}$ instructions out of which 20% are conditional branches. If each instruction takes one cycle to complete on average, then total execution time of the program is

📖 Explanation

The processor runs at a frequency of $1 \text{ GHz} = 10^9 \text{ cycles/second}$ .
The program executes $N = 10^9$ instructions. The base execution time, assuming one cycle per instruction, is $1 \text{ second}$ .
A conditional branch is fetched in stage 1 and resolved in stage 3, causing the pipeline to stall for $2$ cycles ( $S2$ and $S3$ ) for each branch.
The number of conditional branches is $20\% \times 10^9 = 0.2 \times 10^9$ .
Total stall cycles = $(\text{Branches}) \times (\text{Stall cycles per branch}) = (0.2 \times 10^9) \times 2 = 0.4 \times 10^9 \text{ cycles}$ .
Total cycles = $\text{Base cycles} + \text{Stall cycles} = 10^9 + 0.4 \times 10^9 = 1.4 \times 10^9 \text{ cycles}$ .
Execution time = $\frac{\text{Total cycles}}{\text{Frequency}} = \frac{1.4 \times 10^9 \text{ cycles}}{10^9 \text{ cycles/second}} = 1.4 \text{ seconds}$ .
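
The arithmetic above can be written out in a few lines. This is a sketch using integer cycle counts; the variable names are illustrative.

```python
frequency_hz = 10**9
instructions = 10**9
branches = instructions * 20 // 100        # 20% are conditional branches
resolve_stage = 3                          # branch outcome known in stage 3
stall_cycles = resolve_stage - 1           # no fetch while the branch is in stages 2 and 3

total_cycles = instructions + branches * stall_cycles
print(total_cycles / frequency_hz)  # 1.4
```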

Q53GATE 2005MCQ

We have two designs D1 and D2 for a synchronous pipeline processor. D1 has 5 pipeline stages with execution times of 3 nsec, 2 nsec, 4 nsec, 2 nsec and 3 nsec, while the design D2 has 8 pipeline stages each with 2 nsec execution time. How much time can be saved using design D2 over design D1 for executing 100 instructions?

📖 Explanation

The execution time for $N$ instructions in a pipeline with $k$ stages is given by the formula $T = (k + N - 1) \times T_{clk}$ .
For design $D1$ , $k_1 = 5$ and the clock cycle $T_{clk1} = \max(3, 2, 4, 2, 3) = 4$ nsec, resulting in $T_1 = (5 + 100 - 1) \times 4 = 416$ nsec.
For design $D2$ , $k_2 = 8$ and the clock cycle $T_{clk2} = 2$ nsec, resulting in $T_2 = (8 + 100 - 1) \times 2 = 214$ nsec.
The total time saved by $D2$ over $D1$ is calculated as $T_1 - T_2 = 416 - 214 = 202$ nsec.
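
The formula $T = (k + N - 1) \times T_{clk}$ translates directly into a helper. A minimal sketch, with the function name `pipeline_time` chosen for illustration:

```python
def pipeline_time(stage_delays_ns, n_instructions):
    """Time for n instructions when the clock is set by the slowest stage."""
    k = len(stage_delays_ns)
    t_clk = max(stage_delays_ns)
    return (k + n_instructions - 1) * t_clk


t1 = pipeline_time([3, 2, 4, 2, 3], 100)   # design D1
t2 = pipeline_time([2] * 8, 100)           # design D2
print(t1, t2, t1 - t2)  # 416 214 202
```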

Q54GATE 2005MCQ

A 5 stage pipelined CPU has the following sequence of stages: IF - Instruction fetch from instruction memory, RD - Instruction decode and register read, EX - Execute: ALU operation for data and address computation, MA - Data memory access (for write access, the register read at RD stage is used), WB - Register write back. Consider the following sequence of instructions:

I1: L R0, loc1; R0 <= M[loc1]
I2: A R0, R0; R0 <= R0 + R0
I3: S R2, R0; R2 <= R2 - R0

Let each stage take one clock cycle. What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of I1?

📖 Explanation

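To complete the instruction sequence, the pipeline must handle the load-use data dependency between $I_1$ and $I_2$ , which forces a one-cycle stall. The trace below assumes operand forwarding, so the loaded value reaches $I_2$ 's EX stage as soon as $I_1$ finishes MA, and $I_2$ 's result is forwarded to $I_3$ without a further stall.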

C1-C3: $I_1$ proceeds through IF, RD, EX. $I_2$ performs IF and RD. $I_3$ performs IF.
C4: $I_1$ performs MA. $I_2$ stalls due to the data dependency on $I_1$ . $I_3$ advances to RD.
C5: $I_1$ completes WB. $I_2$ proceeds to EX. $I_3$ stays in RD.
C6: $I_2$ proceeds to MA, and $I_3$ proceeds to EX.
C7: $I_2$ completes WB, and $I_3$ proceeds to MA.
C8: $I_3$ completes WB.

The sequence of instructions completes at the end of clock cycle 8.

Q55GATE 2004MCQ

In an enhancement of a design of a CPU, the speed of a floating point unit has been increased by 20% and the speed of a fixed point unit has been increased by 10%. What is the overall speedup achieved if the ratio of the number of floating point operations to the number of fixed point operations is 2:3 and the floating point operation used to take twice the time taken by the fixed point operation in the original design?

📖 Explanation

Let $t_{FIX, old} = \tau$ , so the original time for floating point operations is $t_{FP, old} = 2\tau$ . With operation counts $N_{FP} = 2k$ and $N_{FIX} = 3k$ , the original total time is $T_{old} = (2k)(2\tau) + (3k)(\tau) = 7k\tau$ .
The new operation times are $t_{FP, new} = \frac{2\tau}{1.2} = \frac{5\tau}{3}$ and $t_{FIX, new} = \frac{\tau}{1.1} = \frac{10\tau}{11}$ .
The new total time is $T_{new} = (2k)\left(\frac{5\tau}{3}\right) + (3k)\left(\frac{10\tau}{11}\right) = k\tau \left(\frac{10}{3} + \frac{30}{11}\right) = k\tau \left(\frac{110 + 90}{33}\right) = \frac{200k\tau}{33}$ .
The overall speedup is $\text{Speedup} = \frac{T_{old}}{T_{new}} = \frac{7k\tau}{(200/33)k\tau} = \frac{7 \times 33}{200} = \frac{231}{200} = 1.155$ .
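
The derivation can be verified with exact fractions, taking $\tau = 1$ and $k = 1$. This is a small sketch with illustrative variable names.

```python
from fractions import Fraction as F

n_fp, n_fix = 2, 3                 # operation counts in the ratio 2:3
t_fp, t_fix = F(2), F(1)           # floating point takes twice as long originally

t_old = n_fp * t_fp + n_fix * t_fix
# A unit that is 20% (or 10%) faster takes 1/1.2 (or 1/1.1) of the original time.
t_new = n_fp * t_fp / F(12, 10) + n_fix * t_fix / F(11, 10)
speedup = t_old / t_new
print(speedup, float(speedup))  # 231/200 1.155
```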

Q56GATE 2004MCQ

A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 nanoseconds respectively. Registers that are used between the stages have a delay of 5 nanoseconds each. Assuming constant clocking rate, the total time taken to process 1000 data items on this pipeline will be

📖 Explanation

The clock cycle time ( $T_c$ ) is determined by the slowest stage delay plus the register delay:
$T_c = \max(150, 120, 160, 140) + 5 = 160 + 5 = 165 \text{ ns}$
The total time to process $n=1000$ data items in a $k=4$ stage pipeline is given by:
$T_{total} = (k + n - 1) \times T_c$
Substituting the values:
$T_{total} = (4 + 1000 - 1) \times 165 = 1003 \times 165 = 165495 \text{ ns}$
Converting to microseconds:
$165495 \text{ ns} = 165.495 \mu s \approx 165.5 \mu s$

Q57GATE 2004MCQ

Consider a pipeline processor with 4 stages S1 to S4. We want to execute the following loop: for (i = 1; i < = 1000; i++) {I1, I2, I3, I4} where the time taken (in ns) by instructions I1 to I4 for stages S1 to S4 are given below:

\begin{array}{|c|c|c|c|c|} \hline & \textbf {$S _1$} &\textbf {$S _2$} & \textbf {$S _3$} & \textbf{$S _4$ } \\ \hline \textbf{I1}& \text{$1$} & \text{$2$} & \text{$1$} & \text{$2$} \\ \hline \textbf{I2} & \text{$2$} & \text{$1$} & \text{$2$} & \text{$1$}\\ \hline \textbf{I3}& \text{$1$} & \text{$1$} & \text{$2$} & \text{$1$} \\ \hline \textbf{I4} & \text{$2$} & \text{$1$} & \text{$2$} & \text{$1$} \\ \hline \end{array}

The output of I1 for i = 2 will be available after

📖 Explanation

The pipeline execution time for instruction $I$ in stage $S$ is calculated as $End(I, S) = \max(End(I, S_{prev}), End(I_{prev}, S)) + T(I, S)$ . For the first iteration ( $i=1$ ), the completion times $End(I_n, S_m)$ are:

$I_1: S_1=1, S_2=3, S_3=4, S_4=6$
$I_2: S_1=3, S_2=4, S_3=6, S_4=7$
$I_3: S_1=4, S_2=5, S_3=8, S_4=9$
$I_4: S_1=6, S_2=7, S_3=10, S_4=11$

For $i=2$ , $I_1$ follows $I_4$ from the previous iteration:

$End(I_1, S_1) = \max(0, End(I_4, S_1)) + 1 = 6 + 1 = 7$
$End(I_1, S_2) = \max(7, End(I_4, S_2)) + 2 = \max(7, 7) + 2 = 9$
$End(I_1, S_3) = \max(9, End(I_4, S_3)) + 1 = \max(9, 10) + 1 = 11$
$End(I_1, S_4) = \max(11, End(I_4, S_4)) + 2 = \max(11, 11) + 2 = 13$

The output of $I_1$ for $i=2$ is available at $13 \text{ ns}$ .
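
The recurrence $End(I, S) = \max(End(I, S_{prev}), End(I_{prev}, S)) + T(I, S)$ can be evaluated over two loop iterations in a few lines. This is a sketch of that recurrence only; the function name `completion_times` is illustrative.

```python
times = [            # rows I1..I4, columns S1..S4 (ns)
    [1, 2, 1, 2],
    [2, 1, 2, 1],
    [1, 1, 2, 1],
    [2, 1, 2, 1],
]


def completion_times(times, iterations):
    """End time of every instruction in every stage, in program order."""
    stream = times * iterations            # the loop body repeated
    stages = len(times[0])
    prev = [0] * stages                    # end times of the previous instruction
    ends = []
    for row in stream:
        cur = []
        for s in range(stages):
            ready = cur[s - 1] if s > 0 else 0      # same instruction, previous stage
            cur.append(max(ready, prev[s]) + row[s])
        ends.append(cur)
        prev = cur
    return ends


ends = completion_times(times, 2)
print(ends[3])       # [6, 7, 10, 11]  (I4, i = 1)
print(ends[4][-1])   # 13              (I1, i = 2)
```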

Q58GATE 2003MCQ

For a pipelined CPU with a single ALU, consider the following situations: 1. The (j + 1)-th instruction uses the result of the j-th instruction as an operand. 2. The execution of a conditional jump instruction. 3. The j-th and (j + 1)-th instructions require the ALU at the same time. Which of the above can cause a hazard?

📖 Explanation

Situation 1 creates a data hazard (RAW dependency), as the $(j+1)$ -th instruction requires the output of the $j$ -th instruction before it has been written back to the register file.
Situation 2 creates a control hazard, as the pipeline cannot determine the next instruction address to fetch until the branch condition is evaluated.
Situation 3 creates a structural hazard, because the single ALU cannot process requests from two different instructions simultaneously within the same clock cycle.
Since these scenarios represent the three fundamental causes of pipeline hazards (data, control, and structural), all three situations can interrupt the continuous flow of instructions.

Q59GATE 2002MCQ

The performance of a pipelined processor suffers if

📖 Explanation

In an ideal pipeline, stage delays are balanced and instructions are independent. However, unequal stage delays (Option A) force the clock period to be determined by the slowest stage, wasting time in faster stages. Dependent instructions (Option B) introduce data or control hazards, forcing the pipeline to stall or flush, which increases the average cycles per instruction ( $CPI$ ). Sharing hardware resources (Option C) creates structural hazards, forcing stalls when multiple instructions compete for the same unit. Since all these factors inevitably degrade throughput and efficiency, the performance suffers.

Q60MCQ

Comparing the time T1 taken for a single instruction on a pipelined CPU with time T2 taken on a non-pipelined but identical CPU, we can say that

📖 Explanation

For a non-pipelined CPU, the time taken for one instruction, $T_2$ , is the total propagation delay of the combinational logic required for the instruction. In a pipelined CPU, the instruction passes through $N$ stages, and the time for one instruction is $T_1 = N \times T_{clock}$ , where $T_{clock}$ is the cycle time. The cycle time is $T_{clock} = T_{max\_stage} + T_{overhead}$ , where $T_{overhead}$ accounts for the latch/register delays between stages. Every stage is stretched to the delay of the slowest stage and also pays the latch overhead, so $N \times T_{clock}$ is at least the sum of the individual stage delays, which is $T_2$ . Consequently, $T_1 \geq T_2$ .
