# Questions & Answers on Instruction Pipelining

Question No. 50

Instruction execution in a processor is divided into 5 stages: Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Execute (EX), and Write Back (WB). These stages take 5, 4, 20, 10, and 3 nanoseconds (ns) respectively. A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementations of the processor are contemplated:

(i) a naive pipeline implementation (NP) with 5 stages and
(ii) an efficient pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns and 8 ns respectively.

The speedup (correct to two decimal places) achieved by EP over NP in executing 20 independent instructions with no hazards is __________.
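
One way to check the arithmetic is the sketch below. It assumes the usual textbook model: the cycle time is the slowest stage plus the 2 ns buffer delay, and n hazard-free instructions on a k-stage pipeline take k + n - 1 cycles. It also assumes OF2 takes the remaining 8 ns of the 20 ns OF stage. The helper name `pipeline_time` is illustrative, not part of the question.

```python
def pipeline_time(stage_ns, buffer_ns, n_instructions):
    """Total time (ns) for n hazard-free instructions on a linear pipeline."""
    cycle = max(stage_ns) + buffer_ns              # slowest stage plus buffer delay
    cycles = len(stage_ns) + n_instructions - 1    # fill the pipe once, then one result per cycle
    return cycle * cycles

# NP: the original five stages.
np_time = pipeline_time([5, 4, 20, 10, 3], buffer_ns=2, n_instructions=20)
# EP: OF split into OF1 = 12 ns and OF2 = 8 ns (the 8 ns value is an assumption: 20 - 12).
ep_time = pipeline_time([5, 4, 12, 8, 10, 3], buffer_ns=2, n_instructions=20)

speedup = np_time / ep_time
print(np_time, ep_time, round(speedup, 2))  # 528 350 1.51
```

Under these assumptions NP needs 24 cycles of 22 ns and EP needs 25 cycles of 14 ns.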

Question No. 143

Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies $\tau_1$, $\tau_2$, and $\tau_3$ such that $\tau_1 = \frac{3\tau_2}{4} = 2\tau_3$. If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is _________ GHz, ignoring delays in the pipeline registers.
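
A minimal sketch of the reasoning, assuming the clock period equals the longest stage latency (register delays are ignored, as stated). All latencies are expressed in units of $\tau_1$, so only ratios matter; the variable names are illustrative.

```python
from fractions import Fraction

old_freq_ghz = Fraction(3)

# tau_1 = 3*tau_2/4 = 2*tau_3, written in units of tau_1.
tau1 = Fraction(1)
tau2 = Fraction(4, 3) * tau1
tau3 = tau1 / 2

old_stages = [tau1, tau2, tau3]
longest = max(old_stages)                       # tau_2 sets the 3 GHz clock

# Replace the longest stage with two halves of equal latency.
new_stages = [s for s in old_stages if s != longest] + [longest / 2, longest / 2]

# Frequency is inversely proportional to the slowest stage.
new_freq_ghz = old_freq_ghz * longest / max(new_stages)
print(new_freq_ghz)  # 4
```

After the split, $\tau_1$ becomes the slowest stage, so the period shrinks by a factor of 3/4.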

Question No. 55

Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipelined delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speed up achieved in this pipelined processor is _____.
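
The comparison can be sketched as time per instruction = CPI / clock rate. The sketch assumes that a stall-free pipeline reaches a steady-state CPI of 1, which the question implies but does not state outright.

```python
def time_per_instruction_ns(clock_ghz, cpi):
    """Average time per instruction in ns (1 GHz corresponds to a 1 ns cycle)."""
    return cpi / clock_ghz

non_pipelined = time_per_instruction_ns(clock_ghz=2.5, cpi=4)   # 1.6 ns
pipelined = time_per_instruction_ns(clock_ghz=2.0, cpi=1)       # 0.5 ns, assuming CPI = 1

speedup = non_pipelined / pipelined
print(round(speedup, 2))  # 3.2
```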

Question No. 53

Consider a 6-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles is ______________________.
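
A short sketch of the usual formula, speedup = pipeline depth / (1 + average stall cycles per instruction). It assumes the non-pipelined machine spends the equivalent of 6 pipeline cycles on every instruction, which follows from balanced stages and zero overhead.

```python
stages = 6
stall_fraction = 0.25    # share of instructions that stall
stall_cycles = 2         # stall cycles for each such instruction

effective_cpi = 1 + stall_fraction * stall_cycles   # 1.5 cycles per instruction
speedup = stages / effective_cpi
print(speedup)  # 4.0
```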

Question No. 219

Consider the following processors (ns stands for nanoseconds). Assume that the pipeline registers have zero latency.

P1: Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns.
P2: Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns.
P3: Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns.
P4: Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns.

Which processor has the highest peak clock frequency?
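
Since the pipeline registers have zero latency, the peak clock frequency is the reciprocal of the slowest stage. A small sketch of that comparison (the dictionary layout is just one convenient representation):

```python
latencies_ns = {
    "P1": [1, 2, 2, 1],
    "P2": [1, 1.5, 1.5, 1.5],
    "P3": [0.5, 1, 1, 0.6, 1],
    "P4": [0.5, 0.5, 1, 1, 1.1],
}

# Peak frequency in GHz is 1 / (slowest stage latency in ns).
peak_ghz = {name: 1 / max(stages) for name, stages in latencies_ns.items()}
fastest = max(peak_ghz, key=peak_ghz.get)

for name, ghz in peak_ghz.items():
    print(name, round(ghz, 3))
print("highest peak frequency:", fastest)  # P3
```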

Question No. 253

An instruction pipeline has five stages, namely, instruction fetch (IF), instruction decode and register fetch (ID/RF), instruction execution (EX), memory access (MEM), and register writeback (WB) with stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns, respectively (ns stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the ID/RF stage into three stages (ID, RF1, RF2) each of latency 2.2/3 ns. Also, the EX stage is split into two stages (EX1, EX2) each of latency 1 ns. The new design has a total of eight pipeline stages. A program has 20% branch instructions which execute in the EX stage and produce the next instruction pointer at the end of the EX stage in the old design and at the end of the EX2 stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average CPI of one in both the designs. The execution times of this program on the old and the new design are P and Q nanoseconds, respectively. The value of P/Q is __________.
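
The ratio can be sketched as follows, under these assumptions: the cycle time is the slowest stage latency, and a branch resolved at the end of stage number r (1-based) keeps IF idle for r - 1 cycles. With per-instruction averages the instruction count cancels, so P/Q equals the ratio of average times per instruction. The function name is illustrative.

```python
def time_per_instruction_ns(stage_ns, resolve_stage, branch_fraction=0.2):
    """Average time per instruction when IF stalls on every branch.

    resolve_stage is the 1-based position of the stage that produces the
    next instruction pointer; IF idles for resolve_stage - 1 cycles.
    """
    cycle_ns = max(stage_ns)
    cpi = 1 + branch_fraction * (resolve_stage - 1)
    return cpi * cycle_ns

old = [1, 2.2, 2, 1, 0.75]                            # IF, ID/RF, EX, MEM, WB
new = [1, 2.2 / 3, 2.2 / 3, 2.2 / 3, 1, 1, 1, 0.75]   # IF, ID, RF1, RF2, EX1, EX2, MEM, WB

P = time_per_instruction_ns(old, resolve_stage=3)   # EX is stage 3: CPI 1.4 at 2.2 ns
Q = time_per_instruction_ns(new, resolve_stage=6)   # EX2 is stage 6: CPI 2.0 at 1 ns
print(round(P / Q, 2))  # 1.54
```

The deeper pipeline has a shorter cycle but pays 5 stall cycles per branch instead of 2.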

Question No. 20

Register renaming is done in pipelined processors

Question No. 41

Consider an instruction pipeline with four stages (S1, S2, S3 and S4), each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure.

What is the approximate speedup of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipelined implementation?

Question No. 33

A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

| Instruction | Meaning of instruction |
|---|---|
| I0: MUL R2, R0, R1 | R2 ← R0 * R1 |
| I1: DIV R5, R3, R4 | R5 ← R3 / R4 |
| I2: ADD R2, R5, R2 | R2 ← R5 + R2 |
| I3: SUB R5, R2, R6 | R5 ← R2 - R6 |
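
A hedged sketch of one way to count the cycles. It models an in-order pipeline in which each stage holds one instruction at a time and an instruction enters a stage once it has left the previous stage and the preceding instruction has left this one. With operand forwarding, a dependent instruction can start PO right after its producer finishes PO, which for this particular sequence coincides with that in-order constraint. The function name and the latency table are illustrative.

```python
def total_cycles(stage_cycles):
    """stage_cycles[i][s] = cycles instruction i (program order) spends in stage s."""
    prev_done = [0] * len(stage_cycles[0])   # when the previous instruction left each stage
    for durations in stage_cycles:
        done, ready = [], 0                  # ready = when this instruction left its last stage
        for s, d in enumerate(durations):
            start = max(ready, prev_done[s])
            ready = start + d
            done.append(ready)
        prev_done = done
    return prev_done[-1]

PO_CYCLES = {"ADD": 1, "SUB": 1, "MUL": 3, "DIV": 6}
program = ["MUL", "DIV", "ADD", "SUB"]       # I0..I3

# Stage order: IF, ID, OF, PO, WO.
cycles = total_cycles([[1, 1, 1, PO_CYCLES[op], 1] for op in program])
print(cycles)  # 15
```

In this model DIV occupies PO during cycles 7 to 12, so ADD and SUB perform their operations in cycles 13 and 14 and the last write completes in cycle 15.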

Question No. 37

The program below uses six temporary variables a, b, c, d, e, f.
a = 1
b = 10
c = 20
d = a + b
e = c + d
f = c + e
b = c + e
e = b + f
d = 5 + e
return d + f

Assuming that all operations take their operands from registers, what is the minimum number of registers needed to execute this program without spilling?
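
One common way to reason about this is liveness analysis: for straight-line code, the number of registers needed equals the largest number of variables that are live at the same time. The sketch below assumes the statements are executed exactly as written (no reordering or constant folding) and that the constant 5 does not occupy a register. The tuple encoding of the program is an illustrative choice.

```python
# Each statement is (defined variable, variables it reads); None marks the return.
program = [
    ("a", []), ("b", []), ("c", []),
    ("d", ["a", "b"]),
    ("e", ["c", "d"]),
    ("f", ["c", "e"]),
    ("b", ["c", "e"]),
    ("e", ["b", "f"]),
    ("d", ["e"]),            # d = 5 + e
    (None, ["d", "f"]),      # return d + f
]

def max_live(statements):
    """Walk backwards, tracking which variables are still needed later."""
    live, peak = set(), 0
    for dest, uses in reversed(statements):
        live.discard(dest)       # the value is created here, so it is not needed earlier
        live.update(uses)        # operands must be available just before the statement
        peak = max(peak, len(live))
    return peak

registers = max_live(program)
print(registers)  # 3
```

The peak occurs, for example, just before `b = c + e`, where c, e and f are all still needed.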

Question No. 28

Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:

| | S1 | S2 | S3 | S4 |
|---|---|---|---|---|
| I1 | 2 | 1 | 1 | 1 |
| I2 | 1 | 3 | 2 | 2 |
| I3 | 2 | 1 | 1 | 3 |
| I4 | 1 | 2 | 2 | 2 |

What is the number of cycles needed to execute the following loop?

for (i=1 to 2) {I1; I2; I3; I4;}
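
A minimal sketch for counting the cycles, assuming the loop is simply unrolled into eight instructions with no loop overhead, each stage holds one instruction at a time, and an instruction may wait between stages until the next stage is free. The function name is illustrative.

```python
def total_cycles(stage_cycles):
    """stage_cycles[i][s] = cycles instruction i (program order) needs in stage s."""
    prev_done = [0] * len(stage_cycles[0])   # when the previous instruction left each stage
    for durations in stage_cycles:
        done, ready = [], 0
        for s, d in enumerate(durations):
            ready = max(ready, prev_done[s]) + d   # wait for own progress and for the stage
            done.append(ready)
        prev_done = done
    return prev_done[-1]

body = [
    [2, 1, 1, 1],   # I1
    [1, 3, 2, 2],   # I2
    [2, 1, 1, 3],   # I3
    [1, 2, 2, 2],   # I4
]

cycles = total_cycles(body * 2)   # two iterations of the loop
print(cycles)  # 23
```

The first iteration alone finishes in 15 cycles under this model; the second iteration overlaps with it and adds 8 more.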

Question No. 36

Which of the following are NOT true in a pipelined processor?
I. Bypassing can handle all RAW hazards
II. Register renaming can eliminate all register carried WAR hazards
III. Control hazard penalties can be eliminated by dynamic branch prediction

Question No. 37

Consider a pipelined processor with the following four stages:

IF: Instruction Fetch
ID: Instruction Decode and Operand Fetch
EX: Execute
WB: Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

| Instruction | Meaning |
|---|---|
| ADD R2, R1, R0 | R2 ← R1 + R0 |
| MUL R4, R3, R2 | R4 ← R3 * R2 |
| SUB R6, R5, R4 | R6 ← R5 - R4 |
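
To visualise the timing, the sketch below builds a per-cycle chart for the three instructions. It assumes an in-order pipeline with one instruction per stage, and that forwarding lets a dependent instruction enter EX in the cycle right after its producer leaves EX, which here matches the in-order constraint. The names `schedule`, `STAGES` and `EX_CYCLES` are illustrative.

```python
def schedule(stage_cycles):
    """Return, per instruction, a list of (start, end) cycle intervals, one per stage."""
    prev_end = [0] * len(stage_cycles[0])
    table = []
    for durations in stage_cycles:
        row, ready = [], 0
        for s, d in enumerate(durations):
            start = max(ready, prev_end[s])   # wait for own progress and for the stage
            ready = start + d
            row.append((start, ready))
        prev_end = [end for _, end in row]
        table.append(row)
    return table

STAGES = ["IF", "ID", "EX", "WB"]
EX_CYCLES = {"ADD": 1, "MUL": 3, "SUB": 1}
program = ["ADD", "MUL", "SUB"]

table = schedule([[1, 1, EX_CYCLES[op], 1] for op in program])
total = table[-1][-1][1]                      # end of the last instruction's WB

for op, row in zip(program, table):
    cells = ["--"] * total                    # "--" marks cycles not spent in any stage
    for name, (start, end) in zip(STAGES, row):
        for cycle in range(start, end):
            cells[cycle] = name
    print(f"{op:4}", " ".join(cells))
print("total cycles:", total)
```

Under these assumptions MUL holds EX for cycles 4 to 6, SUB executes in cycle 7, and the sequence completes in 8 cycles.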