GATE Questions & Answers of Computer Organization and Architecture Computer Science and Information Technology

The size of the physical address space of a processor is 2P bytes. The word length is 2W bytes. The capacity of cache memory is 2N bytes. The size of each cache block is 2M words. For a K-way set-associative cache memory, the length (in number of bits) of the tag field is

The instruction pipeline of a RISC processor has the following stages: Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Writeback (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions. In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards.
The number of clock cycles required for completion of execution of the sequence of instructions is ______.

 

A processor has 16 integer registers (R0, R1, .. , R15) and 64 floating point registers (F0, F1,… , F63). It uses a 2-byte instruction format. There are four categories of instructions:Type-1, Type-2, Type-3, and Type-4. Type-1 category consists of four instructions, each with 3 integer register operands (3Rs). Type-2 category consists of eight instructions, each with 2 floating point register operands (2Fs). Type-3 category consists of fourteen instructions, each with one integer register operand and one floating point register operand (1R+1F). Type-4 category consists of N instructions, each with a floating point register operand (1F).
 
The maximum value of N is __________.

When two 8-bit  number A7....A0  and  B7 ..... B0  in 2's  complement representation (with A0 and B0 as the least significant bits) are added using ripple-carry adder, the sum bits obtained are S7.....S0 and the  carry bits are C7....C0 . An overflow is said to have occured if

Consider a two-level cache hierarchy with L1 and L2 caches. An application incurs 1.4 memory accesses per instruction on average. For this application, the miss rate of L1 cache is 0.1; the L2 cache experiences, on average, 7 misses per 1000 instructions. The miss rate L2 expressed correct to two decimal places is _________.

Consider a RISC machine where each instruction is exactly 4 bytes long. Conditional and unconditional branch instruction use PC-relative addressing mode with Offset specified in bytes to the target location of the branch instruction. Further the Offset is always with respect to the address of the next instruction in the program sequence. Consider the following instruction sequence.

Instr. No. Instruction
i   :     add  R2,  R3,  R4
i+1 :     sub  R5,  R6,  R7
i+2 :     cmp  R1,  R9,  R10
i+3 :     beq  R1,  offset
If the target of the branch instruction is i, then the decimal value of the offset is ______

Instruction execution in a processor is divided into 5 stage, Instruction Fetch (IF), Instruction decode (ID), Operand Featch (OF), Execute (EX), and Write Back (WB). These stages take 5, 4, 20, 10,and 3 nanoseconds (ns) respectively. A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementations of the processor are contemplated;
 
(i) a naive pipeline implementation (NP) with 5 stages and
(ii) an efficiant pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns respectively.
 
The speedup (correct to two decimal places) achived by EP over NP in executing 20 independent instructionswith no hazards is __________.

Consider a 2-way set associative cache with 256 blocks and uses LRU replacement, Initially the cache is empty. Conflict misses are those misses which occur due to contention of multiple blocks for the same cache set. Compulsory misses occur due to first time access to the block. The following sequence of accesse to memory blocks

(0, 128, 256, 128, 0, 128, 256, 128, 1, 129, 257, 129, 1, 129, 257, 129)

is repeated 10 times. The number of conflict misses experinced by the cache is ____________ . 

A cache memory unit with capacity of N words and block size of B words is to be designed. If it is designed as a direct mapped cache, the length of the TAG field is 10 bits. If the cache unit is now designed as a 16-way set-associative cache, the length of the TAG field is ______ bits.

In a two-level cache system, the access times of L1 and L2 caches are 1 and 8 clock cycles, respectively. The miss penalty from the L2 cache to main memory is 18 clock cycles. The miss rate of L1 cache is twise that of L2. The average memory access time (AMAT) of this cache system is 2 cycles. The miss rate of L1 and L2 respectively are:

The read access times and the hit ratio for different caches in a memory hierachy are as given below.

Cache

Read access time

(in nanosecond)

Hit ratio
I-cache 2 0.8
D-cache 2 0.9
L2-cache 8 0.9

The read access time of main memory is 90 nanoseconds. Assume that the caches use the referred-word-first read policy and the write back policy. Assume that all the cache are direct mapped caches. Assume that the dirty bit is always 0 for all the blocks in the caches. In execution of a program, 60% of memory read are for instruction fetch and 40% are for memory operand fetch. The average read access time in nanoseconds (up to 2 decimal place) is ____________ .  

Consider a machine with a byte addreassable main memory of $ 2^{32} $ bytes divided into blocks of size 32 bytes. Assume that a direct mapped cache having 512 cache line is used with this machine. The size of the tag filed in bites is __________.

A processor can support a maximum memory of 4GB, where the memory is word-addressable (a word consists of two bytes). The size of the address bus of the processor is at least bits.

The size of the data count register of a DMA controller is 16 bits. The processor needs to transfer a file of 29, 154 kilobytes from disk to main memory. The memory is byte addressable. The minimum number of times the DMA controller needs to get the control of the system bus from the processor to transfer the file from the disk to main memory is ____________.

The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first tage(with delay 800 picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. The throughput increase of the pipeline is___________ percent.

Consider a carry lookahead adder for adding two n-bit integers, built using gates of fan-in at most two. The time to perform addition using this adder is

A processor has 40 distinct instructions and 24 general purpose registers. A 32-bit instruction word has an opcode, two register operands and an immediate operand. The number of bits available for the immediate operand field is __________.

Consider a processor with 64 registers and an instruction set of size twelve. Each instruction has five distinct fields, namely, opcode, two source register identifiers, one destination register identifier, and a twelve-bit immediate value. Each instruction must be stored in memory in a byte-aligned fashion. If a program has 100 instructions, the amount of memory (in bytes) consumed by the program text is __________ .

The width of the physical address on a machine is 40 bits. The width of the tag field in a 512 KB 8-way set associative cache is _______ bits.

Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies $\style{font-family:'Times New Roman'}{\tau_1\;,\;\tau_2}$, and $\style{font-family:'Times New Roman'}{\tau_3\;}$ such that $\style{font-family:'Times New Roman'}{\tau_1=3\tau\;=\;3\tau_2/4=2\tau_3}$. If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is _________ GHz, ignoring delays in the pipeline registers.

For computer based on three-address instruction formats, each address feild can be used to specify which of the following:
(S1) A memory operand
(S2) A processor register
(S3) An implied accumulator register

Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipelined delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speed up achieved in this pipelined processor is _____.

Consider the sequence of machine instructions given below.

                                MUL R5, R0, R1
                                DIV R6, R2, R3
                                ADD R7, R5, R6
                                SUB R8, R7, R4

In the above sequence, R0 to R8 are general purpose registers. In the instruction shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode(IF), (2) Operand Fetch (OF), (3) Perform Operation(PO) and (4) Write back the result (WB). The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD or SUB instuction, 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is _________.

Consider a processor with byte-addressable memory. Assume that all registers, including Program Counter (PC) and Program Status Word (PSW), are of size 2 bytes. A stack in the main memory is implemented from memory location (0100)16 and it grows upward. The stack pointer(SP) points to the top element of the stack. The current value of SP is (016E)16. The CALL instruction is of two words, the first is the op-code and the second word is the starting address of the subroutine(one word = 2 bytes). The CALL instruction is implemented as follows:

  • Store the current value of PC in the stack
  • Store the value of PSW register in the stack
  • Load the starting address of the subroutine in PC


The content of PC just before the fetch of CALL instruction is (5FA0)16. After execution of the CALL instruction, the value of the stack pointer is

Consider a machine with a byte addressable main memory of 220 bytes, block size of 16 bytes and a direct mapped cache having 212 cache lines. Let the addresses of two consecutive bytes in main memory be (E201F)16 and (E2020)16. What are the tag and cache line address (in hex) for main memory address (E201F)16?

Consider the following reservation table for a pipeline having three stages S1, S2, and S3.

  Time →
 
1 2 3 4 5
S1
S2
S3
X       X
  X   X  
    X    

 

The minimum average latency (MAL) is _____.

 

Consider the following code sequence having five instructions I1 to I5. Each of these instructions has the following format.
OP Ri, Rj, Rk
Where operation Op is performed on contents of registers Rj and Rk and the result is stored in register Ri.
I1: ADD R1, R2, R3
I2: MUL R7., R1, R3
I3: SUB R4, R1, R5
I4: ADD R3, R2, R4
I5: MUL R7, R8, R9

Consider the following three statements.

S1:There is an anti-dependence between instruction I2 and I5
S2:There is an anti-dependence between instructions I2 and I4
S3:Within an instruction pipeline an anti-dependence always creates one or more stalls

Which one of above statements is/are correct?

A machine has a 32-bit architecture, with 1-word long instructions. It has 64 registers, each of which is 32 bits long. It needs to support 45 instructions, which have an immediate operand in addition to two register operands. Assuming that the immediate operand is an unsigned integer, the maximum value of the immediate operand is ____________.

Consider a 6-stage instruction pipeline, where all stages are perfectly balanced.Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles is ______________________.

An access sequence of cache block addresses is of length N and contains n unique block addresses.The number of unique block addresses between two consecutive acceses to the same block address is bounded above by k. What is the miss ratio if the access sequence is passed through a cache of  associativity A ≥ k exercising least-recently-used replacement policy?

Consider two processors p1 and p2 executing the same instruction set. Assume that under identicalv conditions, for the same input, a program running on p2 takes 25% less time but incurs 20% more CPI (clock cycles per instruction) as compared to the program running on p2. If the clock frequency of p1 is 1GHz, then the clock frequency of p2 (in GHz) is _________.

A 4-way set-associative cache memory unit with a capacity of 16 KB is built using a block size of 8 words. The word length is 32 bits. The size of the physical address space is 4 GB. The number of bits for the TAG field is _____

In designing a computer’s cache system, the cache block (or cache line) size is an important parameter. Which one of the following statements is correct in this context?

If the associativity of a processor cache is doubled while keeping the capacity and block size unchanged, which one of the following is guaranteed to be NOT affected?

The value of a float type variable is represented using the single-precision 32-bit floating point format of IEEE-754 standard that uses 1 bit for sign, 8 bits for biased exponent and 23 bits for mantissa. A float type variable X is assigned the decimal value of -14.25. The representation of X in hexadecimal notation is

Consider a main memory system that consists of 8 memory modules attached to the system bus, which is one word wide. When a write request is made, the bus is occupied for 100 nanoseconds (ns) by the data, address, and control signals. During the same 100 ns, and for 500 ns thereafter, the addressed memory module executes one cycle accepting and storing the data. The (internal) operation of different memory modules may overlap in time, but only one request can be on the bus at any time. The maximum number of stores (of one word each) that can be initiated in 1 millisecond is ____________

Consider the following processors (ns stands for nanoseconds). Assume that the pipeline registers have zero latency.

P1: Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns.
P2: Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns.
P3: Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns.
P4: Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns.

Which processor has the highest peak clock frequency?

An instruction pipeline has five stages, namely, instruction fetch (IF), instruction decode and register fetch (ID/RF), instruction execution (EX), memory access (MEM), and register writeback (WB) with stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns, respectively (ns stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the ID/RF stage into three stages (ID, RF1, RF2) each of latency 2.2/3 ns. Also, the EX stage is split into two stages (EX1, EX2) each of latency 1 ns. The new design has a total of eight pipeline stages. A program has 20% branch instructions which execute in the EX stage and produce the next instruction pointer at the end of the EX stage in the old design and at the end of the EX2 stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average CPI of one in both the designs. The execution times of this program on the old and the new design are P and Q nanoseconds, respectively. The value of P/Q is __________.

The memory access time is 1 nanosecond for a read operation with a hit in cache, 5 nanoseconds for a read operation with a miss in cache, 2 nanoseconds for a write operation with a hit in cache and 10 nanoseconds for a write operation with a miss in cache. Execution of a sequence of instructions involves 100 instruction fetch operations, 60 memory operand read operations and 40 memory operand write operations. The cache hit-ratio is 0.9. The average memory access time (in nanoseconds) in executing the sequence of instructions is __________.

In a k-way set associative cache, the cache is divided into v sets, each of which consists of k lines. The lines of a set are placed in sequence one after another. The lines in set s are sequenced before the lines in set (s+1). The main memory blocks are numbered 0 onwards. The main memory block numbered j must be mapped to any one of the cache lines from

Consider the following sequence of micro-operations.
MBR ← PC
MAR ← X
PC ← Y
Memory ← MBR
Which one of the following is a possible operation performed by this sequence?

Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, …, I12 is executed in this pipelined processor. Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

A RAM chip has a capacity of 1024 words of 8 bits each (1K × 8). The number of 2 × 4 decoders with enable line needed to construct a 16K × 16 RAM from 1K × 8 RAM is

The following code segment is executed on a processor which allows only register operands in its instructions. Each instruction can have atmost two source operands and one destination operand. Assume that all variables are dead after this code segment.
c = a + b;
d = c * a;
e = c + a;
x = c * c;
if (x > a) {
   y = a * a;
}
else {
    d = d * d;
    e = e * e;
}

Suppose the instruction set architecture of the processor has only two registers. The only allowed compiler optimization is code motion, which moves statements from one place to another while preserving correctness. What is the minimum number of spills to memory in the compiled code?

The following code segment is executed on a processor which allows only register operands in its instructions. Each instruction can have atmost two source operands and one destination operand. Assume that all variables are dead after this code segment.
c = a + b;
d = c * a;
e = c + a;
x = c * c;
if (x > a) {
   y = a * a;
}
else {
    d = d * d;
    e = e * e;
}

What is the minimum number of registers needed in the instruction set architecture of the processor to compile this code segment without any spill to memory? Do not apply any optimization other than optimizing register allocation.

The amount of ROM needed to implement a 4 bit multiplier is

Register renaming is done in pipelined processors

A computer has a 256 KByte, 4-way set associative, write back data cache with block size of 32 Bytes. The processor sends 32 bit addresses to the cache controller. Each cache tag directory entry contains, in addition to address tag, 2 valid bits, 1 modified bit and 1 replacement bit.

The number of bits in the tag field of an address is

A computer has a 256 KByte, 4-way set associative, write back data cache with block size of 32 Bytes. The processor sends 32 bit addresses to the cache controller. Each cache tag directory entry contains, in addition to address tag, 2 valid bits, 1 modified bit and 1 replacement bit.

The size of the cache tag directory is

The minimum number of D flip—flops needed to design a mod-258 counter is

Consider a hypothetical processor with an instruction of type LW R1 , 20 (R2), which during execution reads a 32-bit word from memory and stores it in a 32-bit register R1. The effective address of the memory location is obtained by the addition of a constant 20 and the contents of register R2. Which of the following best reflects the addressing mode implemented by this instruction for the operand in memory?

On a non-pipelined sequential processor, a program segment, which is a part of the interrupt service routine, is given to transfer 500 bytes from an I/O device to memory.

     Initialize the address register
     Initialize the count to 500
LOOP:Load a byte from device
     Store in memory at address given by address register
     Increment the address register
     Decrement the count
     If count != 0 go to LOOP

Assume that each statement in this program is equivalent to a machine instruction which takes one clock cycle to execute if it is a non-load/store instruction. The load-store instructions take two clock cycles to execute. The designer of the system also has an alternate approach of using the DMA controller to implement the same transfer. The DMA controller requires 20 clock cycles for initialization and other overheads. Each DMA transfer cycle takes two clock cycles to transfer one byte of data from the device to the memory. What is the approximate speedup when the DMA controller based design is used in place of the interrupt driven program based input-output?

Consider evaluating the following expression tree on a machine with load-store architecture in which memory can be accessed only through load and store instructions. The variables a, b, c, d and e are initially stored in memory. The binary operators used in this expression tree can be evaluated by the machine only when the operands are in registers. The instructions produce result only in a register. If no intermediate results can be stored in memory, what is the minimum number of registers needed to evaluate this expression?

Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage.Delays for the stages and for the pipeline registers are as given in the figure.

What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation'?

An 8KB direct-mapped write-back cache is organized as multiple blocks, each of size 32-bytes. The processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block comprising of the following.
1 Valid bit
1 Modified bit
As many bits as the minimum needed to identify the memory block mapped in the cache.
What is the total size of memory needed at the cache controller to store meta-data (tags) for the cache?

A main memory unit with a capacity of 4 megabytes is built using 1M×1-bit DRAM chips.Each DRAM chip has 1K rows of cells with 1K cells in each row. The time taken for a single refresh operation is 100 nanoseconds. The time required to perform one refresh operation on all the cells in the memory unit is

A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

Instruction Meaning of instruction
I0 :MUL R2 ,R0 ,R1 R2 ← R0 *R1
I1 :DIV R5 ,R3 ,R4 R5 ← R3 /R4
I2 : ADD R2 ,R5 ,R2 R2 ← R5 + R2
I3 :SUB R5 ,R2 ,R6 R5 ← R2 - R6

The program below uses six temporary variables a, b, c, d, e, f.
a = 1
b = 10
c = 20
d = a + b
e = c + d
f = c + e
b = c + e
e = b + f
d = 5 + e
return d + f

Assuming that all operations take their operands from registers, what is the minimum number of registers needed to execute this program without spilling?

A computer system has an L1 cache, an L2 cache, and a main memory unit connected as shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 nanoseconds. 20 nanoseconds and 200 nanoseconds for L1 cache, L2 cache and main memory unit respectively.

When there is a miss in L1 cache and a hit in L2 cache, a block is transferred from L2 cache to L1 cache. What is the time taken for this transfer?

A computer system has an L1 cache, an L2 cache, and a main memory unit connected as shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 nanoseconds. 20 nanoseconds and 200 nanoseconds for L1 cache, L2 cache and main memory unit respectively.

When there is a miss in both L1 cache and L2 cache, first a block is transferred from main memory to L2 cache, and then a block is transferred from L2 cache to L1 cache. What is the total time taken for these transfers?

How many 32K × 1 RAM chips are needed to provide a memory capacity of 256 K-bytes?

A CPU generally handles an interrupt by executing an interrupt service routine

Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:

  S1 S2 S3 S4
I1 2 1 1 1
I2 1 3 2 2
I3 2 1 1 3
I4 1 2 2 2

What is the number of cycles needed to execute the following loop?

for (i=1 to 2) {I1; I2; I3; I4;}

Consider a 4-way set associative cache (initially empty) with total 16 cache blocks. The main memory consists of 256 blocks and the request for memory blocks is in the following order:

0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155.

Which one of the following memory block will NOT be in cache if LRU replacement policy is used?

A hard disk has 63 sectors per track, 10 platters each with 2 recording surfaces and 1000 cylinders. The address of a sector is given as a triple c,h,s, where c is the cylinder number, h is the surface number and s is the sector number. Thus, the 0th sector is addressed as 0,0,0, the 1st sector as 0,0,1, and so on.

The address 400,16,29 corresponds to sector number:

A hard disk has 63 sectors per track, 10 platters each with 2 recording surfaces and 1000 cylinders. The address of a sector is given as a triple c,h,s, where c is the cylinder number, h is the surface number and s is the sector number. Thus, the 0th sector is addressed as 0,0,0, the 1st sector as 0,0,1, and so on.

The address of the 1039th sector is

For a magnetic disk with concentric circular tracks, the seek latency is not linearly proportional to the seek distance due to

Which of the following is/are true of the auto-increment addressing mode?

I. It is useful in creating self-relocating code
II. If it is included in an Instruction Set Architecture, then an additional ALU is required for effective address calculation
III. The amount of increment depends on the size of the data item accessed

Which of the following must be true for the RFE (Return From Exception) instruction on a general purpose processor?
I. It must be a trap instruction
II. It must be a privileged instruction
III. An exception cannot be allowed to occur during execution of an RFE instruction

For inclusion to hold between two cache levels L1 and L2 in a multi-level cache hierarchy, which of the following are necessary?
I. L1 must be a write-through cache
II. L2 must be a write-through cache
III. The associativity of L2 must be greater than that of L1
IV. The L2 cache must be at least as large as the L1 cache

Which of the following are NOT true in a pipelined processor?
I. Bypassing can handle all RAW hazards
II. Register renaming can eliminate all register carried WAR hazards
III. Control hazard penalties can be eliminated by dynamic branch prediction

The use of multiple register windows with overlap causes a reduction in the number of memory accesses for

I. Function locals and parameters
II. Register saves and restores
III. Instruction fetches

In an instruction execution pipeline, the earliest that the data TLB (Translation Lookaside Buffer) can be accessed is

Consider a machine with a 2-way set associative data cache of size 64Kbytes and block size 16bytes. The cache is managed using 32 bit virtual addresses and the page size is 4Kbyts. A program to be run on this machine begins as follows:
double ARR [1024] [1024];
int i, j ;
/* Initialize array ARR to 0.0 * /
for (i = 0; i < 1024; i ++)
for (j = 0; j < 1024; j ++)
ARR [i] [j] = 0.0;

The size of double is 8Bytes. Array ARR is located in memory starting at the beginning of virtual page 0xFF000 and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array ARR

The total size of the tags in the cache directory is

Consider a machine with a 2-way set associative data cache of size 64Kbytes and block size 16bytes. The cache is managed using 32 bit virtual addresses and the page size is 4Kbyts. A program to be run on this machine begins as follows:
double ARR [1024] [1024];
int i, j ;
/* Initialize array ARR to 0.0 * /
for (i = 0; i < 1024; i ++)
for (j = 0; j < 1024; j ++)
ARR [i] [j] = 0.0;

The size of double is 8Bytes. Array ARR is located in memory starting at the beginning of virtual page 0xFF000 and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array ARR

Which of the following array elements has the same cache index as ARR [0] [0]?

Consider a machine with a 2-way set associative data cache of size 64Kbytes and block size 16bytes. The cache is managed using 32 bit virtual addresses and the page size is 4Kbyts. A program to be run on this machine begins as follows:
double ARR [1024] [1024];
int i, j ;
/* Initialize array ARR to 0.0 * /
for (i = 0; i < 1024; i ++)
for (j = 0; j < 1024; j ++)
ARR [i] [j] = 0.0;

The size of double is 8Bytes. Array ARR is located in memory starting at the beginning of virtual page 0xFF000 and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array ARR

The cache hit ratio for this initialization loop is

Delayed branching can help in the handling of control hazards

For all delayed conditional branch instructions, irrespective of whether the condition evaluates to true or false

Delayed branching can help in the handling of control hazards

The following code is to run on a pipelined processor with one branch delay slot:
I1 : ADD R2 R7 + R8
I2 : SUB R4 R5 - R6
I3 : ADD R1 R2 + R3
I4 : STORE Memory [R4] R1
BRANCH to Label if R1 == 0
Which of the instructions I1,  I2, I3 or I4 can legitimately occupy the delay slot without any other program modification?

Consider a 4-way set associative cache consisting of 128 lines with a line size of 64 words. The CPU generates a 20-bit address of a word in main memory. The number of bits in the TAG, LINE and WORD fields are respectively:

Consider a disk pack with 16 surfaces, 128 tracks per surface and 256 sectors per track. 512 bytes of data are stored in a bit serial manner in a sector. The capacity of the disk pack and the number of bits required to specify a particular sector in the disk are respectively:

Consider a pipelined processor with the following four stages:

IF: Instruction Fetch
ID: Instruction Decode and Operand Fetch
EX: Execute
WB: Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

ADD  R2, R1,  R0 R2 R1 + R0
MUL  R4, R3, R2 R4 R3 * R2
SUB  R6, R5, R4 R6 R5 - R4

Consider the following program segment. Here R1, R2 and R3 are the general purpose registers.

  Instruction Operation Instruction size (no. of words)
  MOV R1, (3000) R1M[3000] 2
LOOP: MOV R2, (R3) R2M[R3] 1
  ADD R2, R1 R2R1 + R2 1
  MOV (R3), R2 M [R3]R2 1
  INC R3 R3R3 + 1 1
  DEC R1 R1R1 – 1 1
  BNZ LOOP Branch on not zero 2
  HALT Stop 1

Assume that the content of memory location 3000 is 10 and the content of the register R3 is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from the memory location 1000. All the numbers are in decimal.

Assume that the memory is word addressable. The number of memory references for accessing the data in executing the program completely is:

Consider the following program segment. Here R1, R2 and R3 are the general purpose registers.

  Instruction Operation Instruction size (no. of words)
  MOV R1, (3000) R1M[3000] 2
LOOP: MOV R2, (R3) R2M[R3] 1
  ADD R2, R1 R2R1 + R2 1
  MOV (R3), R2 M [R3]R2 1
  INC R3 R3R3 + 1 1
  DEC R1 R1R1 – 1 1
  BNZ LOOP Branch on not zero 2
  HALT Stop 1

Assume that the content of memory location 3000 is 10 and the content of the register R3 is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from the memory location 1000. All the numbers are in decimal.

Assume that the memory is word addressable. After the execution of this program, the content of memory location 2010 is:

Consider the following program segment. Here R1, R2 and R3 are the general purpose registers.

  Instruction Operation Instruction size (no. of words)
  MOV R1, (3000) R1M[3000] 2
LOOP: MOV R2, (R3) R2M[R3] 1
  ADD R2, R1 R2R1 + R2 1
  MOV (R3), R2 M [R3]R2 1
  INC R3 R3R3 + 1 1
  DEC R1 R1R1 – 1 1
  BNZ LOOP Branch on not zero 2
  HALT Stop 1

Assume that the content of memory location 3000 is 10 and the content of the register R3 is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from the memory location 1000. All the numbers are in decimal.

Assume that the memory is byte addressable and the word size is 32 bits. If an interrupt occurs during the execution of the instruction “INC R3”, what return address will be pushed on to the stack?

Consider a machine with a byte addressable main memory of 216 bytes. Assume that a direct mapped data cache consisting of 32 lines of 64 bytes each is used in the system. A 50 × 50 two-dimensional array of bytes is stored in the main memory starting from memory location 1100H. Assume that the data cache is initially empty. The complete array is accessed twice. Assume that the contents of the data cache do not change in between the two accesses.

How many data cache misses will occur in total?