### SET 2

Question 1

**A machine has a 32-bit architecture, with 1-word long instructions. It has 64 registers, each of which is 32 bits long. It needs to support 45 instructions, which have an immediate operand in addition to two register operands. Assuming that the immediate operand is an unsigned integer, the maximum value of the immediate operand is ____________.**

Correct answer is :16383

Solution :

1 word = 32 bits, so each instruction is 32 bits long.

To support 45 instructions, the opcode needs 6 bits, since 2^{5} = 32 < 45 <= 2^{6} = 64.

Each of the two register operands needs 6 bits, since there are 64 = 2^{6} registers.

| Opcode | Register operand 1 | Register operand 2 | Immediate operand |
|---|---|---|---|
| 6 bits | 6 bits | 6 bits | 14 bits |

The remaining 32 - 18 = 14 bits hold the immediate operand. An unsigned 14-bit field ranges from 0 to 2^{14} - 1, so the maximum value is 16383 (2^{14} = 16384).
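The bit budget can be checked with a short script. This is only a sketch: the function name and parameters are made up for illustration, and it assumes a fixed-width opcode field and fixed-width register fields.

```python
def max_unsigned_immediate(word_bits, num_registers, num_instructions,
                           register_operands=2):
    """Return (immediate_bits, max_value) for a fixed-length instruction."""
    # Bits needed to encode n distinct values: ceil(log2(n)), done on integers.
    opcode_bits = (num_instructions - 1).bit_length()
    register_bits = (num_registers - 1).bit_length()
    immediate_bits = word_bits - opcode_bits - register_operands * register_bits
    # Largest unsigned value that fits in immediate_bits bits.
    return immediate_bits, (1 << immediate_bits) - 1


print(max_unsigned_immediate(32, 64, 45))  # (14, 16383)
```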

Question 2

**Consider a 6-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles is ______________________.**

Correct answer is :4

Solution :

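With 6 balanced stages and no pipelining overhead, a non-pipelined instruction takes 6 cycles.

In the pipeline, an instruction ideally completes every cycle, but 25% of the instructions incur 2 extra stall cycles.

So the average pipelined time per instruction = 1 + (25/100) * 2 = 3/2 = 1.5 cycles.

Speedup = non-pipelined time / pipelined time = 6 / 1.5 = 4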
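The same arithmetic as a small sketch (the function name is hypothetical; it assumes an ideal pipelined CPI of 1 and that stalls are the only overhead):

```python
def pipeline_speedup(stages, stall_fraction, stall_cycles):
    """Speedup of a balanced pipeline over non-pipelined execution."""
    non_pipelined_cycles = stages                        # one instruction uses every stage in turn
    pipelined_cycles = 1 + stall_fraction * stall_cycles  # ideal CPI of 1 plus average stalls
    return non_pipelined_cycles / pipelined_cycles


print(pipeline_speedup(6, 0.25, 2))  # 4.0
```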

Question 3

**An access sequence of cache block addresses is of length N and contains n unique block addresses. The number of unique block addresses between two consecutive accesses to the same block address is bounded above by k. What is the miss ratio if the access sequence is passed through a cache of associativity A >= k exercising least-recently-used replacement policy?**

A : n/N

B : 1/N

C : 1/A

D : k/n

Correct answer is :A
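No solution is given for this question. One way to see why n/N is plausible: if the cache is large enough to keep every block alive between its reuses, the only misses are the first (compulsory) access to each of the n unique blocks. The sketch below simulates a single fully associative LRU set, which is a simplifying assumption and not part of the question. The function name and the sample access sequence are made up for illustration.

```python
from collections import OrderedDict


def lru_miss_ratio(accesses, capacity):
    """Miss ratio of one fully associative LRU set holding `capacity` blocks."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return misses / len(accesses)


# N = 20 accesses, n = 4 unique blocks, 3 other blocks between reuses, A = 4.
accesses = [0, 1, 2, 3] * 5
print(lru_miss_ratio(accesses, capacity=4))  # 0.2, which is n/N = 4/20
```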

Question 4

**Consider two processors P1 and P2 executing the same instruction set. Assume that under identical conditions, for the same input, a program running on P2 takes 25% less time but incurs 20% more CPI (clock cycles per instruction) as compared to the program running on P1. If the clock frequency of P1 is 1GHz, then the clock frequency of P2 (in GHz) is _________.**

Correct answer is :1.6

Solution :

Cycle time of P1 = 1 / (1 GHz) = 1 ns.

Assume P1 needs 5 cycles for the program, that is, 5 ns at 1 ns per cycle. P2 incurs 20% more CPI for the same instructions, so it needs 6 cycles. P2 takes 25% less time, so it finishes in 0.75 * 5 = 3.75 ns. Let the clock frequency of P2 be x GHz, so its cycle time is 1/x ns.

P2 takes 6 cycles, so 6 * (1/x) = 3.75 => x = 6 / 3.75 = 1.6
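The result does not depend on the assumed cycle count. From execution time = instruction count * CPI / frequency, with the same instruction count on both processors, f2 = f1 * (CPI2 / CPI1) / (T2 / T1). A minimal sketch, with hypothetical names:

```python
def p2_frequency_ghz(f1_ghz, time_ratio, cpi_ratio):
    """Clock frequency of P2 given T2/T1 and CPI2/CPI1 (same instruction count)."""
    # T = IC * CPI / f  =>  f2 = f1 * (CPI2 / CPI1) / (T2 / T1)
    return f1_ghz * cpi_ratio / time_ratio


# 25% less time -> T2/T1 = 0.75; 20% more CPI -> CPI2/CPI1 = 1.2
print(round(p2_frequency_ghz(1.0, 0.75, 1.2), 2))  # 1.6
```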

Question 5

**In designing a computer’s cache system, the cache block (or cache line) size is an important parameter. Which one of the following statements is correct in this context?**

A : A smaller block size implies better spatial locality

B : A smaller block size implies a smaller cache tag and hence lower cache tag overhead

C : A smaller block size implies a larger cache tag and hence lower cache hit time

D : A smaller block size incurs a lower cache miss penalty

Correct answer is :D

Solution :

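The miss penalty is the time needed to bring a block into the cache from the next level of the memory hierarchy. A smaller block means fewer bytes to transfer on each miss, so the miss penalty is lower. The other options do not hold: a smaller block exploits less spatial locality (A), and for a fixed cache capacity it means more blocks and therefore more tags to store, so the tag overhead does not go down (B), and a larger tag would not lower the hit time (C).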

Question 6

**If the associativity of a processor cache is doubled while keeping the capacity and block size unchanged, which one of the following is guaranteed to be NOT affected?**

A : Width of tag comparator

B : Width of set index decoder

C : Width of way selection multiplexor

D : Width of processor to main memory data bus

Correct answer is :D

Solution :

When associativity is doubled with the capacity and block size unchanged, the number of sets is halved. The set index loses one bit and the tag gains one bit, so the width of the tag comparator is affected.

The width of the set index decoder is also affected, because the number of set index bits changes.

The width of the way selection multiplexer is affected, because each set now has twice as many ways to select from.

The width of the processor to main memory data bus does not depend on the cache organization, so it is guaranteed to be NOT affected.

Question 7

**Consider a main memory system that consists of 8 memory modules attached to the system bus, which is one word wide. When a write request is made, the bus is occupied for 100 nanoseconds (ns) by the data, address, and control signals. During the same 100 ns, and for 500 ns thereafter, the addressed memory module executes one cycle accepting and storing the data. The (internal) operation of different memory modules may overlap in time, but only one request can be on the bus at any time. The maximum number of stores (of one word each) that can be initiated in 1 millisecond is ____________**

Correct answer is :10000

Solution :

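For each write request, the bus is occupied for 100 ns.

The addressed module stays busy for 100 + 500 = 600 ns, but this internal operation overlaps with requests to the other modules. With 8 modules, a module finishes its cycle before its next turn comes around, so the bus is the only bottleneck.

One store can therefore be initiated every 100 ns.

1 ms = 10^{6} ns

Number of stores in 1 ms = 10^{6} / 100 = 10000 stores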
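A rough steady-state model, assuming stores are spread round-robin across the modules (that access pattern is an assumption, chosen because it gives the maximum). The function name and parameters are hypothetical.

```python
def max_stores(window_ns, bus_ns, module_cycle_ns, modules):
    """Upper bound on stores initiated in window_ns under round-robin issue."""
    # The bus admits one request per bus_ns. Each module admits one request
    # per module_cycle_ns, so all modules together admit one per
    # module_cycle_ns / modules. The slower of the two limits the issue rate.
    interval_ns = max(bus_ns, module_cycle_ns / modules)
    return int(window_ns // interval_ns)


# Bus busy 100 ns per store; module busy 100 + 500 = 600 ns; 8 modules; 1 ms window.
print(max_stores(1_000_000, 100, 600, 8))  # 10000
```

With only 4 modules the same model gives 600 / 4 = 150 ns per store, so the modules rather than the bus would become the limit.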

Question 8

**The minimum number of arithmetic operations required to evaluate the polynomial P(X) = X^{5} + 4X^{3} + 6X + 5 for a given value of X, using only one temporary variable is _____.**

Correct answer is :7

Solution :

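Factor out X repeatedly (Horner's rule): P(X) = X(X(X(X * X + 4)) + 6) + 5. With a single temporary variable T this takes 7 operations: T = X * X, T = T + 4, T = T * X, T = T * X, T = T + 6, T = T * X, T = T + 5, that is, 4 multiplications and 3 additions.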
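The count of 7 can be checked by writing the evaluation with one temporary and tallying each operation. This sketch shows one 7-operation schedule based on the nested form X(X(X(X * X + 4)) + 6) + 5. It does not prove that 7 is the minimum. The function name is made up for illustration.

```python
def evaluate_p(x):
    """Evaluate P(x) = x^5 + 4x^3 + 6x + 5 with one temporary; return (value, ops)."""
    ops = 0
    t = x * x; ops += 1  # t = x^2
    t = t + 4; ops += 1  # t = x^2 + 4
    t = t * x; ops += 1  # t = x^3 + 4x
    t = t * x; ops += 1  # t = x^4 + 4x^2
    t = t + 6; ops += 1  # t = x^4 + 4x^2 + 6
    t = t * x; ops += 1  # t = x^5 + 4x^3 + 6x
    t = t + 5; ops += 1  # t = P(x)
    return t, ops


print(evaluate_p(2))  # (81, 7)
```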

Question 9

**An instruction pipeline has five stages, namely, instruction fetch (IF), instruction decode and register fetch (ID/RF), instruction execution (EX), memory access (MEM), and register writeback (WB) with stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns, respectively (ns stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the ID/RF stage into three stages (ID, RF1, RF2) each of latency 2.2/3 ns. Also, the EX stage is split into two stages (EX1, EX2) each of latency 1 ns. The new design has a total of eight pipeline stages. A program has 20% branch instructions which execute in the EX stage and produce the next instruction pointer at the end of the EX stage in the old design and at the end of the EX2 stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average CPI of one in both the designs. The execution times of this program on the old and the new design are P and Q nanoseconds, respectively. The value of P/Q is __________.**

Correct answer is :1.54
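No solution is given for this question. The following sketch is one reading that reproduces 1.54, under these assumptions: the clock period equals the slowest stage latency, and a branch resolved at the end of stage s stalls fetch for s - 1 cycles, so a branch effectively costs s cycles. The function name and parameters are hypothetical.

```python
def time_per_instruction_ns(stage_latencies_ns, branch_fraction, resolve_stage):
    """Average time per instruction for a pipeline that stalls fetch on branches."""
    cycle_ns = max(stage_latencies_ns)  # clock period set by the slowest stage
    stall_cycles = resolve_stage - 1    # fetch waits until the branch resolves
    cpi = (1 - branch_fraction) * 1 + branch_fraction * (1 + stall_cycles)
    return cpi * cycle_ns


# Old design: 5 stages, branch resolved at the end of EX (stage 3).
p = time_per_instruction_ns([1, 2.2, 2, 1, 0.75], 0.2, 3)
# New design: 8 stages, branch resolved at the end of EX2 (stage 6).
q = time_per_instruction_ns([1, 2.2 / 3, 2.2 / 3, 2.2 / 3, 1, 1, 1, 0.75], 0.2, 6)
print(round(p / q, 2))  # 1.54
```

Under this reading, the old design has CPI 1.4 at a 2.2 ns clock (3.08 ns per instruction) and the new design has CPI 2.0 at a 1 ns clock (2 ns per instruction).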

Question 10

**The memory access time is 1 nanosecond for a read operation with a hit in cache, 5 nanoseconds for a read operation with a miss in cache, 2 nanoseconds for a write operation with a hit in cache and 10 nanoseconds for a write operation with a miss in cache. Execution of a sequence of instructions involves 100 instruction fetch operations, 60 memory operand read operations and 40 memory operand write operations. The cache hit-ratio is 0.9. The average memory access time (in nanoseconds) in executing the sequence of instructions is __________.**

Correct answer is :1.68

Solution :

Total operations = 100 instruction fetches + 60 memory operand reads + 40 memory operand writes

= 200 operations

Time taken for 100 instruction fetches (each equivalent to a read) = 90 * 1 ns + 10 * 5 ns = 140 ns

Time taken for memory operand reads = 90%(60) * 1 ns + 10%(60) * 5 ns = 54 ns + 30 ns = 84 ns

Time taken for memory operand writes = 90%(40) * 2 ns + 10%(40) * 10 ns = 72 ns + 40 ns = 112 ns

Total time taken for 200 operations = 140 + 84 + 112 = 336 ns

Average memory access time = 336 ns / 200 = 1.68 ns
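The weighted average can be reproduced with a short sketch. The function name and the tuple layout are made up for illustration. It assumes the 0.9 hit ratio applies uniformly to fetches, reads and writes, as the solution does.

```python
def average_access_time_ns(hit_ratio, operations):
    """operations: list of (count, hit_time_ns, miss_time_ns) tuples."""
    total_ops = sum(count for count, _, _ in operations)
    total_time = sum(
        count * (hit_ratio * hit_ns + (1 - hit_ratio) * miss_ns)
        for count, hit_ns, miss_ns in operations
    )
    return total_time / total_ops


operations = [
    (100, 1, 5),   # instruction fetches, treated as reads
    (60, 1, 5),    # memory operand reads
    (40, 2, 10),   # memory operand writes
]
print(round(average_access_time_ns(0.9, operations), 2))  # 1.68
```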
