Description

Consider program P, which runs on a 4 GHz machine M in 500 seconds. Consider an optimization to P that replaces all instances of multiplying a value by 4 (mult X,X,4) with two instructions that set x to x+x twice (add X,X; add X,X). Assume that every multiply instruction takes 4 cycles to execute, and every add instruction takes just 1 cycle. After recompiling, the program now runs in 400 seconds on machine M. Determine how many multiplies were replaced by this optimization. In your answer, show all of your calculations and analysis. [20%]

Your company could speed up a Java program on their new computer by adding hardware support for garbage collection. Assume that garbage collection currently comprises 50% of the cycles of the program. You have two possible changes to consider. The first one would be to automatically handle garbage collection in hardware, causing the increase in cycle time by a factor of 1.2. The second option would be to add the new instructions to the ISA that could be used during garbage collection. This would halve the number of cycles needed for garbage collections, but would increase the cycle time by a factor of 1.1. Which of these two options, if any, should you choose? [20%]

Consider two possible improvements that can be used to enhance a machine. You can either make multiply instructions run four times faster than before, or make memory access instructions run two times faster than before. You repeatedly run a program that takes 100 seconds to execute. Of this time, 25% is used for multiplication, 40% for memory access instructions, and 35% for other tasks.


What will the speedup be if you improve only multiplication? [5%]

What will the speedup be if you improve only memory access? [5%]

What will the speedup be if both improvements are made? [10%]


Assume that multiply instructions take 4 cycles to execute and account for 20% of the instructions in a typical program and that the other 80% of the instructions require an average of 2 cycles for each instruction.


What is the percentage of time that the CPU spends doing multiplication? [5%]

Estimate the CPI value for a typical program executing on this machine [5%].

Assume that it is possible to reduce the number of cycles required for multiplication from 4 to 2, but this will require a 20% increase in the cycle time. Nothing else will be affected. Should we proceed with this modification? Explain why or why not [10%].

 Suppose that we designed a new floating point unit that accelerates the floating point instructions by a factor of 5. We are now looking for a benchmark to show off the new floatingpoint unit and we want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floatingpoint hardware. How much of the execution time would floatingpoint instructions have to account for in this program in order to yield our desired speedup on this benchmark? [20%]