# LAB 12 QUESTIONS SOLUTION

\$35.00 \$30.80

Category:

## Description

– Name: Tanner Hobbs

– NetID: Hobbs131

Answer the questions below according to the lab specification. Write

your answers directly in this text file and submit it to complete the

lab.

PROBLEM 1: Timing Benchmarks

============================

The code in `reversal.c’ implements two functions which reverse the

elements of an array. They take markedly different approaches.

,—-

| void reverse_arr1(int *arr, long size){

|   int *tmp = malloc(sizeof(int)*size);

|   for(long i=0; i<size; i++){

|     tmp[i] = arr[size-i-1];

|   }

|   for(long i=0; i<size; i++){

|     arr[i] = tmp[i];

|   }

|   free(tmp);

| }

|

| void reverse_arr2(int *arr, long size){

|   for(long i=0; i<size/2; i++){

|     int tmp = arr[i];

|     arr[i] = arr[size-i-1];

|     arr[size-i-1] = tmp;

|   }

| }

`—-

(A) Predictions

~~~~~~~~~~~~~~~

Predict which of these two functions will run faster based on their

code characteristics.

Function reverse_arr2 will run faster because less instructions are required. Two for loops are ran through at n iterations in the first method whereas in the second only one for loop is ran for n/2 iterations

(B) Timing

~~~~~~~~~~

Modify the provided `reversal.c’ file to perform timing calculations

on the two functions as they are called on different sizes of arrays.

Print runtimes in a tabular format.

Paste the C code you wrote below to show you how you timed the

function runs.

clock_t begin, end;

double cpu_time1;

double cpu_time2;

begin = clock();

reverse_arr1(arr1,size);

end = clock();

cpu_time1 = ((double) (end – begin)) / CLOCKS_PER_SEC;

printf(“cpu time reverse_arr1: %.4e   “, cpu_time1);

begin = clock();

reverse_arr2(arr2,size);

end = clock();

cpu_time2 = ((double) (end – begin)) / CLOCKS_PER_SEC;

printf(“cpu time reverse_arr2: %.4e   \n”, cpu_time2);

(C) Analysis

~~~~~~~~~~~~

Paste the table of numbers your code produced for timing the two

reversal functions. Describe if the one you predicted would be faster

actually was measured to be faster.

cpu time reverse_arr1: 2.0000e-06   cpu time reverse_arr2: 1.0000e-06

cpu time reverse_arr1: 1.0000e-06   cpu time reverse_arr2: 1.0000e-06

cpu time reverse_arr1: 1.0000e-06   cpu time reverse_arr2: 1.0000e-06

cpu time reverse_arr1: 2.0000e-06   cpu time reverse_arr2: 1.0000e-06

cpu time reverse_arr1: 1.0000e-06   cpu time reverse_arr2: 1.0000e-06

cpu time reverse_arr1: 1.0000e-06   cpu time reverse_arr2: 1.0000e-06

cpu time reverse_arr1: 2.0000e-06   cpu time reverse_arr2: 1.0000e-06

cpu time reverse_arr1: 1.0000e-05   cpu time reverse_arr2: 2.0000e-06

cpu time reverse_arr1: 7.0000e-06   cpu time reverse_arr2: 2.0000e-06

cpu time reverse_arr1: 9.0000e-06   cpu time reverse_arr2: 4.0000e-06

The one I predicted to be faster seems to be faster

PROBLEM 2: ax then xpy VS axpy

==============================

Examine the timing code in `axpy.c’. Consider the 3 functions `ax()’,

`xpy()’, and `axpy()’.  Note that these are timed in `main()’ as

follows.

,—-

| // start timing 1

|     ax(a,x1,size);

|     xpy(x1,y,size);

| // stop timing

|

| // start timing 2

|     axpy(a,x2,y,size);

| // stop timing

`—-

This is because the first two functions together do what `axpy()’

does.

Note also that `axpy()’ has two “fairness” loops which do affect the

results but compensate for the fact that the previous two functions

combined have two loops.

Time these functions using the provided code. Explain the results you

observe and if one appears to be faster than the other, describe what

features of the processor and memory architecture might be affecting

the timings.

axpy seems to be slightly faster than ax and xpy together. This may be because the scaling of the array and the addition of arrays operations are done simultaneously in the for loops thus reducing the amount of instructions handled.

PROBLEM 3 (OPTIONAL): Profiler

==============================

This problem demonstrates the utility of a performance *profiler* to

help locate “hot spots” in code which take most of the run time

associated with it.

Change into the directory `warsim’ that is part of the lab code

distribution.  The code contained within it implements a simple

simulation of the card game War.  The details of the game are not

important except that the program simulates playing of the game then

reports statistics on it.

First compile the program by using the provided Makefile.

,—-

| > make

| gcc -Og -g -pg -no-pie -c libwar.c

| gcc -Og -g -pg -no-pie -c libcardlist.c

| gcc -Og -g -pg -no-pie -o warsim warsim.c libwar.o libcardlist.o

`—-

Notice that the option `-pg’ is passed in which will enable

/profiling/ of the code. The `-no-pie’ option is present in case a

buggy version of `gcc’ is present and has no significant effect.

Run the resulting `warsim’ program as follows.

,—-

| # show usage

| > ./warsim

| usage: ./warsim cardfile ngames [logfile]

|

| # simulate 10 games

| > ./warsim full.deck 10

| ———————-

| Final stats

|   0.60 P1 Win Ratio

| 408.50 Avg battles per game

|  30.30 Avg wars per game

|

| # simulate 100 games

| > ./warsim full.deck 100

| ———————-

| Final stats

|   0.54 P1 Win Ratio

| 301.56 Avg battles per game

|  22.94 Avg wars per game

|

| # simulate 2000 games

| > ./warsim full.deck 2000

| ———————-

| Final stats

|   0.54 P1 Win Ratio

| 298.48 Avg battles per game

|  22.79 Avg wars per game

`—-

This last run might take a few seconds as 2000 games are simulated.

After the runs are finished, the file `gmon.out’ appears. This is an

output file that is generated on running programs with profiling

enabled.

,—-

| > ls gmon.out

| gmon.out

| > file gmon.out

| gmon.out: GNU prof performance data – version 1

`—-

The contents of the file are binary and must be interpreted by the

program `gprof’.  Use the `-b’ option to show “brief” output and pass

in the name of the program that was run.

,—-

| > gprof -b warsim

| Flat profile:

|

| Each sample counts as 0.01 seconds.

|   %   cumulative   self              self     total

|  time   seconds   seconds    calls  ms/call  ms/call  name

|  50.11      0.06     0.06  2387860     0.00     0.00  print_list

|  25.06      0.09     0.03 32507650     0.00     0.00  card2str

|   8.35      0.10     0.01   596965     0.00     0.00  end_battle

|   8.35      0.11     0.01     4000     0.00     0.00  new_stack

|   8.35      0.12     0.01       52     0.19     0.19  str2card

|   0.00      0.12     0.00  8073378     0.00     0.00  stack_empty

|   0.00      0.12     0.00  6656812     0.00     0.00  queue_empty

|   0.00      0.12     0.00  3944508     0.00     0.00  stack_top

|   0.00      0.12     0.00  1675522     0.00     0.00  queue_add

|   0.00      0.12     0.00  1675522     0.00     0.00  queue_remove

|   0.00      0.12     0.00  1571470     0.00     0.00  queue_front

|   0.00      0.12     0.00  1465470     0.00     0.00  stack_pop

|   0.00      0.12     0.00  1465470     0.00     0.00  stack_push

|   0.00      0.12     0.00   596965     0.00     0.00  start_battle

|   0.00      0.12     0.00     6001     0.00     0.00  new_queue

|   0.00      0.12     0.00     6001     0.00     0.00  queue_free

|   0.00      0.12     0.00     4000     0.00     0.00  stack_free

|   0.00      0.12     0.00     2000     0.00     0.00  deal2

|   0.00      0.12     0.00     2000     0.00     0.06  playwar

|   0.00      0.12     0.00     2000     0.00     0.00  queue_copy

|   0.00      0.12     0.00     2000     0.00     0.00  queue_rotate

|   0.00      0.12     0.00        1     0.00    10.02  read_deck

| …

`—-

The first part of the output shows a breakdown of how much time was

spent in each function associated the most recent run of the program.

Analyze this breakdown and make suggestions as to where optimization

effort might best be spent to increase the speed of warsim.