Description
Please upload 1) a PDF with solutions of Problems 1, 2 and 3; 2) a PDF of your report for Problem 4; 3) a single zip le of your code, README, results for Problem 4.

Simple Complexity

For each pair of functions f and g, write whether f is in O(g), (g), or (g).
(a) f = (n + 1000)^{4}, g = n^{4} 3n^{3}

f = log_{1000} n, g = log_{2} n

f = n^{1000}, g = n^{2}

f = 2^{n}, g = n!

f = n^{n}, g = n!
(f) f = log n!, g = n log n (hint: if uncertain, Stirling’s a certain)

Determine the BigO time complexity for the algorithm below (show your work). Also, very brie y explain (in one or two sentences) what the algorithm outputs (note: the % symbol is the modulo operator):
Data: n
1 i = 1;
2 while i n do
3j = 0;
4k = i;
5while k % 3 == 0 do

k = k/3 ;

j++ ;

end

print i,j;
10i++;
11 end
1

Greedy – Why is the pool always so busy anyway?
Georgia Tech is trying to raise money for the Technology Square Research Building and has decided to host the \Tech Swim Run Bike” (TSRB) Triathalon to start o the fundraising campaign!
Usually in a triathalon all athletes will perform the three events in order (swimming, running, then biking) asynchronously, but unfortunately due to some last minute planning, the race committee was only able to secure the use of one lane of the olympic pool in the campus recreation center. This means that there will be a bottleneck during the rst portion (the swimming leg) of the race where only one person can swim at a time. The race committee needs to decide on an ordering of athletes where the rst athlete in the order will swim rst, then as soon as the rst athlete completes the swimming portion the next athlete will start swimming, etc. The race committee, not wanting to wait around for an extremely long time for everyone to nish, wants to come up with a schedule that will minimize the time it takes for everyone to nish the race. Luckily, they have prior knowledge about how fast the athletes will complete each portion of the race, and they have you, an algorithmista, to help!
Speci cally, for each athlete, i, they have an estimate of how long the athlete will take to complete the swimming portion, s_{i}, running portion, r_{i} and biking portion, b_{i}. A schedule of athletes can be represented as a list of athletes (e.g. [athlete_{7}; athlete_{4}; :::]) that indicate in which order the athletes will start the race. Using this information, they want you to nd an ordering for the athletes to start the race that will minimize the time taken for everyone to nish the race, assuming all athletes perform at their estimated time. More precisely, give an e cient algorithm that produces a schedule of athletes whose completion time is as small as possible and prove that it gives the optimal solution using an exchange argument.
Keep in mind that once an athlete nishes swimming, they can proceed with the running and biking portions of the race, even if other athletes are already running and biking. Also note that all athletes have to swim rst (i.e. some athletes won’t start o running or biking). For example: if we have a race with 3 athletes scheduled to go in the order [2; 1; 3], and the nishing time for athlete i is represented as a_{i}, then the nishing times would be as follows: (with a total nishing time of max(a_{1}; a_{2}; a_{3}))
a_{2} = (s_{2}) + r_{2} + b_{2}
a_{1} = (s_{2} + s_{1}) + r_{1} + b_{1}
a_{3} = (s_{2} + s_{1} + s_{3}) + r_{3} + b_{3}

Divide and Conquer
You’re given a set of n hightech tags. All tags look exactly the same, but embedded in each is a chip with an ID t_{i}. If two tags i and j have the same ID t_{i} = t_{j} , then they light up when tapped against each other (and turn back o
2
once they’re separated).
Your task is simple: Given n of these tags, is there a collection of more than n=2 of them with the same ID? Note: there’s no way of guring out the tag’s actual ID. Your only option is to tap two tags against each other and see whether they light up. Each tap costs one \operation”, and you can only compare two tags at a time.
Divise a Divide and Conquer algorithm that requires operations on the order of O(n log n) to execute. In particular, write down the pseudocode, brie y explain why it works, and formally show that it’s complexity is O(n log n).

Programming Assignment
You are to implement either Prim’s or Kruskal’s algorithm for nding a Minimum Spanning Tree (MST) of an undirected graph, and evaluate its running time performance on a set of graph instances. The 12 input graphs are RMAT graphs [1], which are synthetic graphs with powerlaw degree distributions and smallworld characteristics. These graphs might have multiple arcs between the same vertices, with each of them having an associated weight (i.e. the graphs are multigraphs), your code should take this into account.
4.1 Static Computation
The rst part of the assignment entails coding either Prim’s or Kruskal’s algorithm to nd the cost of an MST given a graph le. You may use the programming language of your choice (either C++, Java, or Python). We provide a wrapper function in all three languages to help you get started with the assignment. You may call your own functions inside the wrapper. We also have implemented a timer in the wrapper that records the running time of your algorithms. To implement these algorithms, you may make use of data structure implementations in the programming language of your choice; e.g. in python, the heapq library may be used for implementing priority queues and set operations may be used for implementing the union nd data structure. In Java, java.util.PriorityQueue may be used for implementing priority queues and java.util.Set may be used for set operations.
The ‘graph le’ format is as follows:
Line 1: N E (N = number of vertices, E = number of edges)
Every subsequent line contains three integers: u v weight (u = source of edge, v = destination of edge, weight = weight of edge between u and v)
The MST calculation is to be implemented in the computeMST function, as indicated in the wrapper code.
4.2 Dynamic Recomputation
The next part of the assignment requires you to update the cost of the MST given new edges to be added to the graph. You are provided with a ‘changes le’ and the format is as follows:
3
Line 1: N (N = number of changes/edges to be added)
Every subsequent line contains three integers: u v weight (u = source of new edge to be added, v = destination of edge, weight = weight of edge between u and v)
You are to implement the function recomputeMST as indicated in the wrapper code that computes the new MST given the new edge to be added into the graph. You are responsible for maintaining the old MST before recomputing.
Note: it is very easy to complete this part of the assignment by simply adding the new edge and calling your old computeMST function from part 1 (Subsection 4.1). The objective is to minimize computation and e ciently recompute the cost of the MST.
4.3 Execution
The wrapper code is set up to require three command line arguments: <executable> <graph le.gr> <change le.extra> <output le>. The <graph le> is the one described in Subsection 4.1 and the <change le> is the one described in Subsection 4.2.
Note: You do not have to use the wrapper code provided; however, if you write your own code from scratch, please indicate so in your README (how to compile and run your code)
4.4 Experiments
You are required to run your code for all 12 input graphs provided. The wrapper functions we provide in C++, Java, and Python describe the following procedure. For each graph:
parse the edges (parseEdges): to be implemented
compute the MST using either Prim’s or Kruskal’s algorithm (computeMST): to be implemented
write the cost of the initial MST and time taken to compute it to the output le: provided in wrapper code
For each line in the <change le>,
{ Parse the new edge to be added: provided in wrapper code
{ Call the function recomputeMST: to be implemented
{ Write to the output le the cost of the new MST and the time taken to compute it: provided in wrapper code
Name the output les as such: <graph le> output.txt and place them in the folder results. You should have one output le for each of the 12 input graphs provided.
4
4.5 Report
Write a brief report wherein you:

List which data structures you have used for your choice of algorithm (Prim’s/Kruskal’s). Explain the reasoning behind your choice and how that has in uenced the running time of your algorithms, and the theoretical complexity (i.e., specify the bigOh for your implementation of each of the two algorithms – computeMST and recomputeMST).

Plots: We require you to make two plots. Plot the running time as the number of edges in the graph increases (across the 12 graphs you were given) for both the static and dynamic calculations, i.e. one plot showing on the xaxis the number of edges the graph has and the yaxis the time for the static MST calculation, and another plot where the yxis the time needed to insert 100 edges (as given the changes les) using your recomputeMST code. Discuss the results you observe. How does the empirical scaling you observe match your bigOh analysis? How does the behavior vary with the dynamic recomputation?
4.6 Deliverables
code for initial, static MST implementation (this is called computeMST in the provided wrapper code)
code for dynamic MST recomputation (this is called recomputeMST in the provided wrapper code)
README, explaining how to run your code Report (Subsection 4.5)
output les (12) within the results folder – one for each graph
References

D. Chakrabarti, Y. Zhan, and C. Faloutsos. RMAT: A Recursive Model for Graph Mining. In Proc. 4th SIAM Intl. Conf. on Data Mining, Florida, USA, April 2004.
5