A5 – Paths Solution




Compute missed connections for all two-hop paths.

Fine Print: (0) Group assignment, two students. (1) A connection is any pair of flight F and G of the same carrier such as F.Destination = G.Origin and the scheduled departure of G is <= 6 hours and >= 30 minutes after the scheduled arrival of F. (2) A connection is missed when the actual arrival of F < 30 minutes before the actual departure of G. (3) Optimize your code for performance. (4) Output the number of connections and missed connections per airline, per year. (5) The reference solution outputs “UA, 2013, 2826, 129” when processing the following two files. (6) The reference solution is 239 LOC Java and 10 LOC of R. The reference solutions takes 5 minutes to run on the full data set (337 files) with a cluster of 10 large machines. (7) Your submission should include code and Makefiles. No data should be included. A report should describe your implementation and detail the timing of your results for at least the two year data set, and, if you dare, the full data set. Include results file for any runs on AWS. The report should be in PDF.

error: Content is protected !!