Project 03 Solution

$30.00 $24.90

Description

INSTRUCTIONS

All homeworks will have many problems, both theoretical and practical. Programming exercises need to be submitted via SMARTSITE using the assign-

ment boxes NOT DROPBOX.

Other methods of submission without prior approval will receive zero points. Write legibly preferably using word processing if your hand-writing is unclear. Be

organized and use the notation appropriately. Show your work on every problem.

Correct answers with no support work will not receive full credit.

  1. CHALLENGE: PROBLEMS ON GRAPHS & NETWORKS: We saw how the networks can be used to plan for building projects or to schedule the parts of a project. Here are some some more challenges:

Let G be a directed graph with distances or costs on its arcs and two special vertices s; t. We are interested on the \longest” paths using two possible de nitions

{ If we call the length of a path the sum of the lengths on the path. Write a model to compute the longest path on a network.

{ If we instead call the length of a path the largest length among arcs the path, write another model to compute the longest path on a network under this de nition.

Consider again, the TSP for n cities and cost of travel cij. Write a new model for the TSP, di erent from the one we saw in class. This time us the binary variables xi;j;k, which are 1if the k-th leg of the trip, the salesman goes from city i to city j. Your formulation must have a cubic number of constraints.

  1. CHALLENGE: FINDING A GOOD MATCH: Matching problems appear everywhere in applica-tions. E.g., how do students get matched to medical schools? How do workers get assigned to jobs at work? How do organ donors get match to patients in need?

We can model matchings in graphs. A set M of the edges E is called a matching of G if and only if:

8u 2 V; jfe 2 M j u 2 egj 1

In other words, a vertex is incident to at most one edge of M. A matching M is said to be maximal if there is no matching M0 with M M0 . A maximal matching of the largest cardinality is a maximum matching. The size of a maximum matching is denoted by OPTG.

In the following exercises we will explore several approaches to nd a maximum matching in a graph:

  1. Let M be a matching and let us denote by V (M) the set of endpoints of edges in M: V (M) := fu j 9e 2 M; u 2 eg

Show that when M is a maximal matching of G, V (M) is a vertex cover of G.

b. Let M be a maximal matching of G, show that jV (M)j 2OPTG.

    1. Now consider the following greedy algorithm for nding a maximum matching: Start with an arbitrary edge as the very rst matching. Find another edge that does not have a vertex in common with the current matching. If one exists add it to the current matching. Repeat until no more edges can be added.

    1. What is the running time of this algorithm on a graph with n nodes and m edges?

    1. Give an example where the algorithm fails to nd a maximum matching. But show that the greedy method always yields a solution that has at least half as many edges as a true maximum matching.

    1. Give now an integer-optimization model (i.e., now based on linear equations, inequalities) to construct a maximum matching of any graph. G.

  1. Let G be a graph with two distinguished vertices s; t. An even st-path is a path from s to t with an even number of edges. Show that we can formulate the problem of nding an even st-path with as

few edges as possible as a minimum cost perfect matching problem. HINT: Construct an auxiliary graph H from G, make a copy of the graph G, and remove vertices s; t. Call that new graph G0 . Construct H starting with the union of G; G0 and joining every vertex v 2 G di erent from s; t with its copy in G0 . Use H.

  1. THEORY PROBLEMS: GEOMETRY OF INTEGER SOLUTIONS:

MEDITATE ABOUT THIS: In the previous problems you dealt with several types of combi-natorial optimization problems. Which ones have polynomial time solution? Which ones are NP-hard? Discuss whatever information you nd about their complexity.

Suppose S = f(y1; y2) 2 Z2 : y1 y2 2; 3y1 + y2 21; y1 + 5y2 34g. Find (a) An inequality description of convex hull of S. (b) Find the extreme points of conv(S).

Use the branch-and-bound method to solve the following optimization problem. Show the solution graphically:

min y1 + 3y2

subject to:y1 + 5y2 12;

y1 + 2y2 8;

y1; y2 0 integer

.

Generate a valid inequality using Chvatal-Gomory cut procedure for the following problem:

min y1 + y2 + y3

subject to:3y1 + 5y2 y3 12;

y1 + y3 7;

y1 y2 + 2y3 9;

y1; y2; y3 0 integer

.

  1. CHALLENGE: Applications to genetics, DNA SEQUENCE ALIGNMENT:

In this problem we study the problem of DNA sequence alignment. The input is a pair of DNA sequences (similar versions exist when trying to align multiple DNA sequences but we omit this here):

    1. TTGATCAATGG

    1. ATCATACAAGGA

and the goal is to understand whether or not they have the same biological function. For this we nd the best way to align the sequences to each other. If after alignment, the sequences look \similar enough”, we will conclude that they have the same biological function.

More precisely, the sequences are aligned by introducing gaps. For the above two sequences, a possible way to introduce gaps is as follows (gaps are represented by hyphens):

  1. TTGAT-CAATGG-

  1. ATCATACAA-GGA

After introducing the gaps, the sequences are compared by counting the number of mismatches, i.e., the number of locations where the nucleotides di er. For the above example, there are two mismatches: one at location 0, and one at location 2 (using the usual array indexing convention in programming!).

Let g denote the number of gaps and m denote the number of mismatches in a given alignment, the overall cost c of the alignment is then de ned by:

c := 2 g + m

  1. TTGAT-CAATGG

  1. ATCATACAAGGA

which has cost 2 1 + 4 = 6, which is minimal among all possible alignments of these two sequences.

Let us denote by s (resp. t) the rst (resp. second) sequence. We de ne c(s; t) to be the cost of the alignment of minimal cost among all possible alignments. Finally, we denote by n (resp. m) the length of s (resp. t).

a. For a sequence s, s[i : j] will denote the substring of s ranging from index i inclusive to j

exclusive. Give a formula to compute c(s[0 : i]; t[0 : j]) as a function of c(s[0 : i]; t[0 : j 1]),

c(s[0 : i 1]; t[0 : j]) and c(s[0 : i 1]; t[0 : j 1]). Think recursively.

  1. Using part a. design and describe an algorithm to compute c(s; t) [hint: think dynamically! It is very useful to think of this problem as a type of matching problem].

  1. Implement the algorithm you designed above. A test dataset is available at https://www. math.ucdavis.edu/~deloera/TEACHING/MATH160/dna.txt. Each line of the data le is the list of the rst 10,000 base pairs of the genome of the well-known Escherichia coli bacteria (why is it famous? Do you know?). There are only two lines corresponding to two di erent species of this bacteria.

  1. Write code (using MATLAB and/or SCIP) which reads the data le and outputs the cost of the optimal alignment of the two DNA sequences. Do not forget to Submit the code you wrote. Using C or Python to help yourself is allowed.

  1. Two DNA sequences are considered to have the same biological function if the cost of the optimal alignment, divided by the length of the sequence is smaller than 5%. Do the two DNA sequences in the data le have the same biological function?