DNA Sequencing Solution


Description

In order to complete the Human Genome Project, geneticists were required to sequence DNA. Sequencing determines the order of the bases that make up DNA. There are four bases: adenine (A), guanine (G), cytosine (C), and thymine (T). One way to do this is to compare the ends of two incomplete DNA strands to see whether they match when overlapped. If they do, the two strands may be pieces of a larger whole. Essentially, the goal of DNA sequencing is to piece together tiny fragments of the much larger original, called a genome.

Here is an example of two DNA strands that match when overlapped:

Strand 1: ACGGACATAGTCATT

Strand 2: CATAGTCATTTCATG

Combined: ACGGACATAGTCATTTCATG
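A minimal Python sketch of the combining step follows, assuming the overlap length is already known. `combine` is a hypothetical helper name, not something the assignment prescribes. It keeps the whole first strand and appends only the bases of the second strand that lie past the overlap.

```python
def combine(left: str, right: str, overlap: int) -> str:
    """Join two strands whose last/first `overlap` bases coincide (hypothetical helper)."""
    # Sanity check: the shared bases must really be identical.
    # (The `overlap and` guard matters because left[-0:] is the whole string.)
    if overlap and left[-overlap:] != right[:overlap]:
        raise ValueError("strands do not overlap by that many bases")
    # Keep all of `left`, then add only the part of `right` beyond the overlap.
    return left + right[overlap:]


print(combine("ACGGACATAGTCATT", "CATAGTCATTTCATG", 10))  # ACGGACATAGTCATTTCATG
```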

Assignment

Your program will attempt to find the best match between a target DNA strand and several candidate DNA strands. Your program will read the target DNA strand from one file (containing just one line) and the candidate DNA strands from a second file (containing one line for each candidate strand). See the sample files below.

Your program should start by asking the user for the filename of each of these two files. After determining which of the candidate DNA strands is the best match, it should combine the target strand with the matching strand (as shown in the example above) and print the combined strand.
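A complete program might look like the sketch below, which ties together the two input files, the best-match search, and the combining step. The function names `overlap_length`, `assemble`, and `main` are illustrative choices. Skipping blank lines in the candidate file is an added precaution rather than part of the specification.

```python
def overlap_length(target: str, candidate: str) -> int:
    """Largest k such that target ends with the first k bases of candidate."""
    for k in range(min(len(target), len(candidate)), 0, -1):
        if target.endswith(candidate[:k]):
            return k
    return 0


def assemble(target: str, candidates: list[str]) -> str:
    """Combine target with the candidate that overlaps it the most."""
    best, best_k = candidates[0], -1
    for candidate in candidates:
        k = overlap_length(target, candidate)
        if k > best_k:  # strict '>' keeps the earlier candidate on a tie
            best, best_k = candidate, k
    # Append only the part of the best candidate that extends past the overlap.
    return target + best[best_k:]


def main() -> None:
    target_filename = input("Target strand filename: ")
    candidates_filename = input("Candidate strands filename: ")
    with open(target_filename) as f:
        target = f.readline().strip()  # the target file holds a single line
    with open(candidates_filename) as f:
        # One candidate per line; skipping blank lines is an extra precaution.
        candidates = [line.strip() for line in f if line.strip()]
    print(assemble(target, candidates))
```

Calling `main()` prompts for the two filenames in the same way as the sample run. You can add the usual `if __name__ == "__main__": main()` guard at the bottom to run the file as a script.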

In order to determine which candidate strand is the best match, compare the target strand to each of the candidate strands. The best match is the candidate strand with the largest number of overlapping bases: the largest k such that the last k bases of the target strand are identical to the first k bases of the candidate strand. For example, the two strands shown above have 10 overlapping bases. To make your job easier, simply compare the end of the target strand to the beginning of each candidate strand (you do not need to compare the beginning of the target strand to the ends of the candidate strands).
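One way to compute the overlap count is sketched below, using the string slicing suggested in the hints. `overlap_length` is a hypothetical name. It tries the longest possible overlap first, so the first match it finds is the largest.

```python
def overlap_length(target: str, candidate: str) -> int:
    """Largest k such that the last k bases of target equal the first k of candidate."""
    # Shrink k from the longest possible overlap; the first hit is the largest.
    # k stops at 1 because target[-0:] would be the whole string, not an empty slice.
    for k in range(min(len(target), len(candidate)), 0, -1):
        if target[-k:] == candidate[:k]:
            return k
    return 0  # the strands do not overlap at all


print(overlap_length("ACGGACATAGTCATT", "CATAGTCATTTCATG"))  # 10
```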

If two candidate strands tie as the best match, use the candidate strand which appears first in the file. You may assume that all of the candidate strands have the same length as the target strand. You may also assume that at least one of the candidate strands will match the target strand.
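The tie rule falls out naturally from Python's built-in `max`, which returns the first maximal item when several share the largest key. The sketch below uses a hypothetical `overlap_length` helper, repeated here so the block runs on its own. It relies on the guarantee that at least one candidate exists, since `max` raises `ValueError` on an empty list.

```python
def overlap_length(target: str, candidate: str) -> int:
    """Largest k such that target ends with the first k bases of candidate."""
    for k in range(min(len(target), len(candidate)), 0, -1):
        if target.endswith(candidate[:k]):
            return k
    return 0


def best_match(target: str, candidates: list[str]) -> str:
    # max() returns the first item with the largest key, so on a tie
    # the candidate that appears earlier in the file wins.
    return max(candidates, key=lambda c: overlap_length(target, c))


# "ACAAAA" and "ACTTTT" both overlap "GGGTAC" by 2 bases; the earlier one wins.
print(best_match("GGGTAC", ["CAAAAA", "ACAAAA", "ACTTTT"]))  # ACAAAA
```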

Hints

Consider using string slicing as you compare the strands.

Python strings provide the startswith and endswith methods, which are useful when comparing strands.
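The slicing comparison and the two string methods express the same test. This sketch uses the strands from the earlier example and assumes an overlap length `k` of 10.

```python
target = "ACGGACATAGTCATT"
candidate = "CATAGTCATTTCATG"
k = 10  # overlap length from the example above

# Slicing: last k bases of the target vs. first k bases of the candidate.
by_slicing = target[-k:] == candidate[:k]
# endswith: does the target end with the candidate's first k bases?
by_endswith = target.endswith(candidate[:k])
# startswith: does the candidate begin with the target's last k bases?
by_startswith = candidate.startswith(target[-k:])

print(by_slicing, by_endswith, by_startswith)  # True True True
```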


Sample

target_strand.txt:

TTATAGTGATATACGTGCTTAGGTAGTGCAGAGACACAACTTATAGAGTGAGGCCAGCTCACGAGCTCTAGAAGCCCAAA

candidate_strands.txt:

TATTGTGCTCTATAGCTCCAGGCACATCCCTTGACGGATTGGGGACTGTCTTGACGAAAGTTCGGAGGTAGAAAAGTCCA

GACGACACCCTGGCAAAGGTCACGTCATGGGTGGAGTACTTATACCGGCAGCAGAGCGATCTGCTACCTATCTTCATGAT

CACGAGCTCTAGAAGCCCAAACTGTGACGCAATTGCCGGGCTAAAACTATGCTAAGAAATCCCCATTACCAGAGTCTTAG

TGAGCCGTTGGGCAGTTAACGGATTTTACTCGTCGCTGCCTGAAGTGCCAAATTTACCAAAAACCGGATAACTTCATGCA

CTTATAGAGTGAGGCCAGCTCACGAGCTCTAGAAGCCCAAATTGCTACTGTGCCGCTGCGCACCGCATGATCGCAGTCAG

TTAGAGGAATTGGACGGCACTCGGACACAAGCTCACGCCCCATACTTTAGCACCGAATATCGACTAAGCATAGTTGATCT

AGCAAGAGTTGGTATCTCTAGGGGCTTCTCCGGACGCAACGACGCGTCTGACAGTTCAGGTTGTTATGACCCGGGTGTGA

CTATGGTTAGGCAACTTCCACGCTATCCCTCGACCACGGCTCGTGGAGCCGTACCGGTGTATTTTGTTGCTGCTAATATT

GTAGCACGCAGTTCGAGTCACCCGGAAGCAGCGAAACGTTCGGCAACTACAAACTCCAATCTTGTATTCGGGTGCCTTTT

CGATTGTCTGTGTTCTGCATGAGCACAATAAGTACAAGTCGAACTGGTATTTACTAAAGTCCGCATATTGTACGGTACGT

Program execution:

Target strand filename: target_strand.txt

Candidate strands filename: candidate_strands.txt

TTATAGTGATATACGTGCTTAGGTAGTGCAGAGACACAACTTATAGAGTGAGGCCAGCTCACGAGCTCTAGAAGCCCAAATTGCTACTGTGCCGCTGCGCACCGCATGATCGCAGTCAG
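The sketch below checks the two sample candidates that share a long run with the target, to show why the fifth candidate wins. Worked by hand, the third candidate overlaps by 21 bases and the fifth by 41. The fifth is therefore chosen even though the third appears earlier in the file; the other candidates overlap by at most one base. `overlap_length` is the same hypothetical helper as in the earlier sketches.

```python
def overlap_length(target: str, candidate: str) -> int:
    """Largest k such that target ends with the first k bases of candidate."""
    for k in range(min(len(target), len(candidate)), 0, -1):
        if target.endswith(candidate[:k]):
            return k
    return 0


target = "TTATAGTGATATACGTGCTTAGGTAGTGCAGAGACACAACTTATAGAGTGAGGCCAGCTCACGAGCTCTAGAAGCCCAAA"
third = "CACGAGCTCTAGAAGCCCAAACTGTGACGCAATTGCCGGGCTAAAACTATGCTAAGAAATCCCCATTACCAGAGTCTTAG"
fifth = "CTTATAGAGTGAGGCCAGCTCACGAGCTCTAGAAGCCCAAATTGCTACTGTGCCGCTGCGCACCGCATGATCGCAGTCAG"

print(overlap_length(target, third))  # 21
print(overlap_length(target, fifth))  # 41
# Combining with the fifth candidate reproduces the sample output.
print(target + fifth[overlap_length(target, fifth):])
```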