Upon completion of this assignment, you need to be able to:
- Iterate through an array from its beginning index to the end.
- Determine cumulative information as you progress through an array.
- Determine equality of string objects.
- Access the individual characters of a string object in Java.
- Continue building good programming skills.
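As a reminder of the two string skills listed above, here is a minimal sketch. The class and method names (`StringBasicsSketch`, `sameSequence`, `countBase`) are hypothetical and are not part of the assignment specification.

```java
// Hypothetical names, for illustration only; not part of the specification.
class StringBasicsSketch {

    // Compares the contents of two strings.
    // Using == here would compare object references, not characters.
    static boolean sameSequence(String a, String b) {
        return a.equals(b);
    }

    // Counts how often one base appears by visiting every character index.
    static int countBase(String sequence, char base) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            if (sequence.charAt(i) == base) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(sameSequence("GATC", "GATC")); // true
        System.out.println(sameSequence("GATC", "GTC"));  // false
        System.out.println(countBase("GAAATC", 'A'));     // 3
    }
}
```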
An important task in bioinformatics is the identification of DNA and RNA sequences. In this assignment, we will be looking at nucleic acid sequences. These sequences contain up to four different bases denoted by letters: A for adenine, C for cytosine, G for guanine, and T for thymine. Sequence strings are compared in order to determine whether nucleic acid sequences match each other, or are related through mutations. Real sequence data, as used by biochemists and in bioinformatics research, consists of very long strings of bases. Determining relatedness can require the use of very complex algorithms, beyond the scope of this assignment.
The sequences in this assignment will all contain between two and four of the possible bases in {A, C, G, T}. Your task is to search through a collection of sequence data and count how many times a specific sequence occurs. For example, if the collection contains the sequences {ACTG, GATC, ACT, GTC, AC, GATC, GA} and we search for the specific sequence GATC, we would report that it was found 2 times.
One of the challenges in this assignment will be dealing with mutated sequences. A mutation can occur due to insertions of additional bases within a sequence. For the purpose of this assignment, a mutated sequence contains the same base at least twice in a row; for example, in the sequence GAAATC, the A has mutated, and in the sequence CCGGAT, both the C and G have mutated. Another task in this assignment is to detect how many of the sequences in the collection are mutated. The final task will be to search through the collection of sequence data for a specific sequence, but you must treat original and mutated sequences the same. For example, if the collection contains {TGC, AC, TTGC, TACG, TGGCC, AGTC} and we search for the specific sequence TGC, we would report that it was found 3 times, because TTGC and TGGCC are mutated forms of TGC.
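One possible way to treat original and mutated sequences the same is to collapse every run of repeated bases to a single base before comparing. The sketch below illustrates that idea only; it is not necessarily the approach the specification expects, and the names `CollapseSketch`, `collapseRepeats`, and `countIgnoringMutations` are hypothetical.

```java
// Hypothetical names, for illustration only; not part of the specification.
class CollapseSketch {

    // Removes repeated adjacent bases, e.g. "TGGCC" becomes "TGC".
    static String collapseRepeats(String sequence) {
        String result = "";
        for (int i = 0; i < sequence.length(); i++) {
            // Keep a character only if it differs from the one before it.
            if (i == 0 || sequence.charAt(i) != sequence.charAt(i - 1)) {
                result += sequence.charAt(i);
            }
        }
        return result;
    }

    // Counts sequences whose collapsed form equals the collapsed target.
    static int countIgnoringMutations(String[] sequences, String target) {
        String collapsedTarget = collapseRepeats(target);
        int count = 0;
        for (int i = 0; i < sequences.length; i++) {
            if (collapseRepeats(sequences[i]).equals(collapsedTarget)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] data = {"TGC", "AC", "TTGC", "TACG", "TGGCC", "AGTC"};
        System.out.println(countIgnoringMutations(data, "TGC")); // 3
    }
}
```

Collapsing the target as well as each element is an assumption made here so that a mutated search string would still match.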
- Download this pdf file and store it on your computer.
- Create a directory / folder on your computer specific to this assignment; for example, CSC110/assn5 is a good name.
- Save a file called SearchDNA.java in your directory. Fill the source code with the class name and the methods outlined in the following specification document.
- Start with the easiest methods first. See the detailed instructions below for some tips.
- Complete each method and test it thoroughly before moving on to the next one. Compile and run the program frequently; it is much easier to debug an error shortly after a successful and error-free run.
Start with the printArray method. Focus on passing in an array of Strings as a parameter and using a loop to visit each element in the array. Remember that array indices start at 0, as do the indices of each character in a String.
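A minimal sketch of such a loop follows. The method header and the output format shown here are assumptions; the specification document defines the exact header and the expected output, so adjust accordingly.

```java
// Illustrative sketch; the header and output format are assumptions.
class PrintArraySketch {

    // Visits each element from index 0 to the last index and prints it.
    static void printArray(String[] sequences) {
        for (int i = 0; i < sequences.length; i++) {
            System.out.println(i + ": " + sequences[i]);
        }
    }

    public static void main(String[] args) {
        String[] data = {"ACTG", "GATC", "GA"};
        printArray(data);
    }
}
```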
After completing and testing printArray, work on the findLongest or the findFrequency methods, as they are quite similar. Within both methods, use a loop to visit each element in the array. In the findLongest method, you must keep track of which String object in the array contains the most characters, whereas in the findFrequency method, you must keep track of how many times a specific String object is found in the array. Make sure you finish and test both of these before moving to the next step.
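Both methods follow the same pattern: loop over the array while updating a running value. The sketch below assumes that findLongest returns the first longest String, that findFrequency returns an int, and that the array is not empty; the real headers and return types come from the specification document.

```java
// Illustrative sketch; headers and return types are assumptions.
class CumulativeSketch {

    // Tracks the longest string seen so far (assumes a non-empty array).
    static String findLongest(String[] sequences) {
        String longest = sequences[0];
        for (int i = 1; i < sequences.length; i++) {
            if (sequences[i].length() > longest.length()) {
                longest = sequences[i];
            }
        }
        return longest;
    }

    // Tracks how many elements are equal to the target string.
    static int findFrequency(String[] sequences, String target) {
        int count = 0;
        for (int i = 0; i < sequences.length; i++) {
            if (sequences[i].equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] data = {"ACTG", "GATC", "ACT", "GTC", "AC", "GATC", "GA"};
        System.out.println(findLongest(data));           // ACTG
        System.out.println(findFrequency(data, "GATC")); // 2
    }
}
```

Using a strict `>` comparison keeps the first of several equally long strings; whether ties should be resolved that way is an assumption.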
The methods involving mutations are a little more difficult. In this assignment, a mutation occurs when the same character appears two or more times in a row in a string. Think about how you might be able to detect a mutation in a String object. Once you come up with a strategy, test it with a number of Strings to see if it works!
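One candidate strategy, sketched here under the definition above, is to compare each character with the one immediately before it. The names `MutationSketch`, `isMutated`, and `countMutated` are hypothetical and may not match the method names in the specification.

```java
// Hypothetical names, for illustration only; not part of the specification.
class MutationSketch {

    // Returns true if any character equals the character just before it.
    static boolean isMutated(String sequence) {
        for (int i = 1; i < sequence.length(); i++) {
            if (sequence.charAt(i) == sequence.charAt(i - 1)) {
                return true;
            }
        }
        return false;
    }

    // Counts how many sequences in the array are mutated.
    static int countMutated(String[] sequences) {
        int count = 0;
        for (int i = 0; i < sequences.length; i++) {
            if (isMutated(sequences[i])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isMutated("GAAATC")); // true
        System.out.println(isMutated("GATC"));   // false
        String[] data = {"TGC", "AC", "TTGC", "TACG", "TGGCC", "AGTC"};
        System.out.println(countMutated(data));  // 2
    }
}
```

Starting the loop at index 1 avoids reading index -1 and also handles empty and one-character strings without a special case.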
As you work through a solution, we strongly recommend that you save, compile and test the code after every line or two. This can be something as easy as printing out the value of a variable, or calling a method to print out the value returned. It is important to do this to confirm a component of the code works correctly, so you can be confident using that component throughout the code in later steps.
For each of the methods listed in the specifications, you must provide an internal test call from the main method. If the method does not behave as expected, then debug and adjust the method. To receive full marks for testing, each method must be tested, even if it does not work.
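An internal test call can be as simple as printing the value a method returns next to the value you expect. The sketch below shows one way to do that; `countOfLength` and `report` are hypothetical helpers invented for this example and are not methods from the specification.

```java
// Hypothetical names, for illustration only; not part of the specification.
class TestCallSketch {

    // Hypothetical method under test: counts sequences of a given length.
    static int countOfLength(String[] sequences, int length) {
        int count = 0;
        for (int i = 0; i < sequences.length; i++) {
            if (sequences[i].length() == length) {
                count++;
            }
        }
        return count;
    }

    // Prints the expected and actual values so a mismatch is easy to spot.
    static boolean report(String label, int expected, int actual) {
        boolean passed = (expected == actual);
        System.out.println(label + ": expected " + expected + ", got " + actual
                + (passed ? " (ok)" : " (MISMATCH)"));
        return passed;
    }

    public static void main(String[] args) {
        String[] data = {"ACTG", "GA", "AC"};
        report("countOfLength(data, 2)", 2, countOfLength(data, 2));
        report("countOfLength(data, 3)", 0, countOfLength(data, 3));
    }
}
```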
We provide a couple of internal test cases you can use to test the correctness of the methods. The following shows the method call to printArray from inside the main method and the output that is expected from a working printArray method.
Submit the following completed file to the Assignment folder on conneX.
Please make sure you have submitted the required file(s) and conneX has sent you a confirmation email. Do not send .class (the byte code) files. Also, make sure you submit your assignment, not just save a draft. Draft copies are not made available to the instructors, so they are not collected with the other submissions. We can find your draft submission, but only if we know that it's there.
A note about academic integrity
It is OK to talk about your assignment with your classmates, and you are encouraged to design solutions together, but each student must implement their own solution.
Marks are allocated for:
- No errors during compilation of the source code.
- The method headers must be exactly as specified and the methods must perform as specified. Be sure to read the Method Details and not just the summaries.
- Each method must have a test call inside the main method.
- Style of the source code meets the requirements outlined in the Style Guidelines document available in the Resources folder of conneX.