We will ignore the fact that DNA is first transcribed into RNA, and passes through
other mechanisms, before it can be used to synthesize proteins. Instead, let's assume
that DNA is used directly to specify proteins.
The model is as follows: proteins are sequences of amino acids.
In our discussion we assume a total of
20 possible amino acids, each identified by a letter. Thus
a protein can be seen as a string over these 20 letters.
A sequence of 3 consecutive nucleotides is called a codon.
Codons map to amino acids as indicated in the attached
table.
[A possible use of the information in that table is the following
Java variable:

    private static final String[][] CODON_AMINO = {
        {"att", "i"}, {"atc", "i"}, {"ata", "i"},
        {"ctt", "l"}, {"ctc", "l"}, {"cta", "l"}, {"ctg", "l"}, {"tta", "l"}, {"ttg", "l"},
        {"gtt", "v"}, {"gtc", "v"}, {"gta", "v"}, {"gtg", "v"},
        {"ttt", "f"}, {"ttc", "f"},
        {"atg", "m"},
        {"tgt", "c"}, {"tgc", "c"},
        {"gct", "a"}, {"gcc", "a"}, {"gca", "a"}, {"gcg", "a"},
        {"ggt", "g"}, {"ggc", "g"}, {"gga", "g"}, {"ggg", "g"},
        {"cct", "p"}, {"ccc", "p"}, {"cca", "p"}, {"ccg", "p"},
        {"act", "t"}, {"acc", "t"}, {"aca", "t"}, {"acg", "t"},
        {"tct", "s"}, {"tcc", "s"}, {"tca", "s"}, {"tcg", "s"}, {"agt", "s"}, {"agc", "s"},
        {"tat", "y"}, {"tac", "y"},
        {"tgg", "w"},
        {"caa", "q"}, {"cag", "q"},
        {"aat", "n"}, {"aac", "n"},
        {"cat", "h"}, {"cac", "h"},
        {"gaa", "e"}, {"gag", "e"},
        {"gat", "d"}, {"gac", "d"},
        {"aaa", "k"}, {"aag", "k"},
        {"cgt", "r"}, {"cgc", "r"}, {"cga", "r"}, {"cgg", "r"}, {"aga", "r"}, {"agg", "r"}
    };]

A specific codon, ATG, is called the start codon: translation from DNA to protein starts at such a codon. Three codons, TAA, TAG, and TGA, are called stop codons. The definition of a protein (a gene) starts just after a start codon (which is excluded) and ends just before the first stop codon following it (also excluded) such that the gene between them contains a multiple of 3 nucleotides.
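The gene definition above can be sketched in Java roughly as follows. This is only one possible reading, not a reference solution: the class name GeneFinder and the methods findGenes and translate are hypothetical, and the sketch assumes that (a) the input has already been stripped of spaces and line breaks, (b) a start codon with no in-frame stop codon yields no gene, (c) the search for the next start codon resumes after the stop codon of the gene just found, and (d) a codon missing from the table is rendered as "?".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GeneFinder {
    private static final String START = "atg";
    private static final Set<String> STOPS = Set.of("taa", "tag", "tga");

    /** Returns the nucleotides of each gene, start and stop codons excluded. */
    static List<String> findGenes(String dna) {
        String s = dna.toLowerCase();
        List<String> genes = new ArrayList<>();
        int start = s.indexOf(START);
        while (start >= 0) {
            int bodyStart = start + 3;
            int stop = -1;
            // Step 3 nucleotides at a time so that any stop codon found
            // lies a multiple of 3 nucleotides after the start codon.
            for (int j = bodyStart; j + 3 <= s.length(); j += 3) {
                if (STOPS.contains(s.substring(j, j + 3))) {
                    stop = j;
                    break;
                }
            }
            if (stop < 0) {
                // Assumption: no in-frame stop codon means no gene here;
                // try the next start codon.
                start = s.indexOf(START, start + 1);
            } else {
                genes.add(s.substring(bodyStart, stop));
                // Assumption: genes do not overlap, so resume after the stop codon.
                start = s.indexOf(START, stop + 3);
            }
        }
        return genes;
    }

    /** Translates a gene into amino-acid letters using {codon, letter} pairs. */
    static String translate(String gene, String[][] codonAmino) {
        Map<String, String> table = new HashMap<>();
        for (String[] pair : codonAmino) {
            table.put(pair[0], pair[1]);
        }
        StringBuilder protein = new StringBuilder();
        for (int j = 0; j + 3 <= gene.length(); j += 3) {
            String amino = table.get(gene.substring(j, j + 3));
            // Assumption: codons not in the table are shown as "?".
            protein.append(amino == null ? "?" : amino);
        }
        return protein.toString();
    }
}
```

With this sketch, findGenes("ccatgaaattttaagg") finds the single gene "aaattt", and translate("aaattt", CODON_AMINO) gives "kf". Passing the table as a parameter keeps the sketch independent of where CODON_AMINO is declared.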
You are to write a program that is given, as a command-line parameter, the name of a file containing DNA information as a string (here is an example of such a file). The string may be broken across multiple lines and may contain spaces; your program should ignore such line breaks and spaces. You should:
Send to the TA a case analysis for this problem: problem statement, analysis, design, implementation, and testing.
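
As a starting point for the implementation part, the input handling described above (a file name on the command line, with line breaks and spaces ignored) might be sketched as follows. The class name DnaInput and its methods are hypothetical, and converting the text to lower case is an assumption made here so that the input matches the lower-case codons in the table; the assignment itself does not say which case the file uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DnaInput {
    /** Removes all whitespace (spaces, tabs, line breaks) and lower-cases the rest. */
    static String clean(String raw) {
        return raw.replaceAll("\\s+", "").toLowerCase();
    }

    /** Reads the whole file and returns its DNA content as one clean string. */
    static String readDna(String fileName) throws IOException {
        return clean(Files.readString(Path.of(fileName)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DnaInput <dna-file>");
            System.exit(1);
        }
        String dna = readDna(args[0]);
        System.out.println(dna.length() + " nucleotides read");
    }
}
```

Files.readString requires Java 11 or later; on older versions the file could instead be read line by line and the lines concatenated before cleaning.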