New Tool to Identify DNA Patterns

David Cox, a Ph.D. student in computer science, devised the "symbolic scatter plot" tool to provide a visual representation of a DNA sequence. "Improved identification of relevant DNA sequences will hopefully expedite the development of successful treatment for a range of diseases," Cox says, "by allowing researchers to focus on the components of DNA that are related to the disease and improving our understanding of the genetic mechanisms of these diseases."

How does the symbolic scatter plot create a visual representation of DNA? DNA is composed of a series of nucleotides. There are only four types of nucleotides, represented by the letters A, T, G and C. Each three-letter string of these nucleotides, such as AAA or ATG, is called a 3-mer. Cox explains, "There are only 64 possible 3-mers, thus each 3-mer maps to a number from zero to 63. The symbolic scatter plots take a very long string of letters representing a DNA sequence and split it into a bunch of 3-mers. It then plots a point for each 3-mer, zero through 63, with that number serving as the y-coordinate." The x-coordinate is the order in which the 3-mer appears in the genetic sequence.
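
The mapping Cox describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cox's tool: the base ordering "ACGT", the plain base-4 numbering, the choice of non-overlapping 3-mers, and the names KMER_INDEX and scatter_points are all hypothetical, since the article does not specify these details.

```python
from itertools import product

BASES = "ACGT"  # assumed ordering; the article does not specify one

# Assign each of the 4**3 = 64 possible 3-mers a number from 0 to 63.
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(BASES, repeat=3))}


def scatter_points(sequence, step=3):
    """Return (x, y) points: x is the 3-mer's order of appearance, y its number.

    step=3 reads non-overlapping 3-mers (an assumption); step=1 would slide
    the window one nucleotide at a time instead.
    """
    seq = sequence.upper()
    points = []
    for x, start in enumerate(range(0, len(seq) - 2, step)):
        kmer = seq[start:start + 3]
        if kmer in KMER_INDEX:  # skip 3-mers containing other letters, e.g. N
            points.append((x, KMER_INDEX[kmer]))
    return points


points = scatter_points("ATGAAATTTGGG")
# points == [(0, 14), (1, 0), (2, 63), (3, 42)]
```

Passing the resulting points to any plotting library would give a scatter plot of the kind described, with sequence position on the x-axis and the 3-mer number on the y-axis.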

Cox says: "The resulting scatter plots reveal interesting patterns in the original DNA. I can also string these scatter plots together to produce animations for the purpose of comparing DNA sequences."

Cox chose to focus on 3-mers because they correspond to codons, the three-nucleotide units of genetic code that the body uses to specify which amino acid is inserted during the creation of proteins. In other words, they determine the makeup of proteins – which are themselves the basic building blocks of the human body. "There are 64 3-mers, but only 20 amino acids," Cox says, "so each amino acid corresponds to multiple 3-mers." Cox designed the symbolic scatter plot so that those 3-mers that correspond to the same amino acid are adjacent to one another.
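
One way to obtain such an arrangement is sketched below in Python. The codon table is the standard genetic code, in which "*" marks the three stop codons. Everything else is an assumption made for illustration: the article does not say how Cox orders the groups, so this sketch simply sorts codons by their one-letter amino acid symbol, and the names CODON_TO_AA, Y_COORD and same_amino_acid are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
# Each letter is a one-letter amino acid symbol; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed ordering: sort by amino acid symbol, then by codon, so that all
# codons for the same amino acid receive consecutive y-coordinates.
ORDERED_CODONS = sorted(CODON_TO_AA, key=lambda c: (CODON_TO_AA[c], c))
Y_COORD = {codon: y for y, codon in enumerate(ORDERED_CODONS)}


def same_amino_acid(codon_a, codon_b):
    """True if both codons specify the same amino acid (or both are stops)."""
    return CODON_TO_AA[codon_a] == CODON_TO_AA[codon_b]
```

Under this ordering, TCT and AGC (both serine) fall in the same band of the y-axis even though their letters differ completely, while GAA (glutamic acid) and GTA (valine) differ by a single letter but land in different bands.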

"This way," Cox says, "it is easier to determine when a difference in 3-mers is significant – from one amino acid to another – rather than a difference in 3-mers that still results in the production of the same amino acid. A change in a single amino acid can be the difference between a relatively harmless disease and a fatal one."

Source: North Carolina State University