tRNA Alignment

Using bioinformatics to tell the story of tRNA evolution

from 4 billion years ago to the present day

Multiple sequence alignments (MSAs) are an important tool in bioinformatic analysis because they identify homologous regions across a set of sequences. MSAs can also provide insight into the evolutionary pathways that may have produced the observed genetic diversity. Many algorithms exist for computing MSAs. However, algorithms that perform well on proteins and DNA tend to perform poorly on tRNAs because of the challenges posed by tRNA secondary structure. tRNAs fold internally through pairing between distant bases, whereas the structure of proteins and DNA is predominantly influenced by local interactions. For this reason, we observe that most algorithms have difficulty aligning tRNAs perfectly, and performance is particularly poor in the V loop region. This can be seen visually above, and can also be verified numerically (see Quality Reports).
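As a rough illustration of what a numerical check of alignment quality can look like, the Python sketch below computes two simple per-column statistics for an MSA: the fraction of gaps and the Shannon entropy of the residues. Poorly aligned regions tend to show many gaps, high entropy, or both. This is not necessarily the method used in the Quality Reports. The function name `column_stats` and the toy sequences are assumptions made up for this example and are not real tRNA data.

```python
from collections import Counter
from math import log2


def column_stats(alignment):
    """Return a (gap_fraction, entropy_in_bits) pair for each alignment column.

    `alignment` is a list of equal-length strings in which '-' marks a gap.
    Entropy is computed over the non-gap residues only.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    stats = []
    for column in zip(*alignment):  # iterate column by column
        gap_fraction = column.count("-") / len(column)
        residues = [base for base in column if base != "-"]
        total = len(residues)
        counts = Counter(residues)
        # Shannon entropy: 0 bits for a fully conserved column,
        # up to 2 bits when A, C, G and U are equally frequent.
        entropy = sum((n / total) * log2(total / n) for n in counts.values())
        stats.append((gap_fraction, entropy))
    return stats


# Hypothetical toy alignment, for illustration only (not real tRNA sequences).
toy_alignment = [
    "GCAU-GC",
    "GCAUAGC",
    "GCGU-GC",
    "GCCU-GC",
]

for i, (gap, entropy) in enumerate(column_stats(toy_alignment), start=1):
    print(f"column {i}: gap fraction = {gap:.2f}, entropy = {entropy:.2f} bits")
```

On a real alignment, plotting these two values against column position would be one way to make a weakly aligned region, such as the V loop, stand out from the well-conserved stems.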

This research was completed as part of a project course at Northeastern University's Roux Institute under the supervision of Dr. Philip Bogden, who also assisted with data visualization. Dr. Jennifer Spillane was the primary point of contact for the domain science and provided necessary insight into the bioinformatic algorithms and techniques used in this work. Dr. Zachary Burton and Dr. Lei Lei were the external stakeholders and guided the direction of the research to answer the most pressing questions about tRNA evolution.

The code and documentation for this project can be found on GitHub.