Project overview
The DNA sequence in every cell of the body, termed the genome, stores the instructions to make all the proteins that are the essential building blocks of life. The code for each protein is stored in shorter stretches of DNA called genes. Converting the DNA sequence of a gene (which has only four letters) to that of its corresponding protein (made up of twenty different kinds of amino acids) requires two major processes: transcription and translation. Transcription is the copying of the sequence information in the DNA into a similar molecule called messenger RNA (mRNA). The mRNA messages are then decoded into the amino acid sequence by the process of translation (so called because it goes from the language of DNA/RNA to the language of amino acids). The Genetic Code, which is used to translate nucleotides (RNA) into amino acids (protein), is well established, and it is easy to predict which amino acids are encoded by a given stretch of DNA. The process is complicated, however, by the presence of untranslated regions at the ends of the mRNA, which do not encode any protein sequence. As a consequence, in order to correctly predict the protein sequence, it is important to know where translation begins. The complex molecular machine that translates the mRNA sequence is called the ribosome. It starts making a protein when it finds a particular sequence in the mRNA called a translation initiation codon, which usually has the sequence AUG. This project will advance our knowledge of the nature of these translation initiation codons. Many of the rules of translation initiation remain unclear, so the more we learn, the more we can understand from existing data. This is particularly true because improvements in sequencing technology mean that an ever-increasing proportion of our knowledge about the protein universe is derived purely from applying these rules computationally to DNA sequence data.
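As a rough illustration of how these rules are applied computationally, the sketch below transcribes a short DNA sequence, skips the untranslated region by searching for the first AUG, and then reads three letters at a time until a stop codon. It is a minimal sketch, not the project's own software: the function names and the example sequence are invented for illustration, and starting at the first AUG is precisely the simplified rule that this project questions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Copy a coding-strand DNA sequence into mRNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate_from(mrna, start):
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_first_aug(mrna):
    """Simplified rule: start at the first AUG and ignore everything before it."""
    start = mrna.find("AUG")
    return "" if start == -1 else translate_from(mrna, start)


# Invented example: a short untranslated region, a coding region, a stop codon.
dna = "GGCACCATGGCTAAAGGTTAAGCC"
mrna = transcribe(dna)
print(mrna)                       # GGCACCAUGGCUAAAGGUUAAGCC
print(translate_first_aug(mrna))  # MAKG
```

The letters before the AUG and after the stop codon never appear in the predicted protein, which is why the choice of start position determines the result.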
It is well established that multiple proteins can be produced from a single gene by generating different mRNA sequences during transcription. A much more recent finding is that translation can similarly produce different proteins from the same mRNA when the ribosome begins translating at different positions, making longer or shorter versions of the protein. This project is concerned with how and why different translation initiation codons are used and how widespread this phenomenon is. So far only a few examples have been discovered, but those that are known are very important. In fact, alternative initiation codons can be used to make new forms of proteins which have completely different functions or go to different places within the cell. Furthermore, it is now becoming clear that the initiation codon itself does not have to be the AUG triplet, and the use of what we describe as non-canonical initiation codons is the focus of our proposed work. We have successfully identified translation initiation from non-AUG codons, and in this project we will particularly focus on genes that make proteins with roles in the batteries of the cell, the mitochondria. We believe that important signals within the proteins which help target them to this part of the cell have been overlooked because they are made by starting from non-AUG codons. This means that computational methods using the wrong rules will have missed them. We have already demonstrated this phenomenon in one gene, and once we have identified novel initiation codons in further candidates, we will examine what the consequences are for the proteins that are produced. We will certainly identify the signals that are involved in moving proteins to the mitochondria, but may also find new roles for the newly identified protein sequence.
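The idea of longer protein versions made from upstream non-AUG codons can be sketched in code. The example below walks backwards from an annotated AUG, one codon at a time, and records every codon that differs from AUG by at most one letter, stopping at the first stop codon. It rests on several assumptions that do not come from the project itself: the one-letter-difference rule for candidate codons, the simplification that a non-AUG start is still read as methionine (M), and the invented function names and example sequence. Real initiation also depends on sequence context that is not modelled here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_candidate_start(codon):
    """Assumed rule: AUG itself, or any codon one letter away from AUG."""
    return sum(a != b for a, b in zip(codon, "AUG")) <= 1


def in_frame_starts(mrna, annotated_start):
    """List the annotated start plus upstream in-frame candidate starts."""
    starts = [annotated_start]
    pos = annotated_start - 3
    while pos >= 0:
        codon = mrna[pos:pos + 3]
        if CODON_TABLE[codon] == "*":  # an upstream stop closes the reading frame
            break
        if is_candidate_start(codon):
            starts.append(pos)
        pos -= 3
    return sorted(starts)


def translate_isoform(mrna, start):
    """Translate from `start`, reading the start codon as M (a simplification)."""
    protein = ["M"]
    for i in range(start + 3, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented mRNA: an in-frame CUG lies upstream of the annotated AUG.
mrna = "GACUGCGUCUUAUGGCUAAAUAAGC"
annotated = mrna.find("AUG")
isoforms = {s: translate_isoform(mrna, s) for s in in_frame_starts(mrna, annotated)}
print(isoforms)  # {2: 'MRLMAK', 11: 'MAK'}
```

In this toy case the CUG start adds three amino acids to the front of the protein. An extension of this kind is where a targeting signal missed by AUG-only rules would sit.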
Research outputs
Connor Maltby, James Schofield, Steven Houghton, Ita O'Kelly, Mariana Vargas-Caballero, Katrin Deinhardt & Mark Coldwell,
2020, Nucleic Acids Research, 48(17), 9822-9839
DOI: 10.1093/nar/gkaa699
Type: article