Dr Richard Edwards BBSRC QSLiMFinder motif prediction

Project overview

This project aims to identify sites on proteins that are critical for their interactions with other proteins. Many protein-protein interactions are mediated by Short Linear Motifs (SLiMs): short stretches of protein sequence (5-15 amino acids long) in which only a few positions are critical to function. These motifs are vital for biological processes of fundamental importance, such as signalling pathways and targeting proteins to the correct part of a cell. The primary objective of this project is to integrate a number of leading computational techniques to predict novel SLiMs and, in so doing, add crucial detail to protein-protein interaction networks. This will generate a valuable resource of potential SLiMs, including defined occurrences and interactions. Given the great promise SLiM-mediated interactions hold as future therapeutic targets, this resource could be a gold mine for the pharmaceutical industry and future drug design.
A crucial part of any predictive bioinformatics analysis is the rediscovery of previously known results. The Eukaryotic Linear Motif (ELM) database provides a rich resource of known eukaryotic motifs, and these will form the basis for annotating known SLiMs that are (re)discovered during the course of the investigation. Data generated during this project will, in turn, be fed back into the ELM resource to improve existing ELM annotation and, potentially, provide new annotated occurrences of known motifs. As well as being useful for further investigation of specific protein-protein interactions, these predictions and annotations will be of interest to the wider scientific community seeking to understand the fundamental principles of how proteins interact with each other.
To generate the SLiM predictions, this project will combine cutting-edge tools for two distinct but related activities: (1) predicting regions of proteins involved in SLiM-mediated interactions from 3D structures of proteins; (2) identifying over-represented recurring sequence patterns in proteins that all share a common interaction partner. First, structural features will be used to identify candidate regions in specific proteins. SLiMs typically interact with larger, globular domains in their partner protein, and structural signatures of such 'domain-motif interactions' can be used to highlight possible motif regions. These regions will then be compared to other proteins known to interact with the same partner as the candidate. Currently, the most successful approaches for this comparison explicitly use a model of convergent evolution for detection. Under this model, the common motif identified must be shared by sequences that have no other detectable sequence similarity. We previously developed SLiMFinder, the most successful of these tools on benchmarking data, which estimates the statistical significance of over-represented motifs by accounting for both the evolutionary relationships between input proteins and the total motif space being searched. For this project, an extension of SLiMFinder will be used that takes advantage of the fact that the 3D methods will have identified a specific short region on one of the proteins. This extra information makes the method much more sensitive.
These results will be of great interest to anyone trying to understand the molecular basis of protein-protein interactions and signalling pathways. During the course of the project, the methods used will be further developed and validated, providing useful tools for future investigations. Open source software and webserver implementations will be provided to facilitate further application, making these methods available to bench scientists studying specific proteins and interactions.
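The idea of an over-represented motif among proteins that share an interaction partner can be illustrated with a minimal Python sketch. This is not the SLiMFinder implementation: it assumes the input sequences are already unrelated, uses a uniform amino acid frequency of 1/20, and omits the corrections for evolutionary relationships and total motif space described above. The function names, the toy sequences and the example motif are hypothetical, chosen only for illustration.

```python
import re
from math import comb


def motif_support(motif_regex, sequences):
    """Count how many sequences contain at least one match to the motif."""
    pattern = re.compile(motif_regex)
    return sum(1 for seq in sequences if pattern.search(seq))


def per_sequence_probability(position_probs, seq_len):
    """Chance that a random sequence of length seq_len has >= 1 match.

    position_probs holds the probability of a match at each motif position
    (1.0 for a wildcard). Overlapping sites are treated as independent,
    which is a simplifying assumption.
    """
    p_site = 1.0
    for p in position_probs:
        p_site *= p
    n_sites = max(seq_len - len(position_probs) + 1, 0)
    return 1 - (1 - p_site) ** n_sites


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical toy data: four unrelated interactors of one partner protein.
    sequences = ["MKTPPLPAR", "GGSPALPKK", "AAAAAAAA", "PQLPDDD"]
    motif = "P.LP"  # illustrative pattern: two fixed P, one wildcard, one L
    probs = [0.05, 1.0, 0.05, 0.05]  # assumed uniform frequency of 1/20

    support = motif_support(motif, sequences)
    # Approximation: use the mean per-sequence probability across the dataset.
    mean_p = sum(per_sequence_probability(probs, len(s)) for s in sequences) / len(sequences)
    p_value = binomial_tail(support, len(sequences), mean_p)
    print(f"support={support}/{len(sequences)}, uncorrected p={p_value:.3g}")
```

The binomial tail gives the chance of seeing the motif in at least the observed number of sequences. It is uncorrected, so a real analysis would still need to adjust for the number of motifs tested and for related input proteins.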