Interactomics aims to study the main aspects of proteins interacting in a living system. The interactome is highly dynamic, as interactions, and ultimately functions, are manifested in a temporal and spatial manner. To understand the complex cellular mechanisms of a biological system, it is necessary to study the nature, specificity and dynamics of these interactions at the molecular level, and prediction of protein-protein interactions (PPIs) has played a significant role in this effort. My main research in interactomics focuses on machine learning algorithms for prediction and analysis of PPIs using low-throughput (mostly structural) and high-throughput data.

  

The Role of SLiMs in Prediction of PPI

[Figure: SLiM on interface]

Motifs are patterns widespread over a group of proteins that are likely to be related by function or may have other biological features in common. Usually, each motif contains a sequence pattern of 3-20 amino acids. Motifs of 3-10 amino acids are considered short, linear motifs (SLiMs) or minimotifs [10]. SLiMs can encode a functional interaction in a short sequence and are enriched in intrinsically disordered regions of proteins. They also tend to function independently of their tertiary structure context and to evolve convergently.

We have studied the role of SLiMs in protein interactions, especially those contained in the interfaces of protein complexes (see example in the figure). We have proposed a model that uses SLiMs as properties to predict obligate and non-obligate protein interaction types. Using various classifiers such as k-NN, LDR and SVM on two well-known datasets, namely those of Zhu et al. and Mintseris et al., we have achieved a prediction accuracy of more than 99%, an increase of at least 7% over previous approaches, including structure-based methods, while using only sequence information.
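As a rough illustration of how SLiMs can serve as properties for a classifier, the sketch below counts motif occurrences in an interface sequence and feeds the counts to a small k-NN vote. The motif patterns, the function names `motif_features` and `knn_predict`, and the squared Euclidean distance are assumptions made for this example only; they are not the motifs, features or classifier settings used in the published model.

```python
import re
from collections import Counter

# Hypothetical SLiM patterns written as regular expressions (illustrative only).
EXAMPLE_MOTIFS = ["P.{2}P", "[RK].L", "S.[DE]"]


def motif_features(sequence, motifs=EXAMPLE_MOTIFS):
    """Return one count per motif: the number of (possibly overlapping)
    positions in `sequence` where the motif matches."""
    # A lookahead with a capture group lets re.findall report overlapping matches.
    return [len(re.findall(f"(?=({motif}))", sequence)) for motif in motifs]


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors
    (squared Euclidean distance)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(vector, query)), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In use, each complex would be turned into a vector with `motif_features` and labelled as obligate or non-obligate with `knn_predict`; an SVM or an LDR-based classifier could be substituted for the k-NN step without changing the feature extraction.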

 Relevant publications: 

  • M. Pandit, L. Rueda (2013). Prediction of Biological Protein-protein Interaction Types Using Short, Linear Motifs. ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACMBCB 2013), Washington, DC, USA, pp. 699-700.
  • M. Pandit, L. Rueda (2013). Prediction of Obligate Protein-protein Interactions Using Short, Linear Motifs. 17th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2013), Beijing, China. Poster presentation.

 

Electrostatic and Desolvation Energies in Prediction of PPI

[Figure: electrostatics in an obligate complex]

Different approaches for prediction of PPI have used a wide variety of physicochemical properties. Of these, we have found that binding free energies are relevant for prediction of obligate complex types. Desolvation energy is defined as a knowledge-based contact potential that accounts for hydrophobic interactions, self-energy change upon desolvation of charged and polar atom groups, and side-chain entropy loss. In our earlier work, we have used desolvation energies to predict types of complexes, achieving very good prediction accuracy. We have also used desolvation energies to predict crystal packing and biological complexes.

Electrostatic interactions are one of three types of non-covalent interactions and occur between electrically charged atoms, whether positively or negatively charged. Non-covalent interactions are very common between macromolecules such as proteins. Electrostatic energy involves a long-range interaction and can occur between charged atoms of two interacting proteins or two different molecules. Moreover, these interactions can occur between charged atoms on the protein surface and charges in the environment. For example, in an obligate protein complex (see figure), positively charged atoms (red highlighted area in (b)) and negatively charged atoms (blue highlighted area in (a)) show high affinity and hence a stronger interaction. We have proposed a model that uses electrostatic energies of pairs of atom types and amino acids to predict protein complex types. We have computed electrostatic energy values via PDB2PQR and APBS, and used them to predict obligate and non-obligate protein complexes.

Our results on two well-known datasets of obligate and non-obligate complexes consistently confirm that electrostatic energy is an important property for prediction, with accuracies of over 95%. Furthermore, a comparison across distance cutoffs shows that the best cutoffs for prediction of PPI types using electrostatic energy range from 9Å to 12Å, which indicates that electrostatic interactions are long-range and cover a broader area of the interface. We have also applied feature selection mechanisms to show that (a) a few pairs of atoms and amino acids are appropriate for prediction, and (b) prediction performance can be improved by eliminating irrelevant and noisy features and selecting the most discriminative ones.
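To make the idea of per-pair energy features with a distance cutoff concrete, the sketch below sums a simple Coulomb term over atom pairs across an interface and groups the result by pair of atom types. This is a deliberately simplified stand-in: the energies in our work were computed with PDB2PQR and APBS, not with a plain Coulomb sum. The function name `pairwise_energy_features`, the atom tuple layout, the uniform dielectric value and the default cutoff are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Coulomb constant in kcal*Angstrom/(mol*e^2).
COULOMB_CONSTANT = 332.06


def pairwise_energy_features(atoms_a, atoms_b, cutoff=10.0, dielectric=4.0):
    """Sum Coulomb energies per pair of atom types across an interface.

    Each atom is (atom_type, (x, y, z), partial_charge), with coordinates in
    Angstroms. Only pairs no farther apart than `cutoff` contribute. The
    uniform `dielectric` is an illustrative assumption.
    """
    features = defaultdict(float)
    for type_a, position_a, charge_a in atoms_a:
        for type_b, position_b, charge_b in atoms_b:
            distance = math.dist(position_a, position_b)
            if 0.0 < distance <= cutoff:
                # Sort the types so that (N, O) and (O, N) share one feature.
                key = tuple(sorted((type_a, type_b)))
                features[key] += (
                    COULOMB_CONSTANT * charge_a * charge_b / (dielectric * distance)
                )
    return dict(features)
```

Calling the function with several values of `cutoff` (for example 9, 10, 11 and 12) gives one feature vector per cutoff, which is how a cutoff comparison of the kind described above could be set up. The resulting dictionary can then be passed to any feature selection step that ranks pairs by how well they separate the two classes.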

Relevant publications: 

  

Sequence and Structural Domains in PPI

[Figure: CATH domain hierarchy]

Domains can be considered the minimal and fundamental units of proteins. Whether they are sequence or structural domains, in most cases they are associated with a specific biological role and act as basic functional units within cells. Previous studies have focused on employing domain knowledge to predict PPI. In our earlier work we proposed a prediction model that uses Pfam domains to predict obligate and non-obligate PPIs. The results demonstrated that desolvation energies are more efficient and powerful than interface area and composition properties for prediction, and that homo-domain interactions are associated with obligate complexes.

In a recent work we have proposed a model to predict obligate and non-obligate protein interaction types using desolvation energies of structural domains that are present in the interfaces of protein complexes and are extracted from the CATH database. The prediction is performed using several classifiers on two well-known datasets; the results demonstrate that domain-based features of higher levels of CATH, especially level 2, are more powerful and discriminative than features of other levels. The study also concluded that properties taken from different levels of the CATH hierarchy yield higher accuracies than properties taken from each level of the hierarchy separately. Furthermore, analysis of structural properties suggests that domain–domain interactions that have at least a mainly-beta secondary structure in one sub-unit are more informative for predicting obligate and non-obligate PPIs.
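The level-based features described above can be sketched as follows, assuming that each domain is identified by a dotted CATH code (Class.Architecture.Topology.Homologous superfamily, for example 2.60.40.10) and that a desolvation energy is already available for each interacting domain pair. The function names, the input layout and the energy values used in any example are hypothetical; this is not the feature extraction code used in the study.

```python
def cath_at_level(code, level):
    """Truncate a dotted CATH code to its first `level` fields
    (1 = Class, 2 = Architecture, 3 = Topology, 4 = Homologous superfamily)."""
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    return ".".join(code.split(".")[:level])


def domain_pair_features(interactions, level):
    """Sum energies per unordered pair of CATH categories at one level.

    `interactions` is a list of (cath_code_a, cath_code_b, energy) tuples;
    the energies are assumed to be precomputed desolvation energies.
    """
    features = {}
    for code_a, code_b, energy in interactions:
        key = tuple(sorted((cath_at_level(code_a, level), cath_at_level(code_b, level))))
        features[key] = features.get(key, 0.0) + energy
    return features


def involves_mainly_beta(code_a, code_b):
    """True if at least one domain belongs to CATH class 2 (mainly beta)."""
    return "2" in (cath_at_level(code_a, 1), cath_at_level(code_b, 1))
```

Features from several levels can be combined by computing `domain_pair_features` once per level and merging the resulting dictionaries, since keys from different levels have different lengths and therefore do not collide.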

Relevant publications: 

 

Dynamics and Proteotranscriptomics

We are currently investigating the following emerging topics associated with the dynamics in the interactome and its relationships with transcriptomics. More details about our current research will be posted soon.

  • Dynamics of protein interactions by means of studying temporal associations of proteins into transient/permanent or obligate/non-obligate complexes.
  • Data integration of interactomics with transcriptomics by finding relationships among gene expression, alternative splicing and protein-protein interactions. Applications to breast and prostate cancer are being pursued. In one of the projects (see also Transcriptomics) we are currently investigating the protein variants or isoforms that are found in RNA-seq data and are associated with alternative splicing in prostate cancer.
  • Interactomics in oral fluids, in particular, the main aspects of salivary protein-protein interactions.