Drug-target affinity prediction using a graph-based approach enriched with molecule words


Date

2023

Publisher

Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023.

Abstract

Wet-lab experiments to determine the affinity of drugs for their targets are costly and time-consuming. Computational methods can provide an alternative to early-stage experiments and guide the research process. Recently, the use of natural language processing techniques to represent molecules has become popular and has led to successful results. In our work, we assume that proteins and ligands each have their own language, analogous to human languages, and that these languages consist of smaller meaningful parts that we call words. We identify protein and ligand words from their 1D sequences using a subword tokenization method, and we represent protein-ligand interactions with a heterogeneous graph consisting of four node types: proteins, ligands, protein words, and ligand words. A graph-based approach is used to learn embeddings for the nodes in the graph. These embeddings are fed into a deep learning model that predicts protein-ligand binding affinity. We show that representing novel proteins and/or ligands (those not present in the training set) by their word embeddings improves the results compared to the case where no words are used. Using pre-trained word embeddings for previously unseen molecules is also computationally efficient, because the node embeddings do not need to be re-trained on the input graph for these new molecules.
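
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several parts are assumptions made only for illustration: `toy_words` is a hypothetical fixed-length chunker standing in for the subword tokenizer, the edge set (molecule-to-word and protein-to-ligand links) is one plausible reading of the four-node-type graph, and averaging word vectors in `embed_novel` is an assumed way of building an embedding for an unseen molecule. The sequences and vectors are made-up toy values.

```python
from collections import defaultdict


def toy_words(sequence, size=3):
    """Split a 1D sequence into fixed-size chunks.

    Assumption: a stand-in for the learned subword tokenizer in the abstract.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def build_graph(proteins, ligands, interactions):
    """Build an undirected adjacency map over typed nodes (node_type, name).

    Assumed edges: protein-protein_word, ligand-ligand_word, protein-ligand.
    """
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for name, seq in proteins.items():
        for word in toy_words(seq):
            link(("protein", name), ("protein_word", word))
    for name, smiles in ligands.items():
        for word in toy_words(smiles):
            link(("ligand", name), ("ligand_word", word))
    for protein, ligand in interactions:
        link(("protein", protein), ("ligand", ligand))
    return graph


def embed_novel(sequence, word_vectors, word_type):
    """Embed an unseen molecule as the mean of its known word vectors.

    Assumption: mean pooling; words without a vector are skipped.
    Returns None when no word of the molecule has a vector.
    """
    vectors = [
        word_vectors[(word_type, word)]
        for word in toy_words(sequence)
        if (word_type, word) in word_vectors
    ]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy data (made up for illustration).
proteins = {"P1": "MKTAYI", "P2": "MKTLLV"}
ligands = {"L1": "CCOCCN"}
graph = build_graph(proteins, ligands, [("P1", "L1")])

# Pretend these vectors were learned on the graph beforehand.
word_vectors = {
    ("protein_word", "MKT"): [1.0, 0.0],
    ("protein_word", "AYI"): [0.0, 1.0],
}

# A protein that is not a node in the graph reuses existing word vectors.
novel = embed_novel("MKTAYIQQQ", word_vectors, "protein_word")
node_types = {node_type for node_type, _ in graph}
```

In this sketch the unseen protein shares the words `MKT` and `AYI` with the training graph, so it obtains an embedding without any new training pass, which mirrors the efficiency argument made in the abstract.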
