Title: An Overview of Protein Function Prediction Methods: A Deep Learning Perspective
Volume: 18
Issue: 8
Author(s): Emilio Ispano, Federico Bianca, Enrico Lavezzo and Stefano Toppo*
Affiliation:
- Department of Molecular Medicine, Computational Medicine Group, University of Padova, Padova (PD), 35131, Italy
Keywords:
Protein function prediction, AFP, GO, machine learning, deep learning, feature representation methods, measurements, classifiers, web servers.
Abstract:
Predicting the function of proteins is a major challenge in the scientific community, particularly
in the post-genomic era. Traditional experimental methods of determining protein function
are accurate but can be resource-intensive and time-consuming. The development of Next Generation
Sequencing (NGS) techniques has led to the production of a large number of new protein sequences,
which has increased the gap between available raw sequences and verified annotated sequences.
To address this gap, automated protein function prediction (AFP) techniques have been developed as
a faster and more cost-effective alternative, aiming to maintain the same accuracy level.
Several automatic computational methods for protein function prediction have recently been
proposed. This paper reviews the best-performing AFP methods presented in the last decade and
analyzes their improvements over time to identify the most promising strategies for future methods.
Identifying the most effective method for predicting protein function is still a challenge. The Critical
Assessment of Functional Annotation (CAFA) has established an international standard for evaluating
and comparing the performance of various protein function prediction methods. In this study, we analyze
the best-performing methods identified in recent editions of CAFA. These methods are divided into
five categories based on their principles of operation: sequence-based, structure-based, combined-based,
ML-based and embeddings-based.
After conducting a comprehensive analysis of the various protein function prediction methods, we observe
that there has been a steady improvement in the accuracy of predictions over time, mainly due to
the implementation of machine learning techniques. The present trend suggests that all the best-performing
methods will use machine learning to improve their accuracy in the future.
We highlight the positive impact that the use of machine learning (ML) has had on protein function prediction.
Most recent methods developed in this area use ML, demonstrating its importance in analyzing
biological information and making predictions. Despite these improvements in accuracy, there is still a
significant gap compared with experimental evidence. The use of new approaches based on Deep
Learning (DL) techniques will probably be necessary to close this gap, and while significant progress
has been made in this area, there is still more work to be done to fully realize the potential of DL.