ESPEYB21 15. Editors’ Choice Artificial Intelligence (4 abstracts)
Science 381(6664): eadg7492 (2023). PubMed: 37733863
In Brief: The authors describe AlphaMissense, a machine-learning tool that predicts the pathogenicity of 71 million possible human missense variants. Of these, 22.8 million (32%) are classified as likely pathogenic and 40.9 million (57%) as likely benign. The predictions are freely available as a resource for the international research community.
These authors from Google DeepMind previously developed and released AlphaFold, a revolutionary machine-learning approach that predicts 3-dimensional protein structures from 1-dimensional amino acid or gene sequence information, and which was highlighted in an earlier Yearbook chapter (https://www.espeyearbook.org/ey/0019/ey0019.15-15). Here, they build on AlphaFold by incorporating evolutionary constraint information to predict the pathogenicity of all 216 million possible single amino acid changes in the 19 233 canonical human proteins. They evaluated their predictions against diverse clinical benchmarks, including 18 924 annotated missense variants in ClinVar, and achieved an impressively high overall performance, with an area under the receiver operating characteristic curve (auROC) of 0.940.
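As a reminder of what an auROC measures, the sketch below computes it for a handful of scored variants. It is a minimal illustration, not the authors' evaluation code: the function name `auroc` and the toy labels and scores are invented here. The auROC is the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one, so 0.5 corresponds to chance and 1.0 to perfect separation.

```python
def auroc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    labels: 1 = pathogenic, 0 = benign.
    scores: predicted pathogenicity, higher = more likely pathogenic.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pathogenic variant ranked above benign one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.95, 0.80, 0.30, 0.40, 0.10, 0.05]
toy_auroc = auroc(toy_labels, toy_scores)
print(round(toy_auroc, 3))
```

In this toy set, eight of the nine pathogenic-benign pairs are ranked correctly, which gives an auROC of 8/9. The pairwise loop is quadratic in the number of variants, so a benchmark of ClinVar size would normally use a rank-based implementation instead.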
When we perform genetic sequencing in our patients, some will receive a confident clinical diagnosis because they carry a known pathogenic mutation that is listed in an agreed international database such as ClinVar. More typically, however, patients will have variants of uncertain significance. Indeed, the vast majority of missense variants detected on sequencing are of unknown clinical significance. In research studies, this creates an enormous burden of experimental functional assays to help determine which mutations are relevant. Even if such assays are performed, the cell-based functional readout may not correspond directly with the variant's clinical impact.
AlphaMissense should greatly accelerate the process of deciding which variants are relevant. Although it cannot (yet) be used directly in the context of clinical diagnosis, it will help in filtering which mutations to consider for functional validation and also for confirmatory analyses in large population-based biobanks.
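The filtering step mentioned above might look roughly like the following sketch. Everything in it is an assumption made for illustration: the `Variant` record, the gene and variant names, the scores, and the two cut-offs (`benign_max` and `pathogenic_min`), which are not stated in this commentary and should be checked against the published AlphaMissense resources before any real use.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str     # hypothetical label, e.g. "GENE2 p.R55W"
    score: float  # predicted pathogenicity, 0 (benign) to 1 (pathogenic)


def classify(score, benign_max=0.34, pathogenic_min=0.564):
    """Three-way call from a score; both cut-offs are assumed defaults."""
    if score < benign_max:
        return "likely_benign"
    if score > pathogenic_min:
        return "likely_pathogenic"
    return "ambiguous"


def prioritise(variants, **cutoffs):
    """Keep likely pathogenic variants, highest score first."""
    hits = [v for v in variants
            if classify(v.score, **cutoffs) == "likely_pathogenic"]
    return sorted(hits, key=lambda v: v.score, reverse=True)


# Invented example variants; not real genes or real predictions.
candidates = [
    Variant("GENE1 p.A10V", 0.12),
    Variant("GENE2 p.R55W", 0.91),
    Variant("GENE3 p.G7S", 0.45),
    Variant("GENE4 p.L3P", 0.70),
]
shortlist = prioritise(candidates)
print([v.name for v in shortlist])
```

Such a shortlist is only a way of ordering candidates for functional follow-up; it does not replace the assays themselves or a clinical interpretation.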