A new AI model is opening the black box of the leading artificial intelligence tool for predicting how proteins will interact with small molecules, such as drugs.
The model, OpenFold3, which launched October 28, is a reconstruction of Google DeepMind’s AlphaFold3. A large consortium of researchers led by Mohammed AlQuraishi at Columbia University painstakingly dissected AlphaFold3’s code and created a facsimile of the AI platform, which predicts the structure of proteins paired with other molecules, including nucleic acids and chemicals in drugs. AlphaFold3 can only be used by individuals, non-commercial organizations or journalists. But companies — and anyone else — can use the open-source OpenFold3 model for commercial purposes, including drug development.
Predicting protein-molecule pairings is important in designing drugs “because this is how biology works. Biology is not proteins in isolation. It’s biomolecules interacting with each other,” says Woody Sherman, founder and chief innovation officer at Boston-based Psivant Therapeutics. Sherman also chairs the OpenFold executive committee.
Proteins are some of the hardest working molecules in the body. How these workhorses perform depends largely on their shape. AlphaFold2 cracked the problem of predicting what shapes proteins will adopt. The team behind the AI model shared in the 2024 Nobel Prize in chemistry for the achievement. AlphaFold3 introduced interactions with other proteins and molecules to the mix.
But unlike with AlphaFold2, DeepMind didn’t initially open the AlphaFold3 code for other researchers to explore, at least not until hundreds of scientists signed a petition calling for transparency. “It’s hard to evaluate a computational product without seeing the raw information,” says Stephanie Wankowicz, a computational structural biologist at Vanderbilt University who coauthored the petition. Other researchers need the code to test the accuracy and reliability of the predictions and to learn what other data are necessary to make the model better, Wankowicz says.
Re-creating AlphaFold2 gave OpenFold creators insight into how the AI works, she says. AlphaFold2 was billed as an AI model that learns how proteins fold based on their amino acid building blocks, but it actually memorizes protein structures it has seen before and uses those memories to predict how similar proteins may appear, Wankowicz says. Looking under AlphaFold3’s hood may yield similar insights into protein-drug pairs.
Other teams have tried to reproduce AlphaFold3 and “have gotten close, but not super precise,” Wankowicz says.
That’s because it is difficult to reproduce subtle tricks and tweaks that are in the AlphaFold3 creators’ heads but don’t appear in the code or supplemental information, Sherman says. Some are technical settings used for certain parts of the calculation. “Nobody’s specifying that,” he says. “But details matter, especially when you’re dealing with the large models and with lots of data.” The OpenFold3 team did its best to replicate AlphaFold3, he says, but some differences remain.
Biology also matters, Sherman says. In cells, proteins are bathed in water and ions. They vibrate and move. None of that is captured in the static images created by AI models or by lab-made snapshots of crystallized proteins. The OpenFold3 team hopes to add water and dynamic movement into its model to better reflect how proteins exist in nature, Sherman says.
Even before its official release, OpenFold3 was embraced by pharmaceutical companies. Five companies banded together in the Federated OpenFold3 Initiative to train the AI model on proprietary data and build a more powerful prediction tool while still keeping company secrets. That partnership was announced October 1 by Apheris, a Berlin-based company that runs the group platform.
Only about 2 percent of the protein structures in publicly available databases on which AlphaFold3 and OpenFold3 were trained are paired with molecules that have druglike properties, says Robin Röhm, cofounder and chief executive of Apheris. Drug companies have thousands of such structures in their databases.
Each company in the federation will train a version of OpenFold3 on about 4,000 to 8,000 protein-drug pairs in its own library, Röhm says. Apheris aggregates those locally trained AIs into a centralized version that captures how about 20,000 proteins and drugs interact but doesn’t contain the proprietary data. The global version then goes back to each company for another round of training, and so on.
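For readers curious about the mechanics, the train-aggregate-repeat loop Röhm describes can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the article does not say how Apheris combines the local models, so the sketch swaps in plain federated averaging (a size-weighted mean of model weights), and a toy linear model stands in for OpenFold3. The function names `local_train`, `aggregate` and `federated_rounds` are hypothetical.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Example = Tuple[List[float], float]  # (features, target); a stand-in for a protein-drug pair


def local_train(global_w: Weights, data: Sequence[Example],
                lr: float = 0.05, epochs: int = 20) -> Weights:
    """Fine-tune a copy of the shared weights on one company's private data.

    Assumption: a toy linear model trained by stochastic gradient descent.
    """
    w = list(global_w)
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate(local_ws: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """Combine local models by a dataset-size-weighted average of their weights.

    Assumption: federated averaging; the real aggregation rule is not given.
    Only weights are passed in here, never the training examples.
    """
    total = sum(sizes)
    return [
        sum(w[i] * n for w, n in zip(local_ws, sizes)) / total
        for i in range(len(local_ws[0]))
    ]


def federated_rounds(global_w: Weights, silos: Sequence[Sequence[Example]],
                     rounds: int = 5) -> Weights:
    """Repeat: send the global model out, train locally, average, send back."""
    for _ in range(rounds):
        local_ws = [local_train(global_w, data) for data in silos]
        global_w = aggregate(local_ws, [len(d) for d in silos])
    return global_w


if __name__ == "__main__":
    # Two hypothetical companies whose private data follow y = 2*x0 + 1*x1.
    silo_a = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0)]
    silo_b = [([1.0, 1.0], 3.0), ([2.0, 0.0], 4.0)]
    print(federated_rounds([0.0, 0.0], [silo_a, silo_b], rounds=10))
```

In this toy setup, the shared weights drift toward the pattern common to both data sets, even though neither company's examples ever reach the aggregation step. That mirrors the article's point that the centralized model gains knowledge without containing the proprietary data.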
Despite the expanded datasets, don’t expect dramatic changes yet in drug discovery, Sherman says. OpenFold3 “is a starting point,” he says. “It’s going to be the next stage, and the next stage and the next stage that are where we’re really going to start seeing that meaningful impact on drug discovery.”



