This is how the protein At1g58602 could be folded. (Screenshot: AlphaFold)
The structure of proteins, the elementary building blocks of every living being, has so far been a mystery to us humans. When cells make proteins, they string countless amino acids into a long chain that eventually folds in on itself. This folding is different for each protein, and it determines how the protein works in a cell, whether it is benign or harmful, and whether the cell functions properly or not.
Decoding protein folding has long been considered one of the greatest challenges in biology. So far, it has only been possible to model and track the folding of individual proteins through extremely laborious, detailed work. But last year it was, of all things, the AI startup DeepMind, which like Google belongs to the Alphabet Group, that celebrated a breakthrough, and it now wants to share that breakthrough with the world.
In its AlphaFold project, DeepMind has spent recent years using artificial intelligence based on machine learning to predict a protein's structure from its amino acid sequence. Last November, the startup presented its collected results for the first time, and they exceeded even the wildest expectations: in a joint experiment, AlphaFold predicted the structure of 70 of the 100 protein sequences presented as precisely as had previously only been possible through experiments. On the score that measures the agreement between the calculated and the actual structure, DeepMind reached an average of over 90, with 100 being the maximum. In other words: the AI came damn close to the actual folding.
After the genome comes the proteome
In the meantime, those responsible believe they have predicted the folding of about 98.5 percent of the human proteome, i.e. the entirety of all proteins in the organism, so precisely that the results are of concrete medical and scientific use. Researchers no longer have to start from scratch for every protein, but can use the predicted folding as a basis.
The startup is now making the entire protein database it has built in the meantime available to the world as open source. It includes almost all of the 20,000 proteins expressed by humans, as well as the structures of around 20 other organisms such as yeast and the bacterium E. coli. In total, the database contains around 350,000 proteins, making it one of the largest and probably the best that has ever existed. There is also a paper detailing the underlying AI method.
DeepMind, which is running the project together with the European Molecular Biology Laboratory, hopes the database will help researchers analyze how life works at the atomic level. They are meant to find out how diseases are triggered, how drugs can be personalized, and how “green enzymes” might one day break down plastic.
“Structural biologists are not used to being able to look up everything in seconds, as it has taken them years to determine these things experimentally,” says DeepMind CEO Demis Hassabis in an interview with TechCrunch. “And I think that should lead to completely new approaches to solving open questions and to new experiments.”

