
Scientists Proposed OntoProtein For Protein Models

Self-supervised protein language models have proven effective at learning protein representations. With increased computing capacity, protein language models pre-trained on millions of sequences can scale from millions to billions of parameters and deliver substantial improvements. However, these prevalent techniques seldom explore incorporating knowledge graphs, which can supply rich, structured facts for better protein representations. Knowledge graphs that encode useful biological information may help represent proteins better when combined with data from other sources.

OntoProtein was proposed by researchers from Zhejiang University in China, led by Ningyu Zhang and Zhen Bi. It is the first general framework to use the structure of Gene Ontology (GO) in protein pre-training models. The researchers built a new large-scale knowledge graph consisting of GO terms and their related proteins, in which every node is described by either gene annotation text or a protein sequence. They proposed a novel contrastive learning method with knowledge-aware negative sampling that jointly optimizes the knowledge graph and protein embeddings during pre-training. According to the researchers, OntoProtein can outperform existing methods based on pre-trained protein language models, both in predicting protein-protein interactions and in predicting protein function.
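The kind of graph described above, where each node carries either annotation text or a protein sequence, can be sketched as a small data structure. This is a minimal illustration and not the researchers' code: the `Node`, `Triple`, and `build_graph` names, the `PROT_A` protein, its sequence, and the `enables` relation label are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str      # "protein" or "go_term"
    content: str   # amino-acid sequence for proteins, annotation text for GO terms


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


def build_graph(nodes, triples):
    """Index nodes by id and collect outgoing (relation, tail) edges per head."""
    index = {node.node_id: node for node in nodes}
    outgoing = defaultdict(list)
    for triple in triples:
        if triple.head not in index or triple.tail not in index:
            raise KeyError(f"unknown node in {triple}")
        outgoing[triple.head].append((triple.relation, triple.tail))
    return index, dict(outgoing)


# Illustrative data only: PROT_A and its sequence are hypothetical.
nodes = [
    Node("PROT_A", "protein", "MKTAYIAKQRQISFVKSHFSRQ"),
    Node("GO:0005515", "go_term", "protein binding"),
    Node("GO:0005488", "go_term", "binding"),
]
triples = [
    Triple("PROT_A", "enables", "GO:0005515"),     # protein-to-GO annotation edge
    Triple("GO:0005515", "is_a", "GO:0005488"),    # GO-to-GO hierarchy edge
]
index, outgoing = build_graph(nodes, triples)
```

Keeping the text or sequence on the node itself means a text encoder or a protein encoder can be chosen per node from the `kind` field.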

OntoProtein

OntoProtein is the first general framework to incorporate external knowledge graphs into protein pre-training. It is a protein pre-training model with Gene Ontology embedding. The researchers proposed a hybrid encoder that represents both English text and protein sequences, together with contrastive learning using knowledge-aware negative sampling to jointly optimize the knowledge graph and protein sequence embeddings during pre-training. For knowledge embedding, they encoded the node descriptions (GO annotations) as the corresponding entity embeddings. Their use of Gene Ontology covers its three aspects (molecular function, cellular component, and biological process), and they developed a knowledge-aware negative sampling strategy for the knowledge embedding objective. Through masked language modeling, OntoProtein inherits the strong protein understanding of protein language models. Through the knowledge embedding objective, it can also integrate biological knowledge into its protein representations under supervision from the knowledge graph. This objective is task-agnostic: the model structure can be changed and new training objectives added to adapt OntoProtein to different kinds of downstream tasks.
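One plausible reading of knowledge-aware negative sampling is that a true triple is corrupted by swapping its GO term for another term from the same aspect, so the negatives are hard rather than trivially wrong. The sketch below pairs that sampler with a contrastive loss over a TransE-style distance. The distance function, the margin, the loss form, and every name and identifier here are assumptions for illustration; the article does not specify them.

```python
import math
import random


def sample_negative_tails(triple, aspect_of, known_triples, k, rng):
    """Corrupt the tail with same-aspect GO terms, skipping known true triples."""
    head, relation, tail = triple
    candidates = sorted(
        term
        for term, aspect in aspect_of.items()
        if aspect == aspect_of[tail]
        and term != tail
        and (head, relation, term) not in known_triples
    )
    return rng.sample(candidates, min(k, len(candidates)))


def transe_distance(h, r, t):
    """Euclidean norm of h + r - t (an assumed scoring function)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def knowledge_embedding_loss(h, r, t, negative_tails, margin=1.0):
    """Pull the true tail within the margin and push sampled negatives beyond it."""
    if not negative_tails:
        raise ValueError("at least one negative sample is required")
    positive = -log_sigmoid(margin - transe_distance(h, r, t))
    negative = -sum(
        log_sigmoid(transe_distance(h, r, n) - margin) for n in negative_tails
    ) / len(negative_tails)
    return positive + negative


# Hypothetical GO term ids, grouped by aspect.
aspect_of = {
    "GO:mf_1": "molecular_function",
    "GO:mf_2": "molecular_function",
    "GO:mf_3": "molecular_function",
    "GO:cc_1": "cellular_component",
    "GO:bp_1": "biological_process",
}
known = {("PROT_A", "enables", "GO:mf_1"), ("PROT_A", "enables", "GO:mf_2")}
negatives = sample_negative_tails(
    ("PROT_A", "enables", "GO:mf_1"), aspect_of, known, k=2, rng=random.Random(0)
)
```

In this toy setup only `GO:mf_3` qualifies as a negative: it shares the aspect of the true tail and is not already a known annotation of `PROT_A`.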

Illustration of protein molecule

Mask Protein Modeling And Knowledge Embedding

To build OntoProtein, the researchers combined a masked protein modeling objective with the knowledge embedding objective. They generated a new knowledge graph dataset by combining Gene Ontology with publicly annotated proteins, used it to pre-train the model, and then evaluated the model on a range of downstream tasks. The Tasks Assessing Protein Embeddings (TAPE) benchmark was used to assess protein representation learning. TAPE covers three kinds of tasks: structure, evolutionary, and protein engineering. To evaluate OntoProtein, they chose six representative datasets, including secondary structure (SS) and contact prediction. Protein-protein interactions (PPI) are high-specificity physical contacts formed between two or more protein molecules. PPI prediction is treated as a sequence classification task and is evaluated on three datasets of different sizes.
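Masked protein modeling can be illustrated by hiding a fraction of the residues in a sequence and keeping the hidden residues as prediction targets. This is a minimal sketch: the 15% mask rate and the `<mask>` token follow the common BERT convention and are assumptions, as are the function name and the example sequence.

```python
import random

MASK_TOKEN = "<mask>"  # assumed token name, following BERT-style conventions


def mask_sequence(sequence, mask_rate, rng):
    """Replace a random subset of residues with MASK_TOKEN.

    Returns the masked token list and a {position: original residue} dict
    holding the targets the model would be trained to recover.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    tokens = list(sequence)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    labels = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK_TOKEN
    return tokens, labels


# Hypothetical 20-residue sequence; with a 15% rate, 3 positions are masked.
sequence = "MKTAYIAKQRQISFVKSHFS"
masked, labels = mask_sequence(sequence, mask_rate=0.15, rng=random.Random(0))
```

During pre-training, a loss computed only at the masked positions would be combined with the knowledge embedding loss; how the two are weighted is not stated in the article.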

OntoProtein performs strongly against the baseline models on several of these tasks. It surpasses TAPE Transformer and ProtBert in secondary structure and contact prediction, which suggests that it benefits from the biological knowledge graph used in pre-training. It also shows its ability on fluorescence prediction. On the other hand, OntoProtein does not do as well on homology and stability prediction, which are sequence-level tasks. This is most likely owing to the absence of sequence-level objectives in pre-training. The proposed method can be viewed as joint pre-training on human language and protein (the language of life). The research aims to decipher the language of life by injecting knowledge from gene annotations into protein representations.

Conclusion

The researchers are the first to incorporate external factual knowledge from Gene Ontology into protein models. OntoProtein (protein pre-training with Gene Ontology embedding) is the first general framework to integrate external knowledge graphs into protein pre-training. Experimental results on widely used protein tasks show that effective knowledge injection helps in understanding and uncovering the language of life. Furthermore, OntoProtein is compatible with the model parameters of many pre-trained protein language models, so users can apply existing pre-trained parameters to OntoProtein without changing the architecture. The promising results point to future work on improving OntoProtein by injecting more informative knowledge through Gene Ontology selection and on extending the approach to sequence generation tasks for protein design.
