Viruses are doing mysterious things everywhere – AI could help researchers understand what they’re doing in the oceans and in your gut

Viruses are a mysterious and little-understood force in microbial ecosystems. Researchers know they can infect, kill, and manipulate human and bacterial cells in almost all environments, from the oceans to your gut. But scientists still don’t have a complete picture of how viruses affect their surrounding environments, in large part because of their extraordinary diversity and ability to evolve quickly.

Communities of microbes are difficult to study in the laboratory. Many microbes are hard to cultivate, and their natural environment has many more features influencing their success or failure than scientists can replicate in a laboratory.

So systems biologists like me often sequence all the DNA present in a sample – for example, a fecal sample from a patient – separate out the viral DNA sequences, then annotate, or write down notes on, the sections of the viral genome that encode proteins. These notes about the location, structure, and other characteristics of genes help researchers understand the roles viruses can play in the environment and identify different types of viruses. Researchers annotate viruses by matching viral sequences in a sample with previously annotated sequences available in public databases of viral genetic sequences.

However, scientists are identifying viral sequences in DNA collected from the environment at a rate that far exceeds our ability to annotate these genes. This means that researchers are publishing discoveries about viruses in microbial ecosystems using unacceptably small fractions of available data.

To improve researchers’ ability to study viruses around the world, my team and I developed a new approach to annotate viral sequences using artificial intelligence. By using protein language models, which are similar to large language models like ChatGPT but specific to proteins, we were able to classify previously unseen viral sequences. This opens the door for researchers to not only learn more about viruses, but also address biological questions that are difficult to answer with current techniques.

Annotating viruses with AI

Large language models use relationships between words in large sets of text data to provide potential answers to questions for which they have not been explicitly “taught” the answer. When you ask a chatbot “What is the capital of France?” for example, the model does not look up the answer in a table of capital cities. Instead, it uses its training on huge datasets of documents and information to infer the answer: “The capital of France is Paris.”

Similarly, protein language models are AI algorithms trained to recognize relationships between billions of protein sequences from environments around the world. Through this training, they may be able to infer something about the essence of viral proteins and their functions.

We wondered whether protein language models could answer this question: “Given all the annotated viral genetic sequences, what is the function of this new sequence?”

In our proof of concept, we trained neural networks on previously annotated viral protein sequences in pre-trained protein language models and then used them to predict the annotation of new viral protein sequences. Our approach allows us to investigate what the model is “seeing” in a specific viral sequence that leads to a specific annotation. This helps identify candidate proteins of interest based on their specific functions or the way their genome is organized, narrowing down the search space of vast datasets.
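As a rough illustration of that idea, here is a minimal Python sketch, not our actual pipeline. It assumes a hypothetical `embed` function standing in for a pre-trained protein language model (here it only counts amino-acid frequencies so the example runs without any extra software), and it trains a single-layer softmax classifier on a handful of invented sequences. The sequences, the labels, and all function names are made up for illustration and carry no biological meaning.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Stand-in for a pre-trained protein language model embedding.

    Assumption: a real pipeline would call a pre-trained model here. This toy
    version returns amino-acid frequencies so the sketch runs anywhere.
    """
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(weights, bias, x):
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


def train(sequences, labels, epochs=500, lr=1.0):
    """Fit a single-layer softmax classifier on the fixed embeddings."""
    classes = sorted(set(labels))
    weights = [[0.0] * len(AMINO_ACIDS) for _ in classes]
    bias = [0.0] * len(classes)
    data = [(embed(s), classes.index(l)) for s, l in zip(sequences, labels)]
    for _ in range(epochs):
        for x, y in data:
            probs = class_probabilities(weights, bias, x)
            for k in range(len(classes)):
                # Gradient of the cross-entropy loss for class k.
                grad = probs[k] - (1.0 if k == y else 0.0)
                bias[k] -= lr * grad
                for i, xi in enumerate(x):
                    weights[k][i] -= lr * grad * xi
    return classes, weights, bias


def predict(model, sequence):
    """Return the most likely annotation and its probability."""
    classes, weights, bias = model
    probs = class_probabilities(weights, bias, embed(sequence))
    best = max(range(len(classes)), key=lambda k: probs[k])
    return classes[best], probs[best]


# Invented "previously annotated" sequences; purely illustrative.
TRAIN_SEQUENCES = ["AGAGGAAGGA", "GAAGGAGAGG", "KRKKRRKRKK", "RKRRKKRKRR"]
TRAIN_LABELS = ["capsid", "capsid", "integrase", "integrase"]

model = train(TRAIN_SEQUENCES, TRAIN_LABELS)
label, confidence = predict(model, "GGAAGAKGAG")  # a "new" sequence
print(label, round(confidence, 3))
```

In a real setting, the embedding step would come from a protein language model trained on billions of sequences, which is what lets the classifier pick up on relationships that simple sequence matching can miss.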

Microscopic image of spherical bacteria colored in bright green

By identifying more distantly related viral gene functions, protein language models can complement current methods to provide new insights into microbiology. For example, my team and I were able to use our model to discover a previously unrecognized integrase – a type of protein that can move genetic information in and out of cells – in the globally abundant marine picocyanobacteria Prochlorococcus and Synechococcus. Notably, this integrase may be able to move genes in and out of these bacterial populations in the oceans and allow these microbes to better adapt to changing environments.

Our language model also identified a new viral capsid protein that is widespread in the global oceans. We produced the first picture of how its genes are arranged, showing that it can contain different sets of genes. We believe this indicates that the virus serves different functions in its environment.

These preliminary findings represent just two of the thousands of annotations our approach provided.

Analyzing the unknown

Most of the hundreds of thousands of newly discovered viruses remain unclassified. Many viral genetic sequences match protein families with no known function or have never been seen before. Our work shows that similar protein language models could help study the threat and promise of our planet’s many uncharacterized viruses.

Although our study focused on viruses in the global oceans, improved annotation of viral proteins is critical to better understanding the role viruses play in health and disease in the human body. We and other researchers have hypothesized that viral activity in the human gut microbiome might be altered when you are sick. This means that viruses may help identify stress in microbial communities.

However, our approach is also limited because it requires high-quality annotations. Researchers are developing new protein language models that incorporate other “tasks” as part of their training, particularly predicting protein structures to detect similar proteins, to make them more powerful.

Making all AI tools available via FAIR data principles – data that is findable, accessible, interoperable and reusable – can help researchers at large realize the potential of these new ways of annotating protein sequences, leading to discoveries that benefit human health.

This article was republished from The Conversation, an independent, nonprofit news organization that brings you trusted facts and analysis to help you understand our complex world. It was written by: Libusha Kelly, Albert Einstein College of Medicine

Libusha Kelly receives funding from the National Institutes of Health.


