EA - Detecting Genetically Engineered Viruses With Metagenomic Sequencing by Jeff Kaufman

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Detecting Genetically Engineered Viruses With Metagenomic Sequencing, published by Jeff Kaufman on June 27, 2024 on The Effective Altruism Forum.
This represents work from several people at the NAO. Thanks especially to Dan Rice for implementing the duplicate junction detection, and to @Will Bradshaw and @mike_mclaren for editorial feedback.
Summary
If someone were to intentionally cause a stealth pandemic today, one of the ways they might do it is by modifying an existing virus. Over the past few months we've been working on building a computational pipeline that could flag evidence of this kind of genetic engineering, and we now have an initial pipeline working end to end.
When given 35B read pairs of wastewater sequencing data it raises 14 alerts for manual review: 13 are quickly dismissible false positives, and one is a known genetically engineered sequence derived from HIV. While it's hard to get a good estimate before actually going and doing it, our best guess is that if this system were deployed at a scale of approximately $1.5M per year it could detect something genetically engineered that shed like SARS-CoV-2 before 0.2% of people had been infected.
System Design
The core of the system is based on two observations:
If someone has made substantial modifications to an existing virus then somewhere in the engineered genome there will be a series of bases that are a good match for the original genome followed by a series of bases that are a poor match for the original genome. We can look for sequencing reads that have this property and raise them for human review.
Chimeric reads can occur as an artifact of sequencing, which can lead to false positives. The probability that multiple independent chimeras would involve exactly the same junction, however, is relatively low. By requiring 2x coverage of the junction we can remove almost all false positives, at the cost of requiring approximately twice as much sequencing.
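To make the two observations concrete, here is a toy sketch in Python. It is not the NAO's detector, whose details are deliberately unpublished. Every name in it (`find_junction`, `alerts`, `K`, `min_anchor`, `max_shared`, `min_support`) and every threshold is a hypothetical choice for illustration. It assumes exact string matching, forward-strand reads only, and junctions that run from reference sequence into novel sequence. Real reads contain errors and would need a proper aligner.

```python
from collections import defaultdict

K = 8  # toy k-mer size (assumption); real data would need longer seeds


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_junction(read, reference, ref_kmers, min_anchor=12, max_shared=0.5):
    """Return a junction key if the read matches the reference and then diverges.

    The key is (reference position where the match ends, first K novel bases),
    so reads crossing the same junction at different offsets share a key.
    """
    anchor_len, anchor_pos = 0, -1
    # Grow the prefix until it no longer occurs exactly in the reference.
    for n in range(min_anchor, len(read) + 1):
        pos = reference.find(read[:n])
        if pos < 0:
            break
        anchor_len, anchor_pos = n, pos
    if anchor_len == 0:
        return None  # no good match to the reference at the start of the read
    tail = read[anchor_len:]
    tail_kmers = kmers(tail)
    if not tail_kmers:
        return None  # read matches throughout, or the tail is too short to judge
    shared = len(tail_kmers & ref_kmers) / len(tail_kmers)
    if shared >= max_shared:
        return None  # tail still resembles the reference, e.g. a point mutation
    return (anchor_pos + anchor_len, tail[:K])


def alerts(reads, reference, min_support=2):
    """Keep only junctions supported by at least min_support distinct reads."""
    ref_kmers = kmers(reference)
    support = defaultdict(set)
    for read in reads:
        key = find_junction(read, reference, ref_kmers)
        if key is not None:
            support[key].add(read)
    return {key: rs for key, rs in support.items() if len(rs) >= min_support}


# Made-up sequences: the reference contains no G, the insert is G-rich.
REFERENCE = "ATACTATCATTACATCTAA" + "CCCC" + "TATCATACTTACATATTCAATCTATACA"
INSERT = "GGTGGAGGCGGTGGAGGCGG"
JUNCTION = 30
engineered = REFERENCE[:JUNCTION] + INSERT + REFERENCE[JUNCTION:]

reads = [
    engineered[14:42],                          # crosses the engineered junction
    engineered[18:40],                          # same junction, different offset
    REFERENCE[30:45] + "GCGGAGGTGGCG",          # one-off chimera (artifact-like)
    REFERENCE[2:30],                            # ordinary read, matches throughout
    REFERENCE[10:24] + "G" + REFERENCE[25:40],  # single-base variant
]

result = alerts(reads, REFERENCE)
print(result)
```

In this example the one-off chimera is flagged as a candidate by `find_junction` but dropped by `alerts`, because only one read supports its junction. The two reads that cross the engineered junction share a key and so survive the filter. Counting distinct reads rather than raw reads is a design choice of this sketch, meant to keep exact duplicates from counting as independent support.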
Translating these observations into sufficiently performant code that does not trigger alerts on common sequencing artifacts has taken some work, but we now have this running.
While it would be valuable to release our detector so that others can evaluate it or apply it to their own sequencing reads, knowing the details of how we have applied this algorithm could allow someone to engineer sequences that it would not be able to detect. While we would like to build a detection system that can't be more readily bypassed once you know how it works, we're unfortunately not there yet.
Evaluation
We have evaluated the system in two ways: by measuring its performance on simulated genetically engineered genomes and by applying it to a real-world dataset collected by a partner lab.
Simulation
We chose a selection of 35 viruses that Virus Host DB categorizes as human-infecting viruses, with special attention to respiratory viruses:
| Disease | Virus | Genome Length (bases) |
| --- | --- | --- |
| AIDS | HIV | 9,000 |
| Chickenpox and Shingles | Human alphaherpesvirus 3 | 100,000 |
| Chikungunya | Chikungunya virus | 10,000 |
| Common cold | Human coronavirus 229E | 30,000 |
| Common cold | Human coronavirus NL63 | 30,000 |
| Common cold | Human coronavirus OC43 | 30,000 |
| Common cold | Human rhinovirus NAT001 | 7,000 |
| Common cold | Rhinovirus A1 | 7,000 |
| Common cold | Rhinovirus B3 | 7,000 |
| Conjunctivitis | Human adenovirus 54 | 30,000 |
| COVID-19 | SARS-CoV-2 | 30,000 |
| Ebola | Ebola | 20,000 |
| Gastroenteritis | Astrovirus MLB1 | 6,000 |
| Influenza | Influenza A Virus, H1N1 | 10,000 |
| Influenza | Influenza A Virus, H2N2 | 10,000 |
| Influenza | Influenza A Virus, H3N2 | 10,000 |
| Influenza | Influenza A Virus, H7N9 | 10,000 |
| Influenza | Influenza A Virus, H9N2 | 10,000 |
| Influenza | Influenza C Virus | 10,000 |
| Measles | Measles morbillivirus | 20,000 |
| MERS | MERS Virus | 30,000 |
| Metapneumovirus infection | Human metapneumovirus | 10,000 |
| Mononucleosis | Human herpesvirus 4 type 2 | 200,000 |
| MPox | Monkeypox virus | 200,000 |
| Mumps | Mumps orthor... | |