• EA - Detecting Genetically Engineered Viruses With Metagenomic Sequencing by Jeff Kaufman

  • Jun 27 2024
  • Length: 15 mins
  • Podcast

  • Summary

  • Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Detecting Genetically Engineered Viruses With Metagenomic Sequencing, published by Jeff Kaufman on June 27, 2024 on The Effective Altruism Forum.

    This represents work from several people at the NAO. Thanks especially to Dan Rice for implementing the duplicate junction detection, and to @Will Bradshaw and @mike_mclaren for editorial feedback.

    Summary

    If someone were to intentionally cause a stealth pandemic today, one of the ways they might do it is by modifying an existing virus. Over the past few months we've been working on building a computational pipeline that could flag evidence of this kind of genetic engineering, and we now have an initial pipeline working end to end. When given 35B read pairs of wastewater sequencing data it raises 14 alerts for manual review, 13 of which are quickly dismissible false positives and one of which is a known genetically engineered sequence derived from HIV. While it's hard to get a good estimate before actually going and doing it, our best guess is that if this system were deployed at the scale of approximately $1.5M/y it could detect something genetically engineered that shed like SARS-CoV-2 before 0.2% of people had been infected.

    System Design

    The core of the system is based on two observations:

    • If someone has made substantial modifications to an existing virus then somewhere in the engineered genome there will be a series of bases that are a good match for the original genome followed by a series of bases that are a poor match for the original genome. We can look for sequencing reads that have this property and raise them for human review.

    • Chimeric reads can occur as an artifact of sequencing, which can lead to false positives. The chance that you would see multiple chimeras involving exactly the same junction by chance, however, is relatively low. By requiring 2x coverage of the junction we can remove almost all false positives, at the cost of requiring approximately twice as much sequencing.

    Translating these observations into sufficiently performant code that does not trigger alerts on common sequencing artifacts has taken some work, but we now have this running.

    While it would be valuable to release our detector so that others can evaluate it or apply it to their own sequencing reads, knowing the details of how we have applied this algorithm could allow someone to engineer sequences that it would not be able to detect. While we would like to build a detection system that can't be more readily bypassed once you know how it works, we're unfortunately not there yet.

    Evaluation

    We have evaluated the system in two ways: by measuring its performance on simulated genetically engineered genomes and by applying it to a real-world dataset collected by a partner lab.

    Simulation

    We chose a selection of 35 viruses that Virus Host DB categorizes as human-infecting viruses, with special attention to respiratory viruses:

    Disease                    Virus                       Genome Length
    AIDS                       HIV                         9,000
    Chickenpox and Shingles    Human alphaherpesvirus 3    100,000
    Chikungunya                Chikungunya virus           10,000
    Common cold                Human coronavirus 229E      30,000
    Common cold                Human coronavirus NL63      30,000
    Common cold                Human coronavirus OC43      30,000
    Common cold                Human rhinovirus NAT001     7,000
    Common cold                Rhinovirus A1               7,000
    Common cold                Rhinovirus B3               7,000
    Conjunctivitis             Human adenovirus 54         30,000
    COVID-19                   SARS-CoV-2                  30,000
    Ebola                      Ebola                       20,000
    Gastroenteritis            Astrovirus MLB1             6,000
    Influenza                  Influenza A Virus, H1N1     10,000
    Influenza                  Influenza A Virus, H2N2     10,000
    Influenza                  Influenza A Virus, H3N2     10,000
    Influenza                  Influenza A Virus, H7N9     10,000
    Influenza                  Influenza A Virus, H9N2     10,000
    Influenza                  Influenza C Virus           10,000
    Measles                    Measles morbillivirus       20,000
    MERS                       MERS Virus                  30,000
    Metapneumovirus infection  Human metapneumovirus       10,000
    Mononucleosis              Human herpesvirus 4 type 2  200,000
    MPox                       Monkeypox virus             200,000
    Mumps                      Mumps orthor...
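
    To make the duplicate-junction idea from System Design concrete, here is a minimal Python sketch. It is not the NAO's detector, whose details the post deliberately withholds. It only illustrates keeping a junction when at least two distinct reads support it. The `ChimeraCandidate` record, the `junctions_with_support` function, the 12-base flank comparison, and the example reads are all hypothetical names and values chosen for illustration. The sketch assumes an upstream step has already found reads whose first part matches a known genome well and whose remainder matches poorly.

```python
from collections import defaultdict
from typing import NamedTuple


class ChimeraCandidate(NamedTuple):
    """A read that matches a known genome up to a point, then stops matching."""
    read_id: str
    reference: str    # hypothetical name of the genome the good match aligns to
    breakpoint: int   # reference coordinate where the good match ends
    novel_flank: str  # bases right after the breakpoint (the poor match)


def junctions_with_support(candidates, min_support=2, flank_len=12):
    """Return junctions seen in at least `min_support` distinct reads.

    Two reads count as the same junction when they share the reference,
    the breakpoint, and the first `flank_len` bases of novel sequence.
    These matching criteria are an assumption made for this sketch.
    """
    supporters = defaultdict(set)
    for c in candidates:
        if len(c.novel_flank) < flank_len:
            continue  # too little novel sequence to compare junctions
        key = (c.reference, c.breakpoint, c.novel_flank[:flank_len])
        supporters[key].add(c.read_id)
    return {
        key: sorted(reads)
        for key, reads in supporters.items()
        if len(reads) >= min_support
    }


# Made-up example: r1 and r2 share a junction, r3 is a lone chimera.
candidates = [
    ChimeraCandidate("r1", "virusA", 1200, "ACGTTGCAAGGCTTAA"),
    ChimeraCandidate("r2", "virusA", 1200, "ACGTTGCAAGGCTTAACC"),
    ChimeraCandidate("r3", "virusA", 3310, "TTTTGGGGCCCCAAAA"),
]
flagged = junctions_with_support(candidates)
```

    In this toy input only the junction at position 1200 is kept, because the single read at position 3310 is treated as a likely sequencing artifact. A real pipeline would also need to handle sequencing errors near the junction and reads from either strand, which this sketch ignores.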
