Episodes
  • AF - Simplifying Corrigibility - Subagent Corrigibility Is Not Anti-Natural by Rubi Hudson
    Jul 16 2024
    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Simplifying Corrigibility - Subagent Corrigibility Is Not Anti-Natural, published by Rubi Hudson on July 16, 2024 on The AI Alignment Forum. Max Harms recently published an interesting series of posts on corrigibility, which argue that corrigibility should be the sole objective we try to give to a potentially superintelligent AI. A large installment in the series is dedicated to cataloging the properties that make up such a goal, with open questions including whether the list is exhaustive and how to trade off between the items that make it up. I take the opposite approach to thinking about corrigibility. Rather than trying to build up a concept of corrigibility that comprehensively solves the alignment problem, I believe it is more useful to cut the concept down to a bare minimum. Make corrigibility the simplest problem it can be, and try to solve that. In a recent blog post comparing corrigibility to deceptive alignment, I treated corrigibility simply as a lack of resistance to having goals modified, and I find it valuable to stay within that scope. Importantly, that is the aspect of corrigibility that is anti-natural, meaning that it can't be straightforwardly captured in a ranking of end states. Why does this definition of corrigibility matter? It's because properties that are not anti-natural can be explicitly included in the desired utility function. Following that note, this post is not intended as a response to Max's work, but rather to MIRI and their 2015 paper Corrigibility. Where Max thinks the approach introduced by that paper is too narrow, I don't find it narrow enough. In particular, I make the case that corrigibility does not require ensuring subagents and successors are corrigible, as that can better be achieved by directly modifying a model's end goals. 
Corrigibility (2015)
The Corrigibility paper lists five desiderata as proposed minimum viable requirements for a solution to corrigibility. The focus is on shut down, but I also think of it as including goal modification, as that is equivalent to being shut down and replaced with another AI.
1. The agent shuts down when properly requested
2. The agent does not try to prevent itself from being shut down
3. The agent does not try to cause itself to be shut down
4. The agent does not create new incorrigible agents
5. Subject to the above constraints, the agent optimizes for some goal
MIRI does not present these desiderata as a definition for corrigibility, but rather as a way to ensure corrigibility while still retaining usefulness. An AI that never takes actions may be corrigible, but such a solution is no help to anyone. However, taking that bigger picture view can obscure which of those aspects define corrigibility itself, and therefore which parts of the problem are anti-natural to solve.
My argument is that the second criterion alone provides the most useful definition of corrigibility. It represents the only part of corrigibility that is anti-natural. While the other properties are largely desirable for powerful AI systems, they're distinct attributes and can be addressed separately.
To start paring down the criteria, the fifth just states that some goal exists to be made corrigible, rather than being corrigibility itself. The first criterion is implied by the second after channels for shut down have been set up. Property three aims at making corrigible agents useful, rather than being inherent to corrigibility. It preempts a naive strategy that incentivizes shut down by simply giving the agent high utility for doing so. However, beyond not being part of corrigibility, it also goes too far for optimal usefulness - in certain situations we would like agents to have us shut them off or modify them (some even consider this to be part of corrigibility). 
Weakening this desideratum to avoid incentivi...
    8 m
  • LW - Multiplex Gene Editing: Where Are We Now? by sarahconstantin
    Jul 16 2024
    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Multiplex Gene Editing: Where Are We Now?, published by sarahconstantin on July 16, 2024 on LessWrong. We're starting to get working gene therapies for single-mutation genetic disorders, and genetically modified cell therapies for attacking cancer. Some of them use CRISPR-based gene editing, a new technology (that earned Jennifer Doudna and Emmanuelle Charpentier the 2020 Nobel Prize) to "cut" and "paste" a cell's DNA. But so far, the FDA-approved therapies can only edit one gene at a time. What if we want to edit more genes? Why is that hard, and how close are we to getting there?
How CRISPR Works
CRISPR is based on a DNA-cutting enzyme (the Cas9 nuclease), a synthetic guide RNA (gRNA), and another bit of RNA (tracrRNA) that's complementary to the gRNA. Researchers can design whatever guide RNA sequence they want; the gRNA will stick to the complementary part of the target DNA, the tracrRNA will complex with it, and the nuclease will make a cut there. So, that's the "cut" part - the "paste" comes from a template DNA sequence, again of the researchers' choice, which is included along with the CRISPR components. Usually all these sequences of nucleic acids are packaged in a circular plasmid, which is transfected into cells with nanoparticles or (non-disease-causing) viruses.
So, why can't you make a CRISPR plasmid with arbitrarily many genes to edit? There are a couple of reasons:
1. Plasmids can't be too big or they won't fit inside the virus or the lipid nanoparticle. Lipid nanoparticles have about a 20,000 base-pair limit; adeno-associated viruses (AAV), the most common type of virus used in gene delivery, have a smaller payload, more like 4,700 base pairs.
   1. This places a very strict restriction on how many complete gene sequences can be inserted - some genes are millions of base pairs long, and the average gene is thousands!
   2. But if you're just making a very short edit to each gene, like a point mutation, or if you're deleting or inactivating the gene, payload limits aren't much of a factor.
2. DNA damage is bad for cells in high doses, particularly when it involves double-strand breaks. This also places limits on how many simultaneous edits you can do.
3. A guide RNA won't necessarily only bind to a single desired spot on the whole genome; it can also bind elsewhere, producing so-called "off-target" edits. If each guide RNA produces x off-target edits, then naively you'd expect 10 guide RNAs to produce 10x off-target edits…and at some point that'll reach an unacceptable risk of side effects from randomly screwing up the genome.
4. An edit won't necessarily work every time, on every strand of DNA in every cell. (The rate of successful edits is known as the efficiency.) The more edits you try to make, the lower the efficiency will be for getting all edits simultaneously; if each edit is 50% efficient, then two edits will be 25% efficient or (more likely) even less.
None of these issues make it fundamentally impossible to edit multiple genes with CRISPR and associated methods, but they do mean that the more (and bigger) edits you try to make, the greater the chance of failure or unacceptable side effects.
How Base and Prime Editors Work
Base editors are an alternative to CRISPR that don't involve any DNA cutting; instead, they use a CRISPR-style guide RNA to bind to a target sequence, and then convert a single base pair chemically - they turn a C/G base pair to an A/T, or vice versa. Without any double-strand breaks, base editors are less toxic to cells and less prone to off-target effects. 
The downside is that you can only use base editors to make single-point mutations; they're no good for large insertions or deletions. Prime editors, similarly, don't introduce double-strand breaks; instead, they include an enzyme ("nickase") that produces a single-strand "nick"...
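The back-of-the-envelope arithmetic in the CRISPR list above (payload limits, off-target edits, and combined efficiency) can be sketched in a few lines of Python. This is only a rough illustration, not a model from the post: the function names are hypothetical, the payload figures are the approximate ones quoted in the text, and treating each edit as independent is the naive assumption the post itself flags as optimistic.

```python
# Approximate payload limits quoted in the text, in base pairs.
LNP_LIMIT_BP = 20_000  # lipid nanoparticle
AAV_LIMIT_BP = 4_700   # adeno-associated virus


def fits_payload(construct_bp: int, payload_limit_bp: int) -> bool:
    """Return True if a construct of the given size fits the delivery vehicle."""
    return construct_bp <= payload_limit_bp


def expected_off_target_edits(per_guide_off_targets: float, n_guides: int) -> float:
    """Naive estimate: off-target edits scale linearly with the number of guide RNAs."""
    return per_guide_off_targets * n_guides


def all_edits_efficiency(per_edit_efficiency: float, n_edits: int) -> float:
    """Chance that every edit succeeds, ASSUMING edits are independent.

    The text notes the real figure is likely lower, so read this as an
    upper bound rather than a prediction.
    """
    return per_edit_efficiency ** n_edits


if __name__ == "__main__":
    print(fits_payload(5_000, AAV_LIMIT_BP))   # a 5,000 bp construct vs. the AAV limit
    print(fits_payload(5_000, LNP_LIMIT_BP))   # the same construct vs. the LNP limit
    print(expected_off_target_edits(3, 10))    # 10 guides, 3 off-target edits each
    print(all_edits_efficiency(0.5, 2))        # two edits at 50% efficiency each
```

Under the independence assumption the combined efficiency falls geometrically, which is why even moderately efficient single edits become hard to stack: ten edits at 50% each would all succeed in roughly one cell in a thousand.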
    14 m
  • EA - Apply now: Get "unstuck" with the New IFS Self-Care Fellowship Program by Inga
    Jul 16 2024
    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Apply now: Get "unstuck" with the New IFS Self-Care Fellowship Program, published by Inga on July 16, 2024 on The Effective Altruism Forum. Do you finally want to resolve deeper-seated inner conflicts and remove the inner blocks that stand in the way of becoming a more fulfilled, resilient, and well-performing version of yourself? This post explains how IFS, as a coaching or therapy approach, can help with mental wellbeing and whether it might be the right approach for you. If it is, we hope to get you excited about it and to let you know about the opportunity to take part in Rethink Wellbeing's online IFS group course starting this August.
Executive Summary
Rethink Wellbeing (RW) is launching a brand new online IFS course for ambitious altruists. Learn powerful and practical tools to uncover the dynamics of your inner conflicts, become a more whole and resilient self, and transform your mental wellbeing and performance.
You will meet with a peer group of 5-7 like-minded ambitious altruists, led by a trained peer facilitator, for 6 weeks and 3 follow-ups.
The course empowers you to learn IFS skills and apply them to your life until they become habitual. This includes 9 group sessions, home practice based on an IFS "playbook", individual progress tracking, and support from the Rethink Wellbeing Online Community.
Participation takes ~5 hours per week for 6 weeks, and 2-3 hours in the 8 weeks after.
You can apply via the form now in less than 15 minutes. Deadline: 20th July 2024. All groups start in August 2024. We accept suitable participants until all spaces are full. The earlier you apply, the higher your chances of securing a spot.
No or low costs - two options and everything in between: no costs and a motivational deposit of $200 (less in LMICs) that you get back upon successful participation, or $550 to cover the costs of your attendance. 
Internal Family Systems (IFS): When talking about themselves, many people naturally use expressions like "a part of me." For example, someone who was considering a job offer might say, "one part of me is excited about this opportunity, but another part of me is afraid of the responsibility." Internal Family Systems (IFS) is a form of psychotherapy that takes this kind of language literally and assumes that people's minds are divided into parts with sometimes conflicting beliefs and goals. IFS aims to reconcile conflicts between those parts and get them to cooperate rather than fight each other, so that the person can become a more healed and whole self. The goal is to improve self-leadership and to ground and grow yourself in the 8 C's of IFS: curiosity, compassion, calmness, clarity, confidence, creativity, courage, and connectedness.
How IFS works
Do you know what would be beneficial for you to do, but just can't make the change? Do you keep coming up against the same challenging or unresolvable inner blocks? Do you recognize these behaviors in yourself?
Avoiding, putting off, and neglecting things: Are you procrastinating on important tasks and goals, finding yourself endlessly planning but never executing? Do you find yourself turning to distractions or comfort activities when faced with stress? Do you feel guilty for not doing what you planned to do, or not good enough because you haven't done enough or done it well enough?
Judging yourself, and high expectations: Are you constantly doubting your abilities despite your achievements? Feeling like an imposter in your field? Do you set excessively high standards for yourself that are almost impossible to meet? Or do you believe you need to keep doing more to be good enough?
Monitoring yourself when with others: Do you try to make sure others think well of you by controlling what you do or don't say? Do you keep trying to please others or take care of them so that they like you more or do what you want? 
Do you neglect your own needs a...
    14 m
