• LW - Towards more cooperative AI safety strategies by Richard Ngo

  • Jul 16 2024
  • Duration: 6 min
  • Podcast


  • Summary

  • Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards more cooperative AI safety strategies, published by Richard Ngo on July 16, 2024 on LessWrong.

This post is written in a spirit of constructive criticism. It's phrased fairly abstractly, in part because it's a sensitive topic, but I welcome critiques and comments below. The post is structured in terms of three claims about the strategic dynamics of AI safety efforts; my main intention is to raise awareness of these dynamics, rather than advocate for any particular response to them.

Claim 1: The AI safety community is structurally power-seeking.

By "structurally power-seeking" I mean: tends to take actions which significantly increase its power. This does not imply that people in the AI safety community are selfish or power-hungry; or even that these strategies are misguided. Taking the right actions for the right reasons often involves accumulating some amount of power. However, from the perspective of an external observer, it's difficult to know how much to trust stated motivations, especially when they often lead to the same outcomes as self-interested power-seeking.

Some prominent examples of structural power-seeking include:

  • Trying to raise a lot of money.
  • Trying to gain influence within governments, corporations, etc.
  • Trying to control the ways in which AI values are shaped.
  • Favoring people who are concerned about AI risk for jobs and grants.
  • Trying to ensure non-release of information (e.g. research, model weights, etc).
  • Trying to recruit (high school and college) students.

To be clear, you can't get anything done without being structurally power-seeking to some extent. However, I do think that the AI safety community is more structurally power-seeking than other analogous communities (such as most other advocacy groups). Some reasons for this disparity include:

1. The AI safety community is more consequentialist and more focused on effectiveness than most other communities. When reasoning on a top-down basis, seeking power is an obvious strategy for achieving one's desired consequences (but can be aversive to deontologists or virtue ethicists).

2. The AI safety community feels a stronger sense of urgency and responsibility than most other communities. Many in the community believe that the rest of the world won't take action until it's too late; and that it's necessary to have a centralized plan.

3. The AI safety community is more focused on elites with homogeneous motivations than most other communities. In part this is because it's newer than (e.g.) the environmentalist movement; in part it's because the risks involved are more abstract; in part it's a founder effect.

Again, these are intended as descriptions rather than judgments. Traits like urgency, consequentialism, etc, are often appropriate. But the fact that the AI safety community is structurally power-seeking to an unusual degree makes it important to grapple with another point:

Claim 2: The world has strong defense mechanisms against (structural) power-seeking.

In general, we should think of the wider world as being very cautious about perceived attempts to gain power; and we should expect that such attempts will often encounter backlash. In the context of AI safety, some types of backlash have included:

1. Strong public criticism of not releasing models publicly.

2. Strong public criticism of centralized funding (e.g. billionaire philanthropy).

3. Various journalism campaigns taking a "conspiratorial" angle on AI safety.

4. Strong criticism from the FATE community about "whose values" AIs will be aligned to.

5. The development of an accelerationist movement focused on open-source AI.

These defense mechanisms often apply regardless of stated motivations. That is, even if there are good arguments for a particular policy, people will often look at the net effect on overall power balance when ...
