Ex Anter

Ex ante thoughts from the midst of history.

Rogue AGI Embodies Valuable Intellectual Property
04 Jun 2021 | 1082 words
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This post was written by Mark Xu based on interviews with Carl Shulman. It was paid for by Open Philanthropy but is not representative of their views. Summary: Rogue AGI has access to its embodied IP. This...
An Intuitive Guide to Garrabrant Induction
03 Jun 2021 | 7381 words
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This post is a high-level summary of the core insights and arguments in Logical Induction, a MIRI paper from 2016. It’s intended for people without much mathematical training. Numbers in [brackets] indicate the section of the paper...
Intermittent Distillations #3
15 May 2021 | 3406 words
Mundane solutions to exotic problems (Paul Christiano). Summary: Thinking about AI safety often leads to considering exotic problems: models purposefully altering their gradients, agents hiding their capabilities to defect when an opportunity arises, or humans being vulnerable to side-channel attacks. These exotic problems might seem...
Lumenator Recipe
12 May 2021 | 109 words
Let there be light. My current lumenator consists of three copies of the following: 2x LED Pure White CRI 95 Corn Bulb; 1x LED Warm White CRI 95 Corn Bulb; 1x Adesso Floor Lamp. My previous lumenator consisted of: 3x 16 bulb light string; 32x 5000k CRI 90 LED; 16x...
Pre-Training + Fine-Tuning Favors Deception
08 May 2021 | 962 words
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Thanks to Evan Hubinger for helpful comments and discussion. Currently, to obtain models useful for some task X, models are pre-trained on some task Y, then fine-tuned on task X. For example, to obtain a model that...
Less Realistic Tales of Doom
06 May 2021 | 1263 words
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Realistic tales of doom must weave together many political, technical, and economic considerations into a single story. Such tales provide concrete projections but omit discussion of less probable paths to doom. To rectify this, here are some...
Making Markets Over Beliefs
30 Apr 2021 | 949 words
Betting is an excellent way to improve the accuracy of your beliefs. For example: Me: “I think Alice is 5’11.” Friend: “I’ll bet you $5 1:1 that it’s higher.” Translated, my friend means they’re willing to agree to a deal where I pay them $5 if Alice is taller than...
Agents Over Cartesian World Models
26 Apr 2021 | 8610 words
Coauthored with Evan Hubinger. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Thanks to Adam Shimi, Alex Turner, Noa Nabeshima, Neel Nanda, Sydney Von Arx, Jack Ryan, and Sidney Hough for helpful discussion and comments. Abstract: We analyze agents by supposing a Cartesian boundary between...
Intermittent Distillations #2
13 Apr 2021 | 2862 words
Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making (Andrew Critch and Stuart Russell). Summary: A policy (over some partially observable Markov decision process (POMDP)) is Pareto optimal with respect to two agents with different utility functions if it...
Meta-EA Needs Models
05 Apr 2021 | 1251 words
Epistemic status: a bunch of stuff. Sometimes I have conversations with people that go like this: Me: Feels like all the top people in EA would have gotten into EA anyway? I see a bunch of people talking about how sometimes people get into EA and they agree with the...
