Ex Anter

RSS Feed
29 Mar 2021 | 16 words
Multiple people have requested an RSS feed for my blog. I have created one here.
Transparency Trichotomy
28 Mar 2021 | 2062 words
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Introduction: In Relaxed Adversarial Training for Inner Alignment, Evan presents a trichotomy of ways to understand a model M: Transparency via inspection: use transparency tools to understand M via inspecting the trained model. Transparency via training: incentivize...
Intermittent Distillations #1
16 Mar 2021 | 3166 words
This is my low-budget version of Rohin’s Alignment Newsletter. A critique of pure learning and what artificial neural networks can learn from animal brains (Anthony M. Zador). Summary: This paper points out that human learning...
Strong Evidence is Common
13 Mar 2021 | 289 words
Portions of this are taken directly from Three Things I’ve Learned About Bayes’ Rule. One time, someone asked me what my name was. I said, “Mark Xu.” Afterward, they probably believed my name was “Mark Xu.” I’m guessing they would have happily accepted a bet at 20:1 odds that my...
Open Problems in Myopia
10 Mar 2021 | 2405 words
Coauthored with Evan Hubinger. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Thanks to Noa Nabeshima for helpful discussion and comments. Introduction: Certain types of myopic agents represent a possible way to construct safe AGI. We call agents with a time discount rate of zero...
Towards a Mechanistic Understanding of Goal-Directedness
09 Mar 2021 | 1408 words
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This post is part of the research I have done at MIRI with mentorship and guidance from Evan Hubinger. Introduction: Most discussion about goal-directed behavior has focused on a behavioral understanding, which can roughly be described as...
Revenge of the Prediction Market
05 Mar 2021 | 642 words
Recommended reading: Prediction Markets: Tales from the Election. Suppose I wanted to know the probability of some future event. How might I do this? One way would be to pay forecasters from the Good Judgment Project to forecast the event. These forecasters are generally pretty good at what they do,...
Maslow First and the World Second
04 Mar 2021 | 652 words
Saul McLeod: Maslow’s hierarchy of needs is a motivational theory in psychology comprising a five-tier model of human needs, often depicted as hierarchical levels within a pyramid. From the bottom of the hierarchy upwards, the needs are: physiological (food and clothing), safety (job security), love and belonging needs (friendship), esteem,...
How Simulacra Levels Increase
03 Mar 2021 | 653 words
Simulacra levels are an important and confusing concept. The concept itself is described reasonably well by the posts here. I’ve given my list of examples here. However, none of the descriptions I’ve read give a good explanation of why simulacra levels tend to rise. I now understand this process better...
Seriously, the Map is Not the Territory
02 Mar 2021 | 243 words
The quotation is not the referent. “Snow is white” is true if and only if snow is white. A model of reality is not reality. A prediction about what is going to happen is different from what actually happens. What you expect about reality is not reality. Your feelings...