Artificially Intelligent

Any mimicry distinguishable from the original is insufficiently advanced.

Meta-EA Needs Models


Thanks to Kuhan Jeyapragasan, Michael Byun, Sydney Von Arx, Thomas Kwa, Jack Ryan, Adam Křivka, and Buck Shlegeris for helpful comments and discussion.

Epistemic status: a bunch of stuff

Sometimes I have conversations with people that go like this:

Me: Feels like all the top people in EA would have gotten into EA anyway? I see a bunch of people talking about how sometimes people get into EA and they agree with the philosophy, but don’t know what to do and slowly drift away from the movement. But also there’s this other thing they could have done which is “figure out what to do and then do that.” Why didn’t they do that instead? More concretely, sometimes I talk to software engineers that want to get into AI Safety. Nate Soares was once a software engineer and he studied a bunch of math and now runs MIRI. This feels like a gap that can’t really be solved by creating concrete next steps.

Them: A few things:

  1. Anecdotally (and also maybe some survey data), there are people that you would consider “top EAs” where it feels like they could have not gotten into EA if things were different, e.g. they were introduced by a friend they respected less or they read the wrong introduction. It seems still quite possible that we aren’t catching all the “top people.”

  2. Even if we can’t get people to counterfactually become EAs, we can still make their careers faster. It’s much easier to convince people to change careers in undergrad, when they haven’t spent a bunch of effort. For example, if you want EAs to get PhDs, then the path becomes much murkier after undergrad.

  3. There are people with other skills that the EA movement needs. Becoming a personal assistant, for example. Or other things like journalism and anthropology that the EA movement needs but might not get by default because such people aren’t attracted to “EA-type-thinking” by default.

  4. Even if we don’t convert people to EA, we can spread EA ideas to people who might make important decisions in the future, like potential employees of large companies, politicians, lawyers, etc. For example, the world would probably be better if all ML engineers were mildly concerned with risks posed by AI. Additionally, people often donate money to charity, and if we can plant the seed of EA in such people, they might differentially donate more money to effective charities.

(This is the part where I just say the things I wanted to say from the beginning but didn’t know how to lead into.)

Me: These all seem like good reasons why meta-EA is still valuable, but the main objection I have to them is that they all suggest that meta-EA is operating at different orders of magnitude. For example, if the goal of meta-EA is counterfactual speed versus counterfactual careers, that’s like a 10-50x difference in the “number of EA years” you’re getting out of an individual.
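To make the 10-50x figure concrete, here is a rough back-of-the-envelope sketch. The 40-year career length and the speed-up values are illustrative assumptions, not numbers from any survey; the only point is that a whole counterfactual career divided by a speed-up of a few years lands in that range.

```python
# All numbers here are illustrative assumptions, not estimates from data.
CAREER_YEARS = 40.0  # assumed length of a full career spent on EA work


def ea_years_ratio(speedup_years: float, career_years: float = CAREER_YEARS) -> float:
    """EA-years gained from a counterfactual career, divided by the
    EA-years gained from merely speeding up someone who would have
    become an EA anyway."""
    if speedup_years <= 0:
        raise ValueError("speedup_years must be positive")
    return career_years / speedup_years


# Hypothetical speed-ups: under a year, two years, four years.
for speedup in (0.8, 2.0, 4.0):
    print(f"speed-up of {speedup} years: {ea_years_ratio(speedup):.0f}x")
```

Under these assumptions, which goal you think you are pursuing changes the value of the same outreach by one to two orders of magnitude.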

More broadly, it feels like meta-EA has a high-level goal, which is “make the world better, whatever that means”, but has a very murky picture of how this resolves into instrumental subgoals. There are certain cruxes, like “how hard is it to get a counterfactual top EA”, that potentially have a 10x influence on how impactful meta-EA is and that we have very little traction on.

More concretely, I imagine taking my best guess at the “current plan of meta-EA” and giving it to Paul Graham, and him not funding my startup because the plan isn’t specific/concrete enough to even check if it’s good. This vagueness is a sign that the key assumptions that need to be true for the plan to even work haven’t been identified.

GiveWell tries to do a very hard thing: evaluate charities against each other. The way they do this involves a pretty complicated model and a bunch of estimations, but you can look at the model and look at the estimates of the parameters and say “the model seems reasonable and the estimates seem reasonable, so the output must be reasonable.” However, the current state of meta-EA is that I don’t know how to answer the question of whether action A is better than action B. We’re basically doing the equivalent of looking at charities and estimating “12 goodness” and “10 goodness” directly, which is not how you’re supposed to estimate things.

Sometimes, questions are too difficult to answer directly. However, if you’re unable to answer a question, then a sign that you’ve understood the question is your ability to break it down into concrete subquestions that can be answered, each of which is easier to answer than the original top-level question. If you can’t do this, then you’re just thinking in circles.
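As a minimal sketch of what “break it down into subquestions” could look like for comparing action A with action B: the parameter names and every number below are hypothetical placeholders, and the multiplicative structure is an assumed toy model, not GiveWell’s method or anyone’s actual meta-EA model. The point is only that each factor is a question you can argue about separately, instead of guessing “12 goodness” directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionEstimate:
    """Toy decomposition of one meta-EA action into subquestions.
    Every field is a hypothetical parameter someone would have to estimate."""
    name: str
    people_reached: float      # how many people does the action reach?
    p_becomes_engaged: float   # what fraction end up seriously engaged?
    p_counterfactual: float    # what fraction would NOT have gotten there anyway?
    ea_years_each: float       # how many EA-years does each such person add?

    def expected_ea_years(self) -> float:
        # Multiply the answers to the subquestions to get the top-level estimate.
        return (self.people_reached * self.p_becomes_engaged
                * self.p_counterfactual * self.ea_years_each)


# Placeholder numbers, chosen only to show the mechanics.
action_a = ActionEstimate("action A", 200, 0.10, 0.5, 20)
action_b = ActionEstimate("action B", 5000, 0.01, 0.2, 5)

best = max((action_a, action_b), key=ActionEstimate.expected_ea_years)
for action in (action_a, action_b):
    print(f"{action.name}: {action.expected_ea_years():.0f} expected EA-years")
print(f"higher estimate under these assumptions: {best.name}")
```

If you disagree with the output, you can point at the specific parameter you think is wrong, which is exactly what direct “goodness” estimates don’t let you do.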

AI safety isn’t much better from a “has concrete answers to questions” perspective, but in AI safety, I can give you a set of subquestions that would help answer the top-level questions, which I feel like I can’t do for meta-EA. Like the questions would be something like “what is meta-EA even trying to do? What are some plausible goals? Which one are we aiming for?” and then I would proceed from there.

To me, this is the core problem of meta-EA: figuring out what the goal is. Who are we targeting? What’s the end-game? Why do we think movement building is valuable? How does it ground out in actions that concretely make the world better?

Right now it feels like a lot of meta-EA is just taking actions without having models of how those actions lead to increased value in the world. And there are people working on collecting data and figuring out answers to some of these questions, which is great, but it still feels like a lot of people in meta-EA don’t have this sense that they’re trying to do a very hard thing and should be doing it in a principled way.

(This is the part where I don’t quite know how my interlocutor would respond.)

Them: Yeah, I agree that meta-EA lacks explicit models, but that’s just because it’s trying to solve a much harder problem than GiveWell/AI Safety, precisely because there are so many possible goals. In practice, meta-EA doesn’t look like “pick the best goal, then try and pursue it”; it looks more like “observe the world, determine which of the many plausible goals of meta-EA are most tractable given what you’ve observed, then steer towards that.” There isn’t a strong “meta-EA agenda” because such an agenda would be shot to pieces within the week. In general, I’m worried about top-down style explicit reasoning overriding the intuitions of people on the ground by biasing towards calculable/measurable things.

Sure, “plans are useless, but planning is indispensable”, but sometimes, if the world is changing quickly and you are aiming for five different things in a hundred different ways, planning is also not that useful. In practice, it’s just better to do things that are locally valuable instead of trying to back-chain from some victory condition that you’re not even sure is possible. In the AI safety case, this might look like work on neural network interpretability, which is robustly useful in a broad class of scenarios (see An Analytic Perspective on AI Alignment for more discussion).

Overall, I agree that meta-EA could benefit from more explicit models, but it’s important to note that “model-free” work can still be robustly useful.

(This is the part where I conclude by saying these issues are very tricky and confusing and I would be excited about people thinking about them and gathering empirical data.)