Advice I Commonly Give People New To Alignment

07 Jan 2022 | 506 words

Epistemic status: I am very junior and have very little idea of what I’m talking about, which still might be enough to make this worth reading.

I often find myself talking to people who want to get more surface area on what it’s like to be an alignment researcher, how to identify other people who might be good alignment researchers, how to tell if they themselves are good alignment researchers, etc. Here are things that I commonly say to such people:

The alignment problem is a pretty normal research problem, similar to problems from fields like theoretical computer science. It’s hard to determine if someone is a good alignment researcher in basically the same ways it’s hard to figure out if someone will win a Nobel Prize, a Turing Award, or a Fields Medal (or just be a good researcher/professor in other academic fields). Most people are bad at alignment not because alignment is especially tricky (maybe it’s slightly more tricky than other problems), but just because most people suck at research. It’s hard to determine if you would be a good alignment researcher for the same reason it’s hard to determine if you would be a good physicist.
- Corollary: Generic research experience is likely to be helpful for thinking about alignment. Fields that seem similar to alignment to me (without really knowing that much) are theoretical computer science, cryptography, and some parts of math.
When you’re thinking about alignment yourself, be very curious about why your ideas are bad. I take it for granted that at the beginning you’re only going to have bad ideas. A mistake I made when I was younger was to have ideas, then immediately discard them as “bad” without asking why they would be bad. After thinking for a while, I would be left with ideas that I had uniformly discarded as “bad,” with no sense of whether or not progress was made. If instead I had been more curious about specifically why my ideas were bad, I could have generated ideas that were bad in different ways from each other, which would have given a tangible sense of progress and given me more traction on the problem. This point might be important enough to be its own post at some point.
Try your hand at the ELK contest. It’s probably the best opportunity in a while to take a stab at a relevant research problem that has a single document describing the relevant context. Holden gives a more thorough argument here. Once someone had basically understood the problem (as described in this document), I would be excited about their potential to be a good (alignment) researcher if they were able to come up with a winning proposal in ~10 hours. I expect it would take ~20 hours for someone to familiarize themselves with the document. (Note that this is positive selection, not negative selection. If someone fails to come up with a good proposal, that doesn’t mean they can’t still be a good researcher.)
