What part of the alignment problem does this plan aim to solve? This plan aims to solve the shutdown problem.

Why has that part of the alignment problem been chosen? Because the shutdown problem is tractable, and because a solution (if widely implemented) would push the risk of AI takeover down to approximately zero.

How does this plan aim to solve the problem? By training agents to lack a preference between every pair of different-length trajectories, thereby ensuring that these agents aren't willing to pay costs to prevent or cause shutdown.

What evidence is there that the methods will work? See the proposal. Section 11 argues that agents trained in line with the Incomplete Preferences Proposal (IPP) won't pay costs to prevent or cause shutdown, and section 19 argues that the IPP largely circumvents the problems of reward misspecification, goal misgeneralization, and deceptive alignment. We're also working on an experiment to test the proposed reward function in some gridworld environments.

What are the most likely ways this plan could fail? See the proposal. Section 21 lists some issues still to address, including multi-agent dynamics, maintaining the shutdown button, creating corrigible subagents, and 'managing the news'.
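The target property behind the plan (lacking a preference between every pair of different-length trajectories) can be sketched as a toy check. Everything below is illustrative, not from the proposal: the trajectories, the preference relation, and the function names are hypothetical, and preferences are modelled as a bare set of strictly-ranked pairs rather than anything the actual training method produces.

```python
# Hypothetical trajectories, represented as (length, label) pairs.
trajectories = [(2, "a"), (2, "b"), (3, "a"), (3, "b")]

# A toy preference relation: a set of (preferred, dispreferred) pairs.
# Within each length the agent may rank outcomes; across lengths the
# target property says no pair is strictly ranked (the preferences are
# incomplete, which is why a single utility function can't model this).
strict_prefs = {
    ((2, "a"), (2, "b")),  # prefers "a" to "b" among length-2 trajectories
    ((3, "a"), (3, "b")),  # prefers "a" to "b" among length-3 trajectories
}

def lacks_cross_length_prefs(prefs):
    """True iff no two different-length trajectories are strictly ranked."""
    return all(t1[0] == t2[0] for t1, t2 in prefs)

# An agent satisfying this check has no strict preference to push
# probability toward longer or shorter trajectories, so it gains
# nothing by paying costs to prevent or cause shutdown.
```

For contrast, a relation containing even one cross-length ranking, such as `{((3, "a"), (2, "a"))}`, fails the check: such an agent strictly prefers a longer trajectory and so might pay costs to avoid being shut down.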