What part of the alignment problem does this plan aim to solve?
Preventing misalignment at training time.

Why has that part of the alignment problem been chosen?
It is a comparatively neglected area of alignment research.

How does this plan aim to solve the problem?
By developing methods that make it very hard to train models for harmful purposes (a sketch of one such method is given below).

What evidence is there that the methods will work?
https://arxiv.org/abs/2405.14577 provides both empirical and theoretical evidence that this is possible.

What are the most likely causes of this not working?
If organizations train models from scratch rather than building on existing ones, then this is not a viable safety approach, since the defence has to be applied to an already-trained model before it is released.
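
For concreteness, here is a minimal sketch of a representation-noising-style defence objective in the spirit of the cited paper (arXiv:2405.14577). This is an illustrative assumption, not the paper's exact formulation: the function name, the MSE-to-Gaussian-noise term (standing in for a distributional matching loss), and the coefficients alpha and beta are all placeholders, and a Hugging Face-style causal LM interface (labels passed to the forward call, output_hidden_states available) is assumed.

```python
import torch
import torch.nn.functional as F

def repnoise_style_loss(model, benign_batch, harmful_batch,
                        alpha=1.0, beta=0.1):
    """Sketch of a representation-noising-style defence objective.

    Three terms: (1) standard LM loss on benign data, to retain
    capability; (2) a term pushing hidden states on harmful data
    toward Gaussian noise, to destroy harmful-task information;
    (3) gradient ascent on harmful data, to unlearn harmful
    completions. Batches are dicts with input_ids/attention_mask.
    """
    # 1. Keep the model useful: ordinary next-token loss on benign text.
    benign_out = model(**benign_batch, labels=benign_batch["input_ids"])
    retain_loss = benign_out.loss

    # 2. Make harmful representations uninformative: pull last-layer
    #    hidden states on harmful inputs toward a standard Gaussian
    #    (MSE to a fresh noise sample is a simplification here).
    harmful_out = model(**harmful_batch,
                        labels=harmful_batch["input_ids"],
                        output_hidden_states=True)
    hidden = harmful_out.hidden_states[-1]
    noise = torch.randn_like(hidden)
    noise_loss = F.mse_loss(hidden, noise)

    # 3. Actively unlearn harmful completions: negate the LM loss on
    #    the harmful batch so the optimizer ascends it.
    ascent_loss = -harmful_out.loss

    return retain_loss + alpha * noise_loss + beta * ascent_loss
```

Training would then take ordinary optimizer steps on this combined loss over paired benign/harmful batches. The intended effect is that a later attacker who fine-tunes the released model on harmful data has to re-learn information the noise term has already destroyed, raising the cost of harmful fine-tuning rather than merely refusing at inference time.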