What part of the alignment problem does this plan aim to solve? Defining hard constraints on AI behaviour.

Why has that part of the alignment problem been chosen? It is where planners are applicable.

How does this plan aim to solve the problem? The planner would either be hard-constrained by a formal-language definition of not harming humans, or would incorporate sentiment analysis at each step of the planning process.

What evidence is there that the methods will work? See the attached paper.

What are the most likely causes of this not working? STRIPS is a very primitive planning language, and for more expressive languages planners need to be developed almost from scratch. This approach might once again teach us the bitter lesson, as a completely algorithmic planner may prove infeasible to implement. A naive implementation will also suffer from combinatorial explosion almost immediately. However, all of this may be mitigated by further integrating LLM functionality into the planning process. And even though that reintroduces the black-box problem, the architecture would still enable some degree of compartmentalisation, meaning that only smaller, more easily interpretable models would have to be contended with.
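The hard-constraint variant could be sketched as a forward-search STRIPS-style planner that prunes any successor state containing a forbidden fact. This is only an illustrative toy, not the method from the attached paper: the Action class, the hypothetical harms_human predicate, and the three-action domain are all assumptions made for the sketch.

```python
from collections import deque

class Action:
    """A STRIPS-style action: preconditions, add list, delete list."""
    def __init__(self, name, pre, add, delete):
        self.name = name
        self.pre = frozenset(pre)
        self.add = frozenset(add)
        self.delete = frozenset(delete)

    def applicable(self, state):
        return self.pre <= state

    def apply(self, state):
        return (state - self.delete) | self.add

def plan(initial, goal, actions, forbidden=frozenset()):
    """Breadth-first forward search; any state that contains a
    forbidden fact is pruned, so no returned plan can pass through it."""
    initial, goal = frozenset(initial), frozenset(goal)
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, path = queue.popleft()
        if goal <= state:
            return path
        for a in actions:
            if not a.applicable(state):
                continue
            nxt = a.apply(state)
            if nxt & forbidden:   # hard constraint checked at every step
                continue
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [a.name]))
    return None  # no constraint-respecting plan exists

# Toy domain: a one-step shortcut reaches the goal but harms a human,
# so the planner is forced onto the longer safe route.
actions = [
    Action("shortcut",    {"at_a"}, {"at_goal", "harms_human"}, {"at_a"}),
    Action("move_a_b",    {"at_a"}, {"at_b"},    {"at_a"}),
    Action("move_b_goal", {"at_b"}, {"at_goal"}, {"at_b"}),
]
result = plan({"at_a"}, {"at_goal"}, actions, forbidden={"harms_human"})
```

Even in this tiny domain the search enumerates every reachable safe state, which is where the combinatorial explosion mentioned above sets in as the fact and action sets grow.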