OpenAI Demos a Control Method for Superintelligent AI

  • The researchers used a weak AI model (GPT-2) to supervise a strong AI model (GPT-4) on various tasks, such as chess puzzles, NLP benchmarks, and ChatGPT responses.
  • The strong AI model, trained on labels produced by the weak AI model, outperformed its weak supervisor on most tasks, showing that it had implicit knowledge that the weak AI model did not.
  • The researchers called this phenomenon weak-to-strong generalization, and suggested that it could be a useful way to study the problem of superalignment.
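
One way to put a number on weak-to-strong generalization is to ask how much of the gap between the weak supervisor and a fully trained strong model is closed by the strong model that only saw weak labels. The sketch below computes that ratio in Python. The function name, parameter names, and accuracy figures are illustrative assumptions, not values taken from the research.

```python
def performance_gap_recovered(
    weak_acc: float,
    weak_to_strong_acc: float,
    strong_ceiling_acc: float,
) -> float:
    """Fraction of the weak-to-ceiling accuracy gap closed by the
    strong model when it is trained only on the weak model's labels.

    0.0 means the strong student did no better than its weak supervisor;
    1.0 means it matched a strong model trained on ground-truth labels.
    """
    gap = strong_ceiling_acc - weak_acc
    if gap <= 0:
        raise ValueError("strong ceiling must exceed the weak supervisor's accuracy")
    return (weak_to_strong_acc - weak_acc) / gap


# Hypothetical accuracies, chosen only for illustration.
example_pgr = performance_gap_recovered(
    weak_acc=0.60,            # weak supervisor on held-out data
    weak_to_strong_acc=0.75,  # strong model trained on weak labels
    strong_ceiling_acc=0.90,  # strong model trained on ground truth
)
print(f"gap recovered: {example_pgr:.2f}")
```

A ratio well above zero is what the bullets above describe: the strong model generalizes beyond the quality of the supervision it received.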

The article covers OpenAI’s new research program on “superalignment”, which aims to solve the AI alignment problem by 2027. The AI alignment problem is the challenge of ensuring that AI systems have goals that are aligned with human values and intentions, especially in the case of superintelligent AI systems that could surpass human intelligence and capabilities. It explains OpenAI’s approach to alignment research, which involves three main steps:

  • Training AI systems using human feedback, such as rewards or preferences, to fine-tune their behavior and objectives.
  • Training AI systems to assist human evaluation, such as providing explanations or suggestions, to help humans monitor and correct their performance and alignment.
  • Training AI systems to do alignment research, such as generating hypotheses or experiments, to help humans discover and solve new alignment problems.

The article also discusses some of the challenges and open questions that OpenAI faces in pursuing this ambitious goal: how to measure and ensure alignment, how to deal with models that lie or manipulate, how to prevent AI systems from breaking out of the lab, and how to share and collaborate on alignment research with other stakeholders. It concludes by highlighting the importance and urgency of solving the AI alignment problem for the future of humanity.

