ChatGPT succeeded in large part because human trainers told the AI model behind the bot which outputs were good and which were bad. According to OpenAI, adding even more AI into the mix could make AI assistants smarter and more reliable by giving those human trainers better support.

With ChatGPT, OpenAI pioneered reinforcement learning from human feedback (RLHF), a technique that uses input from human testers to refine an AI model so that its output is more coherent, less objectionable, and more accurate. An algorithm adjusts the model based on the ratings trainers give its responses. The technique has made chatbots more trustworthy, more useful, and better behaved.
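As a rough illustration of that feedback loop, the toy sketch below fits a tiny reward model to trainer preferences and then uses it to pick the better of two candidate replies. It is not OpenAI's pipeline: the features, data, and update rule are invented stand-ins for a real reward model and policy optimiser.

```python
import math
import random

# Toy sketch of the RLHF loop described above. NOT OpenAI's implementation:
# the feature extraction, data, and update rule are illustrative stand-ins.

def features(response: str) -> list[float]:
    # Hypothetical signals a reward model might score: length and politeness.
    return [len(response) / 100.0, 1.0 if "please" in response.lower() else 0.0]

def reward(weights: list[float], response: str) -> float:
    return sum(w * f for w, f in zip(weights, features(response)))

def train_reward_model(preference_pairs, steps=200, lr=0.1):
    """Fit weights so trainer-preferred responses score higher (Bradley-Terry style)."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        preferred, rejected = random.choice(preference_pairs)
        # Probability the current model assigns to the human's choice.
        margin = reward(weights, preferred) - reward(weights, rejected)
        p = 1.0 / (1.0 + math.exp(-margin))
        # Gradient step that pushes the preferred response's score upward.
        for i, (fp, fr) in enumerate(zip(features(preferred), features(rejected))):
            weights[i] += lr * (1.0 - p) * (fp - fr)
    return weights

# Human trainers rate pairs of model outputs: (preferred, rejected).
pairs = [
    ("Sure, here is a careful answer. Please check the details.", "No."),
    ("Happy to help, please see the steps below.", "Figure it out yourself."),
]
w = train_reward_model(pairs)

# The learned reward then steers which outputs the chatbot is trained to prefer.
candidates = ["No.", "Happy to help, please see the steps below."]
print(max(candidates, key=lambda r: reward(w, r)))
```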

Read also: How Ziki Uses AI to Tailor Education for every Student

OpenAI researcher Nat McAleese said, “RLHF does work very well, but it has some key limitations.” One is that human feedback can be inconsistent. Another is that even skilled humans may struggle to rate very complex outputs, such as intricate software code. The process can also optimise a model to produce output that sounds convincing but is not actually accurate.

How does OpenAI’s new model assist human trainers in assessing code?

OpenAI fine-tuned its most powerful model, GPT-4, to create CriticGPT, a model that helps human trainers evaluate code. The company found that CriticGPT caught bugs that humans had overlooked, and that human judges preferred its code critiques 63% of the time. OpenAI will look at extending the approach to areas beyond code.
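To make that workflow concrete, here is one way a critique model could be slotted in front of a human reviewer. CriticGPT itself is not publicly available, so this sketch assumes a general GPT-4-class model reached through OpenAI's standard chat completions API as a stand-in; the prompt, model name, and overall wiring are illustrative guesses rather than OpenAI's actual setup.

```python
# Rough sketch of critic-assisted code review. CriticGPT is not a public model,
# so a general GPT-4-class model is used here as a stand-in via the standard API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def critique_code(code: str) -> str:
    """Ask the critic model to point out likely bugs for a human trainer to verify."""
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in; CriticGPT itself is not exposed via the API
        messages=[
            {"role": "system",
             "content": "You review code written by another model. "
                        "List concrete bugs or risky behaviour, one per line."},
            {"role": "user", "content": code},
        ],
    )
    return response.choices[0].message.content

snippet = """
def average(values):
    return sum(values) / len(values)   # crashes on an empty list
"""

# The critique is shown to the human trainer, who still makes the final call
# on how to rate the model-written code; the critic only surfaces candidates.
print(critique_code(snippet))
```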

McAleese says, “We’re starting work to integrate this technique into our RLHF chat stack.” He admits that CriticGPT can itself hallucinate, but believes the approach could improve OpenAI’s models and ChatGPT by reducing errors in human training. It may also help make AI models smarter by letting humans instruct an AI that exceeds their own abilities, he says, and as models improve, trainers will need more capable assistance.
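One hypothetical way critic output could feed into that chat stack is sketched below: the trainer's rating is kept, but examples the critic flags are held back for a second look before they become preference data. The function and data here are assumptions for illustration only, not a description of OpenAI's pipeline.

```python
# Hypothetical wiring between critic output and RLHF preference data; an
# illustrative assumption, not OpenAI's chat stack.
from typing import Dict, Optional, Tuple

def label_pair(response_a: str, response_b: str, trainer_prefers_a: bool,
               critic_flags: Dict[str, int]) -> Optional[Tuple[str, str]]:
    """Return a (preferred, rejected) training pair, or None to defer to review."""
    preferred, rejected = (response_a, response_b) if trainer_prefers_a else (response_b, response_a)
    # If the critic found serious issues in the answer the trainer preferred,
    # hold the example back for a second human look instead of training on it;
    # this is how critiques could reduce errors in the human feedback.
    if critic_flags.get(preferred, 0) > 0:
        return None
    return preferred, rejected

flags = {"Looks right but divides by zero on empty input.": 1}
print(label_pair("Handles the empty-list case explicitly.",
                 "Looks right but divides by zero on empty input.",
                 trainer_prefers_a=True,
                 critic_flags=flags))
```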

The new technique is being developed to improve large language models and squeeze more abilities out of them. It is also part of a broader effort to ensure that AI behaves in acceptable ways even as it becomes more capable.

Anthropic, a rival to OpenAI founded by former OpenAI employees, announced a more capable version of its chatbot, Claude, last month, thanks to improvements in its training regimen and the data it is fed. Anthropic and OpenAI have both also announced new tools for probing AI models to understand how they arrive at their output, in part to guard against deceptive behaviour.

Read also: Sonia’s AI chatbot redefines therapy

OpenAI’s Breakthrough in AI Alignment

If OpenAI can apply the new method to more than just code, it may be able to train AI models that are smarter and more trustworthy and that remain aligned with human values. That matters because the company is already training its next large AI model and wants to show it is serious about how that model behaves. The push comes after a well-known team dedicated to assessing AI risks was disbanded.

The team was co-led by company co-founder and former board member Ilya Sutskever, who briefly ousted CEO Sam Altman before helping him return. Several of its members have since complained that the company is taking on too much risk in its rush to develop and sell sophisticated AI systems.

Dylan Hadfield-Menell, a professor at MIT who researches ways to align AI, says the idea of having AI models help train more powerful ones has been kicking around for a while. “This is a pretty natural development,” he says.

Researchers working on RLHF considered related ideas several years ago, according to Hadfield-Menell, and it remains to be seen how broadly applicable and powerful the approach is. “It might lead to big jumps in individual capabilities, and it might be a stepping stone towards more effective feedback in the long run,” he explains.