You can now fine-tune your enterprise’s own version of OpenAI’s o4-mini reasoning model with reinforcement learning

OpenAI has introduced Reinforcement Fine-Tuning (RFT), which lets third-party developers customize its o4-mini reasoning model for specific tasks such as tax, healthcare, or legal applications. Unlike supervised fine-tuning, which trains on fixed example answers, RFT uses graders (custom scoring logic or external models) to score the model's responses and then adjusts the model toward higher-scoring outputs, so its behavior can be adapted to company policies or compliance needs. Billing covers only active training time, which helps keep costs down for developers while giving businesses a model tailored to their own requirements.
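To make the grader idea concrete, here is a minimal sketch of what a custom grader could look like in plain Python. It is an illustration only, not OpenAI's grader API: the function name `grade_response`, its parameters, and the scoring rubric (a policy check plus token overlap with a reference answer) are all assumptions made for this example.

```python
import re


def grade_response(response: str, reference_answer: str, banned_phrases: list[str]) -> float:
    """Score one model response between 0.0 and 1.0 (hypothetical rubric)."""
    text = response.lower()

    # Policy check (assumed rule): any banned phrase zeroes the score.
    if any(phrase.lower() in text for phrase in banned_phrases):
        return 0.0

    # Partial credit: fraction of reference-answer tokens found in the response.
    reference_tokens = set(re.findall(r"\w+", reference_answer.lower()))
    if not reference_tokens:
        return 0.0
    response_tokens = set(re.findall(r"\w+", text))
    return len(reference_tokens & response_tokens) / len(reference_tokens)


if __name__ == "__main__":
    banned = ["guaranteed refund"]
    print(grade_response("The deduction limit is 2500 dollars.", "2500 dollars", banned))
    print(grade_response("You get a guaranteed refund of 2500 dollars.", "2500 dollars", banned))
```

In an RFT-style loop, scores like these would serve as the reward signal: responses that comply with policy and match the expected answer are reinforced, while non-compliant ones are discouraged.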

Summary