Key Points:
- OpenAI’s latest model, o1, excels at solving advanced math, coding, and science problems.
- The o1 model is available today as a preview through ChatGPT and OpenAI’s API, with future updates expected.
- Even as a preview, it shows significant improvements in math and problem-solving over earlier models.
OpenAI has launched its new o1 AI model, designed to handle complex math problems and reason through difficult tasks in areas such as coding and science. The o1 model, which is available starting today, aims to deliver deeper problem-solving abilities by trying different strategies and recognizing its own mistakes along the way.
OpenAI says the model holds its own against doctoral-level students in the sciences. “In tests, the new model performs similarly to PhD students across physics, chemistry, and biology tasks,” the company stated. It also posts strong coding and mathematical results, outperforming previous versions by a wide margin.
For example, on a qualifying exam for the International Mathematical Olympiad, the o1 model correctly solved 83% of the problems, compared with GPT-4o’s 13%.
Despite these advancements, the o1 model is still in its preview phase, meaning it lacks some of the key features available in GPT-4o, such as uploading files, browsing the web, or handling various multitasking scenarios. As OpenAI points out, “GPT-4o remains more versatile for a broader range of tasks.”
However, for those working on intricate math, coding, or scientific projects, OpenAI describes o1 as a “significant advancement.” The company chose the name OpenAI o1 to highlight this new level of AI sophistication and potential.
OpenAI’s blog post identifies the target audience as professionals who regularly tackle complex problems in science, coding, and mathematics. Developers building and running multi-step workflows and physicists generating detailed formulas are examples of the people who might benefit from this technology.
Safety remains a priority with the new o1 model. OpenAI revealed that o1 scored 84 on one of the company’s toughest jailbreaking tests, compared with GPT-4o’s score of 22, indicating that it is considerably harder to trick into bypassing its safety rules.
OpenAI has also entered into agreements with the US and UK AI Safety Institutes, providing them with early access to a research version of o1 for evaluation and testing. These institutes will play a key role in assessing the model’s performance and safety both before and after its public release.