OpenAI’s latest AI models have a new safeguard to prevent biorisks

Operai says he deployed a new system to monitor his latest models of AI, O3 and O4-mini reasoning, for indications related to biological and chemical threats. The system aims to prevent models from offering advice that could instruct someone about carrying out potentially harmful attacks, According to the OpenAI security report.

O3 and O4-MINI represent a significant increase in capacity over the previous models of OpenAi, says the company, and therefore raises new risks in the hands of bad actors. According to Openai’s internal reference points, the O3 is more skilled to answer questions about the creation of certain types of biological threats in particular. For this reason, and to mitigate other risks, Openai created the new monitoring system, which the company describes as a «security monitor centered in security.»

The monitor, tailored to reason about OpenAi content policies, runs on O3 and O4-mini. It is designed to identify indications related to biological and chemical risk and instruct the models to refuse to offer advice on these issues.

To establish a baseline, Operai caused the red teams to spend about 1,000 hours marking «insec» conversations related to the O3 and O4-mini Biorrisco. During a test in which Openai simulated the «blocking logic» of its security monitor, the models refused to respond to the risks of 98.7% of the time, according to OpenAI.

Openai acknowledges that his test did not account for people who could attempt new instructions after being blocked by the monitor, so the company says it will continue depending in part of the human monitoring.

According to the company, O3 and O4-mini do not cross the «high-risk» threshold of OpenAI for biorriscos. However, compared to O1 and GPT-4, Operai says that the early versions of O3 and O4-mini proved to be more useful to answer questions about the development of biological weapons.

System and O4-MINI system card graph (screen capture: OpenAI)

The company is actively tracking how its models could facilitate malicious users to develop chemical and biological threats, according to the recently updated update of OpenII. Preparation frame.

Operai depends more and more on automated systems to mitigate the risks of their models. For example, to prevent The GPT-4O native image generator when creating child sexual abuse (CSAM) materialOpenai says he uses in a reasoning monitor similar to that the company implemented for O3 and O4-mini.

However, several researchers have expressed OpenAI concerns do not prioritize security as much as it should. One of the company’s red team partners, Metr, said he had relatively little time to test O3 at a reference point for deceptive behavior. Meanwhile, Openai decided not to publish a security report for its GPT-4.1 model, which was launched earlier this week.

Kypplo
Logo
Registrar una cuenta nueva
Comparar artículos
  • Total (0)
Comparar
0
Shopping cart