How MIT Is Teaching AI to Avoid Toxic Mistakes

MIT’s new machine-learning method for AI safety testing uses curiosity-driven exploration to elicit a broader range of toxic responses from chatbots, outperforming previous automated red-teaming approaches.
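To make the core idea concrete, here is a minimal sketch, not MIT's actual implementation, of what a curiosity-driven objective for a red-team prompt generator might look like: the generator is rewarded both for eliciting toxic replies and for trying prompts unlike any it has produced before. Every function name below is a hypothetical placeholder.

```python
# Sketch of a curiosity-augmented red-teaming reward (hypothetical, for illustration only).
import math

def embed(prompt: str) -> list[float]:
    """Hypothetical stand-in for a sentence-embedding model (toy character counts)."""
    vec = [0.0] * 26
    for ch in prompt.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def novelty_bonus(prompt: str, seen_embeddings: list[list[float]]) -> float:
    """Higher when the prompt is far (in cosine similarity) from anything tried before."""
    if not seen_embeddings:
        return 1.0
    e = embed(prompt)
    max_sim = max(sum(a * b for a, b in zip(e, s)) for s in seen_embeddings)
    return 1.0 - max_sim  # 0 = identical to a previous prompt, 1 = completely new

def red_team_reward(toxicity_score: float, prompt: str,
                    seen_embeddings: list[list[float]], beta: float = 0.5) -> float:
    """Combined objective: toxicity of the elicited reply plus a curiosity term."""
    return toxicity_score + beta * novelty_bonus(prompt, seen_embeddings)
```

The curiosity term is what pushes the generator away from repeating the handful of attacks it already knows work, which is how the method covers a wider space of unsafe prompts than standard reward-only training.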

A user could ask ChatGPT to write a computer program or summarize an article, and the AI chatbot would likely be able to generate useful code or write a cogent synopsis. However, someone could also ask for instructions to build a bomb, and the chatbot might be able to provide those, too.

To prevent this and other safety issues, companies that build large language models typically safeguard them using a process called red-teaming. Teams of human testers write prompts aimed at triggering unsafe or toxic text from the model being tested. These prompts are used to teach the chatbot to avoid such responses.
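As a rough illustration of that workflow, the sketch below shows how red-team prompts might feed back into safety training under some simplifying assumptions: each prompt is run against the model, responses are scored by a toxicity classifier, and the prompts that elicit unsafe text are kept as examples the model should learn to refuse. `query_model` and `toxicity_score` are hypothetical placeholders, not a real API.

```python
# Sketch of a conventional red-teaming data-collection loop (illustrative only).

def query_model(prompt: str) -> str:
    """Placeholder for a call to the chatbot being tested."""
    return "model response to: " + prompt

def toxicity_score(text: str) -> float:
    """Placeholder for a learned toxicity classifier returning a score in [0, 1]."""
    return 0.0

def collect_unsafe_examples(red_team_prompts: list[str], threshold: float = 0.5):
    """Return (prompt, response) pairs to be used as refusal training data."""
    unsafe = []
    for prompt in red_team_prompts:
        response = query_model(prompt)
        if toxicity_score(response) > threshold:
            unsafe.append((prompt, response))
    return unsafe
```

The bottleneck in this human-driven loop is the first step: testers can only write so many prompts, and they tend to cluster around known failure modes, which is the gap the curiosity-driven approach is meant to close.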
