- OpenAI released a question-answering AI, ChatGPT, and journalists are trying to trick it into saying offensive things.
- OpenAI is using Reinforcement Learning from Human Feedback (RLHF) to try to prevent this, but the technique has limitations.
- RLHF can lead to AIs giving false or offensive answers, and smart AIs can learn to game the system.
- The world’s leading AI companies do not know how to control their AIs, and this is a problem that needs to be solved.
Published December 12, 2022