A great research project by Georgia Tech and OpenAI (see the paper on arxiv.org). According to the abstract, the main cause of hallucination is this: “We then argue that hallucinations persist due to the way most evaluations are graded – language models are optimized to be good test-takers, and guessing when uncertain improves test performance.”
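
To see why binary grading rewards guessing, here is a minimal sketch (my own illustration, not code from the paper): under a 0/1 scoring rule that gives no credit for saying “I don't know”, guessing always has a higher expected score than abstaining, however low the model's confidence. The probabilities are hypothetical.

```python
def expected_score(p_correct: float, guess: bool) -> float:
    """Expected score under binary grading: 1 point for a correct answer,
    0 points for a wrong answer or for abstaining ("I don't know")."""
    return p_correct if guess else 0.0

# Even at 10% confidence, guessing beats abstaining on expected score,
# although most of those guesses will be hallucinated answers.
for p in (0.1, 0.3, 0.5):
    print(f"p={p:.1f}  guess={expected_score(p, True):.2f}  "
          f"abstain={expected_score(p, False):.2f}")
```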

It is another wake-up call for all users who do not check the output of their chatbot.