ChatGPT and the quest for factual AI – progress, pitfalls, and potential

December 1, 2023

ChatGPT is taking the world by storm. In a recent post, Saifr dug deeper into chatbots and the importance of AI factuality.

ChatGPT, recently heralded as the fastest-growing consumer application, has shifted the spotlight firmly onto conversational AI systems—chatbots. However, the path to creating such impactful technology hasn’t been smooth. Earlier attempts at developing chatbots stumbled due to ‘hallucinations’—the generation of non-factual content, which remains a significant hurdle even for advanced chatbots like ChatGPT.

The Factuality Scoreline: A Measure of AI Truthfulness

Factuality in chatbots can be seen as a scale that measures the propensity of a system to avoid falsehoods. High-scoring systems are adept at sticking to the truth, whereas low-scoring ones frequently present incorrect information, misattribute quotes, or provide inconsistent responses. Moving systems up that scale is a significant challenge for researchers aiming to improve conversational AI.
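To make the idea of a factuality scale concrete, here is a minimal sketch that assumes each response has already been judged factual or not by a human reviewer. The function name and the sample judgements are hypothetical and serve only as illustration; published factuality benchmarks use more elaborate scoring.

```python
def factuality_score(judgements):
    """Return the fraction of responses judged factual (0.0 to 1.0)."""
    if not judgements:
        raise ValueError("need at least one judged response")
    return sum(1 for is_factual in judgements if is_factual) / len(judgements)


# Hypothetical reviewer judgements for two systems on the same five prompts.
system_a = [True, True, True, True, False]
system_b = [True, False, False, True, False]

print(factuality_score(system_a))  # 0.8
print(factuality_score(system_b))  # 0.4
```

On this toy scale, the first system would count as the higher-scoring one because a larger share of its responses avoided falsehoods.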

Progress in Language Models and the Challenge of Factuality

Language models, specifically those based on the Transformer architecture, have been instrumental in the evolution of chatbots. These models can identify correlations in text, making them more sophisticated than their predecessors. However, when tasked with summarizing texts, studies have shown that these models tend to generate non-factual summaries, highlighting a critical gap in their ability to distinguish fact from fiction.

The Balance Between Fluency and Factuality in AI Models

Interestingly, studies have found that the most fluent models also produce hallucinations. This suggests that fluency does not guarantee accuracy: as models become more adept at generating human-like text, they may also become more prone to deviating from factual content. This trend raises concerns about the trustworthiness of responses provided by such systems.

Addressing the Factuality Challenge in AI

Ensuring factuality in AI systems is complex and requires addressing several issues. One potential solution is to expose the model to a more diverse range of data during training, enhancing its ability to recognize and replicate factual patterns. Another approach involves modifying the training objectives of these models to prioritize truthfulness.

GPT-4: A Step Forward in Mitigating Hallucinations

OpenAI’s GPT-4 represents a significant advancement in addressing the issue of hallucinations in AI chatbots. Techniques like Reinforcement Learning from Human Feedback (RLHF) have been employed, in which feedback from human reviewers helps the model distinguish between factual and non-factual responses. Despite these advances, GPT-4 still lags behind human standards in terms of factual accuracy.
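One common ingredient of RLHF pipelines is a reward model trained on human comparisons between pairs of responses. The sketch below shows the pairwise preference loss often used for that step. It is an illustration of the general idea, not a description of how GPT-4 was trained, and the function name and reward values are hypothetical.

```python
import math


def preference_loss(reward_preferred, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_preferred - reward_rejected))."""
    margin = reward_preferred - reward_rejected
    # log(1 + exp(-margin)) is algebraically equal to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


# Hypothetical reward-model scores for a factual and a non-factual response.
agrees_with_reviewer = preference_loss(reward_preferred=2.0, reward_rejected=0.0)
disagrees_with_reviewer = preference_loss(reward_preferred=0.0, reward_rejected=2.0)

print(agrees_with_reviewer < disagrees_with_reviewer)  # True
```

The loss is small when the reward model scores the response the reviewer preferred above the one they rejected, and large when it ranks them the other way round. Minimizing it therefore pushes the reward model toward the reviewers' judgements.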

Future Directions: Retrieval Augmentation and Beyond

Research into Retrieval Augmentation, as demonstrated by a team at Meta, shows promise in reducing hallucinations. This technique involves enhancing the context of a query with information from a knowledge base before generating a response. As AI chatbots continue to evolve, incorporating such techniques may be crucial in achieving a balance between fluency and factuality.
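The retrieval step described above can be sketched in a few lines. This is a toy illustration that assumes a simple word-overlap retriever and a hypothetical two-passage knowledge base. It is not the method used by the Meta team, and production systems typically rely on learned retrievers and far larger document collections.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, knowledge_base, top_k=1):
    """Return the top_k passages sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        knowledge_base,
        key=lambda passage: len(query_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_augmented_prompt(query, knowledge_base, top_k=1):
    """Prepend retrieved passages to the query before it reaches a model."""
    context = "\n".join(retrieve(query, knowledge_base, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "The Transformer architecture underlies many modern language models.",
    "Retrieval augmentation adds passages from a knowledge base to a query.",
]

query = "What does retrieval augmentation add to a query?"
prompt = build_augmented_prompt(query, KNOWLEDGE_BASE)
print(prompt)
```

The language model then receives the augmented prompt rather than the bare question, so its answer can draw on the retrieved passage instead of relying only on what it absorbed during training.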

Read the full post here.
