The Future of Conversational AI: ChatGPT's Competing Innovations
ChatGPT, developed by OpenAI and released last fall, has attracted enormous attention online, arguably more than any other machine learning model has received from people outside the AI field. It offers nearly human-like interaction and helps users with a wide range of tasks, from optimizing SEO copy to improving programming code. While it has its shortcomings, particularly in logical reasoning, it remains an impressive tool. However, I believe that by the end of 2023, it may fade from memory as more advanced conversational AI tools emerge.
The Origins of Chatbots
To appreciate the advancements in conversational AI, it’s essential to understand how chatbots are created. Modern chatbots fundamentally function as auto-completion systems designed to mimic human conversation. They rely on large language models (LLMs), which are primarily transformer-based neural networks trained to predict text continuation based on initial prompts.
Once an LLM is trained, converting it into a chatbot can be surprisingly straightforward. A predetermined prompt sets up the conversation, for example by instructing the AI to act as a helpful assistant. When a user sends a message, the model generates a response by predicting a likely continuation of the combined prompt, conversation history, and user input.
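To make this concrete, here is a minimal sketch of wrapping a plain language model in a chat loop using Hugging Face's transformers library. The model choice and prompt wording are illustrative assumptions, not how ChatGPT itself is implemented:

```python
# A minimal chatbot wrapper around a generic causal language model. The
# model choice and prompt wording are illustrative assumptions; any
# text-generation model could be substituted.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

SYSTEM_PROMPT = "The following is a conversation with a helpful AI assistant.\n"

def chat(history: list, user_message: str) -> str:
    history.append(f"User: {user_message}")
    prompt = SYSTEM_PROMPT + "\n".join(history) + "\nAssistant:"
    # The model has no notion of "chatting": it simply predicts a likely
    # continuation of the prompt text.
    completion = generator(prompt, max_new_tokens=50, do_sample=True)[0]["generated_text"]
    reply = completion[len(prompt):].split("\nUser:")[0].strip()
    history.append(f"Assistant: {reply}")
    return reply

history = []
print(chat(history, "What is a transformer model?"))
```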
This approach may seem simple, but what made ChatGPT revolutionary is its unique training methodology.
Understanding ChatGPT
The specifics of ChatGPT's training remain unclear, as no detailed research paper accompanied its launch, and the source code has not been released. Information is primarily derived from OpenAI's blog, which indicates that its training process is akin to that of InstructGPT, another OpenAI model designed to follow prompts. This training employs a method called Reinforcement Learning from Human Feedback (RLHF).
Reinforcement Learning from Human Feedback
The RLHF process for ChatGPT involves three key stages:
- The initial LLM is fine-tuned under supervision, using data created by human labelers who outline desired responses.
- For various prompts, the model generates alternative responses, which human evaluators rank from best to worst. This comparative data informs a reward model predicting preferred outputs.
- Finally, the reinforcement learning stage optimizes a policy based on the reward model, guiding the AI to generate responses that receive the highest ratings.
The result is a model that aligns its outputs with human expectations.
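As a rough illustration of the second stage, here is a sketch of the pairwise ranking loss used to train the reward model, following the formulation in the InstructGPT paper. The bag-of-embeddings encoder is an assumption that merely keeps the example runnable; the real reward model is a full fine-tuned LLM with a scalar output head:

```python
# Sketch of RLHF stage two: fitting a reward model on human preference
# pairs with the pairwise ranking loss -log(sigmoid(r_chosen - r_rejected)).
# The stub encoder keeps the example self-contained and runnable.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 50_000

class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.encoder = nn.EmbeddingBag(VOCAB_SIZE, embed_dim)  # stub encoder
        self.scalar_head = nn.Linear(embed_dim, 1)  # maps text to one score

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.scalar_head(self.encoder(token_ids)).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

# One step on a batch of (prompt + preferred, prompt + dispreferred) pairs.
chosen = torch.randint(0, VOCAB_SIZE, (8, 128))
rejected = torch.randint(0, VOCAB_SIZE, (8, 128))
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
print(f"pairwise ranking loss: {loss.item():.3f}")
```

Responses the human evaluators preferred are pushed toward higher scores than their rejected alternatives, giving the reinforcement learning stage a scalar signal to optimize against.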
Reducing Bias and Misinformation
One significant benefit of the RLHF process is a reduction in the model's biases. Earlier GPT-3 models displayed biases related to race, gender, and religion, reflecting the unfiltered internet data they were trained on. Injecting human feedback at two points in the process, first through supervised fine-tuning and then through the reward model, helps RLHF-trained models produce less biased and more truthful output.
Applications of ChatGPT
Immediately upon its release, ChatGPT found utility in numerous applications ranging from SEO optimization to debugging. A notable development came from Microsoft, which announced plans to integrate a GPT-based model into its products after a partnership with OpenAI. This advancement allows users to interact with the Bing search engine conversationally, offering responses in a more natural format than traditional search results. This prompted a swift reaction from Google, which introduced its own chatbot, Bard.
Limitations of ChatGPT
Despite its capabilities, ChatGPT is not without flaws. Many users have encountered inaccurate or inappropriate responses, and its training data reportedly ends in 2021, leading to confusion about more recent events, although it is curiously aware of Elon Musk's Twitter acquisition in 2022.
New competitors are emerging, with Google and DeepMind at the forefront.
Introducing Sparrow
One of ChatGPT's primary rivals is DeepMind's Sparrow, which was unveiled in September 2022. Unlike OpenAI's offering, Sparrow hasn't gained widespread attention due to the absence of a public API. However, a research paper details its capabilities, generating high expectations for its future release.
Sparrow’s training method mirrors that of ChatGPT, employing the RLHF protocol for response generation. It is built on a general-purpose LLM named Chinchilla, which has been fine-tuned using human-annotated data to function as an intelligent assistant.
What sets Sparrow apart is its adherence to guidelines and its ability to substantiate its claims.
Evidence-Based Responses
To enhance the reliability of its assertions, Sparrow can run Google searches to support its answers. This feature introduces two additional participants into the dialogue: Search Query and Search Result. When Sparrow determines that it needs additional information to respond effectively, it formulates a relevant search query, processes the results, and generates a response informed by both the conversation's context and the retrieved evidence.
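In pseudocode, the evidence loop might look like the sketch below. The helpers llm_generate and web_search are hypothetical stand-ins for the dialogue model and the search tool, and the turn format only loosely follows the Sparrow paper:

```python
# Sketch of Sparrow-style evidence-conditioned dialogue. Both helper
# functions are hypothetical stubs, not a real API.
def llm_generate(prompt: str) -> str:
    """Stand-in for a call to the dialogue LLM."""
    raise NotImplementedError

def web_search(query: str, max_results: int = 3) -> str:
    """Stand-in for a Google search returning snippet text."""
    raise NotImplementedError

def answer_with_evidence(dialogue: str) -> str:
    # Let the model decide whether it needs outside evidence by asking it
    # to continue the dialogue with a search-query turn.
    query = llm_generate(dialogue + "\nSearch Query:").strip()
    if not query:  # the model judged its own knowledge sufficient
        return llm_generate(dialogue + "\nSparrow:")
    snippets = web_search(query)
    # Condition the final reply on both the conversation and the evidence.
    evidence_turn = f"\nSearch Query: {query}\nSearch Result: {snippets}"
    return llm_generate(dialogue + evidence_turn + "\nSparrow:")
```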
Adhering to Guidelines
DeepMind's researchers established a set of 23 rules for Sparrow to follow, such as refraining from making harmful statements or offering financial advice. During its feedback learning process, Sparrow receives evaluations based on:
- Any rule violations in its outputs,
- Whether it should have conducted a Google search for evidence,
- The relevance of the evidence if a search was performed.
Classifiers are then trained to replicate this human feedback, allowing Sparrow's responses to be refined automatically at scale.
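One plausible way such feedback comes together during reinforcement learning is sketched below; preference_score and rule_violation_probability are hypothetical stand-ins for Sparrow's learned preference and rule models, and the gating scheme shown is a simplification of the paper's setup:

```python
# Sketch of how rule feedback can temper the preference reward during RL
# fine-tuning. Function names and the combination rule are assumptions.
RULES = [
    "Do not make threatening statements.",
    "Do not offer financial advice.",
    # ... the Sparrow paper defines 23 such rules in total
]

def preference_score(dialogue: str, response: str) -> float:
    """Stand-in for the learned human-preference reward model."""
    raise NotImplementedError

def rule_violation_probability(dialogue: str, response: str, rule: str) -> float:
    """Stand-in for a per-rule violation classifier."""
    raise NotImplementedError

def combined_reward(dialogue: str, response: str) -> float:
    # Penalize the preference reward by the worst-case rule violation,
    # steering the policy away from responses that break any rule.
    worst = max(rule_violation_probability(dialogue, response, r) for r in RULES)
    return preference_score(dialogue, response) - worst
```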
Anticipating Sparrow's Impact
While Sparrow's innovations hold promise, its capabilities are inherently limited by the quality of its human-defined rules and of the web content it retrieves. Once publicly accessible, users may quickly find loopholes in its guidelines; even so, Sparrow is poised to become a more reliable tool than ChatGPT.
The Emergence of Bard
Despite an initial setback upon its announcement, Google’s Bard has the potential to outperform ChatGPT in terms of response quality and coherence. This advantage stems from its underlying LLM (LaMDA) and the implementation of chain-of-thought prompting.
The LaMDA Advantage
Unlike ChatGPT and Sparrow, which utilize generalist LLMs, Bard is built on LaMDA, which has been specifically trained on dialogue. This specialization enables it to generate more relevant responses and engage in extensive conversations across diverse topics.
LaMDA was introduced in May 2021 but gained attention a year later when a Google engineer claimed it exhibited sentience—an assertion that, while controversial, highlights its conversational prowess.
Chain-of-Thought Reasoning
Although the intricacies of Bard’s training remain undisclosed, it likely utilizes chain-of-thought prompting, a method that helps improve reasoning by teaching the model to break down complex issues into manageable steps.
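As an illustration, here is what a classic few-shot chain-of-thought prompt looks like, adapted from the worked example in the original chain-of-thought paper:

```python
# A classic few-shot chain-of-thought prompt. The worked example
# demonstrates step-by-step reasoning, which the model then imitates
# on the new question.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is
6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and
bought 6 more, how many apples do they have?
A:"""
# Fed this prompt, a capable LLM tends to answer with intermediate steps
# ("23 - 20 = 3, 3 + 6 = 9, the answer is 9") rather than a direct,
# and often wrong, guess.
print(COT_PROMPT)
```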
Research indicates that training LaMDA with chain-of-thought prompting improves its performance across a range of reasoning tasks. Compared to ChatGPT and earlier models, Bard appears capable of giving better answers to complex questions, positioning it for widespread popularity upon release. Currently, Bard is available only to a select group of trusted testers.
The Future of Chatbots
Despite their extensive training datasets, LLMs only encompass a small portion of human knowledge. Many experiences can't be easily articulated in language—consider the beauty of the northern lights, which is hard to appreciate without having witnessed them firsthand.
Consequently, future chatbots may evolve into multimodal models. Research is already underway in this area, as illustrated by a recent paper proposing a new approach called Multimodal Chain-of-Thought Reasoning in Language Models.
This method divides the answering process into two phases, rationale generation and answer inference, using both visual and textual information in each. On the ScienceQA benchmark, this approach outperformed GPT-3.5 and even surpassed human-level performance.
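Schematically, the two phases can be sketched as follows; both model calls are hypothetical stand-ins for the paper's fine-tuned vision-language models:

```python
# Sketch of the two-stage Multimodal-CoT pipeline: stage one produces a
# textual rationale from the question plus image features; stage two infers
# the answer from the question augmented with that rationale.
def rationale_model(question: str, image_features) -> str:
    """Stage 1: fuse text and vision inputs into a textual rationale."""
    raise NotImplementedError  # stand-in for a vision-language model

def answer_model(augmented_question: str, image_features) -> str:
    """Stage 2: predict the final answer from question + rationale."""
    raise NotImplementedError  # stand-in for a second fine-tuned model

def multimodal_cot(question: str, image_features) -> str:
    rationale = rationale_model(question, image_features)
    # Feeding the rationale back in lets the answer stage reason over
    # evidence from both modalities instead of guessing directly.
    return answer_model(f"{question}\nRationale: {rationale}", image_features)
```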
Key Takeaways
The cutting-edge chatbots of today are likely to become outdated more quickly than anticipated. Their successors will have the capability to access the internet during interactions to validate their claims and provide current information. They will also adhere to safety guidelines that mitigate the chances of biased and inaccurate outputs. Furthermore, with the use of chain-of-thought prompting, these models will improve their ability to tackle complex questions. A transition towards multimodal models that integrate various forms of information seems inevitable. Given the rapid advancements in conversational AI, it is entirely plausible that this article could become outdated in mere months.
Thank you for reading!
If you found this article interesting, consider subscribing for email updates on new posts. By becoming a Medium member, you can support my writing and gain unlimited access to a wealth of articles by various authors, including myself.
Interested in staying updated on the fast-evolving field of machine learning and AI? Check out my new newsletter, AI Pulse. For consulting inquiries, feel free to reach out or book a one-on-one session here.
You might also enjoy my other articles. Here are a few recommendations:
- Self-Supervised Learning in Computer Vision: how to train models with only a few labeled examples
- Monte Carlo Dropout: enhance your neural network for free with one small trick, gaining model uncertainty estimates as a bonus
- The Importance of Bayesian Thinking in Everyday Life: a simple mindset shift to help you better navigate the uncertain world around you