Understanding Natural Language Processing: Key Breakthroughs Explored
Natural Language Processing (NLP) has gained considerable attention recently, especially with AI systems being described as sentient and capable of humor. This article provides an overview of significant milestones in the field of NLP and their broader implications.
Grammar and Machines
In 1957, linguist Noam Chomsky published a groundbreaking work titled Syntactic Structures, positing that understanding grammatical rules could enable the prediction of all grammatically correct sentences in any language. This idea laid the groundwork for the belief that machines could learn languages, emphasizing their capacity to adhere to complex rules.
Chomsky proposed a grammatical hierarchy where sentences could be visualized as tree structures. For instance, the sentence "John hit the ball" can be broken down into a noun phrase ("John") and a verb phrase ("hit the ball"), with the verb phrase itself containing the verb ("hit") and another noun phrase ("the ball").
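To make the tree idea concrete, here is a minimal sketch in Python using the NLTK library (assuming it is installed); the toy grammar below is purely illustrative, not Chomsky's own formalism.

```python
# A minimal sketch of phrase-structure parsing with NLTK.
# The grammar is a toy example covering only the sentence "John hit the ball".
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> 'John' | Det N
VP -> V NP
Det -> 'the'
N  -> 'ball'
V  -> 'hit'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John hit the ball".split()):
    tree.pretty_print()  # prints the sentence as a tree: S splits into NP and VP, and so on
```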
As the relationship between language and machines shifted towards computer science, a pivotal development came in 1966, when Dr. Joseph Weizenbaum at MIT created ELIZA, a computer program capable of acting as a psychotherapist in conversation with users.
The responses generated by ELIZA often mirrored the user's statements, a result of the specific rules it followed; the implementation showed how grammatical rules could be applied to conversational exchanges. For instance, if a user's statement contained a keyword such as "like," ELIZA would respond with a prompt such as "In what way?"
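A toy illustration of this keyword-based approach is sketched below in Python; the rules are made up for demonstration and are not Weizenbaum's original DOCTOR script.

```python
# An ELIZA-style toy: match a keyword pattern in the user's input and
# echo back a canned reflection. Real ELIZA had far richer scripts.
import re

RULES = [
    (r"\bI need (.*)", "Why do you need {0}?"),
    (r"\bI am (.*)", "How long have you been {0}?"),
    (r"\blike\b", "In what way?"),
]

def eliza_reply(user_input: str) -> str:
    for pattern, template in RULES:
        match = re.search(pattern, user_input, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return "Please tell me more."

print(eliza_reply("My brother is quite like my father."))  # -> "In what way?"
```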
Neural Networks
Although many believed ELIZA demonstrated intelligence, it was a basic rule-based system making straightforward decisions based on its input. The early 2010s marked a turning point with significant breakthroughs in neural networks and enhanced computational power, allowing machines to learn languages through extensive text data training.
But what exactly are neural networks? While summarizing this complex concept concisely is challenging, the core idea revolves around high school algebra. Given a set of inputs (x1, x2, x3,…), a series of matrix multiplications can transform these into intermediate values (a1, a2, a3, a4,…). Following a non-linear transformation, this can ultimately produce a single output (y), which can be either 0 or 1.
This is crucial as the binary output can be linked to various valuable metrics, such as sentiment analysis, where 0 might indicate negative sentiment (e.g., a critical review), while 1 could indicate positive sentiment. This concept can extend to multiple outputs representing a range of sentiments or other classifications.
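Here is a minimal sketch, in Python with NumPy, of the forward pass just described; the weights are random placeholders, whereas a real network would learn them from labeled examples.

```python
# Forward pass of a tiny neural network: inputs -> intermediates -> single 0/1 output.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # inputs x1, x2, x3
W1 = rng.normal(size=(4, 3))    # first matrix multiplication -> intermediates a1..a4
W2 = rng.normal(size=(1, 4))    # second matrix multiplication -> a single value

a = np.tanh(W1 @ x)             # non-linear transformation of the intermediates
z = W2 @ a                      # shape (1,)
y = 1 / (1 + np.exp(-z[0]))     # sigmoid squashes the result to between 0 and 1

sentiment = int(y > 0.5)        # 0 could mean a negative review, 1 a positive one
print(y, sentiment)
```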
You may wonder what the x's represent. Each x can correspond to a unique word, so a sentence of, say, ten words maps to a vector of length 10, with each x assigned a specific number. However, two challenges arise:
- Distance is significant for vectors. If you assign numeric values to words based on their dictionary order, such as giving "aardvark" the value 1, the network treats alphabetically adjacent words as numerically close, which is misleading. This could introduce spurious biases, such as incorrectly scoring reviews with more words starting with 'A' as more favorable.
- High computational cost. Assigning a dimensionality equivalent to the entire English vocabulary, with its hundreds of thousands of words, is computationally prohibitive.
A notable advancement came in 2013 with Word2vec, which addressed both issues by embedding words into dense vectors of roughly 100–1,000 dimensions. This development made neural networks much more efficient for NLP applications.
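As a rough sketch of what such embeddings look like in practice, the snippet below trains a tiny Word2vec model with the gensim library (assumed installed); a real model would be trained on millions of sentences rather than three.

```python
# Train toy word embeddings with gensim's Word2Vec implementation.
from gensim.models import Word2Vec

corpus = [
    ["the", "movie", "was", "great"],
    ["the", "film", "was", "fantastic"],
    ["the", "plot", "was", "terrible"],
]

model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, epochs=50)

print(model.wv["movie"].shape)               # each word becomes a 100-dimensional vector
print(model.wv.similarity("movie", "film"))  # words used in similar contexts end up close together
```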
Here are some key tasks in Natural Language Processing:
- Sentiment analysis
- Text generation
- Question answering
- Entity extraction
- Language translation
These represent just a few tasks, and as NLP technology evolves, the list of applications continues to grow.
For tasks like language translation, traditional feed-forward neural networks can fall short. Researchers developed recurrent neural networks (RNNs) to account for the context provided by preceding words: each "cell" in an RNN processes the current word together with the state carried over from the previous words, providing context that aids in translation.
Andrej Karpathy, the former AI director at Tesla, has implemented a simple RNN in Python using basic mathematics in around 100 lines of code!
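In that spirit, here is a minimal sketch of the recurrence at the heart of a vanilla RNN; the dimensions and weights below are illustrative placeholders rather than a trained model.

```python
# A vanilla RNN cell stepped over a sequence: each step sees the current
# word vector *and* the hidden state summarizing everything before it.
import numpy as np

rng = np.random.default_rng(0)
hidden_size, embed_size = 8, 4
W_xh = rng.normal(scale=0.1, size=(hidden_size, embed_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # previous hidden -> hidden
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                    # hidden state carries the context so far
sentence = rng.normal(size=(5, embed_size))  # five word vectors stand in for a sentence
for x_t in sentence:
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h)  # final state summarizes the whole sequence
```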
The Arrival of Transformers
In 2017, a significant advancement in NLP emerged with the paper "Attention Is All You Need" by Vaswani et al. This research demonstrated that training a model to focus on specific parts of a sentence yielded superior results compared to traditional recurrence methods. The rationale is intuitive: attending to an entire sentence while translating is more effective than translating word by word, which risks losing track of earlier parts of the text.
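The core operation introduced in that paper, scaled dot-product attention, can be sketched in a few lines of NumPy; the random matrices below simply stand in for learned token representations.

```python
# Scaled dot-product attention: each token's output is a weighted mix of all tokens.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                         # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over the sequence
    return weights @ V                                      # blend the value vectors accordingly

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(6, 16))   # six tokens, 16-dimensional vectors (self-attention)
print(scaled_dot_product_attention(Q, K, V).shape)  # -> (6, 16)
```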
Transformer architectures have since transformed the NLP landscape. Generally, larger models trained on extensive datasets yield better performance.
While I won't delve into the intricacies of transformers in this article, it's essential to note that these models, often referred to as Large Language Models, are trained on vast corpuses, including Wikipedia articles and books. They are termed language models because they focus on understanding languages rather than executing specific NLP tasks from the outset. For instance, the GPT-2 model was designed to predict the subsequent word given all previous words in a text, and interestingly, such general language models often outperform previous state-of-the-art models on specific tasks.
Is a Conversational AI Sentient?
Before addressing the recent debate regarding the sentience of a Google AI chatbot, let’s review some notable results from transformer models. The GPT-2 model by OpenAI generates coherent passages by repeatedly predicting the next item in a sequence. One widely circulated example is a fictional narrative featuring Edward Snowden as president in 2020; it is striking how convincing the AI-generated continuation reads.
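You can experiment with this kind of generation yourself: the sketch below prompts GPT-2 through the Hugging Face transformers library (assuming it and a backend such as PyTorch are installed); the prompt is an arbitrary example, not the passage described above.

```python
# Generate a short continuation with GPT-2 via the transformers text-generation pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Natural language processing has"
result = generator(prompt, max_length=30, num_return_sequences=1)
print(result[0]["generated_text"])
```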
In June 2022, a Google engineer claimed that the LaMDA chatbot was sentient, based on his dialogues with it. The interactions were notably convincing.
Though much debate surrounds LaMDA's sentience, there is broad agreement that the Turing test, which judges intelligence by whether a machine's responses are indistinguishable from a human's, may not be sufficient for assessing consciousness.
Final Thoughts
This overview has provided a glimpse into the advancements in NLP over the years. I didn’t even touch upon the latest transformer models that can explain humor (a topic perhaps better left unexplored). Alongside traditional NLP tasks, recent achievements have reshaped our understanding of AI and NLP.
Beyond the somewhat abstract discussions on consciousness, the practical applications of NLP are booming. We encounter its impact daily when searching for information on Google, where the summaries we receive are outcomes of transformer models. Companies like Hugging Face are committed to making cutting-edge transformer NLP models publicly available, and many organizations are utilizing Hugging Face's implementations to enhance their operations.
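As a small example of how accessible these implementations are, the snippet below runs a pre-trained sentiment model through the transformers pipeline API (again assuming the library and a backend are installed); the default model it downloads is just an illustration.

```python
# Sentiment analysis with a publicly available Hugging Face model.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("The new search summaries are surprisingly helpful!"))
# -> [{'label': 'POSITIVE', 'score': ...}]
```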
Regarding consciousness and intelligence, my experience as a parent of three-year-old twins has highlighted the vast differences between humans and machines. My children grapple with basic counting and grammatical structures but excel in conversation and observation. In contrast, machines are adept at following rules, excelling in counting, grammar, and vocabulary but often struggle with meaningful dialogue (though this is evolving). I'm curious to know how LaMDA would describe its day. Does it possess an awareness of its training history? Understanding the distinctions between humans and machines is crucial; we wouldn’t want a rule-following prodigy devoid of social skills, nor a sociable machine that cannot follow rules.
Hopefully, we won’t need to concern ourselves with such machines. Unless, of course, someone demonstrates a generalizable AI capable of performing various language, vision, mechanical, and sensory tasks without external power, roaming freely among us, and outperforming the current leading AI technologies.
If you found this article informative, please share it on social media or with someone who might appreciate insights into the connections between technology and modern society. Your comments are welcome in the discussions on the cyber-physical substack page. This is a small but growing effort, and I hope to share my journey in building resilient, data-driven societies.
Originally published at https://skandavivek.substack.com on October 25, 2022.