Natural language understanding is tough for neural networks

One of the dominant trends of artificial intelligence in the past decade has been to solve problems by creating ever-larger deep learning models. And nowhere is this trend more evident than in natural language processing, one of the most challenging areas of AI.

In recent years, researchers have shown that adding parameters to neural networks improves their performance on language tasks. However, the fundamental problem of understanding language—the iceberg lying under words and sentences—remains unsolved.

Linguistics for the Age of AI, a book by two scientists at Rensselaer Polytechnic Institute, discusses the shortcomings of current approaches to natural language understanding (NLU) and explores future pathways for developing intelligent agents that can interact with humans without causing frustration or making dumb mistakes.

Marjorie McShane and Sergei Nirenburg, the authors of Linguistics for the Age of AI, argue that AI systems must go beyond manipulating words. In their book, they make the case for NLU systems that can understand the world, explain their knowledge to humans, and learn as they explore it.

Knowledge-based vs. knowledge-lean systems

Consider the sentence, “I made her duck.” Did the subject of the sentence throw a rock and cause the other person to bend down, or did he cook duck meat for her?

Now consider this one: “Elaine poked the kid with the stick.” Did Elaine use a stick to poke the kid, or did she use her finger to poke the kid, who happened to be holding a stick?

Language is filled with ambiguities. We humans resolve these ambiguities using the context of language. We establish context using cues from the tone of the speaker, previous words and sentences, the general setting of the conversation, and basic knowledge about the world. When our intuitions and knowledge fail, we ask questions. For us, the process of determining context comes easily. But defining the same process in a computable way is easier said than done.
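
The second sentence shows a classic prepositional-phrase attachment ambiguity, which can be made concrete with a toy grammar. Here is a minimal sketch using Python's NLTK library; the grammar is deliberately tiny and covers only this one sentence:

```python
import nltk

# Toy grammar in which "with the stick" can attach to the verb
# (Elaine used the stick) or to "the kid" (the kid held the stick).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'Elaine' | Det N | Det N PP
VP -> V NP | V NP PP
PP -> P NP
Det -> 'the'
N -> 'kid' | 'stick'
V -> 'poked'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "Elaine poked the kid with the stick".split()

# The parser returns both readings; nothing in the syntax alone
# says which one the speaker intended.
for tree in parser.parse(sentence):
    tree.pretty_print()
```

Both parse trees are grammatically valid. Choosing between them requires exactly the contextual and world knowledge described above.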

There are generally two ways to address this problem.

In the earlier decades of AI, scientists used knowledge-based systems to define the role of each word in a sentence and to extract context and meaning. Knowledge-based systems rely on a large number of features about language, the situation, and the world. This information can come from different sources and must be computed in different ways.

Knowledge-based systems provided reliable and explainable analysis of language. But they fell from grace because they required too much human effort to engineer features, create lexical structures and ontologies, and develop the software systems that brought all these pieces together. Researchers came to see the manual effort of knowledge engineering as a bottleneck and sought other ways to deal with language processing.

“The public perception of the futility of any attempt to overcome this so-called knowledge bottleneck profoundly affected the path of development of AI in general and NLP [natural language processing] in particular, moving the field away from rationalist, knowledge-based approaches and contributing to the emergence of the empiricist, knowledge-lean, paradigm of research and development in NLP,” McShane and Nirenburg write in Linguistics for the Age of AI.

In recent decades, machine learning algorithms have been at the center of NLP and NLU. Machine learning models are knowledge-lean systems that try to deal with the context problem through statistical relations. During training, machine learning models process large corpora of text and tune their parameters based on how words appear next to each other. In these models, context is determined by the statistical relations between word sequences, not the meaning behind the words. Naturally, the larger the dataset and the more diverse the examples, the better those numerical parameters can capture the variety of ways words appear next to each other.
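
To make that concrete, here is a minimal sketch of the knowledge-lean idea in Python: a bigram model whose entire notion of "context" is a table of co-occurrence counts. The toy corpus is invented for illustration; real models train on billions of words with far richer architectures:

```python
from collections import Counter, defaultdict

# Minimal sketch of the knowledge-lean idea: "context" reduces to
# co-occurrence statistics over word sequences. Toy corpus only.
corpus = "i made her duck . i saw her duck . she made him dinner".split()

bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def next_word_probs(prev: str) -> dict:
    """Probability of each next word, estimated purely from counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# The model "knows" that "duck" follows "her", but has no representation
# of whether the duck is a bird or a dodging motion.
print(next_word_probs("her"))  # {'duck': 1.0}
```

Scaling the table up to billions of parameters changes the coverage, not the nature, of what is learned.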

Knowledge-lean systems have gained popularity mainly because vast compute resources and large datasets have become available for training machine learning systems. With public databases such as Wikipedia, scientists have been able to gather huge datasets and train their machine learning models for various tasks such as translation, text generation, and question answering.

Machine learning does not compute meaning

Today, we have deep learning models that can generate article-length sequences of text, answer science exam questions, write software source code, and answer basic customer service queries. Most of these fields have seen progress thanks to improved deep learning architectures (LSTMs, transformers) and, more importantly, because of neural networks that are growing larger every year.
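
As an illustration of how accessible this has become, a few lines with the Hugging Face transformers library are enough to have a pretrained model produce fluent continuations. The "gpt2" checkpoint is used here only as a small, publicly available example:

```python
from transformers import pipeline

# Load a small pretrained causal language model. "gpt2" is just an
# example checkpoint; larger models produce more fluent text, but the
# mechanism, predicting likely next tokens, is the same.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "The repair robot looked at the machine and",
    max_new_tokens=30,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```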

But while larger deep neural networks can provide incremental improvements on specific tasks, they do not address the broader problem of general natural language understanding. This is why various experiments have shown that even the most sophisticated language models fail to answer simple questions about how the world works.

In their book, McShane and Nirenburg describe the problems that current AI systems solve as “low-hanging fruit” tasks. Some scientists believe that continuing down the path of scaling neural networks will eventually solve the problems machine learning faces. But McShane and Nirenburg believe more fundamental problems need to be solved.

“Such systems are not humanlike: they do not know what they are doing and why, their approach to problem solving does not resemble a person’s, and they do not rely on models of the world, language, or agency,” they write. “Instead, they largely rely on applying generic machine learning algorithms to ever larger datasets, supported by the spectacular speed and storage capacity of modern computers.”

Getting closer to meaning

In comments to TechTalks, McShane, a cognitive scientist and computational linguist, said that machine learning must overcome several barriers, first among them being the absence of meaning.

“The statistical/machine learning (S-ML) approach does not attempt to compute meaning,” McShane said. “Instead, practitioners proceed as if words were a sufficient proxy for their meanings, which they are not. In fact, the words of a sentence are only the tip of the iceberg when it comes to the full, contextual meaning of sentences. Confusing words for meanings is as fraught an approach to AI as is sailing a ship toward an iceberg.”

For the most part, machine learning systems sidestep the problem of dealing with the meaning of words by narrowing down the task or enlarging the training dataset. But even if a large neural network manages to maintain coherence in a fairly long stretch of text, under the hood, it still doesn’t understand the meaning of the words it produces.

“Of course, people can build systems that look like they are behaving intelligently when they really have no idea what’s going on (e.g., GPT-3),” McShane said.

All deep learning–based language models start to break as soon as you ask them a sequence of trivial but related questions because their parameters can’t capture the unbounded complexity of everyday life. And throwing more data at the problem is not a workaround for explicit integration of knowledge in language models.

Language-endowed intelligent agents (LEIAs)

In their book, McShane and Nirenburg present an approach that addresses the “knowledge bottleneck” of natural language understanding without the need to resort to pure machine learning–based methods that require huge amounts of data.

At the heart of Linguistics for the Age of AI is the concept of language-endowed intelligent agents (LEIAs), marked by three key characteristics:

  1. LEIAs can understand the context-sensitive meaning of language and navigate their way through the ambiguities of words and sentences.
  2. LEIAs can explain their thoughts, actions, and decisions to their human collaborators.
  3. Like humans, LEIAs can engage in lifelong learning as they interact with humans, other agents, and the world. Lifelong learning reduces the need for continued human effort to expand the knowledge base of intelligent agents.

LEIAs process natural language through six stages, going from determining the role of words in sentences to semantic analysis and finally situational reasoning. These stages make it possible for the LEIA to resolve conflicts between different meanings of words and phrases and to integrate the sentence into the broader context of the environment the agent is working in.
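
The book specifies six stages; the sketch below compresses them into three placeholder functions just to show the shape of such a staged pipeline. All names and data structures here are hypothetical illustrations, not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Analysis:
    sentence: str
    syntax: dict = field(default_factory=dict)     # role of each word
    semantics: dict = field(default_factory=dict)  # candidate meanings
    confidence: float = 0.0

def syntactic_stage(a: Analysis) -> Analysis:
    # Placeholder: determine the role each word plays in the sentence.
    a.syntax = {word: "ROLE?" for word in a.sentence.split()}
    return a

def semantic_stage(a: Analysis) -> Analysis:
    # Placeholder: map each word to candidate senses from a lexicon.
    a.semantics = {word: ["sense_1", "sense_2"] for word in a.syntax}
    a.confidence = 0.5  # several readings still compete
    return a

def situational_stage(a: Analysis) -> Analysis:
    # Placeholder: use the agent's context to settle on one reading.
    a.semantics = {w: senses[0] for w, senses in a.semantics.items()}
    a.confidence = 0.9
    return a

def analyze(sentence: str) -> Analysis:
    a = Analysis(sentence)
    for stage in (syntactic_stage, semantic_stage, situational_stage):
        a = stage(a)
    return a

print(analyze("Elaine poked the kid with the stick"))
```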

LEIAs assign confidence levels to their interpretations of language utterances and know where their skills and knowledge meet their limits. In such cases, they interact with their human counterparts (or intelligent agents in their environment and other available resources) to resolve ambiguities. These interactions in turn enable them to learn new things and expand their knowledge.
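
One consequence of scoring interpretations is that the agent can act on confident readings and ask about the rest. The threshold and helper below are hypothetical, continuing the sketch above:

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    sentence: str
    confidence: float  # produced by the analysis stages

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff for acting alone

def interpret_or_ask(reading: Interpretation, ask=input) -> Interpretation:
    """Accept a confident reading, or ask a human collaborator."""
    if reading.confidence >= CONFIDENCE_THRESHOLD:
        return reading
    # Below threshold: the agent knows the limits of its knowledge,
    # asks for clarification, and can keep the answer as new knowledge.
    answer = ask(f"What did you mean by {reading.sentence!r}? ")
    return Interpretation(f"{reading.sentence} [{answer}]", confidence=1.0)

interpret_or_ask(Interpretation("I made her duck", confidence=0.4))
```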

LEIAs convert sentences into text-meaning representations (TMR), an interpretable and actionable definition of each word in a sentence. Based on their context and goals, LEIAs determine which language inputs need to be followed up. For example, if a repair robot shares a machine repair workshop floor with several human technicians and the humans engage in a discussion about the results of yesterday’s sports matches, the AI should be able to tell the difference between sentences that are relevant to its job (machine repair) and those it can ignore (sports).
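
A TMR can be pictured as a frame that names an ontological event and its case roles. The structure below is a simplified, hypothetical rendering (the book's actual TMR format is richer), together with an equally hypothetical relevance check of the kind the repair-robot example implies:

```python
# Simplified, frame-style TMR for "Elaine poked the kid with the stick"
# (instrument reading). Concept names are hypothetical stand-ins for
# entries in an ontology.
tmr = {
    "event": "POKE",
    "agent": {"concept": "HUMAN", "name": "Elaine"},
    "theme": {"concept": "CHILD"},
    "instrument": {"concept": "STICK"},
    "time": "PAST",
}

# Hypothetical goal set for a repair robot on the workshop floor.
ROBOT_GOALS = {"REPAIR", "DIAGNOSE", "MACHINE"}

def is_relevant(tmr: dict) -> bool:
    """Follow up only on meanings that touch the agent's goals."""
    concepts = {slot["concept"] for slot in tmr.values()
                if isinstance(slot, dict)}
    return bool(concepts & ROBOT_GOALS) or tmr["event"] in ROBOT_GOALS

print(is_relevant(tmr))  # False: this utterance is not about machine repair
```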

LEIAs lean toward knowledge-based systems, but they also integrate machine learning models in the process, especially in the initial sentence-parsing phases of language processing.

“We would be happy to integrate more S-ML engines if they can offer high-quality heuristic evidence of various kinds (however, the agent’s confidence estimates and explainability are both affected when we incorporate black-box S-ML results),” McShane said. “We also look forward to incorporating S-ML methods to carry out some big-data-oriented tasks, such as selecting examples to seed learning by reading.”

Does natural language understanding need a human brain replica?

One of the key features of LEIA is the integration of knowledge bases, reasoning modules, and sensory input. Currently there is very little overlap between fields such as computer vision and natural language processing.

As McShane and Nirenburg note in their book, “Language understanding cannot be separated from overall agent cognition since heuristics that support language understanding draw from (among other things) the results of processing other modes of perception (such as vision), reasoning about the speaker’s plans and goals, and reasoning about how much effort to expend on understanding difficult inputs.”

In the real world, humans tap into their rich sensory experience to fill the gaps in language utterances (for example, when someone tells you, “Look over there!” they assume you can see where their finger is pointing). Humans further develop models of each other’s thinking and use those models to make assumptions and omit details in language. We expect any intelligent agent that interacts with us in our own language to have similar capabilities.

“We fully understand why silo approaches are the norm these days: each of the interpretation problems is difficult in itself, and substantial aspects of each problem need to be worked on separately,” McShane said. “However, substantial aspects of each problem cannot be solved without integration, so it’s important to resist (a) assuming that modularization necessarily leads to simplification, and (b) putting off integration indefinitely.”

Meanwhile, achieving human-like behavior does not require LEIAs to become a replication of the human brain. “We agree with Raymond Tallis (and others) that what he calls neuromania – the desire to explain what the brain, as a biological entity, can tell us about cognition and consciousness – has led to dubious claims and explanations that do not really explain,” McShane said. “At least at this stage of its development, neuroscience cannot provide any contentful (syntactic or structural) support for cognitive modeling of the type, and with the goals, that we undertake.”

In Linguistics for the Age of AI, McShane and Nirenburg argue that replicating the brain would not serve the explainability goal of AI. “[Agents] operating in human-agent teams need to understand inputs to the degree required to determine which goals, plans, and actions they should pursue as a result of NLU,” they write.

A long-term goal

Many of the topics discussed in Linguistics for the Age of AI are still at a conceptual level. The authors provide blueprints for how each stage of NLU should work, though working systems do not yet exist.

But McShane is optimistic about making progress toward the development of LEIA. “Conceptually and methodologically, the program of work is well advanced. The main barrier is the lack of resources being allotted to knowledge-based work in the current climate,” she said.

McShane believes the perception of a knowledge bottleneck, which has become the focal point of criticism against knowledge-based systems, is misguided in several ways:

  1. There is actually no bottleneck; there is simply work that needs to be done.
  2. The work can be carried out largely automatically, by having the agent learn about both language and the world through its own operation, bootstrapped by a high-quality core lexicon and ontology acquired by people.
  3. Although McShane and Nirenburg believe that many kinds of knowledge can be learned automatically (particularly as the knowledge bases that foster bootstrapping grow larger), the most effective knowledge acquisition workflow will include humans in the loop, both for quality control and to handle difficult cases.

“We are poised to undertake a large-scale program of work in general and application-oriented acquisition that would make a variety of applications involving language communication much more human-like,” she said.

McShane and Nirenburg also acknowledge that much work remains and that developing LEIAs is an “ongoing, long-term, broad-scope program of work.”

“The depth and breadth of work to be done is commensurate with the loftiness of the goal: enabling machines to use language with humanlike proficiency,” they write in Linguistics for the Age of AI.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2021
