This essay on the creation of meaning in natural language was written for the course ‘Language and Text’ of Media Technology (Leiden).
One of the most everyday yet awe-inspiring skills of the human brain is its capacity for creation. Through a continuous cycle of destruction and organization, chopped up bits of sensory impressions and distributed memories are pulled together to form a picture of the world, ourselves, the other… Divided into workable chunks by our sense of meaning and boundaries, of what belongs to what. The
view from my window is a merciless assault on my visual cortex; billions of different light-waves, reflecting, interfering, traveling through air, glass and eye-goo, bumping into millions of rods and cones, transmitted by voltage gradients to the back of the head, where my brain constructs the world from it. Colors and shapes are identified, categories defined, memories bubble up to provide some clarification in the mess. Building up a meaningful picture means dividing again – that bit of brown is a house, that other bit belongs to a bird. Order is constructed in the chaos: a whole made from a mess of
parts, cut up again into pieces that make sense to me, pieces I can work with.
Truth and meaning are relative concepts here. A friendly body of water is a looming mass of potential danger to my three-year-old nephew. I see a street that’s difficult to ride my bike on, but a roadworker possibly sees gray granite cobblestones, type G625, with a badly laid border cobble and old grouting. We swim through a mess of impressions and write our own story from them. Misfires are incorporated into the final picture – blind spots and saccadic eye movements are no match for our creator-brain. We find meaning in clouds, whispers in rustling branches, the future in the guts of a sheep, a monster in a heap of clothes on a chair. We’re not good at making sense of the world; we’re good at forcing the world to make sense for us.
The same thing happens with language. Whether reading or listening, we take in words,
sentences, context. Your fingers tell you how many pages are left in the book, your experience with prior books tells you what to expect at this point (is it time for the ‘happily ever after’ sequence yet?). The whole realm of subtext in face-to-face communication is extensively covered elsewhere. On the web, you look at the formatting, the design of the page, the writing style – is this serious or sarcastic? Can I trust this source or not? Here also, meaning is personal – your memories, cultural background,
conventions you are used to, all these things determine what a sentence means to you. Writing on the web brings these differences to light like little else before, because of its potential to be read by so many different people.
So how does a computer analyze meaning on the web? I would argue that an ideal natural
language processor would not simply ‘understand’ meaning – it would create it. As an example, let’s look at Noam Chomsky’s famous unlikely sentence: “Colorless green ideas sleep furiously”. The ideal natural language processor does not thoughtlessly mark this down as grammatically correct – but neither does it discard it as nonsensical gibberish. Instead, it might be inspired. It sees these ideas, colorless and green, tangled in their sheets because of a vicious dream. Or it remembers the different possible meanings of ‘colorless’ (boring, nondescript) and ‘green’ (young, inexperienced), resolving the apparent inconsistencies. Maybe it even writes a short story, creating context that provides meaning to the sentence. The ideal natural language processor understands poetic license.
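That many-senses reading can be sketched in miniature. The toy code below enumerates every combination of word senses as a candidate reading of the sentence; the sense lists themselves are illustrative assumptions, not drawn from any real lexicon.

```python
from itertools import product

# Toy sense lexicon -- the senses here are illustrative assumptions,
# not entries from a real dictionary.
SENSES = {
    "colorless": ["without color", "dull, nondescript"],
    "green": ["green-colored", "young, inexperienced"],
    "ideas": ["thoughts"],
    "sleep": ["lie dormant"],
    "furiously": ["violently", "intensely"],
}

def readings(sentence):
    """Enumerate every combination of word senses as a candidate reading."""
    words = sentence.lower().split()
    options = [SENSES.get(word, [word]) for word in words]
    return [" / ".join(combo) for combo in product(*options)]

for reading in readings("colorless green ideas sleep furiously"):
    print(reading)
```

With two senses each for ‘colorless’, ‘green’, and ‘furiously’, the five-word sentence already yields eight candidate readings – among them the “inexperienced, nondescript thoughts lying restlessly dormant” interpretation the paragraph above imagines.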
Abstract, unlikely sentences do not mean nothing. On the contrary, they can mean whatever you want them to mean – the vaguer, the better. Not in spite of their detachment from reality, but because of it. However, even a sentence like “The man walks through the park”, innocent and straightforward as it seems, can be loaded with possible meaning. What kind of man, what kind of park? Are we talking modern-day New York or 14th century Versailles? Did the man just leave a murder scene, or a love nest? Is it strange that he walks there, or does he do that every day?
There are two impossible tasks when it comes to language and our capacity for imagination: to create a grammatically correct sentence that is so abstract and unconnected that it is totally devoid of meaning – and to create a sentence that is so straightforward and obvious that it can contain only one. As soon as you penetrate the surface of this well of meaning, you fall in completely. This is precisely the reason that the semantic web is not enough to truly grasp natural language – apart from other constraints, it also restricts meaning instead of creating it. The semantic web demands that writers limit the meaningfulness of their own writing by filing it neatly away in categories. Thus, when mentioning a name, you have to specify whether you refer to a person, a dog, or something else. You do not have the option of playing with uncertainty. Indeed, a common critique of the semantic web is that it necessitates agreement on labeling conventions in entire fields of knowledge. This is difficult to achieve, of course, but I would like to add that it also potentially cheapens the interactions in that field – limiting the creativity that derives from doubt.
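The single-category demand described above can be sketched as a toy ‘triple store’. Everything here is hypothetical – the names, the vocabulary, and the strict one-type check, which dramatizes the essay’s point rather than reproducing any real semantic web reasoner.

```python
# A semantic-web-style annotation commits each name to a declared type.
# Toy triple store of (subject, predicate, object) -- all names hypothetical.
triples = [
    ("Rex", "rdf:type", "Person"),  # the writer must choose: Person? Dog?
    ("Rex", "knows", "Alice"),
]

def type_of(name, store):
    """Return the one declared type, refusing the ambiguity a human reader
    could happily hold in mind (a deliberately strict toy check)."""
    types = [o for s, p, o in store if s == name and p == "rdf:type"]
    if len(types) != 1:
        raise ValueError(f"{name!r} must have exactly one declared type")
    return types[0]

print(type_of("Rex", triples))
```

The machine gets exactly one answer for what ‘Rex’ is; a reader of unannotated prose could entertain several at once.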
So an ideal natural language processor is imaginative and comes up with as many meanings as possible for a single word or sentence. But the next hurdle is looming on the horizon – having established that each grammatically correct sentence has a multitude of meanings, our poor natural language processor now has to decide which ones are the most likely, the ones the writer or speaker meant to express – for truth is likewise seldom singular. Luckily, humans are not perfect at this task either – especially when communicating across borders of culture and custom, differing interpretations may hinder our mutual understanding. But even between close friends or family, personality and experiences can color a story or sentence.
Does this make it easier for our natural language processor, or harder? I would say harder, as apart from considering what the writer wanted to say, it has to take into account what the potential reader is searching for. So the ideal processor knows not only what a writer meant with a text or what their text means to you – it can also tell you what it likely means to a 10-year-old in Japan, to a Muslim grandmother living in New York City, to a middle-aged cook in Santiago de Cuba. A processor does not only understand the rules, but also their implications. It not only knows that the word “bridge” is masculine in Spanish (el puente) but feminine in German (die Brücke), it also understands that this means that native German speakers are more likely to describe actual bridges as slender and elegant, whereas Spanish speakers may call them strong and sturdy.
That is quite a lot to ask. But when you are talking about language – a living, evolving creature – asking the most might just be good enough. Imperfect solutions for searching and ordering information on the web can of course already be helpful. However, the complexity of language, the way it is entwined with so many basic human functions, makes it a harder nut to crack than it may at first seem. Humans are social animals, and communication is layered so deep in our cortex that to touch on language is to touch on almost everything – memory, cognition, emotion, perception… Nothing that does not integrate as many of these functions as possible is going to process natural language at a level acceptable to its human users. It is no accident that the most respected literary works are full of layers and multiple interpretations, playing with the numerous meanings they create in the reader’s mind. Our ideal natural language processor does not need the brain of a reader – it needs the brain of a writer.