Why is language still the toughest problem for AI to crack?
Do we determine what our words mean, or do words determine what we mean?
Though this rather childish-looking question may seem like something only characters as truculent as Tweedledum and Tweedledee would care to dispute ad infinitum, a little further analysis shows that it poses a genuine dilemma.
Noam Chomsky was one of the first linguists to suggest that proficient use of any language requires a certain instinct, and perhaps even a certain genetic endowment. Yet languages remain unique in their own ways: German and Russian, for example, largely lack a distinct continuous tense, while French pairs two negative words ('ne … pas') to express a single negation.
As Robert Sapolsky describes in his lecture on 'Language' at Stanford, human language cannot be considered 'language' without seven key features, discussed below, that distinguish it from the communication of other creatures.
Each of these facets of human language is exclusive to our species, and one or another of them is typically the reason that the various apes taught ASL (American Sign Language) over the years have been unable to recreate language that is truly 'human'.
For communication to be complete, practical and real in the human world, each of these features is essential. Likewise, AI systems for natural language processing, understanding and generation of contextually and semantically sensible dialogue could only become truly human-like if these features were embedded into the famed Siris and Alexas. The difficulty lies in building these features in without compromising datasets or precision, and thereby reaching the holy grail of Artificial General Intelligence (AGI).
1) Semanticity, the ability to generate and convey meaning by 'bucketing' sounds into words, is the most fundamental feature of every human language. Because meaning comes to us instinctively, it is all the harder to transfer completely into a system, and harder still for a system to assign meaning to novel words and communicate with them successfully. The awareness of semanticity itself, the very definition that gives a word its meaning, is an almost inexplicable yet intuitive concept.
2) Embedded clauses are arguably the easiest feature to represent, in programming languages as well as human languages, using logic constructs of varying complexity: from Aristotle's term logic to the propositional and predicate logic that form the foundations of AI.
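As a minimal sketch of this idea, an embedded clause such as "if it rains, then if the match is outdoors, it is cancelled" can be written as a conditional whose consequent is itself a conditional. The helper function and the example propositions below are invented for illustration; they are not from any particular AI library.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q

# "If it rains, then (if the match is outdoors, it is cancelled)":
# the consequent is itself an embedded conditional clause.
rains, outdoors, cancelled = True, True, True
print(implies(rains, implies(outdoors, cancelled)))  # True

# Flip the innermost proposition and the whole embedded claim fails.
print(implies(rains, implies(outdoors, False)))  # False
```

Nesting one `implies` inside another mirrors the way natural language nests one clause inside another, which is why such constructs translate so directly into code.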
3) Recursion, or generativity, is an incredibly interesting property of language: a finite number of words can produce an infinite number of sentences, and a sentence can be arbitrarily long, bounded only by practical time constraints.
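Generativity can be sketched with a toy recursive grammar: two rules and a handful of words are enough to license sentences of unbounded length. The rules and vocabulary below are invented for illustration.

```python
def sentence(depth: int) -> str:
    """A two-rule grammar: S -> 'the dog barked' | 'she said that ' + S."""
    if depth == 0:
        return "the dog barked"
    # The recursive rule embeds a full sentence inside a new one.
    return "she said that " + sentence(depth - 1)

for d in range(3):
    print(sentence(d))
# the dog barked
# she said that the dog barked
# she said that she said that the dog barked
```

Each extra level of recursion embeds the sentence one clause deeper, so the same finite rule set yields infinitely many distinct sentences, which is exactly the property the paragraph above describes.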
4) Displacement is the one feature that even the most famous chimpanzees and gorillas taught ASL could not achieve. Displacement is the ability to talk about other times and other people regardless of present circumstances, rather than only conveying current emotion. Being able to talk about things emotionally distant from us, dissociating our communication from our immediate situation, is a capability we exercise with ease. Exchanging facts that do not directly pertain to ourselves, or asking a chatbot for 23rd January's weather, are examples of displacement ingrained in human language.
5) Arbitrariness refers to the lack of connection between the meaning of words and their shapes or sounds. Adjectives such as 'heavy' or 'sad' were not coined to resemble the shapes of their letters or the sounds they make. This arbitrariness makes it difficult to compare languages by sound or script, since neither conveys meaning on its own; the mapping from form to meaning is, therefore, immune to guesswork or brute force.
6) Meta-communication, the ability to communicate about communication and to discuss language at all, is another feature of human language, and it forms the basis for natural language processing and generation. Meta-communication also refers to secondary communication that changes or adds to the meaning of a message.
7) Prosody, a derivative of meta-communication, covers intonation, stress, rhythm and the other elements of delivery and body language that accompany a unit of communication. The same sentence can convey different meanings depending on the accompanying tone and gestures, as when sarcasm is created through a varied tone of voice.
Motherese, or baby talk, is the distinctive intonation, including stress on vowels, that parents use when communicating with their infants to teach them to speak. Specific to humans, it is an important part of child language acquisition, a field of growing importance to machine learning through natural language acquisition.
Each of the seven features that distinguish human language from other forms of communication seems instinctive, and goes unnoticed in daily conversation. Training a language model on these basic but generic features of human languages is a different ballgame altogether.
To quote Franz Kafka, who, interestingly enough, was a brilliant renegade of a writer: 'All language is but a poor translation.'
No wonder, then, that the best of the Google Translate APIs and the NLP and translation modules from Microsoft Azure or AWS often generate translated strings that read more like lame AI-generated jokes. They do provide one good service to human intelligence, though: they augment our sense of humor. "Maria... makes me... laugh!", as the famed song from The Sound of Music goes.