Why is language still the toughest problem for AI to crack?
Do we determine what our words mean, or do words determine what we mean?
Though this rather childish-looking question may seem like something only characters as truculent as Tweedledum and Tweedledee would care to dispute ad infinitum, a little further analysis shows that it poses a genuine dilemma.
Noam Chomsky was one of the first linguists to suggest that proficient use of any language requires a certain instinct, and perhaps even a certain genetic endowment. Yet languages remain unique in their own ways: German and Russian, for example, largely lack a distinct continuous tense, while French pairs two negative words ('ne … pas') to express a single negation.
As Robert Sapolsky describes in his lecture on 'Language' at Stanford, human language cannot be considered 'language' without seven key features, discussed below, that distinguish it from the communication of other creatures.
Each of these facets of human language is exclusive to our species, and one or another of them is typically the reason that the various apes taught ASL (American Sign Language) over the years have been unable to recreate language that is truly 'human'.
For communication to be complete, practical and real in the human world, each of these features is essential. Likewise, AI systems for natural language processing, understanding and generation of contextually and semantically sensible dialogue could only become truly human-like if these features were embedded into the famed Siris and Alexas. The difficulty lies in building these features in without compromising datasets or precision, and thereby reaching the holy grail of Artificial General Intelligence (AGI).
1) Semanticity, the ability to generate and convey meaning by 'bucketing' sounds into words, is the most fundamental feature of every human language. Because meaning comes to us instinctively, it is all the harder to transfer completely into a system, and harder still for a system to assign meaning to novel words and communicate with them successfully. The awareness of semanticity itself, the very definition that gives a word its meaning, is an almost inexplicable yet intuitive concept.
2) Embedded clauses are arguably the easiest feature to represent, in programming languages as well as human languages, using logic constructs of varying complexity: from Aristotle's term logic to the propositional and predicate logic that form the foundations of AI.
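As a minimal sketch of this idea, an embedded clause such as "if it rains, then if the match is outdoors, it is cancelled" can be written as a conditional whose consequent is itself a conditional. The helper function and the example propositions below are invented for illustration; they are not from any particular AI library.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q

# "If it rains, then (if the match is outdoors, it is cancelled)":
# the consequent is itself an embedded conditional clause.
rains, outdoors, cancelled = True, True, True
print(implies(rains, implies(outdoors, cancelled)))  # True

# Flip the innermost proposition and the whole embedded claim fails.
print(implies(rains, implies(outdoors, False)))  # False
```

Nesting one `implies` inside another mirrors the way natural language nests one clause inside another, which is why such constructs translate so directly into code.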
3) Recursion, or generativity, is an incredibly interesting property of language: a finite number of words can produce an infinite number of sentences, and a sentence can be arbitrarily long, bounded only by practical time constraints.
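Generativity can be sketched with a toy recursive grammar: two rules and a handful of words are enough to license sentences of unbounded length. The rules and vocabulary below are invented for illustration.

```python
def sentence(depth: int) -> str:
    """A two-rule grammar: S -> 'the dog barked' | 'she said that ' + S."""
    if depth == 0:
        return "the dog barked"
    # The recursive rule embeds a full sentence inside a new one.
    return "she said that " + sentence(depth - 1)

for d in range(3):
    print(sentence(d))
# the dog barked
# she said that the dog barked
# she said that she said that the dog barked
```

Each extra level of recursion embeds the sentence one clause deeper, so the same finite rule set yields infinitely many distinct sentences, which is exactly the property the paragraph above describes.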
4) Displacement is the one feature that even the most famous chimpanzees and gorillas taught ASL could not achieve. Displacement is the ability to talk about other times and other people regardless of present circumstances, rather than only conveying current emotion. Being able to talk about things emotionally distant from us, dissociating our communication from our immediate situation, is a capability we exercise with ease. Exchanging facts that do not directly pertain to ourselves, or asking a chatbot for 23rd January's weather, are examples of displacement ingrained in human language.
5) Arbitrariness refers to the lack of connection between the meaning of words and their shapes or sounds. Adjectives such as 'heavy' or 'sad' were not coined to resemble the shapes of their letters or the sounds they make. This arbitrariness makes it difficult to compare languages by sound or script, since neither conveys meaning on its own; the mapping from form to meaning is, therefore, immune to guesswork or brute force.
6) Meta-communication, the ability to communicate about communication and to discuss language at all, is another feature of human language, and it forms the basis for natural language processing and generation. Meta-communication also refers to secondary communication that changes or adds to the meaning of a message.
7) Prosody, a derivative of meta-communication, covers intonation, stress, rhythm and the other elements of delivery and body language that accompany a unit of communication. The same sentence can convey different meanings depending on the accompanying tone and gestures, as when sarcasm is created through a varied tone of voice.
Motherese, or baby talk, is the distinctive intonation, including stress on vowels, that parents use when communicating with their infants to teach them to speak. Specific to humans, it is an important part of child language acquisition, a field of growing importance to machine learning through natural language acquisition.
Each of the seven features that distinguish human language from other forms of communication seems instinctive, and goes unnoticed in daily conversation. Training a language model on these basic but generic features of human languages is a different ballgame altogether.
To quote Franz Kafka, who, interestingly enough, was a brilliant renegade of a writer: 'All language is but a poor translation.'
No wonder, then, that the best of the Google Translate APIs and the NLP and translation modules from Microsoft Azure or AWS often generate translated strings that read more like lame AI-generated jokes. They do provide one good service to human intelligence, though: they augment our sense of humor. "Maria... makes me... laugh!", as the famed song from The Sound of Music goes.