“This book is about language and its evolution, yet it also is an adventure in learning about the human brain and how it works. The mystery of how speech organs make sounds is exposed, as is the incredible cognitive processing that is needed to produce or decode language as we know it. . . . A fascinating discourse.” Chris Boehm, University of Southern California
Eve Spoke presents a compelling case for the pivotal role that speech has played in human language and human evolution. Wrestling with the age-old question of why such a large gulf exists between humans and other animals, Philip Lieberman mines both the fossil record and modern neuroscientific techniques to chart the development of the anatomy and brain mechanisms necessary for human language as we know it. Eschewing any notion of a language gene or instinct, he pursues instead an evolutionary path in which environment acts on a biological capacity, revealing the interconnectedness of the systems that make us most human: precise motor skills, speech, language, and complex thought. Lieberman interweaves decades of research in anthropology, neuroscience, psychology, and linguistics into his exposition on the evolution of human speech.
|Publisher:||W. W. Norton & Company, Inc.|
About the Author
Philip Lieberman is University Professor of Cognitive and Linguistic Sciences at Brown University. He has written many books on the brain, evolution, and speech, and his photos are exhibited around the world.
Read an Excerpt
The Mice Talked at Night
The sky was nearer black than blue. At 24,000 feet on Mount Everest, Dr. Mike keyed his radio and began to speak. Far below at the Khumbu Glacier Base Camp, I punched the record button of the digital tape recorder connected to my Motorola Maxtrax two-way radio. We were testing my evolutionary theory that the brain mechanisms that control our tongues, larynx, and lips when we talk are the evolutionary bases for complex human thought. The experiment was successful. As we reported in Nature, the international journal of science, the climbing teams' speech motor control and their ability to comprehend simple sentences had both deteriorated. By the time they had reached 24,000 feet, they needed 50 percent more time to understand sentences that six-year-old children readily comprehend. The lack of oxygen at extreme altitudes affected brain mechanisms that regulate both speech motor control and syntax. The climbers' decision-making abilities deteriorated as well. Putting these effects together with the results of many other independent experimental studies, we were able to show that the parts of the human brain that control speech also play a part in thinking.
Over the past thirty years my colleagues and I have studied monkeys, chimpanzees, infants, children, normal adults, dyslexic adults, elderly people, and patients suffering from Parkinson's disease and other types of brain damage. We have also examined the skulls of our fossil ancestors, comparing them with those of newborn infants and apes. The focus of these studies has been the puzzle surrounding human evolution. Why are we so different from other animals, although we are at the same time so similar?
Anatomists have for centuries known that we share many features with our nearest animal "cousin"--the chimpanzee. Modern biology has only deepened the mystery. In terms of the genetic information encoded in the DNA sequences that determine the structure of their bodies, human beings and chimpanzees are more similar to each other than rabbits and hares are to each other. Every new fossil discovery confirms the fact that our not-so-distant ancestors, the various australopithecine species, resembled chimpanzees four or five million years ago, an instant in the flow of time. Why, then, are we humans so different from all other living creatures, and how could this have come to pass? Part of the answer seems to be that we are able to think because we can talk. Brain structures originally designed to control our tongues and lips, as well as our hands, may have become modified and elaborated for language and thinking.
In some deep, unconscious way we "know" that dogs, cats, chimpanzees, and other intelligent animals would be human if they could only talk. Intuitively we know that talking = thinking = being human. The studies discussed below show that this intuition is correct. We know, so far as science "knows" anything, that speech is a central aspect of human language. Speech consists of more than a set of arbitrary sounds that people can use to communicate. The particular acoustic properties of human speech allow us to transmit information rapidly to each other. The complex ballet constantly performed by the muscles of our speech anatomy--our lips, tongue, vocal cords, and so on--is choreographed by specialized brain mechanisms that also appear to make complex human thought possible. The fossil record of human evolution and genetic evidence show that these brain mechanisms and anatomy reached their present state fairly recently. We, Homo sapiens loquax, evolved in the last 150,000 years or so, most likely in Africa, from which we spread out and populated the world, displacing earlier humanlike animals.
Indeed, these prehistoric events may be dimly reflected in the mythology that forms our human heritage. The Popol Vuh, the Mayan story of creation, for example, links being able to talk with being human:
The Gospel according to John is even more direct:
And as Beatrix Potter knew, in the small hours of the night the mice met to talk to each other. Speech is so essential to our concept of intelligence that its possession is virtually equated with being human. Animals who talk are human, because what sets us apart from other animals is the "gift" of speech.
Eve and the Neanderthals
Although the speech ability of the Neanderthals who lived in Europe and Asia forty thousand years ago and the brain mechanisms controlling speech and thinking seem to be unlikely topics for acrimonious dispute, barrels of printer's ink have been spilled in this controversy and countless harsh words uttered. The evolutionary and biological natures of human speech and language directly impinge on two issues. One contentious issue is how human beings evolved. The speech deficiencies of the "classic" European Neanderthals are consistent with the Eve hypothesis--that modern human beings evolved in Africa some 150,000 years ago and then migrated to Europe, Asia, and Australia, displacing the archaic humanlike hominids who had reached these areas in an earlier wave of hominid expansion. All contemporary human beings, therefore, have common African ancestors, according to the Eve theory, and no present human population is directly related to the Neanderthals. The opposing "multiregional" theory of human evolution holds that modern human beings evolved locally in different places and times. According to the multiregional theory, native Asians, Africans, Australians, and Europeans independently evolved in these locales from resident Homo erectus populations that had emigrated there from Africa about one million years ago. Milford Wolpoff of the University of Michigan is perhaps its foremost proponent.
The multiregional theory is beset by many problems. It rests on the premise that the small differences in the teeth, skulls, and other bones that distinguish contemporary Asians resemble features of the Homo erectus populations who lived in Asia, because Asian Homo sapiens supposedly evolved from these extinct hominids. Contemporary Europeans supposedly evolved from European Neanderthals, who, in turn, differed from Asian Homo erectus, who, in turn, differed from a hypothetical Australian Homo erectus population. But if that were the case, we would have to account for the fact that all living human beings are remarkably similar. If we evolved independently, why are central human characteristics--our brains, anatomy, and physiology--so similar throughout the world? Any normal human child can effortlessly acquire any human language before the age of three years. Antibiotics work, subject to similar individual variations, in similar fashion in all parts of the world. Multiregionalist theorists have offered an unlikely explanation: that "gene flow" occurred after the independent evolution of different human groups. However, if the extensive mating needed to give these populations similar basic human attributes happened after their hypothetical independent evolution from Homo erectus, why would small regional distinctions survive unless they were really adaptive? And if they are adaptive, the Eve hypothesis can account for them just as well.
Moreover, the skeletal comparisons cited by Milford Wolpoff and his colleagues are suspect. Neanderthals, for example, are very different from any modern humans; they are extinct. William W. Howells, working at the Peabody Museum of Archaeology and Ethnology of Harvard University, has spent decades studying the skulls of human populations throughout the world. He has demonstrated that Neanderthal skulls have characteristics that never occur in modern human beings; modern human beings, conversely, have features that never occur in Neanderthals. In fact, the specific skeletal "evidence" cited by multiregionalists to support their theory appears to be irrelevant. A collaborative study by Frayer and his colleagues, published in 1993 and intended to clinch the multiregional case, presented a list of skeletal features supposedly linking modern Asians, Europeans, and Australians to the Homo erectus and Neanderthal fossils found in Asia, Europe, and Australia. However, two years later Daniel Lieberman, who is now at Rutgers University, showed that most of these supposedly diagnostic features do not bear on the debate. Some of them, such as the shape of the incisor teeth, were found to a greater or lesser degree in all archaic fossil hominids and modern humans throughout the world. These features, therefore, cannot be used to link contemporary human beings who live in a particular part of the world to Homo erectus populations that previously lived there. Other features, such as larger jawbones, were affected by environmental factors. A larger jawbone is not entirely specified by a person's genes; it can develop anywhere when a person chews harder and more often. Daniel Lieberman, who is also "my son the anthropologist," demonstrated that the remaining skeletal features cited by Frayer and his colleagues supported the Eve hypothesis.
The reason that Neanderthal speech is a central issue in the debate concerning the Eve hypothesis is that people tend to have children with mates who speak the same language or dialect. Recent studies show beyond any reasonable doubt that speech serves as a genetic isolating mechanism in modern human beings. Therefore, the speech differences that my colleagues and I believe existed between Neanderthals and early modern human beings could have kept them apart for many generations. The Neanderthal-human speech distinction takes on significance since the demographic model developed by Ezra Zubrow of the State University of New York at Buffalo shows that Neanderthals would gradually have become extinct over many generations if their early modern human competitors had possessed only slight advantages. Zubrow's model is consistent with the archaeological record (discussed in Chapter 4), which shows that most of the stone tools of early modern human beings were similar to those of contemporary Neanderthals for tens of thousands of years. The gradual-extinction model is also consistent with the reappraisal of Neanderthal speech capabilities that will be presented here; Neanderthals clearly possessed language and speech, but their speech capabilities were intermediate between those of still earlier hominids and those of modern humans. Neanderthal speech would immediately have been perceived as being different from that of our ancestors. We therefore do not have to script a blitzkrieg in which modern human beings, our ancestors, overwhelmed the Neanderthals because they were ten times smarter or could talk ten times faster.
In short, we can account for the extinction of the Neanderthals if modern humans possessed only slight advantages, providing that the populations were genetically isolated. If Neanderthal and early human groups acted as we do, and kept apart because of speech differences, then small cognitive and linguistic advantages acting over many generations would have resulted in human beings gradually replacing Neanderthals. Given the role of speech as an isolating mechanism, exponents of the multiregional theory who claim that Neanderthals and modern humans mated must also claim that Neanderthal speech capabilities did not differ from those of modern human beings.
Noam Chomsky's Linguistic Theories
The other scientific debate that we'll consider involves Noam Chomsky's linguistic theories. In Eleanor Lattimore's account of the wild "honeymoon" trip that she and her husband, Owen, made in the 1930s across Central Asia from Soviet Siberia through bandit-infested regions of China, across the Himalayan mountain passes to Ladakh, she tells how local tribesmen regarded Owen as a sage. In one remote hamlet,
All human languages use words and some form of syntax to convey distinctions of meaning. The "rules" of syntax in English, for example, convey the fact that Mary is the person being kissed in the sentence "Bill kissed Mary" and that she isn't the person being kissed in the sentence "Bill kissed Jane while Mary watched them." In both cases Bill is the person who carries out the action. Mary and Jane are respectively the recipients of the action. It's clear that different human languages use various methods to indicate these relationships. The syntax of languages that diverged from a common ancestral language in the last few thousand years, such as English and Latin, differs dramatically. English, for instance, conveys different meanings to a limited degree using "morphemes" added to the end of a word that indicate whether a noun is plural rather than singular or whether a verb is in the past tense or not (books versus book, walked versus walk). But Latin makes greater use of morphemes to convey distinctions in meaning; the subject and object relationships conveyed by word order in English are, for example, conveyed by morphemes added to each word in Latin. A Latin-speaking child had to acquire a strategy different from that of an English-speaking child to understand the meaning of a sentence. The question arises, Why are human languages so different and how do children learn to speak and understand any language?
An obvious answer is that children learn the words and rules of syntax of their native language in much the same way that they learn everything else, by means of general cognitive processes. Although affinities can be seen in the words of many distantly related languages, such as English and Hindi, which both derive from a common ancestral language, Indo-European, their syntactic rules differ profoundly. Many aspects of the cultures of India, England, and the United States also differ, but it is evident that the specific habits and attitudes of these different cultures are learned by children as they grow up in a particular locale. Similar distinctions hold for very closely related languages and cultures, such as German and English. However, Noam Chomsky, arguably the most influential living linguist, has turned the linguistic world upside down. Chomsky claims that human beings do not really learn the rules of syntax. He instead proposes that we come equipped at birth with a "language organ" that specifies all the rules of syntax of all human languages. Chomsky's disciples (some of his leading advocates often refer to the theory as a worldwide religion) believe that a "universal grammar" is genetically coded into every human brain. The principles of the universal grammar are designed to guide every child to the "correct" set of syntax rules of any language that the child happens to hear. The hypothetical genetically coded universal grammar is identical for all human beings. These premises would be amazing if they were true.
Chomsky once categorically stated that human language couldn't have evolved by means of the processes that Charles Darwin proposed in his modestly entitled book On the Origin of Species. Chomsky has recently retreated from that stance, but we will see that his version of the biology and evolution of human linguistic ability is not consistent with the general principles of evolutionary biology and the studies of the brain bases of language and speech that we'll discuss.
What Makes Speech Useful?
Curiously, the property that makes human speech an essential component of language and thinking was discovered by chance. In theory, the relationship between science and engineering is that scientists discover "laws" of nature, which engineers later apply to solve practical problems. That's often the case. Albert Einstein's discovery that E = mc² preceded the nuclear age by almost forty years. In this case, however, research directed toward an engineering project showed linguists that human speech communication is a complex process by which we can effortlessly transmit information at least five times faster than we can with any other sounds. The following experiment, which you can perform without any equipment other than a pencil or any other object that you care to tap on a table, will show you what the development group at Haskins Laboratories, which was then located in New York City, discovered in the late 1950s.
What you'll discover is that it's almost impossible even to differentiate and count more than seven or so taps per second. In fact, when sounds are presented at a rate that exceeds fifteen per second, they merge into a continuous buzz. However, you can easily differentiate and identify more than ten speech sounds per second when you talk slowly. If you talk rapidly, the maximum rate at which speech sounds can be produced and comprehended is about twenty-five to thirty sounds per second.
The Haskins Laboratories team, directed by the psychologist Alvin Liberman, the physicist Franklin S. Cooper, and the linguist Pierre Delattre, was trying to build a machine that would "read" books aloud to blind people. Computer systems that could identify printed characters were available, but artificial speech-producing systems were in their infancy and generated incomprehensible noises. The solution seemed to be to use a system of non-speech sound codes for the letters that the print scanner identified. The letter a could, for example, be signaled by a low-pitched tone, e by a high-pitched tone, and so on. Traditional Morse code could also be used. However, it soon became apparent that the maximum rate at which the text could be transmitted was so slow that "readers" usually forgot the beginning of a sentence before coming to its end. Moreover, people listening to the reading machine had to concentrate intently on identifying the sounds themselves, further reducing comprehension. The system had the same limitations as traditional Morse code.
The Haskins research team soon realized that the limitations of its reading machine derived from a fundamental property of human speech. Linguists had thought that the sounds of speech were similar to "beads on a string." Each sound supposedly was independent of its neighbors. Like the movable type used to print books, each speech sound was thought to be an independent entity that could be combined with any other speech sound, subject to some restrictions (some sound patterns could not occur in particular languages, for instance, ng at the start of a word in English). The fundamental premise was that people listened to each sound in sequence, identifying it from the acoustic "cues" in the segment of time that corresponded to the individual sound, and then went on to identify the next sound, stringing the identified segments into syllables and words. W. Freeman Twaddell, a distinguished linguist at Brown University in the 1930s and 1940s, had provided an explicit model of how people were supposed to identify the individual "phonemes," the meaningful speech sounds, and put them together into words, phrases, and sentences. One simple example will illustrate this hypothetical process. Three phonemes [c] [a] [t] make up the word cat (a phonemic transcription would use the symbols [k], [æ], and [t], but the alphabetic symbols will suffice). Each phoneme is hypothetically identified in sequence from acoustic "cues" that signal each sound. The key claim is that the acoustic cues for each sound are confined to a particular segment of time. Each phoneme is therefore independent of its neighbors. Three independent phonemes [s] [a] [p] hypothetically make up the word sap. If we isolate the phonemes that constitute cat and sap, we should be able to recombine them to produce the words sat, cap, and pat. However, though elegant and simple, this model of speech production and perception is wrong.
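To make the "beads on a string" premise concrete, here is a minimal Python sketch of what the model implies: if each phoneme is a self-contained bead, any stored bead can be reused in any slot, so the segments of cat and sap should yield sat, cap, pat, and even tack. The symbols ("k" for the c of cat, "a" for its vowel), the restriction to consonant-vowel-consonant strings, and the helper name `recombine` are illustrative assumptions, not anything taken from Twaddell or the Haskins papers.

```python
from itertools import product

# Hypothetical "beads on a string" model: a word is a sequence of
# independent phoneme symbols whose acoustic cues never overlap.
# Symbols are simplified ("k" for the c of cat, "a" for its vowel).
CAT = ["k", "a", "t"]
SAP = ["s", "a", "p"]
VOWELS = {"a"}


def recombine(*words: list[str]) -> set[str]:
    """Build every consonant-vowel-consonant string from the beads of the
    given words. Under the beads model each stored segment is reusable
    anywhere, so every such string should come out intelligible."""
    beads = {p for word in words for p in word}
    consonants = sorted(beads - VOWELS)
    vowels = sorted(beads & VOWELS)
    return {c1 + v + c2 for c1, v, c2 in product(consonants, vowels, consonants)}


words = recombine(CAT, SAP)
print(len(words))  # 16 = 4 consonants x 1 vowel x 4 consonants
print({"sat", "kap", "pat", "tak"} <= words)  # sat, cap, pat, tack -> True
```

The sketch encodes exactly the assumption that failed: it treats the "k" cut from cat as the same object wherever it is placed. In real recordings each segment also carries cues for its neighbors, which is why splicing stored segments together did not produce intelligible speech.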
The first hint that this model was wrong came from a project at Columbia University that used the then new technology of tape recording to build up a library of individual phonemes that could be pieced together to produce speech. The would-be inventors reasoned that it should be possible to have trained announcers carefully read a list of words and then cut out the segments of recording tape for each phoneme. Since phonemes were supposed to be the sound equivalents of movable type, it should have been possible to recombine them by means of a mechanical device that could rapidly play the stored tape segments in specified sequences. Using this system, one should have been able to form the word tack by rearranging the three phonemes that formed the word cat. However, when the system was constructed with the state-of-the-art technology of the 1950s, the resulting sounds were incomprehensible. One person who worked on that project described the resulting signal as the "speech of a drunken cockroach." Attempts to improve the system focused on the mechanics of the tape playback system, but the sound never improved. The inherent problem was that when a person said the word cat, the acoustic cues for each sound were, in fact, distributed across the entire word.
The diagram on the next page, adapted from one of the Haskins Laboratories papers, illustrates this time smearing. The acoustic cues are melded together. The term used to describe this process is encoding. The primary acoustic cue that lets you identify the first consonant of bag, [bæg] in phonetic transcription, as a [b] rather than a [d] or [g] is impressed on the segment of time that also lets you know that you're hearing the vowel [æ]. (The brackets before and after a letter indicate that it refers to a sound of the International Phonetic Alphabet, a set of symbols used to transcribe the sounds of speech.) The cues for the final [g] sound are likewise distributed throughout the vowel. The vowel's acoustic cues are distributed throughout the entire monosyllabic word. Furthermore, it is inherently impossible to cut out a "pure" [b], because the acoustic cues for the vowel [æ] were impressed on the initial [b] when it was spoken. The encoding process chunks the speech signal into syllable-size segments as you talk. The human speech perception system operates on these longer syllable-size chunks, which obviously occur at a lower rate than the individual phonemes. This perceptual system is complex; it is not the sort of system that an engineering group would have designed. In principle, it would have been simpler if each sound were independent.
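A rough calculation suggests why syllable-size chunks help. Assuming, for illustration only, that a syllable averages about three phonemes (the pattern of cat or bag), the fast-speech rates quoted earlier, twenty-five to thirty sounds per second, work out to roughly eight to ten syllables per second, below the rate of about fifteen per second at which the text says separate sounds merge into a buzz. The three-phonemes-per-syllable figure and the helper `syllable_rate` are assumptions, not measurements from the source.

```python
# Back-of-the-envelope check of the chunking argument. The rates come
# from the text; PHONEMES_PER_SYLLABLE is an illustrative assumption.
FUSION_RATE = 15.0             # events/sec at which separate sounds merge into a buzz
PHONEME_RATES = [25.0, 30.0]   # fast-speech rates quoted in the text (sounds/sec)
PHONEMES_PER_SYLLABLE = 3.0    # assumed: CVC syllables such as "cat" or "bag"


def syllable_rate(phoneme_rate: float, per_syllable: float = PHONEMES_PER_SYLLABLE) -> float:
    """Convert a phoneme rate into the rate of syllable-size chunks."""
    return phoneme_rate / per_syllable


for rate in PHONEME_RATES:
    chunks = syllable_rate(rate)
    print(f"{rate:.0f} phonemes/s -> {chunks:.1f} syllables/s; "
          f"phonemes above fusion rate: {rate > FUSION_RATE}, "
          f"syllables above fusion rate: {chunks > FUSION_RATE}")
```

The point is only the order of magnitude: if listeners had to resolve each phoneme as a separate acoustic event, fast speech would exceed the fusion rate, whereas syllable-size chunks stay under it.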
Speech Is a Five-Ring Circus
The gymnastics that humans use to produce speech are likewise complex. As we talk, we must continually plan ahead, modifying the immediate movements of our speech-producing organs--our lips, tongue, larynx, lungs, and velum (a structure that can seal the nose from the mouth)--to take account of what we're going to say. Another simple experiment will reveal this process.
The exact timing between the movements of the lips and the tongue tip that's necessary to produce a [t] seems to vary for different languages and dialects. Two American speech scientists, James Lubker and Tom Gay, who were working at the Royal Institute of Technology's Speech Laboratory in Stockholm, showed that native speakers of Swedish, for example, seem to anticipate rounded vowels to a greater degree than native speakers of English. The distinction probably forms part of what we normally think of as a "Swedish accent." This Swedish versus English distinction obviously isn't part of the genetic endowment of the population of Sweden. Research on the acquisition of speech by children that Joan Sereno, who is now at Cornell University, and I conducted showed that English-speaking children learn to perform these articulatory gymnastics between the ages of three and five years. They appear to be unconsciously paying attention to these subtle distinctions, which they gradually learn to mimic by a process of trial and error. The articulatory maneuvers that people use to produce speech are arguably the most complex that ordinary people attain during their lifetime. Research on the acquisition of speech by normal children shows that they don't really attain adult levels of proficiency until about the age of ten years.
Why Is the Speech Perception-Production Process So Complex?
No other living species has the anatomy and the brain mechanisms that humans use to produce speech. Although chimpanzees are the closest living relatives of modern human beings, they cannot produce even simple words. Comparative analyses of human and chimpanzee DNA and the fossil evidence indicate that humans and chimpanzees had a common ancestor a mere five million years ago. Some fundamental differences exist between the sound-producing anatomy of chimpanzees and that of humans. However, despite these anatomical differences, discussed in the chapters that follow, chimpanzees would be able to produce a muffled approximation to human speech if their brains were capable of planning and executing the necessary complex articulatory maneuvers. But even though experimenters and animal trainers have assiduously attempted to teach chimpanzees to talk since the seventeenth century, no chimpanzee has ever been able to speak. It is becoming evident that human speech ability depends on two factors--specialized anatomy and a special-purpose neural "functional language system" that regulates speech production, speech perception, and syntax in the human brain. Both anatomy and brain had to evolve from the primate base of the human-ape common ancestor to make human speech, language, thought, and culture possible. And both appear to have reached the human condition in Eve. But why is the system so complex?
The answer to this question derives from what Ernst Mayr, one of the great minds of twentieth-century evolutionary biology, terms the "proximate logic" of evolution. In simple terms, evolution is miserly and opportunistic. The goal is to achieve a result by spending as little as possible and making do with what you already have. The time resolution of the "standard" mammalian auditory system seems to be about the same for both primitive and evolved mammalian species. Patricia Kuhl, a psychologist at the University of Washington who studies both children and animals, for example, found that chinchillas and humans used the same temporal criteria to tell whether a sound was a [b] or a [p]. The distinction here rests on the timing between the moment the vocal cords of the larynx begin to move open and shut in a regular manner, producing "phonation," and the moment your lips open, producing a "puff" or "burst" of air noise. If the noise burst and phonation occur within 20 msec (a msec is 1/1000 of a second), both the chinchilla and the human will "hear" the sound [b]. A longer time delay yields a [p]. The decision-making criterion seems to be the length of time that must intervene between two different sounds in order for the hearer reliably to know which occurred first. Human beings and chinchillas use the same auditory criterion to categorize these sounds because humans retain the basic "primitive" mammalian auditory system found in chinchillas.
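The categorization rule described here amounts to a single timing threshold, which phoneticians call voice-onset time (VOT), a term the excerpt itself does not use. The sketch below is a minimal illustration under stated assumptions: it uses the 20 msec boundary given in the text, labels the categories [b] and [p] to match the lip-release example, and the function name `classify_stop` is hypothetical. Real category boundaries shift with place of articulation and listener, so the number is illustrative only.

```python
# Hypothetical sketch of the timing rule described above: the only cue
# consulted is the delay between the lip-release noise burst and the
# start of regular vocal-cord vibration (phonation).
VOT_BOUNDARY_MS = 20.0  # boundary taken from the text; illustrative only


def classify_stop(burst_ms: float, phonation_ms: float) -> str:
    """Label a lip-released stop as [b] or [p] from two event times in msec."""
    delay = phonation_ms - burst_ms
    # Phonation within 20 msec of the burst is heard as [b];
    # a longer lag is heard as [p].
    return "b" if delay <= VOT_BOUNDARY_MS else "p"


for burst, phonation in [(0, 10), (0, 20), (0, 45)]:
    label = classify_stop(burst, phonation)
    print(f"burst={burst} ms, phonation={phonation} ms -> [{label}]")
```

Nothing in the rule is specific to speech: it only asks whether two acoustic events are far enough apart in time to be ordered reliably, which is why a listener with a standard mammalian auditory system, such as a chinchilla, can apply it too.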
Evolutionary biologists find it essential to distinguish between "primitive" and "derived" features when they chart the family trees of various species. A primitive feature is one that characterizes an ensemble of species that are the ancestors of a particular species. A derived feature is one that differentiates a species and its close relatives from other, less related species. For example, frogs, chickens, monkeys, and humans normally have five digits, a primitive feature shared by most terrestrial animals. Single hooves are a derived feature differentiating horses and closely related species from other mammals. But we couldn't conclude that human beings are more closely related to frogs than to horses because we and frogs have five digits. Human beings simply retain the primitive five-finger, five-toe configuration of terrestrial animals.
Table of Contents
|CHAPTER 1 The Mice Talked at Night||3|
|CHAPTER 2 Chimpanzees and Time Machines||21|
|CHAPTER 3 He's a Big Baby||49|
|CHAPTER 4 Dead Men and Women Talk Again||68|
|CHAPTER 5 Talking and Thinking Brains||98|
|CHAPTER 6 What, When, and Where Did Eve Speak to Adam, and He to Her?||133|