BRUNO RICHARD HAUPTMANN WAS EXECUTED in Trenton, N.J., in April 1936,
for kidnapping and murdering the young son of the famous aviator
Charles Lindbergh. The most dramatic moment in Hauptmann’s closely
watched trial came when Lindbergh identified Hauptmann’s voice as that
of his son’s kidnapper. “The minute Lindbergh pointed his finger at
Hauptmann, the trial was over,” said Hauptmann’s lawyer after the
conviction. “Jesus Christ himself said he was convinced this was the
man who killed his son. Who was anybody to doubt him or deny him
justice?”

Lindbergh had heard the voice of his son’s kidnapper three years
earlier. Still hoping to get the child back alive, Lindbergh had
accompanied Dr. John Condon to St. Raymond’s Cemetery in the Bronx to
deliver ransom money. Condon handed off $50,000 in marked gold
certificates, while Lindbergh waited nearly 100 yards away in a car.
Out of the darkness came the words, “Hey, doctor! Over here, over
here.”

Twenty-nine months after the encounter in the cemetery, in
September 1934, Lindbergh told a Bronx grand jury that “it would be
very difficult to sit here and say that I could pick a man by that
voice.” Undeterred, the district attorney asked Lindbergh later that
day: “Would you like to see the man who kidnapped your son?” The next
morning, while Lindbergh sat in the back of the D.A.’s office among a
group of detectives, Hauptmann was brought in and asked to repeat the
words, “Hey, doctor. Here, doctor, over here.” Lindbergh told the
prosecutor that he recognized the voice as that of the kidnapper, and
he testified under oath at the trial that Hauptmann was the man he had
heard in the cemetery.

The question of how well lay witnesses like Lindbergh can
recognize voices arises regularly in legal cases. When a woman is
sexually assaulted by a man wearing a ski mask, or when a government
official receives a bomb threat, the case may hinge on how well the
victim can identify the perpetrator’s voice. But at the time of
Hauptmann’s trial, no experts were available to assess the accuracy of
Lindbergh’s account. In fact, social science research on how well
people can identify speakers by their voices was prompted by
Hauptmann’s trial, though too late to help him. In the intervening
decades, researchers have made big leaps in understanding the human
capacity to identify voices, but the legal system has yet to take the
research into account. As a result, intuition based on personal
experience rather than science tends to govern the admissibility and
perceived reliability of voice identifications at trial. And scientists
have shown that our intuitions are often wrong.

ONE YEAR AFTER HAUPTMANN’S EXECUTION, Frances McGehee, a psychology
professor at the University of Illinois, had students listen to a
person read a 56-word passage from behind a screen. The students were
then tested at various times to see whether they could pick the reader
out from a group of five voices. They did so with 83 percent accuracy
the next day. Three weeks later, however, their success rate had
declined dramatically to 51 percent. Five months later they were down
to a dismal 13 percent accuracy rate, below even the 20 percent expected from guessing among five voices.

More recent work confirms McGehee’s findings that accurate
identification persists for a period of time, and then deteriorates
sharply, far more so than most people would expect. Not surprisingly,
the amount of time we are exposed to a voice matters, but the number of
times a listener is exposed to a voice may be more important than the
length of the exposure. Hearing a voice once for 60 seconds is not
nearly as helpful as hearing it three times for 20 seconds each time.

We all know that less familiar voices are harder to recognize. But
the degree of familiarity matters more than we might assume. The
Canadian psychologist Daniel Yarmey and his colleagues have found that
people can identify the voices of those close to them, such as family
members, with 89 percent accuracy in a voice lineup, but that accuracy
drops to 66 percent when the voice is that of an acquaintance, such as
a neighbor or coworker with whom the subject has had only occasional
contact.

Disguise is also a problem. A simple and effective form of
disguising a voice is whispering. Distorting the voice is another
method. A device as low-tech as a pencil can be quite effective in
masking a voice. Brazilian kidnappers have been reported to place a
pencil between their teeth when making ransom demands. This trick
creates complex acoustic changes by affecting the movement of the
speaker’s tongue and jaw, making the voice that much more difficult to
identify.

Some voices, especially those of family members, may be very
similar to each other and easy to confuse. We have all had the
experience of calling someone on the phone and misidentifying the
person who answers, even when we know well both the person to whom we
are speaking and the one to whom we think we are speaking. We identify
preadolescent boys as their mothers and confuse the voices of brothers
with each other. It should not be surprising that skilled imitators can
intentionally cause confusion. In a study conducted in Sweden, people
were asked whether they heard the voice of Carl Bildt, the former
Swedish Prime Minister, among a group of voices played for them. The
actual voice on the tape was that of a good political impersonator.
People just about always got it wrong, unless they also heard Bildt’s
actual voice as one of the alternatives in the lineup. The possibility
of misidentification in a court setting is clear.

On the whole, research has shown that we are not as good at voice
identification as we think we are, but scientists have also discovered
that not all listeners are created equal. People differ dramatically in
their ability to identify voices. Some people are great at it, and
others awful. Not much is known, however, about why some are better at
voice recognition than others. The skill appears to correlate to some
extent with musical ability and perhaps certain aspects of memory, but
no surefire way to predict this ability has yet been developed. Were
reliable aptitude tests available, proving that an identifying witness
had limited ability to recognize voices might be equivalent to showing
that an eyewitness had uncorrected bad vision. For now, judges seem to
presume that everyone is relatively good at voice recognition, better,
in fact, than the research suggests is possible.

THE LEGAL SYSTEM DOES NOT RECOGNIZE MOST OF THE FINDINGS—especially
the counterintuitive ones—that scientists have demonstrated since the
Hauptmann trial. An earwitness who testified, “Of course I recognize
[the defendant’s] voice—I’ve lived next door to him for five years”
would likely devastate the defendant’s case. In fact, there is a strong
probability that the witness—despite her familiarity with the
voice—would get it wrong.

The matter is more than academic. In 1992, Guy Paul Morin was
convicted of raping and murdering a young girl from his neighborhood in
Ontario. The conviction was based in part on an erroneous
identification of Morin’s voice by the child’s mother. On the night of
the crime, a number of people had heard a man’s voice cry out from
outside the victim’s home, “Help me, help me. Oh God, help me,” as if
the perpetrator had done something terrible and was consumed with
remorse.

No one identified the voice to the police at that time, according
to police records. But after Morin was arrested, the victim’s mother
identified his voice. She said that she knew it was his because she had
spoken with him a few times over the backyard fence. The court allowed
the identification because the witness was familiar with the voice in
advance of the case. Morin served 18 months of a 25-year sentence
before he was exonerated by DNA evidence in 1995.

In the United States, the legal system has taken some steps to
reduce the likelihood of false identifications by earwitnesses. Under
the Fifth Amendment’s due process clause, which deals with criminal
matters, the Supreme Court has established procedures that courts must
follow before admitting identification evidence. In fact, the same 1972
case that set ground rules for eyewitness identification also involved
an earwitness identification. According to these rules, it is no longer
permissible for police officers to invite the victim of a crime to the
police station, bring her within earshot of a defendant who is asked to
say a few words, and then ask the victim if the defendant was the
perpetrator of the crime, as happened in the Lindbergh case.

Such suggestive procedures are considered too likely to result in
false identifications. But even in the case of a suggestive
identification, the Supreme Court’s analysis allows courts to admit the
identification, as long as it is determined to be reliable. Reliability
is an empirical question, but one that the courts have continually
shown no ability to answer accurately. Though experiments have shown no
correlation between a witness’s confidence in what he heard and the
accuracy of his identification, courts generally hold that a witness’s
confidence in the identification is a good indicator of its
reliability.

If a judge decides that an identification is reliable enough to be
admitted into evidence, it’s up to a jury to decide whether the
identification is correct. Jurors, unfortunately, tend to have the same
misconceptions about voice identification that judges have, and the
typical jury instructions on how to evaluate the evidence are unlikely
to enlighten them. While a sharp defense lawyer might be able to call
an expert who could educate the jury, most lawyers do not have the
money to do so or do not know that such experts exist.

THE LEGAL SYSTEM’S RELUCTANCE to look seriously at questions of
speaker identification stems partly from the recognition of
“voiceprint” experts in some courts as expert witnesses in the 1960s
and ’70s. Voiceprints, known as sound spectrograms in scientific
circles, are graphic representations of the amplitude and frequency of
sound. The technology was developed in the 1940s to create “visible
speech” that deaf people might be taught to decipher. It was used by
the military during World War II to try to identify speakers of
intercepted radio messages. Neither effort was particularly successful.

In 1962, however, one of its developers, Lawrence Kersta, published an
article in Nature that claimed that people’s voices, like their
fingerprints, are unique and can be identified through visual
inspection of their voiceprints.
During the ’70s, published studies touted voiceprints as a highly
reliable means of identifying voices, and many law enforcement agencies
welcomed the technology. A typical case might involve a telephoned bomb
threat that was recorded on audiotape. After the police arrested a
suspect, they would have him recite the same words into a tape
recorder. The examiner would then run both tapes through spectrographic
analysis, compare the voiceprints, and reach a conclusion.

Prominent experts in phonetics had their doubts about the
reliability of this methodology. Some courts began permitting
voiceprint experts to testify; others rejected this “expertise.” In
1979, an influential report from the National Research Council slowed
the acceptance of voiceprint specialists as experts. The report
determined that voiceprint analysis, while accurate under ideal
laboratory conditions, was not reliable enough for courts to depend on
the technology when a recording was made under “real-world” conditions,
where voice signals are degraded by problems like poor recording
quality, background noise, and telephone transmission.

Occasional battles over voiceprints have continued to surface
during the past 20 years, but most law enforcement agencies have
stopped trying to get them into court. In the 1990s, the Supreme Court
tightened the standards for admitting scientific evidence in federal
court, further reducing the motivation to use the technology. The
voiceprint’s demise as a valuable forensic tool has resulted in a
broader decline in the interest in voice identification techniques
generally. To many judges and lawyers involved in the criminal justice
system, including leading experts on scientific evidence, voice
identification is equated with voiceprints, and voiceprints are seen as
too unreliable.

Other so-called forensic identification sciences, including
microscopic hair analysis, handwriting identification, bite-mark
analysis, ballistics, and even fingerprints, have also been under attack
in recent years. Once the Supreme Court, in its 1993 Daubert decision,
established the “known rate of error” as one of the indicia of
scientific reliability, it did not take long for lawyers and legal
scholars to notice that many of these identification “sciences” have
been used in courts for years with little proof about their error rate.
Some, like hair analysis, have become notorious for contributing to
convictions that were later overturned by DNA evidence, the Central
Park Jogger case being perhaps the best-known example.

Unlike most of the forensic identification sciences, which have
been defending their turf against these challenges, experts in
phonetics have been at the forefront of those questioning the
reliability of traditional voiceprint analysis. But perhaps because the
field of phonetics has applications outside the courtroom—voice
recognition software is used by word processors and corporate security
agencies alike—they have not given up the pursuit of accurate voice
recognition. Rather, phonetics experts are working to develop more
reliable methodologies for identifying voices. Steady advances in
computer technology may have a great impact on forensic voice
identification in the future. Huge databases of voices, sophisticated
mathematical modeling techniques, and the ability of acoustic engineers
to decompose the human voice into a host of different components have
led to enormous improvement in voice recognition technology.

Still, even with all of these improvements, machines that can
identify a voice with the reliability of DNA or fingerprints remain
in the future. The most advanced technology is not yet able to deal
well with disguise, and performance dwindles when the voice recordings
are of poor quality. But the steady improvements in the field suggest
that the technologies may become accurate enough to be relied upon. We
hope that the history of voiceprint analysis will not preclude courts
from taking better voice recognition technologies seriously as they
become available for courtroom use.

Of course, no technology, however perfect, will ever be able to
compensate for the weaknesses of human memory. Though voice recognition
software may one day be able to determine whether the voice on a tape
matches that of a suspect—or whether the voice on a broadcast is really
Saddam Hussein’s—it will never make the testimony of the woman who
heard her daughter’s attacker and thought she recognized her neighbor
more reliable. The courts have been more inclined to trust people than
machines, but we may soon reach a time when the reverse should be true.