Look who’s talking

Speaker recognition in manhunts

Assassins, kidnappers, extortionists – they like to keep in touch by phone. In the minds of Hollywood writers, hunting criminals with a listening device is quite simple: "A few more seconds and we’ll have him," says the computer technician. Data records rush across the monitor, then a flash of light. "That’s him, his voice profile is a match." But can investigators actually match tapped phone conversations against a voice profile database? And does it work automatically? German phonetics experts are extremely skeptical: "The digital voiceprint is a phantom," said Angelika Braun, professor of phonetics at the University of Marburg, at last week’s scientific press conference in Bonn.

Angelika Braun, who headed the speaker recognition department at the Federal Criminal Police Office (BKA) as recently as 2000, says that identifying a perpetrator by voice profile is about as reliable as handwriting analysis: "At least two to three percent are wrong." Although the human voice is individually distinctive, it also depends on anatomical and physiological conditions. It can wear out, or change in the event of illness.

Jens Peter Koster, phonetics professor at the University of Trier, also considers voiceprints, the visualized speech sound analyses still used mainly in the USA, to be "scientifically unsound". Angelika Braun doubts their effectiveness: "In fact, the assessment of voice identity is merely shifted from the auditory to the visual level. This objectively leads to a worsening of the decision basis, since the resolution of the sonagrams is substantially below the differentiation ability of the human ear." This is probably why the FBI does not take voiceprints to court. Allegedly, however, they are used in the Echelon system to identify speakers. And criminal procedure law in the U.S. still allows a judge to accept them.

Feature-oriented phonetic analysis

In Germany, the voice is used as a clue especially in murder, kidnapping, and assassination cases. The Federal Criminal Police Office last relied on voice profile analysis in the 1970s; investigators then turned to the more sophisticated feature-based phonetic analysis. This is a combined procedure that supplements classic auditory methods with computer-assisted instrumental phonetic methods. The first successes came at the end of the 1970s in a kidnapping and a murder case. In the kidnapping of Hanns Martin Schleyer, then president of the employers’ association, wiretap records were also analyzed and used in the terrorist trials.

Feature-based phonetic analysis rests on two pillars: computer-assisted analysis of the sound signal and evaluation by an expert. First, the quality of the recording is improved to obtain the cleanest possible signal. While the spectral domain was investigated in the 1970s, today it is the cepstral domain. The spectrum of a voiced sound itself has a periodic structure, and the frequencies of these oscillations in the spectrum form the cepstrum of the signal. To calculate the cepstrum, the experts apply logarithms and Fourier transforms.
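
As an illustration of the cepstrum idea only, the following minimal Python sketch computes a real cepstrum (the inverse Fourier transform of the log magnitude spectrum) for a synthetic frame. It is not the BKA’s procedure: the function names, the `eps` constant, the search range, and the test signal (a decaying pulse repeating every 16 samples) are all assumptions made for this example, and the naive transform is chosen for readability rather than speed.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            # Reduce k*t modulo n to keep the angle small and accurate.
            angle = sign * 2 * math.pi * ((k * t) % n) / n
            acc += v * cmath.exp(1j * angle)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, eps=1e-9):
    """Inverse transform of the log magnitude spectrum of one frame.

    eps (an assumption of this sketch) avoids log(0) for empty bins.
    """
    spectrum = dft(frame)
    log_magnitude = [math.log(abs(c) + eps) for c in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]


def dominant_quefrency(cepstrum, low, high):
    """Index of the largest cepstral value in low..high (inclusive)."""
    return max(range(low, high + 1), key=lambda q: cepstrum[q])


# Synthetic "voiced" frame: a decaying pulse repeating every 16 samples.
PERIOD = 16
frame = [0.6 ** (n % PERIOD) for n in range(128)]

cepstrum = real_cepstrum(frame)
# Search only a plausible range of pitch periods (hypothetical bounds).
peak = dominant_quefrency(cepstrum, 8, 24)
print("strongest periodicity at", peak, "samples")
```

In this toy signal, the regular spacing of the harmonics in the spectrum shows up as a peak in the cepstrum at the repetition period of the pulse. Real recordings would additionally need framing, windowing, and noise handling, which this sketch leaves out.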

The expert then evaluates speech characteristics such as breathing or vocalization. Whether a speaker speaks monotonously or animatedly also plays a role in the evaluation, as do microfluctuations of the voice and the energy distribution in the voice spectrum. In the Holzminden police murder case, for example, the perpetrator was identified by his very high-pitched voice, among other things. As Koster pointed out, especially sharp or whistling sounds can also be typical of a speaker, as in the case of the RAF terrorist Peter-Jürgen Boock.

On average, an analysis takes 20 to 30 hours to produce a report that can be used in court. The resulting speaker profile is then matched against comparison samples from suspects. "The data is destroyed again after use," emphasizes Angelika Braun. The Federal Criminal Police Office therefore does not have a database of voice profiles.

The method is also suitable for identifying foreign-language offenders. However, the evaluators cannot themselves recognize typical dialectal coloring in other languages. For German, the BKA has already developed a database of regional colloquial varieties. For the analysis of foreign languages, on the other hand, freelance interpreters are often used. But Gerd Osten, First Chief Inspector at the Federal Border Guard, warns against expecting too much: "The higher the requirements for language expertise, the lower the probability that the interpreter himself is neutral." After all, members of small language groups are often closely connected because they come from a very limited region.

Automatic speaker verification

In Germany there are few experts in speech and language analysis. Four phoneticians and three linguists work in speaker recognition at the Federal Criminal Police Office. The state criminal investigation departments in North Rhine-Westphalia, Bavaria, Brandenburg and Berlin also employ only a few experts. Only the universities of Marburg and Trier deal with forensic phonetics. Across Europe, intensive research is currently being carried out on automated speaker recognition.

Last week, a BKA symposium on "automatic speaker verification" was held in Eltville, where the latest research results were presented to the public. With optimal voice transmission, 20 seconds of recorded audio is currently sufficient for an analysis. But increasingly, non-voice channels such as text messages and e-mails are being used, says Gerd Osten. Speaker recognition fails here, but the digital traces can be identified more quickly.