The Voice Recognition Tech Centre knows all about establishing benchmarks. The first insurance policy on the market to be concluded by voice was developed in the ERGO workshop, and the latest generation of phonebots offers fluid, case-closing dialogues with dynamically generated speech. What’s next? In this article, ERGO CDO Mark Klein looks beyond the industry’s perimeter: in digital medicine, voice recognition could revolutionise diagnostics. What is already possible, and what are the implications for insurers?
“You don’t sound very well.” We’ve all said this to family members or friends who are under the weather – or heard it about ourselves. Even on the phone it is noticeable: the voice may seem congested, weak or lacking energy. When you have a cold, it sometimes sounds like you have a clothes peg on your nose.
Without a doubt, the voice can serve as an indicator of illness. In scientific jargon, such indicators are called biomarkers – measurable signs that something is wrong. Doctors and voice recognition experts have therefore long been exploring whether more can be gleaned from the sound of a person’s voice than just a cold.
The smartphone app developed by the start-up Vocalis can detect chronic obstructive pulmonary disease (COPD) in its early stages by picking up the first signs of breathlessness when speaking. In spring 2020, the speech analytics company, which has offices in Israel and the United States, widened the app’s diagnostic scope to include COVID-19.
Its AI compared the voices of people infected with COVID-19 with those of people who had tested negative for the virus. The software thus “learned” to isolate a voiceprint characteristic of COVID-19. According to Tal Wenderow, President and CEO of Vocalis, the app was able to help clinics pre-screen potential cases.
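Vocalis has not disclosed how its model works; purely as an illustration, this kind of supervised learning – fitting a classifier to recordings labelled “positive” or “negative” – can be sketched in a few lines of Python. All feature names and numbers below are invented for the example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.1, epochs=500):
    """Fit a tiny logistic-regression classifier with plain gradient descent."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss for this sample
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical acoustic features per recording: (hoarseness, breathiness).
# Label 1 = tested positive, 0 = tested negative -- entirely made-up values.
positive = [(0.8, 0.9), (0.7, 0.8), (0.9, 0.7)]
negative = [(0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
X = positive + negative
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])  # recovers the training labels
```

A production system would of course use far richer features extracted from audio and a much larger model, but the principle is the same: the classifier “learns” which combination of acoustic measurements separates the two labelled groups.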
Above all, however, the company wanted to find out what the AI algorithm could tell from a voice profile. Vocalis is one of many start-ups working in the relatively new field of voice diagnostics. Many other companies are working on COVID-19 voice signatures. Some are developing algorithms, for instance, to detect whether someone is wearing a face mask.
AI algorithms in the field of voice research can do even more. Researcher Björn Schuller is one of the world’s leading experts in the field of voice analysis. The electrical engineer earned his doctorate in 2005 with a thesis on identifying emotions on the basis of voice.
Today, the professor of Embedded Intelligence for Health Care and Wellbeing focuses his research on the fundamental question: how can artificial intelligence succeed in evaluating the human voice to the extent that it can detect certain diseases at an early stage?
Our voice reveals more than we think. ADHD patients, for example, tend to speak rigidly, depressed patients sound monotonous, and even in the early stages of Parkinson’s disease the voice already contains a barely perceptible tremor. Artificial intelligence is able to pick up such subtle acoustic differences: in the case of Parkinson’s, it succeeds in more than 90 percent of cases; for ADHD and mental illnesses, the success rate is above 80 percent.
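The Parkinson’s tremor gives a feel for what such an acoustic biomarker can look like. The toy sketch below – an illustration of the general idea, not any research group’s actual method – synthesises a steady vowel-like tone and a tone with a slow frequency tremor, then measures cycle-to-cycle pitch variability (“jitter”), a classic voice-quality measure:

```python
import math

SAMPLE_RATE = 8000  # samples per second

def synth(seconds, f0, tremor_hz=0.0, tremor_depth=0.0):
    """Synthesise a vowel-like tone; optionally add a slow frequency tremor."""
    samples, phase = [], 0.0
    for n in range(int(seconds * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        f = f0 * (1.0 + tremor_depth * math.sin(2 * math.pi * tremor_hz * t))
        phase += 2 * math.pi * f / SAMPLE_RATE
        samples.append(math.sin(phase))
    return samples

def crossing_times(samples):
    """Times of upward zero crossings, linearly interpolated between samples."""
    times = []
    for n in range(1, len(samples)):
        a, b = samples[n - 1], samples[n]
        if a < 0 <= b:
            frac = -a / (b - a)
            times.append((n - 1 + frac) / SAMPLE_RATE)
    return times

def jitter(samples):
    """Relative cycle-to-cycle variability of the pitch period."""
    t = crossing_times(samples)
    periods = [t2 - t1 for t1, t2 in zip(t, t[1:])]
    mean = sum(periods) / len(periods)
    var = sum((p - mean) ** 2 for p in periods) / len(periods)
    return var ** 0.5 / mean

steady = jitter(synth(2.0, 120))                                  # stable voice
tremor = jitter(synth(2.0, 120, tremor_hz=5.0, tremor_depth=0.05))  # 5 Hz tremor
print(f"steady jitter: {steady:.4f}, tremor jitter: {tremor:.4f}")
```

The tremulous tone shows markedly higher jitter than the steady one. A diagnostic AI works with many such measurements at once – and, crucially, with patterns it has learned itself rather than hand-picked ones like this.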
Schuller, who teaches at the University of Augsburg and at Imperial College London, has developed a diagnostic AI that works on the basis of artificial neural networks. The neural networks created by Schuller teach themselves what to look for in speech signals. The result is that the medical AI can diagnose almost as well as a human being.
Through voice analysis alone, the supercomputer then outputs the patient’s heart rate and cortisol level, their age – and even their height to within five to seven centimetres. The circuits of the artificial neural networks may be nothing more than pure mathematics, but they are designed to learn decisions that approximate those of a human being as closely as possible.
The average success rate of Schuller’s AI is astonishing. In up to 85 percent of cases, the voice analysis is consistent with the actual diagnosis received by the patient. Machine learning applications can now identify voice biomarkers for a variety of diseases – including dementia, depression, autism spectrum disorders, and even heart disease.
Artificial voice analysis might not simply replace or supplement human diagnosis – it might even outshine it.
In principle, of course, any doctor can roughly assess the voice of the person in front of them – but they can also miss many subtleties. Moreover, some patients are good at concealing their mood: hardly anyone notices when a depressed person briefly lifts their voice to make their doctor believe they are feeling better.
Computers can detect such differences. For the future, one could imagine smart, voice-based medical technology in the form of a wristband which, based on the wearer’s voice, is able to detect numerous illnesses at an early stage. And, just like a fitness tracker, it could motivate people to pay attention to their own health.
Naturally, the big tech companies don’t want to miss out on the action. Amazon made a first marketable attempt with its “Halo” device (its successor, the “Halo View”, has since been launched) – a wearable that uses a microphone to analyse its user’s voice.
The wristband could distinguish between formal business negotiations and a heated family conflict, for example – and make recommendations to lower one’s voice or speak in a calmer tone.
The music streaming service Spotify was also granted a patent in 2021 and intends to use voice analysis to suggest songs based on “emotional state, gender, age or accent”.
While there is some euphoria about the accuracy of voice analysis, a degree of caution is warranted. There are a number of potential pitfalls, ranging from misdiagnosis to the invasion of personal and medical privacy.
So far, potential biomarkers have only been identified in research studies involving a limited number of patient groups. Are the “hits” really a consequence of the disease in question? Or do these biomarkers instead reflect differences between the test groups – surroundings, level of education, other medical conditions, mood or simply fatigue? All of these are confounding variables that can distort the evaluation.
Regardless of how well they work, speech analytics systems that reveal personal health information are a delicate matter. They belong exclusively in the hands of medical professionals, for the purpose of healing or prevention. The technology must also be secured against criminal misuse: voices can already be synthesised so convincingly that they can hardly be distinguished from the original human voice, and such deepfakes pose an enormous security risk.
If safety and simplicity can be combined, however, voice detectors may one day also offer added value as assistance systems – for example, an in-car voice analytics system that detects, and reports, an impending shock or heart attack before it happens.
We at ERGO and in the voice team will continue to engage with the technologies – and eagerly await further developments in medical voice AI.