How LLMs can revolutionise voice assistants

By the end of 2023, as many as 70% of customer interactions are expected to be handled by chatbots based on Large Language Models (LLMs). As the AI hype gains momentum, it raises a pointed question: are voice assistants like Siri or Alexa becoming a thing of the past, left behind in the race for AI innovation? It may be exactly the opposite. As it turns out, LLMs can help drive unprecedented development in popular smart assistants, taking their technology to the next level.

It has been 12 years since Apple announced Siri, the voice-enabled “personal assistant” integrated into the iPhone 4S, and 8 years since Amazon brought Echo and Alexa into our homes. Today, voice assistants are no longer a source of excitement: they sit in our phones and homes, ready to play our favourite song, turn on the lights, tell us what time it is, or what the weather will be like tomorrow afternoon. As technology has evolved, this once-revolutionary tech has begun to show its age, especially compared to highly developed Large Language Models such as GPT-4, Bard, or LLaMA.

Voice assistants now seem limited and fallible, understanding only the simple commands we give them. While still very functional, they often struggle to maintain a coherent dialogue, something more advanced AI chatbots have already accustomed us to.

LLM to the rescue

Integrating LLMs into voice assistants may completely change the way we use them. Limited command-and-control systems such as Siri and Alexa will soon give way to more advanced ones that better understand our language, along with our emotions and context. The merging of chatbot and voice assistant technologies is already happening.

Recently, during the 10th edition of Connect, Meta announced new smart glasses created in cooperation with EssilorLuxottica (Ray-Ban). Apart from features such as live recording and integration with Meta apps, the glasses have five microphones that give access to the newest Meta AI chatbot via voice control.

Amazon is also investing in its voice assistant technology. Alexa now has its own LLM, which is expected to revolutionise the product as we know it. The latest version of Amazon's voice assistant can understand conversational phrases and interpret context, making it capable of handling multiple requests in a single command.

It's all about improvements

Large Language Models will enhance voice assistants' conversational abilities. This means you will no longer need to tell them exactly what to do or repeat your commands, adjusting them each time until the assistant understands them accurately. You will be able to say, for example, "I'm cold," and the assistant will raise the temperature on your AC. It will finally be able to grasp the nuance of a conversation and respond in a more human-like manner.
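To make the "I'm cold" example more concrete, here is a minimal sketch of one common pattern: the model is asked to return a structured intent, and ordinary code carries it out. Everything in it is an assumption made for illustration. The `Thermostat` class, the `adjust_temperature` action name, and the hard-coded JSON reply are hypothetical stand-ins, and no real assistant or LLM API is called.

```python
import json
from dataclasses import dataclass


@dataclass
class Thermostat:
    """Hypothetical device, standing in for a real smart-home integration."""
    target_celsius: float = 20.0

    def adjust(self, delta_celsius: float) -> None:
        self.target_celsius += delta_celsius


def apply_intent(reply_json: str, thermostat: Thermostat) -> str:
    """Parse the model's structured reply and run the matching device action."""
    intent = json.loads(reply_json)
    if intent.get("action") == "adjust_temperature":
        thermostat.adjust(float(intent["delta_celsius"]))
        return f"Setting the temperature to {thermostat.target_celsius:.1f} degrees."
    # Anything the assistant has no handler for is declined, not guessed at.
    return "Sorry, I can't help with that yet."


# Assumed output of an LLM that was given the utterance "I'm cold".
# A real system would obtain this from a model call instead of a constant.
HYPOTHETICAL_LLM_REPLY = '{"action": "adjust_temperature", "delta_celsius": 2}'

thermostat = Thermostat()
print(apply_intent(HYPOTHETICAL_LLM_REPLY, thermostat))
```

The point of the split is that the language model only interprets the request, while a small, predictable piece of code decides what the device is actually allowed to do.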

We all appreciate chatbots such as ChatGPT for the fact that they remember the information previously given to them, so we do not have to start the conversation from scratch each time. With Large Language Models at the helm, voice assistants will become increasingly adept at personalizing interactions by learning from previous conversations. For example, a device powered by an LLM will be able to remember specific music preferences, suggest restaurants based on past choices, and even make tailored recommendations for movies or books.

The global nature of today's society demands voice assistants that can seamlessly switch between languages. Large Language Models have made remarkable progress in this area, enabling voice assistants to understand and respond in multiple languages with a level of fluency that was previously unachievable. This opens new avenues for cross-cultural communication and commerce, in addition to making voice assistants more accessible to a diverse user base.

"Voice assistants have come a long way in recent years, thanks to advancements in speech synthesis, speech recognition, natural language understanding, and recognition technologies," remarks Paweł Bulowski, Advanced Analytics Program Manager, AI & Data Division in ET&SM. "The use of LLMs in such solutions offers immense possibilities. Providers like PolyAI have already done it and use such products for their customers. At the same time, it's important to note that LLMs are only as good as the data they're trained on. For example, the first ChatGPT had issues because it was trained on outdated datasets. I think everyone remembers how frustrating this was."

Our ERGO colleague adds: "Solutions like RAG and Agents have been developed to address this issue. RAG allows LLMs to access a knowledge base, which enriches their responses with external data. Agents, in turn, allow LLMs to interact with other applications, such as Python, for specific arithmetic calculations. It's like giving superpowers to your LLM solutions. Not only can they use databases, but they can also book a flight for you or make a payment. While these solutions can enhance the capabilities of LLM-based voice assistants, it's essential to be aware of the risks associated with LLMs. The OWASP Top 10 for LLM Applications is a valuable resource for understanding these risks and how to mitigate them. With the right approach, LLM-based voice assistants can offer incredible value and convenience to users, but it's important to keep risk and security at the top of the list when designing such solutions."
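The two ideas mentioned in the quote can be illustrated with a short, simplified sketch. It is not how any particular product works. The knowledge base, the word-overlap ranking (a stand-in for the vector search a real RAG system would typically use), and the tool names are all assumptions chosen for illustration, and the LLM itself is left out.

```python
import re

# Hypothetical knowledge base; a real system would query a document store.
KNOWLEDGE_BASE = [
    "The thermostat supports temperatures between 16 and 30 degrees Celsius.",
    "Saved playlists are available on every speaker in the house.",
    "Flights are rebooked free of charge up to 24 hours before departure.",
]


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """RAG step 1: rank documents by word overlap with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, passages: list[str]) -> str:
    """RAG step 2: put the retrieved passages in front of the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Agent side: tools the model may request by name (hypothetical registry).
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def run_tool_call(call: dict) -> float:
    """Execute a request shaped like {"tool": name, "args": {...}}."""
    tool = TOOLS.get(call["tool"])
    if tool is None:
        raise ValueError(f"Unknown tool: {call['tool']}")
    return tool(**call["args"])


question = "What temperatures does the thermostat support?"
print(build_prompt(question, retrieve(question, KNOWLEDGE_BASE)))
print(run_tool_call({"tool": "multiply", "args": {"a": 6, "b": 7}}))  # 42
```

Restricting the agent to a fixed registry of named tools, rather than letting the model run arbitrary code, is one simple way to keep the security concerns raised above in view.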

With the integration of Large Language Models, voice assistants are poised for a transformative evolution. Not only do LLMs promise to improve conversational capabilities, they also offer a dramatic reduction in customer service costs while ensuring the "human touch" experience that customers desire. Paweł adds: "This fusion of advanced AI chatbots and voice assistant technologies will redefine our interactions with these devices, making them more intuitive, responsive, and personalized. Instead of fading into the past, the voice assistant era seems to be coming back with a vengeance."
