A good customer contact history depends on storing a summary of each conversation in the CRM system: this supports proper follow-up. What was the contact reason, what was the essence of the conversation and which follow-up appointments were made? Spoken Summary saves the advisor valuable time: instead of typing a summary, the advisor speaks it, and speech technology and artificial intelligence automatically write it down and store it in the CRM system.
At Telecats we use speech recognition to convert a spoken summary from the advisor into text. This results in an easily readable written summary which is automatically saved in the CRM system.
Frank Rademakers (Manager Customer Contact & Support, PGGM): “About two years ago, PGGM’s Customer Contact Centre came up with the idea of using speech technology to write a conversation summary in CRM. The objective was to reduce the advisor’s follow-up time and to achieve a more uniform way of recording calls.
It was a logical choice for PGGM to tackle this project with tech-partner Telecats, in co-operation with our own Innovation Department. Last year we set up a multidisciplinary team for this, in which all relevant roles were represented, with Telecats in the lead. By the turn of the year, we had the chain ‘technically working’ and there was only room for improvement in the language model. In terms of terminology and jargon (pensions), that was quite a challenge! We are now fully engaged in rolling out this new working method with a language model that has been specially trained and works well for PGGM.
I look back on an exciting and instructive innovation process, with occasional bumps and pitfalls, but above all with a good result. And the satisfaction was great when we saw the first comprehensible, easy-to-read conversation summaries in our CRM systems. It is pretty cool to be the launching customer of a whole new application and innovation in customer contact.”
How does it work?
As speech recognition begins to mature through increasing computing power, available training data, and smart AI algorithms, the next challenge is to understand. But is that possible? Are the current algorithms able to interpret this typically human trait (to understand what is meant, even if it is not said in so many words)? That is the big challenge that Telecats is working on in collaboration with knowledge institutions, such as the renowned University of Twente.
Speech technology has improved greatly in recent years through the use of Deep Neural Networks and artificial intelligence. Under optimal conditions, speech recognition now performs almost as well as humans (WER* of 5 to 10%). Optimal means, for example, a newsreader in a studio (high audio quality) reading text aloud. Customer contact, by contrast, is often a telephone conversation between two people (lower audio quality, 8 kHz sampling) who think while they talk, so the spoken sentences do not always have a logical structure. Nevertheless, we are able to process these conversations well with speech recognition too (with a WER* of 20 to 40%).
The Word Error Rate (WER*) is a common metric for measuring speech recognition performance. WER counts the number of words identified incorrectly during recognition, divides that count by the total number of words in the human-labeled transcription, and multiplies the quotient by 100 to express the error rate as a percentage.
Speech recognition typically makes three types of errors:
- Insertions: words that were incorrectly added in the transcription
- Deletions: words that are wrongly missing from the transcription
- Substitutions: words that have been replaced by other words in the transcription
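The WER definition above can be sketched as a short dynamic-programming routine (Levenshtein distance over words, which jointly counts substitutions, deletions and insertions). This is a minimal illustration of the metric, not Telecats’ actual scoring code.

```python
# Minimal Word Error Rate (WER) sketch: edit distance over words,
# covering the three error types (substitution, deletion, insertion).

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution?
            d[i][j] = min(d[i - 1][j] + 1,                # deletion
                          d[i][j - 1] + 1,                # insertion
                          d[i - 1][j - 1] + cost)         # match/substitution
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

# One substitution in a five-word reference -> 20% WER
print(wer("the payment was made today", "the payment is made today"))  # 20.0
```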
Because computers keep getting smarter at a pace humans cannot match, the question is increasingly asked: are people becoming redundant? But our advisors are flexible and creative, have empathy and are therefore far better at communicating on a human level. We therefore think that artificial intelligence will not replace humans but will support them. The Spoken Summary solution is a very good example. It consists of three parts:
Call recording
Call recording is a valuable tool within Contact Centers for improving processes and increasing customer satisfaction. It is used, among other things, for analysis, quality assurance, compliance, recording agreements and coaching advisors. Telecats’ call recorder records all conversations in stereo, with the advisor and customer each having their own recording channel. The recordings are first stored in high-quality audio, resulting in better and more accurate speech-to-text processing. After processing, the recordings can still be compressed for storage.
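The two-channel recording described above means the advisor and customer samples are interleaved in the stereo stream and can be separated before speech-to-text processing. A minimal sketch of such de-interleaving, assuming 16-bit PCM frames (illustrative only, not the actual Telecats recorder code):

```python
# De-interleave a stereo call recording into separate advisor (left)
# and customer (right) channels. Assumes 16-bit little-endian PCM.
import struct

def split_stereo(frames: bytes, sampwidth: int = 2) -> tuple[bytes, bytes]:
    """Split interleaved stereo PCM bytes into (left, right) mono streams."""
    step = sampwidth * 2  # one stereo frame = one left + one right sample
    left = b"".join(frames[i:i + sampwidth]
                    for i in range(0, len(frames), step))
    right = b"".join(frames[i + sampwidth:i + step]
                     for i in range(0, len(frames), step))
    return left, right

# Two stereo frames: advisor samples 100, 200; customer samples -100, -200
frames = struct.pack("<4h", 100, -100, 200, -200)
advisor, customer = split_stereo(frames)
assert struct.unpack("<2h", advisor) == (100, 200)
assert struct.unpack("<2h", customer) == (-100, -200)
```

Each mono stream can then be transcribed separately, so the recognizer always knows which speaker it is hearing.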
Transcription (Speech-To-Text / STT)
In order to convert the recorded conversations from speech to text with good quality, language models and the right context are needed. Telecats speech recognition supports different languages and can use sector- or customer-specific language models. In addition, it is possible to train the speech recognition with manually transcribed recordings containing the specific language used by customers and advisors. Dialects and accents are covered by the acoustic model of the speech recognizer. We maintain several “pronunciation dictionaries” to deal with different dialects and accents. If spoken accents deviate too much from the pronunciations we modelled, we can adjust the acoustic model accordingly. This safeguards the quality of the speech recognition and the transcriptions generated by Telecats speech solutions.
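A pronunciation dictionary as described above maps each word to one or more phoneme sequences, so dialect or accent variants of the same word can all be recognized. The words and phoneme strings below are hypothetical examples (pension-domain Dutch, fitting the PGGM case), not Telecats’ actual lexicon:

```python
# Illustrative pronunciation lexicon: one word, possibly several
# modelled pronunciations (standard + dialect/accent variants).
LEXICON = {
    "pensioen": ["p E n s j u n", "p E n S u n"],  # standard + regional variant
    "premie":   ["p r e m i"],
}

def pronunciations(word: str) -> list[str]:
    """Return all modelled pronunciations for a word (empty if unknown)."""
    return LEXICON.get(word.lower(), [])

print(pronunciations("Pensioen"))  # both modelled variants
```

If a new accent is not matched by any listed variant, an extra pronunciation is added (or the acoustic model retrained), as the text describes.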
The Word Error Rate for telephony audio varies between 20 and 40%, depending on the context and the sound quality. When you then try to use an algorithm to automatically summarize the fully transcribed conversation, the result will be suboptimal.
At Telecats, we therefore use speech recognition to convert only the spoken summary into text, achieving a WER of less than 20% and an easily readable summary as a result.
The spoken summary
With the Spoken Summary solution, the advisor speaks the summary, and only this part of the conversation is saved in the CRM system. The summary can be started via an API call or a verbally agreed trigger word. The success of this (dictated) spoken summary also depends on clear guidelines and proper training of advisors, but summarizing conversations is often already ingrained in the advisors’ way of working.