Digital technology is now mature and cheap enough to go beyond InStore Radio. In text-to-speech, “robotic” voices are a thing of the past, and single-word recognition has already matched human accuracy. Whole-sentence understanding has now reached a good level, even if talking to a machine still makes many people uncomfortable. These enormous and rapid advances are due to Artificial Intelligence techniques that have evolved with Deep Neural Networks and specialized hardware in the cloud.

As art is a trailblazer of new trends, it is worth noting that at the recently concluded Venice Biennale, national participations such as those of Turkey and Lebanon were entirely sound-based, not to mention the sound experience that transformed the Virgin Garden at the Arsenal. Of course, we are not talking about music.

In the Amazon age, buying in a brick-and-mortar store has to be a pleasant and useful experience because, if the goal is limited to ease and convenience, nothing beats a click on an e-commerce site. Sound is a component that can enrich the purchasing experience while being, among the five senses, one of the least demanding in cognitive load; in other words, it competes less for our ever-shrinking attention.

 

Why Audio?

 

We have just anticipated it. While looking at digital signage requires diverting attention from the product, a combination of voices and sounds that highlights the articles I’m passing by does not even require stopping or looking in a new direction. If I’m touching a product, I do not have to interrupt what I’m doing.

A first difference from InStore Radio is therefore context. Digital audio can be specific to certain areas (more on this subject later) and can even adapt to the customer’s gender, age, mood and dwell time by means of anonymous facial recognition, another AI technique. The content and tone to use with an elderly person are certainly different from those most suitable for a teenager.
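
As a minimal sketch of this kind of contextual selection, the Python snippet below picks an audio clip from the store area, an anonymous age estimate and the dwell time. Every name in it (`Context`, `CLIPS`, `DEFAULT_BY_ZONE`, `MIN_DWELL_SECONDS`, `pick_clip`, the zones and the file names) is a hypothetical placeholder chosen for illustration, not part of any real product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    zone: str             # store area the customer is in (hypothetical labels)
    age_group: str        # anonymous estimate, e.g. "teen", "adult", "senior"
    dwell_seconds: float  # how long the customer has stayed in the zone


# Hypothetical catalogue: (zone, age group) -> clip tailored in content and tone.
CLIPS = {
    ("wine", "senior"): "wine_classic_calm.mp3",
    ("wine", "adult"): "wine_pairing_tips.mp3",
    ("sneakers", "teen"): "sneakers_upbeat.mp3",
}

# Fallback clip per zone when no tailored clip exists.
DEFAULT_BY_ZONE = {
    "wine": "wine_generic.mp3",
    "sneakers": "sneakers_generic.mp3",
}

# Assumed threshold: stay silent for customers who are just walking through.
MIN_DWELL_SECONDS = 3.0


def pick_clip(ctx: Context) -> Optional[str]:
    """Return the clip to play for this context, or None to stay silent."""
    if ctx.dwell_seconds < MIN_DWELL_SECONDS:
        return None
    return CLIPS.get((ctx.zone, ctx.age_group), DEFAULT_BY_ZONE.get(ctx.zone))


senior_clip = pick_clip(Context("wine", "senior", 10.0))
teen_clip = pick_clip(Context("wine", "teen", 10.0))
passerby_clip = pick_clip(Context("wine", "senior", 1.0))
```

The lookup table keeps the creative work (which clip suits whom) separate from the selection logic, so content can evolve without touching the code.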

Besides context, digital audio can also be interactive. If I need assistance, I do not have to search for it but just answer “yes” to a proposal. By using “chat bots” (robots that can support simple conversations), less “inhibited” customers can make specific product and service requests and get responses whose usefulness is proportionate to the commitment and investment put into the design and evolution of these new tools over time.

 

Why now?

 

The enormous acceleration of Artificial Intelligence techniques has found an important application area in spoken language. In the transcription of single words, deep neural network techniques have equalled the accuracy of human beings. In addition, understanding the meaning of whole sentences is fundamental not only for “chat bots” but also for automatic language translation, to the point of speaking in one language while the listener hears the translation directly in another.

The recent agreement between Google and Walmart confirms the trend. It is focused on voice e-commerce, in order to counter Amazon’s dominance with its Alexa-based Echo devices, designed as home voice assistants and, coincidentally, for reordering products that have just run out without interrupting the ongoing activity. The experience is similar to what we are accustomed to with Apple’s Siri and Microsoft’s Cortana.

Another end-of-August agreement, between Amazon and Microsoft (rather striking because they are direct cloud computing competitors), concerns interoperability between their respective Alexa and Cortana technologies. Microsoft has no devices to place in homes on a massive scale, while Amazon could not access the business information resources managed by Outlook and other Office 365 applications. Both companies are hoping to challenge the dominance of Apple and Google (via Android) in smartphones, the voice interaction tool par excellence.

 

What’s ahead?

 

It would be useful if the physical context could be limited to a single customer, without disturbing the people nearby. If the sound beam could follow the customer while moving around, it would be a further step forward.

This dynamic directionality is normal when receiving sound by means of an “array” of two or more microphones that “focuses” on only one direction and suppresses disturbances from other directions, as happens in laptops and especially in audio devices such as Amazon’s Echo or Google Home.
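
One common way to obtain this “focus” is delay-and-sum beamforming; the article does not say which method any specific device uses, so the Python sketch below is only an idealized illustration. The sample rate, the test tone and the four-sample delay are assumptions, chosen so that the unwanted direction cancels exactly; with real speech, off-axis sound would only be attenuated.

```python
import math


def tone(freq_hz, fs, n, delay_samples=0):
    """A sine tone of n samples, arriving delay_samples later (negative = earlier)."""
    return [math.sin(2 * math.pi * freq_hz * (t - delay_samples) / fs)
            for t in range(n)]


def delay_and_sum(channels, delays):
    """Advance channel i by delays[i] samples, then average all channels."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(n)]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


FS = 16000     # assumed sample rate in Hz
N = 1600       # 0.1 s of audio
FREQ = 1000.0  # assumed test tone: one period lasts 16 samples

# Wanted voice: reaches microphone 2 four samples after microphone 1.
target = [tone(FREQ, FS, N, 0), tone(FREQ, FS, N, 4)]
# Disturbance from the opposite side: reaches microphone 2 four samples earlier.
noise = [tone(FREQ, FS, N, 0), tone(FREQ, FS, N, -4)]

# Steer toward the wanted voice by compensating its four-sample delay.
steer = [0, 4]
target_level = rms(delay_and_sum(target, steer))
noise_level = rms(delay_and_sum(noise, steer))
```

With these values the wanted voice adds up in phase and keeps its level, while the disturbance ends up half a period out of phase between the two channels and cancels. Changing `steer` changes the listening direction without moving the microphones.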

Dynamic directionality in the reverse direction, that of sound emission, is far more difficult and costly to obtain. Speaker arrays are bulky, so the audible interference between two side-by-side inaudible ultrasonic transducers is often used instead. Further progress in this area, reducing costs and dimensions, should not be far away.
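
To see how two inaudible tones can produce an audible one, here is a simplified worked example. It assumes, purely for illustration, that the two transducers emit pure tones at angular frequencies ω₁ and ω₂ and that the medium adds a small quadratic (squared) term to the combined signal; the 41 kHz and 40 kHz figures are illustrative values, not taken from any specific product.

```latex
\begin{align*}
(\cos\omega_1 t + \cos\omega_2 t)^2
  &= 1 + \tfrac{1}{2}\cos 2\omega_1 t + \tfrac{1}{2}\cos 2\omega_2 t \\
  &\quad + \cos(\omega_1 + \omega_2)t + \cos(\omega_1 - \omega_2)t
\end{align*}
% Illustrative values: f_1 = 41 kHz, f_2 = 40 kHz
% f_1 - f_2 = 1 kHz          (audible)
% 2 f_1, 2 f_2, f_1 + f_2 = 82, 80, 81 kHz  (inaudible)
```

Only the difference term falls in the audible range, and it exists only where the two narrow ultrasonic beams overlap, which is what keeps the audible sound confined to a small area.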

Today, using the home audio devices mentioned above, perhaps aided by “sound reflectors” such as “bells” hanging from the ceiling to safeguard some privacy, offers a good price-performance ratio and openness to interactivity via “chat bots”.

The final consideration is that creating “memorable” experiences with “chat bots” is very complex, not only from the technical point of view but especially from the creative one, quite apart from the fact that many consumers are not yet comfortable conversing with machines. Non-interactive but contextual audio experiences are more accessible on both the technical and creative fronts, while still being able to transform the purchasing experience. They can be the starting point for Intelligent Audio in Retail.


© Copyright 2023 aKite srl