Today, nearly 60 per cent of internet users upload and share video online, while almost 80 per cent of all digital video viewers consume this content via smartphones. Two factors inform our experience of this media: what we see, and what we hear.
We are all used to having devices capable of capturing video in high definition. But when it comes to the accompanying audio, there is often still much to be desired.
While device manufacturers have been slow to address the chasm between the two, emerging audio technologies are beginning to redress the balance. These are now providing entirely new possibilities when creating and sharing content.
Content and storytelling in the smartphone era
We used to say content is king. Now, it is our own stories that rule. Storytelling is a central component of today’s consumer devices. It has enabled citizen journalists to set the news agenda, vloggers to inspire people around the world and anyone to share memories in high definition.
This shift has been made possible because of smartphone innovation.
Just two decades ago, people would have to make significant investments in equipment to be able to capture compelling, high-quality content. Today, we simply use the device in our hands. The ability to create stories that move people has been truly democratised.
Yet the high-definition content available to all has further highlighted the disparity between the visual and the aural. Sound is an integral part of any content experience. To address this disparity, device makers also need to offer users the right audio tools.
The role of immersive audio in smartphones
Truly immersive content cannot happen without spatial audio.
Until recently, the only way to upgrade audio quality was to buy more expensive audio equipment – either external microphones to capture better sound or headphones to improve playback.
This created an over-reliance on external devices to deliver good-quality sound, meaning we had to have the right equipment to hand, such as external microphones when capturing our content. But this simply won’t do when we need to capture spur-of-the-moment memories.
Immersive audio has a variety of applications – but at its core it is about democratising professional audio capture for everyone.
We can create spatial audio through just two mics in a device. This allows users to capture the exact sound environment – truly immersive audio that matches what can be achieved with video.
This is achieved by smart audio algorithms. No additional hardware is required. These algorithms are fine-tuned to a device, ensuring optimal audio capture without compromising the existing device form factor. This approach allows us to create even more immersive capabilities.
Artificial intelligence is already used throughout our devices and is prevalent in video capture. By integrating audio algorithms with these AI engines, we unlock abilities such as being able to focus audio on specific objects or subjects and even zoom audio in line with the video. This approach can also make common nuisances a thing of the past – such as windy conditions ruining great content.
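To give a feel for how audio can be focused on a subject with just two mics, here is a minimal sketch of delay-and-sum beamforming, a textbook technique. It is an illustration only, not the method used in any particular device: the function name, the integer sample delay and the toy signals are all assumptions made for this example.

```python
def delay_and_sum(mic_a, mic_b, delay_samples):
    """Steer two mics towards a source (hypothetical helper, illustration only).

    Assumes the sound from the chosen direction reaches mic_b
    `delay_samples` samples later than mic_a. Advancing mic_b by that
    delay lines the two copies up, so they reinforce each other when
    averaged; sound from other directions stays misaligned.
    """
    out = []
    for i in range(len(mic_a)):
        j = i + delay_samples
        # Samples that fall outside the recording are treated as silence.
        b = mic_b[j] if 0 <= j < len(mic_b) else 0.0
        out.append((mic_a[i] + b) / 2.0)
    return out


# A click that reaches mic_b one sample after mic_a.
mic_a = [1.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 1.0, 0.0, 0.0]

focused = delay_and_sum(mic_a, mic_b, 1)    # steered at the click
unfocused = delay_and_sum(mic_a, mic_b, 0)  # steered elsewhere
```

When the delay matches the direction of the click, the two copies add up to the full-strength signal; with the wrong delay, the click is smeared across two samples at half strength. A real system would estimate the delay from the scene, which is where the AI engines mentioned above could come in.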
Beyond user-generated content: entertainment and gaming
From entertainment to gaming, mobile is now the primary way of consuming professional media. So solving the everyday challenges of device audio is about more than content creation; it is also about content consumption.
Think of playing back professional content through a smartphone’s integrated speakers. Often, this is quieter than we would like, and tends to be thin-sounding. There are a good many use cases for smart audio technology to be applied here – such as adaptive stereo widening.
Using intelligent processing, stereo device audio can be transformed by broadening the sound field to allow sounds to spring to life. Further capabilities such as being able to optimise speaker audio for the environment and creating virtual surround can be unlocked for even more robust audio experiences.
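One simple way to picture stereo widening is mid/side processing: split the signal into what the two channels share (mid) and what differs between them (side), then boost the side part. The sketch below assumes this approach purely for illustration; the function name and the fixed `width` parameter are hypothetical, and an adaptive system would presumably vary the width with the content and the device rather than fix it.

```python
def widen_stereo(left, right, width=1.5):
    """Mid/side stereo widening (hypothetical helper, illustration only).

    width = 1.0 leaves the signal unchanged, width > 1.0 broadens the
    sound field, and width = 0.0 collapses it to mono.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2.0           # what both channels share
        side = (l - r) / 2.0 * width  # what differs, scaled by width
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right


# A sound present only in the left channel, widened by a factor of two.
wide_left, wide_right = widen_stereo([1.0], [0.0], width=2.0)
```

Boosting the side signal exaggerates the differences between the channels, which is what makes closely spaced speakers sound further apart. Pushing the width too far can raise peak levels, so a production implementation would also need to manage loudness.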
New audio dimensions in our daily lives
So how can these technologies work in the real world?
Imagine taking a video of a beautiful beach on holiday: you turn the camera through 360 degrees to record the full environment. Normally, the audio would stay static even as you move, creating a mismatch between what you see and what you hear.
With spatial audio, the sounds move around as they would in real life: starting by pointing the camera at the sea, you can hear the waves lap in front of you. As the camera moves, so does the sound, recreating a completely natural experience. The wind no longer distorts the audio, either: you hear the crashing of the waves, nothing else.
Now, say you want to record an artist performing in a crowded venue. Usually, all the sounds around you are competing. AI integration means you can focus just on the stage, even if the camera pans away. The crowd melts into the background. Should you zoom in on the artist, the audio will zoom in line with the video, meaning you can get closer to the action visually and sonically.
The future is immersive
Smartphone device makers have spent the past decade revolutionising how we create and consume content – leading to great innovative leaps in camera quality. Audio is the last piece of the puzzle.
But what might the future of audio innovation hold?
Trends that exist today are likely to gather pace in the coming years. For smart consumer devices, the potential of the audio technologies mentioned above will enable increased immersion in the fields of virtual and augmented reality – exciting frontiers in the audio-visual space.
The key, overarching trend is the improvement of the user experience. Whether professional or consumer, increasing the content capture capabilities of smart devices will continue to unlock new possibilities, enriching our experience of digital media.
Paramita Bhattacharya, Global Head of Marketing, Nokia Technologies