
OtoSense Shows AI-Firms Why They Should be Investing in Sound Recognition Apps

Mickaël Hiver
Revenue Operations Manager

The power of smart devices has long been hyped by developers and tech enthusiasts alike – and those with sound recognition capabilities are no exception.

But lately, adoption of this technology has risen rapidly, partly as a result of the Covid-19 crisis. Consumers who were once skeptical of the “tech revolution” are becoming more comfortable with – and trusting of – these kinds of devices. And with that, more companies are choosing to invest in them.

Take Apple, for example, which just released a new sound recognition feature in iOS 14 to alert iPhone users to sounds such as fire alarms, door knocks and even crying babies. The sound recognition market is expected to reach a value of $4.42 billion by 2024, and the broader voice and speech recognition market is projected to grow at an annual rate of 17.2% from 2019 to 2025, reaching $26.8 billion.

So, what other kinds of fields are developing sound recognition apps and how are they doing it?

Sound recognition app for smart devices

Marketing & Advertising

With more and more actions being automated in the field of marketing, the benefits of sound recognition technology are twofold. Sound-sensitive smart devices can not only help perform tasks like scheduling social media posts – they also help socially savvy brands stay ahead of the curve by letting them connect with audiences on a deeper level. Think hyper-targeted marketing.


Healthcare

Today, sound recognition tech geared towards the healthcare sector occupies approximately half of the market. Medical workers can use sound recognition technology not only as a diagnostic tool, but also to enhance patient engagement and autonomy and to bolster record security – making healthcare more accessible to all.

Machine Health & Preventive Maintenance

At the moment, several machine tool builders are working on ways to give employees hands-off alternatives for navigating and controlling their equipment through HMIs (Human-Machine Interfaces). In the field, sound recognition apps can be used to schedule preventive maintenance interventions on equipment, prevent or fix malfunctions, and perform tasks remotely.

Vehicle Autopilot & Safety

One company at the forefront of sound recognition technology in the automotive field is OtoSense. The Boston-based company leveraged deep learning to provide users with an actionable alert program that acquires, learns and interprets what is referred to as “uni-dimensional sound” – which includes vibrations as well as pressure, current or temperature changes in vehicles.

From car body manufacturers to HVAC assembly lines, major plane makers and even internal airport transportation systems – OtoSense’s sound recognition app is built to automatically flag subtle changes in vehicle sounds that can indicate potential malfunctions, sirens or other noises that suggest road danger.

From Sound to SIR

Based on a technology called Sound Intelligence Recognition (SIR), OtoSense’s solutions have several applications beyond the automotive industry. They are also geared towards helping hearing- and vision-impaired users navigate their environments more easily. So, how does the sound recognition app work?

Humans process sound in four steps, and OtoSense’s solutions are built to work similarly: they acquire sounds, convert them, extract features and interpret them. OtoSense harnessed deep learning mechanisms to carry out this series of steps at scale and train a working model. Finally, the company worked alongside a dedicated outsourced team at Pentalog Moldova to integrate the software and make it deployable on tablets, smartphones, electronic boards (Raspberry Pi) and other devices.

  • Sound Acquisition & Conversion

At OtoSense, team members worked to capture sound through sensors and amplifiers. The digitization process used a fixed sample rate between 250 Hz and 196 kHz, and the resulting samples were stored in buffers ranging from 128 to 4096 samples.
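To make the digitization step concrete, here is a minimal sketch of splitting a captured signal into fixed-size buffers. This is illustrative only, not OtoSense’s actual code; the sample rate, buffer size and function names are assumptions within the ranges the article cites.

```python
import numpy as np

SAMPLE_RATE = 48_000   # Hz; the article cites fixed rates from 250 Hz to 196 kHz
BUFFER_SIZE = 1024     # samples per buffer; the article cites 128 to 4096

def digitize(signal: np.ndarray, buffer_size: int = BUFFER_SIZE) -> np.ndarray:
    """Split a digitized signal into fixed-size buffers, dropping any remainder."""
    n_buffers = len(signal) // buffer_size
    return signal[: n_buffers * buffer_size].reshape(n_buffers, buffer_size)

# Simulate one second of a 440 Hz tone captured by a sensor.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
signal = np.sin(2 * np.pi * 440 * t)
buffers = digitize(signal)
```

In a real device the buffers would be filled continuously from an ADC; here a synthetic tone stands in for the sensor input.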

  • Extraction

The sounds were then extracted using fixed time boxes called “chunks”, ranging anywhere from 3 seconds to 23 minutes – depending on the sound or vibration being recorded.
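The chunking described above can be sketched as follows. The function name and the example durations are hypothetical, but the idea matches the article: cut the recording into fixed-duration time boxes before feature extraction.

```python
import numpy as np

def make_chunks(signal: np.ndarray, sample_rate: int, chunk_seconds: float) -> np.ndarray:
    """Cut a recording into fixed-duration time boxes ("chunks")."""
    size = int(chunk_seconds * sample_rate)
    n_chunks = len(signal) // size          # keep complete chunks only
    return signal[: n_chunks * size].reshape(n_chunks, size)

# Ten seconds of recording at a modest 1 kHz rate, cut into 3-second chunks.
recording = np.random.default_rng(0).normal(size=10_000)
chunks = make_chunks(recording, sample_rate=1_000, chunk_seconds=3.0)
```

Longer chunks (the article mentions up to 23 minutes) would simply use a larger `chunk_seconds` value; the mechanics are the same.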

  • Interpretation

After extraction, the sounds were interpreted using visual, unsupervised sound mapping that closely replicates human cortical processing, and organized into loose categories so that similar sounds were grouped together. Experts then built a visual, semantic map outlining all the samples.
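Grouping similar sounds without labels is a classic unsupervised-clustering problem. The toy k-means below illustrates the idea of forming “loose categories” from sound-feature vectors; it is a generic sketch, not OtoSense’s SIR algorithm, and the synthetic features are invented for the example.

```python
import numpy as np

def kmeans(features: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Minimal k-means: group similar feature vectors into k loose categories."""
    # Deterministic init: pick k evenly spaced samples as starting centroids.
    centroids = features[np.linspace(0, len(features) - 1, k).astype(int)]
    for _ in range(iters):
        # Assign each sample to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned samples.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels

# Two synthetic groups of "sound" feature vectors; similar sounds cluster together.
rng = np.random.default_rng(1)
quiet = rng.normal(0.0, 0.1, size=(20, 4))
loud = rng.normal(5.0, 0.1, size=(20, 4))
labels = kmeans(np.vstack([quiet, loud]), k=2)
```

A production system would cluster learned embeddings rather than raw statistics, but the grouping principle is the same.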


Pentalog x OtoSense

One of OtoSense’s main challenges was not so different from that of most companies with proprietary concerns. With a working system in hand, OtoSense faced the task of converting an award-winning piece of software into a sound recognition app and offering the tool to those who need it most, all while maintaining the privacy of its data and models. With that, the Pentalog x OtoSense collaboration was born.

A team of five at Pentalog Moldova, including mobile developers and a Scrum Master, worked independently and remotely to improve the first OtoSense prototype device. The team then spearheaded the integration of the OtoSense system into the iOS and Android sound recognition apps, as well as the smartphone connection to Pebble smartwatches.

Deep Learning – The Path to a Solid Sound Recognition App

In order to exploit the full potential of any sound recognition app, makers need data – and a lot of it. This calls for creating a robust deep learning system and training neural networks at scale to “think” on their own. And, while deep learning mechanisms are whizzes at recognizing patterns, they can’t integrate themselves into the smart devices we use every day at our homes and offices.

Many business applications today are moving to harness deep learning for their sound recognition apps, but much of this work is still in development and hasn’t been released yet. Since many use cases are still in the early stages of adoption, the earlier you get started, the better. Companies looking to leverage deep learning should start working with developers now to integrate their systems into consumer devices.

There is a big misconception that, when it comes to deep learning, you need an in-house team to offer your clients the full scope of benefits and retain stakeholder privacy. This is not at all the case. AI experts can command salaries of up to $300,000 a year. Pentalog can help you get the expertise, plus the privacy, minus the expense. You focus on building your own network; we’ll take care of the integration.

Are you looking to build a sound recognition app? Contact us to get started.
