Microsoft announced on Monday a new private preview of the Custom Recognition Intelligence Service (CRIS), a highly customizable tool that can give applications Siri-like speech-to-text functionality. Also on Monday, Microsoft opened public previews of two sets of application programming interfaces (APIs) that let developers identify who is speaking in audio recordings and analyze what appears in videos.
All of this technology falls under Project Oxford, an effort to give third-party developers access to the artificial intelligence technology Microsoft has built up over the years. Google is following a similar path, for example with the launch of its Cloud Vision API.
Last month Microsoft announced an emotion detection tool in Project Oxford and said that a public beta for speaker recognition would be available by the end of 2015. According to a blog post on Monday from Ryan Galgon, Microsoft’s technology and research senior program manager, that beta is now available. The speech APIs can both identify and verify speakers, while the video APIs can stabilize video content, detect motion against stationary backgrounds, and track faces.
In a high-level description of CRIS, Microsoft stated:
This tool makes it simpler for users to customize speech recognition in challenging environments, like a noisy public area. For instance, a company could utilize it to assist a team to better use speech recognition tools while working in a busy shopping centre or on a loud shop floor. Additionally, it could be used to assist an app to better understand those who have traditionally had problems with voice recognition, such as those with disabilities or non-native speakers.
When developers sign up to use the service, Microsoft asks them whether they are familiar with speech-to-text technologies such as SRILM, Kaldi, and HTK, or are simply users of personal digital assistant technologies from Apple, Google or, naturally, Microsoft itself.
As Galgon mentioned, the past couple of years have seen huge strides in the performance of speaker recognition systems. Developers can now take advantage of the technology Microsoft has put together in this area.