Identifying speakers and languages in audio/video transcripts
Vocapia's VoxSigma Speech-to-Text software provides large vocabulary continuous speech recognition in multiple languages and for various types of audio data. It can transcribe large quantities of audio and video documents, including broadcast data, in real time or in batch mode, and the suite includes audio segmentation and partitioning, speaker identification, and language recognition. The REST Speech-to-Text API is available as a web service over HTTPS and offers full speech transcription, audio indexing, and speech-text alignment. Language identification and speaker diarization turn raw audio into structured, searchable XML documents, which makes the content of video documents easier to access. Applications include broadcast and telephone data mining, speech analytics, media monitoring, media asset management, speech transcription, and subtitling. The software supports more than 82 languages, and clients can create models for the set of languages they need.
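To illustrate what "structured and searchable XML" can mean in practice, here is a minimal Python sketch that reads a transcript in which each speech segment carries a speaker label, a language code, and start and end times. The sample document, the element name `SpeechSegment`, and the attributes `spkid`, `lang`, `stime`, and `etime` are assumptions made for this example, not a confirmed description of the VoxSigma output schema. The actual format should be taken from Vocapia's documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript: element and attribute names are illustrative
# assumptions, not the documented VoxSigma schema.
SAMPLE_XML = """<AudioDoc>
  <SegmentList>
    <SpeechSegment spkid="S1" lang="eng" stime="0.00" etime="4.20">Good evening and welcome.</SpeechSegment>
    <SpeechSegment spkid="S2" lang="fre" stime="4.20" etime="7.90">Bonsoir et bienvenue.</SpeechSegment>
    <SpeechSegment spkid="S1" lang="eng" stime="7.90" etime="10.00">Here are the headlines.</SpeechSegment>
  </SegmentList>
</AudioDoc>"""


def speaking_time(xml_text):
    """Return total speech duration in seconds per (speaker, language) pair."""
    root = ET.fromstring(xml_text)
    totals = {}
    for seg in root.iter("SpeechSegment"):
        key = (seg.get("spkid"), seg.get("lang"))
        duration = float(seg.get("etime")) - float(seg.get("stime"))
        totals[key] = totals.get(key, 0.0) + duration
    return totals


def find_segments(xml_text, term):
    """Return (speaker, language, start time, text) for segments containing term."""
    root = ET.fromstring(xml_text)
    needle = term.lower()
    hits = []
    for seg in root.iter("SpeechSegment"):
        text = (seg.text or "").strip()
        if needle in text.lower():
            hits.append((seg.get("spkid"), seg.get("lang"),
                         float(seg.get("stime")), text))
    return hits


if __name__ == "__main__":
    for (speaker, lang), seconds in sorted(speaking_time(SAMPLE_XML).items()):
        print(f"{speaker} [{lang}]: {seconds:.2f} s")
    for hit in find_segments(SAMPLE_XML, "headlines"):
        print(hit)
```

Because speaker and language labels sit on each time-stamped segment, questions such as "who spoke, in which language, and when was a given word said" reduce to simple filters over the document.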