xAI opens Grok STT and TTS audio APIs; overall STT word error rate drops to 6.9%
ME News reports that xAI has launched two standalone audio APIs, Grok STT (speech-to-text) and Grok TTS (text-to-speech). Both are built on the same audio stack that powers Grok Voice, Tesla in-car systems, and Starlink customer service, among others. STT offers REST batch transcription and WebSocket real-time streaming, with word-level timestamps, speaker diarization, multi-channel support, and inverse text normalization, and covers more than 25 languages. TTS supports inline tags for controlling emotion and prosody. xAI also published word error rate (WER) comparisons in which Grok leads in multiple scenarios, but these results have not yet been independently verified by a third party. Pricing: STT costs $0.10 per hour for batch processing and $0.20 per hour for streaming; TTS costs $4.20 per million characters.
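To put the quoted prices in context, here is a minimal cost-estimation sketch. The three rates come from the report above; the function and constant names are illustrative, and the sketch assumes billing is strictly linear in audio hours and characters, with no minimum charge, rounding, or volume discount, none of which the report confirms.

```python
# Rates as quoted in the report (USD). Names below are hypothetical helpers,
# not part of any xAI SDK.
STT_BATCH_PER_HOUR = 0.10
STT_STREAMING_PER_HOUR = 0.20
TTS_PER_MILLION_CHARS = 4.20


def stt_cost(audio_hours: float, streaming: bool = False) -> float:
    """Estimated STT cost, assuming linear per-hour billing."""
    rate = STT_STREAMING_PER_HOUR if streaming else STT_BATCH_PER_HOUR
    return audio_hours * rate


def tts_cost(characters: int) -> float:
    """Estimated TTS cost, assuming linear per-character billing."""
    return characters / 1_000_000 * TTS_PER_MILLION_CHARS


if __name__ == "__main__":
    print(f"500 h batch STT:     ${stt_cost(500):.2f}")
    print(f"500 h streaming STT: ${stt_cost(500, streaming=True):.2f}")
    print(f"2M characters TTS:   ${tts_cost(2_000_000):.2f}")
```

Under these assumptions, transcribing 500 hours of audio would cost about $50 in batch mode or $100 via streaming, and synthesizing 2 million characters would cost about $8.40.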