
Xiaomi Open-Sources Its First Native End-to-End Large Speech Model

Breakings · Sep 19 09:15

On September 19, Xiaomi officially open-sourced its first native end-to-end speech model, Xiaomi-MiMo-Audio. Built on an innovative pre-training architecture and trained on hundreds of millions of hours of data, the model is, according to Xiaomi, the first to achieve few-shot generalization in the speech domain through in-context learning (ICL), and significant "emergent" behavior was observed during its pre-training.

According to reports, MiMo-Audio outperforms other open-source models of similar size on multiple standard benchmarks, including general speech understanding and spoken dialogue, achieving the best results among 7B-parameter models. On the standard test set of the audio understanding benchmark MMAU, MiMo-Audio outperforms Google's closed-source speech model Gemini-2.5-Flash. On the S2T task of Big Bench Audio, a benchmark targeting complex audio reasoning, MiMo-Audio also surpasses OpenAI's closed-source speech model GPT-4o-Audio-Preview.

The translation is provided by third-party software.


The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.