
Haitian Ruisheng (688787.SH): Algorithm team and Tsinghua University's Speech and Audio Technology Laboratory are making full use of the company's large reserve of high-quality speech datasets

Gelonghui Finance ·  Sep 9 15:33

Gelonghui, September 9 | Haitian Ruisheng (688787.SH) said on the investor interactive platform that the company is a professional provider of artificial intelligence training data services and is currently building out its business in AIGC/large model data services. On the one hand, the company continues to increase R&D investment in large model data and to expand its data reserves in this field. As of June 30, 2024, it had completed, or was continuing to build, large datasets in multiple fields, including the “Large Language Model Chinese Dialogue Pre-Training Dataset”, the “Large Speech Model (Multilingual) Pre-Training and Fine-Tuning Dataset”, the “Large Vision Model (Image-Text) Pre-Training and Fine-Tuning Dataset”, and the “Large Vision Model (Video-Text) Pre-Training and Fine-Tuning Dataset”. At the same time, to keep pace with the direction of large model technology, in the first half of 2024 the company carried out forward-looking research into methods for producing large model data at scale, in support of expanding its large model business. The company and Tsinghua University have jointly launched a research and development program on multilingual large speech models. The project will develop proprietary multilingual data cleaning technology based on the latest large speech model frameworks and train multiple large speech models of different scales, with the aim of improving the efficiency and accuracy of multilingual speech data processing. The company's algorithm team and the Tsinghua University Speech and Audio Technology Laboratory are making full use of the company's large reserve of high-quality speech datasets (more than 200 languages and dialects, and nearly 300,000 hours of speech datasets with proprietary intellectual property rights), drawing on their respective strengths to promote broader and deeper application of large model technology in data production.


