The weakness Microsoft would rather not mention: Windows 10 speech recognition

Tencent Digital (Wen Xin) reports that, according to PCWorld, Windows has a feature Microsoft would rather not talk about. Windows lets users write and draw with a stylus, sign in with their face through Windows Hello (or use it to secure web logins), and even ask Cortana (known as Xiao Na in China) to set reminders. But there is one feature Microsoft clearly does not want people to use: its speech recognition engine, which lets users give commands to the system or dictate and edit documents by voice.

Microsoft's reluctance to promote Windows speech recognition goes back about 10 years, to the public demonstration of Windows Vista's speech input by Microsoft product manager Shanen Boettcher that went badly wrong. Since then, Windows voice input has kept a very low profile, and few users today even know that Windows has it.

If Windows is ever going to compete seriously in voice input, now seems to be the time: advances in computing and artificial intelligence have laid a much better foundation for it.

Asked about the future of voice input in Office, Harry Shum, the Microsoft executive vice president responsible for speech recognition research, Cortana and Bing, said: "That is a big question. It is incomprehensible that voice input does not play a more important role."

Why speech recognition is not perfect

Some users still picture voice input at the level of the Apple Newton PDA lampooned in the "Doonesbury" comic strip, which turned "I am writing a test sentence" into "Siam fighting atomic sentry". That impression is understandable: Windows Speech Recognition still runs on Microsoft Speech Recognizer 8.0, technology that has barely changed since Vista. Shum called it "grandfather"-level technology.

The hardware, however, has changed dramatically, PCWorld notes: listening to and interpreting speech places far less of a burden on a PC than it did 10 years ago. The integrated microphone arrays in PCs such as the Surface Book are good enough to achieve high accuracy without a dedicated speech-recognition microphone. But is voice input technology now ready for the general public?

When dictating a 1,028-word article, a 95% accuracy rate still leaves the user correcting more than 50 errors. In PCWorld's test, Windows speech recognition reached 93.6% accuracy, which is not high and is lower than the dedicated dictation software tested. Windows also has an odd habit of typing the word "comma" instead of inserting a comma. Opinions in the voice input community differ on how much this relatively minor error matters.

That is not the whole story, however. Anyone who has used dictation software knows that the key to accuracy is training. Over time, the software learns the user's accent: whether the "a" in "apricot" is pronounced as in "bad" or as in "ape", and how to filter out unconscious hesitations and verbal stumbles. Microsoft employees have claimed that with proper training, Windows speech recognition can reach 99% accuracy. Ten errors in 1,000 words is not bad.
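To make the accuracy figures above concrete: if each word is assumed to be recognized independently at the stated accuracy, the expected number of corrections is simply the word count times the error rate (1 - accuracy). The following is a minimal Python sketch under that simplifying assumption; the function name `expected_errors` is illustrative and does not come from the source.

```python
def expected_errors(word_count: int, accuracy: float) -> float:
    """Expected number of misrecognized words, assuming each word is
    independently recognized correctly with probability `accuracy`."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return word_count * (1.0 - accuracy)


# Figures quoted in the article: a 1,028-word test at 95% and 93.6%,
# and the claimed 99% after training, measured over 1,000 words.
for words, acc in [(1028, 0.95), (1028, 0.936), (1000, 0.99)]:
    print(f"{words} words at {acc:.1%} accuracy -> about {expected_errors(words, acc):.0f} errors")
```

This works out to roughly 51 errors at 95%, about 66 at 93.6% and 10 at 99%, which is why a few percentage points of accuracy matter so much for long documents.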

Yet few users are willing to spend time training speech recognition software. Windows Speech Recognition asks the user to read a series of sentences for about 10 minutes, which feels dated. Cortana and Siri require no such training because they have already been trained on millions of speech samples.

Cortana (available on both PCs and phones) recognizes speech far better than Windows Speech Recognition because it draws on the computing power of Microsoft's cloud services. Microsoft analyzes the user's voice and links it with other data to produce the intelligence at the heart of Cortana.

Microsoft does value speech recognition

Given Cortana's strong performance, you might expect speech to have been a centerpiece of last week's Microsoft Ignite conference. Yet Ignite had no sessions on speech input and only one on speech recognition. Even so, Microsoft CEO Satya Nadella called speech recognition a key element of Microsoft's future in his keynote.

Skype Translator is one example. According to Nadella, it relies on three areas of research: speech recognition, speech synthesis and machine translation. In his keynote, Nadella said Microsoft's speech recognition algorithm has a word-error rate of 6.9%, which is hardly impressive: it corresponds to an accuracy of only 93.1%. If Microsoft is serious about productivity software, PCWorld argues, the future of speech recognition on the PC is not just booking a hotel in Bangladesh over Skype, but writing, with the voice rather than the fingers.
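Word-error rate is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the recognizer's output and a reference transcript, divided by the number of words in the reference. The article's conversion of a 6.9% word-error rate into 93.1% accuracy treats accuracy as 1 - WER. This is a simplification, since insertions can push WER above 100%. The article does not say how Microsoft scored its result. The Python sketch below is therefore only an illustration, assuming simple lowercase whitespace tokenization (real scoring tools also normalize punctuation). The function name `word_error_rate` is hypothetical. It applies the sketch to the Doonesbury sentence quoted earlier.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as the word-level Levenshtein distance (simple whitespace tokens)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


wer = word_error_rate("I am writing a test sentence", "Siam fighting atomic sentry")
print(f"WER = {wer:.1%}, accuracy = {1 - wer:.1%}")
```

For the Doonesbury pair, none of the four output words matches the six-word reference, so it takes four substitutions and two deletions, giving a WER of 100% and an accuracy of 0%. By the same convention, a 6.9% WER means roughly 7 errors per 100 reference words.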

Source: PCWorld

