How to use Microsoft Speech Recognition Engine
Introducing Microsoft Speech Recognition Engine
A speech recognition infrastructure was introduced with version 3.0 of the
Microsoft .NET Framework.
Applications use the System.Speech.Recognition namespace to access and extend
this basic speech recognition technology, by defining algorithms for identifying
and acting on specific phrases or word patterns, and by managing the run-time
behavior of this speech infrastructure.
With this infrastructure, you can add voice command support to your
applications with only a few lines of code. A simple example application
follows. For further information about the speech recognition infrastructure,
please visit the System.Speech.Recognition Namespace page. In this article, I’m
going to demonstrate how the speech recognition engine can be used in
applications rather than dealing with the theory of speech recognition.
Using System.Speech
We’re going to develop a sample application that will support 3 voice commands:
- What time is it: Writes the current date and time to the console.
- Launch Visual Studio: Launches Microsoft Visual Studio.
- Shut Down Recognizer: Quits the speech recognition application.
Our application will have 4 methods to establish the speech recognition support.
- IsFramework3Installed: Since speech recognition requires at least .NET
Framework 3.0, we check the installed framework versions with the
IsFramework3Installed function.
- InitSpeechEngine: If .NET Framework 3.0 or higher is installed, we create a
SpeechRecognitionEngine instance with the InitSpeechEngine function.
- LoadGrammars: After initializing the SpeechRecognitionEngine, we define the
grammars and load them into the SpeechRecognitionEngine.
- StartRecognition: This method subscribes to the events that will be handled
during speech recognition and starts a thread which handles asynchronous
speech recognition.
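The way the four methods fit together can be outlined with a small skeleton. This is a hypothetical sketch, not the original listing: each body is a stub that only records its name, and the frameworkInstalled constructor flag stands in for the real registry check, so the call order can be followed without a microphone or System.Speech.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical skeleton of the sample's control flow. The bodies are stubs
// that only record which method ran; the real versions create and drive a
// SpeechRecognitionEngine.
public class SpeechAppSkeleton
{
    public readonly List<string> Calls = new List<string>();
    private readonly bool frameworkInstalled; // stand-in for the registry check

    public SpeechAppSkeleton(bool frameworkInstalled)
    {
        this.frameworkInstalled = frameworkInstalled;
    }

    public bool IsFramework3Installed()
    {
        Calls.Add("IsFramework3Installed");
        return frameworkInstalled;
    }

    // Checks the framework requirement, then (in the real code) creates the
    // engine and loads the grammars into it.
    public bool InitSpeechEngine()
    {
        Calls.Add("InitSpeechEngine");
        if (!IsFramework3Installed())
            return false;
        LoadGrammars();
        return true;
    }

    public void LoadGrammars() { Calls.Add("LoadGrammars"); }

    public void StartRecognition() { Calls.Add("StartRecognition"); }

    // Recognition only starts if initialization succeeded.
    public bool Run()
    {
        if (!InitSpeechEngine())
            return false;
        StartRecognition();
        return true;
    }
}
```

If the framework check fails, Run stops before StartRecognition, so no engine events are ever wired up.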
We’re going to work with two events of the SpeechRecognitionEngine object.
- SpeechRecognized: SpeechRecognized events are raised when the score of one or
more speech recognition hypotheses is high enough for the recognition engine to
accept the result. Detailed information about the recognized phrase is
available through the Result property.
- RecognizeCompleted: This event is raised when a recognition operation started
by one of the RecognizeAsync overloads on a SpeechRecognitionEngine instance
completes. Its argument, RecognizeCompletedEventArgs, ultimately derives from
System.EventArgs.
IsFramework3Installed()
Our first method is IsFramework3Installed. I wrote the
IsFrameworkVersionInstalled helper, which queries the registry for the specified
.NET Framework version. IsFramework3Installed calls IsFrameworkVersionInstalled
to check the .NET Framework 3.0 requirement.
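A minimal sketch of the two methods follows, assuming the registry layout Microsoft documents for detecting older framework versions: for .NET Framework 3.0, a DWORD named InstallSuccess set to 1 under HKLM\SOFTWARE\Microsoft\NET Framework Setup\NDP\v3.0\Setup. Treat that key path and value name as assumptions to verify on your target machines; the FrameworkCheck class name is hypothetical.

```csharp
using System;
using Microsoft.Win32;

public static class FrameworkCheck
{
    // Returns true if HKLM\<subKeyPath> holds a DWORD named valueName equal to 1.
    // Any failure to read the registry (missing key, missing permission, or a
    // platform without a registry) is treated as "not installed".
    public static bool IsFrameworkVersionInstalled(string subKeyPath, string valueName)
    {
        try
        {
            using (RegistryKey key = Registry.LocalMachine.OpenSubKey(subKeyPath))
            {
                if (key == null)
                    return false;
                object value = key.GetValue(valueName);
                return value is int && (int)value == 1;
            }
        }
        catch (Exception)
        {
            return false;
        }
    }

    // Assumption: .NET Framework 3.0 records a successful install as
    // InstallSuccess = 1 under NDP\v3.0\Setup.
    public static bool IsFramework3Installed()
    {
        return IsFrameworkVersionInstalled(
            @"SOFTWARE\Microsoft\NET Framework Setup\NDP\v3.0\Setup",
            "InstallSuccess");
    }
}
```

Reporting any registry failure as "not installed" errs on the side of not starting the speech engine.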
InitSpeechEngine()
The InitSpeechEngine method checks the .NET Framework requirement, initializes a
new instance of SpeechRecognitionEngine, and then calls the LoadGrammars method,
which loads our dictionary into the SpeechRecognitionEngine.
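A sketch of InitSpeechEngine under a few stated assumptions: the framework check is passed in as a Func&lt;bool&gt; (a hypothetical signature that keeps the example self-contained), an en-US recognizer is installed, and the default microphone is the input. The SpeechRecognitionEngine(CultureInfo) constructor, SetInputToDefaultAudioDevice and LoadGrammar are System.Speech.Recognition members; EngineSetup is a made-up class name.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

public static class EngineSetup
{
    // Hypothetical signature: the framework check is injected so this sketch
    // stays self-contained; in the article it is IsFramework3Installed().
    // Returns null when the engine cannot be used.
    public static SpeechRecognitionEngine InitSpeechEngine(Func<bool> isFramework3Installed)
    {
        if (!isFramework3Installed())
            return null;

        // Assumption: an en-US recognizer is installed on this machine.
        var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US"));

        // Listen to the system's default microphone.
        recognizer.SetInputToDefaultAudioDevice();

        LoadGrammars(recognizer);
        return recognizer;
    }

    // Loads the three voice commands as a single grammar.
    private static void LoadGrammars(SpeechRecognitionEngine recognizer)
    {
        var commands = new Choices("What time is it", "Launch Visual Studio", "Shut Down Recognizer");
        recognizer.LoadGrammar(new Grammar(new GrammarBuilder(commands)));
    }
}
```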
LoadGrammars()
SpeechRecognitionEngine requires a dictionary of supported voice commands.
In our sample, we’re going to have 3 voice commands in our dictionary.
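One way to express that three-command dictionary is a Choices list wrapped in a GrammarBuilder and a Grammar. This is a sketch rather than the original listing; the CommandGrammar class and the grammar name VoiceCommands are hypothetical.

```csharp
using System.Speech.Recognition;

public static class CommandGrammar
{
    // The three phrases the sample responds to.
    public static readonly string[] Commands =
    {
        "What time is it",
        "Launch Visual Studio",
        "Shut Down Recognizer"
    };

    // Builds a grammar that matches exactly one of the command phrases.
    public static Grammar Build()
    {
        var choices = new Choices(Commands);
        var builder = new GrammarBuilder(choices);
        return new Grammar(builder) { Name = "VoiceCommands" }; // name is optional
    }

    public static void LoadGrammars(SpeechRecognitionEngine recognizer)
    {
        recognizer.LoadGrammar(Build());
    }
}
```

The grammar only accepts the listed phrases, so adding a voice command means adding one more string to Commands.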
StartRecognition()
The StartRecognition method subscribes handlers to the SpeechRecognized and
RecognizeCompleted events of the SpeechRecognitionEngine and starts a new thread
which handles the speech recognition.
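A sketch of StartRecognition, assuming the handlers are passed in. It differs from the article in one respect: instead of creating its own thread, it relies on RecognizeAsync, which returns immediately and raises the events while recognition runs in the background. RecognitionStarter is a hypothetical class name.

```csharp
using System;
using System.Speech.Recognition;

public static class RecognitionStarter
{
    public static void StartRecognition(
        SpeechRecognitionEngine recognizer,
        EventHandler<SpeechRecognizedEventArgs> onRecognized,
        EventHandler<RecognizeCompletedEventArgs> onCompleted)
    {
        if (recognizer == null)
            throw new ArgumentNullException(nameof(recognizer));

        // Subscribe before starting so no event is missed.
        recognizer.SpeechRecognized += onRecognized;
        recognizer.RecognizeCompleted += onCompleted;

        // RecognizeAsync returns immediately; the engine raises the events
        // above while recognition runs in the background.
        recognizer.RecognizeAsync(RecognizeMode.Single);
    }
}
```

In a console application, keep Main alive after this call (for example with Console.ReadLine() or a wait handle), or the process exits before anything is recognized.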
SpeechRecognized Event
SpeechRecognized events are raised when the score of one or more speech
recognition hypotheses is high enough for the recognition engine to accept the
result. Detailed information about the recognized phrase is available through
the Result property.
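The handler below sketches how the three commands might be dispatched. The matching lives in a plain Execute method so it can be exercised without a microphone. Launching devenv.exe assumes the shell can resolve Visual Studio on your machine, and CommandHandler and StopRequested are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Speech.Recognition;

public class CommandHandler
{
    // Set when "Shut Down Recognizer" is heard, so the restart logic can stop.
    public bool StopRequested { get; private set; }

    // Runs the action for a recognized phrase; returns false for unknown text.
    public bool Execute(string text)
    {
        switch ((text ?? string.Empty).Trim().ToLowerInvariant())
        {
            case "what time is it":
                Console.WriteLine(DateTime.Now);
                return true;
            case "launch visual studio":
                // Assumption: devenv.exe can be resolved by the shell.
                Process.Start(new ProcessStartInfo("devenv.exe") { UseShellExecute = true });
                return true;
            case "shut down recognizer":
                StopRequested = true;
                return true;
            default:
                return false;
        }
    }

    // Handler for SpeechRecognitionEngine.SpeechRecognized.
    public void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        Console.WriteLine("Recognized \"{0}\" (confidence {1:F2})",
            e.Result.Text, e.Result.Confidence);
        Execute(e.Result.Text);
    }
}
```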
recognizer_RecognizeCompleted Event
After recognition of the current voice command completes, we continue by
listening for the next voice command.
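A sketch of the restart logic: after a single-mode recognition finishes, the handler calls RecognizeAsync again unless the operation failed, was cancelled, or a stop was requested. The StopRequested flag and the ShouldRestart helper are hypothetical additions that give "Shut Down Recognizer" a way to break the loop; RestartingRecognizer is a made-up class name.

```csharp
using System;
using System.Speech.Recognition;

public class RestartingRecognizer
{
    private readonly SpeechRecognitionEngine recognizer;

    // Hypothetical flag: set it (for example when "Shut Down Recognizer" is
    // heard) so the completed handler stops restarting recognition.
    public volatile bool StopRequested;

    public RestartingRecognizer(SpeechRecognitionEngine recognizer)
    {
        this.recognizer = recognizer;
        recognizer.RecognizeCompleted += recognizer_RecognizeCompleted;
    }

    // Restart only after a normal, uncancelled completion.
    public static bool ShouldRestart(Exception error, bool cancelled, bool stopRequested)
    {
        return error == null && !cancelled && !stopRequested;
    }

    private void recognizer_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
    {
        if (e.Error != null)
            Console.WriteLine("Recognition failed: " + e.Error.Message);

        if (ShouldRestart(e.Error, e.Cancelled, StopRequested))
        {
            // The single-mode operation is over; listen for the next command.
            recognizer.RecognizeAsync(RecognizeMode.Single);
        }
    }
}
```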
Speech recognition can be a great feature, but it can easily turn into a
nightmare for us. For instance, a larger dictionary, low-quality sound cards and
microphones, loud background noise, and homonyms all decrease the accuracy of
speech recognition. Speech recognition may also cause performance issues, since
it is not a simple task for the CPU.
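One common way to blunt the accuracy problem, not part of the original sample, is to ignore results whose Confidence is low. The 0.6 threshold below is an arbitrary assumption to tune for your hardware and environment; ConfidenceFilter is a hypothetical name.

```csharp
using System;
using System.Speech.Recognition;

public static class ConfidenceFilter
{
    // Hypothetical cut-off; tune it for your microphone and environment.
    public const float MinConfidence = 0.6f;

    public static bool IsConfidentEnough(float confidence, float threshold = MinConfidence)
    {
        return confidence >= threshold;
    }

    // Handler for SpeechRecognized that drops low-confidence results.
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        if (!IsConfidentEnough(e.Result.Confidence))
        {
            Console.WriteLine("Ignoring low-confidence result: " + e.Result.Text);
            return;
        }
        Console.WriteLine("Accepted: " + e.Result.Text);
    }
}
```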
Please note that what we do here is not speech understanding. We’re just
converting an analog input to digital format and querying a database for that
digital sample. Speech understanding is still a dream for researchers, but I
guess we need more time, CPU power, and research to reach that point.




