How to use Microsoft Speech Recognition Engine

Introducing Microsoft Speech Recognition Engine
A speech recognition infrastructure was introduced with version 3.0 of the
Microsoft .NET Framework.
Applications use the System.Speech.Recognition namespace to access and extend
this basic speech recognition technology: they define algorithms for
identifying and acting on specific phrases or word patterns, and they manage the
run-time behavior of the speech infrastructure.

With this infrastructure, you can add voice commands to your applications with
a few lines of code. A simple example application follows; for further
information about the speech recognition infrastructure, please visit the
System.Speech.Recognition Namespace page. In this article, I'm going to
demonstrate how to use the speech recognition engine in an application rather
than deal with the theory of speech recognition.

Using System.Speech
We’re going to develop a sample application that will support 3 voice commands:

  1. What time is it: Displays the current date and time in a message box.
  2. Launch Visual Studio: Launches Microsoft Visual Studio.
  3. Shut Down: Quits the speech recognition application.

Our application uses 4 methods to set up speech recognition support:

  1. IsFramework3Installed: Because speech recognition requires at least .NET
    Framework 3.0, we check the installed framework versions with the
    IsFramework3Installed function.
  2. InitSpeechEngine: If .NET Framework 3.0 or higher is installed, we create
    a SpeechRecognitionEngine instance with the InitSpeechEngine function.
  3. LoadGrammars: After initializing the SpeechRecognitionEngine, we define
    the grammars and load them into the SpeechRecognitionEngine.
  4. StartRecognition: This method subscribes to the events that will be
    handled for speech recognition and starts a thread that begins
    asynchronous speech recognition.

We're going to work with 2 events of the SpeechRecognitionEngine object.

  1. SpeechRecognized:
    Raised when one or more speech recognition hypotheses score high enough
    for the recognition engine to accept them. Detailed information about the
    recognized phrase is available through the Result property of the event
    arguments.
  2. RecognizeCompleted:
    Raised upon completion of a recognition operation initiated by a call to
    one of the RecognizeAsync overloads on a SpeechRecognitionEngine instance.
    Its event data is a RecognizeCompletedEventArgs object, which derives from
    System.EventArgs.

IsFramework3Installed()
Our first method is IsFramework3Installed. I wrote the
IsFrameworkVersionInstalled method, which queries the registry for the specified
.NET Framework version by looking for a matching subkey under
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP. The
IsFramework3Installed method uses IsFrameworkVersionInstalled to enforce the
.NET Framework 3.0 requirement.

        /// <summary>
        /// Detects whether the specified .NET Framework version is installed.
        /// </summary>
        /// <param name="version">Version number, for example "3.0".</param>
        /// <returns>A Boolean value that indicates whether the specified .NET Framework version is installed.</returns>
        private Boolean IsFrameworkVersionInstalled(String version)
        {
            Boolean result = false;

            // Each installed version has a subkey such as "v3.0" or "v3.5" under this key.
            // RegistryKey and Registry require: using Microsoft.Win32;
            using (RegistryKey componentsKey = Registry.LocalMachine.OpenSubKey(
                @"SOFTWARE\Microsoft\NET Framework Setup\NDP"))
            {
                if (componentsKey != null)
                {
                    foreach (String current in componentsKey.GetSubKeyNames())
                    {
                        if (!String.IsNullOrEmpty(current) &&
                            current.StartsWith(String.Format("v{0}", version), StringComparison.OrdinalIgnoreCase))
                        {
                            result = true;
                            break;
                        }
                    }
                }
            }

            return result;
        }

        /// <summary>
        /// Detects whether .NET Framework 3.0 or higher is installed.
        /// </summary>
        /// <returns>A Boolean value that indicates whether .NET Framework 3.0 or higher is installed.</returns>
        private Boolean IsFramework3Installed()
        {
            // Checking "3.0" alone would miss machines that have only a later
            // version, so the 3.5 and 4.x keys ("v4", "v4.0") are accepted too.
            return IsFrameworkVersionInstalled("3.0")
                || IsFrameworkVersionInstalled("3.5")
                || IsFrameworkVersionInstalled("4");
        }
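
Matching key names with StartsWith is simple, but it only finds the exact versions you ask for. A more general alternative is to parse each subkey name under NDP (such as "v2.0.50727", "v3.5", or "v4") into a System.Version and compare it with a minimum version. The following is a minimal sketch under that assumption; FrameworkVersions and IsAtLeast are hypothetical names, and the subkey names are passed in so the logic can be tried without touching the registry. In the application you would pass componentsKey.GetSubKeyNames(). Note that Version.TryParse requires .NET Framework 4 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decides whether any subkey name under
// HKLM\SOFTWARE\Microsoft\NET Framework Setup\NDP denotes a version >= minimum.
public static class FrameworkVersions
{
    public static bool IsAtLeast(IEnumerable<string> subKeyNames, Version minimum)
    {
        foreach (string name in subKeyNames)
        {
            // Version keys look like "v2.0.50727", "v3.5" or "v4"; skip anything else (e.g. "CDF").
            if (string.IsNullOrEmpty(name) || !name.StartsWith("v", StringComparison.OrdinalIgnoreCase))
                continue;

            string number = name.Substring(1);

            // Version.TryParse needs at least "major.minor", so "4" becomes "4.0".
            if (number.IndexOf('.') < 0)
                number += ".0";

            Version parsed;
            if (Version.TryParse(number, out parsed) && parsed >= minimum)
                return true;
        }
        return false;
    }
}
```

With this helper, IsFramework3Installed could become a single call to FrameworkVersions.IsAtLeast(componentsKey.GetSubKeyNames(), new Version(3, 0)).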

 


InitSpeechEngine()
The InitSpeechEngine method checks the .NET Framework requirement, creates a new
SpeechRecognitionEngine instance, and then calls the LoadGrammars method, which
loads our dictionary into the SpeechRecognitionEngine.

        /// <summary>
        /// Initializes the speech recognition engine and loads the grammars.
        /// </summary>
        /// <returns>A Boolean value that indicates whether the initialization is successful.</returns>
        private Boolean InitSpeechEngine()
        {
            // result object
            Boolean result = false;

            try
            {
                // check whether .NET Framework 3.0 or higher is installed
                if (IsFramework3Installed())
                {
                    // instantiate the speech recognition engine
                    // (recognizer is a private SpeechRecognitionEngine field of the form)
                    recognizer = new SpeechRecognitionEngine();

                    // load the grammars
                    LoadGrammars();

                    // set the result
                    result = true;
                }
                else
                {
                    // framework requirement is not satisfied
                    MessageBox.Show(".NET Framework 3.0 or higher is required to use the speech recognition engine.");
                    result = false;
                }
            }
            catch
            {
                // the speech recognition engine could not be initialized
                MessageBox.Show("An error occurred while creating the speech recognition engine.");
                result = false;
            }

            return result;
        }


LoadGrammars()
SpeechRecognitionEngine requires a grammar, in other words a dictionary of
supported voice commands. In our sample, the dictionary contains our 3 voice
commands.

        /// <summary>
        /// Loads the grammars (the dictionary of supported voice commands).
        /// </summary>
        private void LoadGrammars()
        {
            // No try/catch here: exceptions propagate to InitSpeechEngine,
            // which reports them instead of silently ignoring the failure.
            if (recognizer != null)
            {
                // The Choices object is a list of alternative items that make up an element in a grammar.
                // These phrases must match the ones tested in recognizer_SpeechRecognized.
                Choices choices = new Choices(new[] { "What time is it", "Launch Visual Studio", "Shut Down" });

                // The GrammarBuilder object provides an easy-to-use mechanism for
                // constructing complicated Grammar objects from simple inputs.
                GrammarBuilder grammarBuilder = new GrammarBuilder(choices);

                // The Grammar object provides run-time support for obtaining and
                // managing speech grammar information.
                Grammar grammar = new Grammar(grammarBuilder);

                // load the grammar object into the recognizer
                recognizer.LoadGrammar(grammar);
            }
        }
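
The phrases loaded into the grammar must match the strings compared in recognizer_SpeechRecognized exactly; otherwise a command can be recognized but never handled. One way to keep the two in sync is to define the phrases once. The sketch below assumes a hypothetical VoiceCommands class; it is an illustration, not part of System.Speech.

```csharp
// Hypothetical helper class: one place that defines every supported phrase.
public static class VoiceCommands
{
    public const string WhatTimeIsIt = "What time is it";
    public const string LaunchVisualStudio = "Launch Visual Studio";
    public const string ShutDown = "Shut Down";

    // The full phrase list, e.g. for: new Choices(VoiceCommands.All)
    public static readonly string[] All = { WhatTimeIsIt, LaunchVisualStudio, ShutDown };
}
```

LoadGrammars could then build its Choices with new Choices(VoiceCommands.All), and because the fields are const, recognizer_SpeechRecognized could use them as case labels in a switch on e.Result.Text.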


StartRecognition()
The StartRecognition method calls InitSpeechEngine, subscribes to the
SpeechRecognized and RecognizeCompleted events of the SpeechRecognitionEngine,
and starts a new thread that selects the default audio device as input and
begins asynchronous recognition.

        /// <summary>
        /// Starts speech recognition.
        /// </summary>
        private void StartRecognition()
        {
            try
            {
                if (InitSpeechEngine())
                {
                    if (recognizer != null)
                    {
                        /*
                            We use 2 basic events for speech recognition:

                            1. SpeechRecognized event: raised when one or more speech
                               recognition hypotheses score high enough for the
                               recognizer to accept them.

                            2. RecognizeCompleted event: raised upon completion of a
                               recognition operation initiated by a call to one of the
                               RecognizeAsync overloads on a SpeechRecognitionEngine
                               instance. Its RecognizeCompletedEventArgs derives from
                               System.EventArgs.
                        */
                        recognizer.SpeechRecognized += recognizer_SpeechRecognized;
                        recognizer.RecognizeCompleted += recognizer_RecognizeCompleted;

                        // Select the input and start the first asynchronous recognition
                        // on a worker thread (Thread requires: using System.Threading;)
                        Thread t1 = new Thread(delegate()
                        {
                            recognizer.SetInputToDefaultAudioDevice();
                            recognizer.RecognizeAsync(RecognizeMode.Single);
                        });

                        t1.Start();
                    }
                }
            }
            catch
            {
                // Errors are ignored here; InitSpeechEngine reports its own failures.
                // Exceptions thrown inside t1 are not caught by this block.
            }
        }


SpeechRecognized Event
SpeechRecognized events are raised when one or more speech recognition
hypotheses score high enough for the recognition engine to accept them. Detailed
information about the recognized phrase is available through the Result property
of the event arguments.

        /// <summary>
        /// Handles the SpeechRecognized event.
        /// SpeechRecognized events are raised when one or more speech
        /// recognition hypotheses score high enough for the
        /// recognition engine to accept them.
        /// Detailed information about the recognized phrase is available
        /// through the Result property.
        /// </summary>
        /// <param name="sender">The SpeechRecognitionEngine that raised the event.</param>
        /// <param name="e">Event data, including the recognition Result.</param>
        private void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            if (e.Result.Text == "What time is it")
            {   // 'What time is it' recognized: show the local date and time
                MessageBox.Show(DateTime.Now.ToString());
            }
            else if (e.Result.Text == "Launch Visual Studio")
            {   // 'Launch Visual Studio' recognized
                System.Diagnostics.Process.Start("devenv");
            }
            else if (e.Result.Text == "Shut Down")
            {   // 'Shut Down' recognized
                Application.Exit();
            }
        }
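
The if/else chain works for three commands. As the list grows, a table that maps each phrase to an action can be easier to maintain, and it is a natural place to ignore low-confidence results, which helps with the accuracy problems discussed at the end of this article. The sketch below is hypothetical (CommandDispatcher, Register, and TryDispatch are illustrative names); only e.Result.Text and e.Result.Confidence, a float between 0 and 1, come from System.Speech.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative to the if/else chain: maps each phrase to an action
// and ignores results whose confidence is below a chosen threshold.
public class CommandDispatcher
{
    private readonly Dictionary<string, Action> handlers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);
    private readonly float minConfidence;

    public CommandDispatcher(float minConfidence)
    {
        this.minConfidence = minConfidence;
    }

    // Associates a phrase (matched case-insensitively) with an action.
    public void Register(string phrase, Action action)
    {
        handlers[phrase] = action;
    }

    // Returns true if an action ran; false for unknown or low-confidence phrases.
    public bool TryDispatch(string text, float confidence)
    {
        Action action;
        if (confidence < minConfidence || text == null || !handlers.TryGetValue(text, out action))
            return false;

        action();
        return true;
    }
}
```

In the form, you might create the dispatcher once, register the three phrases, and reduce the event handler to dispatcher.TryDispatch(e.Result.Text, e.Result.Confidence). A threshold such as 0.6f is an assumption to tune for your microphone and environment, not a recommended value.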


RecognizeCompleted Event
Each recognition operation handles a single voice command. To keep listening,
we start a new recognition operation each time the current one completes.

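        /// <summary>
        /// Handles the RecognizeCompleted event.
        /// RecognizeCompletedEventArgs derives from System.EventArgs
        /// and is generated upon completion of recognition operations
        /// initiated by calls to the RecognizeAsync overloads on an
        /// instance of SpeechRecognitionEngine.
        /// </summary>
        /// <param name="sender">The SpeechRecognitionEngine that raised the event.</param>
        /// <param name="e">Event data for the completed recognition operation.</param>
        private void recognizer_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
        {
            // Start listening for the next voice command, unless the operation
            // failed or was canceled (restarting then could repeat the same error).
            if (e.Error == null && !e.Cancelled)
            {
                recognizer.RecognizeAsync();
            }
        }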
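
To see why restarting in RecognizeCompleted keeps the application listening, the sketch below replaces the engine with a hypothetical SimulatedRecognizer that reads utterances from a queue instead of a microphone and raises its events synchronously. It mimics RecognizeMode.Single: each call handles one utterance and then reports completion, and the completed handler starts the next recognition until "Shut Down" is heard. None of these types are part of System.Speech.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SpeechRecognitionEngine, used only to show the
// "recognize one phrase, then restart in the completed handler" pattern.
public class SimulatedRecognizer
{
    private readonly Queue<string> utterances;

    public event Action<string> SpeechRecognized;
    public event Action RecognizeCompleted;

    public SimulatedRecognizer(IEnumerable<string> input)
    {
        utterances = new Queue<string>(input);
    }

    // Like RecognizeMode.Single: handle one utterance, then report completion.
    public void RecognizeOnce()
    {
        if (utterances.Count == 0)
            return; // no more input: stop without raising events

        string text = utterances.Dequeue();
        if (SpeechRecognized != null) SpeechRecognized(text);
        if (RecognizeCompleted != null) RecognizeCompleted();
    }
}

public static class RestartLoopDemo
{
    // Returns the phrases handled before "Shut Down" stopped the loop.
    public static List<string> Run(IEnumerable<string> input)
    {
        var handled = new List<string>();
        bool shuttingDown = false;
        var recognizer = new SimulatedRecognizer(input);

        recognizer.SpeechRecognized += text =>
        {
            handled.Add(text);
            if (text == "Shut Down")
                shuttingDown = true;
        };

        // Mirrors recognizer_RecognizeCompleted: start listening for the next command.
        recognizer.RecognizeCompleted += () =>
        {
            if (!shuttingDown)
                recognizer.RecognizeOnce();
        };

        recognizer.RecognizeOnce(); // first recognition, as in StartRecognition
        return handled;
    }
}
```

Because the simulated events are raised synchronously, each restart nests one call deeper; the real engine raises RecognizeCompleted asynchronously, so this is only a model of the control flow.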

Speech recognition can be a great feature, but it can easily turn into a
nightmare. For instance, a larger dictionary, low-quality sound cards and
microphones, loud background noise, and homonyms all decrease recognition
accuracy. Speech recognition can also cause performance issues, because it is a
demanding task for the CPU.

Please note that what we do here is not speech understanding. We're just
converting an analog input to digital format and matching that digital sample
against the phrases in our grammar. Speech understanding is the dream of many
researchers, but I guess we need more time, CPU power, and research to reach
that point.
