Voice Commands for Windows Phone 8.1

Introduction

The goal of this challenge is to discover how to allow users to deep launch your app using Voice Commands. Your app will use your users’ voice inputs to find tagged photographs on their device, display a specific set requested by the user, and allow your users to use Voice to take more photos—including tagged ones. This Quick Start will also introduce you to the powerful new PhraseTopic element, which allows the app to take advantage of the Voice Recognition cloud service to relay the recognized utterance.

Here are the three steps you will go through:
1. Create a Voice Command Definition file.
2. Update a simple Windows Phone app to take advantage of Voice Commands.
3. Update the app to allow more natural speech by using some of the Voice Command features.

Requirements
- PC running Windows 8.1
- Visual Studio 2013
- Windows Phone 8.1 SDK

Try out the final App – VoiceSnap

Start the solution “VoiceSnap” in Visual Studio 2013. Be sure this is the final “VoiceSnap” solution (not VoiceSnap Starter, which we’ll use shortly). Press ‘go’ in the debugger to deploy the solution to your Windows Phone and start the app. Once the application has deployed and opened, close it and press and hold the search button on the right hand side.
You’ll hear an audible “earcon.” Say “VoiceSnap, show my pictures.” The phone will launch into the app and navigate to the photo viewing page. To get a subset of your pictures press and hold the search button and say, “VoiceSnap, show me pictures of Madrid.” You can also say “VoiceSnap, take a picture.” Now a different page will launch and take a snapshot of what you’re aiming the device at.

Now that you’ve seen it work, let’s build it.
There are only a few short steps to enabling an app launch with Voice Commands.
1. Open the VoiceSnap Starter solution. This is the same photo app but without the Voice Commands added.
2. Create a Voice Command Definition XML file (aka a “VCD”).
3. Add the code to link the VCD to the app.
4. Add the code to use data from the voice commands to enrich your app.

It should take about 25 minutes to run through this Quick Start.

The files included in this Quick Start:
1. VoiceSnap solution. This is an example of how the app might look when you are done.
2. VoiceSnap Starter app solution. The solution you’ll use to do this QuickStart. It has a functional solution but no Voice Commands code or VCD.
3. An assets folder that contains:
a. VCD_81.XML
b. 10 tagged image files
Creating the project
Start out by opening the VoiceSnap Starter project.
This starter project contains a MainPage that takes photos, and a second page that will show pictures. The app will compile and run at this stage.
Go ahead and give it a try. Run or debug from Visual Studio (Ctrl+F5 or F5) to see the app. Either deploy to an emulator, or connect a phone and choose Device in the toolbar to run it on a Windows Phone 8.1 device.
You may have to enter a valid Live ID to get a developer license for Windows 8.1.

Note that you won’t be able to use Voice Commands yet as you were with the final project—we still need to add those!
Getting your app together
Add the capability
Within the package manifest (Package.appxmanifest), add the ID_CAP_MICROPHONE capability (shown as Microphone on the manifest designer's Capabilities tab), which is needed to use the Voice Commands feature.
You’ll find Package.appxmanifest in Solution Explorer.
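If you prefer to edit the manifest XML directly rather than use the designer, the declaration might look like the fragment below. This is a sketch, not the starter project's actual manifest: it assumes the usual Windows Phone 8.1 manifest layout, in which the microphone is declared as a device capability. Any capabilities the project already lists stay as they are.

```xml
<!-- Package.appxmanifest (fragment): sketch only, other entries omitted -->
<Capabilities>
  <!-- Lets the app receive speech input from the microphone -->
  <DeviceCapability Name="microphone" />
</Capabilities>
```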



Define what actions your app will respond to
Add the voice command definition file
Right-click the project name, select Add > Existing Item, and choose VCD_81.xml from the ‘Assets’ folder in the Voice Commands Quick Start folder.




What is all this stuff?

<CommandPrefix>
The CommandPrefix specifies an app name for the user to speak. In our example, the app name is “VoiceSnap.”

<Example>
A sample of what your app can do, as shown to your users in the system-wide help page. One example from VoiceSnap is “Show me a picture.”
Within a <Command> element, the example is for that command, and users can find it by drilling down into your app from the same system-wide help page.

<Command>
Defines a request that your VoiceCommand will act on. We’ll cover the child elements shortly.

<ListenFor>
ListenFor is what your user can say to launch their action.
The ‘ViewPicture’ command has this one: <ListenFor> Show me a {PhotoWord} </ListenFor>
• Words in [square brackets] are optional.
• Words in {curly braces} refer to a class of words grouped in a PhraseList or a PhraseTopic.

<PhraseList>
PhraseList defines a set of words or phrases that are alternatives or mean the same thing; here, we provide several words to refer to a “photo.” These must be defined ahead of time or updated programmatically from within your application.

<PhraseTopic>
We’ll add one of these soon! PhraseTopic defines a ListenFor resource that will recognize against a large, open-ended set of words. The “Scenario” attribute and “Subject” child element provide hints to the topic to help constrain the vocabulary and provide better accuracy and formatting for the desired usage. Unlike PhraseList, you don’t have to define what your user can say ahead of time.
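To see how these elements fit together, here is a minimal sketch of a complete VCD file. It is not a copy of VCD_81.xml, which may differ. The CommandSet name, the Feedback text, and the Navigate target "ShowMe" are assumptions made for illustration, and the namespace shown is the 1.1 voice command schema that Windows Phone 8.1 is understood to use.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: names and texts marked "assumed" are not taken from VCD_81.xml -->
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.1">
  <!-- Name is assumed; it identifies this command set when updating it from code -->
  <CommandSet xml:lang="en-us" Name="VoiceSnapCommandSet">
    <!-- What the user says to address the app -->
    <CommandPrefix> VoiceSnap </CommandPrefix>
    <!-- Shown on the system-wide help page -->
    <Example> Show me a picture </Example>

    <Command Name="ViewPicture">
      <Example> show me a picture </Example>
      <!-- {PhotoWord} is filled from the PhraseList below -->
      <ListenFor> show me a {PhotoWord} </ListenFor>
      <!-- Assumed feedback text -->
      <Feedback> Showing your pictures </Feedback>
      <!-- Assumed target: the page that displays pictures -->
      <Navigate Target="ShowMe"/>
    </Command>

    <PhraseList Label="PhotoWord">
      <Item> picture </Item>
      <Item> photo </Item>
    </PhraseList>
  </CommandSet>
</VoiceCommands>
```

Note the ordering: the commands come first, and the PhraseList (and later PhraseTopic) definitions follow them inside the same CommandSet.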
Hook your Voice Commands up to the app
In the App.xaml.cs file’s OnLaunched() method, underneath Window.Current.Activate(), add the helper function that loads the VCD and registers it.
// Ensure the current window is active
Window.Current.Activate();

// This is a great place to install Voice Commands!
AppUtil.LoadVoiceCommandDefinition();
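
The starter solution's AppUtil class is not reproduced in this Quick Start, so its exact contents are unknown here. As a rough sketch, a helper of this kind might load the VCD from the app package and hand it to the system as shown below. The file location (the project root, with a build action of Content) is an assumption; adjust the URI if your VCD lives elsewhere.

```csharp
using System;
using System.Diagnostics;
using Windows.Media.SpeechRecognition;
using Windows.Storage;

// Sketch of a VCD-loading helper; the real AppUtil in the starter project may differ.
public static class AppUtil
{
    // Assumption: VCD_81.xml sits in the project root and is packaged as Content.
    private static readonly Uri VcdUri = new Uri("ms-appx:///VCD_81.xml");

    // async void so that OnLaunched can call it without awaiting.
    public static async void LoadVoiceCommandDefinition()
    {
        try
        {
            // Locate the VCD inside the installed app package.
            StorageFile vcdFile = await StorageFile.GetFileFromApplicationUriAsync(VcdUri);

            // Register every CommandSet in the file with the system.
            await VoiceCommandManager.InstallCommandSetsFromStorageFileAsync(vcdFile);
        }
        catch (Exception ex)
        {
            // A missing file or a malformed VCD ends up here.
            Debug.WriteLine("Failed to install voice commands: " + ex.Message);
        }
    }
}
```

Installing on every launch is a simple choice for a sample: it keeps the registered commands in step with whatever VCD ships in the current build.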

Now you can try it!
Run your app (Ctrl+F5) to try it out. The “VoiceSnap, show my pictures” command should now work!



Optional: let users speak naturally with the help of <PhraseList>
How many synonyms can you think of for the word ‘photograph?’ Add each one to the PhraseList:
<PhraseList Label="PhotoWord">
    <Item> picture </Item>
    <Item> pictures </Item>
    <Item> photo </Item>
    <Item> photos </Item>
</PhraseList>
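
A PhraseList can also be updated from code once the VCD has been installed. The sketch below shows one way this might be done. It assumes the CommandSet in your VCD has Name="VoiceSnapCommandSet" (a hypothetical name; use whatever yours declares), and the class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

// Illustrative helper; not part of the VoiceSnap solution.
public static class PhraseListUpdater
{
    // Assumption: must match the Name attribute of the CommandSet in the VCD.
    private const string CommandSetName = "VoiceSnapCommandSet";

    public static async Task UpdatePhotoWordsAsync()
    {
        VoiceCommandSet commandSet;
        if (!VoiceCommandManager.InstalledCommandSets.TryGetValue(CommandSetName, out commandSet))
        {
            // The VCD is not installed yet, or it uses a different Name.
            return;
        }

        // The new list is expected to replace the old one, so repeat the existing items.
        var photoWords = new List<string>
        {
            "picture", "pictures", "photo", "photos", "snapshot", "snapshots"
        };

        // "PhotoWord" is the Label of the PhraseList defined in the VCD.
        await commandSet.SetPhraseListAsync("PhotoWord", photoWords);
    }
}
```

Call it after the VCD has finished installing, for example with `await PhraseListUpdater.UpdatePhotoWordsAsync();`.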

Update your project files to take advantage of the Voice Command information
<Navigate> into your app
With the Voice Commands installed, you can now launch your application—but the application isn’t doing anything to retrieve the information that it’s launched with.
The <Navigate> element is required; it links a command to the page that command should open. Its Target attribute is optional, but it is a handy way to tell your app where a Command should navigate to.
Let’s add the navigation code that receives and processes the activation information for Voice Commands.
To add the navigation code, copy the following into our App.xaml.cs file after the OnLaunched function.
/// This function is called when an app is Voice activated.
/// Requires: using System.Diagnostics; using System.Linq; using Windows.ApplicationModel.Activation;
protected override void OnActivated(IActivatedEventArgs args)
{
    CommonInitialize();

    Frame rootFrame = Window.Current.Content as Frame;

    // Decide which page to navigate to
    var vcArgs = args as VoiceCommandActivatedEventArgs;
    if (vcArgs != null)
    {
        IReadOnlyList<string> vcdTargetList;

        // For the voice command that is currently activating the application, try to navigate to the corresponding target page within the application.
        // This is the navigation target element specified for that voice command in VCD file.
        if (vcArgs.Result.SemanticInterpretation.Properties.TryGetValue("NavigationTarget", out vcdTargetList))
        {
            // For simplicity, the navigation target specified in the VCD corresponds to a page type. Convert that page type
            // into a namespace-qualified type here.
            Type navType = Type.GetType(typeof(App).Namespace + "." + vcdTargetList.First());
            Debug.Assert(navType != null, "Specified Navigation Target is invalid");

            // Navigate outside Debug.Assert: assert calls are compiled out of release builds,
            // so a call placed inside one would never run there.
            bool navigated = rootFrame.Navigate(navType, args);
            Debug.Assert(navigated, "Failed to create initial page");
        }
    }

    // Ensure the current window is active
    Window.Current.Activate();
}

If you try your app again, it’ll seem to behave the same way—but if you put a breakpoint in the new OnActivated function and run under the debugger (F5), you’ll find that you’re now retrieving the information from the Voice Command. We’ll use that shortly.
Adding PhraseTopic to ViewPicture
Currently, our one Voice Command doesn’t do anything interesting; it just navigates to our main page as if we had launched the application normally. Let’s use the power of the new <PhraseTopic> element to enable commands like “VoiceSnap, show me pictures of Madrid.”

Add the PhraseTopic to VCD_81.xml
Below all of your <PhraseList> elements, define a new <PhraseTopic> element called “PhotoTag”:
<PhraseTopic Label="PhotoTag"></PhraseTopic>
You can also provide a “Scenario” attribute and zero or more “Subject” child elements to the <PhraseTopic>. These are used to make the recognition more accurate by providing context about what’s being listened for. Let’s add the “Search” scenario to our <PhraseTopic>:
<PhraseTopic Label="PhotoTag" Scenario="Search"/>
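If your tags tend to fall into a known category, a Subject hint can narrow things further. The fragment below is a sketch of that variation, not a step in this Quick Start. It assumes that “City/State” is among the subjects the recognizer accepts, which suits tags like “Madrid” but would work against tags like “something cool,” so treat it as optional.

```xml
<!-- Sketch: the same PhraseTopic with an optional Subject hint -->
<PhraseTopic Label="PhotoTag" Scenario="Search">
  <!-- Assumed subject value; biases recognition toward place names -->
  <Subject> City/State </Subject>
</PhraseTopic>
```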

Use the new PhraseTopic in ViewPicture
Don’t try it yet! Now we need to use the new PhraseTopic we’ve added in one of our <Command> elements. Add a new <ListenFor> to ViewPicture for this:
<ListenFor> show [me] [a] {PhotoWord} of {PhotoTag} </ListenFor>
Feel free to save, compile, and re-run your app; everything should still work, but we still need to make this new information do something.

Look for the PhraseTopic in your app
The command is ready to go, but the app needs to find the PhraseTopic information and do something with it. Add the following block to the OnNavigatedTo() method in ShowMe.xaml.cs:
// Consume Voice command arguments which were passed down to this page on activation.
VoiceCommandActivatedEventArgs vcArgs = e.Parameter as VoiceCommandActivatedEventArgs;
if (vcArgs != null)
{
    IReadOnlyList<string> recognizedVoiceCommandPhrases;
    if (vcArgs.Result.SemanticInterpretation.Properties.TryGetValue("PhotoTag", out recognizedVoiceCommandPhrases))
    {
        searchBox.Text = recognizedVoiceCommandPhrases.First();
    }
}

Now save, compile, and run again. This time, “VoiceSnap, show me pictures of Madrid” should populate the search box and show you only pictures of Madrid!
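Because more than one page reads values out of the voice command result, the lookup can be factored into a small helper. The following is a sketch under that assumption. VoiceCommandHelper and GetFirstValue are hypothetical names and do not exist in the VoiceSnap solution.

```csharp
using System.Collections.Generic;
using Windows.Media.SpeechRecognition;

// Hypothetical helper; not part of the VoiceSnap solution.
public static class VoiceCommandHelper
{
    // Returns the first value recognized for the given key (for example "PhotoTag"),
    // or null when the command was spoken without that part.
    public static string GetFirstValue(SpeechRecognitionResult result, string key)
    {
        IReadOnlyList<string> values;
        if (result != null &&
            result.SemanticInterpretation.Properties.TryGetValue(key, out values) &&
            values.Count > 0)
        {
            return values[0];
        }

        return null;
    }
}
```

In OnNavigatedTo() the lookup then shrinks to `string tag = VoiceCommandHelper.GetFirstValue(vcArgs.Result, "PhotoTag");` followed by a null check before assigning to searchBox.Text. Returning null rather than throwing keeps “VoiceSnap, show my pictures,” which has no tag, on the normal path.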
Add another command your app will respond to
Now we can use VoiceSnap to show tagged pictures of something. Let’s make it take tagged pictures of something, too!
In VCD_81.xml, add a new Command called “VoiceSnap”:
<Command Name="VoiceSnap"></Command>
Give an <Example>
<Command Name="VoiceSnap">
    <Example> take a photo </Example>
</Command>
Add a new <ListenFor>
<Command Name="VoiceSnap">
    <Example> take a photo </Example>
    <ListenFor> take a {PhotoWord} </ListenFor>
</Command>
Add <Feedback>
<Command Name="VoiceSnap">
    <Example> take a photo </Example>
    <ListenFor> take a {PhotoWord} </ListenFor>
    <Feedback> Say 'cheese!' </Feedback>
</Command>
Take your users where you want with <Navigate>
<Command Name="VoiceSnap">
    <Example> take a photo </Example>
    <ListenFor> take a {PhotoWord} </ListenFor>
    <Feedback> Say 'cheese!' </Feedback>
    <Navigate Target="MainPage"/>
</Command>
Combined with the navigation code we added earlier, this new command sends users to the snapshot page of our app.
The app will compile and run now, and you should be able to say “VoiceSnap, take a picture.” You can’t yet add tags, though. You can guess from ViewPicture how we’ll accomplish that—by using our <PhraseTopic>!
Add one more <ListenFor>
<Command Name="VoiceSnap">
    <Example> take a photo </Example>
    <ListenFor> take a {PhotoWord} </ListenFor>
    <ListenFor> take a {PhotoWord} of {PhotoTag} </ListenFor>
    <Feedback> Say 'cheese!' </Feedback>
    <Navigate Target="MainPage"/>
</Command>
MainPage.xaml.cs has already been augmented with the same kind of logic you added to ShowMe.xaml.cs – if you compile and run, then try “VoiceSnap, take a picture of something cool,” you should now be able to take your own tagged pictures—then search for them with the previous command!


Finish
And you are done. You now have a Windows Phone 8.1 app that launches via your customer’s voice and can either retrieve tagged photographs or take tagged photographs.

Extra Credit
• Add more ways for your users to say the commands
• Try to figure out how to add multiple tags!
• Use the Speech Recognition APIs to allow the user to say “cheese” in the app instead of hitting the “take picture” button!

Index
Voice Commands Definition Key
CommandPrefix --> What the user calls your app.
Example --> What is shown to the user when they ask for help.
ListenFor --> What commands your app responds to.
PhraseList --> What options your app will recognize.
PhraseTopic --> A large-vocabulary “slot” that will be “filled in.”
Feedback --> Acknowledgement of what the app heard.
Navigate Target --> What deep link this command activates.
[square brackets] --> Optional words
{curly braces} --> PhraseList or PhraseTopic items

Last edited Apr 19, 2014 at 1:14 AM by monicasouth, version 1