On Monday I visited Nuance for an update on the company's speech
recognition products and initiatives. Two years ago, my screencast
on Dragon NaturallySpeaking 8 demonstrated what was then the state of
the art in automatic dictation. Dragon has for years been asymptotically
approaching the point at which dictation becomes routine and
general-purpose. For most of us, it hasn't yet reached that point. I
didn't upgrade to the latest version, Dragon 9, because, despite improvements,
I didn't think it would yet cross my threshold for routine
use. Nuance's demo of Dragon 9 confirmed that hunch.
Peter Mahoney, Nuance's marketing VP, showed me how he uses Dragon 9
for dictation. When he read a prepared statement, the results were
perfect. Then I handed him a copy of Newsweek and asked him to read
from a random article. The results were still very good. True, the
Arabic names in the story had to be spelled out. But that wouldn't be the
case if those names were common in your domain of discourse. And training
Dragon to absorb specialized vocabulary is both easy and effective.
The real problem, at least for me, lies elsewhere. And the test I gave
Peter yielded a stunning example of it. At one point he read:
...it's rarely so simple...
Dragon wrote:
...it's really so simple...
Because Dragon works so hard to produce plausible results, this class
of error resists casual proofreading. In this case, you would have to read
very carefully to notice that Dragon had reversed the intended meaning
of the sentence. For me, anyway, the cost of finding and fixing
these kinds of subtle errors outweighs the benefit of routine
dictation, at least when a keyboard is available.
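
To make the size of that error concrete, here is a minimal sketch that diffs the spoken and recognized phrases word by word. It uses Python's standard difflib module; the function name word_substitutions is my own invention, not anything from Dragon. It also assumes you have a reference transcript to compare against, which in real dictation you don't. That is exactly why the error is so hard to catch.

```python
import difflib


def word_substitutions(spoken: str, recognized: str) -> list[tuple[str, str]]:
    """Return (spoken, recognized) pairs for every span where the texts differ."""
    a = spoken.split()
    b = recognized.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return differences


print(word_substitutions("it's rarely so simple", "it's really so simple"))
# → [('rarely', 'really')]
```

One wrong word in four, with two letters transposed, and the sentence says the opposite of what was spoken.
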
Keyboards aren't always available, though, and that fact made the
second part of the demo a real eye-opener. Check out this 55-second
video of Peter dictating to his Treo:
In case you can't play this video, it shows two
examples of speech recognition. First Peter dictates a brief memo, and
uses his voice to change "LaGuardia" to "Logan". Then he speaks the
query "Eastern equine encephalitis" to Google and reviews the
results. Very cool!
How do you shoehorn Dragon onto a mobile gadget? You don't. There's
only a small client that relays recorded audio to a server and
receives recognized text. This kind of mobile dictation should be
available as a carrier-provided service, for the popular handheld
operating systems, sometime next year. I'll be curious to see who uses
it, and how.
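
The division of labor can be sketched in a few lines. This is only an illustration of the thin-client idea, under my own assumptions: the names ThinDictationClient, send_audio, and fake_recognition_server are hypothetical, and Nuance's actual wire protocol wasn't described to me. The stand-in server just decodes bytes as text so the example can run without a real recognizer.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical transport: takes recorded audio bytes, returns recognized text.
Transport = Callable[[bytes], str]


@dataclass
class ThinDictationClient:
    """A client that does no recognition itself; it only relays audio."""

    send_audio: Transport

    def dictate(self, recorded_audio: bytes) -> str:
        if not recorded_audio:
            return ""  # nothing was recorded, so there is nothing to send
        return self.send_audio(recorded_audio)


def fake_recognition_server(audio: bytes) -> str:
    # Stand-in for the server-side recognizer (an assumption for this sketch):
    # it pretends the "audio" is already UTF-8 text.
    return audio.decode("utf-8")


client = ThinDictationClient(send_audio=fake_recognition_server)
print(client.dictate(b"Flight lands at Logan"))
```

The point of the design is that the handheld only needs to record and relay; all the heavy lifting stays on the server.
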
In our follow-on discussion we talked about how Nuance's software is
being used in the automotive realm. Cars themselves offer a growing
range of voice-controllable functions, such as temperature and
navigation. Passengers' Bluetooth-equipped gadgets paired to cars'
audio I/O systems are another emerging domain for voice control.
What about those of us who drive older cars and use older cellphones? I
think there's still all kinds of untapped opportunity. For example,
while driving I'd love to be able to speak questions like these and
hear the answers:
How many new emails from Jill in the last 4 hours?
What are the subject headers?
Can you read the message entitled "New panelist for your session"?
Given the kind of client/server architecture that Nuance has
developed, even my lowly LG VX4400 should be able to handle a protocol
like this. The magic would all be in the cloud, where
the speech recognizer and my mail server would consummate a
service-oriented marriage.