The Computer Columns

 

Computer, take a letter

    I've been talking to my computer for years.

     In fact, on the first day of owning my very first computer, I blasted that poor little 8086 with a barrage of invective that would make a South Seas sailor blush.

     I have to be more careful now. That's because my computer understands what I say and writes it down thanks to two programs I had the chance to try recently.

     There are quite a few voice recognition programs on the market today, but the top two seem to be IBM's ViaVoice Gold and Dragon Systems' Dragon NaturallySpeaking. Both claim to be able to take computer dictation using continuous, natural speech.

     But before you even think about trying one of these programs, you'd better make sure your system is up to the job. For either you need Windows 95 or NT 4, a processor equal to at least a Pentium 150 with MMX, 32 MB RAM (48 if you're running NT), 125 MB of spare hard drive space (60 MB for ViaVoice), a Creative Labs Sound Blaster 16, a 100 per cent compatible card or an Mwave audio system, and a CD-ROM drive. Both packages come with a high-quality headset microphone.

     I had most of that. And $700 later, I had all of that. (OK, so I got a few other cool things while I was at it, but I really, really needed them. Honest.)

     Both of these programs claim to be able to take dictation while a user speaks naturally. In earlier voice recognition programs (which can still be purchased at a lower price) you had to talk as if you had just learned to speak - each word separated by a distinct pause. Companies use the phrase "discrete speech" to identify this technology. A better phrase would be "bloody annoying".

     I've used "discrete speech" products before and found them pretty well useless for day-to-day business use. That's why I looked forward to finding out just how well the new continuous speech technology lived up to the praises of its makers. With images of myself pacing across my study, glibly dictating literary masterpieces to my faithful compu-secretary, I installed Dragon's NaturallySpeaking Personal Edition.

     Dragon Systems, privately held and headquartered in Newton, Mass., is a pioneer in the field, having produced the world's first commercially available large-vocabulary, general-purpose dictation system for PCs in 1990. That's why I expected a lot of the program.

     NaturallySpeaking comes in three flavors - Personal, Preferred and Deluxe Editions. I tried the Personal Edition, which is aimed at the normal home user.

     Installation is smooth, including a little routine to activate the microphone and set the audio parameters. But when you get to the point of training the program to learn your particular voice pattern, be prepared to do a lot of reading - about 30 minutes' worth at least. The program offers a selection of reading material on the screen, displaying a chosen author's work paragraph by paragraph as you read the lines into the microphone. You can pick from a variety of authors and articles, ranging from a chunk of Arthur C. Clarke's 3001 - The Final Odyssey to a speech by Mark Twain on stage fright. There are shorter selections, like a job request letter, but the more you read, the better the program will be able to recognize what you are telling it.

     I chose 3001. I like Arthur C. Clarke and I like to read, but reading aloud from a screen is probably best left to TV news anchors. At the end of the session, my voice was hoarse and I think I've pretty well had it with poor old Arthur C.

     You can go on and read all of the selections if you want, which will no doubt improve the performance of the program. You can also use any selection of text which you might have on your computer, drawing it into a Vocabulary Builder program, which scours the text for any words not included in its speech files. It then has you read those words back to it.
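
     For the curious, the idea behind that kind of vocabulary scan can be sketched in a few lines of Python. This is only my guess at the general approach, not Dragon's actual code: the function name, the sample word list and the way words are split are all assumptions made for illustration.

```python
import re

def find_unknown_words(text, known_vocabulary):
    """Return the words in text that are not in known_vocabulary.

    Hypothetical helper: comparison is case-insensitive and each
    unknown word is reported once, in the order it first appears.
    """
    known = {word.lower() for word in known_vocabulary}
    seen = set()
    unknown = []
    # Treat runs of letters and apostrophes as words (an assumption).
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower().strip("'")
        if key and key not in known and key not in seen:
            seen.add(key)
            unknown.append(key)
    return unknown

# Made-up vocabulary, far smaller than any real speech file.
sample_vocabulary = ["the", "rain", "in", "falls", "mainly", "on", "plain"]
print(find_unknown_words("The rain in Spain falls mainly on the plain.",
                         sample_vocabulary))
```

     A real product would then prompt you to read each word on that list aloud, which is the step described above.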

     I was satisfied, however, with Arthur. Besides, I wanted to get on with the program.

    NaturallySpeaking, when launched, opens up with its own word processor. You click on a little microphone icon at the top of the window and you're ready to dictate.

     "The rain in Spain falls mainly on the plain," I said, following program instructions which advised users to talk naturally, but distinctly and clearly, which for me, is rather unnatural. I ended the phrase with the command codes "PERIOD" and "NEW LINE" which instruct the program to put a dot at the end of the sentence and move to a new line.

     Sure enough, as I spoke, the words appeared on screen. "The rain in Spain folds mainly on the plane." Obviously, that chapter of 3001, although long, was still not long enough. But the program notes also claim that the program can learn from its mistakes, and so I uttered the command code "HIGHLIGHT folds" and the offending word was instantly highlighted. Again I said the word "falls", this time a bit more distinctly, and sure enough, it replaced "folds".

     Things were a little tougher with the word "plain", which came out "plane" every time I tried it. There is a way to train the program and force it to use "plain" instead of "plane" but I wondered what I would do when I actually wanted to use the word "plane".

     English is nasty that way. I tried the sentence, "I went to the two places I find too far away." It came back with "I went to the two places are far end to four way." Although I was surprised at the way the program picked up the difference between "to" and "two" by their context in the sentence, I was still less than impressed. I guess one might put the poor result down to my tendency to mumble, but my two main causes of mumbling, talking to the police and talking with my mouth full of potato chips, were not really factors in this case. I tried it again, doing my best to speak distinctly, and sure enough, the program picked it up perfectly and recognized the proper usage of to, two and too.

     Once you get the hang of the way the program thinks you should speak naturally, it seems to do a pretty good job.

     I decided to move on to IBM's ViaVoice Gold. The program installs much the same way, although IBM says that you can run ViaVoice right out of the box, without a training session.

    Nevertheless, I went through what IBM calls the "first enrolment", which involves - you guessed it - reading a literary selection. This time I picked a piece by Mark Twain (he seems to be pretty popular in the voice recognition crowd) about a meeting with the ghost of the Cardiff Giant.

     I don't know if it's because I like Twain better than Clarke, or because the selection was shorter, but the enrolment did not seem such an onerous task this time.

     ViaVoice offers a few niceties not provided by the NaturallySpeaking Personal Edition. While it too offers dictation in its own word-processor-type program, which it calls the SpeakPad, it also allows you to dictate inside most popular word processors without having to cut and paste between applications. ViaVoice also offers voice navigation technology, which means you can use voice commands to navigate around the desktop. To print a document, for example, you would just have to say the word "Print".

     The program also provides text to speech conversion. IBM's SpeakPad will recite text you have dictated back to you in a variety of voice types, and will even have an animated talking head displayed as it does it. It's an entertaining touch, but aside from handicapped users, probably not all that utilitarian for the general operator.

     Dragon's NaturallySpeaking Preferred Edition and Deluxe Edition also offer a text-to-speech function. They also allow similar editing and spoken menu commands inside their own word processor. And the Deluxe Edition allows direct dictation into just about any Windows application. But then, at a suggested retail list price of $695 for the Deluxe Edition, you pay for that privilege.

     ViaVoice's dictation skills seemed to be equivalent to those of NaturallySpeaking. The phrase "The rain in Spain falls mainly on the plain" came back as "The rain in Spain Falls mainly on the plane." I guess "plane" is a common problem, but I really couldn't understand why "Falls" was capitalized. But when I did the "I went to the two places I find too far away" phrase, ViaVoice seemed to have no problem with the "to, two and too" combination, and reproduced it perfectly.

     Because of some of the errors made in my few short tests, I think that claims of 95 per cent accuracy are probably overblown. Perhaps with more training, both programs might approach that number, but even then, I find it difficult to believe. PC Labs, in more extensive tests for PC Magazine, gave Dragon NaturallySpeaking a slight 1.3 per cent edge over ViaVoice, but neither program, in those tests, broke the 90 per cent accuracy figure.
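
     To put a number on a single sentence, one common approach is to count the fewest word substitutions, insertions and deletions needed to turn what the program typed into what was actually said. The Python sketch below does that. I don't know how the vendors or PC Labs arrived at their figures, so treat the function names and the scoring rule (ignore capitalization, drop punctuation by hand) as my own assumptions.

```python
def word_errors(reference, hypothesis):
    """Return (errors, reference_word_count) using word-level edit distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # word dropped by the program
                           cur[j - 1] + 1,       # extra word typed by the program
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1], len(ref)

def word_accuracy(reference, hypothesis):
    """Fraction of reference words the program got right, floored at zero."""
    errors, total = word_errors(reference, hypothesis)
    if total == 0:
        raise ValueError("reference sentence is empty")
    return max(0.0, 1 - errors / total)

# What I said versus what ViaVoice typed (punctuation removed by hand).
said = "The rain in Spain falls mainly on the plain"
typed = "The rain in Spain Falls mainly on the plane"
print(word_errors(said, typed))               # (1, 9)
print(round(word_accuracy(said, typed), 3))   # 0.889
```

     By this yardstick one wrong word in a nine-word sentence already drags the score down to about 89 per cent, which shows how little room a 95 per cent claim leaves for error.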

     The cost of both programs is close. The retail list price for ViaVoice Gold is $149. The NaturallySpeaking Personal Edition is listed at $147.

     If I were to recommend a program, it would be ViaVoice. It seems to do just as good a job at dictation, and for about the same price as NaturallySpeaking Personal Edition it adds other functions like text-to-speech conversion and desktop navigation. Most importantly, it allows dictation right into a word processor rather than in a standalone window, which the Personal Edition does not.

     Personally speaking, however, I probably won't use either product. I find the keyboard more than adequate to get my typing done.

     And I'm quite expert at making my own spelling mistakes.

     Besides, there are still some things I say to my computer that are better left unwritten.