NewsBlast: Please give us some background on your work in the court
reporting business.
Linda Drake: I've been a court reporter since 1982, when I obtained
certification in Georgia. I now co-own a freelance court reporting firm
in Savannah, Ga. We report and transcribe depositions, hearings, grand
jury proceedings, municipal and superior court proceedings, and various
public hearings for governmental entities such as the Department of
Transportation, Department of Natural Resources and the Small Business
Administration.
I became nationally certified by the National Verbatim Reporters
Association (NVRA) in 1994. That means that I have dictated and
transcribed, with at least 97 percent accuracy, three five-minute
tests: a 200-word-per-minute literary selection, a 225-word-per-minute
jury charge, and a 250-word-per-minute two-voice question and
answer. Having served on the Board of Directors of NVRA since 1999, I
became president of the association in August of this year.
NVRA is a nonprofit, professional membership organization representing
voice writing verbatim reporters. Members include official court
reporters, deposition reporters, broadcast captioners, and providers of
realtime communication services for the hearing-impaired. Voice writing
verbatim reporters make realtime records of spoken words and actions
using speech recognition and other related technologies. Additional
information about NVRA and voice writer certification can be obtained
by calling (601) 582-4345 or visiting the NVRA Web site at
www.nvra.org.
NB: How long has speech technology been used in court reporting?
LD: I first became aware of the possible use of a speech recognition
engine (SRE) with my profession in the mid-90s and purchased my system
in 1997. At that time, my computer didn’t have a fast CPU, yet I was
able to see words appear on the screen as I reported depositions. It
immediately became an invaluable tool for my occupation, as it
substantially reduced the total volume of typing and editing
required to produce court and deposition transcripts.
NB: How do the accuracy rates with speech technology compare to other
types of recording?
LD: Voice writers have always enjoyed higher accuracy rates compared to
their stenotype- and pen-based cousins in our field, based upon pure
physiology. We must first understand that court reporting is a very
high-volume and high-throughput task where delay between identification
of sound waves' meaning and the production of their English language
equivalents must remain as small as possible. The route taken by an
attorney's cross-examination goes from his or her mouth, to my ear,
through my brain, then to my "inner" voice. This form of repetition is
naturally effortless; it's what we all do in our daily conversation. So
the most natural extension of this process is to psychologically switch
the repetition mechanism from "inner voice" to "spoken voice."
Therefore, we minimize the introduction of cognitive overhead in our
task of routing the spoken word to its permanent destination as printed
English. This streamlined process means that we can achieve greater
than 98 percent accuracy at speeds as high as 350 words per minute,
sustained for five minutes. The other forms of reporting add an
additional mental and physical layer pertaining to the correct
representation, placement and order of material printed by hand, which
requires yet another post-production layer of translation to English.
The example above, five minutes at 350 words per minute, which is
NVRA's annual National Speed Champion test, obviously illustrates
non-speech recognition engine production. Court reporters have their
own definition of "realtime," which simply means that reporters'
production of English is simultaneously transmitted to the reporter's
computer screen, the judge's bench and the attorneys' tables. In this
mode, using ScanSoft's Dragon NaturallySpeaking or IBM's ViaVoice, a
voice writer produces English text scrolling on screens throughout the
courtroom at sustained speeds varying between 180 and 200 words per
minute, with at least 96 percent accuracy. This defines the requirement
for our Realtime Verbatim Reporter (RVR) certification.
NB: Has speech recognition improved your performance, and how?
LD: My dictation style has become much more clearly enunciated and, by
incorporating more punctuation as I dictate rapidly, my accuracy has
improved. This is the case with all realtime court reporters I know.
The proofreading time for transcript production has been significantly
reduced, allowing me more time for additional court or deposition work,
which has positively impacted my business' bottom line. My production
volume has increased at least 50 percent since I started using an
SRE-equipped, computer-assisted transcription (CAT) program.
NB: How is acceptance with court reporting professionals in regard to
speech technology?
LD: Court reporters have been searching for a controlled means of
automating the process, going all the way back to the turn of the 20th
century. For years, NVRA's voice writers have known that voice
recognition's viability was merely a matter of applying sufficient
computing muscle. Shorthand machine reporters have enjoyed
realtime-like automation for almost 20 years, and the majority of voice
reporters are eager to assume their role as state-of-the-art players.
Those who shun "new" technologies will always be with us, and our
profession has a few. But knowing that today's judges and attorneys,
who were yesteryear's Commodore 64 and Super Mario users, are
comfortable with technology, reminds us that our entire field is moving
forward in harmony. The minimum standard in the courtroom is becoming
realtime and, in the freelance world, a reporter daily hears, "How
quickly can we have that transcript?" Many experienced reporters are
interested in or have purchased speech recognition programs designed
for court reporters, and students are being trained to report using
realtime-oriented (for simultaneous display in the courtroom or
deposition suite) speech recognition at the outset. This generation of
court reporters will be full participants in the digital streamlining
of the judicial process.
NB: What are the barriers to using speech recognition in this process,
and what can be done to improve usage among court reporting
professionals?
LD: The greatest barrier I can see is, interestingly, generational:
younger reporters are considerably more comfortable than their elders
when the immediate display of their words is shared among far-flung
video screens. Seeing one's words appear on screen in realtime can be a
fascinating and captivating experience for those new to the realtime,
"instant messaging" world. It can also add to the stress of a reporter
who desires perfection, yet knows that the trial or live television
captioning event takes place so quickly there's literally no time to
make corrections — and it's happening in front of a "live audience."
The ubiquity of voice-based and realtime consumer products and
services, such as Sony's Aibo, Sprint PCS phones, Honda/Acura's
VR-enabled systems, instant messaging and video, has already increased
younger reporters' comfort level with realtime, so we expect a natural
shift to VR usage. Our association is full of technology enthusiasts
and they are adopting VR at a very good pace, leading to the creation
of new educational programs across the 50 states. A vacuum has just
been created by the expected funding of the Telecommunications Act of
1996, which requires 75 percent of all new TV programming to be
captioned by 2006, and 100 percent by 2008. Captioners and court
reporters do exactly the same thing, which has led to a stampede to
fill this new market with realtime-based technologies. We know that
people will go where the money is, which has led us to begin
certification programs for this new area of reporting.
NB: Please provide your general thoughts on the future of court
reporting and the role speech technology will play.
LD: Court reporting has been in a state of flux throughout its
existence. We fully understand that multimodal biometrics will define
the new human-to-machine interface. In fact, we are living examples of
the commercial application of this evolution in today's (not "some time
in the near future," but today's) marketplace. We live in a "realtime"
world where instantaneous translation, e-transcripts, streaming text
and video, and instant messaging technology are in constant demand, and
where untold numbers of new applications are now forming in the minds
of our students.
The emergence of multimodal communications confirms that human
interaction is carried over many distinct channels, or wavelengths.
Life-and-death situations, contentions over millions of dollars, and
interpersonal disputes that spill over into litigation are matters
which were born of multiple-wavelength, human-to-human interaction, and
over which humans will try to convince other humans who was right and
who was wronged. The English language's complexities notwithstanding,
we know it will take many decades to reach Star Trek capabilities.
Humans will always be required to determine the meaning of what they
try to communicate, and they will always seek another human to mediate
and ferret out meaning. While speech recognition may be rapidly nearing
levels of accuracy amenable to general consumer acceptance, the legal
world demands perfect understanding of communications where real
capital is on the line. Any recording system which processes only one
aspect of human communication is insufficient to determine the true
meaning of what was communicated. Thus, we believe the judiciary will
always seek to place a competent human as the responsible guardian of a
true and accurate record of human communication.
Now that speech recognition is a reality and high accuracy rates can be
achieved, it is rapidly being applied to meet the nationwide shortage of
professional court reporters and to train captioners and communication
access realtime translation (CART) providers for the hearing-impaired
population. It is generally accepted that demand exceeds supply. In
this regard, we see applications where an SRE solution may be deployed
in lieu of an absent human reporter. But in every case where
reporter-less recording is being done, state judiciaries still place a
human in the work process to certify the final record, ensuring that
some person is held responsible. We see speech technology eliminating
the national shortage of court reporters in well under 10 years. We
also see it as an enabling force for the rapid expansion of the
newly-created CART and captioning fields, manned by court reporters or
individuals certified to produce these services.
NB: What technology providers are used for court reporting?
LD: There are three vendors who have designed computer-aided
transcription systems around either ScanSoft's or IBM's SREs for the
court reporting profession: AudioScribe, StenoScribe, and
Voice-to-Text. Two vendors of stenotype machines have now incorporated
speech recognition in their software: Eclipse and ProCat. They are
anticipating the needs of non-realtime stenotype reporters who view
voice as their avenue to achieving realtime-level incomes. There are
also vendors who provide separate direct and Web-based streaming text
applications, and the standard cadre of providers who service
computer-based occupations.
NB: What can speech vendors do to increase the use of speech recognition
among court reporting professionals?
LD: Provide speed and duration! Court reporters sometimes dictate at
speeds that presently exceed the software's capacity, and for long
periods of time
— hours of depositions or hearings without a break (or lunch). Some
reporters say that the accuracy deteriorates over time. My experience
has been the converse. I find that my computer begins to become more
and more "compatible" as a long day progresses. But speed is definitely
an issue when you're trying to repeat someone's words very, very
rapidly with hardly a breath in between. Specializing the dictionary
for use by our profession would also be a change I'd recommend. Since
we repeat every word verbatim, we don't use abbreviations. ScanSoft, we
hope, is working on an option to disable the use of abbreviations and
retain the use of contractions and numbers.
We note that 64-bit CPUs are just starting to hit the streets, although
they're not quite running in full 64-bit mode. The delay is probably
Wintel-inspired, as Microsoft's 64-bit consumer operating system is not
yet available. However, 64-bit Linux is on the scene, and so we believe
speech systems can reap the benefits of this computing power before
other custom applications. IBM's Linux port does not get good reviews,
probably because they view speech applications as extensions of the
consumer electronics space. Speech is tailor-made for a 64-bit
environment, and we'd like to see it happen sooner rather than later. We're
engaged in Transcript-XML and, with Linux's huge lead in
internationalization and the overwhelming world trend, we believe it's
not unreasonable to expect serious Linux ports.