How to Transcribe Thirty-Six Hours of Audio Recording
From 2005 through 2007, I interviewed a dozen 80- to 90-year-old
elders from the Coosawhatchie (coo-sa-HAT-chee) Community
Senior Center in Jasper County, South Carolina. The center
served the elders of the region through daily activities, devotion
and meals. My insider standing was established when I first met
with the larger group, since I am a descendant of the Gullah
Geechee culture from St. Helena Island, South Carolina. I told
them the story of their cultural links to West African ethnic
groups, coupled with the memories they hold as the
descendants of enslaved families and their knowledge of
growing rice on the plantations that have since become
communities on land now owned by their families.
Once the twelve elders gave me permission to sit with them in their homes, I collected everything I needed
for their interviews into a field kit devised from a high-end photography bag. I made pockets and sockets
from the Velcro spacers for a digital camera, a film camera for black-and-white film only, an audio cassette
recorder, accessories and every possible backup power source. There were even slots for the journal I used
for note-taking as each interview progressed, along with copies of the release forms for participants to sign.
Note the use of a cassette recorder, which failed during a couple of interviews because the battery died.
Sometimes an electrical outlet for AC power was too far away. Next time, remember a longer extension cord.
Other issues with the cassette recorder included the need to flip the cassette when one side ran out. I constantly
kept an eye on the tape counter in preparation for the turn, which was distracting to the person being interviewed.
Switching to a digital recorder made the interview process much easier, until the digital components began
to crash. Digital corruption is not good. I ran two digital recorders for redundancy during the last few interviews. No
single recording device worked 100 percent of the time.
Since I am a TV producer, people have asked to see the video of the interviews. There is no video, only audio
and still photographs. I always give the explanation in my jovial techno-style of storytelling. Usually the listener
has a blank stare, or kindly pretends to listen, until I get to the part about blowing up the house of an elder.
Shooting video requires a lot of electricity to create an image of worthwhile quality.
An elder's home might still have screw-in glass fuses, and the power surge from lighting gear and a video
camera could cause a fire. Why bother? Besides, most people alter their talking style when they see a video
camera pointed at them. (‘Oh, I am on TV. I need to look like and act like one of those people.’) An audio
recorder with still photography is the method of choice. There are many documentaries that use motion
on still photos with a voice track.
After each interview, I had the film developed by a now-defunct photo shop that still used black-and-white
photo chemicals in the darkroom, and I filed all of the digital photos and paperwork according to an archiving
method I used for this project: research title, sub-files for each elder, internal files for audio, images, writing.
Once I started using the digital audio recorder, the audio interview was easy to download into the file for each
interview. Easy. Why not start transcribing approximately three hours of audio after each interview?
The rite of passage for anyone who would dare place a microphone in front of anyone's face is to type the
recordings into MS Word before you write. Should be done. I did not do it. I played the audio and got hooked
on the images. I snapped away with the digital camera to get the elder used to seeing a camera in my hand.
One hundred or so images per session gave me selections for the time spent with them. At some point I would
switch to black-and-white film -- I did not use a flash -- in order to feel when to take the image that would
become the signature photo for each story. Shooting film meant a lot more calculation for light and shutter
speed. I did end up selecting a digital image converted to black-and-white for a couple or so of the elders
instead of using an image shot with black-and-white film.
Two years of interviewing between 2005 and 2007 then led to full-time teaching and diving headlong into work
as a federal commissioner for the newly formed Gullah Geechee Cultural Heritage Corridor. Then my left
thumb doubled in size.
De Quervain's Tenosynovitis Syndrome: Wait . . . What?!
I like using the interrobang [‽] whenever given a chance. It is now
the right punctuation to express the combination of WTF and
"Are you crazy?" It seems the combination of writing a lot and
the advent of texting caused the tendons in my thumb and wrist
to become inflamed. I faced a setback. A cyst in the joint of my left thumb
added to the swelling, not to mention the extreme pain. Then
the right hand went the way of the left, sans the cyst. Wrist
tendinitis was the reason I gave for the black wraps around both
wrists for six months, instead of describing de Quervain's syndrome.
De Quervain's tenosynovitis (dih-kwer-VAINS ten-oh-sine-oh-VIE-tis)
in the end became a jutting bone on my left wrist that led to some
arthritis. I needed to find another solution for transcribing audio
other than typing. Using a transcription service meant paying anywhere
from $5 per typed page to $150 per hour of audio. Thirty-six hours
of audio and no funding for this project placed another hold on
writing. Grants are rarely given to independent scholars, especially
those unaffiliated with a research institution. The search for funding
never stopped because the signature images of the elders were
always staring at me in my home office. Then someone suggested
using transcription software.
Gullah Geechee Accent Meets Transcription Software
I knew that by 2012 software was coded well enough to convert speech to text. The hunt was on to find
the correct type of software, one that did not cost a fortune. One suggestion after another led to a possible
solution in 2018. Dragon Speak did well after the setup paragraph was read so that the software could
identify the individual speaking pattern. Multiple voice versions were out of my price range. Most of the
elders had died by this time, so reading a standardized paragraph was not going to happen. A friend then
mentioned DeScript and that it had the algorithm to convert three hours of audio into text in approximately
one hour. Solution found! Stop.
I read the conversion of one audio file and realized the software would not be able to do the job for most of the
elders. Even though English was the primary language spoken, the algorithm was not designed to understand
a Gullah Geechee accent. Dour emotion ensued. Plus, the Windows version of the software was going to cost
five cents per minute, after 30 minutes of free time. Below is a portion of an email I sent to the CEO of the
company:
DeScript does not understand Gullah Geechee accent. It is a designated
language as well, but the elders were speaking English. There are some West
African loanwords mixed in, which is why anyone who helps on this work must
also know the phonetic alphabet.
Here is a sample of what DeScript converted. I know the corrected information,
but can you make out any of this?
From Software Transcription:
Oh, they had three he loved her. So he took a left left my grandmother and the doors day
women have nothing to do with outside the men in charge of everything. So he left she
was at a loss or she's not cooking washing already babies and she know about translation.
Cotton Gin Mill a Mill and a big stool and and 1800. That's my grandmother bear. That was
um, his wife beautiful. I thought my grandmother she gonna go back. What was her name
again. She was a buzzy. Margie mod and Buffy. Oh, she married Laura. It was telling of him
was going now that that leave it as my mother died. That's what I knew and cut the did
I never know how to school you want. Let me hit me. Never. Have you the dirty word? Drink
a can of liquor bed my entire life. In fact, I never saw that I don't like ever my elders was like,
you know I Uncle Jeb any and all all like family on the plastic on the hood. So I put that
business house. I had to behave like I said my dad I got the grave. The same thing so I'm kind
of shield in all of my life and my uncleand then all night. What else what else hook?
. . . now, what am I to do? I no longer see paying for the use of this software. The algorithm is
likely based on prestige English as spoken by a white male. I had hopes, but I was now back
at the drawing board.
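For a sense of scale, the prices quoted in this piece can be set side by side with a little arithmetic. What follows is only a rough sketch. It uses the thirty-six hours of audio, the top service rate of $150 per hour of audio, and the software rate of five cents per minute after 30 free minutes. It assumes the 30 free minutes are a one-time allowance, which the pricing quoted here does not spell out. The function and variable names are made up for illustration.

```python
# Figures quoted in the essay; amounts are kept in whole cents to avoid rounding drift.
AUDIO_HOURS = 36
SERVICE_CENTS_PER_AUDIO_HOUR = 150_00   # top quoted rate: $150 per hour of audio
SOFTWARE_CENTS_PER_MINUTE = 5           # five cents per minute
FREE_MINUTES = 30                       # assumption: a one-time free allowance


def service_cost_cents(audio_hours, cents_per_hour):
    """Cost of a transcription service billed per hour of audio."""
    return audio_hours * cents_per_hour


def software_cost_cents(audio_hours, cents_per_minute, free_minutes):
    """Cost of per-minute software once the free minutes are used up."""
    billable_minutes = max(audio_hours * 60 - free_minutes, 0)
    return billable_minutes * cents_per_minute


service_total = service_cost_cents(AUDIO_HOURS, SERVICE_CENTS_PER_AUDIO_HOUR)
software_total = software_cost_cents(AUDIO_HOURS, SOFTWARE_CENTS_PER_MINUTE, FREE_MINUTES)

print(f"Service at the top rate: ${service_total / 100:,.2f}")
print(f"Software per minute:     ${software_total / 100:,.2f}")
```

The $5 per-page rate is left out because it depends on how many typed pages the finished transcripts would fill, a number that is not known here.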
I complained to a friend about the search for solutions and the software problems. We reminisced about
the Dictaphone of the old days when a foot pedal was used to control the audio cassette as you typed -- on
a typewriter, at first. No need to stop and start the computer player running the digital audio, to then activate
the Word document to type. Then go back to activate the audio file in order to move forward. So forth and so
on. What could possibly be out there?
My old cassette Dictaphone had been put away in a box a couple of decades ago. I searched online. Whoa -- audio
software controlled by a foot pedal that looked just like the one from the olden days! I ordered the unit
immediately. Within days, the USB-connected foot pedal was at my door, and after hours of installation and
setup, it functioned with the same precision I relied on decades ago. I began to transcribe again, even if it is with a
keyboard and digital audio software. Yes, there is an audio-to-Word conversion option, but with the same
DeScript problem. Back to square one. I will just have to take it easy with the arthritis and the de Quervain's
tenosynovitis. But now I know I will get it done.
By Althea Sumpter