Understanding the process of second language acquisition entails understanding the process of interaction. Observation of language-in-action accompanied by note-taking may yield only superficial insights or lack the detail required for substantive discussion or analysis. Even observers with extensive training may lose one-half to two-thirds of the data in real-time coding (Kieren & Munro, 1985). Therefore, teachers and researchers may choose to capture classroom interactions with audio or video and then transcribe these events for subsequent review, discussion, and analysis of learner language.
There are many different ways to transcribe language. Transcription is simply an approach to the notation of language, a “selective process reflecting theoretical goals” (Ochs, 1979). The researcher’s decisions about what to transcribe and how to transcribe the data reflect the orientation of the researcher and constrain the analysis and interpretation of the language episode (Lapadat & Lindsay, 1999). For example, broad transcriptions document only the sounds and words that are important to meaning; they often denote pauses and use standard spelling. Narrow transcriptions, in contrast, document phonological features, using diacritics to show, for example, how a word might be pronounced in contrasting English accents.
TRANSCRIPTION OF AUDIO FILES
Free software can assist in the manual transcription of audio recordings by enabling the transcriber to control the speed of audio/video playback. Two options are Windows Media Player and Audacity, both of which allow the user to control the playback speed or start and stop the audio with assigned keys on the computer keyboard. A third option, Express Scribe, also lets the user control the playback speed with a foot pedal.
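For teachers or researchers who prefer to script their own playback control, the same idea can be approximated in a few lines of Python. The sketch below is only an illustration: it assumes the third-party scipy and sounddevice packages and a hypothetical file name, and it slows the audio by playing it at a reduced sample rate, which also lowers the pitch.

    # A minimal sketch of slowed playback for transcription review, assuming
    # the third-party scipy and sounddevice packages are installed and a
    # hypothetical WAV file name. Playing at a reduced sample rate slows the
    # audio but also lowers its pitch.
    from scipy.io import wavfile
    import sounddevice as sd

    SPEED = 0.75                              # 75% of normal speed
    PATH = "classroom_recording.wav"          # hypothetical file name

    rate, samples = wavfile.read(PATH)        # original sample rate and samples
    sd.play(samples, int(rate * SPEED))       # play back more slowly
    sd.wait()                                 # block until playback finishes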
AUTOMATED TRANSCRIPTION OF AUDIO/VIDEO FILES
To date, attempts to automate the transcription of audio/video recordings through voice recognition have not been particularly successful. Errors in automated transcription are numerous even in high-fidelity recordings of a single speaker. Even so, software such as Adobe Premiere with Adobe After Effects can save time despite producing error-ridden automated transcripts, because it allows the user to control the playback easily while correcting the transcript, and the correct timecode is automatically inserted for each edited word. Manual synchronization of words and timecode is thus avoided: the edited words continue to correspond to their precise occurrence in the video because the software automatically codes and inserts them as both words and metadata. As a result, searching for a word or string will cue the video to every instance of that word or string, whether in one video or in several videos designated by the user.
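The time savings come from the word-level timecodes: once each word carries a start time, a text search can cue the recording directly. The Python sketch below, with invented data and a hypothetical function name, illustrates the idea in its simplest form; it does not reproduce the format Adobe’s software uses.

    # A minimal sketch of word-level timecodes and lexical search across videos.
    # The data structure, file names, and function are invented for illustration.
    from typing import Dict, List, Tuple

    # Each video maps to a list of (word, start time in seconds) pairs.
    transcripts: Dict[str, List[Tuple[str, float]]] = {
        "lesson1.mp4": [("so", 0.4), ("what", 0.9), ("happened", 1.3), ("next", 1.8)],
        "lesson2.mp4": [("what", 12.0), ("do", 12.3), ("you", 12.5), ("think", 12.8)],
    }

    def cue_points(query: str) -> List[Tuple[str, float]]:
        """Return (video, timecode) pairs for every occurrence of the query word."""
        hits = []
        for video, words in transcripts.items():
            for word, start in words:
                if word.lower() == query.lower():
                    hits.append((video, start))
        return hits

    print(cue_points("what"))   # [('lesson1.mp4', 0.9), ('lesson2.mp4', 12.0)]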
ANNOTATION AND LINKING OF AUDIO/VIDEO FILES TO TRANSCRIPTS
The following free software applications are widely used by applied linguists for the annotation and linking of media files to transcripts.
Anvil,
originally designed for gesture research, is a video annotation tool
that can import data from phonetic tools such as Praat. It can
display waveform and pitch contour and offers frame-accurate,
multilayered annotation.
CLAN-CA,
developed in the context of the CHILDES and TalkBank projects, allows
users to link audio and video documents, pictures, and notes to a
transcript. The software aids in the transcription, coding, analysis,
and sharing of transcripts of conversations linked to either audio or
video media.
EXMARaLDA, an acronym for Extensible Markup Language for Discourse Annotation, is a
system of data formats and tools for the computer-assisted transcription
and annotation of spoken language, and for the construction and
analysis of spoken-language corpora.
In addition to facilitating manual transcription, applications such as these provide numerous ways for users to annotate a broad or narrow transcript, easily synchronizing a researcher’s notes with the words and phrases of the transcript itself. For example, even
though verbal behavior is situated in a larger, interactional context,
nonverbal behavior is often not noted in transcriptions. Research papers
are rife with illustrations of transcripts that note only the language
spoken, sometimes leading to conflicting conclusions, depending upon the
interpretation of the intent of a given utterance. The possibility that
one of the speakers was pointing to direct an interlocutor’s attention,
that another shrugged her shoulders in response, or that others were
rolling their eyes may result in different interpretations of the
language interaction. The notation of nonverbal behavior that triggers
an utterance may yield information critical to the understanding of the
language episode that is not apparent from a transcript of only the
spoken language.
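Conceptually, tools such as these treat the recording as a shared timeline to which separate tiers of annotation are anchored, so that a nonverbal note can sit alongside the words it accompanies. The Python sketch below is a deliberately simplified, hypothetical illustration of that idea and does not reproduce the actual file format of Anvil, CLAN-CA, or EXMARaLDA.

    # A simplified sketch of multi-tier, time-aligned annotation: a verbal tier
    # and a nonverbal tier anchored to the same timeline (seconds into the video).
    # Tier contents are invented; this is not the file format of any tool above.
    verbal_tier = [
        (3.2, 4.1, "S1", "could you pass me that one?"),
        (4.3, 4.6, "S2", "this?"),
    ]
    nonverbal_tier = [
        (3.4, 3.9, "S1", "points toward the picture card"),
        (4.3, 4.6, "S2", "shrugs shoulders"),
    ]

    def annotations_at(time, *tiers):
        """Return all annotations, from any tier, that overlap a given moment."""
        return [a for tier in tiers for a in tier if a[0] <= time <= a[1]]

    # Everything happening 4.4 seconds into the video, across both tiers:
    for start, end, speaker, content in annotations_at(4.4, verbal_tier, nonverbal_tier):
        print(f"{start:.1f}-{end:.1f}  {speaker}: {content}")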
CAQDAS (COMPUTER-ASSISTED QUALITATIVE DATA ANALYSIS SOFTWARE)
Although not specifically designed for language research, CAQDAS applications offer extensive suites of integrated tools. Atlas.ti and Transana are two such applications; each provides users with many ways to identify, arrange, and rearrange pertinent clips, assign keywords to clips, and create complex collections of interrelated clips.
Like the other applications, Transana helps researchers annotate, segment, and code audio and video data while automatically synchronizing the transcript with the audio/video data and annotations. However, Transana also allows researchers in different locations to annotate the same video file simultaneously, using the same or different coding schemes, without overwriting one another’s coding schemes or annotations. While one researcher may code a segment for speech acts, another may associate the same segment with data imported from Praat, software designed for the analysis of speech. A researcher interested in assessing language complexity might code AS-units, a unit for measuring spoken language,
defined as a single speaker’s utterance consisting of an independent
clause or subclausal unit, together with any subordinate clause(s)
associated with it (Foster, Tonkyn, & Wigglesworth, 2000).
A researcher focused on assessing fluency, by contrast, may be more interested in coding various types of hesitation phenomena, such as false starts, repetitions, reformulations, and replacements.
Coded segments can appear in a variety of visual displays, such
as color-coded bars along the entire timeline, so that users can
visualize patterns in the data. Users can click on components of the
visual display to play the video or audio segments while displaying the
associated transcript or notes from one researcher or multiple
researchers. A segment can also be retrieved through a lexical search of the textual data (i.e., transcript, annotations, lexical codes) with which it is associated.
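The retrieval behind such displays can be pictured as a collection of coded clips searched by keyword or by any text attached to them. The Python sketch below is a hypothetical, much-reduced stand-in for what CAQDAS packages do; none of its names or data come from Atlas.ti or Transana.

    # A hypothetical, much-simplified sketch of coded clips and lexical retrieval.
    clips = [
        {"video": "task1.mp4", "start": 15.0, "end": 22.5,
         "transcript": "I mean I went, I go to the store",
         "codes": ["self-repair", "AS-unit"]},
        {"video": "task1.mp4", "start": 40.0, "end": 47.0,
         "transcript": "and then um then we we waited",
         "codes": ["repetition", "filled pause"]},
    ]

    def find_clips(term):
        """Return clips whose transcript or codes contain the search term."""
        term = term.lower()
        return [c for c in clips
                if term in c["transcript"].lower()
                or any(term in code.lower() for code in c["codes"])]

    for clip in find_clips("repetition"):
        print(clip["video"], clip["start"], clip["end"])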
FRAMING LANGUAGE RESEARCH WITH COMPUTER ASSISTANCE
Just as the selective process of language transcription
reflects theoretical goals, so does the process of determining what type
of technology to use. Rather than assume that all language interactions
should be recorded with the same technology in the same way,
researchers should address the question: “What type of recording and
what type of transcription might be most useful for my research
purposes?”
For example, the author designed a system (Price, 1992) to record real-time data without any transcription of the language. The software simply identified individual speakers, noted when they spoke, and recorded information about each speaker’s gender, age, and language background.
Through visual displays and automated quantitative analysis, teachers
and students could, in real time, consult summaries of wait-time between
speakers, directions of communication among speakers, volume of
utterances, and amount of talk-time for an individual speaker. Summaries could be displayed by gender, age group, or native-language
group. Even without transcription of language, the querying of this data
served as a catalyst for empowering change among the students and
teachers. In one class, for example, students were intrigued to see significant differences in wait-time between the Asian and Latin-American students. Although the differences were measured in milliseconds, they were striking enough to prompt discussion of the role of wait-time in turn-taking and participation in conversation.
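A system of this kind needs no transcript at all, only a log of who spoke and when. The Python sketch below is not the software described in Price (1992); it is a hypothetical illustration, with invented data, of how talk-time and wait-time might be computed from such a log and summarized by group.

    # A hypothetical sketch of wait-time and talk-time summaries computed from
    # a turn log; speakers, groups, and times are invented for illustration.
    from collections import defaultdict

    # Each turn: (speaker, group, start seconds, end seconds), in order of occurrence.
    turns = [
        ("Mei",   "Asian",          0.0,  4.2),
        ("Diego", "Latin-American", 4.5,  9.0),
        ("Mei",   "Asian",         10.8, 12.0),
        ("Rosa",  "Latin-American", 12.3, 15.1),
    ]

    talk_time = defaultdict(float)   # total seconds each group holds the floor
    wait_times = defaultdict(list)   # silence preceding each group's turns

    for i, (speaker, group, start, end) in enumerate(turns):
        talk_time[group] += end - start
        if i > 0:
            previous_end = turns[i - 1][3]
            wait_times[group].append(start - previous_end)

    for group, total in talk_time.items():
        waits = wait_times[group]
        mean_wait = sum(waits) / len(waits) if waits else 0.0
        print(f"{group}: talk-time {total:.1f}s, mean wait-time {mean_wait:.2f}s")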
Explicitly or implicitly, each technology and medium of
recording influences what can be transcribed, and to
some extent how the data is transcribed. Rather than
assume that one process is always preferable to another, researchers
must be attentive to how these choices and variables may frame their conclusions. As Konopásek (2008) argued, these technologies should not be considered “mere tools for coding and retrieving, but also as complex virtual environments for embodied and practice-based knowledge making” (p. 9).
REFERENCES
Foster, P., Tonkyn, A., & Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons. Applied Linguistics, 21, 354–375.
Kieren, D. K., & Munro, B. (1985). The observational
recording dilemma. Ottawa (Ontario) Canada: Social Sciences and
Humanities Research Council of Canada. (ERIC Document Reproduction
Service No. ED 297 021)
Konopásek, Z. (2008). Making thinking visible with Atlas.ti:
Computer assisted qualitative analysis as textual practices. Forum Qualitative Sozialforschung / Forum: Qualitative Social
Research, 9.
Lapadat, J. C., & Lindsay, A. C. (1999). Transcription in research and practice: From standardization of technique to interpretive positionings. Qualitative Inquiry, 5, 64–86.
Ochs, E. (1979). Transcription as theory. In E. Ochs &
B. Schieffelin (Eds.), Developmental pragmatics (pp.
43–72). New York: Academic Press.
Price, K. (1992). Look who’s talking. CrossCurrents: An International Journal of Language Teaching and Intercultural Communication, 19, 73–79.
Karen Price has authored more than 20 articles and
developed early prototypes of technology now commonly used, such as
lexical searching of video. She enjoys conducting workshops and
consulting in developing countries as well as for entities that have
included Microsoft, Annenberg, USAID, U.S. State Dept., AmidEast,
Fulbright, and Kodak. She is currently a visiting scholar at Boston
University, where she conducts research and teaches graduate courses on
SLA and CALL. |