
Paula Winke and Susan Gass
Idea units have been adopted by second language (L2)
researchers to measure reading, writing, and listening comprehension,
including L2 video comprehension. While the term idea
units has been defined in several ways—for example, as
individual, simple sentences; basic semantic propositions; or phrases
(Kroll, 1977)—there is no general consensus on how to operationalize
idea units for scoring. In this article, we (a) review selected
empirical studies that employed idea-unit scoring, (b) discuss
methodological issues, and (c) suggest guidelines for researchers and
teachers who use idea units to measure comprehension.
Overview of Empirical Studies
In an early paper on L2 summary recall, Johns and Mayes (1990)
had 80 ESL students read an English text on pollution (approximately 600
words) and write a 100-word summary. The students kept the original
text while crafting their summary. The authors segmented the original
text into 77 possible idea units as defined by Kroll (1977). The
students’ texts were classified as (a) correct replications or (b)
distortions. This study is often cited in the methods sections of
academic articles when authors are describing their recall protocol
scoring, but the original empirical work by Johns and Mayes (1990) lacks
information on how they scored the summaries. Because the students were
able to keep the original text while they summarized, the task may not
have been a true measure of reading comprehension (cf. discussion by
Riley & Lee, 1996).
Two decades later, Ableeva and Lantolf (2011) published a paper
on whether dynamic assessment promotes French L2 listening
comprehension. Seven language learners listened to six video recordings
of speakers of French talking about food and restaurants; three
recordings were heard before and three after the different assessment
types. The researchers measured the effects of assessment type by
gain scores (change) from pretesting to posttesting on idea units
recalled. They used pausal unit analysis, which, they
reported, amounts to counting idea units. They followed the scoring
guidelines by Riley and Lee (1996).
by Riley and Lee (1996).
Based on Riley and Lee’s (1996) work, Ableeva and Lantolf
(2011) first segmented the transcripts of the videos into “syntactically
related units” (p. 140). Unlike Johns and Mayes (1990), Ableeva and
Lantolf (2011) had three independent researchers do the segmenting, and
then the researchers compared their work. They discussed differences and
achieved a consensus on the segmentation. A second group of researchers
weighted the segments (or idea units) as main ideas, supporting
ideas, and details. Researchers then analyzed the oral recalls of the
learners to derive the total number of idea units accurately produced
and marked whether the idea units recalled were main ideas, supporting
ideas, or details. Paraphrases were counted, but distortions (untrue
ideas, facts, or details) were not. Logical inferences were considered
distortions and were not counted. Ableeva and Lantolf (2011) used only
the main-idea scores in their paper.
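To make the weighting scheme concrete, here is a minimal sketch in Python of how segmented idea units and a tiered recall tally might be represented. The unit texts and the helper names (IdeaUnit, tally_recall) are our own illustrative inventions, not Ableeva and Lantolf's (2011) actual materials.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    MAIN = "main idea"
    SUPPORTING = "supporting idea"
    DETAIL = "detail"

@dataclass
class IdeaUnit:
    text: str   # one "syntactically related unit" from the transcript
    tier: Tier  # weight assigned by the second group of researchers

# Hypothetical segments (the actual French transcripts are not published)
units = [
    IdeaUnit("The speaker recommends a local restaurant", Tier.MAIN),
    IdeaUnit("The restaurant serves regional dishes", Tier.SUPPORTING),
    IdeaUnit("It stays open until midnight", Tier.DETAIL),
]

def tally_recall(recalled_indices, units):
    """Count accurately recalled units per tier. Paraphrases count as
    accurate; distortions and (in Ableeva & Lantolf's scheme) logical
    inferences are excluded before this tally is made."""
    counts = {tier: 0 for tier in Tier}
    for i in recalled_indices:
        counts[units[i].tier] += 1
    return counts

# A learner who recalled the main idea and one detail:
print(tally_recall([0, 2], units))  # MAIN: 1, SUPPORTING: 0, DETAIL: 1
```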
Methodological Issues
We decided to use idea-unit scoring in one of our ongoing
research projects on the use of captions for second language learning
(Winke, Gass, & Sydorenko, 2013). In mapping out the procedures
for the recall protocols and for the idea-unit scoring, we noted that
previous authors had not stated whether grammar or spelling mistakes
were allowed. Previous research also did not report the directions given
to test-takers, so we drafted these ourselves. We had no guidance on how
to compute hierarchical scoring. Should we award more points for main
ideas and fewer for supporting ideas and details, or should we give fewer
points for commonly identified main ideas and more points for abstruse
supporting ideas and rarely recalled details? We also noted that none of
the researchers who used idea-unit scoring specified which type of
reliability analysis was run (when a reliability statistic was reported
at all).
Before the learners watched the video (a short, commercially
produced video about bears), we told them about the upcoming test. They
could take notes while watching, although they could not use the notes
when recalling. We had students type their recalls on computers in
either the target language or their native language, as they preferred.
We scored idea units regardless of spelling, grammatical
mistakes, or language (we translated non-English responses into
English). The directions were as follows:
In the space below, please type (in English or in your
native language) everything you understood and recall from the video.
(Type out/retell the story from the video.) Please provide as many
details as you can. There is no time limit.
We segmented the transcripts of our videos as modeled by
Ableeva and Lantolf (2011). We then organized the idea-unit segments
onto an easy-to-use, one-page score sheet. When we first began scoring,
we noted that the learners sometimes wrote correct things that were not
exactly represented in the segments. For example, in the video, there
were idiomatic expressions or understatements that conveyed concrete
meanings but which could be rephrased more bluntly. These were the
logical inferences that Ableeva and Lantolf (2011) did not count, but we
decided that we would count them (see, for example, item 15 in Figure
1). We amended and verified the coding sheet through an iterative
process, first piloting it on a subset (about 10%) of the recalls before
we began coding in earnest. We added correct alternative interpretations
to the scoring sheet in italics. We put the main ideas in bold and
supplemental ideas and details in roman. If all ideas (main,
supplemental, and details) were conveyed, the student received a full
point. If fewer than all were conveyed, the student received a
half-point. We also had on the scoring sheet a comment area for notes on
scoring decisions. These notes were helpful when we needed to negotiate
score assignments or amend the scoring sheet again (and then rescore
previously scored recalls, as needed). See Figure 1 for an example of a
portion of the final scoring sheet (the first 15 of 36 idea units).

Figure 1. Idea-unit scoring sheet sample. Main ideas are in bold;
supplemental ideas and details are in roman.
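As a rough sketch of the full-point/half-point rule just described, the following Python function scores a single idea unit. The function name is our own, and awarding zero when nothing is conveyed is our assumption; the text describes only the full- and half-point cases.

```python
def score_idea_unit(parts_conveyed, parts_total):
    """Score one idea unit: 1 point if every part (main idea,
    supplemental ideas, details) is conveyed, .5 if some but not all
    are. The 0 for nothing conveyed is an assumption; the article
    describes only the full- and half-point cases."""
    if parts_conveyed == parts_total:
        return 1.0
    return 0.5 if parts_conveyed > 0 else 0.0

# A recall conveying 2 of an item's 3 parts earns a half-point:
assert score_idea_unit(2, 3) == 0.5
assert score_idea_unit(3, 3) == 1.0
```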
To calculate interrater reliability, we input the two raters’
scores (A and B) into an Excel spreadsheet (Figure 2). We then
calculated a Pearson product-moment correlation coefficient
(r = .98), percent agreement (which averaged 93%,
with a 1 and 1 assignment being 100% agreement, a .5 and 1 being 50%
agreement, and 0 and 1 being 0% agreement), and Cronbach’s alpha, with
all 36 items resulting in an alpha of .94; when using only 34 of the
items that provided variance (that is, by eliminating items that no one
got or that everyone got), the alpha increased to .96.

Figure 2. Sample scoring sheet for estimating interrater reliability.
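For readers who want to reproduce such estimates outside of Excel, here is a sketch using Python and numpy. The min-over-max rule for percent agreement is our reconstruction from the examples given in the text (1 and 1 = 100%, .5 and 1 = 50%, 0 and 1 = 0%); the rater scores and the test-taker-by-item matrix are invented, since the article does not publish its raw data.

```python
import numpy as np

def percent_agreement(a, b):
    """Mean per-item agreement between two raters. A min/max ratio
    reproduces the examples in the text (1 vs. 1 = 100%, .5 vs. 1 = 50%,
    0 vs. 1 = 0%); the authors' exact formula is not stated, so this is
    a reconstruction."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    lo, hi = np.minimum(a, b), np.maximum(a, b)
    ratios = np.divide(lo, hi, out=np.ones_like(hi), where=hi > 0)
    return ratios.mean()  # two zeros count as full agreement

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix with rows = test-takers and
    columns = items (one plausible layout; the article does not
    specify how its alpha was computed)."""
    m = np.asarray(item_scores, float)
    k = m.shape[1]
    item_var = m.var(axis=0, ddof=1).sum()
    total_var = m.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Invented scores from raters A and B over eight idea units
rater_a = [1, 1, .5, 0, 1, .5, 1, 0]
rater_b = [1, 1, .5, .5, 1, .5, 1, 0]

r = np.corrcoef(rater_a, rater_b)[0, 1]       # Pearson product-moment r
agreement = percent_agreement(rater_a, rater_b)

# Alpha over an invented test-taker x item matrix, with and without the
# zero-variance items (those that everyone got or that no one got):
scores = np.array([[1, .5, 1, 0],
                   [1, 0, .5, 0],
                   [1, 1, 1, 0]])
varying = scores.var(axis=0) > 0
alpha_all = cronbach_alpha(scores)
alpha_varying_only = cronbach_alpha(scores[:, varying])
```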
For research purposes, the key to idea-unit scoring is in the
segmenting of the original input (which should be done collaboratively
between two or more researchers) and also in creating a scoring rubric
or sheet through an iterative process. Researchers decide the parameters
of what will be included as correct responses. Researchers can also
calculate various reliability estimates and should report them. Multiple
estimates are needed because each one is only an estimate; the true
reliability is best viewed as lying somewhere within the range they
define.
Conclusion
Comprehension is difficult to measure because it is an
internal, cognitive process. Thus, comprehension is measured indirectly.
A question that follows is whether good comprehension should
necessarily entail a good memory of what one comprehended. And is
comprehension of little worth if one cannot convey what he or she
comprehended through good speaking or writing skills? Or is
comprehension more of an online process that should
not overlap (in measurement) with memory skills?
We believe idea-unit scoring is a preferable method of
measuring listening comprehension, even if the scoring is difficult,
because test-takers must rely on their language skills, memory,
processing strategies, and background knowledge to convey what they
comprehended. Thus, the construct of comprehension conveyed by idea-unit
scoring is multicomponential and skill integrated, and it is tied to
the learners’ overall knowledge base. It also represents an authentic
and communicatively oriented task; recalling and reporting is akin to
something one might do in real life. Teachers can embrace idea-unit
scoring for classroom-based comprehension assessment because it is
authentic and informative. And most important, teachers can learn a lot
about their students through recall scoring, which makes it ideal for
formative, classroom-based assessment purposes.
References
Ableeva, R., & Lantolf, J. (2011). Mediated dialogue
and the microgenesis of second language listening comprehension. Assessment in Education: Principles, Policy & Practice,
18(2), 133–149.
Johns, A. M., & Mayes, P. (1990). An analysis of
summary protocols of university ESL students. Applied
Linguistics, 11(3), 253–271.
Kroll, B. (1977). Combining ideas in written and spoken
English: A look at subordination and coordination. In E. O. Keenan
& T. L. Bennett (Eds.), Discourse across time and
space. Southern California Occasional Papers in Linguistics, No.
5. Los Angeles, CA: University of Southern
California.
Riley, G., & Lee, J. F. (1996). A comparison of recall
and summary protocols as measures of second language reading
comprehension. Language Testing, 13(2), 173–189.
Winke, P., Gass, S., & Sydorenko, T. (2013). Factors
influencing the use of captions by foreign language learners: An
eye-tracking study. The Modern Language Journal,
97(1), 254–275.
Paula Winke is an associate professor of second
language studies at Michigan State University, where she teaches in the
TESOL MA and Second Language Studies PhD Programs. Her research
interests include language assessment and task-based language teaching.
She is the 2012 recipient of the TESOL International Association
Distinguished Research Award.
Susan Gass is university distinguished professor of
second language studies (SLS) at Michigan State University, where she
serves as director of the English Language Center and of the SLS
Program. She has published widely in the field of second language
acquisition, including books on second language acquisition and research
methods.