RVIS Newsletter - February 2022 (Plain Text Version)

Rebeca Arndt, Pasco County School District, Florida, USA

For decades, researchers have consistently highlighted the vital role that vocabulary knowledge plays in comprehending language, whether spoken or written. Whether a learner is acquiring a foreign, second, or third language, vocabulary is the foundation of language comprehension.

Aware of the importance of vocabulary in language comprehension, language and content educators across the world have incorporated various vocabulary teaching techniques into their instruction: word cards, keywords, word parts, dictionary use, word lists, multiword units, and so on. For instance, knowing that academic and science-specific vocabulary accounts for a substantial portion of the variance in the science reading comprehension of English Speakers of Other Languages (ESOL) and former ESOL students, science teachers began to teach academic and science-specific vocabulary explicitly. Similarly, instructors of English for Academic Purposes (EAP) and Intensive English Program (IEP) learners, aware of the importance of vocabulary in language learning, routinely integrate vocabulary teaching techniques into their language instruction. This increased attention to vocabulary in classrooms with ESOL, IEP, and EAP learners aims to enhance both the learners' vocabulary size (how many lexical items a learner knows) and their vocabulary depth (how well a learner knows a lexical item).

Despite the wide acknowledgment that vocabulary is the cornerstone of language and content learning, educators continue to seek the most effective vocabulary teaching and learning techniques and tools for their specific instructional contexts, ones that can be adapted to the needs of their learners. Corpus-based word lists are one such effective tool. Broadly, these lists fall into three main categories: general, academic, and technical (domain-specific) vocabulary lists, designed for spoken or written purposes. Individually, however, each list within these categories is unique in several ways: (a) each is built from a corpus (a large collection of language) composed of language from specific sources (school textbooks, novels, academic articles, movies, etc.); (b) the corpora differ in size; (c) the unit used to count "words" varies (it can be a type, a lemma, or a word family at level 6); (d) frequency is one of the main criteria for including "words" in a list; (e) the lists differ in size; (f) their purposes differ; and (g) their usefulness, measured by lexical coverage (the percentage of the running words in a specific text or corpus that a list accounts for), varies. Given these differences, educators may find it challenging to select the corpus-based list that best fits their instructional context, the linguistic and/or academic needs of their learners, and the goals they seek to accomplish.
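To make point (c) concrete, the short Python sketch below counts the same sequence of running words as tokens, types, lemmas, and word families. The token sequence is a hypothetical one built from words in Table 1, and the LEMMA and FAMILY mappings are hand-made, hypothetical stand-ins for a real lemmatizer and a real level-6 word-family list; the point is only that the choice of counting unit changes how many "words" a text appears to contain.

```python
# Hypothetical token sequence built from words in Table 1.
TOKENS = ["night", "nights", "turn", "turned", "turn", "feelings", "felt"]

# Hypothetical, hand-made mappings standing in for a real lemmatizer
# and a real word-family (level 6) list.
LEMMA = {"nights": "night", "turned": "turn", "feelings": "feeling", "felt": "feel"}
FAMILY = {"feeling": "feel"}  # derived form grouped under its headword


def count_units(tokens):
    """Count the same tokens as tokens, types, lemmas, and word families."""
    lemmas = [LEMMA.get(token, token) for token in tokens]
    families = [FAMILY.get(lemma, lemma) for lemma in lemmas]
    return {
        "tokens": len(tokens),            # every running word
        "types": len(set(tokens)),        # distinct word forms
        "lemmas": len(set(lemmas)),       # inflected forms grouped together
        "families": len(set(families)),  # inflected and derived forms grouped
    }


print(count_units(TOKENS))
# prints {'tokens': 7, 'types': 6, 'lemmas': 4, 'families': 3}
```

The same seven running words shrink to three "words" when counted as word families, which is one reason list sizes such as 1,986 or 570 word families are not directly comparable to counts of types or lemmas.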

Although students in ESOL, IEP, and EAP classrooms are prepared to cope with the language of academic settings, where technical vocabulary (which is generally fundamental to a specific topic) can be extremely dense, it is of utmost importance that these learners have strong general and academic vocabulary: general vocabulary makes up the largest percentage of words in written text, and academic vocabulary provides contextual information about technical lexical items. Two corpus-based tools that can help emergent bilinguals learn general and academic vocabulary are the General Service List (GSL; West, 1953) and the Academic Word List (AWL; Coxhead, 2000). The GSL, a high-frequency list for second language (L2) learners, was built from English written corpora of approximately five million running words and contains 1,986 word families at level 6. The AWL, a list of 570 word families, was extracted from an academic corpus of 3.5 million running words collected from 28 subject areas. In terms of usefulness, measured by the lexical coverage the lists provide across texts or corpora, the GSL covers between 71.52% and 91.9% of the running words, depending on the texts (science texts versus novels for adolescents) (Coxhead & Hirsh, 2007; Hirsh & Nation, 1992), while the AWL covers around 10% of the running words in a wide range of academic written texts (Coxhead, 2000).
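Lexical coverage, as used above, is simply the share of running words (tokens) in a text that belong to a given list. The Python sketch below computes it under simplifying assumptions: tokens are lowercase alphabetic strings found with a basic regular expression, and the list is a tiny hypothetical set of word forms, not the actual GSL, whose entries are word families that also include inflected and derived forms.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (running words)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_coverage(text, word_list):
    """Return the percentage of tokens in `text` that appear in `word_list`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in word_list)
    return 100 * covered / len(tokens)


# Hypothetical mini-list for illustration only; it is NOT the real GSL.
sample_list = {"the", "air", "seemed", "with", "a", "as", "if", "someone", "had", "there"}
sentence = "The air seemed charged with a special calm as if someone had waited there"
print(f"{lexical_coverage(sentence, sample_list):.1f}%")  # prints 71.4%
```

Ten of the sentence's 14 tokens are on the sample list, so coverage is 10/14, or about 71.4%. Coverage figures such as those reported for the GSL and AWL are computed the same way, only over whole texts or corpora and with full word-family lists.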

A practical advantage of these two lists is that they are built into online platforms (e.g., VocabProfiler Classic, WordSift) that can easily be used in the classroom to identify GSL and AWL lexical items across texts. These tools let teachers process any digital text and extract general, academic, and non-general, non-academic (i.e., off-list) vocabulary. In the sections that follow, an excerpt from Ray Bradbury's novel Fahrenheit 451 is used to illustrate the Lexical Frequency Profiling (LFP) process of identifying general and academic lexical items on these two platforms:

The last few nights he had had the most uncertain feelings about the sidewalk just around the corner here, moving in the starlight toward his house. He had felt that a moment prior to his making the turn, someone had been there. The air seemed charged with a special calm as if someone had waited there, quietly, and only a moment before he came, simply turned to a shadow and let him through. Perhaps his nose detected a faint perfume, perhaps the skin on the backs of his hands, on his face, felt the temperature rise at this one spot where a person's standing might raise the immediate atmosphere ten degrees for an instant. There was no understanding it. Each time he made the turn, he saw only the white, unused, buckling sidewalk, with perhaps, on one night, something vanishing swiftly across a lawn before he could focus his eyes or speak. (Bradbury, 1992, p.2)

1. VocabProfiler (VP) Classic

After opening VocabProfiler (VP) Classic v.4, the user pastes the text into the input window, selects AWL on the right side, and clicks submit. The profiled output is color-coded: light blue for the first 1,000 word families in the GSL, green for the second 1,000 word families in the GSL, yellow for AWL lexical items, and red for off-list lexical items (mostly technical in nature) that belong to neither the GSL nor the AWL (see Table 1). As a side note, the VocabProfilers suite also offers options for profiling texts against a wide range of other academic and technical lists (e.g., the Academic Collocations List, the Academic Phrase List adapted from the Oxford Academic Phrasal Lexicon, and the Business Service List).

Table 1. Excerpt profiled with VocabProfiler (VP) Classic

First 1,000 GSL word families (light blue): a, about, across, air, an, and, around, as, at, backs, been, before, came, charged, could, degrees, each, eyes, face, feelings, felt, few, for, had, hands, he, here, him, his, house, if, in, it, just, last, let, made, making, might, moment, most, moving, night, nights, no, of, on, one, only, or, perhaps, person, raise, rise, saw, seemed, shadow, simply, someone, something, speak, special, spot, standing, ten, that, the, there, this, through, time, to, toward, turn, turned, uncertain, understanding, waited, was, where, white, with

Second 1,000 GSL word families (green): calm, corner, faint, immediate, instant, nose, quietly, skin, temperature

AWL (yellow): detected, focus, prior

Off-list (red): atmosphere, buckling, lawn, perfume, sidewalk, starlight, swiftly, unused, vanishing
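The profiling shown in Table 1 can be sketched as assigning each token to the first band whose list contains it (first 1,000 GSL families, labeled K1 here; second 1,000, labeled K2; then the AWL), with everything else marked off-list. The Python sketch below also computes a cumulative percentage, that is, the share of tokens covered by a band together with all bands above it. It is not VocabProfiler's implementation: the band contents are tiny hypothetical samples of words taken from Table 1, and matching is on exact word forms rather than on full word families.

```python
import re
from collections import Counter
from itertools import accumulate

# Hypothetical, heavily abbreviated band samples using words from Table 1.
# The real GSL and AWL lists are far larger and are organized by word family.
BANDS = [
    ("K1", {"a", "his", "perhaps", "the", "air", "someone", "moment"}),
    ("K2", {"calm", "faint", "nose", "quietly", "skin"}),
    ("AWL", {"detected", "focus", "prior"}),
]
ORDER = [name for name, _ in BANDS] + ["off-list"]


def classify(token):
    """Return the first band whose list contains the token, else 'off-list'."""
    for name, words in BANDS:
        if token in words:
            return name
    return "off-list"


def profile(text):
    """Return {band: (percent of tokens, cumulative percent)} in band order."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return {name: (0.0, 0.0) for name in ORDER}
    counts = Counter(classify(token) for token in tokens)
    percents = [100 * counts[name] / len(tokens) for name in ORDER]
    return dict(zip(ORDER, zip(percents, accumulate(percents))))


for band, (pct, cum) in profile("Perhaps his nose detected a faint perfume.").items():
    print(f"{band:<9}{pct:6.1f}%{cum:7.1f}%")
```

For that sentence from the excerpt (seven tokens), three tokens fall in K1, two in K2, one (detected) in the AWL, and one (perfume) off-list, so the cumulative percentage rises from about 42.9% after K1 to 100% once off-list items are included.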

2. WordSift

WordSift also lets the user paste a text into an input window for further processing. The output can be displayed in cloud view or in text view. Cloud view displays a certain number of lexical items in a word cloud, whereas text view offers statistics and readability information. Keeping the output in cloud view, the user can select a cloud style and then choose GSL or AWL from the Mark Words dropdown menu. If the user clicks on one of the AWL lexical items, for instance, a WordNet® Visualization (see Picture 1) appears in the box below the word cloud, along with images and videos associated with that lexical item. Another useful feature of this platform is the display of the selected lexical item (GSL, AWL, etc.) in context, together with the number of times it occurs and the number of sentences in which it appears in the text. This feature enables teachers and learners to engage with vocabulary in context.
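The in-context display described above can be approximated with a simple keyword-in-context routine: split the text into sentences, then report how many times a selected word occurs and which sentences contain it. The Python sketch below is not WordSift's implementation; it assumes a naive sentence splitter (breaking after ".", "!", or "?") and exact, case-insensitive matching of word forms, and in_context is a hypothetical helper name.

```python
import re


def in_context(text, word):
    """Return (occurrence count, list of sentences containing `word`)."""
    target = word.lower()
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    total, hits = 0, []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
        n = tokens.count(target)
        if n:
            total += n
            hits.append(sentence)
    return total, hits


excerpt = (
    "Perhaps his nose detected a faint perfume, perhaps the skin on the backs of his "
    "hands, on his face, felt the temperature rise at this one spot where a person's "
    "standing might raise the immediate atmosphere ten degrees for an instant. "
    "There was no understanding it. "
    "Each time he made the turn, he saw only the white, unused, buckling sidewalk, with "
    "perhaps, on one night, something vanishing swiftly across a lawn before he could "
    "focus his eyes or speak."
)
count, sentences = in_context(excerpt, "perhaps")
print(f"'perhaps': {count} occurrence(s) in {len(sentences)} sentence(s)")
# prints 'perhaps': 3 occurrence(s) in 2 sentence(s)
for s in sentences:
    print("  -", s)
```

Listing the sentences alongside the counts lets learners see the same item used several times in different surroundings, which is the kind of engagement with vocabulary in context that supports depth of knowledge.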

Picture 1. Excerpt profiled with WordSift

In conclusion, corpus-based lists can be effective for vocabulary teaching and learning because they are purposefully designed from large, representative corpora, following stringent methodological considerations and precise pedagogical purposes. Two corpus-based, frequency-based vocabulary lists that have repeatedly proven their effectiveness are the GSL and the AWL. Because these lists are embedded in VocabProfiler Classic and WordSift, they can be used in ESOL, IEP, and EAP classrooms to profile any digital text and identify general, academic, and even technical vocabulary. After LFP, the vocabulary of interest can be extracted and explicitly taught, discussed, and interacted with to increase the learners' vocabulary size. In addition, WordSift allows users to interact with vocabulary in context and to engage with it through word visualizations, images, and even videos. These features help learners enhance their depth of vocabulary knowledge, another facet of vocabulary knowledge that is extremely important and intricately related to reading comprehension.

References

Bradbury, R. (1992). Fahrenheit 451. Del Rey Books.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.

Coxhead, A., & Hirsh, D. (2007). A pilot science-specific word list. Revue Française de Linguistique Appliquée, 12(2), 65–78.

Hirsh, D., & Nation, P. (1992). What vocabulary size is needed to read unsimplified texts for pleasure? Reading in a Foreign Language, 8, 689–696.

West, M. (1953). A general service list of English words. Longman, Green.

Rebeca Arndt holds a Ph.D. in Education (TESOL track) and works full-time as a high school English Language Arts (English Honors II) teacher in Florida. Her research interests lie in corpus linguistics (e.g., examining academic and discipline-specific vocabulary across corpora and exploring the relationship between vocabulary and reading comprehension).