March 2013
Research Insights Into Expanding L2 Writing Vocabulary
Mark D. Johnson, Leonardo Mercado, & Anthony Acevedo

Identifying target vocabulary is a challenge for instructors of second language (L2) writing. Target vocabulary is often either prescribed by course texts or selected on the basis of the teacher’s intuition. L2 writing instructors who turn to research for guidance may be disappointed, because research on vocabulary in L2 writing appears to be in short supply. Studies often examine vocabulary only incidentally, as one of many surface features of L2 writers’ texts. Where these studies do address vocabulary, their findings appear consistent: a greater diversity of unique words (word types) relative to the total number of words (word tokens) is associated with higher holistic ratings of L2 writing quality. [1] These results suggest that teachers of L2 writing should provide students with explicit vocabulary instruction rather than rely on incidental vocabulary learning (Laufer, 2005). However, this leaves open the question of which vocabulary L2 writing teachers should target in order to expand the productive vocabulary of their students.
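The contrast between word types and word tokens can be made concrete with a short sketch. The Python below computes a simple type-token ratio, assuming a naive tokenizer (lowercase text, runs of letters and apostrophes) and counting individual word forms rather than word families; the function names are hypothetical. Raw type-token ratios tend to fall as texts get longer, which is one of the methodological challenges that measures such as vocD and MTLD were designed to address.

```python
import re


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by running words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "The cat saw the dog, and the dog saw the cat."
print(round(type_token_ratio(sample), 2))  # 5 types / 11 tokens -> 0.45
```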

In an attempt to better understand how to broaden L2 writers’ vocabulary, Johnson, Acevedo, and Mercado (in press) used lexical frequency profiles to characterize the vocabulary of L2 writers and its relationship to L2 writing quality. To do this, they examined the writing of a homogeneous group of Spanish-speaking learners of English as a foreign language (N = 101), comparing the vocabulary in the participants’ texts to three sets of frequency lists: (a) the General Service List (GSL; West, 1953), (b) the Academic Word List (AWL; Coxhead, 2000), and (c) the British National Corpus lists of the first (1K) through the fifth (5K) 1,000 most frequent word families (BNC; Nation, 2006). Each of these lists is described briefly in Table 1.

Table 1. Commonly Used Frequency Lists in Lexical Frequency Profiles

General Service List (West, 1953): Two lists of 1,000 words compiled on the basis of “frequency, ease of learning, and necessity”

Academic Word List (Coxhead, 2000): A list of 570 word families that, in conjunction with the General Service List, provide approximately 86% coverage of most academic texts

British National Corpus Frequency Lists (Nation, 2006): Lists of word families arranged incrementally so that a text may be compared against the first through the fourteenth 1,000 most frequent word families

Using the Range program (Heatley, Nation, & Coxhead, 2002), they counted the word types from each list that occurred in each text and normed these counts per 100 words so that texts of varying lengths could be compared. The normed frequencies of word types from each list were then entered into a series of three stepwise multiple regression analyses to determine the extent to which the use of vocabulary from each list predicted the holistic writing quality scores assigned by a group of L2 writing instructors. Table 2 summarizes the variables entered into each analysis.
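A lexical frequency profile of the kind described above can be sketched as follows. This is not the Range program; it is a minimal illustration assuming that each list is a plain set of word forms (Range itself works with word families) and that the denominator for norming is the total number of running words (tokens) in the text. The band lists and function names are hypothetical stand-ins, not excerpts from the BNC lists.

```python
import re

# Hypothetical, drastically shortened band lists for illustration only.
# Real BNC lists (Nation, 2006) contain 1,000 word families per band.
BANDS = {
    "1K": {"the", "a", "we", "and", "of", "to", "in"},
    "2K": {"result", "careful"},
    "3K": {"analyze"},
    "4K": {"bias"},
    "5K": {"scrutinize"},
}


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def normed_type_profile(text: str, bands: dict[str, set[str]], per: int = 100) -> dict[str, float]:
    """Word types from each band, normed per `per` running words (tokens)."""
    tokens = tokenize(text)
    types = set(tokens)  # each distinct word form is counted once
    profile = {}
    for band, words in bands.items():
        n_types = len(types & words)
        # Multiply before dividing to avoid needless floating-point error.
        profile[band] = n_types * per / len(tokens) if tokens else 0.0
    return profile


sample = "We analyze the result and we scrutinize the bias too."
print(normed_type_profile(sample, BANDS))
# {'1K': 30.0, '2K': 10.0, '3K': 10.0, '4K': 10.0, '5K': 10.0}
```

Because types are counted once per text, repeated words (here, "we" and "the") raise the token count but not the type count, so the normed figures reflect lexical variety rather than repetition. Words on none of the lists ("too") count toward the tokens but toward no band.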

Table 2. Multiple Regression Analyses Conducted by Johnson, Acevedo, and Mercado (in press)

Criterion variable (all three analyses): Holistic writing quality score (0–6)

Analysis 1 predictor variables, normed (to 100 words) frequency of:
-Word types from the GSL first 1,000 words
-Word types from the GSL second 1,000 words
-Word types from the AWL

Analysis 2 predictor variables, normed (to 100 words) frequency of:
-Word types from the BNC 1K list
-Word types from the BNC 2K list
-Word types from the BNC 3K list
-Word types from the BNC 4K list

Analysis 3 predictor variables, normed (to 100 words) frequency of:
-Word types from the BNC 1K list
-Word types from the BNC 2K list
-Word types from the BNC 3K list
-Word types from the BNC 4K list
-Word types from the BNC 5K list

The first analysis yielded no significant model; in other words, the use of word types from the GSL and AWL did not predict holistic writing quality scores among this group of learners. [2] The second analysis, however, revealed that the use of word types from the BNC 4K list significantly predicted holistic writing quality scores. In the third analysis, when word types from the 5K list were added, the frequency of 5K word types was the only significant predictor of holistic quality scores, accounting for 4% of the variance. On the surface, 4% of the variance may seem small. However, the normed frequency of 5K word types was only 0.28 per 100 words (fewer than three per 1,000 words). That so few words account for 4% of the variance suggests that the use of less frequent vocabulary has a considerable impact on holistic writing quality.

Based on the results of their research, Johnson et al. (in press) recommend a three-strand approach to L2 writing instruction that would not only help students build a foundation of the most frequent word families but also expand their vocabulary beyond that base. Such an approach would incorporate (a) extensive reading as input for writing, (b) repeated exposure to and practice with target vocabulary (both receptive and productive), and (c) explicit instruction in self-study methods that give students the tools to expand their vocabularies beyond the base of word families needed for basic written communication. Such a program would use lexical frequency profiles to target vocabulary for instruction, practice, and self-study.

Incorporating lexical frequency profiles into L2 writing instruction offers an opportunity for L2 writing teachers to move beyond the simple recommendation that they teach students vocabulary. Online lexical analysis tools—such as those available at—are easy to use and offer L2 writing instructors a principled method for identifying which vocabulary to teach students. Such tools also allow students and teachers to analyze student writing, potentially expanding students’ productive vocabularies, ultimately leading to gains in L2 writing quality.
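One way an instructor might use such a profile to target vocabulary, sketched here under stated assumptions, is to compare a reading text used as input for writing with a student’s draft and list the lower-frequency words that the reading offers but the draft does not yet use. The function name candidate_targets, the choice of the 4K and 5K bands as default targets, and the band assignments of the sample words are all hypothetical, chosen for illustration rather than taken from the BNC lists.

```python
import re

# Hypothetical band assignments (word forms, not full word families).
BANDS = {
    "3K": {"analyze", "evident"},
    "4K": {"bias", "scarce"},
    "5K": {"scrutinize", "mitigate"},
}


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def candidate_targets(reading: str, draft: str, bands: dict[str, set[str]],
                      target_bands: tuple[str, ...] = ("4K", "5K")) -> dict[str, list[str]]:
    """Band words that occur in the reading but not yet in the student's draft."""
    reading_types = set(tokenize(reading))
    draft_types = set(tokenize(draft))
    targets = {}
    for band in target_bands:
        found = (reading_types & bands[band]) - draft_types
        targets[band] = sorted(found)  # sorted for a stable, readable list
    return targets


reading = "Researchers scrutinize the data to mitigate bias when resources are scarce."
draft = "We must mitigate the problem."
print(candidate_targets(reading, draft, BANDS))
# {'4K': ['bias', 'scarce'], '5K': ['scrutinize']}
```

"Mitigate" is omitted because the student already uses it; the remaining words are candidates for receptive and productive practice rather than a definitive teaching list.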


[1] Despite well-known methodological challenges in calculating lexical diversity, more recent and sophisticated measures (e.g., vocD, MTLD) have confirmed the results of previous research.

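[2] According to Coxhead and Byrd (2007), 80% of the AWL is Greco-Latin in origin. Because Spanish is a Romance language, many AWL words have Spanish cognates, which may make them accessible to these learners regardless of their writing proficiency. This is one possible reason that the use of AWL word types did not account for variance in writing quality scores among this group of Spanish-speaking L2 writers.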

