January 2013
Bringing Corpora Into Teacher Education
Roger W. Gee, Holy Family University, Pennsylvania, USA

In the master’s in TESOL and literacy program at Holy Family University in Philadelphia, in the United States, class activities and assignments with corpora are embedded throughout the coursework. This article describes a class activity to prepare materials for a grammar exercise suggested in a methods text. Rather than making up material for the exercise, the class used authentic language from the Corpus of Contemporary American English (COCA). COCA contains 450 million words from a variety of genres, is easy to search, and is free.

In his methods text for teaching reading and writing, Nation (2009) suggests transformation techniques such as sentence combining to help learners understand and use patterns like too + adjective + to + stem (pp. 107–108). A class activity was developed to give the graduate students hands-on experience with COCA and to demonstrate how easily source material for teaching activities can be obtained from a corpus. After completing the activity described in this article, the graduate students worked in pairs and small groups to create their own grammar exercises for a reading passage using material found in COCA.

Searching COCA for “too [j*] to” yields a list of the most frequent phrases with that pattern. Because the search does not specify a verb stem, it is possible to pick just one adjective and create exercises with a variety of verb stems for that adjective. Below is a screenshot showing the search field. Note that the drop-down menu makes it easy to enter a part of speech.

Here is a list of the top 25 too + adjective + to phrases and their frequencies, out of 14,993 total occurrences:

In COCA, clicking on a phrase yields sentence-level context, which can be searched for suitable sentences. The first 10 items for the most frequent phrase, too late to, are the following:

From these 10 items, 5 sentences were selected as being appropriate to construct an exercise for intermediate or advanced learners. They were copied and pasted in a list with the original formatting.

it's not too late to book an inexpensive weekend getaway
It wasn't too late to turn back.
It's too late to get a babysitter.
it would be too late to turn back.
It's never too late to help an old friend

A text-only paste removed the online formatting, the sentences were numbered, and a sentence-combining exercise was created in which students only have to fill in blanks to complete the pattern. As seen below, the first pair of sentences was combined as an example, and the remaining four were prepared with blanks for some or all of the too + adjective + to + stem pattern. The sentences are presented in order of increasing difficulty.

  1. It's not too late to book an inexpensive weekend getaway.
    1. It’s not too late. We can book an inexpensive getaway.
    2. It’s not too late to book an inexpensive getaway.
  2. It wasn't too late to turn back.
    1. It wasn’t too late. We could turn back.
    2. It wasn’t too late _____ _____ back.
  3. It's too late to get a babysitter.
    1. It’s too late. We can’t get a babysitter.
    2. It’s too _____ _____ _____ a babysitter.
  4. It would be too late to turn back.
    1. It would be too late. We can’t turn back.
    2. It would be _____ _____ _____ _____ _____.
  5. It's never too late to help an old friend.
    1. It’s never too late. We can help an old friend.
    2. It’s never _____ _____ _____ _____ _____ _____ _____.

Note that in the exercise above, the original sentence was kept at the head of each item to make the items easier. To make the exercise more difficult, the original sentence could be deleted, as below:

  1. It’s never too late. We can help an old friend.
  2. It’s never _____ _____ _____ _____ _____ _____ _____.
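
For teachers who prepare many items, the blanking step could also be scripted. The following is a minimal Python sketch and was not part of the original class activity; the function name `blank_span` and its parameters are hypothetical, and it assumes that words are separated by spaces and that the teacher chooses which word positions to blank.

```python
import string


def blank_span(sentence, start, end, blank="_____"):
    """Replace words[start:end] of `sentence` with blanks.

    Trailing punctuation on a blanked word (such as the final period)
    is kept so the item still looks like a sentence.
    """
    words = sentence.split()
    result = []
    for index, word in enumerate(words):
        if start <= index < end:
            core = word.rstrip(string.punctuation)
            result.append(blank + word[len(core):])
        else:
            result.append(word)
    return " ".join(result)


# Word positions are counted from 0; here "late to get" is blanked.
print(blank_span("It's too late to get a babysitter.", 2, 5))
# It's too _____ _____ _____ a babysitter.
```

Widening the span, for example from the word after "be" to the end of the sentence, produces the harder items in which the whole pattern is blanked.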

The vocabulary of the exercises was not difficult. Using the Word and Phrase feature of COCA, it was found that 77% of the words in these sentences were among the 500 most frequent words in COCA, weekend was in the 501–3,000 band, and only inexpensive and getaway fell beyond the 3,000 most frequent words. There were no academic words. The frequent vocabulary would allow students to concentrate on the too + adjective + to + stem construction.
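
The arithmetic behind such a vocabulary profile is simple enough to sketch. The Python below is only an illustration of the calculation, not a description of how the Word and Phrase feature works internally; the function names are hypothetical, the ranks in `toy_ranks` are invented placeholders rather than COCA figures, and treating words missing from the table as belonging to the greater than 3,000 range is an assumption.

```python
from collections import Counter

BANDS = ("1-500", "501-3000", ">3000")


def band_of(rank):
    """Map a frequency rank to a band; unknown words count as >3000."""
    if rank is None or rank > 3000:
        return ">3000"
    return "1-500" if rank <= 500 else "501-3000"


def band_shares(tokens, ranks):
    """Return the share of tokens that falls into each frequency band."""
    if not tokens:
        return {band: 0.0 for band in BANDS}
    counts = Counter(band_of(ranks.get(token.lower())) for token in tokens)
    return {band: counts[band] / len(tokens) for band in BANDS}


# Placeholder ranks for illustration only; these are NOT COCA figures.
toy_ranks = {"it": 10, "is": 5, "too": 200, "late": 400,
             "to": 3, "get": 50, "a": 4}
tokens = "It is too late to get a babysitter".split()
print(band_shares(tokens, toy_ranks))
```

With a real rank list in place of the placeholder table, the same calculation would show at a glance whether a set of exercise sentences relies mostly on high-frequency vocabulary.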

Rather than use very frequent items, the teacher may wish to focus on specific vocabulary found in a reading text, but the principle remains the same: Identify a pattern, search COCA for suitable material, and use the material to construct exercises.

Activities and assignments like this one, in which teachers can see the utility of corpora as source material for teaching activities, help bring corpora into the classroom.


Roger W. Gee, PhD, is a professor at Holy Family University in Philadelphia, Pennsylvania (USA), where he is the director of the master’s in TESOL and literacy program.

