CALL Newsletter - March 2019 (Plain Text Version)


José Franco

Kara Mac Donald

This article seeks to highlight the importance of corpus linguistics as an increasingly influential procedure in the field of language analysis and also to promote the use of AntConc as an option to conduct precise and sophisticated language analyses with respect to two key elements: a) the creation of raw corpora from learners’ written production and b) the analysis of such corpora to determine and categorize errors and their causes to design more effective correction strategies and writing instruction.


According to Froehlich (2015), AntConc is a concordance software package for linguistic analysis of texts. It is self-contained; freely available for Windows, Mac OS, and Linux; and actively maintained by its creator, Laurence Anthony. It offers several important features, such as keyword in context view (KWIC), clusters, collocates, frequency word list, keyword list, and search operators. AntConc is also the third most used concordance software for corpus linguistics around the world. This software allows researchers and language teachers to create, develop, and analyze their own corpora. It gives them independence in terms of corpus content and source selection; they can build corpora from any type of text, including students’ written production.

AntConc and Corpus Linguistics

If we think of what corpus linguistics is today, we may answer that it is a relatively new field in language learning closely related to the use of computers and the internet to access huge collections of texts. Yet, it has been around for more than a century, considering that lexicographers and dictionary-makers have been collecting examples of language in use to help accurately define words since at least the late 19th century (Bennett, 2010). In fact, with the advent of computers, what has changed is the way we collect those samples of language, which has led to the construction of the large electronically stored corpora we are used to employing today. For the purpose of this article, learner corpora will be considered electronic collections of texts produced by English language learners (ELLs) upon which some general linguistic analysis can be conducted (Granger, 2008).

In relation to the analysis of corpora and learner corpora, AntConc has become one of the tools most widely employed by linguists and researchers to carry out language analyses. However, as the content of this article suggests, this tool is also employed by language teachers because of its features and utility in language analysis.

Learner Corpora and Error Analysis in Classroom Instruction

Learner corpora serve as a source of data for researchers and teachers, which is essential in the field of error analysis; no analysis can be made without a sample from a particular learner or group of learners. Thus, collecting samples of students’ written work to create a learner corpus is the first step any teacher needs to take toward using error analysis to enhance written error correction.

The identification and analysis of errors can reveal a plethora of factors that may affect students’ linguistic production so that teachers can design effective remediation strategies and pedagogical interventions to correct such errors. Granger (2008) states that learner corpora “can be used to develop pedagogical tools and methods that more accurately target the needs of language learners” (p. 1). Such needs can be understood as the gap learners need to fill to be competent writers/speakers. In relation to writing, the most common errors are related, among other factors, to spelling and misuse of prepositions.

Hence, learner corpora along with error analysis by means of AntConc can be viewed as an effective strategy to identify features in students’ written production that are not easily identified by merely correcting their written work. Meaningful error correction must go beyond marking errors simply to grade students’ linguistic performance. Through the use of AntConc, the possibility for teachers to improve and adapt their practice to align with students’ real needs remains open-ended.

The Creation of Raw Corpora From Students’ Written Production

Learner corpora contain all the characteristics of a corpus. However, the main difference between learner corpora and EFL corpora, according to Seidlhofer (as cited in Granger, 2008), “lies in the researchers’ orientation towards the data and the purposes they intend the corpora to serve” (p. 1). With respect to the latter aspect, I suggest the purposes of a corpus play a crucial role in its design and development; they often determine most of its characteristics (e.g., size). Yet, there is nothing for teachers to worry about if the corpus is limited in size and scope. What matters is that it represents the learners and serves the teacher’s purpose or purposes, as even a single learner corpus has diverse potential uses. Learner corpora with pedagogical purposes are generally small in comparison to other commercial or academic corpora available online. So much so that nowadays, teachers and researchers employ the term mini-corpora to refer to learner corpora, or claim that a particular research investigation is framed within a mini-corpus approach when employing learner corpora as a methodology to gather the required data from a particular group of learners or even from a single learner (Ragan, as cited in Granger, 2008; Gould, 2009). Therefore, all the written production of a particular class group can be collected during a semester or academic period to create a learner corpus or mini-corpus, which is generally analyzed as a whole; however, teachers can also analyze the written production of a learner or learners with specific characteristics or linguistic background.

Language naturalness, on the other hand, is another important feature of a corpus, and it can make gathering reliable data a great challenge for teachers and researchers. In this sense, Granger (2008) explains that learner production data can display diverse degrees of naturalness, which may rank very low in tasks such as reading aloud or fill-in-the-blanks, but may rank much higher in informal interviews or free compositions. Data reliability can also be affected by the intervention of automated spell/grammar checkers of word-processing programs if, for example, analysis is focused on the use of prepositions by foreign language learners.

The Employment of AntConc in the Analysis of Raw Learner Corpora

Working with learner corpora basically involves the following steps: a) collection of students’ data, b) conversion of the data into plain-text files (no tagging or annotation is needed in raw learner corpora), and c) analysis. For the purpose of this article, only the frequency word list, the keyword in context view (KWIC), and the collocates features will be referred to in this section. The word list feature shows, among other data, the total number of files that compose the corpus, the total number of word tokens, and the word types along with their respective frequencies. Function words are more likely than content words to show high frequencies within a corpus. The list also allows teachers to detect mistakes in spelling (see Figure 1).

Figure 1. Word list feature analysis.
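To make the word list statistics concrete, here is a minimal Python sketch of what such a list reports: number of files, word tokens, word types, and frequencies. It is not AntConc's own code. The file names, the sample sentences, and the simple tokenization rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: in practice each string would be read
# from one plain-text file of student writing.
sample_texts = {
    "student_01.txt": "I went to the school to study. I like to studi English.",
    "student_02.txt": "She want to go to the beach in the weekend.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(texts):
    """Return (number of files, token count, type count, frequency Counter)."""
    counts = Counter()
    for text in texts.values():
        counts.update(tokenize(text))
    return len(texts), sum(counts.values()), len(counts), counts

n_files, n_tokens, n_types, freq = word_list(sample_texts)
print("files:", n_files, "tokens:", n_tokens, "types:", n_types)

# Most frequent word types first, as in a frequency word list.
for word, count in freq.most_common(5):
    print(count, word)

# Word types that occur only once are a reasonable place to look for misspellings.
print(sorted(word for word, count in freq.items() if count == 1))
```

In this toy sample the function word to tops the list, while the misspelled form studi shows up among the types that occur only once.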

The concordance feature, in turn, highlights how particular words appear in context (see Figure 2). These results were displayed by clicking on to in the word list; they show all language patterns in which the preposition to is employed. This is a very useful feature to determine what specific errors are linked to the use of a search word in the corpus.

Figure 2. Results for to in the concordance feature.
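The idea behind a concordance view can be sketched in a few lines of Python. The function below is an illustrative assumption, not AntConc's implementation: it returns each occurrence of a search word together with a fixed number of words to its left and right. The sample learner sentences are invented.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of `keyword`."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Invented learner sentences containing the preposition "to".
sample = "She want to go to the beach. I am agree to help she."
lines = kwic(sample, "to")

# Align the search word in a central column, as concordancers usually do.
for left, hit, right in lines:
    print(f"{left:>20}  {hit}  {right}")
```

For simplicity this sketch ignores sentence boundaries, so a context window may run into the neighboring sentence.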

Words with a high likelihood of appearing before or after to are recognized by AntConc as collocates (see Figure 3). The collocates feature menu also displays the frequency of each such word partnership in the corpus. By clicking on a specific word or by searching for it, users can observe its collocates in the concordance line or lines.

Figure 3. Collocates for to.
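As a rough illustration of collocate counting, the Python sketch below tallies every word found within a small window to the left or right of the search word. This is a simplified assumption: it uses raw co-occurrence counts only and does not reproduce whatever statistical measure AntConc applies. The window size and sample text are hypothetical.

```python
import re
from collections import Counter

def collocates(text, node, span=2):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            # Words before the node plus words after it, excluding the node itself.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

sample = "I want to go to school. They want to go home. We like to go out."
coll = collocates(sample, "to", span=2)

for word, count in coll.most_common(3):
    print(count, word)
```

Because the window covers both sides of the search word and ignores sentence boundaries, a collocate such as go is counted whether it appears before or after to.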

The main difference between the cluster and the collocates features is that the former shows the words immediately adjacent to the search word, before or after it (see Figure 4), whereas the latter shows all the words whose frequency in the corpus implies a high likelihood of co-occurrence with the search word.

Figure 4. Clusters after the word to.
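A cluster can be thought of as a fixed sequence of adjacent words that includes the search word. The following Python sketch is an assumption for illustration, not AntConc's algorithm: it counts sequences of a given size that begin with the search word, which corresponds to clusters after the word to. The sample text is invented.

```python
import re
from collections import Counter

def clusters(text, node, size=2):
    """Count sequences of `size` adjacent words that start with `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        if tokens[i] == node:
            counts[" ".join(tokens[i:i + size])] += 1
    return counts

sample = "I want to go to school. They want to go home. We like to go out."
clus = clusters(sample, "to", size=2)

for cluster, count in clus.most_common():
    print(count, cluster)
```

Unlike the collocate count, which pools every word near the search word, this keeps word order and adjacency, so to go and to school are reported as distinct patterns.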


This article aimed to provide teachers and researchers with a starting point to exploit the potential of two valuable areas, language corpora and error correction, which can be complemented by means of AntConc to enhance language instruction. Little is known about language corpora; the field is still considered a new research area. Thus, teachers and researchers require new experiences in and considerable knowledge of the field to produce “useful and usable results” (Tono, 2003, p. 806). With this in mind, future research on language corpora should focus on essential aspects such as design principles, theoretical scaffolding, and data reliability.


Bennett, G. (2010). Using corpora in the language learning classroom. Ann Arbor, MI: University of Michigan Press.

Froehlich, H. (2015, June 19). Corpus analysis with AntConc. Programming Historian. Retrieved from

Gould, T. (2009). Assessing lexical production in NNS-NNS casual conversations: A mini-corpus approach. Sophia Junior College Faculty Journal, 29, 25–45.

Granger, S. (2008). Learner corpora. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 1, pp. 259–275). Berlin, Germany: Walter de Gruyter.

Tono, Y. (2003). Learner corpora: Design, development and applications. In D. Archer, P. Rayson, A. Wilson, & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 Conference (pp. 800–809). UCREL technical paper number 16. Lancashire, England: Lancaster University. Retrieved from


José Franco is an assistant professor at Universidad de Los Andes, in Trujillo State, Venezuela. He holds an MEd in TEFL. His main interests include ICT, corpus linguistics, and lexicon.

Kara Mac Donald is an associate professor at the Defense Language Institute in Monterey, CA. She conducts preservice and in-service faculty training and offers academic support to students. She earned a master’s in applied linguistics, TESOL, and a doctorate in applied linguistics.