March 2021
Ali Yaylali and Aleksey Novikov, University of Arizona, Tucson, AZ, USA
Hadi Banat, University of Massachusetts Boston, Boston, MA, USA

Data-Driven Learning (DDL)

Data-Driven Learning (DDL) is an inductive, corpus-based pedagogical approach that integrates corpora into classroom instruction (Johns, 1991). DDL is based on the principles of guided discovery, in which the teacher provides students with language data and guides them toward finding patterns for themselves.

The language data in DDL is typically presented in the form of concordance lines, which facilitate the search for language patterns in large amounts of data. Concordance lines are not complete sentences; rather, they show the searched word or phrase highlighted in the middle, a format often referred to as Key Word in Context (KWIC), along with the surrounding context of the word or phrase (see Figure 1). Concordance lines are generated with concordancers, search tools created for linguistic analysis that can be part of a corpus interface (e.g., Crow, COCA) or a standalone program (e.g., AntConc, Lancsbox). Concordancers help students search for a word or phrase in a corpus, allowing them to notice language patterns more easily. To further facilitate noticing patterns, concordancers usually have a sorting function that orders the concordance lines by the words before or after the key word or phrase.
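To make the idea of concordance lines and sorting concrete, here is a minimal sketch in Python. It is not how Crow, COCA, AntConc, or any other concordancer is actually implemented; the function names (concordance, sort_by_next_word, format_line) and the sample sentences are invented for illustration only.

```python
import re

def concordance(text, phrase, width=30):
    """Return KWIC hits as (left context, match, right context) tuples."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

def sort_by_next_word(lines):
    """Sort concordance lines by the first word after the key phrase."""
    def next_word(line):
        words = line[2].split()
        return words[0].lower() if words else ""
    return sorted(lines, key=next_word)

def format_line(line, width=30):
    """Align the key phrase in the middle of the line."""
    left, match, right = line
    return f"{left:>{width}} [{match}] {right:<{width}}"

# Invented sample text (not taken from any corpus).
sample = (
    "Students use many tools such as dictionaries and corpora. "
    "Some genres, such as narratives, rely on personal experience. "
    "Writers cite sources such as books to support a claim."
)

kwic = sort_by_next_word(concordance(sample, "such as"))
for line in kwic:
    print(format_line(line))
```

Sorting by the word to the right groups similar continuations together, which is what makes patterns after the key phrase easier to spot. Sorting by the word to the left would work the same way, using the last word of the left context as the sort key.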

These capabilities of DDL can make classroom learning more effective than more traditional approaches to language instruction (Smart, 2014). Boulton and Cobb (2017) analyzed 64 DDL studies, providing evidence for the effectiveness of DDL materials in language teaching. When learner performance was compared on pre- and post-tests, learners showed significant improvements after using DDL, especially in learning vocabulary and grammar.

Figure 1
Concordance Lines for Search Phrase “such as”

Corpus-Based Materials in K-12 Contexts

The use of corpus-based materials in language teaching is becoming more common. Despite the proliferation of corpora and the benefits of DDL, this pedagogical approach has not been widely used in K-12 settings. Boulton (2009) indicated that Corpus Linguistics (CL) researchers increasingly work in K-12 contexts, yet not much has been done in this area.

One reason for the underutilization of corpora in K-12 settings is that corpus-based methodologies have found their way into pre-service teacher education only recently (Braun, 2007). While teacher education programs might offer CL courses, it is equally important to consider logistical concerns such as the language teaching curriculum and standards. Since English Language Development (ELD) [1] teachers often prepare English learners (ELs) for state-mandated assessments, lack of time might be an ongoing concern. Teachers might also be discouraged by the fact that students need training before they can use corpus-based materials effectively. Since various constraints limit the use of corpora in the classroom, ELD teachers might appreciate materials custom-made for their teaching contexts. This article therefore introduces a corpus-based project that developed pedagogical materials for K-12 teachers.

High School Materials Based on Crow

The Corpus and Repository of Writing (Crow) is a corpus of L1 and L2 essays written in first-year composition courses and a repository of instructor materials (Velázquez et al., 2020). The Crow team develops corpus-based materials for college writing instructors. As part of the project’s goals, a team of scholars has recently started a collaborative initiative to develop materials for high school classrooms. These materials and others are freely accessible on the Crow for Teachers page.

The materials specifically designed for ELs introduce exemplification phrases (e.g., for instance) and their functions in two genres (literacy narratives and argumentative essays) to address ELs’ writing needs. The materials particularly address the repetitive use of exemplification phrases. These phrases were chosen since they stood out in written teacher feedback as key areas of writing development.

One of the authors of this article was an ELD teacher at a high school and invited other ELD teachers to collaborate on a professional development event that introduced corpus-based materials fitted to ELD teachers’ curriculum and objectives. The teachers’ genuine interest in exploring corpus pedagogy allowed the Crow team to start developing corpus-based materials. Because the pandemic required professional development activities to take place virtually, the Crow team designed these materials for virtual classroom contexts.

For this project, we first reviewed high school EL writing samples to understand the genres in which students write and the issues around L2 writing. The reviewed samples represented two school genres: narrative and persuasive writing. Specifically, ELs wrote narrative essays to share a life experience or event (e.g., their arrival in the US), while the persuasive essays allowed them to take a position on a topic such as eating off-campus vs. eating in the school cafeteria. In both genres, students provided examples for different reasons, but the repetitive use of for example was salient in many samples. Written teacher feedback had also suggested the use of alternative phrases in those writings. Based on the sample student writings and identified developmental needs, this materials development project focused on the functions of exemplification phrases in different genres to raise student awareness and expand the repertoire of those phrases in EL writing.

The first set of activities, titled Exemplification Strategies, starts with an activity that raises students’ awareness of purpose, audience, and genre. Definitions of these three terms are provided along with sample questions that teachers can use to gauge ELs’ knowledge of these terms and scaffold the remaining activities. To complete the assignment, students review two excerpts from the Crow essays (an argumentative essay and a literacy narrative) and identify the purpose, audience, and genre of those excerpts. Next, students view a frequency table that shows the differences in the counts of exemplification phrases in the two genres selected for practice. Since corpus-based materials foreground patterns and frequency in language use rather than prescriptive rules, ELD teachers can leverage this activity to introduce these aspects of corpus analysis. Finally, teachers use interactive concordance lines that allow for sorting texts by the words before and after the exemplification phrases. These concordance lines allow teachers to start a discussion and exploration around how these exemplification features are actually used in writing (i.e., in sentence-initial or middle position) and what kinds of words precede or follow them (e.g., nouns, gerund verb phrases). While the sample questions help teachers walk students through this material, teachers could also add their own questions to ensure student understanding of the materials.
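A frequency table of this kind can be approximated with a short script. The following is only a sketch under stated assumptions: the phrase list comes from the phrases named in this article, while the sample sentences, the variable names, and the phrase_counts function are invented for illustration and do not come from Crow or from the published materials.

```python
import re
from collections import Counter

# Exemplification phrases mentioned in this article.
PHRASES = ["for example", "for instance", "such as", "an example of"]

def phrase_counts(texts, phrases=PHRASES):
    """Count case-insensitive occurrences of each phrase across a list of texts."""
    counts = Counter({p: 0 for p in phrases})
    for text in texts:
        for p in phrases:
            pattern = r"\b" + re.escape(p) + r"\b"
            counts[p] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Invented sample sentences standing in for two genres (not real corpus data).
narratives = [
    "For example, my first day in the US was confusing.",
    "I missed small things such as my grandmother's tea.",
]
arguments = [
    "Schools should offer more options, for instance salads.",
    "For example, off-campus lunch saves time. Fast food, for example, is cheap.",
]

narrative_counts = phrase_counts(narratives)
argument_counts = phrase_counts(arguments)

# Print a simple two-column frequency table.
print(f"{'phrase':<15}{'narrative':>10}{'argument':>10}")
for p in PHRASES:
    print(f"{p:<15}{narrative_counts[p]:>10}{argument_counts[p]:>10}")
```

Raw counts are only comparable when the two sets of texts are similar in size. With real corpus data, it would be common to normalize the counts (for example, per 1,000 words) before comparing genres.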

The second set of activities, titled Exemplification Variety, addresses the repetitive use of exemplification phrases. These activities were inspired by the frequent use of for example and the rare instances of other possible phrases like for instance, such as, and an example of in the EL writings. These materials, therefore, walk students through alternative exemplification phrases used in writing. First, students review a frequency table similar to the one in the first set of activities. Next, they analyze the use of lowercase and capitalized versions of exemplification phrases in context. They compare the use of these phrases and decide whether they could be used interchangeably in different sentence positions. They are then asked to identify more examples and patterns of what comes before and after the exemplification phrase (e.g., a gerund verb clause after such as). During this activity, ELs are encouraged to have one sample of their own writing ready, since teachers can conclude this material with a discussion on how ELs actually use exemplification phrases and what similarities or differences they notice between their own writing and the L2 writing samples presented in the activity. This material also allows teachers to highlight punctuation use (e.g., a comma after for instance), since in some states, such as Arizona, ELs’ use of punctuation marks is taken into consideration in state assessments.
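The comparison of capitalized and lowercase forms can also be sketched in code. This example assumes that a capitalized form is a rough stand-in for sentence-initial position, which is a simplification; the case_split function and the sample sentences are invented for illustration and are not part of the Crow materials.

```python
import re

def case_split(texts, phrase):
    """Count capitalized vs. lowercase occurrences of a phrase (case-sensitive)."""
    lowercase = phrase.lower()
    capitalized = lowercase[0].upper() + lowercase[1:]
    counts = {capitalized: 0, lowercase: 0}
    for text in texts:
        for form in counts:
            pattern = r"\b" + re.escape(form) + r"\b"
            counts[form] += len(re.findall(pattern, text))
    return counts

# Invented sample sentences (not real student writing).
samples = [
    "For example, lunch lines are long.",
    "Some foods, for example pizza, sell out quickly.",
    "For instance, we wait outside.",
    "For example, I waited twenty minutes.",
]

split = case_split(samples, "for example")
print(split)
```

A teacher could run the same count on a class set of essays to see whether students place a phrase almost exclusively at the start of sentences before discussing the middle position.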

These materials demonstrate how an inductive process of language learning works. By encouraging ELs to independently analyze concordance lines, corpus-based materials promote experiential learning rather than depending too much on teacher-provided examples. Students are encouraged to take ownership of their learning and further their knowledge of exemplification features and linguistic patterns. These materials were presented at the 2020 AZTESOL State Conference and in a workshop for ELD teachers. We hope to have made it clear that ELs in K-12 could gain valuable language learning experiences by using corpus-based materials. Overall, this process involves collaborative efforts between ELD teachers and researchers who compile and use corpora for different purposes.

Next Steps for K-12 Teachers

K-12 teachers keen on incorporating corpus-based materials in their own teaching contexts can benefit from resources on the Crow webpage. By using these resources, K-12 teachers can further explore the benefits of using learner corpora in classrooms. They can try various DDL materials to raise student awareness of language patterns, thus allowing students to explore language use in various genres and registers. Since we are continuing to develop this resource, we are keen on building a relationship with K-12 teachers and collecting feedback that helps us understand their needs. This feedback will improve our mentoring of teachers on the design of DDL materials. To provide such feedback, please write to ( As Crow teachers and researchers, we are investing in outreach work. To stay informed about our upcoming professional development events, visit


Ali Yaylali is a Ph.D. candidate in the Teaching, Learning, and Sociocultural Studies Department at the University of Arizona. His current interests are L2 writing development, corpus pedagogy in language classrooms, disciplinary literacy development, and educational discourses. For more information about his work, please visit

Dr. Hadi Banat is an assistant professor of Rhetoric and Composition and the ESL Program Director at UMass Boston. For a detailed overview of his scholarly expertise, please visit,, and

Aleksey Novikov is a Ph.D. candidate in Second Language Acquisition and Teaching (SLAT) at the University of Arizona. His academic interests include register variation, L2 Russian syntactic and morphological complexity development, corpus-informed pedagogy and Data-driven Learning (DDL).

[1] This acronym stands for teachers of English learners in schools and it varies by state or school.
