What Does Corpus Linguistics Do?

Corpus linguistics encompasses the compilation and analysis of collections of spoken and written texts as the source of evidence for describing the nature, structure, and use of languages.

Defining Foreign-Language Ability

In the past several decades, researchers and language educators have devoted considerable attention to the definition of foreign-language ability. New perspectives from second- and foreign-language acquisition research, language teaching, and corpus linguistics have challenged prevailing views. Since the early 1980s, the dominant model of foreign-language ability has been based on the notion of communicative competence. This construct originated in a debate between the linguist Noam Chomsky and the anthropologist Dell Hymes during the mid-1960s. Chomsky defined linguistic competence in terms of an underlying, innate, and universal grammar, or set of principles capable of generating the structural properties of any human language. For Hymes, this view of language was too restrictive in that it did not take into consideration knowledge of the social aspects of language use as shared within a given speech community. When such knowledge is absent, according to Hymes, utterances may be grammatically correct but socially inappropriate. In coining the term communicative competence, Hymes (1972) conveyed the notion that knowledge of language, and of variation within languages, is closely tied to the social conditions of use in particular contexts, including features of the setting, characteristics of the participants, and norms for interpretation.

In the early 1980s, under pressure to generate more realistic and practical approaches to language teaching, theorists began to elaborate models of communicative competence intended for use by the language-education profession. The first comprehensive model was presented to the profession by Canale and Swain (1980) and revised by Canale (1983). This model included four interrelated components that have remained at the core of the construct through subsequent revisions (see Table 1).

Table 1. Components of communicative competence

Grammatical competence: knowledge of and ability to use the forms of the language (lexical items and rules of syntax, morphology, and phonology)

Sociolinguistic competence: ability to use the language appropriately in a variety of social settings

Discourse competence: ability to interpret and create spoken, written, or multimedia second-language texts that are cohesive (internally well structured) and coherent (interpretable and appropriate within their contexts)

Strategic competence: ability to compensate for lacunae in any of the other areas

A similar model, under development by van Ek (1986) and the Council of Europe, added to these four components a provision for sociocultural competence, or knowledge of the social components of situations where communication is likely to occur, and for social competence, a category encompassing the learner's motivation, attitude, and stance toward second-language communication. In subsequent years, the model was revised, most notably by Celce-Murcia et al. (1995), to place discourse competence squarely at the core, with the other components seen as support for the practice of socially situated language use.

Models of communicative competence have traditionally attempted to portray the abilities of expert language users within their own speech communities, that is, native speakers of their own first language. As noted above, the idealized native speaker has come under critical scrutiny due to the sociopolitical climate of foreign-language teaching in the late twentieth and early twenty-first centuries. In the 1990s, a number of foreign-language and second language acquisition (SLA) researchers also began to question the rational basis for proposing that first-language users should be held up as models for second-language speakers to emulate. Such views imply a deficit model of the second-language user, who can rarely achieve more than native-like competence in circumscribed contexts. Moreover, multilinguals, it is argued, possess capabilities different from, and in some ways greater than, those of monolinguals. For example, they exhibit greater metalinguistic awareness and more divergent thinking, but may require more time than monolinguals do when performing language-related cognitive tasks. These and other observations led Cook (1999) to conclude that users of more than one language possess multicompetence, wherein the development of second-language abilities interacts with first-language abilities to produce unique and complex communicative repertoires. The model for the language learner should therefore not be a native speaker of that language but instead a person possessing relevant multilingualism.

In the field of foreign-language teaching, researchers have also critiqued the notion that language learners should be viewed as aspiring native speakers. Rather, they should be construed as developing intercultural communicative competence, or the knowledge and abilities required for communication in situations where the language in question is either the dominant medium of interaction or the code shared among speakers of divergent primary languages. In Byram's (1989) model, intercultural communicative competence consists of: (1) a cognitive dimension, including knowledge of conventions for communicative activity in one's own and the other group; (2) an affective dimension, including empathy and understanding of others' perspectives; and (3) a behavioral dimension, including abilities such as tolerance for diverse communicative styles and the ability to initiate interactions and form interpersonal relationships.

A further challenge to the idealized native-speaker model and the view of language competence as consisting of an underlying sentence-level grammar has come not from second- or foreign-language theorists but rather from the robustly empirical investigation of language in use via corpus linguistics. Based on computerized collections of spoken and written texts, this research aims to reveal regularities and patterns in the documented language use of real speakers and writers, thus demonstrating, for example, that expert users are in command not only of individual words but also of the ways in which these words conventionally form collocations in discourse. Corpus-based analysis also reveals that the grammar of spoken language use should be described in terms of probabilistic statements based on empirical observation of discourse rather than as resulting from the application of deterministic rules (McCarthy, 1998).
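As a rough illustration of how such collocational patterns can be surfaced from a corpus, the following minimal Python sketch counts adjacent word pairs and scores them by pointwise mutual information (PMI), one common association measure. It is not drawn from the studies cited here: the four-sentence toy corpus, the function names, and the min_count threshold are all hypothetical, and a real study would use a large corpus with proper tokenization and significance testing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_pmi(texts, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Pairs seen fewer than min_count times are skipped, because PMI is
    unreliable for very rare events.
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in texts:
        tokens = tokenize(text)
        unigrams.update(tokens)
        # Pair each token with its right-hand neighbour in the same text.
        bigrams.update(zip(tokens, tokens[1:]))

    total_words = sum(unigrams.values())
    total_pairs = sum(bigrams.values())
    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_pairs
        p_first = unigrams[first] / total_words
        p_second = unigrams[second] / total_words
        # PMI compares the observed pair probability with what would be
        # expected if the two words occurred independently.
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Hypothetical toy corpus, invented purely for illustration.
TOY_CORPUS = [
    "She made a strong argument and drank strong tea.",
    "He likes strong tea in the morning.",
    "They made a decision after the heavy rain.",
    "Heavy rain fell while we made a decision.",
]

scores = bigram_pmi(TOY_CORPUS)
for pair in sorted(scores, key=scores.get, reverse=True):
    print(pair, round(scores[pair], 2))
```

Positive scores mark pairs that co-occur more often than chance would predict, which is the kind of probabilistic, observation-based statement about usage that corpus analysis supports. On a corpus this small the numbers carry no linguistic weight; they only show the mechanics.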

Taken together, efforts to revise and refine the field's core definitional construct correspond to a perceived need for realistic, rationally defined, and empirically based goals for language learners to inform both research and instruction. Over time, a shift has occurred, moving the field away from a view in which the abstract, sentence-level grammatical competence of native speakers is taken as the desired developmental end point of foreign-language learning. In its place is a perspective suggesting that the field's center stage should be occupied by the discourse of intercultural communication and the competence of multilinguals as revealed by empirical observation in the settings where foreign languages are used.