What Is The Meaning Of Corpus Linguistics?

Corpus linguistics is a methodology that involves computer-based empirical analyses (both quantitative and qualitative) of language use by employing large, electronically available collections of naturally occurring spoken and written texts, so-called corpora.
The word "corpus", derived from the Latin word meaning "body", may be used to refer to any text in written or spoken form. However, in modern linguistics the term refers to large collections of texts that represent a sample of a particular variety or use of language(s) and that are presented in machine-readable form. Other definitions, broader or stricter, exist. See, for example, the definition in the book "Corpus Linguistics" by Tony McEnery and Andrew Wilson, or read more about different kinds of corpora in the Systematic Dictionary of Corpus Linguistics.

Computer-readable corpora can consist of raw text only, i.e. plain text with no additional information. Many corpora, however, have been provided with some kind of linguistic information, here called mark-up or annotation.
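The difference between raw and annotated text can be sketched in a few lines of Python. This is only an illustration: the sentence, the part-of-speech tags, the lemmas, and the helper names tokens_with_tag and to_inline_markup are all invented for the example, and real corpora use their own tag sets and annotation formats.

```python
# Raw corpus text: plain characters with no linguistic information attached.
raw_text = "The cats sat on the mat"

# A hypothetical annotated version of the same sentence. Each token carries
# a part-of-speech tag and a lemma; the tags are hand-assigned for illustration.
annotated = [
    ("The", "DET", "the"),
    ("cats", "NOUN", "cat"),
    ("sat", "VERB", "sit"),
    ("on", "ADP", "on"),
    ("the", "DET", "the"),
    ("mat", "NOUN", "mat"),
]


def tokens_with_tag(tokens, tag):
    """Return the word forms whose part-of-speech tag equals `tag`."""
    return [word for word, pos, _lemma in tokens if pos == tag]


def to_inline_markup(tokens):
    """Render the tokens in a simple word_TAG inline format."""
    return " ".join(f"{word}_{pos}" for word, pos, _lemma in tokens)


print(tokens_with_tag(annotated, "NOUN"))
print(to_inline_markup(annotated))
```

With the annotation in place, a question such as "which nouns occur in this text?" becomes a simple lookup, whereas the raw text alone offers nothing to search on except the word forms.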

Types of corpora

There are many different kinds of corpora. They can contain written or spoken (transcribed) language, modern or old texts, and texts from one language or from several languages. The texts can be whole books, newspapers, journals, speeches, etc., or consist of extracts of varying length. The kinds of texts included, and the way different texts are combined, vary between corpora and corpus types.

'General corpora' consist of general texts, that is, texts that do not belong to a single text type, subject field, or register. An example of a general corpus is the British National Corpus. Some corpora contain texts that are sampled (chosen) from a particular variety of a language, for example from a particular dialect or from a particular subject area. These corpora are sometimes called 'Sublanguage Corpora'.

Corpora can consist of texts in one language (or language variety) only or of texts in more than one language. If the texts are the same in all languages, i.e. translations, the corpus is called a parallel corpus. A comparable corpus is a collection of "similar" texts in different languages or language varieties.
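As a rough illustration of what "the same texts in all languages" means in practice, a parallel corpus can be modelled as sentences aligned with their translations. The sketch below assumes a one-to-one sentence alignment, which real parallel corpora do not always have; the example sentences and the function names align and find_translations are invented for the example.

```python
# Hypothetical two-language parallel corpus: sentence i in `english`
# is assumed to be the translation of sentence i in `german`.
english = ["The house is small.", "I like the house."]
german = ["Das Haus ist klein.", "Ich mag das Haus."]


def align(source, target):
    """Pair up sentences by position, assuming a one-to-one alignment."""
    if len(source) != len(target):
        raise ValueError("both sides must have the same number of sentences")
    return list(zip(source, target))


def find_translations(pairs, word):
    """Return the aligned pairs whose source sentence contains `word`.

    Matching is case-insensitive and works on whole tokens after
    stripping basic punctuation.
    """
    word = word.lower()
    hits = []
    for src, tgt in pairs:
        tokens = [tok.strip(".,!?").lower() for tok in src.split()]
        if word in tokens:
            hits.append((src, tgt))
    return hits


pairs = align(english, german)
for src, tgt in find_translations(pairs, "house"):
    print(src, "|", tgt)
```

A comparable corpus would not support this kind of lookup, because its texts are similar in type or topic but are not translations of one another.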


Corpora serve as the basis for a number of research tasks within the field of Corpus Linguistics.
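Two common starting points for such tasks are a word frequency list (a quantitative view) and a keyword-in-context concordance (a view that supports qualitative reading). The following minimal sketch uses only the Python standard library; the two-sentence sample, the simple tokenizer, and the function names are assumptions made for the example and do not reflect any particular corpus tool.

```python
import re
from collections import Counter

# A tiny made-up sample standing in for a real corpus.
sample_corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_list(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def kwic(texts, keyword, width=2):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of tokens kept on each side of the keyword.
    """
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


print(frequency_list(sample_corpus).most_common(3))
for left, key, right in kwic(sample_corpus, "cat"):
    print(f"{left:>12} [{key}] {right}")
```

On a corpus of realistic size the same two operations give counts that can be compared across text types, and concordance lines that can be read and classified by hand.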