Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text. The approach runs counter to Noam Chomsky's view that real language is riddled with performance-related errors and therefore requires careful analysis of small speech samples obtained in a highly controlled laboratory setting. Corpus linguistics does away with Chomsky's competence/performance split, holding that language can only be analysed reliably when the investigator does not interfere.
In some areas there is an overlap with computational linguistics, as the latter moves towards language processing applications. This means dealing with real input data, where descriptions based on the linguist's intuition are not usually helpful.
A landmark in modern corpus linguistics was the publication by Henry Kucera and Nelson Francis of Computational Analysis of Present-Day American English in 1967, a work based on the analysis of the Brown Corpus, a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources. Kucera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and varied work, combining elements of linguistics, language teaching, psychology, statistics, and sociology.
Shortly thereafter, Boston publisher Houghton-Mifflin approached Kucera to supply a million-word, three-line citation base for its new American Heritage Dictionary, the first dictionary to be compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used).
Other publishers followed suit. The British publisher Collins' COBUILD dictionaries, designed for users learning English as a foreign language, were compiled using the Bank of English.
The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), ACE (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English). Other corpora represent many languages, varieties and modes, and include the British National Corpus, a 100-million-word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities (Oxford and Lancaster) and the British Library. There is a project underway to produce an American National Corpus.
See also
corpus
concordance (KWIC)
collocation
keyword
lexical profile
machine translation
semantic prosody
translation memory