

Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text. This approach runs counter to Noam Chomsky's view that real language is riddled with performance-related errors, thus requiring careful analysis of small speech samples obtained in a highly controlled laboratory setting. Corpus linguistics does away with Chomsky's competence/performance split, maintaining that one can only ever reliably analyse language when the investigator does not interfere.

In some areas there is an overlap with computational linguistics, as the latter moves towards language processing applications. This means dealing with real input data, where descriptions based on the linguist's intuition are not usually helpful.

A landmark in modern corpus linguistics was the publication by Henry Kucera and Nelson Francis of Computational Analysis of Present-Day American English in 1967, a work based on the analysis of the Brown Corpus, a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources. Kucera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and varied work, combining elements of linguistics, language teaching, psychology, statistics, and sociology.

Shortly thereafter, Boston publisher Houghton-Mifflin approached Kucera to supply a million-word, three-line citation base for its new American Heritage Dictionary, the first dictionary to be compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used).

Other publishers followed suit. The British publisher Collins' COBUILD dictionaries, designed for users learning English as a foreign language, were compiled using the Bank of English.

The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), ACE (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English). Other corpora represent many languages, varieties and modes, and include the British National Corpus, a 100 million word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities (Oxford and Lancaster) and the British Library. There is a project afoot to produce an American National Corpus.

See also

corpus, concordance (KWIC), collocation, keyword, lexical profile, machine translation, semantic prosody, translation memory

International Journal of Corpus Linguistics
A journal published twice a year, presenting articles from linguists, lexicographers and language engineers. Contents, abstracts, submission information.

A Logical Approach to Computational Corpus Linguistics
A 1996 thesis by Torbjörn Lager. Abstract available, as well as full text in PostScript and PDF formats.

Shallow Processing of Large Corpora Workshop 2003
Held at Lancaster University. Presented papers are available in PDF format.

Centre for Corpus Research
At the University of Birmingham, England. Information on programmes, research and available resources.

American National Corpus
Information about this massive database of American English in use, which is not accessible to the public.

Centre for English Corpus Linguistics
At the Catholic University of Louvain, this institute focuses on cross-linguistic corpora and learner corpora. Research, events, staff, publications.

Corpus Encoding Standard
Application of SGML to corpus encoding. Covers the standard and projects currently using it.

Hungarian National Corpus
More than 150 million Hungarian words, a model of Hungarian language of the 1990s. Free and extensive query system. [Hungarian, English]

Clitic climbing in electronic corpora
Thesis study by Kertes Gábor that analyses the phenomenon of clitic climbing or clitic promotion. [Parallel Spanish and English]

Corpus Linguistics
Online lessons intended to supplement the book by Tony McEnery and Andrew Wilson. Introductory information on the field.






© 2005 GeneralAnswers.org