Differences
This shows you the differences between two versions of the page.
en:start [2016/12/14 21:49] – vaclavcvrcek | en:start [2016/12/14 21:50] – [User support] vaclavcvrcek | ||
---|---|---|---|
Line 91: | Line 91: | ||
===== What is a corpus? ===== | ===== What is a corpus? ===== | ||
- | A language | + | A language corpus is an extensive collection of **authentic textual data** (written or spoken) converted to **electronic form** in a uniform format, so that it can easily be **searched** for various linguistic phenomena -- especially words and phrases (collocations). A corpus differs from a plain text archive or database primarily in that it has been carefully compiled with a research purpose in mind (for example, to represent contemporary spoken or written language, or a particular part of it, such as journalistic texts). A corpus displays linguistic phenomena in their **natural context**, which allows language research based on actual data on a scale that would previously have been unthinkable. |
Line 103: | Line 103: | ||
<WRAP center round box 64%> | <WRAP center round box 64%> | ||
- | [[en: | + | [[en: |
</ | </ |