Differences
This shows you the differences between two versions of the page.
| en:pojmy:lexikalni_bohatost [2024/09/30 18:17] – [Lexical Diversity] alexandrrosen | en:pojmy:lexikalni_bohatost [2024/10/18 21:07] (current) – [References] alexandrrosen | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| * InterCorp release 16ud is annotated with the following two measures of lexical diversity. They are specified as metadata for each text of sufficient length, for each linguistically annotated language: | * InterCorp release 16ud is annotated with the following two measures of lexical diversity. They are specified as metadata for each text of sufficient length, for each linguistically annotated language: | ||
| - | * '' | + | * **lexDivWord**: average number of different word forms per 1000 tokens |
| - | * '' | + | * **lexDivLemma**: average number of different lemmas per 1000 tokens |
| - | * The measures are based on the type-token ratio. They show the average number of different types (word forms or lemmas) in a moving window of 1000 tokens. Punctuation is ignored. | + | * The measures are based on the type-token ratio metrics. They show the average number of different types (word forms or lemmas) in a moving window of 1000 tokens. Punctuation is ignored. |
| * If the text has fewer than 1000 tokens, the measures are not defined and the value of both attributes equals the underscore character ('' | * If the text has fewer than 1000 tokens, the measures are not defined and the value of both attributes equals the underscore character ('' | ||
| * For languages which are not linguistically annotated, only the measure counting word forms ('' | * For languages which are not linguistically annotated, only the measure counting word forms ('' | ||
| * In KonText, they can be displayed and queried like any other metadata items about a text, such as author or text ID. | * In KonText, they can be displayed and queried like any other metadata items about a text, such as author or text ID. | ||
| - | * | + | * Average values for all combinations of a language and a text type in InterCorp v16ud are shown in the table on [[https:// |
| * See also [[en: | * See also [[en: | ||
| Line 18: | Line 18: | ||
| ===== References ===== | ===== References ===== | ||
| - | [[https://docs.google.com/document/d/1nSPzyhT6oHKUDN8A_uYmWrZH6tAmxTH_pUMOdjg01Eg/edit? | + | Alexandr Rosen (2024): Lexical and syntactic variability |
| + | of languages and text genres – a corpus-based study. | ||
| - | [[https:// | + | |
| + | Alexandr Rosen (2024). | ||
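
The moving-window measure described under Lexical Diversity (the average number of different types in a window of 1000 tokens, with punctuation ignored and an underscore for texts that are too short) can be illustrated with a minimal Python sketch. This is not the InterCorp implementation: the function names `lex_div` and `is_punctuation`, the window step of one token, the removal of punctuation before windowing, and the case-sensitive comparison of types are all assumptions made for illustration.

```python
from collections import Counter
import unicodedata


def is_punctuation(token):
    """Hypothetical helper: True if every character is Unicode punctuation."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def lex_div(tokens, window=1000):
    """Average number of distinct types over all moving windows.

    `tokens` is a list of word forms (for a lexDivWord-style value) or
    lemmas (for a lexDivLemma-style value). Returns "_" when fewer than
    `window` non-punctuation tokens are available.
    Assumption: the window advances one token at a time.
    """
    items = [t for t in tokens if not is_punctuation(t)]
    if len(items) < window:
        return "_"

    # Type counts for the first window.
    counts = Counter(items[:window])
    total = len(counts)
    n_windows = 1

    # Slide the window: drop the outgoing token, add the incoming one.
    for i in range(window, len(items)):
        outgoing = items[i - window]
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
        counts[items[i]] += 1
        total += len(counts)
        n_windows += 1

    return total / n_windows
```

Passing a list of word forms gives a value comparable in spirit to **lexDivWord**, and passing lemmas gives one comparable to **lexDivLemma**. The incremental update keeps the cost linear in the text length instead of recounting every window from scratch.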