Among linguistic units, the word is probably the most prominent object of study in terms of the number of quantitative studies published, despite the difficulty of defining it satisfactorily. One of the most common ways to operationalise the word as a unit of investigation is the typographical one: any string of letters delimited by spaces or punctuation marks is counted as a word. This definition is, of course, the easiest to apply but also the least meaningful from a linguistic point of view. For every kind of study, the researcher has to decide which of the possible definitions of the word will best serve the given purpose. The decision will also depend on factors such as which properties of the word are to be measured. If, for example, word length is the focus of interest, a different definition may be appropriate than if part-of-speech distribution or polysemy is investigated.
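
The typographical definition can be illustrated with a minimal Python sketch. It is only one possible reading of that definition: the function name `typographical_words` and the regular expression are illustrative assumptions, and "letter" is approximated here as any Unicode word character that is neither a decimal digit nor an underscore.

```python
import re

# Assumption: a "letter" is any Unicode word character except decimal
# digits and the underscore; every other character acts as a delimiter.
WORD_PATTERN = re.compile(r"[^\W\d_]+")


def typographical_words(text):
    """Return the maximal runs of letters found in text, in order."""
    return WORD_PATTERN.findall(text)


sample = "Don't panic: the well-known word-count isn't unique."
print(typographical_words(sample))
# ['Don', 't', 'panic', 'the', 'well', 'known', 'word', 'count', 'isn', 't', 'unique']
```

The output shows why this operationalisation is linguistically weak: contractions and hyphenated compounds are split into fragments such as `t` or `isn`, which a morphologically or lexically motivated definition would treat differently.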

The following properties of words (as well as their distributions and interrelations) have been studied in quantitative linguistics so far: frequency, length, polysemy, homonymy, age, polytextuality, level of semantic generality/specificity, motivation (compositionality of meaning), origin, ...
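
Three of these properties can be measured directly from raw text, as the following hedged Python sketch suggests. Several choices in it are assumptions made for illustration rather than anything prescribed above: words are delimited typographically and lower-cased, length is counted in letters (syllables or morphemes are equally possible units), polytextuality is taken to be the number of texts in which a word occurs, and the helper names `tokenize` and `word_properties` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-cased runs of letters (typographical words)."""
    return [w.lower() for w in re.findall(r"[^\W\d_]+", text)]


def word_properties(texts):
    """Map each word to its frequency, length and polytextuality."""
    frequency = Counter()       # total number of occurrences in the corpus
    polytextuality = Counter()  # number of texts containing the word
    for text in texts:
        tokens = tokenize(text)
        frequency.update(tokens)
        polytextuality.update(set(tokens))  # count each text only once
    return {
        word: {
            "frequency": frequency[word],
            "length": len(word),  # assumption: length measured in letters
            "polytextuality": polytextuality[word],
        }
        for word in frequency
    }


# Toy corpus of two "texts", invented purely for illustration.
corpus = ["The cat sat on the mat.", "The dog sat."]
props = word_properties(corpus)
print(props["the"])  # {'frequency': 3, 'length': 3, 'polytextuality': 2}
print(props["cat"])  # {'frequency': 1, 'length': 3, 'polytextuality': 1}
```

Properties such as polysemy, age or origin cannot be read off the text in this way and would require external resources such as dictionaries.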

