1. Problem and history

“Reference” is any sign that joins a sentence to a preceding sentence. References make the text a compact entity and are the basis for evaluating text cohesion. Qualitative linguistics has an ample literature on the different kinds of references and their behavior. Sentences joined by a common reference are called hrebs.

The only law, known in the literature as “Hřebíček’s reference law”, originates from Hřebíček’s (1985) derivation. Altmann (1988: 81-85) merely proposed some further problems for investigation. The law was corroborated on many Turkish texts. Formula (4) supports Herdan’s version of the type-token ratio ($\rightarrow$).

2. Hypothesis

The number of references in text depends on the number of words and the number of sentences.

“Word” is every word-like entity (token) in the text. “Sentence” in written texts is demarcated by orthographic signs.
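The counting conventions above can be illustrated with a minimal Python sketch. The function name `text_counts`, the regular expressions, and the choice of `.`, `!` and `?` as sentence-final signs are assumptions made for illustration only; the source does not prescribe a tokenizer, and other orthographies may need different rules.

```python
import re

def text_counts(text):
    """Return (n, v, s): word tokens, word types, and sentences in `text`."""
    # Assumption: a word-like entity is a run of letters or digits,
    # optionally containing internal apostrophes; case is ignored.
    tokens = re.findall(r"[^\W_]+(?:'[^\W_]+)*", text.lower())
    # Assumption: a sentence is closed by one or more of the signs . ! ?
    sentences = [part for part in re.split(r"[.!?]+", text) if part.strip()]
    return len(tokens), len(set(tokens)), len(sentences)

n, v, s = text_counts("The cat sat. The cat slept! Did the dog sleep?")
print(n, v, s)
```

Counting references themselves requires linguistic annotation and is not attempted here.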

3. Derivation

Let

$r$ = number of references in text

$s$ = number of sentences in text

$n$ = number of word tokens in text

$v$ = number of word types in text (vocabulary of the text)

$w$ = vocabulary richness

$a, b, c, A, B$ = constants

Hřebíček’s assumptions:

(i) the richer the vocabulary, the smaller the number of references,

(ii) the more sentences in the text, the greater the number of references.

The change in the number of references relative to the change in the vocabulary richness is proportional to the number of sentences,

$\frac{\partial r}{\partial w} = As$,

and, at the same time, the change in the number of references relative to the change in the number of sentences is proportional to the vocabulary richness of the text,

$\frac{\partial r}{\partial s} = Bw$.

This yields the following solution:

(1) $r = csw \quad (c = AB)$.

Taking the simplest interpretation of vocabulary richness $w$ as

$w = \frac{v}{n}$,

we obtain

(2) $r = cs\frac{v}{n}$.

Using Herdan’s (1966: 76) type-token ratio ($\rightarrow$) to express the vocabulary of the text as a power function of its length,

(3) $v = n^a, \quad 0 < a < 1$,

and inserting this in (2), one obtains

(4) $r = csn^{a-1}$,

which meets assumptions (i) and (ii).
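The substitution step can be checked numerically with a short Python sketch. It assumes formula (2) in the form r = c·s·v/n and formula (4) in the form r = c·s·n^(a-1), with v = n^a as in (3). The function names and all numerical values of c, a, n and s are hypothetical and chosen only for illustration; they are not estimates from any text.

```python
import math

def references_from_ttr(c, s, v, n):
    """Formula (2): r = c * s * v / n."""
    return c * s * v / n

def references_from_length(c, s, n, a):
    """Formula (4): r = c * s * n**(a - 1), valid for 0 < a < 1."""
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    return c * s * n ** (a - 1)

# Hypothetical parameter values, for illustration only.
c, a = 1.5, 0.8
n, s = 1000, 60

v = n ** a  # formula (3): vocabulary as a power function of text length
r2 = references_from_ttr(c, s, v, n)
r4 = references_from_length(c, s, n, a)

# Both routes give the same number of references (up to rounding).
print(math.isclose(r2, r4))
```

Because a - 1 is negative, the value from (4) grows with s and, for fixed c and s, shrinks as n grows.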

Example: The course of references in a Turkish text

Hřebíček (1992) examined the course of references in several Turkish texts. One of these cases is shown in Table 1.

4. Authors: U. Strauss, G. Altmann

5. References

