# Word associations

1. Problem and history

When a word is given as a stimulus, different persons respond with different words, e.g. the stimulus “music” can evoke “violin”, “Chopin”, “love”, “melody”, etc. If many persons are asked, one observes that the particular responses (associations) do not occur with equal frequency; on the contrary, the response words can be ranked according to their frequency. The problem is to find the adequate ranking (rank $\rightarrow$ frequency) distribution. Associations thus pose both a ranking and a diversification problem.
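Building the ranking described above amounts to counting identical responses and sorting them by descending frequency. The following minimal Python sketch shows this. The response list is hypothetical toy data, not taken from any of the studies cited here, and the function name `rank_frequency` is illustrative. Ties are broken by first occurrence, which is an arbitrary choice.

```python
from collections import Counter

def rank_frequency(responses):
    """Return (rank, response, frequency) triples, most frequent response first."""
    counts = Counter(responses)
    # most_common() sorts by descending count; equal counts keep first-seen order
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]

# Hypothetical responses to the stimulus "music" (toy data, not real norms)
responses = ["melody", "violin", "melody", "love", "Chopin", "melody", "violin"]
for rank, word, freq in rank_frequency(responses):
    print(rank, word, freq)
```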

Leaving aside qualitative work and the compilation of frequency lists, which mostly take the form of voluminous books, the first attempt at modelling was probably made by Horvath (1963), who used the Yule distribution inductively. Haight (1966) compared the Borel, the Yule, and the logarithmic distributions with a distribution he derived for this purpose, now called the Haight-zeta distribution (cf. Wimmer, Altmann 1999), but none of them yielded adequate results. Haight and Jones (1974) as well as Lánský and Radil-Weiss (1980) tried another approach but attained good results in only about 50% of cases. Dolinskij (1988, 1994) used the Zipf-Alekseev distribution, a generalization of the Zipf distribution, for this purpose. Altmann (1992) showed that the deviations from this distribution are extremely small (P ≈ 1 in almost all cases), while the theoretical foundation of this distribution was established by Hřebíček (1995, 1996, 1997). Altmann (1992) also used the usual proportionality approach with speaker-hearer balance and obtained the negative binomial distribution.

2. Hypothesis

The ranking of word associations abides by a regular ranking distribution.

3. Derivation

3.1. The Zipf-Alekseev model (Hřebíček 1997: 43)

Hřebíček starts from two assumptions:

(i) The frequency of an association $f_x$ at any rank $x$ is proportional to the frequency at the first rank $f_1$.

(ii) The frequency $f_x$ at rank $x$ is inversely proportional to the rank $x$.

Putting this together, we obtain

$\frac{f_x}{f_1} \approx \frac{1}{x}$

or, in logarithmic form

$\ln (f_1/f_x) \approx \ln x$
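Assumptions (i) and (ii) can be checked roughly on observed data: if $f_x \approx f_1/x$, the points $(\ln x, \ln(f_1/f_x))$ lie close to a line through the origin with slope 1. The sketch below, with the hypothetical helper `log_ratio_slope`, estimates that slope by least squares through the origin. It is a diagnostic assumed here for illustration, not a procedure taken from the sources cited.

```python
import math

def log_ratio_slope(freqs):
    """Least-squares slope through the origin of ln(f_1/f_x) on ln x.

    freqs: positive frequencies ordered by rank (freqs[0] is f_1); needs at
    least two ranks. A slope near 1 agrees with f_x ≈ f_1 / x.
    """
    f1 = freqs[0]
    xs = [math.log(x) for x in range(1, len(freqs) + 1)]
    ys = [math.log(f1 / f) for f in freqs]
    # Rank 1 gives the point (0, 0) and so does not influence the slope.
    return sum(u * v for u, v in zip(xs, ys)) / sum(u * u for u in xs)

print(log_ratio_slope([120, 60, 40, 30]))
```

For frequencies that follow $f_1/x$ exactly, such as `[120, 60, 40, 30]`, the slope is 1 up to rounding.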

Passing from frequencies $f_x$ to probabilities $P_x$, the proportionality is given by Menzerath's law, i.e.

(1) $\ln(P_1/P_x) = \ln(Ax^b)\,\ln x$ .

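Writing $a = \ln A$, the right-hand side becomes $(a + b\ln x)\ln x$, so that $P_1/P_x = x^{a+b\ln x}$. Solving for $P_x$ yields

(2) $P_x = P_1\, x^{-(a+b\ln x)}$ .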
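Because Eq. (2) becomes linear in $a$ and $b$ after taking logarithms, $\ln(P_1/P_x) = a\ln x + b(\ln x)^2$, the two parameters can be estimated by ordinary least squares without an intercept. The following Python sketch solves the 2×2 normal equations. This estimation method is an assumption made for illustration; the cited works may have used a different fitting procedure, and `fit_zipf_alekseev` is a hypothetical name. Absolute frequencies can be used in place of probabilities, since only the ratios $f_1/f_x$ enter.

```python
import math

def fit_zipf_alekseev(freqs):
    """Least-squares estimates of a, b in ln(f_1/f_x) = a ln x + b (ln x)^2.

    freqs: positive frequencies ordered by rank (freqs[0] is f_1); needs at
    least three ranks so that the normal equations are non-singular.
    """
    f1 = freqs[0]
    us = [math.log(x) for x in range(1, len(freqs) + 1)]
    ys = [math.log(f1 / f) for f in freqs]
    s2 = sum(u**2 for u in us)
    s3 = sum(u**3 for u in us)
    s4 = sum(u**4 for u in us)
    t1 = sum(u * y for u, y in zip(us, ys))
    t2 = sum(u**2 * y for u, y in zip(us, ys))
    # Solve [[s2, s3], [s3, s4]] @ [a, b] = [t1, t2] by Cramer's rule.
    det = s2 * s4 - s3 * s3
    a = (t1 * s4 - s3 * t2) / det
    b = (s2 * t2 - s3 * t1) / det
    return a, b

# Synthetic check: frequencies generated exactly from Eq. (2) with a=0.4, b=0.3
synthetic = [1000 * x ** -(0.4 + 0.3 * math.log(x)) for x in range(1, 11)]
print(fit_zipf_alekseev(synthetic))
```

On these synthetic frequencies the sketch recovers $a = 0.4$ and $b = 0.3$ up to rounding. With real data the fit is anchored at the observed rank-1 frequency, which is one reason to prefer the modified form below when $P_1$ stands out.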

Usually $P_1$ is so prominent that it is given a special value $\alpha$; modifying the distribution accordingly yields

(3) $P_x = \begin{cases} \alpha, & x = 1 \\ \dfrac{(1-\alpha)\,x^{-(a+b\ln x)}}{T}, & x = 2, 3, \ldots, n \end{cases}$ .

with $T = \sum_{j=2}^{n} j^{-(a+b\ln j)}$.
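Eq. (3) can be evaluated directly once $\alpha$, $a$, $b$ and $n$ are fixed. The sketch below returns $P_1, \ldots, P_n$. The parameter values in the usage line are arbitrary illustrations, not estimates from any data set, and the function name is hypothetical. By construction $P_1 = \alpha$ and the remaining mass $1 - \alpha$ is spread over ranks $2, \ldots, n$ through the normalizing constant $T$, so the probabilities sum to 1.

```python
import math

def modified_zipf_alekseev(alpha, a, b, n):
    """Return [P_1, ..., P_n] of the modified Zipf-Alekseev distribution, Eq. (3).

    Requires 0 <= alpha <= 1 and n >= 2.
    """
    def weight(j):
        # Unnormalized weight j^{-(a + b ln j)} for ranks j >= 2
        return j ** -(a + b * math.log(j))

    T = sum(weight(j) for j in range(2, n + 1))  # normalizing constant T
    return [alpha] + [(1 - alpha) * weight(x) / T for x in range(2, n + 1)]

probs = modified_zipf_alekseev(alpha=0.3, a=0.5, b=0.2, n=20)
print(probs[:3], sum(probs))
```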

3.2. The negative binomial model (Altmann 1992)

Assumption: The probability $P_x$ at rank $x$ is proportional to the probability $P_{x-1}$ at rank $x-1$, the coefficient of proportionality being $g(x) = (a+bx)/(cx)$. If ranking begins with $x = 1$, one solves the equation for the displaced form, i.e.

(4) $\dfrac{P_{x+1}}{P_x} = \dfrac{a+bx}{cx}, \quad x = 1, 2, \ldots$

and after reparametrization one obtains the 1-displaced negative binomial distribution

(5) $P_x = \binom{k+x-2}{x-1} p^k q^{x-1}, \quad x = 1, 2, \ldots$

where $a/b = k-1$, $b/c = q$ and $p = 1 - q$.
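Eq. (5) can be coded with the log-gamma function, which also covers non-integer $k > 0$ via $\binom{k+x-2}{x-1} = \Gamma(k+x-1)/(\Gamma(x)\Gamma(k))$. The sketch below (hypothetical name `nb_displaced_pmf`, arbitrary parameter values) also checks numerically that successive probabilities satisfy Eq. (4) with $b = 1$, $a = k-1$ and $c = 1/q$, which is one choice consistent with $a/b = k-1$ and $b/c = q$.

```python
import math

def nb_displaced_pmf(x, k, p):
    """P_x of the 1-displaced negative binomial distribution, Eq. (5).

    x = 1, 2, ...; k > 0; 0 < p < 1; q = 1 - p.
    """
    q = 1.0 - p
    # log of binomial(k+x-2, x-1) = Gamma(k+x-1) / (Gamma(x) * Gamma(k))
    log_binom = math.lgamma(k + x - 1) - math.lgamma(x) - math.lgamma(k)
    return math.exp(log_binom + k * math.log(p) + (x - 1) * math.log(q))

k, p = 2.5, 0.4
q = 1.0 - p
a, b, c = k - 1, 1.0, 1.0 / q  # satisfies a/b = k - 1 and b/c = q
for x in range(1, 6):
    ratio = nb_displaced_pmf(x + 1, k, p) / nb_displaced_pmf(x, k, p)
    print(x, ratio, (a + b * x) / (c * x))  # the two columns should agree
```

Working on the log scale avoids overflow of the gamma function for large ranks; only the final probability is exponentiated.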

Example: Associations of the word “high”

Altmann (1992) used the associations of the word “high” (4th grade, male) taken from Palermo and Jenkins (1964) and obtained the results presented in Table 1 and Fig. 1.

Both models fit these data excellently.

4. Authors: U. Strauss, G. Altmann, J. Eom

5. References:

Altmann, G. (1992). Two models for word association data. Glottometrika 13, 105-120.

Altmann, G., Bagheri, D., Goebl, H., Köhler, R., Prün, C. (2002). Einführung in die quantitative Lexikologie. Göttingen: Peust & Gutschmidt.

Dolinskij, V.A. (1988). Raspredelenie reakcij v eksperimentach po verbal´nym associacijam. Acta et Commentationes Universitatis Tartuensis 827, 80-101.

Dolinskij, V.A. (1994). Moscow Student's word associations. In: 2nd International Conference on Quantitative Linguistics, September 20-24, 1994, Moscow: 66-68. Moscow: Lomonosov Moscow State University.

Haight, F.A. (1966). Some statistical problems in connection with word association data. J. of Mathematical Psychology 3, 217-233.

Haight, F.A., Jones, R.B. (1974). A probabilistic treatment of qualitative data with special reference to word association tests. J. of Mathematical Psychology 11, 237-244.

Horvath, W.J. (1963). A stochastic model for word association tests. Psychological Review 70, 361-364.

Hřebíček, L. (1995). Text levels. Language constructs, constituents and Menzerath-Altmann law. Trier: WVT.

Hřebíček, L. (1996). Word associations and text. Glottometrika 15, 12-17.

Hřebíček, L. (1997). Lectures on text theory. Prague: Oriental Institute.

Lánský, P., Radil-Weiss, T. (1980). A generalization of the Yule-Simon model, with special reference to word association tests and neural cell assembly formation. J. of Mathematical Psychology 21, 53-65.

Palermo, D.S., Jenkins, J.J. (1964). Word association norms. Grade School through College. Minneapolis: University of Minnesota Press.

Wimmer, G., Altmann, G. (1999). Thesaurus of univariate discrete probability distributions. Essen: Stamm.