Word associations

Latest revision as of 11:34, 16 June 2009

1. Problem and history

When a word is given as a stimulus, different persons respond with different words; e.g. “music” as a stimulus can evoke “violin”, “Chopin”, “love”, “melody”, etc. If many persons are asked, one observes that the frequencies of the particular responses (associations) are not equal; on the contrary, the response words can be ranked according to their frequency. The problem is to find an adequate ranking distribution (→ Ranking distribution). Associations are thus both a ranking and a diversification problem.
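As a toy illustration of the ranking step, the sketch below counts responses to the stimulus “music” and assigns ranks by descending frequency. The response list is invented for illustration only and the function name `rank_responses` is hypothetical; ties are simply kept in first-seen order, which is one convention among several.

```python
from collections import Counter


def rank_responses(responses):
    """Return (rank, word, frequency) triples, most frequent response first."""
    counts = Counter(responses)
    # most_common() sorts by descending count; equal counts keep first-seen order.
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


# Hypothetical responses of eight persons to the stimulus "music".
responses = ["melody", "violin", "melody", "Chopin", "love",
             "melody", "violin", "Chopin"]

for rank, word, freq in rank_responses(responses):
    print(rank, word, freq)
# 1 melody 3
# 2 violin 2
# 3 Chopin 2
# 4 love 1
```

The frequencies in the last column, read in rank order, are what a ranking distribution such as (3) or (5) below is meant to model.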

Setting aside qualitative work and the compilation of frequency lists, which mostly take the form of voluminous books, the first attempt at modelling was probably made by Horvath (1963), who used the Yule distribution inductively. Haight (1966) compared the Borel, the Yule, and the logarithmic distributions with a distribution he derived for this purpose, now called the Haight-zeta distribution (cf. Wimmer, Altmann 1999), but none of them yielded adequate results. Haight and Jones (1974) as well as Lánský and Radil-Weiss (1980) tried another approach but attained good results in only about 50% of cases. Dolinskij (1988, 1994) used the Zipf-Alekseev distribution, a generalization of the Zipf distribution. Altmann (1992) showed that the deviations from this distribution are extremely small (goodness-of-fit probability P ≈ 1 in almost all cases), while its theoretical foundation was established by Hřebíček (1995, 1996, 1997). Altmann (1992) also used the usual proportionality approach with speaker-hearer balance and obtained the negative binomial distribution.


2. Hypothesis

The ranking of word associations abides by a regular ranking distribution.


3. Derivation


3.1. The Zipf-Alekseev model (Hřebíček 1997: 43)

Hřebíček starts from two assumptions:

(i) The frequency of an association f_x at any rank x is proportional to the frequency at the first rank f_1.

(ii) The frequency f_x at rank x is inversely proportional to the rank x.

Putting this together, we obtain

 f_x\approx f_1 \frac{1}{x}

or, in logarithmic form

\ln (f_1/f_x) \approx \ln x

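The proportionality is specified by Menzerath's law, i.e. \ln x is weighted by \ln(Ax^b); written for the probabilities P_x instead of the frequencies f_x, this gives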

(1) \ln(P_1/P_x) = \ln(Ax^b) \ln x \quad .

Solving for P_x yields

(2) P_x = P_1 x^{-(a+b\ln x)}, \quad a = \ln A.
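The step from (1) to (2) can be spelled out as follows, writing a = \ln A for the constant (a notation assumed here so that it matches the parameter a in (2) and (3)):

```latex
\begin{aligned}
\ln\frac{P_1}{P_x} &= \ln(Ax^b)\,\ln x = (\ln A + b\ln x)\,\ln x,\\
\frac{P_1}{P_x} &= \exp\bigl[(\ln A + b\ln x)\ln x\bigr] = x^{\ln A + b\ln x},\\
P_x &= P_1\, x^{-(a + b\ln x)}, \qquad a = \ln A .
\end{aligned}
```

With b = 0 and a = 1 this reduces to P_x = P_1/x, i.e. to the starting assumption f_x ≈ f_1/x.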

Usually P_1 is so prominent that it is given a special value α, i.e. the distribution is modified to

(3)  P_x = \begin{cases} \alpha, & x = 1 \\ \frac{(1-\alpha)x^{-(a+b \ln x)}}{T}, & x = 2,3,...,n \end{cases}.

with T = \sum_{j=2}^{n} j^{-(a+b\ln j)}, \quad a, b \in \mathbb{R}, \quad n \in \mathbb{N}, \quad 0 < \alpha < 1.
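A minimal sketch of (3) in Python, assuming a finite n ≥ 2 and the parameter names used in the formula. The function name and the parameter values in the usage lines are hypothetical and are not fitted to any association data.

```python
import math


def zipf_alekseev_modified(alpha, a, b, n):
    """Probabilities P_1..P_n of the modified Zipf-Alekseev distribution (3).

    P_1 = alpha; for x = 2..n, P_x = (1 - alpha) * x**(-(a + b*ln x)) / T,
    where T is the sum of j**(-(a + b*ln j)) over j = 2..n.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    if n < 2:
        raise ValueError("n must be at least 2")
    # Unnormalized tail weights j^{-(a + b ln j)} for j = 2..n.
    weights = [j ** (-(a + b * math.log(j))) for j in range(2, n + 1)]
    T = sum(weights)
    # The first rank gets alpha; the remaining mass 1 - alpha is shared by ranks 2..n.
    return [alpha] + [(1 - alpha) * w / T for w in weights]


# Hypothetical parameter values, chosen only to show the call.
probs = zipf_alekseev_modified(alpha=0.3, a=0.5, b=0.2, n=10)
print(round(sum(probs), 10))  # 1.0
```

Keeping α separate from the tail makes the first rank independent of a and b, which is exactly the point of the modification (3).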


3.2. The negative binomial model (Altmann 1992)

Assumption: The probability P_x at rank x is proportional to the probability P_{x-1} at rank x-1, the coefficient of proportionality being g(x) = (a+bx)/(cx). If ranking begins with x = 1, one solves the equation in its displaced form, i.e.

(4) P_{x+1} = \frac{a+bx}{cx}P_x, \quad x = 1,2,3,...
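The recurrence (4) can also be iterated numerically: start from an arbitrary value for P_1, multiply step by step by (a+bx)/(cx), and normalize at the end. This sketch assumes b, c > 0, b/c < 1 and a > -b, so that all ratios are positive and the sequence is summable; the infinite support is truncated at a hypothetical cutoff x_max, which must be large enough for the neglected tail to be negligible.

```python
def negbin_by_recurrence(a, b, c, x_max):
    """Iterate P_{x+1} = (a + b*x) / (c*x) * P_x from x = 1 and normalize.

    Returns the list [P_1, ..., P_{x_max}]; the tail beyond x_max is dropped.
    """
    if not (b > 0 and c > b and a > -b):
        raise ValueError("sketch assumes b > 0, b/c < 1 and a > -b")
    unnormalized = [1.0]  # P_1 up to a constant factor
    for x in range(1, x_max):
        # Ratio P_{x+1} / P_x from (4).
        unnormalized.append(unnormalized[-1] * (a + b * x) / (c * x))
    total = sum(unnormalized)
    return [p / total for p in unnormalized]


# Hypothetical parameters: a/b = 2 and b/c = 0.5.
probs = negbin_by_recurrence(a=1.0, b=0.5, c=1.0, x_max=200)
print(round(probs[0], 6))  # 0.125
```

Only the ratios a/b and b/c affect the result, since a common factor in a, b and c cancels in (4).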

and after reparametrization one obtains the 1-displaced negative binomial distribution

(5)  P_x = {k+x-2 \choose x-1}p^k q^{x-1}, \quad x=1,2,3,...

where a/b = k-1, b/c = q and p = 1-q, with k > 0 and 0 < q < 1.
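Solving (4) explicitly shows where (5) comes from. Iterating the recurrence from x = 1 and then normalizing (assuming k > 0, 0 < q < 1 and p = 1 - q) gives:

```latex
\begin{aligned}
P_x &= P_1 \prod_{j=1}^{x-1} \frac{a+bj}{cj}
     = P_1 \left(\frac{b}{c}\right)^{x-1} \prod_{j=1}^{x-1} \frac{a/b + j}{j}
     = P_1\, q^{x-1} \prod_{j=1}^{x-1} \frac{k-1+j}{j}
     = P_1 \binom{k+x-2}{x-1} q^{x-1},\\
1 &= \sum_{x=1}^{\infty} P_x = P_1 \sum_{x=1}^{\infty} \binom{k+x-2}{x-1} q^{x-1} = P_1 (1-q)^{-k}
\;\Longrightarrow\; P_1 = (1-q)^{k} = p^{k}.
\end{aligned}
```

The last step uses the negative binomial series \sum_{m \ge 0} \binom{k+m-1}{m} q^m = (1-q)^{-k}, valid for 0 < q < 1.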

Example: Associations of the word “high”

Altmann (1992) used the associations of the word “high” (4th grade, male) from Palermo and Jenkins (1964) and obtained the result presented in Table 1 and Fig. 1.

[Table 1]


[Fig. 1]

Both fits are excellent.


4. Authors: U. Strauss, G. Altmann, J. Eom


5. References:

Altmann, G. (1992). Two models for word association data. Glottometrika 13, 105-120.

Altmann, G., Bagheri, D., Goebl, H., Köhler, R., Prün, C. (2002). Einführung in die quantitative Lexikologie. Göttingen: Peust & Gutschmidt.

Dolinskij, V.A. (1988). Raspredelenie reakcij v eksperimentach po verbal´nym associacijam. Acta et Commentationes Universitatis Tartuensis 827, 80-101.

Dolinskij, V.A. (1994). Moscow Student's word associations. In: 2nd International Conference on Quantitative Linguistics, September 20-24, 1994, Moscow: 66-68. Moscow: Lomonosov Moscow State University.

Haight, F.A. (1966). Some statistical problems in connection with word association data. J. of Mathematical Psychology 3, 217-233.

Haight, F.A., Jones, R.B. (1974). A probabilistic treatment of qualitative data with special reference to word association tests. J. of Mathematical Psychology 11, 237-244.

Horvath, W.J. (1963). A stochastic model for word association tests. Psychological Review 70, 361-364.

Hřebíček, L. (1995). Text levels. Language constructs, constituents and Menzerath-Altmann law. Trier: WVT.

Hřebíček, L. (1996). Word associations and text. Glottometrika 15, 12-17.

Hřebíček, L. (1997). Lectures on text theory. Prague: Oriental Institute.

Lánský, P., Radil-Weiss, T. (1980). A generalization of the Yule-Simon model, with special reference to word association tests and neural cell assembly formation. J. of Mathematical Psychology 21, 53-65.

Palermo, D.S., Jenkins, J.J. (1964). Word association norms: Grade school through college. Minneapolis: University of Minnesota Press.

Wimmer, G., Altmann, G. (1999). Thesaurus of univariate discrete probability distributions. Essen: Stamm.