Difference between revisions of "Complexity of syntactic constructions"

Line 1: Line 1:
 
'''1. Problem and history'''
 
'''1. Problem and history'''
  
The complexity of a syntactic construct is measured in terms of the number of its immediate constituents.The partitioning in immediate constituents can be performed on the basis of any grammar.
+
The complexity of a syntactic construct is measured in terms of the number of its immediate constituents. The partitioning in immediate constituents can be performed on the basis of any grammar.
 
The first model seems to be that of Köhler and Altmann (2000).
 
The first model seems to be that of Köhler and Altmann (2000).
  
Line 20: Line 20:
 
I(K) – the size of inventory of constructions.  
 
I(K) – the size of inventory of constructions.  
  
Assumptions: The number of constructions with complexity x is proportional to that with complexity x-1. maxH increases the probability of a higher complexity, minX decreases it. E nad I(K) are inversely proportional to each other: the greater the inventory, the less complexity is needed.  
+
Assumptions: The number of constructions with complexity x+1 is proportional to that with complexity x. maxH increases the probability of a higher complexity, minX decreases it. E and I(K) are inversely proportional to each other: the greater the inventory, the less complexity is needed.  
 
Putting these assumptions together, we obtain
 
Putting these assumptions together, we obtain
  
Line 34: Line 34:
 
, \quad x=1,2,3...</math>
 
, \quad x=1,2,3...</math>
 
 
 
 
where<math> P_1^{-1}= _2 F_1 (k,1;m;q)</math>.
+
where<math> P_1^{-1}= _2F_1 (k,1;m;q)</math>.
  
 
'''Example''': Complexity of syntactic constructions in the Negra corpus (Brants 1999)
 
'''Example''': Complexity of syntactic constructions in the Negra corpus (Brants 1999)
Line 53: Line 53:
 
'''Brants, T'''. (1999). ''Tagging and parsing with cascaded Markov models. Automation of corpus annotation''. Saarbrücken: Universität der Saarlandes.
 
'''Brants, T'''. (1999). ''Tagging and parsing with cascaded Markov models. Automation of corpus annotation''. Saarbrücken: Universität der Saarlandes.
  
'''Köhler, R., Altmann, G.''' (2000). Probability distributions of syntactic units and properties. ''J. of Qwuantitative Linguistics 7, 189-200''.
+
'''Köhler, R., Altmann, G.''' (2000). Probability distributions of syntactic units and properties. ''J. of Quantitative Linguistics 7, 189-200''.

Revision as of 21:50, 14 November 2006

1. Problem and history

The complexity of a syntactic construct is measured in terms of the number of its immediate constituents. The partitioning in immediate constituents can be performed on the basis of any grammar. The first model seems to be that of Köhler and Altmann (2000).

2. Hypothesis

The complexity of syntactic constructions follows the hyper-Pascal distribution.

3. Derivation

The complexity depends on following quantities (Köhler, Altmann 2000:192):

minX – the requirement of minimization of the complexity of a syntactic construction in order to decrease memory effort in processing the construction;

maxH – the requirement of maximazing compactness. This enables us diminishing the complexity of the subordinated level of embedding by embedding constituents into the given level… minX on the level m corresponds to the requirement maxH on the level m+1;

E – a variable representing the average degree of fullness, the default value of complexity;

I(K) – the size of inventory of constructions.

Assumptions: The number of constructions with complexity x+1 is proportional to that with complexity x. maxH increases the probability of a higher complexity, minX decreases it. E and I(K) are inversely proportional to each other: the greater the inventory, the less complexity is needed. Putting these assumptions together, we obtain

(1) P_{x+1}= \frac{maxH + x}{minX + x} \frac{E}{I(K)}P_x, \quad x= 1, 2, ...

Setting maxH = k-1, minX = m-1 and E/I(K) = q yields

(2) P_{x+1} = \frac{k+x-1}{m+x-1}qP_x, \quad x=1, 2, ...

resulting in

(3) P_X = \frac{{k+x-2 \choose x-1}}{{m+x-2 \choose x-1}}q^{x-1}P_1
, \quad x=1,2,3...

where P_1^{-1}= _2F_1 (k,1;m;q).

Example: Complexity of syntactic constructions in the Negra corpus (Brants 1999) Köhler and Altmann (2000) fitted (3) to the complexity of syntactic constructions in the Negra corpus. The result is presented in Table 1 and Fig. 1.

Tabelle11 CoSC.jpg

Since the number of observations is too great, the use of the chi-square is problematic. The authors use the contingency coefficient C = X^2/N which is acceptable. It would be advisable to use single texts instead of corpora.

Grafik1 CoSC.jpg
Fig. 1. Distribution of syntactic complexity in the Negra corpus


4. Authors: G. Altmann

5. References

Brants, T. (1999). Tagging and parsing with cascaded Markov models. Automation of corpus annotation. Saarbrücken: Universität der Saarlandes.

Köhler, R., Altmann, G. (2000). Probability distributions of syntactic units and properties. J. of Quantitative Linguistics 7, 189-200.