Complexity of syntactic constructions

1. Problem and history

The complexity of a syntactic construction is measured by the number of its immediate constituents. The segmentation into immediate constituents can be performed on the basis of any grammar. The first model of this property appears to be that of Köhler and Altmann (2000).
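As an illustration of the measure, the following Python sketch counts the immediate constituents of every construction in a set of parse trees and tabulates how many constructions have each complexity x. It assumes a hypothetical tree representation in which a construction is a tuple `(label, child, ...)` and a word is a plain string; it does not read the actual Negra corpus format, and which nodes count as constructions depends on the grammar chosen.

```python
from collections import Counter

# Hypothetical tree format (not the Negra format): a construction is a tuple
# (label, child_1, ..., child_n); a word is a plain string.


def complexities(tree):
    """Yield the number of immediate constituents of each construction in the tree."""
    if isinstance(tree, str):  # a word is not a construction
        return
    _label, *children = tree
    yield len(children)  # complexity = number of immediate constituents
    for child in children:
        yield from complexities(child)


def complexity_distribution(trees):
    """Map each complexity x to the number of constructions having it."""
    counts = Counter()
    for tree in trees:
        counts.update(complexities(tree))
    return dict(sorted(counts.items()))


sentence = ("S", ("NP", "the", "dog"), ("VP", "barked"))
print(complexity_distribution([sentence]))  # {1: 1, 2: 2}
```

Here S and NP each have two immediate constituents and VP has one, so the toy sentence contributes two constructions of complexity 2 and one of complexity 1.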

2. Hypothesis

The complexity of syntactic constructions follows the hyper-Pascal distribution.

3. Derivation

The complexity depends on the following quantities (Köhler, Altmann 2000: 192):

minX – the requirement of minimization of the complexity of a syntactic construction in order to decrease memory effort in processing the construction;

maxH – the requirement of maximizing compactness. This makes it possible to diminish the complexity of the subordinate level of embedding by embedding constituents into the given level… minX on level m corresponds to the requirement maxH on level m+1;

E – a variable representing the average degree of fullness, the default value of complexity;

I(K) – the size of the inventory of constructions.

Assumptions: The number of constructions with complexity x+1 is proportional to the number of constructions with complexity x. maxH increases the probability of a higher complexity; minX decreases it. The greater E, the more complexity is needed to code the individual messages. On the other hand, the greater the inventory size I(K), the less complexity is needed. With these assumptions, we obtain

(1) $P_{x+1}= \frac{maxH + x}{minX + x} \frac{E}{I(K)}P_x, \quad x= 1, 2, ...$
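Recurrence (1) can be evaluated directly. The Python sketch below starts from an unnormalized P_1 = 1, applies (1) step by step and normalizes over a truncated range x = 1, ..., x_max. The function names, the parameter names `max_h`, `min_x`, `e` and `inventory` (standing for maxH, minX, E and I(K)), the numeric values and the truncation are illustrative assumptions. The sketch also assumes E/I(K) < 1: for large x the ratio in (1) tends to E/I(K), so with a smaller ratio the omitted tail is negligible.

```python
def recurrence_probabilities(max_h, min_x, e, inventory, x_max):
    """P_1, ..., P_x_max from recurrence (1), normalized over the truncated range.

    Assumes e / inventory < 1 so that the omitted tail (x > x_max) is negligible.
    """
    q = e / inventory  # the ratio E/I(K)
    weights = [1.0]  # unnormalized P_1
    for x in range(1, x_max):
        # (1): P_{x+1} = (maxH + x) / (minX + x) * E / I(K) * P_x
        weights.append(weights[-1] * (max_h + x) / (min_x + x) * q)
    total = sum(weights)
    return [w / total for w in weights]


def mean_complexity(probs):
    """Mean of x for probabilities listed in the order x = 1, 2, ..."""
    return sum(x * p for x, p in enumerate(probs, start=1))


# Hypothetical parameter values, chosen only for illustration.
low = recurrence_probabilities(max_h=1.0, min_x=1.0, e=0.5, inventory=1.0, x_max=200)
high = recurrence_probabilities(max_h=3.0, min_x=1.0, e=0.5, inventory=1.0, x_max=200)
print(mean_complexity(low), mean_complexity(high))
```

Raising `max_h` while keeping the other values fixed enlarges every ratio in (1), so the mean complexity of `high` exceeds that of `low`, in line with the assumption that maxH favours higher complexity.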

Within a given period of time, the ratio E/I(K) can be considered constant, say q. Setting E/I(K) = q, maxH = k-1 and minX = m-1 in (1), we get

(2) $P_{x+1} = \frac{k+x-1}{m+x-1}qP_x, \quad x=1, 2, ...$

resulting in

(3) $P_x = \frac{\binom{k+x-2}{x-1}}{\binom{m+x-2}{x-1}}\,q^{x-1}P_1, \quad x = 1, 2, \ldots$

where $P_1^{-1} = {}_2F_1(k,1;m;q)$ is the normalizing constant, ${}_2F_1$ being the Gaussian hypergeometric function.
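The step from (2) to (3) can be spelled out as follows. Applying (2) repeatedly, starting from $P_1$, gives a product of ratios (empty, hence equal to 1, for x = 1):

$P_x = P_1 \prod_{j=1}^{x-1} \frac{k+j-1}{m+j-1}\,q = \frac{k(k+1)\cdots(k+x-2)}{m(m+1)\cdots(m+x-2)}\,q^{x-1}P_1$

Since $\binom{k+x-2}{x-1} = k(k+1)\cdots(k+x-2)/(x-1)!$ and the factors $(x-1)!$ cancel in the quotient, this is (3). Writing $(a)_j = a(a+1)\cdots(a+j-1)$ for the rising factorial, setting $j = x-1$ and using $(1)_j = j!$, the probabilities must sum to 1:

$\sum_{x=1}^{\infty} P_x = P_1 \sum_{j=0}^{\infty} \frac{(k)_j\,(1)_j}{(m)_j\,j!}\,q^j = P_1\,{}_2F_1(k,1;m;q) = 1$

which yields the stated value of $P_1$.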

Example: Complexity of syntactic constructions in the Negra corpus (Brants 1999). Köhler and Altmann (2000) fitted (3) to these data. The result is presented in Table 1 and Fig. 1.

Since the number of observations is very large, the chi-square test is problematic: with a large N, even small deviations become significant. The authors therefore use the discrepancy coefficient $C = \chi^2/N$, which indicates an acceptable fit. It would be advisable to use single texts instead of corpora.
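The fit measure can be sketched in Python. The code evaluates (3), computing the normalizing series ${}_2F_1(k,1;m;q) = \sum_{j \ge 0} \frac{(k)_j}{(m)_j} q^j$ (which follows from $(1)_j = j!$), and then C = χ²/N for a list of observed frequencies. It assumes that k, m and q are already known (estimating them requires a separate optimization step, not shown), that k, m > 0 and 0 ≤ q < 1, and that every observed class is used as is; actual fits usually pool sparse tail classes, which this sketch does not do. All function names are hypothetical.

```python
import math


def hyp2f1_k1m(k, m, q, tol=1e-15, max_terms=100_000):
    """2F1(k, 1; m; q) = sum_{j>=0} (k)_j / (m)_j * q**j, for k, m > 0 and 0 <= q < 1."""
    total, term = 0.0, 1.0
    for j in range(max_terms):
        total += term
        term *= (k + j) / (m + j) * q  # ratio of consecutive series terms
        if term < tol * total:
            break
    return total


def hyper_pascal_pmf(x, k, m, q):
    """P_x from (3), x = 1, 2, ..., with P_1 = 1 / 2F1(k, 1; m; q)."""
    # Ratio of binomial coefficients via log-gamma:
    # C(k+x-2, x-1) / C(m+x-2, x-1) = G(k+x-1) G(m) / (G(k) G(m+x-1))
    log_ratio = (math.lgamma(k + x - 1) - math.lgamma(k)
                 - math.lgamma(m + x - 1) + math.lgamma(m))
    return math.exp(log_ratio) * q ** (x - 1) / hyp2f1_k1m(k, m, q)


def discrepancy_coefficient(observed, k, m, q):
    """C = chi^2 / N; observed[0] is the frequency of x = 1, observed[1] of x = 2, ..."""
    n = sum(observed)
    chi2 = 0.0
    for x, f in enumerate(observed, start=1):
        expected = n * hyper_pascal_pmf(x, k, m, q)  # N * P_x
        chi2 += (f - expected) ** 2 / expected
    return chi2 / n


# Hypothetical frequencies and parameters, for illustration only (not the Negra data).
print(discrepancy_coefficient([50, 30, 20], k=2.0, m=3.0, q=0.5))
```

For k = m = 1 the hyper-Pascal distribution reduces to the geometric distribution $P_x = (1-q)\,q^{x-1}$, which gives a simple check of the implementation.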

Fig. 1. Distribution of syntactic complexity in the Negra corpus

4. Authors: G. Altmann

5. References

Brants, T. (1999). Tagging and parsing with cascaded Markov models. Automation of corpus annotation. Saarbrücken: Universität des Saarlandes.

Köhler, R., Altmann, G. (2000). Probability distributions of syntactic units and properties. J. of Quantitative Linguistics 7, 189-200.
