Phoneme frequency

Revision as of 08:06, 19 July 2006 by Ahans

1. Problem and history

The problem is to find a function or a distribution that describes the phoneme frequencies of a text or of a corpus. Sometimes letters or even sounds are counted instead, which is fully justified. In the same way one could count, e.g., the syllables of Japanese katakana or hiragana. The number of studies is enormous; some give absolute frequencies, others merely proportions.

Counting began in the 19th century (Förstemann 1846, 1852; Meyer 1869; Bourdon 1892) and developed quickly for practical reasons: stenographers, printers, typewriter designers, codebreakers, etc. urgently needed letter frequencies for their own purposes. Förstemann and Meyer pursued comparative aims, e.g. the relation between consonants and vowels in the languages examined (Old Indian, Greek, Latin and Gothic) and its impact on the development of languages.

The first to consider phonemes from the frequency point of view and to set up hypotheses was G.K. Zipf (1929, 1935, 1949). Afterwards a great number of works appeared that used phoneme frequencies to find other interrelations. The first empirical model, namely the geometric (and the right-truncated geometric) distribution, was proposed by Sigurd (1968). Good (1969) introduced a partial-sums distribution (Whitworth distribution), whose modelling was revived in word length (→) research. Tuldava (1971/1995) considered different possibilities; Altmann (1993) used the synergetic way of modelling and derived a special function for this purpose. Martindale, Gusein-Zade, McKenzie and Borodovsky (1996) compared several curves (functions) on many data sets in order to find the “best” model. Altmann and Lehfeldt (1980) and Zörnig and Altmann (1983, 1984) developed hypotheses on the entropy (→) and the repeat rate (→) of phonemes; Kubáček (1994) derived a formula for the size of the phoneme count necessary to attain reliable counts. Naranan and Balasubrahmanyan (1998, 2000) developed a theory from which different curves for phoneme frequencies can be derived.

Not all arguments holding for word frequencies are valid in this domain. The modelling has been performed in two ways: (i) a continuous curve has been fitted to the proportions of phonemes, (ii) a discrete distribution has been fitted. It can be shown that continuous curves have their analogues in discrete distributions.


2. Hypothesis

The ranked frequencies of phonemes follow a regular probability function or a regular monotone decreasing function.

The result depends on whether one considers the ranked frequencies as a discrete distribution (normalized) or merely a regular series approximated by a continuous function (not normalized).
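The difference between the two readings can be illustrated with a minimal Python sketch. The series values, the inventory size of 13 and the helper name `normalize` are assumptions made for illustration only; they are not taken from any of the cited counts.

```python
def normalize(values):
    """Turn a positive series into a probability distribution (sums to 1)."""
    total = sum(values)
    return [v / total for v in values]

# Hypothetical monotone decreasing series over ranks 1..13 (not real data).
ranks = range(1, 14)
series = [0.25 * x ** -0.9 for x in ranks]

# Reading 1: a regular series approximated by a function; its sum is unconstrained.
series_sum = sum(series)

# Reading 2: the same shape treated as a discrete distribution; the sum is forced to 1.
probs = normalize(series)
```

Normalizing rescales every value by the same constant, so the ranking and the shape of the curve are unchanged; only the interpretation of the values differs.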


3. Derivation

The formulas used up to now can be derived from different approaches.


3.1. Tuldava's approach (1988)

This approach can be represented by the simple differential equation

(1)  y' = \frac{b}{x}

which states that the change in frequency (y) is inversely proportional to the rank (x), and which yields

(2) y = a + b \ln x\quad,

where b is negative. This curve is frequently used in other domains, too (cf. also Martindale et al. 1996; Laherrère, Sornette 1998).
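Because (2) is linear in ln x, its parameters can be estimated by ordinary least squares. The following Python sketch shows one possible way to do this; the function name `fit_log_curve` and the demonstration values are hypothetical, and least squares is an assumption made here, not necessarily the estimation method used in the cited works.

```python
import math

def fit_log_curve(freqs):
    """Least-squares fit of y = a + b*ln(x) to ranked frequencies (rank x = 1, 2, ...)."""
    n = len(freqs)
    xs = [math.log(rank) for rank in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_y = sum(freqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, freqs))
    b = sxy / sxx            # slope; negative for decreasing ranked frequencies
    a = mean_y - b * mean_x  # value of the curve at rank 1, since ln(1) = 0
    return a, b

# Hypothetical ranked proportions generated exactly from a = 0.30, b = -0.10.
demo = [0.30 - 0.10 * math.log(x) for x in range(1, 9)]
a, b = fit_log_curve(demo)
```

Since ln 1 = 0, the parameter a is simply the fitted frequency of the most frequent unit.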


3.2. Derivations related to the unified theory (→) are


(a) Zipf's law (zeta function)

When formula (2) of the unified theory (→) is used with a_0 = a_2 = a_3 = ... = 0, a_1 = -b, this yields

(3) \frac{dy}{y} = -\frac{b}{x}dx

which states that the relative rate of change of frequency is proportional to the relative rate of change of rank, resulting in

(4) y = Ax^{-b}\quad.

This is perhaps the most widely used formula in linguistics; it represents the power law.
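The integration step between (3) and (4) is short enough to write out. Here \ln A denotes the constant of integration, a name chosen to match (4).

```latex
\int \frac{dy}{y} = -b \int \frac{dx}{x}
\;\Longrightarrow\;
\ln y = -b \ln x + \ln A
\;\Longrightarrow\;
y = A x^{-b}
```

The middle step shows that (4) is a straight line with slope -b in log-log coordinates.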


(b) Yule's species/genera function (1924)

When formula (2) of the unified theory is used with a_0 = c, a_1 = -b, a_2 = a_3 = ... = 0, this yields

(5) \frac{dy}{y} = \left( c-\frac{b}{x} \right)dx

resulting in

(6) y = ae^{cx}x^{-b} = ad^x x^{-b}\quad,

where d = e^c.


(c) Naranan and Balasubrahmanyan's (1992a,b, 2000) function

When formula (2) of the unified theory is used with a_0 = 0, a_3 = a_4 = ... = 0, this yields


(7) \frac{dy}{y}	= \left( -\frac{a_1}{x} + \frac{a_2}{x^2} \right)dx

resulting in

(8) y= Ce^{-a_2/x}x^{-a_1},

derived by the authors in a different way.
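One way to convince oneself that (8) solves (7) is a numerical check of the logarithmic derivative. The Python sketch below does this with a central difference. The parameter values C, a_1, a_2 and the evaluation point are arbitrary assumptions, and the function names are illustrative.

```python
import math

def nb_function(x, C, a1, a2):
    """Function (8): y = C * exp(-a2/x) * x**(-a1)."""
    return C * math.exp(-a2 / x) * x ** (-a1)

def log_derivative(f, x, h=1e-6):
    """Central-difference approximation of d(ln f)/dx, i.e. (dy/dx)/y."""
    return (math.log(f(x + h)) - math.log(f(x - h))) / (2 * h)

# Hypothetical parameter values and evaluation point.
C, a1, a2 = 0.2, 0.8, 0.5
x0 = 3.0

lhs = log_derivative(lambda t: nb_function(t, C, a1, a2), x0)
rhs = -a1 / x0 + a2 / x0 ** 2  # right-hand side of (7) divided by dx
```

Up to the discretization error of the difference quotient, `lhs` and `rhs` agree, as (7) requires.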


(d) Altmann's ranking function (1993)

Using formula (11) of the unified theory, which can be written as

(9) y_x =  \left( 1+a_0+\frac{a_1}{(x-b_1)^{c_1}} + \frac{a_2}{(x-b_2)^{c_2}} \right)y_{x-1},

and setting a_i = 0 (i = 0, 2, 3, ...), c_1 = 1, yields

(10) y_x =  \left( 1+\frac{a_1}{x-b_1} \right)y_{x-1} .

Upon setting b_1 = -a, a_1 - b_1 = b, this results in

(11) y_x = \frac{\begin{pmatrix} b + x \\ x - 1 \end{pmatrix}}{\begin{pmatrix} a + x \\ x - 1 \end{pmatrix}}y_1\quad, x = 1,2,3,...

This proved to be a very good model for letter distribution in English and German (Best 2005).
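As a numerical check, the recurrence that follows from (10) with b_1 = -a, a_1 - b_1 = b, namely y_x = ((b + x)/(a + x)) y_{x-1}, can be compared with the closed form (11). The sketch below is written in Python. The parameter values and function names are hypothetical, and computing binomial coefficients with a real upper argument through the gamma function is an implementation choice made here, not something stated in the source.

```python
import math

def gen_binom(top, k):
    """Binomial coefficient for real top and integer k >= 0, via the gamma function."""
    return math.gamma(top + 1) / (math.gamma(k + 1) * math.gamma(top - k + 1))

def altmann_recurrence(y1, a, b, n):
    """y_1..y_n from y_x = (b + x) / (a + x) * y_{x-1}."""
    ys = [y1]
    for x in range(2, n + 1):
        ys.append(ys[-1] * (b + x) / (a + x))
    return ys

def altmann_closed(y1, a, b, x):
    """Closed form (11): ratio of two binomial coefficients times y_1."""
    return y1 * gen_binom(b + x, x - 1) / gen_binom(a + x, x - 1)

# Hypothetical parameters; b < a gives a decreasing series.
y1, a, b, n = 0.2, 1.5, 0.3, 10
ys_rec = altmann_recurrence(y1, a, b, n)
ys_closed = [altmann_closed(y1, a, b, x) for x in range(1, n + 1)]
```

Both lists agree up to floating-point rounding, which illustrates that (11) is the solution of the recurrence (10) under the stated reparametrization.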

All these formulas can be transformed into distributions by appropriate normalization. Several distributions have been derived directly, namely


(e) Geometric distribution

Sigurd (1968) simply used the 1-displaced geometric distribution. It can be obtained from formula (9) by setting a_i = 0 (i = 1,2,3,...), which yields

(12) y_{x+1} = (1+a_0)y_x\quad.

For -1 < a_0 < 0, 1+a_0 = q, 1-q = p, y_x = P_x one obtains the usual (1-displaced) geometric distribution

(13) P_x = pq^{x-1},\quad x = 1,2,3,...

The same result was also proposed by Orlov, Boroda, Nadarejšvili (1982). Treating the relative frequencies directly, one can write (13) as

(14) y_x = y_1 q^{x-1},\quad x=1,2,3,...
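A phoneme inventory is finite, so in practice the geometric distribution is evaluated only on ranks 1..n. The Python sketch below computes (13) on a finite range and its right-truncated variant, obtained by renormalizing over 1..n. The values of q and n are hypothetical, and the function names are illustrative.

```python
def geometric(q, n):
    """P_x = p * q**(x-1) for x = 1..n, with p = 1 - q as in (13)."""
    p = 1 - q
    return [p * q ** (x - 1) for x in range(1, n + 1)]

def right_truncated_geometric(q, n):
    """The same probabilities renormalized so that ranks 1..n sum to 1."""
    raw = geometric(q, n)
    total = sum(raw)  # equals 1 - q**n
    return [v / total for v in raw]

# Hypothetical values: q = 0.8 and an inventory of n = 13 units.
q, n = 0.8, 13
untruncated = geometric(q, n)
truncated = right_truncated_geometric(q, n)
```

Truncation leaves the ratio of neighbouring probabilities at q; it only rescales the values so that the probability mass beyond rank n is redistributed.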


(f) Negative hypergeometric distribution

A systematic analysis of Slavic languages and German (Grzybek & Kelih 2003, 2003a,b, 2005, 2006b; Grzybek, Kelih, & Altmann 2004, 2006a,b; Best 2005a,b) showed that the most stable distribution for letter frequencies follows from the unified theory, formula (9), by setting a_1 = \frac{(K+n-1)(-K+M+1)}{-K+M-n}, a_2 = \frac{(M-1)(n+1)}{K-M+n}, a_0 = b_2 = 0, b_1 = K-M+n, c_1 = c_2 = 1, yielding

(15) P_x = \frac{(M+x-1)(n-x+1)}{x(K-M+n-x)}P_{x-1}

from which

(16) P_x = \frac{\begin{pmatrix} M+x-1 \\ x \end{pmatrix}\begin{pmatrix} K-M+n-x-1 \\ n-x \end{pmatrix}}{\begin{pmatrix} K+n-1 \\ n \end{pmatrix}} = \frac{\begin{pmatrix} -M \\ x \end{pmatrix}\begin{pmatrix} -K+M \\ n-x \end{pmatrix}}{\begin{pmatrix} -K \\ n \end{pmatrix}}\quad x= 0,1,...,n

which is usually displaced by 1 step to the right.
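The first form of (16) can be evaluated directly, also for non-integer K and M, if the binomial coefficients are computed with the gamma function. The Python sketch below does this and builds the 1-displaced version used for ranks. The parameter values are hypothetical, the function names are illustrative, and the use of `math.gamma` is an implementation choice made here.

```python
import math

def gen_binom(top, k):
    """Binomial coefficient for real top and integer k >= 0, via the gamma function."""
    return math.gamma(top + 1) / (math.gamma(k + 1) * math.gamma(top - k + 1))

def neg_hypergeometric(K, M, n):
    """P_x for x = 0..n according to the first form of (16); assumes K > M > 0."""
    denom = gen_binom(K + n - 1, n)
    return [
        gen_binom(M + x - 1, x) * gen_binom(K - M + n - x - 1, n - x) / denom
        for x in range(n + 1)
    ]

# Hypothetical parameters; n + 1 = 13 classes.
K, M, n = 3.2, 0.9, 12
probs = neg_hypergeometric(K, M, n)

# 1-displaced version: rank r = x + 1 carries the probability P_x.
ranked = {x + 1: p for x, p in enumerate(probs)}
```

The probabilities sum to 1 without any further normalization, so the displaced version can be compared directly with relative frequencies at ranks 1..n+1.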


3.3. Partial-sums distributions (Good 1969)

Good (1969) introduced a new distribution, mentioned in Martindale et al. (1996). It is a so-called partial-sums distribution, namely a “sterred” discrete uniform distribution (cf. Wimmer, Altmann 1999). The provenance of such distributions is shown in the chapter on Word frequency (→). The Good distribution has the form

(17) P_x = \frac{1}{n}\sum_{i=x}^n \frac{1}{i},\quad x=1,2,...,n.
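Formula (17) has no free parameter apart from the inventory size n, so it can be tabulated directly. The Python sketch below computes it with exact fractions. The function name is illustrative, and the use of `fractions.Fraction` is a choice made here so that normalization can be checked exactly.

```python
from fractions import Fraction

def good_distribution(n):
    """P_x = (1/n) * sum_{i=x}^{n} 1/i for x = 1..n, as exact fractions."""
    probs = []
    tail = Fraction(0)
    # Accumulate the partial sums from the last rank backwards.
    for x in range(n, 0, -1):
        tail += Fraction(1, x)
        probs.append(tail / n)
    probs.reverse()
    return probs

small = good_distribution(3)    # three ranks, for inspection by hand
larger = good_distribution(13)  # hypothetical inventory of 13 units
```

For n = 3 the three probabilities are 11/18, 5/18 and 2/18. Each term 1/i is counted i times in the total, which is why the probabilities sum to 1.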

Example. Frequency of phonemes in Hawaiian

Table 1 and Fig. 1 show the fit of the above formulas to the relative frequencies of Hawaiian phonemes. If functions are used, normalization is not necessary.


[Table 1. Fitting the above formulas to the relative frequencies of Hawaiian phonemes; image Tabelle11 PF.jpg]


Except for the geometric series, all of them yield a good and approximately equally close fit in this case. Fig. 1 shows only the fit of (11).


[Image: Grafik11 PF.jpg]
Fig. 1. Fitting function (11) to Hawaiian phoneme frequencies


4. Authors: U. Strauss, G. Altmann, K.-H. Best


5. References

Alekseev, P.M. (1973). Häufigkeitswörterbücher und Verfahren ihrer Erarbeitung. In: Alexejew, P.M., Kalinin, W.M., Piotrowski, R.G. (eds.), Sprachstatistik: 86-143. München: Fink.

Altmann, G. (1993). Phoneme counts. Glottometrika 14, 55-70.

Altmann, G., Bagheri, D., Goebl, H., Köhler, R., Prün, C. (2002). Einführung in die quantitative Lexikologie. Göttingen: Peust & Gutschmidt.

Altmann, G., Lehfeldt, W. (1980). Einführung in die quantitative Phonologie. Bochum: Brockmeyer.

Andreev, N.D. (ed.) (1965). Statistiko-kombinatornoe modelirovanie jazykov. Moskva-Leningrad: Nauka.

Andreev, N.D. (1965a). Opyt statistiko-kombinatornogo vydelenija pervogo morfologičeskogo tipa v vengerskom jazyke. In: Andreev 1965: 205-211.

Andreev, N.D. (1967). Statistiko-kombinatornye metody v teoretičeskom i prikladnom jazykoznanii. Leningrad: Nauka.

Andreeva, L.D., Kordi, E.E., Smirnova, L.N., Fedulova, N.I., Fitialova, I.B., Fichman, B.S. (1965). Polučenie pervogo morfologičeskogo tipa russkogo jazyka v pod˝jazyke radioelektroniki posredstvom algoritma statistiko-kombinatornogo modelirovanija. In: Andreev 1965: 49-64.

Attneave, F. (1953). Psychological probability as a function of experienced frequency. J. of Experimental Psychology 46, 81-86.

Avram, A. (1964). Some thoughts on the functional yield of phonemic oppositions. Linguistics 5, 40-47.

Bauer, F.L. (³2000). Entzifferte Geheimnisse. 3., überarbeitete und erweiterte Auflage. Berlin/Heidelberg: Springer.

Bektaev, K.B. (1973). Alfavitno-častotnyj slovar´ slogov kazachskogo jazyka. In: Statistika kazachskogo teksta 3: 566-611. Alma-Ata: Nauka.

Belevitch, V. (1956). Théorie de l'information et statistique linguistique. Bulletin de la Classe des Sciences Académie Royale de Belgique 419-436.

Belonogov, G.G., Frolov, G.D. (1963). Empiričeskie dannye o raspredelenii bukv v russkoj pis´mennoj reči. Problemy kibernetiki, Vyp. 9, 287-305.

Benkö, L., Samu, I. (1972). The Hungarian language. Budapest: Akadémiai Kiadó.

Berger, K.W. (1967). A study of printed Pilipino usage. Phonetica 17, 31-37.

Bergmann, H. (1986). Einige Ergebnisse der Phonemstatistik. Abhandlungen der Heidelberger Akademie der Wissenschaften, Philosophisch-historische Klasse 1986, 5-19.

Best, K.-H. (²2003). Quantitative Linguistik: Eine Annäherung. 2., überarb. u. erw. Auflage. Göttingen: Peust & Gutschmidt.

Best, K.-H. (2005). Buchstabenhäufigkeiten im Deutschen und Englischen. Naukovij visnik Černivec´kogo universitetu vypusk 231, Germans´ka filologija, 119-127.

Best, K.-H. (2005a). Zur Häufigkeit von Buchstaben, Leerzeichen und anderen Schriftzeichen in deutschen Texten. Glottometrics 11, 9-31.

Best, K.-H. (2005b). Laut- und Phonemhäufigkeiten im Deutschen. Göttinger Beiträge zur Sprachwissenschaft 10, 21-32.

Beutelspacher, A. (⁴1994). Kryptologie. 4., abermals leicht verbesserte Auflage. Braunschweig/Wiesbaden: Vieweg.

Bhagvat, S.V. (1961). Phonemic frequencies in Marathi and their relation to devising a speed-script. Poona: Deccan College.

Boldrini, M. (1948). Le statistiche letterarie e i fonemi elementari nella poesia. Milano.

Bosák, J. (1965). Frequency of phonemes and letters in Slovak and numerical expression of some phonemic relations. Jazykovedný časopis 14, 120-130.

Bourne, C.P., Ford, D.F. (1961). A study of the statistics of letters in English words. Information and Control 4, 48-61.

Bourdon, B. (1892). L'expression des émotions et des tendances dans le langage. Paris: Alcan.

Carroll, J.B. (1962). Transitional probabilities of English phonemes. Cambridge, Mass.

Chol´m, Ch.A. (1965). Vydelenie pervogo morfologičeskogo tipa v estonskom jazyke na osnove statistiko-kombinatornogo modelirovanija v pod˝jazyke radioelektroniki. In: Andreev (1965): 212-218.

Čistjakov, V.F. (1972). Častotnosti glasnych i soglasnych v 50 jazykach raznogo grammatičeskogo stroja. Lingua Posnaniensis 16, 45-48.

Csehély, A. (1943). A magyar magánhangzók eloszlása. Magyar Nyelv 1943, 64-65.

Deitz, P. (1952). The relative frequency of correlations and oppositions phonologiques in Modern French. Iowa: Iowa State University.

Denes, P.B. (1963). On the statistics of spoken English. J. of the Acoustical Society of America 30, 892-904.

Denes, P.B. (1964). On the statistics of spoken English. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 17, 51-72.

Dewey, G. (1923). Relative frequencies of English speech sounds. Cambridge, MA: Harvard University Press.

Dietze, J. (1982). Grapheme und Graphemkombinatorik der russischen Fachsprache. Eine Phonostatistische Untersuchung. Glottometrika 4, 80-94.

Doležel, L. (1963). Předběžný odhad entropie a redundance psané češtiny. Slovo a slovesnost 24, 165-175.

Džubanov, A.Ch. (1979). K voprosu o grafemnoj statistike kazachskogo teksta. In: Voprosy kazachskoj fonetiki i fonologii 79-86.

Eliseeva, K.A. (1965). Statistiko-kombinatornoe modelirovanie pervogo tipa v ukrainskoj morfologii. In: Andreev 1965: 85-88.

Estoup, J.B. (1916). Gammes sténographiques. Méthode et exercices pour l'acquisition de la vitesse. Paris: Institut sténographique.

Fähnrich, M., Meinold, G. (1973). Phonemstatistischer Vergleich zwischen Georgisch, Awarisch und Tschesarenisch. Wissenschaftliche Zeitschrift 22, 109-117.

Fairbanks, G.H. (1957). Frequency and phonemics. Indian Linguistics 17, 105-113.

Fant, C.G.M. (1958). Some notes on the relative occurrence of letters, phonemes, and words in Swedish. In: Proceedings of the 8th International Congress of Linguistics, Oslo 1958: 815-

Fedulova, N.I. (1965). Vydelenie pervogo morfologičeskogo tipa v bolgarskom jazyke. In: Andreev (1965): 110-115.

Ferguson, C.A., Chowdhury, M. (1960). The phonemes of Bengali. Language 36, 22-59.

Fichman, B.S. (1965a). Vydelenie pervogo morfologičeskogo tipa v jazyke chausa po algoritmu statistiko-kombinatornogo modelirovanija. In: Andreev (1965): 189-195.

Fichman, B.S. (1965b). Vydelenie pervogo morfologičeskogo tipa v jazyke suachili po algoritmu statistiko-kombinatornogo modelirovanija. In: Andreev (1965): 196-204.

Findra, J. (1968). Frekvencia foném v ústnych prejavoch. Jazykovedný časopis 19, 84-95.

Fitialova, I.B. (1965). Statistiko-kombinatornoe vydelenie pervogo morfologičeskogo tipa v nemeckom jazyke. In: Andreev (1965): 158-171.

Förstemann, E. (1846). Ueber die numerischen Lautverhältnisse im Deutschen. Germania 7, 83-90.

Förstemann, E. (1852). Numerische Lautverhältnisse im Griechischen, Lateinischen und Deutschen. Zeitschrift für vergleichende Sprachforschung 1, 163-179.

Fowler, M. (1957). Herdan's statistical parameter and the frequency of English phonemes. In: Pulgram, E. (ed.), Studies presented to Joshua Whatmough on his sixtieth birthday: 47-52. 's-Gravenhage: Mouton.

French, N.R., Carter, C.W., Koenig, W. (1930). Words and sounds of telephone communications. Bell System Technical Journal 9, 290-325.

Gačečiladze, T.G., Eliašvili, A.I. (1958). Statistika bukv sovremennogo literaturnogo gruzinskogo jazyka. Soobščenija Akademii nauk gruzinskoj SSR 20, 565-567.

Gaines, H.F. (1956). Cryptanalysis: A study of ciphers and their solution. New York: Dover.

Gerber, S.E., Vertin, S. (1969). Comparative frequency counts of English phonemes. Phonetica 19, 133-141.

Good, I.J. (1969). Statistics of language. In: Meetham, A.R., Hudson, R.A. (Eds.), Encyclopedia of information, linguistics and control: 567-581. Oxford: Pergamon.

Grigoriev, V.I. (1980). Frequency distribution of letters and their ranks in a running text. In: Viks, Ü. (ed.), Symposium: Computational linguistics and related topics. Tallinn: Academy of Sciences 1980: 43-47.

Grigor´ev, V. I. (1980). O dinamike raspredelenija bukv v tekste. In: Aktual´nye voprosy strukturnoj i prikladnoj lingvistiki. Sbornik statej: 40-48. Moskva.

Grzybek, P., Kelih, E. (2003). Grapheme frequencies in Slovene. In: Benko, Vladimir (ed.), Slovko 2003. Bratislava (to appear).

Grzybek, P., Kelih, E. (2003a). Graphemhäufigkeiten (am Beispiel des Russischen). Teil I: Methodologische Vor-Bemerkungen und Anmerkungen zur Geschichte der Erforschung von Graphemhäufigkeiten im Russischen. Anzeiger für Slavische Philologie 31, 131-162.

Grzybek, P., Kelih, E. (2005). Graphemhäufigkeiten im Ukrainischen. Teil I: Ohne Apostroph (´). In: Altmann, G., Levickij, V., Perebejnos, V. (eds.), Problems of Quantitative Linguistics: 159-179. Černivcy: Ruta.

Grzybek, P., Kelih, E., Altmann, G. (2004). Graphemhäufigkeiten (Am Beispiel des Russischen). Teil II: Modelle der Häufigkeitsverteilung. Anzeiger für Slavische Philologie 32, 25-54.

Grzybek, P., Kelih, E., Altmann, G. (2005). Graphemhäufigkeiten (Am Beispiel des Russischen). Teil III: Die Bedeutung des Inventarumfangs – Eine Nebenbemerkung zur Diskussion um das ë. Anzeiger für Slavische Philologie 33, 117-140.

Guirao, M., García Jurado, M.A. (1990). Frequency of occurrence of phonemes in American Spanish. Revue québécoise de linguistique 19(2), 135-150.

Gusein-Zade, S.M. (1988). On the distribution of letters of the Russian language by frequencies. Problemy Peredači Informacii 23, 102-107.

Häkkinen, K. (1977). Tilastotietoja suomen kielen äännerakenteesta. Sananjalka 19, 57-68.

Harary, F., Paper, H.H. (1957). Toward a general calculus of phonemic distribution. Language 33, 143-169.

Hayden, R. (1950). The relative frequency of phonemes in general-American English. Word 6, 217-223.

Herdan, G. (1958). The relation between the functional burdening of phonemes and the frequency of occurrence. Language and Speech 1, 8-13.

Herdan, G. (1966). The advanced theory of language as choice and chance. Berlin: Springer.

Hoffmann, L. (²1985). Kommunikationsmittel Fachsprache. Eine Einführung. Zweite, völlig neu bearb. Aufl. Tübingen: Narr.

Holas, A. (1926). Naskýtání a shlukování hlásek, jich interakce a kombinace ve slovenštine a přirovnání s češtinou. PTL 51, 43-49.

Holas, A. (1927). Naskýtání a shlukování hlásek, jich interakce a kombinace ve slovenštine a přirovnání s češtinou (cont.). PTL 52, 3-12.

Hultzén, L.S., Allen, J.H.D., Miron, M.S. (1964). Tables of transitional frequencies of English phonemes. Urbana, Ill.: University of Illinois Press.

Hussien, O. A. (2004). The Lerchianness plot. Glottometrics 7, 50-64.

Isengel´dina, A.A. (1973). Faktory, opredeljajuščie otnositel´nuju častotnost´ fonem. In: Statistika kazachskogo teksta 3, 659-662. Alma-Ata: Nauka.

Ishii, H. (1990). Otogizōshi Sagoromo no Chūjō no kana no shutsugen hindo no kansoku gosa ni tsuite. Mathematical Linguistics 17, 328-353.

Ishii, H. (1991). Kana oyobi on no shutsugen hindo no shochōsa. Mathematical Linguistics 18(2), 84-97.

Izumi, A., Mizutani, S. (1991). Tsutsui Yasutaka Zanzō ni Kuchibeni o no onbunpu hoi. Mathematical Linguistics 18, 80-83.

Jakubajtis, T.A. (1965). Statistiko-kombinatornoe vydelenie pervogo morfologičeskogo tipa v latyšskom jazyke. In: Andreev 1965: 116-122.

Jakuševa, D.A. (1965). Opyt primenenija algoritma statistiko-kombinatornogo modelirovanija k vétnamskomu jazyku. In: Andreev 1965, 225-228.

Jékel, P., Papp, F. (1974). Ady Endre összes költöi müveinek fonémstatisztikája. Budapest: Akadémiai Kiadó.

Job, M. (1974). Untersuchungen zur Frequenz der Phoneme des Georgischen. Bochum: Seminararbeit.

Kálmán, B. (1972). Hungarian historical phonology. In: Benkö, L., Imre, S. (eds.), The Hungarian language: 49-83. The Hague: Mouton.

Kerckhoffs, A. (1883). La cryptographie militaire. Paris.

King, R.D. (1966). On preferred phonemicisation for statistical studies. Phoneme frequencies in German. Phonetica 15, 22-31.

Kordi, E.E. (1965). Ischodnye dannye dlja statistiko-kombinatornogo modelirovanija morfologii sovremennogo franzuzskogo jazyka i vydelenie pervogo morfologičeskogo tipa. In: Andreev 1965: 172-180.

Kosonovskij, A.I. (1968). Nekotorye predvaritel´nye dannye o častotnosti grafem i fonem sovremennogo literaturnogo jazyka chindi. In: Jazyki Indii, Pakistana, Nepala i Cejlona: 167-181. Moskva: Nauka.

Krámský, J. (1965). Some statistical observations on the role of the place of articulation in languages. Philologica Pragensia 8, 245-250.

Krámský, J. (1966). The frequency of occurrence of vowel phonemes in languages possessing vowel systems of identical structure. Prague Studies in Mathematical Linguistics 1, 17-32.

Kubáček, L. (1994). Confidence limits for proportions of linguistics entities. J. of Quantitative Linguistics 1, 56-61.

Kučera, H. (1963). Mechanical phonemic transcription and phoneme count in Czech. International Journal of Slavic Linguistics and Poetics 6, 36-50.

Kučera, H. (1963a). Entropy, redundancy and functional load in Russian and Czech. In: American Contributions to the Fifth International Congress of Slavists, Vol. I, 191-219. The Hague: Mouton.

Kučera, H., Monroe, G.K. (1968). A comparative quantitative phonology of Russian, Czech, and German. New York: American Elsevier.

Kullback, S. (1976). Statistical methods in cryptanalysis. Laguna Hills, CA: Aegean Park Press.

Küpfmüller, K. (1954). Die Entropie der deutschen Sprache. Fernmeldetechnische Zeitschrift 7, 265-272.

Kuzina, V. (1977). Statistika bukv v tekstach raznych tipov sovremennogo latyšskogo jazyka. In: Statistika un valodas funkcionālie stili: 97-106. Riga: Zinatne.

Laherrère, J., Sornette, D. (1998). Stretched exponential distributions in nature and economy: “fat tails” with characteristic scales. The European Physical Journal B 2, 525-539.

Lotz, J. (1952). Vowel frequency in Hungarian. Word 8, 227-235.

Łobacz, P., Jassem, W. (1974). Fonotaktyczna analiza mówionego tekstu polskiego. Biuletyn polskiego towarzystwa językoznawczego 32, 179-197.

Lua, K.T. (1990). Analysis of Chinese character stroke sequences. Computer Processing of Chinese & Oriental Languages 4(4), 375-385.

Lua, K.T. (1992). Linearization of Zipfian distribution for Chinese characters. J. of Information Processing 15(1), 10-16.

Ludvíková, M., Königová, M. (1967). Quantitative research of graphemes and phonemes in Czech. Prague Bulletin of Mathematical Linguistics 7, 15-29.

Maas, H.D. (1971). Einige statistische Untersuchungen zum Werk Georg Trakls. In: Gunzenhäuser, Rul (ed.), Statistik, Textanalyse, ästhetische Wertung. = Zeitschrift für Literaturwissenschaft und Linguistik IV, 43-50.

Macrea, D. (1941/43). Frecvenţa fonemelor în limba română. Dacoromania 10, 39-49.

Maneca, C. (1968). Considérations statistiques sur les finales vocaliques en roumain. Revue Roumaine de Linguistique 13, 61-71.

Marinova, M., Marinov, A. (1964). Statističeski izsledovanija na fonemite v bъlgarskija knižoven ezik. Bъlgarski ezik 1964, 2-3.

Martindale, C., McKenzie, D., Gusein-Zade, S.M., Borodovsky, M.Y. (1996). Comparison of equations describing the frequency distribution of graphemes and phonemes. J. of Quantitative Linguistics 3, 106-112.

Meier, H. (²1967). Deutsche Sprachstatistik. Hildesheim: Olms.

Melkumjan, M.R. (1965). Ischodnye dannye i statistiko-kombinatornoe vydelenie paradigmy pervogo morfologičeskogo tipa v armjanskom jazyke. In: Andreev 1965: 123-136.

Messner, D. (1974). Der portugiesische Anteil am Dictionnaire chronologique des langues ibéro-romanes. Portugiesische Forschungen der Görres-Gesellschaft 1974, 108-138.

Messner, D. (1976). A statistical approach to Portuguese. In: Schmidt-Radefeldt, J. (ed.), Readings in Portuguese Linguistics: 425-446. Leiden: North-Holland.

Meyer, L. (1869). Die gothische Sprache. Ihre Lautgestaltung insbesondere im Verhältnis zum Altindischen, Griechischen und Lateinischen. Berlin: Weidmannschen Buchhandlung.

Moīnfar, M.D. (1973). Phonologie quantitative du Persan. Paris: Editions Jean-Favard.

Moreau, R. (1961). Au sujet de l'utilisation de la notion de fréquence en linguistique. Cahiers de lexicologie 3, 140-159.

Mossner, F. (1967). J-förmige Häufigkeitsverteilung chinesischer Schriftzeichen in chinesischen Texten. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 20, 479-488.

Nagórko-Kufel, A. (1975). Z badań nad częstościami elementów tekstowych języka polskiego dla potrzeb pisma niewidomych. Poradnik językowy 1, 7-13.

Nagy, G.O. (1969). Vokalfrequenzwerte in altungarischen Textdenkmälern. Ural-altaische Jahrbücher 41, 146-154.

Naranan, S., Balasubrahmanyan, V.K. (1992a). Information theoretic models in statistical linguistics - Part I: A model for word frequencies. Current Science 63, 261-269.

Naranan, S., Balasubrahmanyan, V.K. (1992b). Information theoretic models in statistical linguistics - Part II: Word frequencies and hierarchical structure in language - statistical tests. Current Science 63, 297-306.

Naranan, S., Balasubrahmanyan, V.K. (1993). Information theoretic model for frequency distribution of words and speech sounds (phonemes) in language. J. of Scientific and Industrial Research 52, 728-738.

Naranan, S., Balasubrahmanyan, V.K. (1998). Models for power law relations in linguistics and information science. J. of Quantitative Linguistics 5(1-2), 35-61.

Naranan, S., Balasubrahmanyan, V.K. (2000). Information theory and algorithmic complexity: Applications to linguistic discourses and DNA sequences as complex systems. Journal of Quantitative Linguistics 7, 129-183.

Nasvytis, A. (1953). Die Gesetzmäßigkeiten kombinatorischer Technik. Berlin u.a.: Springer.

Navarro, T. (1968). Studies in Spanish phonology. Miami Linguistic Series 4, 1-160.

Nemetz, T., Szilléry, A. (1979). Nyelvstatisztikai táblázatok. Alkalmazott Matematikai Lapok 5, 69-87.

Newman, E.B. (1951). The pattern of vowels and consonants in various languages. American Journal of Psychology 64, 369-379.

Nikonov, V.A. (1960). Konsonantnyj koefficient. Lingua Posnaniensis 8, 228-235.

Nisbet, J.D. (1960). Frequency counts and their uses. Educational Research 1960, 51-64.

Noreen, A. (1907). Vårt språk. Nysvensk grammatik i utförlig framställning. Lund: Gleerup.

Novak, L.A. (1971). Statistica delle lettere e delle combinazioni di lettere nella lingua rumena scritta. In: Tagliavini, C. (ed.), Statistica linguistica: 291-322. Bologna: Patron.

Novak, L.A. (1968). Statistika bukv i bukvosočetanij v rumynskom pis´mennom jazyke. In: Alekseev, P.M., Kalinin, V.M., Piotrovskij, R.G. (eds.), Statistika reči: 228-230. Leningrad: Nauka.

Ohlmann, N. (1958). Subject-word letter frequencies with applications to superimposed coding. In: Proceedings of the International Conference of Scientific Information 2: 903-915. Washington.

Orlov, Ju.V., Boroda, M.G., Nadarejšvili, I.Š. (1982). Sprache, Text, Kunst. Quantitative Analysen. Bochum, Brockmeyer.

Ovčinnikov, A. (1962). Polučenie paradigmy pervogo morfologičeskogo tipa v nemeckom jazyke na materiale publicističeskich tekstov. Dnepropetrovsk: DGU.

Ožigova, G.I. (1965). Statistiko-kombinatornoe modelirovanie paradigmy pervogo morfologičeskogo tipa v češskom jazyke na materiale publicističeskich tekstov. In: Andreev 1965: 96-103.

Pääkkönen, M. (1993). Graphemes and context. Glottometrika 14, 1-53.

Pandit, P.B. (1965). Phonemic and morphemic frequencies of the Gujarati language. Poona: Deccan College.

Panina, N.A. (1965). Opyt statistiko-kombinatornogo vydelenija paradigmy vtorogo morfologičeskogo tipa v serbochorvatskom jazyke. In: Andreev (1965): 241-245.

Penkov, V. et al. (1962). Frequencies of letters in written Bulgarian. Comptes rendus de l’Académie bulgare des Sciences, 15, No. 3.

Perebejnos, V.I. (1965). Častota i sočetaemost´ fonem sovremennogo ukrainskogo jazyka. In: Seminar – Avtomatizacija informacionnych rabot i voprosy prikladnoj lingvistiki: 25-30. Kiev.

Perebyjnis, V.S. (1970). Kil´kisni ta jakisni charakteristiki sistemi fonem sučasnoï ukraïnskoï literaturnoï movi. Kiïv: Naukova dumka.

Peršikov, V.F. (1965). Iz opyta statistiko-kombinatornogo modelirovanija albanskoj morfologii. In: Andreev (1965): 181-188.

Peškovskij, A.M. (1925). Desjat´ tysjač zvukov. (Opyt zvukovoj charakteristiki russkogo jazyka kak osnovy dlja eufoničeskich issledovanij). In: Peškovskij, A.M. (ed.), Metodika rodnogo jazyka, lingvistika, stilistika, poetika: 167-191. Leningrad/Moskva.

Petrowa, N.W. (1973). Code-Merkmale des schriftlichen Textes. In: Alexejew, P.M., Kalinin, W.M., Piotrowski, R.G. (Eds.), Sprachstatistik: 20-70. München: Fink.

Pierce, J.E. (1957). A statistical study of consonants in New World languages (I) Introduction, (II) Data. International Journal of American Linguistics 23, 36-45, 94-108.

Piirainen, I.T. (1971). Grapheme als quantitative Größen. Linguistische Berichte 13, 81-82.

Plath, W. J. (1958). The relation frequency of English consonantal phonemes. Zeitschrift für Phonetik und allgemeine Sprachwissenschaft 11, 67-87.

Proskurnin, N. (1933). Podsčety častoty liter i komplektovka šrifta. In: Revoljucija i pis´mennost´. Sbornik I: 72-82. Moskva-Leningrad.

Pukui, H.K., Elbert, S.H. (1957). Hawaiian-English dictionary. Honolulu: University of Hawaii Press.

Rachmanov, D.A.O. (1988). Statistiko-distributivnyj analiz azerbajdžanskogo teksta na urovne grafem i fonem. Baku: Diss.

Rademacher, A. (1974). Untersuchungen zu den Buchstabenhäufigkeiten des See-Dajakischen. Bochum: Seminararbeit.

Ramakrishna, B.S., Nair, K.K., Chipllunkar, V.N., Atal, B.S., Ramachandran, V., Subramanian, R. (1962). Some aspects of the relative efficiencies of Indian languages. Bangalore.

Roberts, A.H. (1965). A statistical linguistic analysis of American English. The Hague: Mouton.

Roceric-Alexandrescu, A. (1968). Fono-statistica limbii române. Bucureşti: Editura Academiei RSR.

Rocławski, B. (1975). Ze studiów fonostatystycznych nad kaszubszczyną. Rozklad częstości występowania fonemów. Gdańskie Studia Językoznawcze Zakład Narodowy Im. Ossolińskich 1975, 107-130.

Rūłe, V. (1951). Lauthäufigkeit in der lettischen Schriftsprache. In: Slaviska instituts vid Lunds universitetet årsbok 1948/1949: 153-164. Lund.

Sadler, V. (1959). Relativaj oftecoj de kelkaj lingvaj elementoj en esperanto. Scienca revuo 10, 67-71.

Sauvageot, A. (1951). Esquisse de la langue hongroise. Les langues et leur structure III. Paris: Klincksieck.

Savický, N.P. (1966). Ob ustojčivosti otnositel´nych častot lingvističeskich elementov. Československá rusistika 11, 214-217.

Schleicher, A. (1852). Die Formenlehre der kirchenslawischen Sprache, erklärend und vergleichend dargestellt. Bonn/ Wien/ Prag: H.B. König.

Schönpflug, W. (1969). n-Gramm-Häufigkeiten in der deutschen Sprache. 1. Monogramme und Digramme. Zeitschrift für experimentelle und angewandte Psychologie 16, 157-183.

Schulze, E. (1974). Untersuchungen zu den Buchstabenhäufigkeiten des Hawaiischen. Bochum: Seminararbeit.

Sedláček, Z. (1924). Základní studie k českému těsnopisu. I. Stanoveni poměrů frekvenčních, iteračních a kombinačních v jazyce českém. Těsnopisné Rozhledy

Segal, D.M. (1969). K statističeskoj charakteristike pol´skogo jazyka na fonologičeskom urovne. In: Issledovanija po pol´skomu jazyku: 20-52. Moskva: Nauka.

Segal, D.M. (1972). Osnovy fonologičeskoj statistiki. Moskva: Nauka.

Seiden, W. (1960). Chamorro phonemes. Anthropological Linguistics 2, 6-35.

Seljutina, T.A. (1965). Vydelenie pervogo morfologičeskogo tipa v anglijskom jazyke metodom statistiko-kombinatornogo modelirovanija (na materiale chudožestvennogo teksta). In: Andreev (1965): 150-157.

Sharma, S., Debnath, S. (1972). A comparative study of Teutonic languages. Calcutta.

Shtrikman, S. (1994). Some comments on Zipf's law for the Chinese language. J. of Information Science 20(2), 142-143.

Sievers, E. (1892). Tatian, lateinisch und altdeutsch mit ausführlichem Glossar. Paderborn.

Sigurd, B. (1968). Rank-frequency distribution for phonemes. Phonetica 18, 1-15.

Siméonoff, E. (1965). In the distribution of “costs” of combinations of k letters in a written language. Statistical Methods in Linguistics 4, 45-50.

Singhal, R., Toussaint, G.T. (1978). Probabilities of occurrence of characters, character-pairs, and character triplets in English text. ALLC Bulletin 6, 245-253.

Siromoney, G. (1963). Entropy of Tamil prose. Information and Control 6, 297-300.

Solso, R.L., King, J.F. (1976). Frequency and versatility of letters in the English language. Behavior Research Methods and Instrumentation 8, 283-286.

Steffen, M. (1957). Częstość występowania głosek polskich. Biuletyn polskiego towarzystwa językoznawczego 16, 145-164.

Stolze, F. (1891). Die Iterationsverhältnisse der Laute in der lateinischen Sprache für die Kurzschrift. Magazin für Stenographie 1891, 47-48.

Suhren, S. (2002). Untersuchung zum Gesetz von Zwirner, Zwirner und Frumkina am Beispiel des niederdeutschen „De lütte Prinz“. Staatsexamensarbeit, Göttingen.

Svacevičius, B.I. (1966). K voprosu o častote vstrečaemosti fonem v litovskoj pis´mennoj reči. Materialy kollokviuma: 19-22. Vilnius: Pedagogical Institute Press.

Tamaoka, K., Makioka, Sh. (2004). Frequency of occurrence for units of phonemes, morae, and syllables appearing in a lexical corpus of a Japanese newspaper. Behavior Research Methods, Instruments, & Computers 36(3), 531-547.

Tambovcev, J.A. (1982). Empiričeskoe raspredelenie častotnosti fonem v jazyke kazymskich chanty [v kazymskom dialekte chantyjskogo jazyka]. In: Lingvostatistika i vyčislitel´naja lingvistika: 121-135. Tartu.

Tambovcev, J.A. (1983). Phonostatistical study of Komi Zyryan vowels and consonants. Finnisch-ugrische Forschungen 45, 164-167.

Tambovcev, J.A. (1983a). Empiričeskoe raspredelenie častotnosti fonem v oročskom jazyke. In: Kvantitativnaja lingvistika i stilistika: 124-125. Tartu.

Tambovcev, J.A. (1984a). Empirical distribution of the phonemes in Orokh. Typological analysis. Archiv orientální 52, 285-294.

Tambovcev, J.A. (1984b). Phoneme frequency and closeness quotient. Establishing genetic relationship degrees by phonostatistics. Ural-altaische Jahrbücher 56, 103-119.

Tambovcev, J.A. (1988a). Nekotorye fonostatističeskie charakteristiki jazyka barabinskich tatar. In: Fonetika i grammatika jazykov Sibiri: 135-139. Novosibirsk: IIFF.

Tambovcev, J.A. (1988b). Phonostatistical characteristics of different dialects of Eskimo. In: 6th Inuit studies conference. Copenhagen, October 17-20, 1988: 11-17.

Thorndike, E.L. (1948). The psychology of punctuation. American Journal of Psychology 61, 222-228.

Tobias, J.V. (1959). Relative occurrence of phonemes in American English. J. of the Acoustical Society of America 31, 631-633.

Tolnai, V. (1921). A nyelvek szépségéröl. Magyar Nyelv 17, 28-32.

Tolnai, V. (1924). Halhatatlan magyar nyelv. Magyar Nyelv 20, 50-59.

Tolnai, V. (1936). Egynéhány számadat a hangokról és betükröl. Magyar Nyelv 31, 421-425.

Toots, N. (1970). On the frequency of occurrence of the stressed vowel phonemes in present-day English. Linguistica 2, 82-111.

Trnka, B., Kanekiyo, T., Koizumi, T. (1968). A phonological analysis of present-day standard English. Alabama: University of Alabama Press.

Trubetzkoy, N.S. (1939). Zur phonologischen Statistik. Travaux du Cercle Linguistique de Prague 7, 230-241.

Tuldava, J. (1980). Eesti keele sõnavara foneetilis-grafeemilised mõõted. Acta et Commentationes Universitatis Tartuensis 518, 51-100.

Tuldava, J. (1988). Opyt kvantitativnogo analiza sistemy fonem estonskogo jazyka. Acta et Commentationes Universitatis Tartuensis 838, 120-133.

Tuldava, J. (1995). Quantitative analysis of the phonemic system of the Estonian language. In: Tuldava, J., Methods in Quantitative Linguistics, Chapter 10, 161-187. Trier: WVT.

Veenker, W. (1979a). Zur phonologischen Statistik der komipermjakischen Sprache. Finnisch-Ungarische Mitteilungen 3, 13-27.

Veenker, W. (1979b). Zur phonologischen Statistik der vogulischen Sprache. In: Gläser, Ch., Pusztay, J. (eds.), Festschrift für Wolfgang Schlachter zum 70. Geburtstag: 305-346. Wiesbaden: Harrassowitz.

Veenker, W. (1981a). Problemy fonologičeskoj statistiki chantyjskogo jazyka. In: Ubrjatova, E.I., Kim Čer Len, Kuzmina, A.I., Ryžkina, O.A. (eds.), Teoretičeskie voprosy fonetiki i grammatiki jazykov narodov: 84-96. Novosibirsk.

Veenker, W. (1981b). Zur phonologischen Statistik der mordvinischen Schriftsprachen. Ural-altaische Jahrbücher 1, 33-72.

Veenker, W. (1981c). Zur phonologischen Statistik der votjakischen Sprache. In: Bereczki, G., Molnár, J. (eds.), Lakó-Emlékkönyv – nyelvészeti tanulmányok: 196-213. Budapest.

Veenker, W. (1982a). Konfrontierende Darstellung zur phonologischen Statistik der ungarischen und finnischen Schriftsprache. Nyelvtudományi közlemények 84, 305-348a.

Veenker, W. (1982b). Zur phonologischen Statistik der syrjänischen Sprache. Etudes Finno-Ougriennes 15, 435-445.

Verglas, A. (1962). Remarques sur la relation entre rang et fréquence des lettres français. Bulletin d'information du laboratoire d'analyse lexicographique 6, 29-40.

Vértes, E. (1953). Statistische Untersuchungen über den phonetischen Aufbau der ungarischen Sprache. Acta Linguistica Academiae Scientiarum Hungaricae 3, 125-158; 411-430.

Vértes, E. (1970). Beiträge zu den typologischen Fragen des Ostjakischen. In: Dezsö, L., Hajdú, P. (eds.), Theoretical problems of typology and the Northern Eurasian languages: 135-144. Amsterdam: Grüner.

Vogt, H. (1958). Structure phonémique du géorgien. Norsk Tidsskrift for Sprogvidenskap 18, 5-90.

Wang, W.S.-Y., Crawford, J. (1960). Frequency studies of English consonants. Language and Speech 3, 131-139.

Weidert, A. (1972). Die Vokalphoneme des Khasi, III. Teil. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 25, 506-521.

Weiss, M. (1962). Über die relative Häufigkeit der Phoneme des Schwedischen. Statistical Methods in Linguistics 1, 41-55.

Whitney, W.D. (1880). On the comparative frequency of occurrence of the alphabetic elements in Sanskrit. American Oriental Society Studies 10.

Wimmer, G., Altmann, G. (1999). Thesaurus of univariate discrete probability distributions. Essen: Stamm.

Wimmer, G., Altmann, G. (2005). Unified derivation of some linguistic laws. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics – An International Handbook: 791-807. Berlin: de Gruyter.

Wioland, F. (1972). Estimation de la „fréquence” des phonèmes en français parlé. Travaux de l'Institut phonétique de Strasbourg 4, 177-204.

Wioland, F. (1974). Contribution à l'établissement de constantes en relation avec la fréquence des phonèmes en français parlé. Travaux de l'Institut phonétique de Strasbourg 6, 141-164.

Yannakoudakis, E.J., Tsomokos, I., Hutton, P.J. (1990). n-Grams and their implication to natural language understanding. Pattern Recognition 23(5), 509-528.

Yokoyama, S. (1981). Occurrence frequency data of Japanese dictionary. Bulletin of the electrotechnical laboratory 45, 395-418.

Yule, G.U. (1924). A mathematical theory of evolution, based on the conclusions of Dr. J.C. Willis, F.R.S. Philosophical Transactions of the Royal Society of London Biological Sciences 213, 21-87.

Zemanek, H. (1959). Elementare Informationstheorie. Wien/München: Oldenbourg.

Zettersten, A. (1969). A statistical study of the graphic system of present day American English. Lund: Studentlitteratur.

Žilinskienė, V.Ju. (1978). Lietuvių kalbos raidžių dažnumas publicistikos tekstuose. Kalbotyra 29, 83-95.

Zipf, G.K. (1929). Relative frequency as a determinant of phonetic change. Harvard Studies in Classical Philology 40, 1-95.

Zipf, G.K. (1935). The psycho-biology of language. Boston: Houghton Mifflin.

Zipf, G.K. (1949). Human behavior and the principle of least effort. Cambridge: Addison-Wesley.

Zörnig, P., Altmann, G. (1983). The repeat rate of phoneme frequencies and the Zipf-Mandelbrot law. Glottometrika 5, 205-211.

Zörnig, P., Altmann, G. (1984). The entropy of phoneme frequencies and the Zipf-Mandelbrot law. Glottometrika 6, 41-47.

Zwirner, E., Zwirner, K. (1936). Die Häufigkeit von Buchstaben und Lautkombinationen. Forschungen und Fortschritte 12, 23-24, 286-287.