Phoneme frequency

Revision as of 14:27, 11 July 2006 by Ahans (talk | contribs)

1. Problem and history

The problem is to find a rank-frequency distribution for the phonemes of a text or of a corpus. Sometimes letters or even sounds are counted instead, which is fully justified. In the same way one could count, for example, the syllables of the Japanese katakana or hiragana. The number of studies is enormous; some of them give the absolute frequencies, others merely the proportions.
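As a minimal sketch of what such a count involves, the following Python fragment ranks the symbols of a text by frequency and reports both absolute frequencies and proportions. The function name `rank_frequency`, the tie-breaking rule and the toy string are assumptions made here for illustration; they are not taken from any of the cited studies, and a real count would operate on a phonemic transcription.

```python
from collections import Counter


def rank_frequency(symbols):
    """Return (rank, symbol, absolute frequency, proportion); rank 1 is the most frequent."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Sort by descending frequency; ties are broken alphabetically (an arbitrary choice).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, sym, freq, freq / total)
            for rank, (sym, freq) in enumerate(ordered, start=1)]


# Hypothetical toy input with one character per unit; not real phoneme data.
toy_text = "alohaaloha"
table = rank_frequency(toy_text)
for rank, sym, freq, prop in table:
    print(rank, sym, freq, round(prop, 3))
```

The proportions sum to 1, so the resulting table can be treated either as a normalized distribution or as a plain ranked series.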

The counting began in the 19th century (Förstemann 1852; Bourdon 1892) and developed quickly on practical grounds, since stenographers, printers, constructors of typewriters, decoders, etc. urgently needed the frequencies of letters for their own purposes.

The first to consider phonemes from the frequency point of view and to set up hypotheses was G.K. Zipf (1929, 1935, 1949). Afterwards a great number of works appeared that used phoneme frequencies to find other interrelations. The first empirical model for a distribution, namely the geometric (and the right truncated geometric) distribution, was proposed by Sigurd (1968). Good (1969) introduced a partial-sums distribution, a kind of modelling that was later revived in word length (→) research. Altmann (1993) used the synergetic way of modelling and derived a special function for this purpose. Martindale, McKenzie, Gusein-Zade and Borodovsky (1996) compared several curves (functions) on many data sets in order to find the “best” model. Altmann and Lehfeldt (1980) and Zörnig and Altmann (1983, 1984) developed hypotheses on the entropy (→) and the repeat rate (→) of phonemes; Kubáček (1994) derived a formula for the size of the phoneme count necessary to attain reliable counts. Naranan and Balasubrahmanyan (2000) developed a theory from which different curves for phoneme frequencies are derivable.

Not all arguments holding for word frequencies are valid in this domain.

2. Hypothesis

The ranked frequencies of phonemes follow a regular probability function or a regular monotone decreasing function.

The result depends on whether one considers the ranked frequencies as a discrete distribution (normalized) or merely as a regular series approximated by a continuous function (not normalized).


3. Derivation

The formulas used up to now can be derived from different approaches.

3.1. Tuldava's approach (1988)

This approach can be represented by the simple differential equation

(1)  y' = \frac{b}{x}

to obtain

(2) y = a + b \ln x

This curve is frequently used in other domains, too (cf. also Martindale et al. 1996).
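A minimal sketch of how the parameters of (2) might be estimated: since (2) is linear in ln x, ordinary least squares can be applied to the pairs (ln x, y). The estimation method, the function name `fit_log_curve` and the synthetic values below are assumptions for illustration; the source does not state how the curve was fitted.

```python
import math


def fit_log_curve(freqs):
    """Least-squares estimates of a and b in y = a + b*ln(x) for ranks x = 1..n."""
    n = len(freqs)
    t = [math.log(x) for x in range(1, n + 1)]  # transformed ranks ln x
    t_mean = sum(t) / n
    y_mean = sum(freqs) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, freqs))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


# Synthetic series generated from (2) itself with hypothetical parameters,
# so the fit should recover them; these are not observed phoneme frequencies.
true_a, true_b = 0.25, -0.09
ys = [true_a + true_b * math.log(x) for x in range(1, 14)]
a_hat, b_hat = fit_log_curve(ys)
print(a_hat, b_hat)
```

For a decreasing rank-frequency series the estimate of b is negative, and a is the value of the curve at rank 1.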


3.2. Derivations related to the unified theory (→)

(a) Zipf's law (zeta distribution)

When formula (2) of the unified theory is used with a_0 = a_2 = a_3 = ... = 0, a_1 = -b, this yields

(3) \frac{dy}{y} = -\frac{b}{x}dx

resulting in

(4) y = Ax^{-b}.

This is, perhaps, the most widely used formula in linguistics.

(b) Yule's species/genera function

When formula (2) of the unified theory is used with a_0 = c, a_1 = -b, a_2 = a_3 = ... = 0, this yields

(5) \frac{dy}{y}= \left(c- \frac{b}{x} \right)dx

resulting in

(6) y = ae^{cx}x^{-b} = ad^x x^{-b}, \quad d = e^c.
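The step from (5) to (6) is a direct integration. The following sketch spells it out; the intermediate line is added here, with \ln a written for the integration constant and d = e^c for the substitution that gives the second form of (6).

```latex
\int \frac{dy}{y} = \int \left(c - \frac{b}{x}\right)dx
\;\Longrightarrow\;
\ln y = cx - b\ln x + \ln a
\;\Longrightarrow\;
y = a e^{cx} x^{-b} = a d^{x} x^{-b}, \qquad d = e^{c}.
```

For c = 0 the exponential factor disappears and the function reduces to the power law (4).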


(c) Naranan and Balasubrahmanyan's (1992a,b, 2000) function

When formula (2) of the unified theory is used with a_0 = 0, a_3 = a_4 = ... = 0, this yields

(7) \frac{dy}{y} = \left(-\frac{a_1}{x} + \frac{a_2}{x^2} \right)dx

resulting in

(8) y= Ce^{-a_2/x}x^{-a_1}.
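That (8) solves the differential equation leading to it can be checked numerically: the derivative of ln y must equal -a_1/x + a_2/x^2. The sketch below does this with a central difference; the function names, the step size and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def naranan(x, C, a1, a2):
    """Function (8): y = C * exp(-a2/x) * x**(-a1)."""
    return C * math.exp(-a2 / x) * x ** (-a1)


def relative_rate(x, a1, a2):
    """Relative rate of change y'/y implied by (8): -a1/x + a2/x**2."""
    return -a1 / x + a2 / x ** 2


def max_discrepancy(points, C, a1, a2, h=1e-6):
    """Largest gap between a central difference of ln y and the analytic rate."""
    worst = 0.0
    for x in points:
        numeric = (math.log(naranan(x + h, C, a1, a2))
                   - math.log(naranan(x - h, C, a1, a2))) / (2 * h)
        worst = max(worst, abs(numeric - relative_rate(x, a1, a2)))
    return worst


# Hypothetical parameter values, chosen only for illustration.
max_err = max_discrepancy([1.0, 2.5, 7.0, 13.0], C=1.0, a1=1.2, a2=0.8)
print(max_err)
```

The discrepancy is at the level of numerical noise, which confirms that the exponent -a_2/x in (8) corresponds to the term +a_2/x^2 in the differential equation.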


(d) Altmann's ranking function (1993)

Using formula (11) of the unified theory, which can be written as

(9) y_x = \left(1-a_0 +  \frac{a_1}{(x-b_1)^{c_1}} + \frac{a_2}{(x-b_2)^{c_2}} \right)y_{x-1} ,

and setting a_i = 0 (i = 0,2,3,...), c_1 = 1, yields

(10) y_x = \left(1+ \frac{a_1}{x-b_1} \right)y_{x-1}.

Upon setting b_1 = -a, a_1 - b_1 = b, this results in

(11) y_x = \frac{\begin{pmatrix} b+x \\ x-1 \end{pmatrix}}{\begin{pmatrix} a+x \\ x-1 \end{pmatrix}}y_1, \quad x = 1,2,3,...

All these formulas can be transformed into distributions by appropriate normalizing.
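The relation between the recurrence (10) and the closed form (11), and the normalizing step, can be illustrated with a short sketch. With b_1 = -a and a_1 - b_1 = b, the factor in (10) becomes (b+x)/(a+x). The helper names, the inventory size n and the parameter values are assumptions chosen for illustration, not fitted values.

```python
def gen_binom(top, k):
    """Binomial coefficient C(top, k) for real `top` and integer k >= 0."""
    result = 1.0
    for j in range(k):
        result *= (top - j) / (j + 1)
    return result


def altmann_closed(x, a, b, y1):
    """Formula (11): y_x = C(b+x, x-1) / C(a+x, x-1) * y_1."""
    return gen_binom(b + x, x - 1) / gen_binom(a + x, x - 1) * y1


def altmann_recursive(n, a, b, y1):
    """Formula (10) after the substitution: y_x = (b+x)/(a+x) * y_{x-1}."""
    ys = [y1]
    for x in range(2, n + 1):
        ys.append(ys[-1] * (b + x) / (a + x))
    return ys


def normalize(ys):
    """Divide by the sum so that the values form a distribution on ranks 1..n."""
    total = sum(ys)
    return [y / total for y in ys]


# Hypothetical parameters; b < a makes the series decreasing.
a, b, y1, n = 2.0, 0.3, 0.25, 13
recursive = altmann_recursive(n, a, b, y1)
closed = [altmann_closed(x, a, b, y1) for x in range(1, n + 1)]
probs = normalize(closed)
print(max(abs(r - c) for r, c in zip(recursive, closed)), sum(probs))
```

The same `normalize` step applies to any of the functions above when a distribution over a finite inventory of n ranks is wanted.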

(e) Geometric distribution

Sigurd (1968) simply used the geometric distribution. It can be obtained from formula (10) of the unified theory by setting a_i = 0 (i = 1,2,3,...), which yields

(12) y_{x+1}= (1+a_0)y_x\quad.

For -1 < a_0 < 0, 1 + a_0 = q, 1 - q = p, y_x = P_x one obtains the usual (1-displaced) geometric distribution

(13) P_x = pq^{x-1}, \quad x = 1,2,3,...

The same model was also proposed by Orlov, Boroda and Nadarejšvili (1982). Treating the relative frequencies directly, one can write (13) as

(14) y_x = y_1 q^{x-1}, \quad x=1,2,3,...
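A brief sketch of (13) and of its right truncated variant, which was mentioned in the history section: on a finite inventory of n ranks the geometric probabilities sum to 1 - q^n, so dividing by this quantity gives a proper distribution. The function names and the values of q and n are hypothetical.

```python
def geometric(x, q):
    """Formula (13): P_x = p * q**(x-1) with p = 1 - q, x = 1, 2, 3, ..."""
    return (1 - q) * q ** (x - 1)


def truncated_geometric(n, q):
    """Right truncated at n: P_x = p * q**(x-1) / (1 - q**n), x = 1..n."""
    norm = 1 - q ** n
    return [geometric(x, q) / norm for x in range(1, n + 1)]


q, n = 0.8, 13  # hypothetical values, not estimates from data
mass_up_to_n = sum(geometric(x, q) for x in range(1, n + 1))  # equals 1 - q**n
probs = truncated_geometric(n, q)
print(mass_up_to_n, sum(probs))
```

In both variants the ratio of neighbouring probabilities is the constant q, which is the property expressed by (12).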


3.3. Partial-sums distributions (Good 1969)

Good (1969) introduced a new distribution, which is also mentioned in Martindale et al. (1996). It is a so-called partial-sums distribution, namely a “sterred” discrete uniform distribution (cf. Wimmer, Altmann 1999). The provenance of such distributions is shown in the chapter on Word frequency (→). It has the form

(15) P_x = \frac{1}{n}\sum_{i=x}^n \frac{1}{i}, \quad x = 1,2,...,n.
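Formula (15) can be evaluated exactly with rational arithmetic, which also shows that it is a proper distribution without further normalizing. The function name `good_distribution` and the choice n = 5 are assumptions made for this small illustration.

```python
from fractions import Fraction


def good_distribution(n):
    """Formula (15): P_x = (1/n) * sum_{i=x}^{n} 1/i for x = 1..n, as exact fractions."""
    probs = []
    tail = Fraction(0)
    # Accumulate the tail sums 1/n, 1/n + 1/(n-1), ... from x = n down to x = 1.
    for x in range(n, 0, -1):
        tail += Fraction(1, x)
        probs.append(tail / n)
    probs.reverse()  # put rank 1 first
    return probs


probs = good_distribution(5)
print(probs, sum(probs))
```

Apart from the inventory size n the distribution has no free parameter, so nothing needs to be estimated from the frequencies themselves.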

Example: Frequency of phonemes in Hawaiian

Table 1 and Fig. 1 show the fit of the above formulas to the relative frequencies of Hawaiian phonemes. If functions rather than distributions are used, normalizing is not necessary.

[Table 1 (image: Tabelle1 PF.jpg)]

Except for the geometric distribution, all of them yield a good and approximately equal fit in this case. In Fig. 1, only the fit of (11) is shown.

[Figure 1 (image: Grafik1 PF.jpg)]
Fig. 1. Fitting function (11) to Hawaiian phoneme frequencies


4. Authors: G. Altmann


5. References

Alekseev, P.M. (1973). Häufigkeitswörterbücher und Verfahren ihrer Erarbeitung. In: Alexejew, P.M., Kalinin, W.M., Piotrowski, R.G. (eds.), Sprachstatistik: 86-143. München: Fink.

Altmann, G. (1993). Phoneme counts. Glottometrika 14, 55-70.

Altmann, G., Lehfeldt, W. (1980). Einführung in die quantitative Phonologie. Bochum: Brockmeyer.

Andreev, N.D. (ed.) (1965). Statistiko-kombinatornoe modelirovanie jazykov. Moskva-Leningrad: Nauka.

Andreev, N.D. (1965a). Opyt statistiko-kombinatornogo vydelenija pervogo morfologičeskogo tipa v vengerskom jazyke. In: Andreev 1965: 205-211.

Andreev, N.D. (1967). Statistiko-kombinatornye metody v teoretičeskom i prikladnom jazykoznanii. Leningrad: Nauka.

Andreeva, L.D., Kordi, E.E., Smirnova, L.N., Fedulova, N.I., Fitialova, I.B., Fichman, B.S. (1965). Polučenie pervogo morfologičeskogo tipa russkogo jazyka v pod˝jazyke radioelektroniki posredstvom algoritma statistiko-kombinatornogo modelirovanija. In: Andreev 1965: 49-64.

Avram, A. (1964). Some thoughts on the functional yield of phonemic oppositions. Linguistics 5, 40-47.

Bektaev, K.B. (1973). Alfavitno-častotnyj slovar´ slogov kazachskogo jazyka. In: Statistika kazachskogo teksta 3: 566-611. Alma-Ata: Nauka.

Belonogov, G.G., Frolov, G.D. (1963). Empiričeskie dannye o raspredelenii bukv v russkoj pis´mennoj reči. Problemy kibernetiki, Vyp. 9, 287-305.

Benkö, L., Samu, I. (1972). The Hungarian language. Budapest: Akadémiai Kiadó.

Berger, K.W. (1967). A study of printed Pilipino usage. Phonetica 17, 31-37.

Bergmann, H. (1986). Einige Ergebnisse der Phonemstatistik. Abhandlungen der Heidelberger Akademie der Wissenschaften, Philosophisch-historische Klasse 1986, 5-19.

Bhagvat, S.V. (1961). Phonemic frequencies in Marathi and their relation to devising a speed-script. Poona: Deccan College.

Boldrini, M. (1948). Le statistiche letterarie e i fonemi elementari nella poesia. Milano.

Bosák, J. (1965). Frequency of phonemes and letters in Slovak and numerical expression of some phonemic relations. Jazykovedný časopis 14, 120-130.

Bourne, C.P., Ford, D.F. (1961). A study of the statistics of letters in English words. Information and Control 4, 48-61.

Bourdon, B. (1892). L'expression des émotions et des tendances dans le langage. Paris: Alcan.

Chol´m, Ch.A. (1965). Vydelenie pervogo morfologičeskogo tipa v estonskom jazyke na osnove statistiko-kombinatornogo modelirovanija v pod˝jazyke radioelektroniki. In: Andreev (1965): 212-218.

Čistjakov, V.F. (1972). Častotnosti glasnych i soglasnych v 50 jazykach raznogo grammatičeskogo stroja. Lingua Posnaniensis 16, 45-48.

Csehély, A. (1943). A magyar magánhangzók eloszlása. Magyar Nyelv 1943, 64-65.

Deitz, P. (1952). The relative frequency of correlations and oppositions phonologiques in Modern French. Iowa: Iowa State University.

Denes, P.B. (1963). On the statistics of spoken English. J. of the Acoustical Society of America 30, 892-904.

Denes, P.B. (1964). On the statistics of spoken English. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 17, 51-72.

Dewey, G. (1923). Relative frequencies of English speech sounds. Cambridge, MA: Harvard University Press.

Dietze, J. (1982). Grapheme und Graphemkombinatorik der russischen Fachsprache. Eine Phonostatistische Untersuchung. Glottometrika 4, 80-94.

Džubanov, A.Ch. (1979). K voprosu o grafemnoj statistike kazachskogo teksta. In: Voprosy kazachskoj fonetiki i fonologii 79-86.

Eliseeva, K.A. (1965). Statistiko-kombinatornoe modelirovanie pervogo tipa v ukrainskoj morfologii. In: Andreev 1965: 85-88.

Estoup, J.B. (1916). Gammes sténographiques. Méthode et exercices pour l'acquisition de la vitesse. Paris: Institut sténographique.

Fähnrich, M., Meinold, G. (1973). Phonemstatistischer Vergleich zwischen Georgisch, Awarisch und Tschesarenisch. Wissenschaftliche Zeitschrift 22, 109-117.

Fairbanks, G.H. (1957). Frequency and phonemics. Indian Linguistics 17, 105-113.

Fant, C.G.M. (1958). Some notes on the relative occurrence of letters, phonemes, and words in Swedish. In: Proceedings of the 8th International Congress of Linguistics, Oslo 1958: 815-

Fedulova, N.I. (1965). Vydelenie pervogo morfologičeskogo tipa v bolgarskom jazyke. In: Andreev (1965): 110-115.

Ferguson, C.A., Chowdhury, M. (1960). The phonemes of Bengali. Language 36, 22-59.

Fichman, B.S. (1965a). Vydelenie pervogo morfologičeskogo tipa v jazyke chausa po algoritmu statistiko-kombinatornogo modelirovanija. In: Andreev (1965): 189-195.

Fichman, B.S. (1965b). Vydelenie pervogo morfologičeskogo tipa v jazyke suachili po algoritmu statistiko-kombinatornogo modelirovanija. In: Andreev (1965): 196-204.

Findra, J. (1968). Frekvencia foném v ústnych prejavoch. Jazykovedný časopis 19, 84-95.

Fitialova, I.B. (1965). Statistiko-kombinatornoe vydelenie pervogo morfologičeskogo tipa v nemeckom jazyke. In: Andreev (1965): 158-171.

Förstemann, E. (1852). Numerische Lautverhältnisse im Griechischen, Lateinischen und Deutschen. Zeitschrift für vergleichende Sprachforschung 1, 163-179.

Fowler, M. (1957). Herdan's statistical parameter and the frequency of English phonemes. In: Pulgram, E. (ed.), Studies presented to Joshua Whatmough on his sixtieth birthday: 47-52. 's-Gravenhage: Mouton.

Gačečiladze, T.G., Eliašvili, A.I. (1958). Statistika bukv sovremennogo literaturnogo gruzinskogo jazyka. Soobščenija Akademii nauk gruzinskoj SSR 20, 565-567.

Gaines, H.F. (1956). Cryptanalysis: A study of ciphers and their solution. New York: Dover.

Gerber, S.E., Vertin, S. (1969). Comparative frequency counts of English phonemes. Phonetica 19, 133-141.

Good, I.J. (1969). Statistics of language. In: Meetham, A.R., Hudson, R.A. (eds.), Encyclopedia of information, linguistics and control: 567-581. Oxford: Pergamon.

Grigoriev, V.I. (1980). Frequency distribution of letters and their ranks in a running text. In: Viks, Ü. (ed.), Symposium: Computational linguistics and related topics. Tallinn: Academy of Sciences 1980: 43-47.

Gusein-Zade, S.M. (1988). On the distribution of letters of the Russian language by frequencies. Problemy Peredači Informacii 23, 102-107.

Häkkinen, K. (1977). Tilastotietoja suomen kielen äännerakenteesta. Sananjalka 19, 57-68.

Harary, F., Paper, H.H. (1957). Toward a general calculus of phonemic distribution. Language 33, 143-169.

Hayden, R. (1950). The relative frequency of phonemes in general-American English. Word 6, 217-223.

Herdan, G. (1958). The relation between the functional burdening of phonemes and the frequency of occurrence. Language and Speech 1, 8-13.

Herdan, G. (1966). The advanced theory of language as choice and chance. Berlin: Springer.

Holas, A. (1926). Naskýtání a shlukování hlásek, jich interakce a kombinace ve slovenštine a přirovnání s češtinou. PTL 51, 43-49.

Holas, A. (1927). Naskýtání a shlukování hlásek, jich interakce a kombinace ve slovenštine a přirovnání s češtinou (cont.). PTL 52, 3-12.

Hultzén, L.S., Allen, J.H.D., Miron, M.S. (1964). Tables of transitional frequencies of English phonemes. Urbana, Ill.: University of Illinois Press.

Isengel´dina, A.A. (1973). Faktory, opredeljajuščie otnositel´nuju častotnost´ fonem. In: Statistika kazachskogo teksta 3, 659-662. Alma-Ata: Nauka.

Ishii, H. (1990). Otogizōshi Sagoromo no Chūjō no kana no shutsugen hindo no kansoku gosa ni tsuite. Mathematical Linguistics 17, 328-353.

Ishii, H. (1991). Kana oyobi on no shutsugen hindo no shochōsa. Mathematical Linguistics 18(2), 84-97.

Izumi, A., Mizutani, S. (1991). Tsutsui Yasutaka Zanzō ni Kuchibnei o no onbunpu hoi. Mathematical Linguistics 18, 80-83.

Jakubajtis, T.A. (1965). Statistiko-kombinatornoe vydelenie pervogo morfologičeskogo tipa v latyšskom jazyke. In: Andreev 1965: 116-122.

Jakuševa, D.A. (1965). Opyt primenenija algoritma statistiko-kombinatornogo modelirovanija k vétnamskomu jazyku. In: Andreev 1965: 225-228.

Jékel, P., Papp, F. (1974). Ady Endre összes költöi müveinek fonémstatisztikája. Budapest: Akadémiai Kiadó.

Job, M. (1974). Untersuchungen zur Frequenz der Phoneme des Georgischen. Bochum: Seminararbeit.

Kálmán, B. (1972). Hungarian historical phonology. In: Benkö, L., Imre, S. (eds.), The Hungarian language: 49-83. The Hague: Mouton.

King, R.D. (1966). On preferred phonemicisation for statistical studies. Phoneme frequencies in German. Phonetica 15, 22-31.

Kordi, E.E. (1965). Ischodnye dannye dlja statistiko-kombinatornogo modelirovanija morfologii sovremennogo franzuzskogo jazyka i vydelenie pervogo morfologičeskogo tipa. In: Andreev 1965: 172-180.

Kosonovskij, A.I. (1968). Nekotorye predvaritel´nye dannye o častotnosti grafem i fonem sovremennogo literaturnogo jazyka chindi. In: Jazyki Indii, Pakistana, Nepala i Cejlona: 167-181. Moskva: Nauka.

Krámský, J. (1965). Some statistical observations on the role of the place of articulation in languages. Philologica Pragensia 8, 245-250.

Krámský, J. (1966). The frequency of occurrence of vowel phonemes in languages possessing vowel systems of identical structure. Prague Studies in Mathematical Linguistics 1, 17-32.

Kubáček, L. (1994). Confidence limits for proportions of linguistics entities. J. of Quantitative Linguistics 1, 56-61.

Kučera, H. (1963). Mechanical phonemic transcription and phoneme count in Czech. International Journal of Slavic Linguistics and Poetics 6, 36-50.

Kučera, H. (1963a). Entropy, redundancy and functional load in Russian and Czech. In: American Contributions to the Fifth International Congress of Slavists, Vol. I, 191-219. The Hague: Mouton.

Kučera, H., Monroe, G.K. (1968). A comparative quantitative phonology of Russian, Czech, and German. New York: American Elsevier.

Kullback, S. (1976). Statistical methods in cryptanalysis. Laguna Hills, CA: Aegean Park Press.

Kuzina, V. (1977). Statistika bukv v tekstach raznych tipov sovremennogo latyšskogo jazyka. In: Statistika un valodas funkcionālie stili: 97-106. Riga: Zinatne.

Lotz, J. (1952). Vowel frequency in Hungarian. Word 8, 227-235.

Łobacz, P., Jassem, W. (1974). Fonotaktyczna analiza mówionego tekstu polskiego. Biuletyn polskiego towarzystwa językoznawczego 32, 179-197.

Ludvíková, M., Königová, M. (1967). Quantitative research of graphemes and phonemes in Czech. Prague Bulletin of Mathematical Linguistics 7, 15-29.

Macrea, D. (1941/43). Frecvenţa fonemelor în limba română. Dacoromania 10, 39-49.

Maneca, C. (1968). Considérations statistiques sur les finales vocaliques en roumain. Revue Roumaine de Linguistique 13, 61-71.

Martindale, C., McKenzie, D., Gusein-Zade, S.M., Borodovsky, M.Y. (1996). Comparison of equations describing the frequency distribution of graphemes and phonemes. J. of Quantitative Linguistics 3, 106-112.

Melkumjan, M.R. (1965). Ischodnye dannye i statistiko-kombinatornoe vydelenie paradigmy pervogo morfologičeskogo tipa v armjanskom jazyke. In: Andreev 1965: 123-136.

Messner, D. (1974). Der portugiesische Anteil am Dictionnaire chronologique des langues ibéro-romanes. Portugiesische Forschungen der Görres-Gesellschaft 1974, 108-138.

Messner, D. (1976). A statistical approach to Portuguese. In: Schmidt-Radefeldt, J. (ed.), Readings in Portuguese Linguistics: 425-446. Leiden: North-Holland.

Moīnfar, M.D. (1973). Phonologie quantitative du Persan. Paris: Editions Jean-Favard.

Moreau, R. (1961). Au sujet de l'utilisation de la notion de fréquence en linguistique. Cahiers de lexicologie 3, 140-159.

Mossner, F. (1967). J-förmige Häufigkeitsverteilung chinesischer Schriftzeichen in chinesischen Texten. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 20, 479-488.

Nagórko-Kufel, A. (1975). Z badań nad częstościami elementów tekstowych języka polskiego dla potrzeb pisma niewidomych. Poradnik językowy 1, 7-13.

Nagy, G.O. (1969). Vokalfrequenzwerte in altungarischen Textdenkmälern. Ural-altaische Jahrbücher 41, 146-154.

Naranan, S., Balasubrahmanyan, V.K. (1992a). Information theoretic models in statistical linguistics - Part I: A model for word frequencies. Current Science 63, 261-269.

Naranan, S., Balasubrahmanyan, V.K. (1992b) Information theoretic models in statistical linguistics - Part II: Word frequencies and hierarchical structure in language - statistical tests. Current Science 63, 297-306.

Naranan, S., Balasubrahmanyan, V.K. (1993). Information theoretic model for frequency distribution of words and speech sounds (phonemes) in language. J. of Scientific and Industrial Research 52, 728-738.

Naranan, S., Balasubrahmanyan, V.K. (2000). Information theory and algorithmic complexity: Applications to linguistic discourses and DNA sequences as complex systems. Journal of Quantitative Linguistics 7, 129-183.

Navarro, T. (1968). Studies in Spanish phonology. Miami Linguistic Series 4, 1-160.

Nemetz, T., Szilléry, A. (1979). Nyelvstatisztikai táblázatok. Alkalmazot Matematikai Lapok 5, 69-87.

Newman, E.B. (1951). The pattern of vowels and consonants in various languages. American Journal of Psychology 64, 369-379.

Nikonov, V.A. (1960). Konsonantnyj koefficient. Lingua Posnaniensis 8, 228-235.

Nisbet, J.D. (1960). Frequency counts and their uses. Educational Research 1960, 51-64.

Noreen, A. (1907). Vårt språk. Nysvensk grammatik i utförlig framställning. Lund: Gleerup.

Novak, L.A. (1971). Statistica delle lettere e delle combinazioni di lettere nella lingua rumena scritta. In: Tagliavini, C. (ed.), Statistica linguistica: 291-322. Bologna: Patron.

Novak, L.A. (1968). Statistika bukv i bukvosočetanij v rumynskom pis´mennom jazyke. In: Alekseev, P.M., Kalinin, V.M., Piotrovskij, R.G. (eds.), Statistika reči: 228-230. Leningrad: Nauka.

Ohlmann, N. (1958). Subject-word letter frequencies with applications to superimposed coding. In: Proceedings of the International Conference of Scientific Information 2: 903-915. Washington.

Orlov, Ju.V., Boroda, M.G., Nadarejšvili, I.Š. (1982). Sprache, Text, Kunst. Quantitative Analysen. Bochum: Brockmeyer.

Ovčinnikov, A. (1962). Polučenie paradigmy pervogo morfologičeskogo tipa v nemeckom jazyke na materiale publicističeskich tekstov. Dnepropetrovsk: DGU.

Ožigova, G.I. (1965). Statistiko-kombinatornoe modelirovanie paradigmy pervogo morfologičeskogo tipa v češskom jazyke na materiale publicističeskich tekstov. In: Andreev 1965: 96-103.

Pääkkönen, M. (1993). Graphemes and context. Glottometrika 14, 1-53.

Pandit, P.B. (1965). Phonemic and morphemic frequencies of the Gujarati language. Poona: Deccan College.

Panina, N.A. (1965). Opyt statistiko-kombinatornogo vydelenija paradigmy vtorogo morfologičeskogo tipa v serbochorvatskom jazyke. In: Andreev (1965): 241-245.

Penkov, V. et al. (1962). Frequencies of letters in written Bulgarian. Comptes rendus de l’Académie bulgare des Sciences, 15, No. 3.

Perebejnos, V.I. (1965). Častota i sočetaemost´ fonem sovremennogo ukrainskogo jazyka. In: Seminar – Avtomatizacija informacionnych rabot i voprosy prikladnoj lingvistiki: 25-30. Kiev.

Perebyjnis, V.S. (1970). Kil´kisni ta jakisni charakteristiki sistemi fonem sučasnoï ukraïnskoï literaturnoï movi. Kiïv: Naukova dumka.

Peršikov, V.F. (1965). Iz opyta statistiko-kombinatornogo modelirovanija albanskoj morfologii. In: Andreev (1965): 181-188.

Petrowa, N.W. (1973). Code-Merkmale des schriftlichen Textes. In: Alexejew, P.M., Kalinin, W.M., Piotrowski, R.G. (eds.), Sprachstatistik: 20-70. München: Fink.

Pierce, J.E. (1957). A statistical study of consonants in New World languages (I) Introduction, (II) Data. International Journal of American Linguistics 23, 36-45, 94-108.

Piirainen, I.T. (1971). Grapheme als quantitative Größen. Linguistische Berichte 13, 81-82.

Plath, W. J. (1958). The relation frequency of English consonantal phonemes. Zeitschrift für Phonetik und allgemeine Sprachwissenschaft 11, 67-87.

Pukui, H.K., Elbert, S.H. (1957). Hawaiian-English dictionary. Honolulu: University of Hawaii Press.

Rachmanov, D.A.O. (1988). Statistiko-distributivnyj analiz azerbajdžanskogo teksta na urovne grafem i fonem. Baku: Diss.

Rademacher, A. (1974). Untersuchungen zu den Buchstabenhäufigkeiten des See-Dajakischen. Bochum: Seminararbeit.

Ramakrishna, B.S., Nair, K.K., Chipllunkar, V.N., Atal, B.S., Ramachandran, V., Subramanian, R. (1962). Some aspects of the relative efficiencies of Indian languages. Bangalore.

Roberts, A.H. (1965). A statistical linguistic analysis of American English. The Hague: Mouton.

Roceric-Alexandrescu, A. (1968). Fono-statistica limbii române. Bucureşti: Editura Academiei RSR.

Rocławski, B. (1975). Ze studiów fonostatystycznych nad kaszubszczyną. Rozklad częstości występowania fonemów. Gdańskie Studia Językoznawcze Zakład Narodowy Im. Ossolińskich 1975, 107-130.

Rūłe, V. (1951). Lauthäufigkeit in der lettischen Schriftsprache. In: Slaviska instituts vid Lunds universitetet årsbok 1948/1949: 153-164. Lund.

Sadler, V. (1959). Relativaj oftecoj de kelkaj lingvaj elementoj en esperanto. Scienca revuo 10, 67-71.

Sauvageot, A. (1951). Esquisse de la langue hongroise. Les langues et leur structure III. Paris: Klincksieck.

Savický, N.P. (1966). Ob ustojčivosti otnositel´nych častot lingvističeskich elementov. Československá rusistika 11, 214-217.

Schönpflug, W. (1969). n-Gramm-Häufigkeiten in der deutschen Sprache. 1. Monogramme und Digramme. Zeitschrift für experimentelle und angewandte Psychologie 16, 157-183.

Schulze, E. (1974). Untersuchungen zu den Buchstabenhäufigkeiten des Hawaiischen. Bochum: Seminararbeit.

Segal, D.M. (1969). K statističeskoj charakteristike pol´skogo jazyka na fonologičeskom urovne. In: Issledovanija po pol´skomu jazyku: 20-52. Moskva: Nauka.

Seiden, W. (1960). Chamorro phonemes. Anthropological Linguistics 2, 6-35.

Seljutina, T.A. (1965). Vydelenie pervogo morfologičeskogo tipa v anglijskom jazyke metodom statistiko-kombinatornogo modelirovanija (na materiale chudožestvennogo teksta). In: Andreev (1965): 150-157.

Sharma, S., Debnath, S. (1972). A comparative study of Teutonic languages. Calcutta.

Sievers, E. (1892). Tatian, lateinisch und altdeutsch mit ausführlichem Glossar. Paderborn.

Sigurd, B. (1968). Rank-frequency distribution for phonemes. Phonetica 18, 1-15.

Siméonoff, E. (1965). In the distribution of “costs” of combinations of k letters in a written language. Statistical Methods in Linguistics 4, 45-50.

Singhal, R., Toussaint, G.T. (1978). Probabilities of occurrence of characters, character-pairs, and character triplets in English text. ALLC Bulletin 6, 245-253.

Siromoney, G. (1963). Entropy of Tamil prose. Information and Control 6, 297-300.

Solso, R.L., King, J.F. (1976). Frequency and versatility of letters in the English language. Behavior Research Methods and Instrumentation 8, 283-286.

Steffen, M. (1957). Częstość występowania głosek polskich. Biuletyn polskiego towarzystwa językoznawczego 16, 145-164.

Stolze, F. (1891). Die Iterationsverhältnisse der Laute in der lateinischen Sprache für die Kurzschrift. Magazin für Stenographie. 1891, 47-48.

Tambovcev, J.A. (1982). Empiričeskoe raspredelenie častotnosti fonem v jazyke kazymskich chanty [v kazymskom dialekte chantyjskogo jazyka]. In: Lingvostatistika i vyčislitel´naja lingvistika: 121-135. Tartu.

Tambovcev, J.A. (1983). Phonostatistical study of Komi Zyryan vowels and consonants. Finnisch-ugrische Forschungen 45, 164-167.

Tambovcev, J.A. (1983). Empiričeskoe raspredelenie častotnosti fonem v oročskom jazyke. In: Kvantitativnaja lingvistika i stilistika: 124-125. Tartu.

Tambovcev, J.A. (1984a). Empirical distribution of the phonemes in Orokh. Typological analysis. Archiv orientální 52, 285-294.

Tambovcev, J.A. (1984b). Phoneme frequency and closeness quotient. Establishing genetic relationship degrees by phonostatistics. Ural-altaische Jahrbücher 56, 103-119.

Tambovcev, J.A. (1988a). Nekotorye fonostatističeskie charakteristiki jazyka barabinskich tatar. In: Fonetika i grammatika jazykov Sibiri: 135-139. Novosibirsk: IIFF.

Tambovcev, J.A. (1988b). Phonostatistical characteristics of different dialects of Eskimo. In: 6th Inuit studies conference. Copenhagen, October 17-20, 1988: 11-17.

Thorndike, E.L. (1948). The psychology of punctuation. American Journal of Psychology 61, 222-228.

Tobias, J.V. (1959). Relative occurrence of phonemes in American English. J. of the Acoustical Society of America 31, 631-633.

Tolnai, V. (1921). A nyelvek szépségéröl. Magyar Nyelv 17, 28-32.

Tolnai, V. (1924). Halhatatlan magyar nyelv. Magyar Nyelv 20, 50-59.

Tolnai, V. (1936). Egynéhány számadat a hangokról és betükröl. Magyar Nyelv 31, 421-425.

Toots, N. (1970). On the frequency of occurrence of the stressed vowel phonemes in present-day English. Linguistica 2, 82-111.

Trnka, B., Kanekiyo, T., Koizumi, T. (1968). A phonological analysis of present-day standard English. Alabama: University of Alabama Press.

Trubetzkoy, N.S. (1939). Zur phonologischen Statistik. Travaux du Cercle Linguistique de Prague 7, 230-241.

Tuldava, J. (1980). Eesti keele sõnavara foneetilis-grafeemilised mõõted. Acta et Commentationes Universitatis Tartuensis 518, 51-100.

Tuldava, J. (1988). Opyt kvantitativnogo analiza sistemy fonem estonskogo jazyka. Acta et Commentationes Universitatis Tartuensis 838, 120-133.

Tuldava, J. (1995). Quantitative analysis of the phonemic system of the Estonian language. In: Tuldava, J., Methods in Quantitative Linguistics, Chapter 10, 161-187. Trier: WVT.

Veenker, W. (1979a). Zur phonologischen Statistik der komipermjakischen Sprache. Finnisch-Ungarische Mitteilungen 3, 13-27.

Veenker, W. (1979b). Zur phonologischen Statistik der vogulischen Sprache. In: Gläser, Ch., Pusztay, J. (eds.), Festschrift für Wolfgang Schlachter zum 70. Geburtstag: 305-346. Wiesbaden: Harrassowitz.

Veenker, W. (1981a). Problemy fonologičeskoj statistiki chantyjskogo jazyka. In: Ubrjatova, E.I., Kim Čer Len, Kuzmina, A.I., Ryžkina, O.A. (eds.), Teoretičeskie voprosy fonetiki i grammatiki jazykov narodov: 84-96. Novosibirsk.

Veenker, W. (1981b). Zur phonologischen Statistik der mordvinischen Schriftsprachen. Ural-altaische Jahrbücher 1, 33-72.

Veenker, W. (1981c). Zur phonologischen Statistik der votjakischen Sprache. In: Bereczki, G., Molnár, J. (eds.), Lakó-Emlékkönyv – nyelvészeti tanulmányok 196-213. Budapest.

Veenker, W. (1982a). Konfrontierende Darstellung zur phonologischen Statistik der ungarischen und finnischen Schriftsprache. Nyelvtudományi közlemények 84, 305-348a.

Veenker, W. (1982b). Zur phonologischen Statistik der syrjänischen Sprache. Etudes Finno-Ougriennes 15, 435-445.

Verglas, A. (1962). Remarques sur la relation entre rang et fréquence des lettres français. Bulletin d'information du laboratoire d'analyse lexicographique 6, 29-40.

Vértes, E. (1953). Statistische Untersuchungen über den phonetischen Aufbau der ungarischen Sprache. Acta Linguistica Academiae Scientiarum Hungaricae 3, 125-158; 411-430.

Vértes, E. (1970). Beiträge zu den typologischen Fragen des Ostjakischen. In: Dezsö, L., Hajdú, P (eds.), Theoretical problems of typology and the Northern Eurasian languages: 135-144. Amsterdam: Grüner.

Vogt, H. (1958). Structure phonémique du géorgien. Norsk Tidskrift for Sprogvidenskap 18, 5-90.

Wang, W.S.-Y., Crawford, J. (1960). Frequency studies of English consonants. Language and Speech 3, 131-139.

Weidert, A. (1972). Die Vokalphoneme des Khasi, III. Teil. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 25, 506-521.

Weiss, M. (1962). Über die relative Häufigkeit der Phoneme des Schwedischen. Statistical methods in Linguistics 1, 41-55.

Whitney, W.D. (1880). On the comparative frequency of occurrence of the alphabetic elements in Sanskrit. American Oriental Society Studies 10.

Wimmer, G., Altmann, G. (1999). Thesaurus of univariate discrete probability distributions. Essen: Stamm.

Wioland, F. (1972). Estimation de la „fréquence” des phonèmes en français parlé. Travaux de l'Institut phonétique de Strasbourg 4, 177-204.

Wioland, F. (1974). Contribution à l'établissement de constantes en relation avec la fréquence des phonèmes en français parlé. Travaux de l'Institut phonétique de Strasbourg 6, 141-164.

Yokoyama, S. (1981). Occurrence frequency data of Japanese dictionary. Bulletin of the electrotechnical laboratory 45, 395-418.

Zettersten, A. (1969). A statistical study of the graphic system of present day American English. Lund: Studentlitteratur.

Žilinskienė, V.Ju. (1978). Lietuviũ kalbos raidžiũ dažnumas publicistikos tekstuose. Kalbotyra 29, 83-95.

Zipf, G.K. (1929). Relative frequency as a determinant of phonetic change. Harvard Studies in Classical Philology 40, 1-95.

Zipf, G.K. (1935). The psycho-biology of language. Boston: Houghton Mifflin.

Zipf, G.K. (1949). Human behavior and the principle of least effort. Cambridge: Addison-Wesley.

Zörnig, P., Altmann, G. (1983). The repeat rate of phoneme frequencies and the Zipf-Mandelbrot law. Glottometrika 5, 205-211.

Zörnig, P., Altmann, G. (1984). The entropy of phoneme frequencies and the Zipf-Mandelbrot law. Glottometrika 6, 41-47.

Zwirner, E., Zwirner, K. (1936). Die Häufigkeit von Buchstaben und Lautkombinationen. Forschungen und Fortschritte 12, 23-24, 286-287.