Aims and methods of quantitative linguistics
Overview
While the formal branches of linguistics use only qualitative mathematical means (algebra, set theory) and logic to model the structural properties of language, quantitative linguistics (QL) studies the many quantitative properties that are essential for describing and understanding how linguistic systems and their components develop and function. The objects of QL research therefore do not differ from those of other linguistic and textological disciplines, nor is there a difference in principle in epistemological interest. The difference lies rather in the ontological point of view (do we consider a language to be a set of sentences with structures assigned to them, or a system subject to evolutionary processes analogous to those of biological organisms, etc.) and, consequently, in the concepts that form the basis of each discipline.
Differences of this kind shape a researcher's ability to perceive (or fail to perceive) elements, phenomena, or properties in their area of study. A linguist accustomed to thinking in terms of quantities, probabilities and trends is more likely to find the study of properties such as length, frequency, age or degree of polysemy interesting and necessary than a researcher who thinks in terms of set theory and algebra. There is, however, an immense number of properties and processes in language that can be detected and analysed only with quantitative methods on the basis of quantitative concepts: features and interrelations that can be expressed only by numbers or rankings.
There are also interrelations among these features that play a central role in the development of language(s), because their consequences shape the structures and properties we can observe in language and text. Examples include the dependence of the length (or complexity) of syntactic constructions on their frequency and their ambiguity, of the homonymy of grammatical morphemes on their dispersion within their paradigm, of the length of expressions on their age, of the dynamics of information flow in a text on the text's size, and of the probability that a sound changes on its articulatory difficulty. In short, phenomena of this kind are pervasive in every field and on every level of linguistic analysis (lexicon, phonology, morphology, syntax, text structure, semantics, pragmatics, dialectology, language change, psycho- and sociolinguistics), in prose as well as in lyric poetry. They are observed in every language of the world and at all times.
Moreover, it can be shown that these properties of linguistic elements and their interrelations abide by universal laws, which can be formulated in a strictly mathematical way, analogous to the laws of the natural sciences. It must be emphasised that these laws are stochastic: they do not capture single cases (this would be neither expected nor possible); rather, they predict the probabilities of certain events or conditions for the population as a whole. It is easy to find counter-examples to any of the examples cited above, but this does not mean that such cases contradict the corresponding laws. Divergences from a statistical average are not only admissible but even necessary, and the divergences themselves are determined with quantitative exactness. In principle, this situation is no different from that in the natural sciences, where the old deterministic ideas were abandoned long ago and replaced by modern statistical and probabilistic models.
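To make the stochastic character of such tendencies concrete, the following toy Python sketch examines a word-level analogue of the length-frequency dependence mentioned above: do more frequent word types tend to be shorter? The sample text, the function names (`type_stats`, `length_by_frequency`, `pearson`, `counterexamples`) and the use of a single Pearson correlation coefficient are illustrative assumptions, not a standard QL procedure; an actual study would use a large corpus and fit a law to the data rather than report one coefficient.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy sample (an assumption for illustration only);
# "words" here are simply whitespace-separated tokens.
SAMPLE = """the cat sat on the mat and the dog lay by the door
it was a quiet evening in the small house and the cat was content
the dog was asleep and the wind was blowing outside the window"""


def type_stats(text):
    """Return (word, frequency, length in letters) for every word type."""
    counts = Counter(text.split())
    return [(w, f, len(w)) for w, f in counts.items()]


def length_by_frequency(stats):
    """Map each frequency class to the mean length of its word types."""
    classes = {}
    for _, freq, length in stats:
        classes.setdefault(freq, []).append(length)
    return {freq: mean(lengths) for freq, lengths in sorted(classes.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def counterexamples(stats):
    """Word types that are rarer than the average type yet also shorter,
    i.e. individual cases running against the overall tendency."""
    mean_f = mean(f for _, f, _ in stats)
    mean_l = mean(l for _, _, l in stats)
    return sorted(w for w, f, l in stats if f < mean_f and l < mean_l)


stats = type_stats(SAMPLE)
freqs = [f for _, f, _ in stats]
lengths = [l for _, _, l in stats]
print(length_by_frequency(stats))
print(round(pearson(freqs, lengths), 2))
print(counterexamples(stats))
```

On this toy sample the correlation between frequency and length is weakly negative, while `counterexamples` still lists short but rare words such as "a" and "on". As the paragraph above stresses, such individual cases do not refute a stochastic tendency; whether a tendency holds can only be judged on large samples.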
The task of QL, then, is to uncover such phenomena, to describe them systematically, and to find and formulate laws that explain the observed and described facts. Quantitative interrelations are of great value for fundamental research, but they can also be applied in many fields such as computational linguistics and natural language processing, language teaching, and the optimisation of texts.
Early modern linguistics, in the period after de Saussure's seminal contribution, was mainly interested in the structure of language. Consequently, linguists adopted the qualitative branches of mathematics: logic, algebra and set theory. The historical development of linguistics, and the one-sided emphasis subsequently placed on certain aspects of the structuralist achievements, gave rise to an absolutely static concept of system, which has prevailed to this day. The aspects of systems that go beyond structure, viz. functions, dynamics and processes, were almost completely disregarded. To overcome this shortcoming, the quantitative branches of mathematics (e.g., analysis, probability theory and statistics, function theory, differential and difference equations) must be added to the qualitative ones, and this is precisely the aim of QL.
Last but not least, important applications in language and text technology, computational linguistics etc. have adopted quantitative methods because purely qualitative means failed in practice. Today, most working systems in these fields apply QL techniques, which are therefore attracting increasing interest among teachers and students as well.
References
Köhler, Reinhard: Gegenstand und Arbeitsweise der quantitativen Linguistik. In: Reinhard Köhler, Gabriel Altmann and Rajmund G. Piotrowski [eds.]: Quantitative Linguistik. Ein internationales Handbuch / Quantitative Linguistics. An International Handbook (= HSK 27). Berlin, New York: de Gruyter, pp. 1–15.
Köhler, Reinhard; Altmann, Gabriel: Quantitative Linguistics. In: Patrick Colm Hogan [ed.]: The Cambridge Encyclopedia of the Language Sciences. (to appear).