Language and information

Lecture 1. A formal theory of syntax

1.1. Problems and methods

This is a question of method. In considering the structure of language, people usually assume the existence of meanings and of words. A scientist coming to the matter from the outside usually expects to find regularities in the sequential relations among the words, because that is the data that is given: sequences of words. However, these attempts have not yielded regularities. People who work habitually with language (linguists, if you wish) do have ways of describing language. They do it primarily by what are called grammatical relations. The grammatical relations are by and large adequate to describe language, to some approximation (we'll discuss this later on). However, they have certain difficulties.

One is that there are few if any grammatical relations that are universal, that is, common to all languages. These are things like subject and object, or predicate, and things of that sort. The situation actually is that in each language one is able to find something which one can call grammatical relations, but it isn't as though there were some fixed thing that runs across the whole of the system of language.

There are other difficulties. One of them is that grammatical relations are unique to language, that is, there is nothing else that has them. This makes it impossible to compare language with anything else, not even with systems which in one way or another are presumably close to language, like gestures on the one hand and mathematics on the other. Furthermore, the things that underlie grammatical relations (the meanings, the concept of meaning in general, what the meaning ranges of individual words are, and what the definitions of words are) all present extremely great difficulties. They are very hard to formulate, and therefore they cannot serve as the primitive elements of a general structure or theory.

We therefore go back to the beginning, to the initial question: how can one investigate the structure of language? Now, in general, when one wants to investigate a field, one discusses the field, analyzes it, in the metalanguage of the field. This is clear in mathematics and in logic, where the precise structure of the material makes it possible to recognize that the statements made about the field are not in the field. They do not have the structure of the things that are said in the field.

Mathematics and logic are described in English, in a subset of the sentences of English. But language itself, English included, has no external metalanguage, because any language in which one could describe it has to have already at its disposal words and sentences, the very things that we would be trying to describe. So there is no external metalanguage that is available for use in analyzing language. You can only analyze it with language again.

In the absence of an external metalanguage, what one can do is to rely on the non-randomness of the material, the fact that not all combinations of the entities, whatever the entities may be, are found. In fact, this is the case in all languages as far as we know. No one, certainly not I, has surveyed all languages, but in the languages we know, not all combinations of the entities (letters, phonemes, words ...) occur. There are certain combinations that are not in the language. They are not part of the language, nobody can do anything with them, they are not recognized as in the language.

Now, if we want to use this, to find out which combinations, of whatever, are in the language and which are not, and to try to find from that the regularities which would characterize language, we must have a rather precise statement of what the elements, the entities, are. This is crucial, because it isn't as though we already know certain things about a language precisely and can judge the entities, the elements, from them, by dividing them up into parts and so forth; we don't know what the thing that we are studying really is. Therefore, we have to be rather sure of the elements that we are working with, because all that we are going to be doing is making combinations of the elements, or rather, inspecting which combinations of elements constitute language and which combinations do not. The elements have to be precise.
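The idea of inspecting which combinations of elements occur and which do not can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the three-form sample, the choice of letters as the elements, the restriction to adjacent pairs, and the function names are not part of the lecture. Note also that a finite sample can only show that a combination is absent from the sample, not that it is absent from the language.

```python
from itertools import product


def attested_pairs(sequences):
    """Collect every adjacent pair of elements that occurs in the sequences."""
    pairs = set()
    for seq in sequences:
        # zip pairs each element with the one that follows it
        pairs.update(zip(seq, seq[1:]))
    return pairs


def split_combinations(sequences):
    """Split all possible ordered pairs of elements into occurring and non-occurring."""
    elements = sorted({e for seq in sequences for e in seq})
    occurring = attested_pairs(sequences)
    possible = set(product(elements, repeat=2))
    return occurring, possible - occurring


# Hypothetical sample: each string is treated as a sequence of letter elements.
sample = ["pit", "tip", "pat"]

occurring, missing = split_combinations(sample)
print(len(occurring), "of", len(occurring) + len(missing), "possible pairs occur")
# prints: 6 of 16 possible pairs occur
```

The count depends entirely on what is taken as an element: changing the inventory changes both the set of possible combinations and the set found to occur, which is why the elements have to be fixed precisely first.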

I want now to sketch, very briefly, what we know that is precise about the most elementary elements, the bottom elements that we have to work with.