Language and information

Lecture 2: Information

2.4. Science languages


At this point, we can explain "why symbols?" The symbols are important, and not just as a matter of convenience. They enable us to avoid synonyms and phrase composition, that is, the whole question of the internal structure of phrases which are members of a single word-class. The subscripts are convenient for sub-classification. And, very important, the symbols enable us to be free of those grammatical requirements of the language which are irrelevant to the science. There are some sciences to which quantity is irrelevant, so plural is troublesome. There are some sciences to which time is irrelevant, so tense is troublesome. If you are speaking English, you have to carry these things, and this muddies the representation of the information. In the symbols, this material is avoided.

We can now move from science sublanguages to science languages. These are, then, systems of information formulas of the kind just proposed. They are independent of the original language: the representative symbols are regarded as independent of the language in which the articles were written. In fact, when French articles on the same problem were analyzed, they yielded the same formulaic representation. It did not matter that in one case the articles were in French and in the other in English.
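A minimal sketch of how symbols strip away number, tense, synonymy, and source language, assuming a small hand-built lexicon (hypothetical; these are not the word lists of the actual analysis). Every word form, whether an inflected English form or a French equivalent, maps to one class symbol, and words with no entry (articles, prepositions) are skipped. Only the formula AVC comes from the lecture.

```python
# Hypothetical lexicon: word form -> class symbol.
# A = antibody class, C = cell class, V = verb class relating A and C.
# Plural and past-tense forms and French words map to the same symbol,
# so number, tense, and source language are not carried into the formula.
LEXICON = {
    "antibody": "A", "antibodies": "A", "anticorps": "A",
    "lymphocyte": "C", "lymphocytes": "C",
    "appears": "V", "appeared": "V", "appear": "V", "apparaît": "V",
}

def to_symbols(words):
    """Map a sequence of words to class symbols, skipping words
    (articles, prepositions) that have no entry in the lexicon."""
    return "".join(LEXICON[w] for w in words if w in LEXICON)

english = to_symbols(["antibodies", "appeared", "in", "lymphocytes"])
french = to_symbols(["l'", "anticorps", "apparaît", "dans", "les", "lymphocytes"])
print(english, french)  # AVC AVC
```

The design choice is deliberately crude: a lookup table cannot handle phrase composition, which is exactly the part a real sublanguage analysis must do before symbols can be assigned.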

Science languages therefore constitute a new kind of system. The main feature a science language shares with natural languages is the partial ordering that creates predication and sentencehood. But mathematics has this too. Science languages also retain from natural languages the relative clause construction, which yields modifiers (adjectives and adverbs) that mathematical notation, at least, lacks. Somewhat like mathematics, science languages do not have likelihood gradings of words with respect to their particular operators, because they have only subclasses. But they do use the subclasses with respect to operator and argument, which mathematics does not. And perhaps even more than mathematics, science languages avoid the reductions, and therefore the paraphrases, which are characteristic of language. For example, both "antibody appears in lymphocytes" and "lymphocytes contain antibody" are formally projected onto AVC. It is the same formula and the same information. The fact that the relation has been turned around by different, synonymous verbs doesn't matter to the science.
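The paraphrase point can be sketched in a few lines of Python. In this hypothetical fragment each verb is tagged with its class symbol and with a flag saying whether it takes its arguments in converse order (contain puts the cell first, appears in puts the antibody first); the projection swaps the arguments back, so both sentences land on the single formula AVC. The verb table and the subject-verb-object input format are assumptions for illustration only.

```python
from typing import NamedTuple

class VerbEntry(NamedTuple):
    symbol: str     # operator class symbol
    converse: bool  # True if the verb takes (cell, antibody) rather than (antibody, cell)

# Hypothetical tables; only the target formula AVC comes from the lecture.
NOUNS = {"antibody": "A", "lymphocytes": "C"}
VERBS = {
    "appears in": VerbEntry("V", converse=False),
    "contain": VerbEntry("V", converse=True),
}

def project(subject: str, verb: str, obj: str) -> str:
    """Project a subject-verb-object sentence onto its class formula,
    undoing the argument reversal of converse verbs."""
    entry = VERBS[verb]
    first, second = NOUNS[subject], NOUNS[obj]
    if entry.converse:
        first, second = second, first
    return first + entry.symbol + second

s1 = project("antibody", "appears in", "lymphocytes")
s2 = project("lymphocytes", "contain", "antibody")
print(s1, s2, s1 == s2)  # AVC AVC True
```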

I want to make one additional point about the sentences of the science language. These canonical formulas, which give the typical sentence-types of the science, do not have to be elementary sentences. In language, they have to be the elementary sentences. In a science, that is not necessarily the case. I'll give you an example from a published report on pharmacology texts, where a typical structure in pharmacology articles is exemplified by the sentence "digitalis affects the beating of the heart." In this sentence there is an elementary sentence, "the heart beats." It is acted on by certain operators such as affect, whose first argument is usually a drug name, usually digitalis, often with a modifier specifying the dosage. Now, this is a sentence of pharmacology; it is in fact an elementary sentence-structure for pharmacology: nothing smaller than this structure is a pharmacology sentence. "The heart beats" is a sentence, but it is not a pharmacology sentence. For pharmacology, this is an elementary sentence-type, and apparently a very common one in those articles. A canonical formula may also contain a pair, or some other sequence, of elementary sentences. It may be that the pairing by a certain conjunction, or something of this sort, is crucial to the science. A semi-crucial case in immunology is the conjunction thereafter, which I mentioned, relating the response of the tissue to the injection.
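One way to picture a canonical formula that is not an elementary sentence is as a nested structure: an operator such as affect takes a drug name as its first argument and a whole elementary sentence as its second. The Python sketch below is a hypothetical encoding, not the representation used in the pharmacology report; the class names and word lists are assumptions. It only checks the point made above, that "the heart beats" alone is not a pharmacology sentence while "digitalis affects the beating of the heart" is.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Elementary:
    """An elementary sentence of the language, e.g. 'the heart beats'."""
    subject: str
    verb: str

@dataclass(frozen=True)
class Operator:
    """An operator acting on a drug name and an elementary sentence."""
    name: str
    drug: str
    argument: Elementary
    dosage: Optional[str] = None  # optional modifier on the drug name

# Hypothetical word lists for the sketch.
DRUGS = {"digitalis"}
PHARM_OPERATORS = {"affect"}

def is_pharmacology_sentence(s: Union[Elementary, Operator]) -> bool:
    """Under these assumptions, the smallest pharmacology sentence is a
    drug operator applied to an elementary sentence; a bare elementary
    sentence does not qualify."""
    return (isinstance(s, Operator)
            and s.name in PHARM_OPERATORS
            and s.drug in DRUGS)

heart_beats = Elementary("heart", "beats")
digitalis_effect = Operator("affect", "digitalis", heart_beats)
print(is_pharmacology_sentence(heart_beats))       # False
print(is_pharmacology_sentence(digitalis_effect))  # True
```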

Now, there is a further structure, on which research is still at a very early stage; I would like to mention it, though it is of course still very indefinite. This is the dependence among formulas, and above all their sequencing. Consider the typical division of laboratory articles into Procedures, Results, and Discussion. These divisions are not merely habitual or traditional; on the whole they are very good and useful, and grammatically characterizable. If we look at the sentences in the Procedures section and those in the Results section, we find that in the Procedures section the formulas identify the particular antigens being dealt with, the number, times, and conditions of injection, the precise location of injection, which animal, which part of the animal, and so forth, and a few other things. The Results section has a different sentence-type, for example the one we have seen: "antigen is injected, thereafter antibody appears in cell." But certain words are shared by the two sections. And what is more important than the mere sharing of words is that the Result sentences are seen to be partially dependent on the Procedure sentences. I'll put it only in a simple way and not say more on this now: one can show that certain kinds of Result sentences could not appear in a given article if the Procedure sentences were different. This is not really surprising, of course, since the results are connected to what the procedures were; but the point is that we can show it, and show it in a precise way.
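A toy version of this dependence, assuming (hypothetically) that each Procedure formula records which antigen was injected into which animal, and that each Result formula names the antigen whose antibody response it reports. The check flags Result formulas whose antigen never occurs in the article's Procedure formulas. It is a simplification for illustration, not the precise criterion the lecture refers to, and the field names and values are placeholders.

```python
from typing import NamedTuple

class Procedure(NamedTuple):
    antigen: str  # subscripted antigen symbol, e.g. "G1" (hypothetical)
    animal: str

class Result(NamedTuple):
    antigen: str  # the antigen whose injection the result refers to
    antibody: str
    cell: str

def unsupported_results(procedures, results):
    """Return the Result formulas that could not appear given these
    Procedures, i.e. those referring to an antigen no Procedure introduced."""
    injected = {p.antigen for p in procedures}
    return [r for r in results if r.antigen not in injected]

procs = [Procedure("G1", "rabbit"), Procedure("G2", "rabbit")]
res = [Result("G1", "A1", "lymphocyte"), Result("G3", "A3", "plasma cell")]
print(unsupported_results(procs, res))  # flags only the G3 result
```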

Now, a much more important situation is found if we compare the Discussion section with the Results section. Whereas the Result formulas are largely different from the Procedure ones, as we just saw, the Discussion section for the most part does not bring in new formulas. Instead, it consists of Result formulas, that is, sentences from the Results section, slightly modified: with classifiers, say, instead of specific words, and with certain other interesting differences; and, above all, arranged in particular orders and joined by particular conjunctions.
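The relation between Discussion and Results can be sketched as a transformation: take Result formulas, replace specific subscripted words by a classifier, merge formulas that become identical, and join the rest in order with a conjunction. The classifier table and the word tuples below are hypothetical; the lecture says only that classifiers replace specific words and that order and conjunctions matter.

```python
# Hypothetical classifier table: specific word -> classifier.
CLASSIFIERS = {"G1": "antigen", "G2": "antigen", "A1": "antibody", "A2": "antibody"}

def generalize(formula):
    """Replace each specific word in a Result formula (a tuple of words)
    with its classifier, leaving unlisted words unchanged."""
    return tuple(CLASSIFIERS.get(w, w) for w in formula)

def discussion_sentence(results, conjunction):
    """Build a Discussion-style sentence: generalized Result formulas,
    kept in their original order (duplicates merged) and joined by
    one conjunction."""
    merged = []
    for f in map(generalize, results):
        if f not in merged:
            merged.append(f)
    return f" {conjunction} ".join(" ".join(f) for f in merged)

results = [
    ("A1", "appears in", "lymphocyte"),
    ("A2", "appears in", "lymphocyte"),
    ("A1", "appears in", "plasma cell"),
]
print(discussion_sentence(results, "and"))
# antibody appears in lymphocyte and antibody appears in plasma cell
```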

Now, this is of course faintly, very, very faintly, reminiscent of the syntactic conditions for proof in mathematics. It raises the question whether we can specify conditions on Result sentences (on what they contain, on their ordering, and on their conjunctions) such that particular arrangements of particular Result sentences would justify the reasonableness or the truth of the Conclusion sentence on the grounds of the truth of the preceding Result sentences. This is not a pious hope of reshaping scientific argument into mathematical proof; it would not reach mathematical proof in any case. It is an investigation into how Result sentences lead to justified Conclusion sentences in the actual reports of the science, at least in those reports which are not considered wrong.

Now, to consider the uses of these formulas, short of what further research may be able to do with them. Note first that, since very many sentences, especially within the same section of an article, are cases of the same gross formula, the formulaic representation of a section of an article approaches a double array. That is to say, you have, for example, the sentence "antigen is injected, thereafter antibody appears" repeated ten or fifteen or twenty times. Sometimes something else intrudes, but mostly it is just that sentence, time after time. The whole work, of course, is done in the different superscripts (the different modifiers, the conditions, the amounts, and so forth) and in the different subscripts (the different particular things, particular antigens, particular antibodies, and things of that sort). This structure and the double array locate the information. We know where to look for it, if it is there, and we know what form each item of information would have. In principle, the repeated formulas permit retrieval of the specific information, although making this structure into a computer capability would require a great deal of very sophisticated work. So the fact that it is a possibility doesn't mean that it can be done quite that easily.
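The double array can be pictured as a table in which each row is one occurrence of the repeated formula and each column is one position in it, so retrieval means reading a column or selecting rows, because every item of information has a fixed place. In this sketch the column names and the values are hypothetical placeholders, not data from any article, and the hard part the lecture mentions (getting from text to rows) is assumed already done.

```python
# Positions of the repeated formula "antigen is injected, thereafter
# antibody appears in cell"; the column names are illustrative assumptions.
COLUMNS = ("antigen", "interval", "antibody", "cell")

# Each row is one Results sentence projected onto the formula.
# Values are placeholders, not data from any article.
ARRAY = [
    ("G1", "t1", "A1", "lymphocyte"),
    ("G1", "t2", "A1", "plasma cell"),
    ("G2", "t1", "A2", "lymphocyte"),
]

def column(name):
    """All values in one position of the formula, in sentence order."""
    i = COLUMNS.index(name)
    return [row[i] for row in ARRAY]

def select(**conditions):
    """Rows whose named positions match all the given values."""
    idx = {COLUMNS.index(k): v for k, v in conditions.items()}
    return [row for row in ARRAY if all(row[i] == v for i, v in idx.items())]

print(column("cell"))                           # ['lymphocyte', 'plasma cell', 'lymphocyte']
print(select(antigen="G1", cell="plasma cell"))  # [('G1', 't2', 'A1', 'plasma cell')]
```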

In any case, the whole structure we have seen here makes the information inspectable, even by the human eye; and we must bear in mind that all research has imprecisions at one point or another. It also makes possible precise comparison of different documents in the science. And, looking to the future, it may be possible to analyze information in real time, so that the critique of past work can be used to affect ongoing work in the field, if the field doesn't change too fast.
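Once two documents have been reduced to rows of the same formula, a precise comparison can be as simple as set operations. A minimal sketch, assuming both documents share one column order (the rows below are placeholders): it returns the findings both report and those reported by only one of them.

```python
def compare(doc_a, doc_b):
    """Compare two documents' rows for the same formula.
    Returns (shared, only_in_a, only_in_b), each sorted for stable output."""
    a, b = set(doc_a), set(doc_b)
    return sorted(a & b), sorted(a - b), sorted(b - a)

# Placeholder rows in the column order (antigen, antibody, cell).
doc_a = [("G1", "A1", "lymphocyte"), ("G2", "A2", "lymphocyte")]
doc_b = [("G1", "A1", "lymphocyte"), ("G2", "A2", "plasma cell")]
shared, only_a, only_b = compare(doc_a, doc_b)
print(shared)  # [('G1', 'A1', 'lymphocyte')]
print(only_a)  # [('G2', 'A2', 'lymphocyte')]
print(only_b)  # [('G2', 'A2', 'plasma cell')]
```

Exact matching is the simplest choice; a closer comparison would also have to line up rows that differ only in a modifier or in a classifier versus a specific word.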