Language and information

Lecture 2. Sublanguages

2.3. Science sublanguages

Now, when the subject matter in question is a subfield of science, we obtain a complex linguistic system that can present precisely the information of the science, losing nothing from the original presentation in a natural language. I want to show this, and I want to consider first a specific case. This is an analysis of representative research papers in immunology which was carried out here, at Columbia, by Michael Gottfried and Thomas Ryckman and myself, later, also with Paul Mattick, Jr., with a parallel analysis being made for French by Anne Daladier.

The articles covered a period from about 1940 to 1966. It was a period when immunology was far simpler and more inspectable than it is today, and when it had a central research problem of determining which cell was the producer of antibodies. There was also a controversy at the time as to whether it was the lymphocyte or the plasma cell, both being of the lymphatic system. Afterwards, it was found by electron microscope and other methods that both cell types produced antibodies. The controversy was resolved by the understanding that the two cell names indicated different stages of development of the same cell line. This was the answer, so to speak.

We wanted to see in this work whether we could represent all the information in these articles in an orderly and a usable way, and if we could locate in the sentence structures, and characterize, the changes in information, and also locate and characterize the disagreements—the point of disagreement.

In its barest outline (and naturally I will give ... something that takes a book I will do in a few sentences), the sublanguage consisted of the following. By listing how words occurred with each other in sentences of the articles, and collecting words with similar combinability into classes, we found some fifteen main classes in these articles. Chiefly, there are classes for Antigen (all the different antigens and so forth), for Antibody, for Inject, for Tissue, for Cell. I give them by a stock word, but they were established combinatorially. Also, there is a class of operators which occurred between Antibody and Cell, such as appears in, is produced by, is secreted by; operators between Antigen and Tissue, such as moves to (antigen moves to a tissue); operators between Cell and Cell, either is similar to or develops into, or something like that; and verbs on Tissue or on Cell: the tissue is inflamed, the cells proliferate, and so on.
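The idea of collecting words with similar combinability into classes can be illustrated with a toy sketch in Python. Everything in it is an assumption made for illustration: the word lists are invented, the function name `combinability_classes` is hypothetical, and grouping words only when they occur in exactly the same sentence frames is far cruder than the analysis of real articles described in the lecture.

```python
from collections import defaultdict
from itertools import product

# Toy corpus: every combination of these invented word lists occurs once.
subjects = ["antibody", "agglutinin"]
operators = ["appears in", "is secreted by"]
objects_ = ["lymphocyte", "plasma cell"]
sentences = list(product(subjects, operators, objects_))


def combinability_classes(sentences):
    """Group words that occur in exactly the same set of sentence frames."""
    frames_of = defaultdict(set)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            # A frame is the sentence with this word's position left open.
            frame = sentence[:i] + ("_",) + sentence[i + 1:]
            frames_of[word].add(frame)

    classes = defaultdict(list)
    for word, frames in frames_of.items():
        classes[frozenset(frames)].append(word)
    return sorted(sorted(members) for members in classes.values())


for members in combinability_classes(sentences):
    print(members)
```

On this toy corpus the two antibody words, the two operators, and the two cell names each fall into a class of their own, without any appeal to their meanings.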

These word-classes appeared in fewer than ten sentence-types. These were the main sentence-types:

  • Antigen is injected into a body
  • Antigen moves to a Tissue
  • Cell or Tissue changes, or has some property
  • Antibody appears in Cell
  • Cell is the same as, or develops into, another Cell

These are the main sentence-types that we found. For reasons to be explained soon, we will write each class with a letter, so that Antibody appears in lymphocytes would be AVC: A for antibody, V for appears in, and C for lymphocytes, or any other cell. The many synonymous words (and there were very many, especially verbs) are considered just variant forms of a single word, and are not indicated in this notation. The non-synonymous words within a class, like the difference between different antigens, or different antibodies, or between to be secreted and to appear in a cell (a cell secretes or a cell contains antibody), were marked by subscripts on the letter. So the letter C marked cell, with Cy for lymphocyte and Cz for plasma cell.

In addition, there are modifiers that appear on words. On certain verbs, for instance, not, or increase, or begin, or have a role in; and on certain nouns, such as much, immature, family of (as in family of cells). These were marked by superscripts on the word which was being modified. So we have letters, we have subscripts for the subclasses, and superscripts for the modifiers on them. There is also one special conjunction: not just a conjunction that appears freely between sentences, but a conjunction that appeared between a fixed pair of sentence-types. This is a conjunction which appeared between the antigen inject sentence and the antibody appear sentence, or the tissue respond sentence, like gets inflamed or something like that. The conjunction was thereafter or afterwards, or many other forms. It took many different grammatical forms, but it was always recognized because it was always between these two sentence-types, and in effect it would read something like antigen is injected into a body, five days later antibody appears in the cell, or antibody appears three days after injection of antigen, and so on. The time modifier, when it appeared, always appeared on this conjunction, nowhere else.

Now, to give a very sketchy picture of what comes out of this analysis. I will do this very fast. I'll just give you a list of about ten or so types of things that came out.

One is that there was immediate separation of the meta-science words from, let us say, the object-science words. The meta-science words were words like we expected that, or we have shown that, or is uncertain, or something like that. These are always grammatically the highest operator, no matter where they are in the sentence as it is written. In our partial order, which I described yesterday, these words appear at the very top. They apply to the whole sentence. They are therefore separable, very clearly, from the object science, or the science itself, which is being discussed.

Another thing was that we obtained a gross structure for the information in the field, that is, the class-sequence formulas: AVC for antibody appears in cell, and TW for tissue is inflamed, and so forth. And as I said, we found fewer than ten gross information structures; and all this of course not semantically, but by a grammatical mapping onto the partial ordering.

Furthermore, we obtained a representation for the specific information in each sentence, and this was nothing more than the same formulas with the subscripts and the superscripts, which told you which word you were dealing with (not counting synonyms), which objects specifically, and any limitations on conditions or on time or whatever it is—or amount—that was involved [as] a modifier.
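As a rough illustration, a formula of this kind (class letter, subscript for the particular word, superscripts for modifiers) can be sketched as a small Python data structure. The names `Symbol` and `render`, the plain-text rendering with `_` and `^`, and the subscript `p` for produced are assumptions made for this sketch; the letters A, V, C, the subscript y for lymphocyte, and the tilde for not are the ones mentioned in the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    letter: str                              # word class, e.g. "A", "V", "C"
    subscript: str = ""                      # particular word within the class
    superscripts: frozenset = frozenset()    # modifiers on the word

    def render(self) -> str:
        text = self.letter
        if self.subscript:
            text += "_" + self.subscript
        for modifier in sorted(self.superscripts):
            text += "^" + modifier
        return text


def render(formula) -> str:
    """Render a sequence of symbols as one plain-text formula."""
    return " ".join(symbol.render() for symbol in formula)


# "Antibody is not produced in lymphocytes"
# (subscript "p" for "produced" is a hypothetical label)
formula = (
    Symbol("A"),
    Symbol("V", "p", frozenset({"~"})),
    Symbol("C", "y"),
)
print(render(formula))  # A V_p^~ C_y
```

Dropping the subscripts and superscripts from such a formula leaves the gross structure AVC, while keeping them gives the specific information of the sentence.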

We also found type-sentence sequences. This is what was formed by this special conjunction, the thereafter type of conjunction. These sequences consisted primarily in antigen thereafter antibody but not only, because certain things could move in between, so that you doubled it. You could have antigen is injected, thereafter antigen moves to tissue, thereafter antibody appears, and so on, so that it was expandable. Also, there were divergent paths, which from a scientific point of view was very interesting, and was very evident—much more evident in this formulation than in the material written in English. The divergent path that we saw was that after antigen was injected, thereafter instead of there being a fixed thing or a fixed chain, fixed sequence of things, there were two alternatives. There was either cell changes, tissue was inflamed, or cells proliferate, or something like that, was one sentence-type; or antibody appears in cell, that's the other sentence-type. So there were two alternative results, so to speak, coming out of this one thing, which of course already has to be, so to speak, explained.

Now, we saw how slightly different research lines, closely related to each other, were differentiated. For instance, there was a different kind of research at the same time which was called donor research, where antigen was injected into one animal, cells were then taken from that animal into another animal, and then the other animal was tested for antibodies, and showed antibodies. The way this came out in these articles was that we had the antigen was injected sentence but with a B1, meaning, in a particular body. Then we had thereafter, the same thereafter, cells were injected (sometimes using that same word, injected) or removed, or whatever, transferred from one body, the body of injection, to another body. Then the same thereafter antibody appears in the cells of the other body. So that, with a certain kind of expansion, you see the relation in the information that is being presented (or obtained) between the neighboring lines of research in the same field at the same time.

We also saw changes over time. This appeared, for instance, simply in the fact that the Tissue words were largely replaced after the first article or two by Cell words, as the research began to be able to deal with cells and not merely with gross tissue. More interestingly, a sentence comes in, after the first few articles, and this is a cell cell sentence, the cell verb sentence, where the verb was either develops into, differs from in certain respects, is similar to, and so forth. This came as the observations piled up, and laboratory equipment somewhat improved, and more was seen about the structure of the cells, and more differentiations were made about the cells. We found where there was unclarity, because in the course of making these observations about cells, very many people making different observations named different cell types that they observed. Sometimes the cell types were the same ones, sometimes the differences were so small they were unimportant, and so forth. The result is that a certain position in the cell is like cell or cell develops into cell sentence, the second position, began being very ... hospitable. It began having an awful lot of different cell names, many more than you normally have in a class of words in this material. Later on, it turns out that many of these could be eliminated, either as being synonymous or as marking only a very small difference in time, and so forth. But you can see that there was unclarity at this point, and this is a general property: one can spot the unclarities in this work.

We could locate the disagreements. The disagreements turned out to be in a particular sentence-form, and at a particular point in the sentence-form. It was in the AVC sentence, antibody appears in cell. It was in the case where the C was lymphocyte, and where the V was produced, in other words, antibodies are produced in cell. In that case, there were certain articles that said, yes, antibodies are produced in the cell, they had that sentence-form. Other articles said that antibodies are produced by plasma cells, not by lymphocytes, and they said lymphocytes only had a role in the production. So that meant that there's a little r on antibody is produced, the produced has a little r, meaning it has a role in production. Or they had a flat denial, antibody is not produced in lymphocytes. So that was a little tilde, for not on the produce word. So this was where the disagreements were, and a person could look down the formulas and see if two articles disagreed with each other; if one of them had a negative on a symbol where the other one had the same subscripts but without the negative.
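Looking down the formulas for such a disagreement can be sketched mechanically. The following Python sketch is an illustration only: the article names, the helper names, the tuple encoding of a formula, and the subscript `p` for produced are all hypothetical. It covers just the flat denial, where two formulas are identical except for a tilde; the weaker "has a role in" case (the little r) is included in the data but is deliberately not reported as a denial.

```python
from itertools import combinations

NOT = "~"  # the tilde superscript for "not"


def sym(letter, subscript="", modifiers=()):
    """One symbol: class letter, subscript, set of superscript modifiers."""
    return (letter, subscript, frozenset(modifiers))


def without_not(formula):
    """The same formula with every tilde removed."""
    return tuple((l, s, m - {NOT}) for l, s, m in formula)


def negated(formula):
    """Positions in the formula that carry a tilde."""
    return {i for i, (_, _, m) in enumerate(formula) if NOT in m}


def disagree(f1, f2):
    """Same letters, subscripts and other modifiers, but different negation."""
    return without_not(f1) == without_not(f2) and negated(f1) != negated(f2)


def find_disagreements(corpus):
    """Pairs of articles holding at least one pair of disagreeing formulas."""
    found = []
    for (name1, fs1), (name2, fs2) in combinations(corpus.items(), 2):
        if any(disagree(f1, f2) for f1 in fs1 for f2 in fs2):
            found.append((name1, name2))
    return found


corpus = {
    "article_1": [(sym("A"), sym("V", "p"), sym("C", "y"))],          # is produced
    "article_2": [(sym("A"), sym("V", "p", {NOT}), sym("C", "y"))],   # is not produced
    "article_3": [(sym("A"), sym("V", "p", {"r"}), sym("C", "y"))],   # has a role in
}
print(find_disagreements(corpus))
```

Here only the first two hypothetical articles are paired, since they differ solely by the tilde on the produce symbol.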

And finally, the resolution was clear—was locatable—because the resolution appeared in the cell becomes cell, or cell develops into cell, or cell is similar to cell sentence. That was a well-established sentence-form at this time, but it appeared in a case where the two contenders—the lymphocytes and the plasma cell—appeared as the subject and the object of that sentence. In other words, they were joined, now, by the sentence lymphocytes develop into plasma cells. Now, of course, if you read very slowly, and you look at all the articles, you can see that. Not that they quite exactly say that in the sentence—it comes out in the passive, or this or that. In the formulas, it's glaring. The fact that the two contenders appear as the subject and the object of the develops into sentence-type means that the issue has been settled.
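Spotting this resolution in the formulas can be sketched in the same spirit. In the Python sketch below, the letter `Y` for the cell-to-cell verb class, the subscripts `d` for develops into and `p` for produced, and the function name `links_contenders` are assumptions made for illustration; only Cy for lymphocyte and Cz for plasma cell come from the lecture.

```python
CELL_VERB = "Y"  # assumed letter for verbs between Cell and Cell


def links_contenders(formula, contenders):
    """True if a cell-verb-cell formula joins exactly the two contenders."""
    if len(formula) != 3:
        return False
    (l1, s1), (lv, _), (l2, s2) = formula
    return (
        l1 == "C"
        and l2 == "C"
        and lv == CELL_VERB
        and {s1, s2} == set(contenders)
    )


# Each symbol is (class letter, subscript); modifiers are omitted here.
formulas = [
    (("A", ""), ("V", "p"), ("C", "y")),   # antibody is produced in lymphocytes
    (("C", "y"), ("Y", "d"), ("C", "z")),  # lymphocytes develop into plasma cells
]
resolved = [f for f in formulas if links_contenders(f, ("y", "z"))]
print(resolved)
```

The check ignores which contender is subject and which is object, since the point made above is only that the two appear together in the one sentence-type.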