This essay was originally published in the Current Contents print editions on April 15, 1994, when Thomson Reuters was known as the Institute for Scientific Information (ISI).

In exploring the unique advantages of citation indexing, we have looked at its usefulness in conducting searches, its relationship to other systems, and its flexibility in controlling the amount of information retrieved.1, 2, 3, 4 Building on this familiarity with citation indexing, we will now examine and explain the similarity as well as the perceived lack of similarity between citing and cited papers. (As you will recall from the first essay of this year, the cited work is a paper or book that has been mentioned in the references of other works, and the citing work is the one that contains the references.)

Are They or Aren't They?

There is a basic assumption that citing and cited references have a strong link through semantics. Different studies have offered disparate findings on the validity of this assumption, and a like number of theories have been offered to explain those findings. Since an understanding of the interplay between citing and cited articles is key to an understanding of citation indexing, we will look carefully at these studies.

In a well-designed study, Peters et al. show that publications with a citing relationship as well as bibliographically coupled publications--those that have one or more cited documents in common--are content-related.5 The study looked at the cognitive resemblance, or subject-relatedness, between citing and cited publications as well as the relatedness of bibliographically coupled publications in the interdisciplinary field of chemical engineering. The test examined cognitive resemblance with word-profile similarity and mapping. The study supports the results of an earlier study by Braam et al. that shows relatively strong cognitive resemblance within consensus groups for agricultural biochemistry and chemoreception.6
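Bibliographic coupling, as defined above, can be illustrated with a minimal sketch. It assumes each publication is represented only by the list of documents it cites; the paper and reference identifiers are made up for illustration, and the word-profile similarity and mapping used by Peters et al. are not reproduced here.

```python
from itertools import combinations


def coupling_strength(refs_a, refs_b):
    """Count the distinct cited documents that two publications share."""
    return len(set(refs_a) & set(refs_b))


def coupled_pairs(reference_lists, min_shared=1):
    """Map each pair of publications to its number of shared cited documents.

    Only pairs that share at least `min_shared` cited documents are kept,
    matching the "one or more cited documents in common" definition.
    """
    pairs = {}
    for a, b in combinations(sorted(reference_lists), 2):
        shared = coupling_strength(reference_lists[a], reference_lists[b])
        if shared >= min_shared:
            pairs[(a, b)] = shared
    return pairs


# Hypothetical reference lists, not taken from any study discussed here.
reference_lists = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R2", "R3", "R4"],
    "P3": ["R5"],
}

print(coupled_pairs(reference_lists))
```

In this made-up example, P1 and P2 are coupled through the two cited documents they share, while P3 is coupled with neither.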

On the other hand, a recent study by Harter et al. suggests that the subject similarity between citing and cited documents is usually small.7 In the study, only one indexing term in ten for the citing documents was shared with the indexing terms assigned to cited documents. The study concludes that there is only a weak link between cited and citing papers in the library literature.
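A figure such as "one indexing term in ten" can be read as the fraction of a citing document's indexing terms that also appear among the terms assigned to a cited document. The sketch below computes that fraction under this assumption; the exact measure used by Harter et al. is not specified here, and the terms shown are hypothetical.

```python
def shared_term_fraction(citing_terms, cited_terms):
    """Fraction of the citing document's indexing terms also assigned to the cited document.

    Terms are compared case-insensitively. A citing document with no terms
    yields 0.0 rather than raising a division error.
    """
    citing = {term.lower() for term in citing_terms}
    cited = {term.lower() for term in cited_terms}
    if not citing:
        return 0.0
    return len(citing & cited) / len(citing)


# Hypothetical indexing terms, for illustration only.
citing_terms = ["cataloging", "online catalogs", "user studies",
                "subject access", "retrieval"]
cited_terms = ["Retrieval", "indexing"]

print(shared_term_fraction(citing_terms, cited_terms))
```

Because the measure depends entirely on the terms the indexer assigned, a low value reflects the overlap of indexing vocabularies and not necessarily the absence of any connection between the two papers.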

Explanations and Interpretations

Tiered Citations: In reply to the findings of Harter et al., Blaise Cronin suggests that a possible reason for the seemingly counterintuitive result is the use of tiered citations.8 That is, the difference in importance between a very broad citation that cites the works of an author in general and a very targeted citation that cites just one word or phrase from a single article may not be apparent in the index.

Citation Motivation: Although initial research on the topic of citation motivation has produced interesting results, systematic studies of citation behavior are needed. A clearer understanding of the motivational factors in citation behavior would surely shed light on the relationship between citing and cited papers. As it stands, some of the more commonly accepted motives are: recognition of work done previously, identification of methodology, justification, substantiation of claims, correction of one's work or the work of others, self-citation, and persuasion. A key distinction must also be made between studies in the natural sciences and those in the social sciences and the humanities. In the latter, highly specific citation of papers is the norm.

Linguistic Interpretation: When information scientists discuss the relationship between citing and cited documents they create probabilistic descriptions of the average situation. The reality of specific situations varies considerably from field to field. In my paper about the linguistic aspect of this and other situations, I indicate that a full text analysis of a scientific paper can never be complete unless it takes into account the cited documents and their full texts. This is especially true when considering certain selected groups of papers that are more directly related to the research in question. I call this "metatext," that is, the text of the cited paper.

A great deal of publication in science consists of a series of cumulative papers that are the result of many years of evolving research. The "ethics" of publication or the economics of limited journal space do not permit the full repetition of what has been previously reported. The surrogate or substitute for reiterating implicit knowledge is the reference citation.

KeyWords Plus®: Many journals compromise the usefulness of the already abbreviated but crucial linking of related documents through references by eliminating the title of the cited paper. In KeyWords Plus, this semantic link is restored. In effect, we restore a piece of the metatext. The experiments reported by Irving Sher and me demonstrate the usefulness of this "derivative indexing" method.10

Interestingly enough, some authors contend that we have sometimes supplied nonrelated terms for KeyWords Plus. These same authors are surprised when we are able to demonstrate that not only is the topic in question mentioned in the paper, but enough papers are cited in the discussion for it to pass the KeyWords Plus threshold.
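The general idea of deriving terms from the titles of cited papers, and keeping only those that recur often enough, can be sketched as follows. This is an illustration of the principle only, not the KeyWords Plus procedure itself: the single-word matching, the stopword list, the threshold value, and the titles are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list; a production system would use a far larger one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def derived_terms(cited_titles, threshold=2):
    """Return the words that occur in at least `threshold` cited titles.

    Each title contributes a word at most once, so the count reflects how
    many cited papers mention the word, not how often it is repeated.
    """
    counts = Counter()
    for title in cited_titles:
        words = set(re.findall(r"[a-z]+", title.lower())) - STOPWORDS
        counts.update(words)
    return sorted(word for word, n in counts.items() if n >= threshold)


# Made-up titles standing in for the reference list of one citing paper.
cited_titles = [
    "Enzyme kinetics in yeast",
    "Yeast enzyme assays",
    "Kinetics of membrane transport",
]

print(derived_terms(cited_titles))
```

A term can pass the threshold here even if it never appears in the citing paper's own title, which is the situation described above: the term is supplied by the cited literature.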


Relatedness is quite variable. It can range from a total match to a situation in which there is no apparent semantic tie that would establish a reasonable connection. The study of the relationship between citing and cited articles--and, for that matter, citing and cited journals--is both interesting and informative. Next month, we will look at another way to use citation indexing: a method for identifying noninteractive yet logically related pairs of medical literatures.

Dr. Eugene Garfield
Founder and Chairman Emeritus, ISI


