HISTORY OF CITATION INDEXING

The concept behind citation indexing is fundamentally simple. If the value of information is determined by those who use it, what better way to measure the quality of a work than by measuring the impact it makes on the community at large? The widest possible population within the scholarly community (i.e., anyone who uses or cites the source material) determines the influence or impact of the idea and its originator on our body of knowledge. Because of its simplicity, one tends to forget that citation indexing is actually a fairly recent form of information management and retrieval.

Three factors led to the development of citation indexing in the 1950s. With the huge influx of government dollars into research and development following World War II, the research community naturally began to publicly document its findings through the accepted channel of published scientific journal literature. The subsequent burgeoning of the literature created a need for a method of indexing and retrieval that would be more cost-effective and efficient than the then-current model of human indexing of materials for subject-specific indexes. While the subtle judgements made by subject specialists were valuable in giving depth to a subject index, manual indexing was both time-consuming and labor-intensive. Its costs increased in proportion to the growth of material to be indexed. So the need for a better way of managing information was the first factor.

The second factor was the growing dissatisfaction with the capacity of subject indexing to meet the needs of the active researcher. At the time, subject indexes could lag considerably in adding new material; months could pass before researchers in one field would learn of published findings in some other field that had relevance to their own study. Furthermore, subject indexing had limitations in terms of retrieval. Terminology appropriate to one specific discipline would not necessarily have meaning to researchers in another, perhaps overlapping, discipline. At the same time, scientists were recognizing that they had to be aware of, if not completely familiar with, work in a number of different subject disciplines in order to be confident that they had properly grounded their research through an appropriate review of the literature.

Along with this need came the hope that automation might hold the answers, the third and final factor in the development of citation indexing. Computerization in the 1950s was far removed from the desktop environment of today, but there was tremendous excitement over the potential benefits to be derived from applying machines to the generation and compilation of data. The U.S. government hoped that automation could mitigate or even eliminate the difficulties of manual indexing, and it launched a number of projects to investigate these possibilities.

Dr. Eugene Garfield, founder and now Chairman Emeritus of ISI® (now Thomson Reuters), was deeply involved in the research relating to machine-generated indexes in the mid-1950s and early 1960s. One of his earliest points of involvement was a project sponsored by the Armed Forces Medical Library (predecessor to our current National Library of Medicine). The Welch Medical Library Indexing project, as it was called, was to investigate the role of automation in the organization and retrieval of medical literature. The hope was that the problems associated with subjective human judgement in the selection of descriptors and indexing terms could be eliminated. Removing the human element might increase both the speed with which information was incorporated into the indexes and their cost-effectiveness. Garfield grasped early on that review articles in the journal literature relied heavily on the bibliographic citations that referred the reader to the original published source for a notable idea or concept. By capturing those citations, Garfield believed, the researcher could immediately see the approach another scientist had taken to support an idea or methodology, based on the sources that the published writer had consulted and cited as pertinent in the bibliography. As retrieval terms, citations could function as well as keywords and descriptors that were thoughtfully assigned by a professional indexer.

In the early 1960s, Eugene Garfield and Associates developed two pilot projects that would test the viability and efficiency of citation indexing. The first project involved the creation of a database that would index the citations of 5,000 chemical patents held by two private pharmaceutical companies. The referenced citations in this instance were to prior patents, the documentation sources that the government patent examiners were using to support a decision to grant or deny a patent. The connections that the patent citation index made were then analyzed against two comparable classification and indexing systems that the participants were using at the time. Based on this investigation and analysis, the project sponsors determined that citation indexing permitted the retrieval of relevant literature across arbitrary classifications in a way that subject-oriented indexing could not.

A second pilot project in 1962 involved Garfield's recently incorporated enterprise, the Institute for Scientific Information (now Thomson Reuters), with the United States National Institutes of Health in building an index to the published literature on genetics. This project was far more complex in nature than the patents index. Three databases were built to cover the literature over 1 year, 5 years and 14 years, with a varying number of source publications indexed in each. Although the project was intended to test the feasibility and utility of a narrow, discipline-oriented citation index, it was concluded at completion that the database with the most broadly based set of source publications formed the most comprehensive and useful guide to the published literature in the field of genetics. The database for the single-year term had drawn not only on journals primarily devoted to genetics research but also on a large pool of journals that published genetics papers on a more peripheral or occasional basis. Additionally, while the automated system required a certain level of effort in standardizing the entries from a wide variety of published materials, the project demonstrated the cost-effectiveness of citation indexing as opposed to the expense of traditional subject indexing processes.

Although the government sponsors chose not to subsidize the development of a national citation database at the time of the project's completion, Eugene Garfield was encouraged to move ahead with the private publication of his multidisciplinary citation index as the first edition of the Science Citation Index® (SCI®). Available for purchase since 1963, the SCI was then, and remains now, the most comprehensive citation index to the scientific journal literature. Today, the Web-based version of that index covers 5,600 journals across more than 150 scientific disciplines.

Garfield's achievement lay in establishing the utility and objectivity of a citation index in retrieving related papers in the published literature that at first glance might not have seemed pertinent to the researcher's inquiry. Today, it is considered to be one of the most reliable resources for tracing the development of an idea across the multitude of disciplines that are part of our body of scientific knowledge.