BIOSIS Archive

The story behind the unique collection of life sciences backfiles

The Story

Before the Great Depression and World War II … before the explosive growth of federal funds for research and development and the huge swelling of graduate enrollments and degrees … two professional societies joined forces to create a new kind of life sciences resource. In 1926, the Society of Bacteriologists and the Botanical Society recognized the need for an information source that brought together the most important life sciences literature from sources worldwide, making that information more accessible and searchable. To this end, the two societies merged their publications, the Abstracts of Bacteriology and Botanical Abstracts, to create Biological Abstracts. Over the years, Biological Abstracts has chronicled the rapidly changing world of life sciences research. The complete backfiles of this important resource are available in digital format: completely indexed, standardized and searchable.

The process:

Indexing, digitizing and integrating decades of data

Forty-nine bound print volumes were scanned, and all information was tagged and fully integrated into the current BIOSIS format to make it digitally searchable. Joel Hammond, Director of BIOSIS Product Development and Management; Bruce Kiesel, Director of Knowledge Base Management; and Beth Ten Have, BIOSIS Product Development Manager, oversaw the compilation, indexing, and digitization of this extensive archive. As the work proceeded, it became evident to them that these backfiles presented unique information that greatly added to the value of the total BIOSIS data collection. The valuable historical data go beyond the articles themselves and include original summaries, abstracts, translations, regional source material and enhanced titles that make each item more searchable. “Today, titles and abstracts are left alone,” says Hammond. “But back then, they were looked at as part of the larger indexing stream, so they were supplemented with indexing terms to make them more valuable as a search medium. The original editors added organism names, geographic areas, and chemical names to titles and abstracts. We’ve kept these additions. In today’s search environment, they add a great deal of usefulness. In modern searches, keyword searches yield better retrieval results than title searches. So this added information is very important.”


Created by life scientists, for life scientists.

Targeted life sciences coverage

The backfiles within BIOSIS Archive came out of the fabric of the life sciences. The original collection was created and funded by societies, not publishers, and so it reflects the realities and needs of working biologists. The original item-based selection policy let reviewers and editors focus on selecting individual items for inclusion, rather than entire journals. The paramount question was “Is this item relevant to the life sciences?”, and this approach resulted in focused, selective coverage. BIOSIS Archive presents a targeted view of the life sciences, including many unique titles not found even in Web of Science™ Core Collection. In addition to extensive journal coverage, the archive also includes items from proceedings, patents and books. “Sometimes, there was just no coverage of scientific meetings anywhere in the literature,” explains Hammond. “If there was no journal or book equivalent, there was no coverage of a meeting. So when abstractors wrote reviews and summaries of meetings, including lists of sessions, this information often appeared nowhere else in the literature. Also, a fair number of patents were covered in the early years. Back then, as today, biologists were looking at patents because patents constituted a part of the literature that was important to them.”

Enhanced abstracts offer unique insights

One unique aspect of these archives is that they were originally assembled and enhanced by young biologists of the time, many of whom are now well known for their work. Contributors such as Barbara McClintock, a Nobel Laureate in Physiology or Medicine and one of the world’s most distinguished geneticists, served as both authors and editors. As Ten Have points out: “Many articles of this time did not offer abstracts. So these contributors devoted considerable time and scholarship to writing them. Some of the added abstracts were much more than an abstract: they offered summaries of meetings, extensive book reviews and in-depth summaries of monograph-length works covering important topics such as new species. Sometimes these entries would be the only insight a U.S. scientist might have into life sciences research in other parts of the world, such as China or Russia.” As Hammond says, “They were creating a modern bibliographic work, something that never existed before: a biological abstract.”

Additional keywords provide unprecedented access

Contributors also added keywords that provided rich entry points into the literature. Hammond states: “The only way to equal the access these keywords provide would be to digitize and index the full text of over two million scholarly print documents. And some of these documents remain untranslated and available only in foreign-language publications scattered throughout the world. So the depth and breadth of access provided by the indexing is truly unmatched elsewhere.” “All these elements (additional abstracts, enhanced titles and keywords) are essential parts of BIOSIS Archive and make it a uniquely useful life sciences tool.”


The challenge:

Creating a searchable archive covering over 100 years of data

In addition to the unique content, the creation of BIOSIS Archive also posed unique challenges. The Thomson Scientific Knowledge Management group, as it was called at the time, headed by Bruce Kiesel, was responsible for creating and managing the knowledge bases used to index all data. “Editorial practices have changed significantly over the course of the last century, and the standards and conventions used today, including formats, subject categories, terminology, and fonts, are quite different from what indexers and editors encountered when examining these historical papers,” says Kiesel. “Formats changed from year to year and from editor to editor. Journals were combined, changed titles, or stopped publication entirely. Special care had to be taken to discern what was intended by the original editors, and what is still relevant and useful to today’s life sciences researchers.” As Keith MacGregor, Executive Vice President, Academic & Government Markets, Clarivate Analytics (formerly Thomson Scientific), says: “With the digitization of the print volumes, we’re trying to get as close as we can to our current electronic version, in format and in content. Our goal is to make possible a seamless search that includes both older and newer records.”

Benefiting from past experience

Fortunately, the Knowledge Management group was able to draw on the Century of Science™ project, which digitized Web of Science Core Collection backfiles to 1900. For both projects, OCR (optical character recognition) scanning proved to produce more easily searchable records than searchable PDFs would have. And the combination of human term mapping and data manipulation with computerized scanning ensured a highly accurate, usable data file. In many ways, digitizing the original Biological Abstracts files was much easier than the Century of Science project had been. To create BIOSIS Archive, journal selection, location and indexing were not issues, since it was already known which records to include: all of Biological Abstracts from 1926 to 1968. There were no algorithms to run and no journals to track down, because all the old, bound print volumes were already in-house.

The result:

A unique collection that supports current research

BIOSIS Archive enables researchers to look at past works with the added perspective of today’s knowledge. “Many times,” Ten Have states, “this ability to look at previous studies through a modern lens can result in a newly discovered use for a compound that may have been previously tested and fallen out of favor. For instance, thalidomide was once used for morning sickness but is now used to treat chemotherapy patients’ side effects.” Hammond adds: “The information you can find in BIOSIS Archive goes way beyond drugs. It also includes data vital to subjects such as environmental toxicology. Take DDT, for example: originally used as a pesticide, it is now used to fight human diseases.” Keith MacGregor sums up the significance of BIOSIS Archive: “Why is this collection important? Because it documents the history of the life sciences worldwide. It brings together a breadth and depth of information that can’t be found elsewhere in one easily searchable format. And in doing so, it supports current life sciences research and gives today’s researchers an invaluable and unique tool.”