DATA REPOSITORIES VERSUS DATA SETS

The Data Citation Index captures all available metadata for the data repositories we index. In many cases, this metadata is very granular, and the repository is broken into a variety of child record types (data studies and data sets). In other instances, the content appears only as a single repository record. This happens for one of two reasons: one that will change over time, and one that will remain as indexed at launch.

In some cases, a data repository is still working to create a uniform structure that will allow its content to be indexed at the granular level. Through our evaluation and selection process, Thomson Reuters determined that the content of such a repository is critical to Web of Science users and that the repository is working with us to implement a more consistent data structure. For these reasons, the data is made available in the Data Citation Index while the repository's formatting work is under way.

In many other instances, however, the data repository will always appear as a single record in the Data Citation Index, with no affiliated child records, unless the repository itself makes significant architectural changes and its users in turn change the way they interact with it. These repositories are essentially one large record, and the citable object is the repository itself; for them, data set and data study records do not exist. Usually, either the repository is one large collection of data (perhaps from a specific project), or users obtain bespoke data by entering various parameters, after which an internal repository search assembles the appropriate data based on those criteria and exports it as a report. Such a repository is a large trunk filled with data on a specific subject or topic, rather than a neatly organized cabinet with separate drawers and shelves.

COVERAGE OF REGIONAL REPOSITORIES

While data repositories are often affiliated with a specific institution or organization, their content can be global in origin. Researchers around the world tend to choose the data repository most applicable to the subject area they are examining rather than one that is geographically close. For the first phase of the Data Citation Index, we have identified data repositories that hold some of the most relevant, widely applicable data and prioritized these for early inclusion.
However, some content that is important to our customers is likely to be truly regional and available only in regional repositories. As with all of our products, we will closely monitor usage trends and customer feedback to ensure that our content strategy aligns with the market's needs.

We have aggressive goals for indexing additional data repositories each year as the Data Citation Index develops from an essential database of data sets and studies into a fully integrated web of citation and analytics. Part of this work will be monitoring these regional needs and determining the best way to meet researchers' expectations. As with the regional citation indexes on Web of Science℠, we will look for the most relevant content from reliable, sustainable data repositories to add content and context to the Data Citation Index as the product matures.