
DATA REPOSITORIES VERSUS DATA SETS
The Data Citation Index captures all available metadata for the data repositories we index. In many cases, this metadata is very granular, and the repository is broken into a variety of child record types (data studies and data sets). In other instances, the content appears only as a single repository record. This can happen for one of two reasons: one that will change over time and one that will remain as indexed at launch.
In some cases, a data repository is still working to create a uniform structure that will enable its content to be indexed at the granular level. Through our evaluation and selection process, Thomson Reuters determined that, because the content in the repository is so critical to Web of Knowledge users and because the repository is working with us to implement a more consistent data structure, the data would be made available within the Data Citation Index while the repository's formatting work is underway.
In many other instances, however, the data repository will only ever include a single record in the Data Citation Index, with no affiliated child records, unless the repository itself makes significant architectural changes and its users in turn change the way they interact with it. These repositories are essentially one large record, and the citable object is the repository itself; data set and data study records do not exist for them. Usually, either the repository is one large collection of data (for example, from a specific project), or users export bespoke data as a report by entering various parameters, and an internal repository search assembles the appropriate data based on those criteria. These repositories are more like a large trunk filled with data on a specific subject or topic than a neatly organized cabinet with separate drawers and shelves.
COVERAGE OF REGIONAL REPOSITORIES
While data repositories are often affiliated with a specific institution or organization, their content can be global in origin. Researchers around the world tend to choose the data repository most applicable to the subject area they are examining rather than one based on geographical proximity. For the first phase of the Data Citation Index, we have identified data repositories that hold some of the most relevant, widely applicable data and prioritized these for early-stage inclusion.
However, there is likely to be truly regional content, available only within regional repositories, that is important to our customers. As with all of our products, we will closely monitor usage trends and customer feedback to ensure that our content strategy aligns with the market’s needs.
We have aggressive goals for indexing additional data repositories each year as the Data Citation Index develops from an essential database of data sets and studies into a fully integrated web of citation and analytics. Part of this will include monitoring these regional needs and determining the best way to meet researchers’ expectations. As with the regional citation indexes on Web of Science℠, we will look for the most relevant content from reliable, sustainable data repositories to add content and context to the Data Citation Index as the product matures.
| Repository | Discipline | Responsible Organization |
|---|---|---|
| Archaeological Data Service | Social Sciences | University of York |
| ArrayExpress | Life Sciences | European Bioinformatics Institute |
| Association of Religion Data Archives | Arts and Humanities | Pennsylvania State University |
| Australian Antarctic Data Centre | Climatology | Australian Government; Department of Sustainability, Environment, Water, Population and Communities |
| Australian Data Archive | Social Sciences | Australian National University |
| BioMagResBank | Life Sciences | University of Wisconsin |
| British Antarctic Survey | Physical Sciences | Natural Environment Research Council |
| British Atmospheric Data Centre | Atmospheric Science | Natural Environment Research Council |
| British Geological Survey | Physical Sciences | Natural Environment Research Council |
| British Oceanographic Data Centre | Physical Sciences | Natural Environment Research Council |
| caArray | Life Sciences | National Cancer Institute |
| caNanoLab | Life Sciences | National Cancer Institute |
| CanGEM | Life Sciences | University of Helsinki |
| CEBS | Life Sciences | The National Institute of Environmental Health Sciences |
| Cell Centered Database | Neuroscience | University of California |
| Centre for Ecology and Hydrology | Physical Sciences | Natural Environment Research Council |
| Codex Sinaiticus | Arts and Humanities | The British Library/ Leipzig University Library/ St. Catherine's Monastery/ The National Library of Russia |
| CPLA | Life Sciences | University of Science and Technology of China |
| Crystallography Open Database | Physical Sciences | Vilnius University* |
| Disprot | Life Sciences | Indiana University School of Medicine/ Temple University |
| DrugBank | Life Sciences | University of Alberta* |
| Dryad | Life Sciences | National Evolutionary Synthesis Center |
| EcoGene | Life Sciences | University of Miami |
| eCrystals | Crystallography | University of Southampton |
| Emage | Life Sciences | Medical Research Council |
| EMDB | Life Sciences | European Bioinformatics Institute |
| Esther | Life Sciences | French National Institute for Agricultural Research* |
| Eurostat | Social Sciences | European Union |
| Finnish Social Science Data Archive | Social Sciences | University of Tampere |
| Gene Expression Omnibus | Genetics | National Center for Biotechnology Information |
| Greengenes | Life Sciences | Lawrence Berkeley National Laboratory |
| GWAS Central | Life Sciences | University of Leicester* |
| Infevers | Life Sciences | Institute of Human Genetics* |
| Inter University Consortium for Political and Social Research | Social Sciences | University of Michigan |
| IQSS | Social Sciences | Harvard University |
| Michigan Corpus of Academic Spoken English | Arts and Humanities | University of Michigan |
| Microkit | Life Sciences | University of Science and Technology of China |
| miRBase | Life Sciences | University of Manchester |
| Mouse Phenome Database | Life Sciences | The Jackson Laboratory |
| National Archives | Social Sciences | U.S. National Archives and Records Administration |
| National Snow and Ice Data Center | Environmental Science | University of Colorado, Boulder |
| NERC Earth Observation Data Centre | Physical Sciences | Natural Environment Research Council |
| nmrshiftdb2 | Chemistry | Johannes Gutenberg University* |
| NOAA Paleoclimatology | Physical Sciences | National Oceanic and Atmospheric Administration |
| Nucleic Acid Database | Life Sciences | Rutgers, The State University of New Jersey |
| Oak Ridge National Laboratory Distributed Active Archive Center | Multi-discipline | U.S. National Aeronautics and Space Administration |
| Odum Institute | Social Sciences | Odum Institute, University of North Carolina |
| Office for National Statistics | Social Sciences | UK Statistics Authority |
| Old Bailey Proceedings Online | Arts and Humanities | Humanities Research Institute |
| Pangaea | Earth Sciences | Alfred Wegener Institute for Polar and Marine Research/ Center for Marine Environmental Sciences, University of Bremen |
| PHI-base | Life Sciences | Rothamsted Research |
| Protein Data Bank | Life Sciences | Research Collaboratory for Structural Bioinformatics |
| Pseudobase | Life Sciences | Institute of Theoretical Biology/ Leiden Institute of Chemistry, Leiden University |
| QTL Archive | Life Sciences | The Jackson Laboratory |
| Reading Experience Database | Arts and Humanities | The Open University |
| Refold | Life Sciences | Monash University* |
| Roper Center | Social Sciences | Roper Center, University of Connecticut |
| Sloan Digital Sky Survey | Astronomy | Astrophysical Research Consortium |
| South African Data Archive | Social Sciences | National Research Foundation |
| Stanford Microarray Database | Genetics | Stanford School of Medicine |
| Tardis | Physical Sciences | Monash University |
| The Cell: An Image Library | Life Sciences | The American Society for Cell Biology |
| The Dataweb | Social Sciences | US Census Bureau |
| TreeBASE | Life Sciences | National Evolutionary Synthesis Center |
| U.S. National Oceanographic Data Center | Physical Sciences | United States Department of Commerce |
| UK Data Archive | Social Sciences | University of Essex |
| UniPROBE | Life Sciences | Bulyk Laboratory (Department of Medicine at Brigham and Women's Hospital/ Harvard Medical School) |
| UniProt | Life Sciences | European Bioinformatics Institute/ Swiss Institute of Bioinformatics/ Protein Information Resource |
| World Values Survey | Social Sciences | World Values Survey Association |