Data Discovery Paradigms: User Requirements and Recommendations for Data Repositories

Authors: Mingfang Wu, Fotis Psomopoulos, Siri Jodha Khalsa, Anita de Waard

DOI: https://doi.org/10.5334/dsj-2019-003

Keywords: Data discovery, Usability, Data repository, Requirements and recommendations, FAIR data

Abstract

As data repositories make more data openly available, it becomes increasingly challenging for researchers to find what they need, either from a repository or through web search engines. This study investigates data users’ requirements and the role that data repositories can play in supporting data discoverability by meeting those requirements. We collected 79 data discovery use cases (or data search scenarios), from which we derived nine functional requirements for data repositories through qualitative analysis. We then applied usability heuristic evaluation and expert review methods to identify best practices that data repositories can implement to meet each functional requirement. We propose the following ten recommendations for data repository operators to consider for improving data discoverability and users’ data search experience:

1. Provide a range of query interfaces to accommodate various data search behaviours.

2. Provide multiple access points to find data.

3. Make it easier for researchers to judge relevance, accessibility and reusability of a data collection from a search summary.

4. Make individual metadata records readable and analysable.

5. Enable sharing and downloading of bibliographic references.

6. Expose data usage statistics.

7. Strive for consistency with other repositories.

8. Identify and aggregate metadata records that describe the same data object.

9. Make metadata records easily indexed and searchable by major web search engines.

10. Follow API search standards and community-adopted vocabularies for interoperability.

Author Biographies

Mingfang Wu, Australian Research Data Commons

Mingfang Wu received her PhD in computer science from Royal Melbourne Institute of Technology (RMIT) University, Australia. She has published in the area of information retrieval, especially user interaction with information retrieval systems, and more recently in the areas of data provenance and eResearch infrastructure. Mingfang holds a Senior Business Analyst position at the Australian Research Data Commons (ARDC), where she oversees ARDC-funded eResearch infrastructure projects and researches the description, curation, publication and discoverability of research data and software. Before joining ARDC, she worked as a research scientist in information retrieval at the CSIRO ICT Center and RMIT University.

Fotis Psomopoulos, Institute of Applied Biosciences, Centre for Research and Technology, Hellas, Thessaloniki, GR; Dept of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm

Dr Fotis E. Psomopoulos works as a Postdoctoral Researcher at the Institute of Applied Biosciences (INAB) at the Centre for Research and Technology Hellas (CERTH), with a primary focus on bioinformatics workflows and computational analysis of large bio-datasets. He received his doctorate and engineering diploma from the Department of Electrical and Computer Engineering at the Aristotle University of Thessaloniki (AUTH) in 2010 and 2004, respectively. In 2014 he was awarded a postdoctoral fellowship by the AUTH Research Committee. He has worked as a teaching assistant and academic fellow at the Aristotle University of Thessaloniki, as a Visiting Professor at Quest University (Vancouver, Canada) and as an Adjunct Lecturer at the University of Western Macedonia. He has also participated as a researcher and project manager in several European and national research projects. His research interests include bioinformatics (phylogenetic profiling and NGS data analyses), data mining techniques for knowledge extraction from large bio-datasets, and parallel and distributed computing (such as cloud computing). In this context, he was selected as an EGI Champion on Bioinformatics in 2013 and as an RDA Early Career Fellow in 2016. In addition to his research activities, he is engaged in several national and international training activities on NGS data analysis and cloud computing; he is a certified Carpentries Instructor and Trainer, and was recently elected deputy Training Coordinator for ELIXIR Greece.

Siri Jodha Khalsa, National Snow and Ice Data Center, University of Colorado, Boulder

Siri Jodha Singh Khalsa received a PhD in Atmospheric Science from the University of Washington, Seattle. He has published in the fields of glaciology, satellite remote sensing, global atmospheric teleconnections, air-sea interaction, boundary layer turbulence and earth science informatics. He is on the research faculty of the University of Colorado, Boulder, where he performs science evaluation and algorithm support for data products from NASA’s Earth Observing System. He chairs the Institute of Electrical and Electronics Engineers (IEEE) Geoscience and Remote Sensing Society (GRSS) Standards Technical Committee and is the GRSS liaison to ISO Technical Committee 211 (ISO/TC 211) and the Open Geospatial Consortium. He sits on the Program Board of the Intergovernmental Group on Earth Observations (GEO) and on the steering committee of the World Meteorological Organization’s Polar Prediction Project.

Anita de Waard, Research Data Collaborations, Elsevier

Anita de Waard holds a Master of Science (doctorandus) in low-temperature physics from Leiden University and worked in Moscow before joining Elsevier as a physics publisher in 1988. Since 1997, she has worked on bridging the gap between science publishing and computational and information technologies, collaborating with groups in Europe and the US. Her past work includes developing a semantic model for research papers and co-founding the interdisciplinary member organization Force11.org. In her current role as VP Research Data Collaborations, Anita is developing cross-disciplinary frameworks to store, share and search experimental outputs, in collaboration with academic and government groups. She co-chairs the RDA Data Discovery Interest Group and is on the Steering Committee of the National Data Service.

Published: 2019-01-08

Section: Research Papers