Recommendations to Improve Downloads of Large Earth Observation Data

Authors

  • Rahul Ramachandran, NASA Marshall Space Flight Center
  • Christopher Lynnes, NASA Goddard Space Flight Center
  • Kathleen Baynes, NASA Goddard Space Flight Center
  • Kevin Murphy, NASA Headquarters
  • Jamie Baker, Amazon Web Services
  • Jamie Kinney, Amazon Web Services
  • Ariel Gold, Amazon Web Services
  • Jed Sundwall, Amazon Web Services
  • Mark Korver, Amazon Web Services
  • Allison Lieber, Google
  • William Vambenepe, Google
  • Matthew Hancher, Google
  • Rebecca Moore, Google
  • Tyler Erickson, Google
  • Josh Henretig, Microsoft
  • Brant Zwiefel, Microsoft
  • Heather Patrick-Ahlstrom, Microsoft
  • Matthew J. Smith, Microsoft

DOI:

https://doi.org/10.5334/dsj-2018-002

Keywords:

Earth Observation Data, Large Data Transfers, Cloud, Best Practices

Abstract

With the volume of Earth observation data expanding rapidly, cloud computing is quickly changing the way these data are processed, analyzed, and visualized. Collocating freely available Earth observation data on a cloud computing infrastructure may create opportunities unforeseen by the original data provider for innovation and value-added data re-use, but existing systems at data centers are not designed to support requests for large data transfers. Without a common methodology, each data center must handle such requests differently for each cloud vendor. Guidelines are needed so that all cloud vendors can use a common methodology for bulk-downloading data from data centers, sparing the providers from building custom capabilities to meet the needs of individual vendors.

This paper presents recommendations distilled from use cases provided by three cloud vendors (Amazon, Google, and Microsoft) and based on the vendors’ interactions with data systems at different Federal agencies and organizations. The recommendations range from obvious steps for improving data usability (such as ensuring the use of standard data formats and commonly supported projections) to non-obvious undertakings important for enabling bulk data downloads at scale. They can be used to evaluate and improve existing data systems for high-volume data transfers, and their adoption can lead to cloud vendors using a common methodology.

Author Biography

Rahul Ramachandran, NASA Marshall Space Flight Center

Dr Rahul Ramachandran works at NASA Marshall Space Flight Center and serves as the manager of the Global Hydrology Resource Center (GHRC). GHRC is one of NASA’s twelve Distributed Active Archive Centers distributing Earth Science data. His research focuses on Earth Science Informatics and investigates novel applications of computational methods and information technology to the acquisition, storage, processing, interchange, analysis, and visualization of Earth Science data and information. His research has covered many aspects of this field, including the application of data mining algorithms to extract information from large volumes of satellite data, the design of metadata specifications to improve data use, the development of online platforms to support scientific collaboration via workflows, the incorporation of provenance standards into a current NASA data production system, the use of emerging semantic web technology to improve search in science, and the design and development of the next generation of Earth Science data systems. He has designed software tools that are used by several other projects, including GLIDER, a GUI-driven tool for visualizing and mining satellite imagery and the recipient of a NASA ESDSWG Software Reuse Award; Talkoot, a reusable collaborative platform for online coordination and collaboration that is used in several NASA field campaigns; Earth Science Markup Language (ESML), an XML-based solution to the data format heterogeneity problem; and Noesis, an ontology-driven meta-search engine with data, information, and service aggregation capability. In addition, Dr Ramachandran has over 10 years of project management experience. During this time, he led the development of large software projects including the Bioenergy Knowledge Discovery Framework (ORNL/DoE). Dr Ramachandran also formulated the ESIP Commons concept to construct a new online publication paradigm for the ESIP Federation.
He has 42 peer-reviewed publications, including two book chapters, and over 100 other scientific publications, including workshop reports. He is the Deputy Editor for the Earth Science Informatics Journal (Springer) and has been a Guest Editor for Computers & Geosciences (Elsevier). He was an Adjunct Professor at the Dept. of Atmospheric Science at the University of Alabama Huntsville, where he taught GIS and Satellite Remote Sensing. He has been the Chair of the Information Technology and Interoperability Committee, ESIP Federation, and a member of the Artificial Intelligence Applications to Environmental Science Committee, American Meteorological Society. He received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2009.

Published

2018-01-24

Section

Practice Papers