Finding the ideal data set for a research endeavor can be challenging when your organization generates and collects as much data as the US National Aeronautics and Space Administration (NASA).
The agency generates an enormous amount of data every day from its seven operating centers, nine research facilities, and more than 18,000 employees. It stores this data in more than 30 science data repositories that cover five topical areas: astrophysics, heliophysics, biological and physical sciences, earth science, and planetary science. Across 128 data sources, the agency holds more than 88,000 datasets and 715,000 documents. By 2025, its earth science data alone is projected to reach 250 petabytes. To navigate all of this complexity, scientists need more than domain expertise.
Researchers must know which repository to go to and what it contains, according to Kaylin Bugbee, a NASA data scientist at the Marshall Space Flight Center in Huntsville, Alabama. “You need to understand data as well as science.”
In 2019, NASA’s Science Mission Directorate (SMD) produced a report based on a series of interviews with scientists. It found that scientists needed a centralized search capability to locate the data they required. SMD’s role is to engage with the US scientific community, support scientific endeavors, and conduct spaceflight, balloon, and aviation research projects to explore Earth orbit and beyond, and it recognized that giving scientists and researchers access to its data was essential to that mission. As a result of the report, SMD established the Open Source Science Initiative (OSSI) to make publicly funded scientific research transparent, inclusive, accessible, and reproducible. OSSI’s goal is the open sharing of software, data, and knowledge (including articles, documents, algorithms, and auxiliary material) from the earliest stage of the scientific process.
“It truly originated from the scientific community and scientists, and it also aligns with our broader SMD priority of enabling interdisciplinary science,” Bugbee says. “New discoveries are made there.”
To support that aim, the agency is now using generative AI in conjunction with neural nets to help scientists access these enormous amounts of data.