Big climate data offers great opportunities for scientific discovery but demands efficient and effective analytics to investigate unknown and complex patterns. Most existing online processing and analytics systems for climate studies support only a fixed user interface with predefined functions. These systems often do not scale to massive climate data, which can easily accumulate at a rate of terabytes per day. To address these limitations, a scalable online visual analytic system, known as SOVAS, was developed to balance usability and flexibility. SOVAS, enabled by a set of key techniques, supports large-scale climate data analytics and knowledge discovery in a scalable and sharable environment. This research contributes an efficient tool for analyzing big climate data to the community and contributes to the literature by providing technical references for tackling spatiotemporal big data challenges.
Integrates array-based and record-based data, both of which can be queried with Structured Query Language (SQL).
Performs data-intensive processing and analysis in parallel. A hybrid query processing engine is employed on a highly scalable Hadoop cluster.
The extended SQL affords users the flexibility to perform complex spatial analytical functions on the original datasets.
MERRA is a reanalysis of global climate observation data for the satellite era. Focusing on historical analyses of the hydrological cycle on a broad range of weather and climate time scales, MERRA is a project of the NASA Global Modeling and Assimilation Office at NASA Goddard Space Flight Center (GSFC). Three representative MERRA data products (~2 terabytes) with different spatial and temporal resolutions and dimensions are loaded in the current system.
The Global Historical Climatology Network-Daily (GHCN-Daily, Menne et al., 2012) dataset was downloaded from the NOAA National Climatic Data Center (http://doi.org/10.7289/V5D21VHZ). The GHCN-Daily dataset contains about 2.5 billion measurements for over 137 variables (such as precipitation, snowfall, and maximum and minimum temperature) from more than 100,725 meteorological stations worldwide, spanning 1763 to 2016.
Aggregation performed spatially across the study area, producing a single value. These functions are often used together with temporal functions.
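As a rough illustration of spatial aggregation, the sketch below collapses one grid to a single value in plain Python. The function name `spatial_mean` and the `nodata` parameter are hypothetical and are not part of SOVAS; the average is unweighted, which ignores the variation of cell area with latitude.

```python
from statistics import fmean


def spatial_mean(grid, nodata=None):
    """Average all valid cells of one 2D grid into a single value.

    Hypothetical helper for illustration; cells equal to `nodata` are skipped.
    """
    values = [v for row in grid for v in row if v != nodata]
    if not values:
        raise ValueError("grid has no valid cells")
    return fmean(values)


# One time step of a 2 x 2 study area.
grid = [[1.0, 2.0], [3.0, 6.0]]
print(spatial_mean(grid))  # 3.0
```

Applying such a function to every time step yields one value per timestamp, which is the kind of series that pairs naturally with temporal functions.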
Aggregation performed temporally along the timestamps for each cell location, producing a single grid. These functions are usually used with the GROUP BY keyword and combined with other functions, such as Avg_S and Render, to produce the result.
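A minimal sketch of temporal aggregation, assuming each time step is a small in-memory 2D list. The names `temporal_mean` and `temporal_mean_by` are hypothetical; the second function only imitates the effect of a GROUP BY over a key derived from the timestamp and is not the SOVAS query syntax.

```python
from collections import defaultdict


def temporal_mean(grids):
    """Average a list of equally shaped 2D grids cell by cell."""
    if not grids:
        raise ValueError("need at least one time step")
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(g[r][c] for g in grids) / len(grids) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def temporal_mean_by(stamped_grids, key):
    """Group (timestamp, grid) pairs by key(timestamp), then average each group."""
    groups = defaultdict(list)
    for stamp, grid in stamped_grids:
        groups[key(stamp)].append(grid)
    return {k: temporal_mean(v) for k, v in groups.items()}


series = [("2015-01", [[1.0]]), ("2015-02", [[3.0]]), ("2016-01", [[10.0]])]
# Yearly mean grids: the first four characters of the stamp act as the group key.
print(temporal_mean_by(series, key=lambda stamp: stamp[:4]))
```

Each group produces one grid with the same shape as the inputs, so the result can be passed on to a spatial aggregate or a rendering step.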
Supports spatial neighbourhood analysis. A moving window defines the neighbourhood of each cell in the grid, and all cells contained in the neighbourhood are summarized with a statistical function to obtain a statistic.
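The moving-window idea can be sketched as follows. The function name `focal_stat`, the square window defined by `radius`, and the choice to clip the window at the grid edges are all assumptions made for illustration; the source does not specify the window shape or the edge handling that SOVAS uses.

```python
from statistics import fmean


def focal_stat(grid, radius=1, stat=fmean):
    """Summarize a square neighbourhood around every cell of a 2D grid.

    The window spans (2 * radius + 1) cells per side and is clipped at the
    grid edges (an assumption, not a documented SOVAS behaviour).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for j in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            row.append(stat(window))
        out.append(row)
    return out


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(focal_stat(grid))            # focal mean with a 3 x 3 window
print(focal_stat(grid, stat=max))  # focal maximum with the same window
```

Passing the statistic as a parameter keeps the window logic in one place, so mean, minimum, maximum, or any other reducer can reuse it.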
There are two types of Arithmetic-Functions. One type takes two grids as input and generates a new grid in which each cell value is the result of the arithmetic operation on the two corresponding input cells.
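The two-grid case can be illustrated with a short sketch. The name `grid_arithmetic` is hypothetical, and the operator is passed in as a plain Python function rather than through any SOVAS syntax.

```python
import operator


def grid_arithmetic(a, b, op=operator.sub):
    """Combine two equally shaped 2D grids cell by cell with a binary operator."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same shape")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical daily maximum and minimum temperature grids.
t_max = [[5, 7], [9, 4]]
t_min = [[2, 3], [1, 4]]
print(grid_arithmetic(t_max, t_min))                # [[3, 4], [8, 0]]
print(grid_arithmetic(t_max, t_min, operator.add))  # [[7, 10], [10, 8]]
```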
Returns a new grid in which each cell value equals the boolean result (1 for true, 0 for false) of the logical comparison. The logical operators can be applied to either two raster grids or one grid and a primitive value.
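A small sketch of the logical comparison, covering both the grid-versus-grid and the grid-versus-value cases. The name `grid_compare` is hypothetical; only the 1/0 encoding of the result comes from the description above.

```python
import operator


def grid_compare(grid, other, op=operator.gt):
    """Compare a 2D grid with another grid or a single value.

    Returns a grid of 1 (true) and 0 (false). `other` is treated as a grid
    when it is a list, otherwise as a primitive value.
    """
    is_grid = isinstance(other, list)
    return [
        [int(op(v, other[r][c] if is_grid else other)) for c, v in enumerate(row)]
        for r, row in enumerate(grid)
    ]


grid = [[1, 5], [7, 3]]
print(grid_compare(grid, 4))                 # [[0, 1], [1, 0]]
print(grid_compare(grid, [[2, 2], [9, 3]]))  # [[0, 1], [0, 0]]
```

Because the output is itself a grid of 0s and 1s, it can serve as a mask or be summed by an aggregate to count the cells that satisfy a condition.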
Converts data from one format (e.g., grid) to another (e.g., image). Currently, two functions are supported: GridToPoints and Render (used to convert a grid to an image).
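To give a sense of what a grid-to-points conversion involves, the sketch below turns each valid cell into an (x, y, value) tuple located at the cell centre. This is not the SOVAS GridToPoints implementation: the function name `grid_to_points`, its parameters (`x_min`, `y_max`, `cell_size`, `nodata`), and the north-up, square-cell georeferencing are assumptions for illustration.

```python
def grid_to_points(grid, x_min, y_max, cell_size, nodata=None):
    """Convert a north-up 2D grid into a list of (x, y, value) cell centres.

    Row 0 is assumed to be the northernmost row; cells equal to `nodata`
    are dropped.
    """
    points = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == nodata:
                continue
            x = x_min + (c + 0.5) * cell_size
            y = y_max - (r + 0.5) * cell_size
            points.append((x, y, value))
    return points


grid = [[1, 2], [3, 4]]
for point in grid_to_points(grid, x_min=0.0, y_max=2.0, cell_size=1.0):
    print(point)
```

A record-per-cell layout like this is what lets gridded values be joined with station records or passed to point-based spatial functions.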
ESRI Hive Spatial UDF
SOVAS supports ESRI Hive Spatial User Defined Functions (UDFs). Esri Hive UDFs can be used along with SOVAS functions to build more advanced spatial analysis queries.
Acknowledgement: SOVAS is developed by Zhenlong Li at the Geoinformation and Big Data Research Lab (GIBD), University of South Carolina (USC). An earlier prototype of SOVAS was supported in part by the USC Office of the Vice President for Research and the Federation of Earth Science Information Partners (ESIP).