A Scalable Online Visual Analytic System for Big Climate Data Analysis





Explore big climate data in SOVAS,
get results in minutes.

Go to SOVAS Web Portal

Why and What is SOVAS?

Big climate data offers great opportunities for scientific discovery but demands efficient and effective analytics to investigate unknown and complex patterns. Most existing online processing and analytics systems for climate studies support only a fixed user interface with predefined functions. These systems often cannot scale to massive climate data, which can easily accumulate at terabytes per day. To address these limitations, a scalable online visual analytic system, known as SOVAS, was developed to balance usability and flexibility. Enabled by a set of key techniques, SOVAS supports large-scale climate data analytics and knowledge discovery in a scalable and sharable environment. This research contributes an efficient tool for analyzing big climate data to the community, and it contributes to the literature by providing technical references for tackling spatiotemporal big data challenges.

Heterogeneous data integration

Array-based data and record-based data are integrated, and both can be queried with Structured Query Language (SQL).

Parallel query analytics

Data-intensive processing and analysis are performed in parallel. A hybrid query processing engine runs on a highly scalable Hadoop cluster.

Extended SQL

The extended SQL gives users the flexibility to apply complex spatial analytical functions to the original datasets.

Query Demonstration






Raster Data: NASA MERRA


MERRA is a reanalysis of global climate observation data for the satellite era. Focusing on historical analyses of the hydrological cycle over a broad range of weather and climate time scales, MERRA is a project of the NASA Global Modeling and Assimilation Office at NASA Goddard Space Flight Center (GSFC). Three representative MERRA data products (~2 terabytes) with different spatial and temporal resolutions and dimensions are loaded in the current system.

Point Data: Meteorological Station Measurements


The Global Historical Climatology Network-Daily dataset (GHCN-Daily, Menne et al., 2012) was downloaded from the NOAA National Climatic Data Center (http://doi.org/10.7289/V5D21VHZ). GHCN-Daily contains about 2.5 billion measurements for over 137 variables (such as precipitation, snowfall, and maximum and minimum temperature) from more than 100,725 meteorological stations worldwide, spanning 1763 to 2016.

Supported Spatiotemporal Functions


Spatial Functions

Aggregation performed spatially across the study area, producing a single value. These functions are often used together with temporal functions.
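The idea of a spatial aggregation can be sketched in a few lines of plain Python. This is only an illustration of the concept, not SOVAS code: the function name spatial_mean, the use of None for missing cells, and the sample values are all assumptions made for this example.

```python
from typing import List, Optional

# A grid is a list of rows; None marks a missing cell (an assumption of this sketch).
Grid = List[List[Optional[float]]]


def spatial_mean(grid: Grid) -> float:
    """Collapse one grid to a single value by averaging all non-missing cells."""
    values = [v for row in grid for v in row if v is not None]
    if not values:
        raise ValueError("grid has no valid cells")
    return sum(values) / len(values)


# One time step of a small temperature grid (made-up values).
temperature = [
    [280.0, 282.0],
    [284.0, None],
]
print(spatial_mean(temperature))
```

Applied to every time step in turn, a function like this yields one value per timestamp, which is why spatial and temporal functions are often combined.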

Temporal Functions

Aggregation performed temporally along the timestamps for each cell location, producing a single grid. These functions are usually used with the GROUP BY keyword and combined with other functions, such as Avg_S and Render, to produce the result.
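A temporal aggregation reduces a stack of grids to one grid. The sketch below shows the idea in plain Python; the name temporal_mean and the sample data are hypothetical and are not part of SOVAS.

```python
from typing import List

Grid = List[List[float]]


def temporal_mean(time_steps: List[Grid]) -> Grid:
    """Average a stack of equally shaped grids along time, cell by cell."""
    if not time_steps:
        raise ValueError("need at least one time step")
    n_rows, n_cols = len(time_steps[0]), len(time_steps[0][0])
    count = len(time_steps)
    return [
        [sum(step[r][c] for step in time_steps) / count for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Two time steps of a 2 x 2 grid (made-up values).
stack = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
mean_grid = temporal_mean(stack)
print(mean_grid)
```

Grouping the time steps first (for example by month) and calling the function once per group would give one output grid per group, which mirrors the role GROUP BY plays in a query.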

Neighbour Functions

These functions support spatial neighbourhood analysis. A moving window defines the neighbourhood of each cell in the grid, and all cells within the neighbourhood are summarized with a statistical function to obtain a single statistic for that cell.
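A moving-window (focal) operation can be sketched as follows. This is a minimal illustration that assumes a square window, the mean as the statistic, and windows clipped at the grid edge; the name focal_mean is hypothetical, and SOVAS may handle window shape and edges differently.

```python
from typing import List

Grid = List[List[float]]


def focal_mean(grid: Grid, radius: int = 1) -> Grid:
    """Replace each cell with the mean of its (2 * radius + 1)-wide square window.

    Windows are clipped at the grid boundary, so edge cells use fewer neighbours.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for j in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            out_row.append(sum(window) / len(window))
        result.append(out_row)
    return result


sample = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
smoothed = focal_mean(sample)
print(smoothed[1][1], smoothed[0][0])
```

The centre cell averages all nine values, while the top-left corner averages only the four cells that fall inside the grid.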

Arithmetic Functions

There are two types of arithmetic functions. One type takes two grids as input and generates a new grid in which each cell value is the result of an arithmetic operation on the two corresponding input cells.
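The two-grid case can be sketched in plain Python as a cell-by-cell operation. The name grid_arithmetic and the sample grids are assumptions for illustration only and do not reflect the SOVAS function names.

```python
import operator
from typing import Callable, List

Grid = List[List[float]]


def grid_arithmetic(a: Grid, b: Grid, op: Callable[[float, float], float]) -> Grid:
    """Apply a binary arithmetic operator to corresponding cells of two grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same shape")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Daily temperature range as maximum minus minimum (made-up values).
t_max = [[30.0, 28.0], [25.0, 27.0]]
t_min = [[20.0, 21.0], [15.0, 19.0]]
t_range = grid_arithmetic(t_max, t_min, operator.sub)
print(t_range)
```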

Logical Functions

These functions return a new grid in which each cell value is the Boolean result (1 for true, 0 for false) of a logical comparison. The logical operators can be applied either to two raster grids or to one grid and a primitive value.
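Both forms of comparison, grid against grid and grid against a single value, can be sketched with one helper. The name grid_compare and the sample data are hypothetical; this shows the semantics described above rather than the SOVAS implementation.

```python
import operator
from typing import Callable, List, Union

Grid = List[List[float]]


def grid_compare(
    left: Grid,
    right: Union[Grid, float],
    op: Callable[[float, float], bool],
) -> List[List[int]]:
    """Compare each cell of `left` with the matching cell of `right`
    (or with `right` itself when it is a single number); 1 = true, 0 = false."""

    def right_value(r: int, c: int) -> float:
        return right[r][c] if isinstance(right, list) else right

    return [
        [1 if op(value, right_value(r, c)) else 0 for c, value in enumerate(row)]
        for r, row in enumerate(left)
    ]


precipitation = [[5.0, 12.0], [20.0, 0.0]]
threshold_mask = grid_compare(precipitation, 10.0, operator.gt)   # grid vs. value
baseline = [[6.0, 12.0], [10.0, 1.0]]
above_baseline = grid_compare(precipitation, baseline, operator.gt)  # grid vs. grid
print(threshold_mask, above_baseline)
```

A 0/1 grid of this kind can then serve as a mask in later arithmetic steps.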

Convert Functions

These functions convert data from one format (e.g., a grid) to another (e.g., an image). Currently, two functions are supported: GridToPoints, which converts a grid to points, and Render, which converts a grid to an image.
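As a rough illustration of a grid-to-points conversion, the sketch below turns each cell into a (longitude, latitude, value) tuple located at the cell centre. It is not the SOVAS GridToPoints implementation: the function name grid_to_points, the parameters west, north, and cell_size, and the cell-centre convention are all assumptions made for this example.

```python
from typing import List, Tuple

Grid = List[List[float]]
Point = Tuple[float, float, float]  # (longitude, latitude, value)


def grid_to_points(grid: Grid, west: float, north: float, cell_size: float) -> List[Point]:
    """Convert a north-up grid to a list of points at the cell centres.

    `west` and `north` are the coordinates of the grid's upper-left corner,
    and `cell_size` is the width and height of one cell in the same units.
    """
    points = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            lon = west + (c + 0.5) * cell_size
            lat = north - (r + 0.5) * cell_size
            points.append((lon, lat, value))
    return points


cells = [[1.0, 2.0], [3.0, 4.0]]
points = grid_to_points(cells, west=0.0, north=2.0, cell_size=1.0)
print(points)
```

Once a grid is expressed as records like these, it can be handled in the same way as other record-based point data.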

ESRI Hive Spatial UDF

SOVAS supports ESRI Hive Spatial User Defined Functions (UDFs). ESRI Hive UDFs can be used together with SOVAS functions to build more advanced spatial analysis queries.

REST API


SOVAS provides RESTful APIs that enable users to build their own query-based analytical web applications.

REST API Demonstration

Web Portal


Feel free to try out the SOVAS Web Portal to explore the data.

SOVAS Web Portal

Contact SOVAS


Please contact Dr. Zhenlong Li with any questions or comments.

zhenlong@sc.edu
Geoinformation and Big Data Research Lab
Department of Geography, University of South Carolina


Acknowledgement: SOVAS was developed by Zhenlong Li at the Geoinformation and Big Data Research Lab (GIBD), University of South Carolina (USC). An earlier prototype of SOVAS was supported in part by the USC Office of the Vice President for Research and the Federation of Earth Science Information Partners (ESIP).