#6 From Big Data to Smart Data: Big Data Analytics on Simulation Datasets
By Heidi Dahl, PhD, Research Scientist, SINTEF
Generating and analysing big datasets is not a new challenge in science and engineering. From seismic surveys of potential oil fields, through flow simulations in hydropower plants, to meteorological models, limited storage space and computational resources have always been central constraints on the size and complexity of the problems that can be solved.
Processing power and storage space have since become cheaper and more efficient, and companies such as Google and Facebook have developed infrastructures and algorithms that exploit this to extract valuable information from enormous datasets. These Big Data tools have largely focused on large volumes of textual data, such as customer databases, financial data, and personal status updates, where the information can easily be divided into manageable chunks.
Such tools are now also used in science and engineering, where distributed storage and processing enable the analysis of larger and more complex systems. However, the intrinsic structure of these data brings new challenges and requires existing Big Data methods and infrastructures to be adapted.
On the other hand, the spatial and physical structure of our data provides additional information that should be exploited when mining these datasets. By incorporating these aspects of our data, we move from Big Data to Smart Data, making it easier to extract general trends as well as local features. For large simulation datasets, we approach the challenge of integrating these diverse aspects into one model by using Locally Refined (LR) splines, which produce a compact model well suited for visualization and interrogation. LR splines adapt their data structure and degrees of freedom to the distribution of the data, concentrating detail in areas with large local variation while using a coarser structure where less detail is needed.
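The principle of spending degrees of freedom only where the data vary strongly can be illustrated with a much simpler one-dimensional analogue. The sketch below is not an LR spline implementation (LR splines are multivariate B-splines refined by local knot-line insertion); it is a hypothetical piecewise-linear approximation, with an illustrative function name `adaptive_knots` and tolerance `tol`, that bisects any interval whose maximum deviation from the data exceeds the tolerance.

```python
import numpy as np


def adaptive_knots(x, y, tol, max_iter=50):
    """Hypothetical 1D analogue of local refinement (not LR splines).

    Returns sorted indices into x that serve as knots of a piecewise-linear
    approximation. An interval is bisected only if the linear segment between
    its end knots deviates from the data by more than tol, so knots cluster
    where the data vary rapidly and stay sparse where the data are smooth.
    """
    knots = [0, len(x) - 1]
    for _ in range(max_iter):
        new_knots = [knots[0]]
        refined = False
        for a, b in zip(knots[:-1], knots[1:]):
            if b - a > 1:
                # Linear segment through the two end knots, evaluated on the interval's data
                seg = np.interp(x[a:b + 1], [x[a], x[b]], [y[a], y[b]])
                err = np.max(np.abs(seg - y[a:b + 1]))
                if err > tol:
                    # Local refinement: insert a knot only inside this interval
                    new_knots.append((a + b) // 2)
                    refined = True
            new_knots.append(b)
        knots = new_knots
        if not refined:
            break
    return knots


# Synthetic data with a sharp front at x = 0.5 and flat regions elsewhere
x = np.linspace(0.0, 1.0, 1001)
y = np.tanh(50.0 * (x - 0.5))
knots = adaptive_knots(x, y, tol=1e-3)
approx = np.interp(x, x[knots], y[knots])
```

Most of the resulting knots lie near the front, while the flat tails are represented by very few; this mirrors, in a greatly simplified form, the coarse-where-possible and fine-where-needed behaviour described above.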
In this presentation we will share our experiences from two recently completed European Big Data projects, IQmulus and VELaSSCo, showing how LR spline models of simulation data enable high-quality visualization of and interaction with the data. We will also outline our current and future research in Big Data analytics on simulation datasets, combining LR splines, statistical modelling, and techniques from the field of Artificial Intelligence.