Tutorial: Large-Scale Array Analytics: Taming the Data Tsunami

Peter Baumann - Jacobs University

Tutorial slides


Never before in history has mankind collected data at the rates we face today. In 2002 alone, an estimated 403 Petabytes of data were acquired, equivalent to all printed information ever created before. Earth-orbiting satellites, space observatories, and ground, airborne, and underwater sensors scan their environment at unprecedented resolutions, giving rise to "Big Science". The same holds for the life sciences, where genomic data, high-resolution scans, and other modalities are collected in steadily growing streams. Social network analysis, OLAP, and stock exchange trading represent further examples, the latter involving real-time correlation of thousands of ticker time series, resulting in Terabytes of data to be analysed in a single run.

Summarized under the term Large-Scale Analytics, we are witnessing an exploding demand for flexible access to massive volumes of scientific and business data sets. Arguably, a large class of these massive data sets is represented by multi-dimensional arrays. Consequently, large arrays pose new challenges to data modelling, querying, optimization, and maintenance; in short: we need Large-Scale Array Analytics.
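To make the notion concrete, the core operations an array query language must support, such as subsetting, cell-wise processing, and aggregation, can be sketched with NumPy. This is an illustrative analogy on synthetic data, not rasdaman's actual rasql syntax:

```python
import numpy as np

# Synthetic 3-D datacube with axes (time, y, x):
# a stack of 4 images, each 6x8 pixels. Values are artificial.
cube = np.arange(4 * 6 * 8, dtype=np.float64).reshape(4, 6, 8)

# Subsetting (trimming): select a spatio-temporal region of interest.
roi = cube[1:3, 2:5, 0:4]          # 2 time steps, 3x4 pixel window

# Cell-wise processing: apply a function to every cell,
# e.g. a hypothetical radiometric correction.
scaled = roi * 0.5 + 10.0

# Aggregation (condensing): reduce the region to one statistic per time step.
means = scaled.mean(axis=(1, 2))   # one average per time slice
```

In an array DBMS these three steps would be phrased declaratively in a single query and optimized by the server, rather than executed imperatively on the client as above.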

This tutorial introduces the topic from a database perspective. Aspects addressed include modelling, query languages, query optimization and parallelization, and storage management. Particular emphasis will be placed on applications in "Big Science", notably the geo, space, and life sciences; real-life use cases stemming from our 15 years of experience with the open-source rasdaman array DBMS and our work on geo raster service standardization will be presented and discussed. We will highlight requirements, achievements, open research issues, and avenues for future research. The discussion will draw on real-life examples, many of which Internet-connected participants can replay hands-on.



Peter Baumann - Center for Advanced Systems Engineering (CASE), Jacobs University

Dr. Peter Baumann is Professor of Computer Science at Jacobs University Bremen. Since 1991 he has been researching large-scale multidimensional array ("raster") analytics and its application to the geo, space, and life sciences. He has published 100+ book chapters and journal/conference articles in the area of array databases and further fields. He is the principal architect of the first complete array DBMS, rasdaman. He holds international patents on raster database technology and has received numerous national and international innovation awards for his work, such as the European IT Prize. He is founder and CEO of a research spin-off specializing in software solutions for large-scale raster services.

In the Open Geospatial Consortium (OGC) standardization body he chairs the working groups on geo raster service standards. He is currently editor of 9 adopted standards and several candidate standards. Further, he is a member of the Commission for the Management and Application of Geoscience Information (a Commission of the International Union of Geological Sciences), an advisor to national spatial data infrastructure bodies, and an expert to the European spatial data infrastructure legislation, INSPIRE.

See www.faculty.jacobs-university.de/pbaumann for more information.