Over the last decades, technological improvements have drastically changed the way we collect, explore and analyze data. Heterogeneous sources, different formats, and huge quantities of data have led to new ways of extracting information from data and understanding it. So far, new visualization tools and data analysis techniques have been developed to understand and highlight data correlations.
Nevertheless, data visualisation remains a challenging topic. Complex phenomena, such as temperature evolution, pollution spread and natural disasters, require very heterogeneous and complex data structures that need a first exploration phase to be better understood before analysis. Therefore, to achieve meaningful results in a limited amount of time, a new generation of visualisation techniques must be provided to scientists. For example, how can datasets of images be summarised? How can .json data be depicted? How can data from different sources be compared? Is it possible to define new relevant statistics?
In this challenge, we want to answer the above questions and more. We want you and your team to extend traditional graphs and visualisation tools into more sophisticated ones and create an instrument that helps data scientists and researchers to better characterise, explore, understand and summarise data from heterogeneous sources and formats, for exploratory purposes.
In particular, the expected result is a dashboard (web-application) that allows users to:
1. upload one dataset consisting of a single file (e.g. a .csv), one dataset made of multiple files (e.g. images), or multiple datasets related to each other (e.g. temperature as .csv and geographical granularity as .geojson)
2. display visualisations and statistics:
a. if the dataset consists of one single file
i. display a preview of the file. For example with a .csv format a table is normally used. What would happen for .json format? And for .pdf? For .png? It is up to you to define the best previews based on the format of the dataset.
ii. define and display a data summary, meaningful statistics and visualizations based on the file format
b. if the dataset consists of multiple files
i. display the preview of each file (as in 2.a.i). Is there a way to display one overall preview for the multiple files? How can you show whether connections exist among these files? Can you infer them with a (semi-)automatic procedure?
ii. define and visualize meaningful statistics and graphs for each dataset (as explained in 2.a.ii) and do the same for the overall information of all files, highlighting the relations between them
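As a starting point for points 2.a.i and 2.b.i, the sketch below shows one possible way to pick a preview style from the file extension and to look for candidate connections between a .csv file and a .geojson file. It is only an illustration under stated assumptions: the extension-to-preview mapping and the function names (`preview_kind`, `csv_header`, `geojson_property_names`, `candidate_links`) are hypothetical, the challenge leaves the actual preview choices to you, and shared field names are only a hint of a relation, not proof of one.

```python
import csv
import io
import json
from pathlib import Path

# Hypothetical mapping from file extension to a preview style.
# The challenge leaves the actual choice open; these are placeholders.
PREVIEW_BY_EXTENSION = {
    ".csv": "table",
    ".json": "tree",
    ".geojson": "map",
    ".png": "thumbnail",
    ".pdf": "first-page",
}


def preview_kind(filename: str) -> str:
    """Return the preview style for a file, falling back to raw text."""
    return PREVIEW_BY_EXTENSION.get(Path(filename).suffix.lower(), "raw-text")


def csv_header(text: str) -> list[str]:
    """Return the header row of CSV content (empty list if there is none)."""
    reader = csv.reader(io.StringIO(text))
    return next(reader, [])


def geojson_property_names(text: str) -> set[str]:
    """Collect the property names used by the features of a GeoJSON document."""
    data = json.loads(text)
    names: set[str] = set()
    for feature in data.get("features", []):
        # "properties" may be null in valid GeoJSON, hence the "or {}".
        names.update((feature.get("properties") or {}).keys())
    return names


def candidate_links(csv_text: str, geojson_text: str) -> set[str]:
    """Field names present in both files: candidates for a join key."""
    return set(csv_header(csv_text)) & geojson_property_names(geojson_text)
```

A dashboard could show the result of `candidate_links` to the user as suggested relations to confirm or reject, which would make the procedure semi-automatic rather than fully automatic.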
Datasets can be found and downloaded from Zenodo.org. Here are some example datasets you can use (but you are not limited to them)
Information for students: It would be useful to know state-of-the-art visualisation tools for (big) data and to have design skills and a web development background
Additional information: Is the challenge linked to a particular project/company? Yes, it is linked with the CERN Science for Open Data (CS4OD) project