Scientific data infrastructure for combinatorial material science
Sigurd Thienhaus, Ruhr-Universität Bochum, Bochum, Germany; Lars Banko, Ruhr-Universität Bochum, Bochum, Germany; Alfred Ludwig, Ruhr-Universität Bochum, Bochum, Germany
Data mining by statistical and machine learning methods is an emerging topic in material science. Advanced algorithms can find patterns in large data sets that are beyond human capabilities, and they can accelerate the analysis of complex data. Combinatorial material science generates large, comparable data sets from materials libraries, which makes them well suited to data mining. Aggregating those data sets within a research group, or even across a scientific community, provides the opportunity to generate knowledge based on non-trivial correlations. The basis for this approach is solid data management, which ensures a high degree of reusability through appropriate data curation. Here, we present our recent achievements in developing a customized scientific data infrastructure. The solution consists of a commercially available, customizable document management system, a terminal-server-based IT infrastructure, and software tools developed in-house. The main purpose of this data infrastructure is to track all data and information about a materials library throughout the whole sample lifecycle, from experimental planning and synthesis through processing to characterization and analysis. We show that standardizing data acquisition, pre-processing, and storage promotes time-efficient, machine-assisted data analysis. The use of terminal servers guarantees access from various devices (computers, tablets, smartphones) and operating systems (Windows, Linux, OS X, iOS, Android, etc.) while also improving data security. Remote applications, which are easy to deploy and update, reduce maintenance effort. An additional benefit is the structured storage of knowledge, which counteracts the rapid staff turnover typical of university research.
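As an illustration of lifecycle tracking for a materials library, the following minimal Python sketch attaches records to a library and reports which lifecycle stages have no data yet. It is not the infrastructure described above: the names `Stage`, `Record`, `MaterialsLibrary`, the identifier `ML-0001`, and the file paths are hypothetical and chosen only for this example. Only the stage names follow the lifecycle listed in the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Lifecycle stages as listed in the abstract, in order."""
    PLANNING = 1
    SYNTHESIS = 2
    PROCESSING = 3
    CHARACTERIZATION = 4
    ANALYSIS = 5


@dataclass
class Record:
    """One piece of data or information attached to a library (hypothetical schema)."""
    stage: Stage
    description: str
    file_path: str  # hypothetical storage location


@dataclass
class MaterialsLibrary:
    """A materials library with all records collected over its lifecycle."""
    library_id: str
    records: list = field(default_factory=list)

    def add_record(self, stage, description, file_path):
        self.records.append(Record(stage, description, file_path))

    def missing_stages(self):
        """Return the stages, in lifecycle order, that have no record yet."""
        covered = {r.stage for r in self.records}
        return [s for s in Stage if s not in covered]


# Hypothetical usage: two records, then a completeness check.
lib = MaterialsLibrary("ML-0001")
lib.add_record(Stage.SYNTHESIS, "deposition log", "ML-0001/synthesis/log.txt")
lib.add_record(Stage.CHARACTERIZATION, "measurement map", "ML-0001/characterization/map.csv")
print([s.name for s in lib.missing_stages()])
# prints ['PLANNING', 'PROCESSING', 'ANALYSIS']
```

Keeping every record tied to one library identifier and one stage is what would allow such a completeness check to run automatically before data sets are aggregated for data mining.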