Prof. Dr. Claus Weihs

Computational Statistics
TU Dortmund University


  • Development and Implementation of Statistical Methods for Quality Optimization in the Large-Format Lithium-Ion Cells Production
    Meyer, O. and Weihs, C. and Mähr, S. and Tran, H.-Y. and Kirchhof, M. and Schnackenberg, S. and Neuhaus-Stern, J. and Rößler, S. and Braunwarth, W.
    Energy Technology 8 (2020)
    Herein, two techniques to optimize the production process of large-format lithium-ion cells for plug-in hybrid electric vehicles using data-driven methods are introduced and demonstrated. The first approach uses standard settings of the quality influencing factors to maximize the number of produced electrode sheets that meet predefined quality specifications. The second approach uses statistical methods to determine the levels of the quality influencing factors of a certain process that optimizes all quality parameters of the corresponding product jointly. © 2019 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
    DOI: 10.1002/ente.201900244
  • Multi-Criteria Optimization in the Production of Lithium-Ion Batteries
    Kornas, T. and Wittmann, D. and Daub, R. and Meyer, O. and Weihs, C. and Thiede, S. and Herrmann, C.
    Procedia Manufacturing 43 (2020)
    Lithium-ion-batteries (LIBs) play a key role in determining the environmental impacts of future mobility technologies. In particular, the production of LIBs has a strong environmental impact as it is characterized by high scrap rates. In addition to existing expert-based approaches for the identification of quality drivers in production, a trend towards data-driven methods is discernible. Nevertheless, most approaches show shortcomings in the involvement of multi-criteria optimization. Therefore, this paper uses desirability functions to jointly optimize several quality parameters. Validation was conducted based on the data of an assembly line for prismatic LIBs. © 2020 The Authors. Published by Elsevier B.V.
    DOI: 10.1016/j.promfg.2020.02.113
  • Optimal node subset with minimal graph kernel prediction error in an electrical transmission system via evolutionary algorithms
    Surmann, D. and Ligges, U. and Weihs, C.
    Electric Power Systems Research 175 (2019)
    Electrical Transmission Systems consist of a huge number of nodes with different types of measurements available. Our aim is to derive a subset of nodes which contains almost sufficient information to describe the voltage magnitude and angle at all nodes of the whole energy network. The objective is to minimise the data transmission from the nodes of an energy network to a data analysis centre by using a representative subset of nodes. In consequence we optimise the data analysis task. We derive a parameter set which characterises every single measuring node, on the basis of Low Frequency Oscillation data. Via analysing the behaviour of each node with respect to its neighbours, we construct a feasible random field metamodel over the whole transmission system. The metamodel works in a discrete spatial domain with a non-isotropic distance function. We derive a statistic to evaluate the metamodel using information from a subset of measuring nodes. Using an evolutionary algorithm, we optimise the selected subset with respect to the target statistic. This results in an optimal subset of nodes which represents the network regarding voltage magnitude and angle sufficiently well. © 2019 Elsevier B.V.
    DOI: 10.1016/j.epsr.2019.105915
  • Predicting industrial-scale cell culture seed trains–A Bayesian framework for model fitting and parameter estimation, dealing with uncertainty in measurements and model parameters, applied to a nonlinear kinetic cell culture model, using an MCMC method
    Hernández Rodríguez, T. and Posch, C. and Schmutzhard, J. and Stettner, J. and Weihs, C. and Pörtner, R. and Frahm, B.
    Biotechnology and Bioengineering 116 (2019)
    For production of biopharmaceuticals in suspension cell culture, seed trains are required to increase cell number from cell thawing up to production scale. Because cultivation conditions during the seed train have a significant impact on cell performance in production scale, seed train design, monitoring, and development of optimization strategies is important. This can be facilitated by model-assisted prediction methods, whereby the performance depends on the prediction accuracy, which can be improved by inclusion of prior process knowledge, especially when only few high-quality data are available, and description of inference uncertainty, providing, apart from a “best fit”-prediction, information about the probable deviation in form of a prediction interval. This contribution illustrates the application of Bayesian parameter estimation and Bayesian updating for seed train prediction to an industrial Chinese hamster ovary cell culture process, coupled with a mechanistic model. It is shown in which way prior knowledge as well as input uncertainty (e.g., concerning measurements) can be included and be propagated to predictive uncertainty. The impact of available information on prediction accuracy was investigated. It has been shown that through integration of new data by the Bayesian updating method, process variability (i.e., batch-to-batch) could be considered. The implementation was realized using a Markov chain Monte Carlo method. © 2019 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc.
    DOI: 10.1002/bit.27125
  • A new dynamic weighted majority control chart for data streams
    Mejri, D. and Limam, M. and Weihs, C.
    Soft Computing 22 (2018)
    Dynamics are fundamental properties of batch learning processes. Recently, monitoring dynamic processes has interested many researchers due to the importance of dealing with time-changing data stream processes in real-world applications. In this article, a dynamic weighted majority (DWM)-based identification model is proposed for monitoring small, large as well as covariate shifts in nonstationary processes. The proposed method applies DWM ensemble method to aggregate decisions of different control charts to improve single charts’ performances and to reduce the risk of choosing a nonadequate chart. Also in order to improve the shift adaptation mode, a prediction of class label is used to help in classifying the shift during the changing of the process toward the approximated right direction. The new proposed ensemble chart has the ability to deal with complex datasets and presents a concrete shift identification method based on a classification learning technique of changes in nonstationary processes. © 2016 Springer-Verlag Berlin Heidelberg
    DOI: 10.1007/s00500-016-2351-3
  • Predicting measurements at unobserved locations in an electrical transmission system
    Surmann, D. and Ligges, U. and Weihs, C.
    Computational Statistics 33 (2018)
    Electrical transmission systems consist of a huge number of locations (nodes) with different types of measurements available. Our aim is to derive a subset of nodes which contains almost sufficient information to describe the whole energy network. We derive a parameter set which characterises every single measuring location or node, respectively. Via analysing the behaviour of each node with respect to its neighbours, we construct a feasible random field metamodel over the whole transmission system. The metamodel is used to smooth the measurements across the network. In the next step we work with a subset of locations to predict the unobserved ones. We derive different graph kernels to define the missing covariance matrix from the neighbourhood structures of the network. This results in a metamodel that is able to smooth observed and predict unobserved locations in a spatial domain with non-isotropic distance functions. © 2017, Springer-Verlag Berlin Heidelberg.
    DOI: 10.1007/s00180-017-0734-2
  • Workload of German professors in 2016 [Arbeitszeiten von Professorinnen und Professoren in Deutschland 2016]
    Weihs, C. and Hernández Rodríguez, T. and Doeckel, M. and Marty, C. and Wormer, H.
    AStA Wirtschafts- und Sozialstatistisches Archiv 12 (2018)
    In this study, we determine reliable prediction intervals for the weekly total workload of active German full-time university professors from a 2016 survey and prior information from earlier studies. Additionally, also workloads for subtasks are determined. The aim to develop a detailed image of the workload of different subtasks and subject groups is combined with the methodological question whether frequentist and Bayesian analyses lead to similar results in this example. From the valid questionnaires, a mean of 56 h and 95 %-prediction intervals within [ 35, 80 ] h arise as direct estimates of the weekly total workload. Frequentist and Bayesian analysis lead to similar results, subject groups and sexes differ only slightly. Total workload estimated as the sum of workloads of subtasks reaches a significantly higher mean of 63 h and distinctly different 95 %-prediction intervals in the Bayesian case with [ 43, 85 ] h and in the frequentist case with approximately [ 27, 114 ] h. Therefore, measurements of the total workload from independently determined workloads of subtasks only appear to be reliable if a Bayesian analysis with prior information on the total workload is carried out since sums of independent parts of the workload appear to be bigger than direct estimates of the total workload, in the mean as well as in variation. Possible reasons are insufficient distinction between subtasks, missing overview over the total workload already specified when no counter is shown during completion of the questionnaire, or underestimation of the real total workload when only a global estimate is asked for. The share of research related activities of the total workload is with approximately 60 % distinctly higher than the share of teaching and student examination with 23 % and the share of administration with 17 %. 
    The greatest significant differences of subject groups appear between Humanities and Social Sciences and one of the other subject groups, for the total workload as well as for workloads for subtasks. The difference between the mean total workload of female and male professors appears to be small. © 2018, Springer-Verlag GmbH Germany, part of Springer Nature.
    DOI: 10.1007/s11943-018-0227-y
  • A computational study of auditory models in music recognition tasks for normal-hearing and hearing-impaired listeners
    Friedrichs, K. and Bauer, N. and Martin, R. and Weihs, C.
    Eurasip Journal on Audio, Speech, and Music Processing 2017 (2017)
    The benefit of auditory models for solving three music recognition tasks—onset detection, pitch estimation, and instrument recognition—is analyzed. Appropriate features are introduced which enable the use of supervised classification. The auditory model-based approaches are tested in a comprehensive study and compared to state-of-the-art methods, which usually do not employ an auditory model. For this study, music data is selected according to an experimental design, which enables statements about performance differences with respect to specific music characteristics. The results confirm that the performance of music classification using the auditory model is comparable to the traditional methods. Furthermore, the auditory model is modified to exemplify the decrease of recognition rates in the presence of hearing deficits. The resulting system is a basis for estimating the intelligibility of music which in the future might be used for the automatic assessment of hearing instruments. © 2017, The Author(s).
    DOI: 10.1186/s13636-017-0103-7
  • Spectral complexity reduction of music signals based on frequency-domain reduced-rank approximations: An evaluation with cochlear implant listeners
    Nagathil, A. and Weihs, C. and Neumann, K. and Martin, R.
    Journal of the Acoustical Society of America 142 (2017)
    Methods for spectral complexity reduction of music signals were evaluated in a listening test with cochlear implant (CI) listeners. To this end, reduced-rank approximations were computed in the constant-Q spectral domain using blind and score-informed dimensionality reduction techniques, which were compared to a procedure using a supervised source separation and remixing scheme. Previous works have shown that timbre and pitch cues are transmitted inaccurately through CIs and thus cause perceptual distortions in CI listeners. Hence, the scope of this evaluation was narrowed down to classical chamber music, which is mainly characterized by timbre and pitch and less by rhythmic cues. Suitable music pieces were selected in accordance to a statistical experimental design, which took musically relevant influential factors into account. In a blind two-alternative forced choice task, 14 CI listeners were asked to indicate a preference either for the original signals or a specific processed variant. The results exhibit a statistically significant preference rate of up to 74% for the reduced-rank approximations, whereas the source separation and remixing scheme did not provide any improvement. © 2017 Acoustical Society of America.
    DOI: 10.1121/1.5000484
  • Statistical analysis of sequential process chains based on Errors-in-Variables models
    Meyer, O. and Weihs, C.
    2016 IEEE Symposium Series on Computational Intelligence, SSCI 2016 (2017)
    A process chain comprises a series of sequential (production) processes, mostly in the area of manufacturing engineering. It describes a consecutive sequence of activities, which together form one single system. Within this system the sub-processes are presumed to influence each other by transferring characteristics. The single process steps of such a system can easily be simulated using regression (or other statistical learning) methods. The main obstacle in simulating entire process chains, however, is to determine how to handle prediction uncertainty in the transferred characteristics. In this paper, we will describe how using Errors-in-Variables models instead of ordinary regression models can solve this problem. We especially focus on the question of how uncertainty (measured by variance) develops through the process chain and how it influences the results along the process chain. We will also discuss how the presented methods can be applied to the field of process control. At this point, our research is mainly limited to polynomial regression, but the basic principles can be applied to other statistical learning techniques, including classification and time series as well. © 2016 IEEE.
    DOI: 10.1109/SSCI.2016.7850058
  • A comparative study on large scale kernelized support vector machines
    Horn, D. and Demircioğlu, A. and Bischl, B. and Glasmachers, T. and Weihs, C.
    Advances in Data Analysis and Classification (2016)
    Kernelized support vector machines (SVMs) belong to the most widely used classification methods. However, in contrast to linear SVMs, the computation time required to train such a machine becomes a bottleneck when facing large data sets. In order to mitigate this shortcoming of kernel SVMs, many approximate training algorithms were developed. While most of these methods claim to be much faster than the state-of-the-art solver LIBSVM, a thorough comparative study is missing. We aim to fill this gap. We choose several well-known approximate SVM solvers and compare their performance on a number of large benchmark data sets. Our focus is to analyze the trade-off between prediction error and runtime for different learning and accuracy parameter settings. This includes simple subsampling of the data, the poor-man’s approach to handling large scale problems. We employ model-based multi-objective optimization, which allows us to tune the parameters of learning machine and solver over the full range of accuracy/runtime trade-offs. We analyze (differences between) solvers by studying and comparing the Pareto fronts formed by the two objectives classification error and training time. Unsurprisingly, given more runtime most solvers are able to find more accurate solutions, i.e., achieve a higher prediction accuracy. It turns out that LIBSVM with subsampling of the data is a strong baseline. Some solvers systematically outperform others, which allows us to give concrete recommendations of when to use which solver. © 2016 Springer-Verlag Berlin Heidelberg
    DOI: 10.1007/s11634-016-0265-7
  • Big data classification - Aspects on many features
    Weihs, C.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9580 (2016)
    In this paper we discuss the performance of classical classification methods on Big Data. We concentrate on the case with many more features than observations and discuss the dependence of classification methods on the distance of the classes and their behavior for many noise features. The examples in this paper show that standard classification methods should be rechecked for Big Data. © Springer International Publishing Switzerland 2016.
    DOI: 10.1007/978-3-319-41706-6_6
  • Big data classification: Aspects on many features and many observations
    Weihs, C. and Horn, D. and Bischl, B.
    Studies in Classification, Data Analysis, and Knowledge Organization (2016)
    In this paper we discuss the performance of classical classification methods on Big Data. We distinguish the cases many features and many observations. For the many features case we look at projection methods, distance-based methods, and feature selection. For the many observations case we mainly consider subsampling. The examples in this paper show that standard classification methods should not be blindly applied to Big Data. © Springer International Publishing Switzerland 2016.
    DOI: 10.1007/978-3-319-25226-1_10
  • Fast model based optimization of tone onset detection by instance sampling
    Bauer, N. and Friedrichs, K. and Bischl, B. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2016)
    There exist several algorithms for tone onset detection, but finding the best one is a challenging task, as there are many categorical and numerical parameters to optimize. The aim of this task is to detect as many true onsets as possible while avoiding false detections. In recent years, model-based optimization (MBO) has been introduced for solving similar problems. The main idea of MBO is modeling the relationship between parameter settings and the response by a so-called surrogate model. After evaluating the points of an initial design (each point here represents one possible algorithm configuration), the main idea is a loop of two steps: firstly, updating a surrogate model, and secondly, proposing a new promising point for evaluation. While originally this technique has been developed mainly for numerical parameters, here, it needs to be adapted for optimizing categorical parameters as well. Unfortunately, optimization steps are very time-consuming, since the evaluation of each new point has to be performed on a large data set of music instances for getting realistic results. Nevertheless, many bad configurations could be rejected much faster, since their expected performance might appear to be very low after evaluating them on just a small partition of instances. Hence, the basic idea is to evaluate each proposed point on a small sample and only evaluate on the whole data set if the results seem to be promising. © Springer International Publishing Switzerland 2016.
    DOI: 10.1007/978-3-319-25226-1_39
  • Model based optimization of a statistical simulation model for single diamond grinding
    Herbrandt, S. and Ligges, U. and Ferreira, M.P. and Kansteiner, M. and Biermann, D. and Tillmann, W. and Weihs, C.
    Computational Statistics (2016)
    We present a model for simulating normal forces arising during a grinding process in cement for single diamond grinding. Assuming the diamond to have the shape of a pyramid, a very fast calculation of force and removed volume can be achieved. The basic approach is the simulation of the scratch track. Its triangle profile is determined by the shape of the diamond. The approximation of the scratch track is realized by stringing together polyhedra. Their sizes depend on both the actual cutting depth and an error implicitly describing the material brittleness. Each scratch track part can be subdivided into three three-dimensional simplices for a straightforward calculation of the removed volume. Since the scratched mineral subsoil is generally inhomogeneous, the forces at different positions of the workpiece are expected to vary. This heterogeneous nature is considered by sampling from a Gaussian random field. To achieve a realistic outcome the model parameters are adjusted applying model based optimization methods. A noisy Kriging model is chosen as surrogate to approximate the deviation between modelled and observed forces. This deviation is minimized and the results of the modelled forces and the actual forces from conducted experiments are rather similar. © 2016 Springer-Verlag Berlin Heidelberg
    DOI: 10.1007/s00180-016-0669-z
  • Monitoring a dynamic weighted majority method based on datasets with concept drift
    Mejri, D. and Limam, M. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2016)
    Monitoring changes during a learning process is an interesting area of research in several online applications. The most important problem is how to detect and explain these changes so that the performance of the learning model can be controlled and maintained. Ensemble methods have perfectly coped with concept drift. This paper presents an online classification ensemble method designed for concept drift entitled dynamic weighted majority (DWM) algorithm. It adds and removes experts based on their performance and adjusts learner’s weights taking into account their age in the ensemble as well as their historical correct prediction. The idea behind this paper is to monitor the classification error rates of DWM based on a time adjusting control chart which adjusts the control limits each time an adjustment condition is satisfied. Moreover, this paper handles datasets with concept drift and analyzes the impact of the diversity of base classifiers, noises, permutations and number of batches. Experiments tested with ANOVA and confirmed by Tukey’s test have shown that monitoring classification errors with DWM algorithm has a perfect reaction capacity to different types of concept drift. © Springer International Publishing Switzerland 2016.
    DOI: 10.1007/978-3-319-25226-1_21
  • Optimization of a simulation for inhomogeneous mineral subsoil machining
    Herbrandt, S. and Weihs, C. and Ligges, U. and Ferreira, M. and Rautert, C. and Biermann, D. and Tillmann, W.
    Studies in Classification, Data Analysis, and Knowledge Organization (2016)
    For the new generation of concrete which enables more stable constructions, we require more efficient tools. Since the preferred tool for machining concrete is a diamond impregnated drill with substantial initial investment costs, the reduction of tool wear is of special interest. The stochastic character of the diamond size, orientation, and position in sintered segments, as well as differences in the machined material, justifies the development of a statistically motivated simulation. In the simulations presented in the past, workpiece and tool are subdivided by Delaunay tessellations into predefined fragments. The heterogeneous nature of the ingredients of concrete is solved by Gaussian random fields. Before proceeding with the simulation of the whole drill core bit, we have to adjust the simulation parameters for the two main components of the drill, diamond and metal matrix, by minimizing the discrepancy between simulation results and the conducted experiments. Due to the fact that our simulation is an expensive black box function with stochastic outcome, we use the advantages of model based optimization methods. © Springer International Publishing Switzerland 2016.
    DOI: 10.1007/978-3-319-25226-1_41
  • Spectral complexity reduction of music signals for mitigating effects of cochlear hearing loss
    Nagathil, A. and Weihs, C. and Martin, R.
    IEEE/ACM Transactions on Audio Speech and Language Processing 24 (2016)
    In this paper we study reduced-rank approximations of music signals in the constant-Q spectral domain as a means to reduce effects stemming from cochlear hearing loss. The rationale behind computing reduced-rank approximations is that they allow to reduce the spectral complexity of a music signal. The method is motivated by studies with cochlear implant listeners which have shown that solo instrumental music or music remixed at higher signal-to-interference ratios are preferred over complex music ensembles or orchestras. For computing the reduced-rank approximations we investigate methods based on principal component analysis and partial least squares analysis, and compare them to source separation algorithms. The strategies, which are applied to music with a predominant leading voice, are compared in terms of their ability for mitigating effects of simulated reduced frequency selectivity and with respect to source signal distortions. Established instrumental measures and a newly developed measure indicate a considerable reduction of the auditory distortion resulting from cochlear hearing loss. Furthermore, a listening test reveals a significant preference for the reduced-rank approximations in terms of melody clarity and ease of listening. ©2016 IEEE.
    DOI: 10.1109/TASLP.2015.2511623
  • Analyzing the BBOB results by means of benchmarking concepts
    Mersmann, O. and Preuss, M. and Trautmann, H. and Bischl, B. and Weihs, C.
    Evolutionary Computation 23 (2015)
    We present methods to answer two basic questions that arise when benchmarking optimization algorithms. The first one is: which algorithm is the “best” one? and the second one is: which algorithm should I use for my real-world problem? Both are connected and neither is easy to answer. We present a theoretical framework for designing and analyzing the raw data of such benchmark experiments. This represents a first step in answering the aforementioned questions. The 2009 and 2010 BBOB benchmark results are analyzed by means of this framework and we derive insight regarding the answers to the two questions. Furthermore, we discuss how to properly aggregate rankings from algorithm evaluations on individual problems into a consensus, its theoretical background and which common pitfalls should be avoided. Finally, we address the grouping of test problems into sets with similar optimizer rankings and investigate whether these are reflected by already proposed test problem characteristics, finding that this is not always the case. © 2015 by the Massachusetts Institute of Technology.
    DOI: 10.1162/EVCO_a_00134
  • Automatic model selection for high-dimensional survival analysis
    Lang, M. and Kotthaus, H. and Marwedel, P. and Weihs, C. and Rahnenführer, J. and Bischl, B.
    Journal of Statistical Computation and Simulation 85 (2015)
    Many different models for the analysis of high-dimensional survival data have been developed over the past years. While some of the models and implementations come with an internal parameter tuning automatism, others require the user to accurately adjust defaults, which often feels like a guessing game. Exhaustively trying out all model and parameter combinations will quickly become tedious or infeasible in computationally intensive settings, even if parallelization is employed. Therefore, we propose to use modern algorithm configuration techniques, e.g. iterated F-racing, to efficiently move through the model hypothesis space and to simultaneously configure algorithm classes and their respective hyperparameters. In our application we study four lung cancer microarray data sets. For these we configure a predictor based on five survival analysis algorithms in combination with eight feature selection filters. We parallelize the optimization and all comparison experiments with the BatchJobs and BatchExperiments R packages. © 2014 Taylor & Francis.
    DOI: 10.1080/00949655.2014.929131
  • Clustering of electrical transmission systems based on network topology and stability
    Krey, S. and Brato, S. and Ligges, U. and Götze, J. and Weihs, C.
    Journal of Statistical Computation and Simulation 85 (2015)
    A proper understanding and modelling of the behaviour of heavily loaded large-scale electrical transmission systems is essential for a secure and uninterrupted operation. In this paper, we present methods to cluster electrical power networks based on different criteria into regions. These regions are useful for the efficient modelling of large transcontinental electricity networks, switching operation decisions or placement of redundant parts of the monitoring and control system. In alternating current electricity networks, power oscillations are normal, but they can become dangerous if they build up. The first approach uses the correlation between results of a stability assessment for these oscillations at every node for the cluster criterion. The second method concentrates on the network topology and uses spectral clustering on the network graph to create clusters where all nodes are interconnected. In this work, we also discuss the problem how to choose the right number of clusters and how the discussed clustering methods can be used for an efficient modelling of large electricity networks or in protection and control systems. © 2014 Taylor & Francis.
    DOI: 10.1080/00949655.2014.924517
  • Impact of frame size and instrumentation on chroma-based automatic chord recognition
    Stoller, D. and Mauch, M. and Vatolkin, I. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization 48 (2015)
    This paper presents a comparative study of classification performance in automatic audio chord recognition based on three chroma feature implementations, with the aim of distinguishing effects of frame size, instrumentation, and choice of chroma feature. Until recently, research in automatic chord recognition has focused on the development of complete systems. While results have remarkably improved, the understanding of the error sources remains lacking. In order to isolate sources of chord recognition error, we create a corpus of artificial instrument mixtures and investigate (a) the influence of different chroma frame sizes and (b) the impact of instrumentation and pitch height. We show that recognition performance is significantly affected not only by the method used, but also by the nature of the audio input. We compare these results to those obtained from a corpus of more than 200 real-world pop songs from The Beatles and other artists for the case in which chord boundaries are known in advance. © Springer-Verlag Berlin Heidelberg 2015.
    DOI: 10.1007/978-3-662-44983-7_36
  • Interpretability of music classification as a criterion for evolutionary multi-objective feature selection
    Vatolkin, I. and Rudolph, G. and Weihs, C.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9027 (2015)
    The development of numerous audio signal characteristics led to an increase of classification performance for automatic categorisation of music audio recordings. Unfortunately, models built with such low-level descriptors lack interpretability. Musicologists and listeners cannot learn musically meaningful properties of genres, styles, composers, or personal preferences. On the other hand, there are new algorithms for the mining of interpretable features from music data: instruments, moods and melodic properties, tags and meta data from the social web, etc. In this paper, we propose an approach how evolutionary multi-objective feature selection can be applied for a systematic maximisation of interpretability without a limitation to the usage of only interpretable features. We introduce a simple hypervolume based measure for the evaluation of trade-off between classification performance and interpretability and discuss how the results of our study may help to search for particularly relevant high-level descriptors in future. © Springer International Publishing Switzerland 2015.
    DOI: 10.1007/978-3-319-16498-4_21
  • Model-based multi-objective optimization: Taxonomy, multi-point proposal, toolbox and benchmark
    Horn, D. and Wagner, T. and Biermann, D. and Weihs, C. and Bischl, B.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9018 (2015)
    Within the last 10 years, many model-based multi-objective optimization algorithms have been proposed. In this paper, a taxonomy of these algorithms is derived. It is shown which contributions were made to which phase of the MBMO process. Special attention is given to the proposal of a set of points for parallel evaluation within a batch. Proposals for four different MBMO algorithms are presented and compared to their sequential variants within a comprehensive benchmark. In particular for the classic ParEGO algorithm, significant improvements are obtained. The implementations of all algorithm variants are organized according to the taxonomy and are shared in the open-source R package mlrMBO. © Springer International Publishing Switzerland 2015.
    DOI: 10.1007/978-3-319-15934-8_5
  • Validation of ICT-based protection and control applications in electric power systems
    Kubis, A. and Robitzky, L. and Kuech, M. and Muller, S.C. and Jablkowski, B. and Georg, H. and Dorsch, N. and Krey, S. and Langesberg, C. and Surmann, D. and Mayorga, D. and Rehtanz, C. and Hager, U. and Spinczyk, O. and Wietfeld, C. and Weihs, C. and Ligges, U. and Myrzik, J. and Gotze, J.
    2015 IEEE Eindhoven PowerTech, PowerTech 2015 (2015)
    The use of Information and Communication Technology (ICT)-based power system applications increases continually, which poses new engineering challenges regarding the development, validation and management of both the applications and the intertwined infrastructures. In this paper the need for a joint analysis of power and ICT systems for evaluating smart grid applications is discussed and a systematic validation approach is proposed. After reviewing state-of-the-art validation techniques, a newly developed Wide-Area Monitoring, Protection and Control (WAMPAC) system is introduced. Its extensive use of wide-area communication and the combination of centralized and decentralized decision making stress the complexity of such a cyber-physical system, where the interdependency between the power system and the ICT domains is challenging to validate. Deduced from these requirements, a validation concept is proposed that comprises (i) the usage of a comprehensive smart grid reference model, (ii) a systematic and objectively verifiable generation of scenarios, and (iii) a single and multi-domain validation process using analytical, simulative and experimental techniques. For the latter, a composition of analyses using co-simulation, Hardware-in-the-Loop (HiL) simulations and an empirical test bed is outlined. © 2015 IEEE.
    DOI: 10.1109/PTC.2015.7232644
  • Analysis and Modeling of Complex Data in Behavioral and Social Sciences
    Vicari, D. and Okada, A. and Ragozini, G. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization 49 (2014)
    DOI: 10.1007/978-3-319-06692-9
  • Benchmarking classification algorithms on high-performance computing clusters
    Bischl, B. and Schiffner, J. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization 47 (2014)
    Comparing and benchmarking classification algorithms is an important topic in applied data analysis. Extensive and thorough studies of such a kind will produce a considerable computational burden and are therefore best delegated to high-performance computing clusters. We build upon our recently developed R packages BatchJobs (Map, Reduce and Filter operations from functional programming for clusters) and BatchExperiments (Parallelization and management of statistical experiments). Using these two packages, such experiments can now effectively and reproducibly be performed with minimal effort for the researcher. We present benchmarking results for standard classification algorithms and study the influence of pre-processing steps on their performance. © Springer International Publishing Switzerland 2014.
    DOI: 10.1007/978-3-319-01595-8_3
  • Modelling low frequency oscillations in an electrical system
    Surmann, D. and Ligges, U. and Weihs, C.
    ENERGYCON 2014 - IEEE International Energy Conference (2014)
    Due to market integration, energy trading and the stronger weighting on renewable energies, the European electrical transmission system operates increasingly close to its operational limits. A potential of the usable bandwidth is a part of the energy that oscillates with low frequency through the electrical transmission system. Our analysis leads to a new model which uses connected mechanical harmonic oscillators to describe the low frequency in the transmission system. The verification of this system of differential equations is done by comparison to a well-established and much more complex simulation system used at the ie3 of TU Dortmund University. The derived model is capable of processing data from a Co-Simulator without any data preparation, which is an important requirement to link the statistical analysis in an on-line environment. © 2014 IEEE.
    DOI: 10.1109/ENERGYCON.2014.6850482
  • MOI-MBO: Multiobjective infill for parallel model-based optimization
    Bischl, B. and Wessing, S. and Bauer, N. and Friedrichs, K. and Weihs, C.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8426 LNCS (2014)
    The aim of this work is to compare different approaches for parallelization in model-based optimization. As another alternative aside from the existing methods, we propose using a multi-objective infill criterion that rewards both the diversity and the expected improvement of the proposed points. This criterion can be applied more universally than the existing ones because it has fewer requirements. Internally, an evolutionary algorithm is used to optimize this criterion. We verify the usefulness of the approach on a large set of established benchmark problems for black-box optimization. The experiments indicate that the new method's performance is competitive with other batch techniques and single-step EGO. © 2014 Springer International Publishing.
    DOI: 10.1007/978-3-319-09584-4_17
  • Music genre prediction by low-level and high-level characteristics
    Vatolkin, I. and Rötter, G. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization 47 (2014)
    For music genre prediction typically low-level audio signal features from time, spectral or cepstral domains are taken into account. Another way is to use community-based statistics such as Last.FM tags. Whereas the first feature group often cannot be clearly interpreted by listeners, the second one suffers from erroneous or unavailable data for less popular songs. We propose a two-level approach combining the specific advantages of both groups: at first we create high-level descriptors which describe instrumental and harmonic characteristics of music content, some of them derived from low-level features by supervised classification or from analysis of extended chroma and chord features. The experiments show that each categorization task requires its own feature set. © Springer International Publishing Switzerland 2014.
    DOI: 10.1007/978-3-319-01595-8_46
  • Recognition of musical instruments in intervals and chords
    Eichhoff, M. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization 47 (2014)
    Recognition of musical instruments in pieces of polyphonic music given as mp3- or wav-files is a difficult task because the onsets are unknown. Using source-filter models for sound separation is one approach. In this study, intervals and chords played by instruments of four families of musical instruments (strings, wind, piano, plucked strings) are used to build statistical models for the recognition of the musical instruments playing them by using the four high-level audio feature groups Absolute Amplitude Envelope (AAE), Mel-Frequency Cepstral Coefficients (MFCC) windowed and not-windowed as well as Linear Predictor Coding (LPC) to take also physical properties of the instruments into account (Fletcher, The physics of musical instruments, 2008). These feature groups are calculated for consecutive time blocks. Statistical supervised classification methods such as LDA, MDA, Support Vector Machines, Random Forest, and Boosting are used for classification together with variable selection (sequential forward selection). © Springer International Publishing Switzerland 2014.
    DOI: 10.1007/978-3-319-01595-8_36
  • Statistical comparison of classifiers for multi-objective feature selection in instrument recognition
    Vatolkin, I. and Bischl, B. and Rudolph, G. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization 47 (2014)
    Many published articles in automatic music classification deal with the development and experimental comparison of algorithms; however, the final statements are often based on figures and simple statistics in tables, and only a few related studies apply proper statistical testing for a reliable discussion of results and measurements of the propositions’ significance. Therefore we provide two simple examples for a reasonable application of statistical tests for our previous study recognizing instruments in polyphonic audio. This task is solved by multi-objective feature selection starting from a large number of up-to-date audio descriptors and optimization of classification error and number of selected features at the same time by an evolutionary algorithm. The performance of several classifiers and their impact on the Pareto front are analyzed by means of statistical tests. © Springer International Publishing Switzerland 2014.
    DOI: 10.1007/978-3-319-01595-8_19
  • Statistical process modelling for machining of inhomogeneous mineral subsoil
    Weihs, C. and Raabe, N. and Ferreira, M. and Rautert, C.
    Studies in Classification, Data Analysis, and Knowledge Organization 46 (2014)
    Because tool wear and production time are very cost-sensitive factors in the machining of concrete, the adaptation of the tools to the particular machining processes is of major importance. We show how statistical methods can be used to model the influences of the process parameters on the forces affecting the workpiece as well as on the chip removal rate and the wear rate of the used diamond. Based on these models a geometrical simulation model can be derived which will help to determine optimal parameter settings for specific situations. As the machined materials are in general abrasive, usual discretized simulation methods like finite element models cannot be applied. Hence our approach is another type of discretization subdividing both material and diamond grain into Delaunay tessellations and interpreting the resulting micropart connections as predetermined breaking points. Then, the process is iteratively simulated and in each iteration the interesting entities are computed. © Springer International Publishing Switzerland 2014.
    DOI: 10.1007/978-3-319-01264-3_22
  • Support vector machines on large data sets: Simple parallel approaches
    Meyer, O. and Bischl, B. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization 47 (2014)
    Support Vector Machines (SVMs) are well-known for their excellent performance in the field of statistical classification. Still, the high computational cost due to the cubic runtime complexity is problematic for larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521–528, 2005) proposed the Cascade SVM. It is a simple, stepwise procedure, in which the SVM is iteratively trained on subsets of the original data set and support vectors of resulting models are combined to create new training sets. The general idea is to bound the size of all considered training sets and therefore obtain a significant speedup. Another relevant advantage is that this approach can easily be parallelized because a number of independent models have to be fitted during each stage of the cascade. Initial experiments show that even moderate parallelization can reduce the computation time considerably, with only minor loss in accuracy. We compare the Cascade SVM to the standard SVM and a simple parallel bagging method w.r.t. both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization in a better way than the Cascade SVM and contains an adaptive stopping-time to select the number of stages for improved accuracy. © Springer International Publishing Switzerland 2014.
    DOI: 10.1007/978-3-319-01595-8_10
  • The most dangerous districts of Dortmund
    Beige, T. and Terhorst, T. and Weihs, C. and Wormer, H.
    Studies in Classification, Data Analysis, and Knowledge Organization 47 (2014)
    In this paper the districts of Dortmund, a big German city, are ranked concerning their level of risk to be involved in an offence. In order to measure this risk the offences reported by police press reports in the year 2011 (Presseportal, http://www.presseportal.de/polizeipresse/pm/4971/polizei-dortmund?start=0, 2011) were analyzed and weighted by their maximum penalty corresponding to the German criminal code. The resulting danger index was used to rank the districts. Moreover, the socio-demographic influences on the different offences are studied. The most probable influences appear to be traffic density (Sierau, Dortmunderinnen und Dortmunder unterwegs—Ergebnisse einer Befragung von Dortmunder Haushalten zu Mobilität und Mobilitätsverhalten, Ergebnisbericht, Dortmund-Agentur/Graphischer Betrieb Dortmund 09/2006, 2006) and the share of older people. Also, the inner city parts appear to be much more dangerous than the outskirts of the city of Dortmund. However, can these results be trusted? Following the press office of Dortmund’s police, offences might not be uniformly reported by the districts to the office and small offences like pick-pocketing are never reported in police press reports. Therefore, this case could also be an example of how an unsystematic press policy may cause an unintended bias in the public perception and media awareness. © Springer International Publishing Switzerland 2014.
    DOI: 10.1007/978-3-319-01595-8_2
  • Tone onset detection using an auditory model
    Bauer, N. and Friedrichs, K. and Kirchhoff, D. and Schiffner, J. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization 47 (2014)
    Onset detection is an important step for music transcription and other tasks frequently encountered in music processing. Although several approaches have been developed for this task, none of them works well under all circumstances. In Bauer et al. (Einfluss der Musikinstrumente auf die Güte der Einsatzzeiterkennung, 2012) we investigated the influence of several factors like instrumentation on the accuracy of onset detection. In this work, this investigation is extended by a computational model of the human auditory periphery. Instead of the original signal the output of the simulated auditory nerve fibers is used. The main challenge here is combining the outputs of all auditory nerve fibers to one feature for onset detection. Different approaches are presented and compared. Our investigation shows that using the auditory model output leads to essential improvements of the onset detection rate for some instruments compared to previous results. © Springer International Publishing Switzerland 2014.
    DOI: 10.1007/978-3-319-01595-8_34
  • Auralization of auditory models
    Friedrichs, K. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2013)
    Computational auditory models describe the transformation from acoustic signals into spike firing rates of the auditory nerves by emulating the signal transductions of the human auditory periphery. The inverse approach is called auralization, which can be useful for many tasks, such as quality measuring of signal transformations or reconstructing the hearing of impaired listeners. There have been few successful attempts at auditory inversion, each of which deals with relatively simple auditory models. In recent years more comprehensive auditory models have been developed which simulate nonlinear effects in the human auditory periphery. Since for this kind of model an analytical inversion is not possible, we propose an auralization approach using statistical methods. © Springer-Verlag Berlin Heidelberg 2013.
    DOI: 10.1007/978-3-642-28894-4-27
  • Benchmarking local classification methods
    Bischl, B. and Schiffner, J. and Weihs, C.
    Computational Statistics 28 (2013)
    In recent years in the fields of statistics and machine learning an increasing number of so-called local classification methods has been developed. Local approaches to classification are not new, but have lately become popular. Well-known examples are the k nearest neighbors method and classification trees. However, in most publications on this topic the term "local" is used without further explanation of its particular meaning. Little is known about the properties of local methods and the types of classification problems for which they may be beneficial. We explain the basic principles and introduce the most important variants of local methods. To our knowledge there are very few extensive studies in the literature that compare several types of local methods and global methods across many data sets. In order to assess their performance we conduct a benchmark study on real-world and synthetic tasks. We cluster data sets and considered learning algorithms with regard to the obtained performance structures and try to relate our theoretical considerations and intuitions to these results. We also address some general issues of benchmark studies and cover some pitfalls, extensions and improvements. © 2013 Springer-Verlag Berlin Heidelberg.
    DOI: 10.1007/s00180-013-0420-y
  • Comparison of classical and sequential design of experiments in note onset detection
    Bauer, N. and Schiffner, J. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2013)
    Design of experiments is an established approach to parameter optimization of industrial processes. In many computer applications, however, it is usual to optimize the parameters via genetic algorithms. The main idea of this work is to apply design-of-experiments techniques to the optimization of computer processes. The major problem here is finding a compromise between model validity and costs, which increase with the number of experiments. The second relevant problem is choosing an appropriate model, which describes the relationship between parameters and target values. One of the recent approaches here is model combination. In this paper a musical note onset detection algorithm will be optimized using design of experiments. The optimal algorithm parameter setting is sought in order to get the best onset detection accuracy. We try different design strategies including classical and sequential designs and compare several model combination strategies. © Springer International Publishing Switzerland 2013.
    DOI: 10.1007/978-3-319-00035-0-51
  • Computational prediction of high-level descriptors of music personal categories
    Rötter, G. and Vatolkin, I. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2013)
    Digital music collections are often organized by genre relationships or personal preferences. The target of automatic classification systems is to provide a music management limiting the listener's effort for the labeling of a large number of songs. Many state-of-the-art methods utilize low-level audio features like spectral and time domain characteristics, chroma etc. for categorization. However, the impact of these features is very hard to understand; if the listener labels some music pieces as belonging to a certain category, this decision is indeed motivated by instrumentation, harmony, vocals, rhythm and further high-level descriptors from music theory. So it could be more reasonable to understand a classification model created from such intuitively interpretable features. For our study we annotated high-level characteristics (vocal alignment, tempo, key etc.) for a set of personal music categories. Then we created classification models which predict these characteristics from low-level audio features available in the AMUSE framework. The capability of this set of low-level features to classify the expert descriptors is investigated in detail. © Springer International Publishing Switzerland 2013.
    DOI: 10.1007/978-3-319-00035-0-54
  • Identification of risk factors in coronary bypass surgery
    Schiffner, J. and Godehardt, E. and Hillebrand, S. and Albert, A. and Lichtenberg, A. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2013)
    In quality improvement in medical care one important aim is to prevent complications after a surgery and, particularly, keep the mortality rate as small as possible. Therefore it is of great importance to identify which factors increase the risk of dying in the aftermath of a surgery. Based on data of 1,163 patients who underwent an isolated coronary bypass surgery in 2007 or 2008 at the Clinic of Cardiovascular Surgery in Düsseldorf, Germany, we select predictors that affect the in-hospital mortality. A forward search using the wrapper approach in conjunction with simple linear and also more complex classification methods such as gradient boosting and support vector machines is performed. Since the classification problem is highly imbalanced with certainly unequal but unknown misclassification costs, the area under the ROC curve (AUC) is used as performance criterion for hyperparameter tuning as well as for variable selection. In order to get stable results and to obtain estimates of the AUC the variable selection is repeated 25 times on different subsamples of the data set. It turns out that simple linear classification methods (linear discriminant analysis and logistic regression) are suitable for this problem since the AUC cannot be considerably increased by more complex methods. We identify the three most important predictors as the severity of cardiac insufficiency, the patient's age as well as pulmonary hypertension. A comparison with full models trained on the same 25 subsamples shows that the classification performance in terms of AUC is not affected or only slightly decreased by variable selection. © Springer International Publishing Switzerland 2013.
    DOI: 10.1007/978-3-319-00035-0-29
  • Indicator-based selection in evolutionary multiobjective optimization algorithms based on the desirability index
    Trautmann, H. and Wagner, T. and Biermann, D. and Weihs, C.
    Journal of Multi-Criteria Decision Analysis 20 (2013)
    In multiobjective optimization, the identification of practically relevant solutions on the Pareto-optimal front is an important research topic. Desirability functions (DFs) allow the preferences of the decision maker to be specified in an intuitive way. Recently, it has been shown for continuous optimization problems that an a priori transformation of the objectives by means of DFs can be used to focus the search of a hypervolume-based evolutionary algorithm on the desired part of the front. In many-objective optimization, however, the computational complexity of the hypervolume can become a crucial part. Thus, an alternative to this approach will be presented in this paper. The new algorithm operates in the untransformed objective space, but the desirability index (DI), that is, a DF-based scalarization, will be used as the second-level selection criterion in the non-dominated sorting. The diversity and uniform distribution of the resulting approximation are ensured by the use of an external archive. In the experiments, different preferences are specified as DFs, and their effects are investigated. It is shown that trade-off solutions are generated in the desired regions of the Pareto-optimal front and with a density adaptive to the DI. The efficiency of the approach with respect to increasing objective space dimension is also analysed using scalable test functions. The convergence speed is superior to other set-based and preference-based evolutionary multiobjective algorithms while the approach is of low computational complexity due to cheap DI evaluations. © 2013 John Wiley & Sons, Ltd.
    DOI: 10.1002/mcda.1503
  • Multi-objective optimization of hard turning of AISI 6150 using PCA-based desirability index for correlated objectives
    Wonggasem, K. and Wagner, T. and Trautmann, H. and Biermann, D. and Weihs, C.
    Procedia CIRP 12 (2013)
    The turning process, one of the most popular material removal processes in industry, has several performance measures which are usually found to be correlated, such as tool wear, cutting force and surface finish. In order to apply optimization methods, such as the desirability index, the conditional independence assumption is usually made. However, this assumption rarely holds true in real world applications and the optimal solution obtained might be biased towards the performance measures which have strong positive correlations with the others. Despite the fact that the desirability index has been developed and frequently applied in industry for a long time, only a few studies have been carried out to solve optimization problems with correlated objectives. The modified desirability index which provides a solution for integrating the expert's preferences and the correlation information of the performance measures into the overall performance index, the principal component analysis (PCA) based desirability index (DI), has been only recently developed. In this paper, an optimization using the PCA-based DI is demonstrated based on empirical models of hard turning of AISI 6150 steel in which uncertainties are propagated by model errors. The results show that the degree of importance of each performance measure has been adjusted by the integration of the covariance information into the overall performance index.
    DOI: 10.1016/j.procir.2013.09.004
  • On parameters optimization of dynamic weighted majority algorithm based on genetic algorithm
    Mejri, D. and Limam, M. and Weihs, C.
    2013 5th International Conference on Modeling, Simulation and Applied Optimization, ICMSAO 2013 (2013)
    The dynamic weighted majority-Winnow (DWM-WIN) algorithm of [5] is a powerful classification method for non-stationary environments which copes with concept drifting data streams. The setting of the DWM-WIN parameters in the training process affects the classification accuracy. Unfortunately, these parameters are chosen randomly, without any rational selection. The objective of this research study is to optimize the choice of these parameters. We use the genetic algorithm (GA) of [6] as an optimization method in order to dynamically search for the best parameter values of DWM-WIN and improve the classification accuracy. To assess this optimized DWM-WIN algorithm, DWM-WIN is used as a fitness function in the GA. Based on 4 data sets from the UCI data set repository, simulations have shown that the proposed DWM-WIN-GA outperforms existing classification methods. © 2013 IEEE.
    DOI: 10.1109/ICMSAO.2013.6552722
  • Piano and guitar tone distinction based on extended feature analysis
    Eichhoff, M. and Vatolkin, I. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2013)
    In this work single piano and guitar tones are distinguished by means of various features of the music time series. In a first study, three different kinds of high-level features and MFCC are taken into account to classify the piano and guitar tones. The features are called high-level because they try to reflect the physical structure of a musical instrument on temporal and spectral levels. In our study, three spectral features and one temporal feature are used for the classification task. The spectral features characterize the distribution of overtones, the temporal feature the energy of a tone. In a second study as many of the low-level and high-level features proposed in the literature as possible are combined for the classification task. © Springer-Verlag Berlin Heidelberg 2013.
    DOI: 10.1007/978-3-642-28894-4-26
  • A case study on the use of statistical classification methods in particle physics
    Weihs, C. and Mersmann, O. and Bischl, B. and Fritsch, A. and Trautmann, H. and Karbach, T.M. and Spaan, B.
    Studies in Classification, Data Analysis, and Knowledge Organization (2012)
    Current research in experimental particle physics is dominated by high-profile and large-scale experiments. One of the major tasks in these experiments is the selection of interesting or relevant events. In this paper we propose to use statistical classification algorithms for this task. To illustrate our method we apply it to a Monte Carlo (MC) dataset from the BaBar experiment. One of the major obstacles in constructing a classifier for this task is the imbalanced nature of the dataset. Only about 0.5% of the data are interesting events. The rest are background or noise events. We show how ROC curves can be used to find a suitable cutoff value to select a reasonable subset of a stream for further analysis. Finally, we estimate the CP asymmetry of the decay using the samples extracted by the classifiers. © 2012 Springer-Verlag Berlin Heidelberg.
    DOI: 10.1007/978-3-642-24466-7-8
  • Bias-variance analysis of local classification methods
    Schiffner, J. and Bischl, B. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2012)
    In recent years an increasing number of so-called local classification methods has been developed. Local approaches to classification are not new. Well-known examples are the k nearest neighbors method and classification trees (e.g. CART). However, the term 'local' is usually used without further explanation of its particular meaning; we know neither which properties local methods have nor for which types of classification problems they may be beneficial. In order to address these problems we conduct a benchmark study. Based on 26 artificial and real-world data sets, selected local and global classification methods are analyzed in terms of the bias-variance decomposition of the misclassification rate. The results support our intuition that local methods exhibit lower bias compared to global counterparts. This reduction comes at the price of an only slightly increased variance such that the error rate in total may be improved. © 2012 Springer-Verlag Berlin Heidelberg.
    DOI: 10.1007/978-3-642-24466-7-6
  • EBSD-Orientation analysis of monocrystalline diamonds used for diamond metal composites - Influence of sample preparation
    Tillmann, W. and Biermann, D. and Weihs, C. and Ferreira, M. and Rautert, C. and Raabe, N.
    Materialwissenschaft und Werkstofftechnik 43 (2012)
    This paper focuses on a new field of application for the EBSD-technique. Generally, EBSD-mappings are performed on different metal alloys used for quality assurance and to get information about the microstructure regarding grain orientation, grain size and distribution. In contrast, the orientation determination of monocrystalline diamond grains with an EBSD system is not a conventional method. Thus, this work describes the EBSD testing sequence in detail and illustrates the preparation of orientation data for a statistical design. Furthermore, dependencies of the sample preparation, alignment to the detector, and the analyzed position on the diamond on the quality of the Kikuchi-patterns, respectively on the indexing rates, have been scrutinized. Finally, the orientation obtained of each tested diamond sample has been utilized in a statistical design to show a direct influence of the crystal orientation on the wear behavior of the diamond grains. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
    10.1002/mawe.201200954
  • Kernel k-means clustering based local support vector domain description fault detection of multimodal processes
    Khediri, I.B. and Weihs, C. and Limam, M.
    Expert Systems with Applications 39 (2012)
    The multimodal and nonlinear structure of a system makes process modeling and control quite complex. To monitor processes that have these characteristics, this paper presents a procedure based on kernel techniques for unsupervised learning that are able to separate different nonlinear process modes and to effectively detect faults. These techniques are named Kernel k-means (KK-means) clustering and support vector domain description (SVDD). In order to assess this monitoring strategy two different simulation studies as well as a real case study of an Etch Metal process are performed. Results show that the proposed control chart provides efficient fault detection performance with reduced false alarm rates. © 2011 Elsevier Ltd. All rights reserved.
    10.1016/j.eswa.2011.07.045
  • Multi-objective evolutionary feature selection for instrument recognition in polyphonic audio mixtures
    Vatolkin, I. and Preuß, M. and Rudolph, G. and Eichhoff, M. and Weihs, C.
    Soft Computing 16 (2012)
    Instrument recognition is one of the music information retrieval research topics. This task becomes very challenging if several instruments are played simultaneously because of their varying physical characteristics: inharmonic attack noise, energy development during the attack-decay-sustain-release envelope or overtone distribution. In our framework, we treat instrument detection as a machine-learning task based on a large amount of preprocessed audio features with the aim of building classification models. Since classification algorithms are very sensitive to feature input and the optimal feature set differs from instrument to instrument, we propose to run a multi-objective feature selection procedure before building classification models. Two objectives are considered for evaluation: classification mean-squared error and feature rate (a smaller number of features stands for reduced costs and decreased risk of overfitting). The analysis of the extensive experimental study confirms that application of an evolutionary multi-objective algorithm is a good choice to optimize feature selection for music instrument identification. © 2012 Springer-Verlag.
    10.1007/s00500-012-0874-9
  • Musical instrument recognition by high-level features
    Eichhoff, M. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2012)
    In this work different high-level features and MFCC are taken into account to classify single piano and guitar tones. The features are called high-level because they try to reflect the physical structure of a musical instrument on temporal and spectral levels. Three spectral features and one temporal feature are used for the classification task. The spectral features characterize the distribution of overtones and the temporal feature the energy of a tone. After calculating the features for each tone, classification by statistical methods is carried out. Variable selection is used and an interpretation of the selected variables is presented. © 2012 Springer-Verlag Berlin Heidelberg.
    10.1007/978-3-642-24466-7-38
  • Process monitoring using an online nonlinear data reduction based control chart
    Khediri, I.B. and Weihs, C.
    Frontiers in Statistical Quality Control 10 (2012)
    Recent advances of multivariate Statistical Process Control (SPC) show that the introduction of Principal Component Analysis (PCA) methods for reduction of process data is a promising area in system monitoring and fault diagnosis. The advantage of these techniques is to identify sets of variables which describe the key variations of the operating data and which allow process handling and control based on a reduced number of charts. However, because the basic PCA method stipulates that relationships between process characteristics are linear, the application of such techniques to nonlinear systems that undergo many changes has been limited in many real cases. In order to overcome this issue, some recent studies suggested the use of nonlinear adaptive PCA methods in order to track process variation and detect abnormal events at early stages. For this reason, this study develops and analyses an online Kernel PCA chart as a key technique to model nonlinear systems and to monitor the evolution of non-stationary processes. Results based on an analysis of a simulated process show that the control chart is robust and provides a reduced rate of false alarms with high fault detection abilities. © Springer-Verlag Berlin Heidelberg 2012.
    10.1007/978-3-7908-2846-7-7
  • Resampling methods for meta-model validation with recommendations for evolutionary computation
    Bischl, B. and Mersmann, O. and Trautmann, H. and Weihs, C.
    Evolutionary Computation 20 (2012)
    Meta-modeling has become a crucial tool in solving expensive optimization problems. Much of the work in the past has focused on finding a good regression method to model the fitness function. Examples include classical linear regression, splines, neural networks, Kriging and support vector regression. This paper specifically draws attention to the fact that assessing model accuracy is a crucial aspect in the meta-modeling framework. Resampling strategies such as cross-validation, subsampling, bootstrapping, and nested resampling are prominent methods for model validation and are systematically discussed with respect to possible pitfalls, shortcomings, and specific features. A survey of meta-modeling techniques within evolutionary optimization is provided. In addition, practical examples illustrating some of the pitfalls associated with model selection and performance assessment are presented. Finally, recommendations are given for choosing a model validation technique for a particular setting. © 2012 by the Massachusetts Institute of Technology.
    10.1162/EVCO_a_00069
  • Software in music information retrieval
    Weihs, C. and Friedrichs, K. and Eichhoff, M. and Vatolkin, I.
    Studies in Classification, Data Analysis, and Knowledge Organization (2012)
    Music Information Retrieval (MIR) software is often applied for the identification of rules classifying audio music pieces into certain categories, e.g. genres. In this paper we compare the abilities of six MIR software packages in ten categories, namely operating systems, user interface, music data input, feature generation, feature formats, transformations and features, data analysis methods, visualization methods, evaluation methods, and further development. The overall rankings are derived from the estimated scores for the analyzed criteria. © 2012 Springer-Verlag Berlin Heidelberg.
    10.1007/978-3-642-24466-7-43
  • Tuning and evolution of support vector kernels
    Koch, P. and Bischl, B. and Flasch, O. and Bartz-Beielstein, T. and Weihs, C. and Konen, W.
    Evolutionary Intelligence 5 (2012)
    Kernel-based methods like Support Vector Machines (SVM) have been established as powerful techniques in machine learning. The idea of SVM is to perform a mapping from the input space to a higher-dimensional feature space using a kernel function, so that a linear learning algorithm can be employed. However, the burden of choosing the appropriate kernel function is usually left to the user. It can easily be shown that the accuracy of the learned model highly depends on the chosen kernel function and its parameters, especially for complex tasks. In order to obtain a good classification or regression model, an appropriate kernel function in combination with optimized pre- and post-processed data must be used. To circumvent these obstacles, we present two solutions for optimizing kernel functions: (a) automated hyperparameter tuning of kernel functions combined with an optimization of pre- and post-processing options by Sequential Parameter Optimization (SPO) and (b) evolving new kernel functions by Genetic Programming (GP). We review modern techniques for both approaches, comparing their different strengths and weaknesses. We apply tuning to SVM kernels for both regression and classification. Automatic hyperparameter tuning of standard kernels and pre- and post-processing options always yielded systems with excellent prediction accuracy on the considered problems. SPO-tuned kernels in particular lead to much better results than all other tested tuning approaches. Regarding GP-based kernel evolution, our method rediscovered multiple standard kernels, but no significant improvements over standard kernels were obtained. © 2012 Springer-Verlag.
    10.1007/s12065-012-0073-8
  • Advanced concepts for multi-objective evolutionary optimization in aircraft industry
    Naujoks, B. and Trautmann, H. and Weihs, C. and Wessing, S.
    Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering 225 (2011)
    Evolutionary (multi-objective optimization) algorithms (EMOAs) are widely accepted to be competitive optimization methods in industry today. However, normally only standard techniques are employed by the engineering experts. Here, it is shown how these standard techniques can be completed and improved with respect to interactivity to other tools, runtime, and parameterization. The coupling with metamodels serves as an example for the interactivity to other tools, while the online convergence detection relates to runtime, i.e. stopping criteria. Finally, sequential parameter optimization improves results focussing on parameter tuning. We show that invoking all these methods on their own already enhances EMOAs for aerodynamic applications. It is concluded with an outlook on how these methods might come together to foster aerospace applications and, at the same time, widen the application area to multi-disciplinary optimization tasks. © 2011, SAGE Publications. All rights reserved.
    10.1177/0954410011414120
  • Exploratory landscape analysis
    Mersmann, O. and Bischl, B. and Trautmann, H. and Preuss, M. and Weihs, C. and Rudolph, G.
    Genetic and Evolutionary Computation Conference, GECCO'11 (2011)
    Exploratory Landscape Analysis (ELA) subsumes a number of techniques employed to obtain knowledge about the properties of an unknown optimization problem, especially insofar as these properties are important for the performance of optimization algorithms. Where in a first attempt, one could rely on high-level features designed by experts, we approach the problem from a different angle here, namely by using relatively cheap low-level computer generated features. Interestingly, very few features are needed to separate the BBOB problem groups and also for relating a problem to high-level, expert designed features, paving the way for automatic algorithm selection. Copyright 2011 ACM.
    10.1145/2001576.2001690
  • Huge music archives on mobile devices
    Blume, H. and Bischl, B. and Botteck, M. and Igel, C. and Martin, R. and Roetter, G. and Rudolph, G. and Theimer, W. and Vatolkin, I. and Weihs, C.
    IEEE Signal Processing Magazine 28 (2011)
    The availability of huge nonvolatile storage capacities such as flash memory allows large music archives to be maintained even in mobile devices. With the increase in size, manual organization of these archives and manual search for specific music becomes very inconvenient. Automated classification makes it possible for the user to organize the available music archives according to different categories, which can be either predefined or user defined, enabling a better overview of these databases. © 2006 IEEE.
    10.1109/MSP.2011.940880
  • Linear dimension reduction in classification: Adaptive procedure for optimum results
    Luebke, K. and Weihs, C.
    Advances in Data Analysis and Classification 5 (2011)
    Linear dimension reduction plays an important role in classification problems. A variety of techniques have been developed for linear dimension reduction to be applied prior to classification. However, there is no single definitive method that works best under all circumstances. Rather a best method depends on various data characteristics. We develop a two-step adaptive procedure in which a best dimension reduction method is first selected based on the various data characteristics, which is then applied to the data at hand. It is shown using both simulated and real life data that such a procedure can significantly reduce the misclassification rate. © 2011 Springer-Verlag.
    10.1007/s11634-011-0091-x
  • Tracer percentage prediction of dive reflex samplers
    Bensmann, S. and Lockow, E. and Walzel, P. and Weihs, C.
    Powder Technology 208 (2011)
    Instead of the frequently applied monochromatic light probes, a white light fibre optic system was employed at the Laboratory of Mechanical Process Design, TU Dortmund University, in order to exploit the color information for concentration measurements within bulk solid. The system is applied to obtain local particle concentrations of blue- and red-colored quartz sand within the bed of a rotary drum. 16 solid mixtures with one or two particle sizes from 100 μm to 2000 μm and different species concentration were analyzed and the relationship between probe measurement values and red sand content was determined by statistical regression methods. After transformation of the data, linear models were found to derive the red sand content from given measurement values. Based thereupon, an all-purpose scheme for mono- and bi-disperse solid mixtures was developed and verified in an example with a mean error of 5%. © 2010 Elsevier B.V.
    10.1016/j.powtec.2010.12.004
  • Variable window adaptive Kernel Principal Component Analysis for nonlinear nonstationary process monitoring
    Khediri, I.B. and Limam, M. and Weihs, C.
    Computers and Industrial Engineering 61 (2011)
    On-line control of nonlinear nonstationary processes using multivariate statistical methods has recently prompted a lot of interest due to its practical industrial importance. Indeed, basic process control methods do not allow monitoring of such processes. For this purpose this study proposes a variable window real-time monitoring system based on a fast block adaptive Kernel Principal Component Analysis scheme. While previous adaptive KPCA models allow only handling of one observation at a time, in this study we propose a way to quickly update or downdate the KPCA model when a block of data is provided and not only one observation. Using a variable window size procedure to determine the model size and adaptive chart parameters, this model is applied to monitor two simulated benchmark processes. A comparison of performances of the adopted control strategy with various Principal Component Analysis (PCA) control models shows that the derived strategy is robust and yields better detection abilities of disturbances. © 2011 Elsevier Ltd. All rights reserved.
    10.1016/j.cie.2011.02.014
  • Analysis of Polyphonic Musical Time Series
    Sommer, K. and Weihs, C.
    Advances in Data Analysis, Data Handling and Business Intelligence (2010)
    A general model for pitch tracking of polyphonic musical time series will be introduced. Based on a model of Davy and Godsill (Bayesian harmonic models for musical pitch estimation and analysis, Technical Report 431, Cambridge University Engineering Department, 2002), the different pitches of the musical sound are estimated with MCMC methods simultaneously. Additionally, a preprocessing step is designed to improve the estimation of the fundamental frequencies (A comparative study on polyphonic musical time series using MCMC methods. In C. Preisach et al., editors, Data Analysis, Machine Learning, and Applications, Springer, Berlin, 2008). The preprocessing step compares real audio data with an alphabet constructed from the McGill Master Samples (Opolko and Wapnick, McGill University Master Samples [Compact disc], McGill University, Montreal, 1987) and consists of tones of different instruments. The tones with minimal Itakura-Saito distortion (Gray et al., Transactions on Acoustics, Speech, and Signal Processing ASSP-28(4):367-376, 1980) are chosen as first estimates and as starting points for the MCMC algorithms. Furthermore, the implementation of the alphabet is an approach for the recognition of the instruments generating the musical time series. Results are presented for mixed monophonic data from McGill and for self-recorded polyphonic audio data.
    10.1007/978-3-642-01044-6_39
  • Benchmarking evolutionary multiobjective optimization algorithms
    Mersmann, O. and Trautmann, H. and Naujoks, B. and Weihs, C.
    2010 IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 IEEE Congress on Evolutionary Computation, CEC 2010 (2010)
    Choosing and tuning an optimization procedure for a given class of nonlinear optimization problems is not an easy task. One way to proceed is to consider this as a tournament, where each procedure will compete in different 'disciplines'. Here, disciplines could either be different functions, which we want to optimize, or specific performance measures of the optimization procedure. We would then be interested in the algorithm that performs best in a majority of cases or whose average performance is maximal. We will focus on evolutionary multiobjective optimization algorithms (EMOA), and will present a novel approach to the design and analysis of evolutionary multiobjective benchmark experiments based on similar work from the context of machine learning. We focus on deriving a consensus among several benchmarks over different test problems and illustrate the methodology by reanalyzing the results of the CEC 2007 EMOA competition. © 2010 IEEE.
    10.1109/CEC.2010.5586241
  • Control charts based on models derived from differential equations
    Weihs, C. and Messaoud, A. and Raabe, N.
    Quality and Reliability Engineering International 26 (2010)
    The development of technical processes over time can often be adequately modelled by means of differential equations. In order to monitor such processes, control charts may be derived from stochastic models based on such differential equations. In this work, this is demonstrated for a deep-hole drilling process used for producing holes with a high length-to-diameter ratio, good surface finish and straightness. The process is subject to dynamic disturbances classified as either chatter vibration or spiraling. For chatter, a differential equation for the drilling torque and a model known to well approximate processes with similar characteristics are used to set up monitoring procedures. For spiraling a control chart can be based on a statistical model for the spectrum of the structure-born vibrations derived from a differential equation for the deflection of the boring bar. © 2010 John Wiley & Sons, Ltd.
    10.1002/qre.1134
  • Desirability-based multi-criteria optimisation of HVOF spray experiments
    Kopp, G. and Baumann, I. and Vogli, E. and Tillmann, W. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2010)
    The reduction of the powder grain size is of key interest in the thermal spray technology to produce superfine structured cermet coatings. Due to the low specific weight and a high thermal susceptibility of such fine powders, the use of appropriate process technologies and optimised process settings are required. Experimental design and the desirability index are employed to find optimal settings of a high velocity oxygen fuel (HVOF) spraying process using fine powders (2-8 μm). The independent factors kerosene, hydrogen, oxygen, gun velocity, stand-off distance, cooling pressure, carrier gas and disc velocity are considered in a 12-run Plackett-Burman Design, and their effects on the deposition efficiency and on the coating characteristics microhardness, porosity and roughness are estimated. Following an examination of possible 2-way interactions in a 2^(5-1) fractional-factorial design, the three most relevant factors are analysed in a central composite design. Derringer's desirability function and the desirability index are applied to find optimal factor settings with respect to the above characteristics. All analyses are carried out with the statistics software "R". The optimisation of the desirability index is done using the R-package "desiRe". © 2010 Springer-Verlag Berlin Heidelberg.
    10.1007/978-3-642-10745-0-90
  • Desirability-based multi-criteria optimization of HVOF spray experiments to manufacture fine structured wear-resistant 75Cr3C2-25(NiCr20) coatings
    Tillmann, W. and Vogli, E. and Baumann, I. and Kopp, G. and Weihs, C.
    Journal of Thermal Spray Technology 19 (2010)
    Thermal spraying of fine feedstock powders allows the deposition of cermet coatings with significantly improved characteristics and is currently of great interest in science and industry. However, due to the high surface to volume ratio and the low specific weight, fine particles are not only difficult to spray but also show a poor flowability in the feeding process. In order to process fine powders reliably and to preserve the fine structure of the feedstock material in the final coating morphology, the use of novel thermal spray equipment as well as a thorough selection and optimization of the process parameters are fundamentally required. In this study, HVOF spray experiments have been conducted to manufacture fine structured, wear-resistant cermet coatings using fine 75Cr3C2-25(Ni20Cr) powders (-8 + 2 μm). Statistical design of experiments (DOE) has been utilized to identify the most relevant process parameters with their linear, quadratic and interaction effects using Plackett-Burman, Fractional-Factorial and Central Composite designs to model the deposition efficiency of the process and the most important coating properties: roughness, hardness and porosity. The concept of desirability functions and the desirability index have been applied to combine these response variables in order to find a process parameter combination that yields either optimum results for all responses, or at least the best possible compromise. Verification experiments at the optimum found in this way yielded very satisfying or even excellent results. The coatings featured an average microhardness of 1004 HV 0.1, a roughness Ra = 1.9 μm and a porosity of 1.7%. In addition, a high deposition efficiency of 71% could be obtained. © 2009 ASM International.
    10.1007/s11666-009-9383-5
  • Dynamic Disturbances in BTA Deep-Hole Drilling: Modelling Chatter and Spiralling as Regenerative Effects
    Raabe, N. and Enk, D. and Biermann, D. and Weihs, C.
    Advances in Data Analysis, Data Handling and Business Intelligence (2010)
    The BTA deep-hole drilling process is very often one of the final steps in the production of expensive workpieces. For example, axial bores in turbines or compressor shafts are produced with this process. A serious problem in deep-hole drilling is the formation of dynamic disturbances that may be subdivided into the most common disturbance types, chatter and spiralling. Chatter shows in self-excited rotational vibrations which lead to increased tool wear, while spiralling is governed by bending vibrations and causes holes with several lobes. Since such lobes are a severe impairment of the bore hole, the formation of spiralling has to be prevented. One common explanation for the occurrence of spiralling is the intersection of time-varying bending eigenfrequencies with multiples of the tool's rotational frequency. Little is known about which specific eigenfrequencies are crucial. Furthermore, an underlying assumption of this explanation is that the resulting holes in cross-sectional view appear as a curve with constant width. This assumption implies that spiralling results from a parallel displacement of the drill head. We disprove this assumption and show how stability charts for the classification between stable and unstable processes can be computed by means of simulations. These simulations result from statistical-physical models which model the disturbances chatter and spiralling as regenerative effects.
    10.1007/978-3-642-01044-6_68
  • Holonic and optimal medical decision making under uncertainty
    Al-Qaysi, I. and Othman, Z. and Unland, R. and Weihs, C. and Branki, C.
    Proceedings of 2010 IEEE EMBS Conference on Biomedical Engineering and Sciences, IECBES 2010 (2010)
    The holonic multi agent medical diagnosis system combines the advantages of the holonic paradigm, multi agent system technology, and swarm intelligence in order to realize a highly reliable, adaptive, scalable, flexible, and robust Internet based diagnosis system for diseases. This paper concentrates on the decision process within our system and will present our ideas, which are based on decision theory, and here, especially, on Bayesian probability since, among others, uncertainty is an inherent feature of a medical diagnosis process. The presented approach focuses on reaching the optimal medical diagnosis with the minimum risk under the given uncertainty. Additional factors that play an important role are the required time for the decision process and the produced costs. © 2010 IEEE.
    10.1109/IECBES.2010.5742247
  • Local analysis of SNP data
    Müller, T. and Schiffner, J. and Schwender, H. and Szepannek, G. and Weihs, C. and Ickstadt, K.
    Studies in Classification, Data Analysis, and Knowledge Organization (2010)
    SNP association studies investigate the relationship between complex diseases and one's genetic predisposition through Single Nucleotide Polymorphisms. The studies provide the analyst with a wealth of data and lots of challenges as the moderate to small risk changes are hard to detect and, moreover, the interest focusses not on the identification of single influential SNPs, but of (high-order) SNP interactions. Thus, the studies usually contain more variables than observations. An additional problem arises as there might be alternative ways of developing a disease. To face the challenges of high dimension, interaction effects and local differences, we use associative classification and localised logistic regression to classify the observations into cases and controls. These methods contain great potential for the local analysis of SNP data as applications to both simulated and real-world whole-genome data show. © Springer-Verlag Berlin Heidelberg 2010.
    10.1007/978-3-642-10745-0-51
  • Local classification of discrete variables by Latent Class Models
    Bücker, M. and Szepannek, G. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2010)
    "Global" classifiers may fail to distinguish classes adequately in discrimination problems with inhomogeneous groups. Instead, local methods that consider latent subclasses can be adopted in this case. Three different models for local discrimination of categorical variables are presented in this work. They are based on Latent Class Models, which represent discrete finite mixture distributions. Therefore, they can be estimated via the EM algorithm. A corresponding model is constructed analogously to the Mixture Discriminant Analysis by class conditional Latent Class Models. Two other techniques are based on the idea of Common Components Models. Applicable model selection criteria and measures for the classification capability are suggested. In a simulation study, discriminative performance of the methods is compared to that of decision trees and the Naïve Bayes classifier. It turns out that the MDA-type classifier can be seen as a localization of the Naïve Bayes method. Additionally the procedures have been applied to a SNP data set. © Springer-Verlag Berlin Heidelberg 2010.
    10.1007/978-3-642-10745-0-13
  • Localized Logistic Regression for Categorical Influential Factors
    Schiffner, J. and Szepannek, G. and Monthe, T. and Weihs, C.
    Advances in Data Analysis, Data Handling and Business Intelligence (2010)
    In localized logistic regression (cp. Loader, Local regression and likelihood, Springer, New York, 1999; Tutz and Binder, Statistics and Computing 15:155-166, 2005) at each target point where a prediction is required a logistic regression model is fitted locally. This is achieved by weighting the training observations in the log-likelihood based on their distances to the target observation. For interval-scaled influential factors these weights usually depend on Euclidean distances. This paper aims to combine localized logistic regression with dissimilarity measures more suitable for categorical data. Categorical predictors are usually included into regression models by constructing design variables. Therefore, in principle distance measures can be defined based either on the original variables or on the design variables. In the first case matching coefficients, e.g., the simple or flexible matching coefficients, can be applied. In the second case Euclidean distances are suitable, too, since design variables can be considered interval-scaled. Localized logistic regression with the proposed dissimilarity measures is applied to a SNP data set from the GENICA breast cancer study (cp. Justenhoven et al., Cancer Epidemiology Biomarkers and Prevention 13:2059-2064,2004) in order to identify combinations of SNP variables that can be used to discriminate between cases and controls. By means of localized logistic regression one of the lowest error rates in combination with a maximal reduction of the number of predictors is achieved.
    10.1007/978-3-642-01044-6_17
  • Medical decision base self-organization system under uncertainty
    Al-Qaysi, I. and Unland, R. and Weihs, C. and Branki, C.
    ISCIT 2010 - 2010 10th International Symposium on Communications and Information Technologies (2010)
    The object of this study is to present a holonic medical diagnosis system, which unifies the advantages of decision theory under uncertainty with the efficiency, reliability, extensibility, and flexibility of the holonic multi agent system paradigm. This paper also handles an important assumption in Bayes' theorem. Clustering and discriminating provide a method for solving the problem of dependence among symptoms. The method builds on the degree of dependency between symptoms, with the consequence of raising the efficiency and accuracy of the diagnosis. The idea is to transform raw symptoms of each disease into independent groups. The presented approach focuses on reaching the optimal medical diagnosis with the minimum risk under the given uncertainty. Additional factors that play an important role are the required time for the decision process and the produced costs. ©2010 IEEE.
    10.1109/ISCIT.2010.5665121
  • Medical diagnosis decision support HMAS under uncertainty HMDSuU
    Al-Qaysi, I. and Unland, R. and Weihs, C. and Branki, C.
    Studies in Computational Intelligence 326 (2010)
    Fast, reliable, and correct medical diagnostics is of utmost importance in today's world where diseases can spread quickly. For this reason, we have developed a medical diagnosis system that is based on multi agent system theory, the holonic paradigm, and swarm intelligence techniques. More specifically, a huge number of comparatively simple agents form the basis of our system. In order to provide a solid medical diagnosis, a set of relevant agents always needs to work together. These agents will provide a huge set of possible solutions, which need to be evaluated in order to reach a conclusion. The paradigm of swarm intelligence implies that a set of comparatively simple entities produces sophisticated and highly reliable results. In our scenario, it means that our agents are not provided with a real world model; i.e., they have only a very limited understanding of health issues and the process of medical diagnosis. This puts a huge burden on the decision process. This paper concentrates on the decision process within our system and will present our ideas, which are based on decision theory, and here, especially, on Bayesian probability since, among others, uncertainty is an inherent feature of a medical diagnosis process. The presented approach focuses on reaching the optimal medical diagnosis with the minimum risk under the given uncertainty. Additional factors that play an important role are the required time for the decision process and the produced costs. © 2010 Springer-Verlag Berlin Heidelberg.
    doi:10.1007/978-3-642-16095-0_5
  • Medical optimal decision making based holonic multi agent system
    Esra, A. and Reiner, U. and Weihs, C. and Branki, C.
    2010 The 2nd International Conference on Computer and Automation Engineering, ICCAE 2010 3 (2010)
    This paper concentrates on a decision process based on multi-agent system theory, the holonic paradigm, swarm intelligence techniques, and Bayesian probability, since, among other things, uncertainty is an inherent feature of a medical diagnostic process that is expected to deliver highly reliable results. The presented approach focuses on reaching the optimal medical diagnosis with the minimum risk under the given uncertainty. Additional factors that play an important role are the time required for the decision process and the costs incurred. © 2010 IEEE.
    doi:10.1109/ICCAE.2010.5451382
  • Medical optimal decision making under uncertainty without assuming independence of symptoms
    Al-Qaysi, I. and Unland, R. and Weihs, C. and Branki, C.
    Proceedings - 2nd International Conference on Intelligent Networking and Collaborative Systems, INCOS 2010 (2010)
    Efficiency and accuracy are imperative aspects in the world of medical diagnosis; for this reason, we have developed a medical diagnosis system based on a holonic multi-agent system. The holonic multi-agent medical diagnosis system combines the advantages of the holonic paradigm, multi-agent system technology, and swarm intelligence in order to realize a highly reliable, adaptive, scalable, flexible, and robust Internet-based diagnosis system for diseases. This paper also addresses an important assumption in Bayes' theorem. Clustering and discrimination provide a method for handling dependence among symptoms. The method builds on the degree of dependency between symptoms, which raises the efficiency and accuracy of the diagnosis. The idea is to transform the raw symptoms of each disease into independent groups. Furthermore, decision making under uncertainty is the aim of our system, which is able to achieve optimal medical diagnosis together with the swarm technique and the holonic paradigm without assuming independence of symptoms, whereas independence of symptoms is the central and critical assumption in Bayes' theorem. Additional factors that play an important role are the time required for the decision process and the reduced costs. © 2010 IEEE.
    doi:10.1109/INCOS.2010.34
  • On the distribution of EMOA hypervolumes
    Mersmann, O. and Trautmann, H. and Naujoks, B. and Weihs, C.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6073 LNCS (2010)
    In recent years, new approaches for multi-modal and multi-objective stochastic optimisation have been developed. It is a rather normal process that these experimental fields develop independently from other scientific areas. However, the connection between stochastic optimisation and statistics is obvious and highly appreciated. Recent works, such as sequential parameter optimisation (SPO, cf. Bartz-Beielstein [1]) or online convergence detection (OCD, cf. Trautmann et al. [2]), have combined methods from evolutionary computation and statistics. One important aspect in statistics is the analysis of stochastic outcomes of experiments and optimisation methods, respectively. To this end, the optimisation runs of different evolutionary multi-objective optimisation algorithms (EMOA, cf. Deb [3] or Coello Coello et al. [4]) are treated as experiments to analyse the stochastic behaviour of the results and to approximate the distribution of the performance of the EMOA. To combine the outcome of an EMOA and receive a single performance indicator value, the hypervolume (HV) indicator is considered, which is the only known unary quality indicator in this field (cf. Zitzler et al. [5]). The paper at hand investigates and compares the HV indicator outcome of multiple runs of two EMOA on different mathematical test cases. © 2010 Springer-Verlag.
    doi:10.1007/978-3-642-13800-3_34
  • Perceptually based phoneme recognition in popular music
    Szepannek, G. and Gruhne, M. and Bischl, B. and Krey, S. and Harczos, T. and Klefenz, F. and Dittmar, C. and Weihs, C.
    Studies in Classification, Data Analysis, and Knowledge Organization (2010)
    Solving the task of phoneme recognition in music sound files may help in several practical applications: it enables lyrics transcription and, as a consequence, could provide further relevant information for automatic song classification. Beyond that, it can be used for lyrics alignment, e.g. in karaoke applications. The effects of different signal feature representations and of the choice of classifier are investigated. In addition, a unified R framework for classifier optimization is presented. © 2010 Springer-Verlag Berlin Heidelberg.
    doi:10.1007/978-3-642-10745-0_83
  • Support Vector Regression control charts for multivariate nonlinear autocorrelated processes
    Khediri, I.B. and Weihs, C. and Limam, M.
    Chemometrics and Intelligent Laboratory Systems 103 (2010)
    Statistical process control charts are among the most widely used techniques in industry and laboratories for monitoring systems against faults. To control multivariate processes, most classical charts need to model the process structure and assume that variables are linearly and independently distributed. This study proposes to use a nonparametric method named Support Vector Regression to construct several control charts that allow monitoring of multivariate nonlinear autocorrelated processes. In addition, although most statistical quality control techniques focus on detecting mean shifts, this research investigates the detection of different parameter shifts. Based on simulation results, the study shows that, with a controlled robustness, the charts are able to detect the different applied disturbances. Moreover, in comparison to the Artificial Neural Network control chart, the proposed charts are more effective in detecting faults affecting the process variance. © 2010 Elsevier B.V.
    doi:10.1016/j.chemolab.2010.05.021
  • classification methods

  • data handling

  • decision theory

  • information analysis

  • learning systems

  • multiobjective optimization
