Institut de Mathématiques de Luminy

Abstract 2005-18

Ben Ishak Anis, Ghattas Badih.
An efficient method for variable selection using SVM based criteria.

Feature selection has become the focus of much research in application areas involving thousands of variables and often comparably few training examples. Gene expression array analysis is among the most investigated of these. We investigate the problem of feature selection for Support Vector Machine (SVM) classification in the linear two-class case. We suggest a new method of feature selection based on ranking scores derived from SVMs. Before comparing the performance of these criteria, we establish equivalences between some of them. We then analyze the effects of retraining on the ranking rules based on these criteria. Our feature selection algorithm is a forward selection strategy that adds variables in decreasing order of importance, and it gives a simple way to determine how many selected features must be provided to the predictor. Finally, we illustrate the effectiveness of our approach on linear synthetic data and on some challenging benchmark problems in the microarray domain. Results demonstrate a significant improvement in generalization performance using only a few variables.

Keywords: Support vector machines (SVMs), Feature selection, SVM-based criteria, Ranking rules, Bounds and margin sensitivity, Forward selection, Subset search strategy, Bootstrap, Microarray data.
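
The rank-then-forward-select idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the ranking score is the squared weight w_j^2 of a linear SVM (one commonly used SVM-based criterion), the SVM is trained without a bias term by a Pegasos-style subgradient method, the number of features is chosen by error on a single held-out set rather than by bootstrap, and the synthetic data and all function names are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (no bias term: an assumption)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]      # shrink (regularization)
            if margin < 1.0:                               # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def error_rate(w, X, y):
    """Fraction of examples whose label differs from sign(w . x)."""
    wrong = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi))
        wrong += (1 if score >= 0 else -1) != yi
    return wrong / len(X)

def take(X, cols):
    """Keep only the listed columns of X."""
    return [[row[j] for j in cols] for row in X]

def forward_select(X_tr, y_tr, X_va, y_va):
    """Rank features by w_j^2, then add them in decreasing order of importance."""
    w = train_linear_svm(X_tr, y_tr)
    ranking = sorted(range(len(w)), key=lambda j: -w[j] ** 2)
    best_k, best_err = 0, float("inf")
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        w_k = train_linear_svm(take(X_tr, cols), y_tr)    # retrain on the top-k subset
        err = error_rate(w_k, take(X_va, cols), y_va)
        if err < best_err:                                 # strict: prefer fewer features on ties
            best_k, best_err = k, err
    return ranking, ranking[:best_k], best_err

def make_data(n, d, seed):
    """Hypothetical linear synthetic data: features 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [3 * label + rng.gauss(0, 0.5), 2 * label + rng.gauss(0, 0.5)]
        row += [rng.gauss(0, 1) for _ in range(d - 2)]
        X.append(row)
        y.append(label)
    return X, y

X_tr, y_tr = make_data(40, 8, seed=1)
X_va, y_va = make_data(40, 8, seed=2)
ranking, selected, err = forward_select(X_tr, y_tr, X_va, y_va)
print("ranking:", ranking)
print("selected:", selected, "validation error:", err)
```

Stopping at the smallest subset that minimizes held-out error is only one way to decide how many features to keep; a bootstrap estimate of the error, as the keywords suggest, could replace the single validation split in `forward_select`.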


Last update: October 13, 2005, EL.