Proceedings of the 17th Youth Academic Annual Conference of the Chinese Institute of Electronics

A Two-Step Feature Selection Based on Affinity Propagation and Fisher Criterion

This work was supported by the National Natural Science Foundation of China (60872125, 61001185, 61171125), the Program for New Century Excellent Talents in University of the Ministry of Education, the Key Research Project of the Ministry of Education, the Fok Ying Tung Foundation for Basic Research by Young Teachers in Higher Education Institutions, the Natural Science Foundation of Guangdong Province (10151806001000002), and the Shenzhen Outstanding Youth Project. Corresponding author: Prof. Ji Zhen (jizhen@szu.edu.cn).

Tian Tao, Ji Zhen, Zhu Zexuan

(College of Computer Science and Software Engineering, Shenzhen University; Shenzhen City Key Laboratory of Embedded System Design, Shenzhen 518060, China)

Abstract: This paper proposes a two-step feature selection algorithm for classification problems based on affinity propagation and the Fisher criterion. In the first step, the features are partitioned into clusters using affinity propagation clustering, and the resultant cluster centers are chosen to form a candidate feature subset. In the second step, Fisher criterion based feature selection is performed on the candidate feature subset to identify the discriminative features and remove the irrelevant ones. The performance of the proposed method was evaluated on four public high-dimensional datasets. The experimental results demonstrate that the proposed algorithm can significantly reduce the size of the selected feature subset without sacrificing classification accuracy, and sometimes even improves it.

Key words: feature selection, affinity propagation, Fisher criterion, high-dimensional data, support vector machine (SVM)

1 Introduction

With the advances of computer and database technologies, high-dimensional data sets are becoming ubiquitous in the fields of machine learning and pattern recognition. How to deal with this massive amount of data and discover meaningful information is a challenging problem for scientists and researchers. These data sets are characterized by high dimensionality but sometimes only small sample sizes, which could result in the curse of dimensionality. Furthermore, the data sets also involve a large number of irrelevant and redundant features, which could deteriorate the learning performance and increase the computational cost. Feature selection has been widely used to address these problems.

Feature selection, also known as variable selection, selects subsets of features that are informative for classification or other tasks. It is widely used to facilitate data visualization and understanding, diminish the measurement and storage requirements, reduce the training and utilization times, and mitigate the curse of dimensionality [1]. Essentially, there are two types of feature selection methods, i.e., supervised and unsupervised feature selection [1-4]. Supervised feature selection consists of filter, wrapper, and embedded methods. Filters select features based on the intrinsic characteristics of the data, regardless of the chosen learning algorithm. In contrast, wrappers and embedded methods assess subsets of features according to a given learning algorithm. In particular, embedded methods perform feature selection in the process of training. Generally, filter methods are simple and fast, whereas wrappers and embedded methods are time-consuming but give higher classification accuracy than filters. Unsupervised feature selection methods usually measure the similarity or correlation between features, based on the correlation coefficient, least square regression error, maximal information compression index [3], etc., so as to eliminate redundant features.

In this paper, we propose a novel two-step feature selection algorithm, namely APFC, based on unsupervised affinity propagation (AP) clustering [5] and the Fisher criterion [6]. In the first step, AP is used to cluster features in terms of similarity. Features grouped in the same cluster are considered to be redundant with each other. Therefore, by selecting only the centroid features as representatives of the clusters, redundant features can be eliminated. Given the candidate feature subset of all cluster centers, Fisher criterion based filter ranking feature selection is applied in the second step to remove the irrelevant or noisy features. The performance of the proposed method is evaluated on four widely used benchmark classification data sets.

The rest of this paper is organized as follows. In Section 2, we first revisit the AP clustering algorithm and the Fisher criterion, and then describe the details of the proposed feature selection method. Empirical results are presented in Section 3, followed by conclusions in Section 4.

2 Methodologies

In this section, we begin with a brief introduction of AP and the Fisher criterion. Afterward, we present the details of our method based on affinity propagation and the Fisher criterion.

2.1 Affinity Propagation

AP was first proposed in [5] as a message-passing clustering algorithm, which takes a set of pairwise similarities between data points as input and simultaneously considers all data points as potential exemplars. Affinity propagation tends to obtain clusters more efficiently than other state-of-the-art clustering methods such as k-centers clustering, k-means clustering, and the expectation maximization algorithm in various real-world applications [5]. In this study, we use affinity propagation to group highly correlated or similar features so as to detect their redundancy.

AP iteratively clusters features based on two messages, namely "responsibility" and "availability". The responsibility, denoted as r(i,k), is a message sent from feature i to the candidate cluster center k, reflecting the accumulated evidence for feature k to serve as the cluster center of feature i. Particularly, r(i,k) is defined as:

$$ r(i,k) \leftarrow s(i,k) - \max_{k' \neq k}\left\{ a(i,k') + s(i,k') \right\} \qquad (1) $$

where s(i,k) is the similarity between features i and k. In this paper, we use the absolute correlation coefficient as the measure of feature similarity. k' denotes any other candidate cluster center. The availability a(i,k) is the message sent from candidate cluster center k to feature i, which evaluates the fitness of feature i to select feature k as its exemplar. The updating rule of availability is:

$$ a(i,k) \leftarrow \min\left\{ 0,\ r(k,k) + \sum_{i' \notin \{i,k\}} \max\big(0, r(i',k)\big) \right\} \qquad (2) $$

where i' represents any feature excluding features i and k. The self-availability a(k,k) is calculated differently as $a(k,k) = \sum_{i' \neq k} \max\big(0, r(i',k)\big)$. The availabilities are initialized to zero, and in the following iterations both responsibilities and availabilities are competitively updated based on equations (1) and (2). The procedure of AP is repeated until the number of clusters stabilizes or the relative variation of the sum of squared errors falls below a threshold. After the termination of the clustering, each feature is assigned to its most similar cluster center. In each cluster, the member features are redundant to the cluster center and can be removed once the cluster center has been selected. For more details about AP, the reader is referred to [5]. We perform AP clustering following the configurations suggested in [5].

2.2 Fisher Criterion

The Fisher criterion is a supervised criterion. It can be used to remove the features which are noisy or irrelevant. Suppose a set of n d-dimensional samples x_1, x_2, …, x_n belongs to C classes, with n_i samples in the subset D_i labeled ω_i (i = 1, 2, …, C). The mean and variance of the r-th feature over the samples labeled ω_i are m_i^{(r)} and s_i^{(r)}, which are given by

$$ m_i^{(r)} = \frac{1}{n_i}\sum_{x \in D_i} x^{(r)} \qquad (3) $$

$$ \big(s_i^{(r)}\big)^2 = \frac{1}{n_i}\sum_{x \in D_i}\big(x^{(r)} - m_i^{(r)}\big)^2 \qquad (4) $$

Then, the Fisher criterion function for the r-th feature can be defined as

$$ \lambda_{Fisher}(r) = \frac{\sum_{i=1}^{C} n_i\big(m_i^{(r)} - m^{(r)}\big)^2}{\sum_{i=1}^{C} n_i\big(s_i^{(r)}\big)^2} \qquad (5) $$

where m^{(r)} is the sample mean of the r-th feature over all samples. The Fisher scores are calculated for all features and ranked in descending order. The top features with the highest values of λ_Fisher are selected, while the features with small values, which contain less discriminative information, are discarded.
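As a minimal illustration of equation (5), the sketch below computes the Fisher score of every feature from a data matrix and class labels. The function name and the small constant guarding against zero variance are our own choices, not part of the paper.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score of every feature, following equation (5).

    X: (n, d) data matrix; y: (n,) class labels.
    Returns a (d,) array; larger values indicate more discriminative features."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)                  # m^(r) for every feature r
    numerator = np.zeros(X.shape[1])
    denominator = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        n_c = Xc.shape[0]
        numerator += n_c * (Xc.mean(axis=0) - overall_mean) ** 2   # between-class scatter
        denominator += n_c * Xc.var(axis=0)                        # within-class scatter
    return numerator / (denominator + 1e-12)       # small constant avoids division by zero

# Usage: rank features in descending order of their Fisher scores
# ranking = np.argsort(-fisher_scores(X, y))
```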

2.3 Feature Selection Based on Affinity Propagation and Fisher Criterion

The motivation of the proposed two-step feature selection method, APFC, is to use affinity propagation clustering and a Fisher criterion based filter to eliminate the redundant and the irrelevant features, respectively. In the first step, the features are partitioned into clusters by affinity propagation clustering, and only the cluster centers are chosen as the candidate feature subset. In the second step, the Fisher criterion based method is performed on the selected cluster centers to identify the discriminative features and remove the irrelevant ones in a filter ranking manner. Let the original feature set be O = {F_i | i = 1, 2, …, D}, where D is the total number of features. The procedure of the proposed algorithm is outlined as follows, and a code sketch tying the steps together is given after the algorithm listing.

Algorithm 1: The procedure of APFC

INPUT: the original feature set O.

OUTPUT: the selected feature subset T and the number of selected features k.

BEGIN

1: Compute the feature similarity matrix s, where the element s(i,j) is the absolute correlation coefficient of features F_i and F_j, i, j = 1, 2, …, D and i ≠ j. The diagonal elements (preferences) are set following the configuration suggested in [5].

2: Cluster the features using affinity propagation based on s.

3: Choose the cluster centers to form the candidate feature subset O' = {F_{r_i} | i = 1, 2, …, M}, where M is the number of cluster centers.

4: Calculate the Fisher score of each feature in O' based on equation (5), then rank the features in descending order.

5: For i = 1 to M do

Select the top i features with the highest values of λ_Fisher to form a feature subset, and evaluate the classification accuracy Acc(i) of this feature subset using LIBSVM.

End For

6: Choose the final selected feature subset T = {F_{r_1}, F_{r_2}, …, F_{r_k}}, where k = arg max_i Acc(i).

END
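The following Python sketch ties the steps of Algorithm 1 together. It reuses the affinity_propagation and fisher_scores helpers sketched in Sections 2.1 and 2.2, and uses scikit-learn's SVC (a LIBSVM wrapper) for step 5. The median preference, the linear kernel, and the five cross-validation folds are illustrative assumptions, not settings reported in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def apfc(X, y, cv=5):
    """Two-step APFC feature selection outlined in Algorithm 1 (illustrative sketch)."""
    # Step 1: pairwise feature similarity = absolute correlation coefficient
    # (assumes no constant features, so the correlations are well defined)
    S = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(S, np.median(S))            # preferences: median similarity (AP default)

    # Steps 2-3: cluster features with AP; the exemplars form the candidate subset O'
    centers = affinity_propagation(S)            # sketched in Section 2.1

    # Step 4: Fisher scores of the candidate features, ranked in descending order
    scores = fisher_scores(X[:, centers], y)     # sketched in Section 2.2
    ranked = centers[np.argsort(-scores)]

    # Step 5: evaluate the top-i features with an SVM (LIBSVM via scikit-learn)
    accuracies = []
    for i in range(1, len(ranked) + 1):
        acc = cross_val_score(SVC(kernel='linear'), X[:, ranked[:i]], y, cv=cv).mean()
        accuracies.append(acc)

    # Step 6: keep the subset with the highest accuracy
    k = int(np.argmax(accuracies)) + 1
    return ranked[:k], accuracies[k - 1]
```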

3 Experiments and Results

3.1 Experimental Setup

In this section, we evaluate the performance of the proposed feature selection method using high-dimensional benchmark data sets. We focus on two-class classification problems, and four widely used high-dimensional data sets, namely Colon-cancer, Gisette, Duke breast-cancer, and Leukemia from [7], are used in our experimental studies. The information of these data sets, including the number of features, training instances, test instances, and class categories, is summarized in Table I.

Table I. Summary of benchmark data sets

"/" denotes the absence of test instances

3.2 Classification Performance

We compare the performance of APFC with that of the full feature set and of AP based feature selection, i.e., using the cluster centers as the final selected feature subset. Table II records the number of features selected by the compared algorithms. Table III tabulates the classification accuracies obtained by all algorithms on the four data sets. For the Gisette and Leukemia data sets, which provide separate test data, the accuracy on the test set is reported. For the Colon-cancer and Duke breast-cancer data sets, the classification accuracy is evaluated using five-fold cross-validation. Tables II and III show that APFC attains the best classification accuracy with the most compact feature subsets. The results suggest that APFC, capitalizing on the advantages of AP and the Fisher criterion, is able to efficiently select the discriminative features while removing the redundant and irrelevant ones.
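For completeness, the snippet below shows how such an evaluation could be set up for a data set distributed in LIBSVM format. The file path is illustrative, the apfc helper refers to the sketch in Section 2.3, and the snippet is not claimed to reproduce the numbers reported in Tables II and III.

```python
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Colon-cancer in LIBSVM format (no separate test split, so five-fold CV is used)
X, y = load_svmlight_file('colon-cancer')      # illustrative path to the downloaded file
X = X.toarray()

selected, _ = apfc(X, y)                       # APFC sketch from Section 2.3
full_acc = cross_val_score(SVC(kernel='linear'), X, y, cv=5).mean()
apfc_acc = cross_val_score(SVC(kernel='linear'), X[:, selected], y, cv=5).mean()
print(f'full features: {full_acc:.3f}  APFC ({len(selected)} features): {apfc_acc:.3f}')
```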

Table II. Number of features selected

Table III. Classification accuracies using LIBSVM

3.3 The Number of Selected Features Against the Classification Accuracy

The classification accuracies with different numbers of features selected by APFC are plotted in Figures 1-4. At the beginning, the classification accuracy increases as the selected feature subset grows and more discriminative features are included. However, after all important features have been selected, further increasing the number of selected features only introduces redundant and irrelevant features. The classification accuracy consequently stops improving, or even declines if too many irrelevant features are selected. It is observed that APFC requires only about 1% of the original features to obtain the best classification accuracy on all four data sets. The classification accuracies on the Colon-cancer, Duke breast-cancer, and Leukemia data sets are more sensitive to the number of selected features, due to the small sample sizes.

Figure 1. Colon-cancer: classification accuracy against the number of features selected

Figure 2. Gisette: classification accuracy against the number of features selected

Figure 3. Duke breast-cancer: classification accuracy against the number of features selected

Figure 4. Leukemia: classification accuracy against the number of features selected

4 Conclusion

In this paper, a two-step feature selection method named APFC is proposed for classification problems based on affinity propagation clustering and the Fisher criterion. We evaluated the performance of APFC on four high-dimensional data sets. The experimental results reveal that APFC is able to significantly reduce the size of the selected feature subset and improve the classification accuracy. This demonstrates that affinity propagation clustering is helpful for eliminating feature redundancy, and that the Fisher criterion based feature selection is capable of identifying discriminative features and removing the most irrelevant or noisy features.
References

[1] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.

[2] L. Yu and H. Liu, "Efficient feature selection via analysis of relevance and redundancy," Journal of Machine Learning Research, vol. 5, pp. 1205-1224, 2004.

[3] P. Mitra, C. A. Murthy, and S. K. Pal, "Unsupervised feature selection using feature similarity," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 301-312, 2002.

[4] Y. Saeys, I. Inza, and P. Larranaga, "A review of feature selection techniques in bioinformatics," Bioinformatics, vol. 23, pp. 2507-2517, 2007.

[5] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, pp. 972-976, 2007.

[6] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., China Machine Press, 2004, pp. 117-121.

[7] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.