14th Mediterranean Conference on Medical and Biological Engineering and Computing (MEDICON), Paphos, CYPRUS, 31 March - 02 April 2016, vol.57, pp.512-517
With the rapid growth in the amount of DNA sequence data, gene identification has become an important task in bioinformatics. To detect genes, it is important to accurately predict splice sites, i.e. exon-intron boundaries. Moreover, in biology, where structures such as splice sites are described by a large number of features, feature selection is an important step toward classification. It provides useful biological knowledge and allows for faster and more accurate classification. Feature selection techniques can be divided into two groups: feature ranking and feature-subset selection. This paper investigates the performance of combining a support vector machine (SVM) with two different feature ranking methods, namely F-score and Random Forest feature ranking, for splice site detection in the human genome. A new classification method based on Random Forest for splice site prediction is also presented.
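To make the idea of feature ranking concrete, the following is a minimal sketch of F-score ranking for a two-class problem. It assumes the commonly used definition of the F-score (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances); the abstract does not state the exact formula, so this is an illustration and not necessarily the paper's implementation. The names `f_score` and `rank_features`, the 0/1 label encoding, and the zero-variance handling are assumptions made for this example.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature, given its values in the positive and negative class.

    Assumed definition: between-class separation of the means divided by the
    sum of the within-class sample variances (n - 1 denominator).
    Each class needs at least two samples.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    if within == 0:
        # Assumption for this sketch: a constant-within-class feature is
        # maximally discriminative if the class means differ, useless otherwise.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Return (feature indices sorted by decreasing F-score, list of scores).

    X is a list of rows (one list of numeric features per sample);
    y holds the class labels, 1 for positive and 0 for negative.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

In a ranking-based pipeline of the kind described above, the top-ranked features from `rank_features` would then be passed to a classifier such as an SVM, with the number of retained features chosen by validation.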