EXPERT SYSTEMS WITH APPLICATIONS, vol. 42, no. 22, pp. 9001-9011, 2015 (SCI-Expanded)
In this paper, a novel method is proposed to predict the direction of Borsa Istanbul (BIST) 100 Index (BIST100) open prices using the news articles released on the previous day, together with that day's price data. Although English news articles have been used for market prediction before, to the best of our knowledge, Turkish news articles together with prices have not yet been used to predict the Turkish markets. Turkish text mining techniques are applied to news articles to form a feature vector for each trading day. Each feature vector is assigned one of three labels based on the direction of the price change from the previous day's closing price and whether that change is significant. News articles are represented using high-dimensional features, some of which may be noisy or irrelevant for prediction, and training data are scarce. Therefore, this study incorporates feature selection methods to select features that could improve classification performance. By their nature, significant positive or negative changes in stock price occur much less frequently than non-significant changes, resulting in an imbalanced data set. Most feature selection methods in the literature aim to maximize classification accuracy. However, for imbalanced data sets, other measures, such as the macro-averaged F-measure, need to be considered. The paper proposes a feature selection method that deals with the class imbalance problem by oversampling the minority classes and considering an ensemble of selected features. As the relevance criterion for deciding the importance of each feature, the proposed methodology uses mutual information, which can detect nonlinear dependencies between variables. Accordingly, the proposed method is called the Balanced Mutual Information (BMI) feature selection method.
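To make the labelling and selection steps concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The 1% significance threshold in `label_day`, the use of simple random oversampling (duplicating minority-class rows), binary term-presence features, and the averaging of mutual-information scores over several oversampled copies as a stand-in for the "ensemble of selected features" are all assumptions made for illustration. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def label_day(prev_close, open_price, threshold=0.01):
    """Label a day from the relative change between the previous close and the open.

    The 1% significance threshold is a hypothetical value, not the paper's.
    """
    change = (open_price - prev_close) / prev_close
    if change > threshold:
        return "up"        # significant rise
    if change < -threshold:
        return "down"      # significant fall
    return "neutral"       # non-significant change


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority-class rows until all classes match the majority size."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        members = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


def mutual_information(feature, labels):
    """I(F; C) in bits between one discrete feature column and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts = Counter(feature)
    c_counts = Counter(labels)
    return sum(
        (n_fc / n) * math.log2(n_fc * n / (f_counts[f] * c_counts[c]))
        for (f, c), n_fc in joint.items()
    )


def balanced_mi_scores(X, y, n_rounds=10, seed=0):
    """Average each feature's MI over several independently oversampled copies of the data.

    Averaging is an assumed approximation of the paper's ensemble step.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    totals = [0.0] * n_features
    for _ in range(n_rounds):
        Xb, yb = random_oversample(X, y, rng)
        for j in range(n_features):
            totals[j] += mutual_information([row[j] for row in Xb], yb)
    return [t / n_rounds for t in totals]


def select_top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy example (invented data): rows are trading days, columns are binary term-presence features.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
y = ["up", "up", "neutral", "neutral", "neutral", "neutral"]
scores = balanced_mi_scores(X, y, n_rounds=5)
print(select_top_k(scores, 1))  # [0]: feature 0 separates the two classes perfectly
```

In practice, oversampling would be applied to the training portion only, so that duplicated days do not leak into evaluation. Averaging over several random draws reduces the dependence of the feature ranking on any single set of duplicated minority-class days; taking the union of the top-k features from each draw would be another plausible reading of "ensemble".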
Experiments were performed on news articles from two different sources: the Public Disclosure Platform of BIST and financial news websites. Using the Balanced Mutual Information feature selection method, significant changes in the BIST100 Index were predicted with an accuracy of 0.74 and a macro-averaged F-measure of 0.68. The BMI feature selection method was compared with Mutual Information and Chi-square based feature selection methods and was found to achieve higher performance using a smaller number of features. © 2015 Elsevier Ltd. All rights reserved.
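Why accuracy alone is insufficient on imbalanced labels can be shown with a short sketch. The code below is a minimal illustration, not the paper's evaluation code. It computes accuracy and a macro-averaged F-measure, taken here as the unweighted mean of per-class F1 scores; some authors instead compute F from macro-averaged precision and recall, and the abstract does not say which variant is used. The toy labels are invented and are not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of days whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes=None):
    """Unweighted mean of per-class F1; undefined precision or recall is treated as 0."""
    if classes is None:
        classes = sorted(set(y_true) | set(y_pred))
    f_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f_scores.append(f)
    return sum(f_scores) / len(f_scores)


# A classifier that always predicts the majority "neutral" class on skewed labels.
y_true = ["neutral"] * 8 + ["up", "down"]
y_pred = ["neutral"] * 10
acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(acc)              # 0.8
print(round(macro, 3))  # 0.296
```

The majority-class predictor scores 0.8 accuracy while never detecting a significant move; the macro-averaged F-measure exposes this because the "up" and "down" classes each contribute an F1 of 0.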