Feature selection in credit scoring- a quadratic programming approach solving with bisection method based on Tabu search
Credit risk is one of the most important topics in risk management. According to the Basel Capital Accord, it is also the major risk faced by banks and financial institutions. As a form of credit risk measurement, credit scoring is the credit evaluation process used to reduce the current and expected risk of a customer being a bad credit. Credit scoring models usually use a set of features to predict the credit status of applicants: good credit (unlikely to default) or bad credit (more likely to default). However, with the fast growth of the credit industry and the ease of collecting and storing information brought by new technologies, a huge amount of customer information is available. Feature selection, or subset selection, is therefore essential for handling irrelevant, redundant, or misleading features in order to improve predictive (classification) accuracy and to reduce the high complexity, intensive computation, and instability of most credit scoring models. In this study, a hybrid model is developed for credit scoring problems. A correlation-coefficient-based binary quadratic programming model for feature selection is established first, and classification accuracy is then predicted on the selected subsets. The model is solved with the bisection method based on Tabu search algorithm (BMTS), which provides optional subsets of features of different sizes; from these, satisfactory subsets for credit scoring models are selected based on both subset size and overall classification accuracy rate (OCAR). The results of the proposed BMTS+SVM method, tested on two benchmark credit datasets, shed light on improving existing credit scoring systems with flexibility and robustness. The validated method is then applied in an international business context to data on U.S. and Chinese companies, in order to identify the subsets of features that act as key factors in distinguishing good credit companies from bad credit companies in these two countries. Finally, the performance of classification models using different classifiers is evaluated in terms of OCAR and misclassification cost on the U.S. and Chinese datasets. Cutoff values that give the highest OCAR and the minimum misclassification cost are also discussed.
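To make the idea of a correlation-based binary quadratic program more concrete, the following is a minimal Python sketch, not the formulation used in the study. The abstract does not state the exact objective, so the one below is an assumption: for a fixed subset size k, minimise the summed absolute pairwise correlation among selected features minus their summed absolute correlation with the class label. The function names (`pearson`, `build_qp`, `objective`, `tabu_search`), the swap-move neighbourhood, the tabu tenure, and the toy data are all hypothetical. The bisection step and the SVM classifier are not reproduced here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_qp(columns, labels):
    """Return (Q, r): |corr| between feature pairs, and |corr| of each feature with the label."""
    m = len(columns)
    Q = [[abs(pearson(columns[i], columns[j])) if i != j else 0.0
          for j in range(m)] for i in range(m)]
    r = [abs(pearson(c, labels)) for c in columns]
    return Q, r


def objective(selected, Q, r):
    """Assumed objective (to minimise): redundancy among selected features minus relevance."""
    s = sorted(selected)
    redundancy = sum(Q[i][j] for a, i in enumerate(s) for j in s[a + 1:])
    relevance = sum(r[i] for i in s)
    return redundancy - relevance


def tabu_search(Q, r, k, iters=200, tenure=5, seed=0):
    """Swap-move tabu search over subsets of exactly k features."""
    rng = random.Random(seed)
    m = len(r)
    current = set(rng.sample(range(m), k))
    best, best_val = set(current), objective(current, Q, r)
    tabu = {}  # feature index -> last iteration at which moving it is still forbidden
    for it in range(iters):
        candidates = []
        for out in current:
            for inn in set(range(m)) - current:
                trial = (current - {out}) | {inn}
                val = objective(trial, Q, r)
                is_tabu = tabu.get(out, -1) >= it or tabu.get(inn, -1) >= it
                # Aspiration: a tabu move is allowed if it beats the best value so far.
                if not is_tabu or val < best_val:
                    candidates.append((val, out, inn, trial))
        if not candidates:
            break  # every move is tabu and none improves on the best
        val, out, inn, trial = min(candidates, key=lambda c: c[:3])
        current = trial
        tabu[out] = tabu[inn] = it + tenure
        if val < best_val:
            best, best_val = set(trial), val
    return sorted(best), best_val


# Toy data (hypothetical): four features observed on eight applicants.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1, 2, 1, 2, 5, 6, 5, 6],
    [1, 2, 1, 2, 5, 6, 5, 6],
    [0, 1, 2, 3, 3, 4, 5, 6],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
Q, r = build_qp(columns, labels)
subset, value = tabu_search(Q, r, k=2)
```

In a pipeline like the one described, a search of this kind would be run for several subset sizes, and each candidate subset would then be scored by the OCAR of a classifier trained on it. How the redundancy and relevance terms are weighted against each other strongly affects which subsets are returned, so the equal weighting above should be read only as a placeholder.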