# Cost-sensitive classification based on Bregman divergences

- Santos Rodríguez, Raúl

- Jesús Cid Sueiro Director

Defence university: Universidad Carlos III de Madrid

Defence date: 11 July 2011

- Antonio Artés Rodríguez Chair
- Emilio Parrado Hernández Secretary
- Rocío Aláiz Rodríguez Committee member
- Tijl De Bie Committee member
- José Luis Rojo Alvarez Committee member

Type: Thesis

## Abstract

The main objective of this PhD Thesis is the identification, characterization and study of new loss functions for so-called cost-sensitive classification. Many decision problems are intrinsically cost-sensitive. However, the dominant preference for cost-insensitive methods in the machine learning literature is a natural consequence of the fact that true costs in real applications are difficult to evaluate. Since, in general, uncovering the correct class of the data is less costly than any decision error, designing low-error decision systems is a reasonable (but suboptimal) approach. For instance, consider the classification of credit applicants as either good customers (who will pay back the credit) or bad customers (who will fail to pay off part of the credit). The cost of classifying a risky borrower as good could be much higher than the cost of classifying a potentially good customer as bad. Our proposal relies on Bayes decision theory, where the goal is to assign instances to the class with minimum expected cost. The decision involves both the costs and the posterior probabilities of the classes. Obtaining calibrated probability estimates at the classifier output requires a suitable learning machine, a sufficiently large and representative data set, and an adequate loss function to be minimized during learning. The design of the loss function can be aided by the costs: classical decision theory shows that cost matrices define class boundaries determined by posterior class probability estimates. Strictly speaking, in order to make optimal decisions, accurate probability estimates are only required near the decision boundaries. It is important to point out that the choice of loss function becomes especially relevant when prior knowledge about the problem is limited or the available training examples are somehow unsuitable. In those cases, different loss functions lead to dramatically different posterior probability estimates.
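
The minimum expected cost rule described above can be sketched in a few lines of Python. This is only an illustration of the classical Bayes decision rule, not the method developed in the Thesis; the function name, the cost values and the posterior probabilities are hypothetical and chosen to mirror the credit example.

```python
import numpy as np

def min_expected_cost_decision(posteriors, cost_matrix):
    """Pick, for each instance, the decision with the lowest expected cost.

    posteriors:  array of shape (n_instances, n_classes), rows sum to 1.
    cost_matrix: array of shape (n_decisions, n_classes), where
                 cost_matrix[d, k] is the cost of deciding d when the
                 true class is k.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    cost_matrix = np.asarray(cost_matrix, dtype=float)
    # expected_cost[i, d] = sum_k posteriors[i, k] * cost_matrix[d, k]
    expected_cost = posteriors @ cost_matrix.T
    return expected_cost.argmin(axis=1), expected_cost

# Hypothetical costs: class/decision 0 = good customer, 1 = bad customer.
# Accepting a bad borrower costs 5, rejecting a good customer costs 1.
costs = [[0.0, 5.0],
         [1.0, 0.0]]

# Hypothetical posteriors [P(good | x), P(bad | x)] for two applicants.
posteriors = [[0.7, 0.3],
              [0.9, 0.1]]

decisions, expected = min_expected_cost_decision(posteriors, costs)
print(decisions)  # first applicant is rejected although "good" is more probable
```

With these assumed costs the decision boundary sits at P(bad | x) = 1/6 rather than 1/2, which is why accurate probability estimates matter most near that boundary.
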
We focus our study on the set of Bregman divergences. These divergences offer a rich family of proper losses that has recently become very popular in the machine learning community [Nock and Nielsen, 2009, Reid and Williamson, 2009a]. The first part of the Thesis develops a novel parametric family of multiclass Bregman divergences that captures the information in the cost matrix, so that the loss function is adapted to each specific problem. Multiclass cost-sensitive learning is one of the main challenges in cost-sensitive learning and, through this parametric family, we provide a natural framework to go beyond binary tasks. Following this idea, two lines are explored:

- Cost-sensitive supervised classification: We derive several asymptotic results. The first analysis guarantees that the proposed Bregman divergence has maximum sensitivity to changes at probability vectors near the decision regions. Further analysis shows that the optimization of this Bregman divergence becomes equivalent to minimizing the overall cost regret in non-separable problems, and to maximizing a margin in separable problems.
- Cost-sensitive semi-supervised classification: When labeled data is scarce but unlabeled data is widely available, semi-supervised learning is a useful tool to make the most of the unlabeled data. We discuss an optimization problem relying on the minimization of our parametric family of Bregman divergences, using both labeled and unlabeled data, based on what is called the Entropy Minimization principle. We propose the first multiclass cost-sensitive semi-supervised algorithm, under the assumption that inter-class separation is stronger than intra-class separation.

The second part of the Thesis deals with the transformation of this parametric family of Bregman divergences into a sequence of Bregman divergences.
Work along this line can be further divided into two additional areas:

- Foundations of sequences of Bregman divergences: We generalize some previous results about the design and characterization of Bregman divergences that are suitable for learning and their relationship with convexity. In addition, we aim to broaden the subset of Bregman divergences that are interesting for cost-sensitive learning. Under very general conditions, we find sequences of (cost-sensitive) Bregman divergences whose minimization provides minimum (cost-sensitive) risk for non-separable problems and some type of maximum margin classifier in separable cases.
- Learning with example-dependent costs: A strong assumption is widespread in most cost-sensitive learning algorithms: misclassification costs are the same for all examples. In many cases this assumption does not hold. We claim that using the example-dependent costs directly is more natural and will lead to more accurate classifiers. For these reasons, we consider the extension of cost-sensitive sequences of Bregman losses to example-dependent cost scenarios in order to generate finely tuned posterior probability estimates.
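
For readers unfamiliar with the term used throughout the abstract, a Bregman divergence generated by a strictly convex, differentiable function φ is D_φ(p, q) = φ(p) − φ(q) − ∇φ(q)·(p − q). The sketch below implements only this standard textbook definition with two well-known generators; the abstract does not specify the parametric cost-sensitive family proposed in the Thesis, so it is not reproduced here, and all function names are illustrative.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, p, q):
    """Standard Bregman divergence D_phi(p, q) for a convex generator phi."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return phi(p) - phi(q) - grad_phi(q) @ (p - q)

# Generator 1: squared Euclidean norm, which yields the squared distance.
def sq_norm(x):
    return x @ x

def sq_norm_grad(x):
    return 2.0 * x

# Generator 2: negative Shannon entropy, which yields the KL divergence
# when p and q are probability vectors with strictly positive entries.
def neg_entropy(x):
    return np.sum(x * np.log(x))

def neg_entropy_grad(x):
    return np.log(x) + 1.0

p = np.array([0.7, 0.3])  # e.g. a true posterior probability vector
q = np.array([0.5, 0.5])  # e.g. a classifier's estimate

d_square = bregman_divergence(sq_norm, sq_norm_grad, p, q)
d_kl = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
print(d_square, d_kl)
```

Both divergences are zero only when the estimate equals the true probability vector, which is the property that makes this family a source of proper losses.
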