Thesis

Filetyp: PDF (.pdf)
Size: 4802 Kb

Schlüsselwörter:

Statistical Learning Theory, Kernel Methods, Linear Classifiers, PAC-Bayesian Methods, Game of Go

Statistische Lerntheorie, Kernmethoden, Lineare Klassifikatoren, PAC-Bayesianische Methoden, Go-Spiel

Sachgruppe der DNB
28 Informatik, Datenverarbeitung

Doctoral Dissertation accepted by: Technical University of Berlin , School of Electrical Engineering and Computer Sciences, 2002-07-12

Abstract

The thesis deals with problems of pattern classification in the framework of machine learning. The focus of the work is on kernel methods for the supervised classification of objects. The thesis gives a detailed introduction into the field of kernel algorithms and learning theory. New contributions include learning theoretical results in the PAC-Bayesian framework, efficient sampling algorithms for Bayesian classification in kernel space, and an application of kernel methods to pattern analysis in the game of Go. Learning Theory: In the PAC-Bayesian framework we derive new bounds on the prediction error of linear classifiers (in kernel space) in terms of the normalised margin achieved on the training sample, taking into account both the concentration of the training data and the margin distribution. Assuming sparseness of the dual variables we extend the PAC-Bayesian framework to data-dependent hypotheses. Finally, we prove "egalitarian" bounds on the probability of finding classifiers with high prediction error in subsets of hypothesis space with low empirical risk - results that emphasise the importance of model selection. Learning Algorithms: We discuss Bayesian classification in kernel space and identify Bayesian transduction and the Bayes point machine as optimal procedures for classification in a Bayesian sense. Assuming a uniform prior over normalised weights for a given kernel we devise sampling schemes for the resulting piecewise constant posterior: The kernel Gibbs sampler, a Markov chain Monte Carlo method able to deal with label noise, and the permutational perceptron sampler for large scale sampling. The classification techniques are tested on handwritten-digit recognition tasks and compare favourably to support vector machines when rejecting test data based on confidence measures. Application: We apply kernel classifiers to the domain of pattern classification in the Japanese game of Go, which serves as a test domain for classification problems involving symbolic or structured objects. In order to learn mappings from points on the board to numbers that indicate territorial status or move quality we define the common fate graph as a compact representation for Go positions and show how a feature space based on relative subgraph features can be constructed. Building on the this representation we train a support vector machine to learn the quality of moves from a collection of Go problems and from actual 9x9 game records. As a result, we obtain classifiers that solve certain Go problems and play 9x9 Go at a non-trivial level of playing strength.

Die Arbeit beschäftigt sich mit der Kern-basierten überwachten Klassifikation von Mustern im Rahmen des maschinellen Lernens. Sie gibt eine detaillierte Einführung in die Themen Lerntheorie und Kern-basierte Algorithmen. Neue Beiträge bestehen in lerntheoretische Ergebnisse im Rahmen der PAC-Bayesianischen Theorie, neuen sampling Algorithmen für bayesianische Klassifikation in Kernräumen, und in der Anwendung von Kernmethoden auf Mustererkennung im japanische Brettspiel Go. Lerntheorie: Im Rahmen der PAC-Bayesianischen Theorie werden neue Schranken an den Vorhersagefehler linearer Klassifikatoren (in Kernräumen) hergeleitet, in die nicht nur der normierte margin, sondern auch die Konzentration der Trainingsdaten und die Verteilung der reellwertigen Ausgaben eingehen. Unter der Annahme dünnbesetzter dualer Variablen, wird die PAC-Bayesianische Theorie auf datenabhängige Hypothesenräume erweitert. Es werden "egalitäre" Schranken an die Wahrscheinlichkeit bewiesen, daß ein Klassifikator einen hohen Vorhersagefehler hat, obwohl er sich in einer Untermenge befindet, die im Mittel einen niedrigen Trainingsfehler aufweist. Diese Ergebnisse unterstreichen die Wichtigkeit von Modellselektion. Lernalgorithmen: Bayesian transduction und die Bayes point machine werden als optimale Klassifikationsverfahren im bayesianischen Sinne identifiziert. Es werden sampling Algorithmen für den resultierenden stückweise konstanten Posterior entwickelt: der kernel Gibbs sampler, ein MCMC Verfahren geeignet für den Fall verrauschter Klasseninformation und der permutational perceptron sampler für Probleme mit vielen Trainingsdaten. Beide Verfahren werden auf Klassifikationsproblemen zur Erkennung handgeschriebener Ziffern getestet und erweisen sich der support vector machine besonders bei der Angabe von Konfidenzwerten als überlegen. Anwendung: Als Anwendungsbeispiel für Kernklassifikatoren wird das Problem der Mustererkennung im japanischen Brettspiel Go betrachtet, das hier als Prototyp eines Klassifikationsproblems auf symbolischen bzw. strukturierten Objekten dient. Als kompakte Repräsentation von Go-Stellungen wird der common fate graph entwickelt, auf dessen Grundlage sogenannte relative subgraph features einen geeigneten Merkmalsraum aufspannen. Davon ausgehend wird eine support vector machine trainiert, um die Qualität von Zügen aus einer Sammlung von Go-Problemen und 9x9 Spielprotokollen zu lernen. Die resultierenden Klassifikatoren lösen bestimmte Problemstellungen und spielen 9x9 Go mit einer beachtlichen Spielstärke.

Betreuer	Kockelkorn, Ulrich; Prof. Dr.
Gutachter	Kockelkorn, Ulrich; Prof. Dr.
Gutachter	Wysotzki, Fritz; Prof. Dr.

Upload:	2002-11-08
URL of Theses:	http://edocs.tu-berlin.de/diss/2002/graepel_thore.pdf

Technical University of Berlin, Library (Dissertation Office)
Strasse des 17. Juni 135, 10623 Berlin, Germany