A Discriminative Model for Semi-Supervised Learning
مقال من تأليف: Balcan, Maria-Florina ; Blum, Avrim ;
ملخص: Supervised learning-that is, learning from labeled examples-is an area of Machine Learning that has reached substantial maturity. It has generated general-purpose and practically successful algorithms and the foundations are quite well understood and captured by theoretical frameworks such as the PAC-learning model and the Statistical Learning theory framework. However, for many contemporary practical problems such as classifying web pages or detecting spam, there is often additional information available in the form of unlabeled data, which is often much cheaper and more plentiful than labeled data. As a consequence, there has recently been substantial interest in semi-supervised learning-using unlabeled data together with labeled data-since any useful information that reduces the amount of labeled data needed can be a significant benefit. Several techniques have been developed for doing this, along with experimental results on a variety of different learning problems. Unfortunately, the standard learning frameworks for reasoning about supervised learning do not capture the key aspects and the assumptions underlying these semi-supervised learning methods. In this article, we describe an augmented version of the PAC model designed for semi-supervised learning, that can be used to reason about many of the different approaches taken over the past decade in the Machine Learning community. This model provides a unified framework for analyzing when and why unlabeled data can help, in which one can analyze both sample-complexity and algorithmic issues. The model can be viewed as an extension of the standard PAC model where, in addition to a concept class C, one also proposes a compatibility notion: a type of compatibility that one believes the
لغة:
إنجليزية