This item is available under a Creative Commons License for non-commercial use only
Supervised machine learning approaches assume the existence of a large collection of manually labelled examples of the problem under consideration. However, in many cases such a collection does not exist, and creating one is time-consuming and expensive. This can be a barrier to the use of supervised learning, particularly when doubt as to whether the system will work makes the cost of creating a dataset unjustifiable. Active learning is a machine learning technique that has been widely used to create classification systems in the absence of large numbers of labelled examples, but it can also be used to create such collections. This paper describes a system that uses active learning to label large collections of unlabelled data. We show that the system can create an accurately labelled dataset approximately 10 times the size of the set of examples manually labelled by an expert. The experiments described are based on recipe data from the 1st Computer Cooking Contest, to be held at ECCBR'08, and focus on identifying those recipes in the set that are desserts.
Mac Namee, B. & Delany, S. (2008) Sweetening the Data Set: Using Active Learning to Label Unlabelled Datasets. Proceedings of the 19th Irish Conference on Artificial Intelligence and Cognitive Science (AICS '08), UCC, Cork.