This item is available under a Creative Commons License for non-commercial use only
1.2 COMPUTER AND INFORMATION SCIENCE
This paper investigates methods for predicting tags on a textual corpus of hotel staff entries in a ticketing system. The aim is to improve the tagging process and to find the most suitable method for suggesting tags for a new text entry. The paper consists of two parts: (i) an exploration of existing sample data, which includes statistical analysis and visualisation to provide an overview of the data, and (ii) an evaluation of tag prediction approaches. We draw on approaches from different research fields in order to cover a broad spectrum of possible solutions. Specifically, we test a machine learning model for multi-label classification (using gradient boosting), a statistical approach (using frequency heuristics), and two simple similarity-based classification approaches (Nearest Centroid and k-Nearest Neighbours). The experiment comparing the approaches uses recall to measure the quality of the results. Finally, we recommend the modelling approach that produces the most accurate tag predictions on the sample data.
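To make the evaluation measure concrete, the following minimal sketch shows how recall can be computed for tag suggestions, using a simple frequency baseline as the suggester. It is an illustration only: the function names, the toy data, and the choice to suggest the k most common training tags are assumptions made here, not the paper's actual heuristic or corpus.

```python
from collections import Counter


def recall(suggested, actual):
    """Fraction of the true tags that appear among the suggested tags."""
    actual = set(actual)
    if not actual:
        return 0.0
    return len(actual & set(suggested)) / len(actual)


def most_frequent_tags(training_tags, k):
    """Assumed frequency heuristic: suggest the k most common training tags."""
    counts = Counter(tag for tags in training_tags for tag in tags)
    return [tag for tag, _ in counts.most_common(k)]


# Hypothetical toy data, not taken from the paper's corpus.
train = [["wifi", "room"], ["wifi"], ["breakfast", "room"], ["wifi", "billing"]]
test = [["wifi", "room"], ["billing"]]

suggested = most_frequent_tags(train, k=2)
scores = [recall(suggested, tags) for tags in test]
mean_recall = sum(scores) / len(scores)
```

Averaging per-entry recall, as done here, is one of several possible aggregation choices; the abstract does not say which one the experiment uses.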
Bozic, B., Rios, A. & Delany, S.J. (2018). Validation of tagging suggestion models for a hotel ticketing corpus. Proceedings of the 20th International Conference on Information Integration and Web-based Applications & Services (iiWAS2018), Yogyakarta, Indonesia, November 19-21, pp.1523. doi:10.1145/3282373.3282386