Document Type

Theses, Ph.D


Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence


Computer Sciences

Publication Details

Thesis successfully submitted to Technological University Dublin in fulfilment of the requirements for the award of Doctor of Philosophy (PhD).


Activity discovery (AD) refers to the unsupervised extraction of structured activity data from a stream of sensor readings in a real-world or virtual environment. Activity discovery is part of the broader topic of activity recognition, which has potential uses in fields as varied as social work and elder care, psychology and intrusion detection. Since activity recognition datasets are both hard to come by and very time-consuming to label, the development of reliable activity discovery systems could be of significant utility to researchers and developers working in the field, as well as to the wider machine learning community.

This thesis investigates activity discovery systems that can deal with interleaving, the phenomenon of continuous switching between multiple high-level activities over a short period of time. Interleaving is a common characteristic of the real-world data streams that activity discovery systems have to deal with, but it is unfortunately often left unaddressed in the existing literature.

The research presented in this thesis highlights the fact that activities exist at multiple levels of abstraction. A single activity is often a constituent element of a larger, more complex activity, and in turn has constituents of its own that are themselves activities. This investigation therefore necessarily considers activity discovery systems that can find these hierarchies.

The primary contribution of this thesis is the development and evaluation of an activity discovery system that is capable of identifying interleaved activities in sequential data. Starting from a baseline system implemented using a topic model, novel approaches are proposed making use of modern language models taken from the field of natural language processing, before moving on to more advanced language modelling that can handle complex, interleaved data. As well as the identification of activities, the thesis also proposes the abstraction of activities into larger, more complex activities. This allows for the construction of hierarchies of activities that more closely reflect the complex inherent structure of activities present in real-world datasets compared to other approaches.

The thesis also discusses a number of important issues relating to the evaluation of activity discovery systems, and examines how existing evaluation metrics may at times be misleading. This includes highlighting the existence of differing abstraction issues in activity discovery evaluation, and suggesting how this problem can be mitigated. Finally, alternative evaluation metrics are investigated.

Naturally, this dissertation does not fully solve the problem of activity discovery, and work remains to be done. However, a number of the most pressing issues that affect real-world activity discovery systems are tackled head-on, and the thesis shows that useful progress can indeed be made on them. This work aims to benefit systems that are as “clean slate” as possible, and hence incorporate no domain-specific knowledge. This is perhaps something of an artificial handicap to impose in this problem domain, but it does have the advantage of making this work applicable to as broad a range of domains as possible.