Document Type

Dissertation

Rights

This item is available under a Creative Commons License for non-commercial use only

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE, Computer Sciences, Information Science, Bioinformatics

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computer Science (Data Analytics)

Abstract

Clinical trials are studies conducted by researchers to assess the impact of a new medicine on human health, in terms of its efficacy and, most importantly, its safety. For any advancement in medicine, it is essential that clinical trials are conducted ethically and are supported by scientific evidence. Not all people who volunteer for clinical trials are allowed to take part. Age, comorbidities and other health issues can be major factors in deciding whether a patient's profile is suitable for a trial. Both the profiles selected for a trial and those excluded from it should be documented. This research, which spanned a long period, covered trials on 15,000 cancer drugs. Keeping track of so many trials and their outcomes, and formulating a standard health guideline from them, is easier said than done. This paper discusses text classification, one of the primary tasks in Natural Language Processing (NLP). Although it is one of the most common problems in NLP, it becomes complex in a specific domain such as bio-medicine, where texts contain a good deal of medical jargon. This paper proposes a framework with two major components: a transformer architecture that produces embeddings, coupled with a text classifier. Later sections show that pre-trained embeddings generated by BERT (Bidirectional Encoder Representations from Transformers) can be as efficient as the current benchmark, which uses embeddings trained on the same dataset, while achieving a better F1-score and accuracy. The main contribution of this paper is the framework, which can be extended to other bio-medical problems and, through fine-tuning, reused in other domains.
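The comparison with the benchmark rests on F1-score and accuracy. As a reminder of what those two numbers measure, here is a minimal pure-Python sketch of the standard definitions for a binary task (for example, a profile being eligible or not eligible for a trial). The label encoding (1 = positive class) and the function names are illustrative assumptions; the dissertation may have used a library implementation or an averaged multi-class variant instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]
print(accuracy(gold, pred), f1_score(gold, pred))
```

Here 3 of 5 predictions are correct (accuracy 0.6), and with 2 true positives, 1 false positive and 1 false negative, precision and recall are both 2/3, so F1 is also 2/3. F1 is often preferred over accuracy when one class is much rarer than the other.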
The framework also supports optimization techniques such as Mixed Precision, Dynamic Padding and Uniform Length Batching, which improve performance by up to 3 times on GPUs (Graphics Processing Units) and by 60% on TPUs (Tensor Processing Units).
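Dynamic Padding and Uniform Length Batching are named above but not explained. The following is a minimal pure-Python sketch of the usual idea, assuming texts have already been tokenised into lists of integer ids. Sorting by length groups texts of similar length into the same batch, and each batch is then padded only to its own longest sequence rather than to a global maximum, so fewer padding tokens are processed. The function names and `PAD_ID = 0` are illustrative assumptions, not the dissertation's code.

```python
from typing import List

PAD_ID = 0  # assumption: id of the padding token in the tokenizer's vocabulary


def uniform_length_batches(sequences: List[List[int]], batch_size: int) -> List[List[List[int]]]:
    """Group sequences of similar length, then pad each batch to its own longest member."""
    # Uniform length batching: order indices by sequence length so that
    # neighbouring sequences (which end up in the same batch) have similar lengths.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        batch = [sequences[i] for i in order[start:start + batch_size]]
        # Dynamic padding: pad only up to the longest sequence in this batch.
        max_len = max(len(seq) for seq in batch)
        batches.append([seq + [PAD_ID] * (max_len - len(seq)) for seq in batch])
    return batches


def padded_tokens(batches: List[List[List[int]]]) -> int:
    """Total number of token positions (real + padding) the model would process."""
    return sum(len(seq) for batch in batches for seq in batch)


seqs = [list(range(1, n + 1)) for n in (2, 9, 3, 8)]
batches = uniform_length_batches(seqs, batch_size=2)
print(batches[0])              # [[1, 2, 0], [1, 2, 3]]
print(padded_tokens(batches))  # 24
```

With these four sequences, padding everything to the global maximum length of 9 would process 36 token positions, while the grouped and dynamically padded batches process 24. In training, the resulting batches are typically shuffled so the model does not always see short texts first; the sketch keeps them in sorted order for simplicity.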

DOI

https://doi.org/10.21427/69qh-xn75
