Document Type

Conference Paper


Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence



Publication Details

ISSC2020, IEEE UK and Ireland Signal Processing Chapter and IEEE Computational Intelligence Society (UK & Ireland), Letterkenny Institute of Technology, 11-12 June 2020.


This paper addresses a basic problem in the analysis of a finite binary string or bit stream (of compact support): how to tell whether the string represents non-random or intelligible information (involving some form of periodicity, for example), whether it is the product of an entirely random process, or whether it is something in between the two. This problem has applications that include cryptanalysis, quantitative finance, machine learning, artificial intelligence and other forms of signal and image processing that involve the general problem of distinguishing real noise from information embedded in noise. After providing a short introduction to the problem, we focus on the application of information entropy to solving it, given that this fundamental metric is an intrinsic measure of information in regard to some measurable system. A brief overview of the concept of entropy is given, followed by examples of how algorithms can be designed to compute the binary entropy of a finite binary string, including important variations on a theme such as the BiEntropy. The problem with computing a single metric of this type is that it can be representative of similar binary strings and lacks robustness in terms of its statistical significance. For this reason, the paper presents a solution to the problem that is based on the Kullback-Leibler Divergence (or Relative Entropy), which measures how one probability distribution differs from a second, reference probability distribution. By repeatedly computing this metric against different reference (simulated or otherwise) random finite binary strings, it is shown how the distribution of the resulting signal changes for intelligible and random binary strings of a finite extent.
This allows a number of standard statistical metrics to be computed, from which the foundations of a machine learning system can be developed. A limited number of results are presented for different natural languages to illustrate the approach, and a prototype MATLAB function is provided so that interested readers can reproduce the results, investigate different data sets and further develop the method.
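The binary entropy and BiEntropy mentioned above can be illustrated with a short sketch. This is a minimal Python example, not the paper's MATLAB function. It assumes the Shannon binary entropy H(p) = -p log2 p - (1-p) log2(1-p), with p taken as the fraction of ones in the string. It also assumes Croll's definition of BiEntropy: a 2^k-weighted average of H over the string and its successive binary derivatives, where each derivative XORs adjacent bits. All function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy H(p) in bits of a Bernoulli(p) source; H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def string_entropy(bits: str) -> float:
    """First-order binary entropy of a bit string, using the fraction of ones as p."""
    return binary_entropy(bits.count("1") / len(bits))


def binary_derivative(bits: str) -> str:
    """XOR of each pair of adjacent bits; the result is one bit shorter."""
    return "".join("1" if a != b else "0" for a, b in zip(bits, bits[1:]))


def bientropy(bits: str) -> float:
    """BiEntropy (assumed Croll-style definition): entropies of the string and
    its first n-2 binary derivatives, weighted by 2**k and normalised so the
    weights sum to one."""
    n = len(bits)
    if n < 2:
        raise ValueError("need at least two bits")
    total, s = 0.0, bits
    for k in range(n - 1):  # k = 0 .. n-2
        total += string_entropy(s) * 2 ** k
        s = binary_derivative(s)
    return total / (2 ** (n - 1) - 1)  # sum of weights 2**0 .. 2**(n-2)
```

For example, `string_entropy("0101")` returns 1.0, the same value as for any balanced string. In contrast, `bientropy("0101")` returns 1/7 (about 0.143) because the first derivative `111` is constant. This shows why a single first-order entropy says little about periodicity. The 2^k weights grow quickly, so this direct form is only practical for fairly short strings.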
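The Kullback-Leibler approach described above can be sketched as follows, using D_KL(P || Q) = sum_i P(i) log2(P(i)/Q(i)). The abstract does not say which probability distributions are compared, so the sketch makes several assumptions. It compares the distribution of non-overlapping m-bit blocks (m = 4 by default). It applies a small additive smoothing constant `eps` so the divergence stays finite. It uses Python's `random` module as the source of simulated random reference strings. The helper names and the choice of summary statistics (mean, standard deviation, skewness, excess kurtosis) are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def block_distribution(bits: str, m: int = 4, eps: float = 1e-9) -> list:
    """Relative frequencies of the 2**m possible non-overlapping m-bit blocks.

    eps is an assumed additive-smoothing constant that keeps every bin
    non-zero, so the Kullback-Leibler divergence below stays finite."""
    blocks = [bits[i:i + m] for i in range(0, len(bits) - m + 1, m)]
    if not blocks:
        raise ValueError("string shorter than one block")
    counts = Counter(int(b, 2) for b in blocks)
    total = len(blocks) + eps * 2 ** m
    return [(counts[v] + eps) / total for v in range(2 ** m)]


def kl_divergence(p: list, q: list) -> float:
    """Relative entropy D_KL(P || Q) in bits; both lists must sum to 1."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def kl_signal(bits: str, trials: int = 100, m: int = 4, seed: int = 0) -> list:
    """D_KL of the string's block distribution against `trials` simulated
    random reference strings of the same length (hypothetical helper)."""
    rng = random.Random(seed)
    p = block_distribution(bits, m)
    signal = []
    for _ in range(trials):
        ref = "".join(rng.choice("01") for _ in range(len(bits)))
        signal.append(kl_divergence(p, block_distribution(ref, m)))
    return signal


def summary(signal: list) -> dict:
    """Mean, standard deviation, skewness and excess kurtosis of the signal
    (skewness and kurtosis are reported as 0.0 when the signal is constant)."""
    n = len(signal)
    mean = sum(signal) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)
    if sd == 0.0:
        return {"mean": mean, "std": 0.0, "skew": 0.0, "kurtosis": 0.0}
    skew = sum((x - mean) ** 3 for x in signal) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in signal) / (n * sd ** 4) - 3.0
    return {"mean": mean, "std": sd, "skew": skew, "kurtosis": kurt}


if __name__ == "__main__":
    periodic = "01" * 512
    rng = random.Random(1)
    noise = "".join(rng.choice("01") for _ in range(1024))
    print(summary(kl_signal(periodic)))
    print(summary(kl_signal(noise)))
```

Every 4-bit block of `"01" * 512` is `0101`. Its divergence from a near-uniform random reference is therefore roughly log2 16 = 4 bits in each trial. A pseudo-random string, by contrast, gives values close to zero. The spread and shape of each signal are candidates for the standard statistical metrics mentioned above.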