Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
This paper addresses a basic problem in the analysis of a finite binary string or bit stream (of compact support), namely, how to tell whether the string represents non-random or intelligible information (involving some form of periodicity, for example), whether it is the product of an entirely random process, or whether it is something in between the two. This problem has applications that include cryptanalysis, quantitative finance, machine learning, artificial intelligence and other forms of signal and image processing, which share the general problem of how to distinguish real noise from information embedded in noise. After a short introduction to the problem, we focus on the application of information entropy to solving it, given that this fundamental metric is an intrinsic measure of the information in a measurable system. A brief overview of the concept of entropy is given, followed by examples of how algorithms can be designed to compute the binary entropy of a finite binary string, including important variations on a theme such as the BiEntropy. The problem with computing a single metric of this type is that one value can be representative of many similar binary strings, and the metric lacks robustness in terms of its statistical significance. For this reason, the paper presents a solution based on the Kullback-Leibler divergence (or relative entropy), which measures how one probability distribution differs from a second, reference probability distribution. By repeatedly computing this metric against different (simulated or otherwise) random reference binary strings of finite length, it is shown how the distribution of the resulting signal changes for intelligible and for random binary strings of finite extent.
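As a rough illustration of the entropy measures named above, the following Python sketch computes the Shannon binary entropy of a bit string from its proportion of ones, and a BiEntropy assuming Croll's definition: a weighted average of the binary entropies of the string's successive binary derivatives (XOR of adjacent bits), with the k-th derivative weighted by 2^k. The function names are hypothetical and this is not the authors' MATLAB implementation. The usage lines show why a single metric can mislead: the periodic string 0101 has maximal Shannon entropy yet a low BiEntropy.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a Bernoulli(p) source, with 0*log(0) taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def string_entropy(bits: Sequence[int]) -> float:
    """Binary entropy of a finite bit string, from its proportion of ones."""
    if not bits:
        raise ValueError("empty bit string")
    return binary_entropy(sum(bits) / len(bits))


def bientropy(bits: Sequence[int]) -> float:
    """BiEntropy, assuming Croll's weighted sum over binary derivatives k = 0..n-2."""
    n = len(bits)
    if n < 2:
        raise ValueError("BiEntropy needs at least two bits")
    deriv = list(bits)
    weighted, total = 0.0, 0.0
    for k in range(n - 1):
        # weight 2^k, rescaled by 2^-(n-1) so long strings do not overflow;
        # the common factor cancels when dividing by the total weight
        w = 2.0 ** (k - (n - 1))
        weighted += w * binary_entropy(sum(deriv) / len(deriv))
        total += w
        # next binary derivative: XOR of adjacent bits (one bit shorter)
        deriv = [a ^ b for a, b in zip(deriv, deriv[1:])]
    return weighted / total


if __name__ == "__main__":
    periodic = [0, 1, 0, 1]
    print(string_entropy(periodic))  # 1.0: ones and zeros equally frequent
    print(bientropy(periodic))       # 0.142857... (= 1/7): strongly ordered
```

The rescaled weights are a design choice for numerical safety only; for short strings they give exactly the same value as the unscaled 2^k weights divided by 2^(n-1) - 1.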
The resulting sample of divergence values allows a number of standard statistical metrics to be computed, from which the foundations of a machine learning system can be developed. A limited number of results are presented for different natural languages to illustrate the approach, and a prototype MATLAB function is provided so that interested readers can reproduce the results, investigate different data sets and develop the method further.
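The divergence-based test can be sketched in a similar way. The description above does not specify how the probability distributions are formed from a bit string, so the Python sketch below assumes they are the empirical frequencies of non-overlapping m-bit blocks, with a small additive smoothing term so that the divergence stays finite. The block size `m`, smoothing `eps`, number of `trials` and all function names are hypothetical choices, not the paper's. Each trial draws a fresh pseudo-random reference string of the same length and records D(P||Q) in bits; the resulting sample is then summarised by standard statistics.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence


def block_distribution(bits: Sequence[int], m: int = 4, eps: float = 1e-6) -> List[float]:
    """Empirical distribution of non-overlapping m-bit blocks (assumed feature).

    Additive smoothing eps keeps every probability positive, so the
    Kullback-Leibler divergence below is always finite.
    """
    counts = [0] * (2 ** m)
    for i in range(0, len(bits) - m + 1, m):
        value = 0
        for b in bits[i:i + m]:
            value = (value << 1) | b  # read the block as an m-bit integer
        counts[value] += 1
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy D(P || Q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_signal(bits: Sequence[int], trials: int = 100, m: int = 4, seed: int = 0) -> List[float]:
    """D(P_s || Q_r) against `trials` pseudo-random reference strings r."""
    rng = random.Random(seed)
    p = block_distribution(bits, m)
    signal = []
    for _ in range(trials):
        # fresh random reference string of the same length as the input
        ref = [rng.getrandbits(1) for _ in range(len(bits))]
        signal.append(kl_divergence(p, block_distribution(ref, m)))
    return signal


def summary(signal: Sequence[float]) -> Dict[str, float]:
    """Standard statistics of the divergence sample (needs at least 2 values)."""
    return {
        "mean": statistics.mean(signal),
        "stdev": statistics.stdev(signal),
        "median": statistics.median(signal),
    }


if __name__ == "__main__":
    periodic = [0, 1, 1, 0] * 256
    rng = random.Random(1)
    noise = [rng.getrandbits(1) for _ in range(len(periodic))]
    print("periodic:", summary(kl_signal(periodic)))
    print("noise:   ", summary(kl_signal(noise)))
```

For the strongly periodic input every block is identical while a random reference spreads over all 16 block values, so the divergence sits at roughly 4 bits; for the pseudo-random input it stays close to zero. It is the shape of this sample (its spread and other moments, not only its mean) that a statistical test or classifier would build on.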
J M Blackledge & N Mosola (2020) A Statistically Significant Test to Evaluate the Order or Disorder for a Binary String of a Finite Length, ISSC2020, IEEE UK and Ireland Signal Processing Chapter and IEEE Computational Intelligence Society (UK & Ireland), Letterkenny Institute of Technology, 11-12 June 2020.