On Limited Length Binary Strings with an Application in Statistical Control

F.S. Makri¹,*, Z.M. Psillakis²

¹ Department of Mathematics, University of Patras, 26500 Patras, Greece

² Department of Physics, University of Patras, 26500 Patras, Greece

Article Information

Identifiers and Pagination:

Year: 2017
Volume: 8
First Page: 1
Last Page: 6
Publisher Id: TOSPJ-8-1
DOI: 10.2174/1876527001708010001

Article History:

Received Date: 04/08/2016
Revision Received Date: 01/09/2016
Acceptance Date: 12/09/2016
Electronic publication date: 28/02/2017
Collection year: 2017

© 2017 Makri and Psillakis

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Department of Mathematics, University of Patras, 26500 Patras, Greece; Tel: 0030-2610-996738; E-mail: makri@math.upatras.gr

In a 0 - 1 sequence of Markov dependent trials we consider a statistic which counts strings of a limited length run of 0s between subsequent 1s. Its probability mass function is used to determine the chance that a stochastic process remains or not in statistical control. Illustrative numerics are presented.

Keywords: Binary strings, runs, overlapping counting, Markov chain, Statistical control.


1. INTRODUCTION AND PRELIMINARIES

Let {X_i}_i≥1 be a sequence of binary random variables (RVs) with values ordered on a line. For n ≥ 2 and two non-negative integer numbers k and l, 0 ≤ k ≤ l ≤ n – 2, we consider the statistic (random variable) M_n;k,l which enumerates strings (binary patterns) of a limited (constrained) length equal to d+2, k ≤ d ≤ l; i.e. strings which consist of a zero run (consecutive 0s) of length at least equal to k and at most equal to l between two subsequent ones. The counting of such constrained (k, l) strings is considered in the overlapping sense; that is, a 1 which is not at either end of the sequence may contribute towards counting two possible strings, the one which ends with the occurrence of it and the next one which starts with it.

As an illustration let 0100010000111011001001100 be the first n = 25 outcomes of a 0 - 1 sequence. Then M_25;1,2 = 3, M_25;1,3 = 4, M_25;0,0 = 4 and M_25;0,4 = 9.
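The four values quoted in this illustration can be checked mechanically. The following is a minimal sketch, assuming the sequence is given as a string of the characters 0 and 1; the function name `count_constrained_strings` is introduced here for illustration only. It records the positions of the 1s and counts the pairs of consecutive 1s whose separating zero run has length d with k ≤ d ≤ l, which is the overlapping counting described above.

```python
def count_constrained_strings(bits, k, l):
    """Count pairs of consecutive 1s separated by d zeros, k <= d <= l.

    Counting is overlapping: an interior 1 may end one string and
    start the next one.
    """
    ones = [pos for pos, b in enumerate(bits) if b == "1"]
    # Length of the zero run between each pair of consecutive 1s.
    gaps = [b - a - 1 for a, b in zip(ones, ones[1:])]
    return sum(1 for d in gaps if k <= d <= l)


seq = "0100010000111011001001100"  # the n = 25 outcomes of the illustration
for k, l in [(1, 2), (1, 3), (0, 0), (0, 4)]:
    print(f"M_25;{k},{l} =", count_constrained_strings(seq, k, l))
```

For this sequence the zero runs between consecutive 1s have lengths 3, 4, 0, 0, 1, 0, 2, 2, 0, from which the counts 3, 4, 4 and 9 follow.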

Let A = {k + 2, k + 3,..., n} be an index set and {B, B^c} a partition of A, where B = {k + 2, k + 3,..., l + 1} and B^c = {l + 2, l + 3,..., n}. Then formally, M_n;k,l, 0 ≤ k ≤ l ≤ n - 2, can be written, on a 0 - 1 sequence X₁, X₂,..., X_n, as (see Makri and Psillakis [1]):

(1)

with

(2)
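One way to write the counting statistic that agrees with the verbal definition and with the partition {B, B^c} is sketched below. The symbol E_{i,d} is introduced here for illustration only and need not coincide with the notation of the authors' displays (1) and (2). It indicates a string that ends with a 1 at position i after a zero run of length d, which requires a 1 at position i - d - 1. For i in B the run length is limited by the start of the sequence (d ≤ i - 2); for i in B^c it is limited by l.

```latex
M_{n;k,l} \;=\; \sum_{i \in B} \; \sum_{d=k}^{i-2} E_{i,d}
          \;+\; \sum_{i \in B^{c}} \; \sum_{d=k}^{l} E_{i,d},
\qquad
E_{i,d} \;=\; X_{i-d-1}\, X_{i} \prod_{j=i-d}^{i-1} \bigl(1 - X_{j}\bigr).
```

For d = 0 the product is empty and equals one, so E_{i,0} = X_{i-1} X_i marks two adjacent 1s.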

Readily, for n < k + 2, M_n;k,l = 0. Throughout the article we apply the conventions ∑_{i=a}^{b} = 0 and ∏_{i=a}^{b} = 1 for a > b; that is, an empty sum (product) is to be interpreted as zero (unity). The support (range set) of M_n;k,l is:

(3)

where ⌊x⌋ denotes the greatest integer less than or equal to a real number x.
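The largest value in the support can be motivated by a short counting argument, given here as a sketch and not as the authors' derivation. In the overlapping sense, x chained strings share their boundary 1s, so they occupy one position for the first 1 plus at least k + 1 further positions per string (k zeros and the closing 1). Fitting them into n positions requires:

```latex
1 + x\,(k+1) \;\le\; n
\quad\Longrightarrow\quad
x \;\le\; \left\lfloor \frac{n-1}{k+1} \right\rfloor ,
\qquad\text{so that } M_{n;k,l} \in \Bigl\{0, 1, \ldots, \Bigl\lfloor \tfrac{n-1}{k+1} \Bigr\rfloor \Bigr\}.
```

Every smaller count is attainable by chaining fewer strings and filling the remaining positions with 0s.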

A statistic related to M_n;k,l is the waiting time W_m;k,l until the m-th, m ≥ 1, occurrence of a constrained (k, l) string. It is defined and connected to M_n;k,l as follows:

(4)

Hence, Eq. (4) offers an alternative way of obtaining results for the waiting time RV W_m;k,l through formulae established for the string enumerative RV M_n;k,l and vice versa.
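The duality between the two RVs can be spelled out as follows. The display is a sketch consistent with the relation P(M_n;k,l ≥ m) = P(W_m;k,l ≤ n) used in Section 3; its layout is an assumption and may differ from the authors' Eq. (4).

```latex
W_{m;k,l} = \min\{\, n : M_{n;k,l} = m \,\}, \qquad
P\bigl(W_{m;k,l} \le n\bigr) = P\bigl(M_{n;k,l} \ge m\bigr), \qquad
P\bigl(W_{m;k,l} = n\bigr) = P\bigl(M_{n;k,l} \ge m\bigr) - P\bigl(M_{n-1;k,l} \ge m\bigr).
```

The last identity holds because M_n;k,l is non-decreasing in n and grows by at most one per trial, since a new 1 can close at most one string.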

The study, via M_n;k,l, of constrained (k, l) strings where a 1 is followed by at least k and at most l 0s before the next 1 in a 0 - 1 sequence covers as particular cases strings with zero run length at most equal (M_n;0,l, d ≤ l), exactly equal (M_n;k,k, d = k) and at least equal (M_n;k,n-2, d ≥ k) to a non-negative integer number. Some relevant contributions on the subject are the works of Sarkar et al. [2], Sen and Goyal [3], Holst [4, 5], Huffer et al. [6], Eryilmaz and Zuo [7], Eryilmaz and Yalcin [8], Dafnis et al. [9], and Makri and Psillakis [10]. Moreover, M_n;0,0 is Ling's [11] RV, which counts overlapping runs of 1s of length 2 (with overlapping part of length at most 1); see e.g. Balakrishnan and Koutras [12].

In Section 2 of the present paper we first introduce the required notation and then recall a recent result of Makri and Psillakis [1] on the probability mass function (PMF) of M_n;k,l defined on a 0 - 1 Markov chain. The presented expressions for the PMF are then used in an application case study on statistical process control, which is discussed in Section 3.

Throughout the article, δ_i,j denotes the Kronecker delta function of the integer arguments i and j.

2. PMF OF M_n;k,l FOR MARKOV DEPENDENT TRIALS

In this Section we provide the PMF f_n;k,l(x) = P(M_n;k,l = x), for x in the support (3) of M_n;k,l, 0 ≤ k ≤ l ≤ n - 2. The 0 - 1 sequence X₁, X₂,..., X_n, n ≥ 2, on which M_n;k,l is defined, is generated by a time-homogeneous first order Markov chain (MRKV) with one step transition probability matrix P = (p_i,j) and initial probability vector p⁽¹⁾ = (p₀, p₁), with p_i,j = P(X_t = j | X_t-1 = i), i, j ∈ {0, 1}, t ≥ 2, p₀ = P(X₁ = 0) = 1 - P(X₁ = 1) = 1 - p₁ and p_i,0 + p_i,1 = 1, i = 0, 1. Readily, a 0 - 1 sequence X₁, X₂,..., X_n, n ≥ 2, of independent and identically distributed (IID) RVs with a common probability of 1s, p = P(X_i = 1) = 1 - P(X_i = 0) = 1 - q, i = 1,2,...,n, is a particular MRKV sequence with p₀₀ = p₁₀ = q, p₀₁ = p₁₁ = p and p⁽¹⁾ = (q, p).

We next present two combinatorial results (Lemma 1 and Corollary 1) which then are used for the calculation of f_n;k,l(x) given by Proposition 1.

Lemma 1 (Makri et al. [13]). Let C_i,r-i(α, m, k₁ - 1, k₂ - 1) be the number of allocations of α indistinguishable balls into m distinguishable cells, where each of i specified cells has capacity k₁ - 1 and each of r-i other specified cells has capacity k₂ - 1, 0 ≤ r ≤ m, 0 ≤ i ≤ r, k₁ ≥ 1, k₂ ≥ 1. Then,

(5)

We note that the number C_i,r-i(α, m, k₁ - 1, k₂ - 1) equivalently gives the number of integer solutions of the equation x₁ + x₂ +...+ x_m = α such that 0 ≤ x_j < k₁, 1 ≤ j ≤ i and 0 ≤ x_i+j < k₂, 1 ≤ j ≤ r-i. Setting k₁ = k₂ = k in Lemma 1 we obtain, as a corollary, the following result.

Corollary 1 (Makri et al. [14]). Let H_r(α, m, k - 1) be the number of allocations of α indistinguishable balls into m distinguishable cells where each of the r, 0 ≤ r ≤ m, specified cells is occupied by at most k - 1 balls. Then, H_r(α, m, k - 1) = C_i,r-i(α, m, k - 1, k - 1). Accordingly,

(6)
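The numbers in Lemma 1 and Corollary 1 can be computed by the standard inclusion-exclusion argument over the capped cells. The sketch below uses that textbook form, which may differ in presentation from the authors' Eqs. (5) and (6), and checks it against direct enumeration. The function names and argument order are assumptions made for illustration, and m ≥ 1 is assumed.

```python
from itertools import product
from math import comb


def capped_allocations(alpha, m, i, r, k1, k2):
    """Allocations of alpha balls into m >= 1 cells, where i cells hold at most
    k1 - 1 balls, r - i other cells hold at most k2 - 1, the rest are unrestricted."""
    total = 0
    for j1 in range(i + 1):            # cells forced to violate the k1 cap
        for j2 in range(r - i + 1):    # cells forced to violate the k2 cap
            rest = alpha - j1 * k1 - j2 * k2
            if rest < 0:
                continue               # such terms contribute zero
            sign = (-1) ** (j1 + j2)
            total += sign * comb(i, j1) * comb(r - i, j2) * comb(rest + m - 1, m - 1)
    return total


def h_allocations(alpha, m, r, k):
    """Corollary 1 as the special case k1 = k2 = k (r cells capped at k - 1)."""
    return capped_allocations(alpha, m, r, r, k, k)


def brute_force(alpha, m, i, r, k1, k2):
    """Direct enumeration, used only to cross-check small cases."""
    caps = [k1 - 1] * i + [k2 - 1] * (r - i) + [alpha] * (m - r)
    return sum(1 for x in product(*(range(c + 1) for c in caps)) if sum(x) == alpha)


print(capped_allocations(5, 4, 1, 3, 2, 3), brute_force(5, 4, 1, 3, 2, 3))
```

The brute-force routine grows exponentially in m and is meant only as a check for small arguments.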

Proposition 1 (Makri and Psillakis [1]). For p⁽¹⁾ = (p₀, p₁) and transition probability matrix P = (p_i,j), the PMF f_n;k,l(x), for x in the support (3) and n ≥ k + 2, is given by:

(7)

where r_k,l = x + i + j if 0 < k ≤ l; i + j if 0 = k < l,

with

and

(8)
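As a numerical cross-check that does not rely on Eq. (7), the same PMF can be obtained by a forward recursion over the Markov chain. This is a different technique from the combinatorial formula of Proposition 1, offered as a minimal sketch; the function name `pmf_M` and the state encoding are assumptions made for illustration. The state records the number of 0s observed since the last 1 (capped at l + 1, since longer runs can never close a string) together with the number of strings counted so far.

```python
from collections import defaultdict


def pmf_M(n, k, l, p1, p01, p11):
    """PMF of M_{n;k,l} on a 0-1 Markov chain, as a dict {x: P(M = x)}.

    p1  = P(X_1 = 1), p01 = P(X_t = 1 | X_{t-1} = 0), p11 = P(X_t = 1 | X_{t-1} = 1).
    """
    NONE = -1        # marker: no 1 has been observed yet
    cap = l + 1      # zero runs longer than l are all equivalent
    # dist[(z, c)]: z = zeros since the last 1 (or NONE), c = strings counted
    dist = {(NONE, 0): 1.0 - p1, (0, 0): p1}
    for _ in range(n - 1):
        nxt = defaultdict(float)
        for (z, c), pr in dist.items():
            to_one = p11 if z == 0 else p01      # previous symbol is 1 iff z == 0
            closes = z != NONE and k <= z <= l   # a 1 now completes a (k, l) string
            nxt[(0, c + 1 if closes else c)] += pr * to_one
            z0 = NONE if z == NONE else min(z + 1, cap)
            nxt[(z0, c)] += pr * (1.0 - to_one)
        dist = nxt
    out = defaultdict(float)
    for (_, c), pr in dist.items():
        out[c] += pr
    return dict(out)


pmf = pmf_M(n=10, k=1, l=2, p1=0.5, p01=0.45, p11=0.90)
print(sorted(pmf.items()))
```

The recursion needs on the order of n · (l + 2) · x_max state updates, so it remains practical for the sequence lengths considered in Section 3. Setting p01 = p11 = p1 = p gives the IID case.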

3. A NOTE FOR APPLICATION: STATISTICAL PROCESS CONTROL

Suppose we observe values of a stochastic process {Y_i}_i≥1 which operates under chance causes, i.e. the process is in statistical control, and whose values fall in or out of an acceptable zone of interest (ZI). We are interested in keeping the process in ZI, since otherwise a risky or awkward situation might arise. For instance, a ZI might be a control zone in a statistical control chart, a trading zone in a stock or exchange market, a comfort zone in a patient mechanical support system, a zone of acceptable items with specific characteristics in an assembly line, etc.

We denote by 0 and 1 the occurrence of a value of the process in and out of ZI, respectively. We assume that each value of the process is out of ZI with probability p₀₁ or p₁₁ if the previous value was in or out of ZI, respectively. Therefore, the indicator RVs X_i = 0, if Y_i ∈ ZI; 1, otherwise, constitute a 0 - 1 Markov chain {X_i}_i≥1 with transition probability matrix P = (p_i,j), where p₀₀ = 1 - p₀₁ and p₁₀ = 1 - p₁₁, and initial probability vector p⁽¹⁾ = (1 - p₁, p₁). In practice, ZI and p₀₁, p₁₁ might be evaluated from a reference sample in a Phase I (retrospective) analysis of the process, whereas the monitoring of the process, which is our interest here, is considered as a Phase II (prospective) analysis (see, e.g. Chakraborti et al. [15]).
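The indicator chain described above is easy to simulate, which can help when exploring how often a given (k, l) pattern shows up. The following is a minimal sketch; the function name `simulate_chain` and the optional `seed` argument are assumptions made for illustration and are not part of the article.

```python
import random


def simulate_chain(n, p1, p01, p11, seed=None):
    """Simulate X_1,...,X_n with P(X_1 = 1) = p1 and
    P(X_t = 1 | X_{t-1} = 0) = p01, P(X_t = 1 | X_{t-1} = 1) = p11."""
    rng = random.Random(seed)
    x = [1 if rng.random() < p1 else 0]
    for _ in range(n - 1):
        p_one = p11 if x[-1] == 1 else p01   # transition depends on the last value
        x.append(1 if rng.random() < p_one else 0)
    return x


# Example call with the parameter values used in the numerical example of this Section.
path = simulate_chain(25, p1=0.5, p01=0.45, p11=0.90, seed=1)
print("".join(map(str, path)))
```

Here 1 marks a value out of ZI and 0 a value in ZI, following the coding of the text.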

Following Dafnis and Philippou [16], we interpret the occurrence of two subsequent 1s separated by at least k and at most l 0s as a sign that the process is leaving its ZI and entering a risky situation which calls for additional care. In such a case the number of values in ZI is not enough to compensate for those out of it. The appropriate values of k and l depend on the process under study.

Accordingly, the event of having m (≥ 1) or more constrained (k, l) strings in a stream of n future values of the process could serve as an indication that the process will no longer stay within its ZI. The risk of experiencing such an event is measured by P(M_n;k,l ≥ m) = P(W_m;k,l ≤ n). Therefore, the distributions and the characteristics of the related RVs M_n;k,l and W_m;k,l are important for understanding the future behavior of the process. In practice, we usually consider a detector that counts such events and sends an alarm signal to a process monitoring system. The detector's tolerance (the false alarm probability) is taken to be at most equal to γ (0 < γ < 1), so that at most 100γ% of the alarms are expected to be false. For a given γ, a critical value m_γ of m (if such a value exists for the selected γ and the process parameters n, k, l and p₁, p₀₁, p₁₁) is m_γ = min{m ≥ 1: P(M_n;k,l ≥ m) ≤ γ}, and we set γ* = P(M_n;k,l ≥ m_γ). The probability γ* is the largest attainable tail probability P(M_n;k,l ≥ m), m ≥ 1, which does not exceed γ. It may not be equal (in fact it rarely is) to the assigned probability γ, because M_n;k,l is a discrete RV.
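The search for the critical value can be sketched in a few lines once a PMF of M_n;k,l is available. The function below is an illustration under stated assumptions: the PMF is passed as a dictionary mapping each count x to P(M_n;k,l = x), the name `critical_value` is hypothetical, and the toy PMF in the example is made up for demonstration and is not taken from the article. Because the tail P(M_n;k,l ≥ m) does not increase with m, the tail is accumulated from the largest count downwards until it first exceeds γ.

```python
def critical_value(pmf, gamma):
    """Return (m_gamma, gamma_star) with m_gamma = min{m >= 1: P(M >= m) <= gamma},
    or None when no m >= 1 satisfies the bound."""
    best = None
    tail = 0.0
    for m in range(max(pmf), 0, -1):   # m = x_max, x_max - 1, ..., 1
        tail += pmf.get(m, 0.0)        # tail is now P(M >= m)
        if tail <= gamma:
            best = (m, tail)           # still admissible, try a smaller m
        else:
            break                      # smaller m only make the tail larger
    return best


# Toy PMF, for demonstration only.
toy_pmf = {0: 0.50, 1: 0.30, 2: 0.17, 3: 0.03}
print(critical_value(toy_pmf, 0.05))   # m = 3 is the smallest m with tail <= 0.05
print(critical_value(toy_pmf, 0.01))   # None: even P(M >= 3) exceeds 0.01
```

Returning None mirrors the caveat in the text that a critical value may not exist for the selected γ and process parameters.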

As a numerical example we consider Markov dependent trials with p₁ = 0.5, p₀₁ = 0.45 and p₁₁ = 0.90. That is, we assume that the process has a 50-50 chance of being initially in or out of its ZI, and that a value out of ZI is followed by another value out of ZI with large probability, in fact twice the probability that the process leaves its ZI. The latter is assumed to be a little smaller than 0.5.

For a reasonable value of the detector’s tolerance γ = 0.05 and for large enough values of n, we present in Table 1 critical values m_γ, and exact probabilities γ* = P(M_n;k,l≥m_γ) for several constrained (k, l) strings of type at most (AM), at least (AL), at least - at most (AL-AM) and equal (EQ). For comparison we have included in the Table the respective m_γ and γ* for IID trials with p = 0.5.

Table 1. Critical values m_γ and exact probabilities γ* = P(M_n;k,l ≥ m_γ) for γ = 0.05.

Type	k	l	n	m_γ (MRKV)	γ* (MRKV)	m_γ (IID)	γ* (IID)
AM	0	2	50	47	0.032167	29	0.048855
AM	0	2	100	90	0.044433	54	0.049354
AM	0	2	200	175	0.040410	103	0.040483
AL-AM	1	2	50	6	0.038992	14	0.032755
AL-AM	1	2	100	10	0.039564	25	0.040928
AL-AM	1	2	200	17	0.048307	46	0.047713
AL-AM	1	3	50	7	0.023274	15	0.034682
AL-AM	1	3	100	11	0.046771	28	0.030000
AL-AM	1	3	200	20	0.032868	52	0.039427
AL	2	48	50	5	0.026426	9	0.032183
AL	2	98	100	8	0.037465	17	0.017336
AL	2	198	200	14	0.032873	31	0.024284
EQ	2	2	50	4	0.010796	7	0.016457
EQ	2	2	100	5	0.040582	11	0.028616
EQ	2	2	200	8	0.041990	19	0.029692

The entries of Table 1 suggest that, in order to have at most 5 false alarms in a total of 100 alarms in a stream of n future values of a process defined on a MRKV sequence with the aforementioned values of p₀₁, p₁₁ and p₁, we have to take a larger m_γ when we use a (0, 2) string than when we use a (2, 2) string. This is so because every (2, 2) string is also counted as a (0, 2) string; being the stricter pattern, a (2, 2) string therefore appears less frequently as a warning pattern than a (0, 2) string. Analogous interpretations apply to the other types of strings presented, as well as to comparisons between MRKV and IID trials.

CONCLUSION

In this study the RV M_n;k,l, which counts a flexible class of binary strings of a limited length, is considered. A potential application of its PMF in describing the probabilistic behavior of a binary stochastic process is discussed. The stochastic process is assumed to operate under chance causes in or out of an acceptable zone of interest. Such a zone can be a control chart zone, a trading zone of a stock market, etc. An index connected to the number of occurrences of limited length binary strings is defined via the PMF of M_n;k,l. It is then used to determine whether or not the process is under statistical control. Numerical results clarify the implementation of several types of binary strings used in the control process.

CONFLICT OF INTEREST

The authors confirm that the content of this article involves no conflict of interest.

ACKNOWLEDGEMENTS

The authors wish to thank the anonymous referee for the thorough reading and suggestions which helped to improve the paper.

REFERENCES

[1]	Makri FS, Psillakis ZM. Exact distributions of constrained (k, l) strings of failures between subsequent successes. Stat Papers 2013; 54: 783-806.
[2]	Sarkar A, Sen K, Anuradha. Waiting time distributions of runs in higher order Markov chains. Ann Inst Stat Math 2004; 56: 317-49.
[3]	Sen K, Goyal B. Distributions of patterns of two failures separated by success runs of length k. J Korean Stat Soc 2004; 33: 35-58.
[4]	Holst L. Counts of failure strings in certain Bernoulli sequences. J Appl Probab 2007; 44: 824-30.
[5]	Holst L. The number of two consecutive successes in a Hoppe-Polya urn. J Appl Probab 2008; 45: 901-6.
[6]	Huffer FW, Sethuraman J, Sethuraman S. A study of counts of Bernoulli strings via conditional Poisson processes. Proc Am Math Soc 2009; 137: 2125-34.
[7]	Eryilmaz S, Zuo M. Constrained (k, d)-out-of-n systems. Int J Syst Sci 2010; 41: 679-85.
[8]	Eryilmaz S, Yalcin F. On the mean and extreme distances between failures in Markovian binary sequences. J Comput Appl Math 2011; 236: 1502-10.
[9]	Dafnis SD, Philippou AN, Antzoulakos DL. Distributions of patterns of two successes separated by a string of k-2 failures. Stat Papers 2012; 53: 323-44.
[10]	Makri FS, Psillakis ZM. Counting certain binary strings. J Stat Plan Inference 2012; 142: 908-24.
[11]	Ling KD. On binomial distributions of order k. Stat Probab Lett 1988; 6: 247-50.
[12]	Balakrishnan N, Koutras MV. Runs and scans with applications. New York: Wiley 2002.
[13]	Makri FS, Philippou AN, Psillakis ZM. Polya, inverse Polya, and circular Polya distributions of order k for l-overlapping success runs. Commun Stat Theory Methods 2007; 36: 657-68.
[14]	Makri FS, Philippou AN, Psillakis ZM. Success run statistics defined on an urn model. Adv Appl Probab 2007; 39: 991-1019.
[15]	Chakraborti S, Eryilmaz S, Human SW. A phase II nonparametric control chart based on precedence statistics with runs-type signaling rules. Comput Stat Data Anal 2009; 53: 1054-65.
[16]	Dafnis SD, Philippou AN. Distributions of patterns with applications in engineering. IAENG Int J Appl Math 2011; 41: 68-75.

About the Journal

The Open Mathematics, Statistics and Probability Journal
