Comparing Different Information Levels

Given a sequence of random variables ${\bf X}=X_1,X_2,\ldots$ suppose the aim is to maximize one's return by picking a `favorable' $X_i$. Obviously, the expected payoff crucially depends on the information at hand. An optimally informed person knows all the values $X_i=x_i$ and thus receives $E (\sup X_i)$. We will compare this return to the expected payoffs of a number of observers having less information, in particular $\sup_i (EX_i)$, the value of the sequence to a person who only knows the first moments of the random variables. In general, there is a stochastic environment (i.e. a class of random variables $\cal C$), and several levels of information. Given some ${\bf X} \in {\cal C}$, an observer possessing information $j$ obtains $r_j({\bf X})$. We are going to study `information sets' of the form $$ R_{\cal C}^{j,k} = \{ (x,y) | x = r_j({\bf X}), y=r_k({\bf X}), {\bf X} \in {\cal C} \}, $$ characterizing the advantage of $k$ relative to $j$. Since such a set measures the additional payoff by virtue of increased information, its analysis yields a number of interesting results, in particular `prophet-type' inequalities.


Several Information Levels
Suppose there is a sequence of bounded random variables ${\bf X} = X_1, X_2, \ldots$ and the aim is to maximize one's return by picking a `favorable' $X_i$. The first aim of this contribution is to study observers with different kinds of information: Suppose an observer knows all the realizations of the random variables and may thus choose the largest one. His expected return is therefore $$ m = M({\bf X}) = E(\sup_i X_i), $$ which is called the value to a prophet. Since the prophet always picks the largest realization, his value $m$ is a natural upper bound, given a sequence ${\bf X}$.
Traditionally, $m$ has been compared to the value obtained by a statistician who observes the process sequentially. This gambler, studied in detail in Chow, Robbins & Sigmund (1971), relies on stopping rules $T \in {\cal T}$, which have to be measurable with respect to the $\sigma$-field of past events. Behaving optimally, the statistician may thus receive $$ v = V({\bf X}) = \sup_{T \in {\cal T}} EX_T. $$ If there is a finite horizon $n$, one defines $v = \sup_{T \in {\cal T}, T \le n} EX_T$ and $m = E(\max_{1 \le i \le n} X_i)$.
To avoid trivialities, we assume n ≥ 2 throughout this article.
A minimally informed gambler has to make his choice knowing only the random variables' expected values. Behaving optimally, he gets $$ u = U({\bf X}) = \sup_i EX_i, $$ an amount that is entirely due to his (weak) prior information, and a straightforward counterpart to $E(\sup_i X_i)$.
One might think that a person who knows the common distribution $L(X_1, \ldots, X_n)$ (but none of the observations) should receive a larger payoff. However, no matter how this gambler makes up his mind, at the end of the day he has to choose an index $i \in \{1, \ldots, n\}$, and thus his expected reward will be largest if he picks an $i$ with $EX_i = u$. Thus, although he knows much more than the minimally informed gambler, his superior knowledge does not pay off.
In other words, it is the observations that make a difference. Suppose a person knows the dependence structure among the random variables and some of the observations, w.l.o.g. $x_1, \ldots, x_j$. Notice that there is no sequential unfolding of information; however, this partially informed gambler may use the values known to him to update his knowledge of the variables not observed, i.e. he may refer to conditional expectations. Thus he obtains $\max[x_1, \ldots, x_j, E(X_{j+1}|x_1, \ldots, x_j), \ldots, E(X_n|x_1, \ldots, x_j)]$, and his expected return is $$ w = W({\bf X}) = E\bigl(\max(X_1, \ldots, X_j, E(X_{j+1}|X_1, \ldots, X_j), \ldots, E(X_n|X_1, \ldots, X_j))\bigr). $$
This observer can be reduced to a classical situation as follows: Given $x_1, \ldots, x_j$, he will only consider the largest of these values; and the same with $E(X_{j+1}|x_1, \ldots, x_j), \ldots, E(X_n|x_1, \ldots, x_j)$. Thus w.l.o.g. it suffices to compare $\max(x_1, \ldots, x_j)$ and $\max(E(X_{j+1}|x_1, \ldots, x_j), \ldots, E(X_n|x_1, \ldots, x_j))$, which is tantamount to the comparison of $v$ and $m$ if $n = 2$ and if arbitrary dependencies are allowed. In this situation the statistician behaves optimally if he chooses $x_1$ whenever $x_1 \ge E(X_2|x_1)$. Thus, the set of all possible values here is given by the corresponding prophet region for $n = 2$.

Stochastic environments (classes of random variables)
For some fixed X, the difference between two observers with different amounts of information can be nonexistent or arbitrarily large. In order to quantify the "value" of information it is thus necessary to shift attention to some class of random variables C, where M(X) is finite (and nonnegative) for all X ∈ C. It is then natural to consider the worst case scenarios. Traditionally these have been called prophet inequalities M(X) − V (X) ≤ a and M(X)/V (X) ≤ b with smallest possible constants a and b that hold for all X ∈ C.
Such stochastic inequalities follow easily from the more fundamental prophet region, that is, $$ R_{\cal C} = \{ (x,y) \,|\, x = V({\bf X}),\ y = M({\bf X}),\ {\bf X} \in {\cal C} \}, $$ where $f_{\cal C}(x) = \sup\{ y \,|\, (x,y) \in R_{\cal C} \}$ is called the upper boundary function corresponding to ${\cal C}$. Since it is only the latter set that gives a complete description of some informational advantage, it is more fundamental and should be considered in its own right.
In general, an information set characterizes the environment C, evaluated with the help of two particular levels of information. One could also prioritise the information edge and say that the difference between two levels of information (e.g. minimal vs. sequential) is studied in a certain environment. It is the second major aim of this article to illustrate a number of possible applications of these ideas.

Minimal versus maximum information
In this section we systematically compare $u$ and $m$. That is, we are going to derive the corresponding information sets (called prophet regions since $m$ is involved) in two standard random environments: ${\cal C}(I,n)$, the class of all sequences of independent, $[0,1]$-valued random variables with horizon $n$; and ${\cal C}(G,n)$, the class of all sequences of $[0,1]$-valued random variables with horizon $n$.
Theorem 1 (independent environment). Let ${\bf X} = (X_1, \ldots, X_n) \in {\cal C}(I,n)$, $U({\bf X}) = \max_i EX_i$ and $M({\bf X}) = E(\max_i X_i)$. Then the prophet region $\{(x,y) \,|\, x = U({\bf X}),\ y = M({\bf X}),\ {\bf X} \in {\cal C}(I,n)\}$ is precisely the set $$ \{ (x,y) \,|\, 0 \le x \le 1,\ x \le y \le f_n(x) \}, \qquad f_n(x) = 1 - (1-x)^n. $$

Theorem 2 (general environment). Let ${\bf X} = (X_1, \ldots, X_n) \in {\cal C}(G,n)$, $U({\bf X}) = \max_i EX_i$, and $M({\bf X}) = E(\max_i X_i)$. Then the upper boundary function $h_n$ of the prophet region is $$ h_n(x) = \min(nx, 1). $$

Proof of Theorem 1: Without loss of generality let $x = EX_1 \ge \max_{2 \le i \le n} EX_i$. Hill and Kertz (1981: Lemma 2.2) prove that ${\bf X}$ can be replaced by a `dilated' vector ${\bf Y}$ of Bernoulli random variables $Y_1, \ldots, Y_n$ such that $EX_i = EY_i$, $1 \le i \le n$, and $M({\bf X}) \le M({\bf Y})$. Replacing ${\bf Y}$ by a vector of iid Bernoulli random variables ${\bf Z} = (Z_1, \ldots, Z_n)$ such that $EZ_i = x$, $1 \le i \le n$, does not improve the value to the gambler, i.e. $U({\bf X}) = U({\bf Y}) = U({\bf Z}) = x$; however, $M({\bf Y}) \le M({\bf Z}) = 1 - (1-x)^n$. Since any ${\bf X} \in {\cal C}(I,n)$ can be replaced by a vector ${\bf Z}$ of iid Bernoulli random variables without changing the value to the gambler, $f_n(x)$ is the upper boundary function. Defining the independent random variables $Z'_1, \ldots, Z'_n$ by means of $P(Z'_1 = 1) = x = 1 - P(Z'_1 = 0)$ and $P(Z'_i = \lambda) = x = 1 - P(Z'_i = 0)$ for $i = 2, \ldots, n$ and $0 \le \lambda \le 1$ proves that all points between $(x, x)$ and $(x, 1 - (1-x)^n)$ also belong to the region. ♦

Notice that for every fixed $x > 0$, $f_n(x) \uparrow 1$ as $n \to \infty$. Inspecting $f_n(x)/x$ and $f_n(x) - x$ immediately yields:

Corollary 1. The prophet inequalities corresponding to ${\cal C}(I,n)$ are $$ M({\bf X})/U({\bf X}) < n \quad \text{and} \quad M({\bf X}) - U({\bf X}) \le n^{-1/(n-1)} - n^{-n/(n-1)}. $$ In the latter case, ${\bf Z} = (Z_1, \ldots, Z_n) \in {\cal C}(I,n)$ attains equality if the $Z_i$ are iid Bernoulli random variables such that $U({\bf Z}) = EZ_i = P(Z_i = 1) = 1 - n^{-1/(n-1)}$.
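The boundary of Theorem 1 can be checked by elementary computation; the following sketch (an illustration, not part of the original proof) evaluates $M({\bf Z}) = E(\max_i Z_i)$ for iid Bernoulli($x$) variables by exact enumeration and compares it to $f_n(x) = 1 - (1-x)^n$.

```python
# Sketch: n iid Bernoulli(x) variables attain the boundary of Theorem 1,
# i.e. E(max Z_i) = 1 - (1-x)^n while max_i E(Z_i) = x.
from itertools import product

def value_to_prophet(n, x):
    """E(max Z_i) for n iid Bernoulli(x), by exact enumeration."""
    total = 0.0
    for outcome in product((0, 1), repeat=n):
        p = 1.0
        for z in outcome:
            p *= x if z == 1 else 1.0 - x
        total += p * max(outcome)
    return total

n, x = 4, 0.3
print(value_to_prophet(n, x), 1 - (1 - x) ** n)  # both 1 - 0.7^4 = 0.7599
```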

Proof of Theorem 2:
Denote by $e_i$ the $i$-th canonical unit vector. First consider the random vector ${\bf Z} = (Z_1, \ldots, Z_n)$ having the distribution $P({\bf Z} = e_1) = \ldots = P({\bf Z} = e_n) = 1/n$. A minimally informed person picks any of the random variables $Z_i$, which is $1$ with probability $1/n$, and obtains $U({\bf Z}) = 1/n$. Since there is always exactly one $i$ such that $Z_i = 1$, whereas all the other random variables are zero, $M({\bf Z}) = E(\max_i Z_i) = 1$. To get $U({\bf Z}) = x \ge 1/n$, let $P({\bf Z} = e_1) = x$ and distribute the remaining probability equally among the other canonical unit vectors, i.e. $P({\bf Z} = e_2) = \ldots = P({\bf Z} = e_n) = (1-x)/(n-1) \le x$. Thus the minimally informed gambler may always pick the first random variable, giving him $U({\bf Z}) = EZ_1 = x$, and for the same reasons as before $E(\max_i Z_i) = 1$. Replacing $e_i$ by $\lambda e_i$, where $0 \le \lambda \le 1$ and $i = 2, \ldots, n$, does not change the value to the gambler, but the value to the prophet decreases towards $x$ as $\lambda \downarrow 0$.
In the case of equality choose any component (e.g. the first) where the maximum is attained. By construction, at most one component of ${\bf Y}(\omega)$ is larger than zero. Thus $\max_i Y_i = \sum_{i=1}^n Y_i$, and hence $$ M({\bf Y}) = \sum_{i=1}^n EY_i \le n \max_i EY_i = n\, U({\bf Y}). $$ In the previous line equality is achieved if all expected values agree. Defining the distribution of ${\bf Z} = (Z_1, \ldots, Z_n)$ via $P({\bf Z} = e_1) = \ldots = P({\bf Z} = e_n) = y \le x < 1/n$ and $P({\bf Z} = {\bf 0}) = 1 - ny$ immediately yields $U({\bf Z}) = y$ and $M({\bf Z}) = ny$. Since $y$ may assume any value in the interval $[0, 1/n)$, we have shown that $h_n(x) = nx$ is the upper boundary function if $x < 1/n$. A similar construction as before shows that all points between $(x, x)$ and $(x, nx)$ belong to the prophet region. ♦

An immediate consequence of the last theorem is:

Corollary 2. The prophet inequalities corresponding to ${\cal C}(G,n)$ are $M({\bf X})/U({\bf X}) \le n$ and $M({\bf X}) - U({\bf X}) \le 1 - 1/n$. In the latter case equality is attained by $P({\bf Z} = e_i) = 1/n$, $i = 1, \ldots, n$, where $e_i$ denotes the $i$-th canonical unit vector.
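The extremal distributions used in this proof can likewise be evaluated exactly; the sketch below (an illustration, not from the paper) encodes the distribution $P({\bf Z} = e_i) = y$, $P({\bf Z} = {\bf 0}) = 1 - ny$ and confirms $U({\bf Z}) = y$ and $M({\bf Z}) = ny$.

```python
# Sketch: exact evaluation of the discrete construction in the proof of
# Theorem 2: Z takes the unit vectors e_1..e_n with probability y each
# and the zero vector with probability 1 - n*y.
def u_and_m(n, y):
    # explicit support of the distribution: (vector, probability) pairs
    support = [([1 if j == i else 0 for j in range(n)], y) for i in range(n)]
    support.append(([0] * n, 1 - n * y))
    u = max(sum(p * vec[i] for vec, p in support) for i in range(n))  # max_i E Z_i
    m = sum(p * max(vec) for vec, p in support)                       # E(max_i Z_i)
    return u, m

print(u_and_m(5, 0.1))  # U = 0.1, M = 5 * 0.1 = 0.5
```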
Remark. Although we focus on the prophet, other comparisons, in particular involving the statistician, would be interesting too. Comparing u and v, for example, reveals the difference between prior information on the one hand and additionally acquired information (sequential observations) on the other.

Applying information sets
In this section we restrict attention to classical prophet-statistician comparisons (v vs. m). However, the same kind of systematic analysis can be performed on any random environment and observers with different levels of information. An example will be given in the last section, where we will compare u and m.

Some well-known results
To illustrate how information sets may be used, we first collect a number of well-known results. To this end we introduce further random environments: ${\cal C}_{iid}$, the class of all sequences of iid, $[0,1]$-valued random variables; ${\cal C}_I$, the class of all sequences of independent, $[0,1]$-valued random variables; ${\cal C}_G$, the class of all sequences of $[0,1]$-valued random variables; and their corresponding counterparts with finite horizon, i.e. ${\cal C}^n_{iid}$, ${\cal C}^n_I = {\cal C}(I,n)$ and ${\cal C}^n_G = {\cal C}(G,n)$. Closely related are `discounted' random variables and random variables $X_1, \ldots, X_n$ with `increasing bounds', i.e. $a_i \le X_i \le b_i$ with nondecreasing sequences $(a_i)$ and $(b_i)$. In both cases it suffices to study $n = 2$, i.e. $X_1 = \alpha Y_1$, $X_2 = Y_2$ (increasing bounds) and $X_1 = Y_1$, $X_2 = \beta Y_2$ (discounting), where $\alpha, \beta \in [0,1]$ and $(Y_1, Y_2) \in {\cal C}^2_I$. The following table collects a number of well-known `prophet' results, i.e. systematic comparisons of $v$ and $m$ (see Hill and Kertz (1983), Hill (1983), Kertz (1986), Boshuizen (1991), and Saint-Mont (1998)):

Random environment -- upper boundary function: for ${\cal C}_I$, $f_I(x) = 2x - x^2$; for ${\cal C}_G$, $f_G(x) = x - x \ln x$. [The remaining rows of the table could not be recovered.]
In general, the difficult part consists in finding an upper boundary function, yet it is easy to show that all pairs $(x,y)$ with $x \le y < f_{\cal C}(x)$ belong to the corresponding prophet region. Moreover, prophet inequalities follow straightforwardly from prophet regions. As an example, look at $R_I$: Since $f_I(x) - x = x(1-x) \le 1/4$ and $f_I(x)/x = 2 - x < 2$, we obtain $M({\bf X}) - V({\bf X}) \le 1/4$ and $M({\bf X}) < 2V({\bf X})$ for all ${\bf X} \in {\cal C}_I$.

Graphical comparisons
What can be learned from this upon comparing two gamblers with different information levels? For every fixed horizon $n$, we have $R^n_{iid} \subseteq R^n_I \subseteq R^n_G$. It also turns out that $R^n_{iid} \subset R^m_{iid}$ and $R^n_G \subset R^m_G$ whenever $n < m$. Thus the longer the horizon or the more general the environment, the better the outcome for the prophet (or the better informed person in general). On the other hand, restrictions of any kind, in particular on the range of the random variables, make the corresponding prophet (or information) region smaller. For example, $R^2_\alpha$ and $R^2_\beta$ must be subsets of $R_I$. The following illustration combines the results achieved so far. Since $R^{v,m}_{\cal C} \subseteq R^{u,m}_{\cal C}$ for any environment, we must have $g_4 \le h_4$ and $f_I \le f_4$. In the case $n = 2$ the functions $f_I$ and $f_2$ agree. This is no coincidence, since $X_1 \equiv x$ together with $x = P(X_2 = 1) = 1 - P(X_2 = 0)$ is the (standard) worst-case scenario for the statistician, and $x = P(X_i = 1) = 1 - P(X_i = 0)$, $i = 1, 2$, is the worst-case scenario for the minimally informed gambler considered above. In both scenarios their values agree (e.g. they may both choose the second random variable), giving the prophet a maximum advantage of $x(1-x)$.

The overall information difference
The diagonal $y = x$ collects all situations where the information edge of a better informed person does not result in a larger payoff. Thus, a degenerate prophet region indicates that, given a stochastic environment, the information lead of the prophet never pays off. Yet the further some upper boundary function is away from the identity function, the larger the better informed gambler's overall advantage. A natural measure of this advantage is the area between these functions, i.e. the integral $$ q_{\cal C} = \int_0^1 (f_{\cal C}(x) - x)\,dx. $$ Given ${\cal C}_I$, the prophet's advantage is $q_I = \int_0^1 x(1-x)\,dx = 1/6$. In the discounted environment, after some algebra, we obtain $q(\beta)$.
In the "increasing bounds" environment, after a little algebra, we obtain $\tilde q(\alpha)$. Note that $\tilde q(1) = q_I = 1/6$, and l'Hôpital's rule gives $\lim_{\alpha \downarrow 0} \tilde q(\alpha) = 0$. Moreover, $\tilde q(\alpha)$ is a concave function.

Illustration 2. $\alpha$ and $\beta$ are shown on the x-axis. The functions on the unit interval, from the top down, are the constant $q_I = 1/6$, $\tilde q(\alpha)$ and $q(\beta)$. The vertical and the horizontal lines will be explained in Section 3.5.
However, given ${\cal C}_G$, $q_I$ is augmented to $$ q_G = \int_0^1 (f_G(x) - x)\,dx = \int_0^1 (-x \ln x)\,dx = 1/4. $$
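Both areas can be confirmed by elementary numerical integration; the sketch below uses a plain midpoint rule and the boundaries $f_I(x) = 2x - x^2$ and $f_G(x) = x - x\ln x$ used throughout this section (the value $1/4$ for ${\cal C}_G$ is our computation from that boundary).

```python
# Sketch: numerically confirm q_I = ∫ x(1-x) dx = 1/6 and
# q_G = ∫ -x ln(x) dx = 1/4 with a midpoint rule (no external libraries).
import math

def midpoint_integral(f, steps=100_000):
    h = 1.0 / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) * h

q_I = midpoint_integral(lambda x: x * (1 - x))
q_G = midpoint_integral(lambda x: -x * math.log(x))
print(q_I, q_G)  # ≈ 0.16667, 0.25
```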

Inverse problems
Given a stochastic environment ${\cal C}$, and according to the above derivation, the standard interpretation of a prophet inequality is to look for a value to the statistician $x_0 = V({\bf X})$ such that the difference $f_{\cal C}(x) - x$ is maximized. In the same vein one may look for a value $y_0$ on the y-axis where the difference between the upper boundary and the identity function is greatest. In the independent case this amounts to inverting $f_I(x) = 2x - x^2$, which yields $f_I^{-1}(y) = 1 - \sqrt{1-y}$. Maximizing $y - (1 - \sqrt{1-y})$ gives $1/4$, which is obtained for $y_0 = 3/4$. Why do both perspectives agree with respect to the maximum difference? The reason is that the statement $M({\bf X}) - V({\bf X}) \le 1/4$ holds for all ${\bf X} \in {\cal C}_I$, and thus is a property of the stochastic environment (and the two levels of information considered). The pair $(1/2, 3/4) \in R_I$ is a point in two-dimensional space, attained by certain extremal sequences ${\bf X}^*$. Thus, no matter how we choose to look at some region $R_{\cal C}$, the corresponding prophet inequalities must hold.
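A small grid search (an illustrative sketch, not from the paper) confirms the maximizer $y_0 = 3/4$ with maximal difference $1/4$:

```python
# Sketch: maximize y - f_I^{-1}(y) = y - (1 - sqrt(1-y)) on a fine grid.
import math

diff = lambda y: y - (1 - math.sqrt(1 - y))
ys = [k / 100_000 for k in range(100_001)]
y0 = max(ys, key=diff)
print(y0, diff(y0))  # ≈ 0.75, 0.25
```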
However, the analytic considerations involving the inverse of the upper boundary function may be quite different. In the discounted case, $f_\beta^{-1}(y) = \beta\,(1 - \sqrt{1 - y/\beta})$ below the breakpoint $y(\beta)$ of the boundary; otherwise, it is easily seen that $f_\beta^{-1}$ is a linear function of $y$ with $f_\beta^{-1}(1) = 1$. The maximum of the function $y - \beta(1 - \sqrt{1 - y/\beta})$ occurs at the point $y = 3\beta/4$ and is $\beta/4$. Notice that $3\beta/4 < y(\beta)$ for all $\beta > 0$. Due to continuity of $f_\beta^{-1}$, this yields $\beta/4$ as the overall maximum of the difference, always occurring at $y = 3\beta/4$. Traditionally, one would have said that the maximum difference of $\beta/4$ occurs at $x = \beta/2$.

Comparing stochastic environments
Switching stochastic environments amounts to a systematic comparison of the associated regions. In particular, if $A$ is less general than $B$, we have $R_A \subseteq R_B$. Obviously, it suffices to consider the upper boundary functions $f_A, f_B$ of the two environments involved. Traditionally, one would only determine $\sup_x (f_B(x) - f_A(x))$. However, the inverse problem $\sup_y (f_A^{-1}(y) - f_B^{-1}(y))$ and the area $\int_0^1 (f_B(x) - f_A(x))\,dx$ are also natural measures of discrepancy.
To illustrate the above, let us compare ${\cal C}_I$ and ${\cal C}_G$: the difference $d(x) = f_G(x) - f_I(x) = x(x - \ln x - 1)$ is maximized where $d'(x) = 2x - \ln x - 2 = 0$, which has the explicit solution $x_0 = -W_0(-2/e^2)/2 \approx 0.406376/2$, where $W_0$ is the principal (upper) real branch of the Lambert $W$ function (see Corless, Gonnet, Hare & Jeffrey 1996: 331). The point $(x_0, d(x_0)) \approx (0.2, 0.162)$ may be interpreted as follows: For every value $x$ to the statistician, $f_I(x)$ is the best a prophet can obtain in the independent environment ${\cal C}_I$, and he can get arbitrarily close to $f_G(x)$ if he is confronted with the general environment ${\cal C}_G$. Given $x$, the difference $f_G(x) - f_I(x)$ reflects the additional gain (almost) obtainable by the prophet when moving from ${\cal C}_I$ to ${\cal C}_G$, i.e. from the restricted to the more general situation. The additional sequences of random variables provide him with an additional reward of $d(x) = x(x - \ln x - 1)$, which is maximized if $x = -W_0(-2/e^2)/2$, yielding $\approx 0.162$ as the additional payoff.
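The stationary point can be reproduced without a Lambert W routine; the following sketch solves $d'(x) = 2x - \ln x - 2 = 0$ by bisection (the root equals $-W_0(-2/e^2)/2$):

```python
# Sketch: locate the maximum of d(x) = f_G(x) - f_I(x) = x(x - ln x - 1)
# by bisecting d'(x) = 2x - ln x - 2, which is strictly decreasing here.
import math

def dprime(x):
    return 2 * x - math.log(x) - 2

lo, hi = 0.05, 0.5            # d'(0.05) > 0 > d'(0.5)
for _ in range(100):
    mid = (lo + hi) / 2
    if dprime(mid) > 0:
        lo = mid
    else:
        hi = mid
x0 = (lo + hi) / 2
d0 = x0 * (x0 - math.log(x0) - 1)
print(x0, d0)  # ≈ 0.2032, 0.1619
```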
Third, starting with the prophet, the difference to be considered is $\delta(y) = f_I^{-1}(y) - f_G^{-1}(y) = 1 - \sqrt{1-y} - \exp(1 + W_{-1}(-y/e))$. Thus, conditional on $y$, the statistician may (almost) lose this amount when the stochastic environment switches from independent to arbitrary sequences of random variables. Determining the value $y_0$ where $\delta(y)$ is greatest means looking for a constellation where the loss occurring to the statistician is most pronounced when moving from ${\cal C}_I$ to ${\cal C}_G$. Now $\delta'(y) = 0$ is equivalent to finding the unique root of an equation $L(y) = R(y)$. As functions of $y$, both the left hand side $L$ and the right hand side $R$ are twice differentiable. On the unit interval $L(y)$ is convex, strictly increasing, $L(0) = -2$, and $L(1) = 0$; $R(y)$ is concave, strictly increasing, $\lim_{y \downarrow 0} R(y) = -\infty$, and $R(1) = 0$. Numerically, this yields the solution $(y_0, \delta(y_0)) \approx (0.70, 0.119)$. Thus, in the worst case, the statistician loses about $0.119$, which is considerably less than the prophet can hope to obtain when the environment extends from ${\cal C}_I$ to ${\cal C}_G$.
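The numerical solution $(y_0, \delta(y_0)) \approx (0.70, 0.119)$ can be reproduced without special functions by inverting $f_G(x) = x - x\ln x$ with bisection (a sketch; the helper names are ad hoc):

```python
# Sketch: grid search for the maximum of
# delta(y) = 1 - sqrt(1-y) - f_G^{-1}(y), where f_G^{-1}(y), analytically
# exp(1 + W_{-1}(-y/e)), is computed by bisecting f_G(x) = x - x ln x.
import math

def f_G_inverse(y, iters=80):
    lo, hi = 1e-12, 1.0       # f_G increases from 0 to 1 on (0, 1]
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid - mid * math.log(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def delta(y):
    return 1 - math.sqrt(1 - y) - f_G_inverse(y)

ys = [k / 2000 for k in range(1, 2000)]
y0 = max(ys, key=delta)
print(y0, delta(y0))
```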
The next illustration summarizes these results. A different kind of analysis may be explicated using the regions $R^2_\alpha$ and $R^2_\beta$: Illustration 2 points out that restricting the range of the second random variable ($\beta$-discounting) always produces a smaller region than restricting the range of the first random variable by the same amount $\alpha$. The largest difference between the sizes of the regions occurs if $\alpha = \beta \approx 0.45$ and is approximately $0.077$. On the other hand, suppose that the areas of $R_\alpha$ and $R_\beta$ agree. This is tantamount to fixing a point on the y-axis. In this case the largest difference between the parameter values occurs if the area covered by each of the regions is about $1/8$. There $\alpha \approx 0.38$ and $\beta \approx 0.83$; thus the largest difference between the parameter values is approximately $0.452$.
Of course, analyses along the same lines can be carried out for other regions, e.g., R I and R n iid , R n iid and R n+1 iid , R n G and R n+1 G , or R n G and R G .

Typical differences and ratios
Classical prophet inequalities are 'worst case' scenarios. They refer to the maximum advantage of the prophet over the statistician. Additionally, it is straightforward to ask for a 'typical' advantage, in particular a 'typical' difference or ratio. To do so, one would have to define a probability measure on some environment C. Since the classes of random variables considered are rather large, it is by no means clear how to do so in a natural way. However, starting with a stochastic environment and two distinguished levels of information, it is natural to consider uniform measure on the corresponding prophet region R C .
Given the independent environment, the size of $R_I$ is $1/6$. Thus, under the uniform distribution on $R_I$, we obtain as the typical difference between $M({\bf X})$ and $V({\bf X})$ $$ 6 \int_0^1 \!\! \int_x^{2x-x^2} (y - x)\,dy\,dx = 3 \int_0^1 x^2(1-x)^2\,dx = 1/10. $$ Moreover, one may ask about the probability that a typical difference or ratio exceeds a certain bound. The ratio $y/x = c \Leftrightarrow y = cx$ is a straight line through the origin, so, given ${\cal C}_I$, the question amounts to calculating $$ 6 \int_0^t (2x - x^2 - cx)\,dx = (2-c)^3, $$ where $t = 2 - c \ge 0$ is determined by the equation $cx = y = 2x - x^2$, and $1 \le c \le 2$. Given ${\cal C}_G$, we obtain $$ 4 \int_0^t (x - x\ln x - cx)\,dx = e^{2(1-c)}, $$ where $t$ is determined by the equation $cx = x - x\ln x \Leftrightarrow t = \exp(1-c)$, and $c \ge 1$.
In the case of the difference we are interested in the probability that it exceeds a certain bound $d \ge 0$. Again, consider ${\cal C}_I$ first. Since $y - x = d \Leftrightarrow y = x + d$, we have to calculate $$ 6 \int_s^t (2x - x^2 - x - d)\,dx = (1 - 4d)^{3/2}. $$ Here, $0 \le d \le 1/4$, and $s$ and $t$ are determined by the roots of the equation $x + d = 2x - x^2$ in the unit interval, that is $s = 1/2 - \sqrt{1-4d}/2$ and $t = 1/2 + \sqrt{1-4d}/2$.
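Both quantities for ${\cal C}_I$ (the typical difference $1/10$ and the tail probability $(1-4d)^{3/2}$, elementary consequences of the root representation above; the closed forms are our computation) can be cross-checked with a midpoint rule:

```python
# Sketch: typical difference and tail probability under the uniform
# distribution on R_I with boundary f_I(x) = 2x - x^2 (area 1/6).
steps = 200_000
h = 1.0 / steps
f_I = lambda x: 2 * x - x * x

typical = 6 * sum((f_I(x) - x) ** 2 / 2
                  for x in ((k + 0.5) * h for k in range(steps))) * h
d = 0.1
tail = 6 * sum(max(f_I(x) - x - d, 0.0)
               for x in ((k + 0.5) * h for k in range(steps))) * h
print(typical, tail, (1 - 4 * d) ** 1.5)
```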
Finally, given ${\cal C}_G$, we obtain, with $0 \le d \le 1/e$, $$ 4 \int_s^t (-x\ln x - d)\,dx, $$ where $s$ and $t$ are determined by the roots of the equation $d = -x\ln x$ in the unit interval. Some algebra is needed to get $s = \exp(W_{-1}(-d))$ and $t = \exp(W_0(-d))$. The subsequent integration results in $$ t^2 - s^2 - 2d(t - s). $$
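The roots $s = \exp(W_{-1}(-d))$ and $t = \exp(W_0(-d))$ and the closed form can be cross-checked numerically without special functions (a sketch using bisection; the closed form is our integration):

```python
# Sketch: roots of -x ln x = d via bisection, then compare the closed form
# t^2 - s^2 - 2d(t - s) with direct numerical integration of 4*(-x ln x - d).
import math

def root(f, lo, hi, iters=80):
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

d = 0.2                              # any 0 <= d <= 1/e works
g = lambda x: -x * math.log(x) - d
s = root(g, 1e-12, 1 / math.e)       # lower root: exp(W_{-1}(-d))
t = root(g, 1 / math.e, 1.0)         # upper root: exp(W_0(-d))

closed = t ** 2 - s ** 2 - 2 * d * (t - s)
steps = 200_000
h = (t - s) / steps
direct = 4 * sum(g(s + (k + 0.5) * h) for k in range(steps)) * h
print(s, t, closed, direct)
```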

A systematic study
In the following we are going to apply the `program' outlined in the last section to $u$ and $m$, using the independent and the general stochastic environments.

The overall information difference. Let us first compute the areas of $R^{u,m}_{{\cal C}(I,n)}$ and $R^{u,m}_{{\cal C}(G,n)}$: $$ \int_0^1 (f_n(x) - x)\,dx = \frac{n-1}{2(n+1)}, \qquad \int_0^1 (h_n(x) - x)\,dx = \frac{n-1}{2n}. $$ Thus, their overall information distance is the size of the set $R^{u,m}_{{\cal C}(G,n)} \setminus R^{u,m}_{{\cal C}(I,n)}$, $$ \frac{n-1}{2n} - \frac{n-1}{2(n+1)} = \frac{n-1}{2n(n+1)}. $$

Inverse problems. In the independent case, $f_n^{-1}(y) = 1 - \sqrt[n]{1-y}$ is the inverse function. The maximum of $y - (1 - \sqrt[n]{1-y})$ is attained for $y_0 = 1 - n^{-n/(n-1)}$ and equals $n^{-1/(n-1)} - n^{-n/(n-1)}$. In the general case, the inverse function is $h_n^{-1}(y) = y/n$. Thus, the maximum of $y - y/n$ is attained at $y_0 = 1$, giving a maximum difference of $1 - 1/n$.
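Both area formulas (our computation from the boundaries of Theorems 1 and 2) can be verified numerically:

```python
# Sketch: midpoint-rule check of ∫(f_n - x) dx = (n-1)/(2(n+1)) and
# ∫(h_n - x) dx = (n-1)/(2n) for several horizons n.
def area(boundary, steps=200_000):
    h = 1.0 / steps
    return sum(boundary((k + 0.5) * h) - (k + 0.5) * h for k in range(steps)) * h

for n in (2, 3, 5, 10):
    a_I = area(lambda x: 1 - (1 - x) ** n)     # f_n(x) = 1 - (1-x)^n
    a_G = area(lambda x: min(n * x, 1.0))      # h_n(x) = min(nx, 1)
    print(n, a_I, (n - 1) / (2 * (n + 1)), a_G, (n - 1) / (2 * n))
```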
Typical differences and ratios. Thus, the typical difference $d_I$ and ratio $r_I$ in the independent situation are $$ d_I = \frac{2(n-1)(n+1)}{3(n+2)(2n+1)} \to 1/3, \qquad r_I = \frac{(n+1)\,(2H_n - H_{2n} - 1/2)}{n-1}, $$ where $H_k$ denotes the $k$-th harmonic number. For ${\cal C}^n_G$, analogous integrations yield the typical difference $d_G$ and ratio $r_G$ in the general environment, $$ d_G = \frac{n-1}{3n} \to 1/3, \qquad r_G = \frac{n \ln n}{n-1} \to \infty. $$

Probabilities that a typical difference or ratio exceeds a certain bound. For $1 \le c \le n$ this amounts to calculating $$ P(r_I \ge c) = \frac{2(n+1)}{n-1} \int_0^t \bigl(1 - (1-x)^n - cx\bigr)\,dx, $$ where $t$ is the unique root of the equation $cx = 1 - (1-x)^n$ in the unit interval, and $$ P(r_G \ge c) = \frac{2n}{n-1}\Bigl(\frac{1}{2c} - \frac{1}{2n}\Bigr) = \frac{n-c}{c(n-1)}. $$ Notice that $\lim_{n\to\infty} (n-c)/(c(n-1)) = 1/c$.
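The value $d_I$ and a harmonic-number expression for $r_I$ (the latter is our reconstruction, not taken verbatim from the source; it reduces to $5/4$ for $n = 2$) can be checked against direct integration over the region:

```python
# Sketch: uniform-distribution means over the region x <= y <= f_n(x)
# for n = 3, compared with the closed-form candidates.
steps = 200_000
h = 1.0 / steps
n = 3
f = lambda x: 1 - (1 - x) ** n

area = num = rat = 0.0
for k in range(steps):
    x = (k + 0.5) * h
    w = f(x) - x
    area += w * h
    num += (w * w / 2) * h                        # ∫ (f-x)^2 / 2 dx
    rat += ((f(x) ** 2 - x * x) / (2 * x)) * h    # ∫ (f^2 - x^2)/(2x) dx

d_I = num / area
r_I = rat / area
H = lambda m: sum(1.0 / k for k in range(1, m + 1))
print(d_I, 2 * (n - 1) * (n + 1) / (3 * (n + 2) * (2 * n + 1)))
print(r_I, (n + 1) * (2 * H(n) - H(2 * n) - 0.5) / (n - 1))
```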
In the case of the difference, given ${\cal C}^n_I$, and thus $0 \le d \le n^{-1/(n-1)} - n^{-n/(n-1)}$, we calculate $$ P(d_I \ge d) = \frac{2(n+1)}{n-1} \int_s^t \bigl(1 - (1-x)^n - x - d\bigr)\,dx, $$ where the values of $s$ and $t$ ($s < t$) are determined by the roots of the equation $x + d = 1 - (1-x)^n$ in the unit interval. Again, in general, $s$ and $t$ cannot be given explicitly. Finally, given ${\cal C}^n_G$, we obtain, with $0 \le d \le 1 - 1/n$, $$ P(d_G \ge d) = \frac{2n}{n-1} \left( \int_{d/(n-1)}^{1/n} (nx - x - d)\,dx + \int_{1/n}^{1-d} (1 - x - d)\,dx \right) = \frac{n}{n-1} \left( (1-d)^2 + \frac{d^2}{n-1} - \frac{1}{n} \right). $$ In both cases the prophet regions $R^{u,m}_{{\cal C}(I,n)}$ and $R^{u,m}_{{\cal C}(G,n)}$ converge towards the upper triangle $T = \{(x,y) \,|\, 0 \le x \le y \le 1\}$ in the unit square. Thus, in the limit, the typical ratios and differences agree and can be computed directly via $T$, yielding the probabilities $1/c$ and $(1-d)^2$.
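A final sketch verifies the closed form for $P(d_G \ge d)$ (our integration of the piecewise boundary) against a direct area computation, together with its limit $(1-d)^2$:

```python
# Sketch: tail probability of the difference under the uniform distribution
# on the region x <= y <= min(nx, 1), whose area is (n-1)/(2n).
n, d = 6, 0.3
closed = n / (n - 1) * ((1 - d) ** 2 + d ** 2 / (n - 1) - 1 / n)

steps = 200_000
h = 1.0 / steps
excess = sum(max(min(n * x, 1.0) - x - d, 0.0)
             for x in ((k + 0.5) * h for k in range(steps))) * h
direct = excess * 2 * n / (n - 1)
print(closed, direct)
```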