## RESEARCH ARTICLE

# Consistency of the Semi-parametric MLE under the Cox Model with Right-Censored Data

**Qiqing Yu**

### Article Information

#### Identifiers and Pagination:

**Year:** 2020

**Volume:** 10

**First Page:** 21

**Last Page:** 27

**Publisher Id:** TOSPJ-10-21

**DOI:** 10.2174/2666148902010010021

#### Article History:

**Received Date:** 14/05/2020

**Revision Received Date:** 14/08/2020

**Acceptance Date:** 19/08/2020

**Electronic publication date:** 23/10/2020

**Collection year:** 2020

**© 2020 Qiqing Yu.**

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

## Abstract

### Objective:

We studied the consistency of the semi-parametric maximum likelihood estimator (SMLE) under the Cox regression model with right-censored (RC) data.

### Methods:

Consistency proofs of the MLE are often based on the Shannon-Kolmogorov inequality, which requires finite *E*(lnL), where L is the likelihood function.

### Results:

We establish the consistency of the SMLE under the Cox model with RC data when the survival time is either continuous or discrete.

### Conclusion:

Under the Cox model with RC data, E(lnL) may not exist. We used the Kullback-Leibler information inequality in our proof.

**Keywords:** Cox model, Maximum likelihood estimator, Consistency, Kullback-Leibler inequality, Shannon-Kolmogorov inequality, Without loss of generality (WLOG).

## 1. INTRODUCTION

We studied the consistency of the semi-parametric maximum likelihood estimator (SMLE) under the Cox model with right-censored (RC) data.

Let $Y$ be a random survival time and $\mathbf{X}$ a $p$-dimensional random covariate. Conditional on $\mathbf{X}=\mathbf{x}$, $Y$ satisfies the Cox model if its hazard function satisfies

$$h(y\mid\mathbf{x}) = h_o(y)\,e^{\mathbf{x}'\beta}, \qquad y<\tau_Y, \tag{1.1}$$

where $h_o$ is the baseline hazard function, *i.e.*, $h_o(y) = f_o(y)/S_o(y-)$, $f_o$ is a density function, $S_o(y) = S(y\mid\mathbf{0}) = P(Y>y\mid\mathbf{X}=\mathbf{0})$, $F_o = 1-S_o$, $\tau_Y = \sup\{t: S_Y(t)>0\}$, $h(y\mid\mathbf{x}) = f(y\mid\mathbf{x})/S(y-\mid\mathbf{x})$, and $S(\cdot\mid\cdot)$ ($f(\cdot\mid\cdot)$ or $F(\cdot\mid\cdot)$) is the conditional survival function (density function (df) or cumulative distribution function (cdf)) of $Y$ given $\mathbf{X}=\mathbf{x}$. The restriction $y<\tau_Y$ is not in the original definition of the PH model, but it is necessary if $S_o$ is discontinuous at $\tau_Y$ (see Remark 1 [1]).

## 2. METHODS

In this paper, we make use of the following assumption:

**AS1**. Suppose that $C$ is a random variable with the df $f_C(t)$ and the survival function $S_C(t)$, $\mathbf{X}$ takes at least $p+1$ values, say $\mathbf{0}, \mathbf{x}_1, \dots, \mathbf{x}_p$, where $\mathbf{x}_1, \dots, \mathbf{x}_p$ are linearly independent, and $(Y,\mathbf{X})$ and $C$ are independent. Let $(Y_1,\mathbf{X}_1,C_1), \dots, (Y_n,\mathbf{X}_n,C_n)$ be i.i.d. random vectors from $(Y,\mathbf{X},C)$, $M=\min(Y,C)$ and $\delta=\mathbf{1}(Y\le C)$, where $\mathbf{1}(A)$ is the indicator function of the event $A$. Let $(M_1,\delta_1,\mathbf{X}_1), \dots, (M_n,\delta_n,\mathbf{X}_n)$ be i.i.d. RC observations from $(M,\delta,\mathbf{X})$ with the df as follows:

(1.2) |

and $S(t\mid\mathbf{x})$ is a function of $(S_o,\beta)$ (see Eq. (1.1)), but not of $f_{\mathbf{X}}$ and $f_C$ (the df's of $\mathbf{X}$ and $C$).
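The observation scheme in AS1 can be illustrated with a minimal sketch that forms $(M,\delta)$ from latent pairs $(Y,C)$. The function name `right_censor` and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def right_censor(y: Sequence[float], c: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Return (M, delta) with M_i = min(Y_i, C_i) and delta_i = 1(Y_i <= C_i)."""
    if len(y) != len(c):
        raise ValueError("y and c must have the same length")
    m = [min(yi, ci) for yi, ci in zip(y, c)]
    delta = [1 if yi <= ci else 0 for yi, ci in zip(y, c)]
    return m, delta


# Toy latent survival and censoring times (illustrative values only).
m, delta = right_censor([2.0, 5.0, 3.0], [4.0, 1.0, 3.0])
```

Note that a tie $Y_i = C_i$ counts as an exact observation, since $\delta=\mathbf{1}(Y\le C)$.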

Due to (AS1) and Eq. (1.2), the generalized likelihood function can be written as:

(1.3) |

which coincides with the standard form of the generalized likelihood [2]. Eq. (1.3) is identical to the next expression:

(1.4) |

where $\eta_n = \min\{|M_i - M_j| : M_i \ne M_j,\ i,j\in\{1,2,\dots,n\}\}$. This form allows $S_o$ to be arbitrary (discrete, continuous, or otherwise), and is thus more convenient in the later proofs. If $Y$ is continuous, then $S(t\mid\mathbf{x}) = (S(t\mid\mathbf{0}))^{\exp(\mathbf{x}'\beta)} = (S_o(t))^{\exp(\mathbf{x}'\beta)}$, but

(1.5) |
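The quantity $\eta_n$ used in Eq. (1.4) is the smallest gap between distinct observed times. A minimal sketch of its computation follows; the function name `eta_n` is a hypothetical choice for illustration.

```python
from typing import Sequence


def eta_n(m: Sequence[float]) -> float:
    """Smallest gap |M_i - M_j| over pairs with M_i != M_j."""
    distinct = sorted(set(m))
    if len(distinct) < 2:
        raise ValueError("eta_n needs at least two distinct observed times")
    # For sorted distinct values the minimum gap occurs between neighbours.
    return min(b - a for a, b in zip(distinct, distinct[1:]))
```

Tied observations are collapsed first, so they do not force the gap to zero.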

If $Y$ is discrete, then $S(t\mid\mathbf{x}) = \prod_{s\le t}(1-h(s\mid\mathbf{x})) = \prod_{s\le t}(1-h_o(s)e^{\mathbf{x}'\beta})$. If $Y$ has a mixture distribution, then $S(t\mid\mathbf{x}) = p\,(S_{01}(t))^{\exp(\mathbf{x}'\beta)} + (1-p)\prod_{s\le t}(1-h_{02}(s)e^{\mathbf{x}'\beta})$, where $p\in(0,1)$, $h_{01}$ and $h_{02}$ are two hazard functions, $h_o(t) = p\,h_{01} + (1-p)\,h_{02}$ and $S_o(t) = p\,S_{01} + (1-p)\,S_{02}$.
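The continuous and discrete expressions for $S(t\mid\mathbf{x})$ can be compared with a small sketch. The function names, the dictionary representation of the baseline hazard, and the guard against factors below zero are assumptions made for illustration only.

```python
import math
from typing import Dict, Sequence


def _risk_score(x: Sequence[float], beta: Sequence[float]) -> float:
    """exp(x'beta)."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))


def surv_continuous(s0_t: float, x: Sequence[float], beta: Sequence[float]) -> float:
    """Continuous case: S(t|x) = S_o(t) ** exp(x'beta)."""
    return s0_t ** _risk_score(x, beta)


def surv_discrete(h0: Dict[float, float], t: float,
                  x: Sequence[float], beta: Sequence[float]) -> float:
    """Discrete case: S(t|x) = prod_{s <= t} (1 - h_o(s) * exp(x'beta))."""
    r = _risk_score(x, beta)
    prod = 1.0
    for s, h in sorted(h0.items()):
        if s > t:
            break
        factor = 1.0 - h * r
        if factor < 0.0:
            # Guard added for the sketch: the product form needs h_o(s) e^{x'beta} <= 1.
            raise ValueError("h_o(s) * exp(x'beta) exceeds 1 at s = %r" % s)
        prod *= factor
    return prod
```

With $\beta=\mathbf{0}$ both functions return the baseline survival function, which gives a quick sanity check.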

The SMLE of $(S_o,\beta)$ maximizes $L(S,\mathbf{b})$ over all possible survival functions $S$ and $\mathbf{b}\in\mathbf{R}^p$, and is denoted by $(\hat S_o,\hat\beta)$. The SMLE of $S(t\mid\mathbf{x})$ is denoted by $\hat S(t\mid\mathbf{x})$, which is a function of $(\hat S_o,\hat\beta)$. The computation of the SMLE under the Cox model has been studied, but its consistency has not been established under the model [3]. The simulation results in [3] suggest that the SMLE is more efficient than the partial likelihood estimator under the Cox model.

The partial likelihood estimator is a common estimator under the Cox model. It maximizes the partial likelihood $\prod_{i\in D} e^{\mathbf{X}_i'\mathbf{b}} \big/ \sum_{j\in R_i} e^{\mathbf{X}_j'\mathbf{b}}$, where $D$ is the collection of indices of the exact observations and $R_i$ is the risk set $\{j: M_j\ge Y_i\}$. The asymptotic properties of the estimator are well understood [4].

The consistency of the SMLE under the continuous Cox model with interval-censored (IC) data has been established, making use of the following result [5]:

**The Shannon-Kolmogorov (S-K) inequality**. *Let $f_o$ and $f$ be two densities with respect to (w.r.t.) a measure $\mu$, and suppose that $\int f_o(t)\ln f_o(t)\,d\mu(t)$ is finite. Then $\int f_o(t)\ln f_o(t)\,d\mu(t) \ge \int f_o(t)\ln f(t)\,d\mu(t)$, with equality iff $f=f_o$ a.e. w.r.t. $\mu$.*

Under the Cox model with IC data, the S-K inequality becomes $E(\ln L(S_o,\beta)) \ge E(\ln L(S,\mathbf{b}))$ $\forall\,(S,\mathbf{b})$, where $S$ is a baseline survival function and $\mathbf{b}\in\mathbf{R}^p$. Their approach cannot be extended to the Cox model with RC data, as the key assumption in the S-K inequality, that is, finite $E(\ln L(S_o,\beta))$, may not hold [3]. Indeed, if $Y$ has a df and $\beta = 0$, then L

A related inequality is as follows.

**The Kullback-Leibler (K-L) information inequality**. *Let $f_o$ and $f$ be two densities w.r.t. a measure $\mu$. Then $\int f_o(t)\ln\frac{f_o}{f}(t)\,d\mu(t) \ge 0$, with equality iff $f=f_o$ a.e. w.r.t. $\mu$.*

The K-L inequality says that $\int f_o(t)\ln\frac{f_o}{f}(t)\,d\mu(t)$ exists, though it may be infinite.

In this note, we show that the SMLE under the Cox model is consistent, making use of the Kullback-Leibler information inequality [6].
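For intuition, the K-L information can be evaluated on a finite support, where $\mu$ is the counting measure. The sketch below is an illustrative assumption rather than part of the paper; the function name `kl_information` is hypothetical. It returns `math.inf` when $f_o$ puts mass where $f$ has none.

```python
import math
from typing import Sequence


def kl_information(f0: Sequence[float], f: Sequence[float]) -> float:
    """Sum of f0 * ln(f0 / f) over a finite support (counting measure)."""
    total = 0.0
    for p, q in zip(f0, f):
        if p == 0.0:
            continue  # convention: 0 * ln(0 / q) = 0
        if q == 0.0:
            return math.inf  # f0 puts mass where f has none
        total += p * math.log(p / q)
    return total
```

For example, `kl_information([0.5, 0.5], [0.9, 0.1])` is strictly positive, while identical arguments give zero.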

**2. The Main Results.** Notice that under the assumption that $h_o$ exists, $S_o$, $f_o$, $F_o$ and $h_o$ are equivalent, in the sense that given one of them, the other three functions can be derived. Thus, the Cox model is applicable only to distributions whose density functions exist, that is, $Y$ is either continuous, or discrete, or a mixture of the two. Since the expression of $S(t\mid\mathbf{x})$ varies in these three cases, for simplicity, we only prove the consistency of the SMLE under the Cox model in the first two cases.

**Theorem 1**. *Under the Cox model with RC data, if $Y$ is either continuous or discrete, and if $S_o(\tau_M) < 1$, then the SMLE $(\hat S_o(t),\hat\beta)$ is consistent $\forall\, t\in\mathcal{D}$ (see Eq. (1.2)).*

The proof of Theorem 1 makes use of a modified K-L inequality. The K-L inequality requires that $f_o$ and $f$ are both densities w.r.t. the measure $\mu$, that is, $\int f(t)\,d\mu(t) = 1$. However, in our case, we encounter the situation where $\int f(t)\,d\mu(t)\in[0,1]$.

**Lemma 1** (the modified K-L inequality). If $f_i\ge 0$ ($i=1,2$), $\mu_1$ is a measure, $\int f_1(t)\,d\mu_1(t) = 1$ and $\int f_2(t)\,d\mu_1(t)\le 1$, then $\int f_1(t)\ln\frac{f_1(t)}{f_2(t)}\,d\mu_1(t)\ge 0$, with equality iff $f_1=f_2$ a.e. w.r.t. $\mu_1$.

**Proof.** In view of the K-L inequality, it suffices to prove the inequality $\int f_1(t)\ln\frac{f_1(t)}{f_2(t)}\,d\mu_1(t)\ge 0$ under the additional assumptions that $\int f_2(t)\,d\mu_1(t) < 1$, $\int f_1(t)\,d\mu_2(t) = 0$ and $\int f_2(t)\,d\mu(t) = 1$, where $\mu_2$ is a measure and $\mu=\mu_1+\mu_2$. Since $\int f_1(t)\,d\mu(t) = \int f_2(t)\,d\mu(t) = 1$, $f_1$ and $f_2$ are df's w.r.t. $\mu$. Moreover, $f_1 = 0$ a.e. w.r.t. $\mu_2$, so $\int f_1\ln\frac{f_1}{f_2}\,d\mu_1 = \int f_1\ln\frac{f_1}{f_2}\,d\mu \ge 0$ by the K-L inequality.
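The inequality in Lemma 1 can also be checked directly from the elementary bound $\ln u\le u-1$. The derivation below is an alternative sketch offered for intuition, not the argument used in the proof above; it assumes the conventions $0\ln 0 = 0$ and $\ln 0 = -\infty$.

```latex
\begin{align*}
-\int f_1 \ln\frac{f_1}{f_2}\,d\mu_1
  &= \int_{\{f_1>0\}} f_1 \ln\frac{f_2}{f_1}\,d\mu_1 \\
  &\le \int_{\{f_1>0\}} f_1\Bigl(\frac{f_2}{f_1}-1\Bigr)\,d\mu_1
     && (\ln u \le u-1,\ u\ge 0) \\
  &= \int_{\{f_1>0\}} f_2\,d\mu_1 - 1 \\
  &\le \int f_2\,d\mu_1 - 1 \;\le\; 0 .
\end{align*}
```

Equality throughout forces $f_2=f_1$ a.e. on $\{f_1>0\}$ and $f_2=0$ a.e. on $\{f_1=0\}$, that is, $f_1=f_2$ a.e. w.r.t. $\mu_1$. In particular, the inequality is strict whenever $\int f_2\,d\mu_1<1$.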

**Proof of Theorem 1.** Let $\Omega_0$ be the subset of the sample space $\Omega$ such that the empirical distribution function (edf) $\hat F_n(t,s,\mathbf{x})$ based on the $(M_i,\delta_i,\mathbf{X}_i)$'s converges to $F(t,s,\mathbf{x})$, the cdf of $(M,\delta,\mathbf{X})$. It is well known that $P(\Omega_0)=1$. Notice that the SMLE $(\hat S_o,\hat\beta)$ is a function of $(\omega,n)$, say $(\hat S_{o,n}(t)(\omega),\hat\beta_n(\omega))$, where $\omega\in\Omega$ and $n$ is the sample size. Hereafter, fix an $\omega\in\Omega_0$. Since $\hat\beta$ ($=\hat\beta_n(\omega)$) is a sequence of vectors in $\mathbf{R}^p$, there is a convergent subsequence with the limit $\beta_*$, where the components of $\beta_*$ can be $\pm\infty$. Since $\hat S_o$ ($=\hat S_{o,n}$) is a sequence of monotone functions taking values in $[0,1]$, there exists a further subsequence which is convergent. Without loss of generality (WLOG), we assume that $\hat S_o\to S_*$ and $\hat\beta\to\beta_*$. Of course, $(\beta_*,S_*)$ depends on $\omega$ ($\in\Omega_0$). We prove in Theorem 2 for the discrete case and in Theorem 3 for the continuous case that:

(2.1) |

Since $\omega$ can be arbitrary in $\Omega_0$ and $P(\Omega_0) = 1$, the SMLE is consistent.

Before we prove Theorems 2 and 3, we present a preliminary result.

**Lemma 2** (Proposition 17 in Royden (1968), page 231). *Suppose that $\mu_n$ is a sequence of measures on the measurable space $(J,\mathcal{B})$ such that $\mu_n(B)\to\mu(B)$ $\forall\, B\in\mathcal{B}$, $g_n$ and $f_n$ are non-negative measurable functions, and $(f_n,g_n)(x)\to(f,g)(x)$.*

*Then*,

**Corollary 1.** *Suppose that $\mu_n$ is a sequence of measures on the measurable space $(J,\mathcal{B})$ such that $\mu_n(B)\to\mu(B)$ $\forall\, B\in\mathcal{B}$, $f$ and $f_n$ ($n\ge 1$) are integrable functions that are bounded below and $f(x) = \lim_{n\to\infty} f_n(x)$. Then $\int f\,d\mu \le \lim_{n\to\infty}\int f_n\,d\mu_n$.*

**Proof.** Let $k = \inf_n \inf_x f_n(x)$. If $k\ge 0$, then the corollary follows from Lemma 2. Otherwise, let $f_n^-(x) = 0\wedge f_n(x)$, $f_n^+(x) = 0\vee f_n(x)$, $f^-(x) = 0\wedge f(x)$ and $f^+(x) = 0\vee f(x)$. Then $f_n^+\to f^+$ and $f_n^-\to f^-$ pointwise, as $f_n\to f$. Hence

$$\lim_{n\to\infty}\int f_n\,d\mu_n = \lim_{n\to\infty}\int (f_n^+ + f_n^-)\,d\mu_n = \lim_{n\to\infty}\Bigl[\int f_n^+\,d\mu_n + \int f_n^-\,d\mu_n\Bigr] \ge \int \lim_{n\to\infty} f_n^+\,d\mu + \int \lim_{n\to\infty} f_n^-\,d\mu$$

(by Lemma 2, as $f_n^+(x)$ is nonnegative and $|f_n^-(x)|\le |k|$)

$$= \int f^+\,d\mu + \int f^-\,d\mu = \int (f^+ + f^-)\,d\mu = \int f\,d\mu.$$

**Theorem 2.** Under the discrete Cox model with RC data, Eq. (2.1) holds.

**Proof.** For the given $\omega\in\Omega_0$ and $(S_*,\beta_*)$ in the proof of Theorem 1, as assumed, $(\hat S_o,\hat\beta)(\omega)\to(S_*,\beta_*)$. Defining $h_*(t) = \frac{S_*(t-)-S_*(t)}{S_*(t-)}$ and $h_*(t\mid\mathbf{x}) = h_*(t)\,e^{\beta_*'\mathbf{x}}$ (for $S_*(t-)>0$) yields $S_*(t\mid\mathbf{x})$ and $f_*(t\mid\mathbf{x})$, which are continuous functions of $S_*$ and $\beta_*$. Consequently, $\hat S(\cdot\mid\cdot)\to S_*(\cdot\mid\cdot)$.

Let $G_n(S_o,\beta) = \ln L(S_o,\beta)/n$ (see Eq. (1.3)). Then, the SMLE $(\hat S_o,\hat\beta)$ satisfies

(2.2) |

. |

where *B* is a measurable set in **R**^{p+1}. To apply Lemma 2,

(2.3) |

(2.4) |

(2.5) |

(2.6) |

(2.7) |

(2.8) |

(2.9) |

and $v_n$ converges setwise to a finite measure $v$ (see (2.9)), by an argument similar to that in (2.4), (2.6), (2.7) and (2.8), we have:

(2.10) |

Thus, ∫ *ln**dF*(*t*, 0, **x**) + ∫ *ln**dF*(*t*, 1, **x**). Hence, $(S_o(t),\beta) = (S_*(t),\beta_*)$ $\forall\, t\in\mathcal{D}$ by the 2nd statement of the K-L inequality.

**Theorem 3.** *Under the Cox model with RC data, if $Y$ is continuous, then Eq. (2.1) holds.*

**Proof.** For the given $\omega\in\Omega_0$ and $(S_*,\beta_*)$ in the proof of Theorem 1, as well as $(\hat S_o,\hat\beta)(\omega)$ and $\hat S(t\mid\mathbf{x})(\omega)$, we have $S_*(t\mid\mathbf{x}) = (S_*(t))^{\exp(\beta_*'\mathbf{x})}$. By an argument similar to that used in proving Eq. (2.8), we can show:

(2.11) |

In view of Eq. (1.4), since $Y$ is continuous, we denote:

(2.12) |

(2.13) |

as $S_*$ is a monotone function, $S_*'$ exists a.e., and so do $S_*'(t\mid\mathbf{x})$ and $F_*'(t\mid\mathbf{x})$. We have

(2.14) |

The reason is as follows. For each $(t,\mathbf{x})$ such that $F'(t\mid\mathbf{x}) > 0$ and Eq. (2.13) holds, $F_*'(t\mid\mathbf{x})/F'(t\mid\mathbf{x})$ ($= f_*(t\mid\mathbf{x})/f(t\mid\mathbf{x})$) is finite. Then, there exists $n_o$ such that $G(t,\mathbf{x},n) < 1 + F_*'(t\mid\mathbf{x})/F'(t\mid\mathbf{x})$ for $n\ge n_o$. On the other hand, $G(t,\mathbf{x},n)$ is finite for $n = 1,\dots,n_o$. Thus, $G(t,\mathbf{x},n) < k$ for some $k$. Since Eq. (2.1) holds a.e. and $\int 1\,dF(t,s,\mathbf{x}) = 1$, Eq. (2.14) holds.

We shall prove in Lemma 3 that

(2.15) |

. |

(2.16) |

. |

## 3. RESULTS

The last inequality further implies that ∫*ln**d F*(*t,0,***x**) + ∫*ln**d F*(*t,1,***x**) = 0. Thus, $(S_o(t),\beta) = (S_*(t),\beta_*)$ $\forall\, t\in\mathcal{D}$ by the 2nd statement of the K-L inequality and by the assumption AS1.

**Lemma 3.** Inequality (2.15) holds.

**Proof.** Let *k* ≥ 1 and , where *B* is a measurable set and

. |

## CONCLUSION

Since $H\bigl((S(t-\eta_n\mid\mathbf{x}) - S(t\mid\mathbf{x}))/(\hat S(t-\eta_n\mid\mathbf{x}) - \hat S(t\mid\mathbf{x}))\bigr) \ge -1/e$ and $v_n$ converges setwise to a finite measure $v$, by an argument similar to that in (2.4), (2.6), (2.7) and (2.8), we can show that:

. |

### CONSENT FOR PUBLICATION

Not applicable.

### AVAILABILITY OF DATA AND MATERIALS

Not applicable.

### FUNDING

None.

### CONFLICT OF INTEREST

The author declares no conflict of interest, financial or otherwise.

### ACKNOWLEDGEMENTS

The author would like to thank the editor and two referees for their invaluable comments.