Thursday, October 11, 2012

A reply to Testing via credible sets

Last week I posted a manuscript on arXiv entitled "On decision-theoretic justifications for Bayesian hypothesis testing through credible sets". A few days later, a discussion of it appeared on Xi'an's Og. I've read papers and books by Christian Robert with great interest and have been a follower of his "Og" for quite some time, and so was honoured and excited when he chose to blog about my work. I posted a comment to his blog post, but for some reason or other it has not yet appeared on the site. I figured that I'd share my thoughts on his comments here on my own blog for the time being.

The main goal of the paper was to discuss decision-theoretic justifications for testing the point-null hypothesis Θ0={θ0} against the alternative Θ1={θ: θ≠θ0} using credible sets. In this test procedure, Θ0 is rejected if θ0 is not in the credible set. This is not the standard solution to the problem, but it is certainly not uncommon (I list several examples in the introduction to the paper). Tests of composite hypotheses are also discussed.
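To make the procedure concrete, here is a minimal Python sketch under assumptions chosen purely for illustration (they are not taken from the paper): a normal sample with known standard deviation σ and a conjugate normal prior N(μ0, τ²) for θ, so that the posterior is normal and the central credible interval has a closed form. The function names are hypothetical.

```python
from statistics import NormalDist, fmean

def normal_posterior(data, sigma, mu0, tau):
    """Posterior for theta, assuming x_i ~ N(theta, sigma^2) with sigma known
    and a conjugate (continuous) prior theta ~ N(mu0, tau^2)."""
    n = len(data)
    precision = 1 / tau**2 + n / sigma**2
    mean = (mu0 / tau**2 + n * fmean(data) / sigma**2) / precision
    return NormalDist(mean, precision ** -0.5)

def central_credible_interval(posterior, alpha=0.05):
    """Equal-tailed (central) 1 - alpha credible interval."""
    return posterior.inv_cdf(alpha / 2), posterior.inv_cdf(1 - alpha / 2)

def credible_set_test(data, theta0, sigma, mu0, tau, alpha=0.05):
    """Reject the point null theta = theta0 iff theta0 lies outside the
    central credible interval. Returns True when theta0 is rejected."""
    posterior = normal_posterior(data, sigma, mu0, tau)
    lo, hi = central_credible_interval(posterior, alpha)
    return not (lo <= theta0 <= hi)

# The prior is continuous, so no point-mass on theta0 is needed.
data = [0.9, 1.4, 0.6, 1.8, 1.1]
print(credible_set_test(data, theta0=0.0, sigma=1.0, mu0=0.0, tau=10.0))  # prints True
```

With a vague prior (τ = 10) the 95% interval here is roughly (0.28, 2.03), so θ0 = 0 is rejected while a value such as θ0 = 1 would not be.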

Judging from his blog post, Xi'an is not exactly in love with the manuscript. (Hmph! What does he know about Bayesian decision theory anyway? It's not like he wrote the book on... oh, wait.) To some extent, however, I think that his criticism is due to a misunderstanding.

Before we get to the misunderstanding though: Xi'an starts out by saying that he doesn't like point-null hypothesis testing, so the prior probability that he would like it was perhaps not that great. I'm not crazy about point-null hypotheses either, but the fact remains that they are used a lot in practice and that there are situations where they are very natural. Xi'an himself gives a few such examples in Section 5.2.4 of The Bayesian Choice, as do Berger and Delampady (1987).

What is not all that natural, however, is the standard Bayesian solution to point-null hypothesis testing. It requires a prior with a mass on θ0, which seems like a very artificial construct to me. Apart from leading to such complications as Lindley's paradox, it can lead to priors that are far from impartial. Casella and Berger (1987, Section 4) give an example where the seemingly impartial prior probabilities P(θ0)=1/2 and P(Θ1)=1/2 actually yield a test with strong bias towards the null hypothesis. One therefore has to be extremely careful when applying the standard tests of point-null hypotheses, and think hard about what the point-mass really means and how it affects the conclusions.
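To see how a point-mass can tilt the test towards the null, consider a standard textbook setting (my own choice for illustration, not Casella and Berger's exact example): x̄ ~ N(θ, σ²/n) with σ known, P(θ0) = P(Θ1) = 1/2, and θ ~ N(θ0, τ²) under the alternative. Holding the z-statistic fixed at 1.96, which corresponds to a two-sided p-value of about 0.05, the posterior probability of the null grows towards 1 as n increases. This is Lindley's paradox. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def posterior_prob_null(z, n, sigma=1.0, tau=1.0, prior_null=0.5):
    """P(theta = theta0 | xbar) under the mixed prior: mass prior_null on
    theta0, and theta ~ N(theta0, tau^2) otherwise. theta0 is taken to be 0
    and xbar is placed so that z = xbar / (sigma / sqrt(n))."""
    se = sigma / sqrt(n)
    xbar = z * se
    # Marginal density of xbar under the null and under the alternative.
    m0 = NormalDist(0.0, se).pdf(xbar)
    m1 = NormalDist(0.0, sqrt(tau**2 + se**2)).pdf(xbar)
    bayes_factor_01 = m0 / m1
    post_odds = prior_null / (1 - prior_null) * bayes_factor_01
    return post_odds / (1 + post_odds)

# Same "significant" z-statistic, increasingly large samples.
for n in (1, 10, 100, 1000, 10000):
    print(n, round(posterior_prob_null(1.96, n), 3))
```

With σ = τ = 1 the posterior probability of the null is about 0.35 at n = 1 but about 0.82 at n = 1000, even though a frequentist test would reject at the 5% level in both cases.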

Tests based on credible sets, on the other hand, allow us to use a nice continuous prior for θ. Unlike the prior used in the standard solution, it can be non-informative. As for informative priors, it is often easier to construct a continuous prior based on expert opinion than it is to construct a mixed prior.

Theorem 2 of my paper presents a weighted 0-1-type loss function that leads to the acceptance region being the central (symmetric) credible interval. The prior distribution is assumed to be continuous, with no point-mass in θ0. The loss is constructed using directional conclusions, meaning that when θ0 is rejected, it is rejected in favour of either {θ: θ<θ0} or {θ: θ>θ0}, instead of simply being rejected in favour of {θ: θ≠θ0}. Indeed, this is how credible and confidence intervals are used in practice: if θ0 is smaller than all values in the interval, then θ0 is rejected and we conclude that θ>θ0. The theorem shows that tests based on central intervals can be viewed as a solution to the directional three-decision problem - a solution that does not require a point-mass for the null hypothesis. I therefore do not agree with Xi'an's comment that "[tests using credible sets] cannot bypass the introduction of a prior mass on Θ0". While a test traditionally only has one way to reject the null hypothesis, allowing two different directions in which Θ0 can be rejected seems perfectly reasonable for the point-null problem.
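The directional reading of a central interval can be written as a rule with three outcomes. Since the endpoints of the central 1 − α interval are the α/2 and 1 − α/2 posterior quantiles, θ0 lies below the interval exactly when P(θ < θ0 | x) < α/2, and above it exactly when P(θ > θ0 | x) < α/2; in other words, the rule is a pair of one-sided tests. The sketch below assumes a normal posterior purely for illustration, uses hypothetical names, and does not reproduce the loss function from the paper; it only checks that the interval-based rule and the tail-probability rule agree.

```python
from enum import Enum
from statistics import NormalDist

class Decision(Enum):
    ACCEPT = "theta = theta0"   # theta0 inside the central interval
    GREATER = "theta > theta0"  # interval lies entirely above theta0
    LESS = "theta < theta0"     # interval lies entirely below theta0

def directional_test(posterior, theta0, alpha=0.05):
    """Three-decision rule read off the central 1 - alpha credible interval."""
    lo = posterior.inv_cdf(alpha / 2)
    hi = posterior.inv_cdf(1 - alpha / 2)
    if theta0 < lo:
        return Decision.GREATER
    if theta0 > hi:
        return Decision.LESS
    return Decision.ACCEPT

def directional_test_tails(posterior, theta0, alpha=0.05):
    """The same rule phrased as two one-sided posterior tail probabilities."""
    below = posterior.cdf(theta0)  # P(theta < theta0 | x)
    if below < alpha / 2:
        return Decision.GREATER
    if 1 - below < alpha / 2:
        return Decision.LESS
    return Decision.ACCEPT

post = NormalDist(mu=1.2, sigma=0.45)  # assumed posterior, for illustration only
for theta0 in (0.0, 1.0, 2.5):
    assert directional_test(post, theta0) == directional_test_tails(post, theta0)
    print(theta0, directional_test(post, theta0).value)
```

Here the 95% interval is roughly (0.32, 2.08), so θ0 = 0 leads to the conclusion θ > θ0, θ0 = 1 is accepted, and θ0 = 2.5 leads to θ < θ0.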

Regarding this test, Xi'an writes that it "essentially [is] a composition of two one-sided tests, [...], so even at this face-value level, I do not find the result that convincing". But any (?) two-sided test can be said to be a composition of two one-sided tests (and therefore implicitly includes a directional conclusion), so I'm not sure why he regards it as a reason to remain unconvinced about the validity of the result.

As for the misunderstanding, Theorem 3 of the paper deals with one-sided hypothesis tests. It was not meant as an attempt to solve the problem of testing point-null hypotheses, but rather to show how credible sets can be used to test composite hypotheses - as was Theorem 4. Xi'an's main criticism of the paper seems to be that the tests in Theorems 3 and 4 fail for point-null hypotheses, but they were never meant to be used for such hypotheses in the first place. After reading his comments, I realized that this might not have been perfectly clear in the first draft of the paper. In particular, the abstract seemed to imply that the paper only dealt with point-null hypotheses, which is not the case. In the submitted version (not yet uploaded to arXiv), I've tried to make it clearer that both point-null and composite hypotheses are studied.

There are certainly reasons to question the use of credible sets for testing, chief among them being that the evidence against Θ0 is evaluated in a roundabout way. On the other hand, credible sets are reasonably easy to compute and tend to have favourable properties in frequentist analysis. It seems to me that a statistician who would like to use a method that is reasonable in both Bayesian and frequentist inference would want to consider tests based on credible sets.
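One simple instance of that frequentist behaviour, under assumptions chosen for illustration: with x_i ~ N(θ, σ²), σ known, and a flat (improper) prior on θ, the posterior is N(x̄, σ²/n). The central credible interval then coincides with the usual z-interval, and the credible-set test has rejection probability exactly α when θ = θ0. The sketch computes the rejection probability in closed form; the function name is hypothetical, and the exact match with α is specific to this case rather than a general property.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def rejection_probability(theta, theta0, n, sigma=1.0, alpha=0.05):
    """Frequentist probability that theta0 falls outside the central
    1 - alpha credible interval xbar +/- z * sigma / sqrt(n), which is the
    posterior interval under a flat prior with sigma known."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = (theta - theta0) * sqrt(n) / sigma  # standardised distance from the null
    # Reject when xbar falls below theta0 - z*se or above theta0 + z*se.
    return STD_NORMAL.cdf(-z - delta) + 1 - STD_NORMAL.cdf(z - delta)

print(round(rejection_probability(0.0, 0.0, n=25), 4))  # size at the null: prints 0.05
print(round(rejection_probability(0.5, 0.0, n=25), 4))  # power when theta = 0.5
```

With an informative prior the interval shifts towards the prior mean, so the size is no longer exactly α; this example only covers the flat-prior case.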