Thursday, October 11, 2012

A reply to Testing via credible sets

Last week I posted a manuscript on arXiv entitled On decision-theoretic justifications for Bayesian hypothesis testing through credible sets. A few days later, a discussion of it appeared on Xi'ans' Og. I've read papers and books by Christian Robert with great interest and have been a follower of his "Og" for quite some time, and so was honoured and excited when he chose to blog about my work. I posted a comment to his blog post, but for some reason or other it has not yet appeared on the site. I figured that I'd share my thoughts on his comments here on my own blog for the time being.

The main goal of the paper was to discuss decision-theoretic justifications for testing the point-null hypothesis Θ0={θ0} against the alternative Θ1={θ: θ≠θ0} using credible sets. In this test procedure, Θ0 is rejected if θis not in the credible set. This is not the standard solution to the problem, but certainly not uncommon (I list several examples in the introduction to the paper). Tests of composite hypotheses are also discussed.

Judging from his blog post, Xi'an is not exactly in love with the manuscript. (Hmph! What does he know about Bayesian decision theory anyway? It's not like he wrote the book on... oh, wait.) To some extent however, I think that his criticism is due to a misunderstanding.

Before we get to the misunderstanding though: Xi'an starts out by saying that he doesn't like point-null hypothesis testing, so the prior probability that he would like it was perhaps not that great. I'm not crazy about point-null hypotheses either, but the fact remains that they are used a lot in practice and that there are situations where they are very natural. Xi'an himself gives a few such examples in Section 5.2.4 of The Bayesian Choice, as do Berger and Delampady (1987).

What is not all that natural, however, is the standard Bayesian solution to point-null hypothesis testing. It requires a prior with a mass on θ0, which seems like a very artificial construct to me. Apart from leading to such complications as Lindley's paradox, it leads to very partial priors. Casella and Berger (1987, Section 4) give an example where the seemingly impartial prior probabilities P(θ0)=1/2 and P(Θ1)=1/2 actually yield a test with strong bias towards the null hypothesis. One therefore has to be extremely careful when applying the standard tests of point-null hypotheses, and carefully think about what the point-mass really means and how it affects the conclusions.

Tests based on credible sets, on the other hand, allows us to use a nice continuous prior for θ. It can, unlike the prior used in the standard solution, be non-informative. As for informative priors, it is often easier to construct a continuous prior based on expert opinion than it is to construct a mixed prior.

Theorem 2 of my paper presents a weighted 0-1-type loss function that leads to the acceptance region being the central (symmetric) credible interval. The prior distribution is assumed to be continuous, with no point-mass in θ0. The loss is constructed using directional conclusions, meaning that when θ0 is rejected, it is rejected in favour of either {θ: θ<θ0} or {θ: θ>θ0}, instead of simply being rejected in favour of {θ: θ≠θ0}. Indeed, this is how credible and confidence intervals are used in practice: if θis smaller than all values in the interval, then θis rejected and we conclude that θ>θ0. The theorem shows that tests based on central intervals can be viewed as a solution to the directional three-decision problem - a solution that does not require a point-mass for the null hypothesis. I therefore do not agree with Xi'an's comment that "[tests using credible sets] cannot bypass the introduction of a prior mass on Θ0". While a test traditionally only has one way to reject the null hypothesis, allowing two different directions in which Θcan be rejected seems perfectly reasonable for the point-null problem.

Regarding this test, Xi'an writes that it "essentially [is] a composition of two one-sided tests, [...], so even at this face-value level, I do not find the result that convincing". But any (?) two-sided test can be said to be a composition of two one-sided tests (and therefore implicitly includes a directional conclusion), so I'm not sure why he regards it as a reason to remain unconvinced about the validity of the result.

As for the misunderstanding, Theorem 3 of the paper deals with one-sided hypothesis tests. It was not meant as an attempt to solve the problem of testing point-null hypotheses, but rather to show how credible sets can be used to test composite hypotheses - as was Theorem 4. Xi'an's main criticism of the paper seems to be that the tests in Theorems 3 and 4 fail for point-null hypotheses, but they were never meant to be used for such hypotheses in the first place. After reading his comments, I realized that this might not have been perfectly clear in the first draft of the paper. In particular, the abstract seemed to imply that the paper only dealt with point-null hypotheses, which is not the case. In the submitted version (not yet uploaded to arXiv), I've tried to make the fact that both point-null and composite hypotheses are studied clearer.

There are certainly reasons to question the use of credible sets for testing, chief among them being that the evidence against Θis evaluated in a roundabout way. On the other hand, credible sets are reasonably easy to compute and tend to have favourable properties in frequentist analysis. It seems to me that a statistician that would like to use a method that is reasonable both in Bayesian and frequentist inference would want to consider tests based on credible sets.

Wednesday, July 25, 2012

Online resources for statisticians

My students often look up statistical methods on Wikipedia. Sometimes they admit this with a hint of embarrassment in their voices. They are right to be cautious when using Wikipedia (not all pages are well-written) and I'm therefore pleased when they ask me if there are other good online resources for statisticians.

I usually tell them that Wikipedia actually is very useful, especially for looking up properties of various distributions, such as density functions, moments and relationships between distributions. I wouldn't cite the Wikipedia page on, say, the beta distribution in a paper, but if I need to check what the mode of said distribution is, it is the first place that I look. While not as exhaustive as the classic Johnson & Kotz books, the Wikipedia pages on distributions tend to contain quite a bit of surprisingly accurate information. That being said, there are misprints to be found, just as with any textbook (the difference being that you can fix those misprints - I've done so myself on a few occasions).

Another often-linked online resource is Wolfram MathWorld. While I've used it in the past when looking up topics in mathematics, I'm more than a little reluctant to use it after I happened to stumble upon their description of significance tests:

A test for determining the probability that a given result could not have occurred by chance (its significance).

...which is a gross misinterpretation of hypothesis testing and p-values (a topic which I've treated before on this blog).

The one resource that I really recommend though is Cross Validated, a questions-and-answers site for all things statistics. There are some real gems among the best questions and answers, that make worthwhile reading for any statistician. It is also the place to go if you have a statistics question that you are unable to find the answer to, regardless of whether its about how to use the t-test or about the finer aspects of LeCam theory. I strongly encourage all statisticians to add a visit to Cross Validated to their daily schedules. Putting my time where my mouth is, I've been actively participating there myself for the last few months.

Finally, Google and Google Scholar are the statistician's best friends. They are extremely useful for finding articles, lecture notes and anything else that has ended up online. It's surprising how often the answer to a question that someone asks you is "let me google that for you".

For questions on R or more mathematical topics, Stack Overflow and the Mathematics Stack Exchange site are the equivalents of Cross Validated.

My German colleagues keep insisting that German Wikipedia is far superior when it comes to statistics. While I can read German fairly well (in a fit of mathematical pretentiousness I once forced myself to read Kolmogorov's Grundbegriffe), I still haven't gathered my guts to venture beyond the English version.

Friday, April 27, 2012

Speeding up R computations Pt II: compiling

A year ago I wrote a post on speeding up R computations. Some of the tips that I mentioned then have since been made redundant by a single package: compiler. Forget about worrying about curly brackets and whether to write 3*3 or 3^2compiler solves those problems for you.

compiler provides a byte code compiler for R, which offers the possibility of compiling your functions before executing them. Among other things, this speeds up those embarrassingly slow for loops that you've been using:

> myFunction<-function() { for(i in 1:10000000) { 1*(1+1) } }
> library(compiler)
> myCompiledFunction <- cmpfun(myFunction) # Compiled function

> system.time( myFunction() )
   user  system elapsed 
 10.002   0.014  10.021 
> system.time( myCompiledFunction() )
   user  system elapsed 
  0.692   0.008   0.700 

That's 14 times faster!

Functions written in C (imported using Rcpp) are still much faster than the compiled byte code, but for those of us who

  • don't know C,
  • know C but prefer to write code in R,
  • know C but don't want to rewrite functions that we've already written in R,

compiler offers a great way of speeding up computations. It's included in the recommended R packages since R 2.13 (meaning that it comes with your basic R installation) and since R 2.14 most standard functions are already compiled. If you still are running an older version of R it's definitely time to update.