Debate between N. Matloff and F. Harrell on Bayesian vs. Frequentist Analysis, X.com, end of March 2025
Here I have tried to piece together a rather long thread with many branches. I think it is mostly complete, though an exchange between Frank and me regarding the effect of sampling design is missing; I simply can't find it. If someone can find it, please send it to me.
Frank Harrell @f2harrell Mar 27 I improved my second most popular blog article: https://fharrell.com/post/journey #Statistics #bayes [Link preview: fharrell.com, "My Journey from Frequentist to Bayesian Statistics – Statistical Thinking": This is the story of what influenced me to become a Bayesian statistician after being trained as a classical frequentist statistician, and practicing only that mode of statistics for many years.]
Norm Matloff 一啲都唔明 @matloff Mar 27 Eloquently stated, but IMO still unnecessarily and misleadingly based on hypothesis testing, a straw man.
Judea Pearl @yudapearl Echoing @f2harrell's journey from a frequentist to a Bayesian statistician, I am re-posting my journey from a Bayesian to a "Half Bayesian" AI researcher: https://ucla.in/2nZN7IH Readers who have seen it before can just skip, and those who have not, can take this re-posting as a confirmation that I still stand behind every word of my 2001 confession.
Norm Matloff 一啲都唔明 @matloff Even more eloquently written than @f2harrell's, but still missing the point IMO. The key word in your quote of Savage in the opening statement is "know." If the word is literally true, then none of us frequentists would object (nor would we consider it non-frequentist). Empirical Bayes is fine, for instance.
The problem occurs when "know" is replaced by "feel," in which case the analysis becomes subjective and possibly very biased. One can use the data to check frequentist assumptions but one cannot check feelings.
Feelings should have no place in scientific research, especially medical research, which is a public good. If someone wants to incorporate their feelings into their own private analysis of the stock market, say, good for them.
Judea Pearl @yudapearl All scientific knowledge is based on some assumptions. Are you proposing to purge Bayes analysis from science?
Norm Matloff 一啲都唔明 @matloff Mar 27 Please reread my point about checking assumptions.
Judea Pearl @yudapearl Mar 27 By "assumptions" I meant "untestable assumptions". Can't find your point about it, please enlighten.
Norm Matloff 一啲都唔明 @matloff Mar 27 I had thought you meant something like, "We assume this variable's distribution to be approximately normal," which IS verifiable. What untestable assumptions do you have in mind?
Frank Harrell @f2harrell Mar 28 No, that is not verifiable. The sample size needed to "know" that is surprisingly large. And binary decisions about normality disturb frequentist operating characteristics.
Norm Matloff 一啲都唔明 @matloff Mar 28 First, a point of clarification: When I said "a variable," I meant a variable, say height or Hmg, not an estimator. One does not need a large n to look at that. OTOH, it really doesn't play much role in what I do.
Now concerning "large n" estimators, I agree with you. Actually, very early in my career, I did extensive simulations to get an idea as to "How large is 'large'?" in various settings, and encourage every early-career statistician to do this.
I don't know what you mean by "frequentist operating characteristics." Please explain.
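A minimal sketch, not from the original thread, of the kind of "How large is 'large'?" simulation described above: for a skewed (exponential) population, it checks how the sampling distribution of the mean and the coverage of the nominal 95% t-interval behave as n grows. All settings here are made up for illustration.

```python
# Sketch: how large must n be before asymptotic normality "kicks in"?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reps = 20_000

for n in (5, 10, 30, 100, 500):
    samples = rng.exponential(scale=1.0, size=(reps, n))   # true mean = 1.0
    means = samples.mean(axis=1)
    skew = stats.skew(means)        # 0 for an exactly normal sampling distribution
    lo, hi = stats.t.interval(0.95, n - 1,
                              loc=means, scale=stats.sem(samples, axis=1))
    coverage = np.mean((lo <= 1.0) & (1.0 <= hi))           # nominal 0.95
    print(f"n={n:4d}  skewness of sample mean={skew:+.2f}  t-interval coverage={coverage:.3f}")
```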
Norm Matloff 一啲都唔明 @matloff Mar 27 I think the role of likelihood is overemphasized in practical terms, but I consider it part of the frequentist realm, which is the realm I consider the proper one.
Frank Harrell @f2harrell Mar 28 That is incorrect. The likelihood is at the heart of both likelihoodist inference and Bayes. And you continue to ignore severe defects with frequentism that I cataloged in my article.
Norm Matloff 一啲都唔明 @matloff Mar 28 Sorry, by context I was only referring to the role of likelihood in the frequentist realm, not the Bayesian one.
There is the Birnbaum Principle, a very philosophical matter at the heart of likelihood. It says that likelihood captures all we need to know in order to perform stat analysis. I strongly disagree with that, because "All models are wrong but some are useful." :-) To place so much emphasis on a quantity known a priori to be wrong seems quite unfounded to me.
For that reason, I use MLE only as a last resort. I prefer the Method of Moments when it is feasible, as more intuitive, especially given the fragility of the likelihood, which I hold is worse than that of the moments. But as I've written here recently, estimation of moments higher than 2 requires a very large n. So I mainly rely on asymptotic distributions, though again there is the "large n" issue that you point out.
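A minimal sketch, not from the thread, contrasting a Method of Moments fit with a maximum-likelihood fit for a gamma sample; the parameter values and sample size are made up for illustration.

```python
# Sketch: Method of Moments vs. MLE for gamma(shape=2, scale=3) data
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=500)

# Method of Moments: solve E(X) = k*theta, Var(X) = k*theta^2
m, v = x.mean(), x.var(ddof=1)
k_mom, theta_mom = m**2 / v, v / m

# MLE via scipy, with the location parameter fixed at 0
k_mle, _, theta_mle = stats.gamma.fit(x, floc=0)

print(f"MoM: shape={k_mom:.2f}, scale={theta_mom:.2f}")
print(f"MLE: shape={k_mle:.2f}, scale={theta_mle:.2f}")
```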
Frank Harrell @f2harrell Mar 28 Ah the old illusion of objectivity of frequentist methods. Never has been true.
Juan Carlos Silva @chobitoso Mar 27 I found these discussions somewhat repetitive. Even some frequentist algorithms and interpretations rely on Bayesian analysis. For instance, diagnostic testing is a pure Bayesian exercise. They are not completely independent perspectives.
Juan Carlos Silva @chobitoso Mar 27 In epidemiology and other fields, you would classify a patient with a test that has two properties, sensitivity and specificity. However, these properties are affected by a third variable, the prevalence of the condition. Thus, you need to update the test's results even if you…
Norm Matloff 一啲都唔明 @matloff That's entirely frequentist. It's in Fisher LDA for instance, and it is part of the intercept term in the logit model. It is thus part of the process of estimating from the data, again thoroughly, classically frequentist.
Now if the mix in the data is not representative of the population, one can then do "what if" analysis, looking at various cases. That is frequentist as long as one does not model the disease prevalence by a distribution, which becomes subjective.
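A minimal sketch of the plug-in "what if" analysis described above: Bayes' rule applied with assumed sensitivity, specificity, and a range of prevalences (all values made up) to show how strongly the positive predictive value depends on prevalence.

```python
# Sketch: PPV as a function of prevalence, by plug-in Bayes' rule
sens, spec = 0.90, 0.95     # assumed test properties (illustrative only)

def ppv(prevalence: float) -> float:
    """P(disease | positive test)."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prev in (0.001, 0.01, 0.05, 0.20, 0.50):
    print(f"prevalence={prev:5.3f}  PPV={ppv(prev):.3f}")
```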
Frank Harrell @f2harrell Mar 28 This is a bit of an oversimplification, but I don't love Bayes because it's perfect. I love Bayes because of (1) its problem-solving capabilities and (2) frequentist approaches are deeply flawed.
Norm Matloff 一啲都唔明 @matloff Mar 28 OK, I've gone through your article again, and really don't see anything new or view-changing. 90% of it is on the problems with hypothesis testing, which as you know I oppose.
(You probably don't remember this, but the very first interaction I had with you was when in some online forum you said that R should not be reporting the "1 star, 2 stars, 3 stars" p-values paradigm, and I said something like "YES!! Finally glad to see someone other than me say it.")
A couple of your statements did bring up very important issues (maybe these are among the "new" ones). First, "I came to not believe in the possibility of infinitely many repetitions of identical experiments, as required to be envisioned in the frequentist paradigm." I disagree; on the contrary, we all go through life implicitly making decisions on this basis, whether we are buying insurance, gambling in a casino or whatever. Ask any gambler (I'm not one) what it means that the probability of a payoff is 0.22, and after some thought he/she will indeed talk in terms of repeated trials.
Ironically, I hope you will recall that I've often said, "We are all Bayesians." We do have our priors in the numerous decision-making contexts we encounter in life, and informally act on those priors. Again, this is fine with me -- when I said "We all" above, I meant it -- but priors should have no place in scientific research, due to the subjectivity.
The other statement in your article that caught my eye concerned frequentists' need "to completely specify the experimental design, sampling scheme, and data generating process..." I would agree, providing one says "to contemplate the implications of" rather than "to completely specify." In fact, there was an example of this earlier in this thread, when we were discussing the overall prevalence of a disease or condition. I said that we must look at the nature of the sampling process, asking whether it is representative of the target population. I then said we might engage in "What if" questions along these lines.
Arman Oganisian @StableMarkets Mar 28 How would you answer a doc who wants the probability that the patient in their clinic now will relapse w/n 1 yr? not “the proportion of patients drawn from an infinite subpopulation of patients with the same features as that patient” but the probability for that specific patient.
Norm Matloff 一啲都唔明 @matloff Mar 28 The doc should first say "x% of people like you have a relapse." But a good doctor should add, "Of course, there are many factors underlying this, some of which we know and some we don't know, so it's possible that one of the latter factors makes your case very different."
Arman Oganisian @StableMarkets Mar 28 That is an entirely valid frequentist answer - and actually matches the second sentence in my post. Unfortunately, it also entirely fails to answer the question the physician asks!
Norm Matloff 一啲都唔明 @matloff Mar 28 Why is there any difference? Doctor, patient
Arman Oganisian @StableMarkets Mar 29 What I mean is the doctor didn’t ask for the rate in the subpopulation of patients who share certain features with patient i. They asked for the probability that that particular patient i would relapse. The valid frequentist response doesn’t answer this.
Arman Oganisian @StableMarkets Mar 29 From a Bayes perspective this is a well-defined request for the posterior predictive probability of relapse Y_i given data on previous i-1 subjects the doc has seen: P(Y_i = 1 | y_1, ..., y_{i-1}). It's not equivocating - just viewing probability as more than just sampling error.
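A minimal sketch of the posterior predictive probability Arman refers to, using a conjugate Beta prior and ignoring patient covariates; the prior and the data on previous patients are made up for illustration.

```python
# Sketch: P(Y_i = 1 | y_1, ..., y_{i-1}) under a Beta-Binomial model
a, b = 1.0, 1.0                              # Beta(1, 1) prior pseudo-counts
y_prev = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]      # relapses among the previous i-1 patients

relapses, n_prev = sum(y_prev), len(y_prev)
p_relapse = (a + relapses) / (a + b + n_prev)
print(f"posterior predictive P(relapse) = {p_relapse:.3f}")
```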
Arman Oganisian @StableMarkets There are (at least?) two valid frequentist responses. 1) Norm’s response, which answers a different question. 2) saying that the probability of relapse is either 100% or 0% and we won’t know for 1 yr - see Blackwell. Going with 2) would ensure no doc works with me ever again!
Norm Matloff 一啲都唔明 @matloff Mar 29 Comments:
1. Putting such questions to mathematicians, e.g. Blackwell, is generally counterproductive. They haven't had much, if any, experience working on real-life problems with real-life data, and tend to appreciate the mathematical elegance of Bayesian methods. Same BTW for philosophers, specifically meaning @learnfromerror.
2. As a patient, I want clinicians to give me reasonable assessments. Even if they give me a Bayesian answer, it should mention the role of unknown factors in the studies etc. Earlier in this thread, I talked of "ethical, responsible" physicians, and the same goes for statisticians. To say, "I'll give them an overly simplistic answer, to shut them up," is unethical and irresponsible.
3. It's wrong to say "The probability is either 1 or 0." My usual explanation is to refer to the Monty Hall game show example. (A couple of you brought in game theory; well, this is my game theory. :-) )
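A minimal sketch of the Monty Hall point: once the host opens a door, the prize's location is fixed, yet the 1/3 vs. 2/3 probabilities remain meaningful, as the long-run frequencies below show.

```python
# Sketch: Monty Hall, stay vs. switch
import random

random.seed(0)
trials = 100_000
stay_wins = switch_wins = 0

for _ in range(trials):
    prize = random.randrange(3)
    pick = random.randrange(3)
    # Host opens a door that is neither the contestant's pick nor the prize
    opened = next(d for d in range(3) if d != pick and d != prize)
    switched = next(d for d in range(3) if d != pick and d != opened)
    stay_wins += (pick == prize)
    switch_wins += (switched == prize)

print(f"P(win | stay)   = {stay_wins / trials:.3f}")    # about 1/3
print(f"P(win | switch) = {switch_wins / trials:.3f}")  # about 2/3
```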
Dylan Armbruster @dylanarmbruste3 Mar 27 Would you say you lean more towards Frequentist?
Norm Matloff 一啲都唔明 @matloff Mar 27 Not lean towards. 100% for anything that affects the public.
Dylan Armbruster @dylanarmbruste3 Mar 27 What do you mean? How would you classify yourself, then?
Norm Matloff 一啲都唔明 @matloff Mar 27 1. I personally do not use (subjective) Bayesian methods. 2. If someone wants to use such methods for their personal, private use, that's fine with me. 3. But (subjective) Bayesian methods should not be used in settings affecting the public, e.g. medical research.
Frank Harrell @f2harrell Mar 28 If you think that Bayesian methods are more subjective than frequentist ones you have not deeply examined frequentist methods.
Norm Matloff 一啲都唔明 @matloff Mar 28 I think this is at least the third time you and I have debated freq. vs. Bayes here on Twitter/X. I've looked deeply at whatever you've brought up, but you now say I've overlooked some points in your newly-expanded article. I will definitely take a look.
Frank Harrell @f2harrell Mar 28 Thanks for all those comments Norm. I have to take issue with your belief that life decisions are made like sampling statisticians think or that gamblers work like that. A resounding "no" to those two. Life and gambling are all about playing the odds in one-time situations.
Norm Matloff 一啲都唔明 @matloff Actually, after I posted this tweet, I put the question to several LLMs, asking how people perceive that 0.22 figure for winning a game. All of them cited repeated trials as one of the ways that number is perceived. (The other ways they cited didn't really pertain to the question, for example noting that many people tend to underestimate probabilities.)
One can dismiss responses from LLMs -- I didn't ask the LLMs whether they are frequentist or Bayesian :-) -- but still I think their responses here are worth considering, don't you agree?
The fact that people are making one-time decisions is not really relevant. People make many many one-time decisions of various kinds over their lifespans, and in principle many of those will involve probabilities of 0.22 or close to it. So there is in fact a long run to consider, even if it consists of a series of one-time settings.
I am certainly not saying that the gambler in this scenario will say, "Hmm, if I play this game many times...", but if you actually ask him/her what the 22% figure means, they will give it some thought and then give some sort of answer based on repeated trials.
I think the situation becomes clearer if one asks the same question about expected value, E(X). Putting aside the point that it is misnamed, how is it perceived by people? Of course, most people have never heard the term, but if you explain that it is a long-run average based on the probabilities of the events in question, they will immediately understand, and accept that it is a useful quantity to have, even though you bring it up in the context of a one-time decision. I would humbly submit that explaining expected value in a Bayesian context is a bit of a challenge. If you have experience with this, I would certainly like to hear it.
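A minimal arithmetic sketch of the point about expected value, with made-up payoffs: E(X) computed from the probabilities, read as the long-run average payoff per play.

```python
# Sketch: expected value of a simple gamble as a long-run average
payoffs = {10.0: 0.22, 0.0: 0.78}    # win $10 with probability 0.22, else nothing
expected_value = sum(x * p for x, p in payoffs.items())
print(f"E(X) = ${expected_value:.2f} per play, in the long run")   # $2.20
```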
Frank Harrell @f2harrell Mar 28 No, I don't think that idea of probability in that setting is worth considering. A successful poker player is good at estimating probability of ultimately winning the hand in the one-time situation (one-time because it's not only the cards; it's also the players).
Norm Matloff 一啲都唔明 @matloff All true (in the second sentence), but still not addressing the issue of how that player is assessing/interpreting the 0.22 probability.
As I said, the one-time nature is irrelevant, as this player can, and I claim does, view this in the context of a lifetime of poker with 0.22-probability situations, or a lifetime of 0.22-probability events of various kinds (games and non-games).
Note the word "situations": this player may play with different people, be dealt different hands and so on, but will encounter various situations which, though basically different in lots of ways, still have probability 0.22 (or near it). In other words, there are repeated trials with the same probability even though the trials are qualitatively different. It's like tossing many different but fair coins; we still have probability 0.5 on each toss, even though it's with a different coin each time.
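A minimal sketch of the "many different coins" argument: three qualitatively different mechanisms, each engineered to have event probability 0.22, still produce a long-run frequency near 0.22 when mixed over many one-time situations. The mechanisms are made up for illustration.

```python
# Sketch: heterogeneous one-time events, all with probability 0.22
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 30_000
events = []

for _ in range(n):
    kind = rng.integers(3)
    if kind == 0:      # a uniform draw falling below 0.22
        events.append(rng.random() < 0.22)
    elif kind == 1:    # a normal draw exceeding its 78th percentile
        events.append(rng.normal() > stats.norm.ppf(0.78))
    else:              # at least one success in two tries, p chosen so the total is 0.22
        p = 1 - (1 - 0.22) ** 0.5
        events.append(rng.random() < p or rng.random() < p)

print(f"long-run frequency = {np.mean(events):.3f}")    # close to 0.22
```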
Biostatsfun @biostatsfun What are you talking about? You can check the impact of different priors. Wouldn’t describe Bayesian as “feeling” either. In terms of prior, it can be proposed by previous data or based on plausibility.
Norm Matloff 一啲都唔明 @matloff Mar 27 Looking at the "impact" is a world of difference away from checking validity. If there really is previous data to justify a prior, then the analysis becomes frequentist, as I said. If it's just based on "plausibility," we are back to feelings, aren't we?
Biostatsfun @biostatsfun Mar 27 But you can check prior sensitivity, so what's the issue?
How does justifying a prior = frequentist?
No, I wouldn't equate plausibility with feelings. On feelings, I'd trust a physician's "feelings" over my own.
Norm Matloff 一啲都唔明 @matloff Fine, but the problem is that one doctor's feelings will be different from another's. That's why it's not scientific.
As to checking sensitivity, think of the implication: If one gets essentially the same results with different priors, one is basically back to the frequentist realm. If one gets different results with different priors, then which one is correct? It's all superstition.
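A minimal sketch of the prior-sensitivity check under discussion: posterior means for a binomial proportion under three different Beta priors, at a small and a moderate sample size (data made up). With small n the priors pull the answers apart; with larger n they essentially agree.

```python
# Sketch: posterior means under different Beta priors
priors = {"flat Beta(1,1)":       (1, 1),
          "skeptical Beta(1,9)":  (1, 9),
          "optimistic Beta(9,1)": (9, 1)}

for n, successes in [(10, 4), (400, 160)]:     # observed proportion 0.40 in both cases
    print(f"n={n}, successes={successes}")
    for name, (a, b) in priors.items():
        post_mean = (a + successes) / (a + b + n)
        print(f"  {name:22s} posterior mean = {post_mean:.3f}")
```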
Biostatsfun @biostatsfun Mar 27 I don't think it's a problem at all for experts to disagree, and it's actually great to see the range of answers given the diversity of opinion. That actually reflects science.
And No… a posterior distribution has a diff interpretation than a p-value.
Norm Matloff 一啲都唔明 @matloff Mar 27 I think it's great to get the opinions of several experts and present them. But that is not what is done in the medical literature, as you know.
Re "a posterior distribution has a diff interpretation than a p-value," is this statement directed at me? If so, you need to go back and read my tweets in this thread. I never said anything like this.