A Type M error is an error of magnitude. But these are not estimates of a "true p-value"; such a thing doesn't exist. "The p < .05 framework would ask me to report these as mixed support for the hypothesis." Assuming that there is always some sort of synthesis with other evidence and prior beliefs, the reporting of the study abstract is unimportant. Given some description of the measurement apparatus, you can then give a p-value for this null.

The survey included a demographics section where age, sex, ethnicity, and ZIP code were queried, as well as the country of birth of the participant and of the participant's parents.

"For example, if null = normal(0, sd), which sd should we choose?"

Young voters showed up at never-before-seen levels in 2018, with 36% of those who were eligible participating, according to the U.S. Census.

The point of my paper with Loken is not about intent-to-treat or anything like that; rather, it's a general issue that when noise is added to a study (for example, from noncompliance), this increases standard errors and thus increases the sense in which a statistically significant estimate (or, in this case, a nearly statistically significant estimate) gives an overestimate of the magnitude of the effect size.

Which null hypothesis should we be concerned with? The p-value is a property of the data (and the assumed sampling distribution if the null hypothesis were true). Or maybe I should ask: skeptical of what? If this were presented to a guideline committee as the only relevant study for the question, it may well lead to a statement that vitamin D does not reduce risk and a recommendation against supplementation for cancer prevention.
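The claim above, that added noise makes statistically significant estimates overstate the effect, can be checked with a small simulation. The following is only a sketch under stated assumptions: the estimate is normally distributed around a true effect of 1.0, significance means |estimate| > 1.96 standard errors, and the function name `exaggeration_ratio` and the two standard errors are hypothetical choices for illustration, not taken from the paper.

```python
import random
import statistics


def exaggeration_ratio(true_effect, se, n_sims=20000, seed=0):
    """Average |estimate| among significant results, divided by the true effect.

    Assumes estimate ~ Normal(true_effect, se) and a two-sided 5% z-test.
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        # Keep only the estimates that would be declared significant.
        if abs(estimate) > 1.96 * se:
            significant.append(abs(estimate))
    return statistics.mean(significant) / true_effect


# Same true effect, two noise levels (both values are illustrative assumptions).
low_noise = exaggeration_ratio(true_effect=1.0, se=0.5)
high_noise = exaggeration_ratio(true_effect=1.0, se=2.0)

print(f"exaggeration ratio with se=0.5: {low_noise:.2f}")
print(f"exaggeration ratio with se=2.0: {high_noise:.2f}")
```

With the larger standard error, every significant estimate must exceed 3.92 in absolute value even though the true effect is 1.0, so the ratio is necessarily large; the lower-noise design still overstates the effect, but by much less.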
Therefore *even if* we were willing to use a p-value threshold in principle, we shouldn't get excited by a difference of p=0.04 vs p=0.06 because the estimate of p is just too noisy.

Say I am testing the same theoretical point 4 times with 4 different data sources/measures: Study 1 p = .001 (boom!

How do we only have an estimate of the p-value? There's no question about "what the null should look like"; if you didn't know what the distribution of the test statistic was under the null, you'd have no test! Okay…but that has nothing to do with p-values.

Medication studies generally report results on two sub-groups of recruited subjects: Intent-to-Treat and Per-Protocol.

Jeff points us to a recent example, presented in this letter from Elizabeth Hatch, Lauren Wise, and Kenneth Rothman:

I'm not so sure. In most experiments, you only know the sample data, not the population data.

Little is known about the characteristics of areas in Idaho with high suicide rates.

From the abstract: "..clear reductions were evident in the intervention arm for concussion incidence (RR=0.71, 0.48 to 1.05)". Exploring this further, they claim to be using "magnitude based inference", which, as far as I can see, is only used within sports medicine and seems to be a more permissive form of NHST; there is a commentary on the method, with some responses.

Statistical Modeling, Causal Inference, and Social Science: "P-hacking" and the intention-to-cheat effect
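The point in this thread that p=0.04 and p=0.06 should not be read as meaningfully different can be illustrated by looking at how much a p-value moves across exact replications of one study. This is a sketch under assumptions that are not in the discussion itself: a z-test whose statistic is Normal(2, 1) across replications (so a "typical" result sits near p = 0.05), with hypothetical helper names `two_sided_p` and `replicated_p_values`. It treats the p-value as a statistic computed from each dataset, which is compatible with the view that there is no "true p-value" being estimated.

```python
import math
import random
import statistics


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


def replicated_p_values(expected_z, n_reps=20000, seed=1):
    """p-values from replications whose z statistic is Normal(expected_z, 1)."""
    rng = random.Random(seed)
    return [two_sided_p(rng.gauss(expected_z, 1.0)) for _ in range(n_reps)]


# expected_z = 2.0 is an illustrative assumption: a study "right at" p ~ 0.05.
p_values = replicated_p_values(expected_z=2.0)
deciles = statistics.quantiles(p_values, n=10)  # 9 cut points: 10%, ..., 90%
p10, p50, p90 = deciles[0], deciles[4], deciles[8]

print(f"10th percentile of p: {p10:.4f}")
print(f"median p:             {p50:.4f}")
print(f"90th percentile of p: {p90:.4f}")
```

Under these assumptions the middle 80% of replicated p-values spans roughly from 0.001 to above 0.4, so a gap as small as 0.04 versus 0.06 is well within what the same underlying effect produces from one sample to the next.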