“Some equations are dangerous if you know them, and others are dangerous if you do not. The first category may pose danger because the secrets within its bounds open doors behind which lies terrible peril. The obvious winner in this is Einstein’s iconic equation $E=mc^2$, for it provides a measure of the enormous energy hidden within ordinary matter. […] Instead I am interested in equations that unleash their danger not when we know about them, but rather when we do not. Kept close at hand, these equations allow us to understand things clearly, but their absence leaves us dangerously ignorant.”

——Howard Wainer

# Moivre’s equation

$$SE=\frac{\sigma}{\sqrt{ n }}$$

where $SE$ is the standard error, $\sigma$ is the standard deviation, and $n$ is the sample size. For example, fewer students per school does not necessarily mean better education; sometimes it just means a **greater SE**.

*(figure: casual_1, from the original post)*
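To see Moivre’s equation in action, here is a minimal simulation sketch (numpy assumed; not from the original post): sample means of small groups scatter far more than those of large groups, even though every group draws from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Groups of different sizes, all drawing from the same N(0, 1) distribution:
# the means of small groups spread much more widely.
for n in (10, 100, 1000):
    means = rng.normal(0, 1, size=(10_000, n)).mean(axis=1)
    print(n, means.std())  # empirical SE, close to 1 / sqrt(n)
```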

As Taleb puts it in his book, *Fooled by Randomness*:

“Probability is not a mere computation of odds on the dice or more complicated variants; it is the acceptance of the lack of certainty in our knowledge and the development of methods for dealing with our ignorance.”

## Standard Error of Our Estimates

First, calculate the sample standard deviation:
$$\hat{\sigma}=\sqrt{ \frac{1}{N-1}\sum^N_{i=1}(x_{i}-\bar{x})^2 }$$
Then plug it into Moivre’s equation to get the standard error:
$$SE=\frac{\hat{\sigma}}{\sqrt{ n }}$$
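A minimal Python sketch of these two formulas (the helper name `standard_error` is mine, and numpy is assumed):

```python
import numpy as np

def standard_error(x):
    """Standard error of the mean: sigma_hat / sqrt(n)."""
    x = np.asarray(x)
    sigma_hat = x.std(ddof=1)  # ddof=1 gives the N-1 denominator above
    return sigma_hat / np.sqrt(len(x))
```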

## Confidence Intervals

To calculate the confidence interval, we use the central limit theorem. This theorem states that the means of experiments are approximately normally distributed. From statistical theory, we know that 95% of the mass of a normal distribution lies within 2 standard deviations of the mean (technically 1.96, but 2 is close enough).
The standard error of the mean serves as our estimate of the standard deviation of the distribution of experiment means. So, if we multiply it by 2 and add it to and subtract it from the mean of one of our experiments, we construct a 95% confidence interval for the true mean.
One point deserves emphasis: the interpretation of confidence intervals (CIs) is subject to a common misconception. In frequentist statistics, a confidence interval does not directly describe “the probability that a particular interval contains the true mean”. Rather, it describes how often intervals computed by the same statistical procedure, over repeated experiments or repeated sampling, contain the true parameter.
Concretely, a 95% confidence interval means: if you ran many independent experiments under the same conditions and computed a confidence interval each time, 95% of those intervals would contain the true population parameter (e.g., the mean). It does not mean that one particular interval has a 95% probability of containing the true value, because the true value either is in that interval or it is not.
An example makes this clearer:
Suppose we run the same experiment 100 times and compute a 95% confidence interval each time. If 95 of those intervals contain the true population mean and 5 do not, we can say the statistical procedure is reliable: 95% of the confidence intervals contain the true mean. But for a single experiment’s result, we cannot say “this interval has a 95% probability of containing the true mean”.
In short, a CI describes the long-run performance of a statistical procedure, not a probability statement about one specific experimental outcome.
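A small sketch of the construction described above, with a hypothetical helper `ci_95` (numpy assumed):

```python
import numpy as np

def ci_95(x, z=1.96):
    """95% confidence interval for the mean: mu +/- z * SE."""
    x = np.asarray(x)
    mu = x.mean()
    se = x.std(ddof=1) / np.sqrt(len(x))
    return mu - z * se, mu + z * se
```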

## Hypothesis Testing

For independent normal random variables, sums and differences are also normal, and the variances add:
$$\mathcal{N}(\mu_{1}, \sigma_{1}^2)+\mathcal N(\mu_{2}, \sigma_{2}^2)=\mathcal{N}(\mu_{1}+\mu_{2}, \sigma^2_{1}+\sigma_{2}^2)$$
$$\mathcal{N}(\mu_{1}, \sigma_{1}^2)-\mathcal N(\mu_{2}, \sigma_{2}^2)=\mathcal{N}(\mu_{1}-\mu_{2}, \sigma^2_{1}+\sigma_{2}^2)$$
The same holds for the standard error of the difference in means:
$$\begin{aligned}
&\mu_{diff} = \mu_{1} - \mu_{2}
\\
&SE_{diff} = \sqrt{ SE_{1}^2+ SE_{2}^2 }=\sqrt{ \frac{\sigma_{1}^2}{N_{1}} + \frac{\sigma_{2}^2}{N_{2}} }
\end{aligned}$$
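A quick numerical check of the variance-addition rule (a numpy sketch with made-up parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(1.0, 2.0, size=100_000)  # draws from N(1, 2^2)
b = rng.normal(0.5, 1.5, size=100_000)  # draws from N(0.5, 1.5^2)

diff = a - b
print(diff.mean())  # ~0.5:  mu_1 - mu_2
print(diff.var())   # ~6.25: sigma_1^2 + sigma_2^2, the variances add
```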

## z-statistic

$$\begin{aligned} z &=\frac{{\mu_{diff}-H_{0}}}{SE}\\\\ &=\frac{{\mu_{1}-\mu_{2}-H_{0}}}{\sqrt{ \frac{\sigma_{1}^2}{N_{1}} + \frac{\sigma_{2}^2}{N_{2}} }} \end{aligned} $$

The z statistic is a measure of how extreme the observed difference is. We test our hypothesis, that the difference in the means is statistically different from zero, by contradiction: we assume the opposite, namely that the difference is zero. This assumption is called the null hypothesis, or $H_{0}$.
Under $H_{0}$, the z statistic follows a standard normal distribution. So, if the difference is indeed zero, we would see the z statistic within 2 standard deviations of the mean 95% of the time. The direct consequence is that if z falls above or below 2 standard deviations, we can reject the null hypothesis with 95% confidence.
To be precise, $H_{0}$ here stands for the hypothesized value of $\mu_{1}-\mu_{2}$, which under the null is 0.
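Putting the pieces together, a sketch of the z statistic for two samples (the helper name `z_stat` is mine; numpy assumed):

```python
import numpy as np

def z_stat(x1, x2, h0=0.0):
    """z statistic for the difference in means of two independent samples."""
    x1, x2 = np.asarray(x1), np.asarray(x2)
    mu_diff = x1.mean() - x2.mean()
    se_diff = np.sqrt(x1.var(ddof=1) / len(x1) + x2.var(ddof=1) / len(x2))
    return (mu_diff - h0) / se_diff
```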
In hypothesis testing, the z statistic is typically used to test differences between sample means, in particular when certain conditions hold, such as a large sample size (usually n > 30) or a known population standard deviation. Common applications of the z test:

  1. One-sample z test: tests whether a sample mean differs significantly from a known population mean (see the sketch after this list).
    • For example, testing whether the average weight of a factory’s products equals the nominal value.
  2. Two-sample z test: tests whether the means of two independent samples differ significantly.
    • For example, comparing whether the average product weights of two different factories are the same.
  3. Conditions for the z test: it usually requires a known population standard deviation, or a sample large enough that the sample standard deviation is a good estimate of the population standard deviation. This condition can also be understood via the central limit theorem: with large samples, the distribution of the mean approaches a normal distribution.
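As referenced in item 1, a sketch of a one-sample z test (the helper name `one_sample_z` is mine; numpy and scipy assumed):

```python
import numpy as np
from scipy import stats

def one_sample_z(x, mu0, sigma):
    """One-sample z test: is the sample mean consistent with a known mu0?"""
    x = np.asarray(x)
    z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
    p = 2 * stats.norm.sf(abs(z))  # two-sided p-value
    return z, p
```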

## P-value

The p-value is the probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is correct.
It measures how unlikely it is that you would see such a measurement if the null hypothesis were true. Naturally, this often gets confused with the probability of the null hypothesis being true. Note the difference: the p-value is NOT $P(H_{0}|data)$, but rather $P(data|H_{0})$.
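For instance, turning an observed z statistic into a p-value (a scipy sketch; the number is illustrative):

```python
from scipy import stats

z_obs = 2.5                     # an illustrative observed z statistic
p_value = stats.norm.sf(z_obs)  # P(Z >= z_obs | H0), one-sided
print(round(p_value, 4))        # 0.0062
```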
*(figure: casual_2, from the original post)*
The above is a one-sided test; there is also the two-sided test, which asks whether the parameter deviates from some value in either direction, whether larger or smaller. A two-sided test reaches significance less easily than a one-sided one, because it must account for deviations on both sides, while a one-sided test considers only one. In other words, a one-sided test builds in a premise, that we already know the deviation lies on the left (or the right), and it is this extra knowledge that shrinks the sample space and makes significance easier to reach.
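A sketch of the difference (scipy assumed): for the same z, the two-sided p-value is twice the one-sided one.

```python
from scipy import stats

z = 2.0
p_one_sided = stats.norm.sf(z)           # P(Z >= z): deviation direction known
p_two_sided = 2 * stats.norm.sf(abs(z))  # P(|Z| >= |z|): either direction
print(p_one_sided, p_two_sided)          # ~0.0228 vs ~0.0455
```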

## Why Two-Sided Tests Are Necessary

a. When the direction of the deviation cannot be predicted

  • If there is no solid theoretical basis or prior knowledge for predicting the direction of the effect, the two-sided test is the only reasonable choice. In that case, a one-sided test may miss a potential effect in the other direction.

b. Safety and accuracy considerations

  • In medicine, engineering, and other critical fields, wrongly choosing a one-sided test can lead to overconfidence or misjudgment. In many experiments we cannot be sure of the effect’s direction, and if the assumed direction of a one-sided test is wrong, the consequences can be serious. For example, when testing a drug, assuming only that it improves outcomes while ignoring side effects may lead to those side effects being overlooked.

c. Consistency in multiple hypothesis testing

  • In complex multiple-testing scenarios, two-sided tests keep the analysis consistent. For example, in gene expression analysis, if we do not know in advance whether a gene is up- or down-regulated, a two-sided test ensures both directions are properly tested.

Thanks for reading! This is my learning note on the blog of Matheus Facure Alves.
Cover image icon by Dewi Sari from Flaticon