MAS291 - Statistics Formulas
1. Distributions
Type 1: Uniform Distribution
1. Discrete
Probability Mass Function (PMF):
\[ P(X = x) = \frac{1}{n}, \quad x \in \{a, a+1, \dots, b\}, \quad n = b - a + 1 \]
Mean:
\[ \text{Mean} = E(X) = \mu = \frac{a+b}{2} \]
Variance:
\[ \text{Variance} = V(X) = \sigma^2 = \frac{(b - a + 1)^2 - 1}{12} \]
2. Continuous
Probability Density Function (PDF):
\[ f(x) = \frac{1}{b-a} \quad \text{for } a \le x \le b \]
Mean:
\[ \text{Mean} = E(X) = \mu = \frac{a+b}{2} \]
Variance:
\[ \text{Variance} = V(X) = \sigma^2 = \frac{(b - a)^2}{12} \]
Type 2: Exponential Distribution (Continuous)
Key relation:
\[ \mu = \sigma = \frac{1}{\lambda} \]
Probability Density Function (PDF):
\[ f(x) = \lambda e^{-\lambda x}, \quad x \ge 0 \]
Cumulative Distribution Function (CDF):
\[ F(x) = P(X \le x) = 1 - e^{-\lambda x}, \quad x \ge 0 \]
Type 3: Binomial, Poisson, Normal Distribution
1. Binomial (Discrete)
Mean & Std Dev:
\[ \mu = np \]
\[ \sigma = \sqrt{np(1-p)} \]
2. Poisson (Discrete)
Mean & Std Dev:
\[ \mu = \lambda, \quad \sigma = \sqrt{\lambda} \quad (\text{that is, } \sigma^2 = \lambda) \]
3. Normal (Continuous)
Parameters:
\(\mu, \sigma\) are usually given in the problem.
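As a quick numerical check of the distribution formulas in this section, here is a minimal Python sketch. The function names and the example parameter values (a fair die, \(n = 10\), \(p = 0.3\), \(\lambda = 4\)) are illustrative assumptions, not part of the course material.

```python
import math

def discrete_uniform_stats(a, b):
    """Mean and variance of a discrete uniform on the integers a..b."""
    n = b - a + 1                      # number of equally likely values
    return (a + b) / 2, (n**2 - 1) / 12

def continuous_uniform_stats(a, b):
    """Mean and variance of a continuous uniform on [a, b]."""
    return (a + b) / 2, (b - a) ** 2 / 12

def exponential_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def binomial_stats(n, p):
    """Mean and standard deviation of Binomial(n, p)."""
    return n * p, math.sqrt(n * p * (1 - p))

def poisson_stats(lam):
    """Mean and standard deviation of Poisson(lam)."""
    return lam, math.sqrt(lam)

# Brute-force check of the discrete uniform formulas on a fair die (1..6).
die = range(1, 7)
die_mean = sum(die) / 6
die_var = sum((v - die_mean) ** 2 for v in die) / 6

print(discrete_uniform_stats(1, 6), (die_mean, die_var))
print(binomial_stats(10, 0.3))
print(poisson_stats(4))        # mean 4, standard deviation 2.0
print(exponential_cdf(1.0, 2.0))
```

The brute-force mean and variance of the die should agree with the closed-form values, which is a convenient way to catch a misremembered formula.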
2. Hypothesis Tests for Two Independent Samples (Means)
2.1 If the problem gives \(\sigma_1, \sigma_2\) (Known Variances) - Case 1
Confidence Interval for \(\mu_1 - \mu_2\):
\[ (\bar{x}_1 - \bar{x}_2) - E \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + E \]
Margin of Error:
\[ E = Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
Test Statistic for \(H_0: \mu_1 - \mu_2 = \Delta_0\):
\[ Z_0 = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
2.2 If the problem says "suppose that \(\sigma_1 = \sigma_2\)" (Unknown but Equal Variances) - Case 2
Pooled Variance Estimate:
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
Degrees of Freedom:
\[ df = n_1 + n_2 - 2 \]
Confidence Interval for \(\mu_1 - \mu_2\):
\[ (\bar{x}_1 - \bar{x}_2) - E \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + E \]
Margin of Error:
\[ E = t_{\alpha/2, df} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
Test Statistic for \(H_0: \mu_1 - \mu_2 = \Delta_0\):
\[ t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
2.3 If the problem gives no information about the variances (Unknown and Unequal Variances) - Case 3
Degrees of Freedom (Welch-Satterthwaite; if the result is not an integer, round down before using a t table):
\[ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}} \]
Confidence Interval for \(\mu_1 - \mu_2\):
\[ (\bar{x}_1 - \bar{x}_2) - E \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + E \]
Margin of Error:
\[ E = t_{\alpha/2, df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
Test Statistic for \(H_0: \mu_1 - \mu_2 = \Delta_0\):
\[ t_0^* = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
3. Hypothesis Tests for Proportions
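Before moving on to proportions, the three two-sample cases for means from Section 2 can be sketched in Python as a numerical check. This is a minimal sketch under stated assumptions: the function names and the summary statistics in the example are made up for illustration, and the t critical value is deliberately left to a table, since the standard library offers only the normal quantile (`statistics.NormalDist`).

```python
import math
from statistics import NormalDist

def z_known_variances(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0, alpha=0.05):
    """Case 1 (known sigmas): returns (Z0, margin of error E)."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z0 = (xbar1 - xbar2 - delta0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    return z0, z_crit * se

def pooled_t(xbar1, xbar2, s1, s2, n1, n2, delta0=0.0):
    """Case 2 (unknown but equal variances): returns (t0, df)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t0 = (xbar1 - xbar2 - delta0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t0, df

def welch_t(xbar1, xbar2, s1, s2, n1, n2, delta0=0.0):
    """Case 3 (unknown, unequal variances): returns (t0*, unrounded df)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t0 = (xbar1 - xbar2 - delta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t0, df

# Hypothetical summary data: equal spreads and equal sample sizes.
print(z_known_variances(10, 8, 2, 2, 8, 8))
print(pooled_t(10, 8, 2, 2, 8, 8))
print(welch_t(10, 8, 2, 2, 8, 8))
```

With equal sample standard deviations and equal sample sizes, Case 2 and Case 3 give the same statistic and the same degrees of freedom; they diverge once the spreads or sizes differ.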
Type 1: One Sample
Sample Proportion:
\[ \hat{p} = \frac{x}{n} \]
Confidence Interval for \(p\):
\[ \hat{p} - E \le p \le \hat{p} + E \]
Margin of Error (CI):
\[ E = Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Test Statistic for \(H_0: p = p_0\):
\[ Z_0 = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]
Type 2: Two Independent Samples
Sample Proportions:
\[ \hat{p}_1 = \frac{x_1}{n_1}, \quad \hat{p}_2 = \frac{x_2}{n_2} \]
Confidence Interval for \(p_1 - p_2\):
\[ (\hat{p}_1 - \hat{p}_2) - E \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + E \]
Margin of Error (CI):
\[ E = Z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
Pooled Proportion (for Test Statistic under \(H_0: p_1 = p_2\)):
\[ \bar{p} = \frac{x_1 + x_2}{n_1 + n_2} \]
Test Statistic for \(H_0: p_1 - p_2 = 0\) (or \(H_0: p_1 = p_2\)):
\[ Z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
4. Simple Linear Regression
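Before the regression formulas, the proportion statistics of Section 3 can be checked numerically. The sketch below is a minimal illustration; the function names and the counts (60 of 100, 40 of 100) are assumptions made up for the example. Note that the confidence interval uses the separate sample proportions in its standard error, while the two-sample test uses the pooled proportion \(\bar{p}\).

```python
import math
from statistics import NormalDist

def one_sample_proportion(x, n, p0, alpha=0.05):
    """Returns (p_hat, Z0 for H0: p = p0, margin of error E for the CI)."""
    p_hat = x / n
    z0 = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)      # SE under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # Z_{alpha/2}
    e = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)       # SE from the sample
    return p_hat, z0, e

def two_sample_proportions(x1, n1, x2, n2, alpha=0.05):
    """Returns (p1_hat - p2_hat, Z0 for H0: p1 = p2, margin of error E)."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)                         # pooled proportion
    z0 = (p1 - p2) / math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    e = z_crit * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2, z0, e

# Hypothetical counts, for illustration only.
print(one_sample_proportion(60, 100, 0.5))
print(two_sample_proportions(60, 100, 40, 100))
```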
4.1 Sums of Squares
Sum of Squares for X:
\[ S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n} \]
Sum of Cross-products:
\[ S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} \]
Total Sum of Squares (Y):
\[ SST = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n} \]
Regression Sum of Squares:
\[ SSR = \sum (\hat{y}_i - \bar{y})^2 = \hat{\beta}_1 S_{xy} \]
Error (Residual) Sum of Squares:
\[ SSE = \sum (y_i - \hat{y}_i)^2 = SST - SSR \]
Estimate of Error Variance (Mean Squared Error):
\[ s^2 = \frac{SSE}{n-2} \]
4.2 Regression Coefficients
Slope Estimate:
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \]
Intercept Estimate:
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
Correlation Coefficient (R):
\[ R = \frac{S_{xy}}{\sqrt{S_{xx} \, SST}} \]
(Note: Coefficient of Determination \(R^2 = \frac{SSR}{SST} = \hat{\beta}_1 \frac{S_{xy}}{SST}\))
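The sums of squares and coefficient formulas above can be traced on a small data set. The following is a minimal Python sketch using the shortcut (computational) forms; the function name `fit_simple_regression` and the five data points are made up for illustration.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x using the shortcut formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sum_x**2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    sst = sum(v * v for v in y) - sum_y**2 / n
    b1 = sxy / sxx                        # slope estimate
    b0 = sum_y / n - b1 * sum_x / n       # intercept estimate
    ssr = b1 * sxy                        # regression sum of squares
    sse = sst - ssr                       # residual sum of squares
    return {
        "Sxx": sxx, "Sxy": sxy, "SST": sst, "SSR": ssr, "SSE": sse,
        "b0": b0, "b1": b1,
        "s2": sse / (n - 2),              # estimate of the error variance
        "R": sxy / math.sqrt(sxx * sst),  # correlation coefficient
        "R2": ssr / sst,                  # coefficient of determination
    }

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = fit_simple_regression(x, y)
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
```

For this data set \(S_{xx} = 10\), \(S_{xy} = 6\), and \(SST = 6\), so the slope is 0.6, the intercept is 2.2, and \(R^2 = 0.6\), which equals the square of \(R\).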
4.3 Hypothesis Testing
Test Statistic for \(H_0: \beta_1 = \beta_{1,0}\) (often \(H_0: \beta_1 = 0\)):
\[ t_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)}, \quad \text{where } se(\hat{\beta}_1) = \sqrt{\frac{s^2}{S_{xx}}} \]
(Degrees of freedom: \(df = n-2\))
Test Statistic for \(H_0: \beta_0 = \beta_{0,0}\):
\[ t_0 = \frac{\hat{\beta}_0 - \beta_{0,0}}{se(\hat{\beta}_0)}, \quad \text{where } se(\hat{\beta}_0) = \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)} \]
(Degrees of freedom: \(df = n-2\))
Test Statistic for \(H_0: \rho = 0\) (Correlation = 0, equivalent to testing \(H_0: \beta_1 = 0\)):
\[ t_0 = \frac{R \sqrt{n-2}}{\sqrt{1 - R^2}} \]
(Degrees of freedom: \(df = n-2\))
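The three test statistics above can be sketched in Python as follows. The summary values are assumptions taken from a small made-up data set (x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5), and the function names are hypothetical. The sketch also illustrates the stated equivalence: the statistic for \(H_0: \rho = 0\) matches the statistic for \(H_0: \beta_1 = 0\).

```python
import math

def slope_t(b1, s2, sxx, beta10=0.0):
    """t0 for H0: beta_1 = beta10, with se(b1) = sqrt(s^2 / Sxx)."""
    return (b1 - beta10) / math.sqrt(s2 / sxx)

def intercept_t(b0, s2, n, xbar, sxx, beta00=0.0):
    """t0 for H0: beta_0 = beta00, with se(b0) = sqrt(s^2 (1/n + xbar^2 / Sxx))."""
    return (b0 - beta00) / math.sqrt(s2 * (1 / n + xbar**2 / sxx))

def correlation_t(r, n):
    """t0 for H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Summary values for the hypothetical data set described above.
n, xbar, sxx = 5, 3.0, 10.0
b0, b1, s2 = 2.2, 0.6, 0.8
r = 6 / math.sqrt(10 * 6)        # Sxy / sqrt(Sxx * SST)

t_slope = slope_t(b1, s2, sxx)
t_intercept = intercept_t(b0, s2, n, xbar, sxx)
t_corr = correlation_t(r, n)

print(f"slope: t0 = {t_slope:.4f}, df = {n - 2}")
print(f"intercept: t0 = {t_intercept:.4f}, df = {n - 2}")
print(f"correlation: t0 = {t_corr:.4f}, df = {n - 2}")
```

Each statistic is compared against a t critical value with \(n - 2\) degrees of freedom, read from a table.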