MAS291 - Statistics Formulas

1. Distributions

Type 1: Uniform Distribution

1. Discrete

Probability Mass Function (PMF):

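\[ P(X = x) = \frac{1}{n} \quad \text{for each of the } n \text{ possible values } x \]

For the consecutive integers \(a, a+1, \ldots, b\): \(n = b - a + 1\), and the mean and variance below apply.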

Mean:

\[ \text{Mean} = E(X) = \mu = \frac{a+b}{2} \]

Variance:

\[ \text{Variance} = V(X) = \sigma^2 = \frac{(b - a + 1)^2 - 1}{12} \]
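A quick sanity check of the discrete uniform formulas, as a minimal Python sketch. It assumes the support is the consecutive integers \(a, \ldots, b\) and compares the closed forms with moments computed directly from the PMF, using exact fractions so no rounding is involved. The function names are illustrative only.

```python
from fractions import Fraction


def discrete_uniform_moments(a, b):
    """Closed-form mean and variance for X uniform on the integers a, a+1, ..., b."""
    mean = Fraction(a + b, 2)
    variance = Fraction((b - a + 1) ** 2 - 1, 12)
    return mean, variance


def moments_from_pmf(a, b):
    """Mean and variance computed directly from P(X = x) = 1/n, with n = b - a + 1."""
    values = range(a, b + 1)
    n = len(values)
    mean = Fraction(sum(values), n)
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance


# A fair six-sided die: a = 1, b = 6.
print(discrete_uniform_moments(1, 6))  # (Fraction(7, 2), Fraction(35, 12))
print(moments_from_pmf(1, 6))
```

Both functions return the same pair, which confirms that \((b-a+1)^2 - 1\) over 12 is the variance for consecutive integers.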

2. Continuous

Probability Density Function (PDF):

\[ f(x) = \frac{1}{b-a} \quad \text{for } a \le x \le b \]

Mean:

\[ \text{Mean} = E(X) = \mu = \frac{a+b}{2} \]

Variance:

\[ \text{Variance} = V(X) = \sigma^2 = \frac{(b - a)^2}{12} \]
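Integrating the constant density \(1/(b-a)\) over an interval gives a probability proportional to the interval's length. A minimal Python sketch of that, plus the mean and variance above; the function names and the example interval are hypothetical.

```python
def continuous_uniform_moments(a, b):
    """Mean and variance of X ~ Uniform(a, b)."""
    if not a < b:
        raise ValueError("need a < b")
    return (a + b) / 2, (b - a) ** 2 / 12


def continuous_uniform_prob(a, b, c, d):
    """P(c <= X <= d) for X ~ Uniform(a, b).

    Integrating f(x) = 1/(b - a) over [c, d] gives the length of the part of
    [c, d] that lies inside [a, b], divided by (b - a).
    """
    if not a < b:
        raise ValueError("need a < b")
    overlap = min(b, d) - max(a, c)
    return max(overlap, 0) / (b - a)


print(continuous_uniform_moments(0, 10))     # mean 5.0, variance 100/12
print(continuous_uniform_prob(0, 10, 2, 5))  # 0.3
```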

Type 2: Exponential Distribution (Continuous)

Key relation to remember:

\[ \mu = \sigma = \frac{1}{\lambda} \]

Probability Density Function (PDF):

\[ f(x) = \lambda e^{-\lambda x}, \quad x \ge 0 \]

Cumulative Distribution Function (CDF):

\[ F(x) = P(X \le x) = 1 - e^{-\lambda x}, \quad x \ge 0 \]
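Most exponential problems reduce to evaluating the CDF: \(P(X \le x) = F(x)\), \(P(X > x) = 1 - F(x) = e^{-\lambda x}\), and \(P(a < X \le b) = F(b) - F(a)\). A minimal Python sketch; the rate \(\lambda = 0.5\) and the function names are illustrative assumptions.

```python
import math


def exponential_cdf(x, lam):
    """F(x) = P(X <= x) = 1 - exp(-lam * x) for x >= 0 (and 0 for x < 0)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


def exponential_interval(lam, lo, hi):
    """P(lo < X <= hi) = F(hi) - F(lo)."""
    return exponential_cdf(hi, lam) - exponential_cdf(lo, lam)


lam = 0.5                                # rate; mean = standard deviation = 1 / lam = 2
print(exponential_cdf(2, lam))           # P(X <= mean) = 1 - e^(-1)
print(1 - exponential_cdf(3, lam))       # P(X > 3) = e^(-1.5)
print(exponential_interval(lam, 1, 3))   # P(1 < X <= 3)
```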

Type 3: Binomial, Poisson, and Normal Distributions

1. Binomial (Discrete)

Mean and standard deviation:

\[ \mu = np \]
\[ \sigma = \sqrt{np(1-p)} \]
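The shortcut formulas can be checked against the standard binomial PMF \(\binom{n}{x} p^x (1-p)^{n-x}\), which this sheet does not list. A minimal Python sketch with hypothetical function names and example values \(n = 10\), \(p = 0.3\):

```python
import math


def binomial_pmf(x, n, p):
    """Standard binomial PMF: C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_mean_sd(n, p):
    """Shortcut formulas: mu = np, sigma = sqrt(np(1 - p))."""
    return n * p, math.sqrt(n * p * (1 - p))


def mean_sd_from_pmf(n, p):
    """The same two quantities computed by summing over the PMF."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, math.sqrt(var)


print(binomial_mean_sd(10, 0.3))   # mu = 3, sigma = sqrt(2.1)
print(mean_sd_from_pmf(10, 0.3))   # agrees up to floating-point rounding
```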

2. Poisson (Discrete)

Mean, variance, and standard deviation:

\[ \mu = \sigma^2 = \lambda, \quad \sigma = \sqrt{\lambda} \]
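For a Poisson variable it is the variance, not the standard deviation, that equals \(\lambda\), so \(\sigma = \sqrt{\lambda}\). A minimal Python sketch that confirms this by summing over the standard Poisson PMF \(e^{-\lambda}\lambda^x / x!\) (not listed on this sheet). The function name is hypothetical, and truncating the infinite sum after 200 terms is an assumption that suits moderate \(\lambda\).

```python
import math


def poisson_mean_var(lam, terms=200):
    """Mean and variance of a Poisson(lam) variable, summed directly over its PMF
    P(X = x) = e^(-lam) lam^x / x!  (truncated after `terms` values of x)."""
    p = math.exp(-lam)          # P(X = 0)
    mean = second_moment = 0.0
    for x in range(terms):
        mean += x * p
        second_moment += x * x * p
        p *= lam / (x + 1)      # P(X = x + 1) from P(X = x), avoiding huge factorials
    return mean, second_moment - mean**2


mean, var = poisson_mean_var(4.0)
print(mean, var, math.sqrt(var))   # about 4, 4, and 2
```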

3. Normal (Continuous)

Parameters:

\(\mu\) and \(\sigma\) are usually given in the problem.
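Once \(\mu\) and \(\sigma\) are known, normal probabilities are usually found by standardizing, \(Z = (X - \mu)/\sigma\), and reading the standard normal CDF. A minimal Python sketch using the standard library's `statistics.NormalDist`; the values \(\mu = 100\), \(\sigma = 15\) and the helper name are hypothetical.

```python
from statistics import NormalDist


def normal_prob_between(mu, sigma, lo, hi):
    """P(lo <= X <= hi) for X ~ N(mu, sigma^2), by standardizing Z = (X - mu) / sigma."""
    z = NormalDist()                      # standard normal N(0, 1)
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    return z.cdf(z_hi) - z.cdf(z_lo)


mu, sigma = 100, 15                              # example values, as a problem would give them
print(normal_prob_between(mu, sigma, 85, 115))   # about 0.6827 (within one sigma of the mean)
print(NormalDist(mu, sigma).inv_cdf(0.95))       # the 95th percentile of X
```

`NormalDist(mu, sigma).cdf` gives the same answer without standardizing by hand; the explicit version mirrors the table-lookup method.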

2. Statistical Hypothesis Testing for Two Independent Samples (Means)

2.1 If the problem gives \(\sigma_1, \sigma_2\) (Known Variances) - Case 1

Confidence Interval for \(\mu_1 - \mu_2\):

\[ (\bar{x}_1 - \bar{x}_2) - E \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + E \]

Margin of Error:

\[ E = Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

Test Statistic for \(H_0: \mu_1 - \mu_2 = \Delta_0\):

\[ Z_0 = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
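A minimal Python sketch of Case 1, computing the confidence interval, \(Z_0\), and a reject/fail-to-reject decision. The two-sided alternative \(H_1: \mu_1 - \mu_2 \ne \Delta_0\), the function name, and the sample numbers are all assumptions for illustration; \(Z_{\alpha/2}\) comes from `statistics.NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist


def two_sample_z(x1bar, x2bar, sigma1, sigma2, n1, n2, alpha=0.05, delta0=0.0):
    """Case 1 (known sigma1, sigma2): CI for mu1 - mu2 and Z0 for H0: mu1 - mu2 = delta0."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    diff = x1bar - x2bar
    margin = z_crit * se                            # E
    z0 = (diff - delta0) / se
    reject = abs(z0) > z_crit                       # two-sided alternative assumed
    return (diff - margin, diff + margin), z0, reject


ci, z0, reject = two_sample_z(12, 10, 3, 4, 25, 25)
print(ci, z0, reject)   # hypothetical data: se = 1, so Z0 = 2 and H0 is rejected at alpha = 0.05
```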

2.2 If the problem says "suppose that \(\sigma_1 = \sigma_2\)" (Unknown but Equal Variances) - Case 2

Pooled Variance Estimate:

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

Degrees of Freedom:

\[ df = n_1 + n_2 - 2 \]

Confidence Interval for \(\mu_1 - \mu_2\):

\[ (\bar{x}_1 - \bar{x}_2) - E \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + E \]

Margin of Error:

\[ E = t_{\alpha/2, df} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]

Test Statistic for \(H_0: \mu_1 - \mu_2 = \Delta_0\):

\[ t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
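A minimal Python sketch of Case 2. Python's standard library has no t-distribution quantile function, so the critical value \(t_{\alpha/2, df}\) is passed in as a parameter, read from a t table (a library such as SciPy could supply it instead). The function name and sample numbers are hypothetical.

```python
import math


def pooled_t(x1bar, x2bar, s1, s2, n1, n2, t_crit, delta0=0.0):
    """Case 2 (sigma1 = sigma2, unknown): pooled-variance CI for mu1 - mu2 and t0.

    t_crit is t_{alpha/2, df} with df = n1 + n2 - 2, read from a t table.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance s_p^2
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = x1bar - x2bar
    margin = t_crit * se                                # E
    t0 = (diff - delta0) / se
    return df, sp2, (diff - margin, diff + margin), t0


# Hypothetical data; t_{0.025, 18} is about 2.101 from a t table.
df, sp2, ci, t0 = pooled_t(5, 3, 2, 4, 10, 10, t_crit=2.101)
print(df, sp2, ci, t0)   # df = 18, s_p^2 = 10, t0 = sqrt(2)
```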

2.3 If the problem gives no information about the variances (Unknown and Unequal Variances) - Case 3

Degrees of Freedom (Welch-Satterthwaite; if df is not an integer, round down):

\[ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}} \]

Confidence Interval for \(\mu_1 - \mu_2\):

\[ (\bar{x}_1 - \bar{x}_2) - E \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + E \]

Margin of Error:

\[ E = t_{\alpha/2, df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

Test Statistic for \(H_0: \mu_1 - \mu_2 = \Delta_0\):

\[ t_0^* = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
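A minimal Python sketch of Case 3, computing the Welch-Satterthwaite degrees of freedom and \(t_0^*\). Rounding df down to an integer before using a t table is a common textbook convention, assumed here; the function name and numbers are hypothetical (the same sample summaries as a pooled example, to show that df drops from 18 to about 13 when the variances are not pooled).

```python
import math


def welch_t(x1bar, x2bar, s1, s2, n1, n2, delta0=0.0):
    """Case 3 (unknown, unequal variances): Welch-Satterthwaite df and t0*."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    t0 = (x1bar - x2bar - delta0) / math.sqrt(v1 + v2)
    # Rounding df down for a t-table lookup is a common convention (assumed here).
    return df, math.floor(df), t0


df, df_table, t0 = welch_t(5, 3, 2, 4, 10, 10)
print(df, df_table, t0)   # df is about 13.24, so the t-table row is 13; t0 = sqrt(2)
```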

3. Statistical Hypothesis Testing for Proportions

Type 1: One Sample

Sample Proportion:

\[ \hat{p} = \frac{x}{n} \]

Confidence Interval for \(p\):

\[ \hat{p} - E \le p \le \hat{p} + E \]

Margin of Error (CI):

\[ E = Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Test Statistic for \(H_0: p = p_0\):

\[ Z_0 = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]
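Note that the confidence interval uses \(\hat{p}\) in its standard error while the test statistic uses \(p_0\). A minimal Python sketch making that difference explicit; the function name and the data (60 successes in 100 trials, \(H_0: p = 0.5\)) are hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion(x, n, p0, alpha=0.05):
    """CI for p (standard error from p_hat) and Z0 for H0: p = p0 (standard error from p0)."""
    p_hat = x / n
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)            # Z_{alpha/2}
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)    # E uses p_hat
    z0 = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)        # Z0 uses p0
    return p_hat, (p_hat - margin, p_hat + margin), z0


p_hat, ci, z0 = one_proportion(60, 100, 0.5)
print(p_hat, ci, z0)   # p_hat = 0.6, CI roughly (0.504, 0.696), Z0 = 2
```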

Type 2: Two Independent Samples

Sample Proportions:

\[ \hat{p}_1 = \frac{x_1}{n_1}, \quad \hat{p}_2 = \frac{x_2}{n_2} \]

Confidence Interval for \(p_1 - p_2\):

\[ (\hat{p}_1 - \hat{p}_2) - E \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + E \]

Margin of Error (CI):

\[ E = Z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]

Pooled Proportion (for Test Statistic under \(H_0: p_1 = p_2\)):

\[ \bar{p} = \frac{x_1 + x_2}{n_1 + n_2} \]

Test Statistic for \(H_0: p_1 - p_2 = 0\) (or \(H_0: p_1 = p_2\)):

\[ Z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
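A minimal Python sketch of the two-sample case: the confidence interval uses the separate proportions (unpooled), while the test of \(H_0: p_1 = p_2\) uses the pooled \(\bar{p}\). The function name and the data (60/100 versus 40/100) are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportions(x1, n1, x2, n2, alpha=0.05):
    """CI for p1 - p2 (unpooled) and Z0 for H0: p1 = p2 (pooled p_bar)."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    p_bar = (x1 + x2) / (n1 + n2)                      # pooled proportion
    z0 = (p1 - p2) / math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    diff = p1 - p2
    return (diff - margin, diff + margin), p_bar, z0


ci, p_bar, z0 = two_proportions(60, 100, 40, 100)
print(ci, p_bar, z0)   # p_bar = 0.5, Z0 = 2*sqrt(2), about 2.83
```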

4. Simple Linear Regression

4.1 Sums of Squares

Sum of Squares for X:

\[ S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n} \]

Sum of Cross-products:

\[ S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} \]

Total Sum of Squares (Y):

\[ SST = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n} \]

Regression Sum of Squares:

\[ SSR = \sum (\hat{y}_i - \bar{y})^2 = \hat{\beta}_1 S_{xy} \]

Error (Residual) Sum of Squares:

\[ SSE = \sum (y_i - \hat{y}_i)^2 = SST - SSR \]

Estimate of Error Variance (Mean Squared Error):

\[ s^2 = \frac{SSE}{n-2} \]
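The computational ("shortcut") forms above need only \(\sum x_i\), \(\sum y_i\), \(\sum x_i^2\), \(\sum y_i^2\), and \(\sum x_i y_i\). A minimal Python sketch on a small hypothetical data set; the function name is illustrative, and SSR uses \(\hat{\beta}_1 = S_{xy}/S_{xx}\) from the coefficient formulas.

```python
def sums_of_squares(xs, ys):
    """Shortcut forms of Sxx, Sxy, SST, SSR, SSE, and s^2 = SSE / (n - 2)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need paired data with n >= 3")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sx**2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    sst = sum(y * y for y in ys) - sy**2 / n
    beta1 = sxy / sxx                  # slope estimate
    ssr = beta1 * sxy
    sse = sst - ssr
    return {"Sxx": sxx, "Sxy": sxy, "SST": sst, "SSR": ssr, "SSE": sse, "s2": sse / (n - 2)}


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(sums_of_squares(xs, ys))   # Sxx = 10, Sxy = 6, SST = 6, SSR = 3.6, SSE = 2.4, s^2 = 0.8
```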

4.2 Regression Coefficients

Slope Estimate:

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \]

Intercept Estimate:

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

Correlation Coefficient (R):

\[ R = \frac{S_{xy}}{\sqrt{S_{xx} SST}} \]

(Note: Coefficient of Determination \(R^2 = \frac{SSR}{SST} = \hat{\beta}_1 \frac{S_{xy}}{SST}\))
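A minimal Python sketch of the coefficient formulas, this time using the definitional (mean-centered) forms of \(S_{xx}\), \(S_{xy}\), and SST. The function name and data are hypothetical; the check \(R^2 = (S_{xy}/\sqrt{S_{xx}\,SST})^2 = \hat{\beta}_1 S_{xy}/SST\) shows that the two expressions agree.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept, correlation R, and R^2 = SSR / SST."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)                      # definitional forms
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    r = sxy / math.sqrt(sxx * sst)
    r2 = beta1 * sxy / sst                                       # SSR / SST
    return beta0, beta1, r, r2


beta0, beta1, r, r2 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(beta0, beta1, r, r2)   # y_hat = 2.2 + 0.6 x, R is about 0.775, R^2 = 0.6
```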

4.3 Hypothesis Testing

Test Statistic for \(H_0: \beta_1 = \beta_{1,0}\) (often \(H_0: \beta_1 = 0\)):

\[ t_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)}, \quad \text{where } se(\hat{\beta}_1) = \sqrt{\frac{s^2}{S_{xx}}} \]

(Degrees of freedom: \(df = n-2\))

Test Statistic for \(H_0: \beta_0 = \beta_{0,0}\):

\[ t_0 = \frac{\hat{\beta}_0 - \beta_{0,0}}{se(\hat{\beta}_0)}, \quad \text{where } se(\hat{\beta}_0) = \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)} \]

(Degrees of freedom: \(df = n-2\))

Test Statistic for \(H_0: \rho = 0\) (Correlation = 0, equivalent to testing \(H_0: \beta_1 = 0\)):

\[ t_0 = \frac{R \sqrt{n-2}}{\sqrt{1 - R^2}} \]

(Degrees of freedom: \(df = n-2\))
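A minimal Python sketch of the three regression t statistics, each to be compared with \(t_{\alpha/2, n-2}\) from a t table. The function name and data are hypothetical. With \(\beta_{1,0} = 0\), the slope statistic and the correlation statistic come out equal, which is the equivalence stated above.

```python
import math


def regression_t_tests(xs, ys, beta1_0=0.0, beta0_0=0.0):
    """t statistics (df = n - 2) for H0: beta1 = beta1_0, H0: beta0 = beta0_0, and H0: rho = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    s2 = (sst - beta1 * sxy) / (n - 2)                       # s^2 = SSE / (n - 2)
    se_beta1 = math.sqrt(s2 / sxx)
    se_beta0 = math.sqrt(s2 * (1 / n + x_bar**2 / sxx))
    r = sxy / math.sqrt(sxx * sst)
    return {
        "df": n - 2,
        "t_beta1": (beta1 - beta1_0) / se_beta1,
        "t_beta0": (beta0 - beta0_0) / se_beta0,
        "t_rho": r * math.sqrt(n - 2) / math.sqrt(1 - r**2),
    }


res = regression_t_tests([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(res)   # df = 3; t_beta1 and t_rho agree (about 2.121) when beta1_0 = 0
```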