Suppose:
\(
\begin{array}{rcl}
X &=& \left\{ x_1, x_2, \cdots, x_m \right\} \\
Y &=& \left\{ y_1, y_2, \cdots, y_n \right\}
\end{array}
\)
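are two different data sets, with means and standard deviations \(\mu_x\), \(\sigma_x\) and \(\mu_y\), \(\sigma_y\) respectively. If the two sets are merged into a single set with mean \(\mu\) and standard deviation \(\sigma\), then: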
- \( \mu = p\mu_x + q \mu_y \)
- \( \sigma^2 = p \sigma_x^2 + q\sigma_y^2 + pq(\mu_x - \mu_y)^2 \)
where \(p=\dfrac{m}{m+n}\) and \(q=\dfrac{n}{m+n}\).
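As a quick numerical check of the two formulas above, here is a minimal Python sketch. The function name `combine_stats` and the sample data are made up for illustration. It assumes population standard deviations (dividing by the number of items, not by one less), which is what the formulas above use.

```python
import math
import statistics


def combine_stats(m, mean_x, sd_x, n, mean_y, sd_y):
    """Combine the population mean and standard deviation of two groups.

    m, n are the group sizes; sd_x, sd_y are population standard deviations.
    """
    p = m / (m + n)
    q = n / (m + n)
    mean = p * mean_x + q * mean_y
    variance = p * sd_x**2 + q * sd_y**2 + p * q * (mean_x - mean_y) ** 2
    return mean, math.sqrt(variance)


# Illustrative data (made up for this check).
xs = [2, 4, 6]
ys = [1, 3, 5, 7, 9]

combined = combine_stats(
    len(xs), statistics.fmean(xs), statistics.pstdev(xs),
    len(ys), statistics.fmean(ys), statistics.pstdev(ys),
)

# Compute the same quantities directly from the merged data.
direct = (statistics.fmean(xs + ys), statistics.pstdev(xs + ys))

assert math.isclose(combined[0], direct[0])
assert math.isclose(combined[1], direct[1])
```

The formula only needs the sizes, means, and standard deviations of the two groups, so the raw data are not required once those summaries are known.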
Because:
\(
\dfrac{x_1 + x_2 + \cdots + x_m}{m} = \mu_x \;\;\;
\)
\(
\dfrac{y_1 + y_2 + \cdots + y_n}{n} = \mu_y \;\;\;
\)
Therefore:
\(
\begin{array}{rcl}
\mu &=& \dfrac{(x_1 + x_2 + \cdots + x_m) + (y_1 + y_2 + \cdots + y_n)}{m+n} \\
&=& \dfrac{m \mu_x + n \mu_y}{m+n} \\
&=& p \mu_x + q \mu_y \;\;\; ········· (1)
\end{array}
\)
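In addition, the article "Standard Deviation Formula" (「標準差公式」) shows that the mean of the squares of a data set equals the square of its mean plus its variance. Applying this to \(X\), to \(Y\), and to the merged set gives: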
\(
\dfrac{x_1^2 + x_2^2 + \cdots + x_m^2}{m} = \mu_x^2 + \sigma_x^2 \;\;\;
\) ········· (2)
\(
\dfrac{y_1^2 + y_2^2 + \cdots + y_n^2}{n} = \mu_y^2 + \sigma_y^2 \;\;\;
\) ········· (3)
\(
\dfrac{(x_1^2 + \cdots + x_m^2) + (y_1^2 + \cdots + y_n^2)}{m+n} = \mu^2 + \sigma^2 \;\;\;
\) ········· (4)
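The identity behind (2), (3) and (4) can be checked numerically. The following is a small sketch with made-up data and a hypothetical helper name `mean_of_squares`, again assuming the population standard deviation.

```python
import math
import statistics


def mean_of_squares(values):
    """Return the average of the squared values."""
    return statistics.fmean([v * v for v in values])


# Illustrative data (made up for this check).
data = [1, 3, 2, 4, 6, 8]

mu = statistics.fmean(data)
sigma = statistics.pstdev(data)  # population standard deviation

# Mean of squares should equal mu^2 + sigma^2.
assert math.isclose(mean_of_squares(data), mu**2 + sigma**2)
```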
Substituting (1), (2), and (3) into (4) gives:
\(
\begin{array}{rcl}
\dfrac{m(\mu_x^2 + \sigma_x^2) + n(\mu_y^2 + \sigma_y^2)}{m+n} &=& (p \mu_x + q \mu_y)^2 + \sigma^2 \\
\\
p(\mu_x^2 + \sigma_x^2) + q(\mu_y^2 + \sigma_y^2) &=& (p \mu_x + q \mu_y)^2 + \sigma^2
\end{array}
\)
Hence, using \(p + q = 1\) (so that \(1-p = q\) and \(1-q = p\)):
\(
\begin{array}{rcl}
\sigma^2 &=& p(\mu_x^2 + \sigma_x^2) + q(\mu_y^2 + \sigma_y^2) - (p \mu_x + q \mu_y)^2 \\
&=& p \sigma_x^2 + q \sigma_y^2 + (p-p^2)\mu_x^2 - 2pq \mu_x \mu_y + (q-q^2)\mu_y^2 \\
&=& p \sigma_x^2 + q \sigma_y^2 + p(1-p)\mu_x^2 - 2pq \mu_x \mu_y + q(1-q)\mu_y^2 \\
&=& p \sigma_x^2 + q \sigma_y^2 + pq(\mu_x^2 - 2 \mu_x \mu_y + \mu_y^2) \\
&=& p \sigma_x^2 + q \sigma_y^2 + pq(\mu_x - \mu_y)^2
\end{array}
\)
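As a worked check of the result (the data here are made up for illustration), take \(X = \{1, 3\}\) and \(Y = \{2, 4, 6, 8\}\), so that \(m = 2\), \(n = 4\), \(p = \dfrac{1}{3}\), \(q = \dfrac{2}{3}\), \(\mu_x = 2\), \(\sigma_x^2 = 1\), \(\mu_y = 5\), and \(\sigma_y^2 = 5\). The formulas give:
\(
\begin{array}{rcl}
\mu &=& \dfrac{1}{3} \cdot 2 + \dfrac{2}{3} \cdot 5 = 4 \\
\sigma^2 &=& \dfrac{1}{3} \cdot 1 + \dfrac{2}{3} \cdot 5 + \dfrac{1}{3} \cdot \dfrac{2}{3} \cdot (2 - 5)^2 = \dfrac{17}{3}
\end{array}
\)
Computing directly from the merged set \(\{1, 3, 2, 4, 6, 8\}\) gives the same values: the mean is \(\dfrac{24}{6} = 4\) and the variance is \(\dfrac{9 + 1 + 4 + 0 + 4 + 16}{6} = \dfrac{17}{3}\).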