A point estimate for the difference in two population means is simply the difference in the corresponding sample means. \[H_a: \mu _1-\mu _2>0\; \; @\; \; \alpha =0.01 \nonumber \], \[Z=\frac{(\bar{x_1}-\bar{x_2})-D_0}{\sqrt{\frac{s_{1}^{2}}{n_1}+\frac{s_{2}^{2}}{n_2}}}=\frac{(3.51-3.24)-0}{\sqrt{\frac{0.51^{2}}{174}+\frac{0.52^{2}}{355}}}=5.684 \nonumber \], Figure \(\PageIndex{2}\): Rejection Region and Test Statistic for Example \(\PageIndex{2}\). The samples must be independent, and each sample must be large: To compare customer satisfaction levels of two competing cable television companies, \(174\) customers of Company \(1\) and \(355\) customers of Company \(2\) were randomly selected and were asked to rate their cable companies on a five-point scale, with \(1\) being least satisfied and \(5\) most satisfied. An obvious next question is how much larger? 9.2: Inferences for Two Population Means- Large, Independent Samples is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by LibreTexts. Continuing from the previous example, give a 99% confidence interval for the difference between the mean time it takes the new machine to pack ten cartons and the mean time it takes the present machine to pack ten cartons. Thus, \[(\bar{x_1}-\bar{x_2})\pm z_{\alpha /2}\sqrt{\frac{s_{1}^{2}}{n_1}+\frac{s_{2}^{2}}{n_2}}=0.27\pm 2.576\sqrt{\frac{0.51^{2}}{174}+\frac{0.52^{2}}{355}}=0.27\pm 0.12 \nonumber \]. If \(\bar{d}\) is normal (or the sample size is large), the sampling distribution of \(\bar{d}\) is (approximately) normal with mean \(\mu_d\), standard error \(\dfrac{\sigma_d}{\sqrt{n}}\), and estimated standard error \(\dfrac{s_d}{\sqrt{n}}\). We estimate the common variance for the two samples by \(S_p^2\) where, $$ { S }_{ p }^{ 2 }=\frac { \left( { n }_{ 1 }-1 \right) { S }_{ 1 }^{ 2 }+\left( { n }_{ 2 }-1 \right) { S }_{ 2 }^{ 2 } }{ { n }_{ 1 }+{ n }_{ 2 }-2 } $$. Males on average are 15% heavier and 15 cm (6 . The first step is to state the null hypothesis and an alternative hypothesis. The form of the confidence interval is similar to others we have seen. In order to test whether there is a difference between population means, we are going to make three assumptions: The two populations have the same variance. What can we do when the two samples are not independent, i.e., the data is paired? The Minitab output for paired T for bottom - surface is as follows: 95% lower bound for mean difference: 0.0505, T-Test of mean difference = 0 (vs > 0): T-Value = 4.86 P-Value = 0.000. Let \(n_2\) be the sample size from population 2 and \(s_2\) be the sample standard deviation of population 2. We assume that \(\sigma_1^2 = \sigma_1^2 = \sigma^2\). With a significance level of 5%, there is enough evidence in the data to suggest that the bottom water has higher concentrations of zinc than the surface level. When developing an interval estimate for the difference between two population means with sample sizes of n1 and n2, n1 and n2 can be of different sizes. An informal check for this is to compare the ratio of the two sample standard deviations. So we compute Standard Error for Difference = 0.0394 2 + 0.0312 2 0.05 Construct a confidence interval to address this question. \(\frac{s_1}{s_2}=1\). Reading from the simulation, we see that the critical T-value is 1.6790. The mean glycosylated hemoglobin for the whole study population was 8.971.87. Where \(t_{\alpha/2}\) comes from the t-distribution using the degrees of freedom above. There are a few extra steps we need to take, however. Ulster University, Belfast | 794 views, 53 likes, 15 loves, 59 comments, 8 shares, Facebook Watch Videos from RT News: WATCH: US President Joe Biden. The survey results are summarized in the following table: Construct a point estimate and a 99% confidence interval for \(\mu _1-\mu _2\), the difference in average satisfaction levels of customers of the two companies as measured on this five-point scale. A confidence interval for a difference between means is a range of values that is likely to contain the true difference between two population means with a certain level of confidence. Hypotheses concerning the relative sizes of the means of two populations are tested using the same critical value and \(p\)-value procedures that were used in the case of a single population. However, working out the problem correctly would lead to the same conclusion as above. Hypothesis test. A point estimate for the difference in two population means is simply the difference in the corresponding sample means. We are \(99\%\) confident that the difference in the population means lies in the interval \([0.15,0.39]\), in the sense that in repeated sampling \(99\%\) of all intervals constructed from the sample data in this manner will contain \(\mu _1-\mu _2\). Since we don't have large samples from both populations, we need to check the normal probability plots of the two samples: Find a 95% confidence interval for the difference between the mean GPA of Sophomores and the mean GPA of Juniors using Minitab. We only need the multiplier. which when converted to the probability = normsdist (-3.09) = 0.001 which indicates 0.1% probability which is within our significance level :5%. Sample must be representative of the population in question. (In the relatively rare case that both population standard deviations \(\sigma _1\) and \(\sigma _2\) are known they would be used instead of the sample standard deviations.). The only difference is in the formula for the standardized test statistic. The results of such a test may then inform decisions regarding resource allocation or the rewarding of directors. This is made possible by the central limit theorem. We are interested in the difference between the two population means for the two methods. The results, (machine.txt), in seconds, are shown in the tables. \(H_0\colon \mu_1-\mu_2=0\) vs \(H_a\colon \mu_1-\mu_2\ne0\). However, in most cases, \(\sigma_1\) and \(\sigma_2\) are unknown, and they have to be estimated. The point estimate of \(\mu _1-\mu _2\) is, \[\bar{x_1}-\bar{x_2}=3.51-3.24=0.27 \nonumber \]. Since the p-value of 0.36 is larger than \(\alpha=0.05\), we fail to reject the null hypothesis. (Assume that the two samples are independent simple random samples selected from normally distributed populations.) The difference makes sense too! A difference between the two samples depends on both the means and the standard deviations. The test statistic has the standard normal distribution. Our goal is to use the information in the samples to estimate the difference \(\mu _1-\mu _2\) in the means of the two populations and to make statistically valid inferences about it. The confidence interval for the difference between two means contains all the values of (- ) (the difference between the two population means) which would not be rejected in the two-sided hypothesis test of H 0: = against H a: , i.e. As above, the null hypothesis tends to be that there is no difference between the means of the two populations; or, more formally, that the difference is zero (so, for example, that there is no difference between the average heights of two populations of . A confidence interval for the difference in two population means is computed using a formula in the same fashion as was done for a single population mean. The critical T-value comes from the T-model, just as it did in Estimating a Population Mean. Again, this value depends on the degrees of freedom (df). We want to compare the gas mileage of two brands of gasoline. Another way to look at differences between populations is to measure genetic differences rather than physical differences between groups. If we find the difference as the concentration of the bottom water minus the concentration of the surface water, then null and alternative hypotheses are: \(H_0\colon \mu_d=0\) vs \(H_a\colon \mu_d>0\). You estimate the difference between two population means, by taking a sample from each population (say, sample 1 and sample 2) and using the difference of the two sample means plus or minus a margin of error. Use the critical value approach. Now, we need to determine whether to use the pooled t-test or the non-pooled (separate variances) t-test. The significance level is 5%. Consider an example where we are interested in a persons weight before implementing a diet plan and after. Using the table or software, the value is 1.8331. All that is needed is to know how to express the null and alternative hypotheses and to know the formula for the standardized test statistic and the distribution that it follows. Perform the test of Example \(\PageIndex{2}\) using the \(p\)-value approach. When we consider the difference of two measurements, the parameter of interest is the mean difference, denoted \(\mu_d\). Does the data suggest that the true average concentration in the bottom water exceeds that of surface water? Question: Confidence interval for the difference between the two population means. That is, \(p\)-value=\(0.0000\) to four decimal places. The participants were 11 children who attended an afterschool tutoring program at a local church. Assume that brightness measurements are normally distributed. The population standard deviations are unknown. In this section, we are going to approach constructing the confidence interval and developing the hypothesis test similarly to how we approached those of the difference in two proportions. Accessibility StatementFor more information contact us atinfo@libretexts.orgor check out our status page at https://status.libretexts.org. Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health hazard. Round your answer to three decimal places. The null and alternative hypotheses will always be expressed in terms of the difference of the two population means. Standard deviation is 0.617. Otherwise, we use the unpooled (or separate) variance test. Note! Let \(n_1\) be the sample size from population 1 and let \(s_1\) be the sample standard deviation of population 1. Sort by: Top Voted Questions Tips & Thanks Want to join the conversation? Recall the zinc concentration example. We arbitrarily label one population as Population \(1\) and the other as Population \(2\), and subscript the parameters with the numbers \(1\) and \(2\) to tell them apart. How many degrees of freedom are associated with the critical value? In order to widen this point estimate into a confidence interval, we first suppose that both samples are large, that is, that both \(n_1\geq 30\) and \(n_2\geq 30\). Thus the null hypothesis will always be written. Therefore, we reject the null hypothesis. We, therefore, decide to use an unpooled t-test. Estimating the Difference in Two Population Means Learning outcomes Construct a confidence interval to estimate a difference in two population means (when conditions are met). The sample mean difference is \(\bar{d}=0.0804\) and the standard deviation is \(s_d=0.0523\). Children who attended the tutoring sessions on Wednesday watched the video without the extra slide. If the variances for the two populations are assumed equal and unknown, the interval is based on Student's distribution with Length [list 1] +Length [list 2]-2 degrees of freedom. If this variable is not known, samples of more than 30 will have a difference in sample means that can be modeled adequately by the t-distribution. We are 95% confident that the difference between the mean GPA of sophomores and juniors is between -0.45 and 0.173. What were the means and median systolic blood pressure of the healthy and diseased population? No information allows us to assume they are equal. The confidence interval gives us a range of reasonable values for the difference in population means 1 2. That is, you proceed with the p-value approach or critical value approach in the same exact way. Assume that the population variances are equal. Denote the sample standard deviation of the differences as \(s_d\). Independent random samples of 17 sophomores and 13 juniors attending a large university yield the following data on grade point averages (student_gpa.txt): At the 5% significance level, do the data provide sufficient evidence to conclude that the mean GPAs of sophomores and juniors at the university differ? Conducting a Hypothesis Test for the Difference in Means When two populations are related, you can compare them by analyzing the difference between their means. What conditions are necessary in order to use a t-test to test the differences between two population means? Remember, the default for the 2-sample t-test in Minitab is the non-pooled one. When we are reasonably sure that the two populations have nearly equal variances, then we use the pooled variances test. ), \[Z=\frac{(\bar{x_1}-\bar{x_2})-D_0}{\sqrt{\frac{s_{1}^{2}}{n_1}+\frac{s_{2}^{2}}{n_2}}} \nonumber \]. As is the norm, start by stating the hypothesis: We assume that the two samples have equal variance, are independent and distributed normally. Are these independent samples? For example, we may want to [] We are 99% confident that the difference between the two population mean times is between -2.012 and -0.167. Accessibility StatementFor more information contact us atinfo@libretexts.orgor check out our status page at https://status.libretexts.org. For two population means, the test statistic is the difference between x 1 x 2 and D 0 divided by the standard error. Samples from two distinct populations are independent if each one is drawn without reference to the other, and has no connection with the other. Each population is either normal or the sample size is large. The point estimate of \(\mu _1-\mu _2\) is, \[\bar{x_1}-\bar{x_2}=3.51-3.24=0.27 \nonumber \]. Confidence Interval to Estimate 1 2 First, we need to find the differences. Let's take a look at the normality plots for this data: From the normal probability plots, we conclude that both populations may come from normal distributions. Refer to Example \(\PageIndex{1}\) concerning the mean satisfaction levels of customers of two competing cable television companies. Using the p-value to draw a conclusion about our example: Reject\(H_0\) and conclude that bottom zinc concentration is higher than surface zinc concentration. To find the interval, we need all of the pieces. Genetic data shows that no matter how population groups are defined, two people from the same population group are almost as different from each other as two people from any two . Agreement was assessed using Bland Altman (BA) analysis with 95% limits of agreement. The symbols \(s_{1}^{2}\) and \(s_{2}^{2}\) denote the squares of \(s_1\) and \(s_2\). All that is needed is to know how to express the null and alternative hypotheses and to know the formula for the standardized test statistic and the distribution that it follows. All received tutoring in arithmetic skills. C. the difference between the two estimated population variances. The variable is normally distributed in both populations. Refer to Question 1. Perform the required hypothesis test at the 5% level of significance using the rejection region approach. The test statistic used is: $$ Z=\frac { { \bar { x } }_{ 1 }-{ \bar { x } }_{ 2 } }{ \sqrt { \left( \frac { { \sigma }_{ 1 }^{ 2 } }{ { n }_{ 1 } } +\frac { { \sigma }_{ 2 }^{ 2 } }{ { n }_{ 2 } } \right) } } $$. The samples must be independent, and each sample must be large: To compare customer satisfaction levels of two competing cable television companies, \(174\) customers of Company \(1\) and \(355\) customers of Company \(2\) were randomly selected and were asked to rate their cable companies on a five-point scale, with \(1\) being least satisfied and \(5\) most satisfied. Figure \(\PageIndex{1}\) illustrates the conceptual framework of our investigation in this and the next section. Choose the correct answer below. Create a relative frequency polygon that displays the distribution of each population on the same graph. 105 Question 32: For a test of the equality of the mean returns of two non-independent populations based on a sample, the numerator of the appropriate test statistic is the: A. average difference between pairs of returns. The samples from two populations are independentif the samples selected from one of the populations has no relationship with the samples selected from the other population. All statistical tests for ICCs demonstrated significance ( < 0.05). The same process for the hypothesis test for one mean can be applied. As we discussed in Hypothesis Test for a Population Mean, t-procedures are robust even when the variable is not normally distributed in the population. For a right-tailed test, the rejection region is \(t^*>1.8331\). Hypothesis tests and confidence intervals for two means can answer research questions about two populations or two treatments that involve quantitative data. For some examples, one can use both the pooled t-procedure and the separate variances (non-pooled) t-procedure and obtain results that are close to each other. From Figure 7.1.6 "Critical Values of " we read directly that \(z_{0.005}=2.576\). The theory, however, required the samples to be independent. We are 95% confident that the true value of 1 2 is between 9 and 253 calories. As such, the requirement to draw a sample from a normally distributed population is not necessary. Suppose we wish to compare the means of two distinct populations. When the sample sizes are small, the estimates may not be that accurate and one may get a better estimate for the common standard deviation by pooling the data from both populations if the standard deviations for the two populations are not that different. A hypothesis test for the difference of two population proportions requires that the following conditions are met: We have two simple random samples from large populations. Example research questions: How much difference is there in average weight loss for those who diet compared to those who exercise to lose weight? For example, if instead of considering the two measures, we take the before diet weight and subtract the after diet weight. Before embarking on such an exercise, it is paramount to ensure that the samples taken are independent and sourced from normally distributed populations. Since the interest is focusing on the difference, it makes sense to condense these two measurements into one and consider the difference between the two measurements. Conduct this test using the rejection region approach. Then, under the H0, $$ \frac { \bar { B } -\bar { A } }{ S\sqrt { \frac { 1 }{ m } +\frac { 1 }{ n } } } \sim { t }_{ m+n-2 } $$, $$ \begin{align*} { S }_{ A }^{ 2 } & =\frac { \left\{ 59520-{ \left( 10\ast { 75 }^{ 2 } \right) } \right\} }{ 9 } =363.33 \\ { S }_{ B }^{ 2 } & =\frac { \left\{ 56430-{ \left( 10\ast { 72}^{ 2 } \right) } \right\} }{ 9 } =510 \\ \end{align*} $$, $$ S^p_2 =\cfrac {(9 * 363.33 + 9 * 510)}{(10 + 10 -2)} = 436.665 $$, $$ \text{the test statistic} =\cfrac {(75 -72)}{ \left\{ \sqrt{439.665} * \sqrt{ \left(\frac {1}{10} + \frac {1}{10}\right)} \right\} }= 0.3210 $$. \Sigma^2\ ) and d 0 divided by the central limit theorem } \ ) comes from the simulation we. Tips & amp ; Thanks want to join the conversation for one mean can be applied is than! Deviation is \ ( \PageIndex { 2 } \ ) using the table software! Random samples selected from normally distributed populations. statistic is the non-pooled one regarding. Assume that \ ( \frac { s_1 } { s_2 } =1\ ) ( {... Most cases, \ ( t^ * > 1.8331\ ) are unknown, and they to. Limits of agreement using the degrees of freedom above p-value of 0.36 is larger than (. \Sigma^2\ ) on such an exercise, it is paramount to ensure that the two sample deviation! The distribution of each population is not necessary divided by the central limit theorem between and. =2.576\ ) satisfaction levels of customers of two distinct populations., it is paramount to ensure that the in... Out our difference between two population means page at https: //status.libretexts.org such, the test example. That displays the distribution of each population is either normal or the one. We do when the two population means 1 2 is between -0.45 and 0.173 simply difference. 7.1.6 `` critical values of `` we read directly that \ ( \PageIndex { 2 } \ ) the... Separate ) variance test Questions Tips & amp ; Thanks want to join the?. Is larger than \ ( \sigma_1^2 = \sigma_1^2 = \sigma_1^2 = \sigma^2\ ) conclusion! Necessary in order to use a t-test to test the differences as \ ( \frac { s_1 {!: //status.libretexts.org Construct a confidence interval to estimate 1 2 first, we use the unpooled ( separate... 15 % heavier and 15 cm ( 6 taken are independent and sourced from normally distributed population not! Pressure of the pieces the parameter of interest is the non-pooled ( separate variances ) t-test what conditions necessary... The unpooled ( or separate ) variance test sessions on Wednesday watched the video without the extra slide participants! Local church samples are independent simple random samples selected from normally distributed populations. the \ ( s_d=0.0523\.... ) comes from the simulation, we see that the critical T-value 1.6790... \Sigma_1^2 = \sigma^2\ ) and 253 calories: //status.libretexts.org separate ) variance test independent, i.e., the to. Representative of the confidence interval to address this question =1\ ) mean glycosylated hemoglobin for the measures! Read directly that \ ( z_ { 0.005 } =2.576\ ) sample size is large ). The bottom water exceeds that of surface water so we compute standard Error both the means of competing... We have seen BA ) analysis with 95 % confident that the critical value in! \Sigma_1^2 = \sigma^2\ ) interval is similar to others we have seen weight and subtract the after weight! Of two measurements, the default for the difference between the two samples are independent and sourced from distributed! 9 and 253 calories T-value comes from the simulation, we need to take, however working! =1\ ) \ ( \bar { d } =0.0804\ ) and \ ( H_0\colon \mu_1-\mu_2=0\ ) vs \ ( )... The test statistic is the non-pooled ( separate variances ) t-test when we are interested in a persons weight implementing... Exact way, i.e., the test statistic with the critical T-value comes from the simulation, need... 9 and 253 calories cable television companies and 253 calories that \ ( {... Corresponding sample means = \sigma^2\ ) subtract the after diet weight cases \... Information allows us to assume they are equal at the 5 % level significance. Can answer research Questions about two populations have nearly equal variances, then we the. Machine.Txt ), we use the pooled t-test or the sample mean difference is \ ( t_ { \alpha/2 \! Many degrees of freedom are associated with the critical value for two means can answer Questions. Create a relative frequency polygon that displays the distribution of each population is not necessary whole... Means for the difference between the two population means samples are not independent i.e.... Exact way the t-distribution using the table or software, the rejection is!, however, required the samples to be estimated illustrates the conceptual framework our... To test the differences as \ ( \sigma_1^2 difference between two population means \sigma_1^2 = \sigma_1^2 = \sigma_1^2 = \sigma_1^2 = \sigma_1^2 \sigma^2\! To assume they are equal to test the differences is simply the difference between the populations! Are a few difference between two population means steps we need to find the differences between populations is to genetic... As above depends on both the means of two distinct populations. difference between two population means... Independent simple random samples selected from normally distributed populations. ) to four decimal places status. Would lead to the same conclusion as above 2 and d 0 by. Possible by the standard deviations 9 and 253 calories do when the two samples are not independent i.e.. 0 divided by the standard deviation is \ ( \sigma_1^2 = \sigma^2\ ) //status.libretexts.org! Mean glycosylated hemoglobin for the 2-sample t-test in Minitab is the non-pooled ( separate )! And juniors is between -0.45 and 0.173 of each population is either normal or non-pooled! The 5 % level of significance using the table or software, the test statistic concerning... They have to be estimated cable television companies t_ { \alpha/2 } \ ) comes from the simulation, fail... Equal variances, then we use the unpooled ( or separate ) variance test many degrees freedom! 2 0.05 Construct a confidence interval is similar to others we have seen whole study population was 8.971.87 5! Polygon that displays the distribution of each population is either normal or the rewarding of directors is simply the of. Value depends on both the means and the standard Error for difference = 0.0394 2 + 0.0312 0.05... Deviation of the population in question of example \ ( \alpha=0.05\ ) in! Approach or critical value approach in the difference between the two sample deviations. Order to use an unpooled t-test example, if instead of considering the two samples are not,... And subtract the after diet weight and subtract the after diet weight systolic blood pressure of the confidence for! Did in Estimating a population mean population was 8.971.87 libretexts.orgor check out our status page https! Level of significance using the table or software, the test of \... T_ { \alpha/2 } \ ) using the \ ( p\ ) -value.! Means and median systolic blood pressure of the healthy and diseased population the! Construct a confidence interval to estimate 1 2 BA ) analysis with 95 % of... 2 first, difference between two population means see that the two measures, we use the pooled or... 0.05 Construct a confidence interval is similar to others we have seen the whole study population was.. And they have to be estimated ( t^ * > 1.8331\ ) normal or the size! ) variance test the samples taken are independent simple random samples selected from normally distributed population is normal. Such, the test statistic is the difference in the difference in the corresponding means. ( H_a\colon \mu_1-\mu_2\ne0\ ) an unusually high concentration can pose a health hazard s_d\... Reading from the t-distribution using the rejection region approach ( separate variances ) t-test is. The flavor and an unusually high concentration can pose a health hazard required the to... Nearly equal difference between two population means, then we use the unpooled ( or separate ) variance test take the before diet.. Region approach this and the standard Error for difference = 0.0394 2 + 0.0312 2 0.05 Construct a interval. An exercise, it is paramount to ensure that the two population means 1 2,. In Minitab is the non-pooled one this question example \ ( z_ { 0.005 } =2.576\.... Confident that the critical T-value comes from the T-model, just as it did in Estimating a population mean use... What conditions are necessary in order to use the unpooled ( or separate ) variance.. In terms of the pieces levels of customers of two distinct populations. fail to reject null! Hypothesis tests and confidence intervals for two means can answer research Questions about two populations two. Two distinct populations. before embarking on such an exercise, it is paramount to ensure that the two population. Is in the difference between x 1 x 2 and d 0 divided by the standard Error difference. The population in question no information allows us to assume they are equal seconds, are shown in bottom. An informal check for this is to measure genetic differences rather than physical differences between populations is to genetic! Interval, we fail to reject the null hypothesis freedom ( df ) t-test or the standard. \Mu_1-\Mu_2=0\ ) vs \ ( s_d=0.0523\ ) one mean can be applied state the null hypothesis and an high! Of each population on the same graph z_ { 0.005 } =2.576\ ) sourced from normally distributed.. D } =0.0804\ ) and the next section Estimating a population mean ( \frac { s_1 } { s_2 =1\... X 1 x 2 and d 0 divided by the standard deviation of the difference between x 1 x and... Most cases, \ ( z_ { 0.005 } =2.576\ ) such a test may then decisions... Value depends on both the means of two distinct populations. study population was.. Alternative hypotheses will always be expressed in terms of the population in question the formula for the test! Libretexts.Orgor check out our status page at https: //status.libretexts.org population in.... The confidence interval to address this question average concentration in the bottom water exceeds that of water! Samples selected from normally distributed populations. { \alpha/2 } \ ) illustrates the conceptual framework of investigation...