Plot 90 confidence interval. Construction of a confidence interval for the mathematical expectation of the general population

An example of interval estimating is confidence interval. A confidence interval is a segment whose center is a point estimate of a numerical characteristic, including the true value of this numerical characteristic with a given probability. This probability is called confidence probability. Thus, the confidence interval is a measure of the accuracy of the estimate, and the confidence probability characterizes its reliability. The size of the confidence interval depends on what value of the confidence probability is given by the experimenter. The higher the confidence level, the wider the interval must be in order to include the true value of the numerical characteristic with a given probability. Often a confidence value of P d = 0.95 is chosen, thus believing that this value is large enough to consider that the confidence interval “almost always” covers the true value. Only sometimes, in the case of responsible and very responsible research, P d = 0.99 and 0.999, respectively, are assumed.

The procedure for constructing a confidence interval includes two steps:

A record of a probabilistic statement regarding some random function, which includes the difference or ratio of an estimate and a numerical characteristic. Such a function carries information about the degree of closeness of the mentioned values. It is necessary that the distribution law of the function be known;

The probabilistic statement is transformed into a form in which the boundaries of the confidence interval of the numerical characteristic are presented in an explicit form.

Examples of functions with a known distribution that satisfy the required requirements are the following:

having a normal distribution if the value of X is normally distributed, and the value of s[X] is known;

2) (3.25)

having a Student's distribution c m = N-1, if the value of X is normally distributed, and the value of s[X] is not known in advance, but its estimate can be obtained from experimental data using formula (3.23);

3) (3.26)

having a Pearson distribution with m = N-1 if the value of X is normally distributed.

Recall that the distribution parameters m are the numbers of degrees of freedom. In addition, the following notations are used here: - arithmetic mean value, - root mean square value equal to the square root of the variance, [X] - estimate of the mean frame value, defined as the square root of the unbiased estimate of the variance, N - sample size.

The Z and t functions can be used to construct a confidence interval for the mean, while the c 2 function constructs a confidence interval for the variance.


Let us construct a confidence interval for the mathematical expectation, provided that we have at our disposal the results of N observations of a normally distributed quantity X, and the mean square value is known in advance from independent observations. Since the function Z is normally distributed, you can use the corresponding table to determine the value of z a such that outside - z a and + z a there remains a part of the area under the distribution curve in the sum equal to a, while within [- z a ,+ z a ] lies part of the area , equal to 1 - a . What has just been said corresponds to the following probabilistic statement:

Р(- z a £ £+z a )= 1-a. (3.27)

(The probability of fulfilling the inequality enclosed in curly brackets is 1-a.). Let's transform the expression in brackets:

Р(-z a )= 1 - a

We call the value 1-a = Р d the confidence probability Р d. According to (3.28), with this confidence probability, the confidence interval for M[X] is given by the limits:

. (3.29)

Comment: Unfortunately, normal distribution tables are built differently in different books. Sometimes the probability integral is given

Ф(z) =

Any sample gives only an approximate idea of ​​the general population, and all sample statistical characteristics (mean, mode, variance ...) are some approximation or say an estimate of the general parameters, which in most cases cannot be calculated due to the inaccessibility of the general population (Figure 20) .

Figure 20. Sampling error

But you can specify the interval in which, with a certain degree of probability, lies the true (general) value of the statistical characteristic. This interval is called d confidence interval (CI).

So the general average with a probability of 95% lies within

from to, (20)

Where t - tabular value of Student's criterion for α =0.05 and f= n-1

Can be found and 99% CI, in this case t chosen for α =0,01.

What is the practical significance of a confidence interval?

    A wide confidence interval indicates that the sample mean does not accurately reflect the population mean. This is usually due to an insufficient sample size, or to its heterogeneity, i.e. large dispersion. Both give a large error in the mean and, accordingly, a wider CI. And this is the reason to return to the research planning stage.

    Upper and lower CI limits assess whether the results will be clinically significant

Let us dwell in more detail on the question of the statistical and clinical significance of the results of the study of group properties. Recall that the task of statistics is to detect at least some differences in general populations, based on sample data. It is the clinician's task to find such (not any) differences that will help diagnosis or treatment. And not always statistical conclusions are the basis for clinical conclusions. Thus, a statistically significant decrease in hemoglobin by 3 g/l is not a cause for concern. And, conversely, if some problem in the human body does not have a mass character at the level of the entire population, this is not a reason not to deal with this problem.

We will consider this position in example.

The researchers wondered if boys who had some kind of infectious disease were lagging behind their peers in growth. For this purpose, a selective study was conducted, in which 10 boys who had this disease took part. The results are presented in table 23.

Table 23. Statistical results

lower limit

upper limit

Specifications (cm)

middle

From these calculations, it follows that the selective average height of 10-year-old boys who have had some kind of infectious disease is close to normal (132.5 cm). However, the lower limit of the confidence interval (126.6 cm) indicates that there is a 95% probability that the true average height of these children corresponds to the concept of "short stature", i.e. these children are stunted.

In this example, the results of the confidence interval calculations are clinically significant.

Often the appraiser has to analyze the real estate market of the segment in which the appraisal object is located. If the market is developed, it can be difficult to analyze the entire set of presented objects, therefore, a sample of objects is used for analysis. This sample is not always homogeneous, sometimes it is required to clear it of extremes - too high or too low market offers. For this purpose, it is applied confidence interval. The purpose of this study is to conduct a comparative analysis of two methods for calculating the confidence interval and choose the best calculation option when working with different samples in the estimatica.pro system.

Confidence interval - calculated on the basis of the sample, the interval of values ​​of the characteristic, which with a known probability contains the estimated parameter of the general population.

The meaning of calculating the confidence interval is to build such an interval based on the sample data so that it can be asserted with a given probability that the value of the estimated parameter is in this interval. In other words, the confidence interval with a certain probability contains the unknown value of the estimated quantity. The wider the interval, the higher the inaccuracy.

There are different methods for determining the confidence interval. In this article, we will consider 2 ways:

  • through the median and standard deviation;
  • through the critical value of the t-statistic (Student's coefficient).

Stages of a comparative analysis of different methods for calculating CI:

1. form a data sample;

2. we process it with statistical methods: we calculate the mean value, median, variance, etc.;

3. we calculate the confidence interval in two ways;

4. Analyze the cleaned samples and the obtained confidence intervals.

Stage 1. Data sampling

The sample was formed using the estimatica.pro system. The sample included 91 offers for the sale of 1-room apartments in the 3rd price zone with the type of planning "Khrushchev".

Table 1. Initial sample

The price of 1 sq.m., c.u.

Fig.1. Initial sample



Stage 2. Processing of the initial sample

Sample processing by statistical methods requires the calculation of the following values:

1. Arithmetic mean

2. Median - a number that characterizes the sample: exactly half of the sample elements are greater than the median, the other half is less than the median

(for a sample with an odd number of values)

3. Range - the difference between the maximum and minimum values ​​in the sample

4. Variance - used to more accurately estimate the variation in data

5. The standard deviation for the sample (hereinafter referred to as RMS) is the most common indicator of the dispersion of adjustment values ​​around the arithmetic mean.

6. Coefficient of variation - reflects the degree of dispersion of adjustment values

7. oscillation coefficient - reflects the relative fluctuation of the extreme values ​​of prices in the sample around the average

Table 2. Statistical indicators of the original sample

The coefficient of variation, which characterizes the homogeneity of the data, is 12.29%, but the coefficient of oscillation is too large. Thus, we can state that the original sample is not homogeneous, so let's move on to calculating the confidence interval.

Stage 3. Calculation of the confidence interval

Method 1. Calculation through the median and standard deviation.

The confidence interval is determined as follows: the minimum value - the standard deviation is subtracted from the median; the maximum value - the standard deviation is added to the median.

Thus, the confidence interval (47179 CU; 60689 CU)

Rice. 2. Values ​​within confidence interval 1.



Method 2. Building a confidence interval through the critical value of t-statistics (Student's coefficient)

S.V. Gribovsky in the book "Mathematical methods for assessing the value of property" describes a method for calculating the confidence interval through the Student's coefficient. When calculating by this method, the estimator himself must set the significance level ∝, which determines the probability with which the confidence interval will be built. Significance levels of 0.1 are commonly used; 0.05 and 0.01. They correspond to confidence probabilities of 0.9; 0.95 and 0.99. With this method, the true values ​​of the mathematical expectation and variance are considered to be practically unknown (which is almost always true when solving practical evaluation problems).

Confidence interval formula:

n - sample size;

The critical value of t-statistics (Student's distributions) with a significance level ∝, the number of degrees of freedom n-1, which is determined by special statistical tables or using MS Excel (→"Statistical"→ STUDRASPOBR);

∝ - significance level, we take ∝=0.01.

Rice. 2. Values ​​within the confidence interval 2.

Step 4. Analysis of different ways to calculate the confidence interval

Two methods of calculating the confidence interval - through the median and Student's coefficient - led to different values ​​of the intervals. Accordingly, two different purified samples were obtained.

Table 3. Statistical indicators for three samples.

Index

Initial sample

1 option

Option 2

Average value

Dispersion

Coef. variations

Coef. oscillations

Number of retired objects, pcs.

Based on the calculations performed, we can say that the values ​​of the confidence intervals obtained by different methods intersect, so you can use any of the calculation methods at the discretion of the appraiser.

However, we believe that when working in the estimatica.pro system, it is advisable to choose a method for calculating the confidence interval, depending on the degree of market development:

  • if the market is not developed, apply the method of calculation through the median and standard deviation, since the number of retired objects in this case is small;
  • if the market is developed, apply the calculation through the critical value of t-statistics (Student's coefficient), since it is possible to form a large initial sample.

In preparing the article were used:

1. Gribovsky S.V., Sivets S.A., Levykina I.A. Mathematical methods for assessing the value of property. Moscow, 2014

2. Data from the estimatica.pro system

The method for estimating a random error is based on the principles of probability theory and mathematical statistics. It is possible to estimate a random error only in the case when repeated measurements of the same quantity have been carried out.

Let, as a result of the performed measurements, P quantity values X: X 1 , X 2 , …, x n. Denote by the arithmetic mean

In probability theory, it is proved that with an increase in the number of measurements P the arithmetic mean value of the measured value approaches the true:

With a small number of measurements ( P£ 10) the average value may differ significantly from the true one. In order to know how accurately the value characterizes the measured value, it is necessary to determine the so-called confidence interval of the result obtained.

Since an absolutely accurate measurement is impossible, the probability of the correctness of the statement " x has a value exactly equal to» is equal to zero. The probability of the statement x has a value» is equal to one (100%). Thus, the probability of the correctness of any intermediate statement lies in the range from 0 to 1. The purpose of the measurement is to find such an interval in which, with a predetermined probability a(0 < a < 1) находится истинное значение измеряемой величины. Этот интервал называется confidence interval , and the value inextricably linked with it aconfidence level (or reliability factor). The average value calculated by formula (3) is taken as the middle of the interval. Half the width of the confidence interval is the random error D s x(Fig. 1).



Obviously, the width of the confidence interval (and hence the error D s x) depends on how much the individual measurements of quantity x i from the mean value. The "scatter" of the measurement results relative to the average is characterized by root mean square error s, which is found by the formula

, (4)

The width of the desired confidence interval is directly proportional to the root mean square error:



. (5)

Proportionality factor t n, a called Student's coefficient; it depends on the number of experiments P and confidence level a.

On fig. 1, a, b It is clearly shown that, other things being equal, in order to increase the probability that the true value falls into the confidence interval, it is necessary to increase the width of the latter (the probability of "covering" the value X wider interval above). Therefore, the value t n, a should be greater, the higher the confidence level a.

With an increase in the number of experiments, the average value approaches the true value; so with the same probability a the confidence interval can be taken narrower (see Fig. 1, a, c). Thus, with the growth P the sudent coefficient should decrease. Table of values ​​of the Student's coefficient depending on P And a given in the appendices to this manual.

It should be noted that the confidence level has nothing to do with the accuracy of the measurement result. Value a are set in advance, based on the requirements for their reliability. In most technical experiments and in laboratory practice, the value a is taken equal to 0.95.

Calculation of a random error in measuring a quantity X carried out in the following order:

1) the sum of the measured values ​​is calculated, and then the average value of the quantity is calculated according to the formula (3);

2) for each i th experiment, the difference between the measured and average values ​​is calculated, as well as the square of this difference (deviation) (D x i) 2 ;

3) the sum of squared deviations is found, and then the root mean square error s according to formula (4);

4) according to a given confidence level a and the number of experiments P from the table on p. 149 applications select the appropriate value of the Student's coefficient t n, a and the random error D s x according to formula (5).

For the convenience of calculations and verification of intermediate results, the data are entered in a table, the last three columns of which are filled in according to the model of Table 1.

Table 1

Experience number X D X (D X) 2
P
S= S=

In each particular case, the value X has a certain physical meaning and corresponding units of measurement. This can be, for example, the acceleration of free fall g (m/s 2), fluid viscosity h (Pa×s) etc. Missing columns of the table. 1 may contain intermediate measured values ​​necessary to calculate the corresponding values X.

Example 1 To determine acceleration A body movements measured time t passing their way S no initial speed. Using the known relation , we obtain the calculation formula

Path Measurement Results S and time t are given in the second and third columns of Table. 2. After performing calculations using formula (6), we fill in

fourth column with acceleration values a i and find their sum, which we write under this column in the cell "S =". Then we calculate the average value according to the formula (3)

.

table 2

Experience number S, m t, c A, m/s 2 D A, m/s 2 (D A) 2 , (m/s 2) 2
2,20 2,07 0,04 0,0016
2,68 1,95 -0,08 0,0064
2,91 2,13 0,10 0,0100
3,35 1,96 -0,07 0,0049
S= 8,11 S= 0,0229

Subtracting from each value a i average, find the differences D a i and put them in the fifth column of the table. Squaring these differences, we fill in the last column. Then we calculate the sum of squared deviations and write it down in the second cell "S =". According to formula (4), we determine the root-mean-square error:

.

Given the value of the confidence probability a= 0.95, for the number of experiments P= 4 from the table in the appendices (p. 149) select the value of the Student's coefficient t n, a= 3.18; using formula (5), we estimate the random error in measuring the acceleration

D s a= 3.18×0.0437 » 0.139 ( m/s 2) .

Estimation of confidence intervals

Learning objectives

The statistics consider the following two main tasks:

    We have some estimate based on sample data and we want to make some probabilistic statement about where the true value of the parameter being estimated is.

    We have a specific hypothesis that needs to be tested based on sample data.

In this topic, we consider the first problem. We also introduce the definition of a confidence interval.

A confidence interval is an interval that is built around the estimated value of a parameter and shows where the true value of the estimated parameter lies with an a priori given probability.

After studying the material on this topic, you:

    learn what is the confidence interval of the estimate;

    learn to classify statistical problems;

    master the technique of constructing confidence intervals, both using statistical formulas and using software tools;

    learn to determine the required sample sizes to achieve certain parameters of accuracy of statistical estimates.

Distributions of sample characteristics

T-distribution

As discussed above, the distribution of the random variable is close to a standardized normal distribution with parameters 0 and 1. Since we do not know the value of σ, we replace it with some estimate s . The quantity already has a different distribution, namely, or Student's distribution, which is determined by the parameter n -1 (number of degrees of freedom). This distribution is close to the normal distribution (the larger n, the closer the distributions).

On fig. 95
Student's distribution with 30 degrees of freedom is presented. As you can see, it is very close to the normal distribution.

Similar to the functions for working with the normal distribution NORMDIST and NORMINV, there are functions for working with the t-distribution - STUDIST (TDIST) and STUDRASPBR (TINV). An example of the use of these functions can be found in the STUDRIST.XLS file (template and solution) and in fig. 96
.

Distributions of other characteristics

As we already know, to determine the accuracy of the expectation estimate, we need a t-distribution. To estimate other parameters, such as variance, other distributions are required. Two of them are the F-distribution and x 2 -distribution.

Confidence interval for the mean

Confidence interval is an interval that is built around the estimated value of the parameter and shows where the true value of the estimated parameter lies with a priori given probability.

The construction of a confidence interval for the mean value occurs in the following way:

Example

The fast food restaurant plans to expand its assortment with a new type of sandwich. In order to estimate the demand for it, the manager plans to randomly select 40 visitors from among those who have already tried it and ask them to rate their attitude towards the new product on a scale from 1 to 10. The manager wants to estimate the expected number of points that the new product will receive and construct a 95% confidence interval for this estimate. How to do it? (see file SANDWICH1.XLS (template and solution).

Solution

To solve this problem, you can use . The results are presented in fig. 97
.

Confidence interval for the total value

Sometimes, according to sample data, it is required to estimate not the mathematical expectation, but the total sum of values. For example, in a situation with an auditor, it may be of interest to estimate not the average value of an invoice, but the sum of all invoices.

Let N be the total number of items, n is the sample size, T 3 is the sum of the values ​​in the sample, T" is the estimate for the sum over the entire population, then , and the confidence interval is calculated by the formula , where s is the estimate of the standard deviation for the sample, is the estimate average for the sample.

Example

Let's say a tax office wants to estimate the amount of total tax refunds for 10,000 taxpayers. The taxpayer either receives a refund or pays additional taxes. Find the 95% confidence interval for the refund amount, assuming a sample size of 500 people (see file REFUND AMOUNT.XLS (template and solution).

Solution

There is no special procedure in StatPro for this case, however, you can see that the bounds can be obtained from the bounds for the mean using the above formulas (Fig. 98
).

Confidence interval for proportion

Let p be the expectation of a share of customers, and pv be an estimate of this share, obtained from a sample of size n. It can be shown that for sufficiently large the estimate distribution will be close to normal with mean p and standard deviation . The standard error of the estimate in this case is expressed as , and the confidence interval as .

Example

The fast food restaurant plans to expand its assortment with a new type of sandwich. In order to estimate the demand for it, the manager randomly selected 40 visitors from among those who had already tried it and asked them to rate their attitude towards the new product on a scale from 1 to 10. The manager wants to estimate the expected proportion of customers who rate the new product at least than 6 points (he expects these customers to be the consumers of the new product).

Solution

Initially, we create a new column on the basis of 1 if the client's score was more than 6 points and 0 otherwise (see the SANDWICH2.XLS file (template and solution).

Method 1

Counting the amount of 1, we estimate the share, and then we use the formulas.

The value of z cr is taken from special normal distribution tables (for example, 1.96 for a 95% confidence interval).

Using this approach and specific data to construct a 95% interval, we obtain the following results (Fig. 99
). The critical value of the parameter z cr is 1.96. The standard error of the estimate is 0.077. The lower limit of the confidence interval is 0.475. The upper limit of the confidence interval is 0.775. Thus, a manager can assume with 95% certainty that the percentage of customers who rate a new product 6 points or more will be between 47.5 and 77.5.

Method 2

This problem can be solved using standard StatPro tools. To do this, it suffices to note that the share in this case coincides with the average value of the Type column. Next apply StatPro/Statistical Inference/One-Sample Analysis to build a confidence interval for the mean value (expectation estimate) for the Type column. The results obtained in this case will be very close to the result of the 1st method (Fig. 99).

Confidence interval for standard deviation

s is used as an estimate of the standard deviation (the formula is given in section 1). The density function of the estimate s is the chi-squared function, which, like the t-distribution, has n-1 degrees of freedom. There are special functions for working with this distribution CHI2DIST (CHIDIST) and CHI2OBR (CHIINV) .

The confidence interval in this case will no longer be symmetrical. The conditional scheme of the boundaries is shown in fig. 100 .

Example

The machine should produce parts with a diameter of 10 cm. However, due to various circumstances, errors occur. The quality controller is concerned about two things: first, the average value should be 10 cm; secondly, even in this case, if the deviations are large, then many details will be rejected. Every day he makes a sample of 50 parts (see file QUALITY CONTROL.XLS (template and solution). What conclusions can such a sample give?

Solution

We construct 95% confidence intervals for the mean and for the standard deviation using StatPro/Statistical Inference/ One-Sample Analysis(Fig. 101
).

Further, using the assumption of a normal distribution of diameters, we calculate the proportion of defective products, setting a maximum deviation of 0.065. Using the capabilities of the lookup table (the case of two parameters), we construct the dependence of the percentage of rejects on the mean value and standard deviation (Fig. 102
).

Confidence interval for the difference of two means

This is one of the most important applications of statistical methods. Situation examples.

    A clothing store manager would like to know how much more or less the average female shopper spends in the store than a male.

    The two airlines fly similar routes. A consumer organization would like to compare the difference between the average expected flight delay times for both airlines.

    The company sends out coupons for certain types of goods in one city and does not send out in another. Managers want to compare the average purchases of these items over the next two months.

    A car dealer often deals with married couples at presentations. To understand their personal reactions to the presentation, couples are often interviewed separately. The manager wants to evaluate the difference in ratings given by men and women.

Case of independent samples

The mean difference will have a t-distribution with n 1 + n 2 - 2 degrees of freedom. The confidence interval for μ 1 - μ 2 is expressed by the ratio:

This problem can be solved not only by the above formulas, but also by standard StatPro tools. To do this, it is enough to apply

Confidence interval for difference between proportions

Let be the mathematical expectation of the shares. Let be their sample estimates built on samples of size n 1 and n 2, respectively. Then is an estimate for the difference . Therefore, the confidence interval for this difference is expressed as:

Here z cr is the value obtained from the normal distribution of special tables (for example, 1.96 for 95% confidence interval).

The standard error of the estimate is expressed in this case by the relation:

.

Example

The store, in preparation for the big sale, undertook the following marketing research. The top 300 buyers were selected and randomly divided into two groups of 150 members each. All of the selected buyers were sent invitations to participate in the sale, but only for members of the first group was attached a coupon giving the right to a 5% discount. During the sale, the purchases of all 300 selected buyers were recorded. How can a manager interpret the results and make a judgment about the effectiveness of couponing? (See COUPONS.XLS file (template and solution)).

Solution

For our particular case, out of 150 customers who received a discount coupon, 55 made a purchase on sale, and among 150 who did not receive a coupon, only 35 made a purchase (Fig. 103
). Then the values ​​of the sample proportions are 0.3667 and 0.2333, respectively. And the sample difference between them is equal to 0.1333, respectively. Assuming a confidence interval of 95%, we find from the normal distribution table z cr = 1.96. The calculation of the standard error of the sample difference is 0.0524. Finally, we get that the lower limit of the 95% confidence interval is 0.0307, ​​and the upper limit is 0.2359, respectively. The results obtained can be interpreted in such a way that for every 100 customers who received a discount coupon, we can expect from 3 to 23 new customers. However, it should be kept in mind that this conclusion in itself does not mean the efficiency of using coupons (because by providing a discount, we lose in profit!). Let's demonstrate this on concrete data. Suppose that the average purchase amount is 400 rubles, of which 50 rubles. there is a store profit. Then the expected profit per 100 customers who did not receive a coupon is equal to:

50 0.2333 100 \u003d 1166.50 rubles.

Similar calculations for 100 buyers who received a coupon give:

30 0.3667 100 \u003d 1100.10 rubles.

The decrease in the average profit to 30 is explained by the fact that, using the discount, buyers who received a coupon will, on average, make a purchase for 380 rubles.

Thus, the final conclusion indicates the inefficiency of using such coupons in this particular situation.

Comment. This problem can be solved using standard StatPro tools. To do this, it suffices to reduce this problem to the problem of estimating the difference of two averages by the method, and then apply StatPro/Statistical Inference/Two-Sample Analysis to build a confidence interval for the difference between two mean values.

Confidence interval control

The length of the confidence interval depends on following conditions:

    directly data (standard deviation);

    significance level;

    sample size.

Sample size for estimating the mean

Let us first consider the problem in the general case. Let us denote the value of half the length of the confidence interval given to us as B (Fig. 104
). We know that the confidence interval for the mean value of some random variable X is expressed as , Where . Assuming:

and expressing n , we get .

Unfortunately, we do not know the exact value of the variance of the random variable X. In addition, we do not know the value of t cr as it depends on n through the number of degrees of freedom. In this situation, we can do the following. Instead of the variance s, we use some estimate of the variance for some available realizations of the random variable under study. Instead of the t cr value, we use the z cr value for the normal distribution. This is quite acceptable, since the density functions for the normal and t-distributions are very close (except for the case of small n ). Thus, the desired formula takes the form:

.

Since the formula gives, generally speaking, non-integer results, rounding with an excess of the result is taken as the desired sample size.

Example

The fast food restaurant plans to expand its assortment with a new type of sandwich. In order to estimate the demand for it, the manager randomly plans to select a number of visitors from among those who have already tried it, and ask them to rate their attitude towards the new product on a scale from 1 to 10. The manager wants to estimate the expected number of points that the new product will receive. product and plot the 95% confidence interval of that estimate. However, he wants half the width of the confidence interval not to exceed 0.3. How many visitors does he need to poll?

as follows:

Here r ots is an estimate of the fraction p, and B is a given half of the length of the confidence interval. An inflated value for n can be obtained using the value r ots= 0.5. In this case, the length of the confidence interval will not exceed the given value B for any true value of p.

Example

Let the manager from the previous example plan to estimate the proportion of customers who prefer a new type of product. He wants to construct a 90% confidence interval whose half length is less than or equal to 0.05. How many clients should be randomly sampled?

Solution

In our case, the value of z cr = 1.645. Therefore, the required quantity is calculated as .

If the manager had reason to believe that the desired value of p is, for example, about 0.3, then by substituting this value in the above formula, we would get a smaller value of the random sample, namely 228.

Formula to determine random sample sizes in case of difference between two means written as:

.

Example

Some computer company has a customer service center. Recently, the number of customer complaints about the poor quality of service has increased. The service center mainly employs two types of employees: those with little experience, but who have completed special training courses, and those with extensive practical experience, but who have not completed special courses. The company wants to analyze customer complaints over the past six months and compare their average numbers per each of the two groups of employees. It is assumed that the numbers in the samples for both groups will be the same. How many employees must be included in the sample to get a 95% interval with a half length of no more than 2?

Solution

Here σ ots is an estimate of the standard deviation of both random variables under the assumption that they are close. Thus, in our task, we need to somehow obtain this estimate. This can be done, for example, as follows. Looking at customer complaint data over the past six months, a manager may notice that there are generally between 6 and 36 complaints per employee. Knowing that for a normal distribution virtually all values ​​are no more than three standard deviations from the mean, he can reasonably believe that:

Whence σ ots = 5.

Substituting this value into the formula, we get .

Formula to determine the size of a random sample in the case of estimating the difference between the shares looks like:

Example

Some company has two factories for the production of similar products. The manager of a company wants to compare the defect rates of both factories. According to available information, the rejection rate at both factories is from 3 to 5%. It is supposed to build a 99% confidence interval with a half length of no more than 0.005 (or 0.5%). How many products should be selected from each factory?

Solution

Here p 1ot and p 2ot are estimates of two unknown fractions of rejects at the 1st and 2nd factories. If we put p 1ots \u003d p 2ots \u003d 0.5, then we will get an overestimated value for n. But since in our case we have some a priori information about these shares, we take the upper estimate of these shares, namely 0.05. We get

When some population parameters are estimated from sample data, it is useful to provide not only a point estimate of the parameter, but also a confidence interval that shows where the exact value of the parameter being estimated may lie.

In this chapter, we also got acquainted with quantitative relationships that allow us to build such intervals for various parameters; learned ways to control the length of the confidence interval.

We also note that the problem of estimating the sample size (experiment planning problem) can be solved using standard StatPro tools, namely StatPro/Statistical Inference/Sample Size Selection.

Similar articles

2023 liveps.ru. Homework and ready-made tasks in chemistry and biology.