What are general and sample populations?

Thus, the patterns obeyed by the random variable under study are physically completely determined by the real set of conditions under which it is observed (or the experiment is carried out), and mathematically they are specified by the corresponding probability space or, equivalently, by the corresponding probability distribution law. In statistical research, however, a different terminology, built around the concept of a general population, turns out to be somewhat more convenient.

A general population is the totality of all conceivable observations (or of all mentally possible objects of the type we are interested in, from which the observations are “taken”) that could be made under a given real set of conditions. Since the definition refers to all mentally possible observations (or objects), the general population is a conventional, mathematical, abstract concept and should not be confused with the real populations subjected to statistical research. Thus, even having examined all enterprises of a sub-industry and recorded the values of the technical and economic indicators that characterize them, we can regard the surveyed population only as a representative of a hypothetically possible, wider population of enterprises that could operate under the same real set of conditions.

In practical work it is more convenient to associate selection with the objects of observation rather than with the characteristics of these objects: we select machines, geological samples or people for study, not the values of the characteristics of machines, samples or people. In mathematical theory, on the other hand, objects and the set of their characteristics are not distinguished, and the duality of the definition introduced above disappears.

As we see, the mathematical concept of a “general population” is, like the concepts of “probability space”, “random variable” and “probability distribution law”, physically completely determined by the corresponding real set of conditions; therefore all four of these mathematical concepts can be regarded as synonyms in a certain sense. A general population is called finite or infinite depending on whether the set of all conceivable observations is finite or infinite.

It follows from the definition that continuous general populations (consisting of observations of characteristics of a continuous nature) are always infinite. Discrete general populations can be either infinite or finite. For example, suppose a batch of N products is analyzed for grade (see the example in clause 4.1.3), each product being assigned to one of four grades. The random variable under study is then the grade number of a product drawn at random from the batch, and its set of possible values consists of four points (1, 2, 3 and 4). Obviously, in this case the general population is finite (only N conceivable observations).
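To make the finite case concrete, the sketch below turns a batch of N graded products into the probability distribution law of the grade of one randomly drawn product. The batch contents and the function name `grade_distribution` are hypothetical illustrations, not data from the text, and each product is assumed to be equally likely to be drawn.

```python
from collections import Counter
from fractions import Fraction

def grade_distribution(batch):
    """Probability law of the grade of a product drawn at random from a finite batch.

    Each of the N products is assumed equally likely to be drawn, so the
    probability of a grade is the number of products with that grade divided by N.
    """
    N = len(batch)
    counts = Counter(batch)
    return {grade: Fraction(counts[grade], N) for grade in sorted(counts)}

# Hypothetical batch of N = 10 products, each assigned one of the grades 1-4
batch = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4]
print(grade_distribution(batch))
# {1: Fraction(1, 5), 2: Fraction(3, 10), 3: Fraction(2, 5), 4: Fraction(1, 10)}
```

The general population here is the list of N conceivable observations itself, so its distribution law has finitely many values whose probabilities are multiples of 1/N.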

The concept of an infinite general population is a mathematical abstraction, as is the idea that a measurement of a random variable can be repeated infinitely many times. An infinite general population can be interpreted approximately as the limiting case of a finite one, when the number of objects generated by a given real set of conditions increases without bound. Thus, if in the example just given we consider not batches of products but the continuous mass production of the same products, we arrive at the concept of an infinite general population. In practice, such a modification is equivalent to the requirement $N \to \infty$.

A sample from a given general population is the set of results of a limited series of observations of a random variable. A sample can be regarded as a kind of empirical analogue of the general population; it is what we most often deal with in practice, since surveying the entire general population is either too labor-intensive (for large N) or fundamentally impossible (for infinite general populations).

The number of observations that form a sample is called the sample size.

If the sample size is large and we are dealing with a one-dimensional continuous variable (or with one-dimensional discrete data whose number of possible values is fairly large, say more than 10), it is often more convenient, in order to simplify the further statistical processing of the observations, to pass to so-called “grouped” sample data. This transition is usually carried out as follows:

a) the smallest and largest values in the sample, $x_{\min}$ and $x_{\max}$, are determined;

b) the whole surveyed range is divided into a certain number $s$ of equal grouping intervals; the number of intervals $s$ should be no less than 8-10 and no more than 20-25. The choice of $s$ depends substantially on the sample size $n$; for a rough orientation one can use the approximate formula

$$s \approx \log_2 n + 1,$$

which should rather be taken as a lower estimate for $s$ (especially for large $n$);

c) the endpoints of the intervals are listed in ascending order, together with the midpoints of the intervals;

d) the number $\nu_i$ of sample values falling into the $i$-th interval is counted, $i = 1, \dots, s$ (obviously, $\nu_1 + \nu_2 + \dots + \nu_s = n$, where $n$ is the sample size); sample values that fall exactly on an interval boundary are either split evenly between the two adjacent intervals or, by agreement, assigned to only one of them, for example, to the left one.
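The grouping steps a)-d) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: equal-width intervals, the rule $s \approx \log_2 n + 1$ (Sturges' rule) as the default number of intervals, and the convention that a value lying on a boundary goes to the interval on its left; `group_sample` is a hypothetical helper name.

```python
import math

def group_sample(data, s=None):
    """Group a one-dimensional sample into s equal-width intervals (steps a-d).

    Returns the interval edges, the interval midpoints and the counts per interval.
    """
    n = len(data)
    x_min, x_max = min(data), max(data)            # step a): extreme values
    if s is None:
        s = max(1, round(math.log2(n) + 1))        # assumed Sturges-type default
    h = (x_max - x_min) / s                         # step b): common interval width
    edges = [x_min + i * h for i in range(s + 1)]   # step c): boundaries in ascending order
    mids = [(edges[i] + edges[i + 1]) / 2 for i in range(s)]
    counts = [0] * s
    for x in data:                                  # step d): count the values per interval
        # a value on a boundary goes to the interval on its left;
        # x_min itself is put into the first interval
        i = math.ceil((x - x_min) / h) - 1 if h > 0 else 0
        counts[min(max(i, 0), s - 1)] += 1
    return edges, mids, counts

edges, mids, counts = group_sample(list(range(1, 17)))
print(edges)
print(mids)
print(counts)  # [4, 3, 3, 3, 3]
```

With only 16 values the default rule gives s = 5, well below the 8-10 intervals recommended for large samples; for real data one would normally pass `s` explicitly.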

Depending on the specific content of the problem, some modifications may be made to this grouping scheme (for example, in some cases it is advisable to abandon the requirement of equal lengths of grouping intervals).

In all further arguments using sample data, we will proceed from the notation just described.

Let us recall that the essence of statistical methods is to use a certain part of the general population (i.e., a sample) to make judgments about its properties as a whole.

One of the most important issues, whose successful resolution determines the reliability of the conclusions drawn from statistical processing of the data, is the representativeness of the sample, i.e., how completely and adequately it represents those properties of the analyzed general population that interest us. In practical work, the same group of objects taken for study can be regarded as a sample from different general populations. For example, a group of families randomly selected for a detailed sociological survey from the cooperative houses of one housing maintenance office (ZhEK) in one city district can be regarded as a sample from the general population of families (with cooperative housing) served by this ZhEK, as a sample from the general population of families of the given district, as a sample from the general population of all families in the city, and, finally, as a sample from the general population of all families in the city living in cooperative houses. The meaningful interpretation of the survey results depends substantially on which general population the selected group of families is regarded as representing, i.e., for which general population this sample can be considered representative. The answer depends on many factors. In the example above, it depends in particular on whether there is a special (possibly hidden) factor that determines a family's affiliation with a given housing office or with the district as a whole (such a factor could be, for example, the family's average per capita income, the geographic location of the district in the city, the “age” of the district, etc.).


The entire set of individuals of a certain category is called the general population. The size of the general population is determined by the objectives of the study.

If a species of wild animals or plants is studied, the general population consists of all individuals of this species. In this case the size of the population is very large, and in calculations it is taken to be infinitely large.

If the effect of some agent on plants or animals of a certain category is studied, the general population consists of all plants or animals of the category (species, sex, age, economic purpose) to which the experimental objects belong. This is no longer such a large number of individuals, but it is still inaccessible to comprehensive study.

The general population is not always inaccessible to comprehensive study. Sometimes small populations are studied; for example, the average milk yield or the average wool clip of a group of animals assigned to a particular worker is determined. In such cases the general population consists of a very small number of individuals, all of whom are studied. Small populations also occur when plants or animals kept in a collection are studied in order to characterize a certain group within that collection.

Characteristics of group properties (the mean, the standard deviation, etc.) that refer to the entire general population are called general parameters.

A sample is a group of objects characterized by three features:

1. it is part of the general population;

2. it is selected at random in a certain way;

3. it is studied in order to characterize the entire general population.

In order to obtain a sufficiently accurate characteristic of the entire population from a sample, it is necessary to organize the correct selection of objects from the population.

Theory and practice have developed several systems for selecting individuals into a sample. All of them aim to give every object of the general population the fullest possible opportunity to be selected. Tendentiousness and bias in selecting objects for a sample study prevent correct general conclusions and make the results of the sample study non-indicative of the entire general population, i.e., unrepresentative.

To obtain a correct, undistorted characterization of the entire general population, one must strive to ensure that any object from any part of the population can be selected into the sample. The more variable the trait under study, the more strictly this basic requirement must be met. Understandably, when the variability approaches zero, as with hair or feather color in some species, any method of selection will produce representative results.

In various studies, the following methods of selecting objects in the sample are used.

1. Random repeated selection, in which the objects of study are selected from the general population without regard to the level of development of the trait under study, i.e., in an order that is random with respect to that trait; after selection, each object is studied and then returned to the population, so that any object can be selected again. This method of selection is equivalent to selection from an infinitely large general population, for which the main relationships between sample and general values have been derived.

2. Random non-repetitive selection, in which the objects, selected at random as in the previous method, are not returned to the general population and cannot enter the sample again. This is the most common way of organizing a sample; it is equivalent to selection from a large but finite general population, which is taken into account when general parameters are estimated from samples.

3. Mechanical selection, in which objects are selected from separate parts of the general population, these parts being designated beforehand mechanically: by squares of the experimental field, by random groups of animals taken from different areas occupied by the population, etc. Usually as many parts are marked out as objects are to be studied, so the number of parts equals the sample size. Mechanical selection is sometimes carried out by choosing individuals at a fixed spacing: for example, by passing animals through a chute and selecting every tenth, hundredth, etc. animal, by taking a mown sample plot every 100 or 200 m, or by selecting one object out of every 10, 100, etc. specimens encountered while going through the entire population.

4. Serial (cluster) selection, in which the general population is divided into parts, called series, some of which are studied in full. This method works well when the objects under study are distributed fairly evenly in a certain volume or over a certain territory. For example, when air or water contamination by microorganisms is studied, samples are taken and examined completely. Agricultural objects can also sometimes be surveyed by the cluster method. When studying the yield of meat and other processed products of a beef breed, the sample may include all animals of this breed delivered to two or three meat processing plants. When studying egg size in collective-farm poultry keeping, the trait can be studied over the entire chicken stock of several collective farms.
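The four selection schemes can be imitated with Python's standard `random` module. This is only an illustrative sketch: the population of numbered objects, the split into series, the function names, and the random starting position used for mechanical selection are assumptions, not part of the methods as described above. Mechanical selection is shown in its "every k-th object" variant.

```python
import random

def repeated_random(population, n, rng):
    """Random repeated selection: each object is returned after study, so it may be drawn again."""
    return rng.choices(population, k=n)

def nonrepeated_random(population, n, rng):
    """Random non-repetitive selection: a selected object cannot enter the sample again."""
    return rng.sample(population, n)

def mechanical(population, step, rng):
    """Mechanical selection: every step-th object, starting from a randomly chosen position."""
    start = rng.randrange(step)
    return population[start::step]

def serial(series, n_series, rng):
    """Serial (cluster) selection: n_series whole series are chosen and studied entirely."""
    chosen = rng.sample(series, n_series)
    return [obj for one_series in chosen for obj in one_series]

rng = random.Random(2024)                 # fixed seed so the run is reproducible
population = list(range(1, 101))          # hypothetical objects numbered 1..100
series = [population[i:i + 10] for i in range(0, 100, 10)]  # ten series of ten objects

print(repeated_random(population, 10, rng))     # repeats are possible
print(nonrepeated_random(population, 10, rng))  # ten distinct objects
print(mechanical(population, 10, rng))          # every tenth object
print(serial(series, 2, rng))                   # two complete series
```

Note that `choices` draws with replacement and `sample` without, which is exactly the difference between the first two schemes.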

Characteristics of group properties (the mean, the standard deviation, etc.) obtained for the sample are called sample indicators.

Representativeness

Direct study of a group of selected objects provides, first of all, primary material and characteristics of the sample itself.

All sample data and summary indicators are important as primary facts revealed by the study and deserve careful consideration, analysis and comparison with the results of other studies. But this does not exhaust the extraction of information contained in the primary research materials.

The fact that objects were selected for the sample using special methods and in sufficient quantity makes the results of the study of the sample indicative not only for the sample itself, but also for the entire population from which this sample was taken.

Under certain conditions a sample becomes a more or less accurate reflection of the entire general population. This property of a sample is called representativeness, meaning the ability to represent the population with a certain accuracy and reliability.

Like any property, the representativeness of sample data can be expressed to a sufficient or an insufficient degree. In the first case the sample yields reliable estimates of the general parameters; in the second, unreliable ones. It is important to remember that unreliable estimates do not diminish the value of the sample indicators for characterizing the sample itself. Reliable estimates widen the scope within which the results of a sample study can be applied.

Research usually begins with an assumption that has to be tested against facts. This assumption, a hypothesis, is formulated about a connection between phenomena or properties in a certain set of objects. To test such assumptions against facts, the corresponding properties must be measured in their bearers. But it is impossible to measure, for example, anxiety in all adolescents. Therefore research is limited to a relatively small group of representatives of the relevant populations of people.

A general population is the entire set of objects about which a research hypothesis is formulated. In theory, the size of the general population is considered unlimited. In practice, it is always limited and can vary depending on the object of observation and the task the psychologist has to solve. Typically a general population includes a very large number of objects: university students, schoolchildren, enterprise employees, pensioners, etc. A complete study of general populations is extremely difficult, so as a rule a small part of the general population is studied, called the sample population, or simply the sample.

A sample is a limited number of objects (in psychology, subjects or respondents) specially selected from the general population in order to study its properties. Accordingly, studying the properties of a general population by means of a sample is called a sample study. Almost all psychological studies are sample studies, and their conclusions are extended to general populations.

A number of mandatory requirements are imposed on the sample, determined primarily by the goals and objectives of the study. The sample should be such that generalizing the findings of the sample study, i.e., extending them to the general population, is justified.

The sample must satisfy the following conditions:



1. This is a group of objects available for study. The sample size is determined by the tasks and capabilities of observation and experiment.

2. It is part of a pre-designated population.

3. It is a group selected at random so that any item in the population has an equal chance of being included in the sample.

The main criteria for the validity of research findings are the representativeness of the sample and the statistical reliability of the (empirical) results.

The representativeness of a sample is its ability to characterize the corresponding general population with a certain accuracy and sufficient reliability. If the sample of subjects is representative of the general population in its characteristics, there is reason to extend the results obtained from it to the entire general population.

Ideally, a representative sample should be such that each of the main characteristics, traits, personality features, etc. studied by the psychologist is represented in it in the same proportion as in the general population.

Representativeness errors arise in two cases:

1. The sample is too small to characterize the general population.

2. The properties (parameters) of the sample do not match the parameters of the general population.

The statistical reliability, or statistical significance, of a study's results is determined using methods of statistical inference. These methods are discussed in more detail under the topic “Testing Hypotheses.” Note that they impose certain requirements on the sample size.

The largest sample size is required when developing a diagnostic technique - from 200 to 1000-2500 people.

If two samples are to be compared, their combined size must be at least 50 people, and the samples being compared should be of approximately equal size.

If the relationship between any properties is being studied, the sample size should be at least 30-35 people.

The greater the variability of the property under study, the larger the sample should be. Therefore, to reduce variability, one can make the sample more homogeneous, for example by gender or age. Naturally, this reduces the possibility of generalizing the conclusions.

Dependent and independent samples. A common research situation is one in which a property of interest is studied in two or more samples for subsequent comparison. These samples can be related to each other in different ways, depending on how they were formed. Independent samples are characterized by the fact that the probability of selecting any subject into one sample does not depend on which subjects were selected into the other. Dependent samples, on the contrary, are characterized by the fact that each subject in one sample is matched, by a certain criterion, with a subject in the other sample.

A typical example of independent samples is a comparison of men and women in terms of intelligence.

Population – the set of people about whom the sociologist seeks to obtain information in the study. The broader the research topic, the broader the population.

Sample population – a reduced model of the general population: the people to whom the sociologist distributes questionnaires, who are called respondents, and who are ultimately the object of the sociological study.

Who exactly is included in the general population is determined by the objectives of the study, and who is included in the sample population is decided by mathematical methods. If a sociologist intends to look at the Afghan war through the eyes of its participants, the general population will include all Afghan war veterans, but he will have to interview only a small part of them, the sample population. For the sample to reflect the general population accurately, the sociologist follows the rule: every Afghan war veteran, regardless of place of residence, place of work, health status and other circumstances, must have the same probability of being included in the sample population.

Once the sociologist has decided whom he wants to interview, he determines the sampling frame. Then the type of sampling is chosen.

Samples are divided into three large classes:

a) complete (censuses, referendums), in which all units of the population are surveyed;

b) random;

c) non-random.

Random and non-random types of sampling, in turn, are divided into several types.

Random ones include:

1) probabilistic;

2) systematic;

3) zoned (stratified);

4) cluster (nested).

Non-random ones include:

1) "spontaneous";

2) quota;

3) "main array" method.

A complete and accurate list of the units from which the sample is drawn forms the sampling frame. The elements intended for selection are called sampling units. Sampling units may coincide with units of observation; a unit of observation is an element of the general population from which information is collected directly. Typically the unit of observation is the individual. Selection from a list is best done by numbering the units and using a table of random numbers, although a quasi-random method is often used instead, in which every n-th element of a simple list is taken.

If the sampling frame is a list of sampling units, the sampling structure implies grouping them by some important characteristics, for example, the distribution of individuals by profession, qualification, gender or age. If, for example, the general population consists of 30% young, 50% middle-aged and 20% elderly people, the same percentage proportions of the three age groups must be maintained in the sample population. Age can be supplemented by class, gender, nationality, etc.; for each characteristic, percentage proportions are established in the general and sample populations. Thus, the sampling structure is the set of percentage proportions of the characteristics of the object on the basis of which the sample population is compiled.
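Keeping the sample's percentage proportions equal to those of the general population amounts to allocating the sample size across groups in proportion to their shares. The sketch below assumes the shares sum to 1 and uses largest-remainder rounding (an assumption; the text does not say how fractional sizes are rounded) so that the group sizes add up exactly to the sample size; `proportional_allocation` is a hypothetical name, and the 30/50/20 shares are the example figures from the paragraph above.

```python
def proportional_allocation(shares, n):
    """Split a sample of size n across groups in proportion to their population shares.

    Assumes the shares sum to 1. Largest-remainder rounding keeps the total equal to n.
    """
    raw = {group: share * n for group, share in shares.items()}
    sizes = {group: int(value) for group, value in raw.items()}  # round down first
    leftover = n - sum(sizes.values())
    # give the remaining units to the groups with the largest fractional parts
    by_remainder = sorted(raw, key=lambda g: raw[g] - sizes[g], reverse=True)
    for group in by_remainder[:leftover]:
        sizes[group] += 1
    return sizes

shares = {"young": 0.30, "middle-aged": 0.50, "elderly": 0.20}
print(proportional_allocation(shares, 1000))  # {'young': 300, 'middle-aged': 500, 'elderly': 200}
print(proportional_allocation(shares, 101))   # {'young': 30, 'middle-aged': 51, 'elderly': 20}
```

Simple rounding of each group separately can make the total differ from n by one or two units; the largest-remainder step avoids that.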

While sample type tells us how people are included in the sample, sample size tells us how many people are included.

Sample size – the number of units in the sample population. Since the sample population is a part of the general population selected by special methods, its size is always smaller than that of the general population. This is why it is so important that the part does not distort the picture of the whole, i.e., that it represents it.

The reliability of the data is influenced not by the quantitative characteristics of the sample population (its size) but by the qualitative characteristics of the general population, namely its degree of homogeneity. The discrepancy between the general population and the sample population is called the representativeness error; the permissible deviation is 5%.

Here are some ways to avoid the error:

    each unit in the population should have an equal probability of being included in the sample;

    it is advisable to select from homogeneous populations;

    you need to know the characteristics of the population;

    when compiling a sample population, random and systematic errors must be taken into account.

If the sample population (sample) is drawn up correctly, then the sociologist obtains reliable results that characterize the entire population.

What are the main sampling methods?

Mechanical sampling method: the required number of respondents is selected from the general list of the general population at regular intervals (for example, every 10th person).

Serial sampling method: the general population is divided into homogeneous parts, and units of analysis are selected from each part in proportion to its size (for example, 20% of the men and 20% of the women at an enterprise).

Cluster sampling method: the selection units are not individual respondents but groups, which are then surveyed in full. Such a sample is representative if the groups are similar in composition (for example, one group of students from each stream of a university department).

Main array method: a survey of 60–70% of the general population.

Quota sampling method: the most complex method, requiring at least four characteristics by which respondents are selected. It is usually used with a large general population.

As a result of studying the material in Chapter 2, the student should:

know

  • basic concepts of general and sample populations;
  • estimation methods, types and properties of estimates of parameters of the general population;
  • basic methods for statistical testing of hypotheses regarding the parameters of univariate and multivariate populations;

be able to

  • find estimates of the parameters of unidimensional and multidimensional general populations using sample data;
  • analyze the properties of parameter estimates;
  • test hypotheses regarding the parameters and type of distribution of the population;
  • compare parameters of several general populations;

possess

  • skills in the statistical estimation of parameters of one-dimensional and multidimensional general populations;
  • skills in testing hypotheses regarding the parameters and type of distribution of the population when conducting socio-economic research using analytical software.

Population distribution

Probabilistic-statistical methods of data analysis assume that the patterns obeyed by the variable under study (a random variable) are completely determined by the set of conditions under which it is observed. Mathematically, these patterns are specified by the corresponding probability distribution law. In statistical research, however, the concept of a general population is more convenient.

Thus, the mathematical concepts “general population”, “random variable” and “law of probability distribution”, corresponding to a given set of conditions, can be considered in a certain sense synonymous.

A general population is the set of all conceivable observations that could be made under a given set of conditions.

Since the definition deals with mentally possible observations (or objects), the general population is an abstract concept and should not be confused with the real populations subjected to statistical research. Thus, even having examined all the enterprises of a sub-industry, we can regard them only as representatives of a hypothetically possible broader set of enterprises that could operate under the same set of conditions.

A general population can be either finite or infinite. A finite general population arises, for example, in a survey of family budgets, when a sample is drawn from the set of families actually living in the country and the income and expenditure of the selected families are then observed. An infinite general population arises, for example, in scientific research when we are interested in the average result of a large number of experiments.

In the simplest case, the general population is a one-dimensional random variable $X$ with distribution function $F(x) = P(X < x)$, which gives the probability that $X$ takes a value less than a fixed real number $x$.

More generally, one studies general populations that include several characteristics (usually more than two). The set of characteristics under consideration is represented by a vector $X = (X_1, X_2, \dots, X_k)^{T}$ with $k$ components, each of which describes the corresponding characteristic. Multivariate statistical methods are used to analyze the vector $X$.

Thus, the object of study in multivariate analysis is a random vector $X$, which can equally be regarded as a random point in $k$-dimensional Euclidean space, a system of $k$ random (one-dimensional) variables, or a $k$-dimensional random variable.

The distribution function of a random vector $X$ is the deterministic non-negative function defined by

$$F(x) = P(X_1 < x_1,\ X_2 < x_2,\ \dots,\ X_k < x_k),$$

where $x = (x_1, x_2, \dots, x_k)^{T}$ is a $k$-dimensional vector of fixed real numbers.
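For a discrete k-dimensional variable, this definition can be evaluated directly by summing the probabilities of all points whose every coordinate is strictly below the corresponding $x_i$. The sketch below assumes the distribution is stored as a dictionary mapping points to exact probabilities; the function name `joint_cdf` and the two-coin example are hypothetical.

```python
from fractions import Fraction

def joint_cdf(pmf, x):
    """F(x) = P(X_1 < x_1, ..., X_k < x_k) for a discrete k-dimensional variable.

    pmf maps each possible point (a tuple of k values) to its probability.
    """
    return sum(
        (p for point, p in pmf.items() if all(v < xi for v, xi in zip(point, x))),
        Fraction(0),
    )

# Hypothetical example: two independent fair coins, coded 0 and 1
q = Fraction(1, 4)
pmf = {(0, 0): q, (0, 1): q, (1, 0): q, (1, 1): q}
print(joint_cdf(pmf, (1, 1)))   # 1/4: only (0, 0) has both coordinates below 1
print(joint_cdf(pmf, (2, 2)))   # 1: every point qualifies
```

The strict inequalities follow the convention $F(x) = P(X < x)$ used in the text; with the other common convention ($\le$) the comparison would change accordingly.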

There are:

  • continuous k-dimensional random variables, all components of which are continuous (one-dimensional) random variables;
  • discrete k-dimensional random variables, all components of which are discrete random variables;
  • mixed k-dimensional random variables, among the components of which there are both discrete and continuous random variables.

The distribution function $F(x)$ of a continuous $k$-dimensional random variable is continuous by definition.

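The probability density of a continuous $k$-dimensional random variable is the function $f(x)$ satisfying the condition

$$F(x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f(t_1, \dots, t_k)\, dt_1 \cdots dt_k.$$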

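The density $f(x)$ has the following properties.

The volume bounded above by the graph of the density always equals one:

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \dots, x_k)\, dx_1 \cdots dx_k = 1,$$

where $k$ denotes the total number (multiplicity) of the integrals.

The probability that the point $(X_1, \dots, X_k)$ falls into some region $G$ equals

$$P\{(X_1, \dots, X_k) \in G\} = \int \cdots \int_{G} f(x_1, \dots, x_k)\, dx_1 \cdots dx_k.$$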
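Both properties can be checked numerically for a simple two-dimensional density. The sketch assumes the hypothetical joint density $f(x_1, x_2) = x_1 + x_2$ on the unit square (zero elsewhere) and approximates the double integrals with a midpoint rule; the region $G = \{x_1 + x_2 < 1\}$ is likewise an arbitrary choice, for which the exact probability is 1/3.

```python
def f(x1, x2):
    """Hypothetical joint density: x1 + x2 on the unit square, zero elsewhere."""
    if 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0:
        return x1 + x2
    return 0.0

def integrate_unit_square(g, m=200):
    """Midpoint-rule approximation of the double integral of g over [0, 1] x [0, 1].

    Integrating over the unit square suffices here because f vanishes outside it.
    """
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        x1 = (i + 0.5) * h
        for j in range(m):
            total += g(x1, (j + 0.5) * h)
    return total * h * h

# Normalization: the volume under the density surface should be 1
total = integrate_unit_square(f)
# Probability of hitting G = {x1 + x2 < 1}: integrate f times the indicator of G
p_G = integrate_unit_square(lambda x1, x2: f(x1, x2) if x1 + x2 < 1.0 else 0.0)
print(round(total, 6), round(p_G, 3))
```

The midpoint rule is exact for this linear density on the whole square, while the result for $G$ carries a small discretization error because the region's boundary cuts through grid cells.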

It follows from the definition of density that if the joint density of two variables $X_1, X_2$ is integrated over one of them within infinite limits, the result is the probability density of the other variable:

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2.$$

Similarly,

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_1.$$
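The marginal-density formula can be illustrated with a hypothetical joint density $f(x_1, x_2) = x_1 + x_2$ on the unit square (zero elsewhere), whose exact marginal is $f_1(x_1) = x_1 + 1/2$ for $0 \le x_1 \le 1$. The sketch integrates over $[0, 1]$ instead of the whole real line, which is valid only because this assumed density vanishes outside the square; `marginal_x1` is a hypothetical name.

```python
def f(x1, x2):
    """Hypothetical joint density: x1 + x2 on the unit square, zero elsewhere."""
    if 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0:
        return x1 + x2
    return 0.0

def marginal_x1(x1, m=1000):
    """f_1(x1) = integral of f(x1, x2) over x2, via the midpoint rule on [0, 1]."""
    h = 1.0 / m
    return sum(f(x1, (j + 0.5) * h) for j in range(m)) * h

for x1 in (0.0, 0.25, 0.5, 1.0):
    print(x1, marginal_x1(x1), x1 + 0.5)  # numerical vs exact marginal density
```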

The probability densities and distribution functions of subsystems of a system of $k$ random variables (including its individual random variables) are called partial, or marginal, distributions.

Conditional distributions of a random vector $X$ are the distributions of a subsystem (or of individual components) given that the remaining components are fixed. The fixed components will be separated from the non-fixed ones by a vertical bar.

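For a continuous random variable, for example, the following formulas hold; they give the density of the conditional distribution of the two-dimensional random variable $(X_1, X_2)$, a subsystem of the system $(X_1, X_2, X_3, X_4, X_5)$, given that the last three components are fixed:

$$f(x_1, x_2 \mid x_3, x_4, x_5) = \frac{f(x_1, x_2, x_3, x_4, x_5)}{f(x_3, x_4, x_5)},$$

$$f(x_3, x_4, x_5) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, x_2, x_3, x_4, x_5)\, dx_1\, dx_2.$$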

A subsystem of components of the vector $X$, say $(X_1, \dots, X_l)$, and the complementary subsystem $(X_{l+1}, \dots, X_k)$ are called (stochastically, probabilistically) independent if

$$F(x_1, \dots, x_k) = F(x_1, \dots, x_l)\, F(x_{l+1}, \dots, x_k).$$

In particular, the components of the vector $X$ are called mutually independent if

$$F(x_1, \dots, x_k) = F_1(x_1)\, F_2(x_2) \cdots F_k(x_k).$$

In the case of independence, analogous formulas hold for the densities (or, for discrete variables, the probabilities) of the marginal distributions, which factor into products, and the conditional distributions coincide with the corresponding marginal ones.
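For discrete random vectors, mutual independence of the components is equivalent to the joint probabilities factoring into the product of the one-dimensional marginal probabilities at every point, which is easy to check mechanically. The sketch assumes exact probabilities stored as `Fraction` values in a dictionary; the function names and both example distributions are hypothetical.

```python
import itertools
from collections import defaultdict
from fractions import Fraction

def marginal_pmfs(pmf):
    """One-dimensional marginal pmfs of every component of a discrete random vector."""
    k = len(next(iter(pmf)))
    margins = [defaultdict(Fraction) for _ in range(k)]
    for point, p in pmf.items():
        for i, v in enumerate(point):
            margins[i][v] += p
    return margins

def mutually_independent(pmf):
    """True if the joint pmf equals the product of the marginal pmfs at every point."""
    margins = marginal_pmfs(pmf)
    supports = [sorted(m) for m in margins]
    # check every combination of possible values, including points with zero joint probability
    for point in itertools.product(*supports):
        product = Fraction(1)
        for i, v in enumerate(point):
            product *= margins[i][v]
        if pmf.get(point, Fraction(0)) != product:
            return False
    return True

coins = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}  # two fair coins
tied = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}           # X2 always equals X1
print(mutually_independent(coins), mutually_independent(tied))   # True False
```

In the second example both marginals are uniform, yet the joint probabilities do not factor, so the components are dependent.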


2024 liveps.ru. Homework and ready-made problems in chemistry and biology.