The number of hours high school students spend on homework. Calculating correlation dependencies in Microsoft Excel

First graders get no homework at all; in grades 2-3 the limit is one and a half hours, in grades 4-5 two hours, in grades 6-8 two and a half hours, and from the 9th to the 11th grade a student should spend no more than 3.5 hours a day on homework. In addition, difficult subjects, in which schools usually assign a lot, should not be scheduled on the same day. Simply put, chemistry, biology, physics, and mathematics cannot all fall on one day.

The same thing, by the way, is stated in the SanPiN rules that entered into force in 2011. Why was it necessary to remind schools again? The Ministry of Education and Science told "RG" that there have recently been many complaints from parents about the large amount of homework and the workload at school.

The ministry has issued a draft amendment to the procedure for organizing and carrying out educational activities, writing the SanPiN-approved requirements into it so that schools pay special attention to the permissible workload, the Ministry of Education and Science explained to "RG".

There really are problems with homework. Schools compete with one another for grants, subsidies, exam results, and places in the ratings. This year, mandatory monitoring will be held even in elementary school. Naturally, no one wants to end up among the laggards. Teachers have tripled their efforts in lessons and added homework.

There are schools where the after-school group charges money for doing homework with the children and treats it as an additional service.

"Sometimes a teacher thinks: the more he assigns, the better. Homework is needed, but it needs to be regulated. There are plenty of teachers who overload children. A child should not sit over lessons for six hours a day!" says Boris Davidovich, deputy director of the famous Moscow School N57. "But I would define homework not by time but by the volume of material: if four examples were done in class, no more than six should be assigned for home."

At the 57th Physics and Mathematics School, a student may work on a home assignment for a whole day or a week. "And sometimes a whole lifetime!" jokes Boris Mikhailovich.

"We switched to a five-day week to lighten the children's load," says Irina Golikova, director of School N17 in Bryansk. "We try to assign less for the weekend. The second shift studies until seven in the evening, and those children usually do their lessons in the morning." "Do they manage without their parents?" I ask. "How does this affect their performance?" "We are one of the best schools!"

Is homework really needed? A dispute on this subject already flared up once, in the 19th century, but homework survived. In 1917 it was abolished, and it was brought back in the 1930s. Now some experts are asking again: if we are talking about a transition to a full-day school and abandoning the rigid framework of the lesson, maybe it really is time to abolish homework as well?

Irina Ilina, a mother and professor at RGSU:

My son spends about as much time on his lessons as some university students spend preparing for a seminar: about two hours. But a seminar is once a week, and lessons are every day.

"Darwin's theory of evolution made a great impression on Galton, especially the idea that individuals belonging to one species differ from one another. Individual traits that contribute to survival are subject to "natural selection" and are passed on to descendants. Galton believed that intelligence is a trait that varies from person to person, is important for survival, and is inherited in the same way as physical characteristics such as eye color or height. He collected facts confirming the heritability of intelligence and published two books on the subject: "Hereditary Genius" (1869) and "English Men of Science: Their Nature and Nurture" (1874). The latter work popularized the well-known terms "nature" and "nurture". In his works, Galton noted a statistical tendency for genius and for abilities that show themselves in particular fields (for example, an aptitude for chemistry or law) to be traceable across several generations within a family. However, he underestimated the influence of the environment and concluded that genius arises through the transmission of hereditary information. He supported his view, in particular, with the fact that intelligence is normally distributed in the population. Other inherited traits (for example, height) are also normally distributed, and so Galton took this statistical fact as an indicator of the influence of heredity.

It was only in 1888 that the scientist managed to express how strongly traits such as genius tend to run in families: he set out his ideas in a paper titled "Co-relations and Their Measurement". First, Galton discovered that data can be organized in a special way in rows and columns, which gave him a prototype of today's scatterplot. Second, Galton noticed that when the "co-relation" was imperfect, a regularity emerged. Parents of above-average height had tall children, but quite often the children were not as tall as the mother and father. Parents of below-average height had short children, but not quite as short. This means that children's height tends to shift, or regress, toward the arithmetic mean of the population.

The phenomenon of "regression to the mean", which poses a threat to the internal validity of research, is one of Galton's most prominent discoveries.

Galton's third observation was that a plot of the arithmetic means of each column of the scatter table gives a more or less straight line. In effect, this is a kind of "regression line". Thus, Galton discovered the main features of correlation analysis.

After reading Galton's work, Karl Pearson continued the research in this area and developed a formula for calculating the correlation coefficient. He denoted the coefficient by the letter "r", which stands for "regression", in honor of Galton's discovery of regression to the mean. Following Galton, Pearson believed that correlation analysis confirms the idea that many traits that run in families are inherited." (Quoted from: Goodwin J., Research in Psychology. Piter, 2004, pp. 312-313).

Variables are said to correlate if there is some relationship between them. This follows from the term "correlation" itself: a mutual relationship. In a direct, or positive, correlation, high values of one variable are associated with high values of the other, and low values of the first with low values of the second. A negative correlation means an inverse relationship: high values of one variable are associated with low values of the other, and vice versa.

The relationship between time devoted to studying and grades is an example of a positive correlation. An example of a negative correlation might be the relationship between wasted time and the average grade. Wasted time can be operationally defined as the number of hours per week spent on particular activities, such as video games or watching TV series.

The strength of a correlation is expressed by a special descriptive statistic, the "correlation coefficient". The correlation coefficient equals -1.00 for a perfect negative correlation, 0.00 when there is no relationship, and +1.00 for a perfect positive correlation. The most common correlation coefficient is Pearson's r. Pearson's r is calculated for data obtained on interval or ratio scales. For other measurement scales, other types of correlation are used. For example, for ordinal (i.e. ranked) data, Spearman's ρ (rho) is calculated (statisticians also denote this statistic r_s).
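To make the range of the coefficient concrete, here is a minimal sketch of how Pearson's r could be computed by hand. The function name pearson_r and all of the data are assumptions made up for illustration; they do not come from the quoted textbook.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: weekly study hours, weekly wasted hours, average grades.
study_hours = [10, 20, 30, 40, 50]
wasted_hours = [50, 40, 30, 20, 10]
grades = [2.5, 3.0, 3.5, 4.0, 4.5]

r_study = pearson_r(study_hours, grades)    # grades rise linearly with study time
r_wasted = pearson_r(wasted_hours, grades)  # grades fall linearly with wasted time
print(round(r_study, 2), round(r_wasted, 2))
```

Because the invented grades lie exactly on a straight line, the two results sit at the extremes of the scale; real data would fall somewhere in between.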

Like the arithmetic mean and the standard deviation, the correlation coefficient is a descriptive statistic. In inferential analysis, one determines whether a particular correlation is significantly greater (or less) than zero. Thus, for correlational studies the null hypothesis (H0) states that the true value of r = 0 (i.e. there is no relationship), and the alternative hypothesis (H1) states that r ≠ 0. Rejecting the null hypothesis means deciding that there is a significant relationship between the two variables.
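One common way to carry out this test, which the quoted text does not spell out, is to convert r into a t statistic with n - 2 degrees of freedom. The sketch below assumes that approach; the sample values and the tabulated critical value are assumptions chosen for illustration.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing H0: r = 0 with n paired observations."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical study: r = 0.50 observed for n = 25 participants.
r, n = 0.50, 25
t = t_statistic(r, n)

# Assumed two-tailed critical value of Student's t for alpha = 0.05 and
# n - 2 = 23 degrees of freedom, taken from a standard t table.
T_CRITICAL = 2.069

reject_h0 = abs(t) > T_CRITICAL
print(round(t, 2), reject_h0)
```

With these assumed numbers the statistic exceeds the critical value, so the null hypothesis would be rejected; a smaller sample with the same r might not allow that.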

Scatterplot

The strength of a correlation can be seen by examining a scatterplot, a graphical display of the relationship that the correlation indicates. With a perfect positive or perfect negative correlation, the points form a straight line, while a zero correlation gives a scatterplot (a) whose points are randomly distributed. Compared with moderate correlations (d and e), the points for strong correlations lie closer together (b and c). In general, as a correlation weakens, the points of the scatterplot move farther away from the diagonal line on which they fall when the correlation equals +1.00 or -1.00.

Fig. Scatterplots: a) r = 0; b) r = -0.9; c) r = +0.9; d) r = -0.56; e) r = +0.61

The scatterplots considered above (except a) are approximated by straight lines, that is, they reflect linear relationships. However, not all relationships are linear, and computing Pearson's r for a nonlinear case will not help reveal the nature of the relationship. The following figure shows a hypothetical example of the relationship between arousal and task performance that illustrates the Yerkes-Dodson law: complex tasks are performed well at a moderate level of arousal, but poorly at very low and very high levels. The scatterplot shows that the points fall along a definite curve, but an attempt to apply linear correlation yields an r close to zero.
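The effect can be reproduced with a small sketch. The arousal and performance numbers below are invented to follow an inverted U shape and are not data from the Yerkes-Dodson research; pearson_r is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical arousal levels from 1 (very low) to 9 (very high).
arousal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Invented performance scores: best at moderate arousal, poor at both extremes.
performance = [16 - (a - 5) ** 2 for a in arousal]

r = pearson_r(arousal, performance)
print(performance)
print(abs(r) < 1e-9)  # the linear correlation is essentially zero
```

Performance here is completely determined by arousal, yet the linear coefficient does not detect it, which is why a scatterplot should be inspected before r is interpreted.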

When conducting correlational research, it is important to include people whose scores span a wide range. Restricting the range of one or both variables lowers the correlation. Suppose we study the relationship between the average grade on the school certificate and academic performance at university (assessed by the average grade that first-year students earn at the end of the year). Fig. a) shows what the scatterplot might look like for a study of 25 students. The correlation coefficient is +0.87. But if this relationship is examined only among students whose average school grade was 4.5 or higher, the correlation changes: it drops to +0.27.

Fig. Scatterplots: a) full range, r = 0.87; b) restricted range, r = 0.27
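The drop in correlation under range restriction can be imitated with made-up numbers. The following sketch uses invented school and university averages (not the 25 students from the figure) and a hypothetical pearson_r helper; only the direction of the effect is meant to carry over.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical school averages from 3.0 to 5.0 in steps of 0.1.
school = [(30 + i) / 10 for i in range(21)]
# Hypothetical university averages: somewhat lower than the school average,
# with a repeating invented deviation standing in for all other influences.
deviation = [0.3, -0.3, 0.1, -0.1]
university = [s - 0.4 + deviation[i % 4] for i, s in enumerate(school)]

r_full = pearson_r(school, university)

# Restrict the range: keep only students with a school average of 4.5 or more.
top = [(s, u) for s, u in zip(school, university) if s >= 4.5]
r_restricted = pearson_r([s for s, _ in top], [u for _, u in top])

print(round(r_full, 2), round(r_restricted, 2))
print(r_restricted < r_full)
```

The deviations are the same size in both samples; only the spread of school averages shrinks, and that alone is enough to lower the coefficient.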

Coefficient of determination: r²

It is important to keep in mind that the meaning of a particular value of Pearson's r is quite easy to misunderstand. If it equals +0.70, the relationship really is relatively strong, but do not think that +0.70 is somehow connected with 70% and that the relationship is therefore 70% established. That is not so. To interpret a correlation value, one should use the coefficient of determination (r²). It is obtained by squaring r, and so its value is never negative. This coefficient is formally defined as the proportion of variability in one of the correlated variables that is attributable to variability in the other. Let us explain this with a specific example.

A study is conducted in which the level of emotional depression and the average grade are measured for 100 participants. We check the relationship between the two variables and find a negative correlation: the higher the level of depression, the lower the average grade, and vice versa, the weaker the depression, the higher the average grade. Consider two correlation values that could be obtained in this study: -1.00 and -0.50. The coefficient of determination will equal 1.00 and 0.25, respectively. To understand what these values mean, note first that the average grades of the 100 people studied are likely to vary from 3.0 to 5.0. As researchers, we want to find out the cause of this variability: why does one person get 3.2 and another 4.4, and so on? In other words, we want to know what causes individual differences in average grades. In reality, several factors may be responsible: study habits, general level of intelligence, emotional stability, a tendency to choose easy subjects, etc. As the depression test scores indicate, our hypothetical study examines one of these factors: emotional stability. r² shows how much of the variability in average grades can be directly associated with depression.

In the first case, when r = -1.00 and r² = 1.00, we can conclude that 100% of the variability in average grades is associated with variability in depression scores. It can therefore be said that 100% of the differences between average grades (3.2 and 4.4, etc.) are caused by depression. In a real study, of course, such a result cannot be obtained. In the second case, when r = -0.5 and r² = 0.25, only one quarter (25%) of the variability in average grades is associated with depression. The remaining 75% is associated with other factors like those listed above.
In short, the coefficient of determination characterizes the strength of a relationship better than Pearson's r does.
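The arithmetic behind these percentages is short enough to check directly. In the sketch below the function name is an assumption; the three r values are the ones discussed in the text.

```python
def coefficient_of_determination(r):
    """Square of the correlation coefficient; never negative."""
    return r ** 2

# The two values from the depression example, plus the +0.70 case.
for r in (-1.00, -0.50, 0.70):
    r2 = coefficient_of_determination(r)
    print(f"r = {r:+.2f}   r^2 = {r2:.2f}   shared variability = {r2:.0%}")
```

Note that r = +0.70 corresponds to about 49% shared variability, not 70%.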

Regression analysis: making predictions

The most important feature of correlational research is that a strong correlation makes it possible to make predictions about future behavior. A correlation between two variables allows the value of one to be predicted from the value of the other. This is easy to show with the average-grade example. If we know that study time and the average grade are correlated, and that someone studies 45 hours a week, we can predict with reasonable confidence a relatively high average grade for that student. Similarly, a high average grade lets us predict the time devoted to study. Making predictions on the basis of correlational research is called regression analysis.

The figure shows scatterplots for: a) study time and average grade and b) wasted time and average grade. Each plot also shows a regression line, which is used to make predictions. The regression line is also called the "line of best fit": it provides the best possible way of summarizing the points of the scatterplot. This means that the sum of the squared vertical distances between the points of the plot and the regression line is as small as possible.

The regression line is given by the formula Y = a + bX, where a is the point at which the line crosses the Y axis (i.e. the Y-intercept) and b is the slope of the line, or its relative steepness. X is the known value, and Y is the value we are trying to predict. From 1) the strength of the correlation and 2) the standard deviations of the correlated variables, one can calculate b; knowing 1) the value of b and 2) the means of the correlated variables, one can find a.
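The recipe just described can be sketched as follows, under the usual least-squares assumption that b = r · (s_Y / s_X) and a = mean(Y) - b · mean(X). The function names and the data are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def regression_line(xs, ys):
    """Return (a, b) for Y = a + bX, built from r, the SDs and the means."""
    r = pearson_r(xs, ys)
    b = r * stdev(ys) / stdev(xs)   # slope from r and the standard deviations
    a = mean(ys) - b * mean(xs)     # intercept from the slope and the means
    return a, b

# Hypothetical data: weekly study hours and average grades of five students.
hours = [10, 20, 30, 40, 50]
grades = [2.4, 3.1, 3.3, 4.1, 4.4]

a, b = regression_line(hours, grades)
predicted = a + b * 45  # predicted average grade for 45 hours of study per week
print(round(a, 2), round(b, 3), round(predicted, 2))
```

The same a and b would result from fitting the line directly by least squares; going through r simply mirrors the order of steps given in the text.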

In regression analysis, the regression equation is used to predict the value of Y (for example, the average grade) from the value of X (for example, study time). Y is sometimes called the criterion variable and X the predictor variable. However, to make accurate predictions, the correlation must be significantly greater than zero. The higher the correlation, the closer the scatterplot points lie to the regression line and the greater the confidence that the predictions are correct. Thus the previously noted problem of range restriction, which lowers the correlation, also reduces the accuracy of predictions.

The graph of the regression equation shows how to make predictions using the regression line.

For example, what average grade should be expected for a student who studies 34 hours a week? To get the answer, draw a perpendicular from the X axis to the regression line, and then from the point of intersection to the Y axis. The value at that point on the Y axis is the predicted value (remember that the correctness of the prediction depends on the strength of the correlation). Thus, a study time of 40 hours predicts an average grade of 3.4, and 41 wasted hours predicts an average grade slightly above 2.3. Using the regression formula, more exact values can be calculated and more accurate predictions made.

It should be noted that regression analysis is used in most of the studies we learn about from the media.

For example, we may come across a study of "heart attack risk factors" in which, on the basis of a significant correlation between smoking and heart disease, it is concluded that heavy smokers are more likely to develop cardiovascular disease than non-smokers. This means that smoking serves as a basis for predicting the development of heart disease. From another study, devoted to the "portrait of an abusive spouse", it may be concluded that the likelihood of such behavior increases if the abuser is unemployed. This follows from the correlation between unemployment and a tendency toward abusive behavior. Given a correlation, regression analysis makes it possible, knowing the first variable, to make a prediction about the second.


Computer workshop

Work 15. Calculation of correlation dependencies in MS Excel

Objectives:

To gain an understanding of the correlation dependence of quantities;

To master the method of calculating the correlation coefficient using the CORREL function.

Software used: the MS Excel spreadsheet application.

Task 1

The following table contains data on paired measurements of two quantities made at a certain school: the air temperature in the classroom, x, and the proportion of students with colds, y:

The dependence is statistical, since one cannot reliably say, for example, that at a temperature of 15 °C 5% of the students at the school are ill, and at 20 °C 2%. Besides temperature, there are other factors that affect colds; they differ from school to school, and it is impossible to control them all.

Do the following in sequence:

=> enter the data into Excel as presented in Fig. 2.12 (see topic 9);

=> using the Chart Wizard, build a scatter chart that visually displays the tabular dependence;

=> answer the question of whether, on the basis of this scatter chart, a hypothesis of a linear correlation between the quantities can be put forward;

=> if the answer is clearly negative, correct the table so that the hypothesis of a linear correlation becomes more plausible;

=> using the CORREL function, find the correlation coefficient and confirm or refute the hypothesis.


Task 2.

Come up with a table of paired measurements of some quantities between which a hypothetical correlation dependence exists. Analyze this dependence for the presence of a linear correlation.

Examples of suitable related quantities include:

level of education (measured, for example, in total years of study) and level of monthly income;

level of education and level of the position held (devise a conventional scale for the latter);

the number of computers per student in a school and the average score on a test of proficiency in standard information-processing technologies;

the number of hours high school students spend on homework and their average grade;

the amount of fertilizer applied to the soil and the yield of a particular crop.

Semakin I.G., Henner E.K., Informatics and ICT, Grade 11


Correlation dependencies

Regression mathematical models are built when it is known that a relationship between two factors exists and its mathematical description is required. Now we will consider problems of another kind. Let some important characteristic of a complex system be factor A. Many other factors may influence it at the same time: B, C, D, and so on.

We will consider two types of problems, in which it must be determined:

1. whether factor B has any noticeable regular effect on factor A;

2. which of the factors B, C, D, and so on has the greatest effect on factor A.

As an example of a complex system, we will consider a school. For the first type of problem, let factor A be the average academic performance of the school's students, and factor B the school's financial expenditure on maintenance needs: building repairs, furniture renewal, the aesthetic design of the premises, etc. Here the influence of factor B on factor A is not obvious. Other causes probably affect performance much more strongly: the qualifications of the teachers, the student body, the level of technical teaching aids, and others.

Statisticians know that to reveal a dependence on one particular factor, the influence of other factors must be excluded as far as possible. Simply put, when collecting information from different schools, one should choose schools with approximately the same student body, teacher qualifications, etc., but with different maintenance expenditures (some schools may have wealthy sponsors, others none).


So, let the school's maintenance expenditures be expressed as the number of rubles spent over a certain period (for example, the past 5 years) divided by the number of students in the school (rub/person). Let academic performance be measured by the students' average grade at the end of the last school year. Once again, note that statistical calculations usually use relative and averaged quantities.

The results of data collection from 20 schools, entered into a spreadsheet, are presented in Fig. 1. Fig. 2 shows a scatter chart built from these data.

Fig. 1. Statistical data

Fig. 2. Scatter chart

The values of both quantities, financial expenditures and student performance, are widely scattered, and at first glance no relationship between them is visible. However, one may well exist.

Dependencies between quantities, each of which is subject to variation that cannot be fully controlled, are called correlation dependencies.

The branch of mathematical statistics that studies such dependencies is called correlation analysis. Correlation analysis studies the averaged law of behavior of each quantity depending on the values of the other, as well as the measure of that dependence.

The assessment of the correlation between quantities begins with a hypothesis about the possible nature of the relationship between their values. Most often a linear dependence is assumed. In that case, the measure of correlation is a quantity called the correlation coefficient. As before, we will not write out the formulas by which it is calculated; they are not hard to write down, but it is much harder to understand why they are what they are. At this stage you need to know the following:

· the correlation coefficient (usually denoted by the Greek letter ρ) is a number in the range from -1 to +1;

· if the absolute value of this number is close to 1, the correlation is strong; if it is close to 0, the correlation is weak;

· closeness of ρ to +1 means that an increase in one set of values corresponds to an increase in the other set; closeness to -1 means the opposite;

· the value of ρ is easy to find using Excel (built-in statistical functions).

In Excel, the function for calculating the correlation coefficient is called CORREL and belongs to the group of statistical functions. Let us show how to use it. On the same Excel sheet that holds the table shown in Fig. 1, place the cursor in any free cell and run the CORREL function. It will ask for two ranges of values. Specify the expenditures and the academic performance. After they are entered, the answer is displayed: ρ = 0,. This value indicates a moderate level of correlation.
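For readers who want to check the spreadsheet result outside Excel, here is a rough Python counterpart of the call CORREL(array1, array2). The function name correl and the six-school table are assumptions invented for the sketch; they are not the 20 schools of Fig. 1.

```python
from statistics import mean, pstdev

def correl(array1, array2):
    """Linear correlation coefficient of two equal-length ranges of numbers."""
    if len(array1) != len(array2):
        raise ValueError("both ranges must contain the same number of values")
    m1, m2 = mean(array1), mean(array2)
    # Covariance divided by the product of the standard deviations.
    covariance = mean([(a - m1) * (b - m2) for a, b in zip(array1, array2)])
    return covariance / (pstdev(array1) * pstdev(array2))

# Hypothetical table: expenditures (rub/person) and average grade in six schools.
costs = [50, 80, 120, 150, 200, 260]
performance = [3.8, 3.6, 4.1, 3.9, 4.3, 4.2]

rho = correl(costs, performance)
print(round(rho, 2))
```

The two lists play the role of the two ranges that the CORREL dialog asks for; the order in which they are passed does not change the result.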

The existence of a relationship between a school's maintenance expenditures and academic performance is not hard to explain. Pupils are happy to go to a clean, beautiful, cozy school, feel at home there, and therefore learn better.

In the next example, a study is conducted to determine how the academic performance of high school students depends on two factors: how well the school library is supplied with textbooks and how well the school is supplied with computers. Both characteristics are expressed quantitatively as a percentage of the norm. The norm for textbooks is a full set, that is, a quantity such that every student is issued all the books needed from the library. The norm for computers will be taken as one computer for every four high school students. It is assumed that computers are used not only in computer science lessons but also in other lessons, as well as after school hours.


The table shown in Fig. 3 gives the results of measuring both factors in 11 different schools. Recall that the influence of each factor is studied independently of the others (that is, the influence of the other essential factors should be approximately the same).

For both dependencies, the coefficients of linear correlation were obtained. As the table shows, the correlation between textbook supply and performance is stronger than the correlation between computer supply and performance (although neither correlation coefficient is very large). From this we can conclude that, for now, the book remains a more significant source of knowledge than the computer.
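Choosing the more significant of several factors comes down to comparing the absolute values of their correlation coefficients. The sketch below shows one way to do that; the five-school data are invented for illustration and are not the values from Fig. 3, and pearson_r is a hypothetical helper.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurements in five schools (supply as a percentage of the norm).
textbooks = [50, 60, 70, 80, 90]
computers = [40, 90, 30, 80, 60]
performance = [3.0, 3.4, 3.6, 4.1, 4.4]

correlations = {
    "textbooks": pearson_r(textbooks, performance),
    "computers": pearson_r(computers, performance),
}

# The factor with the largest absolute correlation is taken as the most significant.
strongest = max(correlations, key=lambda name: abs(correlations[name]))
for name, rho in correlations.items():
    print(name, round(rho, 2))
print("most significant factor:", strongest)
```

Absolute values are compared because a strong negative correlation would indicate just as close a relationship as a strong positive one.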

The main points in brief

Dependencies between quantities, each of which is subject to variation that cannot be fully controlled, are called correlation dependencies.

Correlation analysis can be used to solve the following problems: determining whether one factor has a significant effect on another; choosing the most significant of several factors.

The quantitative measure of the correlation between two quantities is the correlation coefficient.

The value of the correlation coefficient lies between -1 and +1. The closer its absolute value is to 1, the stronger the correlation (relationship).

In MS Excel, the CORREL function from the group of statistical functions is used to determine the correlation coefficient.

Questions and tasks

1. What is a correlation dependence?

2. What is correlation analysis?

3. What types of problems can be solved using correlation analysis?

4. Which quantity is the quantitative measure of correlation? What values can it take?

5. Which spreadsheet tool can be used to calculate the correlation coefficient?

6. For the data in the table shown in Fig. 3, build two linear regression models.

7. For the same data, calculate the correlation coefficients. Compare them with the results shown in Fig. 3.

Computer workshop "Calculation of correlation dependencies in MS Excel"

Objectives: to gain an understanding of the correlation dependence of quantities; to master the method of calculating the correlation coefficient using the CORREL function.

Software used: the MS Excel spreadsheet application.

Task 1. The table below contains data on paired measurements of two quantities made at a certain school: the air temperature in the classroom, x, and the proportion of students with colds, y:

The dependence is statistical, since one cannot reliably say, for example, that at a temperature of 15 °C 5% of the students at the school are ill, and at 20 °C 2%. Besides temperature, there are other factors that affect colds; they differ from school to school, and it is impossible to control them all.

Do the following:

=> build a scatter chart that visually displays the tabular dependence;

=> answer the question of whether a hypothesis of a linear correlation between the quantities can be put forward;

=> if the answer is clearly negative, correct the table so that the hypothesis of a linear correlation becomes more plausible;

=> using the CORREL function, find the correlation coefficient and confirm or refute the hypothesis.

Task 2. Come up with a table of paired measurements of some quantities between which a hypothetical correlation dependence exists. Analyze this dependence for the presence of a linear correlation.

Examples of suitable related quantities include:

· level of education (measured, for example, in total years of study) and level of monthly income;

· level of education and level of the position held (devise a conventional scale for the latter);

· the number of computers per student in a school and the average score on a test of proficiency in standard information-processing technologies;

· the number of hours a high school student spends on homework and the average grade;

· the amount of fertilizer applied to the soil and the yield of a particular crop.

You can go about this in two ways. The first, more serious and practically useful: you do not simply invent a hypothetical correlation dependence but also find real data on it in the literature. The second, easier way: you treat this task as a game needed to understand what a correlation dependence is and to develop the technical skills for analyzing it, and you invent the relevant data, trying to make them as plausible as possible.

Purpose of work: to gain an understanding of the correlation dependence of quantities; to master the method of calculating the correlation coefficient using the CORREL function.
Software used: the Microsoft Office Excel spreadsheet application.

Task 1

Calculate the correlation dependence of student performance on the school's maintenance expenditures, described in § 38 of the textbook.
1. Fill in the spreadsheet with the following data:

2. Build a scatter chart of the dependence between the quantities.

3. Run the statistical function CORREL, specifying the value ranges B2:B21 and C2:C21 in the dialog box.
4. Write down the value of the correlation coefficient.

Task 2.

Calculate the correlation dependencies of student performance on the supply of textbooks and on the supply of computers, using the data presented in the following table.

Task for independent work on the topic "Correlation dependencies"

Come up with a table of paired measurements of some quantities between which a hypothetical correlation dependence exists. Analyze this dependence for the presence of a linear correlation.

      Examples of suitable related quantities include:
      level of education (measured, for example, in total years of study) and level of monthly income;
      level of education and level of the position held (devise a conventional scale for the latter);
      the number of computers per student in a school and the average score on a test of proficiency in standard information-processing technologies;
      the number of hours high school students spend on homework and their average grade;
      the amount of fertilizer applied to the soil and the yield of a particular crop.

You can go about this in two ways. The first, more serious and practically useful: you do not simply invent a hypothetical correlation dependence but also find real data on it in the literature. The second, easier way: you treat this task as a game needed to understand what a correlation dependence is and to develop the technical skills for analyzing it, and you invent the relevant data, trying to make them as plausible as possible.


2021 liveps.ru. Home tasks and ready-made tasks in chemistry and biology.