How is entropy measured in information theory? Information entropy

1. Introduction.

2. What did Claude Shannon measure?

3. Limits of evolutionary variability of information systems.

4. Limited adaptation of biological species.

5. Stages of development of the theory of entropy.

6. Methods for calculating the amount of structural information and informational entropy of texts.

7. Information-entropy ratios of adaptation and development processes.

8. Information and energy.

9. Conclusion.

10. Bibliography.

INTRODUCTION

In the second half of the 20th century, two events took place that, in our opinion, largely determine the further paths of scientific comprehension of the world. We are talking about the creation of information theory and the beginning of research into the mechanisms of antientropic processes, for the study of which synergetics uses all the latest achievements of nonequilibrium thermodynamics, information theory and general systems theory.

The fundamental difference between this stage of the development of science and the previous stages is that before the creation of the listed areas of research, science was able to explain only the mechanisms of processes leading to an increase in chaos and an increase in entropy. As for the biological and evolutionary concepts developed since the time of Lamarck and Darwin, they still do not have strict scientific justifications and contradict the Second Law of Thermodynamics, according to which the increase in entropy accompanying all processes in the world is an indispensable physical law.

The merit of non-equilibrium thermodynamics lies in the fact that it was able to reveal the mechanisms of anti-entropy processes that do not contradict the Second Law of Thermodynamics, since a local decrease in entropy within a self-organizing system is always paid for by an increase in the entropy of the external environment that is larger in absolute value.

The most important step towards comprehending the nature and mechanisms of antientropic processes is the introduction of a quantitative measure of information. Initially, this measure was intended only for solving purely applied problems of communication technology. However, subsequent research in physics and biology made it possible to reveal the universal character of the measure proposed by K. Shannon, which allows establishing the relationship between the amount of information and physical entropy and, ultimately, defining the essence of a new scientific interpretation of the concept of "information" as a measure of the structural ordering of systems of the most diverse nature.

Using a metaphor, we can say that before the introduction of a single quantitative measure of information into science, the world presented in natural-science concepts rested, as it were, "on two pillars": energy and matter. The "third pillar" is now information, which takes part in all processes occurring in the world, from microparticles, atoms and molecules to the functioning of the most complex biological and social systems.

Naturally, the question arises: do the latest data of modern science confirm or refute the evolutionary paradigm of the origin of life and biological species?

To answer this question, it is necessary first of all to understand which properties and aspects of the multifaceted concept of "information" are captured by the quantitative measure that K. Shannon introduced into science.

Using the measure of the amount of information makes it possible to analyze the general mechanisms of information-entropy interactions that underlie all spontaneous processes of information accumulation in the surrounding world, which lead to self-organization of the system structure.

At the same time, information-entropy analysis also makes it possible to identify gaps in evolutionary concepts, which are nothing more than untenable attempts to reduce the problem of the origin of life and biological species to simple mechanisms of self-organization, without taking into account the fact that systems of such a level of complexity can be created only on the basis of the information that was originally laid down in the plan preceding their creation.

Studies of the properties of information systems carried out by modern science give every reason to assert that all systems can be formed only according to rules descending from the upper hierarchical levels, and that these rules themselves existed earlier than the systems in the form of an original plan (an idea of creation).

WHAT DID CLAUDE SHANNON MEASURE?

The theory of information is based on the method proposed by K. Shannon for calculating the amount of new (unpredictable) and redundant (predictable) information contained in messages transmitted through technical communication channels.

The method proposed by Shannon for measuring the amount of information turned out to be so universal that its application is no longer limited to the narrow confines of purely technical applications.

Contrary to the opinion of K. Shannon himself, who warned scientists against the hasty spread of the method he proposed beyond the limits of applied problems of communication technology, this method began to find more and more widespread use in studies of physical, biological, and social systems.

The key to a new understanding of the essence of the phenomenon of information and of the mechanism of information processes was the relationship between information and physical entropy established by L. Brillouin. This relationship was originally laid in the very foundation of information theory, since Shannon proposed using the probabilistic entropy function borrowed from statistical thermodynamics to calculate the amount of information.

Many scientists (beginning with K. Shannon himself) were inclined to consider such borrowing as a purely formal device. L. Brillouin showed that between the amount of information calculated according to Shannon and physical entropy, there is not a formal, but a meaningful relationship.

In statistical physics, the probabilistic entropy function is used to study processes leading to thermodynamic equilibrium, in which all states of the molecules (their energies, velocities) approach being equiprobable, and entropy tends to its maximum value.

Thanks to the theory of information, it became obvious that with the help of the same function it is possible to investigate systems that are far from the state of maximum entropy, such as, for example, a written text.

Another important conclusion is that

using the probabilistic function of entropy, one can analyze all stages of a system's transition from the state of complete chaos, which corresponds to equal values of the probabilities and the maximum value of entropy, to the state of ultimate order (rigid determination), which corresponds to the only possible state of its elements.

This conclusion turns out to be equally true for such dissimilar systems in nature as gases, crystals, written texts, biological organisms or communities, etc.

At the same time, if for a gas or a crystal the calculation of entropy compares only the microstate (i.e., the state of the atoms and molecules) and the macrostate of these systems (i.e., the gas or crystal as a whole), then for systems of a different nature (biological, intellectual, social) entropy can be calculated at one or another arbitrarily chosen level. In this case, the calculated entropy of the system under consideration, and the amount of information characterizing the degree of ordering of this system and equal to the difference between the maximum and the actual value of entropy, will depend on the probability distribution of the states of the elements of the underlying level, i.e., of the elements that together form these systems.

In other words,

the amount of information stored in the structure of the system is proportional to the degree of deviation of the system from the state of equilibrium, due to the order preserved in the structure of the system.

Without suspecting it, Shannon armed science with a universal measure, suitable in principle (provided that the values of all the probabilities are revealed) for assessing the degree of orderliness of all systems existing in the world.
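
As an illustration of this idea, here is a minimal Python sketch (the four-state probability distribution is invented for the example) that computes the maximum entropy H_max = log2 N of a system with N states, its actual Shannon entropy H, and the difference I = H_max - H, which the text treats as the amount of information stored in the structure of the system.

    import math

    def shannon_entropy(probs):
        """Shannon entropy in bits of a discrete probability distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # A hypothetical system with four states; the probabilities are invented.
    probs = [0.7, 0.2, 0.05, 0.05]

    h_max = math.log2(len(probs))    # entropy of complete disorder (equiprobable states)
    h_real = shannon_entropy(probs)  # actual entropy of the partially ordered system
    info = h_max - h_real            # information "stored in the structure", as described above

    print(f"H_max = {h_max:.3f} bits, H = {h_real:.3f} bits, I = {info:.3f} bits")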

Having defined the informational measure introduced by Shannon as a measure of the ordering of motion, one can establish the relationship between information and energy by considering energy as a measure of the intensity of motion. At the same time, the amount of information stored in the structure of systems is proportional to the total energy of the internal bonds of these systems.

Along with the discovery of the common properties of information as a phenomenon, fundamental differences are also revealed, related to the different levels of complexity of information systems.

So, for example, physical objects, unlike biological ones, have no special memory organs, no organs for recoding the signals coming from the outside world, and no informational communication channels. The information stored in them is, as it were, "smeared" throughout their structure. At the same time, if crystals were not able to store information in the internal bonds that determine their ordering, it would not be possible to create artificial memory and technical devices for information processing based on crystalline structures.

At the same time, it must be taken into account that the creation of such devices became possible only thanks to the mind of a person who was able to use the elementary information properties of crystals to build complex information systems.

The simplest biological system surpasses in complexity the most advanced information systems created by man. Already at the level of the simplest unicellular organisms, the most complex informational genetic mechanism necessary for their reproduction is at work. In multicellular organisms, in addition to the informational system of heredity, there are specialized organs for storing and processing information (for example, systems that recode the visual and auditory signals coming from the outside world before sending them to the brain, and systems for processing these signals in the brain). The most complex network of informational communications, the nervous system, permeates the entire multicellular organism and binds it into a whole.

Information and entropy

Discussing the concept of information, it is impossible not to touch on another related concept - entropy. For the first time the concepts of entropy and information were connected by K. Shannon.

Claude Elwood Shannon (1916-2001), a distant relative of Thomas Edison, an American engineer and mathematician, was an employee of Bell Laboratories from 1941 to 1972. In his work "A Mathematical Theory of Communication" (http://cm.bell-labs.com/cm/ms/what/shannonday/), published in 1948, he was the first to define a measure of the information content of any message and the concept of a quantum of information, the bit. These ideas formed the basis of the theory of modern digital communication. Shannon's other work, "Communication Theory of Secrecy Systems", published in 1949, contributed to the transformation of cryptography into a scientific discipline. He is the founder of information theory, which has found application in modern high-tech communication systems. Shannon made a huge contribution to the theory of probabilistic schemes, the theory of automata and the theory of control systems, sciences united by the concept of "cybernetics".

Physical definition of entropy

For the first time, the concept of entropy was introduced by Clausius in 1865 as a function of the thermodynamic state of a system:

ΔS = Q / T,

where Q is heat, T is temperature.

The physical meaning of entropy is that it characterizes the part of the internal energy of a system that cannot be converted into work. Clausius obtained this function empirically, by experimenting with gases.

L. Boltzmann (1872), using the methods of statistical physics, derived a theoretical expression for entropy:

S = K · ln W,

where K is a constant; W is the thermodynamic probability (the number of permutations of ideal gas molecules that does not affect the macrostate of the system).

The Boltzmann entropy was derived for an ideal gas and is treated as a measure of disorder, a measure of the chaos of a system. For an ideal gas the entropies of Boltzmann and Clausius are identical. Boltzmann's formula became so famous that it is inscribed as an epitaph on his grave. An opinion has taken hold that entropy and chaos are one and the same. Although this entropy describes only ideal gases, it began to be used uncritically to describe more complex objects.

Boltzmann himself, in 1886, tried to use entropy to explain what life is. According to Boltzmann, life is a phenomenon capable of reducing its entropy. According to Boltzmann and his followers, all processes in the Universe change in the direction of chaos. The Universe is heading towards heat death. This gloomy forecast dominated science for a long time. However, the deepening of knowledge about the surrounding world gradually shook this dogma.

The classics did not associate entropy with information.

Entropy as a measure of information

Note that the concept of "information" is often interpreted simply as "facts" or "data", and the transfer of information is carried out with the help of communication. K. Shannon considered entropy as a measure of the useful information in the process of signal transmission over wires.

To calculate entropy, Shannon proposed an equation resembling the classical expression for entropy found by Boltzmann. Consider an independent random event x with N possible states, where p_i is the probability of the i-th state. Then the entropy of the event x is

H(x) = - Σ p_i log2 p_i (the sum is taken over all states i = 1, ..., N).

This quantity is also called the average entropy. For example, we can talk about the transmission of a message in natural language. When transmitting different letters, we convey a different amount of information. The amount of information per letter is related to the frequency of use of this letter in all messages formed in the language. The rarer the letter we transmit, the more information it contains.

The value

H_i = p_i log2 (1/p_i) = - p_i log2 p_i

is called the private entropy, characterizing only the i-th state.

Let's explain with examples. When a coin is tossed, heads or tails comes up; this is definite information about the result of the toss.

For a coin, the number of equiprobable possibilities is N = 2. The probability of getting heads (tails) is 1/2.

When a die is thrown, we get information about a certain number of points coming up (for example, three). In which case do we get more information?

For the die, the number of equiprobable possibilities is N = 6. The probability of rolling a three is 1/6. The entropy is log2 6 ≈ 2.58 bits. The realization of a less likely event provides more information. The greater the uncertainty before receiving a message about an event (the toss of a coin or a die), the more information arrives when the message is received.
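
These numbers are easy to verify; the following minimal Python sketch simply applies Shannon's formula to a fair coin and a fair die.

    import math

    def entropy(probs):
        """Shannon entropy in bits: H = -sum(p_i * log2 p_i)."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    coin = [1/2, 1/2]   # fair coin: two equiprobable outcomes
    die = [1/6] * 6     # fair die: six equiprobable outcomes

    print(entropy(coin))      # 1.0 bit per toss
    print(entropy(die))       # log2(6), about 2.58 bits per throw
    print(-math.log2(1/6))    # information in the message "a three came up": also about 2.58 bits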

This approach to the quantitative expression of information is far from universal, since the units adopted do not take into account such important properties of information as its value and meaning. Abstracting from the specific properties of information (its meaning and its value for real objects), as it turned out later, made it possible to reveal general laws governing information. The units (bits) proposed by Shannon for measuring the amount of information are suitable for evaluating any messages (the birth of a son, the results of a sports match, etc.). Subsequently, attempts were made to find measures of the amount of information that would take into account its value and meaning. However, universality was immediately lost: for different processes the criteria of value and meaning are different. In addition, definitions of the meaning and value of information are subjective, whereas the measure of information proposed by Shannon is objective. For example, a smell carries a huge amount of information for an animal but is elusive for a human. The human ear does not perceive ultrasonic signals, but they carry a lot of information for a dolphin, and so on. Therefore, the measure of information proposed by Shannon is suitable for studying all types of information processes, regardless of the "tastes" of the consumer of information.

Measuring Information

From your physics course you know that before the value of any physical quantity can be measured, a unit of measurement must be introduced. Information also has such a unit, the bit, but its meaning differs across different approaches to defining the concept of "information".

There are several different approaches to the problem of measuring information.

"Information is a form of life," wrote the American poet and essayist John Perry Barlow. Indeed, we constantly come across the word "information": it is received, transmitted and stored. Finding out the weather forecast or the result of a football match, learning the content of a movie or a book, talking on the phone - it is always clear what kind of information we are dealing with. But what information itself is, and most importantly how it can be measured, is something nobody usually thinks about. Meanwhile, information and the ways of transmitting it are an important thing that largely determines our life, of which information technology has become an integral part. The scientific editor of Laba.Media, Vladimir Gubailovsky, explains what information is, how to measure it, and why the most difficult thing is to transmit information without distortion.

The space of random events

In 1946, the American statistician John Tukey proposed the name bit (BInary digiT, "binary digit"), one of the main concepts of the 20th century. Tukey chose the bit to denote a single binary digit capable of taking the value 0 or 1. Claude Shannon, in his keynote paper "The Mathematical Theory of Communication," proposed measuring the amount of information in bits. But this is not the only concept introduced and explored by Shannon in his paper.

Imagine a space of random events that consists of tossing a single counterfeit coin with heads on both sides. When do heads come up? Clearly, always. We know this in advance, because this is how our space is arranged. Getting heads is a certain event, that is, its probability is 1. How much information do we convey if we announce the heads that came up? None. We will consider the amount of information in such a message to be 0.

Now let's toss a fair coin: it has heads on one side and tails on the other, as it should. Getting heads or tails are two different events that make up our space of random events. If we report the outcome of one toss, this will indeed be new information. For heads we report 0, and for tails we report 1. In order to convey this information, 1 bit is enough.

What has changed? Uncertainty has appeared in our event space. We have something to tell about it to someone who does not toss the coin himself and does not see the outcome of the toss. But in order for our message to be understood correctly, the receiver must know exactly what we are doing and what 0 and 1 mean. Our event spaces must match, and the decoding process must unambiguously recover the result of the toss. If the event spaces of the transmitter and the receiver do not match, or there is no possibility of unambiguous decoding of the message, the information will remain only noise in the communication channel.

If two coins are tossed independently and simultaneously, then there will be four equally likely outcomes: heads-heads, heads-tails, tails-heads, and tails-tails. To transmit information, we need 2 bits already, and our messages will be as follows: 00, 01, 10 and 11. The information has become twice as much. This happened because uncertainty increased. If we try to guess the outcome of such a double throw, we are twice as likely to make a mistake.

The greater the uncertainty of the event space, the more information the message about its state contains.

Let's complicate our event space a bit. So far, all the events that have happened have been equally probable. But in real spaces, not all events have equal probability. Let's say the probability that the crow we see will be black is close to 1. The probability that the first passerby we meet on the street will be a man is about 0.5. But to meet a crocodile on the streets of Moscow is almost unbelievable. Intuitively, we understand that a message about a meeting with a crocodile has a much greater informational value than about a black crow. The lower the probability of an event, the more information in the message about such an event.

Let the space of events not be so exotic. We just stand at the window and look at the passing cars. Cars of four colors pass by, which we need to report. To do this, we encode the colors: black - 00, white - 01, red - 10, blue - 11. To report which car passed, we just need to transmit 2 bits of information.

But after watching the cars for quite a long time, we notice that the colors of the cars are unevenly distributed: black - 50% (every second), white - 25% (every fourth), red and blue - 12.5% each (every eighth). Then we can optimize the transmitted information.

Most of the cars are black, so let us give black the shortest code, 0, and let the codes of all the other colors start with 1. Of the remaining half, white gets 10, and the codes of the remaining colors start with 11. Finally, red gets 110 and blue gets 111.

Now, when transmitting information about the colors of the cars, we can encode it more compactly.
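
Here is a small Python sketch of this variable-length code (the color statistics are those assumed above, and the sample stream of cars is made up): the expected code length comes out to 1.75 bits per car instead of the 2 bits of the fixed-length code.

    # Variable-length prefix code from the text: frequent colors get shorter codewords.
    code = {"black": "0", "white": "10", "red": "110", "blue": "111"}
    prob = {"black": 0.5, "white": 0.25, "red": 0.125, "blue": 0.125}

    # Expected number of bits per car with this code.
    avg_len = sum(prob[c] * len(code[c]) for c in code)
    print(avg_len)   # 1.75 bits per car, versus 2 bits with the fixed two-bit code

    # Encoding a hypothetical stream of observed cars.
    cars = ["black", "black", "white", "red", "black", "blue"]
    message = "".join(code[c] for c in cars)
    print(message, len(message), "bits")   # 11 bits instead of 12

Because no codeword here is a prefix of another, the concatenated message can still be decoded unambiguously.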

Entropy according to Shannon

Let our event space consist of n different events. When tossing a coin with two heads, there is exactly one such event; when tossing one fair coin, there are 2; when tossing two coins or watching the cars, there are 4. Each event corresponds to the probability of its occurrence. When a coin with two heads is tossed, there is only one event (heads) and its probability is p1 = 1. When a fair coin is tossed, there are two events, they are equally probable and the probability of each is 0.5: p1 = 0.5, p2 = 0.5. When two fair coins are tossed, there are four events, all of them equally probable, and the probability of each is 0.25: p1 = 0.25, p2 = 0.25, p3 = 0.25, p4 = 0.25. When observing the cars, there are four events, and they have different probabilities: black 0.5, white 0.25, red 0.125, blue 0.125: p1 = 0.5, p2 = 0.25, p3 = 0.125, p4 = 0.125.

This is not a coincidence. Shannon chose entropy (a measure of uncertainty in the event space) in such a way that three conditions were met:

  • The entropy of a certain event with a probability of 1 is 0.
  • The entropy of two independent events is equal to the sum of the entropies of these events.
  • Entropy is maximum if all events are equally likely.

All these requirements are quite consistent with our ideas about the uncertainty of the event space. If there is only one event (the first example), there is no uncertainty. If the events are independent - the uncertainty of the sum is equal to the sum of the uncertainties - they just add up (example with tossing two coins). And, finally, if all events are equally probable, then the degree of uncertainty of the system is maximum. As in the case of tossing two coins, all four events are equally likely and the entropy is 2, which is greater than in the case of cars, when there are also four events, but they have different probabilities - in this case, the entropy is 1.75.
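
These three properties can be checked numerically with the distributions used in the examples above; the Python sketch below is only an illustration.

    import math

    def H(probs):
        """Shannon entropy in bits."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(H([1.0]))                        # a certain event: 0 bits
    print(H([0.5, 0.5]) + H([0.5, 0.5]))   # two independent fair coins: 1 + 1 = 2 bits
    print(H([0.25] * 4))                   # their joint space of four equiprobable outcomes: also 2 bits
    print(H([0.5, 0.25, 0.125, 0.125]))    # the cars, four unequal probabilities: 1.75 bits, below the maximum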

The value of H plays a central role in information theory as a measure of the amount of information, choice and uncertainty.

Claude Shannon

Claude Elwood Shannon was an American engineer, cryptanalyst and mathematician, considered the "father of the information age". He is the founder of information theory, which has found application in modern high-tech communication systems. He provided the fundamental concepts, ideas and their mathematical formulations that currently form the basis of modern communication technologies.

In 1948, he proposed using the word "bit" to refer to the smallest unit of information. He also demonstrated that the entropy he introduced is equivalent to a measure of the uncertainty of the information in the transmitted message. Shannon's articles "A Mathematical Theory of Communication" and "Communication Theory of Secrecy Systems" are considered fundamental to information theory and cryptography.

During World War II, Shannon developed cryptographic systems at Bell Laboratories, which later helped him discover methods for error-correcting coding.

Shannon made key contributions to the theory of probabilistic schemes, game theory, automata theory, and control system theory - areas of science included in the concept of "cybernetics".

Coding

Both the tossed coins and the passing cars are not like the numbers 0 and 1. In order to communicate the events taking place in the spaces, one has to come up with a way to describe these events. This description is called encoding.

Messages can be encoded in infinitely many different ways. But Shannon showed that the shortest code cannot be shorter, in bits, than the entropy.

That is why the entropy of a message is a measure of the information in the message. Since in all the cases considered the number of bits in the encoding is equal to the entropy, the encoding was optimal. It is impossible to encode messages about events in our spaces more briefly.

With optimal coding, not a single transmitted bit can be lost or distorted in the message. If even one bit is lost, the information will be distorted. But no real communication channel gives 100% certainty that all the bits of a message will reach the recipient undistorted.

To deal with this problem, the code has to be made not optimal but redundant. For example, one can transmit along with the message its checksum: a specially calculated value obtained by transforming the message code, which can be checked by recalculating it when the message is received. If the transmitted checksum matches the calculated one, the probability that the transmission went through without errors will be quite high. And if the checksums do not match, a retransmission must be requested. This is how most communication channels work today, for example when transmitting packets of information over the Internet.
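
Here is a minimal sketch of the checksum idea, using the CRC32 function from Python's standard zlib module as the check value (a real protocol may use a different checksum or a full error-correcting code, and the message text here is made up).

    import zlib

    def send(payload: bytes):
        """Sender side: attach a CRC32 checksum (redundant bits) to the message."""
        return payload, zlib.crc32(payload)

    def receive(payload: bytes, checksum: int) -> bool:
        """Receiver side: recompute the checksum; a mismatch means a retransmission is needed."""
        return zlib.crc32(payload) == checksum

    message, crc = send(b"the car that just passed was black")
    print(receive(message, crc))                                # True: the message looks intact
    print(receive(b"the car that just passed was blank", crc))  # False: corruption is detected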

Natural language messages

Consider the event space, which consists of messages in natural language. This is a special case, but one of the most important. The events here will be the transmitted characters (letters of a fixed alphabet). These characters occur in the language with different probability.

The most frequent symbol (that is, the one found most often in all texts written in Russian) is the space: it occurs on average 175 times per thousand characters. The second most frequent is the symbol "о", with 90, followed by the other vowels: "е" (or "ё", which we will not distinguish) with 72, "а" with 62, "и" with 62; only then comes the first consonant, "т", with 53. And the rarest, "ф", occurs only twice per thousand characters.

We will use the 31-letter alphabet of the Russian language (which does not distinguish "е" from "ё", or "ь" from "ъ"). If all letters occurred in the language with the same probability, the entropy per character would be H = 5 bits, but if we take into account the actual character frequencies, the entropy is smaller: H = 4.35 bits. (This is almost two times less than with traditional encoding, in which a character is transmitted as a byte, 8 bits.)
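
The 5-bit figure presumably corresponds to log2 32 = 5 for the 31 letters plus the space; the 4.35-bit figure requires a complete frequency table. The Python sketch below shows the computation scheme using only the few frequencies quoted above (per 1000 characters), so its numerical output is illustrative and does not reproduce the real 4.35 bits.

    import math

    # Uniform case: 31 letters plus the space, 32 equiprobable symbols in total.
    print(math.log2(32))   # 5.0 bits per character

    # Scheme for the non-uniform case. Frequencies per 1000 characters,
    # only those quoted in the text; a complete table is needed to obtain ~4.35 bits.
    freq_per_1000 = {" ": 175, "о": 90, "е": 72, "а": 62, "и": 62, "т": 53, "ф": 2}

    def entropy_from_frequencies(freq):
        total = sum(freq.values())
        return -sum((f / total) * math.log2(f / total) for f in freq.values() if f > 0)

    print(entropy_from_frequencies(freq_per_1000))   # entropy of this partial table only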

But the entropy of a character in a language is even lower. The probability of the next character's appearance is not fully determined by the average frequency of that character across all texts. Which character follows depends on the characters already transmitted. For example, in modern Russian the symbol "ъ" cannot be followed by a consonant. After two consecutive vowels "е", a third "е" is extremely rare (except in words like "длинношеее", "long-necked"). That is, the next character is to some extent predetermined. If we take this predetermination of the next symbol into account, the uncertainty (i.e., the information) of the next symbol will be even less than 4.35 bits. According to some estimates, the next character in Russian is predetermined by the structure of the language by more than 50%; that is, with optimal encoding, all the information could be transmitted by deleting half of the letters from the message.

Another thing is that not every letter can be painlessly crossed out. High-frequency "o" (and vowels in general), for example, is easy to cross out, but rare "f" or "e" is quite problematic.

The natural language in which we communicate with each other is highly redundant, and therefore reliable: if we miss something, no matter, the information will still be transmitted.

But until Shannon introduced a measure of information, we could not understand that the language is redundant, and to what extent we can compress messages (and why text files are compressed so well by the archiver).
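
The redundancy of natural-language text is easy to see with a general-purpose compressor; below is a small sketch using Python's standard zlib module with an arbitrary, deliberately repetitive English sample.

    import zlib

    # An arbitrary, deliberately repetitive natural-language sample.
    text = ("The weather forecast for tomorrow: rain in the morning, rain in the afternoon, "
            "rain in the evening, and more rain at night. ") * 20
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, level=9)

    print(len(raw), "bytes before compression")
    print(len(packed), "bytes after compression")
    print(f"compression ratio: {len(raw) / len(packed):.1f}x")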

Natural language redundancy

In the article "About how we write text" (the title really does sound like that!), a fragment of Ivan Turgenev's novel "Noble Nest" was taken and subjected to a transformation: 34% of the letters were deleted from the fragment, but not at random. The first and last letters of each word were kept; only vowels were deleted, and not all of them. The goal was not just to make it possible to recover all the information from the converted text, but also to ensure that a person reading this text would not experience any particular difficulty because of the missing letters.

Why is it relatively easy to read this corrupted text? It really does contain the information necessary to recover the whole words. A native speaker of Russian has a certain set of events (words and whole sentences) that he uses in recognition. In addition, the native speaker also has at his disposal standard language constructions that help him recover information. For example, "She's more blissful" can with high probability be read as "She was more sensitive". But the single phrase "She's better" would rather be restored as "She was whiter". Since in everyday communication we deal with channels in which there is noise and interference, we are quite good at recovering information, but only information that we already know in advance. For example, the phrase "Her devils are not far from pleasant, even though they flared and merged a lot" reads well except for the last word, "splls" - "merged". This word is not in the modern lexicon. At fast reading, the word "spls" reads more like "stuck together"; at slow reading it simply baffles.

Signal digitization

Sound, or acoustic vibration, is a sinusoid. This can be seen, for example, on the screen of a sound editor. To convey the sound exactly, an infinite number of values is needed: the entire sinusoid. This is possible with an analog connection: someone sings, you listen, and the contact is not interrupted as long as the song lasts.

With digital communication over a channel, we can only transmit a finite number of values. Does this mean that the sound cannot be accurately transmitted? It turns out not.

Different sounds are differently modulated sinusoids. We transmit only discrete values (frequencies and amplitudes), while the sinusoid itself does not need to be transmitted: it can be generated by the receiving device. It generates a sinusoid, and a modulation created from the values transmitted over the communication channel is applied to it. There are exact principles governing which discrete values must be transmitted so that the sound at the input of the communication channel coincides with the sound at the output, where these values are superimposed on some standard sinusoid (this is precisely the Kotelnikov theorem).

Kotelnikov's theorem (in the English-language literature, the Nyquist-Shannon theorem, or sampling theorem) is a fundamental statement in the field of digital signal processing that relates continuous and discrete signals. It states that "any function F(t) consisting of frequencies from 0 to f1 can be transmitted continuously with any accuracy using numbers following one another at intervals of 1/(2·f1) seconds".
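
Here is a sketch of the sampling idea with NumPy: a sinusoid containing frequencies up to f1 = 3 Hz is sampled at 8 Hz, i.e. more often than every 1/(2·f1) seconds, and then rebuilt from the samples by Whittaker-Shannon (sinc) interpolation. With a finite number of samples the reconstruction is only approximate, especially near the edges of the window; all the parameters are chosen purely for illustration.

    import numpy as np

    f1 = 3.0          # highest frequency present in the signal, Hz
    fs = 8.0          # sampling rate, Hz; it must exceed 2 * f1
    T = 1.0 / fs      # sampling interval, shorter than 1 / (2 * f1) seconds

    n = np.arange(32)                           # 32 samples covering 4 seconds
    samples = np.sin(2 * np.pi * f1 * n * T)    # the transmitted discrete values

    # Whittaker-Shannon interpolation: x(t) = sum_n x[n] * sinc((t - n*T) / T)
    t = np.linspace(1.0, 3.0, 500)              # interior of the window, away from edge effects
    reconstructed = samples @ np.sinc((t[None, :] - n[:, None] * T) / T)
    original = np.sin(2 * np.pi * f1 * t)

    print(np.max(np.abs(reconstructed - original)))   # residual error of the finite-window reconstruction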

Error-correcting coding. Hamming codes

If the encoded text of Ivan Turgenev is transmitted over an unreliable channel, then even with a certain number of errors a perfectly meaningful text will be obtained. But if we need to transmit everything exactly to the bit, the problem remains unsolved: we do not know which bits are wrong, because the errors are random. Even a checksum does not always save the situation.

That is why today, when transmitting data over networks, one strives not so much for optimal coding, in which the maximum amount of information can be pushed through the channel, as for coding (obviously redundant) in which errors can be corrected, roughly as we recovered the words when reading the fragment of Ivan Turgenev.

There are special error-correcting codes that allow you to recover information after a failure. One of them is the Hamming code. Let's say our entire language consists of three words: 111000, 001110, 100011. Both the source of the message and the receiver know these words. And we know that errors occur in the communication channel, but when transmitting one word, no more than one bit of information is distorted.

Suppose we first transmit the word 111000. As a result of at most one error, it can turn into one of the following words (each variant differs from the original in at most one bit):

1) 111000, 011000, 101000, 110000, 111100, 111010, 111001.

When the word 001110 is transmitted, any of the words can be obtained:

2) 001110, 101110, 011110, 000110, 001010, 001100, 001111.

Finally, for 100011 we can get:

3) 100011, 000011, 110011, 101011, 100111, 100001, 100010.

Note that all three lists are pairwise disjoint. In other words, if any word from list 1 appears at the other end of the communication channel, the recipient knows for certain that the word 111000 was transmitted; if any word from list 2 appears, the word 001110; and if from list 3, the word 100011. In this case we say that our code corrects one error.

The correction is possible due to two factors. First, the recipient knows the entire "dictionary", that is, the event space of the recipient of the message coincides with that of the sender. When the code was transmitted with one error, a word arrived that was not in the dictionary.

Secondly, the words in the dictionary were chosen in a special way. Even if an error occurred, the recipient could not confuse one word with another. For example, if the dictionary consists of the words “daughter”, “dot”, “bump”, and when transmitted it turned out to be “vochka”, then the recipient, knowing that such a word does not exist, could not correct the error - any of the three words could turn out to be correct. If the dictionary includes “dot”, “daw”, “branch” and we know that no more than one error is allowed, then “vochka” is obviously a “dot”, and not a “daw”. In error-correcting codes, words are chosen in such a way that they are "recognizable" even after an error. The only difference is that there are only two letters in the code "alphabet" - zero and one.

The redundancy of such an encoding is very large, and the number of words that we can convey in this way is relatively small. After all, the dictionary must not contain any word that, in case of an error, could coincide with a variant of some other transmitted word (for example, the words "daughter" and "dot" cannot both be in the dictionary). But the exact transmission of a message is so important that great effort is spent on the study of error-correcting codes.
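
A sketch of this decoding rule in Python: the "dictionary" is the three codewords from the example above, and the receiver picks the codeword closest in Hamming distance, which is enough to correct any single-bit error.

    CODEWORDS = ["111000", "001110", "100011"]   # the three words of our "language"

    def hamming(a: str, b: str) -> int:
        """Number of positions in which two words of equal length differ."""
        return sum(x != y for x, y in zip(a, b))

    def decode(received: str) -> str:
        """Return the codeword closest to the received word; corrects any single error."""
        return min(CODEWORDS, key=lambda word: hamming(word, received))

    print(decode("110000"))   # one bit of 111000 was flipped, decoded back to 111000
    print(decode("001111"))   # one bit of 001110 was flipped, decoded back to 001110
    print(decode("100011"))   # transmitted without errors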

Sensation

The concepts of the entropy (or uncertainty and unpredictability) of a message and of its redundancy (or predetermination and predictability) correspond very naturally to our intuitive ideas about the measure of information. The more unpredictable the message (the greater its entropy, because its probability is lower), the more information it carries. A sensation (for example, meeting a crocodile on Tverskaya) is a rare event, its predictability is very low, and therefore its informational value is high. Information is often called news: reports about events that have just occurred, about which we do not yet know anything. But if we are told about what happened a second and a third time in roughly the same words, the redundancy of the message will be high, its unpredictability will drop to zero, and we will simply stop listening, brushing off the speaker with the words "I know, I know." That is why the media try so hard to be first. It is this correspondence to the intuitive sense of novelty produced by genuinely unexpected news that played a major role in the fact that Shannon's article, not at all intended for the mass reader, became a sensation picked up by the press and accepted by scientists of various specialties, from linguists and literary critics to biologists, as a universal key to understanding nature.

Shannon's concept of information, however, is a rigorous mathematical theory, and its application outside communication theory is very unreliable. Within communication theory itself, it plays a central role.

Semantic information

Shannon, having introduced the concept of entropy as a measure of information, gained the ability to work with information: first of all, to measure it and to evaluate such characteristics as channel capacity and coding optimality. But the main assumption that allowed Shannon to operate successfully with information was the assumption that the generation of information is a random process that can be successfully described in terms of probability theory. If a process is not random, that is, it obeys regularities (and not always clear ones, as happens in natural language), then Shannon's reasoning is inapplicable to it. Nothing that Shannon says has anything to do with the meaningfulness of information.

As long as we are talking about symbols (or letters of the alphabet), we may well think in terms of random events, but as soon as we move on to the words of the language, the situation changes dramatically. Speech is a process organized in a special way, and here the structure of the message is no less important than the symbols with which it is transmitted.

Until recently, it seemed that we could do nothing to get any closer to measuring the meaningfulness of a text, but in recent years the situation has begun to change. And this is primarily due to the use of artificial neural networks for the tasks of machine translation, automatic summarization of texts, extraction of information from texts, and generation of reports in natural language. In all these tasks, transformation, encoding and decoding of the meaningful information contained in natural language take place. And gradually an idea is emerging of the information losses in such transformations, and hence of a measure of meaningful information. But to date, the clarity and accuracy that Shannon's information theory possesses are not yet present in these difficult tasks.

The concept of entropy was first introduced in 1865 by R. Clausius in thermodynamics to define the measure of irreversible dissipation of energy. Entropy is used in various branches of science, including information theory, as a measure of the uncertainty of any experiment or trial that can have different outcomes. These definitions of entropy have a deep internal connection. Thus, on the basis of ideas about information, all the most important propositions of statistical physics can be derived. [BES. Physics. M: Great Russian Encyclopedia, 1998].

The informational binary entropy for independent (non-equiprobable) random events x with n possible states (i = 1, ..., n; p_i is the probability of the i-th state) is calculated by Shannon's formula:

H(x) = - Σ p_i log2 p_i (the sum is taken over i = 1, ..., n).

This value is also called the average entropy of the message. The entropy in Shannon's formula is an average characteristic: the mathematical expectation of the distribution of a random variable.
For example, in the sequence of letters that make up any sentence in Russian, different letters appear at different frequencies, so the occurrence uncertainty for some letters is less than for others.
In 1948, while investigating the problem of rational transmission of information through a noisy communication channel, Claude Shannon proposed a revolutionary probabilistic approach to understanding communications and created the first truly mathematical theory of entropy. His sensational ideas quickly served as the basis for the development of information theory, which uses the concept of probability. The concept of entropy, as a measure of randomness, was introduced by Shannon in his article "A Mathematical Theory of Communication", published in two parts in the Bell System Technical Journal in 1948.

In the case of equally probable events (a special case), when all options are equally likely, the dependence remains only on the number of options considered, and Shannon's formula is greatly simplified, coinciding with the Hartley formula, which was first proposed by the American engineer Ralph Hartley in 1928 as one of the scientific approaches to evaluating messages:

I = log2 N = log2 (1/p) = - log2 p,

where I is the amount of transmitted information, p is the probability of an event, and N is the possible number of different (equiprobable) messages.

Task 1. Equally probable events.
There are 36 cards in a deck. How much information is contained in the message that the card taken from the deck is an "ace"? That it is the "ace of spades"?

The probabilities are p1 = 4/36 = 1/9 and p2 = 1/36. Using the Hartley formula, we have:

I1 = log2 (1/p1) = log2 9 ≈ 3.17 bits; I2 = log2 (1/p2) = log2 36 ≈ 5.17 bits.

Answer: 3.17 bits; 5.17 bits.
Note (from the second result) that 6 bits are needed to encode all the cards.
It is also clear from the results that the lower the probability of an event, the more information it contains. (This property is called monotonicity.)

Task 2. Non-equiprobable events.
There are 36 cards in a deck; 12 of them are cards with "portraits" (face cards). One card at a time is taken from the deck and shown in order to determine whether a portrait is depicted on it, and the card is then returned to the deck. Determine the amount of information transmitted each time one card is shown.
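
One way to read Task 2 is as a binary outcome: "portrait" with probability 12/36 and "no portrait" with probability 24/36. The Python sketch below computes Task 1 with Hartley's formula and Task 2 with Shannon's formula.

    import math

    # Task 1: equiprobable events, Hartley's formula I = log2(1/p).
    p_ace = 4 / 36
    p_ace_of_spades = 1 / 36
    print(math.log2(1 / p_ace))             # about 3.17 bits
    print(math.log2(1 / p_ace_of_spades))   # about 5.17 bits

    # Task 2: non-equiprobable events, Shannon's formula.
    p_portrait = 12 / 36      # the shown card has a portrait
    p_plain = 24 / 36         # the shown card does not
    H = -(p_portrait * math.log2(p_portrait) + p_plain * math.log2(p_plain))
    print(H)                  # about 0.918 bits per shown card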

Information entropy is a measure of the uncertainty or unpredictability of a certain system (in statistical physics or information theory), in particular the uncertainty of the appearance of any symbol of the primary alphabet. In the latter case, in the absence of information losses, the entropy is numerically equal to the amount of information per symbol of the transmitted message.

For example, in the sequence of letters that make up any sentence in Russian, different letters appear with different frequencies, so the uncertainty of occurrence is lower for some letters than for others. If we also take into account that some combinations of letters (in this case we speak of the entropy of the n-th order) are very rare, then the uncertainty decreases even further.

The concept of informational entropy can be illustrated with the help of Maxwell's demon. The concepts of information and entropy are deeply connected with each other, but despite this, it took many years of development of the theories of statistical mechanics and information theory to make them correspond to each other.

Entropy is the amount of information per elementary message from a source that generates statistically independent messages.

    So, we have given two definitions of entropy as a state variable. Entropy is denoted by the letter S. According to the thermodynamic definition, changes in entropy are equal to the heat added divided by the temperature at which that heat is added. However, if the temperature changes as heat is added (which is what usually happens), then we will have to do some calculations. And you can consider this as a mathematical, or statistical, or combinatorial definition of entropy. According to this definition, entropy equals the natural logarithm of the number of states that a system can take on, multiplied by a constant number. And in such a case, all states have the same probability. If we are talking about an unimaginably large number of molecules that can have an even greater number of states, we can assume that they will all differ in approximately equal probability. There is also a slightly more complicated definition - for cases with a probability of a different order, but now we will not touch on it. Now that we've covered these two definitions, it's time to tell you about the second law of thermodynamics. Here he is. This is a fairly simple law, which at the same time explains a very wide range of different phenomena. According to this law, changes in the entropy in the Universe during the implementation of any process will always be greater than or equal to 0. That is, when something happens in the Universe, the result of this is an increase in entropy. This is a very important conclusion. Let's see if we can apply this law to specific situations and thus understand its meaning. Let's say I have two tanks connected to each other. Here I have T1. Let this be our hot tank. And here we have T2. This will be the cold tank. Well, we know from experience... What happens if a vessel of hot water shares a wall with a vessel of cold water? What happens in such a case? Yes, the temperature of the water in them levels off. If we are talking about the same substance, then the process will stop approximately in the middle if they are in the same phase. Thus, we are dealing with the transfer of heat from a hotter substance to a colder one. We have some heat, Q, that is transferred from a hotter substance to a colder one. Of course, in everyday reality, you won't see heat being transferred from a colder substance to a hotter one. If you put an ice cube in, say, hot tea, then of course the ice doesn't get colder and the tea doesn't get hotter. The temperature of both substances will become approximately equal, that is, in fact, the tea will give part of the heat to the ice. We are also talking about two tanks, and I assume that their temperature remains constant. This can only happen if both are infinitely large, which of course does not exist in the real world. IN real world T1 will decrease and T2 will increase. But let's see if this should happen, according to the second law of thermodynamics. So what's going on here? What is the net entropy change for T1? According to the second law of thermodynamics, the change in entropy for the universe is greater than 0. But in this case it is equal to the change in entropy for T1, plus the change in entropy for ... although not exactly ... instead of T1, let's just call it 1 ... for system 1, that is, here for of this hot system plus the change in entropy for system 2. So, what is the change in entropy for system 1? It loses Q1 at high temperature. It turns out minus Q (because the system gives off heat) divided by T1. 
Then we have to take into account the heat added to the T2 system. So let's add Q divided by T2. We get the entropy change for system 2, right? This reservoir, which has a temperature of 1 higher, loses heat. And the reservoir, which has a lower temperature 2, receives heat. Wouldn't it be higher than 0? Let's think a little. If we divide... let me rewrite it... I'll write it another way: Q divided by T2, minus this. I'm just rearranging the numbers... minus Q divided by T1. And what is the higher score now? T2 or T1? Well, T1 is bigger, right? Now that we have a higher score... When we use the word "higher" we mean a certain comparison. So T1 is above this one. Moreover, in the numerator in both cases we have the same number, right? That is, if I take, say, 1/2 minus 1/3, then I get an indicator greater than 0. This indicator is greater than this one, because this one has a larger denominator. You divide by a larger number. This is worth thinking about. You divide Q by this number, and then subtract Q divided by the larger number. So this fraction here will have a lower absolute value. And it will be greater than 0. Accordingly, the second law of thermodynamics is confirmed by our observation, according to which heat passes from a hot body to a cold one. Now you can say, hey Sal, I can prove you wrong. You can tell if I put air conditioning in the room... Here is the room, and here is what is outside. And you say - look what the air conditioner does! It's already cold in the room, but it's already hot outside. But what does an air conditioner do? It makes cold even colder and hot even hotter. He takes some Q and moves in this direction. Right? It takes heat from a cold room and releases it into hot air. And you say it violates the second law of thermodynamics. You just refuted it. You deserve Nobel Prize! But I'll tell you - you're forgetting one little fact. Inside this air conditioner there is a compressor and an engine that are actively working and creating such a result. And this engine, I'll highlight it in pink, also releases heat. Let's call it the Q engine. Thus, if you want to calculate the total entropy generated for the entire universe, it would be the entropy of a cold room, plus the change in entropy for the street. Cold room entropy plus outdoor entropy change. Let's mark a room here... You can say - okay. This change in entropy for a room that gives off heat ... let's say that the room maintains a constant temperature for at least one millisecond. The room gives off some Q at a certain temperature T1. And then... you have to put a minus here... then the street gets some heat at a certain temperature T2. And you say: this figure is less than this. Because the denominator is higher. Then it will be negative entropy, and you can say that this violates the second law of thermodynamics. No! Here we must take into account one more point: that the street also receives heat from the engine. Heat from the engine divided by outside temperature. And I guarantee that this variable, I will not give numbers right now, will make this whole expression positive. This variable will turn the total net entropy for the universe into a positive one. Now let's think a little about what entropy is in terms of terminology. In a chemistry class, it is not uncommon for a teacher to say that entropy equals disorder. It's not a mistake. Entropy equals disorder. This is not a mistake, because entropy is really a disorder, but you have to be very careful with the definition of disorder. 
Because one of the most common examples is: take a clean room - let's say your bedroom is clean, but then it becomes dirty. And they say - look, the universe has become more disordered. A dirty room has more clutter than a clean one. But this is not an increase in entropy. So this is not a very good example. Why? Yes, because clean and dirty are only the states of the room. And we remember that entropy is a macro variable of state. you use it for system descriptions when you're not in the mood to sit here and tell me exactly what each particle does. And it's a macro variable that shows how long it takes to tell me what each particle does. This variable indicates how many states there are in this case, or how much information about the states I would like to receive from you. In the case of a clean room and a dirty room, we only have two different states of the same room. If the room is kept at the same temperature and has the same number of molecules and so on, then it will have the same entropy. So, as the room gets dirtier, the entropy doesn't increase. For example, I have a dirty cold room. Let's say I walked into this room and put a lot of effort into cleaning it. So I add a portion of heat to the system, and the molecules of my sweat scatter all over the room - accordingly, there is more content in it, and it becomes warmer, turning into a hot, clean room with droplets of sweat. This content can be arranged in a lot of ways, and since the room is hot, each molecule in it can take on more states, right? Since the average kinetic energy is high, one can try to find out how many kinetic energies each molecule can have, and in the potential this amount can be quite large. In essence, this is an increase in entropy. From a dirty, cold room to a hot and clean one. And that fits in pretty well with what we know. That is, when I enter a room and start cleaning it, I bring warmth into it. And the universe is getting more... I guess we can say that entropy is increasing. So where is the confusion here? Let's say I have a ball and it hits the ground and hits it. And here we must ask a question that has been constantly asked since the discovery of the first law of thermodynamics. As soon as the ball hits the ground... The ball hits the ground, right? I threw it: in its upper part there is a certain potential energy, which then turns into kinetic energy, and the ball hits the ground and then stops. This is where a completely logical question arises - what happened to all this energy? Law of energy conservation. Where did she all go? Right before hitting the ground, the ball had kinetic energy and then stopped. It seems that the energy has disappeared. But it's not. When the ball falls, it has a lot ... as you know, everything has its own warmth. But what about the earth? Its molecules vibrated with a certain kinetic energy and potential energy. And then the molecules of our ball began to vibrate a little. But their movement was mostly downward, right? The movement of most of the ball's molecules was directed downwards. When it hits the ground, then... let me draw the surface of the ball that is in contact with the ground. The molecules of the ball in its front part will look like this. And there are quite a few of them. This solid. Probably with a lattice structure. And then the ball hits the ground. When this happens… the earth is another solid body… Great, here we have a microstate. What will happen? These molecules will interact with these and transfer their downward kinetic energy... 
They will transfer it to these particles of the earth. And face them. And when, say, this particle collides with this one, it can move in this direction. And this particle will begin to oscillate like this, back and forth. This particle here can bounce off this one and move in this direction, and then collide with this one and move here. And then, because this particle here hits here, this one hits here, and because this one hits here, this one hits here. From the point of view of the ball, there is a relatively directional movement, but when it comes into contact with the molecules of the earth, it begins to generate kinetic energy and create movement in a variety of directions. This molecule here will move this one here, and this one will move here. Now the movement will not be directed if we have so many molecules ... I will mark them with a different color ... well, if we have many molecules and they all move in exactly the same direction, then the microstate will look like a macrostate. The whole body will be in this direction. If we have a lot of v and they all move in different directions, then my ball as a whole will stay in place. We can have the same amount of kinetic energy on molecular level , but they will all collide with each other. And in this case, we can describe the kinetic energy as internal energy or as temperature, which is the average kinetic energy. Thus, when we say that the world is getting more chaotic, we are thinking about the order of the speeds or energies of the molecules. Before they are ordered, the molecules may vibrate a little, but mostly they will fall down. But when they hit the ground, they all immediately begin to vibrate in different directions a little more. And the earth also begins to vibrate in different directions. So – at the microstate level – things get a lot messier. There is another rather interesting question. There is another possibility… You might think, “Look, this ball has fallen and hit the ground. Why doesn't he just--couldn't it be that the molecules of the earth themselves change their order so that they properly hit the molecules of the ball? There is a certain probability that, due to the random motion, at some point in time, all the molecules of the earth will simply hit the molecules of the ball in such a way that it bounces up again. Yes it is. There is always an infinitesimal chance that this will happen. There is a possibility that the ball will just lie on the ground... which is quite interesting... You will probably have to wait a hundred million years for this to happen, if it ever happens... and the ball may just bounce up. There is a very small possibility that these molecules will randomly vibrate in such a way as to be ordered for a second, and then the ball will bounce. But the probability of this is practically 0. So, when people talk about order and disorder, the disorder increases, because now these molecules will move in different directions and take on more potential states. And we saw it. As you know, at a certain level, entropy looks like something magical, but at other levels it seems quite logical. In one video... I think that was the last video... I had a lot of molecules, and then there was this extra space right here, after which I removed the wall. And we saw that these molecules... it is clear that there were some molecules that were repelled from this wall before, because there was a certain pressure associated with it. 
Then, as soon as we remove the wall, the molecules that would have hit it keep moving; there is nothing to stop them. They spread out in that direction. They can collide with other molecules and with these walls, but as far as that direction is concerned, the probability of a collision, especially for these molecules, is basically zero. Therefore the gas expands and fills the container. So everything is quite logical.

Most importantly, the second law of thermodynamics, as we saw in that video, says the same thing: the molecules will move and fill the container, and it is very unlikely that they will all return to an ordered state. Of course, there is a certain probability that, moving randomly, they will return to their original positions, but that probability is very, very small.

Moreover, and I want to emphasize this, S is a macrostate variable. We never talk about entropy in relation to a single molecule. If we know what an individual molecule does, we do not have to worry about entropy; we have to think about the system as a whole. So if we look at the whole system and ignore the individual molecules, we will not know what really happened; in this case we can pay attention only to the statistical properties of the molecules: how many molecules we have, what their temperature is, their macrodynamics, their pressure... And you know what? The container in which these molecules are placed has more possible states than the smaller container with the wall. Even if all the molecules were suddenly to gather randomly over here, we would not know that it happened, because we do not look at microstates. And this is very important to keep in mind. When someone says that a dirty room has a higher entropy than a clean one, we must understand that they are really talking about microstates, while entropy is, first of all, a concept associated with a macrostate. You can simply say that a room has a certain amount of entropy; that is, the concept of entropy applies to the room as a whole, but it is only useful when you do not know exactly what is happening in it. You have only the most general idea of what the room is filled with, what its temperature is, what its pressure is. These are all general macro properties. Entropy then tells us how many microstates this macrosystem can have or, since there is also the concept of information entropy, how much information I would have to provide you so that you get an exact picture of the microstate of the system at the corresponding moment in time. Something like that. I hope this discussion has been of some help to you and has cleared up some misconceptions about entropy, as well as helping you get an idea of what it really is. Until the next video!
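
As a rough numerical companion to the wall-removal example above (this calculation is not in the transcript; the one-mole figure and the ideal-gas assumption are ours), the following short Python sketch estimates how much the entropy grows when the accessible volume doubles:

    import math

    k_B = 1.380649e-23      # Boltzmann constant, J/K
    N = 6.022e23            # number of molecules (one mole, chosen only for illustration)

    # Isothermal free expansion of an ideal gas: removing the wall doubles the
    # volume available to every molecule, so the number of accessible microstates
    # grows by a factor of 2**N and the entropy rises by
    #   dS = N * k_B * ln(V_final / V_initial).
    volume_ratio = 2.0
    delta_S = N * k_B * math.log(volume_ratio)

    print(f"Entropy increase: {delta_S:.2f} J/K")   # about 5.76 J/K, i.e. R * ln 2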

Formal definitions

The informational binary entropy of independent random events x with n possible states, distributed with probabilities p_i (i = 1, ..., n), is calculated by the formula

H(x) = -\sum_{i=1}^{n} p_i \log_2 p_i.

This quantity is also called the average entropy of a message. The quantity H_i = -\log_2 p_i is called the partial entropy, characterizing only the i-th state. In general, the base of the logarithm in the definition of entropy can be any number greater than 1; its choice determines the unit of entropy. Thus, it is often more convenient (for example, in problems of mathematical statistics) to use the natural logarithm.

Thus, the entropy of the system x is the sum, taken with the opposite sign, of the relative frequencies of occurrence of the states (events) numbered i, multiplied by their binary logarithms. This definition for discrete random events can be formally extended to continuous distributions given by a probability density function; however, the resulting functional has somewhat different properties (see differential entropy).
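
As a small illustration of the formula above (our own sketch, not part of the source text), the following Python snippet computes H(x) for a discrete distribution; a fair coin gives exactly 1 bit, a biased coin gives less:

    import math

    def shannon_entropy(probs, base=2.0):
        """H = -sum(p * log_base(p)); terms with p == 0 contribute nothing."""
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    print(shannon_entropy([0.5, 0.5]))   # fair coin: 1.0 bit
    print(shannon_entropy([0.9, 0.1]))   # biased coin: ~0.469 bits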

Definition according to Shannon

The definition of Shannon entropy is connected with the concept of thermodynamic entropy. Boltzmann and Gibbs did extensive work in statistical thermodynamics, which contributed to the adoption of the word "entropy" in information theory. There is a connection between thermodynamic and informational entropy. For example, Maxwell's demon also contrasts thermodynamic entropy with information, and gaining any amount of information equals the entropy lost.

Definition via self-information

It is also possible to define the entropy of a random variable by first introducing the distribution of a random variable X with a finite number of values, and its self-information:

P_X(x_i) = p_i, \quad p_i \geq 0, \quad i = 1, 2, \ldots, n, \qquad \sum_{i=1}^{n} p_i = 1,

I(X) = -\log P_X(X).

Then the entropy is defined as:

H(X) = E(I(X)) = -\sum_{i=1}^{n} p(i) \log p(i).

The unit of measurement of the amount of information and entropy depends on the base of the logarithm: bit, nat, trit or hartley.
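
A minimal sketch (ours, with an arbitrarily chosen distribution) of the definition H(X) = E(I(X)): the exact sum is compared with a sampled average of the self-information -log2 P_X(X):

    import math
    import random

    probs = [0.5, 0.25, 0.125, 0.125]      # distribution of X over 4 values

    # Exact entropy from the sum.
    exact = -sum(p * math.log2(p) for p in probs)            # 1.75 bits

    # Monte Carlo estimate of E[I(X)] = E[-log2 P_X(X)].
    random.seed(0)
    samples = random.choices(range(len(probs)), weights=probs, k=100_000)
    estimate = sum(-math.log2(probs[i]) for i in samples) / len(samples)

    print(f"exact H(X) = {exact} bits, sampled E[I(X)] = {estimate:.3f} bits")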

Properties

Entropy is a quantity defined in the context of a probabilistic model for a data source. For example, tossing a coin has entropy:

-2\left(\tfrac{1}{2}\log_2\tfrac{1}{2}\right) = -\log_2\tfrac{1}{2} = \log_2 2 = 1 bit per toss (assuming the tosses are independent), and the number of possible states is 2^1 = 2 possible states (values): "heads" and "tails".

A source that generates a string consisting only of the letter "A" has zero entropy: -\sum_{i=1}^{\infty} \log_2 1 = 0, and the number of possible states is 2^0 = 1 possible state (the value "A"), which does not depend on the base of the logarithm.
This, too, is information that must be taken into account. An example of storage devices that use bits with entropy equal to zero, but with an amount of information equal to one possible state, i.e. non-zero, are the data bits written to ROM, in which each bit has only one possible state.

So, for example, it can be established experimentally that the entropy of English text is about 1.5 bits per character, though of course it varies from text to text (a rough frequency-based sketch follows the notes below). The entropy of a data source is the average number of bits per data element required to encode it without loss of information, using optimal encoding.

  1. Some data bits may not carry information. For example, data structures often store redundant information, or have identical sections regardless of the information in the data structure.
  2. The amount of entropy is not always expressed as an integer number of bits.
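
As a rough illustration of the per-character entropy estimate mentioned above (our own sketch; the sample string is arbitrary), a zeroth-order estimate can be computed from single-character frequencies. Because it ignores dependence between neighbouring characters, it typically comes out well above the roughly 1.5 bits per character quoted for English:

    import math
    from collections import Counter

    def chars_entropy(text):
        """Zeroth-order entropy estimate from single-character frequencies."""
        counts = Counter(text)
        total = len(text)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    sample = "this is a short sample of english text used only for illustration"
    print(f"{chars_entropy(sample):.2f} bits per character")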

Mathematical properties

  1. Non-negativity: H(X) \geq 0.
  2. Boundedness: H(X) = -E(\log_2 p_i) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} = \sum_{i=1}^{n} p_i f(g_i) \leq f\left(\sum_{i=1}^{n} p_i g_i\right) = \log_2 n, which follows from Jensen's inequality for the concave function f(g_i) = \log_2 g_i and g_i = \frac{1}{p_i}. If all n elements of X are equiprobable, then H(X) = \log_2 n.
  3. If X and Y are independent, then H(X \cdot Y) = H(X) + H(Y).
  4. Entropy is a concave (upward-convex) function of the probability distribution of the elements.
  5. If X and Y have the same probability distribution of elements, then H(X) = H(Y).

Efficiency

The alphabet may have a probability distribution that is far from uniform. If the original alphabet contains n characters, it can be compared with an "optimized alphabet" whose probability distribution is uniform. The ratio of the entropy of the original alphabet to that of the optimized one is the efficiency of the source alphabet, which can be expressed as a percentage. The efficiency of an original alphabet with n characters can also be defined as its n-ary entropy.
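
A minimal sketch (ours, with arbitrary example distributions) of this efficiency measure, computed as H(P) / log2 n:

    import math

    def efficiency(probs):
        """Entropy of the actual distribution divided by that of a uniform one."""
        h = -sum(p * math.log2(p) for p in probs if p > 0)
        return h / math.log2(len(probs))

    print(f"{efficiency([0.7, 0.1, 0.1, 0.1]) * 100:.1f}%")     # ~67.8%
    print(f"{efficiency([0.25, 0.25, 0.25, 0.25]) * 100:.1f}%") # 100.0%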

Entropy limits the maximum possible lossless (or nearly lossless) compression, which can be achieved using a typical set in theory or, in practice, Huffman coding, Lempel-Ziv-Welch coding, or arithmetic coding.
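
As an illustration of this bound (our own sketch; the source gives no code), the average code length L of a Huffman code built for a memoryless source with known symbol probabilities satisfies H <= L < H + 1:

    import heapq
    import math

    def huffman_code_lengths(probs):
        """Return the code length assigned to each symbol by Huffman's algorithm."""
        # Heap items: (probability, tie-breaker, {symbol: current depth}).
        heap = [(p, i, {i: 0}) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            p1, _, d1 = heapq.heappop(heap)
            p2, _, d2 = heapq.heappop(heap)
            # Merging two subtrees pushes every symbol in them one level deeper.
            merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
            heapq.heappush(heap, (p1 + p2, tie, merged))
            tie += 1
        return heap[0][2]

    probs = [0.4, 0.3, 0.2, 0.1]
    lengths = huffman_code_lengths(probs)

    H = -sum(p * math.log2(p) for p in probs)
    L = sum(probs[s] * l for s, l in lengths.items())
    print(f"entropy H = {H:.3f} bits, Huffman average length L = {L:.2f} bits")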

Variations and Generalizations

b-ary entropy

In general, the b-ary entropy (where b = 2, 3, ...) of a source \mathcal{S} = (S, P) with source alphabet S = \{a_1, \ldots, a_n\} and discrete probability distribution P = \{p_1, \ldots, p_n\}, where p_i is the probability of a_i (p_i = p(a_i)), is defined by the formula:

H_b(\mathcal{S}) = -\sum_{i=1}^{n} p_i \log_b p_i.

In particular, for b = 2 we obtain the usual binary entropy, measured in bits. For b = 3 we obtain the ternary entropy, measured in trits (one trit is the information of a source with three equiprobable states). For b = e we obtain entropy measured in nats.
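
A minimal sketch (ours) showing that the b-ary entropy is simply the binary entropy rescaled, H_b(S) = H_2(S) / log2 b, so the same uncertainty can be expressed in bits, trits or nats:

    import math

    def entropy(probs, b=2.0):
        """Entropy of a discrete distribution with logarithm base b."""
        return -sum(p * math.log(p, b) for p in probs if p > 0)

    probs = [0.5, 0.25, 0.25]
    h_bits = entropy(probs, 2)
    print(h_bits)                                        # 1.5 bits
    print(entropy(probs, 3), h_bits / math.log2(3))      # trits, computed two ways
    print(entropy(probs, math.e), h_bits * math.log(2))  # nats, computed two ways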

Conditional entropy

If the characters of the alphabet are not independent (for example, in French the letter "q" is almost always followed by "u", and in Soviet newspapers the word "leader" was usually followed by "production" or "labor"), the amount of information carried by a sequence of such symbols (and hence its entropy) is obviously smaller. Conditional entropy is used to account for such facts.

The first-order conditional entropy (similarly for a first-order Markov model) is the entropy of an alphabet for which the probabilities of one letter appearing after another (that is, the probabilities of two-letter combinations) are known:

H_1(\mathcal{S}) = -\sum_{i} p_i \sum_{j} p_i(j) \log_2 p_i(j),

where i is the state that depends on the preceding character, and p_i(j) is the probability of j given that i was the previous character.

For example, for the Russian language without the letter "ё": H_0 = 5, H_1 = 4.358, H_2 = 3.52, H_3 = 3.01 (in bits).
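
A minimal sketch (ours; the sample string is arbitrary and far too short for meaningful statistics) of estimating the first-order conditional entropy H_1 from bigram counts:

    import math
    from collections import Counter, defaultdict

    def conditional_entropy(text):
        """First-order conditional entropy estimated from bigram counts."""
        unigrams = Counter(text[:-1])          # characters that have a successor
        bigrams = defaultdict(Counter)
        for a, b in zip(text, text[1:]):
            bigrams[a][b] += 1

        total = sum(unigrams.values())
        h1 = 0.0
        for a, count_a in unigrams.items():
            p_a = count_a / total
            for count_ab in bigrams[a].values():
                p_b_given_a = count_ab / count_a
                h1 -= p_a * p_b_given_a * math.log2(p_b_given_a)
        return h1

    print(f"H1 = {conditional_entropy('abababababcabababab'):.3f} bits per character")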

The partial and total conditional entropies completely describe the information losses during data transmission over a noisy channel. For this purpose, so-called channel matrices are used. To describe the losses on the source side (that is, when the sent signal is known), one considers the conditional probability that the receiver receives symbol b_j given that symbol a_i was sent. In this case the channel matrix has the following form:

        b_1           b_2           ...   b_j           ...   b_m
a_1     p(b_1|a_1)    p(b_2|a_1)    ...   p(b_j|a_1)    ...   p(b_m|a_1)
a_2     p(b_1|a_2)    p(b_2|a_2)    ...   p(b_j|a_2)    ...   p(b_m|a_2)
...     ...           ...           ...   ...           ...   ...
a_i     p(b_1|a_i)    p(b_2|a_i)    ...   p(b_j|a_i)    ...   p(b_m|a_i)
...     ...           ...           ...   ...           ...   ...
a_m     p(b_1|a_m)    p(b_2|a_m)    ...   p(b_j|a_m)    ...   p(b_m|a_m)

Obviously, the probabilities on the diagonal describe the probability of correct reception, and the sum of the elements of any row equals 1. The losses attributable to the transmitted signal a_i are described by the partial conditional entropy:

H(B \mid a_i) = -\sum_{j=1}^{m} p(b_j \mid a_i) \log_2 p(b_j \mid a_i).

To calculate the transmission loss of all signals, the total conditional entropy is used:

H(B \mid A) = \sum_{i} p(a_i) H(B \mid a_i).

H(B | A) is the entropy on the source side; the entropy on the receiver side, H(A | B), is considered analogously: p(b_j | a_i) is replaced everywhere by p(a_i | b_j) (summing the elements of a row then gives p(a_i), and the diagonal elements give the probability that exactly the symbol that was received was in fact sent, that is, the probability of correct transmission).
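
A minimal sketch (ours; the 2x2 channel and input distribution are arbitrary, corresponding to a binary symmetric channel with a 10% error probability) of computing the partial conditional entropies H(B | a_i) and the total conditional entropy H(B | A) from a channel matrix:

    import math

    p_a = [0.6, 0.4]                       # probabilities of the sent symbols a_i
    channel = [[0.9, 0.1],                 # row i: p(b_j | a_i); each row sums to 1
               [0.1, 0.9]]

    def partial_conditional_entropy(row):
        return -sum(p * math.log2(p) for p in row if p > 0)

    H_partial = [partial_conditional_entropy(row) for row in channel]
    H_total = sum(p * h for p, h in zip(p_a, H_partial))

    for i, h in enumerate(H_partial):
        print(f"H(B | a_{i + 1}) = {h:.3f} bits")
    print(f"H(B | A) = {H_total:.3f} bits")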

Mutual entropy

Mutual entropy, or joint entropy, is used to calculate the entropy of interconnected systems (the entropy of the joint occurrence of statistically dependent messages) and is denoted H(AB), where A characterizes the transmitter and B the receiver.
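
A minimal sketch (ours, with an arbitrarily chosen joint distribution) of the joint entropy H(AB), together with a numerical check of the identity H(AB) = H(A) + H(B | A):

    import math

    joint = [[0.30, 0.10],                 # p(a_i, b_j); all entries sum to 1
             [0.15, 0.45]]

    def H(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    H_AB = H([p for row in joint for p in row])

    p_a = [sum(row) for row in joint]      # marginal distribution p(a_i)
    H_A = H(p_a)
    H_B_given_A = sum(p_a[i] * H([p / p_a[i] for p in row])
                      for i, row in enumerate(joint))

    print(f"H(AB) = {H_AB:.3f} bits")
    print(f"H(A) + H(B|A) = {H_A + H_B_given_A:.3f} bits")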
