# ENTROPY | what is properties of entropy in information theory (AVERAGE INFORMATION)

By   July 31, 2020

what is properties of entropy in information theory ?
ENTROPY (I.e., AVERAGE INFORMATION)
(i) Definition
In a practical communication system, we usually transmit long sequences of symbols from an information source. Thus, we are more interested in the average information that a source produces than the information content of a single symbol.
In order to get the information content of the symbol, we take notice of\ the fact that the flow of information in a system can fluctuate widely because of randomness involved into the selection of the symbols. Thus, we require to talk about the average information content of the symbols in a long message.
(ii)        Assumptions
Thus, for quantitative representation of average information per symbol we Make the following assumptions:
(i)         The source is stationary so that the probabilities may remain constant with time.
(ii)        The successive symbols are statistically independent and come form the source at a average rate of r symbols per second.
(iii)       Mathematical Expression
The mean value of I(xi) over the alphabet of source X with m different symbols is represented by H(X) and given by
*          Since log [AB] = log A + log B
equation
or                                                  equation
The quantity H(X) is known as the entropy of source X. It is a measure of the average information content per source symbol. The source entropy H(X) can be considered as the average amount of uncertainty within source X that is resolved by use of the alphabet.
(iv)       Entropy for Binary Source
It may be noted that for a binary source X which generates independent symbols 0 and 1 with equal probability, the source entropy H(X) is
H(X) =  log2  log2  = 1b/symbol              …(9.8)
(v)        Lower and Upper Bounds on Entropy H(X)
The source entropy H(X) satisfies the following relation:
0 £ H(X) £ log2m                    …(9.9)
where m is the size (number of symbols) of the alphabet of source X. The lower bound corresponds to no uncertainty, which occurs when one symbol has

 DO YOU KNOW? A simple interpretation of the source entropy is the following : on the average, we can expect to get H bits of information per symbol in long messages from the information source even through we cannot say in advance what symbol sequences will occur.

probability P(xi) = 1 while P(xi) = 0 for j  i, so X emits the same symbol xi all the time. The upper bound corresponds to the maximum uncertainty which occurs when P(xi) = 1/m for all i, that is, when all symbols are equality likely to be emitted by X.

9.7       INFORMATION RATE
Definition and Mathematical Expression
If the time rate at which source X emits symbols is r (symbols s), the information rate R of the source is given by
R = rH(X) b/s                                      …(9.10)
Here R is information rate.
H(X) is Entropy or average information
and r is rate at which symbols are generated.
Information rate R is represented in average number of bits of information per second. It is calculated as under:
R =             …(9.11)
or                             R = Information bits/second                                      …(9.12)
EXAMPLE 9.8. Verify the following expression:
I(xixj) = I(xi) + I(xi) if xi and xj are independent
*          Using the relation I(xi) = log2 .
Solution: If xi and xj .indepedent, then we know that
P(xixk) = P(xi) P(xj)
Also                             I(xixj) = log
equation
or                                 I(xixj) = I(xi) + I(xj)     Hence Proved.
EXAMPLE 9.9. A Discrete Memoryless Source (DMS) X has four symbols x1, x2, x3, x4 with probabilities P(x1) = 0.4, P(x2) = 0.3, P(x3) = 0.2, P(x4) = 0.1.
(i)         Calculate H(X).
(ii)        Find the amount of information contained in the messages x1 x2 x1 x3 and x4 x3, x3 x2, and compare with the H(X) obtained in part (i).
(U.P. Tech-Semester Exam. 2002-2003)
Solution: (i) We know that entropy is given by
equation
Substituting values and simplifying, we get
H(X) = – 0.4 log2 0.4 – 0.3 log2 0.3 – 0.2 log2 0.2 – 0.1 log2 0.1
Solving, we get
H(X) = 1.85 b/symbol                         Ans.
(ii) Now, we have
P(x1 x2 x1 x3) (0.4)(0.3)(0.4)(0.2) = 0.0096
Therefore,
I(x1x2x1x3) = – log2 0.0096 = 6.70 b/symbol* Ans.
Thus,               I(x1x2x1x3) < 7.4 [= 4H(X)] b/symbol
Also,                P(x4x3x3x2) = (0.1)(0.2)2(0.3) = 0.0012
Therefore
I(x4x3x3x2) = – log2 0.0012 = 9.70 b/symbol
We conclude that
I(x4x3x3x2) > 7.4 [= 4H(X)J b/symbol             Ans.
EXAMPLE 9.10. Consider a binary memoryless source X with two symbols x1 and x2. Prove that H(X) is maximum when both x1 and x2 equiprobable.
Solution: Here, let us assume that
P(x1) = α so that P(x2) = 1 – α
We know that entropy is given by
equation
Thus,               H(X) = – α log2 α – (1- α) log2 (1- α)
Differentiating above equation with respect to α, we get
or                                  =  [-α log2 α- (1 – α) log2 (1-α)]                  …(i)
Using the relation
equation
We get                          = – log2 α + log2 (1 – α) = log2
Note that the maximum value of H(X) requires that
This means that
or   α =
Note that H(X) = 0 when α = 0 or 1.
When               P(x1) = P(x2) = , H(X) is maximum and is given by
H(X)  =  log2 +   log2 = 1 b/symbol                                …(ii)
EXAMPLE 9.11. Verify the following expression:
0 ≤ P(xi) log2 m
where m is the size of the alphabet of X.
Solution: Proof of the lower bound:
Since                                       0 ≤ P(xi) ≤ 1,
and      log2
Then, it follows that
P(xi) log2
Thus,                                       equation
Next, we note that
P(xi) log2
If and only if P(xi)= 0 or 1. Since
equation
When P(xi) = 1 then P(xj) = 0 for j  i. Thus, only in this case, H(X) = 0.
Proof of the upper bound
Let us consider two probability distributions [P(xi)= Pi] and [Q(xi) = Qi] on the alphabet {xi}, i = 1, 2, ….m, such that
equation
We know that
equation
Next, using the inequality
In α ≤ α – 1 α ≥ 0                                             …(iii)
and noting that the equality holds only if α = 1, we obtain
equation
equation  Pi = 0 by using equations (ii).
Hence, we have
equation
where the equality holds only if Qi= Pi for all i.
Setting                                     Qi                            i = 1,2, …,m                 …(iv)
We get                         equation
Therefore, we get                    H(X) ≤ log2 m
and the quality holds only if the symbols in X are equiprobable.
EXAMPLE 9.12. A high-resolution black-and-white TV picture consists of about 2 x 106 picture elements and 16 different brightness levels. Pictures are repeated at the rate of 32 per second. All picture elements are assumed to be independent, and all levels have equal likelihood of occurrence. Calculate the average rate of information conveyed by this TV picture source.
(U.P. Tech. (C.O.), 2003)
Solution: We know that entropy is given by
log2 P(xi) bits/symbol                             …(i)
Here, given that                      m = 16, P(xi) =
Substituting all the values in equation (i), we get
equation
The rate, r, at which symbols are generated is given by
r = 2(106)(32) = 64(106) elements/second
Hence, the average rate of information conveyed is given by
R = rH(X) = 64(106)(4) = 256(106) b/s = 256 Mb/s Ans.
EXAMPLE 9.13. Given a telegraph source having two symbols, dot and dash. The dot duration is 0.2s The dash duration is 3 times the dot duration. The probability of the dot’s occurring is twice that of the dash, and the time between symbols is 0.2s. Calculate the information rate of the telegraph source.
Solution: Given that
P(dot) = 2P(dash)
P(dot) + P(dash) = 3P(dash) = 1
From above two equations, we have
P(dash) =  and P(dot) =
Further, we know that the entropy is given by
log2 P(xi) bits/symbol
Expanding accordingly, we get
H(X) = -P(dot) log2 P(dot) – P(dash) log2 P(dash)
Substituting all the values, we get
H(X) = 0.667(0.585) + 0.333(1.585) = 0.92 b/symbol
Also, given that
tdot = 0.2s, tdash = 0.6s, tspace = 0.2 s
Thus, the average time per symbol will be
Ts = P(dot)tdot + P(dash)tdash + tspace = 0.5333 s/symbol
Further, the average symbol rate will be given by
r =  = 1.875 symbols/s
Therefore, the average information rate of the telegraph source will be
R = rH(X) = 1.875(0.92) = 1.725 b/s Ans.
EXAMPLE 9.14. An analog signal is bandlimited to fm. Hz and sampled at Nyquist rate. The samples are quantized into 4 levels. Each level represent one symbols. Thus there are 4 symbols. The probabilities of occurrence of these 4 levels (symbols) are P(x1) = P(x4) = and P(x2) = P(x3) =  Obtain information rate of the source.                                               (Delhi University, 2000)
Solution: We are given four symbols with probabilities P(xi) = P(x4) =  and P(x2) = P(x3) = .
Average information H(X) (or entropy) is expressed as,
H(X) = P(x1) log2  + P(x2) log2  + P(x3) log2  + P(x2) = P(x3) =
Substituting all the given values, we get
or         H(X) =  log2, 8 +  log2  +  log2  +  log2 (8)
or         H(X) = 1.8 symbol Ans.
It is given that the signal is sampled at Nyquist rate. Nyquist rate for fm Hz bandlimited signal is,
Nyquist rate = 2fm samples/sec.
Since every sample generates one source symbol,
Therefore, symbols per second, r = 2fm symbols/sec.
Information rate is given by
R = rH(X)
Putting, values of r and H(X) in this equation, we get
R = 2fm symbols/sec x 1.8 bits/symbol = 3.6 fm its/sec                     Ans.
In this example, there are four levels. Those four levels may he coded using binary PCM as shown in Table 9.1
Table 9.1

 S.No. Symbol or level Probability Binary digits 1. Q1 0 0 2. Q2 0 1 3. Q3 1 0 43. Q4 1 1

Hence, two binary digits (binits) are required to send each symbol. We know that symbols are sent at the rate of 2fm, symbols/sec. Therefore, transmission rate of binary digits will be,
Binary digits (binits) rate = 2 binits/symbol x 2fm symbols/sec,
=  4fm binits/sec.
Because one binit is capable of conveying 1 bit of information therefore, the above coding scheme is capable of conveying 4fm, bits of information per second. But in this example, we have obtained that we are transmitting 9.6 fm bits of information per second. This mean, that the information carrying ability of binary PCM is not completely utilized by the transmission scheme.
EXAMPLE 9.15. In the transmission scheme of last example, obtain the information rate if all symbols are equally likely.
Also, comment on the result.
Solution: We are given that there are four symbols. Since they are equally likely, therefore, their probabilities will be equal to  i.e.,
P(x1) = P(x2) = P(x3) = P(x4) =
We know that, average information per symbol (Entropy) is given as,
H(X) = P(x1) log2  + P(x2) log2  + P(x3) log2  + P(x4) log2
since                            P(x1) = P(x2) = P (x3) = P(x4)
Therefore,                    H(X) = 4 P(x). log2
Again, since    P(x1) = P(x2) = P(x3) = P(x4) = P(x) =
Thus,                           H(X) = log2 (4) = 2 bits/symbol
We know that the information rate is expressed as,
R = rH(X)
Here, r = 2fm symbols/sec. as obtained in last example. Substituting, these values in above equation, we get,
R = 2fm symbols/sec. x 2 bits/symbol = 4fm bits/sec. Ans.
Comments: Just before this example we have seen that a binary coded PCM with 2 binits per message is capable of conveying 4B bits of information per second. The transmission scheme discussed in above example transmits 4B bits of information per second. This has been made possible since all the messages are equally likely. Thus with binary PCM clding, the maximum information rate is achieved if all messages are equally likely.
EXAMPLE 9.16. A discrete source emits one of five symbols once every millisecond with probabilities 1/2, 1/4, 1/8, 1/16 and 1/16 respectively. Determine the source entropy and information rate.
Solution: We know that the source entropy is given as
equation
or         H(X) =  log2(2) +  log2 (4) +  log2(8) +  log2 (16) +  log2 (16)
or         H(X)    =  = 1.875 bit/symbol   Ans.
The symbol rate r  =  = 1000 symbols/sec.
Therefore, the information rate is expressed as
R = rH(X) = 1000 x 1.875 = 1875 bits-sec.                Ans.
EXAMPLE 9.17. The probabilities of the five possible outocomes of an experiment are given as
P(x1) = , P(x2) =  P(x3) =  P(x4) = P(x5) =
Determine the entropy and information rate if there are 16 outcomes per second.                                                                                            (Delhi University, 1999)
Solution: The entropy of the system is given as
equation
or         H(X) =  log2 2 +  log2 4 +  log2 8 +  log2 16 +  log2 16
or         H(X)    =
or         H(X) =  =  = 1.875 bits/outcome    Ans.
Now, rate of outcomes r = 16 outcomes/sec.
Therefore, the rate of information R will be
R = rH(X) = 16 x  = 30 bits/sec      Ans.
EXAMPLE 9.18. An analog signal bandlimited to 10 kHz is quantized in 8 levels of a PCM system with probabilities of 1/4, 1/5 1/5, 1/10, 1/10, 1/20, 1/20 and 1/20 respectively. Find the entropy and the rate of information.
Solution: We know that according to the sampling theorem, the signal must be sampled at a frequency given as
fs = 10 x 2 kHz = 20 kHz
Considering each of the eight quantized levels as a message, the entropy of the source may written as
H(X)=  log2 4 +  log2 5 +  log2 5 +  log2 10 +  log2 10 +  log2 20 +  log2 20 +  log2 20
or         H(X)  log2 4 +  log2 4 +  log2 20 = 2.84 bits/message

 DO YOU KNOW? The reader should be aware of the fact that the data rate and information rate are two distinctly different quantities.

As the sampling frequency is 20 kHz then, the rate at which message is produced, will be given as
or         r = 20000 messages/sec.
Hence the information rate is
R = rH(X) = 20000 x 2.84 = 56800 bit/sec.   Ans.