# joint and conditional entropies | what is conditional entropy , joint entropy example proof definition

By   August 2, 2020

THE CONDITIONAL AND JOINT ENTROPIES
Using the input probabilities P(xi), output probabilities P(yj), transition probabilities P(yj|xi), and joint probabilities P(xi, yj), let us define the following entropy functions for a channel with m inputs and n outputs:
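$$H(X) = -\sum_{i=1}^{m} P(x_i) \log_2 P(x_i)$$                                      …(9.23)
$$H(Y) = -\sum_{j=1}^{n} P(y_j) \log_2 P(y_j)$$                                      …(9.24)
$$H(X|Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i|y_j)$$                                      …(9.25)
$$H(Y|X) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(y_j|x_i)$$                                      …(9.26)
$$H(X,Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i, y_j)$$                                      …(9.27)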
These entropies can be interpreted as under:
H(X) is the average uncertainty of the channel input, and H(Y) is the average uncertainty of the channel output.

 DO YOU KNOW? The maximum rate of transmission occurs when the source is matched to the channel.

The conditional entropy H(X | Y) is a measure of the average uncertainty remaining about the channel input after the channel output has been observed.
Also, H(X|Y) is sometimes called the equivocation of X with respect to Y.
The conditional entropy H(Y|X) is the average uncertainty of the channel output given that X was transmitted. The joint entropy H(X,Y) is the average uncertainty of the communication channel as a whole.
Two useful relationships among the above various entropies are as under:
H(X, Y) = H (X|Y) + H(Y)                            …(9.28)
H(X,Y) = H(Y|X) + H(X)                              …(9.29)
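The definitions (9.23) to (9.27) and the two relationships above can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names and the 2-input, 3-output channel are assumptions chosen only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def channel_entropies(p_x, p_y_given_x):
    """Return H(X), H(Y), H(X|Y), H(Y|X), H(X,Y) in bits.

    p_x: list of m input probabilities P(x_i)
    p_y_given_x: m x n transition matrix, row i holds P(y_j | x_i)
    """
    m, n = len(p_x), len(p_y_given_x[0])
    # Joint probabilities P(x_i, y_j) = P(y_j | x_i) P(x_i)
    joint = [[p_x[i] * p_y_given_x[i][j] for j in range(n)] for i in range(m)]
    # Output probabilities P(y_j) = sum over i of P(x_i, y_j)
    p_y = [sum(joint[i][j] for i in range(m)) for j in range(n)]
    h_x = entropy(p_x)
    h_y = entropy(p_y)
    h_xy = entropy([joint[i][j] for i in range(m) for j in range(n)])
    # H(Y|X) uses the transition probabilities P(y_j | x_i)
    h_y_given_x = -sum(joint[i][j] * math.log2(p_y_given_x[i][j])
                       for i in range(m) for j in range(n) if joint[i][j] > 0)
    # H(X|Y) uses P(x_i | y_j) = P(x_i, y_j) / P(y_j)
    h_x_given_y = -sum(joint[i][j] * math.log2(joint[i][j] / p_y[j])
                       for i in range(m) for j in range(n) if joint[i][j] > 0)
    return h_x, h_y, h_x_given_y, h_y_given_x, h_xy

# Hypothetical channel used only for illustration
p_x = [0.6, 0.4]
p_y_given_x = [[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6]]
h_x, h_y, h_x_given_y, h_y_given_x, h_xy = channel_entropies(p_x, p_y_given_x)

# Both differences should be zero up to floating-point rounding
print(h_xy - (h_x_given_y + h_y))  # relationship (9.28)
print(h_xy - (h_y_given_x + h_x))  # relationship (9.29)
```

Any valid input distribution and transition matrix can be substituted; the two printed differences should remain at rounding-error level.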
9.11 THE MUTUAL INFORMATION          (U.P. Tech., Sem. Exam., 2003-2004)
The mutual information denoted by I(X ; Y) of a channel is defined by
I(X ; Y) = H(X) – H(X|Y) bits/symbol               …(9.30)
Since H(X) represents the uncertainty about the channel input before the channel output is observed and H(X|Y) represents the uncertainty about the channel input after the channel output is observed, the mutual information I(X;Y) represents the uncertainty about the channel input that is resolved by observing the channel output.
Properties of Mutual Information I(X; Y)
(i)         I(X; Y) = I(Y;X)                                                                                    …(9.31)
(ii)        I(X;Y) ≥ 0                                                                                              …(9.32)
(iii)       I(X;Y) = H(Y) – H(Y|X)                                                                        …(9.33)
(iv)       I(X;Y) = H(X) + H(Y) – H(X,Y)                                               …(9.34)
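As a numerical illustration of properties (9.31) to (9.34), the sketch below evaluates the mutual information in three ways from one joint probability matrix. The matrix and all variable names are assumptions made for this example, not values from the text.

```python
import math

def H(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint probability matrix, entry [i][j] is P(x_i, y_j)
joint = [[0.42, 0.12, 0.06],
         [0.04, 0.12, 0.24]]

p_x = [sum(row) for row in joint]          # P(x_i), row sums
p_y = [sum(col) for col in zip(*joint)]    # P(y_j), column sums

h_x, h_y = H(p_x), H(p_y)
h_xy = H([p for row in joint for p in row])
h_x_given_y = -sum(p * math.log2(p / p_y[j])
                   for row in joint for j, p in enumerate(row) if p > 0)
h_y_given_x = -sum(p * math.log2(p / p_x[i])
                   for i, row in enumerate(joint) for p in row if p > 0)

i_xy = h_x - h_x_given_y      # definition (9.30)
i_yx = h_y - h_y_given_x      # property (9.33); by symmetry this is also I(Y;X)
i_alt = h_x + h_y - h_xy      # property (9.34)

print(i_xy, i_yx, i_alt)      # the three values should agree up to rounding
print(i_xy >= 0)              # property (9.32)
```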
EXAMPLE 9.22. For a lossless channel prove that
H(X|Y) =0                                                                   …(i)
Solution: When we observe the output yj in a lossless channel shown in figure 9.10, it is clear which xi was transmitted. This means that
P(xi|yj) = 0 or 1                                    …(ii)
Further, we know that
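$$H(X|Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i|y_j)$$
or                                               $$H(X|Y) = -\sum_{j=1}^{n} P(y_j) \sum_{i=1}^{m} P(x_i|y_j) \log_2 P(x_i|y_j)$$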
Note that all the terms in the inner summation are zero because they are of the form 1 × log2 1 or 0 × log2 0 (the latter is taken as zero by the usual convention). Hence, we conclude that for a lossless channel
H(X|Y) = 0                              Hence Proved.
[Figure 9.10: Lossless channel]
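The result of Example 9.22 can be illustrated numerically. The sketch below assumes a hypothetical lossless channel with 3 inputs and 5 outputs (it is not necessarily the channel of figure 9.10); the only property it relies on is that every column of the transition matrix has exactly one nonzero entry.

```python
import math

# Hypothetical lossless channel: each output y_j is reachable from one input only
p_x = [0.5, 0.3, 0.2]
p_y_given_x = [[0.75, 0.25, 0.0, 0.0, 0.0],
               [0.0, 0.0, 1 / 3, 2 / 3, 0.0],
               [0.0, 0.0, 0.0, 0.0, 1.0]]

# Joint probabilities P(x_i, y_j) and output probabilities P(y_j)
joint = [[px * p for p in row] for px, row in zip(p_x, p_y_given_x)]
p_y = [sum(col) for col in zip(*joint)]

# Posterior probabilities P(x_i | y_j) = P(x_i, y_j) / P(y_j)
posterior = [[joint[i][j] / p_y[j] for j in range(len(p_y))]
             for i in range(len(p_x))]

# H(X|Y), skipping the 0 x log2 0 terms
h_x_given_y = -sum(joint[i][j] * math.log2(posterior[i][j])
                   for i in range(len(p_x))
                   for j in range(len(p_y)) if joint[i][j] > 0)

print(posterior)      # every entry is 0 or 1, as in equation (ii)
print(h_x_given_y)    # zero up to rounding
```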
EXAMPLE 9.23. Given a noiseless channel with m input symbols and m output symbols as shown in figure 9.11. Prove that
H(X) = H(Y)                                                   …(i)
and                                          H(Y|X) = 0                                                            …(ii)
Solution: For noiseless channel, the transition probabilities are given by
[Figure 9.11: Noiseless channel]
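$$P(y_j|x_i) = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$                    …(iii)
Hence                                        $$P(x_i, y_j) = P(y_j|x_i) P(x_i) = \begin{cases} P(x_i), & i = j \\ 0, & i \neq j \end{cases}$$                    …(iv)
and                                            $$P(y_j) = \sum_{i=1}^{m} P(x_i, y_j) = P(x_j)$$                    …(v)
Also, we have             $$H(Y) = -\sum_{j=1}^{m} P(y_j) \log_2 P(y_j)$$                    …(vi)
Thus, using equations (v) and (vi), we have
$$H(Y) = -\sum_{j=1}^{m} P(x_j) \log_2 P(x_j)$$
or                                               $$H(Y) = H(X)$$
Further, we know that
$$H(Y|X) = -\sum_{j=1}^{m} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(y_j|x_i)$$                    …(vii)
Now, using equations (iii), (iv) and (vii), we have
$$H(Y|X) = -\sum_{i=1}^{m} P(x_i) \log_2 1$$
or                                                $$H(Y|X) = 0$$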
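Both results of Example 9.23 can be confirmed with a short numerical sketch. The input probabilities below are hypothetical, and the noiseless channel is modelled as an identity transition matrix, which is an assumption consistent with the transition probabilities used in the solution.

```python
import math

def H(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [0.5, 0.3, 0.2]   # hypothetical input probabilities
m = len(p_x)

# Noiseless channel: P(y_j | x_i) = 1 when i == j, otherwise 0
p_y_given_x = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

joint = [[p_x[i] * p_y_given_x[i][j] for j in range(m)] for i in range(m)]
p_y = [sum(col) for col in zip(*joint)]   # equals p_x for this channel

h_x, h_y = H(p_x), H(p_y)
h_y_given_x = -sum(joint[i][j] * math.log2(p_y_given_x[i][j])
                   for i in range(m) for j in range(m) if joint[i][j] > 0)

print(h_x, h_y)        # equal, as in result (i)
print(h_y_given_x)     # zero, as in result (ii)
```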
EXAMPLE 9.24 Verify the following expression:
H(X,Y) = H (X|Y) + H(Y)
Solution: We know that
P(xi, yj) = P(xi|yj) P(yj)
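and                                          $$\sum_{i=1}^{m} P(x_i, y_j) = P(y_j)$$
Also, we have             $$H(X,Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i, y_j)$$
$$H(X,Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 \left[ P(x_i|y_j) P(y_j) \right]$$
or                                             $$H(X,Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i|y_j) - \sum_{j=1}^{n} \left[ \sum_{i=1}^{m} P(x_i, y_j) \right] \log_2 P(y_j)$$
or                                             $$H(X,Y) = H(X|Y) - \sum_{j=1}^{n} P(y_j) \log_2 P(y_j)$$
or                                             $$H(X,Y) = H(X|Y) + H(Y)$$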
EXAMPLE 9.25 Show that the mutual information I(X;Y) for the channel described by equation (9.11), with the input probabilities P(xi), i = 1, 2, …, m and the output probabilities P(yj), j = 1, 2, …, n, can be expressed as
$$I(X;Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 \frac{P(x_i|y_j)}{P(x_i)}$$                    …(i)
Solution: We know that
I (X;Y) = H(X) – H(X|Y)
Also, we have
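$$H(X) = -\sum_{i=1}^{m} P(x_i) \log_2 P(x_i) = -\sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 P(x_i)$$
$$H(X|Y) = -\sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 P(x_i|y_j)$$
$$I(X;Y) = -\sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 P(x_i) + \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 P(x_i|y_j)$$
$$I(X;Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 \frac{P(x_i|y_j)}{P(x_i)}$$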
EXAMPLE 9.26. Verify the following expression:
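I(X;Y) = I(Y;X)
Solution: We can express I(Y;X) as
$$I(Y;X) = \sum_{j=1}^{n} \sum_{i=1}^{m} P(y_j, x_i) \log_2 \frac{P(y_j|x_i)}{P(y_j)}$$                    …(i)
We know that             P(yj,xi) = P(xi,yj)
and                                           $$\frac{P(y_j|x_i)}{P(y_j)} = \frac{P(x_i|y_j)}{P(x_i)}$$
Thus, comparing equation (i) with equation (i) in Example 9.25, we conclude that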
I(X;Y) = I(Y;X)                                  Hence Proved
EXAMPLE 9.27. Verify the following expression:
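I(X;Y) ≥ 0
Solution: We have
$$I(X;Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 \frac{P(x_i|y_j)}{P(x_i)}$$                    …(i)
Using Bayes rule, we have
$$\frac{P(x_i|y_j)}{P(x_i)} = \frac{P(x_i, y_j)}{P(x_i) P(y_j)}$$
We can write equation (i) as under:
$$-I(X;Y) = \frac{1}{\ln 2} \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \ln \frac{P(x_i) P(y_j)}{P(x_i, y_j)}$$                    …(ii)
Also, we know that                 ln α ≤ α – 1
Therefore, we have
$$-I(X;Y) \le \frac{1}{\ln 2} \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \left[ \frac{P(x_i) P(y_j)}{P(x_i, y_j)} - 1 \right]$$
or                                               $$-I(X;Y) \le \frac{1}{\ln 2} \left[ \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i) P(y_j) - \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \right]$$                    …(iii)
Since                                         $$\sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i) P(y_j) = \sum_{i=1}^{m} P(x_i) \sum_{j=1}^{n} P(y_j) = (1)(1) = 1$$
$$\sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) = \sum_{i=1}^{m} \left[ \sum_{j=1}^{n} P(x_i, y_j) \right] = \sum_{i=1}^{m} P(x_i) = 1$$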
Equation (iii) reduces to
– I(X;Y) ≤ 0
I(X;Y) ≥ 0                              Hence Proved.
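The inequality can also be spot-checked numerically. The sketch below draws random joint distributions and evaluates I(X;Y) from the Bayes-rule form used in the proof. The helper names, matrix sizes, sample count and seed are all assumptions made for illustration, and a numerical check of this kind does not replace the proof.

```python
import math
import random

def mutual_information(joint):
    """I(X;Y) in bits from a joint matrix, entry [i][j] is P(x_i, y_j)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (p_x[i] * p_y[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def random_joint(m, n, rng):
    """Random m x n joint distribution: normalise uniform random weights."""
    weights = [[rng.random() for _ in range(n)] for _ in range(m)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

rng = random.Random(0)   # fixed seed, chosen arbitrarily
values = [mutual_information(random_joint(3, 4, rng)) for _ in range(1000)]
smallest = min(values)
print(smallest)          # expected to be non-negative

# Independent X and Y: P(x_i, y_j) = P(x_i) P(y_j), the equality case
independent = [[px * py for py in (0.2, 0.8)] for px in (0.3, 0.7)]
i_independent = mutual_information(independent)
print(i_independent)     # zero up to rounding
```

The independent case corresponds to equality in ln α ≤ α – 1, which holds only at α = 1.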
EXAMPLE 9.28 Given a binary symmetric channel (BSC) (figure 9.12) with P(x1) = α.
(i)         Show that the mutual information I(X;Y) is given by
I(X;Y) = H(Y) + p log2 p + (1 – p) log2 (1 – p)                       …(i)

(ii)        Calculate I(X;Y) for α = 0.5 and p = 0.1.
(iii)       Repeat part (ii) for α = 0.5 and p = 0.5, and comment on the result.
Figure 9.12 shows the diagram of the binary symmetric channel (BSC) with associated input probabilities.
Solution: (i) Using equations (9.18), (9.19) and (9.22) we have
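[Figure 9.12: Binary symmetric channel (BSC)]
$$[P(X,Y)] = \begin{bmatrix} \alpha & 0 \\ 0 & 1-\alpha \end{bmatrix} \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
or                                             $$[P(X,Y)] = \begin{bmatrix} \alpha(1-p) & \alpha p \\ (1-\alpha)p & (1-\alpha)(1-p) \end{bmatrix} = \begin{bmatrix} P(x_1,y_1) & P(x_1,y_2) \\ P(x_2,y_1) & P(x_2,y_2) \end{bmatrix}$$
We know that         $$H(Y|X) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(y_j|x_i)$$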
By above expression, we have
H(Y|X) = – P(x1,y1) log2 P(y1|x1) – P(x1,y2) log2 P(y2|x1) – P(x2,y1) log2 P(y1|x2) – P(x2,y2) log2 P(y2|x2)
or                                 H(Y|X) = – α(1 – p) log2 (1 – p) – αp log2 p – (1 – α)p log2 p – (1 – α)(1 – p) log2 (1 – p)
or                                 H(Y|X) = – p log2 p – (1 – p) log2 (1 – p)                  …(ii)
Hence, I(X;Y) = H(Y) – H(Y|X) = H(Y) + p log2 p + (1 – p) log2 (1 – p)
(ii)        We know that
[P(Y)] = [P(X)] [P(Y|X)]
when α = 0.5 and p = 0.1, then, we have
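$$[P(Y)] = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}$$
Thus,                           P(y1) = P(y2) = 0.5
Now, using the expression
$$H(Y) = -\sum_{j=1}^{n} P(y_j) \log_2 P(y_j)$$
H(Y) = – P(y1) log2 P(y1) – P(y2) log2 P(y2)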
or                     H(Y) = -0.5 log2 0.5 – 0.5 log2 0.5 = 1
p log2 p + (1 – p) log2 (1 – p) = 0.1 log2 0.1 + 0.9 log2 0.9 = -0.469

 DO YOU KNOW? Telephone channels that are affected by switching transients and dropouts, and microwave radio links that are subjected to fading are examples of channels with memory.

Thus,   I(X;Y) = 1 – 0.469 = 0.531     Ans.
(iii)       When  α = 0.5 and p = 0.5, we have
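$$[P(Y)] = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}$$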
H(Y) = 1
p log2 p + (1 – p) log2 (1 – p) = 0.5 log2 0.5 + 0.5 log2 0.5 = -1
Thus,               I(X;Y) = 1 – 1 = 0
NOTE: In this case (p = 0.5), no information is transmitted at all. An equally acceptable decision could be made by dispensing with the channel entirely and “flipping a coin” at the receiver. When I(X;Y) = 0, the channel is said to be useless.
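The calculations of parts (ii) and (iii) can be reproduced with a few lines of Python. This is a minimal sketch of expression (i) of Example 9.28; the function name `bsc_mutual_information` and the helper `plogp` are illustrative choices, not names from the text.

```python
import math

def plogp(q):
    """q * log2(q), with the convention 0 * log2(0) = 0."""
    return q * math.log2(q) if q > 0 else 0.0

def bsc_mutual_information(alpha, p):
    """I(X;Y) in bits/symbol for a BSC with P(x1) = alpha and crossover probability p.

    Uses I(X;Y) = H(Y) + p log2 p + (1 - p) log2 (1 - p).
    """
    # Output probabilities from [P(Y)] = [P(X)] [P(Y|X)]
    p_y1 = alpha * (1 - p) + (1 - alpha) * p
    p_y2 = 1 - p_y1
    h_y = -plogp(p_y1) - plogp(p_y2)
    return h_y + plogp(p) + plogp(1 - p)

print(round(bsc_mutual_information(0.5, 0.1), 3))  # 0.531, part (ii)
print(round(bsc_mutual_information(0.5, 0.5), 3))  # 0.0, part (iii)
```

Sweeping p from 0 to 0.5 with α = 0.5 shows I(X;Y) falling from 1 bit/symbol to 0, which matches the comment that the channel is useless at p = 0.5.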