
**THE CONDITIONAL AND JOINT ENTROPIES**

Using the input probabilities P(x_{i}), output probabilities P(y_{j}), transition probabilities P(y_{j}|x_{i}), and joint probabilities P(x_{i}, y_{j}), we can define the following entropy functions for a channel with *m* inputs and *n* outputs:

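$$H(X) = -\sum_{i=1}^{m} P(x_i)\,\log_2 P(x_i)$$ …(9.23)

$$H(Y) = -\sum_{j=1}^{n} P(y_j)\,\log_2 P(y_j)$$ …(9.24)

$$H(X|Y) = -\sum_{j=1}^{n}\sum_{i=1}^{m} P(x_i, y_j)\,\log_2 P(x_i|y_j)$$ …(9.25)

$$H(Y|X) = -\sum_{j=1}^{n}\sum_{i=1}^{m} P(x_i, y_j)\,\log_2 P(y_j|x_i)$$ …(9.26)

$$H(X,Y) = -\sum_{j=1}^{n}\sum_{i=1}^{m} P(x_i, y_j)\,\log_2 P(x_i, y_j)$$ …(9.27)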

These entropies can be interpreted as follows:

H(X) is the average uncertainty of the channel input, and H(Y) is the average uncertainty of the channel output.

**DO YOU KNOW?** The maximum rate of transmission occurs when the source is matched to the channel.

The conditional entropy H(X | Y) is a measure of the average uncertainty remaining about the channel input after the channel output has been observed.

Also, H(X|Y) is sometimes called the equivocation of *X* with respect to *Y*.

The conditional entropy H(Y|X) is the average uncertainty of the channel output given that *X* was transmitted. The joint entropy H(X, Y) is the average uncertainty of the communication channel as a whole.

Two useful relationships among these entropies are:

H(X, Y) = H (X|Y) + H(Y) …(9.28)

H(X,Y) = H(Y|X) + H(X) …(9.29)
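The five entropies and the two relationships (9.28) and (9.29) can be checked numerically. The following is a minimal sketch in Python, assuming the standard double-sum definitions of the conditional and joint entropies; the function name `entropies` and the 2-input, 3-output joint distribution are hypothetical and chosen only for illustration.

```python
from math import log2

def entropies(joint):
    """Return H(X), H(Y), H(X|Y), H(Y|X), H(X,Y) in bits.

    joint[i][j] is the joint probability P(x_i, y_j).
    """
    px = [sum(row) for row in joint]          # P(x_i) = sum over j of P(x_i, y_j)
    py = [sum(col) for col in zip(*joint)]    # P(y_j) = sum over i of P(x_i, y_j)
    h_x = -sum(p * log2(p) for p in px if p > 0)
    h_y = -sum(p * log2(p) for p in py if p > 0)
    h_xy = h_x_given_y = h_y_given_x = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                          # 0 log 0 is taken as 0
                h_xy -= p * log2(p)
                h_x_given_y -= p * log2(p / py[j])   # P(x_i|y_j) = P(x_i,y_j)/P(y_j)
                h_y_given_x -= p * log2(p / px[i])   # P(y_j|x_i) = P(x_i,y_j)/P(x_i)
    return h_x, h_y, h_x_given_y, h_y_given_x, h_xy

# Hypothetical joint distribution (assumption for illustration; entries sum to 1).
joint = [[0.30, 0.10, 0.10],
         [0.05, 0.25, 0.20]]

h_x, h_y, h_x_y, h_y_x, h_xy = entropies(joint)
print(f"H(X,Y)        = {h_xy:.6f}")
print(f"H(X|Y) + H(Y) = {h_x_y + h_y:.6f}")   # relationship (9.28)
print(f"H(Y|X) + H(X) = {h_y_x + h_x:.6f}")   # relationship (9.29)
```

All three printed values should agree up to floating-point rounding, whatever valid joint distribution is supplied.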

**9.11 THE MUTUAL INFORMATION (U.P. Tech., Sem. Exam., 2003-2004)**

The mutual information denoted by I(X ; Y) of a channel is defined by

I(X;Y) = H(X) – H(X|Y) b/symbol …(9.30)

Since H(X) represents the uncertainty about the channel input before the channel output is observed and H(X|Y) represents the uncertainty about the channel input after the channel output is observed, the mutual information *I*(X;Y) represents the uncertainty about the channel input that is resolved by observing the channel output.

**Properties of Mutual Information I(X; Y)**

(i) *I*(X;Y) = *I*(Y;X) …(9.31)

(ii) *I*(X;Y) ≥ 0 …(9.32)

(iii) *I*(X;Y) = H(Y) – H(Y|X) …(9.33)

(iv) *I*(X;Y) = H(X) + H(Y) – H(X,Y) …(9.34)
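These properties can be spot-checked on a small example. The sketch below, assuming the standard double-sum definitions of H(X|Y), H(Y|X) and H(X,Y), evaluates I(X;Y) through (9.30), (9.33) and (9.34) separately. The helper name `mutual_information_forms` and all probability values are hypothetical.

```python
from math import log2

def mutual_information_forms(joint):
    """Return I(X;Y) in bits computed three ways: (9.30), (9.33), (9.34).

    joint[i][j] is the joint probability P(x_i, y_j).
    """
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_x = -sum(p * log2(p) for p in px if p > 0)
    h_y = -sum(p * log2(p) for p in py if p > 0)
    h_xy = h_x_given_y = h_y_given_x = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                                  # 0 log 0 is taken as 0
                h_xy -= p * log2(p)
                h_x_given_y -= p * log2(p / py[j])
                h_y_given_x -= p * log2(p / px[i])
    return (h_x - h_x_given_y,      # definition (9.30)
            h_y - h_y_given_x,      # property (9.33)
            h_x + h_y - h_xy)       # property (9.34)

# Hypothetical joint distribution (assumption for illustration).
joint = [[0.30, 0.10, 0.10],
         [0.05, 0.25, 0.20]]
# Swapping the roles of X and Y transposes the joint matrix: property (9.31).
transposed = [list(col) for col in zip(*joint)]
# Independent X and Y: P(x_i, y_j) = P(x_i) P(y_j), so I(X;Y) should vanish.
independent = [[0.5 * q for q in (0.2, 0.3, 0.5)] for _ in range(2)]

forms = mutual_information_forms(joint)
swapped = mutual_information_forms(transposed)
zero_case = mutual_information_forms(independent)
print("three forms :", forms)
print("I(Y;X)      :", swapped[0])
print("independent :", zero_case[0])
```

The independent case illustrates the equality condition in (9.32): the mutual information is zero when observing the output tells nothing about the input.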

**EXAMPLE 9.22. For a lossless channel prove that**

H(X|Y) =0 …(i)

**Solution:** When we observe the output y_{j} in a lossless channel shown in figure 9.10, it is clear which x_{i} was transmitted. This means that

P(x_{i}|y_{j}) = 0 or 1 …(ii)

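Further, we know that

$$H(X|Y) = -\sum_{j=1}^{n}\sum_{i=1}^{m} P(x_i, y_j)\,\log_2 P(x_i|y_j)$$

or

$$H(X|Y) = -\sum_{j=1}^{n} P(y_j)\sum_{i=1}^{m} P(x_i|y_j)\,\log_2 P(x_i|y_j)$$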

Note that all the terms in the inner summation are zero because they are of the form 1 × log_{2} 1 or 0 × log_{2} 0 (the latter is taken as 0, its limiting value). Hence, we conclude that for a lossless channel

H(X|Y) = 0

**Hence Proved.**
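The result can be illustrated numerically. The sketch below assumes the usual description of a lossless channel, namely that each column of the channel matrix has exactly one non-zero entry, so every output identifies its input. The function name `equivocation`, both channel matrices, and the input probabilities are hypothetical, since figure 9.10 is not reproduced here.

```python
from math import log2

def equivocation(p_x, channel):
    """Return H(X|Y) in bits.

    p_x[i] is P(x_i); channel[i][j] is the transition probability P(y_j|x_i).
    """
    # Joint probabilities P(x_i, y_j) = P(y_j|x_i) P(x_i).
    joint = [[p_x[i] * pyx for pyx in row] for i, row in enumerate(channel)]
    p_y = [sum(col) for col in zip(*joint)]
    h = 0.0
    for row in joint:
        for j, p in enumerate(row):
            if p > 0:                       # 0 log 0 is taken as 0
                h -= p * log2(p / p_y[j])   # P(x_i|y_j) = P(x_i,y_j)/P(y_j)
    return h

# Hypothetical lossless channel: one non-zero entry per column (assumption).
lossless = [[0.75, 0.25, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1 / 3, 2 / 3, 0.0],
            [0.0, 0.0, 0.0, 0.0, 1.0]]
h_lossless = equivocation([0.5, 0.3, 0.2], lossless)

# Hypothetical noisy channel for contrast: outputs no longer identify inputs.
noisy = [[0.9, 0.1],
         [0.2, 0.8]]
h_noisy = equivocation([0.5, 0.5], noisy)

print("lossless H(X|Y):", abs(h_lossless))
print("noisy    H(X|Y):", h_noisy)
```

In the lossless case every surviving term has P(x_{i}|y_{j}) = 1, so each contributes log_{2} 1 = 0, which mirrors the argument in the proof.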

**figure 9.10** *Lossless channel.*

**EXAMPLE 9.23. Given a noiseless channel with m input symbols and m output symbols as shown in figure 9.11. Prove that**

H(X) = H(Y) …(i)

and H(Y|X) = 0 …(ii)

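**Solution:** For a noiseless channel, the transition probabilities are given by

$$P(y_j|x_i) = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$ …(iii)

**figure 9.11** *Noiseless channel.*

Hence

$$P(x_i, y_j) = P(y_j|x_i)\,P(x_i) = \begin{cases} P(x_i), & i = j \\ 0, & i \neq j \end{cases}$$ …(iv)

and

$$P(y_j) = \sum_{i=1}^{m} P(x_i, y_j) = P(x_j)$$ …(v)

Also, we have

$$H(Y) = -\sum_{j=1}^{m} P(y_j)\,\log_2 P(y_j)$$ …(vi)

Thus, using equations (v) and (vi), we have

$$H(Y) = -\sum_{j=1}^{m} P(x_j)\,\log_2 P(x_j)$$

or

$$H(Y) = H(X)$$

Further, we know that

$$H(Y|X) = -\sum_{j=1}^{m}\sum_{i=1}^{m} P(x_i, y_j)\,\log_2 P(y_j|x_i)$$ …(vii)

Now, using equations (iii), (iv) and (vii), we have

$$H(Y|X) = -\sum_{i=1}^{m} P(x_i)\,\log_2 1$$

or

$$H(Y|X) = 0$$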
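Both claims of this example can be checked with a short computation. The following sketch assumes that the noiseless channel of figure 9.11 has the identity matrix as its channel matrix (each x_{i} is received as y_{i} with probability 1). The function name `output_entropies` and the input probabilities are hypothetical.

```python
from math import log2

def output_entropies(p_x, channel):
    """Return (H(X), H(Y), H(Y|X)) in bits.

    p_x[i] is P(x_i); channel[i][j] is the transition probability P(y_j|x_i).
    """
    m, n = len(p_x), len(channel[0])
    # [P(Y)] = [P(X)] [P(Y|X)]
    p_y = [sum(p_x[i] * channel[i][j] for i in range(m)) for j in range(n)]
    h_x = -sum(p * log2(p) for p in p_x if p > 0)
    h_y = -sum(p * log2(p) for p in p_y if p > 0)
    h_y_given_x = 0.0
    for i in range(m):
        for j in range(n):
            joint = p_x[i] * channel[i][j]          # P(x_i, y_j)
            if joint > 0:                            # 0 log 0 is taken as 0
                h_y_given_x -= joint * log2(channel[i][j])
    return h_x, h_y, h_y_given_x

m = 4
# Assumed noiseless channel matrix: the m-by-m identity.
identity = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
p_x = [0.4, 0.3, 0.2, 0.1]   # hypothetical input probabilities

h_x, h_y, h_y_given_x = output_entropies(p_x, identity)
print("H(X)   =", h_x)
print("H(Y)   =", h_y)
print("H(Y|X) =", abs(h_y_given_x))
```

Because the output distribution is identical to the input distribution, H(Y) equals H(X), and every surviving term of H(Y|X) contains log_{2} 1 = 0.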

**EXAMPLE 9.24. Verify the following expression:**

H(X,Y) = H (X|Y) + H(Y)

**Solution:** We know that

P(x_{i}, y_{j}) = P(x_{i}|y_{j}) P(y_{j})

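and

$$\sum_{i=1}^{m} P(x_i, y_j) = P(y_j)$$

Also, we have

$$H(X,Y) = -\sum_{j=1}^{n}\sum_{i=1}^{m} P(x_i, y_j)\,\log_2 P(x_i, y_j)$$

$$H(X,Y) = -\sum_{j=1}^{n}\sum_{i=1}^{m} P(x_i, y_j)\,\log_2 \left[P(x_i|y_j)\,P(y_j)\right]$$

or

$$H(X,Y) = -\sum_{j=1}^{n}\sum_{i=1}^{m} P(x_i, y_j)\,\log_2 P(x_i|y_j) - \sum_{j=1}^{n}\left[\sum_{i=1}^{m} P(x_i, y_j)\right]\log_2 P(y_j)$$

or

$$H(X,Y) = H(X|Y) - \sum_{j=1}^{n} P(y_j)\,\log_2 P(y_j)$$

or

$$H(X,Y) = H(X|Y) + H(Y)$$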

**EXAMPLE 9.25. Show that the mutual information I(X;Y) for the channel described by equation (9.11), with the input probabilities P(x_{i}), i = 1, 2, …, m, and the output probabilities P(y_{j}), j = 1, 2, …, n, can be expressed as**

$$I(X;Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\,\log_2 \frac{P(x_i|y_j)}{P(x_i)}$$ …(i)

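**Solution:** We know that

I(X;Y) = H(X) – H(X|Y)

Also, we have

$$P(x_i) = \sum_{j=1}^{n} P(x_i, y_j)$$

$$I(X;Y) = -\sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\,\log_2 P(x_i) + \sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\,\log_2 P(x_i|y_j)$$

$$I(X;Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\left[\log_2 P(x_i|y_j) - \log_2 P(x_i)\right]$$

$$I(X;Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\,\log_2 \frac{P(x_i|y_j)}{P(x_i)}$$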

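**EXAMPLE 9.26. Verify the following expression:**

***I*(X;Y) = *I*(Y;X)**

**Solution:** We can express I(Y;X) as

$$I(Y;X) = \sum_{j=1}^{n}\sum_{i=1}^{m} P(y_j, x_i)\,\log_2 \frac{P(y_j|x_i)}{P(y_j)}$$ …(i)

We know that

*P*(y_{j}, x_{i}) = *P*(x_{i}, y_{j})

and

$$\frac{P(y_j|x_i)}{P(y_j)} = \frac{P(x_i, y_j)}{P(x_i)\,P(y_j)} = \frac{P(x_i|y_j)}{P(x_i)}$$

Thus, comparing equation (i) with equation (i) in Example 9.25, we conclude that

I(X;Y) = I(Y;X)

**Hence Proved.**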

**EXAMPLE 9.27. Verify the following expression:**

**I(X;Y) ≥ 0**

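**Solution:** From the expression for I(X;Y) obtained in Example 9.25, we have

$$-I(X;Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\,\log_2 \frac{P(x_i)}{P(x_i|y_j)}$$ …(i)

Using Bayes rule, we have

$$\frac{P(x_i)}{P(x_i|y_j)} = \frac{P(x_i)\,P(y_j)}{P(x_i, y_j)}$$

We can write equation (i) as under:

$$-I(X;Y) = \frac{1}{\ln 2}\sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\,\ln \frac{P(x_i)\,P(y_j)}{P(x_i, y_j)}$$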

Also, we know that ln α ≤ α – 1 for all α > 0

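Therefore, we have

$$-I(X;Y) \le \frac{1}{\ln 2}\sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\left[\frac{P(x_i)\,P(y_j)}{P(x_i, y_j)} - 1\right]$$

or

$$-I(X;Y) \le \frac{1}{\ln 2}\left[\sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i)\,P(y_j) - \sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\right]$$ …(iii)

Since

$$\sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i)\,P(y_j) = \sum_{i=1}^{m} P(x_i)\sum_{j=1}^{n} P(y_j) = (1)(1) = 1$$

$$\sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j) = \sum_{i=1}^{m}\left[\sum_{j=1}^{n} P(x_i, y_j)\right] = \sum_{i=1}^{m} P(x_i) = 1$$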

Equation (iii) reduces to

– I(X;Y) ≤ 0

I(X;Y) ≥ 0

**Hence Proved.**

**EXAMPLE 9.28. Given a binary symmetric channel (BSC) (figure 9.12) with P(x_{1}) = α.**

**(i) Show that the mutual information I(X;Y) is given by**

***I*(X;Y) = H(Y) + p log_{2} p + (1 – p) log_{2}(1 – p) …(i)**

(ii) Calculate I(X;Y) for α = 0.5 and p = 0.1.

(iii) Repeat part (ii) for α = 0.5 and p = 0.5, and comment on the result.

**Figure 9.12 shows the diagram of the binary symmetric channel (BSC) with associated input probabilities.**

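**Solution:** (i) Using equations (9.18), (9.19) and (9.22), we have

**figure 9.12** *Binary symmetric channel (BSC).*

$$[P(X,Y)] = \begin{bmatrix} \alpha & 0 \\ 0 & 1-\alpha \end{bmatrix}\begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$

or

$$[P(X,Y)] = \begin{bmatrix} \alpha(1-p) & \alpha p \\ (1-\alpha)p & (1-\alpha)(1-p) \end{bmatrix} = \begin{bmatrix} P(x_1, y_1) & P(x_1, y_2) \\ P(x_2, y_1) & P(x_2, y_2) \end{bmatrix}$$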

We know that

$$H(Y|X) = -\sum_{i=1}^{2}\sum_{j=1}^{2} P(x_i, y_j)\,\log_2 P(y_j|x_i)$$

By the above expression, we have

H(Y|X) = –*P*(x_{1},y_{1}) log_{2} *P*(y_{1}|x_{1}) – *P*(x_{1},y_{2}) log_{2} *P*(y_{2}|x_{1}) – *P*(x_{2},y_{1}) log_{2} *P*(y_{1}|x_{2}) – *P*(x_{2},y_{2}) log_{2} *P*(y_{2}|x_{2})

or H(Y|X) = –α(1 – p) log_{2}(1 – p) – αp log_{2} p – (1 – α)p log_{2} p – (1 – α)(1 – p) log_{2}(1 – p)

or H(Y|X) = –p log_{2} p – (1 – p) log_{2}(1 – p) …(ii)

We know that I(X;Y) = H(Y) – H(Y|X) = H(Y) + p log_{2} p + (1 – p) log_{2}(1 – p)

(ii) We know that

[P(Y)] = [P(X)] [P(Y|X)]

When α = 0.5 and p = 0.1, we have

$$[P(Y)] = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}\begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}$$

Thus, P(y_{1}) = P(y_{2}) = 0.5

Now, using the expression

$$H(Y) = -\sum_{j=1}^{2} P(y_j)\,\log_2 P(y_j)$$

we have

H(Y) = –P(y_{1}) log_{2} P(y_{1}) – P(y_{2}) log_{2} P(y_{2})

or H(Y) = –0.5 log_{2} 0.5 – 0.5 log_{2} 0.5 = 1

p log_{2} p + (1 – p) log_{2}(1 – p) = 0.1 log_{2} 0.1 + 0.9 log_{2} 0.9 = –0.469

**DO YOU KNOW?** Telephone channels that are affected by switching transients and dropouts, and microwave radio links that are subjected to fading, are examples of channels with memory.

Thus, I(X;Y) = 1 – 0.469 = 0.531 **Ans.**

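(iii) When α = 0.5 and p = 0.5, we have

$$[P(Y)] = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}\begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}$$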

H(Y) = 1

p log_{2} p + (1 – p) log_{2} (1 – p) = 0.5 log_{2} 0.5 + 0.5 log_{2} 0.5 = -1

Thus, I(X;Y) = 1 – 1 = 0

**NOTE:** In this case (p = 0.5), no information is being transmitted at all. An equally acceptable decision could be made by dispensing with the channel entirely and “flipping a coin” at the receiver. When *I*(X;Y) = 0, the channel is said to be useless.
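The numbers in parts (ii) and (iii) can be reproduced with a few lines of Python. This is a minimal sketch that uses equation (i) of this example with H(Y) computed from P(y_{1}) = α(1 – p) + (1 – α)p; the function names `binary_entropy` and `bsc_mutual_information` are hypothetical.

```python
from math import log2

def binary_entropy(q):
    """Return -q log2 q - (1 - q) log2 (1 - q), with 0 log 0 taken as 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)

def bsc_mutual_information(alpha, p):
    """I(X;Y) in b/symbol for a BSC with P(x1) = alpha and error probability p."""
    p_y1 = alpha * (1 - p) + (1 - alpha) * p      # from [P(Y)] = [P(X)][P(Y|X)]
    h_y = binary_entropy(p_y1)                     # H(Y)
    h_y_given_x = binary_entropy(p)                # H(Y|X), equation (ii)
    return h_y - h_y_given_x

i_part_ii = bsc_mutual_information(0.5, 0.1)
i_part_iii = bsc_mutual_information(0.5, 0.5)
print(f"alpha = 0.5, p = 0.1: I(X;Y) = {i_part_ii:.3f}")   # compare with part (ii)
print(f"alpha = 0.5, p = 0.5: I(X;Y) = {i_part_iii:.3f}")  # compare with part (iii)
```

Sweeping p from 0 to 0.5 with this function shows I(X;Y) falling from 1 b/symbol to 0, which is the useless-channel case described in the note.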