Now we want to calculate the probability of bigram occurrences. Every word token except the last one is the first word of exactly one bigram, so the number of bigrams is 7070 - 1 = 7069. Dividing each bigram count by this total gives the bigram probabilities.
We can lay these results out in a table. Note the marginal totals.
| | holmes | ¬holmes | Total |
|---|---|---|---|
| sherlock | 0.00099 | 0.00000 | 0.00099 |
| ¬sherlock | 0.00552 | 0.99349 | 0.99901 |
| Total | 0.00651 | 0.99349 | 1.00000 |
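As a minimal sketch of how a table like this could be computed, the Python below counts the bigrams in a token list and divides each count by the number of bigrams. The function name `joint_probabilities` and the toy token list are hypothetical, chosen only for illustration; the toy list is not the text analysed here.

```python
from collections import Counter


def joint_probabilities(tokens, first, second):
    """Return the 2x2 table of bigram probabilities for (first, second).

    Keys are (is_first, is_second) booleans; the four values sum to 1.
    Assumes the token list contains at least two tokens.
    """
    bigrams = list(zip(tokens, tokens[1:]))  # n tokens give n - 1 bigrams
    total = len(bigrams)
    counts = Counter((w1 == first, w2 == second) for w1, w2 in bigrams)
    return {
        (a, b): counts[(a, b)] / total
        for a in (True, False)
        for b in (True, False)
    }


# Toy token list for illustration only (9 tokens, so 8 bigrams).
toy_tokens = "mr sherlock holmes said that holmes knew sherlock holmes".split()
table = joint_probabilities(toy_tokens, "sherlock", "holmes")

# Marginal probability that the first word of a bigram is "sherlock".
p_sherlock_first = table[(True, True)] + table[(True, False)]
```

Summing a row or column of the result gives the corresponding marginal total, as `p_sherlock_first` does for the `sherlock` row.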
If text really were word confetti, we could assume that the probability of the second word is unaffected by which word comes first. We can represent this in the table by filling each cell with the product of its row and column marginal probabilities.
| | holmes | ¬holmes | Total |
|---|---|---|---|
| sherlock | 0.00099 × 0.00651 | 0.00099 × 0.99349 | 0.00099 |
| ¬sherlock | 0.99901 × 0.00651 | 0.99901 × 0.99349 | 0.99901 |
| Total | 0.00651 | 0.99349 | 1.00000 |
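The independence assumption can be sketched in a few lines of Python. The marginal probabilities are the rounded values from the Total row and column of the table; the dictionary names and the labels `not sherlock` and `not holmes` are illustrative choices, not part of the original analysis.

```python
# Rounded marginal probabilities from the Total column and Total row.
p_first = {"sherlock": 0.00099, "not sherlock": 0.99901}
p_second = {"holmes": 0.00651, "not holmes": 0.99349}

# Under independence, each cell is the product of its row and column marginals.
expected_prob = {
    (w1, w2): p1 * p2
    for w1, p1 in p_first.items()
    for w2, p2 in p_second.items()
}

for (w1, w2), p in expected_prob.items():
    print(f"{w1:>12} | {w2:>10} | {p:.7f}")
```

Because each set of marginals sums to 1, the four products also sum to 1, up to floating-point error.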
To convert these probabilities into expected frequencies, multiply every cell by the total number of bigrams, 7069:
| | holmes | ¬holmes | Total |
|---|---|---|---|
| sherlock | 0.05 | 6.95 | 7 |
| ¬sherlock | 45.95 | 7016.05 | 7062 |
| Total | 46 | 7023 | 7069 |
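The expected frequencies can be checked with a short Python sketch. It uses the marginal counts from the table and the algebraically equivalent shortcut row total × column total ÷ N, which avoids compounding the rounding in the printed probabilities. The variable names are illustrative.

```python
n_bigrams = 7069

# Marginal counts from the Total column and Total row of the table.
row_totals = {"sherlock": 7, "not sherlock": 7062}
col_totals = {"holmes": 46, "not holmes": 7023}

# (row / N) * (col / N) * N simplifies to row * col / N.
expected = {
    (w1, w2): r * c / n_bigrams
    for w1, r in row_totals.items()
    for w2, c in col_totals.items()
}

for (w1, w2), freq in expected.items():
    print(f"{w1:>12} | {w2:>10} | {freq:.2f}")
```

Rounded to two decimal places, the four values match the cells of the table, and they sum to the total number of bigrams.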