Example
Before a patient is screened, a relevant question concerns the accuracy of the test. Once the test result comes back, an important question is whether a positive result means having the disease and whether a negative result means being healthy. These are the two questions of interest here.
Both questions involve conditional probabilities. In fact, the conditional probabilities in the second question are the reverse of the ones in the first question. To illustrate, we use the following example.
Example. Suppose that the prevalence of a disease is 1%. This disease has no particular symptoms but can be screened by a medical test that is 90% accurate. This means that the test result is positive about 90% of the time when it is applied on patients who have the disease and that the test result is negative about 90% of the time when it is applied on patients who do not have the disease. Suppose that you take the test and the test shows a positive result. Then the burning question is: how likely is it that you have the disease? Similarly, how likely is it that the patient is healthy if the test result is negative?
The accuracy of the test is 90% (0.90 as a probability). Since there is a 90% chance the test works correctly, if the patient has the disease, there is a 90% chance that the test will come back positive and if the patient is healthy, there is a 90% chance the test will come back negative. If a patient tested positive, wouldn’t it mean that there is a 90% chance that the patient has the disease?
Note that the given number of 90% is for the conditional events “if disease, then positive” and “if healthy, then negative.” The example asks for the probabilities of the reversed events: “if positive, then disease” and “if negative, then healthy.” It is a common misconception that the two probabilities are the same.
Tree Diagrams
Let’s use a tree diagram to look at this problem in a systematic way. First let H be the event that the patient being tested is healthy (does not have the disease in question) and let S be the event that the patient being tested is sick (has the disease in question). Let + denote the event that the test result is positive and let − denote the event that the test result is negative.
Then P(+ | S) = 0.90 and P(− | H) = 0.90. These two conditional probabilities are based on the accuracy of the test. These probabilities are in a sense chronological: the patient either is healthy or sick and then is tested. The example asks for the conditional probabilities P(S | +) and P(H | −), which are backward from the given conditional probabilities, and are also backward in a chronological sense. We call P(+ | S) and P(− | H) forward conditional probabilities. We call P(S | +) and P(H | −) backward conditional probabilities. Bayes’ formula is a good way to compute the backward conditional probabilities. The following diagram shows the structure of the tree diagram.
At the root of the tree diagram is a randomly chosen patient being tested. The first level of the tree shows the disease status (H or S). The events at the first level are unconditional events. The next level of the tree shows the test status (+ or −). Note that the test status is a conditional event. For example, the + that follows H is the conditional event + given H, and the − that follows H is the conditional event − given H. The next diagram shows the probabilities that are in the tree diagram.
The probabilities at the first level of the tree are the unconditional probabilities P(H) = 0.99 and P(S) = 0.01, where P(S) is the prevalence of the disease. The probabilities at the second level of the tree are conditional probabilities (the probabilities of the test status conditional on the disease status). A path probability is the product of the probabilities in a given path. For example, the path probability of the first path is P(H) × P(+ | H) = 0.99 × 0.10, which equals 0.099. Thus a path probability is the probability of the event “disease status and test status.” The next diagram displays the numerical probabilities.
Figure 3 shows four paths: “H and +”, “H and −”, “S and +” and “S and −”. With path probabilities 0.099, 0.891, 0.009 and 0.001, respectively, the sum of the four path probabilities is 1.0. These probabilities show the long run proportions of the patients that fall into these 4 categories. The path that is most likely is the path “H and −”, which happens 89.1% of the time. This makes sense since the disease in question has a low prevalence (only 1%). The two paths marked in red are the paths for positive test status. Thus P(+) = 0.099 + 0.009 = 0.108. Thus about 10.8% of the patients being tested will show a positive result. Of these, how many of them actually have the disease?
In this example, the forward conditional probability is P(+ | S) = 0.90. As the tree diagrams have shown, the backward conditional probability is P(S | +) = 0.009 / 0.108 = 0.0833. Of all the positive cases, only 8.33% of them are actually sick. The other 91.67% are false positives. Confusing the forward conditional probability with the backward conditional probability is a common mistake. In fact, sometimes medical doctors get it wrong, according to a 1978 article in the New England Journal of Medicine. Though we are using tree diagrams to present the solution, the answer of 8.33% is obtained by Bayes’ formula. We will discuss this in more detail below.
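The 8.33% figure can be checked with a short computation (a Python sketch; the variable names are ours, and the numbers are the ones given in the example):

```python
# Given in the example: prevalence P(S) = 0.01 and a 90% accurate test,
# so P(+|S) = 0.90 and P(+|H) = 0.10.
p_sick = 0.01
p_pos_given_sick = 0.90
p_pos_given_healthy = 0.10

# Path probabilities for the two paths ending in a positive result.
p_sick_and_pos = p_sick * p_pos_given_sick              # "S and +" = 0.009
p_healthy_and_pos = (1 - p_sick) * p_pos_given_healthy  # "H and +" = 0.099

p_pos = p_sick_and_pos + p_healthy_and_pos  # P(+) = 0.108
p_sick_given_pos = p_sick_and_pos / p_pos   # P(S | +)

print(round(p_pos, 3))             # 0.108
print(round(p_sick_given_pos, 4))  # 0.0833
```

The ratio of the “sick” path to the total of the two positive paths is exactly the tree-diagram calculation described above.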
According to Figure 3, P(−) = 0.891 + 0.001 = 0.892. Of these patients, how many of them are actually healthy?
Most of the negative results are actual negatives: P(H | −) = 0.891 / 0.892 = 0.9989. So there are very few false negatives. Once again, the forward conditional probability P(− | H) is not to be confused with the backward conditional probability P(H | −).
Bayes’ Formula
The result seems startling. If a patient tests positive, there is only a slightly more than 8% chance that the patient actually has the disease! It seems that the test is not very accurate and is unreliable. Before commenting on this result, let’s summarize the calculation implicit in the tree diagrams.
Though not mentioned by name, the above tree diagrams use the idea of Bayes’ formula or Bayes’ rule to reverse the forward conditional probabilities to obtain the backward conditional probabilities. This process has been described in this previous post.
The above tree diagrams describe a two-stage experiment. Pick a patient at random and the patient is either healthy or sick (the first stage in the experiment). Then the patient is tested and the result is either positive or negative (the second stage). A forward conditional probability is a probability of the status in the second stage given the status in the first stage of the experiment. The backward conditional probability is the probability of the status in the first stage given the status in the second stage. A backward conditional probability is also called a Bayes probability.
Let’s examine the backward conditional probability P(S | +). The following is the definition of the conditional probability P(S | +):

P(S | +) = P(S and +) / P(+)

Note that two of the paths in Figure 3 have positive test results (marked with red). Thus P(+) is the sum of two quantities: P(+) = P(H) P(+ | H) + P(S) P(+ | S). One of the quantities is for the case of the patient being healthy and the other is for the case of the patient being sick. With P(S and +) = P(S) P(+ | S), and plugging in the numbers,

P(S | +) = P(S) P(+ | S) / [P(H) P(+ | H) + P(S) P(+ | S)] = (0.01 × 0.90) / (0.99 × 0.10 + 0.01 × 0.90) = 0.009 / 0.108 = 0.0833
The above is Bayes’ formula in the specific context of a medical diagnostic test. Though a famous formula, there is no need to memorize it. If using the tree diagram approach, look for the two paths for the positive test results. The ratio of the path for “sick” patients to the sum of the two paths is the backward conditional probability P(S | +).
Regardless of the use of tree diagrams, the Bayesian idea is that a positive test result is explained by two causes. One is that the patient is healthy; then the contribution to a positive result is P(H) P(+ | H). The other cause of a positive result is that the patient is sick; then the contribution to a positive result is P(S) P(+ | S). The ratio of the “sick” cause to the sum total of the two causes is the backward conditional probability P(S | +). However, a tree diagram is clearly a very handy device to clarify the Bayesian calculation.
Further Discussion of the Example
The calculation in Figure 3 is based on a disease prevalence of 1%, i.e. P(S) = 0.01. The hypothetical disease in the example affects one person in 100. With P(S | +) being relatively small (just 8.33%), we cannot place much confidence in a positive result. One important point to understand is that the confidence in a positive result is determined by the prevalence of the disease in addition to the accuracy of the test. Thus the less common the disease, the less confidence we can place in a positive result. On the other hand, the more common the disease, the more confidence we can place in a positive result.
Let’s try some extreme examples. Suppose that we are to test for a disease that nobody has (think testing for ovarian cancer among men or prostate cancer among women). Then we would have no confidence in a positive test result. In such a scenario, all positives would be healthy people. Any healthy patient who receives a positive result would be called a false positive. Thus in the extreme scenario of a disease with 0% prevalence among the patients being tested, we do not have any confidence in a positive result being correct.
On the other hand, suppose we are to test for a disease that everybody has. Then it is clear that a positive result would always be a correct result. In such a scenario, all positives would be sick patients. Any sick patient who receives a positive test result is called a true positive. Thus in the extreme scenario of a disease with 100% prevalence, we would have great confidence in a positive result being correct.
Thus the prevalence of a disease has to be taken into account in the calculation of the backward conditional probability P(S | +). For the hypothetical disease discussed here, let’s look at the long run results of applying the test to 10,000 patients. The next tree diagram shows the results.
Out of 10,000 patients being tested, 100 of them are expected to have the disease in question and 9,900 of them are healthy. With the test being 90% accurate, about 90 of the 100 sick patients would show positive results (these are the true positives). On the other hand, there would be about 990 false positives (10% of the 9,900 healthy patients). There are 990 + 90 = 1,080 positives in total and only 90 of them are true positives. Thus P(S | +) = 90/1,080 = 8.33%.
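The frequency count above can be reproduced in a few lines (a sketch; the 10,000-patient cohort is the one used in the text):

```python
patients = 10000
prevalence = 0.01
accuracy = 0.90

sick = patients * prevalence                # 100 expected sick patients
healthy = patients - sick                   # 9,900 healthy patients

true_positives = sick * accuracy            # 90
false_positives = healthy * (1 - accuracy)  # 990

positives = true_positives + false_positives  # 1,080 positives in total
print(true_positives / positives)  # ≈ 0.0833, i.e. 90/1,080
```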
What if the disease in question has a prevalence of 8.33%? What would be the backward conditional probability P(S | +), assuming that the test is still 90% accurate?
With P(S) = 0.0833, there is a great deal more confidence in a positive result:

P(S | +) = (0.0833 × 0.90) / (0.0833 × 0.90 + 0.9167 × 0.10) = 0.075 / 0.1667 = 0.45

With the test accuracy being the same (90%), the greater confidence is due to the greater prevalence of the disease. With P(S) being greater than 0.01, a greater portion of the positives are true positives. The higher the prevalence of the disease, the greater the probability P(S | +). Just to further illustrate this point, suppose the test for a disease has a 90% accuracy rate and the prevalence of the disease is 45%. The following calculation gives P(S | +):

P(S | +) = (0.45 × 0.90) / (0.45 × 0.90 + 0.55 × 0.10) = 0.405 / 0.46 = 0.88
With the prevalence being 45%, the probability of a positive being a true positive is 88%. The calculation shows that when the disease or condition is widespread, a positive result should be taken seriously.
One thing is clear. The backward conditional probability P(S | +) is not to be confused with the forward conditional probability P(+ | S). Furthermore, it is not easy to invert the forward conditional probability without using Bayes’ formula (either using the formula explicitly or using a tree diagram).
Bayesian Updating Based on New Information
The calculation shown above using Bayes’ formula can be interpreted as updating probabilities in light of new information, in this case updating the risk of having a disease based on test results. For the hypothetical disease with a prevalence of 1% discussed above, the initial risk is 1%. For the patients who test positive in a first round of testing using a test with 90% accuracy, the risk is updated to 8.33%. These patients can then go through a second round of testing using another test (also with 90% accuracy). For the patients who test positive in the second round, the risk is updated to 45%. For the positives in a third round of testing, the risk is updated to 88%. The successive Bayesian calculations can be regarded as a sequential updating of probabilities. Such updating would not be easy without the idea of Bayes’ rule or formula.
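The sequence 1% → 8.33% → 45% → 88% can be generated by repeatedly applying the same update (a sketch; the function name and the sensitivity/specificity parameters are ours, both set to the 90% accuracy used above):

```python
def update(prior, sensitivity=0.90, specificity=0.90):
    """Posterior risk of disease after one positive test result."""
    p_pos = prior * sensitivity + (1 - prior) * (1 - specificity)
    return prior * sensitivity / p_pos

risk = 0.01  # initial risk = prevalence
for test_round in (1, 2, 3):
    risk = update(risk)
    print(test_round, round(risk, 4))  # 0.0833, then 0.45, then 0.8804
```

Each round feeds the previous posterior in as the new prior, which is exactly the sequential updating described above.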
Sensitivity and Specificity
The sensitivity of a medical diagnostic test is the ability to give correct results for the people who have the disease. Put another way, the sensitivity is the true positive rate: the percentage of sick people who are correctly identified as having the disease. In other words, the sensitivity of a test is the probability of a correct test result for the people with the disease. In our discussion, the sensitivity is the forward conditional probability P(+ | S).
The specificity of a medical diagnostic test is the ability to give correct results for the people who do not have the disease. The specificity is then the true negative rate: the percentage of healthy people who are correctly identified as not having the disease. In other words, the specificity of a test is the probability of a correct test result for healthy people. In our discussion, the specificity is the forward conditional probability P(− | H).
With the sensitivity being the forward conditional probability P(+ | S), the discussion in this post shows that the sensitivity of a test is not the same as the backward conditional probability P(S | +). The sensitivity may be 90% but the probability P(S | +) can be much lower depending on the prevalence of the disease. The sensitivity only tells us that 90% of the people who have the disease will have a positive result. It does not take into account the prevalence of the disease (called the base rate). The above calculation shows that the rarer the disease (the lower the base rate), the lower the likelihood that a positive test result is a true positive. Likewise, the more common the disease, the higher the likelihood that a positive test result is a true positive.
In the example discussed here, both the sensitivity and specificity are 90%. This scenario is certainly ideal. In medical testing, the accuracy of a test for a disease may not be the same for the sick people and for the healthy people. For a simple example, let’s say we use chest pain as a criterion to diagnose a heart attack. This would be a very sensitive test since almost all people experiencing heart attack will have chest pain. However, it would be a test with low specificity since there would be plenty of other reasons for the symptom of chest pain.
Thus it is possible that a test may be very accurate for the people who have the disease but nonetheless identify many healthy people as positive. In other words, some tests have high sensitivity but have much lower specificity.
In medical testing, the overriding concern is to use a test with high sensitivity. The reason is that a high true positive rate means a low false negative rate. So the goal is to have as few false negative cases as possible in order to correctly diagnose as many sick people as possible. The trade-off is that there may be a higher number of false positives, which is considered less alarming than missing people who have the disease. The usual practice is that a first test for a disease has high sensitivity but lower specificity. To weed out the false positives, the positives from the first round of testing are given another test that has a higher specificity.
2017 – Dan Ma
First, the format of the book. It is a short book of 127 pages, plus 40 pages of glossary, appendices, references and index. I eventually found the name of the publisher, Sebtel Press, but for a while thought the book was self-produced. While the LaTeX output is fine and the (Matlab) graphs readable, pictures are not of the best quality and the display editing is minimal in that there are several huge white spaces between pages. Nothing major there, obviously, it simply makes the book look like course notes, but this is in no way detrimental to its potential appeal. (I will not comment on the numerous appearances of Bayes’ alleged portrait in the book.)
“… (on average) the adjusted value θ^{MAP} is more accurate than θ^{MLE}.” (p.82)
Bayes’ Rule has the interesting feature that, in the very first chapter, after spending a rather long time on Bayes’ formula, it introduces Bayes factors (p.15). With the somewhat confusing choice of calling the prior probabilities of hypotheses marginal probabilities. Even though they are indeed marginal given the joint, marginal is usually reserved for the sample, as in marginal likelihood. Before returning to more (binary) applications of Bayes’ formula for the rest of the chapter. The second chapter is about probability theory, which means here introducing the three axioms of probability and discussing geometric interpretations of those axioms and Bayes’ rule. Chapter 3 moves to the case of discrete random variables with more than two values, i.e. contingency tables, on which the range of probability distributions is (re-)defined and produces a new entry to Bayes’ rule. And to the MAP. Given this pattern, it is not surprising that Chapter 4 does the same for continuous parameters. The parameter of a coin flip. This allows for discussion of uniform and reference priors. Including maximum entropy priors à la Jaynes. And bootstrap samples presented as approximating the posterior distribution under the “fairest prior”. And even two pages on standard loss functions. This chapter is followed by a short chapter dedicated to estimating a normal mean, then another short one on exploring the notion of a continuous joint (Gaussian) density.
“To some people the word Bayesian is like a red rag to a bull.” (p.119)
Bayes’ Rule concludes with a chapter entitled Bayesian wars. A rather surprising choice, given the intended audience. Which is rather bound to confuse this audience… The first part is about probabilistic ways of representing information, leading to subjective probability. The discussion goes on for a few pages to justify the use of priors but I find completely unfair the argument that because Bayes’ rule is a mathematical theorem, it “has been proven to be true”. It is indeed a maths theorem, however that does not imply that any inference based on this theorem is correct! (A surprising parallel is Kadane’s Principles of Uncertainty with its anti-objective final chapter.)
All in all, I remain puzzled after reading Bayes’ Rule. Puzzled by the intended audience, as contrary to other books I recently reviewed, the author does not shy away from mathematical notations and concepts, even though he proceeds quite gently through the basics of probability. Therefore, potential readers need some modicum of mathematical background that some students may miss (although it actually corresponds to what my kids would have learned in high school). It could thus constitute a soft entry to Bayesian concepts, before taking a formal course on Bayesian analysis. Hence doing no harm to the perception of the field.
_________________________________________________________________
Problem A
Let X be a random variable with the density function f_X(x) = x e^{−x} where x > 0. For each realized value X = x, the conditional variable Y | X = x is uniformly distributed over the interval (0, x), denoted symbolically by Y | X = x ~ U(0, x). Obtain solutions for the following:

A-1. The joint density function of X and Y.
A-2. The mean and variance of X.
A-3. The marginal density function of Y, along with its mean and variance.
A-4. The covariance and correlation coefficient of X and Y.
_________________________________________________________________
Problem B
Let be a random variable with the density function where . For each realized value , the conditional variable is uniformly distributed over the interval , denoted symbolically by . Obtain solutions for the following:
_________________________________________________________________
Discussion of Problem A
Problem A-1
The support of the joint density function is the unbounded lower triangle in the xy-plane (see the shaded region in green in the figure below).
Figure 1
The unbounded green region consists of vertical lines: for each x > 0, y ranges from 0 to x (the red vertical line in the figure below is one such line).
Figure 2
For each point in each vertical line, we assign a density value which is a positive number. Taken together these density values integrate to 1.0 and describe the behavior of the variables X and Y across the green region. If a realized value of X is x, then the conditional density function of Y is:

f_{Y | X}(y | x) = 1/x, 0 < y < x

Thus we have f(x, y) = f_X(x) f_{Y | X}(y | x). In our problem at hand, the joint density function is:

f(x, y) = x e^{−x} × (1/x) = e^{−x}, x > 0, 0 < y < x

As indicated above, the support of f(x, y) is the region x > 0 and 0 < y < x (the region shaded green in the above figures).
Problem A-2
The unconditional density function of X (given above in the problem), f_X(x) = x e^{−x}, is the density function of the sum of two independent exponential variables with the common density e^{−x} (see this blog post for the derivation using the convolution method). Since X is the independent sum of two identical exponential distributions, the mean and variance of X are twice those of the exponential distribution. We have:

E(X) = 2 and Var(X) = 2
Problem A-3
To find the marginal density of Y, for each applicable y, we need to sum out the x. According to the following figure, for each y > 0, we sum out all x values in a horizontal line such that x > y (see the blue horizontal line).
Figure 3
Thus we have:

f_Y(y) = ∫_y^∞ e^{−x} dx = e^{−y}, y > 0

Thus the marginal distribution of Y is an exponential distribution. The mean and variance of Y are:

E(Y) = 1 and Var(Y) = 1
Problem A-4
The covariance of X and Y is defined as Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)], which is equivalent to:

Cov(X, Y) = E(XY) − μ_X μ_Y

where μ_X = E(X) = 2 and μ_Y = E(Y) = 1. Knowing the joint density f(x, y), we can calculate E(XY) directly. We have:

E(XY) = ∫_0^∞ ∫_0^x x y e^{−x} dy dx = ∫_0^∞ (x^3 / 2) e^{−x} dx = 3 ∫_0^∞ (x^3 / 3!) e^{−x} dx = 3

Note that the last integrand in the above derivation is that of a Gamma distribution (hence the integral is 1.0). Now the covariance of X and Y is:

Cov(X, Y) = E(XY) − μ_X μ_Y = 3 − (2)(1) = 1

The following is the calculation of the correlation coefficient:

ρ = Cov(X, Y) / (σ_X σ_Y) = 1 / (√2 × √1) = 1/√2 ≈ 0.7071

Even without the calculation of ρ, we know that X and Y are positively and quite strongly correlated. The conditional distribution of Y is U(0, x), which widens as x increases, so larger values of X go with larger values of Y on average. The calculation of Cov(X, Y) and ρ confirms our observation.
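These moment calculations can be checked by simulation. The sketch below assumes, per the discussion in A-2, that X is the sum of two independent unit exponential variables, and that the uniform interval for Y given X = x is (0, x); the sample size and seed are arbitrary.

```python
import random

random.seed(42)
n = 200000
xs, ys = [], []
for _ in range(n):
    x = random.expovariate(1.0) + random.expovariate(1.0)  # X ~ sum of two Exp(1)
    y = random.uniform(0.0, x)                             # Y | X = x ~ U(0, x)
    xs.append(x)
    ys.append(y)

def mean(v):
    return sum(v) / len(v)

ex, ey = mean(xs), mean(ys)
cov = mean([a * b for a, b in zip(xs, ys)]) - ex * ey
var_x = mean([a * a for a in xs]) - ex * ex
var_y = mean([b * b for b in ys]) - ey * ey
rho = cov / (var_x * var_y) ** 0.5

# Theory: E(X) = 2, E(Y) = 1, Cov(X, Y) = 1, correlation = 1/sqrt(2) ≈ 0.7071
print(ex, ey, cov, rho)
```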
_________________________________________________________________
Answers for Problem B
Problem B-1
Problem B-2
Problem B-3
Problem B-4
_________________________________________________________________
____________________________________________________________
Problem 1a
There are two identical looking bowls. Let’s call them Bowl 1 and Bowl 2. In Bowl 1, there are 1 red ball and 4 white balls. In Bowl 2, there are 4 red balls and 1 white ball. One bowl is selected at random and its identity is kept from you. From the chosen bowl, you randomly select 5 balls (one at a time, putting each ball back before picking another one). What is the expected number of red balls in the 5 selected balls? What is the variance of the number of red balls?
Problem 1b
Use the same information in Problem 1a. Suppose there are 3 red balls in the 5 selected balls. What is the probability that the unknown chosen bowl is Bowl 1? What is the probability that the unknown chosen bowl is Bowl 2?
____________________________________________________________
Problem 2a
There are three identical looking bowls. Let’s call them Bowl 1, Bowl 2 and Bowl 3. Bowl 1 has 1 red ball and 9 white balls. Bowl 2 has 4 red balls and 6 white balls. Bowl 3 has 6 red balls and 4 white balls. A bowl is chosen according to the following probabilities:
The bowl is chosen so that its identity is kept from you. From the chosen bowl, 5 balls are selected sequentially with replacement. What is the expected number of red balls in the 5 selected balls? What is the variance of the number of red balls?
Problem 2b
Use the same information in Problem 2a. Given that there are 4 red balls in the 5 selected balls, what is the probability that the chosen bowl is Bowl i, where i = 1, 2, 3?
____________________________________________________________
Solution – Problem 1a
Problem 1a is a mixture of two binomial distributions and is similar to Problem 1 in the previous post Mixing Binomial Distributions. Let X be the number of red balls in the 5 balls chosen from the unknown bowl. The following is the probability function:

P(X = x) = 0.5 × C(5, x) (0.2)^x (0.8)^{5−x} + 0.5 × C(5, x) (0.8)^x (0.2)^{5−x}

where x = 0, 1, 2, 3, 4, 5.
The above probability function is the weighted average of two conditional binomial distributions (with equal weights). Thus the mean (first moment) and the second moment of X are the weighted averages of the same items of the conditional distributions. We have:

E(X) = 0.5 (1) + 0.5 (4) = 2.5
E(X^2) = 0.5 (1.8) + 0.5 (16.8) = 9.3
Var(X) = 9.3 − 2.5^2 = 3.05
See Mixing Binomial Distributions for a more detailed explanation of the calculation.
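The mixture moments can be verified numerically (a sketch; the helper name is ours, and the red-ball probabilities 0.2 and 0.8 come from the 1-in-5 and 4-in-5 compositions of the two bowls):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# X = number of red balls in 5 draws: an equal-weight mixture of
# Binomial(5, 0.2) (Bowl 1) and Binomial(5, 0.8) (Bowl 2).
pmf = [0.5 * binom_pmf(k, 5, 0.2) + 0.5 * binom_pmf(k, 5, 0.8) for k in range(6)]

ex = sum(k * p for k, p in enumerate(pmf))
ex2 = sum(k * k * p for k, p in enumerate(pmf))
print(ex, ex2 - ex**2)  # ≈ 2.5 and 3.05
```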
____________________________________________________________
Solution – Problem 1b
As above, let X be the number of red balls in the 5 selected balls. The probability P(X = 3) must account for the two bowls. Thus it is obtained by mixing two binomial probabilities:

P(X = 3) = 0.5 × C(5, 3) (0.2)^3 (0.8)^2 + 0.5 × C(5, 3) (0.8)^3 (0.2)^2 = 0.0256 + 0.1024 = 0.128
The following is the conditional probability P(Bowl 1 | X = 3):

P(Bowl 1 | X = 3) = 0.0256 / 0.128 = 0.2
Thus P(Bowl 2 | X = 3) = 0.1024 / 0.128 = 0.8.
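The posterior bowl probabilities can be computed the same way for any observed count (a sketch; the function and variable names are ours):

```python
from math import comb

priors = [0.5, 0.5]   # P(Bowl 1), P(Bowl 2)
p_red = [0.2, 0.8]    # chance of a red ball per draw in each bowl

def likelihood(p, k=3, n=5):
    """P(k red balls in n draws with replacement | bowl with red-ball chance p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

joint = [w * likelihood(p) for w, p in zip(priors, p_red)]  # [0.0256, 0.1024]
total = sum(joint)                                          # P(3 reds) = 0.128
posterior = [j / total for j in joint]
print(posterior)  # ≈ [0.2, 0.8]
```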
____________________________________________________________
Answers for Problem 2
Problem 2a
Let X be the number of red balls in the 5 balls chosen at random from the unknown bowl.
Problem 2b
I also regret not mentioning that Bayes’ formula is taught in French high schools, as illustrated by the anecdote of Bayes at the bac. And not reacting to the question about Bayes in the courtroom with yet another anecdote, of Bayes’ formula being thrown out of the accepted tools by an English court of appeal about a year ago. Oh well, another argument for sticking to the written word.
Example 1
As indicated in the diagram, Box 1 has 1 red ball and 3 white balls and Box 2 has 2 red balls and 2 white balls. The example involves a sequence of two steps. In the first step (the green arrow in the above diagram), a box is randomly chosen from the two boxes. In the second step (the blue arrow), a ball is randomly selected from the chosen box. We assume that the identity of the chosen box is unknown to the participants of this random experiment (e.g. suppose the two boxes are identical in appearance and a box is chosen by your friend and its identity is kept from you). Since a box is chosen at random, it is easy to see that P(Box 1) = P(Box 2) = 1/2.
The example involves conditional probabilities. Some of the conditional probabilities are natural and easy to see. For example, if the chosen box is Box 1, it is clear that the probability of selecting a red ball is 1/4, i.e. P(Red | Box 1) = 1/4. Likewise, the conditional probability P(Red | Box 2) is 2/4 = 1/2. These two conditional probabilities are “forward” conditional probabilities since the choosing of the box and the selecting of the ball occur in a natural chronological order.
What about the reversed conditional probabilities P(Box 1 | Red) and P(Box 2 | Red)? In other words, if the selected ball from the unknown box (unknown to you) is red, what is the probability that the ball is from Box 1?
The above question seems a little backward. After the box is randomly chosen, it is fixed (though its identity is unknown to you). Since it is fixed, shouldn’t the probability of the box being Box 1 be 1/2? Since the box is already chosen, how can the identity of the box be influenced by the color of the ball selected from it? Of course, the box itself is not changed; what changes, as we explain below, is the probability we assign to its identity.
We should not look at the chronological sequence of events. Instead, the key to understanding the example is to perform the random experiment repeatedly. Think of the experiment of choosing one box and then selecting one ball from the chosen box. Focus only on the trials that result in a red ball. For the result to be a red ball, we need to get either Box 1/Red or Box 2/Red. Compute the probabilities of these two cases. Adding these two probabilities, we obtain the probability that the selected ball is red. The following diagram illustrates this calculation.
Example 1 – Tree Diagram
The outcomes with red borders in the above diagram are the outcomes that result in a red ball. The diagram shows that if we perform this experiment many times, about 37.5% of the trials will result in a red ball (on average 3 out of 8 trials will result in a red ball). In how many of these trials is Box 1 the source of the red ball? In the diagram, we see that the case Box 2/Red is twice as likely as the case Box 1/Red. We conclude that the case Box 1/Red accounts for about one third of the cases where the selected ball is red. In other words, one third of the red balls come from Box 1 and two thirds of the red balls come from Box 2. We have:

P(Box 1 | Red) = 1/3 and P(Box 2 | Red) = 2/3
Instead of using the tree diagram or the reasoning indicated in the paragraph after the tree diagram, we could just as easily apply Bayes’ formula:

P(Box 1 | Red) = P(Box 1) P(Red | Box 1) / [P(Box 1) P(Red | Box 1) + P(Box 2) P(Red | Box 2)] = (1/2 × 1/4) / (1/2 × 1/4 + 1/2 × 1/2) = (1/8) / (3/8) = 1/3
In this calculation (as in the tree diagram), we use the law of total probability:

P(Red) = P(Box 1) P(Red | Box 1) + P(Box 2) P(Red | Box 2) = 1/8 + 2/8 = 3/8
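The long run reasoning behind the tree diagram can be checked by actually repeating the experiment many times. A simulation sketch (the trial count and seed are arbitrary):

```python
import random

random.seed(2017)
trials = 100000
red_total = 0
red_from_box1 = 0

for _ in range(trials):
    box = random.choice([1, 2])        # choose a box at random
    p_red = 0.25 if box == 1 else 0.5  # 1 of 4 vs 2 of 4 balls are red
    if random.random() < p_red:        # select a ball; is it red?
        red_total += 1
        if box == 1:
            red_from_box1 += 1

print(red_total / trials)         # close to P(Red) = 3/8 = 0.375
print(red_from_box1 / red_total)  # close to P(Box 1 | Red) = 1/3
```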
______________________________________________________________
Remark
We are not saying that an earlier event (the choosing of the box) is altered in some way by a subsequent event (the observing of a red ball). The above probabilities are subjective. How strongly do you believe that the “unknown” box is Box 1? If you use probabilities to quantify your belief, without knowing any additional information, you would say the probability of the “unknown” box being Box 1 is 1/2.
Suppose you reach into the “unknown” box and get a red ball. This additional information alters your belief about the chosen box. Since Box 2 has more red balls, the fact that you observe a red ball tells you that it is more likely that the “unknown” chosen box is Box 2. According to the above calculation, you update the probability of the chosen box being Box 1 to 1/3 and the probability of it being Box 2 to 2/3.
In the language of Bayesian probability theory, the initial belief of P(Box 1) = 1/2 and P(Box 2) = 1/2 is called the prior probability distribution. After a red ball is observed, the updated belief given by the probabilities P(Box 1 | Red) = 1/3 and P(Box 2 | Red) = 2/3 is called the posterior probability distribution.
As demonstrated by this example, Bayes’ formula is for updating probabilities in light of new information. Though the updated probabilities are subjective, they are not arbitrary. We can make sense of these probabilities by assessing the long run results of the experiment objectively.
______________________________________________________________
An Insurance Perspective
The example discussed here has an insurance interpretation. Suppose an insurer has two groups of policyholders, both equal in size. One group consists of low risk insureds for whom the probability of experiencing a claim in a year is 1/4 (i.e. the proportion of red balls in Box 1). The insureds in the other group, a high risk group, have a higher probability of experiencing a claim in a year, namely 1/2 (i.e. the proportion of red balls in Box 2).
Suppose someone has just purchased a policy. Initially, the risk profile of this newly insured is uncertain. So the initial belief is that it is equally likely for him to be in the low risk group as in the high risk group.
Suppose that during the first policy year, the insured has incurred one claim. The observation alters our belief about this insured. With the additional information of having one claim, the probability that the insured belongs to the high risk group is increased to 2/3. The risk profile of this insured is altered based on new information. The insurance point of view described here involves the exact same calculation as the box-ball example and amounts to using past claims experience to update predictions of future claims experience.
______________________________________________________________
Bayes’ Formula
Suppose we have a collection of mutually exclusive events C_1, C_2, …, C_n whose union is the entire sample space. That is, the probabilities P(C_1), P(C_2), …, P(C_n) sum to 1.0. Suppose A is an event. Think of the events C_i as “causes” that can explain the event A, an observed result. Given that A is observed, what is the probability that the cause of A is C_j? In other words, we are interested in finding the conditional probability P(C_j | A).
Before we have the observed result A, the probabilities P(C_i) are the prior probabilities of the causes. We also know the probability of observing A given a particular cause (i.e. we know P(A | C_i)). The probabilities P(A | C_i) are “forward” conditional probabilities.
Given that we observe A, we are interested in knowing the “backward” probabilities P(C_j | A). These probabilities are called the posterior probabilities of the causes. Mathematically, Bayes’ formula is simply an alternative way of writing the following conditional probability:

P(C_j | A) = P(C_j and A) / P(A) ………… (1)
In (1), as in the discussion of the random experiment of choosing a box and selecting a ball, we are restricting ourselves to only the cases where the event A is observed. Then we ask: out of all the cases where A is observed, how many of these cases are caused by the event C_j?
The numerator of can be written as
The denominator of is obtained from applying the total law of probability.
Plugging and into , we obtain a statement of the Bayes’ formula.
Of course, for any computation problem involving Bayes’ formula, it is best not to memorize the formula as stated above. Instead, simply apply the thought process that gives rise to the formula (e.g. the tree diagram shown above).
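As a concrete check, the disease-screening example from the beginning of the post (1% prevalence, 90% accurate test) can be run through this thought process in a few lines of Python. The function below is a generic sketch, not code from the original post.

```python
def bayes_posterior(priors, likelihoods):
    """Posterior P(A_i | B) from priors P(A_i) and likelihoods P(B | A_i)."""
    # Denominator: law of total probability, P(B) = sum_i P(B | A_i) P(A_i)
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / total for p, l in zip(priors, likelihoods)]

# Disease screening: 1% prevalence, test correct 90% of the time,
# observed result is positive.
posterior = bayes_posterior(priors=[0.99, 0.01],       # healthy, sick
                            likelihoods=[0.10, 0.90])  # P(positive | each)
print(round(posterior[1], 4))  # P(sick | positive) = 0.0833
```

Despite the positive result, the probability of disease is only about 8.3%, far from 90% — the point of the whole discussion about forward versus backward conditional probabilities.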
The Bayes’ formula has some profound philosophical implications, evidenced by the fact that it spawned a separate school of thought called Bayesian statistics. However, our discussion here is solely on its original role in finding certain backward conditional probabilities.
______________________________________________________________
Example 2
Example 2 is left as an exercise. The event that both selected balls are red would give even more weight to Box 2. In other words, in the event that a red ball is selected twice in a row, we would believe it is even more likely that the unknown box is Box 2.
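A sketch of the exercise's calculation, assuming illustrative red-ball proportions of 0.3 for Box 1 and 0.7 for Box 2, and draws made with replacement (the post's actual proportions are not shown in this excerpt):

```python
p_red = {"Box 1": 0.3, "Box 2": 0.7}          # assumed red-ball proportions
posterior = {"Box 1": 0.5, "Box 2": 0.5}      # initial belief: equally likely

for _ in range(2):  # observe "red" twice; update sequentially
    total = sum(posterior[b] * p_red[b] for b in posterior)
    posterior = {b: posterior[b] * p_red[b] / total for b in posterior}

print(round(posterior["Box 2"], 4))  # 0.8448
```

After one red ball the posterior for Box 2 is 0.7; after the second it climbs to about 0.84, confirming that repeated red draws pile more and more weight onto Box 2.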
______________________________________________________________
Reference
It occurred to me that Bayesian inference can be thought of as filtering: the objects of interest are the model parameters but, instead of being measured directly, their measurement is implicit in the data.
Consider standard linear regression:

$$y = X \beta + \epsilon$$

where $y$ is an $n \times 1$ vector of observations, $X$ is an $n \times p$ design matrix, $\beta$ is a $p \times 1$ parameter vector and $\epsilon$ is an $n \times 1$ noise vector. Typically, we take normally distributed noise, $\epsilon \sim N(0, \Sigma)$, and here we’ll assume the covariance matrix $\Sigma$ is known. Thus our probabilistic model is

$$y \mid X, \beta \sim N(X \beta, \Sigma)$$
In Bayesian inference, what we are after is the posterior $p(\beta \mid y, X)$. This connects to filtering if you think of the pair $(X, y)$ as an implicit measurement of $\beta$ given the model. Bayes’ formula tells us

$$p(\beta \mid y, X) \propto p(y \mid X, \beta) \, p(\beta \mid X)$$
where $p(\beta \mid X)$ is our prior for the parameters given $X$. Typically, however, our prior beliefs about $\beta$ will be independent of $X$, i.e.

$$p(\beta \mid X) = p(\beta)$$
For simplicity, we’ll assume a normal prior: $\beta \sim N(\mu_0, \Sigma_0)$, and, in a later post, we’ll compute the posterior for $\beta$, which is a nice little mathematical problem in its own right! Till then, I’ll only point out that the posterior is also normal:

$$\beta \mid y, X \sim N(\mu_1, \Sigma_1)$$
Our job is to compute $\mu_1$ and $\Sigma_1$.
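Although the closed form is deferred to the later post, a small numerical sketch can preview the computation. The data and prior below are made up for illustration, and the update used is the standard conjugate-normal result: posterior covariance $\Sigma_1 = (\Sigma_0^{-1} + X^{T}\Sigma^{-1}X)^{-1}$ and posterior mean $\mu_1 = \Sigma_1(\Sigma_0^{-1}\mu_0 + X^{T}\Sigma^{-1}y)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = rng.normal(size=(n, p))              # made-up design matrix
beta_true = np.array([2.0, -1.0])        # parameters to be recovered
Sigma = np.eye(n)                        # known noise covariance
y = X @ beta_true + rng.normal(size=n)   # noisy observations

mu0 = np.zeros(p)                        # prior mean
Sigma0 = 10.0 * np.eye(p)                # broad prior covariance

# Conjugate-normal posterior for beta
Sigma_inv = np.linalg.inv(Sigma)
Sigma1 = np.linalg.inv(np.linalg.inv(Sigma0) + X.T @ Sigma_inv @ X)
mu1 = Sigma1 @ (np.linalg.inv(Sigma0) @ mu0 + X.T @ Sigma_inv @ y)

print(mu1.round(2))  # posterior mean should sit near beta_true
```

With 50 observations and a broad prior, the posterior mean lands close to the true parameters, which is exactly the "implicit measurement" intuition described above.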
_____________________________________________________________
Discussion of Problem 1
Problem 2 is found at the end of the post.
Problem 1.1
This is an example of a joint distribution that is constructed by taking the product of conditional distributions and a marginal distribution. The marginal distribution of $X$ is a uniform distribution on the set $\{1, 2, 3, 4, 5, 6\}$ (rolling a fair die). Conditional on $X = x$, $Y$ has a binomial distribution with $x$ trials. Think of the conditional variable $Y \mid X = x$ as tossing a coin $x$ times and counting the heads. The following is the sample space of the joint distribution of $X$ and $Y$.
Figure 1
The joint probability function of $X$ and $Y$ may be written as:

$$P(X = x, \, Y = y) = P(X = x) \cdot P(Y = y \mid X = x)$$

Thus the probability at each point in Figure 1 is the product of $P(X = x)$, which is $\frac{1}{6}$, with the conditional probability $P(Y = y \mid X = x)$, which is binomial. For example, the following diagram and equation demonstrate the calculation of one such joint probability.
Figure 2
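The joint pmf just described can be tabulated in a short script. The coin's success probability is taken to be 1/2 here — an assumed value, since this excerpt does not show the original parameter.

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)  # assumed binomial success probability

def joint(x, y):
    """P(X = x, Y = y) = P(X = x) * P(Y = y | X = x)."""
    if not (1 <= x <= 6 and 0 <= y <= x):
        return Fraction(0)
    return Fraction(1, 6) * comb(x, y) * half**y * half**(x - y)

# e.g. P(X = 4, Y = 2) = (1/6) * C(4,2) * (1/2)^4
print(joint(4, 2))  # 1/16
```

Using `Fraction` keeps every probability exact, which makes it easy to confirm that the joint probabilities over the whole sample space sum to 1.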
Problem 1.2
The following shows the calculation of the binomial distributions.
Problem 1.3
To find the marginal probability $P(Y = y)$, we need to sum over all values of $X$. For example, $P(Y = 2)$ is the sum of the joint probabilities $P(X = x, \, Y = 2)$ for all applicable $x$. See the following diagram.
Figure 3
As indicated in the joint probability function above, each $P(X = x, \, Y = 2)$ is the product of a conditional probability and $\frac{1}{6}$. Thus the probability indicated in Figure 3 can be translated as:

$$P(Y = 2) = \sum_{x=2}^{6} \frac{1}{6} \cdot P(Y = 2 \mid X = x)$$
We now begin the calculation.
The following is the calculation of the mean and variance of $Y$.
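Taking the binomial success probability to be 1/2 (an assumed value for illustration), the mean and variance of $Y$ can be checked with exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

def p_y(y):
    """Marginal P(Y = y), summing the joint pmf over the die values x."""
    return sum(Fraction(1, 6) * comb(x, y) * Fraction(1, 2)**x
               for x in range(max(y, 1), 7))

mean = sum(y * p_y(y) for y in range(0, 7))
var = sum(y**2 * p_y(y) for y in range(0, 7)) - mean**2
print(mean, var)  # 7/4 77/48
```

These agree with the shortcut $E[Y] = E[X]/2 = 7/4$ and $\mathrm{Var}(Y) = E[X]/4 + \mathrm{Var}(X)/4 = 7/8 + 35/48 = 77/48$ from the conditional mean/variance formulas.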
Problem 1.4
The conditional probability $P(Y = y \mid X = x)$ is easy to compute since it is given that $Y$ is a binomial variable conditional on a value of $X$. Now we want to find the backward probability $P(X = x \mid Y = y)$. Given that the binomial observation is $Y = y$, what is the probability that the roll of the die is $x$? This is an application of Bayes’ theorem. We can start by looking at Figure 3 once more.
Consider $P(X = x \mid Y = 2)$ for a given value of $x$. In calculating this conditional probability, we only consider the 5 sample points encircled in Figure 3 and disregard all the other points. These 5 points become a new sample space, if you will (this is the essence of conditional probability and conditional distribution). The sum of the joint probabilities for these 5 points is $P(Y = 2)$, calculated in the previous step. The conditional probability $P(X = x \mid Y = 2)$ is simply the probability of one of these 5 points as a fraction of the total probability $P(Y = 2)$. Thus we have:

$$P(X = x \mid Y = 2) = \frac{P(X = x, \, Y = 2)}{P(Y = 2)}$$
We do not have to evaluate the components that go into $P(Y = 2)$ again. As a practical matter, to find $P(X = x \mid Y = 2)$ is to take each of the 5 probabilities shown in Figure 3 and evaluate it as a fraction of the total probability $P(Y = 2)$. Thus we have:
Calculation of
Here’s the rest of the Bayes’ calculation:
Calculation of
Calculation of
Calculation of done earlier
Calculation of
Calculation of
Calculation of
Calculation of
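The backward probabilities above can be verified in a few lines, again taking the binomial success probability to be 1/2 as an illustrative assumption:

```python
from fractions import Fraction
from math import comb

def joint(x, y):
    """P(X = x, Y = y) with the assumed success probability 1/2."""
    return Fraction(1, 6) * comb(x, y) * Fraction(1, 2)**x

p_y2 = sum(joint(x, 2) for x in range(2, 7))           # marginal P(Y = 2)
posterior = {x: joint(x, 2) / p_y2 for x in range(2, 7)}

print(p_y2)          # 33/128
print(posterior[2])  # 16/99
```

Each posterior value is just one circled joint probability divided by the same total $P(Y = 2)$, exactly the "fraction of the total probability" reasoning used above, and the five posterior probabilities sum to 1.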
_____________________________________________________________
Problem 2
Let $X$ be the value of one roll of a fair die. If the value of the die is $x$, we are given that $Y$ has a binomial distribution with $x$ trials.
_____________________________________________________________
Answers to Problem 2
Problem 2.3
Problem 2.4