### Guest Blog

Commentary invited by editors of Scientific American

# A Fun DIY Science Goodie: Proof Yourself against Sensationalized Stats

For my book Brain Trust, I interviewed Keith Devlin, NPR’s “Math Guy,” a World Economic Forum fellow, and math professor at Stanford. And being a mathematician, Devlin thinks about things differently than the world at large. For example, in his very good monthly column Devlin’s Angle, he quotes the following problem, originally designed by puzzle master Gary Foshee: “I tell you that I have two children, and that (at least) one of them is a boy born on Tuesday. What probability should you assign to the event that I have two boys?”

Does this sound like a bunch of confounding mumbo jumbo meant to obscure the obvious fact that the other kid has exactly a 50/50 chance of being a boy and so if one kid’s definitely a boy, the probability of them both being boys is one in two? Yes, yes it does.

But that’s not the case.

Without the “Tuesday” part, this is a famous problem first published in Scientific American by the venerable mathematician and puzzler Martin Gardner. Imagine the possible genders and birth orders of two kids: B-B, B-G, G-B, G-G. Now, in Gardner’s problem you know that at least one child is a boy, so you can nix only G-G as a possibility, leaving B-B, B-G, and G-B. In only one of these remaining three possibilities are both children boys, so instead of the knee-jerk one in two probability any sane person would expect, mathematicians like Devlin give only a one in three probability that, given at least one child is a boy, both kids are boys.
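The nixing is easy to check by brute force. Here is a minimal Python sketch, assuming boys and girls are equally likely and independent (the same idealization the puzzle makes); the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All equally likely (older, younger) gender pairs: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Gardner's condition: at least one child is a boy (this removes only GG).
at_least_one_boy = [f for f in families if "B" in f]

# Among the surviving families, keep those where both children are boys.
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]

p_two_boys = Fraction(len(both_boys), len(at_least_one_boy))
print(p_two_boys)  # 1/3
```

Counting families rather than children is the whole trick: three families survive the condition, and only one of them has two boys.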

Yikes.

But the Tuesday bit can’t possibly matter, can it?

“It depends if you ask a mathematician or a statistician,” says Devlin. The mathematician would simply extend the possibilities that were available in the original puzzle and then nix the possibilities that could be nixed. If we didn’t know that one of the kids was a boy born on a Tuesday, our possibilities would be every ordered pairing of two children, each drawn from these 14 options: B-Mo, B-Tu, B-We, B-Th, B-Fr, B-Sa, B-Su, G-Mo, G-Tu, G-We, G-Th, G-Fr, G-Sa, G-Su. That makes 14 × 14 = 196 combinations in all.

Cool so far?

Now, you know that either the first or the second child is a boy born on Tuesday, and here’s how Devlin lays out the revised possibilities:

• First child B-Tu, second child: B-Mo, B-Tu, B-We, B-Th, B-Fr, B-Sa, B-Su, G-Mo, G-Tu, G-We, G-Th, G-Fr, G-Sa, G-Su.
• Second child B-Tu, first child: B-Mo, B-We, B-Th, B-Fr, B-Sa, B-Su, G-Mo, G-Tu, G-We, G-Th, G-Fr, G-Sa, G-Su.

Since “both boys born on Tuesday” is already listed in the first set, we don’t need to list it again in the second, making 27 (instead of 28) possible combinations of gender and day of the week for two kids, if at least one is a boy born on Tuesday. And of these 27 possibilities, 13 of them include a second boy. So the answer is (instead of a one in two or one in three chance) a 13/27 chance that both will be boys.
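If counting to 27 by hand feels shaky, the same list can be built mechanically. This is a rough sketch, assuming every gender and weekday combination is equally likely; the names `DAYS`, `kept` and `two_boys` are made up for the illustration.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]

# One child is a (gender, weekday) pair: 14 equally likely options.
children = list(product("BG", DAYS))

# A family is an ordered (first child, second child) pair: 14 * 14 = 196.
families = list(product(children, repeat=2))

# Keep families in which at least one child is a boy born on Tuesday.
tuesday_boy = ("B", "Tu")
kept = [f for f in families if tuesday_boy in f]

# Of those, keep the families where both children are boys.
two_boys = [f for f in kept if f[0][0] == "B" and f[1][0] == "B"]

print(len(kept), len(two_boys))  # 27 13
p_two_boys = Fraction(len(two_boys), len(kept))
print(p_two_boys)  # 13/27
```

The family with two Tuesday boys appears only once in `families`, which is why the count is 27 and not 28.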

D’you hear that crackling sound? That’s the sound of your neurons trying to deal with the previous five hundred words. Don’t say you weren’t warned. But stick with it. It’s worth it. You can do it.

Now, on to statisticians, who take another view entirely. To them it matters what else could have been said and the interpretations that can pop up when math is released into the real world. “For example,” says Devlin, “we’re taught that multiplication is commutative, that 3 × 4 is the same as 4 × 3; but in the real world three bags of four apples isn’t the same as four bags of three apples.” Similarly, he points out in his blog that if you’re told that a quarter pound of ham costs \$2 and then asked what three pounds will cost, a mathematician would tell you \$24, but a statistician who’s been to a supermarket knows there’s not enough information to answer the question — of course, every supermarket discounts for bulk.

In the case of the Tuesday boy problem, imagine you’re from a culture that requires you to speak about an elder child first, before mentioning the younger. That means it’s the elder child who’s the boy, and you rule out not only G-G (as before) but also G-B, leaving the possibilities B-B and B-G, and a one in two probability of both being boys.
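The elder-child-first reading can be checked the same way. A small sketch, again assuming equally likely and independent genders, with the first slot of each pair standing for the elder child:

```python
from fractions import Fraction
from itertools import product

# (elder, younger) gender pairs: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# The statement is now about the elder child specifically.
elder_is_boy = [f for f in families if f[0] == "B"]

p_two_boys = Fraction(
    sum(f == ("B", "B") for f in elder_is_boy),
    len(elder_is_boy),
)
print(p_two_boys)  # 1/2
```

Pinning the statement to one particular child removes G-B as well as G-G, so only two families remain.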

So there are two broad interpretations of almost all real-world numbers problems — the stripped-down, mathematicians’ approach and the interpretive statisticians’ approach. And it’s in this wiggle room of interpretation where pure math hits the real world that misleading statistics are born. For example, in 1993 the columnist George Will was mathematically correct when he wrote in the Washington Post that “the ten states with the lowest per-pupil spending included four – North Dakota, South Dakota, Tennessee, Utah – among the ten states with the top SAT scores. Only one of the ten states with the highest per-pupil expenditures – Wisconsin – was among the ten states with the highest SAT scores. New Jersey has the highest per-pupil expenditures, an astonishing \$10,561…New Jersey’s rank regarding SAT scores? Thirty-ninth.”

Take a minute and see if you can spot the moment at which pure math became a misleading statistic.

I found this quote in a 1999 article in the Journal of Statistics Education that points out one important fact: in New Jersey all college-bound students take the SAT, whereas in North Dakota, South Dakota, Tennessee, and Utah, only the kids applying to out-of-state schools take the SAT. And you can bet these students applying out of state are the cream of the crop. This is selection bias, and it pops up everywhere. Yes, it seems odd that nine out of ten dentists recommend Crass toothpaste, and nine out of ten also recommend Goldgate, but it’s as easy as finding the right ten dentists to ask.
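Selection bias is easy to reproduce with made-up numbers. The sketch below is a toy simulation, not real SAT data: it assumes two states whose students have identical score distributions (a bell curve centered on 1000, chosen arbitrarily), and differs only in who sits the test.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def simulated_scores(n):
    """Hypothetical test scores; the distribution is an assumption."""
    return [rng.gauss(1000, 200) for _ in range(n)]


# State A: every student takes the test.
state_all = simulated_scores(10_000)

# State B: same kind of students, but only the top 10 percent take it.
state_pool = simulated_scores(10_000)
top_tenth = sorted(state_pool)[-1_000:]

print(round(mean(state_all)), round(mean(top_tenth)))
```

The two populations are equally able by construction, yet the state that tests only its strongest students reports a far higher average.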

Or take the following headline (from WorldHealth.net), which demonstrates a trick central to a pop science writer’s existence: “Sincere smiling promotes longevity.” Sure enough, the data in the original study show that people who flash sincere smiles in photographs live longer; the original study title is “Smile intensity in photographs predicts longevity.”

Again, take a minute and see if you can spot the difference.

The trick is that the study demonstrates correlation, while the article implies causation. Does a Duchenne smile “predict” longevity? Yes. Does it “promote” longevity? Not necessarily. Mightn’t it be more likely that these smilers are happy and that something in happiness and not the smile itself actually promotes longevity? Similarly, it’s mathematically correct that gun owners have 2.7 times the chance of being murdered compared to non–gun owners. Does owning a gun cause the owner to be murdered, or might it be something in the character of people likely to own guns?

For another, take the 2010 claim by health reform director Nancy-Ann DeParle that due to the then recently passed health care bill, the average annual cost of insurance coverage would drop by one thousand dollars by 2019. Taken at face value, it’s true. But the reason it’s true is that nearly free health care would be extended to 32 million Americans who were then without care, meaning that the cost to people who were already insured in 2010 would actually go up to cover the newly added.
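How can an average fall while everyone who was already paying pays more? A toy example shows the mechanism. Every number below is invented purely for illustration; none comes from the bill or from DeParle’s claim.

```python
from statistics import mean

# Hypothetical annual costs, in dollars.
before = [5000] * 10               # ten people who were already insured
after = [5200] * 10 + [500] * 3    # the same ten pay more; three newcomers pay little

avg_before = mean(before)
avg_after = mean(after)

print(avg_before, round(avg_after))
```

The average drops only because the group being averaged has changed, which is exactly why the before and after figures are not comparable.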

This is an apples-to-oranges comparison, like decrying the increase in the average cost of a gallon of gas from \$0.99/gal in February 1992 to \$3.81/gal in March 2012 without adjusting for inflation. You can’t compare the two, because the rules of comparison have changed. On the flip side of the political spectrum, conservative UK politician Chris Grayling cited a 35 percent increase in “violent” crime starting in 2002 as evidence of failed liberal law enforcement policies. But 2002 was the year civilians and not police were given the right to designate a crime “violent,” and many chose to see violence where the police might not have. The “35 percent increase” was the difference between apples and oranges.

Finally, take data showing that the TSA misses 5 percent of people hired to test air security by trying to smuggle dangerous contraband. Yikes! One in twenty people sitting around you on the plane is packing an underwear bomb!

What’s the error?

It’s in sampling. Though some days it feels this way, not everyone is out to get you. In fact, imagine that even one of the two million passengers who try to fly in the United States every day is a deadly terrorist, and imagine the TSA misses 5 percent of them. This means that only one in 40 million people who actually board is a deadly terrorist who slipped through. Even on a Boeing 767 with a 300-person seating capacity, you’d have to fly more than 130,000 times, on average, to sit on a plane with a terrorist. (OK, that’s misleading, too: statisticians would point out that a 1 in 130,000 chance means you could be on a plane with a terrorist at any point, it’s just not ever very likely.) Compare that to a 1 in 100 lifetime chance of dying in a car crash. Actually, please do, because that’s a mathematically correct, misleading statistic, too: what if you don’t drive, or drive cautiously, or are already over age twenty-five?
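The back-of-the-envelope arithmetic can be laid out in a few lines. This sketch uses only the figures imagined in the paragraph (one would-be terrorist per two million daily passengers, a 5 percent miss rate, 300 seats) and treats the per-flight chance as simply seats times the per-passenger chance, which is a fair approximation when the chance is this small.

```python
from fractions import Fraction

daily_passengers = 2_000_000
terrorists_per_day = 1          # the "imagine" assumption, not a real figure
miss_rate = Fraction(5, 100)    # share of attempts that get past screening
seats = 300

# Chance that any one boarded passenger is a terrorist who slipped through.
p_per_passenger = Fraction(terrorists_per_day, daily_passengers) * miss_rate
print(1 / p_per_passenger)  # 40000000

# Approximate chance per full flight, and flights needed on average.
p_per_flight = seats * p_per_passenger
flights_needed = 1 / p_per_flight
print(float(flights_needed))  # a little over 133,000
```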

So the moral of this long and somewhat convoluted tale is that first there’s math, then there’s stats, and finally there are headlines. And like a game of telephone, it’s easy to lose meaning along the way due to things like selection bias, correlation/causation, apples/oranges, and population error.

Mark Twain popularized the line that there are lies, damned lies, and statistics. Illuminating this, renowned business professor Aaron Levenstein said that statistics are like bikinis: what they reveal is suggestive but what they conceal is vital. But not for you. You now know how to reveal what is vital.

About the Author: Garth Sundem is a TED speaker, Wired GeekDad, Wipeout loser and author of books including Brain Trust: 93 Top Scientists Reveal Lab-Tested Secrets for Surfing, Dating, Dieting, Gambling, Growing Man-Eating Plants and More. He lives with his wife, two kids and two Labradors in Boulder, CO, where he just finished interviewing over 130 Nobel, MacArthur and National Medal of Science winners while sitting in the backyard garden shed. Follow on Twitter @garthsundem.

The views expressed are those of the author and are not necessarily those of Scientific American.


1. PatGehrke 4:40 pm 03/18/2012

There’s yet another fly in the ointment missed here: the true empiricist would refuse to delete factually existing outliers because of either presumptive measurement error or the desire to excise anomalies from the data set. For example, where are the odds of intersexed persons in the sibling problem you present? The statistician’s response is empirically and theoretically flawed if it begins with the belief that all children are either B or G, when factually some are both or neither or not classifiable in these categories. Similar problems emerge with low probability events in almost every “real world” application of math or stats. We just tend to “normalize” the data that misrepresents the world.

2. ecstatist 4:37 am 03/19/2012

“Compare that to a 1 in 100 lifetime chance of dying in a car crash. Actually, please do, because that’s a mathematically correct, misleading statistic, too – what if you don’t drive, or drive cautiously, or are already over age twenty-five?”

“.. what if you don’t drive?” then consider that many car crash deaths are of pedestrians! So if you stop driving but then walk more, you may increase your risk!

“.. or drive cautiously?” and consider that if, for instance, 2/3 of road deaths are the result of 2 car collisions (where both cars/drivers “absorb” the same energy and thus there is an equal chance of the drivers dying) and only one car driver is at fault (which is usually the case) then even if you drive absolutely perfectly (disregarding, for the moment, “defensive driving techniques”) you still have at least 1/3 the fatality risk of the average driver. (Remember that in many 2 car collisions both drivers die; that explains the “at least” qualifier in the previous sentence.)
Also if you “drive too defensively” you can cause unnecessary accidents/deaths, for example when everyone is driving too fast for the conditions, and you drive at a suitable speed, you may cause an unnecessarily greater number of “fast cars” to overtake you which can be riskier to you than driving at the “too fast” speed of the others.
I would be a much fairer and kinder god than any of the “existing” versions are (despite what the hype about them claims.)

3. TTLG 3:30 pm 03/19/2012

4. garthsundem 4:46 pm 03/19/2012

Ha! Yes, certainly a post on sensationalized headlines is needed — would experts be in the field of psychology, poli-sci, journalism or…other?

5. Infinoe 6:21 pm 03/19/2012

A recent unclear example:
“The average age of a person living today is 29.5 years, therefore half of all people are below that age”
[confusing the mean with the median??]
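A tiny example shows why the quoted inference fails. The ten ages below are invented so that their mean happens to be 29.5; they are not real population data.

```python
from statistics import mean, median

# Hypothetical ages, skewed toward the young with a few older people.
ages = [2, 5, 9, 14, 20, 24, 31, 45, 62, 83]

average_age = mean(ages)
middle_age = median(ages)
below_average = sum(age < average_age for age in ages)

print(average_age, middle_age, below_average)  # 29.5 22.0 6
```

Here six of the ten people are younger than the mean. Only the median is guaranteed to split a group in half.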

No doubt, the story of two siblings contains many other catches, for instance:
(1) They may be twins (born on the same day).
(2) More boys are born, but usually more girls/women survive, the age/sex profile varies with country etc.
(3) This remark may be idealistic, but funny. Imagine the “week” having not seven days but seven thousand days and the mother speaking. Is it equally probable for the other child to have been born on any day of this “week”? Of course, such an effect for the usual 7-day week is practically negligible, but…

6. Travza 9:59 pm 03/19/2012

@PatGehrke

“the true empiricist would refuse to delete factually existing outliers because of either presumptive measurement error or the desire to excise anomalies from the data set.”

Just remember, there is no true Scotsman.

Otherwise, good post, and I agreed with the bulk of your message.

7. JDahiya 2:39 am 03/22/2012

Oh, thought-provoking and charming article!

8. Jezzadj 2:03 am 03/23/2012

The solution presented for the ‘famous’ problem by Gardner is wrong.

The answer is still 50%. Given the gender of each child is an independent random event, they don’t affect each other. Period.

More interesting, is where the presented solution goes wrong – if you know one of two children is a boy they are correct to say that the options for the genders of the two children are BB GB and BG (GG excluded b/c at least one is a boy)

This kind of approach is called ‘ordered pairs’ – i.e. order does matter.

However, the chances are not 1/3 each as they suggest. Because with ordered pairs, there are 2 ways to get BB. It could be BB or BB (where B is the boy we already know about). It’s just like there are 2 ways to get a boy and a girl (GB or BG).

So even after ruling out GG, we still have 4 options, BB BB BG or GB and two of the four options are to have 2 boys.

And the probability of the other child being a boy is still 50%.

In the Boy born on Tuesday variation, they repeat this mistake where they write “Since “both boys born on Tuesday” is already listed in the first set, we don’t need to list it again in the second, making 27 (instead of 28)”

The correct chance is 14/28 or 50%. It appears that in trying to devise a question too complicated they’ve confounded themselves.

Kind regards Jeremy Aldred

9. Jezzadj 2:06 am 03/23/2012

Formatting got lost; it should read: ‘the two ways to get BB are Bb and bB, where B is the boy we know about.’

10. Witold 3:25 pm 03/23/2012

@Jezzadj “The answer is still 50%. Given the gender of each child is an independent random event, they don’t affect each other.”

I am sure the readers appreciate seeming paradoxes like yours, but 1/3 is correct for the simplest version. Your mistake lies in the word EACH, because we do not know WHICH one is a boy. [The more "mathematical" readers can now skip right to the end.]

******

You say “independent random variable” but the parent, while saying “at least one is a boy”, was not speaking about any specific random variable, but rather about two random variables simultaneously. In this version it is assumed that the parent was selected by chance from among those parents which can be described by one of these codes: GB, BG, BB (and each of these three groups contains the same number of 500 million parents, maybe). A parent knows BOTH children simultaneously.

However, imagine meeting a random woman:
“How many children have you?” “Two.”
“Is the younger one a boy?” “Yes.”
Then the answer would be 50%.

Or, imagine a woman having two children who has lost her memory and her son is visiting her. “There is a 50% chance that my other child is a daughter”, she thinks and, in principle, she is correct.

But now, suppose you go jogging on a 50-50 random daily basis and consider a situation when you only know you jogged on Wednesday or/and Thursday. Then you have 3 equally probable possibilities: JJ, JN, NJ.

******

In fact, if a parent is saying “at least one is a boy” perhaps he/she is referring to a boyish behaviour of the daughter, or some degree of disappointment with the other son…?

By the way, math and stats must be really difficult subjects, seeing that so many physicists had suspected a statistical error in the CERN-OPERA faster-than-light experiment, while the method was described so simply and clearly in the ArXiv paper! Just as I had expected, the second (non-statistical) experiment has confirmed the superluminal result. While it is true that another team has now claimed a “refutation”, the whole question is still far from being explained…

11. Jezzadj 10:23 pm 03/23/2012

@Witold: I am more than capable of pointing out the flaw in the above solution. And I’m sure the readers are more than capable of seeing it too.

@anyone who still isn’t certain:
The key word is ‘independent’. Knowing the gender of one child has absolutely no bearing on the gender of the other child. There is absolutely no effect. None whatsoever.

I’m sure the seemingly plausible explanation for the above illogical conclusion might tease many great minds, until they see the flawed logic behind it.

Let us suppose for a moment it is true – the above solution suggests in the simple version the chance of the second child being a boy is 1/3.

Yet, in the second version, introducing another piece of [irrelevant] information – that this boy was born on Tuesday, changes this probability to 13/27.

What is going on? – these probabilities are different!

Would the probability be changed every time I provide more information, like one of the two children is a boy born on Tuesday, under the sign Taurus, whose middle name is Bartholomew etc etc?

Short answer – no. That would be inconsistent.

Assuming the chance of a child being a boy is 50%, then knowing that another child is a boy does nothing whatsoever to change that probability. They are entirely independent.

Using an ordered pairs approach to this problem (when the problem doesn’t mention order) is adding an unnecessary level of complexity.

But it still gives the correct answer if applied correctly – as I showed above.

When the solution says BG and GB are different events – this is saying that order matters.

But in taking this approach, we must also recognise the order matters for BB.

As I said above, the two ways to get BB are Bb and bB, where B is the boy we already know about. Order still matters, even though in reality the two boys’ birth order is what it is (we don’t need to arbitrarily assign ‘B1’ as the first born and ‘B2’ the second).

The order that matters is whether the boy we already know about is the first born or the second born. These are the two options.

As stated above, the probability of the other child being a boy (the one whose gender we don’t yet know) is completely independent.

This still applies if the actual probability of a child being a boy is ~49% – or any number for that matter.

It also applies regardless of the number of irrelevant variables you would like to introduce. And whether the example used involves jogging or tossing a coin.

I think the classic example of independent random events from secondary school/1st-year stats is tossing a coin. An unbiased coin tossed a million times, and getting ‘heads’ a million times in a row, still has a 50% chance of getting ‘tails’. Sound familiar?

Respect to the book/article’s author – this was a really fun puzzle – and I would enjoy reading your book.

J

12. Witold 5:01 am 03/24/2012

@Jezzadj: Simply put, as you say, you have two independent events: A=”the elder child is a boy”, B=”the younger child is a boy”, and you know that A or B occurred, so what you know is: A&B or A&notB or notA&B (but you exclude notA&notB). If someone tosses a coin twice, you ask him ‘Did you get at least one tail?’, he says ‘Yes’, then what will you make of it? If you know that he rolled a die at the same time, and he got ‘tail&six at least once’, then what?

Here is another problem. Someone knows some message, and he must have heard it from at least one of 15 independent people, each one telling it to him with probability 90%. What are the odds that all of these 15 people have told him this message? All that is kindergarten stuff, assuming the right approach – good luck!
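For readers who would rather check than guess, here is a hedged sketch of the three puzzles in this comment. It assumes a fair coin, a fair die rolled alongside each toss, and reads “tail&six at least once” as a tail and a six turning up together in at least one of the two rounds; that reading, and all the variable names, are assumptions.

```python
from fractions import Fraction
from itertools import product

# Puzzle 1: two coin tosses, told "at least one tail".
tosses = list(product("HT", repeat=2))
with_tail = [t for t in tosses if "T" in t]
p_two_tails = Fraction(sum(t == ("T", "T") for t in with_tail), len(with_tail))

# Puzzle 2: each toss comes with a die roll, told "tail and six together at least once".
rounds = list(product("HT", range(1, 7)))   # 12 (coin, die) outcomes per round
games = list(product(rounds, repeat=2))     # 144 equally likely two-round games
with_tail_six = [g for g in games if ("T", 6) in g]
p_two_tails_given_six = Fraction(
    sum(g[0][0] == "T" and g[1][0] == "T" for g in with_tail_six),
    len(with_tail_six),
)

# Puzzle 3: 15 independent messengers, each tells with probability 9/10,
# given that at least one of them told.
p_tell = Fraction(9, 10)
p_all = p_tell ** 15
p_at_least_one = 1 - (1 - p_tell) ** 15
p_all_given_heard = p_all / p_at_least_one

print(p_two_tails, p_two_tails_given_six, float(p_all_given_heard))
```

The first two answers come out as 1/3 and 11/23, the same pattern as 1/3 and 13/27 in the children puzzle with six die faces in place of seven weekdays.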

13. jeffjo 10:31 am 03/24/2012

The reason conditional probability puzzles, like the Tuesday Boy, raise such controversy, is because people try to naively solve them by separating the set of possible outcomes into two groups based on the information you have: “possible,” and “impossible.” The answer then is simply the ratio of the probabilities of the “possible” events of interest, to all “possible” events. If done carefully, this can lead to a correct solution. But it is seldom done correctly, because the information you have represents only a necessary condition to define the events of interest, and the laws of probability require both a necessary and sufficient condition.

The proper solution is to assign probabilities to each event, representing the chances each would result in the observed information. An “impossible” event gets a zero probability this way. It isn’t exactly ignored, as Devlin’s solution suggests, but the effect is the same. But by treating all “possible” events the same, he is assigning a probability of 1 to observing the information in each of them. This is essentially creating a requirement that that information, and no similar information, will always be learned. It is not the *fact* of a Tuesday birth that produces this odd result, it is treating it as a requirement for selection.

Both the original Two Child Problem and Gary Foshee’s version are ambiguously stated, and both Martin Gardner and Keith Devlin issued retractions to their original solutions admitting this. When you know of only one gender in a two-child family, the probability that it is “boy” is 0% for a GG family, 100% for a BB family, but ambiguous for BG or GB families. If it is required that you can only know about boys, the missing probability is 100% and the answer is (1)(1/4)/[(1)(1/4)+(1)(1/4)+(1)(1/4)+(0)(1/4)]=1/3. But if – as seems more intuitive from the problem statement – it is random, the answer is [(1)(1/4)]/[(1)(1/4)+(1/2)(1/4)+(1/2)(1/4)+(0)(1/4)]=1/2.

Similarly, if Gary Foshee was required to tell us about a Tuesday Boy, the answer changes from 1/3 to [(1)(1/196)+(1)(12/196)]/[(1)(1/196)+(1)(12/196)+(1)(14/196)]=13/27. But if he chose his fact at random, from what is either one or (more likely) two that apply, it remains [(1)(1/196)+(1/2)(12/196)]/[(1)(1/196)+(1/2)(12/196)+(1/2)(14/196)]=1/2.
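The four calculations above share one shape, so they can be sketched as two small functions. Here `q` is an assumed name for the probability that a family with a choice of facts reports the one we heard: `q = 1` is the “required” reading and `q = 1/2` the “chosen at random” reading.

```python
from fractions import Fraction


def two_child(q):
    """P(two boys | told 'one is a boy'); q applies to mixed BG or GB families."""
    bb = Fraction(1, 4)      # prior for a two-boy family
    mixed = Fraction(2, 4)   # prior for BG plus GB
    return (1 * bb) / (1 * bb + q * mixed)


def tuesday_boy(q):
    """P(two boys | told 'one is a boy born on Tuesday')."""
    both_tuesday_boys = Fraction(1, 196)    # only one fact to report
    one_tuesday_boy_bb = Fraction(12, 196)  # two boys, exactly one born on Tuesday
    tuesday_boy_girl = Fraction(14, 196)    # a Tuesday boy and a girl
    numerator = both_tuesday_boys + q * one_tuesday_boy_bb
    return numerator / (numerator + q * tuesday_boy_girl)


half = Fraction(1, 2)
print(two_child(1), two_child(half))      # 1/3 1/2
print(tuesday_boy(1), tuesday_boy(half))  # 13/27 1/2
```

Changing only `q` moves the answers between the two camps, which is the sense in which the disagreement is about how the fact was obtained and not about the arithmetic.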

14. Witold 7:46 pm 03/24/2012

Why, in the end, do we obtain all these different answers as probabilities (chances) that both children will be boys?

Question1. Why must we obtain a *greater* chance (1/2 instead of 1/3) when we know WHICH child is certainly a boy (e.g. the elder one)?
- Because we have fewer (i.e., less, in terms of probability) general possibilities: only BB or BG, without GB, while the special possibility BB remains the same.
Thus obviously, even if the proportion of sons to daughters in the world was quite different from 1:1, one answer would still be greater than the other (although no longer 1/2 and 1/3).

Question2. Why must we obtain a chance between 1/3 and 1/2 when we have any additional but incomplete information about one son?
- Because this additional information helps to identify one son by specifying WHICH child he may be. For example, a child ‘born on Tuesday’ is unique unless the other child is also ‘born on Tuesday’.

The possibilities of ‘uniqueness’ give a partial chance of 1/2, while the possibilities of ‘coincidence’ give a partial chance of 1/3, so these two values get mixed in some proportion.

It should now be obvious why the Tuesday problem has an answer so close to 1/2, and why it is 13/27. The ‘uniqueness’ Tue-nonTue cases give 12 possibilities (each of them being BB or BG), the ‘uniqueness’ nonTue-Tue cases also give 12 possibilities, and the ‘coincidence’ Tue-Tue case gives 3 possibilities (BB,BG,GB). Since 12+12+3=27, the mixture is as follows:

(1/2)*(24/27)+(1/3)*(3/27)=(12+1)/27=13/27.

I personally hope that the above observations not only help simplify the calculations, but might give us generally some more insight before passing to more ‘serious’ problems.
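The mixture argument can be written for a “week” of any length and compared against direct enumeration. The generalization to `n` equally likely labels is an extrapolation of the comment’s reasoning, not something stated in it, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def mixture(n):
    """Mix 1/2 ('uniqueness' cases) and 1/3 ('coincidence' cases) for n labels."""
    total = 4 * (n - 1) + 3                 # 27 when n == 7
    unique = Fraction(4 * (n - 1), total)   # exactly one child carries the label
    coincide = Fraction(3, total)           # both children carry the label
    return Fraction(1, 2) * unique + Fraction(1, 3) * coincide


def brute_force(n):
    """Enumerate all two-child families and condition on a boy with label 0."""
    kids = list(product("BG", range(n)))
    kept = [f for f in product(kids, repeat=2) if ("B", 0) in f]
    boys = [f for f in kept if f[0][0] == f[1][0] == "B"]
    return Fraction(len(boys), len(kept))


print(mixture(1), mixture(7), mixture(365))  # 1/3 13/27 729/1459
```

With one label the extra detail says nothing and the answer is 1/3. The rarer the detail, the closer the answer creeps toward 1/2.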

15. jeffjo 4:46 pm 03/25/2012

“Why … do we obtain all these different answers…?”

Because probability is not like other sciences. Whether or not they realize it, the people who get different answers here are not disagreeing as much on how to formulate a solution, as on what the question is. For brevity, I won’t discuss it further unless you ask.

“Why must we obtain a *greater* chance (1/2 instead of 1/3) when we know WHICH child is certainly a boy (e.g. the elder one)?”

If we select the family based on the criterion “one is a boy,” the probability is 1/3. If we select a family without requirements, and merely *observe* that one is a boy, the probability is 1/2. Disregarding what others say the probability is, which do you think is the more realistic selection process?

“Why must we obtain a chance between 1/3 and 1/2 when we have any additional but incomplete information about one son?”

It depends on whether you view the information as a requirement before selection, or an observation after selection. If it is a requirement, a two-boy family is (roughly) twice as likely to satisfy any such additional requirement as is a one-boy family. But it is very unintuitive to think of it as a requirement. If it is not a requirement, the answer is 1/2 regardless.
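The difference between a requirement and an observation shows up clearly in a simulation. The following is a rough Monte Carlo sketch, assuming fair and independent genders; “observation” is modeled as looking at one randomly chosen child of a random family, which is one possible reading and an assumption of the sketch.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
N = 100_000

# Random two-child families as (elder, younger) pairs.
families = [(rng.choice("BG"), rng.choice("BG")) for _ in range(N)]

# Requirement: only families with at least one boy are ever selected.
required = [f for f in families if "B" in f]

# Observation: pick any family, look at one random child, and it is a boy.
observed = [f for f in families if rng.choice(f) == "B"]


def share_two_boys(sample):
    """Fraction of families in the sample with two boys."""
    return sum(f == ("B", "B") for f in sample) / len(sample)


print(round(share_two_boys(required), 3), round(share_two_boys(observed), 3))
```

The first share lands near 1/3 and the second near 1/2, from the very same list of families.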

16. Witold 6:48 pm 03/26/2012

@jeffjo: You perhaps partly misunderstood or did not read all my post very carefully. My answers (deliberately avoiding professional terms) are already contained in my post, intended as hints and food for thought for any interested reader.

Yes, you are welcome too, of course, but what I consider is precisely various strict formulations, and the formulation giving 1/2 as an answer does *not* correspond well with the article version of the first puzzle. It is also understood that Question2 refers to a condition before selection (such as for the conditional probability). This is consistent with the puzzle, but as in my earlier posts I am making the ‘selection process’ even more realistic and well-posed by requiring that it is *us*, the experimenters, who ask a random person a question such as: “Is it true that you have exactly 2 children but not 2 daughters?”

What we should remark regarding my Question2/Theorem2 is something else: that the additional information is assumed to arise from a random variable which should be independent of the sibling and its sex. For example “being born on Wednesday” is almost good (when neglecting twins etc.), “being on the same soccer team with his sibling” is not good, and “being born in a year whose number ends with 2” is not quite good. (Needless to say, we are not paid or directly credited for writing this, so I couldn’t be fully precise.) Agreed?

You wrote: “…issued retractions to their original solutions admitting this. When you know of only one gender in a two-child family, the probability that it is “boy” is 0% for a GG family, 100% for a BB family, but ambiguous for BG or GB families”
You are not referring to the SciAm version, are you? Regardless of any issued or nonissued retractions to anything, the present SciAm statement: “(at least) one of them is a boy” is *true* for any {B,G} family.

The general reason some public statements, pieces of propaganda, or even puzzles, raise such controversy is often that they are *intended* to raise controversy, sometimes to provoke thought, but sometimes to manipulate people, pick their brains (when not all solutions are known to the authors), cause a stir, make publicity, or make fun of people, or brainwash them… Because, after all, a puzzle or a public statement is seldom proposed just at random, don’t you think? Oh, I almost forgot: wouldn’t we do a good job by giving an answer accounting for twins (1 in 80 births)?

17. Jezzadj 6:57 pm 03/26/2012

Welcome to the conversation, jeffjo, and sorry for the delay in responding.

It’s reassuring to hear there has been controversy around this ‘famous’ question, as figuring out why we get different answers is a great way to learn.

I was very excited to think that there might be a good explanation why both answers could be correct – that I might learn from this problem – but having looked at your explanation, I’m not convinced there is any ambiguity, nor any other correct answers.

I suspect ambiguity was invoked as a way to rationalise inconsistent answers when we don’t see, or won’t admit to, having made a mistake. Let me explain with your notation, and then point out your error.

The simple answer (independent random events) is one half, as the genders are independent. The chance of the boy we know about being a boy is 1×1. The chance the other child is a boy is 1×1/2 = 1/2. (Because we know there are 2 children, I’ve pre-multiplied both by 1, simply to follow your notation.) The chance of both being boys is the product of these two probabilities: 1 x 1/2 = 1/2. Necessary. Sufficient. Solved.

Note that I never invoked order, nor did I use the classic BB BG GB GG construction, as the order of birth is not part of the problem. Let’s park that there.

I agree that either approach can produce the correct answer if applied carefully, and that it seldom is.

Now, your calculations need labels. What you have done is this (with ‘BG’ representing the probability of BG occurring):
1xBB divided by (1xBB + 1xBG + 1xGB + 0xGG)
and similar for the Tuesday boy problem.

Now in conditional probability the chance of GB, for example, is 1/2×1/2 = 1/4. And if all the other probabilities are 1/4 you would be correct to say the result is 1/3.

But as I’ve pointed out a few times now there are two ways to get ‘BB’ within the conditions outlined in the question*. Therefore, what you should have done is this:
(1xBb + 1xbB) divided by (1xBb + 1xbB + 1xBG + 1xGB + 0xGG)
This works out to 2/5 divided by 4/5 – the denominators cancel out and you end up with 2/4 = 1/2.

Similarly for the Tuesday boy problem.

*Note the difference between Bb and bB is NOT birth order. The order of birth is not changing, what is changing is whether we know about the first born boy or the second born boy – when we’re told that at least one child is a boy it could be either.

This is a tricky question I’ll admit, but there’s no ambiguity.

If I’ve made a mistake or misunderstood please tell me. And I will reply that I stand corrected and will be glad to have learned from the experience.

@Witold. Before you run out of letters of the alphabet with which to restate the above, here’s an unambiguous and different conditional probability question for you:

A family has 2 children, and we’re told that at least one is a boy. What is the probability of the second child being a boy?

This is a different question – and the answer is NOT 1/2. And if you follow the logic I’ve explained above, you’ll quickly see the answer is not 1/3 or 2/3 either.

Good luck.

18. Witold 6:18 am 03/27/2012

@Jezzadj: “A family has 2 children, and we’re told that at least one is a boy. What is the probability of the second child being a boy?”

I think your answer is 3/4, since you are basically working with the set {Bb,bB,BG,GB} as probability space and hence the second child is in the multiset {b,B,G,B}.

This gives yet another interpretation. I believe this approach could be explained by looking into the psychological motivation of the puzzle-teller. With 2 sons he might be twice as much inclined to pose this puzzle since the BG-parents might as well think about a *girl* to appear in their puzzle (and the GG-parents would have to). So if he were 3 times more inclined the answer would be 4/5 and if he were twice less inclined then 3/5. That may be why some detectives employ psychologists or clairvoyants.
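
The three numbers in this comment can be checked with a short calculation. The sketch below is only an illustration: the function name and the weight `w` (how many times more inclined a two-boy parent is to pose the puzzle than a boy-and-girl parent) are assumptions introduced here, not notation from the thread.

```python
from fractions import Fraction

def second_child_is_boy(w):
    """P(second child is a boy | told "at least one is a boy").

    Assumption (hypothetical parameter): a two-boy parent is `w` times as
    inclined to pose the puzzle as a BG or GB parent, whose weight is 1.
    """
    w = Fraction(w)
    weights = {"BB": w, "BG": Fraction(1), "GB": Fraction(1)}
    # The second child is a boy in the BB and GB families.
    favourable = weights["BB"] + weights["GB"]
    return favourable / sum(weights.values())

print(second_child_is_boy(2))               # 3/4
print(second_child_is_boy(3))               # 4/5
print(second_child_is_boy(Fraction(1, 2)))  # 3/5
```

With equal inclination (`w = 1`) the same function gives 2/3.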

19. Jezzadj 10:12 am 03/27/2012

@jeffjo I think I’ve had a breakthrough.

In post 15 you distinguish pre-conditions vs observation after the fact. “If we select the family based on the criteria “one is a boy,” the probability is 1/3.”

So I’ve been asking myself: does the question actually ask this(/is it ambiguous); what difference does it make; and how can we reconcile two different answers?

It’s not like if you and I both knew Witold had 2 children, but you also know that one of them is a boy, can in fact mean that for me the chance of either child being a boy is 1/2 whereas for you the chance of the other child being a boy is 1/3. Seems strange that knowing or not knowing some fact/s about one child, could affect the chance of the other child being a boy.

I don’t see anything in the question itself (AmSci version) to suggest it’s a pre-condition (notwithstanding it being implied in the presented solution), but perhaps it could be ambiguous so let’s consider it.

So if a family has two boys, this meets this pre-condition (it either does or it doesn’t meet the pre-condition, no two ways about it, and it does) so BB is counted. Same for BG and GB. And not so for GG of course.

So looking at it this way we have BB BG or GB – so the answer is 1/3 chance of a BB family. I still find this really odd and hard to reconcile. And I’m not convinced this is any more or less correct than 1/2.

The bit that seems to not sit quite right now, is why are we treating BG and GB as separate options?

To meet this pre-condition, birth order doesn’t matter, so why are we giving any more importance (likelihood) to getting BG or GB than we are to BB?

So I think I’m going back to where this started – it’s really a case of BB meets the precondition, a family with a boy and a girl (any order) meets the precondition, and GG does not meet it.

So it seems the options from the pre-condition should now be BB B&G (order doesn’t matter) and GG. And the resulting probability is 1x p(BB) / [1x p(BB) + 1x p(B&G) + 0x p(GG)] = 1/3 / 2/3 = 1/2.

After all that, it now seems to me the take home message is that either order does matter throughout, or order does not matter throughout, and we must be very careful not to mix the two.

The question and presented solution seem to use suggestion to lull us into confusing the two. Maybe it is a very cleverly designed puzzle aimed at making people think, as Witold suggests.

Also, I like Witold’s comment about what psychologists refer to as self-reporting bias. Seems to me that a suspiciously high proportion of parents have ‘above average children’

20. jeffjo 5:43 pm 03/27/2012

@Witold: No, I understood you. I was trying to take the discussion into a different (and correct) direction. Since you didn’t get that, let me explain: The ambiguity, present in all the versions of the problem we are discussing, is that knowing “one is a boy (with property FOO)” is only a necessary condition. That means that each family considered necessarily must include a boy (with property FOO). But it is not a sufficient condition. Having a boy (with property FOO) does not force us to include that family in our calculations. As Jezzadj is now correctly describing, we are only forced if it is a precondition.

For example, say one in three boys in your town will join the Boy Scouts. If you want to assess the probability that one of our two-child-including-a-boy families has a child in Boy Scouts, it is not enough to consider just families with a boy, and apply that 1/3 ratio. You also have to assess the probability a boy is old enough to join. So “has a boy” is a necessary, but not sufficient condition.

The question, as originally phrased in SciAm, was “Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?” This was published in the May, 1959 issue. Martin Gardner said the answer was 1/3. But in the October 1959 issue he retracted that, saying: “If from all families with two children, at least one of whom is a boy, a family is chosen at random, then the answer is 1/3.” This is the “requirement” I described, or the “precondition” Jezzadj did. He continued, “But there is another procedure that leads to exactly the same statement of the problem. From families with two children, one family is selected at random. If both children are boys, the informant says ‘at least one is a boy.’ If both are girls, he says ‘at least one is a girl.’ And if both sexes are represented, he picks a child at random and says ‘at least one is a …’ naming the child picked. When this procedure is followed, the probability that both children are of the same sex is clearly 1/2.”
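
The two procedures in the quoted passage can be enumerated exactly. This is a minimal sketch, assuming the four birth orders are equally likely and that the informant in the second procedure picks each child with probability 1/2; the function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: the four ordered families BB, BG, GB, GG are equally likely.
FAMILIES = list(product("BG", repeat=2))

def procedure_one():
    """Choose at random among families with at least one boy."""
    with_boy = [f for f in FAMILIES if "B" in f]
    both_boys = [f for f in with_boy if f == ("B", "B")]
    return Fraction(len(both_boys), len(with_boy))

def procedure_two():
    """Choose any family; the informant names the sex of a random child."""
    says_boy = Fraction(0)
    says_boy_and_two_boys = Fraction(0)
    for family in FAMILIES:
        for child in family:
            # 1/4 for the family times 1/2 for which child is named.
            weight = Fraction(1, 4) * Fraction(1, 2)
            if child == "B":
                says_boy += weight
                if family == ("B", "B"):
                    says_boy_and_two_boys += weight
    return says_boy_and_two_boys / says_boy

print(procedure_one())  # 1/3
print(procedure_two())  # 1/2
```

The only difference between the two functions is whether a boy-and-girl family is certain to produce the statement “at least one is a boy” or does so half the time.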

Your Question #1 and Question #2 are actually not directly comparable, since one requires order (which is different than utilizing it, which I’ll get to below) and the other does not. But the reason you get 1/3 and I get 1/2 (which is the new direction I was trying to move toward) is because you require a family to have a boy, essentially assuming it is impossible for you to know a BG family has a girl. To see this, just ask yourself what the answer would be if you exchange “boy” for “girl” in the problem. If the answer is the same, that number has to be the answer to the simplified question “What is the probability a two-child family has two children of the same gender.” You, essentially, are saying it is 1/3.

“The simple answer (independent random events) is one half – as the genders are independent.” … “Note that I never invoked order, nor did I use the classic BB BG GB GG construction, as the order of birth is not part of the problem.” Using order to define the relative proportions of families that exist is different than requiring an order as part of the answer. In fact, it is your solution that requires its use, incorrectly. Mine and Witold’s merely use it to define the relative proportions of the family types we know exist.

We all agree that there are four possible ways a couple could have two children, represented by the four orderings BB, BG, GB, and GG. In your solution, “(1xBb + 1xbB) divided by (1xBb + 1xbB + 1xBG + 1xGB + 0xGG),” you count 5 family types, but only four family types exist. You count to five by counting one twice, as though looking at the order differently makes it a different family. It does not.

In fact, if the family is BB, there is no way to define the “other” child because there is no way to define which boy you know about. By double counting it, you essentially put a “probability=2” in front of that one type. The real answer is “[1x(Bb or bB)] divided by [1x(Bb or bB) + QxBG + QxGB + 0xGG].” The single event (Bb or bB) represents a single family type where you don’t know which child is referred to; and Q represents the probability you will know about the boy in those family types. The answer is thus 1/(1+2Q). This is 1/3 if Q=1 (as Witold claims) and 1/2 if Q=1/2 (as I think).

“A family has 2 children, and we’re told that at least one is a boy. What is the probability of the second child being a boy?” [1x(Bb or bB) + QxGB]/ [1x(Bb or bB) + QxBG + QxGB + 0xGG]=(1+Q)/(1+2Q). That’s 2/3, or 3/4, depending on Q.
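
Both formulas above, 1/(1+2Q) and (1+Q)/(1+2Q), can be reproduced by weighting the family types directly. A small sketch, assuming each ordered family has prior probability 1/4; the function name `answers` is hypothetical.

```python
from fractions import Fraction

def answers(q):
    """Return (P(both boys), P(second child is a boy)) given "one is a boy".

    q is the probability that you learn about the boy in a mixed family.
    Assumption: each of BB, BG, GB has prior probability 1/4.
    """
    q = Fraction(q)
    prior = Fraction(1, 4)
    bb = prior * 1   # a two-boy family always yields "one is a boy"
    bg = prior * q   # boy first, girl second
    gb = prior * q   # girl first, boy second
    total = bb + bg + gb
    return bb / total, (bb + gb) / total

print(answers(1))               # (Fraction(1, 3), Fraction(2, 3))
print(answers(Fraction(1, 2)))  # (Fraction(1, 2), Fraction(3, 4))
```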

About the existence of a precondition: The SciAm version is more ambiguous than the one at the top of this thread. There is no possibility for a precondition in that one, so the answer is unambiguously 1/2 (but by a different solution than you used). I personally don’t see the ambiguity in the SciAm one, but many people do. So the answer *could* be 1/3.

21. tfrayner 6:43 am 03/28/2012

This is a fascinating discussion. I’m not going to claim to be any kind of authority, but here’s how I’ve been breaking this down in my mind:

1. We are told the family has two children, so we have these birth-order outcomes (as has already been done to death): BB BG GB GG

2. We are also told that at least one child is a boy. To me it seems that the full set of possible birth-order outcomes now becomes (with ‘b’ indicating the boy we know about): Bb bB BG GB bG Gb GG GG

If there are two ways to get two boys, presumably we can safely state that there are two ways to get two girls? I’m a little hazy on that point.

3. The BG and GB outcomes are the interesting ones. B represents a boy we don’t know about, so assuming nobody is lying to us these outcomes can be discarded. We also discard the GG outcomes, yielding a probability of 1/2 that the other child is male.

I suspect I’m just restating a solution that someone else has already propounded, in which case my apologies!

22. Jezzadj 7:18 am 03/28/2012

Hi again, I appreciate both of your help to see this problem from different sides.

I am happy, and even a little surprised to say that I think I can show how the answer *could* be either 1/3 or 1/2 depending on the assumptions made when reading the question. But oddly, using a different approach to both of you, and the article’s presented solution.

Following the October 1959 version where selection is clearly based on the pre-condition, I agree that the answer is 1/3…

Picking up from the end of my last post, if a pre-condition for selection of at least one boy is involved, then the two options BB or B&G (any order) equally meet that condition. And that’s where I went back to the answer being one half.

But I missed one final point, and the reason is because the population of all two child families from which this selection is made, should have twice as many B&G (any order) families as either BB or GG families. This is due to the BB GB BG GG thing but extrapolating those probabilities out into the population at large, from which the sample is drawn. Hardy-Weinberg equilibrium (genetics) suggests this should in fact be the case.

On the other hand, where there is no pre-condition – it’s just observed as a fact that at least one child is a boy (as seems the most fair interpretation of the question above) I maintain that the answer is 1/2 using both the approaches I’ve outlined (order does|does not matter).

Where order doesn’t matter the answer is 1/2 – as I think we all agree. 1 x 1/2 = 1/2

However, where a distinction is made between BG and GB, even though order doesn’t matter to match the observation that at least one is a boy, then that is double counting BG and GB: as there is no way to define which boy you know about (as Jeffjo said of my Bb bB approach), there is no need to make a distinction between BG and GB.

My Bb bB approach is, I think, a necessary workaround to balance out this unnecessary distinction made between BG and GB. I would never suggest anyone follow this approach as it’s a contrived and unnecessary complication.

But I used this approach to pinpoint and illustrate the inconsistency in the presented solution. So I still think the answer is 1/2 as long as the at least one boy is an observation – regardless whether you introduce birth order or not.

So ultimately, I agree with Jeffjo’s first post, that the answer is 1/3 if it’s a pre-condition, and 1/2 if it’s observed – due to the sampling method. Even though we seem to have different ways of getting there.

And finally, there isn’t ambiguity in the question at the top of this page that would support assuming selection is based on a pre-condition, the answer is simply 1/2.

23. Witold 7:18 am 03/29/2012

I returned to this site today to see that we are like art critics or performers searching for the best interpretations, with varying degrees of dogmatism and independence.

No secondary conditions were implied in this case. Nevertheless, both direct and indirect possibilities, and more, including puzzles about a girl, are already discussed in my posts of and .

@Jezzadj: You need not be intimidated into giving up your “bB” interpretation. Also, it may be equally correct to use an unordered pair {B,G}, but with an appropriate weight, e.g., twice as much as for {B,B} or {G,G}.
@tfrayner: We now know that you may also consider the “revolutionary” October 1959 space of Gardner (influenced by some of his respondents?): {Bb,bB,Bg,bG,Gb,gB,Gg,gG}. On the other hand, we are never told that there exists “the” other child.

While he May be right, I July be left. Scores of such problems must have been known for a few centuries. In this case, depending on the parent’s strategy, any number from 0 to 1 might be an answer. If this game is played indefinitely, they may change their strategies irregularly so that any answer will be meaningless.

24. Witold 7:40 am 03/29/2012

… The posts I recommended above are: #14 and #18 of 03/24/2012 and 03/27/2012. The latter adds some plausible new conditions. The limit for such novel conditions is sky-high, as I prefer a lower level of dogmatism.

25. Infinoe 11:08 am 03/31/2012

Let’s face it. It’s an easy problem, but only with clear understanding of the answer 1/3 and seeing the difference between “some one” and “this one” can we look for modifications. We cannot fool ourselves or bypass that in a way anaerobic organisms bypass respiration. Statistics is vital for risk management in economics, energy production and many other areas.

26. jeffjo 6:58 am 04/2/2012

It is an easy problem, but statistics (the study of the properties of unknown distributions using real data) has nothing to do with this theoretical problem. All you need to do to solve it is enumerate all of the possibilities, deduce the probability of observing the results as stated in the problem, and then apply the laws of probability.

P(0 boys)=P(2 boys)=1/4 and P(1 boy)=1/2, but this is not enough. “Observing” is not the same as “existing.” If KS1B is the event where you know “some one” is a boy, then P(KS1B|0 boys)=0, P(KS1B|2 boys)=1, but P(KS1B|1 boy) is harder. To get the answer 1/3, you have to use P(KS1B|1 boy)=1, but then the laws of probability say P(KS1G|1 girl)=0, which is absurd.

Try it another way: where do I make an incorrect statement below?

1) The probability that the two children in a family of 2 share the same gender is 1/2.
2) Say the probability that the two children in a family of 2 share the same gender, given that you know one is a boy, is P.
3) The probability that the two children in a family of 2 share the same gender, given that you know one is a girl, has to be the same. It’s P also.
4) So, if you take 100 families where you know one is a boy, 100P of them have this property; also, 100P families in 100 where you know one is a girl have it.
5) That means 200P out of 200 families have the property regardless of what you know.
6) The probability a random family has it must be P also. And we know the probability is 1/2. So P=1/2.

27. RichardBurkholder 5:08 pm 04/5/2012

Another factor the equation alone fails to take account of. There is at least some additional probability that the first child is part of a set of identical twins. If so, the second would by definition also be a boy. Where’s that possibility factored in?

28. jeffjo 5:35 pm 04/5/2012

There are many such parameters (twins, more boys are born than girls, girls have a higher survival rate, evidence suggests some parents are more likely to produce one gender even if we ignore twins, some parents are more likely to mention one gender, etc.) which we ignore because (1) there is no fixed value for these parameters – they vary racially, ethnically, geographically, and culturally – and (2) they don’t affect the solution method, only whether the answer is “near 1/2” or “near 1/3.” So please, stop being pedantic. Or if you have a valid point about how to formulate a solution, as opposed to what probabilities to use in the solution, please describe it.

29. RichardBurkholder 5:44 pm 04/5/2012

(I could of course have added “of identical triplets, quadruplets, quintuplets, etc.” But you get the point.)

30. RichardBurkholder 6:31 pm 04/5/2012

(Assuming Comment #28 was addressed to me?)

We’ve already been told that the first was a boy. That’s as a stipulation built into the question. A pre-existing condition.

Of course at least children are born as members of IDENTICAL twins, triplets, etc. Not many at all, but some. Let’s assume that this boy happens to be one of these.

If so, and if the first one is a boy, that means there is precisely zero possibility that his sibling(s) will be female. (See definition of identical).

The fact that there may also be equal numbers of identical female twins, triplets, etc. born is irrelevant here, since we’ve already been told that the first one is a boy.

This additional increase to same-male-gender probability doesn’t appear to be accounted for in any of the mathematical equations above.

If it is factored in to the equations above, please tell me where.

31. RichardBurkholder 6:34 pm 04/5/2012

(Should read): “Of course, at least some children are born as members of IDENTICAL twins, triplets, etc.”

32. RichardBurkholder 6:52 pm 04/5/2012

Unless of course taking into account a real world phenomenon directly related to the initial question’s conditions constitutes “pedantry”.

We’re not talking “how many angels on the head of a pin?” in my point here. It’s an apparent omission in the math above.

Can I precisely quantify its additional probability effect? Nope — wasn’t trying to here. Only wanted to point out that it exists, and doesn’t seem to be accounted for.

If you can tell me how the (already stipulated) boy can have a girl as an identical twin, I’d be happy to hear it.

33. RichardBurkholder 7:19 pm 04/5/2012

OK should add some real world stats to also factor into the math here, so as not to dodge that quantification issue…

Evidently “the odds of having identical twins are about 3 in 1,000” (see multiples.about.com).

So let’s assume the odds of bearing identical MALE twins are roughly 1.5 per 1,000. Very, very small, but not zero.

So given that the first child here is a boy, let’s account for this as well. Assuming we’re aiming to get as close as we can to exactitude, right?

Dinner time — over and out.

34. Witold 7:30 pm 04/6/2012

It might be more productive and more fun for some not-yet-bored readers to consider non-identical twins (as I suggested earlier, for the Tuesday version) or identical twins (as R.B. suggested above). Naturally, it is more fun if no “chosen child” is pointed at.

Meanwhile, Jeffjo was asking: “where do I make an incorrect statement below?” Frankly, Jeff, you have already made such statements above, e.g., with your boy-scout comparison missing the point, and claiming ambiguity while insisting on your unique answer.

Jeff’s items 1)…6) (see post of 6:58 04/2/2012 above) were probably meant as a joke. They are obviously imprecise and inconclusive but there are easy ways to straighten them out.

Let Prob(GG)=x, Prob(B&G)=u, Prob(BB)=y. Statement 1) says that x+y=u=1/2. Then 2) says that P=y/(y+u). Statement 3) is not necessarily following, but claiming that P=x/(x+u), so now x=y=1/4. Statements 4) and 5) should refer to the expected value. Statement 5) is nonsense, since “the property” is not specified, and it makes a difference to know that “one is a boy” or “one is a boy or a girl (obviously)”, or “one is a girl”. Statement 6) is not following for similar reasons. Disregarding the nonsense, one can compute the answer as
(1/4)/(1/4+1/2)=1/3.
Thus, not only the proposed argument for 1/2 was incorrect, but the answer 1/3 can reappear from it.

35. jeffjo 8:35 am 04/7/2012

“Let Prob(GG)=x, Prob(B&G)=u, Prob(BB)=y. Statement 1) says that x+y=u=1/2.”

If you want to generalize it, sure. The u=1/2 is part of the simplifying assumption RichardBurkholder is refusing to see, which also assumes x=y=1/4.

“Then 2) says that P=y/(y+u).”

No, it doesn’t, and this is part of what you refuse to see. First off, P is a conditional probability that applies only over the set B&G. Theoretically, 0<=P<=1. The probability you are describing is P*y/(y+u).

Is it possible, in general, for you to know only that a B&G family has a girl, the same way this problem says you know only about a boy in this particular family? Yes. That means P is an unknown, with P *less* *than* 1. And so the probability you describe should be less than y/(y+u), not equal to it.

"Statement 3) is not necessarily following,…"

It isn’t supposed to follow from 2), it is supposed to be symmetric with 2). It is "claiming" that the chance you would know about a girl (but not a boy) in a B&G family is the same as the chance you would know about a boy (but not a girl).

"Statements 4) and 5) should refer to the expected value."

They were trying to be brief and not introduce words that would add to the possibilities for deliberate misinterpretation. But here, for once, you are correct. I should have said "you expect 100P…"

"Statement 5) is nonsense, since “the property” is not specified,.."

Again, I was trying to be brief. But since the only property discussed, and the one related to P, was "both children share the same gender," it was clearly implied. Being argumentative like this doesn’t help you; but I'll call it S (for "share") from now on so as not to confuse you.

But it is not nonsense. It clearly follows that since (A) you expect 100P of the 100 families, known to have a boy, to have S; and (B) you expect 100P of the 100 families, known to have a girl, to have S; that (C) you expect 200P of the 200 families, where only one gender is known, to have S. Is there some part of that you don't understand?

"Statement 6) is not following for similar reasons."

Statement 6) follows trivially.

36. jeffjo 8:37 am 04/7/2012

And if you want to be general, use your probabilities x, u, and y; and N families. We expect x and y to be near 1/4, and we know u=1-x-y, so it is near 1/2.

N*y families are B&B, N*x families are G&G, and N*u families are B&G.

From the set of families where you know about one gender only, N*y families are (expected to be) B&B and you know about a boy, N*x families are G&G and you know about a girl, P*N*u families are B&G and you know about a boy, and (1-P)*N*u families are B&G and you know about a girl.

The answer to the question “If I tell you one is a boy, what is the probability that the two children share a gender?” is now (N*y)/(N*y+P*N*u)=y/(y+P*u). Since y and u are close to 1/4 and 1/2, respectively, this is close to 1/(1+2P).

The answer to the same question, if I tell you one is a girl, is x/(x+(1-P)*u), which is close to 1/(3-2P). Since I didn’t make the assumption I made in 3) before, the paradox has to be phrased differently. Would you expect the answers to these two questions to be similar – differing only because x and y are not exactly 1/4 – or vastly different?

The only way they can be similar is if P is close to 1/2, making both answers close to 1/2. Making the answer to the first close to 1/3 requires P=1, so the answer to the second is close to 1.
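
The comparison in this comment can be tried numerically. The sketch below is an illustration under stated assumptions: the parameter names `p_two_boys`, `p_two_girls` and `p_mixed` are introduced here for the probabilities of a two-boy, two-girl and boy-and-girl family, and `p_know_boy` is the chance that you learn about the boy in a mixed family.

```python
from fractions import Fraction

def share_given_boy(p_two_boys, p_mixed, p_know_boy):
    """P(children share a gender | told "one is a boy")."""
    return p_two_boys / (p_two_boys + p_know_boy * p_mixed)

def share_given_girl(p_two_girls, p_mixed, p_know_boy):
    """P(children share a gender | told "one is a girl")."""
    return p_two_girls / (p_two_girls + (1 - p_know_boy) * p_mixed)

# Assumption: the idealised values 1/4, 1/4 and 1/2 for the family types.
quarter, half = Fraction(1, 4), Fraction(1, 2)

for p_know_boy in (Fraction(1, 2), Fraction(1)):
    print(p_know_boy,
          share_given_boy(quarter, half, p_know_boy),
          share_given_girl(quarter, half, p_know_boy))
# 1/2 1/2 1/2
# 1 1/3 1
```

Only the value 1/2 makes the boy and girl versions of the question agree; the value 1 gives 1/3 for the boy version and 1 for the girl version.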

37. Witold 10:48 am 04/7/2012

@Jeffjo, you seem to be forcing a single version of a Nostradamus prophecy, which is neither the simplest nor the most instructive one, and you seem to be doing it inconsistently.

“P is a conditional probability that applies only over the set B&G”
You are changing the problem, but then since the genders are different over your set, you have P=0.
“I’ll call it S (for “share”)”
Another arbitrary change, but as I had mentioned, depending on the condition (or lack of it) you have: Prob(S)=x+y=1/2; Prob(S|BB+B&G)=x/(x+u); Prob(S|GG+B&G)=y/(u+y).

If you are certain of your argument, maybe you could publish it in detail for a broader audience on ArXiv, ViXra or elsewhere, adding some more advanced or more practical applications. On the other hand, it is generally well known that putting a mathematical problem in an external context can lead to ambiguities. We learn it in kindergarten. For example, the result of “taking away 0 from 10” could be 10 or could be 1.

38. Witold 11:15 am 04/7/2012

I meant of course: Prob(S|GG+B&G)=x/(x+u); Prob(S|BB+B&G)=y/(u+y).

And here is for all of us one *Not*-Do-It-Yourself puzzle, based on the song “Whiskey in the Jar”:
“When I was going over the Cork and Kerry Mountain… …And I shot him with both barrels.”
Now, the puzzle says, suppose he’d loaded the barrels randomly and independently with either a blank or a real cartridge (50/50 chance each). Knowing the result, what are the chances that both barrels had been loaded with real bullets?

I guess I should be leaving now. Good luck!

39. jeffjo 4:21 pm 04/7/2012

“If you are certain of your argument, maybe you could publish it …”

It’s been done before. Martin Gardner (the ultimate source of this thread) said the same thing in the October, 1959 edition of SciAm. Before that, it was published in 1889 by Joseph Bertrand.

40. Witold 4:59 pm 04/10/2012

The point is that one simply *cannot* prove any answer without making strict assumptions, such as specifying the probability space, because different spaces can yield different answers. In other words, the answer is model-dependent, although it is sometimes good to consider the simplest model.

The answer 1/3 reportedly goes back to Fermat, maybe earlier. The problem should not, however, be confused with the Bertrand coin-box paradox of 1889, in which gold&silver-box has a chance of 1/3 only, while boy&girl-families usually have a chance of 1/2 as we know.

@Jeffjo. Your argument failed at item 5), because you took 100 families of type BB or B&G, then 100 families of type GG or B&G. This meant your resulting sample of 200 was *non-uniformly* distributed. Each B&G-family had twice the chance of being taken into the 200-sample compared with each BB or GG family. In other words, conditional expectation is not set-additive. But in effect, we do agree, as you said in the beginning, that it all depends on what the problem is.

Whiskey-in-the-jar answer: As it was a fun article, this was a paintball game. Assuming probabilities p and q as accuracies of each pistol (possibly p=q=1), the answer can be this: (p+q-pq)/(2p+2q-pq).
Best wishes.
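
The closed form (p+q-pq)/(2p+2q-pq) can be checked by enumerating the four ways of loading the barrels. This is a sketch under an assumed reading of the puzzle: “the result” is taken to mean that at least one shot hit, each barrel is loaded with a real cartridge with probability 1/2, and a real cartridge in the first or second barrel hits with probability p or q respectively. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def both_real_given_hit(p, q):
    """P(both barrels real | at least one hit), by exact enumeration.

    Assumes p and q are not both zero (otherwise a hit is impossible).
    """
    p, q = Fraction(p), Fraction(q)
    hit_total = Fraction(0)
    hit_and_both_real = Fraction(0)
    for first_real, second_real in product([True, False], repeat=2):
        load_prob = Fraction(1, 4)  # independent 50/50 loading
        # A blank never hits; a real cartridge misses with prob 1-p or 1-q.
        miss = (1 - p if first_real else 1) * (1 - q if second_real else 1)
        hit = 1 - miss
        hit_total += load_prob * hit
        if first_real and second_real:
            hit_and_both_real += load_prob * hit
    return hit_and_both_real / hit_total

def closed_form(p, q):
    """The formula quoted in the comment above."""
    p, q = Fraction(p), Fraction(q)
    return (p + q - p * q) / (2 * p + 2 * q - p * q)

print(both_real_given_hit(1, 1))  # 1/3
print(both_real_given_hit(Fraction(1, 2), Fraction(1, 3)))  # 4/9
print(closed_form(Fraction(1, 2), Fraction(1, 3)))          # 4/9
```

With perfect accuracy (p=q=1) the puzzle reduces to the two-children enumeration and gives 1/3.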

41. jeffjo 5:18 pm 04/11/2012

“The point is that one simply *cannot* prove any answer without making strict assumptions, such as specifying the probability space, because different spaces can yield different answers.”

Correct; but some spaces people try to use are inconsistent with the problem statement. In this case, equating the set of families where you know one is a boy, with the set of all families where at least one is a boy, is wrong. Part of your sample space has to determine which gender you know when there are two possibilities for the family.

“The answer 1/3 reportedly goes back to Fermat, maybe earlier.”

Please, support your assertion. I’ve never heard of this. But the 1/2 answer (not the one Gardner and Devlin originally published, the one I repeated) goes back to an 1889 paper titled “Calcul des probabilités” by Joseph Bertrand, which presented a similar problem.

“The problem should not, however, be confused with the Bertrand coin-box paradox of 1889, in which gold&silver-box has a chance of 1/3 only,…”

Close. You need to add a second box with a gold and silver coin, and then it is identical. Change “silver” to “bronze” and say you drew a bronze coin, if you want to make the identicalness of the symbology jump out at you (so “B” stands for boy or bronze, and “G” stands for girl or gold).

Where Bertrand’s three boxes started with a probability of 1/3 each, this set of four boxes, in three types, has probabilities for (B+B, B+G, G+G) of (1/4, 1/2, 1/4). Sound familiar?

Where Bertrand gave probabilities of P(G+G&OG)=(1/3)*1=1/3, P(G+B&OG)=(1/3)*(1/2)=1/6, and P(B+B&OG)=(1/3)*0=0, this set has P(B+B&OB)=(1/4)*1=1/4, P(B+G&OB)=(1/2)*(1/2)=1/4, and P(G+G&OB)=(1/4)*0=0. Look similar?

Where Bertrand said the resulting *conditional* probability for two gold coins after observing one was P(GG|OG)=(1/3)/(1/3+1/6), this altered problem says P(BB|OB)= (1/4)/(1/4+1/4)=1/2. Guess what – it looks the same because it is the same.
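
The parallel between the two calculations can be made explicit with one function applied to both setups. A minimal sketch, assuming a box is chosen according to its prior and one of its two coins is then observed at random; the helper name `posterior_same` and the letter codes are illustrative only (“S” stands for silver in the three-box case).

```python
from fractions import Fraction

def posterior_same(boxes, observed):
    """P(both items are `observed` | one randomly drawn item is `observed`).

    boxes: list of (prior, contents), contents being a 2-letter string.
    """
    joint = {}
    for prior, contents in boxes:
        # Chance that a random draw from this box shows the observed kind.
        draw_prob = Fraction(contents.count(observed), 2)
        joint[contents] = joint.get(contents, Fraction(0)) + prior * draw_prob
    return joint[observed * 2] / sum(joint.values())

third, quarter = Fraction(1, 3), Fraction(1, 4)

# Bertrand's three boxes: gold-gold, gold-silver, silver-silver.
bertrand = [(third, "GG"), (third, "GS"), (third, "SS")]

# The four-box version: B+B, two mixed boxes, G+G.
four_boxes = [(quarter, "BB"), (quarter, "BG"), (quarter, "GB"), (quarter, "GG")]

print(posterior_same(bertrand, "G"))    # 2/3
print(posterior_same(four_boxes, "B"))  # 1/2
```

The first result is (1/3)/(1/3+1/6) and the second is (1/4)/(1/4+1/4), matching the two expressions in the paragraph above.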

“@Jeffjo. Your argument failed at item 5), because you took 100 families of type BB or B&G, then 100 families of type GG or B&G.”

No. Your comprehension of my solution failed at 5) because you translated what I said, “you know one is a boy,” into “the family is of type BB or B&G.” The first is a strict subset of the second. “Strict” means there are families of the second type that are not of the first type. Dividing the B&G cases evenly into cases where you know B, and know G, makes them uniform.

42. Witold 6:38 pm 04/11/2012

The plain and smart answer 1/3 is not consistently challenged.

“it was published in 1889 by Joseph Bertrand.”

After cheating yourself with your 1/2-argument, you invoke Bertrand like an oracle. But he published something else. To make a comparison with his experiment one would of course have to choose from four boxes (two with coins of both kinds), but then the subsequent stage of reaching for a coin would become unnecessary. It is the parents who know both children and tell us about them, rather than the solver reaching for a coin/child. Much earlier Fermat had a discussion with Pascal over a coin-toss problem genuinely similar to the one in question.

“ambiguously stated, and both Martin Gardner and Keith Devlin issued retractions to their original solutions admitting this.”
Retractions is an overstatement. They did not retract their solutions, just added new versions of the problem to justify other submitted solutions.

Some versions produce 1/2 as an answer, but they begin with complicating the problem and end with trivializing the solution.

43. jeffjo 6:52 am 04/12/2012

“To make a comparison with his experiment one would of course have to choose from four boxes (two with coins of both kinds),…”

Which I described.

“.. but then the subsequent stage of reaching for a coin would become unnecessary.”

This is a non sequitur – a conclusion that does not follow from the evidence you presented. I have no idea why you think it is true, but it isn’t.

“It is the parents who know both children and tell us about them,…”

Right; and a parent of a boy and a girl has the choice of what to tell us; the parent of two of the same does not. Another identical problem is one called the Principle of Restricted Choice in bridge. When you get incomplete evidence (an opponent playing the King of Hearts, or a parent saying “boy”) about an unknown state (both the King and Queen of Hearts are missing, or two genders are possible), the possible state where the choice was restricted (that player had only the King, or that parent had only boys) increases in probability relative to the one where there was a choice (that player had the King and Queen, or that parent also had a girl).

You can look it up.

“Retractions is an overstatement. They did not retract their solutions, just added new versions of the problem to justify other submitted solutions.”

Both said their solution DOES NOT APPLY to the problems as they were originally stated, which were less ambiguous than the one we are dealing with.

44. Witold 10:44 am 04/14/2012

Well, the plain and rational answer 1/3 is unchallenged.

In the Bertrand model the subsequent stage of reaching for a coin would of course become unnecessary and inadequate.

“no idea why you think it is true, but it isn’t.”
Oh, really? It is the parents who know both children and tell us about them, and no one is supposed to reach for, or choose, a coin/child. But if you really have no idea, you are only confirming the fact that your invoking of the Bertrand model was inadequate.

How to make such a model adequate is another problem. What one should do is either (1): a not-two-silver experiment, such as, for instance, weighing the chosen box against a control two-silver-box and realizing the former is heavier, or (2): an analogous at-least-one-silver experiment, weighing the chosen box against a control two-gold-box and noting the cases where the latter is heavier.

Restricted choice was already discussed long ago in the first part of my earlier post (look up #14, 7:46 pm 03/24/2012). Namely, IF we know that a CHOSEN child is a boy, we have fewer (i.e., less, in terms of probability) general possibilities: only BB or BG, without GB, while the special possibility BB remains the same. It is regrettable that @jeffjo does not quite seem to understand that, trying himself to restrict choice only arbitrarily, and having had nothing new and consistent to say from the very beginning.

“a parent of a boy and a girl has the choice of what to tell us”
Who told you that and what the choice might be? Please notice that (1) such a choice is not assumed, but (2) it would amount to choosing a child to tell a puzzle about him or her and (3) this has already been discussed in the article and some earlier posts, such as #10,#12,#14,#15,#16.

The puzzle does not tell us whether he/she/they/group must have a choice, or whether the choices are independent or related or *unanimous* for a group of tellers, or how many solvers should hear what. It is possible that one day some puzzle-inclined not-2-boy families may agree, for instance, to test their girl-puzzle on some less or more naive people. The following day each B&G-family might be telling each of their 2 puzzles to one person and each other family their puzzle to one person (after all, those who buy clothes for a boy and a girl may be used to a bigger variety and effort). An approximate real answer would then be 1/3.
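
The “following day” scenario in the paragraph above can be counted directly. This is a minimal sketch under that paragraph's assumptions only: the four family types are equally likely, every family tells each puzzle it is able to tell to exactly one listener, and the names `FAMILIES`, `puzzles_told` and `share_of_two_boy_families` are made up for the illustration.

```python
from fractions import Fraction

# The four equally likely two-child families, by birth order.
FAMILIES = ["BB", "BG", "GB", "GG"]


def puzzles_told(family: str) -> list[str]:
    """Puzzles a family can truthfully tell, one listener per puzzle."""
    puzzles = []
    if "B" in family:
        puzzles.append("boy")   # "(at least) one of them is a boy"
    if "G" in family:
        puzzles.append("girl")  # "(at least) one of them is a girl"
    return puzzles


def share_of_two_boy_families(puzzle: str = "boy") -> Fraction:
    """Fraction of listeners to `puzzle` whose teller has two boys."""
    tellers = [f for f in FAMILIES if puzzle in puzzles_told(f)]
    return Fraction(tellers.count("BB"), len(tellers))


print(share_of_two_boy_families("boy"))
```

Under this scheme the boy puzzle is heard once from each of BB, BG and GB, so one listener in three is facing a two-boy family.
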
“the parent of two of the same does not.”
Even that may be unfounded, as they might be asking people an even stranger question: ‘two boys or two girls?’
But here we were talking about various extended versions.

“Both said their solution DOES NOT APPLY to the problems as they were originally stated, which were less ambiguous than the one we are dealing with.”
Their solution never does? Not true.
Really less ambiguous? No.
They could hardly have given better wording: ‘(at least) one is a boy’ is logically equivalent to ‘they are not two girls’ and quite different from any other intention, such as ‘the one I am thinking of now is a girl’. Furthermore, retaining their answer (with an explicit probability space) while making a statement under the pressure of readers shows just how logical the 1/3 answer is. Lastly, invoking authorities as oracles would be indecisive even if they had sworn 1/2 (or 1/3 or 7/9) under oath.

45. jeffjo 12:27 pm 04/15/2012

The naive answer is 1/3. It is based on the probability that certain combinations exist, not on the probability that you know about what exists. Many people have realized this, including Martin Gardner, who ultimately inspired this thread. The fraction of two-boy families, among two-child families that include at least one boy, is “simply and soberly” 1/3. The probability of a two-boy family, given that someone tells you there is a boy, is not the same thing.
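
The two quantities contrasted above can be estimated side by side with a small simulation. It is a sketch under a loudly stated assumption: the parent mentions the gender of one child picked at random, which is one possible reading of the puzzle and is not stated in it. The function name `simulate`, the sample size and the seed are arbitrary choices.

```python
import random


def simulate(n_families: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Return (share of two-boy families among families with a boy,
               share of two-boy families among parents who say "boy")."""
    rng = random.Random(seed)
    with_boy = two_boys_with_boy = 0
    said_boy = two_boys_said_boy = 0
    for _ in range(n_families):
        kids = [rng.choice("BG"), rng.choice("BG")]
        both_boys = kids == ["B", "B"]
        # Quantity 1: the family merely contains at least one boy.
        if "B" in kids:
            with_boy += 1
            two_boys_with_boy += both_boys
        # Quantity 2 (ASSUMED protocol): the parent mentions a random child.
        if rng.choice(kids) == "B":
            said_boy += 1
            two_boys_said_boy += both_boys
    return two_boys_with_boy / with_boy, two_boys_said_boy / said_boy


among_families, among_statements = simulate()
print(f"among families with a boy:    {among_families:.3f}")
print(f"among parents who said 'boy': {among_statements:.3f}")
```

The first estimate should land near 1/3 and the second near 1/2. A parent who always mentions a boy when there is one would bring the second figure back to 1/3.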

No one was “required to reach for” either a child, or a coin, in the comparison of the Two-Child problem, or the Bertrand Box paradox. What was reported, and “selected” indirectly, was either a *kind* of coin, or the *gender* of a child. And what the options are is obvious. And given the respective problem statements, it is clear someone made such a selection in both. The problems are identical, except in the number of possibilities.

Whether or not you *think* you discussed “restricted choice,” you obviously do not grasp the concept as it applies. And it is clear you have no intention of trying, only of trying to find reasons why I must be wrong. Until you demonstrate you are willing to set your preconceived ideas aside, and learn something that is well-established as correct, there is no point in continuing to talk to you.
