
Small-Scale Question Sunday for July 9, 2023

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.


I'm dropping in from sorting by new comments.

Is it fair to think of this geometrically, with a, b and c as three lines, a and c perpendicular and b lying in between, so that the correlations for the pairs a&b and b&c are positive, yet a&c have zero correlation?

|/_ for a quick representation

That is a good question and exposes that I'm a little out of my depth. But I've spent a happy half hour writing some crude dice-rolling simulations, so what follows is partially checked. (I'd like to draw some scatter plots too!)

Consider a data generating process using a red d6 and a green d6, where d6 is jargon for the ordinary cubical die with 6 faces. We regard the red and green dice as generating the red and green random variables. A third, yellow random variable is generated by adding together the red and green rolls.

Then red and yellow have a correlation of 0.7 (will checking with pencil and paper discover that this is 1/√2?). Yellow and green also have a correlation of 0.7. Red and green have a correlation of 0.00506 in my simulation. Now I'm regretting writing a dice-rolling simulation rather than a computation using distributions. That really has to be exactly 0, since the two dice are rolled independently.
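Here is a minimal sketch of that "computation using distributions", assuming fair dice and treating all 36 (red, green) outcomes as equally likely. The helper names `exact_stats` and `correlation` are made up for illustration. It enumerates every outcome instead of sampling, so the sampling noise goes away.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def exact_stats(pairs):
    """Exact (covariance, var_x, var_y) for equally likely (x, y) pairs."""
    n = len(pairs)
    mean_x = Fraction(sum(x for x, _ in pairs), n)
    mean_y = Fraction(sum(y for _, y in pairs), n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov, var_x, var_y


def correlation(pairs):
    """Pearson correlation over the full outcome table (no sampling)."""
    cov, var_x, var_y = exact_stats(pairs)
    return float(cov) / sqrt(float(var_x * var_y))


# Every (red, green) outcome of two fair d6; yellow is their sum.
rolls = list(product(range(1, 7), repeat=2))
red_yellow = [(r, r + g) for r, g in rolls]
green_yellow = [(g, r + g) for r, g in rolls]
red_green = list(rolls)

print("red/yellow  ", correlation(red_yellow))
print("green/yellow", correlation(green_yellow))
print("red/green   ", correlation(red_green))
```

Under those assumptions the covariance of red and yellow equals the variance of a single die (35/12), and yellow has twice that variance, so the ratio works out to 1/√2. The red/green covariance is exactly zero.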

But lines don't really work. Two of the scatter plots have lines at a definite slope, but red versus green is just a filled in square showing zero correlation.

I'd really like to get the third correlation to be negative rather than zero, to make the point about non-transitivity more strongly. Can I do that with dice? Yes.

Roll five dice, A, B, C, D, E. Generate three random variables:

Red = A + B + C

Yellow = A + B + D + E

Green = - C + D + E

Red and yellow share A and B, giving them a correlation of 0.57. Yellow and green share D and E, giving them a correlation of 0.59 (it has to be the same, but I'm out of time to do the computation exactly).

Meanwhile Red and Green share C, but with C subtracted in Green, for a correlation of -0.3.

That is shocking. Red correlates positively with yellow. Yellow correlates positively with green. But red and green have a negative correlation.
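For anyone who wants the exact values rather than simulated ones, here is a rough sketch that enumerates all 6^5 = 7776 outcomes of the five dice, assuming fair and independent dice. The function name `corr` is just an illustrative choice.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def corr(xs, ys):
    """Pearson correlation of two equally weighted outcome lists."""
    n = len(xs)
    mx, my = Fraction(sum(xs), n), Fraction(sum(ys), n)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return float(cov) / sqrt(float(vx * vy))


# All outcomes of five fair d6, labelled A..E as in the comment above.
outcomes = list(product(range(1, 7), repeat=5))
red = [a + b + c for a, b, c, d, e in outcomes]
yellow = [a + b + d + e for a, b, c, d, e in outcomes]
green = [-c + d + e for a, b, c, d, e in outcomes]

print("red/yellow  ", round(corr(red, yellow), 4))    # 0.5774
print("yellow/green", round(corr(yellow, green), 4))  # 0.5774
print("red/green   ", round(corr(red, green), 4))     # -0.3333
```

Writing v for the variance of one die: red and yellow share two dice (covariance 2v) and have variances 3v and 4v, so the correlation is 2/√12 = 1/√3. Yellow/green comes out the same way. Red and green have covariance -v and variances 3v each, giving -1/3. The simulated 0.57, 0.59 and -0.3 are consistent with these.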

Now we have reached the point where I really need scatter plots. I think the Red/Yellow plot and the Yellow/Green plot are basically the same (there is an offset because the red mean is 10.5 and the green mean is 3.5, but I don't think that matters). Red/Green contrasts by sloping down rather than up. It doesn't lie between Red/Yellow and Yellow/Green at all.

Non-transitive dice make my head hurt, but thank you for going to such lengths to answer my question!