Sunday, August 5, 2012

Should race be a factor in statistical inference?

This is a question I have long been thinking about, and Ye's incident brought it back to my attention. While I despise and lament the racist comments concerning Ye's performance, I would give a qualified "yes" to the question posed in the title. Though apparently paradoxical, this response does not contradict my attitude on Ye's matter: as I pointed out in the previous post, race should be accounted for properly, in a scientific manner.

Conditioning is the soul of statistics. Whenever possible, we should condition on relevant information to aid our inference. Race, in some cases, could be relevant information. When that is the case, we should not simply discard it for the sake of political correctness. Being fair-minded does not mean turning our heads away from any information concerning gender and race. We are seeking the truth, uncomfortable though it might be, and when such data contain information about the truth, we choose to extract it. (The book Intelligence Paradox contains a good discussion of academic political correctness in its Introduction, which I wish to quote in upcoming posts, and with which I agree wholeheartedly.) The hallmark of a racist is not reading information from race, but rather reading non-existent information from race and refusing to consider other information.

However, what is lamentable is that many people do not do it properly. We might hold different priors based on race, but those priors cannot be dogmatic. They may contain information about race, but they should recognize the limited scope of that information and remain flexible enough to readily accept any new relevant information. Put another way, even if we allow the priors to differ, the difference will be dominated by the data once new relevant information becomes available. Thus incorporating race will, in most circumstances, be pertinent only in theory and insignificant in practice, unless one has very strong evidence that the racial information is extremely relevant and reliable, a case for which I have an extremely difficult time constructing an example.
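The claim that differing priors are dominated by new data can be illustrated with a small calculation. What follows is only a sketch under assumptions of my own: a Beta-Binomial model for an unknown proportion, with made-up prior parameters and made-up data. The names `posterior_mean` and `prior_gap` are hypothetical helpers, not part of any library.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior
    after observing `successes` out of `trials` binomial outcomes."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


def prior_gap(prior_1, prior_2, successes, trials):
    """Absolute difference between the posterior means of two priors
    that are shown the same data."""
    return abs(posterior_mean(*prior_1, successes, trials)
               - posterior_mean(*prior_2, successes, trials))


# Illustrative (assumed) priors: two flexible ones with different means,
# and one dogmatic prior that carries the weight of 10,000 pseudo-observations.
weak_low = (3.0, 7.0)          # prior mean 0.3
weak_high = (7.0, 3.0)         # prior mean 0.7
dogmatic = (3000.0, 7000.0)    # prior mean 0.3, but nearly immovable

# Hypothetical data in which exactly half of the trials are successes.
for trials in (0, 10, 100, 1000):
    successes = trials // 2
    gap = prior_gap(weak_low, weak_high, successes, trials)
    stubborn = posterior_mean(*dogmatic, successes, trials)
    print(f"n={trials:5d}  gap between weak priors={gap:.4f}  "
          f"dogmatic posterior mean={stubborn:.4f}")
```

With these assumed numbers, the gap between the two flexible priors is 4 / (10 + n), so it shrinks from 0.4 with no data to below 0.005 after 1000 observations. The dogmatic prior still sits near 0.32 after the same 1000 observations, which is the behavior the paragraph above warns against.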

The problem with most racial information people wish to incorporate is that it is not "structural": the relation between race and a given behavior is not constant or fundamental. Given that most of the information is historical data and that structural change is possible, past observations contain little relevant information for today, and it is detrimental to make decisions based on past non-structural observations (similar to the Lucas Critique in economics). For example, African Americans might have had lower literacy historically, but this association is non-structural, and it will change (and has changed) as time passes. In general, blindly incorporating historical information as if it were structural is a sign of sloppy thinking and crappy reasoning, both of which might bring severe consequences. The current financial crisis, as some economists would argue, resulted from people extrapolating from historical data on real estate (believing future performance would mirror past performance). Those traders who were stupid enough to commit such a fallacy brought tremendous losses to their companies. In the case of Ye, the Chinese swimming team had a dishonorable history in the 1990s, but that was the past. To blindly project the 1990s onto the 2010s is similar to what those garbage traders did, except that in this case the consequences are borne by the potentially innocent Chinese swimmer Ye.

A less innocuous phenomenon is confirmation bias. Once the information about race enters, the person directly forms a (racist) belief, looks only for information that confirms it, and interprets ambiguous information so as to confirm it. In the case of Ye, the history is just information he digs up to rationalize his belief; he discards relevant information (other swimmers have improved more significantly over their personal bests, and the drug tests), and he interprets ambiguous information in a biased way (reading Ye's second, not-as-startling performance as evidence of backing off, rather than as a symptom of fatigue after staying up late for drug tests several nights in a row right up to the contest, and of extreme psychological pressure from defamatory media coverage).

To conclude, while incorporating racial information per se is scientific and non-racist, many tend to err on the side of over-incorporating it. The real racist behavior is not incorporating such information, but doing so improperly. The real danger results from forming a dogmatic prior and indulging in confirmation bias.
