Small-Scale Question Sunday for October 16, 2022

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.


I'm reading Cathy O'Neil's Weapons of Math Destruction. It has been on my TBR pile for... too many years, now, which makes some of her case studies particularly interesting in retrospect. I'm a little over halfway through, and so far she seems not to appreciate the difference between these two positions:

  • Automated, opaque data aggregation and processing is, by its nature, damaging to something important (e.g. rights, economies, society, mental health, whatever)

  • Automated, opaque data aggregation and processing should be used only to advance my political goals

It's not a bad book, exactly, but I'm concerned that by the time I finish it, I will just feel annoyed that it came so highly recommended. A lot of what she says seems basically right, but she essentially telegraphs the eventual capture of so-called "AI alignment" by progressive ideologues. Her hope appears to be not (as, I think, advertised) to understand how the application of algorithms to human existence might be objectionable per se, but to find a way to make sure that algorithms apply to human existence only in ways that progressives like.

But in one sense O'Neil accomplished something interesting, at least: she successfully, if inadvertently, became the trendsetter for today. With art generators in the West specially trained not to produce nudity or violence, while art generators in China are trained not to produce pictures of the 1989 Tiananmen Square Massacre, "AI alignment" "experts" the world over are chattering about how we will avoid building bias into our AI tools by, apparently... building the right bias into our AI tools. In so doing, it so far appears, they are channeling O'Neil.

Yeah, I recall being pretty disappointed in the book when I read it a few years ago, though I don't recall exactly why. I seem to recall her making a lot of dubious assumptions.

tl;dr: the progressivism-related arguments are generally fallacious, but that's not exactly untrodden ground here - the general anti-algorithm arguments certainly demonstrate that 'algorithms can be components of bad things sometimes', but fail to establish algorithms as a direct and sole cause, or to connect them to much large-scale harm. There were some interesting bits that hadn't occurred to me, like "for-profit colleges are significantly driven by generous student loans", but none of them had much to do with algorithms.

From the wiki article for the book:

Most troubling, they reinforce discrimination: If a poor student can’t get a loan because a lending model deems him too risky (by virtue of his zip code), he’s then cut off from the kind of education that could pull him out of poverty, and a vicious spiral ensues. Models are propping up the lucky and punishing the downtrodden, creating a “toxic cocktail for democracy.”

All of the 'systemic biases mean those who are worse off are made even worse off' and 'vicious spiral (cycle?)' arguments are inaccurate (in the current year) for two reasons - their effects just aren't large enough, and they're made up for by compensatory progressive programs. The loan example's multiple 'if's each correspond to specific conditions - plenty of minorities live in not-disproportionately-minority zip codes, even given said algorithm some of the people in that zip code will still get loans, there are many local colleges that are incredibly cheap, there are plenty of self-study resources available, etc. And, of course, there are loan programs for poor people, and affirmative action. So, even if the claim is somewhat true, you'd expect - if that were the only effect - incomes to even out over several generations - and, for some ethnic minorities, they do. A toy version of that arithmetic is sketched below.
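To make that concrete, here's a minimal sketch of the 'even out over generations' claim - both numbers (a 5% compounding penalty from the feedback loop, a 10% offset from compensatory programs) are made up purely for illustration:

```python
# Toy model: relative income gap across generations. Assumed (made-up)
# numbers: the feedback loop compounds the gap by 5% per generation,
# while compensatory programs shrink it by 10% per generation.
penalty = 0.05
offset = 0.10

gap = 1.0  # initial relative income gap, normalized to 1
for generation in range(1, 11):
    gap *= (1 + penalty) * (1 - offset)  # net factor: 0.945 per generation
    print(f"generation {generation}: gap = {gap:.3f}")
```

If the offset exceeds the penalty, the gap decays toward zero; only if the loop were the sole effect would it compound the other way.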

On the specific topic of 'automated, opaque data aggregation' - I guess I'll skim the book a bit... I found the epub on libgen.is, then, wanting a more convenient experience than the native epub reader, searched hn.algolia.com for 'epub read' (the first Google result was packed with ads and a premium subscription) and picked "https://app.lotareader.com" for no particular reason.

Reading a few pages, I'm reminded of why I don't read popular books that much. "Weapons of math destruction" is an annoying, unhelpful term, and the book generally consists of broad gestures towards 'algorithms bad' as opposed to coherent explanations of why they are. Many complaints, but just a few:

In general, there are a lot of extraneous sentences that don't really add anything - like this paragraph: "Now, if they incorporated the cost of education into the formula, strange things might happen to the results. Cheap universities could barge into the excellence hierarchy. This could create surprises and sow doubts. The public might receive the U.S. News rankings as something less than the word of God. It was much safer to start with the venerable champions on top. Of course they cost a lot. But maybe that was the price of excellence." Half the clauses here are entirely useless, and the other half are just smugly restating her point. "As something less than the word of God"? Really?

[on a recidivism score taking birthplace into account] But even if we put aside, ever so briefly, the crucial issue of fairness, we find ourselves descending into a pernicious WMD feedback loop. A person who scores as “high risk” is likely to be unemployed and to come from a neighborhood where many of his friends and family have had run-ins with the law. Thanks in part to the resulting high score on the evaluation, he gets a longer sentence, locking him away for more years in a prison where he’s surrounded by fellow criminals—which raises the likelihood that he’ll return to prison.

Again, if you have a feedback loop that, say, increases risk by 5%: x * 1.05 = 1.05x. The 'vicious cycle' means that extra .05x itself gets another 1.05 modifier, and so on, so we get x * (1 + .05 + .05^2 + ...). That geometric series, the sum from n = 0 to infinity of .05^n, converges to 1 / (1 - .05) ≈ 1.0526. And when you combine 'in part', 'likely', 'more', and 'raises the likelihood', even 5% seems high! (The math is entirely tangential, "it should be obvious", I guess.)
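As a quick numeric check of that convergence (a minimal sketch; the 5% is the same illustrative figure as above):

```python
# Each pass through the feedback loop adds 5% of the previous increment,
# so the total multiplier is 1 + 0.05 + 0.05**2 + ... = 1 / (1 - 0.05).
rate = 0.05

total = 0.0
for n in range(20):
    total += rate ** n

print(f"partial sum after 20 terms: {total:.6f}")          # ~1.052632
print(f"closed form 1 / (1 - rate): {1 / (1 - rate):.6f}")  # 1.052632
```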

A key component of this suffering is the pernicious feedback loop. As we’ve seen, sentencing models that profile a person by his or her circumstances help to create the environment that justifies their assumptions. This destructive loop goes round and round, and in the process the model becomes more and more unfair.

But this is just stated; no evidence is provided...

The chapter on online advertising spends a lot of time outlining non-targeted-ad, human-driven recruiting methods for for-profit universities that are, afaict, just as 'awful' as the targeted-ad-driven ones. So, again - what exactly does the algorithm add here?

Also strange is the claim that making a model transparent improves it. Not super relevant, but the weights for Stable Diffusion and OPT-175B are just... right there, and nobody is really sure how they work. (It's plausible that Stable Diffusion and GPT are still 'relatively simple' and we don't understand them just because they're so big and made of a lot of slightly complex 'circuits' or something, but that doesn't make us any more able to understand them!) She seems to just assume that transparency means there will be awesome independent journalists inspecting the model and demanding social change to fix it, or something.

Especially bizarre is the claim that the 'opacity' of college 'admissions models' leaves applicants/parents "in the dark" but "creates a big business for consultants". A transparent algorithm wouldn't reduce the number of consultants; it's not like each rich parent would manually interpret regression coefficients and design their kid a plan instead of paying for a program. And the book just claims that 'admissions models are derived from the US News model and each is a WMD'. I guess this is supposed to add to the sense that 'WMD = bad = everywhere', but wouldn't admissions be tough in any case? How would a more holistic admissions model make parents compete less?
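On the 'weights are just right there' point, here's a minimal sketch of how little raw transparency buys you - it assumes the Hugging Face transformers library and uses GPT-2 as a small stand-in for a big open model:

```python
# An open model's weights are fully "transparent": you can download and
# inspect every parameter. That alone doesn't make the model interpretable.
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")  # small open model as a stand-in
n_params = sum(p.numel() for p in model.parameters())
print(f"gpt2 has {n_params:,} fully inspectable parameters")  # ~124 million

# Every weight is sitting right there...
w = model.h[0].attn.c_attn.weight
print(w.shape, w.flatten()[:5])
# ...and staring at the raw numbers tells you nothing about why the
# model behaves the way it does.
```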

I'm not sure "the concept of a safety school is now largely extinct", as claimed. The chapter on American colleges and US News seemed to just say that the rankings exist, and point to people gaming them and a ton of potential consequences, but it never really connected the ranking system to any specific college issue in a coherent way.

That's already way too long and ranty, but the entire book read like that.

Other mostly-unrelated observations:

We had about fifty quants in total. In the early days, it was entirely men, except for me. Most of them were foreign born. Many of them had come from abstract math or physics; a few, like me, had come from number theory

I still don't get the 'all immigrants must be stopped because they don't contribute to America' thing, at all, in large part because of a ton of observations like this.

Some of the people I know would claim, after reading this, "wow, our elites are so stupid, but they believe stuff like this". But it shows the opposite - the author is clearly very smart: algebraic geometry, math professor, quant at a hedge fund!