
ControlsFreak


				

				

				
5 followers   follows 0 users  
joined 2022 October 02 23:23:48 UTC

				

User ID: 1422


You know what? I don't think he is engaging with the article. The article specifically mentions GPT 5.2 Pro seven times, two of which seem, to my read, to imply that that's what he's using. There is one moment where he just says "GPT 5 Pro". Perhaps he just happened to leave off the ".X" in this one spot. Perhaps I'm reading the other seven mentions of GPT 5.2 Pro wrong, and the dirty secret is that he's using 5.0. I suppose he doesn't say in big bold highlighted words, "I'm definitely using 5.2 and not 5.0," so sure, maybe one could say that it would be nice to have a clear statement.

...but to come in, with one sketchy textual inference, and just boldly declare that the only way anyone could possibly be reporting the experience they're reporting is obviously just because they're using a six month old model, and that obviously it's now totally fixed... it's the same SMH annoyance at someone being annoying and arrogant.

In fairness, perhaps he only read my comment and not the article (thus, not engaging with the article), and in fairness, I did blockquote the one spot where he seemed to have left off the ".X". But yeah, "I didn't RTFA, but I'm going to boldly declare that I've diagnosed exactly what's going on, using the same tired objection," is pretty cold comfort.

The article discusses Erdos problems and Aletheia's performance on "First Proof".

Why is there always someone who blows up with such attitude, yet appearing to not really engage with anything?

But you will also notice the absence of issues you are facing.

Let's turn it around. What version mathematician are we dealing with here? What's your h-index? Have you used any particular LLMs, regardless of particular model/scaffold, to solve components of your own publishable mathematics research? Can you personally attest to not encountering any issues like this? I just don't understand this insistence on not looking at the frontier, yet insisting on where it is.

Math Prof Daniel Litt talks about LLMs and math proofs

It seems to me to be a balanced take. He's bullish and hopeful on the future, while trying to be accurate/realistic about current capabilities, while remaining somewhat concerned about possible problems. For example on the bullish/hopeful side:

I think I have been underrating the pace of model improvements. In March 2025 I made a bet with Tamay Besiroglu, cofounder of RL environment company Mechanize, that AI tools would not be able to autonomously produce papers I judge to be at a level comparable to that of the best few papers published in 2025, at comparable cost to human experts, by 2030. I gave him 3:1 odds at the time; I now expect to lose this bet.

For discussing the current state, he focuses on "First Proof", which is a set of ten lemmas from current researchers' unpublished papers. He discusses the performance of different groups, different models, different scaffolding. There are positive and negative notes. One personal example section from his own endeavors:

One of the ways I like to test the models is to give them a hard problem, and then see how long it takes me to cajole/guide/bully them into giving me a correct solution. For a lemma from one of my papers, it is typically quite difficult or impossible to get a complete proof without any hints. In one case I devoted, as an experiment, 8 hours (admittedly some of which I spent away from the keyboard in frustration) trying to get GPT 5 Pro to produce a relatively simple counterexample to some statement without hints. The models do much better if one gives them a hint. Frontier models can often execute arguments I would consider "routine" if one explains the general idea in a sentence or two. It's easy to take this as evidence for usefulness, but against automatability. This is wrong. Instead of saying, *it takes 8 hours of human labor, or giving away the main idea*, we should say all it takes is 8 hours of labor or the one-sentence main idea.

My sense is that he's doing this with problems where he knows the solution (to some level; I could probably write a whole post on the different levels of "knowing" a solution for a piece of mathematics). There is great promise here, but also a note of concern. To state that concern somewhat more concisely, he writes:

In the near term, we're in trouble. Models are able to produce both correct, interesting mathematics, as well as incorrect mathematics that is exceedingly labor-intensive to detect. Academic mathematics is simply not prepared to handle this.

This again seems reasonable to me, given my own experiences. Yes yes, I haven't used every model and every scaffold (some of the systems he discusses are not publicly available at any price). When I've known the solution, I can probably get it there. When I've not known the solution, I have to say that at best, it's been good at helping me find other results in the literature that might be helpful. It is, indeed, labor-intensive and quite frustrating to have to carefully pore over every detail, trying to see if it went astray when generating a mountain of text. Then, when you find something wrong, maybe not even having verified the rest of it, it'll happily produce another mountain of text, and it feels like you're starting from square one. When you're already confident that you know a method will work, then it's mostly just a test of will to see if you can get it to figure it out. When you don't know, the question of whether you potentially waste mountains of time on what may be a dead end or just proceed on your own becomes far more difficult, and you have to make that decision repeatedly along the way.

I hate to bring this up, but it's also quite frustrating that when I say things like this, the most common response is that it's a "skill issue" or that I'm just not paying the right quantity of dollars for so-and-so's preferred model. So, maybe this testimony will help allay some of those concerns.

And yeah, Sagan help us when it comes to reviewing the mountain of papers we're going to get submitted to journals/conferences that are more LLM than human in the meantime.

He ends very hopeful:

Let us take this to an absurd extreme. Suppose we had a library filled with proofs of every theorem of ZFC, as well as excellent guides that could, given a question, take us to the answer and explain it. What would a mathematician do in such a library?

If you ask the question this way, the answer becomes clear: they would be unbelievably excited, and immediately get to work. They would immediately start asking questions: how does one prove the Riemann hypothesis? The Hodge conjecture? Their own pet obsession (in my case, the Grothendieck-Katz p-curvature conjecture)? Then they would work until they understood the answer. The job would not be done, not even close.

I do not mean to suggest, even, that humans necessarily have an intrinsic edge in asking mathematical questions that are interesting to humans; that is certainly the case now (and I suspect it will be for some time), but I see no principled reason it should be true. I just mean that this is why we got into mathematics: we want to understand. That's the goal.

Totally agreed. And something like LLMs with automated theorem provers seem incredibly well-suited to potentially get us toward something like this. It seemed natural that they'd be great at translating between humans and machines in terms of code, and we've seen great strides there. It seems natural here, too. We're not there yet, but there's hope.

That's fair enough as a concern for this case, but I would say that it is a different argument from the way that 'the lesser power is included in the greater' is typically invoked. His formulation shows that, perhaps, more granularity is not even a 'lesser' power. It's still not completely conceptually clear to me, but there's something to be said for a more careful analysis.

What you seem to be saying is that, even if one supposes that the granular tariff power is, in some sense, a 'greater' power than shutting off trade entirely, there is still a sort of equivalent 'greater' power in quotas. Again, this is plausible, and I'd want more conceptual exploration of how law should treat cases where there seem to be roughly equivalent, but (I don't know what to call it) "different track" powers.

Alex Tabarrok just made this point as well, and he has useful analogies to illustrate it. I'm not 100% sure it's entirely correct, but it's definitely plausible and a point that is sure to be bouncing around in my mind for a while.

I don't think there is a sufficient quantity of drugs in the entire supply chain to explain.

It would turn off the purists. It's an indictment of our society that we haven't developed technology that allows the TV viewer to select whether they want a neon green simulpuck or not on their own TV. This is truly the most important technological challenge of our time.

It's the Star Wars meme.

Now that the Supreme Court got rid of the tariffs, the prices are going to go down!

...the prices are going to go down, right?

Conservatives know this deep down, but they don't want to admit it because it conflicts with the First Principle.

No, it really doesn't. At best, you've just found that some people aren't good at applying the First Principle. That doesn't mean the First Principle is wrong.

EDIT: In fact, I'd say that it's likely that you're committing the New Atheist error in thinking that if morality is a thing, it must obviously be an obvious thing that any decent (seemingly-similarly-inclined) person can easily just intuit. And thus, when one sees some number of one's co-(anti-)religionists go off the deep end, one concludes that morality didn't real in the first place.

Instead, it's actually somewhat difficult to cultivate and propagate. It doesn't help that the wickedness of man is great on the earth.

the chance of having your kids taken away by CPS

...for something like letting your pre-teen walk to the neighborhood park alone. This is the key qualifier. How often is that? How do you know?

If you're the kind of black man who wants to do whatever, the cost of getting shot dead by police while unarmed is extremely high.

...stiiiilllll kinda think that I can care a little bit about the rate at which unarmed black men actually get shot dead by police. I don't particularly care whether someone labels the discussion after an old French philosopher. It doesn't really map onto that topic all that well.

I imagine some number of cops will make what seem like unreasonable requests of some number of individuals. Even if the underlying concern is something like shoplifting. A regular reading of Short Circuit and some of the many cases in which cops get qualified immunity for whatever would certainly give a person that impression. And sure, I'm sympathetic that there can be problems in particular cases there. But how often are people actually getting required to follow some inane suggestion? By your own phrasing, the example is a "weird brand new rule that you just made up", not some clear, broadly-applicable rule that the system is applying all over the place in a high percentage of cases. And how often do these inane suggestions actually lead to things like termination of parental rights? Plausibly not very often. Perhaps the inane suggestions happen more often (I don't know), and if we had data, we could discuss that, but the original claim was:

Insufficiently supervising your child will get you a visit from CPS and your child potentially removed. The data bears that out.

I still don't think the data bears that out. Redirecting the claim to saying that maybe sometimes some social workers make inane suggestions (without data here either) doesn't provide data to bear out that claim.

@OracleOutlook

As an addendum, I'd like to go back to my analogy. If someone were telling me that there's such a huge, serious, problem of unarmed black men getting shot to death by police for no reason, I would still want to have some sense of the scale of the problem. If they returned with statistics on how often black men have encounters with police or how often they're incarcerated, or how often there is use of force in police encounters, etc., that might be interesting data. Perhaps some of it would have been unknown to me until it was presented to me, and I would want to update on those items.

...but I sort of don't think that most of those buckets actually capture the phenomenon in question. Certainly, there may be other relevant questions about general allocation of police forces, or people can haggle over how many encounters/arrests/incarcerations/uses of force are ultimately justified/not justified, and those would all be interesting questions that could (and should) be addressed by folks who are interested in them. But none of them really tell me much about the actual scale of the specific problem of unarmed black men being shot to death by police unjustifiably. It could still be huge! It could still be tiny!

Even if they cite a small number of high-profile examples of unarmed black men being shot by police, and even if those few examples are bad shoots, I would feel pretty comfortable saying, "Yes, those are bad, but I still don't really know how common it is." And so, I wouldn't really know how reasonable it is to have significant fears on the topic.

The reason I think this is a useful analogy is because I recall seeing that someone did do a bunch of work to figure this out for the case of unarmed black men getting shot to death by police, and the result was that it was quite rare. But I don't think we have anyone who has done this for the question of children being taken away for reasons like a pre-teen going to the park alone. We have a bunch of other statistics that can tell us other things about the system in general, but not this, AFAICT. It could be really common! I don't know!

Thank you for providing data. This is a good post. I admit that I did not expect the rate to be as high as it is. Duly updated.

Some thoughts. The Tabarrok post is pretty good. He also compares to other sources to try to get a sense for a rate at which one might expect some sort of activity to be at least reasonably warranted. His back-of-the-envelope was that it was broadly correspondent. I also did not expect this to be that high, either. He concludes by suggesting, as you do, that perhaps they could ease off on the neglect-only cases.

This seems broadly plausible. I am perhaps colored by my own experience in the 90s, and my familiarity with a couple cases in which parents did have their parental rights terminated. For one, I could see it being classified as 'neglect-only'. However, this neglect was so severe (e.g., leaving an infant in a car seat literally 24/7) that it caused the child to have physical deformities. Whatever CPS was called at that time/location was actually far too loath to push for terminating parental rights (they eventually did, after a long time).

In another case, a mother was simply seriously too mentally deficient in whatever way to care for a child. I don't know whether the cases were officially tallied as 'neglect-only', but in any event, this mother just kept having babies. After enough of them were taken, apparently the court just said that they could take any further babies immediately. The story goes that on the n+1th iteration, the social workers showed up at the hospital, only to be asked by the mother, who clearly knew them by name at this point, having had multiple prior children taken at birth by those exact people, "[Name], what are you doing here?" "Uh, we're here to take your child, just like the last time and the time before that." Like, this person was that mentally out there.

Obviously, those are extreme cases, but to me, 'neglect-only' doesn't simply mean, "You let your pre-teen go to the neighborhood park without you." Perhaps that type of thing is generating some reports, but I still don't think we have any data to know how prevalent that sort of thing is.

Concerning observations in the data. I think they're probably noisy enough that I don't think that's much of a trend line. A brief look at other papers that cited this one found this, which presents serious concerns about measurement effects, which contributes to my initial thought that it seems plausible that it's more noise/data problems than genuine trend.

Concerning further observations in the data. Figure 2 is a real trend line. Vastly more plausible that it's capturing a real phenomenon. That phenomenon would be that the likelihood drops rapidly with age. That's concerning termination of parental rights, not investigations or other things, and I can't find a similar chart to see age effects on those things or whether 'neglect-only' cases are relatively distributed across age groups or are concentrated in some areas. Without this data, there are still pretty big questions. At the very least, there seems to be a significant reduction in termination when you get up to your age range of 8-10, but are there still a bunch of neglect-only cases in that range? I don't know. Broadly-categorized 'neglect' concerns seem to be far more likely to be justifiable in the earliest years, when a child needs significantly more care and attention. The closest we get to a claim about the neglect-only case is when Tabarrok says:

64% of substantiated victims are victims of neglect only and most of these neglect cases are specifically about lack of sufficient supervision rather than lack of access to food or clothing.

Perhaps someone else can find another place in the primary source that he's using, but frankly, my best guess is that he actually misreported what the report said. The closest statement, with the same 64% number, is:

In the analysis included in chapter 3, FFY 2023 victims are counted for each investigation that resulted in a substantiation and displays the victims with a single type of maltreatment at the state level. If a victim has two or more substantiated maltreatment types in the same report, the victim is counted in the multiple maltreatment type category. For FFY 2023, 64.1 percent of duplicate victims experience neglect only, and 10.6 percent experience physical abuse only.1

I don't see anything in the report to support the claim that "most of these neglect cases are specifically about lack of sufficient supervision rather than lack of access to food or clothing". Perhaps I'm missing it, but I just don't see that this report (that I thought was his primary source for his post) makes any distinction along these lines. Perhaps this was drawing on a different one of his links, and it just wasn't clear.

I am in violent agreement that cases where the government gets involved just because a pre-teen went to the park alone are extremely bad. I still remain fairly unconvinced that I have any idea how common they are. And my lying eyes still look out the window or around the neighborhood when I'm out and see kids in that age range roaming around unsupervised all the time. Maybe it is worse; it probably is; everything is worse.

1 - Me here: There are other bits about how they treat multiple substantiated claims. It talks about duplicates elsewhere, saying, "A victim with two substantiated reports of neglect is counted twice in neglect only." So it seems like there's some double-counting possibly going on, and it's this category of folks that are two-or-more-counted where 64% are neglect-only.

I don't think you made it up. I have no idea what happened, who told you what, what you saw on a website, or what the code that ran their online reservation system did in the past. But I did just go to the internet archive, since you gave me a date. Maybe things were different! I'm in a 'checking' mood today, I guess. Here is the same page on their website, but from January 2023, the first of the snapshots they have available from 2023. In the spot where they would have had the equivalent phrase, the wording is slightly different:

Passengers 12 years of age and younger must travel with an adult passenger who is at least 18 years old.

Again, I could imagine interpreting this either way. I don't have a way of verifying what the code on their website allowed/disallowed three years ago. However, that archived page also says:

One child ages 2 - 12 is eligible to receive a 50% discount on the lowest available adult rail fare on most Amtrak trains with each fare-paying adult (age 18+). If any additional child per adult will be traveling, reservations must be made for that child as an “Adult” and the full adult fare will be charged.

Infants Ride Free

One child under the age of two, not occupying a seat, may ride free with each fare-paying adult (age 18+). Additional infants can be booked as a “Child” and receive the 50% discount (if available) or as an “Adult” (if the Children's fare is not available).

I don't know how to interpret that in any way other than that you could have had more children than you had adults. It's just the discounts that adjust, depending on details. [EDIT: It is entirely plausible that the code they used to run their online reservation system in 2023 was broken and that resulted in rejects rather than discounting according to the stated rules. I can't check that. But their stated policy appears to allow it.]

Where do you live

I'm in the US. Not looking to give any more information than that.

Even if only 1%

Honestly, my initial reaction is basically the same as it would be if I had heard, "Even if only 1% of unarmed black men are shot by police..." Which is, I'm pretty sure you're missing some number of zeros. I don't know how many zeros. I don't know how many zeros matter. I'm not sure if there's a particular number of zeros where it goes from concern to not-concern. But I'm pretty sure the number is far from correct.

Amtrak will not let you buy train tickets for kids unless you have one adult per kid.

I was ready to believe you, because I am never surprised that the federal government would screw up literally anything in the most ridiculous way. Right before I hit "comment", I did decide to check. My search for "Amtrak children" brought me here, which says:

Children and infants must be accompanied by at least one adult (18+) in the same reservation.

Ok, I could read that either way. But I guess what was nice about your claim is that it was that they won't even let you buy tickets. It's not some situation where you could buy tickets, get there, and learn that the correct reading of this phrase is that they have a one-adult-per-child policy. So, presumably, it's something I can check.

Sure enough, I just went to the reservations, picked totally random cities, totally random dates, one adult, four children (2-12, not 'youth', which could plausibly have different rules under various readings, though at this stage, it actually says, "Youth, children and infants must travel with at least one adult who is 18 years old."). At the very least, it lets me get to the page where they want me to start putting in traveler information (name, etc.) for each of the five passengers. I can tab over without entering any information, and it clearly has marked four children, with a different amount of information requested for the children than the adult.

I suppose it is possible that at some point after this step, after all the personal info has been put in and whatnot, the system will finally realize and say, "No, we actually had a one-adult-per-child policy all along, and we just tricked you into getting this far," but on first read, I think you're just wrong on this claim.

The culture is different. The rules and expectations are different. You have to admit that much.

I mean, yes? But that's true for any epsilon difference. Presumably you also have to admit that I look out the window or walk around town and see kids out playing unattended all the time, too, right? Like, we're probably somewhere between epsilon and infinity, but it's kinda squishy to really capture it well.

Stupid? Yes. Annoying? Yes. Should be better? Obviously. But it's not calling CPS. It's not taking your kids away. It's not charging you criminally with child endangerment.

Nor does it seem to contradict my observation of just looking out the window or walking down the street and seeing kids running around unattended all the time. I'm sure plenty of black people can describe some stupid or annoying situation that should have gone differently, and many of them even have a plausible claim that racism was involved. I still sorta think that the concern about unarmed black men being shot by police for no reason is just not an all-consuming problem in the world.

I've heard these stories on the internet as much as anyone else, but does anyone have any clue as to the actual scale of the problem? It's certainly annoying that this failure mode exists at all, since it's relatively scary. That said, my small observation of the real world, seeing kids running around the neighborhood unattended all the time, seems to clash with it. My mild wonder is whether the problem is akin to "unarmed black men getting shot by police for no reason", which objectively is an extremely small problem that manages to capture an extremely oversized proportion of the fears of a subset of the population. Maybe it's just worse in worse places, where perhaps it might actually be a danger for them to be running around on their own?

you don't have any authority to make that determination over them

Fair, and at risk of saying not much, I'd say that it's, uh, complicated. For example, I have good friends who were born and raised Canadian citizens and who later acquired US citizenship, too.1 For several of them, (not brushing with any broader of a brush), they're basically understood to be (and would describe themselves as) "Canadian, but also with US citizenship". Are they "American"? Uh... kinda yeah? Also maybe kinda no? If you just asked them if they were "American", I think they'd say, "I'm Canadian, but I have US citizenship." Does that matter? I don't particularly take a position either way.

Different individuals among them may have different senses of it, too. Some, for example, really are effectively Canadian at heart. One guy I know discovered that one of his ancestors also had US citizenship, and found that the paperwork to go the route of attaining citizenship that way was easier for him than going through spousal immigration in order to move here with his wife.2 If it had been just as easy to do it the other way, would he have bothered? I don't know; it's a counterfactual, and lots of things can come into play over time. But he might have been perfectly happy being "Canadian citizen and US Permanent Resident" indefinitely. Does this matter? I don't know. I can vaguely see both sides.

For what it's worth, my best Puerto Rican friend would say, "I'm Puerto Rican, and oh by the way, we have American citizenship." Does that matter? Hell, I don't know.

You're obviously right that the only non-squishy way to draw lines is via citizenship, but my observation is that a lot of folks view the real world as inherently squishy.

1 - I also know at least one guy born/raised in the US. He and his wife moved to Canada for work for several years. He got Canadian citizenship, she didn't. They would explicitly say that the reason he got Canadian citizenship was just because it made dealing with a certain Canadian law regarding his line of work easier. They've lived back in the US for quite a few years now. I don't think either of them would say they're "Canadian". If you just asked them, they'd probably say that he was "American", full stop. If you went on to ask him about his time in Canada, he'd add, "...and yeah, I did get Canadian citizenship."

2 - For this particular couple, they actually moved to Canada first when they got married; she went through whatever process to be able to move up there and be married to him. I don't know if she acquired Canadian citizenship at any point. Later, when they decided they wanted to live in the US (for a particular work reason), they discovered this business about his ancestor. Where they're living and what citizenship he has is just sort of an incidental and paperwork thing to them.

Keep digging, bud. TBH, I haven't seen this level of bad faith aside from the likes of Darwin/SecureSignals. It's a truly bad look for a mod.

You have now explicitly denied what you have said in the past, and for which there are clear links. Anyone can just click the link and see that you contradicted yourself. It takes not even a modicum of effort.

You have absolutely no argument that anything of mine is irrational or incoherent. At least nothing other than ipse dixit. It is simply comparing your words to your words. If there is any irrationality or incoherence, it is your own.

I have not denied anything I've said in the past

Right, you do the "upper lip curl, go silent" strategy when it is clear that you have contradicted yourself. Contradicting yourself is an implicit denial of what you've said in the past. That you avoid explicit acknowledgement of your contradiction and denial is hardly a redeeming virtue.

Taking no position on whether it's a good or a bad thing, it occurs to me that you seem to have re-derived qualified immunity.

It's not very natural to evaluate a single 'effectiveness' of a supposed monolithic thing when, by your own statement, it's not a single monolithic thing. It's very diverse. Unless one is willing to draw boundaries and say, like you kind of do, that a bunch of people aren't "real Christians", then, well, anyone can pretty much just make their own "I'm A Christian" Flag. That significantly complicates the analysis.

For example, if one is willing to draw boundaries, then one has to consider what measure of 'effectiveness' is going to be used for "real Christians"? One person might think that they can conclude that "real Christians" have been ineffective even just in that they have not been able to prevent "fake Christians" from making their own "I'm a Christian" Flags. I imagine others would disagree that that's a proper measure of effectiveness, and they would prefer other measures. There's just not a natural measure to use.

I've been thinking about this, and I'm not sure I can. There's a lot of little things that stick out. Little nuggets here and there that I remember. If there's anything big picture, it's that most people don't think much about economics or complex systems. Sometimes, the downside to something that sounds good can be right in front of their face, and they won't get it (the price gouging for ice after a hurricane story is legendary). Other times, the dispersed nature of information, thinking, and actions masks implications for how tweaking one thing can change other things. He's very Hayekian in that. It's been such a long absorption process, hearing how One Neat Trick failed and Another Neat Trick failed and Another Neat Trick failed, that you don't just become skeptical of One Neat Tricks, but you start to gain an intuition for how the next One Neat Trick is likely to fail.