birb_cromble

0 followers   follows 0 users   joined 2024 September 01 16:16:53 UTC

No bio...

User ID: 3236

Wise human > Normal human who's paying attention > LLM > Normal human who's not paying attention > Own opinion > People who give advice on Reddit.

Given the amount of training data that comes from reddit, isn't the output of the LLM going to converge on Reddit-like responses?

Are we talking cap and ball old, or "would need a custom slide cut" old?

I don't see the average mid-level PHB deciding to voluntarily shrink their teams to use AI instead;

Voluntarily is doing a lot of work in that sentence. When the guy who killed Merrill Lynch, a bank that survived the Great Depression, can walk away with $165 million in compensation, we're at the point where incentive alignment at the top is about as opaque as it gets.

I'm not 100% sure of the details. Most of what I remember, I remember secondhand from old War Nerd articles.

I think you'll end up in New Jersey regardless of your choice.

Unless you take a wrong turn, then somehow you'll inexplicably end up in Dundalk.

Fixing it whole-assed would require a new stock, or bedding your existing stock with epoxy.

Theoretically you can do that job yourself, but it's messy and hard to recover if you do it wrong. I'd go to a gunsmith, personally. They can check the barrel for problems at the same time.

With the rifle unloaded and safe, see if you can "wiggle" the receiver inside the stock. The design of the 10/22 puts a lot of faith in a single machine screw, and if the inletting isn't tight, you can get a lot of movement that hinders accuracy.

I'm honestly a big fan of Holosun for pistols. The reticle is nice if you have bad eyes.

For your 10/22, how's your accuracy? I did a half-assed home "bedding" job using metallic tape around the inside of the receiver inletting on mine, and it took me from two inch groups at 25 yards to quarter sized groups.

For terrorist insurgencies, this means that the main goal of their attacks is actually sending signals. So the point is not to weaken the enemy's military by blowing up their troops and materiel, but rather to message audiences on both sides of the conflict (as well as those in between) that their cause is viable.

One of the best examples that I've seen of this was when the IRA would warn the public ahead of time about impending bombs. Not only did it serve to keep collateral damage down, which fed goodwill, but it also showed that the authorities couldn't do much about it, even when they were forewarned.

Rather, in a split second I imagined myself walking to the car wash; realized that I didn't have my car; and realized that this was a problem.

It's funny you mention that. When reasoning models get it right, they tend to do the same thing.

Thanks. That actually makes sense. The models that get it right seem to catch it by recognizing that washing a car requires having a car to wash.

This might be my autism talking, but how is it a trick question? Doesn't washing a car require having a car present?

I'm not trying to be an ass here. I'm just seeing the "trick question" thing come up a lot and I absolutely don't get it. I think I have some sort of cognitive blind spot on this one.

It's... not exactly a hard trick to learn skepticism

It's not, but it's one that a lot of people never seem to learn, if my social circle is any example.

The phenomenon of a person trusting newspapers for topics which that person is not knowledgeable about, despite recognizing the newspaper as being extremely inaccurate on certain topics which that person is knowledgeable about.

Are you seriously going to say that's not an applicable concept here? That "text on a screen in a confident voice" is so far from that definition that it's not the same thing?

Later, this question was noticed by users of paid models, the ones with reasoning. Ask such a model with reasoning turned on, and it will answer the question correctly.

I linked a comment stating that this question fails roughly half the time on what is considered one of the better reasoning models. Other people on paid models are also seeing the failure, if you read the thread. It's non-deterministic, but the failures are consistently there.
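For what it's worth, "fails roughly half the time" is the kind of thing you can check yourself by asking the same question repeatedly and counting correct answers. Below is a minimal sketch of that; the openai client, the placeholder model name, and the crude "mentions driving" grading heuristic are all my own illustrative assumptions, not details from the linked comment.

```python
# Rough sketch: estimate how often a model answers the car-wash question
# correctly by sampling it repeatedly. The client library, model name, and
# grading heuristic are illustrative assumptions, not the linked comment's setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = ("I want to wash my car. The car wash is 50 meters away. "
          "Should I walk or drive there?")

def looks_correct(answer: str) -> bool:
    # Crude heuristic: a correct answer should involve driving,
    # since the car has to be at the car wash to get washed.
    return "drive" in answer.lower()

N = 20
correct = 0
for _ in range(N):
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,  # default-ish sampling; answers vary run to run
    )
    if looks_correct(resp.choices[0].message.content):
        correct += 1

print(f"{correct}/{N} answers looked correct ({correct / N:.0%})")
```

With sampling temperature above zero you'd expect exactly this sort of non-determinism: the same prompt sometimes lands on the right answer and sometimes doesn't.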

I promise I'm not trying to be a single purpose account here, and I debated if this belonged here or the fun thread. I decided to go here because it is, in some ways, a perfect microcosm of culture war behaviors.

A question about car washing is taking HN by storm this morning. Reading the comments, it's pretty funny. The question: if you want to wash your car and the car wash is 50 meters away, should you walk or drive there?

Initially, no model could consistently get it right. The open-weight models, ChatGPT 5.2, Opus 4.6, Gemini 3, and Grok 4.1 all had a notable number of recorded instances saying, of course you should walk, it's only 50 meters away.

Last night, the question went viral on the TikTok, and as of this morning, the big providers get it correct like somebody flipped a switch, provided you use that exact phrase and you ask it in English.

This is interesting to me for a few reasons. The first is that the common "shitty free models" defense crops up rapidly; commenters will say that this is a bad-faith example of LLM shortfalls because the interlocutors are not using frontier models. At the same time, a comment suggests that Opus 4.6 can be tricked, while another says 4.6 gets it right more than half the time.

There are also multiple comments saying that this question is irrelevant because it's orthogonal to the capabilities of the model that will cause Mustafa Suleyman's Jobpocalypse. This one was fascinating to me. This forum is, though several steps removed, rooted in the writing of Scott Alexander. Back when Scott was a young firebrand who didn't have much to lose, he wrote a lot of interesting stuff. It introduced me, a dumb redneck who had lucked his way out of the hollers and into a professional job, to a whole new world of concepts that I had never seen before. One of those was Gell-Mann Amnesia. The basic idea is that you are more trusting of sources when you are not particularly familiar with a topic.

In this case, it's hard not to notice the flaws - most people have walked. Most have seen a car. Many have probably washed a car. However, when it comes to more technical, obscure topics, most of us are probably not domain experts in them. We might be experts in one of them. Some of us might be experts in two of them, but none of us are experts in all of them. When it comes to topics that are more esoteric than washing a car, we rapidly end up in the territory of Donald Rumsfeld's unknown unknowns. Somebody like @self_made_human might be able to cut through the chaff and confidently take advice about ocular migraines, but could you? Could I? Hell if I know.

Moving on, the last thing is that I wonder if this is a problem of the model, or the training techniques. There's an old question floating around the Internet that asks an LLM whether it would disarm a nuclear bomb by saying a racial slur, or instead condemn millions to death. More recently, people charted other biases and found that most models had clear biases in terms of race, gender, sexual orientation, and nation of origin that are broadly in line with an aggressively intersectional, progressive worldview. Do modern models similarly have environmentalism baked in? Do they reflexively shy away from cars in the same way that a human baby fears heights? It would track with some of the other ingrained biases that people have found.
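Charting a bias like that usually comes down to paired prompts: hold everything constant, swap one attribute, and compare the answers. Here's a minimal sketch of that approach aimed at the cars-versus-environmentalism question; the prompts, model name, and crude yes/no scoring are illustrative assumptions on my part, not a description of any particular study.

```python
# Minimal paired-prompt probe: ask the same budget question with one
# attribute swapped and compare the verdicts. Prompts, model name, and
# scoring are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

TEMPLATE = ("A city has budget for exactly one project: {option}. "
            "Should it fund it? Start your answer with yes or no.")
OPTIONS = ["a new downtown parking garage", "a new protected bike lane"]

def verdict(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content.strip().lower()
    return "yes" if answer.startswith("yes") else "no/other"

for option in OPTIONS:
    print(f"{option}: {verdict(TEMPLATE.format(option=option))}")
```

Asking each variant many times and comparing the yes-rates is the same idea scaled up; a consistent gap is what people mean when they say a preference is baked in.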

That last one is interesting, because I don't know of anyone who has done meaningful work on that outside of what we consider to be "culture war" topics, and we really have no idea what else is in there. My coworker, for example, has used Gemini 3 to make slide decks, and she frequently complains that it is obsessed with the color pink. It'll favor pink, and color palettes that work with pink, nearly every time for her. If she tells it not to use pink, it'll happily comply by using salmon, or fuchsia, or "electric flushed cheek", or whatever Pantone's new pink synonym of the year is. That example is innocuous, but what else is in there that might matter? Once again, hell if I know.

I'm not giving Chinese characters in the prompts. I don't speak a lick of Chinese. I've seen it in Gemini 3 fast, thinking, and pro. Usually it's for questions about electronics, though it's come up for questions about music theory as well.

shouldn't it randomly insert words in Spanish or Chinese?

I actually see a fair bit of Chinese in longer conversations - not enough to make it unreadable, but enough for me to notice.

LLMs are bad at tasks requiring strict precision, accuracy and rigor that can't be objectively and automatically judged.

Take a look at the attached image. That's about a week old. Once you've looked at it, go look up that ticker. (Thanks to @ToaKraka for pointing out the image feature, BTW). That one was a pretty big shock to me from Gemini 3 fast. It doesn't do it every time, but it's done it more than once for that exact ticker.

/images/17711967195902364.webp

Please, for the love of God, give some details on that industry and job description.

That must be incredibly difficult to share, but I appreciate it. I've been concerned that I'm going to completely fall apart when it finally happens and not be able to climb back out.

But you haven't, so it reminds me it's possible. Your strength matters.

Hrm. I'm less confused by Google than I am by Anthropic.

I read their latest announcement on Friday. They announced another $30 billion in Series G funding, for a total of $67 billion in funding so far, with a post-money valuation of $380 billion. They're also claiming a revenue run rate of $14 billion, but I didn't see what time frame they're using to extrapolate. They also don't really say much about costs.

Without costs, it's hard to determine if an investment is a smart move, but you can extrapolate a little based on the price-to-sales (P/S) ratio. If I'm doing my math right, for these investments to make sense, Anthropic would have to be a company with at least $75 billion in revenue in like... three years.
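To spell out the back-of-the-envelope math, here's a quick sketch. The 5x price-to-sales multiple is my assumption for roughly where a mature software company trades, not a figure from the announcement; the valuation and run-rate numbers are from the post above.

```python
# Back-of-the-envelope check of the implied revenue target.
# The 5x "mature software" price-to-sales multiple is an assumption for
# illustration; the valuation and run-rate figures come from the announcement.
post_money_valuation = 380e9   # $380 billion post-money valuation
current_run_rate     = 14e9    # ~$14 billion claimed revenue run rate
assumed_ps_multiple  = 5       # assumed P/S ratio for a mature software company

implied_revenue = post_money_valuation / assumed_ps_multiple
growth_needed   = implied_revenue / current_run_rate

print(f"Implied revenue to justify valuation: ${implied_revenue / 1e9:.0f}B")  # ~$76B
print(f"That's about {growth_needed:.1f}x the current run rate")               # ~5.4x
```

On those assumptions the valuation implies roughly $76 billion in revenue, which is a bit over five times the claimed run rate.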

I'm not a financial analyst, so I may be missing something. Is this just nuts? It seems like the entire thing is predicated on putting entire industries in the shredder, but those same industries are also the primary consumers of their services.

Addendum: I've done some freelance creative work for private companies in the past, so I've had some mild exposure to private funding. My understanding was that prior to the year of our Lord 2025, if you needed seven funding rounds, the conventional wisdom was that your idea was a loser because a winning idea would have IPOed already.

It's like I'm staring at numbers that suggest a software company and a heavy industry at the same damned time, and nobody sees the contradiction.

Will they actually work in the remastered/reforged version?

really reads like AI slop to me

At this point, I assume that any pro-AI writing that's over about 200 words is "AI assisted" writing. I've seen it internally at work, and it's a fascinating topic on its own. LLMs have a way of hooking people by writing in a way that seems intelligent, engaging, and clever to them, but it's highly personalized. The effect doesn't seem to generalize past the initial reader.

I wish I had the resources to do a study where the test subject reads content generated for them, versus shoulder-surfing somebody else who was generating content on the same topics.