
faul_sname

Fuck around once, find out once. Do it again, now it's science.

1 follower   follows 1 user  
joined 2022 September 06 20:44:12 UTC

User ID: 884

Your steps "2a" and "2b" sound like a good fit for non-fungible tokens and other cryptocurrency ideas.

Are you perhaps thinking of CharityNavigator (which tracks things like percentage of donations that actually go to the ostensible cause) instead of GiveWell (which tracks things like expected impact of the donation in terms of the metric the organization is supposed to be helping with)?

The latter - that impact is more important than intention or purity or self-sacrifice - is the place where EA distinguishes itself from normal charitable people. Normal people are pretty altruistic, but they're not necessarily strategic about it, because most people are not strategic about most of the things they do most of the time, and in particular are not strategic about things that don't significantly affect them and where they will probably never get feedback about whether their approach worked.

The argument is that despite some of the questionable things EA has been caught up in lately, they've saved 200 thousand lives! But did they save good lives? What have they saved, really? More mouths to feed?

Yep. Some of those "mouths to feed" might end up becoming doctors and lawyers, but that's not why we saved them, and they would still be worth saving even if they all ended up living ordinary lives as farmers and fishermen and similar.

If you don't think that the lives of ordinary people are worth anything, that needless suffering and death are fine as long as they don't affect you and yours, and that you would not expect any help if the positions were flipped since they would have no moral obligation to help you... well, that's your prerogative. You can have your local community with close internal ties, and that's fine.

More cynically, I think this sort of caring is just a way to whitewash your past wrongs; it's PR maximizing: spend X dollars and get the biggest number you can put next to your shady Bay Area tech movement that is increasingly under society's microscope, given the immense power things like social networks and AI give your group.

I don't think effective altruism is particularly effective PR. Effective PR techniques are pretty well known, and they don't particularly look like "spend your PR budget on a few particular cause areas that aren't even agreed upon to be important and don't substantially help anyone with power or influence".

The funny thing is that PR maximizing would probably make effective altruism more effective than it currently is, but people in the EA community (myself included) are put off by things that look like advertising, and so don't actually do it.

Analysis, Context, Hook, Own Opinion.

ACHOO.

If you do figure it out, I expect at least a LW post or two about it 🙏

If I do, I will definitely make an LW post or two about it. May or may not happen, though; I have quite a lot going on in the next two months (and then more going on after that, because a lot of the stuff going on is "baby in 2 months").

I agree that this is how it'll likely work out (and it does in smart humans), but isn't that tantamount to enforcing internal consistency, just under adversarial stimulus?

I think the disagreement is more about how often the adversarial stimulus comes about. I expect that in most cases, it's not worth it to generate such an adversarial stimulus (i.e. it costs more than 0.01 A for an adversary to find that trade cycle, so if they can only expect to run the cycle once it's not worth it). So such an agent would trend towards an internally consistent equilibrium, given a bunch of stimuli like that, but probably not very quickly, and the returns on becoming more coherent likely diminish very steeply: the cost of incoherence decreases as its magnitude decreases, and the frequency of exploitation should decrease as the payoff for exploitation decreases, so the rate of convergence should slow more than linearly over time.

Ah, would that I had enough money to throw at a housefly and hope to stun it, but at least you're putting yours to noble ends haha.

That'll change with the officially becoming a doctor thing, I expect. And also becoming a doctor helps rather more directly with the whole pandemic preparedness thing.

BTW "e/acc" may be new, "AI accelerationists" are very much not new, nor is it new for them to be associated with the same sorts of circles as EAs run in. So BBJ coined the term "e/acc", but if you're thinking "I've definitely seen those ideas before" then that's pretty plausible.

Your explanation about how AlphaGo works is deeply counter-intuitive to me

Me too! I still don't understand why you can't just run the value model on all legal next moves, and then pick the top N among those or some equivalent heuristic. One of these days I need to sit down with KataGo (open source version of AlphaGo) and figure out exactly what's happening in the cases where the policy network predicts a different move than the top scoring move according to the value network. I have a suspicion that the difference happens specifically in cases where the value network is overestimating the win fraction of the current position due to an inability to see that far ahead in the game, and the moves chosen by the policy network expose the weakness in the current position but do not actually cause the position to be weaker (whereas the moves rated most highly by the value network will be ones where the weakness of the position is still not evident to the value network even after the move is made). That suspicion is based on approximately no evidence though. Still, having a working hypothesis going in will be helpful for coming up with experiments, during which I can notice whatever weird thing actually underlies the phenomenon.
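
In case it helps to pin down what I mean by that experiment, here's a rough sketch of the comparison I have in mind. The helpers `policy_prior`, `value_after_move`, and `pos.legal_moves()` are placeholders I made up (KataGo's real interface is a JSON analysis protocol over stdin/stdout), so treat this as pseudocode for the experiment design rather than something you can run against KataGo as-is:

```python
# Sketch of the experiment only. `policy_prior`, `value_after_move`, and
# `pos.legal_moves()` are hypothetical stand-ins for calls into KataGo's
# analysis engine (the real interface is JSON over stdin/stdout).

def find_policy_value_disagreements(positions, policy_prior, value_after_move):
    """Collect positions where the policy net's top move differs from the
    move that looks best to the value net alone (one ply, no search)."""
    disagreements = []
    for pos in positions:
        moves = pos.legal_moves()
        # Move the policy network considers most "normal" here.
        policy_move = max(moves, key=lambda m: policy_prior(pos, m))
        # Move that maximizes the value network's one-ply win estimate.
        value_move = max(moves, key=lambda m: value_after_move(pos, m))
        if policy_move != value_move:
            disagreements.append((pos, policy_move, value_move))
    return disagreements
```

The interesting part would then be seeing how games actually continue from the positions where the two disagree.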

What about more prosaic benefits like avoiding being Dutch booked? I believe that's one of the biggest benefits of consistency.

I expect that the benefits of avoiding being Dutch booked are pretty minimal in practice. If you start out with 100 A and there's some cycle 100 A < 1000 B < 9 C < 99 A where some adversary can route you along that path and leave you with 99 A and the same amount of B and C that you started with, the likely result is that you go "haha nice" and adjust your strategy such that you don't end up in that trade cycle again. I expect that the costs of ensuring that you're immune to Dutch booking exceed the costs of occasionally being Dutch booked at some fairly minimal level of robustness. Worse is better, etc. Note that this opinion of mine is, again, based on approximately no evidence, so take it for what it's worth (i.e. approximately nothing).
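
To make the arithmetic concrete, here's a toy version of that cycle, using the made-up exchange rates from the example above; one trip around the loop costs exactly 1 A:

```python
# One loop around the trade cycle above. The agent prefers each bundle in
# the chain to the one before it (100 A < 1000 B < 9 C < 99 A), so it
# accepts all three trades and ends up 1 A poorer with B and C unchanged.

holdings = {"A": 100, "B": 0, "C": 0}

# Each trade: (asset given, amount given, asset received, amount received).
cycle = [("A", 100, "B", 1000),
         ("B", 1000, "C", 9),
         ("C", 9, "A", 99)]

for give, n_give, get, n_get in cycle:
    holdings[give] -= n_give
    holdings[get] += n_get

print(holdings)  # {'A': 99, 'B': 0, 'C': 0}
```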

wastewater monitoring

Yeah I think this is great and jefftk and friends should receive lots of funding. Speaking of which I have just bugged him about how to throw money at NAO, because I believe they should receive lots of funding and have just realized that I have done exactly nothing about that belief.

Yudkowsky made a big fuss about how fragile human values are and how hard it'll be for us to make AI both understand and care about them, but everything I know about LLMs suggests that's not an issue in practice.

Ah, yeah. I spent a while being convinced of this, and was worried you had as well because it was a pretty common doom spiral to get caught up in.

So it's not that the majority of concern these days is an AI holding misaligned goals, but rather enacting the goals of misaligned humans, not that I put a negligible portion of my probability mass in the former.

Yeah this is a legit threat model but I think the ways to mitigate the "misuse" threat model bear effectively no resemblance to the ways to mitigate the "utility maximizer does its thing and everything humans care about is lost because Goodhart". Specifically I think for misuse you care about the particular ways a model might be misused, and your mitigation strategy should be tailored to that (which looks more like "sequence all nucleic acids coming through the wastewater stream and do anomaly detection" and less like "do a bunch of math about agent foundations").

If you can dumb it down for me, what makes you say so? My vague understanding is that things like AlphaGo do compare and contrast the expected values of different board states and try to find the one with the maximum probability of victory based off whatever heuristics it knows work best. Is there a better way of conceptualising things?

Yeah, this is what I thought for a long time as well, and it took actually messing about with ML models to realize that it wasn't quite right (because it is almost right).

So AlphaGo has three relevant components for this:

  1. A value network, which says, for any position, how likely that position is to lead to a win (as a probability between 0 and 1)
  2. A policy network, which says, for any position, how likely each possible move is to be chosen as the next move. Basically, it encodes heuristics of the form "these are the normal things to do in these situations".
  3. The Monte Carlo Tree Search (MCTS) wrapper of the policy and value networks.

A system composed purely of the value network and MCTS would be a pure expected utility (EU) maximizer. It turns out, however, that the addition of the policy network drastically improves performance. I would have expected that "just use the value network for every legal move and pick the top few to continue examining with MCTS" would have worked, without needing a separate policy network, but apparently not.
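
For what it's worth, my rough understanding of where the policy network enters the picture is as a prior over which children MCTS bothers to explore, via a PUCT-style selection score roughly like the one below. This is a simplified sketch from memory; the constants and details differ between AlphaGo, AlphaZero, and KataGo:

```python
import math

def puct_score(mean_value, child_visits, parent_visits, policy_prior, c_puct=1.5):
    """Simplified PUCT-style selection score (details vary by implementation).

    mean_value:   average value-network estimate for this move so far
    policy_prior: policy network's probability for this move
    """
    # The policy prior scales the exploration bonus, so "normal-looking"
    # moves get searched first; a pure value-net-plus-search agent loses
    # that prioritization.
    exploration = c_puct * policy_prior * math.sqrt(parent_visits) / (1 + child_visits)
    return mean_value + exploration
```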

This was a super interesting result. The policy network is an adaptation-executor rather than a utility maximizer, and what this means is that, as it turns out, stapling an adaptation-executor to your utility maximizer can give higher-utility results! Even in toy domains with no hidden state!

Which brings me to

To name drop something I barely understand, are you pointing at the Von Neumann-Morgenstern theorem, and that you're claiming that just because there's a way to represent all the past actions of a consistent agent as being described by an implicit utility function, that does not necessarily mean that they "actually" have that utility function and, more importantly, that we can model their future actions using that utility function?

Yeah, you have the gist of it. And additionally, I expect it's just factually false that all agents will be rewarded for becoming more coherent / EU-maximizer-ish (in the "patterns chiseled into their cognition" meaning of the term "rewarded").

Again, no real bearing on misuse or competition threat models - those are still fully in play. But I think "do what I mean" is fully achievable to within the limits of the abilities of the systems we build, and the "sharp left turn" is fake.

I do see where you're coming from in terms of instrumental convergence. Mainly I'm pointing that out because I spent quite a few years convinced of something along the lines of

  1. An explicit expected utility maximizer will eventually end up controlling the light cone
  2. Almost none of the utility functions it might have would be maximized in a universe that still contains humans
  3. Therefore an unaligned AI will probably kill everyone while maximizing some strange alien objective

And it took me quite a while to notice that the foundation of my belief was built on an argument that looks like

  1. In the limit, almost any imaginable utility function is not maximized by anything we would recognize as good.
  2. Any agent that can meaningfully be said to have goals at all will find that it needs resources to accomplish those goals.
  3. Any agent that is trying to obtain resources will behave in a way that can be explained by it having a utility function that involves obtaining those resources.
  4. By 2 and 3, an agent that has any sort of goal will become a coherent utility maximizer as it gets more powerful. By 1, this will not end well.

And thinking this way kinda fucked me up for like 7 or 8 years. And then I spent some time doing mechinterp, and noticed that "maximize expected utility" looks nothing like what high-powered systems are doing, and that this was true even in places you would really expect to see EU maximizers (e.g. chess and go). Nor does it seem to be how humans operate.

And then I noticed that step 4 of that reasoning chain doesn't even follow from step 3, because "there exists some utility function that is consistent with the past behavior of the system" is not the same thing as "the system is actually trying to maximize that utility function".

We could still end up with deception and power-seeking in AI systems, and if those systems are powerful enough that would still be bad. But I think the model where that is necessarily what we end up with, and where we get no warning of it because systems will only behave deceptively once they know they'll succeed (the "sharp left turn"), sounds compelling until you try to obtain a gears-level understanding, at which point it turns out to be based on using ambiguous terms in two ways and swapping between meanings.

A Maximizer maximizes

I have seen no evidence that explicit maximizers do particularly well in real-world environments. Hell, even in very simple game environments, we find that bags of learned heuristics outperform explicit simulation and tree search over future states in all but the very simplest of cases.

I think utility maximizers are probably anti-natural. Have you considered taking the reward-is-not-the-optimization-target pill?

Ensure that they have lots of neutral or positive experiences with trans people, ideally in contexts where transness doesn't matter (e.g. building some cool open source tool as part of a team that includes someone trans).

Changes the question from "is trans bad" to "is Piper, who built the state visualization tool we all use, bad".

I mean I think at least some of the people involved are quite clear that their goal is a "gameboard-flipping" act which results in a world which is permanently safe from anyone who could destroy it. Probably by seizing ultimate power.

I don't think sufficiently godlike power for world domination (as in "gaining control of the world without destroying almost everything of value") is actually on the table though.

This is an excellent answer. One small quibble:

Control of an aligned Superintelligent AGI is equivalent to having the keys to the lightcone. If you make it through the gauntlet of it not killing you and it listens to what you tell it, then you have the means to dominate everyone else, including others who make misaligned AGI, if yours is capable of squashing them at birth, or at the very least capable of panopticon surveillance to prevent anyone from building one in the first place.

For the record I think Yudkowsky and friends are wrong about this one. Control of the only superintelligent AGI, if that AGI is a single coherent entity, might be the keys to the lightcone, but so far it looks to me like AGI scales horizontally much better than it scales vertically.

This, if anything, makes things more dangerous rather than less, because it means there is no permanent win condition, only the deferral of the failure condition for a bit longer.

I’ll be honest I have come down on the Toner being correct and Altman deserved to be fired side of the coin.

I think if the board had just led with that a lot of people would have agreed. "Leader tries to dismantle the structures that hold him accountable" is a problem that people know very well, and "get rid of leader" is not a controversial solution to that problem.

But in fact the board accused Altman of being a lying liar and then refused to stand behind that accusation, even to the subsequent CEOs.

There's gotta be something else going on.

This seems plausible to me.

Why do discussions of white nationalism always feel the need to explicitly mention rejecting violence?

Rhymes with "Yahtzee". The last notable time white nationalists gained power did not go so well, and it is generally agreed that it did not go so well, so people with opinions that resemble that generally want to clarify that their viewpoints do not end up in that generally-agreed-to-be-bad place.

As to why the same isn't true of e.g. communists? Honestly I have no clue, but I think that indicates a problem with the communists.

I think there are probably environments where consequentialists outcompete deontologists (specifically ones where the effects of your actions fall within a known and at least somewhat predictable distribution), and other environments where deontologists outcompete consequentialists (the ones where certain actions are on average good given certain observations, or where acting predictably leads to good outcomes). And there are yet other environments where having a policy of blindly doing things similar to ones that have worked in the past will outperform both of those principled approaches.

And then there are adversarial environments where there may not even be a single strategy that dominates all other strategies within that environment (e.g. you may have a situation with policies A, B, and C, where A > B, B > C, C > A, or even more cursed scenarios where how well a strategy does depends on how many other players are playing that strategy).
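
A minimal made-up payoff matrix with that A > B > C > A structure (rock-paper-scissors-like; the numbers are invented purely for illustration):

```python
# Row player's payoff: each policy beats one of the others and loses to the
# third, so no single policy dominates, and the best choice depends on what
# the rest of the population happens to be playing.
payoff = {
    ("A", "A"): 0,  ("A", "B"): 1,  ("A", "C"): -1,
    ("B", "A"): -1, ("B", "B"): 0,  ("B", "C"): 1,
    ("C", "A"): 1,  ("C", "B"): -1, ("C", "C"): 0,
}

def expected_payoff(policy, population_mix):
    """Expected payoff of `policy` against a population playing each policy
    with the given frequencies."""
    return sum(freq * payoff[(policy, other)]
               for other, freq in population_mix.items())

# Against a mostly-A population, C is the best response -- but if everyone
# then switches to C, B becomes the best response, and so on around the cycle.
print(expected_payoff("C", {"A": 0.8, "B": 0.1, "C": 0.1}))  # 0.7
```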

My point is not "deontology > consequentialism", it's "whether a strategy is useful depends on the environment, and consequentialism-in-practice is not the most useful strategy across all environments".

Your adversary is allowed to adapt too, and they are allowed to (and in fact incentivized to) adapt in the way that is as inconvenient as possible for your ability to counter that adaptation.

BTW in terms of a concrete adversarial environment I'm thinking "high frequency trading". You can build a gloriously detailed model of the world and a list of actions you can take within the world and the predicted effect of those actions, and you are certainly free to choose the algorithm of "consult my super detailed world model about the expected outcome of each of my possible actions, and take the action with the best expected result according to that model". But your environment contains a bunch of different entities trying out a multitude of different strategies, keeping the ones that work and discarding the ones that don't. The strategies that lose money on average will run out of money and stop trading, and eventually a strategy that makes money on average while trading with you will emerge (and keep trading as long as it continues making money). It is entirely possible that neither you nor your adversary will know why their strategy beats yours on average.

If you're talking about how consequentialism becomes optimal in the limit as your world model approaches perfection, then sure, but I don't think the behavior at the limit is particularly informative of the behavior in the real world. Consider that in the limit as your adversary's available computing power approaches infinity, if you have a 1,000,000 byte message, and you encrypt it with a 4096 bit RSA key that you keep to yourself, and you hand the encrypted message to your adversary, they have 999,488 bytes of information about what your message was. But in practice your adversary actually has ~0 bits of information about the contents of the message.
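
The arithmetic behind those numbers, in case anyone wants to check it (this is just the information-theoretic bound: with unbounded compute, the adversary's remaining uncertainty is at most the size of the key you kept):

```python
message_bytes = 1_000_000
key_bits = 4096

# With unlimited compute, everything except the 4096 unknown key bits is
# in principle recoverable from the ciphertext.
known_bytes = message_bytes - key_bits // 8
print(known_bytes)  # 999488
```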

I claim the only effective way to do that in a way that avoids exploitation is very intelligent consequentialism.

I claim that doesn't work either, if your environment is adversarial, because the difference between your model of the expected consequences of your actions and the actual realized consequences of your actions can be exploited. This doesn't even require an adversary that is generally more intelligent than you, just an adversary that notes a specific blind spot you have (see how humans can beat the wildly superhuman Go engine KataGo by exploiting a very specific blind spot it has in its world model).

I don't think there are any people who are real deontologists, consequentialists, or virtue ethicists -- I think people look at what the consequences of their past actions and decision processes were, and try to do more of the things that turned out well and less of the things that turned out badly. "Try to take actions that future-you will think were good actions" sure is a decision process, and if it's gone well for you in the past I'd expect you to keep using it in the future, but if it starts going badly I would expect you'd stop using it.

And if your decision process is "consequentialism when the successes of consequentialist reasoning are salient to me, and not consequentialism when the failures of consequentialism are salient to me" then I don't think you're a Real Consequentialist™.

My suspicion is that the "on twitter" bit is doing a lot of the selecting there. If you look on discord instead you'll find that they all run incredibly boring b2c generative AI startups (i.e. thin wrappers over existing LLMs).

I have no better hypothesis, but I do note that if that's true, I'm confused by the statement about Sam Altman specifically being "not consistently candid in his communications with the board", and that "the board no longer has confidence in his ability to continue leading OpenAI". If they were just trying to do a pause, I see no reason that they would have made that specific claim instead of saying something vague and inoffensive along the lines of "the board has concluded that the company's long-term strategy and core values require a different kind of leadership moving forward".

The former kind of statement is the kind of statement you only make if someone has royally fucked up and you're trying to avoid getting any of their liability on you.

Brian Armstrong tweeted now that $80 billion of company value has been 'evaporated'. I'm like 'what?'

MSFT was at 372.90 immediately before the announcement, and dropped 1.9% on the news. Also apparently Microsoft has a market cap of $2.75T so yeah that's $52B of paper value evaporating. Not $80B (though we'll see what happens Monday morning), but still quite substantial.
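
Back-of-the-envelope version of that figure (market cap rounded the same way as above):

```python
msft_market_cap = 2.75e12  # ~$2.75T
drop_fraction = 0.019      # ~1.9% decline on the news

paper_value_lost = msft_market_cap * drop_fraction
print(f"${paper_value_lost / 1e9:.0f}B")  # ~$52B
```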

You know more than 1 person, and you know of a lot more people than you know personally. A typical American knows something on the order of 500 people, and knows of probably 20x that many. If there was exactly one person on puberty blockers out of 300 million Americans, you'd expect ~10k / 300M or 0.0033% of Americans to know of them. To get to "0.1% of people know of someone on puberty blockers" you'd only have to have 30 such people in the entire country.
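
Spelling out that arithmetic, with the same rough numbers (500 people known personally, ~20x that known of):

```python
population = 300_000_000
knows_personally = 500            # rough figure for a typical American
knows_of = knows_personally * 20  # ~10,000 people known of

# If exactly one American were on puberty blockers, the share of the
# population that knows of someone on them:
print(knows_of / population)           # ~3.3e-05, i.e. ~0.0033%

# Number of such people needed before 0.1% of the country knows of one:
print(0.001 * population / knows_of)   # 30.0
```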

Did you have any commentary on this, or are you just dropping the full text of some article into the thread in the hopes that people read it and start their own discussion?

Edit: Ok, that's better with the paragraphs you wrote at the end in your edit.