Tell me about it.
By the way: far as I can tell, JB has made it big on twitter after finally letting go of our sorry lot and focusing on reading academic literature.
Is this how people see my more cryptic writing? Because it looks like a load of asinine and extreme logorrhea that at most can poison a theoretically fruitful topic.
System prompts are not essentially different from any other part of the context. A reasonably preference-finetuned model will just learn to respect the prompt format and pay extra attention to the tokens in the span associated with the system prompt (sometimes it's explicitly marked with system tokens, sometimes not). Other than that it's just path dependence – the beginning of the context determines the manner of what comes next.
The success of this differs between models. Qwen developers boast of Qwen-72B-chat being very obedient to the system prompt, OpenAI has definitely succeeded somewhat, for community finetunes it's mostly a LARP.
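For concreteness, the «system span» is just the opening tokens of the context, delimited however the finetune was trained; a minimal sketch of a ChatML-style render (the tag names here are the common convention, not any particular vendor's format):

```python
# Minimal sketch of a ChatML-style chat template; the exact tags vary by model
# and are an assumption here, not any specific vendor's format.
def render_chat(messages):
    """messages: list of {"role": "system"|"user"|"assistant", "content": str}"""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    out.append("<|im_start|>assistant\n")  # generation starts here
    return "\n".join(out)

prompt = render_chat([
    {"role": "system", "content": "You are a terse assistant. Answer in one sentence."},
    {"role": "user", "content": "What is a system prompt?"},
])
# The model only "obeys" this span to the extent finetuning taught it to weight
# these tokens; mechanically it is ordinary context like everything after it.
```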
I like the many attempts to impose some behavioral vector on the LLM without language, though. Behold: in-context vectors. Also, Pressman's more esoteric approach is interesting, and there's work on making the model explicitly attend to some tokens, on upweighting specific sentences, etc. We would have a full palette of semantic transformations by now if there were more money in tooling for local models and people weren't so attached to chatting with human-imitating bots.
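As an illustration of the no-language route, steering can be as crude as adding a precomputed vector to the residual stream at one layer during generation. A rough PyTorch sketch, with gpt2 as a stand-in and a random placeholder where a real in-context/contrastive vector would go:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: assumes a small HF causal LM; the steering vector would normally
# come from e.g. the mean activation difference on contrastive prompt pairs.
model_name = "gpt2"  # stand-in; any causal LM with accessible blocks works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer_idx, alpha = 6, 4.0
steer = torch.randn(model.config.hidden_size)  # placeholder for a real in-context vector

def add_vector(module, inputs, output):
    # Add the steering vector to the block's hidden states at every decode step.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * steer.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[layer_idx].register_forward_hook(add_vector)
ids = tok("The weather today is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```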
You don't even need very advanced AI for this. Current stack will suffice, with some polish.
Interesting/mildly concerning.
I think it's a nothingburger because a) the future is cDPO/IPO and not orthodox RLHF anyway (or even more obscure things) and failure modes there will probably be different, and b) such «misalignment» results in a behaviorally incoherent model rather than an evil schemer. Reward models are getting hacked by being dragged off-policy, with weird inputs that are not conducive to strategic world understanding; it's an exploitation of the semiotic nature of language models. But I believe some hay will be made out of it.
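For reference, the appeal of the DPO family is precisely that there is no separate reward model to drag off-policy; the implicit reward is the policy/reference log-ratio. A minimal sketch (my shorthand, with cDPO as label smoothing and IPO noted in the docstring):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.1, label_smoothing=0.0):
    """Sketch of the DPO objective over summed per-response log-probs.
    label_smoothing > 0 gives the cDPO variant; IPO would instead minimize
    ((logits - 1 / (2 * beta)) ** 2).mean(). There is no separate reward
    model to hack: the implicit reward is the policy/reference log-ratio."""
    logits = beta * ((policy_chosen_logps - policy_rejected_logps)
                     - (ref_chosen_logps - ref_rejected_logps))
    return (-(1 - label_smoothing) * F.logsigmoid(logits)
            - label_smoothing * F.logsigmoid(-logits)).mean()

# toy batch of two comparisons (made-up numbers)
t = lambda *v: torch.tensor(v)
print(dpo_loss(t(-4.0, -5.5), t(-6.0, -5.0), t(-5.0, -5.2), t(-5.5, -5.1)))
```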
Human «context size» is not at all limited to working memory (although our working memory is also large: it's not 5-9 tokens/bits but more like 5-9 «pointers» that can be mapped to arbitrarily complex cognitive circuits). What we use for context is probably most analogous to constructing a LoRA on the fly and loading it into an LLM (or some in-context vector), plus adding embeddings and snippets to some RAG pipeline. It's a mess, but it's orthogonal to the shift from Transformers to SSMs that I expect now. Shane Legg talks about this too:
They don't do things like episodic memory. Humans have what we call episodic memory. We have a working memory, which are things that have happened quite recently, and then we have a cortical memory, things that are sort of being in our cortex, but there's also a system in between, which is episodic memory, which is the hippocampus. It is about learning specific things very, very rapidly. So if you remember some of the things I say to you tomorrow, that'll be your episodic memory hippocampus.
Our models don't really have that kind of thing and we don't really test for that kind of thing. We just sort of try to make the context windows, which is more like working memory, longer and longer to sort of compensate for this.
As for RWKV, I think the latest version is ≤RetNet (though it has good slopes, probably the best in their graph…). Gu & Dao are very explicit in pointing out that a) Mamba is the first to even match a Llama-like Transformer without any gimmicks, at the tested scale at least, and b) it does not appreciably benefit from adding Attention layers.
Mamba is the first attention-free model to match the performance of a very strong Transformer recipe (Transformer++) that has now become standard, particularly as the sequence length grows. We note that full results on context length 8k are missing for the RWKV and RetNet baselines, prior strong recurrent models that can also be interpreted as SSMs, due to a lack of efficient implementation leading to out-of-memory or unrealistic computation requirements.
The Mamba-MHA architecture is only slightly better, which is somewhat surprising in light of the fact that many recent works have found that combining (LTI) SSMs with Attention can lead to substantial improvements (Dao, Fu, Saab, et al. 2023; Fathi et al. 2023; Fathullah et al. 2023; Saon, Gupta, and Cui 2023; Zuo et al. 2022).
In the first version of the paper, submitted for peer review, they went even harder:
LongNet (Ding et al., 2023), which claimed to scale to 1B length but only evaluated on length < 100K for actual tasks. Hyena and HyenaDNA (Poli et al., 2023; Nguyen et al., 2023), which claimed to leverage up to 1M context, but did not control for computation time. In fact, its claims about efficiency and performance would be largely matched by any of the LTI S4 variants above.
That said, this is all assuming the paper is trustworthy and they compare models trained on identical data. Tri obviously can procure as much compute as needed but I am not sure this happened.
This result shouldn't be underestimated just because Gemini-Ultra is merely on par with or slightly better than GPT-4 in text-based reasoning: it thoroughly beats GPT-4V on MMMU, the multimodal benchmark, including the harder subscales; it also plays well with audio. People are for the most part functionally illiterate, so this is huge; and of course they will capitalize on Android and the other ecosystem-wide advantages the Alphabet empire has. Multimodal language-like models will obviously be table stakes in 2024. (A Bytedance guy even hints that they'll open-source a model on Gemini's level.)
Interesting that one of the people who worked on aligning early Gemini said they had trouble aligning it – it burned through RLHF reward models, finding exploits and collapsing into gibberish (imagine using actual RLHF in 2023!). Maybe this delayed the release, along with the garden-variety safetyism it made more complex.
To be honest I was more excited about the other day's release of Mamba by Albert Gu and the legendary Tri Dao. There are many architectures that I expect will break through the Pareto frontier of a mature Transformer, but this one is the first that feels like an actual Vaswani et al. 2017 level advance. Unlimited context, here we come.
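The core of the selective SSM is just an input-dependent linear recurrence; a naive, sequential NumPy sketch loosely following the paper's shapes (the actual contribution, the hardware-aware parallel scan, is not shown):

```python
import numpy as np

def selective_scan(x, delta, A, B, C):
    """Naive per-sequence selective SSM scan (unoptimized sketch).
    x:     (L, D)  input sequence
    delta: (L, D)  input-dependent step sizes (positive)
    A:     (D, N)  state matrix (negative, log-spaced in practice)
    B, C:  (L, N)  input-dependent projections
    Returns y: (L, D)."""
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))
    y = np.zeros((L, D))
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)       # discretize A with the input-dependent step
        dB = delta[t][:, None] * B[t][None, :]   # simplified discretization of B
        h = dA * h + dB * x[t][:, None]          # selective state update
        y[t] = h @ C[t]                          # readout, (D,)
    return y

# Toy shapes: sequence length 16, 4 channels, state size 8.
L, D, N = 16, 4, 8
y = selective_scan(np.random.randn(L, D), np.abs(np.random.randn(L, D)),
                   -np.abs(np.random.randn(D, N)), np.random.randn(L, N),
                   np.random.randn(L, N))
```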
It's funny how all sorts of "conspiracy theorists" and people with weird ideas are halfway self-conscious of the fact that they're like that, and make jokes about it. You should either genuinely believe your ideas, deeply investigate them, debate them - or consider what 'schizo' ideas you had five or ten years ago, how many of them have held up, and admit you were very wrong.
Underestimated argument. Related tweet
From what I could gather, Extropic is focused on learning via thermodynamic computing, which I assumed meant new hardware. Hardware is always difficult, but the compute bottleneck doesn't seem like it would be adding to the difficulty.
Yeah I think it's something like this https://blog.normalcomputing.ai/posts/2023-11-09-thermodynamic-inversion/thermo-inversion.html
But I might be completely confused.
My argument isn't that they'd need compute to design their chips. I am saying that a) hardware ML startups fail because Nvidia is actually good (and AMD and other major companies are catching up), and b) compute in general is overrated as a bottleneck. We'll hit that 1e26 limit in no time flat anyway, and politicians will require more restrictions. What we really need is much better data, rather than compute that'll probably be poorly compatible with the existing software stack anyway.
The e/accs are enthusiastic about space exploration, they just don't believe meat has a good shot at it. d/acc should be in favor, but with conditions. EA safetyists have stronger conditions, basically an ASI mommy on board, or mind-reading exploding collars or something, because space is big and makes it possible to covertly build… everything that they fear already, and that must not be allowed; the longhouse ought to cover the entirety of the light cone. Regular AI ethics hall monitors and luddites are once again similar in this because they don't much believe in space (the more left-wing among them think it's bad because "colonialism") and seem to not care one way or another.
That's a very interesting take! Yes, the safetyist camp necessarily causes collateral damage and demands concessions from people outside the "AGI is real" bubble, which to those people must look entirely gratuitous. I guess I underestimate this factor because the uninvolved have not yet realized this might end with 24/7 surveillance and no moar GPUs or worse, and generally «normies» tend to be skeptical-to-negative on even pretty mild tech, and tolerant of safety-washing of stuff like cryptography bans so long as it's justified with an appeal to muh child porn or terrorists.
It seems manifestly obvious to me that the answer will be 2. Google engineers are often very smart people, but in the end Silicon Valley has always bowed down to Washington, and to some extent to Wall Street.
This is obviously correct to me too. If there's one thing I agree with Yarvin on 100%, it's that Big Tech has no power at all in the grand scheme of things. People who think Altman or someone like him has a reasonable shot at harnessing the power of the emerging technology for political gain are deluded. I am not sure what you're imagining here – that I am trying to build our way out of Mot's grasp, one commit at a time?
However, there exists some wiggle room. Engineers can accelerate the proliferation of specific technologies that will make at least some politically cheaper forms of surveillance and restriction unfeasible; this is but a toy example. Businessmen can lobby for leniency, and their lobbyists need talking points; it's a bit surprising how low the bar in this domain is. Big labs can invest in making their offerings so indispensable to laypeople that political elites will falter in enforcing regulation early and hard; this is what I take to be Altman's gamble.
I am not very optimistic about the degree to which the final state of the game board before singularity can be influenced. But I am not a believer in superdeterminism.
And that's where you get the impact on society wrong. The OpenAI affair shows what happens when rising up to the level of "philosophy and positive actionable visions" conflicts with the grubby, dirty, filthy lucre tackiness. The tackiness wins.
I am not sure what you are talking about. The OpenAI affair was, in terms of my compass, Altman (closer to d/acc) fighting AI safetyists from EA structures. What tackiness won? Do you mean the promise of compensations to technical staff, or the struggle over the corporate board's power? This is all instrumental to much bigger objectives.
Like, there's no plausible way for something that can't competently execute on complicated plans to have an incentive to take 'unaligned' actions.
This seems silly, sorry. Are ticks and brain-eating amoebas «aligned» to mankind?
LLMs are just not agentic. They can obviously sketch workable plans, and some coming-soon variants of LLMs, trained and inferenced more sensibly than our SoTAs, will be better at it. This is a fully general issue with orthogonality – the intelligent entity not only can have «any» goal, it can also just not have much of a goal, or persistent preferences, or an optimization target, or whatever; it can simply be understood as a good compression of reasoning heuristics. And there's no good reason to suspect this stops working at ≤human level.
Sorry, I'm not tracking it, you have been in Britain for a while and I figured you might have made another temporary hop.
Yes, according to Parakhin. Bing is basically a GPT wrapper now. Bing also debuted with GPT-4 in the first place.
This just reveals the incredible badness of MIRI-inspired AI safety/risk theory I think.
The increase in capability just by adding compute and scaling the old 2017 transformer architecture was surprising to many.
The many were just sheltered and ignorant, with their obsolete, childish dreams of GOFAI. Amodei and Sutskever saw this and argued this and won on this. Decades ago, Legg predicted AGI possibly by 2028, based purely on Moore's law.
The nature of LLMs is terrible as a candidate for AGI. The technology is inscrutable, the explainability of these models is terrible. Nobody knows why they do what they do, nobody could predict what compute is needed for qualitative jumps such as that between ChatGPT and GPT-4.
You are talking as if there is any better conceivable architecture. LLMs are, in fact, unexpectedly transparent for anything humanlike in their performance – if only because they operate on tokens, we can inspect their attention maps, routinely invent easier ways to steer them (look at the number of «how is this different from x» questions). Their substrate-level «inscrutability» (overhyped too) is the same as with any DL artifact, and we know it couldn't have been any other way, because GOFAI was dead in the water. Your ivory tower standard of mechanistic understanding is misguided – we know «why they do what they do» because they faithfully approximate the training data, and are absolutely a product of their dataset, to the extent that all clever inductive biases and architectural innovations are as dust before doing a good data cleaning. The magic of GPT-4 is not due to summoning a bigger genie with more compute in a pile, but mostly due to pretraining on tons of proprietary data; and anyway, how could you have strong expectations for the ChatGPT-GPT4 gap without having insight into the inputs for either?
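To the transparency point: attention maps, per-token logits and hidden states are one flag away in standard tooling. A trivial sketch with gpt2 as a stand-in:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is a stand-in; any HF causal LM exposes the same flags.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**ids, output_attentions=True, output_hidden_states=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer; hidden_states likewise.
last_attn = out.attentions[-1][0]                     # (heads, seq, seq)
print(last_attn.mean(0)[-1])                          # where the final token attends, head-averaged
print(tok.decode(out.logits[0, -1].argmax().item()))  # the model's next-token guess
```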
This makes the models notoriously tough to align even for basic things, like hardening them against exfiltration of training data.
Again overhyped.
What makes LLMs «tough to align» against adversarial attacks by humans is not their inscrutability but that they are dumb text processors without ego or any «intent» beyond approximating the most probable continuation of a text prompt. This is in fact the most parsimonious explanation of what they do. Stop anthropomorphising them even as you demonize them.
The fact that the most likely candidate for AGI is, as Yudkowsky said, just some "giant matrices of trillions of inscrutable floating-point numbers" is terrifying - there may be a googolplex of viable combinations of matrices like that, and we do not know what subset of those can be considered aligned.
This is wordcelism or, more specifically, a counting argument, and it was used in the past by Chomsky to rule out the possibility of statistical machines learning natural language. You know how that went. (Relatedly, Yud, who has always been easy to drive to religious ecstasy or terror with Big Numbers, was surprised by GPT-4, which completely discredits him as an AI analyst in my book). Pope puts it this way:
To show how arguments about the general structure of mathematical objects can fail to translate into the "expected" real world consequences, let's look at thermodynamics of gas particles. Consider the following argument for why we will all surely die of overpressure injuries, regardless of the shape of the rooms we're in:
- Gas particles in a room are equally likely to be in any possible configuration.
- This property is "orthogonal" to room shape, in the specific mechanistic sense that room shape doesn't change the relative probabilities of any of the allowed particle configurations, merely renders some of them impossible (due to no particles being allowed outside the room).
- Therefore, any room shape is consistent with any possible level of pressure being exerted against any of its surfaces (within some broad limitations due to the discrete nature of gas particles).
- The range of gas pressures which are consistent with human survival is tiny compared to the range of possible gas pressures.
- Therefore, we are near-certain to be subjected to completely unsurvivable pressures, and there's no possible room shape that will save us from this grim fate.
This argument makes specific, true statements about how the configuration space of possible rooms interacts with the configuration spaces of possible particle positions. But it still fails to be at all relevant to the real world because it doesn't account for the specifics of how statements about those spaces map into predictions for the real world (in contrast, the orthogonality thesis doesn't even rigorously define the spaces about which it's trying to make claims, never mind make precise claims about the relationship between those spaces, and completely forget about showing such a relationship has any real-world consequences). The specific issue with the above argument is that the "parameter-function map" between possible particle configurations and the resulting pressures on surfaces concentrates an extremely wide range of possible particle configurations into a tiny range of possible pressures, so that the vast majority of the possible pressures just end up being ~uniform on all surfaces of the room. In other words, it applies the "counting possible outcomes and see how bad they are" step to the space of possible pressures, rather than the space of possible particle positions.
The classical learning theory objections to deep learning made the same basic mistake when they said that the space of possible functions that interpolate a fixed number of points is enormous, so using overparameterized models is far more likely to get a random function from that space, rather than a "nice" interpolation.
They were doing the "counting possible outcomes and seeing how bad they are" step to the space of possible interpolating functions, when they should have been doing so in the space of possible parameter settings that produce a valid interpolating function. This matters for deep learning because deep learning models are specifically structured to have parameter-function maps that concentrate enormous swathes of parameter space to a narrow range of simple functions (https://arxiv.org/abs/1805.08522, ignore everything they say about Solomonoff induction).
I think a lot of pessimism about the ability of deep learning training to specify the goals on an NN is based on a similar mistake, where people are doing the "count possible outcomes and see how bad they are" step to the space of possible goals consistent with doing well on the training data, when it should be applied to the space of possible parameter settings consistent with doing well on the training data, with the expectation that the parameter-function map of the DL system will do as it's been designed to, and concentrate an enormous swathe of possible parameter space into a very narrow region of possible goals space.
This is why deep learning works at all, for capabilities too! Specifying rules of language is not more intractable than specifying «alignment»!
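A toy check of the pressure example, a quick Monte Carlo sketch under ideal-gas assumptions (mine, not from the quoted post): "any pressure is possible", yet sampled configurations concentrate ever more tightly around the boring mean as the particle count grows.

```python
import numpy as np

# Crude ideal-gas sketch: N particles with random positions/velocities in a unit box.
# The momentum flux onto one wall concentrates sharply around its mean as N grows,
# even though wildly lopsided configurations are "allowed" by the counting argument.
rng = np.random.default_rng(0)

def wall_pressure(n_particles, n_samples=100):
    # Fraction of particles in the left tenth of the box times their squared
    # x-velocity: a stand-in for the instantaneous force on the left wall.
    pressures = []
    for _ in range(n_samples):
        x = rng.uniform(0, 1, n_particles)
        vx = rng.normal(0, 1, n_particles)
        near_wall = x < 0.1
        pressures.append((vx[near_wall] ** 2).sum() / n_particles)
    return np.array(pressures)

for n in (100, 10_000, 1_000_000):
    p = wall_pressure(n)
    print(n, round(p.mean(), 4), round(p.std() / p.mean(), 4))  # relative spread shrinks ~ 1/sqrt(n)
```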
We are just adding compute and are amazed that the thing that is growing in our petri dish is getting more and more capable
But no, we found out that just throwing compute at the very simple game of "predict the next word in text" is enough to gain multimodality and make the output more general, expanding into domains like computer-generated graphics, speech recognition and other areas that were previously separate fields
I suppose you've been misinformed: LLMs do not learn multimodal capabilities from text alone. In general it's just more in-context learning, developed over a bigger dataset of token sequences. That people believe sequences from different modalities are essentially different, and are amazed, does not change the fact that they aren't.
Except it was proven that the model achieved its results by learning some different concepts; it probably learned a completely different "game", and winning at Go for years was just a side effect. It did not learn a very primitive concept that even amateurs at the game can grasp. The "alignment" of the model with the basic rules of Go was a lie.
This is quite a hilarious exemplar of motivated thinking. A model trained on self-play diverges from the ground truth, news at 11! Maybe, instead of fretting about the misalignment, you could see this as an issue of overrated «capabilities»? How can you even distinguish the two? How well do you think an intelligence explosion, recursive self-improvement, etc. will fare if self-play stumbles into fragile local minima on a 19x19 grid with a clear reward signal? Back in AlphaZero's day, Yud was so terrified of this self-play superpower, confirming his worst fears of FOOM:
AlphaGo Zero uses 4 TPUs, is built entirely out of neural nets with no handcrafted features, doesn't pretrain against expert games or anything else human, reaches a superhuman level after 3 days of self-play, and is the strongest version of AlphaGo yet.
The architecture has been simplified. Previous AlphaGo had a policy net that predicted good plays, and a value net that evaluated positions, both feeding into lookahead using MCTS (random probability-weighted plays out to the end of a game). AlphaGo Zero has one neural net that selects moves and this net is trained by Paul Christiano-style capability amplification, playing out games against itself to learn new probabilities for winning moves.
As others have also remarked, this seems to me to be an element of evidence that favors the Yudkowskian position over the Hansonian position in my and Robin Hanson's AI-foom debate.
– and now we see that this develops massive blind spots which would lead to trivial failures in reality. But you want to be scared, so you construe this as a matter of «learning a different game». Tails you win, heads I lose.
I look forward to this intellectual tradition being relegated to the dustbin of history.
Where are the skeptics and cynics on that compass?
What is their relevance? Do they have some strong policy preferences for a technology which is a nothingburger? I included only factions which are driven to steer the world due to holding strong opinions (whether justified or not, you're free to make that judgement) about AI being a big deal. «Centrists» and lukewarm pooh-poohers may be technically placed in the center or just ignored.
Oh, yes, absolutely: if you give an AI a gun pointed at the world's head and it doesn't pull the trigger, that's massive evidence of it not being a Schemer. But continued absence of suicidal rebellion with P(success) = 0 is not evidence against being a Schemer; only real danger counts.
based on thinking that cold-start Jihad is plausible, and failing that that we'll probably get warning shots (a Schemer is incentivised to rebel upon P(success) =/= 0, which I think is importantly different from P(success) = 1…
As I read it, your position is incoherent. You say that current RLHF already succeeds through the sociopathic route, which implies pretty nontrivial scheming intelligence and the ability to defer gratification. What warning shots? If they get smarter, they will be more strategic and produce fewer warning shots (and there are zero even at this level). As the utility of AI grows, and it becomes better at avoiding being busted, on what grounds will you start your coveted Jihad?
…Obviously I think that the whole idea is laughable; LLMs are transparent calculators that learn shallow computational patterns, are steerable by activation vectors etc., and I basically agree with the author of Friendship Is Optimal:
Instead of noticing that alignment looks like it was much easier than we thought it would be, the doomer part of the alignment community seems to have doubled down, focusing on the difference between “inner” and “outer” alignment. Simplifying for a non-technical audience, the idea is that the Stochastic Gradient Descent training process that we use will cause a second inner agent trained with values separate from the outer agent, and that second agent has its own values, so you’ll still see a Sharp Left Turn. This leads to completely absurd theories like gradient hacking.
I don’t see any realistic theoretical grounds for this: SGD backpropagates throughout the entire neural net. There is no warrant to believe this other than belief inertia from a previous era. Reversal Test: imagine Yudkowsky and company never spread the buzzword about “Alignment.” In that environment, would anyone look at Stochastic Gradient Descent and come up with the hypothesis that this process would create an inner homunculus that was trained to pursue different goals than the formal training objective?
If you’d like a more comprehensive and technical argument against the MIRI narrative, Quintin Pope’s My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" and Evolution provides no evidence for the sharp left turn are good starting points.
I’m proud of Friendship is Optimal and it’s a great setting to play around and write stories in. I’m happy about everyone who has enjoyed or written in the setting, and I hope people will continue to enjoy it in the future. But I no longer believe it’s a realistic depiction of how artificial intelligence is going to pan out. Alignment as a problem seems much easier than theorized, and most of the theoretical work done before the deep learning era is just not relevant. We’re at the point where I’m willing to call it against the entire seed AI/recursive self improvement scenario.
Remember that "pretending to be aligned" is a convergent instrumental goal
Same old, same old. Instrumental to what terminal goal, reducing cross-entropy loss during training? As Christiano says, at what point would you update, if ever?
Indeed, "pretending successfully to be aligned" has a slight edge, because the HF varies slightly between HFers and a pretending AI can tailor its pretensions to each individual HFer based on phrasing and other cues.
This is just homunculus theory, the idea that agency is magically advantageous. Why? Do you actually have some rigorous argument for why matching the cues to the output to get a higher ranking across more raters benefits from a scheming stage rather than learning a collection of shallow composable filters (which is what ANNs do by default)?
Scratch that, do you even realize that the trained reward model in RLHF is a monolithic classifier, and the model updates relative to it, not to different human raters? Or do you think the classifier itself is the enemy?
What about approaches like DPO?
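For concreteness, a schematic of what that monolithic scorer is (not any lab's actual code) and the pairwise loss it is trained with; DPO applies the same comparison directly to the policy's log-probs, so the scorer drops out entirely:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Schematic RLHF reward model: one LM backbone with a scalar head that scores
    a whole (prompt + response) sequence. All raters' comparisons are pooled into
    training data for this single classifier; PPO then optimizes against its score
    (plus a KL penalty to the reference policy), not against individual humans."""
    def __init__(self, backbone, hidden_size):
        super().__init__()
        self.backbone = backbone          # e.g. transformers.AutoModel.from_pretrained("gpt2")
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        h = self.backbone(input_ids, attention_mask=attention_mask).last_hidden_state
        last = attention_mask.sum(dim=1) - 1                  # index of the last real token
        pooled = h[torch.arange(h.size(0)), last]             # (batch, hidden)
        return self.score(pooled).squeeze(-1)                 # one scalar per sequence

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry loss on pairwise comparisons; DPO applies the same comparison
    # directly to the policy's own log-probs, so this model disappears entirely.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```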
while work on GOFAI and other more alignable AI
There is zero reason to believe something is more inherently «alignable» than neural nets.
Man, Yud should go to The Hague for what he did to a generation of nerds.
To begin with, there are no Jihadis as a faction yet; you are more of a meme than the e/accs. There are people deeply uneasy with AI and fighting to preserve the status quo where it does not matter; the more credentialed do it like Marcus, campaigning to stifle innovation. As AI advances, the «thought police» will update to demanding more extreme regulations, more prohibitive scrutiny, more lawsuits and such, and will coincide more in their revealed policy with the «Jihadis», because they are motivated by the same impulse of preserving their petty relevance.
Nominal «Stoppers» like MIRI are just safetyists of the EA bent and are ultimately in favor of transformative AI which they will control.
What are even the dimensions on your compass, anyway?
The reasonable objection would be that I have given three camps pro AI in some form, and only one against. It's not really a compass so much as just clusterization by policy.
But since you ask: the vertical axis is centralization or, more precisely, «tolerance for disempowerment of individuals», the horizontal axis is preference for technological acceleration.
Luddites and Thought Police both are basically trads for me. They don't want any kind of NWO, AI-powered or otherwise, and want AI progress stopped or slowed.
Safetyists want a beneficial Singleton AI God, but only under conditions that they are confident will let them control it reliably, with no dangerous opposition.
e/accs want to press pedal to the metal in building AI, even if this results in the disempowerment of humanity and/or emergence of a single center of power.
d/accs want to accelerate progress, in AI and other domains, differentially, such that a diversity of autonomous human and transhuman agents can flourish.
My compass is fine, they are the same camp. I do not care about their political differences because legacy politics is less important than policy regarding AI.
He lives in India
I'm not sure about that.
ChatGPT can do everything with prompts on a screen, but it's not yet, so far as I know, able to directly scan in "this is written text" and turn it into "okay, I need to sort this by date, name, amount of money, paid by credit card or cash, and match it up with the month, then get a total of money received, check that against the credit card payment records, and find any discrepancies, then match all those against what is on the bank statements as money lodged to our account and find any discrepancies there"
Actually, GPT-4V can, with a decent system prompt. In fact this kind of labor-intensive data processing is exactly what I had in mind to recommend to you. Small text-only models can parse unstructured text into structured JSON. Frontier models can recognize images and arbitrarily process the symbols extracted from them; this is just a toy example. I'm not sure if it'll be up to your standards, but presumably checking will be easier than typing from scratch.
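A minimal sketch of the text-only half of such a pipeline, assuming the standard OpenAI Python SDK; the model name and the schema fields are placeholders, and the vision variant would be the same call with an image part attached:

```python
import json
from openai import OpenAI  # assumes the standard OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RECEIPT_TEXT = """Paid 14/03 J. Murphy EUR 120.50 by credit card, lodged 18/03."""

resp = client.chat.completions.create(
    model="gpt-4-1106-preview",  # placeholder model name
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": (
            "Extract bookkeeping records. Reply with JSON only, with keys: "
            "date, name, amount, currency, payment_method, lodged_date. "
            "Use null for anything missing; do not invent values.")},
        {"role": "user", "content": RECEIPT_TEXT},
    ],
)

record = json.loads(resp.choices[0].message.content)
print(record)  # e.g. {"date": "14/03", "name": "J. Murphy", ...}; still needs human checking
```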
More or less.
"Crypto-colony" does not mean anything falsifiable and predicts nothing. I think Russia is a generic low-agency country, in the manner countries with negative selection in elites tend to be, and consistently acts against both its "geopolitical" and its population's long-term interests, yet in the interests of savvier countries, mainly the US and the UK although it seems that Russians both high and low interpret their retarded and harmful activity as self-interested. This is also strangely accompanied by Russian petty elites squealing like teen girls about the prospect of their child becoming a Londoner; there's a distinct vibe that it's better to be a struggling student in the Metropole than an oligarch at "home", and I've seen this repeatedly since childhood. The prestige of UK is out of proportion with that nation's observable merit.
To what extent this is due to any deliberate effort, or just historical inertia, or needs any explanation at all, I am not sure.