Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), and it is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.
What is this place?
This website is a place for people who want to move past shady thinking and test their ideas in a
court of people who don't all share the same biases. Our goal is to
optimize for light, not heat; this is a group effort, and all commentators are asked to do their part.
The weekly Culture War threads host the most
controversial topics and are the most visible aspect of The Motte. However, many other topics are
appropriate here. We encourage people to post anything related to science, politics, or philosophy;
if in doubt, post!
Check out The Vault for an archive of old quality posts.
You are encouraged to crosspost these elsewhere.
Why are you called The Motte?
A motte is a stone keep on a raised earthwork common in early medieval fortifications. More pertinently,
it's an element in a rhetorical move called a "Motte-and-Bailey",
originally identified by
philosopher Nicholas Shackel. It describes the tendency in discourse for people to move from a controversial
but high-value claim to a defensible but less exciting one upon any resistance to the former. He likens
this to the medieval fortification, where the desirable land (the bailey) is abandoned when in danger for
the more easily defended motte. In Shackel's words, "The Motte represents the defensible but undesired
propositions to which one retreats when hard pressed."
On The Motte, always attempt to remain inside your defensible territory, even if you are not being pressed.
New post guidelines
If you're posting something that isn't related to the culture war, we encourage you to post a thread for it.
A submission statement is highly appreciated, but isn't necessary for text posts or links to largely-text posts
such as blogs or news articles; if we're unsure of the value of your post, we might remove it until you add a
submission statement. A submission statement is required for non-text sources (videos, podcasts, images).
Culture war posts go in the culture war thread; all links must either include a submission statement or
significant commentary. Bare links without those will be removed.
If in doubt, please post it!
Rules
- Courtesy
- Content
- Engagement
- When disagreeing with someone, state your objections explicitly.
- Proactively provide evidence in proportion to how partisan and inflammatory your claim might be.
- Accept temporary bans as a time-out, and don't attempt to rejoin the conversation until it's lifted.
- Don't attempt to build consensus or enforce ideological conformity.
- Write like everyone is reading and you want them to be included in the discussion.
- The Wildcard Rule
- The Metarule
Do we have anyone running local offline LLMs here?
How are they coming along? Do you need to load them into VRAM, or can you load them into RAM or something and use either CPU or GPU from there?
If you use llama.cpp, you can load part of the model into VRAM and evaluate it on the GPU, and do the rest of the evaluation on the CPU. (The -ngl [number of layers] parameter determines how many layers it tries to push to the GPU.) In general, I strongly recommend using this over the "industry standard" Python-based setup, as the overheads of 1GB+ of random dependencies and an interpreted language do tend to build up in ways beyond what shows up in benchmarks. (You might not lose much time per token, but you will use more RAM (easy to measure), put more strain on assorted caches and buffers (harder to attribute), and have more context switches degrading UI interactivity.)
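To make that concrete, a typical invocation looks something like the sketch below. The binary name varies between llama.cpp builds (older ones ship a main binary, newer ones llama-cli), and the model path and -ngl value here are placeholders you'd tune to your own VRAM:

    # Hypothetical model path; pick an -ngl value that fits your VRAM.
    # 20 layers get evaluated on the GPU, the remainder on the CPU.
    ./llama-cli -m ./models/mistral-7b-instruct.Q4_K_M.gguf \
        -ngl 20 -c 4096 \
        -p "Explain the motte-and-bailey fallacy in two sentences."

If you set -ngl higher than the model's layer count, everything that fits simply goes to the GPU; if VRAM runs out, the load typically fails and you step the number down.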
Very true. Playing around with Stable Diffusion has achieved the impossible: making my Windows install seize up over memory, with actual app crashes and a need to restart the system more often than the usual once-every-three-weeks when the inevitable update happens.
I had to download RAMMap from Sysinternals because it can release the gigabytes of leaked memory A1111 spills every time a model gets swapped. Every ~50 MB of pics generated, it needs a restart.
Maybe I should switch to ComfyUI.
Thanks! Do you use it?
Yes, though I haven't paid attention to it in about half a year so I couldn't answer what the capabilities of the best models are nowadays. My general sense was that performance of the "reasonably-sized" models (of the kind that you could run on a standard-architecture laptop, perhaps up to 14B?) has stagnated somewhat, as the big research budgets go into higher-spec models and the local model community has structural issues (inadequate understanding of machine learning, inadequate mental model of LLMs, inadequate benchmarks/targets). That is not to say they aren't useful for certain things; I have encountered 7B models that could compete with Google Translate performance on translating some language pairs and were pretty usable as a "soft wiki" for API documentation and geographic trivia and what-not.
There are some really good models available to run but they require beastly graphics cards. Here are some llama benchmarks, for a rough idea.
In theory, they can be run on a CPU, but GPUs are way better at this task.
The best places to find information on local LLMs that I'm aware of are https://old.reddit.com/r/LocalLLaMA/ and https://boards.4chan.org/g/ and especially the LLM general there.
Thank you.
I can run 7B models on a MacBook M2 with 8 GB of RAM. This works because of how MacBooks handle VRAM: the memory is unified, so the GPU can draw on ordinary RAM.
It's pretty slow, and 7B models aren't great for general tasks. If you can use one that's fine-tuned for a specific thing, they're worth it.
Frankly, however, I'd just recommend using something like together(dot)AI or OpenRouter to run larger models elsewhere. Normal caveats about not pushing sensitive info out there, of course. $30-$50 worth of credits, even for monster models like Meta's 405B, will easily take you through a month of pretty heavy usage (unless you're running big automated workloads 24/7).
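For what it's worth, OpenRouter exposes an OpenAI-compatible chat completions endpoint, so trying a big model is roughly one request. The model slug below is illustrative, and pricing/availability shift often enough that it's worth checking their model list first:

    # Assumes OPENROUTER_API_KEY is set in your environment.
    curl https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
            "model": "meta-llama/llama-3.1-405b-instruct",
            "messages": [{"role": "user", "content": "Summarize this thread for me."}]
          }'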
I think there's going to be a race between local-AI-specific consumer hardware and plain cloud-based hyperscaling. I don't know which will win. Privacy definitely plays a part. I'm quite optimistic about seeing a new compute hardware paradigm emerge.
I'm using openrouter.ai daily. The credits last for a surprisingly long time. Sonnet 3.5 is my go-to model.
I'd like something offline and private for sensitive use though.
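(For the offline case, llama.cpp also ships a small HTTP server that speaks a similar OpenAI-style API on localhost, so the workflow stays the same while nothing leaves the machine. A rough sketch, with the binary name and flags depending on your build:

    # Serve a local model on localhost only.
    ./llama-server -m ./models/mistral-7b-instruct.Q4_K_M.gguf \
        -ngl 99 --host 127.0.0.1 --port 8080

    # Then point any OpenAI-compatible client at it:
    curl http://127.0.0.1:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Hello"}]}'

The model file and layer count above are placeholders; pick whatever fits your hardware.)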