
Friday Fun Thread for November 8, 2024

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), nor is it for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

If you use llama.cpp, you can load part of the model into VRAM and evaluate it on the GPU, and do the rest of the evaluation on CPU. (The -ngl [number of layers] parameter determines how many layers it tries to push to the GPU.)
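For example, something along these lines (a rough sketch: the binary name depends on your build, `llama-cli` in recent releases versus the older `main`, and the model path and layer count are placeholders you'd tune against your VRAM):

```
# offload 32 layers to the GPU; whatever doesn't fit stays on the CPU
./llama-cli -m ./models/some-model.gguf -ngl 32 -p "Hello there"
```

Setting -ngl higher than the model's layer count just offloads everything, so a common approach is to start high and back off if the model fails to load for lack of VRAM.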

In general, I strongly recommend using this over the "industry standard" Python-based setup, as the overheads of 1GB+ of random dependencies and an interpreted language do tend to build up in ways beyond what shows up in benchmarks. (You might not lose much time per token, but you will use more RAM (easy to measure), put more strain on assorted caches and buffers (harder to attribute), and cause more context switches that degrade UI interactivity.)

> In general, I strongly recommend using this over the "industry standard" Python-based setup, as the overheads of 1GB+ of random dependencies and an interpreted language do tend to build up in ways beyond what shows up in benchmarks

Very true. Playing around with Stable Diffusion has achieved the impossible: it makes my Windows machine seize up over memory, causes actual app crashes, and forces me to restart the system more often than the once-every-three-weeks when the inevitable update happens.

I had to download RAMMap from Sysinternals because it can clear the gigabytes of leaked memory that A1111 spills every time a model is swapped. Every ~50 MB of pics generated, it needs a restart.

Maybe I should switch to ComfyUI.

Thanks! Do you use it?

Yes, though I haven't paid attention to it in about half a year, so I couldn't say what the capabilities of the best models are nowadays. My general sense was that performance of the "reasonably-sized" models (the kind you could run on a standard-architecture laptop, perhaps up to 14B?) has stagnated somewhat, as the big research budgets go into higher-spec models and the local model community has structural issues (inadequate understanding of machine learning, inadequate mental model of LLMs, inadequate benchmarks/targets). That is not to say they aren't useful for certain things; I have encountered 7B models that could compete with Google Translate on some language pairs and were pretty usable as a "soft wiki" for API documentation, geographic trivia, and whatnot.