Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.
Jump in the discussion.
No email address required.
Has anyone successfully run an LLM fully offline, and still actively uses it? Is it easy to keep using, and to be confident it's not sending anything out? How does it compare to GPT-4?
I'd like to take advantage of LLMs for my personal note-taking / organization / todos, but I'm generally pretty risk-averse from a privacy and security perspective, and I keep some fairly private stuff in there.
I've run pretty much every local model ≤65B parameters there is, except a few weird Chinese projects (speaking of foundation models, not interchangeable LLaMA finetunes). It's easy these days. If you're paranoid you can inspect the inference code; it's all in the open. llama.cpp and kobold.cpp are probably the best open-source cross-platform applications to use right now. Closed-source stuff like LM Studio is more user-friendly.
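To give a sense of how little plumbing is involved: here's a minimal sketch using the llama-cpp-python bindings for llama.cpp. The model filename and settings are placeholders; point it at whatever quantized model file you've already downloaded.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Path and settings are illustrative; substitute your own local model file.
llm = Llama(
    model_path="./models/llama-13b.q4_K_M.bin",  # hypothetical filename
    n_ctx=2048,       # context window
    n_gpu_layers=32,  # offload layers to GPU if you have the VRAM; 0 = CPU only
)

out = llm(
    "Q: Rewrite this todo list in priority order: buy milk, file taxes, nap.\nA:",
    max_tokens=256,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```

Once the model file is on disk, nothing in this flow needs to touch the network.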
It's vastly worse, except for tasks where GPT-4 is deliberately nerfed (mainly erotic roleplay, which I don't use – never saw the point even with humans – but which seems to be a big reason the community keeps going).
There isn't even anything entirely on par with ChatGPT-3.5-turbo. Specialized coding models (WizardCoder) are getting close, little by little. But really, I'm quite disappointed. People don't think bigly, don't read the literature thoroughly enough, don't pool resources with remotely the urgency the situation commands.
Speaking of which, if there's anyone here interested in making a local ChatGPT-tier model available, you can support Eric Hartford's Dolphin project. Falcon-40B finetuned on the OpenOrca dataset and some textbooks might well surpass it. I think it should cost on the order of $20k total. For now, the best we can look forward to is his LLaMA-13B finetune.
EDIT: it seems it's more like $10k for Falcon, but these guys are already doing an Orca 13B replication. I'm not sure whether they or Hartford are more trustworthy; at least some people on that team are okay.
I'm sorry to say I'm not aware of a good extant system that can leverage local models for working with documents, but it'll be something like this or this. There's a big problem with context length and computational cost. Research suggests much more is possible.
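The usual workaround for the context-length problem is retrieval: embed your documents locally, pull only the most relevant chunks, and stuff those into the prompt. Here's a minimal sketch of that pattern with the sentence-transformers library and toy data; it's not the linked projects' actual code, just the common shape of the idea.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # runs locally once the model is cached

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Toy stand-ins for your notes, pre-split into chunks.
chunks = [
    "Q3 todo: migrate notes to markdown",
    "Grocery list: eggs, milk, coffee",
    "Q3 todo: back up the NAS before the move",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def top_k(query, k=2):
    """Return the k chunks most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since vectors are unit-normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

question = "what were my Q3 todos?"
context = "\n".join(top_k(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
# feed `prompt` to a local model (e.g., via llama.cpp) as in the sketch above
```

This keeps the prompt small no matter how big the note collection gets, at the cost of retrieval sometimes missing the relevant chunk.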
I don't do ERP, but I have used jailbroken ChatGPT-4 to generate bespoke erotica, and I can tell you that it's really fun. You can feed it any combination of characters and scenarios you like. If I want to read about the Flintstones wife-swapping with the Rubbles, I can. If I want to read about Agatha Heterodyne having a threesome with Gil and Tarvek to decide who gets to marry her, I can. If I want to read about Nagatoro teasing and bullying Senpai by sending him videos of her sexcapades while he is away at art school, I can. If I want to read about Amaryllis Penndraig becoming a slut to compete with Fenn over Juniper's affections, I can. The possibilities are endless!
For now, jailbreaking ChatGPT is easier and cheaper than using open-source LLMs, not to mention it works WAY better (as you correctly note, no open-source model can even match the output of ChatGPT-3.5), but OpenAI is clearly trying to make their models unbreakable and it is good that people are working on free alternatives should they succeed.
What's a good way to jailbreak gpt4 these days?
The Luigi and Peach jailbreak reliably produces NSFW. Make sure you have DeMod installed, though, or you will get banned.
Thanks. Any chance that DeMod itself will get you banned?
I've run a couple of LLMs offline using guides I found on the LocalLlama subreddit, though I don't actively use them due to their far lower level of intelligence compared to ChatGPT, even 3.5. By analogy: if GPT-3.5 is a middle schooler and GPT-4 is a high school freshman, the best currently available models that can run at reasonable speed on a high-end consumer GPU (e.g. a 4090 with 24GB of VRAM) are a 1st or 2nd grader. So trading usability for privacy, customizability, and lack of censorship doesn't make a lot of sense for my own use cases.
But if privacy is a high priority, it's pretty trivial to run a local LLM while making sure it stays private: just disconnect your computer from the internet when you use it. The UI tools that run these models are all open source and are checked out directly from GitHub, so you can verify for yourself that they aren't saving your prompts and responses and sending them back to some central server. I admit I haven't checked this directly, but the community is active enough that something that egregious would've been caught by now.
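If you want an extra tripwire on top of unplugging, you can monkeypatch Python's socket layer so any Python-level connection attempt raises loudly instead of silently succeeding. A crude sketch; note it won't catch networking done in native code, so the airgap or a firewall rule remains the real guarantee.

```python
import socket

class _NoNetwork(socket.socket):
    """Socket subclass that refuses to connect anywhere."""
    def connect(self, address):
        raise RuntimeError(f"blocked outbound connection attempt to {address!r}")
    connect_ex = connect

socket.socket = _NoNetwork  # patch before importing/loading the LLM stack

# ...now load and run the local model as usual; anything in pure-Python code
# that tries to phone home raises RuntimeError instead of sending traffic.
```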