Repeating the LLM vs Advent of Code experiment

Last year I did an experiment with ChatGPT and Advent of Code. I'm thinking of repeating it, and since last year I was criticized for my choice of model and prompt, this time I'm going to crowdsource them. Which LLM should I use? Which one is best at writing code? And what prompt should I give it?


I'd say try at least 3.5 Sonnet and whichever version of o1 is out by then. Sonnet is the best "classical" code LLM (imo!), though you may have to prompt it pretty hard to get it to attempt a one-shot. But o1 is designed for one-shots and is the only one that may be a paradigm shift in AI design. It's been worse than Sonnet at some tasks, but this may play to its strengths. Also, if you're adding a Python interpreter, implore the models to add timeouts. :)
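
On the timeout point, here is a rough sketch of how the harness itself could enforce one instead of relying on the model to do it. It assumes the generated solution arrives as a string of Python source and reads the puzzle input from stdin. The name `run_with_timeout` and the 10-second default are made up for illustration, not part of any existing setup.

```python
import subprocess
import sys


def run_with_timeout(code: str, stdin_text: str = "", timeout_s: float = 10.0):
    """Run generated code in a child interpreter and return (status, stdout).

    Hypothetical helper: status is "ok", "error", or "timeout".
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # separate process, same Python
            input=stdin_text,              # puzzle input fed on stdin
            capture_output=True,
            text=True,
            timeout=timeout_s,             # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    status = "ok" if proc.returncode == 0 else "error"
    return (status, proc.stdout.strip())


if __name__ == "__main__":
    print(run_with_timeout("print(sum(int(x) for x in input().split()))", "1 2 3"))
    print(run_with_timeout("while True: pass", timeout_s=1))
```

This only guards against runaway loops. It is not a sandbox, so generated code still runs with your user's permissions.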