Local LLM Optimism

I don't want to make costly API calls to a supercomputer; I want to run the LLM on my own machine, have it be efficient, and have it be smart, too.

Things appear to be moving in this direction generally. The new local LLMs released by OpenAI (the GPT-OSS series) are pretty good: even the 20B model is reasonably smart and runs well on the M4 MacBook Pro I bought earlier this year. Time to first token still isn't perfect, but it's acceptable, and so is the model's intelligence. In my experience it can handle some non-trivial programming tasks, such as writing custom MDPs in Julia given some reference documents.
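
To give a concrete sense of the scale of task I mean, here is a minimal sketch of that kind of custom MDP code: a hypothetical two-state machine-repair problem solved by value iteration in plain Julia, with no external packages. The problem, its numbers, and the function names are my own illustration, not what the model actually produced.

```julia
# A hypothetical two-state machine-repair MDP, solved by value iteration
# in plain Julia (no packages). Illustrative only.

# States: 1 = working, 2 = broken.  Actions: 1 = operate, 2 = repair.
# T[s, a, s′] = probability of landing in s′ after taking action a in state s.
T = zeros(2, 2, 2)
T[1, 1, :] .= [0.9, 0.1]   # operate a working machine: small chance it breaks
T[1, 2, :] .= [1.0, 0.0]   # repair a working machine: it stays working
T[2, 1, :] .= [0.0, 1.0]   # operate a broken machine: it stays broken
T[2, 2, :] .= [0.8, 0.2]   # repair a broken machine: usually fixes it

# R[s, a] = immediate reward for taking action a in state s.
R = [1.0 -0.5;    # working: operating earns revenue, repairing costs money
     -1.0 -0.5]   # broken: operating loses money, repairing costs money

# Value iteration: apply the Bellman optimality backup until convergence.
function value_iteration(T, R, γ; tol=1e-8, maxiter=10_000)
    nS, nA = size(R)
    V = zeros(nS)
    for _ in 1:maxiter
        Q = [R[s, a] + γ * sum(T[s, a, s′] * V[s′] for s′ in 1:nS)
             for s in 1:nS, a in 1:nA]
        Vnew = vec(maximum(Q, dims=2))
        if maximum(abs.(Vnew .- V)) < tol
            return Vnew, [argmax(Q[s, :]) for s in 1:nS]  # values, greedy policy
        end
        V = Vnew
    end
    error("value iteration did not converge")
end

V, policy = value_iteration(T, R, 0.95)
println("optimal values: ", V)        # expected discounted return per state
println("optimal policy: ", policy)   # best action index per state
```

Even a toy like this exercises the parts that tend to trip up weaker models: indexing conventions, the Bellman backup itself, and a sensible convergence check.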

My hunch is that, because the demand for local LLMs is there, more tech will move in this direction. As one piece of evidence, consider the upgraded neural engine in Apple's upcoming M5 chip.

I am optimistic that we can avoid the future where a few megacorps control the only AIs in town.