Last updated: 26 Apr 2024

Inspired by nownownow, this page is about what I've been focusing on over the past few weeks.

Local LLMs on CPU

Ever since the release of llama and llama.cpp, I've been playing around with various local models just for inference. This is an "every so often" hobby, and it often requires converting a model I downloaded to a newer format so that it works with the latest git commit of llama.cpp.

Other than playing around with inference, I try to get the newest models running and see how they perform. These days, that mostly means checking out the newest releases from TheBloke on HuggingFace and running them with CPU inference in either llama.cpp or koboldcpp. I used Mistral 7B Instruct for quick queries for a long time, and later switched to UNA Cybertron 7b v3 for quite a few months. With the release of a Q5_K_M Llama 3 GGUF, Llama 3 has now become my primary model. For any queries that run afoul of Llama 3's filters, I fall back to UNA Cybertron, though Llama 3 seems much better tuned in this regard than Llama 2.

I was really interested in running Mixtral 8x7b (or some derivative) locally on CPU. I set this up with a Q4 quantized GGUF that fits within roughly what a 32GB system can afford (I have 64GB, but I want to see how the 26GB model runs, since I'd like to run it on a server that has 32GB of RAM). Inference speed tends to be inversely proportional to parameter count, but although Mixtral 8x7b has ~47B parameters in total (fewer than the 56B its name suggests, because the experts share the attention layers), in terms of inference it behaves more like a 13B model. My reading on this indicates that it routes each token through only two of its eight experts per layer, so most of the parameters sit idle for any given token. My tests with Mixtral have been really positive. I don't tend to choose it first, though, since its inference speed is still a bit below Llama 3 and UNA Cybertron.
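
The 26GB figure can be sanity-checked with some quick arithmetic. This is only a rough sketch: the 46.7B total parameter count is the number Mistral AI reports for Mixtral 8x7B, while the 4.5 bits per weight and the 4GB of headroom for context and the OS are assumptions picked for illustration, not measured values.

```python
def estimate_gguf_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def fits_in_ram(model_gb: float, ram_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if the weights plus an assumed overhead fit in the given RAM."""
    return model_gb + overhead_gb <= ram_gb


# Assumption: ~4.5 bits per weight as an effective average for a 4-bit quant.
size = estimate_gguf_size_gb(46.7e9, 4.5)
print(f"estimated size: {size:.1f} GB")
print("fits in 32GB:", fits_in_ram(size, 32))
```

With these assumptions the estimate lands a little over 26GB, which is why a 32GB machine is workable but tight.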

Expanding Notedeck

I still use Notedeck daily, but I've stagnated a little bit on development, and there are a few neat areas to explore here.

Prototyping with the Barracuda Application Server

I'd really like to build a more robust version of Notedeck that is designed for hackability. Most widely advertised software these days isn't good for this, but I ran across the Mako server, built on the Barracuda Application Server, and thought it might be a good fit: everything is written in Lua, and the server is a single binary. The standard library includes all the batteries Notedeck needs, which makes dependency management much simpler. This is still just an idea, but it's a direction I'd like to explore in the coming weeks.

Self-hosted Version

Notedeck has always been self-hosted, in the sense that you run it on your own machine and use it via a localhost URL in your browser. But I'd like to explore a CGI script (I know, I know, but CGI is still useful and easy for small-scale projects) that would provide a simple interface for a handful of users to store and access their wikis from anywhere. I'll probably use a minimal, conservative, durable stack here, like Lua + Fennel. It suits my tastes, and is highly practical from a performance perspective.
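
To make the CGI idea concrete, here is a minimal sketch of what such a script could look like. It is written in Python purely for illustration (the stack I'd actually reach for is Lua + Fennel), and everything in it is an assumption: the `wikis` directory, the one-file-per-user layout, and relying on the web server to authenticate users and set the standard `REMOTE_USER` CGI variable.

```python
import os
import re
import sys
from pathlib import Path

WIKI_ROOT = Path("wikis")  # hypothetical storage directory
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")  # reject anything path-like


def handle(environ, body_in, out, root=WIKI_ROOT):
    """Serve one CGI request: GET returns the user's wiki, PUT overwrites it."""
    user = environ.get("REMOTE_USER", "")
    if not SAFE_NAME.fullmatch(user):
        out.write(b"Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\nforbidden\n")
        return
    path = Path(root) / f"{user}.html"
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "GET":
        if not path.exists():
            out.write(b"Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\nno wiki yet\n")
            return
        out.write(b"Status: 200 OK\r\nContent-Type: text/html\r\n\r\n")
        out.write(path.read_bytes())
    elif method == "PUT":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = body_in.read(length)
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)   # write to a temp file first...
        os.replace(tmp, path)   # ...then swap it in atomically
        out.write(b"Status: 204 No Content\r\n\r\n")
    else:
        out.write(b"Status: 405 Method Not Allowed\r\nAllow: GET, PUT\r\n\r\n")


if __name__ == "__main__":
    handle(os.environ, sys.stdin.buffer, sys.stdout.buffer)
```

The server never inspects the bytes it stores, so the whole thing stays small: the web server handles authentication, and the script only maps a user name to a file.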

Other Apps

TiddlyWiki is a neat base because it can write a copy of itself back to the server, and the server needs very little understanding of exactly what those bytes are. This opens the door to the server being useful for other apps designed the same way. To myself, I call this idea "AppDeck", but I haven't really written much about it.

The first app I'm working on is Column, a webpage that allows you to author it using itself. It's basically a single-page TiddlyWiki. I mostly kicked this off to prove I could write a WebDAV-enabled quine.

More interesting is a project that I'm calling Typhon. It's a self-saving file like TiddlyWiki or Feather Wiki, but instead of a wiki it's a web-based launcher for websites, centered around nested mnemonics. My first version (which I still use!) is called WebHydra, named after abo-abo's most excellent Hydra package for Emacs.

Finally, I have a couple of ideas about micro-apps that could be useful:

  • Porting Neatnik's Calendar from PHP to JS
  • Creating a basic Pomodoro-style timer, or something more generic that supports a similar workflow
  • A linktrap that you can dump links into, each with an optional description. Links can be searched incrementally by domain, path, or description, and are displayed as a nested list, with dates at the top level and links beneath the date they were created. Maybe tags, but it has to stay streamlined.
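
The linktrap idea is simple enough to sketch. This is a rough illustration in Python rather than anything I've built; the `Link` type and the `search`, `group_by_date`, and `render` helpers are all hypothetical names. An incremental search would just call `search` again on every keystroke with the longer query.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass
class Link:
    url: str
    created: date
    description: str = ""


def search(links, query):
    """Case-insensitive substring match over domain, path, and description."""
    q = query.lower()
    hits = []
    for link in links:
        parts = urlparse(link.url)
        haystack = " ".join([parts.netloc, parts.path, link.description]).lower()
        if q in haystack:
            hits.append(link)
    return hits


def group_by_date(links):
    """Return [(date, [links])] with the newest date first."""
    groups = defaultdict(list)
    for link in links:
        groups[link.created].append(link)
    return sorted(groups.items(), key=lambda item: item[0], reverse=True)


def render(links):
    """Render a nested plain-text list: dates on top, links beneath."""
    lines = []
    for day, day_links in group_by_date(links):
        lines.append(day.isoformat())
        for link in day_links:
            suffix = f" ({link.description})" if link.description else ""
            lines.append(f"  {link.url}{suffix}")
    return "\n".join(lines)
```

An empty query matches everything, so the same `render(search(links, query))` call covers both the default view and the filtered one.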

Axe / Tomahawk throwing

In 2023, I went to a team outing for work where we threw axes, and thought it would be a fun hobby. A downed tree in my yard provided a good stump I could use for a target, so I picked up a couple of Cold Steel throwing axes, as well as a SOG Fasthawk. While the Cold Steel axes are pretty close to regulation WATL design, the Fasthawk is much smaller (just 12.5"!) and is a tactical tomahawk, so it has a blade on one side and a spike on the other.

Although I find the Fasthawk more challenging to throw and stick, it's basically indestructible, so it's become my favorite. It has a glass-filled nylon (GFN) handle that is both light and extremely impact-resistant. One of the Cold Steel axes (with a hickory handle) broke, and finding a replacement turned out to be much more challenging than I expected.