Notes
Thoughts on programming, games, and whatever else.
-
KISS web apps in the age of AI agents
2026-05-03
My dad is a brilliant programmer. Last week he used Claude Code to whip up a simple web app for personal use. He doesn't work in web tech - he works in games and game engines. So he deferred to Claude on the underlying tech stack.
Claude did everything 100% textbook: Electron, React.js, Zustand, etc. But for a simple personal-use app with simple functionality, this was absurd bloat and total overkill. It's a tech stack optimized for a 20+ person team at an organization with revenue measured in millions, not for a hobbyist weekend project.
I grew up programming using Lua and Love2D, which I think was a fantastic way to learn programming. I also think I gravitated towards it because it is so simple - you can just call love.graphics.circle() and draw a circle on the screen. No scene graph or anything.
Because of this, when I first started learning JavaScript and web tech at 19, I gravitated towards vanilla JS and raw client-side HTML5 that I opened by double-clicking the .html file on my disk. I only later learned that there was a thing called "node". And I still think, even after working on production NodeJS/TypeScript/React software at NASA, that basic vanilla HTML5/ES5 is dramatically underrated.
I have to tell my AI agent of choice this whenever I want to whip up a web app, so after doing it a few times I've settled on a stack and a corresponding SKILL.md file that I like, which I call KISS - "Keep It Simple, Stupid!".
The core of it is, of course:
- use vanilla HTML and JS
- use CDNs or copy-paste JS into a `libs` or `vendor` folder in your source directory instead of using a package manager
- use NW.js or a Python webview instead of Electron, since both are much more similar to simple client-side webdev with some added permissions
- in general, keep it simple and maintainable
This allows for compact, concise, readable SPAs without bloat or cruft that run super fast. I used this when developing GrobPaint, for example.
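To make the "small Python backend, no package manager" side of this concrete, here is a minimal sketch that uses only Python's standard library. It is not taken from the skill file or from GrobPaint; the `AppHandler` class, the `make_server` helper, and the `/api/ping` route are names made up for illustration.

```python
import json
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class AppHandler(SimpleHTTPRequestHandler):
    """Serves static files, plus one tiny JSON route (hypothetical example)."""

    def do_GET(self):
        if self.path == "/api/ping":
            # Anything the browser can't do alone (disk, OS calls) would live here.
            body = json.dumps({"ok": True}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Everything else is plain index.html / app.js / libs/... from disk.
            super().do_GET()


def make_server(directory=".", port=8000):
    """Create (but do not start) a local-only server rooted at `directory`."""
    handler = partial(AppHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# To run it: make_server("app").serve_forever()
```

Pointing a browser or a webview window at `http://127.0.0.1:8000` is then the whole "build step".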
I also wrote up a skill file for this so you can hand it to your own AI agents. Please use it! :) simple-python-vanilla-js-web-skill.md.
-
I asked Claude Code to generate a shonen-style manga page about a pizza delivery guy who takes his job too seriously, using GPT Images 2.0 and my OpenAI API key. It went pretty awesome, actually. I was very impressed.

Then I wanted to see if I could turn that single generated page into a little motion comic. The end result looked like this:

You can watch the final video on TikTok here.
The basic pipeline was:
- Detect the panels
- Reveal the panels in sequence
- Pan and zoom the camera between them
- Add a little bit of foreground motion
- Generate voiceover and sound effects with ElevenLabs
The first thing I needed, and what ended up being the most difficult part of the whole thing, was identifying and isolating the individual panel geometry. To dim unread panels and center the camera on each panel, I needed a polygon for every panel on the page.
I started with classical computer vision approaches: thresholding, Hough lines, watershed, flood fill, edge following, and sampling gutter colors. Unfortunately, none of them were reliable enough for manga-style pages.
Manga panels are just annoying for this. They have angled gutters, missing borders, art that bleeds to the edge, dark panel interiors, off-white gutters, and boundaries that are often more semantic than purely visual.
Then I had an idea - what if I hand the manga page back over to the image model as an image prompt, and ask it to generate a black and white mask for the panels?
"Output a binary diagram the same size as the input image. Fill every comic panel interior with pure white. Fill all panel borders, gutters, and outside areas with pure black. No grayscale. Same resolution as the input."

That worked surprisingly well. The model wasn't even doing edge detection. It was interpreting the page layout and returning a semantic panel mask.
Even though this felt like an elegant solution, the mask still needed cleanup. The generated boundaries were approximate, often off by about 5-10 pixels. So it was back to traditional computer vision: I used Claude Code to write a Python script that extracts rough polygons from the mask and snaps their edges toward the real panel gutters. The script walks along each edge and searches for the strongest local image gradient. This time the problem had been narrowed down so much that traditional computer vision could do the job well and reliably.
The useful pattern here was:
- Use the image model for rough semantic segmentation.
- Use classical CV for precise pixel alignment.
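The second half of that pattern, snapping a rough edge to the strongest nearby gradient, can be sketched in a few lines. This is a simplified pure-Python illustration, not the script from the project: it treats the image as a list of grayscale rows, shifts the whole edge along its normal rather than moving vertices independently, and the `snap_edge` name and its parameters are assumptions.

```python
import math


def snap_edge(gray, p0, p1, search=10, samples=32):
    """Shift segment p0->p1 along its normal to the offset with the strongest gradient.

    gray is a list of rows of grayscale values; search is the maximum shift in pixels.
    """
    h, w = len(gray), len(gray[0])
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("edge has zero length")
    nx, ny = -dy / length, dx / length  # unit normal of the edge

    def sample(x, y):
        # Nearest-pixel lookup, clamped to the image bounds.
        xi = min(max(int(round(x)), 0), w - 1)
        yi = min(max(int(round(y)), 0), h - 1)
        return gray[yi][xi]

    best_offset, best_score = 0, -1.0
    # Try small shifts first so ties go to the least movement.
    for offset in sorted(range(-search, search + 1), key=abs):
        score = 0.0
        for i in range(samples):
            t = i / (samples - 1)
            x = x0 + t * dx + offset * nx
            y = y0 + t * dy + offset * ny
            # Brightness difference across the edge at this sample point.
            score += abs(sample(x + nx, y + ny) - sample(x - nx, y - ny))
        if score > best_score:
            best_offset, best_score = offset, score

    sx, sy = best_offset * nx, best_offset * ny
    return (x0 + sx, y0 + sy), (x1 + sx, y1 + sy)
```

Because the mask already puts each edge within a few pixels of the truth, a small `search` window is enough, which is what keeps this kind of local search from latching onto linework inside the panel.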
Now that we have a good mask, we can get to the meat and potatoes: the animation itself. I used Hyperframes to render an HTML composition with a GSAP timeline to MP4. The composition was a single HTML file with:
- The generated page image
- An SVG overlay containing one polygon per panel
- Black fills over unread panels
- A GSAP timeline controlling camera movement and panel reveal
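As an illustration of the overlay part of that list, here is one way the SVG could be generated from the refined polygons. It is a sketch under assumptions: the `svg_overlay` helper and the `panel-N` ids are hypothetical, and the real composition may build its overlay differently.

```python
def svg_overlay(panels, width, height):
    """Build an SVG string with one black polygon per panel.

    panels is a list of polygons, each a list of (x, y) page coordinates.
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">'
    ]
    for i, polygon in enumerate(panels):
        points = " ".join(f"{x},{y}" for x, y in polygon)
        # Each cover starts opaque; the timeline fades it out when the panel is "read".
        parts.append(
            f'  <polygon id="panel-{i}" points="{points}" fill="black" />'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Giving each polygon a stable id means a timeline step only needs that id and the panel centroid to reveal a panel and move the camera to it.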
Camera motion was CSS `transform: scale(...) translate(...)`. Each timeline step moved the camera to a panel and faded out that panel's black overlay.
One implementation detail mattered: with `transform-origin: 0 0`, changing scale also changes the apparent translation. A naive `(scale, x, y)` tuple drifts during push-ins. The fix was to compute the translation from the target panel centroid and target scale, so the panel stays centered regardless of zoom level.
Then I tried to add small foreground motion. Not real animation, just enough to make the page feel less static.
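Before moving on, the camera centering fix is easy to write down. This sketch assumes the transform is written in the order `scale(s) translate(tx, ty)` with `transform-origin: 0 0`, in which case a page point `p` lands on screen at `s * (p + t)`. The function names are mine, not from the project.

```python
def camera_translate(centroid, scale, viewport):
    """Translation that centers `centroid` for `scale(s) translate(tx, ty)`.

    Solves s * (c + t) = viewport_center for t, assuming transform-origin 0 0.
    """
    cx, cy = centroid
    vw, vh = viewport
    return (vw / (2 * scale) - cx, vh / (2 * scale) - cy)


def css_transform(centroid, scale, viewport):
    """Format the result as a CSS transform string."""
    tx, ty = camera_translate(centroid, scale, viewport)
    return f"scale({scale}) translate({tx}px, {ty}px)"
```

For example, centering a panel at (300, 400) at 2x zoom in a 1080x1920 viewport gives a translation of (-30, 80). If the functions are written in the opposite order, `translate(...) scale(...)`, the same reasoning gives `t = viewport_center - s * c` instead.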
The first attempt was simple parallax. It did not work. The source image is flat, so moving the whole image just makes the camera feel unstable.
So I used the same mask-generation trick again, this time asking GPT Images 2.0 to separate characters and held objects from the background.

From that mask, I cut foreground components into transparent PNG layers and placed them over the original page.
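Splitting the mask into separate foreground pieces is a standard connected-component pass. A minimal pure-Python sketch, assuming the mask has already been thresholded to rows of 0/1; the `connected_components` name, the 4-neighbour connectivity, and the `min_area` filter are illustrative choices, not the project's actual code.

```python
from collections import deque


def connected_components(mask, min_area=1):
    """Find 4-connected blobs in a binary mask (list of rows of 0/1).

    Returns a list of dicts with the blob's pixels and bounding box
    (min_x, min_y, max_x, max_y). Blobs smaller than min_area are dropped.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for start_y in range(h):
        for start_x in range(w):
            if not mask[start_y][start_x] or seen[start_y][start_x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(start_x, start_y)])
            seen[start_y][start_x] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(pixels) >= min_area:
                xs = [p[0] for p in pixels]
                ys = [p[1] for p in pixels]
                components.append(
                    {"pixels": pixels, "bbox": (min(xs), min(ys), max(xs), max(ys))}
                )
    return components
```

Each bounding box is the crop rectangle for one transparent layer, and the `min_area` filter drops the stray specks a generated mask tends to contain.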
That immediately exposed the main problem: the character still exists in the original background. If the foreground cutout moves even a few pixels, the original sharp character underneath becomes visible as a doubled silhouette.
The cheap hack I landed on was to just blur the character regions in the background image. I dilated the foreground mask, applied a Gaussian blur inside those regions, and kept the rest of the page the same.

When the foreground cutout is aligned with the blurred region, the blur is covered. When the cutout moves slightly, the exposed background looks more like motion blur than a duplicated character.
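The dilate-then-blur step looks roughly like this. To keep the sketch dependency-free it works on a grayscale image stored as a list of rows and swaps the Gaussian blur for a plain box blur, so it shows the structure of the trick rather than the exact filter; all function names and default radii are assumptions.

```python
def dilate(mask, radius):
    """Grow a binary mask (rows of 0/1) by `radius` pixels, square neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def box_blur(gray, radius):
    """Average each pixel with its neighbours (stand-in for a Gaussian blur)."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(gray[yy][xx] for yy in ys for xx in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def blur_inside_mask(gray, mask, grow=2, blur_radius=2):
    """Blur only the (dilated) foreground region; leave the rest of the page as-is."""
    region = dilate(mask, grow)
    blurred = box_blur(gray, blur_radius)
    return [
        [blurred[y][x] if region[y][x] else gray[y][x] for x in range(len(gray[0]))]
        for y in range(len(gray))
    ]
```

The dilation matters because the cutout moves: the blurred area has to extend a little past the character's outline, or a sharp sliver of the original shows as soon as the layer shifts.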
The blur trick has a narrow useful range. For this page, the practical limit was about:
- 1% scale change
- 8 pixels of translation
More than that, you need real inpainting, which I don't feel like doing here.
For voice and sound I used ElevenLabs v3.
The first TTS approach was to generate short clips separately and concatenate them. That sounded pretty bad, though. The better approach was one TTS call per continuous narration block, with multiple sentences in a single input string.
The `speed` parameter was not useful for the v3 model in my tests. When a line needed to fit a timing window, I changed speed afterward with ffmpeg's `atempo` filter.
The most useful control mechanism was v3 audio tags. Tags like `[whispers]` and `[shouts]` in the input text were more effective than trying to tune sliders.
The main thing Claude Code helped with was iteration. I could say something like "make the runner shrink slightly during this panel", have it edit the HTML composition or helper scripts, render a new MP4 through Hyperframes, and then compare the output.
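Going back to the timing fix for a moment: fitting a clip into a window with `atempo` comes down to one ratio. The sketch below only builds the ffmpeg command line; the function names and file names are hypothetical. It chains several `atempo` stages for large factors, which is the conservative choice because older ffmpeg builds limited a single stage to the 0.5 to 2.0 range.

```python
def atempo_chain(factor):
    """Split a speed factor into atempo stages that each stay within 0.5..2.0."""
    if factor <= 0:
        raise ValueError("speed factor must be positive")
    stages = []
    while factor > 2.0:
        stages.append(2.0)
        factor /= 2.0
    while factor < 0.5:
        stages.append(0.5)
        factor /= 0.5
    stages.append(factor)
    return ",".join(f"atempo={stage:.4f}" for stage in stages)


def fit_to_window_cmd(src, dst, clip_seconds, window_seconds):
    """Build an ffmpeg command that retimes `src` to last `window_seconds`."""
    # A factor above 1 speeds the clip up, below 1 slows it down.
    factor = clip_seconds / window_seconds
    return ["ffmpeg", "-y", "-i", src, "-filter:a", atempo_chain(factor), dst]
```

A 5-second line that has to fit a 4-second window gets `atempo=1.2500`; the resulting list can be passed straight to `subprocess.run`.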
Most of the work was small glue code:
- Polygon cleanup from the generated panel mask
- Foreground connected-component extraction
- Blurred-background generation
- Composition edits for GSAP timing
This is the kind of stuff that would normally be annoying to write by hand because it is narrow, disposable, and full of library details. With a coding agent, it becomes cheap enough to try. (The same could be said about this whole project, actually :P)
The final pipeline ended up as:
- Generate the manga page.
- Generate a panel mask.
- Extract and refine panel polygons.
- Build an HTML/GSAP composition.
- Render camera moves and panel reveals through Hyperframes.
- Generate a foreground mask.
- Extract foreground components.
- Blur foreground regions in the background plate.
- Add small foreground tweens.
- Generate and mix voice and sound effects.
The whole thing is available as a GitHub repo, motion-manga, which doubles as a reusable template / skill for animated manga page creation.
Cool what you can do these days when you stitch together the various new AI tools. This is only scratching the surface.
-
GrobPaint
2026-03-15
I released GrobPaint, a lightweight image editor that sits somewhere between MS Paint and Paint.NET. It's multiplatform by default, which was the main motivation. I've been wanting a simple Paint.NET replacement on macOS for years and nothing quite fit. I always found GIMP's UI to be clunky and Krita to be weird, most likely because I've grown so accustomed to Paint.NET. I've only ever heard good things about Aseprite, but it's too tailored to pixel art for me. No knock on any of those projects! Probably just me being overly particular.
Anyway, now that I'm addicted to Claude Code, I thought it probably wouldn't be too hard to just roll my own.

It's built with vanilla JS and a small Python backend. No frameworks, no Electron. About 5,000 lines of code total. Layers, blend modes, magic wand, lasso, the usual stuff you'd expect. Try it out in your browser!
I posted it on Hacker News and it got some nice traction. The GitHub repo is already over 100 stars, which is cool and unexpected. My Simple Markdown Editor also got more traction than I expected on HN (I should really post more!)
Actively developing this one. If you have feature requests, open an issue on GitHub.
-
Super Mario 64's elegant collision system
2025-12-31
Most resources on 3D collision are either too technical (Separating Axis Theorem, GJK algorithm) or not technical enough ("just use a physics engine"). There's a gap for the developer who wants something quick, dirty, and good enough for most games. Super Mario 64's approach fills that gap brilliantly.
Mario is a point collider. That's it!
Collision triangles are grouped into three types--floor, wall, or ceiling--determined solely by the Z component of the triangle's normal vector. Floors point up, ceilings point down, walls are everything in between.
Detection works by extending each triangle into a triangular prism along its normal. If Mario's point is inside this prism, he gets pushed out. Each triangle type uses different prism dimensions: walls extend further horizontally, floors and ceilings extend further vertically. This simulates Mario being taller than he is wide without any complex shape math.
The secret sauce: the prism detection uses the triangle's normal snapped to the nearest axis vector, not the actual normal. This prevents seams where triangles meet--imagine walking across a floor made of slightly angled triangles. Without snapping, you'd catch on edges constantly. The real normal is still used for collision resolution, so surfaces feel correct.
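Here is a minimal sketch of the floor case, to show how little code this takes. It is my own simplification rather than the game's code: it follows the text in treating Z as up (the original game uses Y as its vertical axis), the 0.01 classification threshold and the `depth` of the prism are tunable assumptions, and walls and ceilings would get analogous checks along their own snapped axes.

```python
import math

FLOOR, WALL, CEILING = "floor", "wall", "ceiling"


def normal_of(tri):
    """Unit normal of a triangle given as three (x, y, z) vertices."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = tri
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def classify(normal, threshold=0.01):
    """Floor, wall, or ceiling, decided only by the normal's up component."""
    if normal[2] > threshold:
        return FLOOR
    if normal[2] < -threshold:
        return CEILING
    return WALL


def inside_xy(tri, x, y):
    """Point-in-triangle test on the triangle's top-down projection."""
    def side(px, py, qx, qy):
        return (x - qx) * (py - qy) - (px - qx) * (y - qy)

    (ax, ay, _), (bx, by, _), (cx, cy, _) = tri
    d = (side(ax, ay, bx, by), side(bx, by, cx, cy), side(cx, cy, ax, ay))
    return not (any(v < 0 for v in d) and any(v > 0 for v in d))


def floor_push_out(tri, point, depth=1.0):
    """If `point` is inside the prism hanging below a floor, lift it onto the floor.

    The prism is extruded straight down (the normal snapped to the up axis),
    while the surface height itself comes from the triangle's real plane.
    """
    x, y, z = point
    normal = normal_of(tri)
    if classify(normal) != FLOOR or not inside_xy(tri, x, y):
        return point
    nx, ny, nz = normal
    ax, ay, az = tri[0]
    height = az - (nx * (x - ax) + ny * (y - ay)) / nz
    if height - depth <= z < height:
        return (x, y, height)
    return point
```

Because the inside test is done top-down rather than along each triangle's own normal, two neighbouring floor triangles with slightly different slopes cover the ground with no gap or overlap between their prisms, which is the seam-avoiding property described above.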
This approach is genius because it's dead simple. No SAT, no GJK. Just point-in-prism tests and push-out vectors. One of the best parts is that it works with arbitrary triangle soups! No need for Carmack-style brush systems like Quake's. I've implemented this in Lua and JavaScript for first-person shooters and it just works.
I also prefer this over capsule colliders, the common modern choice. Capsules have rounded bottoms that cause floaty behavior when walking off platform edges--bad for game feel in platformers. The prism method gives you hard edges where you want them.
Sometimes the 1996 solution is still the right one.
Credit to pannenkoek2012 on YouTube, whose videos go into incredible detail about Super Mario 64's inner workings.
-
Maybe stocks aren't a great investment
2025-12-03
Here's a graph of the S&P 500's closing price for the last 100 years. It goes up and to the right, and since this is a logarithmic graph, the growth is dramatic. This is what long-term index fund investors point to as proof of their investing thesis.

Here's the chart of gold. Gold has also gone up over time, especially after the 1971 removal of the U.S. gold standard when the dollar became a free-floating fiat currency. It has had much longer periods of going sideways than the S&P, which has caused many to consider it a boring store of value rather than an investment.
What's interesting about gold is how stable it has been as a store of value throughout human civilization. An 8-gram Roman aureus from the 1st century BCE was worth a legionary's monthly salary and could buy 100-200 loaves of bread--roughly $500-$1,000 in today's dollars, similar to wages today.

Gold being a constant store of value throughout history is dubious, but it may be the best proxy we have besides real estate. If we assume it's roughly consistent, we can divide the S&P 500's price by gold to approximate the real value of stocks.

Viewed this way, the S&P 500 hasn't really gone up and to the right--it has gone mostly sideways. The boom and bust cycles of the U.S. economy become much more visible.
I find this perspective useful because most stock charts show nominal prices. When the news reports "The S&P made a new record high today," it's worth asking: in terms of what?
If we take this assumption seriously--that gold has remained constant while the dollar has devalued--the implications for long-term financial planning are interesting:
- Hold gold in proportion to your risk tolerance. If stocks fluctuate around a mean, gold provides stability.
- Buy stocks at the cheap end of the cycle. The SPX/gold ratio resembles a sine wave, so buying when it's low (and rising) could make sense.
If instead we assume dollar devaluation is guaranteed but gold isn't necessarily a perfect store of value, the strategy shifts:
- Buy a broad basket of assets. If everything rises in nominal terms due to inflation, diversification captures that.
- Use long-dated options (LEAPS) to go leveraged-long on inflation that markets may not fully price in.
This is not financial advice--just an interesting way to think about stock valuations.
-
Lisp Visualization Test
2025-10-15
I recently prototyped an idea of making an interactive version of Lisp using visual blocks, inspired by Scratch (which is how I originally learned programming!).
There are two types of blocks - lists and atoms. The lists automatically lay themselves out either horizontally or vertically depending on how nested they are.
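To give a feel for how a nesting-based layout rule can work, here is a small sketch. The specific rule is an assumption for illustration (a list holding only atoms lays out horizontally, anything deeper stacks vertically), as are the `depth` and `layout` names; the prototype's actual rule may differ.

```python
def depth(node):
    """Atoms have depth 0; a list is one deeper than its deepest child."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth(child) for child in node), default=0)


def layout(node):
    """Turn a nested expression into a tree of blocks with a layout direction."""
    if not isinstance(node, list):
        return {"atom": node}
    # Assumed rule: flat lists read left to right, nested ones stack top to bottom.
    direction = "horizontal" if depth(node) <= 1 else "vertical"
    return {"direction": direction, "children": [layout(child) for child in node]}
```

With this rule, `["define", ["square", "x"], ["*", "x", "x"]]` becomes a vertical block whose two inner lists are horizontal rows, which is roughly how indented Lisp code reads anyway.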

You can right-click on blocks to execute them, and defining variables makes them show up as blocks in the environment - making the environment itself editable and interactive.

Here you can see me defining a variable and a function, and then using the function to update the variable in the environment.

This project isn't live yet because I'm still working out the bugs. Hopefully soon!
-
Local LLM Optimism
2025-09-14
I don't want to make costly API calls to a supercomputer; I want to run the LLM on my own machine, have it be efficient, and have it be smart, too.
Things appear to be generally moving in this direction. The new local LLMs released by OpenAI (the GPT-OSS series) are pretty good; even the 20B model is reasonably smart and runs well on the M4 MacBook Pro that I bought earlier this year. The time to first token still isn't perfect, but it's reasonable, and so is the intelligence of the model. In my experience it can handle some non-trivial programming tasks such as writing custom MDPs in Julia given some reference documents.
My hunch is that, because the demand is there for local LLMs, we will get more tech moving in this direction. To this point I'll suggest Apple's upcoming M5 chip's upgraded neural engine as evidence.
I am optimistic that we can avoid the future where a few megacorps control the only AIs in town.
-
When programming, I care about iteration time. A lot.
It affects my choice of tech stack dramatically. Maybe more than any other single consideration besides ecosystem. And, unlike ecosystem, I don't think enough developers talk about iteration time.
Iteration time is the time it takes for you to write a change to a line of code and see that change reflected in whatever you're building. When this time is low, I can flow as a developer. Stay in the flow state, never leave context. Put on some good coding music (drum and bass, anyone?) and go for hours pounding coffee. :)
When build times become a consideration, this all flies out the window. Make a change, run a compile, oops now I'm browsing Hacker News or TikTok. There goes 15 minutes! Oh wait, what was I doing again? So much time wasted, so many context changes! Productivity tanks by like 50%.
This is part of the reason why I love scripting languages so much, or Julia with its REPL-driven development. I much prefer dealing with runtime errors or mucking up some typing issue than breaking my flow state.
Maybe you're the same way. Or maybe you just don't know it yet.
-
Everybody just wants immediate mode
2025-09-10
In retained mode, you build a tree of persistent UI objects. Create a button, add it to a panel, add that to a window. When something changes, find the right object and update it. The DOM, Win32, Qt, GTK, Cocoa--all retained mode.
In immediate mode, you redraw the UI every frame as a function of state. No persistent objects. Just "if this condition, draw a button here." Dear ImGui is the canonical example.
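A toy version of that idea, with no real UI library involved: the "frame" is just a list of draw commands, and the `draw_ui` function, the command tuples, and the state keys are all made up for illustration (this is not Dear ImGui's API).

```python
def draw_ui(state):
    """Return the draw commands for one frame; nothing persists between frames."""
    commands = [("label", f"Count: {state['count']}")]
    if state["count"] < state["limit"]:
        # The button exists only while the condition holds. There is no
        # button object to hide, disable, or forget to update.
        commands.append(("button", "Increment"))
    else:
        commands.append(("label", "Limit reached"))
    return commands
```

Calling `draw_ui` again after the count changes produces the new frame from scratch, so there is no step where an old widget could be left showing stale data.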
Why care? Immediate mode makes `UI = f(state)`. Your UI cannot desync from your data because there is no separate UI state. An entire class of bugs becomes impossible.
This is why everyone loves React. It brought immediate mode thinking to the browser--components as functions of props and state, the virtual DOM handling mutations. The browser is fundamentally retained mode, and React makes it feel like it isn't.
The endless churn of JS frameworks is developers searching for better immediate mode semantics on a retained mode platform. I don't think this framing comes up enough. When people ask "why do we need yet another JavaScript framework?" the answer is rarely framed this way. But that's the core of it.
This lens explains the 2010s. Desktop UI was "solved"--Qt, GTK, WPF all worked. Yet developers reached for Electron despite the bloat. There are many reasons: cross-platform, web talent pool, npm ecosystem. But immediate mode is underappreciated. Developers will pay a real cost to avoid retained mode. The new generation of native frameworks--Flutter, SwiftUI, Jetpack Compose--all converged on declarative, immediate-style rendering.
I love immediate mode. For technical applications, Dear ImGui is hard to beat. Your entire UI defined in one place as a function of data. No scattered callbacks or state sync bugs. Just draw what you need.
When developers add complexity to "solved" problems, ask what abstraction they're escaping. Often it's retained mode.