Spent $130K+ "cloned" Screen Studio and made it 3x faster: AGI for software dev feels so close
For software engineering, I believe the AGI era has already arrived: a software engineer can now take any product from zero to one entirely on their own. Let me share my own case: in a little over six months, I built a product from scratch that matches (and in places surpasses) Screen Studio, with 4K video export running 3x faster.
Let me start with the story of building ScreenKite. There was a slow stretch of several months (roughly December to March) when I was traveling while working and didn't have much time. But the development process showed me that the AGI era had arrived. I came to believe that the traditional software engineer's career is fundamentally over, so I left my job entirely; I needed to slow down and think about what to do next.
Why did I want to build ScreenKite? Last August, I started an AI community with a friend, and I frequently needed to record my screen for teaching. I had purchased Screen Studio at the time, but its video export was painfully slow: for 4K 60fps video, the export time far exceeded the time spent recording and editing combined. Export was the bottleneck. I wondered: could I build a purely native macOS app to solve this? So I started building with Claude Code, back when the model was still around Opus 4.0.
Another motivation was that AI was clearly becoming capable of assisting with video editing, yet none of the existing tools, CapCut included, had a truly good AI editing feature. Since I was already using Codex and Claude Code, I also wanted to build an AI-native video editor, preferably a local one, since macOS rendering on my machine was quite fast.
So on October 1st last year, I made the first commit on the ScreenKite project. Interestingly, there still wasn't a good open-source project to reference, so the entire project was essentially invented independently by two AI systems, Claude Code and Codex, reading documentation from scratch.
The project started as a native macOS application. Screen recording with multiple stream inputs (screen capture, front-facing camera, plus two audio channels) is an extremely complex process: the hardware orchestration and time synchronization are genuinely demanding. For roughly the first two months, I felt like I couldn't get it right.
There wasn't much reference documentation, and there were few tutorials or open-source projects to learn from; maybe there are plenty today, but there weren't back then. So the only option was to let the AI keep implementing and iterating. And because this is a very complex native project with no way for the agent to run automated end-to-end demos, I had to build and test the project manually after every iteration.
There was a turning point when GPT 5.2 Codex dramatically improved iteration capability. Before that, I used Opus as the primary driver, with Gemini and Codex for review. Afterward, I gradually migrated to Codex as the primary driver, with the Claude and Gemini models occasionally reviewing, because it was just that powerful. The timing coincided exactly with the explosion of OpenClaw's popularity.
Specifically, my current workflow uses Codex as the primary agent, with acpx bridging to Claude Code and Gemini CLI for code review. Different AIs approach problems from different angles, and although I basically don't read a single line of code anymore, having Codex write the code and then letting several AIs review each other's work makes me far more confident. I've also been fortunate to contribute code to openclaw/acpx a few times; I really like that project.
The development process reflected this too. In the beginning, iterating on the app meant orchestrating a single agent, either Claude Code or Codex, and before December last year I kept hitting the same problem: implementing one feature would break one or two others. There were two reasons. First, and most fundamentally, the models simply weren't capable enough at the time. Second, as I only recently realized, test coverage was too low. This led me to two conclusions. First, AI models will gradually consume software, or at least massively reduce the cost of software development.
Second, test code is even more important than production code, because it is the critical guarantee of stability for agent-driven iteration. Without it, video rendering, GPU acceleration, and Apple Metal optimizations are prone to regressions.
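To make the point concrete, here is a minimal sketch of the kind of golden-file regression test that protects a rendering pipeline during agent-driven iteration. This is purely illustrative, not ScreenKite's actual test code: `render_frame` is a hypothetical stand-in for the real Metal renderer, and the idea is simply that any pixel-level change flips a checksum and fails the suite.

```python
import hashlib

def render_frame(frame_index: int) -> bytes:
    """Stand-in for the real renderer: returns raw pixel data for one frame.
    (In a real project this would invoke the actual render pipeline.)"""
    # Deterministic placeholder: bytes that depend on the frame index.
    return bytes((frame_index * 7 + i) % 256 for i in range(64))

def checksum_frames(frame_count: int) -> str:
    """Hash every rendered frame into one digest; any pixel change flips it."""
    digest = hashlib.sha256()
    for i in range(frame_count):
        digest.update(render_frame(i))
    return digest.hexdigest()

def render_output_unchanged(baseline: str, frame_count: int = 30) -> bool:
    """Golden-file check: fails the moment a refactor alters render output."""
    return checksum_frames(frame_count) == baseline

# Record the baseline once on a known-good build, then assert against it
# after every agent iteration.
BASELINE = checksum_frames(30)
assert render_output_unchanged(BASELINE)
```

With a gate like this in the loop, an agent that "fixes" one feature but silently perturbs rendering gets caught immediately, which is exactly the stability guarantee described above.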
It's worth mentioning that at least 30% of ScreenKite's development was completely hands-off. From requirements to acceptance, the entire process was fully automated. It's somewhat like OpenAI's Symphony, and also like what OpenClaw's creator Peter Steinberger described: just keep typing 'continue'. That's genuinely what I did.
ScreenKite was originally a native Mac application. After OpenAI released Rough Loop (also known as /goal) last week, I wanted to see whether I could rewrite the entire Mac app as a full Windows application with just a handful of prompts. It took roughly 58 hours with GPT 5.5 Medium to basically get the job done, using under 7B tokens. While the line count is only about 30% of the macOS Swift codebase, the core functionality and UI are already fully comparable.
A brief look back at ScreenKite's development process
I've poured a lot of emotion into this project, and I've had many feelings along the way. The most immediate one is simply happiness that the bottleneck of creation has been solved like this. I've been an iOS developer for over ten years, but I was very unfamiliar with macOS, especially GPU, Metal, and some of macOS's more specialized areas. Being able to build such a complex product with the help of Codex and Claude Code made me genuinely happy. The project currently has over 500,000 lines of code, and I think it's already a fairly mature product.
On the other hand, it's also been extremely draining. The reality is that until true AGI arrives, the daily work is still being an AI QA engineer, and of course its product manager too. For the remaining 70% of the work I mentioned, you still need a human in the loop.
I originally planned to include screenshots showing ScreenKite's evolution from the earliest versions to now, using an agent to build each historical version and capture it. The biggest bottleneck turned out to be Apple's developer tools, which can no longer build projects that are too old, and I didn't want to waste time reinstalling older versions of Xcode just to take screenshots. So, due to Apple's limitations, the visual history of the product's iteration didn't happen. It is what it is.
Isn't it much cheaper to just buy the software?
Of course not. Building it is still roughly a thousand times more expensive. When you factor in your time and token consumption, and if tokens weren't subsidized, we'd be spending an enormous amount of money: the software costs just over a hundred dollars a year to subscribe to, while actually building it cost well over a hundred thousand dollars.
I developed ScreenKite on two computers. Here's roughly how many tokens were consumed (assuming no subsidies):
MacBook A:
| Tool | Total tokens | Cost |
|---|---|---|
| Claude | 4.87 B | $3,354.09 |
| Codex | 116.69 B | $92,791.20 |
| Gemini | 0.06 B | n/a |
MacBook B:
| Tool | Total tokens | Cost |
|---|---|---|
| Claude | 2.68 B | $2,207.20 |
| Codex | 54.51 B | $39,557.73 |
| Gemini | 0.07 B | n/a |
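Totaling the two tables is quick arithmetic (Gemini usage was negligible and its cost is listed as n/a, so it's omitted):

```python
# Unsubsidized API costs from the two tables above.
costs = {
    "MacBook A / Claude": 3_354.09,
    "MacBook A / Codex": 92_791.20,
    "MacBook B / Claude": 2_207.20,
    "MacBook B / Codex": 39_557.73,
}

total = sum(costs.values())
print(f"Total: ${total:,.2f}")  # ≈ $137,910.22 — the "$130K+" in the title
```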
What workflow would I follow to build a new software product today?
I think the workflow is different from before. Put simply: build a rough, ugly-UI prototype that iterates fast and covers most features, then layer correctness and aesthetics on top of that prototype. UI and user experience are definitely still important.
After that, there's more to do: have AI help organize our vision, assist in defining a vision document, then use that vision document to fix all kinds of issues, whether architectural, requirements related, or logical. Once the prototype is built, you can use an agent harness to iterate automatically or semi-automatically for a long time, since the requirements and vision are already set. At that point, it's essentially a fixed-input, fixed-output process, in my opinion.
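The "fixed-input, fixed-output" loop above can be sketched as a tiny harness. Everything here is hypothetical (the function names, the iteration budget, the acceptance gate); the real agent call and test run are stubbed out, but the shape of the loop is the point: a fixed vision document in, a fixed acceptance criterion out, and the agent iterates in between.

```python
MAX_ITERATIONS = 20  # token/time budget, an assumed number

def run_agent(prompt: str) -> None:
    """Hypothetical: one agent iteration on the repo (the 'continue' step).
    Stubbed here; a real harness would shell out to a coding-agent CLI."""
    pass

def tests_pass() -> bool:
    """Hypothetical acceptance gate: run the project's test suite.
    Stubbed to True; a real harness would check the suite's exit code."""
    return True

def harness(vision_doc: str) -> bool:
    """Feed the fixed vision document to the agent until the fixed
    acceptance criteria (the tests) pass, or the budget runs out."""
    for _ in range(MAX_ITERATIONS):
        run_agent(f"Implement the next unmet requirement in:\n{vision_doc}")
        if tests_pass():
            return True
    return False
```

With the vision document and the test suite both pinned down, the human's job reduces to writing those two artifacts well; the loop itself needs no judgment calls.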
What's the future for ScreenKite?
Given my investment, I don't see the point of selling this software for $29 or $9. I can't see how that would capture meaningful market share: there are too many similar products out there, and I don't believe there's room for paid differentiation in this space anymore. So ScreenKite is a free alternative to Screen Studio and Loom. Everyone is welcome to use it and submit feedback.

I think software's value, for most people, comes down to two things. First, creating value: you can use the software to produce something different, to improve your methods. Second, saving time: for example, faster post-production editing and export. These are the two things I want to focus on going forward, and they're things I want for myself too.

For post-production video editing, given the current state of large language models, I believe Google's Gemini multimodal model is the most likely to solve it first. So offering ScreenKite's editing component as an independent tool for AI to use represents a potential opportunity.

Beyond end-to-end AI-automated editing, I see another interesting opportunity: code to animation, code to visuals. That means inserting video clips generated by hyperframes and Remotion into our video editor to render faster, better videos. You could use something like hyperframe's video editor for rendering today, but for longer videos its rendering performance is poor and slow; for long videos or individual segments, a traditional high-performance video editor is still more efficient. That's why ScreenKite also has a sister product, PilotCut (I originally bought the CoPilotCut domain, but given my coding experience lately, I decided on PilotCut instead). https://www.pilotcut.com
What does this mean for developers?
Distribution and branding now matter far more than anything else. Models keep improving, and building bottlenecks will fall one by one; the harder problem is branding and distribution. Why should anyone buy your software? It's like Coca-Cola or Red Bull: they're all just beverages, so why do people habitually buy the big brands? That's branding at work.
On the technical side of building: stop developing native applications. Debugging, iteration speed, UI fidelity, everything is significantly worse with native. Electron and web technology are the future; as software becomes dynamically built and distributed, native apps' disadvantages will only grow. The OS of the future will likely ship a native compatibility layer, similar to how browsers work today, and building on top of it will be far more efficient than native development, at least during early iterations. That's why I believe Screen Studio choosing Electron was still the right call. Many apps, especially the latest ones from the AI labs, prove the point: why do OpenAI and Anthropic use Electron for their most powerful desktop apps, Codex Desktop and Claude Desktop, instead of going native? That answers the question. Native development, in my view, carries too much historical baggage and isn't intuitive enough; that intuitiveness matters for humans, and it matters even more for AI.
Design and user experience still matter. Even with AI, software that demands high determinism absolutely needs an excellent user experience, and that's a key way to differentiate from others. To make truly different software, we need exceptional designers. I'm grateful to my designer friend @hi_caicai for the design work. For example, rather than using Figma, my designer friend can quickly use Claude or Codex with Tauri to build a UI and UX redesign of ScreenKite directly in code. Compared to Figma's Dev Mode, coding agents can read Tauri's frontend code directly and reproduce the design in SwiftUI and AppKit (though current LLMs aren't great at this yet; they can't do it few-shot).
I think that while software development has reached AGI at the execution level, deciding what to build and what not to, how to build it and to what degree, along with UI and user experience, still has not. Current large language models are incredibly strong executors, but they are not autonomous agents with their own will. I believe we may reach a fully self-aware AI in the not-too-distant future. But for now, if I'm looking for professional consolation, it's this: our creativity as humans, our empathy, our feelings about the world, our experiences, all of these are truly unique and truly important. That is the only consolation I have right now, as a software engineer whose execution-level work has already been replaced.
The importance of good work
In the AI era, I think how well you work matters more than how much time you spend. What's truly important is writing that core prompt. Choosing what to do and what not to do is critically important for both companies and individuals. While tokens can help us do many things, the endless code, Markdown, and everything else they generate is always a burden. That's why I feel there's a fundamental difference between using someone else's agent skills and building your own.
The reason I left my previous job wasn't only that I felt my work was being replaced by AI. It was also that AI is gradually redefining what truly good work looks like, and it's definitely not endless meetings, buck-passing, and office politics. We should focus on solving problems and building quickly, and between building sessions we should rest our minds, so we can accomplish more in less time instead of blindly grinding overtime.

In Good Work: Reclaiming Your Inner Ambition, Paul Millerd writes: "Most of my writing happens in the three-hour blocks I schedule each week, and my years are filled with seasonal flows of intense work and leisure. But the lines between work and non-work are fuzzy. Since I love thinking about ideas and exploring my curiosity, 'work' is happening all the time in conversations with friends, on long bike rides, in the shower, and at the gym." I'm not against working overtime on something you love; I hope to find a cause worth fighting for that also makes me happy. But personally, I believe slowing down is what lets you create better, more valuable, more unique things.

Spending endless time on token maxing is not the future, and I don't particularly agree with Meta's internal token-maxing approach either. My personal account has been irresponsibly banned by Meta twice, with no one to help me get it restored. Meta's oversight board does nominally allow appeals, but when Meta bans you, they deliberately don't give you a ticket number, making an appeal impossible. The company's approach also feels too aggressive, which paradoxically prevents them from building anything of substance; looking at the AI models we use today, that's the current state of affairs. Although I'm a small Meta shareholder, I don't think they're on a philosophically correct path.
What does the future look like?
For the vast majority of white-collar industries, many jobs will be automated away by AI. The cruel reality is that many people will lose their jobs, myself included. I've been gradually falling into an existential crisis lately. For someone like me who hasn't built up a passive income base, it's hard to survive in this society on the old path. We have to find new ways forward. I also believe Anthropic's leadership hasn't been overly alarmist. After all, Dario Amodei said last May that AI would be writing over 90% of code within six months, and in hindsight, that turned out to be a conservative estimate.
To be honest, since I started learning to program, I've felt that much of the work was fairly mechanical. Learning complex, verbose languages like Objective-C, Rust, and C++ meant only a tiny fraction of people could do this work. Recently, Boris Cherny, the creator of Claude Code, said in a Sequoia Capital interview that today's large language models are somewhat like the invention of the printing press: they reduce illiteracy, specifically programming illiteracy, and actually increase demand. Compare how widespread illiteracy was in the Middle Ages with today's world; it's incomparable. Looked at optimistically, things will certainly get better after a brief period of upheaval. Of course, I still want to say that opportunities on Earth may become fewer. If humanity can develop outer space sooner, I believe we'll enter an era of extraordinary abundance. I still hope SpaceX and Blue Origin can step up their game.