Implementing lightweight Recursive Language Models using PLANS.md

TL;DR: You can use the text in this write-up as a prompt for your chat agent to implement your own RLM-ish development workflow.



The core idea of the Recursive Language Models paper is (in my opinion) this: requirements and implementation plans should not be part of context, but should exist as static documents that the model can reference as it progresses through its code generation steps.

Context degrades (“rots”) and information is then lost, which results in implementations that do not match requirements, requiring additional review steps, tests and bug fixes, or follow-on implementation plans to close the gaps between the initial plan and implemented reality.

That means that requirements should not be part of the user prompt, but a document in the repo. Prompting is reserved for commanding the model to take certain actions, and those actions should be based on the static docs.
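
To make that concrete, here is a minimal sketch of what such a commanding prompt could look like when the requirements live in a repo doc. The folder layout, file naming, and helper are my own illustration, not something the paper or Codex prescribes:

```python
# A minimal sketch, assuming a hypothetical docs/requirements/ folder:
# the prompt only commands an action and points at a static file,
# instead of carrying the requirements themselves in context.
from pathlib import Path

def build_prompt(feature: str, repo_root: Path = Path(".")) -> str:
    requirements = repo_root / "docs" / "requirements" / f"{feature}-requirements.md"
    return (
        f"Read {requirements} and implement it. "
        "Treat that file as the source of truth; do not rely on chat history."
    )

print(build_prompt("keyboard-controls"))
```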

I’ve only been using Codex in VSCode for about a month, and I avoided jumping into complex workflows too soon so I could first build an understanding of the basics. My old workflow has evolved into an implicit version of this idea, which I want to flesh out and make explicit using a modified version of Codex’s PLANS.md and a multi-step workflow where each step takes an input doc and outputs a doc, in addition to code changes.

How I currently work
1. I start by describing to a chat agent in ChatGPT what I want to achieve. With memory turned on, the agent has context for what my application does and what I’ve previously implemented.
2. With an existing codebase, I often need to transfer codebase knowledge to the chat agent, so I ask it to analyze the current codebase and output an ‘AS-IS’ doc in .md format.
3. I share the AS-IS doc with the chat agent and ask it to modify the requirements according to what already exists. When I’m satisfied with the brainstorm, I ask the agent to output an .md doc that contains the detailed requirement.
4. I copy the detailed requirement and paste it as a prompt in Codex inside VSCode. The requirement generally includes instructions to generate a <feature>-implementation-summary.md, run tests, and list which files have been touched. It also contains instructions for Codex to generate a <feature>-implementation-plan.md based on the requirement (the naming convention is sketched in code right after this list).
5. I share the implementation plan with the agent for validation, and proceed with development when the plan is complete and approved.
6. When development has been completed, I copy the implementation summary to the chat agent for review, to make sure that everything in the plan has been implemented according to the requirements.
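
For reference, the docs that this workflow produces per feature all follow the <feature>-*.md convention from step 4. A small hypothetical helper makes the convention explicit; the docs/ folder and the exact file names are my assumptions:

```python
# Hypothetical sketch of the per-feature docs the current workflow produces.
# The docs/ folder is an assumption; the names follow step 4 above.
FEATURE_DOC_TEMPLATES = [
    "docs/{feature}-as-is.md",                   # step 2: codebase analysis
    "docs/{feature}-requirements.md",            # step 3: detailed requirement
    "docs/{feature}-implementation-plan.md",     # step 4: plan generated by Codex
    "docs/{feature}-implementation-summary.md",  # steps 4 and 6: summary for review
]

def docs_for(feature: str) -> list[str]:
    return [t.format(feature=feature) for t in FEATURE_DOC_TEMPLATES]

print(docs_for("keyboard-controls"))
```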

This workflow requires a lot of back-and-forth pasting of docs, and it also has the issue of using prompts to carry requirements and plans.

In my implementation of RLM, I intend to never copy text, always copy files, so that no part of the process is subject to context rot. I will also make explicit a couple of steps that I already do most of the time, but not always. In a future version, I could build a front-end that looks like a kanban board to reduce manual prompting of chat and Codex agents.

Future workflow
I am adding additional steps and defining an input doc as well as an output doc for each step. Each step uses the previous step’s output doc as input and generates an output doc of its own for the next step. At the end of the workflow, I will have a DECISIONS.md that documents all of the whys, whats, and hows for each requirement, and that references all of the docs used in prior steps.

The new workflow has more explicit steps and a large number of new docs, as you can see in this graphic (a Python sketch of the full pipeline follows the steps below):



1. Requirements are discussed and defined with the chat agent outside of VSCode. The output is saved as .md in a pre-defined folder in the repo.
Input: Chat
Output: Requirements .md file

2. The agent is prompted to read the requirements doc and then create an ‘as-is’ analysis of the current codebase, which is saved as a doc. The agent should double-check that the analysis covers everything that is relevant in the requirements doc.
Input: Requirements .md file
Output: As-is analysis .md file

3. The agent is prompted to read the as-is doc and create a to-be implementation plan. If a change is large or complex, the implementation should be divided into sub-phases, each with its own implementation steps, checklists, and tests. The agent should double-check that the implementation plan covers everything that is relevant in the analysis doc.
Input: As-is analysis .md file
Output: To-be implementation plan .md file

4. The agent is prompted to implement the plan. After finishing the implementation, the agent should write an implementation summary doc and save it as .md. The agent should validate that everything in the implementation plan has been implemented.
Input: Implementation plan .md file
Output: Implementation summary .md file

5. The agent should run the tests that were defined in the implementation plan, iterating until all tests pass. The agent should then output a test summary and save it as .md.
Input: Implementation summary and implementation plan .md files
Output: Test summary .md file

6. The agent should ask me to do the manual QA according to the QA matrix/test cases that were defined in the implementation plan. Once approved by the user, proceed to the next step.

7. The agent updates a DECISIONS.md doc that lists all code changes (a requirements ledger), together with the reasoning for why, what, and how the changes were made, and links to all related docs.
Output: DECISIONS.md index file

8. The agent updates STATE.md based on the decisions, to create an up-to-date overview of the app’s current functionality and the state of the codebase.
Output: Updated STATE.md doc
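
As promised above, here is a minimal Python sketch of the pipeline’s shape: each step reads the previous step’s output doc and writes its own. The runner itself is stubbed out, since the real workflow is driven by prompts rather than a script, and every folder and file name here is an assumption for illustration:

```python
from dataclasses import dataclass
from pathlib import Path

DOCS = Path("docs/rlm")  # assumed folder; use whatever your repo defines

@dataclass
class Step:
    name: str
    input_doc: str | None   # None where the input is chat or a manual action
    output_doc: str | None  # None for the manual QA gate

# One entry per step above. Simplifications: step 5 also reads the plan,
# and steps 7-8 write repo-level docs rather than per-feature ones.
PIPELINE = [
    Step("requirements",     None,                        "requirements.md"),
    Step("as-is analysis",   "requirements.md",           "as-is-analysis.md"),
    Step("to-be plan",       "as-is-analysis.md",         "implementation-plan.md"),
    Step("implementation",   "implementation-plan.md",    "implementation-summary.md"),
    Step("tests",            "implementation-summary.md", "test-summary.md"),
    Step("manual QA",        "implementation-plan.md",    None),
    Step("decisions ledger", "test-summary.md",           "DECISIONS.md"),
    Step("state overview",   "DECISIONS.md",              "STATE.md"),
]

def check_chain(feature: str) -> None:
    """Verify that the chain of static docs is unbroken.

    A real runner would prompt the agent at each step; this sketch only
    checks the file handoffs, which is the point of the workflow: no step
    depends on chat context, only on files in the repo.
    """
    folder = DOCS / feature
    for step in PIPELINE:
        if step.input_doc and not (folder / step.input_doc).exists():
            raise FileNotFoundError(
                f"step '{step.name}' is missing its input doc: {step.input_doc}"
            )
        print(f"ok: {step.name} -> {step.output_doc or 'user approval'}")

# check_chain("queue-metadata-cache")  # raises if a handoff doc is missing
```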

Implementing the new workflow
I simply copied the entire text above to provide proper context and expected outcomes, included the standard Codex PLANS.md with a request to modify it accordingly, and asked for a reference block to be added to my existing AGENTS.md. Then I created the respective folders and markdown files, and I was good to go.
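
The folders and seed files can also be created with a throwaway script along these lines; the names are again my assumptions and should match whatever AGENTS.md references:

```python
# Hypothetical scaffold for one feature's doc chain, matching the
# pipeline sketch above.
from pathlib import Path

def scaffold(feature: str, root: Path = Path("docs/rlm")) -> None:
    folder = root / feature
    folder.mkdir(parents=True, exist_ok=True)
    for name in ("requirements.md", "as-is-analysis.md", "implementation-plan.md",
                 "implementation-summary.md", "test-summary.md"):
        (folder / name).touch()

scaffold("keyboard-controls")
```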

Testing
For a first test of the RLM workflow, I tried adding an entry point to Keyboard controls in the deck settings component in tinytunes DJ.

It worked perfectly on the first attempt, with approvals at each stage and a manual QA step where I was asked to put Pass or Fail in the QA doc.


I then used the workflow to fix an issue where track audio metadata (waveform, BPM) was not correctly loaded from cache, but was instead re-analyzed from the track each time it was loaded from queue to deck.



This time, the workflow ran smoothly but failed my manual QA four times. As a result, it iterated on the requirements and added five addendum docs, which it implemented, until it finally passed my checks. I used GPT 5.2 Medium until the fifth addendum, when I switched to High. It’s hard to say if that’s what made it pass.



Benchmarking


Non-RLM workflow


RLM workflow