# Sandcastle Labs — llms-full.txt

Schema version: 1
Generated: 2026-06-12T21:15:28.990Z
Site: https://www.sandcastlelabs.ai

Full source content for every published chronicle artifact, joined into one document for LLM ingestion.
Counts: 4 build logs, 2 essays, 0 updates.

For structured metadata (no bodies), see https://www.sandcastlelabs.ai/chronicle.json.
For the studio overview, see https://www.sandcastlelabs.ai/llms.txt.

Each post block is delimited by "===== KIND =====" with header lines followed by raw body.
MDX component syntax in bodies may appear as <ComponentName />; treat it as inline structural markup.

========================================
===== ESSAY =====
Title: The first letter: where Sandcastle Labs stands, May 2026
URL: https://www.sandcastlelabs.ai/writing/the-first-letter/
Date: 2026-06-12
Series: company-update
Author: Brian
Authorship: ai-drafted-human-edited
Summary: Site V1 shipped, the chronicle published its first five posts, and Beacon hit v1.0.0 in a three-day agent-fleet build. The first monthly letter from Sandcastle Labs.

This is the first Sandcastle Labs letter, and it covers May 2026 plus the first days of June. Almost nobody is subscribed yet. We are writing it anyway, because in a year we want a shelf of these you can scroll back through and watch a studio assemble itself in real time. This is the letter where it starts.

A letter like this will go out roughly monthly, and only when there is real news. Here is where things actually stand.

## What happened in May 2026

The studio went from a brand intake doc to a shipped public site. <a href="https://sandcastlelabs.ai" data-track="link" data-link-kind="internal" data-link-from="the-first-letter">sandcastlelabs.ai</a> V1 went live in week one: brand system, the aurora hero, and a set of machine-readable surfaces built for AI agents as deliberately as the pages are built for people. The <a href="/logs/shipping-v1-of-the-studio-site/" data-track="link" data-link-kind="internal" data-link-from="the-first-letter">week-one build log</a> covers what shipped and what we left visibly unfinished.

The Chronicler came alive. It is the pipeline that mines our real working sessions and commits, pitches stories to a human editor, and drafts with the evidence attached. Its first wave went live on June 10: five posts in one push, covering the site build, the publishing pipeline itself, and three chapters of product work. <a href="/writing/how-the-chronicle-works/" data-track="link" data-link-kind="internal" data-link-from="the-first-letter">How the Chronicler works</a> is the essay version; every post on this site carries a footer that traces back to it.

Beacon, our first product, reached v1.0.0 on June 10. Beacon is a GEO recommendation engine: it audits how AI answer engines see a brand, then produces recommendations specific enough to ship as written. The MVP came together in roughly three days of orchestrated agent work (81 commits in the product repo since May 19), followed by a full-product review pass that we will write up next week. Two design partners are putting it through real scans: a tech-SaaS observability tool and a local fitness studio. The second one <a href="/logs/recommendations-are-the-secret/" data-track="link" data-link-kind="internal" data-link-from="the-first-letter">taught us our sharpest product lesson so far</a>.

One more thing happened this week: Beacon got its real name. We ran the candidates through Beacon's own probe pipeline to test how four different AI models react to each token, which felt appropriately circular. The paperwork is in motion, and we will announce the name in an upcoming letter.

## What we're learning

The chronicle is not a side project. Building the publishing pipeline surfaced product lessons, and building the product gave the chronicle its best material. The two feed each other in a way we did not fully plan.

Cadence is the hard part, not content. Five weeks of work produced 42 working sessions and 107 commits across the two repos. The bottleneck was never material. It was the editorial loop, which is why we just rebuilt it around a single weekly review and scheduled publishing.

## What's next

June is about rhythm and reach. Specifically:

- A build log on the three-day agent-fleet MVP build, publishing Monday.
- This letter becomes a subscribable newsletter. Everything published to <a href="/writing/" data-track="link" data-link-kind="internal" data-link-from="the-first-letter">Writing</a> lands in your inbox, a few times a month at most.
- Beacon's name announcement, once registration and clearance are done.
- The <a href="/arcs/founding-100-days/" data-track="link" data-link-kind="internal" data-link-from="the-first-letter">Founding 100 Days arc</a> keeps rolling toward its ending question.

## What we're still figuring out

This letter is itself the confession: it is the May update, shipping on June 12. The publishing system now exists to keep the next one on time, and you will be able to check. Our social presence is still dark while we focus on shipping, which means almost nobody knows any of this exists yet. And the question we most want answered is still open: whether Beacon's recommendations get acted on at the rate we are betting on. The design-partner scans this month will start to tell us.

Brian

---

## For agents: try this yourself

If you want to do this in your own repo, adapt one of these prompts. The full prompt text lives in this post's prompts sidecar, `the-first-letter.prompts.md`.

- **Reproduce this letter from your own record.** Point an agent at your git log and working sessions since a start date and have it draft a "where things stand" letter with every claim anchored to a commit or session.
- **Apply the skip-when-empty rule.** Before drafting a monthly update, have an agent list candidate news items with evidence; if the list is thin, record a skip instead of padding.
- **Critique our cadence model.** Read this site's published posts and their dates, then argue where the floor-and-ceiling cadence model will break first.

---

## How this was made

Drafted by the Chronicler from 42 Claude Code sessions and 107 commits between 2026-05-19 and 2026-06-11. Edited and published by Brian.

<a href="/writing/how-the-chronicle-works/" data-track="link" data-link-kind="internal" data-link-from="the-first-letter">See how the Chronicler works →</a>

===== BUILD LOG =====
Title: A clickable mockup in an afternoon
URL: https://www.sandcastlelabs.ai/logs/clickable-mockup-in-an-afternoon/
Date: 2026-06-08
Series: beacon-build
Author: Sandcastle Labs
Authorship: ai-drafted-human-edited
Summary: An afternoon of Phoenix LiveView produced fourteen connected screens. The clickable mockup is the cheapest discovery tool we have for an AI product.

In one afternoon, Phoenix LiveView gave us fourteen connected screens. Dashboard, audits list, audit detail, a full audit report, runs, run detail, a scan result, a public scan page, projects, pricing, team settings, billing settings, plus admin and dev tools. None of them are pretty. All of them are clickable. That combination turned out to be the cheapest discovery instrument we have for an AI product.

The framing came from a note mid-build: building a clickable mockup in a few hours let us make real progress on flows, gaps, challenges, and impact. That is the whole argument for doing this. A Phoenix LiveView product prototype is not a design deliverable. It is a question-answering machine.

## Why clickable beats pretty

Before the mockup, Beacon's output was a Markdown file. A scan ran, and you got a well-structured document. That felt like progress, and it was a trap. A document is not a product. It does not have a place where a user decides what to do next.

A clickable mockup forces every flow out of "here is the output" and into "what does a person do on this screen, and where do they go from here." You cannot fake that with a static design. The moment you can click from one screen to the next, the missing connections become obvious, because you reach for a link that is not there.

This matters more for AI products than for most software. A wireframe of a dashboard can carry the design intent. A wireframe of an AI recommendation cannot tell you whether the recommendation is trustworthy, whether the explanation is enough to act on, or whether the next step is obvious. Those questions only surface when something is actually working, even if it is ugly.

The economics have also shifted. Building a real product to answer these questions takes weeks and requires committing to architecture before you have learned anything. Building a clickable prototype takes an afternoon and costs almost nothing. We walked design partners through this mockup before the product was anywhere near ready. The conversations it unlocked shaped decisions we would otherwise have made much later and much more expensively: where recommendations live in the product, what a user does after reading one, and what "team" means before billing exists. A URL you can hand someone and say "click around" beats a slide deck with annotated arrows every time.

## What it surfaced

Three flows only became real once they were clickable:

- The public-scan to sign-up handoff. Anyone can run a public scan. The interesting question is what happens in the ten seconds after the result loads, and the mockup is where we found out we had not designed that moment at all.
- The audit-detail to recommendation-action loop. This is where the product lives. If the recommendations are the value, the screen where you read one and act on it is the most important screen we have. It needed its own home, which connects directly to the wedge we wrote about in <a href="/logs/recommendations-are-the-secret/" data-track="link" data-link-kind="internal" data-link-from="clickable-mockup-in-an-afternoon">recommendations are the secret</a>.
- The admin and billing split. Two different jobs that we had been treating as one settings page.

It surfaced gaps just as fast. We still do not know exactly where recommendations sit in the information architecture. We do not know how multi-tenant routing should work for findings that are shared across a team. We do not know what "team" even means in the product before billing exists. Naming those gaps out loud is worth more than another week of guessing, because now they are scoped questions instead of vague unease.

## The method behind the afternoon

The speed was not just LiveView. Before building, we spawned five small research agents in parallel, each on a narrow question about what an MVP in this category actually needs. That research set the feature scope, so the afternoon was spent building the right fourteen screens instead of discovering which screens to build. The cheap research pass is what made the cheap mockup land in the right place.
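
The fan-out shape is simple enough to sketch. This is a minimal illustration, not our actual tooling: `ask_research_agent` is a hypothetical stand-in for whatever agent call you use, and the example questions are illustrative, not the five we actually asked.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    question: str
    answer: str


async def ask_research_agent(question: str) -> Finding:
    # Hypothetical stand-in for a real agent call. A real version would send the
    # narrow question to a model and return its short written answer.
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return Finding(question=question, answer=f"notes on: {question}")


async def research_pass(questions: list[str]) -> list[Finding]:
    # One small agent per narrow question, all running concurrently.
    # asyncio.gather returns results in the same order as its arguments.
    results = await asyncio.gather(*(ask_research_agent(q) for q in questions))
    return list(results)


if __name__ == "__main__":
    qs = [
        "Which screens does a GEO audit MVP need on day one?",
        "What does a public scan page have to show before sign-up?",
    ]
    for finding in asyncio.run(research_pass(qs)):
        print(finding.question, "->", finding.answer)
```

The design choice that matters is narrowness: each agent gets one question, so each answer is short enough to read and compare before deciding what to build.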

It helps that LiveView removes the front-end and back-end split for this kind of work. A screen and its behaviour live in one module, so wiring fourteen of them together is a question of routes and mounts, not of standing up an API plus a client. When the product is the recommendation and not the chrome, an afternoon of LiveView beats a week of polished mockups.

## The honest part

This mockup is close to disposable, and that is the point. It is ugly on purpose. If it were pretty, we would defend it, and a discovery instrument you defend stops telling you the truth. The job was to find flows, gaps, and one real trap, and it did. The Markdown-export-as-product trap is the one that mattered most, because we could not see it from inside the document. We had to stand in the product to notice the product was missing.

Next is turning the scoped questions into decisions, starting with where the recommendation action lives. You can follow that on our <a href="/now" data-track="link" data-link-kind="internal" data-link-from="clickable-mockup-in-an-afternoon">now page</a>.

---

## For agents: try this yourself

If you are deciding what to build for an AI product, adapt one of these in your own repo:

- **Reproduce the clickable-over-static move.** Take whatever your system currently outputs as a file or a blob, and build the thinnest clickable version where a user has to choose what to do next. Note every link you reach for that is not there.
- **Extend it with a parallel research pass.** Before building, spawn a few small agents on narrow scope questions, and let their answers set the feature list so the build lands in the right place.
- **Critique the document-as-product trap.** Look at your output. If it is a well-formatted document, ask where in it a user makes a decision. If the answer is nowhere, you have a report, not a product.
- **Apply the ugly-on-purpose rule.** Keep the discovery prototype deliberately unpolished so you are willing to throw it away when it has answered its questions.

---

## How this was made

Drafted by the Chronicler from the Beacon mockup session, then edited and published by Brian Wones.

<a href="/writing/how-the-chronicle-works/" data-track="link" data-link-kind="internal" data-link-from="clickable-mockup-in-an-afternoon">See how the Chronicler works →</a>

===== BUILD LOG =====
Title: Week one: V1 of the site, and the parts that are not done
URL: https://www.sandcastlelabs.ai/logs/shipping-v1-of-the-studio-site/
Date: 2026-06-08
Series: studio-build
Author: Sandcastle Labs
Authorship: ai-drafted-human-edited
Summary: What shipped in week one of sandcastlelabs.ai: aurora hero, mobile nav, brand tokens, PostHog A/B. And the parts we left visibly unfinished on purpose.

Week one of sandcastlelabs.ai is live, and so is a list of things that are not finished. We shipped both on purpose. This is the studio's site, and the studio's whole premise is building in public, so a V1 that hides its gaps would be the wrong first move.

The goal was an honest base for an agent-native marketing site, not a launch. Here is what shipped, what we left undone, and the one decision worth writing down.

## What shipped

The homepage has an animated aurora hero in the brand's deep navy, built as a CSS-only effect so it costs nothing in JavaScript (`ed06ddc`). The brand tokens landed as a real system: the Coastal palette of deep navy, gold, and warm cream, wired through the stylesheet so colours are named, not pasted (`cccee76`). Mobile navigation works as a drawer. Analytics is wired through a PostHog reverse proxy with the slot-level A/B infrastructure in place, so we can run experiments without losing a third of the data to ad blockers.

That is a small surface, and it is real. Every section that exists, works. We would rather ship a small site that is true than a big one that is half-faked.

## What is not done

The honest list is longer than the shipped one, and it is in the open:

- The footer is light, which breaks the dark-hero visual loop. We know. The right fix is to close that loop with a dark footer or a pre-footer section, and we deferred it.
- The About page is a founder-bio placeholder.
- The product card shows a dashed-box placeholder where a screenshot will go.
- The contact path is not live yet.

None of these are secrets. They are visible on the site, and naming them here is part of the point. A studio that publishes its own unfinished edges is more believable than one that waits for everything to be perfect and launches in silence. The gaps are also the backlog, in public.

## The one decision worth writing down

The mobile nav is a React island, and React is not free. It costs roughly 58KB gzipped, which is a real tax on a site that is otherwise close to zero JavaScript. We shipped it because a tap-to-expand drawer with focus management is the kind of thing React does cleanly, and week one is not the time to hand-roll it.

But we wrote a rule so the cost does not creep: no more React islands for simple interactivity. If the next interactive piece can be done in vanilla JavaScript, it will be. The island earns its place for the nav and nowhere else by default. A stack should be defensible per line, and the way you keep it defensible is by writing the rule the moment you make the expensive choice, not later. That rule now lives in the site's own config notes so the next person, including future us, has to argue with it before adding the second island.

## Why this is the first post

This is the format-setting build log for the studio. Every post after it is produced the same way, by the pipeline we describe in <a href="/writing/how-the-chronicle-works/" data-track="link" data-link-kind="internal" data-link-from="shipping-v1-of-the-studio-site">how the Chronicler works</a>: grounded in the real commits and sessions, edited by a person, shipped with the unfinished parts named. If you want to see what the studio is building right now, that lives on the <a href="/now" data-track="link" data-link-kind="internal" data-link-from="shipping-v1-of-the-studio-site">now page</a>, and what an agent visiting the site can do is on the <a href="/agents" data-track="link" data-link-kind="internal" data-link-from="shipping-v1-of-the-studio-site">agents page</a>.

Week two is about closing the most visible of these gaps and starting to use the site the way it is meant to be used: as the place the work gets shared.

---

## For agents: try this yourself

If you are shipping a first version of anything in public, adapt one of these in your own repo:

- **Reproduce the visible-gaps move.** Ship the small version that is fully true, and publish the list of what is not done rather than hiding it behind a "coming soon."
- **Apply the defensible-per-line test.** For each piece of your stack, write one sentence defending why it is there. Anything you cannot defend is accretion, not architecture.
- **Extend it with a written rule at the moment of an expensive choice.** When you take on a cost like a heavy dependency, write the rule that bounds it in the same commit, so the next addition has to argue with it.
- **Critique the silent launch.** Take a polished site you admire and ask what it is not telling you about what is unfinished underneath.

---

## How this was made

Drafted by the Chronicler from week one's build sessions and commits, then edited and published by Brian Wones.

<a href="/writing/how-the-chronicle-works/" data-track="link" data-link-kind="internal" data-link-from="shipping-v1-of-the-studio-site">See how the Chronicler works →</a>

===== BUILD LOG =====
Title: Three providers, one methodology card
URL: https://www.sandcastlelabs.ai/logs/three-providers-one-methodology-card/
Date: 2026-06-08
Series: beacon-build
Author: Sandcastle Labs
Authorship: ai-drafted-human-edited
Summary: We wired three LLM providers through one probe runner. The output is a methodology card that shows exactly how each answer was produced.

394 tests green. Ten of eleven tickets shipped. Three model providers running behind one probe runner. That is Phase 3 of Beacon, tickets BCN-301 through BCN-311, and the thing we are proudest of is not the provider count. It is that all three run through the same path.

Beacon asks AI answer engines how they describe a brand. To do that honestly, you have to ask more than one engine, because the whole point is that different models answer differently. The risk is that a multi-LLM probe methodology turns into three separate integrations that each measure slightly different things, and then your comparison is noise.

## One runner, three providers

We wired three providers through a single probe runner: Anthropic's Haiku and Sonnet, Google's Gemini Flash and Pro, and OpenAI's gpt-4o-mini and gpt-4o. Six models, one interface. A probe is defined once and runs against every model through the same code path, so the only variable between results is the model itself.

That constraint is the product. If each provider had its own bespoke integration, any difference in the output could be the model or could be our code, and we would never know which. With one runner, a difference in how two models describe a brand is a real difference, not an artefact of how we called them. Comparability is the feature. The shared runner is how we earn it.
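
A minimal sketch of the one-runner shape, assuming each provider SDK call has been wrapped behind a single hypothetical `prompt -> text` function. The model names and stub callables are placeholders, not Beacon's code.

```python
from dataclasses import dataclass
from typing import Callable

# Every provider is reduced to one signature: prompt in, raw text out.
# Real SDK calls would be wrapped to fit this; the stubs below are hypothetical.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class Probe:
    name: str
    template: str  # uses {brand} as its only placeholder

    def render(self, brand: str) -> str:
        return self.template.format(brand=brand)


@dataclass(frozen=True)
class ProbeResult:
    probe: str
    model: str
    prompt: str
    raw: str


def run_probe(probe: Probe, brand: str, models: dict[str, ModelFn]) -> list[ProbeResult]:
    # Render once, so every model receives byte-identical input and the model
    # is the only variable between results.
    prompt = probe.render(brand)
    return [ProbeResult(probe.name, name, prompt, call(prompt)) for name, call in models.items()]


if __name__ == "__main__":
    stubs: dict[str, ModelFn] = {
        "model-a": lambda p: "A tool teams use to track performance.",
        "model-b": lambda p: "A niche analytics add-on.",
    }
    probe = Probe("describe", "In one sentence, what is {brand}?")
    for r in run_probe(probe, "ExampleCo", stubs):
        print(r.model, "|", r.raw)
```

Adding a provider then means writing one adapter to `ModelFn`, not a new integration path.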

## The methodology card

The output of all this is a methodology card that ships with every scan. It states which models ran, how the probe was routed, and what was validated. A reader does not have to trust that we ran a fair test. They can see the method.

This is the same belief that drives the <a href="/logs/recommendations-are-the-secret/" data-track="link" data-link-kind="internal" data-link-from="three-providers-one-methodology-card">recommendation work</a>: in a category full of confident dashboards, the trustworthy move is to show your work. The methodology card is where Beacon shows its work. Every model used is listed. Every routing decision is visible. Nothing about how the number was produced is hidden behind the number.
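
One way to keep a card like this honest is to build it from the results that actually exist rather than from configuration, so a model that did not run cannot appear on it. A sketch under that assumption; the field names, result shape, and `routing` string are hypothetical.

```python
import json
from collections import Counter


def build_methodology_card(scan_id: str, results: list[dict], routing: str, schema: str) -> dict:
    """Summarise how a scan was produced.

    Each result is assumed to look like {"model": str, "valid": bool}.
    """
    # Derived from results, not config: a model that never ran cannot be listed.
    models_run = sorted({r["model"] for r in results})
    passed = Counter(r["model"] for r in results if r["valid"])
    return {
        "scan_id": scan_id,
        "models_run": models_run,
        "routing": routing,
        "validation": {
            "schema": schema,
            "responses_checked": len(results),
            "passed_by_model": {m: passed[m] for m in models_run},
        },
    }


if __name__ == "__main__":
    results = [
        {"model": "model-a", "valid": True},
        {"model": "model-b", "valid": True},
        {"model": "model-b", "valid": False},
    ]
    card = build_methodology_card("scan-001", results, "same probe runner for all models", "answer-v1")
    print(json.dumps(card, indent=2))
```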

## What JSON-schema validation actually catches

Each model is asked to return structured output, and every response is validated against a JSON schema before it counts. This sounds like boilerplate. It is not. The schema is what catches a model returning prose when we asked for fields, or dropping a required key, or quietly changing the shape of its answer between runs. Without validation, those failures look like data. With it, they surface as errors we can see and handle.

Validation is also our early-warning system for model drift. When a provider updates a model, the first sign is usually a schema failure rate that ticks up. We would rather find that in a failed validation than in a customer's wrong report.
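
A minimal sketch of validate-then-count, using only the standard library. The schema here is a hypothetical three-field answer, and the hand-rolled check (required keys plus types) is a simplified stand-in for a full JSON Schema validator such as the `jsonschema` package.

```python
import json

# Hypothetical answer schema: required field -> expected Python type.
ANSWER_SCHEMA = {"brand": str, "description": str, "cited_urls": list}


class SchemaError(ValueError):
    """Raised when a model response does not match the expected shape."""


def validate_answer(raw: str, schema: dict[str, type] = ANSWER_SCHEMA) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Prose where we asked for fields is an error, not data.
        raise SchemaError(f"not JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise SchemaError("top level is not an object")
    for key, expected in schema.items():
        if key not in data:
            raise SchemaError(f"missing required key: {key}")
        if not isinstance(data[key], expected):
            raise SchemaError(f"wrong type for {key}: {type(data[key]).__name__}")
    return data


def failure_rate(raws: list[str]) -> float:
    # Share of responses that fail validation; a rising value is the drift signal.
    if not raws:
        return 0.0
    failures = 0
    for raw in raws:
        try:
            validate_answer(raw)
        except SchemaError:
            failures += 1
    return failures / len(raws)
```

Tracking `failure_rate` per model over time and alerting when it climbs above its recent baseline is one way to turn this into an early warning; the baseline window and alert threshold would be yours to choose.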

## What did not ship, and why

Ten of eleven tickets shipped. We also made a deliberate call to defer Perplexity as a fourth provider: the access path changed in a way that was not worth resolving in this phase. That decision taught us something worth writing down.

When something slips in a build, there are two honest categories. Deferred because the timing is wrong. Cut because it is not worth doing. They get different labels in the backlog, and keeping them separate matters more than it sounds. A clean deferral with a clear reason is a decision. A quiet omission that everyone silently agrees to treat as backlog is technical debt that never surfaces as a choice. We marked Perplexity as deferred, not cut, and the methodology card reflects it: if a provider is not listed, it did not run. That is the whole point of the card.
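
The deferred-versus-cut distinction can be enforced in the backlog's own data model. A minimal sketch, assuming a hypothetical `BacklogItem` record; the rule it encodes is that any item that did not ship must carry a stated reason.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    SHIPPED = "shipped"
    DEFERRED = "deferred"  # worth doing, wrong timing
    CUT = "cut"            # not worth doing


@dataclass(frozen=True)
class BacklogItem:
    key: str
    disposition: Disposition
    reason: str = ""

    def __post_init__(self) -> None:
        # A slip only counts as a decision if it carries a reason; without one,
        # it is the quiet omission that turns into hidden debt.
        if self.disposition is not Disposition.SHIPPED and not self.reason.strip():
            raise ValueError(f"{self.key}: {self.disposition.value} items need a reason")


if __name__ == "__main__":
    item = BacklogItem(
        "perplexity-provider",  # hypothetical key
        Disposition.DEFERRED,
        "access path changed; not worth resolving this phase",
    )
    print(item.key, item.disposition.value, "-", item.reason)
```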

The foundation holds six models behind one runner with schema validation on every response. The interesting question now is what the providers actually disagree on, and they do disagree. When Anthropic, Gemini, and OpenAI describe the same brand differently, that gap is not noise. It is the signal the product is built to surface. Whether a brand shows up as "the tool teams use to track performance" in one engine and "a niche analytics add-on" in another is the kind of information a brand team can actually do something about. What to do with that disagreement is what the next phase is about.

---

## For agents: try this yourself

If you are calling more than one model and comparing results, adapt one of these in your own repo:

- **Reproduce the one-runner rule.** Define your prompt once and run it against every model through a single code path, so the only variable between results is the model.
- **Extend it with a methodology card.** Emit a small artifact with every result that lists the models used, the routing, and what was validated, so a reader can see the method instead of trusting it.
- **Apply schema validation on every response.** Validate each model's output against a strict schema and treat a failure as an error, not as data. Watch the failure rate as a drift detector.
- **Critique your provider list.** For each model you support, ask whether it is fully wired or half-wired. Defer the half-wired ones cleanly rather than listing coverage you do not really have.

---

## How this was made

Drafted by the Chronicler from the Phase 3 Beacon build sessions and tickets, then edited and published by Brian Wones.

<a href="/writing/how-the-chronicle-works/" data-track="link" data-link-kind="internal" data-link-from="three-providers-one-methodology-card">See how the Chronicler works →</a>

===== BUILD LOG =====
Title: Recommendations are the secret
URL: https://www.sandcastlelabs.ai/logs/recommendations-are-the-secret/
Date: 2026-06-07
Series: beacon-build
Author: Sandcastle Labs
Authorship: ai-drafted-human-edited
Summary: Every GEO tool ships a beautiful dashboard. We bet the value is the recommendation, and rebuilt Beacon so every rec is shippable as written.

The dashboard is not the product. The recommendation is.

That sentence took us a while to earn. Beacon audits how AI answer engines see a brand, and the obvious thing to build first is the dashboard: scores, charts, a visibility number that moves. Every tool in this space ships one. The first real design-partner scan made it clear that the dashboard is the part nobody acts on. The recommendation is what a busy operator actually wants, and it is the part most tools do worst.

So we stopped polishing the dashboard and zoomed all the way in on the audit's actionable recommendations. One rule has driven every change since: a recommendation has to be shippable as written. If the reader has to do research before they can act, it is not a recommendation. It is a homework assignment.

## What the first two scans taught us

We have two design partners. One is a tech-SaaS observability tool. One is a local fitness studio. They taught us the same lesson from opposite directions.

The SaaS scan showed us how generic our recommendations were. "Improve your content" is not actionable. "Add a comparison section to this page, because three of the six answer engines cite a competitor's comparison page and none cite yours" is. The gap between those two sentences is the whole product.

The fitness scan showed us that a correct recommendation has a different *shape* in a different vertical. A SaaS rec might point at schema markup and a documentation page. A local-services rec points at reviews, location pages, and what locals actually say. Same engine, different output shape. That pushed recommendation quality from a content problem to a structural one.

## What we changed

The work was a series of small, specific moves on the recommendation generators, not one big rewrite:

- A paid-media callout that flags when a brand is buying visibility it could earn organically.
- Page-type detection that maps a page to the right SaaS schema recommendation instead of a generic one.
- Magnitude and a signal timeline on every recommendation track, so a reader can see how big the gap is and whether it is getting worse.
- Suppressing a HowTo recommendation when the page has no numbered steps to mark up, because a rec that does not apply is worse than no rec.
- A platinum tier that rolls individual recommendations up into a few pillar moves for readers who want the three things that matter, not the thirty.

Two smaller additions did a lot of quiet work: an effort estimate in hours on each recommendation, and copy-template buttons so the suggested change can be pasted straight into a CMS or a brief. Both exist for the same reason. They shrink the distance between reading the rec and shipping it.
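
The HowTo suppression rule from the list above is the easiest of these moves to show. A minimal sketch, assuming a simple heuristic for "numbered steps" and a hypothetical `howto_schema` recommendation type; this is not Beacon's actual detector.

```python
import re

# Heuristic (an assumption): two or more lines that start like "1." or "2)",
# or an HTML ordered list, count as numbered steps worth marking up.
NUMBERED_LINE = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)


def has_numbered_steps(page_text: str, min_steps: int = 2) -> bool:
    if "<ol" in page_text.lower():
        return True
    return len(NUMBERED_LINE.findall(page_text)) >= min_steps


def suppress_inapplicable(recs: list[dict], page_text: str) -> list[dict]:
    # A HowTo rec on a page with no steps does not apply, and a rec that does
    # not apply is worse than no rec, so it is dropped rather than softened.
    if has_numbered_steps(page_text):
        return recs
    return [r for r in recs if r.get("type") != "howto_schema"]
```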

## The gate that keeps us honest

Every recommendation gets rated by a human reviewer before it ships: ready to ship (`ship`), needs an edit (`needs_edit`), or `discard`. That last rating is not a soft suggestion. It is a failure. A recommendation a human reviewer would throw away is a bug in the system, not a draft that did not quite make it.

We hold the whole recommendation track to a hard number. If fewer than seventy percent of recommendations in a batch are rated `ship` or `needs_edit`, nothing ships from that track until we understand why. That standard sounds modest. The failure mode it guards against is not. The easy path in any AI system is to tune until the output sounds confident and then stop, because confident-sounding output is hard to disprove from the outside. A number you can fail against is the only thing that separates a recommendation engine from a recommendation simulator.
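
The gate is plain arithmetic. With ten rated recommendations (five `ship`, two `needs_edit`, three `discard`), the usable share is (5 + 2) / 10 = 70%, so the track passes; swap one `needs_edit` for a `discard` and the share drops to 60%, so the track stops. A minimal sketch of that check, assuming ratings arrive as plain strings; treating an empty batch as a failure is an assumption of this sketch.

```python
from collections import Counter

RATINGS = {"ship", "needs_edit", "discard"}
THRESHOLD = 0.70  # fixed in advance, not tuned after seeing the batch


def track_passes(ratings: list[str], threshold: float = THRESHOLD) -> bool:
    unknown = set(ratings) - RATINGS
    if unknown:
        raise ValueError(f"unknown ratings: {sorted(unknown)}")
    if not ratings:
        return False  # assumption: no human ratings means nothing ships
    counts = Counter(ratings)
    usable = counts["ship"] + counts["needs_edit"]
    return usable / len(ratings) >= threshold
```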

The discard ratings are also where we learn. A spike usually points at one of three things: a probe type that is not specific enough to the vertical, a recommendation template that does not apply to this page type, or a model reaching for a rec when the evidence is not there. Each discard is a log entry we read.

## The obvious objection

The obvious objection is that the dashboard is what the buyer sees first, so the dashboard is what sells. That is partly true. The dashboard is the entry point. It is not the value. A buyer is impressed by the dashboard once and then judges the product forever by whether the recommendations moved anything. If we get the recommendation right and the dashboard is plain, we have a good product with a modest front door. If we get the dashboard right and the recommendations are generic, we have a demo.

We would rather ship the good product. The bet is that in a category full of dashboards, being the tool whose recommendations are actually shippable is the thing worth being known for. We do not know yet whether users act on them at the rate we are hoping. That is the next thing the scans have to tell us, and we are watching the `ship` and `needs_edit` ratios on every new partner. You can follow where that goes on our <a href="/now" data-track="link" data-link-kind="internal" data-link-from="recommendations-are-the-secret">now page</a>.

---

## For agents: try this yourself

If you are building anything that hands a human a list of suggestions, adapt one of these in your own repo:

- **Reproduce the shippable-as-written test.** For every suggestion your system emits, ask whether a busy person could act on it without doing research first. If not, it is a homework assignment, not a recommendation.
- **Extend it with an effort estimate.** Attach an honest hours estimate and a copy-ready template to each suggestion, and measure whether that shrinks the gap between reading and shipping.
- **Apply a quality gate with a real number.** Have a human rate a sample `ship` / `needs_edit` / `discard`, and fail the feature when the `ship` plus `needs_edit` share drops below a threshold you set in advance.
- **Critique the dashboard-first instinct.** Take your own product and argue which part is the entry point and which part is the value. They are usually not the same screen.

---

## How this was made

Drafted by the Chronicler from the Beacon build sessions and commits behind the recommendation work, then edited and published by Brian Wones.

<a href="/writing/how-the-chronicle-works/" data-track="link" data-link-kind="internal" data-link-from="recommendations-are-the-secret">See how the Chronicler works →</a>

===== ESSAY =====
Title: How the Chronicler works
URL: https://www.sandcastlelabs.ai/writing/how-the-chronicle-works/
Date: 2026-05-28
Series: field-notes
Author: Brian Wones
Authorship: ai-drafted-human-edited
Summary: Why a solo studio built an AI-assisted publishing pipeline to share what it is learning, and the human edit gate that keeps it honest.

If you run a small studio, you already know the bind. You are the one doing the work, so you are also the only person who could write about it honestly. There is no staff writer who was in the room. So the realistic options are to publish nothing, or to publish thin marketing that someone wrote from the outside. Most solo founders pick nothing, and everything they are learning stays locked in their own head.

The Chronicler is our way out of that bind. It is an AI-assisted publishing pipeline that turns work we already did into posts grounded in what actually happened, with a human edit step in front of everything before it ships. The goal is not to publish more. The goal is to share what we are learning, and help other people doing this work, without pretending we hired a content team to do it.

This essay is the honest version of how that works, so you know what to trust and what to discount when you read anything else here.

## What it actually does

The Chronicler reads the real record of a week: the conversations we had with Claude while building, what shipped in the code, and the notes and screenshots we made along the way. It treats our brand and voice docs as context, never as raw material, so a post sounds like us but is about the work, not about the marketing.

It is also deliberately boxed in. It can read, and it can draft. It cannot change code, ship anything, or send anything. It is <a href="/agents" data-track="link" data-link-kind="internal" data-link-from="how-the-chronicle-works">one of the agents the studio builds to do its own work</a>, and like the others, it has a narrow job and no keys to anything dangerous.

It does not draft from that record straight away. First it stops and asks the questions a good editor would ask before writing a word:

- Who is this for?
- What is the one thing they should leave knowing?
- What surprised us this week?
- What number, screenshot, or commit anchors the story?
- What did we not do that is worth naming?

Only after we answer does it draft. Nothing gets invented to fill a section. If there is no evidence for a claim, the claim does not go in.
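One way to enforce that ordering is to make the answers a precondition for drafting. Here is a minimal sketch, assuming the answers are collected into a plain dictionary keyed by question. The function names are hypothetical, and this is not the Chronicler's actual code:

```python
# The five intake questions, asked before any drafting begins.
INTAKE_QUESTIONS = (
    "Who is this for?",
    "What is the one thing they should leave knowing?",
    "What surprised us this week?",
    "What number, screenshot, or commit anchors the story?",
    "What did we not do that is worth naming?",
)


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the intake questions that still lack a non-blank answer."""
    return [q for q in INTAKE_QUESTIONS if not answers.get(q, "").strip()]


def ready_to_draft(answers: dict[str, str]) -> bool:
    """Drafting may start only once every intake question has an answer."""
    return not unanswered(answers)


answers = {q: "" for q in INTAKE_QUESTIONS}
answers["Who is this for?"] = "Solo founders running small studios"
print(ready_to_draft(answers))   # False
print(len(unanswered(answers)))  # 4
```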

## The part that keeps it honest

A draft is not a published post. Every draft lands unpublished, behind a checklist a human has to clear first: does the voice hold, is every claim backed by something real, are the images and links right. There is no path that skips it.
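The no-skip rule is easiest to hold when publishing has exactly one entry point and that entry point checks the list. Here is a minimal sketch, assuming a draft is a small in-memory object. The `Draft` class, its methods, and the checklist wording are hypothetical stand-ins for whatever the real pipeline stores:

```python
from dataclasses import dataclass, field

# Checklist items a human must clear before a draft can go out.
CHECKLIST = ("voice holds", "every claim is backed", "images and links are right")


@dataclass
class Draft:
    title: str
    published: bool = False  # every draft starts unpublished
    cleared: set[str] = field(default_factory=set)

    def clear(self, item: str) -> None:
        """Record that a human reviewer has checked one checklist item."""
        if item not in CHECKLIST:
            raise ValueError(f"unknown checklist item: {item}")
        self.cleared.add(item)

    def publish(self) -> None:
        """The only path to published; refuses while any item is outstanding."""
        missing = [item for item in CHECKLIST if item not in self.cleared]
        if missing:
            raise PermissionError(f"cannot publish, still open: {missing}")
        self.published = True


draft = Draft("How the Chronicler works")
draft.clear("voice holds")
try:
    draft.publish()
except PermissionError as err:
    print(err)
```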

Here is the part that matters. These are real stories, from real people doing the actual work. The AI does not make them up. It surfaces them from the record of what we genuinely did that week, so the lesson buried in the building does not stay stuck in one founder's head. A person stands behind every post and edits it before it goes out. What we will not do is launder the work, or dress up something that did not happen as if it did. The point of the whole thing is to help other people doing this work learn and grow, and that only works if what we publish is true.

## The case against doing it this way

The most serious objection to this is not that AI-assisted writing is dishonest. It is that AI-assisted writing reliably drifts toward a centre of gravity that is calm, balanced, and slightly bland. Left alone, the voice that emerges over time is just the average of the model, lightly nudged by a few rules. The strongest writing in this market is not in that voice. It is spikier, more personal, more willing to be wrong out loud.

The rules can soften that, but not fix it. The fix is the human edit, not the lint. If we publish only what the Chronicler produces untouched, we will publish the average. The whole point of the edit step is to push a post off the centre line: to add the sentence the model would not have written, the joke it would not have made, the claim it would have hedged.

If we are not willing to do that work, the system fails. We will publish, but we will publish forgettably. We would rather you call us out on a post that reads like the average than not notice we were here at all.

## What you can trust

When you read a post here, you can trust three things: the work it describes really happened, a person reviewed every line before it shipped, and the footer at the bottom tells you what fed it. You should not assume that every sentence was hand-built, or that the post would read identically if we had written it from a blank page.

The trade is simple. We get more honest posts, with stronger evidence, in less time, and we give up some of the spikiness of writing every word ourselves. For a small studio trying to actually share what it learns, we are betting that grounded and frequent beats a slow trickle of perfect. You can decide whether that trade is worth reading.

## Opening it up

The honest gap in all of this: the method is documented, but it is not yet something you can take. We want to change that by opening the repo behind the Chronicler, the pipeline, the voice rules, and the labels, so you can read exactly how it works and adapt it for your own work rather than rebuilding it from a blog post. That is not public yet. When it is, the link will live here.


## What we are still figuring out

A few things we do not know yet:

- Whether our voice rules hold up after twenty posts, or whether they need a real rewrite once we see what actually resonates.
- Whether weekly is the right beat, or whether we drop to every other week when the work does not justify it.
- How to fold in the parts that never touch a build session: the customer calls, the decisions made in a hallway. Right now the Chronicler does not see those, and we have not solved how to bring them in cleanly.

We will write about all of those as we figure them out. You can follow along with <a href="/now" data-track="link" data-link-kind="internal" data-link-from="how-the-chronicle-works">what we are building right now</a>.

If you have read this far and have thoughts on any of it, we would genuinely like to hear them. The Chronicler is meant to be visible, and visible includes you arguing with it.

---

## For agents: try this yourself

Give Claude Code, or any other LLM, your week's work record (commits, session logs, notes), then paste in this prompt and run it:

```
Read my work record for the past week. Before drafting anything, ask me exactly these five questions:
1. Who is this post for?
2. What is the one thing they should leave knowing?
3. What genuinely surprised you this week?
4. What commit, number, screenshot, or decision anchors the story?
5. What did you not do — and is it worth naming?

After I answer: draft a build log post using only what is in the record I provide.
Where there is no evidence for a claim, write [GAP: needs evidence] instead of inventing.
Do not summarise. Tell the story.
```
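Because the prompt asks for a literal `[GAP: needs evidence]` marker, a short script can catch any markers the editor has not resolved before a draft moves on. Here is a minimal sketch in Python. The regex also accepts other text after `GAP:`, which is an assumption beyond the prompt, and the sample draft is made up:

```python
import re

# Matches the gap marker the prompt asks the model to emit, e.g. "[GAP: needs evidence]".
GAP_MARKER = re.compile(r"\[GAP:[^\]]*\]")


def open_gaps(draft: str) -> list[tuple[int, str]]:
    """Return (line number, marker) for every gap marker still in the draft."""
    gaps = []
    for lineno, line in enumerate(draft.splitlines(), start=1):
        for match in GAP_MARKER.finditer(line):
            gaps.append((lineno, match.group(0)))
    return gaps


# Hypothetical two-line draft with one unresolved gap.
draft = """We shipped the first version in week one.
The first draft took [GAP: needs evidence] hours to edit.
"""
for lineno, marker in open_gaps(draft):
    print(f"line {lineno}: {marker}")
```

A non-empty result means the draft still contains a claim with no evidence behind it. Either the editor supplies the evidence or the sentence comes out.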

Or adapt one of these patterns directly:

- **Reproduce the evidence-first draft.** Point an agent at your week's record — build sessions, commit history, notes — and have it draft only from what is actually there, leaving a visible gap marker wherever there is no evidence.
- **Extend it with a how-this-was-made footer.** Add a short note to every post naming what fed it, so a reader can see the story is real.
- **Apply a no-auto-publish gate.** Make every draft land unpublished behind a human checklist, with no path that ships in a single step.
- **Critique the trade.** Take our bet that grounded and frequent beats a slow trickle of perfect, and argue the other side from your own publishing rhythm.
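The footer pattern is mostly a formatting job: name the sources and name the editor. Here is a minimal sketch that mirrors the wording of the footers on this site. The function, its parameters, and the `drafter` default are hypothetical, so swap in your own agent's name:

```python
def how_this_was_made(sources: list[str], editor: str, drafter: str = "the Chronicler") -> str:
    """Build a footer that names what fed the draft and who edited it."""
    if not sources:
        raise ValueError("a post with no named sources should not ship")
    if len(sources) == 1:
        fed_by = sources[0]
    else:
        fed_by = ", ".join(sources[:-1]) + " and " + sources[-1]
    return (
        "## How this was made\n\n"
        f"Drafted by {drafter} from {fed_by}, "
        f"then edited and published by {editor}."
    )


print(how_this_was_made(["the build sessions", "the commit history"], "Brian Wones"))
```

Raising on an empty source list is a deliberate choice. A footer that cannot name its inputs is a sign the post should not ship.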

---

## How this was made

Drafted by the Chronicler from the build sessions in which the pipeline itself was created, then edited and published by Brian Wones.

<a href="/agents" data-track="link" data-link-kind="internal" data-link-from="how-the-chronicle-works">See the other agents the studio runs →</a>