Recommendations are the secret

Jun 7, 2026

The dashboard is not the product. The recommendation is.

That sentence took us a while to earn. Beacon audits how AI answer engines see a brand, and the obvious thing to build first is the dashboard: scores, charts, a visibility number that moves. Every tool in this space ships one. The first real design-partner scan made it clear that the dashboard is the part nobody acts on. The recommendation is what a busy operator actually wants, and it is the part most tools do worst.

So we stopped polishing the dashboard and zoomed all the way in on the audit's actionable recommendations. One rule has driven every change since: a recommendation has to be shippable as written. If the reader has to do research before they can act, it is not a recommendation, it is a homework assignment.

What the first two scans taught us

We have two design partners. One is a SaaS observability tool. The other is a local fitness studio. They taught us the same lesson from opposite directions.

The SaaS scan showed us how generic our recommendations were. “Improve your content” is not actionable. “Add a comparison section to this page, because three of the six answer engines cite a competitor’s comparison page and none cite yours” is. The gap between those two sentences is the whole product.

The fitness scan showed us that a correct recommendation has a different shape in a different vertical. A SaaS rec might point at schema markup and a documentation page. A local-services rec points at reviews, location pages, and what locals actually say. Same engine, different output shape. That pushed recommendation quality from a content problem to a structural one.

What we changed

The work was a series of small, specific moves on the recommendation generators, not one big rewrite:

A paid-media callout that flags when a brand is buying visibility it could earn organically.
Page-type detection that maps a page to the right SaaS schema recommendation instead of a generic one.
Magnitude and a signal timeline on every recommendation track, so a reader can see how big the gap is and whether it is getting worse.
Suppressing a HowTo recommendation when the page has no numbered steps to mark up, because a rec that does not apply is worse than no rec.
A platinum tier that rolls individual recommendations up into a few pillar moves for readers who want the three things that matter, not the thirty.
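The HowTo suppression rule in the list above can be sketched as a guard that runs before the generator. This is a minimal illustration, not Beacon's implementation: the function names, the regex, and the two-step minimum are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical pattern: a line that opens with "1.", "2)", or "Step 3".
STEP_PATTERN = re.compile(
    r"^\s*(?:step\s+\d+\b|\d+[.)]\s+)",
    re.IGNORECASE | re.MULTILINE,
)


def has_numbered_steps(page_text: str, min_steps: int = 2) -> bool:
    """Return True when the page has at least min_steps numbered steps."""
    return len(STEP_PATTERN.findall(page_text)) >= min_steps


def maybe_recommend_howto(page_text: str) -> Optional[str]:
    """Emit a HowTo recommendation only when there are steps to mark up."""
    if not has_numbered_steps(page_text):
        # Suppressed: a rec that does not apply is worse than no rec.
        return None
    return "Add HowTo schema markup to the numbered steps on this page."
```

Returning None rather than a low-priority rec keeps the inapplicable suggestion out of the list entirely, which matches the rule as stated.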

Two smaller additions did a lot of quiet work: an effort estimate in hours on each recommendation, and copy-template buttons so the suggested change can be pasted straight into a CMS or a brief. Both exist for the same reason. They shrink the distance between reading the rec and shipping it.
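One way to picture those two additions is as fields on the recommendation record itself. The sketch below is an assumption about shape, not Beacon's schema: the class name, field names, and brief layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    """Hypothetical record shape; every field name here is illustrative."""

    title: str
    evidence: str        # why the rec exists, stated so no research is needed
    effort_hours: float  # honest estimate of the time needed to ship it
    copy_template: str   # text the reader can paste into a CMS or a brief

    def to_brief(self) -> str:
        """Render the rec as a paste-ready brief."""
        return (
            f"{self.title} (about {self.effort_hours:g}h)\n"
            f"Why: {self.evidence}\n"
            f"Paste-ready copy:\n{self.copy_template}"
        )
```

Keeping the estimate and the template on the same record as the evidence means a single call produces everything a reader needs to act.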

The gate that keeps us honest

Every recommendation gets rated by a human reviewer before it ships: ready to ship, needs an edit, or discard. That last rating is not a soft suggestion. It is a failure. A recommendation a human reviewer would throw away is a bug in the system, not a draft that did not quite make it.

We hold the whole recommendation track to a hard number. If fewer than seventy percent of recommendations in a batch are rated ship or needs_edit, nothing ships from that track until we understand why. That standard sounds modest. The failure mode it guards against is not. The easy path in any AI system is to tune until the output sounds confident, and then stop, because confident-sounding output is hard to disprove from the outside. A number you can fail against is the only thing that separates a recommendation engine from a recommendation simulator.
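The gate itself is a few lines of arithmetic. The sketch below assumes ratings arrive as a list of the three labels used here. The function name, the treatment of an empty batch, and the integer comparison are choices made for the example, not a description of Beacon's code.

```python
from collections import Counter

PASSING = {"ship", "needs_edit"}
VALID = PASSING | {"discard"}


def gate_passes(ratings: list[str], threshold_percent: int = 70) -> bool:
    """Return True when ship + needs_edit make up at least threshold_percent."""
    if not ratings:
        return False  # assumption: an unrated batch never ships
    unknown = set(ratings) - VALID
    if unknown:
        raise ValueError(f"unknown ratings: {sorted(unknown)}")
    counts = Counter(ratings)
    passing = sum(counts[label] for label in PASSING)
    # Integer comparison avoids floating-point edge cases at exactly 70%.
    return passing * 100 >= threshold_percent * len(ratings)
```

A batch of ten with five ship, two needs_edit, and three discard passes at exactly seventy percent. Move one more rating to discard and the whole track is held.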

The discard ratings are also where we learn. A spike usually points at one of three things: a probe type that is not specific enough to the vertical, a recommendation template that does not apply to this page type, or a model reaching for a rec when the evidence is not there. Each discard is a log entry we read.
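If reviewers tag each discard with a cause, reading a spike becomes a simple tally. The sketch below assumes such a tag exists. The cause labels and the dictionary shape are hypothetical, chosen to mirror the three causes named above.

```python
from collections import Counter

# Hypothetical labels for the three causes described above.
CAUSES = (
    "probe_not_vertical_specific",
    "template_wrong_page_type",
    "insufficient_evidence",
)


def discard_breakdown(discards: list[dict]) -> list[tuple[str, int]]:
    """Count discards by cause, most frequent first."""
    counts = Counter(d.get("cause", "untagged") for d in discards)
    return counts.most_common()
```

Untagged discards are counted rather than dropped, so a gap in reviewer tagging shows up in the same report as the causes themselves.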

The obvious objection

The obvious objection is that the dashboard is what the buyer sees first, so the dashboard is what sells. That is partly true. The dashboard is the entry point. It is not the value. A buyer is impressed by the dashboard once and then judges the product forever by whether the recommendations moved anything. If we get the recommendation right and the dashboard is plain, we have a good product with a modest front door. If we get the dashboard right and the recommendations are generic, we have a demo.

We would rather ship the good product. The bet is that in a category full of dashboards, being the tool whose recommendations are actually shippable is the thing worth being known for. We do not know yet whether users act on them at the rate we are hoping. That is the next thing the scans have to tell us, and we are watching the ship and needs_edit ratios on every new partner. You can follow where that goes on our now page.

For agents: try this yourself

If you are building anything that hands a human a list of suggestions, adapt one of these in your own repo:

Reproduce the shippable-as-written test. For every suggestion your system emits, ask whether a busy person could act on it without doing research first. If not, it is a homework assignment, not a recommendation.
Extend it with an effort estimate. Attach an honest hours estimate and a copy-ready template to each suggestion, and measure whether that shrinks the gap between reading and shipping.
Apply a quality gate with a real number. Have a human rate a sample as ship, needs_edit, or discard, and fail the feature when the combined share of ship and needs_edit drops below a threshold you set in advance.
Critique the dashboard-first instinct. Take your own product and argue which part is the entry point and which part is the value. They are usually not the same screen.

How this was made

Drafted by the Chronicler from the Beacon build sessions and commits behind the recommendation work, then edited and published by Brian Wones.
