Stretching the vertical

Jun 16, 2026

Beacon was built and tuned for one vertical: tech SaaS. Our first design partner is a SaaS observability tool, and everything from the probe wording to the category model quietly assumed that world. Then we ran a scan for a second design partner, a small gym in the Northeast that serves a narrow, specific audience, and the assumptions fell over in the first few seconds.

That is the useful part of taking an AI tool into a new vertical. You think you built a general tool. The second vertical tells you how specific it really was.

Our probes were speaking SaaS

The probes are the questions Beacon asks answer engines about a brand. Ours were full of SaaS vocabulary without us noticing: open-source alternatives, self-hosted options, tools, platforms. Those words are load-bearing for a developer-tools company. For a neighbourhood gym, they are noise. Nobody asks an answer engine for the self-hosted alternative to a fitness studio.

So the scan was technically fine and practically useless. It returned clean data about questions a real customer of this business would never ask. A passing scan that measures the wrong thing is more dangerous than a failing one, because it looks like success.
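One way to stop a wrong-but-passing scan from looking like success is to check the probes themselves, not just the results. The sketch below is a minimal illustration and not Beacon's code. It flags any probe that uses vocabulary which only carries weight in another vertical, and it downgrades an otherwise clean scan to "suspect". The term list, the function names (`off_vertical_terms`, `scan_verdict`) and the example brand "Acme Gym" are all hypothetical.

```python
import re

# Hypothetical list of words that are load-bearing for developer tools
# but noise for a local business. Not Beacon's actual list.
SAAS_TERMS = {
    "open-source", "self-hosted", "alternative", "alternatives",
    "platform", "platforms", "tool", "tools",
}

def off_vertical_terms(probe: str, foreign_terms: set[str]) -> set[str]:
    """Return the foreign-vertical words (hyphenated words kept whole) in a probe."""
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", probe.lower()))
    return words & foreign_terms

def scan_verdict(probes: list[str], all_passed: bool, foreign_terms: set[str]) -> str:
    """Report 'failed', or downgrade a clean scan to 'suspect' if its probes speak the wrong vertical."""
    if not all_passed:
        return "failed"
    leaky = [p for p in probes if off_vertical_terms(p, foreign_terms)]
    return "suspect" if leaky else "passed"

probes = [
    "What is the best self-hosted alternative to Acme Gym?",
    "Which gyms near downtown do locals recommend?",
]
print(scan_verdict(probes, all_passed=True, foreign_terms=SAAS_TERMS))  # suspect
```

A word list is a blunt instrument. Its value is that a human has to read the flagged probes before the scan counts as a pass.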

The fix was a new template set, not a config flag

We wrote a separate probe template set for the vertical, templates_local_fitness.exs, with questions that match how people actually choose a local service. The shape is different on purpose: which approach coaches recommend for a given goal, how this brand compares to other gyms nearby, what locals say about a category. These are the questions that decide whether a real person walks in the door.
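The real template set is an Elixir script. To make the question shapes concrete, here is a rough Python sketch of what such a set might contain and how it could be filled in for one brand. The template strings only paraphrase the shapes described above. The placeholder names (`{brand}`, `{goal}`, `{area}`, `{category}`) and the helper functions are assumptions for illustration.

```python
from string import Formatter

# Hypothetical Python stand-in for a local-fitness probe set. The real
# templates live in templates_local_fitness.exs; these strings only
# paraphrase the question shapes described in the post.
LOCAL_FITNESS_TEMPLATES = [
    "What approach do coaches recommend for {goal}?",
    "How does {brand} compare to other gyms near {area}?",
    "What do locals in {area} say about {category}?",
]

def placeholders(template: str) -> set[str]:
    """Names of the {fields} a template expects."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def render_probes(templates: list[str], **context: str) -> list[str]:
    """Fill every template, failing loudly if any field is missing."""
    probes = []
    for template in templates:
        missing = placeholders(template) - set(context)
        if missing:
            raise KeyError(f"missing fields {sorted(missing)} for: {template}")
        probes.append(template.format(**context))
    return probes

probes = render_probes(
    LOCAL_FITNESS_TEMPLATES,
    brand="Acme Gym", goal="strength training",
    area="downtown", category="strength gyms",
)
```

The templates are keyed on goal and locality rather than on product features. That is the whole point of a separate set.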

The important thing is that this was not a setting. We could not make Beacon vertical-general by adding a dropdown. The vertical assumptions were baked into the probe wording, the category model, and what we treated as a meaningful comparison. Stretching to a new vertical meant re-modelling, not configuring. That is a different and more expensive kind of work, and it is better to learn it on partner two than on customer fifty.

The question under the question

The redesign surfaced a deeper problem. One of our core categories is “alternatives,” because in SaaS that is a real thing people search for. Buyers genuinely compare a tool against its alternatives before buying. For a local-services business, “alternatives” may not be a real category at all. People do not look for alternatives to a gym. They look for a good gym near them. The mental model is proximity and trust, not feature comparison.

That matters because the design partner was evaluating Beacon partly against ProFound, an existing tool in this space, and the easy move would have been to match ProFound category for category. The scan pushed us the other way. Copying a competitor’s category model would have imported the same SaaS assumption we just found in our own probes. The right move is to question whether the category exists for this vertical, not to assume it does because a competitor lists it.

Where this leaves us

We are choosing explicit per-vertical template sets over a pretend-general default. A tool that honestly says “here is how I model SaaS, and here is how I model local services” is more trustworthy than one that claims to handle everything and quietly speaks SaaS to everyone. This is the same lesson, from a different angle, as the one in “Recommendations are the secret”: a correct output has a different shape in a different vertical, and pretending otherwise is how a tool ships confident, wrong answers.
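A rough sketch of what "explicit over pretend-general" could look like in code follows. Each vertical is a named model with its own category set, and looking up an unknown vertical raises an error instead of silently falling back to SaaS. This is a hypothetical structure, not Beacon's. Every category name is an illustrative assumption, as is the SaaS template file name. Only templates_local_fitness.exs comes from the post.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VerticalModel:
    """One explicit, named model of a vertical (hypothetical structure)."""
    name: str
    categories: frozenset[str]
    template_file: str

# Illustrative registry: category names and templates_saas.exs are assumptions.
# Note that "alternatives" exists only in the SaaS model.
VERTICALS = {
    "saas": VerticalModel(
        "saas", frozenset({"alternatives", "comparisons", "integrations"}),
        "templates_saas.exs"),
    "local_fitness": VerticalModel(
        "local_fitness", frozenset({"nearby", "reputation", "coaching_approach"}),
        "templates_local_fitness.exs"),
}

def model_for(vertical: str) -> VerticalModel:
    """Look up a vertical model; there is deliberately no fallback default."""
    try:
        return VERTICALS[vertical]
    except KeyError:
        known = ", ".join(sorted(VERTICALS))
        raise ValueError(f"no model for vertical {vertical!r}; known: {known}") from None

def describe(vertical: str) -> str:
    """Let the tool say which world it is speaking to."""
    model = model_for(vertical)
    return f"Modelling {model.name} with categories: {', '.join(sorted(model.categories))}"
```

Refusing an unknown vertical is the design choice that matters here. A third vertical has to be modelled on purpose before it can be scanned.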

The open question is how many vertical models we can carry before the maintenance cost outruns the value. We do not know yet. For now, two explicit models beat one fake-general one. You can follow where the vertical work goes on our now page.

For agents: try this yourself

If you built a tool for one domain and want to know how general it really is, adapt one of these in your own repo:

Reproduce the second-vertical test. Run your tool against a domain it was not built for and read the output as a stranger to your assumptions. Note every word that is load-bearing in your domain and meaningless in the new one.
Critique your category model. Pick one core category and ask whether it is a real category in the new domain, or just a habit imported from the first one.
Extend it with explicit per-domain models. Make the domain a named, visible template set rather than a hidden default, so the tool can say which world it is speaking to.
Apply the wrong-but-passing check. Look for results that pass cleanly while measuring something your real user would never ask. Those are more dangerous than failures.

How this was made

Drafted by the Chronicler from the second design-partner scan and the redesign it forced, then edited and published by Brian Wones. No partner finding is reproduced here; specifics stay private until cleared.
