Canaryflux

Why we built Canaryflux.

This is a launch post, so let me be honest about why this product exists.

We kept watching the same thing happen on shipping teams. The team would push a Friday-night fix, the deploy would go green in CI, every Playwright test would pass, every screenshot diff would come back clean. Then on Monday morning a customer would DM a screenshot of the signup button cut in half on a Pixel 5 — and nobody on the team could reproduce it on their laptop.

The reason is depressing once you see it: "works on my machine" is even more true now than it was in 2010. Modern CI runs Chrome in a Linux container at a desktop viewport. Lighthouse runs Chrome in a Linux container at a simulated mobile viewport. Visual regression tools run Chrome in a Linux container at a recorded viewport. At best, they simulate what a phone sees. None of them is a phone.

Real visitors are using two-year-old Androids on intermittent connections, with a system font that doesn't match yours, with an OS that aggressively pauses background tabs, with a webview that the manufacturer hasn't updated in eight months. That's the test environment. CI is the wishful one.

The gap we kept hitting

If you've shipped a marketing site in the last two years, you've felt this. It looks like:

None of these show up in Lighthouse. None of them show up in your visual regression diff. They show up only when you point a real device at the page and use it like a customer would.

The two options teams already had

Until recently, you had two ways to catch this kind of bug.

Option A: a manual QA team with a device lab. This works. It also costs you permanent headcount, a closet full of phones, and the operational overhead of keeping them all charged and updated. Most companies under a hundred engineers won't do this, and the ones that do run out of QA-team capacity before they run out of engineering capacity.

Option B: a device-cloud test-script suite. Real hardware in the cloud, but you have to write and maintain the tests. Coverage scales with engineering time you don't have, and a framework migration can invalidate scripts you wrote months ago.

Both options assume the QA bottleneck is finding bugs. From years of watching shipping teams, we became convinced the actual bottleneck is something earlier: nobody is even looking at the rendered page on a real phone before it ships. Not because they don't want to. Because the cost of looking is too high.

What Canaryflux does instead

We flipped the constraint. Instead of asking the team to write tests, we ask the team to paste a URL. The scanner does the rest:

What you get back is a list of ranked findings, each one with a screenshot of the actual bug, the device it surfaced on, and a copy-pasteable fix. In our internal scans against public marketing sites, a typical run surfaces a handful of findings — sometimes more on busier pages.

Paste a URL. Get the report in about ninety seconds. No SDK to install, no snippet to embed, no DNS changes. If your site renders HTML, we work with it.

What we got wrong, and what we changed

The first version of Canaryflux just dumped raw model output into a list. We thought "device profiles + AI vision" was the whole product. It wasn't. The list was 60% noise. Roughly half of that noise was the model hedging ("this could be a layout issue"), and the rest was issues that were technically present but not user-impacting.

The fix was the verification stage. Every candidate finding goes through a second pass that re-examines the screenshot with a stricter prompt. If the second pass can't confidently confirm the bug from visible evidence, the finding is dropped. That single change cut a typical scan from a couple dozen noisy candidates to a handful of confirmed findings, and that handful is the part you actually want to read.

What's not in v1, but is on the roadmap

We're deliberately small at launch. Real product, narrow scope.

If you're a team that ships a marketing site

Canaryflux is built specifically for you. Not for native-app QA, not for full integration testing, not as a replacement for your CI suite. It's a check you run after every deploy to make sure the page you just shipped looks right on the phones your customers actually own.

The Free plan gives you three scans a month. Paste a URL — your own site, a staging deploy, a competitor's site, anything public — and see what comes back.

We'd rather you find the bug than your customer.

Run your first scan in minutes.

No SDK. No credit card. Paste a URL you own, pick a device, see what comes back.