Canary Waves: Building Voice AI for Industrial Safety
How we built Canary Waves, the safety constraints that shaped every decision, and what it took to hit accuracy high enough to deploy in live operations.

The Problem Is Not the Technology
Most voice AI products are built around convenience. A misunderstood command means a wrong song plays, or a calendar event lands on the wrong day. Annoying. Fixable.
Industrial facilities are different. A misheard command near heavy equipment, live electrical systems, or pressurized lines does not produce a support ticket. It produces a shutdown, an injury, or worse.
That is the environment Canary Waves was built for. And it changed almost every technical decision we made.
The client came to us with a real operational problem. Their facilities relied on manual communication between operators and control systems. Voice was the natural interface because operators had their hands full. But existing voice tools were not built for factory-floor noise levels, safety-critical state changes, or the liability that comes with getting it wrong.
They needed AI voice interaction that could operate in real conditions. Not demo conditions.
Defining the Constraints Before Writing a Line of Code
Before we touched any model configuration or interface design, we sat down with the client to define what "wrong" actually meant in their context.
This is a step a lot of builds skip. They treat accuracy as a dial to turn up later. In safety-critical systems, accuracy is a constraint you design around from the start.
We identified three categories of failure:
Category 1: Wrong command executed. The system hears one command, executes another. This is the worst case. It had to be near-impossible.
Category 2: Low-confidence command executed. The system is not sure what it heard, but executes anyway because something is better than nothing. Also unacceptable.
Category 3: Valid command rejected. The system is too conservative and rejects a legitimate operator input. Frustrating, but recoverable. An operator can repeat themselves. They cannot undo a triggered system state.
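As a concrete illustration of how that ranking drives design, here is a minimal Python sketch, with placeholder names rather than the production code, of a policy that only ever permits the third category of failure:

```python
from enum import Enum

class FailureMode(Enum):
    WRONG_COMMAND_EXECUTED = 1    # Category 1: must be near-impossible
    LOW_CONFIDENCE_EXECUTED = 2   # Category 2: never act on a guess
    VALID_COMMAND_REJECTED = 3    # Category 3: recoverable, the operator repeats

# The only failure mode the system is allowed to produce on its own
# is a rejection: when in doubt, do nothing and ask again.
def decide(confidence: float, threshold: float) -> str:
    if confidence < threshold:
        return "reject"   # accepts the risk of FailureMode.VALID_COMMAND_REJECTED
    return "proceed"      # acts only when the score clears the bar
```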
Once we had those categories ranked, the architecture decisions followed naturally.
The Three Safety Layers We Built Into Canary Waves
1. A Rejection Threshold Calibrated to the Real Environment
We could not tune the model in a quiet office and ship it to a facility floor. The acoustic environment was completely different. Background machinery, ventilation systems, metal-on-metal contact, radio chatter from other workers.
We collected ambient audio samples from the actual facility and used them to stress-test the recognition pipeline. The rejection threshold, the confidence score below which the system refuses to act, was set against that noise profile, not against clean studio audio.
The result was a higher rejection rate than a typical voice assistant would tolerate. That was intentional. Silence is safer than a guess.
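A simplified sketch of that calibration step, assuming a recognizer that returns a transcript plus a confidence score and a helper that overlays recorded facility noise onto test clips (mix and recognize here are placeholders, not real library calls):

```python
def score_against_noise(command_clips, noise_clips, mix, recognize):
    """Run the recognizer on command clips overlaid with facility ambience.
    Returns (confidence, was_correct) pairs for threshold selection."""
    scored = []
    for clip, expected_text in command_clips:
        for noise in noise_clips:
            text, confidence = recognize(mix(clip, noise))
            scored.append((confidence, text == expected_text))
    return scored

def pick_rejection_threshold(scored, max_false_accepts=0):
    """Lowest threshold that keeps false executions within budget."""
    for threshold in sorted({conf for conf, _ in scored}):  # most permissive first
        false_accepts = sum(
            1 for conf, correct in scored if conf >= threshold and not correct
        )
        if false_accepts <= max_false_accepts:
            return threshold
    return 1.0  # nothing qualifies: reject everything until the pipeline is retuned
```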
2. Confirmation Loops for State-Changing Commands
Not every command carries the same risk. Asking the system for a status reading is low stakes. Telling the system to shut down a line, release a valve, or change an equipment state is a different matter.
For any command that touched physical system state, we built a mandatory confirmation loop. The system would read back what it heard and the action it was about to take. The operator had to confirm verbally before execution.
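In outline, the loop looks something like the sketch below; the command names and callbacks are illustrative, not the actual interface:

```python
STATE_CHANGING = {"shut_down_line", "release_valve", "change_equipment_state"}

def handle_command(command, confidence, threshold, speak, listen, execute):
    """Illustrative confirmation loop for state-changing commands."""
    if confidence < threshold:
        speak("I did not catch that clearly. Please repeat.")
        return
    if command.name in STATE_CHANGING:
        # Read back exactly what was heard and what is about to happen.
        speak(f"I heard: {command.readback()}. Say 'confirm' to proceed.")
        reply, reply_confidence = listen()
        if reply != "confirm" or reply_confidence < threshold:
            speak("Not confirmed. No action taken.")
            return  # doing nothing is the safe outcome here
    execute(command)
```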
This added latency. The client pushed back on it during testing. We held the line. In the three months since deployment, that confirmation loop has caught two instances where an operator misspoke and the system heard a state-change command that was not intended. Neither incident escalated. Both were caught at the confirmation step.
3. Safe-State Fallback, Not Last-Known-State Fallback
System failures happen. Network drops, model timeouts, ambiguous input streams that stall processing. The question is: what does the system do when it cannot proceed?
The common default in software is to hold the last known state. In most applications, that is fine.
In an industrial environment, last known state might mean a piece of equipment stays energized, or a process stays running, when it should not. We built Canary Waves to default to the predefined safe state for each system, not the last known state. Every connected system had a defined safe configuration. On failure or uncertainty, the AI returned to that configuration and flagged the operator.
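Conceptually, the fallback behaves like this sketch, where each connected system registers an explicit safe configuration up front (identifiers and callbacks are placeholders):

```python
# Each connected system registers a predefined safe configuration at setup.
SAFE_STATES = {
    "conveyor_line_3": {"power": "off", "speed": 0},
    "valve_bank_a": {"position": "closed"},
}

def on_failure(system_id, reason, apply_state, notify_operator):
    """On timeouts, dropped connections, or ambiguous input, return the
    system to its safe configuration instead of holding the last known state."""
    safe_config = SAFE_STATES.get(system_id)
    if safe_config is None:
        notify_operator(f"{system_id}: no safe state defined, manual intervention required")
        return
    apply_state(system_id, safe_config)
    notify_operator(f"{system_id} moved to safe state ({reason}); awaiting operator review")
```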
What It Actually Took to Hit Deployable Accuracy
Honestly, it took longer than we projected.
The first round of real-environment testing revealed recognition gaps we had not seen in lab conditions. Specific command phrases that the model confused with each other at certain noise levels. Operator accents and speaking patterns that skewed confidence scores. Edge cases in the confirmation loop where the system misheard the confirmation itself.
We went through four iterations of threshold tuning, two rounds of phrase set revision with the client's ops team, and one full rebuild of the confirmation flow before we were comfortable calling it production-ready.
The client had a go-live date. We pushed it back. That conversation was uncomfortable. But shipping at 94% accuracy in this context was a compromise we were not willing to make.
By final deployment, the system was operating above 99.2% command accuracy in live facility conditions, with a false-execution rate of zero across the testing period.
What You Can Take From This
If you are building AI into any environment where a wrong output has physical consequences, start with failure mode analysis before you touch the model.
Ask: what does wrong look like here? Rank those failure modes. Design your rejection thresholds, confirmation flows, and fallback states around that ranking, not around what makes the demo feel fast.
Accuracy is not a feature you add. It is the constraint everything else is built inside.


