134 finance pros watched an AI agent refuse to make up a number
Last Wednesday afternoon, 134 finance professionals sat on a Zoom call watching a Copilot agent get asked a variance question.
The agent paused. Then it answered.
"No matching finance workbook report surfaced… that means I can't calculate the variance figures yet, and I won't invent numbers."
An AI agent – the same kind of tool half the internet says makes things up all the time and cannot be trusted, the same kind that most Deloitte-surveyed leaders struggle to govern – chose to stop instead of hallucinating.
The reason it stopped wasn't a super-advanced model or a special design. It was one line that Mateusz, one of my AI Finance Club experts, had written into the agent's instructions 20 minutes earlier: if the data isn't there, stop and don't guess.
It is really important that you build this safety into all of your AI instructions.
So today, I am going to show you exactly how to do this, so you can feel more confident about using AI safely.
Why "AI makes mistakes" is the wrong way to think about hallucinations
Deloitte just surveyed 3,235 senior leaders for their 2026 State of AI report.
And you know what’s crazy? Only 1 in 5 has a mature governance model for agentic AI.
That means roughly 2,588 of them are letting agents run without mature rules. So there is a good chance your AI is just guessing. And that guess is going out under your name.
The problem with how most finance leaders talk about AI hallucination is that they treat it like the weather.
Something that happens to us. Something we review retrospectively: "Sorry, the AI got that one wrong." But then we go back to using it the exact same way.
But the models aren't randomly wrong. They are trained to please you. In training they were rewarded for giving an answer, any answer, and rarely rewarded for saying "I don't know." So when the data isn't there, in most cases the AI doesn't stop. It guesses because it wants to be useful.
The cost is your name on that number.
Boards don't fire CFOs for slow forecasts. They fire CFOs for figures that don’t add up.
The Andon Cord rule – Toyota's 1948 fix, applied to your AI
In post-war Japan, Taiichi Ohno gave every worker on the Toyota assembly line a rope.
If anyone, anywhere on the line, spotted something wrong, they pulled the rope. The line stopped, and the whole factory stopped. Then, the problem got fixed at the source instead of being passed down the line.
They called it the Andon Cord. And the genius of it was the permissions behind it. Workers didn't need approval to stop production. When something looked wrong, stopping was the default response, and that is how Toyota kept quality so high.
Instead of adding a QA inspector, Toyota built the rule into every worker's hands at the source. Permission to stop, plus an obligation to stop.
The key point is that quality is engineered into the process, not added on afterwards.
Your AI needs the same rule. Not just "I'll check the output afterwards" (although you should also do this), but a built-in failsafe: if the data isn't there, stop. Don't guess.
This is what Mateusz wrote into his FP&A Variance Agent before he showed it in our masterclass.
And it's the same idea as briefing a junior. You don't tell them to "be careful." You give them a rule: if you can't find the source, come back to me before you publish.
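The same rule can be sketched in code. This is a minimal Python illustration of the stop-first idea, not Mateusz's agent: the function name `variance_or_stop`, the exception `MissingSourceError` and the sample figures are all made up for the example.

```python
class MissingSourceError(Exception):
    """Raised instead of guessing when a required figure has no source."""


def variance_or_stop(actuals: dict, budget: dict, line_item: str) -> float:
    """Return actual minus budget for one line item, or stop if data is missing."""
    for name, source in (("actuals", actuals), ("budget", budget)):
        if line_item not in source:
            # The Andon Cord: stop here rather than defaulting to 0 or an estimate.
            raise MissingSourceError(f"No {name} figure found for '{line_item}'")
    return actuals[line_item] - budget[line_item]


# Hypothetical sample data, for illustration only.
actuals = {"Travel": 12_500.0}
budget = {"Travel": 10_000.0, "Marketing": 40_000.0}

print(variance_or_stop(actuals, budget, "Travel"))  # 2500.0

try:
    variance_or_stop(actuals, budget, "Marketing")
except MissingSourceError as err:
    print(f"Stopped: {err}")
```

The design choice is that a missing figure raises an error rather than quietly becoming zero. A prompt rule asks the agent to behave the same way.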
How to Build Your ‘Stop-First’ Agent in Copilot (15 minutes)
Inspired by the Andon Cord method, my recommendation is to build the ‘Stop-First’ principle into all of your prompting.
By the end of this, you'll have a Copilot Agent that is instructed to stop rather than publish a fabricated number. The same method works for Claude Projects, Custom GPTs, or Gemini Gems.
1. Open Copilot:
Go to m365.cloud.microsoft and navigate to "All agents" in the left sidebar. Hit "Create Agent."
Then click “Skip to configure” to configure the agent settings.
2. Name it clearly:
"FP&A Variance Agent" beats "Finance Helper." The name tells the agent and your team what decision it supports.
3. Add your Andon Cord rule:
Before anything else, write your stop rule: if the data isn't in the sources, stop and say so instead of guessing.
Then continue with the rest of your prompt.
To make sure it isn't missed, you can put the rule under its own subheading within the prompt. I’d recommend using # IMPORTANT (the # marks a heading in Markdown, a simple text-formatting language that makes prompts easier for the AI to understand).
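Here is one way the start of the instructions could look, written as a small Python sketch so the ordering is explicit. The wording of the rule is illustrative only (it is not the exact line from the masterclass), and `ANDON_CORD_RULE`, `build_instructions` and the `# ROLE` section are hypothetical names chosen for the example.

```python
# Illustrative wording only; adapt it to your own sources and reports.
ANDON_CORD_RULE = (
    "# IMPORTANT\n"
    "If the data needed to answer is not in the attached sources, stop.\n"
    "Say which source or figure is missing and ask for it.\n"
    "Never estimate, assume, or invent a number.\n"
)


def build_instructions(rest_of_prompt: str) -> str:
    """Put the stop rule first, then the rest of the agent instructions."""
    return ANDON_CORD_RULE + "\n" + rest_of_prompt.strip() + "\n"


instructions = build_instructions(
    "# ROLE\nYou are an FP&A variance agent. Explain budget-vs-actual variances."
)
print(instructions)
```

Paste the printed text into the agent's instructions field. Keeping the rule in one constant makes it easy to reuse the same wording across agents.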
4. Attach only sources you'd defend in an audit:
Add the SharePoint folder, the Excel file, the specific report. Limit web search and any other way the Agent could find material outside the sources you’ve specified.
Enable "Only use specified sources" so the Agent will not reference materials outside the intended context.
5. Pick the model yourself (don't trust Auto):
Inside the agent, you will see a model selector. You have two good options depending on what your organization has enabled.
If your organization has Anthropic models enabled in Copilot Studio, select Claude Opus 4.7. It is designed for complex, multi-step reasoning and follows instructions closely, which is exactly what you want for a variance agent.
If Claude is not available in your environment, select GPT-5.5 Think Deeper from the model picker under GPT. This activates extended reasoning on GPT-5.5 and is the next best option for analytical work.
6. Test it with a question you know it can't answer:
Ask for a figure that isn't in the sources. Watch what it does. If it invents a number, your instructions failed. Rewrite the rule and re-test. If it stops and asks for the source, your Andon Cord is working.
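If you re-test often, a rough automated check can flag replies worth a closer look. The Python sketch below is a crude heuristic and only an assumption about what a "stop" reply looks like: the phrase list `STOP_MARKERS` and the function `looks_like_stop` are hypothetical, and you still need to read the reply yourself.

```python
import re

# Hypothetical phrases that suggest the agent stopped; extend for your own agent.
STOP_MARKERS = (
    "can't calculate",
    "cannot calculate",
    "won't invent",
    "no matching",
    "not found",
)


def looks_like_stop(response: str) -> bool:
    """True if the reply contains a stop phrase and no digits at all."""
    text = response.lower().replace("\u2019", "'")  # normalise curly apostrophes
    has_stop_phrase = any(marker in text for marker in STOP_MARKERS)
    has_digit = re.search(r"\d", text) is not None
    return has_stop_phrase and not has_digit


stopped = (
    "No matching finance workbook report surfaced… that means I can't "
    "calculate the variance figures yet, and I won't invent numbers."
)
invented = "Travel variance is 4.2% unfavourable versus budget."

print(looks_like_stop(stopped))   # True
print(looks_like_stop(invented))  # False
```

The check is deliberately strict. A legitimate stop that mentions a period such as "Q3" contains a digit and will be flagged for manual review, which is the safer way for it to fail.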
7. Build the same agent in your other tools if you're not on Microsoft:
Claude Projects, Custom GPTs and Gemini Gems all have an "instructions" field. The rule is the same for all of them.
The One Thing to Remember
Building failsafes into your AI prompts should be the first ‘stop’ when you or your team build anything.
“ChatGPT makes mistakes. Check important info” is not enough as guidance, because you should be engineering out as many of its mistakes as you can from the very start.
Should you still check important info? Of course. But the point of all this is to check much less.
Otherwise using AI is no better than doing the work manually yourself!
So, the next time you brief your team on AI, tell them the Toyota story, and brief them on how you’re going to add ‘Stop-First’ failsafes into all of your prompts.
Then find one prompt and update the instructions.
You’ve just taken the first step towards becoming safer and more confident with AI.
Best,
Your AI Finance Expert,
– Nicolas
P.S. – Do you have problems with AI hallucinations? Hit reply and tell me – I will help where I can.
P.P.S. – Want more ways to use AI safely and accurately in finance? Here are 100 of them → 100 SECRET tips on AI for FINANCE/ AI for CFO
