Reducing AI Risk with Simple Rules

Bridging the Divide Between Human and AI

The AI conversation right now is split down the middle, and I think many people on both sides are way off the mark.

On one side, you’ve got people who think AI is the answer to everything – the hype, the breathless benchmarks, the “this changes everything” crowd.

On the other, you’ve got people who’ve decided it’s all slop and nonsense – overhyped, unreliable, not worth their time.

And I think that second group, the ones who’ve written it off entirely, are going to get a very rude awakening. They don’t understand exponential improvement, and non-adoption doesn’t slow down the technology – it just means they won’t have a voice when the decisions get made.

But I also understand where some of that negativity comes from. If you’ve only seen AI used badly – generic chatbots spitting out waffle, “AI-powered” products that are just a thin wrapper around an API call with no guardrails – then of course you’d be sceptical. I’d be sceptical too. The problem isn’t the underlying technology, it’s that a lot of people are deploying it without thinking about what happens when it gets things wrong.

I’m building tools with AI right now that genuinely surprise me with what they can do. But the part that makes them actually work in production – the part that makes them reliable enough that I’d trust them with real data? That’s almost always the most boring piece of the system.

I’m talking about rules-based code – if-then logic, pattern matching, lookup tables. The kind of software that’s been around since before most of us started our careers, and that nobody has ever described as exciting. But in the age of AI, it turns out to be more valuable than ever.

That sounds like a strange claim, so let me explain why I believe it.

AI is probabilistic. That’s the problem and the point.

Large language models handle ambiguity, interpret messy natural language, and make judgment calls that rigid rules would never manage. That’s their strength. But they’re fundamentally probabilistic – every output is a sample, not a guarantee. They will hallucinate, miscategorise, and produce unexpected output. Not as rare edge cases, but as a basic property of how they work. That’s fine when you understand it. It’s what makes them good at the hard stuff.

The thing is, most tasks aren’t the hard stuff. Most of the inputs flowing through your system have predictable, known answers. A recurring payment from the same supplier every month doesn’t need an AI to categorise it. A customer enquiry that matches one of your twelve standard request types doesn’t need a language model to route it.

And yet I keep seeing teams route everything through the AI, paying the full cost on every single request – latency, token spend, and unpredictability – even for inputs that a lookup table would handle perfectly. It’s a bit like hiring a brilliant consultant to sort your post. It’s not that they can’t do it, it’s that you’re wasting their strengths on work that doesn’t need them.

This is where old-school rules-based software earns its keep.

Rules handle the predictable. AI handles the rest.

The principle I follow in everything we build at HumanSpark is straightforward: let deterministic, rules-based code handle everything it can. Only call the AI for the stuff that rules genuinely can’t reach. And when the AI does run, never trust its output until a separate piece of predictable code has checked it.

If that sounds like Boring AI, it should. This is what Boring AI looks like at the level of actual system design – not the flashy demo, but the reliable, unglamorous architecture that works when real data is flowing through it.

I talk a lot about “start boring” when I’m advising organisations on their first AI projects – pick the unglamorous, repetitive process, not the ambitious moonshot. The same principle applies when you’re building the system itself. The boring rules engine is your foundation. The AI is the exception handler, brought in only when the predictable path runs out of answers. “Start boring” isn’t just about which project to pick first. It’s about how you design the thing.

In practice, this means a three-stage pipeline.

  • Stage 1 – The rules engine. Pattern matching, lookup tables, simple heuristics. If the rules can handle the input, they do. Only inputs that don’t match any rule move forward.
  • Stage 2 – The AI fallback. The language model proposes an answer. That word “proposes” is doing important work – the AI suggests, it does not decide.
  • Stage 3 – The hard gate. A separate piece of rules-based code validates the AI’s suggestion against an authoritative source of truth. If the suggestion checks out, it’s accepted. If not, it’s rejected and flagged for human review.
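The three stages above can be sketched in a few lines of Python. Everything here is illustrative – the rule table, the stubbed `ask_ai` call, and the function names are placeholders for this pattern, not code from any real system:

```python
def rules_engine(item):
    """Stage 1: deterministic rules. Returns an answer, or None if no rule matches."""
    RULES = {"acme corp monthly": "software-subscriptions"}  # toy lookup table
    return RULES.get(item.strip().lower())

def ask_ai(item):
    """Stage 2: the AI fallback (stubbed). It proposes; it never decides."""
    return "travel-expenses"  # imagine an LLM call here

def hard_gate(suggestion, valid_answers):
    """Stage 3: a separate deterministic check against a source of truth."""
    return suggestion in valid_answers

def categorise(item, valid_answers):
    # Stage 1: if the rules can handle it, they do.
    answer = rules_engine(item)
    if answer is not None:
        return ("rules", answer)
    # Stage 2: only unmatched inputs reach the AI.
    suggestion = ask_ai(item)
    # Stage 3: accept only if the gate validates the suggestion.
    if hard_gate(suggestion, valid_answers):
        return ("ai", suggestion)
    return ("human-review", None)
```

Note that the gate never asks the AI whether its own answer is valid – it checks against a list the AI never touches.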

That third stage is where implementations most often go wrong, so it’s worth being specific about what a “hard gate” actually means.

The AI can’t mark its own homework

A hard gate is not a prompt instruction. It’s not asking the AI to “please only pick from this list.” Prompt instructions are still merely suggestions to a probabilistic system – they work most of the time, which isn’t good enough when you’re writing to a database or changing someone’s financial records.

A hard gate is a completely separate piece of traditional code that checks the AI’s output against a known-good source. Is the suggested category actually in the valid category list? Does the output conform to the expected format? Is the value within the allowed range?

The check runs against data from an authoritative source – an API, a database, a configuration file – not from the AI itself. You wouldn’t let a student grade their own exam. Same principle.
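Those three checks – membership, format, range – are ordinary code. A minimal sketch, assuming the AI returns a dictionary with a category and an amount (the field names and thresholds are invented for illustration):

```python
import re

def passes_hard_gate(output, valid_categories, max_amount):
    """Deterministic checks on an AI suggestion. Data in `valid_categories`
    and `max_amount` comes from an authoritative source, never from the AI."""
    # Is the suggested category actually in the valid category list?
    if output.get("category") not in valid_categories:
        return False
    # Does the output conform to the expected format? (e.g. "950.00")
    amount = output.get("amount")
    if not re.fullmatch(r"\d+(\.\d{2})?", str(amount)):
        return False
    # Is the value within the allowed range?
    return 0 < float(amount) <= max_amount
```

Each check is boring on its own. Together they mean a hallucinated category or a malformed amount never reaches your database.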

This is where rules-based software shows its real value in the AI era. It’s not just handling the predictable inputs up front (Stage 1). It’s also standing guard at the back (Stage 3), making sure the AI hasn’t done something creative with your data.

The simplest hard gate you’ll ever build

Let’s start with something concrete. Say you’ve put an AI chatbot on your company website. You absolutely, under no circumstances, want it to swear at your customers. You can tell the AI in the prompt to be polite, and it will be – almost all of the time. But “almost all of the time” is a terrible standard when you’re talking to customers.

So you build a hard gate. You give it a list of words the chatbot is never allowed to say – including, let’s say, a particular four-letter word that rhymes with “duck.” Before any AI response reaches the customer, the gate scans it. If the word appears, the response gets blocked entirely, and the customer sees something like “I’m sorry, I wasn’t able to answer that. Can I help you with something else?”

It’s simple, it’s boring, and it’s completely reliable – which, when you’re dealing with a probabilistic system talking to your customers, is exactly what you want.
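That gate really is just a few lines of traditional code. A sketch, using “duck” as a stand-in for the actual blocked word:

```python
BLOCKED_WORDS = {"duck"}  # stand-in for the real blocklist

FALLBACK = ("I'm sorry, I wasn't able to answer that. "
            "Can I help you with something else?")

def gate_response(ai_response):
    """Block the entire response if any blocked word appears anywhere in it."""
    lowered = ai_response.lower()
    if any(word in lowered for word in BLOCKED_WORDS):
        return FALLBACK
    return ai_response
```

Plain substring matching is deliberately over-eager – it blocks anything containing the word, surnames included. That trade-off is the next paragraph.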

[Image: cartoon of a duck trying to check in at a counter with an annoyed-looking owl. Caption: “Mr. Ducker has trouble checking in”]

Now, you will get edge cases. A customer types in “My name is John Motherducker” and the chatbot, being ever so helpful, tries to respond with “Hello Mr Motherducker, how can I help?” The hard gate catches it, and the response never reaches the customer. Is that an over-correction? Probably. Mr Motherducker is going to have a slightly confusing experience. But I’d much rather have a chatbot that occasionally can’t respond than one that swears at a customer because the AI thought it was just being polite.

You can build more sophisticated rules too. A few years ago, a Chevy dealership in the US put an AI chatbot on their website without any hard gates. Someone talked it into agreeing to sell them a car for a dollar. A binding offer, confirmed in writing, by a bot that had no business making pricing decisions. A simple rule – “if the conversation involves pricing, do not confirm any amount below X” – would have prevented the whole thing. That’s not AI sophistication, it’s an if-then statement – the kind of code we’ve been writing for decades, and apparently the kind of code that dealership really wished they’d written too.
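That pricing rule is, literally, an if-then statement. A sketch, with a made-up price floor – the topic label and threshold are illustrative, not anything the dealership actually ran:

```python
MIN_SALE_PRICE = 5_000  # hypothetical floor, set by a human, not the AI

def can_confirm_price(conversation_topic, amount):
    """If the conversation involves pricing, never confirm below the floor."""
    if conversation_topic == "pricing" and amount < MIN_SALE_PRICE:
        return False
    return True
```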

These are deliberately simple examples, but that’s the point. Hard gates don’t need to be clever – they just need to exist.

A more technical example: Booky

Our bookkeeping tool Booky follows this pattern for transaction categorisation. SparkCore (our integration layer) connects to the accounting software and pulls back the list of transactions and the valid chart of accounts. SparkCore moves data – nothing more.

Booky’s rules engine (Stage 1) tries to match each transaction using deterministic rules – payee patterns, amount ranges, recurring transaction signatures. For the transactions that don’t match any rule, Booky’s AI (Stage 2) suggests a category. Then Booky’s hard gate (Stage 3) checks: is that suggestion actually in the valid category list that SparkCore pulled from the accounting software?

If yes, accept. If no, reject and flag for human review.

In this system, roughly 80% of transactions match deterministic rules. That means we’ve cut AI costs by 80% and eliminated 80% of potential hallucination incidents. Not by making the AI smarter, but by using it less.

The part that makes this really satisfying

When the hard gate rejects an AI suggestion, we don’t automatically retry. An AI that miscategorises a transaction will often be wrong in the same way on a second attempt – you’d just burn tokens in a loop.

Instead, we log the suggestion and the reason it was rejected.

Those logs become training data, but not for the AI. They’re training data for us. If we see the same rejection pattern showing up repeatedly, that’s a clear signal to write a new deterministic rule. Every rejection is an opportunity to expand the rules engine, which means the AI fires less often over time.
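A sketch of that feedback loop – the log structure and the threshold are assumptions for illustration, not Booky’s actual implementation:

```python
from collections import Counter

rejection_log = []

def log_rejection(payee, suggestion, reason):
    """Structured log entry for every AI suggestion the hard gate rejects."""
    rejection_log.append({"payee": payee,
                          "suggestion": suggestion,
                          "reason": reason})

def rule_candidates(threshold=3):
    """Payees rejected repeatedly are candidates for a new deterministic rule."""
    counts = Counter(entry["payee"] for entry in rejection_log)
    return [payee for payee, n in counts.items() if n >= threshold]
```

The output of `rule_candidates` is a to-do list for a human: each entry is a place where a new Stage 1 rule would stop the AI firing at all.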

The system gets more predictable and cheaper as it matures. The AI gradually works itself out of a job – not because it failed, but because we learned enough from watching it work to write better rules. (The recovering software engineer in me finds this deeply satisfying.)

How you know it’s working

The metric that matters most is your rules-to-AI ratio: what percentage of inputs are handled by deterministic rules versus the AI fallback? If that ratio is climbing, the system is maturing exactly as it should. If it drops, your rules probably need updating.
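The ratio itself is trivial to compute – which is rather the point:

```python
def rules_to_ai_ratio(handled_by_rules, handled_by_ai):
    """Percentage of inputs resolved by deterministic rules rather than the AI."""
    total = handled_by_rules + handled_by_ai
    return 100 * handled_by_rules / total if total else 0.0
```

Track this number over time: a maturing system trends upward as rejection logs turn into new rules.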

Beyond that, a few non-negotiables: every AI suggestion passes through the hard gate before any system state changes. Every rejection produces a structured log entry. And there’s always a clear path to human review – built in from day one, not bolted on later.

This isn’t just about bookkeeping

The same three-stage pattern applies anywhere you’re integrating AI output into a system that needs to be reliable. We use the same architecture in SessionPilot (which monitors live workshop delivery), in our tender response tools, and it’ll be the foundation for whatever we build next. The specific components change every time. The pattern stays the same.

If you’re building anything that puts AI output into a production workflow – or you’re overseeing people who are – the question worth asking is: where do the rules sit? Because the unsexy, traditional, rules-based code isn’t something AI makes obsolete. It’s what lets you deploy AI with confidence, knowing there’s solid ground underneath it.

P.S. If you want more on the philosophy behind this approach, I’ve written about it in the Boring AI Manifesto.

💡 Your AI Transformation Starts Here
Get The Free AI Toolkit for Strategic Breakthrough: Zero Guesswork, Maximum Impact
Written by Alastair McDermott

I help leadership teams adopt AI the right way: people first, numbers second. I move you beyond the hype, designing and deploying practical systems that automate busywork – the dull bits – so humans can focus on the high-value work only they can do.

The result is measurable capacity: cutting processing times by 92% and unlocking €55,000 per month in extra productivity.
