Chain Of Draft: Reduce Your Token Use By 80% In Five Minutes Just By Asking AI to Stop Waffling

Ever Heard of the “Too Much Talking” Problem?

Imagine you ask a friend for directions to the nearest shop, and instead of a quick “Turn left, then right,” they launch into a whole history of the road, the best times to go shopping, and the weather forecast.

We all know someone like that.


That’s exactly how AI often behaves when solving problems.

It thinks step by step (which is great), but it also talks way too much.

This makes it slow, expensive, and sometimes downright annoying.

Now, what if AI could think more like humans—jotting down only the important bits, like a shopping list instead of a novel?

That’s exactly what Chain of Draft (CoD) may do.

It’s like giving AI a “get to the point” button: it may significantly reduce response time, processing power and token use, which in turn cuts the electricity demand gobbled up by greedy Large Language Models.

The concept was the subject of a recent paper, published on the 25th of March by Zoom Communications, who, let’s face it, you wouldn’t have thought were big into LLMs.

The impressive part is that it requires no updates and no engineering, and you can start using it by the end of this article to reduce your AI token costs by around 80%. No new model, no retraining - just a modified prompting strategy.

The Old Way: Chain of Thought (CoT)

AI uses something called Chain of Thought (CoT), which means thinking step by step—like explaining a maths problem in full detail.

For example, if you asked AI:

“Steve had 20 lollipops. He gave some to Fred. Now he has 12. How many did he give away?”

An AI using Chain of Thought would respond with:

  1. Steve started with 20 lollipops.
  2. He gave some to Fred.
  3. Now, he has 12 left.
  4. To find out how many he gave away, subtract: 20 - 12 = 8.
  5. Therefore, Steve gave away 8 lollipops.

Nice and clear, but also too much talking.

AI burns through tons of words (tokens), which makes it slow and expensive.

The Potential New Way: Chain of Draft (CoD)

Now, let’s try Chain of Draft (CoD) as defined in this paper—where AI cuts the fluff and writes just the essential info.

Same question: “Steve had 20 lollipops. He gave some to Fred. Now he has 12. How many did he give away?”

CoD version: 👉 20 - x = 12; x = 20 - 12 = 8. 👉 Answer: 8.

Boom. That’s it. No unnecessary words. No extra steps. Just the maths. 🔥
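To see the gap concretely, here is a quick, self-contained sketch (mine, not from the paper) that compares the word counts of the two answer styles above:

```python
# Word-count comparison of the two answer styles shown above.
# The answer strings are paraphrased from the examples in this article.
cot_answer = (
    "Steve started with 20 lollipops. "
    "He gave some to Fred. "
    "Now, he has 12 left. "
    "To find out how many he gave away, subtract: 20 - 12 = 8. "
    "Therefore, Steve gave away 8 lollipops."
)
cod_answer = "20 - x = 12; x = 20 - 12 = 8. #### 8"

cot_words = len(cot_answer.split())
cod_words = len(cod_answer.split())
print(cot_words, cod_words)  # the CoT answer is several times longer
```

Counting whitespace-separated words is a rough stand-in for counting tokens, but the ratio tells the same story.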

Why This Matters

💡 CoD makes AI faster, cheaper, and still accurate.

Instead of drowning in words, it gives you exactly what you need—like a shopping list instead of an entire cookbook.

  • ⚡ Faster: Less thinking, more doing.
  • 💰 Cheaper: Fewer words = less processing power = lower costs.
  • ✔ Still accurate: AI still gets the right answer, just without the waffle.

The Proof: Does It Actually Work?

Researchers tested CoD on maths problems, common sense questions, and logic puzzles.

Here’s what they found:

  • AI using CoT (long-winded explanations) was 95% accurate but used around 200 tokens per answer.
  • AI using CoD (quick and sharp) was 91% accurate but used just 40 tokens.

That means CoD cuts roughly 80% of the tokens while giving up only a few percentage points of accuracy.

That’s like trading a slow, rambling donkey for a speedy leopard: the donkey is a bit smarter, but it takes a lot longer to get anywhere.

Where This Could Be a Game Changer

Chain of Draft isn’t just for my lollipop maths—it could revolutionise how AI is used in business and tech:

  • Customer service: Faster chatbot responses.
  • Finance: Instant, efficient data analysis.
  • AI assistants: Less filler, more action.

Basically, anywhere AI needs to be quick, smart, and efficient, CoD may be the way forward. A hybrid approach, using Chain of Thought for the hardest problems and Chain of Draft for the rest, could potentially run LLMs faster and more effectively without significant accuracy degradation.


How Transformers Work in Chain of Thought (CoT)

Transformers (the AI models behind ChatGPT, Claude, Gemini, etc.) don’t actually think like humans. Instead, they predict the next most likely word in a sequence based on what they’ve seen before.

When using Chain of Thought (CoT), a Transformer (not a robot in disguise, but the smart part inside a Large Language Model) doesn’t just jump to an answer: it generates a step-by-step explanation before reaching the final result.

The key to CoT is that it forces the AI to “reason” out loud, which reduces mistakes and helps with complex problems like maths, logic, and reasoning.


How Transformers Generate Chain of Thought Responses

💡 Transformers work by using something called self-attention, which means they look at all words in a sentence at once, figuring out how they relate to each other.

In CoT mode, the model:

  1. Expands the problem into steps → Instead of just jumping to an answer, it generates a structured explanation.
  2. Predicts each step logically → It doesn’t just “guess”; it calculates the next step based on the previous ones.
  3. Uses its “memory” (context window) → It remembers previous words to keep its reasoning consistent.

This is why CoT works well for longer reasoning problems—the model doesn’t just spit out an answer, it actually “thinks through” the problem.
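The three steps above can be sketched as a toy autoregressive loop. The lookup table below is a stand-in for a real transformer's next-token prediction; the point is only that each prediction gets appended to the context, so every later step is conditioned on the earlier ones:

```python
# Toy autoregressive generation: a lookup table plays the role of the
# transformer. Each predicted token is appended to the context, which is
# the "memory" that keeps the reasoning consistent step to step.
TOY_MODEL = {
    (): "20",
    ("20",): "-",
    ("20", "-"): "12",
    ("20", "-", "12"): "=",
    ("20", "-", "12", "="): "8",
}

def generate(max_steps: int = 10) -> list[str]:
    context: list[str] = []
    for _ in range(max_steps):
        next_token = TOY_MODEL.get(tuple(context))
        if next_token is None:  # nothing left to predict
            break
        context.append(next_token)  # the context window grows each step
    return context

print(generate())  # ['20', '-', '12', '=', '8']
```

A real model scores every token in its vocabulary with self-attention over the whole context rather than doing an exact lookup, but the loop structure, predict, append, repeat, is the same.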


So, Where Does Chain of Draft (CoD) Fit In?

CoD is a compressed version of CoT. Instead of a full-blown explanation, it only writes down the key bits.

Both use transformers and self-attention, but CoD trims the fat, keeping just the important logic.


Reducing Hallucinations

An AI hallucination is not when your AI dims the lights, fires up the lava lamp and drops a microdot of acid.

It’s more like a confused old man who is losing his marbles.

It’s when an artificial intelligence model, like ChatGPT, makes up information that is false, misleading, or completely nonsensical, but presents it as fact.

Sound familiar?


Why Do AI Hallucinations Happen?

AI doesn’t “know” things like humans do.

It predicts words based on probability.

If it hasn’t seen the right answer before, it might make something up that sounds correct, in the same way your confused grandfather may call you by the wrong name.

Some common causes:

1️⃣ Lack of Data – If AI hasn’t been trained on the right facts, it fills in the blanks.

2️⃣ Overgeneralisation – AI stitches patterns together and assumes things that are BS.

3️⃣ Misinformation in Training Data – If AI was trained on bad or false data: SISO (shit in, shit out).

4️⃣ Prompt Confusion – If your question is vague, AI might go off-track.

Chain of Draft (CoD) may reduce AI hallucinations by forcing the AI to “think” in smaller, more controlled steps, rather than generating long, rambling explanations where hallucinations can creep in.


How To Use Chain Of Draft Right Now - With No Investment

This is the smart thing.

This is a prompt strategy.

It requires no engineering.

To implement Chain of Draft you just prompt your LLM differently and it does the rest.

Chain Of Draft

Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. Return the answer at the end of the response after a separator of ####
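As a concrete sketch, here is how that prompt might be wired up in code. The message dictionaries follow the common chat-API shape (an assumption; adapt them to whichever client you use), and the helper simply pulls out whatever follows the final #### separator:

```python
# Wrap the article's Chain of Draft prompt around any question, and
# extract the final answer from the model's reply.

COD_SYSTEM_PROMPT = (
    "Think step by step, but only keep a minimum draft for each "
    "thinking step, with 5 words at most. Return the answer at the "
    "end of the response after a separator of ####"
)

def build_messages(question: str) -> list[dict]:
    """Chat-style message list: CoD instruction first, then the question."""
    return [
        {"role": "system", "content": COD_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

def extract_answer(response_text: str) -> str:
    """Everything after the final #### separator is the answer."""
    return response_text.rsplit("####", 1)[-1].strip()

# e.g. a model reply for the lollipop question might look like this:
print(extract_answer("20 - x = 12; x = 8 #### 8"))  # 8
```

Pass the message list to your LLM client of choice; nothing else in your pipeline has to change.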

So there you go, I have just reduced your token use and saved you a fortune in your AI token costs, no engineering required. Drop me a like to show your appreciation or share with your AI friends to show how cool you are.


About me

Helping leaders in Cybersecurity, Quantum, and AI drive high-impact growth, stronger valuations, and better exits.

📌 Director of the world’s largest Quantum Cybersecurity community (700+ members), connecting top experts in Quantum, AI, and Cybersecurity.

📌 C-suite executive with a proven track record in scaling tech, finance, and asset finance businesses across EMEA & APAC.

📌 Former network engineer with deep expertise in computational Root Cause Analysis & Causal Reasoning, applied in military and telecom environments.

📌 Member of the Institute of Directors, European Corporate Governance Institute, and Royal United Services Institute for Defence & Security.

Steven Vaile

Board technology advisor and QSECDEF co-founder. Writes on AI governance, quantum security, and commercial strategy for boards and deep tech founders.