Build an AI Customer-Support Bot in n8n (RAG, With Real Monthly Cost)

Two workflows, a few dollars a month, and your own docs as the source of truth

By Muhammad Qasim Hammad | AI-assisted | June 14, 2026 | 12 min read | 2,374 words

AI-drafted, reviewed by Muhammad Qasim Hammad on June 14, 2026. See our AI disclosure.

Flow diagram showing an n8n AI customer support bot answering questions from docs: customer question to retrieval to Claude writing an answer to reply sent

Table of contents

What Is a RAG Customer-Support Bot, and What Can It Actually Do?
How Does RAG Work Inside n8n? (The Two Workflows)
How Much Does an n8n Support Bot Cost to Run?
Which Vector Store and Embedding Model Should You Use?
How Do You Build It in n8n, Step by Step?
Ingest Workflow
Answer Workflow
Where Can This Go Wrong?
What Should You Set Up This Weekend?

The same five questions land in your inbox every single day: pricing, hours, how to reset a password, refund policy, and where to find the getting-started guide. An n8n AI customer support bot built with RAG answers all of them from your own docs, the moment they arrive, around the clock.

Without one, you spend a chunk of every morning copy-pasting the same replies. That time compounds. Even 20 minutes a day is 10 hours a month you could spend on paid work or building something new.

The good news: this is not a complex enterprise build. It is two small n8n workflows, a handful of nodes, and a Claude API key. I have run the same Webhook -> AI Agent pattern for email triage and WhatsApp responses. A support bot is the same shape with a retrieval tool bolted on.

What Is a RAG Customer-Support Bot, and What Can It Actually Do?

An n8n AI customer support bot built with RAG reads a question, pulls the most relevant passages from your knowledge base, and writes a reply grounded in those passages. It handles FAQ-style questions 24/7: pricing, policies, how-to steps, product details. It cannot place orders, process refunds, or access live databases unless you wire those tools in separately.

The "from your docs" part is what separates this from a generic chatbot. A plain ChatGPT wrapper answers from its training data and guesses when it does not know. A RAG bot either finds the answer in your specific docs or admits it cannot help. That honesty is the feature you want in a customer-facing tool.

This is not a fit for every support scenario. If your support volume is mostly unique, complex problems, a bot will frustrate more than it helps. But if you can identify a cluster of repeat questions (most solo operators have 10 to 20 of them), this setup earns its $4.50 a month on day one.

Comparison table showing Q&A Chain versus AI Agent node in n8n for RAG customer support bots across setup, answering method, tool support, and best use.

Pick Q&A Chain for simplicity; pick AI Agent when you need multiple tools.

How Does RAG Work Inside n8n? (The Two Workflows)

RAG in n8n is two workflows, not one. According to the n8n documentation, the ingest workflow runs once (or whenever your docs change) and writes vectors to a store; the answer workflow runs continuously and reads from that store at query time.

Ingest workflow (run once, then re-run on updates):

A trigger node (Manual Trigger or a scheduled run)
Default Data Loader to pull in your files
Recursive Character Text Splitter (or Token Splitter) to chunk text into roughly 500-token pieces with a 50-token overlap
An Embeddings sub-node: Embeddings OpenAI (text-embedding-3-small) or Embeddings Ollama for a free local option
A Vector Store node in Insert mode
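
To make the chunking step concrete, here is a minimal Python sketch of a sliding window with overlap. It is not n8n's splitter: it treats whitespace-separated words as a rough stand-in for tokens, and the function name chunk_words is made up for illustration.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into word-based chunks that share `overlap` words with their neighbour."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A 1,000-word dummy document, just to show the window positions.
sample = " ".join(f"w{i}" for i in range(1000))
pieces = chunk_words(sample)
print(len(pieces), pieces[1].split()[0])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is why the ingest step does not simply cut every 500 tokens.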

Answer workflow (runs on every customer message):

A Chat Trigger (or Webhook for a custom integration)
An AI Agent node (or Question and Answer Chain for a simpler setup)
Attached sub-nodes: an Anthropic Chat Model (Claude) and a Vector Store Tool pointing to the same store

One hard rule: the embedding model used to query must exactly match the one used to ingest. If you ingest with text-embedding-3-small (1,536 dimensions) and query with a different model, retrieval breaks silently. Put this in a comment on the workflow so future-you does not spend an hour debugging it.
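
One way to enforce that rule is to record the embedding settings at ingest time and compare them before querying. The sketch below is an assumption about how you might do that outside n8n; EmbeddingConfig and check_same_embedding are hypothetical names, not part of any n8n node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingConfig:
    model: str       # embedding model name as configured in the sub-node
    dimensions: int  # vector length the model produces


def check_same_embedding(ingest: EmbeddingConfig, query: EmbeddingConfig) -> None:
    """Raise ValueError if the query side does not match the ingest side."""
    if ingest != query:
        raise ValueError(
            f"Embedding mismatch: ingested with {ingest.model} "
            f"({ingest.dimensions} dims) but querying with {query.model} "
            f"({query.dimensions} dims). Re-ingest or switch the query model back."
        )


ingest_cfg = EmbeddingConfig("text-embedding-3-small", 1536)

# Same model on both sides: passes silently.
check_same_embedding(ingest_cfg, EmbeddingConfig("text-embedding-3-small", 1536))

# Different model on the query side: fails loudly instead of returning bad chunks.
try:
    check_same_embedding(ingest_cfg, EmbeddingConfig("nomic-embed-text", 768))
except ValueError as err:
    print(err)
```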

Run the ingest workflow once, then re-run it whenever your docs change.

How Much Does an n8n Support Bot Cost to Run?

A RAG support bot answering 1,000 questions a month costs about $4.50 in Claude tokens on Haiku 4.5 (as of June 2026). The platform fee is $0 if you self-host n8n, the vector store can also be $0, and embeddings add only a fraction of a cent. The table below breaks the math down by model tier.

Here is the arithmetic, fully checkable by hand. One answer uses roughly 3,000 input tokens (system prompt + 4 retrieved chunks at ~500 tokens each + the question) and about 300 output tokens (a short reply). So 1,000 answers consume approximately 3.0M input tokens and 0.3M output tokens.

Cost per 1,000 answered questions (as of mid-2026):

Model	Input $/1M	Output $/1M	Cost / 1,000 answers	Best for
Claude Haiku 4.5	$1.00	$5.00	$4.50	FAQ-style support, high volume
Claude Sonnet 4.6	$3.00	$15.00	$13.50	Nuanced or multi-step answers
Claude Opus 4.8	$5.00	$25.00	$22.50	Complex reasoning; overkill for most support

Prices sourced from Anthropic's official pricing page. Confirm before you build; LLM pricing changes frequently.
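
If you want to re-run the table's arithmetic with your own numbers, a short Python sketch is enough. The token counts and prices below are copied from this article; the function name cost_per_answers is made up for illustration, and the prices should be replaced with whatever the pricing page shows when you build.

```python
# Token assumptions from the article: ~3,000 input and ~300 output tokens per answer.
INPUT_TOKENS_PER_ANSWER = 3_000
OUTPUT_TOKENS_PER_ANSWER = 300

# List prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def cost_per_answers(model: str, answers: int = 1_000) -> float:
    """Estimate the USD cost of `answers` replies on `model`."""
    input_price, output_price = PRICES[model]
    input_cost = answers * INPUT_TOKENS_PER_ANSWER / 1_000_000 * input_price
    output_cost = answers * OUTPUT_TOKENS_PER_ANSWER / 1_000_000 * output_price
    return round(input_cost + output_cost, 2)


for name in PRICES:
    print(f"{name}: ${cost_per_answers(name):.2f} per 1,000 answers")
```

Changing the answers argument shows how the bill scales: the cost is linear in volume, so 5,000 answers on Haiku come to five times the 1,000-answer figure.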

The embedding cost is negligible. Embedding a 200-article knowledge base of roughly 160,000 tokens once costs about $0.003 using OpenAI's text-embedding-3-small at $0.02 per 1M tokens. Embedding each incoming 50-token question costs about $0.000001. The dominant cost is the LLM answering, not the retrieval side.
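
The same check works for the embedding side. This sketch uses only the $0.02 per 1M token price and the token counts quoted above; embedding_cost is a hypothetical helper name.

```python
EMBED_PRICE_PER_MILLION = 0.02  # USD per 1M tokens, text-embedding-3-small (from the article)


def embedding_cost(tokens: int) -> float:
    """Return the USD cost of embedding `tokens` tokens."""
    return tokens / 1_000_000 * EMBED_PRICE_PER_MILLION


print(f"One-time ingest of 160,000 tokens: ${embedding_cost(160_000):.4f}")
print(f"One 50-token question:             ${embedding_cost(50):.6f}")
print(f"1,000 questions in a month:        ${embedding_cost(50 * 1_000):.4f}")
```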

One more saving: Claude's prompt caching bills repeated input tokens (the system prompt is the same on every call) at roughly 0.1x the normal input price. The real bill for a high-volume bot can fall below the table figures above.

n8n's Community Edition is free to self-host under the Sustainable Use License. If you prefer managed hosting, n8n Cloud Starter is €20/mo. That is your only platform cost.

Four key statistics for an n8n RAG customer support bot: $4.50 per 1000 answers on Haiku, $0.02 per million tokens to embed, $0 self-hosted vector store, 24/7

All figures based on mid-2026 list prices; embeddings cost is nearly nothing.

Which Vector Store and Embedding Model Should You Use?

For testing, use the Simple Vector Store (in-memory, built into n8n, costs nothing). For production, Supabase Vector Store, a self-hosted PGVector Vector Store, or a self-hosted Qdrant Vector Store are all free to run and survive restarts. The n8n docs list all supported vector store nodes.

For embeddings, you have two real options:

Embeddings OpenAI (text-embedding-3-small): $0.02 per 1M tokens. Hosted, reliable, zero setup. For a 200-article knowledge base, the one-time ingest cost is about a third of a cent.
Embeddings Ollama (e.g. nomic-embed-text): $0.00. Runs locally. Keeps your documents off third-party servers. Requires Ollama installed on the same machine as n8n or reachable on your network. See the local RAG with Ollama guide for the full setup.

The practical rule: if your docs contain sensitive client information, go local with Ollama. If you are embedding public-facing help content and want the simplest path, use OpenAI embeddings.

How Do You Build It in n8n, Step by Step?

Building an n8n AI customer support bot takes two workflows and about two hours the first time: one to ingest your docs into a vector store, and one to answer questions from it. The AI Agent node documentation covers the sub-node wiring; here is the practical sequence.

Ingest Workflow

Start a new workflow in n8n. Add a Manual Trigger (you will click "Test workflow" once, then re-run it whenever docs change).

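Connect a Default Data Loader node and feed it your docs: files from a folder, a fetched URL, or pages from a Notion database. Connect a Recursive Character Text Splitter node and aim for chunks of about 500 tokens with a 50-token overlap. That splitter measures size in characters, so for English text this works out to roughly 2,000 characters with a 200-character overlap; the Token Splitter lets you enter token counts directly. Smaller chunks give more precise retrieval; larger chunks give more context per match. 500 tokens is a solid starting point.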

Attach an Embeddings OpenAI sub-node (or Embeddings Ollama). Then connect a Vector Store node: pick Supabase Vector Store or Simple Vector Store depending on your setup. Set the mode to Insert.

Run the workflow once. Your docs are now searchable vectors.

Answer Workflow

Create a second workflow. Add a Chat Trigger node (this gives you a hosted chat URL for testing right away).

Add an AI Agent node. Attach these sub-nodes:

Anthropic Chat Model sub-node, set to claude-haiku-4-5
Vector Store Tool sub-node, pointed at the same vector store collection from the ingest step

In the AI Agent's System Prompt field, paste:

Answer the customer question using only the retrieved context below.
If the retrieved context does not contain enough information to answer,
reply: "I am not sure about that one. Let me connect you with a human."
Do not make up facts, prices, or policies that are not in the context.

Set the Vector Store Tool's top-k to 4. That retrieves four ~500-token chunks per question, totalling ~2,000 tokens of context per call. Raise to 6 if answers feel thin; lower to 3 to trim costs.
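
For intuition about what top-k does, here is a rough Python sketch of similarity search: score every stored chunk against the question vector and keep the k best. The vector store does this for you; the three-dimensional vectors and sample sentences are toy values invented for the example, and cosine and top_k are illustrative names.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list, store: list, k: int = 4) -> list:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy store: (chunk text, embedding). Real embeddings have hundreds of dimensions.
store = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Reset your password from the login page.", [0.0, 1.0, 0.1]),
    ("Our plans start with a free tier.", [0.1, 0.0, 0.9]),
]

print(top_k([1.0, 0.0, 0.1], store, k=2))
```

A larger k only appends lower-ranked chunks to the list, which is why raising it helps recall but also adds tokens that may not be relevant.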

Test with five real questions from your inbox. Compare each answer to your actual docs. If a correct answer is missing, the chunk containing that information is likely too large or the wrong content was retrieved. Adjust the splitter or improve the source doc's wording.

When answers look good, connect the Chat Trigger to your live channel: a website widget, WhatsApp via Twilio, or any webhook-capable chat tool.

Pros and cons of letting an n8n AI customer support bot answer versus routing to a human, covering speed, cost, accuracy, and handoff for money-related

Use the bot for repeatable questions; always hand off money and orders.

Where Can This Go Wrong?

The most common failure mode is stale docs. The bot can only answer from what is in the vector store. If you update your pricing page and forget to re-ingest, the bot keeps quoting the old price with full confidence. Re-ingest whenever docs change; a scheduled ingest once a week is a reasonable safety net.
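
A scheduled re-ingest is the simple fix. If you would rather re-ingest only when something changed, one option is to hash each doc and compare against the hashes saved at the last ingest. The sketch below assumes your docs are available as name-to-text pairs; fingerprint and docs_needing_reingest are hypothetical helpers, and the sample docs are invented.

```python
import hashlib


def fingerprint(docs: dict) -> dict:
    """Map each document name to a SHA-256 hash of its text."""
    return {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in docs.items()
    }


def docs_needing_reingest(previous: dict, current_docs: dict) -> list:
    """List docs that were added, edited, or removed since the last ingest."""
    current = fingerprint(current_docs)
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    removed = [name for name in previous if name not in current]
    return sorted(changed + removed)


# Hashes saved at the last ingest versus the docs as they are today.
last_ingest = fingerprint({"pricing.md": "Pro plan: $10/mo", "refunds.md": "14 days"})
today = {"pricing.md": "Pro plan: $12/mo", "refunds.md": "14 days"}

print(docs_needing_reingest(last_ingest, today))
```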

The second failure mode is a mismatched embedding model at query time. If you ingest with text-embedding-3-small (1,536 dimensions) and later swap to a local Ollama model, searches stop working. When the new model produces vectors of a different size, some stores reject the query with an error; when the sizes happen to match, retrieval does not error out at all and just silently returns irrelevant chunks. Keep both sub-nodes set to the same model, and re-ingest everything if you ever change it.

The third is over-trusting retrieval. RAG reduces hallucination; it does not eliminate it. Claude can still misread a chunk or stitch two chunks together incorrectly. The system prompt instruction to admit uncertainty when context is thin is your main guard. Test the "I do not know" response explicitly: ask a question that is nowhere in your docs and confirm the bot routes to a human rather than inventing an answer.

Top-k is a cost-quality dial. A top-k of 4 at 500 tokens per chunk adds about 2,000 tokens of context to every call. Raising it to 8 gives better recall for complex questions but doubles the retrieved context to about 4,000 tokens, which lifts the per-answer input from roughly 3,000 to roughly 5,000 tokens (about 67% more input cost). Run a two-week test at k=4, note which questions the bot misses, then decide if a higher k is worth it.
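
To see the dial in numbers, this sketch recomputes the monthly bill for a few values of k at the Haiku 4.5 prices quoted earlier. The 1,000-token fixed overhead (system prompt, question, and other non-retrieved input) is an assumption inferred from the article's 3,000-token total at k=4, and monthly_cost is an illustrative name.

```python
CHUNK_TOKENS = 500     # tokens per retrieved chunk, from the article
FIXED_TOKENS = 1_000   # assumed non-retrieved input: 3,000 total minus 4 x 500
OUTPUT_TOKENS = 300    # tokens per reply, from the article


def monthly_cost(k: int, answers: int = 1_000,
                 input_price: float = 1.00, output_price: float = 5.00) -> float:
    """Estimate USD per month for `answers` replies at a given top-k."""
    input_tokens = answers * (FIXED_TOKENS + k * CHUNK_TOKENS)
    output_tokens = answers * OUTPUT_TOKENS
    cost = input_tokens / 1_000_000 * input_price + output_tokens / 1_000_000 * output_price
    return round(cost, 2)


for k in (3, 4, 6, 8):
    print(f"k={k}: ${monthly_cost(k):.2f} per 1,000 answers")
```

Output tokens do not change with k, so the total bill grows more slowly than the retrieved context does.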

For a deeper look at choosing the right Claude model for your agent, the Claude vs GPT vs Gemini cost and speed comparison walks through the trade-offs across real workflows. If you want to cap spend at the API level, the Claude API cost-control agent workflow adds a guardrail before costs run away.

flowchart TD
  START([New customer question]) --> D1{Answer found in your docs?}
  D1 -- No --> HANDOFF1[Hand off to a human]
  D1 -- Yes --> D2{Confident in the retrieved match?}
  D2 -- No --> HANDOFF2[Hand off to a human]
  D2 -- Yes --> ANSWER[Answer from retrieved context]
  ANSWER --> END([Reply sent])

Decision flowchart showing how an n8n RAG support bot decides whether to answer from retrieved docs or hand off to a human based on retrieval confidence.

Wire this logic as an IF node after the AI Agent for safe production use.
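
The condition that IF node evaluates can be sketched in a few lines of Python. This is one possible reading of the flowchart, not a built-in n8n feature: it routes to a human when the reply contains the fallback phrase from the system prompt, or when a retrieval score is available and falls below a cutoff. The 0.75 threshold is a hypothetical value, and whether your vector store exposes a score to the workflow is an assumption to verify.

```python
from typing import Optional

# Phrase the system prompt tells the model to use when it cannot answer.
FALLBACK_MARKER = "let me connect you with a human"
# Hypothetical similarity cutoff; tune it against your own test questions.
MIN_SCORE = 0.75


def route(reply: str, best_score: Optional[float] = None) -> str:
    """Return "human" or "bot" for a drafted reply."""
    if FALLBACK_MARKER in reply.lower():
        return "human"  # the model admitted it could not answer from the docs
    if best_score is not None and best_score < MIN_SCORE:
        return "human"  # retrieval match too weak to trust
    return "bot"


print(route("I am not sure about that one. Let me connect you with a human."))
print(route("Refunds are issued within 14 days.", best_score=0.62))
print(route("Refunds are issued within 14 days.", best_score=0.91))
```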

What Should You Set Up This Weekend?

Start with the ingest workflow: collect your docs, run them through the Default Data Loader -> Recursive Character Text Splitter -> Embeddings OpenAI -> Simple Vector Store chain, and confirm the vectors are written. That takes under an hour, and the in-memory Simple Vector Store costs nothing to test with.

Next, wire the answer workflow: Chat Trigger -> AI Agent -> Anthropic Chat Model + Vector Store Tool. Paste in the system prompt. Ask it the five questions your customers ask most often. If four of five come back accurate, you have a working bot. Add the handoff branch before you connect it to any live channel.

The whole stack, at 1,000 answered questions a month, costs about $4.50 in tokens. If you self-host n8n on a $5 VPS and use a free-tier vector store, you pay nothing for the n8n license or the vector store; the VPS is the only fixed cost. No seat fees, no per-agent pricing, no surprise overage bill at month end.

If you want to go fully private with $0 embedding cost, the local RAG with Ollama guide covers running the entire stack on your own machine. For help choosing between Supabase, PGVector, and Qdrant for production, the vector store comparison lays out the trade-offs.

Frequently asked questions

What is a RAG chatbot?

RAG stands for retrieval-augmented generation. The bot first retrieves the most relevant passages from your knowledge base, then passes them to the language model as context so it answers from your actual docs instead of guessing.

How much does an n8n AI customer support bot cost to run?

About $4.50 per 1,000 answered questions using Claude Haiku 4.5 (as of mid-2026), based on roughly 3,000 input tokens and 300 output tokens per answer. Embedding costs are a fraction of a cent. Platform cost is $0 if you self-host n8n.

Which vector store should I use in n8n?

Start with the Simple Vector Store (in-memory, free, built into n8n) for testing. For production, Supabase Vector Store or a self-hosted Qdrant or PGVector instance costs $0 and survives restarts. See the full comparison at /pgvector-vs-chroma-vs-qdrant-rag.

Do I need OpenAI for embeddings, or can I run them locally?

You can run embeddings locally for free using the Embeddings Ollama sub-node in n8n with a model like nomic-embed-text. If you prefer a hosted option, OpenAI text-embedding-3-small costs $0.02 per 1M tokens, which is nearly nothing for a typical knowledge base.

Can the bot hallucinate, and how do I stop it?

Yes, it can still hallucinate even with RAG. The best guard is a system prompt that tells the model to answer only from the retrieved context and to respond with 'I am not sure, let me get a human' when the retrieval match is weak.

Q&A Chain or AI Agent: which RAG approach should I use?

Use the Question and Answer Chain node for a fast, simple setup where every message triggers a retrieval. Use the AI Agent node with a Vector Store Tool when you want the model to decide when to search, or when you plan to add other tools like a calendar or order lookup.

