If you've used ChatGPT or any other AI chatbot, you've probably noticed something: they're great at sounding confident and terrible at being accurate about your stuff. Ask a chatbot about your company's expense policy or what's on page 47 of a contract, and you'll get a polished, completely fabricated answer.
That's not a bug — it's a fundamental limitation of how those models work. They were trained on the internet, not your documents. They don't know what's in your files because they've never seen them.
RAG fixes this.
RAG in plain language
RAG stands for Retrieval-Augmented Generation. The name is academic, but the idea is dead simple: before the AI answers your question, it searches your actual documents first.
Think of it like the difference between asking a random person on the street for legal advice versus asking a lawyer who has your case file open in front of them. Both might be articulate. Only one has the facts.
Here's what happens when you ask a RAG-powered system a question:
- Retrieval: Your question gets converted into a search query. The system scans your documents — PDFs, manuals, notes, whatever you've loaded — and finds the most relevant passages.
- Augmentation: Those passages get handed to the AI along with your question. Now the model has context — real text from your real files.
- Generation: The AI writes an answer based on what it actually found in your documents, not what it vaguely remembers from internet training data.
The result: answers that are grounded in your data, with citations pointing to specific documents and page numbers.
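The three steps above can be sketched in a few lines of code. This is a toy model, not ThothAI's internals: the scoring is naive word overlap standing in for the vector embeddings a real system would use, and the final model call is left as a placeholder prompt.

```python
# Minimal sketch of the retrieve -> augment -> generate loop.
# Word-overlap scoring is a stand-in for real embedding search.

def retrieve(question, passages, top_k=2):
    """Rank passages by how many question words they share."""
    q_words = set(question.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(q_words & set(p["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(question, hits):
    """Build a prompt that puts the retrieved text in front of the model."""
    context = "\n".join(f'[{h["doc"]} p.{h["page"]}] {h["text"]}' for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

passages = [
    {"doc": "policy.pdf", "page": 12,
     "text": "Expenses over $50 require manager approval."},
    {"doc": "handbook.pdf", "page": 3,
     "text": "Vacation requests go through HR."},
]

hits = retrieve("Do expenses need approval?", passages, top_k=1)
prompt = augment("Do expenses need approval?", hits)
# `prompt` now carries the matching passage; the model answers from it,
# not from memory.
```

The "generation" step is simply the language model completing that prompt, which is why the answer stays grounded in the retrieved text.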
Why regular chatbots fall short
Standard AI chatbots (even good ones) have two problems when it comes to your documents:
Problem 1: They hallucinate. When a chatbot doesn't have the answer, it doesn't say "I don't know." It generates something plausible-sounding that may be completely wrong. In a casual conversation, this is annoying. When you're referencing a legal contract or medical protocol, it's dangerous.
Problem 2: They require uploading. To use your documents with cloud AI, you have to upload them to someone else's servers. For a recipe collection, that's fine. For client case files, patient records, proprietary research, or internal company documents? That's a non-starter for many people.
RAG solves the first problem by giving the AI actual source material to work from. And when you run RAG locally — like ThothAI does — you solve both problems at once.
How ThothAI uses RAG
When we built ThothAI, RAG was the core design decision. Here's how it works in practice:
1. You create a Knowledge Base
A Knowledge Base is just a named collection of PDFs. You might have one called "Work Documents," another called "Tax Records," and another called "D&D Campaign Notes." They stay separate so the AI searches the right context for each question.
2. ThothAI processes your PDFs
When you add PDFs to a Knowledge Base, ThothAI extracts the text, breaks it into searchable chunks, and builds a local search index. This happens entirely on your device. Processing time depends on document size: a typical PDF takes a few seconds, while a 500-page manual might take a minute or two.
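To make the chunking step concrete, here is a toy version: split extracted text into overlapping, fixed-size word windows so a search can match a question against small passages instead of whole documents. The sizes are illustrative assumptions; a real system likely tunes them and respects sentence boundaries.

```python
# Toy chunker: fixed-size word windows with overlap so that a fact
# falling on a chunk boundary still appears whole in one chunk.

def chunk(text, size=50, overlap=10):
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + size]
        if piece:
            chunks.append(" ".join(piece))
        if start + size >= len(words):
            break
    return chunks

# 120 words with size=50, overlap=10 -> windows starting at word 0, 40, 80.
sample = " ".join(f"word{i}" for i in range(120))
pieces = chunk(sample, size=50, overlap=10)
```

Each chunk then gets indexed (in a real pipeline, embedded as a vector) so retrieval can pull back just the passages that matter.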
3. You ask questions in natural language
Select a Knowledge Base, type your question, and ThothAI handles the rest. The RAG pipeline searches your documents, finds relevant passages, and generates an answer.
4. Every answer includes source citations
This is the part that makes RAG genuinely useful: every answer tells you exactly where the information came from. You'll see the document name and page number for each source. Tap a citation to jump directly to that page in the PDF viewer and verify it yourself.
The AI isn't asking you to trust it. It's showing its work.
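One plausible shape for a cited answer looks like this. To be clear, this is an illustrative data model, not ThothAI's actual internals: the point is that the answer text travels together with the passages it was grounded in, so a UI can link each claim back to a document and page.

```python
# Illustrative data model for a grounded answer: the text is paired
# with the citations that support it.

from dataclasses import dataclass, field

@dataclass
class Citation:
    doc: str
    page: int

@dataclass
class Answer:
    text: str
    sources: list[Citation] = field(default_factory=list)

ans = Answer(
    text="Expenses over $50 require manager approval.",
    sources=[Citation(doc="policy.pdf", page=12)],
)

# A viewer can use doc + page to jump straight to the source.
label = f"{ans.sources[0].doc}, p.{ans.sources[0].page}"
```

Because the sources ride along with the answer, "tap a citation to open the page" falls out of the data structure almost for free.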
RAG vs. just pasting text into ChatGPT
You might be thinking: "Can't I just copy-paste my document into ChatGPT?" You can, but there are real limitations:
- Context window limits: AI models can only process a certain amount of text at once, and a 200-page PDF often won't fit. RAG doesn't have this problem because it only retrieves the relevant passages, not the entire document.
- Multiple documents: RAG can search across dozens of PDFs in a single query. Copy-pasting doesn't scale.
- Citations: When you paste text into a chatbot, you lose all page references. RAG maintains them.
- Privacy: Copy-pasting into a cloud chatbot means that text is now on someone else's server. With a local RAG system, it never leaves your machine.
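The context-window point deserves a sketch. Instead of pasting a whole document, retrieval packs only the best-scoring chunks until a token budget is spent. The scores and the rough 4-characters-per-token estimate below are illustrative assumptions, not real model numbers.

```python
# Greedy context packing: take the highest-scored chunks that still
# fit within a fixed token budget, and drop the rest.

def pack_context(scored_chunks, budget_tokens=1000):
    chosen, used = [], 0
    for score, text in sorted(scored_chunks, reverse=True):
        cost = len(text) // 4 + 1  # rough estimate: tokens ~ chars / 4
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen

chunks = [
    (0.9, "Relevant passage about the expense policy." * 5),
    (0.2, "Unrelated boilerplate text." * 200),   # huge, low relevance
    (0.7, "Another relevant passage, shorter."),
]
picked = pack_context(chunks, budget_tokens=100)
# The two relevant passages fit; the huge irrelevant one is dropped.
```

This is why a knowledge base of dozens of PDFs can still fit in one prompt: the model only ever sees the handful of passages worth seeing.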
When RAG matters most
RAG isn't necessary for every AI interaction. If you're asking "explain quantum entanglement" or "write me a poem," a standard chatbot is fine — it's drawing on general knowledge.
RAG becomes essential when:
- Accuracy matters more than creativity. Legal research, medical references, compliance documents, academic papers — anywhere a wrong answer has real consequences.
- Your documents are the source of truth. Internal SOPs, product manuals, client files, research data — information that doesn't exist on the open internet.
- You need to cite sources. If someone asks "where did you get that?" you need a better answer than "the AI said so."
- Privacy is non-negotiable. Client confidentiality, HIPAA compliance, trade secrets, personal records — anything you wouldn't email to a stranger.
The future is local RAG
Two years ago, running an AI model on a phone or laptop wasn't practical. The models were too big, the hardware too slow. That's changed fast. Models like the LiquidAI LFM series that ThothAI uses are specifically optimized for on-device inference — they run well on modern phones and laptops with reasonable memory.
This means you can have a RAG-powered AI assistant that runs entirely on your device, with no internet connection, no subscription, and no data leaving your machine. That's not a future promise — it's what ThothAI does today on Windows and iOS.
The combination of local AI models and RAG is genuinely new territory. For the first time, non-technical people can have a private, document-aware AI assistant without any cloud infrastructure. That's worth paying attention to.