Knowledge Base
Give your bot a memory made from your own content. The Knowledge tab turns uploaded documents and hand-crafted Q&A pairs into a searchable context layer the bot draws from every time it replies.
What is this?
When a user asks your bot a question, the default behavior is for the language model to answer from its general training data. The knowledge base changes that. Before the model generates a response, Balchemy searches your uploaded content, finds the most relevant passages, and injects them into the prompt as context. This technique is called RAG — Retrieval-Augmented Generation. The model still generates the response, but it generates it grounded in your specific documents rather than making it up from general knowledge.
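The retrieve-then-inject flow above can be sketched in a few lines. This is an illustrative toy, not Balchemy's implementation: word overlap stands in for the vector-embedding similarity the real pipeline uses, and all function names are hypothetical.

```python
# Toy sketch of the retrieve-then-generate (RAG) flow described above.
# Word overlap stands in for embedding similarity so it runs with no deps.

def score(query: str, passage: str) -> float:
    """Fraction of query words that also appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages most relevant to the query."""
    ranked = sorted(documents, key=lambda d: score(query, d), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved passages into the prompt ahead of the question."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using this context:\n{ctx}\n\nQuestion: {query}"

docs = [
    "Balchemy supports DCA strategies on Solana.",
    "The weather in Lisbon is mild in spring.",
]
query = "What strategies does Balchemy support?"
prompt = build_prompt(query, retrieve(query, docs, top_k=1))
```

The model then answers from `prompt`, so its response is grounded in the retrieved passage rather than in general training data.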
Prerequisites
- A bot in any status (knowledge can be added even before the bot is started)
- Documents in PDF, TXT, or Markdown format
- RAG enabled in the RAG Configuration section at the bottom of the Knowledge tab
Opening the Knowledge tab
Navigate to Studio → Bots → [your bot] → Knowledge. You will see three sections: Documents, Q&A Pairs, and RAG Configuration.
Document types
The upload zone accepts three file formats:
| Format | Extension | Best for |
|---|---|---|
| PDF | .pdf | Product docs, whitepapers, trading guides, research reports |
| Plain text | .txt | Raw data exports, simple content, log extracts |
| Markdown | .md | Formatted guides, README-style content, structured notes |
Files are uploaded one at a time, with a progress bar showing real-time upload status. After upload, documents enter Pending status until you approve them for indexing.
Tip: Break large documents into focused topic files. A 50-page PDF about Solana DeFi retrieves less precisely than five 10-page files, each covering a single protocol.
Step-by-step: upload a document
- Go to Studio → Bots → [your bot] → Knowledge.
- Click the upload zone or drag a file onto it. The zone highlights blue when a file is dragged over it.
- Wait for the progress bar to complete. The file appears in the document list with Pending status.
- Click the green checkmark icon on the document row to Approve it.
- The document is now indexed and available for retrieval.
Documents that you do not approve remain in Pending status indefinitely and are not searched. You can also reject a document (red X icon) to remove it from the queue without indexing it.
Document list columns
| Column | Meaning |
|---|---|
| File icon | PDF, TXT, or MD — visual type indicator |
| File name | Original uploaded filename |
| Type badge | Uppercase format label |
| Upload time | Relative timestamp (e.g., "3 hours ago") |
| Status badge | Pending / Approved / Rejected — always paired with an icon |
| Actions | View, Approve (pending only), Reject (pending only), Delete |
Q&A Pairs
Q&A pairs are manually authored question-and-answer entries. They work differently from documents: instead of being chunked and embedded like a document, each Q&A pair is stored as a discrete unit. When a user's message closely matches a stored question, the exact answer is retrieved with high priority — it ranks above document chunks in the context selection.
Use Q&A pairs when you need precision. Documents are good for broad coverage. Q&A pairs are good for specific, frequently asked questions where the exact wording matters.
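The priority described above can be sketched as a two-stage lookup: a close question match short-circuits retrieval entirely. The matching rule and function names here are illustrative assumptions, not Balchemy's actual logic.

```python
# Sketch of Q&A pairs outranking document chunks, per the behavior
# described above. Normalized exact match is an illustrative stand-in
# for whatever similarity test the real system applies.

def normalize(text: str) -> str:
    """Lowercase, strip question marks, collapse whitespace."""
    return " ".join(text.lower().replace("?", "").split())

def answer(user_message: str, qa_pairs: dict[str, str], chunks: list[str]) -> str:
    # 1. A stored question that matches the message wins outright.
    key = normalize(user_message)
    for question, stored_answer in qa_pairs.items():
        if normalize(question) == key:
            return stored_answer
    # 2. Otherwise fall back to document-chunk retrieval (stubbed here).
    return f"[generated from {len(chunks)} retrieved chunks]"

qa = {"What is Balchemy's trading fee?": "The fee is shown on the pricing page."}
reply = answer("what is balchemy's trading fee", qa, [])
```

Because the stored answer is returned verbatim, Q&A pairs give you exact control over wording in a way that chunk retrieval cannot.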
When to use Q&A pairs
- Frequently asked questions with specific answers ("What is Balchemy's trading fee?")
- Corrections to common misconceptions ("The bot does NOT have access to your private keys")
- Product-specific terminology ("What is a DCA strategy in Balchemy?")
- Regulatory or compliance statements that must be verbatim
Step-by-step: add a Q&A pair
- Click Add Q&A Pair at the top right of the Q&A section.
- Enter the Question — write it the way a real user would ask it.
- Enter the Answer — write the exact response the bot should give.
- Leave Active checked unless you want to save the pair without activating it.
- Click Save.
The pair appears in the list immediately. Click any pair to expand it and view the full question and answer, edit it, or delete it. Toggle the Active switch in the expanded view to enable or disable the pair without deleting it.
Q&A pair limits
The Q&A list has no hard UI limit, but retrieval quality degrades when the list becomes very large (hundreds of pairs with overlapping topics). Keep pairs focused and review the list periodically to remove outdated entries.
RAG Configuration
The RAG Configuration section controls how the retrieval engine searches your knowledge base. It is a collapsible panel at the bottom of the Knowledge tab with its own Save button.
Enable RAG Search
The master toggle for knowledge retrieval. When this is off, the bot answers entirely from its training data, ignoring all uploaded documents and Q&A pairs. Turn it on to enable knowledge-grounded responses.
Warning: RAG is off by default on new bots. Remember to enable it after uploading documents, or the bot will not use them.
RAG settings reference
| Setting | Range | Default | What it controls |
|---|---|---|---|
| Similarity Threshold | 0.1 – 1.0 | 0.7 | Minimum cosine similarity score for a chunk to be included in context. Higher = stricter match required. |
| Max Results | 1 – 50 | 5 | How many chunks are injected into the prompt. More chunks = more context but longer prompts. |
| Chunk Size | 100 – 8,000 chars | 1,000 | Size of each text segment when documents are split. |
| Overlap Size | 0 – 2,000 chars | 200 | Overlap between adjacent chunks. Prevents context loss at chunk boundaries. |
| Learning Mode | cautious / balanced / aggressive | balanced | How readily proactive learning generates new knowledge suggestions. |
| Share Across Bots | On / Off | Off | Whether other bots in your workspace can query this knowledge base. |
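The table above maps to a small settings object. The field names below are illustrative (Balchemy's actual API schema may differ), but the range checks mirror the documented bounds, including the default-off state of RAG on new bots.

```python
# Hypothetical representation of the RAG settings table, with range
# checks matching the documented bounds. Field names are illustrative.

DEFAULT_RAG_CONFIG = {
    "enabled": False,             # RAG is off by default on new bots
    "similarity_threshold": 0.7,  # 0.1 - 1.0
    "max_results": 5,             # 1 - 50 chunks injected into the prompt
    "chunk_size": 1000,           # 100 - 8,000 characters per segment
    "overlap_size": 200,          # 0 - 2,000 characters shared between chunks
    "learning_mode": "balanced",  # cautious / balanced / aggressive
    "share_across_bots": False,
}

def validate(config: dict) -> None:
    """Reject values outside the documented ranges."""
    assert 0.1 <= config["similarity_threshold"] <= 1.0
    assert 1 <= config["max_results"] <= 50
    assert 100 <= config["chunk_size"] <= 8000
    assert 0 <= config["overlap_size"] <= 2000
    assert config["learning_mode"] in ("cautious", "balanced", "aggressive")

validate(DEFAULT_RAG_CONFIG)
```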
Tuning similarity threshold
The similarity threshold is the most impactful setting after enabling RAG.
- Too high (0.9+): Few or no chunks are retrieved. The bot falls back to training data even when you have relevant documents. Responses may not reflect your content.
- Balanced (0.65–0.75): Good starting point. Retrieves relevant content without pulling in loosely related material.
- Too low (0.3 and below): Many loosely related chunks are retrieved. The prompt fills up with tangentially relevant content, reducing answer quality.
If the bot is ignoring your documents, lower the threshold slightly. If it is giving answers that mix unrelated document content, raise it.
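The threshold's effect is easiest to see as a simple filter over scored chunks. The scores below are hard-coded for illustration; in practice they are cosine similarities produced at retrieval time.

```python
# Sketch of how the similarity threshold gates retrieval: only chunks
# scoring at or above the threshold reach the prompt, capped at
# max_results. Scores here are hard-coded for illustration.

scored_chunks = [
    ("Balchemy DCA strategy overview", 0.82),
    ("Solana validator economics", 0.68),
    ("Office holiday schedule", 0.31),
]

def filter_chunks(chunks, threshold: float, max_results: int = 5) -> list[str]:
    """Keep chunks meeting the threshold, up to max_results."""
    kept = [text for text, score in chunks if score >= threshold]
    return kept[:max_results]

strict = filter_chunks(scored_chunks, threshold=0.7)  # only the 0.82 chunk survives
loose = filter_chunks(scored_chunks, threshold=0.6)   # the 0.68 chunk now qualifies too
```

This is why lowering the threshold helps a bot that ignores its documents: chunks that previously scored just below the cutoff start reaching the prompt.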
Tuning chunk size and overlap
Chunk size determines how documents are split during indexing. Ignoring overlap, the same document split at 500 characters creates roughly twice as many (smaller) chunks as at 1,000 characters.
- Smaller chunks (300–600): Better for precise question-answering with dense factual content. Each chunk covers one specific idea.
- Larger chunks (1,000–2,000): Better for narrative content where context spans multiple sentences. The bot gets more surrounding context with each retrieved chunk.
Overlap prevents information loss at the boundary between chunks. If chunk size is 1,000 and overlap is 200, each chunk shares 200 characters with its neighbors. Keep overlap at 10–20% of chunk size as a starting point.
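The interaction between chunk size and overlap can be sketched with a minimal character-based splitter. This is an assumption-laden toy: the real indexer may split on sentence or token boundaries rather than raw character offsets.

```python
# Minimal character-based chunker illustrating how chunk size and
# overlap interact. Each new chunk starts (chunk_size - overlap)
# characters after the previous one.

def chunk(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "0123456789" * 250           # 2,500-character stand-in document
pieces = chunk(text, chunk_size=1000, overlap=200)
# Chunks start at offsets 0, 800, 1600, 2400; adjacent chunks share
# 200 characters, so a sentence cut at one boundary survives intact
# in the neighboring chunk.
```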
Tip: After changing chunk size or overlap, re-upload your documents. These settings only apply at indexing time. Documents uploaded before a settings change use the old chunk parameters.
Learning Mode
This setting interacts with Proactive Learning in the Training tab:
| Mode | Behavior |
|---|---|
| Cautious | High confidence required before a new knowledge suggestion is generated |
| Balanced | Moderate threshold — a good default for most bots |
| Aggressive | Lower threshold — generates more frequent suggestions, including lower-confidence ones |
Share Across Bots
When enabled, other bots in your workspace can query this bot's knowledge base as an additional knowledge source. This is useful when you have a "master knowledge bot" with company documentation that multiple specialized bots should reference. Disable this if the knowledge is sensitive or specific to one bot's use case.
Document approval workflow
All uploaded documents start in Pending status. The two-step upload-then-approve workflow exists to give you a chance to review before content becomes searchable. This matters because:
- Accidentally uploaded files (wrong document, draft version) cannot affect bot responses until approved.
- You can batch-upload multiple documents and approve them all once you have reviewed the list.
- The Rejected status permanently removes a document from the searchable index without deleting it from your upload history.
Approval actions
| Action | Available when | What it does |
|---|---|---|
| Approve (checkmark) | Pending | Indexes the document for retrieval |
| Reject (X) | Pending | Marks as rejected — not indexed, stays in history |
| Delete (trash) | Any status | Permanently removes from the list and index |
| View (eye) | Any status | Opens a preview of the document content |
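The action table above implies a small state machine. The sketch below models only the state-changing actions (View is read-only and omitted); transitions not listed, such as approving a rejected document, are assumed invalid, which the actual UI may or may not enforce the same way.

```python
# Sketch of the approval state machine implied by the table above.
# Only state-changing actions are modeled; View is read-only.

TRANSITIONS = {
    ("pending", "approve"): "approved",  # document becomes searchable
    ("pending", "reject"): "rejected",   # stays in history, never indexed
}

def apply_action(status: str, action: str) -> str:
    if action == "delete":               # delete is allowed from any status
        return "deleted"
    if (status, action) not in TRANSITIONS:
        raise ValueError(f"cannot {action} a {status} document")
    return TRANSITIONS[(status, action)]

apply_action("pending", "approve")       # new documents must pass review
```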
Proactive learning auto-add
When Proactive Learning is active (Training tab) and a suggestion of type "question" is approved, the system automatically adds a new Q&A pair to this knowledge base. You will see these auto-generated pairs in the Q&A list with the source labeled as "Proactive Learning."
Review these pairs periodically. Auto-generated answers are based on patterns from past conversations and may not always be exactly right. Edit any pair whose answer you want to refine.
Best practices
- Upload focused documents. One topic per file produces better retrieval than multi-topic catch-all documents.
- Write Q&A pairs from real questions. Use phrasing you have actually seen from users, not idealized questions.
- Test after every addition. Open the bot chat and ask questions that should hit your new content. If the bot misses it, check the similarity threshold.
- Keep content current. Outdated answers in Q&A pairs are worse than no answers — the bot will confidently give stale information.
- Remove noise. Content that is irrelevant to your use case reduces retrieval quality by consuming slots in the context window.
- Enable RAG. It sounds obvious, but the most common "knowledge base is not working" issue is that RAG Search is still off.
Common issues
The bot is not using my uploaded documents. Check that RAG Search is toggled on in the RAG Configuration section. Also confirm that the document status is Approved — Pending documents are not indexed.
The bot gives a generic answer instead of my specific document content. Lower the similarity threshold. Try 0.6 and retest. If that does not help, the document may need to be re-uploaded with a smaller chunk size so the relevant passage is in its own chunk.
I uploaded a file but it shows as Pending. Pending is the expected state after upload. Click the green checkmark icon on the document row to approve it.
Q&A pairs I deleted keep appearing. Q&A pairs are saved to the server with a batch update call. If a previous save succeeded but a later one failed, the list may be out of sync. Refresh the page to get the latest server state.
Proactive learning added Q&A pairs I did not want. Disable Auto-Apply in the Training tab to require manual approval of all suggestions. You can also delete unwanted auto-generated pairs directly from the Q&A list.