What Knowledge means
The Knowledge section in your dashboard (/dashboard/training) is where you teach your AI everything it needs to answer customers. It bundles together every input the AI uses, organized by what the input is for and how it gets into the prompt.
There are two broad shapes of input:
- Always-injected directives. Guidance, Internal Docs, and Instructions are added to every reply, no matter what the visitor is asking. Use these for rules and reference content the AI must always have in hand.
- Knowledge base (RAG). URL Sources, Integrations, File Uploads, and content auto-learned from resolved tickets are searched at answer time. The AI only sees the chunks that semantically match the visitor’s question. This is where the long tail of your reference material lives.
What’s inside the Knowledge section
The dashboard sidebar has eight sections. Here’s what each one is for, what it does, and how it works.
All Data
The unified, searchable view of every chunk ingested across every source: URL crawls, integrations, internal docs, file uploads. Each row in the list represents one source, with its chunk count and last-updated timestamp.
What it does. Lets you find or audit any specific piece of content without remembering which sidebar section it came from. The “Search” bar matches across content, end-user email, and company name.
How it works. Halo splits each source into small overlapping passages (“chunks”) and stores both the text and a vector embedding for each. At answer time the AI runs a semantic search across every chunk in your org and pulls the top matches into the prompt. See How knowledge gets used for the full pipeline.
Guidance
Short, individually managed directives the AI always follows on every reply. Each guideline is a discrete row you can toggle on/off, edit, or reorder.
What it does. Locks in org-wide rules and word choices that should apply to every conversation, every agent. New guidelines start applying as soon as you save them.
How it works. Guidance is injected into the prompt above your knowledge base and your agents’ instructions, so it takes precedence on every reply. It is not similarity-gated like the knowledge base. A per-agent Custom Instruction can still override an individual guideline for that specific agent.
Examples of good guidance:
- “Office hours are 9-5 ET, Monday through Friday.”
- “Refer to our pricing as ‘simple, predictable pricing’.”
- “Never promise a specific delivery date for in-progress feature requests.”
URL Sources
Crawl public web content and keep it in sync as your sites change. Best for help centers, public docs, and marketing pages you want the AI to be able to reference.
What it does. You add a URL, Halo crawls the page (and optionally the rest of the site), and the content becomes searchable knowledge.
How it works. Two crawl modes:
- Single page (default). Just the URL you added.
- Crawl entire site. Follow links and ingest the whole site. Mark a URL as a help center to restrict crawling to article links on the same domain. Best for docs and KB sites where you only want articles, not marketing pages.
Pages whose sitemap <lastmod> has advanced since our last crawl get re-scraped, and pages added to or removed from the sitemap are picked up automatically. You can trigger a manual full refresh from the row menu at any time.
Read more: URL Sources and Web Crawler.
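The sitemap-based refresh described above can be pictured as a diff between two snapshots of the sitemap. The following is a minimal sketch under stated assumptions, not Halo's crawler: the function names `parse_sitemap` and `plan_refresh` are hypothetical, and it assumes `<lastmod>` values that Python's `datetime.fromisoformat` can parse and that all use the same timezone convention.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> dict[str, datetime | None]:
    """Map each <loc> URL to its <lastmod> timestamp (None if absent)."""
    root = ET.fromstring(xml_text)
    pages: dict[str, datetime | None] = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc:
            pages[loc] = datetime.fromisoformat(lastmod.strip()) if lastmod else None
    return pages


def plan_refresh(
    previous: dict[str, datetime | None],
    current: dict[str, datetime | None],
) -> dict[str, list[str]]:
    """Decide which pages to re-scrape, add, or remove (hypothetical helper)."""
    shared = current.keys() & previous.keys()
    rescrape = sorted(
        url
        for url in shared
        # Re-scrape only when <lastmod> moved forward since the last crawl.
        if current[url] is not None
        and (previous[url] is None or current[url] > previous[url])
    )
    return {
        "rescrape": rescrape,
        "add": sorted(current.keys() - previous.keys()),
        "remove": sorted(previous.keys() - current.keys()),
    }
```

Pages that keep the same `<lastmod>` are left alone, which is what makes this kind of refresh cheaper than a full re-crawl.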
Integrations
Pull data from your existing tools so the AI can answer using it, and keep it synced as those tools update.
What it does. Connects each tool, syncs its records, and breaks them into discrete training sources. A HubSpot deal becomes one source, a Fathom meeting becomes many sources (one per topic discussed).
How each integration syncs:
- HubSpot. Contacts, companies, deals, tickets, and KB articles.
- Intercom. Conversations, articles, and help center content.
- Fathom. Meeting notes and transcripts.
- Slack. Channel messages and insights.
- Zoom. Meeting transcripts.
- PandaDoc. Contracts and proposals.
Internal Docs
Long-form private reference content the AI can rely on for every reply. Each doc is a titled markdown entry you write directly in the dashboard.
What it does. Stores policies, playbooks, FAQs, and any other reference material you want the AI to always have in hand. Customers never see Internal Docs and the AI never links to them in replies.
How it works. Internal Docs are always injected into the prompt (similar to Guidance, but for content too long to fit in a single bullet rule). Use Internal Docs when a single guideline isn’t enough to capture what you need to say.
Examples of good Internal Docs:
- Refund policy. Plan-by-plan rules, refund windows, approval requirements.
- Common troubleshooting steps. “If a customer reports the widget isn’t loading, first ask whether they see any errors in the browser console…”
- Account cancellation walkthrough. Step-by-step instructions the AI walks customers through.
The AI may quote or paraphrase Internal Docs in answers. Don’t write anything in an Internal Doc that you wouldn’t be okay with the AI repeating to a customer in its own words.
File Uploads
PDFs, spreadsheets, exported KB articles, internal SOPs you have as a file. Each upload is parsed, chunked, and embedded so the AI can retrieve relevant passages at answer time.
What it does. Adds reference content that doesn’t live behind a URL or in a connected tool.
Supported formats. PDF, CSV, Excel (xlsx / xls), TXT, DOC, DOCX, Markdown.
How it works. Uploaded files appear in this section and their chunks also surface in All Data. Like URL Sources and Integrations, they’re part of the knowledge base, so the AI retrieves them only when the visitor’s question is semantically similar to a chunk. For content that changes often, prefer a URL or integration so updates flow through automatically. Files have to be re-uploaded to update.
Read more: Files & Internal Docs.
Instructions
A single org-wide system prompt prepended to every agent. Use it for company-wide voice, tone, and persona that should apply to all agents.
What it does. Sets the baseline personality and behavior for every AI conversation in your org. Example: “Be empathetic, concise, and professional. Always prioritize user satisfaction. Never share internal system details.”
How it works. These instructions are added to every agent’s prompt before that agent’s own instructions. Each agent’s individual instructions are rendered after the globals, so an agent can soften or override a global directive when it makes sense for that agent’s role.
Instructions vs. Guidance. Use Guidance for short, individually-toggled rules the AI must always follow. Use Instructions for longer narrative prompts (voice, persona, response shape) that you want present in every reply.
Internal Users
Tell the AI who’s on your team so they aren’t counted as customers.
What it does. Anyone matching a domain or email here is recognized as an internal team member. Their conversations and activity won’t be attributed to end users in analytics or shown as customer interactions.
How it works. Add your company domain (e.g. acme.com) to catch all teammates automatically. Add individual emails for contractors or external collaborators on your side that don’t share a domain. Useful for keeping your dashboards clean and making sure your team’s testing doesn’t pollute customer data.
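The domain-or-email matching described above amounts to a two-step lookup. Here is a minimal sketch, assuming exact, case-insensitive matches; the function name `is_internal` and its parameters are hypothetical and not part of Halo's API.

```python
from __future__ import annotations


def is_internal(email: str, domains: set[str], emails: set[str]) -> bool:
    """Return True if the address belongs to a listed teammate.

    `domains` and `emails` are assumed to be stored in lowercase.
    """
    address = email.strip().lower()
    if "@" not in address:
        return False  # not an email address, so it cannot match
    if address in emails:
        return True  # individually listed contractor or collaborator
    # Compare the part after the last "@" against the listed domains.
    domain = address.rpartition("@")[2]
    return domain in domains
```

Matching the whole domain rather than a suffix keeps a lookalike such as `notacme.com` from being treated as `acme.com`.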
How knowledge gets used
Every input above flows into one of two places at answer time:
- Static portion of the system prompt. Guidance, Internal Docs, and Instructions are inserted into a cached static block that’s identical across every turn of a conversation. Cache hits make these effectively free.
- Dynamic RAG block. URL Sources, Integrations, File Uploads, and auto-learned ticket content are searched semantically against the visitor’s current question. The top-matching chunks are inserted into the prompt for that turn.
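The split between the static block and the dynamic RAG block can be sketched as two functions: one that renders the always-injected inputs once, and one that appends the per-turn retrieved chunks. This is an illustration only. The names `OrgKnowledge`, `build_static_block`, and `build_turn_prompt`, the section headers, and the order of sections inside the static block are all assumptions, not Halo's actual prompt format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OrgKnowledge:
    """Hypothetical container for the always-injected inputs."""

    guidance: tuple[str, ...]
    instructions: str
    internal_docs: tuple[str, ...]


def build_static_block(org: OrgKnowledge) -> str:
    """Render the part of the prompt that is identical on every turn."""
    # Section order here is an assumption made for the sketch.
    sections = [
        "## Guidance\n" + "\n".join(f"- {rule}" for rule in org.guidance),
        "## Instructions\n" + org.instructions,
        "## Internal Docs\n" + "\n\n".join(org.internal_docs),
    ]
    return "\n\n".join(sections)


def build_turn_prompt(static_block: str, retrieved_chunks: list[str], question: str) -> str:
    """Append the per-turn retrieved chunks after the unchanged static block."""
    dynamic = "## Retrieved knowledge\n" + "\n---\n".join(retrieved_chunks)
    return "\n\n".join([static_block, dynamic, "## Visitor question\n" + question])
```

Because the static block is a byte-for-byte identical prefix on every turn, it is the kind of content a prompt cache can reuse, while only the retrieved chunks and the question change.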
Ingestion pipeline
Every knowledge-base source (URLs, integrations, files) flows through the same pipeline:
Content extraction
For URLs, the crawler fetches each page, strips navigation/boilerplate, and extracts text + image alt + video transcript references. For files, the content is parsed directly. For integrations, the relevant fields are extracted (e.g. HubSpot engagement body, Zoom transcript text).
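For the URL case, boilerplate stripping can be illustrated with Python's standard `html.parser`. This is a rough sketch, assuming navigation lives in `nav`, `header`, `footer`, and `aside` elements; the class `PageTextExtractor` and the `[image: ...]` marker are hypothetical, and video transcript handling is left out.

```python
from html.parser import HTMLParser

# Elements whose contents are treated as boilerplate (an assumption).
SKIPPED_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class PageTextExtractor(HTMLParser):
    """Collect visible text and image alt text, skipping boilerplate."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "img" and self._skip_depth == 0:
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(f"[image: {alt.strip()}]")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the page's main text, one fragment per line."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Keeping alt text in the extracted output means a question about something shown only in a screenshot can still match the page.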
Chunking
Content is split into overlapping chunks at sentence boundaries. Halo uses a dual-chunking strategy:
- Parent chunks (~2000 chars) for full context
- Child chunks (~500 chars) for precise matching
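The dual-chunking idea can be sketched as the same sentence-packing routine run at two sizes. This is a simplified illustration, not Halo's chunker: the function names are hypothetical, the sentence splitter is a naive regex, and the one-sentence overlap is an assumed value since the source does not state the overlap size.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(sentences: list[str], max_chars: int, overlap_sentences: int = 1) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars, with overlap."""
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the trailing sentence(s) over so adjacent chunks overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def dual_chunk(text: str, parent_chars: int = 2000, child_chars: int = 500):
    """Return (parents, children); each child records its parent's index."""
    parents = chunk_sentences(split_sentences(text), parent_chars)
    children = []
    for parent_id, parent in enumerate(parents):
        for child in chunk_sentences(split_sentences(parent), child_chars):
            children.append({"parent_id": parent_id, "text": child})
    return parents, children
```

A common way to use such a structure is to match the query against the small child chunks and then hand the model the larger parent chunk for context, which is consistent with the "precise matching" and "full context" roles listed above.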
Embedding
Each chunk is converted to a vector embedding via Voyage AI. These embeddings capture semantic meaning, so the agent can find relevant content even when the customer’s question doesn’t match exact keywords.
Knobs the AI uses internally
The AI has a knowledge_search tool with these knobs:
- Source type filter. Search only help_center_article, only file, only hubspot, etc.
- Top-K results. How many chunks to pull.
- Score threshold. Minimum relevance for inclusion.
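To make the three knobs concrete, here is a toy in-memory search over pre-computed embeddings. It is a sketch only: `Chunk` and `search_chunks` are hypothetical names, cosine similarity is an assumed scoring function, and the default values are placeholders rather than Halo's settings.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_type: str  # e.g. "help_center_article", "file", "hubspot"
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_chunks(
    query_embedding: list[float],
    chunks: list[Chunk],
    source_type: str | None = None,  # source type filter
    top_k: int = 5,                  # how many chunks to pull
    score_threshold: float = 0.0,    # minimum relevance for inclusion
) -> list[tuple[float, Chunk]]:
    """Return up to top_k (score, chunk) pairs, best match first."""
    scored = []
    for chunk in chunks:
        if source_type is not None and chunk.source_type != source_type:
            continue
        score = cosine(query_embedding, chunk.embedding)
        if score >= score_threshold:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Raising the score threshold trades recall for precision, while the source type filter narrows the search before any scoring happens.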
Distillation
For some sources (HubSpot engagements, Zoom transcripts, Fathom transcripts, PandaDoc contracts), Halo can run an additional distillation pass. Distillation summarizes raw content into customer-scoped insights (“Acme Corp’s account manager is Sarah; they prefer monthly invoices; their renewal is in March”) that are easier for the AI to find and use. Distilled content is stored alongside the raw content and re-runs when source data changes.
Where to start
If you have an existing help center or docs site, the fastest way to bootstrap your AI is to crawl it:
- Go to Knowledge -> URL Sources
- Add your help center root (e.g. help.acme.com)
- Wait a few minutes for the crawler to discover and ingest all pages
- Test in Live Train. The AI should now answer questions backed by your real docs.
Once your docs are crawled, layer in the other sections as you need them:
- Guidance for short org-wide rules the AI must always follow.
- Internal Docs for longer policies and playbooks the AI should always have access to.
- File Uploads for content not on a URL (internal SOPs, runbooks, exported KB articles).
- Integrations for product/customer-specific context that lives in HubSpot, Zoom, etc.
- Instructions for the org-wide voice and persona.
Where to go next
- AI Learning Priority. Which source wins when two disagree, with the full priority stack.
- URL Sources. Crawl websites, help centers, and docs sites.
- Files & Internal Docs. Upload files or write articles directly.
- Web Crawler. Technical reference: User-Agent, allowlisting, request behavior.
- Learning Rules. Tell the AI what topics to learn from synced data and what to skip.
- Live Train. Real-time coaching during a live conversation.