Knowledge

What Knowledge means

The Knowledge section in your dashboard (/dashboard/training) is where you teach your AI everything it needs to answer customers. It bundles together every input the AI uses, organized by what the input is for and how it gets into the prompt. There are two broad shapes of input:

Always-injected directives. Guidance, Internal Docs, and Instructions are added to the prompt for every reply, no matter what the visitor is asking. Use these for rules and reference content the AI must always have in hand.
Knowledge base (RAG). URL Sources, Integrations, File Uploads, and content auto-learned from resolved tickets are searched at answer time. The AI only sees the chunks that semantically match the visitor’s question. This is where the long tail of your reference material lives.

When two sources tell the AI different things, the higher-priority one wins. See AI Learning Priority for the complete precedence stack and the rationale behind it.

What’s inside the Knowledge section

The dashboard sidebar has eight sections. Here’s what each one is for, what it does, and how it works.

All Data

The unified, searchable view of every chunk ingested across every source: URL crawls, integrations, internal docs, file uploads. Each row in the list represents one source, with its chunk count and last-updated timestamp.

What it does. Lets you find or audit any specific piece of content without remembering which sidebar section it came from. The “Search” bar matches across content, end-user email, and company name.

How it works. Halo splits each source into small overlapping passages (“chunks”) and stores both the text and a vector embedding for each. At answer time, the AI runs a semantic search across every chunk in your org and pulls the top matches into the prompt. See How knowledge gets used for the full pipeline.
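The Search bar behavior described above can be approximated as a case-insensitive substring match over three fields. This is a minimal sketch under that assumption: `ChunkRow` and `search_all_data` are hypothetical names, and Halo’s real matching (tokenization, fuzzy matching, ranking) is not documented here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ChunkRow:
    # Hypothetical row shape; field names are illustrative, not Halo's schema.
    source: str
    content: str
    end_user_email: str = ""
    company_name: str = ""

def search_all_data(rows: List[ChunkRow], query: str) -> List[ChunkRow]:
    """Case-insensitive substring match across content, end-user email, and company name."""
    q = query.strip().lower()
    if not q:
        return list(rows)  # an empty search shows everything
    return [
        r for r in rows
        if q in r.content.lower()
        or q in r.end_user_email.lower()
        or q in r.company_name.lower()
    ]

rows = [
    ChunkRow("help.acme.com/refunds", "Refunds are issued within 14 days."),
    ChunkRow("hubspot:deal/42", "Renewal discussed in March.", "sam@globex.com", "Globex"),
]
print([r.source for r in search_all_data(rows, "globex")])  # ['hubspot:deal/42']
```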

Guidance

Short, individually managed directives the AI always follows on every reply. Each guideline is a discrete row you can toggle on/off, edit, or reorder.

What it does. Locks in org-wide rules and word choices that should apply to every conversation and every agent. New guidelines start applying as soon as you save them.

How it works. Guidance is injected into the prompt above your knowledge base and your agents’ instructions, so it takes precedence on every reply. Unlike the knowledge base, it is not similarity-gated: every enabled guideline is included no matter what the visitor asks. A per-agent Custom Instruction can still override an individual guideline for that specific agent.

Examples of good guidance:

“Office hours are 9-5 ET, Monday through Friday.”
“Refer to our pricing as ‘simple, predictable pricing’.”
“Never promise a specific delivery date for in-progress feature requests.”

Guidance is org-wide and applies to every customer the AI talks to. Don’t put customer-specific information here (e.g. “Acme Corp gets 20% off”). For per-customer facts, use traits and context entries on the customer instead.

Read more: AI Learning Priority -> Guidance.
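Because Guidance is not similarity-gated, assembling it is conceptually just a filter and a join: every enabled row goes in, in the order you arranged. A minimal sketch, assuming a hypothetical `Guideline` type and a plain bulleted rendering; Halo’s actual prompt format isn’t documented here, and per-agent Custom Instruction overrides are left out.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Guideline:
    # Hypothetical shape of one Guidance row.
    text: str
    enabled: bool = True

def render_guidance(guidelines: List[Guideline]) -> str:
    """Keep enabled rows, preserve their order, and render one bullet per rule."""
    return "\n".join(f"- {g.text}" for g in guidelines if g.enabled)

guidance = [
    Guideline("Office hours are 9-5 ET, Monday through Friday."),
    Guideline("Refer to our pricing as 'simple, predictable pricing'."),
    Guideline("Never promise a specific delivery date for in-progress feature requests.",
              enabled=False),  # toggled off, so it is not rendered
]
print(render_guidance(guidance))
```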

URL Sources

Crawl public web content and keep it in sync as your sites change. Best for help centers, public docs, and marketing pages you want the AI to be able to reference.

What it does. You add a URL, Halo crawls the page (and optionally the rest of the site), and the content becomes searchable knowledge.

How it works. There are two crawl modes:

Single page (default). Just the URL you added.
Crawl entire site. Follow links and ingest the whole site. Mark a URL as a help center to restrict crawling to article links on the same domain; this is best for docs and KB sites where you only want articles, not marketing pages.
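Help-center mode boils down to a link filter applied while crawling: resolve each link against the current page, and keep it only if it stays on the same host and looks like an article. A sketch using Python’s standard `urllib.parse`; the `/articles/` path test in the hypothetical `is_article_link` helper is an assumption, since Halo’s real article detection isn’t described here.

```python
from typing import List
from urllib.parse import urljoin, urlparse

def is_article_link(path: str) -> bool:
    # Hypothetical heuristic: treat paths containing "/articles/" as articles.
    return "/articles/" in path

def help_center_links(page_url: str, hrefs: List[str]) -> List[str]:
    """Keep only same-host article links found on a help-center page."""
    base_host = urlparse(page_url).netloc
    keep: List[str] = []
    for href in hrefs:
        parsed = urlparse(urljoin(page_url, href))  # resolve relative links
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.netloc != base_host or not is_article_link(parsed.path):
            continue
        # Drop #fragments so the same article isn't queued twice.
        url = parsed._replace(fragment="").geturl()
        if url not in keep:
            keep.append(url)
    return keep

links = help_center_links(
    "https://help.acme.com/en/",
    ["/en/articles/123-refunds", "https://acme.com/pricing",
     "articles/456-sso#setup", "mailto:support@acme.com"],
)
print(links)
```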

Halo keeps sites in sync with a daily sitemap diff: only URLs whose <lastmod> has advanced since the last crawl are re-scraped, and pages added to or removed from the sitemap are picked up automatically. You can trigger a manual full refresh from the row menu at any time. Read more: URL Sources and Web Crawler.

Integrations

Pull data from your existing tools so the AI can answer using it, and keep it synced as those tools update.

What it does. Connects each tool, syncs its records, and breaks them into discrete training sources. A HubSpot deal becomes one source; a Fathom meeting becomes many sources (one per topic discussed).

What each integration syncs:

HubSpot. Contacts, companies, deals, tickets, and KB articles.
Intercom. Conversations, articles, and help center content.
Fathom. Meeting notes and transcripts.
Slack. Channel messages and insights.
Zoom. Meeting transcripts.
PandaDoc. Contracts and proposals.

Data is refreshed automatically on a schedule, or you can trigger a manual sync. Use Learning Rules to filter which topics from synced data the AI should learn from.
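To make “one source per topic” and topic filtering concrete, here is a sketch that turns a meeting’s already-extracted topics into separate sources and drops the topics a rule says to skip. Everything here is hypothetical: topic extraction isn’t shown, and real Learning Rules may match topics far more flexibly than this exact-name set.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class TrainingSource:
    # Hypothetical shape of one discrete source produced by a sync.
    integration: str
    record_id: str
    topic: str
    text: str

def sources_from_meeting(meeting_id: str, topics: Dict[str, str]) -> List[TrainingSource]:
    """One meeting -> one source per topic discussed (topic name -> text)."""
    return [TrainingSource("fathom", meeting_id, topic, body) for topic, body in topics.items()]

def apply_learning_rules(sources: List[TrainingSource], skip_topics: Set[str]) -> List[TrainingSource]:
    """Drop sources whose topic matches a 'skip' rule (case-insensitive exact match)."""
    skip = {t.lower() for t in skip_topics}
    return [s for s in sources if s.topic.lower() not in skip]

sources = sources_from_meeting("mtg-001", {
    "Pricing": "Customer asked about annual billing.",
    "Small talk": "Chatted about the weekend.",
    "SSO setup": "Walked through Okta configuration.",
})
kept = apply_learning_rules(sources, {"Small talk"})
print([s.topic for s in kept])  # ['Pricing', 'SSO setup']
```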

Internal Docs

Long-form private reference content the AI can rely on for every reply. Each doc is a titled markdown entry you write directly in the dashboard.

What it does. Stores policies, playbooks, FAQs, and any other reference material you want the AI to always have in hand. Customers never see Internal Docs, and the AI never links to them in replies.

How it works. Internal Docs are always injected into the prompt (similar to Guidance, but for content too long to fit in a single bullet rule). Use Internal Docs when a single guideline isn’t enough to capture what you need to say.

Examples of good Internal Docs:

Refund policy. Plan-by-plan rules, refund windows, approval requirements.
Common troubleshooting steps. “If a customer reports the widget isn’t loading, first ask whether they see any errors in the browser console…”
Account cancellation walkthrough. Step-by-step instructions the AI walks customers through.

The AI may quote or paraphrase Internal Docs in answers. Don’t write anything in an Internal Doc that you wouldn’t be okay with the AI repeating to a customer in its own words.

Because Internal Docs are injected into every prompt, they count toward your token budget on every turn. Keep them tight and well-titled. Put long-tail reference material in URL Sources, Integrations, or File Uploads instead.
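A back-of-the-envelope way to see this cost is to estimate how many tokens your Internal Docs add to each prompt. This sketch uses the rough rule of thumb of about four characters per token for English text; actual counts depend on the model’s tokenizer, and the document sizes are made up.

```python
from typing import Dict

def estimate_tokens(text: str) -> int:
    # Rule of thumb: roughly 4 characters per token for English text.
    # Real counts depend on the model's tokenizer.
    return max(1, len(text) // 4) if text else 0

def always_injected_tokens(docs: Dict[str, str]) -> int:
    """Approximate tokens the Internal Docs occupy in every single prompt."""
    return sum(estimate_tokens(body) for body in docs.values())

# Made-up sizes: a 6,000-character refund policy and a 2,000-character playbook.
docs = {"Refund policy": "x" * 6000, "Troubleshooting": "y" * 2000}
print(always_injected_tokens(docs))  # 2000 (present on every turn)
```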

File Uploads

PDFs, spreadsheets, exported KB articles, and internal SOPs that you have as files. Each upload is parsed, chunked, and embedded so the AI can retrieve relevant passages at answer time.

What it does. Adds reference content that doesn’t live behind a URL or in a connected tool.

Supported formats. PDF, CSV, Excel (xlsx / xls), TXT, DOC, DOCX, Markdown.

How it works. Uploaded files appear in this section, and their chunks also surface in All Data. Like URL Sources and Integrations, they’re part of the knowledge base, so the AI retrieves them only when the visitor’s question is semantically similar to a chunk. Files must be re-uploaded to pick up changes; for content that changes often, prefer a URL or integration, where updates flow through automatically. Read more: Files & Internal Docs.
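If you script bulk uploads, a pre-check against the listed formats can save failed attempts. A minimal sketch; the extension set is an assumption derived from the formats listed above (for example, that Markdown files use `.md`), not Halo’s actual upload validator.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed extensions for the listed formats: PDF, CSV, Excel, TXT, DOC, DOCX, Markdown.
SUPPORTED_EXTENSIONS = {".pdf", ".csv", ".xlsx", ".xls", ".txt", ".doc", ".docx", ".md"}

def is_supported_upload(filename: str) -> bool:
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

def partition_uploads(filenames: List[str]) -> Tuple[List[str], List[str]]:
    """Split filenames into (accepted, rejected) by extension."""
    accepted = [f for f in filenames if is_supported_upload(f)]
    rejected = [f for f in filenames if not is_supported_upload(f)]
    return accepted, rejected

ok, rejected = partition_uploads(["Refunds.PDF", "pricing.xlsx", "notes.md", "diagram.png"])
print(ok, rejected)
```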

Instructions

A single org-wide system prompt prepended to every agent’s prompt. Use it for company-wide voice, tone, and persona that should apply to all agents.

What it does. Sets the baseline personality and behavior for every AI conversation in your org. Example: “Be empathetic, concise, and professional. Always prioritize user satisfaction. Never share internal system details.”

How it works. These instructions are added to every agent’s prompt before that agent’s own instructions. Because each agent’s individual instructions are rendered after the globals, an agent can soften or override a global directive when it makes sense for that agent’s role.

Instructions vs. Guidance. Use Guidance for short, individually toggled rules the AI must always follow. Use Instructions for longer narrative prompts (voice, persona, response shape) that you want present in every reply.
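The ordering described above, globals first and then the agent’s own text, can be sketched as a simple concatenation. The blank-line separator and the function name are assumptions; Halo’s real prompt template isn’t documented here.

```python
def build_agent_prompt(global_instructions: str, agent_instructions: str) -> str:
    """Render org-wide Instructions first, then the agent's own instructions,
    mirroring the documented order; empty parts are skipped."""
    parts = [p.strip() for p in (global_instructions, agent_instructions) if p and p.strip()]
    return "\n\n".join(parts)

GLOBALS = "Be empathetic, concise, and professional. Never share internal system details."
BILLING_AGENT = "You handle billing questions. It's fine to be brief and skip small talk."
prompt = build_agent_prompt(GLOBALS, BILLING_AGENT)
print(prompt)
```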

Internal Users

Tell the AI who’s on your team so they aren’t counted as customers.

What it does. Anyone matching a domain or email here is recognized as an internal team member. Their conversations and activity won’t be attributed to end users in analytics or shown as customer interactions.

How it works. Add your company domain (e.g. acme.com) to catch all teammates automatically. Add individual emails for contractors or external collaborators on your side who don’t share your domain. This keeps your dashboards clean and makes sure your team’s testing doesn’t pollute customer data.
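The matching rule can be sketched as: an address is internal if it’s listed explicitly or its domain is in your domain list. This sketch assumes case-insensitive, exact domain matching, so subdomains such as `mail.acme.com` are not covered unless you add them; whether Halo also matches subdomains isn’t stated here.

```python
from typing import Set

def is_internal(email: str, domains: Set[str], emails: Set[str]) -> bool:
    """Internal if the address is listed explicitly or its domain is an internal domain."""
    address = email.strip().lower()
    if "@" not in address:
        return False
    if address in {e.strip().lower() for e in emails}:
        return True
    domain = address.rsplit("@", 1)[1]
    # Exact domain match only: mail.acme.com does not match acme.com here.
    return domain in {d.strip().lower() for d in domains}

DOMAINS = {"acme.com"}
EMAILS = {"pat@freelance.dev"}  # a contractor outside your domain
print(is_internal("Jo@Acme.com", DOMAINS, EMAILS))     # True
print(is_internal("sam@globex.com", DOMAINS, EMAILS))  # False
```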

How knowledge gets used

Every input above flows into one of two places at answer time:

Static portion of the system prompt. Guidance, Internal Docs, and Instructions are inserted into a cached static block that’s identical across every turn of a conversation. Cache hits make these effectively free.
Dynamic RAG block. URL Sources, Integrations, File Uploads, and auto-learned ticket content are searched semantically against the visitor’s current question. The top-matching chunks are inserted into the prompt for that turn.

The full ordering and “who wins on conflicts” story is in AI Learning Priority.
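The static block is cheap because it is byte-for-byte identical on every turn, so a prompt cache keyed on it keeps hitting, while the RAG block changes with each question. The toy `PrefixCache` below only illustrates that idea; it is not how Halo or any model provider actually implements caching.

```python
import hashlib
from typing import List, Tuple

class PrefixCache:
    """Toy stand-in for a prompt cache keyed on the static block's hash."""

    def __init__(self) -> None:
        self._seen = set()
        self.hits = 0
        self.misses = 0

    def lookup(self, static_block: str) -> bool:
        key = hashlib.sha256(static_block.encode("utf-8")).hexdigest()
        if key in self._seen:
            self.hits += 1
            return True
        self._seen.add(key)
        self.misses += 1
        return False

def assemble(static_parts: List[str], retrieved_chunks: List[str]) -> Tuple[str, str]:
    static_block = "\n\n".join(static_parts)       # identical on every turn
    dynamic_block = "\n\n".join(retrieved_chunks)  # depends on the current question
    return static_block, dynamic_block

static_parts = [
    "Instructions: be concise and friendly.",
    "Guidance: office hours are 9-5 ET, Monday through Friday.",
    "Internal Doc: refund policy.",
]
cache = PrefixCache()
for chunks in (["Refunds take 14 days."], ["SSO is on Enterprise."], ["Exports are CSV."]):
    static_block, dynamic_block = assemble(static_parts, chunks)
    cache.lookup(static_block)
print(cache.hits, cache.misses)  # 2 1
```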

Ingestion pipeline

Every knowledge-base source (URLs, integrations, files) flows through the same pipeline:

Content extraction

For URLs, the crawler fetches each page, strips navigation and boilerplate, and extracts the text, image alt text, and video transcript references. For files, the content is parsed directly. For integrations, the relevant fields are extracted (e.g. HubSpot engagement body, Zoom transcript text).
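For HTML pages, the extraction step can be approximated with Python’s built-in `html.parser`: skip common boilerplate containers and collect visible text plus image alt text. A simplified sketch; the set of skipped tags is an assumption, and video transcript references are not handled.

```python
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    """Collect visible text and image alt text, skipping common boilerplate containers."""

    SKIP = {"nav", "header", "footer", "aside", "script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.text = []
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "img" and self.skip_depth == 0:
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.text.append(data.strip())

html = ("<html><body><nav>Home | Pricing</nav><h1>Refunds</h1>"
        "<p>Refunds take 14 days.</p><img src='r.png' alt='Refund form'>"
        "<footer>Copyright Acme</footer></body></html>")
p = ContentExtractor()
p.feed(html)
p.close()
print(p.text, p.alts)
```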

Chunking

Content is split into overlapping chunks at sentence boundaries. Halo uses a dual-chunking strategy:

Parent chunks (~2000 chars) for full context
Child chunks (~500 chars) for precise matching

During search, child chunks match the question, then parent chunks are returned for richer surrounding context.
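The dual-chunking strategy can be sketched by packing whole sentences into ~2000-character parents, then re-packing each parent into ~500-character children that remember their parent. Assumptions: a naive regex sentence splitter and a one-sentence overlap between neighboring chunks; Halo’s actual splitter and overlap size aren’t documented here.

```python
import re
from typing import Dict, List, Tuple

def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def pack(sentences: List[str], max_chars: int, overlap: int = 1) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars,
    carrying up to `overlap` trailing sentences into the next chunk."""
    chunks, current = [], []
    for s in sentences:
        if current and len(" ".join(current + [s])) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap > 0 else []
            # Drop carried sentences if they would push the next chunk over the limit.
            while current and len(" ".join(current + [s])) > max_chars:
                current.pop(0)
        current.append(s)
    if current:
        chunks.append(" ".join(current))
    return chunks

def dual_chunk(text: str, parent_chars: int = 2000,
               child_chars: int = 500) -> Tuple[List[str], List[Dict]]:
    """Return (parents, children); each child records the index of its parent."""
    parents = pack(split_sentences(text), parent_chars)
    children = []
    for parent_id, parent in enumerate(parents):
        for child in pack(split_sentences(parent), child_chars):
            children.append({"parent_id": parent_id, "text": child})
    return parents, children

text = " ".join(f"Sentence number {i} explains one detail." for i in range(200))
parents, children = dual_chunk(text)
print(len(parents), len(children))
```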

Embedding

Each chunk is converted to a vector embedding via Voyage AI. These embeddings capture semantic meaning, so the agent can find relevant content even when the customer’s question doesn’t match exact keywords.
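Semantic matching compares embedding vectors rather than words. Assuming cosine similarity as the relevance measure (a common choice, though the exact scoring isn’t stated here), this toy example with made-up 3-dimensional vectors shows a refund question scoring closer to a refund chunk than to an unrelated one, even though they share no keywords.

```python
import math
from typing import List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors; real embeddings have far more dimensions.
query = [0.9, 0.1, 0.0]         # "How do I get my money back?"
refund_chunk = [0.8, 0.2, 0.1]  # "Refunds are issued within 14 days."
sso_chunk = [0.0, 0.3, 0.95]    # "SSO is available on the Enterprise plan."
print(cosine_similarity(query, refund_chunk) > cosine_similarity(query, sso_chunk))  # True
```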

Storage and dedup

Chunks and embeddings are stored in your knowledge base. Subsequent ingests use content hashing to skip unchanged content, which keeps re-ingestion fast, cheap, and incremental.
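Content-hash dedup can be sketched with `hashlib`: hash each incoming item and re-process only ids whose hash differs from the stored one. Normalizing whitespace before hashing is an assumption made here so that cosmetic reflows don’t trigger re-embedding; the ids and function names are hypothetical.

```python
import hashlib
from typing import Dict, List

def content_hash(text: str) -> str:
    # Collapse whitespace so cosmetic reflows don't count as changes (an assumption).
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def plan_ingest(stored: Dict[str, str], incoming: Dict[str, str]) -> List[str]:
    """Return ids whose content is new or changed; `stored` maps id -> previous hash."""
    return [doc_id for doc_id, text in incoming.items()
            if stored.get(doc_id) != content_hash(text)]

stored = {"refunds": content_hash("Refunds take 14 days.")}
incoming = {"refunds": "Refunds  take 14 days.", "sso": "SSO is on Enterprise."}
print(plan_ingest(stored, incoming))  # ['sso']
```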

Knobs the AI uses internally

The AI has a knowledge_search tool with these knobs:

Source type filter. Search only help_center_article, only file, only hubspot, etc.
Top-K results. How many chunks to pull.
Score threshold. Minimum relevance for inclusion.

The AI sets these automatically based on the question. You don’t configure them per query; they’re tuned for general support quality.
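To make the three knobs concrete, here is a simulation over pre-scored child-chunk hits that also applies the child-to-parent step from Chunking. The `knowledge_search` name comes from this page, but the signature, the `Hit` type, and the defaults are hypothetical; the real tool’s interface isn’t documented here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Hit:
    # Hypothetical: one child-chunk match with a precomputed relevance score.
    source_type: str  # e.g. "help_center_article", "file", "hubspot"
    parent_id: int
    score: float

def knowledge_search(hits: List[Hit], parents: Dict[int, str], top_k: int = 5,
                     score_threshold: float = 0.0,
                     source_type: Optional[str] = None) -> List[str]:
    """Apply the source-type filter and score threshold, rank by score,
    and return up to top_k distinct parent chunks."""
    if top_k <= 0:
        return []
    eligible = [
        h for h in hits
        if h.score >= score_threshold
        and (source_type is None or h.source_type == source_type)
    ]
    eligible.sort(key=lambda h: h.score, reverse=True)
    results, seen = [], set()
    for h in eligible:
        if h.parent_id in seen:
            continue  # several child hits can share one parent
        seen.add(h.parent_id)
        results.append(parents[h.parent_id])
        if len(results) == top_k:
            break
    return results

parents = {0: "Refund policy parent chunk", 1: "SSO setup parent chunk", 2: "Globex deal notes"}
hits = [
    Hit("help_center_article", 0, 0.91),
    Hit("help_center_article", 0, 0.88),
    Hit("file", 1, 0.74),
    Hit("hubspot", 2, 0.40),
]
print(knowledge_search(hits, parents, top_k=2, score_threshold=0.5))
```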

Distillation

For some sources (HubSpot engagements, Zoom transcripts, Fathom transcripts, PandaDoc contracts), Halo can run an additional distillation pass. Distillation summarizes raw content into customer-scoped insights (“Acme Corp’s account manager is Sarah; they prefer monthly invoices; their renewal is in March”) that are easier for the AI to find and use. Distilled content is stored alongside the raw content and re-runs when source data changes.

Where to start

If you have an existing help center or docs site, the fastest way to bootstrap your AI is to crawl it:

Go to Knowledge -> URL Sources
Add your help center root (e.g. help.acme.com)
Wait a few minutes for the crawler to discover and ingest all pages
Test in Live Train. The AI should now answer questions backed by your real docs.

After that, layer in:

Guidance for short org-wide rules the AI must always follow.
Internal Docs for longer policies and playbooks the AI should always have access to.
File Uploads for content not on a URL (internal SOPs, runbooks, exported KB articles).
Integrations for product/customer-specific context that lives in HubSpot, Zoom, etc.
Instructions for the org-wide voice and persona.

Where to go next

AI Learning Priority

Which source wins when two disagree, with the full priority stack.

URL Sources

Crawl websites, help centers, and docs sites.

Files & Internal Docs

Upload files or write articles directly.

Web Crawler

Technical reference. User-Agent, allowlisting, request behavior.

Learning Rules

Tell the AI what topics to learn from synced data and what to skip.

Live Train

Real-time coaching during a live conversation.
