Website crawl

The fastest way to give your agent broad coverage of what your business does is to let it read your website.

Add a crawl

Open Knowledge

Sidebar → Knowledge → New source → Website crawl.

Paste the URL

Your site's root URL works best: https://acme.com. You can also start from a sub-section (for example, https://acme.com/docs) to scope the crawl.

Pick an agent

Choose which AI agent should have access to this source. You can reassign it later.

Start the crawl

Click Create. Keloa:

Discovers the sitemap (or crawls outward from the URL).
Fetches each page.
Extracts clean text (no menus, no footers).
Chunks it and indexes it.

Small sites finish in about a minute; large ones can take anywhere from a few minutes to an hour.
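
As a rough illustration of the extract and chunk steps above, the sketch below strips page chrome from HTML and splits the remaining text into fixed-size word chunks. It is not Keloa's implementation: the list of skipped tags, the `extract_text` and `chunk_words` names, and the 200-word chunk size are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Tags treated as page chrome and skipped (an assumption for this sketch).
SKIPPED_TAGS = {"nav", "header", "footer", "script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text while ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the page text with menus, footers, and scripts removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text, size=200):
    """Split text into chunks of at most `size` words (hypothetical chunk size)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
```

For a page with a nav, a heading, a paragraph, and a footer, `extract_text` keeps only the heading and paragraph text, and `chunk_words` then breaks that text into pieces small enough to index.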

What gets crawled

Pages on the same domain.
Content accessible to anonymous visitors (logged-in areas are not fetched).
HTML pages only. PDFs linked from pages are not crawled; upload those separately as file uploads.

What gets ignored

Navigation, footers, cookie banners.
JavaScript-rendered content that’s not prerendered.
Pages blocked by robots.txt or noindex.
Off-domain links.
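
To predict which of your links fall inside a crawl, the rules above can be approximated with a small URL filter. This is only a sketch under stated assumptions: `in_scope` is a hypothetical helper, "same domain" is treated as an exact hostname match, and a sub-section start URL is treated as a path prefix. Keloa's actual matching rules may differ.

```python
from urllib.parse import urlparse


def in_scope(start_url, candidate_url):
    """Return True if candidate_url looks crawlable from start_url.

    Assumptions: same domain means identical hostname, and a start URL
    such as https://acme.com/docs limits the crawl to that path prefix.
    """
    start = urlparse(start_url)
    candidate = urlparse(candidate_url)
    if candidate.scheme not in ("http", "https"):
        return False
    if candidate.hostname != start.hostname:
        return False  # off-domain link
    if candidate.path.lower().endswith(".pdf"):
        return False  # PDFs are uploaded separately as files
    prefix = start.path.rstrip("/")
    return candidate.path == prefix or candidate.path.startswith(prefix + "/")
```

With this filter, `https://acme.com/pricing` is in scope for a crawl started at `https://acme.com`, while a link to another site or to a PDF is not.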

Page limits

Plan	Pages per source	Pages per month (all sources)
Starter	25 (single URL only — no full-site crawl)	25
Growth	120	120
Business	1,200	1,200
Scale	Unlimited	Unlimited

If your site has more pages than your cap, the crawl stops at the cap — we prioritise top-level pages first.

Starter is single-URL only. Full-site crawl (sitemap discovery plus multi-page traversal) unlocks on Growth and above. On Starter, paste the exact page you want indexed; the crawler fetches that one page and stops.
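
If you want to estimate whether your site fits under a cap before crawling, you can count the URLs in your sitemap yourself. The sketch below parses a standard sitemap and keeps the shallowest pages first. Treating "top-level pages first" as "fewest path segments first" is an assumption, as are the helper names; the caps come from the table above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# XML namespace used by standard sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Pages-per-source caps from the table above; None means unlimited.
# Starter is omitted because it is single-URL only.
PAGE_CAPS = {"Growth": 120, "Business": 1200, "Scale": None}


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL listed in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def path_depth(url):
    """Count non-empty path segments: / is 0, /docs/api is 2."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def pages_within_cap(urls, cap):
    """Order URLs shallowest first and keep at most `cap` of them."""
    ordered = sorted(urls, key=path_depth)
    return ordered if cap is None else ordered[:cap]
```

Calling `pages_within_cap(sitemap_urls(xml), PAGE_CAPS["Growth"])` on your own sitemap gives a rough idea of which pages a capped crawl might keep.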

Watching progress

On the Knowledge list, the source status cycles queued → syncing → synced. Click the row to see individual pages, their status, and word count. Click View pages to inspect what was extracted.

Keeping content fresh

A crawl is a snapshot at that moment. Two ways to keep it current:

Recrawl manually: open the source → Recrawl. This re-fetches and re-indexes the source.
Schedule auto-recrawl: on each source, pick the most frequent cadence your plan allows.

Plan	Auto-recrawl cadence	Manual syncs / month
Starter	Off	1
Growth	Monthly	5
Business	Weekly (or monthly)	20
Scale	Daily (or weekly / monthly)	Unlimited

The “manual syncs” budget is counted per calendar month and shared across every source and every kind of sync (recrawls and Shopify resyncs). When a page is removed from your site, it stays in the index until the next recrawl.
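
Because the manual sync budget is shared, it can help to tally syncs across all sources before triggering another one. The following is a minimal sketch using the allowances from the table above; the `remaining_manual_syncs` helper and the sample sync log are hypothetical and are not part of any Keloa API.

```python
# Manual syncs per calendar month from the table above; None means unlimited.
MANUAL_SYNCS_PER_MONTH = {"Starter": 1, "Growth": 5, "Business": 20, "Scale": None}


def remaining_manual_syncs(plan, syncs_this_month):
    """Return how many manual syncs are left this month, or None if unlimited.

    syncs_this_month lists every manual sync on the account, regardless of
    source or kind, because the budget is shared.
    """
    allowance = MANUAL_SYNCS_PER_MONTH[plan]
    if allowance is None:
        return None
    return max(allowance - len(syncs_this_month), 0)


# Hypothetical log: (source, kind) pairs for the current calendar month.
syncs = [
    ("acme.com", "recrawl"),
    ("acme.com/docs", "recrawl"),
    ("acme-store", "shopify_resync"),
]
left = remaining_manual_syncs("Growth", syncs)
```

Here three syncs spread over three different sources still draw from the same Growth allowance of five, leaving two.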

Tips

Start with your site's root URL. Scope down only if the crawl pulls in irrelevant content (blog posts, legal boilerplate).
After the crawl, chat with the agent (Test) and look for bad answers. Those point at missing or stale pages — patch with a Q&A pair rather than rewriting the site.
If a specific page shouldn’t be indexed, block it with a Disallow rule in your robots.txt or mark it noindex.
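
Before relying on a robots.txt rule, you can check locally which URLs it blocks using Python's standard `urllib.robotparser`. The rules and URLs below are illustrative only, and the check uses the wildcard user agent because the crawler's own user-agent string is not stated here.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical paths).
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
Disallow: /legal/terms
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `is_allowed(ROBOTS_TXT, "https://acme.com/pricing")` is true, while any URL under `/internal/` is blocked.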

Troubleshooting

Issue	Try
Crawl status stuck on syncing for >1 hour	Open the source → Retry. If still stuck, contact support.
Pages have garbled text	The page is likely JS-rendered. Use a file upload of the content instead.
Crawl captured 0 pages	Check that the URL includes the scheme (`https://`) and that `robots.txt` does not block the crawler.
Answer uses outdated price	The page was crawled before the change. Recrawl the source.
