Add a crawl
Paste the URL
A top-level domain works best, for example https://acme.com. You can also start from a sub-section (for example, https://acme.com/docs) to limit the crawl to that part of the site.
What gets crawled
- Pages on the same domain.
- Content accessible to anonymous visitors (logged-in areas are not fetched).
- HTML pages only, not PDFs linked from them. Add PDFs separately as file uploads.
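To make the scoping rules concrete, here is a minimal Python sketch of how a crawler could decide whether a discovered link stays in scope. The function name `in_scope` and the exact matching rules (exact hostname match, path-prefix match against the start URL) are assumptions for illustration, not the crawler's documented behavior.

```python
from urllib.parse import urlsplit


def in_scope(start_url: str, candidate_url: str) -> bool:
    """Hypothetical scope check: same host as the start URL, and a path
    at or under the start URL's path (e.g. /docs scopes to /docs/...)."""
    start = urlsplit(start_url)
    cand = urlsplit(candidate_url)

    # Only web links can be fetched; mailto:, tel:, etc. are skipped.
    if cand.scheme not in ("http", "https"):
        return False

    # Same domain. urlsplit() lowercases .hostname, so the comparison is
    # case-insensitive. Assumption: exact host match, so a subdomain such
    # as blog.acme.com counts as off-domain.
    if (cand.hostname or "") != (start.hostname or ""):
        return False

    # Sub-section scoping: "/docs" matches "/docs" and "/docs/..."
    # but not "/docs-old".
    prefix = start.path.rstrip("/")
    path = cand.path or "/"
    return prefix == "" or path == prefix or path.startswith(prefix + "/")
```

For example, starting from https://acme.com/docs, a link to https://acme.com/docs/setup is in scope, while https://acme.com/blog/post is not.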
What gets ignored
- Navigation, footers, cookie banners.
- JavaScript-rendered content that’s not prerendered.
- Pages blocked by `robots.txt` or `noindex`.
- Off-domain links.
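To check ahead of time whether a page will be skipped, you can test its URL against your `robots.txt` rules and look for a robots meta tag in its HTML. The sketch below uses only Python's standard library (`urllib.robotparser` and `html.parser`). The helper names `allowed_by_robots` and `has_noindex` are hypothetical, and it assumes the crawler follows the generic `User-agent: *` group, since its actual user-agent string isn't documented here.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Sets .noindex when the page has <meta name="robots" content="...noindex...">."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    parser = _RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.noindex
```

With a `robots.txt` containing `Disallow: /private/` under `User-agent: *`, `allowed_by_robots` returns False for https://acme.com/private/plan and True for https://acme.com/pricing.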
Page limits
| Plan | Pages per source | Pages per month (all sources) |
|---|---|---|
| Starter | 1 (single URL only; no full-site crawl) | 25 |
| Growth | 120 | 120 |
| Business | 1,200 | 1,200 |
| Scale | Unlimited | Unlimited |
Starter is single-URL only. Full-site crawl (sitemap discovery plus multi-page traversal) is available on Growth and higher plans. On Starter, paste the exact page you want indexed; the crawler fetches that one page and stops.
Watching progress
In the Knowledge list, a source's status moves through queued → syncing → synced. Click the row to see the individual pages, their status, and word count. Click View pages to inspect what was extracted.
Keeping content fresh
A crawl is a snapshot of the site at the moment it ran. There are two ways to keep it current:
- Recrawl manually: open the source and click Recrawl. This re-fetches and re-indexes its pages.
- Schedule auto-recrawl: on each source, pick the most frequent cadence your plan allows.
| Plan | Auto-recrawl cadence | Manual syncs / month |
|---|---|---|
| Starter | Off | 1 |
| Growth | Monthly | 5 |
| Business | Weekly (or monthly) | 20 |
| Scale | Daily (or weekly / monthly) | Unlimited |
Tips
- Start with your top-level domain. Scope down only if the crawl pulls in irrelevant content (blog posts, legal boilerplate).
- After the crawl, chat with the agent in Test and look for bad answers. They point to missing or stale pages; patch them with a Q&A pair rather than rewriting the site.
- If a specific page shouldn't be indexed, add a `Disallow` rule for it to your `robots.txt`.
Troubleshooting
| Issue | Try |
|---|---|
| Crawl status stuck on syncing for >1 hour | Open the source → Retry. If still stuck, contact support. |
| Pages have garbled text | The page is likely JS-rendered. Upload the content as a file instead. |
| Crawl captured 0 pages | Make sure the URL includes the scheme (https://) and that `robots.txt` isn't blocking the crawl. |
| Answer uses an outdated price | The page was crawled before the price changed. Recrawl the source. |
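If you suspect a page is JavaScript-rendered, one rough check is to look at the raw HTML a client without JavaScript receives (for example, your browser's View Source) and count the visible words. Below is a minimal Python sketch of that heuristic. `looks_js_rendered` and the 50-word threshold are arbitrary assumptions for illustration, not the crawler's actual test.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects whitespace-separated words outside <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script/style bodies arrive here too; ignore them.
        if not self._skip_depth:
            self.words.extend(data.split())


def looks_js_rendered(raw_html: str, min_words: int = 50) -> bool:
    """Heuristic: very little visible text in the raw HTML suggests the
    content is injected by JavaScript after the page loads."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return len(parser.words) < min_words
```

An empty `<div id="root"></div>` shell plus a script bundle is flagged, while a server-rendered article with real paragraphs is not. If a page is flagged, uploading its content as a file is the workaround the table above suggests.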