
Documentation Index

Fetch the complete documentation index at: https://docs.insitechat.ai/llms.txt

Use this file to discover all available pages before exploring further.

InsiteChat is built on retrieval-augmented generation (RAG) with a hybrid retrieval layer. The pages below explain — in plain English — how each piece works and why we made the architectural choices we did. Read them in order if you’re new to RAG; skim individually if you’re evaluating specific aspects.

The five core concepts

What is RAG?

Retrieval-Augmented Generation: how InsiteChat grounds AI answers in your content instead of relying on the LLM’s frozen pre-training. The technique that keeps chatbot answers anchored to your actual pages rather than the model’s guesses.

Embeddings

How text becomes high-dimensional vectors that capture meaning. The foundation of semantic search: “cancel my subscription” matches “close my account” even though the two phrases share zero keywords.
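
To make that concrete, here is a minimal sketch using the open-source sentence-transformers library (InsiteChat’s own embedding model is not specified here): two phrases with no words in common still land close together in vector space.

```python
# A sketch of semantic similarity via embeddings.
# Assumes the open-source sentence-transformers package; InsiteChat's
# actual embedding model may differ.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # maps text to 384-dim vectors

a = model.encode("cancel my subscription")
b = model.encode("close my account")
c = model.encode("what time does the store open")

print(float(util.cos_sim(a, b)))  # high: same intent, zero shared keywords
print(float(util.cos_sim(a, c)))  # low: unrelated intent
```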

Hybrid Search

Why InsiteChat combines vector (semantic) and BM25 (keyword) search with Reciprocal Rank Fusion. Catches edge cases — error codes, currency symbols, brand names — that vector-only systems miss.
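
Reciprocal Rank Fusion itself is only a few lines. A sketch, assuming each retriever returns chunk IDs ordered best-first; the constant k = 60 comes from the original RRF paper, and InsiteChat’s tuning may differ.

```python
# A sketch of Reciprocal Rank Fusion over a vector ranking and a BM25 ranking.
# Each ranking is a list of chunk IDs, best match first.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk's fused score is the sum of 1 / (k + rank) across rankers;
            # k dampens the advantage of a single #1 placement.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["chunk-12", "chunk-7", "chunk-3"]   # semantic ranking
bm25_hits   = ["chunk-7", "chunk-44", "chunk-12"]  # keyword ranking
print(rrf([vector_hits, bm25_hits])[:3])           # fused top-K
```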

System Prompts

The standing instruction that defines your chatbot’s persona, tone, and refusal behavior. One small block of text controls every response. InsiteChat ships 6 templates plus persona shaping.
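
For illustration only (this is not one of the shipped templates), a standing instruction has roughly this shape and is prepended to every request the chatbot makes.

```python
# An illustrative system prompt; the business name and rules are hypothetical.
# Persona, tone, and refusal behavior live in one standing instruction.
SYSTEM_PROMPT = """\
You are the support assistant for Acme Co. (a hypothetical business).
Be concise and friendly. Answer only from the provided context; if the
context does not contain the answer, say so and offer to connect the
visitor with a human. Do not give legal or medical advice.
"""

# Sent as the first message of every request, ahead of retrieved context
# and conversation history.
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
```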

Vector Databases

Where embeddings live. InsiteChat runs on pgvector — PostgreSQL with the vector extension — for single-source-of-truth storage, ACID guarantees, and sub-15ms nearest-neighbor search at scale.
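
As a sketch of what a lookup looks like, here is a top-5 nearest-neighbor query against pgvector via psycopg2. The chunks table and its columns are illustrative, not InsiteChat’s actual schema.

```python
# A sketch of nearest-neighbor search in pgvector via plain SQL.
import psycopg2

# In practice this is the embedded visitor question; 3 dims shown for brevity.
query_embedding = [0.012, -0.034, 0.078]
vec_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

conn = psycopg2.connect("dbname=insitechat")  # connection string is illustrative
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT id, content
        FROM chunks
        ORDER BY embedding <=> %s::vector  -- pgvector's cosine-distance operator
        LIMIT 5
        """,
        (vec_literal,),
    )
    for chunk_id, content in cur.fetchall():
        print(chunk_id, content[:80])
```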

How the pieces fit together

A single query flowing through InsiteChat touches all five concepts:
  1. Your content was previously chunked, embedded, and stored in the vector database at training time.
  2. A visitor types a question; it is embedded with the same model used for your content.
  3. Hybrid search runs vector + BM25 retrieval and merges the rankings with RRF, returning the top-K most relevant chunks.
  4. The retrieved chunks plus your system prompt plus conversation history are sent to the LLM.
  5. The LLM generates an answer grounded in the retrieved context — this is RAG.
The whole loop runs in 1-3 seconds and produces an answer that cites the source pages it came from.
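
In code, the per-query part of that loop looks roughly like the sketch below. Every helper passed in is a hypothetical stand-in, not InsiteChat’s internal API, and indexing (step 1) is assumed to have already happened.

```python
# A sketch of the per-query RAG loop (steps 2-5 above). The helpers
# (embed, vector_search, bm25_search, rrf, call_llm) are stand-ins.
def answer_question(question: str, history: list[dict], *, embed, vector_search,
                    bm25_search, rrf, call_llm, system_prompt: str) -> str:
    # 2. Embed the visitor's question with the same model used at indexing time.
    query_vec = embed(question)

    # 3. Hybrid retrieval: semantic + keyword rankings merged with RRF, keep top-K.
    chunks = rrf([vector_search(query_vec), bm25_search(question)])[:5]

    # 4. Retrieved chunks + system prompt + conversation history go to the LLM.
    context = "\n\n".join(chunks)
    messages = (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}]
    )

    # 5. The LLM generates an answer grounded in the retrieved context.
    return call_llm(messages)
```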

Why these architectural choices

  • RAG over fine-tuning: lets you update knowledge by editing content (minutes, free) instead of retraining a model (hours, expensive). See What is RAG? § “RAG vs fine-tuning”.
  • Hybrid search over vector-only: catches edge cases (error codes, GST numbers, ₹ symbols, technical terms) that pure semantic search misses. See Hybrid Search.
  • pgvector over a separate vector DB: single source of truth with chatbot config, conversations, and leads. No two-store sync problems. See Vector Databases.
  • Custom Q&A pairs override retrieval: precise control over high-stakes answers (pricing, refunds, hours) without losing the flexibility of RAG; a sketch of the pattern follows this list. See Custom Q&A.
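
The Q&A override reduces to a simple precedence rule. A sketch of the general pattern, with an illustrative exact-match rule rather than InsiteChat’s actual matching logic:

```python
# A sketch of custom Q&A pairs taking precedence over retrieval.
# The dictionary and the matching rule are illustrative only.
custom_answers = {
    "do you offer refunds": "<your exact, hand-written refund answer>",
}

def respond(question: str, rag_answer) -> str:
    key = question.strip().lower().rstrip("?")
    if key in custom_answers:       # high-stakes question: return your text verbatim
        return custom_answers[key]
    return rag_answer(question)     # everything else falls back to normal RAG
```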