03 Active Research

RAG and Knowledge Systems

Retrieval architectures for enterprise knowledge at scale.

Retrieval, Vector Search, Knowledge Graphs, Citation

Overview

We study retrieval-augmented generation (RAG) for enterprise knowledge management. Our research covers retrieval architectures that combine dense vector search with structured data access, citation tracking, and evidence verification for high-stakes applications.

Research Directions

Hybrid retrieval

Combining BM25 keyword search with dense vector retrieval for precision on both lexical and semantic queries.

Multi-hop retrieval

Chain-of-retrieval for queries requiring evidence from multiple disjoint documents.

Citation fidelity

Grounding model outputs to source passages with verifiable references.

Incremental indexing

Online index updates for high-velocity corpora without full reindexing.

Access-controlled knowledge graphs

Per-tenant graph partitions with row-level security at retrieval time.

The retrieval precision problem

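Naive top-K vector search retrieves semantically related documents but misses exact matches on names, codes, and identifiers. A pure BM25 retriever has the opposite weakness: it misses paraphrases and synonym variants. Hybrid retrieval fuses both signals with Reciprocal Rank Fusion (RRF), which combines the two ranked lists by rank position rather than raw score, so BM25 and embedding scores need no calibration against each other. This yields MRR@10 improvements of 18-31% over either retriever alone on enterprise QA benchmarks.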
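To make the fusion step concrete, here is a minimal sketch of RRF over ranked lists of document IDs. The function name, the example IDs, and the constant k = 60 (a commonly used default) are assumptions for illustration, not details of our production system.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]],
    k: int = 60,  # assumed smoothing constant; a commonly used default
) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a BM25 retriever and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (doc_b, doc_a) rise above documents that appear in only one, which is the behaviour that helps on queries mixing identifiers with paraphrased intent.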

Multi-hop reasoning over document graphs

Many enterprise queries require evidence from multiple documents: a contract clause references a policy, which references a regulatory standard. Single-pass retrieval misses these chains. We model documents as nodes in a citation graph and run iterative retrieval that expands the candidate set by following edges until the retrieved context is sufficient to answer the query.
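The edge-following step can be sketched as a bounded breadth-first expansion. This sketch assumes the citation graph is available as a plain adjacency dictionary from a document ID to the IDs it references. The name `expand_candidates` and the example IDs are hypothetical, and the sufficiency check is left out here so the expansion stops only at a hop limit.

```python
def expand_candidates(
    seeds: list[str],
    citations: dict[str, list[str]],
    max_hops: int = 2,
) -> list[str]:
    """Return the seeds plus documents reachable within max_hops citation edges."""
    ordered: list[str] = list(dict.fromkeys(seeds))  # dedupe, keep order
    seen: set[str] = set(ordered)
    frontier = ordered[:]
    for _ in range(max_hops):
        next_frontier: list[str] = []
        for doc_id in frontier:
            for cited in citations.get(doc_id, []):
                if cited not in seen:
                    seen.add(cited)
                    ordered.append(cited)
                    next_frontier.append(cited)
        if not next_frontier:
            break  # no unseen documents left to follow
        frontier = next_frontier
    return ordered


# Hypothetical chain: contract clause -> policy -> regulatory standard.
citations = {
    "contract_clause": ["policy"],
    "policy": ["regulatory_standard"],
    "regulatory_standard": [],
}
chain = expand_candidates(["contract_clause"], citations)
print(chain)  # ['contract_clause', 'policy', 'regulatory_standard']
```

A single-pass retriever that only returns the contract clause would stop at the first element of this list; the expansion recovers the policy and the standard it depends on.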

Sufficiency check

After each retrieval hop, a small entailment model checks whether the current context is sufficient to answer the query. This avoids over-retrieval (inflated context cost) and under-retrieval (incomplete answers). Typical queries settle within 2-3 hops.

python

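def multi_hop_retrieve(
    query: str,
    graph: DocumentGraph,
    max_hops: int = 3,
) -> list[Document]:
    """Retrieve iteratively until the context suffices or max_hops is reached."""
    context: list[Document] = []
    for _ in range(max_hops):
        # Exclude documents already in the context so each hop adds new evidence.
        candidates = hybrid_search(query, graph, exclude=context)
        if not candidates:
            break  # nothing new to add, so further hops cannot help
        # Keep only the top 3 candidates per hop to bound context growth.
        context.extend(candidates[:3])
        # Stop early once the entailment model judges the context sufficient.
        if entailment_model.is_sufficient(query, context):
            break
        # Otherwise fold the retrieved evidence into the query for the next hop.
        query = rewrite_with_context(query, context)
    return context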

Citation grounding at generation time

Hallucination in RAG systems typically originates not from retrieval failure but from the generation model synthesizing beyond retrieved evidence. We constrain generation with a citation requirement: every factual claim must be followed by a bracketed source reference. A post-generation verifier checks each claim against its cited passage, flagging unsupported statements for human review.
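A post-generation verifier of this kind can be sketched as follows. The sketch assumes sentence-level claims, numeric bracketed references such as [1] placed before the closing punctuation, and a pluggable `is_supported` check. The default `token_overlap_support` is a deliberately crude stand-in for an entailment model, and all names here are hypothetical.

```python
import re
from typing import Callable

CITATION = re.compile(r"\[(\d+)\]")


def token_overlap_support(claim: str, passage: str, threshold: float = 0.6) -> bool:
    """Toy stand-in for an entailment model: share of claim tokens found in the passage."""
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    passage_tokens = set(re.findall(r"\w+", passage.lower()))
    if not claim_tokens:
        return False
    return len(claim_tokens & passage_tokens) / len(claim_tokens) >= threshold


def flag_unsupported(
    answer: str,
    passages: dict[str, str],
    is_supported: Callable[[str, str], bool] = token_overlap_support,
) -> list[str]:
    """Return sentences with no citation, or with no cited passage that supports them."""
    flagged: list[str] = []
    # Naive sentence split: break on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited_ids = CITATION.findall(sentence)
        claim = CITATION.sub("", sentence).strip()
        # A claim passes if at least one cited passage exists and supports it.
        supported = any(
            doc_id in passages and is_supported(claim, passages[doc_id])
            for doc_id in cited_ids
        )
        if not supported:
            flagged.append(sentence)
    return flagged


# Hypothetical retrieved passages and a generated answer.
passages = {
    "1": "The notice period for termination is 30 days.",
    "2": "Invoices are payable within 45 days of receipt.",
}
answer = (
    "The notice period is 30 days [1]. "
    "Late invoices incur a 5% penalty [2]. "
    "The contract renews automatically."
)
for sentence in flag_unsupported(answer, passages):
    print("REVIEW:", sentence)
```

Here the second sentence is flagged because its cited passage does not support it, and the third because it carries no citation at all. In practice the overlap heuristic would be replaced by a trained entailment or NLI model.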

Incremental vector index updates

Enterprise corpora change continuously, and full reindexing of a 50M-document corpus takes hours. We maintain a small delta index for recent documents and merge it into the main index on a rolling schedule. At query time, we search both indexes and deduplicate the results. This reduces index freshness lag from hours to under 60 seconds at typical update volumes.
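The main-plus-delta pattern can be illustrated with a toy in-memory store. This is a sketch under simplifying assumptions: brute-force cosine search stands in for an approximate nearest-neighbour index, and the class and method names are hypothetical. One design choice is worth noting: when a document ID exists in both indexes, the delta copy shadows the stale main copy, so deduplication never returns an outdated version.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DeltaIndexedStore:
    """Toy brute-force store: a large main index plus a small delta index."""

    def __init__(self) -> None:
        self.main: dict[str, list[float]] = {}
        self.delta: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        # New and updated documents land in the delta index first.
        self.delta[doc_id] = vector

    @staticmethod
    def _top_k(index, query, top_k, skip=()):
        hits = [
            (cosine(query, vec), doc_id)
            for doc_id, vec in index.items()
            if doc_id not in skip
        ]
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        return hits[:top_k]

    def search(self, query: list[float], top_k: int = 3) -> list[str]:
        # IDs present in the delta shadow their stale copies in the main index.
        main_hits = self._top_k(self.main, query, top_k, skip=self.delta.keys())
        delta_hits = self._top_k(self.delta, query, top_k)
        combined = sorted(main_hits + delta_hits, key=lambda hit: (-hit[0], hit[1]))
        return [doc_id for _, doc_id in combined[:top_k]]

    def merge(self) -> None:
        # Rolling merge: fold the delta into the main index, then clear it.
        self.main.update(self.delta)
        self.delta.clear()


store = DeltaIndexedStore()
store.main = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
store.upsert("b", [1.0, 0.1])  # updated version of an existing document
store.upsert("c", [0.9, 0.5])  # brand-new document

results = store.search([1.0, 0.0], top_k=2)
print(results)  # ['a', 'b']
```

The updated document b is searchable as soon as it is upserted, before any merge has run; calling `store.merge()` later leaves query results unchanged while emptying the delta.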
