Retrieval architectures for enterprise knowledge at scale.
Overview
We study retrieval-augmented generation for enterprise knowledge management. Our research covers retrieval architectures that combine dense vector search with structured data access, citation tracking, and evidence verification for high-stakes applications.
Research Directions
Hybrid retrieval
Combining BM25 keyword search with dense vector retrieval for precision on both lexical and semantic queries.
Multi-hop retrieval
Chain-of-retrieval for queries requiring evidence from multiple disjoint documents.
Citation fidelity
Grounding model outputs to source passages with verifiable references.
Incremental indexing
Online index updates for high-velocity corpora without full reindexing.
Access-controlled knowledge graphs
Per-tenant graph partitions with row-level security at retrieval time.
Naive top-K vector search retrieves semantically related documents but misses exact matches on names, codes, and identifiers. A pure BM25 retriever has the opposite weakness: it misses paraphrases and synonyms. Hybrid retrieval fuses both rankings with Reciprocal Rank Fusion (RRF), which improves MRR@10 by 18-31% over either retriever alone on enterprise QA benchmarks.
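The fusion step can be illustrated with a minimal sketch. It assumes each retriever returns a ranked list of document IDs and uses the standard RRF score, the sum of 1 / (k + rank) over all lists. The function name, the document IDs, and the constant k = 60 (a common default) are assumptions for illustration, not values taken from our system.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is a common default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from a keyword retriever and a dense retriever.
bm25_ranking = ["doc_sku_4471", "doc_pricing", "doc_returns"]
dense_ranking = ["doc_pricing", "doc_warranty", "doc_sku_4471"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and vector similarities, which live on different scales. Documents ranked well by both retrievers rise to the top of the fused list.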
Many enterprise queries require evidence from multiple documents: a contract clause references a policy, which references a regulatory standard. Single-pass retrieval misses these chains. We model documents as nodes in a citation graph and run iterative retrieval that expands the candidate set by following edges until the retrieved context is sufficient to answer the query.
Sufficiency check
After each retrieval hop, a small entailment model checks whether the current context is sufficient to answer the query. This avoids over-retrieval (inflated context cost) and under-retrieval (incomplete answers). Typical queries settle within 2-3 hops.
def multi_hop_retrieve(
    query: str,
    graph: DocumentGraph,
    max_hops: int = 3,
) -> list[Document]:
    # Documents accumulated across hops
    context: list[Document] = []
    for _ in range(max_hops):
        # Search the graph, skipping documents already in the context
        candidates = hybrid_search(query, graph, exclude=context)
        context.extend(candidates[:3])
        # Stop as soon as the context is sufficient to answer the query
        if entailment_model.is_sufficient(query, context):
            break
        # Otherwise reformulate the query using the evidence gathered so far
        query = rewrite_with_context(query, context)
    return context

Hallucination in RAG systems typically originates not from retrieval failure but from the generation model synthesizing beyond retrieved evidence. We constrain generation with a citation requirement: every factual claim must be followed by a bracketed source reference. A post-generation verifier checks each claim against its cited passage, flagging unsupported statements for human review.
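A minimal sketch of the post-generation check follows. It assumes citations look like [S1], that sentences end with terminal punctuation, and that passages are available in a dictionary keyed by source ID. The token-overlap score is only a placeholder for a real claim-verification model, and all names and the 0.5 threshold are hypothetical.

```python
import re

CITATION = re.compile(r"\[(\w+)\]")


def _tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def verify_citations(answer, passages, min_overlap=0.5):
    """Return (sentence, reason) pairs that should go to human review.

    Token overlap is a crude stand-in for an entailment check (assumption).
    """
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        source_ids = CITATION.findall(sentence)
        claim = _tokens(CITATION.sub("", sentence))
        if not source_ids:
            flagged.append((sentence, "missing citation"))
        elif any(sid not in passages for sid in source_ids):
            flagged.append((sentence, "unknown source"))
        else:
            # Share of claim tokens found in the best-matching cited passage.
            best = max(
                len(claim & _tokens(passages[sid])) / max(len(claim), 1)
                for sid in source_ids
            )
            if best < min_overlap:
                flagged.append((sentence, "unsupported"))
    return flagged


# Hypothetical passages and model answer.
passages = {
    "S1": "Either party may terminate with a notice period of 30 days.",
    "S2": "Invoices are payable within 45 days of receipt.",
}
answer = (
    "The notice period is 30 days [S1]. "
    "Late invoices incur a 5% penalty [S2]. "
    "The contract renews annually."
)
flagged = verify_citations(answer, passages)
for sentence, reason in flagged:
    print(f"{reason}: {sentence}")
```

In this example the first sentence passes, the second is flagged because its cited passage says nothing about penalties, and the third is flagged because it carries no citation at all.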
Enterprise corpora change continuously, and full reindexing of a 50M-document corpus takes hours. We maintain a small delta index for recent documents and merge it into the main index on a rolling schedule. At query time, the system searches both indexes and deduplicates the results. This reduces index freshness lag from hours to under 60 seconds on typical update volumes.
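The query-time merge can be sketched as follows. This is an illustration under stated assumptions: both indexes return hits with comparable scores, each hit carries a version number, and deduplication keeps the newest version of a document. The Hit type, the function name, and the version field are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float
    version: int  # higher means more recently indexed (assumption)


def merge_index_hits(main_hits, delta_hits, top_k=10):
    """Merge hits from the main and delta indexes, one entry per document."""
    newest = {}
    for hit in [*main_hits, *delta_hits]:
        current = newest.get(hit.doc_id)
        # Keep the most recent version so updated documents replace stale ones.
        if current is None or hit.version > current.version:
            newest[hit.doc_id] = hit
    return sorted(newest.values(), key=lambda h: h.score, reverse=True)[:top_k]


# Hypothetical results: "policy-7" was updated and re-indexed in the delta index.
main_hits = [Hit("policy-7", 0.82, 1), Hit("contract-3", 0.75, 1)]
delta_hits = [Hit("policy-7", 0.64, 2), Hit("memo-9", 0.79, 1)]

merged = merge_index_hits(main_hits, delta_hits)
print([(h.doc_id, h.version) for h in merged])
```

Preferring the newer version rather than the higher score matters here: the stale copy of an updated document may score better for a query, but returning it would defeat the purpose of the delta index.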