Query Expansion & Rewriting for RAG: Improving Retrieval Accuracy

Why Your RAG System Misses Relevant Documents (And How Query Expansion Fixes It)

A user asks your RAG system: “How do we handle authentication?”

Your system searches the vector database. It finds documents about “OAuth” and “JWT tokens” but misses the section on “single sign-on” and “identity management” that contains exactly what the user needs.

The problem isn’t your documents. It’s the query.

This is the query matching problem: the user’s question doesn’t use the same language as the documents. Query expansion and rewriting solve this by transforming user queries to better match document language and concepts.

This article walks through expansion and rewriting techniques that can improve RAG retrieval accuracy by 20-40%.

The Query Matching Problem

RAG systems match queries to documents using semantic similarity. But semantic similarity has limits.

The problem in action:

User query: “How do we handle authentication?”
Document contains: “Single sign-on integration”, “OAuth 2.0 implementation”, “JWT token validation”

Semantic similarity might not connect “handle authentication” to “single sign-on” if the embedding space doesn’t strongly associate these terms.

Why this happens:

  1. Vocabulary mismatch: User says “authentication”, documents say “identity management”
  2. Abstraction level mismatch: User asks about “security”, documents discuss “cryptographic protocols”
  3. Context loss: User’s short query lacks context that appears in documents
  4. Implicit assumptions: User assumes knowledge the system doesn’t have
  5. Domain terminology: Industry-specific terms might not be in training data

Impact:

  • Relevant documents don’t get retrieved
  • User gets incomplete or irrelevant answers
  • System appears to lack knowledge it actually has

Solution 1: Query Expansion

Query expansion generates multiple reformulations of the user’s query to increase matching chances.

How it works:

Original query: "How do we handle authentication?"

Expanded queries:
1. "How do we handle authentication?"
2. "How do we implement identity management?"
3. "How do we manage user authentication and authorization?"
4. "What are our authentication mechanisms?"
5. "How do we secure user access?"
6. "How do we implement single sign-on?"
7. "How do we manage OAuth and JWT tokens?"

Search using all queries, combine results, re-rank

Expansion strategies:

1. Synonym expansion: Replace terms with synonyms and related concepts.

"authentication" → "authentication", "identity verification", "user verification", "login"
"handle" → "handle", "implement", "manage", "configure"
"we" → "our", "the system", "the platform"

2. Semantic expansion: Use embeddings to find semantically similar terms.

Query: "How do we handle authentication?"
Similar terms: "authorization", "identity", "access control", "credentials", "verification"
Expanded: "How do we handle authentication, authorization, and access control?"

3. Domain-specific expansion: Add domain-specific terminology and abbreviations.

Query: "How do we handle authentication?"
Domain knowledge: OAuth, SAML, JWT, SSO, LDAP
Expanded: "How do we handle authentication? What about OAuth, SAML, JWT, or SSO?"

4. Hyponym/hypernym expansion: Add more specific and more general terms (see the WordNet-based sketch after this list).

"authentication" (specific) → "security" (general)
"OAuth" (specific) → "authentication protocol" (general)

5. Multi-hop expansion: Generate intermediate queries to find related documents.

Original: "How do we handle authentication?"
Intermediate: "What authentication mechanisms do we use?"
Related: "What is OAuth?", "What is JWT?", "What is SAML?"

Solution 2: Query Rewriting

Query rewriting reformulates the user’s query to better match document structure and language.

How it works:

The system analyzes the query and rewrites it for better retrieval.

Original query: "How do we handle authentication?"

Rewritten queries:
1. "authentication implementation" (remove question words)
2. "authentication mechanisms systems" (add related concepts)
3. "OAuth JWT SAML authentication" (add specific technologies)
4. "user authentication access control" (add related concepts)
5. "authentication configuration best practices" (add context)

Rewriting techniques (a rule-based sketch follows this list):

1. Question-to-statement conversion:

Question: "How do we handle authentication?"
Statement: "Authentication handling implementation"

Why? Statements often match document structure better than questions.

2. Generalization:

Specific: "How do we implement JWT token validation?"
General: "Token validation authentication"

Why? Broader terms match more documents.

3. Specification:

General: "How do we do security?"
Specific: "How do we implement authentication, authorization, and encryption?"

Why? Specific terms match relevant sections better.

4. Contextual rewriting:

Original: "How do we handle authentication?"
With context (user is looking at API docs): "API authentication OAuth JWT"

Why? Context helps narrow to relevant documents.

5. Structure-aware rewriting:

Original: "How do we handle authentication?"
Rewritten for markdown docs: "# Authentication", "## Implementation", "## Configuration"

Why? Matches document heading structure.
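
A minimal rule-based sketch of techniques 1 and 4. The question-word list and context mapping below are illustrative only and should be tuned for your own queries and documentation:

import re

# Illustrative stop phrases and context hints; adjust for your domain
QUESTION_WORDS = r"\b(how|what|why|when|where|do|does|did|can|should|we|i|you|our|my)\b"
CONTEXT_TERMS = {
    "api_docs": ["API", "OAuth", "JWT"],
    "admin_guide": ["configuration", "settings"],
}

def question_to_statement(query):
    """Technique 1: strip question words and punctuation, keep content-bearing terms."""
    statement = re.sub(QUESTION_WORDS, " ", query, flags=re.IGNORECASE)
    statement = re.sub(r"[?.!]", "", statement)
    return re.sub(r"\s+", " ", statement).strip()

def contextual_rewrite(query, context_key):
    """Technique 4: append context-specific terms to the stripped query."""
    base = question_to_statement(query)
    extra = " ".join(CONTEXT_TERMS.get(context_key, []))
    return f"{base} {extra}".strip()

# Usage
print(question_to_statement("How do we handle authentication?"))
# -> "handle authentication"
print(contextual_rewrite("How do we handle authentication?", "api_docs"))
# -> "handle authentication API OAuth JWT"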

Solution 3: LLM-Based Query Rewriting

Use an LLM to intelligently rewrite queries.

Prompt-based approach:

System prompt:
"You are a query optimization expert. Rewrite the user's query to:
1. Use domain-specific terminology
2. Add related concepts
3. Remove ambiguity
4. Match technical documentation language

Generate 5 variations that would match relevant documents."

User query: "How do we handle authentication?"

LLM output:
1. "authentication implementation"
2. "user authentication mechanisms"
3. "OAuth SAML JWT authentication"
4. "authentication configuration best practices"
5. "identity management access control"

Advantages:

  • Understands domain context
  • Generates natural variations
  • Can use few-shot examples
  • Adapts to your documentation

Disadvantages:

  • Requires LLM API calls (cost and latency)
  • Quality depends on prompt engineering
  • May hallucinate irrelevant terms

Solution 4: Hybrid RAG with Query Expansion

Combine query expansion with other RAG improvements.

Full pipeline:

User query: "How do we handle authentication?"
    ↓
[Query Expansion]
Generate 5-7 variations
    ↓
[Parallel Retrieval]
Search with each variation
    ↓
[Result Merging]
Combine results, remove duplicates
    ↓
[Re-ranking]
Score results by relevance
    ↓
[Deduplication]
Remove similar results
    ↓
[Context Assembly]
Select top results for LLM
    ↓
[LLM Response]
Generate answer with context
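
The retrieval stages of this pipeline are implemented in the “Combining with Re-ranking” section below. Here is a minimal sketch of the final two stages (context assembly and LLM response), assuming an Anthropic client as in Approach 3 and a search callable that returns ranked documents with id and content fields:

import anthropic

client = anthropic.Anthropic()

def answer_with_context(query, search, max_docs=5):
    """Assemble the top-ranked documents into a context block and generate an answer."""
    ranked = search(query)[:max_docs]
    context = "\n\n".join(f"[{doc['id']}] {doc['content']}" for doc in ranked)

    prompt = (
        "Answer the question using only the context below. "
        "Cite document ids in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

# Usage (expanded_rag_search is defined in the re-ranking section below)
# answer = answer_with_context("How do we handle authentication?",
#                              lambda q: expanded_rag_search(q, vector_db, llm))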

Implementation Approaches

Approach 1: Simple Synonym Expansion (Easy)

# Simple synonym-based expansion
synonyms = {
    "authentication": ["identity verification", "user verification", "login", "authorization"],
    "handle": ["implement", "manage", "configure"],
    "we": ["our system", "the platform", "the application"]
}

def expand_query(query):
    expanded = [query]  # Original
    
    for term, synonyms_list in synonyms.items():
        if term in query:
            for syn in synonyms_list:
                expanded.append(query.replace(term, syn))
    
    return expanded[:5]  # Return top 5

# Usage
queries = expand_query("How do we handle authentication?")
# Results: ["How do we handle authentication?", "How do we implement authentication?", ...]

Approach 2: Semantic Expansion (Moderate)

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

def semantic_expand(query, documents, num_expansions=5):
    # Embed query and documents; normalized vectors make dot product = cosine similarity
    query_embedding = model.encode(query, normalize_embeddings=True)
    doc_embeddings = model.encode(documents, normalize_embeddings=True)
    
    # Find the most semantically similar documents
    similarities = doc_embeddings @ query_embedding
    top_indices = np.argsort(similarities)[-num_expansions:]
    similar_docs = [documents[i] for i in top_indices]
    
    # Extract key terms from similar documents
    # (extract_key_terms is a placeholder for, e.g., TF-IDF or keyword extraction)
    expanded_terms = extract_key_terms(similar_docs)
    
    # Append the extracted terms to the original query
    expanded = f"{query} {' '.join(expanded_terms)}"
    
    return expanded

# Usage
expanded = semantic_expand("How do we handle authentication?", document_corpus)

Approach 3: LLM-Based Query Rewriting (Advanced)

import anthropic

client = anthropic.Anthropic()

def llm_rewrite_query(query, domain_context=""):
    prompt = f"""You are a search query optimization expert. 
Rewrite the user's query in 5 different ways to improve document retrieval.

Domain context: {domain_context}

Original query: {query}

Generate 5 variations that:
1. Use domain-specific terminology
2. Add related concepts
3. Match technical documentation language
4. Increase likelihood of finding relevant documents

Return only the 5 variations, one per line."""

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    
    variations = message.content[0].text.split("\n")
    return [v.strip() for v in variations if v.strip()]

# Usage
domain = "API authentication, OAuth, JWT, SAML"
expanded = llm_rewrite_query("How do we handle authentication?", domain)
# Results: ["authentication implementation", "OAuth JWT configuration", ...]

Combining with Re-ranking

Query expansion generates more results. Re-ranking ensures the best results are selected.

def expanded_rag_search(query, vector_db, llm, num_results=10):
    # Step 1: Expand query
    expanded_queries = expand_query(query)
    
    # Step 2: Search with all queries
    all_results = []
    for expanded_q in expanded_queries:
        results = vector_db.search(expanded_q, top_k=5)
        all_results.extend(results)
    
    # Step 3: Remove duplicates (same document retrieved multiple times)
    unique_results = {}
    for result in all_results:
        doc_id = result['id']
        if doc_id not in unique_results or result['score'] > unique_results[doc_id]['score']:
            unique_results[doc_id] = result
    
    # Step 4: Re-rank using LLM
    ranked = rerank_with_llm(
        query=query,
        documents=list(unique_results.values()),
        llm=llm,
        top_k=num_results
    )
    
    return ranked

def rerank_with_llm(query, documents, llm, top_k=5):
    # Create ranking prompt
    doc_text = "
".join([f"{i+1}. {doc['content'][:200]}" for i, doc in enumerate(documents)])
    
    prompt = f"""Given the user query, rank these documents by relevance.

Query: {query}

Documents:
{doc_text}

Return the ranking as: 1,3,2,4,... (document numbers in order of relevance)"""
    
    response = llm.generate(prompt)
    # parse_ranking is a placeholder that turns "1,3,2,..." into a list of ints
    ranking = parse_ranking(response)
    
    # Reorder documents
    return [documents[i-1] for i in ranking[:top_k]]

Real-World Performance Impact

Case Study: Customer Support System

Company: SaaS platform with 500+ documentation pages
Problem: Users couldn’t find answers to common questions
Solution: Implemented query expansion + re-ranking

Results:

  • Retrieval accuracy: 65% → 89% (+24 points)
  • User satisfaction: 3.2/5 → 4.1/5
  • Support ticket reduction: 15%
  • Implementation time: 2 weeks

Query Example:

User: "How do I reset my password?"

Without expansion:
- Retrieved: "Password security best practices"
- Retrieved: "Account settings overview"
- Miss: "Account recovery and password reset"

With expansion:
- Retrieved: "Account recovery and password reset" ✓
- Retrieved: "Password reset procedures"
- Retrieved: "Account management"

Best Practices

1. Start simple, add complexity:

  • Begin with synonym expansion (easy, fast)
  • Add semantic expansion if needed
  • Use LLM rewriting for critical queries

2. Monitor and measure:

  • Track retrieval metrics (precision, recall)
  • Measure LLM response quality
  • Monitor latency and cost

3. Tune for your domain:

  • Create domain-specific synonym lists
  • Fine-tune expansion parameters
  • Test with real user queries

4. Combine techniques:

  • Use query expansion for coverage
  • Use re-ranking for precision
  • Use LLM rewriting for complex queries

5. Cache and optimize (see the caching sketch after this list):

  • Cache expanded queries
  • Reuse expansion results
  • Batch LLM calls
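
A minimal caching sketch using functools.lru_cache on the expand_query helper from Approach 1; it assumes expansions are deterministic per query. For LLM-based rewriting, a persistent cache (e.g. Redis) keyed on the raw query string serves the same purpose:

from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_expand_query(query: str) -> tuple[str, ...]:
    """Memoize expansions so repeated or popular queries skip the expansion step."""
    # Return a tuple so the cached value can't be mutated by callers
    return tuple(expand_query(query))  # expand_query from Approach 1

# Usage: the first call computes, later identical queries hit the cache
variations = cached_expand_query("How do we handle authentication?")
print(cached_expand_query.cache_info())  # hits, misses, current size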

Common Mistakes

Mistake 1: Over-expansion. Too many query variations → too many results → worse ranking. Solution: limit to 5-7 variations.

Mistake 2: Ignoring latency. LLM-based rewriting adds latency (500ms-2s). Solution: use it for important queries and cache results.

Mistake 3: No re-ranking. Expanded queries retrieve more noise. Solution: always re-rank expanded results.

Mistake 4: Domain-agnostic expansion. Generic expansion misses domain terms. Solution: customize for your domain.

Mistake 5: Not measuring impact. You can’t tell whether expansion helps. Solution: A/B test with and without expansion (see the recall@k sketch below).
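
To avoid Mistake 5, a minimal recall@k sketch for A/B-testing expansion against a small labeled query set. The relevant document ids must come from your own relevance judgments; the search callables are assumed to return ranked lists of document ids:

def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of known-relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def compare_expansion(test_queries, search_plain, search_expanded, k=5):
    """Average recall@k with and without query expansion over a labeled set.
    test_queries: list of (query, relevant_ids) pairs curated by hand."""
    plain, expanded = [], []
    for query, relevant_ids in test_queries:
        plain.append(recall_at_k(search_plain(query), relevant_ids, k))
        expanded.append(recall_at_k(search_expanded(query), relevant_ids, k))
    n = len(test_queries)
    return sum(plain) / n, sum(expanded) / n

# Usage
# baseline, with_expansion = compare_expansion(labeled_queries, plain_search, expanded_search)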

Query Expansion in Calliope

Chat Studio:

  • Automatic query expansion
  • Configurable expansion strategies
  • Built-in re-ranking
  • Performance monitoring

AI Lab:

  • Build custom expansion logic
  • Experiment with strategies
  • Fine-tune for your documents
  • Evaluate on test queries

Langflow:

  • Visual query expansion workflows
  • Connect to LLM for rewriting
  • Chain with retrieval and ranking
  • Debug and optimize

The Bottom Line

Query expansion and rewriting can improve RAG retrieval accuracy by 20-40% with relatively little implementation effort.

Start with:

  1. Simple synonym expansion (5-10 minutes)
  2. Measure baseline retrieval quality
  3. Add semantic expansion if needed
  4. Use LLM rewriting for complex queries
  5. Always combine with re-ranking

The best query expansion strategy is domain-specific. Invest time understanding your documentation language and user query patterns.

Build better RAG retrieval with Calliope →
