

A user asks your RAG system: “How do we handle authentication?”
Your system searches the vector database. It finds documents about “OAuth” and “JWT tokens” but misses the section on “single sign-on” and “identity management” that contains exactly what the user needs.
The problem isn’t your documents. It’s the query.
This is the query matching problem: the user’s question doesn’t use the same language as the documents. Query expansion and rewriting solve this by transforming user queries to better match document language and concepts.
This article explores techniques that can improve RAG retrieval accuracy by 20-40%.
RAG systems match queries to documents using semantic similarity. But semantic similarity has limits.
The problem in action:
User query: “How do we handle authentication?”
Document contains: “Single sign-on integration”, “OAuth 2.0 implementation”, “JWT token validation”
Semantic similarity might not connect “handle authentication” to “single sign-on” if the embedding space doesn’t strongly associate these terms.
Why this happens:
- Embedding models are trained on general text, so domain synonyms (“authentication” vs. “single sign-on”) can sit far apart in vector space.
- User queries are short and conversational, while documents are long and technical.
Impact:
- The most relevant documents are never retrieved, so the LLM answers from incomplete context.
- Users get wrong or empty answers even though the corpus contains the information.
Query expansion generates multiple reformulations of the user’s query to increase matching chances.
How it works:
Original query: "How do we handle authentication?"
Expanded queries:
1. "How do we handle authentication?"
2. "How do we implement identity management?"
3. "How do we manage user authentication and authorization?"
4. "What are our authentication mechanisms?"
5. "How do we secure user access?"
6. "How do we implement single sign-on?"
7. "How do we manage OAuth and JWT tokens?"
Search using all queries, combine results, re-rank
Expansion strategies:
1. Synonym expansion: Replace terms with synonyms and related concepts.
"authentication" → "authentication", "identity verification", "user verification", "login"
"handle" → "handle", "implement", "manage", "configure"
"we" → "our", "the system", "the platform"
2. Semantic expansion: Use embeddings to find semantically similar terms.
Query: "How do we handle authentication?"
Similar terms: "authorization", "identity", "access control", "credentials", "verification"
Expanded: "How do we handle authentication, authorization, and access control?"
3. Domain-specific expansion: Add domain-specific terminology and abbreviations.
Query: "How do we handle authentication?"
Domain knowledge: OAuth, SAML, JWT, SSO, LDAP
Expanded: "How do we handle authentication? What about OAuth, SAML, JWT, or SSO?"
4. Hyponym/hypernym expansion: Add more specific and more general terms.
"authentication" (specific) → "security" (general)
"OAuth" (specific) → "authentication protocol" (general)
5. Multi-hop expansion: Generate intermediate queries to find related documents.
Original: "How do we handle authentication?"
Intermediate: "What authentication mechanisms do we use?"
Related: "What is OAuth?", "What is JWT?", "What is SAML?"
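The domain-specific strategy above can be as simple as a hand-curated term map. A minimal sketch, where `DOMAIN_TERMS` and `domain_expand` are hypothetical names and the term lists would be tailored to your own documentation:

```python
# Hand-curated map from general terms to domain vocabulary (illustrative only)
DOMAIN_TERMS = {
    "authentication": ["OAuth", "SAML", "JWT", "SSO", "LDAP"],
    "authorization": ["RBAC", "ACL", "scopes"],
}

def domain_expand(query: str) -> str:
    """Append domain abbreviations for any general term found in the query."""
    extras = []
    for term, abbreviations in DOMAIN_TERMS.items():
        if term in query.lower():
            extras.extend(abbreviations)
    if not extras:
        return query  # No domain terms matched; leave the query unchanged
    return f"{query} ({', '.join(extras)})"
```

Calling `domain_expand("How do we handle authentication?")` yields `"How do we handle authentication? (OAuth, SAML, JWT, SSO, LDAP)"`, which mirrors the expanded query shown above.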
Query rewriting reformulates the user’s query to better match document structure and language.
How it works:
The system analyzes the query and rewrites it for better retrieval.
Original query: "How do we handle authentication?"
Rewritten queries:
1. "authentication implementation" (remove question words)
2. "authentication mechanisms systems" (add related concepts)
3. "OAuth JWT SAML authentication" (add specific technologies)
4. "user authentication access control" (add related concepts)
5. "authentication configuration best practices" (add context)
Rewriting techniques:
1. Question-to-statement conversion:
Question: "How do we handle authentication?"
Statement: "Authentication handling implementation"
Why? Statements often match document structure better than questions.
2. Generalization:
Specific: "How do we implement JWT token validation?"
General: "Token validation authentication"
Why? Broader terms match more documents.
3. Specification:
General: "How do we do security?"
Specific: "How do we implement authentication, authorization, and encryption?"
Why? Specific terms match relevant sections better.
4. Contextual rewriting:
Original: "How do we handle authentication?"
With context (user is looking at API docs): "API authentication OAuth JWT"
Why? Context helps narrow to relevant documents.
5. Structure-aware rewriting:
Original: "How do we handle authentication?"
Rewritten for markdown docs: "# Authentication", "## Implementation", "## Configuration"
Why? Matches document heading structure.
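Question-to-statement conversion (technique 1) can be approximated with a few lines of string processing. A rough sketch, where the question-word and filler lists are illustrative assumptions rather than a fixed standard:

```python
import re

# Leading question words to strip (illustrative list)
QUESTION_WORDS = r"^(how|what|why|when|where|who|which|does|do|can|should)\b"
# Low-signal filler words to drop (illustrative list)
FILLER = {"do", "does", "we", "our", "the", "a", "an", "is", "are", "to", "i"}

def question_to_statement(query: str) -> str:
    """Turn a question into statement-like keywords that match doc prose."""
    text = query.strip().rstrip("?").lower()
    text = re.sub(QUESTION_WORDS, "", text).strip()
    words = [w for w in text.split() if w not in FILLER]
    return " ".join(words)
```

For example, `question_to_statement("How do we handle authentication?")` returns `"handle authentication"`, close to the “Authentication handling” statement form shown above.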
Use an LLM to intelligently rewrite queries.
Prompt-based approach:
System prompt:
"You are a query optimization expert. Rewrite the user's query to:
1. Use domain-specific terminology
2. Add related concepts
3. Remove ambiguity
4. Match technical documentation language
Generate 5 variations that would match relevant documents."
User query: "How do we handle authentication?"
LLM output:
1. "authentication implementation"
2. "user authentication mechanisms"
3. "OAuth SAML JWT authentication"
4. "authentication configuration best practices"
5. "identity management access control"
Advantages:
- Understands intent and context rather than just matching surface terms.
- Adapts to any domain through the prompt, with no hand-built synonym lists.
Disadvantages:
- Adds latency and cost (an extra LLM call per query).
- Output quality varies, so expanded results still need re-ranking.
Combine query expansion with other RAG improvements.
Full pipeline:
User query: "How do we handle authentication?"
↓
[Query Expansion]
Generate 5-7 variations
↓
[Parallel Retrieval]
Search with each variation
↓
[Result Merging]
Combine results, remove duplicates
↓
[Re-ranking]
Score results by relevance
↓
[Deduplication]
Remove similar results
↓
[Context Assembly]
Select top results for LLM
↓
[LLM Response]
Generate answer with context
Approach 1: Simple Synonym Expansion (Easy)
```python
# Simple synonym-based expansion
synonyms = {
    "authentication": ["identity verification", "user verification", "login", "authorization"],
    "handle": ["implement", "manage", "configure"],
    "we": ["our system", "the platform", "the application"],
}

def expand_query(query):
    expanded = [query]  # Always keep the original query first
    for term, synonym_list in synonyms.items():
        if term in query:
            for syn in synonym_list:
                expanded.append(query.replace(term, syn))
    return expanded[:5]  # Cap at 5 variations

# Usage
queries = expand_query("How do we handle authentication?")
# Results: ["How do we handle authentication?", "How do we handle identity verification?", ...]
```
Approach 2: Semantic Expansion (Moderate)
```python
from collections import Counter

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

def extract_key_terms(docs, max_terms=10):
    # Naive key-term extraction: most frequent longer words across the docs
    counts = Counter(
        word.lower().strip(".,?!")
        for doc in docs for word in doc.split() if len(word) > 4
    )
    return [word for word, _ in counts.most_common(max_terms)]

def semantic_expand(query, documents, num_expansions=5):
    # Embed the query and all candidate documents
    query_embedding = model.encode(query)
    doc_embeddings = model.encode(documents)
    # Dot-product similarity between the query and every document
    similarities = np.dot([query_embedding], doc_embeddings.T)[0]
    # Take the most similar documents
    top_indices = np.argsort(similarities)[-num_expansions:]
    similar_docs = [documents[i] for i in top_indices]
    # Extract key terms from those documents and append them to the query
    expanded_terms = extract_key_terms(similar_docs)
    return f"{query} {' '.join(expanded_terms)}"

# Usage
expanded = semantic_expand("How do we handle authentication?", document_corpus)
```
Approach 3: LLM-Based Query Rewriting (Advanced)
```python
import anthropic

client = anthropic.Anthropic()

def llm_rewrite_query(query, domain_context=""):
    prompt = f"""You are a search query optimization expert.
Rewrite the user's query in 5 different ways to improve document retrieval.
Domain context: {domain_context}
Original query: {query}
Generate 5 variations that:
1. Use domain-specific terminology
2. Add related concepts
3. Match technical documentation language
4. Increase likelihood of finding relevant documents
Return only the 5 variations, one per line."""
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    variations = message.content[0].text.split("\n")
    return [v.strip() for v in variations if v.strip()]

# Usage
domain = "API authentication, OAuth, JWT, SAML"
expanded = llm_rewrite_query("How do we handle authentication?", domain)
# e.g. ["authentication implementation", "OAuth JWT configuration", ...]
```
Query expansion generates more results. Re-ranking ensures the best results are selected.
```python
def expanded_rag_search(query, vector_db, llm, num_results=10):
    # Step 1: Expand query
    expanded_queries = expand_query(query)

    # Step 2: Search with all queries
    all_results = []
    for expanded_q in expanded_queries:
        results = vector_db.search(expanded_q, top_k=5)
        all_results.extend(results)

    # Step 3: Remove duplicates (same document retrieved multiple times),
    # keeping the best score per document
    unique_results = {}
    for result in all_results:
        doc_id = result['id']
        if doc_id not in unique_results or result['score'] > unique_results[doc_id]['score']:
            unique_results[doc_id] = result

    # Step 4: Re-rank using LLM
    ranked = rerank_with_llm(
        query=query,
        documents=list(unique_results.values()),
        llm=llm,
        top_k=num_results,
    )
    return ranked

def rerank_with_llm(query, documents, llm, top_k=5):
    # Build a ranking prompt from the first 200 characters of each document
    doc_text = "\n".join(
        f"{i+1}. {doc['content'][:200]}" for i, doc in enumerate(documents)
    )
    prompt = f"""Given the user query, rank these documents by relevance.
Query: {query}
Documents:
{doc_text}
Return the ranking as: 1,3,2,4,... (document numbers in order of relevance)"""
    response = llm.generate(prompt)
    ranking = parse_ranking(response)
    # Reorder documents according to the LLM's ranking
    return [documents[i - 1] for i in ranking[:top_k]]

def parse_ranking(response):
    # Parse "1,3,2,4" into a list of ints, ignoring non-numeric tokens
    return [int(tok) for tok in response.split(",") if tok.strip().isdigit()]
```
Case Study: Customer Support System
Company: SaaS platform with 500+ documentation pages
Problem: Users couldn’t find answers to common questions
Solution: Implemented query expansion + re-ranking
Results: the correct page went from missed to top-ranked, as this example query shows:
User: "How do I reset my password?"
Without expansion:
- Retrieved: "Password security best practices"
- Retrieved: "Account settings overview"
- Miss: "Account recovery and password reset"
With expansion:
- Retrieved: "Account recovery and password reset" ✓
- Retrieved: "Password reset procedures"
- Retrieved: "Account management"
1. Start simple, add complexity: begin with synonym expansion, then move to semantic and LLM-based rewriting as needed.
2. Monitor and measure: track retrieval precision and answer quality before and after each change.
3. Tune for your domain: build synonym lists and prompts around your own documentation’s vocabulary.
4. Combine techniques: expansion works best paired with re-ranking and deduplication.
5. Cache and optimize: cache expansions for frequent queries to control latency and cost.
Mistake 1: Over-expansion
Too many query variations → too many results → worse ranking.
Solution: Limit to 5-7 variations.

Mistake 2: Ignoring latency
LLM-based rewriting adds latency (500ms-2s).
Solution: Use it for important queries and cache results.

Mistake 3: No re-ranking
Expanded queries retrieve more noise.
Solution: Always re-rank expanded results.

Mistake 4: Domain-agnostic expansion
Generic expansion misses domain terms.
Solution: Customize for your domain.

Mistake 5: Not measuring impact
You can’t tell whether expansion helps.
Solution: A/B test with and without expansion.
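The fix for Mistake 2 can be a one-decorator change: memoize expansions so repeated queries skip the LLM call entirely. A minimal sketch using Python’s `functools.lru_cache`, where `expand_with_llm` is a hypothetical stand-in for any of the expansion functions above:

```python
from functools import lru_cache

def expand_with_llm(query):
    # Stand-in for an expensive LLM rewriting call (hypothetical helper)
    return [query, f"{query} implementation", f"{query} best practices"]

@lru_cache(maxsize=1024)
def cached_expand(query):
    # lru_cache requires hashable return values, so return a tuple
    return tuple(expand_with_llm(query))
```

A repeated query is then served from the cache (a hit in `cached_expand.cache_info()`) instead of paying the 500ms-2s rewrite latency again; in production you would likely use a shared cache such as Redis instead of a per-process one.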
Query expansion and rewriting improve RAG retrieval accuracy by 20-40% with minimal overhead.
Start with: synonym expansion plus re-ranking, measure the impact, then add LLM-based rewriting for the queries that matter most.
The best query expansion strategy is domain-specific. Invest time understanding your documentation language and user query patterns.
