

You’ve built a RAG system. You’ve loaded documents, created embeddings, and deployed a vector database. But when users ask questions, the system retrieves irrelevant passages or fragments that lack context.
The problem isn’t your embedding model. It’s your chunking strategy.
Chunking—how you split documents into retrievable pieces—is the most underestimated component of RAG systems. A poor chunking strategy creates a cascade of failures: incomplete context, fragmented information, and ultimately, poor AI responses.
Yet most RAG implementations use the simplest possible approach: split documents every N characters or tokens.
This article explores advanced chunking strategies that dramatically improve RAG performance. We’ll move beyond naive splitting and examine techniques used by production systems handling billions of documents.
Before diving into solutions, let’s understand why chunking matters.
The chunking challenge:
Real-world impact: A financial services company built RAG for contract analysis. Using fixed-size chunks, the system frequently retrieved partial clauses without their conditions. Restructuring clauses as atomic units improved retrieval accuracy by 35%.
Semantic chunking splits documents based on meaning, not arbitrary boundaries.
The approach:
When to use:
Implementation considerations:
Example:
Document: "The company reported Q3 revenue of $50M, up 20% YoY.
Operating expenses were $30M. Net income reached $20M."
Semantic analysis:
- Sentence 1 (revenue): embedding_1
- Sentence 2 (expenses): embedding_2 → high similarity to revenue (financial context)
- Sentence 3 (net income): embedding_3 → high similarity to both
Result: Single chunk (all related financially)
vs. Fixed chunking: Would split at 100 tokens, breaking the financial statement
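As a concrete illustration, here is a minimal sketch of this idea in Python. It assumes the sentence-transformers library and a simple greedy rule: adjacent sentences stay in the same chunk while the cosine similarity of their embeddings stays above a threshold. The model name and threshold are illustrative, not prescriptions.

```python
# Minimal semantic chunking sketch (illustrative, not a production splitter).
# Adjacent sentences are merged into one chunk while their embeddings stay
# similar; a drop in similarity starts a new chunk.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence embedding model works

def semantic_chunks(sentences: list[str], threshold: float = 0.6) -> list[str]:
    # normalize_embeddings=True makes the dot product equal cosine similarity
    embeddings = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        similarity = float(np.dot(embeddings[i - 1], embeddings[i]))
        if similarity >= threshold:
            current.append(sentences[i])   # related content: keep growing the chunk
        else:
            chunks.append(" ".join(current))
            current = [sentences[i]]       # topic shift: start a new chunk
    chunks.append(" ".join(current))
    return chunks

print(semantic_chunks([
    "The company reported Q3 revenue of $50M, up 20% YoY.",
    "Operating expenses were $30M.",
    "Net income reached $20M.",
]))
```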
Pros:
Cons:
Different document structures require different chunking approaches.
For Markdown/Documentation:
For PDFs with tables:
For Code:
For Scientific papers:
Example - Markdown structure:
# Architecture
## Components
### Database Layer
This is the database layer description.
### API Layer
This is the API layer description.
# Deployment
## Production
Production deployment details.
Poor chunking (fixed size):
- “This is…”
- “This is the API…”
Better chunking (structure-aware):
- “This is the database layer description.”
- “This is the API layer description.”
- “Production deployment details.”
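One way to implement this for Markdown is to split at headings and carry the heading path along with each chunk. The sketch below is a simplified illustration: it detects headings with a regex and ignores edge cases such as fenced code blocks.

```python
# Structure-aware chunking sketch for Markdown: split at headings and keep
# the full heading path as context for each chunk.
import re

def markdown_chunks(text: str) -> list[dict]:
    chunks: list[dict] = []
    path: list[str] = []     # current heading hierarchy
    body: list[str] = []     # lines belonging to the current section

    def flush():
        content = "".join(body).strip()
        if content:
            chunks.append({"section": " / ".join(path), "content": content})
        body.clear()

    for line in text.splitlines(keepends=True):
        heading = re.match(r"^(#{1,6})\s+(.+)", line)
        if heading:
            flush()                               # close the previous section
            level = len(heading.group(1))
            path[:] = path[:level - 1] + [heading.group(2).strip()]
        else:
            body.append(line)
    flush()
    return chunks
```

Applied to the example above, this yields one chunk per leaf section, each tagged with a path such as "Architecture / Components / Database Layer".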
Pros:
Cons:
Sliding window chunking creates overlapping chunks to preserve context across boundaries.
The approach:
Example:
Document: [A B C D E F G H I J K L M N O]
Fixed chunking (size 5):
- Chunk 1: A B C D E
- Chunk 2: F G H I J
- Chunk 3: K L M N O
Sliding window (size 5, stride 3):
- Chunk 1: A B C D E
- Chunk 2: D E F G H
- Chunk 3: G H I J K
- Chunk 4: J K L M N
- Chunk 5: M N O
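Sliding windows are only a few lines of code. The sketch below reproduces the example above: a token list A through O, a window of 5, and a stride of 3.

```python
# Sliding-window chunking sketch: fixed-size windows that advance by `stride`
# tokens, so consecutive chunks overlap by (size - stride) tokens.
def sliding_window(tokens: list[str], size: int = 5, stride: int = 3) -> list[list[str]]:
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # last window already reached the end
            break
    return chunks

tokens = list("ABCDEFGHIJKLMNO")
for i, chunk in enumerate(sliding_window(tokens), start=1):
    print(f"Chunk {i}: {' '.join(chunk)}")
# Chunk 1: A B C D E
# Chunk 2: D E F G H
# ...
# Chunk 5: M N O
```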
When to use:
Configuration:
Pros:
Cons:
Attach metadata to chunks to improve filtering and ranking.
Metadata to capture:
Use in RAG:
# Example: RAG with metadata filtering
query = "How do we handle authentication?"
filters = {
    "document_type": "architecture",
    "version": "latest",
    "language": "english",
}

# Embed the query first (embedding_model stands in for whatever model
# produced the chunk embeddings)
query_embedding = embedding_model.embed(query)

# Retrieve only chunks whose metadata matches the filters
results = vector_db.search(
    query_embedding,
    filters=filters,
    top_k=5,
)
Benefits:
Metadata to include:
chunk = {
    "content": "The API uses JWT tokens...",
    "metadata": {
        "source": "architecture/authentication.md",
        "section": "Authentication/JWT",
        "document_type": "architecture",
        "version": "2.1",
        "last_updated": "2025-01-15",
        "confidence": 0.95,  # How well does this match the semantic intent?
    },
}
Production systems often combine multiple strategies.
Example: Document Processing Pipeline
Raw Document
↓
[Detect Document Type]
├─ Markdown → Structure-aware chunking
├─ PDF with tables → Table extraction + context
├─ Code → Function-boundary chunking
└─ Scientific paper → Section-based chunking
↓
[Apply Semantic Refinement]
→ Verify chunks have sufficient context
→ Merge small chunks
→ Split oversized chunks
↓
[Add Metadata]
→ Source, section, type, version
→ Timestamps and confidence scores
↓
[Sliding Window Overlap]
→ Add overlapping chunks for boundary preservation
↓
[Embed and Store]
→ Compute embeddings
→ Store in vector database
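Below is a hypothetical sketch of how the dispatch-and-refine steps could be wired together. It reuses the markdown_chunks helper from the structure-aware sketch above, falls back to a trivial fixed-size splitter for other types, and keeps the refinement step deliberately simple (merging very small chunks); every name and threshold here is illustrative.

```python
# Hybrid pipeline sketch: pick a chunking strategy by document type, refine,
# and attach metadata before embedding. All names and thresholds are illustrative.
def fixed_size_chunks(text: str, size: int = 512) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def process(document: str, doc_type: str, source: str) -> list[dict]:
    # 1. Detect and dispatch: structure-aware for Markdown, fixed-size fallback
    if doc_type == "markdown":
        raw = [c["content"] for c in markdown_chunks(document)]
    else:
        raw = fixed_size_chunks(document)

    # 2. Semantic refinement (simplified): merge chunks that are too small
    refined: list[str] = []
    for chunk in raw:
        if refined and len(chunk.split()) < 20:
            refined[-1] += " " + chunk
        else:
            refined.append(chunk)

    # 3. Attach metadata; embedding and storage would follow
    return [
        {"content": c, "metadata": {"source": source, "document_type": doc_type}}
        for c in refined
    ]
```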
Real-world example: A legal tech company processes contracts, case law, and regulations:
Result: 40% improvement in retrieval accuracy vs. fixed-size chunking.
Optimal chunk size depends on your use case.
Small chunks (128-256 tokens):
Medium chunks (512-1024 tokens):
Large chunks (2048+ tokens):
Data-driven optimization:
For each chunk size:
1. Chunk documents
2. Embed and store
3. Run evaluation queries
4. Measure: precision, recall, F1 score
5. Calculate cost (storage + retrieval latency)
6. Select size maximizing quality/cost tradeoff
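In code, that sweep can be as simple as the sketch below. Here index_documents and search are placeholders for your own indexing and retrieval functions, eval_queries maps each evaluation query to the set of chunk IDs judged relevant, and precision@k stands in for whatever quality metric you use.

```python
# Chunk-size sweep sketch: rebuild the index at each candidate size, run the
# evaluation queries, and compare mean precision@k. index_documents() and
# search() are placeholders for your own pipeline.
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k

def sweep(chunk_sizes, documents, eval_queries, index_documents, search, k=5):
    scores = {}
    for size in chunk_sizes:
        index = index_documents(documents, chunk_size=size)   # re-chunk and re-embed
        per_query = [
            precision_at_k(search(index, query, top_k=k), relevant, k)
            for query, relevant in eval_queries.items()
        ]
        scores[size] = sum(per_query) / len(per_query)
    return scores  # weigh against storage and latency cost before choosing

# e.g. sweep([128, 256, 512, 1024], docs, eval_set, index_documents, search)
```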
When designing your chunking strategy:
Document Analysis:
Strategy Selection:
Implementation:
Evaluation:
Maintenance:
Mistake 1: Ignoring document structure. Using fixed-size chunks on Markdown or PDFs loses semantic information. Use structure-aware chunking instead.
Mistake 2: Chunks too large. 1500+ token chunks waste context window and reduce precision. Aim for 512-1024 tokens.
Mistake 3: No overlap at boundaries. Information split across chunks doesn’t get retrieved. Use sliding window overlap for critical systems.
Mistake 4: Not capturing metadata. Without metadata, you can’t filter by freshness, security, or type. Always capture source, version, and section.
Mistake 5: Static chunking strategy. Different document types need different strategies. Implement adaptive chunking based on document type.
Calliope simplifies advanced chunking:
AI Lab:
Chat Studio:
Langflow:
Chunking is where most RAG systems fail silently. A document might be in your vector database, but poor chunking prevents it from being retrieved.
The best chunking strategy:
Invest time in chunking. It’s the foundation of retrieval quality.
