Advanced Chunking Strategies for RAG: Beyond Simple Splits

Why Your RAG System Retrieves the Wrong Information (And How Chunking Fixes It)

You’ve built a RAG system. You’ve loaded documents, created embeddings, and deployed a vector database. But when users ask questions, the system retrieves irrelevant passages or fragments that lack context.

The problem isn’t your embedding model. It’s your chunking strategy.

Chunking—how you split documents into retrievable pieces—is the most underestimated component of RAG systems. A poor chunking strategy creates a cascade of failures: incomplete context, fragmented information, and ultimately, poor AI responses.

Yet most RAG implementations use the simplest possible approach: split documents every N characters or tokens.

This article explores advanced chunking strategies that dramatically improve RAG performance. We’ll move beyond naive splitting and examine techniques used by production systems handling billions of documents.

The Chunking Problem

Before diving into solutions, let’s understand why chunking matters.

The chunking challenge:

  1. Too small chunks lose context. A sentence about “quarterly revenue” without surrounding financial context becomes meaningless.
  2. Too large chunks waste retrieval. If you retrieve 2,000 tokens when only 200 are relevant, you waste context window and confuse the LLM.
  3. Arbitrary boundaries split information unnaturally. A chunk that breaks a table in half is useless.
  4. Document structure is ignored. Headers, paragraphs, tables, and code blocks have different semantics.

Real-world impact: A financial services company built RAG for contract analysis. Using fixed-size chunks, the system frequently retrieved partial clauses without their conditions. Restructuring clauses as atomic units improved retrieval accuracy by 35%.

Strategy 1: Semantic Chunking

Semantic chunking splits documents based on meaning, not arbitrary boundaries.

The approach:

  1. Process document into sentences or small units
  2. Calculate embeddings for each unit
  3. Measure semantic similarity between consecutive units
  4. Create chunk boundaries where similarity drops below a threshold
  5. Merge small chunks to meet minimum size requirements
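A minimal sketch of this loop in Python (the `embed` callable is a placeholder for your embedding model; the 0.6 threshold and 2-sentence minimum are illustrative defaults, not recommendations):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_chunks(sentences, embed, threshold=0.6, min_sentences=2):
    """Start a new chunk wherever consecutive-sentence similarity drops
    below the threshold, then merge undersized chunks into their predecessor."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    groups = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            groups.append([sentence])   # similarity dropped: new boundary
        else:
            groups[-1].append(sentence)
    merged = [groups[0]]
    for group in groups[1:]:
        if len(group) < min_sentences:
            merged[-1].extend(group)    # too small: fold into previous chunk
        else:
            merged.append(group)
    return [" ".join(group) for group in merged]
```

In practice you would batch the embedding calls rather than embed one sentence at a time.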

When to use:

  • Documents with natural semantic boundaries (articles, reports, documentation)
  • When retrieval precision is critical
  • When you can afford embedding computation at ingestion time

Implementation considerations:

  • Requires computing embeddings for every sentence/unit during ingestion
  • More expensive than simple splitting, but better retrieval justifies the cost
  • Threshold selection requires tuning (typically 0.5-0.7 similarity)
  • Works best with dense embedding models

Example:

Document: "The company reported Q3 revenue of $50M, up 20% YoY. 
Operating expenses were $30M. Net income reached $20M."

Semantic analysis:
- Sentence 1 (revenue): embedding_1
- Sentence 2 (expenses): embedding_2 → high similarity to revenue (financial context)
- Sentence 3 (net income): embedding_3 → high similarity to both

Result: Single chunk (all related financially)

vs. Fixed chunking: Would split at 100 tokens, breaking the financial statement

Pros:

  • Respects document semantics
  • Reduces context fragmentation
  • Improves retrieval relevance

Cons:

  • Computationally expensive at ingestion
  • Requires tuning threshold parameter
  • May create variable-sized chunks (harder to optimize batch processing)

Strategy 2: Structure-Aware Chunking

Different document structures require different chunking approaches.

For Markdown/Documentation:

  • Respect heading hierarchy
  • Keep sections together
  • Treat code blocks as atomic units
  • Preserve list structure

For PDFs with tables:

  • Extract tables as separate chunks
  • Include surrounding context
  • Preserve table structure (don’t flatten)
  • Handle multi-page tables carefully

For Code:

  • Respect function/class boundaries
  • Keep related functions together
  • Include docstrings with code
  • Preserve imports and dependencies

For scientific papers:

  • Split by sections (abstract, introduction, methods, results, discussion)
  • Keep equations with surrounding text
  • Treat figures/captions as separate chunks
  • Link related sections

Example - Markdown structure:

# Architecture

## Components
### Database Layer
This is the database layer description.
### API Layer
This is the API layer description.

# Deployment
## Production
Production deployment details.

Poor chunking (fixed size):

  • Chunk 1: “# Architecture

## Components

### Database Layer

This is…”

  • Chunk 2: “…database layer description.

### API Layer

This is the API…”

  • Chunk 3: “…layer description.

# Deployment

## Production…”

Better chunking (structure-aware):

  • Chunk 1: “# Architecture

## Components

### Database Layer

This is the database layer description.”

  • Chunk 2: “## Components

### API Layer

This is the API layer description.”

  • Chunk 3: “# Deployment

## Production

Production deployment details.”
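A minimal heading-aware splitter along these lines is short in Python. This is a sketch: it splits at markdown headings up to level 3 and prefixes each chunk with its heading path as a breadcrumb, so retrieved chunks carry their context:

```python
import re

HEADING = re.compile(r"^(#{1,3})\s+(.*)$")

def split_by_headings(markdown):
    """Split markdown into (heading_path, body) chunks at heading boundaries,
    tracking the heading hierarchy so each chunk keeps its full context."""
    path, chunks, body = [], [], []

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append((" / ".join(title for _, title in path), text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # body collected so far belongs to the previous heading
            level = len(match.group(1))
            # drop headings at the same or deeper level, then push this one
            path[:] = [p for p in path if p[0] < level] + [(level, match.group(2))]
        else:
            body.append(line)
    flush()
    return chunks
```

A production version would also treat fenced code blocks as atomic (a `#` inside a code fence is not a heading).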

Pros:

  • Respects document structure
  • Preserves context boundaries
  • Improves semantic coherence

Cons:

  • Requires document type detection
  • More complex implementation
  • May need custom parsing per format

Strategy 3: Sliding Window with Overlap

Sliding window chunking creates overlapping chunks to preserve context across boundaries.

The approach:

  1. Create fixed-size chunks (e.g., 512 tokens)
  2. Create overlapping chunks with stride (e.g., 256 tokens)
  3. Overlap preserves context that would be lost at chunk boundaries

Example:

Document: [A B C D E F G H I J K L M N O]

Fixed chunking (size 5):
- Chunk 1: A B C D E
- Chunk 2: F G H I J
- Chunk 3: K L M N O

Sliding window (size 5, stride 3):
- Chunk 1: A B C D E
- Chunk 2: D E F G H
- Chunk 3: G H I J K
- Chunk 4: J K L M N
- Chunk 5: M N O
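The window/stride scheme above can be sketched in a few lines (tokens here are any Python list; real systems would pass tokenizer output):

```python
def sliding_window(tokens, size=5, stride=3):
    """Return overlapping windows of `size` tokens, advancing `stride`
    tokens each step; the final window is clipped at the end of the input."""
    if not tokens:
        return []
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```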

When to use:

  • When boundary information is critical
  • For question-answering systems (overlap helps context)
  • When chunk size is small relative to document

Configuration:

  • Chunk size: 256-1024 tokens (depends on use case)
  • Stride: 50-80% of chunk size (typical: stride = chunk_size / 2)
  • Larger overlap = more redundancy but better context preservation

Pros:

  • Prevents context loss at boundaries
  • Improves retrieval for questions spanning chunks
  • Simple to implement

Cons:

  • Increases storage and retrieval cost (more chunks)
  • Can cause duplicate content in results
  • Requires deduplication logic
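Because overlapping chunks can surface near-duplicate hits, retrieval results usually need a dedup pass. A minimal sketch, assuming each hit carries `doc_id`, `offset`, and `score` fields (an assumed result shape, not any specific library's API):

```python
def dedupe_hits(hits, min_gap=256):
    """Keep the best-scoring hit per document region: two hits from the
    same document whose offsets are within `min_gap` tokens count as one."""
    kept = []
    # highest-scoring hits claim their region first
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        duplicate = any(
            k["doc_id"] == hit["doc_id"] and abs(k["offset"] - hit["offset"]) < min_gap
            for k in kept
        )
        if not duplicate:
            kept.append(hit)
    return kept
```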

Strategy 4: Metadata-Aware Chunking

Attach metadata to chunks to improve filtering and ranking.

Metadata to capture:

  • Document source and version
  • Section/heading hierarchy
  • Document type (article, code, table, etc.)
  • Creation date and modification date
  • Author and permissions
  • Language and domain

Use in RAG:

# Example: RAG with metadata filtering (illustrative vector-DB API)
query = "How do we handle authentication?"
filters = {
    "document_type": "architecture",
    "version": "latest",
    "language": "english",
}

# Embed the query, then retrieve only chunks matching the filters
query_embedding = embedding_model.embed(query)
results = vector_db.search(
    query_embedding,
    filters=filters,
    top_k=5,
)

Benefits:

  • Filter by document freshness (ignore outdated docs)
  • Enforce security (only retrieve docs user has access to)
  • Improve ranking (prefer architecture docs over blog posts)
  • Support multi-version systems (retrieve from specific version)

Metadata to include:

chunk = {
    "content": "The API uses JWT tokens...",
    "metadata": {
        "source": "architecture/authentication.md",
        "section": "Authentication/JWT",
        "document_type": "architecture",
        "version": "2.1",
        "last_updated": "2025-01-15",
        "confidence": 0.95,  # How well does this match the semantic intent?
    }
}

Strategy 5: Hybrid Chunking (Combining Approaches)

Production systems often combine multiple strategies.

Example: Document Processing Pipeline

Raw Document
    ↓
[Detect Document Type]
    ├─ Markdown → Structure-aware chunking
    ├─ PDF with tables → Table extraction + context
    ├─ Code → Function-boundary chunking
    └─ Scientific paper → Section-based chunking
    ↓
[Apply Semantic Refinement]
    → Verify chunks have sufficient context
    → Merge small chunks
    → Split oversized chunks
    ↓
[Add Metadata]
    → Source, section, type, version
    → Timestamps and confidence scores
    ↓
[Sliding Window Overlap]
    → Add overlapping chunks for boundary preservation
    ↓
[Embed and Store]
    → Compute embeddings
    → Store in vector database
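One way to wire the type-dispatch step is a small registry keyed on file suffix. A sketch: the chunkers below are stubs standing in for the strategies above, and the suffix-based detection is a simplification (production systems often sniff content, not just extensions):

```python
from pathlib import Path

CHUNKERS = {}

def chunker(*suffixes):
    """Decorator registering a chunking function for the given file suffixes."""
    def register(fn):
        for suffix in suffixes:
            CHUNKERS[suffix] = fn
        return fn
    return register

@chunker(".md", ".markdown")
def chunk_markdown(text):
    # stub: a real version would be structure-aware (Strategy 2)
    return [s for s in text.split("\n\n") if s.strip()]

@chunker(".py")
def chunk_code(text):
    # stub: a real version would split on function/class boundaries
    return [s for s in text.split("\n\n") if s.strip()]

def chunk_document(path, text):
    """Route a document to its chunker, falling back to whole-document."""
    fn = CHUNKERS.get(Path(path).suffix.lower(), lambda t: [t])
    return fn(text)
```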

Real-world example: A legal tech company processes contracts, case law, and regulations:

  1. Contracts: Structure-aware (by section/clause) + metadata (contract type, parties)
  2. Case law: Semantic chunking (by legal principle) + sliding window (preserve precedent context)
  3. Regulations: Hierarchy-aware (by section/subsection) + metadata (jurisdiction, date)

Result: 40% improvement in retrieval accuracy vs. fixed-size chunking.

Advanced Consideration: Chunk Size Optimization

Optimal chunk size depends on your use case.

Small chunks (128-256 tokens):

  • Pros: Precise retrieval, less noise, lower storage
  • Cons: May lack context, more retrieval calls
  • Use for: QA systems, precise fact retrieval

Medium chunks (512-1024 tokens):

  • Pros: Balance precision and context, standard for most RAG
  • Cons: May include irrelevant info, moderate storage
  • Use for: General RAG systems, documentation

Large chunks (2048+ tokens):

  • Pros: Rich context, fewer retrieval calls
  • Cons: Noisy retrieval, wastes context window
  • Use for: Summarization, document analysis

Data-driven optimization:

For each chunk size:
1. Chunk documents
2. Embed and store
3. Run evaluation queries
4. Measure: precision, recall, F1 score
5. Calculate cost (storage + retrieval latency)
6. Select size maximizing quality/cost tradeoff
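The measurement step reduces to standard retrieval metrics computed per query against a set of ground-truth relevant chunks:

```python
def retrieval_metrics(retrieved_ids, relevant_ids):
    """Precision, recall, and F1 over retrieved vs. ground-truth chunk ids."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Run this per chunk size over a held-out query set, average the scores, and weigh them against storage and latency to pick the winner.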

The Chunking Checklist

When designing your chunking strategy:

Document Analysis:

  • Understand document structure (markdown, PDF, code, etc.)
  • Identify natural boundaries (sections, functions, tables)
  • Determine optimal chunk size for your domain
  • Plan for document updates (versioning strategy)

Strategy Selection:

  • Choose primary strategy (semantic, structure-aware, sliding window, or hybrid)
  • Plan metadata capture
  • Define filtering and ranking criteria
  • Design for your specific use cases

Implementation:

  • Implement document parsing
  • Implement chunking logic
  • Add metadata extraction
  • Handle edge cases (very long sections, tables, code blocks)
  • Plan for incremental updates

Evaluation:

  • Create test queries covering your use cases
  • Measure retrieval quality (precision, recall)
  • Measure performance (latency, cost)
  • Iterate on chunk size and strategy
  • Monitor in production

Maintenance:

  • Track document versions
  • Update chunks when documents change
  • Monitor retrieval quality over time
  • Adjust strategy based on query patterns

Common Chunking Mistakes

Mistake 1: Ignoring document structure. Using fixed-size chunks on markdown or PDFs discards semantic information. Use structure-aware chunking instead.

Mistake 2: Chunks too large. Chunks of 1,500+ tokens waste context window and reduce precision. Aim for 512-1024 tokens.

Mistake 3: No overlap at boundaries. Information split across chunks doesn’t get retrieved. Use sliding-window overlap for critical systems.

Mistake 4: Not capturing metadata. Without metadata, you can’t filter by freshness, security, or type. Always capture source, version, and section.

Mistake 5: Static chunking strategy. Different document types need different strategies. Implement adaptive chunking based on document type.

Chunking in Calliope

Calliope simplifies advanced chunking:

AI Lab:

  • Build custom chunking pipelines
  • Experiment with different strategies
  • Test on your documents
  • Iterate before production

Chat Studio:

  • Automatic structure-aware chunking
  • Configurable chunk size and overlap
  • Metadata extraction from documents
  • Semantic refinement options

Langflow:

  • Visual chunking workflow builder
  • Connect to document sources
  • Apply transformations
  • Visualize chunk boundaries

The Bottom Line

Chunking is where most RAG systems fail silently. A document might be in your vector database, but poor chunking prevents it from being retrieved.

The best chunking strategy:

  1. Respects your document structure
  2. Preserves semantic boundaries
  3. Includes rich metadata
  4. Uses overlap to preserve context
  5. Is tailored to your use cases

Invest time in chunking. It’s the foundation of retrieval quality.

Build RAG with advanced chunking in Calliope →
