Building a Production RAG Pipeline with Bedrock and OpenSearch Serverless

May 14, 2026

The first RAG pipeline I built in anger was a Saturday afternoon affair: a LangChain notebook, a FAISS index sitting on local disk, and an embedding loop. But as soon as a demo like that hits the real world, the questions change. How do you handle 10,000 documents? How do you refresh the index without rebuilding from scratch? Who owns the IAM policies? And finally, what is the cost floor?

Amazon Bedrock Knowledge Bases is the enterprise answer to these questions. It takes the "small distributed system" of RAG—the chunking, the embedding pipeline, the vector store provisioning, and the sync logic—and folds it into a managed service.

[Figure: AWS Production RAG Architecture: S3 to OpenSearch Serverless]


The Vector Backend Decision Matrix

The default vector backend for Bedrock is OpenSearch Serverless (OSS). It is a fine default, but it is also the most expensive, and understanding the OCU floor matters before you sign your team up for the bill.

| Vector Backend | Cost Floor | Latency | Best For |
|---|---|---|---|
| OpenSearch Serverless | ~$345/mo (2 OCU minimum) | Sub-100ms | High traffic, hybrid search, standard AWS RAG |
| S3 Vectors | Pay-per-request | 100ms - 1s | Spiky traffic, indices up to 2 billion vectors (GA Dec 2025) |
| Aurora PostgreSQL | Instance price | Variable | Small datasets, SQL-familiar access patterns |
| Pinecone / MongoDB | SaaS pricing | Variable | Existing platform investment outside of AWS |
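
That floor is simple arithmetic. Assuming the us-east-1 rate of roughly $0.24 per OCU-hour (verify current pricing for your region): 2 OCUs × $0.24 × 720 hours ≈ $345 per month, before you index a single document.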

Security & IAM: The Tripartite Trust Model

The mental model: the Bedrock Service Role is the one doing the work, and three documents let it work. The trust policy lets Bedrock assume the role; the permissions policy lets the role read S3, invoke the embedding model, and reach the collection; and the Data Access Policy on the OSS collection must explicitly grant that same role access, because IAM alone is not sufficient. OSS enforces its own data-plane authorization on top of IAM.

Bedrock IAM & Data Access Model

1. Trust Policy

Lets Bedrock assume the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "bedrock.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
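
One optional hardening step: scope the trust with source conditions so a knowledge base in someone else's account can't ride this role. A sketch, reusing the placeholder account ID and region from the rest of this post:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "bedrock.amazonaws.com" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "aws:SourceAccount": "123456789012" },
        "ArnLike": { "aws:SourceArn": "arn:aws:bedrock:us-east-1:123456789012:knowledge-base/*" }
      }
    }
  ]
}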

2. Permissions Policy

Read S3, invoke embedding model, and write to OSS.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-kb-docs", "arn:aws:s3:::my-kb-docs/*"]
    },
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v2:0"
    },
    {
      "Effect": "Allow",
      "Action": "aoss:APIAccessAll",
      "Resource": "arn:aws:aoss:*:*:collection/*"
    }
  ]
}
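
3. Data Access Policy

Grants the role data-plane access on the collection itself. This lives on the OSS collection, not in IAM, which is why the first two policies alone are not enough. A minimal sketch, assuming a collection named my-kb-collection (the name is illustrative):

[
  {
    "Rules": [
      {
        "ResourceType": "collection",
        "Resource": ["collection/my-kb-collection"],
        "Permission": ["aoss:DescribeCollectionItems", "aoss:CreateCollectionItems", "aoss:UpdateCollectionItems"]
      },
      {
        "ResourceType": "index",
        "Resource": ["index/my-kb-collection/*"],
        "Permission": ["aoss:CreateIndex", "aoss:DescribeIndex", "aoss:UpdateIndex", "aoss:ReadDocument", "aoss:WriteDocument"]
      }
    ],
    "Principal": ["arn:aws:iam::123456789012:role/BedrockKBRole"]
  }
]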

Advanced Chunking Strategies

The two knobs that meaningfully affect retrieval quality are the chunking strategy and the embedding model, and the choice is more consequential than the documentation suggests.

RAG Chunking Strategy Comparison

  • Fixed-Size: 300-token slices. Predictable, but splits tables and code blocks.
  • Hierarchical: Retrieves on small child chunks but returns the 1500-token parent to the model. Best for technical docs.
  • Semantic: Uses an embedding model to detect topic shifts. Highest quality for narrative content, but slowest to compute (configuration sketch below).
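
For comparison with the hierarchical setup in the next section, here is roughly what the semantic option looks like on the same create_data_source call; the threshold and buffer values are illustrative, not recommendations:

'vectorIngestionConfiguration': {
    'chunkingConfiguration': {
        'chunkingStrategy': 'SEMANTIC',
        'semanticChunkingConfiguration': {
            'maxTokens': 300,                    # hard cap per chunk
            'bufferSize': 1,                     # neighboring sentences embedded together when scoring boundaries
            'breakpointPercentileThreshold': 95  # split where sentence similarity drops below this percentile
        }
    }
}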

Implementation: The Boto3 SDK Path

Here is the exact code path to stand up a Knowledge Base with Hierarchical Chunking, assuming the service role and OSS collection from the previous section already exist.

import boto3
import time

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Step 1: Create the Knowledge Base
kb = bedrock_agent.create_knowledge_base(
    name='company-docs-kb',
    description='Internal policy and engineering docs',
    roleArn='arn:aws:iam::123456789012:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/abc123',
            'vectorIndexName': 'company-docs-index',
            'fieldMapping': {
                'vectorField': 'embedding',
                'textField': 'text',
                'metadataField': 'metadata'
            }
        }
    }
)

kb_id = kb['knowledgeBase']['knowledgeBaseId']

# Step 2: Attach S3 Data Source with Hierarchical Chunking
ds = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_id,
    name='company-docs-s3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-kb-docs'
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'HIERARCHICAL',
            'hierarchicalChunkingConfiguration': {
                'levelConfigurations': [
                    {'maxTokens': 1500}, 
                    {'maxTokens': 300}
                ],
                'overlapTokens': 60
            }
        }
    }
)

# Step 3: Kick off the first ingestion job
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=kb_id, 
    dataSourceId=ds['dataSource']['dataSourceId']
)
job_id = job['ingestionJob']['ingestionJobId']

# Step 4: (Expert Path) Poll for completion and check statistics
while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=kb_id,
        dataSourceId=ds['dataSource']['dataSourceId'],
        ingestionJobId=job_id
    )['ingestionJob']
    
    print(f"Status: {status['status']}")
    if status['status'] in ['COMPLETE', 'FAILED', 'STOPPED']:
        # The statistics block tells you what was scanned, indexed, and dropped
        stats = status['statistics']
        print(f"Scanned: {stats['numberOfDocumentsScanned']}")
        print(f"Indexed: {stats.get('numberOfNewDocumentsIndexed', 0)}")
        print(f"Failed: {stats['numberOfDocumentsFailed']}")
        break
    time.sleep(30)
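
A useful property of this step: re-running start_ingestion_job against the same data source is an incremental sync, not a rebuild. New and modified S3 objects get chunked and re-embedded, and objects removed from the bucket get purged from the index, which answers the "refresh without rebuilding from scratch" question from the top of this post.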

Querying the Knowledge Base

The retrieve_and_generate API returns a grounded answer with citations in a single call.

import boto3

runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

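# kb_id carries over from the create_knowledge_base call above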
response = runtime.retrieve_and_generate(
    input={'text': 'What is our policy on remote work for engineering?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID' # Essential for keyword+vector
                }
            }
        }
    }
)

print(response['output']['text'])
for citation in response.get('citations', []):
    for ref in citation.get('retrievedReferences', []):
        print(ref['location'], ref['content']['text'][:120])
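
When you want to own the generation step yourself (custom prompts, a different model, a reranker), the lower-level retrieve call returns the raw scored chunks instead. This sketch reuses the runtime client and kb_id from above:

# Retrieve raw chunks; prompting and generation stay in your hands
chunks = runtime.retrieve(
    knowledgeBaseId=kb_id,
    retrievalQuery={'text': 'What is our policy on remote work for engineering?'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'HYBRID'
        }
    }
)
for result in chunks['retrievalResults']:
    print(f"{result['score']:.3f}", result['content']['text'][:120])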

The Production Checklist

  1. Sync Failures: Always monitor the statistics block in get_ingestion_job. Corrupted PDFs will fail silently, leaving gaps in your index.
  2. Metadata Filtering: Use .metadata.json sidecar files in S3 (example after this list). This is mandatory for multi-audience KBs; without filters, the retriever disambiguates audiences on vibes alone.
  3. Model Migrations: You cannot swap embedding models in an existing KB. You must create a new KB, re-ingest, and cut over at the application layer.
  4. Cost Monitoring: OpenSearch Serverless bills by OCU (2 OCU minimum). A single misconfigured retry loop can burn $1,000 in an afternoon. Use Budget Alarms.
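
On point 2, the sidecar convention works like this: for a document at s3://my-kb-docs/handbook.pdf, upload s3://my-kb-docs/handbook.pdf.metadata.json alongside it. The attribute names here are illustrative:

{
  "metadataAttributes": {
    "department": "engineering",
    "audience": "internal"
  }
}

At query time, the filter rides along inside vectorSearchConfiguration:

'retrievalConfiguration': {
    'vectorSearchConfiguration': {
        'numberOfResults': 5,
        'filter': {'equals': {'key': 'audience', 'value': 'internal'}}
    }
}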

The 2026 Roadmap

The focus is shifting to the edges. S3 Vectors changed the economics for large RAG deployments overnight. AgentCore is increasingly the choice for systems that need to take actions, while Bedrock Data Automation has become the best way to parse complex PDFs with tables and figures.

For multi-modal workloads, Amazon Nova Multimodal Embeddings V1 (3072 dimensions) is the new standard, enabling RAG over product catalogs and manuals where diagrams matter as much as text.

This architecture is unglamorous and well-documented—the only kind that survives the shift from demo to system.