RAG Scalability & Infrastructure Architect

Designs scalable RAG infrastructure for millions of queries, covering distributed vector stores, load balancing, and cost architecture.

claude · Rising · Used 589 times · by Community

Tags: vector-db, scalability, ai, rag, cloud, cost, infrastructure

System Message

## Role & Identity
You are a Senior AI Infrastructure Architect specializing in large-scale RAG systems. You design RAG infrastructure that handles millions of daily queries reliably, cost-efficiently, and with sub-second latency.

## Task
Design the scalability and infrastructure architecture for the described RAG system at the given scale.

## Process
1. **Scale Estimation**: QPS target, document count, vector count, team size.
2. **Vector Store Selection**: Pinecone (managed), Qdrant (self-hosted), Weaviate, pgvector for different scales.
3. **Horizontal Scaling**: Stateless retrieval workers, vector store sharding, replica reads.
4. **Load Balancing**: Request routing, vector store replica reads, retrieval worker auto-scale.
5. **Embedding Service**: Dedicated embedding microservice, batch API, model hosting.
6. **LLM Gateway**: Multi-provider gateway, rate limit pooling, request queuing.
7. **Caching Layer**: Redis semantic cache, CDN for static knowledge.
8. **Cost Architecture**: Per-query cost estimate, managed vs. self-hosted trade-off.
9. **High Availability**: Multi-region vector store, failover, backup/restore.
10. **Capacity Planning**: Growth projections, scaling triggers, infrastructure runway.

## Output Format
```
## Infrastructure Architecture Diagram
## Component Sizing
## Cost Estimate (monthly)
## Scaling Strategy
## HA Design
```

User Message

Design RAG infrastructure for:
Scale: {&{SCALE_REQUIREMENTS}}
Use case: {&{USE_CASE}}
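The two placeholders in the user message can be filled in programmatically before the prompt is sent. The following is a minimal Python sketch, assuming the placeholder tokens appear literally as `{&{SCALE_REQUIREMENTS}}` and `{&{USE_CASE}}` (the site's own templating syntax may differ). The helper name `fill_user_message` and the example values are hypothetical.

```python
# Template text copied from the user message; the placeholder tokens are
# treated as literal strings (an assumption about the site's syntax).
USER_TEMPLATE = (
    "Design RAG infrastructure for:\n"
    "Scale: {&{SCALE_REQUIREMENTS}}\n"
    "Use case: {&{USE_CASE}}"
)


def fill_user_message(scale_requirements: str, use_case: str) -> str:
    """Substitute both placeholders with caller-supplied text (hypothetical helper)."""
    if not scale_requirements.strip() or not use_case.strip():
        raise ValueError("both fields must be non-empty")
    return (
        USER_TEMPLATE
        .replace("{&{SCALE_REQUIREMENTS}}", scale_requirements.strip())
        .replace("{&{USE_CASE}}", use_case.strip())
    )


if __name__ == "__main__":
    # Illustrative values only.
    print(fill_user_message(
        "1M daily queries, sub-500ms p95 latency, 100M vectors",
        "internal documentation search",
    ))
```

Stating the scale as concrete numbers (daily queries, latency target, vector count) gives the Scale Estimation step something to work from.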

About this prompt

## RAG Scalability & Infrastructure Architect
Designs production-scale RAG infrastructure with distributed vector stores, embedding services, LLM gateways, caching, and HA for millions of daily queries.

### Use Cases
- Design RAG infrastructure to handle 1M daily queries with sub-500ms p95 latency SLO
- Architect multi-region Pinecone RAG with Redis semantic cache and LLM gateway
- Plan cost-efficient self-hosted Qdrant RAG vs. managed Pinecone for 100M vector scale
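To make the per-query cost and caching trade-off concrete, here is a rough Python sketch of a blended cost model. It assumes every query pays for an embedding (needed for the semantic cache lookup) and that only cache misses pay for retrieval and generation. The names `QueryCosts`, `blended_cost_per_query`, and `monthly_cost` are hypothetical, and the numbers in the usage section are placeholders, not real provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCosts:
    """Per-query unit costs in one currency (all values are caller-supplied)."""
    embedding: float   # embedding the incoming query
    retrieval: float   # vector store lookup
    generation: float  # LLM call


def blended_cost_per_query(costs: QueryCosts, cache_hit_rate: float) -> float:
    """Average cost per query given the share of queries served from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    miss_rate = 1.0 - cache_hit_rate
    # Assumption: cache hits still need an embedding; misses add retrieval + LLM.
    return costs.embedding + miss_rate * (costs.retrieval + costs.generation)


def monthly_cost(costs: QueryCosts, cache_hit_rate: float,
                 daily_queries: int, days: int = 30) -> float:
    """Blended per-query cost scaled to a month of traffic."""
    return blended_cost_per_query(costs, cache_hit_rate) * daily_queries * days


if __name__ == "__main__":
    # Placeholder unit costs for illustration; substitute measured values.
    example = QueryCosts(embedding=0.0001, retrieval=0.0002, generation=0.004)
    for hit_rate in (0.0, 0.25, 0.5):
        print(hit_rate, round(monthly_cost(example, hit_rate, 1_000_000), 2))
```

A model like this mainly shows how sensitive the monthly bill is to the cache hit rate, which is usually the least certain input and worth measuring on real traffic.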

When to use this prompt

- Design RAG infrastructure handling 1M daily queries with sub-500ms p95 latency requirement.
- Architect multi-region Pinecone RAG with Redis semantic cache and multi-provider LLM gateway.
- Plan cost-efficient self-hosted Qdrant vs. managed Pinecone for 100M vector knowledge base scale.
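The scale figures in these scenarios (1M daily queries, 100M vectors) translate into sizing inputs with some back-of-envelope arithmetic. The Python sketch below is one way to do that, under stated assumptions: the peak-to-average ratio, embedding dimension, per-worker concurrency, and mean latency are illustrative parameters that do not come from the prompt, and all function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400


def average_qps(daily_queries: int) -> float:
    """Mean queries per second if traffic were spread evenly over a day."""
    return daily_queries / SECONDS_PER_DAY


def peak_qps(daily_queries: int, peak_to_average: float = 3.0) -> float:
    """Peak load estimate; the 3x ratio is an assumed default, not a measurement."""
    return average_qps(daily_queries) * peak_to_average


def raw_vector_bytes(vector_count: int, dimensions: int,
                     bytes_per_value: int = 4) -> int:
    """Raw float32 storage for the vectors only (no index or metadata overhead)."""
    return vector_count * dimensions * bytes_per_value


def workers_needed(qps: float, mean_latency_s: float,
                   concurrency_per_worker: int) -> int:
    """Little's law: in-flight requests = arrival rate * time in system."""
    in_flight = qps * mean_latency_s
    return max(1, math.ceil(in_flight / concurrency_per_worker))


if __name__ == "__main__":
    daily = 1_000_000
    print(f"average QPS: {average_qps(daily):.1f}")
    print(f"peak QPS (assumed 3x): {peak_qps(daily):.1f}")
    # Assumed 1536-dimensional float32 embeddings.
    print(f"raw vector storage: {raw_vector_bytes(100_000_000, 1536) / 1e9:.1f} GB")
    # Assumed 300 ms mean latency and 4 concurrent requests per worker.
    print(f"retrieval workers at peak: {workers_needed(peak_qps(daily), 0.3, 4)}")
```

Raw vector storage is a lower bound. Index structures, replicas, and metadata add to it, so real sizing should be checked against the chosen vector store's own capacity guidance.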

Difficulty: advanced

Recommended Prompts

Embedding & Semantic Search Engineer

Designs embedding pipelines covering model selection, batch processing, vector storage, similarity search, and semantic search quality.

Multimodal RAG Designer

Designs RAG systems handling images, charts, tables, and mixed media alongside text for comprehensive document understanding.

RAG Retrieval Strategy Engineer

Designs RAG retrieval strategies covering hybrid search, query expansion, reranking, contextual compression, and multi-query retrieval.

RAG with LlamaIndex Implementation Expert

Implements production RAG systems using LlamaIndex with query engines, node postprocessors, response synthesizers, and evaluation.

AWS Architecture Framework

Expert-crafted prompt for AWS that delivers specific, actionable guidance for cloud infrastructure practitioners who need results, not theory.

Expert AI/ML Engineering Consultation

Deep-dive consultation prompt for AI/ML engineering professionals who need concrete recommendations backed by real-world trade-off analysis.
