Write an architecture decision record for how to add vector search to an existing Postgres-backed SaaS app: evaluate using pgvector in the existing Postgres versus adopting a dedicated vector database, and recommend an approach with the tradeoffs and rationale.
This Architecture Decision Record evaluates two approaches to adding vector search to an existing PostgreSQL-backed SaaS application: extending the current database with pgvector versus adopting a dedicated vector database (Pinecone, Qdrant, Weaviate). Based on performance benchmarks, operational complexity analysis, cost modeling, and team constraints, the recommended decision is to adopt pgvector within the existing PostgreSQL database using HNSW indexes, with a dedicated read replica for vector query isolation. This approach is valid for embedding corpora under ~20M vectors with sub-100ms latency requirements, and avoids the operational overhead of a second data store. However, four preconditions must be satisfied before implementation proceeds, including a compliance audit and internal benchmark validation.
Proposed — Pending Precondition Clearance (will advance to Accepted upon satisfaction of all four preconditions in the Preconditions table) Date: 2026-06-02 Deciders: Engineering Lead, Platform Architect, CTO
The SaaS application uses PostgreSQL as its primary data store. A new product requirement demands vector search capability — semantic search, RAG pipelines, or embedding-based recommendations. The engineering team must decide between: **Option A (pgvector):** Install the pgvector extension into the existing PostgreSQL cluster, storing embeddings alongside relational data. **Option B (Dedicated Vector DB):** Provision a separate managed vector database service (Pinecone, Qdrant Cloud, Weaviate), maintaining dual write paths and separate query routing. Constraints shaping this decision: - Vector corpus is currently in the low-to-mid millions of embeddings (Assumption #2: risky) - Query latency requirement: sub-100ms at moderate QPS; no sub-20ms SLA has been established - Team has strong PostgreSQL expertise but limited operational bandwidth — two engineers familiar with the core data layer departed in the past year - Existing ORM (Prisma) has incomplete native pgvector support as of mid-2026 [R17] - Budget favors solutions that avoid new managed service costs
**Install pgvector into the existing PostgreSQL database cluster, using HNSW indexes, with a dedicated read replica to isolate vector query traffic from OLTP workloads.** Implementation steps: 1. Enable pgvector extension — install on existing PostgreSQL cluster; verify version compatibility 2. Store embeddings in PostgreSQL — co-locate embeddings with relational data; leverage atomic transactions for consistency 3. Provision dedicated read replica — route all vector similarity queries to this replica to prevent resource contention with OLTP workloads [R8] 4. Implement HNSW indexing — configure `ef_construction` and `m` parameters based on internal benchmarks 5. Address ORM gaps — use parameterized raw SQL or lightweight query builder (e.g., Knex) where Prisma falls short [R17]; conduct security review 6. Deploy monitoring — track embedding count, replica replication lag, CPU/memory/IO utilization, and 95th/99th percentile query latencies 7. Set re-evaluation threshold — automated alert at 20M embeddings triggers architecture reassessment
**Option B: Dedicated Managed Vector Database (Pinecone, Qdrant Cloud, Weaviate)** Provision a separate vector database service. Application writes embeddings to both PostgreSQL (source of truth) and the vector DB; similarity queries hit the vector DB directly. Performance (≤10M vectors): pgvector competitive — sub-100ms queries with HNSW [R1][R3]. Dedicated DB comparable; purpose-built but marginal advantage at this scale [R7]. Performance (50M+ vectors): pgvector degrades — single-instance memory and I/O limits [R19]. Dedicated DB superior — DiskANN, GPU acceleration, native sharding [R10]. Operational complexity: pgvector minimal — no new infrastructure, no sync layer, uses existing expertise [R5]. Dedicated DB significant — dual write path, data synchronization, new operational domain, vendor lock-in. Transactional consistency: pgvector native — embeddings and relational data in same transaction [R3]. Dedicated DB eventual — requires sync mechanism. Cost: pgvector low — only existing Postgres infrastructure + read replica sizing [R12][R4]. Dedicated DB higher — managed service fees ($0.10–$0.40/1K queries typical); dual infrastructure. Scaling ceiling: pgvector vertical only — no native sharding; single instance hits hard memory constraints (~6GB per 1M × 1536-dim embeddings) [R19][R11]. Dedicated DB horizontal — built for distributed scaling beyond 100M vectors. **Option C: Hybrid Approach (pgvector + Dedicated Vector DB)** Use pgvector for low-latency queries on recent/small datasets; offload large-scale or specialized vector workloads to a dedicated DB. Adds complexity of both approaches and is premature given current corpus size. Noted as a viable future path if regulatory requirements mandate data isolation or corpus exceeds 50M.
The decision favors pgvector for five reasons: **1. Performance is sufficient at current scale.** pgvector with HNSW indexing matches or outperforms dedicated vector databases at the 1M–10M embedding scale. Supabase benchmarks (April 2026) show pgvector HNSW outperforming Qdrant on equivalent compute at 99% accuracy [R1]. Timescale's pgvectorscale extension demonstrates 471 QPS at 99% recall on 50M vectors [R6]; plain pgvector with HNSW is competitive at 1M–10M scale. Sub-100ms latency is achievable on millions of vectors with appropriate hardware [R3]. **2. Operational simplicity is critical given team constraints.** With two data-layer engineers departed and limited operational bandwidth, adding a second database system introduces unacceptable operational risk. pgvector requires no new infrastructure, no synchronization layer, and no new operational expertise [R5]. **3. Transactional consistency eliminates a class of bugs.** Co-locating embeddings with relational data in PostgreSQL enables atomic writes. A dedicated vector DB requires eventual-consistency synchronization, introducing stale-data and orphaned-record failure modes. **4. Cost is materially lower.** pgvector uses existing PostgreSQL infrastructure. The incremental cost is read replica sizing and additional storage. At 10M × 1536-dim embeddings, memory requirements for the HNSW index may reach 60GB [R19], implying a high-memory instance class costing approximately $800–$2,000/month on major cloud providers. This remains lower than dual-infrastructure managed vector DB fees, but teams should model actual replica sizing before treating the cost advantage as self-evident. **5. The market is converging on integrated solutions.** PostgreSQL added pgvector, Elasticsearch added vector search, and open-source alternatives are undercutting dedicated vector database costs [R11][R10]. The 2026 question is less "which novel DB do we adopt" and more "do we need a separate vector store at all" [R10].
**Accepted tradeoffs:** - **Scaling ceiling.** pgvector on a single PostgreSQL instance hits hard limits above 50M embeddings [R19]. The team accepts this with an explicit 20M-embedding re-evaluation trigger. - **Limited index options.** Only HNSW and IVF_FLAT are available — no DiskANN, no GPU-accelerated search [R10]. - **ORM friction.** Prisma does not natively support pgvector operations as of mid-2026 [R17]. The team will use parameterized raw SQL or a lightweight query builder, introducing maintainability debt. - **Resource contention risk.** The dedicated read replica mitigates this, but replication lag under heavy load remains a risk. **Positive consequences:** - No new infrastructure to provision, secure, or maintain - Single backup and disaster recovery strategy covers all data - Existing monitoring, alerting, and operational runbooks extend naturally - Developer onboarding remains simple — one database technology to learn - Atomic transactions between embeddings and relational data eliminate sync bugs **Negative consequences:** - Vertical scaling is the primary lever; costs grow with the primary database [R4][R13] - HNSW index rebuilds are time-consuming and memory-intensive; must be scheduled carefully [R10] - Knowledge concentration risk with reduced team size; pgvector tuning expertise must be documented and cross-trained - If corpus growth exceeds projections, migration to a dedicated vector DB under pressure is more costly than adopting one upfront
Four conditions must be cleared before this decision is viable: **1. Compliance audit** — Confirm vector+relational data co-location does not violate GDPR, HIPAA, CCPA, or data-residency requirements. Why blocking: if co-location violates policy, the entire pgvector approach is invalid; hybrid or dedicated DB is required. Owner: Platform Architect + Legal. Deadline: before implementation begins. **2. Embedding growth projections** — Obtain product roadmap data on expected embedding corpus size over 12–24 months. Why blocking: if projections exceed 30M within 12 months, the 20M re-evaluation threshold is too late. Owner: Product Lead + Engineering Lead. Deadline: before implementation begins. **3. Internal performance benchmarks** — Run vector search benchmarks on actual production hardware class with representative data and query patterns. Why blocking: research benchmarks [R1][R3][R7] may not reflect production conditions. Owner: Engineering Lead. Deadline: before go-live. **4. Monitoring infrastructure** — Deploy dashboards and alerts for read-replica replication lag (<100ms), CPU/memory/IO contention (alert at 70%), and embedding count milestones (15M, 20M, 30M). Why blocking: without monitoring, OLTP degradation from resource contention will be discovered by users, not engineers. Owner: Platform Architect. Deadline: before go-live. **If precondition #1 or #2 fails, revert to Option B (Dedicated Managed Vector DB).**
**Read replica becomes bottleneck** — Monitor replication lag; implement backpressure (reject queries if lag >500ms); alert at 80% CPU. Before go-live. **HNSW index rebuild blocks queries** — Run rebuilds on replica only using `REINDEX CONCURRENTLY`; document rebuild windows and SLA; test in staging. Before go-live. **Team knowledge concentration** — Document all pgvector tuning procedures; schedule quarterly knowledge-transfer sessions; cross-train one additional engineer. Within 30 days of go-live. **ORM raw SQL introduces injection risk** — Use parameterized query wrappers exclusively; conduct security review before production; re-evaluate if Prisma adds pgvector support in Q3 2026 [R17]. Before go-live. **Application writes embeddings asynchronously** — Audit application code for eventual-consistency patterns in embedding writes; refactor to atomic writes or document eventual-consistency SLA. Before go-live. **Memory overhead estimate is wrong** — Calculate actual memory overhead using representative embeddings from production pipeline; adjust replica sizing accordingly [R19]. Before go-live.
This decision should be revisited if any of the following triggers are met: 1. **Embedding count reaches 20M** — schedule architecture reassessment (allow 2–3 months lead time for migration) 2. **Product team requests sub-50ms latency SLA** — pgvector degradation risk begins below 30ms at elevated QPS; treat any request for sub-30ms as an immediate blocking trigger 3. **Read replica CPU consistently exceeds 80%** — vertical scaling ceiling approaching 4. **Compliance audit requires data isolation** — pivot to hybrid approach or dedicated vector DB 5. **Prisma ships native pgvector support** — re-evaluate raw SQL workarounds (potential simplification) 6. **pgvector adds DiskANN or GPU acceleration** — re-evaluate scaling ceiling assumptions Pre-plan the escape: schedule a distributed SQL evaluation spike (CockroachDB, YugabyteDB) for Q4 2026, regardless of whether triggers are met, to have an evaluated migration path ready.
1. **Embedding dimensionality not specified.** The ADR assumes 1536-dimensional embeddings (OpenAI ada-002). If the application uses different dimensions (384 for lighter models, 4096+ for multi-modal), memory estimates and performance projections shift significantly [R19]. This must be confirmed before replica sizing. 2. **Query volume projections absent.** Current QPS requirements are not stated. The decision is sound for "moderate QPS" but the boundary between moderate and heavy is not defined. Internal benchmarks (precondition #3) should establish the QPS ceiling. 3. **Product roadmap not available.** Embedding growth rate, future latency requirements, and new vector-dependent features are unknown. Any significant product direction change should trigger re-evaluation. 4. **Benchmark freshness concern.** The pgvectorscale benchmark [R6] and the Supabase HNSW benchmark [R1] are from early-to-mid 2026. Dedicated vector databases are also improving. These comparisons may shift within 6–12 months. 5. **Cross-step tension (INFORMATIONAL).** The Requirements Analysis flagged "Vector dataset size will not exceed pgvector limitations" as a risky assumption, while the ADR treats the 20M threshold as adequate mitigation. The 20M number is a reasonable starting point but should be validated against internal benchmarks, not treated as a hard truth.
Overall confidence: **MEDIUM-HIGH** **Performance at current scale — High.** Multiple independent benchmarks confirm pgvector competitive at 1M–10M [R1][R3][R6][R7]. **Operational simplicity advantage — High.** Well-documented across sources; aligns with team constraints. **Cost advantage — High.** No additional service fees; incremental cost is replica sizing [R4][R12]. **Scaling ceiling accuracy — Medium.** 20M–50M inflection range is well-supported by research but not validated on actual hardware/data. **Long-term viability (>24 months) — Medium-Low.** Depends on embedding growth, product requirements, and market evolution — all unknowable today. **Compliance viability — Unknown.** Not yet audited — this is the primary risk to the entire decision. The decision is well-supported for the next 12–18 months given stated constraints. Beyond that horizon, confidence degrades without product roadmap data and compliance confirmation.
1. Conduct compliance audit on vector+relational data co-location — Platform Architect + Legal, Week 1–2 2. Obtain embedding growth projections from product team — Engineering Lead + Product Lead, Week 1–2 3. Confirm embedding dimensionality and calculate actual memory overhead — ML Engineer, Week 2 4. Provision read replica and run internal performance benchmarks — Engineering Lead, Week 3–4 5. Deploy monitoring dashboards and alerting — Platform Architect, Week 3–4 6. Audit application code for async embedding write patterns — Backend Engineer, Week 3 7. Implement parameterized query wrappers for pgvector operations; conduct security review — Backend Engineer, Week 4–5 8. Document pgvector tuning procedures and conduct first knowledge-transfer session — Engineering Lead, Week 5–6 9. Schedule Q4 2026 distributed SQL evaluation spike (CockroachDB/YugabyteDB) — Platform Architect, Week 6 10. Re-evaluate Prisma pgvector support status — Backend Engineer, Q3 2026
[R1] Vector Database Performance Compared: pgvector vs Pinecone vs Qdrant vs Weaviate. DEV Community. https://dev.to/kencho/vector-database-performance-compared-pgvector-vs-pinecone-vs-qdrant-vs-weaviate-2ne6. Source tier: T3. [R3] pgvector vs Dedicated Vector Databases: When PostgreSQL Is Enough. https://zenvanriel.com/ai-engineer-blog/pgvector-vs-dedicated-vector-db/. Source tier: T4. [R4] pgvector vs Vector Database: When to Use Each. https://postgresqlhtx.com/what-is-pgvector-and-when-you-should-use-it-instead-of-a-dedicated-vector-db/. Source tier: T4. [R5] The Case Against pgvector — Alex Jacobs. https://alex-jacobs.com/posts/the-case-against-pgvector/. Source tier: T4. [R6] PostgreSQL as a Vector Database: When to Use pgvector vs Pinecone vs Weaviate. DEV Community. https://dev.to/polliog/postgresql-as-a-vector-database-when-to-use-pgvector-vs-pinecone-vs-weaviate-4kfi. Source tier: T3. [R7] Best Vector Databases in 2026: A Complete Comparison Guide. Firecrawl. https://www.firecrawl.dev/blog/best-vector-databases. Source tier: T4. [R8] Pgvector vs. Qdrant. Tiger Data. https://www.tigerdata.com/blog/pgvector-vs-qdrant. Source tier: T4. [R10] Vector Search 2026: Azure AI Search vs pgvector vs Pinecone. Technspire. https://technspire.com/en/blog/vector-search-2026-azure-pgvector-managed. Source tier: T4. [R11] When Self Hosting Vector Databases Becomes Cheaper Than SaaS. OpenMetal. https://openmetal.io/resources/blog/when-self-hosting-vector-databases-becomes-cheaper-than-saas/. Source tier: T4. [R12] 4 Best Vector Databases for AI Apps (Tested). Mongoengine. https://mongoengine.org/vector-databases-for-ai-apps/. Source tier: T4. [R13] PostgreSQL as Vector database: Create LLM Apps with pgvector. Tessell. https://www.tessell.com/blogs/postgresql-as-vector-database-create-llm-apps-with-pgvector. Source tier: T4. [R17] pgvector/pgvector: Open-source vector similarity search for Postgres. GitHub. https://github.com/pgvector/pgvector. Source tier: T1. [R19] Outgrowing PostgreSQL: Why Distributed SQL Is the Only Way to Scale Vector Search Beyond 50M Embeddings. Medium. https://medium.com/@dikhyantkrishnadalai/outgrowing-postgresql-why-distributed-sql-is-the-only-way-to-scale-vector-search-beyond-50m-35a65dc795f8. Source tier: T3.
Consulting external sources for current information and best practices
Analyzing requirements to ensure complete coverage and identify gaps
Documenting architectural decisions with rationale
Identifying and validating implicit assumptions
Combining all findings into a unified deliverable
Reviewing for completeness, consistency, and accuracy from multiple angles