Vector Databases: Pinecone, Weaviate, Chroma, and When They Matter
Vector databases are the storage and retrieval layer for RAG systems and semantic search. They store embeddings — numerical representations of text — and find the ones most similar to a query. Every "chat with your data" product runs on one. The question most developers actually need answered is not "which vector database is best" but "do I need a vector database at all." For a surprising number of projects, the answer is no. For the projects where the answer is yes, the choice between Pinecone, Weaviate, Chroma, Qdrant, and pgvector depends on constraints that have nothing to do with benchmark performance and everything to do with your existing infrastructure, your team's tolerance for operational complexity, and how many documents you're actually working with.
What It Actually Does
A vector database stores vectors and performs similarity search on them. That's the job. You give it a collection of embeddings — each one a high-dimensional numerical array representing a piece of text, an image, or any other data you've embedded — and when you query it with a new vector, it returns the K most similar vectors in the collection. "Similar" is measured by distance metrics like cosine similarity or Euclidean distance. The database uses specialized indexing algorithms — HNSW, IVF, or variants — to do this quickly without comparing the query against every single vector in the collection. That indexing is the reason vector databases exist as a category. At small scale, brute-force comparison works fine. At millions of vectors, you need the index.
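The core operation is simple enough to sketch in plain Python. This is the brute-force version the paragraph describes, with tiny illustrative 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions); an HNSW or IVF index exists precisely to avoid the scan in `top_k`:

```python
import math

def cosine_similarity(a, b):
    # cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    # Exact search: compare the query against every stored vector.
    # This is what an index lets you skip at millions of vectors.
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Toy "embeddings" standing in for embedded text chunks.
store = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
print(top_k([1.0, 0.05, 0.0], store, k=2))  # → [0, 1]
```

Every product below wraps some optimized version of this loop; the differences are in indexing, filtering, and operations, not the math.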
Here's the landscape as of early 2026:
Pinecone is the fully managed option. You don't run any infrastructure. You get an API endpoint. You send vectors in, you query vectors out. Pinecone handles scaling, indexing, and availability. The developer experience is the simplest of any option — a few API calls and you're running. The tradeoff is cost and control. Pinecone's pricing is per-vector-stored plus per-query, and at scale — millions of vectors with frequent queries — the bills add up in ways that surprise people. You also can't tune the indexing algorithm, can't run it locally for development, and can't inspect what's happening under the hood when retrieval quality is bad.
Weaviate is the feature-rich option that can run managed or self-hosted. It supports hybrid search out of the box — combining vector similarity with keyword (BM25) search in a single query. It has built-in support for different data types, automatic vectorization through module integrations, and a GraphQL API that's powerful if you like GraphQL and annoying if you don't. Self-hosting Weaviate means running a Go binary that's reasonably well-behaved but requires you to think about memory, disk, and backup strategies. The managed Weaviate Cloud offering handles this for you at a premium.
Chroma is the lightweight, developer-friendly option that runs in-process. You pip install it, write a few lines of Python, and you have a working vector store. No separate server process. No infrastructure to manage. For prototyping and small-to-medium projects (under a few hundred thousand documents), Chroma is the fastest path from zero to working retrieval. The tradeoff: it's not designed for production workloads with millions of vectors and concurrent users. It's a development tool that can serve as a production tool for small deployments, not a production tool that happens to be easy to develop with.
Qdrant is the performance-focused option, written in Rust, with strong filtering capabilities. If your retrieval needs involve not just "find similar vectors" but "find similar vectors where the metadata matches these conditions" — which is common in production — Qdrant's filtering is among the best. It can run self-hosted or via their managed cloud. The Rust foundation gives it good memory efficiency and query latency at scale. The tradeoff is a smaller community and ecosystem compared to Pinecone or Weaviate, which means fewer tutorials, fewer integrations, and more time reading documentation when you hit an edge case.
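The filter-plus-similarity pattern is easy to illustrate in plain Python. A toy sketch with made-up metadata (a real engine applies the filter inside the index rather than as a linear pre-pass, which is where the engineering difficulty lives):

```python
def filtered_search(query, records, where, k=2):
    """records: list of (vector, metadata) pairs; `where` is a dict of
    required metadata key/value matches, applied before scoring."""
    def dot(a, b):  # toy similarity: dot product on unit-ish vectors
        return sum(x * y for x, y in zip(a, b))
    candidates = [
        (dot(query, vec), i)
        for i, (vec, meta) in enumerate(records)
        if all(meta.get(key) == val for key, val in where.items())
    ]
    candidates.sort(reverse=True)
    return [i for _, i in candidates[:k]]

records = [
    ([1.0, 0.0], {"lang": "en", "year": 2024}),
    ([0.9, 0.1], {"lang": "de", "year": 2024}),
    ([0.8, 0.2], {"lang": "en", "year": 2023}),
]
# Only English documents are eligible, however similar the others are.
print(filtered_search([1.0, 0.0], records, where={"lang": "en"}))  # → [0, 2]
```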
pgvector is the "do I really need another database" option. It's a Postgres extension that adds vector storage and similarity search to your existing Postgres database. If you're already running Postgres — and many applications are — pgvector means no new infrastructure, no new operational burden, and your vectors live alongside your relational data. You query them with SQL. Joins between vector results and your regular tables work natively. The tradeoff: pgvector's indexing and query performance at scale lag behind the dedicated solutions. For under 100K vectors, you won't notice. For millions of vectors with sub-100ms latency requirements, you will.
What The Demo Makes You Think
Every vector database demo shows the same thing: a few lines of code, some vectors go in, a query comes out, and the results are relevant. It looks trivial. The choice between providers looks like picking between identical cans of paint in slightly different colors.
Here's what the demos skip.
They don't show you the operational costs. Running a vector database in production means monitoring, backups, scaling, and debugging retrieval quality issues. Pinecone hides this behind a managed service — but you pay for that convenience, and when retrieval quality is bad, you can't inspect the index to understand why. Self-hosted Weaviate and Qdrant expose these controls but require someone to operate them. pgvector piggybacks on your existing Postgres operations — but if your Postgres is already under pressure, adding vector workloads makes it worse. The demo shows the API call. Production requires the infrastructure team.
They don't show you the cost curve. Pinecone's pricing looks reasonable at prototype scale — a few dollars a month for a starter tier. At production scale with millions of vectors, the monthly bill can run into the thousands of dollars, depending on index size and query volume. Self-hosted alternatives have lower per-vector costs but nonzero infrastructure costs (servers, storage, engineer time). pgvector has zero additional database cost but may require a larger Postgres instance. The total cost of ownership comparison depends on your scale and your team's ability to operate infrastructure, and no demo shows you that math.
They don't show you that retrieval quality is mostly not about the database. When your RAG system returns bad results, the instinct is to blame the vector database — switch to a different one, tune the index parameters, adjust the distance metric. In practice, bad retrieval is almost always caused by bad embeddings (wrong model for your domain), bad chunking (wrong chunk size for your content), or bad data (messy documents that embedded poorly). The vector database is faithfully returning the most similar vectors. The problem is that the most similar vectors don't contain the right information. Switching from Chroma to Pinecone doesn't fix a chunking problem. It just moves the chunking problem to more expensive infrastructure.
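Chunking is where most of that upstream effort goes. As a point of reference, here is the naive baseline most pipelines start from, fixed-size character chunks with overlap so a sentence that straddles a boundary survives intact in at least one chunk (the sizes are illustrative, not recommendations):

```python
def chunk(text, size=200, overlap=40):
    # Fixed-size character windows; each step advances by size - overlap,
    # so consecutive chunks share `overlap` characters at the seam.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(500))
pieces = chunk(text)
print(len(pieces), [len(p) for p in pieces])  # → 3 [200, 200, 180]
```

Whether 200 characters, 500 tokens, or semantic paragraph boundaries is right depends entirely on your content, and no database choice changes that.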
They don't show you the hybrid search gap. Pure vector similarity search has a known weakness: it can miss results where exact keyword matching would succeed, and it can return semantically similar but factually irrelevant results. "Apple revenue 2025" might return chunks about "Apple Watch features" because "Apple" is semantically close in both contexts. Hybrid search — combining vector similarity with traditional keyword matching — addresses this, but not all vector databases support it equally. Weaviate has it built in. Qdrant supports it. Pinecone supports it through sparse-dense vectors. pgvector requires combining it with Postgres's existing full-text search, which works but takes more setup. If your use case involves any mix of semantic and keyword queries — and most do — hybrid search support should be a selection criterion, not an afterthought.
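A common way to merge the two result lists is reciprocal rank fusion (RRF), which several of these engines use internally. A minimal sketch with hypothetical ranked document IDs; the constant 60 is the conventional RRF damping value:

```python
def rrf(rankings, k=60):
    # rankings: list of ranked ID lists, best first. Each document earns
    # 1 / (k + rank) per list it appears in; shared hits rise to the top.
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # semantic ranking
keyword_hits = ["doc1", "doc9", "doc3"]  # keyword (BM25) ranking
print(rrf([vector_hits, keyword_hits]))  # → ['doc1', 'doc3', 'doc9', 'doc7']
```

Note that "doc1" wins despite topping only one list: appearing high in both rankings beats appearing first in one, which is exactly the behavior hybrid search is after.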
What's Coming (And Whether To Wait)
The vector database market is consolidating and maturing simultaneously. Several trends are shaping where this goes.
Context windows are growing. Every time an LLM doubles its context window, the threshold for "you need a vector database" moves higher. Projects with under 100K tokens of total knowledge can now just paste everything into the prompt. This doesn't kill vector databases — large-scale retrieval still needs them — but it reduces the addressable market. The "I have 500 documents" use case that drove a lot of early adoption is increasingly served by long-context prompts without any retrieval layer.
Existing databases are adding vector support. Postgres has pgvector. MongoDB has Atlas Vector Search. Elasticsearch has dense vector search. Redis has vector similarity. The trend is clear: vector search is becoming a feature of databases you already use, not a reason to adopt a new one. For many teams, the right answer in 12 months will be "turn on vector search in whatever database you're already running" rather than "adopt a dedicated vector database." The dedicated vendors know this — it's why Pinecone, Weaviate, and Qdrant are all racing to add features that differentiate them beyond basic vector search: better filtering, built-in reranking, integrated embedding generation, and multi-modal support.
Should you wait? No, but you should start simple. If you're building a RAG system today, start with Chroma or pgvector. Get your pipeline working. Validate that your chunking and embedding strategy produces good retrieval. Only migrate to a heavier solution when you hit a specific limitation — latency, scale, filtering — that your current solution can't handle. The migration from Chroma to Pinecone or Weaviate is straightforward because the abstraction is simple: vectors go in, vectors come out. The time you spend optimizing your vector database selection before you have a working pipeline is time you should have spent optimizing your chunks.
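That "vectors go in, vectors come out" abstraction is small enough to put behind an interface from day one, which is what makes the later migration cheap. A sketch using only the standard library; the in-memory backend stands in for Chroma, and a hypothetical Pinecone- or Weaviate-backed class would implement the same two methods:

```python
from typing import Protocol

class VectorStore(Protocol):
    def upsert(self, ids: list[str], vectors: list[list[float]]) -> None: ...
    def query(self, vector: list[float], k: int) -> list[str]: ...

class InMemoryStore:
    """Toy backend: exact search over a dict. Swap it for a client-backed
    implementation only when you hit a concrete limitation."""
    def __init__(self):
        self._data: dict[str, list[float]] = {}

    def upsert(self, ids, vectors):
        self._data.update(zip(ids, vectors))

    def query(self, vector, k):
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        ranked = sorted(self._data, key=lambda i: dot(vector, self._data[i]),
                        reverse=True)
        return ranked[:k]

store: VectorStore = InMemoryStore()
store.upsert(["a", "b"], [[1.0, 0.0], [0.0, 1.0]])
print(store.query([0.9, 0.1], k=1))  # → ['a']
```

The application code above never names a vendor; migrating means writing one new class, not rewriting the pipeline.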
The Verdict
Here's the honest recommendation, broken down by project size:
Under 10K documents: Use Chroma for Python projects or pgvector if you're already on Postgres. You do not need a managed vector database. You might not need a vector database at all — check whether your documents fit in a long-context prompt first.
10K to 100K documents: Chroma or pgvector still works. If you need hybrid search, look at Weaviate. If you need strong filtering on metadata, look at Qdrant. Pinecone is fine if you'd rather pay money than spend engineer time on infrastructure.
100K to 1M+ documents: This is where the dedicated solutions earn their keep. Pinecone if you want fully managed and can absorb the cost. Weaviate or Qdrant if you want control and your team can operate infrastructure. pgvector starts to struggle with query latency at this scale unless you're careful about index configuration.
The meta-recommendation: Most projects that think they need Pinecone actually need Chroma with better chunks. Most projects that think they need a vector database actually need a longer context window and a well-structured prompt. The vector database is a solved problem at every scale. The unsolved problem is everything upstream of it — document parsing, chunking strategy, embedding model selection, and retrieval tuning. Spend your time there first. The database will be the easiest part.
This is part of CustomClanker's Search & RAG series — reality checks on AI knowledge tools.