03 — Providers

DATA MODEL
PROVIDERS

PolyDB ships 15 providers callable today (the 16th, Transaction, is coming soon) covering every major data paradigm. Each provider exposes a consistent MCP tool interface and runs identically across all three storage backends. Mix and match freely — they share one connection. SQL operations get full PostgreSQL ACID; cross-model atomic transactions are coming soon.

Jump To Provider

SQL · Document · Key-Value · Vector · Graph · Stream · Spatial · Time Series · Analytics · S3 · Memory · Temporal · Full-Text · Blob · Iceberg · Transaction

1. SQL — Relational

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The SQL provider exposes a full relational interface: schema creation, DML, and parameterised queries. It sits directly on top of the backend's native SQL engine, so you get dialect-correct DDL and full JOIN support. Use it whenever your data has a clear schema and you need the expressiveness of SQL — aggregations, window functions, complex predicates.

Key operations

query_sql

query_sql is read-only — it executes parameterised SELECT, WITH (CTEs), EXPLAIN, and SHOW statements against the tenant schema. Write operations (INSERT, UPDATE, DELETE) are rejected; use the dedicated provider tools instead (e.g. store_document, set_keyvalue, store_vector). DDL (CREATE TABLE, etc.) is issued via the backend's query-builder AST, not as raw SQL strings.
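
As a rough illustration of the read-only rule, a guard could inspect the leading keyword of each statement. This is a minimal sketch under stated assumptions, not PolyDB's actual check: `is_read_only` and `ALLOWED_PREFIXES` are hypothetical names, and a production guard would need a real SQL parser (PostgreSQL, for example, allows data-modifying statements inside WITH).

```python
import re

# Statement types the text lists as accepted by query_sql.
ALLOWED_PREFIXES = {"SELECT", "WITH", "EXPLAIN", "SHOW"}


def is_read_only(statement: str) -> bool:
    """Return True if the first keyword is one of the read-only statement types."""
    # Skip leading whitespace and "--" line comments before reading the keyword.
    stripped = re.sub(r"^\s*(--[^\n]*\n\s*)*", "", statement)
    match = re.match(r"[A-Za-z]+", stripped)
    return bool(match) and match.group(0).upper() in ALLOWED_PREFIXES


accepted = is_read_only("SELECT name FROM users WHERE id = ?")
rejected = is_read_only("DELETE FROM users WHERE id = ?")
```

Here `accepted` is True and `rejected` is False; a keyword check like this is only a first line of defence.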

When to use vs alternatives

Prefer SQL when your data is structured and schema-stable. Reach for Document when fields vary per record or evolve frequently. Use Analytics instead of raw SQL GROUP BY when you need multi-dimensional OLAP slicing.

2. Document — NoSQL JSON

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Document provider stores arbitrary JSON documents in named collections, with MongoDB-compatible filter operators ($eq, $gt, $lt, $in, $regex). Documents are indexed by a generated _id and stored in the backend's native JSONB column, so filter queries are backend-accelerated. No schema declaration is needed — just insert and query.

Key operations

store_document
search_documents
get_document
delete_document
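
To make the filter operators concrete, the sketch below evaluates a MongoDB-style query against plain Python dictionaries. It is illustrative only: `matches` and `OPERATORS` are hypothetical names, the evaluation happens in memory rather than in a JSONB column, and only the five operators named above are covered.

```python
import re
from typing import Any

# The comparison operators named in the text, as plain predicates.
OPERATORS = {
    "$eq": lambda value, arg: value == arg,
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
    "$regex": lambda value, arg: isinstance(value, str) and re.search(arg, value) is not None,
}


def matches(document: dict[str, Any], query: dict[str, Any]) -> bool:
    """Return True if every field condition in the query holds for the document."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 30, "$lt": 50}}.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Shorthand form, e.g. {"city": "Oslo"}, means equality.
            return False
    return True


docs = [
    {"_id": "a", "name": "Ada", "age": 36, "city": "London"},
    {"_id": "b", "name": "Bo", "age": 24, "city": "Oslo"},
]
query = {"age": {"$gt": 30}, "name": {"$regex": "^A"}}
hits = [d["_id"] for d in docs if matches(d, query)]
```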

When to use vs alternatives

Use Document for semi-structured or heterogeneous data — user profiles, event payloads, product catalogues with varying attributes. If the shape is uniform and query patterns are clear, SQL will give better query performance. For full-text search over document fields, combine with Full-Text.

3. Key-Value — Redis-style cache

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Key-Value provider offers a Redis-style set/get/delete/exists interface with optional namespacing and per-key TTL expiry. Keys are arbitrary strings; values are JSON-serialisable objects. Expired keys are lazily purged on access. Use it for caching computed results, storing session tokens, feature flags, or any data with a natural lifetime.

Key operations

set_keyvalue
get_keyvalue
delete_keyvalue
list_keyvalues
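
Lazy purging means an expired key stays in storage until the next access notices it. The following in-memory sketch shows that behaviour; `TTLStore` is a hypothetical class, not the provider's implementation, and the injectable clock exists only so the example can advance time deterministically.

```python
import time
from typing import Any, Callable, Optional


class TTLStore:
    """In-memory key-value store with per-key expiry and lazy purging."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: dict[str, tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl_seconds: Optional[float] = None) -> None:
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._items[key] = (value, expires_at)

    def get(self, key: str) -> Any:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Lazy purge: the expired key is removed only when it is touched.
            del self._items[key]
            return None
        return value


now = [0.0]  # fake clock, advanced by hand
store = TTLStore(clock=lambda: now[0])
store.set("session:42", {"user": "ada"}, ttl_seconds=30)
fresh = store.get("session:42")    # still within its lifetime
now[0] = 31.0
expired = store.get("session:42")  # past the TTL, purged on this access
```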

When to use vs alternatives

Choose Key-Value when lookup is always by exact key and TTL matters. For richer querying (find by value, range scans), use Document. For time-series metrics, use Time Series instead.

4. Vector — AI embeddings & similarity search

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Vector provider stores high-dimensional embedding vectors using a per-backend strategy: an in-process FAISS index on the embedded engines (SQLite, Stoolap), native pgvector with an HNSW index on Neon, and pgvector with an IVFFlat index on CockroachDB. It supports search by cosine similarity or L2 distance, metadata filtering, and batch upsert. Vectors are identified by a vector_id string and carry a JSON metadata payload. The provider is designed for RAG pipelines, semantic search, recommendation engines, and any workflow that involves embedding models.

Key operations

store_vector
search_vectors
delete_vector
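
For intuition, cosine-similarity search can be written as a brute-force scan in a few lines. This sketch assumes a tiny in-memory index and hypothetical function names; the backends described above use FAISS or pgvector indexes rather than a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: dict[str, list[float]], query: list[float], top_k: int = 2):
    """Score every stored vector against the query and keep the best top_k."""
    scored = [(vector_id, cosine_similarity(vec, query)) for vector_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


index = {"doc-1": [1.0, 0.0], "doc-2": [0.7, 0.7], "doc-3": [0.0, 1.0]}
results = search(index, [1.0, 0.1])
```

With this toy data the query points almost along the first axis, so `doc-1` ranks first and `doc-2` second.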

When to use vs alternatives

Vector is the right choice whenever you need nearest-neighbour retrieval over embedding space. For structured lookup by ID, use Key-Value. Combine with Memory when building LLM agent memory that requires semantic recall.

5. Graph — Nodes, edges & traversal

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Graph provider exposes a property-graph model powered by NetworkX. Nodes carry typed properties; edges carry directional relationships with their own properties. Traversal operations support BFS/DFS, shortest-path, neighbour enumeration, and subgraph extraction. Graph state is persisted to the backing store between sessions.

Key operations

add_graph_node
add_graph_edge
query_graph
delete_graph_node
delete_graph_edge

Traversal and shortest-path queries are expressed through query_graph (supports neighbour enumeration, BFS/DFS, path queries).
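
The shortest-path idea behind such queries can be sketched with a breadth-first search over an adjacency mapping. The example below is plain Python with hypothetical names, written without NetworkX to stay dependency-free, and it treats every edge as unweighted and directed.

```python
from collections import deque
from typing import Optional


def shortest_path(edges: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search over directed edges; returns a node list or None."""
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start, then reverse it.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in edges.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# A small dependency graph: "app" depends on "api" and "ui", and so on.
edges = {"app": ["api", "ui"], "api": ["db"], "ui": ["api"], "db": []}
path = shortest_path(edges, "app", "db")
```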

When to use vs alternatives

Use Graph for relationship-heavy data — social networks, knowledge graphs, dependency trees, access control hierarchies. For simpler parent–child relationships that only need depth-1 traversal, a Document with embedded references is lighter. For recommendation graphs that rely on vector similarity, combine Graph with Vector.

6. Stream — Event streams

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Stream provider implements an append-only event log with named streams. Producers publish JSON event payloads; consumers read from a stream with a configurable limit and optional offset cursor. Events carry monotonically increasing sequence numbers and wall-clock timestamps. Use it for activity feeds, audit logs, change data capture, and lightweight pub/sub within a single PolyDB instance.

Key operations

publish_stream
consume_stream
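
The append-only log with an offset cursor can be pictured as follows. `EventLog` is a hypothetical in-memory stand-in, and the assumption that the cursor is simply the last sequence number seen is ours, not something the provider documents.

```python
import time
from typing import Any


class EventLog:
    """Named append-only streams with per-stream sequence numbers."""

    def __init__(self) -> None:
        self._streams: dict[str, list[dict[str, Any]]] = {}

    def publish(self, stream: str, payload: dict[str, Any]) -> int:
        events = self._streams.setdefault(stream, [])
        sequence = len(events) + 1  # monotonically increasing within a stream
        events.append({"seq": sequence, "ts": time.time(), "payload": payload})
        return sequence

    def consume(self, stream: str, limit: int = 10, offset: int = 0) -> list[dict[str, Any]]:
        """Return up to `limit` events whose sequence number is greater than `offset`."""
        events = self._streams.get(stream, [])
        return events[offset:offset + limit]


log = EventLog()
for action in ("login", "view", "logout"):
    log.publish("activity", {"action": action})

first = log.consume("activity", limit=2)
# Resume from the last sequence number already processed.
rest = log.consume("activity", limit=2, offset=first[-1]["seq"])
```

Because nothing is deleted on read, a consumer can replay a stream from the beginning by passing an offset of 0.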

When to use vs alternatives

Choose Stream for ordered, time-sequenced events where consumers need replay. For durable task queues with acknowledgement, use Document with a status field. For metrics derived from events, pipe into Time Series.

7. Spatial — Geospatial data

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Spatial provider stores geometric shapes as WKT or GeoJSON with arbitrary attribute payloads. Queries include bounding-box intersection, radius/nearby search (metres), and contains/intersects predicates. Geometry operations are handled by Shapely for correctness; results are returned with the original WKT and all attributes. Suitable for store locators, delivery zones, asset tracking, and environmental datasets.

Key operations

store_spatial
search_spatial_nearby
search_spatial_bbox
delete_spatial
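
A radius search over points reduces to a great-circle distance test. The sketch below uses the haversine formula on latitude/longitude pairs; it is a simplified illustration with hypothetical names and sample coordinates, and it does not use Shapely or handle non-point geometries.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(points: dict[str, tuple[float, float]], lat: float, lon: float, radius_m: float) -> list[str]:
    """Names of all points within radius_m metres of (lat, lon)."""
    return [
        name
        for name, (plat, plon) in points.items()
        if haversine_m(lat, lon, plat, plon) <= radius_m
    ]


stores = {
    "central": (51.5074, -0.1278),
    "east": (51.5155, -0.0722),
    "paris": (48.8566, 2.3522),
}
within_5km = nearby(stores, 51.5074, -0.1278, 5_000)
```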

When to use vs alternatives

Use Spatial when geometry queries are a first-class concern (radius search, polygon intersection). For simple lat/lng storage without geometric operations, a Document with a coordinates field is sufficient. Combine with Time Series for moving-asset tracking.

8. Time Series — Metrics & temporal data

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Time Series provider stores named numeric metrics with ISO 8601 timestamps and free-form tag dictionaries. Range queries return raw data points or aggregated series (avg, sum, min, max, count) with optional downsampling intervals. Tags enable multi-dimensional filtering — for example, querying CPU usage by host and region simultaneously.

Key operations

store_timeseries
query_timeseries
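
Downsampling groups raw points into fixed-width time buckets and aggregates each bucket. Here is a minimal sketch for the avg case, assuming timezone-aware ISO 8601 timestamps; `downsample_avg` is a hypothetical helper, not a PolyDB function.

```python
from collections import defaultdict
from datetime import datetime, timezone


def downsample_avg(points: list[tuple[str, float]], interval_seconds: int) -> list[tuple[str, float]]:
    """Group (iso_timestamp, value) pairs into fixed buckets and average each one."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for iso_ts, value in points:
        epoch = datetime.fromisoformat(iso_ts).timestamp()
        bucket_start = int(epoch // interval_seconds) * interval_seconds
        buckets[bucket_start].append(value)
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc).isoformat(), sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    ]


points = [
    ("2024-01-01T00:00:10+00:00", 40.0),
    ("2024-01-01T00:00:50+00:00", 60.0),
    ("2024-01-01T00:01:20+00:00", 90.0),
]
series = downsample_avg(points, 60)
```

The first two points share the 00:00 bucket and average to 50.0; the third sits alone in the 00:01 bucket.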

When to use vs alternatives

Use Time Series for any numeric metric that needs time-range queries and aggregation — infrastructure monitoring, IoT sensor data, business KPIs. For events with rich payloads (non-numeric), use Stream. For multidimensional BI-style roll-ups, use Analytics.

9. Analytics — OLAP cubes

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Analytics provider builds OLAP-style cubes over fact tables with named dimensions and measures. Cube definitions declare the fact source and how dimensions map to columns; query operations slice and dice using one or more dimensions and aggregate measures with SUM, AVG, COUNT, MIN, or MAX. Results are returned as tabular JSON ready for charting libraries.

Key operations

create_analytics_cube
query_analytics
delete_analytics_cube
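
One way to picture a cube definition is as a mapping from dimension and measure names to fact-table columns, which a query then turns into a GROUP BY. The sketch below runs against an in-memory SQLite table; the `cube` layout and `query_cube` helper are assumptions made for illustration and do not reflect the arguments of create_analytics_cube or query_analytics.

```python
import sqlite3

# Hypothetical cube definition: names on the left, fact-table columns on the right.
cube = {
    "fact_table": "sales",
    "dimensions": {"region": "region", "quarter": "quarter"},
    "measures": {"revenue": ("SUM", "amount")},
}


def query_cube(conn: sqlite3.Connection, cube: dict, dimensions: list[str], measure: str) -> list[dict]:
    """Aggregate one measure, sliced by the requested dimensions."""
    # Identifiers come from the trusted cube definition, never from user input;
    # an unknown dimension or measure raises KeyError.
    cols = [cube["dimensions"][d] for d in dimensions]
    func, col = cube["measures"][measure]
    select = ", ".join(cols + [f"{func}({col}) AS {measure}"])
    group = ", ".join(cols)
    sql = f"SELECT {select} FROM {cube['fact_table']} GROUP BY {group} ORDER BY {group}"
    return [dict(zip(dimensions + [measure], row)) for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", "Q1", 100.0), ("EU", "Q1", 50.0), ("EU", "Q2", 70.0), ("US", "Q1", 200.0)],
)
by_region = query_cube(conn, cube, ["region"], "revenue")
by_region_quarter = query_cube(conn, cube, ["region", "quarter"], "revenue")
```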

When to use vs alternatives

Use Analytics when you need pre-modelled dimensional aggregation — revenue by region and quarter, churn by cohort. For ad-hoc SQL aggregation, use the SQL provider directly. For numeric metric time-ranges, use Time Series.

10. S3 — Object storage

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The S3 provider exposes a bucket/key object storage API, mirroring the AWS S3 interface. Start by creating a bucket with create_bucket, then write objects with put_s3_object. Objects are stored with arbitrary metadata and MIME type. Operations cover bucket lifecycle (create, delete, list) plus object put, get, delete, and list with prefix filtering.

Key operations

create_bucket
delete_bucket
list_buckets
put_s3_object
get_s3_object
list_s3_objects
delete_s3_object

When to use vs alternatives

Use S3 for large, opaque objects identified by a key — uploaded files, rendered reports, ML model artefacts. For small binary blobs that need metadata queries, use Blob. For structured data at scale, use Iceberg.

11. Memory — LLM agent memory

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Memory provider implements a structured memory store for LLM agents, automatically classifying each interaction into episodic (specific events), semantic (general facts), or procedural (how-to knowledge) memory types. It tracks session context, supports semantic recall via embedding similarity, and provides recency-weighted retrieval so agents can surface the most relevant context for a given query.

Key operations

store_memory
recall_memory
delete_memory
store_knowledge
search_knowledge
delete_knowledge
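
Recency-weighted retrieval can be approximated by discounting each memory's similarity score by its age. The formula below (similarity multiplied by an exponential half-life decay) is our assumption for illustration; the provider's actual weighting is not documented here, and all names and numbers are hypothetical.

```python
def recency_weighted(memories: list[dict], now: float, half_life_s: float = 3600.0, top_k: int = 1) -> list[dict]:
    """Rank memories by similarity, halving the weight every half_life_s seconds of age."""

    def score(memory: dict) -> float:
        age = now - memory["stored_at"]
        return memory["similarity"] * 0.5 ** (age / half_life_s)

    return sorted(memories, key=score, reverse=True)[:top_k]


memories = [
    # A near-exact match stored two hours ago.
    {"id": "old-exact", "similarity": 0.9, "stored_at": 0.0},
    # A slightly weaker match stored a few minutes ago.
    {"id": "recent-close", "similarity": 0.8, "stored_at": 7000.0},
]
best = recency_weighted(memories, now=7200.0)
```

With a one-hour half-life the older memory's weight has decayed to a quarter, so the recent one wins despite its lower raw similarity.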

When to use vs alternatives

Use Memory for any AI agent that needs persistent cross-session recall. For a simple conversation log without classification, use Document. For semantic similarity retrieval without session structure, use Vector directly.

12. Temporal — Versioned data & as-of queries

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Temporal provider stores every write as a new immutable version rather than overwriting in place. Any past state can be retrieved with an as-of timestamp query. The complete version history for any entity is available as an ordered list. This enables full audit trails, configuration rollback, and compliance with data retention requirements — without any application-level versioning logic.

Key operations

store_temporal
query_temporal_at
query_temporal_history
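
An as-of lookup amounts to finding the latest version written at or before the requested timestamp. The sketch below keeps versions in a sorted list and uses binary search; `VersionedEntity` is a hypothetical in-memory model that assumes writes arrive in timestamp order.

```python
from bisect import bisect_right
from typing import Optional


class VersionedEntity:
    """Every write is appended as a new version; nothing is overwritten."""

    def __init__(self) -> None:
        self._timestamps: list[float] = []
        self._values: list[dict] = []

    def store(self, timestamp: float, value: dict) -> None:
        # Assumes timestamps are non-decreasing, so appending keeps the list sorted.
        self._timestamps.append(timestamp)
        self._values.append(value)

    def as_of(self, timestamp: float) -> Optional[dict]:
        """Latest version written at or before `timestamp`, or None if none existed yet."""
        idx = bisect_right(self._timestamps, timestamp)
        return self._values[idx - 1] if idx else None

    def history(self) -> list[tuple[float, dict]]:
        return list(zip(self._timestamps, self._values))


price = VersionedEntity()
price.store(100.0, {"price": 10})
price.store(200.0, {"price": 12})

before_change = price.as_of(150.0)   # the first version was current at t=150
before_creation = price.as_of(50.0)  # nothing had been written yet
```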

When to use vs alternatives

Use Temporal for any data where "what did this look like at time T?" is a valid query — configuration, pricing, access policies, compliance records. For event streams where ordering matters more than entity identity, use Stream. Iceberg also provides time-travel but at the table level rather than per-entity.

13. Full-Text — FTS5/BM25 search

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Full-Text provider builds inverted indexes using FTS5 (SQLite) or tsvector (PostgreSQL/CockroachDB) and ranks results with BM25 relevance scoring. Documents are indexed with an ID and one or more text fields. Queries support phrase matching, boolean operators, and prefix search. Highlights with match snippets are optionally returned alongside results.

Key operations

index_fulltext
search_fulltext
delete_fulltext
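
To show what BM25 ranking rewards, the sketch below scores a handful of documents in pure Python. It is a simplified Okapi BM25 with whitespace tokenisation and commonly used default parameters (k1 = 1.5, b = 0.75), which are assumptions here; the provider relies on the backend's own index and ranking rather than code like this.

```python
import math
from collections import Counter


def bm25_rank(docs: dict[str, str], query: str, k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs for matching documents, best first."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    avg_len = sum(len(words) for words in tokens.values()) / len(tokens)
    scores = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        score = 0.0
        for term in query.lower().split():
            # Rare terms get a higher inverse document frequency.
            df = sum(1 for other in tokens.values() if term in other)
            idf = math.log(1 + (len(tokens) - df + 0.5) / (df + 0.5))
            tf = counts[term]
            # Term frequency saturates with k1; b normalises for document length.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(words) / avg_len))
        scores[doc_id] = score
    matching = [(doc_id, s) for doc_id, s in scores.items() if s > 0]
    return sorted(matching, key=lambda pair: pair[1], reverse=True)


tickets = {
    "t1": "password reset email not arriving",
    "t2": "reset button missing on settings page",
    "t3": "invoice totals look wrong",
}
ranked = bm25_rank(tickets, "password reset")
```

The ticket containing both query terms ranks above the one containing only "reset", and the unrelated ticket is dropped.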

When to use vs alternatives

Use Full-Text for keyword and phrase search over human-readable content — articles, support tickets, product descriptions. For semantic/concept search (find documents by meaning not exact words), use Vector. Combine both for a hybrid search pipeline.

14. Blob — Binary large objects

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Blob provider stores arbitrary binary data alongside a structured metadata envelope (MIME type, size, checksums, custom tags). Unlike the S3 provider, Blob is optimised for smaller objects that need queryable metadata — find all images tagged with a given label, or list all PDFs uploaded by a user. Binary content is base64-encoded in transit and stored as BYTEA or BLOB natively.

Key operations

store_blob
get_blob
delete_blob
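
The metadata envelope and base64 transport described above might look roughly like this. The field names and the two helpers are hypothetical; the sketch only shows how binary content, size, and a checksum can travel together in a JSON-friendly structure.

```python
import base64
import hashlib


def encode_blob(data: bytes, mime_type: str, tags: list[str]) -> dict:
    """Wrap raw bytes in a JSON-serialisable envelope with queryable metadata."""
    return {
        "content_base64": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "tags": tags,
    }


def decode_blob(envelope: dict) -> bytes:
    """Recover the raw bytes and verify them against the stored checksum."""
    data = base64.b64decode(envelope["content_base64"])
    if hashlib.sha256(data).hexdigest() != envelope["sha256"]:
        raise ValueError("checksum mismatch")
    return data


envelope = encode_blob(b"\x89PNG", "image/png", ["logo"])
round_trip = decode_blob(envelope)
```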

When to use vs alternatives

Use Blob when you need both binary storage and metadata queries on the same object. For large, opaque files where key-based access is sufficient, use S3. For storing generated text artefacts, Document is lighter weight.

15. Iceberg — Apache Iceberg tables

OSS: SQLite, Stoolap, CockroachDB · Cloud: Neon (CockroachDB on Large)

The Iceberg provider exposes Apache Iceberg table semantics: schema evolution, snapshot-based time-travel, partition pruning, and metadata-layer management. Tables accumulate snapshots on each write; any previous snapshot can be queried as a read-only view. Schema changes (add/drop/rename column) are tracked in the metadata layer without rewriting data files, making it suitable for long-lived analytical tables in a data lake architecture.

Key operations

create_iceberg_table
append_iceberg
get_iceberg_snapshot_as_of
add_iceberg_column
expire_iceberg_snapshots

When to use vs alternatives

Use Iceberg for large analytical datasets that need schema evolution without downtime and point-in-time queries over the whole table. For per-entity version history, use Temporal. For OLAP aggregations over existing tables, use Analytics.

16. Transaction — Cross-model ACID

Coming soon

Cross-model ACID transactions, which wrap operations from SQL, Document, Key-Value, Vector, and other providers in a single commit-or-rollback boundary, are under active development. Today, each query_sql call runs as a single-statement SQL transaction (and, as noted under the SQL provider, accepts read-only statements only). Full multi-provider transaction MCP tools will ship in an upcoming release.
