Going to production
The defaults are for development
Out of the box, mcp-flowgate uses an in-memory store and no audit sink. That’s fine for trying things out. It’s not fine for production, because a restart erases all workflow state and you have no record of what happened.
Here’s what to change before you go live.
Durable storage
The default memory store loses everything on restart. Switch to a durable backend:

    # SQLite -- good for single-node deployments
    store:
      kind: sqlite
      path: /var/lib/mcp-flowgate/workflows.db

    # Postgres -- good for multi-node / HA setups
    store:
      kind: postgres

SQLite is the simplest path. One file, no extra infrastructure, handles thousands of concurrent workflows. Use Postgres when you need multiple gateway instances sharing state, or when your ops team already runs Postgres and you want everything in one place.
Audit trail
By default, audit events go nowhere. You want them going somewhere:

    audit:
      sink: file
      path: /var/log/mcp-flowgate/audit.jsonl

Every workflow start, transition, executor call, guard evaluation, and error gets written as a structured JSON line. Each line is a self-contained event with a timestamp, workflow ID, event type, and relevant details.
Set up log rotation (logrotate, systemd journal, or your preferred tool) so the file doesn’t grow unbounded. The JSON lines format plays well with any log aggregation system — pipe it to your observability stack and you get full visibility into what every model did, when, and with what inputs.
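As a rough sketch of consuming that file, the Python below parses JSON lines and groups events by workflow so you can reconstruct one workflow's history. The key names `timestamp`, `workflow_id`, and `type` are assumptions made for illustration; check the real audit output for the actual keys.

```python
import json
from collections import defaultdict


def group_events_by_workflow(lines):
    """Parse JSON-lines audit records and group them by workflow ID.

    The "workflow_id" key is an assumed name, not a documented one.
    Blank or malformed lines are skipped instead of raising.
    """
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written line during rotation
        if not isinstance(event, dict):
            continue
        grouped[event.get("workflow_id")].append(event)
    return dict(grouped)


# In-memory sample; in practice pass an open file handle instead.
sample = [
    '{"timestamp": "2024-01-01T00:00:00Z", "workflow_id": "wf-1", "type": "workflow.started"}',
    '{"timestamp": "2024-01-01T00:00:02Z", "workflow_id": "wf-1", "type": "executor.failed"}',
    "not json",
    '{"timestamp": "2024-01-01T00:00:03Z", "workflow_id": "wf-2", "type": "workflow.started"}',
]
history = group_events_by_workflow(sample)
```

Because each line is self-contained, the same function works on a rotated file or on a stream of lines from a log shipper.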
Validate config in CI
Don’t find out your config is broken when you deploy it. Run the checker in your CI pipeline:

    mcp-flowgate check --config gateway.yaml

This catches problems that YAML syntax checking misses:
- Dangling targets — a transition points to a state that doesn’t exist.
- Unreachable states — a state that no transition ever leads to.
- Dead-ends — a non-terminal state with no outbound transitions.
- Schema issues — malformed input schemas, invalid executor references.
It’s fast and deterministic. Add it right next to your linter.
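To make the first three checks concrete, here is a minimal sketch of what they mean on a toy state graph. It is not how `mcp-flowgate check` is implemented; the function name, the input shape, and the state names are all invented for illustration.

```python
def check_state_graph(states, transitions, initial, terminal):
    """Report structural problems in a toy workflow graph.

    states:      set of declared state names
    transitions: list of (source, target) pairs
    initial:     the start state (allowed to have no inbound transition)
    terminal:    set of states allowed to have no outbound transition
    """
    targets = {dst for _, dst in transitions}
    sources = {src for src, _ in transitions}
    return {
        # A transition points to a state that was never declared.
        "dangling": sorted(targets - states),
        # A declared state that no transition ever leads to.
        "unreachable": sorted(states - targets - {initial}),
        # A non-terminal state with no outbound transitions.
        "dead_ends": sorted(states - sources - terminal),
    }


states = {"draft", "review", "approved", "orphan", "stuck"}
transitions = [
    ("draft", "review"),
    ("review", "approved"),
    ("review", "stuck"),
    ("draft", "archived"),  # "archived" is not declared above
]
problems = check_state_graph(
    states, transitions, initial="draft", terminal={"approved"}
)
```

Here `archived` is a dangling target, `orphan` is unreachable, and both `orphan` and `stuck` are dead-ends. None of these would be flagged by a YAML syntax check.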
Schema-aware editing
Point your editor’s YAML language server at the config schema for autocomplete and inline validation:

    {
      "yaml.schemas": {
        "./schemas/gateway-config.schema.json": "gateway.yaml"
      }
    }

You get red squiggles for typos, autocomplete for field names, and documentation on hover. Catches mistakes before you even save the file.
Hot reload
You don’t need to restart the gateway to pick up config changes. Send SIGHUP:

    kill -HUP $(pgrep mcp-flowgate)

Definitions, executors, connections, and the discovery index all rebuild and swap atomically. In-flight workflows keep running on their current definitions. A config.reloaded audit event confirms the reload happened. See the hot reload guide for details.
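The build-then-swap behaviour can be pictured with a small sketch. This is a generic illustration of the pattern, assuming a hypothetical `ConfigHolder` class; it says nothing about how the gateway is actually implemented.

```python
class ConfigHolder:
    """Keeps the active config and replaces it in one assignment."""

    def __init__(self, loader):
        self._loader = loader    # callable that builds a fresh config
        self._active = loader()

    def current(self):
        # Callers hold on to the object they were given, so work that is
        # already in flight keeps using the definitions it started with.
        return self._active

    def reload(self, signum=None, frame=None):
        # Build the new config completely first. If the loader raises,
        # the old config stays active because nothing was swapped yet.
        fresh = self._loader()
        self._active = fresh     # the swap is a single assignment


# Hypothetical loader that yields a new config version on each call.
versions = iter([{"version": 1}, {"version": 2}])
holder = ConfigHolder(lambda: next(versions))

in_flight = holder.current()  # a workflow picks up version 1
holder.reload()               # what a SIGHUP handler would trigger
after_reload = holder.current()

# On Unix, the handler could be registered from the main thread with:
#   signal.signal(signal.SIGHUP, holder.reload)
```

The point of the pattern is that readers never see a half-built config: they get either the old object or the new one.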
Multi-tenancy
mcp-flowgate runs as a single-user, same-trust-boundary system. If every model connecting to your gateway is operating on behalf of the same user or within the same trust boundary, you’re good to go.
For cross-trust-boundary deployments — where different users or teams need isolation — put an identity proxy in front of the gateway. Envoy, OAuth2-proxy, or any reverse proxy that injects identity headers will work. The gateway sees the identity from headers and can scope workflows accordingly.
Don’t skip this if you have untrusted callers. The gateway itself doesn’t authenticate requests.
High availability
For HA deployments:
- Use the Postgres store so all instances share workflow state.
- Put a load balancer in front of multiple gateway instances.
- Any instance can serve any request — there’s no sticky session requirement because all state lives in Postgres.
Scaling is straightforward: add more gateway instances behind the load balancer. The Postgres store handles concurrent access.
Monitoring
Every audit event is structured JSON. This is your monitoring surface. Pipe audit output to your observability stack and you get:
- Workflow throughput: count workflow.started events per time window.
- Error rates: count executor.failed events, break down by workflow and transition.
- Latency: measure time between workflow.started and terminal state events.
- Guard rejections: track guard.rejected events to see what’s being blocked and why.
- Human approval queues: monitor approval.pending events to catch bottlenecks.
You don’t need a custom metrics integration. The audit stream already has everything. Build dashboards from the JSON lines the same way you would from any structured log source.
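A minimal sketch of deriving those numbers from parsed audit events follows. The key names (`timestamp`, `workflow_id`, `type`) and the terminal event name `workflow.completed` are assumptions for illustration; substitute whatever your audit stream actually emits.

```python
from collections import Counter
from datetime import datetime


def parse_ts(value):
    # Assumes ISO 8601 timestamps; a trailing "Z" is mapped to +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(events, terminal_types):
    """Count key event types and compute per-workflow latency in seconds."""
    counts = Counter(e["type"] for e in events)
    started, latencies = {}, {}
    for e in events:
        wf = e["workflow_id"]
        if e["type"] == "workflow.started":
            started[wf] = parse_ts(e["timestamp"])
        elif e["type"] in terminal_types and wf in started:
            elapsed = parse_ts(e["timestamp"]) - started[wf]
            latencies[wf] = elapsed.total_seconds()
    return {
        "started": counts["workflow.started"],
        "executor_failed": counts["executor.failed"],
        "guard_rejected": counts["guard.rejected"],
        "approval_pending": counts["approval.pending"],
        "latency_seconds": latencies,
    }


events = [
    {"timestamp": "2024-01-01T00:00:00Z", "workflow_id": "wf-1", "type": "workflow.started"},
    {"timestamp": "2024-01-01T00:00:01Z", "workflow_id": "wf-1", "type": "executor.failed"},
    {"timestamp": "2024-01-01T00:00:05Z", "workflow_id": "wf-1", "type": "workflow.completed"},
    {"timestamp": "2024-01-01T00:00:10Z", "workflow_id": "wf-2", "type": "workflow.started"},
    {"timestamp": "2024-01-01T00:00:11Z", "workflow_id": "wf-2", "type": "guard.rejected"},
]
# "workflow.completed" is a hypothetical terminal event name.
summary = summarize(events, terminal_types={"workflow.completed"})
```

In a real deployment the same aggregation would usually live in the log platform's query language rather than a script, but the logic is the same: filter by event type, then group by workflow.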