Credential vault and secret management
Connectors need real secrets — Azure client secrets, GCP service-account keys, GitHub tokens. AiSOC stores them with a defense-in-depth design: encrypted at the application layer, with the database as a transport medium, not a trust boundary.
This page documents the model, how to operate it, and what changes when you move from local dev to production.
At a glance
| Property | Value |
|---|---|
| Cipher | Fernet (AES-128-CBC + HMAC-SHA256, RFC-aligned) |
| Library | cryptography.fernet from the cryptography package |
| Storage | connector_instances.auth_config JSONB column, encrypted-at-write |
| Master key | AISOC_CREDENTIAL_KEY environment variable, 32 url-safe base64 bytes |
| Implementation | services/api/app/security/credential_vault.py |
| Plaintext exposure | Only inside the connector microservice process at fetch time |
Why application-layer encryption
We deliberately do not rely on the database's own at-rest encryption (e.g. AWS RDS STORAGE_ENCRYPTED, Postgres pgcrypto) as the only line of defense. Those mechanisms protect against disk theft but not against a compromised read-only DB role, a leaky backup, or a misconfigured replica.
Fernet at the API layer means:
- A leaked
pg_dumpis useless withoutAISOC_CREDENTIAL_KEY. - A read-only support role can see that a connector exists (its name, type, last-sync time) but cannot read the secret payload.
- Rotation is a key-swap operation, not a database migration.
How a credential moves through the system
Browser API service Connector microservice
------ -------------- ---------------------
POST /connectors → CredentialVault.encrypt() → (stored as ciphertext in DB)
{ auth_config: {…} } fernet(key, plaintext)
│
▼
connector_instances.auth_config Scheduler tick:
(BYTEA / JSONB ciphertext) CredentialVault.decrypt()
→ connector.fetch_alerts()
→ IngestClient.post()
(plaintext drops out of scope)
Plaintext exists only inside services/connectors for the duration of one fetch. It is never returned in any HTTP response — the API redacts secret-typed fields when reading instances back to the UI.
Generating the master key
In development, the .env.example file ships a placeholder; real keys are generated locally:
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
This produces a 44-character url-safe base64 string. Set it in the environment of every service that needs to encrypt or decrypt:
export AISOC_CREDENTIAL_KEY="kY7w…=" # never commit this value
The API service writes encrypted blobs; the connector microservice decrypts them at fetch time. Both services must share the same key value or polling will fail with InvalidToken.
Production deployment
Where to put the key
| Platform | Recommended location |
|---|---|
| Kubernetes | A Secret named aisoc-credential-vault, mounted as env into both aisoc-api and aisoc-connectors Deployments |
| Fly.io | fly secrets set AISOC_CREDENTIAL_KEY=… in both apps |
| AWS ECS / Fargate | An SSM Parameter referenced via secrets: in the task definition |
| Docker Compose (single-host) | A .env file outside the repo, mounted into both services |
Do not bake the key into a Docker image, a public Helm chart, or a Terraform tfvars file checked into git.
Sealing it further (optional, recommended)
For higher-assurance deployments, treat AISOC_CREDENTIAL_KEY as a long-lived encryption key and protect it with an external KMS. Two patterns work:
- KMS-wrapped key: store an AWS KMS / GCP KMS / Vault Transit-encrypted blob in the secret manager. A small init-container decrypts it at pod start and writes the plaintext key to a tmpfs the application can read.
- External KMS for envelope encryption: replace
CredentialVaultwith a wrapper that delegatesencrypt()/decrypt()to a KMS Encrypt/Decrypt API call. This keeps the master key off your fleet entirely. A reference implementation will land in a follow-on release; theCredentialVaultinterface is intentionally narrow to make this swap a single-class change.
Key rotation
Rotation is supported via Fernet's MultiFernet primitive: the new key encrypts new writes; the old key decrypts old data; old data is rewritten on next update. The procedure:
-
Generate a new key with the same one-liner above.
-
Set both keys in the environment as a comma-separated list:
AISOC_CREDENTIAL_KEY="<NEW>,<OLD>"The first entry is used for new encryption; subsequent entries are decrypt-only.
-
Roll the API and connector services. Existing connector instances continue to work because the old key is still in the decrypt set.
-
Run the rotation job (or simply edit each connector instance once in the UI) to force re-encryption with the new key.
-
After confirming all instances have been re-written with the new key, drop the old entry:
AISOC_CREDENTIAL_KEY="<NEW>"
A scheduled rotation cadence of 90 days is a reasonable default; treat any suspected key compromise as an emergency requiring immediate rotation and invalidation of all upstream credentials (Azure secrets, GCP keys, GitHub tokens).
Per-connector secret rotation
Independent of the master key, the upstream credentials inside each connector instance should be rotated on the cadence of your identity provider's policy (typically 90–180 days for OAuth client secrets and PATs). The AiSOC UI supports in-place editing: the Configure action on a connector card opens the schema-driven form pre-populated with masked values, lets you paste a new secret, and re-encrypts on save. No restart, no scheduler downtime.
What lives in the vault, and what does not
In the vault:
client_secret,api_token,service_account_json, and any field markedsecret: truein a connector'sschema().
Not in the vault (stored as plaintext JSON for operational visibility):
tenant_id,client_id,account_id,organization,project_id, and any field markedsecret: false.
This split is deliberate. Identifiers are useful for support and observability (logs, dashboards, error messages); secrets must never appear in either.
Per-tenant LLM credentials (BYOK)
In addition to connector secrets, the same vault encrypts per-tenant LLM credentials. This is the substrate for "bring your own key" (BYOK): each tenant can point AiSOC at its own OpenAI account, Azure OpenAI deployment, Anthropic key, or self-hosted OpenAI-compatible gateway (Ollama / vLLM / LiteLLM) without leaking that key to other tenants on the same control plane.
Storage
Migration 038_tenant_llm_credentials.sql creates one row per tenant
in tenant_llm_credentials:
| Column | Notes |
|---|---|
tenant_id (PK) | Foreign key to tenants.id; one BYOK row per tenant |
provider | openai, anthropic, azure-openai, openai-compatible (CHECK constraint) |
base_url | Required for openai-compatible, optional for hosted providers |
model | Optional override; falls back to env if blank |
api_key_vault | The vault-encrypted token (vault:v1:…); never returned to the UI |
settings | Free-form JSONB for provider-specific knobs (e.g. Azure api_version) |
enabled | Operators can pause BYOK without deleting the row |
created_at / updated_at / last_rotated_at | Audit timestamps |
The table has Row-Level Security bound to the same app.tenant_id
GUC as every other tenant-scoped table; an authenticated user can
only read or write their own tenant's row.
API surface
Three endpoints under /api/v1/llm/credentials, gated by the
settings:read and settings:write RBAC permissions:
GET /api/v1/llm/credentials # returns LlmCredentialView | null
PUT /api/v1/llm/credentials # upsert; encrypts api_key on the way in
DELETE /api/v1/llm/credentials # remove the row entirely
The LlmCredentialView projection deliberately exposes only
has_api_key: bool — never the plaintext key, never the ciphertext.
Rotation is "send a new api_key"; updating other fields without
sending api_key keeps the existing ciphertext in place and bumps
last_rotated_at only when a new key was actually written.
Provider invariants are enforced server-side:
openai-compatiblerows require abase_url(we won't silently default toapi.openai.comfor self-hosted gateways).- Hosted providers (
openai,anthropic,azure-openai) require anapi_keyon first write; subsequent writes can omit it to update onlybase_url/model/settings.
Audit
Every mutation emits a structured audit log via emit_audit:
audit.llm_credential.created tenant_id=… actor_user_id=… provider=…
audit.llm_credential.updated tenant_id=… actor_user_id=… provider=… rotated=true|false
audit.llm_credential.deleted tenant_id=… actor_user_id=… provider=…
The plaintext key is never logged.
Read path in the agents service
services/agents/app/security/llm_resolver.py:resolve_llm_config
is the single source of truth for "what config does this request
actually use?". It:
- Reads the env baseline (
OPENAI_*/LLM_*/AISOC_LLM_MODEL). - Looks up the tenant's row via a vendored read-path
CredentialVault(seeservices/agents/app/security/credential_vault.py), layering tenant-supplied fields over env values. - Reports the resolved
source(tenant | environment | mixed | none) so/api/v1/llm/statusand the Settings UI can show the buyer where each field came from. - Applies the air-gap rule (
AISOC_AIRGAPPED=trueblocksapi.openai.comeven with a valid tenant key, but allows private gateways likelitellm.internal:4000).
The agents-side vault is read-only. If AISOC_CREDENTIAL_KEY is
missing on the agents box, resolve_llm_config degrades gracefully
to the env baseline rather than refusing to serve.
UI
The Settings → "Deployment & AI" panel
(apps/web/src/components/settings/SettingsView.tsx) exposes the BYOK
form to users with settings:write. Read-only viewers see the same
provenance badges (provider, model, has_api_key, last rotated,
resolved source) without the form controls.
Hosted OAuth roadmap
Several connectors (Azure, GCP, Google Workspace) currently require the operator to do an Azure-AD-app or service-account dance before they can be added in AiSOC. This is fine for SOC engineers but a friction point for everyone else. Hosted OAuth is the planned solution:
- Phase 1 (current): connector schemas already declare OAuth metadata (
authorize_url,token_url,scopes,supported_in_hosted). The UI shows an "OAuth coming soon" badge for capable connectors. - Phase 2 (planned): AiSOC Cloud (and self-hosted with an OAuth-app config) will surface a one-click flow that exchanges an authorization code for an offline refresh token, encrypts it via the vault, and stores it as the connector's
auth_config. - Phase 3 (planned): scoped tokens for self-hosted air-gapped installs that cannot reach an OAuth provider — operators ship a refresh-token bundle out-of-band.
The CredentialVault and connector schema model are stable shapes. Adding hosted OAuth does not require any change to existing connector implementations.
Auditing
Every encrypt and decrypt call emits a structured log line through structlog:
event=credential.encrypt connector_id=<uuid> tenant_id=<uuid> bytes=412
event=credential.decrypt connector_id=<uuid> tenant_id=<uuid> caller=scheduler
The plaintext is never logged. Pair this with your existing log pipeline (Elastic, Datadog, Splunk, Loki) to alert on unexpected decrypt patterns — e.g. a decrypt outside the scheduler context, or a sudden burst of decrypts from a single tenant.
Threat model summary
| Threat | Mitigation |
|---|---|
| Stolen DB backup | Useless without AISOC_CREDENTIAL_KEY |
| Read-only DB role compromised | Cannot read secret payloads (only metadata) |
| Plaintext secret in API logs | Vault never logs plaintext; secret fields are filtered before request logging |
| Plaintext in HTTP response to UI | API redacts secret-typed fields on read |
| Compromised connector pod | Plaintext exposed only in-memory for the duration of one fetch |
| Lost master key | Catastrophic — stored credentials are unrecoverable. This is the intended failure mode. Recovery path: rotate every upstream credential and reconfigure each connector instance. |
| Compromised master key | Rotate via MultiFernet (see above), then rotate every upstream credential as a precaution |
Related
- Connectors overview — how the vault fits into the broader connector lifecycle.
- Environment Variables — full env reference including
AISOC_CREDENTIAL_KEYand related settings.