Principles

The honest version.

Privacy architecture

Each engagement runs inside a chamber resolved for a single House. Documents, prompts, and outputs do not leave that chamber. The chamber row carries a flag, training_eligible, that the database forces to false: a check constraint and a trigger refuse any write that sets it true, including from the service role. The privacy commitment is structural, not just contractual.
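A minimal sketch of how a flag can be pinned to false at the database, assuming an in-memory SQLite database as a stand-in for the production one. The table layout, the trigger name, and the helper try_enable_training are hypothetical; only the training_eligible column comes from the text above, and the service-role aspect is not modelled.

```python
import sqlite3

# Assumption: SQLite stands in for the production database; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chamber (
    id INTEGER PRIMARY KEY,
    house_id TEXT NOT NULL,
    -- The check constraint rejects any row whose flag is not false (0).
    training_eligible INTEGER NOT NULL DEFAULT 0 CHECK (training_eligible = 0)
);
-- The trigger is a second, independent guard on updates to the flag.
CREATE TRIGGER chamber_training_guard
BEFORE UPDATE OF training_eligible ON chamber
WHEN NEW.training_eligible <> 0
BEGIN
    SELECT RAISE(ABORT, 'training_eligible must remain false');
END;
""")
conn.execute("INSERT INTO chamber (house_id) VALUES ('house-a')")


def try_enable_training(connection):
    """Return True if the write was accepted, False if the database refused it."""
    try:
        connection.execute("UPDATE chamber SET training_eligible = 1")
        return True
    except sqlite3.DatabaseError:
        return False
```

Calling try_enable_training(conn) returns False here, and the stored flag stays at 0. The two guards overlap on purpose: either one alone would refuse the write.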

Tenant data is isolated by row-level security at the database, with an explicit ownership check in every authenticated route. Source documents live in a private bucket, scoped per House by the storage policy: the first path segment must equal the caller’s House id, and the database enforces the match. Reads are served as signed URLs that expire after sixty seconds, and every download verifies the stored SHA-256 hash before the bytes are returned.
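The two checks described above can be sketched as follows. This is an illustration under stated assumptions, not the production code: the function names path_owned_by and verified_bytes and the example paths are hypothetical, and in the system described the path rule is enforced by the storage policy rather than in application code.

```python
import hashlib
import hmac


def path_owned_by(object_path, house_id):
    """True when the first path segment equals the caller's House id exactly."""
    first_segment = object_path.split("/", 1)[0]
    return first_segment == house_id


def verified_bytes(data, stored_sha256_hex):
    """Return the bytes only if their SHA-256 digest matches the stored hash."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, stored_sha256_hex.lower()):
        raise ValueError("stored SHA-256 does not match downloaded bytes")
    return data
```

Comparing the whole first segment, rather than testing a string prefix, means a path such as house-a-other/file.pdf does not pass as belonging to house-a.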

The default chamber routes inference to Anthropic under a zero-retention enterprise agreement. A self-hosted chamber on the House’s own hardware is a customer-operated deployment shape — distributed as a signed container image, with inference against an open-weight model (Llama 3.x or Mistral-class) over an OpenAI-compatible endpoint. The chamber adapter for that path refuses any code path that would reach the Anthropic SDK or AWS, by construction.

Chamber · single-tenant

Documents → chamber → memorandum · audit beneath

Schematic · v 2026

Inference: Anthropic · ZDR

Keys: Per-House DEK

Audit: Append-only · 7Y

Residency: US · UK · EU default

Tenant-scoped chamber · zero-retention inference

Closed by construction

Documents are encrypted at rest under AES-256 and in transit under TLS 1.3. Connector credentials are encrypted with AES-256-GCM under a separate application-layer key. Data residency is a tenant election: United States, United Kingdom, or the European Union by default; Switzerland and Singapore are available on request and provisioned per House. Document retention is configurable and defaults to keeping documents until you delete them. The audit log is append-only at the database — UPDATE and DELETE are revoked from the application role and a trigger raises an exception on either — partitioned monthly and archivable to an evidence bucket under object lock, with the retention term set in the data-processing agreement.
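A sketch of the trigger half of an append-only table, again assuming SQLite as a stand-in. The audit_log schema, the trigger names, and both helpers are hypothetical. SQLite has no roles, so the revocation of UPDATE and DELETE from the application role is not modelled here.

```python
import sqlite3


def open_audit_db():
    """Create an in-memory audit table whose rows can be inserted but never changed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE audit_log (
        id INTEGER PRIMARY KEY,
        house_id TEXT NOT NULL,
        event TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
    BEGIN
        SELECT RAISE(ABORT, 'audit_log is append-only');
    END;
    CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
    BEGIN
        SELECT RAISE(ABORT, 'audit_log is append-only');
    END;
    """)
    return conn


def is_refused(conn, statement):
    """True when the database raises on the statement instead of applying it."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.DatabaseError:
        return True
```

With one row inserted, is_refused(db, "UPDATE audit_log SET event = 'edited'") and is_refused(db, "DELETE FROM audit_log") both return True, and the row count is unchanged.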

We do not say “air-gapped” or “one-hundred-percent private” as shorthand. Privacy on this system is configurable, and the configuration matters. If the shape of an engagement makes the default unsuitable, we say so — and arrange the upgrade — before an agreement is signed.

Privacy posture

Per-House cryptography · single-tenant

v 2026 · §appendix

Key	Kind	Scope
DEK	AES-256-GCM	Per-House
KEK	Envelope · wraps DEK	Per-House
HMAC	SHA-256	Per-House
Provider	ZDR enterprise	Inference
Residency	US · UK · EU default	CH · SG on request

Single-tenant cryptography · per-House isolation

Active · revocable per-House

Security posture

Authentication is magic-link by email, sent through Resend on our own domain. Passwords are not supported. Single-factor social sign-in is not supported. Session cookies are SameSite-Lax. SSO via OIDC or SAML is available on request, configured against the House’s identity provider before the first sign-in.

Service-role access — the credential that bypasses row-level security — is reserved to a small list of paths: webhook ingestion, the scheduled monthly cron, signed-token reads on the share page, the scanner callback, the audit-log writer, and the connector envelope helpers. Every other request runs as the authenticated session, with an explicit House ownership check layered on top of RLS for defence in depth.

Five document formats are accepted: PDF, DOCX, XLSX, PPTX, and EML. The allow-list is enforced at the upload route by both MIME type and extension, with magic-byte sniffing as the final arbiter. Every source passes through a content scanner before extraction. The scanner checks for the EICAR test signature, matches the file against a hash denylist, and flags macro-bearing Office files when the House has not opted in to macros. A failure quarantines the source, writes a source.quarantined audit event, and stops the engagement before any model sees the bytes.
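The three-part allow-list can be sketched as below. The ALLOWED table and the function is_accepted are hypothetical names. The MIME types and signatures are the standard ones for these formats: PDF files begin with %PDF-, and the three Office formats are ZIP containers. EML has no fixed signature, so the plain-text heuristic used for it here is purely an assumption for illustration.

```python
import os

ZIP_MAGIC = b"PK\x03\x04"  # DOCX, XLSX and PPTX are ZIP containers

# extension -> (expected MIME type, required leading bytes or None)
ALLOWED = {
    ".pdf": ("application/pdf", b"%PDF-"),
    ".docx": ("application/vnd.openxmlformats-officedocument.wordprocessingml.document", ZIP_MAGIC),
    ".xlsx": ("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", ZIP_MAGIC),
    ".pptx": ("application/vnd.openxmlformats-officedocument.presentationml.presentation", ZIP_MAGIC),
    ".eml": ("message/rfc822", None),
}


def is_accepted(filename, mime_type, head):
    """Apply extension, MIME type, then magic bytes; the sniff has the final say."""
    ext = os.path.splitext(filename)[1].lower()
    rule = ALLOWED.get(ext)
    if rule is None:
        return False
    expected_mime, magic = rule
    if mime_type != expected_mime:
        return False
    if magic is None:
        # Assumed heuristic for EML: no NUL bytes and a header-like first line.
        first_line = head.split(b"\n", 1)[0]
        return b"\x00" not in head and b":" in first_line
    return head.startswith(magic)
```

A file named deck.pdf with the right MIME type but an executable's leading bytes is rejected at the last step, which is the case the magic-byte check exists for.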

A memorandum cites at the span level. Every claim points to a source-chunk identifier. The validator refuses to lock a draft with uncited claims and replaces uncited fields with a refusal sentinel; the PDF exporter and the share route refuse to render a memorandum that has not validated. Share links are thirty-two-byte cryptographically random tokens, set to expire after thirty days, revocable at any time, and indistinguishable from an unknown link once expired or revoked. Per-House model spend is capped at a monthly budget set in the contract; the cap is enforced atomically by a database function after every model call, and partner alerts fire at eighty per cent and one hundred per cent of budget.
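The share-link behaviour can be sketched as follows, assuming an in-memory store. The ShareLinks class and its method names are hypothetical, and storing only a hash of each token is an added design assumption that the text does not state. The key property is that resolve returns the same result for unknown, expired, and revoked tokens.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

SHARE_TTL = timedelta(days=30)


class ShareLinks:
    def __init__(self):
        # Assumption: only a SHA-256 of each token is kept, never the token itself.
        self._records = {}

    @staticmethod
    def _digest(token):
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, memo_id, now):
        token = secrets.token_urlsafe(32)  # 32 bytes from the OS CSPRNG
        self._records[self._digest(token)] = {
            "memo_id": memo_id,
            "expires_at": now + SHARE_TTL,
            "revoked": False,
        }
        return token

    def revoke(self, token):
        record = self._records.get(self._digest(token))
        if record is not None:
            record["revoked"] = True

    def resolve(self, token, now):
        """Return the memo id, or None for unknown, expired and revoked alike."""
        record = self._records.get(self._digest(token))
        if record is None or record["revoked"] or now >= record["expires_at"]:
            return None
        return record["memo_id"]
```

Because every failure returns None, a route built on resolve can answer all three cases with one identical response.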

The chamber boundary encrypts each review envelope with the House's own data-encryption key, and the audit trail records every read and lock against the engagement, on a table the application role cannot update or delete.

Our SOC 2 Type I observation period began on 1 March 2026; the audit firm is being engaged, and the name will appear in the security questionnaire pack we share under NDA. Type II follows on the standard six-month cadence after Type I issues. ISO 27001 is on the roadmap. The first independent third-party penetration test is scheduled for the second quarter of 2026; the attestation will be available on request once issued. We do not claim a certification before it has been issued.

A subprocessor list is published at /principles/subprocessors and reviewed quarterly; we give every active House thirty days’ notice before a new subprocessor gains access. A data-processing agreement is available on request before any document is uploaded; the template we sign is published for review at /principles/dpa, and the master services agreement is at /principles/msa. Redlines are expected — both documents are accepted as a starting point.

Conduct

Material non-public information is treated with a presumption of care. Today that care is procedural: we do not accept engagements that are public-company-adjacent — work that would turn on material non-public information about a publicly listed issuer — and a partner screens for this at intake. A compliance role on the House gives a designated member read-only access to the full audit trail across the tenant — every read, every write, every model call, every share-link issuance.

Conflicts are checked manually before each engagement. Any existing position the House declares is noted in the engagement file; any related engagement the practice holds elsewhere is disclosed before the work begins. If a conflict cannot be cleared, we decline the work and explain why. We do not charge for the time taken to clear a conflict. Expert calls, where they are used, are sourced through the House’s own counsel or through a network the House has pre-cleared. We do not retain an expert network of our own. Transcripts, when they exist, are treated as source documents and cited accordingly.

Diligence OS is not a registered investment adviser. We do not make investment recommendations. A memorandum is a record of documented findings against a structured template; the decision rests with the House and its committee. The chamber answers questions inside an engagement, with citation, or it refuses — there is no free-form chat without provenance. And documents are never used to train any model, by us or by anyone we work with: the contract with our model provider says so, and the database refuses to mark a chamber otherwise.

Measurement

We do not headline accuracy, an “evaluated” badge, or a single composite score. Every number we publish names the metric, the sealed fixture it ran against, the chamber type — CI canned or real nightly — and the as-of date. The figures below describe test datarooms we control; they are not a guarantee that a live customer data room will score the same way.

The harness runs in three layers. On every pull request, a canned chamber gate exercises the production extractor and chunker against seven sealed fixtures and enforces citation and structural completeness — plumbing validation, not analyst-grade prose. At seven UTC each night, the same fixtures run through a real chamber that calls the model the way a customer engagement does; that run catches model drift the cheap gate cannot see. A third layer, prod-shadow retrieval, compares in-process fixture loading to live hybrid search; it is operator-triggered and, as of 2026-06-04, still failing its parity gate — retrieval tuning is in progress.

The philosophy behind the harness — why we tune refusal score over summariser score, and what happens when the model rotates — is written in The intelligence is the workflow. This section holds the scorecard; the letter holds the argument.

Real-chamber nightly · as of 2026-06-04

The percentages are the latest nightly capture on a sealed fixture. The release bar beside them is what we use internally to decide whether a workflow is stable enough to treat regressions as blocking — two consecutive real-chamber runs must clear the floors for that tier. For a principal, it is the honest maturity read: which workflow types we will stand behind on test datarooms during beta, and which are still under active tuning. It is not a warranty on your engagement.

Workflow	Citation	Planted recall	Truth recall	Release bar
Manager DD	44.4%	50%	53.1%	In progress
Direct IDD	100%	100%	74%	M3 met
ODD	100%	100%	78.7%	M3 met
Tax brief	100%	91.7%–100%	98.3%–100%	M1 met
Co-invest	100%	91.7%	81.7%	Nightly met
Secondary	100%	100%	98.3%	Nightly met
Governance	100%	100%	98.3%–100%	M1 met
Risk analysis	100%	100%	100%	M3 met

Release bar labels

In progress: The latest nightly run missed at least one floor for this workflow (citation, planted recall, or truth recall). We publish the observed scores but do not treat the workflow as stable until two consecutive runs clear the bar.
M1 met: Tier-A workflows (tax brief, secondary, governance): truth recall at or above ninety-five per cent, citation at or above ninety-nine per cent, planted recall at or above eighty per cent — on two consecutive nightly runs. Prose quality (grader score) may still sit below our internal polish target.
M3 met: Tier-B workflows (ODD, direct IDD, risk analysis): truth recall at or above seventy per cent, citation at or above ninety-nine per cent, planted recall at or above eighty per cent — on two consecutive nightly runs. The truth floor is lower because the sealed answer keys are larger; citation and planted floors are not.
Nightly met: The workflow clears the default nightly regression floors on observed runs (citation, planted, truth, and counterfactual rejection). Formal milestone labels differ by tier; this badge means the numbers above passed the standard nightly gate when captured.
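The tier floors and the two-consecutive-runs rule above can be expressed as a small check. The floor values are taken from the M1 and M3 definitions; the names Run, FLOORS, clears, and bar_met are hypothetical, and this is a sketch of the rule as stated rather than the harness itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    citation: float  # per cent
    planted: float   # per cent
    truth: float     # per cent


# Floors from the label definitions: Tier A is the M1 bar, Tier B is the M3 bar.
FLOORS = {
    "A": Run(citation=99.0, planted=80.0, truth=95.0),
    "B": Run(citation=99.0, planted=80.0, truth=70.0),
}


def clears(run, tier):
    """True when a single run meets every floor for the tier."""
    floor = FLOORS[tier]
    return (
        run.citation >= floor.citation
        and run.planted >= floor.planted
        and run.truth >= floor.truth
    )


def bar_met(runs, tier):
    """True when the two most recent runs both clear the tier floors."""
    return len(runs) >= 2 and all(clears(run, tier) for run in runs[-2:])
```

As a usage example, the Direct IDD row in the table (100, 100, 74) clears the Tier-B floors but would miss the Tier-A truth floor of ninety-five per cent. A single passing run is never enough for bar_met to return True.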

Citation coverage: Share of factual claims in the draft that point to a verifiable source span — not document-level attribution.
Structural completeness: Whether the memorandum contains every section the workflow template requires.
Planted finding recall: Share of deliberately seeded inconsistencies in the sealed dataroom that appear in the draft.
Truth claim recall: Share of atomic facts in the sealed answer key the memorandum surfaces — paraphrase-tolerant, not verbatim copy.
Counterfactual rejection: Share of false claims the draft correctly did not state.
Grader score: Comparison of the draft to a human gold memorandum on a 1–5 rubric — observational; not a customer-facing guarantee.
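Two of the definitions above reduce to simple ratios and can be sketched as below. The data shapes are assumptions: a claim is taken to be a dict with an optional source_chunk_id, and planted findings are taken to be identified by ids. Truth recall is left out because its paraphrase-tolerant matching needs a matcher that this sketch does not attempt.

```python
def share(numerator, denominator):
    """Percentage rounded to one decimal place, or None when there is nothing to count."""
    if denominator == 0:
        return None
    return round(100.0 * numerator / denominator, 1)


def citation_coverage(claims):
    """Share of claims that carry a source-chunk identifier (span-level citation)."""
    cited = sum(1 for claim in claims if claim.get("source_chunk_id"))
    return share(cited, len(claims))


def planted_recall(planted_ids, surfaced_ids):
    """Share of seeded inconsistencies that appear in the draft."""
    return share(len(set(planted_ids) & set(surfaced_ids)), len(set(planted_ids)))
```

Returning None for an empty denominator keeps a fixture with no planted findings from being reported as either a perfect or a zero score.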

Top-K retrieval recall is not published here. The in-process eval loader bypasses live search, so the metric scores null on nightly runs until chunk-id mapping for prod-shadow completes.

Fixtures are sealed — SHA-verified bundles under tests/evals/fixtures/; integrity tests run in CI (verify:fixture:*, parity tier gate).
Canned CI vs real nightly — the canned chamber emits schema-shaped responses; planted and truth scores on canned runs validate plumbing only (CANNED_BASELINES does not enforce planted or truth recall).
Prod-shadow retrieval — as of 2026-05-24, all three ADR 0062 workflows fail the five-point truth and citation parity gate under live retrieval (k=24). Retrieval parity is in progress.
Not a customer guarantee — a principal’s live engagement depends on document quality, completeness, MNPI routing, and reviewer sign-off. The scorecard measures regression on fixed fixtures.
Grader variance — several M1 workflows pass truth, citation, and planted floors but sit below the 3.5 grader floor; prose quality is tracked separately and is not headline-worthy.

Questions that deserve a written answer go to partners@diligence-os.com. Introductions for new engagements go through Request an introduction.

You may also write to us — the note reaches a partner directly.