Email Intake

The email intake pipeline monitors a Microsoft 365 shared mailbox, downloads attachments, extracts insurance application fields using heuristic pattern matching, and creates submission records in the underwriting pipeline. The system runs on a 15-minute cron schedule and delegates AI-powered extraction to a Durable Object agent.

Architecture

Microsoft 365 Shared Mailbox
           │
           │ (Graph API poll, every 15 min)
           ▼
┌──────────────────────┐
│ sweepMailboxIngest   │ ◄── Cron trigger on oi-sys-api Worker
│ (mailbox-ingest.ts)  │
└──────────┬───────────┘
           │
     ┌─────┴──────────────────────┐
     │                            │
     ▼                            ▼
┌──────────┐           ┌─────────────────────┐
│ R2 Docs  │           │  EmailIntakeAgent   │
│ (attach) │           │  (Durable Object)   │
└──────────┘           └──────────┬──────────┘
                                  │
                         ┌────────┴────────┐
                         │                 │
                         ▼                 ▼
                   ┌───────────┐   ┌────────────────┐
                   │submissions│   │SubmissionAgent │
                   │   (DB)    │   │  (AI extract)  │
                   └───────────┘   └────────────────┘

Cron Schedule

The mailbox ingest runs on the */15 * * * * cron schedule (every 15 minutes). It is a no-op when:

  • MAILBOX_INGEST_ENABLED is not set to "true"
  • The Microsoft Graph credentials are not configured (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET)
  • GRAPH_SHARED_MAILBOX is not set

Each run processes up to 25 unread messages from the shared mailbox.
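The no-op conditions above can be sketched as a single guard. This is a minimal illustration, not the actual implementation: the `MailboxIngestEnv` interface, the `shouldRunMailboxIngest` function, and the `MAX_MESSAGES_PER_RUN` constant are hypothetical names; only the environment variable names and the limit of 25 come from this page.

```typescript
// Hypothetical shape of the Worker environment; only the variable names come from the docs.
interface MailboxIngestEnv {
  MAILBOX_INGEST_ENABLED?: string;
  AZURE_TENANT_ID?: string;
  AZURE_CLIENT_ID?: string;
  AZURE_CLIENT_SECRET?: string;
  GRAPH_SHARED_MAILBOX?: string;
}

// Upper bound on unread messages handled per run (hypothetical constant name).
const MAX_MESSAGES_PER_RUN = 25;

// Returns true only when every precondition for polling is met.
function shouldRunMailboxIngest(env: MailboxIngestEnv): boolean {
  // The flag must be exactly the string "true".
  if (env.MAILBOX_INGEST_ENABLED !== "true") return false;

  // All three Graph credentials must be present and non-empty.
  const hasGraphCredentials =
    Boolean(env.AZURE_TENANT_ID) &&
    Boolean(env.AZURE_CLIENT_ID) &&
    Boolean(env.AZURE_CLIENT_SECRET);
  if (!hasGraphCredentials) return false;

  // A target mailbox must be configured.
  return Boolean(env.GRAPH_SHARED_MAILBOX);
}
```

A cron handler would presumably call a guard like this first and return immediately when it yields false.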

Configuration

The following environment variables control the email intake pipeline:

| Variable | Description |
| --- | --- |
| MAILBOX_INGEST_ENABLED | Set to "true" to enable polling |
| GRAPH_SHARED_MAILBOX | Email address of the shared mailbox (e.g., submissions@) |
| AZURE_TENANT_ID | Microsoft Entra (Azure AD) tenant ID |
| AZURE_CLIENT_ID | App registration client ID |
| AZURE_CLIENT_SECRET | App registration client secret |
| GRAPH_MAILBOX_DEFAULT_ORG_ID | Fallback org ID when no per-mailbox mapping exists |

Organization routing is resolved via KV lookup (mailbox-org:{email}) with fallback to GRAPH_MAILBOX_DEFAULT_ORG_ID and then to the literal string "default".
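The fallback chain can be sketched as follows. The `OrgKvReader` interface is a minimal structural stand-in for a Workers KV namespace, and `resolveOrgId` is a hypothetical name; whether the real lookup normalizes the email address (for example by lowercasing it) is not stated here, so this sketch uses it as given.

```typescript
// Minimal stand-in for a KV namespace; only get() is needed for this sketch.
interface OrgKvReader {
  get(key: string): Promise<string | null>;
}

// Resolves the org for a mailbox: KV mapping first, then the configured default,
// then the literal string "default".
async function resolveOrgId(
  kv: OrgKvReader,
  mailbox: string,
  defaultOrgId?: string,
): Promise<string> {
  const mapped = await kv.get(`mailbox-org:${mailbox}`);
  if (mapped) return mapped;
  if (defaultOrgId) return defaultOrgId;
  return "default";
}
```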

Processing Pipeline

1. Fetch Unread Messages

The cron job uses @openinsure/notify's createGraphMailboxReader to authenticate with Microsoft Graph and fetch up to 25 unread messages from the configured shared mailbox.
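The internals of createGraphMailboxReader are not documented on this page. As a rough illustration of what fetching unread mail from a shared mailbox can look like at the HTTP level, the sketch below builds a Microsoft Graph request URL. The function name `unreadMessagesUrl` and the choice of the inbox folder are assumptions; the reader may use a different folder, filter, or field selection.

```typescript
const GRAPH_BASE = "https://graph.microsoft.com/v1.0";

// Builds a Graph URL listing unread inbox messages for a mailbox.
// Assumption: the inbox folder is the one being polled.
function unreadMessagesUrl(mailbox: string, top = 25): string {
  const filter = encodeURIComponent("isRead eq false");
  const user = encodeURIComponent(mailbox);
  return `${GRAPH_BASE}/users/${user}/mailFolders/inbox/messages?$filter=${filter}&$top=${top}`;
}
```

The request would be sent with a bearer token obtained through the app registration configured above.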

2. Deduplication

Each message is deduplicated using its RFC 2822 internetMessageId (the Message-ID header), which is stable across folder moves. The Exchange internal message ID changes when messages are moved between folders, so the internet message ID is the reliable dedup key.

Dedup keys are stored in Cloudflare KV with a 30-day TTL:

Key:   mailbox:msg:<internetMessageId>
Value: "1"
TTL:   30 days

If the key exists, the message is skipped.
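A minimal sketch of this check follows. `DedupKv` mirrors the `get` and `put` methods of a Workers KV namespace (where `expirationTtl` is given in seconds); `dedupKey` and `claimMessage` are hypothetical names. Whether the real pipeline writes the key before or after processing a message is not stated here, so the ordering below is an assumption.

```typescript
// Minimal stand-in for the KV methods used by deduplication.
interface DedupKv {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// 30 days expressed in seconds, the unit KV expects for expirationTtl.
const DEDUP_TTL_SECONDS = 30 * 24 * 60 * 60;

function dedupKey(internetMessageId: string): string {
  return `mailbox:msg:${internetMessageId}`;
}

// Returns true when the message has not been seen before (and records it),
// false when the dedup key already exists and the message should be skipped.
async function claimMessage(kv: DedupKv, internetMessageId: string): Promise<boolean> {
  const key = dedupKey(internetMessageId);
  if ((await kv.get(key)) !== null) return false;
  await kv.put(key, "1", { expirationTtl: DEDUP_TTL_SECONDS });
  return true;
}
```

Because KV is eventually consistent and the read followed by the write is not atomic, a check like this reduces duplicates but does not strictly guarantee exactly-once processing when runs overlap.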

3. Attachment Download

Attachments up to 25 MB are downloaded from Graph and uploaded to Cloudflare R2 under a structured path:

docs/{orgId}/inbox/{messageId}/{filename}

Each attachment preserves its original content type in the R2 metadata. Attachments exceeding 25 MB are silently skipped.
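The path layout and size limit can be sketched as a pure planning step. The names `AttachmentMeta`, `attachmentKey`, and `planUploads` are hypothetical, and the sketch assumes "25 MB" means 25 × 1024 × 1024 bytes; the real threshold may be defined differently.

```typescript
// Assumption: 25 MB is interpreted as 25 MiB.
const MAX_ATTACHMENT_BYTES = 25 * 1024 * 1024;

interface AttachmentMeta {
  filename: string;
  contentType: string;
  size: number; // bytes
}

// Builds the R2 object key for one attachment.
function attachmentKey(orgId: string, messageId: string, filename: string): string {
  return `docs/${orgId}/inbox/${messageId}/${filename}`;
}

// Drops oversized attachments and pairs each remaining one with its key and content type.
function planUploads(
  orgId: string,
  messageId: string,
  attachments: AttachmentMeta[],
): Array<{ key: string; contentType: string }> {
  return attachments
    .filter((a) => a.size <= MAX_ATTACHMENT_BYTES)
    .map((a) => ({
      key: attachmentKey(orgId, messageId, a.filename),
      contentType: a.contentType,
    }));
}
```

Each planned upload could then be written with the R2 bucket's `put` method, passing the content type through `httpMetadata`, although which metadata field the pipeline actually uses is not specified here.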

4. Email Body Extraction

HTML email bodies are stripped of tags and normalized to plain text. The system constructs a synthetic RFC 2822 message containing:

  • From header (sender address)
  • Subject header
  • Date header (received timestamp)
  • Message-ID header
  • Plain text body
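As a rough sketch of this step, the code below strips HTML with regular expressions and assembles the synthetic message. The names `htmlToPlainText`, `NormalizedEmail`, and `buildSyntheticMessage` are hypothetical, and the regex-based stripping is a simplification chosen for illustration; the actual normalization rules are not documented here.

```typescript
// Converts an HTML body to plain text (simplified, regex-based).
function htmlToPlainText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style blocks
    .replace(/<br\s*\/?>|<\/(p|div|li|tr|h[1-6])>/gi, "\n") // block ends become newlines
    .replace(/<[^>]+>/g, "") // remove remaining tags
    .replace(/&nbsp;/gi, " ")
    .replace(/&lt;/gi, "<")
    .replace(/&gt;/gi, ">")
    .replace(/&quot;/gi, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/gi, "&") // decode ampersands last to avoid double decoding
    .replace(/[ \t\r]+/g, " ")
    .replace(/ ?\n ?/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

interface NormalizedEmail {
  from: string;
  subject: string;
  receivedAt: Date;
  internetMessageId: string;
  bodyText: string;
}

// Header values must stay on one line, so embedded line breaks are collapsed.
function headerValue(value: string): string {
  return value.replace(/[\r\n]+/g, " ").trim();
}

// Assembles headers, a blank line, and the body, using CRLF line endings.
function buildSyntheticMessage(email: NormalizedEmail): string {
  const headers = [
    `From: ${headerValue(email.from)}`,
    `Subject: ${headerValue(email.subject)}`,
    `Date: ${email.receivedAt.toUTCString()}`,
    `Message-ID: ${headerValue(email.internetMessageId)}`,
  ];
  const body = email.bodyText.replace(/\r?\n/g, "\r\n");
  return `${headers.join("\r\n")}\r\n\r\n${body}`;
}
```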

5. EmailIntakeAgent Processing

The normalized email is POSTed to the EmailIntakeAgent Durable Object, which is instantiated per organization (intake-{orgId}). The agent performs two functions:

Heuristic field extraction parses the subject and body to identify:

| Field | Detection Method |
| --- | --- |
| Insured name | Pattern matching for "Named Insured:", "Account:", "Client:", etc. |
| Line of business | Keyword matching against a map of insurance terms |
| Risk state | US state abbreviation detection |
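The insured name and risk state heuristics might look roughly like the sketch below. The exact patterns used by the agent are not listed here: the three labels are the ones named in the table (the "etc." implies more exist), and `extractInsuredName` and `detectRiskState` are hypothetical names.

```typescript
// Looks for a labelled line such as "Named Insured: Acme Freight LLC".
// Only the labels named in the docs are covered; the real list is longer.
function extractInsuredName(text: string): string | undefined {
  const match = /^\s*(?:named insured|account|client)\s*:\s*(.+?)\s*$/im.exec(text);
  return match ? match[1] : undefined;
}

// Two-letter codes for the 50 US states plus DC.
const US_STATE_CODES = new Set(
  (
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD " +
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC " +
    "SD TN TX UT VT VA WA WV WI WY DC"
  ).split(" "),
);

// Returns the first standalone uppercase two-letter token that is a state code.
function detectRiskState(text: string): string | undefined {
  const candidates = text.match(/\b[A-Z]{2}\b/g) ?? [];
  return candidates.find((code) => US_STATE_CODES.has(code));
}
```

A matcher this simple will misread capitalized words such as "IN" or "OR" as state codes, which is one reason a later AI extraction pass is useful.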

Line of business detection recognizes the following terms:

| Input Phrase | Mapped LOB |
| --- | --- |
| general liability, commercial general | GL |
| workers compensation, workers' comp | WC |
| cyber liability, cyber risk | Cyber |
| directors and officers, D&O | D&O |
| errors and omissions, professional liability | E&O |
| medical stop loss, stop loss | MedStopLoss |
| commercial property, property insurance | Property |
| umbrella | Umbrella |
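The table above translates directly into a phrase map. The sketch below assumes case-insensitive substring matching in table order, which is one plausible reading; `LOB_PHRASES` and `detectLob` are hypothetical names.

```typescript
// Phrase-to-LOB pairs taken from the table above, checked in order.
const LOB_PHRASES: ReadonlyArray<readonly [string, string]> = [
  ["general liability", "GL"],
  ["commercial general", "GL"],
  ["workers compensation", "WC"],
  ["workers' comp", "WC"],
  ["cyber liability", "Cyber"],
  ["cyber risk", "Cyber"],
  ["directors and officers", "D&O"],
  ["d&o", "D&O"],
  ["errors and omissions", "E&O"],
  ["professional liability", "E&O"],
  ["medical stop loss", "MedStopLoss"],
  ["stop loss", "MedStopLoss"],
  ["commercial property", "Property"],
  ["property insurance", "Property"],
  ["umbrella", "Umbrella"],
];

// Returns the LOB for the first phrase found in the text, if any.
function detectLob(text: string): string | undefined {
  const haystack = text.toLowerCase();
  for (const [phrase, lob] of LOB_PHRASES) {
    if (haystack.includes(phrase)) return lob;
  }
  return undefined;
}
```

With first-match-wins ordering, an email that mentions several lines of business resolves to whichever phrase appears earliest in the map, not earliest in the text.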

Endorsement detection flags emails that contain "endorse", "add driver", or "remove driver" in the subject or body. Endorsement submissions are created with priority: "high".
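A minimal sketch of this check, assuming case-insensitive substring matching. The names `isEndorsementRequest` and `priorityFor` are hypothetical, and "normal" as the non-endorsement priority is taken from the sample submission record on this page.

```typescript
// Substrings that mark an email as an endorsement request.
const ENDORSEMENT_MARKERS = ["endorse", "add driver", "remove driver"];

// True when the subject or body contains any marker, ignoring case.
function isEndorsementRequest(subject: string, body: string): boolean {
  const haystack = `${subject}\n${body}`.toLowerCase();
  return ENDORSEMENT_MARKERS.some((marker) => haystack.includes(marker));
}

// Endorsements are prioritized; everything else keeps the normal priority.
function priorityFor(isEndorsement: boolean): "high" | "normal" {
  return isEndorsement ? "high" : "normal";
}
```

Because "endorse" is matched as a substring, words such as "endorsement" and "endorsed" are also caught.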

6. Submission Creation

The agent creates a new submission record with status received:

{
  "id": "<generated UUID>",
  "orgId": "<resolved org ID>",
  "status": "received",
  "priority": "normal",
  "sourceEmail": "broker@example.com",
  "insuredName": "Acme Freight LLC",
  "lob": "GL",
  "state": "TX",
  "extractedData": {
    "isEndorsement": false,
    "rawSubject": "New GL submission - Acme Freight LLC",
    "rawFrom": "broker@example.com",
    "rawText": "...(first 8,000 characters)...",
    "messageId": "<msg-id@example.com>",
    "emailDate": "2026-03-24T14:00:00Z",
    "attachments": [
      { "filename": "acord_125.pdf", "mimeType": "application/pdf" },
      { "filename": "loss_runs.pdf", "mimeType": "application/pdf" }
    ]
  },
  "missingItems": [],
  "fraudFlags": [],
  "referralReasons": [],
  "declineReasons": []
}

The missingItems array lists the required fields that the heuristics could not extract. Possible values are insured_name, line_of_business, and risk_state; an empty array means all three were found.
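Deriving missingItems from the heuristic results can be sketched as below. The `HeuristicFields` shape and the `computeMissingItems` name are hypothetical; the three item names are the ones listed above.

```typescript
type MissingItem = "insured_name" | "line_of_business" | "risk_state";

// Hypothetical container for the heuristic extraction results.
interface HeuristicFields {
  insuredName?: string;
  lob?: string;
  state?: string;
}

// Lists every required field that is absent or empty.
function computeMissingItems(fields: HeuristicFields): MissingItem[] {
  const missing: MissingItem[] = [];
  if (!fields.insuredName) missing.push("insured_name");
  if (!fields.lob) missing.push("line_of_business");
  if (!fields.state) missing.push("risk_state");
  return missing;
}
```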

7. SubmissionAgent Handoff

After creating the submission, the EmailIntakeAgent spawns a SubmissionAgent Durable Object for AI-powered deep extraction. The SubmissionAgent processes the raw email text and any uploaded attachments (ACORD forms, loss runs, dec pages) using LLM-based extraction to fill in missing fields and enrich the submission data.

8. Event Emission

A submission.mailbox_ingested event is sent to the Cloudflare Queue with:

{
  "type": "submission.mailbox_ingested",
  "orgId": "550e8400-...",
  "payload": {
    "messageId": "<exchange-msg-id>",
    "from": "broker@example.com",
    "subject": "New GL submission - Acme Freight LLC",
    "attachmentCount": 2
  },
  "timestamp": "2026-03-24T14:15:00Z"
}
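The event shape above can be expressed as a type with a small builder. The interface and function names are hypothetical, and trimming milliseconds from the timestamp is an assumption made only to match the sample format.

```typescript
interface MailboxIngestedEvent {
  type: "submission.mailbox_ingested";
  orgId: string;
  payload: {
    messageId: string; // Exchange message ID, as in the sample above
    from: string;
    subject: string;
    attachmentCount: number;
  };
  timestamp: string; // ISO 8601, UTC
}

// Builds the event body; the caller is responsible for sending it to the queue.
function buildMailboxIngestedEvent(
  orgId: string,
  message: { id: string; from: string; subject: string; attachmentCount: number },
  now: Date = new Date(),
): MailboxIngestedEvent {
  return {
    type: "submission.mailbox_ingested",
    orgId,
    payload: {
      messageId: message.id,
      from: message.from,
      subject: message.subject,
      attachmentCount: message.attachmentCount,
    },
    // Drop milliseconds so the value matches the sample format.
    timestamp: now.toISOString().replace(/\.\d{3}Z$/, "Z"),
  };
}
```

In a Worker, the resulting object would typically be passed to the queue producer binding's `send` method.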

Dual Intake Paths

The EmailIntakeAgent supports two intake paths:

Cloudflare Email Routing (preferred) — When configured, inbound emails are routed directly to the agent's onEmail handler via the Agent SDK. This path uses PostalMime to parse the raw email bytes and checks for auto-reply headers to skip bounce-backs and out-of-office responses.

Cron-based Graph polling (legacy) — The sweepMailboxIngest cron polls the Microsoft 365 shared mailbox and POSTs normalized emails to the agent's /email REST endpoint. This path is used when Cloudflare Email Routing is not configured or when the mailbox is hosted on Microsoft 365.

Both paths converge on the same extraction logic and submission creation code.
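For the cron path, handing a normalized email to the per-organization agent could look like the sketch below. The `idFromName` and `get` methods mirror the Durable Object namespace API, but everything else is an assumption: the interface names, the placeholder host in the URL, the `message/rfc822` content type, and sending the raw message as the request body are illustrative choices, not the documented contract of the /email endpoint.

```typescript
interface AgentRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal stand-ins for a Durable Object stub and namespace.
interface AgentStub {
  fetch(url: string, init?: AgentRequestInit): Promise<{ ok: boolean; status: number }>;
}

interface AgentNamespace<Id> {
  idFromName(name: string): Id;
  get(id: Id): AgentStub;
}

// Routes the message to the agent instance named intake-{orgId} and returns the HTTP status.
async function postToIntakeAgent<Id>(
  namespace: AgentNamespace<Id>,
  orgId: string,
  rawMessage: string,
): Promise<number> {
  const stub = namespace.get(namespace.idFromName(`intake-${orgId}`));
  // The host is a placeholder; only the /email path comes from the docs.
  const response = await stub.fetch("https://email-intake-agent/email", {
    method: "POST",
    headers: { "content-type": "message/rfc822" },
    body: rawMessage,
  });
  return response.status;
}
```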

Auto-Reply Filtering

The Cloudflare Email Routing path automatically filters out auto-replies by inspecting email headers. Messages matching auto-reply patterns (out-of-office, delivery status notifications, bounce messages) are silently discarded.
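The exact headers the agent inspects are not listed here. The sketch below checks a few headers commonly used to mark automated mail (Auto-Submitted from RFC 3834, Precedence, X-Autoreply, and a multipart/report Content-Type for delivery status notifications). The `HeaderPair` shape and the `isAutoReply` name are assumptions made for illustration.

```typescript
// Assumed shape for parsed headers: a list of name/value pairs.
interface HeaderPair {
  key: string;
  value: string;
}

// Returns true when the headers suggest an automated reply or bounce.
function isAutoReply(headers: HeaderPair[]): boolean {
  const get = (name: string): string | undefined =>
    headers.find((h) => h.key.toLowerCase() === name)?.value.trim().toLowerCase();

  // RFC 3834: any Auto-Submitted value other than "no" marks automated mail.
  const autoSubmitted = get("auto-submitted");
  if (autoSubmitted !== undefined && autoSubmitted !== "no") return true;

  // Legacy convention used by many autoresponders and list servers.
  const precedence = get("precedence");
  if (precedence !== undefined && ["bulk", "junk", "auto_reply"].includes(precedence)) return true;

  // Non-standard marker set by some autoresponders.
  if (get("x-autoreply") !== undefined) return true;

  // Delivery status notifications are sent as multipart/report.
  const contentType = get("content-type");
  return contentType !== undefined && contentType.includes("multipart/report");
}
```

Treating Precedence: bulk as an auto-reply also drops mailing-list traffic, which may or may not be desirable for a submissions mailbox.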

Error Handling

The cron job tracks three counters per run:

| Counter | Description |
| --- | --- |
| processed | Messages successfully processed through the pipeline |
| skipped | Messages skipped due to deduplication |
| errors | Messages that failed (logged via logNonFatal) |

Individual message failures do not halt the batch. Each message is processed independently, and errors are logged with the Exchange message ID for troubleshooting.
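The isolation described above amounts to a try/catch around each message. In this sketch, `sweepBatch` and `SweepResult` are hypothetical names, and the `logNonFatal` signature is assumed (only its name appears in this document).

```typescript
interface SweepResult {
  processed: number;
  skipped: number;
  errors: number;
}

// Assumed logger signature; the real logNonFatal may take different arguments.
type NonFatalLogger = (scope: string, error: unknown, context: Record<string, string>) => void;

// Processes messages one at a time so a failure affects only that message.
async function sweepBatch<T extends { id: string }>(
  messages: T[],
  handle: (message: T) => Promise<"processed" | "skipped">,
  logNonFatal: NonFatalLogger,
): Promise<SweepResult> {
  const result: SweepResult = { processed: 0, skipped: 0, errors: 0 };
  for (const message of messages) {
    try {
      const outcome = await handle(message);
      result[outcome] += 1;
    } catch (error) {
      result.errors += 1;
      // Record the Exchange message ID so the failure can be traced.
      logNonFatal("cron.mailbox_ingest.message", error, { messageId: message.id });
    }
  }
  return result;
}
```

Processing sequentially keeps the example simple; whether the real cron handles messages serially or concurrently is not stated.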

Monitoring

The mailbox ingest cron is monitored through the standard Axiom observability pipeline. Key signals to watch:

  • cron.mailbox_ingest — top-level cron failures
  • cron.mailbox_ingest.message — per-message processing failures
  • The submission.mailbox_ingested event in the queue provides a real-time feed of successfully ingested submissions