Email Intake
The email intake pipeline monitors a Microsoft 365 shared mailbox, downloads attachments, extracts insurance application fields using heuristic pattern matching, and creates submission records in the underwriting pipeline. The system runs on a 15-minute cron schedule and delegates AI-powered extraction to a Durable Object agent.
Architecture
Microsoft 365 Shared Mailbox
           │
           │ (Graph API poll, every 15 min)
           ▼
┌──────────────────────┐
│  sweepMailboxIngest  │ ◄── Cron trigger on oi-sys-api Worker
│ (mailbox-ingest.ts)  │
└──────────┬───────────┘
           │
     ┌─────┴──────────────────────┐
     │                            │
     ▼                            ▼
┌──────────┐           ┌─────────────────────┐
│ R2 Docs  │           │  EmailIntakeAgent   │
│ (attach) │           │  (Durable Object)   │
└──────────┘           └──────────┬──────────┘
                                  │
                         ┌────────┴────────┐
                         │                 │
                         ▼                 ▼
                   ┌───────────┐   ┌────────────────┐
                   │submissions│   │SubmissionAgent │
                   │   (DB)    │   │  (AI extract)  │
                   └───────────┘   └────────────────┘
Cron Schedule
The mailbox ingest runs on the */15 * * * * cron schedule (every 15 minutes). It is a no-op when:
- MAILBOX_INGEST_ENABLED is not set to "true"
- The Microsoft Graph credentials are not configured (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET)
- GRAPH_SHARED_MAILBOX is not set
Each run processes up to 25 unread messages from the shared mailbox.
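The no-op conditions above can be read as a single guard evaluated at the start of each cron run. The following is a minimal sketch, not the actual sweepMailboxIngest code: the IngestEnv interface and the shouldRunIngest function are hypothetical names, and only the environment variable names come from this page.

```typescript
// Hypothetical shape of the Worker environment. Only the variable names
// are taken from this page; the interface itself is an assumption.
interface IngestEnv {
  MAILBOX_INGEST_ENABLED?: string;
  AZURE_TENANT_ID?: string;
  AZURE_CLIENT_ID?: string;
  AZURE_CLIENT_SECRET?: string;
  GRAPH_SHARED_MAILBOX?: string;
}

// Returns true only when every precondition for polling is met.
// Any missing piece turns the cron run into a no-op.
function shouldRunIngest(env: IngestEnv): boolean {
  // The flag must be the exact string "true".
  if (env.MAILBOX_INGEST_ENABLED !== "true") return false;

  // All three Graph credentials must be present and non-empty.
  const hasCredentials =
    Boolean(env.AZURE_TENANT_ID) &&
    Boolean(env.AZURE_CLIENT_ID) &&
    Boolean(env.AZURE_CLIENT_SECRET);
  if (!hasCredentials) return false;

  // The shared mailbox address must be configured.
  return Boolean(env.GRAPH_SHARED_MAILBOX);
}
```

Checking the flag first keeps a disabled deployment from touching credentials at all.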
Configuration
The following environment variables control the email intake pipeline:
| Variable | Description |
|---|---|
| MAILBOX_INGEST_ENABLED | Set to "true" to enable polling |
| GRAPH_SHARED_MAILBOX | Email address of the shared mailbox (e.g., submissions@) |
| AZURE_TENANT_ID | Microsoft Entra (Azure AD) tenant ID |
| AZURE_CLIENT_ID | App registration client ID |
| AZURE_CLIENT_SECRET | App registration client secret |
| GRAPH_MAILBOX_DEFAULT_ORG_ID | Fallback org ID when no per-mailbox mapping exists |
Organization routing is resolved via KV lookup (mailbox-org:{email}) with fallback to GRAPH_MAILBOX_DEFAULT_ORG_ID and then to the literal string "default".
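The fallback chain can be sketched as follows. This is an illustration under stated assumptions: OrgKV is a hypothetical stand-in for the read side of a KV namespace, and orgMappingKey and resolveOrgId are hypothetical helper names. Whether the real lookup normalizes the mailbox address (for example, lowercasing it) is not stated here, so the sketch uses it as given.

```typescript
// Hypothetical stand-in for the read side of a KV namespace.
interface OrgKV {
  get(key: string): Promise<string | null>;
}

// Builds the KV key documented on this page: mailbox-org:{email}.
function orgMappingKey(mailbox: string): string {
  return `mailbox-org:${mailbox}`;
}

// Resolution order: per-mailbox KV mapping, then the configured default
// org ID, then the literal string "default".
async function resolveOrgId(
  kv: OrgKV,
  mailbox: string,
  defaultOrgId?: string,
): Promise<string> {
  const mapped = await kv.get(orgMappingKey(mailbox));
  // Empty strings are treated as "not set" at every level (an assumption).
  return mapped || defaultOrgId || "default";
}
```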
Processing Pipeline
1. Fetch Unread Messages
The cron job uses @openinsure/notify's createGraphMailboxReader to authenticate with Microsoft Graph and fetch up to 25 unread messages from the configured shared mailbox.
2. Deduplication
Each message is deduplicated using its RFC 2822 internetMessageId (the Message-ID header), which is stable across folder moves. The Exchange internal message ID changes when messages are moved between folders, so the internet message ID is the reliable dedup key.
Dedup keys are stored in Cloudflare KV with a 30-day TTL:
Key: mailbox:msg:<internetMessageId>
Value: "1"
TTL: 30 days
If the key exists, the message is skipped.
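A minimal sketch of the dedup check follows. DedupStore is a hypothetical interface modeled on the get and put calls of a Workers KV namespace (put accepts an expirationTtl in seconds), and seenBefore is a hypothetical helper. The page does not say whether the key is written before or after the message is processed; this sketch writes it at check time, which is an assumption.

```typescript
// Hypothetical interface modeled on a Workers KV namespace.
interface DedupStore {
  get(key: string): Promise<string | null>;
  put(
    key: string,
    value: string,
    options?: { expirationTtl?: number },
  ): Promise<void>;
}

// 30 days expressed in seconds, the unit expirationTtl expects.
const DEDUP_TTL_SECONDS = 30 * 24 * 60 * 60;

// Key format documented on this page: mailbox:msg:<internetMessageId>.
function dedupKey(internetMessageId: string): string {
  return `mailbox:msg:${internetMessageId}`;
}

// Returns true when the message was already seen. Otherwise records the
// key with a 30-day TTL and returns false.
async function seenBefore(
  store: DedupStore,
  internetMessageId: string,
): Promise<boolean> {
  const key = dedupKey(internetMessageId);
  if ((await store.get(key)) !== null) return true;
  await store.put(key, "1", { expirationTtl: DEDUP_TTL_SECONDS });
  return false;
}
```

Because KV reads are eventually consistent, a check like this reduces duplicates rather than strictly guaranteeing exactly-once processing.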
3. Attachment Download
Attachments up to 25 MB are downloaded from Graph and uploaded to Cloudflare R2 under a structured path:
docs/{orgId}/inbox/{messageId}/{filename}
Each attachment preserves its original content type in the R2 metadata. Attachments exceeding 25 MB are silently skipped.
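The path layout and size limit can be sketched as two small helpers. AttachmentMeta, attachmentKey, and selectUploadable are hypothetical names, and the sketch assumes 25 MB means 25 × 1024 × 1024 bytes. It also uses the filename as-is; whether the real pipeline sanitizes filenames is not stated on this page.

```typescript
// Hypothetical attachment metadata shape for illustration.
interface AttachmentMeta {
  name: string;
  contentType: string;
  size: number; // bytes
}

// 25 MB limit, assuming binary megabytes.
const MAX_ATTACHMENT_BYTES = 25 * 1024 * 1024;

// Builds the R2 object key: docs/{orgId}/inbox/{messageId}/{filename}.
function attachmentKey(
  orgId: string,
  messageId: string,
  filename: string,
): string {
  return `docs/${orgId}/inbox/${messageId}/${filename}`;
}

// Keeps attachments at or under the limit; larger ones are dropped
// without raising an error, mirroring the "silently skipped" behavior.
function selectUploadable(attachments: AttachmentMeta[]): AttachmentMeta[] {
  return attachments.filter((a) => a.size <= MAX_ATTACHMENT_BYTES);
}
```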
4. Email Body Extraction
HTML email bodies are stripped of tags and normalized to plain text. The system constructs a synthetic RFC 2822 message containing:
- From header (sender address)
- Subject header
- Date header (received timestamp)
- Message-ID header
- Plain text body
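The normalization step can be sketched as below. This is an illustration only: NormalizedEmail, htmlToText, and buildSyntheticMessage are hypothetical names, and the regex-based tag stripping is a deliberately naive stand-in, since this page does not describe how the real pipeline strips HTML.

```typescript
// Hypothetical input shape for illustration.
interface NormalizedEmail {
  from: string;
  subject: string;
  receivedAt: Date;
  messageId: string;
  html: string;
}

// Naive HTML-to-text conversion: adequate for a sketch, not a full parser.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style blocks
    .replace(/<br\s*\/?>|<\/p>|<\/div>/gi, "\n") // block boundaries become newlines
    .replace(/<[^>]+>/g, "") // strip remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .trim();
}

// Assembles the four headers and the plain text body into one message,
// with a blank line separating headers from body.
function buildSyntheticMessage(email: NormalizedEmail): string {
  const headers = [
    `From: ${email.from}`,
    `Subject: ${email.subject}`,
    `Date: ${email.receivedAt.toUTCString()}`,
    `Message-ID: ${email.messageId}`,
  ];
  return `${headers.join("\r\n")}\r\n\r\n${htmlToText(email.html)}`;
}
```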
5. EmailIntakeAgent Processing
The normalized email is POSTed to the EmailIntakeAgent Durable Object, which is instantiated per organization (intake-{orgId}). The agent performs two functions:
Heuristic field extraction parses the subject and body to identify:
| Field | Detection Method |
|---|---|
| Insured name | Pattern matching for "Named Insured:", "Account:", "Client:", etc. |
| Line of business | Keyword matching against a map of insurance terms |
| Risk state | US state abbreviation detection |
Line of business detection recognizes the following terms:
| Input Phrase | Mapped LOB |
|---|---|
| general liability, commercial general | GL |
| workers compensation, workers' comp | WC |
| cyber liability, cyber risk | Cyber |
| directors and officers, D&O | D&O |
| errors and omissions, professional liability | E&O |
| medical stop loss, stop loss | MedStopLoss |
| commercial property, property insurance | Property |
| umbrella | Umbrella |
Endorsement detection flags emails that contain "endorse", "add driver", or "remove driver" in the subject or body. Endorsement submissions are created with priority: "high".
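The keyword-based parts of this step can be sketched as follows. The phrase list and endorsement markers come from the tables and text above; everything else is an assumption, including the function names (detectLob, isEndorsement, extractInsuredName), the case-insensitive substring matching, the first-match-wins ordering, and the regular expression used for the insured name. Risk state detection is left out because the page does not describe how abbreviations are disambiguated.

```typescript
// Phrase-to-LOB pairs from the table above. Matching order is an assumption.
const LOB_PHRASES: Array<[string, string]> = [
  ["general liability", "GL"],
  ["commercial general", "GL"],
  ["workers compensation", "WC"],
  ["workers' comp", "WC"],
  ["cyber liability", "Cyber"],
  ["cyber risk", "Cyber"],
  ["directors and officers", "D&O"],
  ["d&o", "D&O"],
  ["errors and omissions", "E&O"],
  ["professional liability", "E&O"],
  ["medical stop loss", "MedStopLoss"],
  ["stop loss", "MedStopLoss"],
  ["commercial property", "Property"],
  ["property insurance", "Property"],
  ["umbrella", "Umbrella"],
];

// Returns the first mapped LOB whose phrase appears in the text, else null.
function detectLob(text: string): string | null {
  const haystack = text.toLowerCase();
  for (const [phrase, lob] of LOB_PHRASES) {
    if (haystack.includes(phrase)) return lob;
  }
  return null;
}

// Markers quoted on this page; matched case-insensitively in subject or body.
const ENDORSEMENT_MARKERS = ["endorse", "add driver", "remove driver"];

function isEndorsement(subject: string, body: string): boolean {
  const haystack = `${subject}\n${body}`.toLowerCase();
  return ENDORSEMENT_MARKERS.some((marker) => haystack.includes(marker));
}

// Captures the rest of the line after a known label. The label set here is
// only the three named on this page; the real list is longer ("etc.").
function extractInsuredName(text: string): string | null {
  const match = /(?:Named Insured|Account|Client)\s*:\s*(.+)/i.exec(text);
  return match?.[1]?.trim() ?? null;
}
```

For example, `detectLob("Please quote Workers' Comp for Acme")` returns `"WC"`, and `isEndorsement("Endorsement request", "")` returns `true`.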
6. Submission Creation
The agent creates a new submission record with status received:
{
  "id": "<generated UUID>",
  "orgId": "<resolved org ID>",
  "status": "received",
  "priority": "normal",
  "sourceEmail": "broker@example.com",
  "insuredName": "Acme Freight LLC",
  "lob": "GL",
  "state": "TX",
  "extractedData": {
    "isEndorsement": false,
    "rawSubject": "New GL submission - Acme Freight LLC",
    "rawFrom": "broker@example.com",
    "rawText": "...(first 8,000 characters)...",
    "messageId": "<msg-id@example.com>",
    "emailDate": "2026-03-24T14:00:00Z",
    "attachments": [
      { "filename": "acord_125.pdf", "mimeType": "application/pdf" },
      { "filename": "loss_runs.pdf", "mimeType": "application/pdf" }
    ]
  },
  "missingItems": [],
  "fraudFlags": [],
  "referralReasons": [],
  "declineReasons": []
}
The missingItems array tracks which required fields could not be extracted: insured_name, line_of_business, risk_state.
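How the three missingItems codes and the priority value might be derived from the extracted fields can be sketched as below. ExtractedFields, computeMissingItems, and priorityFor are hypothetical names; the codes and the "normal"/"high" priority values are the ones named on this page.

```typescript
// Hypothetical result of the heuristic extraction step.
interface ExtractedFields {
  insuredName: string | null;
  lob: string | null;
  state: string | null;
}

type MissingItem = "insured_name" | "line_of_business" | "risk_state";

// Lists the required fields that could not be extracted, in a fixed order.
function computeMissingItems(fields: ExtractedFields): MissingItem[] {
  const missing: MissingItem[] = [];
  if (!fields.insuredName) missing.push("insured_name");
  if (!fields.lob) missing.push("line_of_business");
  if (!fields.state) missing.push("risk_state");
  return missing;
}

// Endorsement submissions are created with high priority; all others normal.
function priorityFor(isEndorsement: boolean): "high" | "normal" {
  return isEndorsement ? "high" : "normal";
}
```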
7. SubmissionAgent Handoff
After creating the submission, the EmailIntakeAgent spawns a SubmissionAgent Durable Object for AI-powered deep extraction. The SubmissionAgent processes the raw email text and any uploaded attachments (ACORD forms, loss runs, dec pages) using LLM-based extraction to fill in missing fields and enrich the submission data.
8. Event Emission
A submission.mailbox_ingested event is sent to the Cloudflare Queue with:
{
  "type": "submission.mailbox_ingested",
  "orgId": "550e8400-...",
  "payload": {
    "messageId": "<exchange-msg-id>",
    "from": "broker@example.com",
    "subject": "New GL submission - Acme Freight LLC",
    "attachmentCount": 2
  },
  "timestamp": "2026-03-24T14:15:00Z"
}
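Building and sending this event can be sketched as follows. The field names mirror the example payload; MailboxIngestedEvent, EventQueue, buildIngestedEvent, and emitIngested are hypothetical names, with EventQueue standing in for a queue producer binding that exposes a send method. Note that toISOString includes milliseconds, so the timestamp format differs slightly from the example above.

```typescript
// Event shape mirroring the example payload on this page.
interface MailboxIngestedEvent {
  type: "submission.mailbox_ingested";
  orgId: string;
  payload: {
    messageId: string;
    from: string;
    subject: string;
    attachmentCount: number;
  };
  timestamp: string;
}

// Hypothetical stand-in for a queue producer binding.
interface EventQueue {
  send(message: MailboxIngestedEvent): Promise<void>;
}

// Pure builder, kept separate from sending so it is easy to test.
function buildIngestedEvent(
  orgId: string,
  message: { id: string; from: string; subject: string },
  attachmentCount: number,
  now: Date = new Date(),
): MailboxIngestedEvent {
  return {
    type: "submission.mailbox_ingested",
    orgId,
    payload: {
      messageId: message.id,
      from: message.from,
      subject: message.subject,
      attachmentCount,
    },
    timestamp: now.toISOString(),
  };
}

// Sends the event to the queue.
async function emitIngested(
  queue: EventQueue,
  event: MailboxIngestedEvent,
): Promise<void> {
  await queue.send(event);
}
```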
Dual Intake Paths
The EmailIntakeAgent supports two intake paths:
Cloudflare Email Routing (preferred) — When configured, inbound emails are routed directly to the agent's onEmail handler via the Agent SDK. This path uses PostalMime to parse the raw email bytes and checks for auto-reply headers to skip bounce-backs and out-of-office responses.
Cron-based Graph polling (legacy) — The sweepMailboxIngest cron polls the Microsoft 365 shared mailbox and POSTs normalized emails to the agent's /email REST endpoint. This path is used when Cloudflare Email Routing is not configured or when the mailbox is hosted on Microsoft 365.
Both paths converge on the same extraction logic and submission creation code.
Auto-Reply Filtering
The Cloudflare Email Routing path automatically filters out auto-replies by inspecting email headers. Messages matching auto-reply patterns (out-of-office, delivery status notifications, bounce messages) are silently discarded.
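This page does not list the exact headers the agent inspects. As a rough sketch, a header-based filter might look like the following, where the chosen headers (Auto-Submitted from RFC 3834, Precedence, X-Autoreply, and a delivery-status Content-Type) are assumptions rather than the agent's actual rules, and isAutoReply is a hypothetical name. Header names are assumed to be lowercased already.

```typescript
// Returns true when the headers suggest an automated message.
// The header set checked here is an assumption for illustration.
function isAutoReply(headers: Map<string, string>): boolean {
  const get = (name: string): string =>
    (headers.get(name) ?? "").trim().toLowerCase();

  // RFC 3834: any Auto-Submitted value other than "no" marks automation.
  const autoSubmitted = get("auto-submitted");
  if (autoSubmitted !== "" && autoSubmitted !== "no") return true;

  // Common convention used by vacation responders.
  if (get("precedence") === "auto_reply") return true;

  // Non-standard header set by some autoresponders.
  if (get("x-autoreply") !== "") return true;

  // Delivery status notifications (bounces) use multipart/report.
  if (/report-type="?delivery-status/.test(get("content-type"))) return true;

  return false;
}
```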
Error Handling
The cron job tracks three counters per run:
| Counter | Description |
|---|---|
| processed | Messages successfully processed through the pipeline |
| skipped | Messages skipped due to deduplication |
| errors | Messages that failed (logged via logNonFatal) |
Individual message failures do not halt the batch. Each message is processed independently, and errors are logged with the Exchange message ID for troubleshooting.
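The per-message isolation described above can be sketched as a loop with a try/catch around each message. runBatch, RunCounters, and Outcome are hypothetical names, and the logNonFatal parameter is a stand-in whose signature is assumed, since the real helper's signature is not shown on this page.

```typescript
// The three counters tracked per cron run.
interface RunCounters {
  processed: number;
  skipped: number;
  errors: number;
}

type Outcome = "processed" | "skipped";

// Processes messages one at a time. A failure in one message is counted
// and logged, and the loop continues with the next message.
async function runBatch<T extends { id: string }>(
  messages: T[],
  handle: (message: T) => Promise<Outcome>,
  // Assumed signature; the real logNonFatal may differ.
  logNonFatal: (
    scope: string,
    error: unknown,
    context: Record<string, string>,
  ) => void,
): Promise<RunCounters> {
  const counters: RunCounters = { processed: 0, skipped: 0, errors: 0 };
  for (const message of messages) {
    try {
      const outcome = await handle(message);
      counters[outcome] += 1;
    } catch (error) {
      counters.errors += 1;
      // Include the message ID so the failure can be traced later.
      logNonFatal("cron.mailbox_ingest.message", error, {
        messageId: message.id,
      });
    }
  }
  return counters;
}
```

Processing sequentially keeps the counters simple; a concurrent variant would need the same per-message try/catch to preserve the isolation.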
Monitoring
The mailbox ingest cron is monitored through the standard Axiom observability pipeline. Key signals to watch:
- cron.mailbox_ingest: top-level cron failures
- cron.mailbox_ingest.message: per-message processing failures
- The submission.mailbox_ingested event in the queue provides a real-time feed of successfully ingested submissions