Security & Compliance: Deploying AI Inside a Payer's Infrastructure

Research for DaisyAI's Premera Blue Cross engagement. Focused on practical realities of running LLM-based systems with PHI inside a health plan's secure environment.

Last updated: 2026-02-12


Table of Contents

  1. Premera-Specific Context
  2. PHI in AI/LLM Pipelines
  3. AI Security Gateway Architecture
  4. Enterprise AI Provisioning at Health Plans
  5. HIPAA and GenAI: BAAs, APIs, and Deployment Models
  6. Cloud Platform Comparison: Azure OpenAI vs. AWS Bedrock
  7. AI Governance Frameworks for Payers
  8. Emerging Regulations (Effective 2025-2026)
  9. Practical Deployment Patterns
  10. Implications for DaisyAI at Premera

Premera-Specific Context

What we know from the Feb 11, 2026 call:

| Detail | What It Means |
| --- | --- |
| All LLM calls go through their "AI security gateway" | Centralized proxy/control plane between apps and LLM providers. We route through it, not around it. |
| Full tracing via Phoenix collector | Arize Phoenix — open-source AI observability built on OpenTelemetry. Every prompt/response logged, traced, evaluated. |
| Data cannot leave their VPC | No calling external APIs directly. Must use Premera's own API keys via their gateway. |
| We use Premera's own Anthropic/OpenAI API relationships | Premera holds the BAAs with Anthropic and OpenAI. We operate under their umbrella. |
| SRP tickets take ~4 weeks through 5 teams | Provisioning is bureaucratic — identity, network, data, security, and application teams all sign off. |
| Provisioning mistakes (global vs. data standard access) cause delays | Getting the wrong access tier means re-doing the SRP. Specificity matters upfront. |

Premera's AI Governance

Premera has a public AI Practices page and a cross-functional Data & AI Ethics Committee with five principles:

  1. Be transparent — disclose when AI contributes to decisions
  2. Be fair — avoid unfair discrimination
  3. Protect privacy and security — CISO deeply involved in AI governance
  4. Be accountable — maintain human oversight
  5. Continually improve — iterate on AI safety practices

Premera was among 25+ payers/providers that signed the White House AI safety pledge for healthcare.

Premera's Security History

Context that explains their conservative posture:

  • 2014-2015 breach: APT group had unauthorized access for ~9 months, affecting 10.4 million individuals
  • $6.85M OCR HIPAA settlement — the second-largest in OCR history at the time (HHS enforcement)
  • $10M state settlement (WA Attorney General) + $74M class-action settlement
  • Root cause: failure to conduct enterprise-wide risk analysis, inadequate audit controls, ignored auditor warnings

This history directly shapes their current security posture. They will be conservative. They will over-audit. They will require extensive documentation. This is rational behavior from their perspective.


PHI in AI/LLM Pipelines

The Core Problem

PHI + LLM = regulatory minefield. The question is not "can you do it?" but "under what conditions?"

What Constitutes PHI in LLM Context

Any of the 18 HIPAA identifiers combined with health information, including:

  • Patient names, DOBs, SSNs, MRNs in prompts
  • Clinical notes passed as context
  • Diagnosis codes linked to individuals
  • Claims data with member identifiers

How Payers Handle PHI with GenAI

Three dominant patterns emerging in production:

Pattern 1: De-identify Before Inference (Most Common)

  • Strip/replace all 18 HIPAA identifiers before prompt construction
  • Use NLP-based de-identification (e.g., John Snow Labs Healthcare NLP)
  • Tokenize identifiers with consistent pseudonyms for re-linking
  • Send de-identified data to LLM, re-link on response
  • Pro: Minimizes risk exposure. Con: Lossy — clinical context can be degraded
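The tokenize-and-re-link step can be sketched in a few lines. This is a minimal illustration in pure Python with placeholder regexes (SSN, MRN, DOB only) — a real deployment would use a clinical de-identification engine covering all 18 identifiers, not hand-rolled patterns:

```python
import re

class Pseudonymizer:
    """Replace identifiers with consistent tokens so LLM responses can be re-linked.
    Regexes are illustrative only, not a complete HIPAA Safe Harbor pass."""

    PATTERNS = {
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
        "DOB": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    }

    def __init__(self):
        self.forward = {}   # real value -> token
        self.reverse = {}   # token -> real value

    def deidentify(self, text: str) -> str:
        # Assign each distinct identifier a stable token before prompt construction.
        for label, pattern in self.PATTERNS.items():
            for match in pattern.findall(text):
                if match not in self.forward:
                    token = f"[{label}_{len(self.forward) + 1}]"
                    self.forward[match] = token
                    self.reverse[token] = match
                text = text.replace(match, self.forward[match])
        return text

    def reidentify(self, text: str) -> str:
        # Re-link tokens in the model's response on the way back.
        for token, real in self.reverse.items():
            text = text.replace(token, real)
        return text
```

Consistent tokens matter: if the same MRN appears in two prompts, the model sees the same pseudonym, preserving cross-reference structure while hiding the identifier.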

Pattern 2: In-VPC Processing with BAA-Covered APIs (Premera's Approach)

  • Keep all data within the organization's VPC
  • Route through BAA-covered cloud provider APIs (Azure OpenAI, AWS Bedrock)
  • Full audit trail via gateway
  • Pro: Full clinical fidelity. Con: Requires robust infrastructure and BAAs

Pattern 3: On-Premise/Private Inference

  • Self-host open-source models (Llama, Mistral, Meditron)
  • No data ever leaves organizational boundary
  • Pro: Maximum control. Con: Operational overhead, potentially lower model quality



AI Security Gateway Architecture

What Premera Means by "AI Security Gateway"

An AI gateway (also called LLM proxy or LLM router) is a content-aware reverse proxy that sits between applications and LLM providers. Unlike a standard API gateway, it inspects prompt/response content.

Architecture Diagram (Logical)

┌─────────────────────────────────────────────────────────┐
│                    Premera VPC                          │
│                                                         │
│  ┌──────────┐    ┌──────────────────┐    ┌───────────┐ │
│  │ DaisyAI  │───▶│  AI Security     │───▶│ Anthropic │ │
│  │ App      │    │  Gateway         │    │ API (BAA) │ │
│  └──────────┘    │                  │    ├───────────┤ │
│                  │  • Auth (JWT)    │    │ OpenAI    │ │
│                  │  • PHI scanning  │    │ API (BAA) │ │
│                  │  • PII redaction │    └───────────┘ │
│                  │  • Rate limiting │                   │
│                  │  • Token budgets │    ┌───────────┐ │
│                  │  • Audit logging │───▶│ Phoenix   │ │
│                  │  • Policy rules  │    │ Collector │ │
│                  └──────────────────┘    │ (Tracing) │ │
│                                          └───────────┘ │
└─────────────────────────────────────────────────────────┘

Gateway Capabilities (Industry Standard)

Based on enterprise AI gateway patterns:

| Capability | What It Does | Why It Matters |
| --- | --- | --- |
| PII/PHI Detection & Redaction | Scans prompts for identifiers, optionally redacts before forwarding | Prevents accidental PHI exposure |
| Policy Enforcement | Rules about which models, which data, which users | Least-privilege for AI |
| Token Budget Management | Per-user/per-app token limits | Cost control and abuse prevention |
| Semantic Caching | Cache similar queries to reduce API calls | Performance + cost |
| Model Routing | Route different request types to different models | Use cheaper models for simple tasks |
| Prompt Injection Defense | Filter malicious prompt patterns (OWASP LLM01) | Security |
| Full Audit Trail | Log every interaction with user, timestamp, input, output | Compliance + forensics |
| Data Residency Enforcement | Ensure requests only go to approved regions | Regulatory compliance |
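The first three capabilities compose into a simple admission check. A minimal sketch — the policy values and app IDs are hypothetical, and a production gateway would enforce these from config, not hardcoded constants:

```python
import re

# Hypothetical policy config — real values come from the gateway's admin plane.
ALLOWED_MODELS = {"claude-sonnet", "gpt-4o"}
TOKEN_BUDGETS = {"daisy-app": 100_000}                  # per-day tokens per app
PHI_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]   # SSN, illustrative only

def check_request(app_id, model, prompt, est_tokens, usage):
    """Return (allowed, reason) for a single LLM request at the gateway."""
    if model not in ALLOWED_MODELS:
        return False, f"model {model} not on allowlist"
    if usage.get(app_id, 0) + est_tokens > TOKEN_BUDGETS.get(app_id, 0):
        return False, "token budget exceeded"
    for pattern in PHI_PATTERNS:
        if pattern.search(prompt):
            return False, "unredacted identifier detected"
    return True, "ok"
```

The ordering is deliberate: cheap checks (allowlist, budget) run before content scanning, and every decision — allowed or denied — would also be written to the audit log.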

Common Implementations

  • Azure API Management AI Gateway (Microsoft Learn)
  • Databricks Mosaic AI Gateway (Databricks)
  • Apache APISIX AI Gateway (APISIX)
  • Portkey (portkey.ai)
  • Custom builds on Kong, NGINX, or Envoy with LLM-specific plugins

Phoenix Collector (Premera's Tracing)

Arize Phoenix is open-source AI observability:

  • Built on OpenTelemetry — vendor/framework agnostic
  • Traces LLM calls end-to-end: prompt → model → response → evaluation
  • Self-hostable — no external data egress required
  • Integrates with LangChain, LlamaIndex, direct API calls
  • Supports evaluation benchmarking and dataset versioning

Implication for DaisyAI: Our code must emit OpenTelemetry spans compatible with their Phoenix collector. This means instrumenting our LLM calls with the right trace context.



Enterprise AI Provisioning at Health Plans

What Onboarding Looks Like

Based on enterprise patterns and what Premera told us about their 5-team, 4-week SRP process:

Step 1: Identity & Access Management (Week 1)

  • Contractor/consultant account creation in Premera's IdP (likely Azure AD/Entra ID)
  • MFA enrollment (required; the proposed HIPAA Security Rule update would make MFA mandatory if finalized)
  • RBAC role assignment — critical to get right the first time
    • "Global access" vs. "data standard access" — the distinction Premera mentioned
    • Global = broader than needed, violates least privilege
    • Data standard = scoped to specific data domains (claims, clinical, member)
  • Background check, security training, HIPAA awareness certification

Step 2: Network Access (Week 1-2)

  • VPN provisioning into Premera VPC
  • Network segmentation — access only to approved subnets
  • Private endpoints for cloud services (no public internet paths)
  • Firewall rules allowing traffic to/from AI gateway

Step 3: Data Access (Week 2-3)

  • Determine which data domains the engagement requires
  • Provision database/data lake credentials with row/column-level security
  • PHI access requires specific justification per data element
  • Audit logging enabled on all data access

Step 4: AI Service Access (Week 3)

  • Register application with AI security gateway
  • Receive API keys/tokens scoped to approved models
  • Configure token budgets and rate limits
  • Set up Phoenix tracing integration

Step 5: Security Review & Go-Live (Week 3-4)

  • Security team reviews architecture, data flows, access patterns
  • Penetration test or vulnerability scan of deployed components
  • Sign-off from all 5 teams (identity, network, data, security, application)
  • Production access granted

Common Provisioning Pitfalls

| Pitfall | What Happens | How to Avoid |
| --- | --- | --- |
| Wrong access tier requested | Re-do the entire SRP (4 more weeks) | Be extremely specific about data domains needed upfront |
| Missing data elements from request | Can't access needed claims fields | Map out every data element before submitting SRP |
| VPN config issues | Can't reach internal services | Test connectivity immediately, don't wait |
| Expired credentials | Locked out, need IT ticket | Track expiration dates, renew proactively |
| Overly broad access request | Security team rejects, sends back for scoping | Start narrow, expand only if justified |



HIPAA and GenAI

Can You Send PHI to Claude/GPT APIs?

Yes, under specific conditions:

  1. BAA must be in place between the covered entity (or business associate) and the API provider
  2. API provider must be HIPAA-eligible for that specific service
  3. Data handling must meet HIPAA Security Rule requirements (encryption, access controls, audit trails)
  4. No training on PHI — the provider must contractually agree not to use PHI for model training

Anthropic (Claude)

  • BAA available on Enterprise plans — Anthropic Privacy Center
  • HIPAA-ready Enterprise plans available — Claude Help Center
  • Claude for Healthcare launched Jan 2026 with HIPAA-compliant configurations for enterprise
  • BAA covers first-party API usage; specific use cases reviewed before BAA execution
  • Key: BAAs signed before Dec 2, 2025 cover API only, not the Enterprise plan
  • Also available via AWS Bedrock (under AWS BAA) — this is likely Premera's route

OpenAI

  • BAA available for API services — OpenAI Help Center
  • Available via Azure OpenAI Service (under Microsoft BAA) — more common for enterprise healthcare
  • Zero data retention option available
  • Does not use customer data for training when BAA is active

Critical BAA Clauses for AI Systems

Standard BAAs need enhancement for LLM use cases. Must explicitly address:

| Clause | Why It Matters |
| --- | --- |
| No training on PHI | Prevent patient data from entering model weights |
| Data retention limits | Define how long prompts/responses are stored |
| Subcontractor flow-down | BAA obligations pass to any sub-processors |
| Breach notification timeline | Usually 60 days max under HIPAA, often negotiated shorter |
| Model versioning | Which model versions are covered by the BAA |
| Incident response | Process for AI-specific incidents (hallucination causing harm, data exposure in outputs) |

HIPAA Security Rule NPRM (Dec 2024)

The proposed update to the HIPAA Security Rule explicitly addresses AI:

  • Requires inventory of all AI technologies that interact with ePHI
  • AI tools must be included in risk analysis and risk management
  • Vulnerability scanning every 6 months, penetration testing annually
  • MFA required (the NPRM would remove the "addressable" designation — every implementation specification becomes "required")
  • Entities must monitor for known vulnerabilities and patch promptly

If finalized, this means every payer using AI with PHI must formally track and assess their AI systems as part of HIPAA compliance.



Cloud Platform Comparison

Azure OpenAI Service

| Feature | Status |
| --- | --- |
| HIPAA eligible | Yes (text models; preview features excluded) |
| BAA mechanism | Microsoft DPA (Data Protection Addendum) — automatic for all customers |
| Data retention | No prompt/completion data stored for training; opt-out of all logging available |
| Network isolation | VNet, private endpoints, Azure AD RBAC, Conditional Access |
| PHI in prompts | Allowed under BAA with proper safeguards |
| Realtime API (audio) | NOT HIPAA-eligible (still in preview) |
| Model training on data | Explicitly prohibited — customer data never used to retrain |

Likely Premera pattern: Azure OpenAI via private endpoint within their Azure VPC, accessed through AI security gateway.

AWS Bedrock

| Feature | Status |
| --- | --- |
| HIPAA eligible | Yes — included in AWS BAA |
| BAA mechanism | AWS Business Associate Addendum |
| Data retention | Customer data not shared with model providers, not used to improve base models |
| Network isolation | AWS PrivateLink, VPC endpoints, IAM with least privilege |
| PHI in prompts | Allowed under BAA |
| Encryption | AES-256 at rest, TLS 1.2+ in transit |
| Models available | Claude (Anthropic), Llama, Titan, others |
| Monitoring | CloudTrail + CloudWatch (configured to exclude PHI from logs) |

Shared Responsibility: AWS secures the infrastructure; customer secures their data, access controls, and application logic.
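For reference, calling Claude on Bedrock uses the Anthropic Messages format that AWS documents (`anthropic_version: "bedrock-2023-05-31"`). A minimal body builder — the boto3 client setup, VPC endpoint configuration, and model ID are omitted here because those come from the customer's environment:

```python
import json

def build_claude_bedrock_body(prompt: str, max_tokens: int = 512) -> str:
    """Build the JSON request body for Anthropic models on AWS Bedrock
    (Messages API format). Passed as `body` to bedrock-runtime invoke_model."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })
```

In a Premera-style deployment this request would travel over a PrivateLink VPC endpoint, never the public internet, with CloudTrail logging the API call but not the prompt contents.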

On-Premise / Private Inference

| Model | Use Case | Notes |
| --- | --- | --- |
| Llama 3 (70B/8B) | General clinical NLP | Meta open-source, deployable on-prem |
| Meditron (7B/70B) | Medical-domain tasks | Built on Llama 2, trained on medical corpus |
| Mistral 7B | Clinical note summarization | Efficient, good for constrained environments |
| John Snow Labs | Clinical NLP/de-identification | Commercial support, HIPAA-focused |

On-prem models deployed via vLLM, NVIDIA Triton, or similar inference servers. Cost-effective at scale but requires ML engineering capacity.



AI Governance Frameworks for Payers

NIST AI Risk Management Framework (AI RMF 1.0)

The primary framework payers reference. Four core functions:

  1. GOVERN — Establish policies, roles, accountability for AI risk
  2. MAP — Identify and categorize AI risks in context
  3. MEASURE — Assess and track AI risks quantitatively
  4. MANAGE — Prioritize and act on identified risks

2025 updates pushed organizations from planning to operationalizing AI risk management. RMF 1.1 guidance expected through 2026.

ECRI Institute named AI the #1 health technology hazard for 2025, pushing payer adoption of formal frameworks.

AHIP Position (Health Plan Industry)

AHIP published a one-pager (May 2025) emphasizing:

  • AI increases access to quality care and improves health outcomes
  • Health plans are investing in governance models and accountability frameworks
  • Common challenges: fragmented data, unclear value measurement, limited governance, difficulty scaling responsibly

AHIP hosted sessions in 2025 on how health plans can manage AI portfolios for strategic alignment with enterprise goals and regulatory expectations.

ONC Health IT Certification (HTI-1)

The HTI-1 Final Rule established first-of-its-kind AI transparency requirements for certified health IT:

  • Algorithmic transparency: Provide baseline information about AI/predictive algorithms
  • FAVES criteria: Fairness, Appropriateness, Validity, Effectiveness, Safety
  • USCDI v3 as baseline standard by Jan 1, 2026
  • Compliance deadline: Feb 28, 2026

However: The Trump administration's HTI-5 proposed rule (2025) would remove "model card" requirements and eliminate 50%+ of certification criteria. Status uncertain — watch this space.

Premera's Framework

Premera's governance maps to industry patterns:

  • Cross-functional Data & AI Ethics Committee
  • Five principles (transparent, fair, private/secure, accountable, improving)
  • CISO involvement in AI governance
  • White House safety pledge signatory



Emerging Regulations

Federal: CMS Rules for Medicare Advantage AI

CMS Guidance on AI in Coverage Decisions:

  • MA orgs may use algorithms to support decisions, but full responsibility remains with the insurer
  • Every coverage decision must rely on individual member circumstances — not just algorithmic output
  • All coverage criteria used by algorithms must be publicly accessible — no black-box decisions
  • Two explicit prohibitions:
    1. Predictive algorithms cannot apply non-public internal criteria
    2. AI cannot shift or alter coverage criteria over time

Prior Authorization Transparency (2026):

  • MA orgs must publish list of all items/services requiring PA
  • Must report 8 distinct PA metrics (approval/denial rates, turnaround times) at contract level
  • Suspended: Health equity expertise requirements for UM committees and plan-level disparity reports (June 2025)

WISeR AI Pilot Program (January 2026):

  • CMS testing AI for PA screening on select Medicare services
  • AI companies handle initial screening; human clinician reviews all denials
  • AI companies prohibited from compensation tied to denial rates
  • Covers: skin/tissue substitutions, nerve stimulator implants, knee arthroscopy


Federal: HIPAA Security Rule NPRM

See HIPAA and GenAI section above. Key additions if finalized:

  • Mandatory AI technology inventory
  • AI included in formal risk analysis
  • All security specs become "required" (no more "addressable")
  • Vulnerability scanning every 6 months, pen testing annually

State-Level AI Insurance Regulations

NAIC Model Bulletin (Adopted by 24 States as of March 2025)

The NAIC Model Bulletin on Use of AI Systems by Insurers (Dec 2023) requires:

  • Written program for responsible AI use
  • Risk management and internal audit for AI systems
  • Mitigation of adverse consumer outcomes
  • Governance framework covering all AI that affects regulated insurance practices

Adopted by: AK, CT, DE, IL, IA, KY, MD, MA, MI, NE, NV, NH, NJ, NC, PA, RI, VT, VA, WA, WV, WI, DC, and others.

California (Effective Jan 2026)

  • Health plans/insurers cannot rely solely on automated tools for coverage decisions
  • Any adverse determination must be reviewed by a licensed clinician
  • Must disclose when AI contributes to a decision
  • Accessible appeals processes required
  • GenAI developers must disclose training data sources and apply watermarking

Colorado AI Act (Enforcement June 30, 2026)

  • Toughest state framework: disclosure required when AI is used in high-risk decisions
  • Annual impact assessments
  • Anti-bias controls
  • Record-keeping for 3+ years
  • Applies to health benefit plans (effective Oct 15, 2025 for unfair discrimination rules)

Connecticut

  • Limits insurers' use of AI to deny medical care coverage
  • Aligned with NAIC Model Bulletin



Practical Deployment Patterns

Pattern 1: Proxy Gateway with De-identification (Most Deployed)

App → De-ID Engine → AI Gateway → Cloud LLM API (BAA) → Gateway → Re-ID → App
  • De-identify PHI before it hits the LLM
  • Gateway enforces policies, logs everything
  • Re-link identifiers on the way back
  • Used when: organization wants to minimize PHI exposure to cloud providers

Tools: John Snow Labs NLP, AWS Comprehend Medical, custom NER models

Pattern 2: In-VPC Cloud API with Full PHI (Premera's Pattern)

App → AI Gateway (VPC) → Private Endpoint → Cloud LLM API (BAA) → Gateway → App
         │                                                            │
         └────────────── Phoenix Collector (Traces) ──────────────────┘
  • PHI stays within VPC boundary
  • Cloud API accessed via PrivateLink/private endpoint (no public internet)
  • BAA covers PHI handling end-to-end
  • Full audit trail via gateway + tracing
  • Used when: organization has BAA with LLM provider and strong network controls
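From the application's side, this pattern means the app builds requests against the gateway, never the provider. A sketch of the client half — the URL, header names, and payload shape are all placeholders until we get Premera's gateway API docs:

```python
import json
import urllib.request

# Hypothetical endpoint — the real URL and auth scheme come from gateway
# registration (Step 4 of the SRP process).
GATEWAY_URL = "https://ai-gateway.internal.premera.example/v1/chat"

def build_gateway_request(prompt: str, model: str, token: str) -> urllib.request.Request:
    """Build a request to the AI gateway. Note the app holds only a
    gateway-scoped token, never Anthropic/OpenAI keys."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return urllib.request.Request(
        GATEWAY_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The design point: if the gateway rotates provider keys, changes models, or tightens policy, the app code above is unaffected — that indirection is the whole value of the pattern.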

Pattern 3: Private Inference (Air-Gapped)

App → Local Inference Server (vLLM/Triton) → App
  • Self-hosted models on organization's hardware/VPC
  • Zero data egress
  • Models: Llama 3, Mistral, Meditron, domain-fine-tuned variants
  • Used when: maximum security requirements, or when cloud APIs can't handle specific use cases

Pattern 4: Hybrid (Emerging)

                  ┌─ Simple tasks → Local small model (7B)
App → Router ─────┤
                  └─ Complex tasks → Cloud API (Claude/GPT) via gateway
  • Route by complexity/sensitivity
  • Sensitive summarization → local model
  • Complex clinical reasoning → cloud API under BAA
  • Cost optimization + security optimization
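The router at the heart of this pattern can be as simple as a rule table. A sketch with made-up task types and model names — real routing criteria (and whether sensitivity or complexity wins) are a policy decision, not a given:

```python
def route(task_type: str, contains_phi: bool) -> str:
    """Pick an inference target by task complexity and data sensitivity.
    Task types and targets here are illustrative placeholders."""
    if contains_phi and task_type == "summarization":
        return "local-7b"          # sensitive summarization stays in-house
    if task_type in {"classification", "extraction"}:
        return "local-7b"          # simple tasks: cheaper local model
    return "cloud-via-gateway"     # complex clinical reasoning under BAA
```

Rule-based routing like this is auditable (you can explain every routing decision to a security reviewer), which matters more in a payer environment than squeezing out the last bit of cost optimization.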

What's Actually Working in Production

Based on the research:

  1. Azure OpenAI via private endpoint is the most common pattern at large payers (Microsoft's healthcare presence is dominant)
  2. AWS Bedrock is growing, especially for organizations already on AWS
  3. De-identification before LLM is the "safe" approach most compliance teams accept first
  4. Full-PHI via BAA is where mature organizations are moving — the de-ID approach loses too much clinical context
  5. On-prem inference is mostly at academic medical centers doing research, not yet common at payers
  6. AI gateways are becoming standard — the pattern of centralized control, audit, and policy enforcement is converging across the industry



Implications for DaisyAI at Premera

What We Must Do

  1. Instrument for Phoenix: All LLM calls must emit OpenTelemetry traces compatible with Premera's Phoenix collector. This is non-negotiable — budget time for instrumentation.

  2. Route through their gateway: No direct API calls to Anthropic/OpenAI. Our code calls Premera's AI gateway endpoint. We need their gateway URL, auth credentials, and supported models.

  3. Submit precise SRP requests: Map out exactly which data domains we need access to (claims, clinical, member demographics, provider data). Specify the exact access tier. Mistakes cost 4 weeks.

  4. Prepare for audit: Every design decision about data flow, PHI handling, and LLM usage will be scrutinized. Document architecture decisions proactively.

  5. No data extraction: Nothing leaves the VPC. No copying data to our systems. No screenshots of PHI. No local development with real data.

What We Should Prepare Before Day 1

| Item | Why |
| --- | --- |
| Architecture doc showing data flows | Security team will ask for this immediately |
| List of data elements needed per use case | Prevents SRP rework |
| OpenTelemetry integration plan | Shows we understand their tracing requirements |
| De-identification strategy (even if not primary pattern) | Demonstrates defense-in-depth thinking |
| HIPAA training certificates | Required for all personnel accessing PHI |
| Incident response plan | What happens if our system exposes PHI |

Risk Factors

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| SRP provisioning delays | High | 4+ weeks lost | Submit SRPs immediately, be precise |
| Wrong access tier provisioned | Medium | 4 weeks rework | Review with Premera contact before submission |
| Gateway compatibility issues | Medium | Days of debugging | Get gateway API docs early, build against them |
| Phoenix tracing format mismatch | Low | Days of rework | Validate OTel span format with their team |
| Model availability via gateway | Low | Architecture change | Confirm which Claude/GPT models are available |
| Regulatory change mid-engagement | Medium | Scope change | Track CMS/state rules actively |

Key Takeaways

  1. Premera's approach is Pattern 2 — in-VPC processing with BAA-covered cloud APIs, full PHI in prompts, centralized gateway, full tracing. This is the mature enterprise pattern.

  2. Their security posture is shaped by the 2014 breach — expect conservatism, documentation requirements, and thorough audit. Work with it, not against it.

  3. The regulatory environment is tightening — CMS AI guidance, NAIC model bulletin adoption (24 states), California/Colorado laws, HIPAA NPRM all create compliance pressure. Our product helps payers navigate this.

  4. Anthropic and OpenAI both offer BAAs — but Premera holds the relationships. We operate under their umbrella. No need for us to establish separate BAAs.

  5. The AI gateway pattern is industry-standard — this isn't Premera being unusual. Every large payer is converging on this architecture. What we learn here transfers directly to other payer deployments.

  6. Provisioning is the bottleneck — not technology. 4 weeks through 5 teams. Plan accordingly. Be precise in access requests. Build against mock data until real access is granted.
