Premera Day 0–30 Execution Plan
Created: 2026-02-16 Status: DRAFT — needs Thomas's answers to the open questions below before finalizing
Context
Premera engagement: FDE/consulting at $150/hr, ~40h/week, 3-6 months. Two work streams: Auto-Authorization (augmenting the existing system to scale from ~6 conditions to hundreds) and Appeals (going to production March 1 on the Elementum workflow). All work happens inside Premera's environment (Azure AKS, their API keys, Phoenix tracing, data stays in their VPC).
Week 1: Environment + Data + First Workflow Baseline (Days 1-5)
Outcomes
- Credentialed and connected — laptops/VPN/credentials provisioned, access to Facets, ADT feeds, medical policies, Elementum
- Colt/Jamie kickoff complete — daily standup cadence agreed, Slack/Teams channel live, POC confirmed
- Appeals workflow mapped end-to-end — current state documented (2 appeal types in production March 1), AI decision points identified within Elementum, data flow diagrammed
- Auto-auth current state understood — reviewed the existing 6-condition auto-auth logic, identified where GenAI layers in, and completed a gap analysis for scaling to hundreds of policies
- First baseline measurements captured — current appeal processing time, current auto-auth coverage %, nurse review time per case
Daily Breakdown
| Day | Focus | Deliverable |
|---|---|---|
| 1 | Onboarding: credentials, environment setup, security orientation | Access confirmed, dev environment running |
| 2 | Appeals deep-dive with Colt/Jamie: walk through Elementum workflow, 2 appeal types, AI touchpoints | Appeals workflow diagram v1 |
| 3 | Auto-auth deep-dive: review existing system, 6 conditions, data sources, GenAI integration points | Auto-auth architecture doc v1 |
| 4 | Data access: connect to ADT feeds, Facets, medical policies. Understand data schemas, quality, gaps | Data access inventory + gap list |
| 5 | Baseline metrics: pull current processing times, volumes, error rates. Week 1 retro with Colt | Baseline metrics doc + Week 2 plan |
❓ Questions Thomas Needs to Answer
- Q1: Provisioning timeline — Nathan warned that SRP tickets take 4 weeks. Has Colt pre-staged any access? Do we need to submit requests NOW, before the contract is signed?
- Q2: Which appeals work stream first? — They go live March 1 with 2 types. Do we embed in the March 1 launch or start on the next appeal types?
- Q3: Who is our daily POC? — Colt, Jamie, or someone on Nathan's team? This determines how fast we move.
- Q4: Can we get a sandbox/staging environment? — Or are we working directly in their production pipeline from day 1?
- Q5: Security review status — They haven't received our formal security deck yet. Is that a blocker for access?
Week 2: Pilot Workflow with Synthetic Data (Days 6-10)
Outcomes
- Appeals pilot running on synthetic data — AI parsing + reasoning pipeline processing sample appeal documents through Elementum decision points
- Auto-auth expansion prototype — 2-3 new conditions selected beyond the existing 6, draft policy-to-criteria mapping built
- LLM pipeline integrated with their security gateway — all calls routed through Premera's AI gateway, Phoenix tracing active (see the sketch after this list)
- First nurse feedback captured — showed appeal AI output to 1-2 nurses, documented what works / what's wrong
- Technical architecture documented — how our code fits into their stack, deployment approach, testing strategy
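A minimal sketch of the gateway + tracing integration called out above, assuming Premera's AI gateway exposes an OpenAI-compatible endpoint and that their Phoenix collector is reachable from our workloads. Every URL, environment variable, and model name below is a placeholder until Colt's team gives us real values.

```python
import os

from openai import OpenAI
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Point a tracer provider at Premera's Phoenix collector
# (endpoint is a placeholder; the real URL comes from their team).
tracer_provider = register(
    project_name="premera-appeals-pilot",
    endpoint=os.environ["PHOENIX_COLLECTOR_ENDPOINT"],
)

# Auto-instrument OpenAI-client calls so every inference call is traced.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Route all LLM traffic through the AI security gateway rather than calling
# a model provider directly; base_url and key names are placeholders.
client = OpenAI(
    base_url=os.environ["PREMERA_AI_GATEWAY_URL"],
    api_key=os.environ["PREMERA_AI_GATEWAY_KEY"],
)

response = client.chat.completions.create(
    model="approved-model-name",  # whichever model clears Q8
    messages=[{"role": "user", "content": "Summarize this appeal..."}],
)
print(response.choices[0].message.content)
```

This assumes the gateway speaks the OpenAI wire protocol; if it doesn't, the client swaps out but the Phoenix registration stays the same.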
Key Activities
| Activity | Owner | Dependency |
|---|---|---|
| Build appeal document parser (extract key claims, supporting evidence, provider arguments; see the sketch below this table) | Thomas | Access to sample appeal docs |
| Implement criteria-matching logic against InterQual for appeal review | Thomas | InterQual access/API or documentation |
| Create synthetic appeal dataset (10-20 cases covering 2 appeal types) | Both | Understanding of appeal types from Week 1 |
| Auto-auth: map 2-3 new medical policies to automation logic | Michael | Medical policy documents |
| Integrate LLM calls through Premera's AI security gateway | Thomas | Gateway credentials + docs |
| Set up Phoenix tracing for all AI inference calls | Thomas | Phoenix collector access |
| Demo to Colt: "here's what the AI sees when processing an appeal" | Both | Working prototype |
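The parser activity above could start roughly like the sketch below, reusing the gateway-routed client from the Week 2 outcomes sketch. The output schema and prompt are assumptions to be replaced once we've seen real appeal documents, and whether the gateway passes through `response_format` is unconfirmed.

```python
import json
from dataclasses import dataclass

# `client` is the gateway-routed OpenAI-compatible client from the earlier
# sketch; the model name is a placeholder pending Q8.
MODEL = "approved-model-name"

@dataclass
class ParsedAppeal:
    key_claims: list[str]           # what the member/provider is asking for
    supporting_evidence: list[str]  # clinical facts cited in the appeal
    provider_arguments: list[str]   # reasoning given for overturning the decision

PARSE_PROMPT = """Extract from the appeal document below:
1. key_claims, 2. supporting_evidence, 3. provider_arguments.
Return a JSON object with those three keys, each a list of strings.

Appeal document:
{document}"""

def parse_appeal(client, document_text: str) -> ParsedAppeal:
    """Run one appeal document through the LLM and return structured fields."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": PARSE_PROMPT.format(document=document_text)}],
        response_format={"type": "json_object"},  # ask for strict JSON back
    )
    fields = json.loads(response.choices[0].message.content)
    return ParsedAppeal(
        key_claims=fields.get("key_claims", []),
        supporting_evidence=fields.get("supporting_evidence", []),
        provider_arguments=fields.get("provider_arguments", []),
    )
```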
❓ Questions Thomas Needs to Answer
- Q6: InterQual access model — Can we call InterQual programmatically? Or is it a UI-only tool nurses use manually? This changes the architecture significantly.
- Q7: What does "synthetic data" mean here? — Can Premera provide de-identified real cases? Or do we generate from scratch? De-identified real data is 10x more useful.
- Q8: LLM model choice — Premera has Anthropic + OpenAI relationships. Which models are approved? Any restrictions on Claude vs. GPT for PHI?
- Q9: How do we handle the "no automated denials" constraint technically? — Every AI output that leans toward denial must route to human. Need to design the confidence threshold / escalation logic early.
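A first-pass sketch of the routing logic Q9 asks about: the pipeline may only act autonomously on high-confidence approvals, and everything else, including anything deny-leaning, lands in a nurse review queue. The threshold and names are illustrative, not agreed values.

```python
from dataclasses import dataclass
from enum import Enum

class Route(Enum):
    AUTO_APPROVE = "auto_approve"
    NURSE_REVIEW = "nurse_review"

@dataclass
class AiRecommendation:
    decision: str      # "approve" or "deny" as suggested by the model
    confidence: float  # 0.0-1.0, however we end up calibrating it

# Illustrative threshold; the real value comes out of the Week 3-4 quality loop.
AUTO_APPROVE_THRESHOLD = 0.90

def route_case(rec: AiRecommendation) -> Route:
    """Enforce the 'no automated denials' constraint.

    The system may only act on high-confidence approvals; every deny-leaning
    or uncertain output goes to a human reviewer.
    """
    if rec.decision == "approve" and rec.confidence >= AUTO_APPROVE_THRESHOLD:
        return Route.AUTO_APPROVE
    return Route.NURSE_REVIEW
```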
Weeks 3-4: Quality Loop, Decision-Support Outputs, Internal Demos (Days 11-20)
Outcomes
- Appeals quality loop operational — AI outputs reviewed by nurses, feedback captured, prompts/logic tuned, measurable improvement across iterations
- Auto-auth expansion validated — 2-3 new conditions tested against real (de-identified) data, accuracy measured, ready for production review
- Decision-support outputs formatted for nurse workflow — outputs match Premera's templates, integrate into Elementum, require minimal manual editing
- Internal demo to Chad Murphy (CCO) — first time the business leader sees AI-augmented appeal review + auto-auth expansion in action
- Week 4 executive summary delivered — quantified results vs. baseline, roadmap for months 2-3, recommendation for expanded scope
Key Activities
| Activity | Owner | Dependency |
|---|---|---|
| Run 50+ appeal cases through the pipeline, measure accuracy vs. nurse decisions (see the sketch below this table) | Both | Working pipeline + test cases |
| Build feedback UI/form for nurses to rate AI outputs | Thomas | Nurse availability |
| Tune prompts based on feedback (3+ iteration cycles) | Thomas | Feedback data |
| Format outputs to match Premera's existing summary templates | Michael | Template access (from Week 1) |
| Auto-auth: run new conditions against historical decisions, measure match rate | Both | Historical decision data |
| Prep Chad Murphy demo: curated examples, before/after, metrics | Both | Working prototypes |
| Draft Month 2-3 roadmap: what's next, what's needed, scaling plan | Both | Week 1-3 learnings |
| Deliver Week 4 executive summary | Michael | All above |
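For the 50+-case accuracy run flagged in the table above, a simple scoring sketch, assuming each test case carries one AI recommendation and one nurse decision on record; the field names are placeholders.

```python
from collections import Counter

def score_against_nurses(cases: list[dict]) -> dict:
    """Compute overall and per-appeal-type agreement between AI and nurse decisions.

    Each case is expected to look like:
    {"appeal_type": "...", "ai_decision": "approve", "nurse_decision": "approve"}
    """
    total = Counter()
    agree = Counter()
    for case in cases:
        appeal_type = case["appeal_type"]
        total[appeal_type] += 1
        if case["ai_decision"] == case["nurse_decision"]:
            agree[appeal_type] += 1

    overall = sum(agree.values()) / sum(total.values()) if total else 0.0
    by_type = {t: agree[t] / total[t] for t in total}
    return {"overall_agreement": overall, "by_appeal_type": by_type}

# Example: 2 of 3 cases match the nurse's call.
sample = [
    {"appeal_type": "A", "ai_decision": "approve", "nurse_decision": "approve"},
    {"appeal_type": "A", "ai_decision": "deny", "nurse_decision": "approve"},
    {"appeal_type": "B", "ai_decision": "approve", "nurse_decision": "approve"},
]
print(score_against_nurses(sample))  # overall_agreement ~0.67
```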
❓ Questions Thomas Needs to Answer
- Q10: Have you met Chad Murphy yet? — He's the CCO and key business decision-maker. Colt's team is technical. Chad controls whether this scales. When do we get in front of him?
- Q11: What does "success" look like to Chad vs. Colt? — Colt cares about technical capability. Chad cares about nurse productivity, compliance, cost. Need to present metrics that matter to both.
- Q12: Vendor insourcing target — Colt mentioned "Caroline" for advanced imaging reviews as a disruption candidate. Is that a Week 3-4 deliverable or Month 2+?
- Q13: What's the billing structure during ramp? — Full $150/hr from day 1? Or a reduced rate during discovery? This affects how aggressive Week 1 can be.
Success Metrics
| Metric | Baseline (capture Week 1) | Week 2 Target | Week 4 Target | How Measured |
|---|---|---|---|---|
| Appeal processing time | TBD (current manual) | First AI-assisted time | 30%+ reduction | Time tracking per case |
| Auto-auth condition coverage | 6 conditions | 6 (still building current-state understanding) | 8-9 conditions | Count of automated policies |
| AI output accuracy (appeals) | N/A | First measurements | >85% nurse agreement | Nurse review sample |
| AI output accuracy (auto-auth) | Existing system baseline | N/A | Match or exceed existing | Comparison to historical decisions |
| Review quality consistency | TBD (variance across nurses) | First measurements | Measurable reduction in variance | Inter-reviewer agreement (see sketch below this table) |
| Stakeholder confidence | Low (haven't seen it work) | Colt team bought in | Chad demo positive | Qualitative feedback |
| Cycle time: idea → deployed | TBD (Nathan says weeks) | First deploy | Repeatable deploy process | Calendar tracking |
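For the review quality consistency row, one candidate way to quantify inter-reviewer agreement is Cohen's kappa over nurse decisions on the same cases. This is a sketch of the math, not an agreed metric; the measurement approach should be confirmed with Colt's team.

```python
from collections import Counter

def cohens_kappa(reviewer_a: list[str], reviewer_b: list[str]) -> float:
    """Cohen's kappa between two reviewers labeling the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each reviewer's label mix.
    """
    assert len(reviewer_a) == len(reviewer_b) and reviewer_a
    n = len(reviewer_a)

    # Observed agreement: fraction of cases where both reviewers match.
    p_o = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n

    # Chance agreement from each reviewer's label distribution.
    counts_a = Counter(reviewer_a)
    counts_b = Counter(reviewer_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[label] / n) * (counts_b[label] / n) for label in labels)

    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)

# Example: two nurses reviewing the same five appeal outcomes.
nurse_1 = ["uphold", "overturn", "uphold", "uphold", "overturn"]
nurse_2 = ["uphold", "overturn", "overturn", "uphold", "overturn"]
print(round(cohens_kappa(nurse_1, nurse_2), 2))  # ~0.62
```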
Risk-Adjusted Timeline
| Risk | Impact | Mitigation | Plan B |
|---|---|---|---|
| Provisioning takes 4+ weeks (Nathan's warning) | Week 1 is blocked | Pre-submit access requests NOW, before the contract is signed. Colt to champion internally. | Work on architecture/design docs + synthetic data while waiting |
| Data access is harder than expected | Can't baseline or build | Start with whatever Colt's team already has extracted. Use their existing data pipelines. | Build against synthetic data, validate later with real |
| InterQual is UI-only (no API) | Can't automate criteria matching | Build our own criteria extraction from policy docs | Manual criteria encoding for pilot conditions |
| Appeals March 1 launch is chaotic | Team too busy for us | Focus on auto-auth first, pick up appeals after launch stabilizes | Observe/document March 1 launch, design improvements for post-launch |
| Chad Murphy doesn't engage | Business side doesn't champion us | Use Colt as bridge. Deliver results that force the conversation. | Build relationship through Romilla (clinician team lead) |
Pre-Contract Actions (Do NOW)
These don't require a signed contract:
- Submit security deck to Premera — they haven't seen it yet, and it may be required for access
- Ask Colt about pre-staging access requests — if SRP tickets take 4 weeks, starting now saves the entire first month
- Request sample data — de-identified appeal docs, medical policies, auto-auth decision logs
- Confirm appeals March 1 timeline — are we joining that launch or starting after?
- Schedule Chad Murphy intro — even a 15-min hello before the contract is signed builds trust
- Decide individual contractor vs. company contract — Nathan recommends company. How much time does that add?