The Lab

Experiments in quantifying the uninsurable.

I am currently building AION, an AI-native underwriting engine designed to bridge the gap between technical space data and insurance capital.

This feed documents the build in real time: the technical bottlenecks, the architectural pivots, and the solutions required to deploy a working Risk Engine.

Active Module: M2 (Context & Explainability)

Saif Shah

AION: Module 1 Phase 3-5

By the end of Phase 2, AION's risk engine could model realistic failure curves and price missions—but it wasn't production-ready. Phases 3, 4, and 5 transformed it from prototype to platform: 7 phase-specific actuarial models, regulatory compliance tracking, Lloyd's-grade UI, and <0.3s response times. Built using AI agent orchestration across research, planning, and execution. The result: a dependable underwriting workbench that real underwriters can trust. This is the story of that transformation—the obstacles, the iterations, and the systems thinking that made it possible.

From Prototype to Production: Building an Underwriting Engine

By the end of Phase 2, AION's risk engine could model realistic failure curves, price missions, and explain its reasoning.

That was a milestone — but it still wasn't a tool an underwriter could rely on day-to-day.

The gap between "working model" and "production system" turned out to be wider than I expected. Phases 3, 4, and 5 were about closing it.

These phases weren't about training new models or adding clever features. They were about turning a model into a system: stable, secure, testable, exportable, and usable.

AION had to shift from a prototype into something with structure, memory, repeatability, and the beginnings of operational discipline.

Here's how that transformation unfolded.

Phase 3 — Systemisation: Modeling How Insurance Floors Actually Work

Phase 3 focused on building the spine of Module 1: the architecture that would let this engine operate inside a real underwriting syndicate.

This wasn't just technical refactoring—it was modeling how insurance floors actually work.

A Unified Architecture

I consolidated the entire module behind a single flow:

Mission → Risk Engine → Pricing Engine → Explainability → Exports/UI

This introduced clean separation between:

  • Technical risk

  • Business factors

  • Compliance constraints

  • Environmental modifiers

  • Explainability

Everything finally pointed in the same direction. The engine could now answer questions like: "Is this mission risky because of the tech, or because of the operator's track record?" Those layers were now distinct.
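A minimal sketch of that flow, assuming illustrative names and placeholder numbers (these are not AION's actual classes or calibrated values):

```python
from dataclasses import dataclass

# Illustrative stand-ins for the real engines; AION's actual classes differ.
@dataclass
class Mission:
    orbit: str              # "LEO", "MEO", "GEO"
    trl: int                # technology readiness level
    lifetime_years: float
    sum_insured: float

def assess_risk(mission: Mission) -> dict:
    # Technical risk layer (placeholder numbers).
    base = {"LEO": 0.08, "MEO": 0.05, "GEO": 0.03}[mission.orbit]
    return {"annual_loss_prob": base, "drivers": ["orbit", "trl"]}

def price(mission: Mission, risk: dict) -> dict:
    # Business/pricing layer: premium as a consequence of the risk output.
    pure = risk["annual_loss_prob"] * mission.sum_insured
    return {"pure_risk": pure, "premium": pure * 1.35}  # loading is illustrative

def explain(risk: dict, pricing: dict) -> list[str]:
    # Explainability layer: every number traces back to a driver.
    return [f"driver: {d}" for d in risk["drivers"]] + [f"pure risk = {pricing['pure_risk']:.0f}"]

def run_pipeline(mission: Mission) -> dict:
    # Mission → Risk Engine → Pricing Engine → Explainability → Exports/UI
    risk = assess_risk(mission)
    pricing = price(mission, risk)
    return {"risk": risk, "pricing": pricing, "explanation": explain(risk, pricing)}

print(run_pipeline(Mission("GEO", trl=9, lifetime_years=15, sum_insured=200_000_000)))
```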

RBAC: Permissions That Understand Hierarchy

I implemented Role-Based Access Control (RBAC), distinguishing between Viewers, Analysts, and Underwriters.

This wasn't just permission bits; it was about modeling the actual hierarchy of an insurance floor:

  • A Viewer can see assessments

  • An Analyst can price missions

  • But only an Underwriter can approve exports and sign off on quotes

Small detail, but it signals that AION understands the business, not just the math.
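As a rough sketch of how that hierarchy can be enforced (a FastAPI-style dependency; the endpoints, header-based auth, and helper names are illustrative, not AION's actual implementation):

```python
from enum import Enum
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

class Role(str, Enum):
    VIEWER = "viewer"
    ANALYST = "analyst"
    UNDERWRITER = "underwriter"

# Ordered so higher roles inherit the permissions of the roles below them.
ROLE_RANK = {Role.VIEWER: 0, Role.ANALYST: 1, Role.UNDERWRITER: 2}

def current_role(x_role: str = Header(default="viewer")) -> Role:
    # Stand-in for real authentication: the role would normally come from a verified token.
    try:
        return Role(x_role)
    except ValueError:
        raise HTTPException(status_code=401, detail="unknown role")

def require(minimum: Role):
    def checker(role: Role = Depends(current_role)) -> Role:
        if ROLE_RANK[role] < ROLE_RANK[minimum]:
            raise HTTPException(status_code=403, detail=f"requires {minimum.value} or above")
        return role
    return checker

@app.get("/api/v1/assessments")            # Viewers and above can read
def list_assessments(role: Role = Depends(require(Role.VIEWER))):
    return {"assessments": []}

@app.post("/api/v1/exports/approve")       # only Underwriters can sign off
def approve_export(role: Role = Depends(require(Role.UNDERWRITER))):
    return {"approved_by": role.value}
```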

A Proper API Layer

The API now has:

  • Versioned routes (/api/v1/...)

  • Strict request/response schemas

  • Predictable validation

  • Stable error shapes

Once the schema was locked, downstream components stopped breaking. This was a small change with a huge effect: integration became stable.
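A sketch of what a locked, versioned contract looks like in practice (Pydantic v2-style models; the field names are illustrative):

```python
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

# The request/response schemas are the contract: once locked, the UI can rely on them.
class AssessmentRequest(BaseModel):
    mission_id: str
    orbit: str = Field(pattern="^(LEO|MEO|GEO)$")
    lifetime_years: float = Field(gt=0, le=25)

class AssessmentResponse(BaseModel):
    mission_id: str
    annual_loss_prob: float
    risk_band: str

class ErrorShape(BaseModel):
    code: str
    message: str

@app.post("/api/v1/assessments", response_model=AssessmentResponse,
          responses={422: {"model": ErrorShape}})
def assess(req: AssessmentRequest) -> AssessmentResponse:
    # Validation happens at the boundary; anything that reaches here is well-formed.
    return AssessmentResponse(mission_id=req.mission_id, annual_loss_prob=0.034, risk_band="B")
```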

The Orchestrator Pattern

The Risk Engine became modular, predictable, and extensible.

Each model (Weibull, Bayesian, Reliability, Environment) now plugs into a single orchestrator that handles:

  • Model execution

  • Graceful fallback

  • Agreement scoring

  • Normalisation

  • Unified output

This eliminated entire categories of bugs and made calibration easier.
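A compressed sketch of the pattern (the model functions and numbers below are illustrative stand-ins, not AION's actual models):

```python
from statistics import mean, pstdev

def weibull_model(mission):     return 0.042   # placeholder outputs for illustration
def bayesian_model(mission):    return 0.038
def reliability_model(mission): return 0.051
def environment_model(mission): raise RuntimeError("no space-weather data")

MODELS = {
    "weibull": weibull_model,
    "bayesian": bayesian_model,
    "reliability": reliability_model,
    "environment": environment_model,
}

def orchestrate(mission, fallback=0.05):
    results, failures = {}, []
    for name, model in MODELS.items():
        try:
            results[name] = model(mission)          # model execution
        except Exception:
            failures.append(name)                   # graceful fallback, not a crash
            results[name] = fallback
    values = list(results.values())
    agreement = 1.0 - (pstdev(values) / mean(values))   # crude agreement score
    return {
        "annual_loss_prob": mean(values),           # normalised, unified output
        "per_model": results,
        "agreement": round(agreement, 3),
        "fallbacks_used": failures,
    }

print(orchestrate({"orbit": "GEO"}))
```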

Support for Multiple Mission Types

Phase 3 introduced structured support for LEO, MEO, and GEO—each with its own logic and failure behaviour.

This removed one of the biggest silent flaws from early versions: treating all missions as interchangeable.

Phase 4 — Iteration: Building the Underwriting Workbench

Phase 4 was the most demanding phase of Module 1 so far.

It wasn't about algorithms—it was about iteration. Dozens of small sprints. Daily refinements. Continuous rewrites of the UI and logic surfaces until things felt consistent.

It was the closest experience yet to real product building.

The UI Overhaul: Making Risk Legible

Phase 4 forced me to take the UI seriously.

I'd been treating it as a wrapper around the models, but every time I tested the flow, I'd lose track of where in the mission lifecycle the risk was coming from. The interface wasn't just unclear—it was hiding the logic.

So I built the 7-Phase Mission Timeline—not just a list of dates, but a visual decision path that traces risk from Manufacture through Decommissioning. You can see historical incidents (like Intelsat 33e) mapped directly onto the timeline, showing exactly when and why certain missions failed.

What started as a functional layout became a fully restructured Underwriting Workbench:

  • A calmer, darker mission-view aesthetic

  • Clear left-to-right hierarchy

  • Risk band + confidence indicators

  • Pricing decomposition

  • Compliance status

  • Mission metadata

  • Environmental modifiers

  • A unified assumptions drawer

The design language shifted too. I moved to a Lloyd's-grade palette: navy, teal, and brass—the colours of the London insurance market. This wasn't aesthetics for its own sake. If you're building a tool for underwriters, it has to look like it belongs on their desk.

Every time I rewired something in the backend, the UI had to be reconsidered. Every time the UI exposed a gap in thinking, the backend had to be adjusted.

This back-and-forth created one of the deepest learning loops of the entire project.

Underwriter Export Packs

Phase 4 introduced exportable underwriting packs:

  • Mission summary

  • Risk assessment

  • Pricing components

  • VaR/TVaR

  • Compliance notes

  • Top risk drivers

  • Assumptions

Each pack carries a SHA-256 hash of its contents, making the result tamper-evident.

This is the first time AION produced an artefact you could send to a colleague and say, "This is the underwriting file."
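A sketch of how the tamper-evident hash can work (the pack contents here are simplified):

```python
import hashlib
import json

def hash_pack(pack: dict) -> str:
    # Canonicalise the JSON (sorted keys, no whitespace drift) so the hash is reproducible.
    canonical = json.dumps(pack, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

pack = {
    "mission": "DEMO-SAT-1",
    "risk_band": "B",
    "premium": 4_250_000,
    "assumptions": ["Weibull in-orbit model", "UKSA band 2"],
}
pack["sha256"] = hash_pack(pack)

# Verification: recompute the hash over everything except the stored digest.
stored = pack.pop("sha256")
assert hash_pack(pack) == stored, "pack has been modified since export"
```

A bare hash catches accidental or silent modification; binding it to a key (an HMAC or a proper signature) would be needed to catch deliberate tampering by someone who can simply recompute the hash.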

Scenario Engine: Revealing Trade-Offs

Underwriters think in deltas:

  • "Show me the difference if TRL drops"

  • "What if lifetime increases?"

  • "What if orbit shifts?"

The scenario engine made this possible. It doesn't optimise—it reveals trade-offs.
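A sketch of the delta idea (the pricing rule below is a stand-in, not AION's engine):

```python
from copy import deepcopy

def price_mission(mission: dict) -> float:
    # Stand-in pricing rule for illustration only.
    base = {"LEO": 0.10, "MEO": 0.06, "GEO": 0.03}[mission["orbit"]]
    trl_penalty = max(0, 9 - mission["trl"]) * 0.01
    return (base + trl_penalty) * mission["sum_insured"]

def scenario_delta(baseline: dict, **changes) -> dict:
    # No optimisation: price both versions and surface the difference.
    variant = deepcopy(baseline)
    variant.update(changes)
    p0, p1 = price_mission(baseline), price_mission(variant)
    return {"baseline": p0, "scenario": p1, "delta": p1 - p0, "changes": changes}

mission = {"orbit": "LEO", "trl": 8, "sum_insured": 50_000_000}
print(scenario_delta(mission, trl=6))        # "show me the difference if TRL drops"
print(scenario_delta(mission, orbit="GEO"))  # "what if orbit shifts?"
```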

Phase 4 Was Iteration at Scale

This phase became less about "finishing features" and more about wrestling the product into clarity:

  • Fixing drift between API and UI

  • Renaming fields for clarity

  • Aligning the risk and price narratives

  • Rewriting store logic

  • Adding proper null-safety

  • Reorganising assumptions

  • Refining mission types

  • Improving performance

  • Tightening validation

By the end, Module 1 finally felt integrated. Not perfect—but coherent.

Phase 5 — Hardening: Discipline as Character

Phase 5 gave Module 1 something new: discipline.

This phase wasn't glamorous, but it changed the system's character.

The Actuarial Spine

Hardening wasn't just about writing tests. It was about replacing generic allocation logic with 1,238 lines of phase-specific actuarial code—seven distinct models, one for each mission phase:

  • Launch: Bernoulli distributions across 15 launch vehicles (Falcon 9, Ariane 6, Electron, etc.)

  • LEOP (Launch and Early Orbit Phase): Early-orbit failure curves based on historical data

  • In-Orbit: Weibull survival models calibrated to mission life

  • Payload Mission Deployment: Component-level reliability tracking

  • Collision Avoidance: Conjunction risk and debris environment

  • Quality Assurance: Manufacturing and testing rigor assessment

  • Decommissioning: Post-Mission Disposal (PMD) reliability tracking

This wasn't cosmetic. It meant that when AION says a GEO mission has a 3.4% annual loss probability, it's not guessing—it's running 100,000 Monte Carlo simulations across phase-specific hazard functions.
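A compressed sketch of what phase-specific Monte Carlo means (the distributions and parameter values are invented for illustration, not AION's calibrated models):

```python
import numpy as np

rng = np.random.default_rng(42)          # deterministic seeding for reproducibility
N = 100_000                              # simulation runs

# Phase-specific failure draws (illustrative parameters).
launch_fail   = rng.random(N) < 0.023                  # Bernoulli launch failure
leop_fail     = rng.random(N) < 0.015                  # early-orbit failure
in_orbit_life = rng.weibull(1.5, N) * 18.0             # Weibull lifetime in years
in_orbit_fail = in_orbit_life < 1.0                    # fails within the policy year

any_loss = launch_fail | leop_fail | in_orbit_fail
print(f"annual loss probability ≈ {any_loss.mean():.3f}")
```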

And it's fast:

  • Pricing latency: ~0.24 seconds

  • Assessment: ~0.07 seconds

  • Export packs (with full audit trails): ~0.27 seconds

A Full Test Suite

Module 1 now has tests for:

  • Each individual model

  • API contracts

  • Integration flows

  • Performance thresholds

  • Scenario comparisons

This caught issues that would otherwise hide in quiet corners—mismatched schemas, missing fields, orbit crossover bugs, incorrect priors, and more.

Observability

Phase 5 added:

  • run_id

  • correlation_id

  • Structured JSON logs

  • Metrics endpoints

  • API health checks

A system you can observe is a system you can fix.
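A minimal sketch of the structured-logging side (field names mirror the list above; the setup is illustrative):

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("aion")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_event(event: str, **fields) -> None:
    # One JSON object per line: easy to ship, grep, and correlate.
    logger.info(json.dumps({"event": event, "ts": time.time(), **fields}))

run_id = str(uuid.uuid4())
correlation_id = str(uuid.uuid4())   # ties one request together across services

log_event("assessment.started", run_id=run_id, correlation_id=correlation_id, mission="DEMO-SAT-1")
log_event("assessment.completed", run_id=run_id, correlation_id=correlation_id, latency_s=0.07)
```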

Database Evolution Without Fragility

Module 1 now uses a hybrid storage strategy:

  • Relational fields for common queries

  • JSONB for risk output, pricing, explainability, scenarios, and assumptions

This prevents migration churn and keeps the engine adaptable.
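In SQLAlchemy terms, the hybrid looks roughly like this (table and column names are illustrative):

```python
from sqlalchemy import Column, DateTime, Float, Integer, String, func
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Assessment(Base):
    __tablename__ = "assessments"

    # Relational fields: the things you filter and join on every day.
    id = Column(Integer, primary_key=True)
    mission_id = Column(String, index=True, nullable=False)
    orbit = Column(String, index=True)
    annual_loss_prob = Column(Float)
    created_at = Column(DateTime, server_default=func.now())

    # JSONB blobs: rich, evolving payloads that shouldn't force a migration each sprint.
    risk_output = Column(JSONB)
    pricing = Column(JSONB)
    explainability = Column(JSONB)
    scenarios = Column(JSONB)
    assumptions = Column(JSONB)
```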

Regulatory Intelligence, Not Just Risk Calculation

One thing that separates AION from a prototype is that it understands regulatory constraints, not just technical risk.

The engine now tracks:

  • UKSA liability bands (UK Space Agency regulatory thresholds)

  • ISO 24113 compliance (space debris mitigation standards)

  • Sanctions screening (flagging components from restricted jurisdictions)

When a mission lacks a deorbit plan or uses non-compliant encryption, AION doesn't just note it—it adjusts the pricing and flags the specific clause that's at risk.

This is what underwriters actually need: a tool that knows the rules before a quote goes out.
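A toy version of how a compliance finding feeds straight into the price (the rules, thresholds, and loadings here are invented for illustration):

```python
def compliance_findings(mission: dict) -> list[dict]:
    findings = []
    if not mission.get("deorbit_plan"):
        findings.append({"rule": "ISO 24113 disposal", "loading": 0.10,
                         "note": "no post-mission disposal plan"})
    if mission.get("uplink_encryption") == "none":
        findings.append({"rule": "licence condition: secure command uplink", "loading": 0.05,
                         "note": "non-compliant command encryption"})
    return findings

def apply_compliance(premium: float, findings: list[dict]) -> tuple[float, list[str]]:
    loaded, flags = premium, []
    for f in findings:
        loaded *= 1 + f["loading"]                    # adjust the price...
        flags.append(f"{f['rule']}: {f['note']}")     # ...and name the clause at risk
    return loaded, flags

mission = {"deorbit_plan": False, "uplink_encryption": "none"}
premium, flags = apply_compliance(1_000_000, compliance_findings(mission))
print(premium, flags)
```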

What Actually Broke (And What I Learned)

Orbit Logic Bugs

For weeks, GEO missions were being priced using LEO failure curves. The engine treated all satellites as interchangeable, which meant a geostationary comms satellite was getting the same hazard rate as a low-orbit CubeSat.

I discovered this when a test GEO mission came back with a 12% annual premium—absurdly high for a stable, high-altitude orbit.

The fix required creating explicit mission classes with orbit-specific parameters.
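Roughly, the shape of that fix (the parameter values are placeholders, not the calibrated ones):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrbitClass:
    name: str
    weibull_shape: float        # hazard curve shape
    weibull_scale_years: float  # characteristic life
    annual_premium_cap: float   # sanity bound on pricing

# Explicit classes instead of one-size-fits-all defaults (placeholder numbers).
LEO = OrbitClass("LEO", weibull_shape=1.3, weibull_scale_years=7.0, annual_premium_cap=0.15)
MEO = OrbitClass("MEO", weibull_shape=1.4, weibull_scale_years=12.0, annual_premium_cap=0.08)
GEO = OrbitClass("GEO", weibull_shape=1.6, weibull_scale_years=18.0, annual_premium_cap=0.05)

def orbit_for(mission_orbit: str) -> OrbitClass:
    try:
        return {"LEO": LEO, "MEO": MEO, "GEO": GEO}[mission_orbit]
    except KeyError:
        raise ValueError(f"unknown orbit: {mission_orbit}")  # fail loudly, never fall back to LEO
```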

Weibull Sampling Noise

I started with 10,000 Monte Carlo samples because "more is better," right?

Wrong.

The tail probabilities got noisier, and pricing latency spiked to 2+ seconds. Dropping to 2,000 samples cut the noise, improved speed by 80%, and cost less than 2% in accuracy.

Lesson: engineering is about finding the right trade-offs, not maximising everything.

Schema Drift

Early on, I'd change a backend model output and the frontend would silently break. A field rename would cascade into missing JSON keys three layers deep.

The fix: lock API contracts with versioned schemas and enforce validation at every boundary.

Boring infrastructure work, but it's what makes iteration safe.

Missing JSON Keys

Certain nested fields triggered failures deep in the pricing logic. I added defensive defaults across the board—no more silent nulls.
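The pattern, in miniature (hypothetical field names):

```python
def catastrophe_load(pricing: dict) -> float:
    # Defensive defaults: a missing nested key degrades to a neutral value
    # instead of raising a KeyError three layers down.
    tail = pricing.get("tail") or {}
    tvar99 = tail.get("tvar_99", 0.0)
    return 0.25 * tvar99

print(catastrophe_load({}))                      # 0.0, not a crash
print(catastrophe_load({"tail": {"tvar_99": 4_000_000}}))
```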

Performance Bottlenecks

Unindexed columns and recomputation loops caused spikes. Profiling cleaned this up. Performance became predictable, not just "usually fast."

Each obstacle forced a deeper understanding of how underwriting logic interacts with engineering constraints.

What I Actually Built

Across Phases 3, 4, and 5:

  • 7 phase-specific actuarial models (1,238 lines of production code)

  • 15 launch vehicle profiles with tier classification and historical reliability data

  • ~40 conditional risk drivers that adapt based on mission parameters

  • ~15 actionable mitigations with quantified premium impacts

  • 100,000-run Monte Carlo pricing engine (<0.3s response time)

  • Mission timeline visualisation tracing the decision path across 7 phases

  • Regulatory compliance engine tracking UKSA, ISO 24113, and sanctions risk

  • Export bundles with SHA-256 verification and PDF report generation

  • Full observability: correlation IDs, run tracking, model versioning

  • Complete UI workbench with unified AION design system

  • 55+ bug fixes including type safety, data scoping, and performance optimisation

How I Built It

Through agent orchestration and constant research iteration.

The workflow looked like this:

Research Phase:

  • Deep dives in Perplexity and Gemini for actuarial standards, space insurance regulations, and technical specifications

  • Cross-referencing Lloyd's practices, UKSA compliance requirements, ISO 24113 standards

Planning Phase:

  • Compare architectural approaches across GPT, Claude, and Gemini

  • Iterate on sprint plans until the logic was sound across all three models

  • Use disagreement between models as a signal to research deeper

Specification Phase:

  • Build detailed spec plans in Cursor with AI assistance

  • Break phases into concrete sprint tasks with acceptance criteria

  • Define API contracts, database schemas, and UI flows before writing code

Execution Phase:

  • Specialised agents for different domains:

    • Backend agent: FastAPI routes, SQLAlchemy models, actuarial engines

    • Frontend agent: React components, UI state management, chart libraries

    • Testing agent: pytest suites, contract validation, performance benchmarks

    • Planning agent: Sprint retrospectives, obstacle analysis, next-phase roadmapping

Iteration Cycle:

Research → Compare Plans → Spec → Build → Test → Debug → Refactor → Ship

It wasn't linear. It wasn't tidy. But each cycle made the system more honest.

The key insight: AI agents don't replace judgment—they accelerate iteration. I still made every architectural decision, but I could test 5 approaches in the time it used to take to implement one.

Why I Built It This Way

Because the goal isn't the "best model."

The goal is a model an underwriter can trust—and trust requires more than accuracy. It requires:

Stability
The engine produces the same result for the same inputs, every time. Deterministic seeding, locked schemas, versioned APIs. No surprises.

Explainability
Every risk score traces back to specific drivers. Every premium breaks down into pure risk + catastrophe load + expense + profit. If an underwriter can't explain it to a broker, the model failed.

Calibration
LEO missions price at 5–15% of sum insured; GEO missions at 1–5%. These aren't arbitrary: they're benchmarked against Lloyd's market rates and historical loss data.

Predictable outputs
Pricing stabilised at ~0.24s. Assessment at ~0.07s. No latency spikes. No edge-case failures. Performance became a feature, not a variable.

Clear assumptions
The assumptions drawer shows every prior, every weight, every modifier. You can see exactly what the engine believes about the mission.

Controlled surface area
Seven phase-specific models. Fifteen launch vehicles. Forty conditional risk drivers. The engine is complex enough to be realistic, simple enough to be maintainable.

A messy but brilliant model is useless. A clear, reliable one is valuable.

And in insurance, "valuable" means: Would you stake your professional reputation on this assessment?

That's the standard Module 1 had to meet.

What I Learned

1. Architecture is behaviour
Fixing structure fixes logic. The orchestrator pattern didn't just organise code—it made the engine extensible. When I needed to add a new compliance check or a seventh actuarial model, the architecture already had a place for it. Good systems are designed to evolve.

2. Explainability is part of the model, not a feature you bolt on
If you can't explain a calculation cleanly, something is wrong with how you've designed it. AION's risk drivers aren't generated by black-box ML—they're explicit, traceable, and grounded in actuarial logic. Transparency isn't optional in regulated industries.

3. Systems thinking beats feature building
Patterns compound. Features decay. The orchestrator pattern, mission type classes, validation layers—these structures paid dividends across every sprint. A good abstraction saves you from writing the same code fifty times. A bad one forces you to refactor constantly.

4. Product emerges through iteration, not planning
Phase 4's UI overhaul taught me that most clarity comes from using the system, not designing it up front. I rewrote the mission view three times. Each version exposed assumptions I hadn't questioned. The final design wasn't planned—it was discovered.

5. Reliability is the real milestone
Module 1 isn't finished—it's never "finished." But it's dependable. Tests pass. Performance is predictable. Outputs are reproducible. That dependability is what unlocks everything else: investor demos, underwriter pilots, Module 2 integration. Without reliability, nothing else matters.

6. AI agents accelerate iteration, but judgment still matters
Using GPT, Claude, Gemini, and Cursor in parallel let me test five architectural approaches in the time it used to take to build one. But the AI didn't make the decisions—it surfaced options. The discipline was in choosing well, not generating fast.

From Prototype to Platform

Module 1 started as a risk model. It ended as an underwriting workbench.

The transformation wasn't just about adding features—it was about building the architecture of trust. Every decision—RBAC roles, phase-specific models, compliance tracking, export verification—was designed to answer one question:

Would an underwriter trust this assessment?

By the end of Phase 5, the answer was yes.

Module 1 is now frozen at V1. It's production-ready, and capable of supporting the rest of AION's vision: a full-stack space insurance platform.

Saif Shah

AION: Module 1 Phase 2

Phase 2 of AION’s Risk Engine was about moving from a working prototype to something an underwriter could actually trust. I rebuilt the core logic around survival curves, calibration, environmental sensitivities, and a clearer mission view to make risk reasoning easier, not flashier.

Building a Risk Engine That Underwriters Can Trust

Phase 1 of AION gave me something simple but valuable:
a loop that could take a mission, run a model, and explain its result.

Phase 2 raised the standard.
The question this time wasn’t “Can I build a risk model?” but:

“Can I build one an underwriter would trust for a first-pass view?”

That meant moving away from toy numbers and towards something closer to real insurance work: probability curves, calibrated priors, realistic failure rates, and a pricing engine that behaves like a disciplined actuarial tool, not a demo.

Opening the Model Up: Real Failure Curves

The first breakthrough in Phase 2 was replacing linear logic with proper survival analysis.

The engine now uses:

  • A Weibull lifetime curve
    (infant mortality → stable phase → wear-out)

  • A Bayesian update model for TRL and heritage

  • A reliability model tuned by orbit and launch vehicle

Hardware doesn’t fail in a straight line.
It fails according to curves, and capturing that shape immediately made the engine feel more aligned with historical space behaviour.
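The Weibull family is what encodes those shapes: a shape parameter below 1 gives a decreasing hazard (infant mortality), above 1 gives wear-out, and a full bathtub curve blends the two. A small sketch with illustrative parameters:

```python
import numpy as np

def weibull_hazard(t, shape, scale):
    # shape < 1: infant mortality, shape ≈ 1: stable phase, shape > 1: wear-out
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_survival(t, shape, scale):
    return np.exp(-((t / scale) ** shape))

years = np.linspace(0.1, 15, 5)
print(weibull_survival(years, shape=1.5, scale=12.0))   # P(still operating at t)
print(weibull_hazard(years, shape=1.5, scale=12.0))     # instantaneous failure rate
```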

A positive example: fixing the inverted TRL priors.
In Phase 1, immature tech was being treated as safer — a comforting mistake.
Correcting it made the system more honest and more useful.

A Pricing Engine That Shows Its Working

Phase 2 introduced a Monte Carlo pricing engine with:

  • Premium bands

  • VaR95 / VaR99

  • Tail loss behaviour

  • Decomposition across pure risk, catastrophe, expense, and profit

  • Calibration against simple industry bands
    (LEO: 5–15% of SI, GEO: 1–5%)

The engine now runs 50,000 quantile samples (seeded for reproducibility) and treats price as a consequence of the probability curve, not a lookup table.

The most important addition was the explicit premium decomposition:

  • Here is what drove the risk.

  • Here is the impact of each adjustment.

  • Here is why the premium sits here.

It forced me to think like an underwriter rather than a developer.
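As a sketch of the seeded simulation and the decomposition (the severity model and loadings below are placeholders, not AION's calibration):

```python
import numpy as np

rng = np.random.default_rng(7)           # seeded: same inputs, same premium
N = 50_000

sum_insured = 100_000_000
p_loss = 0.06                            # annual loss probability from the risk engine

losses = np.where(rng.random(N) < p_loss,
                  sum_insured * rng.beta(2, 5, N),   # partial-loss severity (illustrative)
                  0.0)

pure = losses.mean()
var95, var99 = np.quantile(losses, [0.95, 0.99])
cat_load = 0.25 * (var99 - pure)         # tail-driven catastrophe load (illustrative)
expense = 0.10 * pure
profit = 0.05 * pure

premium = pure + cat_load + expense + profit
for name, value in [("pure risk", pure), ("catastrophe", cat_load),
                    ("expense", expense), ("profit", profit), ("premium", premium)]:
    print(f"{name:>12}: {value:,.0f}")
```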

Calibrated Priors: Bringing Discipline to the Numbers

Phase 2 introduced a more disciplined calibration layer:

  • Blend weighting: 0.09·model + 0.91·prior

  • Orbit caps: LEO 0.12, GEO 0.06

  • Realistic launch success rates: Falcon 9 ≈ 97.7%

  • Severity modelling:

    • 20% total loss

    • 80% partial loss (Beta distribution, 10–40% SI)

These numbers aren’t perfect, but they stop the model from being overconfident just because the Monte Carlo chart looks pretty.

Calibration isn’t a cosmetic step — it’s a form of intellectual honesty.
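In code, that discipline is only a few lines (the blend weights and caps are the ones listed above):

```python
def calibrated_loss_prob(model_p: float, prior_p: float, orbit: str) -> float:
    # Blend the model output with the industry prior, then cap by orbit.
    blended = 0.09 * model_p + 0.91 * prior_p
    caps = {"LEO": 0.12, "GEO": 0.06}
    return min(blended, caps.get(orbit, blended))

# An overconfident model output gets pulled back towards the prior, with the cap as a backstop.
print(calibrated_loss_prob(model_p=0.20, prior_p=0.05, orbit="LEO"))
```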

Adding Environmental Sensitivities

Space isn’t a static environment, so the model shouldn’t be either.

Phase 2 added:

  • Conjunction density → increases probability

  • Solar activity (solar_high) → increases severity and thickens the tail

These modifiers aren’t meant to be hyper-accurate.
They’re meant to teach the right behaviour:
Risk is dynamic.
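A toy version of that behaviour (the magnitudes are invented):

```python
def apply_environment(p_loss: float, severity_mean: float, env: dict) -> tuple[float, float]:
    # Conjunction density pushes up the probability of a loss;
    # high solar activity pushes up how bad the loss is when it happens.
    if env.get("conjunction_density") == "high":
        p_loss *= 1.15
    if env.get("solar_high"):
        severity_mean *= 1.20
    return p_loss, severity_mean

print(apply_environment(0.05, 0.30, {"conjunction_density": "high", "solar_high": True}))
```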

Compliance: The First Signs of Real-World Constraints

I added a light compliance layer that addresses the following areas:

  • licensing

  • deorbit planning

  • anomaly reporting

  • grey-zone mapping (fallbacks.yaml)

If a mission is unlicensed, the model applies a simple, transparent rule:
+10% premium, via a feature flag.

It’s the first time AION began to link technical risk with regulatory exposure — something Phase 1 ignored entirely.

A Mission View That Reduces Cognitive Load

Phase 2 reorganised the UI into a single mission workbench:

  • Mission profile

  • Risk band + confidence

  • Pricing band + decomposition

  • Compliance flags

  • Environmental modifiers

  • An assumptions drawer showing the engine’s reasoning

  • Adjustment chips explaining signal impacts

It’s not polished.
But it has a calm structure you can actually think inside — which is more important at this stage than visual flair.

Reducing cognitive load was the whole point.

What This Phase Taught Me

Modelling is a negotiation with reality.

Every fix — inverted priors, overflow issues, curve tuning — revealed where intuition diverged from the world.

Underwriters don’t want a number; they want the reasoning trace.

The more transparent the model became, the more confident I became in its behaviour.

Calibration is honesty encoded as math.

Small parameters grounded the system more than any big feature.

Tools shape thinking.

Designing a single mission view changed the way I built the model itself.

What Phase 2 Sets Up

Phase 2 didn’t finish the engine.
It clarified the foundation.

It taught me to build something that:

  • behaves predictably

  • explains itself rigorously

  • stays within calibrated bounds

  • is structured for future extension

  • can sit in front of a professional without apology

Phase 3 will focus on refining the narrative layer:
clearer assumptions, more consistent reasoning, and a tighter link between risk, environment, compliance, and price.

But for the first time, the engine feels usable.

Not finished.
Not perfect.
But grounded enough to build on with confidence.
