Provider-Portable AI Architecture

Your path to AI-powered compliance without vendor lock-in. Use Amazon Bedrock today, Google Vertex AI tomorrow, Azure AI Foundry next week, or all at once.

5 Provider Adapters
1 Unified API
3 Routing Strategies
Zero Vendor Lock-in

The Strategic Challenge

You face a fundamental tension in AI adoption for regulatory compliance.

Immediate Value Delivery

You need to leverage cutting-edge AI capabilities for compliance automation, risk assessment, and regulatory intelligence today. Waiting means falling behind competitors and increasing regulatory exposure.

🔒

Strategic Flexibility

You require the freedom to choose the best provider for each workload, switch providers as pricing and capabilities evolve, and avoid lock-in that constrains future technology decisions.

🛡

Governance Requirements

Your organization demands consistent security, privacy, and audit controls regardless of which AI provider processes requests. Compliance cannot be an afterthought.

💰

Cost Optimization

You want to route workloads to the most cost-effective provider without sacrificing quality. Different tasks demand different models, and your architecture should support intelligent routing.

The Model Gateway Architecture

Your applications never communicate directly with foundation model providers. All AI interactions flow through a Model Gateway you own and control.

Your Applications
Compliance Automation
Risk Assessment
Regulatory Monitor
RAG Copilot
Model Gateway
Unified API Contract • Intent-Based Routing • Policy Engine • Prompt Registry • Safety Controls • Observability
Providers
Amazon Bedrock
Google Vertex AI
Azure AI Foundry
OpenAI Direct
Private / Ollama

Why This Pattern Works

The Model Gateway acts as an anti-corruption layer between your business logic and external AI providers. This separation delivers three strategic advantages:

  • Provider Independence: Switch providers through configuration changes, not application rewrites
  • Centralized Governance: Apply consistent security, privacy, and audit controls across all AI interactions
  • Optimized Economics: Route each workload to the most cost-effective option automatically

Without Gateway | With Gateway
Provider-specific code in apps | One API, any provider
Scattered governance | Centralized controls
Expensive provider switching | Configuration-based routing
Manual cost optimization | Intelligent auto-routing
Fragmented observability | Unified tracing and logs

Key Capabilities

RegRiskIQ delivers enterprise-grade AI governance through these integrated components.

🔌

Intent-Based Routing

Your applications specify what they need (regulatory analysis, risk scoring, document extraction) rather than which model to use. The gateway selects the optimal provider based on cost, latency, quality, and policy requirements.
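
As an illustration, an application call might carry only the intent and its constraints; the client, endpoint, and field names in this sketch are assumptions, not the shipped gateway API.

# Illustrative sketch only: endpoint and field names are assumptions, not the real API.
import httpx

async def ask_gateway(question: str) -> dict:
    async with httpx.AsyncClient(base_url="https://gateway.internal") as client:
        response = await client.post("/v1/completions", json={
            "intent": "regulatory_analysis",  # what the app needs, not which model
            "input": question,
            "constraints": {"max_latency_ms": 3000, "data_classification": "internal"},
        })
        response.raise_for_status()
        return response.json()  # the gateway chose the provider, model, and prompt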

📑

Prompt Registry IP ASSET

Your prompts encode institutional knowledge—they're intellectual property. The registry provides version control, team collaboration, A/B testing, and complete audit trails. Structured prompts reduce token usage by 60-76% while protecting your competitive advantage.

🛡

Policy Engine 9D ABAC

OPA-powered 9-dimensional attribute-based access control—beyond industry-standard 3D RBAC. Controls WHO uses AI (identity), WHAT they access (resources), HOW (actions), WHERE/WHEN (environment), WHY (purpose), WHOSE data (customer PII), aggregation levels (AML thresholds), cross-LOB data sharing, and break-glass for regulatory emergencies. Policy-as-code with full audit trails.

🔎

Provider Adapters

Each foundation model provider integrates through a dedicated adapter that normalizes request formats, response structures, error codes, and authentication patterns. Adding new providers requires only a new adapter.
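
A minimal sketch of what that adapter contract can look like; class and field names below are illustrative assumptions (the BaseModelProvider interface described under Implementation Status tracks more capabilities than shown here).

# Sketch of an adapter contract; names and fields are illustrative assumptions.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class GatewayRequest:
    intent: str
    prompt: str
    max_tokens: int = 1024

@dataclass
class GatewayResponse:
    text: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    provider: str

class BaseModelProvider(ABC):
    """Normalizes request/response shapes, errors, and auth for one provider."""

    @abstractmethod
    async def complete(self, request: GatewayRequest) -> GatewayResponse: ...

class BedrockAdapter(BaseModelProvider):
    async def complete(self, request: GatewayRequest) -> GatewayResponse:
        # Translate GatewayRequest into the provider API shape, call it, and map the
        # provider-specific response, errors, and usage back onto GatewayResponse.
        raise NotImplementedError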

📊

Observability

OpenTelemetry instrumentation provides end-to-end visibility across gateway, adapters, and providers. Track token usage, costs, latencies, and error rates per tenant, per provider, and per use case.
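
For example, a gateway-side span can carry token and cost attributes as in this sketch; the attribute names and the adapter call are assumptions, and only the OpenTelemetry API usage itself is standard.

# Sketch of gateway-side instrumentation; attribute names are assumptions.
from opentelemetry import trace

tracer = trace.get_tracer("model_gateway")

def invoke_with_tracing(adapter, request):
    with tracer.start_as_current_span("model_gateway.invoke") as span:
        span.set_attribute("ai.intent", request.intent)
        span.set_attribute("ai.provider", adapter.name)
        result = adapter.complete_sync(request)  # hypothetical synchronous adapter call
        span.set_attribute("ai.tokens.input", result.input_tokens)
        span.set_attribute("ai.tokens.output", result.output_tokens)
        span.set_attribute("ai.cost_usd", result.cost_usd)
        return result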

📚

RAG Independence

Your retrieval pipeline operates independently from model providers. Switch inference providers without re-indexing document stores or modifying retrieval logic. Your knowledge base stays portable.

💻

Everything as Code GitOps

Routing rules, policies, prompts, and provider configs live in Git. PR reviews for AI changes. Full audit trail. Rollback any configuration. CI/CD pipelines for your AI infrastructure. Environment parity from code.

🔄

Continuous Learning RLHF

Production feedback loops that improve AI performance over time. Thumbs up/down captures user satisfaction. Analytics identify false positive patterns. Automatic threshold tuning reduces noise. Every interaction teaches the system.

Routing Strategies

Strategy | Optimizes For | Use Case
Cost | Minimize spend while meeting quality thresholds | High-volume, non-critical workloads
Performance | Minimize latency for interactive experiences | Real-time compliance Q&A
Quality | Maximize output quality for critical decisions | Regulatory filing review
Hybrid | Balance all factors dynamically | Default for most workloads
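
One way to read the hybrid strategy is as a weighted score over normalized cost, latency, and quality estimates for each candidate provider; the weights and fields below are a hedged sketch, not the production routing algorithm.

# Hedged sketch of hybrid routing: weights and candidate fields are assumptions.
from dataclasses import dataclass

@dataclass
class Candidate:
    provider: str
    est_cost: float      # normalized 0..1 (lower is better)
    est_latency: float   # normalized 0..1 (lower is better)
    quality: float       # normalized 0..1 (higher is better)

WEIGHTS = {"cost": 0.3, "latency": 0.3, "quality": 0.4}

def pick_provider(candidates: list[Candidate]) -> Candidate:
    def score(c: Candidate) -> float:
        return (WEIGHTS["quality"] * c.quality
                - WEIGHTS["cost"] * c.est_cost
                - WEIGHTS["latency"] * c.est_latency)
    return max(candidates, key=score)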

Architectural Value Proposition

What the provider-portable architecture enables for your organization.

  • Unified: Single API for all providers
  • Isolated: Provider-specific code contained in adapters
  • Flexible: Route by cost, performance, or quality
  • Observable: OpenTelemetry tracing across all requests

Strategic Advantages

Freedom of Choice

Evaluate and adopt new providers without application changes. Your business logic stays stable while AI capabilities evolve.

Optimized Economics

Route each workload to the most cost-effective option. Use premium models where quality matters, economical models where speed is sufficient.

Consistent Governance

Apply uniform security, privacy, and audit controls across all AI interactions. Meet regulatory requirements once, regardless of provider.

Future-Proofing

Architectural readiness for emerging models and providers. When the next breakthrough arrives, you adopt it through configuration.

The Hyperscalers Agree

AWS, Azure, and Google each publish reference architectures for this exact pattern—because they're competing to be YOUR abstraction layer.

AWS Reference Architecture

"Multi-Provider Generative AI Gateway" — Official AWS guidance for routing to Azure, OpenAI, and other providers through an AWS-hosted LiteLLM gateway on ECS/EKS.

AWS Solutions Library

Azure API Management

"AI Gateway" with native Bedrock support — Microsoft's answer: use Azure APIM to govern AWS Bedrock and non-Microsoft AI providers from your Azure control plane.

Microsoft Learn

Google Vertex AI

Model Garden with multi-provider serving — Google's unified platform supporting Anthropic Claude, Meta Llama, and partner models alongside Gemini.

Google Cloud
💡

The Strategic Takeaway

Each hyperscaler wants to be your gateway to all the others. Our architecture gives you this pattern without the platform lock-in—your gateway runs where YOU choose, not where your cloud vendor prefers.

Enterprise Authority Management

OPA-powered governance that goes far beyond standard access control—9 dimensions of context-aware policy enforcement for regulated financial services.

9-Dimensional ABAC vs Industry Standard

Industry Standard (3D RBAC)

  • ✓ Subject (Who)
  • ✓ Resource (What)
  • ✓ Action (Read/Write/Delete)

Coarse-grained, context-blind

Our Implementation (9D ABAC)

  • ✓ Subject, Resource, Action (base)
  • ✓ Environment (trading floor, VPN, device)
  • ✓ Purpose (operations, audit, reg reporting)
  • ✓ Data Subject (customer tier, consent)
  • ✓ Aggregation (AML thresholds, PII masking)
  • ✓ Cross-LOB (wealth → retail data sharing)
  • ✓ Emergency (reg exam, fraud investigation)

Context-aware, purpose-driven

Real-World Banking Scenarios

📊

Trading Floor Analyst

Scenario: Analyst needs AI-generated market insights during trading hours

9D Decision: Environment=trading_floor + time=market_hours + role=analyst → real-time data, no PII, audit logged

environment.location == "trading_floor" → market_data_only
🔍

AML Compliance Officer

Scenario: AI flags suspicious transaction pattern, needs customer history

9D Decision: Purpose=AML_investigation + SAR_filed → full transaction history, SSN masked unless SAR threshold met

purpose.type == "aml" && sar_threshold_met → full_pii
🏛

Regulatory Examiner Access

Scenario: OCC examiner requests AI model documentation and decisions

9D Decision: Emergency=reg_exam + examiner_credentials_verified → full model access, all decisions, complete audit trail

emergency.type == "regulatory_exam" → full_transparency
💲

Cross-LOB Data Request

Scenario: Wealth management AI needs retail banking transaction data for holistic customer view

9D Decision: Cross_LOB=wealth→retail + approved_use_case + data_sharing_agreement → aggregated view only, no raw transactions

cross_lob.approved && dsa_active → aggregated_view

Policy-as-Code (OPA Rego) - Banking GRC

package banking_grc.ai_access  # package name is illustrative

import rego.v1

# Helpers referenced below (customer_pii_fields, all_fields, time_in_market_hours,
# approved_cross_lob_use_case, data_sharing_agreement_active) are assumed to be
# defined elsewhere in the policy bundle.

# AML investigation grants elevated access with audit
allowed_fields contains field if {
    input.purpose.type == "aml_investigation"
    input.user.role == "compliance_officer"
    input.case.sar_filed == true
    some field in customer_pii_fields  # Full PII for filed SAR
}

# Trading floor restricts to market data only
allowed_fields contains field if {
    input.environment.location == "trading_floor"
    time_in_market_hours(input.environment.timestamp)
    some field in {"ticker", "price", "volume", "sentiment_score"}
    not field in customer_pii_fields
}

# Cross-LOB data sharing requires an approved use case
cross_lob_access_allowed if {
    input.cross_lob.source_lob != input.cross_lob.requester_lob
    approved_cross_lob_use_case(input.purpose.type)
    data_sharing_agreement_active(input.cross_lob.source_lob, input.cross_lob.requester_lob)
}

# Regulatory exam break-glass with mandatory audit
allowed_fields contains field if {
    input.emergency.type == "regulatory_exam"
    input.emergency.examiner_credentials_verified == true
    some field in all_fields  # Full transparency for regulators
    # Mandatory: log_regulatory_access(input)
}

Every policy decision is auditable, version-controlled, and reviewable through standard PR workflows. Integrates with existing NIST 800-53 AC and AU control families.
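
At request time, the gateway can ask OPA for the allowed field set before anything is forwarded to a provider. The sketch below assumes an OPA sidecar on localhost:8181 and the package path shown above; both are deployment-specific assumptions.

# Sketch: asking an OPA sidecar for allowed fields; URL and package path are assumptions.
import httpx

OPA_URL = "http://localhost:8181/v1/data/banking_grc/ai_access/allowed_fields"

def allowed_fields(request_context: dict) -> set[str]:
    response = httpx.post(OPA_URL, json={"input": request_context}, timeout=2.0)
    response.raise_for_status()
    return set(response.json().get("result", []))

# Example: a compliance officer working a filed SAR
fields = allowed_fields({
    "purpose": {"type": "aml_investigation"},
    "user": {"role": "compliance_officer"},
    "case": {"sar_filed": True},
})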

AI Governance Program

63 purpose-built AI controls across 14 domains—designed to complement your existing controls and satisfy emerging AI regulations.

Multi-Framework Integration

🏛
NIST 800-53
143 controls (MODERATE)
Your existing baseline
🤖
NIST AI RMF
Govern, Map, Measure, Manage
AI-specific risk management
🌎
EU AI Act
Risk classification + requirements
Global operations ready
📋
ISO 42001
AI management system
Certification pathway

71% of our AI controls map directly to or extend existing NIST 800-53 controls, so the program builds on your current control investments rather than duplicating them.

63 Controls Across 14 Domains

  • GO: Governance & Leadership (5 controls)
  • RM: Risk Management (5 controls)
  • RO: Regulatory Oversight (5 controls)
  • LC: Lifecycle Management (5 controls)
  • SE: Security (5 controls)
  • RS: Responsible AI (5 controls: bias, fairness)
  • GA: General Assessment (4 controls)
  • PR: Privacy (4 controls)
  • AA: Assessment & Assurance (5 controls)
  • OM: Operational Management (4 controls)
  • TP: Third-Party Management (4 controls)
  • CO: Communications (4 controls)
  • IM: Incident Management (4 controls)
  • PL: Planning (4 controls)

Governance Per AI Use Case

Each AI use case gets its own governance envelope—risk classification, applicable controls, evidence requirements, and monitoring metrics.

🎯

Risk Classification

EU AI Act tiers: Minimal, Limited, High, Unacceptable. Controls scale with risk.

📋

Control Mapping

Auto-mapped to NIST 800-53, AI RMF, ISO 42001. Gap analysis against your existing controls.

📈

Continuous Monitoring

Real-time dashboards: bias drift (RS-5), model performance (OM-1), explainability (RS-4).

Sample Control Mappings

AI Control | Description | Maps To | Evidence
GO-1 | AI Governance System Implementation | NIST PM-1, ISO 42001 5.1 | Charter, RACI, meeting minutes
RS-5 | Fairness & Bias Management | NIST AI RMF MEASURE 2.10, EU AI Act Art. 10 | Bias testing reports, demographic parity metrics
RS-4 | Explainability Requirements | EU AI Act Art. 13, ISO 42001 8.4 | SHAP/LIME outputs, decision audit logs
OM-1 | Performance Monitoring | NIST CA-7, ISO 42001 9.1 | Model accuracy dashboards, drift alerts
TP-2 | Third-Party Model Validation | NIST SA-9, EU AI Act Art. 28 | Vendor assessments, model cards

Continuous Improvement Architecture

Production AI systems that learn and improve from every interaction—closing the loop from deployment to enhancement.

The Learning Loop

👤 User Interaction → 🤖 AI Response → 👍👎 Feedback → 📈 Analytics → Tuning

↻ Continuous cycle improves accuracy over time

👍

Contextual Feedback

Thumbs up/down on every AI response. Star ratings for detailed assessment. Context captured: user role, intent, provider used, latency.

📈

Analytics Dashboard

Satisfaction trends by provider, intent, and persona. False positive rates. Time-series visualization. Executive reporting exports.

🎯

False Positive Detection

Pattern analysis identifies alerts consistently marked "not helpful". Thresholds auto-tune. Alert fatigue eliminated.

🧠

Reinforcement Signals

Thumbs up/down feeds into provider selection. High-satisfaction providers get preferred routing. Poor performers downweighted.

🔔

Intelligent Notifications

Severity-based routing: critical → Slack + email. Rate limiting. Quiet hours. Weekly digests. SLA tracking.

🔄

BPMN Workflows

Feedback triggers improvement processes. Critical bugs → 2-hour SLA. UX issues → design review. Structured escalation.

Feedback Service Architecture

# Feedback logged for every AI interaction
await feedback_client.log_copilot_event({
    "event_type": "insight_feedback",
    "persona": current_user.role,
    "user_feedback": "helpful",  # or "not_helpful"
    "provider": selected_provider,
    "latency_ms": response_time,
    "governance_control": "OM-1",  # Links to AI governance
})

# Pattern analysis → threshold tuning
# (illustrative: analyze_alert_patterns() is assumed to return per-control false positive rates)
patterns = await feedback_client.analyze_alert_patterns()
for control_id, false_positive_rate in patterns.items():
    if false_positive_rate > 0.3:
        adjust_alert_threshold(control_id, direction="less_sensitive")

Own Your AI Future

This isn't just a pattern to implement. It's strategic infrastructure you should own.

"The cloud era rewarded efficiency. The AI era rewards sovereignty."

— Deloitte Tech Trends 2026

💰

Negotiate from Strength

When you own your gateway, switching providers is a configuration change—not a migration project. Vendors know this. Your procurement leverage is fundamentally different.

🔒

Protect Your IP

Your prompts encode institutional knowledge—regulatory interpretations, risk frameworks, compliance logic. These are living knowledge assets. They belong in YOUR systems, version-controlled and auditable.

📈

Control Your Economics

Route expensive tasks to cost-effective models. Cache responses. Batch requests. These optimizations compound. When you own the gateway, the savings flow to you—not to a platform vendor.

The Build vs. Buy Framework

✓ Buy from Vendors When:

  • Commodity tasks (note-taking, basic Q&A)
  • Speed to market within a quarter
  • Compliance checkboxes, not competitive edge
  • No proprietary data or logic involved

★ Build/Own When:

  • AI drives competitive differentiation
  • Proprietary data, prompts, or logic
  • Multi-provider flexibility is strategic
  • Long-term cost control matters

⚡ Lightweight Infrastructure, Heavy Leverage

YOUR GATEWAY
Lightweight

Routing, policy, orchestration. Standard compute. No GPUs. Scales horizontally.

PROVIDER INFERENCE
Heavy (They Pay)

GPU clusters, TPUs, model hosting. You pay per-token, not per-GPU-hour.

Owning the gateway is feasible because you're building orchestration, not GPU infrastructure.

🎯

Our Recommendation

For regulated financial services, AI infrastructure is strategic infrastructure. The hyperscalers want to be YOUR gateway because control equals leverage. We recommend you own this layer—deployed in your environment, integrated with your governance, evolving with your needs.

Ready for the Agentic Future

96% of IT leaders plan to expand AI agents in 2025. Your gateway is the control plane.

The Shift Happening Now

66%

of enterprise AI implementations now use multi-agent architectures rather than single-model approaches.

Source: Arcade.dev Agentic Framework Adoption Report 2025

Why Governance is Critical

75%

of technology leaders list governance as their primary concern when deploying agentic AI.

Gartner projects 40% of agent projects will fail by 2027 due to inadequate controls.

Gateway as Agent Control Plane

When agents orchestrate other agents, every model call flows through your gateway. This gives you:

🤖

Agent Identity & Authorization

Each agent authenticates separately. Apply different policies, rate limits, and model access per agent type.

💰

Per-Agent Cost Allocation

Track token usage and costs per agent. Identify runaway agents before they impact budgets.

🛡

Tool Use Governance

Control which agents can invoke which tools. Enforce approval workflows for sensitive operations.

🔍

Complete Audit Trail

Every agent-to-model interaction logged. Trace decisions back through the agent chain for compliance.
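
A minimal sketch of what per-agent tool governance could look like at the gateway; the agent names, allow-list shape, and audit sink are all illustrative assumptions.

# Sketch of per-agent tool governance; every name here is illustrative.
import json

AGENT_TOOL_ALLOWLIST = {
    "regulatory-monitor-agent": {"search_regulations", "summarize_document"},
    "remediation-agent": {"create_ticket"},  # sensitive tools sit behind approval workflows
}

def audit_log(event: dict) -> None:
    # Stand-in sink; in practice the gateway's audit store receives this record.
    print(json.dumps(event))

def authorize_tool_call(agent_id: str, tool_name: str) -> bool:
    allowed = tool_name in AGENT_TOOL_ALLOWLIST.get(agent_id, set())
    # Every decision is logged so the agent chain can be reconstructed later.
    audit_log({"agent_id": agent_id, "tool": tool_name, "allowed": allowed})
    return allowed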

🚀

Future-Proofing Your Investment

By 2028, Gartner predicts 58% of business functions will have AI agents managing at least one process daily. The gateway you build today becomes the governance layer for your agentic future. The architecture is ready—the agent orchestration capabilities plug directly into the existing routing and policy infrastructure.

RegRiskIQ Implementation Status

Core infrastructure complete. Phase 2 enhancements in the backlog.

What's Been Built

Core Model Abstraction Layer

  • BaseModelProvider abstract interface with unified contract
  • 8 model capabilities tracked (chat, embeddings, function calling, streaming, vision, code generation, structured output, audio)
  • Built-in metrics: cost, latency, token usage, error rates

5 Provider Implementations

  • OpenAI: GPT-5, o3, o4-mini, GPT-4.1, embeddings
  • Azure OpenAI: Enterprise deployments, Managed Identity
  • AWS Bedrock: Nova 2, Claude 4.5, Llama 4, Mistral Large 3
  • Google Vertex: Gemini 3 Flash/Pro, Model Garden
  • Ollama: Local models, zero API cost

Intelligent Routing Engine

  • 8 routing strategies with configurable weights
  • Quality scoring for 30+ models
  • Automatic fallback chains for high availability
  • Response caching (TTL-based)

Phase 2 Enhancements

Provider Equivalence Testing HIGH

  • Golden evaluation datasets (50+ queries per intent)
  • Batch evaluation runner with quality metrics
  • Provider comparison reports with go/no-go recommendations

Business Value: Enables confident provider switching with quality assurance


Multi-Tenant Gateway MEDIUM

  • Tenant ID extraction from JWT claims
  • Per-tenant rate limiting and provider preferences
  • Tenant isolation for compliance requirements

Business Value: Enterprise-ready multi-tenancy for SaaS deployment


Data Residency Routing MEDIUM

  • Region-based provider filtering
  • Per-tenant allowed_regions configuration
  • Audit logging of residency decisions

Business Value: GDPR/data sovereignty compliance for regulated industries


Architectural Trade-offs

Provider portability is real. But it requires intentional design and ongoing investment.

What We Navigate

Provider Differences Are Real

Tool calling semantics, JSON output reliability, token limits, streaming formats, and content safety features vary across providers. Our adapter layer handles this complexity so your applications stay clean.

Gateway Adds Latency

Every abstraction has overhead. The gateway layer adds processing time for routing, policy evaluation, and request normalization. For latency-sensitive workloads, this impact must be measured and optimized for your specific use cases.

Testing Requires Investment

Proving portability demands evaluation harnesses, golden datasets, and quality metrics across providers. We build this infrastructure as a first-class capability.

How We Mitigate

Adapter Test Harness

Each adapter includes a compatibility test suite that validates behavior against provider-specific edge cases. New provider integrations pass this harness before production.

Performance Budgets

Gateway components are designed with latency budgets in mind. We instrument each stage (policy evaluation, prompt resolution, routing) with OpenTelemetry tracing to identify and address bottlenecks. Specific targets are established during implementation based on measured baselines.

Continuous Evaluation

Automated quality regression tests run against all providers weekly. You get scorecards showing "can I switch to provider X" based on real data.

Implementation Roadmap

A structured approach to implementing the Provider-Portable AI Architecture in RegRiskIQ.

Phase 1: Foundation (Weeks 1-3)
Phase 2: Quality Assurance (Weeks 4-6)
Phase 3: Governance (Weeks 7-10)

Provider-Portable AI Architecture Implementation

Enable RegRiskIQ to leverage multiple foundation model providers without vendor lock-in

#9114 Epic

This epic delivers a complete provider-portable architecture enabling RegRiskIQ to use AWS Bedrock, Google Vertex AI, Azure AI Foundry, OpenAI, and Ollama interchangeably. The architecture separates AI consumption from AI provision through a Model Gateway pattern with centralized governance, observability, and quality assurance.

Centralized Prompt Registry

Version-controlled prompt management with provider-specific variants

#9115 Phase 1

Establish a centralized registry for managing AI prompts as versioned, deployable artifacts. This enables A/B testing, rollback capability, and provider-specific optimizations.

Create prompt_registry database schema
#9122
Acceptance Criteria (BDD)

Given a new RegRiskIQ database migration

When the migration is applied

Then a prompt_registry table exists with columns: id, name, version, template, provider_variants, created_at, is_active

And the version column has a unique constraint with name

And provider_variants is a JSONB column supporting arbitrary provider keys
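
A hedged SQLAlchemy sketch of that table, matching the columns and constraint named in the criteria; the ORM mapping and Postgres dialect are assumptions, and the migration itself remains the authoritative definition.

# Sketch of the prompt_registry table per the criteria above; ORM details are assumptions.
from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text, UniqueConstraint, func
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class PromptRegistry(Base):
    __tablename__ = "prompt_registry"
    __table_args__ = (UniqueConstraint("name", "version"),)

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    version = Column(Integer, nullable=False)
    template = Column(Text, nullable=False)
    provider_variants = Column(JSONB, nullable=False, default=dict)  # arbitrary provider keys
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    is_active = Column(Boolean, nullable=False, default=False)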

Implement Prompt Registry CRUD API
#9123
Acceptance Criteria (BDD)

Given an authenticated API request

When I POST to /api/prompts with a valid prompt template

Then a new prompt is created with version 1

And the response includes the prompt ID and version

Given an existing prompt "regulatory_analysis"

When I PUT to /api/prompts/regulatory_analysis with updated content

Then a new version is created (immutable versioning)

And the previous version remains accessible

Add provider-specific prompt variants
#9124
Acceptance Criteria (BDD)

Given a prompt with provider_variants for "openai" and "anthropic"

When the ModelManager requests the prompt for provider "anthropic"

Then the anthropic-specific variant is returned

And Claude-specific syntax (Human:/Assistant:) is applied

Given a prompt without a variant for provider "bedrock"

When the ModelManager requests the prompt for provider "bedrock"

Then the default template is returned

Implement prompt version rollback
#9125
Acceptance Criteria (BDD)

Given a prompt "risk_scoring" with versions 1, 2, and 3 (active)

When I POST to /api/prompts/risk_scoring/rollback with version=2

Then version 2 becomes the active version

And an audit log entry is created with the rollback details

And subsequent AI requests use version 2

Task-Intent Routing

Route AI requests based on task intent, not just cost/performance

#9116 Phase 1

Extend the ModelManager to understand task intent (regulatory_analysis, risk_scoring, etc.) and route to the optimal provider/prompt combination based on intent-specific requirements.

Define TaskIntent enum
#9126
Acceptance Criteria (BDD)

Given the model_manager.py module

When I import TaskIntent

Then the enum contains: REGULATORY_ANALYSIS, RISK_SCORING, POLICY_SEARCH, COMPLIANCE_CHECK, DOCUMENT_SUMMARIZATION

And each intent has a string value matching its lowercase name
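
The criteria translate almost directly into code; this sketch assumes a str-valued Enum so each member's value matches its lowercase name.

# Direct reading of the acceptance criteria; exact placement in model_manager.py is assumed.
from enum import Enum

class TaskIntent(str, Enum):
    REGULATORY_ANALYSIS = "regulatory_analysis"
    RISK_SCORING = "risk_scoring"
    POLICY_SEARCH = "policy_search"
    COMPLIANCE_CHECK = "compliance_check"
    DOCUMENT_SUMMARIZATION = "document_summarization"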

Extend TaskContext with intent fields
#9127
Acceptance Criteria (BDD)

Given a TaskContext dataclass

When I create a new TaskContext

Then I can specify intent, data_classification, compliance_framework, and tenant_id

And all new fields are optional with sensible defaults

And existing code using TaskContext continues to work without modification
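
A sketch of the extended dataclass, with every new field optional so existing call sites keep working; the field types and the placeholder for existing fields are assumptions.

# Sketch of the extended TaskContext; pre-existing fields are represented by the comment only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaskContext:
    # ... existing fields stay exactly as they are today ...
    intent: Optional["TaskIntent"] = None
    data_classification: Optional[str] = None
    compliance_framework: Optional[str] = None
    tenant_id: Optional[str] = None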

Implement intent-to-prompt mapping
#9128
Acceptance Criteria (BDD)

Given a prompt registered with intent "regulatory_analysis"

When an AI request is made with TaskIntent.REGULATORY_ANALYSIS

Then the ModelManager automatically selects the correct prompt

And the provider-specific variant is applied based on the selected provider

Log routing decisions for audit
#9129
Acceptance Criteria (BDD)

Given any AI request processed by the ModelManager

When a provider and prompt are selected

Then an audit log entry is created with: timestamp, intent, selected_provider, selected_model, prompt_id, prompt_version, routing_strategy, tenant_id

And the audit log is queryable for compliance reporting

AI Observability Dashboard

Unified metrics visibility across all AI providers

#9117 Phase 1

Create a unified dashboard exposing AI metrics including provider performance, cost tracking, quality scores, and integration with existing Trust Architecture confidence metrics.

Aggregate ModelManager metrics
#9130
Acceptance Criteria (BDD)

Given the AI Models Service is running

When I GET /api/ai/metrics

Then I receive per-provider statistics: request_count, success_rate, avg_latency_ms, total_cost_usd, error_count

And metrics are available for all 6 providers: openai, azure, bedrock, vertex, ollama, anthropic

Integrate Trust Architecture confidence metrics
#9131
Acceptance Criteria (BDD)

Given the Trust Architecture service is processing requests

When I view the AI Observability Dashboard

Then I see TRAQ confidence scores aggregated by provider

And I see rejection rate due to low confidence thresholds

And I see citation accuracy metrics per intent

Create Grafana AI dashboard
#9132
Acceptance Criteria (BDD)

Given Grafana is deployed with the RegRiskIQ observability stack

When I navigate to the "AI Provider Performance" dashboard

Then I see panels for: latency (p50, p95, p99), cost per provider, success rate, requests per minute

And I can filter by time range, provider, and intent

Provider Equivalence Testing

Validate that switching providers maintains output quality

#9118 Phase 2

Build an evaluation harness with golden datasets to prove that switching from Provider A to Provider B does not degrade output quality beyond acceptable thresholds.

Curate golden evaluation datasets
#9133
Acceptance Criteria (BDD)

Given the evaluation_datasets table in the database

When I query for datasets by intent

Then I find at least 50 queries for each TaskIntent

And each query has a human-validated expected output

And queries are representative of production traffic patterns

Implement batch evaluation runner
#9134
Acceptance Criteria (BDD)

Given a golden dataset for "regulatory_analysis"

When I run: python -m evaluation.runner --intent regulatory_analysis --providers openai,bedrock

Then all queries are executed against both providers

And outputs are stored with provider, latency, cost, and raw response

And a comparison report is generated

Calculate quality metrics
#9135
Acceptance Criteria (BDD)

Given evaluation results from multiple providers

When quality metrics are calculated

Then semantic similarity score is computed using embedding cosine similarity

And citation accuracy is measured against expected sources

And factual consistency is evaluated using NLI models
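
The semantic similarity metric, for instance, reduces to cosine similarity over embedding vectors; the embed() function in this sketch is a stand-in for whatever embedding provider the evaluation harness is configured with.

# Cosine similarity between expected and actual outputs; embed() is a stand-in assumption.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_similarity(expected: str, actual: str, embed) -> float:
    # embed() returns a vector for a string; the harness supplies the real implementation
    return cosine_similarity(np.asarray(embed(expected)), np.asarray(embed(actual)))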

Generate provider comparison reports
#9136
Acceptance Criteria (BDD)

Given completed evaluation runs for Provider A and Provider B

When I generate a comparison report

Then the report shows per-intent quality scores for both providers

And a go/no-go recommendation is provided based on >90% equivalence threshold

And specific failing cases are highlighted for review

End-to-End AI Tracing

OpenTelemetry-based trace propagation across all AI services

#9119 Phase 2

Implement distributed tracing that correlates requests from the UI through the API Gateway, ModelManager, and Provider Adapters, enabling full visibility into AI request lifecycles.

Add trace_id to ModelManager
#9137
Acceptance Criteria (BDD)

Given an AI request with an incoming trace context header

When the ModelManager processes the request

Then a child span is created under the parent trace

And the span includes attributes: ai.intent, ai.provider, ai.model, ai.tokens.input, ai.tokens.output

Correlate traces with audit logs
#9138
Acceptance Criteria (BDD)

Given a trace_id from an OpenTelemetry span

When I query the rag_query_audit table

Then I can find the corresponding audit entry by trace_id

And the audit entry includes the full request/response context

Add tracing to provider adapters
#9139
Acceptance Criteria (BDD)

Given a request routed to the Bedrock adapter

When the adapter calls the AWS Bedrock API

Then a child span is created with: provider.name, provider.region, provider.model, provider.latency_ms, provider.cost_usd

And errors are recorded as span events with full exception details

Tenant-Aware Gateway

Multi-tenant support with per-tenant rate limits and preferences

#9120 Phase 3

Enable multi-tenant operation of the Model Gateway with tenant isolation, per-tenant rate limiting, and tenant-specific provider preferences. Conditional on multi-tenant deployment requirements.

Add tenant_id to request context
#9140
Acceptance Criteria (BDD)

Given an authenticated request with a JWT containing tenant_id claim

When the request reaches the ModelManager

Then tenant_id is extracted and added to TaskContext

And all downstream operations are scoped to that tenant

Implement per-tenant rate limiting
#9141
Acceptance Criteria (BDD)

Given tenant "acme" has a rate limit of 100 requests/minute

When tenant "acme" sends their 101st request in a minute

Then a 429 Too Many Requests response is returned

And the response includes Retry-After header

And other tenants are not affected
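
A fixed-window counter keyed by tenant is enough to express the behavior in these criteria; this sketch is illustrative, and the production implementation (for example a Redis-backed sliding window) is an open design choice.

# Sketch of per-tenant fixed-window rate limiting; limits and storage are assumptions.
import time
from collections import defaultdict

LIMIT_PER_MINUTE = 100
_windows: dict[str, tuple[int, int]] = defaultdict(lambda: (0, 0))  # tenant -> (window, count)

def check_rate_limit(tenant_id: str) -> tuple[bool, int]:
    """Returns (allowed, retry_after_seconds); the caller emits 429 + Retry-After."""
    now = int(time.time())
    window = now // 60
    current_window, count = _windows[tenant_id]
    if current_window != window:
        current_window, count = window, 0  # new minute, counter resets per tenant
    if count >= LIMIT_PER_MINUTE:
        return False, 60 - (now % 60)
    _windows[tenant_id] = (current_window, count + 1)
    return True, 0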

Add tenant provider preferences
#9142
Acceptance Criteria (BDD)

Given tenant "acme" has preferred_providers: ["bedrock", "azure"]

When the routing engine selects a provider for tenant "acme"

Then only bedrock and azure are considered

And openai and ollama are excluded from routing decisions

Data Residency Routing

Route requests based on data residency and compliance requirements

#9121 Phase 3

Implement simple, configuration-driven routing rules to ensure data residency compliance. Providers in disallowed regions are automatically excluded from routing decisions.

Add allowed_regions to tenant config
#9143
Acceptance Criteria (BDD)

Given a tenant configuration schema

When I configure tenant "eu_bank" with allowed_regions: ["eu-west-1", "eu-central-1"]

Then the configuration is validated and stored

And the routing engine can query allowed regions for any tenant

Filter providers by region
#9144
Acceptance Criteria (BDD)

Given tenant "eu_bank" with allowed_regions: ["eu-west-1"]

And provider "bedrock" is configured for region "us-east-1"

When the routing engine selects providers

Then "bedrock" is excluded from eligible providers

And only EU-region providers are considered
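
The filtering itself stays small; in this sketch the provider-to-region mapping and function name are illustrative assumptions.

# Sketch: drop providers whose configured region is outside the tenant's allow-list.
def eligible_providers(providers: dict[str, str], allowed_regions: list[str]) -> list[str]:
    """providers maps provider name -> configured region, e.g. {"bedrock": "us-east-1"}."""
    return [name for name, region in providers.items() if region in allowed_regions]

# Example from the criteria above: bedrock in us-east-1 is excluded for eu_bank
eligible = eligible_providers({"bedrock": "us-east-1", "vertex": "eu-west-1"}, ["eu-west-1"])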

Log data residency decisions
#9145
Acceptance Criteria (BDD)

Given a routing decision that excludes providers due to data residency

When the decision is logged

Then the audit log includes: tenant_id, allowed_regions, excluded_providers, selected_provider, reason

And compliance officers can query all residency-based routing decisions

Ready to Move Forward?

Your path to provider-portable AI compliance starts with a structured engagement.

1. Architecture Review with Technical Teams
2. Pilot Deployment Scope Definition
3. Provider & Policy Configuration
4. Production Deployment