Provider-Portable AI Architecture

Your path to AI-powered compliance without vendor lock-in. Use Amazon Bedrock today, Google Vertex AI tomorrow, Azure AI Foundry next week, or all at once.

5 Provider Adapters
1 Unified API
3 Routing Strategies
Zero Vendor Lock-in

The Strategic Challenge

You face a fundamental tension in AI adoption for regulatory compliance.

Immediate Value Delivery

You need to leverage cutting-edge AI capabilities for compliance automation, risk assessment, and regulatory intelligence today. Waiting means falling behind competitors and increasing regulatory exposure.

Strategic Flexibility

You require the freedom to choose the best provider for each workload, switch providers as pricing and capabilities evolve, and avoid lock-in that constrains future technology decisions.

Governance Requirements

Your organization demands consistent security, privacy, and audit controls regardless of which AI provider processes requests. Compliance cannot be an afterthought.

Cost Optimization

You want to route workloads to the most cost-effective provider without sacrificing quality. Different tasks demand different models, and your architecture should support intelligent routing.

The Model Gateway Architecture

Your applications never communicate directly with foundation model providers. All AI interactions flow through a Model Gateway you own and control.

Your Applications: Compliance Automation, Risk Assessment, Regulatory Monitor, RAG Copilot
    ↓
Model Gateway: Unified API Contract • Intent-Based Routing • Policy Engine • Prompt Registry • Safety Controls • Observability
    ↓
Providers: Amazon Bedrock, Google Vertex AI, Azure AI Foundry, OpenAI Direct, Private / Ollama

Why This Pattern Works

The Model Gateway acts as an anti-corruption layer between your business logic and external AI providers. This separation delivers three strategic advantages:

  • Provider Independence: Switch providers through configuration changes, not application rewrites
  • Centralized Governance: Apply consistent security, privacy, and audit controls across all AI interactions
  • Optimized Economics: Route each workload to the most cost-effective option automatically

Without Gateway → With Gateway:
  • Provider-specific code in apps → One API, any provider
  • Scattered governance → Centralized controls
  • Expensive provider switching → Configuration-based routing
  • Manual cost optimization → Intelligent auto-routing
  • Fragmented observability → Unified tracing and logs

Key Capabilities

RegRiskIQ delivers enterprise-grade AI governance through these integrated components.

Intent-Based Routing

Your applications specify what they need (regulatory analysis, risk scoring, document extraction) rather than which model to use. The gateway selects the optimal provider based on cost, latency, quality, and policy requirements.
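
To make the intent contract concrete, here is a minimal application-side sketch. The `gateway.complete()` client and its parameters are illustrative assumptions, not the actual RegRiskIQ API; the point is that the caller names an intent and constraints, never a provider or model.

```python
# Hypothetical gateway client: the method name and parameters are assumptions, not the real API.
async def analyze_filing(gateway, filing_text: str) -> str:
    response = await gateway.complete(
        intent="regulatory_analysis",           # what the application needs
        input_text=filing_text,                 # payload to analyze
        constraints={"max_latency_ms": 2000},   # optional routing hints
    )
    # The gateway, not the application, chose the provider and model.
    print(response.provider, response.model)
    return response.text
```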

Prompt Registry

Prompts become versioned, deployable artifacts stored in a central registry. Update prompts without changing application code. Test new versions before production rollout. Maintain audit trails of prompt changes.

Policy Engine

Enforce tenant isolation, data residency requirements, guardrails, and rate limits at the gateway level. Policies apply consistently across all AI interactions regardless of provider.
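
As a sketch of what gateway-level enforcement can look like, a policy check runs before any provider call. The `TenantPolicy` shape and `evaluate_policy` function below are assumptions for illustration, not RegRiskIQ's actual policy engine.

```python
# Minimal policy-evaluation sketch; names and fields are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class TenantPolicy:
    tenant_id: str
    requests_per_minute: int = 100
    allowed_providers: list[str] = field(default_factory=lambda: ["bedrock", "azure", "vertex"])
    blocked_terms: list[str] = field(default_factory=list)


def evaluate_policy(policy: TenantPolicy, provider: str, prompt: str, recent_requests: int) -> None:
    """Raise before any provider call if the request violates tenant policy."""
    if recent_requests >= policy.requests_per_minute:
        raise PermissionError("rate limit exceeded")            # would map to HTTP 429
    if provider not in policy.allowed_providers:
        raise PermissionError(f"provider {provider} not allowed for this tenant")
    if any(term in prompt.lower() for term in policy.blocked_terms):
        raise PermissionError("guardrail violation: blocked content")
```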

Provider Adapters

Each foundation model provider integrates through a dedicated adapter that normalizes request formats, response structures, error codes, and authentication patterns. Adding new providers requires only a new adapter.
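
The implementation status later in this document names a BaseModelProvider abstract interface; the sketch below assumes one plausible shape for that contract and shows where a concrete adapter would normalize a provider SDK response. The field names and the example adapter are illustrative.

```python
# Sketch of the adapter seam; this interface is an assumed shape, not the actual code.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class UnifiedResponse:
    text: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    provider: str


class BaseModelProvider(ABC):
    """Every provider adapter normalizes to this contract."""

    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> UnifiedResponse: ...


class ExampleProvider(BaseModelProvider):
    """Stand-in adapter showing where the SDK call and normalization would happen."""

    async def complete(self, prompt: str, **kwargs) -> UnifiedResponse:
        # In a real adapter this dict comes from the provider SDK (boto3, google-cloud-aiplatform, ...).
        raw = {"output": "(model output)", "usage": {"input_tokens": 120, "output_tokens": 250}}
        return UnifiedResponse(
            text=raw["output"],
            input_tokens=raw["usage"]["input_tokens"],
            output_tokens=raw["usage"]["output_tokens"],
            cost_usd=0.0,   # per-provider, per-model pricing lookup goes here
            provider="example",
        )
```

Because every adapter returns the same unified shape, routing, metrics, and policy code never see provider-specific payloads.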

Observability

OpenTelemetry instrumentation provides end-to-end visibility across gateway, adapters, and providers. Track token usage, costs, latencies, and error rates per tenant, per provider, and per use case.
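
A minimal sketch of how the gateway might attach these attributes with the OpenTelemetry Python SDK; the attribute names mirror the span attributes listed in the backlog (ai.intent, ai.provider, ai.model, ai.tokens.input, ai.tokens.output), while the helper function itself is illustrative.

```python
# Illustrative OpenTelemetry instrumentation; attribute names follow the backlog items below.
from opentelemetry import trace

tracer = trace.get_tracer("regriskiq.model_gateway")


def record_ai_call(intent: str, provider: str, model: str,
                   tokens_in: int, tokens_out: int, cost_usd: float) -> None:
    with tracer.start_as_current_span("ai.request") as span:
        span.set_attribute("ai.intent", intent)
        span.set_attribute("ai.provider", provider)
        span.set_attribute("ai.model", model)
        span.set_attribute("ai.tokens.input", tokens_in)
        span.set_attribute("ai.tokens.output", tokens_out)
        span.set_attribute("ai.cost_usd", cost_usd)
        # The actual provider call would run inside this span so latency is captured automatically.
```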

RAG Independence

Your retrieval pipeline operates independently from model providers. Switch inference providers without re-indexing document stores or modifying retrieval logic. Your knowledge base stays portable.

Routing Strategies

  • Cost: minimize spend while meeting quality thresholds (high-volume, non-critical workloads)
  • Performance: minimize latency for interactive experiences (real-time compliance Q&A)
  • Quality: maximize output quality for critical decisions (regulatory filing review)
  • Hybrid: balance all factors dynamically (the default for most workloads)
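
One way to picture these strategies is as weight profiles over normalized cost, latency, and quality scores. The sketch below is purely illustrative: the weights and scoring function are assumptions, not the routing engine's actual configuration.

```python
# Illustrative weight profiles and scoring; not the real routing configuration.
ROUTING_PROFILES = {
    "cost":        {"cost": 0.7, "latency": 0.1, "quality": 0.2},
    "performance": {"cost": 0.1, "latency": 0.7, "quality": 0.2},
    "quality":     {"cost": 0.1, "latency": 0.1, "quality": 0.8},
    "hybrid":      {"cost": 0.34, "latency": 0.33, "quality": 0.33},
}


def score_provider(profile: dict[str, float], cost: float, latency: float, quality: float) -> float:
    """cost, latency, quality are normalized to [0, 1]; higher score wins the route."""
    return (
        profile["cost"] * (1 - cost)
        + profile["latency"] * (1 - latency)
        + profile["quality"] * quality
    )
```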

Architectural Value Proposition

What the provider-portable architecture enables for your organization.

  • Unified: a single API for all providers
  • Isolated: provider-specific code contained in adapters
  • Flexible: routing by cost, performance, or quality
  • Observable: OpenTelemetry tracing across all requests

Strategic Advantages

Freedom of Choice

Evaluate and adopt new providers without application changes. Your business logic stays stable while AI capabilities evolve.

Optimized Economics

Route each workload to the most cost-effective option. Use premium models where quality matters, economical models where speed is sufficient.

Consistent Governance

Apply uniform security, privacy, and audit controls across all AI interactions. Meet regulatory requirements once, regardless of provider.

Future-Proofing

Architectural readiness for emerging models and providers. When the next breakthrough arrives, you adopt it through configuration.

The Hyperscalers Agree

AWS, Azure, and Google each publish reference architectures for this exact pattern—because they're competing to be YOUR abstraction layer.

AWS Reference Architecture

"Multi-Provider Generative AI Gateway" — Official AWS guidance for routing to Azure, OpenAI, and other providers through an AWS-hosted LiteLLM gateway on ECS/EKS.

AWS Solutions Library

Azure API Management

"AI Gateway" with native Bedrock support — Microsoft's answer: use Azure APIM to govern AWS Bedrock and non-Microsoft AI providers from your Azure control plane.

Microsoft Learn

Google Vertex AI

Model Garden with multi-provider serving — Google's unified platform supporting Anthropic Claude, Meta Llama, and partner models alongside Gemini.

Google Cloud

The Strategic Takeaway

Each hyperscaler wants to be your gateway to all the others. Our architecture gives you this pattern without the platform lock-in—your gateway runs where YOU choose, not where your cloud vendor prefers.

RegRiskIQ Implementation Status

Core infrastructure complete. Phase 2 enhancements in the backlog.

What's Been Built

Core Model Abstraction Layer

  • BaseModelProvider abstract interface with unified contract
  • 8 model capabilities tracked (chat, embeddings, function calling, streaming, vision, code generation, structured output, audio)
  • Built-in metrics: cost, latency, token usage, error rates

5 Provider Implementations

  • OpenAI: GPT-4, GPT-3.5, embeddings with streaming
  • Azure OpenAI: enterprise deployments, Managed Identity
  • AWS Bedrock: Claude, Titan, Llama, Cohere (13+ models)
  • Google Vertex: Gemini, PaLM, Model Garden
  • Ollama: local models, zero API cost

Intelligent Routing Engine

  • 8 routing strategies with configurable weights
  • Quality scoring for 30+ models
  • Automatic fallback chains for high availability
  • Response caching (TTL-based)

Phase 2 Enhancements

Provider Equivalence Testing (Priority: HIGH)

  • Golden evaluation datasets (50+ queries per intent)
  • Batch evaluation runner with quality metrics
  • Provider comparison reports with go/no-go recommendations

Business Value: Enables confident provider switching with quality assurance

Multi-Tenant Gateway (Priority: MEDIUM)

  • Tenant ID extraction from JWT claims
  • Per-tenant rate limiting and provider preferences
  • Tenant isolation for compliance requirements

Business Value: Enterprise-ready multi-tenancy for SaaS deployment

Data Residency Routing (Priority: MEDIUM)

  • Region-based provider filtering
  • Per-tenant allowed_regions configuration
  • Audit logging of residency decisions

Business Value: GDPR/data sovereignty compliance for regulated industries

Architectural Trade-offs

Provider portability is real. But it requires intentional design and ongoing investment.

What We Navigate

Provider Differences Are Real

Tool calling semantics, JSON output reliability, token limits, streaming formats, and content safety features vary across providers. Our adapter layer handles this complexity so your applications stay clean.

Gateway Adds Latency

Every abstraction has overhead. The gateway layer adds processing time for routing, policy evaluation, and request normalization. For latency-sensitive workloads, this overhead must be measured and tuned against your specific use cases.

Testing Requires Investment

Proving portability demands evaluation harnesses, golden datasets, and quality metrics across providers. We build this infrastructure as a first-class capability.

How We Mitigate

Adapter Test Harness

Each adapter includes a compatibility test suite that validates behavior against provider-specific edge cases. New provider integrations pass this harness before production.

Performance Budgets

Gateway components are designed with latency budgets in mind. We instrument each stage (policy evaluation, prompt resolution, routing) with OpenTelemetry tracing to identify and address bottlenecks. Specific targets are established during implementation based on measured baselines.

Continuous Evaluation

Automated quality regression tests run against all providers weekly. You get scorecards that answer "can I switch to provider X?" based on real data.

Implementation Roadmap

A structured approach to implementing the Provider-Portable AI Architecture in RegRiskIQ.

Phase 1: Foundation (Weeks 1-3)
Phase 2: Quality Assurance (Weeks 4-6)
Phase 3: Governance (Weeks 7-10)

Provider-Portable AI Architecture Implementation

Enable RegRiskIQ to leverage multiple foundation model providers without vendor lock-in

#9114 Epic

This epic delivers a complete provider-portable architecture enabling RegRiskIQ to use AWS Bedrock, Google Vertex AI, Azure AI Foundry, OpenAI, and Ollama interchangeably. The architecture separates AI consumption from AI provision through a Model Gateway pattern with centralized governance, observability, and quality assurance.

Centralized Prompt Registry

Version-controlled prompt management with provider-specific variants

#9115 Phase 1

Establish a centralized registry for managing AI prompts as versioned, deployable artifacts. This enables A/B testing, rollback capability, and provider-specific optimizations.

Create prompt_registry database schema
#9122
Acceptance Criteria (BDD)

Given a new RegRiskIQ database migration

When the migration is applied

Then a prompt_registry table exists with columns: id, name, version, template, provider_variants, created_at, is_active

And the name and version columns share a composite unique constraint

And provider_variants is a JSONB column supporting arbitrary provider keys
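
A hedged sketch of the table these criteria describe, expressed as a SQLAlchemy model. The actual migration tooling and exact column types are implementation choices left open by the criteria.

```python
# Sketch of the prompt_registry table from the acceptance criteria; types are assumptions.
from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text, UniqueConstraint, func
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class PromptRegistry(Base):
    __tablename__ = "prompt_registry"
    __table_args__ = (UniqueConstraint("name", "version", name="uq_prompt_name_version"),)

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    version = Column(Integer, nullable=False)
    template = Column(Text, nullable=False)
    provider_variants = Column(JSONB, nullable=False, server_default="{}")  # arbitrary provider keys
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    is_active = Column(Boolean, nullable=False, default=False)
```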

Implement Prompt Registry CRUD API
#9123
Acceptance Criteria (BDD)

Given an authenticated API request

When I POST to /api/prompts with a valid prompt template

Then a new prompt is created with version 1

And the response includes the prompt ID and version

Given an existing prompt "regulatory_analysis"

When I PUT to /api/prompts/regulatory_analysis with updated content

Then a new version is created (immutable versioning)

And the previous version remains accessible

Add provider-specific prompt variants
#9124
Acceptance Criteria (BDD)

Given a prompt with provider_variants for "openai" and "anthropic"

When the ModelManager requests the prompt for provider "anthropic"

Then the anthropic-specific variant is returned

And Claude-specific syntax (Human:/Assistant:) is applied

Given a prompt without a variant for provider "bedrock"

When the ModelManager requests the prompt for provider "bedrock"

Then the default template is returned
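
The fallback rule in these criteria reduces to a one-line lookup. The PromptRecord shape below is an assumption for illustration; only the resolution behavior comes from the acceptance criteria.

```python
# Variant resolution with default-template fallback; data shapes are illustrative.
from dataclasses import dataclass, field


@dataclass
class PromptRecord:
    template: str
    provider_variants: dict[str, str] = field(default_factory=dict)


def resolve_template(prompt: PromptRecord, provider: str) -> str:
    # Use a provider-specific variant when one exists (e.g. "anthropic"),
    # otherwise fall back to the default template (e.g. for "bedrock").
    return prompt.provider_variants.get(provider, prompt.template)


prompt = PromptRecord(
    template="Analyze the following regulation: {text}",
    provider_variants={"anthropic": "\n\nHuman: Analyze the following regulation: {text}\n\nAssistant:"},
)
assert resolve_template(prompt, "anthropic").startswith("\n\nHuman:")
assert resolve_template(prompt, "bedrock") == prompt.template
```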

Implement prompt version rollback
#9125
Acceptance Criteria (BDD)

Given a prompt "risk_scoring" with versions 1, 2, and 3 (active)

When I POST to /api/prompts/risk_scoring/rollback with version=2

Then version 2 becomes the active version

And an audit log entry is created with the rollback details

And subsequent AI requests use version 2

Task-Intent Routing

Route AI requests based on task intent, not just cost/performance

#9116 Phase 1

Extend the ModelManager to understand task intent (regulatory_analysis, risk_scoring, etc.) and route to the optimal provider/prompt combination based on intent-specific requirements.

Define TaskIntent enum
#9126
Acceptance Criteria (BDD)

Given the model_manager.py module

When I import TaskIntent

Then the enum contains: REGULATORY_ANALYSIS, RISK_SCORING, POLICY_SEARCH, COMPLIANCE_CHECK, DOCUMENT_SUMMARIZATION

And each intent has a string value matching its lowercase name
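
A minimal sketch of the enum these criteria describe; subclassing str is an implementation choice, not part of the requirement.

```python
# Sketch of TaskIntent as described in the acceptance criteria above.
from enum import Enum


class TaskIntent(str, Enum):
    REGULATORY_ANALYSIS = "regulatory_analysis"
    RISK_SCORING = "risk_scoring"
    POLICY_SEARCH = "policy_search"
    COMPLIANCE_CHECK = "compliance_check"
    DOCUMENT_SUMMARIZATION = "document_summarization"


assert TaskIntent.RISK_SCORING.value == "risk_scoring"  # string value matches lowercase name
```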

Extend TaskContext with intent fields
#9127
Acceptance Criteria (BDD)

Given a TaskContext dataclass

When I create a new TaskContext

Then I can specify intent, data_classification, compliance_framework, and tenant_id

And all new fields are optional with sensible defaults

And existing code using TaskContext continues to work without modification
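
A sketch of the extended dataclass. The new field names come from the criteria above; the existing field and the chosen defaults are assumptions for illustration.

```python
# Illustrative TaskContext extension; every new field is optional so existing callers keep working.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskContext:
    # Existing field (illustrative stand-in for whatever TaskContext already carries).
    task_description: str = ""
    # New optional fields with sensible defaults:
    intent: Optional["TaskIntent"] = None       # TaskIntent as sketched earlier
    data_classification: Optional[str] = None
    compliance_framework: Optional[str] = None
    tenant_id: Optional[str] = None
```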

Implement intent-to-prompt mapping
#9128
Acceptance Criteria (BDD)

Given a prompt registered with intent "regulatory_analysis"

When an AI request is made with TaskIntent.REGULATORY_ANALYSIS

Then the ModelManager automatically selects the correct prompt

And the provider-specific variant is applied based on the selected provider

Log routing decisions for audit
#9129
Acceptance Criteria (BDD)

Given any AI request processed by the ModelManager

When a provider and prompt are selected

Then an audit log entry is created with: timestamp, intent, selected_provider, selected_model, prompt_id, prompt_version, routing_strategy, tenant_id

And the audit log is queryable for compliance reporting

AI Observability Dashboard

Unified metrics visibility across all AI providers

#9117 Phase 1

Create a unified dashboard exposing AI metrics including provider performance, cost tracking, quality scores, and integration with existing Trust Architecture confidence metrics.

Aggregate ModelManager metrics
#9130
Acceptance Criteria (BDD)

Given the AI Models Service is running

When I GET /api/ai/metrics

Then I receive per-provider statistics: request_count, success_rate, avg_latency_ms, total_cost_usd, error_count

And metrics are available for all 6 providers: openai, azure, bedrock, vertex, ollama, anthropic

Integrate Trust Architecture confidence metrics
#9131
Acceptance Criteria (BDD)

Given the Trust Architecture service is processing requests

When I view the AI Observability Dashboard

Then I see TRAQ confidence scores aggregated by provider

And I see rejection rate due to low confidence thresholds

And I see citation accuracy metrics per intent

Create Grafana AI dashboard
#9132
Acceptance Criteria (BDD)

Given Grafana is deployed with the RegRiskIQ observability stack

When I navigate to the "AI Provider Performance" dashboard

Then I see panels for: latency (p50, p95, p99), cost per provider, success rate, requests per minute

And I can filter by time range, provider, and intent

Provider Equivalence Testing

Validate that switching providers maintains output quality

#9118 Phase 2

Build an evaluation harness with golden datasets to prove that switching from Provider A to Provider B does not degrade output quality beyond acceptable thresholds.

Curate golden evaluation datasets
#9133
Acceptance Criteria (BDD)

Given the evaluation_datasets table in the database

When I query for datasets by intent

Then I find at least 50 queries for each TaskIntent

And each query has a human-validated expected output

And queries are representative of production traffic patterns

Implement batch evaluation runner
#9134
Acceptance Criteria (BDD)

Given a golden dataset for "regulatory_analysis"

When I run: python -m evaluation.runner --intent regulatory_analysis --providers openai,bedrock

Then all queries are executed against both providers

And outputs are stored with provider, latency, cost, and raw response

And a comparison report is generated

Calculate quality metrics
#9135
Acceptance Criteria (BDD)

Given evaluation results from multiple providers

When quality metrics are calculated

Then semantic similarity score is computed using embedding cosine similarity

And citation accuracy is measured against expected sources

And factual consistency is evaluated using NLI models
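
As an illustration of the first metric, semantic similarity can be computed as the cosine similarity between embeddings of the candidate and expected outputs. The embed callable below is a placeholder; citation accuracy and NLI-based consistency checks are not shown.

```python
# Cosine-similarity sketch for the semantic-similarity metric; the embedder is a placeholder.
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_similarity(candidate: str, expected: str, embed) -> float:
    """`embed` is any callable returning an embedding vector (provider-agnostic on purpose)."""
    return cosine_similarity(embed(candidate), embed(expected))
```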

Generate provider comparison reports
#9136
Acceptance Criteria (BDD)

Given completed evaluation runs for Provider A and Provider B

When I generate a comparison report

Then the report shows per-intent quality scores for both providers

And a go/no-go recommendation is provided based on a >90% equivalence threshold

And specific failing cases are highlighted for review

End-to-End AI Tracing

OpenTelemetry-based trace propagation across all AI services

#9119 Phase 2

Implement distributed tracing that correlates requests from the UI through the API Gateway, ModelManager, and Provider Adapters, enabling full visibility into AI request lifecycles.

Add trace_id to ModelManager
#9137
Acceptance Criteria (BDD)

Given an AI request with an incoming trace context header

When the ModelManager processes the request

Then a child span is created under the parent trace

And the span includes attributes: ai.intent, ai.provider, ai.model, ai.tokens.input, ai.tokens.output

Correlate traces with audit logs
#9138
Acceptance Criteria (BDD)

Given a trace_id from an OpenTelemetry span

When I query the rag_query_audit table

Then I can find the corresponding audit entry by trace_id

And the audit entry includes the full request/response context

Add tracing to provider adapters
#9139
Acceptance Criteria (BDD)

Given a request routed to the Bedrock adapter

When the adapter calls the AWS Bedrock API

Then a child span is created with: provider.name, provider.region, provider.model, provider.latency_ms, provider.cost_usd

And errors are recorded as span events with full exception details

Tenant-Aware Gateway

Multi-tenant support with per-tenant rate limits and preferences

#9120 Phase 3

Enable multi-tenant operation of the Model Gateway with tenant isolation, per-tenant rate limiting, and tenant-specific provider preferences. Conditional on multi-tenant deployment requirements.

Add tenant_id to request context
#9140
Acceptance Criteria (BDD)

Given an authenticated request with a JWT containing tenant_id claim

When the request reaches the ModelManager

Then tenant_id is extracted and added to TaskContext

And all downstream operations are scoped to that tenant

Implement per-tenant rate limiting
#9141
Acceptance Criteria (BDD)

Given tenant "acme" has a rate limit of 100 requests/minute

When tenant "acme" sends their 101st request in a minute

Then a 429 Too Many Requests response is returned

And the response includes Retry-After header

And other tenants are not affected

Add tenant provider preferences
#9142
Acceptance Criteria (BDD)

Given tenant "acme" has preferred_providers: ["bedrock", "azure"]

When the routing engine selects a provider for tenant "acme"

Then only bedrock and azure are considered

And openai and ollama are excluded from routing decisions

Data Residency Routing

Route requests based on data residency and compliance requirements

#9121 Phase 3

Implement simple, configuration-driven routing rules to ensure data residency compliance. Providers in disallowed regions are automatically excluded from routing decisions.

Add allowed_regions to tenant config
#9143
Acceptance Criteria (BDD)

Given a tenant configuration schema

When I configure tenant "eu_bank" with allowed_regions: ["eu-west-1", "eu-central-1"]

Then the configuration is validated and stored

And the routing engine can query allowed regions for any tenant

Filter providers by region
#9144
Acceptance Criteria (BDD)

Given tenant "eu_bank" with allowed_regions: ["eu-west-1"]

And provider "bedrock" is configured for region "us-east-1"

When the routing engine selects providers

Then "bedrock" is excluded from eligible providers

And only EU-region providers are considered
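
The filtering rule is straightforward; a minimal sketch (with an assumed ProviderConfig shape) looks like this:

```python
# Region-based provider filtering; only the exclusion rule comes from the criteria above.
from dataclasses import dataclass


@dataclass
class ProviderConfig:
    name: str
    region: str


def filter_by_residency(providers: list[ProviderConfig], allowed_regions: list[str]) -> list[ProviderConfig]:
    # Providers outside the tenant's allowed regions are excluded before routing;
    # the exclusions and reason should also be written to the audit log (next work item).
    return [p for p in providers if p.region in allowed_regions]


providers = [ProviderConfig("bedrock", "us-east-1"), ProviderConfig("vertex", "eu-west-1")]
assert [p.name for p in filter_by_residency(providers, ["eu-west-1"])] == ["vertex"]
```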

Log data residency decisions
#9145
Acceptance Criteria (BDD)

Given a routing decision that excludes providers due to data residency

When the decision is logged

Then the audit log includes: tenant_id, allowed_regions, excluded_providers, selected_provider, reason

And compliance officers can query all residency-based routing decisions

Ready to Move Forward?

Your path to provider-portable AI compliance starts with a structured engagement.

1. Architecture Review with Technical Teams
2. Pilot Deployment Scope Definition
3. Provider & Policy Configuration
4. Production Deployment