Your path to AI-powered compliance without vendor lock-in. Use Amazon Bedrock today, Google Vertex AI tomorrow, Azure AI Foundry next week, or all at once.
You face a fundamental tension in AI adoption for regulatory compliance.
You need to leverage cutting-edge AI capabilities for compliance automation, risk assessment, and regulatory intelligence today. Waiting means falling behind competitors and increasing regulatory exposure.
You require the freedom to choose the best provider for each workload, switch providers as pricing and capabilities evolve, and avoid lock-in that constrains future technology decisions.
Your organization demands consistent security, privacy, and audit controls regardless of which AI provider processes requests. Compliance cannot be an afterthought.
You want to route workloads to the most cost-effective provider without sacrificing quality. Different tasks demand different models, and your architecture should support intelligent routing.
Your applications never communicate directly with foundation model providers. All AI interactions flow through a Model Gateway you own and control.
The Model Gateway acts as an anti-corruption layer between your business logic and external AI providers. This separation delivers five strategic advantages:
| Without Gateway | With Gateway |
|---|---|
| Provider-specific code in apps | One API, any provider |
| Scattered governance | Centralized controls |
| Expensive provider switching | Configuration-based routing |
| Manual cost optimization | Intelligent auto-routing |
| Fragmented observability | Unified tracing and logs |
RegRiskIQ delivers enterprise-grade AI governance through these integrated components.
Your applications specify what they need (regulatory analysis, risk scoring, document extraction) rather than which model to use. The gateway selects the optimal provider based on cost, latency, quality, and policy requirements.
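As a minimal sketch of how that selection might look in code (the provider profiles, intent policies, and scoring weights below are illustrative assumptions, not the shipped API):

```python
from dataclasses import dataclass

# Hypothetical provider profile; a real deployment would load this from
# configuration and live telemetry rather than hard-coding it.
@dataclass
class ProviderProfile:
    name: str
    cost_per_1k_tokens: float   # USD
    p95_latency_ms: float
    quality_score: float        # 0.0-1.0, e.g. from an evaluation harness

PROVIDERS = [
    ProviderProfile("bedrock", 0.008, 900, 0.93),
    ProviderProfile("vertex", 0.007, 750, 0.91),
    ProviderProfile("ollama", 0.0, 2500, 0.78),
]

# Illustrative intent policies: each task declares what it needs,
# never which model serves it.
INTENT_POLICY = {
    "regulatory_analysis": {"min_quality": 0.90, "weight_cost": 0.2, "weight_latency": 0.2},
    "document_extraction": {"min_quality": 0.75, "weight_cost": 0.6, "weight_latency": 0.2},
}

def select_provider(intent: str) -> ProviderProfile:
    """Pick the cheapest/fastest provider that clears the intent's quality bar."""
    policy = INTENT_POLICY[intent]
    eligible = [p for p in PROVIDERS if p.quality_score >= policy["min_quality"]]
    if not eligible:
        raise RuntimeError(f"No provider meets the quality bar for {intent}")
    # Lower score is better: weighted blend of normalized cost and latency.
    return min(
        eligible,
        key=lambda p: policy["weight_cost"] * p.cost_per_1k_tokens * 1000
        + policy["weight_latency"] * p.p95_latency_ms / 1000,
    )

print(select_provider("regulatory_analysis").name)  # "vertex" with these illustrative numbers
```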
Prompts become versioned, deployable artifacts stored in a central registry. Update prompts without changing application code. Test new versions before production rollout. Maintain audit trails of prompt changes.
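What consuming the registry could look like from application code, sketched with an in-memory stand-in (the PromptRegistry class and its methods are assumptions for illustration; the real registry sits behind an API):

```python
from dataclasses import dataclass

@dataclass
class PromptVersion:
    version: int
    template: str
    is_active: bool = False

class PromptRegistry:
    """In-memory stand-in for the central registry."""
    def __init__(self):
        self._prompts: dict[str, list[PromptVersion]] = {}

    def publish(self, name: str, template: str) -> int:
        versions = self._prompts.setdefault(name, [])
        for v in versions:
            v.is_active = False            # new version becomes the active one
        versions.append(PromptVersion(len(versions) + 1, template, is_active=True))
        return versions[-1].version

    def active(self, name: str) -> PromptVersion:
        return next(v for v in self._prompts[name] if v.is_active)

registry = PromptRegistry()
registry.publish("regulatory_analysis", "Summarize the regulatory impact of: {document}")
registry.publish("regulatory_analysis", "As a compliance analyst, assess: {document}")
prompt = registry.active("regulatory_analysis")
print(prompt.version, prompt.template.format(document="MiFID II update"))
```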
Enforce tenant isolation, data residency requirements, guardrails, and rate limits at the gateway level. Policies apply consistently across all AI interactions regardless of provider.
Each foundation model provider integrates through a dedicated adapter that normalizes request formats, response structures, error codes, and authentication patterns. Adding new providers requires only a new adapter.
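The adapter contract might look like the following sketch; the class, method, and field names are illustrative assumptions, and the toy adapter stands in for a real provider integration:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class GatewayRequest:
    """Provider-neutral request shape used inside the gateway."""
    prompt: str
    max_tokens: int
    temperature: float

@dataclass
class GatewayResponse:
    """Provider-neutral response: adapters map native payloads into this."""
    text: str
    input_tokens: int
    output_tokens: int
    provider: str

class ProviderAdapter(ABC):
    @abstractmethod
    def invoke(self, request: GatewayRequest) -> GatewayResponse: ...

class EchoAdapter(ProviderAdapter):
    """Toy adapter standing in for a real integration; a Bedrock or Vertex
    adapter would translate GatewayRequest into that provider's native payload
    and normalize errors, auth, and response structure the same way."""
    def invoke(self, request: GatewayRequest) -> GatewayResponse:
        text = f"[echo] {request.prompt[:40]}"
        return GatewayResponse(text, len(request.prompt) // 4, len(text) // 4, "echo")

# Adding a provider means registering one more adapter; nothing else changes.
ADAPTERS: dict[str, ProviderAdapter] = {"echo": EchoAdapter()}
resp = ADAPTERS["echo"].invoke(GatewayRequest("Classify this filing...", 256, 0.0))
print(resp.provider, resp.text)
```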
OpenTelemetry instrumentation provides end-to-end visibility across gateway, adapters, and providers. Track token usage, costs, latencies, and error rates per tenant, per provider, and per use case.
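A minimal instrumentation sketch using the OpenTelemetry Python SDK; the span and attribute names mirror the conventions used later in this document (ai.intent, ai.provider, ai.tokens.*) but are project-defined, not an OpenTelemetry standard:

```python
# Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("model-gateway")

# One span per AI call, carrying the per-tenant/per-provider dimensions
# that cost and latency dashboards slice on.
with tracer.start_as_current_span("ai.gateway.invoke") as span:
    span.set_attribute("ai.intent", "risk_scoring")
    span.set_attribute("ai.provider", "bedrock")
    span.set_attribute("ai.tenant_id", "acme")
    span.set_attribute("ai.tokens.input", 412)
    span.set_attribute("ai.tokens.output", 189)
    span.set_attribute("ai.cost.usd", 0.0048)
    # ... invoke the provider adapter here ...
```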
Your retrieval pipeline operates independently from model providers. Switch inference providers without re-indexing document stores or modifying retrieval logic. Your knowledge base stays portable.
| Strategy | Optimizes For | Use Case |
|---|---|---|
| Cost | Minimize spend while meeting quality thresholds | High-volume, non-critical workloads |
| Performance | Minimize latency for interactive experiences | Real-time compliance Q&A |
| Quality | Maximize output quality for critical decisions | Regulatory filing review |
| Hybrid | Balance all factors dynamically | Default for most workloads |
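As a concrete, hypothetical illustration, the strategies in the table above could be expressed as declarative configuration that the gateway evaluates per request; the weights and overrides shown are assumptions:

```python
# Hypothetical routing configuration: strategy names mirror the table above.
ROUTING_STRATEGIES = {
    "cost": {"weights": {"cost": 0.7, "latency": 0.1, "quality": 0.2}, "min_quality": 0.75},
    "performance": {"weights": {"cost": 0.1, "latency": 0.7, "quality": 0.2}, "max_p95_latency_ms": 1200},
    "quality": {"weights": {"cost": 0.05, "latency": 0.05, "quality": 0.9}, "min_quality": 0.95},
    "hybrid": {"weights": {"cost": 0.34, "latency": 0.33, "quality": 0.33}},
}

# A workload opts into a strategy instead of naming a model.
DEFAULT_STRATEGY = "hybrid"
INTENT_STRATEGY_OVERRIDES = {
    "regulatory_filing_review": "quality",   # critical decisions
    "compliance_qa": "performance",          # interactive experiences
}
```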
What the provider-portable architecture enables for your organization.
Evaluate and adopt new providers without application changes. Your business logic stays stable while AI capabilities evolve.
Route each workload to the most cost-effective option. Use premium models where quality matters, economical models where speed is sufficient.
Apply uniform security, privacy, and audit controls across all AI interactions. Meet regulatory requirements once, regardless of provider.
Architectural readiness for emerging models and providers. When the next breakthrough arrives, you adopt it through configuration.
AWS, Azure, and Google each publish reference architectures for this exact pattern—because they're competing to be YOUR abstraction layer.
"Multi-Provider Generative AI Gateway" — Official AWS guidance for routing to Azure, OpenAI, and other providers through an AWS-hosted LiteLLM gateway on ECS/EKS.
AWS Solutions Library"AI Gateway" with native Bedrock support — Microsoft's answer: use Azure APIM to govern AWS Bedrock and non-Microsoft AI providers from your Azure control plane.
Microsoft LearnModel Garden with multi-provider serving — Google's unified platform supporting Anthropic Claude, Meta Llama, and partner models alongside Gemini.
Google CloudEach hyperscaler wants to be your gateway to all the others. Our architecture gives you this pattern without the platform lock-in—your gateway runs where YOU choose, not where your cloud vendor prefers.
Core infrastructure complete. Phase 2 enhancements in the backlog.
| Provider | Models and Capabilities |
|---|---|
| OpenAI | GPT-4, GPT-3.5, embeddings with streaming |
| Azure OpenAI | Enterprise deployments, Managed Identity |
| AWS Bedrock | Claude, Titan, Llama, Cohere (13+ models) |
| Google Vertex | Gemini, PaLM, Model Garden |
| Ollama | Local models, zero API cost |
Business value highlights:

- Enables confident provider switching with quality assurance
- Enterprise-ready multi-tenancy for SaaS deployment
- GDPR/data sovereignty compliance for regulated industries

Provider portability is real. But it requires intentional design and ongoing investment.
Tool calling semantics, JSON output reliability, token limits, streaming formats, and content safety features vary across providers. Our adapter layer handles this complexity so your applications stay clean.
Every abstraction has overhead. The gateway layer adds processing time for routing, policy evaluation, and request normalization. For latency-sensitive workloads, measure this overhead and optimize it against your specific use cases.
Proving portability demands evaluation harnesses, golden datasets, and quality metrics across providers. We build this infrastructure as a first-class capability.
Each adapter includes a compatibility test suite that validates behavior against provider-specific edge cases. New provider integrations pass this harness before production.
Gateway components are designed with latency budgets in mind. We instrument each stage (policy evaluation, prompt resolution, routing) with OpenTelemetry tracing to identify and address bottlenecks. Specific targets are established during implementation based on measured baselines.
Automated quality regression tests run against all providers weekly. You get scorecards showing "can I switch to provider X" based on real data.
A structured approach to implementing the Provider-Portable AI Architecture in RegRiskIQ.
Enable RegRiskIQ to leverage multiple foundation model providers without vendor lock-in
This epic delivers a complete provider-portable architecture enabling RegRiskIQ to use AWS Bedrock, Google Vertex AI, Azure AI Foundry, OpenAI, and Ollama interchangeably. The architecture separates AI consumption from AI provision through a Model Gateway pattern with centralized governance, observability, and quality assurance.
Version-controlled prompt management with provider-specific variants
Establish a centralized registry for managing AI prompts as versioned, deployable artifacts. This enables A/B testing, rollback capability, and provider-specific optimizations.
Given a new RegRiskIQ database migration
When the migration is applied
Then a prompt_registry table exists with columns: id, name, version, template, provider_variants, created_at, is_active
And (name, version) is enforced by a composite unique constraint
And provider_variants is a JSONB column supporting arbitrary provider keys
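One way the migration's target schema could be expressed, here with SQLAlchemy (the ORM choice is an assumption; the acceptance criteria above only specify the columns and constraints):

```python
# Requires: pip install sqlalchemy
from sqlalchemy import (Boolean, Column, DateTime, Integer, String, Text,
                        UniqueConstraint, func)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class PromptRegistry(Base):
    __tablename__ = "prompt_registry"
    # Columns match the acceptance criteria above.
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    version = Column(Integer, nullable=False)
    template = Column(Text, nullable=False)
    provider_variants = Column(JSONB, nullable=False, default=dict)  # arbitrary provider keys
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    is_active = Column(Boolean, nullable=False, default=False)
    # Composite uniqueness: a prompt name may not repeat a version number.
    __table_args__ = (UniqueConstraint("name", "version", name="uq_prompt_name_version"),)
```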
Given an authenticated API request
When I POST to /api/prompts with a valid prompt template
Then a new prompt is created with version 1
And the response includes the prompt ID and version
Given an existing prompt "regulatory_analysis"
When I PUT to /api/prompts/regulatory_analysis with updated content
Then a new version is created (immutable versioning)
And the previous version remains accessible
Given a prompt with provider_variants for "openai" and "anthropic"
When the ModelManager requests the prompt for provider "anthropic"
Then the anthropic-specific variant is returned
And Claude-specific syntax (Human:/Assistant:) is applied
Given a prompt without a variant for provider "bedrock"
When the ModelManager requests the prompt for provider "bedrock"
Then the default template is returned
Given a prompt "risk_scoring" with versions 1, 2, and 3 (active)
When I POST to /api/prompts/risk_scoring/rollback with version=2
Then version 2 becomes the active version
And an audit log entry is created with the rollback details
And subsequent AI requests use version 2
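A sketch of the rollback semantics described above, in-memory for illustration; the real endpoint would update the registry table and emit the audit record through the platform's audit pipeline:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Version:
    number: int
    is_active: bool = False

@dataclass
class Prompt:
    name: str
    versions: list[Version] = field(default_factory=list)

AUDIT_LOG: list[dict] = []

def rollback(prompt: Prompt, target_version: int, actor: str) -> None:
    """Reactivate an earlier version; all versions stay immutable and accessible."""
    target = next(v for v in prompt.versions if v.number == target_version)
    for v in prompt.versions:
        v.is_active = False
    target.is_active = True
    AUDIT_LOG.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": "prompt_rollback",
        "prompt": prompt.name,
        "rolled_back_to": target_version,
        "actor": actor,
    })

p = Prompt("risk_scoring", [Version(1), Version(2), Version(3, is_active=True)])
rollback(p, 2, actor="compliance_admin")
print([v.number for v in p.versions if v.is_active])  # [2]
```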
Route AI requests based on task intent, not just cost/performance
Extend the ModelManager to understand task intent (regulatory_analysis, risk_scoring, etc.) and route to the optimal provider/prompt combination based on intent-specific requirements.
Given the model_manager.py module
When I import TaskIntent
Then the enum contains: REGULATORY_ANALYSIS, RISK_SCORING, POLICY_SEARCH, COMPLIANCE_CHECK, DOCUMENT_SUMMARIZATION
And each intent has a string value matching its lowercase name
Given a TaskContext dataclass
When I create a new TaskContext
Then I can specify intent, data_classification, compliance_framework, and tenant_id
And all new fields are optional with sensible defaults
And existing code using TaskContext continues to work without modification
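A sketch of the TaskIntent enum and TaskContext dataclass implied by the criteria above; the enum values and field names come from those criteria, while the defaults and example classification values are assumptions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TaskIntent(str, Enum):
    # String values match the lowercase name, per the criteria above.
    REGULATORY_ANALYSIS = "regulatory_analysis"
    RISK_SCORING = "risk_scoring"
    POLICY_SEARCH = "policy_search"
    COMPLIANCE_CHECK = "compliance_check"
    DOCUMENT_SUMMARIZATION = "document_summarization"

@dataclass
class TaskContext:
    # All new fields are optional so existing callers keep working unchanged.
    intent: Optional[TaskIntent] = None
    data_classification: Optional[str] = None    # e.g. "confidential" (assumed values)
    compliance_framework: Optional[str] = None   # e.g. "GDPR" (assumed values)
    tenant_id: Optional[str] = None

ctx = TaskContext(intent=TaskIntent.RISK_SCORING, tenant_id="acme")
print(ctx.intent.value)  # "risk_scoring"
```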
Given a prompt registered with intent "regulatory_analysis"
When an AI request is made with TaskIntent.REGULATORY_ANALYSIS
Then the ModelManager automatically selects the correct prompt
And the provider-specific variant is applied based on the selected provider
Given any AI request processed by the ModelManager
When a provider and prompt are selected
Then an audit log entry is created with: timestamp, intent, selected_provider, selected_model, prompt_id, prompt_version, routing_strategy, tenant_id
And the audit log is queryable for compliance reporting
Unified metrics visibility across all AI providers
Create a unified dashboard exposing AI metrics including provider performance, cost tracking, quality scores, and integration with existing Trust Architecture confidence metrics.
Given the AI Models Service is running
When I GET /api/ai/metrics
Then I receive per-provider statistics: request_count, success_rate, avg_latency_ms, total_cost_usd, error_count
And metrics are available for all 6 providers: openai, azure, bedrock, vertex, ollama, anthropic
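The aggregation behind that endpoint could look like this sketch; the statistic names come from the criteria above, while the raw record shape is an assumption:

```python
from collections import defaultdict

# Illustrative per-request records the gateway already emits.
REQUEST_LOG = [
    {"provider": "openai", "ok": True, "latency_ms": 420, "cost_usd": 0.0031},
    {"provider": "openai", "ok": False, "latency_ms": 1800, "cost_usd": 0.0},
    {"provider": "bedrock", "ok": True, "latency_ms": 650, "cost_usd": 0.0018},
]

def provider_metrics(log: list[dict]) -> dict[str, dict]:
    """Aggregate the per-provider statistics served by GET /api/ai/metrics."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for rec in log:
        buckets[rec["provider"]].append(rec)
    return {
        name: {
            "request_count": len(recs),
            "success_rate": sum(r["ok"] for r in recs) / len(recs),
            "avg_latency_ms": sum(r["latency_ms"] for r in recs) / len(recs),
            "total_cost_usd": round(sum(r["cost_usd"] for r in recs), 6),
            "error_count": sum(not r["ok"] for r in recs),
        }
        for name, recs in buckets.items()
    }

print(provider_metrics(REQUEST_LOG)["openai"]["success_rate"])  # 0.5
```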
Given the Trust Architecture service is processing requests
When I view the AI Observability Dashboard
Then I see TRAQ confidence scores aggregated by provider
And I see rejection rate due to low confidence thresholds
And I see citation accuracy metrics per intent
Given Grafana is deployed with the RegRiskIQ observability stack
When I navigate to the "AI Provider Performance" dashboard
Then I see panels for: latency (p50, p95, p99), cost per provider, success rate, requests per minute
And I can filter by time range, provider, and intent
Validate that switching providers maintains output quality
Build an evaluation harness with golden datasets to prove that switching from Provider A to Provider B does not degrade output quality beyond acceptable thresholds.
Given the evaluation_datasets table in the database
When I query for datasets by intent
Then I find at least 50 queries for each TaskIntent
And each query has a human-validated expected output
And queries are representative of production traffic patterns
Given a golden dataset for "regulatory_analysis"
When I run: python -m evaluation.runner --intent regulatory_analysis --providers openai,bedrock
Then all queries are executed against both providers
And outputs are stored with provider, latency, cost, and raw response
And a comparison report is generated
Given evaluation results from multiple providers
When quality metrics are calculated
Then semantic similarity score is computed using embedding cosine similarity
And citation accuracy is measured against expected sources
And factual consistency is evaluated using NLI models
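A minimal sketch of the embedding-cosine-similarity metric named above. The embed() function is a toy stand-in; a real harness would call whichever embedding model it is configured with:

```python
import math

def embed(text: str) -> list[float]:
    """Stand-in embedding: hashes character trigrams into a fixed-size vector.
    A real harness would call an embedding model instead."""
    vec = [0.0] * 64
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 64] += 1.0
    return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

expected = "The amendment tightens capital reserve requirements for tier-1 banks."
candidate = "Tier-1 banks face stricter capital reserve requirements under the amendment."
score = cosine_similarity(embed(expected), embed(candidate))
print(f"semantic similarity: {score:.3f}")  # compared against the harness threshold
```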
Given completed evaluation runs for Provider A and Provider B
When I generate a comparison report
Then the report shows per-intent quality scores for both providers
And a go/no-go recommendation is provided based on >90% equivalence threshold
And specific failing cases are highlighted for review
OpenTelemetry-based trace propagation across all AI services
Implement distributed tracing that correlates requests from the UI through the API Gateway, ModelManager, and Provider Adapters, enabling full visibility into AI request lifecycles.
Given an AI request with an incoming trace context header
When the ModelManager processes the request
Then a child span is created under the parent trace
And the span includes attributes: ai.intent, ai.provider, ai.model, ai.tokens.input, ai.tokens.output
Given a trace_id from an OpenTelemetry span
When I query the rag_query_audit table
Then I can find the corresponding audit entry by trace_id
And the audit entry includes the full request/response context
Given a request routed to the Bedrock adapter
When the adapter calls the AWS Bedrock API
Then a child span is created with: provider.name, provider.region, provider.model, provider.latency_ms, provider.cost_usd
And errors are recorded as span events with full exception details
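A propagation sketch using the OpenTelemetry Python SDK: the incoming W3C traceparent header becomes the parent context, and a failure is recorded as a span event. The adapter name, model id, and simulated timeout are illustrative assumptions:

```python
# Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.propagate import extract
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.trace import Status, StatusCode

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer("bedrock-adapter")

def handle_request(headers: dict, prompt: str) -> None:
    # Continue the caller's trace: this span nests under the UI/API Gateway spans.
    parent_ctx = extract(headers)
    with tracer.start_as_current_span("provider.invoke", context=parent_ctx) as span:
        span.set_attribute("provider.name", "bedrock")
        span.set_attribute("provider.region", "us-east-1")
        span.set_attribute("provider.model", "anthropic.claude-3")  # illustrative id
        try:
            raise TimeoutError("provider timed out")  # stand-in for a real API call
        except TimeoutError as exc:
            # Errors become span events carrying full exception details.
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR, str(exc)))

handle_request({"traceparent": "00-" + "ab" * 16 + "-" + "cd" * 8 + "-01"}, "...")
```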
Multi-tenant support with per-tenant rate limits and preferences
Enable multi-tenant operation of the Model Gateway with tenant isolation, per-tenant rate limiting, and tenant-specific provider preferences. Conditional on multi-tenant deployment requirements.
Given an authenticated request with a JWT containing tenant_id claim
When the request reaches the ModelManager
Then tenant_id is extracted and added to TaskContext
And all downstream operations are scoped to that tenant
Given tenant "acme" has a rate limit of 100 requests/minute
When tenant "acme" sends their 101st request in a minute
Then a 429 Too Many Requests response is returned
And the response includes Retry-After header
And other tenants are not affected
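A sliding-window limiter sketch showing the 429/Retry-After behavior above. It is in-process for illustration; a shared store such as Redis would back a real multi-instance deployment:

```python
import time
from collections import defaultdict, deque

class TenantRateLimiter:
    """Per-tenant sliding-window limiter; each tenant gets an independent window."""
    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def check(self, tenant_id: str) -> tuple[int, dict]:
        """Return (status_code, headers) for the request."""
        now = time.monotonic()
        hits = self._hits[tenant_id]
        while hits and now - hits[0] > self.window:   # drop expired entries
            hits.popleft()
        if len(hits) >= self.limit:
            retry_after = int(self.window - (now - hits[0])) + 1
            return 429, {"Retry-After": str(retry_after)}
        hits.append(now)
        return 200, {}

limiter = TenantRateLimiter(limit=100)
for _ in range(100):
    limiter.check("acme")
print(limiter.check("acme"))    # 429 with a Retry-After header
print(limiter.check("globex"))  # (200, {}) -- other tenants unaffected
```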
Given tenant "acme" has preferred_providers: ["bedrock", "azure"]
When the routing engine selects a provider for tenant "acme"
Then only bedrock and azure are considered
And openai and ollama are excluded from routing decisions
Route requests based on data residency and compliance requirements
Implement simple, configuration-driven routing rules to ensure data residency compliance. Providers in disallowed regions are automatically excluded from routing decisions.
Given a tenant configuration schema
When I configure tenant "eu_bank" with allowed_regions: ["eu-west-1", "eu-central-1"]
Then the configuration is validated and stored
And the routing engine can query allowed regions for any tenant
Given tenant "eu_bank" with allowed_regions: ["eu-west-1"]
And provider "bedrock" is configured for region "us-east-1"
When the routing engine selects providers
Then "bedrock" is excluded from eligible providers
And only EU-region providers are considered
Given a routing decision that excludes providers due to data residency
When the decision is logged
Then the audit log includes: tenant_id, allowed_regions, excluded_providers, selected_provider, reason
And compliance officers can query all residency-based routing decisions
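A sketch covering both eligibility filters described above (tenant preferred_providers and allowed_regions), returning exclusions with reasons so every routing decision stays auditable; the class and field names are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProviderConfig:
    name: str
    region: str

@dataclass
class TenantPolicy:
    tenant_id: str
    preferred_providers: Optional[list[str]] = None   # None means "any provider"
    allowed_regions: Optional[list[str]] = None       # None means "any region"

def eligible_providers(providers: list[ProviderConfig], policy: TenantPolicy):
    """Filter by tenant preference and data residency, recording why each
    provider was excluded so compliance officers can query the decision."""
    eligible, excluded = [], []
    for p in providers:
        if policy.preferred_providers is not None and p.name not in policy.preferred_providers:
            excluded.append({"provider": p.name, "reason": "not in tenant preferred_providers"})
        elif policy.allowed_regions is not None and p.region not in policy.allowed_regions:
            excluded.append({"provider": p.name, "reason": f"region {p.region} not allowed"})
        else:
            eligible.append(p)
    return eligible, excluded

providers = [
    ProviderConfig("bedrock", "us-east-1"),
    ProviderConfig("bedrock_eu", "eu-west-1"),
    ProviderConfig("azure", "eu-central-1"),
]
policy = TenantPolicy("eu_bank", allowed_regions=["eu-west-1", "eu-central-1"])
ok, dropped = eligible_providers(providers, policy)
print([p.name for p in ok], dropped)
# ['bedrock_eu', 'azure'] [{'provider': 'bedrock', 'reason': 'region us-east-1 not allowed'}]
```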
Your path to provider-portable AI compliance starts with a structured engagement.