Your path to AI-powered compliance without vendor lock-in. Use Amazon Bedrock today, Google Vertex AI tomorrow, Azure AI Foundry next week, or all at once.
You face a fundamental tension in AI adoption for regulatory compliance.
You need to leverage cutting-edge AI capabilities for compliance automation, risk assessment, and regulatory intelligence today. Waiting means falling behind competitors and increasing regulatory exposure.
You require the freedom to choose the best provider for each workload, switch providers as pricing and capabilities evolve, and avoid lock-in that constrains future technology decisions.
Your organization demands consistent security, privacy, and audit controls regardless of which AI provider processes requests. Compliance cannot be an afterthought.
You want to route workloads to the most cost-effective provider without sacrificing quality. Different tasks demand different models, and your architecture should support intelligent routing.
Your applications never communicate directly with foundation model providers. All AI interactions flow through a Model Gateway you own and control.
The Model Gateway acts as an anti-corruption layer between your business logic and external AI providers. This separation delivers three strategic advantages:
| Without Gateway | With Gateway |
|---|---|
| Provider-specific code in apps | One API, any provider |
| Scattered governance | Centralized controls |
| Expensive provider switching | Configuration-based routing |
| Manual cost optimization | Intelligent auto-routing |
| Fragmented observability | Unified tracing and logs |
RegRiskIQ delivers enterprise-grade AI governance through these integrated components.
Your applications specify what they need (regulatory analysis, risk scoring, document extraction) rather than which model to use. The gateway selects the optimal provider based on cost, latency, quality, and policy requirements.
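As a minimal sketch of the request shape this implies (all names here, `GatewayRequest` included, are illustrative rather than the actual RegRiskIQ API):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request shape: the application declares *what* it needs;
# the gateway decides *which* provider and model serve it.
@dataclass
class GatewayRequest:
    intent: str                           # e.g. "regulatory_analysis"
    inputs: dict
    data_classification: str = "internal"
    max_latency_ms: Optional[int] = None  # routing hint, not a model name

req = GatewayRequest(
    intent="risk_scoring",
    inputs={"entity_id": "ACME-123"},
    data_classification="confidential",
)
# The gateway, not the caller, maps req.intent to a provider/model pair.
```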
Your prompts encode institutional knowledge—they're intellectual property. The registry provides version control, team collaboration, A/B testing, and complete audit trails. Structured prompts reduce token usage by 60-76% while protecting your competitive advantage.
OPA-powered, nine-dimensional attribute-based access control that goes beyond the industry-standard three dimensions of RBAC. It controls WHO uses AI (identity), WHAT they access (resources), HOW (actions), WHERE/WHEN (environment), WHY (purpose), WHOSE data (customer PII), aggregation levels (AML thresholds), cross-LOB data sharing, and break-glass access for regulatory emergencies. Policy-as-code with full audit trails.
Each foundation model provider integrates through a dedicated adapter that normalizes request formats, response structures, error codes, and authentication patterns. Adding new providers requires only a new adapter.
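A sketch of what such an adapter boundary can look like; `ProviderAdapter`, `ModelRequest`, and `BedrockAdapter` are illustrative names, not the shipped interfaces:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Normalized request/response types shared by every adapter (illustrative).
@dataclass
class ModelRequest:
    prompt: str
    max_tokens: int = 1024

@dataclass
class ModelResponse:
    text: str
    input_tokens: int
    output_tokens: int

class ProviderAdapter(ABC):
    """One adapter per provider: translates the normalized request into the
    provider's wire format and maps provider errors to common codes."""

    @abstractmethod
    async def invoke(self, request: ModelRequest) -> ModelResponse: ...

class BedrockAdapter(ProviderAdapter):
    async def invoke(self, request: ModelRequest) -> ModelResponse:
        # Here: call the AWS SDK and translate its response shape.
        raise NotImplementedError
```

Adding a provider then means writing one more subclass; nothing upstream changes.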
OpenTelemetry instrumentation provides end-to-end visibility across gateway, adapters, and providers. Track token usage, costs, latencies, and error rates per tenant, per provider, and per use case.
Your retrieval pipeline operates independently from model providers. Switch inference providers without re-indexing document stores or modifying retrieval logic. Your knowledge base stays portable.
Routing rules, policies, prompts, and provider configs live in Git. PR reviews for AI changes. Full audit trail. Rollback any configuration. CI/CD pipelines for your AI infrastructure. Environment parity from code.
Production feedback loops that improve AI performance over time. Thumbs up/down captures user satisfaction. Analytics identify false positive patterns. Automatic threshold tuning reduces noise. Every interaction teaches the system.
| Strategy | Optimizes For | Use Case |
|---|---|---|
| Cost | Minimize spend while meeting quality thresholds | High-volume, non-critical workloads |
| Performance | Minimize latency for interactive experiences | Real-time compliance Q&A |
| Quality | Maximize output quality for critical decisions | Regulatory filing review |
| Hybrid | Balance all factors dynamically | Default for most workloads |
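To make the hybrid strategy concrete, here is a toy scoring function over assumed provider statistics; the weights and normalization constants are illustrative, not shipped defaults:

```python
# Assumed per-provider stats, e.g. sourced from the observability pipeline.
PROVIDERS = {
    "bedrock": {"cost_per_1k": 0.003, "p95_latency_ms": 900,  "quality": 0.92},
    "vertex":  {"cost_per_1k": 0.002, "p95_latency_ms": 1200, "quality": 0.90},
}

def hybrid_score(stats: dict, w_cost=0.4, w_latency=0.3, w_quality=0.3) -> float:
    # Lower cost and latency score higher; quality is already on a 0..1 scale.
    cost_term = 1 - min(stats["cost_per_1k"] / 0.01, 1.0)
    latency_term = 1 - min(stats["p95_latency_ms"] / 3000, 1.0)
    return w_cost * cost_term + w_latency * latency_term + w_quality * stats["quality"]

best = max(PROVIDERS, key=lambda name: hybrid_score(PROVIDERS[name]))
```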
What the provider-portable architecture enables for your organization.
Evaluate and adopt new providers without application changes. Your business logic stays stable while AI capabilities evolve.
Route each workload to the most cost-effective option. Use premium models where quality matters, economical models where speed is sufficient.
Apply uniform security, privacy, and audit controls across all AI interactions. Meet regulatory requirements once, regardless of provider.
Architectural readiness for emerging models and providers. When the next breakthrough arrives, you adopt it through configuration.
AWS, Azure, and Google each publish reference architectures for this exact pattern—because they're competing to be YOUR abstraction layer.
"Multi-Provider Generative AI Gateway" (AWS Solutions Library) — Official AWS guidance for routing to Azure, OpenAI, and other providers through an AWS-hosted LiteLLM gateway on ECS/EKS.
"AI Gateway" with native Bedrock support (Microsoft Learn) — Microsoft's answer: use Azure APIM to govern AWS Bedrock and non-Microsoft AI providers from your Azure control plane.
Model Garden with multi-provider serving (Google Cloud) — Google's unified platform supporting Anthropic Claude, Meta Llama, and partner models alongside Gemini.
Each hyperscaler wants to be your gateway to all the others. Our architecture gives you this pattern without the platform lock-in—your gateway runs where YOU choose, not where your cloud vendor prefers.
OPA-powered governance that goes far beyond standard access control—9 dimensions of context-aware policy enforcement for regulated financial services.
Standard access control: coarse-grained, context-blind. 9D ABAC: context-aware, purpose-driven.

Scenario: Analyst needs AI-generated market insights during trading hours
9D Decision: Environment=trading_floor + time=market_hours + role=analyst → real-time data, no PII, audit logged
Rule: `environment.location == "trading_floor" → market_data_only`

Scenario: AI flags suspicious transaction pattern, needs customer history
9D Decision: Purpose=AML_investigation + SAR_filed → full transaction history, SSN masked unless SAR threshold met
Rule: `purpose.type == "aml" && sar_threshold_met → full_pii`

Scenario: OCC examiner requests AI model documentation and decisions
9D Decision: Emergency=reg_exam + examiner_credentials_verified → full model access, all decisions, complete audit trail
Rule: `emergency.type == "regulatory_exam" → full_transparency`

Scenario: Wealth management AI needs retail banking transaction data for holistic customer view
9D Decision: Cross_LOB=wealth→retail + approved_use_case + data_sharing_agreement → aggregated view only, no raw transactions
Rule: `cross_lob.approved && dsa_active → aggregated_view`
```rego
package regriskiq.authz

import rego.v1

# Helper rules (customer_pii_fields, all_fields, time_in_market_hours,
# approved_cross_lob_use_case, data_sharing_agreement_active) are defined
# elsewhere in the package.

# AML investigation grants elevated access with audit
allowed_fields contains field if {
    input.purpose.type == "aml_investigation"
    input.user.role == "compliance_officer"
    input.case.sar_filed == true
    some field in customer_pii_fields  # Full PII for filed SAR
}

# Trading floor restricts to market data only
allowed_fields contains field if {
    input.environment.location == "trading_floor"
    time_in_market_hours(input.environment.timestamp)
    some field in {"ticker", "price", "volume", "sentiment_score"}
    not field in customer_pii_fields
}

# Cross-LOB data sharing requires approved use case
cross_lob_access_allowed if {
    input.cross_lob.source_lob != input.cross_lob.requester_lob
    approved_cross_lob_use_case(input.purpose.type)
    data_sharing_agreement_active(input.cross_lob.source_lob, input.cross_lob.requester_lob)
}

# Regulatory exam break-glass with mandatory audit
allowed_fields contains field if {
    input.emergency.type == "regulatory_exam"
    input.emergency.examiner_credentials_verified == true
    some field in all_fields  # Full transparency for regulators
    # Mandatory: log_regulatory_access(input)
}
```
Every policy decision is auditable, version-controlled, and reviewable through standard PR workflows. Integrates with existing NIST 800-53 AC and AU control families.
63 purpose-built AI controls across 14 domains—designed to complement your existing controls and satisfy emerging AI regulations.
71% of our AI controls map directly to, or extend, existing NIST 800-53 controls—rationalizing against your current control investments instead of duplicating them.
Each AI use case gets its own governance envelope—risk classification, applicable controls, evidence requirements, and monitoring metrics.
EU AI Act tiers: Minimal, Limited, High, Unacceptable. Controls scale with risk.
Auto-mapped to NIST 800-53, AI RMF, ISO 42001. Gap analysis against your existing controls.
Real-time dashboards: bias drift (RS-5), model performance (OM-1), explainability (RS-4).
| AI Control | Description | Maps To | Evidence |
|---|---|---|---|
| GO-1 | AI Governance System Implementation | NIST PM-1, ISO 42001 5.1 | Charter, RACI, meeting minutes |
| RS-5 | Fairness & Bias Management | NIST AI RMF MEASURE 2.10, EU AI Act Art. 10 | Bias testing reports, demographic parity metrics |
| RS-4 | Explainability Requirements | EU AI Act Art. 13, ISO 42001 8.4 | SHAP/LIME outputs, decision audit logs |
| OM-1 | Performance Monitoring | NIST CA-7, ISO 42001 9.1 | Model accuracy dashboards, drift alerts |
| TP-2 | Third-Party Model Validation | NIST SA-9, EU AI Act Art. 28 | Vendor assessments, model cards |
Production AI systems that learn and improve from every interaction—closing the loop from deployment to enhancement.
↻ Continuous cycle improves accuracy over time
Thumbs up/down on every AI response. Star ratings for detailed assessment. Context captured: user role, intent, provider used, latency.
Satisfaction trends by provider, intent, and persona. False positive rates. Time-series visualization. Executive reporting exports.
Pattern analysis identifies alerts consistently marked "not helpful". Thresholds auto-tune. Alert fatigue eliminated.
Thumbs up/down feeds into provider selection. High-satisfaction providers get preferred routing. Poor performers downweighted.
Severity-based routing: critical → Slack + email. Rate limiting. Quiet hours. Weekly digests. SLA tracking.
Feedback triggers improvement processes. Critical bugs → 2-hour SLA. UX issues → design review. Structured escalation.
```python
# Feedback logged for every AI interaction
await feedback_client.log_copilot_event({
    "event_type": "insight_feedback",
    "persona": current_user.role,
    "user_feedback": "helpful",        # or "not_helpful"
    "provider": selected_provider,
    "latency_ms": response_time,
    "governance_control": "OM-1",      # Links to AI governance
})

# Pattern analysis → threshold tuning: each pattern carries per-control stats
patterns = await feedback_client.analyze_alert_patterns()
for pattern in patterns:
    if pattern.false_positive_rate > 0.3:
        adjust_alert_threshold(pattern.control_id, direction="less_sensitive")
```
This isn't just a pattern to implement. It's strategic infrastructure you should own.
"The cloud era rewarded efficiency. The AI era rewards sovereignty."
— Deloitte Tech Trends 2026
When you own your gateway, switching providers is a configuration change—not a migration project. Vendors know this. Your procurement leverage is fundamentally different.
Your prompts encode institutional knowledge—regulatory interpretations, risk frameworks, compliance logic. These are living knowledge assets. They belong in YOUR systems, version-controlled and auditable.
Route expensive tasks to cost-effective models. Cache responses. Batch requests. These optimizations compound. When you own the gateway, the savings flow to you—not to a platform vendor.
The gateway layer you own: routing, policy, orchestration. Standard compute. No GPUs. Scales horizontally.
The provider layer you rent: GPU clusters, TPUs, model hosting. You pay per-token, not per-GPU-hour.
Owning the gateway is feasible because you're building orchestration, not GPU infrastructure.
For regulated financial services, AI infrastructure is strategic infrastructure. The hyperscalers want to be YOUR gateway because control equals leverage. We recommend you own this layer—deployed in your environment, integrated with your governance, evolving with your needs.
96% of IT leaders plan to expand AI agents in 2025. Your gateway is the control plane.
of enterprise AI implementations now use multi-agent architectures rather than single-model approaches.
Source: Arcade.dev Agentic Framework Adoption Report 2025
of technology leaders list governance as their primary concern when deploying agentic AI.
Gartner projects 40% of agent projects will fail by 2027 due to inadequate controls.
When agents orchestrate other agents, every model call flows through your gateway. This gives you:
Each agent authenticates separately. Apply different policies, rate limits, and model access per agent type.
Track token usage and costs per agent. Identify runaway agents before they impact budgets.
Control which agents can invoke which tools. Enforce approval workflows for sensitive operations.
Every agent-to-model interaction logged. Trace decisions back through the agent chain for compliance.
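The sketch below illustrates two of the controls above, per-agent identity and tool permissions; the agent IDs, policy fields, and approval flag are assumptions for illustration, not shipped configuration:

```python
# Illustrative per-agent policy table keyed by agent identity.
AGENT_POLICY = {
    "aml-investigator": {
        "allowed_tools": {"case_lookup", "sar_draft"},
        "allowed_models": {"bedrock:claude", "vertex:gemini"},
        "requires_approval": {"sar_draft"},   # human sign-off before use
    },
    "report-summarizer": {
        "allowed_tools": {"doc_search"},
        "allowed_models": {"ollama:local"},
        "requires_approval": set(),
    },
}

def authorize_tool_call(agent_id: str, tool: str, approved: bool) -> bool:
    policy = AGENT_POLICY.get(agent_id)
    if policy is None or tool not in policy["allowed_tools"]:
        return False                          # unknown agent or unapproved tool
    if tool in policy["requires_approval"] and not approved:
        return False                          # approval workflow still pending
    return True
```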
By 2028, Gartner predicts 58% of business functions will have AI agents managing at least one process daily. The gateway you build today becomes the governance layer for your agentic future. The architecture is ready—the agent orchestration capabilities plug directly into the existing routing and policy infrastructure.
Core infrastructure complete. Phase 2 enhancements in the backlog.
| Provider | Models |
|---|---|
| OpenAI | GPT-5, o3, o4-mini, GPT-4.1, embeddings |
| Azure OpenAI | Enterprise deployments, Managed Identity |
| AWS Bedrock | Nova 2, Claude 4.5, Llama 4, Mistral Large 3 |
| Google Vertex | Gemini 3 Flash/Pro, Model Garden |
| Ollama | Local models, zero API cost |
Business Value: Enables confident provider switching with quality assurance
Business Value: Enterprise-ready multi-tenancy for SaaS deployment
Business Value: GDPR/data sovereignty compliance for regulated industries
Click for details →Provider portability is real. But it requires intentional design and ongoing investment.
Tool calling semantics, JSON output reliability, token limits, streaming formats, and content safety features vary across providers. Our adapter layer handles this complexity so your applications stay clean.
Every abstraction has overhead. The gateway layer adds processing time for routing, policy evaluation, and request normalization. For latency-sensitive workloads, this impact must be measured and optimized for your specific use cases.
Proving portability demands evaluation harnesses, golden datasets, and quality metrics across providers. We build this infrastructure as a first-class capability.
Each adapter includes a compatibility test suite that validates behavior against provider-specific edge cases. New provider integrations pass this harness before production.
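A self-contained sketch of such a harness using pytest; `FakeAdapter` stands in for real provider adapters, and the edge cases shown are illustrative:

```python
import pytest
from dataclasses import dataclass

# FakeAdapter substitutes for real adapters, which a real suite would wire
# in through the same parametrized fixture (one entry per provider).
@dataclass
class Result:
    text: str
    input_tokens: int

class FakeAdapter:
    def invoke_sync(self, prompt: str) -> Result:
        return Result(text="ok", input_tokens=len(prompt) // 4)

@pytest.fixture(params=["fake"])
def adapter(request):
    return FakeAdapter()

EDGE_CASES = [
    "",                           # empty prompt
    "x" * 200_000,                # prompt near or over token limits
    '{"force": "json output"}',   # structured-output request
]

@pytest.mark.parametrize("prompt", EDGE_CASES)
def test_adapter_returns_common_shape(adapter, prompt):
    result = adapter.invoke_sync(prompt)
    assert isinstance(result.text, str)   # normalized response shape holds
    assert result.input_tokens >= 0
```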
Gateway components are designed with latency budgets in mind. We instrument each stage (policy evaluation, prompt resolution, routing) with OpenTelemetry tracing to identify and address bottlenecks. Specific targets are established during implementation based on measured baselines.
Automated quality regression tests run against all providers weekly. You get scorecards showing "can I switch to provider X" based on real data.
A structured approach to implementing the Provider-Portable AI Architecture in RegRiskIQ.
Enable RegRiskIQ to leverage multiple foundation model providers without vendor lock-in
This epic delivers a complete provider-portable architecture enabling RegRiskIQ to use AWS Bedrock, Google Vertex AI, Azure AI Foundry, OpenAI, and Ollama interchangeably. The architecture separates AI consumption from AI provision through a Model Gateway pattern with centralized governance, observability, and quality assurance.
Version-controlled prompt management with provider-specific variants
Establish a centralized registry for managing AI prompts as versioned, deployable artifacts. This enables A/B testing, rollback capability, and provider-specific optimizations.
Given a new RegRiskIQ database migration
When the migration is applied
Then a prompt_registry table exists with columns: id, name, version, template, provider_variants, created_at, is_active
And the version column has a unique constraint with name
And provider_variants is a JSONB column supporting arbitrary provider keys
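A sketch of that table as a SQLAlchemy model, assuming PostgreSQL and SQLAlchemy; the real migration may differ in naming and indexing:

```python
from sqlalchemy import (
    Boolean, Column, DateTime, Integer, String, Text, UniqueConstraint, func,
)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class PromptRegistry(Base):
    __tablename__ = "prompt_registry"

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    version = Column(Integer, nullable=False)
    template = Column(Text, nullable=False)
    provider_variants = Column(JSONB, default=dict)  # arbitrary provider keys
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    is_active = Column(Boolean, default=False)

    # One (name, version) pair per prompt, supporting immutable versioning.
    __table_args__ = (UniqueConstraint("name", "version"),)
```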
Given an authenticated API request
When I POST to /api/prompts with a valid prompt template
Then a new prompt is created with version 1
And the response includes the prompt ID and version
Given an existing prompt "regulatory_analysis"
When I PUT to /api/prompts/regulatory_analysis with updated content
Then a new version is created (immutable versioning)
And the previous version remains accessible
Given a prompt with provider_variants for "openai" and "anthropic"
When the ModelManager requests the prompt for provider "anthropic"
Then the anthropic-specific variant is returned
And Claude-specific syntax (Human:/Assistant:) is applied
Given a prompt without a variant for provider "bedrock"
When the ModelManager requests the prompt for provider "bedrock"
Then the default template is returned
Given a prompt "risk_scoring" with versions 1, 2, and 3 (active)
When I POST to /api/prompts/risk_scoring/rollback with version=2
Then version 2 becomes the active version
And an audit log entry is created with the rollback details
And subsequent AI requests use version 2
Route AI requests based on task intent, not just cost/performance
Extend the ModelManager to understand task intent (regulatory_analysis, risk_scoring, etc.) and route to the optimal provider/prompt combination based on intent-specific requirements.
Given the model_manager.py module
When I import TaskIntent
Then the enum contains: REGULATORY_ANALYSIS, RISK_SCORING, POLICY_SEARCH, COMPLIANCE_CHECK, DOCUMENT_SUMMARIZATION
And each intent has a string value matching its lowercase name
Given a TaskContext dataclass
When I create a new TaskContext
Then I can specify intent, data_classification, compliance_framework, and tenant_id
And all new fields are optional with sensible defaults
And existing code using TaskContext continues to work without modification
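A minimal sketch of the enum and dataclass these criteria describe; everything beyond the stated field names and values is an assumption:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TaskIntent(str, Enum):
    # Each intent's string value matches its lowercase name.
    REGULATORY_ANALYSIS = "regulatory_analysis"
    RISK_SCORING = "risk_scoring"
    POLICY_SEARCH = "policy_search"
    COMPLIANCE_CHECK = "compliance_check"
    DOCUMENT_SUMMARIZATION = "document_summarization"

@dataclass
class TaskContext:
    # All new fields are optional so existing call sites keep working.
    intent: Optional[TaskIntent] = None
    data_classification: str = "internal"
    compliance_framework: Optional[str] = None
    tenant_id: Optional[str] = None
```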
Given a prompt registered with intent "regulatory_analysis"
When an AI request is made with TaskIntent.REGULATORY_ANALYSIS
Then the ModelManager automatically selects the correct prompt
And the provider-specific variant is applied based on the selected provider
Given any AI request processed by the ModelManager
When a provider and prompt are selected
Then an audit log entry is created with: timestamp, intent, selected_provider, selected_model, prompt_id, prompt_version, routing_strategy, tenant_id
And the audit log is queryable for compliance reporting
Unified metrics visibility across all AI providers
Create a unified dashboard exposing AI metrics including provider performance, cost tracking, quality scores, and integration with existing Trust Architecture confidence metrics.
Given the AI Models Service is running
When I GET /api/ai/metrics
Then I receive per-provider statistics: request_count, success_rate, avg_latency_ms, total_cost_usd, error_count
And metrics are available for all 6 providers: openai, azure, bedrock, vertex, ollama, anthropic
Given the Trust Architecture service is processing requests
When I view the AI Observability Dashboard
Then I see TRAQ confidence scores aggregated by provider
And I see rejection rate due to low confidence thresholds
And I see citation accuracy metrics per intent
Given Grafana is deployed with the RegRiskIQ observability stack
When I navigate to the "AI Provider Performance" dashboard
Then I see panels for: latency (p50, p95, p99), cost per provider, success rate, requests per minute
And I can filter by time range, provider, and intent
Validate that switching providers maintains output quality
Build an evaluation harness with golden datasets to prove that switching from Provider A to Provider B does not degrade output quality beyond acceptable thresholds.
Given the evaluation_datasets table in the database
When I query for datasets by intent
Then I find at least 50 queries for each TaskIntent
And each query has a human-validated expected output
And queries are representative of production traffic patterns
Given a golden dataset for "regulatory_analysis"
When I run: python -m evaluation.runner --intent regulatory_analysis --providers openai,bedrock
Then all queries are executed against both providers
And outputs are stored with provider, latency, cost, and raw response
And a comparison report is generated
Given evaluation results from multiple providers
When quality metrics are calculated
Then semantic similarity score is computed using embedding cosine similarity
And citation accuracy is measured against expected sources
And factual consistency is evaluated using NLI models
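A sketch of the semantic-similarity metric, assuming an `embed` callable supplied by whichever embedding model the harness uses:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_score(expected: str, actual: str, embed) -> float:
    # 1.0 = same meaning by the embedding model's lights, ~0 = unrelated.
    return cosine_similarity(embed(expected), embed(actual))
```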
Given completed evaluation runs for Provider A and Provider B
When I generate a comparison report
Then the report shows per-intent quality scores for both providers
And a go/no-go recommendation is provided based on a >90% equivalence threshold
And specific failing cases are highlighted for review
OpenTelemetry-based trace propagation across all AI services
Implement distributed tracing that correlates requests from the UI through the API Gateway, ModelManager, and Provider Adapters, enabling full visibility into AI request lifecycles.
Given an AI request with an incoming trace context header
When the ModelManager processes the request
Then a child span is created under the parent trace
And the span includes attributes: ai.intent, ai.provider, ai.model, ai.tokens.input, ai.tokens.output
Given a trace_id from an OpenTelemetry span
When I query the rag_query_audit table
Then I can find the corresponding audit entry by trace_id
And the audit entry includes the full request/response context
Given a request routed to the Bedrock adapter
When the adapter calls the AWS Bedrock API
Then a child span is created with: provider.name, provider.region, provider.model, provider.latency_ms, provider.cost_usd
And errors are recorded as span events with full exception details
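A sketch of the span instrumentation using the standard OpenTelemetry Python API; the wrapping function and wiring around it are illustrative:

```python
from opentelemetry import trace

tracer = trace.get_tracer("regriskiq.model_manager")

async def invoke_with_tracing(adapter, request, intent, provider, model):
    # Child span is created under whatever trace context is current.
    with tracer.start_as_current_span("ai.invoke") as span:
        span.set_attribute("ai.intent", intent)
        span.set_attribute("ai.provider", provider)
        span.set_attribute("ai.model", model)
        try:
            response = await adapter.invoke(request)
        except Exception as exc:
            span.record_exception(exc)   # errors become span events
            raise
        span.set_attribute("ai.tokens.input", response.input_tokens)
        span.set_attribute("ai.tokens.output", response.output_tokens)
        return response
```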
Multi-tenant support with per-tenant rate limits and preferences
Enable multi-tenant operation of the Model Gateway with tenant isolation, per-tenant rate limiting, and tenant-specific provider preferences. Conditional on multi-tenant deployment requirements.
Given an authenticated request with a JWT containing tenant_id claim
When the request reaches the ModelManager
Then tenant_id is extracted and added to TaskContext
And all downstream operations are scoped to that tenant
Given tenant "acme" has a rate limit of 100 requests/minute
When tenant "acme" sends their 101st request in a minute
Then a 429 Too Many Requests response is returned
And the response includes Retry-After header
And other tenants are not affected
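A minimal sketch of per-tenant sliding-window rate limiting; the limits shown are assumptions, and returning 429 with a Retry-After header is left to the web framework:

```python
import time
from collections import defaultdict, deque

TENANT_LIMITS = {"acme": 100}                 # requests per minute (assumed)
_windows: dict[str, deque] = defaultdict(deque)

def check_tenant(tenant_id: str) -> tuple[bool, int]:
    """Return (allowed, retry_after_seconds) for one incoming request."""
    limit = TENANT_LIMITS.get(tenant_id, 60)  # assumed default limit
    now = time.monotonic()
    window = _windows[tenant_id]              # per-tenant: others unaffected
    while window and now - window[0] > 60.0:
        window.popleft()                      # evict requests older than 60 s
    if len(window) >= limit:
        retry_after = int(60.0 - (now - window[0])) + 1
        return False, retry_after             # caller maps this to a 429
    window.append(now)
    return True, 0
```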
Given tenant "acme" has preferred_providers: ["bedrock", "azure"]
When the routing engine selects a provider for tenant "acme"
Then only bedrock and azure are considered
And openai and ollama are excluded from routing decisions
Route requests based on data residency and compliance requirements
Implement simple, configuration-driven routing rules to ensure data residency compliance. Providers in disallowed regions are automatically excluded from routing decisions.
Given a tenant configuration schema
When I configure tenant "eu_bank" with allowed_regions: ["eu-west-1", "eu-central-1"]
Then the configuration is validated and stored
And the routing engine can query allowed regions for any tenant
Given tenant "eu_bank" with allowed_regions: ["eu-west-1"]
And provider "bedrock" is configured for region "us-east-1"
When the routing engine selects providers
Then "bedrock" is excluded from eligible providers
And only EU-region providers are considered
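A minimal sketch of residency-based provider filtering; the provider-to-region mapping shown is assumed configuration:

```python
PROVIDER_REGIONS = {
    "bedrock": "us-east-1",
    "bedrock-eu": "eu-west-1",
    "vertex-eu": "eu-central-1",
}

def eligible_providers(allowed_regions: set[str]) -> list[str]:
    # Providers outside the tenant's allowed regions never enter routing.
    return [p for p, region in PROVIDER_REGIONS.items() if region in allowed_regions]

print(eligible_providers({"eu-west-1", "eu-central-1"}))
# ['bedrock-eu', 'vertex-eu']; 'bedrock' (us-east-1) is excluded
```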
Given a routing decision that excludes providers due to data residency
When the decision is logged
Then the audit log includes: tenant_id, allowed_regions, excluded_providers, selected_provider, reason
And compliance officers can query all residency-based routing decisions
Your path to provider-portable AI compliance starts with a structured engagement.