AWS Bedrock
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API.
Key Features
- ✓ No Infrastructure Management: Fully managed service with instant availability
- ✓ Multiple Providers: Access models from Anthropic, Amazon, Meta, Mistral AI, and more
- ✓ Pay-Per-Use: Token-based pricing with no minimum fees or idle costs
- ✓ Enterprise Security: Data encryption, VPC support, and compliance certifications
- ✓ Customization: Fine-tuning and continued pre-training capabilities
Available Models
Staque IO supports all AWS Bedrock foundation models. Here are some popular options:
Amazon Nova (Recommended)
- Nova Pro: Advanced multimodal model with text and image understanding
- Nova Lite: Fast, cost-effective model for text tasks
- Nova Micro: Ultra-low latency for simple completions
Anthropic Claude
- Claude 3 Opus: Most capable model for complex tasks
- Claude 3 Sonnet: Balanced performance and speed
- Claude 3 Haiku: Fastest model for simple queries
Amazon Titan
- Titan Text Premier: Advanced text generation
- Titan Text Lite: Cost-effective text generation
- Titan Embeddings: Text embeddings for search and RAG
Other Providers
- Meta Llama 3: Open-source models in various sizes
- Mistral AI: High-performance European models
- Cohere: Specialized models for enterprise use
How It Works in Staque IO
1. Model Selection
2. API Access Configuration (no deployment needed)
3. Instant availability for chat/inference
4. Pay only for tokens used
Deployment Process
Unlike traditional deployments, Bedrock models don't require infrastructure provisioning. When you "deploy" a Bedrock model in Staque IO, the platform:
- Verifies model availability in your AWS region (a minimal version of this check is sketched after this list)
- Creates a configuration entry in the database
- Sets up the API endpoint for invocations
- Makes the model immediately available for chat
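The availability check in the first step can be approximated with the AWS SDK for JavaScript v3. This is a sketch, not Staque IO's actual implementation; the isModelAvailable helper is a name invented here:

```typescript
import { BedrockClient, ListFoundationModelsCommand } from "@aws-sdk/client-bedrock";

// Check whether a model ID is listed in the configured region.
// isModelAvailable is an illustrative name, not Staque IO internal code.
async function isModelAvailable(modelId: string, region: string): Promise<boolean> {
  const client = new BedrockClient({ region });
  const { modelSummaries = [] } = await client.send(new ListFoundationModelsCommand({}));
  return modelSummaries.some((m) => m.modelId === modelId);
}
```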
Inference Profile Support
Staque IO automatically handles cross-region inference profiles for Nova models, allowing you to use us.amazon.nova-pro-v1:0 or eu.amazon.nova-pro-v1:0 depending on your configured region.
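The profile ID is formed by prefixing the base model ID with a geography code. A minimal sketch of that mapping, assuming the standard us/eu/apac prefixes (the helper name and region list are illustrative):

```typescript
// Derive a cross-region inference profile ID from a base model ID.
function inferenceProfileId(baseModelId: string, region: string): string {
  const prefix = region.startsWith("us-") ? "us"
    : region.startsWith("eu-") ? "eu"
    : "apac"; // Asia-Pacific regions such as ap-southeast-1
  return `${prefix}.${baseModelId}`;
}

// inferenceProfileId("amazon.nova-pro-v1:0", "eu-north-1") === "eu.amazon.nova-pro-v1:0"
```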
Pricing
Token-Based Pricing
Bedrock uses a pay-per-token model with separate rates for input and output tokens:
| Model | Input (per 1K tokens) | Output (per 1K tokens) |
|---|---|---|
| Amazon Nova Pro | $0.0008 | $0.0032 |
| Claude 3 Sonnet | $0.003 | $0.015 |
| Claude 3 Haiku | $0.00025 | $0.00125 |
| Titan Text Premier | $0.0005 | $0.0015 |
💰 No Idle Costs: You only pay when you use the model. No charges for keeping models "deployed" or during periods of no usage.
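To estimate spend, multiply token counts by the per-1K rates above. A small sketch (the RATES map and helper name are illustrative, not a Staque IO API; the keys are real Bedrock model IDs):

```typescript
// Per-1K-token rates copied from the pricing table above.
const RATES: Record<string, { input: number; output: number }> = {
  "amazon.nova-pro-v1:0": { input: 0.0008, output: 0.0032 },
  "anthropic.claude-3-haiku-20240307-v1:0": { input: 0.00025, output: 0.00125 },
};

function requestCostUSD(modelId: string, tokensIn: number, tokensOut: number): number {
  const rate = RATES[modelId];
  if (!rate) throw new Error(`No rate configured for ${modelId}`);
  return (tokensIn / 1000) * rate.input + (tokensOut / 1000) * rate.output;
}

// Example: 25 input + 150 output tokens on Nova Pro, as in the chat response below:
// requestCostUSD("amazon.nova-pro-v1:0", 25, 150) → $0.0005
```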
Configuration
Required Environment Variables
# AWS Credentials
STAQUE_AWS_REGION=eu-north-1
STAQUE_AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
STAQUE_AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# Optional: JWT Secret for authentication
JWT_SECRET=your-secret-key-here
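One plausible way these variables feed the AWS SDK, assuming @aws-sdk/client-bedrock-runtime; Staque IO's internal wiring may differ:

```typescript
import { BedrockRuntimeClient } from "@aws-sdk/client-bedrock-runtime";

// Sketch: construct the runtime client from the variables above.
const runtime = new BedrockRuntimeClient({
  region: process.env.STAQUE_AWS_REGION,
  credentials: {
    accessKeyId: process.env.STAQUE_AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.STAQUE_AWS_SECRET_ACCESS_KEY!,
  },
});
```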
IAM Permissions Required
Your AWS IAM user or role needs the following permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:ListFoundationModels",
"bedrock:GetFoundationModel",
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "*"
}
]
}
System Prompts
Staque IO allows you to customize the system prompt for each Bedrock model. The default prompt is specialized for biomarker triage, but you can modify it via the API or UI to suit your use case.
// Update system prompt
POST /api/bedrock/system-prompt
{
"modelId": "amazon.nova-pro-v1:0",
"systemPrompt": "You are a helpful AI assistant..."
}
// Retrieve current prompt
GET /api/bedrock/system-prompt?modelId=amazon.nova-pro-v1:0
Usage Examples
Deploying a Bedrock Model
// Deploy via API
POST /api/deploy/bedrock
{
"modelId": "amazon.nova-pro-v1:0",
"endpointName": "my-nova-assistant",
"dryRun": false
}
// Response
{
"success": true,
"message": "Bedrock model access configured",
"endpoint": "https://bedrock-runtime.eu-north-1.amazonaws.com/...",
"modelId": "amazon.nova-pro-v1:0",
"region": "eu-north-1"
}
Sending a Chat Message
POST /api/chat/thread
{
"message": "Analyze these biomarker values: Glucose: 180 mg/dL",
"conversationId": "conversation-uuid",
"resourceId": "resource-uuid",
"threadId": "thread-uuid" // Optional
}
// Response includes token usage and costs
{
"success": true,
"threadId": "thread-uuid",
"messages": [
{
"role": "user",
"content": "Analyze these biomarker values...",
"timestamp": "2024-01-10T12:00:00Z"
},
{
"role": "assistant",
"content": "Based on the glucose level of 180 mg/dL...",
"tokens_in": 25,
"tokens_out": 150,
"tokens_total": 175,
"latency_ms": 823
}
]
}
Best Practices
Model Selection
- Prototyping: Start with Nova Lite or Claude Haiku for fast, cost-effective development
- Production: Use Nova Pro or Claude Sonnet for balanced performance
- Complex Tasks: Use Claude Opus for tasks requiring deep reasoning
- High Volume: Consider Nova Micro for simple, high-throughput workloads
Cost Optimization
- Use prompt caching for repeated context (when available)
- Implement input length limits to control costs (see the sketch after this list)
- Choose the smallest model that meets your quality requirements
- Monitor token usage via Staque IO's built-in tracking
- Use streaming responses to provide faster user feedback
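A rough input cap, assuming ~4 characters per token as a heuristic for English text (both the ratio and the budget are illustrative assumptions):

```typescript
// Heuristic input cap: roughly 4 characters per token for English text.
const MAX_INPUT_TOKENS = 4000; // illustrative budget

function capInput(prompt: string): string {
  const approxTokens = Math.ceil(prompt.length / 4);
  return approxTokens <= MAX_INPUT_TOKENS
    ? prompt
    : prompt.slice(0, MAX_INPUT_TOKENS * 4); // truncate to the token budget
}
```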
Performance Tips
- Use cross-region inference profiles for better availability
- Implement retry logic for transient failures (a sketch follows this list)
- Cache frequently requested responses client-side
- Use smaller models for latency-sensitive applications
- Consider response streaming for long-form generation
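A minimal retry-with-backoff sketch around the chat endpoint shown earlier; the status-code policy and backoff values are assumptions, not Staque IO defaults:

```typescript
// Retry on throttling (429) and transient server errors (5xx) with
// exponential backoff; all other responses are returned immediately.
async function postWithRetry(url: string, body: unknown, attempts = 3): Promise<Response> {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (res.status !== 429 && res.status < 500) return res;
    if (i < attempts - 1) await new Promise((r) => setTimeout(r, 2 ** i * 500)); // 0.5s, 1s
  }
  throw new Error(`Request to ${url} failed after ${attempts} attempts`);
}

// Usage: await postWithRetry("/api/chat/thread", { message: "...", conversationId: "..." });
```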
Troubleshooting
Model Not Available Error
Problem: "Model not found or not accessible"
Solution:
- Verify the model is available in your AWS region
- Check that you've requested access to the model in the AWS Bedrock console
- Ensure your IAM credentials have the necessary permissions
Access Denied Error
Problem: 403 Forbidden or Access Denied
Solution:
- Verify your AWS credentials are correctly configured
- Check that your IAM policy includes the bedrock:InvokeModel permission
- Ensure model access has been requested and approved in the Bedrock console
High Latency
Problem: Slow response times
Solution:
- Use a smaller model for faster responses (e.g., Nova Lite instead of Nova Pro)
- Enable response streaming to provide incremental results (sketched after this list)
- Consider using cross-region inference profiles
- Reduce input context length when possible
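For streaming directly against Bedrock, the Converse API delivers incremental text deltas. A sketch assuming @aws-sdk/client-bedrock-runtime (whether Staque IO proxies streaming through its own endpoints is not specified here):

```typescript
import { BedrockRuntimeClient, ConverseStreamCommand } from "@aws-sdk/client-bedrock-runtime";

// Stream tokens as they are generated via the Bedrock Converse API.
async function streamCompletion(prompt: string): Promise<void> {
  const client = new BedrockRuntimeClient({ region: "eu-north-1" });
  const response = await client.send(
    new ConverseStreamCommand({
      modelId: "eu.amazon.nova-pro-v1:0", // cross-region inference profile
      messages: [{ role: "user", content: [{ text: prompt }] }],
    })
  );
  for await (const event of response.stream ?? []) {
    const delta = event.contentBlockDelta?.delta?.text;
    if (delta) process.stdout.write(delta); // emit each chunk as it arrives
  }
}
```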