AWS Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API.

Key Features

  • No Infrastructure Management: Fully managed service with instant availability
  • Multiple Providers: Access models from Anthropic, Amazon, Meta, Mistral AI, and more
  • Pay-Per-Use: Token-based pricing with no minimum fees or idle costs
  • Enterprise Security: Data encryption, VPC support, and compliance certifications
  • Customization: Fine-tuning and continued pre-training capabilities

Available Models

Staque IO supports all AWS Bedrock foundation models. Here are some popular options:

Amazon Nova (Recommended)

  • Nova Pro: Advanced multimodal model with text and image understanding
  • Nova Lite: Fast, cost-effective model for text tasks
  • Nova Micro: Ultra-low latency for simple completions

Anthropic Claude

  • Claude 3 Opus: Most capable model for complex tasks
  • Claude 3 Sonnet: Balanced performance and speed
  • Claude 3 Haiku: Fastest model for simple queries

Amazon Titan

  • Titan Text Premier: Advanced text generation
  • Titan Text Lite: Cost-effective text generation
  • Titan Embeddings: Text embeddings for search and RAG

Other Providers

  • Meta Llama 3: Open-source models in various sizes
  • Mistral AI: High-performance European models
  • Cohere: Specialized models for enterprise use

How It Works in Staque IO

1. Model Selection
   ↓
2. API Access Configuration (no deployment needed)
   ↓
3. Instant availability for chat/inference
   ↓
4. Pay only for tokens used

Deployment Process

Unlike traditional deployments, Bedrock models don't require infrastructure provisioning. When you "deploy" a Bedrock model in Staque IO, the platform:

  1. Verifies model availability in your AWS region
  2. Creates a configuration entry in the database
  3. Sets up the API endpoint for invocations
  4. Makes the model immediately available for chat

Inference Profile Support

Staque IO automatically handles cross-region inference profiles for Nova models, allowing you to use us.amazon.nova-pro-v1:0 or eu.amazon.nova-pro-v1:0 depending on your configured region.
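The mapping from region to profile can be illustrated with a small sketch. This is not Staque IO's actual implementation, just a hypothetical helper showing how a region's geography prefix selects the inference profile ID:

```python
# Hypothetical helper: derive a cross-region inference profile ID for a
# Nova model from the configured AWS region. The geography prefixes
# handled here ("us", "eu", "ap") are an assumption for illustration.

def inference_profile_id(region: str, model_id: str) -> str:
    geo = region.split("-", 1)[0]   # "eu-north-1" -> "eu"
    if geo in ("us", "eu", "ap"):
        return f"{geo}.{model_id}"
    return model_id                 # fall back to the plain model ID
```

For example, `inference_profile_id("eu-north-1", "amazon.nova-pro-v1:0")` yields `eu.amazon.nova-pro-v1:0`.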

Pricing

Token-Based Pricing

Bedrock uses a pay-per-token model with separate rates for input and output tokens:

Model                 Input (per 1K tokens)   Output (per 1K tokens)
Amazon Nova Pro       $0.0008                 $0.0032
Claude 3 Sonnet       $0.003                  $0.015
Claude 3 Haiku        $0.00025                $0.00125
Titan Text Premier    $0.0005                 $0.0015

💰 No Idle Costs: You only pay when you use the model. No charges for keeping models "deployed" or during periods of no usage.
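A quick way to reason about costs is a small estimator built from the per-1K-token rates above. The rates and the shortened model keys below are illustrative; always check current AWS pricing and exact Bedrock model IDs:

```python
# Illustrative cost estimator using the per-1K-token rates from the
# table above. Rates and model keys are examples, not live pricing data.

PRICES = {  # model key -> (input $/1K tokens, output $/1K tokens)
    "amazon.nova-pro-v1:0": (0.0008, 0.0032),
    "claude-3-sonnet": (0.003, 0.015),
    "claude-3-haiku": (0.00025, 0.00125),
    "titan-text-premier": (0.0005, 0.0015),
}

def estimate_cost(model_key: str, tokens_in: int, tokens_out: int) -> float:
    """Return the estimated USD cost of a single invocation."""
    rate_in, rate_out = PRICES[model_key]
    return tokens_in / 1000 * rate_in + tokens_out / 1000 * rate_out
```

For the sample chat below (25 input tokens, 150 output tokens on Nova Pro), this works out to 25/1000 × $0.0008 + 150/1000 × $0.0032 = $0.0005.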

Configuration

Required Environment Variables

# AWS Credentials
STAQUE_AWS_REGION=eu-north-1
STAQUE_AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
STAQUE_AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# Optional: JWT Secret for authentication
JWT_SECRET=your-secret-key-here

IAM Permissions Required

Your AWS IAM user or role needs the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:GetFoundationModel",
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}

System Prompts

Staque IO allows you to customize the system prompt for each Bedrock model. The default prompt is specialized for biomarker triage, but you can modify it via the API or UI to suit your use case.

// Update system prompt
POST /api/bedrock/system-prompt
{
  "modelId": "amazon.nova-pro-v1:0",
  "systemPrompt": "You are a helpful AI assistant..."
}

// Retrieve current prompt
GET /api/bedrock/system-prompt?modelId=amazon.nova-pro-v1:0

Usage Examples

Deploying a Bedrock Model

// Deploy via API
POST /api/deploy/bedrock
{
  "modelId": "amazon.nova-pro-v1:0",
  "endpointName": "my-nova-assistant",
  "dryRun": false
}

// Response
{
  "success": true,
  "message": "Bedrock model access configured",
  "endpoint": "https://bedrock-runtime.eu-north-1.amazonaws.com/...",
  "modelId": "amazon.nova-pro-v1:0",
  "region": "eu-north-1"
}
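Before posting the request, it can help to sanity-check the body client-side. The following sketch builds the deploy payload shown above; the model-ID pattern check is an assumption for illustration, not part of the documented API contract:

```python
import re

# Loose shape check for Bedrock model IDs such as "amazon.nova-pro-v1:0"
# or "eu.amazon.nova-pro-v1:0". This pattern is an assumption.
MODEL_ID_RE = re.compile(r"^[a-z]+\.[A-Za-z0-9.-]+:\d+$")

def build_deploy_request(model_id: str, endpoint_name: str,
                         dry_run: bool = False) -> dict:
    """Build the JSON body for POST /api/deploy/bedrock."""
    if not MODEL_ID_RE.match(model_id):
        raise ValueError(f"unexpected Bedrock model ID: {model_id}")
    return {"modelId": model_id, "endpointName": endpoint_name,
            "dryRun": dry_run}
```

Setting `dryRun` to true lets you validate the configuration without making the model available for chat.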

Sending a Chat Message

POST /api/chat/thread
{
  "message": "Analyze these biomarker values: Glucose: 180 mg/dL",
  "conversationId": "conversation-uuid",
  "resourceId": "resource-uuid",
  "threadId": "thread-uuid"  // Optional
}

// Response includes token usage and costs
{
  "success": true,
  "threadId": "thread-uuid",
  "messages": [
    {
      "role": "user",
      "content": "Analyze these biomarker values...",
      "timestamp": "2024-01-10T12:00:00Z"
    },
    {
      "role": "assistant",
      "content": "Based on the glucose level of 180 mg/dL...",
      "tokens_in": 25,
      "tokens_out": 150,
      "tokens_total": 175,
      "latency_ms": 823
    }
  ]
}
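Since each assistant message carries its own token counts, a client can total usage across a thread with a few lines. A minimal sketch, assuming the response shape shown above:

```python
# Sum token usage across all messages in a chat-thread response.
# Field names (tokens_in, tokens_out, tokens_total) follow the example
# payload; user messages without these fields contribute zero.

def usage_summary(response: dict) -> dict:
    totals = {"tokens_in": 0, "tokens_out": 0, "tokens_total": 0}
    for msg in response.get("messages", []):
        for key in totals:
            totals[key] += msg.get(key, 0)
    return totals
```

Applied to the sample response above, this yields 25 input tokens, 150 output tokens, and 175 total.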

Best Practices

Model Selection

  • Prototyping: Start with Nova Lite or Claude Haiku for fast, cost-effective development
  • Production: Use Nova Pro or Claude Sonnet for balanced performance
  • Complex Tasks: Use Claude Opus for tasks requiring deep reasoning
  • High Volume: Consider Nova Micro for simple, high-throughput workloads

Cost Optimization

  • Use prompt caching for repeated context (when available)
  • Implement input length limits to control costs
  • Choose the smallest model that meets your quality requirements
  • Monitor token usage via Staque IO's built-in tracking
  • Use streaming responses to provide faster user feedback
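An input length limit can be as simple as a character cap applied before the prompt is sent. The 4-characters-per-token heuristic below is only a rough approximation; an exact count depends on the model's tokenizer:

```python
# Rough input-length guard to keep per-request input costs bounded.
# chars_per_token = 4 is a common English-text heuristic, not exact.

def truncate_prompt(prompt: str, max_tokens: int,
                    chars_per_token: int = 4) -> str:
    max_chars = max_tokens * chars_per_token
    return prompt if len(prompt) <= max_chars else prompt[:max_chars]
```

For production use, consider counting tokens with the model provider's tokenizer instead of a character heuristic.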

Performance Tips

  • Use cross-region inference profiles for better availability
  • Implement retry logic for transient failures
  • Cache frequently requested responses client-side
  • Use smaller models for latency-sensitive applications
  • Consider response streaming for long-form generation
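The retry advice above can be sketched as a generic retry-with-backoff wrapper. Which exception types count as transient (e.g. throttling) is application-specific, so the default here is deliberately broad:

```python
import random
import time

# Generic retry-with-exponential-backoff sketch for transient failures.
# Callers should narrow retry_on to the transient error types they see
# (e.g. throttling exceptions from the Bedrock runtime client).

def with_retries(call, attempts: int = 3, base_delay: float = 0.5,
                 retry_on=(Exception,)):
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # exponential backoff with jitter: ~0.5s, ~1s, ~2s, ...
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
```

A call site would wrap the model invocation, e.g. `with_retries(lambda: invoke_model(payload))`, where `invoke_model` stands in for your actual client call.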

Troubleshooting

Model Not Available Error

Problem: "Model not found or not accessible"

Solution:

  • Verify the model is available in your AWS region
  • Check that you've requested access to the model in the AWS Bedrock console
  • Ensure your IAM credentials have the necessary permissions

Access Denied Error

Problem: 403 Forbidden or Access Denied

Solution:

  • Verify your AWS credentials are correctly configured
  • Check IAM policy includes bedrock:InvokeModel permission
  • Ensure model access has been requested and approved in Bedrock console

High Latency

Problem: Slow response times

Solution:

  • Use a smaller model for faster responses (e.g., Nova Lite instead of Nova Pro)
  • Enable response streaming to provide incremental results
  • Consider using cross-region inference profiles
  • Reduce input context length when possible