AWS Bedrock
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API.
Key Features
- ✓ No Infrastructure Management: Fully managed service with instant availability
- ✓ Multiple Providers: Access models from Anthropic, Amazon, Meta, Mistral AI, and more
- ✓ Pay-Per-Use: Token-based pricing with no minimum fees or idle costs
- ✓ Enterprise Security: Data encryption, VPC support, and compliance certifications
- ✓ Customization: Fine-tuning and continued pre-training capabilities
Available Models
Staque IO supports all AWS Bedrock foundation models. Here are some popular options:
Amazon Nova (Recommended)
- Nova Pro: Advanced multimodal model with text and image understanding
- Nova Lite: Fast, cost-effective model for text tasks
- Nova Micro: Ultra-low latency for simple completions
Anthropic Claude
- Claude 3 Opus: Most capable model for complex tasks
- Claude 3 Sonnet: Balanced performance and speed
- Claude 3 Haiku: Fastest model for simple queries
Amazon Titan
- Titan Text Premier: Advanced text generation
- Titan Text Lite: Cost-effective text generation
- Titan Embeddings: Text embeddings for search and RAG
Other Providers
- Meta Llama 3: Open-source models in various sizes
- Mistral AI: High-performance European models
- Cohere: Specialized models for enterprise use
How It Works in Staque IO
1. Model Selection
2. API Access Configuration (no deployment needed)
3. Instant availability for chat/inference
4. Pay only for tokens used
Deployment Process
Unlike traditional deployments, Bedrock models don't require infrastructure provisioning. When you "deploy" a Bedrock model in Staque IO, the platform:
- Verifies model availability in your AWS region (a minimal version of this check is sketched after this list)
- Creates a configuration entry in the database
- Sets up the API endpoint for invocations
- Makes the model immediately available for chat
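The availability check in the first step can be approximated with the AWS SDK for JavaScript v3. This is a sketch, not Staque IO's actual implementation; the isModelAvailable helper is a name invented here:

```typescript
import { BedrockClient, ListFoundationModelsCommand } from "@aws-sdk/client-bedrock";

// Check whether a model ID is listed in the configured region.
// isModelAvailable is an illustrative name, not Staque IO internal code.
async function isModelAvailable(modelId: string, region: string): Promise<boolean> {
  const client = new BedrockClient({ region });
  const { modelSummaries = [] } = await client.send(new ListFoundationModelsCommand({}));
  return modelSummaries.some((m) => m.modelId === modelId);
}
```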
Inference Profile Support
Staque IO automatically handles cross-region inference profiles for Nova models, allowing you to use us.amazon.nova-pro-v1:0 or eu.amazon.nova-pro-v1:0 depending on your configured region.
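The profile ID is formed by prefixing the base model ID with a geography code. A minimal sketch of that mapping, assuming the standard us/eu/apac prefixes (the helper name and region list are illustrative):

```typescript
// Derive a cross-region inference profile ID from a base model ID.
function inferenceProfileId(baseModelId: string, region: string): string {
  const prefix = region.startsWith("us-") ? "us"
    : region.startsWith("eu-") ? "eu"
    : "apac"; // Asia-Pacific regions such as ap-southeast-1
  return `${prefix}.${baseModelId}`;
}

// inferenceProfileId("amazon.nova-pro-v1:0", "eu-north-1") === "eu.amazon.nova-pro-v1:0"
```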
Pricing
Token-Based Pricing
Bedrock uses a pay-per-token model with separate rates for input and output tokens:
| Model | Input (per 1K tokens) | Output (per 1K tokens) |
|---|---|---|
| Amazon Nova Pro | $0.0008 | $0.0032 |
| Claude 3 Sonnet | $0.003 | $0.015 |
| Claude 3 Haiku | $0.00025 | $0.00125 |
| Titan Text Premier | $0.0005 | $0.0015 |
💰 No Idle Costs: You only pay when you use the model. No charges for keeping models "deployed" or during periods of no usage.
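To estimate spend, multiply token counts by the per-1K rates above. A small sketch (the RATES map and helper name are illustrative, not a Staque IO API; the keys are real Bedrock model IDs):

```typescript
// Per-1K-token rates copied from the pricing table above.
const RATES: Record<string, { input: number; output: number }> = {
  "amazon.nova-pro-v1:0": { input: 0.0008, output: 0.0032 },
  "anthropic.claude-3-haiku-20240307-v1:0": { input: 0.00025, output: 0.00125 },
};

function requestCostUSD(modelId: string, tokensIn: number, tokensOut: number): number {
  const rate = RATES[modelId];
  if (!rate) throw new Error(`No rate configured for ${modelId}`);
  return (tokensIn / 1000) * rate.input + (tokensOut / 1000) * rate.output;
}

// Example: 25 input + 150 output tokens on Nova Pro, as in the chat response below:
// requestCostUSD("amazon.nova-pro-v1:0", 25, 150) → $0.0005
```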
Configuration
Required Environment Variables
# AWS Credentials
STAQUE_AWS_REGION=eu-north-1
STAQUE_AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
STAQUE_AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# Optional: JWT Secret for authentication
JWT_SECRET=your-secret-key-here
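One plausible way these variables feed the AWS SDK, assuming @aws-sdk/client-bedrock-runtime; Staque IO's internal wiring may differ:

```typescript
import { BedrockRuntimeClient } from "@aws-sdk/client-bedrock-runtime";

// Sketch: construct the runtime client from the variables above.
const runtime = new BedrockRuntimeClient({
  region: process.env.STAQUE_AWS_REGION,
  credentials: {
    accessKeyId: process.env.STAQUE_AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.STAQUE_AWS_SECRET_ACCESS_KEY!,
  },
});
```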
IAM Permissions Required
Your AWS IAM user or role needs the following permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:ListFoundationModels",
"bedrock:GetFoundationModel",
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "*"
}
]
}
System Prompts
Staque IO allows you to customize the system prompt for each Bedrock model. The default prompt is specialized for biomarker triage, but you can modify it via the API or UI to suit your use case.
// Update system prompt
POST /api/bedrock/system-prompt
{
"modelId": "amazon.nova-pro-v1:0",
"systemPrompt": "You are a helpful AI assistant..."
}
// Retrieve current prompt
GET /api/bedrock/system-prompt?modelId=amazon.nova-pro-v1:0
Usage Examples
Deploying a Bedrock Model
// Deploy via API
POST /api/deploy/bedrock
{
"modelId": "amazon.nova-pro-v1:0",
"endpointName": "my-nova-assistant",
"dryRun": false
}
// Response
{
"success": true,
"message": "Bedrock model access configured",
"endpoint": "https://bedrock-runtime.eu-north-1.amazonaws.com/...",
"modelId": "amazon.nova-pro-v1:0",
"region": "eu-north-1"
}
Sending a Chat Message
POST /api/chat/thread
{
"message": "Analyze these biomarker values: Glucose: 180 mg/dL",
"conversationId": "conversation-uuid",
"resourceId": "resource-uuid",
"threadId": "thread-uuid" // Optional
}
// Response includes token usage and costs
{
"success": true,
"threadId": "thread-uuid",
"messages": [
{
"role": "user",
"content": "Analyze these biomarker values...",
"timestamp": "2024-01-10T12:00:00Z"
},
{
"role": "assistant",
"content": "Based on the glucose level of 180 mg/dL...",
"tokens_in": 25,
"tokens_out": 150,
"tokens_total": 175,
"latency_ms": 823
}
]
}
Best Practices
Model Selection
- Prototyping: Start with Nova Lite or Claude Haiku for fast, cost-effective development
- Production: Use Nova Pro or Claude Sonnet for balanced performance
- Complex Tasks: Use Claude Opus for tasks requiring deep reasoning
- High Volume: Consider Nova Micro for simple, high-throughput workloads
Cost Optimization
- Use prompt caching for repeated context (when available)
- Implement input length limits to control costs (see the sketch after this list)
- Choose the smallest model that meets your quality requirements
- Monitor token usage via Staque IO's built-in tracking
- Use streaming responses to provide faster user feedback
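A rough input cap, assuming ~4 characters per token as a heuristic for English text (both the ratio and the budget are illustrative assumptions):

```typescript
// Heuristic input cap: roughly 4 characters per token for English text.
const MAX_INPUT_TOKENS = 4000; // illustrative budget

function capInput(prompt: string): string {
  const approxTokens = Math.ceil(prompt.length / 4);
  return approxTokens <= MAX_INPUT_TOKENS
    ? prompt
    : prompt.slice(0, MAX_INPUT_TOKENS * 4); // truncate to the token budget
}
```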
Performance Tips
- Use cross-region inference profiles for better availability
- Implement retry logic for transient failures (a sketch follows this list)
- Cache frequently requested responses client-side
- Use smaller models for latency-sensitive applications
- Consider response streaming for long-form generation
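A minimal retry-with-backoff sketch around the chat endpoint shown earlier; the status-code policy and backoff values are assumptions, not Staque IO defaults:

```typescript
// Retry on throttling (429) and transient server errors (5xx) with
// exponential backoff; all other responses are returned immediately.
async function postWithRetry(url: string, body: unknown, attempts = 3): Promise<Response> {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (res.status !== 429 && res.status < 500) return res;
    if (i < attempts - 1) await new Promise((r) => setTimeout(r, 2 ** i * 500)); // 0.5s, 1s
  }
  throw new Error(`Request to ${url} failed after ${attempts} attempts`);
}

// Usage: await postWithRetry("/api/chat/thread", { message: "...", conversationId: "..." });
```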
Troubleshooting
Model Not Available Error
Problem: "Model not found or not accessible"
Solution:
- Verify the model is available in your AWS region
- Check that you've requested access to the model in the AWS Bedrock console
- Ensure your IAM credentials have the necessary permissions
Access Denied Error
Problem: 403 Forbidden or Access Denied
Solution:
- Verify your AWS credentials are correctly configured
- Check that your IAM policy includes the bedrock:InvokeModel permission
- Ensure model access has been requested and approved in the Bedrock console
High Latency
Problem: Slow response times
Solution:
- Use a smaller model for faster responses (e.g., Nova Lite instead of Nova Pro)
- Enable response streaming to provide incremental results (sketched after this list)
- Consider using cross-region inference profiles
- Reduce input context length when possible
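For streaming directly against Bedrock, the Converse API delivers incremental text deltas. A sketch assuming @aws-sdk/client-bedrock-runtime (whether Staque IO proxies streaming through its own endpoints is not specified here):

```typescript
import { BedrockRuntimeClient, ConverseStreamCommand } from "@aws-sdk/client-bedrock-runtime";

// Stream tokens as they are generated via the Bedrock Converse API.
async function streamCompletion(prompt: string): Promise<void> {
  const client = new BedrockRuntimeClient({ region: "eu-north-1" });
  const response = await client.send(
    new ConverseStreamCommand({
      modelId: "eu.amazon.nova-pro-v1:0", // cross-region inference profile
      messages: [{ role: "user", content: [{ text: prompt }] }],
    })
  );
  for await (const event of response.stream ?? []) {
    const delta = event.contentBlockDelta?.delta?.text;
    if (delta) process.stdout.write(delta); // emit each chunk as it arrives
  }
}
```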