Pricing & Billing
Understanding costs, pricing models, and cost optimization strategies for AI model deployments.
Pricing Models
Token-Based Pricing
Pay only for what you use based on input and output tokens
Instance-Based Pricing
Pay for compute instances running your models
AWS Bedrock Pricing
Bedrock uses token-based pricing with different rates for input and output tokens:
Example Pricing (Amazon Nova Pro)
| Token Type | Price per 1K tokens |
|---|---|
| Input Tokens | $0.0008 |
| Output Tokens | $0.0032 |
Cost Calculation Example
Conversation Details: - Input: 150 tokens - Output: 280 tokens - Model: Amazon Nova Pro Cost Calculation: Input cost = (150 / 1000) × $0.0008 = $0.00012 Output cost = (280 / 1000) × $0.0032 = $0.000896 Total cost = $0.001016 (~$0.001) For 1,000 similar conversations: Total cost = $1.016 (~$1.00)
AWS SageMaker Pricing
SageMaker charges based on instance type and usage time:
Common Instance Types
| Instance Type | vCPUs | Memory | Hourly Rate (approx) |
|---|---|---|---|
ml.t3.medium | 2 | 4 GB | $0.065 |
ml.m5.xlarge | 4 | 16 GB | $0.269 |
ml.g4dn.xlarge | 4 | 16 GB + GPU | $0.736 |
ml.p3.2xlarge | 8 | 61 GB + V100 GPU | $3.825 |
Monthly Cost Example
Instance: ml.m5.xlarge Hourly rate: $0.269 Daily cost = $0.269 × 24 hours = $6.46 Monthly cost = $0.269 × 24 × 30 = $193.68 Annual cost = $0.269 × 24 × 365 = $2,356.44 ⚠️ SageMaker instances run continuously until stopped ⚠️ Remember to delete unused endpoints!
NVIDIA NIM Pricing
NVIDIA NIM uses token-based pricing similar to Bedrock. Pricing varies by model. Check the NVIDIA NIM Catalog for current rates.
Real-Time Pricing API
Staque IO provides real-time pricing data through the Pricing API:
Get Instance Pricing
GET /api/pricing?instanceType=ml.m5.xlarge&service=sagemaker
Response:
{
"success": true,
"service": "sagemaker",
"instanceType": "ml.m5.xlarge",
"region": "eu-north-1",
"pricing": {
"hourly": 0.269,
"monthly": 193.68,
"annual": 2356.44
}
}Update Pricing Cache
POST /api/pricing/update
{ "force": true }
Response:
{
"success": true,
"message": "Pricing updated successfully",
"hasChanges": true,
"lastUpdated": "2024-01-10T12:00:00Z"
}Usage Tracking
Track token usage and costs in real-time:
GET /api/usage/current?conversationId=conv-uuid
Response:
{
"success": true,
"mtd": {
"cost_usd": 12.45,
"tokens_in": 125000,
"tokens_out": 187500,
"tokens_total": 312500,
"requests": 850
},
"last24h": {
"cost_usd": 2.34,
"tokens_in": 15000,
"tokens_out": 22500,
"tokens_total": 37500,
"requests": 102
}
}Cost Optimization Strategies
1. Choose the Right Platform
- Low/Variable Traffic: Use Bedrock (no idle cost)
- High Constant Traffic: SageMaker may be more cost-effective
- Specific Models: Check if available on NVIDIA NIM
2. Optimize Token Usage
- Concise Prompts: Remove unnecessary context
- System Prompts: Use efficient, focused system prompts
- Truncation: Limit conversation history length
- Caching: Cache responses for repeated queries
3. Right-Size Instances
- Start with smaller instances and scale up if needed
- Use CloudWatch metrics to monitor utilization
- Consider auto-scaling for variable workloads
- Delete unused endpoints immediately
4. Use Cheaper Models
- Use smaller models for simple tasks
- Route requests to appropriate model based on complexity
- Test if cheaper models meet your quality requirements
Cost Monitoring
Set Up AWS Budgets
1. Go to AWS Budgets in AWS Console 2. Create a new budget 3. Set monthly budget threshold (e.g., $100) 4. Configure alerts: - 50% of budget - 80% of budget - 100% of budget 5. Add email/SNS notifications
CloudWatch Alarms
# Example: Alert on high Bedrock costs aws cloudwatch put-metric-alarm \ --alarm-name high-bedrock-costs \ --alarm-description "Alert when daily Bedrock costs exceed $50" \ --metric-name EstimatedCharges \ --namespace AWS/Billing \ --statistic Maximum \ --period 86400 \ --threshold 50 \ --comparison-operator GreaterThanThreshold
Billing Best Practices
- Daily Reviews: Check usage dashboards daily
- Set Budgets: Configure AWS Budget alerts
- Tag Resources: Use tags for cost allocation
- Delete Unused: Remove idle SageMaker endpoints
- Monitor Trends: Track cost trends over time
- Optimize Regularly: Review and optimize monthly
Cost Comparison Calculator
Example Scenario: 10,000 requests/day
• Cost: (200×0.0008 + 300×0.0032) / 1000 × 10,000
• Daily: $11.20
• Monthly: $336
• Daily: $6.46
• Monthly: $194
💰 Cost Tip
The break-even point between token-based and instance-based pricing depends on your usage patterns. Use the Staque IO AI Recommendations feature to get personalized cost analysis!