Pricing & Billing

Understanding costs, pricing models, and cost optimization strategies for AI model deployments.

Pricing Models

Token-Based Pricing

Pay only for what you use based on input and output tokens

Platforms:Bedrock, NVIDIA NIM
Idle Cost:$0
Billing Unit:Per 1K tokens

Instance-Based Pricing

Pay for compute instances running your models

Platforms:SageMaker
Idle Cost:Hourly rate
Billing Unit:Per hour

AWS Bedrock Pricing

Bedrock uses token-based pricing with different rates for input and output tokens:

Example Pricing (Amazon Nova Pro)

Token TypePrice per 1K tokens
Input Tokens$0.0008
Output Tokens$0.0032

Cost Calculation Example

Conversation Details:
- Input: 150 tokens
- Output: 280 tokens
- Model: Amazon Nova Pro

Cost Calculation:
Input cost  = (150 / 1000) × $0.0008 = $0.00012
Output cost = (280 / 1000) × $0.0032 = $0.000896
Total cost  = $0.001016 (~$0.001)

For 1,000 similar conversations:
Total cost = $1.016 (~$1.00)

AWS SageMaker Pricing

SageMaker charges based on instance type and usage time:

Common Instance Types

Instance TypevCPUsMemoryHourly Rate (approx)
ml.t3.medium24 GB$0.065
ml.m5.xlarge416 GB$0.269
ml.g4dn.xlarge416 GB + GPU$0.736
ml.p3.2xlarge861 GB + V100 GPU$3.825

Monthly Cost Example

Instance: ml.m5.xlarge
Hourly rate: $0.269

Daily cost  = $0.269 × 24 hours = $6.46
Monthly cost = $0.269 × 24 × 30 = $193.68
Annual cost = $0.269 × 24 × 365 = $2,356.44

⚠️  SageMaker instances run continuously until stopped
⚠️  Remember to delete unused endpoints!

NVIDIA NIM Pricing

NVIDIA NIM uses token-based pricing similar to Bedrock. Pricing varies by model. Check the NVIDIA NIM Catalog for current rates.

Real-Time Pricing API

Staque IO provides real-time pricing data through the Pricing API:

Get Instance Pricing

GET /api/pricing?instanceType=ml.m5.xlarge&service=sagemaker

Response:
{
  "success": true,
  "service": "sagemaker",
  "instanceType": "ml.m5.xlarge",
  "region": "eu-north-1",
  "pricing": {
    "hourly": 0.269,
    "monthly": 193.68,
    "annual": 2356.44
  }
}

Update Pricing Cache

POST /api/pricing/update
{ "force": true }

Response:
{
  "success": true,
  "message": "Pricing updated successfully",
  "hasChanges": true,
  "lastUpdated": "2024-01-10T12:00:00Z"
}

Usage Tracking

Track token usage and costs in real-time:

GET /api/usage/current?conversationId=conv-uuid

Response:
{
  "success": true,
  "mtd": {
    "cost_usd": 12.45,
    "tokens_in": 125000,
    "tokens_out": 187500,
    "tokens_total": 312500,
    "requests": 850
  },
  "last24h": {
    "cost_usd": 2.34,
    "tokens_in": 15000,
    "tokens_out": 22500,
    "tokens_total": 37500,
    "requests": 102
  }
}

Cost Optimization Strategies

1. Choose the Right Platform

  • Low/Variable Traffic: Use Bedrock (no idle cost)
  • High Constant Traffic: SageMaker may be more cost-effective
  • Specific Models: Check if available on NVIDIA NIM

2. Optimize Token Usage

  • Concise Prompts: Remove unnecessary context
  • System Prompts: Use efficient, focused system prompts
  • Truncation: Limit conversation history length
  • Caching: Cache responses for repeated queries

3. Right-Size Instances

  • Start with smaller instances and scale up if needed
  • Use CloudWatch metrics to monitor utilization
  • Consider auto-scaling for variable workloads
  • Delete unused endpoints immediately

4. Use Cheaper Models

  • Use smaller models for simple tasks
  • Route requests to appropriate model based on complexity
  • Test if cheaper models meet your quality requirements

Cost Monitoring

Set Up AWS Budgets

1. Go to AWS Budgets in AWS Console
2. Create a new budget
3. Set monthly budget threshold (e.g., $100)
4. Configure alerts:
   - 50% of budget
   - 80% of budget
   - 100% of budget
5. Add email/SNS notifications

CloudWatch Alarms

# Example: Alert on high Bedrock costs
aws cloudwatch put-metric-alarm \
  --alarm-name high-bedrock-costs \
  --alarm-description "Alert when daily Bedrock costs exceed $50" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --statistic Maximum \
  --period 86400 \
  --threshold 50 \
  --comparison-operator GreaterThanThreshold

Billing Best Practices

  • Daily Reviews: Check usage dashboards daily
  • Set Budgets: Configure AWS Budget alerts
  • Tag Resources: Use tags for cost allocation
  • Delete Unused: Remove idle SageMaker endpoints
  • Monitor Trends: Track cost trends over time
  • Optimize Regularly: Review and optimize monthly

Cost Comparison Calculator

Example Scenario: 10,000 requests/day

Bedrock (Nova Pro)
• Avg 200 input + 300 output tokens per request
• Cost: (200×0.0008 + 300×0.0032) / 1000 × 10,000
Daily: $11.20
Monthly: $336
SageMaker (ml.m5.xlarge)
• Hourly: $0.269
Daily: $6.46
Monthly: $194
💡 For this workload, SageMaker is more cost-effective!

💰 Cost Tip

The break-even point between token-based and instance-based pricing depends on your usage patterns. Use the Staque IO AI Recommendations feature to get personalized cost analysis!