Pricing & Billing

Understanding costs, pricing models, and cost optimization strategies for AI model deployments.

Pricing Models

Token-Based Pricing

Pay only for what you use based on input and output tokens

Platforms:Bedrock, NVIDIA NIM

Idle Cost:$0

Billing Unit:Per 1K tokens

Instance-Based Pricing

Pay for compute instances running your models

Platforms:SageMaker

Idle Cost:Hourly rate

Billing Unit:Per hour

AWS Bedrock Pricing

Bedrock uses token-based pricing with different rates for input and output tokens:

Example Pricing (Amazon Nova Pro)

Token Type	Price per 1K tokens
Input Tokens	$0.0008
Output Tokens	$0.0032

Cost Calculation Example

Conversation Details:
- Input: 150 tokens
- Output: 280 tokens
- Model: Amazon Nova Pro

Cost Calculation:
Input cost  = (150 / 1000) × $0.0008 = $0.00012
Output cost = (280 / 1000) × $0.0032 = $0.000896
Total cost  = $0.001016 (~$0.001)

For 1,000 similar conversations:
Total cost = $1.016 (~$1.00)

AWS SageMaker Pricing

SageMaker charges based on instance type and usage time:

Common Instance Types

Instance Type	vCPUs	Memory	Hourly Rate (approx)
`ml.t3.medium`	2	4 GB	$0.065
`ml.m5.xlarge`	4	16 GB	$0.269
`ml.g4dn.xlarge`	4	16 GB + GPU	$0.736
`ml.p3.2xlarge`	8	61 GB + V100 GPU	$3.825

Monthly Cost Example

Instance: ml.m5.xlarge
Hourly rate: $0.269

Daily cost  = $0.269 × 24 hours = $6.46
Monthly cost = $0.269 × 24 × 30 = $193.68
Annual cost = $0.269 × 24 × 365 = $2,356.44

⚠️  SageMaker instances run continuously until stopped
⚠️  Remember to delete unused endpoints!

NVIDIA NIM Pricing

NVIDIA NIM uses token-based pricing similar to Bedrock. Pricing varies by model. Check the NVIDIA NIM Catalog for current rates.

Real-Time Pricing API

Staque IO provides real-time pricing data through the Pricing API:

Get Instance Pricing

GET /api/pricing?instanceType=ml.m5.xlarge&service=sagemaker

Response:
{
  "success": true,
  "service": "sagemaker",
  "instanceType": "ml.m5.xlarge",
  "region": "eu-north-1",
  "pricing": {
    "hourly": 0.269,
    "monthly": 193.68,
    "annual": 2356.44
  }
}

Update Pricing Cache

POST /api/pricing/update
{ "force": true }

Response:
{
  "success": true,
  "message": "Pricing updated successfully",
  "hasChanges": true,
  "lastUpdated": "2024-01-10T12:00:00Z"
}

Usage Tracking

Track token usage and costs in real-time:

GET /api/usage/current?conversationId=conv-uuid

Response:
{
  "success": true,
  "mtd": {
    "cost_usd": 12.45,
    "tokens_in": 125000,
    "tokens_out": 187500,
    "tokens_total": 312500,
    "requests": 850
  },
  "last24h": {
    "cost_usd": 2.34,
    "tokens_in": 15000,
    "tokens_out": 22500,
    "tokens_total": 37500,
    "requests": 102
  }
}

Cost Optimization Strategies

1. Choose the Right Platform

Low/Variable Traffic: Use Bedrock (no idle cost)
High Constant Traffic: SageMaker may be more cost-effective
Specific Models: Check if available on NVIDIA NIM

2. Optimize Token Usage

Concise Prompts: Remove unnecessary context
System Prompts: Use efficient, focused system prompts
Truncation: Limit conversation history length
Caching: Cache responses for repeated queries

3. Right-Size Instances

Start with smaller instances and scale up if needed
Use CloudWatch metrics to monitor utilization
Consider auto-scaling for variable workloads
Delete unused endpoints immediately

4. Use Cheaper Models

Use smaller models for simple tasks
Route requests to appropriate model based on complexity
Test if cheaper models meet your quality requirements

Cost Monitoring

Set Up AWS Budgets

1. Go to AWS Budgets in AWS Console
2. Create a new budget
3. Set monthly budget threshold (e.g., $100)
4. Configure alerts:
   - 50% of budget
   - 80% of budget
   - 100% of budget
5. Add email/SNS notifications

CloudWatch Alarms

# Example: Alert on high Bedrock costs
aws cloudwatch put-metric-alarm \
  --alarm-name high-bedrock-costs \
  --alarm-description "Alert when daily Bedrock costs exceed $50" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --statistic Maximum \
  --period 86400 \
  --threshold 50 \
  --comparison-operator GreaterThanThreshold

Billing Best Practices

Daily Reviews: Check usage dashboards daily
Set Budgets: Configure AWS Budget alerts
Tag Resources: Use tags for cost allocation
Delete Unused: Remove idle SageMaker endpoints
Monitor Trends: Track cost trends over time
Optimize Regularly: Review and optimize monthly

Cost Comparison Calculator

Example Scenario: 10,000 requests/day

Bedrock (Nova Pro)

• Avg 200 input + 300 output tokens per request
• Cost: (200×0.0008 + 300×0.0032) / 1000 × 10,000
• Daily: $11.20
• Monthly: $336

SageMaker (ml.m5.xlarge)

• Hourly: $0.269
• Daily: $6.46
• Monthly: $194

💡 For this workload, SageMaker is more cost-effective!

💰 Cost Tip

The break-even point between token-based and instance-based pricing depends on your usage patterns. Use the Staque IO AI Recommendations feature to get personalized cost analysis!