Deployment Guide
Staque IO supports deploying AI models across multiple platforms. Each platform has its own characteristics, pricing model, and use cases.
Deployment Platforms
AWS Bedrock
Managed foundation models with pay-per-use pricing
✓ Token-based pricing
✓ Instant availability
AWS SageMaker
Custom model hosting with full control
✓ Instance-based pricing
✓ VPC isolation
NVIDIA NIM
Hosted NVIDIA models via API
✓ No AWS required
✓ Global availability
Deployment Workflow
1. Browse Available Models
   └─ GET /api/models/{platform}
2. Get AI Recommendation (Optional)
   └─ POST /api/ai/recommendations
3. Create Conversation & Deploy Resource
   └─ POST /api/conversations
      ├─ Creates conversation
      ├─ Deploys resource
      └─ Creates model configuration
4. Configure System Prompt (Optional)
   └─ POST /api/bedrock/system-prompt
5. Start Chatting
   └─ POST /api/chat/thread
Platform Comparison
| Feature | AWS Bedrock | AWS SageMaker | NVIDIA NIM |
|---|---|---|---|
| Setup Time | Instant | 5-15 minutes | Instant |
| Pricing Model | Per token | Per hour | Per token |
| Idle Cost | $0 | Instance cost | $0 |
| Customization | System prompt only | Full model control | System prompt only |
| Scalability | Auto-scaled | Manual scaling | Auto-scaled |
| VPC Support | No | Yes | No |
| Best For | Quick start, foundation models | Custom models, VPC requirements | Multi-cloud, specific NVIDIA models |
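The "Best For", "VPC Support", and "Customization" rows can be read as a simple decision rule. The Python sketch below encodes that reading; the function name and its parameters are hypothetical, and the returned strings reuse the `resource_type` values from the deployment configuration. Treat it as an illustration of the table, not as logic Staque IO applies itself.

```python
def suggest_platform(needs_vpc: bool = False,
                     custom_model: bool = False,
                     aws_available: bool = True) -> str:
    """Suggest a platform from the comparison table (illustrative only)."""
    # VPC support and full model control are listed only for SageMaker.
    if needs_vpc or custom_model:
        if not aws_available:
            raise ValueError("VPC isolation and custom models require AWS SageMaker")
        return "sagemaker"
    # NVIDIA NIM is the option listed as not requiring AWS.
    if not aws_available:
        return "nvidia-nim"
    # Otherwise Bedrock is listed as the quick-start choice.
    return "bedrock"
```

For example, `suggest_platform(needs_vpc=True)` suggests SageMaker, while the defaults fall through to Bedrock.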
General Deployment Steps
Step 1: Choose Your Model
Browse available models through the UI or API. Consider factors like:
- Use Case: Text generation, code, embeddings, etc.
- Performance: Response time, throughput requirements
- Cost: Token pricing vs instance pricing
- Compliance: Data residency, encryption requirements
Step 2: Configure Deployment
Provide the deployment configuration. Set "resource_type" to "bedrock", "sagemaker", or "nvidia-nim":
{
  "title": "Production AI Assistant",
  "use_case": "customer-support",
  "deployed_resource": {
    "resource_name": "Support Bot",
    "resource_type": "bedrock",
    "aws_resource_id": "amazon.nova-pro-v1:0",
    "region": "eu-north-1",
    "estimated_hourly_cost": 0
  }
}
Step 3: Set System Prompt
Configure the model's behavior with a custom system prompt:
POST /api/bedrock/system-prompt
{
  "modelId": "amazon.nova-pro-v1:0",
  "systemPrompt": "You are a helpful customer support assistant..."
}
Step 4: Test and Monitor
After deployment:
- Test the model with sample queries
- Monitor response times and error rates
- Track token usage and costs
- Adjust configuration as needed
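Steps 2 and 3 above can be sketched as two small request builders in Python. This is a minimal sketch under stated assumptions: `BASE_URL` is a placeholder host, the helper names are hypothetical, and authentication is left out because this guide does not describe it. The paths and JSON field names are the ones shown above.

```python
import json
import urllib.request
from typing import Optional

# Assumption: placeholder host; use your own Staque IO base URL.
BASE_URL = "http://localhost:3000"
RESOURCE_TYPES = {"bedrock", "sagemaker", "nvidia-nim"}


def build_request(method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for a Staque IO endpoint."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def deployment_request(title, use_case, resource_name, resource_type,
                       resource_id, region, estimated_hourly_cost=0):
    """Step 2: request for POST /api/conversations."""
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource_type: {resource_type!r}")
    payload = {
        "title": title,
        "use_case": use_case,
        "deployed_resource": {
            "resource_name": resource_name,
            "resource_type": resource_type,
            "aws_resource_id": resource_id,
            "region": region,
            "estimated_hourly_cost": estimated_hourly_cost,
        },
    }
    return build_request("POST", "/api/conversations", payload)


def system_prompt_request(model_id, system_prompt):
    """Step 3: request for POST /api/bedrock/system-prompt."""
    return build_request("POST", "/api/bedrock/system-prompt",
                         {"modelId": model_id, "systemPrompt": system_prompt})
```

Each builder returns a `urllib.request.Request` that can be inspected before it is passed to `urllib.request.urlopen`. Building and sending are kept separate so the payload can be checked without touching a live deployment.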
Environment Variables
Required environment variables for deployment:
# AWS Configuration (Required for Bedrock & SageMaker)
STAQUE_AWS_REGION=eu-north-1
STAQUE_AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
STAQUE_AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# SageMaker Specific (Required for SageMaker deployments)
SAGEMAKER_SUBNET_IDS=subnet-12345,subnet-67890
SAGEMAKER_SECURITY_GROUP_IDS=sg-12345678
SAGEMAKER_EXECUTION_ROLE_ARN=arn:aws:iam::123456789012:role/SageMakerRole

# NVIDIA NIM (Required for NVIDIA NIM deployments)
NVIDIA_API_KEY=nvapi-xxxxxxxxxxxxx
NIM_BASE_URL=https://integrate.api.nvidia.com
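A missing variable is easier to diagnose before a deployment than during one. The sketch below checks the variables listed above for a given platform. The grouping follows the comments in the listing; the `missing_env` helper and the `REQUIRED_ENV` mapping are hypothetical names, not part of Staque IO.

```python
import os
from typing import List, Mapping, Optional

AWS_VARS = [
    "STAQUE_AWS_REGION",
    "STAQUE_AWS_ACCESS_KEY_ID",
    "STAQUE_AWS_SECRET_ACCESS_KEY",
]

# Variables required per platform, as grouped in the listing above.
REQUIRED_ENV = {
    "bedrock": AWS_VARS,
    "sagemaker": AWS_VARS + [
        "SAGEMAKER_SUBNET_IDS",
        "SAGEMAKER_SECURITY_GROUP_IDS",
        "SAGEMAKER_EXECUTION_ROLE_ARN",
    ],
    "nvidia-nim": ["NVIDIA_API_KEY", "NIM_BASE_URL"],
}


def missing_env(platform: str,
                env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty for a platform."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[platform] if not env.get(name)]
```

Calling `missing_env("sagemaker")` at startup and failing fast on a non-empty result surfaces configuration gaps early.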
Cost Optimization
Bedrock & NIM (Token-Based)
- No idle costs - only pay for what you use
- Optimize prompts to reduce token usage
- Use cheaper models for simple tasks
- Implement caching for repeated queries
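The caching suggestion can be sketched as a small in-memory wrapper around whatever function actually calls the model. Everything here is illustrative: `PromptCache` and the `invoke` callable are hypothetical names, the cache has no eviction or expiry, and it only makes sense when the same prompt may safely return the same answer.

```python
import hashlib
import json
from typing import Callable, Dict


class PromptCache:
    """Cache model answers keyed by (model ID, prompt)."""

    def __init__(self, invoke: Callable[[str, str], str]):
        self._invoke = invoke            # function that really calls the model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model_id: str, prompt: str) -> str:
        # Hash a JSON encoding so different (model, prompt) pairs cannot collide
        # through simple string concatenation.
        raw = json.dumps([model_id, prompt]).encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def ask(self, model_id: str, prompt: str) -> str:
        key = self._key(model_id, prompt)
        if key in self._store:
            self.hits += 1               # no tokens spent on a repeat query
            return self._store[key]
        self.misses += 1
        answer = self._invoke(model_id, prompt)
        self._store[key] = answer
        return answer
```

The `hits` and `misses` counters give a rough view of how many billed calls the cache avoids.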
SageMaker (Instance-Based)
- Right-size instances based on actual load
- Use auto-scaling to match demand
- Delete endpoints when not in use
- Consider Spot instances for dev/test
- Use smaller instances for low-traffic models
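Choosing between token-based and instance-based pricing comes down to a break-even calculation. The sketch below shows the arithmetic with hypothetical function names. It assumes a single blended price per 1,000 tokens, which is a simplification, and every price passed in should come from your provider's current price list rather than from this guide.

```python
def token_cost_per_hour(tokens_per_hour: float,
                        price_per_1k_tokens: float) -> float:
    """Hourly cost on a token-priced platform at a given throughput."""
    return tokens_per_hour / 1000.0 * price_per_1k_tokens


def break_even_tokens_per_hour(instance_hourly_cost: float,
                               price_per_1k_tokens: float) -> float:
    """Throughput at which token pricing costs the same as one instance."""
    if price_per_1k_tokens <= 0:
        raise ValueError("price_per_1k_tokens must be positive")
    return instance_hourly_cost / price_per_1k_tokens * 1000.0
```

Below the break-even throughput, token-based pricing is cheaper; above it, a dedicated instance may cost less, provided the instance can actually sustain that load.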
Security Best Practices
- AWS Credentials: Use IAM roles with minimal required permissions
- API Keys: Never commit keys to version control
- VPC: Use VPC isolation for SageMaker endpoints handling sensitive data
- Encryption: Enable encryption at rest and in transit
- Access Control: Implement role-based access in your application
- Monitoring: Set up CloudWatch alarms for unusual activity
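As one way to apply the minimal-permissions advice, the sketch below builds an IAM policy document that allows a single invoke action on named resources only. The `invoke_only_policy` helper is hypothetical, and the ARN in the usage note is illustrative; confirm the exact actions and resource ARNs your deployment needs against the AWS documentation.

```python
import json
from typing import List

ALLOWED_ACTIONS = {"bedrock:InvokeModel", "sagemaker:InvokeEndpoint"}


def invoke_only_policy(action: str, resource_arns: List[str]) -> dict:
    """Build an IAM policy allowing one invoke action on specific resources."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unexpected action: {action!r}")
    if not resource_arns:
        # Refuse to fall back to "*": the point is to scope access narrowly.
        raise ValueError("at least one resource ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [action],
                "Resource": list(resource_arns),
            }
        ],
    }
```

For example, `json.dumps(invoke_only_policy("bedrock:InvokeModel", ["arn:aws:bedrock:eu-north-1::foundation-model/amazon.nova-pro-v1:0"]), indent=2)` produces a document that can be attached to a role.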
Troubleshooting
Common Issues
Error: Model not found
Solution: Verify that the model ID is correct and that the model is available in your region. Some models are region-specific.
Error: Insufficient permissions
Solution: Check the IAM permissions attached to your AWS credentials. Ensure that bedrock:InvokeModel (for Bedrock) or sagemaker:InvokeEndpoint (for SageMaker) is granted.
SageMaker endpoint stuck in "Creating"
Solution: Check VPC configuration (subnets, security groups). Ensure execution role has required permissions. Typical creation time is 5-10 minutes.
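To tell a slow creation from a stuck one, it helps to poll with a deadline rather than wait indefinitely. The sketch below is a generic polling loop with hypothetical names; `get_status` is any callable that returns the endpoint's current status string. With boto3 it could wrap the SageMaker client's `describe_endpoint` call and read `EndpointStatus`, but that wiring is an assumption and is not shown. The 900-second default mirrors the upper end of the setup time in the comparison table.

```python
import time
from typing import Callable


def wait_while_creating(get_status: Callable[[], str],
                        timeout_s: float = 900.0,
                        poll_s: float = 30.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the status leaves "Creating" or the deadline passes.

    Returns the first status that is not "Creating" (for example "InService"
    or "Failed"). Raises TimeoutError if the endpoint is still creating
    once timeout_s seconds have elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != "Creating":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"endpoint still Creating after {timeout_s} seconds")
        sleep(poll_s)
```

A `TimeoutError` here is the cue to review the subnets, security groups, and execution role rather than keep waiting.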