Deployment Guide

Staque IO supports deploying AI models across multiple platforms. Each platform has its own characteristics, pricing model, and use cases.

Deployment Platforms

AWS Bedrock

Managed foundation models with pay-per-use pricing

✓ No infrastructure management
✓ Token-based pricing
✓ Instant availability

AWS SageMaker

Custom model hosting with full control

✓ Custom models
✓ Instance-based pricing
✓ VPC isolation

NVIDIA NIM

Hosted NVIDIA models via API

✓ API-based access
✓ No AWS required
✓ Global availability

Deployment Workflow

1. Browse Available Models
   └─ GET /api/models/{platform}

2. Get AI Recommendation (Optional)
   └─ POST /api/ai/recommendations

3. Create Conversation & Deploy Resource
   └─ POST /api/conversations
       ├─ Creates conversation
       ├─ Deploys resource
       └─ Creates model configuration

4. Configure System Prompt (Optional)
   └─ POST /api/bedrock/system-prompt

5. Start Chatting
   └─ POST /api/chat/thread
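The workflow above can be sketched as an ordered list of HTTP requests. This is an illustration only, not Staque IO's client: the base URL, the helper names, and the shape of the request bodies are assumptions, and only the methods and paths come from the steps above. The recommendation and chat steps are left out because their request bodies are not described here.

```python
import json
import urllib.request

# Hypothetical base URL (assumption); replace with your Staque IO host.
BASE_URL = "https://staque.example.com"


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request for one workflow step."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def workflow_requests(platform, conversation, system_prompt=None):
    """Return the browse, deploy, and optional system-prompt steps in order."""
    steps = [
        build_request("GET", f"/api/models/{platform}"),
        build_request("POST", "/api/conversations", conversation),
    ]
    if system_prompt is not None:
        steps.append(
            build_request("POST", "/api/bedrock/system-prompt", system_prompt)
        )
    return steps
```

Each request can then be sent with urllib.request.urlopen, plus whatever authentication headers your installation requires.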

Platform Comparison

Feature	AWS Bedrock	AWS SageMaker	NVIDIA NIM
Setup Time	Instant	5-15 minutes	Instant
Pricing Model	Per token	Per hour	Per token
Idle Cost	$0	Instance cost	$0
Customization	System prompt only	Full model control	System prompt only
Scalability	Auto-scaled	Manual by default; auto-scaling must be configured	Auto-scaled
VPC Support	No	Yes	No
Best For	Quick start, foundation models	Custom models, VPC requirements	Multi-cloud, specific NVIDIA models
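One way to read the comparison table is as a small decision rule. The sketch below encodes only the "VPC Support", "Customization", and "Best For" rows; the function name and its parameters are hypothetical, and real choices will also depend on cost and model availability.

```python
def suggest_platform(needs_custom_model=False, needs_vpc=False, aws_available=True):
    """Suggest a platform identifier based on the comparison table."""
    if needs_custom_model or needs_vpc:
        # Only SageMaker lists custom models and VPC support.
        if not aws_available:
            raise ValueError("custom models and VPC isolation require AWS SageMaker")
        return "sagemaker"
    if not aws_available:
        # NVIDIA NIM is the only listed platform that does not need AWS.
        return "nvidia-nim"
    # Default to the quickest start among the AWS options.
    return "bedrock"
```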

General Deployment Steps

Step 1: Choose Your Model

Browse available models through the UI or API. Consider factors like:

Use Case: Text generation, code, embeddings, etc.
Performance: Response time, throughput requirements
Cost: Token pricing vs instance pricing
Compliance: Data residency, encryption requirements

Step 2: Configure Deployment

Provide the deployment configuration. Set "resource_type" to "bedrock", "sagemaker", or "nvidia-nim":

{
  "title": "Production AI Assistant",
  "use_case": "customer-support",
  "deployed_resource": {
    "resource_name": "Support Bot",
    "resource_type": "bedrock",
    "aws_resource_id": "amazon.nova-pro-v1:0",
    "region": "eu-north-1",
    "estimated_hourly_cost": 0
  }
}
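Checking a configuration before submitting it can catch typos early. The following is a minimal sketch, not the service's own validation: treating every field in the example as required is an assumption, and the function name is hypothetical.

```python
# Resource types named in this guide.
ALLOWED_RESOURCE_TYPES = {"bedrock", "sagemaker", "nvidia-nim"}


def validate_deployment(config):
    """Return a list of problems found in a deployment configuration dict."""
    problems = []
    # Assumption: all top-level fields from the example are required.
    for key in ("title", "use_case", "deployed_resource"):
        if key not in config:
            problems.append(f"missing field: {key}")
    resource = config.get("deployed_resource") or {}
    for key in ("resource_name", "resource_type", "aws_resource_id", "region"):
        if key not in resource:
            problems.append(f"missing field: deployed_resource.{key}")
    rtype = resource.get("resource_type")
    if rtype is not None and rtype not in ALLOWED_RESOURCE_TYPES:
        problems.append(f"unknown resource_type: {rtype}")
    cost = resource.get("estimated_hourly_cost", 0)
    if not isinstance(cost, (int, float)) or cost < 0:
        problems.append("estimated_hourly_cost must be a non-negative number")
    return problems
```

An empty list means the configuration passed these checks; the API may still reject it for reasons this sketch does not cover.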

Step 3: Set System Prompt

Configure the model's behavior with a custom system prompt:

POST /api/bedrock/system-prompt

{
  "modelId": "amazon.nova-pro-v1:0",
  "systemPrompt": "You are a helpful customer support assistant..."
}

Step 4: Test and Monitor

After deployment:

Test the model with sample queries
Monitor response times and error rates
Track token usage and costs
Adjust configuration as needed
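The first two checks can be automated with a small smoke test. This sketch assumes you supply a `send` callable that submits one query to the deployed model and raises an exception on failure; both the callable and the function name are hypothetical. Token usage is not tracked here because the response format is not described in this guide.

```python
import time


def run_smoke_test(send, queries):
    """Send each query through `send` and summarise latency and error rate."""
    latencies = []
    errors = 0
    for query in queries:
        start = time.perf_counter()
        try:
            send(query)
        except Exception:
            # Any exception counts as one failed request.
            errors += 1
        else:
            latencies.append(time.perf_counter() - start)
    return {
        "requests": len(queries),
        "error_rate": errors / len(queries) if queries else 0.0,
        "avg_latency_s": sum(latencies) / len(latencies) if latencies else None,
        "max_latency_s": max(latencies) if latencies else None,
    }
```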

Environment Variables

Set the environment variables for each platform you deploy to. The values shown are placeholders:

# AWS Configuration (Required for Bedrock & SageMaker)
STAQUE_AWS_REGION=eu-north-1
STAQUE_AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
STAQUE_AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# SageMaker Specific (Required for SageMaker deployments)
SAGEMAKER_SUBNET_IDS=subnet-12345,subnet-67890
SAGEMAKER_SECURITY_GROUP_IDS=sg-12345678
SAGEMAKER_EXECUTION_ROLE_ARN=arn:aws:iam::123456789012:role/SageMakerRole

# NVIDIA NIM (Required for NVIDIA NIM deployments)
NVIDIA_API_KEY=nvapi-xxxxxxxxxxxxx
NIM_BASE_URL=https://integrate.api.nvidia.com
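A startup check can report missing variables before the first deployment attempt. The variable names below come from the listing above and the grouping follows its comments; the helper itself is a hypothetical sketch.

```python
import os

_AWS = [
    "STAQUE_AWS_REGION",
    "STAQUE_AWS_ACCESS_KEY_ID",
    "STAQUE_AWS_SECRET_ACCESS_KEY",
]

# Variables per platform, grouped as in the listing above.
REQUIRED_ENV = {
    "bedrock": _AWS,
    "sagemaker": _AWS + [
        "SAGEMAKER_SUBNET_IDS",
        "SAGEMAKER_SECURITY_GROUP_IDS",
        "SAGEMAKER_EXECUTION_ROLE_ARN",
    ],
    "nvidia-nim": ["NVIDIA_API_KEY", "NIM_BASE_URL"],
}


def missing_env(platform, env=None):
    """Return the required variables that are unset or empty for a platform."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[platform] if not env.get(name)]
```

For example, calling missing_env("sagemaker") at startup and failing fast when the list is non-empty gives a clearer error than a failed deployment.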

Cost Optimization

Bedrock & NIM (Token-Based)

No idle costs: you pay only for the tokens you use
Optimize prompts to reduce token usage
Use cheaper models for simple tasks
Implement caching for repeated queries
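Caching for repeated queries can be as simple as an in-memory map keyed by the model, system prompt, and user prompt. The class below is a sketch under that assumption; its name, the `call` parameter, and the eviction policy are illustrative, and it only makes sense where returning an identical answer to an identical query is acceptable.

```python
import hashlib


class ResponseCache:
    """Small in-memory cache keyed by model ID, system prompt, and prompt."""

    def __init__(self, max_entries=1024):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._entries = {}
        self._max = max_entries
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model_id, system_prompt, prompt):
        # Join with a separator that is unlikely to appear in the inputs.
        raw = "\x1f".join((model_id, system_prompt, prompt))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_call(self, model_id, system_prompt, prompt, call):
        """Return a cached response, or invoke `call(prompt)` and store it."""
        key = self._key(model_id, system_prompt, prompt)
        if key in self._entries:
            self.hits += 1
            # Re-insert so this entry counts as the most recently used.
            self._entries[key] = self._entries.pop(key)
            return self._entries[key]
        self.misses += 1
        result = call(prompt)
        if len(self._entries) >= self._max:
            # Dicts keep insertion order, so the first key is the oldest.
            self._entries.pop(next(iter(self._entries)))
        self._entries[key] = result
        return result
```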

SageMaker (Instance-Based)

Right-size instances based on actual load
Use auto-scaling to match demand
Delete endpoints when not in use
Consider Spot capacity for dev/test training jobs; real-time endpoints run on on-demand instances
Use smaller instances for low-traffic models
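To compare token-based and instance-based pricing, it helps to estimate the monthly volume at which an always-on instance becomes cheaper. The functions below are a rough sketch: they assume a single blended price per 1,000 tokens and roughly 730 hours per month, and any prices you pass in are your own figures, not published rates.

```python
def monthly_token_cost(tokens_per_month, price_per_1k_tokens):
    """Cost of token-based usage (Bedrock or NIM) for one month."""
    return tokens_per_month / 1000 * price_per_1k_tokens


def monthly_instance_cost(hourly_rate, hours_per_month=730):
    """Cost of keeping one instance running for the given hours."""
    return hourly_rate * hours_per_month


def break_even_tokens(hourly_rate, price_per_1k_tokens, hours_per_month=730):
    """Monthly token volume above which the instance is the cheaper option."""
    instance_cost = monthly_instance_cost(hourly_rate, hours_per_month)
    return instance_cost / price_per_1k_tokens * 1000
```

With a hypothetical instance at 1.00 per hour and tokens at 0.01 per 1,000, the break-even point is about 73 million tokens per month; below that volume, token-based pricing costs less.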

Security Best Practices

AWS Credentials: Use IAM roles with minimal required permissions
API Keys: Never commit keys to version control
VPC: Use VPC isolation for SageMaker endpoints handling sensitive data
Encryption: Enable encryption at rest and in transit
Access Control: Implement role-based access in your application
Monitoring: Set up CloudWatch alarms for unusual activity

Troubleshooting

Common Issues

Error: Model not found

Solution: Verify that the model ID is correct and that the model is available in your region; some models are offered only in specific regions.

Error: Insufficient permissions

Solution: Check the IAM permissions attached to your AWS credentials. Bedrock calls need bedrock:InvokeModel; SageMaker calls need sagemaker:InvokeEndpoint.
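As an illustration of a minimal permission set, the sketch below builds an IAM policy document that allows bedrock:InvokeModel on a single foundation model. The helper name is hypothetical, and your setup may need more: streaming responses use a separate action, and models invoked through an inference profile also need that profile's ARN.

```python
import json


def invoke_policy(region, model_id):
    """Build an IAM policy allowing InvokeModel on one Bedrock foundation model."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                # Foundation-model ARNs have an empty account ID segment.
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/{model_id}",
            }
        ],
    }


# Region and model ID taken from the examples in this guide.
print(json.dumps(invoke_policy("eu-north-1", "amazon.nova-pro-v1:0"), indent=2))
```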

SageMaker endpoint stuck in "Creating"

Solution: Check the VPC configuration (subnets and security groups) and ensure the execution role has the required permissions. Creation typically takes 5-15 minutes, so wait at least that long before investigating.
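To tell a slow creation from a failed one, you can poll the endpoint status and surface the failure reason. The sketch below takes a `describe` callable so it runs without AWS access; it assumes the callable returns a dict shaped like the response of boto3's SageMaker describe_endpoint call (EndpointStatus and, on failure, FailureReason). The function name, timeout, and polling interval are illustrative.

```python
import time


def wait_for_endpoint(describe, endpoint_name, timeout_s=1200, poll_s=30,
                      sleep=time.sleep):
    """Poll until the endpoint is InService; raise on failure or timeout."""
    waited = 0
    while True:
        info = describe(EndpointName=endpoint_name)
        status = info["EndpointStatus"]
        if status == "InService":
            return info
        if status == "Failed":
            # FailureReason usually points at the subnet, security group,
            # or execution role problem.
            raise RuntimeError(info.get("FailureReason", "endpoint creation failed"))
        if waited >= timeout_s:
            raise TimeoutError(f"{endpoint_name} still {status} after {waited}s")
        sleep(poll_s)
        waited += poll_s


# With boto3 installed (not imported here), usage could look like:
#   client = boto3.client("sagemaker")
#   wait_for_endpoint(client.describe_endpoint, "my-endpoint")  # hypothetical name
```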