Three users uploaded tests simultaneously. Two succeeded, one failed with an IAM permission error.
Sequential testing hides race conditions.
This post covers what breaking taught me about scaling — Lambda concurrency limits, API Gateway throttling, Bedrock rate limits, S3 eventual consistency quirks, and how I shrunk the frontend bundle from 890KB to 340KB.
Lambda Concurrency: The Hidden Limit
AWS Lambda scales automatically. That’s the sales pitch.
The reality: each account has a default concurrency limit (usually 1,000 concurrent invocations per region). And individual functions can starve each other if you don’t set reserved concurrency.
The Problem
AnalyzeFunction and ProcessorFunction share the same concurrency pool. If ProcessorFunction gets invoked 50 times simultaneously (slow, long-running jobs), it can starve AnalyzeFunction.
Result: new uploads fail with IAM timeout errors because AnalyzeFunction can’t get a container to run.
The Fix: Reserved Concurrency
ProcessorFunction:
  Type: AWS::Serverless::Function
  Properties:
    ReservedConcurrentExecutions: 50  # Cap processor
    # ... rest of config
AnalyzeFunction:
  Type: AWS::Serverless::Function
  Properties:
    ReservedConcurrentExecutions: 100  # Guarantee analyze capacity
Now ProcessorFunction can’t consume all available concurrency. AnalyzeFunction always has capacity to accept new jobs.
Trade-off: at most 50 concurrent analyses. At ~20 seconds per analysis, each slot finishes 3 per minute, so throughput tops out at 50 * 3 = 150 tests/minute.
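The throughput ceiling above is simple arithmetic, and it helps to have it as a function when tuning the reserved value. This is a minimal sketch; the function name is hypothetical, and it assumes every slot stays busy and every analysis takes the same time, so real throughput will be lower.

```python
def max_throughput_per_minute(reserved_concurrency: int, seconds_per_job: float) -> float:
    """Upper bound on completed jobs per minute when every slot stays busy."""
    jobs_per_slot_per_minute = 60.0 / seconds_per_job
    return reserved_concurrency * jobs_per_slot_per_minute


# 50 reserved slots, ~20 seconds per analysis
print(max_throughput_per_minute(50, 20))  # 150.0
```

Halving the analysis time doubles the ceiling without touching the concurrency cap, which is usually the cheaper lever.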
For Blood’s current scale? More than enough. If we hit this limit, it’s a good problem to have.
API Gateway Throttling
Default API Gateway throttling: burst 5,000 requests, rate 10,000/second.
I set it lower: burst 30, rate 10/s.
Why? Because I want to catch abuse early and fail gracefully. Better to throttle at the gateway than let Lambdas spin up uncontrollably and rack up costs.
HttpApi:
  Type: AWS::Serverless::HttpApi
  Properties:
    DefaultRouteSettings:
      ThrottlingBurstLimit: 30
      ThrottlingRateLimit: 10
This is day-one throttling. Production might need adjustment based on actual traffic patterns.
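To see what burst 30 and rate 10/s mean together: API Gateway throttles with a token bucket, where the burst is the bucket size and the rate is how fast it refills. The sketch below is a toy model of that idea, not API Gateway's implementation; the class and its method names are made up for illustration, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Toy token bucket: `burst` is the capacity, `rate` is tokens refilled per second."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last request, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a real gateway would answer 429 Too Many Requests here


bucket = TokenBucket(rate=10, burst=30)
accepted_at_once = sum(bucket.allow(now=0.0) for _ in range(40))
accepted_one_second_later = sum(bucket.allow(now=1.0) for _ in range(40))
print(accepted_at_once, accepted_one_second_later)  # 30 10
```

A spike of 40 simultaneous requests gets 30 through and 10 throttled; after that, the sustained ceiling is 10 per second.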
Bedrock Throttling & Backoff
Amazon Bedrock has rate limits too. For Nova 2 Lite in eu-west-3:
- Input tokens: 400,000 TPM (tokens per minute)
- Output tokens: 80,000 TPM
- Invocations: varies by model
When you hit the limit, Bedrock returns ThrottlingException.
The Fix: Exponential Backoff + Jitter
import random
import time

from botocore.exceptions import ClientError


def invoke_with_backoff(client, model_id, prompt, max_retries=5):
    """Call Bedrock, retrying throttled requests with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.invoke_model(modelId=model_id, body=prompt)
        except ClientError as e:
            if e.response['Error']['Code'] == 'ThrottlingException':
                if attempt == max_retries - 1:
                    raise  # Out of retries: surface the throttle
                # Exponential backoff + jitter: 1s, 2s, 4s, 8s plus up to 1s random
                delay = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(delay)
            else:
                raise  # Not a throttle: fail immediately
Without jitter, multiple clients retrying simultaneously can create thundering herd problems. Jitter spreads retries out.
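It is worth knowing how long that retry loop can sleep in the worst case, since the time counts against the Lambda timeout. The sketch below reproduces the same delay formula as a standalone helper; `backoff_delays` is a hypothetical name, and the `rng` parameter exists only so the jitter can be seeded.

```python
import random
from typing import List, Optional


def backoff_delays(max_retries: int = 5, rng: Optional[random.Random] = None) -> List[float]:
    """Sleep durations used by the retry loop, one per failed attempt.

    The final attempt re-raises instead of sleeping, so there are
    max_retries - 1 sleeps in total.
    """
    rng = rng or random.Random()
    return [(2 ** attempt) + rng.uniform(0, 1) for attempt in range(max_retries - 1)]


delays = backoff_delays()
print([round(d, 2) for d in delays])
print(f"total wait before giving up: {sum(delays):.1f}s")  # always between 15s and 19s
```

With 5 attempts the base delays are 1 + 2 + 4 + 8 = 15 seconds, plus up to 1 second of jitter on each of the 4 sleeps.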
S3 Eventual Consistency Quirks
S3 is “strongly consistent” now (since 2020). But there’s a catch: the guarantee only covers reads issued after a write has completed. It says nothing about an object that hasn’t been written yet.
In Blood’s async job pattern:
- ProcessorFunction writes results to s3://jobs/{job_id}.json
- Frontend polls /api/status/{job_id}, which reads from S3
If the poll hits before the write has completed, you get a 404.
The Fix: Polling with Grace Period
Frontend polls every 2 seconds. First poll starts 4 seconds after job submission. This gives the first write time to land in S3.
Also added retry logic in the status endpoint:
try:
    obj = s3.get_object(Bucket=JOBS_BUCKET, Key=f"{job_id}.json")
except ClientError as e:
    if e.response['Error']['Code'] == 'NoSuchKey':
        # Result not written yet: report progress instead of failing
        return {"status": "processing", "stage": current_stage}
    raise
Missing key = still processing, not error.
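The polling schedule is easy to sketch. The real frontend is JavaScript; this is a Python stand-in that mirrors the timing described above (4 second grace period, then every 2 seconds). The `fetch` callable, the `max_polls` cap, and the `"done"` status value are assumptions for illustration; only `"processing"` appears in the actual endpoint.

```python
import time
from typing import Callable, Optional


def poll_status(
    fetch: Callable[[], dict],
    initial_delay: float = 4.0,
    interval: float = 2.0,
    max_polls: int = 30,  # hypothetical cap so a stuck job cannot poll forever
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Wait out the grace period, then poll until the job leaves 'processing'."""
    sleep(initial_delay)
    for _ in range(max_polls):
        status = fetch()
        if status.get("status") != "processing":
            return status
        sleep(interval)
    return None  # gave up; the caller decides how to report a timeout


# Usage with a fake endpoint and a recording sleep, so nothing actually waits.
responses = iter([{"status": "processing"}, {"status": "processing"}, {"status": "done"}])
waits = []
result = poll_status(lambda: next(responses), sleep=waits.append)
print(result, waits)  # {'status': 'done'} [4.0, 2.0, 2.0]
```

Injecting `sleep` keeps the schedule testable without real delays.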
Frontend Bundle Optimization
Initial build: 890KB gzipped.
That’s unacceptable for mobile users on 3G. Target: <500KB.
What Was Bloating the Bundle
- Entire markerInfo.js (283KB): imported at top level, loaded on every page
- Unused component code: Svelte tree-shaking wasn’t aggressive enough
- Duplicate dependencies: both pdfjs-dist and pdfplumber JS wrapper
The Fixes
1. Lazy-load marker data:
// Old: import { CANONICAL } from './markerInfo.js'
// New: load on-demand only when needed
const loadMarkerInfo = () => import('./markerInfo.js')
2. Code-split by route:
// Vite config
build: {
  rollupOptions: {
    output: {
      manualChunks: {
        vendor: ['svelte'],
        results: ['./src/routes/results/+page.svelte'],
        trends: ['./src/routes/trends/+page.svelte'],
      }
    }
  }
}
3. Remove unused deps:
// Before
"dependencies": {
  "pdfjs-dist": "^4.0",  // Not actually used in frontend
  "svelte": "^5.0"
}
// After
"dependencies": {
  "svelte": "^5.0"
}
Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| Total bundle | 890KB | 340KB | -62% |
| Initial load | 420KB | 145KB | -65% |
| FCP (3G) | ~8s | ~3s | -62% |
Mobile users thank me.
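For anyone checking the Improvement column, the percentages are plain relative change. A small sketch, with a hypothetical helper name, using the before/after figures from the table:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = reduction)."""
    return (after - before) / before * 100


rows = {
    "Total bundle (KB)": (890, 340),
    "Initial load (KB)": (420, 145),
    "FCP on 3G (s)": (8, 3),
}
for name, (before, after) in rows.items():
    print(f"{name}: {percent_change(before, after):.1f}%")
# Total bundle (KB): -61.8%
# Initial load (KB): -65.5%
# FCP on 3G (s): -62.5%
```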
CloudWatch Monitoring
Can’t improve what you don’t measure.
Set up CloudWatch dashboards tracking:
- Lambda invocations (success/error)
- Lambda duration (p50, p95, p99)
- Bedrock token usage (input/output)
- API Gateway 4xx/5xx rates
- S3 request rates
Alarm thresholds:
- Error rate > 5% → Discord notification
- Duration p95 > 45s → investigate
- Token usage spike → check for abuse
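As one example, the p95 duration threshold could be expressed as CloudWatch alarm parameters. This is a sketch rather than the alarm actually deployed: the helper name, alarm name, 5 minute period, and single evaluation period are assumptions, and `function_name` must be the deployed function's physical name. The keys are parameters of boto3's `put_metric_alarm`; Lambda reports `Duration` in milliseconds, hence the conversion.

```python
from typing import Any, Dict, List, Optional


def duration_p95_alarm(
    function_name: str,
    threshold_seconds: float = 45.0,
    alarm_actions: Optional[List[str]] = None,  # e.g. an SNS topic ARN that forwards to Discord
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm (p95 Lambda duration)."""
    return {
        "AlarmName": f"{function_name}-duration-p95",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "ExtendedStatistic": "p95",
        "Period": 300,  # assumed 5-minute window
        "EvaluationPeriods": 1,  # assumed: alarm on the first breaching window
        "Threshold": threshold_seconds * 1000,  # Lambda Duration is in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": alarm_actions or [],
    }


params = duration_p95_alarm("ProcessorFunction")
print(params["Threshold"])  # 45000.0
# To create it for real (requires AWS credentials):
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

The error-rate alarm needs metric math (errors divided by invocations), so it does not fit this single-metric shape.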
Load Testing Results
Ran Artillery against the test environment: 100 virtual users over 5 minutes.
Results:
- p50 latency: 18s
- p95 latency: 34s
- p99 latency: 41s
- Error rate: 0.3% (3/1000 requests)
The outliers were all IAM race conditions during cold starts. Warm Lambdas performed consistently.
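For reference, this is how figures like these are derived from raw samples. A minimal sketch using the nearest-rank percentile definition; Artillery may compute percentiles differently, and the latency samples below are made up for illustration, not taken from the load test.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def error_rate(errors: int, total: int) -> float:
    """Failed requests as a percentage of all requests."""
    return errors / total * 100


# Made-up latencies in seconds, only to exercise the function.
latencies = [12.0, 15.0, 18.0, 19.0, 22.0, 27.0, 30.0, 34.0, 38.0, 41.0]
for pct in (50, 95, 99):
    print(f"p{pct}: {percentile(latencies, pct)}s")

print(f"error rate: {error_rate(3, 1000):.1f}%")  # error rate: 0.3%
```

With only a handful of samples, p95 and p99 collapse onto the slowest request, which is why tail percentiles need a large run to mean anything.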
Target met: <3s warm, <8s cold (for Lambda startup, not full analysis time).
This is post #5 in the Blood Development Log series.