Scaling & Performance

Three users uploaded tests simultaneously. Two succeeded, one failed with an IAM permission error.

Sequential testing hides race conditions.

This post covers what breaking things taught me about scaling: Lambda concurrency limits, API Gateway throttling, Bedrock rate limits, S3 consistency quirks, and how I shrank the frontend bundle from 890KB to 340KB.

Lambda Concurrency: The Hidden Limit

AWS Lambda scales automatically. That’s the sales pitch.

The reality: each account has a default concurrency limit (usually 1,000 concurrent invocations per region). And individual functions can starve each other if you don’t set reserved concurrency.

The Problem

AnalyzeFunction and ProcessorFunction share the same concurrency pool. If ProcessorFunction gets invoked 50 times simultaneously (slow, long-running jobs), it can starve AnalyzeFunction.

Result: new uploads fail with IAM timeout errors because AnalyzeFunction can’t get a container to run.

The Fix: Reserved Concurrency

ProcessorFunction:
  Type: AWS::Serverless::Function
  Properties:
    ReservedConcurrentExecutions: 50  # Cap processor
    # ... rest of config

AnalyzeFunction:
  Type: AWS::Serverless::Function
  Properties:
    ReservedConcurrentExecutions: 100  # Guarantee analyze capacity

Now ProcessorFunction can’t consume all available concurrency. AnalyzeFunction always has capacity to accept new jobs.
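To see how the two reservations carve up the account pool, here is a minimal Python sketch. It assumes the default 1,000 regional limit mentioned above and Lambda's rule that at least 100 units of concurrency stay unreserved. `unreserved_pool` is a hypothetical helper for illustration, not part of any AWS SDK.

```python
ACCOUNT_LIMIT = 1000   # assumed default regional limit; your account may differ
MIN_UNRESERVED = 100   # Lambda requires at least this much to stay unreserved


def unreserved_pool(reserved: dict[str, int], account_limit: int = ACCOUNT_LIMIT) -> int:
    """Return the concurrency left over for functions without a reservation."""
    remaining = account_limit - sum(reserved.values())
    if remaining < MIN_UNRESERVED:
        raise ValueError(
            f"only {remaining} unreserved left; Lambda needs at least {MIN_UNRESERVED}"
        )
    return remaining


reserved = {"ProcessorFunction": 50, "AnalyzeFunction": 100}
print(unreserved_pool(reserved))  # 850
```

Reserved concurrency works in both directions: it guarantees a function its slots and also caps it at that number.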

Trade-off: max 50 concurrent analyses. At ~20 seconds per analysis, each slot finishes about 3 analyses per minute, so throughput tops out around 50 * 3 = 150 tests/minute.
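The throughput ceiling is simple arithmetic, sketched here so it can be re-run if the cap or the analysis time changes. The function name is made up for this example, and the 30-second figure is only an illustration.

```python
def max_throughput_per_minute(concurrency: int, seconds_per_job: float) -> float:
    """Upper bound on jobs per minute when every slot is always busy."""
    jobs_per_slot_per_minute = 60.0 / seconds_per_job
    return concurrency * jobs_per_slot_per_minute


print(max_throughput_per_minute(50, 20))  # 150.0
print(max_throughput_per_minute(50, 30))  # 100.0 if analyses slowed to 30 seconds
```

This is a best-case bound: cold starts and retries push real throughput lower.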

For Blood’s current scale? More than enough. If we hit this limit, it’s a good problem to have.

API Gateway Throttling

Default API Gateway throttling: burst 5,000 requests, rate 10,000/second.

I set it lower: burst 30, rate 10/s.

Why? Because I want to catch abuse early and fail gracefully. Better to throttle at the gateway than let Lambdas spin up uncontrollably and rack up costs.

HttpApi:
  Type: AWS::Serverless::HttpApi
  Properties:
    DefaultRouteSettings:
      ThrottlingBurstLimit: 30
      ThrottlingRateLimit: 10

This is day-one throttling. Production might need adjustment based on actual traffic patterns.
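Burst and rate are easier to reason about as a token bucket, which is how API Gateway describes its throttling. The sketch below is a simplified model of that idea, not API Gateway's actual implementation. `TokenBucket` and the injectable `clock` are assumptions made for illustration and testing.

```python
import time


class TokenBucket:
    """Toy token bucket: `burst` is the bucket size, `rate` is tokens added per second."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Spend one token if available; False models a throttled (429) request."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Simulate 40 requests arriving at the same instant, then 40 more one second later.
fake_now = [0.0]
bucket = TokenBucket(rate=10, burst=30, clock=lambda: fake_now[0])
first_wave = sum(bucket.allow() for _ in range(40))
fake_now[0] += 1.0
second_wave = sum(bucket.allow() for _ in range(40))
print(first_wave, second_wave)  # 30 10
```

With burst 30 and rate 10/s, a sudden spike gets 30 requests through, and after that the API settles at 10 requests per second.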

Bedrock Throttling & Backoff

Amazon Bedrock has rate limits too. For Nova 2 Lite in eu-west-3:

Input tokens: 400,000 TPM (tokens per minute)
Output tokens: 80,000 TPM
Invocations: varies by model

When you hit the limit, Bedrock returns ThrottlingException.

The Fix: Exponential Backoff + Jitter

import random
import time
from botocore.exceptions import ClientError

def invoke_with_backoff(client, model_id, prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.invoke_model(modelId=model_id, body=prompt)
        except ClientError as e:
            if e.response['Error']['Code'] == 'ThrottlingException':
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff + jitter
                delay = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(delay)
            else:
                raise

Without jitter, multiple clients retrying simultaneously can create thundering herd problems. Jitter spreads retries out.
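A quick way to see the effect is to compare the delays five clients would pick on the same retry attempt. This is a standalone sketch using the same formula as the function above. `backoff_delay` is a hypothetical helper, and the random generator is seeded only so the example is repeatable.

```python
import random


def backoff_delay(attempt: int, rng: random.Random, jitter: bool = True) -> float:
    """Delay in seconds: 2**attempt, plus up to one second of random jitter."""
    base = 2 ** attempt
    return base + (rng.uniform(0, 1) if jitter else 0.0)


rng = random.Random(42)  # seeded for a repeatable demo
no_jitter = [backoff_delay(2, rng, jitter=False) for _ in range(5)]
with_jitter = [backoff_delay(2, rng, jitter=True) for _ in range(5)]

print(no_jitter)    # [4.0, 4.0, 4.0, 4.0, 4.0] -> everyone retries at the same instant
print(with_jitter)  # five different values between 4 and 5 seconds
```

A common alternative is "full jitter", where the delay is drawn uniformly from 0 up to 2**attempt. That spreads retries further, at the cost of some very short waits.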

S3 Consistency Quirks

S3 is “strongly consistent” now (since December 2020): once a write succeeds, every later read of that key sees it. But there’s a catch: the guarantee only kicks in after the write has completed.

In Blood’s async job pattern:

ProcessorFunction writes results to s3://jobs/{job_id}.json
Frontend polls /api/status/{job_id} which reads from S3

If the poll hits before ProcessorFunction has finished writing the object, you get a 404.

The Fix: Polling with Grace Period

Frontend polls every 2 seconds. First poll starts 4 seconds after job submission. This gives ProcessorFunction a head start before the first read.

The status endpoint also treats a missing object as a job that is still running:

try:
    obj = s3.get_object(Bucket=JOBS_BUCKET, Key=f"{job_id}.json")
except ClientError as e:
    if e.response['Error']['Code'] == 'NoSuchKey':
        return {"status": "processing", "stage": current_stage}
    raise

Missing key = still processing, not error.
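The client side of this pattern looks roughly like the loop below. The real frontend is JavaScript. This is a Python sketch of the same logic, to match the backend snippets. `poll_job`, `fetch_status`, the 120-second timeout, and the "done" status value are all assumptions for illustration. Only the 4-second grace period, the 2-second interval, and the "processing" status come from the setup described above.

```python
import time
from typing import Callable


def poll_job(
    fetch_status: Callable[[], dict],
    initial_delay: float = 4.0,   # grace period before the first poll
    interval: float = 2.0,        # gap between polls
    timeout: float = 120.0,       # assumed overall budget, not from the post
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job leaves the 'processing' state or the timeout is hit."""
    sleep(initial_delay)
    waited = initial_delay
    while True:
        status = fetch_status()
        if status.get("status") != "processing":
            return status
        if waited + interval > timeout:
            raise TimeoutError(f"job still processing after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Demo with a fake status endpoint and a fake sleep that just records the waits.
responses = iter([{"status": "processing"}, {"status": "processing"}, {"status": "done"}])
sleeps: list[float] = []
result = poll_job(lambda: next(responses), sleep=sleeps.append)
print(result, sleeps)  # {'status': 'done'} [4.0, 2.0, 2.0]
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting.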

Frontend Bundle Optimization

Initial build: 890KB gzipped.

That’s unacceptable for mobile users on 3G. Target: <500KB.

What Was Bloating the Bundle

Entire markerInfo.js (283KB): imported at top level, loaded on every page
Unused component code: Svelte tree-shaking wasn’t aggressive enough
Duplicate dependencies: both pdfjs-dist and a pdfplumber JS wrapper

The Fixes

1. Lazy-load marker data:

// Old: import { CANONICAL } from './markerInfo.js'
// New: load on-demand only when needed
const loadMarkerInfo = () => import('./markerInfo.js')

2. Code-split by route:

// Vite config
build: {
  rollupOptions: {
    output: {
      manualChunks: {
        vendor: ['svelte'],
        results: ['./src/routes/results/+page.svelte'],
        trends: ['./src/routes/trends/+page.svelte'],
      }
    }
  }
}

3. Remove unused deps:

// Before
"dependencies": {
  "pdfjs-dist": "^4.0",  // Not actually used in frontend
  "svelte": "^5.0"
}

// After
"dependencies": {
  "svelte": "^5.0"
}

Results

Metric	Before	After	Improvement
Total bundle	890KB	340KB	-62%
Initial load	420KB	145KB	-65%
FCP (3G)	~8s	~3s	-62%

Mobile users thank me.
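For anyone checking the Improvement column, the percentages come from a plain relative-change calculation. This small sketch recomputes them from the Before and After values in the table. The helper name is arbitrary.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = smaller)."""
    return (after - before) / before * 100


rows = [
    ("Total bundle (KB)", 890, 340),
    ("Initial load (KB)", 420, 145),
    ("FCP on 3G (s)", 8, 3),
]
for name, before, after in rows:
    print(f"{name}: {percent_change(before, after):.1f}%")
# Total bundle (KB): -61.8%
# Initial load (KB): -65.5%
# FCP on 3G (s): -62.5%
```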

CloudWatch Monitoring

Can’t improve what you don’t measure.

Set up CloudWatch dashboards tracking:

Lambda invocations (success/error)
Lambda duration (p50, p95, p99)
Bedrock token usage (input/output)
API Gateway 4xx/5xx rates
S3 request rates

Alarm thresholds:

Error rate > 5% → Discord notification
Duration p95 > 45s → investigate
Token usage spike → check for abuse
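As a sketch of how the p95 duration threshold could be expressed in code, the function below builds the parameters for CloudWatch's `put_metric_alarm` call. The alarm name, the 5-minute period, and the `ProcessorFunction` value are assumptions: the deployed function name usually differs from the template's logical ID. Lambda reports `Duration` in milliseconds, so 45 seconds becomes 45,000.

```python
def duration_p95_alarm(function_name: str, threshold_seconds: float = 45.0) -> dict:
    """Build put_metric_alarm parameters for a p95 Lambda duration alarm."""
    return {
        "AlarmName": f"{function_name}-duration-p95",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "ExtendedStatistic": "p95",
        "Period": 300,                 # assumed 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold_seconds * 1000,  # Duration is reported in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",     # no traffic should not trigger the alarm
    }


params = duration_p95_alarm("ProcessorFunction")  # placeholder function name
print(params["Threshold"])  # 45000.0

# With boto3 installed and credentials configured, the alarm would be created with:
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

Notifications are not wired up here. One common route is an `AlarmActions` entry pointing at an SNS topic that forwards to a webhook.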

Load Testing Results

Ran Artillery against the test environment: 100 virtual users over 5 minutes.

Results:

p50 latency: 18s
p95 latency: 34s
p99 latency: 41s
Error rate: 0.3% (3/1000 requests)

The outliers were all IAM race conditions during cold starts. Warm Lambdas performed consistently.

Target met for Lambda startup time (not full analysis time): <3s warm, <8s cold.
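For reference, the percentile and error-rate figures in a report like this can be reproduced from raw numbers with a few lines of Python. This sketch uses the nearest-rank percentile definition, which is an assumption and may differ slightly from how Artillery computes its summary. The sample latencies are synthetic, not the real test data.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate_percent(errors: int, total: int) -> float:
    """Share of failed requests, in percent."""
    return 100 * errors / total


sample = list(range(1, 101))  # synthetic latencies: 1, 2, ..., 100 seconds
print(percentile(sample, 50), percentile(sample, 95), percentile(sample, 99))  # 50 95 99
print(error_rate_percent(3, 1000))  # 0.3
```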

This is post #5 in the Blood Development Log series.