The Hard Problems
Resumability isn't a silver bullet. Understanding its limitations is essential for building robust systems.
Challenge 1: What Can't Be Serialized
Some state is inherently ephemeral or external
External API State
Database connections, open file handles, active WebSocket connections, OAuth tokens mid-refresh.
Time-Sensitive Information
Stock prices, inventory counts, weather data, real-time analytics that change between checkpoint and restore.
User Session Changes
The user logs out, changes permissions, or has their account state change between sessions.
Model Version Differences
When restoring to a different model version, learned behaviors or capabilities may differ.
Mitigation Strategy
Checkpoints should include metadata about external dependencies so the restoration process can validate or refresh them:
interface CheckpointDependencies {
  externalAPIs: Array<{
    name: string;
    lastVerified: number;
    refreshStrategy: 'revalidate' | 'invalidate' | 'ignore';
  }>;
  timeSensitiveData: Array<{
    key: string;
    value: unknown;
    capturedAt: number;
    staleDuration: number; // After this many ms, mark as stale
  }>;
  sessionRequirements: {
    userId: string;
    requiredPermissions: string[];
    validUntil?: number;
  };
}
Challenge 2: Validation & Staleness
When is a checkpoint still valid? How do we know?
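For time-sensitive data, one concrete answer is to compare each entry's age against its `staleDuration`, as declared on the `timeSensitiveData` entries of the `CheckpointDependencies` interface. The sketch below is a minimal illustration under that assumption; the `TimeSensitiveEntry` type and the `partitionByFreshness` helper are hypothetical names, not part of any existing API.

```typescript
// Hypothetical shape mirroring the timeSensitiveData entries.
interface TimeSensitiveEntry {
  key: string;
  value: unknown;
  capturedAt: number;    // epoch ms when the value was observed
  staleDuration: number; // ms after which the value counts as stale
}

// Split entries into those still usable and those that need a refresh.
function partitionByFreshness(
  entries: TimeSensitiveEntry[],
  now: number
): { fresh: TimeSensitiveEntry[]; stale: TimeSensitiveEntry[] } {
  const fresh: TimeSensitiveEntry[] = [];
  const stale: TimeSensitiveEntry[] = [];
  for (const entry of entries) {
    const age = now - entry.capturedAt;
    (age > entry.staleDuration ? stale : fresh).push(entry);
  }
  return { fresh, stale };
}
```

Passing `now` explicitly rather than calling `Date.now()` inside the function keeps the check deterministic and easy to test.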
Staleness Factors
Time Decay
Confidence decreases as checkpoint ages
confidence = originalConfidence × e^(-λt)
Example: after 24 hours, a 0.95 confidence fact might be 0.7.
External Events
Known world changes invalidate specific beliefs
if (event.affects(belief)) belief.confidence = 0
Example: an API deprecation announcement invalidates integration plans.
Dependency Chains
If a parent fact becomes stale, children may too
child.maxConfidence = min(ancestors.confidence)
Example: if 'user is admin' is stale, 'user can delete' is too.
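The time-decay and dependency-chain rules can be sketched together as follows. This is an illustration only: the `Belief` type, both function names, and the assumption that beliefs form an acyclic graph are hypothetical. The decay rate λ is a tuning parameter; a value of about 0.0127 per hour would reproduce the 0.95 to 0.7 example over 24 hours.

```typescript
// Exponential decay: confidence = originalConfidence * e^(-lambda * t).
function decayedConfidence(
  original: number,
  lambdaPerHour: number,
  ageHours: number
): number {
  return original * Math.exp(-lambdaPerHour * ageHours);
}

// Hypothetical belief record with links to the beliefs it depends on.
interface Belief {
  id: string;
  confidence: number;
  parents: string[];
}

// A belief is never more trustworthy than its weakest ancestor.
// Assumes the dependency graph has no cycles.
function effectiveConfidence(id: string, beliefs: Map<string, Belief>): number {
  const belief = beliefs.get(id);
  if (!belief) return 0; // unknown dependency: treat as untrusted
  let result = belief.confidence;
  for (const parentId of belief.parents) {
    result = Math.min(result, effectiveConfidence(parentId, beliefs));
  }
  return result;
}
```

With this shape, setting the confidence of 'user is admin' to 0 drives the effective confidence of 'user can delete' to 0 as well, without touching the child's stored value.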
Full Restore Strategy
Use when the checkpoint is recent and the context hasn't changed.
- Load the entire checkpoint as-is
- Trust all facts and beliefs
- Resume from the exact point
Partial Restore Strategy
Use when some facts are stale but the structure is still valid.
- Load structure and relationships
- Re-verify time-sensitive facts
- Mark stale beliefs for re-evaluation
function validateCheckpoint(
  checkpoint: CognitiveCheckpoint,
  currentContext: Context
): ValidationResult {
  const warnings: string[] = [];
  const invalidations: string[] = [];

  // Check time-based staleness
  const ageHours = (Date.now() - checkpoint.timestamp) / (1000 * 60 * 60);
  if (ageHours > 24) {
    warnings.push(`Checkpoint is ${ageHours.toFixed(1)}h old`);
  }

  // Check external dependencies
  for (const fact of checkpoint.epistemicState.facts) {
    if (fact.source.startsWith('api:')) {
      const apiName = fact.source.slice(4);
      if (!currentContext.availableAPIs.includes(apiName)) {
        invalidations.push(`API ${apiName} no longer available`);
      }
    }
  }

  // Check user context
  if (checkpoint.metadata?.userId !== currentContext.userId) {
    invalidations.push('Different user context');
  }

  return {
    valid: invalidations.length === 0,
    warnings,
    invalidations,
    suggestedStrategy: invalidations.length > 0
      ? 'partial'
      : warnings.length > 0
        ? 'verify'
        : 'full',
  };
}
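Once a strategy has been suggested, the restore step still has to decide which facts to trust. The sketch below is one possible interpretation, not a definitive implementation: `RestorableFact`, `planRestore`, and the reading of 'verify' as "re-check only time-sensitive facts" are assumptions, since the text defines only the full and partial strategies.

```typescript
type RestoreStrategy = 'full' | 'verify' | 'partial';

// Hypothetical fact record carrying the two flags the strategies care about.
interface RestorableFact {
  id: string;
  timeSensitive: boolean; // e.g. prices or inventory counts
  stale: boolean;         // flagged by an earlier validation pass
}

interface PlannedFact extends RestorableFact {
  needsReverification: boolean;
}

function needsCheck(fact: RestorableFact, strategy: RestoreStrategy): boolean {
  if (strategy === 'full') return false;                // trust everything
  if (strategy === 'verify') return fact.timeSensitive; // assumption: light re-check
  return fact.timeSensitive || fact.stale;              // 'partial'
}

// Annotate each fact without mutating the checkpoint's own data.
function planRestore(
  facts: RestorableFact[],
  strategy: RestoreStrategy
): PlannedFact[] {
  return facts.map((fact) => ({
    ...fact,
    needsReverification: needsCheck(fact, strategy),
  }));
}
```

In every strategy the structure is loaded unchanged; only the set of facts queued for re-verification differs.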
Challenge 3: Security & Privacy
Checkpoints contain sensitive cognitive state
Sensitive Data in State
- PII from conversations
- API keys observed during tool use
- Proprietary business logic
- Medical/legal information
Access Control Challenges
- Who can restore a checkpoint?
- Can checkpoints be shared?
- Cross-organization transfer
- Audit trail requirements
Compliance Considerations
- GDPR right to deletion
- HIPAA data handling
- SOC2 access controls
- Data residency requirements
Security Best Practices
Encryption
- Encrypt checkpoints at rest (AES-256)
- Encrypt in transit (TLS 1.3)
- Per-user encryption keys
- Key rotation policies
Access Control
- Owner-only access by default
- Explicit sharing with consent
- Time-limited access tokens
- Audit logging of all access
interface SecureCheckpoint extends CognitiveCheckpoint {
  security: {
    encryptedAt: number;
    algorithm: 'AES-256-GCM';
    keyId: string; // Reference to key in KMS

    accessControl: {
      owner: string; // User ID
      allowedUsers: string[]; // Explicit shares
      expiresAt?: number; // Auto-delete time
    };

    redaction: {
      piiRemoved: boolean;
      sensitiveFieldsHashed: string[];
      redactionPolicy: string;
    };

    compliance: {
      dataClassification: 'public' | 'internal' | 'confidential' | 'restricted';
      retentionPolicy: string;
      auditLogId: string;
    };
  };
}

async function storeSecureCheckpoint(
  checkpoint: CognitiveCheckpoint,
  options: SecurityOptions
): Promise<SecureCheckpoint> {
  // 1. Redact sensitive information
  const redacted = await redactPII(checkpoint, options.redactionPolicy);

  // 2. Encrypt the content
  const { ciphertext, keyId } = await encrypt(
    JSON.stringify(redacted),
    options.keyId
  );

  // 3. Store with access controls
  // Caution: spreading `checkpoint` copies its remaining fields in plaintext;
  // only `nodes` is replaced with ciphertext here. Omit or encrypt any other
  // sensitive fields before persisting.
  const secure: SecureCheckpoint = {
    ...checkpoint,
    // Replace content with encrypted version
    nodes: ciphertext as any,
    security: {
      encryptedAt: Date.now(),
      algorithm: 'AES-256-GCM',
      keyId,
      accessControl: {
        owner: options.userId,
        allowedUsers: [],
        expiresAt: options.ttl ? Date.now() + options.ttl : undefined,
      },
      redaction: {
        piiRemoved: true,
        sensitiveFieldsHashed: options.hashFields || [],
        redactionPolicy: options.redactionPolicy,
      },
      compliance: options.compliance,
    },
  };

  // 4. Log to audit trail
  await auditLog.record({
    action: 'checkpoint_created',
    checkpointId: checkpoint.id,
    userId: options.userId,
    timestamp: Date.now(),
  });

  return secure;
}
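The storing side has a counterpart at restore time: something has to enforce the `accessControl` block. The sketch below shows one way an owner-only default with explicit shares and expiry could be checked. The `AccessControl` type copies the fields from the interface above, while `canRestore` is a hypothetical helper; a real system would also write an audit log entry for each decision.

```typescript
// Same fields as security.accessControl in SecureCheckpoint.
interface AccessControl {
  owner: string;
  allowedUsers: string[];
  expiresAt?: number; // epoch ms; after this the checkpoint is unusable
}

// Owner-only by default; shared users only if explicitly listed;
// nobody once the checkpoint has expired.
function canRestore(acl: AccessControl, userId: string, now: number): boolean {
  if (acl.expiresAt !== undefined && now >= acl.expiresAt) {
    return false;
  }
  return userId === acl.owner || acl.allowedUsers.includes(userId);
}
```

Expiry is checked first so that an expired checkpoint is denied even to its owner, which matches the "auto-delete time" meaning of `expiresAt`.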
Challenge 4: Versioning & Schema Evolution
Checkpoints need to survive schema changes
Version Compatibility Matrix
| Change Type | Backward Compatible | Migration Strategy |
|---|---|---|
| Add optional field | Yes | Use default value when missing |
| Add required field | No | Derive from existing data or mark invalid |
| Remove field | Yes | Ignore unknown fields on read |
| Rename field | No | Migration function to map old → new |
| Change field type | No | Version-aware deserialization |
| Add new node type | Yes | Treat unknown types as generic |
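Several rows of the table can be combined into a single upgrade path. The sketch below chains version-to-version steps until a checkpoint reaches the target schema. It is illustrative only: `MigrationStep`, `migrateTo`, the example field names `ts`, `timestamp`, and `tags`, and the choice to treat a missing `schemaVersion` as version 1 are all assumptions.

```typescript
type RawCheckpoint = Record<string, unknown>;

interface MigrationStep {
  from: number;
  to: number;
  migrate: (cp: RawCheckpoint) => RawCheckpoint;
}

// Hypothetical steps: a rename (breaking) and an optional field with a default.
const exampleSteps: MigrationStep[] = [
  { from: 1, to: 2, migrate: ({ ts, ...rest }) => ({ ...rest, timestamp: ts }) },
  { from: 2, to: 3, migrate: (cp) => ({ ...cp, tags: cp.tags ?? [] }) },
];

// Apply steps in order until the checkpoint reaches the target version.
function migrateTo(
  cp: RawCheckpoint,
  target: number,
  steps: MigrationStep[]
): RawCheckpoint {
  let current = cp;
  let version = typeof cp.schemaVersion === 'number' ? cp.schemaVersion : 1;
  while (version < target) {
    const step = steps.find((s) => s.from === version);
    if (!step || step.to <= version) {
      throw new Error(`No forward migration from schema version ${version}`);
    }
    current = { ...step.migrate(current), schemaVersion: step.to };
    version = step.to;
  }
  return current;
}
```

Failing loudly when a step is missing corresponds to the "mark invalid" option in the table: a checkpoint that cannot be upgraded should not be silently loaded.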
Schema Version Tracking
interface VersionedCheckpoint {
  schemaVersion: number; // Increment on breaking changes
  minReaderVersion: number; // Minimum compatible reader
  migrations: Array<{
    fromVersion: number;
    toVersion: number;
    applied: boolean;
  }>;
}
Migration Registry
const migrations: Migration[] = [
  {
    from: 1, to: 2,
    migrate: (cp) => ({
      ...cp,
      // v2 split 'metadata' into separate fields
      metrics: cp.metadata?.metrics,
      tags: cp.metadata?.tags,
    }),
  },
];
Git-Inspired Versioning
Just as Git manages code history, cognitive state can be version-controlled:
1. Commits = Checkpoints with parent references
2. Branches = Parallel reasoning paths
3. Merge = Combining branch insights
4. Diff = Incremental checkpoints
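The analogy can be made concrete with parent references alone. The sketch below walks a checkpoint's history and finds the nearest shared ancestor of two branches, which is the natural starting point for a merge. `CheckpointCommit`, `history`, and `mergeBase` are hypothetical names, and the model is deliberately simplified to one parent per checkpoint, so merge commits with two parents are out of scope.

```typescript
// Hypothetical minimal commit record: a checkpoint id plus its parent.
interface CheckpointCommit {
  id: string;
  parentId: string | null; // null marks the root checkpoint
}

// Walk parent links back to the root, newest first (similar to `git log`).
function history(id: string, commits: Map<string, CheckpointCommit>): string[] {
  const path: string[] = [];
  let current: string | null = id;
  while (current !== null) {
    const commit: CheckpointCommit | undefined = commits.get(current);
    if (!commit) throw new Error(`Unknown checkpoint ${current}`);
    path.push(commit.id);
    current = commit.parentId;
  }
  return path;
}

// Nearest checkpoint reachable from both branches (similar to `git merge-base`).
function mergeBase(
  a: string,
  b: string,
  commits: Map<string, CheckpointCommit>
): string | null {
  const ancestorsOfA = new Set(history(a, commits));
  return history(b, commits).find((id) => ancestorsOfA.has(id)) ?? null;
}
```

Everything after the merge base on each branch is what a merge would need to reconcile, and the difference between a checkpoint and its parent is what an incremental checkpoint would store.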
Summary: When to Use Resumability
Good Fit
- ✓ Long-running conversations (hours/days)
- ✓ Complex multi-step reasoning tasks
- ✓ Frequent resume/pause patterns
- ✓ Branching exploration needed
- ✓ Cost-sensitive applications
Poor Fit
- ✗ Short, stateless Q&A
- ✗ Highly time-sensitive data required
- ✗ Context changes completely between sessions
- ✗ Strict compliance regimes that prohibit storing state
- ✗ Simple tasks where the overhead isn't worth it