Building Resumable AI Systems
Complete technical reference with data structures, code examples, and integration patterns for implementing cognitive state resumability.
Data Structures
Core types for representing cognitive state
CognitiveNode
The fundamental unit of reasoning. Each node represents a single cognitive step with full context about its purpose, status, and relationships.
```typescript
type NodeType =
  | 'intent' | 'decision' | 'reasoning' | 'tool-call'
  | 'observation' | 'reflection' | 'checkpoint';
type NodePhase =
  | 'setup' | 'exploration' | 'planning'
  | 'execution' | 'verification' | 'reflection';
type NodeStatus = 'pending' | 'active' | 'paused' | 'completed' | 'failed' | 'deferred';

interface CognitiveNode {
  id: string;                    // Unique identifier
  type: NodeType;
  phase: NodePhase;
  status: NodeStatus;

  // Core content
  label: string;                 // Human-readable summary
  description?: string;          // Detailed explanation

  // Reasoning context - CRITICAL for resumability
  assumptions: string[];         // What we're assuming to be true
  openQuestions: string[];       // Unresolved uncertainties
  confidence: number;            // 0-1 confidence in this step

  // Tool binding state - preserves mid-execution tool calls
  toolBinding?: {
    toolName: string;
    inputs: Record<string, unknown>;
    inputsComplete: boolean;
    rejectedAlternatives?: Array<{
      toolName: string;
      reason: string;
    }>;
  };

  // Graph structure
  parentId?: string;             // Parent node in reasoning tree
  childIds: string[];            // Child nodes (sub-steps)
  blockedBy?: string[];          // Nodes that must complete first

  // Timing metadata
  createdAt: number;
  startedAt?: number;
  completedAt?: number;

  // Checkpoint reference
  checkpointId?: string;         // Which checkpoint created this
  branchId: string;              // Which branch this belongs to
}
```
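Because each node records its `status` and `blockedBy` list, a resuming agent can work out which steps are ready to run without re-reading any transcript. The sketch below assumes a node is ready when it is `pending` or `paused` and every node in `blockedBy` is `completed`; `readyNodes` and the trimmed `NodeLike` type are hypothetical illustrations, not part of the schema above.

```typescript
// Minimal subset of CognitiveNode needed for scheduling (illustrative only).
type NodeStatus = 'pending' | 'active' | 'paused' | 'completed' | 'failed' | 'deferred';

interface NodeLike {
  id: string;
  status: NodeStatus;
  blockedBy?: string[];
}

// Hypothetical helper: nodes that can be picked up on resume.
// Assumed rule: not yet started (or paused) and every blocker completed.
// Blocker IDs that do not exist in the list count as unsatisfied.
function readyNodes(nodes: NodeLike[]): NodeLike[] {
  const statusById = new Map(nodes.map(n => [n.id, n.status] as const));
  return nodes.filter(
    n =>
      (n.status === 'pending' || n.status === 'paused') &&
      (n.blockedBy ?? []).every(id => statusById.get(id) === 'completed')
  );
}
```

For nodes `a` (completed), `b` (pending, blocked by `a`), and `c` (pending, blocked by `b`), only `b` is returned.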
EpistemicState
Represents what the AI "knows": accumulated facts, active assumptions, open questions, and beliefs, each with confidence or supporting evidence where applicable.
```typescript
interface EpistemicState {
  // Verified facts with provenance
  facts: Array<{
    id: string;
    content: string;             // The fact itself
    source: string;              // Where it came from (tool output, user, inference)
    confidence: number;          // How certain we are (0-1)
  }>;

  // Active assumptions - things we're treating as true but haven't verified
  assumptions: Array<{
    id: string;
    content: string;
    validatedAt?: number;        // When (if ever) this was validated
  }>;

  // Unresolved questions that may block progress
  openQuestions: Array<{
    id: string;
    question: string;
    priority: 'high' | 'medium' | 'low';
    blocksNodes: string[];       // Which nodes are waiting on this answer
  }>;

  // Beliefs with supporting evidence
  beliefs: Array<{
    id: string;
    statement: string;
    confidence: number;
    evidence: string[];          // IDs of facts supporting this belief
  }>;
}
```
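Since `beliefs[].evidence` holds fact IDs, a cheap consistency check on resume is to flag beliefs whose supporting facts are missing, or whose stated confidence exceeds that of their strongest evidence. The rule that a belief should be no more confident than its best supporting fact is an illustrative policy assumed here, not something the schema enforces; `auditBeliefs` is a hypothetical helper.

```typescript
interface Fact { id: string; content: string; source: string; confidence: number }
interface Belief { id: string; statement: string; confidence: number; evidence: string[] }
interface BeliefIssue { beliefId: string; problem: string }

// Hypothetical audit: reports dangling evidence references and beliefs
// that claim more confidence than any fact backing them.
function auditBeliefs(facts: Fact[], beliefs: Belief[]): BeliefIssue[] {
  const factById = new Map(facts.map(f => [f.id, f] as const));
  const issues: BeliefIssue[] = [];

  for (const belief of beliefs) {
    for (const id of belief.evidence) {
      if (!factById.has(id)) {
        issues.push({ beliefId: belief.id, problem: `missing fact ${id}` });
      }
    }

    // Assumed policy: a belief's confidence is capped by its strongest fact.
    // A belief with no resolvable evidence gets a cap of 0.
    const supporting = belief.evidence
      .map(id => factById.get(id))
      .filter((f): f is Fact => f !== undefined);
    const best = Math.max(0, ...supporting.map(f => f.confidence));
    if (belief.confidence > best) {
      issues.push({
        beliefId: belief.id,
        problem: `confidence ${belief.confidence} exceeds evidence ${best}`,
      });
    }
  }
  return issues;
}
```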
IntentGraph
Hierarchical structure of goals and sub-goals: the AI's "to-do list", with a priority on each goal and parent/child links between them.
```typescript
interface IntentGraph {
  // Hierarchical goals
  goals: Array<{
    id: string;
    description: string;
    status: 'active' | 'achieved' | 'abandoned' | 'deferred';
    subGoalIds: string[];        // Child goals
    parentGoalId?: string;       // Parent goal (if any)
    priority: number;            // Higher = more important
  }>;

  // The root intent - the original user request
  rootIntentId: string;
}
```
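To pick up work from an `IntentGraph`, an agent typically wants the active goals it can act on directly: those with no active sub-goals of their own, highest priority first. The sketch below assumes that reading of `subGoalIds` and `priority`; `nextGoals` is a hypothetical helper.

```typescript
type GoalStatus = 'active' | 'achieved' | 'abandoned' | 'deferred';

interface Goal {
  id: string;
  description: string;
  status: GoalStatus;
  subGoalIds: string[];
  parentGoalId?: string;
  priority: number; // Higher = more important
}

// Hypothetical helper: active "leaf" goals (no active children),
// sorted by descending priority.
function nextGoals(goals: Goal[]): Goal[] {
  const byId = new Map(goals.map(g => [g.id, g] as const));
  return goals
    .filter(g => g.status === 'active')
    .filter(g => !g.subGoalIds.some(id => byId.get(id)?.status === 'active'))
    .sort((a, b) => b.priority - a.priority);
}
```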
CognitiveCheckpoint
A complete snapshot of the cognitive state at a point in time. Because it is self-contained, a session can be restored directly from it without replaying earlier steps.
```typescript
interface CognitiveCheckpoint {
  id: string;
  timestamp: number;
  label: string;                 // Human-readable name
  description?: string;

  // Full state snapshot
  nodes: CognitiveNode[];
  epistemicState: EpistemicState;
  intentGraph: IntentGraph;

  // Branch information
  branchId: string;
  parentCheckpointId?: string;   // For checkpoint chains

  // Metrics at checkpoint time
  metrics: {
    totalNodes: number;
    completedNodes: number;
    tokenEstimate: number;       // Estimated tokens to represent this state
    elapsedMs: number;           // Time since session start
  };
}
```
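`parentCheckpointId` links checkpoints into a chain. Walking it back from a given checkpoint yields the lineage that led to the current state, which helps when choosing an earlier restore point. `checkpointLineage` is a hypothetical helper over a map keyed by checkpoint ID; it stops on a repeated ID so that a corrupted chain cannot loop forever.

```typescript
// Subset of CognitiveCheckpoint needed to follow the chain (illustrative).
interface CheckpointRef {
  id: string;
  label: string;
  parentCheckpointId?: string;
}

// Hypothetical helper: returns [checkpoint, parent, grandparent, ...].
// Stops at a missing parent or at an ID already visited.
function checkpointLineage(
  checkpoints: Record<string, CheckpointRef>,
  startId: string
): CheckpointRef[] {
  const lineage: CheckpointRef[] = [];
  const seen = new Set<string>();
  let current: CheckpointRef | undefined = checkpoints[startId];

  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    lineage.push(current);
    current = current.parentCheckpointId
      ? checkpoints[current.parentCheckpointId]
      : undefined;
  }
  return lineage;
}
```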
Continuation Record (CR)
The core artifact for cross-agent resumability
The ContinuationRecord is the artifact that gets serialized and passed between agents, and it is what enables "resumable cognition". Unlike raw conversation history, it captures structured state that any thinking agent (an LLM, a human, or a deterministic system) can use to continue the work.
ContinuationRecord
```typescript
interface ContinuationRecord {
  // === Identity & Audit Trail ===
  id: string;
  version: string;               // Schema version
  contentHash: string;           // Hash of CR content (immutable link)
  parentCRHash?: string;         // Links to previous CR in chain
  merkleRoot?: string;           // For efficient history verification

  // === Task Definition ===
  task: {
    id: string;
    goal: string;
    successCriteria: string[];
    status: 'not-started' | 'in-progress' | 'blocked' | 'completed' | 'failed';
    startedAt?: number;
    completedAt?: number;
  };

  // === Authorship ===
  createdBy: CRAuthor;
  createdAt: number;
  lastModifiedBy: CRAuthor;
  lastModifiedAt: number;

  // === Execution Summary ===
  executionSummary: {
    stepsCompleted: number;
    stepsFailed: number;
    tokensConsumed: number;
    elapsedMs: number;
    actions: Array<{
      description: string;
      timestamp: number;
      outcome: 'success' | 'failure' | 'partial';
    }>;
  };

  // === Core State ===
  decisions: CRDecision[];           // Tracked choices
  learnings: CRLearning[];           // Captured insights
  questionsForUser: CRQuestion[];    // Human intervention needed
  blockers: CRBlocker[];             // Obstacles
  resumePoints: CRResumePoint[];     // Continuation gates
  toolEffects: CRToolEffect[];       // Idempotent action log
  invariants: CRInvariant[];         // Must-remain-true constraints

  // === Metadata ===
  metadata: {
    tags: string[];
    priority: 'critical' | 'high' | 'medium' | 'low';
    estimatedRemainingWork?: string;
  };
}
```
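The schema does not specify how `contentHash` is computed. One workable approach, assumed here, is SHA-256 over a canonical JSON encoding (object keys sorted recursively) of every field except `id` and `contentHash` itself, so that two agents serializing the same record independently get the same hash. Excluding both fields matches `CRStore.create` accepting a record without them. `canonicalJson` and `computeContentHash` are hypothetical helpers; `createHash` is Node's built-in `node:crypto` API.

```typescript
import { createHash } from 'node:crypto';

// Serialize with object keys sorted at every level so that key order never
// changes the hash. Arrays keep their order, since order is meaningful there.
function canonicalJson(value: unknown): string {
  if (value === undefined) return 'null'; // like JSON.stringify inside arrays
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(',')}]`;
  }
  const entries = Object.entries(value as Record<string, unknown>)
    .filter(([, v]) => v !== undefined) // JSON.stringify drops undefined fields
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`).join(',')}}`;
}

// Assumed convention: hash everything except the record's own id and hash.
function computeContentHash(record: Record<string, unknown>): string {
  const { id: _id, contentHash: _hash, ...content } = record;
  return createHash('sha256').update(canonicalJson(content)).digest('hex');
}
```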
CRDecision
Tracked decisions with status, rationale, and override capability. Enables humans to contest or override agent choices.
```typescript
type DecisionStatus = 'tentative' | 'accepted' | 'rejected' | 'contested';

interface CRDecision {
  id: string;
  description: string;
  status: DecisionStatus;
  rationale: string;
  madeBy: CRAuthor;
  madeAt: number;

  // For contested decisions - show the disagreement
  variants?: Array<{
    author: CRAuthor;
    decision: string;
    rationale: string;
  }>;

  // What this decision affects
  affects: string[];             // Node IDs or goal IDs

  // Who can change this
  overridableBy: ('human' | 'agent' | 'system')[];
}
```
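The `variants` and `overridableBy` fields are enough to support contesting a decision without losing the original choice. The sketch below assumes `CRAuthor` carries a `kind` of `'human' | 'agent' | 'system'` (its shape is not defined in this reference), and that contesting records the challenger's alternative as a variant and moves the decision to `'contested'` rather than replacing it outright. `contestDecision` is hypothetical.

```typescript
type AuthorKind = 'human' | 'agent' | 'system';
interface CRAuthor { kind: AuthorKind; name: string } // assumed shape

type DecisionStatus = 'tentative' | 'accepted' | 'rejected' | 'contested';

interface CRDecision {
  id: string;
  description: string;
  status: DecisionStatus;
  rationale: string;
  madeBy: CRAuthor;
  madeAt: number;
  variants?: Array<{ author: CRAuthor; decision: string; rationale: string }>;
  affects: string[];
  overridableBy: AuthorKind[];
}

// Hypothetical: returns a new decision; the input is not mutated.
// The original choice becomes the first variant so the disagreement
// stays visible to whoever resolves it.
function contestDecision(
  decision: CRDecision,
  challenger: CRAuthor,
  alternative: string,
  rationale: string
): CRDecision {
  if (!decision.overridableBy.includes(challenger.kind)) {
    throw new Error(`${challenger.kind} may not contest decision ${decision.id}`);
  }
  const variants = decision.variants?.length
    ? [...decision.variants]
    : [{ author: decision.madeBy, decision: decision.description, rationale: decision.rationale }];
  variants.push({ author: challenger, decision: alternative, rationale });
  return { ...decision, status: 'contested', variants };
}
```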
CRResumePoint
Explicit continuation gates. Define when work can continue, what authority is required, and what happens on timeout.
```typescript
interface CRResumePoint {
  id: string;
  label: string;
  description: string;

  // When this applies
  when: string;                  // Human-readable condition

  // Gate requirements
  preconditions: string[];
  requiredAuthority: 'agent' | 'human' | 'system' | 'any';

  // Safety
  safeToAutoExecute: boolean;

  // Timeout behavior
  onTimeout: 'abort' | 'escalate' | 'continue' | 'retry';
  timeoutMs?: number;

  // Approval tracking
  approvalStatus: 'pending' | 'approved' | 'denied' | 'auto-approved';
  approvedBy?: CRAuthor;
  approvedAt?: number;

  // What happens next
  nextSteps: string[];
  alternativePaths?: Array<{
    condition: string;
    steps: string[];
  }>;
}
```
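A runtime has to turn a resume point into a concrete go/no-go decision. The sketch below assumes one reasonable reading of the fields: an approved or auto-approved gate proceeds; a denied gate aborts; a pending gate that is `safeToAutoExecute` and needs only `'agent'` or `'any'` authority may be auto-approved; otherwise the runtime waits until `timeoutMs` has elapsed since the gate was reached, then applies `onTimeout`. `evaluateGate` and its `reachedAt` parameter are hypothetical, and checking `preconditions` is left to the caller.

```typescript
type Authority = 'agent' | 'human' | 'system' | 'any';
type TimeoutAction = 'abort' | 'escalate' | 'continue' | 'retry';

// Subset of CRResumePoint used by the gate check (illustrative).
interface ResumeGate {
  requiredAuthority: Authority;
  safeToAutoExecute: boolean;
  onTimeout: TimeoutAction;
  timeoutMs?: number;
  approvalStatus: 'pending' | 'approved' | 'denied' | 'auto-approved';
}

type GateOutcome =
  | { kind: 'proceed'; autoApproved: boolean }
  | { kind: 'wait' }
  | { kind: 'abort' }
  | { kind: 'timeout'; action: TimeoutAction };

// Hypothetical gate evaluation; `reachedAt` is when execution hit the gate.
function evaluateGate(gate: ResumeGate, reachedAt: number, now: number): GateOutcome {
  switch (gate.approvalStatus) {
    case 'approved':
    case 'auto-approved':
      return { kind: 'proceed', autoApproved: gate.approvalStatus === 'auto-approved' };
    case 'denied':
      return { kind: 'abort' };
  }
  // Pending: an agent may self-approve only gates explicitly marked safe.
  const agentMayApprove = gate.requiredAuthority === 'agent' || gate.requiredAuthority === 'any';
  if (gate.safeToAutoExecute && agentMayApprove) {
    return { kind: 'proceed', autoApproved: true };
  }
  if (gate.timeoutMs !== undefined && now - reachedAt >= gate.timeoutMs) {
    return { kind: 'timeout', action: gate.onTimeout };
  }
  return { kind: 'wait' };
}
```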
CRQuestion
Human intervention requests. Four types: clarification questions, approval requests, override requests, and scope changes.
```typescript
type InterventionType = 'question' | 'approval' | 'override' | 'scope-change';

interface CRQuestion {
  id: string;
  type: InterventionType;
  question: string;
  context: string;
  options?: string[];            // For multiple choice
  status: 'open' | 'answered' | 'expired' | 'skipped';

  // Answer tracking
  answer?: string;
  answeredBy?: CRAuthor;
  answeredAt?: number;

  // Impact
  blocksResumePoints: string[];  // What's blocked until answered
  priority: 'blocking' | 'high' | 'medium' | 'low';
  expiresAt?: number;
}
```
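Before approving a resume point, a runtime can check whether any question still blocks it. The sketch below first marks questions whose `expiresAt` has passed as `'expired'`, then returns the open questions that list the given resume point in `blocksResumePoints`, most urgent first. The priority ordering and the `blockingQuestions` name are assumptions for illustration.

```typescript
type QuestionStatus = 'open' | 'answered' | 'expired' | 'skipped';
type QuestionPriority = 'blocking' | 'high' | 'medium' | 'low';

// Subset of CRQuestion used here (illustrative).
interface Question {
  id: string;
  question: string;
  status: QuestionStatus;
  blocksResumePoints: string[];
  priority: QuestionPriority;
  expiresAt?: number;
}

const PRIORITY_RANK: Record<QuestionPriority, number> = { blocking: 0, high: 1, medium: 2, low: 3 };

// Hypothetical: returns [updatedQuestions, openQuestionsBlockingThisGate].
// Inputs are not mutated; expired questions come back with status 'expired'.
function blockingQuestions(
  questions: Question[],
  resumePointId: string,
  now: number
): [Question[], Question[]] {
  const updated: Question[] = questions.map(q =>
    q.status === 'open' && q.expiresAt !== undefined && q.expiresAt <= now
      ? { ...q, status: 'expired' }
      : q
  );
  const blocking = updated
    .filter(q => q.status === 'open' && q.blocksResumePoints.includes(resumePointId))
    .sort((a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority]);
  return [updated, blocking];
}
```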
CRToolEffect
Record of tool invocations for idempotency. Prevents re-execution of side effects on resume.
```typescript
interface CRToolEffect {
  id: string;
  toolName: string;
  operation: string;
  inputs: Record<string, unknown>;
  outputs?: Record<string, unknown>;

  // Idempotency
  idempotencyKey?: string;       // Unique key for deduplication
  isReplayable: boolean;         // Safe to re-run?

  // Effects
  sideEffects: string[];         // What this changed
  reversible: boolean;           // Can we undo it?

  // Timing
  executedAt: number;
  durationMs: number;
}
```
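On resume, the tool-effect log is what stops an agent from, for example, sending the same email twice. A minimal sketch of that check, assuming effects are matched by `idempotencyKey` and that a recorded effect that is not replayable should return its stored `outputs` instead of running again. `runOnce`, `ToolCall`, and `ToolFn` are hypothetical; a production version would also need to persist the log durably.

```typescript
// Subset of CRToolEffect used here (illustrative).
interface ToolEffect {
  id: string;
  toolName: string;
  inputs: Record<string, unknown>;
  outputs?: Record<string, unknown>;
  idempotencyKey?: string;
  isReplayable: boolean;
  executedAt: number;
  durationMs: number;
}

interface ToolCall {
  idempotencyKey: string;
  toolName: string;
  inputs: Record<string, unknown>;
  isReplayable: boolean; // e.g. true for pure reads
}

type ToolFn = (inputs: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Hypothetical wrapper: a non-replayable call runs at most once per key;
// replayable calls are re-run and logged again.
async function runOnce(log: ToolEffect[], call: ToolCall, tool: ToolFn): Promise<Record<string, unknown>> {
  const prior = log.find(e => e.idempotencyKey === call.idempotencyKey);
  if (prior && !prior.isReplayable) {
    // Already executed with side effects: return the recorded result.
    return prior.outputs ?? {};
  }
  const startedAt = Date.now();
  const outputs = await tool(call.inputs);
  log.push({
    id: `${call.idempotencyKey}#${log.length}`,
    toolName: call.toolName,
    inputs: call.inputs,
    outputs,
    idempotencyKey: call.idempotencyKey,
    isReplayable: call.isReplayable,
    executedAt: startedAt,
    durationMs: Date.now() - startedAt,
  });
  return outputs;
}
```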
CRStore Interface
The interface for managing Continuation Records. It supports creation, lookup, and versioned updates, along with human interventions, history and chain verification, and diff/merge.
```typescript
type CR = ContinuationRecord;    // Shorthand used in the signatures below

interface CRStore {
  // Create and read
  create(cr: Omit<ContinuationRecord, 'id' | 'contentHash'>): ContinuationRecord;
  get(id: string): ContinuationRecord | undefined;
  getByHash(hash: string): ContinuationRecord | undefined;

  // Updates (each creates a new version in the chain)
  update(id: string, updates: Partial<CR>, author: CRAuthor): ContinuationRecord;

  // Human interventions
  answerQuestion(crId: string, questionId: string, answer: string, author: CRAuthor): CR;
  approveResumePoint(crId: string, resumePointId: string, author: CRAuthor): CR;
  denyResumePoint(crId: string, resumePointId: string, reason: string, author: CRAuthor): CR;
  overrideDecision(crId: string, decisionId: string, newDecision: string, author: CRAuthor): CR;

  // History & verification
  getHistory(id: string): ContinuationRecord[];
  getAncestors(hash: string): ContinuationRecord[];
  verifyChain(crId: string): { valid: boolean; brokenAt?: string };

  // Diff/Merge (version control for thought)
  diff(crIdA: string, crIdB: string): CRDiff;
  merge(crIdA: string, crIdB: string, strategy: 'prefer-a' | 'prefer-b' | 'manual'): CRMergeResult;

  // Events (returns an unsubscribe function)
  subscribe(callback: (event: CREvent) => void): () => void;
}
```
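`verifyChain` can be implemented by walking `parentCRHash` links from the newest record back to the root, recomputing each record's hash and checking it against the link that led there. The sketch below takes the hash function from the caller (for example, SHA-256 over canonical JSON) and reports the first record where the chain breaks. Its signature differs from the store method above and is only an illustration, not the store's actual implementation.

```typescript
interface ChainedRecord {
  id: string;
  contentHash: string;
  parentCRHash?: string;
}

// Hypothetical chain check. `byHash` indexes every stored record by its
// contentHash; `hashOf` recomputes a record's hash from its content.
function verifyChain<R extends ChainedRecord>(
  head: R,
  byHash: Map<string, R>,
  hashOf: (record: R) => string
): { valid: boolean; brokenAt?: string } {
  const seen = new Set<string>();
  let current = head;
  let expected = head.contentHash; // the head vouches for itself

  for (;;) {
    // The recomputed hash must match the link that led here (and never repeat).
    if (hashOf(current) !== expected || seen.has(expected)) {
      return { valid: false, brokenAt: current.id };
    }
    seen.add(expected);
    if (current.parentCRHash === undefined) return { valid: true }; // reached the root
    const parent = byHash.get(current.parentCRHash);
    if (!parent) return { valid: false, brokenAt: current.id }; // dangling link
    expected = current.parentCRHash;
    current = parent;
  }
}
```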
Code Examples
Reference implementations of the core checkpoint operations
Creating a Checkpoint
Capture the current cognitive state into a restorable checkpoint.
```typescript
import { v4 as uuid } from 'uuid';
import type { CognitiveSession, CognitiveCheckpoint } from './types';

export function createCheckpoint(
  session: CognitiveSession,
  options: { label: string; description?: string }
): CognitiveCheckpoint {
  const { nodes, epistemicState, intentGraph } = session.workingState;
  const now = Date.now();

  // Calculate metrics (estimateTokens is a helper not shown here)
  const completedNodes = nodes.filter(n => n.status === 'completed').length;
  const tokenEstimate = estimateTokens(session);

  return {
    id: uuid(),
    timestamp: now,
    label: options.label,
    description: options.description,
    // Deep clone so later changes to the working state cannot alter the checkpoint
    nodes: structuredClone(nodes),
    epistemicState: structuredClone(epistemicState),
    intentGraph: structuredClone(intentGraph),
    branchId: session.currentBranchId,
    // Chain to the current head of this branch, if it has one
    parentCheckpointId: session.branches[session.currentBranchId]?.headCheckpointId,
    metrics: {
      totalNodes: nodes.length,
      completedNodes,
      tokenEstimate,
      elapsedMs: now - session.createdAt,
    },
  };
}
```
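`estimateTokens` is not shown above. A common rough heuristic for English text is about four characters per token, so one stand-in, assumed here, is to serialize the state and divide its length by four. Actual counts depend on the target model's tokenizer, so this is only suitable for coarse budgeting.

```typescript
// Hypothetical stand-in for estimateTokens: ~4 characters per token is a
// rough rule of thumb for English text, not an exact tokenizer count.
const CHARS_PER_TOKEN = 4;

function estimateTokens(state: unknown): number {
  const serialized = JSON.stringify(state) ?? ''; // undefined serializes to nothing
  return Math.ceil(serialized.length / CHARS_PER_TOKEN);
}
```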
Restoring from Checkpoint
Restore the cognitive state from a previously saved checkpoint.
```typescript
import { v4 as uuid } from 'uuid';
import type { CognitiveSession, CognitiveCheckpoint } from './types';

export function restoreFromCheckpoint(
  checkpoint: CognitiveCheckpoint,
  currentSession?: CognitiveSession
): { success: boolean; session: CognitiveSession; warnings: string[] } {
  const warnings: string[] = [];
  const now = Date.now();

  // Warn (but do not fail) on stale checkpoints
  const ageHours = (now - checkpoint.timestamp) / (1000 * 60 * 60);
  if (ageHours > 24) {
    warnings.push(`Checkpoint is ${Math.round(ageHours)} hours old`);
  }

  // Validate checkpoint integrity; structural errors abort the restore
  const validation = validateCheckpoint(checkpoint);
  if (!validation.valid) {
    throw new Error(`Invalid checkpoint: ${validation.errors.join(', ')}`);
  }
  warnings.push(...validation.warnings);

  // Create the restored session, keeping the current session's identity,
  // branches, and checkpoint history when one is provided
  const restoredSession: CognitiveSession = {
    id: currentSession?.id ?? uuid(),
    name: `Restored: ${checkpoint.label}`,
    description: `Restored from checkpoint at ${new Date(checkpoint.timestamp).toISOString()}`,
    createdAt: currentSession?.createdAt ?? now,
    lastModifiedAt: now,
    currentBranchId: checkpoint.branchId,
    activeNodeId: undefined,
    branches: currentSession?.branches ?? {},
    checkpoints: {
      ...(currentSession?.checkpoints ?? {}),
      [checkpoint.id]: checkpoint,
    },
    // Deep clone so edits to the working state never alter the stored checkpoint
    workingState: {
      nodes: structuredClone(checkpoint.nodes),
      epistemicState: structuredClone(checkpoint.epistemicState),
      intentGraph: structuredClone(checkpoint.intentGraph),
    },
    totalMetrics: {
      ...currentSession?.totalMetrics,
      resumeCount: (currentSession?.totalMetrics.resumeCount ?? 0) + 1,
    },
  };

  return { success: true, session: restoredSession, warnings };
}
```
Checkpoint Validation
Validate checkpoint integrity before restoration.
```typescript
// hasCircularReferences is a helper not shown here.
export function validateCheckpoint(checkpoint: CognitiveCheckpoint): {
  valid: boolean;
  errors: string[];
  warnings: string[];
} {
  const errors: string[] = [];
  const warnings: string[] = [];
  const nodeIds = new Set(checkpoint.nodes.map(n => n.id));

  // Check required fields
  if (!checkpoint.id) errors.push('Missing checkpoint ID');
  if (!checkpoint.label) errors.push('Missing label');
  if (!checkpoint.branchId) errors.push('Missing branch ID');

  // Validate node graph integrity
  for (const node of checkpoint.nodes) {
    // Check parent references
    if (node.parentId && !nodeIds.has(node.parentId)) {
      errors.push(`Node ${node.id} references missing parent ${node.parentId}`);
    }

    // Check child references
    for (const childId of node.childIds) {
      if (!nodeIds.has(childId)) {
        errors.push(`Node ${node.id} references missing child ${childId}`);
      }
    }

    // Validate confidence bounds (the negated form also catches NaN)
    if (!(node.confidence >= 0 && node.confidence <= 1)) {
      warnings.push(`Node ${node.id} has invalid confidence ${node.confidence}`);
    }
  }

  // Check for circular references
  if (hasCircularReferences(checkpoint.nodes)) {
    errors.push('Circular reference detected in node graph');
  }

  return { valid: errors.length === 0, errors, warnings };
}
```
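`hasCircularReferences` is left undefined in the validator. One way to implement it, assumed here, is a depth-first search over the `childIds` edges with the standard gray/black marking: reaching a node that is still on the current DFS path (gray) means the graph contains a cycle. The traversal is iterative so that deep reasoning trees cannot overflow the call stack; child IDs that do not exist are skipped, since the validator reports those separately.

```typescript
// Subset of CognitiveNode needed for cycle detection (illustrative).
interface GraphNode {
  id: string;
  childIds: string[];
}

// Hypothetical implementation: iterative DFS.
// gray = on the current path, black = fully explored, unset = unvisited.
function hasCircularReferences(nodes: GraphNode[]): boolean {
  const children = new Map(nodes.map(n => [n.id, n.childIds] as const));
  const color = new Map<string, 'gray' | 'black'>();

  for (const root of nodes) {
    if (color.has(root.id)) continue;
    // Each stack frame: [nodeId, index of the next child to visit]
    const stack: Array<[string, number]> = [[root.id, 0]];
    color.set(root.id, 'gray');

    while (stack.length > 0) {
      const frame = stack[stack.length - 1];
      const kids = children.get(frame[0]) ?? [];
      if (frame[1] >= kids.length) {
        color.set(frame[0], 'black'); // all descendants explored
        stack.pop();
        continue;
      }
      const next = kids[frame[1]++];
      const mark = color.get(next);
      if (mark === 'gray') return true; // back edge: cycle found
      if (mark === undefined && children.has(next)) {
        color.set(next, 'gray');
        stack.push([next, 0]);
      }
    }
  }
  return false;
}
```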
Incremental Checkpoints (Diff-based)
For large states, store only changes since the last checkpoint.
```typescript
interface CheckpointDiff {
  baseCheckpointId: string;
  timestamp: number;
  addedNodes: CognitiveNode[];
  modifiedNodes: Array<{ id: string; changes: Partial<CognitiveNode> }>;
  removedNodeIds: string[];
  epistemicChanges: Partial<EpistemicState>;
  intentGraph?: IntentGraph;     // Full copy, present only if the intent graph changed
}

// computeNodeDiff and computeEpistemicDiff are helpers not shown here.
export function diffCheckpoints(
  base: CognitiveCheckpoint,
  current: CognitiveCheckpoint
): CheckpointDiff {
  const baseNodeMap = new Map(base.nodes.map(n => [n.id, n]));
  const currentNodeMap = new Map(current.nodes.map(n => [n.id, n]));

  const addedNodes = current.nodes.filter(n => !baseNodeMap.has(n.id));
  const removedNodeIds = base.nodes
    .filter(n => !currentNodeMap.has(n.id))
    .map(n => n.id);

  // JSON comparison is key-order sensitive, so reordered keys show up as changes
  const modifiedNodes = current.nodes
    .filter(n => {
      const baseNode = baseNodeMap.get(n.id);
      return baseNode !== undefined && JSON.stringify(baseNode) !== JSON.stringify(n);
    })
    .map(n => ({
      id: n.id,
      changes: computeNodeDiff(baseNodeMap.get(n.id)!, n),
    }));

  // Goals are not diffed field by field; without this, goal changes would be lost
  const intentGraphChanged =
    JSON.stringify(base.intentGraph) !== JSON.stringify(current.intentGraph);

  return {
    baseCheckpointId: base.id,
    timestamp: current.timestamp,
    addedNodes,
    modifiedNodes,
    removedNodeIds,
    epistemicChanges: computeEpistemicDiff(base.epistemicState, current.epistemicState),
    intentGraph: intentGraphChanged ? current.intentGraph : undefined,
  };
}
```
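Storing diffs is only useful if the full node list can be rebuilt from the base checkpoint. A minimal sketch of the inverse of `diffCheckpoints` for nodes, assuming `changes` holds only the top-level fields that differ (a shallow diff) so they can be spread over the base node; `applyNodeDiff` is hypothetical, and the epistemic part of the diff would need its own merge step.

```typescript
interface NodeDiff<N extends { id: string }> {
  addedNodes: N[];
  modifiedNodes: Array<{ id: string; changes: Partial<N> }>;
  removedNodeIds: string[];
}

// Hypothetical inverse of diffCheckpoints for the node list only.
// Returns new (shallow-copied) node objects and never mutates `baseNodes`.
// Keeps base order and appends added nodes at the end, which may differ
// from the ordering in the checkpoint that produced the diff.
function applyNodeDiff<N extends { id: string }>(baseNodes: N[], diff: NodeDiff<N>): N[] {
  const removed = new Set(diff.removedNodeIds);
  const changesById = new Map(diff.modifiedNodes.map(m => [m.id, m.changes] as const));

  const kept = baseNodes
    .filter(n => !removed.has(n.id))
    .map(n => {
      const changes = changesById.get(n.id);
      return changes ? { ...n, ...changes } : { ...n };
    });

  return [...kept, ...diff.addedNodes.map(n => ({ ...n }))];
}
```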
Integration Patterns
How to integrate with LLM APIs
Middleware Pattern
Wrap your LLM calls with automatic checkpointing middleware.
```typescript
type LLMMiddleware = (
  request: LLMRequest,
  next: (req: LLMRequest) => Promise<LLMResponse>
) => Promise<LLMResponse>;

// Resume marker embedded in the system prompt, e.g. "RESUME_FROM:ckpt-123"
const RESUME_MARKER = /RESUME_FROM:([\w-]+)/;

export function createCheckpointMiddleware(
  store: CognitiveStore,
  options: { autoCheckpointInterval?: number } = {}
): LLMMiddleware {
  let requestCount = 0;
  const interval = options.autoCheckpointInterval || 10; // 0 or unset falls back to 10

  return async (request, next) => {
    requestCount++;

    // Check for a resume instruction in the system prompt
    const prompt = request.systemPrompt ?? '';
    const match = prompt.match(RESUME_MARKER);
    if (match) {
      const checkpoint = store.getCheckpoint(match[1]);
      if (checkpoint) {
        store.restoreFromCheckpoint(checkpoint);
        // Swap the marker for the restored state without mutating the caller's
        // request object (formatStateForPrompt is a helper not shown here)
        request = {
          ...request,
          systemPrompt: `${prompt.replace(RESUME_MARKER, '')}\n\n${formatStateForPrompt(checkpoint)}`,
        };
      }
    }

    // Execute the request
    const response = await next(request);

    // Update cognitive state from the response
    store.updateFromResponse(response);

    // Auto-checkpoint every N requests
    if (requestCount % interval === 0) {
      store.createCheckpoint({
        label: `Auto-checkpoint #${requestCount}`,
        description: 'Automatic periodic checkpoint',
      });
    }

    return response;
  };
}
```
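The middleware follows the familiar `(request, next)` style, so several layers (checkpointing, logging, retries) can be stacked in front of the actual API call. A minimal composition helper, assuming the same `LLMMiddleware` shape; `composeMiddleware` and the simplified `LLMRequest`/`LLMResponse` types are hypothetical stand-ins.

```typescript
// Simplified stand-ins for the request/response types (illustrative).
interface LLMRequest { systemPrompt?: string; messages: string[] }
interface LLMResponse { text: string }

type Handler = (req: LLMRequest) => Promise<LLMResponse>;
type LLMMiddleware = (request: LLMRequest, next: Handler) => Promise<LLMResponse>;

// Hypothetical helper: the first middleware in the array runs outermost,
// and `handler` (the real API call) runs innermost.
function composeMiddleware(middlewares: LLMMiddleware[], handler: Handler): Handler {
  return middlewares.reduceRight<Handler>(
    (next, middleware) => req => middleware(req, next),
    handler
  );
}
```

With `[checkpointing, logging]`, a request passes through checkpointing first, then logging, then the API call, and the response unwinds in reverse order.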
LLM API Integration
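Example integration with the Anthropic Messages API through the official TypeScript SDK. The same pattern carries over to other providers such as OpenAI.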
```typescript
import Anthropic from '@anthropic-ai/sdk';
import { CognitiveStore } from './cognitive-store';
import { serializeCheckpoint } from './serialization';

export class ResumableAIClient {
  private anthropic: Anthropic;
  private store: CognitiveStore;

  constructor(apiKey: string) {
    this.anthropic = new Anthropic({ apiKey });
    this.store = new CognitiveStore();
  }

  async chat(userMessage: string): Promise<string> {
    // Build context from the current cognitive state; it is sent in place of
    // the full conversation history
    const context = this.buildContextFromState();

    const response = await this.anthropic.messages.create({
      model: 'claude-3-sonnet-20240229', // substitute a currently available model ID
      max_tokens: 4096,
      system: `You are a resumable AI assistant. Current cognitive state:
${context}

When you make decisions or learn new facts, structure your response to include:
- FACT: statements you've confirmed
- ASSUMPTION: things you're treating as true
- QUESTION: unresolved queries
- INTENT: goals you're working toward`,
      messages: [{ role: 'user', content: userMessage }],
    });

    // Parse and update cognitive state (parseAndUpdateState is not shown)
    this.parseAndUpdateState(response);

    // `content` is an array of blocks; only text blocks carry `.text`
    return response.content
      .map(block => (block.type === 'text' ? block.text : ''))
      .join('');
  }

  private buildContextFromState(): string {
    const checkpoint = this.store.getCurrentState();
    return serializeCheckpoint(checkpoint);
  }

  saveCheckpoint(label: string): string {
    const checkpoint = this.store.createCheckpoint({ label });
    return checkpoint.id;
  }

  restoreCheckpoint(id: string): void {
    const checkpoint = this.store.getCheckpoint(id);
    if (checkpoint) {
      this.store.restoreFromCheckpoint(checkpoint);
    }
  }
}
```
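`parseAndUpdateState` is not shown. Given the prompt's convention of `FACT:`, `ASSUMPTION:`, `QUESTION:`, and `INTENT:` prefixes, a first step is to pull those lines out of the response text. The line-prefix parser below is a deliberately simple, assumed format; asking the model for structured output (for example via tool use) would be more robust than scanning free text. `parseTaggedLines` is hypothetical.

```typescript
type Tag = 'FACT' | 'ASSUMPTION' | 'QUESTION' | 'INTENT';
type TaggedItems = Record<Tag, string[]>;

// Matches lines such as "FACT: the API returns JSON" or "- QUESTION: which region?".
// An optional leading list marker ("-" or "*") and surrounding whitespace are
// tolerated; tags are case-sensitive and an empty body is ignored.
const TAGGED_LINE = /^\s*(?:[-*]\s*)?(FACT|ASSUMPTION|QUESTION|INTENT):\s*(\S.*?)\s*$/;

function parseTaggedLines(text: string): TaggedItems {
  const items: TaggedItems = { FACT: [], ASSUMPTION: [], QUESTION: [], INTENT: [] };
  for (const line of text.split(/\r?\n/)) {
    const match = TAGGED_LINE.exec(line);
    if (match) {
      items[match[1] as Tag].push(match[2]);
    }
  }
  return items;
}
```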
Event-Driven Checkpoints
Create checkpoints based on significant events.
```typescript
type CheckpointTrigger =
  | 'phase_change'     // When cognitive phase changes
  | 'tool_complete'    // After successful tool execution
  | 'decision_made'    // When a decision node completes
  | 'high_confidence'  // When confidence exceeds threshold
  | 'branch_created'   // When a new branch is created
  | 'error_recovery';  // After recovering from an error

// Only 'phase_change' and 'tool_complete' are wired up here; the other
// triggers follow the same subscribe-and-compare pattern.
export function setupCheckpointTriggers(
  store: CognitiveStore,
  triggers: CheckpointTrigger[]
): () => void {
  const unsubscribers: Array<() => void> = [];

  if (triggers.includes('phase_change')) {
    const unsub = store.subscribe((state, prevState) => {
      // Approximation: the phase of the last node in the list is the current phase
      const currentPhase = state.workingState.nodes.at(-1)?.phase;
      const prevPhase = prevState.workingState.nodes.at(-1)?.phase;
      if (currentPhase && currentPhase !== prevPhase) {
        store.createCheckpoint({
          label: `Phase: ${currentPhase}`,
          description: `Entering ${currentPhase} phase`,
        });
      }
    });
    unsubscribers.push(unsub);
  }

  if (triggers.includes('tool_complete')) {
    const unsub = store.subscribe((state, prevState) => {
      // Tool-call nodes that are completed now but were not completed before
      const newCompletedTools = state.workingState.nodes.filter(
        n =>
          n.type === 'tool-call' &&
          n.status === 'completed' &&
          !prevState.workingState.nodes.some(
            pn => pn.id === n.id && pn.status === 'completed'
          )
      );
      for (const node of newCompletedTools) {
        store.createCheckpoint({
          label: `Tool: ${node.toolBinding?.toolName ?? node.label}`,
          description: `Completed ${node.label}`,
        });
      }
    });
    unsubscribers.push(unsub);
  }

  // Return a cleanup function that removes every subscription
  return () => unsubscribers.forEach(unsub => unsub());
}
```
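The core test in the `tool_complete` handler (nodes that are completed now but were not completed in the previous state) also covers other triggers in the list: `decision_made`, for instance, is the same transition restricted to `'decision'` nodes. A sketch of that shared check as a pure, easily testable function, assuming nodes are matched by `id`; `newlyCompleted` is hypothetical. Collecting the previously completed IDs into a `Set` keeps the check linear in the number of nodes instead of scanning the previous list once per node.

```typescript
type NodeType =
  | 'intent' | 'decision' | 'reasoning' | 'tool-call'
  | 'observation' | 'reflection' | 'checkpoint';

// Subset of CognitiveNode used here (illustrative).
interface TrackedNode {
  id: string;
  type: NodeType;
  status: string;
}

// Hypothetical helper: nodes of `type` whose status became 'completed'
// between two snapshots of the working state.
function newlyCompleted(prev: TrackedNode[], next: TrackedNode[], type: NodeType): TrackedNode[] {
  const completedBefore = new Set(
    prev.filter(n => n.status === 'completed').map(n => n.id)
  );
  return next.filter(
    n => n.type === type && n.status === 'completed' && !completedBefore.has(n.id)
  );
}
```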