RPG Notetaker Bench
A comprehensive comparison of how different AI models process and organize tabletop RPG session transcripts. This benchmark analyzes a 2.5-hour D&D session transcript processed by six different AI models, comparing their approaches to note-taking, organization, and information extraction.
Source Project: This benchmark grew out of experiments conducted for the AudioToLogs project, a web-based audio transcription application that converts RPG session recordings into well-formatted markdown logs using OpenAI’s Whisper API and various LLM models for post-processing.
Methodology
This benchmark was created as part of developing AudioToLogs’ modular LLM pipeline for generating GM notes from session transcripts. The system uses a three-stage approach:
- Stage 1 (Summarization): Chunks the transcript and generates bullet points using mid-tier models
- Stage 2 (Reconciliation): Merges summaries using more capable models with full context
- Stage 3 (Polishing): Final prose formatting using high-tier models
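The three stages above can be sketched roughly as follows. This is a minimal illustration, not the AudioToLogs implementation: the `llm` callable, the placeholder model names, the prompt wording, and the 500-line chunk size are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: (model_name, prompt) -> completion text.
# Any real client (OpenAI or otherwise) would be wrapped to fit this shape.
LLM = Callable[[str, str], str]


def chunk_lines(lines: List[str], chunk_size: int) -> List[List[str]]:
    """Split a transcript into consecutive blocks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def run_pipeline(
    transcript: str,
    llm: LLM,
    chunk_size: int = 500,  # assumed value, not taken from the project
    models: Sequence[str] = ("mid-tier", "capable", "high-tier"),  # placeholders
) -> str:
    summarizer, reconciler, polisher = models
    chunks = chunk_lines(transcript.splitlines(), chunk_size)

    # Stage 1 (Summarization): one bullet-point summary per chunk.
    summaries = [
        llm(summarizer, "Summarize as bullet points:\n" + "\n".join(chunk))
        for chunk in chunks
    ]

    # Stage 2 (Reconciliation): merge all chunk summaries in a single call
    # so the model sees the whole session at once.
    merged = llm(
        reconciler,
        "Merge these summaries into one consistent list:\n" + "\n\n".join(summaries),
    )

    # Stage 3 (Polishing): final prose formatting.
    return llm(polisher, "Rewrite as polished GM notes:\n" + merged)
```

Keeping the model call behind a plain callable makes it easy to swap model tiers per stage, or to pass in a stub for dry runs that cost nothing.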
To evaluate different model combinations and approaches, the same 2.5-hour transcript was processed using six different AI models with varying prompts and processing strategies. Each model was asked to convert the raw transcript into organized session notes, so the outputs can be compared directly on note-taking, organization, and information extraction.
Technical Details:
- Original Source: 115MB MP3 recording processed through OpenAI Whisper API
- Transcript Size: 116KB, 5,522 lines of raw transcription
- Processing: Each model was given the complete transcript and asked to produce organized notes
- Cost Tracking: Each processing run included cost estimation and token usage analysis
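As a rough illustration of what per-run cost estimation can look like, the sketch below converts text size to tokens and tokens to dollars. The 4-characters-per-token ratio is a common rule of thumb for English text rather than a measured value, and the prices in the example call are placeholders, not actual rates for any model in this benchmark.

```python
import math


def estimate_tokens(text_bytes: int, chars_per_token: float = 4.0) -> int:
    """Approximate token count from text size (heuristic, assumes ~1 byte per char)."""
    return math.ceil(text_bytes / chars_per_token)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_in: float,
    usd_per_million_out: float,
) -> float:
    """Cost in USD for one request, given per-million-token prices."""
    return (
        input_tokens * usd_per_million_in + output_tokens * usd_per_million_out
    ) / 1_000_000


# Example: the 116KB transcript in, a 10KB set of notes out.
# The prices below are placeholders for illustration only.
tokens_in = estimate_tokens(116 * 1024)
tokens_out = estimate_tokens(10 * 1024)
cost = estimate_cost(tokens_in, tokens_out, usd_per_million_in=1.0, usd_per_million_out=4.0)
print(f"{tokens_in} in / {tokens_out} out -> ${cost:.4f}")
```

For real tracking, the token counts reported in the API response are more reliable than a size-based estimate; the heuristic is mainly useful for a budget check before a run starts.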
About AudioToLogs
AudioToLogs is a comprehensive web application designed to streamline the process of converting RPG session recordings into useful campaign notes. The project features:
- Audio Processing: Drag-and-drop interface supporting multiple audio formats (MP3, WAV, M4A, WEBM)
- Transcription: High-quality transcription using OpenAI’s Whisper API with automatic chunking for large files
- AI-Powered Note Generation: Modular LLM pipeline that processes transcripts into organized GM notes
- Cost Management: Built-in budget controls and cost tracking for API usage
- Campaign Management: Support for session metadata, participant tracking, and campaign organization
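Automatic chunking matters because OpenAI's transcription endpoint documents a 25 MB per-file upload limit, well below the size of a typical session recording. The sketch below plans time-based split points under that limit. It assumes a constant bitrate and a 10% safety margin, and it only computes offsets; the actual cutting would be done by an audio tool, and the strategy AudioToLogs uses may differ.

```python
import math
from typing import List, Tuple


def plan_chunks(
    file_mb: float,
    duration_s: float,
    limit_mb: float = 25.0,   # documented upload limit for the transcription API
    headroom: float = 0.9,    # assumed safety margin, not a project setting
) -> List[Tuple[int, int]]:
    """Return (start, end) offsets in seconds for equal-length pieces.

    Assumes a roughly constant bitrate, so equal durations give equal sizes.
    """
    pieces = max(1, math.ceil(file_mb / (limit_mb * headroom)))
    step = duration_s / pieces
    return [(round(i * step), round((i + 1) * step)) for i in range(pieces)]


# Example: a 115 MB recording lasting 2h 37m.
for start, end in plan_chunks(115, (2 * 60 + 37) * 60):
    print(f"{start:>5}s to {end:>5}s")
```

Splitting at fixed offsets can cut a word in half, so a practical version would likely overlap adjacent pieces by a few seconds or split on silence.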
The modular LLM pipeline was specifically designed to balance cost efficiency with quality output, using different model tiers for different processing stages. This benchmark represents the experimental phase of that development, testing various model combinations to optimize the note-generation process.
Repository: github.com/BabyToad/AudioToLogs
The Session: “Dicing With Death #225”
Session Details:
- Date: June 26, 2025
- Duration: 2h 37m
- Participants: Neal (GM), Ryan (Phoenix the Warlock)
- System: D&D
- Campaign: Dragon politics and intrigue in Arcadia
AI Model Comparison
Each model was tasked with converting the raw transcript into organized session notes as part of the AudioToLogs pipeline development. These results represent experimental runs using different model configurations to evaluate their effectiveness for RPG session note generation. The entries below summarize how each model approached the task:
Original Transcript
Size: 116KB, 5,522 lines
Format: Raw audio transcript with timestamps
The unprocessed transcript directly from the recording, including all the natural speech patterns, interruptions, and cross-talk that occur during live play.
GPT-4.1 Processing
Size: 10KB, 205 lines
Approach: Comprehensive scene-by-scene breakdown with detailed NPC ledgers
Creates highly structured notes with scene summaries, character tracking, and forward-looking session preparation notes.
GPT-4o Processing
Size: 3.1KB, 72 lines
Approach: Concise summary format
Focuses on the essential plot points and key moments, creating a streamlined overview of the session.
GPT-4o Mini Processing
Size: 4.2KB, 88 lines
Approach: Balanced summary with key highlights
Provides a middle-ground approach between comprehensive detail and concise summarization.
O1-Mini Processing
Size: 7.5KB, 203 lines
Approach: Analytical breakdown with strategic insights
Emphasizes the tactical and strategic elements of the session, with detailed combat analysis.
O3-Mini Processing
Size: 12KB, 147 lines
Approach: Formatted sections with processing metadata
Creates visually distinct sections with clear headers and includes processing statistics.
O4-Mini Processing
Size: 6.4KB, 124 lines
Approach: Narrative-focused organization
Emphasizes story flow and character development over mechanical details.
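To put the output sizes in perspective, the short script below expresses each result as a share of the 116KB transcript and as average bytes per line. The figures are the rounded sizes listed above, so the percentages are approximate. Size measures compression only and says nothing about the quality of the notes.

```python
TRANSCRIPT_KB = 116

# (size in KB, line count) as listed for each model's output.
OUTPUTS = {
    "GPT-4.1": (10, 205),
    "GPT-4o": (3.1, 72),
    "GPT-4o Mini": (4.2, 88),
    "O1-Mini": (7.5, 203),
    "O3-Mini": (12, 147),
    "O4-Mini": (6.4, 124),
}


def compression_table(outputs, transcript_kb=TRANSCRIPT_KB):
    """Return (model, percent of transcript size, bytes per line), smallest first."""
    rows = [
        (model, 100 * kb / transcript_kb, kb * 1024 / lines)
        for model, (kb, lines) in outputs.items()
    ]
    return sorted(rows, key=lambda row: row[1])


for model, pct, bytes_per_line in compression_table(OUTPUTS):
    print(f"{model:<12} {pct:5.1f}% of transcript, {bytes_per_line:5.1f} bytes/line")
```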