RPG Notetaker Bench

A comparison of how different AI models process and organize tabletop RPG session transcripts. This benchmark takes a single D&D session transcript (roughly 2.5 hours of play) and runs it through six AI models, comparing their approaches to note-taking, organization, and information extraction.

Source Project: This benchmark comes out of experiments conducted for the AudioToLogs project, a web-based audio transcription application that converts RPG session recordings into well-formatted markdown logs using OpenAI’s Whisper API for transcription and various LLM models for post-processing.

Methodology

This benchmark was created as part of developing AudioToLogs’ modular LLM pipeline for generating GM notes from session transcripts. The system uses a three-stage approach:

  1. Stage 1 (Summarization): Chunks the transcript and generates bullet points using mid-tier models
  2. Stage 2 (Reconciliation): Merges summaries using more capable models with full context
  3. Stage 3 (Polishing): Final prose formatting using high-tier models
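
The three stages above can be sketched roughly as follows. This is a minimal illustration, not the AudioToLogs implementation: the function names, the prompts, the line-based chunking, and the 500-line chunk size are all assumptions made for the example, and each model is represented by a plain callable so that any client library could be plugged in.

```python
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns model text.
LLM = Callable[[str], str]


def chunk_lines(transcript: str, lines_per_chunk: int = 500) -> List[str]:
    """Split a transcript into chunks of at most `lines_per_chunk` lines."""
    lines = transcript.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def run_pipeline(transcript: str, summarize: LLM, reconcile: LLM, polish: LLM,
                 lines_per_chunk: int = 500) -> str:
    """Run the three stages in order and return the final notes."""
    # Stage 1: one bullet-point summary per chunk (mid-tier model).
    summaries = [
        summarize("Summarize as bullet points:\n" + chunk)
        for chunk in chunk_lines(transcript, lines_per_chunk)
    ]
    # Stage 2: merge all chunk summaries in a single call (more capable model).
    merged = reconcile(
        "Merge these summaries, removing duplicates:\n" + "\n\n".join(summaries)
    )
    # Stage 3: final prose formatting (high-tier model).
    return polish("Rewrite as polished GM notes:\n" + merged)
```

Passing the models in as callables keeps the tier choice for each stage separate from the pipeline logic, which is convenient when swapping model combinations for a comparison like this one.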

To evaluate different model combinations and approaches, the same 2.5-hour transcript was processed using six different AI models with varying prompts and processing strategies. Each model was given the task of converting the raw transcript into organized session notes, allowing for direct comparison of their approaches to note-taking, organization, and information extraction.

Technical Details:

  • Original Source: 115MB MP3 recording processed through OpenAI Whisper API
  • Transcript Size: 116KB, 5,522 lines of raw transcription
  • Processing: Each model was given the complete transcript and asked to produce organized notes
  • Cost Tracking: Each processing run included cost estimation and token usage analysis
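
As an illustration of the cost tracking mentioned above, the sketch below estimates the cost of a single run from token counts. It is only an assumption about how such an estimate could be computed: the `Usage` class, the function names, the four-characters-per-token heuristic, and the per-million-token prices passed in are hypothetical and are not taken from AudioToLogs or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one model call (hypothetical structure)."""
    input_tokens: int
    output_tokens: int


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count; a heuristic, not a tokenizer."""
    return max(1, round(len(text) / chars_per_token))


def estimate_cost(usage: Usage, input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Cost in the currency of the given prices (prices are per million tokens)."""
    return (
        usage.input_tokens * input_price_per_million
        + usage.output_tokens * output_price_per_million
    ) / 1_000_000
```

Under this heuristic a 116KB transcript would come to something on the order of 29,000 input tokens, though a real tokenizer may give a noticeably different count.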

About AudioToLogs

AudioToLogs is a web application that converts RPG session recordings into useful campaign notes. The project features:

  • Audio Processing: Drag-and-drop interface supporting multiple audio formats (MP3, WAV, M4A, WEBM)
  • Transcription: High-quality transcription using OpenAI’s Whisper API with automatic chunking for large files
  • AI-Powered Note Generation: Modular LLM pipeline that processes transcripts into organized GM notes
  • Cost Management: Built-in budget controls and cost tracking for API usage
  • Campaign Management: Support for session metadata, participant tracking, and campaign organization
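
To make the automatic chunking for large files more concrete, here is a hedged sketch of how chunk boundaries could be planned before transcription. It assumes a 25 MB per-file upload limit for the transcription API (the limit in force at the time of writing, which may change), a roughly constant bitrate, and a 10% safety margin; the function name and parameters are hypothetical and do not describe how AudioToLogs actually splits audio.

```python
import math
from typing import List, Tuple


def plan_chunks(file_size_mb: float, duration_s: float,
                max_chunk_mb: float = 25.0,
                safety: float = 0.9) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for equally long chunks.

    Assumes constant bitrate, so file size scales linearly with duration.
    """
    target_mb = max_chunk_mb * safety          # stay below the upload limit
    n_chunks = max(1, math.ceil(file_size_mb / target_mb))
    span = duration_s / n_chunks
    return [(i * span, (i + 1) * span) for i in range(n_chunks)]


# Example: the 115MB, 2h 37m recording used in this benchmark.
chunks = plan_chunks(115.0, 2 * 3600 + 37 * 60)
```

With these assumed defaults the recording would be split into six chunks of a little over 26 minutes each. A real splitter would probably also cut at silences instead of fixed offsets, so that words are not split across chunk boundaries.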

The modular LLM pipeline was specifically designed to balance cost efficiency with quality output, using different model tiers for different processing stages. This benchmark represents the experimental phase of that development, testing various model combinations to optimize the note-generation process.

Repository: github.com/BabyToad/AudioToLogs

The Session: “Dicing With Death #225”

Session Details:

  • Date: June 26, 2025
  • Duration: 2h 37m
  • Participants: Neal (GM), Ryan (Phoenix the Warlock)
  • System: D&D
  • Campaign: Dragon politics and intrigue in Arcadia

AI Model Comparison

Each model was tasked with converting the raw transcript into organized session notes as part of the AudioToLogs pipeline development. These results are experimental runs with different model configurations, meant to evaluate how well each works for RPG session note generation. The entries below summarize how each model approached the task, starting with the original transcript for reference:

Original Transcript

Size: 116KB, 5,522 lines

Format: Raw audio transcript with timestamps

The unprocessed transcript directly from the recording, including all the natural speech patterns, interruptions, and cross-talk that occur during live play.

GPT-4.1 Processing

Size: 10KB, 205 lines

Approach: Comprehensive scene-by-scene breakdown with detailed NPC ledgers

Creates highly structured notes with scene summaries, character tracking, and forward-looking session preparation notes.

GPT-4o Processing

Size: 3.1KB, 72 lines

Approach: Concise summary format

Focuses on the essential plot points and key moments, creating a streamlined overview of the session.

GPT-4o Mini Processing

Size: 4.2KB, 88 lines

Approach: Balanced summary with key highlights

Provides a middle-ground approach between comprehensive detail and concise summarization.

O1-Mini Processing

Size: 7.5KB, 203 lines

Approach: Analytical breakdown with strategic insights

Emphasizes the tactical and strategic elements of the session, with detailed combat analysis.

O3-Mini Processing

Size: 12KB, 147 lines

Approach: Formatted sections with processing metadata

Creates visually distinct sections with clear headers and includes processing statistics.

O4-Mini Processing

Size: 6.4KB, 124 lines

Approach: Narrative-focused organization

Emphasizes story flow and character development over mechanical details.
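
The sizes listed above can be compared directly against the original transcript. The short script below is an illustrative sketch that computes each output's share of the original size and line count from the figures reported in this section; the variable and function names are made up for the example, and since the reported sizes are rounded, the ratios are approximate.

```python
# Figures as reported in this benchmark: (size in KB, line count).
ORIGINAL_KB, ORIGINAL_LINES = 116.0, 5522
OUTPUTS = {
    "GPT-4.1": (10.0, 205),
    "GPT-4o": (3.1, 72),
    "GPT-4o Mini": (4.2, 88),
    "O1-Mini": (7.5, 203),
    "O3-Mini": (12.0, 147),
    "O4-Mini": (6.4, 124),
}


def size_ratio(kb: float) -> float:
    """Output size as a fraction of the original transcript size."""
    return kb / ORIGINAL_KB


def line_ratio(lines: int) -> float:
    """Output line count as a fraction of the original line count."""
    return lines / ORIGINAL_LINES


# Model names ordered from smallest to largest output.
ranked = sorted(OUTPUTS, key=lambda name: OUTPUTS[name][0])

for name in ranked:
    kb, lines = OUTPUTS[name]
    print(f"{name:12s} {size_ratio(kb):6.1%} of size, {line_ratio(lines):6.1%} of lines")
```

By size, the outputs range from roughly 3% (GPT-4o) to roughly 10% (O3-Mini) of the original transcript. Size alone says nothing about accuracy or usefulness, so these ratios are best read alongside the notes themselves.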