feat: v0.4.0 — rich content support with typed blocks and loss visibility

Extracts per-message content into a typed `blocks` list (text, code,
thinking, tool_use, tool_result, image_placeholder, file_placeholder,
unknown) and renders them at exporter write time. Voice transcripts,
Custom Instructions, and image references now appear in exports
instead of being silently dropped.

Foundation:
- src/blocks.py: pure block constructors, _safe_fence (fence-corruption
  defense, verified live in Joplin), _blockquote_prefix, render
- src/loss_report.py: per-run tally surfaced as INFO summary at end of
  export so silently-dropped data becomes visible
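
A minimal sketch of the fence-corruption defense, assuming it works by
choosing a fence longer than any backtick run in the content (the real
_safe_fence in src/blocks.py is not shown here; `safe_fence_sketch` and
`fenced_sketch` are hypothetical names):

```python
import re


def safe_fence_sketch(text: str, minimum: int = 3) -> str:
    """Return a backtick fence strictly longer than any backtick run in text."""
    # Find every run of consecutive backticks and measure the longest one.
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    # A fence can only be closed by a run at least as long as the opener,
    # so one more backtick than the longest run cannot be closed early.
    return "`" * max(minimum, longest + 1)


def fenced_sketch(text: str, lang: str = "") -> str:
    """Wrap text in a code fence that the text itself cannot terminate."""
    fence = safe_fence_sketch(text)
    return f"{fence}{lang}\n{text}\n{fence}"
```

Measuring every backtick run, not only runs at the start of a line, is
more conservative than strictly needed but keeps the check simple.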

Providers:
- ChatGPT: dispatch on content_type produces typed blocks; voice shapes
  (audio_transcription, audio_asset_pointer,
  real_time_user_audio_video_asset_pointer) locked from live DevTools
  capture; Custom Instructions bug fix (parts-vs-direct-fields); role
  filter lifted; hidden-context marker driven by the
  is_visually_hidden_from_conversation flag
- Claude: defensive dispatch for text/thinking/tool_use/tool_result/image
  with recursive nested-block flattening; not yet tested against real
  rich-content data, so any fixes will land in v0.4.1
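
A rough sketch of recursive nested-block flattening, assuming blocks are
dicts with a "type" key and that nesting appears as a list under
"content" (both the shape and the name `flatten_blocks_sketch` are
assumptions, not the provider's actual code):

```python
from typing import Any


def flatten_blocks_sketch(items: list[Any]) -> list[dict[str, Any]]:
    """Flatten nested blocks depth-first into a single ordered list."""
    flat: list[dict[str, Any]] = []
    for item in items:
        if not isinstance(item, dict):
            # Defensive: keep unexpected shapes visible instead of dropping them.
            flat.append({"type": "unknown", "raw": item})
            continue
        nested = item.get("content")
        if isinstance(nested, list):
            # Emit the parent without its children, then the children in order.
            flat.append({k: v for k, v in item.items() if k != "content"})
            flat.extend(flatten_blocks_sketch(nested))
        else:
            flat.append(item)
    return flat
```

Emitting the parent before its children preserves reading order, so a
tool_result still precedes the text or image it wraps.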

Exporter:
- Markdown renders from blocks at write time via render_blocks_to_markdown;
  backward-compat fallback to content for any pre-v0.4.0 cached data
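
The write-time fallback could look roughly like this, assuming each
cached message is a dict with an optional "blocks" list and a legacy
"content" string (`render_blocks_sketch` is a simplified stand-in, not
the real render_blocks_to_markdown):

```python
FENCE = "`" * 3  # plain three-backtick fence; the sketch skips fence hardening


def render_blocks_sketch(blocks: list[dict]) -> str:
    """Render a few block types to Markdown; other types become a marker."""
    parts = []
    for block in blocks:
        kind = block.get("type")
        if kind == "text":
            parts.append(block.get("text", ""))
        elif kind == "code":
            lang = block.get("language", "")
            parts.append(f"{FENCE}{lang}\n{block.get('text', '')}\n{FENCE}")
        else:
            # Keep unsupported types visible rather than silently dropping them.
            parts.append(f"[{kind}]")
    return "\n\n".join(parts)


def message_markdown_sketch(message: dict) -> str:
    """Prefer typed blocks; fall back to legacy content for old cache entries."""
    blocks = message.get("blocks")
    if blocks:
        return render_blocks_sketch(blocks)
    return message.get("content") or ""
```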

Tests:
- 27 new tests across providers, exporters, CLI; fixtures rebuilt with
  real-shape ChatGPT voice + Custom Instructions cases
- 181/181 pass

Behavior changes (intentional):
- JSON output omits the `content` field; consumers should read `blocks`
  instead
- Per-conversation message counts increase (Custom Instructions,
  image-only, and tool-only messages now appear)
- Existing exports are not re-rendered automatically; for fresh output,
  run cache --clear and then export

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
JesseMarkowitz
2026-05-04 23:17:18 -04:00
parent 4798edcea7
commit 473d02f71a
16 changed files with 1786 additions and 232 deletions

@@ -127,3 +127,50 @@ class TestExportSinceValidation:
            },
        )
        assert "Invalid --since date" not in result.output


# ---------------------------------------------------------------------------
# LossReport summary
# ---------------------------------------------------------------------------


class TestLossReportSummary:
    """The LossReport's format_summary() pinned format covers zero, top-5, and overflow cases."""

    def test_zero_summary_uses_none_sentinel(self):
        from src.loss_report import LossReport

        report = LossReport()
        out = report.format_summary()
        assert "[export] Run summary:" in out
        assert "conversations: 0" in out
        assert "messages rendered: 0" in out
        # Both "(none)" sentinels present — never empty parens
        assert out.count("(none)") == 2

    def test_top_5_breakdown(self):
        from src.loss_report import LossReport

        report = LossReport()
        for raw_type in ("a", "b", "c", "d", "e", "f", "g"):
            report.record_unknown(raw_type)
            if raw_type == "a":
                # Make 'a' the most common
                for _ in range(4):
                    report.record_unknown("a")
        out = report.format_summary()
        # Top entry shown
        assert "a=5" in out
        # Overflow line present (7 types, top 5 + 2 more)
        assert "+ 2 more types" in out

    def test_messages_and_conversations_recorded(self):
        from src.loss_report import LossReport

        report = LossReport()
        report.record_conversation()
        report.record_message()
        report.record_message()
        out = report.format_summary()
        assert "conversations: 1" in out
        assert "messages rendered: 2" in out
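
For orientation, a tally class that would satisfy the assertions above
might look like the sketch below. It is not the contents of
src/loss_report.py: the second tally (`placeholders`), the line labels,
and the class name `LossReportSketch` are assumptions made only so that
two "(none)" sentinels appear.

```python
from collections import Counter


class LossReportSketch:
    """Per-run tally with a pinned, human-readable summary format."""

    def __init__(self) -> None:
        self.conversations = 0
        self.messages = 0
        self.unknown: Counter = Counter()
        self.placeholders: Counter = Counter()  # hypothetical second tally

    def record_conversation(self) -> None:
        self.conversations += 1

    def record_message(self) -> None:
        self.messages += 1

    def record_unknown(self, raw_type: str) -> None:
        self.unknown[raw_type] += 1

    @staticmethod
    def _breakdown(counter: Counter, top: int = 5) -> str:
        # An empty tally prints a sentinel rather than empty parentheses.
        if not counter:
            return "(none)"
        shown = ", ".join(f"{name}={n}" for name, n in counter.most_common(top))
        extra = len(counter) - top
        return shown + (f", + {extra} more types" if extra > 0 else "")

    def format_summary(self) -> str:
        return "\n".join([
            "[export] Run summary:",
            f"  conversations: {self.conversations}",
            f"  messages rendered: {self.messages}",
            f"  unknown block types: {self._breakdown(self.unknown)}",
            f"  placeholders: {self._breakdown(self.placeholders)}",
        ])
```

`Counter.most_common` keeps the first-seen order for equal counts, so the
top-5 listing stays deterministic across runs.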