feat: v0.4.1 — ChatGPT tool-output content types and conv_id fix

First real-data export against v0.4.0 surfaced 66 unknown blocks across
three content types — captured live and added.

Added:
- execution_output (Code Interpreter / container.exec / python tool
  output) → tool_result block. output=content.text,
  tool_name=author.name, is_error=metadata.aggregate_result.status,
  summary=metadata.reasoning_title
- system_error → error tool_result with tool_name=author.name
- tether_browsing_display: spinner placeholders (empty result+summary)
  skip silently with DEBUG log; defensive populated-case branch maps
  to tool_result (untested in real data)
- tool_result block schema: optional `summary` field rendered as
  italic line between header and fence
- tool_result rendering: tool_name appears in header when present
  (e.g. `📤 Result: container.exec`); existing tool_name=None calls
  unchanged
- _ROLE_LABELS["tool"] = ("🔧 Tool", "tool")

Fixed:
- chatgpt.normalize_conversation reads `conversation_id` as fallback
  for `id`. Live API uses conversation_id; fixtures use id.
  Pre-fix: empty id in YAML frontmatter and missing context in
  WARNING logs.

Tests: 11 new (192 total, 0 failures). Fixture extended with 4
tool-output cases (execution_output success, empty execution_output
that should skip, system_error, tether_browsing_display spinner).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
JesseMarkowitz
2026-05-05 09:25:55 -04:00
parent 473d02f71a
commit 68e8d532be
8 changed files with 446 additions and 7 deletions

View File

@@ -3,6 +3,22 @@
All notable changes to this project will be documented here. All notable changes to this project will be documented here.
Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [0.4.1] - Unreleased
### Added
- ChatGPT `execution_output` (Code Interpreter / `container.exec` / `python`) renders as a `tool_result` block with `tool_name` from `author.name`, `is_error` from `metadata.aggregate_result.status`, and the optional `summary` line populated from `metadata.reasoning_title`. Captured live during planning.
- ChatGPT `system_error` content (e.g. browse-service 503) renders as an error `tool_result` block with `tool_name` from `author.name` (typically `"web"`).
- ChatGPT `tether_browsing_display` populated case (defensive, not observed in real data) renders as a `tool_result` block; transient spinner placeholders (empty `result`+`summary`) skip silently with DEBUG log.
- `tool_result` block schema gains optional `summary: str | None` field, rendered as italic line between header and fenced output.
- `tool_result` rendering shows `tool_name` in the header when present (e.g. `📤 **Result: container.exec**`); when absent, header stays as `📤 **Result**` (no regression).
- Markdown exporter: `_ROLE_LABELS["tool"] = ("🔧 Tool", "tool")` so tool-role messages render under a recognisable header instead of the generic fallback.
- 11 new tests covering all four cases plus the conv_id fallback (192 total, all passing).
### Fixed
- ChatGPT `normalize_conversation` now reads `conversation_id` as a fallback for `id`. Live ChatGPT detail responses use `conversation_id` at top level; fixtures and listing summaries use `id`. Without the fallback, normalized conversations had empty `id` (visible as blank `conversation_id:` in YAML frontmatter and missing context in WARNING log lines).
### Migration
- No new schema breaks; `tool_result` blocks gain a `summary` field that defaults to None on legacy data. Existing exports re-render cleanly with the cache-clear-and-export workflow from v0.4.0.
## [0.4.0] - Unreleased ## [0.4.0] - Unreleased
### Added ### Added
- Rich content support: messages now carry an ordered `blocks` list (text, code, thinking, tool_use, tool_result, citation, image_placeholder, file_placeholder, unknown) - Rich content support: messages now carry an ordered `blocks` list (text, code, thinking, tool_use, tool_result, citation, image_placeholder, file_placeholder, unknown)

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "ai-chat-exporter" name = "ai-chat-exporter"
version = "0.4.0" version = "0.4.1"
description = "Export ChatGPT and Claude conversation history to Markdown for personal archival in Joplin" description = "Export ChatGPT and Claude conversation history to Markdown for personal archival in Joplin"
requires-python = ">=3.11" requires-python = ">=3.11"
dependencies = [ dependencies = [

View File

@@ -80,13 +80,19 @@ def make_tool_result_block(
output: str, output: str,
tool_name: str | None = None, tool_name: str | None = None,
is_error: bool = False, is_error: bool = False,
summary: str | None = None,
) -> dict: ) -> dict:
"""Return a tool_result block.""" """Return a tool_result block.
``summary`` is an optional short human label rendered between header and
fence (e.g. ChatGPT's ``metadata.reasoning_title`` for execution_output).
"""
return { return {
"type": BLOCK_TYPE_TOOL_RESULT, "type": BLOCK_TYPE_TOOL_RESULT,
"tool_name": tool_name, "tool_name": tool_name,
"output": output if isinstance(output, str) else str(output), "output": output if isinstance(output, str) else str(output),
"is_error": bool(is_error), "is_error": bool(is_error),
"summary": summary,
} }
@@ -217,10 +223,21 @@ def _render_one(block: dict) -> str:
if btype == BLOCK_TYPE_TOOL_RESULT: if btype == BLOCK_TYPE_TOOL_RESULT:
output = block.get("output", "") output = block.get("output", "")
is_error = bool(block.get("is_error")) is_error = bool(block.get("is_error"))
header = "❌ **Result (error)**" if is_error else "📤 **Result**" tool_name = block.get("tool_name") or ""
summary = block.get("summary") or ""
icon = "" if is_error else "📤"
label = "Result (error)" if is_error else "Result"
if tool_name:
header = f"{icon} **{label}: {tool_name}**"
else:
header = f"{icon} **{label}**"
fence = _safe_fence(output) fence = _safe_fence(output)
body = f"{fence}\n{output}\n{fence}" body = f"{fence}\n{output}\n{fence}"
return _blockquote_prefix(f"{header}\n{body}") if summary:
inner = f"{header}\n*{summary}*\n{body}"
else:
inner = f"{header}\n{body}"
return _blockquote_prefix(inner)
if btype == BLOCK_TYPE_CITATION: if btype == BLOCK_TYPE_CITATION:
url = block.get("url", "") url = block.get("url", "")
title = block.get("title") or url title = block.get("title") or url

View File

@@ -16,6 +16,7 @@ _ROLE_LABELS = {
"user": ("🧑 Human", "user"), "user": ("🧑 Human", "user"),
"assistant": ("🤖 Assistant", "assistant"), "assistant": ("🤖 Assistant", "assistant"),
"system": ("⚙️ System", "system"), "system": ("⚙️ System", "system"),
"tool": ("🔧 Tool", "tool"),
} }

View File

@@ -35,6 +35,7 @@ from src.blocks import (
make_image_placeholder, make_image_placeholder,
make_text_block, make_text_block,
make_thinking_block, make_thinking_block,
make_tool_result_block,
make_unknown_block, make_unknown_block,
) )
from src.loss_report import LossReport from src.loss_report import LossReport
@@ -576,7 +577,10 @@ class ChatGPTProvider(BaseProvider):
include project information. include project information.
""" """
report = loss_report if loss_report is not None else LossReport() report = loss_report if loss_report is not None else LossReport()
conv_id = raw.get("id", "") # ChatGPT's /backend-api/conversation/<id> response uses ``conversation_id``
# at the top level (not ``id``); fixtures and listing summaries use ``id``.
# Read both so both code paths populate the normalized ``id`` correctly.
conv_id = raw.get("id") or raw.get("conversation_id") or ""
title = raw.get("title") or "Untitled" title = raw.get("title") or "Untitled"
created_at = _ts_to_iso(raw.get("create_time")) created_at = _ts_to_iso(raw.get("create_time"))
updated_at = _ts_to_iso(raw.get("update_time")) updated_at = _ts_to_iso(raw.get("update_time"))
@@ -708,9 +712,11 @@ def _build_message(
ts = msg_data.get("create_time") ts = msg_data.get("create_time")
metadata = msg_data.get("metadata") or {} metadata = msg_data.get("metadata") or {}
is_hidden = bool(metadata.get("is_visually_hidden_from_conversation")) is_hidden = bool(metadata.get("is_visually_hidden_from_conversation"))
author_name = author.get("name") or None
blocks = _extract_blocks_for_content( blocks = _extract_blocks_for_content(
content_type, content_obj, role, conv_id, node_id, report content_type, content_obj, role, conv_id, node_id, report,
author_name=author_name, msg_metadata=metadata,
) )
if not blocks: if not blocks:
@@ -766,6 +772,8 @@ def _extract_blocks_for_content(
conv_id: str, conv_id: str,
node_id: str, node_id: str,
report: LossReport, report: LossReport,
author_name: str | None = None,
msg_metadata: dict | None = None,
) -> list[dict]: ) -> list[dict]:
"""Dispatch on content_type and return a list of blocks for one message.""" """Dispatch on content_type and return a list of blocks for one message."""
@@ -775,6 +783,19 @@ def _extract_blocks_for_content(
if content_type == "multimodal_text": if content_type == "multimodal_text":
return _extract_multimodal_blocks(content_obj, role, conv_id, node_id, report) return _extract_multimodal_blocks(content_obj, role, conv_id, node_id, report)
if content_type == "execution_output":
return _extract_execution_output_blocks(
content_obj, author_name, msg_metadata or {}, conv_id, node_id
)
if content_type == "system_error":
return _extract_system_error_blocks(content_obj, author_name)
if content_type == "tether_browsing_display":
return _extract_tether_browsing_display_blocks(
content_obj, author_name, conv_id, node_id
)
if content_type == "code": if content_type == "code":
code_text = content_obj.get("text") or "\n".join( code_text = content_obj.get("text") or "\n".join(
p for p in content_obj.get("parts", []) if isinstance(p, str) p for p in content_obj.get("parts", []) if isinstance(p, str)
@@ -1085,3 +1106,110 @@ def _extract_editable_context_blocks(
) )
return blocks return blocks
def _extract_execution_output_blocks(
content_obj: dict,
author_name: str | None,
msg_metadata: dict,
conv_id: str,
node_id: str,
) -> list[dict]:
"""Map a ChatGPT ``execution_output`` content (Code Interpreter / container.exec
/ python tool output) onto a ``tool_result`` block.
Locked shape (captured live during planning v0.4.1):
content.text → output
author.name → tool_name
metadata.aggregate_result.status → "error" → is_error=True
metadata.reasoning_title → summary
Empty ``content.text`` → skip (DEBUG log) — a tool that emits no output is
a transient artifact, not archival content.
"""
text = content_obj.get("text") or ""
if not text.strip():
logger.debug(
"[chatgpt] Skipping empty execution_output in conversation %s message %s",
conv_id[:8],
node_id[:8],
)
return []
aggregate = msg_metadata.get("aggregate_result") or {}
status = aggregate.get("status") if isinstance(aggregate, dict) else None
is_error = isinstance(status, str) and status.lower() == "error"
summary = msg_metadata.get("reasoning_title") or None
return [
make_tool_result_block(
output=text,
tool_name=author_name,
is_error=is_error,
summary=summary,
)
]
def _extract_system_error_blocks(
content_obj: dict,
author_name: str | None,
) -> list[dict]:
"""Map a ChatGPT ``system_error`` content onto an error ``tool_result`` block.
Captured shape: ``{content_type, name, text}`` where ``text`` is the error
message (e.g. ``"Error: Error from browse service: 503"``). ``author.name``
identifies the failing tool (e.g. ``"web"``).
"""
text = content_obj.get("text") or ""
if not text:
text = "(error with no message)"
return [
make_tool_result_block(
output=text,
tool_name=author_name,
is_error=True,
)
]
def _extract_tether_browsing_display_blocks(
content_obj: dict,
author_name: str | None,
conv_id: str,
node_id: str,
) -> list[dict]:
"""Handle ChatGPT's ``tether_browsing_display`` content.
Captured live: most instances are **spinner placeholders** (transient UI
state — empty fields, ``metadata.command == "spinner"``). The actual
retrieval content arrives as a sibling/child ``multimodal_text`` message
that already extracts cleanly via the existing handler.
Locked behavior:
- If ``result`` AND ``summary`` are both empty → skip silently (DEBUG).
These are spinners; the real content is elsewhere.
- Otherwise (defensive: never observed populated in real data) → render
as a ``tool_result`` block carrying ``result`` as output and
``summary`` as the optional summary line.
"""
result = content_obj.get("result") or ""
summary = content_obj.get("summary") or ""
if not result.strip() and not summary.strip():
logger.debug(
"[chatgpt] Skipping tether_browsing_display spinner in "
"conversation %s message %s (empty result/summary)",
conv_id[:8],
node_id[:8],
)
return []
return [
make_tool_result_block(
output=result or summary,
tool_name=author_name,
is_error=False,
summary=summary if result and summary else None,
)
]

View File

@@ -112,7 +112,7 @@
"node-image-only": { "node-image-only": {
"id": "node-image-only", "id": "node-image-only",
"parent": "node-mm-user-rev", "parent": "node-mm-user-rev",
"children": [], "children": ["node-exec-output"],
"message": { "message": {
"id": "node-image-only", "id": "node-image-only",
"author": {"role": "user"}, "author": {"role": "user"},
@@ -124,6 +124,73 @@
] ]
} }
} }
},
"node-exec-output": {
"id": "node-exec-output",
"parent": "node-image-only",
"children": ["node-exec-output-empty"],
"message": {
"id": "node-exec-output",
"author": {"role": "tool", "name": "container.exec", "metadata": {}},
"create_time": 1704067600.0,
"content": {
"content_type": "execution_output",
"text": "Hello from container.exec\nLine 2 of output"
},
"metadata": {
"aggregate_result": {"status": "success", "messages": []},
"reasoning_title": "Reading skill documentation"
}
}
},
"node-exec-output-empty": {
"id": "node-exec-output-empty",
"parent": "node-exec-output",
"children": ["node-system-error"],
"message": {
"id": "node-exec-output-empty",
"author": {"role": "tool", "name": "python", "metadata": {}},
"create_time": 1704067610.0,
"content": {
"content_type": "execution_output",
"text": ""
},
"metadata": {}
}
},
"node-system-error": {
"id": "node-system-error",
"parent": "node-exec-output-empty",
"children": ["node-tether-spinner"],
"message": {
"id": "node-system-error",
"author": {"role": "tool", "name": "web", "metadata": {}},
"create_time": 1704067620.0,
"content": {
"content_type": "system_error",
"name": "tool_error",
"text": "Error: Error from browse service: Error calling browse service: 503"
},
"metadata": {}
}
},
"node-tether-spinner": {
"id": "node-tether-spinner",
"parent": "node-system-error",
"children": [],
"message": {
"id": "node-tether-spinner",
"author": {"role": "tool", "name": "file_search", "metadata": {}},
"create_time": 1704067630.0,
"content": {
"content_type": "tether_browsing_display",
"result": "",
"summary": "",
"assets": null,
"tether_id": null
},
"metadata": {"command": "spinner", "status": "running"}
}
} }
} }
} }

View File

@@ -371,6 +371,37 @@ class TestRenderBlocks:
assert "❌ **Result (error)**" in out assert "❌ **Result (error)**" in out
assert "📤" not in out assert "📤" not in out
def test_tool_result_with_tool_name_in_header(self):
out = render_blocks_to_markdown(
[make_tool_result_block("done", tool_name="container.exec")]
)
assert "📤 **Result: container.exec**" in out
def test_tool_result_error_with_tool_name(self):
out = render_blocks_to_markdown(
[make_tool_result_block("503", tool_name="web", is_error=True)]
)
assert "❌ **Result (error): web**" in out
def test_tool_result_summary_renders_as_italic_line(self):
out = render_blocks_to_markdown(
[
make_tool_result_block(
"output",
tool_name="container.exec",
summary="Reading skill documentation",
)
]
)
# Summary line is italic, lives between header and fence,
# all inside the blockquote prefix.
assert "> *Reading skill documentation*" in out
# Order: header before summary before fence
header_idx = out.index("Result: container.exec")
summary_idx = out.index("Reading skill documentation")
fence_idx = out.index("output")
assert header_idx < summary_idx < fence_idx
def test_image_placeholder_rendering(self): def test_image_placeholder_rendering(self):
out = render_blocks_to_markdown( out = render_blocks_to_markdown(
[make_image_placeholder(ref="file-123", source="user_upload")] [make_image_placeholder(ref="file-123", source="user_upload")]

View File

@@ -15,6 +15,7 @@ from src.blocks import (
BLOCK_TYPE_TOOL_RESULT, BLOCK_TYPE_TOOL_RESULT,
BLOCK_TYPE_TOOL_USE, BLOCK_TYPE_TOOL_USE,
BLOCK_TYPE_UNKNOWN, BLOCK_TYPE_UNKNOWN,
render_blocks_to_markdown,
) )
from src.loss_report import LossReport from src.loss_report import LossReport
@@ -462,3 +463,181 @@ class TestClaudeNormalization:
report = LossReport() report = LossReport()
result = p.normalize_conversation(raw, report) result = p.normalize_conversation(raw, report)
assert report.messages_rendered == len(result["messages"]) assert report.messages_rendered == len(result["messages"])
# ---------------------------------------------------------------------------
# v0.4.1 — execution_output, system_error, tether_browsing_display, conv_id
# ---------------------------------------------------------------------------
class TestChatGPTToolOutputs:
"""v0.4.1 ChatGPT tool-role content_types map onto tool_result blocks."""
def _get_provider(self):
from src.providers.chatgpt import ChatGPTProvider
p = ChatGPTProvider.__new__(ChatGPTProvider)
import requests
p._session = requests.Session()
p._org_id = None
p._project_ids = []
p._project_map = {}
p._project_name_cache = {}
return p
def test_execution_output_emits_tool_result_with_metadata(self):
raw = json.loads((FIXTURES / "chatgpt_conversation.json").read_text())
p = self._get_provider()
result = p.normalize_conversation(raw)
exec_msgs = [
m for m in result["messages"]
if any(
b.get("type") == BLOCK_TYPE_TOOL_RESULT
and b.get("tool_name") == "container.exec"
for b in (m.get("blocks") or [])
)
]
assert exec_msgs, "expected execution_output to render as tool_result"
block = next(
b for b in exec_msgs[0]["blocks"] if b.get("type") == BLOCK_TYPE_TOOL_RESULT
)
assert block["output"].startswith("Hello from container.exec")
assert block["is_error"] is False
assert block["summary"] == "Reading skill documentation"
def test_execution_output_message_role_is_tool(self):
raw = json.loads((FIXTURES / "chatgpt_conversation.json").read_text())
p = self._get_provider()
result = p.normalize_conversation(raw)
tool_msgs = [m for m in result["messages"] if m["role"] == "tool"]
assert tool_msgs, "tool-role messages must pass through (filter lifted in v0.4.0)"
def test_empty_execution_output_skipped(self, caplog):
raw = json.loads((FIXTURES / "chatgpt_conversation.json").read_text())
p = self._get_provider()
with caplog.at_level(logging.DEBUG, logger="src.providers.chatgpt"):
result = p.normalize_conversation(raw)
# The empty execution_output (author.name="python") must NOT appear.
python_msgs = [
m for m in result["messages"]
if any(
b.get("type") == BLOCK_TYPE_TOOL_RESULT and b.get("tool_name") == "python"
for b in (m.get("blocks") or [])
)
]
assert not python_msgs, "empty execution_output should be skipped"
assert any("Skipping empty execution_output" in r.message for r in caplog.records)
def test_system_error_emits_error_tool_result(self):
raw = json.loads((FIXTURES / "chatgpt_conversation.json").read_text())
p = self._get_provider()
result = p.normalize_conversation(raw)
web_err = [
m for m in result["messages"]
if any(
b.get("type") == BLOCK_TYPE_TOOL_RESULT
and b.get("tool_name") == "web"
and b.get("is_error") is True
for b in (m.get("blocks") or [])
)
]
assert web_err, "system_error should render as tool_result with is_error=True"
block = next(b for b in web_err[0]["blocks"] if b.get("tool_name") == "web")
assert "503" in block["output"]
def test_tether_browsing_display_spinner_skipped(self, caplog):
raw = json.loads((FIXTURES / "chatgpt_conversation.json").read_text())
p = self._get_provider()
with caplog.at_level(logging.DEBUG, logger="src.providers.chatgpt"):
result = p.normalize_conversation(raw)
spinner_msgs = [
m for m in result["messages"]
if any(
b.get("type") == BLOCK_TYPE_TOOL_RESULT and b.get("tool_name") == "file_search"
for b in (m.get("blocks") or [])
)
]
assert not spinner_msgs, "spinner tether_browsing_display should be skipped"
assert any("tether_browsing_display spinner" in r.message for r in caplog.records)
def test_tether_browsing_display_populated_renders_defensively(self):
"""Defensive case (never observed in real data) — populated browse renders."""
conv = {
"id": "test-tether",
"title": "T",
"create_time": 1700000000.0,
"update_time": 1700000001.0,
"mapping": {
"root": {"id": "root", "message": None, "parent": None, "children": ["m1"]},
"m1": {
"id": "m1",
"parent": "root",
"children": [],
"message": {
"id": "m1",
"author": {"role": "tool", "name": "browser"},
"content": {
"content_type": "tether_browsing_display",
"result": "Found 3 results about kubernetes ingress.",
"summary": "ingress search",
"assets": None,
"tether_id": None,
},
},
},
},
}
p = self._get_provider()
result = p.normalize_conversation(conv)
assert any(
b.get("type") == BLOCK_TYPE_TOOL_RESULT and b.get("tool_name") == "browser"
for m in result["messages"]
for b in (m.get("blocks") or [])
)
class TestChatGPTConvIdFallback:
"""v0.4.1: live ChatGPT detail responses use conversation_id, not id."""
def _get_provider(self):
from src.providers.chatgpt import ChatGPTProvider
p = ChatGPTProvider.__new__(ChatGPTProvider)
import requests
p._session = requests.Session()
p._org_id = None
p._project_ids = []
p._project_map = {}
p._project_name_cache = {}
return p
def test_falls_back_to_conversation_id(self):
raw = {
"conversation_id": "live-chatgpt-uuid",
"title": "T",
"create_time": 1700000000.0,
"update_time": 1700000001.0,
"mapping": {
"root": {"id": "root", "message": None, "parent": None, "children": []},
},
}
p = self._get_provider()
result = p.normalize_conversation(raw)
assert result["id"] == "live-chatgpt-uuid"
def test_id_takes_precedence_when_both_present(self):
raw = {
"id": "from-id",
"conversation_id": "from-conversation-id",
"title": "T",
"create_time": 1700000000.0,
"update_time": 1700000001.0,
"mapping": {
"root": {"id": "root", "message": None, "parent": None, "children": []},
},
}
p = self._get_provider()
result = p.normalize_conversation(raw)
assert result["id"] == "from-id"