Extracts per-message content into a typed `blocks` list (text, code,
thinking, tool_use, tool_result, image_placeholder, file_placeholder,
unknown) and renders them at exporter write time. Voice transcripts,
Custom Instructions, and image references now appear in exports
instead of being silently dropped.
Foundation:
- src/blocks.py: pure block constructors, _safe_fence (fence-corruption
defense, verified live in Joplin), _blockquote_prefix, render
- src/loss_report.py: per-run tally surfaced as INFO summary at end of
export so silently-dropped data becomes visible
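A minimal sketch of the fence-corruption defense, assuming _safe_fence works by choosing a backtick fence longer than any backtick run inside the content. The names safe_fence and render_code are illustrative; the actual code in src/blocks.py may differ.

```python
import re


def safe_fence(content: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run in content."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    return "`" * max(minimum, longest + 1)


def render_code(content: str, language: str = "") -> str:
    """Wrap content in a fence that the content itself cannot close early."""
    fence = safe_fence(content)
    return f"{fence}{language}\n{content}\n{fence}"
```

With this approach, content that already contains a three-backtick fence is wrapped in four backticks, so the outer block stays intact when rendered.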
Providers:
- ChatGPT: dispatch on content_type produces typed blocks; voice shapes
(audio_transcription, audio_asset_pointer,
real_time_user_audio_video_asset_pointer) locked from live DevTools
capture; Custom Instructions bug fix (parts-vs-direct-fields); role
filter lifted; hidden-context marker driven by
is_visually_hidden_from_conversation flag
- Claude: defensive dispatch for text/thinking/tool_use/tool_result/image
with recursive nested-block flattening; untested against real
rich-content data, to be fixed forward in v0.4.1
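The Claude-side defensive dispatch with recursive flattening could look roughly like the sketch below. The item shapes (a "type" key, a nested "content" list on tool_result, a "tool_use_id" field) are assumptions for illustration, not confirmed against real Claude data, and flatten_blocks is a hypothetical name.

```python
from typing import Any

KNOWN_TYPES = {"text", "thinking", "tool_use", "tool_result", "image"}


def flatten_blocks(items: list[Any]) -> list[dict[str, Any]]:
    """Flatten nested content items into a flat list of typed blocks."""
    blocks: list[dict[str, Any]] = []
    for item in items:
        # Anything that is not a dict with a known type is kept as "unknown"
        # so it shows up in the export instead of being dropped.
        if not isinstance(item, dict) or item.get("type") not in KNOWN_TYPES:
            blocks.append({"type": "unknown", "raw": item})
            continue
        nested = item.get("content")
        if item["type"] == "tool_result" and isinstance(nested, list):
            # Emit a marker for the result, then recurse into its children.
            blocks.append(
                {"type": "tool_result", "tool_use_id": item.get("tool_use_id")}
            )
            blocks.extend(flatten_blocks(nested))
        else:
            blocks.append(item)
    return blocks
```

Unrecognized items are preserved as unknown blocks rather than raising, which matches the "defensive" intent described above.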
Exporter:
- Markdown renders from blocks at write time via render_blocks_to_markdown;
backward-compat fallback to content for any pre-v0.4.0 cached data
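A sketch of the write-time fallback, assuming a message dict carries either a "blocks" list or a legacy "content" string. render_block here is a stand-in for the real renderer and only handles text; message_markdown is a hypothetical name.

```python
from typing import Any


def render_block(block: dict[str, Any]) -> str:
    """Render one block; only text is handled here, the rest become markers."""
    if block.get("type") == "text":
        return str(block.get("text", ""))
    return f"[{block.get('type', 'unknown')}]"


def message_markdown(message: dict[str, Any]) -> str:
    """Prefer typed blocks; fall back to the legacy content string."""
    blocks = message.get("blocks")
    if blocks:
        return "\n\n".join(render_block(b) for b in blocks)
    # Pre-v0.4.0 cached data has no blocks, only content.
    return message.get("content") or ""
```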
Tests:
- 27 new tests across providers, exporters, CLI; fixtures rebuilt with
real-shape ChatGPT voice + Custom Instructions cases
- 181/181 pass
Behavior changes (intentional):
- JSON output omits content; consumers should read blocks
- Per-conversation message counts increase (Custom Instructions,
image-only, and tool-only messages now appear)
- Existing exports are not re-rendered automatically; for fresh output,
run cache --clear and then export
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Support __Secure-next-auth.session-token.0/.1 split cookies; ChatGPT
now issues tokens that exceed the 4KB per-cookie limit and must be
sent as two named chunks or the auth endpoint returns no accessToken.
Add CHATGPT_SESSION_TOKEN_1 env var; update auth wizard instructions.
- Fix Claude conversations exported to wrong directory when project name
is present in the listing but absent from the detail endpoint response.
Explicitly propagate "project" alongside _-prefixed annotation keys.
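The split-cookie handling in the first item can be sketched as follows. CHATGPT_SESSION_TOKEN as the name of the first env var is an assumption (only CHATGPT_SESSION_TOKEN_1 is named above), and session_cookies is a hypothetical helper.

```python
COOKIE_NAME = "__Secure-next-auth.session-token"


def session_cookies(env: dict[str, str]) -> dict[str, str]:
    """Build the cookie dict, using .0/.1 chunks when a second part is set."""
    first = env.get("CHATGPT_SESSION_TOKEN", "")  # assumed variable name
    second = env.get("CHATGPT_SESSION_TOKEN_1", "")
    if not first:
        return {}
    if second:
        # Oversized tokens are sent as two named chunks.
        return {f"{COOKIE_NAME}.0": first, f"{COOKIE_NAME}.1": second}
    return {COOKIE_NAME: first}
```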
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude's list endpoint returns conversations with a `name` field rather
than `title`, so every Claude row was falling through to "Untitled".
Also set no_wrap + ellipsis overflow and tune column widths so the table
renders one row per conversation in Windows Command Prompt (80 cols).
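The title fix amounts to a fallback chain along these lines (display_title is an illustrative name, and the row is assumed to be a plain dict from the list endpoint):

```python
def display_title(row: dict) -> str:
    """Use title when present, then Claude's name field, then a placeholder."""
    return row.get("title") or row.get("name") or "Untitled"
```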
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Core features:
- Add `joplin` command: syncs exported Markdown to Joplin via local REST API
- Notebooks auto-created per provider+project (e.g. "ChatGPT - My Project")
- Idempotent: notes updated (not duplicated) on re-run; note ID tracked in manifest
- Add `--project` filter to `export` and `list` commands (substring or 'none')
- Add ChatGPT Projects support via CHATGPT_PROJECT_IDS env var
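A sketch of how the `--project` filter might match, assuming case-insensitive substring matching and that the literal value 'none' selects conversations without a project. matches_project is a hypothetical helper, and the case-insensitivity is an assumption.

```python
from typing import Optional


def matches_project(project: Optional[str], wanted: Optional[str]) -> bool:
    """Return True if a conversation's project passes the --project filter."""
    if wanted is None:
        return True  # no filter given
    if wanted.lower() == "none":
        return not project  # only conversations outside any project
    return bool(project) and wanted.lower() in project.lower()
```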
Config:
- Add JOPLIN_API_TOKEN, JOPLIN_API_URL, JOPLIN_REQUEST_TIMEOUT
- Version now read from importlib.metadata (single source of truth: pyproject.toml)
- Bump version to 0.2.0
Quality:
- Explicit Timeout handling in JoplinClient with actionable error messages
- Token validation (validate_token) kept separate from the connectivity check (ping)
- Remove debug_auth.py, debug_claude.py, and untracked .har file
- Add *.har to .gitignore (may contain auth cookies/session tokens)
- Update README, CHANGELOG, FUTURE.md to reflect v0.2.0
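The idempotent sync described under Core features can be sketched as a create-or-update decision keyed on the manifest. The manifest shape (conversation ID mapped to Joplin note ID) and both helper names are assumptions; POST /notes and PUT /notes/:id are the Joplin Data API endpoints for creating and updating a note.

```python
def plan_note_write(manifest: dict[str, str], conversation_id: str) -> tuple[str, str]:
    """Choose the HTTP method and path for syncing one conversation."""
    note_id = manifest.get(conversation_id)
    if note_id:
        # Seen before: update the existing note instead of duplicating it.
        return ("PUT", f"/notes/{note_id}")
    return ("POST", "/notes")


def record_note(manifest: dict[str, str], conversation_id: str, note_id: str) -> None:
    """Remember the note ID so the next run updates rather than creates."""
    manifest[conversation_id] = note_id
```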
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
claude.ai has the same Cloudflare TLS fingerprinting protection as
chatgpt.com. Apply the same fix: curl_cffi impersonate=chrome120,
remove base class User-Agent to avoid JA3/UA mismatch.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
curl_cffi sets a User-Agent consistent with its JA3 TLS fingerprint.
BaseProvider's custom UA (Chrome/121) conflicted with the chrome120
TLS fingerprint, causing Cloudflare to flag the request as a bot.
Removing the UA from session headers lets curl_cffi manage its own.
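A rough sketch of the idea: strip any User-Agent from the shared headers before handing them to a curl_cffi session, so the library's own UA stays consistent with the impersonated TLS fingerprint. without_user_agent and build_session are hypothetical names; curl_cffi is imported lazily and must be installed for build_session to run.

```python
from typing import Mapping


def without_user_agent(headers: Mapping[str, str]) -> dict[str, str]:
    """Drop any User-Agent header, matching the name case-insensitively."""
    return {k: v for k, v in headers.items() if k.lower() != "user-agent"}


def build_session(base_headers: Mapping[str, str], impersonate: str = "chrome120"):
    """Create a curl_cffi session that keeps its fingerprint-matched UA."""
    # Imported lazily so the helper above works without curl_cffi installed.
    from curl_cffi import requests

    session = requests.Session(impersonate=impersonate)
    session.headers.update(without_user_agent(base_headers))
    return session
```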
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chatgpt.com uses Cloudflare's TLS fingerprinting (JA3/JA4), which
blocks Python requests regardless of cookies. curl_cffi impersonates
Chrome's TLS handshake, so requests look like a real browser's at the
transport layer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Using self._session.cookies.set() ensures the cookie is sent correctly
by the requests session on all calls, including /api/auth/session.
Also add sec-fetch-* headers required by chatgpt.com.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The __Secure-next-auth.session-token cannot be used directly as a Bearer
token. It must first be exchanged via GET /api/auth/session (with the token
sent as a Cookie) to obtain a short-lived accessToken. This accessToken is
then used as the Authorization: Bearer header for all backend-api calls.
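The two-step flow can be sketched with the HTTP call injected as a callable, which keeps the example independent of any particular client. The full URL, the fetch_json signature, and the function names are assumptions for illustration.

```python
from typing import Any, Callable, Mapping

SESSION_URL = "https://chatgpt.com/api/auth/session"  # assumed full URL
COOKIE_NAME = "__Secure-next-auth.session-token"

FetchJson = Callable[[str, Mapping[str, str]], Mapping[str, Any]]


def exchange_session_token(session_token: str, fetch_json: FetchJson) -> str:
    """Trade the session cookie for a short-lived accessToken."""
    payload = fetch_json(SESSION_URL, {COOKIE_NAME: session_token})
    access_token = payload.get("accessToken")
    if not access_token:
        raise RuntimeError("session endpoint returned no accessToken")
    return str(access_token)


def bearer_headers(access_token: str) -> dict[str, str]:
    """Headers for backend-api calls once the exchange has succeeded."""
    return {"Authorization": f"Bearer {access_token}"}
```

In real use, fetch_json would wrap a GET request that sends the given cookies and returns the parsed JSON body.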
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Doctor was reading env vars before loading .env, so tokens set in .env
were invisible. ChatGPT now uses JWE (encrypted JWT) tokens, which
PyJWT cannot decode without the server key, so a decode failure is now
treated as "token set, expiry unknown" rather than a FAIL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
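For the doctor change above, a minimal sketch of the "token set, expiry unknown" behavior. It uses stdlib base64 and json instead of PyJWT to stay self-contained, and token_expiry_status with its status strings is a hypothetical shape, not the tool's actual output.

```python
import base64
import json


def token_expiry_status(token: str) -> str:
    """Report expiry for a plain JWT; treat undecodable tokens as unknown."""
    if not token:
        return "FAIL: token not set"
    try:
        parts = token.split(".")
        if len(parts) != 3:
            # JWE compact tokens have five segments and an encrypted payload.
            raise ValueError("not a three-part JWT")
        padded = parts[1] + "=" * (-len(parts[1]) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
        return f"OK: token set, exp={claims['exp']}"
    except (ValueError, KeyError, TypeError):
        # Decode failure is not an error: the token exists, expiry is opaque.
        return "OK: token set, expiry unknown"
```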