docs: add README
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
301
README.md
Normal file
@@ -0,0 +1,301 @@
# AI Chat Exporter

A personal backup tool for ChatGPT and Claude conversation history. Exports your chats to Markdown files structured for archival in [Joplin](https://joplinapp.org/). Each conversation becomes a single `.md` file with YAML frontmatter, organised into folders that map directly to Joplin notebooks.

Supports incremental sync — only new or updated conversations are exported on each run. Every run is resumable: if interrupted, re-running picks up exactly where it left off.

---

## ⚠️ Terms of Service Warning

**Read this before using this tool.**

This tool works by accessing **unofficial, undocumented internal web API endpoints** used by the ChatGPT and Claude web apps. These endpoints are not publicly supported by OpenAI or Anthropic and are subject to change or removal without notice.

**Use of this tool may conflict with their Terms of Service:**
- OpenAI: https://openai.com/policies/terms-of-use
- Anthropic: https://www.anthropic.com/legal/consumer-terms

**By using this tool, you accept that:**
- You are using it entirely at your own risk
- Your account could be suspended for automated or scripted access
- The internal APIs this tool relies on may break at any time without notice
- This tool is for **personal archival use only** — not commercial use

This tool is designed for a single user backing up their own conversations. Do not use it to scrape data at scale or for any commercial purpose.

---

## Installation

```bash
git clone <repo-url>
cd ai-chat-exporter
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

---

## First Run: Run Doctor

Before your first export, validate your setup:

```bash
python -m src.main doctor
```

This checks token presence, format, and expiry, plus directory permissions, disk space, and live API connectivity. Fix any failures before proceeding.

---

## Getting Your Session Tokens

Session tokens are how your browser stays logged in. This tool uses them to access your chat history on your behalf.

### Token Lifetimes

| Provider | Cookie Name | Lifetime | Expiry Detection |
|----------|-------------|----------|-----------------|
| ChatGPT | `__Secure-next-auth.session-token` | ~7 days | JWT `exp` claim (decoded automatically) |
| Claude | `sessionKey` | ~30 days | Only detectable via 401 response |

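As a rough illustration of what decoding the `exp` claim involves, here is a minimal sketch. It assumes the token is a plain three-part JWT (`header.payload.signature`) as the table describes; the function names `jwt_expiry` and `is_expired` are hypothetical and not taken from this project's source. The signature is not verified, because only the expiry timestamp is read.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional


def jwt_expiry(token: str) -> datetime:
    """Read the `exp` claim from a JWT payload without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT (header.payload.signature)")
    payload_b64 = parts[1]
    # base64url segments omit padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    # `exp` is expressed in seconds since the Unix epoch (UTC).
    return datetime.fromtimestamp(payload["exp"], tz=timezone.utc)


def is_expired(token: str, now: Optional[datetime] = None) -> bool:
    """Return True if the token's `exp` time is at or before `now` (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return jwt_expiry(token) <= now
```

An opaque value such as Claude's `sessionKey` has no payload to decode, which is why its expiry only shows up as a 401 response.
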
### Finding Tokens in Chrome DevTools

1. Open the provider's website and make sure you're logged in
2. Press **F12** (Windows/Linux) or **Cmd+Option+I** (macOS) to open DevTools
3. Click the **Application** tab
4. In the left panel, expand **Cookies** and click the site URL
5. Find the cookie by name and copy its **Value**

**ChatGPT:** go to `https://chatgpt.com` → find `__Secure-next-auth.session-token` → copy Value (starts with `eyJ`)

**Claude:** go to `https://claude.ai` → find `sessionKey` → copy Value

### When Tokens Expire

When a token expires you'll see a `401 Unauthorized` error. To refresh:
- Re-run the `auth` wizard: `python -m src.main auth`
- Or manually update the value in your `.env` file

---

## The `auth` Command

The easiest way to configure tokens is the interactive wizard:

```bash
python -m src.main auth
```

This walks you through finding your token, validates it, shows the expiry date (ChatGPT only), and offers to write it to your `.env` automatically. Tokens are never echoed to the terminal.

---

## `.env` Setup

Copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```

| Variable | Default | Description |
|----------|---------|-------------|
| `CHATGPT_SESSION_TOKEN` | — | Your ChatGPT JWT session token |
| `CLAUDE_SESSION_KEY` | — | Your Claude session key |
| `EXPORT_DIR` | `./exports` | Where to write exported files |
| `OUTPUT_STRUCTURE` | `provider/project/year` | Folder structure (see below) |
| `CACHE_DIR` | `~/.ai-chat-exporter` | Where to store the sync manifest |
| `LOG_FILE` | `~/.ai-chat-exporter/logs/exporter.log` | Log file path (`none` to disable) |

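To make the defaults in the table concrete, the sketch below resolves the same variables from a mapping such as `os.environ`. It is illustrative only: the `Settings` class and `load_settings` function are hypothetical names, and it assumes the `.env` values have already been loaded into the process environment by whatever mechanism the tool uses.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    chatgpt_session_token: Optional[str]
    claude_session_key: Optional[str]
    export_dir: Path
    output_structure: str
    cache_dir: Path
    log_file: Optional[Path]  # None when LOG_FILE is set to "none"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Apply the documented defaults to whichever variables are missing."""
    log_file = env.get("LOG_FILE", "~/.ai-chat-exporter/logs/exporter.log")
    return Settings(
        # Treat an empty string the same as an unset token.
        chatgpt_session_token=env.get("CHATGPT_SESSION_TOKEN") or None,
        claude_session_key=env.get("CLAUDE_SESSION_KEY") or None,
        export_dir=Path(env.get("EXPORT_DIR", "./exports")).expanduser(),
        output_structure=env.get("OUTPUT_STRUCTURE", "provider/project/year"),
        cache_dir=Path(env.get("CACHE_DIR", "~/.ai-chat-exporter")).expanduser(),
        log_file=None if log_file.lower() == "none" else Path(log_file).expanduser(),
    )
```
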
---

## Output Structure

All exported files go under `EXPORT_DIR`. The structure maps to Joplin notebooks.

### Default: `provider/project/year`

```
exports/
├── chatgpt/
│   ├── no-project/
│   │   └── 2024/
│   │       └── 2024-03-15_my-conversation_abc12345.md
│   └── learning-python/
│       └── 2024/
│           └── 2024-03-15_async-tutorial_def67890.md
└── claude/
    ├── no-project/
    │   └── 2024/
    │       └── 2024-06-01_docker-explained_ghi11111.md
    └── startos-packaging/
        └── 2024/
            └── 2024-06-10_manifest-setup_jkl22222.md
```

### Joplin Notebook Mapping (for future automated import)

| Export folder | Joplin notebook |
|---------------|-----------------|
| `exports/chatgpt/learning-python/` | `ChatGPT - Learning Python` |
| `exports/claude/startos-packaging/` | `Claude - Startos Packaging` |
| `exports/chatgpt/no-project/` | `ChatGPT - No Project` |
| `exports/claude/no-project/` | `Claude - No Project` |

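The mapping in the table follows a simple pattern: a display name for the provider, then the project slug with hyphens turned into spaces and each word capitalised. Since the Joplin import is planned rather than implemented, the sketch below is only one plausible way to express that rule; `notebook_name` and `PROVIDER_NAMES` are hypothetical names.

```python
from pathlib import PurePosixPath

# Display names for the provider folders shown in the table.
PROVIDER_NAMES = {"chatgpt": "ChatGPT", "claude": "Claude"}


def notebook_name(export_folder: str) -> str:
    """Derive a notebook title from a `<root>/<provider>/<project>` folder path."""
    parts = PurePosixPath(export_folder).parts
    if len(parts) < 2:
        raise ValueError("expected a path ending in <provider>/<project>")
    provider, project = parts[-2], parts[-1]
    # "startos-packaging" -> "Startos Packaging"
    project_title = " ".join(word.capitalize() for word in project.split("-"))
    return f"{PROVIDER_NAMES.get(provider, provider)} - {project_title}"
```
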
### Other `OUTPUT_STRUCTURE` options

| Value | Result |
|-------|--------|
| `provider/project/year` (default) | `exports/claude/my-project/2024/file.md` |
| `provider/project` | `exports/claude/my-project/file.md` |
| `provider/year` | `exports/claude/2024/file.md` (projects ignored) |

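One way to read the table is that each `OUTPUT_STRUCTURE` value is a template whose slash-separated segments are filled in per conversation. The sketch below shows that interpretation; `output_dir` is a hypothetical helper, and falling back to `no-project` for conversations without a project is an assumption based on the default tree shown earlier.

```python
from pathlib import Path
from typing import Optional

ALLOWED_STRUCTURES = {"provider/project/year", "provider/project", "provider/year"}


def output_dir(export_dir: str, structure: str, provider: str,
               project_slug: Optional[str], year: int) -> Path:
    """Build the target folder for one conversation from an OUTPUT_STRUCTURE value."""
    if structure not in ALLOWED_STRUCTURES:
        raise ValueError(f"unsupported OUTPUT_STRUCTURE: {structure!r}")
    fields = {
        "provider": provider,
        "project": project_slug or "no-project",  # assumed fallback folder name
        "year": str(year),
    }
    # Each segment of the template names the field to use at that depth.
    return Path(export_dir).joinpath(*(fields[name] for name in structure.split("/")))
```
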
### Filename format

`YYYY-MM-DD_{title-slug}_{id[:8]}.md` — e.g. `2024-06-10_manifest-setup_jkl22222.md`

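A minimal sketch of how a filename in this format could be assembled. The slug rules (lowercase ASCII letters and digits, hyphens between words), the `untitled` fallback, and the choice of which conversation date to use are all assumptions; `slugify` and `export_filename` are hypothetical names.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Lowercase the title and collapse every run of other characters into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"  # assumed fallback for empty or non-ASCII-only titles


def export_filename(day: date, title: str, conversation_id: str) -> str:
    """Format `YYYY-MM-DD_{title-slug}_{id[:8]}.md`."""
    return f"{day.isoformat()}_{slugify(title)}_{conversation_id[:8]}.md"
```

Keeping the first eight characters of the conversation ID in the name means two conversations with the same title and date still get distinct files.
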
---

## CLI Reference

### Global flags

```
--verbose / -v    DEBUG output to console
--quiet / -q      WARNING and above only
--debug           DEBUG + full tracebacks + redacted API response bodies
--no-log-file     Disable file logging
--version         Print version and exit
```

### `auth` — Interactive token setup

```bash
python -m src.main auth
```

Guided wizard to find and save session tokens. Detects OS and shows the correct DevTools shortcut.

### `doctor` — Health check

```bash
python -m src.main doctor
```

Checks: token presence, JWT validity and expiry, directory permissions, disk space, live API reachability. Exits with code 0 if all pass, 1 if any fail.

### `export` — Export conversations

```bash
# Export everything (new/updated only)
python -m src.main export

# Single provider
python -m src.main export --provider claude

# JSON output
python -m src.main export --format json

# Both Markdown and JSON
python -m src.main export --format both

# Only conversations updated since a date
python -m src.main export --since 2024-06-01

# Write to a custom directory
python -m src.main export --output /path/to/my/notes

# Preview without writing anything
python -m src.main export --dry-run
```

Options: `--provider [chatgpt|claude|all]`, `--format [markdown|json|both]`, `--output PATH`, `--since YYYY-MM-DD`, `--dry-run`

### `list` — List conversations

```bash
python -m src.main list --provider chatgpt
```

Fetches and displays all conversations without exporting them.

### `cache` — Manage the sync manifest

```bash
# Show statistics
python -m src.main cache --show

# Clear all cached entries (forces full re-export next run)
python -m src.main cache --clear

# Clear a single provider
python -m src.main cache --clear --provider claude
```

---

## How the Cache Works

The cache manifest lives at `~/.ai-chat-exporter/manifest.json` by default (inside `CACHE_DIR`) and records every exported conversation: its title, project, `updated_at` timestamp, and output file path.

On every run:
1. Fetch the full conversation list from the provider
2. Compare each conversation's `updated_at` against the manifest
3. Export only conversations that are new or have been updated
4. Write each successfully exported conversation to the manifest **immediately** (not batched)

**This design makes every run inherently resumable.** If the tool is interrupted for any reason (rate limit, network drop, Ctrl+C, crash), simply re-run the same command. It will skip already-exported conversations and continue from where it stopped.

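The four steps above can be sketched roughly as follows. This is not the project's actual implementation: the manifest layout, the conversation dictionary keys, and the `sync`, `load_manifest`, and `save_manifest` names are assumptions made for illustration. The key point is that the manifest is rewritten after every successful export, so an exception raised partway through loses at most the conversation in flight.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def load_manifest(path: Path) -> dict:
    """Return the manifest contents, or an empty dict on the first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_manifest(path: Path, manifest: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written manifest."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(manifest, fh, indent=2)
    os.replace(tmp, path)


def sync(conversations: Iterable[dict], manifest_path: Path,
         export_one: Callable[[dict], str]) -> int:
    """Export new or updated conversations; return how many were exported."""
    manifest = load_manifest(manifest_path)
    exported = 0
    for conv in conversations:
        entry = manifest.get(conv["id"])
        if entry and entry["updated_at"] == conv["updated_at"]:
            continue  # unchanged since the last run
        output_path = export_one(conv)  # may raise; earlier entries are already saved
        manifest[conv["id"]] = {
            "title": conv.get("title"),
            "project": conv.get("project"),
            "updated_at": conv["updated_at"],
            "path": output_path,
        }
        save_manifest(manifest_path, manifest)  # persist immediately, not batched
        exported += 1
    return exported
```
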
To force a full re-export: `python -m src.main cache --clear` then re-run export.

---

## Troubleshooting

### `401 Unauthorized`
Your session token has expired.
- Run `python -m src.main auth` to get a new token interactively
- Or manually copy a fresh cookie value into your `.env` file

Note: Claude's `sessionKey` is an opaque string, so the only way to know it has expired is the 401 error. ChatGPT JWTs have an `exp` claim that the `doctor` command can decode and display.

### `429 Rate Limited`
The tool automatically pauses, saves progress, and exits with a clear message showing how many conversations were exported and how many remain. Re-run the same export command to resume; the cache picks up exactly where it left off.

### Schema warnings in logs (`Unexpected API response shape`)
The provider's internal API may have changed. Run with `--debug`, sanitize the output (remove any personal content), and check the project's GitHub Issues for known fixes.

### Non-text content warnings
Images, code interpreter outputs, DALL-E generations, and Claude artifacts are not exported in v0.1.0. A WARNING is logged for each skipped item. See `FUTURE.md` for the v0.4.0 roadmap.

### Empty export / all conversations skipped
No new or updated conversations since your last run. To verify: `python -m src.main cache --show`. To force a full re-export: `python -m src.main cache --clear`.

### Filing a bug report
1. Run with `--debug`: `python -m src.main export --debug 2>&1 | tee debug.log`
2. Remove any personal conversation content from `debug.log`
3. Open a GitHub Issue with the sanitized log and the exact command you ran

---

## Future Work

See `FUTURE.md` for planned features:

- **v0.1.x** — `export --force` flag to bypass cache for a single run
- **v0.2.0** — Joplin integration: auto-import exported files via Joplin's local REST API
- **v0.3.0** — Official API fallback: parse export ZIP files from ChatGPT/Claude settings
- **v0.4.0** — Rich content: images, artifacts, code interpreter output, extended thinking

---

## Security Notes

- All exported data is stored **locally only**; the tool does not upload it anywhere
- Exported files and the cache manifest are created with `600` permissions (owner read/write only)
- `.env` is in `.gitignore` — **never commit it**
- Session tokens are never logged, printed, or included in error messages
- If you accidentally commit `.env`, immediately log out and back in to invalidate the token, then remove the file from git history using [BFG Repo Cleaner](https://rtyley.github.io/bfg-repo-cleaner/) or `git filter-branch`

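For readers curious how `600` permissions can be enforced on POSIX systems, here is a minimal sketch. The helper name `write_private` is hypothetical, and the project's own approach may differ. The mode passed to `os.open` only applies when the file is newly created, so the sketch also calls `os.fchmod` (Unix only) to tighten a file that already exists.

```python
import os
from pathlib import Path


def write_private(path: Path, text: str) -> None:
    """Write `text` to `path` so that only the owner can read or write it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # 0o600 takes effect only if the file is created by this call.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        # Also fix permissions on a pre-existing file before writing any content.
        os.fchmod(fh.fileno(), 0o600)
        fh.write(text)
```
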