Overview
MCP (Model Context Protocol) lets AI assistants like Claude call your own server-side tools. The VidTSX MCP
exposes three credit-bearing capabilities — image generation, video generation, and speech-to-text — plus a free
whoami diagnostic. Every credit-bearing call is billed against your VidTSX credit balance and recorded in your
Usage log.
The credit-bearing tools are asynchronous: submit returns a job_id immediately, and
you poll the matching get_*_job tool until the job is completed or failed.
Credits are reserved at submit and refunded automatically on any failure.
1. Generate an MCP token
Claude.ai's connector UI accepts only a URL — it can't send custom HTTP headers — so the credential has to live in the URL itself. To keep your main license key out of URLs, browser histories, and access logs, the keyed route uses a separate MCP token bound to your license. Each token can be revoked independently; revoking one never affects your license key or the desktop app.
Generate one from the License tab (log in first) — click Generate MCP token and copy the URL once. The format is:
https://learnwithhasan.com/vidtsx/mcp/key/mcpt_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/
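Because the token lives in the URL, it is worth scrubbing it before a URL ends up in your own logs or bug reports. A minimal sketch, assuming tokens always start with mcpt_ and contain only letters, digits, underscores, or hyphens (the helper name redact_mcp_url is hypothetical, not part of VidTSX):

```python
import re

# Assumption: tokens start with "mcpt_" and use only [A-Za-z0-9_-] characters.
_TOKEN_RE = re.compile(r"(/mcp/key/)mcpt_[A-Za-z0-9_-]+")


def redact_mcp_url(text: str) -> str:
    """Replace any MCP token found in `text` with a fixed placeholder."""
    return _TOKEN_RE.sub(r"\1mcpt_[REDACTED]", text)


if __name__ == "__main__":
    print(redact_mcp_url("https://learnwithhasan.com/vidtsx/mcp/key/mcpt_abc123/"))
```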
2. Connect from Claude.ai
- Open Claude.ai → Settings → Connectors.
- Click Add custom connector.
- Name: VidTSX MCP (or anything you like).
- Paste the URL from step 1.
- Leave the OAuth fields empty — auth is handled by the URL.
- Save, then enable the connector for your conversation.
3. Connect with Bearer token (Desktop / Code)
Clients that can send custom HTTP headers should use the canonical, header-authenticated URL — your license key stays out of the URL bar and access logs.
URL: https://learnwithhasan.com/vidtsx/mcp/
Header: Authorization: Bearer VTSX-XXXX-XXXX-XXXX
For Claude Code, add the connector via the slash command or your MCP config file.
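If you script your own client, the header-authenticated route can be sketched as follows. This only builds the request and does not send it. The JSON-RPC tools/call envelope comes from the MCP specification. The Accept header assumes the server uses MCP's Streamable HTTP transport, which this page does not confirm. A real MCP session normally starts with an initialize handshake, which an MCP client library would handle for you.

```python
import json
import urllib.request

MCP_URL = "https://learnwithhasan.com/vidtsx/mcp/"


def build_whoami_request(license_key: str) -> urllib.request.Request:
    """Build (but do not send) a tools/call request for the whoami tool."""
    # JSON-RPC 2.0 "tools/call" envelope as defined by the MCP specification.
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "whoami", "arguments": {}},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            # The license key travels in the header, never in the URL.
            "Authorization": f"Bearer {license_key}",
            "Content-Type": "application/json",
            # Assumption: Streamable HTTP transport (JSON or SSE responses).
            "Accept": "application/json, text/event-stream",
        },
    )
```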
4. Test the connection
Once connected, ask Claude:
Use the VidTSX MCP whoami tool.
Expected output:
# Vidtsx MCP — whoami
- **Email:** [email protected]
- **License key:** `VTSX-XXXX-XXXX-XXXX`
- **Tier:** free
- **Credits available:** 0
- **Connection:** ok
If you get this card, your auth chain is working end-to-end.
5. Diagnostics
If something looks off, hit the status endpoint directly — it always returns 200 so it's safe to probe from a browser or script:
GET https://learnwithhasan.com/vidtsx/mcp/key/mcpt_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/status/
{
"mcp_online": true,
"authenticated": true,
"user": {"email": "[email protected]", "tier": "free"},
"credits_available": 0,
"license_key": "VTSX-XXXX-XXXX-XXXX"
}
- mcp_online: false → the MCP server is down.
- authenticated: false → your MCP token was revoked or never existed. Generate a new one from the License tab.
- Otherwise, the likely cause is a stale Claude.ai connector cache: disconnect and re-add the connector.
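The triage above can be written as a small function over the parsed status JSON. This is a sketch only; the returned labels are hypothetical names chosen for illustration, not values the server emits.

```python
def diagnose(status: dict) -> str:
    """Map a parsed /status/ response to a suggested next step."""
    if not status.get("mcp_online", False):
        return "server_down"          # nothing to fix on your side
    if not status.get("authenticated", False):
        return "regenerate_token"     # token revoked or never existed
    # Server is up and the token is valid: suspect the connector cache.
    return "reconnect_connector"


if __name__ == "__main__":
    print(diagnose({"mcp_online": True, "authenticated": False}))
```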
How the credit-bearing tools work
Every credit-bearing tool — image, video, transcription — is async: the submit tool reserves
credits, queues a background job, and returns a job_id immediately. You then poll the matching
get_*_job(job_id) tool until status is completed (read the result) or
failed (read error and error_type).
Credits are reserved at submit and refunded on failure. If a job errors for any reason — safety block, timeout, audio too long, anything — your balance is restored. You never net-pay for a job that didn't produce a result.
Submit response shape (the same for all three tools):
{
"success": true,
"job_id": "00000000-0000-0000-0000-000000000000",
"status": "pending",
"request_id": "..."
}
While in flight you'll see:
{ "success": true, "job_id": "...", "status": "pending" } // or "running"
Completed and failed shapes are per-tool — see each tool's section below.
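The submit-then-poll flow can be sketched as a generic loop. The get_job callable stands in for whichever get_*_job tool matches your submit call. The interval and timeout defaults are assumptions, not documented limits.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    get_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 2.0,     # assumed polling interval
    timeout_s: float = 600.0,    # assumed client-side give-up time
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job reports "completed" or "failed", then return it."""
    deadline = clock() + timeout_s
    while True:
        job = get_job(job_id)
        # Any other status ("pending", "running") means still in flight.
        if job.get("status") in ("completed", "failed"):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still in flight after {timeout_s}s")
        sleep(interval_s)
```

Passing sleep and clock as parameters keeps the loop easy to test without real waiting.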
Tool: whoami
Diagnostic tool. Returns the calling user's email, license tier, and current credit balance. Useful for confirming your MCP connection works and seeing how many credits you have before invoking a paid tool.
Arguments: none.
Tool: generate_image
Generate an image with one of three VidTSX image models. Supports up to 14 reference images for character / location / style consistency.
Models
| Model | Quality | Cost |
|---|---|---|
| vidtsx-image-01 (default) | Fast | 4 credits |
| vidtsx-image-02 | Balanced | 7 credits |
| vidtsx-image-03 | High quality | 14 credits |
Arguments
- prompt: required. Narrative description of the image. Don't redescribe details already visible in your reference images.
- reference_image_urls: optional. Up to 14 http(s):// URLs. Pass the image_url returned from a previous job to chain shots without re-uploading.
- model: optional. Default vidtsx-image-01.
- aspect_ratio: optional. One of 1:1, 1:4, 1:8, 2:3, 3:2, 3:4, 4:1, 4:3, 4:5, 5:4, 8:1, 9:16, 16:9, 21:9. Default 3:2.
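Checking arguments locally before you submit avoids a wasted round trip. The sketch below mirrors the model costs and argument rules listed in this section. The function name build_image_args is hypothetical, and the server remains the authority on what it accepts.

```python
IMAGE_COSTS = {"vidtsx-image-01": 4, "vidtsx-image-02": 7, "vidtsx-image-03": 14}
IMAGE_ASPECT_RATIOS = {
    "1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1",
    "4:3", "4:5", "5:4", "8:1", "9:16", "16:9", "21:9",
}
MAX_IMAGE_REFERENCES = 14


def build_image_args(prompt, model="vidtsx-image-01", aspect_ratio="3:2",
                     reference_image_urls=()):
    """Return (arguments, expected_credits) or raise ValueError."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if model not in IMAGE_COSTS:
        raise ValueError(f"unknown model: {model}")
    if aspect_ratio not in IMAGE_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    refs = list(reference_image_urls)
    if len(refs) > MAX_IMAGE_REFERENCES:
        raise ValueError("at most 14 reference images are supported")
    if any(not u.startswith(("http://", "https://")) for u in refs):
        raise ValueError("reference images must be http(s):// URLs")
    args = {"prompt": prompt, "model": model, "aspect_ratio": aspect_ratio}
    if refs:
        args["reference_image_urls"] = refs
    return args, IMAGE_COSTS[model]
```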
Completed payload
Poll with get_image_job(job_id). On completed:
{
"success": true,
"job_id": "...",
"status": "completed",
"image_url": "https://s3.powerkit.dev/learnwithhasan-files/vidtsx/generated/abc.png",
"model_used": "vidtsx-image-01",
"aspect_ratio": "3:2",
"credits_consumed": 4,
"elapsed_seconds": 4.8
}
Tool: generate_video
Generate a video with one of two VidTSX video models. Both tiers support up to 4 reference images for character / subject consistency, first/last frame anchors for image-to-video, optional native audio generation, and an optional seed for deterministic output.
Models
| Model | Tier | Durations | Aspect ratios |
|---|---|---|---|
| vidtsx-video-lite (default) | Fast, 720p | 4, 5, 6, 8, 10, 12, 15 s | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9 |
| vidtsx-video-pro | High quality | 5, 10 s | 16:9, 9:16, 1:1 |
Cost: 14 credits per second on both tiers. 5 s = 70 cr, 10 s = 140 cr, 15 s = 210 cr.
Arguments
- prompt: required, always (including for image-to-video calls).
- model: optional. Default vidtsx-video-lite.
- aspect_ratio: optional, per-model. Default 16:9.
- duration_seconds: optional, per-model discrete set. Default 5.
- first_frame_url: optional. URL of an image to anchor the first frame (image-to-video when only this is set).
- last_frame_url: optional. URL of an image to anchor the last frame. Requires first_frame_url.
- reference_image_urls: optional. Up to 4 URLs.
- generate_audio: optional. When true, the model generates native audio (background, ambience, voice depending on the model). Default false.
- seed: optional integer for deterministic generation.
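The per-model limits and the per-second pricing in this section combine into a simple pre-flight check. This is a sketch under the rules stated here; the function name video_cost is hypothetical, and the server may apply further checks.

```python
CREDITS_PER_SECOND = 14
VIDEO_MODELS = {
    "vidtsx-video-lite": {
        "durations": {4, 5, 6, 8, 10, 12, 15},
        "aspect_ratios": {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"},
    },
    "vidtsx-video-pro": {
        "durations": {5, 10},
        "aspect_ratios": {"16:9", "9:16", "1:1"},
    },
}


def video_cost(model="vidtsx-video-lite", duration_seconds=5, aspect_ratio="16:9",
               first_frame_url=None, last_frame_url=None, reference_image_urls=()):
    """Validate the arguments and return the expected credit cost."""
    spec = VIDEO_MODELS.get(model)
    if spec is None:
        raise ValueError(f"unknown model: {model}")
    if duration_seconds not in spec["durations"]:
        raise ValueError(f"{model} does not support {duration_seconds} s")
    if aspect_ratio not in spec["aspect_ratios"]:
        raise ValueError(f"{model} does not support {aspect_ratio}")
    if last_frame_url and not first_frame_url:
        raise ValueError("last_frame_url requires first_frame_url")
    if len(list(reference_image_urls)) > 4:
        raise ValueError("at most 4 reference images are supported")
    return CREDITS_PER_SECOND * duration_seconds
```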
Completed payload
Poll with get_video_job(job_id). Generation typically takes 30 s to a few minutes; the
rendered MP4 is saved to VidTSX storage so the returned URL is permanent.
{
"success": true,
"job_id": "...",
"status": "completed",
"video_url": "https://s3.powerkit.dev/learnwithhasan-files/vidtsx/videos/abc.mp4",
"model_used": "vidtsx-video-lite",
"aspect_ratio": "16:9",
"duration_seconds": 5,
"has_audio": false,
"credits_consumed": 70,
"elapsed_seconds": 48.2
}
Tool: vidtsx_transcribe
Transcribe audio to text with speaker labels, word-level timestamps, and (per tier) keyterms, highlights, sentiment, or audio-event tagging. Pricing is per minute of audio, rounded up.
Tiers
| Model | Cost | Cap | Features |
|---|---|---|---|
| vidtsx-stt-fast | 1 cr / min | 24 min | Words, timestamps |
| vidtsx-stt-smart (default) | 2 cr / min | 60 min | Words, timestamps, speaker labels, keyterms, highlights, sentiment |
| vidtsx-stt-premium | 3 cr / min | 60 min | Words, timestamps, diarization, audio events. Best for Arabic. |
Examples: 5-min audio = 5 / 10 / 15 cr (fast / smart / premium). 30-min audio = 60 / 90 cr on smart / premium; it exceeds the fast tier's 24-min cap.
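The per-minute pricing can be estimated locally. The sketch below rounds the duration up to whole minutes and applies the tier caps from the table. Treating audio of exactly the cap length as allowed is an assumption, and the name transcription_cost is hypothetical.

```python
import math

# model id -> (credits per minute, duration cap in minutes)
STT_TIERS = {
    "vidtsx-stt-fast": (1, 24),
    "vidtsx-stt-smart": (2, 60),
    "vidtsx-stt-premium": (3, 60),
}


def transcription_cost(duration_seconds: float, model: str = "vidtsx-stt-smart") -> int:
    """Return the expected credits, or raise ValueError if over the tier cap."""
    if model not in STT_TIERS:
        raise ValueError(f"unknown model: {model}")
    rate, cap_minutes = STT_TIERS[model]
    # Assumption: audio exactly at the cap is still accepted.
    if duration_seconds > cap_minutes * 60:
        raise ValueError(f"audio exceeds the {cap_minutes}-min cap for {model}")
    billed_minutes = math.ceil(duration_seconds / 60)  # rounded up per minute
    return billed_minutes * rate
```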
Arguments
- model: one of the three tier IDs above. Default vidtsx-stt-smart.
- audio_url: required. http(s):// URL of the source audio. We re-host to VidTSX storage before transcribing.
- duration_seconds: optional hint. If passed, used to size the upfront credit reservation (fast-fail if your balance is short). The worker probes actual duration server-side and that value wins for billing: over-reservations are automatically refunded, under-reservations top up.
- language: ISO-639 code (e.g. en, ar) or auto. Default auto.
- word_timestamps: include word-level timestamps in the response. Default true.
- keyterms: vidtsx-stt-smart only. Up to 1000 strings to bias the transcript towards (names, jargon). Loud-fails on other tiers.
- auto_highlights: vidtsx-stt-smart only. Default true.
- sentiment_analysis: vidtsx-stt-smart only. Default false.
- audio_event_tagging: vidtsx-stt-premium only. Tag non-speech events (laughter, applause, music). Default true.
- output_format: "json" (default) or "srt". SRT is uploaded to our storage and the URL appears as srt_url on the completed job.
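To see how the duration_seconds hint interacts with billing, here is a small model of the reconciliation described above. It illustrates the arithmetic only and is not the server's implementation; the function name and the returned keys are hypothetical.

```python
import math


def reconcile_reservation(hinted_seconds: float, probed_seconds: float,
                          credits_per_minute: int) -> dict:
    """Compare the upfront reservation with the charge for the probed duration."""
    reserved = math.ceil(hinted_seconds / 60) * credits_per_minute
    actual = math.ceil(probed_seconds / 60) * credits_per_minute
    return {
        "reserved": reserved,
        "actual": actual,
        "refund": max(reserved - actual, 0),   # over-reservation comes back
        "top_up": max(actual - reserved, 0),   # under-reservation is charged
    }


if __name__ == "__main__":
    # Hint said 10 minutes, the file was really 5 minutes, smart tier (2 cr / min).
    print(reconcile_reservation(600, 300, 2))
```

An accurate hint keeps the reservation close to the final charge, so a job is less likely to need a top-up your balance cannot cover.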
Completed payload
Poll with get_transcription_job(job_id):
{
"success": true,
"job_id": "...",
"status": "completed",
"model_used": "vidtsx-stt-smart",
"text": "Hello world.",
"words": [
{"text": "Hello", "start": 0.12, "end": 0.45, "confidence": 0.99, "speaker": "A"},
{"text": "world.", "start": 0.50, "end": 0.90, "confidence": 0.97, "speaker": "A"}
],
"utterances": [
{"speaker": "A", "text": "Hello world.", "start": 0.12, "end": 0.90}
],
"highlights": [{"text": "Hello world", "rank": 0.91, "count": 1}],
"detected_language": "en",
"duration_seconds": 60,
"audio_storage_url": "https://s3.powerkit.dev/learnwithhasan-files/vidtsx/audio/abc.mp3",
"srt_url": "https://s3.powerkit.dev/learnwithhasan-files/vidtsx/transcripts/abc.srt",
"credits_consumed": 2,
"elapsed_seconds": 8.4
}
Per-tier optional fields (utterances, highlights, sentiments,
audio_events, srt_url) are omitted when not applicable — the JSON stays
tight.
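Since the per-tier fields may be absent, client code should read them defensively. A minimal sketch that prints a speaker-labelled transcript when utterances is present and falls back to the plain text field otherwise (the helper name speaker_lines is hypothetical):

```python
def speaker_lines(job: dict) -> list:
    """Return one "speaker: text" line per utterance, or the plain text."""
    utterances = job.get("utterances")
    if not utterances:
        # Tier without speaker labels: fall back to the full transcript.
        return [job.get("text", "")]
    return [f'{u["speaker"]}: {u["text"]}' for u in utterances]


if __name__ == "__main__":
    job = {
        "text": "Hello world.",
        "utterances": [{"speaker": "A", "text": "Hello world.", "start": 0.12, "end": 0.90}],
    }
    print("\n".join(speaker_lines(job)))
```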
Status polling
Each submit tool has a free polling companion:
- get_image_job(job_id): status for a generate_image job.
- get_video_job(job_id): status for a generate_video job.
- get_transcription_job(job_id): status for a vidtsx_transcribe job.
All three are free, scoped to the calling user (another user's job returns not_found), and return
the same envelope:
- While in flight: {success: true, status: "pending" or "running"}.
- On completed: tool-specific payload (see each tool above).
- On failed: {success: false, status: "failed", error, error_type}.
Error types
The error_type field on a failed job is a stable enum — branch on it instead of parsing
error messages:
| error_type | Meaning |
|---|---|
| validation | Bad input (unknown model, illegal flag combo, claimed duration over tier cap). |
| insufficient_credits | Balance below the cost of the requested call (or below the reconciled actual charge for transcription). |
| invalid_audio | Audio URL unreachable or unparseable (transcription only). |
| audio_too_long | Probed audio exceeds the tier's duration cap (transcription only). Full refund, no charge. |
| safety_block | Prompt or reference rejected by the model's safety filter (image / video). |
| quota_exceeded | Rate limit hit for the requested model. Retry later. |
| timeout | The model didn't respond within the worker's wall-clock budget. |
| language_unsupported | Explicit language code rejected by the transcription model. |
| storage_error | Couldn't fetch the source audio / image, or couldn't save the output. |
| upstream_degraded | Circuit breaker open for that model. Try again shortly. |
| unknown | Anything else. Open a ticket with the request_id from the response. |
On any failed job, the credits reserved at submit are refunded automatically. Check the Usage tab to see your MCP activity, credit consumption, and refunds in one place.
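Branching on error_type might look like the sketch below. The grouping into actions is a judgment call made for illustration, not documented behavior. In particular, treating storage_error as retryable is an assumption, since it can also mean a bad source URL.

```python
# Assumed grouping of the documented error_type values into client actions.
RETRY_LATER = {"quota_exceeded", "timeout", "upstream_degraded", "storage_error"}
FIX_REQUEST = {
    "validation", "invalid_audio", "audio_too_long",
    "safety_block", "language_unsupported",
}


def next_action(job: dict) -> str:
    """Suggest what to do with a job envelope returned by a get_*_job tool."""
    if job.get("status") != "failed":
        return "none"
    error_type = job.get("error_type", "unknown")
    if error_type in RETRY_LATER:
        return "retry_later"
    if error_type in FIX_REQUEST:
        return "fix_request"
    if error_type == "insufficient_credits":
        return "add_credits"
    # "unknown" and anything unrecognized: escalate with the request_id.
    return "contact_support"
```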