MCP — Model Context Protocol

Drive VidTSX from Claude.

Generate images, render videos, and transcribe audio straight from Claude conversations, flows, and agents — all authenticated by your VidTSX license key, billed against your credit balance, and visible in your usage log.

Overview

MCP (Model Context Protocol) lets AI assistants like Claude call your own server-side tools. The VidTSX MCP exposes three credit-bearing capabilities — image generation, video generation, and speech-to-text — plus a free whoami diagnostic. Every call is billed against your VidTSX credit balance and recorded in your Usage log.

The credit-bearing tools are asynchronous: submit returns a job_id immediately, and you poll the matching get_*_job tool until the job is completed or failed. Credits are reserved at submit and refunded automatically on any failure.

1. Generate an MCP token

Claude.ai's connector UI accepts only a URL — it can't send custom HTTP headers — so the credential has to live in the URL itself. To keep your main license key out of URLs, browser histories, and access logs, the keyed route uses a separate MCP token bound to your license. Each token can be revoked independently; revoking one never affects your license key or the desktop app.

Generate one from the License tab (log in first) — click Generate MCP token and copy the URL once. The format is:

https://learnwithhasan.com/vidtsx/mcp/key/mcpt_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/
Treat this URL as a credential. Anyone who has it can spend credits from your account. Don't paste it into shared docs, screenshots, or public chats. If it leaks, revoke that single token from the License tab — your license key keeps working.
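
Because the token sits in the URL, it is worth scrubbing it before a URL ends up in logs or bug reports. A minimal sketch, assuming tokens are `mcpt_` followed by 32 letters or digits (inferred from the placeholder above; the real alphabet may differ) and using a hypothetical helper name, `redact_mcp_url`:

```python
import re

# Assumption: a token is "mcpt_" plus 32 alphanumeric characters, as the
# placeholder in the URL format suggests. Adjust if real tokens differ.
_TOKEN_RE = re.compile(r"mcpt_[A-Za-z0-9]{32}")


def redact_mcp_url(text: str) -> str:
    """Replace any MCP token found in `text` with a fixed placeholder."""
    return _TOKEN_RE.sub("mcpt_<redacted>", text)


url = "https://learnwithhasan.com/vidtsx/mcp/key/mcpt_" + "X" * 32 + "/"
print(redact_mcp_url(url))
# prints: https://learnwithhasan.com/vidtsx/mcp/key/mcpt_<redacted>/
```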

2. Connect from Claude.ai

  1. Open Claude.ai → Settings → Connectors.
  2. Click Add custom connector.
  3. Name: VidTSX MCP (or anything you like).
  4. Paste the URL from step 1.
  5. Leave the OAuth fields empty — auth is handled by the URL.
  6. Save, then enable the connector for your conversation.

3. Connect with a Bearer token (Claude Desktop / Claude Code)

Clients that can send custom HTTP headers should use the canonical, header-authenticated URL — your license key stays out of the URL bar and access logs.

URL:    https://learnwithhasan.com/vidtsx/mcp/
Header: Authorization: Bearer VTSX-XXXX-XXXX-XXXX

For Claude Code, add the connector via the slash command or your MCP config file.
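
For a custom client, the header-authenticated route might be wired up roughly as follows. This is only a sketch using Python's standard library: `build_request` is a hypothetical helper, and the request body is left as an opaque placeholder because this page does not describe the wire format. Nothing is sent over the network here.

```python
import urllib.request

MCP_URL = "https://learnwithhasan.com/vidtsx/mcp/"


def build_request(license_key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST whose credential lives in a header."""
    return urllib.request.Request(
        MCP_URL,
        data=body,  # placeholder payload; the real format is not covered here
        headers={
            "Authorization": f"Bearer {license_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("VTSX-XXXX-XXXX-XXXX", b"{}")
print(req.full_url)                     # the URL carries no credential
print(req.get_header("Authorization"))  # the key travels in the header
```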

4. Test the connection

Once connected, ask Claude:

Use the VidTSX MCP whoami tool.

Expected output:

# Vidtsx MCP — whoami

- **Email:** [email protected]
- **License key:** `VTSX-XXXX-XXXX-XXXX`
- **Tier:** free
- **Credits available:** 0
- **Connection:** ok

If you get this card, your auth chain is working end-to-end.

5. Diagnostics

If something looks off, hit the status endpoint directly — it always returns 200 so it's safe to probe from a browser or script:

GET https://learnwithhasan.com/vidtsx/mcp/key/mcpt_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/status/
Response
{
  "mcp_online": true,
  "authenticated": true,
  "user": {"email": "[email protected]", "tier": "free"},
  "credits_available": 0,
  "license_key": "VTSX-XXXX-XXXX-XXXX"
}
  • mcp_online: false → the MCP server is down.
  • authenticated: false → your MCP token was revoked or never existed. Generate a new one from the License tab.
  • Otherwise the likely cause is a stale Claude.ai connector cache: disconnect and re-add the connector.
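
The three cases above can be turned into a small triage function for scripts. A minimal sketch, assuming the status response has already been fetched and parsed into a dict; `diagnose` and its return strings are hypothetical:

```python
def diagnose(status: dict) -> str:
    """Map a parsed /status/ response to the matching troubleshooting step."""
    if not status.get("mcp_online", False):
        return "MCP server is down"
    if not status.get("authenticated", False):
        return "token revoked or unknown: generate a new one from the License tab"
    return "server and token look fine: disconnect and re-add the connector"


print(diagnose({"mcp_online": True, "authenticated": False}))
# prints: token revoked or unknown: generate a new one from the License tab
```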

How the credit-bearing tools work

Every credit-bearing tool — image, video, transcription — is async: the submit tool reserves credits, queues a background job, and returns a job_id immediately. You then poll the matching get_*_job(job_id) tool until status is completed (read the result) or failed (read error and error_type).

Credits are reserved at submit and refunded on failure. If a job errors for any reason — safety block, timeout, audio too long, anything — your balance is restored. You never net-pay for a job that didn't produce a result.

Submit response shape (the same for all three tools):

{
  "success": true,
  "job_id": "00000000-0000-0000-0000-000000000000",
  "status": "pending",
  "request_id": "..."
}

While in flight you'll see:

{ "success": true, "job_id": "...", "status": "pending" }  // or "running"

Completed and failed shapes are per-tool — see each tool's section below.
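
The submit-then-poll flow can be sketched as a small loop. Inside Claude the model makes these tool calls itself, so this only matters for custom clients. `call_tool` stands in for whatever function your MCP client uses to invoke a tool by name; it, `wait_for_job`, the polling interval, and the fake server below are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_job(call_tool: ToolCaller, poll_tool: str, job_id: str,
                 interval: float = 2.0, max_polls: int = 150,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll `poll_tool` until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = call_tool(poll_tool, {"job_id": job_id})
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(interval)  # still "pending" or "running"
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def make_fake_caller() -> ToolCaller:
    """In-memory stand-in for a real MCP client; completes on the third poll."""
    polls = {"n": 0}

    def call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if name == "generate_image":
            return {"success": True, "job_id": "job-1", "status": "pending"}
        polls["n"] += 1
        status = "running" if polls["n"] < 3 else "completed"
        return {"success": True, "job_id": args["job_id"], "status": status}

    return call_tool


call_tool = make_fake_caller()
submitted = call_tool("generate_image", {"prompt": "a red kite over a beach"})
result = wait_for_job(call_tool, "get_image_job", submitted["job_id"],
                      sleep=lambda seconds: None)  # skip real waiting in the demo
print(result["status"])  # prints: completed
```

Capping the number of polls keeps a stuck job from blocking the caller forever; the cap and interval here are arbitrary.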

Tool: whoami

whoami() Free

Diagnostic tool. Returns the calling user's email, license tier, and current credit balance. Useful for confirming your MCP connection works and seeing how many credits you have before invoking a paid tool.

Arguments: none.

Tool: generate_image

generate_image(prompt, …) 4 / 7 / 14 cr

Generate an image with one of three VidTSX image models. Supports up to 14 reference images for character / location / style consistency.

Models

Model                       Quality        Cost
vidtsx-image-01 (default)   Fast           4 credits
vidtsx-image-02             Balanced       7 credits
vidtsx-image-03             High quality   14 credits

Arguments

  • prompt — required. Narrative description of the image. Don't redescribe details already visible in your reference images.
  • reference_image_urls — optional. Up to 14 http(s):// URLs. Pass the image_url returned from a previous job to chain shots without re-uploading.
  • model — optional. Default vidtsx-image-01.
  • aspect_ratio — optional. One of 1:1, 1:4, 1:8, 2:3, 3:2, 3:4, 4:1, 4:3, 4:5, 5:4, 8:1, 9:16, 16:9, 21:9. Default 3:2.
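
Checking arguments on the client before submitting avoids a round trip that would only come back as a validation failure. A minimal sketch built from the limits listed above; `build_image_args` is a hypothetical helper, and the server remains the authority on what is valid:

```python
IMAGE_CREDITS = {"vidtsx-image-01": 4, "vidtsx-image-02": 7, "vidtsx-image-03": 14}
ASPECT_RATIOS = {"1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1",
                 "4:3", "4:5", "5:4", "8:1", "9:16", "16:9", "21:9"}
MAX_REFERENCE_IMAGES = 14


def build_image_args(prompt, model="vidtsx-image-01", aspect_ratio="3:2",
                     reference_image_urls=()):
    """Return (arguments, expected_credits) or raise ValueError on bad input."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if model not in IMAGE_CREDITS:
        raise ValueError(f"unknown model: {model}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    refs = list(reference_image_urls)
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError("at most 14 reference images are allowed")
    if any(not url.startswith(("http://", "https://")) for url in refs):
        raise ValueError("reference images must be http(s):// URLs")
    args = {"prompt": prompt, "model": model, "aspect_ratio": aspect_ratio}
    if refs:
        args["reference_image_urls"] = refs
    return args, IMAGE_CREDITS[model]


args, credits = build_image_args("a lighthouse at dusk", model="vidtsx-image-03")
print(credits)  # prints: 14
```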

Completed payload

Poll with get_image_job(job_id). On completed:

{
  "success": true,
  "job_id": "...",
  "status": "completed",
  "image_url": "https://s3.powerkit.dev/learnwithhasan-files/vidtsx/generated/abc.png",
  "model_used": "vidtsx-image-01",
  "aspect_ratio": "3:2",
  "credits_consumed": 4,
  "elapsed_seconds": 4.8
}
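
Chaining shots means feeding the image_url of a completed job into the next call's reference_image_urls. A small sketch of that hand-off; `next_shot_args` is a hypothetical helper and the payload is trimmed to the fields it reads:

```python
def next_shot_args(completed_job: dict, prompt: str) -> dict:
    """Build generate_image arguments that reuse a finished job's output."""
    if completed_job.get("status") != "completed":
        raise ValueError("previous job has not completed")
    return {
        "prompt": prompt,
        "reference_image_urls": [completed_job["image_url"]],
    }


first_shot = {
    "status": "completed",
    "image_url": "https://s3.powerkit.dev/learnwithhasan-files/vidtsx/generated/abc.png",
}
follow_up = next_shot_args(first_shot, "the same character, now walking in the rain")
print(follow_up["reference_image_urls"])
```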

Tool: generate_video

generate_video(prompt, …) 14 cr / sec

Generate a video with one of two VidTSX video models. Both tiers support up to 4 reference images for character / subject consistency, first/last frame anchors for image-to-video, optional native audio generation, and an optional seed for deterministic output.

Models

Model                         Tier           Durations                  Aspect ratios
vidtsx-video-lite (default)   Fast, 720p     4, 5, 6, 8, 10, 12, 15 s   16:9, 9:16, 1:1, 4:3, 3:4, 21:9
vidtsx-video-pro              High quality   5, 10 s                    16:9, 9:16, 1:1

Cost: 14 credits per second on both tiers. 5 s = 70 cr, 10 s = 140 cr, 15 s = 210 cr.
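
The pricing rule is simple enough to check in code before submitting. A minimal sketch using the per-model duration sets from the table; `video_cost` is a hypothetical helper:

```python
CREDITS_PER_SECOND = 14
VIDEO_DURATIONS = {
    "vidtsx-video-lite": (4, 5, 6, 8, 10, 12, 15),
    "vidtsx-video-pro": (5, 10),
}


def video_cost(model: str, duration_seconds: int = 5) -> int:
    """Credits for one video, rejecting durations the model does not offer."""
    allowed = VIDEO_DURATIONS.get(model)
    if allowed is None:
        raise ValueError(f"unknown model: {model}")
    if duration_seconds not in allowed:
        raise ValueError(f"{model} supports durations {allowed}")
    return CREDITS_PER_SECOND * duration_seconds


print(video_cost("vidtsx-video-lite", 15))  # prints: 210
```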

Arguments

  • prompt — required, always (including for image-to-video calls).
  • model — optional. Default vidtsx-video-lite.
  • aspect_ratio — optional, per-model. Default 16:9.
  • duration_seconds — optional, per-model discrete set. Default 5.
  • first_frame_url — optional. URL of an image to anchor the first frame (image-to-video when only this is set).
  • last_frame_url — optional. URL of an image to anchor the last frame. Requires first_frame_url.
  • reference_image_urls — optional. Up to 4 URLs.
  • generate_audio — optional. When true, the model generates native audio (background, ambience, voice depending on the model). Default false.
  • seed — optional integer for deterministic generation.
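
A few of these arguments constrain each other, which is easy to get wrong when building calls programmatically. The sketch below collects the rules stated above into one client-side check; `check_video_args` is a hypothetical helper, and it mirrors this page rather than the server's actual validator:

```python
from typing import List

VIDEO_ASPECT_RATIOS = {
    "vidtsx-video-lite": {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"},
    "vidtsx-video-pro": {"16:9", "9:16", "1:1"},
}
MAX_VIDEO_REFERENCES = 4


def check_video_args(args: dict) -> List[str]:
    """Return a list of problems; an empty list means the arguments look fine."""
    problems = []
    model = args.get("model", "vidtsx-video-lite")
    if not args.get("prompt"):
        problems.append("prompt is required, even for image-to-video")
    allowed = VIDEO_ASPECT_RATIOS.get(model)
    if allowed is None:
        problems.append(f"unknown model: {model}")
    elif args.get("aspect_ratio", "16:9") not in allowed:
        problems.append(f"aspect ratio not supported by {model}")
    if args.get("last_frame_url") and not args.get("first_frame_url"):
        problems.append("last_frame_url requires first_frame_url")
    if len(args.get("reference_image_urls", [])) > MAX_VIDEO_REFERENCES:
        problems.append("at most 4 reference images are allowed")
    return problems


print(check_video_args({"prompt": "a slow pan", "last_frame_url": "https://example.com/end.png"}))
# prints: ['last_frame_url requires first_frame_url']
```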

Completed payload

Poll with get_video_job(job_id). Generation typically takes 30 s to a few minutes; the rendered MP4 is saved to VidTSX storage so the returned URL is permanent.

{
  "success": true,
  "job_id": "...",
  "status": "completed",
  "video_url": "https://s3.powerkit.dev/learnwithhasan-files/vidtsx/videos/abc.mp4",
  "model_used": "vidtsx-video-lite",
  "aspect_ratio": "16:9",
  "duration_seconds": 5,
  "has_audio": false,
  "credits_consumed": 70,
  "elapsed_seconds": 48.2
}

Tool: vidtsx_transcribe

vidtsx_transcribe(model, audio_url, …) 1 / 2 / 3 cr / min

Transcribe audio to text with speaker labels, word-level timestamps, and (per tier) keyterms, highlights, sentiment, or audio-event tagging. Pricing is per minute of audio, rounded up.

Tiers

Model                        Cost         Cap      Features
vidtsx-stt-fast              1 cr / min   24 min   Words, timestamps
vidtsx-stt-smart (default)   2 cr / min   60 min   Words, timestamps, speaker labels, keyterms, highlights, sentiment
vidtsx-stt-premium           3 cr / min   60 min   Words, timestamps, diarization, audio events. Best for Arabic.

Examples: 5-min audio = 5 / 10 / 15 cr. 30-min audio = 30 / 60 / 90 cr.
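
The per-minute arithmetic can be sketched as follows. It assumes "rounded up" means the audio length is rounded up to whole minutes before multiplying by the tier rate; `transcription_cost` is a hypothetical helper.

```python
import math

# (credits per minute, duration cap in minutes) for each tier
STT_TIERS = {
    "vidtsx-stt-fast": (1, 24),
    "vidtsx-stt-smart": (2, 60),
    "vidtsx-stt-premium": (3, 60),
}


def transcription_cost(model: str, duration_seconds: float) -> int:
    """Credits for one transcription, or ValueError if over the tier cap."""
    rate, cap_minutes = STT_TIERS[model]
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > cap_minutes * 60:
        raise ValueError(f"{model} accepts at most {cap_minutes} minutes of audio")
    billed_minutes = math.ceil(duration_seconds / 60)  # assumption: round up to whole minutes
    return billed_minutes * rate


for model in STT_TIERS:
    print(model, transcription_cost(model, 5 * 60))
# prints 5, 10 and 15 credits for the three tiers
```

Under this assumption a 61-second clip on vidtsx-stt-smart is billed as 2 minutes, or 4 credits.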

Arguments

  • model — one of the three tier IDs above. Default vidtsx-stt-smart.
  • audio_url — required. http(s):// URL of the source audio. We re-host to VidTSX storage before transcribing.
  • duration_seconds — optional hint. If passed, it sizes the upfront credit reservation, so the call fails fast if your balance is short. The worker probes the actual duration server-side and bills on that value: over-reservations are refunded automatically, and under-reservations are charged the difference.
  • language — ISO-639 code (e.g. en, ar) or auto. Default auto.
  • word_timestamps — include word-level timestamps in the response. Default true.
  • keyterms — vidtsx-stt-smart only. Up to 1000 strings to bias the transcript towards (names, jargon). Fails loudly on other tiers.
  • auto_highlights — vidtsx-stt-smart only. Default true.
  • sentiment_analysis — vidtsx-stt-smart only. Default false.
  • audio_event_tagging — vidtsx-stt-premium only. Tag non-speech events (laughter, applause, music). Default true.
  • output_format — "json" (default) or "srt". SRT is uploaded to our storage and the URL appears as srt_url on the completed job.
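
The duration_seconds hint and the server-side probe can disagree, and the difference is settled after the job runs. The sketch below illustrates that reconciliation on the vidtsx-stt-smart rate. It is an illustration of the billing rule as described here, not the server's code; `reconcile`, `smart_cost`, and the whole-minute rounding are assumptions.

```python
import math

SMART_RATE = 2  # credits per minute on vidtsx-stt-smart


def smart_cost(duration_seconds: float) -> int:
    """Assumed pricing: whole minutes, rounded up, times the tier rate."""
    return math.ceil(duration_seconds / 60) * SMART_RATE


def reconcile(reserved: int, actual: int) -> dict:
    """Compare the upfront reservation with the charge for the probed duration."""
    difference = reserved - actual
    return {
        "charged": actual,
        "refund": max(difference, 0),   # hint was too high
        "top_up": max(-difference, 0),  # hint was too low
    }


# Hint says 2 minutes, the probe finds 3 min 5 s: 4 reserved, 8 owed.
print(reconcile(smart_cost(120), smart_cost(185)))
# prints: {'charged': 8, 'refund': 0, 'top_up': 4}

# Hint says 10 minutes for the same file: 20 reserved, 8 owed.
print(reconcile(smart_cost(600), smart_cost(185)))
# prints: {'charged': 8, 'refund': 12, 'top_up': 0}
```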

Completed payload

Poll with get_transcription_job(job_id):

{
  "success": true,
  "job_id": "...",
  "status": "completed",
  "model_used": "vidtsx-stt-smart",
  "text": "Hello world.",
  "words": [
    {"text": "Hello",  "start": 0.12, "end": 0.45, "confidence": 0.99, "speaker": "A"},
    {"text": "world.", "start": 0.50, "end": 0.90, "confidence": 0.97, "speaker": "A"}
  ],
  "utterances": [
    {"speaker": "A", "text": "Hello world.", "start": 0.12, "end": 0.90}
  ],
  "highlights": [{"text": "Hello world", "rank": 0.91, "count": 1}],
  "detected_language": "en",
  "duration_seconds": 60,
  "audio_storage_url": "https://s3.powerkit.dev/learnwithhasan-files/vidtsx/audio/abc.mp3",
  "srt_url": "https://s3.powerkit.dev/learnwithhasan-files/vidtsx/transcripts/abc.srt",
  "credits_consumed": 2,
  "elapsed_seconds": 8.4
}

Per-tier optional fields (utterances, highlights, sentiments, audio_events, srt_url) are omitted when not applicable — the JSON stays tight.
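
Since the optional fields may be missing, client code should read them defensively. A minimal sketch that prefers utterances when the tier provides them and falls back to the plain text otherwise; `speaker_turns` is a hypothetical helper and the payloads are trimmed versions of the example above:

```python
from typing import List


def speaker_turns(job: dict) -> List[str]:
    """Return one line per speaker turn, or the whole text if no turns exist."""
    if job.get("status") != "completed":
        raise ValueError("job is not completed")
    utterances = job.get("utterances")  # omitted on tiers without speaker labels
    if utterances:
        return [f"{u['speaker']}: {u['text']}" for u in utterances]
    return [job["text"]]


with_turns = {
    "status": "completed",
    "text": "Hello world.",
    "utterances": [{"speaker": "A", "text": "Hello world.", "start": 0.12, "end": 0.90}],
}
without_turns = {"status": "completed", "text": "Hello world."}

print(speaker_turns(with_turns))     # prints: ['A: Hello world.']
print(speaker_turns(without_turns))  # prints: ['Hello world.']
```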

Status polling

Each submit tool has a free polling companion:

  • get_image_job(job_id) — status for a generate_image job.
  • get_video_job(job_id) — status for a generate_video job.
  • get_transcription_job(job_id) — status for a vidtsx_transcribe job.

All three are free, scoped to the calling user (another user's job returns not_found), and return the same envelope:

  • While in flight: {success: true, status: "pending" or "running"}.
  • On completed: tool-specific payload (see each tool above).
  • On failed: {success: false, status: "failed", error, error_type}.

Error types

The error_type field on a failed job is a stable enum — branch on it instead of parsing error messages:

error_type             Meaning
validation             Bad input (unknown model, illegal flag combo, claimed duration over tier cap).
insufficient_credits   Balance below the cost of the requested call (or below the reconciled actual charge for transcription).
invalid_audio          Audio URL unreachable or unparseable (transcription only).
audio_too_long         Probed audio exceeds the tier's duration cap (transcription only). Full refund, no charge.
safety_block           Prompt or reference rejected by the model's safety filter (image / video).
quota_exceeded         Rate limit hit for the requested model. Retry later.
timeout                The model didn't respond within the worker's wall-clock budget.
language_unsupported   Explicit language code rejected by the transcription model.
storage_error          Couldn't fetch the source audio / image, or couldn't save the output.
upstream_degraded      Circuit breaker open for that model. Try again shortly.
unknown                Anything else. Open a ticket with the request_id from the response.

On any failed job, the credits reserved at submit are refunded automatically. Check the Usage tab to see your MCP activity, credit consumption, and refunds in one place.
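
Branching on error_type might look like the sketch below. Only the handling of quota_exceeded and upstream_degraded (retry later) and unknown (open a ticket) comes from the table; grouping timeout and storage_error with the retryable errors, and the action names themselves, are assumptions made for illustration.

```python
# Stated in the table: these two clear up on their own.
RETRY_LATER = {"quota_exceeded", "upstream_degraded"}
# Assumption: treated as transient here; the table does not say so.
PROBABLY_TRANSIENT = {"timeout", "storage_error"}
# The caller has to change the request before trying again.
FIX_INPUT = {"validation", "invalid_audio", "audio_too_long",
             "safety_block", "language_unsupported"}


def next_action(job: dict) -> str:
    """Pick a follow-up for a polled job based on its stable error_type."""
    if job.get("status") != "failed":
        return "none"
    error_type = job.get("error_type", "unknown")
    if error_type in RETRY_LATER or error_type in PROBABLY_TRANSIENT:
        return "retry_later"
    if error_type in FIX_INPUT:
        return "fix_input"
    if error_type == "insufficient_credits":
        return "add_credits"
    return "open_ticket"  # "unknown" and anything not listed


failed = {"success": False, "status": "failed",
          "error": "rate limit hit", "error_type": "quota_exceeded"}
print(next_action(failed))  # prints: retry_later
```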