YouTube Transcripts
Extract transcripts from YouTube videos as plain text or with per-segment timing data.
Setup
Set your SearchAPI key in .env:
SEARCHAPI_API_KEY=your_searchapi_key
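A missing or placeholder key is an easy mistake to make, so a quick check before the first call can help. The sketch below only reads the process environment; it assumes the .env file has already been loaded into it, which this page does not state the library does for you. The helper name get_searchapi_key is hypothetical and not part of SimplerLLM.

```python
import os
from typing import Mapping, Optional


def get_searchapi_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return SEARCHAPI_API_KEY, or raise if it is missing or still a placeholder.

    Hypothetical helper for illustration; not part of SimplerLLM.
    """
    env = os.environ if environ is None else environ
    key = env.get("SEARCHAPI_API_KEY", "").strip()
    # Treat the placeholder from the setup instructions as "not configured".
    if not key or key == "your_searchapi_key":
        raise RuntimeError("SEARCHAPI_API_KEY is not set; add it to your .env file")
    return key


if __name__ == "__main__":
    try:
        get_searchapi_key()
        print("SearchAPI key found")
    except RuntimeError as exc:
        print(f"Configuration problem: {exc}")
```

Passing a plain dict as environ makes the check easy to exercise without touching the real environment.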
Basic Usage
Get the full transcript as a single string:
from SimplerLLM.tools.youtube import get_youtube_transcript
transcript = get_youtube_transcript("https://www.youtube.com/watch?v=VIDEO_ID")
print(transcript) # "First sentence. Second sentence. Third sentence."
With Timing
Get the transcript with start time and duration for each segment:
from SimplerLLM.tools.youtube import get_youtube_transcript_with_timing
transcript = get_youtube_transcript_with_timing("https://www.youtube.com/watch?v=VIDEO_ID")
for segment in transcript.segments:
    print(f"[{segment.start:.1f}s] {segment.text}")
Functions
| Function | Returns | Description |
|---|---|---|
| get_youtube_transcript(video_url) | str | Full transcript as a single string with periods added |
| get_youtube_transcript_with_timing(video_url) | Transcript | Transcript with timing data per segment |
Both functions accept the same parameter:
| Parameter | Type | Description |
|---|---|---|
| video_url | str | YouTube video URL |
Response Format
get_youtube_transcript_with_timing() returns a Transcript object:
transcript = get_youtube_transcript_with_timing("https://youtu.be/VIDEO_ID")
print(len(transcript.segments)) # Number of segments
segment = transcript.segments[0]
print(segment.text) # "Hello and welcome"
print(segment.start) # 0.0 (seconds)
print(segment.duration) # 2.5 (seconds)
| Field | Type | Description |
|---|---|---|
| Transcript.segments | List[TranscriptSegment] | List of transcript segments |
| TranscriptSegment.text | str | Segment text content |
| TranscriptSegment.start | float | Start time in seconds |
| TranscriptSegment.duration | float | Duration in seconds |
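Because each segment carries a start and a duration, its end time is simply start + duration. The sketch below turns segments into timestamped lines. To stay runnable without an API key, it uses a stand-in Segment dataclass that mirrors the three documented fields; Segment, format_timestamp, and render_lines are illustrative names, not part of SimplerLLM.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Stand-in with the same fields as TranscriptSegment (text, start, duration)."""
    text: str
    start: float
    duration: float


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_lines(segments: Iterable[Segment]) -> List[str]:
    """Build one '[start - end] text' line per segment."""
    lines = []
    for seg in segments:
        end = seg.start + seg.duration  # end time derived from the two fields
        lines.append(
            f"[{format_timestamp(seg.start)} - {format_timestamp(end)}] {seg.text}"
        )
    return lines


if __name__ == "__main__":
    demo = [Segment("Hello and welcome", 0.0, 2.5)]
    for line in render_lines(demo):
        print(line)
```

Since render_lines only reads text, start, and duration, the objects in transcript.segments should work in place of the stand-in. Minutes are not wrapped into hours, so a segment at 3900 seconds renders as 65:00.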
Supported URL Formats
Both standard and shortened YouTube URLs work:
# Standard
get_youtube_transcript("https://www.youtube.com/watch?v=VIDEO_ID")
# Shortened
get_youtube_transcript("https://youtu.be/VIDEO_ID")
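If you want to validate user input before making a request, the two URL shapes above can be recognized with the standard library. This is a minimal sketch of one way to pull out the video ID; it is not the library's own parsing logic, and extract_video_id is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a standard or shortened YouTube URL, else None.

    Hypothetical helper for illustration; not part of SimplerLLM.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Shortened form: https://youtu.be/VIDEO_ID
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None

    # Standard form: https://www.youtube.com/watch?v=VIDEO_ID
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None


if __name__ == "__main__":
    for candidate_url in (
        "https://www.youtube.com/watch?v=VIDEO_ID",
        "https://youtu.be/VIDEO_ID",
        "https://example.com/watch?v=VIDEO_ID",
    ):
        print(candidate_url, "->", extract_video_id(candidate_url))
```

Returning None for unrecognized hosts lets the caller decide whether to reject the input or pass it through unchanged.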