Content Loading

Load text content from URLs, PDFs, Word documents, text files, and CSVs

Basic Usage

from SimplerLLM.tools.generic_loader import load_content

# From a URL
doc = load_content("https://example.com/article")
print(doc.content)

# From a local file
doc = load_content("report.pdf")
print(doc.content)

Supported Formats

Format	Extensions	Parser
Web articles	URLs (`http://`, `https://`)	newspaper
PDF	`.pdf`	PyPDF2
Word	`.docx`	python-docx
Text	`.txt`	Built-in
CSV (as text)	`.csv`	Built-in
Other	Any	Falls back to plain text

Note: For structured CSV data with row/column access, use read_csv_file() instead.

Loading from URL

Extracts clean text from web articles and blog posts:

from SimplerLLM.tools.generic_loader import load_content

doc = load_content("https://example.com/blog-post")

print(doc.title)            # Article title
print(doc.word_count)       # 1250
print(doc.content[:200])    # First 200 characters

Loading from File

Works with PDF, Word, and text files:

from SimplerLLM.tools.generic_loader import load_content

# PDF
doc = load_content("report.pdf")
print(f"{doc.word_count} words, {doc.file_size} bytes")

# Word document
doc = load_content("notes.docx")
print(doc.content)

# Text file
doc = load_content("data.txt")
print(doc.content)

Loading CSV Files

For structured CSV access with row and column data:

from SimplerLLM.tools.file_loader import read_csv_file

csv_doc = read_csv_file("data.csv")

print(csv_doc.row_count)        # 100
print(csv_doc.column_count)     # 5
print(csv_doc.content[0])       # First row (list of strings)
print(csv_doc.content[0][0])    # First cell

Response Format

TextDocument

Returned by load_content():

doc = load_content("report.pdf")

print(doc.content)           # The extracted text
print(doc.word_count)        # 1250
print(doc.character_count)   # 7500
print(doc.file_size)         # 45000 (bytes)
print(doc.title)             # Title (populated for URLs)
print(doc.url_or_path)       # Source path or URL

Field	Type	Description
`content`	`str`	The extracted text content
`word_count`	`int`	Number of words
`character_count`	`int`	Number of characters
`file_size`	`int` or `None`	File size in bytes
`title`	`str` or `None`	Document title (populated for URLs)
`url_or_path`	`str` or `None`	Source URL or file path

CSVDocument

Returned by read_csv_file():

Field	Type	Description
`content`	`List[List[str]]`	Rows of CSV data
`row_count`	`int`	Number of rows
`column_count`	`int`	Number of columns
`total_fields`	`int`	Total number of fields across all rows
`file_size`	`int` or `None`	File size in bytes
`url_or_path`	`str` or `None`	File path

Content Loading

Basic Usage

Supported Formats

Loading from URL

Loading from File

Loading CSV Files

Response Format

TextDocument

CSVDocument

We use cookies

Essential

Analytics

Marketing