Chunk Any Piece of Text

🔑 ID: 35318

👨‍💻 Python

🕒 18/03/2024

Free

Description:

This function splits (chunks) any piece of text into sentences or paragraphs, depending on which you need.

 

Code:

import re
from typing import List

def chunk_text(text: str, chunk_by: str) -> List[str]:
    """
    Splits the input text into chunks based on the specified granularity (sentences or paragraphs).

    Parameters:
    - text: The input text to be chunked.
    - chunk_by: The granularity for chunking ('sentence' or 'paragraph').

    Returns:
    - A list of non-empty, whitespace-stripped strings, where each string is a chunk
      of the original text. Sentence chunks do not include their closing punctuation.

    Raises:
    - ValueError: If chunk_by is neither 'sentence' nor 'paragraph'.
    """
    if chunk_by == "sentence":
        # Split on '.', '?' or '!' unless a digit sits directly before or after it,
        # so decimals such as 3.14 stay in one piece. The delimiter itself is dropped.
        sentences = re.split(r'(?<!\d)[.?!](?!\d)', text)
        return [sentence.strip() for sentence in sentences if sentence.strip()]
    elif chunk_by == "paragraph":
        # Every non-empty line is treated as one paragraph.
        return [paragraph.strip() for paragraph in text.split("\n") if paragraph.strip()]
    else:
        raise ValueError("Invalid chunk_by value. Choose 'sentence' or 'paragraph'.")
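
The function above drops the punctuation that ends each sentence and treats every line break as a paragraph boundary. If that does not suit your text, one possible variation is sketched below. The name chunk_text_keep_punct and both regular expressions are assumptions made for illustration and are not part of the original snippet: sentences are split at the whitespace following '.', '?' or '!', so the punctuation stays attached, and paragraphs are split on blank lines only.

```python
import re
from typing import List


def chunk_text_keep_punct(text: str, chunk_by: str) -> List[str]:
    """Hypothetical variant: keeps closing punctuation on sentences and
    treats blank lines (not single newlines) as paragraph breaks."""
    if chunk_by == "sentence":
        # Split at whitespace that follows '.', '?' or '!'. The lookbehind
        # consumes no characters, so the punctuation stays with its sentence.
        parts = re.split(r'(?<=[.?!])\s+', text)
    elif chunk_by == "paragraph":
        # A newline, optional whitespace, then another newline marks a blank line.
        parts = re.split(r'\n\s*\n', text)
    else:
        raise ValueError("Invalid chunk_by value. Choose 'sentence' or 'paragraph'.")
    # Drop empty pieces and trim surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


sample = "Pi is about 3.14. Is that exact? No!\n\nSecond paragraph here."
sentences = chunk_text_keep_punct(sample, "sentence")
paragraphs = chunk_text_keep_punct(sample, "paragraph")
print(sentences)   # ['Pi is about 3.14.', 'Is that exact?', 'No!', 'Second paragraph here.']
print(paragraphs)  # ['Pi is about 3.14. Is that exact? No!', 'Second paragraph here.']
```

Because this sketch only splits where whitespace follows the punctuation, decimals such as 3.14 stay intact without a digit check, but abbreviations such as "e.g. this" would still be split.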

 


If you’re encountering any problems or need further assistance with this code, we’re here to help! Join our community on the forum or Discord for support, tips, and discussion.