Chunk Any Piece of Text
🔑 ID: 35318
👨‍💻 Python
🕒 18/03/2024
Free
Description:
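This function splits (chunks) any piece of text into sentences or paragraphs, selected with the chunk_by argument. In "sentence" mode it splits on ., ? and ! (leaving decimals such as 3.14 intact) and drops the punctuation. In "paragraph" mode every non-empty line becomes one chunk.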
Code:
import re
from typing import List


def chunk_text(text: str, chunk_by: str) -> List[str]:
    """
    Splits the input text into chunks based on the specified granularity
    (sentences or paragraphs).

    Parameters:
    - text: The input text to be chunked.
    - chunk_by: The granularity for chunking ('sentence' or 'paragraph').

    Returns:
    - A list of strings, where each string is a chunk of the original text.
    """
    if chunk_by == "sentence":
        # Split on '.', '?' or '!' unless the mark sits between two digits
        # (e.g. the decimal point in 3.14). A sentence ending in a number,
        # such as "The answer is 42.", is still split. The marks are dropped.
        sentences = re.split(r"[.?!](?!\d)|(?<!\d)[.?!]", text)
        return [sentence.strip() for sentence in sentences if sentence.strip()]
    elif chunk_by == "paragraph":
        # Every non-empty line is treated as one paragraph.
        return [line.strip() for line in text.split("\n") if line.strip()]
    else:
        raise ValueError("Invalid chunk_by value. Choose 'sentence' or 'paragraph'.")
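The function has two side effects worth knowing: sentence mode strips the terminal punctuation, and paragraph mode treats every line as a paragraph, which breaks hard-wrapped text apart. The sketch below shows two possible alternatives. The names split_sentences_keep_punct and split_paragraphs_on_blank_lines are hypothetical and not part of the original snippet. It assumes that sentences end in ., ? or ! followed by whitespace and that paragraphs are separated by blank lines.

```python
import re
from typing import List


def split_sentences_keep_punct(text: str) -> List[str]:
    """Hypothetical variant: split after '.', '?' or '!' when whitespace follows.

    The lookbehind keeps the punctuation attached to each sentence, and decimals
    such as 3.14 stay intact because no whitespace follows their period.
    Abbreviations such as "e.g. " will still cause a split.
    """
    parts = re.split(r"(?<=[.?!])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def split_paragraphs_on_blank_lines(text: str) -> List[str]:
    """Hypothetical variant: paragraphs are separated by one or more blank lines.

    Line breaks inside a paragraph (hard-wrapped text) are collapsed to spaces.
    """
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


if __name__ == "__main__":
    print(split_sentences_keep_punct("Pi is 3.14. Is that right? Yes!"))
    # ['Pi is 3.14.', 'Is that right?', 'Yes!']
    print(split_paragraphs_on_blank_lines("First line\nstill first.\n\nSecond paragraph."))
    # ['First line still first.', 'Second paragraph.']
```

Because re.split removes only the matched whitespace, the punctuation kept by the lookbehind stays in the output. Both helpers return plain lists of strings, so they can be used wherever chunk_text is used.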