text_utils
toolkitx.text_utils
Functions
split_text_by_word_count(text, max_words=300, overlap=0)
Split a long text into chunks of at most max_words words each,
with overlap words shared between consecutive chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The input text. | required |
| max_words | int | Maximum number of words per chunk. | 300 |
| overlap | int | Number of overlapping words between adjacent chunks. | 0 |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of text chunks. |
Examples:
>>> split_text_by_word_count("one two three four five", max_words=2)
['one two', 'three four', 'five']
>>> split_text_by_word_count("one two three four five", max_words=3, overlap=1)
['one two three', 'three four five']
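The chunking behaviour shown in the examples above can be approximated with a sliding window over whitespace-separated words. The following is a minimal sketch, not the toolkitx implementation: the name `split_words_sketch`, the whitespace tokenisation, and the argument validation are all assumptions made for illustration.

```python
def split_words_sketch(text: str, max_words: int = 300, overlap: int = 0) -> list[str]:
    """Hypothetical stand-in for split_text_by_word_count (illustration only)."""
    # Assumed validation: the real function may handle these cases differently.
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()        # assumption: words are whitespace-separated
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the window has reached the end, so no trailing chunk
        # made up only of already-covered overlap words is emitted.
        if start + max_words >= len(words):
            break
    return chunks


print(split_words_sketch("one two three four five", max_words=2))
print(split_words_sketch("one two three four five", max_words=3, overlap=1))
```

With `overlap=1` and `max_words=3`, the window advances two words at a time, which is why the word `three` appears in both chunks.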
Source code in toolkitx/text_utils.py
truncate_text_smart(text, limit=100, mode='char', suffix='...', tolerance=10)
Truncates text to a character or word limit, allowing a deviation of up to tolerance from that limit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The original string. | required |
| limit | int | The target truncation length (in characters or words). | 100 |
| mode | str | Truncation mode: 'char' for character-based, 'word' for word-based. | 'char' |
| suffix | str | The suffix to append after truncation. | '...' |
| tolerance | int | The allowed deviation from the limit for smart truncation. | 10 |
Returns:
| Type | Description |
|---|---|
| str | The truncated string. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the mode is not 'char' or 'word'. |
Examples:
>>> truncate_text_smart("Hello World. This is a test.", limit=12)
'Hello World...'
>>> truncate_text_smart("Hello World. This is a test.", limit=15, mode="word")
'Hello World. This is a test.'
>>> truncate_text_smart("A very long sentence that should be truncated by word count.", limit=5, mode="word", tolerance=2)
'A very long sentence that...'
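To make the role of tolerance in word mode more concrete, here is a simplified sketch under one possible reading: the text is returned unchanged when its word count is within limit + tolerance, and is otherwise cut to limit words with the suffix appended. This reading is an assumption, and so is the helper name `truncate_words_sketch`. The actual toolkitx logic may differ, and character mode is not covered here.

```python
def truncate_words_sketch(text: str, limit: int = 100,
                          suffix: str = "...", tolerance: int = 10) -> str:
    """Hypothetical word-mode truncation (illustration only, not the library code)."""
    words = text.split()  # assumption: words are whitespace-separated
    # Assumed meaning of tolerance: texts that overshoot the limit by at most
    # `tolerance` words are left untouched rather than truncated.
    if len(words) <= limit + tolerance:
        return text
    return " ".join(words[:limit]) + suffix


print(truncate_words_sketch("Hello World. This is a test.", limit=15))
print(truncate_words_sketch(
    "A very long sentence that should be truncated by word count.",
    limit=5, tolerance=2))
```

Under this reading, a six-word input with `limit=15` comes back unchanged, while the eleven-word sentence exceeds `limit + tolerance` (7 words) and is cut to five words.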
Source code in toolkitx/text_utils.py