SAS String Functions
Quick reference for character and string manipulation functions in SAS Base. All functions work in DATA step and most work in PROC SQL as well.
Length & Trimming
| Function | Description | Example | Result |
|---|---|---|---|
| LENGTH(str) | Length excluding trailing spaces; returns 1 for an all-blank string (LENGTHC counts trailing spaces) | LENGTH('abc ') | 3 |
| LENGTHN(str) | Length excluding trailing spaces; returns 0 for an all-blank string | LENGTHN('abc ') | 3 |
| TRIM(str) | Remove trailing spaces | TRIM('abc ') | 'abc' |
| STRIP(str) | Remove leading and trailing spaces | STRIP(' abc ') | 'abc' |
| LEFT(str) | Left-align (leading spaces are moved to the end) | LEFT(' abc') | 'abc ' |
| TRIMN(str) | Remove trailing spaces; returns a zero-length string (not a single blank) if all spaces | TRIMN(' ') | '' |
SAS character variables are fixed-length. Trailing spaces are always present up to the variable's defined length. Use STRIP or TRIM when concatenating.
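The effect of fixed-length storage is easiest to see in a small DATA step. The sketch below is illustrative only; the variable names (first, last, stored, visible, padded, joined) are made up for this example.

```sas
data _null_;
  length first last $10;
  first = 'Ada';
  last  = 'Lovelace';
  stored  = lengthc(first);              /* 10: defined length, padding included */
  visible = length(first);               /* 3: trailing blanks are not counted */
  padded  = first || last;               /* 'Ada', 7 blanks, then 'Lovelace' and its padding */
  joined  = trim(first) || ' ' || last;  /* 'Ada Lovelace' followed by padding */
  put stored= visible= padded= joined=;
run;
```

TRIM is applied only to the first operand because trailing blanks on the last operand do not affect how the joined value reads.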
Case Conversion
| Function | Description | Example | Result |
|---|---|---|---|
| UPCASE(str) | Convert to uppercase | UPCASE('Hello') | 'HELLO' |
| LOWCASE(str) | Convert to lowercase | LOWCASE('Hello') | 'hello' |
| PROPCASE(str) | Title case (first letter of each word) | PROPCASE('hello world') | 'Hello World' |
| PROPCASE(str, delims) | Title case with custom delimiters | PROPCASE('hello-world', '-') | 'Hello-World' |
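As a quick illustration of the three case functions on the same input, here is a minimal sketch; the sample value and variable names are assumptions for the example.

```sas
data _null_;
  name   = 'mARY jONES';
  upper  = upcase(name);     /* 'MARY JONES' */
  lower  = lowcase(name);    /* 'mary jones' */
  proper = propcase(name);   /* 'Mary Jones' */
  put upper= lower= proper=;
run;
```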
Substrings & Position
| Function | Description | Example | Result |
|---|---|---|---|
| SUBSTR(str, pos, len) | Extract substring (1-indexed) | SUBSTR('abcdef', 2, 3) | 'bcd' |
| SUBSTR(str, pos) | From position to end | SUBSTR('abcdef', 4) | 'def' |
| INDEX(str, substr) | Position of first occurrence (0 if not found) | INDEX('abcabc', 'bc') | 2 |
| INDEXC(str, chars) | Position of first character from set | INDEXC('ab12', '0123456789') | 3 |
| INDEXW(str, word) | Position of whole word | INDEXW('one two', 'two') | 5 |
| FIND(str, substr, mods, start) | Find with modifiers (i = ignore case, t = trim trailing blanks) and start position; a negative start searches right to left | FIND('abcabc','bc',-6) | 5 (searching from the end) |
| FINDC(str, chars, mods) | Find character from set; modifiers such as b search backward | FINDC('abc123','0123456789') | 4 |
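The search functions can be compared side by side in a short DATA step. This is a sketch with made-up variable names; it relies on FIND accepting a negative start position to search from right to left and the i modifier to ignore case.

```sas
data _null_;
  str = 'abcabc';
  first_bc  = index(str, 'bc');               /* 2: first occurrence */
  last_bc   = find(str, 'bc', -length(str));  /* 5: negative start searches right to left */
  any_case  = find(str, 'BC', 'i');           /* 2: i modifier ignores case */
  not_found = index(str, 'zz');               /* 0: no match */
  if not_found = 0 then put 'zz does not occur in str';
  put first_bc= last_bc= any_case= not_found=;
run;
```

Because a failed search returns 0, the result can be used directly as a true/false condition.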
Note: INDEX returns 0 when not found (not -1 like many other languages). FIND also returns 0 when not found.
Concatenation
| Method | Description | Example | Result |
|---|---|---|---|
| \|\| | Concatenate (preserves trailing spaces) | 'abc ' \|\| 'def' | 'abc def' |
| CAT(args) | Concatenate (preserves leading and trailing spaces) | CAT('abc', ' ', 'def') | 'abc def' |
| CATS(args) | Concatenate, stripping leading and trailing spaces | CATS('abc ', 'def') | 'abcdef' |
| CATX(sep, args) | Concatenate with separator, stripping leading and trailing spaces | CATX('-', 'a', 'b', 'c') | 'a-b-c' |
| CATT(args) | Concatenate trimming trailing spaces only | CATT(' a ', ' b') | ' a b' |
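The difference between the operator and the CAT family shows up as soon as the inputs are fixed-length variables. A minimal sketch, with hypothetical variables city and state:

```sas
data _null_;
  length city state $12;
  city  = 'Raleigh';
  state = 'NC';
  with_pad = city || state;           /* 'Raleigh', 5 blanks, then 'NC' and its padding */
  tight    = cats(city, state);       /* 'RaleighNC' */
  slashed  = catx('/', city, state);  /* 'Raleigh/NC' */
  put with_pad= tight= slashed=;
run;
```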
Best practice: Use CATS or CATX instead of || to avoid unwanted trailing spaces from fixed-length character variables.
Replace & Translate
| Function | Description | Example | Result |
|---|---|---|---|
| TRANWRD(str, from, to) | Replace all occurrences of a word/string | TRANWRD('a b a', 'a', 'x') | 'x b x' |
| TRANSLATE(str, to, from) | Replace characters one-for-one | TRANSLATE('abc', 'xyz', 'abc') | 'xyz' |
| COMPRESS(str, chars, mods) | Remove specified characters | COMPRESS('a1b2c3','0123456789') | 'abc' |
| COMPRESS(str, '', 'kd') | Keep only digits (modifier k = keep) | COMPRESS('a1b2','','kd') | '12' |
| PRXCHANGE(regexp, n, str) | Regex replace (n times, -1 = all) | PRXCHANGE('s/\d+/X/', -1, 'a1b22') | 'aXbX' |
COMPRESS modifiers: a=letters, d=digits, s=spaces, p=punctuation, k=keep (instead of remove).
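The following sketch puts the replacement functions next to each other on sample data. The values and variable names are invented for illustration, and the COMPRESS call omits its second argument entirely so that only the kd modifiers decide what is kept.

```sas
data _null_;
  phone   = '(555) 123-4567';
  digits  = compress(phone, , 'kd');                      /* '5551234567' */
  iso     = translate('2024/01/15', '-', '/');            /* '2024-01-15' */
  recolor = tranwrd('red car, red bike', 'red', 'blue');  /* 'blue car, blue bike' */
  masked  = prxchange('s/\d/#/', -1, phone);              /* '(###) ###-####' */
  put digits= iso= recolor= masked=;
run;
```

TRANSLATE maps single characters, TRANWRD replaces whole substrings, and PRXCHANGE covers cases that need a pattern.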
Padding & Alignment
| Function | Description | Example | Result |
|---|---|---|---|
| REPEAT(str, n) | Repeat string n+1 times | REPEAT('ab', 2) | 'ababab' |
| SUBSTR(var, 1, n) = str | Pad by assigning to fixed-length var | Var length 10, assign 'abc' | 'abc ' |
| PUT(num, z5.) | Zero-pad a number as string | PUT(42, z5.) | '00042' |
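A short sketch of the two most common padding idioms from the table, using made-up variable names:

```sas
data _null_;
  rule   = repeat('-', 9);   /* n = 9 extra copies, so 10 dashes in total */
  id     = 42;
  padded = put(id, z8.);     /* '00000042' */
  put rule= padded=;
run;
```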
Type Conversion
| Function | Description | Example | Result |
|---|---|---|---|
| INPUT(str, informat) | Character → numeric | INPUT('3.14', 8.2) | 3.14 |
| PUT(num, format) | Numeric → character | PUT(3.14, 8.2) | ' 3.14' |
| INPUT(str, $char20.) | Read character with informat | INPUT(str, $char20.) | Reads up to 20 chars |
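A round trip between character and numeric values might look like the sketch below. The variable names are assumptions; the ?? modifier on INPUT suppresses the invalid-data note and leaves the result missing when the text is not a number.

```sas
data _null_;
  char_amt = '3.14';
  num_amt  = input(char_amt, 8.);   /* numeric 3.14 */
  id       = 42;
  char_id  = put(id, z5.);          /* character '00042' */
  bad      = input('abc', ?? 8.);   /* missing (.), no error written to the log */
  put num_amt= char_id= bad=;
run;
```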
Pattern Matching (Regex)
| Function | Description | Example |
|---|---|---|
| PRXMATCH(regexp, str) | Returns position of match (0 if none) | PRXMATCH('/\d+/', 'abc123') → 4 |
| PRXPARSE(regexp) | Compile regex, returns pattern ID | pid = PRXPARSE('/\d+/'); |
| PRXCHANGE(regexp, n, str) | Replace with regex | PRXCHANGE('s/\s+/ /', -1, str) |
| CALL PRXPOSN(pid, cap, start, len) | Get capture group position & length | After PRXMATCH or CALL PRXNEXT |
/* Compile once for performance */
pid = prxparse('/(\d{4})-(\d{2})-(\d{2})/');
if prxmatch(pid, date_str) then do;
  call prxposn(pid, 1, start, len);
  year = substr(date_str, start, len);
end;
Common Patterns
/* Check if string contains only digits */
is_numeric = (compress(str,'','kd') = str and str ne '');
/* Extract digits from mixed string */
digits_only = compress(str, '', 'kd');
/* Trim and collapse internal spaces */
clean = prxchange('s/\s+/ /', -1, strip(str));
/* Left-pad number to fixed width */
padded = put(id, z8.);
/* Case-insensitive search (i = ignore case, t = trim trailing blanks) */
if find(str, search_term, 'it') > 0;
/* Split on delimiter (first part) */
first_part = scan(str, 1, ',');
/* Count occurrences of substring (t = trim trailing blanks) */
n = count(str, target, 't');
Word & Token Parsing
| Function | Description | Example | Result |
|---|---|---|---|
| SCAN(str, n, delim) | Return nth word (negative = from end) | SCAN('a,b,c', 2, ',') | 'b' |
| SCAN(str, -1) | Last word (default delimiters) | SCAN('one two three', -1) | 'three' |
| COUNTW(str, delim) | Count words/tokens | COUNTW('a,b,c', ',') | 3 |
| %SCAN(str, n, delim) | Macro-language counterpart of SCAN | %SCAN(a b c, 2) | b |
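COUNTW and SCAN are often paired to walk through every token in a delimited string. A minimal sketch, assuming a comma-separated value held in a hypothetical variable named colors:

```sas
data _null_;
  colors = 'red,green,blue';
  /* COUNTW gives the number of tokens; SCAN pulls out the i-th one */
  do i = 1 to countw(colors, ',');
    item = scan(colors, i, ',');
    put i= item=;
  end;
run;
```

Passing the same delimiter to both functions keeps the token count and the extraction consistent.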