SAS String Functions

Quick reference for character and string manipulation functions in Base SAS. All of these functions work in the DATA step, and most also work in PROC SQL.

Length & Trimming

Function	Description	Example	Result
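`LENGTH(str)`	Length excluding trailing spaces (returns 1 for a blank string)	`LENGTH('abc ')`	`3`
`LENGTHN(str)`	Length excluding trailing spaces (returns 0 for a blank string)	`LENGTHN(' ')`	`0`
`LENGTHC(str)`	Length including trailing spaces	`LENGTHC('abc ')`	`4`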
`TRIM(str)`	Remove trailing spaces	`TRIM('abc ')`	`'abc'`
`STRIP(str)`	Remove leading and trailing spaces	`STRIP(' abc ')`	`'abc'`
`LEFT(str)`	Left-align (moves leading spaces to the end; length is unchanged)	`LEFT(' abc')`	`'abc '`
`TRIMN(str)`	Remove trailing spaces, returns empty string (not blank) if all spaces	`TRIMN(' ')`	`''`

SAS character variables are fixed-length. Trailing spaces are always present up to the variable's defined length. Use STRIP or TRIM when concatenating.
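
A minimal sketch of why this matters, assuming two hypothetical variables `first` and `last` declared with length 10. The variable names and values are illustrative only.

```sas
data _null_;
  length first last $10 full_raw full_clean $25;
  first = 'Ada';
  last  = 'Lovelace';

  /* || keeps the 7 blanks that pad FIRST out to its length of 10 */
  full_raw = first || last;

  /* STRIP removes the padding before the values are joined */
  full_clean = strip(first) || ' ' || strip(last);

  put full_raw= / full_clean=;
run;
```

The log should show a wide gap between the two names in `full_raw` and a single space in `full_clean`.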

Case Conversion

Function	Description	Example	Result
`UPCASE(str)`	Convert to uppercase	`UPCASE('Hello')`	`'HELLO'`
`LOWCASE(str)`	Convert to lowercase	`LOWCASE('Hello')`	`'hello'`
`PROPCASE(str)`	Title case (first letter of each word)	`PROPCASE('hello world')`	`'Hello World'`
`PROPCASE(str, delims)`	Title case with custom delimiters	`PROPCASE('hello-world', '-')`	`'Hello-World'`

Substrings & Position

Function	Description	Example	Result
`SUBSTR(str, pos, len)`	Extract substring (1-indexed)	`SUBSTR('abcdef', 2, 3)`	`'bcd'`
`SUBSTR(str, pos)`	From position to end	`SUBSTR('abcdef', 4)`	`'def'`
`INDEX(str, substr)`	Position of first occurrence (0 if not found)	`INDEX('abcabc', 'bc')`	`2`
`INDEXC(str, chars)`	Position of first character from set	`INDEXC('ab12', '0123456789')`	`3`
`INDEXW(str, word)`	Position of whole word	`INDEXW('one two', 'two')`	`5`
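`FIND(str, substr, mods, start)`	Find with modifiers (`i` = ignore case, `t` = trim trailing blanks) and start position (negative = search backward)	`FIND('abcabc','bc',-6)`	`5` (from back)
`FINDC(str, chars, mods, start)`	Find first character from set; modifier `b` searches backward	`FINDC('abc123','0123456789')`	`4`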

Note: INDEX returns 0 when not found (not -1 like many other languages). FIND also returns 0 when not found.
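
Because 0 is also the false value in SAS, a search result can be tested directly in a condition. The sketch below illustrates both styles; the sample text is made up for the example.

```sas
data _null_;
  text = 'error: disk full';

  /* INDEX returns 0 because the substring is absent */
  pos = index(text, 'warning');
  if pos = 0 then put 'warning: not found';

  /* A nonzero position is treated as true; modifier i ignores case */
  if find(text, 'DISK', 'i') then put 'disk: found (case-insensitive)';
run;
```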

Concatenation

Method	Description	Example	Result
`||`	Concatenate (preserves trailing spaces)	`'abc ' || 'def'`	`'abc def'`
`CAT(args)`	Concatenate (preserves leading and trailing spaces)	`CAT('abc', ' ', 'def')`	`'abc def'`
`CATS(args)`	Concatenate, stripping leading and trailing spaces from each argument	`CATS('abc ', 'def')`	`'abcdef'`
`CATX(sep, args)`	Concatenate with separator, stripping spaces	`CATX('-', 'a', 'b', 'c')`	`'a-b-c'`
`CATT(args)`	Concatenate trimming trailing spaces only	`CATT(' a ', ' b')`	`' a b'`

Best practice: Use CATS or CATX instead of || to avoid unwanted trailing spaces from fixed-length character variables.
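
A short comparison of the three approaches, assuming hypothetical `city` and `state` variables of length 12. Note that CATS strips every argument, including a separator passed as an ordinary argument, which is why CATX is usually the better fit for delimited output.

```sas
data _null_;
  length city state $12 a b c $40;
  city  = 'Austin';
  state = 'TX';

  a = city || ', ' || state;     /* padding in CITY survives before the comma */
  b = cats(city, ', ', state);   /* the separator is stripped too: Austin,TX */
  c = catx(', ', city, state);   /* Austin, TX */

  put a= / b= / c=;
run;
```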

Replace & Translate

Function	Description	Example	Result
`TRANWRD(str, from, to)`	Replace all occurrences of a word/string	`TRANWRD('a b a', 'a', 'x')`	`'x b x'`
`TRANSLATE(str, to, from)`	Replace characters one-for-one	`TRANSLATE('abc', 'xyz', 'abc')`	`'xyz'`
`COMPRESS(str, chars, mods)`	Remove specified characters	`COMPRESS('a1b2c3','0123456789')`	`'abc'`
`COMPRESS(str, '', 'kd')`	Keep only digits (modifier `k` = keep)	`COMPRESS('a1b2','','kd')`	`'12'`
`PRXCHANGE(regexp, n, str)`	Regex replace (n times, -1 = all)	`PRXCHANGE('s/\d+/X/', -1, 'a1b22')`	`'aXbX'`

COMPRESS modifiers: a=letters, d=digits, s=whitespace, p=punctuation, i=ignore case, k=keep the listed characters instead of removing them.
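
The modifiers can be combined with `k` to switch between removing and keeping a character class. A small sketch on a made-up phone number string:

```sas
data _null_;
  raw = '(555) 123-4567 ext. 89';

  digits   = compress(raw, '', 'kd');   /* keep digits only: 555123456789 */
  no_punct = compress(raw, '', 'p');    /* remove punctuation, keep everything else */
  letters  = compress(raw, '', 'ka');   /* keep letters only: ext */

  put digits= / no_punct= / letters=;
run;
```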

Padding & Alignment

Function	Description	Example	Result
`REPEAT(str, n)`	Repeat string n+1 times	`REPEAT('ab', 2)`	`'ababab'`
`SUBSTR(var, 1, n) = str`	Overwrite the first n characters in place; a shorter value is blank-padded to n	`SUBSTR(var, 1, 10) = 'abc'` (var length 10)	`'abc       '`
`PUT(num, z5.)`	Zero-pad a number as string	`PUT(42, z5.)`	`'00042'`

Type Conversion

Function	Description	Example	Result
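`INPUT(str, informat)`	Character → numeric (prefer `w.` over `w.d`: a `d` value divides input that has no decimal point by 10^d)	`INPUT('3.14', 8.)`	`3.14`
`PUT(num, format)`	Numeric → character (right-aligned to the format width)	`PUT(3.14, 8.2)`	`'    3.14'`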
`INPUT(str, $char20.)`	Read character with informat	Reads up to 20 chars
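
A hedged round-trip sketch: character to numeric with INPUT, then back to character with PUT. The `COMMA12.` informat and the `??` modifier are standard SAS features; the variable names are illustrative.

```sas
data _null_;
  char_amt = '1,234.50';

  /* COMMA12. reads numbers that contain thousands separators */
  num_amt = input(char_amt, comma12.);

  /* ?? suppresses the invalid-data messages; the result is missing */
  bad = input('abc', ?? 8.);

  /* PUT returns a right-aligned string, so STRIP removes the leading pad */
  back = strip(put(num_amt, 12.2));

  put num_amt= bad= back=;
run;
```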

Pattern Matching (Regex)

Function	Description	Example
`PRXMATCH(regexp, str)`	Returns position of match (0 if none)	`PRXMATCH('/\d+/', 'abc123')` → `4`
`PRXPARSE(regexp)`	Compile regex, returns pattern ID	`pid = PRXPARSE('/\d+/');`
`PRXCHANGE(regexp, n, str)`	Replace with regex	`PRXCHANGE('s/\s+/ /', -1, str)`
`CALL PRXPOSN(pid, cap, start, len)`	Get capture group position & length	After `PRXNEXT` or `PRXMATCH`
`PRXPOSN(pid, cap, str)`	Return the text of a capture group	After a successful `PRXMATCH`

/* Compile once for performance */
retain pid;
if _n_ = 1 then pid = prxparse('/(\d{4})-(\d{2})-(\d{2})/');
if prxmatch(pid, date_str) then do;
  call prxposn(pid, 1, start, len);
  year = substr(date_str, start, len);
end;
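
As an alternative sketch, the function form of PRXPOSN returns the captured text directly, which avoids the separate SUBSTR call. The sample `date_str` value is an assumption for illustration.

```sas
data _null_;
  date_str = 'shipped 2024-03-15';
  pid = prxparse('/(\d{4})-(\d{2})-(\d{2})/');

  if prxmatch(pid, date_str) then do;
    /* PRXPOSN(pattern-id, capture-group, source) returns the matched text */
    year  = prxposn(pid, 1, date_str);
    month = prxposn(pid, 2, date_str);
    day   = prxposn(pid, 3, date_str);
    put year= month= day=;
  end;
run;
```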

Common Patterns

/* Check if string contains only digits */
is_numeric = (compress(str,'','kd') = str and str ne '');

/* Extract digits from mixed string */
digits_only = compress(str, '', 'kd');

/* Trim and collapse internal spaces */
clean = prxchange('s/\s+/ /', -1, strip(str));

/* Left-pad number to fixed width */
padded = put(id, z8.);

/* Case-insensitive search (i = ignore case, t = trim trailing blanks) */
if find(str, search_term, 'it') > 0;

/* Split on delimiter (first part) */
first_part = scan(str, 1, ',');

/* Count occurrences of substring */
n = count(str, trim(target));

Word & Token Parsing

Function	Description	Example	Result
`SCAN(str, n, delim)`	Return nth word (negative = from end)	`SCAN('a,b,c', 2, ',')`	`'b'`
`SCAN(str, -1)`	Last word (default delimiters)	`SCAN('one two three', -1)`	`'three'`
`COUNTW(str, delim)`	Count words/tokens	`COUNTW('a,b,c', ',')`	`3`
`%SCAN(str, n, delim)`	Macro-language counterpart of SCAN	`%SCAN(a b c, 2)`	`b`
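
SCAN and COUNTW are commonly paired to walk through every token in a delimited string. A minimal sketch, with a made-up list value:

```sas
data _null_;
  list = 'red,green,blue';

  /* COUNTW gives the number of tokens; SCAN pulls out each one in turn */
  do i = 1 to countw(list, ',');
    item = scan(list, i, ',');
    put i= item=;
  end;
run;
```

Passing the same delimiter list to both functions keeps the token count and the extraction consistent.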