3.1 Useful string functions

3.1.1 `nchar()`

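Returns the number of human-readable characters (the length) in a string
Whitespace and newline escapes (e.g. \n) each count as one character
A Hangul syllable occupies more than one byte (2 in CP949/EUC-KR, 3 in UTF-8) but counts as one character; the length in bytes is available via type = "bytes"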

# Returns the number of characters in each string
nchar(
  x, # character vector
  type # "bytes": length in bytes
       # "chars": length in human-readable characters (default)
       # "width": display width of the string
)

Example

x <- "Carlos Gardel's song: Por Una Cabeza"
nchar(x)

[1] 36

y <- "abcde\nfghij"
nchar(y)

[1] 11

z <- "양준일: 가나다라마바사"
nchar(z)

[1] 12

# Character vector
library(stringr) # provides the `sentences` data set
str <- sentences[1:10]
nchar(str)

 [1] 42 43 38 40 36 37 43 43 35 40

s <- c("abc", "가나다", "1234[]", "R programming\n", "\"R\"")

nchar(s, type = "chars")

[1]  3  3  6 14  3

# Encoding-dependent: 2 bytes per Hangul syllable here (CP949/EUC-KR); UTF-8 would give 9
nchar(s, type = "bytes")

[1]  3  6  6 14  3

nchar(s, type = "width")

[1]  3  6  6 14  3

Note that this differs from length(), which returns the number of elements in a vector.
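
The contrast between length() and nchar() can be sketched as follows. This is a minimal illustration, not part of the original notes: the variable names are made up, and the Hangul string is written with \u escapes on the assumption that this makes it UTF-8 regardless of the session locale.

```r
# "가나다" written with \u escapes so the string is UTF-8 in any locale
s <- c("abc", "\uAC00\uB098\uB2E4")

n_elements <- length(s)                 # number of elements in the vector
n_chars    <- nchar(s, type = "chars")  # characters per element
n_bytes    <- nchar(s, type = "bytes")  # bytes per element (3 per Hangul syllable in UTF-8)

print(n_elements)
print(n_chars)
print(n_bytes)
```

length() describes the vector, while nchar() describes each string inside it.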

3.1.2 `paste()`, `paste0()`

Joins one or more strings into a single string
Does almost the same job as &, the string concatenation operator in Excel

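paste(
  ..., # one or more R objects, coerced to character
  sep, # separator placed between the objects; default is a space (" ")
  collapse # if given, all elements of the result are joined with this
           # separator into a character vector of length 1
)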
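
How sep and collapse interact may be easier to see side by side. The following is a small sketch with made-up variable names: sep joins the arguments element by element, and collapse then joins the resulting elements into one string.

```r
x <- c("a", "b", "c")

by_sep      <- paste(x, 1:3, sep = "-")                  # element-wise join
by_collapse <- paste(x, collapse = "-")                  # join the elements of one vector
both        <- paste(x, 1:3, sep = "-", collapse = "+")  # sep first, then collapse

print(by_sep)
print(by_collapse)
print(both)
```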

paste0() is a variant of paste() that returns the same result as paste() with sep = ""

Example

i <- seq_along(letters)

paste(letters, i) # sep = " "

 [1] "a 1"  "b 2"  "c 3"  "d 4"  "e 5"  "f 6"  "g 7"  "h 8"  "i 9"  "j 10"
[11] "k 11" "l 12" "m 13" "n 14" "o 15" "p 16" "q 17" "r 18" "s 19" "t 20"
[21] "u 21" "v 22" "w 23" "x 24" "y 25" "z 26"

paste(letters, i, sep = "_") # sep = "_"

 [1] "a_1"  "b_2"  "c_3"  "d_4"  "e_5"  "f_6"  "g_7"  "h_8"  "i_9"  "j_10"
[11] "k_11" "l_12" "m_13" "n_14" "o_15" "p_16" "q_17" "r_18" "s_19" "t_20"
[21] "u_21" "v_22" "w_23" "x_24" "y_25" "z_26"

paste0(letters, i) # same as paste(letters, i, sep = "")

 [1] "a1"  "b2"  "c3"  "d4"  "e5"  "f6"  "g7"  "h8"  "i9"  "j10" "k11" "l12"
[13] "m13" "n14" "o15" "p16" "q17" "r18" "s19" "t20" "u21" "v22" "w23" "x24"
[25] "y25" "z26"

# Using the collapse argument
paste(letters, collapse = "")

[1] "abcdefghijklmnopqrstuvwxyz"

writeLines(paste(str, collapse = "\n"))

The birch canoe slid on the smooth planks.
Glue the sheet to the dark blue background.
It's easy to tell the depth of a well.
These days a chicken leg is a rare dish.
Rice is often served in round bowls.
The juice of lemons makes fine punch.
The box was thrown beside the parked truck.
The hogs were fed chopped corn and garbage.
Four hours of steady work faced us.
Large size in stockings is hard to sell.

# Joining three or more objects
paste("Col", 1:2, c(TRUE, FALSE, TRUE), sep =" ", collapse = "<->")

[1] "Col 1 TRUE<->Col 2 FALSE<->Col 1 TRUE"

# Application of paste():
# evaluating commands stored as strings
exprs <- paste("lm(mpg ~", names(mtcars)[3:5], ", data = mtcars)")
exprs

[1] "lm(mpg ~ disp , data = mtcars)" "lm(mpg ~ hp , data = mtcars)"  
[3] "lm(mpg ~ drat , data = mtcars)"

# Row names come from the first model, so the second row holds each model's own slope
sapply(seq_along(exprs), function(i) coef(eval(parse(text = exprs[i]))))

                   [,1]        [,2]      [,3]
(Intercept) 29.59985476 30.09886054 -7.524618
disp        -0.04121512 -0.06822828  7.678233
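
Evaluating code assembled from strings works, but the same models can also be fitted by building formula objects directly. The following is a minimal sketch that assumes base R's reformulate() is acceptable here; the names predictors and coefs and the row label "slope" are illustrative and not from the original.

```r
# Build one formula per predictor instead of parsing code from strings
predictors <- names(mtcars)[3:5]  # "disp", "hp", "drat"

coefs <- sapply(predictors, function(p) {
  f <- reformulate(p, response = "mpg")  # e.g. mpg ~ disp
  coef(lm(f, data = mtcars))
})

# Label the second row generically, since each column has a different predictor
rownames(coefs) <- c("(Intercept)", "slope")
print(coefs)
```

Because sapply() keeps the predictor names, the columns are labelled disp, hp and drat rather than [,1], [,2], [,3].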

3.1.3 `sprintf()`

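A wrapper for the C function sprintf(); returns a string built from the values of the given variables
Useful, for example, for fixing the number of decimal places of numeric values
The format string controls not only the number of digits but also the total width and alignment of the output
Common format specifiers are listed in the table below.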

Format	Description
%s	String
%d	Integer
%f	Floating-point number
%e, %E	Scientific (exponential) notation
Example

getOption("digits") # significant digits used when printing numbers

[1] 7

pi # the value of pi

[1] 3.141593

sprintf("%f", pi)

[1] "3.141593"

# Print three decimal places
sprintf("%.3f", pi)

[1] "3.142"

# Print no decimal places
sprintf("%1.0f", pi)

[1] "3"

# Fix the output width at 5 characters
# and print one decimal place
sprintf("%5.1f", pi)

[1] "  3.1"

nchar(sprintf("%5.1f", pi))

[1] 5

# Pad the blank positions with zeros
sprintf("%05.1f", pi)

[1] "003.1"

# Always show the sign (positive/negative)
sprintf("%+f", pi)

[1] "+3.141593"

sprintf("%+f", -pi)

[1] "-3.141593"

# Put a space in the sign position of a positive number
sprintf("% f", pi)

[1] " 3.141593"

# Left-align
sprintf("%-10.3f", pi)

[1] "3.142     "

# What happens when an integer format is applied to a non-integer numeric value?
sprintf("%d", pi)

Error in sprintf("%d", pi): invalid format '%d'; use format %f, %e, %g or %a for numeric objects

# %d accepts a double only when it holds a whole number
sprintf("%d", 100); sprintf("%d", 20L)

[1] "100"

[1] "20"

# Scientific notation
sprintf("%e", pi)

[1] "3.141593e+00"

sprintf("%E", pi)

[1] "3.141593E+00"

sprintf("%.2E", pi)

[1] "3.14E+00"

# Strings
sprintf("%s = %.2f", "Mean", pi)

[1] "Mean = 3.14"

# Application
mn <- apply(cars, 2, mean)
std <- apply(cars, 2, sd)

# Print the results as Mean ± SD (fixed at two decimal places)
res <- sprintf("%.2f \U00B1 %.2f", mn, std)
resp <- paste(paste0(names(cars), ": ", res), collapse = "\n")
writeLines(resp)

speed: 15.40 ± 5.29
dist: 42.98 ± 25.77
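
The width and alignment flags can also be combined to line up a small table. The sketch below reuses the built-in cars data; the column widths (6) and the name rows are arbitrary choices for illustration, not from the original.

```r
mn  <- sapply(cars, mean)
std <- sapply(cars, sd)

# %-6s left-aligns the variable name in 6 columns;
# %6.2f right-aligns each number in 6 columns with 2 decimal places
rows <- sprintf("%-6s %6.2f %6.2f", names(cars), mn, std)
writeLines(rows)
```

Since sprintf() is vectorized over its arguments, one call formats every row, and every resulting string has the same width.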

3.1.4 `substr()`

Extracts a specific part of a string
Given a string, it typically extracts the characters from position start to position stop

substr(
  x, # character vector
  start, # position at which extraction starts
  stop # position at which extraction ends
)

Example

cnu <- "충남대학교 자연과학대학 정보통계학과"
substr(cnu, start = 14, stop = nchar(cnu))

[1] "정보통계학과"

# Applied to each element of a character vector
substr(str, 5, 15)

 [1] "birch canoe" " the sheet " " easy to te" "e days a ch" " is often s"
 [6] "juice of le" "box was thr" "hogs were f" " hours of s" "e size in s"
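
Combining substr() with nchar() makes it possible to count positions from the end of a string. The following is a minimal sketch; last_n() is a hypothetical helper defined here for illustration and is not part of base R.

```r
# last_n() is a hypothetical helper: returns the last n characters of each string
last_n <- function(x, n) {
  len <- nchar(x)
  # pmax() keeps the start position at 1 or above when a string is shorter than n
  substr(x, pmax(len - n + 1, 1), len)
}

x <- c("The birch canoe slid on the smooth planks.",
       "Rice is often served in round bowls.")
tails <- last_n(x, 6)
print(tails)
```

Both start and stop are vectorized, so each element gets its own positions.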

3.1.5 `tolower()`, `toupper()`

Converts uppercase to lowercase (tolower()) or lowercase to uppercase (toupper())

LETTERS; tolower(LETTERS)

 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"

 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"

letters; toupper(letters)

 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"

 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
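
A common use of these functions is comparing strings without regard to case. The sketch below uses made-up vectors a and b; normalizing both sides with tolower() before comparing is one simple approach, not the only one.

```r
a <- c("Seoul", "BUSAN", "daejeon")
b <- c("SEOUL", "busan", "Daegu")

# Normalize both sides to lowercase, then compare element by element
same <- tolower(a) == tolower(b)
print(same)
```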
