The study area includes four wilderness areas located in
the Roosevelt National Forest of northern Colorado.
Each observation is a 30m x 30m patch.
You are asked to predict an integer classification for the forest cover type (FCT).

The seven types are:

1 - Spruce/Fir
2 - Lodgepole Pine
3 - Ponderosa Pine
4 - Cottonwood/Willow
5 - Aspen
6 - Douglas-fir
7 - Krummholz

The training set (15120 observations) contains both features and the Cover_Type.
The test set contains only the features.
You must predict the Cover_Type for every row in the test set (565892 observations).

Data Fields

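Elevation - Elevation in meters
Aspect - Aspect in degrees azimuth
Slope - Slope in degrees
Horizontal_Distance_To_Hydrology - Horizontal distance to the nearest surface water feature
Vertical_Distance_To_Hydrology - Vertical distance to the nearest surface water feature
Horizontal_Distance_To_Roadways - Horizontal distance to the nearest roadway
Hillshade_9am (0 to 255 index) - Hillshade index at 9 a.m., summer
Hillshade_Noon (0 to 255 index) - Hillshade index at noon, summer
Hillshade_3pm (0 to 255 index) - Hillshade index at 3 p.m., summer
Horizontal_Distance_To_Fire_Points - Horizontal distance to the nearest wildfire ignition point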

Wilderness_Area

: Wilderness area designation
- 4 columns
+ 0 = absent
+ 1 = present

Soil_Type

: Soil type designation
- 40 columns
+ 0 = absent
+ 1 = present

Cover_Type

: Forest cover type (FCT) designation
- 7 types, stored as a single integer from 1 to 7

The wilderness areas are:

1 - Rawah Wilderness Area
2 - Neota Wilderness Area
3 - Comanche Peak Wilderness Area
4 - Cache la Poudre Wilderness Area

The soil types are:

1 Cathedral family - Rock outcrop complex, extremely stony.

2 Vanet - Ratake families complex, very stony.

3 Haploborolis - Rock outcrop complex, rubbly.

4 Ratake family - Rock outcrop complex, rubbly.

5 Vanet family - Rock outcrop complex, rubbly.

6 Vanet - Wetmore families - Rock outcrop complex, stony.

7 Gothic family. Na

8 Supervisor - Limber families complex.

9 Troutville family, very stony.

10 Bullwark - Catamount families - Rock outcrop complex, rubbly.

11 Bullwark - Catamount families - Rock land complex, rubbly.

12 Legault family - Rock land complex, stony.

13 Catamount family - Rock land - Bullwark family complex, rubbly.

14 Pachic Argiborolis - Aquolis complex.

15 unspecified in the USFS Soil and ELU Survey. (Na)

16 Cryaquolis - Cryoborolis complex.

17 Gateview family - Cryaquolis complex.

18 Rogert family, very stony.

19 Typic Cryaquolis - Borohemists complex.

20 Typic Cryaquepts - Typic Cryaquolls complex.

21 Typic Cryaquolls - Leighcan family, till substratum complex.

22 Leighcan family, till substratum, extremely bouldery.

23 Leighcan family, till substratum - Typic Cryaquolls complex.

24 Leighcan family, extremely stony.

25 Leighcan family, warm, extremely stony.

26 Granile - Catamount families complex, very stony.

27 Leighcan family, warm - Rock outcrop complex, extremely stony.

28 Leighcan family - Rock outcrop complex, extremely stony.

29 Como - Legault families complex, extremely stony.

30 Como family - Rock land - Legault family complex, extremely stony.

31 Leighcan - Catamount families complex, extremely stony.

32 Catamount family - Rock outcrop - Leighcan family complex, extremely stony.

33 Leighcan - Catamount families - Rock outcrop complex, extremely stony.

34 Cryorthents - Rock land complex, extremely stony.

35 Cryumbrepts - Rock outcrop - Cryaquepts complex.

36 Bross family - Rock land - Cryumbrepts complex, extremely stony.

37 Rock outcrop - Cryumbrepts - Cryorthents complex, extremely stony.

38 Leighcan - Moran families - Cryaquolls complex, extremely stony.

39 Moran family - Cryorthents - Leighcan family complex, extremely stony.

40 Moran family - Cryorthents - Rock land complex, extremely stony.

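Slope: how many degrees the ground at a point is tilted from the horizontal
- Expressed as θ (theta)
- The larger the angle, the steeper the ground; an angle of 0 means flat ground
Aspect: the direction the slope faces
- North: 0°, east: 90°, south: 180°, west: 270°.
- For perfectly flat ground the value depends on the GIS system and may be null (a sentinel such as -1 is a reasonable choice)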
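
Because aspect is an angle, 359° and 1° describe nearly the same direction even though the raw numbers are far apart. The sketch below is one possible way to handle that, not something taken from the competition material: it maps an aspect to a compass label and to a sine/cosine pair. The function names are hypothetical.

```python
import math

def aspect_to_compass(aspect_deg):
    """Map an aspect in degrees (0 = north, increasing clockwise) to an 8-point compass label."""
    labels = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    # Each label covers a 45-degree sector centred on its bearing.
    index = int(((aspect_deg % 360) + 22.5) // 45) % 8
    return labels[index]

def aspect_to_sin_cos(aspect_deg):
    """Encode the circular aspect so that 359 and 1 degrees end up close together."""
    angle = math.radians(aspect_deg % 360)
    return math.sin(angle), math.cos(angle)

print(aspect_to_compass(270))   # W
print(aspect_to_sin_cos(90))    # sine is 1, cosine is (numerically) about 0
```

A flat cell with a sentinel aspect such as -1 would need to be handled before calling these helpers, since the modulo would silently turn -1 into 359.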

Ref.

Evaluation

TPS12_Evaluation

Match each ID with its predicted cover type and submit the result in the required file format.
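
As a rough illustration of pairing each ID with a predicted cover type, the sketch below writes a two-column CSV. The header names `Id` and `Cover_Type` and the sample IDs are assumptions made for the example; check the competition's sample submission for the exact format.

```python
import csv
import io

def write_submission(ids, predictions, handle):
    """Write one row per test ID with its predicted cover type (an integer from 1 to 7)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Id", "Cover_Type"])  # assumed header names
    for row_id, cover_type in zip(ids, predictions):
        writer.writerow([row_id, int(cover_type)])

# Hypothetical IDs and predictions, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_submission([15121, 15122, 15123], [2, 1, 7], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

Passing an open file handle (opened with `newline=""`) instead of the buffer would produce the file to upload.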

Posted 2021-12-17 · Updated 2021-12-19 · R / data_science · 6 min read (about 837 words)

Text Mining in Python

Loading the data

# ---- Load the data ----

library(ggplot2) # visualization
# install.packages("dplyr")
# install.packages("tidyr")
library(dplyr) # data wrangling
library(reshape) # data reshaping <-- tidyr
library(readr) # file I/O


raw_reviews = read_csv("data/Womens Clothing E-Commerce Reviews.csv") %>% select(-1)

# raw_reviews <- raw_reviews %>% select(-1)
glimpse(raw_reviews)

colnames(raw_reviews) <- c("ID", "Age", "Title", "Review", "Rating", "Recommend", "Liked", "Division", "Dept", "Class")

glimpse(raw_reviews) 

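# Age: age of the customer who wrote the review
# Title, Review Text: review title and body
# Rating: rating given by the customer
# Recommend IND: whether the customer recommends the product
# Positive Feedback Count: number of likes
# Division, Dept, Class --> product category information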

Preprocessing the data

# ---- Data preprocessing ----
# Check for missing values
colSums(is.na(raw_reviews))

table(raw_reviews$Age)

age_group = cut(as.numeric(raw_reviews$Age), 
                breaks = seq(10, 100, by = 10), 
                include.lowest = TRUE, 
                right = FALSE, 
                labels = paste0(seq(10, 90, by = 10), "th"))

age_group[1:10]

# Add the new variable
raw_reviews$age_group = age_group
table(raw_reviews$age_group)

# Convert to a sentiment dataset
summary(raw_reviews$Liked)
table(raw_reviews$Liked)

# Stratified sampling? / simple random sampling (used here: a 10% random sample)
idx = sample(1:nrow(raw_reviews), nrow(raw_reviews) * 0.1, replace = FALSE)

raw_reviews2 = raw_reviews[idx, ] 

raw_reviews2 %>% 
  mutate(pos_binary = ifelse(Liked > 0, 1, 0)) %>% # convert the like count into a binary variable
  select(Liked, pos_binary) -> pos_binary_df

pos_binary_df$pos_binary <- as.factor(pos_binary_df$pos_binary)

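table(pos_binary_df$pos_binary) # 0 = negative, 1 = positive

# ---- Build the keyword dataset
REVIEW_TEXT = as.character(raw_reviews2$Review)
REVIEW_TEXT = tolower(REVIEW_TEXT) # lower-case the character vector created above

# Stem each review into tokens, then join the tokens back into one sentence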
library(tokenizers)

TEXT_Token = c()
for(i in 1:length(REVIEW_TEXT)) {
  token_words = unlist(tokenize_word_stems(REVIEW_TEXT[i]))
  Sentence = ""
  
  for (tw in token_words) {
    Sentence = paste(Sentence, tw)
  }
  
  TEXT_Token[i] = Sentence
  
}

Text preprocessing

# ---- Text preprocessing
library(tm)

Corpus_token = Corpus(VectorSource(TEXT_Token))
# Chain each step on the previous result so that all three clean-ups are kept
Corpus_tm_token = tm_map(Corpus_token, removePunctuation)
Corpus_tm_token = tm_map(Corpus_tm_token, removeNumbers)
Corpus_tm_token = tm_map(Corpus_tm_token, removeWords, c(stopwords("english")))


# Difference between TDM and DTM (TDM: term-document matrix)
# A DTM is roughly what CountVectorizer builds in Python (TF-IDF is a weighted variant)
DTM_Token = DocumentTermMatrix(Corpus_tm_token)
DTM_Matrix_Token = as.matrix(DTM_Token)

# Extract the top keywords
# using quantile(): keep terms whose total count is above the 99th percentile
top_1_pct = colSums(DTM_Matrix_Token) > quantile(colSums(DTM_Matrix_Token), probs = 0.99)

DTM_Matrix_Token_selected = DTM_Matrix_Token[, top_1_pct]

ncol(DTM_Matrix_Token_selected)

#Error
DTM_df = as.data.frame(DTM_Matrix_Token_selected)
DTM_df

pos_final_df = cbind(pos_binary_df, DTM_df)

glimpse(pos_final_df)


# This runs into the sparse-matrix problem.

ncol(pos_final_df)
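
For readers following along in Python, the sketch below shows the same idea in plain Python: count terms per document, then keep only the most frequent terms. It is an illustration under assumptions, not a port of the R code: the function names are hypothetical, and it keeps a fixed fraction of terms by rank instead of thresholding at a quantile as the R code does.

```python
from collections import Counter

def document_term_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i."""
    counts = [Counter(doc.split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

def top_terms(vocabulary, rows, keep_fraction):
    """Keep the most frequent terms by total count (rank-based, at least one term)."""
    totals = [sum(column) for column in zip(*rows)]
    n_keep = max(1, int(len(vocabulary) * keep_fraction))
    ranked = sorted(zip(vocabulary, totals), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:n_keep]]

docs = ["love this dress", "dress fit well", "love the fit of this dress"]
vocab, matrix = document_term_matrix(docs)
print(vocab)
print(top_terms(vocab, matrix, 0.3))
```

With only a handful of terms kept, the resulting feature table stays narrow, which is what makes the later stepwise regression tractable.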

Splitting the data into training and validation sets

# ---- Split into training and validation data ----
set.seed(1234)
idx = sample(1:nrow(pos_final_df), nrow(pos_final_df) * 0.7, replace = FALSE)
train = pos_final_df[idx, ]
test = pos_final_df[-idx, ]

Developing the Logistic Regression Model

# --- Build the logistic regression model ---

start_time = Sys.time()

glm_model = step(glm(pos_binary ~ .,
                     data = train[-1],
                     family = binomial(link = "logit")),
                 direction = "backward") # backward elimination

End_time = Sys.time()
difftime(End_time, start_time, units = "secs")

Step: AIC=2202.56

AIC is the criterion used here to compare the candidate logistic regression models.
Lower is better.

Step: AIC=2202.2
pos_binary ~ love + veri + just + size + dress + fit + will +
back + like + tri + flatter + top + length + realli + shirt +
materi

AIC_LogisticR
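
The AIC reported at each step can be reproduced by hand from the model's log-likelihood: AIC = 2k - 2 ln L, where k is the number of estimated parameters. The sketch below is a minimal illustration with made-up labels and probabilities; the function names are hypothetical.

```python
import math

def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of 0/1 labels under predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(y_true, p_pred))

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik

# Made-up labels and fitted probabilities, for illustration only.
labels = [1, 0, 1, 1]
probabilities = [0.8, 0.3, 0.6, 0.9]
ll = log_likelihood(labels, probabilities)
print(aic(ll, n_params=2))
```

Each extra parameter adds 2 to the AIC, so backward elimination drops a term whenever removing it costs less than that in fit.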

Measuring model performance

# ---- Measure model performance ----
# install.packages("pROC")
library(pROC)
preds = predict(glm_model, newdata = test, type = "response")
roc_glm = roc(test$pos_binary, preds)
plot.roc(roc_glm, print.auc=TRUE)

R_calssification_pROC
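
The AUC printed on the ROC plot can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair; it is an illustration of the definition, not how pROC computes it internally, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic, so it is only meant for small examples.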

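Summary

1. Load the structured data
2. Process the structured data
    - Use the number of likes to split the data into positive/negative
3. Split the structured data: separate out the text data
4. Process the text data (preprocessing, tokenization, corpus, DTM)
5. Combine the text data with the existing data
6. Fit an ML model (other models can be used as well)

However, if what we have covered so far feels too difficult, doing everything in Python alone is
not a bad option.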

Posted 2021-12-15 · Updated 2021-12-15 · python / machineLeaning · 19 min read (about 2920 words)

Text Mining in Python

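Overview

Big data analysis and visualization & text mining

Ref01_ Drawing histograms with Matplotlib
Ref02_ Introduction to Natural Language Processing Using Deep Learning
Naver Shopping Review Sentiment Analysis

Evaluation

The following is the Naver shopping review sentiment classification example.
Fill in each blank marked "# code input" with appropriate code.
Each blank is worth 10 points.

Installing Mecab on Colab

# Install Mecab on Colab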
!git clone https://github.com/SOMJANG/Mecab-ko-for-Google-Colab.git
%cd Mecab-ko-for-Google-Colab
!bash install_mecab-ko_on_colab190912.sh

Cloning into 'Mecab-ko-for-Google-Colab'...
remote: Enumerating objects: 91, done.
remote: Total 91 (delta 0), reused 0 (delta 0), pack-reused 91
Unpacking objects: 100% (91/91), done.
/content/Mecab-ko-for-Google-Colab
Installing konlpy.....
Collecting konlpy
  Downloading konlpy-0.5.2-py2.py3-none-any.whl (19.4 MB)
Collecting JPype1>=0.7.0
  Downloading JPype1-1.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (448 kB)
Requirement already satisfied: lxml>=4.1.0 in /usr/local/lib/python3.7/dist-packages (from konlpy) (4.2.6)
Collecting colorama
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: tweepy>=3.7.0 in /usr/local/lib/python3.7/dist-packages (from konlpy) (3.10.0)
Requirement already satisfied: numpy>=1.6 in /usr/local/lib/python3.7/dist-packages (from konlpy) (1.19.5)
Collecting beautifulsoup4==4.6.0
  Downloading beautifulsoup4-4.6.0-py3-none-any.whl (86 kB)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from JPype1>=0.7.0->konlpy) (3.10.0.2)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from tweepy>=3.7.0->konlpy) (1.3.0)
Requirement already satisfied: requests[socks]>=2.11.1 in /usr/local/lib/python3.7/dist-packages (from tweepy>=3.7.0->konlpy) (2.23.0)
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.7/dist-packages (from tweepy>=3.7.0->konlpy) (1.15.0)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->tweepy>=3.7.0->konlpy) (3.1.1)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (1.24.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (2021.10.8)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (2.10)
Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (1.7.1)
Installing collected packages: JPype1, colorama, beautifulsoup4, konlpy
  Attempting uninstall: beautifulsoup4
    Found existing installation: beautifulsoup4 4.6.3
    Uninstalling beautifulsoup4-4.6.3:
      Successfully uninstalled beautifulsoup4-4.6.3
Successfully installed JPype1-1.3.0 beautifulsoup4-4.6.0 colorama-0.4.4 konlpy-0.5.2
Done
Installing mecab-0.996-ko-0.9.2.tar.gz.....
Downloading mecab-0.996-ko-0.9.2.tar.gz.......
from https://bitbucket.org/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.2.tar.gz
--2021-12-15 08:19:45--  https://bitbucket.org/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.2.tar.gz
Resolving bitbucket.org (bitbucket.org)... 104.192.141.1, 2406:da00:ff00::22c0:3470, 2406:da00:ff00::22e9:9f55, ...
Connecting to bitbucket.org (bitbucket.org)|104.192.141.1|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bbuseruploads.s3.amazonaws.com/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.2.tar.gz?Signature=Djk%2BX4VYfoZUGHzDRgTrcVVdFvE%3D&Expires=1639557778&AWSAccessKeyId=AKIA6KOSE3BNJRRFUUX6&versionId=null&response-content-disposition=attachment%3B%20filename%3D%22mecab-0.996-ko-0.9.2.tar.gz%22&response-content-encoding=None [following]
--2021-12-15 08:19:46--  https://bbuseruploads.s3.amazonaws.com/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.2.tar.gz?Signature=Djk%2BX4VYfoZUGHzDRgTrcVVdFvE%3D&Expires=1639557778&AWSAccessKeyId=AKIA6KOSE3BNJRRFUUX6&versionId=null&response-content-disposition=attachment%3B%20filename%3D%22mecab-0.996-ko-0.9.2.tar.gz%22&response-content-encoding=None
Resolving bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)... 52.216.113.163
Connecting to bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)|52.216.113.163|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1414979 (1.3M) [application/x-tar]
Saving to: ‘mecab-0.996-ko-0.9.2.tar.gz’

mecab-0.996-ko-0.9. 100%[===================>]   1.35M  1.07MB/s    in 1.3s    

2021-12-15 08:19:48 (1.07 MB/s) - ‘mecab-0.996-ko-0.9.2.tar.gz’ saved [1414979/1414979]

Done
Unpacking mecab-0.996-ko-0.9.2.tar.gz.......
Done
Change Directory to mecab-0.996-ko-0.9.2.......
installing mecab-0.996-ko-0.9.2.tar.gz........
configure
make
make check
make install
ldconfig
Done
Change Directory to /content
Downloading mecab-ko-dic-2.1.1-20180720.tar.gz.......
from https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
--2021-12-15 08:21:19--  https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
Resolving bitbucket.org (bitbucket.org)... 104.192.141.1, 2406:da00:ff00::6b17:d1f5, 2406:da00:ff00::22cd:e0db, ...
Connecting to bitbucket.org (bitbucket.org)|104.192.141.1|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bbuseruploads.s3.amazonaws.com/a4fcd83e-34f1-454e-a6ac-c242c7d434d3/downloads/b5a0c703-7b64-45ed-a2d7-180e962710b6/mecab-ko-dic-2.1.1-20180720.tar.gz?Signature=ZNAR2x6%2FNWxJ4p%2BOkG%2BjdG77Dqk%3D&Expires=1639558279&AWSAccessKeyId=AKIA6KOSE3BNJRRFUUX6&versionId=tzyxc1TtnZU_zEuaaQDGN4F76hPDpyFq&response-content-disposition=attachment%3B%20filename%3D%22mecab-ko-dic-2.1.1-20180720.tar.gz%22&response-content-encoding=None [following]
--2021-12-15 08:21:19--  https://bbuseruploads.s3.amazonaws.com/a4fcd83e-34f1-454e-a6ac-c242c7d434d3/downloads/b5a0c703-7b64-45ed-a2d7-180e962710b6/mecab-ko-dic-2.1.1-20180720.tar.gz?Signature=ZNAR2x6%2FNWxJ4p%2BOkG%2BjdG77Dqk%3D&Expires=1639558279&AWSAccessKeyId=AKIA6KOSE3BNJRRFUUX6&versionId=tzyxc1TtnZU_zEuaaQDGN4F76hPDpyFq&response-content-disposition=attachment%3B%20filename%3D%22mecab-ko-dic-2.1.1-20180720.tar.gz%22&response-content-encoding=None
Resolving bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)... 54.231.82.195
Connecting to bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)|54.231.82.195|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49775061 (47M) [application/x-tar]
Saving to: ‘mecab-ko-dic-2.1.1-20180720.tar.gz’

mecab-ko-dic-2.1.1- 100%[===================>]  47.47M  13.0MB/s    in 4.5s    

2021-12-15 08:21:25 (10.5 MB/s) - ‘mecab-ko-dic-2.1.1-20180720.tar.gz’ saved [49775061/49775061]

Done
Unpacking  mecab-ko-dic-2.1.1-20180720.tar.gz.......
Done
Change Directory to mecab-ko-dic-2.1.1-20180720
Done
installing........
configure
make
make install
apt-get update
apt-get upgrade
apt install curl
apt install git
bash <(curl -s https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh)
Done
Successfully Installed
Now you can use Mecab
from konlpy.tag import Mecab
mecab = Mecab()
How to add a user dictionary: https://bit.ly/3k0ZH53
If "NameError: name 'Tagger' is not defined" occurs, restart the runtime.
Thanks to tana for posting the fix on the blog.

Understanding and preprocessing the Naver shopping review data

import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import urllib.request
from collections import Counter
from konlpy.tag import Mecab
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

Loading the data

urllib.request.urlretrieve("https://raw.githubusercontent.com/bab2min/corpus/master/sentiment/naver_shopping.txt", filename="ratings_total.txt")

('ratings_total.txt', <http.client.HTTPMessage at 0x7f7d3557f750>)

The data has no column headers, so two column names, "ratings" and "reviews", are assigned.

# (1) Load the data and print the total number of reviews # 200,000
totalDt = pd.read_table('ratings_total.txt', names=['ratings', 'reviews'])
print('Total number of reviews:', len(totalDt))

Total number of reviews: 200000

totalDt[:5]

	ratings	reviews
0	5	배공빠르고 굿
1	2	택배가 엉망이네용 저희집 밑에층에 말도없이 놔두고가고
2	5	아주좋아요 바지 정말 좋아서2개 더 구매했어요 이가격에 대박입니다. 바느질이 조금 ...
3	2	선물용으로 빨리 받아서 전달했어야 하는 상품이었는데 머그컵만 와서 당황했습니다. 전...
4	5	민트색상 예뻐요. 옆 손잡이는 거는 용도로도 사용되네요 ㅎㅎ

Splitting the training and test data

# Label reviews rated above 3 as positive (1) and the rest as negative (0)
totalDt['label'] = np.select([totalDt.ratings > 3], [1], default=0)
totalDt[:5]

	ratings	reviews	label
0	5	배공빠르고 굿	1
1	2	택배가 엉망이네용 저희집 밑에층에 말도없이 놔두고가고	0
2	5	아주좋아요 바지 정말 좋아서2개 더 구매했어요 이가격에 대박입니다. 바느질이 조금 ...	1
3	2	선물용으로 빨리 받아서 전달했어야 하는 상품이었는데 머그컵만 와서 당황했습니다. 전...	0
4	5	민트색상 예뻐요. 옆 손잡이는 거는 용도로도 사용되네요 ㅎㅎ	1

Count the number of unique values in each column

totalDt['ratings'].nunique(), totalDt['reviews'].nunique(), totalDt['label'].nunique()

(4, 199908, 2)

The ratings column takes four values: 1, 2, 4, and 5. The reviews column has 199,908 unique values. Since there are 200,000 reviews in total, some samples are duplicates, so we remove them.

# (2) Remove duplicates in the reviews column with drop_duplicates()
totalDt.drop_duplicates(subset=['reviews'], inplace=True)
print('Total number of samples:', len(totalDt))

Total number of samples: 199908

Checking for NULL values

print(totalDt.isnull().values.any())

False

Split the data into training and test sets at a 3:1 ratio

train_data, test_data = train_test_split(totalDt, test_size = 0.25, random_state = 42)
print('Number of training reviews:', len(train_data))
print('Number of test reviews:', len(test_data))

Number of training reviews: 149931
Number of test reviews: 49977

Checking the label distribution

# (3) Draw a bar chart of labels 1 and 0
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(1, 1, figsize=(7, 5))

# Plot the label counts on the axes created above
train_data['label'].value_counts().plot(kind='bar', color='orange', edgecolor='black', ax=ax)
ax.legend()

plt.title('train_data', fontsize=20)  # title
plt.ylabel('Count', fontsize=10)  # y-axis label
plt.show()

train_data

print(train_data.groupby('label').size().reset_index(name = 'count'))

   label  count
0      0  74918
1      1  75013

Both labels have about 75,000 samples, a 50:50 ratio.

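Cleaning the data

A regular expression removes everything except Hangul.

# (4) Remove every character other than Hangul and spaces
# regex=True is required on recent pandas, where str.replace treats the pattern literally by default
train_data['reviews'] = train_data['reviews'].str.replace("[^ㄱ-ㅎㅏ-ㅣ가-힣 ]", "", regex=True)
train_data['reviews'] = train_data['reviews'].replace('', np.nan)
print(train_data.isnull().sum())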

ratings    0
reviews    0
label      0
dtype: int64
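
To see what the character class does on a single string, the sketch below applies the same pattern with the standard `re` module. The sample sentence and the helper name `keep_hangul` are made up for the illustration.

```python
import re

# Same character class as above: Hangul consonants, vowels, syllables, and the space.
NOT_HANGUL = re.compile(r"[^ㄱ-ㅎㅏ-ㅣ가-힣 ]")

def keep_hangul(text):
    """Remove every character that is not Hangul or a space."""
    return NOT_HANGUL.sub("", text)

cleaned = keep_hangul("배송 빠르고 good!! 2개 구매 ㅎㅎ")
print(cleaned.split())  # Latin letters, digits, and punctuation are gone
```

A review written only in Latin letters or digits is reduced to spaces or an empty string, which is why the empty-string check follows the replacement.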

The test data goes through the same steps.

# (5) Apply the same steps to the test data
# Code 1: remove duplicates
# Code 2: apply the regular expression
# Code 3: turn empty strings into null values
# Code 4: drop null values
test_data = test_data.drop_duplicates(subset=['reviews'])
test_data['reviews'] = test_data['reviews'].str.replace("[^ㄱ-ㅎㅏ-ㅣ가-힣 ]", "", regex=True)
test_data['reviews'] = test_data['reviews'].replace('', np.nan)
test_data = test_data.dropna(how='any')
print('Number of test samples after preprocessing:', len(test_data))

Number of test samples after preprocessing: 49977

Tokenization

The Mecab morphological analyzer is used for tokenization.

# (6) Create the Mecab instance
mecab = Mecab()
print(mecab.morphs('와 이런 것도 상품이라고 차라리 내가 만드는 게 나을 뻔'))

['와', '이런', '것', '도', '상품', '이', '라고', '차라리', '내', '가', '만드', '는', '게', '나을', '뻔']

Define stopwords so that unneeded tokens are removed.

# (7) Define the stopwords
stopwords = ['도', '는', '다', '의', '가', '이', '은', '한', '에', '하', '고', '을', '를', '인', '듯', '과', '와', '네', '들', '듯', '지', '임', '게']

The training and test data go through the same steps.

train_data['tokenized'] = train_data['reviews'].apply(mecab.morphs)
train_data['tokenized'] = train_data['tokenized'].apply(lambda x: [item for item in x if item not in stopwords])

test_data['tokenized'] = test_data['reviews'].apply(mecab.morphs)
test_data['tokenized'] = test_data['tokenized'].apply(lambda x: [item for item in x if item not in stopwords])

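Checking the word and length distributions

Let's count word frequencies separately for the two cases to see which words appear most often in positive reviews and which in negative reviews. The words are stored in a separate list for each label.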

negative_W = np.hstack(train_data[train_data.label == 0]['tokenized'].values)
positive_W = np.hstack(train_data[train_data.label == 1]['tokenized'].values)
negative_W
positive_W

array(['적당', '만족', '합니다', ..., '잘', '삿', '어요'], dtype='<U25')

Counter() counts the frequency of each word. First, print the 20 most frequent words in the negative reviews.

negative_word_count = Counter(negative_W)
print(negative_word_count.most_common(20))

[('네요', 31799), ('는데', 20295), ('안', 19718), ('어요', 14849), ('있', 13200), ('너무', 13058), ('했', 11783), ('좋', 9812), ('배송', 9677), ('같', 8997), ('구매', 8876), ('어', 8869), ('거', 8854), ('없', 8670), ('아요', 8642), ('습니다', 8436), ('그냥', 8355), ('되', 8345), ('잘', 8029), ('않', 7984)]

Tokens such as ‘네요’, ‘는데’, ‘안’, ‘않’, ‘너무’, and ‘없’ appear frequently in negative reviews. Let's print the same for positive reviews.

positive_word_count = Counter(positive_W)
print(positive_word_count.most_common(20))

[('좋', 39488), ('아요', 21184), ('네요', 19895), ('어요', 18686), ('잘', 18602), ('구매', 16171), ('습니다', 13320), ('있', 12391), ('배송', 12275), ('는데', 11670), ('했', 9818), ('합니다', 9801), ('먹', 9635), ('재', 9273), ('너무', 8397), ('같', 7868), ('만족', 7261), ('거', 6482), ('어', 6294), ('쓰', 6292)]
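
For a quick feel for what `Counter` and `most_common` return, here is a small self-contained sketch on a made-up list of tokenized reviews (the tokens are illustrative, not taken from the dataset).

```python
from collections import Counter

# Three made-up tokenized reviews.
tokenized_reviews = [["배송", "빠르", "네요"], ["배송", "좋", "아요"], ["좋", "네요", "좋"]]

# Flatten the nested lists into one token stream, then count.
all_tokens = [token for review in tokenized_reviews for token in review]
word_count = Counter(all_tokens)
top_two = word_count.most_common(2)
print(top_two)
```

`most_common(n)` returns `(token, count)` pairs sorted by count, which is the shape of the lists printed above.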

Tokens such as ‘좋’, ‘아요’, ‘네요’, ‘잘’, ‘너무’, and ‘만족’ appear most often. Next, let's check the length distribution for each case.

# (8) Draw histograms for the positive and negative reviews

fig,(ax1,ax2) = plt.subplots(1,2,figsize=(9,5))
text_len = train_data[train_data['label']==1]['tokenized'].map(lambda x: len(x))
ax1.hist(text_len, color='pink', edgecolor='black')
ax1.set_title('Positive Reviews')
ax1.set_xlabel('length of samples')
ax1.set_ylabel('number of samples')
print('Average length of positive reviews:', np.mean(text_len))

text_len = train_data[train_data['label']==0]['tokenized'].map(lambda x: len(x))
ax2.hist(text_len, color='skyblue', edgecolor='black')
ax2.set_title('Negative Reviews')
fig.suptitle('Words in texts')
ax2.set_xlabel('length of samples')
ax2.set_ylabel('number of samples')
print('Average length of negative reviews:', np.mean(text_len))
plt.show()

Average length of positive reviews: 13.5877381253916
Average length of negative reviews: 17.02948557089084

Review_Histogram

Negative reviews tend to be somewhat longer than positive reviews.

X_train = train_data['tokenized'].values
y_train = train_data['label'].values
X_test= test_data['tokenized'].values
y_test = test_data['label'].values

Integer encoding

The training and test data must be integer-encoded so that the model can handle text as numbers. First, build the vocabulary from the training data.

# (9) Create the integer-encoding class and fit it on X_train
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)

Building the vocabulary also assigns a unique integer to each word, which can be checked by printing tokenizer.word_index. Words that appear only once will be excluded, so let's first check how large a share of the data they make up.

threshold = 2
total_cnt = len(tokenizer.word_index) # number of words
rare_cnt = 0 # number of words that appear fewer than threshold times
total_freq = 0 # total number of word occurrences in the training data
rare_freq = 0 # total occurrences of words that appear fewer than threshold times

# Iterate over (word, frequency) pairs as key and value.
for key, value in tokenizer.word_counts.items():
    total_freq = total_freq + value

    # If the word appears fewer than threshold times
    if(value < threshold):
        rare_cnt = rare_cnt + 1
        rare_freq = rare_freq + value

print('Vocabulary size:',total_cnt)
print('Number of rare words appearing %s time(s) or fewer: %s'%(threshold - 1, rare_cnt))
print("Share of rare words in the vocabulary:", (rare_cnt / total_cnt)*100)
print("Share of rare-word occurrences among all occurrences:", (rare_freq / total_freq)*100)

Vocabulary size: 39998
Number of rare words appearing 1 time(s) or fewer: 18213
Share of rare words in the vocabulary: 45.53477673883694
Share of rare-word occurrences among all occurrences: 0.7935698749320282

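There are about 40,000 words. Words that appear fewer than the threshold of 2 times, that is, only once, make up about 45% of the vocabulary. However, they account for only about 0.8% of all word occurrences in the training data. Words that appear only once are therefore unlikely to matter much here, so they are excluded from integer encoding.

The maximum vocabulary size is limited to the number of words left after removing those that appear only once.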

# Subtract the words that appear fewer than 2 times from the total word count.
# Add 2 for the padding token (index 0) and the OOV token (index 1)
vocab_size = total_cnt - rare_cnt + 2
print('Vocabulary size:',vocab_size)

Vocabulary size: 21787
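
The same bookkeeping can be checked on a toy corpus without Keras. The sketch below is a plain-Python illustration with a hypothetical helper name; it counts rare words with `Counter` and applies the same `total - rare + 2` rule.

```python
from collections import Counter

def vocab_stats(tokenized_texts, threshold=2, reserved=2):
    """Count rare words (frequency below threshold) and derive the capped vocabulary size."""
    word_counts = Counter(tok for text in tokenized_texts for tok in text)
    total_cnt = len(word_counts)
    rare_cnt = sum(1 for c in word_counts.values() if c < threshold)
    total_freq = sum(word_counts.values())
    rare_freq = sum(c for c in word_counts.values() if c < threshold)
    return {
        "total_cnt": total_cnt,
        "rare_cnt": rare_cnt,
        "rare_word_share": rare_cnt / total_cnt,
        "rare_freq_share": rare_freq / total_freq,
        # reserved slots: index 0 for padding and index 1 for OOV
        "vocab_size": total_cnt - rare_cnt + reserved,
    }

stats = vocab_stats([["a", "b", "a"], ["a", "c", "b"], ["d"]])
print(stats)
```

Plugging in the figures printed above gives 39998 - 18213 + 2 = 21787, matching the reported vocabulary size.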

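The vocabulary size is now 21,787. When this is passed to the tokenizer as an argument, the tokenizer converts text sequences into integer sequences, and words assigned an index larger than this are converted to OOV.

# (10) Create the tokenizer and write the OOV conversion code
# Code 1
# Code 2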

tokenizer = Tokenizer(vocab_size, oov_token = 'OOV') 
tokenizer.fit_on_texts(X_train)

X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)

To check that integer encoding worked, print the first three samples of X_train and X_test.

print(X_train[:3])

[[67, 2060, 299, 14259, 263, 73, 6, 236, 168, 137, 805, 2951, 625, 2, 77, 62, 207, 40, 1343, 155, 3, 6], [482, 409, 52, 8530, 2561, 2517, 339, 2918, 250, 2357, 38, 473, 2], [46, 24, 825, 105, 35, 2372, 160, 7, 10, 8061, 4, 1319, 29, 140, 322, 41, 59, 160, 140, 7, 1916, 2, 113, 162, 1379, 323, 119, 136]]

print(X_test[:3])

[[14, 704, 767, 116, 186, 252, 12], [339, 3904, 62, 3816, 1651], [11, 69, 2, 49, 164, 3, 27, 15, 6, 1, 513, 289, 17, 92, 110, 564, 59, 7, 2]]
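
The sketch below imitates this encoding in plain Python to make the OOV rule concrete. It is a simplified stand-in under assumptions, not the Keras implementation: index 1 is reserved for the OOV token, words are numbered from 2 in order of decreasing frequency, and any word whose index is `num_words` or higher is replaced by the OOV index. The function names are hypothetical.

```python
from collections import Counter

def build_word_index(tokenized_texts, oov_token="OOV"):
    """Number words by decreasing frequency, reserving index 1 for the OOV token."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    word_index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index

def encode(tokenized_texts, word_index, num_words, oov_token="OOV"):
    """Replace each token by its index; unknown or too-rare tokens become the OOV index."""
    oov_id = word_index[oov_token]
    sequences = []
    for text in tokenized_texts:
        ids = []
        for tok in text:
            idx = word_index.get(tok)
            ids.append(idx if idx is not None and idx < num_words else oov_id)
        sequences.append(ids)
    return sequences

train_texts = [["good", "fast"], ["good", "nice", "good"], ["ship", "good"]]
word_index = build_word_index(train_texts)
encoded = encode([["good", "ship", "unseen"]], word_index, num_words=4)
print(word_index)
print(encoded)
```

In the toy run, "ship" is in the vocabulary but falls outside the `num_words` cap, so it is encoded the same way as the unseen word.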

Padding

Next comes padding, which makes samples of different lengths the same length. Let's look at the longest review and at the overall length distribution.

print('Maximum review length:',max(len(l) for l in X_train))
print('Average review length:',sum(map(len, X_train))/len(X_train))
plt.hist([len(s) for s in X_train], bins=35, label='bins=35', color="skyblue")
plt.xlabel('length of samples')
plt.ylabel('number of samples')
plt.show()

Maximum review length: 85
Average review length: 15.307521459871541

LengthOfReview

The maximum review length is 85 and the average is about 15.

Judging from the plot, most reviews have a length of 60 or less.

def below_threshold_len(max_len, nested_list):
  count = 0
  for sentence in nested_list:
    if(len(sentence) <= max_len):
        count = count + 1
  print('Share of samples with length %s or less: %s'%(max_len, (count / len(nested_list))*100))

The maximum length is 85, so let's check how many samples are preserved in full if we pad to length 80.

max_len = 80
below_threshold_len(max_len, X_train)

Share of samples with length 80 or less: 99.99933302652553

More than 99.99% of the training reviews have a length of 80 or less, so the reviews are padded to length 80.

X_train = pad_sequences(X_train, maxlen = max_len)
X_test = pad_sequences(X_test, maxlen = max_len)
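
As a rough picture of what padding does, the sketch below pads and truncates at the front of each sequence, which is the default behaviour of Keras `pad_sequences` (`padding='pre'`, `truncating='pre'`). It is a plain-Python illustration with a hypothetical function name, not the Keras implementation.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences with `value` and keep only the last `maxlen` items of long ones."""
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    padded = []
    for seq in sequences:
        trimmed = list(seq)[-maxlen:]  # truncate from the front
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded

print(pad_pre([[1, 2], [1, 2, 3, 4, 5]], maxlen=4))
# [[0, 0, 1, 2], [2, 3, 4, 5]]
```

With `maxlen = 80`, only the handful of reviews longer than 80 tokens lose their leading tokens; every other review just gains zeros on the left.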

Classifying Naver shopping review sentiment with a GRU

from tensorflow.keras.layers import Embedding, Dense, GRU
from tensorflow.keras.models import Sequential
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

embedding_dim = 100
hidden_units = 128

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim))
model.add(GRU(hidden_units))
model.add(Dense(1, activation='sigmoid'))

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=4)
mc = ModelCheckpoint('best_model.h5', monitor='val_acc', mode='max', verbose=1, save_best_only=True)

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model.fit(X_train, y_train, epochs=15, callbacks=[es, mc], batch_size=64, validation_split=0.2)

def sentiment_predict(new_sentence):
  new_sentence = re.sub(r'[^ㄱ-ㅎㅏ-ㅣ가-힣 ]','', new_sentence)
  new_sentence = mecab.morphs(new_sentence) # tokenize
  new_sentence = [word for word in new_sentence if not word in stopwords] # remove stopwords
  encoded = tokenizer.texts_to_sequences([new_sentence]) # integer encoding
  pad_new = pad_sequences(encoded, maxlen = max_len) # padding

  score = float(model.predict(pad_new)[0][0]) # predict; take the single value from the (1, 1) output
  if(score > 0.5):
    print("{:.2f}% probability of a positive review.".format(score * 100))
  else:
    print("{:.2f}% probability of a negative review.".format((1 - score) * 100))

Epoch 1/15
1875/1875 [==============================] - ETA: 0s - loss: 0.2725 - acc: 0.8967
Epoch 00001: val_acc improved from -inf to 0.91916, saving model to best_model.h5
1875/1875 [==============================] - 54s 25ms/step - loss: 0.2725 - acc: 0.8967 - val_loss: 0.2301 - val_acc: 0.9192
Epoch 2/15
1875/1875 [==============================] - ETA: 0s - loss: 0.2158 - acc: 0.9213
Epoch 00002: val_acc improved from 0.91916 to 0.92240, saving model to best_model.h5
1875/1875 [==============================] - 43s 23ms/step - loss: 0.2158 - acc: 0.9213 - val_loss: 0.2137 - val_acc: 0.9224
Epoch 3/15
1875/1875 [==============================] - ETA: 0s - loss: 0.1985 - acc: 0.9289
Epoch 00003: val_acc improved from 0.92240 to 0.92637, saving model to best_model.h5
1875/1875 [==============================] - 44s 24ms/step - loss: 0.1985 - acc: 0.9289 - val_loss: 0.2060 - val_acc: 0.9264
Epoch 4/15
1873/1875 [============================>.] - ETA: 0s - loss: 0.1878 - acc: 0.9332
Epoch 00004: val_acc did not improve from 0.92637
1875/1875 [==============================] - 43s 23ms/step - loss: 0.1878 - acc: 0.9332 - val_loss: 0.2031 - val_acc: 0.9260
Epoch 5/15
1874/1875 [============================>.] - ETA: 0s - loss: 0.1783 - acc: 0.9369
Epoch 00005: val_acc improved from 0.92637 to 0.92670, saving model to best_model.h5
1875/1875 [==============================] - 46s 24ms/step - loss: 0.1783 - acc: 0.9369 - val_loss: 0.2030 - val_acc: 0.9267
Epoch 6/15
1873/1875 [============================>.] - ETA: 0s - loss: 0.1698 - acc: 0.9405
Epoch 00006: val_acc improved from 0.92670 to 0.92764, saving model to best_model.h5
1875/1875 [==============================] - 44s 24ms/step - loss: 0.1697 - acc: 0.9405 - val_loss: 0.2055 - val_acc: 0.9276
Epoch 7/15
1873/1875 [============================>.] - ETA: 0s - loss: 0.1611 - acc: 0.9436
Epoch 00007: val_acc did not improve from 0.92764
1875/1875 [==============================] - 44s 24ms/step - loss: 0.1610 - acc: 0.9437 - val_loss: 0.2098 - val_acc: 0.9244
Epoch 8/15
1875/1875 [==============================] - ETA: 0s - loss: 0.1526 - acc: 0.9473
Epoch 00008: val_acc did not improve from 0.92764
1875/1875 [==============================] - 44s 23ms/step - loss: 0.1526 - acc: 0.9473 - val_loss: 0.2269 - val_acc: 0.9189
Epoch 9/15
1875/1875 [==============================] - ETA: 0s - loss: 0.1435 - acc: 0.9507
Epoch 00009: val_acc did not improve from 0.92764
1875/1875 [==============================] - 44s 24ms/step - loss: 0.1435 - acc: 0.9507 - val_loss: 0.2258 - val_acc: 0.9204
Epoch 00009: early stopping

sentiment_predict('이 상품 진짜 싫어요... 교환해주세요')  # "I really dislike this product... please exchange it"

99.03% probability of a negative review.

sentiment_predict('이 상품 진짜 좋아여... 강추합니다. ')  # "This product is really good... highly recommended."

99.51% probability of a positive review.

Posted 2021-12-15 · Updated 2021-12-14 · R / data_science · 1 min read (about 115 words)

Text Mining in R

Text Mining in R (03)

Previous: Text Mining in R (01): using library(KoNLP) and useNIADic(), checking the installation; Text Mining in R (02): installing and checking RcppMeCab

Next:

Lecture

Let's do text mining based on the files installed earlier.

Data collection

Data preprocessing

Regular expressions

cheatsheets

Tokenize

tidytextmining

install.packages("tidytext")

Posted 2021-12-14 · Updated 2021-12-14 · R / data_science · 8 min read (about 1150 words)

Text Mining in R(02)

Text Mining in R (02)

Previous: Text Mining in R (01): using and installing library(KoNLP), useNIADic()

Next:

Text Mining in R (03)

§ Installing MeCab

To use the Mecab-ko morphological analyzer, the RcppMeCab package is required.

RcppMeCab install file URL:

After downloading the files to install from that GitHub repository,

RcppMeCab_zipfiles

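create a mecab folder on the C drive when extracting the archive.
Right-clicking and choosing "Extract here" makes this easy.

During this step,
if the folder layout, file names, and paths do not match those shown above, the following error occurs.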

Exception:
list()

Check the paths, file names, and so on.
See the troubleshooting reference.

§ Installing in R

# library(remotes)
remotes::install_github("junhewk/RcppMeCab", force = TRUE)

library(RcppMeCab)

# library(remotes)
remotes::install_github("junhewk/RcppMeCab", force = TRUE)
Downloading GitHub repo junhewk/RcppMeCab@HEAD
Installing 2 packages: BH, RcppParallel
Installing package(s) into ‘C:/Users/brill/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/BH_1.75.0-0.zip'
Content type ‘application/zip’ length 19675040 bytes (18.8 MB)
downloaded 18.8 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/RcppParallel_5.1.4.zip'
Content type ‘application/zip’ length 2140731 bytes (2.0 MB)
downloaded 2.0 MB

package ‘BH’ successfully unpacked and MD5 sums checked
package ‘RcppParallel’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\brill\AppData\Local\Temp\RtmpmuDZXg\downloaded_packages
√ checking for file ‘C:\Users\brill\AppData\Local\Temp\RtmpmuDZXg\remotes2cd0f4c5d4d\junhewk-RcppMeCab-e1800aa/DESCRIPTION’ (414ms)

preparing ‘RcppMeCab’: (373ms)
√ checking DESCRIPTION meta-information …
cleaning src
checking for LF line-endings in source and make files and shell scripts
checking for empty or unneeded directories
Omitted ‘LazyData’ from DESCRIPTION
building ‘RcppMeCab_0.0.1.3-2.tar.gz’

Installing package into ‘C:/Users/brill/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)

installing source package ‘RcppMeCab’ …
- using staged installation
- libs
  "C:/rtools40/mingw64/bin/"g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -I../inst/include -DBOOST_NO_AUTO_PTR -I'C:/Users/brill/Documents/R/win-library/4.1/Rcpp/include' -I'C:/Users/brill/Documents/R/win-library/4.1/RcppParallel/include' -I'C:/Users/brill/Documents/R/win-library/4.1/BH/include' -DRCPP_PARALLEL_USE_TBB=1 -DDLL_IMPORT -DSTRICT_R_HEADERS -Wno-parentheses -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c RcppExports.cpp -o RcppExports.o
  "C:/rtools40/mingw64/bin/"g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -I../inst/include -DBOOST_NO_AUTO_PTR -I'C:/Users/brill/Documents/R/win-library/4.1/Rcpp/include' -I'C:/Users/brill/Documents/R/win-library/4.1/RcppParallel/include' -I'C:/Users/brill/Documents/R/win-library/4.1/BH/include' -DRCPP_PARALLEL_USE_TBB=1 -DDLL_IMPORT -DSTRICT_R_HEADERS -Wno-parentheses -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c posParallelRcpp.cpp -o posParallelRcpp.o
  "C:/rtools40/mingw64/bin/"g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -I../inst/include -DBOOST_NO_AUTO_PTR -I'C:/Users/brill/Documents/R/win-library/4.1/Rcpp/include' -I'C:/Users/brill/Documents/R/win-library/4.1/RcppParallel/include' -I'C:/Users/brill/Documents/R/win-library/4.1/BH/include' -DRCPP_PARALLEL_USE_TBB=1 -DDLL_IMPORT -DSTRICT_R_HEADERS -Wno-parentheses -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c posRcpp.cpp -o posRcpp.o
  "C:/rtools40/mingw64/bin/"g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -I../inst/include -DBOOST_NO_AUTO_PTR -I'C:/Users/brill/Documents/R/win-library/4.1/Rcpp/include' -I'C:/Users/brill/Documents/R/win-library/4.1/RcppParallel/include' -I'C:/Users/brill/Documents/R/win-library/4.1/BH/include' -DRCPP_PARALLEL_USE_TBB=1 -DDLL_IMPORT -DSTRICT_R_HEADERS -Wno-parentheses -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c posloopRcpp.cpp -o posloopRcpp.o
  C:/rtools40/mingw64/bin/g++ -shared -s -static-libgcc -o RcppMeCab.dll tmp.def RcppExports.o posParallelRcpp.o posRcpp.o posloopRcpp.o -L../inst/libs/x64 -LC:/Users/brill/Documents/R/win-library/4.1/RcppParallel/lib/x64 -ltbb -ltbbmalloc -lm -llibmecab -LC:/PROGRA~1/R/R-41~1.2/bin/x64 -lR
  installing to C:/Users/brill/Documents/R/win-library/4.1/00LOCK-RcppMeCab/00new/RcppMeCab/libs/x64
- R
- inst
- byte-compile and prepare package for lazy loading
- help
** installing help indices
converting help for package ‘RcppMeCab’
finding HTML links … done
RcppMeCab html
pos html
posParallel html
- building package indices
- testing if installed package can be loaded from temporary location
- testing if installed package can be loaded from final location
- testing if installed package keeps a record of temporary installation path
DONE (RcppMeCab)

Verifying the RcppMeCab install (morphological analyzer)

Put some Korean text in text1.

text1 = "안녕하세요?!"
pos(sentence = text1)
$�ȳ\xe7\xc7ϼ��\xe4?!
[1] “�/SY” “ȳ/SL” “\xe7\xc7\xcf/SH”
[4] “��/SY” “\xe4?!/SH”

- This happens because the string is not encoded as UTF-8.

text2 = enc2utf8(text1)
pos(sentence = text2)

$안녕하세요?!
[1] "안녕/NNG" "하/XSV" "세요/EP+EF" "?/SF" "!/SF"
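
The garbled output earlier is what a UTF-8 tool produces when it is handed bytes in another encoding. The effect can be reproduced in a few lines of Python. CP949 as the source encoding is an assumption here (it is a common Windows default for Korean); the post does not say which encoding the R session actually used.

```python
text = "안녕하세요?!"

# Bytes as a CP949 (Korean Windows) session might hold them; this is an assumption.
raw = text.encode("cp949")

# Reading those bytes as UTF-8 yields replacement characters (mojibake).
garbled = raw.decode("utf-8", errors="replace")

# Decoding with the right codec first, then re-encoding, gives clean UTF-8.
# This is roughly the conversion enc2utf8() performs on the R side.
fixed = raw.decode("cp949").encode("utf-8").decode("utf-8")

print(garbled)
print(fixed)
```

The point is that the bytes themselves are fine; only the codec used to read them is wrong, which is why converting to UTF-8 before calling the analyzer fixes the result.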

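Getting help from the instructor
Listening to the instructor's lecture

Maybe because the pay isn't worth it, our class doesn't walk us through things like this.
It makes me angry because it isn't that they can't teach it, they just don't, but I suppose everyone has their own circumstances.
I also wonder whether I expected too much from a government-funded course.

Installation and verification done.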

Posted 2021-12-14 | Updated 2021-12-14 | R / data_science | 38 min read (about 5680 words)

Text Mining in R(01)

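Text mining with R

NLP: natural language processing
See the NLP slides (p. 110) and the three Python code books
딥 러닝을 이용한 자연어 처리 입문 (Introduction to Natural Language Processing Using Deep Learning)/Ko
- 1. Doable up through topic modeling
Next post:
Text Mining in R (02)

** R installation is covered here.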

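BIGKinds (Korea)

A site for analyzing text data
bigkinds/ko

Sentiment analysis

Check whether comments are negative or positive.

R environment setup

install.packages("multilinguer")
# If the install above fails, use the lines below instead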

install.packages("remotes")
remotes::install_github("mrchypark/multilinguer")

multilinguer::install_jdk()
# Installs Java and sets up the path automatically

package ‘rJava’ successfully unpacked and MD5 sums checked

Installing Rtools (path setup)

If Rtools is already installed, skip this step.
After installing Rtools,
run the code below, then quit and restart RStudio.

write('PATH="${RTOOLS40_HOME}\\usr\\bin;${PATH}"', 
      file = "~/.Renviron", append = TRUE)
Sys.which("make")

                          make
"C:\rtools40\usr\bin\make.exe"

jsonlite install

install.packages("jsonlite", type = "source")

Installing package into ‘C:/Users/brill/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
trying URL ‘https://cran.rstudio.com/src/contrib/jsonlite_1.7.2.tar.gz'
Content type ‘application/x-gzip’ length 421716 bytes (411 KB)
downloaded 411 KB

installing source package ‘jsonlite’ …
- package ‘jsonlite’ successfully unpacked and MD5 sums checked
- using staged installation
- libs
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c base64.c -o base64.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c collapse_array.c -o collapse_array.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c collapse_object.c -o collapse_object.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c collapse_pretty.c -o collapse_pretty.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c escape_chars.c -o escape_chars.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c integer64_to_na.c -o integer64_to_na.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c is_datelist.c -o is_datelist.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c is_recordlist.c -o is_recordlist.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c is_scalarlist.c -o is_scalarlist.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c modp_numtoa.c -o modp_numtoa.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c null_to_na.c -o null_to_na.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c num_to_char.c -o num_to_char.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c parse.c -o parse.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c prettify.c -o prettify.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c push_parser.c -o push_parser.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c r-base64.c -o r-base64.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c register.c -o register.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c row_collapse.c -o row_collapse.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c transpose_list.c -o transpose_list.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c validate.c -o validate.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c yajl/yajl.c -o yajl/yajl.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c yajl/yajl_alloc.c -o yajl/yajl_alloc.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c yajl/yajl_buf.c -o yajl/yajl_buf.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c yajl/yajl_encode.c -o yajl/yajl_encode.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c yajl/yajl_gen.c -o yajl/yajl_gen.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c yajl/yajl_lex.c -o yajl/yajl_lex.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c yajl/yajl_parser.c -o yajl/yajl_parser.o
  "C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-41~1.2/include" -DNDEBUG -Iyajl/api -D__USE_MINGW_ANSI_STDIO -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c yajl/yajl_tree.c -o yajl/yajl_tree.o
  "C:/rtools40/mingw64/bin/"ar rcs yajl/libstatyajl.a yajl/yajl.o yajl/yajl_alloc.o yajl/yajl_buf.o yajl/yajl_encode.o yajl/yajl_gen.o yajl/yajl_lex.o yajl/yajl_parser.o yajl/yajl_tree.o
  C:/rtools40/mingw64/bin/gcc -shared -s -static-libgcc -o jsonlite.dll tmp.def base64.o collapse_array.o collapse_object.o collapse_pretty.o escape_chars.o integer64_to_na.o is_datelist.o is_recordlist.o is_scalarlist.o modp_numtoa.o null_to_na.o num_to_char.o parse.o prettify.o push_parser.o r-base64.o register.o row_collapse.o transpose_list.o validate.o -Lyajl -lstatyajl -LC:/PROGRA~1/R/R-41~1.2/bin/x64 -lR
  installing to C:/Users/brill/Documents/R/win-library/4.1/00LOCK-jsonlite/00new/jsonlite/libs/x64
- R
- inst
- byte-compile and prepare package for lazy loading
  in method for ‘asJSON’ with signature ‘“blob”‘: no definition for class “blob”
- help
** installing help indices
converting help for package ‘jsonlite’
finding HTML links … done
base64 html
flatten html
fromJSON html
prettify html
rbind_pages html
read_json html
serializeJSON html
stream_in html
unbox html
validate html
- building package indices
- installing vignettes
- testing if installed package can be loaded from temporary location
- testing if installed package can be loaded from final location
- testing if installed package keeps a record of temporary installation path
DONE (jsonlite)

Installing R packages

install.packages(c("stringr", "hash", "tau", "Sejong", "RSQLite", "devtools"), type = "binary")

The downloaded source packages are in
‘C:\Users\brill\AppData\Local\Temp\RtmpmuDZXg\downloaded_packages’
Installing packages into ‘C:/Users/brill/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
also installing the dependencies ‘fastmap’, ‘highr’, ‘xfun’, ‘diffobj’, ‘rematch2’, ‘bit’, ‘cachem’, ‘processx’, ‘prettyunits’, ‘digest’, ‘xopen’, ‘brew’, ‘commonmark’, ‘knitr’, ‘cpp11’, ‘brio’, ‘evaluate’, ‘praise’, ‘ps’, ‘waldo’, ‘bit64’, ‘blob’, ‘DBI’, ‘memoise’, ‘Rcpp’, ‘plogr’, ‘callr’, ‘pkgbuild’, ‘pkgload’, ‘rcmdcheck’, ‘roxygen2’, ‘rversions’, ‘sessioninfo’, ‘testthat’

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/fastmap_1.1.0.zip'
Content type ‘application/zip’ length 215381 bytes (210 KB)
downloaded 210 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/highr_0.9.zip'
Content type ‘application/zip’ length 46725 bytes (45 KB)
downloaded 45 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/xfun_0.28.zip'
Content type ‘application/zip’ length 386111 bytes (377 KB)
downloaded 377 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/diffobj_0.3.5.zip'
Content type ‘application/zip’ length 999001 bytes (975 KB)
downloaded 975 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/rematch2_2.1.2.zip'
Content type ‘application/zip’ length 47584 bytes (46 KB)
downloaded 46 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/bit_4.0.4.zip'
Content type ‘application/zip’ length 635254 bytes (620 KB)
downloaded 620 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/cachem_1.0.6.zip'
Content type ‘application/zip’ length 79002 bytes (77 KB)
downloaded 77 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/processx_3.5.2.zip'
Content type ‘application/zip’ length 1246508 bytes (1.2 MB)
downloaded 1.2 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/prettyunits_1.1.1.zip'
Content type ‘application/zip’ length 37755 bytes (36 KB)
downloaded 36 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/digest_0.6.29.zip'
Content type ‘application/zip’ length 266591 bytes (260 KB)
downloaded 260 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/xopen_1.0.0.zip'
Content type ‘application/zip’ length 24785 bytes (24 KB)
downloaded 24 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/brew_1.0-6.zip'
Content type ‘application/zip’ length 113926 bytes (111 KB)
downloaded 111 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/commonmark_1.7.zip'
Content type ‘application/zip’ length 265490 bytes (259 KB)
downloaded 259 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/knitr_1.36.zip'
Content type ‘application/zip’ length 1469306 bytes (1.4 MB)
downloaded 1.4 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/cpp11_0.4.2.zip'
Content type ‘application/zip’ length 327396 bytes (319 KB)
downloaded 319 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/brio_1.1.3.zip'
Content type ‘application/zip’ length 48880 bytes (47 KB)
downloaded 47 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/evaluate_0.14.zip'
Content type ‘application/zip’ length 76790 bytes (74 KB)
downloaded 74 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/praise_1.0.0.zip'
Content type ‘application/zip’ length 19849 bytes (19 KB)
downloaded 19 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/ps_1.6.0.zip'
Content type ‘application/zip’ length 775912 bytes (757 KB)
downloaded 757 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/waldo_0.3.1.zip'
Content type ‘application/zip’ length 96434 bytes (94 KB)
downloaded 94 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/bit64_4.0.5.zip'
Content type ‘application/zip’ length 565517 bytes (552 KB)
downloaded 552 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/blob_1.2.2.zip'
Content type ‘application/zip’ length 48321 bytes (47 KB)
downloaded 47 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/DBI_1.1.1.zip'
Content type ‘application/zip’ length 686681 bytes (670 KB)
downloaded 670 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/memoise_2.0.1.zip'
Content type ‘application/zip’ length 50131 bytes (48 KB)
downloaded 48 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/Rcpp_1.0.7.zip'
Content type ‘application/zip’ length 3263462 bytes (3.1 MB)
downloaded 3.1 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/plogr_0.2.0.zip'
Content type ‘application/zip’ length 18943 bytes (18 KB)
downloaded 18 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/callr_3.7.0.zip'
Content type ‘application/zip’ length 437774 bytes (427 KB)
downloaded 427 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/pkgbuild_1.3.0.zip'
Content type ‘application/zip’ length 146266 bytes (142 KB)
downloaded 142 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/pkgload_1.2.4.zip'
Content type ‘application/zip’ length 156265 bytes (152 KB)
downloaded 152 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/rcmdcheck_1.4.0.zip'
Content type ‘application/zip’ length 170257 bytes (166 KB)
downloaded 166 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/roxygen2_7.1.2.zip'
Content type ‘application/zip’ length 1352846 bytes (1.3 MB)
downloaded 1.3 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/rversions_2.1.1.zip'
Content type ‘application/zip’ length 67399 bytes (65 KB)
downloaded 65 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/sessioninfo_1.2.2.zip'
Content type ‘application/zip’ length 186234 bytes (181 KB)
downloaded 181 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/testthat_3.1.1.zip'
Content type ‘application/zip’ length 2545637 bytes (2.4 MB)
downloaded 2.4 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/stringr_1.4.0.zip'
Content type ‘application/zip’ length 216715 bytes (211 KB)
downloaded 211 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/hash_2.2.6.1.zip'
Content type ‘application/zip’ length 178061 bytes (173 KB)
downloaded 173 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/tau_0.0-24.zip'
Content type ‘application/zip’ length 186662 bytes (182 KB)
downloaded 182 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/Sejong_0.01.zip'
Content type ‘application/zip’ length 1617954 bytes (1.5 MB)
downloaded 1.5 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/RSQLite_2.2.9.zip'
Content type ‘application/zip’ length 2511267 bytes (2.4 MB)
downloaded 2.4 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/devtools_2.4.3.zip'
Content type ‘application/zip’ length 423398 bytes (413 KB)
downloaded 413 KB

package ‘fastmap’ successfully unpacked and MD5 sums checked
package ‘highr’ successfully unpacked and MD5 sums checked
package ‘xfun’ successfully unpacked and MD5 sums checked
package ‘diffobj’ successfully unpacked and MD5 sums checked
package ‘rematch2’ successfully unpacked and MD5 sums checked
package ‘bit’ successfully unpacked and MD5 sums checked
package ‘cachem’ successfully unpacked and MD5 sums checked
package ‘processx’ successfully unpacked and MD5 sums checked
package ‘prettyunits’ successfully unpacked and MD5 sums checked
package ‘digest’ successfully unpacked and MD5 sums checked
package ‘xopen’ successfully unpacked and MD5 sums checked
package ‘brew’ successfully unpacked and MD5 sums checked
package ‘commonmark’ successfully unpacked and MD5 sums checked
package ‘knitr’ successfully unpacked and MD5 sums checked
package ‘cpp11’ successfully unpacked and MD5 sums checked
package ‘brio’ successfully unpacked and MD5 sums checked
package ‘evaluate’ successfully unpacked and MD5 sums checked
package ‘praise’ successfully unpacked and MD5 sums checked
package ‘ps’ successfully unpacked and MD5 sums checked
package ‘waldo’ successfully unpacked and MD5 sums checked
package ‘bit64’ successfully unpacked and MD5 sums checked
package ‘blob’ successfully unpacked and MD5 sums checked
package ‘DBI’ successfully unpacked and MD5 sums checked
package ‘memoise’ successfully unpacked and MD5 sums checked
package ‘Rcpp’ successfully unpacked and MD5 sums checked
package ‘plogr’ successfully unpacked and MD5 sums checked
package ‘callr’ successfully unpacked and MD5 sums checked
package ‘pkgbuild’ successfully unpacked and MD5 sums checked
package ‘pkgload’ successfully unpacked and MD5 sums checked
package ‘rcmdcheck’ successfully unpacked and MD5 sums checked
package ‘roxygen2’ successfully unpacked and MD5 sums checked
package ‘rversions’ successfully unpacked and MD5 sums checked
package ‘sessioninfo’ successfully unpacked and MD5 sums checked
package ‘testthat’ successfully unpacked and MD5 sums checked
package ‘stringr’ successfully unpacked and MD5 sums checked
package ‘hash’ successfully unpacked and MD5 sums checked
package ‘tau’ successfully unpacked and MD5 sums checked
package ‘Sejong’ successfully unpacked and MD5 sums checked
package ‘RSQLite’ successfully unpacked and MD5 sums checked
package ‘devtools’ successfully unpacked and MD5 sums checked

Installing the noun extractor (KoNLP) with the remotes package (in R)

# install.packages("remotes")
remotes::install_github("haven-jeon/KoNLP",
                        upgrade = "never",
                        force = TRUE,
                        INSTALL_opts = c("--no-multiarch"))

Downloading GitHub repo haven-jeon/KoNLP@HEAD
√ checking for file ‘C:\Users\brill\AppData\Local\Temp\RtmpmuDZXg\remotes2cd03d177e06\haven-jeon-KoNLP-960fbbc/DESCRIPTION’ …
preparing ‘KoNLP’: (722ms)
√ checking DESCRIPTION meta-information …

checking for LF line-endings in source and make files and shell scripts

checking for empty or unneeded directories
looking to see if a ‘data/datalist’ file should be added
building ‘KoNLP_0.80.2.tar.gz’

Installing package into ‘C:/Users/brill/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)

installing source package ‘KoNLP’ …
- using staged installation
- R
- data
- inst
- byte-compile and prepare package for lazy loading
- help
** installing help indices
converting help for package ‘KoNLP’
finding HTML links … done
HangulAutomata html
KtoS html
MorphAnalyzer html
SimplePos09 html
SimplePos22 html
StoK html
backupUsrDic html
buildDictionary html
concordance_file html
concordance_str html
convertHangulStringToJamos html
convertHangulStringToKeyStrokes html
convertTag html
editweights html
extractNoun html
get_dictionary html
is.ascii html
is.hangul html
is.jaeum html
is.jamo html
is.moeum html
mergeUserDic html
mutualinformation html
reloadAllDic html
reloadUserDic html
restoreUsrDic html
scala_library_install html
statDic html
tags html
useNIADic html
useSejongDic html
useSystemDic html
- building package indices
- installing vignettes
- testing if installed package can be loaded from temporary location
  [1] “DEBUG start”
  [1] “C:/Users/brill/Documents/R/win-library/4.1/00LOCK-KoNLP/00new/KoNLP/java/scala-library-2.11.8.jar”
  [1] “My R is over 3.2.0”
  [1] “scala-library target url: https://repo1.maven.org/maven2/org/scala-lang/scala-library/2.11.8/scala-library-2.11.8.jar"
  [1] “‘method’ parameter for download.file() function in your R: wininet”
  trying URL 'https://repo1.maven.org/maven2/org/scala-lang/scala-library/2.11.8/scala-library-2.11.8.jar'
  Content type ‘application/java-archive’ length 5744974 bytes (5.5 MB)
  downloaded 5.5 MB

[1] TRUE
[1] 5744974
Successfully installed Scala runtime library in C:/Users/brill/Documents/R/win-library/4.1/00LOCK-KoNLP/00new/KoNLP/java/scala-library-2.11.8.jar
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path

DONE (KoNLP)

Setting up the noun extractor KoNLP

library(KoNLP)
useNIADic()
Backup was just finished!
Downloading package from url: https://github.com/haven-jeon/NIADic/releases/download/0.0.1/NIADic_0.0.1.tar.gz
Installing 16 packages: colorspace, viridisLite, RColorBrewer, munsell, labeling, farver, base64enc, htmltools, scales, isoband, gtable, jquerylib, tinytex, ggplot2, data.table, rmarkdown
Installing packages into ‘C:/Users/brill/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/colorspace_2.0-2.zip'
Content type ‘application/zip’ length 2645307 bytes (2.5 MB)
downloaded 2.5 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/viridisLite_0.4.0.zip'
Content type ‘application/zip’ length 1299504 bytes (1.2 MB)
downloaded 1.2 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/RColorBrewer_1.1-2.zip'
Content type ‘application/zip’ length 55707 bytes (54 KB)
downloaded 54 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/munsell_0.5.0.zip'
Content type ‘application/zip’ length 245486 bytes (239 KB)
downloaded 239 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/labeling_0.4.2.zip'
Content type ‘application/zip’ length 62679 bytes (61 KB)
downloaded 61 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/farver_2.1.0.zip'
Content type ‘application/zip’ length 1752621 bytes (1.7 MB)
downloaded 1.7 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/base64enc_0.1-3.zip'
Content type ‘application/zip’ length 43156 bytes (42 KB)
downloaded 42 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/htmltools_0.5.2.zip'
Content type ‘application/zip’ length 347310 bytes (339 KB)
downloaded 339 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/scales_1.1.1.zip'
Content type ‘application/zip’ length 558895 bytes (545 KB)
downloaded 545 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/isoband_0.2.5.zip'
Content type ‘application/zip’ length 2726764 bytes (2.6 MB)
downloaded 2.6 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/gtable_0.3.0.zip'
Content type ‘application/zip’ length 434327 bytes (424 KB)
downloaded 424 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/jquerylib_0.1.4.zip'
Content type ‘application/zip’ length 525848 bytes (513 KB)
downloaded 513 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/tinytex_0.35.zip'
Content type ‘application/zip’ length 126495 bytes (123 KB)
downloaded 123 KB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/ggplot2_3.3.5.zip'
Content type ‘application/zip’ length 4130301 bytes (3.9 MB)
downloaded 3.9 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/data.table_1.14.2.zip'
Content type ‘application/zip’ length 2600846 bytes (2.5 MB)
downloaded 2.5 MB

trying URL ‘https://cran.rstudio.com/bin/windows/contrib/4.1/rmarkdown_2.11.zip'
Content type ‘application/zip’ length 3660449 bytes (3.5 MB)
downloaded 3.5 MB

package ‘colorspace’ successfully unpacked and MD5 sums checked
package ‘viridisLite’ successfully unpacked and MD5 sums checked
package ‘RColorBrewer’ successfully unpacked and MD5 sums checked
package ‘munsell’ successfully unpacked and MD5 sums checked
package ‘labeling’ successfully unpacked and MD5 sums checked
package ‘farver’ successfully unpacked and MD5 sums checked
package ‘base64enc’ successfully unpacked and MD5 sums checked
package ‘htmltools’ successfully unpacked and MD5 sums checked
package ‘scales’ successfully unpacked and MD5 sums checked
package ‘isoband’ successfully unpacked and MD5 sums checked
package ‘gtable’ successfully unpacked and MD5 sums checked
package ‘jquerylib’ successfully unpacked and MD5 sums checked
package ‘tinytex’ successfully unpacked and MD5 sums checked
package ‘ggplot2’ successfully unpacked and MD5 sums checked
package ‘data.table’ successfully unpacked and MD5 sums checked
package ‘rmarkdown’ successfully unpacked and MD5 sums checked

preparing ‘NIADic’:
√ checking DESCRIPTION meta-information …
√ checking vignette meta-information …
checking for LF line-endings in source and make files and shell scripts
checking for empty or unneeded directories
building ‘NIADic_0.0.1.tar.gz’

Installing package into ‘C:/Users/brill/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)

installing source package ‘NIADic’ …
- using staged installation
- R
- inst
- byte-compile and prepare package for lazy loading
- help
** installing help indices
converting help for package ‘NIADic’
finding HTML links … done
get_dic html
- building package indices
- installing vignettes
- testing if installed package can be loaded from temporary location
- testing if installed package can be loaded from final location
- testing if installed package keeps a record of temporary installation path
DONE (NIADic)
1213109 words dictionary was built.

Verifying the noun extractor after installation



text = "뿌리산업’의 기반이 되는 공정기술의 범위가 관련법 제정 10년 만에 확대 개편된다. 뿌리기업 우대 지원과 청년층 등 신규인력 유입 지원을 강화하기 위한 법적 토대도 마련된다.
산업통상자원부는 이 같은 내용을 담은 ‘뿌리산업 진흥과 첨단화에 관한 법률(뿌리산업법) 시행령’ 개정안이 14일 국무회의에서 의결돼 오는 16일부터 시행된다고 밝혔다.
먼저 뿌리산업법 기반 공정기술(뿌리기술)의 범위가 기존 6개(주조, 금형, 소성가공, 용접, 표면처리, 열처리)에서 14개로 늘어난다.
구체적으로 소재 다원화 공정기술에 사출·프레스, 정밀가공, 적층제조, 산업용 필름 및 지류 등 4개 기술이 포함된다. 산업부는 이를 통해 세라믹, 플라스틱, 탄성소재, 탄소, 펄프 등 다양한 소재 기반 제조 공정을 확산할 계획이다. 또 지능화 공정기술로 로봇, 센서, 산업 지능형 소프트웨어, 엔지니어링 설계 등 4개 기술이 추가된다.
뿌리기술 범위가 확대되면서 뿌리산업의 범위도 기존 6대 산업, 76개 업종에서 14대 산업, 111개 업종으로 늘어난다.

이번 개정을 통해 뿌리기업 확인 절차, 확인서 유효기간(3년), 사후관리 등에 관한 규정도 신설됐다. 뿌리기업은 뿌리기술을 활용해 사업을 영위하는 업종 또는 뿌리기술에 활용되는 장비 제조 분야를 말한다.
뿌리기업 확인 제도는 외국인 근로자 고용 우대 혜택 등이 주어지는 뿌리산업 관련 우대 지원 대상을 명확히 정하기 위한 것으로 국가뿌리산업진흥센터에서 확인서를 발급해오고 있다. 2012년부터 1만1766건이 발급됐으며 현재 5843건이 유효한 것으로 집계됐다.

‘일하기 좋은 뿌리기업’ 선정을 위한 기준과 절차, 지원 내용 등에 관한 규정도 새로 만들어졌다. ‘일하기 좋은 뿌리기업’은 뿌리산업에 청년층 등 신규 인력 유입을 촉진하기 위해 근로·복지 환경, 성장 역량 등이 우수한 기업을 산업부가 선정해 홍보 등을 지원하는 제도다.
산업부는 이번 개정 사항이 원활히 시행될 수 있도록 업종별 협·단체, 뿌리기업, 지자체 등을 대상으로 적극 홍보할 방침이다. 아울러 매년 발간하는 뿌리산업 백서를 통해 새롭게 추가되는 8대 차세대 공정기술에 대한 내용, 기술 동향 등을 상세하게 제공하기로 했다．
산업부 관계자는 “이번 개정은 2011년 뿌리산업법 제정 후 10년 만에 뿌리기술을 소재다원화와 지능화 중심으로 확장한 것으로, 뿌리산업의 기술 융복합화와 첨단화를 촉진하고 신규 인력 유입 지원을 강화하기 위한 법적 토대를 마련하였다는 데에 의의가 있다”고 말했다."

extractNoun(text)

[1] “뿌리산업’” “기반” “공정기술”
[4] “범위” “관련” “법”
[7] “제정” “10” “년”
[10] “만” “확대” “개편”
[13] “뿌리” “기업” “우대”
[16] “지원” “청년층” “등”
[19] “신규인력” “유입” “지원”
[22] “강화” “하기” “토대”
[25] “마련” “산업” “통상”
[28] “자원” “부” “내용”
[31] “담” “‘뿌리산업” “진흥”
[34] “첨단화” “법률(뿌리산업법)” “시행령’”
[37] “개정안” “14” “국무회의”
[40] “의결” “16” “일”
[43] “시행” “뿌리” “산업”
[46] “법” “기반” “공정기술(뿌리기술)”
[49] “범위” “기존” “6개(주조”
[52] “금형” “소성” “가공”
[55] “용접” “표면처리” “열처리”
[58] “14” “개” “구체”
[61] “적” “소재” “다원화”
[64] “공정기술” “사출·프레스” “정밀가공”
[67] “적층” “제조” “산업용”
[70] “필름” “지류” “등”
[73] “4” “개” “기술”
[76] “포함” “산업” “부”
[79] “이” “세라믹” “플라스틱”
[82] “탄성소” “재” “탄소”
[85] “펄프” “등” “다양”
[88] “한” “소재” “기반”
[91] “제조” “공정” “확산”
[94] “할” “계획” “지능화”
[97] “공정기술” “로봇” “센서”
[100] “산업” “지능형” “소프트웨어”
[103] “엔지니어링” “설계” “등”
[106] “4” “개” “기술”
[109] “추가” “뿌리” “기술”
[112] “범위” “확대” “되”
[115] “뿌리” “산업” “범위”
[118] “기존” “6” “대”
[121] “산업” “76” “개”
[124] “업종” “14” “대”
[127] “산업” “111” “개”
[130] “업종” “이번” “개정”
[133] “뿌리” “기업” “확인”
[136] “절차” “확인” “유효”
[139] “기” “3” “년”
[142] “사후관리” “등” “규정도”
[145] “신설” “뿌리” “기업”
[148] “뿌리” “기술” “활용”
[151] “해” “사업” “영위”
[154] “하” “업종” “뿌리”
[157] “기술” “활용” “되”
[160] “장비” “제조” “분야”
[163] “말” “뿌리” “기업”
[166] “확인” “제” “외국”
[169] “근로자” “고용” “우대”
[172] “혜택” “등” “뿌리”
[175] “산업” “관련” “우대”
[178] “지원” “대상” “것”
[181] “국가” “뿌리” “산업진흥”
[184] “센터” “확인서” “발급”
[187] “해오” “2012” “년”
[190] “1” “만” “1766”
[193] “건” “발급” “5843”
[196] “건” “유효” “한”
[199] “것” “집계” “‘일하기”
[202] “뿌리기업’” “선정” “기준”
[205] “절차” “지원” “내용”
[208] “등” “규정도” “‘일하기”
[211] “뿌리기업’은” “뿌리” “산업”
[214] “청년층” “등” “신규”
[217] “인력” “유입” “촉진”
[220] “하기” “근로·복지” “환경”
[223] “성장” “역량” “등”
[226] “우수” “한” “기업”
[229] “산업” “부” “선정”
[232] “해” “홍보” “등”
[235] “지원” “하” “제도”
[238] “산업” “부” “이번”
[241] “개정” “사항” “시행”
[244] “수” “업종” “별”
[247] “협·단체” “뿌리” “기업”
[250] “지자체” “등” “대상”
[253] “적극” “홍보” “할”
[256] “방침” “발간” “하”
[259] “뿌리” “산업” “백서”
[262] “추가” “되” “8”
[265] “대” “차세대” “공정기술”
[268] “내용” “기술” “동향”
[271] “등” “상세” “하게”
[274] “제공” “하기” “산업”
[277] “부” “관계자” ““이번”
[280] “개정” “2011” “년”
[283] “뿌리” “산업” “법”
[286] “제정” “후” “10”
[289] “년” “만” “뿌리”
[292] “기술” “소재” “다원화”
[295] “지능화” “중심” “확장”
[298] “한” “것” “뿌리”
[301] “산업” “기술” “융복합”
[304] “화” “첨단화” “촉진”
[307] “신규” “인력” “유입”
[310] “지원” “강화” “하기”
[313] “토대” “마련” “데”
[316] “의의” “있다”고” “말”

- Noun extractor installation complete

Posted 2021-12-14, updated 2021-12-14, python, 3 min read (about 483 words)

python_basic_Exception

python

Exception

# /c/Users/brill/Desktop/PyThon_Function/venv/Scripts/python
# -*- coding: utf-8 -*-

def error01():
    a = 10
    a / 0
    # ZeroDivisionError: division by zero

def error02():
    a = [1, 2, 3, 4, 5]
    a[10]
    # IndexError: list index out of range

def error03():
    a = 1000
    a + "Hello"
    # TypeError: unsupported operand type(s) for +: 'int' and 'str'

def error04():
    a = 10
    a + b
    # NameError: name 'b' is not defined

if __name__ == "__main__":
    # Without exception handling, the first error stops the program,
    # so the remaining calls and the final print never run.
    error01()
    error02()
    error03()
    error04()
    print("program is done")

When writing crawling code:

"https://sports.news.naver.com/news?oid=109&aid=0004526080" : page exists,
"https://sports.news.naver.com/news?oid=109&aid=0004526081" : if this page does not exist,
"https://sports.news.naver.com/news?oid=109&aid=0005526080" : page exists,
the crawling code stops at the missing page.
The program must not stop like this.
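The crawling situation above can be sketched roughly as follows. This is only an illustration under assumptions: `PageNotFound`, `fake_fetch`, and `crawl` are hypothetical names, and `fake_fetch` stands in for a real HTTP request so that the example runs without network access.

```python
class PageNotFound(Exception):
    """Hypothetical error raised when a page does not exist."""


def fake_fetch(url):
    """Stand-in for a real HTTP request (assumption: no network access here)."""
    if url.endswith("0004526081"):
        raise PageNotFound(url)
    return "<html>article for {}</html>".format(url)


def crawl(urls, fetch):
    """Fetch every URL, skipping the ones that fail instead of stopping."""
    pages, failed = {}, []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except PageNotFound:
            failed.append(url)  # record the failure and continue with the next URL
    return pages, failed


urls = [
    "https://sports.news.naver.com/news?oid=109&aid=0004526080",
    "https://sports.news.naver.com/news?oid=109&aid=0004526081",
    "https://sports.news.naver.com/news?oid=109&aid=0005526080",
]
pages, failed = crawl(urls, fake_fetch)
print(len(pages), "pages fetched,", len(failed), "skipped")
```

Because the exception is caught inside the loop, the third URL is still fetched after the second one fails.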

Types of Exception

This works like the try/catch statement in Java.

# /c/Users/brill/Desktop/PyThon_Function/venv/Scripts/python
# -*- coding: utf-8 -*-

def try_func(x, idx):
    try:
        return 100/x[idx]
    except ZeroDivisionError:
        print("cannot divide by zero")
    except IndexError:
        print("index is out of range")
    except TypeError:
        print("there is a type error")
    except NameError:
        print("name is not defined")
    finally:
        # finally always runs, whether or not an exception occurred
        print("always executed")


def main():
    a = [50, 60, 0, 70]
    print(try_func(a,1))

    # Zero Division Error (a[2] is 0)
    print(try_func(a,2))

    # Index Error
    print(try_func(a,5))

    # Type Error
    print(try_func(a, "hi"))


if __name__ == "__main__":
    main()

What matters is making the program keep running no matter what happens.

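Class summary

__init__ : initializes everything in one place, so separate set_name and set_id calls are not needed
__eq__, __ne__ : comparison operators (==, !=)
Inheritance, polymorphism (a method with the same name shared across different classes)
Exception
Differences between class attribute / instance attribute / instance method
Abstract class (not covered yet)
Data encapsulation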
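As a small illustration of the class attribute / instance attribute / instance method item above, here is a minimal sketch. The `Counter` class and its names are hypothetical and are not part of the original notes.

```python
class Counter:
    step = 1  # class attribute: shared by every instance

    def __init__(self, start):
        self.value = start  # instance attribute: owned by one object

    def increment(self):  # instance method: works on one object through self
        self.value += self.step
        return self.value


a = Counter(0)
b = Counter(10)

Counter.step = 5  # changing the class attribute affects all instances
a.increment()     # a.value becomes 5
b.step = 1        # creates an instance attribute that shadows the class one
b.increment()     # b.value becomes 11
print(a.value, b.value, Counter.step)
```

Assigning through the instance (`b.step = 1`) does not change `Counter.step`; it only hides it for that one object.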

# /c/Users/brill/Desktop/PyThon_Function/venv/Scripts/python
# -*- coding: utf-8 -*-

class SalaryExcept(ValueError): pass  # inherits from ValueError
class TipExcept(SalaryExcept): pass  # inherits from SalaryExcept

class Employee:

    MIN_SALARY = 30000
    MAX_Bonus = 20000

    def __init__(self, name, salary = 30000):
        self.name = name
        if salary< Employee.MIN_SALARY:
            raise SalaryExcept("Salary is too low!")
        self.salary = salary

    def give_bonus(self, amount):
        if amount > Employee.MAX_Bonus:
            print("The bonus is too large.")
        elif self.salary + amount < Employee.MIN_SALARY :
            print("The salary is still too low even after the bonus.")
        else:
            self.salary += amount

if __name__ == "__main__":
    # salary=10000 is below MIN_SALARY, so SalaryExcept is raised here,
    # outside of any try block
    emp = Employee("YH", salary= 10000)

    try:
        emp.give_bonus(70000)
    except SalaryExcept:
        print("Error Salary")

    try:
        emp.give_bonus(-10000)
    except TipExcept:
        print("Error Tip")

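This code still raises an error.
The exception raised in the constructor is not handled, because the Employee object is created outside the try blocks.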
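One possible way to make the exceptions catchable is sketched below. It is not the original author's solution: the names `SalaryError` and `BonusError` are hypothetical, and the sketch assumes that `give_bonus` should raise instead of print, and that the object should be created inside a try block.

```python
class SalaryError(ValueError):
    """Raised when a salary would fall below the minimum."""


class BonusError(SalaryError):
    """Raised when a bonus exceeds the maximum."""


class Employee:
    MIN_SALARY = 30000
    MAX_BONUS = 20000

    def __init__(self, name, salary=30000):
        if salary < Employee.MIN_SALARY:
            raise SalaryError("salary is too low")
        self.name = name
        self.salary = salary

    def give_bonus(self, amount):
        # Raise instead of print, so that the caller can react with except
        if amount > Employee.MAX_BONUS:
            raise BonusError("bonus is too large")
        if self.salary + amount < Employee.MIN_SALARY:
            raise SalaryError("salary is too low even after the bonus")
        self.salary += amount


messages = []

try:
    Employee("YH", salary=10000)  # construction happens inside try
except SalaryError as err:
    messages.append("Error Salary: {}".format(err))

emp = Employee("YH", salary=30000)
for amount in (70000, -10000, 5000):
    try:
        emp.give_bonus(amount)
    except BonusError as err:  # the subclass must be caught first
        messages.append("Error Bonus: {}".format(err))
    except SalaryError as err:
        messages.append("Error Salary: {}".format(err))

print(messages)
print(emp.salary)
```

Since `BonusError` inherits from `SalaryError`, an `except SalaryError` clause placed first would also catch bonus errors, which is why the more specific clause comes first.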

Posted 2021-12-13, updated 2021-12-14, python, 3 min read (about 523 words)

python_basic_Bank

python

Bank _ Creating an account

# /c/Users/brill/Desktop/PyThon_Function/venv/Scripts/python
# -*- coding: utf-8 -*-

class Human:

    def __init__(self, name):
        self.name = name


if __name__ == "__main__":
    human01 = Human(name="A")
    human02 = Human(name="A")

    print(human01 == human02)
    print("human 01 : ", human01)
    print("human 02 : ", human02)

False
human 01 : <__main__.Human object at 0x000001686E41CC10>
human 02 : <__main__.Human object at 0x000001686E41CE50>

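The two objects are not equal because they are stored at different memory locations. By default, == compares object identity.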
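The following sketch makes the identity comparison visible with `is` and `id()`. The variable `alias` is added here only for illustration.

```python
class Human:
    def __init__(self, name):
        self.name = name


human01 = Human(name="A")
human02 = Human(name="A")
alias = human01  # a second name for the same object

print(human01 == human02)            # False: two distinct objects
print(human01 is alias)              # True: same object
print(id(human01) == id(alias))      # True: same identity
print(human01.name == human02.name)  # True: the attribute values match
```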

Bank _ Checking the customer ID

# /c/Users/brill/Desktop/PyThon_Function/venv/Scripts/python
# -*- coding: utf-8 -*-

class Bank:

    #instance attribute
    def __init__(self, cust_id, balance=0):
        self.balance = balance
        self.cust_id = cust_id

    # instance method
    def withdraw(self, amount):
        self.balance -= amount

    def __eq__(self, other):
        print("__eq()__ is called")
        return self.cust_id == other.cust_id

if __name__ == "__main__":
    account01 = Bank(123, 1000)
    account02 = Bank(123, 1000)
    account03 = Bank(456, 1000)
    print(account01 == account02)
    print(account02 == account03)
    print(account01 == account03)

__eq()__ is called
True
__eq()__ is called
False
__eq()__ is called
False

Comparison operators
- != : __ne__()
- >= : __ge__()
- <= : __le__()
- > : __gt__()
- < : __lt__()
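The operator list above can be tried out with a small class. This is a sketch under assumptions: the `Account` class is hypothetical, and ordering accounts by balance is chosen here only to have something to compare.

```python
class Account:
    """Hypothetical account that is ordered by balance."""

    def __init__(self, cust_id, balance=0):
        self.cust_id, self.balance = cust_id, balance

    def __eq__(self, other):  # ==
        return self.balance == other.balance

    def __ne__(self, other):  # !=
        return self.balance != other.balance

    def __ge__(self, other):  # >=
        return self.balance >= other.balance

    def __le__(self, other):  # <=
        return self.balance <= other.balance

    def __gt__(self, other):  # >
        return self.balance > other.balance

    def __lt__(self, other):  # <
        return self.balance < other.balance


small = Account(123, 1000)
large = Account(456, 5000)

print(small != large)
print(small < large)
print(small >= large)
print(max(small, large).cust_id)
```

Built-in functions such as `max()` and `sorted()` rely on these methods, so defining them lets the objects be ordered directly.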

Using the __eq__() method

# /c/Users/brill/Desktop/PyThon_Function/venv/Scripts/python
# -*- coding: utf-8 -*-

class Bank:

    #instance attribute
    def __init__(self, cust_id, balance=0):
        self.balance, self.cust_id = balance, cust_id


    # instance method
    def withdraw(self, amount):
        self.balance -= amount

    def __eq__(self, other):
        print("__eq()__ is called")
        return (self.cust_id == other.cust_id) and (type(self) == type(other))

class Phone:

    def __init__(self, cust_id):
        self.cust_id = cust_id

    def __eq__(self, other):
        return self.cust_id == other.cust_id


if __name__ == "__main__":
    account01 = Bank(1234)
    phone01 = Phone(1234)

    print(account01 == phone01)

__eq()__ is called
False

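__eq__ is called to check whether two objects are equal. Here the type check makes a Bank and a Phone with the same cust_id compare as different.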
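One detail worth noting: in the listing above only Bank checks the type, so `phone01 == account01` would call Phone's `__eq__` and return True. A common alternative, sketched here as an assumption rather than as the author's method, is to return `NotImplemented` for foreign types in both classes.

```python
class Bank:
    def __init__(self, cust_id, balance=0):
        self.cust_id, self.balance = cust_id, balance

    def __eq__(self, other):
        if not isinstance(other, Bank):
            return NotImplemented  # let Python try the other operand
        return self.cust_id == other.cust_id


class Phone:
    def __init__(self, cust_id):
        self.cust_id = cust_id

    def __eq__(self, other):
        if not isinstance(other, Phone):
            return NotImplemented
        return self.cust_id == other.cust_id


account01 = Bank(1234)
phone01 = Phone(1234)

print(account01 == phone01)     # False
print(phone01 == account01)     # False
print(account01 == Bank(1234))  # True
```

When both sides return `NotImplemented`, Python falls back to comparing identity, so the result is False in either operand order.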

Checking access records and logs

# /c/Users/brill/Desktop/PyThon_Function/venv/Scripts/python
# -*- coding: utf-8 -*-


class Bank:
    def __init__(self, cust_id, name, balance = 0):
        self.cust_id, self.name, self.balance = cust_id, name, balance

    def __str__(self):
        cust_str = """
        customer:
            cust_id : {cust_id}
            name : {name}
            balance : {balance}
        """.format(cust_id = self.cust_id, name = self.name, balance= self.balance)

        return cust_str

if __name__ == "__main__":
    bank_cust = Bank(123, "YH")
    print(bank_cust)

Nothing is stored in a DB, but the object's state can be checked as a log record.
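To show how `__str__` can feed a log record, here is a minimal sketch using the standard `logging` module. The logger name `bank_access`, the single-line `__str__` format, and the in-memory buffer are assumptions made for this example.

```python
import io
import logging


class Bank:
    def __init__(self, cust_id, name, balance=0):
        self.cust_id, self.name, self.balance = cust_id, name, balance

    def __str__(self):
        return "customer(cust_id={}, name={}, balance={})".format(
            self.cust_id, self.name, self.balance)


# Log to an in-memory buffer; a real program might log to a file instead
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("bank_access")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

bank_cust = Bank(123, "YH")
logger.info("accessed %s", bank_cust)  # %s formats the object with str()

log_text = buffer.getvalue()
print(log_text)
```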

Comparing str() and repr()

# /c/Users/brill/Desktop/PyThon_Function/venv/Scripts/python
# -*- coding: utf-8 -*-


class Bank:
    def __init__(self, cust_id, name, balance = 0):
        self.cust_id, self.name, self.balance = cust_id, name, balance

    def __str__(self):
        cust_str = """
        customer:
            cust_id : {cust_id}
            name : {name}
            balance : {balance}
        """.format(cust_id = self.cust_id, name = self.name, balance= self.balance)

        return cust_str

    def __repr__(self):
        cust_str = "Bank({cust_id}, '{name}', {balance})".format(cust_id = self.cust_id, name = self.name, balance= self.balance)
        return cust_str

if __name__ == "__main__":
    bank_cust = Bank(123, "YH")
    print(str(bank_cust))
    print(repr(bank_cust))

Difference between str() and repr()
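The difference can be sketched as follows: `str()` targets human readers, while `repr()` aims at an unambiguous form that can often recreate the object. The shortened `__str__` text and the variable `clone` are assumptions of this example.

```python
class Bank:
    def __init__(self, cust_id, name, balance=0):
        self.cust_id, self.name, self.balance = cust_id, name, balance

    def __str__(self):
        return "customer {} (id {}), balance {}".format(
            self.name, self.cust_id, self.balance)

    def __repr__(self):
        # !r applies repr() to each field, so the name keeps its quotes
        return "Bank({!r}, {!r}, {!r})".format(
            self.cust_id, self.name, self.balance)


bank_cust = Bank(123, "YH")
print(str(bank_cust))   # readable form, used by print()
print(repr(bank_cust))  # Bank(123, 'YH', 0)
print([bank_cust])      # containers show their items with repr()

clone = eval(repr(bank_cust))  # the repr text is valid constructor code
print(clone.name)
```

Using `eval` here only demonstrates the round trip; it should not be applied to untrusted text.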

Posted 2021-12-11, updated 2021-12-13, pharmacerical_company, a few seconds read (about 13 words)

Xenomix

제노믹스
