Posted 2021-11-15, updated 2021-11-15, python - plotly, a few seconds to read (about 49 words)


# Plotly Tutorial For Kaggle Survey Competitions

Progress has been too slow, so I am going to pick apart the Teacher's code a little more.

I hope this helps me get better at writing Python for loops and if statements.

Posted 2021-11-09, updated 2021-11-09, python - plotly, a 5-minute read (about 681 words)

Kaggle: ScatterLine (Q42)

Kaggle dictation (08)

plotly.graph_objects as go: Scatter + line plot

Scatter plot

A basic graph for visualizing bivariate (two-variable) data.
- correlation: positive, negative, none
It shows the distribution of each of the two variables as well as the relationship between them.

ref.

0. data set

https://www.kaggle.com/miguelfzzz/the-typical-kaggle-data-scientist-in-2021

Subject: the most commonly used media sources

1. Reading the data

Select the columns whose names start with Q42.
This uses a Python for loop, written as a list comprehension.

media_cols = [col for col in df if col.startswith('Q42')]

media_cols
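The column selection above can be tried on a small table. This is only a sketch: the toy DataFrame and its column names (`Q1`, `Q42_Part_1`, `Q42_Part_2`) are made up for illustration and are not the real survey data.

```python
import pandas as pd

# Toy stand-in for the survey DataFrame (hypothetical columns and values)
df = pd.DataFrame({
    "Q1": ["25-29", "30-34"],
    "Q42_Part_1": ["Twitter", None],
    "Q42_Part_2": [None, "Email newsletters"],
})

# Iterating over a DataFrame yields its column labels one by one
media_cols = [col for col in df if col.startswith("Q42")]
print(media_cols)

# The same selection without an explicit loop, using the string accessor
same_cols = df.columns[df.columns.str.startswith("Q42")].tolist()
```

Both forms give the same list of labels; the list comprehension is the one used in this post.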

2. Building the DataFrame

media = df[media_cols]

media.columns = ['Twitter', 'Email newsletters', 
                'Reddit', 'Kaggle', 'Course Forums', 
                'YouTube', 'Podcasts', 'Blogs',
                'Journal Publications', 'Slack Communities', 'None', 'Other']

df_media_cols

3. Setting up the table

media = (
    media
    .count()
    .to_frame()
    .reset_index()
    .rename(columns={'index':'Medias', 0:'Count'})
    .sort_values(by=['Count'], ascending=False)
    )

media
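To see what each step of that chain produces, here is a minimal sketch on a three-column toy table. The values are invented; only the method chain mirrors the code above.

```python
import pandas as pd

# Toy multi-select answers: a cell is filled only if the respondent ticked it
media = pd.DataFrame({
    "Twitter": ["Twitter", None, "Twitter"],
    "Reddit": [None, None, "Reddit"],
    "Kaggle": ["Kaggle", "Kaggle", "Kaggle"],
})

summary = (
    media
    .count()          # non-null answers per column, as a Series
    .to_frame()       # Series -> DataFrame with a single column named 0
    .reset_index()    # the column labels move into an 'index' column
    .rename(columns={"index": "Medias", 0: "Count"})
    .sort_values(by=["Count"], ascending=False)
)
print(summary)
```

The rename keys `'index'` and `0` are exactly the default names that `reset_index()` and `to_frame()` leave behind, which is why the original code refers to them.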

4. Assigning colors


colors = ['#033351',] * 12
colors[11] = '#5abbf9'
colors[10] = '#5abbf9'
colors[9] = '#5abbf9'
colors[8] = '#0779c3'
colors[7] = '#0779c3'
colors[6] = '#0779c3'
colors[5] = '#0779c3'
colors[4] = '#0779c3'
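The same 12-entry list can also be built by concatenating three groups, which may be easier to adjust later. This is just an alternative sketch; the names `dark`, `mid`, and `light` are introduced here for illustration.

```python
# The three shades used above
dark, mid, light = "#033351", "#0779c3", "#5abbf9"

# Indices 0-3 dark, 4-8 mid, 9-11 light (12 entries in total)
colors = [dark] * 4 + [mid] * 5 + [light] * 3

# Rebuild the list the way the post does, to check both agree
original = ["#033351"] * 12
for idx in (9, 10, 11):
    original[idx] = "#5abbf9"
for idx in (4, 5, 6, 7, 8):
    original[idx] = "#0779c3"

print(colors == original)
```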

5. Adding a percent column

i. Add the percent column

media['percent'] = ((media['Count'] / len(df))*100).round(2).astype(str) + '%'

media

Q42_percent
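A small worked example of the percent formula, assuming a made-up respondent total. `n_respondents` is a hypothetical stand-in for `len(df)`; because Q42 is a multi-select question and the denominator is the number of respondents, the shares do not have to add up to 100%.

```python
import pandas as pd

media = pd.DataFrame({"Medias": ["Kaggle", "YouTube"], "Count": [3, 1]})
n_respondents = 8  # hypothetical stand-in for len(df)

# Share of respondents, rounded to 2 decimals and formatted as a string
media["percent"] = (
    ((media["Count"] / n_respondents) * 100).round(2).astype(str) + "%"
)
print(media)
```

Note that the column holds strings, so it is suitable for labels and hover text but not for further arithmetic.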

ii. Sorting by the Count column

media = (media
        .sort_values(by = ['Count'])
        .iloc[0:15]
        .reset_index())

media

Q42_percent_sort

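1. sort_values() sorts in ascending order by default (ascending=True), so the smallest Count comes first.
2. iloc[0:15] takes up to the first 15 rows by position (positions 0 through 14); media has only 12 rows, so all of them are kept.
3. reset_index() assigns a fresh 0, 1, 2, ... index and keeps the old index as an 'index' column.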
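The three steps can be checked on a toy table. This is a sketch with invented counts, showing the default sort direction and the extra `index` column that `reset_index()` leaves behind.

```python
import pandas as pd

media = pd.DataFrame({
    "Medias": ["Kaggle", "YouTube", "Blogs"],
    "Count": [30, 20, 25],
})

ordered = (
    media
    .sort_values(by=["Count"])   # ascending by default
    .iloc[0:15]                  # at most 15 rows; only 3 exist here
    .reset_index()               # new 0..n-1 index, old one kept as 'index'
)
print(ordered)
```

Passing `drop=True` to `reset_index()` would discard the old index instead of keeping it as a column.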

6. plotly.graph_objects.Scatter()

Drawing the scatter graph

i. Plotting the scatter points


import plotly.graph_objects as go

# One marker per media source: x = Count, y = media name
fig = go.Figure(go.Scatter(x = media['Count'], 
                           y = media["Medias"],
                           text = media['percent'],
                           mode = 'markers',
                           marker_color =colors,
                           marker_size  = 12))
fig

Q42_graph

ii. Connecting lines to the scatter points with a for loop

# Draw one horizontal line per row, from x = 0 to that row's Count
for i in range(0, len(media)):
    fig.add_shape(type='line',
                  x0=0, y0=i,
                  x1=media["Count"][i],
                  y1=i,
                  line=dict(color=colors[i], width=4))
fig

Q42_Scatter+line

for i in range(0, len(media)): runs once per row of media
fig.add_shape()
- type='line'
  - adds a line-shaped graph shape
- x0 = 0, y0 = i
  - the line starts at (0, i)
  - (0, 0) is where the line for Other starts
- x1 = media["Count"][i]
  - x end point: the line extends along the x-axis as far as the Count value
- y1 = i
  - y end point; the same as y0, so the line is horizontal
- line=dict(color=colors[i], width=4)
  - line details: color and width
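The loop described above can be sketched without a figure by building the shape dictionaries directly. The counts and colors here are invented, and the final `update_layout(shapes=...)` call is shown only as a comment, as one possible way to attach all lines at once instead of calling `add_shape` repeatedly. As far as I understand, a categorical y-axis places its categories at positions 0, 1, 2, ..., which is why `y0 = i` lines up with the i-th marker.

```python
# Hypothetical counts, already sorted ascending, with one color each
counts = [5, 12, 20]
colors = ["#033351", "#0779c3", "#5abbf9"]

# One horizontal line per row: from (0, i) to (count, i)
shapes = [
    dict(
        type="line",
        x0=0, y0=i,
        x1=count, y1=i,
        line=dict(color=color, width=4),
    )
    for i, (count, color) in enumerate(zip(counts, colors))
]

# With a real figure, the list could be attached in one call:
# fig.update_layout(shapes=shapes)
print(shapes[0])
```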

7. update_traces(hovertemplate)


fig.update_traces(hovertemplate='<b>Media Source</b>: %{y}<br><extra></extra>'+
                                '<b>Count</b>: %{x}<br>'+
                                '<b>Proportion</b>: %{text}')

8. Design

i. Axis grid

fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='#9f9f9f', ticklabelmode='period')
fig.update_yaxes(showgrid=False)

Only the x-axis grid is shown. ticklabelmode is set to 'period' (this setting only has an effect on date axes).

Q42_grid

Plotly axes documentation (English)

ii. update_layout()


fig.update_layout(showlegend=False, 
                  plot_bgcolor='#F7F7F7', 
                  margin=dict(pad=20),
                  paper_bgcolor='#F7F7F7',
                  yaxis_title=None,
                  xaxis_title=None,
                  title_text="Most Commonly Used <b>Media Sources</b>",
                  title_x=0.5,
                  height=700,
                  font=dict(family="Hiragino Kaku Gothic Pro, sans-serif", size=17, color='#000000'),
                  title_font_size=35)

Q42_layout

9. Annotation

fig.add_annotation(dict(font=dict(size=14),
                                    x=0.98,
                                    y=-0.22,
                                    showarrow=False,
                                    text="@miguelfzzz",
                                    xanchor='left',
                                    xref="paper",
                                    yref="paper"))

fig.add_annotation(dict(font=dict(size=12),
                                    x=0.04,
                                    y=-0.22,
                                    showarrow=False,
                                    text="Source: 2021 Kaggle Machine Learning & Data Science Survey",
                                    xanchor='left',
                                    xref="paper",
                                    yref="paper"))

fig.show()

Q42_final

Posted 2021-11-09, updated 2021-11-09, python - plotly, a 5-minute read (about 718 words)

Kaggle: ScatterLine (Q11)

Kaggle dictation (05)

Scatter + line graph using plotly.graph_objects as go

Scatter plot

A basic graph for visualizing bivariate (two-variable) data.
- correlation: positive, negative, none
It shows the distribution of each of the two variables as well as the relationship between them.

ref.

0. data set

https://www.kaggle.com/miguelfzzz/the-typical-kaggle-data-scientist-in-2021

Subject: the most commonly used computing platform (hardware)

1. Reading the data + building the DataFrame

platform = (
    df['Q11']
    .value_counts()
    .to_frame()
    .reset_index()
    .rename(columns={'index':'Platform', 'Q11':'Count'})
    .sort_values(by=['Count'], ascending=False)   
    .replace(['A deep learning workstation (NVIDIA GTX, LambdaLabs, etc)',
              'A cloud computing platform (AWS, Azure, GCP, hosted notebooks, etc)'], 
              ['A deep learning workstation', 'A cloud computing platform'])
          )

platform is now a DataFrame.

The columns derived from Q11 have been renamed as well.
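One caveat worth sketching: the rename keys `'index'` and `'Q11'` depend on how `value_counts().reset_index()` names its columns, and that naming changed in pandas 2.0 (the result columns become `'Q11'` and `'count'`). The variant below is an assumption-labeled alternative, not the original author's code; it names both columns explicitly so it should behave the same on either side of that change. The toy answers are invented.

```python
import pandas as pd

# Toy stand-in for the Q11 column (hypothetical answers)
df = pd.DataFrame({
    "Q11": ["A laptop", "A laptop", "A personal computer / desktop", None]
})

platform = (
    df["Q11"]
    .value_counts()               # counts per answer, descending, NaN dropped
    .rename_axis("Platform")      # name the index explicitly
    .reset_index(name="Count")    # name the value column explicitly
)
print(platform)
```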

Q11

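2. Setting up the table

The block below operates on ide, which is not defined anywhere in this post; it appears to be left over from a similar walkthrough for an IDE question. For platform, the counting, renaming, and sorting were already done in step 1, so this step can be skipped here.

ide = (
    ide
    .count()
    .to_frame()
    .reset_index()
    .rename(columns={'index':'IDE', 0:'Count'})
    .sort_values(by=['Count'], ascending=False)
    )


ide['percent'] = ((ide['Count'] / len(df))*100).round(2).astype(str) + '%'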

3. Adding the percent column

platform['percent'] = ((platform['Count'] / platform['Count'].sum())*100).round(2).astype(str) + '%'

Q11_percent
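Unlike the Q42 post, the denominator here is the sum of the counts rather than the number of respondents, so the shares add up to 100%. A minimal sketch with invented counts:

```python
import pandas as pd

platform = pd.DataFrame({
    "Platform": ["A laptop", "A personal computer / desktop", "Other"],
    "Count": [2, 1, 1],
})

# Share of all answers given to Q11
total = platform["Count"].sum()
platform["percent"] = (
    ((platform["Count"] / total) * 100).round(2).astype(str) + "%"
)
print(platform)
```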

3. Assigning colors

colors = ['#033351',] * 6
colors[5] = '#5abbf9'
colors[4] = '#0779c3'
colors[3] = '#0779c3'

4. Re-sorting the table

platform = (platform
           .sort_values(by = ['Count'])
           .iloc[0:15]
           .reset_index())
platform

platformRe

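.sort_values(by=['Count']): sorts by Count (ascending by default)
.iloc[0:15]: selects rows of platform by position, taking positions 0 through 14 (at most 15 rows)
.reset_index(): assigns a new 0, 1, 2, ... index that is independent of the data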

5. plotly.graph_objects.Scatter()

Now building the scatter graph in earnest.

## Plotting the scatter points


import plotly.graph_objects as go

# One marker per platform: x = Count, y = platform name
fig = go.Figure(go.Scatter(x = platform['Count'], 
                           y = platform["Platform"],
                           text = platform['percent'],
                           mode = 'markers',
                           marker_color =colors,
                           marker_size  = 12))
fig

Q11_graph

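x = platform['Count'], y = platform["Platform"]
1. Sets the x-axis and y-axis values.
text = platform['percent']
1. Attaches text to each point. With mode='markers' it is not drawn on the plot; it only becomes visible on hover.
mode = 'markers'
1. Accepts 'lines', 'markers', and 'text', combined with '+' (for example 'lines+markers'), or 'none'.
2. See Scatter.mode in the Plotly reference.
marker_color = colors, marker_size = 12
1. The color and size of the points in the scatter plot.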

## Connecting lines to the scatter points with a for loop

# Draw one horizontal line per row, from x = 0 to that row's Count
for i in range(0, len(platform)):
    fig.add_shape(type='line',
                  x0=0, y0=i,
                  x1=platform["Count"][i],
                  y1=i,
                  line=dict(color=colors[i], width=4))

Q11_Scatter+line

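for i in range(0, len(platform)): runs once per row of platform

fig.add_shape()

type='line'
- adds a line-shaped graph shape

x0 = 0, y0 = i
- starting point

x1 = platform["Count"][i]
- x end point
y1 = i
- y end point

line=dict(color=colors[i], width=4)

- line details: color and width

platform is still a DataFrame after .iloc[0:15], and reset_index() gave it a 0, 1, 2, ... index.
That is why platform["Count"][i] can look up the Count value of row i.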
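The reason `reset_index()` matters for the `[i]` lookup can be shown on a toy table (values invented). With an integer index, `series[i]` looks up the label `i`, not the i-th position, so after sorting the two only agree once the index has been reset.

```python
import pandas as pd

platform = pd.DataFrame({
    "Platform": ["A laptop", "Other", "None"],
    "Count": [30, 5, 10],
})

sorted_only = platform.sort_values(by=["Count"])   # index is now 1, 2, 0
reindexed = sorted_only.reset_index(drop=True)     # index is 0, 1, 2 again

first_by_label = sorted_only["Count"][0]           # label 0: the original first row
first_after_reset = reindexed["Count"][0]          # label 0 is now the smallest row
first_by_position = sorted_only["Count"].iloc[0]   # position 0, regardless of labels

print(first_by_label, first_after_reset, first_by_position)
```

Using `.iloc[i]` inside the loop would be an alternative that does not depend on the index having been reset.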

6. update_traces()


fig.update_traces(hovertemplate='<b>Platform</b>: %{y}<br><extra></extra>'+
                                '<b>Count</b>: %{x}<br>'+
                                '<b>Proportion</b>: %{text}')

7. Design

fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='#9f9f9f', ticklabelmode='period')
fig.update_yaxes(showgrid=False)
 
fig.update_layout(showlegend=False, 
                  plot_bgcolor='#F7F7F7', 
                  margin=dict(pad=20),
                  paper_bgcolor='#F7F7F7',
                  yaxis_title=None,
                  xaxis_title=None,
                  title_text="Most Commonly Used <b>Computing Platforms</b>",
                  title_x=0.5,
                  font=dict(family="Hiragino Kaku Gothic Pro, sans-serif", size=17, color='#000000'),
                  title_font_size=35)

Q11_layout

8. Annotation

fig.add_annotation(dict(font=dict(size=14),
                                    x=0.98,
                                    y=-0.22,
                                    showarrow=False,
                                    text="@miguelfzzz",
                                    xanchor='left',
                                    xref="paper",
                                    yref="paper"))

fig.add_annotation(dict(font=dict(size=12),
                                    x=0.04,
                                    y=-0.22,
                                    showarrow=False,
                                    text="Source: 2021 Kaggle Machine Learning & Data Science Survey",
                                    xanchor='left',
                                    xref="paper",
                                    yref="paper"))

fig.show()

Q11_final
