Sparta Hip Hobby Coding - Python Solo Play Package, Day 2 Dev Log

Dev log 2021. 9. 23. 17:39

[Week I Learned]

*Notion

https://www.notion.so/2-b501fb7ea8ca4cd0a5417480bb9de9a2


*Article scraping starter code

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome('chromedriver')

url = "https://search.naver.com/search.naver?where=news&sm=tab_jum&query=추석"

driver.get(url)
req = driver.page_source
soup = BeautifulSoup(req, 'html.parser')

#####################
# Write your code here!
#####################

driver.quit()

*Getting only the text inside a tag

Append .text to the selected element

articles = soup.select_one('#sp_nws1 > div.news_wrap.api_ani_send > div > a').text
print(articles)

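The selector differs slightly from article to article, so a single select_one selector will not work for soup.select. A different approach is needed:

first grab the larger block that contains every article, then pick fields out of each item.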

articles = soup.select('#main_pack > section.sc_new.sp_nnews._prs_nws > div > div.group_news > ul > li')
for article in articles:
    print(article)

for article in articles:
    title = article.select_one('div.news_wrap.api_ani_send > div > a').text
    url = article.select_one('div.news_wrap.api_ani_send > div > a')['href']
    # Split the text on spaces, keep only the first piece,
    # and drop the word '언론사' ("press") from it
    comp = article.select_one('a.info.press').text.split(' ')[0].replace('언론사','')
    print(title,url,comp)
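
The press-name cleanup above (split on spaces, keep the first piece, remove '언론사') can be tried on its own without opening a browser. This is only a minimal sketch: the helper name clean_press_name and the sample strings are made up for illustration and are not taken from the real page.

```python
def clean_press_name(raw_text):
    """Keep the first space-separated piece and strip the '언론사' label."""
    return raw_text.split(' ')[0].replace('언론사', '')


# Hypothetical examples of what the text of a.info.press might look like
samples = ["한국일보언론사 선정", "연합뉴스", "KBS언론사 선정"]

cleaned = [clean_press_name(s) for s in samples]
print(cleaned)
```

A string with no space and no '언론사' in it passes through unchanged, so the same line is safe to apply to every article.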

[Saving to an Excel file: openpyxl]

Install the Python package openpyxl

*Using openpyxl

from openpyxl import Workbook

wb = Workbook()                          # create a new workbook
ws1 = wb.active
ws1.title = "articles"                   # name the sheet 'articles'
ws1.append(["Title", "Link", "Press"])   # contents of row 1

wb.save(filename='articles.xlsx')

=> Before running, make sure this Excel file is closed.

from bs4 import BeautifulSoup
from selenium import webdriver
from openpyxl import Workbook

driver = webdriver.Chrome('chromedriver')

url = "https://search.naver.com/search.naver?where=news&sm=tab_jum&query=추석"

driver.get(url)
req = driver.page_source
soup = BeautifulSoup(req, 'html.parser')

articles = soup.select('#main_pack > section.sc_new.sp_nnews._prs_nws > div > div.group_news > ul > li')

wb = Workbook()
ws1 = wb.active
ws1.title = "articles"
ws1.append(["Title", "Link", "Press"])

for article in articles:
    title = article.select_one('div.news_wrap.api_ani_send > div > a').text
    url = article.select_one('div.news_wrap.api_ani_send > div > a')['href']
    comp = article.select_one('a.info.press').text.split(' ')[0].replace('언론사','')

    ws1.append([title, url, comp])

# main_pack > section.sc_new.sp_nnews._prs_nws > div > div.group_news > ul
# sp_nws1 > div.news_wrap.api_ani_send

driver.quit()
wb.save(filename='articles.xlsx')

[Sending email]

*Email sending starter code

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders

# Sender info
me = "sender@gmail.com"
my_password = "password"

# Log in
s = smtplib.SMTP_SSL('smtp.gmail.com')
s.login(me, my_password)

# Recipient info
you = "recipient@any_domain"

# Basic mail settings
msg = MIMEMultipart('alternative')
msg['Subject'] = "Subject"
msg['From'] = me
msg['To'] = you

# Write the mail body
content = "Mail body"
part2 = MIMEText(content, 'plain')
msg.attach(part2)

# Send the mail and close the server connection
s.sendmail(me, you, msg.as_string())
s.quit()
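
To see what actually gets handed to s.sendmail, the message can be built and printed without logging in anywhere. The sketch below is a rough illustration only; the example.com addresses and the subject text are placeholders, not values from the lecture.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

me = "sender@example.com"      # placeholder address
you = "recipient@example.com"  # placeholder address

# Same construction as the starter code, minus the SMTP login
msg = MIMEMultipart('alternative')
msg['Subject'] = "Test subject"
msg['From'] = me
msg['To'] = you
msg.attach(MIMEText("Mail body", 'plain'))

# This string (headers plus encoded parts) is what sendmail transmits
raw = msg.as_string()
print(raw)

# Parse it back to confirm the headers and the body survived
parsed = message_from_string(raw)
body = parsed.get_payload()[0].get_payload()
print(parsed['Subject'], parsed['To'], body.strip())
```

Printing the raw string is a cheap way to check the subject, sender, and recipient before sending a real mail.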

*Turning off 2-step verification

https://myaccount.google.com/signinoptions/two-step-verification


*Lifting the less-secure-apps restriction

https://myaccount.google.com/lesssecureapps


*Starter code for emailing multiple people

# Everything from here down is different

# Recipient info
email_list = ["email1", "email2"]

for you in email_list:
    # Basic mail settings
    msg = MIMEMultipart('alternative')
    msg['Subject'] = "Subject"
    msg['From'] = me
    msg['To'] = you

    # Write the mail body
    content = "Mail body"
    part2 = MIMEText(content, 'plain')
    msg.attach(part2)

    # Send the mail
    s.sendmail(me, you, msg.as_string())

# Close once everything is done
s.quit()

# Recipient info

emails = ['A','B']

for you in emails:
    # Basic mail settings
    msg = MIMEMultipart('alternative')
    msg['Subject'] = "This is the subject"
    msg['From'] = me
    msg['To'] = you

    # Write the mail body
    content = "What are you doing for Chuseok?"
    part2 = MIMEText(content, 'plain')
    msg.attach(part2)

    # Send the mail
    s.sendmail(me, you, msg.as_string())

# Close the server connection after the loop, not inside it
s.quit()
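
One reason the loop creates a fresh msg for each recipient: on these legacy message objects, assigning msg['To'] again adds another To header instead of replacing the old one. A small sketch that shows the difference, using placeholder example.com addresses:

```python
from email.mime.multipart import MIMEMultipart

recipients = ["a@example.com", "b@example.com"]  # placeholder addresses

# Reusing one message: every assignment appends another To header
shared = MIMEMultipart('alternative')
for you in recipients:
    shared['To'] = you
to_headers_shared = shared.get_all('To')
print(to_headers_shared)

# Fresh message per recipient: exactly one To header each
per_recipient = []
for you in recipients:
    msg = MIMEMultipart('alternative')
    msg['To'] = you
    per_recipient.append(msg.get_all('To'))
print(per_recipient)
```

With the shared message, the second mail would list both people in its To headers, which is probably not what is wanted.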

*Attaching a file

- Paste this right before s.sendmail

part = MIMEBase('application', "octet-stream")
with open("articles.xlsx", 'rb') as file:
    part.set_payload(file.read())
encoders.encode_base64(part)
# filename is the name the recipient sees for the attachment
part.add_header('Content-Disposition', "attachment", filename="추석기사.xlsx")
msg.attach(part)
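
The attachment steps can be checked without a real spreadsheet by feeding in some bytes directly. This is a sketch under that assumption; the byte string stands in for file.read() and the file name is just an example.

```python
from email import encoders
from email.mime.base import MIMEBase

data = b"fake spreadsheet bytes"  # stand-in for file.read()

part = MIMEBase('application', 'octet-stream')
part.set_payload(data)            # raw bytes go in first
encoders.encode_base64(part)      # payload becomes base64 text, header is set
part.add_header('Content-Disposition', 'attachment', filename='articles.xlsx')

print(part['Content-Transfer-Encoding'])
print(part.get_filename())

# Decoding the payload should give back the original bytes
decoded = part.get_payload(decode=True)
print(decoded == data)
```

The base64 step is what lets a binary file travel inside a text-based mail message.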

[Homework]

from bs4 import BeautifulSoup
from selenium import webdriver
from openpyxl import Workbook

driver = webdriver.Chrome('chromedriver')

wb = Workbook()
ws1 = wb.active
ws1.title = "articles"
ws1.append(["Title", "Link", "Press", "Thumbnail"])

url = "https://search.naver.com/search.naver?&where=news&query=추석"

driver.get(url)
req = driver.page_source
soup = BeautifulSoup(req, 'html.parser')

articles = soup.select('#main_pack > section.sc_new.sp_nnews._prs_nws > div > div.group_news > ul > li')

for article in articles:
    title = article.select_one('div.news_wrap.api_ani_send > div > a').text
    url = article.select_one('div.news_wrap.api_ani_send > div > a')['href']
    comp = article.select_one('a.info.press').text.split(' ')[0].replace('언론사','')
    thumbnail = article.select_one('div.news_wrap.api_ani_send > a')['href']

    ws1.append([title, url, comp, thumbnail])

wb.save(filename='homework.xlsx')
driver.quit()
