Python Examples

Thursday, September 23, 2021


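Example pythonJson-01.py: The JSON data must be enclosed in quotes because, in Python, JSON data exists as a string. json.loads() parses that string into a Python object.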

# pythonJson-01.py
import json

jsonObj = '{"b":80, "a":25, "c":60}'    # JSON object (stored as a string)
dictObj = json.loads(jsonObj)           # parse it into a Python object
print(dictObj)
print(type(dictObj))

Output

{'b': 80, 'a': 25, 'c': 60}
<class 'dict'>

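A JSON document stays a plain str until it is parsed, so it helps to see how each JSON type maps to a Python type. This is a minimal sketch, and the sample record in text is made up for illustration.

```python
import json

# A made-up JSON document: until json.loads() parses it, it is only a str.
text = '{"name": "Amy", "active": true, "score": null, "tags": ["a", "b"], "ratio": 0.5}'
obj = json.loads(text)

# object -> dict, array -> list, true/false -> True/False,
# null -> None, numbers -> int or float
print(type(obj))                                               # <class 'dict'>
print(obj["active"], obj["score"], obj["tags"], obj["ratio"])  # True None ['a', 'b'] 0.5

# json.dumps() goes the other way and returns a str again.
back = json.dumps(obj)
print(type(back))                                              # <class 'str'>
print(back == text)                                            # True
```

With the default separators, json.dumps() reproduces this particular input exactly. In general, whitespace in the original text is not preserved.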
Example pythonJson-02.py: Parse a JSON string and read one value from the resulting dictionary.

# pythonJson-02.py
import json

x = '{ "name":"John", "age":30, "city":"New York"}'  # some JSON
y = json.loads(x)  # parse x; the result is a Python dictionary
print(y["age"])    # read one value from the dictionary

Output

30

Example pythonJson-03.py: Use a nested JSON object. The parent key "Asia" holds an array with two child JSON objects, "Japan" and "China".

# pythonJson-03.py
import json

obj = '{"Asia":[{"Japan":"Tokyo"},{"China":"Beijing"}]}'
json_obj = json.loads(obj)
print(json_obj)
print(type(json_obj))
print(json_obj["Asia"])
print(json_obj["Asia"][0])
print(json_obj["Asia"][1])
print(json_obj["Asia"][0]["Japan"])
print(json_obj["Asia"][1]["China"])

Output

{'Asia': [{'Japan': 'Tokyo'}, {'China': 'Beijing'}]}
<class 'dict'>
[{'Japan': 'Tokyo'}, {'China': 'Beijing'}]
{'Japan': 'Tokyo'}
{'China': 'Beijing'}
Tokyo
Beijing
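Each element of the "Asia" array is a dictionary with a single entry, so every lookup needs a list index. One option, sketched below, is to merge the entries into a single country-to-capital dictionary. The name capitals and the "Korea" lookup are illustrative only.

```python
import json

obj = json.loads('{"Asia":[{"Japan":"Tokyo"},{"China":"Beijing"}]}')

# Merge the one-entry dicts into a single country -> capital mapping.
capitals = {country: city
            for entry in obj["Asia"]
            for country, city in entry.items()}
print(capitals)                            # {'Japan': 'Tokyo', 'China': 'Beijing'}

# dict.get() returns a default instead of raising KeyError for a missing key.
print(capitals.get("Korea", "unknown"))    # unknown
```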

Example pythonJson-04.py: Convert a Python list and a tuple into JSON arrays.

# pythonJson-04.py
import json

listNumbers = [5, 10, 20, 1]            # list
tupleNumbers = (1, 5, 10, 9)            # tuple
jsonData1 = json.dumps(listNumbers)     # convert the list to a JSON string
jsonData2 = json.dumps(tupleNumbers)    # convert the tuple to a JSON string
print("串列轉換成json的陣列", jsonData1)
print("元組轉換成json的陣列", jsonData2)
print("json陣列在Python的資料類型 ", type(jsonData1))

Output

串列轉換成json的陣列 [5, 10, 20, 1]
元組轉換成json的陣列 [1, 5, 10, 9]
json陣列在Python的資料類型 <class 'str'>
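JSON has only one sequence type, the array. A tuple therefore does not survive a round trip: it comes back as a list. The sketch below shows this with the tuple from the example.

```python
import json

point = (1, 5, 10, 9)                 # a tuple
text = json.dumps(point)              # '[1, 5, 10, 9]'
restored = json.loads(text)

print(restored, type(restored))       # [1, 5, 10, 9] <class 'list'>
print(tuple(restored) == point)       # True: convert back explicitly if needed
```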

Example pythonJson-05.py: The json.dumps() method with the sort_keys and indent parameters.

# pythonJson-05.py
import json

players = {'Stephen Curry':'Golden State Warriors',
           'Kevin Durant':'Golden State Warriors',
           'Lebron James':'Cleveland Cavaliers'}
jsonObj = json.dumps(players, sort_keys=True, indent=4)   
print(jsonObj)
print(type(jsonObj))

Output

{
    "Kevin Durant": "Golden State Warriors",
    "Lebron James": "Cleveland Cavaliers",
    "Stephen Curry": "Golden State Warriors"
}
<class 'str'>

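The indent parameter makes JSON readable. The separators parameter does the opposite and removes whitespace. Below is a minimal sketch comparing the two on a shortened version of the players dictionary. The choice of (',', ':') for compact output is one common convention, not a requirement.

```python
import json

players = {'Stephen Curry': 'Golden State Warriors',
           'Kevin Durant': 'Golden State Warriors'}

# indent=4 pretty-prints; sort_keys=True orders the keys alphabetically.
pretty = json.dumps(players, sort_keys=True, indent=4)
print(pretty.splitlines()[1])   #     "Kevin Durant": "Golden State Warriors",

# separators=(item_separator, key_separator); dropping the spaces gives
# the most compact text.
compact = json.dumps(players, sort_keys=True, separators=(',', ':'))
print(compact)  # {"Kevin Durant":"Golden State Warriors","Stephen Curry":"Golden State Warriors"}

# Formatting never changes the data itself.
print(json.loads(pretty) == json.loads(compact) == players)   # True
```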
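Example pythonJson-06.py: Save a simple Python dictionary, dictObj, to the file output-pythonJson-06.json with dump(). The first argument of dump(), dictObj, is the data source; the second, fileObj, is the file object. The file output-pythonJson-06.json is created in the working directory and can be opened with Notepad.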

# pythonJson-06.py
import json

dictObj = {'b':80, 'a':25, 'c':60}       # dictionary data
fileName = 'output-pythonJson-06.json'   # output file
with open(fileName, 'w') as fileObj:     # open the file for writing
    json.dump(dictObj, fileObj)          # write the dictionary to the file as JSON

Output (contents of output-pythonJson-06.json)

{"b": 80, "a": 25, "c": 60}

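Example pythonJson-07.py: Save a Python dictionary, dictObj, that contains two levels of JSON arrays to the file output-pythonJson-07.json with dump(). As before, the first argument of dump(), dictObj, is the data source and the second, fileObj, is the file object. The file output-pythonJson-07.json is created in the working directory and can be opened with Notepad.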

# pythonJson-07.py
import json

dictObj = {"Asia":
        [{"Japan":"Tokyo"},
         {"China":"Beijing"}],
        "Europe":
        [{"UK":"London"},
         {"France":"Paris"}]
      }
fileName = 'output-pythonJson-07.json'
with open(fileName, 'w') as fileObj:
    json.dump(dictObj, fileObj)

Output (contents of output-pythonJson-07.json)

{"Asia": [{"Japan": "Tokyo"}, {"China": "Beijing"}], "Europe": [{"UK": "London"}, {"France": "Paris"}]}

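Example pythonJson-08.py: Save a list, objList, that contains Chinese characters. The file output-pythonJson-08.json is created in the working directory and can be opened with Notepad. In the file, Chinese characters such as "日本" appear as Unicode escape sequences such as "\u65e5\u672c". This happens because json.dump() escapes non-ASCII characters by default (ensure_ascii=True).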

# pythonJson-08.py
import json

objList = [{"日本":"Japan", "首都":"Tokyo"},
           {"美州":"USA", "首都":"Washington"}]
fileName = 'output-pythonJson-08.json'
with open(fileName, 'w') as fileObj:
    json.dump(objList, fileObj)

Output (contents of output-pythonJson-08.json)

[{"\u65e5\u672c": "Japan", "\u9996\u90fd": "Tokyo"}, {"\u7f8e\u5dde": "USA", "\u9996\u90fd": "Washington"}]

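Example pythonJson-09.py: Save the same list of Chinese strings, objList, improving on pythonJson-08.py. Create the file object with encoding='utf-8' and call dump() with ensure_ascii=False, so the Chinese characters are written as-is.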

# pythonJson-09.py
import json

objList = [{"日本":"Japan", "首都":"Tokyo"},
           {"美州":"USA", "首都":"Washington"}]

fileName = 'output-pythonJson-09.json'
with open(fileName, 'w', encoding='utf-8') as fileObj:
    json.dump(objList, fileObj, indent=2, ensure_ascii=False)

Output (contents of output-pythonJson-09.json)

[
  {
    "日本": "Japan",
    "首都": "Tokyo"
  },
  {
    "美州": "USA",
    "首都": "Washington"
  }
]

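The difference between pythonJson-08.py and pythonJson-09.py can be checked without writing any files. This minimal sketch serializes the same data both ways. io.StringIO stands in for a real file opened with encoding='utf-8'; that substitution is an assumption made to keep the example self-contained.

```python
import io
import json

data = [{"日本": "Japan", "首都": "Tokyo"}]

escaped = json.dumps(data)                       # ensure_ascii=True (default)
readable = json.dumps(data, ensure_ascii=False)  # keep the characters as-is
print(escaped)    # [{"\u65e5\u672c": "Japan", "\u9996\u90fd": "Tokyo"}]
print(readable)   # [{"日本": "Japan", "首都": "Tokyo"}]

# Both texts decode to the same Python object.
assert json.loads(escaped) == json.loads(readable) == data

# json.dump() writes the same text to any file-like object.
buffer = io.StringIO()
json.dump(data, buffer, ensure_ascii=False)
print(buffer.getvalue() == readable)   # True
```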
Example pythonJson-10.py: Read the file 'output-pythonJson-07.json' and print the result.

# pythonJson-10.py
import json
      
fileName = 'output-pythonJson-07.json'
with open(fileName, 'r') as fileObj:
    data = json.load(fileObj)

print(data)
print(type(data))

Output

{'Asia': [{'Japan': 'Tokyo'}, {'China': 'Beijing'}], 'Europe': [{'UK': 'London'}, {'France': 'Paris'}]}
<class 'dict'>

Example pythonJson-11.py: When the program runs, it asks for a username and saves it to "login.json". It then prints "__username__! 歡迎使用本系統!" ("Welcome to the system!").

# pythonJson-11.py
import json

fileName = 'login.json'
login = input("請輸入帳號 : ")
with open(fileName, 'w') as fileObj:
    json.dump(login, fileObj)
    print("%s! 歡迎使用本系統! " % login)

Output (the user types aron)

請輸入帳號 : aron
aron! 歡迎使用本系統!

Example pythonJson-12.py: Continuing from pythonJson-11.py, read the file 'login.json' and print "__username__! 歡迎回來使用本系統!" ("Welcome back to the system!").

# pythonJson-12.py
import json

fileName = 'login.json'
with open(fileName, 'r') as fileObj:
    login = json.load(fileObj)
    print("%s! 歡迎回來使用本系統! " % login)

Output

aron! 歡迎回來使用本系統!

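Example pythonJson-13.py: A login system. If the login file 'login.json' does not exist, the program asks "請輸入帳號 : " (enter your username) and saves the answer. If the file already exists, it prints a welcome-back message.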

# pythonJson-13.py
import json

fileName = 'login.json'
try:
    with open(fileName) as fileObj:
        login = json.load(fileObj)   
except Exception:
    login = input("請輸入帳號 : ") 
    with open(fileName, 'w') as fileObj:
        json.dump(login, fileObj)
        print("系統已經記錄你的帳號!")
else:
    print("%s 歡迎回來!" % login)

Output (when login.json already exists)

aron 歡迎回來!
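pythonJson-13.py catches every Exception. That also hides unrelated bugs, such as a typo in a variable name. The sketch below is a narrower variant: it treats only a missing file or invalid JSON as "no saved login". The helpers load_login() and save_login() are hypothetical names introduced for illustration. The demo runs in a temporary directory so that no real login.json is touched.

```python
import json
import os
import tempfile

def load_login(path):
    """Return the saved username, or None if the file is missing or not valid JSON."""
    try:
        with open(path, encoding='utf-8') as fileObj:
            return json.load(fileObj)
    except FileNotFoundError:
        return None                 # first run: nothing saved yet
    except json.JSONDecodeError:
        return None                 # the file exists but is not valid JSON

def save_login(path, login):
    """Write the username to path as JSON."""
    with open(path, 'w', encoding='utf-8') as fileObj:
        json.dump(login, fileObj, ensure_ascii=False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'login.json')
    first = load_login(path)        # None: the file does not exist yet
    save_login(path, 'aron')        # stands in for input("請輸入帳號 : ")
    second = load_login(path)       # 'aron'
    print(first, second)            # None aron
```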

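Example pythonPopulation-1.py: Filter the JSON file for the records that have 2020 population data. The population line converts the string to a float and multiplies it by 1000, because the file stores thousands of people. It then converts the result to an int.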

# pythonPopulation-1.py
import json

fileName = 'population2020-WorldPopulationReview.json'
with open(fileName) as fileObj:
    getDatas = json.load(fileObj)   # 讀json檔案

for getData in getDatas:
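    if getData["pop2020"] is not None:  # keep only records that have 2020 data
        countryName = getData["name"]   # country name
        countryCode = getData["cca2"]   # country code
        # Note: this line reads the pop2021 column; change it to "pop2020" for 2020 figures.
        population = int(float(getData["pop2021"]) * 1000)  # str -> float, thousands -> people, then int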
        print('國家代碼 =', countryCode,
              ',國家名稱 =', countryName,
              ',人口數 =', population , "人")
    else:
        print(getData["name"], " 2020年沒有人口資料")

Output

國家代碼 = CN ,國家名稱 = China ,人口數 = 1444216107 人
國家代碼 = IN ,國家名稱 = India ,人口數 = 1393409038 人
國家代碼 = US ,國家名稱 = United States ,人口數 = 332915073 人
...........

Example pythonPopulation-2.py: List every two-letter country code and its country name.

# pythonPopulation-2.py
from pygal.maps.world import COUNTRIES

for countryCode in sorted(COUNTRIES.keys()):
    print("國家代碼 :", countryCode, "  國家名稱 = ", COUNTRIES[countryCode])

Output

國家代碼 : ad 國家名稱 = Andorra
國家代碼 : ae 國家名稱 = United Arab Emirates
國家代碼 : af 國家名稱 = Afghanistan
...........

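Example pythonPopulation-3.py: Compare country names and print "名稱不吻合" (name mismatch) wherever they differ. The country names in the pygal.maps.world module may differ from those in our population file, so this program checks them.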

# pythonPopulation-3.py
import json
from pygal.maps.world import COUNTRIES

def getCountryCode(countryName):
    '''Return the country code for a country name.'''
    for dictCode, dictName in COUNTRIES.items():    # search the code -> name dictionary
        if dictName == countryName:
            return dictCode                         # found: return the country code
    return None                                     # not found: return None

fileName = 'population2020-WorldPopulationReview.json'
with open(fileName) as fileNameObj:
    getDatas = json.load(fileNameObj)               # read the population JSON file

for getData in getDatas:
    if getData["pop2020"] is not None:      # keep only records that have 2020 data
        countryName = getData['name']       # country name
        countryCode = getCountryCode(countryName)
        population = int(float(getData['pop2020']))  # population, in thousands
        if countryCode is not None:
            print(countryCode, ":", population)     # the names match
        else:
            print(countryName, " 名稱不吻合:")       # the names do not match

Output

ng : 206139
bd : 164689
Russia 名稱不吻合:
mx : 128932
...........
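Mismatches such as "Russia" can be fixed with a small hand-written alias table. Building a name-to-code dictionary once also avoids scanning COUNTRIES on every call. This is a sketch under stated assumptions. The three-entry COUNTRIES dict is a hypothetical stand-in for pygal's full dict, so the code runs without pygal, and the names should be checked against your installed pygal version. ALIASES and NAME_TO_CODE are likewise hypothetical names.

```python
# Hypothetical subset of pygal's COUNTRIES dict (code -> name).
COUNTRIES = {'ng': 'Nigeria', 'ru': 'Russian Federation', 'tw': 'Taiwan'}

# Reverse mapping (name -> code), built once.
NAME_TO_CODE = {name: code for code, name in COUNTRIES.items()}

# Hypothetical fixes: name in the population file -> name used by pygal.
ALIASES = {'Russia': 'Russian Federation'}

def getCountryCode(countryName):
    """Return the two-letter code for countryName, or None if it is unknown."""
    name = ALIASES.get(countryName, countryName)   # apply an alias if one exists
    return NAME_TO_CODE.get(name)                  # None when not found

print(getCountryCode('Russia'))    # ru
print(getCountryCode('Nigeria'))   # ng
print(getCountryCode('Atlantis'))  # None
```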

Example pythonPopulation-4.py: Draw a world map and mark Taiwan.

# pythonPopulation-4.py
import pygal.maps.world

worldMapObj = pygal.maps.world.World()          # create a world map object
worldMapObj.title = '台灣 世界地圖'               # map title
worldMapObj.add('Taiwan',['tw'])                # mark Taiwan
worldMapObj.render_to_file('output-pythonPopulation-4.svg')      # save the map file

Output: the map is saved as output-pythonPopulation-4.svg.

Example pythonPopulation-5.py: Draw a world map that gives each continental region a different color and marks several countries in each region.

# pythonPopulation-5.py
import pygal.maps.world

worldMapObj = pygal.maps.world.World()                         # create a world map object
worldMapObj.title = ' Asia, Europe, Africa, and North America' # map title
worldMapObj.add('Asia亞洲',['tw', 'cn', 'jp', 'th'])            # mark Asia
worldMapObj.add('Europe歐洲',['fr', 'de', 'it'])                # mark Europe
worldMapObj.add('Africa非洲',['eg', 'ug', 'ng'])                # mark Africa
worldMapObj.add('North America北美洲',['ca', 'us', 'mx'])       # mark North America
worldMapObj.add('Mid-South America中南美洲',['cr', 'co', 'br', 'ar']) # mark Central and South America
worldMapObj.add('Australia澳洲',['au', 'nz'])                   # mark Australia
worldMapObj.render_to_file('output-pythonPopulation-5.svg')    # save the map file

Output: the map is saved as output-pythonPopulation-5.svg.

Example pythonPopulation-6.py: Show data on the map as country code: population pairs.

# pythonPopulation-6.py
import pygal.maps.world

worldMapObj = pygal.maps.world.World()                # create a world map object
worldMapObj.title = 'Populations in Taiwan/China/Japan/Thailand'   # map title
worldMapObj.add('Asia',{'tw':23816775,
                        'cn':1262645000,
                        'jp':126870000,
                        'th':63155029})       # add population data
worldMapObj.render_to_file('output-pythonPopulation-6.svg') # save the map file

Example pythonPopulation-7.py: Draw a world map. Store the countryCode: population data in a dictionary, then pass it to the map with the add() method.

# pythonPopulation-7.py
import json
import pygal.maps.world
from pygal.maps.world import COUNTRIES

def getCountryCode(countryName):
    '''Return the country code for a country name.'''
    for dictCode, dictName in COUNTRIES.items():    # search the code -> name dictionary
        if dictName == countryName:
            return dictCode                         # found: return the country code
    return None                                     # not found: return None

fileName = 'population2020-WorldPopulationReview.json'
with open(fileName) as fileNameObj:
    getDatas = json.load(fileNameObj)               # read the population JSON file

dictData = {}                                       # dictionary used by the map
for getData in getDatas:
    if getData['pop2020'] is not None:              # keep only records that have 2020 data
        countryName = getData['name']               # country name
        countryCode = getCountryCode(countryName)
        population = int(float(getData["pop2020"]) * 1000)  # thousands -> people, as int
        if countryCode is not None:
            dictData[countryCode] = population      # add code: population to the dictionary

worldMap = pygal.maps.world.World()
worldMap.title = "World Population in 2020"
worldMap.add('Year 2020', dictData)
worldMap.render_to_file('output-pythonPopulation-7.svg')   # save the map file

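Example pythonPopulation-8.py: Draw a world map that splits countries into two classes at 100 million people. Store the countryCode: population data in dictionaries, then pass them to the map with the add() method.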

# pythonPopulation-8.py
import json
import pygal.maps.world
from pygal.maps.world import COUNTRIES

def getCountryCode(countryName):
    '''Return the country code for a country name.'''
    for dictCode, dictName in COUNTRIES.items():    # search the code -> name dictionary
        if dictName == countryName:
            return dictCode                         # found: return the country code
    return None                                     # not found: return None

fileName = 'population2020-WorldPopulationReview.json'
with open(fileName) as fileNameObj:
    getDatas = json.load(fileNameObj)               # read the population JSON file

dictData = {}                                       # dictionary used by the map
for getData in getDatas:
    if getData['pop2020'] is not None:              # keep only records that have 2020 data
        countryName = getData['name']               # country name
        countryCode = getCountryCode(countryName)
        population = int(float(getData["pop2020"]) * 1000)  # thousands -> people, as int
        if countryCode is not None:
            dictData[countryCode] = population      # add code: population to the dictionary

dict1, dict2 = {}, {}                               # one dictionary per population class
for code, population in dictData.items():
    if population > 100000000:
        dict1[code] = population                    # more than 100,000,000 people
    else:
        dict2[code] = population                    # 100,000,000 people or fewer

worldMapObj = pygal.maps.world.World()
worldMapObj.title = "2020 世界人口地圖"
worldMapObj.add('Over 100,000,000', dict1)
worldMapObj.add('Under 100,000,000', dict2)
worldMapObj.render_to_file('output-pythonPopulation-8.svg')              # save the map file

Output: the map is saved as output-pythonPopulation-8.svg.

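The single 100-million threshold generalizes to several population tiers. This sketch uses bisect from the standard library to pick a tier. The tier boundaries, the labels, and the helper name group_by_tier() are hypothetical. The sample figures are taken from pythonPopulation-6.py. One small difference from the program above: a population exactly equal to a boundary goes to the higher tier.

```python
import bisect

# Hypothetical tier boundaries, in people.
BOUNDS = [10_000_000, 100_000_000, 1_000_000_000]
LABELS = ['Under 10M', '10M to 100M', '100M to 1B', 'Over 1B']

def group_by_tier(dictData):
    """Split a {countryCode: population} dict into one dict per tier."""
    tiers = {label: {} for label in LABELS}
    for code, population in dictData.items():
        # bisect_right returns how many boundaries are <= population,
        # which is exactly the index of the matching label.
        tiers[LABELS[bisect.bisect_right(BOUNDS, population)]][code] = population
    return tiers

sample = {'tw': 23816775, 'cn': 1262645000, 'jp': 126870000, 'th': 63155029}
tiers = group_by_tier(sample)
for label, codes in tiers.items():
    print(label, codes)
    # With pygal, each tier could become one series: worldMapObj.add(label, codes)
```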
Example pythonExcel-01.py: Save Python list data as the Excel file 'output-pythonExcel-01.xls'.

# pythonExcel-01.py
import xlwt

fileName = 'output-pythonExcel-01.xls'
datahead = ['Phone', 'TV', 'Notebook']
price = ['35000', '18000', '28000']
workbook = xlwt.Workbook()
worksheet = workbook.add_sheet('sheet1', cell_overwrite_ok=True)
for i in range(len(datahead)):
    worksheet.write(0, i, datahead[i])     # write the datahead list to row 0
for j in range(len(price)):
    worksheet.write(1, j, price[j])        # write the price list to row 1

workbook.save(fileName)

Output: the workbook is saved as output-pythonExcel-01.xls.

Example pythonExcel-02.py: Read the file 'output-pythonExcel-01.xls' and print its contents.

# pythonExcel-02.py
import xlrd

fileName = 'output-pythonExcel-01.xls'   # the file written by pythonExcel-01.py
wb = xlrd.open_workbook(fileName,encoding_override='utf-8')
sh = wb.sheets()[0]
rows = sh.nrows
for row in range(rows):
    print(sh.row_values(row))

Output

['Phone', 'TV', 'Notebook']
['35000', '18000', '28000']

Example pythonExcel-03.py: openpyxl: working with a single cell.

# pythonExcel-03.py
from openpyxl import load_workbook

# Load the Excel file
wb = load_workbook('test.xlsx')
sheet = wb['工作表1']

# Get a cell by its coordinate
c = sheet['A4']

# Read the cell's value
print(c.value)

Output

2020-10-04

Example pythonExcel-04.py: openpyxl: working with multiple cells.

# pythonExcel-04.py
from openpyxl import load_workbook

# Load the workbook and get the worksheet by name
mywb = load_workbook('test.xlsx')
sheet = mywb['工作表1']

# Get the cell objects in the given range
cellRange = sheet['B2':'C3']

# Process each cell with nested for loops
for row in cellRange:
    for c in row:
        print(c.value)

Output

4
34
5
53

Example pythonPickle-01.py: Save a dictionary to a binary file with pickle.dump().

# pythonPickle-01.py
import pickle
game_info = {
    "position_X":"100",
    "position_Y":"200",
    "money":300,
    "pocket":["黃金", "鑰匙", "小刀"]
}

fileName = "pythonPickle-01.dat"
file_obj = open(fileName, 'wb')         # open in binary write mode
pickle.dump(game_info, file_obj)
file_obj.close()

Output

A binary file named pythonPickle-01.dat is created in the working directory.

Example pythonPickle-02.py: Use load() to read back the file "pythonPickle-01.dat" created by pythonPickle-01.py.

# pythonPickle-02.py
import pickle
 
fileName = "pythonPickle-01.dat"
file_obj = open(fileName, 'rb')         # open in binary read mode
game_info = pickle.load(file_obj)
file_obj.close()
print(game_info)

Output

{'position_X': '100', 'position_Y': '200', 'money': 300, 'pocket': ['黃金', '鑰匙', '小刀']}

Example pythonPickle-03.py: The dumps() function.

# pythonPickle-03.py
# dumps(): serialize to bytes
import pickle
data = ['1','3','4']
# Serialize data into a bytes object in pickle format, which only Python can read back
a = pickle.dumps(data)
print(a)

Output

b'\x80\x04\x95\x11\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x011\x94\x8c\x013\x94\x8c\x014\x94e.'

Example pythonPickle-04.py: The loads() function.

# pythonPickle-04.py
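# dumps(): serialize to bytes
import pickle
data = ['1','3','4']
# Serialize data into a bytes object in pickle format, which only Python can read back
a = pickle.dumps(data)
print(a)
# loads(): deserialize
# Convert the pickle data back into a Python data structure
b = pickle.loads(a)
print(b)

Output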

b'\x80\x04\x95\x11\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x011\x94\x8c\x013\x94\x8c\x014\x94e.'
['1', '3', '4']
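Unlike JSON, pickle keeps Python-specific types. A tuple, for example, comes back as a tuple. The minimal sketch below shows this with an in-memory round trip. The dictionary is a shortened, made-up variant of game_info. One caution from the pickle documentation: never call pickle.load() or pickle.loads() on data from an untrusted source, because unpickling can execute arbitrary code.

```python
import pickle

game_info = {"money": 300, "pocket": ["黃金", "鑰匙", "小刀"], "pos": (100, 200)}

# HIGHEST_PROTOCOL selects the newest pickle format this Python supports.
blob = pickle.dumps(game_info, protocol=pickle.HIGHEST_PROTOCOL)
print(type(blob))                                 # <class 'bytes'>

restored = pickle.loads(blob)
print(restored["pos"], type(restored["pos"]))     # (100, 200) <class 'tuple'>
print(restored == game_info)                      # True
```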

Example pythonHttpbin-01.py: Print the type of the response and the URL requested with get().

# pythonHttpbin-01.py
import requests

url = 'https://www.httpbin.org/get'
response = requests.get(url)
print(type(response))
print(response.url)

Output

<class 'requests.models.Response'>
https://www.httpbin.org/get

Example pythonHttpbin-02.py: Some websites require parameters at login, and a web crawler must supply them to reach the page. Pass them to get() through the params argument, written as a dictionary.

# pythonHttpbin-02.py
import requests

url = 'https://www.httpbin.org/get'
form_data = {'gender':'M','page':'1'}
response = requests.get(url, params=form_data)
print(response.url)

Output

https://www.httpbin.org/get?gender=M&page=1
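For simple string values, the standard library's urllib.parse.urlencode() builds the same query string that requests appends for params. That makes it easy to check a URL offline, as in this minimal sketch. The reverse step with parse_qs() is included for comparison.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

url = 'https://www.httpbin.org/get'
form_data = {'gender': 'M', 'page': '1'}

full_url = url + '?' + urlencode(form_data)
print(full_url)                               # https://www.httpbin.org/get?gender=M&page=1

# Going the other way: pull the parameters back out of the URL.
print(parse_qs(urlsplit(full_url).query))     # {'gender': ['M'], 'page': ['1']}
```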

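Example pythonHttpbin-03.py: When a crawler logs in, it may also send form data, in JSON or dictionary form. The data parameter of post() sets it. A dictionary passed as data is sent form-encoded, as the "form" field and the Content-Type header in the output show.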

# pythonHttpbin-03.py
import requests

url = 'https://www.httpbin.org/post'
form_data = {'gender':'M','page':'1'}
response = requests.post(url, data=form_data)
print(response.url)
print('-'*70)
print(response.text)

Output

https://www.httpbin.org/post
----------------------------------------------------------------------
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "gender": "M",
    "page": "1"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "15",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "www.httpbin.org",
    "User-Agent": "python-requests/2.26.0",
    "X-Amzn-Trace-Id": "Root=1-615a9b44-56a767ae14d45cf348bd06aa"
  },
  "json": null,
  "origin": "180.177.109.201",
  "url": "https://www.httpbin.org/post"
}

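Example pythonHttpbin-04.py: Continuing from pythonHttpbin-03.py, send the data in JSON format instead. Because data now holds a JSON string, httpbin reports it under "data" and "json" rather than "form".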

# pythonHttpbin-04.py
import requests, json

url = 'https://www.httpbin.org/post'
form_data = {'gender':'M','page':'1'}
response = requests.post(url, data=json.dumps(form_data))
print(response.url)
print('-'*70)
print(response.text)

Output

https://www.httpbin.org/post
----------------------------------------------------------------------
{
  "args": {},
  "data": "{\"gender\": \"M\", \"page\": \"1\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "28",
    "Host": "www.httpbin.org",
    "User-Agent": "python-requests/2.26.0",
    "X-Amzn-Trace-Id": "Root=1-615a9da4-35fe181b3875ce7978be390b"
  },
  "json": {
    "gender": "M",
    "page": "1"
  },
  "origin": "180.177.109.201",
  "url": "https://www.httpbin.org/post"
}

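Example pythonHttpbin-05.py: Continuing from pythonHttpbin-04.py, pass the dictionary directly to the json parameter of post(): post(url, json=form_data). requests then serializes the data itself and sets the Content-Type: application/json header.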

# pythonHttpbin-05.py
import requests

url = 'https://www.httpbin.org/post'
form_data = {'gender':'M','page':'1'}
response = requests.post(url, json=form_data)
print(response.url)
print('-'*70)
print(response.text)

Output

https://www.httpbin.org/post
----------------------------------------------------------------------
{
  "args": {},
  "data": "{\"gender\": \"M\", \"page\": \"1\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "28",
    "Host": "www.httpbin.org",
    "User-Agent": "python-requests/2.26.0",
    "X-Amzn-Trace-Id": "Root=1-615a9da4-35fe181b3875ce7978be390b"
  },
  "json": {
    "gender": "M",
    "page": "1"
  },
  "origin": "180.177.109.201",
  "url": "https://www.httpbin.org/post"
}
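The difference between data=form_data and json=form_data lies in the request body and its Content-Type. The sketch below builds both kinds of request offline with the standard library's urllib.request.Request and inspects them without sending anything. Using urllib.request here is only an assumption for illustration; it imitates what requests produces and is not how requests works internally. The body lengths, 15 and 28 bytes, match the Content-Length values in the httpbin outputs above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

url = 'https://www.httpbin.org/post'
form_data = {'gender': 'M', 'page': '1'}

# Form-encoded body, like requests.post(url, data=form_data).
form_req = Request(url, data=urlencode(form_data).encode('ascii'),
                   headers={'Content-Type': 'application/x-www-form-urlencoded'})

# JSON body, like requests.post(url, json=form_data).
json_req = Request(url, data=json.dumps(form_data).encode('utf-8'),
                   headers={'Content-Type': 'application/json'})

# Nothing is sent; we only inspect the requests that would go out.
# Request stores header names capitalized, hence 'Content-type'.
for req in (form_req, json_req):
    print(req.get_method(), req.get_header('Content-type'), req.data, len(req.data))
```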

Example pythonHttpbin-06.py: Use r.request.headers and r.headers to list the headers we sent and the headers the server returned.

# pythonHttpbin-06.py
import requests

headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64)\
            AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101\
            Safari/537.36', }
url = 'https://www.httpbin.org/post'
form_data = {'gender':'M','page':'1'}
r = requests.post(url, json=form_data, headers=headers)
print(r.url)
print('-'*70)
print('r.request.headers :\n', r.request.headers)
print('-'*70)
print('r.headers :\n', r.headers)

Output

https://www.httpbin.org/post
----------------------------------------------------------------------
r.request.headers :
 {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)            AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101            Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '28', 'Content-Type': 'application/json'}
----------------------------------------------------------------------
r.headers :
 {'Date': 'Mon, 04 Oct 2021 09:03:40 GMT', 'Content-Type': 'application/json', 'Content-Length': '633', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}

Example pythonHttpbin-07.py: Print the response's status_code and reason.

# pythonHttpbin-07.py
import requests

url = 'https://www.httpbin.org/get'
r = requests.get(url)
print(r.status_code)
print(r.reason)

Output

200
OK

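Example pythonHttpbin-08.py: Print the response encoding and the beginning of the HTML body.

# pythonHttpbin-08.py
import requests

url = 'https://www.httpbin.org/html'
r = requests.get(url)
print(r.encoding)                       # encoding taken from the Content-Type header
print('-'*70)
for line in r.text.splitlines()[:6]:    # first six lines of the HTML body
    print(line)

Output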

utf-8
----------------------------------------------------------------------
<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
      <h1>Herman Melville - Moby-Dick</h1>

Example pythonHttpbin-09.py: The response data is in JSON format; convert it into a Python object with json().

# pythonHttpbin-09.py
import requests

url = 'https://www.httpbin.org/response-headers?freeform='
r = requests.get(url)
if r.status_code == 200:
    print(r.headers.get('content-type'))
    print('-'*70)
    print(r.json())

Output

application/json
----------------------------------------------------------------------
{'Content-Length': '87', 'Content-Type': 'application/json', 'freeform': ''}

Example pythonHttpbin-10.py: When the response or download is audio, video, or an image, the program receives binary content. You can obtain it through the content attribute.

# pythonHttpbin-10.py
import requests

url = 'https://www.httpbin.org/image/jpeg'
response = requests.get(url)
img = response.content

fileName = 'output-pythonHttpbin-10.jpg'
with open(fileName, 'wb') as outFile:
    outFile.write(img)

Example pythonUrllib-01.py: Read a web page with urllib.request.urlopen() and print the type and content of the returned object.

# pythonUrllib-01.py
import urllib.request

url = 'https://tw.finance.yahoo.com/'
htmlfile = urllib.request.urlopen(url)
print(type(htmlfile))
print(htmlfile)

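Output: the first line printed is <class 'http.client.HTTPResponse'>. The second line is the object's default representation, which includes its memory address and therefore differs between runs.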

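Example pythonUrllib-02.py: Read a web page with urlopen(). The returned object is an http.client.HTTPResponse, which can be read with read(). However, read() returns bytes, so the Chinese content appears as escaped byte values.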

# pythonUrllib-02.py
import urllib.request

url = 'https://tw.finance.yahoo.com/'
htmlfile = urllib.request.urlopen(url)
print(htmlfile.read())

Output

b'<!DOCTYPE html><html id="atomic" class="NoJs desktop" lang="zh-Hant-TW"><head prefix="og: http://ogp.me/ns#"><script>window.performance && window.performance.mark && window.performance.mark(\'PageStart\');</script><meta charSet="utf-8"/><meta property="og:type" content="website"/><meta property="og:description" content="Yahoo\xe5\xa5\x87\xe6\x91\xa9\xe8\x82\xa1\xe5\xb8\x82\xe6\x8f\x90\xe4\xbe\x9b\xe5\x9c\x8b\xe5\x85\xa7\xe5\xa4\x96\xe8\xb2\xa1\xe7\xb6\x93\xe6\x96\......

Example pythonUrllib-03.py: Continuing from pythonUrllib-02.py, decode the bytes with decode('utf-8') so the Chinese content displays correctly.

# pythonUrllib-03.py
import urllib.request

url = 'https://tw.finance.yahoo.com/'
htmlfile = urllib.request.urlopen(url)
print(htmlfile.read().decode('utf-8'))

Output

<!DOCTYPE html><html id="atomic" class="NoJs desktop" lang="zh-Hant-TW"><head prefix="og: http://ogp.me/ns#"><script>window.performance && window.performance.mark && window.performance.mark('PageStart');</script><meta charSet="utf-8"/><meta property="og:type" content="website"/><meta property="og:description" content="Yahoo奇摩股市提供國內外財經新聞，台股、期貨、選擇權、國際指數
、外匯、港滬深股、美股等即時報價資訊，以及自選股、......

Example pythonUrllib-04.py: Commonly used attributes of the http.client.HTTPResponse object.

# pythonUrllib-04.py
import urllib.request

url = 'https://tw.finance.yahoo.com/'
htmlfile = urllib.request.urlopen(url)
print('版本 : ', htmlfile.version)
print('網址 : ', htmlfile.geturl())
print('下載 : ', htmlfile.status)
print('表頭 : ')
for header in htmlfile.getheaders():
    print(header)

Output

版本 :  11
網址 :  https://tw.finance.yahoo.com/
下載 :  200
表頭 :
('expect-ct', 'max-age=31536000, report-uri="http://csp.yahoo.com/beacon/csp?src=yahoocom-expect-ct-report-only"')
('referrer-policy', 'no-referrer-when-downgrade')
('strict-transport-security', 'max-age=31536000')......

Downloading an image with urllib.request.urlretrieve()

Example pythonUrllib-05.py: Download an image with urllib.request.urlretrieve().

# pythonUrllib-05.py
import urllib.request

url_pict = 'http://www.python.org/images/success/nasa.jpg'
fn = 'output-pythonUrllib-05.jpg'     # the source is a JPEG, so keep the .jpg extension
pict = urllib.request.urlretrieve(url_pict,fn)

Output

An image file named output-pythonUrllib-05.jpg is created.

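Example pythonUrllib-06.py: The urllib.parse module and two representations of Chinese text. parse.quote() produces the URL encoding, which percent-encodes the UTF-8 bytes, and parse.unquote() decodes it back to the original string.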

# pythonUrllib-06.py
from urllib import parse

s = '台灣積體電路製造'
url_code = parse.quote(s)
print('URL編碼  : ', url_code)
code = parse.unquote(url_code)
print('中文編碼 : ', code)

Output

URL編碼 : %E5%8F%B0%E7%81%A3%E7%A9%8D%E9%AB%94%E9%9B%BB%E8%B7%AF%E8%A3%BD%E9%80%A0
中文編碼 : 台灣積體電路製造
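parse.quote() leaves '/' unencoded by default (safe='/') and turns spaces into %20. parse.quote_plus(), used for form-style query strings, turns spaces into '+' and encodes '/'. The minimal sketch below compares them; the input string is made up.

```python
from urllib import parse

s = '台灣 a/b'   # made-up input with a space and a slash

print(parse.quote(s))            # %E5%8F%B0%E7%81%A3%20a/b
print(parse.quote_plus(s))       # %E5%8F%B0%E7%81%A3+a%2Fb
print(parse.quote(s, safe=''))   # %E5%8F%B0%E7%81%A3%20a%2Fb
print(parse.unquote_plus(parse.quote_plus(s)) == s)   # True
```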

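Example pythonUrllib-07.py: The six components of a URL returned by parse.urlparse() in the urllib.parse module: scheme (URL scheme), netloc (network location), path (hierarchical path), params (parameters of the last path element), query (query component), and fragment (fragment identifier).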

# pythonUrllib-07.py
from urllib import parse

url = 'https://docs.python.org/3/search.html?q=parse&check_keywords=yes&area=default'
result = parse.urlparse(url)   # use a new name; assigning to parse.urlparse would overwrite the function
print(type(result))
print(result)
print('scheme   = ', result.scheme)
print('netloc   = ', result.netloc)
print('path     = ', result.path)
print('params   = ', result.params)
print('query    = ', result.query)
print('fragment = ', result.fragment)

Output

<class 'urllib.parse.ParseResult'>
ParseResult(scheme='https', netloc='docs.python.org', path='/3/search.html', params='', query='q=parse&check_keywords=yes&area=default', fragment='')
scheme   =  https
netloc   =  docs.python.org
path     =  /3/search.html
params   =
query    =  q=parse&check_keywords=yes&area=default
fragment =

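ParseResult is a named tuple. Its fields can be read by name or by position, and _replace() returns a modified copy that geturl() reassembles into a URL. This minimal sketch changes only the query; the new query value is made up.

```python
from urllib import parse

url = 'https://docs.python.org/3/search.html?q=parse&check_keywords=yes&area=default'
result = parse.urlparse(url)

# Copy with a different query, then rebuild the URL string.
new_url = result._replace(query='q=urlsplit').geturl()
print(new_url)                      # https://docs.python.org/3/search.html?q=urlsplit

# Index 0 is scheme; hostname is a convenience property derived from netloc.
print(result[0], result.hostname)   # https docs.python.org
```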
Example pythonUrllib-08.py: The similar function parse.urlsplit(url) differs from parse.urlparse(url) mainly in that its result has no params element.

# pythonUrllib-08.py
from urllib import parse

url = 'https://docs.python.org/3/search.html?q=parse&check_keywords=yes&area=default'
urp = parse.urlsplit(url)
print(type(urp))
print(urp)
print('scheme   = ', urp.scheme)
print('netloc   = ', urp.netloc)
print('path     = ', urp.path)
print('query    = ', urp.query)
print('fragment = ', urp.fragment)

Output

<class 'urllib.parse.SplitResult'>
SplitResult(scheme='https', netloc='docs.python.org', path='/3/search.html', query='q=parse&check_keywords=yes&area=default', fragment='')
scheme   =  https
netloc   =  docs.python.org
path     =  /3/search.html
query    =  q=parse&check_keywords=yes&area=default
fragment =

Example pythonUrllib-09.py: Components produced by parse.urlsplit() are reassembled into a URL with parse.urlunsplit(). Components produced by parse.urlparse() are reassembled with parse.urlunparse().

# pythonUrllib-09.py
from urllib import parse

scheme = 'https'
netloc  = 'docs.python.org'
path = '/3/search.html'
params = ''
query = 'q=parse&check_keywords=yes&area=default'
fragment = ''
url_unparse = parse.urlunparse((scheme,netloc,path,params,query,fragment))
print(url_unparse)
url_unsplit = parse.urlunsplit([scheme,netloc,path,query,fragment])
print(url_unsplit)

Output

https://docs.python.org/3/search.html?q=parse&check_keywords=yes&area=default
https://docs.python.org/3/search.html?q=parse&check_keywords=yes&area=default

Example pythonUrllib-10.py: Use parse.urlencode() to turn dictionary data into the query string of a web address.

# pythonUrllib-10.py
from urllib import parse

url_python = 'https://docs.python.org/3/search.html?'
query = {
         'q':'parse',
         'check_keywords':'yes',
         'area':'default'}
url = url_python + parse.urlencode(query)
print(url)

Output

https://docs.python.org/3/search.html?q=parse&check_keywords=yes&area=default
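When a dictionary value is a list, urlencode() needs doseq=True to emit one key=value pair per item. Without it, the list's str() form is percent-encoded as a single value. The "modules" value in this minimal sketch is made up for illustration.

```python
from urllib import parse

query = {'q': 'parse', 'area': ['default', 'modules']}

# Without doseq the whole list is converted with str() and encoded as one value.
print(parse.urlencode(query))               # q=parse&area=%5B%27default%27%2C+%27modules%27%5D
# With doseq=True each list item becomes its own pair.
print(parse.urlencode(query, doseq=True))   # q=parse&area=default&area=modules
```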

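Example pythonUrllib-11.py: Use parse.parse_qs() to turn the query string of a web address back into a dictionary. parse.parse_qsl() returns a list of (key, value) pairs instead.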

# pythonUrllib-11.py
from urllib import parse

query_str = 'q=parse&check_keywords=yes&area=default'
print('parse.parse_qs  = ', parse.parse_qs(query_str))
print('parse.parse_qsl = ', parse.parse_qsl(query_str))

Output

parse.parse_qs = {'q': ['parse'], 'check_keywords': ['yes'], 'area': ['default']}
parse.parse_qsl = [('q', 'parse'), ('check_keywords', 'yes'), ('area', 'default')]
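Two details of parse_qs() are easy to miss. Keys with empty values are dropped unless keep_blank_values=True, and every value is a list because a key may repeat. The query strings in this minimal sketch are made up.

```python
from urllib import parse

query_str = 'q=parse&check_keywords=&area=default'

print(parse.parse_qs(query_str))
# {'q': ['parse'], 'area': ['default']}
print(parse.parse_qs(query_str, keep_blank_values=True))
# {'q': ['parse'], 'check_keywords': [''], 'area': ['default']}

# A repeated key collects all of its values in one list.
print(parse.parse_qs('tag=a&tag=b'))   # {'tag': ['a', 'b']}
```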

Example pythonUrllib-12.py: Use a wrong URL and a correct URL to observe the response of the URLError class.

# pythonUrllib-12.py
from urllib import request, error

headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64)\
            AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101\
            Safari/537.36', }
# Wrong URL: the host name cannot be resolved
url_error = 'http://aaa.24t.com.tw/'            # wrong URL
try:
    htmlfile = request.urlopen(url_error)
except error.URLError as e:
    print('錯誤原因 : ', e.reason)
else:
    print("擷取網路資料成功")
# Correct URL, sent with a browser User-Agent header
url = 'http://aaa.24ht.com.tw/'                 # correct URL
try:
    req = request.Request(url, headers=headers)
    htmlfile = request.urlopen(req)
except error.URLError as e:
    print('錯誤原因 : ', e.reason)
else:
    print("擷取網路資料成功")

Output

錯誤原因 : [Errno 11001] getaddrinfo failed
擷取網路資料成功

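Example pythonUrllib-13.py: Continuing from pythonUrllib-12.py, observe the responses of the URLError and HTTPError classes. HTTPError is a subclass of URLError, so its except clause must come first.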

# pythonUrllib-13.py
from urllib import request, error

headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64)\
            AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101\
            Safari/537.36', }
# Error 1: wrong URL (the host name cannot be resolved)
url_error = 'http://aaa.24t.com.tw/'            # wrong URL
try:
    htmlfile = request.urlopen(url_error)
except error.HTTPError as e:
    print('錯誤代碼 : ', e.code)
    print('錯誤原因 : ', e.reason)
    print('回應表頭 : ', e.headers)
except error.URLError as e:
    print('錯誤原因 : ', e.reason)
else:
    print("擷取網路資料成功")
print('-'*70)
# Error 2: correct URL but no User-Agent header (the server answers 406 in the output below)
url = 'http://aaa.24ht.com.tw/'                 # correct URL
try:
    htmlfile = request.urlopen(url)
except error.HTTPError as e:
    print('錯誤代碼 : ', e.code)
    print('錯誤原因 : ', e.reason)
    print('回應表頭 : ', e.headers)    
except error.URLError as e:
    print('錯誤原因 : ', e.reason)    
else:
    print("擷取網路資料成功")
print('-'*70)
# Success: correct URL with a browser User-Agent header
url = 'http://aaa.24ht.com.tw/'                 # correct URL
try:
    req = request.Request(url, headers=headers)
    htmlfile = request.urlopen(req)
except error.HTTPError as e:
    print('錯誤代碼 : ', e.code)
    print('錯誤原因 : ', e.reason)
    print('回應表頭 : ', e.headers)    
except error.URLError as e:
    print('錯誤原因 : ', e.reason)
else:
    print("擷取網路資料成功")

Output

錯誤原因 :  [Errno 11001] getaddrinfo failed
----------------------------------------------------------------------
錯誤代碼 :  406
錯誤原因 :  Not Acceptable
回應表頭 :  Date: Mon, 04 Oct 2021 04:47:50 GMT
Server: Apache
Accept-Ranges: bytes
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
----------------------------------------------------------------------
擷取網路資料成功
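The order of the except clauses matters because HTTPError inherits from URLError. An "except error.URLError" placed first would also swallow HTTP errors. The sketch below checks the hierarchy and shows one way to summarize either exception. describe() is a hypothetical helper. The exceptions are constructed by hand so the sketch runs without a network; the URL and messages mirror the output above.

```python
from urllib import error

print(issubclass(error.HTTPError, error.URLError))   # True

def describe(exc):
    """Summarize a urllib exception (hypothetical helper for illustration)."""
    if isinstance(exc, error.HTTPError):        # check the subclass first
        return f'HTTP {exc.code}: {exc.reason}'
    if isinstance(exc, error.URLError):
        return f'URL error: {exc.reason}'
    return repr(exc)

# HTTPError(url, code, msg, hdrs, fp); hdrs and fp are not needed here.
http_err = error.HTTPError('http://aaa.24ht.com.tw/', 406, 'Not Acceptable', None, None)
url_err = error.URLError('getaddrinfo failed')
print(describe(http_err))   # HTTP 406: Not Acceptable
print(describe(url_err))    # URL error: getaddrinfo failed
```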

Example pythonUrllib-14.py: Using urllib.robotparser.

# pythonUrllib-14.py
import urllib.robotparser
rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.musi-cal.com/robots.txt")
rp.read()
print(rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco"))
print(rp.can_fetch("*", "http://www.musi-cal.com/"))

Output

True
True
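RobotFileParser can also parse robots.txt lines that are already in memory. This is handy for testing rules without fetching anything. The robots.txt content and the example.com URLs in this minimal sketch are made up. crawl_delay() is available in Python 3.6 and later.

```python
import urllib.robotparser

# A made-up robots.txt, given as a list of lines so no network access is needed.
robots_lines = [
    "User-agent: *",
    "Crawl-delay: 5",
    "Disallow: /private/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_lines)
print(rp.can_fetch("*", "https://example.com/private/data.html"))  # False
print(rp.can_fetch("*", "https://example.com/public/index.html"))  # True
print(rp.crawl_delay("*"))                                          # 5
```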

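Example pythonPandas-01.py: Store the average March temperatures in Beijing, Hong Kong, and Singapore for 2020 to 2022 as three Series objects. This version calls concat() without setting axis, so the result is not what we want: the three Series are stacked into one long Series.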

# pythonPandas-01.py
import pandas as pd
years = range(2020, 2023)
beijing = pd.Series([20, 21, 19], index = years)
hongkong = pd.Series([25, 26, 27], index = years)
singapore = pd.Series([30, 29, 31], index = years)
citydf = pd.concat([beijing, hongkong, singapore])  # default axis=0 stacks the Series end to end
print(type(citydf))
print(citydf)

Output

<class 'pandas.core.series.Series'>
2020    20
2021    21
2022    19
2020    25
2021    26
2022    27
2020    30
2021    29
2022    31
dtype: int64

Example pythonPandas-02.py: Redesign pythonPandas-01.py to build a DataFrame object by calling concat() with axis=1.

# pythonPandas-02.py
import pandas as pd
years = range(2020, 2023)
taipei = pd.Series([20, 21, 19], index = years)
hongkong = pd.Series([25, 26, 27], index = years)
singapore = pd.Series([30, 29, 31], index = years)
citydf = pd.concat([taipei, hongkong, singapore],axis=1)  # axis=1
print(type(citydf))
print(citydf)

Output

<class 'pandas.core.frame.DataFrame'>
       0   1   2
2020  20  25  30
2021  21  26  29
2022  19  27  31

Example pythonPandas-03.py: Extend pythonPandas-02.py by setting the column names with the columns attribute.

# pythonPandas-03.py
import pandas as pd
years = range(2020, 2023)
taipei = pd.Series([20, 21, 19], index = years)
hongkong = pd.Series([25, 26, 27], index = years)
singapore = pd.Series([30, 29, 31], index = years)
citydf = pd.concat([taipei, hongkong, singapore],axis=1)  # axis=1
cities = ["Taipei", "HongKong", "Singapore"]
citydf.columns = cities
print(citydf)

Output

      Taipei  HongKong  Singapore
2020      20        25         30
2021      21        26         29
2022      19        27         31

Example pythonPandas-04.py: Change the design of pythonPandas-03.py to set the DataFrame's column names through each Series' name attribute.

# pythonPandas-04.py
import pandas as pd
years = range(2020, 2023)
taipei = pd.Series([20, 21, 19], index = years)
hongkong = pd.Series([25, 26, 27], index = years)
singapore = pd.Series([30, 29, 31], index = years)
taipei.name = "Taipei"
hongkong.name = "HongKong"
singapore.name = "Singapore"
citydf = pd.concat([taipei, hongkong, singapore],axis=1)  
print(citydf)

Output

      Taipei  HongKong  Singapore
2020      20        25         30
2021      21        26         29
2022      19        27         31
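pd.concat() can also name the columns in one step: with axis=1, its keys argument labels each resulting column with the matching key. The sketch below is an alternative to pythonPandas-03.py and pythonPandas-04.py, not the approach those examples use.

```python
import pandas as pd

years = range(2020, 2023)
taipei = pd.Series([20, 21, 19], index=years)
hongkong = pd.Series([25, 26, 27], index=years)
singapore = pd.Series([30, 29, 31], index=years)

# With axis=1, each key becomes the label of the column built from its Series
citydf = pd.concat([taipei, hongkong, singapore], axis=1,
                   keys=["Taipei", "HongKong", "Singapore"])
print(citydf)
```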

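Example pythonPandas-05.py : Create a DataFrame object from a list whose elements are dictionaries. Each dictionary becomes a row and each key becomes a column.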

# pythonPandas-05.py
import pandas as pd
data = [{'apple':50,'Orange':30,'Grape':80},{'apple':50,'Grape':80}]
fruits = pd.DataFrame(data)
print(fruits)

Output

   apple  Orange  Grape
0    50   30.0    80
1    50    NaN    80
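In pythonPandas-05.py the second dictionary has no 'Orange' key, so pandas fills that cell with NaN. NaN is a floating-point value, which is why the Orange column prints 30.0 rather than 30. If a missing value should count as 0 (an assumption that fits this fruit-count data but not every dataset), fillna() and astype() restore an integer column:

```python
import pandas as pd

data = [{'apple': 50, 'Orange': 30, 'Grape': 80}, {'apple': 50, 'Grape': 80}]
fruits = pd.DataFrame(data)

# The missing 'Orange' value became NaN, which forced that column to float64
print(fruits.dtypes)

# Treat the missing count as 0, then convert the column back to integers
fruits['Orange'] = fruits['Orange'].fillna(0).astype(int)
print(fruits)
```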

Example pythonPandas-06.py : Create a DataFrame object from a dictionary whose values are lists. Each key becomes a column.

# pythonPandas-06.py
import pandas as pd
cities = {'country':['Taiwan', 'Japan', 'Singapore'],
          'town':['Taipei','Tokyo','Singapore'],
          'population':[400, 1600, 600]}
citydf = pd.DataFrame(cities)
print(citydf)

Output

     country       town  population
0     Taiwan     Taipei         400
1      Japan      Tokyo        1600
2  Singapore  Singapore         600

Example pythonPandas-07.py : Rework pythonPandas-06.py, changing the row labels to first, second, and third.

# pythonPandas-07.py
import pandas as pd
cities = {'country':['Taiwan', 'Japan', 'Singapore'],
          'town':['Taipei','Tokyo','Singapore'],
          'population':[400, 1600, 600]}
rowindex = ['first', 'second', 'third']
citydf = pd.DataFrame(cities, index=rowindex)
print(citydf)

Output

          country       town  population
first      Taiwan     Taipei         400
second      Japan      Tokyo        1600
third   Singapore  Singapore         600

Example pythonPandas-08.py : Rework pythonPandas-07.py; this program uses country as the index.

# pythonPandas-08.py
import pandas as pd
cities = {'country':['Taiwan', 'Japan', 'Singapore'],
          'town':['Taipei','Tokyo','Singapore'],
          'population':[400, 1600, 600]}
citydf = pd.DataFrame(cities, columns=["town","population"],
                      index=cities["country"])
print(citydf)

Output

                town  population
Taiwan        Taipei         400
Japan          Tokyo        1600
Singapore  Singapore         600
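pythonPandas-08.py chooses the index by passing cities["country"] to index= and leaving country out of columns=. An equivalent sketch builds the full table first and then promotes the country column with set_index(); the only visible difference is that the index is now named country, which is printed above the row labels.

```python
import pandas as pd

cities = {'country': ['Taiwan', 'Japan', 'Singapore'],
          'town': ['Taipei', 'Tokyo', 'Singapore'],
          'population': [400, 1600, 600]}

# Build all three columns, then move 'country' from the columns into the index
citydf = pd.DataFrame(cities).set_index('country')
print(citydf)
```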

Example pythonPandas-09.py : Before explaining how the attributes above are used, create a DataFrame object to serve as the running example.

# pythonPandas-09.py
import pandas as pd
cities = {'Country':['China','China','Thailand','Japan','Singapore'],
          'Town':['Beijing','Shanghai','Bangkok', 'Tokyo','Singapore'],
          'Population':[2000, 2300, 900, 1600, 600]}
df = pd.DataFrame(cities, columns=["Town","Population"],
                  index=cities["Country"])
print(df)

Output

                Town  Population
China        Beijing        2000
China       Shanghai        2300
Thailand     Bangkok         900
Japan          Tokyo        1600
Singapore  Singapore         600

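Example pythonPandas-10.py : Use NumPy's random-number function randint() to fill a DataFrame object. Suppose a course has a first exam (first), a second exam (second), and a final exam (final), each scored with a random number between 60 and 99. On line 6, np.random.randint(60,100,size=(3,3)) creates an array of shape (3,3) whose values fall between 60 and 99, because the upper bound 100 is excluded. Since the scores are random, your output will differ.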

# pythonPandas-10.py
import pandas as pd
import numpy as np
name = ['Frank', 'Peter', 'John']
score = ['first', 'second', 'final']
df = pd.DataFrame(np.random.randint(60,100,size=(3,3)),
                  columns=name,
                  index=score)
print(df)

Output

        Frank  Peter  John
first      82     80    69
second     60     96    90
final      98     75    95
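Because pythonPandas-10.py uses unseeded random numbers, every run prints different scores. If reproducible scores are wanted, NumPy's Generator API accepts a seed; the sketch below assumes an arbitrary seed of 42. Like randint(), integers(60, 100) excludes the upper bound, so the scores stay between 60 and 99.

```python
import numpy as np
import pandas as pd

name = ['Frank', 'Peter', 'John']
score = ['first', 'second', 'final']

rng = np.random.default_rng(seed=42)   # arbitrary fixed seed: same scores on every run
# integers(60, 100) draws whole numbers from 60 to 99; 100 itself is excluded
df = pd.DataFrame(rng.integers(60, 100, size=(3, 3)),
                  columns=name, index=score)
print(df)
```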

Example pythonPandas-11.py : Create and print a DataFrame object holding several students' college entrance exam scores.

# pythonPandas-11.py
import pandas as pd

course = ['Chinese', 'English', 'Math', 'Natural', 'Society']
chinese = [14, 12, 13, 10, 13]
eng = [13, 14, 11, 10, 15]
math = [15, 9, 12, 8, 15]
nature = [15, 10, 13, 10, 15]
social = [12, 11, 14, 9, 14]

df = pd.DataFrame([chinese, eng, math, nature, social],
                  columns = course,
                  index = range(1,6))
print(df)

Output

   Chinese  English  Math  Natural  Society
1       14       12    13       10       13
2       13       14    11       10       15
3       15        9    12        8       15
4       15       10    13       10       15
5       12       11    14        9       14

Example pythonPandas-12.py : Save the DataFrame object created in pythonPandas-11.py to output-pythonPandas-12a.csv with the header and index kept, then to output-pythonPandas-12b.csv without them. The program writes files and prints nothing.

# pythonPandas-12.py
import pandas as pd

course = ['Chinese', 'English', 'Math', 'Natural', 'Society']
chinese = [14, 12, 13, 10, 13]
eng = [13, 14, 11, 10, 15]
math = [15, 9, 12, 8, 15]
nature = [15, 10, 13, 10, 15]
social = [12, 11, 14, 9, 14]

df = pd.DataFrame([chinese, eng, math, nature, social],
                  columns = course,
                  index = range(1,6))
df.to_csv("output-pythonPandas-12a.csv")
df.to_csv("output-pythonPandas-12b.csv", header=False, index=False)

Output

Example pythonPandas-13.py : Read back the two CSV files created by pythonPandas-12.py and print them.

# pythonPandas-13.py
import pandas as pd

course = ['Chinese', 'English', 'Math', 'Natural', 'Society']
x = pd.read_csv("output-pythonPandas-12a.csv",index_col=0)   # first column becomes the index
y = pd.read_csv("output-pythonPandas-12b.csv",names=course)  # supply the missing headers
print(x)
print(y)

Output

   Chinese  English  Math  Natural  Society
1       14       12    13       10       13
2       13       14    11       10       15
3       15        9    12        8       15
4       15       10    13       10       15
5       12       11    14        9       14
   Chinese  English  Math  Natural  Society
0       14       12    13       10       13
1       13       14    11       10       15
2       15        9    12        8       15
3       15       10    13       10       15
4       12       11    14        9       14
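The round trip in pythonPandas-12.py and pythonPandas-13.py can be tried without touching the disk by writing to an in-memory io.StringIO buffer instead of a file. The two-column DataFrame below is a trimmed-down stand-in for the exam-score table, not the full data.

```python
import io

import pandas as pd

df = pd.DataFrame([[14, 12], [13, 14]], columns=['Chinese', 'English'], index=[1, 2])

buf_a = io.StringIO()
df.to_csv(buf_a)                               # header and index are written
buf_b = io.StringIO()
df.to_csv(buf_b, header=False, index=False)    # values only

# index_col=0 turns the saved index column back into the row labels
x = pd.read_csv(io.StringIO(buf_a.getvalue()), index_col=0)
# names= supplies the headers that were not written; rows are renumbered from 0
y = pd.read_csv(io.StringIO(buf_b.getvalue()), names=['Chinese', 'English'])
print(x)
print(y)
```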

Example pythonPandas-14.py : Create a Series object tw recording Taiwan's population every 10 years from 1950 to 2010, in units of 10,000 people, and plot it.

# pythonPandas-14.py
import pandas as pd
import matplotlib.pyplot as plt

population = [860, 1100, 1450, 1800, 2020, 2200, 2260]
tw = pd.Series(population, index=range(1950, 2011, 10))
tw.plot(title='Population in Taiwan')
plt.xlabel("Year")
plt.ylabel("Population")
plt.show()

Output

Example pythonPandas-15.py : Chart the population of several major world cities: build a DataFrame object, then plot it.

# pythonPandas-15.py
import pandas as pd
import matplotlib.pyplot as plt

cities = {'population':[1000, 850, 800, 1500, 600, 800],
          'town':['New York','Chicago','Bangkok','Tokyo',
                   'Singapore','HongKong']}
tw = pd.DataFrame(cities, columns=['population'],index=cities['town'])
          
tw.plot(title='Population in the World')
plt.xlabel('City')
plt.ylabel("Population")
plt.show()

Output

Example pythonPandas-16.py : Rework pythonPandas-15.py as a bar chart.

# pythonPandas-16.py
import pandas as pd
import matplotlib.pyplot as plt

cities = {'population':[1000, 850, 800, 1500, 600, 800],
          'town':['New York','Chicago','Bangkok','Tokyo',
                   'Singapore','HongKong']}
tw = pd.DataFrame(cities, columns=['population'],index=cities['town'])

tw.plot(kind='bar', title='Population in the World')   # kind='bar' draws vertical bars
plt.xlabel('City')
plt.ylabel("Population")
plt.show()

Output

After adjustment

Example pythonPandas-17.py : Extend the DataFrame object with each city's area in square kilometers. Population is still in units of 10,000 people.

# pythonPandas-17.py
import pandas as pd
import matplotlib.pyplot as plt

cities = {'population':[1000, 850, 800, 1500, 600, 800],
          'area':[400, 500, 850, 300, 200, 320],
          'town':['New York','Chicago','Bangkok','Tokyo',
                   'Singapore','HongKong']}
tw = pd.DataFrame(cities, columns=['population','area'],index=cities['town'])

tw.plot(title='Population in the World')
plt.xlabel('City')
plt.show()

Output

Example pythonPandas-18.py : Rework pythonPandas-17.py with the population expressed in people instead of units of 10,000.

# pythonPandas-18.py
import pandas as pd
import matplotlib.pyplot as plt

cities = {'population':[10000000,8500000,8000000,15000000,6000000,8000000],
          'area':[400, 500, 850, 300, 200, 320],
          'town':['New York','Chicago','Bangkok','Tokyo',
                   'Singapore','HongKong']}
tw = pd.DataFrame(cities, columns=['population','area'],index=cities['town'])
          
tw.plot(title='Population in the World')
plt.xlabel('City')
plt.show()

Output

Example pythonPandas-19.py : Rework pythonPandas-18.py with a secondary y-axis, so population and area each get their own scale.

# pythonPandas-19.py
import pandas as pd
import matplotlib.pyplot as plt

cities = {'population':[10000000,8500000,8000000,15000000,6000000,8000000],
          'area':[400, 500, 850, 300, 200, 320],
          'town':['New York','Chicago','Bangkok','Tokyo',
                   'Singapore','HongKong']}
tw = pd.DataFrame(cities, columns=['population','area'],index=cities['town'])

fig, ax = plt.subplots()    # fig is the whole figure, ax is the first (left) axis
fig.suptitle("City Statistics")
ax.set_ylabel("Population")
ax.set_xlabel("City")

ax2 = ax.twinx()    # twinx() creates a second y-axis ax2 that shares the x-axis
ax2.set_ylabel("Area")
tw['population'].plot(ax=ax,rot=90)     # draw the population line
tw['area'].plot(ax=ax2, style='g-')     # draw the area line in green
ax.legend(loc=1)                        # legend in the upper right
ax2.legend(loc=2)                       # legend in the upper left
plt.show()

Output

Example pythonPandas-20.py : Rework pythonPandas-19.py so the left y-axis shows the population as plain numbers instead of scientific notation. Line 15 of this example adds:

ax.ticklabel_format(style='plain') # do not use scientific notation

# pythonPandas-20.py
import pandas as pd
import matplotlib.pyplot as plt

cities = {'population':[10000000,8500000,8000000,15000000,6000000,8000000],
          'area':[400, 500, 850, 300, 200, 320],
          'town':['New York','Chicago','Bangkok','Tokyo',
                   'Singapore','HongKong']}
tw = pd.DataFrame(cities, columns=['population','area'],index=cities['town'])

fig, ax = plt.subplots()
fig.suptitle("City Statistics")
ax.set_ylabel("Population")
ax.set_xlabel("City")
ax.ticklabel_format(style='plain')     # do not use scientific notation
ax2 = ax.twinx()
ax2.set_ylabel("Area")
tw['population'].plot(ax=ax,rot=90)     # draw the population line
tw['area'].plot(ax=ax2, style='g-')     # draw the area line in green
ax.legend(loc=1)                        # legend in the upper right
ax2.legend(loc=2)                       # legend in the upper left
plt.show()

Output

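Example pythonPandas-21.py : Draw a pie chart from a Series object. explode pulls the Apples and Pears slices away from the center, and autopct='%1.2f%%' labels each slice with its percentage to two decimal places.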

# pythonPandas-21.py
import pandas as pd
import matplotlib.pyplot as plt

fruits = ['Apples', 'Bananas', 'Grapes', 'Pears', 'Oranges']
s = pd.Series([2300, 5000, 1200, 2500, 2900], index=fruits,
              name='Fruits Shop')
explode = [0.4, 0, 0, 0.2, 0]
s.plot.pie(explode = explode, autopct='%1.2f%%')
plt.show()

Output

Example pythonPandas-22.py : Print the current time.

# pythonPandas-22.py
from datetime import datetime

timeNow = datetime.now()
print(type(timeNow))
print("現在時間 : ", timeNow)

Output

<class 'datetime.datetime'>
現在時間 :  2021-10-07 12:14:08.031411

Example pythonPandas-23.py : Print the individual components of the current time.

# pythonPandas-23.py
from datetime import datetime

timeNow = datetime.now()
print(type(timeNow))
print("現在時間 : ", timeNow)
print("年 : ", timeNow.year)
print("月 : ", timeNow.month)
print("日 : ", timeNow.day)
print("時 : ", timeNow.hour)
print("分 : ", timeNow.minute)
print("秒 : ", timeNow.second)

Output

<class 'datetime.datetime'>
現在時間 :  2021-10-07 12:22:48.237163
年 :  2021
月 :  10
日 :  7
時 :  12
分 :  22
秒 :  48

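Example pythonPandas-24.py : Keep a loop printing "Program is sleeping." until the program wakes up at 12:33:00 on October 7, 2021 (the timeStop value in the code), then print "Wake up". Set timeStop to a moment shortly in the future before running it.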

# pythonPandas-24.py
from datetime import datetime

timeStop = datetime(2021,10,7,12,33,0)
while datetime.now() < timeStop:
    print("Program is sleeping.", end="")
print("Wake up")

Output

Program is sleeping.Program is sleeping.
Program is sleeping.Program is sleeping.
Program is sleeping.Program is sleeping.
Program is sleeping.Program is sleeping.
Program is sleeping.Program is sleeping.
Program is sleeping.Program is sleeping.
Wake up
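The loop in pythonPandas-24.py is a busy wait: it prints as fast as it can and keeps a CPU core busy until timeStop. A gentler pattern, sketched below as an alternative rather than the original program, sleeps between checks with time.sleep(). The helper name wait_until and the one-second default step are assumptions for illustration.

```python
import time
from datetime import datetime, timedelta


def wait_until(target, step=1.0):
    """Hypothetical helper: sleep in short steps until datetime.now() reaches target."""
    while True:
        remaining = (target - datetime.now()).total_seconds()
        if remaining <= 0:
            break
        print("Program is sleeping.")
        time.sleep(min(step, remaining))   # never sleep past the target
    print("Wake up")


# Example: wake up two seconds from now
wait_until(datetime.now() + timedelta(seconds=2))
```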

Example pythonPandas-25.py : Print the days, seconds, and microseconds of a time span (a timedelta object).

# pythonPandas-25.py
from datetime import datetime, timedelta

deltaTime = timedelta(days=3,hours=5,minutes=8,seconds=10)
print(deltaTime.days, deltaTime.seconds, deltaTime.microseconds)

Output

3 18490 0

Example pythonPandas-26.py : Rework pythonPandas-25.py to convert the time span into a total number of seconds.

# pythonPandas-26.py
from datetime import datetime, timedelta

deltaTime = timedelta(days=3,hours=5,minutes=8,seconds=10)
print(deltaTime.total_seconds())

Output

277690.0
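The numbers printed by pythonPandas-25.py and pythonPandas-26.py can be checked by hand. A timedelta stores only days, seconds, and microseconds, so the 5 hours and 8 minutes are folded into .seconds, and total_seconds() adds the days back in. This short check walks through that arithmetic:

```python
from datetime import timedelta

delta = timedelta(days=3, hours=5, minutes=8, seconds=10)

# Hours and minutes are folded into .seconds: 5*3600 + 8*60 + 10 = 18490
print(delta.days, delta.seconds)            # 3 18490
# total_seconds() adds the days back: 3*86400 + 18490 = 277690
print(delta.days * 86400 + delta.seconds)   # 277690
print(delta.total_seconds())                # 277690.0
```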

Example pythonPandas-27.py : Use datetime to build and print a Series object covering 5 days, with the values [34, 44, 65, 53, 39]; also print the time series' data type and its index.

# pythonPandas-27.py
import pandas as pd
from datetime import datetime, timedelta

ndays = 5
start = datetime(2019, 3, 11)   
dates = [start + timedelta(days=x) for x in range(0, ndays)]
data = [34, 44, 65, 53, 39]
ts = pd.Series(data, index=dates)
print(type(ts))
print(ts)
print(ts.index)

Output

<class 'pandas.core.series.Series'>
2019-03-11    34
2019-03-12    44
2019-03-13    65
2019-03-14    53
2019-03-15    39
dtype: int64
DatetimeIndex(['2019-03-11', '2019-03-12', '2019-03-13', '2019-03-14',
               '2019-03-15'],
              dtype='datetime64[ns]', freq=None)

Example pythonPandas-28.py : Extend pythonPandas-27.py: create a second Series object with the same timestamps, then add the two Series objects and compute their mean.

# pythonPandas-28.py
import pandas as pd
from datetime import datetime, timedelta

ndays = 5
start = datetime(2019, 3, 11)   
dates = [start + timedelta(days=x) for x in range(0, ndays)]
data1 = [34, 44, 65, 53, 39]
ts1 = pd.Series(data1, index=dates)

data2 = [34, 44, 65, 53, 39]
ts2 = pd.Series(data2, index=dates)

addts = ts1 + ts2
print("ts1+ts2")
print(addts)

meants = (ts1 + ts2)/2
print("(ts1+ts2)/2")
print(meants)

Output

ts1+ts2
2019-03-11     68
2019-03-12     88
2019-03-13    130
2019-03-14    106
2019-03-15     78
dtype: int64
(ts1+ts2)/2
2019-03-11    34.0
2019-03-12    44.0
2019-03-13    65.0
2019-03-14    53.0
2019-03-15    39.0
dtype: float64

Example pythonPandas-29.py : Rework pythonPandas-28.py to add two Series objects whose timestamps only partly overlap. Dates that appear in only one Series produce NaN.

# pythonPandas-29.py
import pandas as pd
from datetime import datetime, timedelta

ndays = 5
start = datetime(2019, 3, 11)   
dates1 = [start + timedelta(days=x) for x in range(0, ndays)]
data1 = [34, 44, 65, 53, 39]
ts1 = pd.Series(data1, index=dates1)

dates2 = [start - timedelta(days=x) for x in range(0, ndays)]
data2 = [34, 44, 65, 53, 39]
ts2 = pd.Series(data2, index=dates2)

addts = ts1 + ts2
print("ts1+ts2")
print(addts)

Output

ts1+ts2
2019-03-07     NaN
2019-03-08     NaN
2019-03-09     NaN
2019-03-10     NaN
2019-03-11    68.0
2019-03-12     NaN
2019-03-13     NaN
2019-03-14     NaN
2019-03-15     NaN
Freq: D, dtype: float64
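In pythonPandas-29.py only 2019-03-11 exists in both Series, so every other date becomes NaN. If a date missing from one Series should simply count as 0 (an assumption about the data), Series.add() with fill_value=0 keeps those values:

```python
from datetime import datetime, timedelta

import pandas as pd

start = datetime(2019, 3, 11)
ts1 = pd.Series([34, 44, 65, 53, 39],
                index=[start + timedelta(days=x) for x in range(5)])
ts2 = pd.Series([34, 44, 65, 53, 39],
                index=[start - timedelta(days=x) for x in range(5)])

# fill_value=0 treats a date missing from one Series as 0 instead of producing NaN
addts = ts1.add(ts2, fill_value=0)
print(addts)
```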

Example pythonPandas-30.py : Rework pythonPandas-27.py using date_range(). The resulting index also records its daily frequency (freq='D').

# pythonPandas-30.py
import pandas as pd

dates = pd.date_range('3/11/2019', '3/15/2019')
data = [34, 44, 65, 53, 39]
ts = pd.Series(data, index=dates)
print(type(ts))
print(ts)
print(ts.index)

Output

<class 'pandas.core.series.Series'>
2019-03-11    34
2019-03-12    44
2019-03-13    65
2019-03-14    53
2019-03-15    39
Freq: D, dtype: int64
DatetimeIndex(['2019-03-11', '2019-03-12', '2019-03-13', '2019-03-14',
               '2019-03-15'],
              dtype='datetime64[ns]', freq='D')
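date_range() can also take a start date and a number of periods instead of an end date, and freq sets the spacing between dates. A short sketch, using the same five days as pythonPandas-30.py plus a two-day spacing as a second example:

```python
import pandas as pd

# Same five days as above, given as a start date plus a number of periods
dates = pd.date_range(start='2019-03-11', periods=5, freq='D')
ts = pd.Series([34, 44, 65, 53, 39], index=dates)
print(ts)

# freq='2D' spaces the dates two days apart: 2019-03-11, 03-13, 03-15
every_other = pd.date_range(start='2019-03-11', periods=3, freq='2D')
print(every_other)
```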

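Example pythonPandas-31.py : Download the iris dataset from the UC Irvine (UCI) Machine Learning Repository and save it as iris.csv. If the download fails, the program stops instead of trying to save a file it does not have.

# pythonPandas-31.py
import requests

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
try:
    htmlfile = requests.get(url)    # download the file into htmlfile
    htmlfile.raise_for_status()     # treat HTTP errors such as 404 as failures
    print('下載成功')
except requests.exceptions.RequestException as err:
    print('下載失敗', err)
    raise SystemExit(1)             # stop: there is nothing to save

fileName = 'iris.csv'      # file that will hold the iris data
with open(fileName, 'wb') as fileobj:       # open iris.csv for binary writing
    for diskstorage in htmlfile.iter_content(10240):    # 10,240-byte chunks
        fileobj.write(diskstorage)          # write each chunk

Output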

下載成功

Example pythonPandas-32.py : Read iris.csv, add column names to the dataset, then print the dataset's length and contents.

# pythonPandas-32.py
import pandas as pd

colName = ['sepal_len','sepal_wd','petal_len','petal_wd','species']
iris = pd.read_csv('iris.csv', names = colName)
print('資料集長度 : ', len(iris))
print(iris)

Output

資料集長度 :  150
     sepal_len  sepal_wd  petal_len  petal_wd         species
0          5.1       3.5        1.4       0.2     Iris-setosa
1          4.9       3.0        1.4       0.2     Iris-setosa
2          4.7       3.2        1.3       0.2     Iris-setosa
3          4.6       3.1        1.5       0.2     Iris-setosa
4          5.0       3.6        1.4       0.2     Iris-setosa
..         ...       ...        ...       ...             ...
145        6.7       3.0        5.2       2.3  Iris-virginica
146        6.3       2.5        5.0       1.9  Iris-virginica
147        6.5       3.0        5.2       2.0  Iris-virginica
148        6.2       3.4        5.4       2.3  Iris-virginica
149        5.9       3.0        5.1       1.8  Iris-virginica

[150 rows x 5 columns]

Example pythonPandas-33.py : Draw a scatter plot of sepal length against sepal width (Sepal Length, Sepal Width).

# pythonPandas-33.py
import pandas as pd
import matplotlib.pyplot as plt

colName = ['sepal_len','sepal_wd','petal_len','petal_wd','species']
iris = pd.read_csv('iris.csv', names = colName)

iris.plot(x='sepal_len',y='sepal_wd',kind='scatter')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Iris Sepal length and width analysis')
plt.show()

Output

Example pythonPandas-34.py : Modify line 8 of pythonPandas-33.py to draw the points with plt.plot() instead, which lets you choose a different color and marker.

# pythonPandas-34.py
import pandas as pd
import matplotlib.pyplot as plt

colName = ['sepal_len','sepal_wd','petal_len','petal_wd','species']
iris = pd.read_csv('iris.csv', names = colName)

plt.plot(iris['sepal_len'],iris['sepal_wd'],'*',color='g')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Iris Sepal length and width analysis')
plt.show()

Output

Example pythonPandas-35.py : Draw a scatter plot of the sepal measurements, using a different marker and color for each iris species.

# pythonPandas-35.py
import pandas as pd
import matplotlib.pyplot as plt

colName = ['sepal_len','sepal_wd','petal_len','petal_wd','species']
iris = pd.read_csv('iris.csv', names = colName)

# select the rows of each iris species
iris_setosa = iris[iris['species'] == 'Iris-setosa']
iris_versicolor = iris[iris['species'] == 'Iris-versicolor']
iris_virginica = iris[iris['species'] == 'Iris-virginica']
# draw the scatter plots
plt.plot(iris_setosa['sepal_len'],iris_setosa['sepal_wd'],
         '*',color='g',label='setosa')
plt.plot(iris_versicolor['sepal_len'],iris_versicolor['sepal_wd'],
         'x',color='b',label='versicolor')
plt.plot(iris_virginica['sepal_len'],iris_virginica['sepal_wd'],
         '.',color='r',label='virginica')
# label the axes and add a title
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Iris Sepal length and width analysis')
plt.legend()
plt.show()

Output

Example pythonPandas-36.py : Compute the mean sepal and petal length and width of each species, then draw the means as a bar chart.

# pythonPandas-36.py
import pandas as pd
import matplotlib.pyplot as plt

colName = ['sepal_len','sepal_wd','petal_len','petal_wd','species']
iris = pd.read_csv('iris.csv', names = colName)

# mean of each measurement, grouped by species
iris_mean = iris.groupby('species', as_index=False).mean()
# draw the bar chart
iris_mean.plot(kind='bar')
# replace the 0, 1, 2 tick labels with the species names
plt.xticks(iris_mean.index,iris_mean['species'], rotation=0)
plt.show()

Output

Example pythonPandas-37.py : Rework pythonPandas-36.py to remove the "Iris-" prefix from the species names on the x-axis by adding line 7: iris['species'] = iris['species'].apply(lambda x: x.replace("Iris-",""))

# pythonPandas-37.py
import pandas as pd
import matplotlib.pyplot as plt

colName = ['sepal_len','sepal_wd','petal_len','petal_wd','species']
iris = pd.read_csv('iris.csv', names = colName)
iris['species'] = iris['species'].apply(lambda x: x.replace("Iris-",""))
# mean of each measurement, grouped by species
iris_mean = iris.groupby('species', as_index=False).mean()
# draw the bar chart
iris_mean.plot(kind='bar')
# label the ticks with the species names
plt.xticks(iris_mean.index,iris_mean['species'], rotation=0)

plt.show()

Output

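The prefix removal on line 7 of pythonPandas-37.py calls a Python lambda once per row. pandas' string accessor does the same job on the whole column at once. The four-row DataFrame below is made-up sample data, not the real iris.csv contents.

```python
import pandas as pd

# Made-up stand-in for iris.csv with only one measurement column
iris = pd.DataFrame({
    'sepal_len': [5.1, 4.9, 7.0, 6.4],
    'species': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor'],
})

# Vectorized equivalent of apply(lambda x: x.replace("Iris-", ""))
iris['species'] = iris['species'].str.replace('Iris-', '', regex=False)
iris_mean = iris.groupby('species', as_index=False).mean()
print(iris_mean)
```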
Example pythonPandas-38.py : Rework pythonPandas-37.py to stack each species' values into a single bar, changing line 11 to: iris_mean.plot(kind='bar',stacked=True)

# pythonPandas-38.py
import pandas as pd
import matplotlib.pyplot as plt

colName = ['sepal_len','sepal_wd','petal_len','petal_wd','species']
iris = pd.read_csv('iris.csv', names = colName)
iris['species'] = iris['species'].apply(lambda x: x.replace("Iris-",""))
# mean of each measurement, grouped by species
iris_mean = iris.groupby('species', as_index=False).mean()
# draw a stacked bar chart
iris_mean.plot(kind='bar',stacked=True)
# label the ticks with the species names
plt.xticks(iris_mean.index,iris_mean['species'], rotation=0)

plt.show()

Output

Example pythonPandas-39.py : Rework pythonPandas-38.py to turn the vertical bar chart into a horizontal one, changing line 11 to iris_mean.plot(kind='barh',stacked=True) and line 13 to plt.yticks(iris_mean.index,iris_mean['species'], rotation=0).

# pythonPandas-39.py
import pandas as pd
import matplotlib.pyplot as plt

colName = ['sepal_len','sepal_wd','petal_len','petal_wd','species']
iris = pd.read_csv('iris.csv', names = colName)
iris['species'] = iris['species'].apply(lambda x: x.replace("Iris-",""))
# mean of each measurement, grouped by species
iris_mean = iris.groupby('species', as_index=False).mean()
# draw a stacked horizontal bar chart
iris_mean.plot(kind='barh',stacked=True)
# label the ticks with the species names
plt.yticks(iris_mean.index,iris_mean['species'], rotation=0)

plt.show()

Output

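Example pythonPandas-40.py : Use pandas' read_html(url) to read the global currency exchange rate tables. read_html() returns a list containing one DataFrame for every <table> it finds on the page.

# pythonPandas-40.py
import pandas as pd

url = 'http://www.stockq.org/market/currency.php'
currencys = pd.read_html(url)

print(type(currencys))              # print the data type
print(currencys)                    # print the list of tables

Output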

<class 'list'>
[                                                   0
0  google_ad_client = "ca-pub-9803646600609510"; ...,                                                    0
0  google_ad_client = "ca-pub-9803646600609510"; ...
1  首頁 市場動態 歷史股價 基金淨值 基金分類 經濟數據總覽 2021行事曆  期貨報告  加...,     0
                      1
0 NaN  google_ad_client = "ca-pub-9803646600609510"; ...,                                                    0     1
0  首頁 市場動態 歷史股價 基金淨值 基金分類 經濟數據總覽 2021行事曆  期貨報告  加...  简体中文,

Example pythonPandas-41.py : Print the elements of the list read from the global exchange rate page, one element at a time.

# pythonPandas-41.py
import pandas as pd

url = 'http://www.stockq.org/market/currency.php'
currencys = pd.read_html(url)         # read the global exchange rate tables

for item, currency in enumerate(currencys):
    print("元素 : ", item)            # print the element number
    print(currency)                   # print the element's contents
    print()

Output

元素 :  0
                                                   0
0  google_ad_client = "ca-pub-9803646600609510"; ...

元素 :  1
                                                   0
0  google_ad_client = "ca-pub-9803646600609510"; ...
1  首頁 市場動態 歷史股價 基金淨值 基金分類 經濟數據總覽 2021行事曆  期貨報告  加...
............................................................
元素 :  7
                           0                         1  ...                         3                         4
0   全球匯率 (Currency Exchange)  全球匯率 (Currency Exchange)  ...  全球匯率 (Currency Exchange)  全球匯率 (Currency Exchange)
1                         貨幣                        匯率  ...                        比例                        台北

2                      歐元/美元                    1.1561  ...                     0.06%                     19:02
3                      英鎊/美元                    1.3599  ...                     0.15%                     19:02
4                    美元/瑞士法郎                    0.9258  ...                    -0.13%                     19:02
5                    美元/瑞典克朗                    8.7797  ...                    -0.21%                     19:02
.............

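Example pythonPandas-42.py : Display element 7 of the list (the exchange rate table at the time of writing) as a pandas DataFrame. The column headings are 貨幣 (currency pair), 匯率 (rate), 漲跌 (change), 比例 (percent change), and 台北 (Taipei time of the quote).

# pythonPandas-42.py
import pandas as pd

url = 'http://www.stockq.org/market/currency.php'
currencys = pd.read_html(url)                               # read the global exchange rate tables

currency = currencys[7]                                     # take element 7
currency = currency.drop(currency.index[[0,1]])             # drop the first 2 rows
currency.columns = ['貨幣', '匯率', '漲跌', '比例', '台北'] # set the column headings
currency.index = range(len(currency.index))                 # renumber the rows from 0
print(currency)

Output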

      貨幣       匯率    漲跌    比例    台北
0     歐元/美元   1.1561   0.0006   0.06%  19:02
1     英鎊/美元   1.3599   0.0021   0.15%  19:02
2   美元/瑞士法郎   0.9258  -0.0012  -0.13%  19:02
3   美元/瑞典克朗   8.7797  -0.0184  -0.21%  19:02
4    美元/俄盧布  72.0608  -0.3435  -0.47%  19:02
5   美元/匈牙利幣   309.50    -0.64  -0.21%  19:02
6   美元/土耳其幣   8.8690   0.0020   0.02%  19:02
7    美元/南非幣  14.8737  -0.0859  -0.57%  19:02
8   美元/以色列幣   3.2283  -0.0049  -0.15%  19:02
9    美元/摩洛哥   9.0692   0.0022   0.02%  19:02
10    澳幣/美元   0.7296   0.0024   0.34%  19:02
11    紐幣/美元   0.6928   0.0020   0.29%  19:02
12    美元/日圓   111.38    -0.01  -0.01%  19:02
13   美元/人民幣   6.4452   0.0000   0.00%  05:00
14    美元/港幣   7.7845  -0.0016  -0.02%  19:02
15    美元/台幣   27.968   -0.001  -0.00%  19:02
16    美元/韓圜  1189.71     0.25   0.02%  19:02
17    美元/泰銖   33.770   -0.030  -0.09%  19:02
18    美元/新元   1.3576  -0.0006  -0.05%  19:02
19   美元/菲披索   50.343   -0.467  -0.92%  19:02
20   美元/馬來幣   4.1810   0.0005   0.01%  16:14
21   美元/印尼盾  14215.0    -33.5  -0.24%  15:53
22  美元/印度盧比   74.739   -0.011  -0.01%  19:02
23    美元/加幣   1.2578  -0.0007  -0.06%  19:02
24   美元/巴西幣   5.4932   0.0025   0.05%  05:00
25   美元/墨披索  20.5165  -0.0272  -0.13%  19:02
26   美元/阿根廷  98.9100   0.0050   0.01%  09:57
27    美元/智利   812.93     0.00   0.00%  18:55
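The cleanup in pythonPandas-42.py (drop two header rows, relabel the columns, renumber the rows) can also reuse the table's own header row and convert the rate text to numbers for later calculation. The sketch below rebuilds a tiny table shaped like element 7 from the first rows of the output above, so it runs without network access; it is an illustration, not the original program, and it assumes the rates arrive as text.

```python
import pandas as pd

# Tiny stand-in for element 7: a title row, a header row, then data rows
raw = pd.DataFrame([
    ['全球匯率 (Currency Exchange)'] * 5,
    ['貨幣', '匯率', '漲跌', '比例', '台北'],
    ['歐元/美元', '1.1561', '0.0006', '0.06%', '19:02'],
    ['英鎊/美元', '1.3599', '0.0021', '0.15%', '19:02'],
])

# Keep the data rows, renumber them from 0, and use row 1 as the column headings
currency = raw.iloc[2:].reset_index(drop=True)
currency.columns = raw.iloc[1].tolist()
# Convert the rate column from text to floating-point numbers
currency['匯率'] = pd.to_numeric(currency['匯率'])
print(currency)
```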

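Example pythonBS4-01.py : Parse the https://icook.tw/ web page; this example mainly prints the data type of the parsed object. The 'lxml' parser requires the lxml package to be installed.

# pythonBS4-01.py
import requests, bs4

htmlFile = requests.get('https://icook.tw/')    # download the iCook (愛料理) page
objSoup = bs4.BeautifulSoup(htmlFile.text, 'lxml')  # parse the HTML with lxml
print("列印BeautifulSoup物件資料型態 : ", type(objSoup))    # print the object's type

Output

列印BeautifulSoup物件資料型態 :  <class 'bs4.BeautifulSoup'>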

Example pythonBS4-02.py : Parse the htmlExampleBS4-02.html file and print the object's type.

# pythonBS4-02.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
print("列印BeautifulSoup物件資料型態 : ", type(objSoup))

Output

列印BeautifulSoup物件資料型態 : <class 'bs4.BeautifulSoup'>

Example pythonBS4-03.py : Parse the htmlExampleBS4-02.html file and print the <title> tag and its content.

# pythonBS4-03.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
print("列印title = ", objSoup.title)
print("title內容 = ", objSoup.title.text)

Output

列印title = <title>HTML 文件基礎架構</title>
title內容 = HTML 文件基礎架構

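Example pythonBS4-04.py : Return the first <h1> tag with find(), then print its type, the tag itself, and its text and string attributes.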

# pythonBS4-04.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
objTag = objSoup.find('h1')
print("資料型態       = ", type(objTag))
print("列印Tag        = ", objTag)
print("Text屬性內容   = ", objTag.text)
print("String屬性內容 = ", objTag.string)

Output

資料型態       =  <class 'bs4.element.Tag'>
列印Tag        =  <h1 class="text-center">這是表頭區塊</h1>
Text屬性內容   =  這是表頭區塊
String屬性內容 =  這是表頭區塊
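pythonBS4-04.py prints the same text for .text and .string because the <h1> holds exactly one piece of text. The two differ once a tag has nested children: .string returns None, while .text joins all of the nested text. The HTML string below is a made-up sample, parsed with Python's built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

html = '<h1>這是表頭區塊</h1><p>人物<b>介紹</b></p>'
soup = BeautifulSoup(html, 'html.parser')   # built-in parser, no lxml needed

h1 = soup.find('h1')
p = soup.find('p')
print(h1.string)   # the <h1> has exactly one text child
print(p.text)      # .text joins all nested text
print(p.string)    # None: <p> has two children, a string and <b>
```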

Example pythonBS4-05.py : Return the contents of all <h1> tags with find_all().

# pythonBS4-05.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
objTag = objSoup.find_all('h1')
print("資料型態    = ", type(objTag))     # print the data type
print("列印Tag串列 = ", objTag)           # print the list
print("以下是列印串列元素 : ")
for data in objTag:                       # print each element's text
    print(data.text)

Output

資料型態    =  <class 'bs4.element.ResultSet'>
列印Tag串列 =  [<h1 class="text-center">這是表頭區塊</h1>, <h1 class="text-center">這是主要內容區</h1>]
以下是列印串列元素 :
這是表頭區塊
這是主要內容區

Example pythonBS4-05-1.py : Return the contents of <h1> tags, limiting the search to at most 1 tag with limit=1.

# pythonBS4-05-1.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
objTag = objSoup.find_all('h1', limit=1)
print("資料型態    = ", type(objTag))     # print the data type
print("列印Tag串列 = ", objTag)           # print the list
print("以下是列印串列元素 : ")
for data in objTag:                       # print each element's text
    print(data.text)

Output

資料型態    =  <class 'bs4.element.ResultSet'>
列印Tag串列 =  [<h1 class="text-center">這是表頭區塊</h1>]
以下是列印串列元素 :
這是表頭區塊

Example pythonBS4-06.py : Extend pythonBS4-05.py to print the elements with the getText() method as well as the text attribute.

# pythonBS4-06.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
objTag = objSoup.find_all('h1')
print("資料型態    = ", type(objTag))     # print the data type
print("列印Tag串列 = ", objTag)           # print the list
print("\n使用Text屬性列印串列元素 : ")
for data in objTag:                       # print each element's text
    print(data.text)
print("\n使用getText()方法列印串列元素 : ")
for data in objTag:
    print(data.getText())

Output

資料型態    =  <class 'bs4.element.ResultSet'>
列印Tag串列 =  [<h1 class="text-center">這是表頭區塊</h1>, <h1 class="text-center">這是主要內容區</h1>]

使用Text屬性列印串列元素 :
這是表頭區塊
這是主要內容區

使用getText()方法列印串列元素 :
這是表頭區塊
這是主要內容區

Example pythonBS4-06-1.py : Find the first tag with id='header'.

# pythonBS4-06-1.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
objTag = objSoup.find(id='header')
print(objTag)
print(objTag.text)

Output

<h1 class="text-center" id="header">這是表頭區塊</h1>
這是表頭區塊

Example pythonBS4-06-2.py : Find all tags whose class includes card-text with class_='card-text'. The trailing underscore is needed because class is a Python keyword.

# pythonBS4-06-2.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
objTag = objSoup.find_all(class_='card-text')
for tag in objTag:
    print(tag)
    print(tag.text)

Output

<p class="card-text text-center">人物介紹-1</p>
人物介紹-1
<p class="card-text text-center">人物介紹-2</p>
人物介紹-2
<p class="card-text text-center">人物介紹-2</p>
人物介紹-2

Example pythonBS4-06-3.py : Use the attrs parameter to search for tags with custom data-* attributes.

# pythonBS4-06-3.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
tag = objSoup.find(attrs={'data-author':'author1'})
print(tag)
print(tag.text)

Output

<p class="card-text text-center" data-author="author1">人物介紹-1</p>
人物介紹-1
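A data-* attribute cannot be written as a keyword argument such as data-author='author1', because Python identifiers cannot contain hyphens; that is why pythonBS4-06-3.py uses attrs. select() offers the same search through a CSS attribute selector. The sample HTML below is hypothetical.

```python
from bs4 import BeautifulSoup

html = '''
<p class="card-text text-center" data-author="author1">人物介紹-1</p>
<p class="card-text text-center" data-author="author2">人物介紹-2</p>
'''
soup = BeautifulSoup(html, 'html.parser')

# find() with attrs= and select_one() with a CSS attribute selector find the same tag
a = soup.find(attrs={'data-author': 'author2'})
b = soup.select_one('p[data-author="author2"]')
print(a == b, a.text)
# Without a value, the selector matches every tag that has the attribute
print(len(soup.select('[data-author]')))
```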

Example pythonBS4-06-4.py : Search for a tag by CSS class, first with class_ and then by passing the class name directly as the second argument.

# pythonBS4-06-4.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
tag = objSoup.find('p', class_='card-text')
print(tag)
print(tag.getText())
print('-'*70)
tag = objSoup.find('p', 'card-text')
print(tag)
print(tag.text)

Output

<p class="card-text text-center" data-author="author1">人物介紹-1</p>
人物介紹-1
----------------------------------------------------------------------
<p class="card-text text-center" data-author="author1">人物介紹-1</p>
人物介紹-1

Example pythonBS4-06-5.py : Use a regular expression to find a tag whose class partly matches a pattern (here, any class containing "text").

# pythonBS4-06-5.py
import bs4
import re

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
tag = objSoup.find('h1', class_=re.compile('text'))
print(tag)
print(tag.text)

Output

<h1 class="text-center" id="header">這是表頭區塊</h1>
這是表頭區塊

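Example pythonBS4-07.py : Use select() with the CSS selector '#header' to find the tag whose id is header. select() returns a list-like ResultSet, even when only one tag matches.

# pythonBS4-07.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
objTag = objSoup.select('#header')
print("資料型態     = ", type(objTag))          # print the data type
print("串列長度     = ", len(objTag))           # print the list length
print("元素資料型態 = ", type(objTag[0]))       # print the element's data type
print("元素內容     = ", objTag[0].getText())   # print the element's text

Output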

資料型態     =  <class 'bs4.element.ResultSet'>
串列長度     =  1
元素資料型態 =  <class 'bs4.element.Tag'>
元素內容     =  這是表頭區塊

Example pythonBS4-08.py : Pass an element of the list returned by select() to str() and print it.

# pythonBS4-08.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
objTag = objSoup.select('#header')
print("列出串列元素的資料型態    = ", type(objTag[0]))
print(objTag[0])
print("列出str()轉換過的資料型態 = ", type(str(objTag[0])))
print(str(objTag[0]))

Output

列出串列元素的資料型態    =  <class 'bs4.element.Tag'>
<h1 class="text-center" id="header">這是表頭區塊</h1>
列出str()轉換過的資料型態 =  <class 'str'>
<h1 class="text-center" id="header">這是表頭區塊</h1>

Example pythonBS4-09.py : Use the attrs attribute of an element from the parsed list to print all of its attributes as a dictionary.

# pythonBS4-09.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
objTag = objSoup.select('#header')
print(type(objTag[0].attrs))
print(str(objTag[0].attrs))

Output

<class 'dict'>
{'class': ['text-center'], 'id': 'header'}

Example pythonBS4-10.py : Find all <img> tags with select() and print them.

# pythonBS4-10.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
imgTag = objSoup.select('img')
print("含<img>標籤的串列長度 = ", len(imgTag))
for img in imgTag:
    print(img)

Output

含<img>標籤的串列長度 =  3
<img alt="card-1" class="card-img-top" src="media/components/user-1.jpg"/>
<img alt="card-2" class="card-img-top" src="media/components/user-2.jpg"/>
<img alt="card-2" class="card-img-top" src="media/components/user-3.jpg"/>

Example pythonBS4-11.py : Extend pythonBS4-10.py to get the path of every image file, reading the src attribute with both get('src') and ['src'].

# pythonBS4-11.py
import bs4

htmlFile = open('htmlExampleBS4-02.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
imgTag = objSoup.select('img')
print("含<img>標籤的串列長度 = ", len(imgTag))
for img in imgTag:
    print("列印標籤串列 = ", img)
    print("列印圖檔     = ", img.get('src'))
    print("列印圖檔     = ", img['src'])

Output

含<img>標籤的串列長度 =  3
列印標籤串列 =  <img alt="card-1" class="card-img-top" src="media/components/user-1.jpg"/>
列印圖檔     =  media/components/user-1.jpg
列印圖檔     =  media/components/user-1.jpg
列印標籤串列 =  <img alt="card-2" class="card-img-top" src="media/components/user-2.jpg"/>
列印圖檔     =  media/components/user-2.jpg
列印圖檔     =  media/components/user-2.jpg
列印標籤串列 =  <img alt="card-2" class="card-img-top" src="media/components/user-3.jpg"/>
列印圖檔     =  media/components/user-3.jpg
列印圖檔     =  media/components/user-3.jpg
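To collect the paths into a list instead of printing them, one option is a CSS attribute selector that only matches <img> tags that actually have a src. A minimal sketch, assuming an inline HTML string (one tag deliberately lacks src) and the built-in 'html.parser':

```python
import bs4

html = '''
<img src="media/components/user-1.jpg" alt="card-1">
<img alt="no-src">
<img src="media/components/user-2.jpg" alt="card-2">
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

# 'img[src]' skips <img> tags without a src attribute, so img['src'] cannot fail here
paths = [img['src'] for img in soup.select('img[src]')]
print(paths)
```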

Example htmlExampleBS4-03.html: A simple HTML document containing ordered lists.

<!doctype html>
<html lang="zh-Hant-TW">
			<head>
				 <meta charset="utf-8">
				 <title>htmlExampleBS4-03.html</title>
			</head>
			<body>
				 <h1>台灣旅遊景點排名</h1>
				 <ol type="a">
						<li>故宮博物院</li>
						<li>日月潭</li>
						<li>阿里山</li>
				 </ol>
				 <h2>台灣夜市排名</h2>
				 <ol type="A">
						<li>士林夜市</li>
						<li>永康夜市</li>
						<li>逢甲夜市</li>
				 </ol>
				 <h2>台灣人口排名</h2>
				 <ol type="i">
						<li>新北市</li>
						<li>台北市</li>
						<li>桃園市</li>
				 </ol>
				 <h2>台灣最健康大學排名</h2>
				 <ol type="I">
						<li>明志科大</li>
						<li>台灣體院</li>
						<li>台北體院</li>
				 </ol>
			</body>
	 </html>

Example pythonBS4-11-1.py: Scrape the "healthiest universities in Taiwan" ranking; the program prints the heading and the list of schools.

# pythonBS4-11-1.py
import bs4

htmlFile = open('htmlExampleBS4-03.html', encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')
titleobj = objSoup.find_all('h2')               # all <h2> headings
print(titleobj[2].text)                         # the third <h2> heading

itemobj = objSoup.find('ol', type='I')          # the <ol> whose type attribute is 'I'
items = itemobj.find_all('li')
for item in items:
    print(item.text)

Output

台灣最健康大學排名
明志科大
台灣體院
台北體院
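Instead of relying on the heading's index or the list's type attribute, the list can also be located from the text of its heading. A sketch, assuming an inline excerpt of htmlExampleBS4-03.html and the built-in 'html.parser':

```python
import bs4

html = '''
<h2>台灣人口排名</h2>
<ol type="i"><li>新北市</li><li>台北市</li><li>桃園市</li></ol>
<h2>台灣最健康大學排名</h2>
<ol type="I"><li>明志科大</li><li>台灣體院</li><li>台北體院</li></ol>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

heading = soup.find('h2', string='台灣最健康大學排名')   # match the heading by its text
ranking = heading.find_next_sibling('ol')                 # the <ol> right after that heading
schools = [li.text for li in ranking.find_all('li')]
print(heading.text, schools)
```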

Example htmlExampleBS4-04.html: A simple HTML document containing a description list (<dl>).

<!doctype html>
<html lang="zh-Hant-TW">
	 <head>
			<meta charset="utf-8">
			<title>htmlExampleBS4-04.html</title>
	 </head>
	 <body>
			<h1>國家首都資料表</h1>
			<dl>
				 <dt>Washington</dt>
				 <dd>美國首都</dd>
				 <dt>Tokyo</dt>
				 <dd>日本首都</dd>
				 <dt>Paris</dt>
				 <dd>法國首都</dd>
			</dl>
	 </body>
</html>

Example pythonBS4-11-2.py: Scrape the country and capital data; the program prints the countries and capitals and combines them into a dictionary.

# pythonBS4-11-2.py
import bs4

url = 'htmlExampleBS4-04.html'
htmlFile = open(url, encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')

mycity = []
cityobj = objSoup.find('dl')
cities = cityobj.find_all('dt')
for city in cities:
    mycity.append(city.text)                # mycity list (<dt> items)

mycountry = []
countryobj = objSoup.find('dl')
countries = countryobj.find_all('dd')
for country in countries:
    mycountry.append(country.text)          # mycountry list (<dd> items)

print("國家 = ", mycountry)
print("首都 = ", mycity)
data = dict(zip(mycountry, mycity))         # pair the two lists into a dictionary
print(data)                                 # print the dictionary

Output

國家 =  ['美國首都', '日本首都', '法國首都']
首都 =  ['Washington', 'Tokyo', 'Paris']
{'美國首都': 'Washington', '日本首都': 'Tokyo', '法國首都': 'Paris'}
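dict(zip(...)) assumes the <dt> and <dd> lists have the same length and order; zip() silently stops at the shorter list. A sketch of an alternative, assuming the same markup inline and the built-in 'html.parser', that pairs each <dt> directly with the <dd> that follows it:

```python
import bs4

html = '''
<dl>
  <dt>Washington</dt><dd>美國首都</dd>
  <dt>Tokyo</dt><dd>日本首都</dd>
  <dt>Paris</dt><dd>法國首都</dd>
</dl>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

# each <dt> is paired with the first <dd> sibling that comes after it
data = {dt.find_next_sibling('dd').text: dt.text for dt in soup.find_all('dt')}
print(data)
```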

Example htmlExampleBS4-05.html: A simple HTML table document.

<!doctype html>
<html lang="zh-Hant-TW">
			<head>
				 <meta charset="utf-8">
				 <title>htmlExampleBS4-05.html</title>
				 <style>
						table, th, td {
							 border: 1px solid black;
						}

				 </style>
			</head>
			<body>
				 <table>
						<thead>
							 <!-- table header -->
							 <tr>
									<th colspan="3">聯合國水資源中心</th>
							 </tr>
							 <tr>
									<th>河流名稱</th>
									<th>國家</th>
									<th>洲名</th>
							 </tr>
						</thead>
						<tbody>
							 <!-- table body -->
							 <tr>
									<td>長江</td>
									<td>中國</td>
									<td>亞洲</td>
							 </tr>
							 <tr>
									<td>尼羅河</td>
									<td>埃及</td>
									<td>非洲</td>
							 </tr>
							 <tr>
									<td>亞馬遜河</td>
									<td>巴西</td>
									<td>南美洲</td>
							 </tr>
						</tbody>
						<tfoot>
							 <!-- table footer -->
							 <tr>
									<td colspan="3">製表2017年5月30日</td>
							 </tr>
						</tfoot>
				 </table>
			</body>
	 </html>

Example pythonBS4-11-3.py: Scrape the river data; the program prints the countries and rivers and combines them into a dictionary.

# pythonBS4-11-3.py
import bs4

url = 'htmlExampleBS4-05.html'
htmlFile = open(url, encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')

myRiver = []                        # rivers
tableObj = objSoup.find('table').find('tbody')
rows = tableObj.find_all('tr')      # one <tr> per river
for row in rows:
    river = row.find('td')          # first cell: river name
    myRiver.append(river.text)

myCountry = []                      # countries
for row in rows:
    cells = row.find_all('td')
    country = cells[1]              # second cell: country
    myCountry.append(country.text)

print("國家 = ", myCountry)
print("河川 = ", myRiver)
data = dict(zip(myCountry, myRiver))
print(data)                         # print the dictionary

Output

國家 =  ['中國', '埃及', '巴西']
河川 =  ['長江', '尼羅河', '亞馬遜河']
{'中國': '長江', '埃及': '尼羅河', '巴西': '亞馬遜河'}

Example pythonBS4-11-4.py: Rewrite example pythonBS4-11-3.py using find_next_sibling().

# pythonBS4-11-4.py
import bs4

url = 'htmlExampleBS4-05.html'
htmlFile = open(url, encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')

myRiver = []                                    # rivers
myCountry = []                                  # countries
tableObj = objSoup.find('table').find('tbody')
rows = tableObj.find_all('tr')
for row in rows:
    river = row.find('td')
    myRiver.append(river.text)
    country = river.find_next_sibling('td')     # the next <td> at the same level
    myCountry.append(country.text)

print("國家 = ", myCountry)
print("河川 = ", myRiver)
data = dict(zip(myCountry, myRiver))
print(data)                                     # print the dictionary

Output

國家 =  ['中國', '埃及', '巴西']
河川 =  ['長江', '尼羅河', '亞馬遜河']
{'中國': '長江', '埃及': '尼羅河', '巴西': '亞馬遜河'}

Example pythonBS4-11-5.py: Use find_next_sibling() and find_previous_sibling() to build a dictionary that maps continents to rivers.

# pythonBS4-11-5.py
import bs4

url = 'htmlExampleBS4-05.html'
htmlFile = open(url, encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')

myRiver = []                                    # rivers
myState = []                                    # continents
tableObj = objSoup.find('table').find('tbody')
rows = tableObj.find_all('tr')

for row in rows:
    cells = row.find_all('td')
    country = cells[1]                          # country cell (middle column)
    river = country.find_previous_sibling('td') # previous <td>: river
    myRiver.append(river.text)
    state = country.find_next_sibling('td')     # next <td>: continent
    myState.append(state.text)

print("洲名 = ", myState)
print("河川 = ", myRiver)
data = dict(zip(myState, myRiver))
print(data)

Output

洲名 =  ['亞洲', '非洲', '南美洲']
河川 =  ['長江', '尼羅河', '亞馬遜河']
{'亞洲': '長江', '非洲': '尼羅河', '南美洲': '亞馬遜河'}
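The examples above hard-code each column's position. A sketch of a more general approach, assuming (as in htmlExampleBS4-05.html) that the column headings are in the last <tr> of <thead>: read the <th> texts once and turn every body row into a dictionary keyed by those headings. The inline markup and 'html.parser' are assumptions for illustration.

```python
import bs4

html = '''
<table>
  <thead>
    <tr><th colspan="3">聯合國水資源中心</th></tr>
    <tr><th>河流名稱</th><th>國家</th><th>洲名</th></tr>
  </thead>
  <tbody>
    <tr><td>長江</td><td>中國</td><td>亞洲</td></tr>
    <tr><td>尼羅河</td><td>埃及</td><td>非洲</td></tr>
  </tbody>
</table>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')
table = soup.find('table')

# column names come from the last header row (the first one is a title spanning all columns)
headers = [th.text for th in table.thead.find_all('tr')[-1].find_all('th')]

# one dict per body row, e.g. {'河流名稱': '長江', '國家': '中國', '洲名': '亞洲'}
records = [dict(zip(headers, (td.text for td in tr.find_all('td'))))
           for tr in table.tbody.find_all('tr')]
print(records)
```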

Example pythonBS4-11-6.py: Use find_next_siblings() and find_previous_siblings(). First collect every <h2> sibling that follows the first <h2>, then collect the two <h2> nodes that precede the third <h2>. find_previous_siblings() returns the nearest node first, so its result is in reverse document order.

# pythonBS4-11-6.py
import bs4

url = 'htmlExampleBS4-03.html'
htmlFile = open(url, encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')

titleObj = objSoup.find('h2')                       # first <h2> heading
title = titleObj.find_next_siblings('h2')           # all following <h2> siblings
print('find_next_siblings     = ', title)

titleObj = objSoup.find_all('h2')
title = titleObj[2].find_previous_siblings('h2')    # all preceding <h2> siblings, nearest first
print('find_previous_siblings = ', title)

Output

find_next_siblings     =  [<h2>台灣人口排名</h2>, <h2>台灣最健康大學排名</h2>]
find_previous_siblings =  [<h2>台灣人口排名</h2>, <h2>台灣夜市排名</h2>]

Example pythonBS4-11-7.py: First print 尼羅河 (the Nile), then retrieve and print its parent node.

# pythonBS4-11-7.py
import bs4

url = 'htmlExampleBS4-05.html'
htmlFile = open(url, encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')

tableObj = objSoup.find('table').find('tbody')
rows = tableObj.find_all('tr')
river = rows[1].find('td')          # <td>尼羅河</td> in the second row
print(river.text)

river_parent = river.parent         # the enclosing <tr> (an attribute, not a method)
print(river_parent)

Output

尼羅河
<tr>
<td>尼羅河</td>
<td>埃及</td>
<td>非洲</td>
</tr>
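Writing river.parent() with parentheses does not return the parent node: calling a Tag object is shorthand for its find_all() method, so it returns a list of every tag inside the <tr>. A minimal sketch on an inline row (the markup and 'html.parser' are assumptions) that shows the difference:

```python
import bs4

soup = bs4.BeautifulSoup('<table><tr><td>尼羅河</td><td>埃及</td><td>非洲</td></tr></table>',
                         'html.parser')
river = soup.find('td')

print(river.parent.name)                           # the parent tag itself: tr
print(river.parent() == river.parent.find_all())   # True: calling a Tag runs find_all()
print([td.text for td in river.parent()])          # the three <td> children
```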

Example pythonBS4-11-8.py: First print 尼羅河 (the Nile), then go up to its parent <tr> and print the row before it, followed by the row after it.

# pythonBS4-11-8.py
import bs4

url = 'htmlExampleBS4-05.html'
htmlFile = open(url, encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')

tableObj = objSoup.find('table').find('tbody')
rows = tableObj.find_all('tr')
river = rows[1].find('td')                               # <td>尼羅河</td>
print(river.text)

previous_row = river.parent.find_previous_sibling()      # the <tr> before the parent row
print(previous_row)
next_row = river.parent.find_next_sibling()              # the <tr> after the parent row
print(next_row)

Output

尼羅河
<tr>
<td>長江</td>
<td>中國</td>
<td>亞洲</td>
</tr>
<tr>
<td>亞馬遜河</td>
<td>巴西</td>
<td>南美洲</td>
</tr>

Example pythonBS4-11-9.py: Go to 長江 (the Yangtze), move up to its parent row, and print all the rows after it. Then go to 亞馬遜河 (the Amazon) and print all the rows before it.

# pythonBS4-11-9.py
import bs4

url = 'htmlExampleBS4-05.html'
htmlFile = open(url, encoding='utf-8')
objSoup = bs4.BeautifulSoup(htmlFile, 'lxml')

tableObj = objSoup.find('table').find('tbody')
rows = tableObj.find_all('tr')
river = rows[0].find('td')                               # 長江
print(river.text)
next_rows = river.parent.find_next_siblings()            # every row after the 長江 row
print(next_rows)
print('-'*70)
river = rows[2].find('td')                               # 亞馬遜河
print(river.text)
previous_rows = river.parent.find_previous_siblings()    # every row before it, nearest first
print(previous_rows)

Output

長江
[<tr>
<td>尼羅河</td>
<td>埃及</td>
<td>非洲</td>
</tr>, <tr>
<td>亞馬遜河</td>
<td>巴西</td>
<td>南美洲</td>
</tr>]
----------------------------------------------------------------------
亞馬遜河
[<tr>
<td>尼羅河</td>
<td>埃及</td>
<td>非洲</td>
</tr>, <tr>
<td>長江</td>
<td>中國</td>
<td>亞洲</td>
</tr>]

Example pythonBS4-12.py: Download every image on the Grandtech (上奇資訊) website and save them in the directory "outputPythonBS4".

# pythonBS4-12.py
import os
from urllib.parse import urljoin, urlparse

import bs4
import requests

url = 'https://www.grandtech.com/'                  # Grandtech home page
print("網頁下載中 ...")
html = requests.get(url)
html.raise_for_status()                             # verify the page downloaded successfully
print("網頁下載完成")

destDir = 'outputPythonBS4'                         # folder that will hold the images
os.makedirs(destDir, exist_ok=True)                 # create it if it does not exist yet

objSoup = bs4.BeautifulSoup(html.text, 'lxml')      # build the BeautifulSoup object

imgTag = objSoup.select('img')                      # find every <img> tag
print("搜尋到的圖片數量 = ", len(imgTag))           # number of images found
for img in imgTag:                                  # download and save each image
    imgUrl = img.get('src')                         # image path as written in the page
    if not imgUrl or imgUrl.startswith('data:'):    # skip missing src and inline data: images
        continue
    finUrl = urljoin(url, imgUrl)                   # absolute URL of the image
    print("%s 圖片下載中 ... " % finUrl)
    picture = requests.get(finUrl)                  # download the image
    picture.raise_for_status()                      # verify the image downloaded successfully
    print("%s 圖片下載成功" % finUrl)

    # open the file first, then write the image in 10 KB chunks
    fileName = os.path.basename(urlparse(finUrl).path)
    with open(os.path.join(destDir, fileName), 'wb') as pictFile:
        for diskStorage in picture.iter_content(10240):
            pictFile.write(diskStorage)
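Image src values come in several forms, and simply appending src to the page URL (url + imgUrl) only works for plain relative paths. urllib.parse.urljoin() resolves each form the way a browser would. A short sketch with hypothetical image paths:

```python
from urllib.parse import urljoin

page = 'https://www.grandtech.com/'
for src in ['images/logo.png',                   # relative path
            '/images/logo.png',                  # path from the site root
            'https://cdn.example.com/a.png',     # already absolute
            '//cdn.example.com/b.png']:          # protocol-relative
    print('%-32s naive: %-45s urljoin: %s' % (src, page + src, urljoin(page, src)))
```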

Example pythonBS4-13.py: Download every image on a website into the directory "outputPythonBS4", sending a User-Agent header so that the request looks like it comes from an ordinary browser.

# pythonBS4-13.py
import os
from urllib.parse import urljoin, urlparse

import bs4
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 '
                         'Safari/537.36'}           # browser-like User-Agent string
url = 'http://aaa.24ht.com.tw/'                     # this server blocks requests that do not look like a browser
print("網頁下載中 ...")
html = requests.get(url, headers=headers)
html.raise_for_status()                             # verify the page downloaded successfully
print("網頁下載完成")

destDir = 'outputPythonBS4'                         # folder that will hold the images
os.makedirs(destDir, exist_ok=True)                 # create it if it does not exist yet

objSoup = bs4.BeautifulSoup(html.text, 'lxml')      # build the BeautifulSoup object

imgTag = objSoup.select('img')                      # find every <img> tag
print("搜尋到的圖片數量 = ", len(imgTag))           # number of images found
for img in imgTag:                                  # download and save each image
    imgUrl = img.get('src')                         # image path as written in the page
    if not imgUrl or imgUrl.startswith('data:'):    # skip missing src and inline data: images
        continue
    finUrl = urljoin(url, imgUrl)                   # absolute URL of the image
    print("%s 圖片下載中 ... " % finUrl)
    picture = requests.get(finUrl, headers=headers) # send the same headers for each image
    picture.raise_for_status()                      # verify the image downloaded successfully
    print("%s 圖片下載成功" % finUrl)

    # open the file first, then write the image in 10 KB chunks
    fileName = os.path.basename(urlparse(finUrl).path)
    with open(os.path.join(destDir, fileName), 'wb') as pictFile:
        for diskStorage in picture.iter_content(10240):
            pictFile.write(diskStorage)
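The same browser-like header can be attached with the standard library alone. A minimal sketch using urllib.request.Request: it only builds the request (nothing is sent), so the header can be inspected offline. urllib stores header names capitalized as 'User-agent'. Writing the value as adjacent string literals, rather than with a backslash inside one literal, also keeps the indentation of the continuation lines out of the header value.

```python
from urllib.request import Request

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 '
                         'Safari/537.36'}
req = Request('http://aaa.24ht.com.tw/', headers=headers)   # built, not sent

print(req.get_header('User-agent'))
# urllib.request.urlopen(req) would send the request with this User-Agent
```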

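Example pythonBS4-14.py: Find the latest Super Lotto (威力彩) draw results. Line 12 first selects the elements whose class is "contents_box02", because we found that they contain the latest draw results. Line 13 shows that the list holds 4 such elements, and lines 14-15 print all 4. Inspecting them shows that the Super Lotto results are in the 1st element, so line 18 is:

balls = dataTag[0].find_all('div', {'class':'ball_tx ball_green'})

dataTag[0] selects the 1st element of the list.
find_all: finds every div tag whose class attribute is "ball_tx ball_green".

Lines 20-21 print the first 6 balls, which are in draw order; lines 24-25 print the balls from the 7th onward, which are in ascending order. Line 28 finds the red ball of the second zone; since there is only one, either find_all() or find() works.

# pythonBS4-14.py
import bs4, requests

url = 'http://www.taiwanlottery.com.tw'
print("網頁下載中 ...")
html = requests.get(url)
html.raise_for_status()                             # verify the page downloaded successfully
print("網頁下載完成")

objSoup = bs4.BeautifulSoup(html.text, 'lxml')      # build the BeautifulSoup object

dataTag = objSoup.select('.contents_box02')         # elements whose class is contents_box02
print("串列長度", len(dataTag))
for i in range(len(dataTag)):                       # print every contents_box02 element
    print(dataTag[i])

# balls in draw order and in ascending order
balls = dataTag[0].find_all('div', {'class':'ball_tx ball_green'})
print("開出順序 : ", end='')
for i in range(6):                                  # the first 6 balls are in draw order
    print(balls[i].text, end='   ')

print("\n大小順序 : ", end='')
for i in range(6,len(balls)):                       # from the 7th ball on: ascending order
    print(balls[i].text, end='   ')

# the red ball of the second zone
redball = dataTag[0].find_all('div', {'class':'ball_red'})
print("\n第二區   :", redball[0].text)

Output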

網頁下載中 ...
網頁下載完成
串列長度 4
<div class="contents_box02">
<div id="contents_logo_02"></div><div class="contents_mine_tx02"><span class="font_black15">110/10/11 第110000081
期 </span><span class="font_red14"><a href="Result_all.aspx#01">開獎結果</a></span></div><div class="contents_mine_tx04">開出順序<br/>大小順序<br/>第二區</div><div class="ball_tx ball_green">24 </div><div class="ball_tx ball_green">33 </div><div class="ball_tx ball_green">15 </div><div class="ball_tx ball_green">12 </div><div class="ball_tx ball_green">38 </div><div class="ball_tx ball_green">05 </div><div class="ball_tx ball_green">05 </div><div class="ball_tx ball_green">12 </div><div class="ball_tx ball_green">15 </div><div class="ball_tx ball_green">24 </div><div class="ball_tx ball_green">33 </div><div class="ball_tx ball_green">38 </div><div class="ball_red">02 </div>
</div>
<div class="contents_box02">
.....(middle portion omitted).....
</div>
開出順序 : 24    33    15    12    38    05
大小順序 : 05    12    15    24    33    38
第二區   : 02
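Passing {'class': 'ball_tx ball_green'} to find_all() matches the class attribute as one exact string, so it only works while the page lists the two classes in exactly that order. A CSS selector such as div.ball_tx.ball_green matches elements that carry both classes in any order. A sketch on hypothetical markup, parsed with the built-in 'html.parser':

```python
import bs4

html = '''
<div class="ball_tx ball_green">24 </div>
<div class="ball_green ball_tx">33 </div>
<div class="ball_red">02 </div>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

exact = soup.find_all('div', {'class': 'ball_tx ball_green'})   # exact string match only
css = soup.select('div.ball_tx.ball_green')                     # both classes, any order
print(len(exact), len(css))
print([b.text.strip() for b in css])
```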

Example pythonBS4-15.py: List the titles and links of Yahoo's headline news.

# pythonBS4-15.py
import requests, bs4

htmlFile = requests.get('https://tw.yahoo.com/')
objSoup = bs4.BeautifulSoup(htmlFile.text, 'lxml')
headline_news = objSoup.find_all('a', class_='story-title')   # <a> tags whose class includes story-title
for h in headline_news:
    print("焦點新聞 : " + h.text)
    print("新聞網址 : " + h.get('href', ''))                  # '' instead of None if href is missing

Example pythonBS4-16.py: Show your own IP address. The strip() method on line 13 removes the whitespace and newline characters at both ends of the string.

# pythonBS4-16.py
import requests
import bs4

# request the page with your own IP address
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 '
                         'Safari/537.36'}
url = 'http://ip.filefab.com/index.php'
htmlFile = requests.get(url, headers=headers)
objSoup = bs4.BeautifulSoup(htmlFile.text, 'lxml')
ip = objSoup.find('h1', id='ipd')
print(ip.text.strip())

Output

Your IP address: 180.177.109.201
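If only the address itself is needed, the text can be split at the colon and validated with the standard ipaddress module. A minimal sketch, assuming the page text has the form shown above (a label, a colon, then the address):

```python
import ipaddress

text = '\n  Your IP address: 180.177.109.201  \n'   # sample text in the format shown above
label, _, value = text.strip().partition(':')       # split at the first colon
ip = ipaddress.ip_address(value.strip())            # raises ValueError if it is not a valid IP
print(label, '->', ip, 'IPv%d' % ip.version)
```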

Example pythonHash-01.py: Use md5() to compute the hash of the English string 'National Taiwan University', and print the data types of the md5() object and of the hash value.

# pythonHash-01.py
import hashlib

data = hashlib.md5()                                # create the hash object
data.update(b'National Taiwan University')          # feed it the bytes to hash

print('Hash Value         = ', data.digest())      # raw hash as bytes
print('Hash Value(16進位) = ', data.hexdigest())   # hash as a hexadecimal string
print(type(data))                                   # type of the hash object
print(type(data.hexdigest()))                       # type of the hash value

Output

Hash Value         =  b'1\xc1zR\xe9H\xe3\xa9\xc8Y\xe0\xd9\x0b,\xc6r'
Hash Value(16進位) =  31c17a52e948e3a9c859e0d90b2cc672
<class '_hashlib.HASH'>
<class 'str'>
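Two properties of hash objects are worth knowing: hexdigest() is simply the hexadecimal form of digest(), and repeated update() calls hash the concatenation of everything fed so far. A short sketch using only hashlib:

```python
import hashlib

whole = hashlib.md5(b'National Taiwan University')   # the constructor accepts initial data

parts = hashlib.md5()
parts.update(b'National ')                           # update() calls accumulate
parts.update(b'Taiwan University')

print(whole.hexdigest() == parts.hexdigest())         # True: same bytes, same hash
print(whole.digest().hex() == whole.hexdigest())      # True: hexdigest() == digest().hex()
print(len(whole.digest()), len(whole.hexdigest()))    # 16 bytes, 32 hex characters
```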

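Example pythonHash-02.py: Compute the hash of the Chinese string '國立台灣大學' (National Taiwan University). A str must be encoded to bytes before it can be hashed; here UTF-8 is used.

# pythonHash-02.py
import hashlib

data = hashlib.md5()                                # create the hash object
school = '國立台灣大學'                             # Chinese string
data.update(school.encode('utf-8'))                 # encode to UTF-8 bytes, then hash

print('Hash Value         = ', data.digest())
print('Hash Value(16進位) = ', data.hexdigest())
print(type(data))                                   # type of the hash object
print(type(data.hexdigest()))                       # type of the hash value

Output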

Hash Value         =  b'\x114\xeb6\x9e\x97\xbfE\xf2=h\x1b\\\xdb\xbd\xe5'
Hash Value(16進位) =  1134eb369e97bf45f23d681b5cdbbde5
<class '_hashlib.HASH'>
<class 'str'>
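Because the hash is computed over bytes, the same Chinese text produces a different hash under a different encoding. A sketch comparing UTF-8 with Big5 (Big5 is chosen only as an illustration):

```python
import hashlib

school = '國立台灣大學'
utf8 = hashlib.md5(school.encode('utf-8')).hexdigest()
big5 = hashlib.md5(school.encode('big5')).hexdigest()

print(len(school.encode('utf-8')), len(school.encode('big5')))  # 18 bytes vs 12 bytes
print(utf8 == big5)                                             # False: different bytes
```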

Example pythonHash-03.py: The best-known learning maxims in the Python world are The Zen of Python, written by Tim Peters. We saved that text in dataHash.txt; compute the hash of this file.

# pythonHash-03.py
import hashlib

data = hashlib.md5()                                # create the hash object
filename = "dataHash.txt"

with open(filename, "rb") as fileObj:               # read the file in binary mode
    binaryText = fileObj.read()
    data.update(binaryText)

print('Hash Value         = ', data.digest())
print('Hash Value(16進位) = ', data.hexdigest())
print(type(data))                                   # type of the hash object
print(type(data.hexdigest()))                       # type of the hash value

Output

Hash Value         =  b'h\xf1$*\xdf\xe4\xf4\xcb\x0e*\xac&K\xa5r\xd7'
Hash Value(16進位) =  68f1242adfe4f4cb0e2aac264ba572d7
<class '_hashlib.HASH'>
<class 'str'>
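Reading the whole file with read() is fine for a small text file. For large files, a common pattern is to feed the hash object fixed-size chunks so the file never has to fit in memory at once. A sketch; the helper name file_md5 and the 64 KB chunk size are arbitrary choices. Python 3.11 and later also offer hashlib.file_digest() for the same task.

```python
import hashlib

def file_md5(path, chunk_size=65536):
    '''Return the MD5 hex digest of a file, reading it chunk by chunk.'''
    data = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):  # b'' means end of file
            data.update(chunk)
    return data.hexdigest()
```

Usage would be print(file_md5('dataHash.txt')), which gives the same value as the whole-file version above.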

Example pythonHash-04.py: Redo example pythonHash-01.py with sha1().

# pythonHash-04.py
import hashlib

data = hashlib.sha1()                               # create the hash object
data.update(b'National Taiwan University')          # feed it the bytes to hash

print('Hash Value         = ', data.digest())
print('Hash Value(16進位) = ', data.hexdigest())
print(type(data))                                   # type of the hash object
print(type(data.hexdigest()))                       # type of the hash value

Output

Hash Value         =  b'|\x02H\xd4\\\xee\xea\x195!\xce\xf7\xb8\xd7\xc6\xc5q\xaa\xae8'
Hash Value(16進位) =  7c0248d45ceeea193521cef7b8d7c6c571aaae38
<class '_hashlib.HASH'>
<class 'str'>

Example pythonHash-05.py: List the hash algorithms available on the platform (operating system) you are using.

# pythonHash-05.py
import hashlib

print(hashlib.algorithms_available)         # algorithms available on this platform

Output

{'shake_128', 'shake_256', 'sha3_224', 'md5-sha1', 'sha3_384', 'mdc2', 'ripemd160', 'sha512', 'sha3_256', 'sha384', 'sha512_256', 'sha224', 'blake2s', 'whirlpool', 'sha1', 'md5', 'sha512_224', 'sha256', 'sha3_512', 'blake2b', 'md4', 'sm3'}

Example pythonHash-06.py: List the hash algorithms that are guaranteed to be available on every platform.

# pythonHash-06.py
import hashlib

print(hashlib.algorithms_guaranteed)        # algorithms guaranteed on all platforms

Output

{'sha3_384', 'sha224', 'shake_128', 'sha1', 'sha3_512', 'sha256', 'blake2b', 'md5', 'sha384', 'shake_256', 'sha3_256', 'blake2s', 'sha512', 'sha3_224'}
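algorithms_guaranteed is always a subset of algorithms_available, so a portable program can check the guaranteed set before choosing an algorithm by name with hashlib.new(). A sketch that also prints each algorithm's digest size in bytes:

```python
import hashlib

# every guaranteed algorithm is also available
print(hashlib.algorithms_guaranteed <= hashlib.algorithms_available)

for name in ['md5', 'sha1', 'sha256']:
    if name in hashlib.algorithms_guaranteed:
        h = hashlib.new(name, b'National Taiwan University')   # choose the algorithm by name
        print(name, h.digest_size, h.hexdigest())
```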

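Example pythonHash-07.py: Download the Air Quality Index (AQI) data published by Taiwan's Environmental Protection Administration (環保署) and save it to aqi.json.

# pythonHash-07.py
import json
import sys

import requests

url = ('http://opendata.epa.gov.tw/webapi/Data/REWIQA/?$orderby=SiteName&$'
       'skip=0&$top=1000&format=json')
try:
    aqijsons = requests.get(url)                # download the data into aqijsons
    aqijsons.raise_for_status()                 # treat HTTP errors as failures too
    print('下載成功')
except requests.RequestException as err:
    print('下載失敗', err)
    sys.exit(1)                                 # stop here: there is nothing to save
print(aqijsons.text)                            # print the downloaded JSON
fn = "aqi.json"                                 # JSON file to create
with open(fn, 'w', encoding='utf-8') as f:
    json.dump(aqijsons.json(), f)               # write the JSON data to aqi.json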

Example pythonHash-08.py: Read the downloaded AQI file aqi.json and list only the county/city name, site ID, PM2.5 value and site name.

# pythonHash-08.py
import json

fileName = 'aqi.json'
with open(fileName, encoding='utf-8') as fileObj:
    getDatas = json.load(fileObj)                   # read the JSON file

for getData in getDatas:
    county = getData['County']                      # county/city name
    sitename = getData['SiteName']                  # site name
    siteid = getData['SiteId']                      # site ID
    pm25 = getData['PM2.5']                         # PM2.5 value
    print('城市名稱 = %3s  站台ID = %3s  PM2.5值 = %2s  站台名稱 = %s ' %
          (county, siteid, pm25, sitename))

Output

城市名稱 = 基隆市  站台ID =   1  PM2.5值 =  8  站台名稱 = 基隆
城市名稱 = 新北市  站台ID =   2  PM2.5值 =  8  站台名稱 = 汐止
城市名稱 = 新北市  站台ID =   3  PM2.5值 = 11  站台名稱 = 萬里
城市名稱 = 新北市  站台ID =   4  PM2.5值 =  5  站台名稱 = 新店
城市名稱 = 新北市  站台ID =   5  PM2.5值 =  4  站台名稱 = 土城
......................

Example pythonHash-09.py: Read the downloaded AQI file aqi.json and list only the site ID, PM2.5 value and site name of the stations in Taipei City (臺北市).

# pythonHash-09.py
import json

fn = 'aqi.json'
with open(fn, encoding='utf-8') as fnObj:
    getDatas = json.load(fnObj)                     # read the JSON file

for getData in getDatas:
    if getData['County'] == '臺北市':               # keep only Taipei City stations
        sitename = getData['SiteName']              # site name
        siteid = getData['SiteId']                  # site ID
        pm25 = getData['PM2.5']                     # PM2.5 value
        print('站台ID =%3s  PM2.5值 =%3s  站台名稱 = %s ' %
              (siteid, pm25, sitename))

Output

站台ID = 11  PM2.5值 =  6  站台名稱 = 士林
站台ID = 12  PM2.5值 =  6  站台名稱 = 中山
站台ID = 13  PM2.5值 =  4  站台名稱 = 萬華
站台ID = 14  PM2.5值 =  6  站台名稱 = 古亭
站台ID = 15  PM2.5值 =  9  站台名稱 = 松山
站台ID = 16  PM2.5值 = 21  站台名稱 = 大同
站台ID = 64  PM2.5值 =  3  站台名稱 = 陽明
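The same filtering can be written as a list comprehension, after which it is easy to compute something from the result. A sketch on hypothetical records that only mimic the four keys used above; it assumes that values are strings and that a missing reading is an empty string, which may not match the real feed.

```python
records = [   # hypothetical sample records with the same keys as aqi.json
    {'County': '臺北市', 'SiteName': 'A', 'SiteId': '1', 'PM2.5': '6'},
    {'County': '臺北市', 'SiteName': 'B', 'SiteId': '2', 'PM2.5': '21'},
    {'County': '臺北市', 'SiteName': 'C', 'SiteId': '3', 'PM2.5': ''},
    {'County': '新北市', 'SiteName': 'D', 'SiteId': '4', 'PM2.5': '8'},
]

taipei = [r for r in records if r['County'] == '臺北市']
# keep only stations that reported a value, converted to int
readings = {r['SiteName']: int(r['PM2.5']) for r in taipei if r['PM2.5'].strip()}
worst = max(readings, key=readings.get)            # station with the highest PM2.5
print(len(taipei), readings, 'highest:', worst, readings[worst])
```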

Example pythonHash-10.py: Download the AQI data (the content of aqi.json), compute its hash, and save the hash in outputPythonHash-01.txt.

# pythonHash-10.py
import hashlib
import sys

import requests

url = ('http://opendata.epa.gov.tw/webapi/Data/REWIQA/?$orderby=SiteName&$'
       'skip=0&$top=1000&format=json')
try:
    aqijsons = requests.get(url)                # download the data into aqijsons
    aqijsons.raise_for_status()                 # treat HTTP errors as failures too
    print('下載成功')
except requests.RequestException as err:
    print('下載失敗', err)
    sys.exit(1)                                 # nothing to hash, so stop here

data = hashlib.md5()
data.update(aqijsons.text.encode('utf-8'))      # hash the downloaded text as UTF-8 bytes
hashdata = data.hexdigest()
print('環保署空氣品質指標 aqi.json 的哈希值 = ', hashdata)

fn = "outputPythonHash-01.txt"
with open(fn, 'w') as fileobj:
    fileobj.write(hashdata)                     # save the hash for later comparison

Output

下載成功
環保署空氣品質指標 aqi.json 的哈希值 =  bf3231d7fad0292d818aac7d6d669f00
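Hashing the raw response text means that a change in formatting alone, such as key order or whitespace, also changes the hash even when the data is identical. One option, sketched here as an alternative rather than what the example does, is to hash a canonical re-serialization of the parsed JSON; the helper name canonical_md5 and the sample strings are hypothetical.

```python
import hashlib
import json

def canonical_md5(json_text):
    '''MD5 of the JSON data itself, ignoring key order and whitespace.'''
    obj = json.loads(json_text)
    canonical = json.dumps(obj, sort_keys=True, separators=(',', ':'), ensure_ascii=False)
    return hashlib.md5(canonical.encode('utf-8')).hexdigest()

a = '[{"SiteName": "士林", "PM2.5": "6"}]'
b = '[ {"PM2.5":"6","SiteName":"士林"} ]'
print(hashlib.md5(a.encode('utf-8')).hexdigest() == hashlib.md5(b.encode('utf-8')).hexdigest())  # False
print(canonical_md5(a) == canonical_md5(b))  # True
```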

Example pythonHash-11.py: Check whether the AQI data has been updated by comparing hashes: the hash of the data just downloaded is compared with the hash saved last time. If the data has changed, it is saved as newaqi.json and the new hash is written to hashvalue.txt.

# pythonHash-11.py
import hashlib
import json
import os
import sys

import requests

def save_newaqi(fn, response):
    '''Save the downloaded JSON data to fn (newaqi.json).'''
    with open(fn, 'w', encoding='utf-8') as f:
        json.dump(response.json(), f)

def save_hashvalue(fn_hash, hashvalue):
    '''Save the hash value to fn_hash (hashvalue.txt).'''
    with open(fn_hash, 'w') as fileobj:
        fileobj.write(hashvalue)

def cal_hashvalue(response):
    '''Return the MD5 hash of the downloaded text.'''
    data = hashlib.md5()
    data.update(response.text.encode('utf-8'))
    return data.hexdigest()

url = ('http://opendata.epa.gov.tw/webapi/Data/REWIQA/?$orderby=SiteName&$'
       'skip=0&$top=1000&format=json')
try:
    aqijsons = requests.get(url)                # download the data into aqijsons
    aqijsons.raise_for_status()                 # treat HTTP errors as failures too
    print('下載成功')
except requests.RequestException as err:
    print('下載失敗', err)
    sys.exit(1)

fn = 'newaqi.json'
fn_hash = 'hashvalue.txt'                       # file that stores the previous hash
newhash = cal_hashvalue(aqijsons)               # hash of the data just downloaded
if os.path.exists(fn_hash):                     # hashvalue.txt exists: compare hashes
    print('newhash = ', newhash)
    with open(fn_hash, 'r') as fnObj:           # read the previous hash
        oldhash = fnObj.read().strip()
    print('oldhash = ', oldhash)
    if newhash == oldhash:                      # compare new and old hashes
        print('環保署空氣品質資料未更新')
    else:
        print('環保署空氣品質資料已經更新')
        save_newaqi(fn, aqijsons)               # save newaqi.json
        save_hashvalue(fn_hash, newhash)        # save the new hash to hashvalue.txt
else:                                           # hashvalue.txt does not exist yet
    print('第一次啟動此程式')
    print('哈希值 = ', newhash)
    save_hashvalue(fn_hash, newhash)            # save the hash to hashvalue.txt
    save_newaqi(fn, aqijsons)
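The same check can be packaged as one reusable function. A sketch under these assumptions: the function name has_changed is hypothetical, it takes raw bytes (for example response.content) plus the path of the file holding the previous hash, and it uses SHA-256 instead of MD5.

```python
import hashlib
import os

def has_changed(data, hash_file):
    '''Return True (and store the new hash) if data differs from the last run.'''
    new_hash = hashlib.sha256(data).hexdigest()
    old_hash = None
    if os.path.exists(hash_file):
        with open(hash_file, 'r') as f:
            old_hash = f.read().strip()
    if new_hash == old_hash:
        return False
    with open(hash_file, 'w') as f:        # remember this version for the next run
        f.write(new_hash)
    return True
```

A call such as has_changed(aqijsons.content, 'hashvalue.txt') returns True on the first run and whenever the downloaded bytes differ from the previous run.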

Example pythonHash-12.py: Read the downloaded AQI file aqi.csv, clean the data, and keep only the county/city name, site ID, PM2.5 value and site name.

# pythonHash-12.py
import csv

infn = 'aqi.csv'                                    # source file
outfn = 'outputPythonHash-02.csv'                   # destination file
with open(infn, encoding='utf-8') as csvRFile:      # open the CSV file for reading
    csvReader = csv.reader(csvRFile)                # create a reader object
    listReport = list(csvReader)                    # convert the rows into a list

newListReport = []                                  # rows with only the selected columns
for row in listReport:                              # pick the wanted columns from each row
    tmpList = [row[1], row[23], row[11], row[0]]    # county, site ID, PM2.5, site name
    newListReport.append(tmpList)

with open(outfn, 'w', newline='', encoding='utf-8') as csvOFile:   # open the CSV file for writing
    csvWriter = csv.writer(csvOFile)                # create a writer object
    for row in newListReport:                       # write and print each row
        csvWriter.writerow(row)                     # write to the file
        if row[0] != 'County':                      # skip the header row when printing
            print('城市名稱 =%4s  站台ID =%3s  PM2.5值 =%3s  站台名稱 = %s ' %
                  (row[0], row[1], row[2], row[3]))

Output

城市名稱 = 基隆市  站台ID =  1  PM2.5值 =  8  站台名稱 = 基隆
城市名稱 = 新北市  站台ID =  2  PM2.5值 =  8  站台名稱 = 汐止
城市名稱 = 新北市  站台ID =  3  PM2.5值 = 11  站台名稱 = 萬里
城市名稱 = 新北市  站台ID =  4  PM2.5值 =  5  站台名稱 = 新店
城市名稱 = 新北市  站台ID =  5  PM2.5值 =  4  站台名稱 = 土城
.........(rest omitted)........
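Selecting columns by position (row[1], row[23], ...) breaks if the publisher adds or reorders columns. A sketch of the same cleaning step with csv.DictReader and csv.DictWriter, which select columns by header name. The in-memory sample is hypothetical and assumes the headers are named County, SiteId, PM2.5 and SiteName; the Status column stands in for any column that should be dropped.

```python
import csv
import io

sample = io.StringIO(            # hypothetical stand-in for aqi.csv
    'SiteName,County,Status,PM2.5,SiteId\n'
    '基隆,基隆市,-,8,1\n'
    '汐止,新北市,-,8,2\n'
)
wanted = ['County', 'SiteId', 'PM2.5', 'SiteName']   # output column order

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=wanted, extrasaction='ignore')  # drop other columns
writer.writeheader()
for row in csv.DictReader(sample):                   # each row is a dict keyed by header name
    writer.writerow(row)

print(out.getvalue())
```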

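Example pythonHash-13.py: Read the cleaned file outputPythonHash-02.csv produced by pythonHash-12.py and list only the site ID, PM2.5 value and site name of the stations in Taipei City (臺北市).

# pythonHash-13.py
import csv

infn = 'outputPythonHash-02.csv'                    # source file (output of pythonHash-12.py)
with open(infn, encoding='utf-8') as csvRFile:      # open the CSV file for reading
    csvReader = csv.reader(csvRFile)                # create a reader object
    listReport = list(csvReader)                    # convert the rows into a list

for row in listReport:                              # print only the Taipei City rows
    if row[0] == '臺北市':
        print('站台ID =%3s  PM2.5值 =%3s  站台名稱 = %s ' %
              (row[1], row[2], row[3]))

Output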

站台ID = 11  PM2.5值 =  6  站台名稱 = 士林
站台ID = 12  PM2.5值 =  6  站台名稱 = 中山
站台ID = 13  PM2.5值 =  4  站台名稱 = 萬華
站台ID = 14  PM2.5值 =  6  站台名稱 = 古亭
站台ID = 15  PM2.5值 =  9  站台名稱 = 松山
站台ID = 16  PM2.5值 = 21  站台名稱 = 大同
站台ID = 64  PM2.5值 =  3  站台名稱 = 陽明

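Example pythonSelenium-02.py: Print the type of the webdriver object (Firefox).

# pythonSelenium-02.py
from selenium import webdriver
from selenium.webdriver.firefox.service import Service

driverPath = r'c:\driver\geckodriver.exe'                   # raw string keeps the backslashes
browser = webdriver.Firefox(service=Service(driverPath))    # Selenium 4 style; executable_path= is deprecated
print(type(browser))

Output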

<class 'selenium.webdriver.firefox.webdriver.WebDriver'>

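Example pythonSelenium-03.py: Print the type of the webdriver object (Chrome).

# pythonSelenium-03.py
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driverPath = r'c:\driver\chromedriver.exe'                  # raw string keeps the backslashes
browser = webdriver.Chrome(service=Service(driverPath))     # Selenium 4 style
print(type(browser))

Output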

<class 'selenium.webdriver.chrome.webdriver.WebDriver'>

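Example pythonSelenium-04.py: Have the browser open a web page and print the page title.

# pythonSelenium-04.py
from selenium import webdriver
from selenium.webdriver.firefox.service import Service

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(service=Service(driverPath))
url = 'https://icook.tw/'
browser.get(url)                # load the page in the browser
print(browser.title)            # title of the loaded page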

Example pythonSelenium-04-1.py: Print the HTML source of the web page.

# pythonSelenium-04-1.py
from selenium import webdriver
from selenium.webdriver.firefox.service import Service

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(service=Service(driverPath))
url = 'https://icook.tw/'
browser.get(url)                # load the page in the browser
print(browser.page_source)      # print the page's HTML source

Output

Firefox opens https://icook.tw/ and the page's HTML source is printed.

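Example pythonSelenium-04-2.py: Print the name, current_url, session_id and capabilities attributes.

# pythonSelenium-04-2.py
from selenium import webdriver
from selenium.webdriver.firefox.service import Service

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(service=Service(driverPath))
url = 'https://icook.tw/'
browser.get(url)                                # load the page in the browser
print("瀏覽器名稱 = ", browser.name)            # browser name
print("網頁url    = ", browser.current_url)     # URL currently loaded
print("網頁連線id = ", browser.session_id)      # WebDriver session ID
print("瀏覽器功能 =\n", browser.capabilities)   # capabilities dictionary

Output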

瀏覽器名稱 =  firefox
網頁url    =  https://icook.tw/
網頁連線id =  ff973e9c-212b-4f1f-8a0a-c73aa97b0f33
瀏覽器功能 =
 {'acceptInsecureCerts': True, 'browserName': 'firefox', 'browserVersion': '93.0', 'moz:accessibilityChecks': False, 'moz:buildID': '20210927210923', 'moz:debuggerAddress': 'localhost:52392', 'moz:geckodriverVersion': '0.30.0', 'moz:headless': False, 'moz:processID': 7704, 'moz:profile': 'C:\\Users\\Administrator\\AppData\\Local\\Temp\\rust_mozprofileDZ6B33', 'moz:shutdownTimeout': 60000, 'moz:useNonSpecCompliantPointerOrigin': False, 'moz:webdriverClick': True, 'pageLoadStrategy': 'normal', 'platformName': 'windows', 'platformVersion': '10.0', 'proxy': {}, 'setWindowRect': True, 'strictFileInteractability': False, 'timeouts': {'implicit': 0, 'pageLoad': 300000, 'script': 30000}, 'unhandledPromptBehavior': 'dismiss and notify'}

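Example pythonSelenium-04-3.py: Visit a different website every 5 seconds.

# pythonSelenium-04-3.py
import time

from selenium import webdriver
from selenium.webdriver.firefox.service import Service

urls = ['https://icook.tw/',
        'http://www.mcut.edu.tw',
        'http://www.siliconstone.com']

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(service=Service(driverPath))

for url in urls:
    browser.get(url)                # load the page in the browser
    time.sleep(5)                   # stay on each site for 5 seconds

browser.quit()                      # close the browser and end the session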

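Example pythonSelenium-05.py: An example in which no element matches the search condition, so the program stops with an exception (NoSuchElementException).

# pythonSelenium-05.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(service=Service(driverPath))
url = 'https://icook.tw/'
browser.get(url)                # load the page in the browser

tag = browser.find_element(By.ID, 'main')   # raises NoSuchElementException if there is no id="main"
print(tag.tag_name)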

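Example pythonSelenium-06.py: The same search with no matching element, this time handled with exception handling.

# pythonSelenium-06.py
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(service=Service(driverPath))
url = 'https://icook.tw/'
browser.get(url)                # load the page in the browser

try:
    tag = browser.find_element(By.ID, 'main')
    print(tag.tag_name)
except NoSuchElementException:  # catch only the "element not found" case
    print("沒有找到相符的元素")

Output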

沒有找到相符的元素

Example pythonSelenium-07.py: Locate several different kinds of elements. Note that get_attribute('src') returns the absolute URL resolved by the browser, whereas BeautifulSoup returns the src value exactly as written in the HTML.

# pythonSelenium-07.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(service=Service(driverPath))
url = 'http://127.0.0.1:5500/htmlExampleBS4-02.html'
browser.get(url)                # load the page in the browser

print("網頁標題內容是 = ", browser.title)

tag2 = browser.find_element(By.ID, 'header')             # returns <h1 id='header'>
print("\n標籤名稱 = %s, 內容是 = %s " % (tag2.tag_name, tag2.text))

tag4 = browser.find_elements(By.TAG_NAME, 'p')           # returns every <p>
for t4 in tag4:
    print("標籤名稱 = %s, 內容是 = %s " % (t4.tag_name, t4.text))

tag5 = browser.find_elements(By.TAG_NAME, 'img')         # returns every <img>
for t5 in tag5:
    print("標籤名稱 = %s, 內容是 = %s " % (t5.tag_name, t5.get_attribute('src')))

Output

網頁標題內容是 =  htmlExampleBS4-02.html

標籤名稱 = h1, 內容是 = 這是表頭區塊
標籤名稱 = p, 內容是 = 人物介紹-1
標籤名稱 = p, 內容是 = 人物介紹-2
標籤名稱 = p, 內容是 = 人物介紹-2
標籤名稱 = img, 內容是 = http://127.0.0.1:5500/media/components/user-1.jpg
標籤名稱 = img, 內容是 = http://127.0.0.1:5500/media/components/user-2.jpg
標籤名稱 = img, 內容是 = http://127.0.0.1:5500/media/components/user-3.jpg

Example htmlExampleBS4-02.html: A simple HTML document.

<!doctype html>
<html lang="zh-Hant-TW">
	<head>
		<meta charset="utf-8">
		<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet"
			integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous">
		<title>htmlExampleBS4-02.html</title>
		<style>
			header {
				background-color: rgb(218, 180, 231);
				min-height: 100px;      }
			main {
				background-color: rgb(239, 238, 238);
				min-height: 300px;      }
			footer {
				min-height: 80px;      }
		</style>
	</head>
	<body>
		<header class="d-flex justify-content-center align-items-center">
			<h1 class="text-center" id='header'>這是表頭區塊</h1>
		</header>
		<main>
			<h1 class="text-center">這是主要內容區</h1>
			<section class="d-flex justify-content-center align-items-center">
				<div class="card" style="width: 18rem;">
					<img src="media/components/user-1.jpg" class="card-img-top" alt="card-1">
					<div class="card-body">
						<p class="card-text text-center" data-author='author1'>人物介紹-1</p>
					</div>
				</div>
				<div class="card" style="width: 18rem;">
					<img src="media/components/user-2.jpg" class="card-img-top" alt="card-2">
					<div class="card-body">
						<p class="card-text text-center" data-author='author2'>人物介紹-2</p>
					</div>
				</div>
				<div class="card" style="width: 18rem;">
					<img src="media/components/user-3.jpg" class="card-img-top" alt="card-2">
					<div class="card-body">
						<p class="card-text text-center" data-author='author3'>人物介紹-2</p>
					</div>
				</div>
			</section>
		</main>
		<footer class="bg-dark text-white d-flex justify-content-center align-items-center">
			<h3 class="text-center">這是表尾</h3>
		</footer>
		<!-- Bootstrap Bundle with Popper -->
		<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/js/bootstrap.bundle.min.js"
			integrity="sha384-MrcW6ZMFYlzcLA8Nl+NtUVF0sA7MsXsP1UyJoMp4YLEuNSfAP+JcXn/tWtIaxVXM"
			crossorigin="anonymous"></script>
	</body>
</html>

Output

The tree structure of all the nodes in this example is shown below:
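The tree diagram is not reproduced here. As a rough text stand-in, the sketch below uses only the standard library's html.parser to print each start tag, indented by its nesting depth. The names TreeBuilder, element_tree and VOID_TAGS are our own illustrative choices. Unlike a browser, this parser does not repair malformed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not increase the depth
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class TreeBuilder(HTMLParser):
    """Record every start tag, indented two spaces per nesting level."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


def element_tree(html):
    """Return the element tree of an HTML string as a list of indented lines."""
    parser = TreeBuilder()
    parser.feed(html)
    parser.close()
    return parser.lines


if __name__ == "__main__":
    sample = ('<html><body><header><h1>t</h1></header>'
              '<main><img src="a.jpg"><p>x</p></main></body></html>')
    print("\n".join(element_tree(sample)))
```

To apply it to the real page, pass in the file's text, for example element_tree(open('htmlExampleBS4-02.html', encoding='utf-8').read()).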

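Example pythonSelenium-08.py: use htmlExampleBS4-02.html to test nodes located with relative XPath paths. The Selenium examples here use the Selenium 3 style find_element_by_xpath() and the executable_path argument. Selenium 4 deprecates both, and the find_element_by_* helpers were removed in Selenium 4.3. The replacement is find_element(By.XPATH, ...), with By imported from selenium.webdriver.common.by.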

# pythonSelenium-08.py
from selenium import webdriver

driverPath = r'c:\driver\chromedriver.exe'     # raw string keeps the backslashes literal
browser = webdriver.Chrome(executable_path=driverPath)
url = r'F:\GoogleDrive\Coding\CodeIndex\Example-Python\Examples\htmlExampleBS4-02.html'
browser.get(url)                # load the page into the browser

n1 = browser.find_element_by_xpath('//h1')
print(n1.text)
n2 = browser.find_element_by_xpath('//body/header/h1')
print(n2.text)
n3 = browser.find_element_by_xpath('//header/h1')
print(n3.text)
n4 = browser.find_element_by_xpath('//body/*/h1')
print(n4.text)

Output

這是表頭區塊
這是表頭區塊
這是表頭區塊
這是表頭區塊

Example pythonSelenium-08-1.py: use htmlExampleBS4-02.html to find the first <p>. The document has three <p> elements. find_element_by_xpath() returns only the first match, "人物介紹-1".

# pythonSelenium-08-1.py
from selenium import webdriver

driverPath = r'c:\driver\chromedriver.exe'
browser = webdriver.Chrome(executable_path=driverPath)
url = r'F:\GoogleDrive\Coding\CodeIndex\Example-Python\Examples\htmlExampleBS4-02.html'
browser.get(url)                # load the page into the browser

n1 = browser.find_element_by_xpath('//p')
print(n1.text)

Output

人物介紹-1

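Example pythonSelenium-08-2.py: htmlExampleBS4-02.html contains three <p> elements. Find the 1st and 2nd ones with positional predicates (div[1] and div[2]; XPath positions start at 1).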

# pythonSelenium-08-2.py
from selenium import webdriver

driverPath = r'c:\driver\chromedriver.exe'
browser = webdriver.Chrome(executable_path=driverPath)
url = r'F:\GoogleDrive\Coding\CodeIndex\Example-Python\Examples\htmlExampleBS4-02.html'
browser.get(url)                # load the page into the browser

n1 = browser.find_element_by_xpath("//section/div[1]/div/p")
print(n1.text)
n2 = browser.find_element_by_xpath("//section/div[2]/div/p")
print(n2.text)

Output

人物介紹-1
人物介紹-2

Example pythonSelenium-08-3.py: htmlExampleBS4-02.html contains three <p> elements. Find the 2nd and 3rd ones by their data-author attribute.

# pythonSelenium-08-3.py
from selenium import webdriver

driverPath = r'c:\driver\chromedriver.exe'
browser = webdriver.Chrome(executable_path=driverPath)
url = r'F:\GoogleDrive\Coding\CodeIndex\Example-Python\Examples\htmlExampleBS4-02.html'
browser.get(url)                # load the page into the browser

n2 = browser.find_element_by_xpath("//section/*/*/p[@data-author='author2']")
print(n2.text)
n3 = browser.find_element_by_xpath("//section/*/*/p[@data-author='author3']")
print(n3.text)

Output

人物介紹-2
人物介紹-3
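You can try the same path expressions without a browser. Python's xml.etree.ElementTree supports a limited XPath subset: child paths, *, //, [n] positions and [@attr='value'] predicates, but not contains(). The sketch below assumes a trimmed, well-formed copy of the page body, because ElementTree only parses XML. Paths must also start from an element, so //h1 becomes .//h1.

```python
import xml.etree.ElementTree as ET

# A trimmed, well-formed copy of the body of htmlExampleBS4-02.html (classes and styles omitted)
PAGE = """<html><body>
  <header><h1 id="header">這是表頭區塊</h1></header>
  <main>
    <h1>這是主要內容區</h1>
    <section>
      <div class="card"><img src="media/components/user-1.jpg"/>
        <div class="card-body"><p data-author="author1">人物介紹-1</p></div></div>
      <div class="card"><img src="media/components/user-2.jpg"/>
        <div class="card-body"><p data-author="author2">人物介紹-2</p></div></div>
      <div class="card"><img src="media/components/user-3.jpg"/>
        <div class="card-body"><p data-author="author3">人物介紹-3</p></div></div>
    </section>
  </main>
</body></html>"""

root = ET.fromstring(PAGE)                          # root is the <html> element

first_h1 = root.find(".//h1")                       # like //h1: first match in document order
header_h1 = root.find("./body/header/h1")           # like //body/header/h1
body_h1s = root.findall("./body/*/h1")              # like //body/*/h1: both <h1> elements
p_by_position = root.find(".//section/div[2]/div/p")                     # second card (1-based)
p_by_attribute = root.find(".//section/*/*/p[@data-author='author3']")   # attribute predicate

print(first_h1.text, header_h1.text, len(body_h1s))
print(p_by_position.text, p_by_attribute.text)
```

Like find_element_by_xpath(), find() returns only the first match. That explains why //body/*/h1 printed the header text in pythonSelenium-08.py even though the path matches two elements.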

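Example pythonSelenium-08-4.py: print the full path of the first image in htmlExampleBS4-02.html. get_attribute('src') returns the resolved absolute URL, not the relative path written in the HTML.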

# pythonSelenium-08-4.py
from selenium import webdriver

driverPath = r'c:\driver\chromedriver.exe'
browser = webdriver.Chrome(executable_path=driverPath)
url = r'F:\GoogleDrive\Coding\CodeIndex\Example-Python\Examples\htmlExampleBS4-02.html'
browser.get(url)                # load the page into the browser

pict = browser.find_element_by_xpath("//section/div/img")
print(pict.get_attribute('src'))

Output

file:///F:/GoogleDrive/Coding/CodeIndex/Example-Python/Examples/media/components/user-1.jpg

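Example pythonSelenium-08-5.py: using htmlExampleBS4-02.html, compare the text property with the textContent, innerHTML and outerHTML attributes. The //h1/em lookup depends on the header being marked up as 這是<em>表頭區塊</em>, as the innerHTML output below shows.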

# pythonSelenium-08-5.py
from selenium import webdriver

driverPath = r'c:\driver\chromedriver.exe'
browser = webdriver.Chrome(executable_path=driverPath)
url = r'F:\GoogleDrive\Coding\CodeIndex\Example-Python\Examples\htmlExampleBS4-02.html'
browser.get(url)                # load the page into the browser

em = browser.find_element_by_xpath("//h1/em")
print('em          : ', em.text)
h1 = browser.find_element_by_xpath("//h1")      # locate the first <h1> once and reuse it
print('h1          : ', h1.text)
print('textContent : ', h1.get_attribute('textContent'))
print('innerHTML : ', h1.get_attribute('innerHTML'))
print('outerHTML : ', h1.get_attribute('outerHTML'))

Output

em          :  表頭區塊
h1          :  這是表頭區塊
textContent :  這是表頭區塊
innerHTML :  這是<em>表頭區塊</em>
outerHTML :  <h1 class="text-center" id="header">這是<em>表頭區塊</em></h1>
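You can mimic the difference between these attributes with ElementTree on the header element alone. This is only an analogy, and it assumes well-formed XML markup. A browser computes textContent, innerHTML and outerHTML from its live DOM, and its serialization can differ in details such as attribute quoting.

```python
import xml.etree.ElementTree as ET

h1 = ET.fromstring('<h1 class="text-center" id="header">這是<em>表頭區塊</em></h1>')

# textContent: every text node of the element and its descendants, concatenated
text_content = "".join(h1.itertext())

# outerHTML: the element itself, serialized together with its own tags
outer_html = ET.tostring(h1, encoding="unicode")

# innerHTML: the leading text plus each child serialized (tostring includes a child's tail text)
inner_html = (h1.text or "") + "".join(ET.tostring(child, encoding="unicode") for child in h1)

print("textContent :", text_content)
print("innerHTML   :", inner_html)
print("outerHTML   :", outer_html)
```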

Example pythonSelenium-08-6.py: using htmlExampleBS4-02.html, locate a node with the XPath contains() function and print its outerHTML and its data-author attribute.

# pythonSelenium-08-6.py
from selenium import webdriver

driverPath = r'c:\driver\chromedriver.exe'
browser = webdriver.Chrome(executable_path=driverPath)
url = r'F:\GoogleDrive\Coding\CodeIndex\Example-Python\Examples\htmlExampleBS4-02.html'
browser.get(url)                # load the page into the browser

n = browser.find_element_by_xpath("//div[@class='card-body']//p[contains(text(),'人物介紹-2')]")
print(n.get_attribute('outerHTML'))
print(n.get_attribute('data-author'))

Output

<p class="card-text text-center" data-author="author2">人物介紹-2</p>
author2

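Example pythonSelenium-08-7.py: a redesign of pythonSelenium-08-6.py that runs Chrome headless (no visible window) and sets an implicit wait. implicitly_wait() does not wait for the page to load. It makes each element lookup keep retrying for up to the given number of seconds before it fails.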

# pythonSelenium-08-7.py
from selenium import webdriver

driverPath = r'c:\driver\chromedriver.exe'
headless = webdriver.ChromeOptions()
headless.add_argument('headless')   # run Chrome without a visible window
browser = webdriver.Chrome(executable_path=driverPath, options=headless)
url = r'F:\GoogleDrive\Coding\CodeIndex\Example-Python\Examples\htmlExampleBS4-02.html'
browser.implicitly_wait(5)          # element lookups retry for up to 5 seconds
browser.get(url)                    # load the page into the browser

n = browser.find_element_by_xpath("//div[@class='card-body']//p[contains(text(),'人物介紹-2')]")
print(n.get_attribute('outerHTML'))
print(n.get_attribute('data-author'))

Output

The output is the same as pythonSelenium-08-6.py, but no browser window appears.

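Example pythonSelenium-09.py: open the 深智數位 (Deepmind) website and, after a 5-second pause (line 12), automatically click the "深智數位緣起" link. The pause is only there so the reader can watch the page change.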

# pythonSelenium-09.py
from selenium import webdriver
import time

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(executable_path=driverPath)
url = 'https://deepmind.com.tw/'
browser.get(url)                # load the page into the browser

eleLink = browser.find_element_by_link_text('深智數位緣起')
print(type(eleLink))            # print the type of eleLink
time.sleep(5)                   # pause 5 seconds
eleLink.click()

Output

<class 'selenium.webdriver.remote.webelement.WebElement'>

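Example pythonSelenium-10.py: fill in a form with Python. The form searches for "2330", and the program submits it automatically after 5 seconds. The output shows the filled-in form and the result of submitting it.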

# pythonSelenium-10.py
from selenium import webdriver
import time

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(executable_path=driverPath)
url = 'https://www.twse.com.tw/zh/'
browser.get(url)                    # load the page into the browser

txtBox = browser.find_element_by_name('stockNo')
txtBox.send_keys('2330')            # type into the form field
time.sleep(5)                       # pause 5 seconds
txtBox.submit()                     # submit the form

Output: the form above is filled in automatically, and after 5 seconds you get the following result:

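Example pythonSelenium-11.py: the program first shows the top of the page. After 3 seconds it scrolls down one page, after another 3 seconds it scrolls to the bottom, 3 seconds later it scrolls up one page, and after a further 3 seconds it scrolls back to the top. Line 10 first locates "body", the opening tag of the page's main content, which corresponds to the top of the page. The scrolling keys are then sent to that element.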

# pythonSelenium-11.py
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(executable_path=driverPath)
url = 'https://www.twse.com.tw/zh/'
browser.get(url)                    # load the page into the browser
ele = browser.find_element_by_tag_name('body')
time.sleep(3)
ele.send_keys(Keys.PAGE_DOWN)       # scroll down one page
time.sleep(3)
ele.send_keys(Keys.END)             # scroll to the bottom
time.sleep(3)
ele.send_keys(Keys.PAGE_UP)         # scroll up one page
time.sleep(3)
ele.send_keys(Keys.HOME)            # scroll to the top

Output: with 3 seconds between steps, you can watch how the page content scrolls.

Example pythonSelenium-12.py: refreshing the page and closing the browser.

# pythonSelenium-12.py
from selenium import webdriver
import time

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(executable_path=driverPath)
url = 'https://www.twse.com.tw/zh/'
browser.get(url)                    # load the page into the browser

time.sleep(3)
browser.refresh()                   # reload the page
time.sleep(3)
browser.quit()                      # close the browser and end the WebDriver session

Output: 3 seconds after the page loads, it is refreshed. 3 seconds after that, the browser closes.

Example pythonSelenium-13.py: automate signing in to Google. This first step opens google.com and clicks the 登入 (Sign in) link.

# pythonSelenium-13.py
from selenium import webdriver
import time

url = 'https://www.google.com'
email = input('請輸入你的Google Email的帳號 : ')
pwd = input('請輸入你的Google Email的密碼 : ')

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(executable_path=driverPath)
browser.get(url)                    # load the page into the browser

browser.find_element_by_link_text('登入').click()    # click the Sign in (登入) link

Output: the page shows the following screen.

Example pythonSelenium-14.py: automate signing in to Google. This continues pythonSelenium-13.py and adds typing the email account.

# pythonSelenium-14.py
from selenium import webdriver
import time

url = 'https://www.google.com'
email = input('請輸入你的Google Email的帳號 : ')
pwd = input('請輸入你的Google Email的密碼 : ')

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(executable_path=driverPath)
browser.get(url)                    # load the page into the browser

browser.find_element_by_link_text('登入').click()
browser.find_element_by_id('identifierId').send_keys(email) # type the account
time.sleep(3)

Output: the page shows the following screen.

Next, we need to locate the "繼續" (Next) button by repeating steps similar to the ones above.

Example pythonSelenium-15.py: automate signing in to Google. This continues pythonSelenium-14.py and adds clicking the "繼續" (Next) button.

# pythonSelenium-15.py
from selenium import webdriver
import time

url = 'https://www.google.com'
email = input('請輸入你的Google Email的帳號 : ')
pwd = input('請輸入你的Google Email的密碼 : ')

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(executable_path=driverPath)
browser.get(url)                    # load the page into the browser

browser.find_element_by_link_text('登入').click()
browser.find_element_by_id('identifierId').send_keys(email) # type the account
time.sleep(3)

# click the Next (繼續) button
browser.find_element_by_xpath("//span[@class='VfPpkd-vQzf8d']").click()
time.sleep(3)

Next, we need to locate the "密碼" (password) field by repeating similar steps.

Example pythonSelenium-16.py: automate signing in to Google. This continues pythonSelenium-15.py and adds filling in the "密碼" (password) field.

# pythonSelenium-16.py
from selenium import webdriver
import time

url = 'https://www.google.com'
email = input('請輸入你的Google Email的帳號 : ')
pwd = input('請輸入你的Google Email的密碼 : ')

driverPath = r'c:\driver\geckodriver.exe'    # Firefox needs geckodriver, not chromedriver
browser = webdriver.Firefox(executable_path=driverPath)
browser.get(url)                    # load the page into the browser

browser.find_element_by_link_text('登入').click()
browser.find_element_by_id('identifierId').send_keys(email) # type the account
time.sleep(3)

# click the Next (繼續) button
browser.find_element_by_xpath("//span[@class='VfPpkd-vQzf8d']").click()
time.sleep(3)

# type the password
browser.find_element_by_xpath("//input[@type='password']").send_keys(pwd)
time.sleep(3)

Next, we need to locate the "繼續" (Next) button that follows the password field by repeating similar steps.

Example pythonSelenium-17.py: automate signing in to Google. This continues pythonSelenium-16.py and adds clicking the "繼續" (Next) button after the password.

# pythonSelenium-17.py
from selenium import webdriver
import time

url = 'https://www.google.com'
email = input('請輸入你的Google Email的帳號 : ')
pwd = input('請輸入你的Google Email的密碼 : ')

driverPath = r'c:\driver\geckodriver.exe'    # Firefox needs geckodriver, not chromedriver
browser = webdriver.Firefox(executable_path=driverPath)
browser.get(url)                    # load the page into the browser

browser.find_element_by_link_text('登入').click()
browser.find_element_by_id('identifierId').send_keys(email) # type the account
time.sleep(3)

# click the Next (繼續) button
browser.find_element_by_xpath("//span[@class='VfPpkd-vQzf8d']").click()
time.sleep(3)

# type the password
browser.find_element_by_xpath("//input[@type='password']").send_keys(pwd)
time.sleep(3)

# click the Next (繼續) button after the password
browser.find_element_by_xpath("//span[@class='RveJvd snByac']").click()
time.sleep(3)

We then see the signed-in Google page below. The top-right corner now shows our name and profile icon.

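Example pythonSelenium-18.py: automatically download the air-quality data published by Taiwan's Environmental Protection Administration (環保署).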

# pythonSelenium-18.py
from selenium import webdriver
import time

url = 'https://data.epa.gov.tw/dataset/aqx_p_434/resource/b50c2109-6b1b-4413-9037-658760f7c969'

driverPath = r'c:\driver\geckodriver.exe'
browser = webdriver.Firefox(executable_path=driverPath)
browser.get(url)                    # load the page into the browser

browser.find_element_by_link_text('JSON').click()       # click the JSON link
time.sleep(5)

browser.find_element_by_link_text('XML').click()        # click the XML link
time.sleep(5)

browser.find_element_by_link_text('CSV').click()        # click the CSV link
time.sleep(5)

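Example pythonSQLite-01.py: create a new database, myData.db. By convention we use db as the file extension. sqlite3.connect() creates the database file if it does not already exist.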

# pythonSQLite-01.py
import sqlite3
conn = sqlite3.connect("myData.db")
conn.close()

Output

This program prints nothing, but the new database file myData.db appears in the folder.

Example pythonSQLite-02.py: create a new database, myDatabase.db, containing a table named "students".

# pythonSQLite-02.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")   # open (or create) the database
cursor = conn.cursor()
sql = '''Create table students(
				id int,
				name text,
				gender text)'''
cursor.execute(sql)                     # run the SQL statement
cursor.close()                          # close the cursor
conn.close()                            # close the database connection
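To confirm that the table exists, and to make the script safe to run twice, you can query SQLite's built-in sqlite_master catalog and use CREATE TABLE IF NOT EXISTS. This sketch uses an in-memory database (":memory:") so it leaves no file behind. Replace it with "myDatabase.db" to inspect the real file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                  # throwaway database for the demo
sql = """CREATE TABLE IF NOT EXISTS students(
             id int,
             name text,
             gender text)"""
conn.execute(sql)
conn.execute(sql)                                   # a second run is a no-op instead of an error

# sqlite_master lists every table (and index, view, trigger) in the database
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(students)")]
conn.close()

print(tables)
print(columns)
```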

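Example pythonSQLite-03.py: in pythonSQLite-02.py we created a cursor object with cursor() and then called its execute() method. We can also skip cursor(), because conn.execute() creates a temporary cursor and runs the statement on it. If pythonSQLite-02.py has already been run, the students table exists and this CREATE TABLE raises sqlite3.OperationalError. Delete myDatabase.db first, or write CREATE TABLE IF NOT EXISTS.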

# pythonSQLite-03.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
sql = '''Create table students(
				id int,
				name text,
				gender text)'''
conn.execute(sql)                       # run the SQL statement
conn.close()                            # close the database connection

Example pythonSQLite-03-1.py: let SQLite assign id automatically, increasing it by 1 for each record. This program creates the student2 table.

# pythonSQLite-03-1.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
sql = '''Create table student2(
				id INTEGER PRIMARY KEY AUTOINCREMENT,
				name TEXT,
				gender TEXT)'''
conn.execute(sql)                       # run the SQL statement
conn.close()                            # close the database connection
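When id is declared as INTEGER PRIMARY KEY AUTOINCREMENT, SQLite fills in the id itself, and cursor.lastrowid reports the value it chose. The AUTOINCREMENT keyword also guarantees that an id is never reused, even after the row holding the largest id is deleted. Here is a minimal in-memory sketch with made-up names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student2(
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    name TEXT,
                    gender TEXT)""")

ids = []
for name, gender in [("yang", "m"), ("linda", "f")]:
    cur = conn.execute("INSERT INTO student2(name, gender) VALUES (?, ?)", (name, gender))
    ids.append(cur.lastrowid)                       # the id SQLite just generated

conn.execute("DELETE FROM student2 WHERE id = 2")   # delete the newest row...
cur = conn.execute("INSERT INTO student2(name, gender) VALUES (?, ?)", ("aron", "m"))
ids.append(cur.lastrowid)                           # ...yet AUTOINCREMENT moves on to 3, not 2
conn.commit()

rows = conn.execute("SELECT * FROM student2 ORDER BY id").fetchall()
conn.close()
print(ids)
print(rows)
```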

Example pythonSQLite-04.py: enter records for the students table from the keyboard. After each record the loop asks "繼續(y/n)?" (continue?). Answer n to end the program, or press Ctrl+C to abort it.

# pythonSQLite-04.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
print("請輸入myInfo資料庫students表單資料")
while True:
    new_id = int(input("請輸入id : "))  # convert to an integer
    new_name = input("請輸入name : ")
    new_gender = input("請輸入gender : ")
    x = (new_id, new_name, new_gender)
    sql = '''insert into students values(?,?,?)'''
    conn.execute(sql, x)
    conn.commit()                       # save the new record
    again = input("繼續(y/n)? ")
    if again.strip().lower().startswith("n"):   # an empty answer no longer raises IndexError
        break
conn.close()                            # close the database connection

Output

請輸入myInfo資料庫students表單資料
請輸入id : 1
請輸入name : yang
請輸入gender : m
繼續(y/n)? n

Example pythonSQLite-04-1.py: fill the student2 table. SQLite generates the auto-incrementing id, so only name and gender are entered.

# pythonSQLite-04-1.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
print("請輸入myInfo資料庫student2表單資料")
while True:
    n_name = input("請輸入name : ")
    n_gender = input("請輸入gender : ")
    x = (n_name, n_gender)
    sql = '''insert into student2(name, gender) values(?,?)'''
    conn.execute(sql, x)
    conn.commit()                       # save the new record
    again = input("繼續(y/n)? ")
    if again.strip().lower().startswith("n"):   # an empty answer no longer raises IndexError
        break
conn.close()                            # close the database connection
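If the records are already in a list instead of being typed one at a time, executemany() inserts them all with one parameterized statement. Using the connection as a context manager commits on success and rolls back if an exception is raised. It does not close the connection. Here is a sketch with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student2(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, gender TEXT)")

new_rows = [("yang", "m"), ("aron", "m"), ("linda", "f")]     # hypothetical input data
with conn:                                          # commit if the block succeeds
    conn.executemany("INSERT INTO student2(name, gender) VALUES (?, ?)", new_rows)

try:
    with conn:                                      # roll back if the block raises
        conn.execute("INSERT INTO student2(name, gender) VALUES (?, ?)", ("temp", "f"))
        raise ValueError("simulated failure")      # undoes the insert above
except ValueError:
    pass

count = conn.execute("SELECT COUNT(*) FROM student2").fetchone()[0]
conn.close()                                        # the with blocks do not close the connection
print(count)
```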

Example pythonSQLite-05.py: list every record in the students table.

# pythonSQLite-05.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
results = conn.execute("SELECT * from students")
for record in results:
    print("id = ", record[0])
    print("name = ", record[1])
    print("gender = ", record[2])
conn.close()                            # close the database connection

Output

id =  1
name =  yang
gender =  m

Example pythonSQLite-06.py: use fetchall() to list every record in the students table as a tuple.

# pythonSQLite-06.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
results = conn.execute("SELECT * from students")
allstudents = results.fetchall()        # a list with one tuple per row
print(type(allstudents))
for student in allstudents:
    print(student)
conn.close()                            # close the database connection

Output

<class 'list'>
(1, 'yang', 'm')
(2, 'aron', 'm')
(5, 'linda', 'f')

Example pythonSQLite-07.py: list only the name column of the students table. Each row is still a tuple, here a one-element tuple such as ('yang',).

# pythonSQLite-07.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
results = conn.execute("SELECT name from students")
allstudents = results.fetchall()        # a list with one tuple per row
for student in allstudents:
    print(student)
conn.close()                            # close the database connection

Output

('yang',)
('aron',)
('linda',)
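Each row comes back as a tuple even when only one column is selected, which is why the output shows ('yang',). The sketch below shows two common conveniences on an in-memory copy of the sample data. You can unpack the single column in a list comprehension, or set row_factory to sqlite3.Row so columns can be read by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                  # in-memory copy of the sample data
conn.execute("CREATE TABLE students(id int, name text, gender text)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(1, "yang", "m"), (2, "aron", "m"), (5, "linda", "f")])

# Unpack each one-element tuple such as ('yang',) into a plain string
names = [name for (name,) in conn.execute("SELECT name FROM students ORDER BY id")]

# With sqlite3.Row, columns can be read by name as well as by position
conn.row_factory = sqlite3.Row
first = conn.execute("SELECT * FROM students ORDER BY id").fetchone()
first_info = (first["name"], first["gender"], first.keys())
conn.close()

print(names)
print(first_info)
```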

Example pythonSQLite-08.py: query all male records (where gender = 'm') and list only the name and gender columns of the students table.

# pythonSQLite-08.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
# SQL string literals use single quotes; double quotes are meant for identifiers
sql = """SELECT name, gender
         from students
         where gender = 'm'"""
results = conn.execute(sql)
allstudents = results.fetchall()        # a list with one tuple per row
for student in allstudents:
    print(student)
conn.close()                            # close the database connection

Output

('yang', 'm')
('aron', 'm')
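If the value to search for comes from a variable, such as user input, pass it as a ? parameter instead of pasting it into the SQL text. sqlite3 then handles the quoting, and the input cannot change the query itself. In this sketch, students_by_gender is a hypothetical helper:

```python
import sqlite3

def students_by_gender(conn, gender):
    """Hypothetical helper: return the (name, gender) rows for the given gender."""
    sql = "SELECT name, gender FROM students WHERE gender = ? ORDER BY id"
    return conn.execute(sql, (gender,)).fetchall()  # the one-element tuple fills the ?

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students(id int, name text, gender text)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(1, "yang", "m"), (2, "aron", "m"), (5, "linda", "f")])

males = students_by_gender(conn, "m")
injected = students_by_gender(conn, "m' OR '1'='1")    # compared as one literal string
conn.close()
print(males)
print(injected)
```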

Example pythonSQLite-09.py: change the name of the record with id = 1 to "Tomy".

# pythonSQLite-09.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
sql = """UPDATE students
         set name = 'Tomy'
         where id = 1"""
conn.execute(sql)
conn.commit()                           # save the change to the database
results = conn.execute("SELECT name from students")
allstudents = results.fetchall()        # a list with one tuple per row
for student in allstudents:
    print(student)
conn.close()                            # close the database connection

Output

('Tomy',)
('aron',)
('linda',)

Example pythonSQLite-10.py: delete the record with id = 2.

# pythonSQLite-10.py
import sqlite3
conn = sqlite3.connect("myDatabase.db")     # connect to the database
sql = '''DELETE
         from students
         where id = 2'''
conn.execute(sql)
conn.commit()                           # save the change to the database
results = conn.execute("SELECT name from students")
allstudents = results.fetchall()        # a list with one tuple per row
for student in allstudents:
    print(student)
conn.close()                            # close the database connection

Output

('Tomy',)
('linda',)
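Cursor.rowcount reports how many rows an UPDATE or DELETE changed. This is a simple way to check that the WHERE clause matched what you expected; for example, 0 means no record had that id. Here it is on an in-memory copy of the sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students(id int, name text, gender text)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(1, "yang", "m"), (2, "aron", "m"), (5, "linda", "f")])

updated = conn.execute("UPDATE students SET name = ? WHERE id = ?", ("Tomy", 1)).rowcount
deleted = conn.execute("DELETE FROM students WHERE id = ?", (2,)).rowcount
missing = conn.execute("DELETE FROM students WHERE id = ?", (99,)).rowcount   # no such id
conn.commit()

remaining = conn.execute("SELECT name FROM students ORDER BY id").fetchall()
conn.close()
print(updated, deleted, missing, remaining)
```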

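Example pythonSQLite-11.py: besides printing the male and female population of each district in the Python window, as above, this program also prints the total population. It also creates the SQLite database file populations.db, which contains a population table with these fields:

area TEXT : district name
male int : number of males
female int : number of females
total int : total population

All the population data is also stored in the population table.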

# pythonSQLite-11.py
import sqlite3
import csv

conn = sqlite3.connect("populations.db")    # connect to the database
sql = '''Create table population(
				area TEXT,
				male int,
				female int,
				total int)'''
conn.execute(sql)                           # run the SQL statement

fn = 'Taipei_Population.csv'
with open(fn) as csvFile:                   # store the CSV rows in SQLite
    csvReader = csv.reader(csvFile)
    listCsv = list(csvReader)               # convert to a list
    csvData = listCsv[4:]                   # slice off the first 4 rows
    for row in csvData:
        area = row[0]                       # district name
        male = int(row[7])                  # number of males
        female = int(row[8])                # number of females
        total = int(row[6])                 # total population
        x = (area, male, female, total)
        sql = '''insert into population values(?,?,?,?)'''
        conn.execute(sql, x)
conn.commit()                               # save all inserted rows at once

results = conn.execute("SELECT * from population")
for record in results:
    print("區域       = ", record[0])
    print("男性人口數 = ", record[1])
    print("女性人口數 = ", record[2])
    print("總計人口數 = ", record[3])

conn.close()                                # close the database connection

Output

區域       =    松山區
男性人口數 =  96357
女性人口數 =  109276
總計人口數 =  205633
區域       =    信義區
男性人口數 =  106330
女性人口數 =  116783
總計人口數 =  223113
區域       =    大安區
男性人口數 =  143905
女性人口數 =  164781
總計人口數 =  308686
........(rest of the output omitted)........
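Once the figures are in the population table, SQL can also do the arithmetic, for example totalling across districts with SUM() or ranking with ORDER BY. The sketch below uses two made-up districts in an in-memory database, not the real CSV data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population(area TEXT, male int, female int, total int)")
rows = [("District A", 100, 120, 220), ("District B", 150, 140, 290)]   # hypothetical figures
conn.executemany("INSERT INTO population VALUES (?, ?, ?, ?)", rows)

# Column totals across all districts
male_sum, female_sum, total_sum = conn.execute(
    "SELECT SUM(male), SUM(female), SUM(total) FROM population").fetchone()

# The district with the largest population
largest = conn.execute(
    "SELECT area, total FROM population ORDER BY total DESC LIMIT 1").fetchone()

# Sanity check: each row's total should equal male + female
mismatches = conn.execute(
    "SELECT COUNT(*) FROM population WHERE total != male + female").fetchone()[0]
conn.close()
print(male_sum, female_sum, total_sum, largest, mismatches)
```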

Example pythonSQLite-12.py: read the populations.db database file and plot the 2019 male, female and total population of each Taipei district from the population table as a line chart.

# pythonSQLite-12.py
import sqlite3
import matplotlib.pyplot as plt

conn = sqlite3.connect("populations.db")    # connect to the database
results = conn.execute("SELECT * from population")

area, male, female, total = [], [], [], []
for record in results:                      # collect the population data in lists
    area.append(record[0])
    male.append(record[1])
    female.append(record[2])
    total.append(record[3])
conn.close()                                # close the database connection

plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']        # use Microsoft JhengHei for Chinese text
seq = area
linemale, = plt.plot(seq, male, '-*', label='男性人口數')
linefemale, = plt.plot(seq, female, '-o', label='女性人口數')
linetotal, = plt.plot(seq, total, '-^', label='總計人口數')

plt.legend(handles=[linemale, linefemale, linetotal], loc='best')
plt.title(u"台北市", fontsize=24)
plt.xlabel("2019年", fontsize=14)
plt.ylabel("人口數", fontsize=14)
plt.show()

Output

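Example pythonStock-01.py: plot the daily high, low and closing prices of the OTC-listed stock 6488 for September 2021. The program first converts the CSV file into a list, then uses list slicing to drop the first 5 rows and the last row. Dates in the file use the ROC (Minguo) calendar, so the year 110 is replaced with 2021.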

# pythonStock-01.py
import csv
import matplotlib.pyplot as plt
from datetime import datetime

fn = 'ST43_6488_202109.csv'
with open(fn) as csvFile:
    csvReader = csv.reader(csvFile)
    listCsv = list(csvReader)                   # convert to a list
    csvData = listCsv[5:-1]                     # slice off the non-trading rows
    dates, highs, lows, prices = [], [], [], [] # empty lists
    for row in csvData:
        try:
            datestr = row[0].replace('110','2021')   # ROC year 110 -> 2021
            currentDate = datetime.strptime(datestr, "%Y/%m/%d")
            high = float(row[4])                # high
            low = float(row[5])                 # low
            price = float(row[6])               # close
        except Exception:
            print('有缺值')                     # "missing value"
        else:
            highs.append(high)                  # store the high
            lows.append(low)                    # store the low
            prices.append(price)                # store the close
            dates.append(currentDate)           # store the date

fig = plt.figure(dpi=80, figsize=(12, 8))       # size of the plotting area
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']        # use Microsoft JhengHei for Chinese text
plt.plot(dates, highs, '-*', label='High')      # plot the highs
plt.plot(dates, lows, '-o', label='Low')        # plot the lows
plt.plot(dates, prices, '-^',   label='Price')  # plot the closes
plt.legend(loc='best')
fig.autofmt_xdate()                         # rotate the date labels
plt.title("6488 環球晶 , 2021 年 9 月", fontsize=24)
plt.xlabel("", fontsize=14)
plt.ylabel("股 價", fontsize=14)
plt.tick_params(axis='both', labelsize=12, color='red')
plt.show()

Output
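The replace('110', '2021') trick works for this one file, but it would also rewrite any other "110" in the string and has to be edited for every year. Here is a more general sketch. It assumes dates are written as "YYY/MM/DD" in the ROC (Minguo) calendar, where year 1 is 1912. The helper name roc_to_date is our own.

```python
from datetime import date

ROC_OFFSET = 1911                                   # ROC year + 1911 = Gregorian year

def roc_to_date(text):
    """Convert an ROC date string such as '110/09/01' to a datetime.date."""
    year, month, day = (int(part) for part in text.strip().split("/"))
    return date(year + ROC_OFFSET, month, day)

print(roc_to_date("110/09/01"))
print(roc_to_date(" 99/12/31"))
```

matplotlib also accepts date objects on the x-axis, so the results can be appended to dates in place of the strptime() values.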

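Example pythonStock-02.py: plot the yearly high, low and average closing prices of the listed stock 1101 (台泥, Taiwan Cement) from 1991 to 2020. Years in the file use the ROC calendar, so 1911 is added to each one.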

# pythonStock-02.py
import csv
import matplotlib.pyplot as plt
from datetime import datetime

fn = 'FMNPTK_1101.csv'
with open(fn) as csvFile:
    csvReader = csv.reader(csvFile)
    listCsv = list(csvReader)                   # convert to a list
    csvData = listCsv[2:-5]                     # slice off the non-trading rows
    years, highs, lows, prices = [], [], [], [] # empty lists
    for row in csvData:
        try:
            year = int(row[0]) + 1911           # ROC year -> Gregorian year
            high = float(row[4])                # high
            low = float(row[6])                 # low
            price = float(row[8])               # average close
        except Exception:
            print('有缺值')                     # "missing value"
        else:
            highs.append(high)                  # store the high
            lows.append(low)                    # store the low
            prices.append(price)                # store the average close
            years.append(year)                  # store the year

fig = plt.figure(dpi=80, figsize=(12, 8))       # size of the plotting area
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']        # use Microsoft JhengHei for Chinese text
plt.plot(years, highs, '-*', label='最高價')      # plot the highs
plt.plot(years, lows, '-o', label='最低價')        # plot the lows
plt.plot(years, prices, '-^',   label='收盤平均價')  # plot the average closes
plt.legend(loc='best')
fig.autofmt_xdate()                         # rotate the x-axis labels
plt.title("1101 台泥，1991 - 2020 年度收盤價", fontsize=24)
plt.xlabel("", fontsize=14)
plt.ylabel("價格", fontsize=14)
plt.tick_params(axis='both', labelsize=12, color='red')
plt.show()

Output

Example pythonStock-03.py: read MI_5MINS.csv and, at every 30-minute mark, write that record's time and cumulative trade volume (累積成交數) to MI_30MINS.csv.

# pythonStock-03.py
import csv

fn = 'MI_5MINS.csv'                             # data from the Taiwan Stock Exchange (TWSE)
out = 'MI_30MINS.csv'                           # one record every 30 minutes
with open(out, 'w', newline='') as csvOut:
    csvWriter = csv.writer(csvOut)
    csvWriter.writerow(["時間", "累積成交數"])
    with open(fn) as csvFile:
        csvReader = csv.reader(csvFile)
        listCsv = list(csvReader)               # convert to a list
        csvData = listCsv[2:-8]                 # slice off the non-trading rows
        for row in csvData:
            xmin = row[0][3:5]                  # minutes of "HH:MM:SS"
            xsec = row[0][6:]                   # seconds
            if xmin == '00' or xmin == '30':    # on the hour or the half hour
                if xsec == '00':                # if so, write the time and cumulative volume
                    csvWriter.writerow([row[0], row[6]])
                    print(row[0], row[6])

Output
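The slices row[0][3:5] and row[0][6:] assume every time is exactly "HH:MM:SS". The sketch below implements the same filter by parsing the time instead. The sample times are made up, and on_half_hour is our own helper name.

```python
from datetime import datetime

def on_half_hour(text):
    """True if a 'HH:MM:SS' time falls exactly on :00:00 or :30:00."""
    t = datetime.strptime(text.strip(), "%H:%M:%S").time()
    return t.minute in (0, 30) and t.second == 0

samples = ["09:00:00", "09:05:00", "09:30:00", "09:30:05", "13:30:00"]
kept = [s for s in samples if on_half_hour(s)]
print(kept)
```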

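Example pythonStock-04.py: read MI_30MINS.csv and draw a line chart. The cumulative volumes are read as strings, possibly with thousands separators, so they are converted to integers before plotting. Otherwise matplotlib treats them as category labels rather than numbers.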

# pythonStock-04.py
import csv
import matplotlib.pyplot as plt

fn = 'MI_30MINS.csv'
with open(fn) as csvFile:
    csvReader = csv.reader(csvFile)
    listCsv = list(csvReader)                   # convert to a list
    csvData = listCsv[1:]                       # slice off the header row
    times, items = [], []                       # empty lists
    for row in csvData:
        try:
            time = row[0]                       # time
            item = int(row[1].replace(',', '')) # cumulative volume as a number
        except (IndexError, ValueError):
            print('有缺值')                     # "missing value"
        else:
            times.append(time)                  # store the time
            items.append(item)                  # store the cumulative volume

fig = plt.figure(dpi=80, figsize=(12, 8))       # size of the plotting area
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']        # use Microsoft JhengHei for Chinese text
plt.plot(times, items, '-*')                    # plot the cumulative volume
fig.autofmt_xdate()                             # rotate the time labels
plt.title("30分鐘累積交易量", fontsize=24)
plt.xlabel("", fontsize=14)
plt.ylabel("累積交易量", fontsize=14)
plt.tick_params(axis='both', labelsize=12, color='red')
plt.show()

Output
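Here is why the conversion matters. As strings, the values compare character by character, so their order (and matplotlib's category axis) does not follow their size. A small sketch with made-up values:

```python
raw = ["9,500", "12,000", "100,250"]                # hypothetical cumulative volumes read from CSV

as_text = sorted(raw)                               # compares strings character by character
as_numbers = sorted(int(value.replace(",", "")) for value in raw)  # strip separators, then compare numbers

print(as_text)
print(as_numbers)
```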

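Example pythonStock-05.py: scrape Google Finance and print the latest price, previous close and day range of 台泥 (Taiwan Cement, 1101). The class names used below (YMlKec fxKbKc, P6K39c) are generated by Google and may change.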

# pythonStock-05.py
import requests, bs4

url = 'https://www.google.com/finance/quote/1101:TPE'
headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64)\
            AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101\
            Safari/537.36', }
newshtml = requests.get(url, headers=headers)           # Taiwan Cement quote page
objSoup = bs4.BeautifulSoup(newshtml.text, 'lxml')      # parse the HTML
companyInfo = objSoup.find('main')                      # main information block
companyName = companyInfo.find_all('div',attrs={'role':'heading'})    # company name
print('公司名稱 : ',companyName[1].text)
price = companyInfo.find('div','YMlKec fxKbKc')         # latest price
print('最新報價 : ', price.text)
detailsTable = companyInfo.find_all('div','P6K39c')     # company details table
print('前日收盤價', detailsTable[0].text)                # previous close
print('單日股價範圍', detailsTable[1].text)              # day range

Output

公司名稱 :  Taiwan Cement Corp
最新報價 :  $49.45
前日收盤價 $49.80
單日股價範圍 $49.85 - $50.10

Example pythonStock-06.py: scrape Yahoo Finance Taiwan (Yahoo 股市) and print information about stock 2330, 台積電 (TSMC).

# pythonStock-06.py
import requests, bs4

url = 'https://tw.stock.yahoo.com/quote/2330'   # 2330 TSMC (台積電)
headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64)\
            AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101\
            Safari/537.36', }
newshtml = requests.get(url, headers=headers)
objSoup = bs4.BeautifulSoup(newshtml.text, 'lxml')      # parse the HTML
companyInfo = objSoup.find('section', id='qsp-overview-realtime-info')   # main information block
companyName = companyInfo.find('h2')    # company name
print('公司名稱 : ', companyName.text)
time = companyInfo.find('time').find_all('span')[2]     # data timestamp
print('資料時間：', time.text)
priceDetailItems = companyInfo.find_all('li', 'price-detail-item')
for item in priceDetailItems:
    print(item.text)

Output

公司名稱 :  台積電即時行情
資料時間： 2021/10/19 11:35
成交599
開盤598
最高600
最低593
均價598
成交值(億)62.41
昨收590
漲跌幅1.53%
漲跌9
總量10,440
昨量19,120
振幅1.19%

Example pythonStock-07.py: use the twstock package to get TSMC's stock code and its closing prices for the most recent 31 trading days.

# pythonStock-07.py
import twstock
stock2330 = twstock.Stock("2330")

print("股票代號   : ", stock2330.sid)
print("股票收盤價 : ", stock2330.price)

Output

股票代號   :  2330
股票收盤價 :  [613.0, 607.0, 620.0, 631.0, 623.0, 619.0, 619.0, 622.0, 615.0, 613.0, 607.0, 600.0, 600.0, 586.0, 588.0, 598.0, 602.0, 594.0, 580.0, 580.0, 574.0, 572.0, 572.0, 571.0, 580.0, 575.0, 575.0, 571.0, 573.0, 600.0, 590.0]

Example pythonStock-08.py: draw a line chart of TSMC's closing prices over the most recent 31 trading days.

# pythonStock-08.py
import matplotlib.pyplot as plt
import twstock

stock2330 = twstock.Stock("2330")
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']        # use Microsoft JhengHei for Chinese text
plt.title("2330 台積電", fontsize=24)
plt.plot(stock2330.price)
plt.show()

Output

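Example pythonStock-09.py: draw a line chart of TSMC's closing prices since January 2021. fetch_from(2021, 1) fetches the data from that month up to the present.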

# pythonStock-09.py
import matplotlib.pyplot as plt
import twstock

plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']        # use Microsoft JhengHei for Chinese text
stock2330 = twstock.Stock("2330")
stock2330.fetch_from(2021,1)
plt.title("2330 台積電", fontsize=24)
plt.xlabel("2021年1月以來的交易天數", fontsize=14)
plt.ylabel("價格", fontsize=14)
plt.plot(stock2330.price)
plt.show()

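Example pythonStock-10.py: building on the examples above, use pandas to show TSMC's best five bid and ask prices and their volumes as a table. BVolumn and SVolumn are the bid and ask volumes.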

# pythonStock-10.py
import pandas as pd
import twstock

stock2330 = twstock.realtime.get('2330')
buyPrice = stock2330['realtime']['best_bid_price']
buyNum = stock2330['realtime']['best_bid_volume']

sellPrice = stock2330['realtime']['best_ask_price']
sellNum = stock2330['realtime']['best_ask_volume']

dict2330 = {'BVolumn':buyNum,
            'Buy':buyPrice,
            'Sell':sellPrice,
            'SVolumn':sellNum}

df2330 = pd.DataFrame(dict2330, index=range(1,6))
print("台積電最佳五檔價量表")
print(df2330)

Output

台積電最佳五檔價量表
   BVolumn   Buy      Sell    SVolumn
1      72  599.0000  600.0000    1947
2      76  598.0000  601.0000     962
3     202  597.0000  602.0000    1048
4     247  596.0000  603.0000     943
5     308  595.0000  604.0000     822

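Example pythonRequestsHtml-01.py: print the types of the returned objects, the page URL and the page text. The URL assumes htmlExampleBS4-02.html is served locally on port 5500, for example by a local development server.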

# pythonRequestsHtml-01.py
from requests_html import HTMLSession

session = HTMLSession()             # create a session
url = 'http://127.0.0.1:5500/htmlExampleBS4-02.html'
r = session.get(url)                # get()
print(type(r))
print(type(r.html))
print(r.html)
print(type(r.html.text))
print('-'*70)
print(r.html.text)

Output

<class 'requests_html.HTMLResponse'>
<class 'requests_html.HTML'>
<HTML url='http://127.0.0.1:5500/htmlExampleBS4-02.html'>
<class 'str'>
----------------------------------------------------------------------
htmlExampleBS4-02.html
header { background-color: rgb(218, 180, 231); min-height: 100px; } main { background-color: rgb(239, 238, 238); min-height: 300px; } footer { min-height: 80px; }
這是表頭區塊
這是主要內容區
人物介紹-1
人物介紹-2
人物介紹-3
這是表尾

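Example pythonRequestsHtml-02.py: for https://www.python.org/, print how many hyperlinks the links and absolute_links properties contain, and list the first 5 of each. links holds the href values as they are written in the page, while absolute_links resolves each one to a full URL. Both are sets, so which 5 links print first is arbitrary.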

# pythonRequestsHtml-02.py
from requests_html import HTMLSession

session = HTMLSession()             # create a session
url = 'https://python.org/'
r = session.get(url)                # get()
url_links = r.html.links
count = 0
print('相對位址超連結數量 : ', len(url_links))
for link in url_links:
    count += 1
    print(link)
    if count >= 5:
        break
print('-'*70)
url_a_links = r.html.absolute_links
count = 0
print('絕對位址超連結數量 : ', len(url_a_links))
for link in url_a_links:
    count += 1
    print(link)
    if count >= 5:
        break

Output

相對位址超連結數量 :  126
http://www.pylonsproject.org/
http://wiki.python.org/moin/TkInter
http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator
//docs.python.org/3/tutorial/controlflow.html
https://blog.python.org
----------------------------------------------------------------------
絕對位址超連結數量 :  126
http://www.pylonsproject.org/
http://wiki.python.org/moin/TkInter
http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator
https://blog.python.org
https://www.python.org/success-stories/category/engineering/
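absolute_links turns each href into a full URL relative to the page address. The standard library's urllib.parse.urljoin performs the same kind of resolution, which explains entries such as the protocol-relative //docs.python.org/... in the links set. This sketch assumes https://www.python.org/ as the base URL:

```python
from urllib.parse import urljoin

base = "https://www.python.org/"                    # assumed page URL after redirects
hrefs = [
    "/about/apps/",                                 # root-relative path
    "//docs.python.org/3/tutorial/controlflow.html",# protocol-relative: takes the base scheme
    "http://wiki.python.org/moin/TkInter",          # already absolute: unchanged
]
absolute = [urljoin(base, h) for h in hrefs]
for url in absolute:
    print(url)
```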

Example pythonRequestsHtml-03.py: scrape the items of the About menu on python.org (the element with id about).

# pythonRequestsHtml-03.py
from requests_html import HTMLSession

session = HTMLSession()             # create a session
url = 'https://python.org/'
r = session.get(url)                # get()
about = r.html.find('#about', first=True)
print(about.text)

Output

About
Applications
Quotes
Getting Started
Help
Python Brochure

Example pythonRequestsHtml-04.py: print several properties of the about element found by the search.

# pythonRequestsHtml-04.py
from requests_html import HTMLSession

session = HTMLSession()             # create a session
url = 'https://python.org/'
r = session.get(url)                # get()
about = r.html.find('#about', first=True)
print('印出 about.attrs屬性 :', about.attrs)
print('-'*70)
print('印出 about.html屬性 :', about.html)
print('-'*70)
print('印出 about.absolute_links屬性 :', about.absolute_links)
print('-'*70)
print("印出 about.find('a') :",about.find('a'))

Output

印出 about.attrs屬性 : {'id': 'about', 'class': ('tier-1', 'element-1'), 'aria-haspopup': 'true'}
----------------------------------------------------------------------
印出 about.html屬性 : <li id="about" class="tier-1 element-1" aria-haspopup="true">
<a href="/about/" title="" class="">About</a>
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>
</ul>
</li>
----------------------------------------------------------------------
印出 about.absolute_links屬性 : (a set with the absolute URLs of the six About-menu links, such as https://www.python.org/about/ and http://brochure.getpython.info/; the order of a set is not fixed)
----------------------------------------------------------------------
印出 about.find('a') : [<Element 'a' href='/about/' title='' class=()>, <Element 'a' href='/about/apps/' title=''>, <Element 'a' href='/about/quotes/' title=''>, <Element 'a' href='/about/gettingstarted/' title=''>, <Element 'a' href='/about/help/' title=''>, <Element 'a' href='http://brochure.getpython.info/' title=''>]
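Each attribute printed above has a rough counterpart in xml.etree.ElementTree:

- attrib for attrs
- tostring() for html
- findall('.//a') for find('a')
- urljoin over the hrefs for absolute_links

The sketch runs on a trimmed copy of the about.html output above. The base URL https://www.python.org/ is an assumption about where python.org redirects. Splitting class into a tuple only imitates how requests-html displays it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

BASE_URL = "https://www.python.org/"  # assumed final page URL after the redirect

# Trimmed copy of the about.html output (well-formed, so ElementTree can parse it)
MENU = """<li id="about" class="tier-1 element-1" aria-haspopup="true">
<a href="/about/" title="" class="">About</a>
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>
</ul>
</li>"""

about = ET.fromstring(MENU)

# attrs: ElementTree keeps class as one string, so split it into a tuple
attrs = dict(about.attrib)
attrs["class"] = tuple(attrs["class"].split())
print("attrs :", attrs)

# html: serialise the element and its children back to markup
markup = ET.tostring(about, encoding="unicode")
print("html :", markup)

# find('a'): every <a> descendant of the element
anchors = about.findall(".//a")
print("a texts :", [a.text for a in anchors])

# absolute_links: each href resolved against the page URL
absolute_links = {urljoin(BASE_URL, a.get("href")) for a in anchors}
print("absolute_links :", sorted(absolute_links))
```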

Example pythonRequestsHtml-05.py: find the <a> elements that contain the string "kenneth".

# pythonRequestsHtml-05.py
from requests_html import HTMLSession

session = HTMLSession()             # create a session
url = 'http://python-requests.org/'
r = session.get(url)                # send a GET request
a_element = r.html.find('a', containing='kenneth')   # <a> elements whose text contains "kenneth"
if a_element:
    for a in a_element:
        print(a)

Output

<Element 'a' href='https://kenreitz.org/projects'>
<Element 'a' class=('reference', 'internal') href='dev/contributing/#kenneth-reitz-s-code-style'>
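The containing= option keeps only the elements whose text includes the given string. requests-html appears to compare case-insensitively, since "kenneth" matches links about Kenneth Reitz, and the sketch below assumes so. It is a standard-library illustration on a hypothetical fragment and assumes <a> tags are not nested.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect (href, text) for every <a> element; assumes <a> tags are not nested."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._chunks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        # Text inside nested tags such as <span> is collected too
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.anchors.append((self._href, "".join(self._chunks).strip()))
            self._inside = False


def anchors_containing(markup, needle):
    """Return (href, text) pairs whose text contains needle, ignoring case."""
    parser = AnchorTextCollector()
    parser.feed(markup)
    parser.close()
    needle = needle.lower()
    return [(href, text) for href, text in parser.anchors if needle in text.lower()]


# Hypothetical fragment
sample = """
<p>Project page: <a href="https://kenreitz.org/projects">Kenneth Reitz</a></p>
<a href="dev/contributing/">Contributing</a>
"""
print(anchors_containing(sample, "kenneth"))
```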

Example pythonRequestsHtml-06.py: using the xpath() method.

# pythonRequestsHtml-06.py
from requests_html import HTMLSession

session = HTMLSession()             # create a session
url = 'https://python.org/'
r = session.get(url)                # send a GET request
a_element = r.html.xpath('//a')     # every <a> element anywhere in the page
if a_element:
    for a in a_element:
        print(a)
        print('-'*70)

Output

<Element 'a' href='#content' title='Skip to content'>
----------------------------------------------------------------------
<Element 'a' id='close-python-network' class=('jump-link',) href='#python-network' aria-hidden='true'>
....... (rest omitted) .....
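xpath('//a') selects every <a> element anywhere in the document. The standard library's xml.etree.ElementTree supports only a subset of XPath, and on an element the same query is written './/a'. Below is a minimal sketch on a hypothetical, well-formed page. ElementTree is an XML parser and cannot read most real-world HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page; real HTML usually needs an HTML parser such as lxml
PAGE = """<html><body>
<a href="#content" title="Skip to content">Skip to content</a>
<div><a id="close-python-network" class="jump-link" href="#python-network">Close</a></div>
<p>No link here</p>
</body></html>"""

root = ET.fromstring(PAGE)

# './/a' is ElementTree's spelling of XPath '//a': every <a> below the root
anchors = root.findall(".//a")
for a in anchors:
    print(a.tag, a.attrib)
    print("-" * 70)
```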

Example pythonRequestsHtml-07.py: using search() to search the page text.

# pythonRequestsHtml-07.py
from requests_html import HTMLSession

session = HTMLSession()             # create a session
url = 'https://python.org/'
r = session.get(url)                # send a GET request
txt = r.html.search('Python is a {} language')[0]   # the text matched by the first {}
print(txt)

Output

programming
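search() takes a template in which {} marks the part to capture and returns a result that can be indexed. requests-html builds this on the third-party parse library. As a rough standard-library illustration (not parse's actual implementation), each {} can be turned into a lazy regular-expression group. The page text below is a hypothetical stand-in for the python.org page.

```python
import re


def search_template(template, text):
    """Find the first match of a '{}' template in text; return the captured strings or None.

    Each '{}' becomes a lazy '(.+?)' group; everything else is matched literally.
    """
    parts = template.split("{}")
    pattern = "(.+?)".join(re.escape(part) for part in parts)
    match = re.search(pattern, text)
    return match.groups() if match else None


page_text = "Python is a programming language that lets you work quickly"
result = search_template("Python is a {} language", page_text)
if result is not None:
    print(result[0])
```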

Example pythonRequestsHtml-08.py: print the first movie on the Douban Movie site and its rating.

# pythonRequestsHtml-08.py
from requests_html import HTMLSession

session = HTMLSession()
url = 'https://movie.douban.com/'
r = session.get(url)

print('Movie title : ', r.html.find('li.title', first=True).text)
print('Rating : ', r.html.find('li.rating', first=True).text)

Output

Movie title :  沙丘
Rating :  7.9
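find('li.title', first=True) uses a CSS selector meaning "the first <li> whose class list includes title". The sketch below approximates that with html.parser. The markup and movie names are hypothetical, and it assumes the matched <li> is explicitly closed and contains no nested <li>.

```python
from html.parser import HTMLParser


class FirstTextByClass(HTMLParser):
    """Record the text of the first target_tag element whose class list has target_class.

    Assumes that element is explicitly closed and has no nested target_tag inside it.
    """

    def __init__(self, target_tag, target_class):
        super().__init__()
        self.target_tag = target_tag
        self.target_class = target_class
        self.found_text = None   # result, set once the matching element closes
        self._chunks = None      # a list only while inside the matching element

    def handle_starttag(self, tag, attrs):
        if self.found_text is None and self._chunks is None and tag == self.target_tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if self._chunks is not None and tag == self.target_tag:
            # Collapse runs of whitespace into single spaces
            self.found_text = " ".join("".join(self._chunks).split())
            self._chunks = None


def first_text(markup, tag, css_class):
    """Return the text of the first matching element, or None if there is none."""
    parser = FirstTextByClass(tag, css_class)
    parser.feed(markup)
    parser.close()
    return parser.found_text


# Hypothetical markup loosely shaped like a movie list
sample = """
<ul>
  <li class="title"><a href="#">Movie A</a></li>
  <li class="rating"><span class="rating-star"></span><span>8.1</span></li>
</ul>
<ul>
  <li class="title"><a href="#">Movie B</a></li>
</ul>
"""
print('Movie title : ', first_text(sample, "li", "title"))
print('Rating : ', first_text(sample, "li", "rating"))
```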

Example pythonRequestsHtml-09.py: another way to list the titles and ratings of the movies in the top block.

# pythonRequestsHtml-09.py
from requests_html import HTMLSession

session = HTMLSession()
url = 'https://movie.douban.com/'
r = session.get(url)

movies = r.html.find('li.ui-slide-item')   # each <li> keeps its details in data-* attributes
print('Number of movies : ', len(movies))
print('Data type : ', type(movies[0]))
print(movies[0])
print('-'*70)
print(movies[0].attrs['data-title'])
print(movies[0].attrs['data-rate'])

Output

Number of movies :  112
Data type :  <class 'requests_html.Element'>
<Element 'li' class=('ui-slide-item', 's') data-dstat-areaid='70_1' data-dstat-mode='click,expose' data-dstat-watch='.ui-slide-content' data-dstat-viewport='.screening-bd' data-title='沙丘 Dune' data-release='2021' data-rate='7.9' data-star='40' data-trailer='https://movie.douban.com/subject/3001114/trailer' data-ticket='https://movie.douban.com/ticket/redirect/?movie_id=3001114' data-duration='156分钟' data-region='美国' data-director='丹尼斯·维伦纽瓦' data-actors='蒂莫西·柴勒梅德 / 丽贝卡·弗格森 / 奥斯卡·伊萨克' data-intro='' data-enough='true' data-rater='10815'>
----------------------------------------------------------------------
沙丘 Dune
7.9
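Each movie's details live in data-* attributes of its <li> element, so no text parsing is needed: attrs is just a dictionary. Below is a standard-library sketch of the same idea with hypothetical markup. Using .get() instead of ['data-rate'] is a defensive choice for items that might lack an attribute.

```python
from html.parser import HTMLParser


class SlideItemCollector(HTMLParser):
    """Collect the attribute dict of every <li> whose class list includes 'ui-slide-item'."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "li" and "ui-slide-item" in classes:
            self.items.append(attributes)


# Hypothetical markup shaped like the Douban <li> element printed above
sample = """
<ul>
  <li class="ui-slide-item s" data-title="Movie A" data-rate="8.1"></li>
  <li class="ui-slide-item" data-title="Movie B" data-rate=""></li>
  <li class="other" data-title="Not a slide item"></li>
</ul>
"""
collector = SlideItemCollector()
collector.feed(sample)
collector.close()

print("Number of movies :", len(collector.items))
for item in collector.items:
    # .get() avoids a KeyError if an item lacks the attribute
    print(item.get("data-title"), item.get("data-rate") or "no rating")
```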

Example pythonRequestsHtml-10.py: list the first 5 movies and their ratings.

# pythonRequestsHtml-10.py
from requests_html import HTMLSession

session = HTMLSession()
url = 'https://movie.douban.com/'
r = session.get(url)

movies = r.html.find('li.ui-slide-item')
count = 0
for m in movies:
    count += 1
    print('Movie no. : ', count)
    print('Movie title : ', m.attrs['data-title'])
    print('Rating : ', m.attrs['data-rate'])
    print('-'*70)
    if count == 5:
        break

Output

Movie no. :  1
Movie title :  沙丘 Dune
Rating :  7.9
----------------------------------------------------------------------
Movie no. :  2
Movie title :  兰心大剧院
Rating :  7.4
----------------------------------------------------------------------
Movie no. :  3
Movie title :  图兰朵：魔咒缘起
Rating :  3.6
----------------------------------------------------------------------
Movie no. :  4
Movie title :  长津湖
Rating :  7.4
----------------------------------------------------------------------
Movie no. :  5
Movie title :  我和我的父辈
Rating :  6.9
----------------------------------------------------------------------
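The manual counter and break can also be written with enumerate() and itertools.islice(), which stop after the first n items without a separate counter. A minimal sketch, using hypothetical attribute dictionaries in place of m.attrs:

```python
from itertools import islice


def top_n(items, n):
    """Yield (number, title, rate) for the first n items, numbered from 1."""
    for number, item in enumerate(islice(items, n), start=1):
        yield number, item["data-title"], item["data-rate"]


# Hypothetical attribute dicts standing in for m.attrs of each <li> element
movies = [{"data-title": f"Movie {i}", "data-rate": f"{5 + i / 10:.1f}"} for i in range(1, 9)]

for number, title, rate in top_n(movies, 5):
    print('Movie no. : ', number)
    print('Movie title : ', title)
    print('Rating : ', rate)
    print('-' * 70)
```

With requests-html, the same helper could presumably be fed `(m.attrs for m in movies)`, since attrs is a dictionary. If fewer than n items exist, islice simply stops early.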

Example pythonRequestsHtml-11.py: an example from the official Requests-HTML site that uses the module's render() method to load dynamic (Ajax) data before searching it.

# pythonRequestsHtml-11.py
from requests_html import HTMLSession

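session = HTMLSession()             # create a session
url = 'http://python-requests.org/'
r = session.get(url)                # send a GET request
r.html.render()                     # run the page's JavaScript; the first call downloads Chromium
result = r.html.search('Python 2 will retire in only {months} months!')
if result:                          # search() returns None when the text is not found
    print(result['months'])
else:
    print('Text not found')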
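render() needs a headless Chromium and cannot be reproduced with the standard library. The search step that follows it can be, though: a named placeholder such as {months} becomes a named regex group, so the result is looked up by name. Below is a sketch on hypothetical page text. It assumes each field name appears only once in the template, because a regex cannot repeat a group name.

```python
import re


def search_named(template, text):
    """Match a template with named '{field}' placeholders; return a dict or None."""
    pattern = ""
    position = 0
    for match in re.finditer(r"\{(\w+)\}", template):
        # Literal text before the placeholder, then a lazy named group for it
        pattern += re.escape(template[position:match.start()])
        pattern += f"(?P<{match.group(1)}>.+?)"
        position = match.end()
    pattern += re.escape(template[position:])
    found = re.search(pattern, text)
    return found.groupdict() if found else None


# Hypothetical page text
text = "Python 2 will retire in only 3 months! Plan your upgrade."
result = search_named("Python 2 will retire in only {months} months!", text)
print(result["months"] if result else "Text not found")
```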


先機致勝 AI Advantage