Part 3: Evaluating the Model

Monday, September 13, 2021

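Overview

In this part of the tutorial, you will:

Evaluate your model on test data.
Connect your model to Spot.
View your model's detections live on Spot's tablet.

Converting a checkpoint to an online model

First, we need to convert our training output into a model that we can use online.

Copy the conversion script to a convenient location: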

cp models-with-protos/research/object_detection/exporter_main_v2.py .

mkdir -p dogtoy/exported-models/dogtoy-model

Run the exporter:

python exporter_main_v2.py --input_type image_tensor --pipeline_config_path dogtoy/models/my_ssd_resnet50_v1_fpn/pipeline.config --trained_checkpoint_dir dogtoy/models/my_ssd_resnet50_v1_fpn/ --output_directory dogtoy/exported-models/dogtoy-model
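
Before moving on, it can be worth confirming that the exporter actually wrote a model. The following is a minimal sketch, assuming the standard TensorFlow SavedModel layout (a saved_model/ subdirectory containing saved_model.pb). The helper name find_saved_model is hypothetical and is not part of the tutorial's scripts.

```python
import os


def find_saved_model(export_dir):
    """Return the path to saved_model.pb under export_dir, or None if it is missing.

    Assumes the exporter writes <export_dir>/saved_model/saved_model.pb.
    """
    candidate = os.path.join(export_dir, 'saved_model', 'saved_model.pb')
    return candidate if os.path.isfile(candidate) else None


if __name__ == '__main__':
    path = find_saved_model('dogtoy/exported-models/dogtoy-model')
    print('Found: ' + path if path else 'No saved_model.pb found; re-run the exporter.')
```

If the file is missing, the commands in the following sections that point at dogtoy/exported-models/dogtoy-model/saved_model will not be able to load the model.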

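Evaluating the model

Now we will run our model on some images and make sure the results are acceptable.

Download the eval.py script and save it in ~/fetch
Create a directory for the output images: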

mkdir -p dogtoy/output

Evaluate all of the images:

python eval.py -i dogtoy/images -m dogtoy/exported-models/dogtoy-model/saved_model -l dogtoy/annotations/label_map.pbtxt -o dogtoy/output

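If everything went well, you will have a set of images with bounding boxes drawn around the dog toy!

Troubleshooting

If many of the bounding boxes are incorrect or missing, you will need to troubleshoot.

Is the model failing on images in the training set?

If so, something went wrong during training, because the model cannot handle data it has already seen. This usually means there is a structural problem with your training.

Double-check that you made all of the changes to pipeline.config from our checklist.
Train for longer.

Does the model perform well on the training set but poorly on the test set?

Do you have at least 300 labeled images?
Did you move the toy during data capture to get multiple views?
Do your images cover a wide variety of viewpoints?
Do you have images both close to and far from the dog toy?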

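Connecting to Spot

Next, we will use the Network Compute Bridge to connect to Spot, which will let us view results on the tablet and use our model in a script.

To do this, we will write a script that connects to Spot and offers our model as a network compute resource.

Network compute server overview

We will write a small server that takes an image, runs our model, and returns bounding box coordinates.

Create (or download) network_compute_server.py in your ~/fetch folder. We will walk through each part: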

import argparse
import io
import os
import sys
import time
import logging

import cv2
from PIL import Image
import numpy as np

from bosdyn.api import network_compute_bridge_service_pb2_grpc
from bosdyn.api import network_compute_bridge_pb2
from bosdyn.api import image_pb2
from bosdyn.api import header_pb2
import bosdyn.client
import bosdyn.client.util
import grpc
from concurrent import futures
import tensorflow as tf

import queue
import threading
from google.protobuf import wrappers_pb2
from object_detection.utils import label_map_util

kServiceAuthority = "auth.spot.robot"

Import a bunch of packages and define a constant.

class TensorFlowObjectDetectionModel:
    def __init__(self, model_path, label_path):
        self.detect_fn = tf.saved_model.load(model_path)
        self.category_index = label_map_util.create_category_index_from_labelmap(label_path, use_display_name=True)
        self.name = os.path.basename(os.path.dirname(model_path))

    def predict(self, image):
        input_tensor = tf.convert_to_tensor(image)
        input_tensor = input_tensor[tf.newaxis, ...]
        detections = self.detect_fn(input_tensor)

        return detections

This class loads our model and has a function to run the model on an image. We will call that function below.
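
One consequence of how self.name is computed is that the model is named after the parent directory of the path you pass in. Here is a small sketch using only the standard library. It uses posixpath so the result is the same on any OS, and model_name_from_path is a hypothetical helper that mirrors the expression in __init__.

```python
import posixpath


def model_name_from_path(model_path):
    # Same expression that TensorFlowObjectDetectionModel.__init__ uses for self.name.
    return posixpath.basename(posixpath.dirname(model_path))


# Prints "dogtoy-model": the directory that contains saved_model.
print(model_name_from_path('dogtoy/exported-models/dogtoy-model/saved_model'))

# Prints "saved_model": a trailing slash shifts the result by one level.
print(model_name_from_path('dogtoy/exported-models/dogtoy-model/saved_model/'))
```

So if you want the model to be listed as dogtoy-model, pass the saved_model path without a trailing slash.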

def process_thread(args, request_queue, response_queue):
    # Load the model(s)
    models = {}
    for model in args.model:
        this_model = TensorFlowObjectDetectionModel(model[0], model[1])
        models[this_model.name] = this_model

    print('')
    print('Service ' + args.name + ' running on port: ' + str(args.port))

    print('Loaded models:')
    for model_name in models:
        print('    ' + model_name)

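This is the main function of our script.

We will support multiple models (we will need that later).
Because our computation could take a long time, we do not want to do it inside the GRPC callback. Instead, we will use a separate thread.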

    while True:
        request = request_queue.get()

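Set up a while loop that keeps the thread alive forever.
Unpack the GRPC request that comes in via the Python queue.

Note: this call is blocking, so we wait here until a request arrives (that is how we avoid a while True loop using 100% CPU).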
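
The blocking behavior can be seen in isolation with a short sketch that uses only the standard library. The worker sleeps inside get() until something is put on the queue. The None sentinel is an addition for this sketch so that it can exit cleanly; the tutorial's thread runs forever.

```python
import queue
import threading


def worker(request_queue, response_queue):
    while True:
        item = request_queue.get()  # Blocks (sleeps) until an item is available.
        if item is None:            # Sentinel used only in this sketch to stop the thread.
            break
        response_queue.put(item * 2)


def run_demo():
    request_queue = queue.Queue()
    response_queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(request_queue, response_queue))
    thread.start()

    request_queue.put(21)                   # Wakes the worker.
    result = response_queue.get(timeout=5)  # Blocks until the worker answers.

    request_queue.put(None)                 # Ask the worker to exit.
    thread.join(timeout=5)
    return result


if __name__ == '__main__':
    print(run_demo())
```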

        if isinstance(request, network_compute_bridge_pb2.ListAvailableModelsRequest):
            out_proto = network_compute_bridge_pb2.ListAvailableModelsResponse()
            for model_name in models:
                out_proto.available_models.append(model_name)
            response_queue.put(out_proto)
            continue
        else:
            out_proto = network_compute_bridge_pb2.NetworkComputeResponse()

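The GRPC protocol can ask our server which models it supports. Here we check whether we received a ListAvailableModelsRequest, and if so, we reply with our models and then continue to wait for the next request.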

        # Find the model
        if request.input_data.model_name not in models:
            err_str = 'Cannot find model "' + request.input_data.model_name + '" in loaded models.'
            print(err_str)

            # Set the error in the header.
            out_proto.header.error.code = header_pb2.CommonError.CODE_INVALID_REQUEST
            out_proto.header.error.message = err_str
            response_queue.put(out_proto)
            continue

        model = models[request.input_data.model_name]

The input request includes a model name. Find that model or report an error.

        # Unpack the incoming image.
        if request.input_data.image.format == image_pb2.Image.FORMAT_RAW:
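            # Convert the PIL image to a NumPy array so that cv2.cvtColor and
            # image.shape below can operate on it.
            pil_image = np.array(Image.open(io.BytesIO(request.input_data.image.data)))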
            if request.input_data.image.pixel_format == image_pb2.Image.PIXEL_FORMAT_GREYSCALE_U8:
                # If the input image is grayscale, convert it to RGB.
                image = cv2.cvtColor(pil_image, cv2.COLOR_GRAY2RGB)

            elif request.input_data.image.pixel_format == image_pb2.Image.PIXEL_FORMAT_RGB_U8:
                # Already an RGB image.
                image = pil_image

            else:
                print('Error: image input in unsupported pixel format: ', request.input_data.image.pixel_format)
                response_queue.put(out_proto)
                continue

        elif request.input_data.image.format == image_pb2.Image.FORMAT_JPEG:
            dtype = np.uint8
            jpg = np.frombuffer(request.input_data.image.data, dtype=dtype)
            image = cv2.imdecode(jpg, -1)

            if len(image.shape) < 3:
                # If the input image is grayscale, convert it to RGB.
                image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)

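        # NumPy image shape is (rows, cols[, channels]), so despite their names
        # image_width holds the row count and image_height the column count.
        # The box scaling below relies on exactly this pairing.
        image_width = image.shape[0]
        image_height = image.shape[1]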

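Images can arrive in a number of formats. Here we unpack:

GRAYSCALE or RGB RAW images
JPEG images

Either way, we always convert to RGB, because that is what our model uses.

We want to support all of these image types because images can come directly from the robot or from a user uploading their own image.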
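
For RAW images the payload is just pixel bytes, so one way to produce the RGB array the model expects is to reshape the buffer with NumPy. This is a sketch under assumptions rather than the tutorial's code. It assumes the image's row and column counts are available from the request and that pixels are 8-bit. The helper name raw_to_rgb is hypothetical.

```python
import numpy as np


def raw_to_rgb(data, rows, cols, is_grayscale):
    """Turn a raw 8-bit pixel buffer into a (rows, cols, 3) RGB array."""
    pixels = np.frombuffer(data, dtype=np.uint8)
    if is_grayscale:
        gray = pixels.reshape(rows, cols)
        # Repeat the single channel three times to get RGB.
        return np.stack([gray, gray, gray], axis=-1)
    return pixels.reshape(rows, cols, 3)


# A 2x3 grayscale buffer becomes a 2x3x3 RGB array.
print(raw_to_rgb(bytes([0, 10, 20, 30, 40, 50]), 2, 3, True).shape)
```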

        detections = model.predict(image)

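Run the model on the image. The output format is a little wonky, so we unpack it into a more reasonable format: the boxes, classes, and scores arrays that the loop below reads.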
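
The listing here does not show that unpacking step, although the loop that follows reads boxes, classes, scores, and num_objects. Below is a minimal sketch of what it could look like. It assumes the TensorFlow Object Detection API's usual output dictionary, with the keys num_detections, detection_boxes, detection_classes, and detection_scores, each with a leading batch dimension. The helper name unpack_detections is hypothetical, and fake NumPy data stands in for real model output.

```python
import numpy as np


def unpack_detections(detections):
    """Drop the batch dimension and keep only the first num_detections entries."""
    num_detections = int(np.asarray(detections['num_detections'])[0])
    boxes = np.asarray(detections['detection_boxes'])[0, :num_detections]
    classes = np.asarray(detections['detection_classes'])[0, :num_detections].astype(np.int64)
    scores = np.asarray(detections['detection_scores'])[0, :num_detections]
    return boxes, classes, scores


# Fake model output: a batch of one image with two valid detections (assumption
# for illustration; real values come from model.predict).
fake = {
    'num_detections': np.array([2.0]),
    'detection_boxes': np.array([[[0.1, 0.2, 0.5, 0.6],
                                  [0.0, 0.0, 1.0, 1.0],
                                  [0.0, 0.0, 0.0, 0.0]]]),
    'detection_classes': np.array([[1.0, 1.0, 0.0]]),
    'detection_scores': np.array([[0.9, 0.4, 0.0]]),
}
boxes, classes, scores = unpack_detections(fake)
num_objects = 0  # Counter that the loop increments for each accepted detection.
print(boxes.shape, classes.tolist(), scores.tolist())
```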

        for i in range(boxes.shape[0]):
            if scores[i] < request.input_data.min_confidence:
                continue

            box = tuple(boxes[i].tolist())

            # Boxes come in with normalized coordinates.  Convert to pixel values.
            box = [box[0] * image_width, box[1] * image_height, box[2] * image_width, box[3] * image_height]

            score = scores[i]

            if classes[i] in model.category_index.keys():
                label = model.category_index[classes[i]]['name']
            else:
                label = 'N/A'

            num_objects += 1

            print('Found object with label: "' + label + '" and score: ' + str(score))

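Loop through each object and...

Check whether the score is high enough.
Convert from normalized coordinates (fraction of the way across/down the image) to pixel coordinates.
Unpack the class label.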
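
The coordinate conversion can be sketched on its own, assuming the TensorFlow Object Detection API's box order of normalized [ymin, xmin, ymax, xmax]. The helper name box_to_pixel_corners is hypothetical. It returns (x, y) corners in the same order the server uses for its vertexes.

```python
def box_to_pixel_corners(box, image_rows, image_cols):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to four (x, y) pixel corners."""
    ymin, xmin, ymax, xmax = box
    top, bottom = ymin * image_rows, ymax * image_rows
    left, right = xmin * image_cols, xmax * image_cols
    # Same corner order as the server: top-left, top-right, bottom-right, bottom-left.
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


# A box covering the right half, middle band of a 640x480 image.
print(box_to_pixel_corners([0.25, 0.5, 0.75, 1.0], image_rows=480, image_cols=640))
# → [(320.0, 120.0), (640.0, 120.0), (640.0, 360.0), (320.0, 360.0)]
```

Note that y values scale with the number of rows (image height) and x values with the number of columns (image width).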

            point1 = np.array([box[1], box[0]])
            point2 = np.array([box[3], box[0]])
            point3 = np.array([box[3], box[2]])
            point4 = np.array([box[1], box[2]])

            # Add data to the output proto.
            out_obj = out_proto.object_in_image.add()
            out_obj.name = "obj" + str(num_objects) + "_label_" + label

            vertex1 = out_obj.image_properties.coordinates.vertexes.add()
            vertex1.x = point1[0]
            vertex1.y = point1[1]

            vertex2 = out_obj.image_properties.coordinates.vertexes.add()
            vertex2.x = point2[0]
            vertex2.y = point2[1]

            vertex3 = out_obj.image_properties.coordinates.vertexes.add()
            vertex3.x = point3[0]
            vertex3.y = point3[1]

            vertex4 = out_obj.image_properties.coordinates.vertexes.add()
            vertex4.x = point4[0]
            vertex4.y = point4[1]

Now that we have the result, we pack it into the output proto format:

Bounding box coordinates
Object label

            # Pack the confidence value.
            confidence = wrappers_pb2.FloatValue(value=score)
            out_obj.additional_properties.Pack(confidence)

The confidence value is packed into an Any field. You could pack lots of other things in here for your application.

            if not args.no_debug:
                polygon = np.array([point1, point2, point3, point4], np.int32)
                polygon = polygon.reshape((-1, 1, 2))
                cv2.polylines(image, [polygon], True, (0, 255, 0), 2)

                caption = "{}: {:.3f}".format(label, score)
                left_x = min(point1[0], min(point2[0], min(point3[0], point4[0])))
                top_y = min(point1[1], min(point2[1], min(point3[1], point4[1])))
                cv2.putText(image, caption, (int(left_x), int(top_y)), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                            (0, 255, 0), 2)

For debugging, we draw the bounding boxes on the image along with the label text.

        print('Found ' + str(num_objects) + ' object(s)')

        if not args.no_debug:
            debug_image_filename = 'network_compute_server_output.jpg'
            cv2.imwrite(debug_image_filename, image)
            print('Wrote debug image output to: "' + debug_image_filename + '"')

After our loop, we print the number of objects and write our debug image.

        response_queue.put(out_proto)

Finally, we use another Python queue to send the output proto back to the GRPC servicer. After this, we are ready for the next request.

class NetworkComputeBridgeWorkerServicer(
        network_compute_bridge_service_pb2_grpc.NetworkComputeBridgeWorkerServicer):

    def __init__(self, thread_input_queue, thread_output_queue):
        super(NetworkComputeBridgeWorkerServicer, self).__init__()

        self.thread_input_queue = thread_input_queue
        self.thread_output_queue = thread_output_queue

    def NetworkCompute(self, request, context):
        print('Got NetworkCompute request')
        self.thread_input_queue.put(request)
        out_proto = self.thread_output_queue.get()
        return out_proto

    def ListAvailableModels(self, request, context):
        print('Got ListAvailableModels request')
        self.thread_input_queue.put(request)
        out_proto = self.thread_output_queue.get()
        return out_proto

This is the GRPC servicer. It is mostly boilerplate code that connects GRPC network requests to our Python queues and writes our thread's responses back.

def register_with_robot(options):
    """ Registers this worker with the robot's Directory."""
    ip = bosdyn.client.common.get_self_ip(options.hostname)
    print('Detected IP address as: ' + ip)

    sdk = bosdyn.client.create_standard_sdk("tensorflow_server")

    robot = sdk.create_robot(options.hostname)

    # Authenticate robot before being able to use it
    robot.authenticate(options.username, options.password)

    directory_client = robot.ensure_client(
        bosdyn.client.directory.DirectoryClient.default_service_name)
    directory_registration_client = robot.ensure_client(
        bosdyn.client.directory_registration.DirectoryRegistrationClient.default_service_name)

    # Check to see if a service is already registered with our name
    services = directory_client.list()
    for s in services:
        if s.name == options.name:
            print("WARNING: existing service with name, \"" + options.name + "\", removing it.")
            directory_registration_client.unregister(options.name)
            break

    # Register service
    print('Attempting to register ' + ip + ':' + options.port + ' onto ' + options.hostname + ' directory...')
    directory_registration_client.register(options.name, "bosdyn.api.NetworkComputeBridgeWorker", kServiceAuthority, ip, int(options.port))

Authenticate with Spot.
Check whether the directory already has a service registered under our name and, if so, remove it.
Add ourselves to Spot's directory.

def main(argv):
    default_port = '50051'

    parser = argparse.ArgumentParser()
    parser.add_argument('-m', '--model', help='[MODEL_DIR] [LABELS_FILE.pbtxt]: Path to a model\'s directory and path to its labels .pbtxt file', action='append', nargs=2, required=True)
    parser.add_argument('-p', '--port', help='Server\'s port number, default: ' + default_port,
                        default=default_port)
    parser.add_argument('-d', '--no-debug', help='Disable writing debug images.', action='store_true')
    parser.add_argument('-n', '--name', help='Service name', default='fetch-server')
    bosdyn.client.util.add_common_arguments(parser)

    options = parser.parse_args(argv)

    print(options.model)

    for model in options.model:
        if not os.path.isdir(model[0]):
            print('Error: model directory (' + model[0] + ') not found or is not a directory.')
            sys.exit(1)

    # Perform registration.
    register_with_robot(options)

Set up our arguments and call our directory registration function.

    # Thread-safe queues for communication between the GRPC endpoint and the ML thread.
    request_queue = queue.Queue()
    response_queue = queue.Queue()

    # Start server thread
    thread = threading.Thread(target=process_thread, args=([options, request_queue, response_queue]))
    thread.start()

Set up and start our machine learning thread.

    # Set up GRPC endpoint
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    network_compute_bridge_service_pb2_grpc.add_NetworkComputeBridgeWorkerServicer_to_server(
        NetworkComputeBridgeWorkerServicer(request_queue, response_queue), server)
    server.add_insecure_port('[::]:' + options.port)
    server.start()

GRPC boilerplate code to start our service.

    print('Running...')
    thread.join()

    return True

if __name__ == '__main__':
    logging.basicConfig()
    if not main(sys.argv[1:]):
        sys.exit(1)

Have the main thread wait for the processing thread forever.

Running the model with Spot

Run the script:

python network_compute_server.py -m dogtoy/exported-models/dogtoy-model/saved_model dogtoy/annotations/label_map.pbtxt --username user --password YOUR_ROBOTS_PASSWORD 192.168.80.3

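The arguments are:

Path to the model to use
Path to the labels file
Username (usually user)
Password
IP address of the robot. Above, we used the default IP, 192.168.80.3, which applies when the robot is hosting its own access point. Yours may differ depending on how Spot is connected to your network.

Now let's check that the server registered itself with the directory. In a new terminal (do not forget to re-enter your virtualenv):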

source my_spot_env/bin/activate
python -m bosdyn.client 192.168.80.3 --username user --password YOUR_ROBOTS_PASSWORD dir list

Fill in your IP, username, and password. If it worked, you should see an entry like this:

name           type                                    authority         tokens
-------------------------------------------------------------------------------
[...]
fetch-server   bosdyn.api.NetworkComputeBridgeWorker   auth.spot.robot   user
[...]

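Now we can see the results! Grab Spot's tablet, connect to the robot, and stand it up.

Select Hamburger Menu > Utilities > ML Model Viewer

If your server is in the directory, the Server and Model fields should fill in automatically. Select the camera you want to use and press Start.

Look at the terminal where network_compute_server.py is running, and you should start to see requests from the robot.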

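Troubleshooting

The most common problem here is a firewall. The robot needs to be able to connect to your server (by default on port 50051). To determine whether this is the issue:

Does the server print Got ListAvailableModels request when you enter the ML Model Viewer screen?
If so, your networking is working; otherwise, keep debugging.
ufw is a common firewall. Check whether it is running: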

$ service ufw status

    ● ufw.service - Uncomplicated firewall
       Loaded: loaded (/lib/systemd/system/ufw.service; enabled; vendor preset: enab
---->  Active: active (exited) since Fri 2021-02-19 20:07:28 EST; 2s ago
         Docs: man:ufw(8)
      Process: 14875 ExecStop=/lib/ufw/ufw-init stop (code=exited, status=0/SUCCESS)
      Process: 26704 ExecStart=/lib/ufw/ufw-init start quiet (code=exited, status=0/
     Main PID: 26704 (code=exited, status=0/SUCCESS)

If you see Active: active on the line marked with the arrow above, consider opening the port in your firewall.
If ufw is not running, you will see Active: inactive (dead)
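
A quick way to probe the port from another machine on the same network is a plain TCP connection attempt. This is a rough sketch using only the standard library. The helper name port_is_open is hypothetical, and a successful connection only shows that something is accepting TCP connections on that port from where you ran it, not that the robot itself can reach it.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, port_is_open('SERVER_IP', 50051), with SERVER_IP replaced by the address of the computer running network_compute_server.py, should return True while the server is running and the firewall allows the connection.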

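If everything worked, you should see bounding boxes around the dog toy in the image. You can drive the robot from this screen by enabling power.

Once you are happy with your model, head to Part 4, where we will integrate the Spot Manipulation API to pick up the dog toy.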
