Object Detection with the GCP Machine Learning Engine

July 16, 2018


Google Cloud makes it easier to build the model you need on the TensorFlow platform.

With the high-performance VMs available in GCP's ML environment, you can run large-scale training and serve the trained models with less effort.


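Figure 1. Machine learning environment on Google Cloud

• Master: manages the other tasks and monitors overall status. It keeps running until every task has finished.

• Worker: a replica that performs the training

• Parameter server: coordinates the model state shared among the workers

Figure 2. Cloud ML runtime environment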
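To make the three roles concrete, here is a minimal sketch of how a training process might find out which role it plays. It assumes the service exposes the cluster layout to each replica through a `TF_CONFIG` environment variable holding JSON with `cluster` and `task` keys. The host names and the `describe_task` helper are hypothetical and used only for illustration.

```python
import json
import os


def describe_task(tf_config_json):
    """Return (role, index, cluster sizes) parsed from a TF_CONFIG-style JSON string."""
    config = json.loads(tf_config_json or "{}")
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    # Count how many replicas exist for each role (master, worker, ps).
    sizes = {role: len(hosts) for role, hosts in cluster.items()}
    return task.get("type"), task.get("index"), sizes


# Hypothetical example value; on a real job the variable is set for each replica.
example = json.dumps({
    "cluster": {
        "master": ["master-0:2222"],
        "worker": ["worker-0:2222", "worker-1:2222"],
        "ps": ["ps-0:2222", "ps-1:2222"],
    },
    "task": {"type": "worker", "index": 1},
})

role, index, sizes = describe_task(os.environ.get("TF_CONFIG", example))
print(role, index, sizes)
```

With the example value, the process would see itself as the second worker in a cluster of one master, two workers, and two parameter servers.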

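This article walks through using the GCP ML Engine to train a model on the Oxford-IIIT Pet dataset and run detection with it.

Figure 3. Detection example

※ Following the steps in this article can incur significant cost because of the training time and data volume. To keep the bill small, reduce the number of training steps, or stop the training job partway once you understand the overall flow.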

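Object Detection model training procedure

1. Create a VM

Select the Compute Engine > VM instances menu.

Select CREATE INSTANCE.

Configure the instance as follows, then click Create.

– Name/Region: your choice
– OS: Ubuntu 16.04
– Access scopes: Allow full access to all Cloud APIs
– Everything else: default

2. Connect to the VM over SSH

Click the SSH button next to the VM you created.

3. Choose a name for the Cloud Storage bucket that will store the training results, and create it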

export GCS_BUCKET=object-detect-test0
gsutil mb -c regional -l asia-east1 gs://${GCS_BUCKET}

※ The Cloud Storage bucket name (the GCS_BUCKET variable) must be globally unique.
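Because the name has to be unique across all of Cloud Storage, a short random suffix can help avoid collisions. The sketch below is only an illustration: the helper names are hypothetical, and the check covers just a subset of the bucket naming rules (3 to 63 characters, lowercase letters, digits, dashes, underscores and dots, starting and ending with a letter or digit). It cannot tell whether a name is already taken.

```python
import re
import uuid

# Subset of the Cloud Storage naming rules: 3-63 characters drawn from lowercase
# letters, digits, '-', '_' and '.', starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name):
    """Check the name against the subset of naming rules above."""
    return _BUCKET_RE.fullmatch(name) is not None


def unique_bucket_name(prefix):
    """Append a random 8-character hex suffix so a collision is unlikely."""
    return "{}-{}".format(prefix, uuid.uuid4().hex[:8])


candidate = unique_bucket_name("object-detect")
print(candidate, is_plausible_bucket_name(candidate))
```

The printed name could then be exported as GCS_BUCKET before running `gsutil mb`.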

4. Install TensorFlow and the runtime environment on the VM

sudo apt-get update
sudo apt-get install python-pip
pip install tensorflow
sudo apt-get install protobuf-compiler python-pil python-lxml python-tk
pip install Cython
pip install jupyter
pip install matplotlib
pip install raven

5. Download the Object Detection API library

mkdir tensorflow
cd tensorflow
git clone https://github.com/tensorflow/models.git

6. Install the COCO API, which is needed to use the COCO dataset

git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools ~/tensorflow/models/research/

7. Change options and set up the environment for training

cd ~/tensorflow/models/research/
vi object_detection/protos/ssd.proto  # delete "reserved 6;" on line 87
protoc object_detection/protos/*.proto --python_out=.
ls -al object_detection/protos/*.py
cd ~/tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

8. Download the Oxford-IIIT Pet dataset and convert it to TFRecord format

cd ~/tensorflow/models/research/
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
tar -xvf images.tar.gz
tar -xvf annotations.tar.gz
python object_detection/dataset_tools/create_pet_tf_record.py \
    --label_map_path=object_detection/data/pet_label_map.pbtxt \
    --data_dir=`pwd` \
    --output_dir=`pwd`

9. Upload the converted dataset to Cloud Storage

gsutil cp pet_faces_train.record-0000* gs://${GCS_BUCKET}/data/
gsutil cp pet_faces_val.record-0000* gs://${GCS_BUCKET}/data/
gsutil cp object_detection/data/pet_label_map.pbtxt gs://${GCS_BUCKET}/data/pet_label_map.pbtxt
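The `-0000*` wildcard in the upload commands is there because the conversion step writes each split as several shard files. The sketch below lists the shard names such a pattern would match. It rests on two assumptions not stated in this article: that the conversion script writes 10 shards per split, and that shards are named `<base>-<index>-of-<total>` with five-digit, zero-padded numbers. `shard_names` is a hypothetical helper.

```python
import fnmatch


def shard_names(base, num_shards):
    """Build shard file names of the form <base>-00000-of-00010 (assumed convention)."""
    return ["{}-{:05d}-of-{:05d}".format(base, i, num_shards) for i in range(num_shards)]


# Assumption: 10 shards per split.
names = shard_names("pet_faces_train.record", 10)

# The same wildcard that the gsutil command uses.
matched = fnmatch.filter(names, "pet_faces_train.record-0000*")
print(len(matched), matched[0], matched[-1])
```

Under these assumptions the pattern covers shard indexes 00000 to 00009, so all 10 files are uploaded. With more than 10 shards the pattern would miss index 00010 and above, and a wider wildcard would be needed.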

10. Download the pretrained COCO model and upload it to Cloud Storage

wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* gs://${GCS_BUCKET}/data/

11. Configure the COCO API integration (move the common folder from cocoapi into the cocoapi/PythonAPI directory)

cd ~/tensorflow/cocoapi/PythonAPI && mv ../common ./
vi ~/tensorflow/cocoapi/PythonAPI/setup.py
(edit as follows)
ext_modules = [
    Extension(
        'pycocotools._mask',
        sources=['common/maskApi.c', 'pycocotools/_mask.pyx'],
        include_dirs = [np.get_include(), 'common'],

12. Set the matplotlib backend used for plotting (so that Python tk is not required)

vi ~/tensorflow/cocoapi/PythonAPI/pycocotools/coco.py

(edit as follows)
import json
import time
import matplotlib  # added
matplotlib.use('Agg')  # added
import matplotlib.pyplot as plt

13. Archive the modified PythonAPI files

cd ~/tensorflow/cocoapi/
tar -czf pycocotools-2.0.tar.gz PythonAPI/

14. Configure the Object Detection pipeline (faster_rcnn_resnet101_pets.config)

cd ~/tensorflow/models/research/
sed -i "s|PATH_TO_BE_CONFIGURED|"gs://${GCS_BUCKET}"/data|g" \
    object_detection/samples/configs/faster_rcnn_resnet101_pets.config
vi object_detection/samples/configs/faster_rcnn_resnet101_pets.config
(edit the input paths to "pet_faces_train.record*" and "pet_faces_val.record*")
gsutil cp object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
    gs://${GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
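The same substitution can be sketched in Python, which also makes it easy to cap the number of training steps and keep costs down. This is a rough illustration, not the way the API itself processes the file. The `sample` text is a shortened, hypothetical stand-in for the real config, and the `num_steps` field name is an assumption about the config's contents.

```python
import re


def configure_paths(config_text, bucket):
    """Replace the placeholder with the bucket's data directory, as the sed command does."""
    return config_text.replace("PATH_TO_BE_CONFIGURED", "gs://{}/data".format(bucket))


def cap_num_steps(config_text, max_steps):
    """Lower any `num_steps: N` value that exceeds max_steps (assumed field name)."""
    def _cap(match):
        return "{}{}".format(match.group(1), min(int(match.group(2)), max_steps))
    return re.sub(r"(num_steps:\s*)(\d+)", _cap, config_text)


# Hypothetical, shortened stand-in for the real pipeline config.
sample = '''train_config {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  num_steps: 200000
}
train_input_reader {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/pet_faces_train.record*"
  }
}'''

result = cap_num_steps(configure_paths(sample, "object-detect-test0"), 2000)
print(result)
```

Writing `result` back to the config file before the upload would have the same effect as the sed and vi edits, plus the lower step count.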

15. Package the TensorFlow Object Detection code

cd ~/tensorflow/models/research/
# Build the packages that step 17 passes to --packages (dist/ and slim/dist/)
python setup.py sdist
(cd slim && python setup.py sdist)
vi object_detection/samples/cloud/cloud.yml

(edit cloud.yml as follows)
– runtimeVersion: "1.0" ⇒ runtimeVersion: "1.8" (TensorFlow version)
– masterType: standard_gpu ⇒ masterType: complex_model_l_gpu (master machine type)
– workerCount: 2 (number of worker machines)
– workerType: standard_gpu ⇒ workerType: complex_model_l_gpu (worker machine type)
– parameterServerCount: 2 (number of parameter server machines)
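For reference, the settings above can be rendered into the text of a config file with a small helper. This is a sketch under assumptions: the `trainingInput` wrapper, the `scaleTier: CUSTOM` line and the `parameterServerType` value of `standard` do not appear in the list above and are assumed here. `render_cloud_yml` is a hypothetical helper.

```python
def render_cloud_yml(runtime_version, master_type, worker_count, worker_type,
                     ps_count, ps_type):
    """Render the training settings as YAML text (layout is an assumption)."""
    lines = [
        "trainingInput:",
        '  runtimeVersion: "{}"'.format(runtime_version),
        "  scaleTier: CUSTOM",
        "  masterType: {}".format(master_type),
        "  workerCount: {}".format(worker_count),
        "  workerType: {}".format(worker_type),
        "  parameterServerCount: {}".format(ps_count),
        "  parameterServerType: {}".format(ps_type),
    ]
    return "\n".join(lines) + "\n"


cloud_yml = render_cloud_yml(
    runtime_version="1.8",
    master_type="complex_model_l_gpu",
    worker_count=2,
    worker_type="complex_model_l_gpu",
    ps_count=2,
    ps_type="standard",  # assumption: not stated in the settings above
)
print(cloud_yml)
```

Keeping the values in one function call makes it easier to try a smaller cluster, for example `standard_gpu` machines, when cost is a concern.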

16. Set the ML Engine machine types in the configuration file

cd ~/tensorflow/models/research/
vi object_detection/samples/cloud/cloud.yml

※ Use the Cloud ML Engine scale tiers and pricing in the tables below as a reference when choosing a training environment.
This example uses the CUSTOM scale tier and sets the machine types and counts individually.

<Predefined scale tiers and machine types>

Cloud ML Engine scale tier	Compute Engine machine type
BASIC	A single worker instance: n1-standard-4
STANDARD_1	1 master, 4 workers, 3 parameter servers (master: n1-highcpu-8, workers: n1-highcpu-8, parameter servers: n1-standard-4)
PREMIUM_1	1 master, 19 workers, 11 parameter servers (master: n1-highcpu-16, workers: n1-highcpu-16, parameter servers: n1-highmem-8)
BASIC_GPU	A single worker instance: n1-standard-8 with one K80 GPU


<Predefined scale tiers and pricing>

Cloud ML Engine scale tier	Price per hour (training units)
BASIC	$0.2774 (0.5661)
STANDARD_1	$2.9025 (5.9234)
PREMIUM_1	$24.1683 (49.323)
BASIC_GPU	$1.2118 (2.4731)
CUSTOM	Lets you control the number and type of virtual machines used for the training job. See the machine type table below.


<Machine types and pricing>

Cloud ML Engine machine type	Compute Engine machine type	CPU	GPUs	Memory	Price per hour (training units)
standard	n1-standard-4	XS	-	M	$0.2774 (0.5661)
large_model	n1-highmem-8	S	-	XL	$0.6915 (1.4111)
complex_model_s	n1-highcpu-8	S	-	S	$0.4141 (0.845)
complex_model_m	n1-highcpu-16	M	-	M	$0.8281 (1.69)
complex_model_l	n1-highcpu-32	L	-	L	$1.6562 (3.38)
standard_gpu	n1-standard-8	XS	1 (K80)	M	$1.2118 (2.4731)
complex_model_m_gpu	n1-standard-16	M	4 (K80)	M	$3.7376 (7.6278)
complex_model_l_gpu	n1-standard-32	L	8 (K80)	L	$7.4752 (15.2555)
standard_p100 (Beta)	n1-standard-8	XS	1 (P100)	M	$2.6864 (5.4824)
complex_model_m_p100 (Beta)	n1-standard-16	M	4 (P100)	M	$9.636 (19.6653)
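As a worked example of how these prices add up, the sketch below estimates the hourly cost of the CUSTOM cluster from step 15: one master and two workers of type complex_model_l_gpu, plus two parameter servers. The parameter server machine type is not stated in that step, so `standard` is assumed here. The prices are copied from the table above and may no longer be current.

```python
# Hourly prices in USD, copied from the machine type table above.
HOURLY_PRICE = {
    "standard": 0.2774,
    "standard_gpu": 1.2118,
    "complex_model_l_gpu": 7.4752,
}


def hourly_cost(master_type, worker_type, worker_count, ps_type, ps_count,
                prices=HOURLY_PRICE):
    """Sum the hourly price of one master, the workers, and the parameter servers."""
    return (prices[master_type]
            + worker_count * prices[worker_type]
            + ps_count * prices[ps_type])


per_hour = hourly_cost(
    master_type="complex_model_l_gpu",
    worker_type="complex_model_l_gpu",
    worker_count=2,
    ps_type="standard",  # assumption: not specified in step 15
    ps_count=2,
)
print("about ${:.2f} per hour".format(per_hour))
```

Under these assumptions the cluster costs roughly $23 per hour (3 x 7.4752 + 2 x 0.2774 = 22.9804), which is why stopping the job early or lowering the step count matters.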


17. Start training on GCP

gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --job-dir=gs://${GCS_BUCKET}/train \
    --runtime-version 1.8 \
    --region asia-east1 \
    --config object_detection/samples/cloud/cloud.yml \
    -- \
    --train_dir=gs://${GCS_BUCKET}/train \
    --pipeline_config_path=gs://${GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
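The bare `--` in the command separates the flags that gcloud consumes from the flags passed through to the training module. The sketch below assembles the same argument list in Python to make that split explicit. It only builds and prints the command and does not submit anything. `build_training_command` is a hypothetical helper, and the user name and timestamp are placeholder values.

```python
import shlex


def build_training_command(bucket, user, timestamp):
    """Return the argument list for the training job submission."""
    job_name = "{}_object_detection_{}".format(user, timestamp)
    # Flags interpreted by gcloud itself.
    gcloud_flags = [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_name,
        "--packages", "dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz",
        "--module-name", "object_detection.train",
        "--job-dir=gs://{}/train".format(bucket),
        "--runtime-version", "1.8",
        "--region", "asia-east1",
        "--config", "object_detection/samples/cloud/cloud.yml",
    ]
    # Flags forwarded unchanged to the object_detection.train module.
    module_flags = [
        "--train_dir=gs://{}/train".format(bucket),
        "--pipeline_config_path=gs://{}/data/faster_rcnn_resnet101_pets.config".format(bucket),
    ]
    return gcloud_flags + ["--"] + module_flags


# Placeholder user name and timestamp, standing in for `whoami` and `date +%s`.
command = build_training_command("object-detect-test0", "alice", 1531699200)
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list could be handed to `subprocess.run(command, check=True)` on a machine where gcloud is installed and authenticated.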

18. Evaluate the trained model

gcloud ml-engine jobs submit training `whoami`_object_detection_eval_`date +%s` \
    --runtime-version 1.8 \
    --job-dir=gs://${GCS_BUCKET}/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,../../cocoapi/pycocotools-2.0.tar.gz \
    --module-name object_detection.eval \
    --region asia-east1 \
    --scale-tier BASIC_GPU \
    -- \
    --checkpoint_dir=gs://${GCS_BUCKET}/train \
    --eval_dir=gs://${GCS_BUCKET}/eval \
    --pipeline_config_path=gs://${GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config

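19. Check training status

Click ML Engine > Jobs to see the ML jobs.

20. View the training results and test images

tensorboard --logdir=gs://${GCS_BUCKET}

Once training has finished, TensorBoard shows the training results and the test images.

21. Export the trained model

After training finishes, check the checkpoints saved in Cloud Storage and choose the checkpoint number to export.

Once the export below completes, the generated model is available in the output_inference_graph.pb directory.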

export CHECKPOINT_NUMBER=2000
gsutil cp gs://${GCS_BUCKET}/train/model.ckpt-${CHECKPOINT_NUMBER}.* .
python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
    --trained_checkpoint_prefix model.ckpt-${CHECKPOINT_NUMBER} \
    --output_directory output_inference_graph.pb
cd output_inference_graph.pb
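Choosing the checkpoint number can be partly automated. The sketch below extracts the available checkpoint numbers from a file listing such as the output of `gsutil ls gs://${GCS_BUCKET}/train`. The listing shown is a hypothetical example, and the helper assumes each checkpoint has a `model.ckpt-<number>.index` file.

```python
import re

# Assumption: every complete checkpoint includes a model.ckpt-<number>.index file.
_CKPT_RE = re.compile(r"model\.ckpt-(\d+)\.index$")


def available_checkpoints(filenames):
    """Return the sorted checkpoint numbers found in a list of file names."""
    matches = (_CKPT_RE.search(name) for name in filenames)
    return sorted({int(m.group(1)) for m in matches if m})


# Hypothetical listing of the training directory.
listing = [
    "gs://object-detect-test0/train/checkpoint",
    "gs://object-detect-test0/train/model.ckpt-0.index",
    "gs://object-detect-test0/train/model.ckpt-0.meta",
    "gs://object-detect-test0/train/model.ckpt-2000.data-00000-of-00001",
    "gs://object-detect-test0/train/model.ckpt-2000.index",
    "gs://object-detect-test0/train/model.ckpt-2000.meta",
]

numbers = available_checkpoints(listing)
print(numbers, "latest:", numbers[-1])
```

The last number in the list is the most recent checkpoint and is a natural value for CHECKPOINT_NUMBER.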

The exported model can then be integrated into an Object Detection application.
