Trying Out PlaidML, Deep Learning on Non-NVIDIA GPUs, on a Mac (with an Integrated GPU)


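The discrete GPUs in recent Macs are made by AMD, so CUDA is not available, and I honestly have no idea how deep learning people on Macs get by. The option of putting an NVIDIA card in an eGPU has also been closed off since Mojave.

Against that backdrop, I learned about PlaidML, a framework that provides GPU acceleration for deep learning on GPUs other than NVIDIA's. It can be used as a Keras backend. I will give it a quick, shallow try.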

github.com

Note that I am a deep learning novice, so this article probably has shortcomings. If you know better, please tell me (gently).

Reference articles

stella-log.hatenablog.com

qiita.com

canplay-music.com

Environment

MacBook Pro (Retina, 13-inch, Early 2015)
macOS 10.14.6 Mojave
Memory: 16 GB
GPU: Intel Iris Graphics 6100

Experiment

On to the main topic.

Machine learning tends to come with version dependencies and the like, so I install virtualenv first.

$ pip3 install virtualenv

This (system) pip is pip 19.2.1, which ships with Python 3.7.4.

Create a virtual environment in a suitable directory.

$ virtualenv plaidml

Activate the virtual environment.

$ . plaidml/bin/activate

To leave the virtual environment, run $ deactivate.
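As an aside, the same kind of environment could probably be created from Python itself. The sketch below uses the standard library's venv module rather than the virtualenv package used in this article, so treat it as a substitute technique. create_env is a hypothetical helper name of my own.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return its path.

    Uses the stdlib venv module (an assumption of this sketch),
    not the virtualenv package installed above.
    """
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)


# Example call (left commented out so nothing is created on import):
# create_env("plaidml")
```

Calling create_env("plaidml") would roughly correspond to $ virtualenv plaidml. Activation still happens in the shell with . plaidml/bin/activate.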

This pristine environment contains only pip, setuptools, and wheel.

(plaidml) $ pip list
Package    Version
---------- -------
pip        19.2.1
setuptools 41.0.1
wheel      0.33.4

Install plaidml-keras and plaidbench, the benchmarking tool.

(plaidml) $ pip install plaidml-keras plaidbench

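At first, the warning WARNING: RECORD line has more than three elements appeared during installation, but after I promptly reinstalled with the just-released PlaidML 0.6.4, it no longer appeared.

Here is the full list of installed packages and their versions.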

cffi                1.12.3
Click               7.0
colorama            0.4.1
enum34              1.1.6
h5py                2.9.0
Keras               2.2.4
Keras-Applications  1.0.8
Keras-Preprocessing 1.1.0
numpy               1.17.0
pip                 19.2.1
plaidbench          0.6.4
plaidml             0.6.4
plaidml-keras       0.6.4
pycparser           2.19
PyYAML              5.1.2
scipy               1.3.0
setuptools          41.0.1
six                 1.12.0
wheel               0.33.4

Set up PlaidML with its dedicated command.

(plaidml) $ plaidml-setup

PlaidML Setup (0.6.4)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the Apache License 2.0


Default Config Devices:
   metal_intel(r)_iris(tm)_graphics_6100.0 : Intel(R) Iris(TM) Graphics 6100 (Metal)

This GPU is listed among the default devices, so it appears to be a verified one.

Experimental Config Devices:
   llvm_cpu.0 : CPU (LLVM)
   opencl_intel_iris(tm)_graphics_6100.0 : Intel Inc. Intel(R) Iris(TM) Graphics 6100 (OpenCL)
   opencl_cpu.0 : Intel CPU (OpenCL)
   metal_intel(r)_iris(tm)_graphics_6100.0 : Intel(R) Iris(TM) Graphics 6100 (Metal)

Using experimental devices can cause poor performance, crashes, and other nastiness.

Enable experimental device support? (y,n)[n]:y

To have more options, I also enabled the experimentally supported devices.

Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:

   1 : llvm_cpu.0
   2 : opencl_intel_iris(tm)_graphics_6100.0
   3 : opencl_cpu.0
   4 : metal_intel(r)_iris(tm)_graphics_6100.0

Default device? (1,2,3,4)[1]:4

I chose Metal on the Iris GPU.

Selected device:
    metal_intel(r)_iris(tm)_graphics_6100.0

Almost done. Multiplying some matrices...
Tile code:
  function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
Whew. That worked.

The matrix multiplication ran correctly.

Save settings to /Users/****/.plaidml? (y,n)[y]:y
Success!

I allowed the settings to be saved under my home directory, and the setup appears to have succeeded.
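The setup output above mentions that the default device can be overridden with PLAIDML_DEVICE_IDS. Here is a minimal sketch of doing that from Python, assuming the variable only needs to be present in the process environment before PlaidML opens a device (I have not verified the exact timing). select_plaidml_device is a hypothetical helper, and the device ID is copied from the listing above.

```python
import os

# Device ID copied from the plaidml-setup listing above.
METAL_IRIS = "metal_intel(r)_iris(tm)_graphics_6100.0"


def select_plaidml_device(device_id):
    """Request a PlaidML device for this process only.

    Assumption: PlaidML reads PLAIDML_DEVICE_IDS from the environment,
    as the setup message suggests.
    """
    os.environ["PLAIDML_DEVICE_IDS"] = device_id


select_plaidml_device(METAL_IRIS)
# Import plaidml.keras only after this point, to be on the safe side.
```

This leaves the saved ~/.plaidml settings untouched and only affects the current process.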

Sorry for the confusion, but training still failed with 0.6.4, so out of laziness everything from here on shows the results I got with PlaidML 0.6.3, unchanged.

Run the benchmark.

(plaidml) $ plaidbench keras mobilenet
Running 1024 examples with mobilenet, batch size 1, on backend plaid
INFO:plaidml:Opening device "metal_intel(r)_iris(tm)_graphics_6100.0"
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
17227776/17225924 [==============================] - 5s 0us/step
Compiling network... Warming up... Running...
Example finished, elapsed: 5.844s (compile), 24.208s (execution)

-----------------------------------------------------------------------------------------
Network Name         Inference Latency         Time / FPS
-----------------------------------------------------------------------------------------
mobilenet            23.64 ms                  628.02 ms / 1.59 fps
Correctness: PASS, max_error: 9.645211321185343e-06, max_abs_error: 7.897615432739258e-07, fail_ratio: 0.0

It printed Correctness: PASS, so the run succeeded. The execution time was 24.208 seconds.
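From the numbers above, the Inference Latency column looks like the execution time divided by the number of examples. That is my own reading of the output, not something I found documented, and latency_ms is a name I made up for the check.

```python
def latency_ms(execution_seconds, examples):
    """Average time per example, in milliseconds."""
    return execution_seconds / examples * 1000.0


# Metal inference run above: 1024 examples in 24.208 s.
metal = latency_ms(24.208, 1024)
print(f"{metal:.2f} ms per example")  # prints "23.64 ms per example"
```

The result matches the 23.64 ms shown in the table, which at least supports the guess.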

Let's try training too.

(plaidml) $ plaidbench --batch-size 16 keras --train mobilenet
Running 1024 examples with mobilenet, batch size 16, on backend plaid
Loading CIFAR data
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 18s 0us/step
INFO:plaidml:Opening device "metal_intel(r)_iris(tm)_graphics_6100.0"
Compiling network...Epoch 1/1
INFO:plaidml:Analyzing Ops: 1076 of 2263 operations complete
16/16 [==============================] - 40s 3s/step - loss: 4.3766 - acc: 0.0000e+00
 Warming up...Epoch 1/1
32/32 [==============================] - 5s 171ms/step - loss: nan - acc: 0.0625
 Running...
Epoch 1/1
1024/1024 [==============================] - 174s 169ms/step - loss: nan - acc: 0.1006
Example finished, elapsed: 40.384s (compile), 173.551s (execution)

-----------------------------------------------------------------------------------------
Network Name         Inference Latency         Time / FPS
-----------------------------------------------------------------------------------------
mobilenet            169.48 ms                 10600.41 ms / 0.09 fps
Correctness: untested. Could not find golden data to compare against.

The loss became nan, so this run failed.
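Since nan never compares equal to anything, including itself, a check like loss == float("nan") would miss it. Below is a small sketch of spotting where the loss goes bad, using the loss values printed by the three phases of the run above. first_nan_step is a hypothetical helper, not part of plaidbench or Keras.

```python
import math


def first_nan_step(losses):
    """Return the index of the first NaN loss, or None if there is none."""
    for step, loss in enumerate(losses):
        if math.isnan(loss):
            return step
    return None


# Losses from the compile, warm-up, and timed phases in the log above.
phase_losses = [4.3766, float("nan"), float("nan")]
print(first_nan_step(phase_losses))  # prints 1 (the warm-up phase)
```

In your own training script, Keras's TerminateOnNaN callback should serve a similar purpose by stopping training once the loss becomes nan.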

With the batch size set to 1, nan no longer appeared, but the run still failed.

(plaidml) $ plaidbench --batch-size 1 keras --train mobilenet
Running 1024 examples with mobilenet, batch size 1, on backend plaid
Loading CIFAR data
INFO:plaidml:Opening device "metal_intel(r)_iris(tm)_graphics_6100.0"
Compiling network...Epoch 1/1
INFO:plaidml:Analyzing Ops: 1327 of 2258 operations complete
1/1 [==============================] - 33s 33s/step - loss: 8.1627 - acc: 0.0000e+00
 Warming up...Epoch 1/1
32/32 [==============================] - 7s 211ms/step - loss: 5.0959 - acc: 0.1562
 Running...
Epoch 1/1
1024/1024 [==============================] - 226s 220ms/step - loss: 2.9138 - acc: 0.1611
Example finished, elapsed: 33.439s (compile), 225.611s (execution)

-----------------------------------------------------------------------------------------
Network Name         Inference Latency         Time / FPS
-----------------------------------------------------------------------------------------
mobilenet            220.32 ms                 12580.38 ms / 0.08 fps
Correctness: FAIL, max_error: 0.49545086008182004, max_abs_error: 4.044239853858016, fail_ratio: 1.0

I switched to OpenCL and tried again.

(plaidml) $ plaidbench keras mobilenet
Running 1024 examples with mobilenet, batch size 1, on backend plaid
INFO:plaidml:Opening device "opencl_intel_iris(tm)_graphics_6100.0"
Compiling network... Warming up... Running...
Example finished, elapsed: 5.431s (compile), 31.153s (execution)

-----------------------------------------------------------------------------------------
Network Name         Inference Latency         Time / FPS
-----------------------------------------------------------------------------------------
mobilenet            30.42 ms                  23.86 ms / 41.92 fps
Correctness: PASS, max_error: 8.218973562179599e-06, max_abs_error: 1.341104507446289e-06, fail_ratio: 0.0

Execution time: 31.153 seconds. Inference works, but training does not:

(plaidml) $ plaidbench --batch-size 1 keras --train mobilenet
Running 1024 examples with mobilenet, batch size 1, on backend plaid
Loading CIFAR data
INFO:plaidml:Opening device "opencl_intel_iris(tm)_graphics_6100.0"
Compiling network...Epoch 1/1
INFO:plaidml:Analyzing Ops: 1051 of 2258 operations complete
1/1 [==============================] - 31s 31s/step - loss: 8.1767 - acc: 0.0000e+00
 Warming up...Epoch 1/1
32/32 [==============================] - 9s 291ms/step - loss: 5.1407 - acc: 0.1562
 Running...
Epoch 1/1
ERROR:plaidml:Unable to read profiling info for CL_PROFILING_COMMAND_QUEUED: CL_PROFILING_INFO_NOT_AVAILABLE
ERROR:plaidml:Unable to read profiling info for CL_PROFILING_COMMAND_QUEUED: CL_PROFILING_INFO_NOT_AVAILABLE
  17/1024 [..............................] - ETA: 5:48 - loss: 3.4137 - acc: 0.2941Segmentation fault: 11

It failed partway through with errors, ending in a segmentation fault.

The plaidbench help output:

(plaidml) $ plaidbench --help
Usage: plaidbench [OPTIONS] COMMAND [ARGS]...

  PlaidML Machine Learning Benchmarks

  plaidbench runs benchmarks for a variety of ML framework,
  framework backend, and neural network combinations.

  For more information, see
  http://www.github.com/plaidml/plaidbench

Options:
  -v, --verbose
  -n, --examples INTEGER          Number of examples to use (over
                                  all epochs)
  --blanket-run                   Run all networks at a range of
                                  batch sizes, ignoring the --batch-
                                  size and --examples options and
                                  the choice of network.
  --results DIRECTORY             Destination directory for results
                                  output
  --callgrind / --no-callgrind    Invoke callgrind during timing
                                  runs
  --epochs INTEGER                Number of epochs per test
  --batch-size INTEGER
  --timeout-secs INTEGER
  --warmup / --no-warmup          Do warmup runs before main timing
  --kernel-timing / --no-kernel-timing
                                  Emit kernel timing info
  --print-stacktraces / --no-print-stacktraces
                                  Print a stack trace if an
                                  exception occurs
  --help                          Show this message and exit.

Commands:
  keras  Benchmarks Keras neural networks.
  onnx   Benchmarks ONNX models.

(plaidml) $ plaidbench keras --help
Usage: plaidbench keras [OPTIONS] [[densenet121|densenet169|densenet
                        201|inception_resnet_v2|inception_v3|mobilen
                        et|mobilenet_v2|nasnet_large|nasnet_mobile|r
                        esnet50|vgg16|vgg19|xception|imdb_lstm]]...

  Benchmarks Keras neural networks.

Options:
  --plaid                         Use PlaidML as the backend
  --tensorflow                    Use TensorFlow as the backend
  --fp16 / --no-fp16              Use half-precision floats,
                                  settings floatx='float16'
  --train / --no-train            Measure training performance
                                  instead of inference
  --tile FILE                     Save network to *.tile file
  --fix-learn-phase / --no-fix-learn-phase
                                  Set the Keras learning_phase to an
                                  integer (rather than an input
                                  tensor)
  --help                          Show this message and exit.

Supported Networks:
  densenet121, densenet169, densenet201, inception_resnet_v2,
  inception_v3, mobilenet, mobilenet_v2, nasnet_large,
  nasnet_mobile, resnet50, vgg16, vgg19, xception, imdb_lstm

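Apparently these networks can be used besides mobilenet.

However, I tried mobilenet_v2 and it failed in the same way.

(In my environment, at least) should I pin my hopes on a future version? If it can't be used for training... what should I try next?

[Update 2019/08/09] I added the two lines shown below to this test script and tried it, but it failed in the same way, so plaidbench does not seem to be at fault.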

https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py

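Also, since Keras currently supports Python only up to 3.6, I suspected that might be the cause and installed Python 3.6.9 to try again, but that failed too. [End of update]

Using it in your own code

Just add the following two lines.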

import plaidml.keras
plaidml.keras.install_backend()

Place them before importing any (other) Keras modules.
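If the same script should also run on a machine without PlaidML, the two lines can be wrapped in a guard. This is only a sketch: install_plaidml_backend is a hypothetical helper of mine, and silently falling back to Keras's default backend when the import fails is my assumption about what one would want.

```python
import importlib


def install_plaidml_backend():
    """Try to route Keras through PlaidML; return True on success."""
    try:
        plaidml_keras = importlib.import_module("plaidml.keras")
    except ImportError:
        # PlaidML is not installed: leave the default Keras backend alone.
        return False
    plaidml_keras.install_backend()
    return True


USING_PLAIDML = install_plaidml_backend()
# Import keras modules only below this line, as noted above.
print("PlaidML backend:", USING_PLAIDML)
```

The ordering rule is the same as for the plain two-line version: the guard has to run before the first Keras import.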

Reference: plaidml.keras - PlaidML documentation


cBlog

Tips for you.
