Haskellで数値計算線形代数編

BLASとLAPACK

数値計算では、連立線形方程式に帰着させられる問題はとても多い。また、固有値問題は分野を越えて応用範囲が広い。そのため、これらの問題を扱うための線形代数ライブラリは古くから活発に開発されてきた。中でも代表的なものは基礎的な行列、ベクトル演算を行うBLASと、その上で線形方程式系や最小2乗問題、固有値問題などを解くLAPACKで、幾つかの実装では最適化されており高速に動作するので、多くの言語のライブラリでも背景で利用されている。

HaskellでもhmatrixパッケージがBLAS/LAPACK バインディングを提供してくれている。BLAS/LAPACKの関数の呼び出しで処理出来る問題の場合、自分の実装でこの速度を超えるのは難しいので、出来るだけ利用して行きたい。

BLASとLAPACK
hmatrixの使い方
numpyやEigenとの比較

hmatrixの使い方

インストール

Haskellを使うのが初めての場合、まずはビルドツールstackから環境を構築するのがおすすめ。stack自体のインストールはこちらを参照。
Install/upgrade - The Haskell Tool Stack

hmatrixパッケージをインストールする。色々なパッケージを使うようになっても、プロジェクトごとに依存関係をstackが解決してくれる。

$ stack update
$ stack install hmatrix

cabalを直接使う場合は以下の通り。環境によってはsudoが必要。

$ cabal update
$ cabal install hmatrix

シェルでghciを起動して、:mでモジュールを読み込める。またファイルに書いたコードは、:lでロードできる。

$ stack ghci
> :m Numeric.LinearAlgebra
> :set prompt "> " -- promptの文字列を変えている ~/.ghciに書いてもよい
> :l [file]

この記事ではファイルに書いたコードをghciから:lでロードし、> に続くコードを実行して試していくことを想定している。main関数を持つファイルはghcでコンパイルして実行可能バイナリにもできる。

モジュールのimport。

import Numeric.LinearAlgebra
import Numeric.LinearAlgebra.Data

-- ghci
> :m Numeric.LinearAlgebra
> :m Numeric.LinearAlgebra.Data

lts-12.0からPreludeの(<>)という同名の関数と競合するようになってしまったので、Numeric.LinearAlgebraの方をqualified importするか、Preludeの方をhidingする必要がある。この記事では簡単のため、Preludeの方をhidingする。古いltsを使っている場合は、この行を消せば動くはず。

-- 名前空間を分ける
-- Numeric.LinearAlgebraの関数はL.付きで呼ぶ必要がある
import qualified Numeric.LinearAlgebra as L

-- Preludeの方を隠す
import Prelude hiding ((<>))

MatrixやVectorの作成と基本的な演算

主な関数を表にまとめてみた。詳細はパッケージのドキュメントを参照。
hmatrix: Numeric Linear Algebra

リストからの生成やリストへの変換

関数	型	説明
(><)	Int -> Int -> [a] -> Matrix a	行数と列数を指定する
fromLists	[ [a] ] -> Matrix a
toLists	Matrix a -> [ [a] ]
vector	[a] -> Vector a
fromList	[a] -> Vector a
toList	Vector a -> [a]
flatten	Matrix a -> Vector a	行ベクトルを1列につなげる
reshape	Int -> Vector a -> Matrix a	列数を指定する
toRows/Columns	Matrix a -> [Vector a]
fromRows/Columns	[Vector a] -> Matrix a
rand	Int -> Int -> IO (Matrix Double)	[0,1)一様分布擬似乱数
randn	Int -> Int -> IO (Matrix Double)	ガウス分布擬似乱数

基本的な2項演算

演算	左辺値	演算子	右辺値
$\bf{u} \pm \bf{v}$	Vector	+ -	Vector
$\bf{u} \cdot \bf{v}$	Vector	<.>	Vector
$\bf{u} \times \bf{v}$	Vector	cross	Vector
$A \pm B$	Matrix	+ -	Matrix
$A \bf{u}$	Matrix	#>	Vector
${\bf u} A$	Vector	<#	Matrix
$A B$	Matrix	<>	Matrix
$A \otimes B$	Matrix	kronecker	Matrix

要素ごとの操作など

操作	関数	説明
要素ごとの関数適用	cmap	(e -> b) -> c e -> c b
indexでのアクセス	!	行列の場合は行ベクトルに
行列のスライス	??	行と列の位置などを指定する m ?? (Pos (idxs [3,4]), Pos (idxs [1,0])) 3行と4行、1列と0列を取る m ?? (Take 3, Drop 2) 3行分を取り、2列分を捨てる
最大値最大値のindex	maxElement maxIndex
最小値最小値のindex	minElement minIndex
昇順にソートその位置の値のソート後のindex	sortVector sortIndex
L1/L2/L $\infty$ ノルム (Vector) 作用素ノルム (Matrix)	norm_1 norm_2 norm_Inf	$\mid\mid {\bf x} \mid\mid_p = \left( \sum_i \mid x_i \mid^p \right)^{1/p}$ $\mid\mid A \mid\mid = \sup_{\mid\mid {\bf x} \mid\mid_p = 1} \mid\mid A {\bf x} \mid\mid_p$ この行列で写された任意の単位ベクトルのノルムの上限
フロベニウスノルム	norm_Frob	$\mid\mid A \mid\mid_F = \sqrt{\sum_{i} \sum_{j} \mid a_{ij} \mid^2}$
行列式	det
対角成分をVectorに	takeDiag

特殊な行列の生成など

行列	関数	説明
$I_n$	ident n	n次単位行列
$a_{ij} = \delta_{ij} x_i$	diag x	対角成分から行列をつくる
$^t A$	tr	転置行列
$A^{-1}$	inv	逆行列
$(A + ^t A) / 2$	sym	対称行列をつくる
$^t AA$	mTm	転置行列との積

3次元の回転行列を使って座標を回転させてみる。

import Prelude hiding ((<>))
import Numeric.LinearAlgebra

rotateX :: Double -> Matrix Double
rotateX t = (3><3) [1.0,   0.0,    0.0,
                    0.0, cos t, -sin t,
                    0.0, sin t,  cos t]

rotateY :: Double -> Matrix Double
rotateY t = (3><3) [ cos t, 0.0, sin t,
                       0.0, 1.0,   0.0,
                    -sin t, 0.0, cos t]

rotateZ :: Double -> Matrix Double
rotateZ t = (3><3) [cos t, -sin t, 0.0,
                    sin t,  cos t, 0.0,
                      0.0,    0.0, 1.0]

> (rotateX pi <> rotateY pi <> rotateZ pi) #> vector [1.0,1.0,1.0]
[1.0000000000000004,1.0,0.9999999999999998]

連立線形方程式

連立線形方程式は行列を使って $A \bf{x} = \bf{b}$ のように書けるので、左から $A^{-1}$ を掛ければ解くことが出来るが、とても小さな行列以外では、クラメルの公式で逆行列を計算するのはコストが高い。そこで、 $A$ を扱いやすい行列の積に分解したり、要素の反復計算で漸近するなど色々な工夫が行われている。

hmatrixで用意されている主な線形ソルバは以下の通り。基本的に行列を特殊な行列の積に分解することで解いてくれる。引数は $\bf{x}$ と $\bf{b}$ を行列化した $AX = B$ の形であたえられる。行列分解を1度計算しておけば、同じ $A$ に対する複数の $\bf{x}$ と $\bf{b}$ の組み合わせを効率的に解くことができる。

関数	対象	説明
<\>	優決定系/劣決定系を含む	汎用線形ソルバ式が多すぎる系や足りない系でもノルムを最小にするような解を計算してくれる
linearSolve	正則行列	正則でない場合を考慮してMaybeを返す中身はluSolve
luSolve	正則行列	$A$ をLU分解 (luPacked) して渡す 1000 $\times$ 1000程度までの小さな正則行列なら基本的にこれを使えば問題ない
ldlSolve	エルミート行列 (実対称行列を含む)	$A$ をLDL分解 (ldlPacked) して渡す修正コレスキー分解制約があるが速い
cholSolve	正定値のエルミート行列	$A$ をコレスキー分解 (chol) して渡す制約があるが速い
triSolve	上三角行列または下三角行列	制約があるがとても速い QR分解 (qr) した上三角行列も解ける $QR{\bf x} = {\bf b}$ $R{\bf x} = ^t Q {\bf b}$
triDiagSolve	三重対角行列	制約があるがとても速い
cgSolve	正定値のエルミート行列疎行列を想定したindex表現であるGMatrixであたえる	共役勾配法 (反復法の一種) 状況によってはとても速いが収束しないこともある大規模な疎行列の時有利

エルミート行列は転置して複素共役を取ると自身と一致するような行列。虚部がすべて0なら実対称行列になる。正定値行列は固有値がすべて正の実数であるような行列。基本的に対象となる行列への制約が強いソルバほど速いので、解こうとしている行列が制約をみたすなら対応するソルバを使うと良い。

乱数で初期化した1000 $\times$ 1000行列をLU分解を使って解いて時間を測ってみる。簡単のためGHCのStrict拡張で正格評価にしている。

{-# LANGUAGE Strict #-}

import Prelude hiding ((<>))
import System.CPUTime
import Numeric.LinearAlgebra

randomLU :: Int -> IO ()
randomLU n = do
  -- 乱数で初期化
  a <- rand n n
  b <- rand n 1
  -- 時間を計測
  start <- getCPUTime
  let x = luSolve (luPacked a) b
  end <- getCPUTime
  -- 誤差を計算
  print $ fromIntegral (end - start) / 1e12
  print $ norm_2 (a <> x - b) / norm_2 b

> randomLU 1000
8.3574e-2
3.1152329739293184e-13

おそらくLAPACKの関数の実行速度が出ている。LU分解ではどの言語のどのような実装でも並列化しなければこれ以上速くするのは難しい気がする。さらに速度を求める場合は、(収束性を考慮する必要が出てくるが) 共役勾配法やSOR法のような反復法を使ったほうが良いかもしれない。

参考文献

連立線形方程式の数値解法についてはこの辺りがわかりやすかった。

http://ri2t.kyushu-u.ac.jp/~watanabe/RESERCH/MANUSCRIPT/KOHO/GEPP/GEPP.pdf
http://nkl.cc.u-tokyo.ac.jp/16n/SolverDirect.pdf
http://nkl.cc.u-tokyo.ac.jp/16n/SolverIterative.pdf
Eigen: Linear algebra and decompositions
Eigen: Benchmark of dense decompositions
Eigen: Solving Sparse Linear Systems

固有値と固有ベクトル

$A \bf{x} = \lambda \bf{x}$ ( $\bf{x} \neq \bf{0}$ ) を満たすような $\lambda$ を固有値といい、 $\bf{x}$ を固有ベクトルという。 $A$ が $n \times n$ 行列の時、複素数値も含めて、 $n$ 個の固有値と対応する $n$ 個の固有ベクトルが存在する (重解でなければ) 。これらをすべて求めるのが固有値問題。

関数	対象	説明
eig	一般	固有値のVectorと固有ベクトルのMatrixをタプルで返す
eigenvalues	一般	固有値のみを求める場合
eigSH	エルミート行列	制約があるが速い
eigenvaluesSH	エルミート行列	固有値のみを求める場合
geigSH	エルミート行列の一般化固有値問題	$A{\bf x} = \lambda B {\bf x}$ $A$ 、 $B$ はエルミート行列、 $B$ は正定値

主成分分析を固有値問題として解いてみる。

import Prelude hiding ((<>))
import Numeric.LinearAlgebra
import Numeric.LinearAlgebra.Data

pca :: Matrix Double -> Vector Double
pca x = let n = fromIntegral $ cols x
            -- 各成分の平均
            avg = vector . map ((/n) . sumElements) $ toRows x
            -- 中心化
            ctr = fromColumns . map (+ (-avg)) $ toColumns x
            vcm = cmap (/n) $ ctr <> tr ctr
            -- 共分散行列の固有値と固有ベクトルを求める
            (s, w) = eigSH . trustSym $ vcm
            -- 分散が最大になる方向を選択する
        in toColumns w !! maxIndex s

randomPca :: Int -> Int -> IO ()
randomPca l m = do
  -- l次元の成分を持つ標本をm個つくる
  x <- rand l m
  let v = pca x
  -- 第1主成分ベクトル
  print v
  -- 第1主成分軸方向への射影
  print . map (<.> v) $ toColumns x

> randomPca 3 10

1000 $\times$ 1000行列について、eig関数の処理時間も測ってみる。基本が非正格評価なので式ごとの処理時間の計測などは簡単ではなさそうなのだが、GHCのStrict拡張やseq関数などを使うと、妥当そうな結果が得られることもある。

randomEigen :: Int -> IO ()
randomEigen n = do
  a <- rand n n

  start <- getCPUTime
  let (s, w) = eig a
  end <- seq w getCPUTime

  print $ fromIntegral (end - start) / 1e12
  print $ norm_2 (complex a <> w - w <> diag s) / norm_2 s

> randomEigen 1000
2.90387
2.2744873540326555e-15

参考文献

固有値問題の数値解法についてはこの辺りがわかりやすかった。

http://nalab.mind.meiji.ac.jp/~mk/labo/text/eigenvalues.pdf
http://www.r-ccs.riken.jp/r-ccssite/wp-content/uploads/2014/03/sps14_enshu2-1.pdf
Eigen: Catalogue of dense decompositions

numpyやEigenとの比較

同様の扱いやすいライブラリであるPythonのnumpyやC++のEigenについてもサンプルコードを書いてみた。また、関数呼び出し部分のみ手元の環境での実行時間を測ってみた。100回計測して平均を取っている。

	線形方程式系 1000 $\times$ 1000	線形方程式系 200 $\times$ 200	固有値問題 200 $\times$ 200	説明
hmatrix	0.08489044s	0.00156016s	0.05794971s	並列化しないなら repaより速い
numpy	0.0847463s	0.0014532s	0.0577511s	処理はできるだけ numpyで完結したい
Eigen	0.0618023s	0.00073476s	0.0619551s	速度的にはBoost よりEigenが良い

環境によって多少の前後はあるかもしれないが、すべて同等の速度だと見ていいと思う。numpyは後ろでLAPACKを使っていた気がするので当たり前の結果かもしれない。Eigenは選択できる模様。

Eigen: Using BLAS/LAPACK from Eigen

小さなコードでは書きやすさもあまり変わらないが、大きなアプリケーションになった時はどうだろうか。計測に使用したコードは以下の通り。

hmatrix

{-# LANGUAGE Strict #-}

import Prelude hiding ((<>))
import Numeric.LinearAlgebra
import System.CPUTime
import System.Environment (getArgs)

randomLU :: Int -> IO Double
randomLU n = do
  a <- rand n n
  b <- rand n 1

  start <- getCPUTime
  let x = luSolve (luPacked a) b
  end <- seq x getCPUTime

  let t = fromIntegral (end - start) / 1e12
  print t
  print $ norm_2 (a <> x - b) / norm_2 b
  return t

randomEigen :: Int -> IO Double
randomEigen n = do
  a <- rand n n

  start <- getCPUTime
  let (s, w) = eig a
  end <- seq w getCPUTime

  let t = fromIntegral (end - start) / 1e12
  print t
  print $ norm_2 (complex a <> w - w <> diag s) / norm_2 s
  return t

main :: IO ()
main = do
  [n,r] <- map read <$> getArgs
  t <- mapM randomLU $ replicate r n
--  t <- mapM randomEigen $ replicate r n
  print $ sum t / fromIntegral r

$ stack ghc -- -o lh -O3 la.hs
$ ./lh 1000 100

numpy

import numpy as np
import numpy.linalg as la
import sys
import time

def randomSLE(rank):
    a = np.random.rand(rank, rank)
    b = np.random.rand(rank)

    start = time.process_time()
    x = la.solve(a, b)
    end = time.process_time()

    t = end - start
    print(f'duration: {t:.7f}s')
    print(la.norm(np.dot(a, x) - b) / la.norm(b))
    return t

def randomEigen(rank):
    a = np.random.rand(rank, rank)

    start = time.process_time()
    s, w = la.eig(a)
    end = time.process_time()

    t = end - start
    print(f'duration: {t:.7f}s')

    d = np.matmul(a, w) - np.dot(w, s)
#    d = np.matmul(a, w) - np.matmul(w, np.dot(np.identity(rank), s))
    print(la.norm(d) / la.norm(s))
    return t

rank = int(sys.argv[1])
repeat = int(sys.argv[2])

t = 0
for i in range(repeat):
    t += randomSLE(rank)
#    t += randomEigen(rank)

print(f'average: {t / repeat:.7f}s')

$ python3 la.py 1000 100

Eigen

#include <chrono>
#include <iostream>
#include <Eigen/Dense>

#define NDEBUG
#define EIGEN_NO_DEBUG

using namespace Eigen;

double randomLU(const int rank) {
  MatrixXd a = MatrixXd::Random(rank, rank);
  MatrixXd b = MatrixXd::Random(rank, 1);

  const auto start = std::chrono::system_clock::now();
  MatrixXd x = a.lu().solve(b);
  const auto end = std::chrono::system_clock::now();

  const double elapsed = std::chrono::duration<double>(end - start).count();
  std::cout << "duration: " << elapsed << "s" << std::endl;

  MatrixXd d = a * x - b;
  std::cout << d.norm() / b.norm() << std::endl;

  return elapsed;
}

double randomEigen(const int rank) {
  MatrixXd a = MatrixXd::Random(rank, rank);

//  ComplexEigenSolver<MatrixXd> es;
  EigenSolver<MatrixXd> es;

  const auto start = std::chrono::system_clock::now();
  es.compute(a);
  MatrixXcd s = es.eigenvalues();
  MatrixXcd w = es.eigenvectors();
  const auto end = std::chrono::system_clock::now();

  const double elapsed = std::chrono::duration<double>(end - start).count();
  std::cout << "duration: " << elapsed << "s" << std::endl;
  MatrixXcd d = a * w - w * s.asDiagonal();
  std::cout << d.norm() / s.norm() << std::endl;

  return elapsed;
}

int main(int argc, char *argv[]) {
  const int rank = std::atoi(argv[1]);
  const int repeat = std::atoi(argv[2]);

  double t = 0.0;
  for (int i = 0; i < repeat; i++) {
    t += randomLU(rank);
//    t += randomEigen(rank);
  }
  std::cout << "average: " << t / repeat << "s" << std::endl;

  return 0;
}

$ clang++ -Ofast -march=native -std=c++14 la.cpp -o lc
$ ./lc 1000 100