[C#] Reading and recognizing text with Tesseract.Net SDK [OCR]

Setup
Reading English text
- Result (top: the input image, bottom: the recognized text)
Reading Japanese text
- Result (top: the input image, bottom: the recognized text)

Setup

Install Tesseract.Net SDK from NuGet.
Add the following using directives.

using Patagames.Ocr;
using Patagames.Ocr.Enums;

Reading English text

Code example:

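using System;
using Patagames.Ocr;
using Patagames.Ocr.Enums;

namespace OCRTest
{
    class Program
    {
        // Path to the image to read (replace with your own file).
        static readonly string filepath = @"C:\Users\xxxxx\Desktop\sample.png";

        static void Main(string[] args)
        {
            // Create the OCR engine; the using block disposes it when done.
            using (var api = OcrApi.Create())
            {
                // Initialize the engine for English.
                api.Init(Languages.English);
                // Run OCR on the image file and get the result as plain text.
                string plainText = api.GetTextFromImage(filepath);
                Console.WriteLine(plainText);
            }
            // Wait for Enter so the console window stays open.
            Console.ReadLine();
        }
    }
}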

Result (top: the input image, bottom: the recognized text)

[Image: the input image (top) and the recognized text (bottom)]
The result is mostly correct.
Text placed at an awkward position in the image does not seem to be read well. ("Go is God" came out split apart.)
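
The original does not say exactly how "Go is God" was split. Assuming the split shows up as stray line breaks or extra spaces in the returned string, a small post-processing step might tidy it up. The sketch below is only that: OcrTextCleanup and NormalizeWhitespace are hypothetical names, not part of Tesseract.Net SDK, and it does nothing for characters that were misrecognized.

```csharp
using System;
using System.Text.RegularExpressions;

static class OcrTextCleanup
{
    // Hypothetical helper (not part of Tesseract.Net SDK).
    // Collapses every run of whitespace (spaces, tabs, line breaks)
    // into a single space and trims both ends.
    public static string NormalizeWhitespace(string ocrText)
    {
        if (string.IsNullOrEmpty(ocrText))
        {
            return string.Empty;
        }
        return Regex.Replace(ocrText, @"\s+", " ").Trim();
    }
}
```

It could be applied to the string returned by api.GetTextFromImage before printing, for example Console.WriteLine(OcrTextCleanup.NormalizeWhitespace(plainText)). Note that this also removes intentional line breaks, so it only suits short single-line text.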

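Reading Japanese text

For Japanese, download the trained data from here.
Put it into the tessdata folder under solution folder > project folder > bin folder > Debug/Release folder.
Then pass Languages.Japanese to the api.Init method and you are done.
Other languages should work with the same steps. (Probably.)

Code example (only the argument to the api.Init method has changed):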

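using System;
using Patagames.Ocr;
using Patagames.Ocr.Enums;

namespace OCRTest
{
    class Program
    {
        // Path to the image to read (replace with your own file).
        static readonly string filepath = @"C:\Users\xxxxx\Desktop\sample.png";

        static void Main(string[] args)
        {
            // Create the OCR engine; the using block disposes it when done.
            using (var api = OcrApi.Create())
            {
                // Initialize the engine for Japanese (needs the trained data in tessdata).
                api.Init(Languages.Japanese);
                // Run OCR on the image file and get the result as plain text.
                string plainText = api.GetTextFromImage(filepath);
                Console.WriteLine(plainText);
            }
            // Wait for Enter so the console window stays open.
            Console.ReadLine();
        }
    }
}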
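
Since the Japanese example depends on the trained data being in the tessdata folder, it may help to check for the file before calling api.Init. The following is a minimal sketch under stated assumptions: TessdataCheck is a hypothetical helper, and the file name jpn.traineddata follows the usual Tesseract naming convention, which the original post does not confirm. Adjust the name to match whatever file you actually downloaded.

```csharp
using System;
using System.IO;

static class TessdataCheck
{
    // Hypothetical helper (not part of Tesseract.Net SDK).
    // Builds the expected path of a trained data file inside a tessdata folder.
    // Assumption: files are named "<languageCode>.traineddata", e.g. "jpn.traineddata".
    public static string TrainedDataPath(string baseDir, string languageCode)
    {
        return Path.Combine(baseDir, "tessdata", languageCode + ".traineddata");
    }

    // Returns true if the trained data file exists under baseDir/tessdata.
    public static bool IsInstalled(string baseDir, string languageCode)
    {
        return File.Exists(TrainedDataPath(baseDir, languageCode));
    }
}
```

A possible use is to call TessdataCheck.IsInstalled(AppDomain.CurrentDomain.BaseDirectory, "jpn") at the start of Main and print a message if it returns false. When the program is run from Visual Studio, that base directory is normally the bin\Debug or bin\Release folder mentioned above.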