Getting started with SWF AVM2 ABC binary parsing

The SWF AVM2 ABC binary format is described in the following document.

Today I got stuck decoding the Constant Pool and gave up. Good night.

Splitting the SWF header and tags

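This time I'm not using IO_SWF, to see how far I can get writing everything by hand.
This part took about 30 minutes. Up to here it's familiar territory, so it was easy.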

function swfparse($data) {
    $swf = Array();
    $swf['data'] = $data;
    $swf['sig'] = substr($data, 0, 3);
    $swf['version'] = ord($data[3]);
    $fileLength = unpack_once('V', substr($data, 4, 4));
    $swf['length'] = $fileLength;
    if ($swf['sig'] === 'CWS') {
        $data = substr($data, 0, 8) . gzuncompress(substr($data, 8));
        $swf['data'] = $data;
    } else if ($swf['sig'] === 'ZWS') {
        // XXX: lzuncompress() is not a PHP built-in; an LZMA decompressor has to be supplied separately.
        $data = substr($data, 0, 8) . lzuncompress(substr($data, 8));
        $swf['data'] = $data;
    }
    $nbit = ord($data[8]) >> 3; // nbit of rectangle
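    // RECT (5 bits for nbit + 4 fields of nbit bits, byte-aligned) + frame rate (2) + frame count (2)
    $movieheaderLength = (int) ceil((5 + 4 * $nbit) / 8) + 4;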
    $swf['movieOffset'] = 8 + $movieheaderLength;
    $offset = $swf['movieOffset'];
    $tagRefs = array();
    while ($offset + 2 < $fileLength) {
        $tag = array('offset' => $offset);
        $tag_length = unpack_once('v', substr($data, $offset, 2));
        $tag['code'] = $tag_length >> 6;
        $length = $tag_length & 0x3f;
        if ($length < 0x3f) {
            $tag['length'] = $length;
            $tag['payloadOffset'] = $offset + 2;
            $offset += 2 + $length;
        } else {
            $length = unpack_once('V', substr($data, $offset + 2, 4));
            $tag['length'] = $length;
            $tag['payloadOffset'] = $offset + 2 + 4;
            $offset += 2 + 4 + $length;
        }
        $tagRefs [] = $tag;
    }
    $swf['tagRefs'] = $tagRefs;
    return $swf;
}
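
To make the bit arithmetic in the loop above concrete, here is a small sketch that decodes a single tag header in isolation. The function name decode_tag_header and the sample bytes are made up for illustration. The layout is the same one swfparse relies on: the upper 10 bits are the tag code, the lower 6 bits are the length, and a length of 0x3f means the real length follows as a 32-bit value.

```php
<?php
// Hypothetical helper, not part of the original script: decode one tag header at $offset.
function decode_tag_header($data, $offset) {
    $u = unpack('v', substr($data, $offset, 2)); // 16 bit, little-endian
    $codeAndLength = $u[1];
    $code = $codeAndLength >> 6;       // upper 10 bits: tag code
    $length = $codeAndLength & 0x3f;   // lower 6 bits: short length
    $headerSize = 2;
    if ($length === 0x3f) {            // long form: real length follows as 32 bit, little-endian
        $u = unpack('V', substr($data, $offset + 2, 4));
        $length = $u[1];
        $headerSize = 6;
    }
    return array(
        'code' => $code,
        'length' => $length,
        'payloadOffset' => $offset + $headerSize,
    );
}

// Short form: code 9, length 3 -> (9 << 6) | 3 = 0x0243
$short = decode_tag_header("\x43\x02\xff\xff\xff", 0);
// Long form: code 82 (DoABC), length 300 -> (82 << 6) | 0x3f = 0x14bf, then 300 as 32 bit
$long = decode_tag_header("\xbf\x14" . pack('V', 300), 0);

printf("%d %d %d\n", $short['code'], $short['length'], $short['payloadOffset']); // 9 3 2
printf("%d %d %d\n", $long['code'], $long['length'], $long['payloadOffset']);    // 82 300 6
```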

AVM2 ABC binary

major_version and minor_version come out fine.
But the data that follows them is completely wrong.

function unpack_once($v, $f) { $t = unpack($v, $f) ; return $t[1]; }
// ABC primitive data types
function read_u16($data, &$offset) {
    return ord($data[$offset++]) + ord($data[$offset++]) * 0x100;
}

function read_u30($data, &$offset) {
    $v = 0;
    for ($i = 0 ; $i < 5 ; $i++) {
        $d = ord($data[$offset++]);
        $v += ($d & 0x7f) << (7 * $i);
        if (!($d & 0x80)) {
            break;
        }
    }
    return $v;
}

function read_s32($data, &$offset) {
    $origOffset = $offset;
    $v = read_u30($data, $offset); // $offset is already taken by reference; call-time & is a fatal error since PHP 5.4
    $byteLen = $offset - $origOffset;
    if ($v >> (7 * $byteLen - 1)) { //sign bit
        $v = $v - (1 << (7 * $byteLen));
    }
    return $v;
}

function read_u32($data, &$offset) {
    return read_u30($data, $offset); // XXX
}

function read_d64($data, &$offset) {
    $v = unpack('e', substr($data, $offset, 8)); // little-endian 64bit double ('e' needs PHP 7.0.15+ / 7.1.1+; 'd' is machine byte order)
    $offset += 8;
    return $v[1];
}

function read_string($data, &$offset) {
    $len = read_u30($data, $offset);
    $str = substr($data, $offset, $len);
    $offset += $len;
    return $str;
}

function swfas3strreplace($swf, $replaceTable) {
    $data = $swf['data'];
    foreach ($swf['tagRefs'] as &$tag) {
        if ($tag['code'] !== 82) {
            continue; // skip if is not DoABC tag.
        }
        $offset = $tag['payloadOffset'];
        $length = $tag['length'];
        // version
        $minor_version = read_u16($data, $offset);
        $major_version = read_u16($data, $offset);
        echo "version:$major_version.$minor_version\n";
        // constant pool
        $int_count = read_u30($data, $offset);
        echo "int_count:$int_count\n";
        for ($i = 0 ; $i < $int_count ; $i++) {
            $integer = read_s32($data, $offset);
            echo "\t[$i]: $integer\n";
        }
        $uint_count = read_u30($data, $offset);
        echo "uint_count:$uint_count\n";
        for ($i = 0 ; $i < $uint_count ; $i++) {
            $uinteger = read_u32($data, $offset);
            echo "\t[$i]: $uinteger\n";
        }
        $double_count = read_u30($data, $offset);
        echo "double_count:$double_count\n";
        for ($i = 0 ; $i < $double_count ; $i++) {
            $double = read_d64($data, $offset);
            echo "\t[$i]: $double\n";
        }
        $string_count = read_u30($data, $offset);
        echo "string_count:$string_count\n";
        for ($i = 0 ; $i < $string_count ; $i++) {
//            $string = read_string($data, $offset);
//            echo "\t[$i]: $string\n";
        }
    }
}
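
The u30 and u32 values read above are stored 7 bits per byte, least significant group first, with the high bit of each byte signaling that another byte follows. As a sanity check while debugging, a round trip like the following sketch can help confirm the reader itself is not the problem. u30_encode is a hypothetical helper that is not in the original script, and u30_decode repeats the logic of read_u30 so the snippet runs on its own.

```php
<?php
// Hypothetical helper for testing: encode a non-negative integer, 7 bits per byte.
function u30_encode($value) {
    $out = '';
    do {
        $byte = $value & 0x7f;
        $value >>= 7;
        if ($value > 0) {
            $byte |= 0x80; // more bytes follow
        }
        $out .= chr($byte);
    } while ($value > 0);
    return $out;
}

// Same logic as read_u30 in the listing.
function u30_decode($data, &$offset) {
    $v = 0;
    for ($i = 0; $i < 5; $i++) {
        $d = ord($data[$offset++]);
        $v += ($d & 0x7f) << (7 * $i);
        if (!($d & 0x80)) {
            break;
        }
    }
    return $v;
}

$bytes = u30_encode(300); // low 7 bits 0x2c with the continuation bit set -> 0xac, then 0x02
$offset = 0;
$value = u30_decode($bytes, $offset);
printf("%s -> %d (consumed %d bytes)\n", bin2hex($bytes), $value, $offset);
// prints: ac02 -> 300 (consumed 2 bytes)
```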

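With read_string the reads run past the end of the data, which makes it obvious that something is wrong, but the parse has probably already gone off track before that point.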
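
Two details of the formats are worth checking against the code above as likely candidates for where the parse goes wrong. First, in the SWF file format specification the payload of the DoABC tag (code 82) starts with a 32-bit Flags field and a null-terminated Name string, and the ABC data only begins after those. Second, in the AVM2 overview each constant pool count is the number of entries plus one: entry 0 is implicit and not stored, so a count of n is followed by n - 1 values, and a count of 0 by none. The sketch below applies both adjustments to a hand-built payload. All function names and sample bytes are hypothetical, and this has not been run against a real SWF.

```php
<?php
// Minimal u30 reader, same logic as read_u30 in the listing.
function sketch_read_u30($data, &$offset) {
    $v = 0;
    for ($i = 0; $i < 5; $i++) {
        $d = ord($data[$offset++]);
        $v += ($d & 0x7f) << (7 * $i);
        if (!($d & 0x80)) {
            break;
        }
    }
    return $v;
}

// Skip the DoABC fields that precede the ABC data: Flags (UI32) and Name (null-terminated).
function sketch_skip_doabc_header($data, $offset) {
    $offset += 4;                        // Flags
    $nul = strpos($data, "\0", $offset); // terminator of Name
    if ($nul === false) {
        throw new RuntimeException('unterminated DoABC name');
    }
    return $nul + 1;                     // first byte of the ABC data
}

// Read one constant pool array: a count of n stores n - 1 entries, index 0 is implicit.
// Entries are read as plain u30 here; the int pool would use read_s32 instead.
function sketch_read_u30_pool($data, &$offset) {
    $count = sketch_read_u30($data, $offset);
    $entries = array();
    for ($i = 1; $i < $count; $i++) {
        $entries[$i] = sketch_read_u30($data, $offset);
    }
    return $entries;
}

// Hand-built payload: Flags=1, Name="frame1", minor=16, major=46,
// int_count=0 (no entries), uint_count=3 (two stored entries: 5, 7).
$payload = pack('V', 1) . "frame1\0" . pack('v', 16) . pack('v', 46) . "\x00\x03\x05\x07";

$offset = sketch_skip_doabc_header($payload, 0);
$ver = unpack('vminor/vmajor', substr($payload, $offset, 4));
$offset += 4;
$ints = sketch_read_u30_pool($payload, $offset);
$uints = sketch_read_u30_pool($payload, $offset);

printf("version:%d.%d\n", $ver['major'], $ver['minor']);
printf("ints:%d uints:%d end:%d/%d\n", count($ints), count($uints), $offset, strlen($payload));
```

If the offset lands exactly on the end of the payload, as it does here, the counts and entries were consumed consistently.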

I'll try again tomorrow.