
Hateburo: kazeburo hatenablog

Operations Engineer / Site Reliability / Operations Nitpicker / Perl Monger

Deciding how an application/x-www-form-urlencoded parser should behave

For deep and unfathomable reasons, suppose we end up writing an application/x-www-form-urlencoded parser in 2014. Here is a specification for it.

The basis is the W3C spec, with the goal of staying compatible with existing applications.

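  1. Split the application/x-www-form-urlencoded payload on "&" (U+0026) or ";" (U+003B)
  2. Prepare an array to hold the name-value pairs
  3. Process each of the split strings as follows
    1. If the first character of the string is " " (U+0020), remove it
    2. If the string contains "=", the characters up to the first "=" become the name and the remaining characters become the value. If nothing follows the first "=", the value is the empty string. If "=" is the first character of the string, the name is the empty string. If the string contains no "=", the whole string becomes the name and the value is the empty string.
    3. Replace every "+" (U+002B) with " " (U+0020)
    4. Unescape the name and the value, and push them onto the array
  4. Return the array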
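
The steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function name `parse_urlencoded` is made up, "unescape" is taken to mean percent-decoding as UTF-8 via `urllib.parse.unquote`, and the early return for an empty payload is inferred from the test data rather than spelled out in the steps.

```python
import re
from typing import List
from urllib.parse import unquote


def parse_urlencoded(payload: str) -> List[str]:
    """Return a flat [name, value, name, value, ...] list (hypothetical helper)."""
    pairs: List[str] = []  # step 2: array holding the name-value pairs

    # Assumption: an empty payload yields an empty array, not one empty pair.
    if payload == "":
        return pairs

    # Step 1: split on "&" (U+0026) or ";" (U+003B).
    for chunk in re.split(r"[&;]", payload):
        # Step 3.1: drop a single leading space (U+0020), if present.
        if chunk.startswith(" "):
            chunk = chunk[1:]

        # Step 3.2: split at the first "="; without "=", the value is "".
        name, _, value = chunk.partition("=")

        # Step 3.3: "+" becomes a space. This runs before unescaping so that
        # an encoded "%2B" still decodes to a literal "+".
        name = name.replace("+", " ")
        value = value.replace("+", " ")

        # Step 3.4: unescape and push both onto the array.
        pairs.append(unquote(name))
        pairs.append(unquote(value))

    # Step 4: return the array.
    return pairs


if __name__ == "__main__":
    print(parse_urlencoded("a=1&b=2;c=3"))
```

Returning a flat list instead of a dictionary keeps duplicate names and preserves their order, which matches the array described in step 2.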

The test data would look something like this:

'a=b&c=d'     => ["a","b","c","d"]
'a=b;c=d'     => ["a","b","c","d"]
'a=1&b=2;c=3' => ["a","1","b","2","c","3"]
'a==b&c==d'   => ["a","=b","c","=d"]
'a=b& c=d'    => ["a","b","c","d"]
'a=b; c=d'    => ["a","b","c","d"]
'a=b; c =d'   => ["a","b","c ","d"]
'a=b;c= d '   => ["a","b","c"," d "]
'a=b&+c=d'    => ["a","b"," c","d"]
'a=b&+c+=d'   => ["a","b"," c ","d"]
'a=b&c=+d+'   => ["a","b","c"," d "]
'a=b&%20c=d'  => ["a","b"," c","d"]
'a=b&%20c%20=d' => ["a","b"," c ","d"]
'a=b&c=%20d%20' => ["a","b","c"," d "]
'a&c=d'       => ["a","","c","d"]
'a=b&=d'      => ["a","b","","d"]
'a=b&='       => ["a","b","",""]
'&'           => ["","","",""]
'='           => ["",""]
''            => []