Update html5lib-1.1

JonnyWong16 2021-10-14 22:49:47 -07:00
parent 3a116486e7
commit 586fd15464
142 changed files with 90234 additions and 2393 deletions


@@ -0,0 +1,107 @@
Tokenizer tests
===============
The test format is [JSON](http://www.json.org/). This has the advantage
that the syntax allows backward-compatible extensions to the tests and
the disadvantage that it is relatively verbose.
Basic Structure
---------------
{"tests": [
    {"description": "Test description",
    "input": "input_string",
    "output": [expected_output_tokens],
    "initialStates": [initial_states],
    "lastStartTag": last_start_tag,
    "errors": [parse_errors]
    }
]}
Multiple tests per file are allowed simply by adding more objects to the
"tests" list.
Each parse error is an object that contains the error `code` and the
one-based error location indices `line` and `col`.
`description`, `input` and `output` are always present. The other values
are optional.
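
As a rough illustration of the structure above, the following Python sketch parses a test file and fills in the optional keys. The function name `load_tests` is hypothetical, and treating a missing `errors` list as empty and a missing `lastStartTag` as `None` is an assumption of this sketch, not something the format mandates.

```python
import json


def load_tests(text):
    """Parse the JSON text of a test file and fill in optional keys."""
    tests = json.loads(text)["tests"]
    for test in tests:
        # The format says these three keys are always present.
        for key in ("description", "input", "output"):
            if key not in test:
                raise ValueError("missing required key: " + key)
        # Optional keys. The documented default for initialStates is
        # ["Data state"]; the other two defaults are assumptions.
        test.setdefault("initialStates", ["Data state"])
        test.setdefault("lastStartTag", None)
        test.setdefault("errors", [])
    return tests


sample = '{"tests": [{"description": "d", "input": "x", "output": [["Character", "x"]]}]}'
first = load_tests(sample)[0]
print(first["initialStates"], first["errors"])
```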
### Test set-up
`test.input` is a string containing the characters to pass to the
tokenizer. Specifically, it represents the characters of the **input
stream**, and so implementations are expected to perform the processing
described in the spec's **Preprocessing the input stream** section
before feeding the result to the tokenizer.
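
A minimal sketch of that preprocessing step in Python, assuming it consists of newline normalization (each CR LF pair and each lone CR becomes a single LF). The helper name `normalize_newlines` is hypothetical, and any other preprocessing a given spec revision may require is left out.

```python
def normalize_newlines(text):
    """Replace CRLF pairs and lone CRs with LF before tokenizing."""
    # Handle CRLF first so that a pair collapses to one LF, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# A harness would tokenize the normalized text, not the raw input.
print(repr(normalize_newlines("<?\r\n\n")))
```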
If `test.doubleEscaped` is present and `true`, then `test.input` is not
quite as described above. Instead, it must first be subjected to another
round of unescaping (i.e., in addition to any unescaping involved in the
JSON import), and the result of *that* represents the characters of the
input stream. Currently, the only unescaping required by this option is
to convert each sequence of the form \\uHHHH (where H is a hex digit)
into the corresponding Unicode code point. (Note that this option also
affects the interpretation of `test.output`.)
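
The extra unescaping round can be sketched in Python as follows. The name `unescape` is hypothetical. The sketch converts each `\uHHHH` sequence independently, so an escaped surrogate pair would come out as two separate surrogate code points rather than one combined character.

```python
import re

# A backslash, the letter u, then exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape(text):
    """Convert each literal \\uHHHH sequence into its code point."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# After the JSON import, a doubleEscaped input still holds a literal
# backslash followed by "u0000"; this second round turns it into U+0000.
print(repr(unescape("\\u0000]]>")))
```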
`test.initialStates` is a list of strings, each being the name of a
tokenizer state which can be one of the following:
- `Data state`
- `PLAINTEXT state`
- `RCDATA state`
- `RAWTEXT state`
- `Script data state`
- `CDATA section state`
The test should be run once for each string, using it
to set the tokenizer's initial state for that run. If
`test.initialStates` is omitted, it defaults to `["Data state"]`.
`test.lastStartTag` is a lowercase string that should be used as "the
tag name of the last start tag to have been emitted from this
tokenizer", referenced in the spec's definition of **appropriate end tag
token**. If it is omitted, it is treated as if "no start tag has been
emitted from this tokenizer".
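
One way a harness might expand a test into its individual runs is sketched below in Python. The generator name `iter_runs` is hypothetical, and representing "no start tag has been emitted" as `None` is an assumption of this sketch.

```python
def iter_runs(test):
    """Yield one (initial_state, last_start_tag) pair per required run."""
    # Documented default when initialStates is omitted.
    states = test.get("initialStates", ["Data state"])
    # None stands for "no start tag has been emitted" in this sketch.
    last_start_tag = test.get("lastStartTag")
    for state in states:
        yield state, last_start_tag


example = {
    "description": "End tag closing RCDATA or RAWTEXT",
    "initialStates": ["RCDATA state", "RAWTEXT state"],
    "lastStartTag": "xmp",
    "input": "foo</xmp>",
    "output": [["Character", "foo"], ["EndTag", "xmp"]],
}
for state, tag in iter_runs(example):
    print(state, tag)
```

Each yielded pair would then be used to configure the tokenizer before feeding it `test.input`.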
### Test results
`test.output` is a list of tokens in the order the tokenizer produces
them, with the first token produced leftmost in the list. The list must
match the **complete** list of tokens that the tokenizer should produce.
Valid tokens are:
["DOCTYPE", name, public_id, system_id, correctness]
["StartTag", name, {attributes}*, true*]
["StartTag", name, {attributes}]
["EndTag", name]
["Comment", data]
["Character", data]
`public_id` and `system_id` are either strings or `null`. `correctness`
is either `true` or `false`; `true` corresponds to the force-quirks flag
being false, and vice-versa.
When the self-closing flag is set, the `StartTag` array has `true` as
its fourth entry. When the flag is not set, the array has only three
entries for backwards compatibility.
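
The two flag conventions above can be sketched as small Python helpers that turn tokenizer results into the array shapes the tests expect. The names `doctype_token` and `start_tag_token` and their parameters are hypothetical.

```python
def doctype_token(name, public_id, system_id, force_quirks):
    """Build a DOCTYPE array; correctness is the inverse of force-quirks."""
    return ["DOCTYPE", name, public_id, system_id, not force_quirks]


def start_tag_token(name, attributes, self_closing=False):
    """Build a StartTag array; true is appended only when self-closing."""
    token = ["StartTag", name, dict(attributes)]
    if self_closing:
        token.append(True)
    return token


print(doctype_token("html", None, None, force_quirks=True))
print(start_tag_token("p", {"id": "x"}))
print(start_tag_token("br", {}, self_closing=True))
```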
All adjacent character tokens are coalesced into a single
`["Character", data]` token.
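
A tokenizer that emits one character token per character needs its output merged before comparison. A minimal Python sketch, with the hypothetical name `coalesce_characters`, operating on tokens already in the array form shown above:

```python
def coalesce_characters(tokens):
    """Merge runs of adjacent Character tokens into single tokens."""
    result = []
    for token in tokens:
        if token[0] == "Character" and result and result[-1][0] == "Character":
            # Extend the previous Character token instead of adding a new one.
            result[-1] = ["Character", result[-1][1] + token[1]]
        else:
            # Copy so the caller's token lists are not mutated.
            result.append(list(token))
    return result


raw = [["Character", "f"], ["Character", "oo"], ["EndTag", "xmp"]]
print(coalesce_characters(raw))
```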
If `test.doubleEscaped` is present and `true`, then every string within
`test.output` must be further unescaped (as described above) before
comparing with the tokenizer's output.
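
Because the expected output is nested (arrays, attribute objects, `null` and booleans), the unescaping has to walk the whole structure. The Python sketch below uses the hypothetical name `unescape_output`. Unescaping attribute names as well as values is this sketch's reading of "every string".

```python
import re

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_output(value):
    """Recursively unescape every string inside an expected-output value."""
    if isinstance(value, str):
        return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)
    if isinstance(value, list):
        return [unescape_output(item) for item in value]
    if isinstance(value, dict):
        # Attribute names are strings too, so keys are unescaped as well.
        return {unescape_output(k): unescape_output(v) for k, v in value.items()}
    # None, True and False pass through unchanged.
    return value


print(unescape_output([["Character", "\\uFFFD"]]) == [["Character", "\ufffd"]])
```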
xmlViolation tests
------------------
`tokenizer/xmlViolation.test` differs from the above in a couple of
ways:
- The name of the single member of the top-level JSON object is
"xmlViolationTests" instead of "tests".
- Each test's expected output assumes that the implementation is
applying the tweaks given in the spec's "Coercing an HTML DOM into an
infoset" section.


@@ -0,0 +1,93 @@
{"tests": [
{"description":"PLAINTEXT content model flag",
"initialStates":["PLAINTEXT state"],
"lastStartTag":"plaintext",
"input":"<head>&body;",
"output":[["Character", "<head>&body;"]]},
{"description":"PLAINTEXT with seeming close tag",
"initialStates":["PLAINTEXT state"],
"lastStartTag":"plaintext",
"input":"</plaintext>&body;",
"output":[["Character", "</plaintext>&body;"]]},
{"description":"End tag closing RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp>",
"output":[["Character", "foo"], ["EndTag", "xmp"]]},
{"description":"End tag closing RCDATA or RAWTEXT (case-insensitivity)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xMp>",
"output":[["Character", "foo"], ["EndTag", "xmp"]]},
{"description":"End tag closing RCDATA or RAWTEXT (ending with space)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp ",
"output":[["Character", "foo"]],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 10 }
]},
{"description":"End tag closing RCDATA or RAWTEXT (ending with EOF)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp",
"output":[["Character", "foo</xmp"]]},
{"description":"End tag closing RCDATA or RAWTEXT (ending with slash)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp/",
"output":[["Character", "foo"]],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 10 }
]},
{"description":"End tag not closing RCDATA or RAWTEXT (ending with left-angle-bracket)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp<",
"output":[["Character", "foo</xmp<"]]},
{"description":"End tag with incorrect name in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"</foo>bar</xmp>",
"output":[["Character", "</foo>bar"], ["EndTag", "xmp"]]},
{"description":"Partial end tags leading straight into partial end tags",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"</xmp</xmp</xmp>",
"output":[["Character", "</xmp</xmp"], ["EndTag", "xmp"]]},
{"description":"End tag with incorrect name in RCDATA or RAWTEXT (starting like correct name)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"</foo>bar</xmpaar>",
"output":[["Character", "</foo>bar</xmpaar>"]]},
{"description":"End tag closing RCDATA or RAWTEXT, switching back to PCDATA",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp></baz>",
"output":[["Character", "foo"], ["EndTag", "xmp"], ["EndTag", "baz"]]},
{"description":"RAWTEXT w/ something looking like an entity",
"initialStates":["RAWTEXT state"],
"lastStartTag":"xmp",
"input":"&foo;",
"output":[["Character", "&foo;"]]},
{"description":"RCDATA w/ an entity",
"initialStates":["RCDATA state"],
"lastStartTag":"textarea",
"input":"&lt;",
"output":[["Character", "<"]]}
]}


@@ -0,0 +1,330 @@
{
"tests": [
{
"description":"CR in bogus comment state",
"input":"<?\u000d",
"output":[["Comment", "?\u000a"]],
"errors":[
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
]
},
{
"description":"CRLF in bogus comment state",
"input":"<?\u000d\u000a",
"output":[["Comment", "?\u000a"]],
"errors":[
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
]
},
{
"description":"CRLFLF in bogus comment state",
"input":"<?\u000d\u000a\u000a",
"output":[["Comment", "?\u000a\u000a"]],
"errors":[
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
]
},
{
"description":"Raw NUL replacement",
"doubleEscaped":true,
"initialStates":["RCDATA state", "RAWTEXT state", "PLAINTEXT state", "Script data state"],
"input":"\\u0000",
"output":[["Character", "\\uFFFD"]],
"errors":[
{ "code": "unexpected-null-character", "line": 1, "col": 1 }
]
},
{
"description":"NUL in CDATA section",
"doubleEscaped":true,
"initialStates":["CDATA section state"],
"input":"\\u0000]]>",
"output":[["Character", "\\u0000"]]
},
{
"description":"NUL in script HTML comment",
"doubleEscaped":true,
"initialStates":["Script data state"],
"input":"<!--test\\u0000--><!--test-\\u0000--><!--test--\\u0000-->",
"output":[["Character", "<!--test\\uFFFD--><!--test-\\uFFFD--><!--test--\\uFFFD-->"]],
"errors":[
{ "code": "unexpected-null-character", "line": 1, "col": 9 },
{ "code": "unexpected-null-character", "line": 1, "col": 22 },
{ "code": "unexpected-null-character", "line": 1, "col": 36 }
]
},
{
"description":"NUL in script HTML comment - double escaped",
"doubleEscaped":true,
"initialStates":["Script data state"],
"input":"<!--<script>\\u0000--><!--<script>-\\u0000--><!--<script>--\\u0000-->",
"output":[["Character", "<!--<script>\\uFFFD--><!--<script>-\\uFFFD--><!--<script>--\\uFFFD-->"]],
"errors":[
{ "code": "unexpected-null-character", "line": 1, "col": 13 },
{ "code": "unexpected-null-character", "line": 1, "col": 30 },
{ "code": "unexpected-null-character", "line": 1, "col": 48 }
]
},
{
"description":"EOF in script HTML comment",
"initialStates":["Script data state"],
"input":"<!--test",
"output":[["Character", "<!--test"]],
"errors":[
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 9 }
]
},
{
"description":"EOF in script HTML comment after dash",
"initialStates":["Script data state"],
"input":"<!--test-",
"output":[["Character", "<!--test-"]],
"errors":[
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 10 }
]
},
{
"description":"EOF in script HTML comment after dash dash",
"initialStates":["Script data state"],
"input":"<!--test--",
"output":[["Character", "<!--test--"]],
"errors":[
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 11 }
]
},
{
"description":"EOF in script HTML comment double escaped after dash",
"initialStates":["Script data state"],
"input":"<!--<script>-",
"output":[["Character", "<!--<script>-"]],
"errors":[
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 14 }
]
},
{
"description":"EOF in script HTML comment double escaped after dash dash",
"initialStates":["Script data state"],
"input":"<!--<script>--",
"output":[["Character", "<!--<script>--"]],
"errors":[
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 15 }
]
},
{
"description":"EOF in script HTML comment - double escaped",
"initialStates":["Script data state"],
"input":"<!--<script>",
"output":[["Character", "<!--<script>"]],
"errors":[
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 13 }
]
},
{
"description":"Dash in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- - -->",
"output":[["Character", "<!-- - -->"]]
},
{
"description":"Dash less-than in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- -< -->",
"output":[["Character", "<!-- -< -->"]]
},
{
"description":"Dash at end of script HTML comment",
"initialStates":["Script data state"],
"input":"<!--test--->",
"output":[["Character", "<!--test--->"]]
},
{
"description":"</script> in script HTML comment",
"initialStates":["Script data state"],
"lastStartTag":"script",
"input":"<!-- </script> --></script>",
"output":[["Character", "<!-- "], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
},
{
"description":"</script> in script HTML comment - double escaped",
"initialStates":["Script data state"],
"lastStartTag":"script",
"input":"<!-- <script></script> --></script>",
"output":[["Character", "<!-- <script></script> -->"], ["EndTag", "script"]]
},
{
"description":"</script> in script HTML comment - double escaped with nested <script>",
"initialStates":["Script data state"],
"lastStartTag":"script",
"input":"<!-- <script><script></script></script> --></script>",
"output":[["Character", "<!-- <script><script></script>"], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
},
{
"description":"</script> in script HTML comment - double escaped with abrupt end",
"initialStates":["Script data state"],
"lastStartTag":"script",
"input":"<!-- <script>--></script> --></script>",
"output":[["Character", "<!-- <script>-->"], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
},
{
"description":"Incomplete start tag in script HTML comment double escaped",
"initialStates":["Script data state"],
"lastStartTag":"script",
"input":"<!--<scrip></script>-->",
"output":[["Character", "<!--<scrip>"], ["EndTag", "script"], ["Character", "-->"]]
},
{
"description":"Unclosed start tag in script HTML comment double escaped",
"initialStates":["Script data state"],
"lastStartTag":"script",
"input":"<!--<script</script>-->",
"output":[["Character", "<!--<script"], ["EndTag", "script"], ["Character", "-->"]]
},
{
"description":"Incomplete end tag in script HTML comment double escaped",
"initialStates":["Script data state"],
"lastStartTag":"script",
"input":"<!--<script></scrip>-->",
"output":[["Character", "<!--<script></scrip>-->"]]
},
{
"description":"Unclosed end tag in script HTML comment double escaped",
"initialStates":["Script data state"],
"lastStartTag":"script",
"input":"<!--<script></script-->",
"output":[["Character", "<!--<script></script-->"]]
},
{
"description":"leading U+FEFF must pass through",
"initialStates":["Data state", "RCDATA state", "RAWTEXT state", "Script data state"],
"doubleEscaped":true,
"input":"\\uFEFFfoo\\uFEFFbar",
"output":[["Character", "\\uFEFFfoo\\uFEFFbar"]]
},
{
"description":"Non BMP-charref in RCDATA",
"initialStates":["RCDATA state"],
"input":"&NotEqualTilde;",
"output":[["Character", "\u2242\u0338"]]
},
{
"description":"Bad charref in RCDATA",
"initialStates":["RCDATA state"],
"input":"&NotEqualTild;",
"output":[["Character", "&NotEqualTild;"]],
"errors":[
{ "code": "unknown-named-character-reference", "line": 1, "col": 14 }
]
},
{
"description":"lowercase endtags",
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
"lastStartTag":"xmp",
"input":"</XMP>",
"output":[["EndTag","xmp"]]
},
{
"description":"bad endtag (space before name)",
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
"lastStartTag":"xmp",
"input":"</ XMP>",
"output":[["Character","</ XMP>"]]
},
{
"description":"bad endtag (not matching last start tag)",
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
"lastStartTag":"xmp",
"input":"</xm>",
"output":[["Character","</xm>"]]
},
{
"description":"bad endtag (without close bracket)",
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
"lastStartTag":"xmp",
"input":"</xm ",
"output":[["Character","</xm "]]
},
{
"description":"bad endtag (trailing solidus)",
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
"lastStartTag":"xmp",
"input":"</xm/",
"output":[["Character","</xm/"]]
},
{
"description":"Non BMP-charref in attribute",
"input":"<p id=\"&NotEqualTilde;\">",
"output":[["StartTag", "p", {"id":"\u2242\u0338"}]]
},
{
"description":"--!NUL in comment ",
"doubleEscaped":true,
"input":"<!----!\\u0000-->",
"output":[["Comment", "--!\\uFFFD"]],
"errors":[
{ "code": "unexpected-null-character", "line": 1, "col": 8 }
]
},
{
"description":"space EOF after doctype ",
"input":"<!DOCTYPE html ",
"output":[["DOCTYPE", "html", null, null , false]],
"errors":[
{ "code": "eof-in-doctype", "line": 1, "col": 16 }
]
},
{
"description":"CDATA in HTML content",
"input":"<![CDATA[foo]]>",
"output":[["Comment", "[CDATA[foo]]"]],
"errors":[
{ "code": "cdata-in-html-content", "line": 1, "col": 9 }
]
},
{
"description":"CDATA content",
"input":"foo&#32;]]>",
"initialStates":["CDATA section state"],
"output":[["Character", "foo&#32;"]]
},
{
"description":"CDATA followed by HTML content",
"input":"foo&#32;]]>&#32;",
"initialStates":["CDATA section state"],
"output":[["Character", "foo&#32; "]]
},
{
"description":"CDATA with extra bracket",
"input":"foo]]]>",
"initialStates":["CDATA section state"],
"output":[["Character", "foo]"]]
},
{
"description":"CDATA without end marker",
"input":"foo",
"initialStates":["CDATA section state"],
"output":[["Character", "foo"]],
"errors":[
{ "code": "eof-in-cdata", "line": 1, "col": 4 }
]
},
{
"description":"CDATA with single bracket ending",
"input":"foo]",
"initialStates":["CDATA section state"],
"output":[["Character", "foo]"]],
"errors":[
{ "code": "eof-in-cdata", "line": 1, "col": 5 }
]
},
{
"description":"CDATA with two brackets ending",
"input":"foo]]",
"initialStates":["CDATA section state"],
"output":[["Character", "foo]]"]],
"errors":[
{ "code": "eof-in-cdata", "line": 1, "col": 6 }
]
}
]
}


@@ -0,0 +1,542 @@
{"tests": [
{"description": "Undefined named entity in a double-quoted attribute value ending in semicolon and whose name starts with a known entity name.",
"input":"<h a=\"&noti;\">",
"output": [["StartTag", "h", {"a": "&noti;"}]]},
{"description": "Entity name requiring semicolon instead followed by the equals sign in a double-quoted attribute value.",
"input":"<h a=\"&lang=\">",
"output": [["StartTag", "h", {"a": "&lang="}]]},
{"description": "Valid entity name followed by the equals sign in a double-quoted attribute value.",
"input":"<h a=\"&not=\">",
"output": [["StartTag", "h", {"a": "&not="}]]},
{"description": "Undefined named entity in a single-quoted attribute value ending in semicolon and whose name starts with a known entity name.",
"input":"<h a='&noti;'>",
"output": [["StartTag", "h", {"a": "&noti;"}]]},
{"description": "Entity name requiring semicolon instead followed by the equals sign in a single-quoted attribute value.",
"input":"<h a='&lang='>",
"output": [["StartTag", "h", {"a": "&lang="}]]},
{"description": "Valid entity name followed by the equals sign in a single-quoted attribute value.",
"input":"<h a='&not='>",
"output": [["StartTag", "h", {"a": "&not="}]]},
{"description": "Undefined named entity in an unquoted attribute value ending in semicolon and whose name starts with a known entity name.",
"input":"<h a=&noti;>",
"output": [["StartTag", "h", {"a": "&noti;"}]]},
{"description": "Entity name requiring semicolon instead followed by the equals sign in an unquoted attribute value.",
"input":"<h a=&lang=>",
"output": [["StartTag", "h", {"a": "&lang="}]],
"errors":[
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 11 }
]},
{"description": "Valid entity name followed by the equals sign in an unquoted attribute value.",
"input":"<h a=&not=>",
"output": [["StartTag", "h", {"a": "&not="}]],
"errors":[
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 10 }
]},
{"description": "Ambiguous ampersand.",
"input":"&rrrraannddom;",
"output": [["Character", "&rrrraannddom;"]],
"errors":[
{ "code": "unknown-named-character-reference", "line": 1, "col": 14 }
]},
{"description": "Semicolonless named entity 'not' followed by 'i;' in body",
"input":"&noti;",
"output": [["Character", "\u00ACi;"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]},
{"description": "Very long undefined named entity in body",
"input":"&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;",
"output": [["Character", "&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;"]],
"errors":[
{ "code": "unknown-named-character-reference", "line": 1, "col": 950 }
]},
{"description": "CR as numeric entity",
"input":"&#013;",
"output": [["Character", "\r"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 7 }
]},
{"description": "CR as hexadecimal numeric entity",
"input":"&#x00D;",
"output": [["Character", "\r"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 EURO SIGN numeric entity.",
"input":"&#0128;",
"output": [["Character", "\u20AC"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
"input":"&#0129;",
"output": [["Character", "\u0081"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 SINGLE LOW-9 QUOTATION MARK numeric entity.",
"input":"&#0130;",
"output": [["Character", "\u201A"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN SMALL LETTER F WITH HOOK numeric entity.",
"input":"&#0131;",
"output": [["Character", "\u0192"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 DOUBLE LOW-9 QUOTATION MARK numeric entity.",
"input":"&#0132;",
"output": [["Character", "\u201E"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 HORIZONTAL ELLIPSIS numeric entity.",
"input":"&#0133;",
"output": [["Character", "\u2026"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 DAGGER numeric entity.",
"input":"&#0134;",
"output": [["Character", "\u2020"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 DOUBLE DAGGER numeric entity.",
"input":"&#0135;",
"output": [["Character", "\u2021"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 MODIFIER LETTER CIRCUMFLEX ACCENT numeric entity.",
"input":"&#0136;",
"output": [["Character", "\u02C6"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 PER MILLE SIGN numeric entity.",
"input":"&#0137;",
"output": [["Character", "\u2030"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN CAPITAL LETTER S WITH CARON numeric entity.",
"input":"&#0138;",
"output": [["Character", "\u0160"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 SINGLE LEFT-POINTING ANGLE QUOTATION MARK numeric entity.",
"input":"&#0139;",
"output": [["Character", "\u2039"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN CAPITAL LIGATURE OE numeric entity.",
"input":"&#0140;",
"output": [["Character", "\u0152"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
"input":"&#0141;",
"output": [["Character", "\u008D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN CAPITAL LETTER Z WITH CARON numeric entity.",
"input":"&#0142;",
"output": [["Character", "\u017D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
"input":"&#0143;",
"output": [["Character", "\u008F"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
"input":"&#0144;",
"output": [["Character", "\u0090"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LEFT SINGLE QUOTATION MARK numeric entity.",
"input":"&#0145;",
"output": [["Character", "\u2018"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 RIGHT SINGLE QUOTATION MARK numeric entity.",
"input":"&#0146;",
"output": [["Character", "\u2019"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LEFT DOUBLE QUOTATION MARK numeric entity.",
"input":"&#0147;",
"output": [["Character", "\u201C"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 RIGHT DOUBLE QUOTATION MARK numeric entity.",
"input":"&#0148;",
"output": [["Character", "\u201D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 BULLET numeric entity.",
"input":"&#0149;",
"output": [["Character", "\u2022"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 EN DASH numeric entity.",
"input":"&#0150;",
"output": [["Character", "\u2013"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 EM DASH numeric entity.",
"input":"&#0151;",
"output": [["Character", "\u2014"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 SMALL TILDE numeric entity.",
"input":"&#0152;",
"output": [["Character", "\u02DC"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 TRADE MARK SIGN numeric entity.",
"input":"&#0153;",
"output": [["Character", "\u2122"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN SMALL LETTER S WITH CARON numeric entity.",
"input":"&#0154;",
"output": [["Character", "\u0161"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 SINGLE RIGHT-POINTING ANGLE QUOTATION MARK numeric entity.",
"input":"&#0155;",
"output": [["Character", "\u203A"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN SMALL LIGATURE OE numeric entity.",
"input":"&#0156;",
"output": [["Character", "\u0153"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
"input":"&#0157;",
"output": [["Character", "\u009D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 EURO SIGN hexadecimal numeric entity.",
"input":"&#x080;",
"output": [["Character", "\u20AC"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x081;",
"output": [["Character", "\u0081"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 SINGLE LOW-9 QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x082;",
"output": [["Character", "\u201A"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN SMALL LETTER F WITH HOOK hexadecimal numeric entity.",
"input":"&#x083;",
"output": [["Character", "\u0192"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 DOUBLE LOW-9 QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x084;",
"output": [["Character", "\u201E"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 HORIZONTAL ELLIPSIS hexadecimal numeric entity.",
"input":"&#x085;",
"output": [["Character", "\u2026"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 DAGGER hexadecimal numeric entity.",
"input":"&#x086;",
"output": [["Character", "\u2020"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 DOUBLE DAGGER hexadecimal numeric entity.",
"input":"&#x087;",
"output": [["Character", "\u2021"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 MODIFIER LETTER CIRCUMFLEX ACCENT hexadecimal numeric entity.",
"input":"&#x088;",
"output": [["Character", "\u02C6"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 PER MILLE SIGN hexadecimal numeric entity.",
"input":"&#x089;",
"output": [["Character", "\u2030"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN CAPITAL LETTER S WITH CARON hexadecimal numeric entity.",
"input":"&#x08A;",
"output": [["Character", "\u0160"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 SINGLE LEFT-POINTING ANGLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x08B;",
"output": [["Character", "\u2039"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN CAPITAL LIGATURE OE hexadecimal numeric entity.",
"input":"&#x08C;",
"output": [["Character", "\u0152"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x08D;",
"output": [["Character", "\u008D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN CAPITAL LETTER Z WITH CARON hexadecimal numeric entity.",
"input":"&#x08E;",
"output": [["Character", "\u017D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x08F;",
"output": [["Character", "\u008F"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x090;",
"output": [["Character", "\u0090"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LEFT SINGLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x091;",
"output": [["Character", "\u2018"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 RIGHT SINGLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x092;",
"output": [["Character", "\u2019"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LEFT DOUBLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x093;",
"output": [["Character", "\u201C"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 RIGHT DOUBLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x094;",
"output": [["Character", "\u201D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 BULLET hexadecimal numeric entity.",
"input":"&#x095;",
"output": [["Character", "\u2022"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 EN DASH hexadecimal numeric entity.",
"input":"&#x096;",
"output": [["Character", "\u2013"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 EM DASH hexadecimal numeric entity.",
"input":"&#x097;",
"output": [["Character", "\u2014"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 SMALL TILDE hexadecimal numeric entity.",
"input":"&#x098;",
"output": [["Character", "\u02DC"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 TRADE MARK SIGN hexadecimal numeric entity.",
"input":"&#x099;",
"output": [["Character", "\u2122"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN SMALL LETTER S WITH CARON hexadecimal numeric entity.",
"input":"&#x09A;",
"output": [["Character", "\u0161"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 SINGLE RIGHT-POINTING ANGLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x09B;",
"output": [["Character", "\u203A"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN SMALL LIGATURE OE hexadecimal numeric entity.",
"input":"&#x09C;",
"output": [["Character", "\u0153"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x09D;",
"output": [["Character", "\u009D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN SMALL LETTER Z WITH CARON hexadecimal numeric entity.",
"input":"&#x09E;",
"output": [["Character", "\u017E"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Windows-1252 LATIN CAPITAL LETTER Y WITH DIAERESIS hexadecimal numeric entity.",
"input":"&#x09F;",
"output": [["Character", "\u0178"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},
{"description": "Decimal numeric entity followed by hex character a.",
"input":"&#97a",
"output": [["Character", "aa"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]},
{"description": "Decimal numeric entity followed by hex character A.",
"input":"&#97A",
"output": [["Character", "aA"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]},
{"description": "Decimal numeric entity followed by hex character f.",
"input":"&#97f",
"output": [["Character", "af"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]},
{"description": "Decimal numeric entity followed by hex character F.",
"input":"&#97F",
"output": [["Character", "aF"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]}
]}
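
The Windows-1252 cases in the file above appear to share one rule: a numeric reference in the range 0x80 to 0x9F resolves to the character that byte encodes in Windows-1252, while the five bytes Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) pass through as the C1 control itself. The Python sketch below is one hedged way to express that rule. The function name `remap_c1_reference` is hypothetical, and relying on Python's built-in `cp1252` codec instead of a hand-written table is an assumption of this sketch, not something the test suite prescribes.

```python
def remap_c1_reference(code_point: int) -> str:
    """Resolve a numeric character reference in the range 0x80..0x9F.

    Hypothetical helper: other ranges (NUL, surrogates, values above
    U+10FFFF) are outside the scope of this sketch.
    """
    if not 0x80 <= code_point <= 0x9F:
        raise ValueError("only the 0x80..0x9F range is handled here")
    try:
        # Assumption: Python's cp1252 codec agrees with the remapping
        # exercised by the tests for every byte it defines.
        return bytes([code_point]).decode("cp1252")
    except UnicodeDecodeError:
        # Bytes that Windows-1252 leaves undefined keep their code point.
        return chr(code_point)


for value in (0x8B, 0x8D, 0x99, 0x9F):
    print(f"&#x{value:03X}; -> U+{ord(remap_c1_reference(value)):04X}")
```

Either way, the tests still expect a `control-character-reference` parse error; this sketch only covers the output character.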

@ -0,0 +1,36 @@
{"tests": [
{"description":"Commented close tag in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo<!--</xmp>--></xmp>",
"output":[["Character", "foo<!--"], ["EndTag", "xmp"], ["Character", "-->"], ["EndTag", "xmp"]]},
{"description":"Bogus comment in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo<!-->baz</xmp>",
"output":[["Character", "foo<!-->baz"], ["EndTag", "xmp"]]},
{"description":"End tag surrounded by bogus comment in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo<!--></xmp><!-->baz</xmp>",
"output":[["Character", "foo<!-->"], ["EndTag", "xmp"], ["Comment", ""], ["Character", "baz"], ["EndTag", "xmp"]],
"errors":[
{ "code": "abrupt-closing-of-empty-comment", "line": 1, "col": 19 }
]},
{"description":"Commented entities in RCDATA",
"initialStates":["RCDATA state"],
"lastStartTag":"xmp",
"input":" &amp; <!-- &amp; --> &amp; </xmp>",
"output":[["Character", " & <!-- & --> & "], ["EndTag", "xmp"]]},
{"description":"Incorrect comment ending sequences in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo<!-- x --x>x-- >x--!>x--<></xmp>",
"output":[["Character", "foo<!-- x --x>x-- >x--!>x--<>"], ["EndTag", "xmp"]]}
]}
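
Each test in the file above lists two initial states, so a harness would presumably run the tokenizer once per state, passing `lastStartTag` along each time. The Python sketch below expands a test file into such runs. The function name `expand_runs` and the tuple layout are hypothetical, and the fallback to `"Data state"` when `initialStates` is absent is an assumption of this sketch.

```python
import json


def expand_runs(test_file_text):
    """Return one (description, state, last_start_tag, input) tuple per run."""
    runs = []
    for test in json.loads(test_file_text)["tests"]:
        # Assumption: a test without "initialStates" runs once in the data state.
        states = test.get("initialStates", ["Data state"])
        for state in states:
            runs.append((test["description"], state,
                         test.get("lastStartTag"), test["input"]))
    return runs


# One test copied from the file above; a raw string keeps the JSON intact.
SAMPLE = r'''
{"tests": [
{"description":"Bogus comment in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo<!-->baz</xmp>",
"output":[["Character", "foo<!-->baz"], ["EndTag", "xmp"]]}
]}
'''

for run in expand_runs(SAMPLE):
    print(run)
```

Feeding each run to an actual tokenizer and comparing against `output` is left out, since that depends on the tokenizer's own API.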

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -0,0 +1,9 @@
{"tests": [
{"description":"<!---- >",
"input":"<!---- >",
"output":[["Comment","-- >"]],
"errors":[
{ "code": "eof-in-comment", "line": 1, "col": 9 }
]}
]}

@ -0,0 +1,349 @@
{"tests": [
{"description":"Correct Doctype lowercase",
"input":"<!DOCTYPE html>",
"output":[["DOCTYPE", "html", null, null, true]]},
{"description":"Correct Doctype uppercase",
"input":"<!DOCTYPE HTML>",
"output":[["DOCTYPE", "html", null, null, true]]},
{"description":"Correct Doctype mixed case",
"input":"<!DOCTYPE HtMl>",
"output":[["DOCTYPE", "html", null, null, true]]},
{"description":"Correct Doctype case with EOF",
"input":"<!DOCTYPE HtMl",
"output":[["DOCTYPE", "html", null, null, false]],
"errors":[
{ "code": "eof-in-doctype", "line": 1, "col": 15 }
]},
{"description":"Truncated doctype start",
"input":"<!DOC>",
"output":[["Comment", "DOC"]],
"errors":[
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 }
]},
{"description":"Doctype in error",
"input":"<!DOCTYPE foo>",
"output":[["DOCTYPE", "foo", null, null, true]]},
{"description":"Single Start Tag",
"input":"<h>",
"output":[["StartTag", "h", {}]]},
{"description":"Empty end tag",
"input":"</>",
"output":[],
"errors":[
{ "code": "missing-end-tag-name", "line": 1, "col": 3 }
]},
{"description":"Empty start tag",
"input":"<>",
"output":[["Character", "<>"]],
"errors":[
{ "code": "invalid-first-character-of-tag-name", "line": 1, "col": 2 }
]},
{"description":"Start Tag w/attribute",
"input":"<h a='b'>",
"output":[["StartTag", "h", {"a":"b"}]]},
{"description":"Start Tag w/attribute no quotes",
"input":"<h a=b>",
"output":[["StartTag", "h", {"a":"b"}]]},
{"description":"Start/End Tag",
"input":"<h></h>",
"output":[["StartTag", "h", {}], ["EndTag", "h"]]},
{"description":"Two unclosed start tags",
"input":"<p>One<p>Two",
"output":[["StartTag", "p", {}], ["Character", "One"], ["StartTag", "p", {}], ["Character", "Two"]]},
{"description":"End Tag w/attribute",
"input":"<h></h a='b'>",
"output":[["StartTag", "h", {}], ["EndTag", "h"]],
"errors":[
{ "code": "end-tag-with-attributes", "line": 1, "col": 13 }
]},
{"description":"Multiple atts",
"input":"<h a='b' c='d'>",
"output":[["StartTag", "h", {"a":"b", "c":"d"}]]},
{"description":"Multiple atts no space",
"input":"<h a='b'c='d'>",
"output":[["StartTag", "h", {"a":"b", "c":"d"}]],
"errors":[
{ "code": "missing-whitespace-between-attributes", "line": 1, "col": 9 }
]},
{"description":"Repeated attr",
"input":"<h a='b' a='d'>",
"output":[["StartTag", "h", {"a":"b"}]],
"errors":[
{ "code": "duplicate-attribute", "line": 1, "col": 11 }
]},
{"description":"Simple comment",
"input":"<!--comment-->",
"output":[["Comment", "comment"]]},
{"description":"Comment, Central dash no space",
"input":"<!----->",
"output":[["Comment", "-"]]},
{"description":"Comment, two central dashes",
"input":"<!-- --comment -->",
"output":[["Comment", " --comment "]]},
{"description":"Comment, central less-than bang",
"input":"<!--<!-->",
"output":[["Comment", "<!"]]},
{"description":"Unfinished comment",
"input":"<!--comment",
"output":[["Comment", "comment"]],
"errors":[
{ "code": "eof-in-comment", "line": 1, "col": 12 }
]},
{"description":"Unfinished comment after start of nested comment",
"input":"<!-- <!--",
"output":[["Comment", " <!"]],
"errors":[
{ "code": "eof-in-comment", "line": 1, "col": 10 }
]},
{"description":"Start of a comment",
"input":"<!-",
"output":[["Comment", "-"]],
"errors":[
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 }
]},
{"description":"Short comment",
"input":"<!-->",
"output":[["Comment", ""]],
"errors":[
{ "code": "abrupt-closing-of-empty-comment", "line": 1, "col": 5 }
]},
{"description":"Short comment two",
"input":"<!--->",
"output":[["Comment", ""]],
"errors":[
{ "code": "abrupt-closing-of-empty-comment", "line": 1, "col": 6 }
]},
{"description":"Short comment three",
"input":"<!---->",
"output":[["Comment", ""]]},
{"description":"< in comment",
"input":"<!-- <test-->",
"output":[["Comment", " <test"]]},
{"description":"<! in comment",
"input":"<!-- <!test-->",
"output":[["Comment", " <!test"]]},
{"description":"<!- in comment",
"input":"<!-- <!-test-->",
"output":[["Comment", " <!-test"]]},
{"description":"Nested comment",
"input":"<!-- <!--test-->",
"output":[["Comment", " <!--test"]],
"errors":[
{ "code": "nested-comment", "line": 1, "col": 10 }
]},
{"description":"Nested comment with extra <",
"input":"<!-- <<!--test-->",
"output":[["Comment", " <<!--test"]],
"errors":[
{ "code": "nested-comment", "line": 1, "col": 11 }
]},
{"description":"< in script data",
"initialStates":["Script data state"],
"input":"<test-->",
"output":[["Character", "<test-->"]]},
{"description":"<! in script data",
"initialStates":["Script data state"],
"input":"<!test-->",
"output":[["Character", "<!test-->"]]},
{"description":"<!- in script data",
"initialStates":["Script data state"],
"input":"<!-test-->",
"output":[["Character", "<!-test-->"]]},
{"description":"Escaped script data",
"initialStates":["Script data state"],
"input":"<!--test-->",
"output":[["Character", "<!--test-->"]]},
{"description":"< in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- < test -->",
"output":[["Character", "<!-- < test -->"]]},
{"description":"</ in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- </ test -->",
"output":[["Character", "<!-- </ test -->"]]},
{"description":"Start tag in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- <test> -->",
"output":[["Character", "<!-- <test> -->"]]},
{"description":"End tag in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- </test> -->",
"output":[["Character", "<!-- </test> -->"]]},
{"description":"- in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script>-</script>-->",
"output":[["Character", "<!--<script>-</script>-->"]]},
{"description":"-- in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script>--</script>-->",
"output":[["Character", "<!--<script>--</script>-->"]]},
{"description":"--- in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script>---</script>-->",
"output":[["Character", "<!--<script>---</script>-->"]]},
{"description":"- spaced in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script> - </script>-->",
"output":[["Character", "<!--<script> - </script>-->"]]},
{"description":"-- spaced in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script> -- </script>-->",
"output":[["Character", "<!--<script> -- </script>-->"]]},
{"description":"Ampersand EOF",
"input":"&",
"output":[["Character", "&"]]},
{"description":"Ampersand ampersand EOF",
"input":"&&",
"output":[["Character", "&&"]]},
{"description":"Ampersand space EOF",
"input":"& ",
"output":[["Character", "& "]]},
{"description":"Unfinished entity",
"input":"&f",
"output":[["Character", "&f"]]},
{"description":"Ampersand, number sign",
"input":"&#",
"output":[["Character", "&#"]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 3 }
]},
{"description":"Unfinished numeric entity",
"input":"&#x",
"output":[["Character", "&#x"]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 4 }
]},
{"description":"Entity with trailing semicolon (1)",
"input":"I'm &not;it",
"output":[["Character","I'm \u00ACit"]]},
{"description":"Entity with trailing semicolon (2)",
"input":"I'm &notin;",
"output":[["Character","I'm \u2209"]]},
{"description":"Entity without trailing semicolon (1)",
"input":"I'm &notit",
"output":[["Character","I'm \u00ACit"]],
"errors": [
{"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 9 }
]},
{"description":"Entity without trailing semicolon (2)",
"input":"I'm &notin",
"output":[["Character","I'm \u00ACin"]],
"errors": [
{"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 9 }
]},
{"description":"Partial entity match at end of file",
"input":"I'm &no",
"output":[["Character","I'm &no"]]},
{"description":"Non-ASCII character reference name",
"input":"&\u00AC;",
"output":[["Character", "&\u00AC;"]]},
{"description":"ASCII decimal entity",
"input":"&#0036;",
"output":[["Character","$"]]},
{"description":"ASCII hexadecimal entity",
"input":"&#x3f;",
"output":[["Character","?"]]},
{"description":"Hexadecimal entity in attribute",
"input":"<h a='&#x3f;'></h>",
"output":[["StartTag", "h", {"a":"?"}], ["EndTag", "h"]]},
{"description":"Entity in attribute without semicolon ending in x",
"input":"<h a='&notx'>",
"output":[["StartTag", "h", {"a":"&notx"}]]},
{"description":"Entity in attribute without semicolon ending in 1",
"input":"<h a='&not1'>",
"output":[["StartTag", "h", {"a":"&not1"}]]},
{"description":"Entity in attribute without semicolon ending in i",
"input":"<h a='&noti'>",
"output":[["StartTag", "h", {"a":"&noti"}]]},
{"description":"Entity in attribute without semicolon",
"input":"<h a='&COPY'>",
"output":[["StartTag", "h", {"a":"\u00A9"}]],
"errors": [
{"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 12 }
]},
{"description":"Unquoted attribute ending in ampersand",
"input":"<s o=& t>",
"output":[["StartTag","s",{"o":"&","t":""}]]},
{"description":"Unquoted attribute at end of tag with final character of &, with tag followed by characters",
"input":"<a a=a&>foo",
"output":[["StartTag", "a", {"a":"a&"}], ["Character", "foo"]]},
{"description":"plaintext element",
"input":"<plaintext>foobar",
"output":[["StartTag","plaintext",{}], ["Character","foobar"]]},
{"description":"Open angled bracket in unquoted attribute value state",
"input":"<a a=f<>",
"output":[["StartTag", "a", {"a":"f<"}]],
"errors":[
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 7 }
]}
]}
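
Expected outputs in the file above, such as `[["Character", "<>"]]` for the input `<>`, list a single Character token even where a tokenizer might emit the characters one at a time. A harness therefore likely needs to merge adjacent Character tokens before comparing. The Python sketch below shows one way to do that; the function name `coalesce_characters` is hypothetical, and the assumption that tokens arrive as lists in the same shape as the `output` arrays is this sketch's own.

```python
def coalesce_characters(tokens):
    """Merge runs of adjacent ["Character", data] tokens into one token."""
    merged = []
    for token in tokens:
        if token[0] == "Character" and merged and merged[-1][0] == "Character":
            # Extend the previous Character token instead of appending a new one.
            merged[-1] = ["Character", merged[-1][1] + token[1]]
        else:
            # Copy so the caller's token lists are not mutated.
            merged.append(list(token))
    return merged


raw = [["Character", "<"], ["Character", ">"]]
print(coalesce_characters(raw))
```

Tokens of other types break a run, so `One` and `Two` in the "Two unclosed start tags" test stay separate.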

@ -0,0 +1,275 @@
{"tests": [
{"description":"DOCTYPE without name",
"input":"<!DOCTYPE>",
"output":[["DOCTYPE", null, null, null, false]],
"errors":[
{ "code": "missing-doctype-name", "line": 1, "col": 10 }
]},
{"description":"DOCTYPE without space before name",
"input":"<!DOCTYPEhtml>",
"output":[["DOCTYPE", "html", null, null, true]],
"errors":[
{ "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 }
]},
{"description":"Incorrect DOCTYPE without a space before name",
"input":"<!DOCTYPEfoo>",
"output":[["DOCTYPE", "foo", null, null, true]],
"errors":[
{ "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 }
]},
{"description":"DOCTYPE with publicId",
"input":"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML Transitional 4.01//EN\">",
"output":[["DOCTYPE", "html", "-//W3C//DTD HTML Transitional 4.01//EN", null, true]]},
{"description":"DOCTYPE with EOF after PUBLIC",
"input":"<!DOCTYPE html PUBLIC",
"output":[["DOCTYPE", "html", null, null, false]],
"errors": [
{ "code": "eof-in-doctype", "col": 22, "line": 1 }
]},
{"description":"DOCTYPE with EOF after PUBLIC '",
"input":"<!DOCTYPE html PUBLIC '",
"output":[["DOCTYPE", "html", "", null, false]],
"errors": [
{ "code": "eof-in-doctype", "col": 24, "line": 1 }
]},
{"description":"DOCTYPE with EOF after PUBLIC 'x",
"input":"<!DOCTYPE html PUBLIC 'x",
"output":[["DOCTYPE", "html", "x", null, false]],
"errors": [
{ "code": "eof-in-doctype", "col": 25, "line": 1 }
]},
{"description":"DOCTYPE with systemId",
"input":"<!DOCTYPE html SYSTEM \"-//W3C//DTD HTML Transitional 4.01//EN\">",
"output":[["DOCTYPE", "html", null, "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
{"description":"DOCTYPE with single-quoted systemId",
"input":"<!DOCTYPE html SYSTEM '-//W3C//DTD HTML Transitional 4.01//EN'>",
"output":[["DOCTYPE", "html", null, "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
{"description":"DOCTYPE with publicId and systemId",
"input":"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML Transitional 4.01//EN\" \"-//W3C//DTD HTML Transitional 4.01//EN\">",
"output":[["DOCTYPE", "html", "-//W3C//DTD HTML Transitional 4.01//EN", "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
{"description":"DOCTYPE with > in double-quoted publicId",
"input":"<!DOCTYPE html PUBLIC \">x",
"output":[["DOCTYPE", "html", "", null, false], ["Character", "x"]],
"errors": [
{ "code": "abrupt-doctype-public-identifier", "col": 24, "line": 1 }
]},
{"description":"DOCTYPE with > in single-quoted publicId",
"input":"<!DOCTYPE html PUBLIC '>x",
"output":[["DOCTYPE", "html", "", null, false], ["Character", "x"]],
"errors": [
{ "code": "abrupt-doctype-public-identifier", "col": 24, "line": 1 }
]},
{"description":"DOCTYPE with > in double-quoted systemId",
"input":"<!DOCTYPE html PUBLIC \"foo\" \">x",
"output":[["DOCTYPE", "html", "foo", "", false], ["Character", "x"]],
"errors": [
{ "code": "abrupt-doctype-system-identifier", "col": 30, "line": 1 }
]},
{"description":"DOCTYPE with > in single-quoted systemId",
"input":"<!DOCTYPE html PUBLIC 'foo' '>x",
"output":[["DOCTYPE", "html", "foo", "", false], ["Character", "x"]],
"errors": [
{ "code": "abrupt-doctype-system-identifier", "col": 30, "line": 1 }
]},
{"description":"Incomplete doctype",
"input":"<!DOCTYPE html ",
"output":[["DOCTYPE", "html", null, null, false]],
"errors":[
{ "code": "eof-in-doctype", "line": 1, "col": 16 }
]},
{"description":"Numeric entity representing the NUL character",
"input":"&#0000;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "null-character-reference", "line": 1, "col": 8 }
]},
{"description":"Hexadecimal entity representing the NUL character",
"input":"&#x0000;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "null-character-reference", "line": 1, "col": 9 }
]},
{"description":"Numeric entity representing a codepoint after 1114111 (U+10FFFF)",
"input":"&#2225222;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 11 }
]},
{"description":"Hexadecimal entity representing a codepoint after 1114111 (U+10FFFF)",
"input":"&#x1010FFFF;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 13 }
]},
{"description":"Hexadecimal entity pair representing a surrogate pair",
"input":"&#xD869;&#xDED6;",
"output":[["Character", "\uFFFD\uFFFD"]],
"errors":[
{ "code": "surrogate-character-reference", "line": 1, "col": 9 },
{ "code": "surrogate-character-reference", "line": 1, "col": 17 }
]},
{"description":"Hexadecimal entity with mixed uppercase and lowercase",
"input":"&#xaBcD;",
"output":[["Character", "\uABCD"]]},
{"description":"Entity without a name",
"input":"&;",
"output":[["Character", "&;"]]},
{"description":"Unescaped ampersand in attribute value",
"input":"<h a='&'>",
"output":[["StartTag", "h", { "a":"&" }]]},
{"description":"StartTag containing <",
"input":"<a<b>",
"output":[["StartTag", "a<b", { }]]},
{"description":"Non-void element containing trailing /",
"input":"<h/>",
"output":[["StartTag","h",{},true]]},
{"description":"Void element with permitted slash",
"input":"<br/>",
"output":[["StartTag","br",{},true]]},
{"description":"Void element with permitted slash (with attribute)",
"input":"<br foo='bar'/>",
"output":[["StartTag","br",{"foo":"bar"},true]]},
{"description":"StartTag containing /",
"input":"<h/a='b'>",
"output":[["StartTag", "h", { "a":"b" }]],
"errors":[
{ "code": "unexpected-solidus-in-tag", "line": 1, "col": 4 }
]},
{"description":"Double-quoted attribute value",
"input":"<h a=\"b\">",
"output":[["StartTag", "h", { "a":"b" }]]},
{"description":"Unescaped </",
"input":"</",
"output":[["Character", "</"]],
"errors":[
{ "code": "eof-before-tag-name", "line": 1, "col": 3 }
]},
{"description":"Illegal end tag name",
"input":"</1>",
"output":[["Comment", "1"]],
"errors":[
{ "code": "invalid-first-character-of-tag-name", "line": 1, "col": 3 }
]},
{"description":"Simili processing instruction",
"input":"<?namespace>",
"output":[["Comment", "?namespace"]],
"errors":[
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
]},
{"description":"A bogus comment stops at >, even if preceded by two dashes",
"input":"<?foo-->",
"output":[["Comment", "?foo--"]],
"errors":[
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
]},
{"description":"Unescaped <",
"input":"foo < bar",
"output":[["Character", "foo < bar"]],
"errors":[
{ "code": "invalid-first-character-of-tag-name", "line": 1, "col": 6 }
]},
{"description":"Null Byte Replacement",
"input":"\u0000",
"output":[["Character", "\u0000"]],
"errors":[
{ "code": "unexpected-null-character", "line": 1, "col": 1 }
]},
{"description":"Comment with dash",
"input":"<!---x",
"output":[["Comment", "-x"]],
"errors":[
{ "code": "eof-in-comment", "line": 1, "col": 7 }
]},
{"description":"Entity + newline",
"input":"\nx\n&gt;\n",
"output":[["Character","\nx\n>\n"]]},
{"description":"Start tag with no attributes but space before the greater-than sign",
"input":"<h >",
"output":[["StartTag", "h", {}]]},
{"description":"Empty attribute followed by uppercase attribute",
"input":"<h a B=''>",
"output":[["StartTag", "h", {"a":"", "b":""}]]},
{"description":"Double-quote after attribute name",
"input":"<h a \">",
"output":[["StartTag", "h", {"a":"", "\"":""}]],
"errors":[
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 6 }
]},
{"description":"Single-quote after attribute name",
"input":"<h a '>",
"output":[["StartTag", "h", {"a":"", "'":""}]],
"errors":[
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 6 }
]},
{"description":"Empty end tag with following characters",
"input":"a</>bc",
"output":[["Character", "abc"]],
"errors":[
{ "code": "missing-end-tag-name", "line": 1, "col": 4 }
]},
{"description":"Empty end tag with following tag",
"input":"a</><b>c",
"output":[["Character", "a"], ["StartTag", "b", {}], ["Character", "c"]],
"errors":[
{ "code": "missing-end-tag-name", "line": 1, "col": 4 }
]},
{"description":"Empty end tag with following comment",
"input":"a</><!--b-->c",
"output":[["Character", "a"], ["Comment", "b"], ["Character", "c"]],
"errors":[
{ "code": "missing-end-tag-name", "line": 1, "col": 4 }
]},
{"description":"Empty end tag with following end tag",
"input":"a</></b>c",
"output":[["Character", "a"], ["EndTag", "b"], ["Character", "c"]],
"errors":[
{ "code": "missing-end-tag-name", "line": 1, "col": 4 }
]}
]}
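
Three of the numeric-reference cases in the file above (NUL, values above U+10FFFF, and surrogates) all produce U+FFFD together with a distinct error code. The Python sketch below classifies a parsed code point along those lines. The function name `check_numeric_reference` and its return shape are hypothetical; control and noncharacter references, which other tests in this suite also cover, are deliberately left out.

```python
REPLACEMENT = "\uFFFD"


def check_numeric_reference(code_point: int):
    """Return (character, error_code_or_None) for a parsed numeric reference.

    Hypothetical helper covering only the three replacement cases;
    control and noncharacter references are not handled here.
    """
    if code_point == 0:
        return REPLACEMENT, "null-character-reference"
    if code_point > 0x10FFFF:
        return REPLACEMENT, "character-reference-outside-unicode-range"
    if 0xD800 <= code_point <= 0xDFFF:
        return REPLACEMENT, "surrogate-character-reference"
    return chr(code_point), None


# Inputs taken from the tests above: &#0000; &#2225222; &#xD869; &#xaBcD;
for value in (0, 2225222, 0xD869, 0xABCD):
    print(hex(value), check_numeric_reference(value))
```

Checking the upper bound before the surrogate range keeps `chr()` from ever seeing a value it would reject.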

File diff suppressed because it is too large

@ -0,0 +1,532 @@
{"tests": [
{"description":"< in attribute name",
"input":"<z/0 <>",
"output":[["StartTag", "z", {"0": "", "<": ""}]],
"errors":[
{ "code": "unexpected-solidus-in-tag", "line": 1, "col": 4 },
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 7 }
]},
{"description":"< in unquoted attribute value",
"input":"<z x=<>",
"output":[["StartTag", "z", {"x": "<"}]],
"errors":[
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 6 }
]},
{"description":"= in unquoted attribute value",
"input":"<z z=z=z>",
"output":[["StartTag", "z", {"z": "z=z"}]],
"errors":[
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 7 }
]},
{"description":"= attribute",
"input":"<z =>",
"output":[["StartTag", "z", {"=": ""}]],
"errors":[
{ "code": "unexpected-equals-sign-before-attribute-name", "line": 1, "col": 4 }
]},
{"description":"== attribute",
"input":"<z ==>",
"output":[["StartTag", "z", {"=": ""}]],
"errors":[
{ "code": "unexpected-equals-sign-before-attribute-name", "line": 1, "col": 4 },
{ "code": "missing-attribute-value", "line": 1, "col": 6 }
]},
{"description":"=== attribute",
"input":"<z ===>",
"output":[["StartTag", "z", {"=": "="}]],
"errors":[
{ "code": "unexpected-equals-sign-before-attribute-name", "line": 1, "col": 4 },
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 6 }
]},
{"description":"==== attribute",
"input":"<z ====>",
"output":[["StartTag", "z", {"=": "=="}]],
"errors":[
{ "code": "unexpected-equals-sign-before-attribute-name", "line": 1, "col": 4 },
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 6 },
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 7 }
]},
{"description":"\" after ampersand in double-quoted attribute value",
"input":"<z z=\"&\">",
"output":[["StartTag", "z", {"z": "&"}]]},
{"description":"' after ampersand in double-quoted attribute value",
"input":"<z z=\"&'\">",
"output":[["StartTag", "z", {"z": "&'"}]]},
{"description":"' after ampersand in single-quoted attribute value",
"input":"<z z='&'>",
"output":[["StartTag", "z", {"z": "&"}]]},
{"description":"\" after ampersand in single-quoted attribute value",
"input":"<z z='&\"'>",
"output":[["StartTag", "z", {"z": "&\""}]]},
{"description":"Text after bogus character reference",
"input":"<z z='&xlink_xmlns;'>bar<z>",
"output":[["StartTag","z",{"z":"&xlink_xmlns;"}],["Character","bar"],["StartTag","z",{}]]},
{"description":"Text after hex character reference",
"input":"<z z='&#x0020; foo'>bar<z>",
"output":[["StartTag","z",{"z":" foo"}],["Character","bar"],["StartTag","z",{}]]},
{"description":"Attribute name starting with \"",
"input":"<foo \"='bar'>",
"output":[["StartTag", "foo", {"\"": "bar"}]],
"errors":[
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 6 }
]},
{"description":"Attribute name starting with '",
"input":"<foo '='bar'>",
"output":[["StartTag", "foo", {"'": "bar"}]],
"errors":[
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 6 }
]},
{"description":"Attribute name containing \"",
"input":"<foo a\"b='bar'>",
"output":[["StartTag", "foo", {"a\"b": "bar"}]],
"errors":[
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 7 }
]},
{"description":"Attribute name containing '",
"input":"<foo a'b='bar'>",
"output":[["StartTag", "foo", {"a'b": "bar"}]],
"errors":[
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 7 }
]},
{"description":"Unquoted attribute value containing '",
"input":"<foo a=b'c>",
"output":[["StartTag", "foo", {"a": "b'c"}]],
"errors":[
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 9 }
]},
{"description":"Unquoted attribute value containing \"",
"input":"<foo a=b\"c>",
"output":[["StartTag", "foo", {"a": "b\"c"}]],
"errors":[
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 9 }
]},
{"description":"Double-quoted attribute value not followed by whitespace",
"input":"<foo a=\"b\"c>",
"output":[["StartTag", "foo", {"a": "b", "c": ""}]],
"errors":[
{ "code": "missing-whitespace-between-attributes", "line": 1, "col": 11 }
]},
{"description":"Single-quoted attribute value not followed by whitespace",
"input":"<foo a='b'c>",
"output":[["StartTag", "foo", {"a": "b", "c": ""}]],
"errors":[
{ "code": "missing-whitespace-between-attributes", "line": 1, "col": 11 }
]},
{"description":"Quoted attribute followed by permitted /",
"input":"<br a='b'/>",
"output":[["StartTag","br",{"a":"b"},true]]},
{"description":"Quoted attribute followed by non-permitted /",
"input":"<bar a='b'/>",
"output":[["StartTag","bar",{"a":"b"},true]]},
{"description":"CR EOF after doctype name",
"input":"<!doctype html \r",
"output":[["DOCTYPE", "html", null, null, false]],
"errors":[
{ "code": "eof-in-doctype", "line": 2, "col": 1 }
]},
{"description":"CR EOF in tag name",
"input":"<z\r",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 2, "col": 1 }
]},
{"description":"Slash EOF in tag name",
"input":"<z/",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 4 }
]},
{"description":"Zero hex numeric entity",
"input":"&#x0",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 },
{ "code": "null-character-reference", "line": 1, "col": 5 }
]},
{"description":"Zero decimal numeric entity",
"input":"&#0",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 4 },
{ "code": "null-character-reference", "line": 1, "col": 4 }
]},
{"description":"Zero-prefixed hex numeric entity",
"input":"&#x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000041;",
"output":[["Character", "A"]]},
{"description":"Zero-prefixed decimal numeric entity",
"input":"&#000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000065;",
"output":[["Character", "A"]]},
{"description":"Empty hex numeric entities",
"input":"&#x &#X ",
"output":[["Character", "&#x &#X "]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 4 },
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 8 }
]},
{"description":"Invalid digit in hex numeric entity",
"input":"&#xZ",
"output":[["Character", "&#xZ"]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 4 }
]},
{"description":"Empty decimal numeric entities",
"input":"&# &#; ",
"output":[["Character", "&# &#; "]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 3 },
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 6 }
]},
{"description":"Invalid digit in decimal numeric entity",
"input":"&#A",
"output":[["Character", "&#A"]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 3 }
]},
{"description":"Non-BMP numeric entity",
"input":"&#x10000;",
"output":[["Character", "\uD800\uDC00"]]},
{"description":"Maximum non-BMP numeric entity",
"input":"&#X10FFFF;",
"output":[["Character", "\uDBFF\uDFFF"]],
"errors":[
{ "code": "noncharacter-character-reference", "line": 1, "col": 11 }
]},
{"description":"Above maximum numeric entity",
"input":"&#x110000;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 11 }
]},
{"description":"32-bit hex numeric entity",
"input":"&#x80000041;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 13 }
]},
{"description":"33-bit hex numeric entity",
"input":"&#x100000041;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 14 }
]},
{"description":"33-bit decimal numeric entity",
"input":"&#4294967361;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 14 }
]},
{"description":"65-bit hex numeric entity",
"input":"&#x10000000000000041;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 22 }
]},
{"description":"65-bit decimal numeric entity",
"input":"&#18446744073709551681;",
"output":[["Character", "\uFFFD"]],
"errors":[
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 24 }
]},
{"description":"Surrogate code point edge cases",
"input":"&#xD7FF;&#xD800;&#xD801;&#xDFFE;&#xDFFF;&#xE000;",
"output":[["Character", "\uD7FF\uFFFD\uFFFD\uFFFD\uFFFD\uE000"]],
"errors":[
{ "code": "surrogate-character-reference", "line": 1, "col": 17 },
{ "code": "surrogate-character-reference", "line": 1, "col": 25 },
{ "code": "surrogate-character-reference", "line": 1, "col": 33 },
{ "code": "surrogate-character-reference", "line": 1, "col": 41 }
]},
{"description":"Uppercase start tag name",
"input":"<X>",
"output":[["StartTag", "x", {}]]},
{"description":"Uppercase end tag name",
"input":"</X>",
"output":[["EndTag", "x"]]},
{"description":"Uppercase attribute name",
"input":"<x X>",
"output":[["StartTag", "x", { "x":"" }]]},
{"description":"Tag/attribute name case edge values",
"input":"<x@AZ[`az{ @AZ[`az{>",
"output":[["StartTag", "x@az[`az{", { "@az[`az{":"" }]]},
{"description":"Duplicate different-case attributes",
"input":"<x x=1 x=2 X=3>",
"output":[["StartTag", "x", { "x":"1" }]],
"errors":[
{ "code": "duplicate-attribute", "line": 1, "col": 9 },
{ "code": "duplicate-attribute", "line": 1, "col": 13 }
]},
{"description":"Uppercase close tag attributes",
"input":"</x X>",
"output":[["EndTag", "x"]],
"errors":[
{ "code": "end-tag-with-attributes", "line": 1, "col": 6 }
]},
{"description":"Duplicate close tag attributes",
"input":"</x x x>",
"output":[["EndTag", "x"]],
"errors":[
{ "code": "duplicate-attribute", "line": 1, "col": 8 },
{ "code": "end-tag-with-attributes", "line": 1, "col": 8 }
]},
{"description":"Permitted slash",
"input":"<br/>",
"output":[["StartTag","br",{},true]]},
{"description":"Non-permitted slash",
"input":"<xr/>",
"output":[["StartTag","xr",{},true]]},
{"description":"Permitted slash but in close tag",
"input":"</br/>",
"output":[["EndTag", "br"]],
"errors":[
{ "code": "end-tag-with-trailing-solidus", "line": 1, "col": 6 }
]},
{"description":"Doctype public case-sensitivity (1)",
"input":"<!DoCtYpE HtMl PuBlIc \"AbC\" \"XyZ\">",
"output":[["DOCTYPE", "html", "AbC", "XyZ", true]]},
{"description":"Doctype public case-sensitivity (2)",
"input":"<!dOcTyPe hTmL pUbLiC \"aBc\" \"xYz\">",
"output":[["DOCTYPE", "html", "aBc", "xYz", true]]},
{"description":"Doctype system case-sensitivity (1)",
"input":"<!DoCtYpE HtMl SyStEm \"XyZ\">",
"output":[["DOCTYPE", "html", null, "XyZ", true]]},
{"description":"Doctype system case-sensitivity (2)",
"input":"<!dOcTyPe hTmL sYsTeM \"xYz\">",
"output":[["DOCTYPE", "html", null, "xYz", true]]},
{"description":"U+0000 in lookahead region after non-matching character",
"input":"<!doc>\u0000",
"output":[["Comment", "doc"], ["Character", "\u0000"]],
"errors":[
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
{ "code": "unexpected-null-character", "line": 1, "col": 7 }
]},
{"description":"U+0000 in lookahead region",
"input":"<!doc\u0000",
"output":[["Comment", "doc\uFFFD"]],
"errors":[
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
{ "code": "unexpected-null-character", "line": 1, "col": 6 }
]},
{"description":"U+0080 in lookahead region",
"input":"<!doc\u0080",
"output":[["Comment", "doc\u0080"]],
"errors":[
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
{ "code": "control-character-in-input-stream", "line": 1, "col": 6 }
]},
{"description":"U+FDD1 in lookahead region",
"input":"<!doc\uFDD1",
"output":[["Comment", "doc\uFDD1"]],
"errors":[
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
{ "code": "noncharacter-in-input-stream", "line": 1, "col": 6 }
]},
{"description":"U+1FFFF in lookahead region",
"input":"<!doc\uD83F\uDFFF",
"output":[["Comment", "doc\uD83F\uDFFF"]],
"errors":[
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
{ "code": "noncharacter-in-input-stream", "line": 1, "col": 6 }
]},
{"description":"CR followed by non-LF",
"input":"\r?",
"output":[["Character", "\n?"]]},
{"description":"CR at EOF",
"input":"\r",
"output":[["Character", "\n"]]},
{"description":"LF at EOF",
"input":"\n",
"output":[["Character", "\n"]]},
{"description":"CR LF",
"input":"\r\n",
"output":[["Character", "\n"]]},
{"description":"CR CR",
"input":"\r\r",
"output":[["Character", "\n\n"]]},
{"description":"LF LF",
"input":"\n\n",
"output":[["Character", "\n\n"]]},
{"description":"LF CR",
"input":"\n\r",
"output":[["Character", "\n\n"]]},
{"description":"text CR CR CR text",
"input":"text\r\r\rtext",
"output":[["Character", "text\n\n\ntext"]]},
{"description":"Doctype publik",
"input":"<!DOCTYPE html PUBLIK \"AbC\" \"XyZ\">",
"output":[["DOCTYPE", "html", null, null, false]],
"errors":[
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
]},
{"description":"Doctype publi",
"input":"<!DOCTYPE html PUBLI",
"output":[["DOCTYPE", "html", null, null, false]],
"errors":[
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
]},
{"description":"Doctype sistem",
"input":"<!DOCTYPE html SISTEM \"AbC\">",
"output":[["DOCTYPE", "html", null, null, false]],
"errors":[
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
]},
{"description":"Doctype sys",
"input":"<!DOCTYPE html SYS",
"output":[["DOCTYPE", "html", null, null, false]],
"errors":[
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
]},
{"description":"Doctype html x>text",
"input":"<!DOCTYPE html x>text",
"output":[["DOCTYPE", "html", null, null, false], ["Character", "text"]],
"errors":[
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
]},
{"description":"Grave accent in unquoted attribute",
"input":"<a a=aa`>",
"output":[["StartTag", "a", {"a":"aa`"}]],
"errors":[
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 8 }
]},
{"description":"EOF in tag name state ",
"input":"<a",
"output":[],
"errors": [
{ "code": "eof-in-tag", "line": 1, "col": 3 }
]},
{"description":"EOF in before attribute name state",
"input":"<a ",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 4 }
]},
{"description":"EOF in attribute name state",
"input":"<a a",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 5 }
]},
{"description":"EOF in after attribute name state",
"input":"<a a ",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 6 }
]},
{"description":"EOF in before attribute value state",
"input":"<a a =",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 7 }
]},
{"description":"EOF in attribute value (double quoted) state",
"input":"<a a =\"a",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 9 }
]},
{"description":"EOF in attribute value (single quoted) state",
"input":"<a a ='a",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 9 }
]},
{"description":"EOF in attribute value (unquoted) state",
"input":"<a a =a",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 8 }
]},
{"description":"EOF in after attribute value state",
"input":"<a a ='a'",
"output":[],
"errors":[
{ "code": "eof-in-tag", "line": 1, "col": 10 }
]}
]}
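
The numeric-entity cases in the file above (non-BMP, out-of-range, and surrogate code points) can be summarised with a short sketch. This is a minimal illustration under stated assumptions, not html5lib's implementation: the function name `resolve_numeric_reference` is hypothetical, the input is assumed to be an already-parsed integer code point, and the remapping table for C1 control characters is deliberately omitted.

```python
REPLACEMENT = "\uFFFD"


def is_noncharacter(cp):
    # U+FDD0..U+FDEF, plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def resolve_numeric_reference(cp):
    """Return (text, error_code_or_None) for a parsed code point.

    Hypothetical helper: covers only the cases exercised by the
    tests above (the C1 control remapping is not handled).
    """
    if cp == 0:
        return REPLACEMENT, "null-character-reference"
    if cp > 0x10FFFF:
        return REPLACEMENT, "character-reference-outside-unicode-range"
    if 0xD800 <= cp <= 0xDFFF:
        return REPLACEMENT, "surrogate-character-reference"
    if is_noncharacter(cp):
        # Noncharacters are reported but still emitted unchanged.
        return chr(cp), "noncharacter-character-reference"
    return chr(cp), None
```

Because Python integers are unbounded, the 33-bit and 65-bit inputs need no special overflow handling here; an implementation using fixed-width integers would have to saturate or stop accumulating digits once the value exceeds 0x10FFFF.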

File diff suppressed because it is too large

View file

@ -0,0 +1,41 @@
{"tests" : [
{"description": "Invalid Unicode character U+DFFF",
"doubleEscaped":true,
"input": "\\uDFFF",
"output":[["Character", "\\uDFFF"]],
"errors":[
{ "code": "surrogate-in-input-stream", "line": 1, "col": 1 }
]},
{"description": "Invalid Unicode character U+D800",
"doubleEscaped":true,
"input": "\\uD800",
"output":[["Character", "\\uD800"]],
"errors":[
{ "code": "surrogate-in-input-stream", "line": 1, "col": 1 }
]},
{"description": "Invalid Unicode character U+DFFF with valid preceding character",
"doubleEscaped":true,
"input": "a\\uDFFF",
"output":[["Character", "a\\uDFFF"]],
"errors":[
{ "code": "surrogate-in-input-stream", "line": 1, "col": 2 }
]},
{"description": "Invalid Unicode character U+D800 with valid following character",
"doubleEscaped":true,
"input": "\\uD800a",
"output":[["Character", "\\uD800a"]],
"errors":[
{ "code": "surrogate-in-input-stream", "line": 1, "col": 1 }
]},
{"description":"CR followed by U+0000",
"input":"\r\u0000",
"output":[["Character", "\n\u0000"]],
"errors":[
{ "code": "unexpected-null-character", "line": 2, "col": 1 }
]}
]
}
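
The first four tests above set `doubleEscaped`, so their `input` and `output` strings need a second round of unescaping after the JSON import. A minimal sketch of that step follows; the helper name `unescape_double_escaped` is hypothetical, and the only sequence handled is `\uHHHH`, as the README describes.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_double_escaped(text):
    """Convert each literal \\uHHHH sequence into its code point.

    Hypothetical helper for tests with "doubleEscaped": true.
    Lone surrogates are preserved as-is in the resulting str.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

A test harness would apply this to `test.input` and to every string inside the `test.output` tokens before comparing, since the option affects both fields.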

View file

@ -0,0 +1,20 @@
{"xmlViolationTests": [
{"description":"Non-XML character",
"input":"a\uFFFFb",
"output":[["Character","a\uFFFDb"]]},
{"description":"Non-XML space",
"input":"a\u000Cb",
"output":[["Character","a b"]]},
{"description":"Double hyphen in comment",
"input":"<!-- foo -- bar -->",
"output":[["Comment"," foo - - bar "]]},
{"description":"FF between attributes",
"input":"<a b=''\u000Cc=''>",
"output":[["StartTag","a",{"b":"","c":""}]]}
]}
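
The coercions these four XML-violation tests expect can be sketched as below. This is inferred only from the cases in this file: the helper names are hypothetical, and the full set of characters an implementation coerces is an assumption not confirmed here.

```python
def coerce_characters(text):
    """Apply the character coercions seen in the tests above.

    Hypothetical helper: U+FFFF becomes U+FFFD and form feed
    (U+000C) becomes a space. Other characters are untouched.
    """
    return text.replace("\uFFFF", "\uFFFD").replace("\u000C", " ")


def coerce_comment(data):
    """Break up every double hyphen so the comment is XML-safe."""
    while "--" in data:
        data = data.replace("--", "- -")
    return data
```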