Mirror of https://github.com/Tautulli/Tautulli.git, synced 2025-07-07 05:31:15 -07:00
Update html5lib-1.1
Parent 3a116486e7, commit 586fd15464
142 changed files with 90234 additions and 2393 deletions
107
lib/html5lib/tests/testdata/tokenizer/README.md
vendored
Normal file
@@ -0,0 +1,107 @@
Tokenizer tests
===============

The test format is [JSON](http://www.json.org/). This has the advantage
that the syntax allows backward-compatible extensions to the tests and
the disadvantage that it is relatively verbose.

Basic Structure
---------------

    {"tests": [
        {"description": "Test description",
        "input": "input_string",
        "output": [expected_output_tokens],
        "initialStates": [initial_states],
        "lastStartTag": last_start_tag,
        "errors": [parse_errors]
        }
    ]}

Multiple tests per file are allowed simply by adding more objects to the
"tests" list.

Each parse error is an object that contains error `code` and one-based
error location indices: `line` and `col`.

`description`, `input` and `output` are always present. The other values
are optional.

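As a rough illustration of the structure described above, the following Python sketch parses a test file and checks that the three mandatory keys are present. The helper name `load_tests` and the inline `sample` document are assumptions made for this example; they are not part of the test suite.

```python
import json

# Keys that the format says are always present.
REQUIRED_KEYS = ("description", "input", "output")


def load_tests(text):
    """Parse a tokenizer test file and return its list of test objects."""
    tests = json.loads(text)["tests"]
    for test in tests:
        missing = [key for key in REQUIRED_KEYS if key not in test]
        if missing:
            raise ValueError(f"test is missing required keys: {missing}")
    return tests


# Hypothetical one-test file used only to exercise the helper.
sample = """
{"tests": [
    {"description": "End tag",
     "input": "</a>",
     "output": [["EndTag", "a"]]}
]}
"""

tests = load_tests(sample)
# Optional values such as "errors" may be absent, so read them with a default.
errors = tests[0].get("errors", [])
```

Reading optional keys with `dict.get` keeps the harness tolerant of tests that omit them.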
### Test set-up

`test.input` is a string containing the characters to pass to the
tokenizer. Specifically, it represents the characters of the **input
stream**, and so implementations are expected to perform the processing
described in the spec's **Preprocessing the input stream** section
before feeding the result to the tokenizer.

If `test.doubleEscaped` is present and `true`, then `test.input` is not
quite as described above. Instead, it must first be subjected to another
round of unescaping (i.e., in addition to any unescaping involved in the
JSON import), and the result of *that* represents the characters of the
input stream. Currently, the only unescaping required by this option is
to convert each sequence of the form \\uHHHH (where H is a hex digit)
into the corresponding Unicode code point. (Note that this option also
affects the interpretation of `test.output`.)

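The second round of unescaping could be sketched in Python as below. The function name `unescape_input` is hypothetical, and the sketch makes one simplifying assumption: each `\uHHHH` sequence is converted on its own, so a surrogate pair written as two escapes is not merged into a single code point.

```python
import re

# A backslash, the letter "u", then exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_input(text):
    """Replace each \\uHHHH sequence with the code point it names."""
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


# After the JSON import, a doubleEscaped input still holds a literal
# backslash followed by "u0000"; this step turns it into a real NUL.
raw = "\\u0000]]>"
stream = unescape_input(raw)
```

Text without any escape sequence passes through unchanged, so the helper is safe to call on every `doubleEscaped` input.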
`test.initialStates` is a list of strings, each being the name of a
tokenizer state which can be one of the following:

- `Data state`
- `PLAINTEXT state`
- `RCDATA state`
- `RAWTEXT state`
- `Script data state`
- `CDATA section state`

The test should be run once for each string, using it
to set the tokenizer's initial state for that run. If
`test.initialStates` is omitted, it defaults to `["Data state"]`.

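One way a harness might turn a single test object into its individual runs is sketched below in Python. The names `KNOWN_STATES` and `initial_states_for` are assumptions for this example, and rejecting unknown state names is a design choice of the sketch rather than something the format requires.

```python
# State names taken from the list above.
KNOWN_STATES = {
    "Data state",
    "PLAINTEXT state",
    "RCDATA state",
    "RAWTEXT state",
    "Script data state",
    "CDATA section state",
}


def initial_states_for(test):
    """Return the states a test must be run in, applying the default."""
    states = test.get("initialStates", ["Data state"])
    unknown = [state for state in states if state not in KNOWN_STATES]
    if unknown:
        raise ValueError(f"unknown tokenizer states: {unknown}")
    return list(states)


# Hypothetical test objects: one names two states, one relies on the default.
explicit = {"initialStates": ["RCDATA state", "RAWTEXT state"]}
implicit = {"description": "no states given"}

runs_explicit = initial_states_for(explicit)
runs_implicit = initial_states_for(implicit)
```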
`test.lastStartTag` is a lowercase string that should be used as "the
tag name of the last start tag to have been emitted from this
tokenizer", referenced in the spec's definition of **appropriate end tag
token**. If it is omitted, it is treated as if "no start tag has been
emitted from this tokenizer".

### Test results

`test.output` is a list of tokens, ordered with the first produced by
the tokenizer the first (leftmost) in the list. The list must match the
**complete** list of tokens that the tokenizer should produce. Valid
tokens are:

    ["DOCTYPE", name, public_id, system_id, correctness]
    ["StartTag", name, {attributes}*, true*]
    ["StartTag", name, {attributes}]
    ["EndTag", name]
    ["Comment", data]
    ["Character", data]

`public_id` and `system_id` are either strings or `null`. `correctness`
is either `true` or `false`; `true` corresponds to the force-quirks flag
being false, and vice-versa.

When the self-closing flag is set, the `StartTag` array has `true` as
its fourth entry. When the flag is not set, the array has only three
entries for backwards compatibility.

All adjacent character tokens are coalesced into a single
`["Character", data]` token.

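A tokenizer that emits one character token per character needs its output merged before comparison. The Python sketch below shows one way to do that; the function name `coalesce_characters` is an assumption, and tokens are taken to be lists in the shapes listed above.

```python
def coalesce_characters(tokens):
    """Merge runs of adjacent Character tokens into single tokens."""
    merged = []
    for token in tokens:
        is_char = token[0] == "Character"
        if is_char and merged and merged[-1][0] == "Character":
            # Extend the previous Character token instead of adding a new one.
            merged[-1] = ["Character", merged[-1][1] + token[1]]
        else:
            merged.append(list(token))
    return merged


# Hypothetical raw tokenizer output for the input "foo</xmp>".
raw_tokens = [
    ["Character", "f"],
    ["Character", "o"],
    ["Character", "o"],
    ["EndTag", "xmp"],
]
expected_shape = coalesce_characters(raw_tokens)
```

Non-character tokens break a run, so character data on either side of a tag stays in separate tokens.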
If `test.doubleEscaped` is present and `true`, then every string within
`test.output` must be further unescaped (as described above) before
comparing with the tokenizer's output.

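Because `test.output` nests strings inside lists and attribute objects, the unescaping has to walk the whole structure. The Python sketch below is one possible approach; `unescape_output` is a hypothetical name, and treating attribute names as strings to unescape is this sketch's reading of "every string".

```python
import re

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_output(value):
    """Recursively unescape \\uHHHH sequences in every string of a value."""
    if isinstance(value, str):
        return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)
    if isinstance(value, list):
        return [unescape_output(item) for item in value]
    if isinstance(value, dict):
        return {
            unescape_output(key): unescape_output(item)
            for key, item in value.items()
        }
    # Booleans and null (None) carry no escapes and pass through unchanged.
    return value


# Expected output of a doubleEscaped test, as it looks after the JSON import.
expected = unescape_output([["Character", "\\uFFFD"]])
```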
xmlViolation tests
------------------

`tokenizer/xmlViolation.test` differs from the above in a couple of
ways:

- The name of the single member of the top-level JSON object is
  "xmlViolationTests" instead of "tests".
- Each test's expected output assumes that the implementation is applying
  the tweaks given in the spec's "Coercing an HTML DOM into an
  infoset" section.

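A loader that handles both kinds of file only has to look for either top-level member. The Python sketch below illustrates this; the function name `load_test_list` and the returned flag are assumptions of the example, not part of the format.

```python
import json


def load_test_list(text):
    """Return (tests, is_xml_violation) for either kind of test file."""
    data = json.loads(text)
    if "xmlViolationTests" in data:
        return data["xmlViolationTests"], True
    if "tests" in data:
        return data["tests"], False
    raise KeyError("no 'tests' or 'xmlViolationTests' member found")


# Hypothetical minimal documents of each kind.
regular, regular_flag = load_test_list('{"tests": []}')
violation, violation_flag = load_test_list('{"xmlViolationTests": []}')
```

The flag lets the caller decide whether to enable the infoset coercion tweaks before running the tests.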
93
lib/html5lib/tests/testdata/tokenizer/contentModelFlags.test
vendored
Normal file
@@ -0,0 +1,93 @@
{"tests": [

{"description":"PLAINTEXT content model flag",
"initialStates":["PLAINTEXT state"],
"lastStartTag":"plaintext",
"input":"<head>&body;",
"output":[["Character", "<head>&body;"]]},

{"description":"PLAINTEXT with seeming close tag",
"initialStates":["PLAINTEXT state"],
"lastStartTag":"plaintext",
"input":"</plaintext>&body;",
"output":[["Character", "</plaintext>&body;"]]},

{"description":"End tag closing RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp>",
"output":[["Character", "foo"], ["EndTag", "xmp"]]},

{"description":"End tag closing RCDATA or RAWTEXT (case-insensitivity)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xMp>",
"output":[["Character", "foo"], ["EndTag", "xmp"]]},

{"description":"End tag closing RCDATA or RAWTEXT (ending with space)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp ",
"output":[["Character", "foo"]],
"errors":[
    { "code": "eof-in-tag", "line": 1, "col": 10 }
]},

{"description":"End tag closing RCDATA or RAWTEXT (ending with EOF)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp",
"output":[["Character", "foo</xmp"]]},

{"description":"End tag closing RCDATA or RAWTEXT (ending with slash)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp/",
"output":[["Character", "foo"]],
"errors":[
    { "code": "eof-in-tag", "line": 1, "col": 10 }
]},

{"description":"End tag not closing RCDATA or RAWTEXT (ending with left-angle-bracket)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp<",
"output":[["Character", "foo</xmp<"]]},

{"description":"End tag with incorrect name in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"</foo>bar</xmp>",
"output":[["Character", "</foo>bar"], ["EndTag", "xmp"]]},

{"description":"Partial end tags leading straight into partial end tags",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"</xmp</xmp</xmp>",
"output":[["Character", "</xmp</xmp"], ["EndTag", "xmp"]]},

{"description":"End tag with incorrect name in RCDATA or RAWTEXT (starting like correct name)",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"</foo>bar</xmpaar>",
"output":[["Character", "</foo>bar</xmpaar>"]]},

{"description":"End tag closing RCDATA or RAWTEXT, switching back to PCDATA",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo</xmp></baz>",
"output":[["Character", "foo"], ["EndTag", "xmp"], ["EndTag", "baz"]]},

{"description":"RAWTEXT w/ something looking like an entity",
"initialStates":["RAWTEXT state"],
"lastStartTag":"xmp",
"input":"&foo;",
"output":[["Character", "&foo;"]]},

{"description":"RCDATA w/ an entity",
"initialStates":["RCDATA state"],
"lastStartTag":"textarea",
"input":"&lt;",
"output":[["Character", "<"]]}

]}
330
lib/html5lib/tests/testdata/tokenizer/domjs.test
vendored
Normal file
|
@ -0,0 +1,330 @@
|
|||
{
|
||||
"tests": [
|
||||
{
|
||||
"description":"CR in bogus comment state",
|
||||
"input":"<?\u000d",
|
||||
"output":[["Comment", "?\u000a"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"CRLF in bogus comment state",
|
||||
"input":"<?\u000d\u000a",
|
||||
"output":[["Comment", "?\u000a"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"CRLFLF in bogus comment state",
|
||||
"input":"<?\u000d\u000a\u000a",
|
||||
"output":[["Comment", "?\u000a\u000a"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"Raw NUL replacement",
|
||||
"doubleEscaped":true,
|
||||
"initialStates":["RCDATA state", "RAWTEXT state", "PLAINTEXT state", "Script data state"],
|
||||
"input":"\\u0000",
|
||||
"output":[["Character", "\\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 1 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"NUL in CDATA section",
|
||||
"doubleEscaped":true,
|
||||
"initialStates":["CDATA section state"],
|
||||
"input":"\\u0000]]>",
|
||||
"output":[["Character", "\\u0000"]]
|
||||
},
|
||||
{
|
||||
"description":"NUL in script HTML comment",
|
||||
"doubleEscaped":true,
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!--test\\u0000--><!--test-\\u0000--><!--test--\\u0000-->",
|
||||
"output":[["Character", "<!--test\\uFFFD--><!--test-\\uFFFD--><!--test--\\uFFFD-->"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 9 },
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 22 },
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 36 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"NUL in script HTML comment - double escaped",
|
||||
"doubleEscaped":true,
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!--<script>\\u0000--><!--<script>-\\u0000--><!--<script>--\\u0000-->",
|
||||
"output":[["Character", "<!--<script>\\uFFFD--><!--<script>-\\uFFFD--><!--<script>--\\uFFFD-->"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 13 },
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 30 },
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 48 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"EOF in script HTML comment",
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!--test",
|
||||
"output":[["Character", "<!--test"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 9 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"EOF in script HTML comment after dash",
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!--test-",
|
||||
"output":[["Character", "<!--test-"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 10 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"EOF in script HTML comment after dash dash",
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!--test--",
|
||||
"output":[["Character", "<!--test--"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 11 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"EOF in script HTML comment double escaped after dash",
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!--<script>-",
|
||||
"output":[["Character", "<!--<script>-"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 14 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"EOF in script HTML comment double escaped after dash dash",
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!--<script>--",
|
||||
"output":[["Character", "<!--<script>--"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 15 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"EOF in script HTML comment - double escaped",
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!--<script>",
|
||||
"output":[["Character", "<!--<script>"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 13 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"Dash in script HTML comment",
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!-- - -->",
|
||||
"output":[["Character", "<!-- - -->"]]
|
||||
},
|
||||
{
|
||||
"description":"Dash less-than in script HTML comment",
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!-- -< -->",
|
||||
"output":[["Character", "<!-- -< -->"]]
|
||||
},
|
||||
{
|
||||
"description":"Dash at end of script HTML comment",
|
||||
"initialStates":["Script data state"],
|
||||
"input":"<!--test--->",
|
||||
"output":[["Character", "<!--test--->"]]
|
||||
},
|
||||
{
|
||||
"description":"</script> in script HTML comment",
|
||||
"initialStates":["Script data state"],
|
||||
"lastStartTag":"script",
|
||||
"input":"<!-- </script> --></script>",
|
||||
"output":[["Character", "<!-- "], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
|
||||
},
|
||||
{
|
||||
"description":"</script> in script HTML comment - double escaped",
|
||||
"initialStates":["Script data state"],
|
||||
"lastStartTag":"script",
|
||||
"input":"<!-- <script></script> --></script>",
|
||||
"output":[["Character", "<!-- <script></script> -->"], ["EndTag", "script"]]
|
||||
},
|
||||
{
|
||||
"description":"</script> in script HTML comment - double escaped with nested <script>",
|
||||
"initialStates":["Script data state"],
|
||||
"lastStartTag":"script",
|
||||
"input":"<!-- <script><script></script></script> --></script>",
|
||||
"output":[["Character", "<!-- <script><script></script>"], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
|
||||
},
|
||||
{
|
||||
"description":"</script> in script HTML comment - double escaped with abrupt end",
|
||||
"initialStates":["Script data state"],
|
||||
"lastStartTag":"script",
|
||||
"input":"<!-- <script>--></script> --></script>",
|
||||
"output":[["Character", "<!-- <script>-->"], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
|
||||
},
|
||||
{
|
||||
"description":"Incomplete start tag in script HTML comment double escaped",
|
||||
"initialStates":["Script data state"],
|
||||
"lastStartTag":"script",
|
||||
"input":"<!--<scrip></script>-->",
|
||||
"output":[["Character", "<!--<scrip>"], ["EndTag", "script"], ["Character", "-->"]]
|
||||
},
|
||||
{
|
||||
"description":"Unclosed start tag in script HTML comment double escaped",
|
||||
"initialStates":["Script data state"],
|
||||
"lastStartTag":"script",
|
||||
"input":"<!--<script</script>-->",
|
||||
"output":[["Character", "<!--<script"], ["EndTag", "script"], ["Character", "-->"]]
|
||||
},
|
||||
{
|
||||
"description":"Incomplete end tag in script HTML comment double escaped",
|
||||
"initialStates":["Script data state"],
|
||||
"lastStartTag":"script",
|
||||
"input":"<!--<script></scrip>-->",
|
||||
"output":[["Character", "<!--<script></scrip>-->"]]
|
||||
},
|
||||
{
|
||||
"description":"Unclosed end tag in script HTML comment double escaped",
|
||||
"initialStates":["Script data state"],
|
||||
"lastStartTag":"script",
|
||||
"input":"<!--<script></script-->",
|
||||
"output":[["Character", "<!--<script></script-->"]]
|
||||
},
|
||||
{
|
||||
"description":"leading U+FEFF must pass through",
|
||||
"initialStates":["Data state", "RCDATA state", "RAWTEXT state", "Script data state"],
|
||||
"doubleEscaped":true,
|
||||
"input":"\\uFEFFfoo\\uFEFFbar",
|
||||
"output":[["Character", "\\uFEFFfoo\\uFEFFbar"]]
|
||||
},
|
||||
{
|
||||
"description":"Non BMP-charref in RCDATA",
|
||||
"initialStates":["RCDATA state"],
|
||||
"input":"&NotEqualTilde;",
|
||||
"output":[["Character", "\u2242\u0338"]]
|
||||
},
|
||||
{
|
||||
"description":"Bad charref in RCDATA",
|
||||
"initialStates":["RCDATA state"],
|
||||
"input":"&NotEqualTild;",
|
||||
"output":[["Character", "&NotEqualTild;"]],
|
||||
"errors":[
|
||||
{ "code": "unknown-named-character-reference", "line": 1, "col": 14 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"lowercase endtags",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</XMP>",
|
||||
"output":[["EndTag","xmp"]]
|
||||
},
|
||||
{
|
||||
"description":"bad endtag (space before name)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</ XMP>",
|
||||
"output":[["Character","</ XMP>"]]
|
||||
},
|
||||
{
|
||||
"description":"bad endtag (not matching last start tag)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</xm>",
|
||||
"output":[["Character","</xm>"]]
|
||||
},
|
||||
{
|
||||
"description":"bad endtag (without close bracket)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</xm ",
|
||||
"output":[["Character","</xm "]]
|
||||
},
|
||||
{
|
||||
"description":"bad endtag (trailing solidus)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</xm/",
|
||||
"output":[["Character","</xm/"]]
|
||||
},
|
||||
{
|
||||
"description":"Non BMP-charref in attribute",
|
||||
"input":"<p id=\"&NotEqualTilde;\">",
|
||||
"output":[["StartTag", "p", {"id":"\u2242\u0338"}]]
|
||||
},
|
||||
{
|
||||
"description":"--!NUL in comment ",
|
||||
"doubleEscaped":true,
|
||||
"input":"<!----!\\u0000-->",
|
||||
"output":[["Comment", "--!\\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 8 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"space EOF after doctype ",
|
||||
"input":"<!DOCTYPE html ",
|
||||
"output":[["DOCTYPE", "html", null, null , false]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-doctype", "line": 1, "col": 16 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"CDATA in HTML content",
|
||||
"input":"<![CDATA[foo]]>",
|
||||
"output":[["Comment", "[CDATA[foo]]"]],
|
||||
"errors":[
|
||||
{ "code": "cdata-in-html-content", "line": 1, "col": 9 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"CDATA content",
|
||||
"input":"foo ]]>",
|
||||
"initialStates":["CDATA section state"],
|
||||
"output":[["Character", "foo "]]
|
||||
},
|
||||
{
|
||||
"description":"CDATA followed by HTML content",
|
||||
"input":"foo ]]> ",
|
||||
"initialStates":["CDATA section state"],
|
||||
"output":[["Character", "foo  "]]
|
||||
},
|
||||
{
|
||||
"description":"CDATA with extra bracket",
|
||||
"input":"foo]]]>",
|
||||
"initialStates":["CDATA section state"],
|
||||
"output":[["Character", "foo]"]]
|
||||
},
|
||||
{
|
||||
"description":"CDATA without end marker",
|
||||
"input":"foo",
|
||||
"initialStates":["CDATA section state"],
|
||||
"output":[["Character", "foo"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-cdata", "line": 1, "col": 4 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"CDATA with single bracket ending",
|
||||
"input":"foo]",
|
||||
"initialStates":["CDATA section state"],
|
||||
"output":[["Character", "foo]"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-cdata", "line": 1, "col": 5 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"description":"CDATA with two brackets ending",
|
||||
"input":"foo]]",
|
||||
"initialStates":["CDATA section state"],
|
||||
"output":[["Character", "foo]]"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-cdata", "line": 1, "col": 6 }
|
||||
]
|
||||
}
|
||||
|
||||
]
|
||||
}
|
542
lib/html5lib/tests/testdata/tokenizer/entities.test
vendored
Normal file
|
@ -0,0 +1,542 @@
|
|||
{"tests": [
|
||||
|
||||
{"description": "Undefined named entity in a double-quoted attribute value ending in semicolon and whose name starts with a known entity name.",
|
||||
"input":"<h a=\"&noti;\">",
"output": [["StartTag", "h", {"a": "&noti;"}]]},
|
||||
|
||||
{"description": "Entity name requiring semicolon instead followed by the equals sign in a double-quoted attribute value.",
|
||||
"input":"<h a=\"&lang=\">",
|
||||
"output": [["StartTag", "h", {"a": "&lang="}]]},
|
||||
|
||||
{"description": "Valid entity name followed by the equals sign in a double-quoted attribute value.",
|
||||
"input":"<h a=\"&not=\">",
"output": [["StartTag", "h", {"a": "&not="}]]},
|
||||
|
||||
{"description": "Undefined named entity in a single-quoted attribute value ending in semicolon and whose name starts with a known entity name.",
|
||||
"input":"<h a='&noti;'>",
"output": [["StartTag", "h", {"a": "&noti;"}]]},
|
||||
|
||||
{"description": "Entity name requiring semicolon instead followed by the equals sign in a single-quoted attribute value.",
|
||||
"input":"<h a='&lang='>",
|
||||
"output": [["StartTag", "h", {"a": "&lang="}]]},
|
||||
|
||||
{"description": "Valid entity name followed by the equals sign in a single-quoted attribute value.",
|
||||
"input":"<h a='&not='>",
"output": [["StartTag", "h", {"a": "&not="}]]},
|
||||
|
||||
{"description": "Undefined named entity in an unquoted attribute value ending in semicolon and whose name starts with a known entity name.",
|
||||
"input":"<h a=&noti;>",
"output": [["StartTag", "h", {"a": "&noti;"}]]},
|
||||
|
||||
{"description": "Entity name requiring semicolon instead followed by the equals sign in an unquoted attribute value.",
|
||||
"input":"<h a=&lang=>",
|
||||
"output": [["StartTag", "h", {"a": "&lang="}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 11 }
|
||||
]},
|
||||
|
||||
{"description": "Valid entity name followed by the equals sign in an unquoted attribute value.",
|
||||
"input":"<h a=&not=>",
"output": [["StartTag", "h", {"a": "&not="}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 10 }
|
||||
]},
|
||||
|
||||
{"description": "Ambiguous ampersand.",
|
||||
"input":"&rrrraannddom;",
|
||||
"output": [["Character", "&rrrraannddom;"]],
|
||||
"errors":[
|
||||
{ "code": "unknown-named-character-reference", "line": 1, "col": 14 }
|
||||
]},
|
||||
|
||||
{"description": "Semicolonless named entity 'not' followed by 'i;' in body",
|
||||
"input":"&noti;",
|
||||
"output": [["Character", "\u00ACi;"]],
|
||||
"errors":[
|
||||
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
|
||||
]},
|
||||
|
||||
{"description": "Very long undefined named entity in body",
|
||||
"input":"&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;",
|
||||
"output": [["Character", "&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;"]],
|
||||
"errors":[
|
||||
{ "code": "unknown-named-character-reference", "line": 1, "col": 950 }
|
||||
]},
|
||||
|
||||
{"description": "CR as numeric entity",
|
||||
"input":"&#013;",
|
||||
"output": [["Character", "\r"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 7 }
|
||||
]},
|
||||
|
||||
{"description": "CR as hexadecimal numeric entity",
|
||||
"input":"&#x00D;",
|
||||
"output": [["Character", "\r"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 EURO SIGN numeric entity.",
|
||||
"input":"&#0128;",
|
||||
"output": [["Character", "\u20AC"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"&#0129;",
|
||||
"output": [["Character", "\u0081"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 SINGLE LOW-9 QUOTATION MARK numeric entity.",
|
||||
"input":"&#0130;",
|
||||
"output": [["Character", "\u201A"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LETTER F WITH HOOK numeric entity.",
|
||||
"input":"&#0131;",
|
||||
"output": [["Character", "\u0192"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 DOUBLE LOW-9 QUOTATION MARK numeric entity.",
|
||||
"input":"&#0132;",
|
||||
"output": [["Character", "\u201E"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 HORIZONTAL ELLIPSIS numeric entity.",
|
||||
"input":"&#0133;",
|
||||
"output": [["Character", "\u2026"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 DAGGER numeric entity.",
|
||||
"input":"&#0134;",
|
||||
"output": [["Character", "\u2020"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 DOUBLE DAGGER numeric entity.",
|
||||
"input":"&#0135;",
|
||||
"output": [["Character", "\u2021"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 MODIFIER LETTER CIRCUMFLEX ACCENT numeric entity.",
|
||||
"input":"&#0136;",
|
||||
"output": [["Character", "\u02C6"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 PER MILLE SIGN numeric entity.",
|
||||
"input":"&#0137;",
|
||||
"output": [["Character", "\u2030"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LETTER S WITH CARON numeric entity.",
|
||||
"input":"&#0138;",
|
||||
"output": [["Character", "\u0160"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 SINGLE LEFT-POINTING ANGLE QUOTATION MARK numeric entity.",
|
||||
"input":"&#0139;",
|
||||
"output": [["Character", "\u2039"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LIGATURE OE numeric entity.",
|
||||
"input":"&#0140;",
|
||||
"output": [["Character", "\u0152"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"&#0141;",
|
||||
"output": [["Character", "\u008D"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LETTER Z WITH CARON numeric entity.",
|
||||
"input":"&#0142;",
|
||||
"output": [["Character", "\u017D"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"&#0143;",
|
||||
"output": [["Character", "\u008F"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"&#0144;",
|
||||
"output": [["Character", "\u0090"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 LEFT SINGLE QUOTATION MARK numeric entity.",
|
||||
"input":"&#0145;",
|
||||
"output": [["Character", "\u2018"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 RIGHT SINGLE QUOTATION MARK numeric entity.",
|
||||
"input":"&#0146;",
|
||||
"output": [["Character", "\u2019"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 LEFT DOUBLE QUOTATION MARK numeric entity.",
|
||||
"input":"&#0147;",
|
||||
"output": [["Character", "\u201C"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 RIGHT DOUBLE QUOTATION MARK numeric entity.",
|
||||
"input":"&#0148;",
|
||||
"output": [["Character", "\u201D"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 BULLET numeric entity.",
|
||||
"input":"&#0149;",
|
||||
"output": [["Character", "\u2022"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 EN DASH numeric entity.",
|
||||
"input":"&#0150;",
|
||||
"output": [["Character", "\u2013"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 EM DASH numeric entity.",
|
||||
"input":"—",
|
||||
"output": [["Character", "\u2014"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 SMALL TILDE numeric entity.",
|
||||
"input":"˜",
|
||||
"output": [["Character", "\u02DC"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 TRADE MARK SIGN numeric entity.",
|
||||
"input":"™",
|
||||
"output": [["Character", "\u2122"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LETTER S WITH CARON numeric entity.",
|
||||
"input":"š",
|
||||
"output": [["Character", "\u0161"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 SINGLE RIGHT-POINTING ANGLE QUOTATION MARK numeric entity.",
|
||||
"input":"›",
|
||||
"output": [["Character", "\u203A"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LIGATURE OE numeric entity.",
|
||||
"input":"œ",
|
||||
"output": [["Character", "\u0153"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"",
|
||||
"output": [["Character", "\u009D"]],
|
||||
"errors":[
|
||||
{ "code": "control-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||

{"description": "Windows-1252 EURO SIGN hexadecimal numeric entity.",
"input":"&#x080;",
"output": [["Character", "\u20AC"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x081;",
"output": [["Character", "\u0081"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 SINGLE LOW-9 QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x082;",
"output": [["Character", "\u201A"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LATIN SMALL LETTER F WITH HOOK hexadecimal numeric entity.",
"input":"&#x083;",
"output": [["Character", "\u0192"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 DOUBLE LOW-9 QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x084;",
"output": [["Character", "\u201E"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 HORIZONTAL ELLIPSIS hexadecimal numeric entity.",
"input":"&#x085;",
"output": [["Character", "\u2026"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 DAGGER hexadecimal numeric entity.",
"input":"&#x086;",
"output": [["Character", "\u2020"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 DOUBLE DAGGER hexadecimal numeric entity.",
"input":"&#x087;",
"output": [["Character", "\u2021"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 MODIFIER LETTER CIRCUMFLEX ACCENT hexadecimal numeric entity.",
"input":"&#x088;",
"output": [["Character", "\u02C6"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 PER MILLE SIGN hexadecimal numeric entity.",
"input":"&#x089;",
"output": [["Character", "\u2030"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LATIN CAPITAL LETTER S WITH CARON hexadecimal numeric entity.",
"input":"&#x08A;",
"output": [["Character", "\u0160"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 SINGLE LEFT-POINTING ANGLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x08B;",
"output": [["Character", "\u2039"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LATIN CAPITAL LIGATURE OE hexadecimal numeric entity.",
"input":"&#x08C;",
"output": [["Character", "\u0152"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x08D;",
"output": [["Character", "\u008D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LATIN CAPITAL LETTER Z WITH CARON hexadecimal numeric entity.",
"input":"&#x08E;",
"output": [["Character", "\u017D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x08F;",
"output": [["Character", "\u008F"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x090;",
"output": [["Character", "\u0090"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LEFT SINGLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x091;",
"output": [["Character", "\u2018"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 RIGHT SINGLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x092;",
"output": [["Character", "\u2019"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LEFT DOUBLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x093;",
"output": [["Character", "\u201C"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 RIGHT DOUBLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x094;",
"output": [["Character", "\u201D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 BULLET hexadecimal numeric entity.",
"input":"&#x095;",
"output": [["Character", "\u2022"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 EN DASH hexadecimal numeric entity.",
"input":"&#x096;",
"output": [["Character", "\u2013"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 EM DASH hexadecimal numeric entity.",
"input":"&#x097;",
"output": [["Character", "\u2014"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 SMALL TILDE hexadecimal numeric entity.",
"input":"&#x098;",
"output": [["Character", "\u02DC"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 TRADE MARK SIGN hexadecimal numeric entity.",
"input":"&#x099;",
"output": [["Character", "\u2122"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LATIN SMALL LETTER S WITH CARON hexadecimal numeric entity.",
"input":"&#x09A;",
"output": [["Character", "\u0161"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 SINGLE RIGHT-POINTING ANGLE QUOTATION MARK hexadecimal numeric entity.",
"input":"&#x09B;",
"output": [["Character", "\u203A"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LATIN SMALL LIGATURE OE hexadecimal numeric entity.",
"input":"&#x09C;",
"output": [["Character", "\u0153"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
"input":"&#x09D;",
"output": [["Character", "\u009D"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LATIN SMALL LETTER Z WITH CARON hexadecimal numeric entity.",
"input":"&#x09E;",
"output": [["Character", "\u017E"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Windows-1252 LATIN CAPITAL LETTER Y WITH DIAERESIS hexadecimal numeric entity.",
"input":"&#x09F;",
"output": [["Character", "\u0178"]],
"errors":[
{ "code": "control-character-reference", "line": 1, "col": 8 }
]},

{"description": "Decimal numeric entity followed by hex character a.",
"input":"&#97a",
"output": [["Character", "aa"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]},

{"description": "Decimal numeric entity followed by hex character A.",
"input":"&#97A",
"output": [["Character", "aA"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]},

{"description": "Decimal numeric entity followed by hex character f.",
"input":"&#97f",
"output": [["Character", "af"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]},

{"description": "Decimal numeric entity followed by hex character F.",
"input":"&#97F",
"output": [["Character", "aF"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]}

]}
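The numeric-entity cases in this file rely on two behaviours: code points in the 0x80 to 0x9F range are remapped through the Windows-1252 table, and a decimal reference ends at the first non-digit even without a semicolon. As a rough cross-check only, the sketch below uses Python's standard-library `html.unescape`, which is assumed here to follow the same HTML5 rules for these inputs; it is not html5lib's tokenizer and reports no parse errors. The `CASES` list and `decode_all` helper are illustrative names, not part of the test suite.

```python
import html

# (reference, expected text) pairs in the spirit of the cases in this file.
# html.unescape is the standard-library decoder, used only as a cross-check.
CASES = [
    ("&#0145;", "\u2018"),  # Windows-1252 0x91 maps to LEFT SINGLE QUOTATION MARK
    ("&#x080;", "\u20ac"),  # Windows-1252 0x80 maps to EURO SIGN
    ("&#97a", "aa"),        # decimal reference stops at the first non-digit
]


def decode_all(cases):
    """Return (reference, decoded, expected) triples for each case."""
    return [(ref, html.unescape(ref), want) for ref, want in cases]


if __name__ == "__main__":
    for ref, got, want in decode_all(CASES):
        print(ref, repr(got), got == want)
```

A real harness would feed each `input` to the tokenizer under test and compare both the emitted tokens and the reported `errors`, which this sketch does not attempt.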
36
lib/html5lib/tests/testdata/tokenizer/escapeFlag.test
vendored
Normal file
@ -0,0 +1,36 @@
{"tests": [

{"description":"Commented close tag in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo<!--</xmp>--></xmp>",
"output":[["Character", "foo<!--"], ["EndTag", "xmp"], ["Character", "-->"], ["EndTag", "xmp"]]},

{"description":"Bogus comment in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo<!-->baz</xmp>",
"output":[["Character", "foo<!-->baz"], ["EndTag", "xmp"]]},

{"description":"End tag surrounded by bogus comment in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo<!--></xmp><!-->baz</xmp>",
"output":[["Character", "foo<!-->"], ["EndTag", "xmp"], ["Comment", ""], ["Character", "baz"], ["EndTag", "xmp"]],
"errors":[
{ "code": "abrupt-closing-of-empty-comment", "line": 1, "col": 19 }
]},

{"description":"Commented entities in RCDATA",
"initialStates":["RCDATA state"],
"lastStartTag":"xmp",
"input":" &amp; <!-- &amp; --> &amp; </xmp>",
"output":[["Character", " & <!-- & --> & "], ["EndTag", "xmp"]]},

{"description":"Incorrect comment ending sequences in RCDATA or RAWTEXT",
"initialStates":["RCDATA state", "RAWTEXT state"],
"lastStartTag":"xmp",
"input":"foo<!-- x --x>x-- >x--!>x--<></xmp>",
"output":[["Character", "foo<!-- x --x>x-- >x--!>x--<>"], ["EndTag", "xmp"]]}

]}
42422
lib/html5lib/tests/testdata/tokenizer/namedEntities.test
vendored
Normal file
File diff suppressed because it is too large
1677
lib/html5lib/tests/testdata/tokenizer/numericEntities.test
vendored
Normal file
File diff suppressed because it is too large
9
lib/html5lib/tests/testdata/tokenizer/pendingSpecChanges.test
vendored
Normal file
@ -0,0 +1,9 @@
{"tests": [

{"description":"<!---- >",
"input":"<!---- >",
"output":[["Comment","-- >"]],
"errors":[
{ "code": "eof-in-comment", "line": 1, "col": 9 }
]}
]}
349
lib/html5lib/tests/testdata/tokenizer/test1.test
vendored
Normal file
@ -0,0 +1,349 @@
{"tests": [

{"description":"Correct Doctype lowercase",
"input":"<!DOCTYPE html>",
"output":[["DOCTYPE", "html", null, null, true]]},


{"description":"Correct Doctype uppercase",
"input":"<!DOCTYPE HTML>",
"output":[["DOCTYPE", "html", null, null, true]]},

{"description":"Correct Doctype mixed case",
"input":"<!DOCTYPE HtMl>",
"output":[["DOCTYPE", "html", null, null, true]]},

{"description":"Correct Doctype case with EOF",
"input":"<!DOCTYPE HtMl",
"output":[["DOCTYPE", "html", null, null, false]],
"errors":[
{ "code": "eof-in-doctype", "line": 1, "col": 15 }
]},

{"description":"Truncated doctype start",
"input":"<!DOC>",
"output":[["Comment", "DOC"]],
"errors":[
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 }
]},

{"description":"Doctype in error",
"input":"<!DOCTYPE foo>",
"output":[["DOCTYPE", "foo", null, null, true]]},

{"description":"Single Start Tag",
"input":"<h>",
"output":[["StartTag", "h", {}]]},

{"description":"Empty end tag",
"input":"</>",
"output":[],
"errors":[
{ "code": "missing-end-tag-name", "line": 1, "col": 3 }
]},

{"description":"Empty start tag",
"input":"<>",
"output":[["Character", "<>"]],
"errors":[
{ "code": "invalid-first-character-of-tag-name", "line": 1, "col": 2 }
]},

{"description":"Start Tag w/attribute",
"input":"<h a='b'>",
"output":[["StartTag", "h", {"a":"b"}]]},

{"description":"Start Tag w/attribute no quotes",
"input":"<h a=b>",
"output":[["StartTag", "h", {"a":"b"}]]},

{"description":"Start/End Tag",
"input":"<h></h>",
"output":[["StartTag", "h", {}], ["EndTag", "h"]]},

{"description":"Two unclosed start tags",
"input":"<p>One<p>Two",
"output":[["StartTag", "p", {}], ["Character", "One"], ["StartTag", "p", {}], ["Character", "Two"]]},

{"description":"End Tag w/attribute",
"input":"<h></h a='b'>",
"output":[["StartTag", "h", {}], ["EndTag", "h"]],
"errors":[
{ "code": "end-tag-with-attributes", "line": 1, "col": 13 }
]},

{"description":"Multiple atts",
"input":"<h a='b' c='d'>",
"output":[["StartTag", "h", {"a":"b", "c":"d"}]]},

{"description":"Multiple atts no space",
"input":"<h a='b'c='d'>",
"output":[["StartTag", "h", {"a":"b", "c":"d"}]],
"errors":[
{ "code": "missing-whitespace-between-attributes", "line": 1, "col": 9 }
]},

{"description":"Repeated attr",
"input":"<h a='b' a='d'>",
"output":[["StartTag", "h", {"a":"b"}]],
"errors":[
{ "code": "duplicate-attribute", "line": 1, "col": 11 }
]},

{"description":"Simple comment",
"input":"<!--comment-->",
"output":[["Comment", "comment"]]},

{"description":"Comment, Central dash no space",
"input":"<!----->",
"output":[["Comment", "-"]]},

{"description":"Comment, two central dashes",
"input":"<!-- --comment -->",
"output":[["Comment", " --comment "]]},

{"description":"Comment, central less-than bang",
"input":"<!--<!-->",
"output":[["Comment", "<!"]]},

{"description":"Unfinished comment",
"input":"<!--comment",
"output":[["Comment", "comment"]],
"errors":[
{ "code": "eof-in-comment", "line": 1, "col": 12 }
]},

{"description":"Unfinished comment after start of nested comment",
"input":"<!-- <!--",
"output":[["Comment", " <!"]],
"errors":[
{ "code": "eof-in-comment", "line": 1, "col": 10 }
]},

{"description":"Start of a comment",
"input":"<!-",
"output":[["Comment", "-"]],
"errors":[
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 }
]},

{"description":"Short comment",
"input":"<!-->",
"output":[["Comment", ""]],
"errors":[
{ "code": "abrupt-closing-of-empty-comment", "line": 1, "col": 5 }
]},

{"description":"Short comment two",
"input":"<!--->",
"output":[["Comment", ""]],
"errors":[
{ "code": "abrupt-closing-of-empty-comment", "line": 1, "col": 6 }
]},

{"description":"Short comment three",
"input":"<!---->",
"output":[["Comment", ""]]},

{"description":"< in comment",
"input":"<!-- <test-->",
"output":[["Comment", " <test"]]},

{"description":"<! in comment",
"input":"<!-- <!test-->",
"output":[["Comment", " <!test"]]},

{"description":"<!- in comment",
"input":"<!-- <!-test-->",
"output":[["Comment", " <!-test"]]},

{"description":"Nested comment",
"input":"<!-- <!--test-->",
"output":[["Comment", " <!--test"]],
"errors":[
{ "code": "nested-comment", "line": 1, "col": 10 }
]},

{"description":"Nested comment with extra <",
"input":"<!-- <<!--test-->",
"output":[["Comment", " <<!--test"]],
"errors":[
{ "code": "nested-comment", "line": 1, "col": 11 }
]},

{"description":"< in script data",
"initialStates":["Script data state"],
"input":"<test-->",
"output":[["Character", "<test-->"]]},

{"description":"<! in script data",
"initialStates":["Script data state"],
"input":"<!test-->",
"output":[["Character", "<!test-->"]]},

{"description":"<!- in script data",
"initialStates":["Script data state"],
"input":"<!-test-->",
"output":[["Character", "<!-test-->"]]},

{"description":"Escaped script data",
"initialStates":["Script data state"],
"input":"<!--test-->",
"output":[["Character", "<!--test-->"]]},

{"description":"< in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- < test -->",
"output":[["Character", "<!-- < test -->"]]},

{"description":"</ in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- </ test -->",
"output":[["Character", "<!-- </ test -->"]]},

{"description":"Start tag in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- <test> -->",
"output":[["Character", "<!-- <test> -->"]]},

{"description":"End tag in script HTML comment",
"initialStates":["Script data state"],
"input":"<!-- </test> -->",
"output":[["Character", "<!-- </test> -->"]]},

{"description":"- in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script>-</script>-->",
"output":[["Character", "<!--<script>-</script>-->"]]},

{"description":"-- in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script>--</script>-->",
"output":[["Character", "<!--<script>--</script>-->"]]},

{"description":"--- in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script>---</script>-->",
"output":[["Character", "<!--<script>---</script>-->"]]},

{"description":"- spaced in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script> - </script>-->",
"output":[["Character", "<!--<script> - </script>-->"]]},

{"description":"-- spaced in script HTML comment double escaped",
"initialStates":["Script data state"],
"input":"<!--<script> -- </script>-->",
"output":[["Character", "<!--<script> -- </script>-->"]]},

{"description":"Ampersand EOF",
"input":"&",
"output":[["Character", "&"]]},

{"description":"Ampersand ampersand EOF",
"input":"&&",
"output":[["Character", "&&"]]},

{"description":"Ampersand space EOF",
"input":"& ",
"output":[["Character", "& "]]},

{"description":"Unfinished entity",
"input":"&f",
"output":[["Character", "&f"]]},

{"description":"Ampersand, number sign",
"input":"&#",
"output":[["Character", "&#"]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 3 }
]},

{"description":"Unfinished numeric entity",
"input":"&#x",
"output":[["Character", "&#x"]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 4 }
]},

{"description":"Entity with trailing semicolon (1)",
"input":"I'm &not;it",
"output":[["Character","I'm \u00ACit"]]},

{"description":"Entity with trailing semicolon (2)",
"input":"I'm &notin;",
"output":[["Character","I'm \u2209"]]},

{"description":"Entity without trailing semicolon (1)",
"input":"I'm &notit",
"output":[["Character","I'm \u00ACit"]],
"errors": [
{"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 9 }
]},

{"description":"Entity without trailing semicolon (2)",
"input":"I'm &notin",
"output":[["Character","I'm \u00ACin"]],
"errors": [
{"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 9 }
]},

{"description":"Partial entity match at end of file",
"input":"I'm &no",
"output":[["Character","I'm &no"]]},

{"description":"Non-ASCII character reference name",
"input":"&\u00AC;",
"output":[["Character", "&\u00AC;"]]},

{"description":"ASCII decimal entity",
"input":"&#0036;",
"output":[["Character","$"]]},

{"description":"ASCII hexadecimal entity",
"input":"&#x3f;",
"output":[["Character","?"]]},

{"description":"Hexadecimal entity in attribute",
"input":"<h a='&#x3f;'></h>",
"output":[["StartTag", "h", {"a":"?"}], ["EndTag", "h"]]},

{"description":"Entity in attribute without semicolon ending in x",
"input":"<h a='&notx'>",
"output":[["StartTag", "h", {"a":"&notx"}]]},

{"description":"Entity in attribute without semicolon ending in 1",
"input":"<h a='&not1'>",
"output":[["StartTag", "h", {"a":"&not1"}]]},

{"description":"Entity in attribute without semicolon ending in i",
"input":"<h a='&noti'>",
"output":[["StartTag", "h", {"a":"&noti"}]]},

{"description":"Entity in attribute without semicolon",
"input":"<h a='&COPY'>",
"output":[["StartTag", "h", {"a":"\u00A9"}]],
"errors": [
{"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 12 }
]},

{"description":"Unquoted attribute ending in ampersand",
"input":"<s o=& t>",
"output":[["StartTag","s",{"o":"&","t":""}]]},

{"description":"Unquoted attribute at end of tag with final character of &, with tag followed by characters",
"input":"<a a=a&>foo",
"output":[["StartTag", "a", {"a":"a&"}], ["Character", "foo"]]},

{"description":"plaintext element",
"input":"<plaintext>foobar",
"output":[["StartTag","plaintext",{}], ["Character","foobar"]]},

{"description":"Open angled bracket in unquoted attribute value state",
"input":"<a a=f<>",
"output":[["StartTag", "a", {"a":"f<"}]],
"errors":[
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 7 }
]}

]}
275
lib/html5lib/tests/testdata/tokenizer/test2.test
vendored
Normal file
@ -0,0 +1,275 @@
{"tests": [

{"description":"DOCTYPE without name",
"input":"<!DOCTYPE>",
"output":[["DOCTYPE", null, null, null, false]],
"errors":[
{ "code": "missing-doctype-name", "line": 1, "col": 10 }
]},

{"description":"DOCTYPE without space before name",
"input":"<!DOCTYPEhtml>",
"output":[["DOCTYPE", "html", null, null, true]],
"errors":[
{ "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 }
]},

{"description":"Incorrect DOCTYPE without a space before name",
"input":"<!DOCTYPEfoo>",
"output":[["DOCTYPE", "foo", null, null, true]],
"errors":[
{ "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 }
]},

{"description":"DOCTYPE with publicId",
"input":"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML Transitional 4.01//EN\">",
"output":[["DOCTYPE", "html", "-//W3C//DTD HTML Transitional 4.01//EN", null, true]]},

{"description":"DOCTYPE with EOF after PUBLIC",
"input":"<!DOCTYPE html PUBLIC",
"output":[["DOCTYPE", "html", null, null, false]],
"errors": [
{ "code": "eof-in-doctype", "col": 22, "line": 1 }
]},

{"description":"DOCTYPE with EOF after PUBLIC '",
"input":"<!DOCTYPE html PUBLIC '",
"output":[["DOCTYPE", "html", "", null, false]],
"errors": [
{ "code": "eof-in-doctype", "col": 24, "line": 1 }
]},

{"description":"DOCTYPE with EOF after PUBLIC 'x",
"input":"<!DOCTYPE html PUBLIC 'x",
"output":[["DOCTYPE", "html", "x", null, false]],
"errors": [
{ "code": "eof-in-doctype", "col": 25, "line": 1 }
]},

{"description":"DOCTYPE with systemId",
"input":"<!DOCTYPE html SYSTEM \"-//W3C//DTD HTML Transitional 4.01//EN\">",
"output":[["DOCTYPE", "html", null, "-//W3C//DTD HTML Transitional 4.01//EN", true]]},

{"description":"DOCTYPE with single-quoted systemId",
"input":"<!DOCTYPE html SYSTEM '-//W3C//DTD HTML Transitional 4.01//EN'>",
"output":[["DOCTYPE", "html", null, "-//W3C//DTD HTML Transitional 4.01//EN", true]]},

{"description":"DOCTYPE with publicId and systemId",
"input":"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML Transitional 4.01//EN\" \"-//W3C//DTD HTML Transitional 4.01//EN\">",
"output":[["DOCTYPE", "html", "-//W3C//DTD HTML Transitional 4.01//EN", "-//W3C//DTD HTML Transitional 4.01//EN", true]]},

{"description":"DOCTYPE with > in double-quoted publicId",
"input":"<!DOCTYPE html PUBLIC \">x",
"output":[["DOCTYPE", "html", "", null, false], ["Character", "x"]],
"errors": [
{ "code": "abrupt-doctype-public-identifier", "col": 24, "line": 1 }
]},

{"description":"DOCTYPE with > in single-quoted publicId",
"input":"<!DOCTYPE html PUBLIC '>x",
"output":[["DOCTYPE", "html", "", null, false], ["Character", "x"]],
"errors": [
{ "code": "abrupt-doctype-public-identifier", "col": 24, "line": 1 }
]},

{"description":"DOCTYPE with > in double-quoted systemId",
"input":"<!DOCTYPE html PUBLIC \"foo\" \">x",
"output":[["DOCTYPE", "html", "foo", "", false], ["Character", "x"]],
"errors": [
{ "code": "abrupt-doctype-system-identifier", "col": 30, "line": 1 }
]},

{"description":"DOCTYPE with > in single-quoted systemId",
"input":"<!DOCTYPE html PUBLIC 'foo' '>x",
"output":[["DOCTYPE", "html", "foo", "", false], ["Character", "x"]],
"errors": [
{ "code": "abrupt-doctype-system-identifier", "col": 30, "line": 1 }
]},

{"description":"Incomplete doctype",
"input":"<!DOCTYPE html ",
"output":[["DOCTYPE", "html", null, null, false]],
"errors":[
{ "code": "eof-in-doctype", "line": 1, "col": 16 }
]},
|
||||
{"description":"Numeric entity representing the NUL character",
|
||||
"input":"�",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "null-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description":"Hexadecimal entity representing the NUL character",
|
||||
"input":"�",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "null-character-reference", "line": 1, "col": 9 }
|
||||
]},
|
||||
|
||||
{"description":"Numeric entity representing a codepoint after 1114111 (U+10FFFF)",
|
||||
"input":"�",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 11 }
|
||||
]},
|
||||
|
||||
{"description":"Hexadecimal entity representing a codepoint after 1114111 (U+10FFFF)",
|
||||
"input":"�",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 13 }
|
||||
]},
|
||||
|
||||
{"description":"Hexadecimal entity pair representing a surrogate pair",
|
||||
"input":"��",
|
||||
"output":[["Character", "\uFFFD\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "surrogate-character-reference", "line": 1, "col": 9 },
|
||||
{ "code": "surrogate-character-reference", "line": 1, "col": 17 }
|
||||
]},
|
||||
|
||||
{"description":"Hexadecimal entity with mixed uppercase and lowercase",
|
||||
"input":"ꯍ",
|
||||
"output":[["Character", "\uABCD"]]},
|
||||
|
||||
{"description":"Entity without a name",
|
||||
"input":"&;",
|
||||
"output":[["Character", "&;"]]},
|
||||
|
||||
{"description":"Unescaped ampersand in attribute value",
|
||||
"input":"<h a='&'>",
|
||||
"output":[["StartTag", "h", { "a":"&" }]]},
|
||||
|
||||
|
||||
{"description":"StartTag containing <",
|
||||
"input":"<a<b>",
|
||||
"output":[["StartTag", "a<b", { }]]},
|
||||
|
||||
{"description":"Non-void element containing trailing /",
|
||||
"input":"<h/>",
|
||||
"output":[["StartTag","h",{},true]]},
|
||||
|
||||
{"description":"Void element with permitted slash",
|
||||
"input":"<br/>",
|
||||
"output":[["StartTag","br",{},true]]},
|
||||
|
||||
{"description":"Void element with permitted slash (with attribute)",
|
||||
"input":"<br foo='bar'/>",
|
||||
"output":[["StartTag","br",{"foo":"bar"},true]]},
|
||||
|
||||
{"description":"StartTag containing /",
|
||||
"input":"<h/a='b'>",
|
||||
"output":[["StartTag", "h", { "a":"b" }]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-solidus-in-tag", "line": 1, "col": 4 }
|
||||
]},
|
||||
|
||||
{"description":"Double-quoted attribute value",
|
||||
"input":"<h a=\"b\">",
|
||||
"output":[["StartTag", "h", { "a":"b" }]]},
|
||||
|
||||
{"description":"Unescaped </",
|
||||
"input":"</",
|
||||
"output":[["Character", "</"]],
|
||||
"errors":[
|
||||
{ "code": "eof-before-tag-name", "line": 1, "col": 3 }
|
||||
]},
|
||||
|
||||
{"description":"Illegal end tag name",
|
||||
"input":"</1>",
|
||||
"output":[["Comment", "1"]],
|
||||
"errors":[
|
||||
{ "code": "invalid-first-character-of-tag-name", "line": 1, "col": 3 }
|
||||
]},
|
||||
|
||||
{"description":"Simili processing instruction",
|
||||
"input":"<?namespace>",
|
||||
"output":[["Comment", "?namespace"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
|
||||
]},
|
||||
|
||||
{"description":"A bogus comment stops at >, even if preceded by two dashes",
|
||||
"input":"<?foo-->",
|
||||
"output":[["Comment", "?foo--"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
|
||||
]},
|
||||
|
||||
{"description":"Unescaped <",
|
||||
"input":"foo < bar",
|
||||
"output":[["Character", "foo < bar"]],
|
||||
"errors":[
|
||||
{ "code": "invalid-first-character-of-tag-name", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"Null Byte Replacement",
|
||||
"input":"\u0000",
|
||||
"output":[["Character", "\u0000"]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 1 }
|
||||
]},
|
||||
|
||||
{"description":"Comment with dash",
|
||||
"input":"<!---x",
|
||||
"output":[["Comment", "-x"]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-comment", "line": 1, "col": 7 }
|
||||
]},
|
||||
|
||||
{"description":"Entity + newline",
|
||||
"input":"\nx\n&gt;\n",
|
||||
"output":[["Character","\nx\n>\n"]]},
|
||||
|
||||
{"description":"Start tag with no attributes but space before the greater-than sign",
|
||||
"input":"<h >",
|
||||
"output":[["StartTag", "h", {}]]},
|
||||
|
||||
{"description":"Empty attribute followed by uppercase attribute",
|
||||
"input":"<h a B=''>",
|
||||
"output":[["StartTag", "h", {"a":"", "b":""}]]},
|
||||
|
||||
{"description":"Double-quote after attribute name",
|
||||
"input":"<h a \">",
|
||||
"output":[["StartTag", "h", {"a":"", "\"":""}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"Single-quote after attribute name",
|
||||
"input":"<h a '>",
|
||||
"output":[["StartTag", "h", {"a":"", "'":""}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"Empty end tag with following characters",
|
||||
"input":"a</>bc",
|
||||
"output":[["Character", "abc"]],
|
||||
"errors":[
|
||||
{ "code": "missing-end-tag-name", "line": 1, "col": 4 }
|
||||
]},
|
||||
|
||||
{"description":"Empty end tag with following tag",
|
||||
"input":"a</><b>c",
|
||||
"output":[["Character", "a"], ["StartTag", "b", {}], ["Character", "c"]],
|
||||
"errors":[
|
||||
{ "code": "missing-end-tag-name", "line": 1, "col": 4 }
|
||||
]},
|
||||
|
||||
{"description":"Empty end tag with following comment",
|
||||
"input":"a</><!--b-->c",
|
||||
"output":[["Character", "a"], ["Comment", "b"], ["Character", "c"]],
|
||||
"errors":[
|
||||
{ "code": "missing-end-tag-name", "line": 1, "col": 4 }
|
||||
]},
|
||||
|
||||
{"description":"Empty end tag with following end tag",
|
||||
"input":"a</></b>c",
|
||||
"output":[["Character", "a"], ["EndTag", "b"], ["Character", "c"]],
|
||||
"errors":[
|
||||
{ "code": "missing-end-tag-name", "line": 1, "col": 4 }
|
||||
]}
|
||||
|
||||
]}
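The numeric character reference cases above (NUL, values above U+10FFFF, surrogates) all expect U+FFFD plus a specific error code. The following is a minimal sketch of that mapping, not html5lib's implementation: `decode_numeric_reference` is a hypothetical helper that only accepts a complete, semicolon-terminated reference, and it leaves out the remapping of C1 control code points.

```python
import re

REPLACEMENT = "\uFFFD"

# Hypothetical helper pattern: "&#" + decimal digits + ";" or "&#x" + hex digits + ";".
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_reference(ref):
    """Return (character, error_code_or_None) for one complete numeric reference."""
    match = _NUMERIC_REF.fullmatch(ref)
    if match is None:
        raise ValueError("not a complete numeric character reference: %r" % ref)
    hex_digits, dec_digits = match.groups()
    code_point = int(hex_digits, 16) if hex_digits else int(dec_digits, 10)

    if code_point == 0:
        return REPLACEMENT, "null-character-reference"
    if code_point > 0x10FFFF:
        return REPLACEMENT, "character-reference-outside-unicode-range"
    if 0xD800 <= code_point <= 0xDFFF:
        return REPLACEMENT, "surrogate-character-reference"
    # Noncharacters are reported but still returned unchanged.
    if 0xFDD0 <= code_point <= 0xFDEF or (code_point & 0xFFFE) == 0xFFFE:
        return chr(code_point), "noncharacter-character-reference"
    return chr(code_point), None
```

Python integers are unbounded, so very long digit runs need no special overflow handling in this sketch; an implementation with fixed-width integers would have to saturate instead.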
|
11233
lib/html5lib/tests/testdata/tokenizer/test3.test
vendored
Normal file
File diff suppressed because it is too large
532
lib/html5lib/tests/testdata/tokenizer/test4.test
vendored
Normal file
|
@@ -0,0 +1,532 @@
|
|||
{"tests": [
|
||||
|
||||
{"description":"< in attribute name",
|
||||
"input":"<z/0 <>",
|
||||
"output":[["StartTag", "z", {"0": "", "<": ""}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-solidus-in-tag", "line": 1, "col": 4 },
|
||||
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 7 }
|
||||
]},
|
||||
|
||||
{"description":"< in unquoted attribute value",
|
||||
"input":"<z x=<>",
|
||||
"output":[["StartTag", "z", {"x": "<"}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"= in unquoted attribute value",
|
||||
"input":"<z z=z=z>",
|
||||
"output":[["StartTag", "z", {"z": "z=z"}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 7 }
|
||||
]},
|
||||
|
||||
{"description":"= attribute",
|
||||
"input":"<z =>",
|
||||
"output":[["StartTag", "z", {"=": ""}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-equals-sign-before-attribute-name", "line": 1, "col": 4 }
|
||||
]},
|
||||
|
||||
{"description":"== attribute",
|
||||
"input":"<z ==>",
|
||||
"output":[["StartTag", "z", {"=": ""}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-equals-sign-before-attribute-name", "line": 1, "col": 4 },
|
||||
{ "code": "missing-attribute-value", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"=== attribute",
|
||||
"input":"<z ===>",
|
||||
"output":[["StartTag", "z", {"=": "="}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-equals-sign-before-attribute-name", "line": 1, "col": 4 },
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"==== attribute",
|
||||
"input":"<z ====>",
|
||||
"output":[["StartTag", "z", {"=": "=="}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-equals-sign-before-attribute-name", "line": 1, "col": 4 },
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 6 },
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 7 }
|
||||
]},
|
||||
|
||||
{"description":"\" after ampersand in double-quoted attribute value",
|
||||
"input":"<z z=\"&\">",
|
||||
"output":[["StartTag", "z", {"z": "&"}]]},
|
||||
|
||||
{"description":"' after ampersand in double-quoted attribute value",
|
||||
"input":"<z z=\"&'\">",
|
||||
"output":[["StartTag", "z", {"z": "&'"}]]},
|
||||
|
||||
{"description":"' after ampersand in single-quoted attribute value",
|
||||
"input":"<z z='&'>",
|
||||
"output":[["StartTag", "z", {"z": "&"}]]},
|
||||
|
||||
{"description":"\" after ampersand in single-quoted attribute value",
|
||||
"input":"<z z='&\"'>",
|
||||
"output":[["StartTag", "z", {"z": "&\""}]]},
|
||||
|
||||
{"description":"Text after bogus character reference",
|
||||
"input":"<z z='&xlink_xmlns;'>bar<z>",
|
||||
"output":[["StartTag","z",{"z":"&xlink_xmlns;"}],["Character","bar"],["StartTag","z",{}]]},
|
||||
|
||||
{"description":"Text after hex character reference",
|
||||
"input":"<z z='&#x0020; foo'>bar<z>",
|
||||
"output":[["StartTag","z",{"z":"  foo"}],["Character","bar"],["StartTag","z",{}]]},
|
||||
|
||||
{"description":"Attribute name starting with \"",
|
||||
"input":"<foo \"='bar'>",
|
||||
"output":[["StartTag", "foo", {"\"": "bar"}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"Attribute name starting with '",
|
||||
"input":"<foo '='bar'>",
|
||||
"output":[["StartTag", "foo", {"'": "bar"}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"Attribute name containing \"",
|
||||
"input":"<foo a\"b='bar'>",
|
||||
"output":[["StartTag", "foo", {"a\"b": "bar"}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 7 }
|
||||
]},
|
||||
|
||||
{"description":"Attribute name containing '",
|
||||
"input":"<foo a'b='bar'>",
|
||||
"output":[["StartTag", "foo", {"a'b": "bar"}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-attribute-name", "line": 1, "col": 7 }
|
||||
]},
|
||||
|
||||
{"description":"Unquoted attribute value containing '",
|
||||
"input":"<foo a=b'c>",
|
||||
"output":[["StartTag", "foo", {"a": "b'c"}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 9 }
|
||||
]},
|
||||
|
||||
|
||||
{"description":"Unquoted attribute value containing \"",
|
||||
"input":"<foo a=b\"c>",
|
||||
"output":[["StartTag", "foo", {"a": "b\"c"}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 9 }
|
||||
]},
|
||||
|
||||
{"description":"Double-quoted attribute value not followed by whitespace",
|
||||
"input":"<foo a=\"b\"c>",
|
||||
"output":[["StartTag", "foo", {"a": "b", "c": ""}]],
|
||||
"errors":[
|
||||
{ "code": "missing-whitespace-between-attributes", "line": 1, "col": 11 }
|
||||
]},
|
||||
|
||||
{"description":"Single-quoted attribute value not followed by whitespace",
|
||||
"input":"<foo a='b'c>",
|
||||
"output":[["StartTag", "foo", {"a": "b", "c": ""}]],
|
||||
"errors":[
|
||||
{ "code": "missing-whitespace-between-attributes", "line": 1, "col": 11 }
|
||||
]},
|
||||
|
||||
{"description":"Quoted attribute followed by permitted /",
|
||||
"input":"<br a='b'/>",
|
||||
"output":[["StartTag","br",{"a":"b"},true]]},
|
||||
|
||||
{"description":"Quoted attribute followed by non-permitted /",
|
||||
"input":"<bar a='b'/>",
|
||||
"output":[["StartTag","bar",{"a":"b"},true]]},
|
||||
|
||||
{"description":"CR EOF after doctype name",
|
||||
"input":"<!doctype html \r",
|
||||
"output":[["DOCTYPE", "html", null, null, false]],
|
||||
"errors":[
|
||||
{ "code": "eof-in-doctype", "line": 2, "col": 1 }
|
||||
]},
|
||||
|
||||
{"description":"CR EOF in tag name",
|
||||
"input":"<z\r",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 2, "col": 1 }
|
||||
]},
|
||||
|
||||
{"description":"Slash EOF in tag name",
|
||||
"input":"<z/",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 4 }
|
||||
]},
|
||||
|
||||
{"description":"Zero hex numeric entity",
|
||||
"input":"&#x0",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 },
|
||||
{ "code": "null-character-reference", "line": 1, "col": 5 }
|
||||
]},
|
||||
|
||||
{"description":"Zero decimal numeric entity",
|
||||
"input":"&#0",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 4 },
|
||||
{ "code": "null-character-reference", "line": 1, "col": 4 }
|
||||
]},
|
||||
|
||||
{"description":"Zero-prefixed hex numeric entity",
|
||||
"input":"&#x000041;",
|
||||
"output":[["Character", "A"]]},
|
||||
|
||||
{"description":"Zero-prefixed decimal numeric entity",
|
||||
"input":"&#000065;",
|
||||
"output":[["Character", "A"]]},
|
||||
|
||||
{"description":"Empty hex numeric entities",
|
||||
"input":"&#x &#X ",
|
||||
"output":[["Character", "&#x &#X "]],
|
||||
"errors":[
|
||||
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 4 },
|
||||
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description":"Invalid digit in hex numeric entity",
|
||||
"input":"&#xZ",
|
||||
"output":[["Character", "&#xZ"]],
|
||||
"errors":[
|
||||
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 4 }
|
||||
]},
|
||||
|
||||
{"description":"Empty decimal numeric entities",
|
||||
"input":"&# &#; ",
|
||||
"output":[["Character", "&# &#; "]],
|
||||
"errors":[
|
||||
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 3 },
|
||||
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"Invalid digit in decimal numeric entity",
|
||||
"input":"&#A",
|
||||
"output":[["Character", "&#A"]],
|
||||
"errors":[
|
||||
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 3 }
|
||||
]},
|
||||
|
||||
{"description":"Non-BMP numeric entity",
|
||||
"input":"&#x10000;",
|
||||
"output":[["Character", "\uD800\uDC00"]]},
|
||||
|
||||
{"description":"Maximum non-BMP numeric entity",
|
||||
"input":"&#X10FFFF;",
|
||||
"output":[["Character", "\uDBFF\uDFFF"]],
|
||||
"errors":[
|
||||
{ "code": "noncharacter-character-reference", "line": 1, "col": 11 }
|
||||
]},
|
||||
|
||||
|
||||
{"description":"Above maximum numeric entity",
|
||||
"input":"&#x110000;",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 11 }
|
||||
]},
|
||||
|
||||
{"description":"32-bit hex numeric entity",
|
||||
"input":"&#x80000041;",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 13 }
|
||||
]},
|
||||
|
||||
{"description":"33-bit hex numeric entity",
|
||||
"input":"&#x100000041;",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 14 }
|
||||
]},
|
||||
|
||||
{"description":"33-bit decimal numeric entity",
|
||||
"input":"&#4294967361;",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 14 }
|
||||
]},
|
||||
|
||||
{"description":"65-bit hex numeric entity",
|
||||
"input":"&#x10000000000000041;",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 22 }
|
||||
]},
|
||||
|
||||
{"description":"65-bit decimal numeric entity",
|
||||
"input":"&#18446744073709551681;",
|
||||
"output":[["Character", "\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "character-reference-outside-unicode-range", "line": 1, "col": 24 }
|
||||
]},
|
||||
|
||||
{"description":"Surrogate code point edge cases",
|
||||
"input":"&#xD7FF;&#xD800;&#xD801;&#xDFFE;&#xDFFF;&#xE000;",
|
||||
"output":[["Character", "\uD7FF\uFFFD\uFFFD\uFFFD\uFFFD\uE000"]],
|
||||
"errors":[
|
||||
{ "code": "surrogate-character-reference", "line": 1, "col": 17 },
|
||||
{ "code": "surrogate-character-reference", "line": 1, "col": 25 },
|
||||
{ "code": "surrogate-character-reference", "line": 1, "col": 33 },
|
||||
{ "code": "surrogate-character-reference", "line": 1, "col": 41 }
|
||||
]},
|
||||
|
||||
{"description":"Uppercase start tag name",
|
||||
"input":"<X>",
|
||||
"output":[["StartTag", "x", {}]]},
|
||||
|
||||
{"description":"Uppercase end tag name",
|
||||
"input":"</X>",
|
||||
"output":[["EndTag", "x"]]},
|
||||
|
||||
{"description":"Uppercase attribute name",
|
||||
"input":"<x X>",
|
||||
"output":[["StartTag", "x", { "x":"" }]]},
|
||||
|
||||
{"description":"Tag/attribute name case edge values",
|
||||
"input":"<x@AZ[`az{ @AZ[`az{>",
|
||||
"output":[["StartTag", "x@az[`az{", { "@az[`az{":"" }]]},
|
||||
|
||||
{"description":"Duplicate different-case attributes",
|
||||
"input":"<x x=1 x=2 X=3>",
|
||||
"output":[["StartTag", "x", { "x":"1" }]],
|
||||
"errors":[
|
||||
{ "code": "duplicate-attribute", "line": 1, "col": 9 },
|
||||
{ "code": "duplicate-attribute", "line": 1, "col": 13 }
|
||||
]},
|
||||
|
||||
{"description":"Uppercase close tag attributes",
|
||||
"input":"</x X>",
|
||||
"output":[["EndTag", "x"]],
|
||||
"errors":[
|
||||
{ "code": "end-tag-with-attributes", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"Duplicate close tag attributes",
|
||||
"input":"</x x x>",
|
||||
"output":[["EndTag", "x"]],
|
||||
"errors":[
|
||||
{ "code": "duplicate-attribute", "line": 1, "col": 8 },
|
||||
{ "code": "end-tag-with-attributes", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description":"Permitted slash",
|
||||
"input":"<br/>",
|
||||
"output":[["StartTag","br",{},true]]},
|
||||
|
||||
{"description":"Non-permitted slash",
|
||||
"input":"<xr/>",
|
||||
"output":[["StartTag","xr",{},true]]},
|
||||
|
||||
{"description":"Permitted slash but in close tag",
|
||||
"input":"</br/>",
|
||||
"output":[["EndTag", "br"]],
|
||||
"errors":[
|
||||
{ "code": "end-tag-with-trailing-solidus", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"Doctype public case-sensitivity (1)",
|
||||
"input":"<!DoCtYpE HtMl PuBlIc \"AbC\" \"XyZ\">",
|
||||
"output":[["DOCTYPE", "html", "AbC", "XyZ", true]]},
|
||||
|
||||
{"description":"Doctype public case-sensitivity (2)",
|
||||
"input":"<!dOcTyPe hTmL pUbLiC \"aBc\" \"xYz\">",
|
||||
"output":[["DOCTYPE", "html", "aBc", "xYz", true]]},
|
||||
|
||||
{"description":"Doctype system case-sensitivity (1)",
|
||||
"input":"<!DoCtYpE HtMl SyStEm \"XyZ\">",
|
||||
"output":[["DOCTYPE", "html", null, "XyZ", true]]},
|
||||
|
||||
{"description":"Doctype system case-sensitivity (2)",
|
||||
"input":"<!dOcTyPe hTmL sYsTeM \"xYz\">",
|
||||
"output":[["DOCTYPE", "html", null, "xYz", true]]},
|
||||
|
||||
{"description":"U+0000 in lookahead region after non-matching character",
|
||||
"input":"<!doc>\u0000",
|
||||
"output":[["Comment", "doc"], ["Character", "\u0000"]],
|
||||
"errors":[
|
||||
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 7 }
|
||||
]},
|
||||
|
||||
{"description":"U+0000 in lookahead region",
|
||||
"input":"<!doc\u0000",
|
||||
"output":[["Comment", "doc\uFFFD"]],
|
||||
"errors":[
|
||||
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
|
||||
{ "code": "unexpected-null-character", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"U+0080 in lookahead region",
|
||||
"input":"<!doc\u0080",
|
||||
"output":[["Comment", "doc\u0080"]],
|
||||
"errors":[
|
||||
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
|
||||
{ "code": "control-character-in-input-stream", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"U+FDD1 in lookahead region",
|
||||
"input":"<!doc\uFDD1",
|
||||
"output":[["Comment", "doc\uFDD1"]],
|
||||
"errors":[
|
||||
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
|
||||
{ "code": "noncharacter-in-input-stream", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"U+1FFFF in lookahead region",
|
||||
"input":"<!doc\uD83F\uDFFF",
|
||||
"output":[["Comment", "doc\uD83F\uDFFF"]],
|
||||
"errors":[
|
||||
{ "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
|
||||
{ "code": "noncharacter-in-input-stream", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"CR followed by non-LF",
|
||||
"input":"\r?",
|
||||
"output":[["Character", "\n?"]]},
|
||||
|
||||
{"description":"CR at EOF",
|
||||
"input":"\r",
|
||||
"output":[["Character", "\n"]]},
|
||||
|
||||
{"description":"LF at EOF",
|
||||
"input":"\n",
|
||||
"output":[["Character", "\n"]]},
|
||||
|
||||
{"description":"CR LF",
|
||||
"input":"\r\n",
|
||||
"output":[["Character", "\n"]]},
|
||||
|
||||
{"description":"CR CR",
|
||||
"input":"\r\r",
|
||||
"output":[["Character", "\n\n"]]},
|
||||
|
||||
{"description":"LF LF",
|
||||
"input":"\n\n",
|
||||
"output":[["Character", "\n\n"]]},
|
||||
|
||||
{"description":"LF CR",
|
||||
"input":"\n\r",
|
||||
"output":[["Character", "\n\n"]]},
|
||||
|
||||
{"description":"text CR CR CR text",
|
||||
"input":"text\r\r\rtext",
|
||||
"output":[["Character", "text\n\n\ntext"]]},
|
||||
|
||||
{"description":"Doctype publik",
|
||||
"input":"<!DOCTYPE html PUBLIK \"AbC\" \"XyZ\">",
|
||||
"output":[["DOCTYPE", "html", null, null, false]],
|
||||
"errors":[
|
||||
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
|
||||
]},
|
||||
|
||||
{"description":"Doctype publi",
|
||||
"input":"<!DOCTYPE html PUBLI",
|
||||
"output":[["DOCTYPE", "html", null, null, false]],
|
||||
"errors":[
|
||||
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
|
||||
]},
|
||||
|
||||
{"description":"Doctype sistem",
|
||||
"input":"<!DOCTYPE html SISTEM \"AbC\">",
|
||||
"output":[["DOCTYPE", "html", null, null, false]],
|
||||
"errors":[
|
||||
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
|
||||
]},
|
||||
|
||||
{"description":"Doctype sys",
|
||||
"input":"<!DOCTYPE html SYS",
|
||||
"output":[["DOCTYPE", "html", null, null, false]],
|
||||
"errors":[
|
||||
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
|
||||
]},
|
||||
|
||||
{"description":"Doctype html x>text",
|
||||
"input":"<!DOCTYPE html x>text",
|
||||
"output":[["DOCTYPE", "html", null, null, false], ["Character", "text"]],
|
||||
"errors":[
|
||||
{ "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 16 }
|
||||
]},
|
||||
|
||||
{"description":"Grave accent in unquoted attribute",
|
||||
"input":"<a a=aa`>",
|
||||
"output":[["StartTag", "a", {"a":"aa`"}]],
|
||||
"errors":[
|
||||
{ "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description":"EOF in tag name state ",
|
||||
"input":"<a",
|
||||
"output":[],
|
||||
"errors": [
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 3 }
|
||||
]},
|
||||
|
||||
{"description":"EOF in before attribute name state",
|
||||
"input":"<a ",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 4 }
|
||||
]},
|
||||
|
||||
{"description":"EOF in attribute name state",
|
||||
"input":"<a a",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 5 }
|
||||
]},
|
||||
|
||||
{"description":"EOF in after attribute name state",
|
||||
"input":"<a a ",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 6 }
|
||||
]},
|
||||
|
||||
{"description":"EOF in before attribute value state",
|
||||
"input":"<a a =",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 7 }
|
||||
]},
|
||||
|
||||
{"description":"EOF in attribute value (double quoted) state",
|
||||
"input":"<a a =\"a",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 9 }
|
||||
]},
|
||||
|
||||
{"description":"EOF in attribute value (single quoted) state",
|
||||
"input":"<a a ='a",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 9 }
|
||||
]},
|
||||
|
||||
{"description":"EOF in attribute value (unquoted) state",
|
||||
"input":"<a a =a",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 8 }
|
||||
]},
|
||||
|
||||
{"description":"EOF in after attribute value state",
|
||||
"input":"<a a ='a'",
|
||||
"output":[],
|
||||
"errors":[
|
||||
{ "code": "eof-in-tag", "line": 1, "col": 10 }
|
||||
]}
|
||||
|
||||
]}
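The CR/LF cases above expect every CR LF pair and every lone CR to reach the tokenizer as a single LF, and the EOF error positions appear to be counted on the normalized text. The sketch below illustrates both points under that assumption; `normalize_newlines` and `eof_position` are hypothetical helper names, not part of html5lib.

```python
def normalize_newlines(text):
    """Collapse CR LF pairs first, then turn any remaining lone CR into LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def eof_position(text):
    """Return the one-based (line, col) just past the end of the normalized input."""
    normalized = normalize_newlines(text)
    line = normalized.count("\n") + 1
    # Characters after the last LF, or the whole string when there is no LF.
    tail_length = len(normalized) - (normalized.rfind("\n") + 1)
    return line, tail_length + 1
```

Replacing CR LF before lone CR matters: doing it in the opposite order would turn `\r\n` into two line feeds.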
|
1577
lib/html5lib/tests/testdata/tokenizer/unicodeChars.test
vendored
Normal file
File diff suppressed because it is too large
41
lib/html5lib/tests/testdata/tokenizer/unicodeCharsProblematic.test
vendored
Normal file
@@ -0,0 +1,41 @@
{"tests" : [
{"description": "Invalid Unicode character U+DFFF",
"doubleEscaped":true,
"input": "\\uDFFF",
"output":[["Character", "\\uDFFF"]],
"errors":[
{ "code": "surrogate-in-input-stream", "line": 1, "col": 1 }
]},

{"description": "Invalid Unicode character U+D800",
"doubleEscaped":true,
"input": "\\uD800",
"output":[["Character", "\\uD800"]],
"errors":[
{ "code": "surrogate-in-input-stream", "line": 1, "col": 1 }
]},

{"description": "Invalid Unicode character U+DFFF with valid preceding character",
"doubleEscaped":true,
"input": "a\\uDFFF",
"output":[["Character", "a\\uDFFF"]],
"errors":[
{ "code": "surrogate-in-input-stream", "line": 1, "col": 2 }
]},

{"description": "Invalid Unicode character U+D800 with valid following character",
"doubleEscaped":true,
"input": "\\uD800a",
"output":[["Character", "\\uD800a"]],
"errors":[
{ "code": "surrogate-in-input-stream", "line": 1, "col": 1 }
]},

{"description":"CR followed by U+0000",
"input":"\r\u0000",
"output":[["Character", "\n\u0000"]],
"errors":[
{ "code": "unexpected-null-character", "line": 2, "col": 1 }
]}
]
}
|
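The `doubleEscaped` tests above keep lone surrogates out of the JSON text by escaping the backslash, so a harness has to apply a second round of unescaping after the JSON import. A minimal sketch of that step follows. `unescape_once` and `prepare_test` are hypothetical names, only `\uXXXX` escapes are handled, and only top-level strings inside output tokens are unescaped (attribute dictionaries are left untouched).

```python
import re

# Matches a literal backslash-u escape such as \uDFFF left over after the JSON import.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_once(value):
    """Replace each literal \\uXXXX sequence with the code point it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


def prepare_test(test):
    """Return a copy of a parsed test with doubleEscaped strings unescaped."""
    if not test.get("doubleEscaped"):
        return test
    prepared = dict(test)
    prepared["input"] = unescape_once(test["input"])
    prepared["output"] = [
        [unescape_once(part) if isinstance(part, str) else part for part in token]
        for token in test["output"]
    ]
    return prepared
```

Python strings may hold lone surrogates, which is why the result can be compared directly; languages with strict UTF-16 or UTF-8 string types would need a different representation.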
20
lib/html5lib/tests/testdata/tokenizer/xmlViolation.test
vendored
Normal file
@@ -0,0 +1,20 @@
{"xmlViolationTests": [

{"description":"Non-XML character",
"input":"a\uFFFFb",
"output":[["Character","a\uFFFDb"]]},

{"description":"Non-XML space",
"input":"a\u000Cb",
"output":[["Character","a b"]]},

{"description":"Double hyphen in comment",
"input":"<!-- foo -- bar -->",
"output":[["Comment"," foo - - bar "]]},

{"description":"FF between attributes",
"input":"<a b=''\u000Cc=''>",
"output":[["StartTag","a",{"b":"","c":""}]]}
]}