
add some big improvements to parse error messages

change the parser to not attempt to parse the elements following
'from' unless there is an actual 'from'
improve the symbol parser to try to deal with issues when symbols are
  next to each other with no intervening whitespace
improve number literal parsing to fail if there are trailing letters
  or digits which aren't part of the number and aren't separated with
  whitespace
add some code to start analysing the quality of parse error messages
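
As a rough illustration of the number literal change described above (not the code from this commit, and not in the project's own language), the sketch below rejects a literal that runs straight into a letter, digit, underscore or dot. The names `parse_number` and `_NUMBER` and the exact literal grammar are assumptions made for the example.

```python
import re

# Assumed literal grammar: digits with an optional fraction, or a leading-dot
# fraction, followed by an optional exponent. The real parser may differ.
_NUMBER = re.compile(r"(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def parse_number(text, pos=0):
    """Return (literal, new_pos), or raise ValueError when the literal is
    immediately followed by a character that should have been separated
    from it by whitespace or a symbol."""
    m = _NUMBER.match(text, pos)
    if m is None:
        raise ValueError(f"expected number at position {pos}")
    end = m.end()
    # Fail instead of silently ending the token early, so that '1e2e3'
    # is an error and is not read as '1e2' followed by the alias 'e3'.
    if end < len(text) and (text[end].isalnum() or text[end] in "_."):
        raise ValueError(
            f"unexpected {text[end]!r} after number literal at position {end}"
        )
    return m.group(0), end
```

With this check, `parse_number("1e2 e3")` accepts `1e2` and stops at the space, while `parse_number("1e2e3")` raises an error pointing at the second `e`, which gives a more useful message than a failure reported later in the select list.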
Jake Wheat 2014-04-17 18:32:41 +03:00
parent c48b057457
commit 488310ff6a
7 changed files with 298 additions and 41 deletions

TODO

@ -1,4 +1,5 @@
continue 2003 review and tests
docs: how to run the tests
touch up the expr hack as best as can
left factor as much as possible (see below on notes)
table expression in syntax:
@ -14,11 +15,41 @@ look at fixing the expression parsing completely
represent natural and using/on in the syntax closer to the
concrete syntax - don't combine in the ast
review the token parsers, and make sure they have trailing delimiters
or consume bad trailing characters and fail (e.g. 1e2e3 in a select
list parses as '1e2 e3' i.e. '1e2 as e3')
split the general symbol and operator parsing, and make it tighter
in terms of when the symbol or operator ends (don't allow to end
early)
approach: review the lexical syntax, create complete list of
tokens/token generators. Divide into tokens which must be followed
by some particular other token or at least one whitespace, and ones
which can be immediately followed by another token. Then fix the
lexing parsers to work this way
whitespace/comments
integers
numbers
string literals
keywords
operator symbols <>=+=^%/*!|~&
non operator symbols ()?,;"'
identifiers
quoted identifiers
identifiers and keywords are ok for now
there are issues with integers, numbers, operators and non operator
symbols
review places in the parser which should allow only a fixed set of
identifiers (e.g. in interval literals)
decide whether to represent numeric literals better, instead of a
single string - break up into parts, or parse to a Decimal or
something
rough SQL 2003 todo, including tests to write:
can multipart identifiers have whitespace around the '.'?
multipart string literals
national, unicode, hex, bit string literals, escapes
string literal character sets
@ -92,7 +123,7 @@ additional stuff review:
interval stuff
collate clause
aggregate functions: lots of missing bits
complete list of keywords/reserved keywords
review areas where this parser is too permissive, e.g. value