
add some big improvements to parse error messages

change the parser to not attempt to parse the elements following
'from' unless there is an actual 'from'
improve the symbol parser to try to deal with issues when symbols are
  next to each other with no intervening whitespace
improve number literal parsing to fail if there are trailing letters
  or digits which aren't part of the number and aren't separated with
  whitespace
add some code to start analysing the quality of parse error messages
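The number-literal rule above can be sketched with a minimal, hand-rolled lexer fragment; this is an illustration of the trailing-character check only, not the project's actual Parsec code, and it ignores exponents and decimals:

```haskell
import Data.Char (isDigit, isAlphaNum)

-- Read a run of digits, then fail if the next character is a letter
-- or digit that would otherwise silently merge with the literal
-- (e.g. "1a"), instead of succeeding on a prefix.
lexNumber :: String -> Either String (String, String)
lexNumber s = case span isDigit s of
    ("", _) -> Left "expected digit"
    (_, c:_) | isAlphaNum c ->
        Left ("unexpected '" ++ [c] ++ "' after number literal")
    (num, rest) -> Right (num, rest)

main :: IO ()
main = do
    print (lexNumber "10 + 2")  -- Right ("10"," + 2")
    print (lexNumber "1a")      -- Left "unexpected 'a' after number literal"
```

In the real parser the same effect would come from something like Parsec's `notFollowedBy alphaNum` after the literal.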
This commit is contained in:
Jake Wheat 2014-04-17 18:32:41 +03:00
parent c48b057457
commit 488310ff6a
7 changed files with 298 additions and 41 deletions


@ -0,0 +1,149 @@
Want to work on the error messages. Ultimately, parsec won't give the
best error message for a parser combinator library in haskell. Should
check out the alternatives such as polyparse and uu-parsing.
For now the plan is to try to get the best out of parsec. Skip heavy
work on this until the parser is more left-factored?
Ideas:
1. generate large lists of invalid syntax
2. create a table of the sql source and the error message
3. save these tables and compare from version to version. Want to
catch improvements and regressions and investigate. Have to do this
manually
= generating bad sql source
take good sql statements or expressions. Convert them into sequences
of tokens - want to preserve the whitespace and comments perfectly
here. Then modify these lists by either adding a token, removing a
token, or modifying a token (including creating bad tokens of raw
strings which don't represent anything that can be tokenized).
Now can see the error message for all of these bad strings. Probably
have to generate and prune this list manually in stages since there
will be too many.
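The add/remove mutation scheme above can be sketched over a plain token list; the junk token "@@" and the function names here are illustrative choices, not part of the library:

```haskell
-- Sketch of generating bad SQL by mutating a token list, assuming
-- tokens are kept as plain strings (the real tokenizer would also
-- preserve whitespace and comments exactly).
deletions, insertions :: [String] -> [[String]]
-- every variant with one token removed
deletions ts = [take i ts ++ drop (i + 1) ts | i <- [0 .. length ts - 1]]
-- every variant with a junk token inserted at each position
insertions ts = [take i ts ++ ["@@"] ++ drop i ts | i <- [0 .. length ts]]

mutations :: [String] -> [String]
mutations ts = map unwords (deletions ts ++ insertions ts)

main :: IO ()
main = mapM_ putStrLn (mutations ["select", "a", "from", "t"])
```

Even a four-token statement yields nine mutants, which is why the list has to be pruned manually in stages.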
Contexts:
another area to focus on is contexts: for instance, we have a set of
e.g. 1000 bad scalar expressions with error messages. Now can put
those bad scalar expressions into various contexts and see that the
error messages are still good.
plan:
1. create a list of all the value expressions, with some variations for
each
2. manually create some error variations for each expression
3. create a renderer which will create a csv of the expressions and
the errors
this is to load as a spreadsheet to investigate more
4. create a renderer for the csv which will create a markdown file for
the website. this is to demonstrate the error messages in the
documentation
Then create some contexts for all of these: inside another value
expression, or inside a query expression. Do the same: render and
review the error messages.
Then, create some query expressions to focus on the non value
expression parts.
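Step 3's renderer could be as simple as quoting each (source, error) pair into CSV rows; `toCsv` and `quoteCsv` are hypothetical helpers, not part of simple-sql-parser:

```haskell
import Data.List (intercalate)

-- Hypothetical step-3 helper: render (sql, error message) pairs as
-- CSV so they can be loaded into a spreadsheet for review.
quoteCsv :: String -> String
quoteCsv s = "\"" ++ concatMap esc s ++ "\""
  where esc '"' = "\"\""   -- CSV doubles embedded quotes
        esc c   = [c]

toCsv :: [(String, String)] -> String
toCsv rows = unlines [intercalate "," [quoteCsv a, quoteCsv b] | (a, b) <- rows]

main :: IO ()
main = putStr (toCsv [("1a", "unexpected 'a'"), ("'bad", "unterminated string")])
```

Quoting every field keeps multi-line error messages intact inside a single CSV cell.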
> module Language.SQL.SimpleSQL.ErrorMessages where
> import Language.SQL.SimpleSQL.Parser
> import Data.List
> import Text.Groom
> valueExpressions :: [String]
> valueExpressions =
> ["10.."
> ,"..10"
> ,"10e1e2"
> ,"10e--3"
> ,"1a"
> ,"1%"
> ,"'b'ad'"
> ,"'bad"
> ,"bad'"
> ,"interval '5' ays"
> ,"interval '5' days (4.4)"
> ,"interval '5' days (a)"
> ,"intervala '5' days"
> ,"interval 'x' days (3"
> ,"interval 'x' days 3)"
> ,"1badiden"
> ,"$"
> ,"!"
> ,"*.a"
> ,"??"
> ,"3?"
> ,"?a"
> ,"row"
> ,"row 1,2"
> ,"row(1,2"
> ,"row 1,2)"
> ,"row(1 2)"
> ,"f("
> ,"f)"
> ,"f(a"
> ,"f a)"
> ,"f(a b)"
TODO:
case
operators
> ,"a + (b + c"
casts
subqueries: + whole set of parentheses use
in list
'keyword' functions
aggregates
window functions
> ]
> queryExpressions :: [String]
> queryExpressions =
> map sl1 valueExpressions
> ++ map sl2 valueExpressions
> ++ map sl3 valueExpressions
> ++
> ["select a from t inner jin u"]
> where
> sl1 x = "select " ++ x ++ " from t"
> sl2 x = "select " ++ x ++ ", y from t"
> sl3 x = "select " ++ x ++ " fom t"
> valExprs :: [String] -> [(String,String)]
> valExprs = map parseOne
> where
> parseOne x = let p = parseValueExpr "" Nothing x
> in (x,either peFormattedError (\ast -> "ERROR: parsed ok " ++ groom ast) p)
> queryExprs :: [String] -> [(String,String)]
> queryExprs = map parseOne
> where
> parseOne x = let p = parseQueryExpr "" Nothing x
> in (x,either peFormattedError (\ast -> "ERROR: parsed ok " ++ groom ast) p)
> pExprs :: [String] -> [String] -> String
> pExprs x y =
> let l = valExprs x ++ queryExprs y
> in intercalate "\n\n\n\n" $ map (\(a,b) -> a ++ "\n" ++ b) l
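For step 4, the same pairs could be rendered as a markdown table for the website; `toMarkdown` is a hypothetical helper, and unlike `peFormattedError`'s multi-line output, this sketch assumes single-line messages:

```haskell
-- Hypothetical step-4 helper: render (sql, error) pairs as a markdown
-- table. Real error messages are multi-line and would need escaping.
toMarkdown :: [(String, String)] -> String
toMarkdown rows = unlines $
    "| sql | error |"
    : "| --- | ----- |"
    : ["| " ++ code a ++ " | " ++ code b ++ " |" | (a, b) <- rows]
  where code s = "`" ++ s ++ "`"

main :: IO ()
main = putStr (toMarkdown [("1a", "expected digit"), ("f(", "unexpected end of input")])
```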


@ -264,6 +264,8 @@ select page reference
> ,"SELECT 2+2;"
> ,"SELECT distributors.* WHERE distributors.name = 'Westward';"
> -- simple-sql-parser doesn't support where without from
> -- this can be added for the postgres dialect when it is written
> --,"SELECT distributors.* WHERE distributors.name = 'Westward';"
> ]


@ -8,18 +8,46 @@ We are only interested in the query syntax, goes through sections 5-10
The goal is to create some coverage tests to get close to supporting a
large amount of the SQL.
-> module Language.SQL.SimpleSQL.SQL2003 where
+> module Language.SQL.SimpleSQL.SQL2003 (sql2003Tests) where
> import Language.SQL.SimpleSQL.TestTypes
> import Language.SQL.SimpleSQL.Syntax
> sql2003Tests :: TestItem
> sql2003Tests = Group "sql2003Tests"
> [stringLiterals
> ,nationalCharacterStringLiterals
> ,unicodeStringLiterals
> ,binaryStringLiterals
> ,numericLiterals
> ,dateAndTimeLiterals
> ,booleanLiterals
> ,identifiers
> ,typeNames
> ,parenthesizedValueExpression
> ,targetSpecification
> ,contextuallyTypeValueSpec
> ,nextValueExpression
> ,arrayElementReference
> ,multisetElementReference
> ,numericValueExpression
> ,booleanValueExpression
> ,arrayValueConstructor
> ,tableValueConstructor
> ,fromClause
> ,whereClause
> ,groupbyClause
> ,querySpecification
> ,queryExpressions
> ,sortSpecificationList
> ]
= 5 Lexical Elements
Basic definitions of characters used, tokens, symbols, etc. Most of this section would normally be handled within the lexical analyzer rather than in the grammar proper. Further, the original document does not quote the various single characters, which makes it hard to process automatically.
-[There seems to be a lot of unused stuff here, so skip this section and only do bits which
+[There seems to be a lot of unused stuff here, so skip this section
+and only do bits which are needed by other bits]
5.1 <SQL terminal character> (p151)
@ -488,8 +516,8 @@ standards to include everything that was dropped also?
TODO: how do escapes work here?
-> bitBinaryStringLiterals :: TestItem
-> bitBinaryStringLiterals = Group "bit and hex string literals" $ map (uncurry TestValueExpr)
+> binaryStringLiterals :: TestItem
+> binaryStringLiterals = Group "bit and hex string literals" $ map (uncurry TestValueExpr)
> [("B'101010'", undefined)
> ,("X'7f7f7f'", undefined)
> ]
@ -1031,11 +1059,12 @@ TODO: review how the special keywords are parsed and add tests for these
> targetSpecification :: TestItem
> targetSpecification = Group "target specification" $ map (uncurry TestValueExpr)
> [(":hostparam", undefined)
> ,(":hostparam indicator :another_host_param", undefined)
> ,("?", undefined)
> ,(":h[3]", undefined)
> ]
-TODO: modules stuff, indicators, not sure what current_collation is
+TODO: modules stuff, not sure what current_collation is
for or how it works
@ -1849,7 +1878,7 @@ Specify a set of <row value expression>s to be constructed into a table.
<contextually typed row value expression list> ::= <contextually typed row value expression> [ { <comma> <contextually typed row value expression> }... ]
> tableValueConstructor :: TestItem
-> tableValueConstructor = Group "table value constructor" $ map (uncurry TestValueExpr)
+> tableValueConstructor = Group "table value constructor" $ map (uncurry TestQueryExpr)
> [("values (1,2), (a+b,(select count(*) from t));", undefined)
> ]
@ -1869,7 +1898,7 @@ Specify a table or a grouped table.
TODO: expand on these tests and review uncovered grammar
> fromClause :: TestItem
-> fromClause = Group "from clause" $ map (uncurry TestValueExpr)
+> fromClause = Group "from clause" $ map (uncurry TestQueryExpr)
> [("select * from t,u", undefined)
> ,("select * from t as a", undefined)
@ -1990,7 +2019,7 @@ Specify a table derived by the application of a <search condition> to the result
<where clause> ::= WHERE <search condition>
> whereClause :: TestItem
-> whereClause = Group "where clause" $ map (uncurry TestValueExpr)
+> whereClause = Group "where clause" $ map (uncurry TestQueryExpr)
> [("select * from t where a = 5", undefined)]
@ -2042,7 +2071,7 @@ It seems even in sql 2003, you can only put column references in the
groups, and not general value expressions.
> groupbyClause :: TestItem
-> groupbyClause = Group "group by clause" $ map (uncurry TestValueExpr)
+> groupbyClause = Group "group by clause" $ map (uncurry TestQueryExpr)
> [("select a, sum(b) from t group by a", undefined)
> ,("select a, c,sum(b) from t group by a,c", undefined)
> ,("select a, c,sum(b) from t group by a,c collate x", undefined)
@ -2146,7 +2175,7 @@ TODO: review this and add more variants
> querySpecification :: TestItem
-> querySpecification = Group "query specification" $ map (uncurry TestValueExpr)
+> querySpecification = Group "query specification" $ map (uncurry TestQueryExpr)
> [("select a from t", undefined)
> ,("select all a from t", undefined)
> ,("select distinct a from t", undefined)
@ -2206,7 +2235,7 @@ TODO: common table expressions
<corresponding column list> ::= <column name list>
> queryExpressions :: TestItem
-> queryExpressions = Group "query expressions" $ map (uncurry TestValueExpr)
+> queryExpressions = Group "query expressions" $ map (uncurry TestQueryExpr)
> [("select a from t union select a from u", undefined)
> ,("select a from t union all select a from u", undefined)
@ -2784,7 +2813,7 @@ Specify a sort order.
TODO: review sort specifications
> sortSpecificationList :: TestItem
-> sortSpecificationList = Group "sort specification list" $ map (uncurry TestValueExpr)
+> sortSpecificationList = Group "sort specification list" $ map (uncurry TestQueryExpr)
> [("select * from t order by a", undefined)
> ,("select * from t order by a,b", undefined)
> ,("select * from t order by a asc,b", undefined)
@ -2794,3 +2823,7 @@ TODO: review sort specifications
> ]
TODO: what happened to the collation in order by?
Answer: sort used to be a column reference with an optional
collate. Since it is now a value expression, the collate doesn't need
to be mentioned here.


@ -28,7 +28,7 @@ test data to the Test.Framework tests.
> import Language.SQL.SimpleSQL.ValueExprs
> import Language.SQL.SimpleSQL.Tpch
> import Language.SQL.SimpleSQL.SQL2003
Order the tests to start from the simplest first. This is also the
order on the generated documentation.
@ -44,6 +44,7 @@ order on the generated documentation.
> ,fullQueriesTests
> ,postgresTests
> ,tpchTests
> ,sql2003Tests
> ]
> tests :: Test.Framework.Test