notes

2014-09-13 10:45:45 +03:00 · 2014-09-13 10:45:45 +03:00 · 3bf4fdbe52
commit 3bf4fdbe52
parent ba331af24b
3 changed files with 44 additions and 38 deletions
--- a/Language/SQL/SimpleSQL/Parser.lhs
+++ b/Language/SQL/SimpleSQL/Parser.lhs
@@ -1200,6 +1200,10 @@ tref
 tref
 [on expr | using (...)]

+TODO: either use explicit 'operator precedence' parsers or build
+expression parser for the 'tref operators' such as joins, lateral,
+aliases.
+
 > from :: Parser [TableRef]
 > from = keyword_ "from" *> commaSep1 tref
 >   where
@@ -2003,3 +2007,12 @@ different parsers can be used for different dialects
 > guardDialect ds = do
 >     d <- getState
 >     guard (d `elem` ds)
+
+TODO: the ParseState and the Dialect argument should be turned into a
+flags struct. Part (or all?) of this struct is the dialect
+information, but each dialect has different versions + a big set of
+flags to control syntax variations within a version of a product
+dialect (for instance, string and identifier parsing rules vary from
+dialect to dialect and version to version, and most or all SQL DBMSs
+appear to have a set of flags to further enable or disable variations
+for quoting and escaping strings and identifiers).
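Review note on the flags TODO above: a minimal sketch of what such a flags record might look like. Every name here (Product, StringEscape, ParseFlags, allowsEscape) is hypothetical and not part of the library, and the version numbers are illustrative only. The idea is that a parser would test one syntax property instead of testing membership in a dialect list, as guardDialect does now.

```haskell
-- Hypothetical sketch: none of these names exist in the library.

-- | Product families the parser might target.
data Product = ANSI2011 | MySQL
    deriving (Eq, Show)

-- | Ways a string literal may escape an embedded quote.
data StringEscape = DoubledQuote | BackslashEscape
    deriving (Eq, Show)

-- | One record carrying the product, its version, and syntax toggles.
data ParseFlags = ParseFlags
    { pfProduct       :: Product
    , pfVersion       :: (Int, Int)     -- major, minor (illustrative)
    , pfStringEscapes :: [StringEscape] -- escapes accepted in string literals
    , pfBacktickIdens :: Bool           -- allow `quoted` identifiers
    } deriving (Eq, Show)

-- | Baseline flags for the ANSI target.
ansiFlags :: ParseFlags
ansiFlags = ParseFlags ANSI2011 (2011, 0) [DoubledQuote] False

-- | MySQL expressed as a set of overrides on the ANSI baseline.
mysqlFlags :: ParseFlags
mysqlFlags = ansiFlags
    { pfProduct       = MySQL
    , pfVersion       = (5, 6)
    , pfStringEscapes = [DoubledQuote, BackslashEscape]
    , pfBacktickIdens = True
    }

-- | Flag-based analogue of guardDialect: asks about one syntax
-- property rather than about a list of dialects.
allowsEscape :: StringEscape -> ParseFlags -> Bool
allowsEscape e fs = e `elem` pfStringEscapes fs
```

Deriving each product's flags from a baseline by record update keeps the per-version differences visible in one place, which may help once versions and server-side quoting options are added.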
--- a/64
+++ b/64
@@ -1,40 +1,22 @@
 work on reasonable subset of sql which is similar to the current
-   subset and smaller than the complete 2011 target
-prototype for dialect handling, todo:
-  add test which test for failure
-  test that mysql specific syntax fails on ansi mode
-    and that the ansi equivalents of the mysql specific syntax which
-    has been implemented fail in mysql mode
-position annotation
+   subset and smaller than the complete 2011 target: describe the
+   exact target set for the next release

-simple stuff for error message and pretty printing monitoring
+improve the dialect testing: add notes on what to do

-work on the new refactoring of the parser
-create a new module for generic combinators
-work on getting rid of monad and guard
+position annotation in the syntax

+simple stuff for error message and pretty printing monitoring:

-value expressions which start with an identifier/keyword:
-immediate focus:
-case
-cast
-
-interval
-typed literal
-
-special functions (extract, etc)
-app
-aggregate
-window function
-identifier
-
-continue 2011 review and tests
-
-1. create an error message document for the website
-   - base off ErrorMessages but add some more variations
-2. start thinking about automated tests for invalid syntax to catch
-   bad parsing
+create a sample set of valid statements to pretty print
+pretty print these
+compare every so often to catch regressions and approve improvements
+start with tpch, and then add some others

+same with invalid statements to see the error messages
+start with some simple value exprs and a big query expr which has
+   stuff (either tokens, whitespace or junk strings)
+   semi-systematically added and/or removed

 fixing the non idiomatic (pun!) suffix parsing:
  typename parsing
@@ -45,9 +27,13 @@ fixing the non idiomatic (pun!) suffix parsing:
 review names in the syntax for correspondence with sql standard, avoid
   gratuitous differences

-touch up the expr hack as best as can
+touch up the expr hack as best as can, start thinking about
+   replacement for buildExprParser, maybe this can be a separate
+   general package, or maybe something like this already exists

-careful review of token parses wrt trailing delimiters/junk
+careful review of token parses wrt trailing delimiters/junk - already
+   caught a few issues like this incidentally when working on other
+   stuff

 undo mess in the code created by adding lots of new support:
 much more documentation
@@ -60,8 +46,6 @@ fix the lexing
 add documentation in Parser.lhs on the left factoring/error handling
   approach

-create error message demonstration page for the website
-
 fixes:

 keyword tree, add explicit result then can use for joins also
@@ -69,9 +53,6 @@ keyword tree, add explicit result then can use for joins also
 keyword tree support prefix mode so can start from already parsed
   token

-do the final big left factor: typenames, interval lits, iden +
-   suffixes
-
 left factor/try removal summary (this list needs updating):

 identifier starts:
@@ -163,6 +144,13 @@ create list of unsupported syntax: xml, ref, subtypes, modules?

 after next release

+medium term goals:
+1. replace parser and syntax in hssqlppp with this code (keep two
+   separate packages in sync)
+2. this replacement should have better error messages, much more
+   complete ansi sql 2011 support, and probably will have reasonable
+   support for these dialects: mssql, oracle and teradata.
+
 review areas where this parser is too permissive, e.g. value
   expressions allowed where column reference names only should be
   allowed, such as group by, order by (perhaps there can be a flag or
--- a/tools/Language/SQL/SimpleSQL/TestTypes.lhs
+++ b/tools/Language/SQL/SimpleSQL/TestTypes.lhs
@@ -8,6 +8,11 @@ Tests.lhs module for the 'interpreter'.

 > import Language.SQL.SimpleSQL.Syntax

+TODO: maybe make the dialect args into [dialect], then each test
+checks all the dialects mentioned work, and all the dialects not
+mentioned give a parse error. Not sure if this will be too awkward due
+to lots of tricky exceptions/variations.
+
 > data TestItem = Group String [TestItem]
 >               | TestValueExpr Dialect String ValueExpr
 >               | TestQueryExpr Dialect String QueryExpr
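Review note on the [dialect] TODO in TestTypes.lhs: a rough sketch of the check it describes, assuming a test lists the dialects that should accept the input and every other dialect must reject it. The Dialect type below is a local stand-in, and toyParse and dialectMismatches are hypothetical helpers written only to exercise the idea. They are not the library's parser or test API.

```haskell
import Data.Either (isRight)

-- Local stand-in for the library's Dialect type (an assumption).
data Dialect = SQL2011 | MySQL
    deriving (Eq, Show, Enum, Bounded)

-- | Toy parser used only to drive the check: input containing a
-- backtick is accepted under MySQL and rejected elsewhere.
toyParse :: Dialect -> String -> Either String String
toyParse d src
    | '`' `elem` src && d /= MySQL = Left "unexpected '`'"
    | otherwise                    = Right src

-- | Dialects whose outcome disagrees with the expectation: each
-- listed dialect must parse the input, every unlisted one must fail.
-- An empty result means the test passes.
dialectMismatches
    :: (Dialect -> String -> Either e a)  -- parser under test
    -> [Dialect]                          -- dialects expected to succeed
    -> String                             -- source text
    -> [Dialect]
dialectMismatches parse expected src =
    [ d
    | d <- [minBound .. maxBound]
    , isRight (parse d src) /= (d `elem` expected)
    ]
```

Returning the list of disagreeing dialects, rather than a Bool, would let a failing test report which dialect broke the expectation. That may matter if the exceptions turn out to be as awkward as the TODO fears.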