
work around haddock's refusal to parse literate comment lines with * in the first character position
get rid of code_units since it is not in SQL 2011
implement next value for
parse the nullary functions with reserved names
updates to the sql2003 file
Jake Wheat 2014-04-19 21:17:19 +03:00
parent 7057241974
commit 7a7f4ba7aa
7 changed files with 167 additions and 66 deletions


@@ -38,9 +38,9 @@ lexers, this isn't 100% complete at the moment and needs fixing.
The parsing code is aggressively left factored, and try is avoided as
much as possible. Try is avoided because:
* when it is overused it makes the code hard to follow
* when it is overused it makes the parsing code harder to debug
* it makes the parser error messages much worse
The code could be made a bit simpler with a few extra 'trys', but this
isn't done because of the impact on the parser error
@@ -74,9 +74,9 @@ syntax.
There are three big areas which are tricky to left factor:
* typenames
* value expressions which can start with an identifier
* infix and suffix operators
=== typenames
@@ -97,12 +97,12 @@ error messages really bad.
Here is a list of these nodes:
* identifiers
* function application
* aggregate application
* window application
* typed literal: typename 'literal string'
* interval literal which is like the typed literal with some extras
There is further ambiguity e.g. with typed literals with precision,
functions, aggregates, etc. - these are an identifier, followed by
@@ -113,12 +113,12 @@ is.
There is also a set of nodes which start with an identifier/keyword
but can commit since no other syntax can start the same way:
* case
* cast
* exists, unique subquery
* array constructor
* multiset constructor
* all the special syntax functions: extract, position, substring,
convert, translate, overlay, trim, etc.
The interval literal mentioned above is treated in this group at the
@@ -143,10 +143,10 @@ standard which is able to eliminate a number of possibilities just in
the grammar, which this parser allows. This is done for a number of
reasons:
* it makes the parser simple - less variations
* it should allow for dialects and extensibility more easily in the
future (e.g. new infix binary operators with custom precedence)
* many things which are effectively checked in the grammar in the
standard, can be checked using a typechecker or other simple static
analysis
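The left-factoring idea described in this comment block can be sketched with base's ReadP combinators (the real parser uses parsec, and the `Expr` type and names below are illustrative, not the library's): the shared prefix is consumed exactly once, then the parser branches, so no try/backtracking is needed.

```haskell
-- Minimal sketch of left factoring, using Text.ParserCombinators.ReadP
-- from base instead of parsec; types and names are illustrative.
import Text.ParserCombinators.ReadP
import Data.Char (isAlpha)

data Expr = NextValueFor String  -- "next value for <name>"
          | Plain String         -- any other bare identifier
  deriving (Eq, Show)

-- one alphabetic word followed by optional trailing spaces
word :: ReadP String
word = munch1 isAlpha <* skipSpaces

keyword :: String -> ReadP ()
keyword k = do
  w <- word
  if w == k then pure () else pfail

-- Left factored: the first word is consumed exactly once, and the
-- branch is chosen afterwards by inspecting it.
expr :: ReadP Expr
expr = do
  w <- word
  if w == "next"
    then NextValueFor <$> (keyword "value" *> keyword "for" *> word)
    else pure (Plain w)

parse :: String -> Maybe Expr
parse s = case readP_to_S (expr <* eof) s of
  [(e, "")] -> Just e
  _         -> Nothing
```

The unfactored alternative, `nextValueFor <|> plainIden`, would need a `try` around the first branch because both start with an identifier; factoring the prefix out removes that and keeps error messages anchored at the point of failure.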
@@ -481,7 +481,6 @@ TODO: this code needs heavy refactoring
> ,return Nothing]
> return (p,x)
> lobUnits = choice [LobCharacters <$ keyword_ "characters"
- > ,LobCodeUnits <$ keyword_ "code_units"
> ,LobOctets <$ keyword_ "octets"]
> -- deal with multiset and array suffixes
> tnSuffix x =
@@ -657,6 +656,10 @@ multiset(query expr). It must be there for compatibility or something.
> ,keyword_ "table" >>
> MultisetQueryCtor <$> parens queryExpr]
+ > nextValueFor :: Parser ValueExpr
+ > nextValueFor = keywords_ ["next","value","for"] >>
+ >     NextValueFor <$> names
=== interval
interval literals are a special case and we follow the grammar less
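The added `nextValueFor` parser maps `next value for a.b` to `NextValueFor [Name "a", Name "b"]`. A self-contained sketch of that shape, again with ReadP standing in for parsec and the `Name`/`ValueExpr` types redefined locally (simplified from the library's syntax module):

```haskell
-- Sketch of the "next value for" parse; ReadP from base stands in for
-- parsec, and the types are simplified local stand-ins.
import Text.ParserCombinators.ReadP
import Data.Char (isAlpha, isAlphaNum)

newtype Name = Name String deriving (Eq, Show)

data ValueExpr = NextValueFor [Name] deriving (Eq, Show)

-- an identifier: a letter followed by alphanumerics
ident :: ReadP String
ident = (:) <$> satisfy isAlpha <*> munch isAlphaNum

keyword :: String -> ReadP ()
keyword k = string k *> skipSpaces

-- next value for a.b  parses to  NextValueFor [Name "a", Name "b"]
nextValueFor :: ReadP ValueExpr
nextValueFor = do
  mapM_ keyword ["next", "value", "for"]
  NextValueFor . map Name <$> sepBy1 ident (char '.')

parseNVF :: String -> Maybe ValueExpr
parseNVF s = case readP_to_S (nextValueFor <* eof) s of
  [(e, "")] -> Just e
  _         -> Nothing
```

In the real parser the three keywords are reserved words, which is why the commit also adjusts the reserved-keyword list below.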
@@ -1161,6 +1164,7 @@ fragile and could at least do with some heavy explanation.
> ,cast
> ,arrayCtor
> ,multisetCtor
+ > ,nextValueFor
> ,subquery
> ,intervalLit
> ,specialOpKs
@@ -1740,7 +1744,7 @@ means).
> ,"class_origin"
> ,"coalesce"
> ,"cobol"
- > ,"code_units"
+ > --,"code_units"
> ,"collation"
> ,"collation_catalog"
> ,"collation_name"
@@ -2002,13 +2006,13 @@ means).
> ,"cube"
> ,"current"
> --,"current_date"
- > ,"current_default_transform_group"
+ > --,"current_default_transform_group"
- > ,"current_path"
+ > --,"current_path"
- > ,"current_role"
+ > --,"current_role"
> ,"current_time"
> ,"current_timestamp"
> ,"current_transform_group_for_type"
- > ,"current_user"
+ > --,"current_user"
> ,"cursor"
> ,"cycle"
> ,"date"
@@ -2052,7 +2056,7 @@ means).
> ,"global"
> ,"grant"
> ,"group"
- > ,"grouping"
+ > --,"grouping"
> ,"having"
> ,"hold"
> --,"hour"
@@ -2088,7 +2092,7 @@ means).
> ,"method"
> --,"minute"
> ,"modifies"
- > ,"module"
+ > --,"module"
> --,"month"
> ,"multiset"
> ,"national"
@@ -2151,7 +2155,7 @@ means).
> --,"second"
> ,"select"
> ,"sensitive"
- > ,"session_user"
+ > --,"session_user"
> --,"set"
> ,"similar"
> ,"smallint"
@@ -2167,7 +2171,7 @@ means).
> ,"submultiset"
> ,"symmetric"
> ,"system"
- > ,"system_user"
+ > --,"system_user"
> ,"table"
> ,"then"
> ,"time"
@@ -2187,7 +2191,7 @@ means).
> ,"unnest"
> ,"update"
> ,"upper"
- > ,"user"
+ > --,"user"
> ,"using"
> --,"value"
> ,"values"


@@ -218,6 +218,9 @@ which have been changed to try to improve the layout of the output.
> valueExpr (Collate v c) =
> valueExpr v <+> text "collate" <+> names c
+ > valueExpr (NextValueFor ns) =
+ >     text "next value for" <+> names ns
> doubleUpQuotes :: String -> String
> doubleUpQuotes [] = []
@@ -263,7 +266,6 @@ which have been changed to try to improve the layout of the output.
> LobG -> text "G") m
> <+> me (\x -> case x of
> LobCharacters -> text "CHARACTERS"
- > LobCodeUnits -> text "CODE_UNITS"
> LobOctets -> text "OCTETS") u)
> typeName (CharTypeName t i cs col) =
> names t
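The pretty-printing side of the new construct can be sketched with plain Strings (the real code uses the pretty package's `Doc` and quotes names where needed; the types here are simplified local stand-ins), mirroring the diff's `text "next value for" <+> names ns`:

```haskell
-- Plain-String sketch of rendering NextValueFor; types simplified
-- from the library's syntax module so the sketch stands alone.
import Data.List (intercalate)

newtype Name = Name String deriving (Eq, Show)

data ValueExpr = NextValueFor [Name] deriving (Eq, Show)

-- render a dotted name chain: [Name "a", Name "b"] -> "a.b"
names :: [Name] -> String
names ns = intercalate "." [n | Name n <- ns]

valueExpr :: ValueExpr -> String
valueExpr (NextValueFor ns) = "next value for " ++ names ns
```

Together with the parser change this keeps the parse/pretty round trip intact: pretty-printing a parsed `next value for a.b` reproduces the input.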


@@ -158,6 +158,7 @@
> | MultisetBinOp ValueExpr CombineOp SetQuantifier ValueExpr
> | MultisetCtor [ValueExpr]
> | MultisetQueryCtor QueryExpr
+ > | NextValueFor [Name]
> deriving (Eq,Show,Read,Data,Typeable)
> -- | Represents an identifier name, which can be quoted or unquoted.
@@ -190,7 +191,6 @@ TODO: add ref and scope, any others?
> data LobMultiplier = LobK | LobM | LobG
> deriving (Eq,Show,Read,Data,Typeable)
> data LobUnits = LobCharacters
- > | LobCodeUnits
> | LobOctets
> deriving (Eq,Show,Read,Data,Typeable)

TODO

@@ -1,5 +1,11 @@
continue 2003 review and tests
+ 1. start replacing the 2003 stuff with 2011
+ 2. create an error message document for the website
+    - base of error messages but add some more variations
+ 3. start thinking about tests for invalid syntax
touch up the expr hack as best as can
careful review of token parses wrt trailing delimiters/junk
@@ -56,6 +62,8 @@ rules for changing the multi keyword parsing:
rough SQL 2003 todo, including tests to write:
+ switch to SQL 2011
now:
review the commented out reserved keyword entries and work out how to
fix
@@ -64,7 +72,6 @@ go through almost all the predicates
window functions missing bits, window clauses
from: more tests, review missing
tablesample, unnest, etc.
- aggregates: where, filter + review
rows review
match missing bit
between symmetric


@@ -1,4 +1,4 @@
- 0.4.0-dev (updated to 7a847045163feb2339ab40ebe93afe2f1c9ad813)
+ 0.4.0-dev (updated to 705724197463cd19dd8749dfd51e2eb8f1d02b8e)
completely remove dependency on haskell-src-exts
improve the error messages a great deal
fix some trailing whitespace issues in the keyword style functions,
@@ -52,6 +52,11 @@
quote character in the identifier
implement complete interval literals (fixed the handling of the
interval qualifier)
+ make most of the standard reserved words actually reserved (still
+ some gaps)
+ change the natural in join abstract syntax to match the concrete
+ syntax instead of combining natural, on and using into one field
+ support filter and within group for aggregates
0.3.1 (commit 5cba9a1cac19d66166aed2876d809aef892ff59f)
update to work with ghc 7.8.1
0.3.0 (commit 9e75fa93650b4f1a08d94f4225a243bcc50445ae)


@@ -28,7 +28,7 @@ library
Language.SQL.SimpleSQL.Parser,
Language.SQL.SimpleSQL.Syntax
other-extensions: TupleSections
- build-depends: base >=4.6 && <4.7,
+ build-depends: base >=4.6 && <4.8,
parsec >=3.1 && <3.2,
mtl >=2.1 && <2.2,
pretty >= 1.1 && < 1.2
@@ -40,7 +40,7 @@ Test-Suite Tests
type: exitcode-stdio-1.0
main-is: RunTests.lhs
hs-source-dirs: .,tools
- Build-Depends: base >=4.6 && <4.7,
+ Build-Depends: base >=4.6 && <4.8,
parsec >=3.1 && <3.2,
mtl >=2.1 && <2.2,
pretty >= 1.1 && < 1.2,
@@ -71,7 +71,7 @@ Test-Suite Tests
executable SQLIndent
main-is: SQLIndent.lhs
hs-source-dirs: .,tools
- Build-Depends: base >=4.6 && <4.7,
+ Build-Depends: base >=4.6 && <4.8,
parsec >=3.1 && <3.2,
mtl >=2.1 && <2.2,
pretty >= 1.1 && < 1.2


@@ -15,7 +15,8 @@ large amount of the SQL.
> sql2003Tests :: TestItem
> sql2003Tests = Group "sql2003Tests"
- > [stringLiterals
+ > [Group "literals" [
+ > stringLiterals
> ,nationalCharacterStringLiterals
> ,unicodeStringLiterals
> ,binaryStringLiterals
@@ -23,32 +24,72 @@ large amount of the SQL.
> ,intervalLiterals
> ,booleanLiterals
> ,identifiers
- > ,typeNameTests
+ > ],Group "value expressions"
+ > [typeNameTests
> ,parenthesizedValueExpression
+ > ,someGeneralValues
> ,targetSpecification
> ,contextuallyTypeValueSpec
- > --,nextValueExpression
+ > ,moduleColumnRef
+ > ,groupingOperation
+ > --,windowFunction
+ > --,caseExpression
+ > --,castSpecification
+ > ,nextValueExpression
+ > -- subtype treatment, method invoc, static m i, new spec, attrib/method ref, deref, method ref, ref res
> ,arrayElementReference
> ,multisetElementReference
- > --,numericValueExpression
+ > ,numericValueExpression
+ > --,numericValueFunction
+ > --,stringValueExpression
+ > --,stringValueFunction
+ > --,datetimeValueExpression
+ > --,datetimeValueFunction
+ > --,intervalValueExpression
+ > --,intervalValueFunction
> --,booleanValueExpression
+ > --arrayValueExpression
> ,arrayValueConstructor
> ,multisetValueExpression
> ,multisetValueFunction
> ,multisetValueConstructor
+ > ],Group "query expressions"
+ > [
+ > -- rowValueConstructor
+ > --,rowValueExpression
> --,tableValueConstructor
> --,fromClause
+ > --,joinedTable
> --,whereClause
- > ,groupbyClause
+ > groupbyClause
+ > --,havingClause
+ > --,windowClause
> --,querySpecification
- > --,queryExpressions
+ > --,querySpecifications
- > ,quantifiedComparisonPredicate
+ > --,setOperations
+ > --,withExpressions
+ > ],Group "predicates"
+ > [--comparisonPredicate
+ > --,betweenPredicate
+ > --,inPredicate
+ > --,likePredicate
+ > --,similarPredicae
+ > --,nullPredicate
+ > quantifiedComparisonPredicate
+ > --,existsPredicate
> ,uniquePredicate
+ > --,normalizedPredicate
> ,matchPredicate
+ > --,overlapsPredicate
+ > --,distinctPredicate
+ > --,memberPredicate
+ > --,submultisetPredicate
+ > --,setPredicate
> ,collateClause
> ,aggregateFunctions
> ,sortSpecificationList
> ]
+ > ]
= 5 Lexical Elements
@@ -1001,8 +1042,6 @@ create a list of type name variations:
> ,("blob(3M)", LobTypeName [Name "blob"] 3 (Just LobM) Nothing)
> ,("blob(4M characters) "
> ,LobTypeName [Name "blob"] 4 (Just LobM) (Just LobCharacters))
- > ,("blob(5 code_units) "
- > ,LobTypeName [Name "blob"] 5 Nothing (Just LobCodeUnits))
> ,("blob(6G octets) "
> ,LobTypeName [Name "blob"] 6 (Just LobG) (Just LobOctets))
> ,("national character large object(7K) "
@@ -1172,7 +1211,19 @@ This is used in row type names.
| USER
| VALUE
TODO: review how the special keywords are parsed and add tests for these
+ > someGeneralValues :: TestItem
+ > someGeneralValues = Group "some general values" $ map (uncurry TestValueExpr) $
+ > map mkIden ["CURRENT_DEFAULT_TRANSFORM_GROUP"
+ > ,"CURRENT_PATH"
+ > ,"CURRENT_ROLE"
+ > ,"CURRENT_USER"
+ > ,"SESSION_USER"
+ > ,"SYSTEM_USER"
+ > ,"USER"
+ > ,"VALUE"]
+ > where
+ > mkIden nm = (nm,Iden [Name nm])
<simple value specification> ::=
<literal>
@@ -1269,8 +1320,11 @@ already covered above in the identifiers and names section
<basic identifier chain>
| MODULE <period> <qualified identifier> <period> <column name>
- TODO: work out the exact syntax and add
+ > moduleColumnRef :: TestItem
+ > moduleColumnRef = Group "module column ref" $ map (uncurry TestValueExpr)
+ > [("MODULE.something.something", Iden [Name "MODULE"
+ > ,Name "something"
+ > ,Name "something"])]
== 6.8 <SQL parameter reference> (p190)
@@ -1304,7 +1358,19 @@ ORDER BY department, job, "Total Empl", "Average Sal";
TODO: de-oracle the syntax and add as test case
+ > groupingOperation :: TestItem
+ > groupingOperation = Group "grouping operation" $ map (uncurry TestQueryExpr)
+ > [("SELECT SalesQuota, SUM(SalesYTD) TotalSalesYTD,\n\
+ > \ GROUPING(SalesQuota) AS Grouping\n\
+ > \FROM Sales.SalesPerson\n\
+ > \GROUP BY ROLLUP(SalesQuota);"
+ > ,makeSelect
+ > {qeSelectList = [(Iden [Name "SalesQuota"],Nothing)
+ > ,(App [Name "SUM"] [Iden [Name "SalesYTD"]],Just (Name "TotalSalesYTD"))
+ > ,(App [Name "GROUPING"] [Iden [Name "SalesQuota"]],Just (Name "Grouping"))]
+ > ,qeFrom = [TRSimple [Name "Sales",Name "SalesPerson"]]
+ > ,qeGroupBy = [Rollup [SimpleGroup (Iden [Name "SalesQuota"])]]})
+ > ]
== 6.10 <window function> (p193)
@@ -1323,6 +1389,10 @@ TODO: de-oracle the syntax and add as test case
TODO: window functions
+ > windowFunctions :: TestItem
+ > windowFunctions = Group "window functions" $ map (uncurry TestValueExpr)
+ > [
+ > ]
== 6.11 <case expression> (p197)
@@ -1371,7 +1441,10 @@ TODO: window functions
TODO: case expressions plus the 'abbreviations'
+ > caseExpression :: TestItem
+ > caseExpression = Group "case expression" $ map (uncurry TestValueExpr)
+ > [
+ > ]
== 6.12 <cast specification> (p200)
@@ -1391,7 +1464,7 @@ This is already covered above
> nextValueExpression :: TestItem
> nextValueExpression = Group "next value expression" $ map (uncurry TestValueExpr)
- > [("next value for a.b", undefined)
+ > [("next value for a.b", NextValueFor [Name "a", Name "b"])
> ]
@@ -1569,13 +1642,16 @@ Specify a numeric value.
> numericValueExpression :: TestItem
> numericValueExpression = Group "numeric value expression" $ map (uncurry TestValueExpr)
- > [("a + b", undefined)
+ > [("a + b", binOp "+")
- > ,("a - b", undefined)
+ > ,("a - b", binOp "-")
- > ,("a * b", undefined)
+ > ,("a * b", binOp "*")
- > ,("a / b", undefined)
+ > ,("a / b", binOp "/")
- > ,("+a", undefined)
+ > ,("+a", prefOp "+")
- > ,("-a", undefined)
+ > ,("-a", prefOp "-")
> ]
+ > where
+ > binOp o = BinOp (Iden [Name "a"]) [Name o] (Iden [Name "b"])
+ > prefOp o = PrefixOp [Name o] (Iden [Name "a"])
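The `binOp`/`prefOp` helpers above keep the expected test values short; spelled out, they build ASTs like this (constructors simplified from the library's syntax module so the sketch stands alone):

```haskell
-- Local, simplified stand-ins for the library's AST constructors,
-- showing what the binOp/prefOp test helpers expand to.
newtype Name = Name String deriving (Eq, Show)

data ValueExpr = Iden [Name]
               | BinOp ValueExpr [Name] ValueExpr
               | PrefixOp [Name] ValueExpr
  deriving (Eq, Show)

-- "a <op> b" is expected to parse to a BinOp with Iden operands
binOp :: String -> ValueExpr
binOp o = BinOp (Iden [Name "a"]) [Name o] (Iden [Name "b"])

-- "<op>a" is expected to parse to a PrefixOp
prefOp :: String -> ValueExpr
prefOp o = PrefixOp [Name o] (Iden [Name "a"])
```

So for example `binOp "+"` is `BinOp (Iden [Name "a"]) [Name "+"] (Iden [Name "b"])`, the expected parse of `"a + b"`.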
== 6.27 <numeric value function> (p242)
@@ -1600,12 +1676,22 @@ Specify a function yielding a value of type numeric.
<string position expression>
| <blob position expression>
+ > numericValueFunction :: TestItem
+ > numericValueFunction = Group "numeric value function" $ map (uncurry TestValueExpr)
+ > [
<string position expression> ::=
POSITION <left paren> <string value expression> IN <string value expression> [ USING <char length units> ] <right paren>
<blob position expression> ::=
POSITION <left paren> <blob value expression> IN <blob value expression> <right paren>
+ > ("position (a in b)",undefined)
+ > ,("position (a in b using characters)",undefined)
+ > ,("position (a in b using octets)",undefined)
TODO: position expressions
<length expression> ::=
@@ -1663,6 +1749,9 @@ TODO: extract expression
TODO: lots more expressions above
+ > ]
== 6.28 <string value expression> (p251)
Specify a character string value or a binary string value.
@@ -3160,9 +3249,3 @@ TODO: review sort specifications
> qe = makeSelect
> {qeSelectList = [(Star,Nothing)]
> ,qeFrom = [TRSimple [Name "t"]]}
- TODO: what happened to the collation in order by?
- Answer: sort used to be a column reference with an optional
- collate. Since it is now a value expression, the collate doesn't need
- to be mentioned here.