1
Fork 0
simple-sql-parser/Language/SQL/SimpleSQL/Parse.lhs

2255 lines
75 KiB
Plaintext
Raw Normal View History

2013-12-31 10:44:10 +01:00
2014-04-19 14:10:45 +02:00
= TOC:
2014-04-19 12:10:46 +02:00
2014-04-19 14:10:45 +02:00
notes
Public api
2014-04-19 12:10:46 +02:00
Names - parsing identifiers
Typenames
2016-02-22 22:24:25 +01:00
Scalar expressions
2014-04-19 12:22:11 +02:00
simple literals
2014-04-19 12:10:46 +02:00
star, param
parens expression, row constructor and scalar subquery
case, cast, exists, unique, array/ multiset constructor
typed literal, app, special function, aggregate, window function
2014-04-19 12:22:11 +02:00
suffixes: in, between, quantified comparison, match predicate, array
subscript, escape, collate
2014-04-19 12:10:46 +02:00
operators
2016-02-22 22:24:25 +01:00
scalar expression top level
2014-04-19 12:10:46 +02:00
helpers
2014-04-19 12:22:11 +02:00
query expressions
select lists
from clause
other table expression clauses:
where, group by, having, order by, offset and fetch
common table expressions
query expression
set operations
2014-04-19 14:10:45 +02:00
lexers
2014-04-19 12:22:11 +02:00
utilities
2014-04-19 12:10:46 +02:00
2014-04-19 14:10:45 +02:00
= Notes about the code
The lexers appear at the bottom of the file. There tries to be a clear
separation between the lexers and the other parser which only use the
lexers, this isn't 100% complete at the moment and needs fixing.
== Left factoring
The parsing code is aggressively left factored, and try is avoided as
much as possible. Try is avoided because:
* when it is overused it makes the code hard to follow
* when it is overused it makes the parsing code harder to debug
* it makes the parser error messages much worse
2014-04-19 14:10:45 +02:00
The code could be made a bit simpler with a few extra 'trys', but this
isn't done because of the impact on the parser error
messages. Apparently it can also help the speed but this hasn't been
looked into.
2015-07-31 23:04:18 +02:00
== Parser error messages
2014-04-19 14:10:45 +02:00
A lot of care has been given to generating good parser error messages
for invalid syntax. There are a few utils below which partially help
in this area.
There is a set of crafted bad expressions in ErrorMessages.lhs, these
are used to guage the quality of the error messages and monitor
regressions by hand. The use of <?> is limited as much as possible:
each instance should justify itself by improving an actual error
message.
There is also a plan to write a really simple expression parser which
doesn't do precedence and associativity, and the fix these with a pass
over the ast. I don't think there is any other way to sanely handle
the common prefixes between many infix and postfix multiple keyword
operators, and some other ambiguities also. This should help a lot in
generating good error messages also.
Both the left factoring and error message work are greatly complicated
by the large number of shared prefixes of the various elements in SQL
syntax.
== Main left factoring issues
There are three big areas which are tricky to left factor:
* typenames
2016-02-22 22:24:25 +01:00
* scalar expressions which can start with an identifier
* infix and suffix operators
2014-04-19 14:10:45 +02:00
=== typenames
There are a number of variations of typename syntax. The standard
deals with this by switching on the name of the type which is parsed
first. This code doesn't do this currently, but might in the
future. Taking the approach in the standard grammar will limit the
extensibility of the parser and might affect the ease of adapting to
support other sql dialects.
2016-02-22 22:24:25 +01:00
=== identifier scalar expressions
2014-04-19 14:10:45 +02:00
2016-02-22 22:24:25 +01:00
There are a lot of scalar expression nodes which start with
2014-04-19 14:10:45 +02:00
identifiers, and can't be distinguished the tokens after the initial
identifier are parsed. Using try to implement these variations is very
simple but makes the code much harder to debug and makes the parser
error messages really bad.
Here is a list of these nodes:
* identifiers
* function application
* aggregate application
* window application
* typed literal: typename 'literal string'
* interval literal which is like the typed literal with some extras
2014-04-19 14:10:45 +02:00
There is further ambiguity e.g. with typed literals with precision,
functions, aggregates, etc. - these are an identifier, followed by
2016-02-22 22:24:25 +01:00
parens comma separated scalar expressions or something similar, and it
2014-04-19 14:10:45 +02:00
is only later that we can find a token which tells us which flavour it
is.
There is also a set of nodes which start with an identifier/keyword
but can commit since no other syntax can start the same way:
* case
* cast
* exists, unique subquery
* array constructor
* multiset constructor
* all the special syntax functions: extract, position, substring,
2014-04-19 14:10:45 +02:00
convert, translate, overlay, trim, etc.
The interval literal mentioned above is treated in this group at the
moment: if we see 'interval' we parse it either as a full interval
literal or a typed literal only.
Some items in this list might have to be fixed in the future, e.g. to
support standard 'substring(a from 3 for 5)' as well as regular
function substring syntax 'substring(a,3,5) at the same time.
The work in left factoring all this is mostly done, but there is still
a substantial bit to complete and this is by far the most difficult
bit. At the moment, the work around is to use try, the downsides of
which is the poor parsing error messages.
=== infix and suffix operators
== permissiveness
The parser is very permissive in many ways. This departs from the
standard which is able to eliminate a number of possibilities just in
the grammar, which this parser allows. This is done for a number of
reasons:
* it makes the parser simple - less variations
* it should allow for dialects and extensibility more easily in the
2014-04-19 14:10:45 +02:00
future (e.g. new infix binary operators with custom precedence)
* many things which are effectively checked in the grammar in the
2014-04-19 14:10:45 +02:00
standard, can be checked using a typechecker or other simple static
analysis
To use this code as a front end for a sql engine, or as a sql validity
checker, you will need to do a lot of checks on the ast. A
typechecker/static checker plus annotation to support being a compiler
front end is planned but not likely to happen too soon.
Some of the areas this affects:
typenames: the variation of the type name should switch on the actual
name given according to the standard, but this code only does this for
the special case of interval type names. E.g. you can write 'int
collate C' or 'int(15,2)' and this will parse as a character type name
or a precision scale type name instead of being rejected.
2016-02-22 22:24:25 +01:00
scalar expressions: every variation on scalar expressions uses the same
2014-04-19 14:10:45 +02:00
parser/syntax. This means we don't try to stop non boolean valued
expressions in boolean valued contexts in the parser. Another area
2016-02-22 22:24:25 +01:00
this affects is that we allow general scalar expressions in group by,
2014-04-19 14:10:45 +02:00
whereas the standard only allows column names with optional collation.
These are all areas which are specified (roughly speaking) in the
syntax rather than the semantics in the standard, and we are not
fixing them in the syntax but leaving them till the semantic checking
(which doesn't exist in this code at this time).
> {-# LANGUAGE TupleSections #-}
> -- | This is the module with the parser functions.
> module Language.SQL.SimpleSQL.Parse
2013-12-13 15:04:48 +01:00
> (parseQueryExpr
2016-02-22 22:24:25 +01:00
> ,parseScalarExpr
> ,parseStatement
> ,parseStatements
2013-12-13 18:21:44 +01:00
> ,ParseError(..)) where
2013-12-13 11:39:26 +01:00
> import Control.Monad.Identity (Identity)
> import Control.Monad (guard, void)
> import Control.Applicative ((<**>))
2015-07-31 23:04:18 +02:00
> import Data.Char (toLower, isDigit)
> import Text.Parsec (setPosition,setSourceColumn,setSourceLine,getPosition
2015-07-31 23:04:18 +02:00
> ,option,between,sepBy,sepBy1
2015-08-04 21:08:32 +02:00
> ,try,many,many1,(<|>),choice,eof
2015-07-31 23:04:18 +02:00
> ,optionMaybe,optional,runParser
> ,chainl1, chainr1,(<?>))
> -- import Text.Parsec.String (Parser)
> import Text.Parsec.Perm (permute,(<$?>), (<|?>))
2015-07-31 23:04:18 +02:00
> import Text.Parsec.Prim (getState, token)
> import Text.Parsec.Pos (newPos)
> import qualified Text.Parsec.Expr as E
> import Data.List (intercalate,sort,groupBy)
> import Data.Function (on)
2013-12-13 15:04:48 +01:00
> import Language.SQL.SimpleSQL.Syntax
> import Language.SQL.SimpleSQL.Combinators
> import Language.SQL.SimpleSQL.Errors
> --import Language.SQL.SimpleSQL.Dialect
> import qualified Language.SQL.SimpleSQL.Lex as L
2015-07-31 23:04:18 +02:00
> import Data.Maybe
> import Text.Parsec.String (GenParser)
2013-12-13 11:39:26 +01:00
= Public API
2013-12-13 11:39:26 +01:00
2013-12-14 09:55:44 +01:00
> -- | Parses a query expr, trailing semicolon optional.
2014-06-27 11:19:15 +02:00
> parseQueryExpr :: Dialect
> -- ^ dialect of SQL to use
> -> FilePath
> -- ^ filename to use in error messages
2013-12-16 09:03:46 +01:00
> -> Maybe (Int,Int)
2013-12-31 11:20:07 +01:00
> -- ^ line number and column number of the first character
2014-06-27 11:19:15 +02:00
> -- in the source to use in error messages
2013-12-16 09:03:46 +01:00
> -> String
> -- ^ the SQL source to parse
2013-12-13 18:21:44 +01:00
> -> Either ParseError QueryExpr
2013-12-14 09:55:44 +01:00
> parseQueryExpr = wrapParse topLevelQueryExpr
2013-12-13 11:39:26 +01:00
> -- | Parses a statement, trailing semicolon optional.
> parseStatement :: Dialect
> -- ^ dialect of SQL to use
> -> FilePath
> -- ^ filename to use in error messages
> -> Maybe (Int,Int)
> -- ^ line number and column number of the first character
> -- in the source to use in error messages
> -> String
> -- ^ the SQL source to parse
> -> Either ParseError Statement
2015-08-01 22:16:26 +02:00
> parseStatement = wrapParse topLevelStatement
> -- | Parses a list of statements, with semi colons between
2013-12-16 09:03:46 +01:00
> -- them. The final semicolon is optional.
> parseStatements :: Dialect
2014-06-27 11:19:15 +02:00
> -- ^ dialect of SQL to use
> -> FilePath
> -- ^ filename to use in error messages
2013-12-16 09:03:46 +01:00
> -> Maybe (Int,Int)
2013-12-31 11:20:07 +01:00
> -- ^ line number and column number of the first character
2014-06-27 11:19:15 +02:00
> -- in the source to use in error messages
2013-12-16 09:03:46 +01:00
> -> String
> -- ^ the SQL source to parse
> -> Either ParseError [Statement]
> parseStatements = wrapParse statements
2016-02-22 22:24:25 +01:00
> -- | Parses a scalar expression.
> parseScalarExpr :: Dialect
2014-06-27 11:19:15 +02:00
> -- ^ dialect of SQL to use
> -> FilePath
> -- ^ filename to use in error messages
2013-12-16 09:03:46 +01:00
> -> Maybe (Int,Int)
2013-12-31 11:20:07 +01:00
> -- ^ line number and column number of the first character
2014-06-27 11:19:15 +02:00
> -- in the source to use in error messages
2013-12-16 09:03:46 +01:00
> -> String
> -- ^ the SQL source to parse
2016-02-22 22:24:25 +01:00
> -> Either ParseError ScalarExpr
> parseScalarExpr = wrapParse scalarExpr
2013-12-14 09:55:44 +01:00
This helper function takes the parser given and:
sets the position when parsing
automatically skips leading whitespace
checks the parser parses all the input using eof
converts the error return to the nice wrapper
> wrapParse :: Parser a
2014-06-27 11:19:15 +02:00
> -> Dialect
2013-12-14 09:55:44 +01:00
> -> FilePath
> -> Maybe (Int,Int)
> -> String
> -> Either ParseError a
2015-07-31 23:04:18 +02:00
> wrapParse parser d f p src = do
> let (l,c) = fromMaybe (1,1) p
> lx <- L.lexSQL d f (Just (l,c)) src
2013-12-13 18:21:44 +01:00
> either (Left . convParseError src) Right
2015-07-31 23:04:18 +02:00
> $ runParser (setPos p *> parser <* eof)
> d f $ filter keep lx
> where
> setPos Nothing = pure ()
> setPos (Just (l,c)) = fmap up getPosition >>= setPosition
> where up = flip setSourceColumn c . flip setSourceLine l
2015-07-31 23:04:18 +02:00
> keep (_,L.Whitespace {}) = False
> keep (_,L.LineComment {}) = False
> keep (_,L.BlockComment {}) = False
> keep _ = True
2013-12-13 18:21:44 +01:00
2013-12-14 00:14:23 +01:00
------------------------------------------------
2013-12-13 18:21:44 +01:00
= Names
Names represent identifiers and a few other things. The parser here
handles regular identifiers, dotten chain identifiers, quoted
identifiers and unicode quoted identifiers.
Dots: dots in identifier chains are parsed here and represented in the
Iden constructor usually. If parts of the chains are non identifier
2016-02-22 22:24:25 +01:00
scalar expressions, then this is represented by a BinOp "."
instead. Dotten chain identifiers which appear in other contexts (such
as function names, table names, are represented as [Name] only.
Identifier grammar:
unquoted:
underscore <|> letter : many (underscore <|> alphanum
example
_example123
quoted:
double quote, many (non quote character or two double quotes
together), double quote
"example quoted"
"example with "" quote"
unicode quoted is the same as quoted in this parser, except it starts
with U& or u&
u&"example quoted"
> name :: Parser Name
> name = do
> d <- getState
2016-02-12 13:13:47 +01:00
> uncurry Name <$> identifierTok (blacklist d)
todo: replace (:[]) with a named function all over
> names :: Parser [Name]
> names = reverse <$> (((:[]) <$> name) <??*> anotherName)
> -- can't use a simple chain here since we
> -- want to wrap the . + name in a try
> -- this will change when this is left factored
> where
> anotherName :: Parser ([Name] -> [Name])
> anotherName = try ((:) <$> (symbol "." *> name))
= Type Names
Typenames are used in casts, and also in the typed literal syntax,
which is a typename followed by a string literal.
Here are the grammar notes:
== simple type name
just an identifier chain or a multi word identifier (this is a fixed
list of possibilities, e.g. as 'character varying', see below in the
parser code for the exact list).
<simple-type-name> ::= <identifier-chain>
| multiword-type-identifier
== Precision type name
<precision-type-name> ::= <simple-type-name> <left paren> <unsigned-int> <right paren>
e.g. char(5)
note: above and below every where a simple type name can appear, this
means a single identifier/quoted or a dotted chain, or a multi word
identifier
== Precision scale type name
<precision-type-name> ::= <simple-type-name> <left paren> <unsigned-int> <comma> <unsigned-int> <right paren>
e.g. decimal(15,2)
== Lob type name
this is a variation on the precision type name with some extra info on
the units:
<lob-type-name> ::=
<simple-type-name> <left paren> <unsigned integer> [ <multiplier> ] [ <char length units> ] <right paren>
<multiplier> ::= K | M | G
<char length units> ::= CHARACTERS | CODE_UNITS | OCTETS
(if both multiplier and char length units are missing, then this will
parse as a precision type name)
e.g.
clob(5M octets)
== char type name
this is a simple type with optional precision which allows the
character set or the collation to appear as a suffix:
<char type name> ::=
<simple type name>
[ <left paren> <unsigned-int> <right paren> ]
[ CHARACTER SET <identifier chain> ]
[ COLLATE <identifier chain> ]
e.g.
char(5) character set my_charset collate my_collation
= Time typename
this is typename with optional precision and either 'with time zone'
or 'without time zone' suffix, e.g.:
<datetime type> ::=
[ <left paren> <unsigned-int> <right paren> ]
<with or without time zone>
<with or without time zone> ::= WITH TIME ZONE | WITHOUT TIME ZONE
WITH TIME ZONE | WITHOUT TIME ZONE
= row type name
<row type> ::=
ROW <left paren> <field definition> [ { <comma> <field definition> }... ] <right paren>
<field definition> ::= <identifier> <type name>
e.g.
row(a int, b char(5))
= interval type name
<interval type> ::= INTERVAL <interval datetime field> [TO <interval datetime field>]
<interval datetime field> ::=
<datetime field> [ <left paren> <unsigned int> [ <comma> <unsigned int> ] <right paren> ]
= array type name
<array type> ::= <data type> ARRAY [ <left bracket> <unsigned integer> <right bracket> ]
= multiset type name
<multiset type> ::= <data type> MULTISET
A type name will parse into the 'smallest' constructor it will fit in
syntactically, e.g. a clob(5) will parse to a precision type name, not
a lob type name.
Unfortunately, to improve the error messages, there is a lot of (left)
factoring in this function, and it is a little dense.
> typeName :: Parser TypeName
2015-07-31 23:04:18 +02:00
> typeName =
> (rowTypeName <|> intervalTypeName <|> otherTypeName)
> <??*> tnSuffix
> where
> rowTypeName =
> RowTypeName <$> (keyword_ "row" *> parens (commaSep1 rowField))
> rowField = (,) <$> name <*> typeName
> ----------------------------
> intervalTypeName =
> keyword_ "interval" *>
> (uncurry IntervalTypeName <$> intervalQualifier)
> ----------------------------
> otherTypeName =
> nameOfType <**>
> (typeNameWithParens
> <|> pure Nothing <**> (timeTypeName <|> charTypeName)
> <|> pure TypeName)
> nameOfType = reservedTypeNames <|> names
> charTypeName = charSet <**> (option [] tcollate <$$$$> CharTypeName)
> <|> pure [] <**> (tcollate <$$$$> CharTypeName)
> typeNameWithParens =
> (openParen *> unsignedInteger)
> <**> (closeParen *> precMaybeSuffix
> <|> (precScaleTypeName <|> precLengthTypeName) <* closeParen)
> precMaybeSuffix = (. Just) <$> (timeTypeName <|> charTypeName)
> <|> pure (flip PrecTypeName)
> precScaleTypeName = (comma *> unsignedInteger) <$$$> PrecScaleTypeName
> precLengthTypeName =
> Just <$> lobPrecSuffix
> <**> (optionMaybe lobUnits <$$$$> PrecLengthTypeName)
> <|> pure Nothing <**> ((Just <$> lobUnits) <$$$$> PrecLengthTypeName)
> timeTypeName = tz <$$$> TimeTypeName
> ----------------------------
> lobPrecSuffix = PrecK <$ keyword_ "k"
> <|> PrecM <$ keyword_ "m"
> <|> PrecG <$ keyword_ "g"
> <|> PrecT <$ keyword_ "t"
> <|> PrecP <$ keyword_ "p"
> lobUnits = PrecCharacters <$ keyword_ "characters"
> -- char and byte are the oracle spelling
> -- todo: move these to oracle dialect
> <|> PrecCharacters <$ keyword_ "char"
> <|> PrecOctets <$ keyword_ "octets"
> <|> PrecOctets <$ keyword_ "byte"
> tz = True <$ keywords_ ["with", "time","zone"]
> <|> False <$ keywords_ ["without", "time","zone"]
> charSet = keywords_ ["character", "set"] *> names
> tcollate = keyword_ "collate" *> names
> ----------------------------
2014-05-09 22:26:18 +02:00
> tnSuffix = multiset <|> array
> multiset = MultisetTypeName <$ keyword_ "multiset"
> array = keyword_ "array" *>
> (optionMaybe (brackets unsignedInteger) <$$> ArrayTypeName)
> ----------------------------
> -- this parser handles the fixed set of multi word
> -- type names, plus all the type names which are
> -- reserved words
2016-02-12 13:13:47 +01:00
> reservedTypeNames = (:[]) . Name Nothing . unwords <$> makeKeywordTree
> ["double precision"
> ,"character varying"
> ,"char varying"
> ,"character large object"
> ,"char large object"
> ,"national character"
> ,"national char"
> ,"national character varying"
> ,"national char varying"
> ,"national character large object"
> ,"nchar large object"
> ,"nchar varying"
> ,"bit varying"
> ,"binary large object"
> ,"binary varying"
> -- reserved keyword typenames:
> ,"array"
> ,"bigint"
> ,"binary"
> ,"blob"
> ,"boolean"
> ,"char"
> ,"character"
> ,"clob"
> ,"date"
> ,"dec"
> ,"decimal"
> ,"double"
> ,"float"
> ,"int"
> ,"integer"
> ,"nchar"
> ,"nclob"
> ,"numeric"
> ,"real"
> ,"smallint"
> ,"time"
> ,"timestamp"
> ,"varchar"
> ,"varbinary"
> ]
2016-02-22 22:24:25 +01:00
= Scalar expressions
2013-12-13 11:39:26 +01:00
2014-04-19 12:10:46 +02:00
== simple literals
2013-12-14 09:55:44 +01:00
See the stringToken lexer below for notes on string literal syntax.
2016-02-22 22:24:25 +01:00
> stringLit :: Parser ScalarExpr
> stringLit = (\(s,e,t) -> StringLit s e t) <$> stringTokExtend
2016-02-22 22:24:25 +01:00
> numberLit :: Parser ScalarExpr
> numberLit = NumLit <$> sqlNumberTok False
2013-12-14 09:55:44 +01:00
2016-02-22 22:24:25 +01:00
> simpleLiteral :: Parser ScalarExpr
> simpleLiteral = numberLit <|> stringLit
2014-04-19 12:10:46 +02:00
== star, param, host param
=== star
2013-12-14 09:55:44 +01:00
2013-12-17 11:24:37 +01:00
used in select *, select x.*, and agg(*) variations, and some other
2014-04-19 14:10:45 +02:00
places as well. The parser doesn't attempt to check that the star is
2016-02-22 22:24:25 +01:00
in a valid context, it parses it OK in any scalar expression context.
2013-12-14 09:55:44 +01:00
2016-02-22 22:24:25 +01:00
> star :: Parser ScalarExpr
2013-12-17 14:21:43 +01:00
> star = Star <$ symbol "*"
2013-12-13 11:39:26 +01:00
== parameter
2014-04-19 12:22:11 +02:00
unnamed parameter or named parameter
use in e.g. select * from t where a = ?
2014-04-17 18:27:18 +02:00
select x from t where x > :param
2016-02-22 22:24:25 +01:00
> parameter :: Parser ScalarExpr
2014-04-19 12:22:11 +02:00
> parameter = choice
> [Parameter <$ questionMark
> ,HostParameter
2015-07-31 23:04:18 +02:00
> <$> hostParamTok
> <*> optionMaybe (keyword "indicator" *> hostParamTok)]
2014-04-17 18:27:18 +02:00
== positional arg
2016-02-22 22:24:25 +01:00
> positionalArg :: Parser ScalarExpr
> positionalArg = PositionalArg <$> positionalArgTok
2014-04-19 12:10:46 +02:00
== parens
2013-12-13 11:39:26 +01:00
2016-02-22 22:24:25 +01:00
scalar expression parens, row ctor and scalar subquery
2013-12-14 09:55:44 +01:00
2016-02-22 22:24:25 +01:00
> parensExpr :: Parser ScalarExpr
2014-04-19 12:22:11 +02:00
> parensExpr = parens $ choice
2014-04-19 12:10:46 +02:00
> [SubQueryExpr SqSq <$> queryExpr
2016-02-22 22:24:25 +01:00
> ,ctor <$> commaSep1 scalarExpr]
2014-04-19 12:10:46 +02:00
> where
> ctor [a] = Parens a
2016-02-12 13:13:47 +01:00
> ctor as = SpecialOp [Name Nothing "rowctor"] as
2013-12-14 09:55:44 +01:00
2014-04-19 12:10:46 +02:00
== case, cast, exists, unique, array/multiset constructor, interval
2013-12-14 09:55:44 +01:00
2014-04-19 14:10:45 +02:00
All of these start with a fixed keyword which is reserved, so no other
syntax can start with the same keyword.
2014-04-18 11:28:05 +02:00
2014-04-19 12:10:46 +02:00
=== case expression
2016-02-22 22:24:25 +01:00
> caseExpr :: Parser ScalarExpr
2014-04-19 12:22:11 +02:00
> caseExpr =
2016-02-22 22:24:25 +01:00
> Case <$> (keyword_ "case" *> optionMaybe scalarExpr)
2014-04-19 12:10:46 +02:00
> <*> many1 whenClause
> <*> optionMaybe elseClause
> <* keyword_ "end"
2013-12-14 12:05:02 +01:00
> where
2016-02-22 22:24:25 +01:00
> whenClause = (,) <$> (keyword_ "when" *> commaSep1 scalarExpr)
> <*> (keyword_ "then" *> scalarExpr)
> elseClause = keyword_ "else" *> scalarExpr
2014-04-19 12:10:46 +02:00
=== cast
2013-12-14 09:55:44 +01:00
2014-04-19 12:10:46 +02:00
cast: cast(expr as type)
2013-12-14 09:55:44 +01:00
2016-02-22 22:24:25 +01:00
> cast :: Parser ScalarExpr
2014-05-10 09:02:16 +02:00
> cast = keyword_ "cast" *>
2016-02-22 22:24:25 +01:00
> parens (Cast <$> scalarExpr
2014-04-19 12:10:46 +02:00
> <*> (keyword_ "as" *> typeName))
2014-04-19 12:10:46 +02:00
=== exists, unique
subquery expression:
[exists|unique] (queryexpr)
2016-02-22 22:24:25 +01:00
> subquery :: Parser ScalarExpr
2014-04-19 12:10:46 +02:00
> subquery = SubQueryExpr <$> sqkw <*> parens queryExpr
2013-12-13 22:31:36 +01:00
> where
> sqkw = SqExists <$ keyword_ "exists" <|> SqUnique <$ keyword_ "unique"
2013-12-13 22:31:36 +01:00
2014-04-19 12:10:46 +02:00
=== array/multiset constructor
2016-02-22 22:24:25 +01:00
> arrayCtor :: Parser ScalarExpr
2014-04-19 12:10:46 +02:00
> arrayCtor = keyword_ "array" >>
> choice
> [ArrayCtor <$> parens queryExpr
2016-02-22 22:24:25 +01:00
> ,Array (Iden [Name Nothing "array"]) <$> brackets (commaSep scalarExpr)]
2014-04-19 12:10:46 +02:00
As far as I can tell, table(query expr) is just syntax sugar for
multiset(query expr). It must be there for compatibility or something.
2016-02-22 22:24:25 +01:00
> multisetCtor :: Parser ScalarExpr
2014-04-19 12:10:46 +02:00
> multisetCtor =
> choice
> [keyword_ "multiset" >>
> choice
> [MultisetQueryCtor <$> parens queryExpr
2016-02-22 22:24:25 +01:00
> ,MultisetCtor <$> brackets (commaSep scalarExpr)]
2014-04-19 12:10:46 +02:00
> ,keyword_ "table" >>
> MultisetQueryCtor <$> parens queryExpr]
2016-02-22 22:24:25 +01:00
> nextValueFor :: Parser ScalarExpr
> nextValueFor = keywords_ ["next","value","for"] >>
> NextValueFor <$> names
2014-04-19 12:10:46 +02:00
=== interval
interval literals are a special case and we follow the grammar less
permissively here
parse SQL interval literals, something like
interval '5' day (3)
or
interval '5' month
if the literal looks like this:
interval 'something'
then it is parsed as a regular typed literal. It must have a
interval-datetime-field suffix to parse as an intervallit
2014-04-19 12:22:11 +02:00
It uses try because of a conflict with interval type names: todo, fix
this. also fix the monad -> applicative
2014-04-19 12:22:11 +02:00
2016-02-22 22:24:25 +01:00
> intervalLit :: Parser ScalarExpr
2014-04-19 12:22:11 +02:00
> intervalLit = try (keyword_ "interval" >> do
> s <- optionMaybe $ choice [Plus <$ symbol_ "+"
> ,Minus <$ symbol_ "-"]
> lit <- singleQuotesOnlyStringTok
2014-04-19 12:10:46 +02:00
> q <- optionMaybe intervalQualifier
2014-04-19 12:22:11 +02:00
> mkIt s lit q)
2014-04-19 12:10:46 +02:00
> where
2016-02-12 13:13:47 +01:00
> mkIt Nothing val Nothing = pure $ TypedLit (TypeName [Name Nothing "interval"]) val
> mkIt s val (Just (a,b)) = pure $ IntervalLit s val a b
2014-04-19 12:10:46 +02:00
> mkIt (Just {}) _val Nothing = fail "cannot use sign without interval qualifier"
2014-04-17 20:05:47 +02:00
2014-04-19 12:10:46 +02:00
== typed literal, app, special, aggregate, window, iden
All of these start with identifiers (some of the special functions
start with reserved keywords).
they are all variations on suffixes on the basic identifier parser
The windows is a suffix on the app parser
=== iden prefix term
2014-04-17 20:05:47 +02:00
2016-02-22 22:24:25 +01:00
all the scalar expressions which start with an identifier
2014-04-17 20:05:47 +02:00
(todo: really put all of them here instead of just some of them)
2016-02-22 22:24:25 +01:00
> idenExpr :: Parser ScalarExpr
2014-04-19 12:22:11 +02:00
> idenExpr =
> -- todo: work out how to left factor this
> try (TypedLit <$> typeName <*> singleQuotesOnlyStringTok)
> <|> (try keywordFunction <**> app)
2014-05-09 22:26:18 +02:00
> <|> (names <**> option Iden app)
> where
> -- this is a special case because 'set' is a reserved keyword
> -- and the names parser won't parse it
> -- can't remove it from the reserved keyword list, because
> -- it is used in a lot of places which are ambiguous as a keyword
> -- this approach might be needed with other keywords which look
> -- like identifiers or functions
> keywordFunction =
> let makeKeywordFunction x = if map toLower x `elem` keywordFunctionNames
> then return [Name Nothing x]
> else fail ""
> in unquotedIdentifierTok [] Nothing >>= makeKeywordFunction
> keywordFunctionNames = [{-"abs"
> ,"all"
> ,"any"
> ,"array_agg"
> ,"avg"
> ,"ceil"
> ,"ceiling"
> ,"char_length"
> ,"character_length"
> ,"coalesce"
> ,"collect"
> ,"contains"
> ,"convert"
> ,"corr"
> ,"covar_pop"
> ,"covar_samp"
> ,"count"
> ,"cume_dist"
> ,"grouping"
> ,"intersection"
> ,"ln"
> ,"max"
> ,"mod"
> ,"percent_rank"
> ,"percentile_cont"
> ,"percentile_disc"
> ,"power"
> ,"rank"
> ,"regr_avgx"
> ,"regr_avgy"
> ,"regr_count"
> ,"regr_intercept"
> ,"regr_r2"
> ,"regr_slope"
> ,"regr_sxx"
> ,"regr_sxy"
> ,"regr_syy"
> ,"row"
> ,"row_number"
> ,-}"set"{-
> ,"some"
> ,"stddev_pop"
> ,"stddev_samp"
> ,"sum"
> ,"upper"
> ,"var_pop"
> ,"var_samp"
> ,"width_bucket"
> -- window functions added here too
> ,"row_number"
> ,"rank"
> ,"dense_rank"
> ,"percent_rank"
> ,"cume_dist"
> ,"ntile"
> ,"lead"
> ,"lag"
> ,"first_value"
> ,"last_value"
> ,"nth_value"-}
> ]
2013-12-14 09:55:44 +01:00
2014-04-19 12:10:46 +02:00
=== special
2013-12-14 09:55:44 +01:00
These are keyword operators which don't look like normal prefix,
postfix or infix binary operators. They mostly look like function
application but with keywords in the argument list instead of commas
to separate the arguments.
the special op keywords
parse an operator which is
operatorname(firstArg keyword0 arg0 keyword1 arg1 etc.)
> data SpecialOpKFirstArg = SOKNone
> | SOKOptional
> | SOKMandatory
> specialOpK :: String -- name of the operator
> -> SpecialOpKFirstArg -- has a first arg without a keyword
> -> [(String,Bool)] -- the other args with their keywords
> -- and whether they are optional
2016-02-22 22:24:25 +01:00
> -> Parser ScalarExpr
> specialOpK opName firstArg kws =
> keyword_ opName >> do
> void openParen
> let pfa = do
2016-02-22 22:24:25 +01:00
> e <- scalarExpr
> -- check we haven't parsed the first
> -- keyword as an identifier
2015-07-31 23:04:18 +02:00
> case (e,kws) of
2016-02-12 13:13:47 +01:00
> (Iden [Name Nothing i], (k,_):_)
2015-07-31 23:04:18 +02:00
> | map toLower i == k ->
> fail $ "cannot use keyword here: " ++ i
> _ -> return ()
> pure e
> fa <- case firstArg of
> SOKNone -> pure Nothing
> SOKOptional -> optionMaybe (try pfa)
> SOKMandatory -> Just <$> pfa
> as <- mapM parseArg kws
> void closeParen
2016-02-12 13:13:47 +01:00
> pure $ SpecialOpK [Name Nothing opName] fa $ catMaybes as
> where
> parseArg (nm,mand) =
2016-02-22 22:24:25 +01:00
> let p = keyword_ nm >> scalarExpr
> in fmap (nm,) <$> if mand
> then Just <$> p
> else optionMaybe (try p)
The actual operators:
EXTRACT( date_part FROM expression )
POSITION( string1 IN string2 )
SUBSTRING(extraction_string FROM starting_position [FOR length]
[COLLATE collation_name])
CONVERT(char_value USING conversion_char_name)
TRANSLATE(char_value USING translation_name)
OVERLAY(string PLACING embedded_string FROM start
[FOR length])
TRIM( [ [{LEADING | TRAILING | BOTH}] [removal_char] FROM ]
target_string
[COLLATE collation_name] )
2016-02-22 22:24:25 +01:00
> specialOpKs :: Parser ScalarExpr
> specialOpKs = choice $ map try
> [extract, position, substring, convert, translate, overlay, trim]
2013-12-14 09:55:44 +01:00
2016-02-22 22:24:25 +01:00
> extract :: Parser ScalarExpr
> extract = specialOpK "extract" SOKMandatory [("from", True)]
2013-12-13 21:38:43 +01:00
2016-02-22 22:24:25 +01:00
> position :: Parser ScalarExpr
> position = specialOpK "position" SOKMandatory [("in", True)]
2013-12-14 09:55:44 +01:00
strictly speaking, the substring must have at least one of from and
for, but the parser doens't enforce this
2013-12-14 09:55:44 +01:00
2016-02-22 22:24:25 +01:00
> substring :: Parser ScalarExpr
> substring = specialOpK "substring" SOKMandatory
> [("from", False),("for", False)]
2016-02-22 22:24:25 +01:00
> convert :: Parser ScalarExpr
> convert = specialOpK "convert" SOKMandatory [("using", True)]
2016-02-22 22:24:25 +01:00
> translate :: Parser ScalarExpr
> translate = specialOpK "translate" SOKMandatory [("using", True)]
2016-02-22 22:24:25 +01:00
> overlay :: Parser ScalarExpr
> overlay = specialOpK "overlay" SOKMandatory
> [("placing", True),("from", True),("for", False)]
trim is too different because of the optional char, so a custom parser
the both ' ' is filled in as the default if either parts are missing
in the source
2016-02-22 22:24:25 +01:00
> trim :: Parser ScalarExpr
> trim =
> keyword "trim" >>
> parens (mkTrim
> <$> option "both" sides
> <*> option " " singleQuotesOnlyStringTok
2016-02-22 22:24:25 +01:00
> <*> (keyword_ "from" *> scalarExpr))
> where
> sides = choice ["leading" <$ keyword_ "leading"
> ,"trailing" <$ keyword_ "trailing"
> ,"both" <$ keyword_ "both"]
> mkTrim fa ch fr =
2016-02-12 13:13:47 +01:00
> SpecialOpK [Name Nothing "trim"] Nothing
> $ catMaybes [Just (fa,StringLit "'" "'" ch)
> ,Just ("from", fr)]
2014-04-19 12:10:46 +02:00
=== app, aggregate, window
2014-05-09 22:26:18 +02:00
This parses all these variations:
2016-02-22 22:24:25 +01:00
normal function application with just a csv of scalar exprs
2014-05-09 22:26:18 +02:00
aggregate variations (distinct, order by in parens, filter and where
suffixes)
window apps (fn/agg followed by over)
This code is also a little dense like the typename code because of
left factoring, later they will even have to be partially combined
together.
2014-05-09 22:26:18 +02:00
2016-02-22 22:24:25 +01:00
> app :: Parser ([Name] -> ScalarExpr)
2014-05-09 22:26:18 +02:00
> app =
> openParen *> choice
> [duplicates
2016-02-22 22:24:25 +01:00
> <**> (commaSep1 scalarExpr
> <**> (((option [] orderBy) <* closeParen)
> <**> (optionMaybe afilter <$$$$$> AggregateApp)))
> -- separate cases with no all or distinct which must have at
2016-02-22 22:24:25 +01:00
> -- least one scalar expr
> ,commaSep1 scalarExpr
> <**> choice
> [closeParen *> choice
> [window
> ,withinGroup
> ,(Just <$> afilter) <$$$> aggAppWithoutDupeOrd
> ,pure (flip App)]
> ,orderBy <* closeParen
> <**> (optionMaybe afilter <$$$$> aggAppWithoutDupe)]
2016-02-22 22:24:25 +01:00
> -- no scalarExprs: duplicates and order by not allowed
2014-06-20 11:27:23 +02:00
> ,([] <$ closeParen) <**> option (flip App) (window <|> withinGroup)
> ]
2014-04-19 12:10:46 +02:00
> where
> aggAppWithoutDupeOrd n es f = AggregateApp n SQDefault es [] f
> aggAppWithoutDupe n = AggregateApp n SQDefault
2016-02-22 22:24:25 +01:00
> afilter :: Parser ScalarExpr
> afilter = keyword_ "filter" *> parens (keyword_ "where" *> scalarExpr)
2014-04-19 17:01:49 +02:00
2016-02-22 22:24:25 +01:00
> withinGroup :: Parser ([ScalarExpr] -> [Name] -> ScalarExpr)
2014-05-09 22:26:18 +02:00
> withinGroup =
> (keywords_ ["within", "group"] *> parens orderBy) <$$$> AggregateAppGroup
2014-04-19 12:10:46 +02:00
2014-05-09 22:26:18 +02:00
==== window
2014-04-19 12:10:46 +02:00
parse a window call as a suffix of a regular function call
this looks like this:
functionname(args) over ([partition by ids] [order by orderitems])
No support for explicit frames yet.
2014-05-09 22:26:18 +02:00
TODO: add window support for other aggregate variations, needs some
changes to the syntax also
2016-02-22 22:24:25 +01:00
> window :: Parser ([ScalarExpr] -> [Name] -> ScalarExpr)
> window =
> keyword_ "over" *> openParen *> option [] partitionBy
> <**> (option [] orderBy
> <**> (((optionMaybe frameClause) <* closeParen) <$$$$$> WindowApp))
2014-04-19 12:10:46 +02:00
> where
2016-02-22 22:24:25 +01:00
> partitionBy = keywords_ ["partition","by"] *> commaSep1 scalarExpr
2014-04-19 12:10:46 +02:00
> frameClause =
> frameRowsRange -- TODO: this 'and' could be an issue
> <**> (choice [(keyword_ "between" *> frameLimit True)
> <**> ((keyword_ "and" *> frameLimit True)
> <$$$> FrameBetween)
> -- maybe this should still use a b expression
> -- for consistency
> ,frameLimit False <**> pure (flip FrameFrom)])
> frameRowsRange = FrameRows <$ keyword_ "rows"
> <|> FrameRange <$ keyword_ "range"
2014-04-19 12:10:46 +02:00
> frameLimit useB =
> choice
> [Current <$ keywords_ ["current", "row"]
> -- todo: create an automatic left factor for stuff like this
> ,keyword_ "unbounded" *>
2014-04-19 12:10:46 +02:00
> choice [UnboundedPreceding <$ keyword_ "preceding"
> ,UnboundedFollowing <$ keyword_ "following"]
2016-02-22 22:24:25 +01:00
> ,(if useB then scalarExprB else scalarExpr)
> <**> (Preceding <$ keyword_ "preceding"
> <|> Following <$ keyword_ "following")
2014-04-19 12:10:46 +02:00
> ]
== suffixes
2016-02-22 22:24:25 +01:00
These are all generic suffixes on any scalar expr
2014-04-19 12:10:46 +02:00
=== in
2013-12-14 09:55:44 +01:00
in: two variations:
a in (expr0, expr1, ...)
a in (queryexpr)
2016-02-22 22:24:25 +01:00
> inSuffix :: Parser (ScalarExpr -> ScalarExpr)
> inSuffix =
> mkIn <$> inty
> <*> parens (choice
> [InQueryExpr <$> queryExpr
2016-02-22 22:24:25 +01:00
> ,InList <$> commaSep1 scalarExpr])
> where
> inty = choice [True <$ keyword_ "in"
2014-04-18 11:28:05 +02:00
> ,False <$ keywords_ ["not","in"]]
> mkIn i v = \e -> In i e v
2014-04-19 12:10:46 +02:00
=== between
2013-12-14 09:55:44 +01:00
between:
expr between expr and expr
There is a complication when parsing between - when parsing the second
expression it is ambiguous when you hit an 'and' whether it is a
binary operator or part of the between. This code follows what
postgres does, which might be standard across SQL implementations,
which is that you can't have a binary and operator in the middle
expression in a between unless it is wrapped in parens. The 'bExpr
2016-02-22 22:24:25 +01:00
parsing' is used to create alternative scalar expression parser which
2013-12-14 09:55:44 +01:00
is identical to the normal one expect it doesn't recognise the binary
2016-02-22 22:24:25 +01:00
and operator. This is the call to scalarExprB.
2013-12-14 09:55:44 +01:00
2016-02-22 22:24:25 +01:00
> betweenSuffix :: Parser (ScalarExpr -> ScalarExpr)
> betweenSuffix =
2016-02-12 13:13:47 +01:00
> makeOp <$> Name Nothing <$> opName
2016-02-22 22:24:25 +01:00
> <*> scalarExprB
> <*> (keyword_ "and" *> scalarExprB)
2013-12-13 20:13:36 +01:00
> where
> opName = choice
2013-12-13 20:13:36 +01:00
> ["between" <$ keyword_ "between"
2014-04-18 11:28:05 +02:00
> ,"not between" <$ try (keywords_ ["not","between"])]
> makeOp n b c = \a -> SpecialOp [n] [a,b,c]
2013-12-13 20:13:36 +01:00
2014-04-19 12:10:46 +02:00
=== quantified comparison
a = any (select * from t)
2016-02-22 22:24:25 +01:00
> quantifiedComparisonSuffix :: Parser (ScalarExpr -> ScalarExpr)
2014-04-19 12:22:11 +02:00
> quantifiedComparisonSuffix = do
> c <- comp
> cq <- compQuan
> q <- parens queryExpr
> pure $ \v -> QuantifiedComparison v [c] cq q
> where
2016-02-12 13:13:47 +01:00
> comp = Name Nothing <$> choice (map symbol
> ["=", "<>", "<=", "<", ">", ">="])
> compQuan = choice
> [CPAny <$ keyword_ "any"
> ,CPSome <$ keyword_ "some"
> ,CPAll <$ keyword_ "all"]
2014-04-19 12:10:46 +02:00
=== match
a match (select a from t)
2016-02-22 22:24:25 +01:00
> matchPredicateSuffix :: Parser (ScalarExpr -> ScalarExpr)
2014-04-19 12:22:11 +02:00
> matchPredicateSuffix = do
> keyword_ "match"
> u <- option False (True <$ keyword_ "unique")
> q <- parens queryExpr
> pure $ \v -> Match v u q
2014-04-19 12:10:46 +02:00
=== array subscript
2016-02-22 22:24:25 +01:00
> arraySuffix :: Parser (ScalarExpr -> ScalarExpr)
2014-04-19 12:22:11 +02:00
> arraySuffix = do
2016-02-22 22:24:25 +01:00
> es <- brackets (commaSep scalarExpr)
> pure $ \v -> Array v es
2014-04-19 12:10:46 +02:00
=== escape
2014-04-18 19:50:24 +02:00
2015-07-31 23:04:18 +02:00
It is going to be really difficult to support an arbitrary character
for the escape now there is a separate lexer ...
2016-02-12 13:13:47 +01:00
TODO: this needs fixing. Escape is only part of other nodes, and not a
separate suffix.
2016-02-22 22:24:25 +01:00
> {-escapeSuffix :: Parser (ScalarExpr -> ScalarExpr)
2014-04-19 12:22:11 +02:00
> escapeSuffix = do
> ctor <- choice
> [Escape <$ keyword_ "escape"
> ,UEscape <$ keyword_ "uescape"]
2015-07-31 23:04:18 +02:00
> c <- escapeChar
> pure $ \v -> ctor v c
2015-07-31 23:04:18 +02:00
> where
> escapeChar :: Parser Char
> escapeChar = (identifierTok [] Nothing <|> symbolTok Nothing) >>= oneOnly
> oneOnly :: String -> Parser Char
> oneOnly c = case c of
2015-07-31 23:04:18 +02:00
> [c'] -> return c'
> _ -> fail "escape char must be single char"
2016-02-12 13:13:47 +01:00
> -}
2014-04-19 12:10:46 +02:00
=== collate
2016-02-22 22:24:25 +01:00
> collateSuffix:: Parser (ScalarExpr -> ScalarExpr)
2014-04-19 12:22:11 +02:00
> collateSuffix = do
> keyword_ "collate"
> i <- names
> pure $ \v -> Collate v i
2016-02-21 22:43:19 +01:00
== odbc syntax
the parser supports three kinds of odbc syntax, two of which are
scalar expressions (the other is a variation on joins)
2016-02-22 22:24:25 +01:00
> odbcExpr :: Parser ScalarExpr
2016-02-21 22:43:19 +01:00
> odbcExpr = between (symbol "{") (symbol "}")
> (odbcTimeLit <|> odbcFunc)
> where
> odbcTimeLit =
> OdbcLiteral <$> choice [OLDate <$ keyword "d"
> ,OLTime <$ keyword "t"
> ,OLTimestamp <$ keyword "ts"]
> <*> singleQuotesOnlyStringTok
> -- todo: this parser is too general, the expr part
> -- should be only a function call (from a whitelist of functions)
> -- or the extract operator
2016-02-22 22:24:25 +01:00
> odbcFunc = OdbcFunc <$> (keyword "fn" *> scalarExpr)
2013-12-14 09:55:44 +01:00
2014-04-19 12:10:46 +02:00
== operators
2013-12-14 09:55:44 +01:00
The 'regular' operators in this parsing and in the abstract syntax are
unary prefix, unary postfix and binary infix operators. The operators
can be symbols (a + b), single keywords (a and b) or multiple keywords
(a is similar to b).
2013-12-31 11:20:07 +01:00
TODO: carefully review the precedences and associativities.
TODO: to fix the parsing completely, I think will need to parse
without precedence and associativity and fix up afterwards, since SQL
syntax is way too messy. It might be possible to avoid this if we
wanted to avoid extensibility and to not be concerned with parse error
2014-04-19 12:10:46 +02:00
messages, but both of these are too important.
2016-02-22 22:24:25 +01:00
> opTable :: Bool -> [[E.Operator [Token] ParseState Identity ScalarExpr]]
> opTable bExpr =
> [-- parse match and quantified comparisons as postfix ops
> -- todo: left factor the quantified comparison with regular
> -- binary comparison, somehow
2014-04-19 12:22:11 +02:00
> [E.Postfix $ try quantifiedComparisonSuffix
> ,E.Postfix matchPredicateSuffix
> ]
> ,[binarySym "." E.AssocLeft]
2014-04-19 12:22:11 +02:00
> ,[postfix' arraySuffix
> ,postfix' collateSuffix]
> ,[prefixSym "+", prefixSym "-"]
> ,[binarySym "^" E.AssocLeft]
> ,[binarySym "*" E.AssocLeft
> ,binarySym "/" E.AssocLeft
> ,binarySym "%" E.AssocLeft]
> ,[binarySym "+" E.AssocLeft
> ,binarySym "-" E.AssocLeft]
> ,[binarySym "||" E.AssocRight
> ,prefixSym "~"
> ,binarySym "&" E.AssocRight
> ,binarySym "|" E.AssocRight]
> ,[binaryKeyword "overlaps" E.AssocNone]
> ,[binaryKeyword "like" E.AssocNone
> -- have to use try with inSuffix because of a conflict
> -- with 'in' in position function, and not between
> -- between also has a try in it to deal with 'not'
> -- ambiguity
> ,E.Postfix $ try inSuffix
> ,E.Postfix betweenSuffix]
> -- todo: figure out where to put the try?
> ++ [binaryKeywords $ makeKeywordTree
> ["not like"
> ,"is similar to"
> ,"is not similar to"]]
> ++ [multisetBinOp]
> ,[binarySym "<" E.AssocNone
> ,binarySym ">" E.AssocNone
> ,binarySym ">=" E.AssocNone
> ,binarySym "<=" E.AssocNone
> ,binarySym "!=" E.AssocRight
> ,binarySym "<>" E.AssocRight
> ,binarySym "=" E.AssocRight]
> ,[postfixKeywords $ makeKeywordTree
> ["is null"
> ,"is not null"
> ,"is true"
> ,"is not true"
> ,"is false"
> ,"is not false"
> ,"is unknown"
> ,"is not unknown"]]
> ++ [binaryKeywords $ makeKeywordTree
> ["is distinct from"
> ,"is not distinct from"]]
> ,[prefixKeyword "not"]
> ,if bExpr then [] else [binaryKeyword "and" E.AssocLeft]
> ,[binaryKeyword "or" E.AssocLeft]
> ]
> where
> binarySym nm assoc = binary (symbol_ nm) nm assoc
> binaryKeyword nm assoc = binary (keyword_ nm) nm assoc
> binaryKeywords p =
> E.Infix (do
> o <- try p
2016-02-12 13:13:47 +01:00
> pure (\a b -> BinOp a [Name Nothing $ unwords o] b))
> E.AssocNone
> postfixKeywords p =
> postfix' $ do
> o <- try p
2016-02-12 13:13:47 +01:00
> pure $ PostfixOp [Name Nothing $ unwords o]
> binary p nm assoc =
2016-02-12 13:13:47 +01:00
> E.Infix (p >> pure (\a b -> BinOp a [Name Nothing nm] b)) assoc
2014-04-18 19:50:24 +02:00
> multisetBinOp = E.Infix (do
> keyword_ "multiset"
> o <- choice [Union <$ keyword_ "union"
> ,Intersect <$ keyword_ "intersect"
> ,Except <$ keyword_ "except"]
2014-05-09 22:26:18 +02:00
> d <- option SQDefault duplicates
> pure (\a b -> MultisetBinOp a o d b))
2014-04-18 19:50:24 +02:00
> E.AssocLeft
> prefixKeyword nm = prefix (keyword_ nm) nm
> prefixSym nm = prefix (symbol_ nm) nm
2016-02-12 13:13:47 +01:00
> prefix p nm = prefix' (p >> pure (PrefixOp [Name Nothing nm]))
> -- hack from here
> -- http://stackoverflow.com/questions/10475337/parsec-expr-repeated-prefix-postfix-operator-not-supported
> -- not implemented properly yet
> -- I don't think this will be enough for all cases
> -- at least it works for 'not not a'
> -- ok: "x is not true is not true"
> -- no work: "x is not true is not null"
> prefix' p = E.Prefix . chainl1 p $ pure (.)
> postfix' p = E.Postfix . chainl1 p $ pure (flip (.))
2013-12-13 11:39:26 +01:00
2016-02-22 22:24:25 +01:00
== scalar expression top level
2016-02-22 22:24:25 +01:00
This parses most of the scalar exprs.The order of the parsers and use
of try is carefully done to make everything work. It is a little
fragile and could at least do with some heavy explanation. Update: the
'try's have migrated into the individual parsers, they still need
documenting/fixing.
2013-12-13 11:39:26 +01:00
2016-02-22 22:24:25 +01:00
> scalarExpr :: Parser ScalarExpr
> scalarExpr = E.buildExpressionParser (opTable False) term
2016-02-22 22:24:25 +01:00
> term :: Parser ScalarExpr
2014-04-19 12:22:11 +02:00
> term = choice [simpleLiteral
> ,parameter
> ,positionalArg
2014-04-19 12:10:46 +02:00
> ,star
2014-04-19 12:22:11 +02:00
> ,parensExpr
> ,caseExpr
> ,cast
> ,arrayCtor
2014-04-18 19:50:24 +02:00
> ,multisetCtor
> ,nextValueFor
> ,subquery
2014-04-19 12:22:11 +02:00
> ,intervalLit
2014-04-19 12:10:46 +02:00
> ,specialOpKs
2016-02-21 22:43:19 +01:00
> ,idenExpr
> ,odbcExpr]
2016-02-22 22:24:25 +01:00
> <?> "scalar expression"
2013-12-13 11:39:26 +01:00
expose the b expression for window frame clause range between
2016-02-22 22:24:25 +01:00
> scalarExprB :: Parser ScalarExpr
> scalarExprB = E.buildExpressionParser (opTable True) term
2014-04-19 12:10:46 +02:00
== helper parsers
2014-04-19 14:10:45 +02:00
This is used in interval literals and in interval type names.
> intervalQualifier :: Parser (IntervalTypeField,Maybe IntervalTypeField)
> intervalQualifier =
> (,) <$> intervalField
> <*> optionMaybe (keyword_ "to" *> intervalField)
> where
> intervalField =
> Itf
> <$> datetimeField
> <*> optionMaybe
> (parens ((,) <$> unsignedInteger
> <*> optionMaybe (comma *> unsignedInteger)))
2014-04-19 14:10:45 +02:00
TODO: use datetime field in extract also
use a data type for the datetime field?
> datetimeField :: Parser String
> datetimeField = choice (map keyword ["year","month","day"
> ,"hour","minute","second"])
> <?> "datetime field"
2016-02-22 22:24:25 +01:00
This is used in multiset operations (scalar expr), selects (query expr)
2014-04-19 14:10:45 +02:00
and set operations (query expr).
2014-05-09 22:26:18 +02:00
> duplicates :: Parser SetQuantifier
> duplicates =
> choice [All <$ keyword_ "all"
> ,Distinct <$ keyword "distinct"]
2013-12-13 11:39:26 +01:00
-------------------------------------------------
= query expressions
2013-12-14 09:55:44 +01:00
== select lists
2013-12-13 16:27:02 +01:00
2016-02-22 22:24:25 +01:00
> selectItem :: Parser (ScalarExpr,Maybe Name)
> selectItem = (,) <$> scalarExpr <*> optionMaybe als
> where als = optional (keyword_ "as") *> name
2013-12-13 11:39:26 +01:00
2016-02-22 22:24:25 +01:00
> selectList :: Parser [(ScalarExpr,Maybe Name)]
2013-12-13 16:27:02 +01:00
> selectList = commaSep1 selectItem
2013-12-13 11:39:26 +01:00
2013-12-14 09:55:44 +01:00
== from
2013-12-14 12:05:02 +01:00
Here is the rough grammar for joins
2013-12-14 09:55:44 +01:00
tref
2013-12-14 12:05:02 +01:00
(cross | [natural] ([inner] | (left | right | full) [outer])) join
tref
2013-12-14 09:55:44 +01:00
[on expr | using (...)]
2014-09-13 09:45:45 +02:00
TODO: either use explicit 'operator precedence' parsers or build
expression parser for the 'tref operators' such as joins, lateral,
aliases.
> from :: Parser [TableRef]
> from = keyword_ "from" *> commaSep1 tref
2013-12-13 11:39:26 +01:00
> where
> -- TODO: use P (a->) for the join tref suffix
> -- chainl or buildexpressionparser
2013-12-14 12:05:02 +01:00
> tref = nonJoinTref >>= optionSuffix joinTrefSuffix
> nonJoinTref = choice
> [parens $ choice
> [TRQueryExpr <$> queryExpr
> ,TRParens <$> tref]
> ,TRLateral <$> (keyword_ "lateral"
> *> nonJoinTref)
> ,do
> n <- names
> choice [TRFunction n
2016-02-22 22:24:25 +01:00
> <$> parens (commaSep scalarExpr)
2016-02-21 22:48:55 +01:00
> ,pure $ TRSimple n]
> -- todo: I think you can only have outer joins inside the oj,
> -- not sure.
> ,TROdbc <$> (symbol "{" *> keyword_ "oj" *> tref <* symbol "}")
> ] <??> aliasSuffix
> aliasSuffix = fromAlias <$$> TRAlias
> joinTrefSuffix t =
> (TRJoin t <$> option False (True <$ keyword_ "natural")
> <*> joinType
2013-12-14 13:10:46 +01:00
> <*> nonJoinTref
> <*> optionMaybe joinCondition)
2013-12-14 12:05:02 +01:00
> >>= optionSuffix joinTrefSuffix
2014-04-19 14:10:45 +02:00
TODO: factor the join stuff to produce better error messages (and make
it more readable)
2014-04-18 11:28:05 +02:00
> joinType :: Parser JoinType
> joinType = choice
> [JCross <$ keyword_ "cross" <* keyword_ "join"
> ,JInner <$ keyword_ "inner" <* keyword_ "join"
> ,JLeft <$ keyword_ "left"
> <* optional (keyword_ "outer")
> <* keyword_ "join"
> ,JRight <$ keyword_ "right"
> <* optional (keyword_ "outer")
> <* keyword_ "join"
> ,JFull <$ keyword_ "full"
> <* optional (keyword_ "outer")
> <* keyword_ "join"
> ,JInner <$ keyword_ "join"]
> joinCondition :: Parser JoinCondition
2014-04-19 14:10:45 +02:00
> joinCondition = choice
2016-02-22 22:24:25 +01:00
> [keyword_ "on" >> JoinOn <$> scalarExpr
2014-04-19 14:10:45 +02:00
> ,keyword_ "using" >> JoinUsing <$> parens (commaSep1 name)]
2013-12-13 11:39:26 +01:00
2014-04-19 14:10:45 +02:00
> fromAlias :: Parser Alias
> fromAlias = Alias <$> tableAlias <*> columnAliases
> where
> tableAlias = optional (keyword_ "as") *> name
> columnAliases = optionMaybe $ parens $ commaSep1 name
2013-12-14 09:55:44 +01:00
== simple other parts
Parsers for where, group by, having, order by and limit, which are
pretty trivial.
2016-02-22 22:24:25 +01:00
> whereClause :: Parser ScalarExpr
> whereClause = keyword_ "where" *> scalarExpr
2013-12-13 11:39:26 +01:00
> groupByClause :: Parser [GroupingExpr]
2014-04-19 14:10:45 +02:00
> groupByClause = keywords_ ["group","by"] *> commaSep1 groupingExpression
2013-12-17 18:27:09 +01:00
> where
2014-04-19 14:10:45 +02:00
> groupingExpression = choice
> [keyword_ "cube" >>
2013-12-17 18:27:09 +01:00
> Cube <$> parens (commaSep groupingExpression)
> ,keyword_ "rollup" >>
2013-12-17 18:27:09 +01:00
> Rollup <$> parens (commaSep groupingExpression)
> ,GroupingParens <$> parens (commaSep groupingExpression)
2014-04-18 11:28:05 +02:00
> ,keywords_ ["grouping", "sets"] >>
2013-12-17 18:27:09 +01:00
> GroupingSets <$> parens (commaSep groupingExpression)
2016-02-22 22:24:25 +01:00
> ,SimpleGroup <$> scalarExpr
2013-12-17 18:27:09 +01:00
> ]
2013-12-13 11:39:26 +01:00
2016-02-22 22:24:25 +01:00
> having :: Parser ScalarExpr
> having = keyword_ "having" *> scalarExpr
2013-12-13 11:39:26 +01:00
> orderBy :: Parser [SortSpec]
2014-04-18 11:28:05 +02:00
> orderBy = keywords_ ["order","by"] *> commaSep1 ob
2013-12-13 16:08:10 +01:00
> where
> ob = SortSpec
2016-02-22 22:24:25 +01:00
> <$> scalarExpr
> <*> option DirDefault (choice [Asc <$ keyword_ "asc"
> ,Desc <$ keyword_ "desc"])
2013-12-17 17:28:31 +01:00
> <*> option NullsOrderDefault
2014-04-18 11:28:05 +02:00
> -- todo: left factor better
> (keyword_ "nulls" >>
2013-12-17 17:28:31 +01:00
> choice [NullsFirst <$ keyword "first"
> ,NullsLast <$ keyword "last"])
2013-12-13 11:39:26 +01:00
allows offset and fetch in either order
+ postgresql offset without row(s) and limit instead of fetch also
2016-02-22 22:24:25 +01:00
> offsetFetch :: Parser (Maybe ScalarExpr, Maybe ScalarExpr)
> offsetFetch = permute ((,) <$?> (Nothing, Just <$> offset)
> <|?> (Nothing, Just <$> fetch))
2013-12-13 16:27:02 +01:00
2016-02-22 22:24:25 +01:00
> offset :: Parser ScalarExpr
> offset = keyword_ "offset" *> scalarExpr
> <* option () (choice [keyword_ "rows"
> ,keyword_ "row"])
2016-02-22 22:24:25 +01:00
> fetch :: Parser ScalarExpr
2014-06-27 11:19:15 +02:00
> fetch = fetchFirst <|> limit
2014-04-18 22:51:05 +02:00
> where
> fetchFirst = guardDialect diFetchFirst
2016-02-22 22:24:25 +01:00
> *> fs *> scalarExpr <* ro
> fs = makeKeywordTree ["fetch first", "fetch next"]
> ro = makeKeywordTree ["rows only", "row only"]
2014-06-27 11:19:15 +02:00
> -- todo: not in ansi sql dialect
> limit = guardDialect diLimit *>
2016-02-22 22:24:25 +01:00
> keyword_ "limit" *> scalarExpr
2013-12-13 16:27:02 +01:00
2013-12-14 09:55:44 +01:00
== common table expressions
> with :: Parser QueryExpr
> with = keyword_ "with" >>
> With <$> option False (True <$ keyword_ "recursive")
> <*> commaSep1 withQuery <*> queryExpr
> where
2019-08-31 10:30:42 +02:00
> withQuery = (,) <$> (withAlias <* keyword_ "as")
2014-04-19 14:10:45 +02:00
> <*> parens queryExpr
2019-08-31 10:30:42 +02:00
> withAlias = Alias <$> name <*> columnAliases
> columnAliases = optionMaybe $ parens $ commaSep1 name
2013-12-13 16:27:02 +01:00
2013-12-14 09:55:44 +01:00
== query expression
This parser parses any query expression variant: normal select, cte,
and union, etc..
> queryExpr :: Parser QueryExpr
2014-04-19 14:10:45 +02:00
> queryExpr = choice
> [with
> ,chainr1 (choice [values,table, select]) setOp]
> where
> select = keyword_ "select" >>
> mkSelect
2014-05-09 22:26:18 +02:00
> <$> option SQDefault duplicates
> <*> selectList
> <*> optionMaybe tableExpression
> mkSelect d sl Nothing =
> makeSelect{qeSetQuantifier = d, qeSelectList = sl}
> mkSelect d sl (Just (TableExpression f w g h od ofs fe)) =
> Select d sl f w g h od ofs fe
> values = keyword_ "values"
2016-02-22 22:24:25 +01:00
> >> Values <$> commaSep (parens (commaSep scalarExpr))
> table = keyword_ "table" >> Table <$> names
local data type to help with parsing the bit after the select list,
called 'table expression' in the ansi sql grammar. Maybe this should
be in the public syntax?
> data TableExpression
> = TableExpression
> {_teFrom :: [TableRef]
2016-02-22 22:24:25 +01:00
> ,_teWhere :: Maybe ScalarExpr
> ,_teGroupBy :: [GroupingExpr]
2016-02-22 22:24:25 +01:00
> ,_teHaving :: Maybe ScalarExpr
> ,_teOrderBy :: [SortSpec]
2016-02-22 22:24:25 +01:00
> ,_teOffset :: Maybe ScalarExpr
> ,_teFetchFirst :: Maybe ScalarExpr}
> tableExpression :: Parser TableExpression
2014-04-19 14:10:45 +02:00
> tableExpression = mkTe <$> from
> <*> optionMaybe whereClause
> <*> option [] groupByClause
> <*> optionMaybe having
> <*> option [] orderBy
> <*> offsetFetch
> where
> mkTe f w g h od (ofs,fe) =
> TableExpression f w g h od ofs fe
2013-12-13 22:41:12 +01:00
> setOp :: Parser (QueryExpr -> QueryExpr -> QueryExpr)
> setOp = cq
> <$> setOpK
2014-05-09 22:26:18 +02:00
> <*> option SQDefault duplicates
> <*> corr
2014-04-19 14:10:45 +02:00
> where
> cq o d c q0 q1 = QueryExprSetOp q0 o d c q1
> setOpK = choice [Union <$ keyword_ "union"
> ,Intersect <$ keyword_ "intersect"
> ,Except <$ keyword_ "except"]
2014-04-19 14:10:45 +02:00
> <?> "set operator"
> corr = option Respectively (Corresponding <$ keyword_ "corresponding")
2013-12-13 11:39:26 +01:00
2013-12-14 09:55:44 +01:00
wrapper for query expr which ignores optional trailing semicolon.
TODO: change style
> topLevelQueryExpr :: Parser QueryExpr
> topLevelQueryExpr = queryExpr <??> (id <$ semi)
2013-12-14 09:55:44 +01:00
2015-08-01 22:16:26 +02:00
> topLevelStatement :: Parser Statement
> topLevelStatement = statement <??> (id <$ semi)
-------------------------
= Statements
> statement :: Parser Statement
> statement = choice
2015-08-02 19:56:39 +02:00
> [keyword_ "create" *> choice [createSchema
2015-08-02 22:52:01 +02:00
> ,createTable
2015-08-04 21:08:32 +02:00
> ,createView
2015-08-04 21:35:51 +02:00
> ,createDomain
> ,createSequence
2015-08-16 19:03:02 +02:00
> ,createRole
> ,createAssertion]
2015-08-04 21:08:32 +02:00
> ,keyword_ "alter" *> choice [alterTable
2015-08-04 21:35:51 +02:00
> ,alterDomain
> ,alterSequence]
2015-08-02 22:27:09 +02:00
> ,keyword_ "drop" *> choice [dropSchema
2015-08-02 22:52:01 +02:00
> ,dropTable
2015-08-04 21:08:32 +02:00
> ,dropView
2015-08-04 21:35:51 +02:00
> ,dropDomain
> ,dropSequence
2015-08-16 19:03:02 +02:00
> ,dropRole
> ,dropAssertion]
> ,delete
> ,truncateSt
> ,insert
> ,update
2015-08-04 21:53:08 +02:00
> ,startTransaction
> ,savepoint
> ,releaseSavepoint
> ,commit
> ,rollback
> ,grant
> ,revoke
> ,SelectStatement <$> queryExpr
> ]
> createSchema :: Parser Statement
> createSchema = keyword_ "schema" >>
> CreateSchema <$> names
2015-08-01 22:16:26 +02:00
> createTable :: Parser Statement
> createTable = keyword_ "table" >>
> CreateTable
> <$> names
> -- todo: is this order mandatory or is it a perm?
> <*> parens (commaSep1 (uncurry TableConstraintDef <$> tableConstraintDef
2015-08-02 19:56:39 +02:00
> <|> TableColumnDef <$> columnDef))
> columnDef :: Parser ColumnDef
> columnDef = ColumnDef <$> name <*> typeName
> <*> optionMaybe defaultClause
> <*> option [] (many1 colConstraintDef)
2015-08-01 22:16:26 +02:00
> where
> defaultClause = choice [
> keyword_ "default" >>
2016-02-22 22:24:25 +01:00
> DefaultClause <$> scalarExpr
> -- todo: left factor
> ,try (keywords_ ["generated","always","as"] >>
2016-02-22 22:24:25 +01:00
> GenerationClause <$> parens scalarExpr)
> ,keyword_ "generated" >>
> IdentityColumnSpec
> <$> (GeneratedAlways <$ keyword_ "always"
> <|> GeneratedByDefault <$ keywords_ ["by", "default"])
> <*> (keywords_ ["as", "identity"] *>
> option [] (parens sequenceGeneratorOptions))
> ]
> tableConstraintDef :: Parser (Maybe [Name], TableConstraint)
> tableConstraintDef =
> (,)
> <$> (optionMaybe (keyword_ "constraint" *> names))
> <*> (unique <|> primaryKey <|> check <|> references)
> where
> unique = keyword_ "unique" >>
> TableUniqueConstraint <$> parens (commaSep1 name)
> primaryKey = keywords_ ["primary", "key"] >>
> TablePrimaryKeyConstraint <$> parens (commaSep1 name)
2016-02-22 22:24:25 +01:00
> check = keyword_ "check" >> TableCheckConstraint <$> parens scalarExpr
> references = keywords_ ["foreign", "key"] >>
> (\cs ft ftcs m (u,d) -> TableReferencesConstraint cs ft ftcs m u d)
> <$> parens (commaSep1 name)
> <*> (keyword_ "references" *> names)
> <*> optionMaybe (parens $ commaSep1 name)
> <*> refMatch
> <*> refActions
> refMatch :: Parser ReferenceMatch
> refMatch = option DefaultReferenceMatch
> (keyword_ "match" *>
> choice [MatchFull <$ keyword_ "full"
> ,MatchPartial <$ keyword_ "partial"
> ,MatchSimple <$ keyword_ "simple"])
> refActions :: Parser (ReferentialAction,ReferentialAction)
> refActions = permute ((,) <$?> (DefaultReferentialAction, onUpdate)
> <|?> (DefaultReferentialAction, onDelete))
> where
> -- todo: left factor?
> onUpdate = try (keywords_ ["on", "update"]) *> referentialAction
> onDelete = try (keywords_ ["on", "delete"]) *> referentialAction
> referentialAction = choice [
> RefCascade <$ keyword_ "cascade"
> -- todo: left factor?
> ,RefSetNull <$ try (keywords_ ["set", "null"])
> ,RefSetDefault <$ try (keywords_ ["set", "default"])
> ,RefRestrict <$ keyword_ "restrict"
> ,RefNoAction <$ keywords_ ["no", "action"]]
> colConstraintDef :: Parser ColConstraintDef
> colConstraintDef =
> ColConstraintDef
> <$> (optionMaybe (keyword_ "constraint" *> names))
> <*> (notNull <|> unique <|> primaryKey <|> check <|> references)
> where
> notNull = ColNotNullConstraint <$ keywords_ ["not", "null"]
> unique = ColUniqueConstraint <$ keyword_ "unique"
> primaryKey = ColPrimaryKeyConstraint <$ keywords_ ["primary", "key"]
2016-02-22 22:24:25 +01:00
> check = keyword_ "check" >> ColCheckConstraint <$> parens scalarExpr
> references = keyword_ "references" >>
> (\t c m (ou,od) -> ColReferencesConstraint t c m ou od)
> <$> names
> <*> optionMaybe (parens name)
> <*> refMatch
> <*> refActions
slightly hacky parser for signed integers
> signedInteger :: Parser Integer
2015-08-02 19:56:39 +02:00
> signedInteger =
> (*) <$> option 1 (1 <$ symbol "+" <|> (-1) <$ symbol "-")
> <*> unsignedInteger
2015-08-04 21:35:51 +02:00
> sequenceGeneratorOptions :: Parser [SequenceGeneratorOption]
> sequenceGeneratorOptions =
> -- todo: could try to combine exclusive options
> -- such as cycle and nocycle
> -- sort out options which are sometimes not allowed
> -- as datatype, and restart with
> permute ((\a b c d e f g h j k -> catMaybes [a,b,c,d,e,f,g,h,j,k])
> <$?> nj startWith
> <|?> nj dataType
> <|?> nj restart
> <|?> nj incrementBy
> <|?> nj maxValue
> <|?> nj noMaxValue
> <|?> nj minValue
> <|?> nj noMinValue
> <|?> nj scycle
> <|?> nj noCycle
> )
> where
> nj p = (Nothing,Just <$> p)
> startWith = keywords_ ["start", "with"] >>
> SGOStartWith <$> signedInteger
> dataType = keyword_ "as" >>
> SGODataType <$> typeName
> restart = keyword_ "restart" >>
> SGORestart <$> optionMaybe (keyword_ "with" *> signedInteger)
> incrementBy = keywords_ ["increment", "by"] >>
> SGOIncrementBy <$> signedInteger
> maxValue = keyword_ "maxvalue" >>
> SGOMaxValue <$> signedInteger
> noMaxValue = SGONoMaxValue <$ try (keywords_ ["no","maxvalue"])
> minValue = keyword_ "minvalue" >>
> SGOMinValue <$> signedInteger
> noMinValue = SGONoMinValue <$ try (keywords_ ["no","minvalue"])
> scycle = SGOCycle <$ keyword_ "cycle"
> noCycle = SGONoCycle <$ try (keywords_ ["no","cycle"])
2015-08-02 19:56:39 +02:00
> alterTable :: Parser Statement
> alterTable = keyword_ "table" >>
> -- the choices have been ordered so that it works
> AlterTable <$> names <*> choice [addConstraint
> ,dropConstraint
> ,addColumnDef
> ,alterColumn
> ,dropColumn
> ]
2015-08-02 19:56:39 +02:00
> where
> addColumnDef = try (keyword_ "add"
> *> optional (keyword_ "column")) >>
> AddColumnDef <$> columnDef
> alterColumn = keyword_ "alter" >> optional (keyword_ "column") >>
> name <**> choice [setDefault
> ,dropDefault
> ,setNotNull
> ,dropNotNull
> ,setDataType]
> setDefault :: Parser (Name -> AlterTableAction)
> -- todo: left factor
> setDefault = try (keywords_ ["set","default"]) >>
2016-02-22 22:24:25 +01:00
> scalarExpr <$$> AlterColumnSetDefault
> dropDefault = AlterColumnDropDefault <$ try (keywords_ ["drop","default"])
> setNotNull = AlterColumnSetNotNull <$ try (keywords_ ["set","not","null"])
> dropNotNull = AlterColumnDropNotNull <$ try (keywords_ ["drop","not","null"])
> setDataType = try (keywords_ ["set","data","type"]) >>
> typeName <$$> AlterColumnSetDataType
> dropColumn = try (keyword_ "drop" *> optional (keyword_ "column")) >>
> DropColumn <$> name <*> dropBehaviour
> -- todo: left factor, this try is especially bad
> addConstraint = try (keyword_ "add" >>
> uncurry AddTableConstraintDef <$> tableConstraintDef)
> dropConstraint = try (keywords_ ["drop","constraint"]) >>
> DropTableConstraintDef <$> names <*> dropBehaviour
2015-08-01 22:16:26 +02:00
> dropSchema :: Parser Statement
> dropSchema = keyword_ "schema" >>
2015-08-02 22:27:09 +02:00
> DropSchema <$> names <*> dropBehaviour
> dropTable :: Parser Statement
> dropTable = keyword_ "table" >>
> DropTable <$> names <*> dropBehaviour
2015-08-02 22:52:01 +02:00
> createView :: Parser Statement
> createView =
> CreateView
> <$> (option False (True <$ keyword_ "recursive") <* keyword_ "view")
> <*> names
> <*> optionMaybe (parens (commaSep1 name))
> <*> (keyword_ "as" *> queryExpr)
> <*> optionMaybe (choice [
> -- todo: left factor
> DefaultCheckOption <$ try (keywords_ ["with", "check", "option"])
> ,CascadedCheckOption <$ try (keywords_ ["with", "cascaded", "check", "option"])
> ,LocalCheckOption <$ try (keywords_ ["with", "local", "check", "option"])
> ])
> dropView :: Parser Statement
> dropView = keyword_ "view" >>
> DropView <$> names <*> dropBehaviour
2015-08-04 21:08:32 +02:00
> createDomain :: Parser Statement
> createDomain = keyword_ "domain" >>
> CreateDomain
> <$> names
> <*> (optional (keyword_ "as") *> typeName)
2016-02-22 22:24:25 +01:00
> <*> optionMaybe (keyword_ "default" *> scalarExpr)
2015-08-04 21:08:32 +02:00
> <*> many con
> where
> con = (,) <$> optionMaybe (keyword_ "constraint" *> names)
2016-02-22 22:24:25 +01:00
> <*> (keyword_ "check" *> parens scalarExpr)
2015-08-04 21:08:32 +02:00
> alterDomain :: Parser Statement
> alterDomain = keyword_ "domain" >>
> AlterDomain
> <$> names
> <*> (setDefault <|> constraint
> <|> (keyword_ "drop" *> (dropDefault <|> dropConstraint)))
> where
2016-02-22 22:24:25 +01:00
> setDefault = keywords_ ["set", "default"] >> ADSetDefault <$> scalarExpr
2015-08-04 21:08:32 +02:00
> constraint = keyword_ "add" >>
> ADAddConstraint
> <$> optionMaybe (keyword_ "constraint" *> names)
2016-02-22 22:24:25 +01:00
> <*> (keyword_ "check" *> parens scalarExpr)
2015-08-04 21:08:32 +02:00
> dropDefault = ADDropDefault <$ keyword_ "default"
> dropConstraint = keyword_ "constraint" >> ADDropConstraint <$> names
> dropDomain :: Parser Statement
> dropDomain = keyword_ "domain" >>
> DropDomain <$> names <*> dropBehaviour
2015-08-02 22:52:01 +02:00
2015-08-04 21:35:51 +02:00
> createSequence :: Parser Statement
> createSequence = keyword_ "sequence" >>
> CreateSequence
> <$> names
> <*> sequenceGeneratorOptions
> alterSequence :: Parser Statement
> alterSequence = keyword_ "sequence" >>
> AlterSequence
> <$> names
> <*> sequenceGeneratorOptions
> dropSequence :: Parser Statement
> dropSequence = keyword_ "sequence" >>
> DropSequence <$> names <*> dropBehaviour
2015-08-16 19:03:02 +02:00
> createAssertion :: Parser Statement
> createAssertion = keyword_ "assertion" >>
> CreateAssertion
> <$> names
2016-02-22 22:24:25 +01:00
> <*> (keyword_ "check" *> parens scalarExpr)
2015-08-16 19:03:02 +02:00
> dropAssertion :: Parser Statement
> dropAssertion = keyword_ "assertion" >>
> DropAssertion <$> names <*> dropBehaviour
2015-08-02 22:52:01 +02:00
-----------------
= dml
> delete :: Parser Statement
> delete = keywords_ ["delete","from"] >>
> Delete
> <$> names
> <*> optionMaybe (optional (keyword_ "as") *> name)
2016-02-22 22:24:25 +01:00
> <*> optionMaybe (keyword_ "where" *> scalarExpr)
> truncateSt :: Parser Statement
> truncateSt = keywords_ ["truncate", "table"] >>
> Truncate
> <$> names
> <*> option DefaultIdentityRestart
> (ContinueIdentity <$ keywords_ ["continue","identity"]
> <|> RestartIdentity <$ keywords_ ["restart","identity"])
> insert :: Parser Statement
> insert = keywords_ ["insert", "into"] >>
> Insert
> <$> names
> <*> optionMaybe (parens $ commaSep1 name)
> <*> (DefaultInsertValues <$ keywords_ ["default", "values"]
> <|> InsertQuery <$> queryExpr)
> update :: Parser Statement
> update = keywords_ ["update"] >>
> Update
> <$> names
> <*> optionMaybe (optional (keyword_ "as") *> name)
> <*> (keyword_ "set" *> commaSep1 setClause)
2016-02-22 22:24:25 +01:00
> <*> optionMaybe (keyword_ "where" *> scalarExpr)
> where
> setClause = multipleSet <|> singleSet
> multipleSet = SetMultiple
> <$> parens (commaSep1 names)
2016-02-22 22:24:25 +01:00
> <*> (symbol "=" *> parens (commaSep1 scalarExpr))
> singleSet = Set
> <$> names
2016-02-22 22:24:25 +01:00
> <*> (symbol "=" *> scalarExpr)
> dropBehaviour :: Parser DropBehaviour
> dropBehaviour =
> option DefaultDropBehaviour
> (Restrict <$ keyword_ "restrict"
> <|> Cascade <$ keyword_ "cascade")
2015-08-04 21:53:08 +02:00
-----------------------------
2015-08-09 19:13:11 +02:00
= transaction management
2015-08-04 21:53:08 +02:00
> startTransaction :: Parser Statement
> startTransaction = StartTransaction <$ keywords_ ["start","transaction"]
> savepoint :: Parser Statement
> savepoint = keyword_ "savepoint" >>
> Savepoint <$> name
> releaseSavepoint :: Parser Statement
> releaseSavepoint = keywords_ ["release","savepoint"] >>
> ReleaseSavepoint <$> name
> commit :: Parser Statement
> commit = Commit <$ keyword_ "commit" <* optional (keyword_ "work")
> rollback :: Parser Statement
> rollback = keyword_ "rollback" >> optional (keyword_ "work") >>
> Rollback <$> optionMaybe (keywords_ ["to", "savepoint"] *> name)
------------------------------
= Access control
TODO: fix try at the 'on'
> grant :: Parser Statement
> grant = keyword_ "grant" >> (try priv <|> role)
> where
> priv = GrantPrivilege
> <$> commaSep privilegeAction
> <*> (keyword_ "on" *> privilegeObject)
> <*> (keyword_ "to" *> commaSep name)
> <*> option WithoutGrantOption
> (WithGrantOption <$ keywords_ ["with","grant","option"])
> role = GrantRole
> <$> commaSep name
> <*> (keyword_ "to" *> commaSep name)
> <*> option WithoutAdminOption
> (WithAdminOption <$ keywords_ ["with","admin","option"])
> createRole :: Parser Statement
> createRole = keyword_ "role" >>
> CreateRole <$> name
> dropRole :: Parser Statement
> dropRole = keyword_ "role" >>
> DropRole <$> name
TODO: fix try at the 'on'
> revoke :: Parser Statement
> revoke = keyword_ "revoke" >> (try priv <|> role)
> where
> priv = RevokePrivilege
> <$> option NoGrantOptionFor
> (GrantOptionFor <$ keywords_ ["grant","option","for"])
> <*> commaSep privilegeAction
> <*> (keyword_ "on" *> privilegeObject)
> <*> (keyword_ "from" *> commaSep name)
> <*> dropBehaviour
> role = RevokeRole
> <$> option NoAdminOptionFor
> (AdminOptionFor <$ keywords_ ["admin","option", "for"])
> <*> commaSep name
> <*> (keyword_ "from" *> commaSep name)
> <*> dropBehaviour
> privilegeAction :: Parser PrivilegeAction
> privilegeAction = choice
> [PrivAll <$ keywords_ ["all","privileges"]
> ,keyword_ "select" >>
> PrivSelect <$> option [] (parens $ commaSep name)
> ,PrivDelete <$ keyword_ "delete"
> ,PrivUsage <$ keyword_ "usage"
> ,PrivTrigger <$ keyword_ "trigger"
> ,PrivExecute <$ keyword_ "execute"
> ,keyword_ "insert" >>
> PrivInsert <$> option [] (parens $ commaSep name)
> ,keyword_ "update" >>
> PrivUpdate <$> option [] (parens $ commaSep name)
> ,keyword_ "references" >>
> PrivReferences <$> option [] (parens $ commaSep name)
> ]
> privilegeObject :: Parser PrivilegeObject
> privilegeObject = choice
> [keyword_ "domain" >> PrivDomain <$> names
> ,keyword_ "type" >> PrivType <$> names
> ,keyword_ "sequence" >> PrivSequence <$> names
> ,keywords_ ["specific","function"] >> PrivFunction <$> names
> ,optional (keyword_ "table") >> PrivTable <$> names
> ]
----------------------------
wrapper to parse a series of statements. They must be separated by
semicolon, but for the last statement, the trailing semicolon is
optional.
2013-12-14 09:55:44 +01:00
TODO: change style
> statements :: Parser [Statement]
> statements = (:[]) <$> statement
> >>= optionSuffix ((semi *>) . pure)
> >>= optionSuffix (\p -> (p++) <$> statements)
----------------------------------------------
= multi keyword helper
This helper is to help parsing multiple options of multiple keywords
with similar prefixes, e.g. parsing 'is null' and 'is not null'.
use to left factor/ improve:
typed literal and general identifiers
not like, not in, not between operators
help with factoring keyword functions and other app-likes
the join keyword sequences
fetch first/next
row/rows only
There is probably a simpler way of doing this but I am a bit
thick.
> makeKeywordTree :: [String] -> Parser [String]
> makeKeywordTree sets =
> parseTrees (sort $ map words sets)
> where
> parseTrees :: [[String]] -> Parser [String]
> parseTrees ws = do
> let gs :: [[[String]]]
> gs = groupBy ((==) `on` safeHead) ws
> choice $ map parseGroup gs
> parseGroup :: [[String]] -> Parser [String]
> parseGroup l@((k:_):_) = do
> keyword_ k
> let tls = catMaybes $ map safeTail l
> pr = (k:) <$> parseTrees tls
> if (or $ map null tls)
> then pr <|> pure [k]
> else pr
> parseGroup _ = guard False >> error "impossible"
> safeHead (x:_) = Just x
> safeHead [] = Nothing
> safeTail (_:x) = Just x
> safeTail [] = Nothing
2013-12-13 11:39:26 +01:00
------------------------------------------------
2015-07-31 23:04:18 +02:00
= lexing
TODO: push checks into here:
keyword blacklists
unsigned integer match
symbol matching
keyword matching
> stringTok :: Parser (String,String,String)
> stringTok = mytoken (\tok ->
2015-07-31 23:04:18 +02:00
> case tok of
> L.SqlString s e t -> Just (s,e,t)
2015-07-31 23:04:18 +02:00
> _ -> Nothing)
> singleQuotesOnlyStringTok :: Parser String
> singleQuotesOnlyStringTok = mytoken (\tok ->
2015-07-31 23:04:18 +02:00
> case tok of
> L.SqlString "'" "'" t -> Just t
2015-07-31 23:04:18 +02:00
> _ -> Nothing)
This is to support SQL strings where you can write
'part of a string' ' another part'
and it will parse as a single string
It is only allowed when all the strings are quoted with ' atm.
> stringTokExtend :: Parser (String,String,String)
2015-07-31 23:04:18 +02:00
> stringTokExtend = do
> (s,e,x) <- stringTok
2015-07-31 23:04:18 +02:00
> choice [
> do
> guard (s == "'" && e == "'")
> (s',e',y) <- stringTokExtend
> guard (s' == "'" && e' == "'")
> return $ (s,e,x ++ y)
> ,return (s,e,x)
2015-07-31 23:04:18 +02:00
> ]
2015-07-31 23:04:18 +02:00
> hostParamTok :: Parser String
> hostParamTok = mytoken (\tok ->
> case tok of
> L.PrefixedVariable c p -> Just (c:p)
2015-07-31 23:04:18 +02:00
> _ -> Nothing)
> positionalArgTok :: Parser Int
> positionalArgTok = mytoken (\tok ->
> case tok of
> L.PositionalArg p -> Just p
> _ -> Nothing)
> sqlNumberTok :: Bool -> Parser String
> sqlNumberTok intOnly = mytoken (\tok ->
2015-07-31 23:04:18 +02:00
> case tok of
> L.SqlNumber p | not intOnly || all isDigit p -> Just p
2015-07-31 23:04:18 +02:00
> _ -> Nothing)
> symbolTok :: Maybe String -> Parser String
> symbolTok sym = mytoken (\tok ->
> case (sym,tok) of
> (Nothing, L.Symbol p) -> Just p
> (Just s, L.Symbol p) | s == p -> Just p
2015-07-31 23:04:18 +02:00
> _ -> Nothing)
2016-02-12 13:13:47 +01:00
> identifierTok :: [String] -> Parser (Maybe (String,String), String)
> identifierTok blackList = mytoken (\tok ->
> case tok of
2019-08-31 10:13:09 +02:00
> L.Identifier q@(Just {}) p -> Just (q,p)
2016-02-12 13:13:47 +01:00
> L.Identifier q p | map toLower p `notElem` blackList -> Just (q,p)
2015-07-31 23:04:18 +02:00
> _ -> Nothing)
2016-02-12 13:13:47 +01:00
> unquotedIdentifierTok :: [String] -> Maybe String -> Parser String
> unquotedIdentifierTok blackList kw = mytoken (\tok ->
> case (kw,tok) of
> (Nothing, L.Identifier Nothing p) | map toLower p `notElem` blackList -> Just p
> (Just k, L.Identifier Nothing p) | k == map toLower p -> Just p
2015-07-31 23:04:18 +02:00
> _ -> Nothing)
> mytoken :: (L.Token -> Maybe a) -> Parser a
> mytoken test = token showToken posToken testToken
2013-12-14 09:55:44 +01:00
> where
2015-07-31 23:04:18 +02:00
> showToken (_,tok) = show tok
> posToken ((a,b,c),_) = newPos a b c
> testToken (_,tok) = test tok
2013-12-17 12:21:36 +01:00
> unsignedInteger :: Parser Integer
> unsignedInteger = read <$> sqlNumberTok True <?> "natural number"
2014-04-17 18:27:18 +02:00
todo: work out the symbol parsing better
> symbol :: String -> Parser String
> symbol s = symbolTok (Just s) <?> s
2015-07-31 23:04:18 +02:00
> singleCharSymbol :: Char -> Parser Char
> singleCharSymbol c = c <$ symbol [c]
> questionMark :: Parser Char
2015-07-31 23:04:18 +02:00
> questionMark = singleCharSymbol '?' <?> "question mark"
> openParen :: Parser Char
2015-07-31 23:04:18 +02:00
> openParen = singleCharSymbol '('
> closeParen :: Parser Char
2015-07-31 23:04:18 +02:00
> closeParen = singleCharSymbol ')'
> openBracket :: Parser Char
2015-07-31 23:04:18 +02:00
> openBracket = singleCharSymbol '['
> closeBracket :: Parser Char
2015-07-31 23:04:18 +02:00
> closeBracket = singleCharSymbol ']'
> comma :: Parser Char
2015-07-31 23:04:18 +02:00
> comma = singleCharSymbol ','
> semi :: Parser Char
2015-07-31 23:04:18 +02:00
> semi = singleCharSymbol ';'
2013-12-13 11:39:26 +01:00
= helper functions
> keyword :: String -> Parser String
2016-02-12 13:13:47 +01:00
> keyword k = unquotedIdentifierTok [] (Just k) <?> k
2014-04-18 11:28:05 +02:00
helper function to improve error messages
> keywords_ :: [String] -> Parser ()
> keywords_ ks = mapM_ keyword_ ks <?> intercalate " " ks
> parens :: Parser a -> Parser a
> parens = between openParen closeParen
> brackets :: Parser a -> Parser a
> brackets = between openBracket closeBracket
> commaSep :: Parser a -> Parser [a]
> commaSep = (`sepBy` comma)
> keyword_ :: String -> Parser ()
> keyword_ = void . keyword
> symbol_ :: String -> Parser ()
> symbol_ = void . symbol
> commaSep1 :: Parser a -> Parser [a]
> commaSep1 = (`sepBy1` comma)
2013-12-14 09:55:44 +01:00
> blacklist :: Dialect -> [String]
> blacklist d = diKeywords d
2013-12-13 11:39:26 +01:00
These blacklisted names are mostly needed when we parse something with
an optional alias, e.g. select a a from t. If we write select a from
t, we have to make sure the from isn't parsed as an alias. I'm not
sure what other places strictly need the blacklist, and in theory it
could be tuned differently for each place the identifierString/
2014-04-19 14:10:45 +02:00
identifier parsers are used to only blacklist the bare
minimum. Something like this might be needed for dialect support, even
if it is pretty silly to use a keyword as an unquoted identifier when
there is a quoting syntax as well.
2013-12-14 00:14:23 +01:00
The standard has a weird mix of reserved keywords and unreserved
keywords (I'm not sure what exactly being an unreserved keyword
means).
The current approach tries to have everything which is a keyword only
in the keyword list - so it can only be used in some other context if
quoted. If something is a 'ansi keyword', but appears only as an
identifier or function name for instance in the syntax (or something
that looks identical to this), then it isn't treated as a keyword at
all. When there is some overlap (e.g. 'set'), then there is either
special case parsing code to handle this (in the case of set), or it
is not treated as a keyword (not perfect, but if it more or less
works, ok for now).
It is possible to have a problem if you remove something which is a
keyword from this list, and still want to parse statements using it
as a keyword - for instance, removing things like 'from' or 'as',
will likely mean many things don't parse anymore.
-----------
bit hacky, used to make the dialect available during parsing so
different parsers can be used for different dialects
> type ParseState = Dialect
> type Token = ((String,Int,Int),L.Token)
2015-07-31 23:04:18 +02:00
> type Parser = GenParser Token ParseState
> guardDialect :: (Dialect -> Bool) -> Parser ()
> guardDialect f = do
> d <- getState
> guard (f d)
2014-09-13 09:45:45 +02:00
TODO: the ParseState and the Dialect argument should be turned into a
flags struct. Part (or all?) of this struct is the dialect
information, but each dialect has different versions + a big set of
flags to control syntax variations within a version of a product
dialect (for instance, string and identifier parsing rules vary from
dialect to dialect and version to version, and most or all SQL DBMSs
appear to have a set of flags to further enable or disable variations
for quoting and escaping strings and identifiers).
The dialect stuff can also be used for custom options: e.g. to only
parse dml for instance.