408 lines
11 KiB
Plaintext
408 lines
11 KiB
Plaintext
medium tasks next release
|
|
|
|
review alters, and think about adding rename versions
|
|
which are really common and useful, but not in ansi
|
|
https://github.com/JakeWheat/simple-sql-parser/issues/20
|
|
|
|
try to get some control over the pretty printing and the error
|
|
messages by creating some dumps of pretty printing and error messages,
|
|
then can rerun these every so often to see how they've changed
|
|
|
|
-> expose in the dialect:
|
|
keywords which can (also?) appear as identifiers in scalar expressions
|
|
keywords which can (also?) appear as app-likes - this is already implemented
|
|
|
|
|
|
finish off going through the keyword list
|
|
|
|
do more examples
|
|
what are the use cases?
|
|
sql generator - queries
|
|
sql generator - ddl
|
|
parsing some sql - for what purpose
|
|
generating documentation of ddl
|
|
write some sort of trivial sql engine or wrapper around something?
|
|
write something that takes sql, modifies it, and outputs the result
|
|
lint checker?
|
|
|
|
do an example of adding some new syntax
|
|
-> seems quite a few people are using this
|
|
and there are some feature requests
|
|
try to give people a path to implement features themselves
|
|
|
|
goals:
|
|
|
|
1. if someone might want to use this, give them some toy examples to
|
|
help bootstrap them
|
|
|
|
2. see if can encourage people who want some missing sql to add it
|
|
themselves
|
|
|
|
|
|
|
|
review main missing sql bits - focus on more mainstream things
|
|
could also review main dialects
|
|
|
|
|
|
syntax from hssqlppp:
|
|
query hints, join hints
|
|
|
|
unescaping identifiers and strings
|
|
continuation strings testing
|
|
|
|
add tests for comment pretty printing:
|
|
use pretty then lex
|
|
|
|
work on better dialect design: more basic customizability and rule /
|
|
callback driven
|
|
|
|
review/fix documentation and website
|
|
fix the groups for generated tests
|
|
|
|
check the .cabal file module lists
|
|
|
|
|
|
medium tasks next release + 1
|
|
add annotation
|
|
lots more negative tests especially for lexing, and for dialects
|
|
escape, uescape
|
|
post hoc fixity
|
|
switch pretty printing to use ansi-wl-pprint
|
|
http://conscientiousprogrammer.com/blog/2015/12/17/24-days-of-hackage-2015-day-17-ansi-wl-pprint-avoiding-string-hacking/
|
|
|
|
|
|
error message analysis:
|
|
start with a set of bad sql, generate & write
|
|
get error messages:
|
|
simplified ssp parser
|
|
tutorial parser
|
|
hssqlppp
|
|
and also:
|
|
postgres
|
|
mysql
|
|
sqlserver
|
|
oracle
|
|
db2
|
|
vertica?
|
|
evaluate other parsing libs for error messages and general
|
|
feasibility, shortlist is:
|
|
megaparsec
|
|
trifecta
|
|
uuparsinglib
|
|
other desirables from parsing lib:
|
|
incremental parsing
|
|
context dependent lexer switch
|
|
continue after error
|
|
|
|
create some benchmarks (to measure performance when modifying for
|
|
error messages, and to compare different parser libs for instance)
|
|
|
|
use quickcheck in lexing
|
|
|
|
What will make this library nice and complete:
|
|
List of all the SQL that it doesn't support
|
|
annotation, with positions coming from the parser
|
|
dml
|
|
ddl
|
|
procedural sql
|
|
dialects: reasonable support for sql server and oracle, and maybe also
|
|
postgres, mysql, teradata, redshift, sqlite, db2, sap stuff, etc.
|
|
good work on error messages
|
|
fixity code + get it right
|
|
review names of syntax
|
|
defaults handled better (use default/nothing instead of substituting
|
|
in the default)
|
|
evaluate uu parsing lib -> could at least remove need to do left
|
|
factoring, and maybe help make better error messages also
|
|
-----
|
|
|
|
work on reasonable subset of sql which is similar to the current
|
|
subset and smaller than the complete 2011 target: describe the
|
|
exact target set for the next release
|
|
|
|
improve the dialect testing: add notes on what to do
|
|
|
|
position annotation in the syntax
|
|
|
|
simple stuff for error message and pretty printing monitoring:
|
|
|
|
create a sample set of valid statements to pretty print
|
|
pretty print these
|
|
compare every so often to catch regressions and approve improvements
|
|
start with tpch, and then add some others
|
|
|
|
same with invalid statements to see the error messages
|
|
start with some simple scalar exprs and a big query expr which has
|
|
stuff (either tokens, whitespace or junk strings)
|
|
semi-systematically added and/or removed
|
|
|
|
fixing the non idiomatic (pun!) suffix parsing:
|
|
typename parsing
|
|
identifier/app/agg/window parsing
|
|
join parsing in trefs (use chain? - tricky because of postfix onExpr)
|
|
top level and queryexprs parsing
|
|
|
|
review names in the syntax for correspondence with sql standard, avoid
|
|
gratuitous differences
|
|
|
|
touch up the expr hack as best as can, start thinking about
|
|
replacement for buildExprParser, maybe this can be a separate
|
|
general package, or maybe something like this already exists
|
|
|
|
careful review of token parses wrt trailing delimiters/junk - already
|
|
caught a few issues like this incidentally when working on other
|
|
stuff
|
|
|
|
undo mess in the code created by adding lots of new support:
|
|
much more documentation
|
|
refactor crufty bits
|
|
reorder the code
|
|
reconsider the names and structure of the constructors in the syntax
|
|
refactor the typename parser - it's a real mess
|
|
fix the lexing
|
|
|
|
add documentation in Parser.lhs on the left factoring/error handling
|
|
approach
|
|
|
|
fixes:
|
|
|
|
keyword tree, add explicit result then can use for joins also
|
|
|
|
keyword tree support prefix mode so can start from already parsed
|
|
token
|
|
|
|
left factor/try removal summary (this list needs updating):
|
|
|
|
identifier starts:
|
|
interval literal
|
|
character set literal
|
|
typed literals, multikeywords
|
|
identifier
|
|
app, agg, window
|
|
keyword function
|
|
issues in the special op internals
|
|
not between + other ops: needs new expression parsing
|
|
not in also
|
|
in suffix also
|
|
lots of overlap with binary and postfix multi keyword operators
|
|
quantified comparison also
|
|
issues in the typename parsing
|
|
dot in identifiers and as operator
|
|
issues in the symbol parser
|
|
hardcode all the symbols in the symbol parser/split?
|
|
conflict with in suffix and in in position
|
|
|
|
rules for changing the multi keyword parsing:
|
|
if a keyword must be followed by another
|
|
e.g. left join, want to refactor to produce 'expected "left join"'
|
|
if the keyword is optionally followed by another, e.g. with
|
|
recursive, then don't do this.
|
|
|
|
change join defaults to be defaults
|
|
|
|
|
|
rough SQL 2011 todo, including tests to write:
|
|
|
|
review the commented out reserved keyword entries and work out how to
|
|
fix
|
|
|
|
test case insensitvity and case preservation
|
|
|
|
big areas:
|
|
window functions
|
|
nested window functions
|
|
case
|
|
|
|
table ref: tablesample, time period spec, only, unnest, table, lateral
|
|
bug
|
|
joined table: partitioned joins
|
|
group by: set quantifier
|
|
window clause
|
|
|
|
other areas:
|
|
unicode escape, strings and idens
|
|
character set behaviour review
|
|
datetime literals
|
|
mixed quoting identifier chains
|
|
names/identifiers careful review
|
|
general value bits
|
|
collate for
|
|
numeric val fn
|
|
string exp fn
|
|
datetime exp fn
|
|
interval exp fn
|
|
rows
|
|
interval qualifier
|
|
with
|
|
setop
|
|
order/offset/fetch
|
|
search/cycle
|
|
preds:
|
|
between
|
|
in
|
|
like
|
|
similar
|
|
regex like?
|
|
null
|
|
normalize
|
|
match
|
|
overlaps
|
|
distinct
|
|
member
|
|
submultiset
|
|
period
|
|
|
|
alias for * in select list
|
|
|
|
create list of unsupported syntax: xml, ref, subtypes, modules?
|
|
|
|
---
|
|
|
|
|
|
|
|
after next release
|
|
|
|
medium term goals:
|
|
1. replace parser and syntax in hssqlppp with this code (keep two
|
|
separate packages in sync)
|
|
2. this replacement should have better error messages, much more
|
|
complete ansi sql 2011 support, and probably will have reasonable
|
|
support for these dialects: mssql, oracle and teradata.
|
|
|
|
review areas where this parser is too permissive, e.g. value
|
|
expressions allowed where column reference names only should be
|
|
allowed, such as group by, order by (perhaps there can be a flag or
|
|
warnings or something), unqualified asterisk in select list
|
|
|
|
fix the expression parser completely: the realistic way is to adjust
|
|
for precedence and associativity after parsing since the concrete
|
|
syntax is so messy. should also use this expression parser for
|
|
parsing joins and for set operations, maybe other areas.
|
|
|
|
table expression in syntax:
|
|
QueryExpr = Select SelectList (Maybe TableExpr)
|
|
and the TableExpr contains all the other bits?
|
|
|
|
change the booleans in the ast to better types for less ambiguity?
|
|
|
|
decide how to handle character set literals and identifiers: don't
|
|
have any intention of actually supporting switching character sets
|
|
in the middle of parsing so maybe this would be better disabled?
|
|
|
|
review places in the parse which should allow only a fixed set of
|
|
identifiers (e.g. in interval literals), keep in mind other
|
|
dialects and extensibility
|
|
|
|
decide whether to represent numeric literals better, instead of a
|
|
single string - break up into parts, or parse to a Decimal or
|
|
something
|
|
|
|
|
|
= future big feature summary
|
|
|
|
all ansi sql queries
|
|
completely working expression tree parsing
|
|
error messages, left factor
|
|
dml, ddl, procedural sql
|
|
position annotation
|
|
type checker/ etc.
|
|
lexer
|
|
dialects
|
|
quasi quotes
|
|
typesafe sql dbms wrapper support for haskell
|
|
extensibility
|
|
performance analysis
|
|
|
|
try out uu-parsing or polyparse, especially wrt error message
|
|
improvements
|
|
|
|
= stuff
|
|
|
|
try and use the proper css theme
|
|
create a header like in the haddock with simple-sql-parser +
|
|
contents link
|
|
change the toc gen so that it works the same as in haddock (same
|
|
div, no links on the actual titles
|
|
fix the page margins, and the table stuff: patches to the css?
|
|
|
|
release checklist:
|
|
hlint
|
|
haddock review
|
|
spell check
|
|
update changelog
|
|
update website text
|
|
regenerate the examples on the index.txt
|
|
|
|
= Later general tasks:
|
|
|
|
docs
|
|
|
|
add preamble to the rendered test page
|
|
|
|
add links from the supported sql page to the rendered test page for
|
|
each section -> have to section up the tests some more
|
|
|
|
testing
|
|
|
|
review tests to copy from hssqlppp
|
|
|
|
add lots more tests using SQL from the xb2 manual
|
|
|
|
much more table reference tests, for joins and aliases etc.?
|
|
|
|
review internal sql collection for more syntax/tests
|
|
|
|
other
|
|
|
|
----
|
|
|
|
demo program: convert tpch to sql server syntax exe processor
|
|
|
|
run through other manuals for example queries and features: sql in a
|
|
nutshell, sql guide, sql reference guide, sql standard, sql server
|
|
manual, oracle manual, teradata manual + re-through postgresql
|
|
manual and make notes in each case of all syntax and which isn't
|
|
currently supported also.
|
|
|
|
check the order of exports, imports and functions/cases in the files
|
|
fix up the import namespaces/explicit names nicely
|
|
|
|
ast checker: checks the ast represents valid syntax, the parser
|
|
doesn't check as much as it could, and this can also be used to
|
|
check generated trees. Maybe this doesn't belong in this package
|
|
though?
|
|
|
|
= other sql support
|
|
|
|
top
|
|
string literals
|
|
full number literals -> other bases?
|
|
apply, pivot
|
|
|
|
maybe add dml and ddl, source poses, quasi quotes
|
|
|
|
leave: type check, dialects, procedural, separate lexing?
|
|
|
|
other dialect targets:
|
|
postgres
|
|
oracle
|
|
teradata
|
|
ms sql server
|
|
mysql?
|
|
db2?
|
|
what other major dialects are there?
|
|
sqlite
|
|
sap dbmss (can't work out what are separate products or what are the
|
|
dialects)
|
|
|
|
|
|
|
|
here is an idea for a little feature:
|
|
crunch sql: this takes sql and tries to make it as small as possible
|
|
(basically, combining nested selects where possible and inlining
|
|
ctes)
|
|
expand sql:
|
|
breaks apart complex sql using nested queries and ctes, try to make
|
|
queries easier to understand in stages
|
|
|