simple-sql-parser/TODO

medium tasks next release

review alters, and think about adding rename versions
  which are really common and useful, but not in ansi
  https://github.com/JakeWheat/simple-sql-parser/issues/20

try to get some control over the pretty printing and the error
messages by creating some dumps of pretty printing and error messages,
then can rerun these every so often to see how they've changed

finish off going through the keyword list

do more examples
what are the use cases?
  sql generator - queries
  sql generator - ddl
  parsing some sql - for what purpose
    generating documentation of ddl
    write some sort of trivial sql engine or wrapper around something?
    write something that takes sql, modifies it, and outputs the result
    lint checker?

do an example of adding some new syntax
-> seems quite a few people are using this
  and there are some feature requests
  try to give people a path to implement features themselves

goals:

1. if someone might want to use this, give them some toy examples to
help bootstrap them

2. see if can encourage people who want some missing sql to add it
themselves


review main missing sql bits - focus on more mainstream things
  could also review main dialects


syntax from hssqlppp:
  query hints, join hints

unescaping identifiers and strings
continuation strings testing

add tests for comment pretty printing:
  use pretty then lex

work on better dialect design: more basic customizability and rule /
   callback driven

review/fix documentation and website
fix the groups for generated tests

check the .cabal file module lists


medium tasks next release + 1
add annotation
lots more negative tests especially for lexing, and for dialects
escape, uescape
post hoc fixity
switch pretty printing to use ansi-wl-pprint
  http://conscientiousprogrammer.com/blog/2015/12/17/24-days-of-hackage-2015-day-17-ansi-wl-pprint-avoiding-string-hacking/


error message analysis:
start with a set of bad sql, generate & write
get error messages:
  simplified ssp parser
  tutorial parser
  hssqlppp
  and also:
    postgres
    mysql
    sqlserver
    oracle
    db2
    vertica?
evaluate other parsing libs for error messages and general
   feasibility, shortlist is:
   megaparsec
   trifecta
   uuparsinglib
   other desirables from parsing lib:
     incremental parsing
     context dependent lexer switch
     continue after error

create some benchmarks (to measure performance when modifying for
   error messages, and to compare different parser libs for instance)

use quickcheck in lexing

What will make this library nice and complete:
List of all the SQL that it doesn't support
annotation, with positions coming from the parser
dml
ddl
procedural sql
dialects: reasonable support for sql server and oracle, and maybe also
   postgres, mysql, teradata, redshift, sqlite, db2, sap stuff, etc.
good work on error messages
fixity code + get it right
review names of syntax
defaults handled better (use default/nothing instead of substituting
   in the default)
evaluate uu parsing lib -> could at least remove need to do left
   factoring, and maybe help make better error messages also
-----

work on reasonable subset of sql which is similar to the current
   subset and smaller than the complete 2011 target: describe the
   exact target set for the next release

improve the dialect testing: add notes on what to do

position annotation in the syntax

simple stuff for error message and pretty printing monitoring:

create a sample set of valid statements to pretty print
pretty print these
compare every so often to catch regressions and approve improvements
start with tpch, and then add some others

same with invalid statements to see the error messages
start with some simple scalar exprs and a big query expr which has
   stuff (either tokens, whitespace or junk strings)
   semi-systematically added and/or removed

fixing the non idiomatic (pun!) suffix parsing:
  typename parsing
  identifier/app/agg/window parsing
  join parsing in trefs (use chain? - tricky because of postfix onExpr)
  top level and queryexprs parsing

review names in the syntax for correspondence with sql standard, avoid
   gratuitous differences

touch up the expr hack as best as can, start thinking about
   replacement for buildExprParser, maybe this can be a separate
   general package, or maybe something like this already exists

careful review of token parses wrt trailing delimiters/junk - already
   caught a few issues like this incidentally when working on other
   stuff

undo mess in the code created by adding lots of new support:
much more documentation
refactor crufty bits
reorder the code
reconsider the names and structure of the constructors in the syntax
refactor the typename parser - it's a real mess
fix the lexing

add documentation in Parser.lhs on the left factoring/error handling
   approach

fixes:

keyword tree, add explicit result then can use for joins also

keyword tree support prefix mode so can start from already parsed
   token

left factor/try removal summary (this list needs updating):

identifier starts:
  interval literal
  character set literal
  typed literals, multikeywords
  identifier
  app, agg, window
  keyword function
issues in the special op internals
not between + other ops: needs new expression parsing
  not in also
  in suffix also
  lots of overlap with binary and postfix multi keyword operators
  quantified comparison also
issues in the typename parsing
dot in identifiers and as operator
issues in the symbol parser
  hardcode all the symbols in the symbol parser/split?
conflict with in suffix and in in position

rules for changing the multi keyword parsing:
  if a keyword must be followed by another
    e.g. left join, want to refactor to produce 'expected "left join"'
  if the keyword is optionally followed by another, e.g. with
   recursive, then don't do this.

change join defaults to be defaults


rough SQL 2011 todo, including tests to write:

review the commented out reserved keyword entries and work out how to
   fix

test case insensitvity and case preservation

big areas:
window functions
nested window functions
case

table ref: tablesample, time period spec, only, unnest, table, lateral
   bug
joined table: partitioned joins
group by: set quantifier
window clause

other areas:
unicode escape, strings and idens
character set behaviour review
datetime literals
mixed quoting identifier chains
names/identifiers careful review
general value bits
  collate for
numeric val fn
string exp fn
datetime exp fn
interval exp fn
rows
interval qualifier
with
setop
order/offset/fetch
search/cycle
preds:
between
in
like
similar
regex like?
null
normalize
match
overlaps
distinct
member
submultiset
period

alias for * in select list

create list of unsupported syntax: xml, ref, subtypes, modules?

---


after next release

medium term goals:
1. replace parser and syntax in hssqlppp with this code (keep two
   separate packages in sync)
2. this replacement should have better error messages, much more
   complete ansi sql 2011 support, and probably will have reasonable
   support for these dialects: mssql, oracle and teradata.

review areas where this parser is too permissive, e.g. value
   expressions allowed where column reference names only should be
   allowed, such as group by, order by (perhaps there can be a flag or
   warnings or something), unqualified asterisk in select list

fix the expression parser completely: the realistic way is to adjust
   for precedence and associativity after parsing since the concrete
   syntax is so messy. should also use this expression parser for
   parsing joins and for set operations, maybe other areas.

table expression in syntax:
  QueryExpr = Select SelectList (Maybe TableExpr)
  and the TableExpr contains all the other bits?

change the booleans in the ast to better types for less ambiguity?

decide how to handle character set literals and identifiers: don't
   have any intention of actually supporting switching character sets
   in the middle of parsing so maybe this would be better disabled?

review places in the parse which should allow only a fixed set of
   identifiers (e.g. in interval literals), keep in mind other
   dialects and extensibility

decide whether to represent numeric literals better, instead of a
   single string - break up into parts, or parse to a Decimal or
   something


= future big feature summary

all ansi sql queries
completely working expression tree parsing
error messages, left factor
dml, ddl, procedural sql
position annotation
type checker/ etc.
lexer
dialects
quasi quotes
typesafe sql dbms wrapper support for haskell
extensibility
performance analysis

try out uu-parsing or polyparse, especially wrt error message
   improvements

= stuff

try and use the proper css theme
  create a header like in the haddock with simple-sql-parser +
    contents link
  change the toc gen so that it works the same as in haddock (same
    div, no links on the actual titles
  fix the page margins, and the table stuff: patches to the css?

release checklist:
hlint
haddock review
spell check
update changelog
update website text
regenerate the examples on the index.txt

= Later general tasks:

docs

add preamble to the rendered test page

add links from the supported sql page to the rendered test page for
   each section -> have to section up the tests some more

testing

review tests to copy from hssqlppp

add lots more tests using SQL from the xb2 manual

much more table reference tests, for joins and aliases etc.?

review internal sql collection for more syntax/tests

other

----

demo program: convert tpch to sql server syntax exe processor

run through other manuals for example queries and features: sql in a
   nutshell, sql guide, sql reference guide, sql standard, sql server
   manual, oracle manual, teradata manual + re-through postgresql
   manual and make notes in each case of all syntax and which isn't
   currently supported also.

check the order of exports, imports and functions/cases in the files
fix up the import namespaces/explicit names nicely

ast checker: checks the ast represents valid syntax, the parser
   doesn't check as much as it could, and this can also be used to
   check generated trees. Maybe this doesn't belong in this package
   though?

= other sql support

top
string literals
full number literals -> other bases?
apply, pivot

maybe add dml and ddl, source poses, quasi quotes

leave: type check, dialects, procedural, separate lexing?

other dialect targets:
postgres
oracle
teradata
ms sql server
mysql?
db2?
what other major dialects are there?
sqlite
sap dbmss (can't work out what are separate products or what are the
   dialects)


here is an idea for a little feature:
crunch sql: this takes sql and tries to make it as small as possible
  (basically, combining nested selects where possible and inlining
   ctes)
expand sql:
  breaks apart complex sql using nested queries and ctes, try to make
   queries easier to understand in stages
No results found.