medium tasks next release unescaping identifiers and strings continuation strings testing refactor the symbol lexers - lots of duplicated code rename valueexpr to scalarexpr syntax from hssqlppp: query hints, join hints rename combinequeryexpr add comment to statements? review simple enums to make sure they have default work on better dialect design: more basic customizability and rule / callback driven medium tasks next release + 1 add annotation lots more negative tests especially for lexing, and for dialects escape, uescape post hoc fixity switch pretty printing to use ansi-wl-pprint http://conscientiousprogrammer.com/blog/2015/12/17/24-days-of-hackage-2015-day-17-ansi-wl-pprint-avoiding-string-hacking/ error message analysis: start with a set of bad sql, generate & write get error messages: simplified ssp parser tutorial parser hssqlppp and also: postgres mysql sqlserver oracle db2 vertica? evaluate other parsing libs for error messages and general feasibility, shortlist is: megaparsec trifecta uuparsinglib other desirables from parsing lib: incremental parsing context dependent lexer switch continue after error create some benchmarks (to measure performance when modifying for error messages, and to compare different parser libs for instance) use quickcheck in lexing What will make this library nice and complete: List of all the SQL that it doesn't support annotation, with positions coming from the parser dml ddl procedural sql dialects: reasonable support for sql server and oracle, and maybe also postgres, mysql, teradata, redshift, sqlite, db2, sap stuff, etc. good work on error messages fixity code + get it right review names of syntax defaults handled better (use default/nothing instead of substituting in the default) evaluate uu parsing lib -> could at least remove need to do left factoring, and maybe help make better error messages also ----- work on reasonable subset of sql which is similar to the current subset and smaller than the complete 2011 target: describe the exact target set for the next release improve the dialect testing: add notes on what to do position annotation in the syntax simple stuff for error message and pretty printing monitoring: create a sample set of valid statements to pretty print pretty print these compare every so often to catch regressions and approve improvements start with tpch, and then add some others same with invalid statements to see the error messages start with some simple value exprs and a big query expr which has stuff (either tokens, whitespace or junk strings) semi-systematically added and/or removed fixing the non idiomatic (pun!) suffix parsing: typename parsing identifier/app/agg/window parsing join parsing in trefs (use chain? - tricky because of postfix onExpr) top level and queryexprs parsing review names in the syntax for correspondence with sql standard, avoid gratuitous differences touch up the expr hack as best as can, start thinking about replacement for buildExprParser, maybe this can be a separate general package, or maybe something like this already exists careful review of token parses wrt trailing delimiters/junk - already caught a few issues like this incidentally when working on other stuff undo mess in the code created by adding lots of new support: much more documentation refactor crufty bits reorder the code reconsider the names and structure of the constructors in the syntax refactor the typename parser - it's a real mess fix the lexing add documentation in Parser.lhs on the left factoring/error handling approach fixes: keyword tree, add explicit result then can use for joins also keyword tree support prefix mode so can start from already parsed token left factor/try removal summary (this list needs updating): identifier starts: interval literal character set literal typed literals, multikeywords identifier app, agg, window keyword function issues in the special op internals not between + other ops: needs new expression parsing not in also in suffix also lots of overlap with binary and postfix multi keyword operators quantified comparison also issues in the typename parsing dot in identifiers and as operator issues in the symbol parser hardcode all the symbols in the symbol parser/split? conflict with in suffix and in in position rules for changing the multi keyword parsing: if a keyword must be followed by another e.g. left join, want to refactor to produce 'expected "left join"' if the keyword is optionally followed by another, e.g. with recursive, then don't do this. change join defaults to be defaults rough SQL 2011 todo, including tests to write: review the commented out reserved keyword entries and work out how to fix test case insensitvity and case preservation big areas: window functions nested window functions case table ref: tablesample, time period spec, only, unnest, table, lateral bug joined table: partitioned joins group by: set quantifier window clause other areas: unicode escape, strings and idens character set behaviour review datetime literals mixed quoting identifier chains names/identifiers careful review general value bits collate for numeric val fn string exp fn datetime exp fn interval exp fn rows interval qualifier with setop order/offset/fetch search/cycle preds: between in like similar regex like? null normalize match overlaps distinct member submultiset period alias for * in select list create list of unsupported syntax: xml, ref, subtypes, modules? --- after next release medium term goals: 1. replace parser and syntax in hssqlppp with this code (keep two separate packages in sync) 2. this replacement should have better error messages, much more complete ansi sql 2011 support, and probably will have reasonable support for these dialects: mssql, oracle and teradata. review areas where this parser is too permissive, e.g. value expressions allowed where column reference names only should be allowed, such as group by, order by (perhaps there can be a flag or warnings or something), unqualified asterisk in select list fix the expression parser completely: the realistic way is to adjust for precedence and associativity after parsing since the concrete syntax is so messy. should also use this expression parser for parsing joins and for set operations, maybe other areas. table expression in syntax: QueryExpr = Select SelectList (Maybe TableExpr) and the TableExpr contains all the other bits? change the booleans in the ast to better types for less ambiguity? decide how to handle character set literals and identifiers: don't have any intention of actually supporting switching character sets in the middle of parsing so maybe this would be better disabled? review places in the parse which should allow only a fixed set of identifiers (e.g. in interval literals), keep in mind other dialects and extensibility decide whether to represent numeric literals better, instead of a single string - break up into parts, or parse to a Decimal or something = future big feature summary all ansi sql queries completely working expression tree parsing error messages, left factor dml, ddl, procedural sql position annotation type checker/ etc. lexer dialects quasi quotes typesafe sql dbms wrapper support for haskell extensibility performance analysis try out uu-parsing or polyparse, especially wrt error message improvements = stuff try and use the proper css theme create a header like in the haddock with simple-sql-parser + contents link change the toc gen so that it works the same as in haddock (same div, no links on the actual titles fix the page margins, and the table stuff: patches to the css? release checklist: hlint haddock review spell check update changelog update website text regenerate the examples on the index.txt = Later general tasks: docs add preamble to the rendered test page add links from the supported sql page to the rendered test page for each section -> have to section up the tests some more testing review tests to copy from hssqlppp add lots more tests using SQL from the xb2 manual much more table reference tests, for joins and aliases etc.? review internal sql collection for more syntax/tests other ---- demo program: convert tpch to sql server syntax exe processor run through other manuals for example queries and features: sql in a nutshell, sql guide, sql reference guide, sql standard, sql server manual, oracle manual, teradata manual + re-through postgresql manual and make notes in each case of all syntax and which isn't currently supported also. check the order of exports, imports and functions/cases in the files fix up the import namespaces/explicit names nicely ast checker: checks the ast represents valid syntax, the parser doesn't check as much as it could, and this can also be used to check generated trees. Maybe this doesn't belong in this package though? = other sql support top string literals full number literals -> other bases? apply, pivot maybe add dml and ddl, source poses, quasi quotes leave: type check, dialects, procedural, separate lexing? other dialect targets: postgres oracle teradata ms sql server mysql? db2? what other major dialects are there? sqlite sap dbmss (can't work out what are separate products or what are the dialects) here is an idea for a little feature: crunch sql: this takes sql and tries to make it as small as possible (basically, combining nested selects where possible and inlining ctes) expand sql: breaks apart complex sql using nested queries and ctes, try to make queries easier to understand in stages