diff --git a/Makefile b/Makefile index c768e75..30b4a7c 100644 --- a/Makefile +++ b/Makefile @@ -39,7 +39,7 @@ website : website-non-haddock build-haddock .PHONY : website-non-haddock website-non-haddock : build/main.css build/ocean.css build/index.html build/supported_sql.html \ - build/test_cases.html build/contributing.html build/release_checklist.html + build/test_cases.html build/contributing.html build/main.css : website/main.css @@ -59,10 +59,6 @@ build/supported_sql.html : website/supported_sql.asciidoc website/AddLinks.hs build/contributing.html : website/contributing.asciidoc website/AddLinks.hs asciidoctor website/contributing.asciidoc -o - | cabal -v0 exec runhaskell website/AddLinks.hs > build/contributing.html -build/release_checklist.html : website/release_checklist.asciidoc website/AddLinks.hs - asciidoctor website/release_checklist.asciidoc -o - | cabal -v0 exec runhaskell website/AddLinks.hs > build/release_checklist.html - - build/test_cases.html : website/RenderTestCases.hs cabal -v0 exec runhaskell -- --ghc-arg=-package=pretty-show -itools website/RenderTestCases.hs > build/test_cases.asciidoc asciidoctor build/test_cases.asciidoc -o - | \ diff --git a/TODO b/TODO index 2719d63..15df9c4 100644 --- a/TODO +++ b/TODO @@ -1,406 +1,96 @@ -This file is completely out of date. +Some random notes on what could be done with the package in the future. None of this is scheduled. +The most important thing is adding more support for needed SQL. Everything else is very secondary to this. + +Infrastructure +-------------- + +write a CI script + +decide if to use a code formatter - pro: it will preserve git blame stuff better + +switch the website to use markdown + +try to improve the usability of the rendered test cases + +add automated tests for the examples on the website + +add a few more examples to the website: + parse some sql and detect if it has a particular feature + do a transformation on some sql + idea: convert tpch to sql server syntax + generate some sql + format some sql + check if some sql parses + trivial documentation generation for ddl + trivial lint checker + demos: + crunch sql: this takes sql and tries to make it as small as possible + (combining nested selects where possible and inlining + ctes) + expand sql: + breaks apart complex sql using nested queries and ctes, try to make + queries easier to understand in stages + +write a beginners tutorial for how to add support for some new sql syntax + show how to develop parsers interactively, then tidy them up for merging + to the main branch + +review code coverage and see if there are any important gaps to fill in +set up hlint to run easily + +Code ---- -medium tasks next release +There could be more negative tests for lexing and dialect options. -review alters, and think about adding rename versions - which are really common and useful, but not in ansi - https://github.com/JakeWheat/simple-sql-parser/issues/20 +Check the fixity in the tableref parsing, see if there is anywhere else that needs tweaking. -try to get some control over the pretty printing and the error -messages by creating some dumps of pretty printing and error messages, -then can rerun these every so often to see how they've changed +Do all sql dialects have compatible fixities? If not, want to add dialect control over the fixity. -finish off going through the keyword list +add parse error recovery -do more examples -what are the use cases? - sql generator - queries - sql generator - ddl - parsing some sql - for what purpose - generating documentation of ddl - write some sort of trivial sql engine or wrapper around something? - write something that takes sql, modifies it, and outputs the result - lint checker? +add ability to type check: + uuagc still seems like the nicest option? + uuagc has an option to attach to an external ast now, so could + put the type checker in a separate package -do an example of adding some new syntax --> seems quite a few people are using this - and there are some feature requests - try to give people a path to implement features themselves +figure out how to support parsing some sql, transforming it, pretty printing it + while perserving as much of the original formatting as possible, and all the comments + an intermediate step is to minimise the difference in non whitespace/comment tokens + when you parse then pretty print any supported sql -goals: +add an annotation field to the syntax to make it more useful + add source positions to this annotation when parsing -1. if someone might want to use this, give them some toy examples to -help bootstrap them - -2. see if can encourage people who want some missing sql to add it -themselves - - - -review main missing sql bits - focus on more mainstream things - could also review main dialects - - -syntax from hssqlppp: - query hints, join hints - -unescaping identifiers and strings -continuation strings testing - -add tests for comment pretty printing: - use pretty then lex - -work on better dialect design: more basic customizability and rule / - callback driven - -review/fix documentation and website -fix the groups for generated tests - -check the .cabal file module lists - - -medium tasks next release + 1 -add annotation -lots more negative tests especially for lexing, and for dialects -escape, uescape -post hoc fixity -switch pretty printing to use ansi-wl-pprint - http://conscientiousprogrammer.com/blog/2015/12/17/24-days-of-hackage-2015-day-17-ansi-wl-pprint-avoiding-string-hacking/ - - -error message analysis: -start with a set of bad sql, generate & write -get error messages: - simplified ssp parser - tutorial parser - hssqlppp - and also: - postgres - mysql - sqlserver - oracle - db2 - vertica? -evaluate other parsing libs for error messages and general - feasibility, shortlist is: - megaparsec - trifecta - uuparsinglib - other desirables from parsing lib: - incremental parsing - context dependent lexer switch - continue after error - -create some benchmarks (to measure performance when modifying for - error messages, and to compare different parser libs for instance) - -use quickcheck in lexing - -What will make this library nice and complete: -List of all the SQL that it doesn't support -annotation, with positions coming from the parser -dml -ddl -procedural sql -dialects: reasonable support for sql server and oracle, and maybe also - postgres, mysql, teradata, redshift, sqlite, db2, sap stuff, etc. -good work on error messages -fixity code + get it right -review names of syntax -defaults handled better (use default/nothing instead of substituting - in the default) -evaluate uu parsing lib -> could at least remove need to do left - factoring, and maybe help make better error messages also ------ - -work on reasonable subset of sql which is similar to the current - subset and smaller than the complete 2011 target: describe the - exact target set for the next release - -improve the dialect testing: add notes on what to do - -position annotation in the syntax - -simple stuff for error message and pretty printing monitoring: - -create a sample set of valid statements to pretty print -pretty print these -compare every so often to catch regressions and approve improvements -start with tpch, and then add some others - -same with invalid statements to see the error messages -start with some simple scalar exprs and a big query expr which has - stuff (either tokens, whitespace or junk strings) - semi-systematically added and/or removed - -fixing the non idiomatic (pun!) suffix parsing: - typename parsing - identifier/app/agg/window parsing - join parsing in trefs (use chain? - tricky because of postfix onExpr) - top level and queryexprs parsing +can you make it properly extensible? the goal is for users to work with asts that + represent only the dialect they are working in review names in the syntax for correspondence with sql standard, avoid - gratuitous differences + gratuitous differences -touch up the expr hack as best as can, start thinking about - replacement for buildExprParser, maybe this can be a separate - general package, or maybe something like this already exists +reduce use of booleans in the syntax -careful review of token parses wrt trailing delimiters/junk - already - caught a few issues like this incidentally when working on other - stuff +quasi quotation support -undo mess in the code created by adding lots of new support: -much more documentation -refactor crufty bits -reorder the code -reconsider the names and structure of the constructors in the syntax -refactor the typename parser - it's a real mess -fix the lexing +use this lib to build a typesafe sql wrapper for haskell -add documentation in Parser.hs on the left factoring/error handling - approach +optimise the lexer: + add some benchmarks + do some experiments with left factoring + try to use the token approach with megaparsec -fixes: +rewrite bits of the parser, lots of it is a bit questionable + - an expert with megaparsec would write something simpler + I think it's not worth doing for the sake of it, but if a bit + is too difficult to add new features to, or to improve + the error messages, then it might be worth it -keyword tree, add explicit result then can use for joins also +work on error messages -keyword tree support prefix mode so can start from already parsed - token +review the crazy over the top lexer testing + maybe it's enough to document an easy way to skip these tests -left factor/try removal summary (this list needs updating): - -identifier starts: - interval literal - character set literal - typed literals, multikeywords - identifier - app, agg, window - keyword function -issues in the special op internals -not between + other ops: needs new expression parsing - not in also - in suffix also - lots of overlap with binary and postfix multi keyword operators - quantified comparison also -issues in the typename parsing -dot in identifiers and as operator -issues in the symbol parser - hardcode all the symbols in the symbol parser/split? -conflict with in suffix and in in position - -rules for changing the multi keyword parsing: - if a keyword must be followed by another - e.g. left join, want to refactor to produce 'expected "left join"' - if the keyword is optionally followed by another, e.g. with - recursive, then don't do this. - -change join defaults to be defaults - - -rough SQL 2011 todo, including tests to write: - -review the commented out reserved keyword entries and work out how to - fix - -test case insensitvity and case preservation - -big areas: -window functions -nested window functions -case - -table ref: tablesample, time period spec, only, unnest, table, lateral - bug -joined table: partitioned joins -group by: set quantifier -window clause - -other areas: -unicode escape, strings and idens -character set behaviour review -datetime literals -mixed quoting identifier chains -names/identifiers careful review -general value bits - collate for -numeric val fn -string exp fn -datetime exp fn -interval exp fn -rows -interval qualifier -with -setop -order/offset/fetch -search/cycle -preds: -between -in -like -similar -regex like? -null -normalize -match -overlaps -distinct -member -submultiset -period - -alias for * in select list - -create list of unsupported syntax: xml, ref, subtypes, modules? - ---- - - - -after next release - -medium term goals: -1. replace parser and syntax in hssqlppp with this code (keep two - separate packages in sync) -2. this replacement should have better error messages, much more - complete ansi sql 2011 support, and probably will have reasonable - support for these dialects: mssql, oracle and teradata. - -review areas where this parser is too permissive, e.g. value - expressions allowed where column reference names only should be - allowed, such as group by, order by (perhaps there can be a flag or - warnings or something), unqualified asterisk in select list - -fix the expression parser completely: the realistic way is to adjust - for precedence and associativity after parsing since the concrete - syntax is so messy. should also use this expression parser for - parsing joins and for set operations, maybe other areas. - -table expression in syntax: - QueryExpr = Select SelectList (Maybe TableExpr) - and the TableExpr contains all the other bits? - -change the booleans in the ast to better types for less ambiguity? - -decide how to handle character set literals and identifiers: don't - have any intention of actually supporting switching character sets - in the middle of parsing so maybe this would be better disabled? - -review places in the parse which should allow only a fixed set of - identifiers (e.g. in interval literals), keep in mind other - dialects and extensibility - -decide whether to represent numeric literals better, instead of a - single string - break up into parts, or parse to a Decimal or - something - - -= future big feature summary - -all ansi sql queries -completely working expression tree parsing -error messages, left factor -dml, ddl, procedural sql -position annotation -type checker/ etc. -lexer -dialects -quasi quotes -typesafe sql dbms wrapper support for haskell -extensibility -performance analysis - -try out uu-parsing or polyparse, especially wrt error message - improvements - -= stuff - -try and use the proper css theme - create a header like in the haddock with simple-sql-parser + - contents link - change the toc gen so that it works the same as in haddock (same - div, no links on the actual titles - fix the page margins, and the table stuff: patches to the css? - -release checklist: -hlint -haddock review -spell check -update changelog -update website text -regenerate the examples on the index.txt - -= Later general tasks: - -docs - -add preamble to the rendered test page - -add links from the supported sql page to the rendered test page for - each section -> have to section up the tests some more - -testing - -review tests to copy from hssqlppp - -add lots more tests using SQL from the xb2 manual - -much more table reference tests, for joins and aliases etc.? - -review internal sql collection for more syntax/tests - -other - ----- - -demo program: convert tpch to sql server syntax exe processor - -run through other manuals for example queries and features: sql in a - nutshell, sql guide, sql reference guide, sql standard, sql server - manual, oracle manual, teradata manual + re-through postgresql - manual and make notes in each case of all syntax and which isn't - currently supported also. - -check the order of exports, imports and functions/cases in the files -fix up the import namespaces/explicit names nicely - -ast checker: checks the ast represents valid syntax, the parser - doesn't check as much as it could, and this can also be used to - check generated trees. Maybe this doesn't belong in this package - though? - -= other sql support - -top -string literals -full number literals -> other bases? -apply, pivot - -maybe add dml and ddl, source poses, quasi quotes - -leave: type check, dialects, procedural, separate lexing? - -other dialect targets: -postgres -oracle -teradata -ms sql server -mysql? -db2? -what other major dialects are there? -sqlite -sap dbmss (can't work out what are separate products or what are the - dialects) - - - -here is an idea for a little feature: -crunch sql: this takes sql and tries to make it as small as possible - (basically, combining nested selects where possible and inlining - ctes) -expand sql: - breaks apart complex sql using nested queries and ctes, try to make - queries easier to understand in stages +check more of the formatting of the pretty printing and add regression tests for this +is there a way to get incremental parsing like attoparsec? diff --git a/website/AddLinks.hs b/website/AddLinks.hs index c9dc268..53ddcf1 100644 --- a/website/AddLinks.hs +++ b/website/AddLinks.hs @@ -20,6 +20,7 @@ linkSection = \
  • Haddock
  • \n\ \
  • Supported SQL
  • \n\ \
  • Test cases
  • \n\ + \
  • Contributing
  • \n\ \\n\ \
    \n\ \