
update docs

Jake Wheat 2024-01-12 19:25:13 +00:00
parent fe6b71fa2a
commit fa5091ac80
6 changed files with 213 additions and 451 deletions


@@ -39,7 +39,7 @@ website : website-non-haddock build-haddock
.PHONY : website-non-haddock
website-non-haddock : build/main.css build/ocean.css build/index.html build/supported_sql.html \
    build/test_cases.html build/contributing.html
build/main.css : website/main.css
@@ -59,10 +59,6 @@ build/supported_sql.html : website/supported_sql.asciidoc website/AddLinks.hs
build/contributing.html : website/contributing.asciidoc website/AddLinks.hs
    asciidoctor website/contributing.asciidoc -o - | cabal -v0 exec runhaskell website/AddLinks.hs > build/contributing.html
build/test_cases.html : website/RenderTestCases.hs
    cabal -v0 exec runhaskell -- --ghc-arg=-package=pretty-show -itools website/RenderTestCases.hs > build/test_cases.asciidoc
    asciidoctor build/test_cases.asciidoc -o - | \

TODO

@@ -1,406 +1,96 @@
This file is completely out of date. Some random notes on what could be done with the package in the future. None of this is scheduled.
The most important thing is adding more support for needed SQL. Everything else is very secondary to this.
Infrastructure
--------------
write a CI script
decide whether to use a code formatter - pro: it will preserve git blame stuff better
switch the website to use markdown
try to improve the usability of the rendered test cases
add automated tests for the examples on the website
add a few more examples to the website (see the sketch after the demos list below):
parse some sql and detect if it has a particular feature
do a transformation on some sql
idea: convert tpch to sql server syntax
generate some sql
format some sql
check if some sql parses
trivial documentation generation for ddl
trivial lint checker
demos:
crunch sql: this takes sql and tries to make it as small as possible
(combining nested selects where possible and inlining
ctes)
expand sql:
breaks apart complex sql using nested queries and ctes, try to make
queries easier to understand in stages
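A possible starting sketch for two of the example ideas above: check whether some SQL parses at all, and detect whether a query uses a particular feature (a with-clause here). The signatures are the 0.7.x-era API and the With constructor is assumed from the current Syntax module - check the Haddock before relying on this:
----
-- sketch only, assuming:
--   parseQueryExpr :: Dialect -> Text -> Maybe (Int,Int) -> Text
--                  -> Either ParseError QueryExpr
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Language.SQL.SimpleSQL.Parse (parseQueryExpr)
import Language.SQL.SimpleSQL.Dialect (ansi2011)
import Language.SQL.SimpleSQL.Syntax (QueryExpr(..))

-- does some sql parse at all?
parses :: Text -> Bool
parses = either (const False) (const True) . parseQueryExpr ansi2011 "" Nothing

-- detect a feature: does the query have a top-level with-clause (cte)?
usesCte :: Text -> Bool
usesCte sql = case parseQueryExpr ansi2011 "" Nothing sql of
    Right (With {}) -> True
    _ -> False

main :: IO ()
main = do
    print (parses "select 1")     -- True
    print (parses "select from")  -- False
    print (usesCte "with t as (select 1) select * from t")  -- True
----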
write a beginners tutorial for how to add support for some new sql syntax
show how to develop parsers interactively, then tidy them up for merging
to the main branch
review code coverage and see if there are any important gaps to fill in
set up hlint to run easily
Code
----
There could be more negative tests for lexing and dialect options.
Check the fixity in the tableref parsing, see if there is anywhere else that needs tweaking.
Do all sql dialects have compatible fixities? If not, want to add dialect control over the fixity.
add parse error recovery
add ability to type check:
    uuagc still seems like the nicest option?
    uuagc has an option to attach to an external ast now, so could put the type checker in a separate package
figure out how to support parsing some sql, transforming it, pretty printing it while preserving as much of the original formatting as possible, and all the comments (see the sketch after this list)
    an intermediate step is to minimise the difference in non whitespace/comment tokens when you parse then pretty print any supported sql
add an annotation field to the syntax to make it more useful
    add source positions to this annotation when parsing
can you make it properly extensible? the goal is for users to work with asts that represent only the dialect they are working in
medium tasks next release:
    review alters, and think about adding rename versions, which are really common and useful, but not in ansi: https://github.com/JakeWheat/simple-sql-parser/issues/20
    try to get some control over the pretty printing and the error messages by creating some dumps of pretty printing and error messages, then can rerun these every so often to see how they've changed
    finish off going through the keyword list
    do more examples
        what are the use cases?
        sql generator - queries
        sql generator - ddl
        parsing some sql - for what purpose
        generating documentation of ddl
        write some sort of trivial sql engine or wrapper around something?
        write something that takes sql, modifies it, and outputs the result
        lint checker?
    do an example of adding some new syntax
        -> seems quite a few people are using this and there are some feature requests
        try to give people a path to implement features themselves
        goals:
        1. if someone might want to use this, give them some toy examples to help bootstrap them
        2. see if can encourage people who want some missing sql to add it themselves
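One of the notes above is about parsing some SQL, transforming it, and pretty printing it. A minimal sketch of that round trip (without any formatting preservation), assuming the 0.7.x API and that the syntax types derive Data, which they have historically:
----
-- sketch only: rewrite every numeric literal to 0 as a stand-in for a real
-- transformation, then pretty print the result
-- Data.Generics is from the syb package
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.IO as T
import Data.Generics (everywhere, mkT)
import Language.SQL.SimpleSQL.Parse (parseQueryExpr)
import Language.SQL.SimpleSQL.Pretty (prettyQueryExpr)
import Language.SQL.SimpleSQL.Dialect (ansi2011)
import Language.SQL.SimpleSQL.Syntax (ScalarExpr(..))

main :: IO ()
main =
    case parseQueryExpr ansi2011 "" Nothing "select a + 1 from t where b > 2" of
        Left _ -> error "did not parse"
        Right ast ->
            let zeroLit (NumLit _) = NumLit "0"
                zeroLit e = e
            in T.putStrLn (prettyQueryExpr ansi2011 (everywhere (mkT zeroLit) ast))
----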
review main missing sql bits - focus on more mainstream things
could also review main dialects
syntax from hssqlppp:
    query hints, join hints
    unescaping identifiers and strings
    continuation strings
testing:
    add tests for comment pretty printing:
        use pretty then lex
work on better dialect design: more basic customizability and rule / callback driven
review/fix documentation and website
fix the groups for generated tests
check the .cabal file module lists
medium tasks next release + 1
add annotation
lots more negative tests especially for lexing, and for dialects
escape, uescape
post hoc fixity
switch pretty printing to use ansi-wl-pprint
http://conscientiousprogrammer.com/blog/2015/12/17/24-days-of-hackage-2015-day-17-ansi-wl-pprint-avoiding-string-hacking/
error message analysis:
start with a set of bad sql, generate & write
get error messages:
simplified ssp parser
tutorial parser
hssqlppp
and also:
postgres
mysql
sqlserver
oracle
db2
vertica?
evaluate other parsing libs for error messages and general
feasibility, shortlist is:
megaparsec
trifecta
uuparsinglib
other desirables from parsing lib:
incremental parsing
context dependent lexer switch
continue after error
create some benchmarks (to measure performance when modifying for
error messages, and to compare different parser libs for instance) -
see the sketch below
use quickcheck in lexing
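A starting point for the benchmarks note above, using criterion (the choice of criterion is an assumption, any benchmarking library would do; the result is forced via show so the whole ast gets built):
----
-- sketch only
{-# LANGUAGE OverloadedStrings #-}
import Criterion.Main (defaultMain, bench, nf)
import Language.SQL.SimpleSQL.Parse (parseQueryExpr)
import Language.SQL.SimpleSQL.Dialect (ansi2011)

main :: IO ()
main = defaultMain
    [ bench "parse simple select" $
        nf (either (const "") show . parseQueryExpr ansi2011 "" Nothing)
           "select a, b from t where c > 5 group by a, b"
    ]
----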
What will make this library nice and complete:
List of all the SQL that it doesn't support
annotation, with positions coming from the parser
dml
ddl
procedural sql
dialects: reasonable support for sql server and oracle, and maybe also
postgres, mysql, teradata, redshift, sqlite, db2, sap stuff, etc.
good work on error messages
fixity code + get it right
review names of syntax
defaults handled better (use default/nothing instead of substituting
in the default)
evaluate uu parsing lib -> could at least remove need to do left
factoring, and maybe help make better error messages also
-----
work on reasonable subset of sql which is similar to the current
subset and smaller than the complete 2011 target: describe the
exact target set for the next release
improve the dialect testing: add notes on what to do
position annotation in the syntax
simple stuff for error message and pretty printing monitoring (see the sketch after this block):
create a sample set of valid statements to pretty print
pretty print these
compare every so often to catch regressions and approve improvements
start with tpch, and then add some others
same with invalid statements to see the error messages
start with some simple scalar exprs and a big query expr which has
stuff (either tokens, whitespace or junk strings)
semi-systematically added and/or removed
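A minimal sketch of this monitoring idea: pretty print a fixed sample file into a golden file which gets committed, then git diff shows regressions and improvements. parseStatements/prettyStatements are assumed from the 0.7.x API and the file names are made up:
----
-- sketch only
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.IO as T
import Language.SQL.SimpleSQL.Parse (parseStatements)
import Language.SQL.SimpleSQL.Pretty (prettyStatements)
import Language.SQL.SimpleSQL.Dialect (ansi2011)

main :: IO ()
main = do
    sample <- T.readFile "pretty-samples.sql"  -- e.g. start with the tpch queries
    case parseStatements ansi2011 "pretty-samples.sql" Nothing sample of
        Left _ -> error "sample file did not parse"
        Right asts ->
            T.writeFile "pretty-samples.golden" (prettyStatements ansi2011 asts)
----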
fixing the non idiomatic (pun!) suffix parsing:
typename parsing
identifier/app/agg/window parsing
join parsing in trefs (use chain? - tricky because of postfix onExpr)
top level and queryexprs parsing
review names in the syntax for correspondence with sql standard, avoid gratuitous differences
reduce use of booleans in the syntax
quasi quotation support
use this lib to build a typesafe sql wrapper for haskell
optimise the lexer:
    add some benchmarks
try to use the token approach with megaparsec
rewrite bits of the parser, lots of it is a bit questionable - an expert with megaparsec would write something simpler
    I think it's not worth doing for the sake of it, but if a bit is too difficult to add new features to, or to improve the error messages, then it might be worth it
work on error messages
review the crazy over the top lexer testing
    maybe it's enough to document an easy way to skip these tests
check more of the formatting of the pretty printing and add regression tests for this
touch up the expr hack as best as can, start thinking about replacement for buildExprParser, maybe this can be a separate general package, or maybe something like this already exists
careful review of token parses wrt trailing delimiters/junk - already caught a few issues like this incidentally when working on other stuff
undo mess in the code created by adding lots of new support:
    much more documentation
    refactor crufty bits
    reorder the code
    reconsider the names and structure of the constructors in the syntax
    refactor the typename parser - it's a real mess
    fix the lexing
add documentation in Parser.hs on the left factoring/error handling approach
do some experiments with left factoring
fixes:
    keyword tree, add explicit result then can use for joins also
    keyword tree support prefix mode so can start from already parsed token
left factor/try removal summary (this list needs updating):
identifier starts:
interval literal
character set literal
typed literals, multikeywords
identifier
app, agg, window
keyword function
issues in the special op internals
not between + other ops: needs new expression parsing
not in also
in suffix also
lots of overlap with binary and postfix multi keyword operators
quantified comparison also
issues in the typename parsing
dot in identifiers and as operator
issues in the symbol parser
hardcode all the symbols in the symbol parser/split?
conflict with in suffix and in in position
rules for changing the multi keyword parsing (see the sketch after these rules):
if a keyword must be followed by another
e.g. left join, want to refactor to produce 'expected "left join"'
if the keyword is optionally followed by another, e.g. with
recursive, then don't do this.
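The first rule could look something like this - an illustrative megaparsec fragment over plain String, not the library's actual token stream internals:
----
-- sketch only: parse a two-keyword unit and label it, so a failed parse
-- reports: expecting "left join"
import Control.Monad (void)
import Data.Void (Void)
import Text.Megaparsec (Parsec, (<?>), parseTest, try)
import Text.Megaparsec.Char (space, string')

type Parser = Parsec Void String

-- hypothetical helper: case-insensitive keyword then whitespace
keyword_ :: String -> Parser ()
keyword_ k = void (string' k) <* space

leftJoin :: Parser ()
leftJoin = try (keyword_ "left" *> keyword_ "join") <?> "left join"

main :: IO ()
main = parseTest leftJoin "left outer"  -- error says: expecting "left join"
----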
change join defaults to be defaults
rough SQL 2011 todo, including tests to write:
review the commented out reserved keyword entries and work out how to
fix
test case insensitivity and case preservation
big areas:
window functions
nested window functions
case
table ref: tablesample, time period spec, only, unnest, table, lateral
bug
joined table: partitioned joins
group by: set quantifier
window clause
other areas:
unicode escape, strings and idens
character set behaviour review
datetime literals
mixed quoting identifier chains
names/identifiers careful review
general value bits
collate for
numeric val fn
string exp fn
datetime exp fn
interval exp fn
rows
interval qualifier
with
setop
order/offset/fetch
search/cycle
preds:
between
in
like
similar
regex like?
null
normalize
match
overlaps
distinct
member
submultiset
period
alias for * in select list
create list of unsupported syntax: xml, ref, subtypes, modules?
---
after next release
medium term goals:
1. replace parser and syntax in hssqlppp with this code (keep two
separate packages in sync)
2. this replacement should have better error messages, much more
complete ansi sql 2011 support, and probably will have reasonable
support for these dialects: mssql, oracle and teradata.
review areas where this parser is too permissive, e.g. value
expressions allowed where column reference names only should be
allowed, such as group by, order by (perhaps there can be a flag or
warnings or something), unqualified asterisk in select list
fix the expression parser completely: the realistic way is to adjust
for precedence and associativity after parsing since the concrete
syntax is so messy. should also use this expression parser for
parsing joins and for set operations, maybe other areas.
table expression in syntax:
QueryExpr = Select SelectList (Maybe TableExpr)
and the TableExpr contains all the other bits?
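One way that split could look - purely illustrative, these are not the library's actual types; the component types are borrowed from the current Syntax module so the sketch compiles:
----
-- sketch only: clauses that depend on a from clause move into TableExpr,
-- so a bare select like: select 1+2, has Nothing for its table expression
module TableExprSketch where

import Language.SQL.SimpleSQL.Syntax
    (SetQuantifier, ScalarExpr, Name, TableRef, GroupingExpr, SortSpec)

data QueryExpr
    = Select
      { qeSetQuantifier :: SetQuantifier
      , qeSelectList :: [(ScalarExpr, Maybe Name)]
      , qeTableExpr :: Maybe TableExpr
      }
    -- plus the other existing constructors: Table, Values, With, set ops, ...

data TableExpr
    = TableExpr
      { teFrom :: [TableRef]
      , teWhere :: Maybe ScalarExpr
      , teGroupBy :: [GroupingExpr]
      , teHaving :: Maybe ScalarExpr
      , teOrderBy :: [SortSpec]
      , teOffset :: Maybe ScalarExpr
      , teFetchFirst :: Maybe ScalarExpr
      }
----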
change the booleans in the ast to better types for less ambiguity?
decide how to handle character set literals and identifiers: don't
have any intention of actually supporting switching character sets
in the middle of parsing so maybe this would be better disabled?
review places in the parse which should allow only a fixed set of
identifiers (e.g. in interval literals), keep in mind other
dialects and extensibility
decide whether to represent numeric literals better, instead of a
single string - break up into parts, or parse to a Decimal or
something
= future big feature summary
all ansi sql queries
completely working expression tree parsing
error messages, left factor
dml, ddl, procedural sql
position annotation
type checker/ etc.
lexer
dialects
quasi quotes
typesafe sql dbms wrapper support for haskell
extensibility
performance analysis
try out uu-parsing or polyparse, especially wrt error message
improvements
= stuff
try and use the proper css theme
create a header like in the haddock with simple-sql-parser +
contents link
change the toc gen so that it works the same as in haddock (same
div, no links on the actual titles)
fix the page margins, and the table stuff: patches to the css?
release checklist:
hlint
haddock review
spell check
update changelog
update website text
regenerate the examples on the index.txt
= Later general tasks:
docs
add preamble to the rendered test page
add links from the supported sql page to the rendered test page for
each section -> have to section up the tests some more
testing
review tests to copy from hssqlppp
add lots more tests using SQL from the xb2 manual
much more table reference tests, for joins and aliases etc.?
review internal sql collection for more syntax/tests
other
----
demo program: convert tpch to sql server syntax exe processor
run through other manuals for example queries and features: sql in a
nutshell, sql guide, sql reference guide, sql standard, sql server
manual, oracle manual, teradata manual + re-through postgresql
manual and make notes in each case of all syntax and which isn't
currently supported also.
check the order of exports, imports and functions/cases in the files
fix up the import namespaces/explicit names nicely
ast checker: checks the ast represents valid syntax, the parser
doesn't check as much as it could, and this can also be used to
check generated trees. Maybe this doesn't belong in this package
though?
= other sql support
top
string literals
full number literals -> other bases?
apply, pivot
maybe add dml and ddl, source poses, quasi quotes
leave: type check, dialects, procedural, separate lexing?
other dialect targets:
postgres
oracle
teradata
ms sql server
mysql?
db2?
what other major dialects are there?
sqlite
sap dbmss (can't work out what are separate products or what are the
dialects)
here is an idea for a little feature:
crunch sql: this takes sql and tries to make it as small as possible
(basically, combining nested selects where possible and inlining
ctes)
expand sql:
breaks apart complex sql using nested queries and ctes, try to make
queries easier to understand in stages
is there a way to get incremental parsing like attoparsec?


@@ -20,6 +20,7 @@ linkSection =
\<li><a href='haddock/index.html'>Haddock</li>\n\
\<li><a href=\"supported_sql.html\" class=\"bare\">Supported SQL</a></li>\n\
\<li><a href=\"test_cases.html\">Test cases</a></li>\n\
\<li><a href=\"contributing.html\">Contributing</a></li>\n\
\</ul>\n\
\<br />\n\
\<ul class=\"sectlevel1\">\n\


@@ -8,40 +8,152 @@
== Contributing to simple sql parser

Contributions are welcome. Guidelines:

If you add something to the public api, follow the pattern already set for haddock.

If something isn't ANSI SQL, add it under a dialect flag which isn't enabled in the ANSI dialect (see the example below).

If you add dialect flags, add them to the appropriate dialects; create a new dialect if one doesn't already exist for that system. The current dialects are very much a work in progress, improve them if you see a gap that needs fixing.

Add tests for anything you add, modify or fix. Tests should provide examples of SQL that now parses and what it parses to. Negative tests are a good idea too - check that something behind a dialect flag doesn't parse when the flag is disabled.

It's an option to put tests in a test file dedicated to the dialect that a dialect flag was introduced for. The current testing doesn't quite stick to this approach at the moment; it's not the worst thing about the codebase.

Run all the preexisting tests and make sure they continue to pass. When it's ready, make a pull request on github.
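For example, limit syntax isn't ANSI SQL, so it's behind a dialect flag. A sketch of how a flag shows up for users of the library, assuming the Dialect record has a diLimit field as in recent versions (check the Dialect module for the real field names):

----
-- sketch only: the same sql is rejected by the ansi dialect and accepted
-- when the limit flag is switched on
{-# LANGUAGE OverloadedStrings #-}
import Data.Either (isRight)
import Language.SQL.SimpleSQL.Parse (parseQueryExpr)
import Language.SQL.SimpleSQL.Dialect (Dialect(..), ansi2011)

main :: IO ()
main = do
    let sql = "select a from t limit 10"
    print (isRight (parseQueryExpr ansi2011 "" Nothing sql))                  -- False
    print (isRight (parseQueryExpr ansi2011 {diLimit = True} "" Nothing sql)) -- True
----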
== Key design notes
The parsing is done using the Megaparsec library. The parser uses a separate lexer, also implemented with Megaparsec. This makes the code a lot simpler; a separate lexer used to be a big speed boost over not using one with Parsec, it's not clear this is still the case with Megaparsec.

SQL comes in a huge variety of annoyingly different dialects. The aspirational goal is to have a flag for each dialect that you pass to the parser, so it parses that dialect and rejects things not in that dialect. This is a bit of a stretch, since most major SQL systems have huge numbers of dialect options. One thing you learn when writing a non toy SQL parser is that you cannot hope to support anything comprehensively, you just have to do enough, and add the bits you missed when you need them.

The dialect system was introduced as a way to deal with a messy problem. Users of the library are able to decide what to consider reserved keywords - this is better than asking them to modify the library source.

A tradeoff is that all code that uses the library needs to be prepared to deal with/ignore parts of the abstract syntax which support features from all dialects. The least unreasonable way to fix this would be a system which generates dialect specific simple-sql-parser packages, which is still very unreasonable. The system probably doesn't always pretty print in the right dialect from correct syntax; this might need some changes if it causes a problem.

== Legal business

All contributions remain copyright of the person who wrote them. By contributing to the main repository, including but not limited to via a pull request, the copyright holder agrees to license those contributions under the BSD 3-clause license. This includes all contributions already made to the project.

== Release checklist

Check the version in the cabal file - update it if it hasn't already been updated. git grep for any other mentions of the version number that need updating.

Update the changelog, use git diff or similar to try to reduce the chance of missing anything important.

Run the tests (if any fail at the point of thinking about a release, then something has gone horribly wrong ...):

----
cabal test
----
Generate the website:
----
make website
----
It's a bit wonky so try running it a second time if it fails.
Then:
* check the webpages appear nicely
* check all the tests are rendered on the example page -> need to find a robust way of doing this, because there are huge numbers and it's impossible to eyeball and tell if it's good unless you somehow spot a problem.
* check the examples on the main page to check if they need updating
Do the cabal checks:
----
cabal update
cabal outdated
cabal check
----
Update stack.yaml to the latest lts - check this page: https://www.stackage.org/ . While updating, check the extra-deps field, if there are any there, see if they can be removed.
Install latest stack and check it works - maybe the stack.yaml file needs a tweak, maybe the cabal file.
----
ghcup list
ghcup install stack [LATEST FROM THE LIST]
stack test
----
Run the tests on the previous 2 ghcs latest point releases, and the latest ghc, each with the latest cabal-install they support (e.g. as of the start of 2024, these three ghc versions are 9.8.1, 9.6.4, 9.4.8). This is now trivial to do with ghcup, amazing progress in Haskell tools in recent years.
Build the release tarball, run a test with an example using this tarball:
----
cabal sdist
mkdir temp-build
# get the path to the tar.gz from the output of cabal sdist
cp simple-sql-parser/main/dist-newstyle/sdist/simple-sql-parser-0.X.X.tar.gz temp-build
cd temp-build
cabal init -n
cp ../tools/SimpleSqlParserTool.hs app/Main.hs
----
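If you'd rather not copy the full tool, a minimal app/Main.hs along these lines should be enough for the smoke test (it assumes the 0.7.x parseStatements signature - check the Haddock - and uses the same dependencies as listed below):
----
-- sketch only: parse the sql given on the command line, pretty show the ast
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Text as T
import System.Environment (getArgs)
import Language.SQL.SimpleSQL.Parse (parseStatements)
import Language.SQL.SimpleSQL.Dialect (ansi2011)
import Text.Show.Pretty (ppShow)

main :: IO ()
main = do
    ["parse", "-c", sql] <- getArgs  -- mimics: temp-build parse -c "select 1"
    case parseStatements ansi2011 "" Nothing (T.pack sql) of
        Left _ -> error "parse failed"
        Right ast -> putStrLn (ppShow ast)
----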
Add these to the build-depends: for the Main in the new cabal file, temp-build.cabal:
----
simple-sql-parser == 0.X.X,
pretty-show,
text
----
Add a cabal.project file containing:
----
packages:
./
./simple-sql-parser-0.X.X.tar.gz
----
Run the test:
----
cabal run temp-build -- parse -c "select 1"
----
Example of output on success:
----
$ cabal run temp-build -- parse -c "select 1"
Build profile: -w ghc-9.8.1 -O1
In order, the following will be built (use -v for more details):
- simple-sql-parser-0.7.0 (lib) (requires build)
- temp-build-0.1.0.0 (exe:temp-build) (first run)
Starting simple-sql-parser-0.7.0 (lib)
Building simple-sql-parser-0.7.0 (lib)
Installing simple-sql-parser-0.7.0 (lib)
Completed simple-sql-parser-0.7.0 (lib)
Configuring executable 'temp-build' for temp-build-0.1.0.0..
Preprocessing executable 'temp-build' for temp-build-0.1.0.0..
Building executable 'temp-build' for temp-build-0.1.0.0..
[1 of 1] Compiling Main ( app/Main.hs, /home/jake/wd/simple-sql-parser/main/temp-build/dist-newstyle/build/x86_64-linux/ghc-9.8.1/temp-build-0.1.0.0/x/temp-build/build/temp-build/temp-build-tmp/Main.o )
[2 of 2] Linking /home/jake/wd/simple-sql-parser/main/temp-build/dist-newstyle/build/x86_64-linux/ghc-9.8.1/temp-build-0.1.0.0/x/temp-build/build/temp-build/temp-build
[ SelectStatement
Select
{ qeSetQuantifier = SQDefault
, qeSelectList = [ ( NumLit "1" , Nothing ) ]
, qeFrom = []
, qeWhere = Nothing
, qeGroupBy = []
, qeHaving = Nothing
, qeOrderBy = []
, qeOffset = Nothing
, qeFetchFirst = Nothing
}
]
----
TODO: hlint?, how to do a spell check, what about automatic code formatting?
If there are any non trivial changes to the website or api, upload a new website.
Upload candidate to hackage, run a test with example using this package
- don't remember how this works, but I think you do the same as testing the tarball locally, except don't copy the tarball or add a cabal.project file; after uploading the candidate, run 'cabal update', then the cabal build should find the candidate if you gave it the exact version.
If all good, release the candidate - a button on the hackage website.
Todo: try to turn as much of this as possible into a script with a nice report, order this list properly, say what you need to check in more detail, say what else you need to redo if any steps need actions.


@@ -17,12 +17,11 @@ This is the documentation for version 0.7.0. Documentation for other
versions is available here:
http://jakewheat.github.io/simple-sql-parser/.

Status: usable for parsing a substantial amount of SQL. Adding support
for new SQL is relatively easy. Expect a little bit of churn on the AST
types when support for new SQL features is added.

This version is tested with GHC 9.8.1, 9.6.4, and 9.4.8.
== Feature support
@@ -417,7 +416,7 @@ http://jakewheat.github.io/intro_to_parsing/ (TODO: this is out of date, hopeful
== Contributing
See link:contributing.html[].
== Links
@@ -436,4 +435,4 @@ The simple-sql-parser is a lot less simple than it used to be. If you
just need to parse much simpler SQL than this, or want to start with a
simpler parser and modify it slightly, you could also look at the
basic query parser in the intro_to_parsing project, the code is here:
link:https://github.com/JakeWheat/intro_to_parsing/blob/master/SimpleSQLQueryParser0.lhs[SimpleSQLQueryParser] (TODO: this is out of date, hopefully it will be updated at some point).


@@ -1,36 +0,0 @@
:toc: right
:sectnums:
:toclevels: 10
:source-highlighter: pygments
= Release checklist
Check the version in the cabal file - update it if it hasn't already been updated
Update the changelog, use git diff or similar to try to avoid missing anything important
run the tests
generate the website:
check the webpages appear nicely
check all the tests are rendered on the example page
check the examples on the main page to check if they need updating
run cabal update, cabal outdated. cabal check
update stack.yaml to latest lts, install latest stack, run stack build
run the tests on the previous 2 ghcs' latest point releases, and the latest ghc, each with the latest cabal-install they support
build the release tarball, run a test with an example using this tarball
if there are any non trivial changes, upload a new website
upload candidate to hackage, run a test with example using this package
if all good, release the candidate
Todo: try to turn as much of this as possible into a script with a nice report, order this list properly, say what you need to check in more detail, say what else you need to redo if any steps need actions