
update docs

Jake Wheat 2024-01-12 19:25:13 +00:00
parent fe6b71fa2a
commit fa5091ac80
6 changed files with 213 additions and 451 deletions


@@ -39,7 +39,7 @@ website : website-non-haddock build-haddock
.PHONY : website-non-haddock
website-non-haddock : build/main.css build/ocean.css build/index.html build/supported_sql.html \
build/test_cases.html build/contributing.html build/release_checklist.html
build/test_cases.html build/contributing.html
build/main.css : website/main.css
@@ -59,10 +59,6 @@ build/supported_sql.html : website/supported_sql.asciidoc website/AddLinks.hs
build/contributing.html : website/contributing.asciidoc website/AddLinks.hs
asciidoctor website/contributing.asciidoc -o - | cabal -v0 exec runhaskell website/AddLinks.hs > build/contributing.html
build/release_checklist.html : website/release_checklist.asciidoc website/AddLinks.hs
asciidoctor website/release_checklist.asciidoc -o - | cabal -v0 exec runhaskell website/AddLinks.hs > build/release_checklist.html
build/test_cases.html : website/RenderTestCases.hs
cabal -v0 exec runhaskell -- --ghc-arg=-package=pretty-show -itools website/RenderTestCases.hs > build/test_cases.asciidoc
asciidoctor build/test_cases.asciidoc -o - | \

TODO

@@ -1,406 +1,96 @@
This file is completely out of date.
Some random notes on what could be done with the package in the future. None of this is scheduled.
The most important thing is adding more support for needed SQL. Everything else is very secondary to this.
Infrastructure
--------------
write a CI script
decide whether to use a code formatter - pro: it will preserve git blame stuff better
switch the website to use markdown
try to improve the usability of the rendered test cases
add automated tests for the examples on the website
add a few more examples to the website:
parse some sql and detect if it has a particular feature
do a transformation on some sql
idea: convert tpch to sql server syntax
generate some sql
format some sql
check if some sql parses
trivial documentation generation for ddl
trivial lint checker
demos:
crunch sql: this takes sql and tries to make it as small as possible
(combining nested selects where possible and inlining
ctes)
expand sql:
breaks apart complex sql using nested queries and ctes, try to make
queries easier to understand in stages
write a beginners tutorial for how to add support for some new sql syntax
show how to develop parsers interactively, then tidy them up for merging
to the main branch
review code coverage and see if there are any important gaps to fill in
set up hlint to run easily
Code
----
medium tasks next release
There could be more negative tests for lexing and dialect options.
review alters, and think about adding rename versions
which are really common and useful, but not in ansi
https://github.com/JakeWheat/simple-sql-parser/issues/20
Check the fixity in the tableref parsing, see if there is anywhere else that needs tweaking.
try to get some control over the pretty printing and the error
messages by creating some dumps of pretty printing and error messages,
then can rerun these every so often to see how they've changed
Do all sql dialects have compatible fixities? If not, want to add dialect control over the fixity.
finish off going through the keyword list
add parse error recovery
do more examples
what are the use cases?
sql generator - queries
sql generator - ddl
parsing some sql - for what purpose
generating documentation of ddl
write some sort of trivial sql engine or wrapper around something?
write something that takes sql, modifies it, and outputs the result
lint checker?
add ability to type check:
uuagc still seems like the nicest option?
uuagc has an option to attach to an external ast now, so could
put the type checker in a separate package
do an example of adding some new syntax
-> seems quite a few people are using this
and there are some feature requests
try to give people a path to implement features themselves
figure out how to support parsing some sql, transforming it, pretty printing it
while preserving as much of the original formatting as possible, and all the comments
an intermediate step is to minimise the difference in non whitespace/comment tokens
when you parse then pretty print any supported sql
goals:
add an annotation field to the syntax to make it more useful
add source positions to this annotation when parsing
1. if someone might want to use this, give them some toy examples to
help bootstrap them
2. see if can encourage people who want some missing sql to add it
themselves
review main missing sql bits - focus on more mainstream things
could also review main dialects
syntax from hssqlppp:
query hints, join hints
unescaping identifiers and strings
continuation strings testing
add tests for comment pretty printing:
use pretty then lex
work on better dialect design: more basic customizability and rule /
callback driven
review/fix documentation and website
fix the groups for generated tests
check the .cabal file module lists
medium tasks next release + 1
add annotation
lots more negative tests especially for lexing, and for dialects
escape, uescape
post hoc fixity
switch pretty printing to use ansi-wl-pprint
http://conscientiousprogrammer.com/blog/2015/12/17/24-days-of-hackage-2015-day-17-ansi-wl-pprint-avoiding-string-hacking/
error message analysis:
start with a set of bad sql, generate & write
get error messages:
simplified ssp parser
tutorial parser
hssqlppp
and also:
postgres
mysql
sqlserver
oracle
db2
vertica?
evaluate other parsing libs for error messages and general
feasibility, shortlist is:
megaparsec
trifecta
uuparsinglib
other desirables from parsing lib:
incremental parsing
context dependent lexer switch
continue after error
create some benchmarks (to measure performance when modifying for
error messages, and to compare different parser libs for instance)
use quickcheck in lexing
What will make this library nice and complete:
List of all the SQL that it doesn't support
annotation, with positions coming from the parser
dml
ddl
procedural sql
dialects: reasonable support for sql server and oracle, and maybe also
postgres, mysql, teradata, redshift, sqlite, db2, sap stuff, etc.
good work on error messages
fixity code + get it right
review names of syntax
defaults handled better (use default/nothing instead of substituting
in the default)
evaluate uu parsing lib -> could at least remove need to do left
factoring, and maybe help make better error messages also
-----
work on reasonable subset of sql which is similar to the current
subset and smaller than the complete 2011 target: describe the
exact target set for the next release
improve the dialect testing: add notes on what to do
position annotation in the syntax
simple stuff for error message and pretty printing monitoring:
create a sample set of valid statements to pretty print
pretty print these
compare every so often to catch regressions and approve improvements
start with tpch, and then add some others
same with invalid statements to see the error messages
start with some simple scalar exprs and a big query expr which has
stuff (either tokens, whitespace or junk strings)
semi-systematically added and/or removed
fixing the non idiomatic (pun!) suffix parsing:
typename parsing
identifier/app/agg/window parsing
join parsing in trefs (use chain? - tricky because of postfix onExpr)
top level and queryexprs parsing
can you make it properly extensible? the goal is for users to work with asts that
represent only the dialect they are working in
review names in the syntax for correspondence with sql standard, avoid
gratuitous differences
touch up the expr hack as best as can, start thinking about
replacement for buildExprParser, maybe this can be a separate
general package, or maybe something like this already exists
reduce use of booleans in the syntax
careful review of token parses wrt trailing delimiters/junk - already
caught a few issues like this incidentally when working on other
stuff
quasi quotation support
undo mess in the code created by adding lots of new support:
much more documentation
refactor crufty bits
reorder the code
reconsider the names and structure of the constructors in the syntax
refactor the typename parser - it's a real mess
fix the lexing
use this lib to build a typesafe sql wrapper for haskell
add documentation in Parser.hs on the left factoring/error handling
approach
optimise the lexer:
add some benchmarks
do some experiments with left factoring
try to use the token approach with megaparsec
fixes:
rewrite bits of the parser, lots of it is a bit questionable
- an expert with megaparsec would write something simpler
I think it's not worth doing for the sake of it, but if a bit
is too difficult to add new features to, or to improve
the error messages, then it might be worth it
keyword tree, add explicit result then can use for joins also
work on error messages
keyword tree support prefix mode so can start from already parsed
token
review the crazy over the top lexer testing
maybe it's enough to document an easy way to skip these tests
left factor/try removal summary (this list needs updating):
identifier starts:
interval literal
character set literal
typed literals, multikeywords
identifier
app, agg, window
keyword function
issues in the special op internals
not between + other ops: needs new expression parsing
not in also
in suffix also
lots of overlap with binary and postfix multi keyword operators
quantified comparison also
issues in the typename parsing
dot in identifiers and as operator
issues in the symbol parser
hardcode all the symbols in the symbol parser/split?
conflict with in suffix and in in position
rules for changing the multi keyword parsing:
if a keyword must be followed by another
e.g. left join, want to refactor to produce 'expected "left join"'
if the keyword is optionally followed by another, e.g. with
recursive, then don't do this.
change join defaults to be defaults
rough SQL 2011 todo, including tests to write:
review the commented out reserved keyword entries and work out how to
fix
test case insensitivity and case preservation
big areas:
window functions
nested window functions
case
table ref: tablesample, time period spec, only, unnest, table, lateral
bug
joined table: partitioned joins
group by: set quantifier
window clause
other areas:
unicode escape, strings and idens
character set behaviour review
datetime literals
mixed quoting identifier chains
names/identifiers careful review
general value bits
collate for
numeric val fn
string exp fn
datetime exp fn
interval exp fn
rows
interval qualifier
with
setop
order/offset/fetch
search/cycle
preds:
between
in
like
similar
regex like?
null
normalize
match
overlaps
distinct
member
submultiset
period
alias for * in select list
create list of unsupported syntax: xml, ref, subtypes, modules?
---
after next release
medium term goals:
1. replace parser and syntax in hssqlppp with this code (keep two
separate packages in sync)
2. this replacement should have better error messages, much more
complete ansi sql 2011 support, and probably will have reasonable
support for these dialects: mssql, oracle and teradata.
review areas where this parser is too permissive, e.g. value
expressions allowed where column reference names only should be
allowed, such as group by, order by (perhaps there can be a flag or
warnings or something), unqualified asterisk in select list
fix the expression parser completely: the realistic way is to adjust
for precedence and associativity after parsing since the concrete
syntax is so messy. should also use this expression parser for
parsing joins and for set operations, maybe other areas.
table expression in syntax:
QueryExpr = Select SelectList (Maybe TableExpr)
and the TableExpr contains all the other bits?
change the booleans in the ast to better types for less ambiguity?
decide how to handle character set literals and identifiers: don't
have any intention of actually supporting switching character sets
in the middle of parsing so maybe this would be better disabled?
review places in the parse which should allow only a fixed set of
identifiers (e.g. in interval literals), keep in mind other
dialects and extensibility
decide whether to represent numeric literals better, instead of a
single string - break up into parts, or parse to a Decimal or
something
= future big feature summary
all ansi sql queries
completely working expression tree parsing
error messages, left factor
dml, ddl, procedural sql
position annotation
type checker/ etc.
lexer
dialects
quasi quotes
typesafe sql dbms wrapper support for haskell
extensibility
performance analysis
try out uu-parsing or polyparse, especially wrt error message
improvements
= stuff
try and use the proper css theme
create a header like in the haddock with simple-sql-parser +
contents link
change the toc gen so that it works the same as in haddock (same
div, no links on the actual titles)
fix the page margins, and the table stuff: patches to the css?
release checklist:
hlint
haddock review
spell check
update changelog
update website text
regenerate the examples on the index.txt
= Later general tasks:
docs
add preamble to the rendered test page
add links from the supported sql page to the rendered test page for
each section -> have to section up the tests some more
testing
review tests to copy from hssqlppp
add lots more tests using SQL from the xb2 manual
much more table reference tests, for joins and aliases etc.?
review internal sql collection for more syntax/tests
other
----
demo program: convert tpch to sql server syntax exe processor
run through other manuals for example queries and features: sql in a
nutshell, sql guide, sql reference guide, sql standard, sql server
manual, oracle manual, teradata manual + re-through postgresql
manual and make notes in each case of all syntax and which isn't
currently supported also.
check the order of exports, imports and functions/cases in the files
fix up the import namespaces/explicit names nicely
ast checker: checks the ast represents valid syntax, the parser
doesn't check as much as it could, and this can also be used to
check generated trees. Maybe this doesn't belong in this package
though?
= other sql support
top
string literals
full number literals -> other bases?
apply, pivot
maybe add dml and ddl, source poses, quasi quotes
leave: type check, dialects, procedural, separate lexing?
other dialect targets:
postgres
oracle
teradata
ms sql server
mysql?
db2?
what other major dialects are there?
sqlite
sap dbmss (can't work out what are separate products or what are the
dialects)
here is an idea for a little feature:
crunch sql: this takes sql and tries to make it as small as possible
(basically, combining nested selects where possible and inlining
ctes)
expand sql:
breaks apart complex sql using nested queries and ctes, try to make
queries easier to understand in stages
check more of the formatting of the pretty printing and add regression tests for this
is there a way to get incremental parsing like attoparsec?


@@ -20,6 +20,7 @@ linkSection =
\<li><a href='haddock/index.html'>Haddock</li>\n\
\<li><a href=\"supported_sql.html\" class=\"bare\">Supported SQL</a></li>\n\
\<li><a href=\"test_cases.html\">Test cases</a></li>\n\
\<li><a href=\"contributing.html\">Contributing</a></li>\n\
\</ul>\n\
\<br />\n\
\<ul class=\"sectlevel1\">\n\


@@ -8,40 +8,152 @@
== Contributing to simple sql parser
Contributions are welcome. It's preferred if they follow some guidelines:
Guidelines:
If you add something to the public api, follow the pattern already set for haddock.
If something isn't ansi sql, add it under a dialect flag which isn't enabled in the ansi dialect.
If something isn't ANSI SQL, add it under a dialect flag which isn't enabled in the ANSI dialect.
If you add dialect flags, add them to the appropriate dialects, create a new one if it's a system which doesn't already have a dialect.
If you add dialect flags, add them to the appropriate dialects; create a new one if the dialect doesn't already exist. The current dialects are very much WIP; improve them if you see a gap you need fixed.
Testing
Add tests for anything you add, modify or fix.
Run all the preexisting tests and make sure they continue to pass.
Tests should provide examples of SQL that now parses and what it parses to.
Add tests for anything you add, negative tests are a good idea too - check something that's behind a dialect flag doesn't parse when disabled.
It's ideal if tests for something set with a dialect flag go in a test file for that dialect flag, unless it's an ANSI feature that's disabled in other dialects. It's also an option to put tests in a test file dedicated to the dialect that the dialect flag was introduced for. The current testing doesn't quite stick to this approach at the moment, but it's not the worst thing about the codebase.
When it's ready, make a pull request on github.
== Key design notes
The parsing is done using the megaparsec library.
The parsing is done using the Megaparsec library. The parser uses a separate lexer, also implemented with Megaparsec.
The parser uses a separate lexer. I think this makes the code a lot simpler. It used to be a big speed boost over naively not using a separate lexer with Parsec; I'm not sure this is still the case with Megaparsec.
The dialect system was introduced as a way to deal with a messy problem. Users of the library are able to decide what to consider reserved keywords - this is better than asking them to modify the library source.
SQL comes in a huge variety of annoyingly different dialects. The aspirational goal is to have a dialect flag for each dialect that you pass to the parser and it parses that dialect and rejects things not in that dialect. This is a bit of a stretch since most major SQL systems have huge numbers of dialect options. One thing you learn when writing a non-toy SQL parser is you cannot hope to comprehensively support anything, you just have to do enough, and add bits you missed when you need them.
A tradeoff is all code that uses the library needs to be prepared to deal with/ignore parts of the abstract syntax which supports all features from all dialects.
A big tradeoff here is all code needs to be prepared to deal with the abstract syntax which supports all features from all dialects. I think the least unreasonable way you could fix this would be to have a system which generates dialect specific simple-sql-parser packages, which is still very unreasonable.
== Legal business
The system probably doesn't always pretty print in the right dialect from correct syntax. This might need some changes if it causes a problem.
All contributions remain copyright of the person who wrote them. By contributing to the main repository, including but not limited to via a pull request, the copyright holder agrees to license those contributions under the BSD 3-clause license. This includes all contributions already made to the project.
TODO: handling of keywords, and relationship with dialect
== Release checklist
TODO: tests overview in addition to the above
Check the version in the cabal file - update it if it hasn't already been updated. git grep for any other mentions of the version number that need updating.
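The version check above could be sketched roughly like this. It's only a sketch: the helper name `cabal_version` is made up for illustration, and it assumes the `version:` field sits on a single line of the cabal file.

```shell
# Hypothetical helper: print the value of the "version:" field of a .cabal file.
# Assumes the field is written on one line, e.g. "version:  0.7.0".
# The "cabal-version:" field is not matched, because the whole first word
# has to equal "version:".
cabal_version() {
    awk 'tolower($1) == "version:" { print $2; exit }' "$1"
}
```

Usage might look like `v=$(cabal_version simple-sql-parser.cabal)` followed by `git grep -n -F "$v"` to list other mentions of that version number. The cabal file name here is an assumption, adjust it to the real one.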
TODO: how the website works, what it contains
Update the changelog, use git diff or similar to try to reduce the chance of missing anything important.
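One possible way to get a list to compare the changelog against is to print the commits made since the last tag. This is a sketch that assumes releases are tagged in git, which may not match how this repository is actually managed; the function name is made up.

```shell
# Hypothetical helper: list one-line summaries of commits after the most recent tag.
# Assumption: each release has a git tag. If there are no tags,
# git describe fails and the function returns non-zero.
commits_since_last_tag() {
    last_tag=$(git describe --tags --abbrev=0) || return 1
    git log --oneline "${last_tag}..HEAD"
}
```

Run `commits_since_last_tag` from inside the repository and tick each entry off against the changelog.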
== Releasing
Run the tests (if any fail at the point of thinking about a release, then something has gone horribly wrong ...)
See the link:release_checklist.html[] for things that should be done before each release.
----
cabal test
----
Generate the website
----
make website
----
It's a bit wonky so try running it a second time if it fails.
Then:
* check the webpages appear nicely
* check all the tests are rendered on the example page -> need to find a robust way of doing this, because there are huge numbers and it's impossible to eyeball and tell if it's good unless you somehow spot a problem.
* check the examples on the main page to check if they need updating
Do the cabal checks:
----
cabal update
cabal outdated
cabal check
----
Update stack.yaml to the latest lts (check https://www.stackage.org/). While updating, check the extra-deps field; if there are any entries there, see if they can be removed.
Install latest stack and check it works - maybe the stack.yaml file needs a tweak, maybe the cabal file.
----
ghcup list
ghcup install stack [LATEST FROM THE LIST]
stack test
----
Run the tests on the previous 2 ghcs' latest point releases, and the latest ghc, each with the latest cabal-install they support (e.g. as of the start of 2024, these three ghc versions are 9.8.1, 9.6.4, 9.4.8). This is now trivial to do with ghcup; Haskell tools have made amazing progress in recent years.
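The multi-ghc test run could be looped roughly as follows. This is a sketch, not a tested release script: the function names and the `DRY_RUN` variable are made up, it relies on ghcup putting versioned `ghc-X.Y.Z` executables on the PATH, and it does not pick a cabal-install version per ghc.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install each requested ghc with ghcup, then run the test suite with it.
# Stops at the first failing step.
test_with_ghcs() {
    for v in "$@"; do
        run_cmd ghcup install ghc "$v" || return 1
        run_cmd cabal test -w "ghc-$v" || return 1
    done
}
```

For example, `DRY_RUN=1 test_with_ghcs 9.8.1 9.6.4 9.4.8` prints the six commands without running them, which is a cheap way to check the list before doing the real run.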
Build the release tarball, run a test with an example using this tarball:
----
cabal sdist
mkdir temp-build
# get the path to the tar.gz from the output of cabal sdist
cp simple-sql-parser/main/dist-newstyle/sdist/simple-sql-parser-0.X.X.tar.gz temp-build
cd temp-build
cabal init -n
cp ../tools/SimpleSqlParserTool.hs app/Main.hs
----
Add these to the build-depends: for the Main in the new cabal file, temp-build.cabal:
----
simple-sql-parser == 0.X.X,
pretty-show,
text
----
Add a cabal.project file containing:
----
packages:
  ./
  ./simple-sql-parser-0.X.X.tar.gz
----
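If this step gets scripted, the cabal.project file could be written with something like the following. It's a sketch with a made-up function name; the version is passed in as an argument rather than guessed, and the entries under `packages:` are indented because they are continuation lines of that field.

```shell
# Hypothetical helper: write a cabal.project that builds the current
# directory together with the sdist tarball for the given version.
write_cabal_project() {
    version=$1
    printf 'packages:\n  ./\n  ./simple-sql-parser-%s.tar.gz\n' "$version" > cabal.project
}
```

Call it from inside temp-build, e.g. `write_cabal_project 0.7.0`, after copying the tarball there.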
Run the test:
----
cabal run temp-build -- parse -c "select 1"
----
Example of output on success:
----
$ cabal run temp-build -- parse -c "select 1"
Build profile: -w ghc-9.8.1 -O1
In order, the following will be built (use -v for more details):
- simple-sql-parser-0.7.0 (lib) (requires build)
- temp-build-0.1.0.0 (exe:temp-build) (first run)
Starting simple-sql-parser-0.7.0 (lib)
Building simple-sql-parser-0.7.0 (lib)
Installing simple-sql-parser-0.7.0 (lib)
Completed simple-sql-parser-0.7.0 (lib)
Configuring executable 'temp-build' for temp-build-0.1.0.0..
Preprocessing executable 'temp-build' for temp-build-0.1.0.0..
Building executable 'temp-build' for temp-build-0.1.0.0..
[1 of 1] Compiling Main ( app/Main.hs, /home/jake/wd/simple-sql-parser/main/temp-build/dist-newstyle/build/x86_64-linux/ghc-9.8.1/temp-build-0.1.0.0/x/temp-build/build/temp-build/temp-build-tmp/Main.o )
[2 of 2] Linking /home/jake/wd/simple-sql-parser/main/temp-build/dist-newstyle/build/x86_64-linux/ghc-9.8.1/temp-build-0.1.0.0/x/temp-build/build/temp-build/temp-build
[ SelectStatement
Select
{ qeSetQuantifier = SQDefault
, qeSelectList = [ ( NumLit "1" , Nothing ) ]
, qeFrom = []
, qeWhere = Nothing
, qeGroupBy = []
, qeHaving = Nothing
, qeOrderBy = []
, qeOffset = Nothing
, qeFetchFirst = Nothing
}
]
----
TODO: hlint?, how to do a spell check, what about automatic code formatting?
If there are any non trivial changes to the website or api, upload a new website.
Upload candidate to hackage, run a test with example using this package
- I don't remember exactly how this works, but I think you do the same as when testing the tarball locally, except you don't copy the tarball or add a cabal.project file. After uploading the candidate, I think you just need to do a 'cabal update', then the cabal build should find the candidate if you gave it the exact version.
If all good, release the candidate - a button on the hackage website.
Todo: try to turn as much of this as possible into a script with a nice report, order this list properly, say what you need to check in more detail, say what else you need to redo if any steps need actions.


@@ -17,12 +17,11 @@ This is the documentation for version 0.7.0. Documentation for other
versions is available here:
http://jakewheat.github.io/simple-sql-parser/.
Status: covers a lot of queries already, but the public API is
probably not very stable, since adding support for all the
not-yet-supported ANSI SQL syntax, then other dialects of SQL is
likely to change the abstract syntax types considerably.
Status: usable for parsing a substantial amount of SQL. Adding support
for new SQL is relatively easy. Expect a little bit of churn on the AST
types when support for new SQL features is added.
Tested with GHC 9.8.1, 9.6.4, and 9.4.8.
This version is tested with GHC 9.8.1, 9.6.4, and 9.4.8.
== Feature support
@@ -417,7 +416,7 @@ http://jakewheat.github.io/intro_to_parsing/ (TODO: this is out of date, hopeful
== Contributing
Contributions are welcome, there are some notes on these pages: link:contributing.html[], link:release_checklist.html[].
See link:contributing.html[].
== Links
@@ -436,4 +435,4 @@ The simple-sql-parser is a lot less simple than it used to be. If you
just need to parse much simpler SQL than this, or want to start with a
simpler parser and modify it slightly, you could also look at the
basic query parser in the intro_to_parsing project, the code is here:
link:https://github.com/JakeWheat/intro_to_parsing/blob/master/SimpleSQLQueryParser0.lhs[SimpleSQLQueryParser].
link:https://github.com/JakeWheat/intro_to_parsing/blob/master/SimpleSQLQueryParser0.lhs[SimpleSQLQueryParser] (TODO: this is out of date, hopefully it will be updated at some point).


@@ -1,36 +0,0 @@
:toc: right
:sectnums:
:toclevels: 10
:source-highlighter: pygments
= Release checklist
Check the version in the cabal file - update it if it hasn't already been updated
Update the changelog, use git diff or similar to try to avoid missing anything important
run the tests
generate the website:
check the webpages appear nicely
check all the tests are rendered on the example page
check the examples on the main page to check if they need updating
run cabal update, cabal outdated, cabal check
update stack.yaml to latest lts, install latest stack, run stack build
run the tests on the previous 2 ghcs' latest point releases, and the latest ghc, each with the latest cabal-install they support
build the release tarball, run a test with an example using this tarball
if there are any non trivial changes, upload a new website
upload candidate to hackage, run a test with example using this package
if all good, release the candidate
Todo: try to turn as much of this as possible into a script with a nice report, order this list properly, say what you need to check in more detail, say what else you need to redo if any steps need actions