What is a script? What is a library? How should I structure my project?
Well, there’s no single answer. What you want is a structure that everyone is comfortable with. Consider the following points:
- If you have to run a sequence of ls, cd, ls, cd commands whenever you want to navigate to a file, or worse, constantly use the find command, then your structure is wrong. You should know where files are.
- q is not a compiled language, so the runtime structure should look the same as the git structure. Don’t be tempted to strip out layers of folders, reorganise files, or rename things as part of your build process.
- Decide how many git repositories you want to have. Is it 1? Is it 20? If you have 4 or 5 repositories, chances are you’ve gone very wrong; you should have 1 or lots (20, 30). Either go with 1 repo for everything and deal with merge conflicts but don’t worry about component versions, or have 1 repo per component. If you’re getting started, ask your git admin for 20 repos and call them myteamnamerepo1..20 for now. If your git admin/boss looks at you like you are crazy and starts talking about costs (in some places 20*small is more expensive than 1*large), then go with a single repo. If you have multiple repos, then under no circumstances should a dev EVER have to type git clone for each repo. You should have a wiki page with a one-liner (even if it’s a bash for loop) that clones them all.
- Decide on folder structure and names (more on this later) and be consistent across repos.
- Is part of your codebase written by users? Are quants giving you analytics? Try to give them a git repo separate from yours where they can open a pull request, which your devs then review. But try to chill when management insists that this is not allowed and that quants must email you code instead. Sometimes life makes zero sense.
- You WILL have other languages, such as bash, Python, Java and C++, so make sure your layout works for all of them.
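As a minimal sketch of the clone-everything one-liner mentioned in the list above: the git host below is a made-up placeholder, and the repo names simply follow the myteamnamerepo1..20 suggestion. The function only prints the git clone commands, so you can check them before running anything.

```shell
#!/bin/sh
# Assumption: git.example.com:myteam is a placeholder, not a real host.
GIT_BASE_URL="git@git.example.com:myteam"
REPO_COUNT=20

# Print one "git clone" command per repo (myteamnamerepo1..REPO_COUNT).
clone_all_cmds() {
  i=1
  while [ "$i" -le "$REPO_COUNT" ]; do
    echo "git clone ${GIT_BASE_URL}/myteamnamerepo${i}.git"
    i=$((i + 1))
  done
}

clone_all_cmds
```

Once the names and host are right, piping the output into sh (clone_all_cmds | sh) is the one-liner that goes on the wiki page.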
Ok, so you have a good idea of the repo layout, you’ve talked to the team and gathered ideas. Now you need to decide on your folder layout. Consider what the team has said, but don’t take their ideas too seriously: come up with your own solution, document it, then ask them to challenge it. Why? Because if you ask someone else to design it, then unless they are going to spend a couple of hours answering you, their answer won’t cover all the scenarios, or will just be what they are familiar with rather than what is good. Remember, they probably don’t care and will be happy with your solution, but ask them and they will have an opinion.
So, how do I lay out my projects?
src/
test/
All the real code, the system under test (SUT), is inside src/. This works well with various tools and other languages. It doesn’t have to be called src, but that’s pretty common. All your tests live in test/. What would be a bad layout?
lib/
scripts/
q/
test/
This is a terrible layout. You want your unit tests to have the same folder structure as your code, which means that test/ now needs to have lib, scripts and q folders inside it. It’s not symmetrical, which hurts productivity.
What about putting test folders inside lib, scripts, q?
Well, now your release script changes from cp -r src/ … to something more complex, as you (probably) don’t want test code in production.
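To make the point concrete, here is a rough sketch of the simple case: a release step that copies src/ and nothing else. The function name release and both of its arguments are hypothetical; the real destination is whatever your deployment uses.

```shell
#!/bin/sh
# release PROJECT_ROOT DEST
# Copies PROJECT_ROOT/src to DEST/src unchanged, so the runtime layout
# matches the git layout. PROJECT_ROOT/test is never touched.
release() {
  project_root="$1"
  dest="$2"
  mkdir -p "$dest" && cp -R "$project_root/src" "$dest/"
}
```

With test folders mixed into lib, scripts and q, this single cp would need exclusion rules instead.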
So what about other kinds of test?
src/lib
src/scripts
src/q
test/lib
test/scripts
test/q
test/integration
Just put them in the test/ folder with an appropriate name.
What about putting tests at the end of q files? We do it in python.
if __name__ == "__main__":
    unit test stuff…
No, just no. This isn’t good in python, and it isn’t good in q. I know it’s all geeky cool, but no. So many times quants send me python code where they misspell the if statement line, or forget to include the if but include the unit tests. Don’t you want to write good tests? If you do, then your test code will be longer than the code it tests, so it makes little sense for it to be in the same file. Also, remind me, when you are debugging and want to see all references to a function, how are you achieving this? Almost all q IDEs will be of no help, so you resort to
grep -r
How exactly are you excluding the test calls from the real ones in the result list?
If tests are in test/, you can use either of these commands to exclude them (PATTERN stands for whatever you are searching for):
grep -r PATTERN src #just don’t look inside test
grep -r PATTERN . | grep -v /test/ #grep everything (helpful when you have multiple repos) then filter out test folders
TL;DR What is a good layout?
$ find .
./src/
./src/lib/
./src/lib/utils
./src/lib/utils/disk.q
./src/lib/utils/utils.q
./src/lib/permissions
./src/lib/permissions/permissions.q
./src/lib/feeds/exchange1.q
./src/lib/feeds/exchange2.q
./src/lib/feeds/exchange3.java #if this needs multiple classes then exchange3 should be a folder, or you can move feeds into src/ instead of lib, etc.
./src/lib/tp/tp.q
./src/lib/tp/u.q
./src/lib/rdb/rdb.q
./src/lib/rdb/r.q
./src/scripts/tp.q
./src/scripts/rdb.q
./src/bin/tp.sh #.sh is optional but be consistent, and note that no extension is harder to grep for.
./src/bin/rdb.sh
./test
./test/lib/…… #every file/folder in src should have a file/folder in ./test with the same name, with unit tests
./test/integration/
./test/health/
./test/performance/
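The rule that every file/folder in src has a same-named counterpart in test can be checked mechanically. A minimal sketch, assuming the src/ and test/ layout above; missing_tests is a name made up for this example.

```shell
#!/bin/sh
# missing_tests ROOT
# Prints every path under ROOT/src (relative to src) that has no file or
# folder with the same relative path under ROOT/test.
missing_tests() {
  root="$1"
  (cd "$root/src" && find . -mindepth 1 | sort) | while IFS= read -r path; do
    [ -e "$root/test/$path" ] || echo "${path#./}"
  done
}
```

Run missing_tests . from the repo root; no output means the two trees are symmetrical.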
Why are there two tp.q’s?
One is the library, the other is the script. There are two types of bugs:
- Ones where you just know the folder/file/function to look at, in order to fix it, and you don’t even need a running process to understand it
- Everything else where you don’t really have a clue. You want your new team members to be able to do some of that annoying support work right? So they are going to start here for every bug.
In the second scenario you don’t know where the issue is, so you want to start from the beginning. “The process started at X datetime, with these arguments, and then loaded Y, did Z, ….”. Finding bugs is not about reading the entire codebase, printing it out, memorising it and then saying AHA! Finding bugs is about not reading the codebase, I mean, reading the minimum number of lines. So we split our code.
lib/tp/tp.q
This is the library. A library just contains variables and functions; it doesn’t do anything when loaded. It doesn’t listen on a port, it doesn’t process data, it just has definitions. It might define some table schemas (maybe audit.q defines .audit.defaultTable, but it doesn’t define `..audit or .audit.table), but mostly just variables and functions. So what functions are in the tp.q lib?
A selection might be:
.tp.sub:{…}
.tp.pub:{…}
.tp.upd:{…}
.tp.endofday:{…}
.tp.init:{…} /should either be the first or last function, not hidden in the middle
Ok, so what does scripts/tp.q look like?
Scripts pretty much DON’T define functions. Think of most languages other than kdb and web languages. They have a function called main right? Your main function should be in its own file, not in between a bunch of class definitions. So our example would look like this:
\l ….framework.q /i might write a whole blog on bash and q pwd stuff, but for now, there is a framework.q file somewhere, every single process in your plant loads it. framework.q loads the utils that pretty much every process needs like log.q, audit.q, utils.q, etc
\l ….libs/tp/tp.q /and u.q if it exists, hopefully you don’t have single letter filenames!
.tp.init[.z.d;args`port]
.log.info"TP ready"
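For completeness, a sketch of what a matching src/bin/tp.sh could build on. This is an assumption-heavy illustration: it assumes q is on the PATH, that bin/ and scripts/ sit side by side under src/ as in the layout above, and that scripts/tp.q reads a -port argument (which the args`port call hints at, though the argument parsing itself is not shown in this post). The name tp_cmd and the port value in the usage note are made up.

```shell
#!/bin/sh
# tp_cmd BIN_DIR PORT
# Prints the command line that would start the tickerplant script.
# The path is derived from the location of bin/, not from the caller's
# working directory, so it is the same wherever it is launched from.
tp_cmd() {
  src_dir="$(cd "$1/.." && pwd)"
  echo "q $src_dir/scripts/tp.q -port $2"
}
```

Inside bin/tp.sh, calling tp_cmd "$(dirname "$0")" 5010 would print the command; a real launcher would run that command instead of printing it.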
These are just my opinions and what has worked for me. I hope to help people avoid some of the horror I have experienced. I’d be interested to know your layouts, and what you like/dislike about them.