q: A Programming Language You’ve Never Heard of (And That’s A Bad Thing)

   There is no shortage of popular programming languages out there right now, and q is not one of them.

   Not too long ago, Brooklyn College’s computer science program changed its teaching language from C++ to Java. I was told in class that the department lost professors who didn’t want to make the transition. I thought it was cool that Brooklyn College was still teaching C++ while most colleges had been teaching Java for over 20 years; the usual progression of teaching languages went from Pascal to C to Java. Recently, other colleges such as Harvard and M.I.T. have started students off with Python, which might be more useful because of its ubiquity in hot areas such as data science and machine learning, leaving Java as the most outsourced language out there.

   Still, the great thing about most of these languages is that if you know one well, it is totally possible to pick another one up. All you have to do is learn some new syntax and libraries. You structure your code the same way: loops, familiar data structures, a top-down approach.

   The q programming language is a 3GL, but it is very different from mainstream languages. It is an array programming language, like APL or Lisp. This means that instead of having data spoon-fed to you one element at a time in a loop, the entire collection of data gets operated on at once. Such whole-array operations are reminiscent of the vector and matrix work in Professor Preston’s Linear Algebra course, but at a much larger scale.
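   For instance, here is a minimal sketch of what that looks like in a q session (the numbers are arbitrary, and the printed results are what I would expect the console to show):

      q)2 * 1 2 3 4 5          / arithmetic applies to the whole list at once; no loop
      2 4 6 8 10
      q)sum 1 2 3 4 5          / aggregate the entire list in one operation
      15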

   In fact, q is just a wrapper language on top of another language called K. The K programming language was created in the early 1990s by Arthur Whitney, who cut his teeth building array languages at Morgan Stanley. Like James Naismith, who invented basketball in Springfield, Massachusetts, Whitney is a Canadian inventing in the U.S. You could say the same about Brian Kernighan, the Canadian who helped shape UNIX, co-authored the definitive book on C, and is the K in AWK, among the other letter combinations known and loved by programmers.

   K is written in C, and yet it is said to be faster than C. How can that be? The difference is the philosophy of K, and by extension q, and by extension kdb+ (we’ll get to that shortly). The philosophy is, to put it mildly, a minimalist one. The average technology worker looks at q code and asks, “What the . . . ?” This makes q/kdb+ consultants attractive. The code looks like this: “10 {x,sum -2#x}/ 0 1” computes the Fibonacci sequence. The idea is that you can write a few short, optimized lines of q instead of 100 lines of C, which would be more error-prone and more dependent on the skill of the developer.
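   Spelled out in a q session (the printed result is what I would expect; treat it as a sketch), that one-liner reads:

      q)10 {x,sum -2#x}/ 0 1   / run the lambda 10 times, starting from the list 0 1
      0 1 1 2 3 5 8 13 21 34 55 89

   Reading the lambda from the inside out: -2#x takes the last two items of the list, sum adds them, x, appends the result, and the trailing / with 10 in front says “do that ten times.”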

   This is where kdb+ comes in: the database. Kdb+ and q come packaged together. It is widely regarded as one of the fastest column-store time-series databases. It is used by financial firms, big pharma, NASA, and really anywhere time-series analysis needs to be done down to the nanosecond. An example you might see in your first job out of a CS degree: kdb+ is the tick database behind trading, algos, and analysis. If you want to be a quant, you will come across kdb+.
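   As a rough illustration of what a tick-style table looks like (the table name, columns, symbols, and prices here are made up, not any vendor’s real schema), note that q’s timestamp type carries nanosecond precision:

      q)trade:([] time:4#.z.p; sym:`AAPL`MSFT`AAPL`MSFT; price:189.5 415.25 190.0 414.75; size:100 200 150 50)
      q)meta trade             / p = nanosecond timestamp, s = symbol, f = float, j = long
      c    | t f a
      -----| -----
      time | p
      sym  | s
      price| f
      size | j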

   One of the beautiful things about kdb+ is that it has a roughly 800 KB memory footprint, small enough to live in the L1/L2 caches of your processor. This means that you can put it anywhere; you can run it in anything that has a computer in it, which is almost everything these days. Because it is a vector language, it takes advantage of the vector optimizations built into modern chip architectures. It is claimed to scan 4 billion records per second per core and ingest 4.5 billion bulk events per second per core.

   Kdb+/q has both a real-time database (RDB) and a historical database (HDB). The RDB keeps all of its data in memory. One of the slowest parts of a computer is the constant reading and writing to disk; the RDB doesn’t have that problem. Unlike other analytical environments, where you extract column-based data with SQL, maybe push it through some middle transport layer such as XML, and then work on it in Java or C++, in kdb+ everything happens in q, in one place. The processes are flattened, and that minimalism improves both speed and development time.
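   To make that concrete (continuing the toy trade table sketched above, so the names and numbers are still purely illustrative), an aggregation that would normally involve SQL plus a separate application layer is a single q expression, and the result below is roughly what the console would print:

      q)select avg price, sum size by sym from trade   / query and analytics in one step, no transport layer
      sym | price  size
      ----| -----------
      AAPL| 189.75 250
      MSFT| 415    250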

   If you take an event like a particle collision at CERN’s Large Hadron Collider, the machine that found the Higgs boson, where the first moments are crucial, you need to be able to look at the data at the nanosecond level. If you are building models to trade stocks in real time, you need to be able to analyze large amounts of data in 8 milliseconds. There really isn’t another tool that does this as well.