## Archive for the ‘programming’ Category

### Visualizing Pandas GroupBy object

March 15, 2017

I am a beginner again, this time learning Python and Pandas. I am enjoying it quite a lot. For learning I write code in a Jupyter notebook and this post is actually written as one – converted to HTML with nbconvert. The quality of the conversion is rather bad, but this is probably the best one can do without adding custom CSS to this blog setup, which would require upgrading to WordPress.com Premium.

Development using Jupyter is similar to how KDB+ coding is mostly done. In KDB+ one sends commands to a KDB+ server from a client like Studio for KDB+, getting instant feedback on the result. Pandas is not as expressive and concise as q, but the style is similar: a high-level API for vectorized data manipulation that avoids explicit iteration (loops).

One exception to the instant feedback rule in Jupyter and Pandas is the GroupBy object. To see what I mean let’s define a simple data frame from a dictionary of columns:

In [1]:
import pandas as pd
data = pd.DataFrame({'sym':['a','b','c'],
                     'price1':[100.0,150.0,130.0],
                     'price2':[110.0,150.0,120.0],
                     'vol1':[1000.0,1200.0,1300.0],
                     'vol2':[1500.0,1300.0,1100.0]})
data

Out[1]:
price1 price2 sym vol1 vol2
0 100.0 110.0 a 1000.0 1500.0
1 150.0 150.0 b 1200.0 1300.0
2 130.0 120.0 c 1300.0 1100.0

Grouping is more often done for rows (along the 0 axis), but this time we want to group columns (along axis=1). One group is made of the price1 and price2 columns, the second one groups vol1 and vol2, and the sym column forms its own one-element group. To do this we define a function that takes a column name and classifies it into one of three categories:

In [2]:
def classifier(column):
    if column.startswith('price'): return 'price'
    if column.startswith('vol'): return 'volume'
    return 'sym'


Now we can group the columns using the classifier:

In [3]:
data.groupby(classifier,axis=1)

Out[3]:
<pandas.core.groupby.DataFrameGroupBy object at 0x00000048DE8A1CF8>

As we can see, the GroupBy object is not printed nicely (at least in Pandas 0.19.2 that I am using).
Of course, there are many ways to print it. One way that I found intuitive and useful is to first convert the GroupBy object to a dictionary of dataframes keyed by the classifier value. This can be done using a dictionary comprehension like {grp:subdf for grp,subdf in df.groupby(classifier,axis=1)}. The dictionary obtained this way can be passed to the Pandas concat function, which puts the dataframes together into a single dataframe with multi-level columns. The additional column level clearly shows the structure of the original GroupBy object.

In [4]:
def groupCols(df,classifier):
    return pd.concat({grp:subdf for grp,subdf in df.groupby(classifier,axis=1)},
                     axis=1)

groupCols(data,classifier)

Out[4]:
price sym volume
price1 price2 sym vol1 vol2
0 100.0 110.0 a 1000.0 1500.0
1 150.0 150.0 b 1200.0 1300.0
2 130.0 120.0 c 1300.0 1100.0
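As far as I know, newer Pandas releases deprecate the axis=1 argument of groupby, so groupCols may warn or fail there; please check this against your version. A minimal sketch of the same multi-level view that avoids groupby altogether could look like the following. The name group_cols_portable is hypothetical and not part of the original notebook.

```python
import pandas as pd

def classifier(column):
    if column.startswith('price'): return 'price'
    if column.startswith('vol'): return 'volume'
    return 'sym'

def group_cols_portable(df, classifier):
    # Hypothetical helper: bucket the column names by classifier value
    # instead of calling groupby(..., axis=1).
    buckets = {}
    for col in df.columns:
        buckets.setdefault(classifier(col), []).append(col)
    # Sort by group name to mimic the default key ordering of groupby.
    return pd.concat({grp: df[cols] for grp, cols in sorted(buckets.items())},
                     axis=1)

data = pd.DataFrame({'sym': ['a', 'b', 'c'],
                     'price1': [100.0, 150.0, 130.0],
                     'price2': [110.0, 150.0, 120.0],
                     'vol1': [1000.0, 1200.0, 1300.0],
                     'vol2': [1500.0, 1300.0, 1100.0]})

grouped = group_cols_portable(data, classifier)
print(grouped)
```

Selecting columns by name keeps each column's dtype, which a transpose-based workaround would not do for a frame that mixes strings and floats.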

This trick also works for classifying rows if one uses axis=0 instead of axis=1 in a function similar to groupCols above.
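A minimal sketch of such a row version is shown here. The names group_rows and parity are hypothetical, and the even/odd split is only an illustration. With the default axis=0 the classifier is called on each row label rather than on a column name.

```python
import pandas as pd

def group_rows(df, classifier):
    # Default axis=0: the classifier receives each index (row) label.
    return pd.concat({grp: subdf for grp, subdf in df.groupby(classifier)})

def parity(label):
    # Hypothetical row classifier: split rows by even/odd index label.
    return 'even' if label % 2 == 0 else 'odd'

rows = pd.DataFrame({'sym': ['a', 'b', 'c'],
                     'price1': [100.0, 150.0, 130.0]})

grouped_rows = group_rows(rows, parity)
print(grouped_rows)
```

The result has a two-level row index, with the group name as the outer level and the original row label as the inner one.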

### Cheap air quality monitoring

January 16, 2017

About a year ago I set up an air quality monitoring station based on the Air Quality Egg product by Wicked Device (see also my previous post on AQE). It was working during the winter season, providing real-time data on PM2.5 concentrations near my house. This graph shows some of the data collected.

This worked OK, except that the data from the device were uploaded to Xively and available only from there. Xively was providing this service for free as it was fulfilling an old commitment from one of their acquisitions. It was not high on their priority list and the service was frequently down. I then decided to build my own device to have full control over the process – from collecting the data to displaying them on a public web page. And to have some creative fun. The result works well (at least at the time of writing) and cost less than \$80 in materials, including a Raspberry Pi that I used. I think building a PM2.5 monitoring station in a way similar to what I describe below would make an excellent high school project.

### Binary comprehensions in Erlang

September 19, 2014

In set theory there is a convenient notation for defining new sets, called set comprehension. For example, when we have a set $A$ we can define a collection of singleton sets with elements from the set $A$ as $\{ \{x\}:x\in A\}$. Sometimes a vertical bar is used instead of the colon, and in Isabelle/ZF a single dot is used (something like $\{ \{x\}. x\in A\}$ parses successfully in Isabelle/ZF). In some programming languages a similar notation is used for lists. For example in Python one can write

[ [x] for x in [1,2,3] ]


to get a list of singleton lists and in Haskell

[ [x] | x <- [1,2,3] ]

gives the same.

Erlang also has a syntax for list comprehension, very similar to Haskell’s:

[ [X] || X <- [1,2,3] ]


The part X <- [1,2,3] above is called the generator expression.

Erlang also has something unique (as far as I know): binary comprehensions. This is a concept similar to list comprehensions, but the dummy variable bound by the notation (x in the examples above) can range over binaries rather than lists. I found this very convenient when I was implementing an Erlang interface to the KDB+ IPC protocol.
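For readers more at home in Python, here is a loose analogue of the idea. It is not Erlang and not the code from the KDB+ interface, just a sketch of what "ranging over a binary" means. The sample bytes are made up for illustration; the Erlang expressions in the comments are the binary comprehensions that the Python lines roughly correspond to.

```python
import struct

# Made-up sample data: three 16-bit big-endian integers packed into six bytes.
payload = bytes([0, 1, 0, 2, 1, 0])

# Walk over the bytes in fixed-size segments, roughly like the Erlang
#   [ X || <<X:16>> <= Payload ]
words = [x for (x,) in struct.iter_unpack('>H', payload)]

# Build a new binary from an old one, roughly like the Erlang
#   << <<(X*2)>> || <<X>> <= <<1,2,3>> >>
doubled = bytes(x * 2 for x in bytes([1, 2, 3]))

print(words, doubled)
```

In Erlang the segment pattern is part of the comprehension syntax itself, whereas in Python the segment layout has to be spelled out in a struct format string.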

### KDB+, Elm and web sockets

July 15, 2014

In the KDB+ implementation of Conway’s Game of Life that I presented in my previous post there was one element missing – a GUI that would display the results of the simulation. Since I had been planning to have a look at Elm for a while, I checked whether I could set up KDB+ talking to an Elm application, and it turned out to be very easy to do – with web sockets.

Elm is a functional programming language that compiles to HTML, JavaScript and CSS code intended to run in a browser. The declared paradigm for Elm is Functional Reactive Programming. FRP is not exactly what it used to be back in the days of Conal Elliott and Paul Hudak’s original work. It is now more of a buzzword covering an ever wider semantic area. The key concept in Elm’s take on FRP is the signal. A Signal a type in Elm represents a value of type a that changes in time. Signals can be thought of as streams of discrete events carrying values. They can be combined, filtered, counted and so on. Ultimately we obtain a value of type Signal Element that can be displayed in the browser (possibly in a div if one embeds an Elm application this way). The Elm site contains a very convincing argument on why all this is a good idea.

### Game of Life with Enterprise Components

May 20, 2014

DEVnet (the company I work for) has decided to release one of its products – Enterprise Components (called EC below) – as free software. Since I contributed some code to it I would like to advertise it a bit on this blog.

## What are Enterprise Components?

EC is a collection of libraries and tools that support building systems based on KDB+ – a database made by Kx Systems. Traditionally KDB+ has been mostly used for time series analytics in the finance industry. Part of the reason for that was that the commercial license for KDB+ was very expensive. However, some two months ago Kx announced that they now allow commercial use of the free-of-cost 32-bit version of their product. This opened a path for KDB+ applications in areas well beyond institutions with deep pockets.

The main limitation of the 32-bit version is the amount of data it can handle – about 1GB. This limitation is per-process though, and since KDB+ has a built-in inter-process communication protocol, one can create very capable systems with the free version simply by running as many instances as needed.

Enterprise Components can be thought of as a layer above KDB+. EC provides logging, process management, configuration management, user access control, system monitoring and more – all that is needed if you want to build a larger scale system. Typical use cases are covered by the standard components provided with EC, so that a basic system with data feeds, an in-memory database backed by on-disk journals, and end-of-day storage and archiving can be defined with configuration only.

### Java, Mockito and ArgumentCaptor

September 3, 2013

I knew that day would come at some point. I am programming in Java. One thing that took me longer than I think it should have was figuring out how to use Mockito’s ArgumentCaptor to verify the arguments with which methods are called. In this post I share some code snippets that do that, for my reference and hopefully to help other Java beginners.

### Scala

October 26, 2012

One thing that I like about working for my company is that it tries to evaluate new technologies from time to time. About a year ago I got a task to learn Scala by writing a small internally used tool in it and to share my impressions. The main question I was supposed to answer was: “Should we use Scala (instead of Java) to implement the next iteration of one of our products?”. I had a feeling that my boss would have liked me to answer that question with an enthusiastic “yes!”. And I really liked the language. However, my answer was “probably not”.

### Q – six months later

February 13, 2010

It has been six months since I started to earn my living by writing code in Q. I am still not a guru (a Q god as they call it in Q circles), but not a complete newbie either.