2015-01-30

Stable build environment using Docker

Docker is intended for application deployment, but it can also streamline development, as I learned when adopting it for Ninchat.

Build dependencies can be problematic. When deploying software on Linux, we can use the libraries and tools provided by the Linux distributions. But in order to build our application, we need to have the libraries and build tools available. For repeatable production builds, the versions should also be fixed. Keeping workstations in sync with the requirements may be inconvenient or impossible (e.g. when the workstation doesn't run Linux), so a separate build machine or a continuous integration server might be used. But that isn't helpful if we would like to compile incrementally while writing code, or run unit tests before committing changes to a version control system.

I'll show a simple way to set up a Docker environment for running builds and tests right from your source tree. Here's an example C program, hello.c:

#include <stdio.h>
#include <gperftools/tcmalloc.h>

int main(int argc, char **argv)
{
        printf("hello world\n");
        tc_malloc_stats();
        return 0;
}

It will be built using this Makefile:

hello: hello.c
        $(CC) -o $@ $^ -ltcmalloc

clean::
        rm -f hello

Let's record its build dependencies (Ubuntu package names) to deps.txt:

build-essential
libgoogle-perftools-dev

We can create the build environment (a Docker image) using this Dockerfile:

FROM ubuntu:14.04

COPY deps.txt /tmp/

RUN apt-get update && \
    xargs apt-get -y install < /tmp/deps.txt && \
    apt-get clean && \
    rm /tmp/deps.txt

So let's create it:

$ docker build -t hello-env .

There you have a nice, unchanging system image that you can put in a Docker registry and share with your teammates.

Next we need two shell scripts to help us with it. in-docker launches a Docker container with the working directory mounted inside it (make it executable with chmod +x in-docker):

#!/bin/sh
uid=$(id -u)
name=$(whoami)
dir=$(readlink -f .)
exec docker run --rm --tty --volume="$dir:$dir" --workdir="$dir" hello-env sh docker-setup.sh "$uid" "$name" "$HOME" "$@"

docker-setup.sh is run inside the container in order to create a user account before executing the actual build command:

uid=$1
shift
name=$1
shift
home=$1
shift
chown "$uid:$uid" "$home"
adduser --uid "$uid" --disabled-password --gecos "" --quiet "$name"
exec sudo --set-home -u "$name" "$@"

Finally, we can build the program:

$ ./in-docker make
cc -o hello hello.c -ltcmalloc
$ ./hello
./hello: error while loading shared libraries: libtcmalloc.so.4: cannot open shared object file: No such file or directory

The resulting binary ends up in the working directory as usual, but since google-perftools isn't installed on the host, we can't run it... except inside the container:

$ ./in-docker ./hello
hello world
------------------------------------------------
MALLOC:          16768 (    0.0 MiB) Bytes in use by application
MALLOC: +       933888 (    0.9 MiB) Bytes in page heap freelist
MALLOC: +        97696 (    0.1 MiB) Bytes in central cache freelist
MALLOC: +            0 (    0.0 MiB) Bytes in transfer cache freelist
MALLOC: +          224 (    0.0 MiB) Bytes in thread cache freelists
MALLOC: +      1142936 (    1.1 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =      2191512 (    2.1 MiB) Actual memory used (physical + swap)
MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =      2191512 (    2.1 MiB) Virtual address space used
MALLOC:
MALLOC:             10              Spans in use
MALLOC:              1              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

Now, imagine a test suite which requires a database to run. Instead of installing a database on your laptop and resetting it to a known state before each run, we can bundle it in our build environment: install and prepare it in the Dockerfile, and start it in the docker-setup.sh script. Couldn't be simpler.

Finally, we probably also want to use Docker to deploy the built program. One approach is to create a base image with the runtime dependencies, and build the image with the build dependencies on top of it (instead of directly on Ubuntu). The final deployment image may then also be created on the same base image, without including the unnecessary build dependencies.

2015-01-23

Counting entries during insert/delete with PostgreSQL

Say we are storing some kind of entries in PostgreSQL, and also need to report changes in their number. We'll be using this table in the examples:

=> CREATE TABLE entries (
     id  SERIAL PRIMARY KEY,
     foo TEXT,
     bar TEXT
   );

The obvious thing to do is:

=> INSERT INTO entries (foo, bar) VALUES ('baz', 'quux') RETURNING id;
 id
----
 25

=> SELECT count(*) FROM entries;
 count 
-------
    13

But I like to avoid multiple commands when possible, so I came up with the following version:

=> INSERT INTO entries (foo, bar) VALUES ('baz', 'quux') RETURNING (SELECT count(*) FROM entries), id;
 count | id
-------+----
    12 | 25

The result is almost the same: since the subquery sees the table as it was before the insert, we have to add 1 to the count to get what we want.

The equivalent delete command:

=> DELETE FROM entries WHERE id = 25 RETURNING (SELECT count(*) FROM entries), *;
 count | id | foo | bar
-------+----+-----+------
    13 | 25 | baz | quux

Here we have to subtract 1 from the count.
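
The adjustment is easy to forget, so it may be worth wrapping in helpers. Here is a minimal sketch, assuming a Python DB-API cursor that uses %s placeholders (psycopg2 does); the function names insert_entry and delete_entry are made up for illustration:

```python
INSERT_SQL = (
    "INSERT INTO entries (foo, bar) VALUES (%s, %s) "
    "RETURNING (SELECT count(*) FROM entries), id"
)
DELETE_SQL = (
    "DELETE FROM entries WHERE id = %s "
    "RETURNING (SELECT count(*) FROM entries), id"
)


def insert_entry(cur, foo, bar):
    """Insert a row; return (new id, number of entries after the insert)."""
    cur.execute(INSERT_SQL, (foo, bar))
    count_before, new_id = cur.fetchone()
    # The subquery counted the rows without the new one, so add 1.
    return new_id, count_before + 1


def delete_entry(cur, entry_id):
    """Delete a row; return the number of entries left, or None if no row matched."""
    cur.execute(DELETE_SQL, (entry_id,))
    row = cur.fetchone()
    if row is None:
        return None
    count_before, _ = row
    # The subquery still counted the deleted row, so subtract 1.
    return count_before - 1
```

Note that the count reflects what the statement itself could see, so with concurrent writers the adjusted number is only as accurate as a separate SELECT count(*) would have been.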

I hope you find this information nugget helpful.

2013-10-04

Follow-up to Ninchat talk at Reaktor Dev Day

I just finished talking about the backend architecture of Ninchat, and noticed that some of the questions asked in the Dev Day's questions-for-the-speakers chat didn't make the cut, so I'll answer some of them here.

[Droptable-magician] Is there any IRC integration / gateways?


Work in progress (and highly experimental): https://github.com/ninchat/ninchat-python/tree/master/ninchat/ircd

[Droptable-cloaker] also, what is this colored ball next to the message bar?


I think it's a connection latency indicator.

[Startup-superstar] Should this channel appear in the Ninchat channel search ?


Yes, it's called "REAKTORDD Sali1."

[Bigdata-ninja] how about Erlang for connection servers?


It would probably be a perfect fit; no reason not to use it, but you have to choose one.

[Scrum-meisseli] Why C++ and not C?
[Scrum-meisseli] Ah, RAII, right?


Yes, and smart pointer templates in particular.

[Core-superstar] protocol buffers vs thrift?


Extending the original answer: While Thrift supports RPC in addition to data serialization, we don't have use for its simple request-response RPC model.

2012-03-20

Concurrency

Concrete has had infrastructure for I/O-driven scheduling of multiple threads of execution since last summer, but an API for actually launching concurrent functions was missing until now. The approach is similar to what Stackless, Eventlet and gevent (without forgetting CIO) do. All internal functions are implemented as continuations whose states are stored in the portable arena. If the arena moves to another context while a continuation is waiting for I/O, the associated system resources will disappear and the I/O operation will be restarted (like a syscall).

There is no mechanism for Communicating yet, though, so there are just Sequential Processes for now. This green-thread approach (with single-threaded dispatch) may fit well as the method for implementing parallelism in Concrete since pressure is more on regulating than optimizing resource usage.
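
The restart behaviour can be illustrated with a toy model. This is only a sketch in plain Python, not Concrete's implementation: the ReadLine class, its save/restore methods and the fake_open helper are all hypothetical. The continuation's resumable state is plain data, while the open handle is a system resource that doesn't survive a move:

```python
import io


class ReadLine:
    """Hypothetical continuation: portable state plus a transient resource."""

    def __init__(self, path, handle=None):
        self.path = path        # plain data, lives in the portable state
        self.handle = handle    # system resource, lost when the state moves

    def save(self):
        # Only plain data is saved; the open handle is left behind.
        return {"path": self.path}

    @classmethod
    def restore(cls, state):
        return cls(state["path"])

    def resume(self, open_resource):
        # If the resource disappeared, restart the I/O from the beginning.
        if self.handle is None:
            self.handle = open_resource(self.path)
        return self.handle.readline()


def fake_open(path):
    # Stand-in for a real file or socket so the sketch runs anywhere.
    return io.StringIO("data from {}\n".format(path))


waiting = ReadLine("/dev/example")
moved = ReadLine.restore(waiting.save())
print(moved.resume(fake_open), end="")
```

The caller never notices the move: resuming simply reissues the operation, much like a system call that gets restarted after an interruption.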

2012-02-03

Concrete Python on Native Client

This example page loads pre-compiled Python code and runs it using Native Client (if you enabled it in Google Chrome via chrome://flags).

Python's print() output shows up on the web page and Concrete's Trace() goes to the JavaScript console.

Source.

2011-10-05

Prepare


Every once in a while I promise myself not to hack on build systems anymore.  Luckily this project is only loosely related to C/C++ building.  It generalizes the source code preprocessor I talked about in my last post.

The Python snippets in the template files may declare globally accessible classes and functions.  Only symbols which start with an uppercase letter are exported.  They may be called by producer templates which collect data and consumer templates which use that data.  Inter-template dependencies are automatically determined by analyzing symbol declarations and references.  Additionally, the producer/consumer functions have to be tagged with @producer/@consumer decorators so that data-collecting templates can be evaluated before the data-using ones.

Silly example

/* type.hy */
{{{
        types = []

        @producer
        def Struct(name):
                global types
                types.append(name)
                echo("struct {name}")

        @consumer
        def Types():
                global types
                return types
}}}

/* type.cy */
void dump_types(void)
{
        printf("Types:{{{ for name in Types(): echo(' {name}') }}}\n");
}

/* point.hy */
{{{ Struct("point") }}} {
        int x;
        int y;
};
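
The ordering rule can be sketched in plain Python. This is a simplified illustration and not the tool's actual code: the evaluation_order helper and the role attribute are assumptions, and the template-to-symbol mapping is written by hand here instead of being derived by analyzing the templates:

```python
def producer(func):
    """Tag a function as collecting data (hypothetical re-creation)."""
    func.role = "producer"
    return func


def consumer(func):
    """Tag a function as using the collected data."""
    func.role = "consumer"
    return func


types = []


@producer
def Struct(name):
    types.append(name)
    return "struct {}".format(name)


@consumer
def Types():
    return types


def evaluation_order(references, symbols):
    """Sort template names so that producer callers precede consumer callers.

    references maps a template name to the exported symbols it calls.
    """
    def rank(template):
        roles = {getattr(symbols[s], "role", None) for s in references[template]}
        return 1 if "consumer" in roles else 0

    return sorted(references, key=rank)


symbols = {"Struct": Struct, "Types": Types}
references = {"type.cy": ["Types"], "point.hy": ["Struct"]}
print(evaluation_order(references, symbols))
```

With this ordering point.hy gets to register its struct before type.cy asks for the list of types.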

2011-09-26

C++ source code generation with inline Python

While struggling with boilerplate code needed for Concrete's built-in Python object implementations and other repetitive structures that are involved in gluing dynamic and static worlds together, I decided to go ahead and generate some of the C++ source code files from templates (not to be confused with C++ template types). One alternative I considered was to use a full C++ parser for building an abstract syntax tree and to generate augmented C++ sources from that, but the options didn't seem too straightforward: clang 2.8 (included in Ubuntu) can't parse Concrete and the information I found about its AST output support wasn't encouraging. GCC-XML is based on a GCC version which is too old, and doesn't promise full C++ template support.

I use standard C pre-processor macros for some of the object boilerplate, like declaring copy constructors and such. Those macro invocations double as meta data which is parsed by the C++ source template processor. The template processor (implemented in Python) reads all standard and template source files, uses regular expressions to parse interesting bits into convenient data structures, and converts the template files into standard C++. The template files contain Python code snippets enclosed in {{{ and }}} which access the parsed data and output C++ code.

Example

Standard C++ header declares an object type with inheritance information (and implicit properties implied by the "default" macro flavor):

    class StringObject: public Object {
            CONCRETE_OBJECT_DEFAULT_DECL(StringObject, Object)
            // ...
    };

Template processor gathers that information:

    Objects = set()

    @parse(r"\s*CONCRETE_OBJECT_.*_DECL\((.+), (.+)\)")
    def ParseObject(filename, name, parent):
            Objects.add(Object(name, parent))


C++ source template generates code using that information:

    void InstantiateAllObjectsJustForTheFunOfIt()
    {
            {{{ for o in Objects:
                    lowername = o.name.lower()
                    echo("{o.name} {lowername}_inst;") }}}
    }

The generated C++ source would look like:

    void InstantiateAllObjectsJustForTheFunOfIt()
    {
            Object object_inst;
            StringObject stringobject_inst;

            // ...
    }
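
To make the two steps concrete, here is a minimal sketch of such a processor. It is a simplified re-creation under assumptions and not Concrete's actual tool: parse_objects and render are hypothetical names, and echo() is assumed to format its argument against the snippet's variables:

```python
import re
from collections import namedtuple

Object = namedtuple("Object", "name parent")

DECL = re.compile(r"\s*CONCRETE_OBJECT_.*_DECL\((.+), (.+)\)")
SNIPPET = re.compile(r"\{\{\{(.*?)\}\}\}", re.DOTALL)


def parse_objects(source):
    """Collect (name, parent) pairs from the macro invocations in a header."""
    matches = (DECL.match(line) for line in source.splitlines())
    return [Object(*m.groups()) for m in matches if m]


def render(template, data):
    """Replace each {{{ ... }}} snippet with whatever it echoes."""

    def run(match):
        output = []
        namespace = dict(data)
        # echo() formats its argument against the snippet's own variables.
        namespace["echo"] = lambda fmt: output.append(fmt.format(**namespace))
        exec(match.group(1).strip(), namespace)
        return "\n".join(output)

    return SNIPPET.sub(run, template)


header = """class StringObject: public Object {
        CONCRETE_OBJECT_DEFAULT_DECL(StringObject, Object)
};
"""
template = """void Instantiate()
{
        {{{ for o in Objects:
                lowername = o.name.lower()
                echo("{o.name} {lowername}_inst;") }}}
}
"""
print(render(template, {"Objects": parse_objects(header)}), end="")
```

The sketch only handles snippets whose continuation lines are consistently indented, and it runs the snippets with exec(), which is fine for trusted build inputs only.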


Complete real-world examples:

It would be cool to replace the C macros and the regex-parsing with more inline Python which populates the meta data structures directly.  That would make the template processor generic and the applications self-contained.

There's more

    for i in seq:
            if pred(i):
                    func(i)

and

    [func(i) for i in seq if pred(i)]

work but

    for i in seq if pred(i):
            func(i)

doesn't, and Python 3 doesn't accept

    for i in seq: if pred(i): func(i)

as a one-liner which would be really handy in templates.  But thanks to a regular expression you can write

    {{{ for i in seq if pred(i): func(i) }}}

in a Concrete template file.
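
A rewrite along these lines could look like the following sketch. The pattern and the expand_for_if name are my guesses for illustration, not the expression the template processor really uses, and the naive matching would trip over conditional expressions or colons inside string literals:

```python
import re

FOR_IF = re.compile(r"^for (.+?) in (.+?) if (.+?): (.+)$")


def expand_for_if(snippet):
    """Rewrite 'for X in SEQ if COND: BODY' as nested for/if statements."""
    match = FOR_IF.match(snippet.strip())
    if match is None:
        return snippet
    target, seq, cond, body = match.groups()
    return "for {} in {}:\n    if {}:\n        {}".format(target, seq, cond, body)


print(expand_for_if("for i in seq if pred(i): func(i)"))
```

Anything that doesn't match the pattern is passed through untouched, so ordinary snippets keep working.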

2011-02-21

Concrete Android

I ported (or built an app around) Concrete for Android 2.2 over the weekend.  The app contains the bytecode of example.py and displays the value of the first argument passed to concrete.test() after running it.

Android SDK and NDK are pleasant to work with.

2011-01-28

Concrete Python

Concrete is an experiment on portable execution state: the goal is to make it possible to save the state of a running program at an arbitrary point of execution without help from the client code, and restore it, possibly on a different kind of system.  I thought that it's less frustrating to develop a custom runtime environment instead of patching an existing one, and it gives greater flexibility to play with the execution model (which is, after all, the whole point).  So I decided to implement yet another Python virtual machine.

The VM takes in a CPython 3.1 bytecode file and executes it so that the program's memory image is in a coherent state after every opcode: the image could theoretically be backed by a memory-mapped file, which could be loaded to a new VM instance.  Objects' contents are kept in little-endian byte order.  Pointers are 32 bits wide.

The implementation language is C++.  (Haters gonna hate.)  With it we get types with automatic byte order conversion on big-endian architectures and implicit reference counting of Python objects.  I've also played with C++0x type inference for kicks.  Oh, and the license is LGPL.

For now it supports just about enough opcodes to be able to import a built-in module, define functions, call them and add integers together.  The heap allocator is really dumb.  But it does it all in the Concrete way.

Further thinking

Since having to swap bytes on every object access (and reference/dereference!) on big-endian CPUs strikes me as incredibly slow, another approach could be to mark the byte order used in the saved memory image and convert the objects while loading.  Coupled with garbage collection, heap compaction etc. this could be convenient.  Then again, with garbage collection the reference counting would be eliminated...  Yet another approach could be to convert objects on demand, but that would require conditional jumps, even on the vastly more popular little-endian CPUs.  But that could come as a bonus if the heap allocator implemented copy-on-write snapshots, which in turn could enable interesting new language constructs...
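
The convert-while-loading idea can be sketched in a few lines. This is an illustration only: the one-byte marker and the flat array of 32-bit words are assumptions made for the example, whereas a real image would hold differently shaped objects that each need their own conversion:

```python
import struct
import sys

MAGIC_LITTLE = b"L"
MAGIC_BIG = b"B"


def save_image(words, byteorder=sys.byteorder):
    """Serialize 32-bit words in the given order, prefixed with a marker."""
    if byteorder == "little":
        marker, fmt = MAGIC_LITTLE, "<"
    else:
        marker, fmt = MAGIC_BIG, ">"
    return marker + struct.pack("{}{}I".format(fmt, len(words)), *words)


def load_image(image):
    """Read the marker once and convert every word while loading."""
    fmt = "<" if image[:1] == MAGIC_LITTLE else ">"
    count = (len(image) - 1) // 4
    return list(struct.unpack("{}{}I".format(fmt, count), image[1:]))


words = [1, 0xDEADBEEF]
assert load_image(save_image(words, "big")) == words
assert load_image(save_image(words, "little")) == words
```

The byte order is checked once per image instead of once per access, which is the whole attraction of this variant.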

The plan is not to support standard Python modules (except pure-Python ones which don't depend on native ones).  Instead, the idea is to use custom I/O abstractions which are applicable to the transient scenarios.

Metablogging

I thought I'd write about my projects here whenever I have something to say.