Step 8 - Not remotely funny

This is the eighth step, adding out-of-process calling. Each step adds or improves a small feature, and this text just highlights a few details along the way - the best way to work through is to build and run each step, reading the code with this description alongside. Reading it alongside the diff for each step will also tell you more than either the text or the code alone.

So far we've stuck to loading the marked-up libdemo.so into the same process as the Python interpreter. Now it gets interesting: what if we have multiple C++ libraries with clashing dependencies? Or the simple fact that if the C++ library crashes, our Python interpreter terminates with it? Since we have a simple, clear C API at the interface, we can pipe that over an inter-process communication (IPC) mechanism into a separate process, and if we've got the plumbing right it should "just work" without any changes to our demo library at all. The library runs in a separate server process, and a thin shim presents exactly the same C API to the Python binding, forwarding every call over named pipes (or any other transport).

The architecture

Three new components are added:

  • ipc/ - the wire protocol, serialisation, and pipe helpers
  • server/ - a standalone process that loads the library and handles requests
  • shim/ - a drop-in replacement for libxplat.so that the Python binding loads instead; it forwards each C API call to the server

The Python binding has no changes at all: it still calls all the same functions - XPLAT_invoke, XPLAT_createInstance, and so on - but now into libshim.so instead of libxplat.so.
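
For a sense of what "exactly the same C API" means, here's a sketch of the shared surface - beyond XPLAT_createInstance (shown later in this post) the exact signatures are assumptions for illustration:

// Sketch only - signatures other than XPLAT_createInstance are assumptions.
// Both libxplat.so and libshim.so export these symbols, so the binding
// doesn't care which one it loads.
extern "C" {
    int   XPLAT_getNumClasses(void);
    void *XPLAT_createInstance(const char *className);
    void  XPLAT_destroyInstance(void *instance);
    // ... XPLAT_invoke, metadata queries, builders, and so on
}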

The protocol

xplat_ipc_protocol.hpp defines the wire format. Every function in the C API maps to a Command opcode:

enum class Command : uint32_t {
    // Lifecycle
    CONNECT      = 1,
    LOAD_LIBRARY = 3,

    // Metadata queries
    GET_NUM_CLASSES        = 11,
    GET_CLASS_NAME         = 12,
    GET_CLASS_METHOD_COUNT = 14,
    GET_METHOD             = 15,
    // ...

    // Instances
    CREATE_INSTANCE  = 60,
    DESTROY_INSTANCE = 61,
    INVOKE           = 62,

    // Builders
    CREATE_STRUCT_BUILDER = 70,
    SET_BUILDER_FIELD     = 72,
    CREATE_ARRAY_BUILDER  = 80,
    SET_ARRAY_ELEMENT     = 84,
    // ...
};

Each message is a fixed-size header followed by a variable-length payload:

struct RequestHeader {
    uint32_t requestId;
    Command  command;
    uint32_t payloadSize;
};

struct ResponseHeader {
    uint32_t requestId;
    Status   status;
    uint32_t payloadSize;
};
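
The framing is deliberately dumb: write the fixed-size header, then the payload; read the fixed-size header, then exactly payloadSize bytes. A minimal sketch, assuming writeAll/readAll helpers that loop over short writes/reads (names here are illustrative, not the repo's exact ones):

void sendRequest(int fd, uint32_t id, Command cmd,
                 const std::vector<uint8_t> &payload)
{
    RequestHeader hdr{id, cmd, static_cast<uint32_t>(payload.size())};
    writeAll(fd, &hdr, sizeof(hdr));               // fixed-size header first
    writeAll(fd, payload.data(), payload.size());  // then the payload
}

std::vector<uint8_t> readResponse(int fd, ResponseHeader &hdr)
{
    readAll(fd, &hdr, sizeof(hdr));                // header says how much follows
    std::vector<uint8_t> payload(hdr.payloadSize);
    readAll(fd, payload.data(), payload.size());
    return payload;
}

Writing raw struct bytes like this is fine while both ends sit on the same machine; crossing machine boundaries would need explicit endianness handling.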

Objects that live inside the server (instances, builders) are referred to by an opaque uint64_t handle; the handle stays stable for the lifetime of the object, and the client never dereferences it.

Serialisation

xplat_ipc_serialize provides WriteBuffer / ReadBuffer wrappers that know how to encode XPlatValue, handles, strings and metadata pointers into byte sequences. For example, serialising an invoke call writes the class name, the handle of the instance, the method name, the argument count, then each XPlatValue in turn (its type tag followed by its payload). Like the type information, XPlatValue is itself recursive - a Vector/Map value writes its size followed by each element.
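
As a sketch of that layout (the WriteBuffer method names beyond writeString, and the writeValue helper, are assumptions based on the description):

// Sketch: encoding an INVOKE request in the order described above.
WriteBuffer buf;
buf.writeString(className);        // which class
buf.writeHandle(instanceHandle);   // which server-side instance
buf.writeString(methodName);       // which method
buf.writeUInt32(static_cast<uint32_t>(args.size())); // argument count
for (const XPlatValue &arg : args)
    writeValue(buf, arg);          // type tag, then payload; Vector/Map
                                   // values recurse: size, then each element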

The server

xplat_server is a standalone executable.

On startup it creates two named pipes (one for requests, one for responses) using the session ID it received as a command-line argument, opens the pipes, waits for the client to connect, and finally enters the handler loop, sketched below:

  1. read a RequestHeader
  2. read payloadSize bytes of payload
  3. decode the command
  4. call the real libxplat / libdemo API function
  5. serialise the result
  6. write a ResponseHeader and the response payload
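
A minimal sketch of that loop, assuming the same writeAll/readAll helpers as before and a dispatch function that fans out on the command (again, illustrative names rather than the repo's exact ones):

for (;;) {
    RequestHeader req;
    if (!readAll(m_requestFd, &req, sizeof(req)))
        break;                                    // client disconnected
    std::vector<uint8_t> payload(req.payloadSize);
    readAll(m_requestFd, payload.data(), payload.size());

    WriteBuffer result;
    ReadBuffer args(payload);
    Status status = dispatch(req.command, args, result);

    ResponseHeader resp{req.requestId, status,
                        static_cast<uint32_t>(result.size())};
    writeAll(m_responseFd, &resp, sizeof(resp));
    writeAll(m_responseFd, result.data(), result.size());
}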

The server resolves all of our C API symbols dynamically with dlopen/dlsym, so it has no build-time or link dependency on the library it serves. It presents our marked-up library's classes and functions without needing to know anything about what they do.

bool loadLibrary(const std::string &libPath)
{
    m_libHandle = dlopen(libPath.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!m_libHandle)
        return false;                 // dlerror() has the details
    // resolve every symbol by name...
#define RESOLVE_SYM(name) \
    m_api.name = reinterpret_cast<decltype(m_api.name)>(dlsym(m_libHandle, #name));
    RESOLVE_SYM(registry_get);
    RESOLVE_SYM(XPLAT_getNumClasses);
    RESOLVE_SYM(XPLAT_invoke);
    // ... and so on
#undef RESOLVE_SYM
    return true;
}

The server maintains maps from Handle → pointer for every object and builder it has created and handed out; when it receives a DESTROY_INSTANCE command it erases the entry. On failure or termination the maps can be iterated and flushed.
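
A sketch of that bookkeeping - member and function names here are assumptions (including XPLAT_destroyInstance, which just follows the naming of the calls we've already seen):

std::unordered_map<uint64_t, void *> m_instances;
uint64_t m_nextHandle = 1;

uint64_t registerInstance(void *ptr)
{
    uint64_t handle = m_nextHandle++;
    m_instances[handle] = ptr;         // the handle stays stable for the client
    return handle;
}

void destroyInstance(uint64_t handle)
{
    auto it = m_instances.find(handle);
    if (it == m_instances.end())
        return;                        // unknown or stale handle - ignore
    m_api.XPLAT_destroyInstance(it->second);
    m_instances.erase(it);
}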

The shim

libshim.so exports exactly the same C API as libxplat.so. Internally it holds a ShimClient singleton that:

  1. Generates a unique session ID (from getpid()).
  2. Creates the named pipes.
  3. Forks and execs xplat_server with the session ID.
  4. Opens the pipes and sends a CONNECT command to synchronise.
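
In sketch form, assuming POSIX named pipes and illustrative paths (needs <unistd.h>, <sys/stat.h>, <fcntl.h>; the real code may well differ):

std::string session  = std::to_string(getpid());
std::string reqPipe  = "/tmp/xplat_req_"  + session;
std::string respPipe = "/tmp/xplat_resp_" + session;
mkfifo(reqPipe.c_str(), 0600);     // EEXIST is fine if the other side got there first
mkfifo(respPipe.c_str(), 0600);

pid_t pid = fork();
if (pid == 0) {
    // Child: become the server, passing the session ID on the command line.
    execlp("xplat_server", "xplat_server", session.c_str(),
           static_cast<char *>(nullptr));
    _exit(127);                    // exec failed
}

// Parent: open both ends (blocks until the server opens its side),
// then send CONNECT to synchronise.
m_requestFd  = open(reqPipe.c_str(),  O_WRONLY);
m_responseFd = open(respPipe.c_str(), O_RDONLY);
WriteBuffer empty;
sendCommand(Command::CONNECT, empty);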

Every exported C API function then becomes a two-liner:

void *XPLAT_createInstance(const char *className)
{
    WriteBuffer buf;
    buf.writeString(className);
    auto [resp, payload] = ShimClient::instance().sendCommand(Command::CREATE_INSTANCE, buf);
    return reinterpret_cast<void *>(ReadBuffer(payload).readHandle());
}

Again, footnote! :) this is not optimal and there is scope to make it more efficient, e.g. we could assign classes unique IDs on library load and use those rather than strings everywhere... however, for development and debugging, strings are awesome!

The shim simply translates pointer arguments to handles, forwards the call, and translates back on the way out. The "pointers" returned to Python are just the handle values (as mentioned before, our pointers are effectively opaque handles - we never dereference them in Python at all).

Usage

From the Python side the only change we need is passing use_shim=True to bind_library and it kicks in:

bind_library('demo', use_shim=True)
from xplat import demo

obj = demo.Demo()          # -> CREATE_INSTANCE over pipe
print(obj.getInt())        # -> INVOKE over pipe
del obj                    # -> DESTROY_INSTANCE over pipe

Everything else - struct dicts, vector/map conversion, date handling - behaves identically to the in-process mode. We're just jumping our C API calls across a process boundary, currently with named pipes, but there's no reason we couldn't swap in sockets and cross machine boundaries too.
