Coding: Bring the Noise

<0011 Em

A binary serialization mixin for Python: Metaclasses (Part 3)


Like part 2, this post will focus on points three and five:

  1. Many values fit in one or two bytes at most – booleans fit in a bit, color channels fit in 1 byte. Any solution needs to allow bit-width specification.
  2. However, I shouldn’t need to specify the bit-width of every value. Reasonable defaults should exist that can be dropped in where appropriate.
  3. Serialization shouldn’t get in the way, and I shouldn’t have to give it much thought when I write my classes.
  4. Of course, I should be able to drop into the details when a class requires custom serialization.
  5. Performance overhead, both time and space, should be as low as possible. Per-instance overhead is almost entirely unacceptable.
  6. Serializable classes should be nestable, and the performance overheads for nested classes should be as low as possible.

In this post we’re only going to create a metaclass that leverages some of the magic of the Field class; in the next post we’ll tackle the serialization-specific work in the metaclass.

Where were we?

We had just constructed Field, a Descriptor class which wrapped the init args to a class to create a factory of sorts:

from functools import wraps

class Field:
    def __init__(self, cls, *args, **kwargs):
        '''
        Allows us to defer instantiation of a class attribute
        until after the class has been created
        '''
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def __get__(self, obj, cls):
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value

    @property
    def instance(self):
        return self.cls(*(self.args), **(self.kwargs))

def f(cls):
    '''Wraps a class so it can be Fieldatized'''
    @wraps(cls)
    def init(*args, **kwargs):
        return Field(cls, *args, **kwargs)
    return init
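
To make the deferral concrete, here is a small sketch of how f and Field behave on their own. Point is a hypothetical stand-in class that is not part of this series, and the Field shown is a trimmed copy of the one above (the descriptor methods are left out because nothing here attaches it to a class).

```python
from functools import wraps

class Field:
    """Trimmed copy of the Field above: stores a class and its init args."""
    def __init__(self, cls, *args, **kwargs):
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    @property
    def instance(self):
        # Each access builds a brand-new object from the stored arguments.
        return self.cls(*self.args, **self.kwargs)

def f(cls):
    @wraps(cls)
    def init(*args, **kwargs):
        return Field(cls, *args, **kwargs)
    return init

class Point:
    """Hypothetical example class, used only for illustration."""
    def __init__(self, x, y=0):
        self.x = x
        self.y = y

field = f(Point)(1, y=2)   # no Point has been constructed yet
first = field.instance     # a Point is built here
second = field.instance    # and a separate one here

print(type(field).__name__)   # Field
print(first.x, first.y)       # 1 2
print(first is second)        # False
```

Because every access to instance calls the stored class again, two objects built from the same Field never share state.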

And this was the target implementation that we’re trying to facilitate:

class Channel(BASE_CLASS_X):
    serializable_format = 'uint:8'

class Color(BASE_CLASS_Y):
    r = f(Channel)(100)
    g = f(Channel)(101)
    b = f(Channel)(102)

color = Color(r=201, g=202, b=203)
color2 = Color()

Before we define either base class above, we’re going to need a metaclass to do some of the heavy lifting. Those base classes are there to curtain off some of the magic, so users don’t need to worry about how we’re injecting their fields when they subclass Serializable.

As a reminder, all of this code targets Python 3.3 – adapting it to work with 2.7 is possible, but it can be a pain when the ordering of fields matters. Here’s a metaclass that notifies us when it creates a new class, but otherwise does nothing:

class MetaNotifier(type):
    def __new__(cls, name, bases, attrs):
        print("I'm making a new class named " + name)
        return super().__new__(cls, name, bases, attrs)

class Foo(metaclass=MetaNotifier):
    name = "Bill"
    def __init__(self):
        print("I don't get called in this example")
    def random_func(self):
        print("That was unexpected")

print("End of example.")

We never created an instance of Foo, so its __init__ method wasn’t called. The __new__ method in MetaNotifier was passed the name of the class we wanted to create, Foo, along with some other information about the class. It then returned a class object, which was bound to the name Foo. Let’s break down what each of those arguments is:

– cls: The metaclass we’re using. This is the self of a metaclass, not the class we’re creating.
– name: The name of the class we’re creating.
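– bases: Tuple of the base classes of the class we’re making. In our example, this is the empty tuple (), because Foo declares no bases; type.__new__ then supplies object as the implicit base, so Foo.__bases__ ends up as (object,).
– attrs: Dictionary mapping attribute names to attribute values for the class we’re building. In the example above, attrs would contain the following (plus bookkeeping entries such as __module__ and __qualname__):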

{
  '__init__': <function at 0xsomething>,
  'name': 'Bill',
  'random_func': <function at 0xother>
}
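
As a quick check of when the metaclass runs and what it is handed, the sketch below records the arguments to __new__ instead of printing them. MetaRecorder and the seen list are hypothetical names introduced only for this illustration.

```python
seen = []  # hypothetical log of (class name, declared attribute names)

class MetaRecorder(type):
    def __new__(cls, name, bases, attrs):
        # attrs also carries bookkeeping keys such as __module__, so we
        # keep only the names written in the class body below.
        wanted = ('name', '__init__', 'random_func')
        declared = sorted(k for k in attrs if k in wanted)
        seen.append((name, declared))
        return super().__new__(cls, name, bases, attrs)

class Foo(metaclass=MetaRecorder):
    name = "Bill"
    def __init__(self):
        self.created = True
    def random_func(self):
        return "called"

# The class statement alone triggered MetaRecorder.__new__;
# no Foo instance has been created at this point.
print(seen)                       # [('Foo', ['__init__', 'name', 'random_func'])]
print(type(Foo) is MetaRecorder)  # True
```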

Notice that we don’t have the ordering of the attributes as we declared them. This is unacceptable for serialization, since we need consistent ordering. Luckily, Python 3 lets a metaclass define another method, __prepare__, which returns the mapping used to store attributes while the class body runs, so we can use an ordered dictionary!

import collections

class MetaNotifier(type):
    @classmethod
    def __prepare__(cls, name, bases):
        return collections.OrderedDict()

    def __new__(cls, name, bases, attrs):
        print("I'm making a new class named " + name)
        return super().__new__(cls, name, bases, attrs)

Now the dictionary we get back is:

OrderedDict({
  'name': 'Bill',
  '__init__': <function at 0xsomething>,
  'random_func': <function at 0xother>
})
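
The ordering guarantee is easy to verify in isolation. This is a minimal sketch, assuming hypothetical names OrderedMeta, Pixel and _order; it stores the declaration order on the class so we can inspect it afterwards.

```python
import collections

class OrderedMeta(type):
    @classmethod
    def __prepare__(cls, name, bases):
        # The class body is executed with this mapping as its namespace.
        return collections.OrderedDict()

    def __new__(cls, name, bases, attrs):
        # Remember the non-dunder names in the order the body assigned them.
        # The list is built before '_order' itself is added to attrs.
        attrs['_order'] = [k for k in attrs if not k.startswith('__')]
        return super().__new__(cls, name, bases, attrs)

class Pixel(metaclass=OrderedMeta):
    z = 3
    x = 1
    y = 2

print(Pixel._order)   # ['z', 'x', 'y']
```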

Perfect. Inside the __new__ function in MetaNotifier, we can manipulate these values – insert new values, change existing values, or remove them entirely. Notice, for example, that we could entirely remove any __init__ method that’s defined for the class, or replace it with our own. To help with the field magic, we’re going to be modifying a decent amount of this dictionary, including the init function.

Hacking up the class dictionary

For the next few snippets, we’re assuming that we are inside of the __new__ function above, before the return statement. The first thing we need is a(n ordered) dictionary of all the attributes that are instances of our special Field class. Here’s how we can do that:

declared_fields = collections.OrderedDict()
for attr_name, attr_val in attrs.items():
    if isinstance(attr_val, Field):
        declared_fields[attr_name] = attr_val
        attr_val.name = attr_name
attrs['_declared_fields'] = declared_fields

Oh, that’s nice. If we find a field, the first line of the if block adds it to our collection of fields – not too exciting. The second line, however, takes care of one of the open questions from part 2 – how do we set the field’s name without repeating the string everywhere? Since the keys of the attribute dictionary are the names the attributes are assigned to, we can hand each name to the Field it refers to! That line gives the descriptor the name it will use when storing values in the instance dictionary. Finally, we push that dictionary of Fields back into attrs, so that instances of the class can get at the raw field objects if they need to, without being forced through those fields’ __get__ and __set__ methods.
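
The effect of that loop can be checked on its own. In this sketch Field is reduced to a bare placeholder, and FieldCollector and Sample are hypothetical names; only the collection-and-naming step is reproduced.

```python
import collections

class Field:
    """Bare placeholder standing in for the Field class shown earlier."""
    def __init__(self, default):
        self.default = default

class FieldCollector(type):
    @classmethod
    def __prepare__(cls, name, bases):
        return collections.OrderedDict()

    def __new__(cls, name, bases, attrs):
        declared_fields = collections.OrderedDict()
        for attr_name, attr_val in attrs.items():
            if isinstance(attr_val, Field):
                declared_fields[attr_name] = attr_val
                attr_val.name = attr_name  # the field learns its own name
        attrs['_declared_fields'] = declared_fields
        return super().__new__(cls, name, bases, attrs)

class Sample(metaclass=FieldCollector):
    b = Field(2)
    a = Field(1)
    not_a_field = "ignored"

print(list(Sample._declared_fields))       # ['b', 'a']
print(Sample._declared_fields['a'].name)   # a
```

Only Field instances are collected, and they keep the order in which the class body declared them rather than alphabetical order.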

Next, we’re going to wrap the init function of the class so that we create real versions of the Fields they declared in the class.

noop_init = lambda self, *a, **kw: None
real_init = attrs.get('__init__', noop_init)
def fake_init(self, *args, **kwargs):
    for field_name, field in self._declared_fields.items():
        setattr(self, field_name, field.instance)
    real_init(self, *args, **kwargs)
attrs['__init__'] = fake_init

The first two lines look for an existing __init__ method in the class body; if there isn’t one, we swap in a function that does nothing. (attrs only holds what the class body itself defines, so an __init__ inherited from a base class is not picked up here.) Then we go through each field in the _declared_fields dictionary (created above; by the time fake_init runs, _declared_fields is already attached to the class, so we can reach it through self). For each one, we create an instance (of the class passed to that field, with its args and kwargs) and set it on the object we are initializing using setattr. We use setattr because it triggers the __set__ method of the Field mapped to that name, which stores the value in the instance dictionary. Next, fake_init calls real_init, which is either the init function the user defined or our noop function. Finally, we push fake_init back into attrs as the class’s __init__.
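
To see the wrapped __init__ doing its job, the sketch below pairs a trimmed Field with a metaclass containing only the collection step and the init wrapper. InitWrappingMeta, Box and Crate are hypothetical names for this illustration.

```python
class Field:
    """Trimmed descriptor: stores per-instance values under its own name."""
    def __init__(self, cls, *args, **kwargs):
        self.cls, self.args, self.kwargs = cls, args, kwargs

    def __get__(self, obj, objtype=None):
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value

    @property
    def instance(self):
        return self.cls(*self.args, **self.kwargs)

class InitWrappingMeta(type):
    def __new__(cls, name, bases, attrs):
        fields = {k: v for k, v in attrs.items() if isinstance(v, Field)}
        for field_name, field in fields.items():
            field.name = field_name
        attrs['_declared_fields'] = fields

        real_init = attrs.get('__init__', lambda self, *a, **kw: None)
        def fake_init(self, *args, **kwargs):
            # Build a fresh object for every field, then run the user's init.
            for field_name, field in self._declared_fields.items():
                setattr(self, field_name, field.instance)
            real_init(self, *args, **kwargs)
        attrs['__init__'] = fake_init
        return super().__new__(cls, name, bases, attrs)

class Box:
    """Hypothetical value holder."""
    def __init__(self, value):
        self.value = value

class Crate(metaclass=InitWrappingMeta):
    item = Field(Box, 10)

    def __init__(self, label):
        self.label = label

first, second = Crate("a"), Crate("b")
first.item.value = 99

print(first.item.value, second.item.value)  # 99 10
print(first.label, second.label)            # a b
```

Both the injected field setup and the user’s own __init__ run, and each Crate ends up with its own Box.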

The last section we’ll add is a reasonable default for __str__ if one isn’t provided:

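if '__str__' not in attrs:
    def str_(self):
        cls_name = self.__class__.__name__
        # Read each value off the instance; the raw Field objects in
        # _declared_fields would only print as <Field object at ...>
        fmt = lambda name: '{}={}'.format(name, getattr(self, name))
        decl_iter = ', '.join(map(fmt, self._declared_fields))
        return '{}({})'.format(cls_name, decl_iter)
    attrs['__str__'] = str_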

Which gives…
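
As a minimal sketch of such a default in action, the example below uses a variant that reads each field’s current value off the instance with getattr. StrMeta, Point and Named are hypothetical names, and Field is reduced to a bare marker whose values are stored as ordinary instance attributes.

```python
import collections

class Field:
    """Bare marker; values are stored as ordinary instance attributes here."""

class StrMeta(type):
    @classmethod
    def __prepare__(cls, name, bases):
        return collections.OrderedDict()

    def __new__(cls, name, bases, attrs):
        attrs['_declared_fields'] = collections.OrderedDict(
            (k, v) for k, v in attrs.items() if isinstance(v, Field))

        # Only inject a default when the class body did not define __str__.
        if '__str__' not in attrs:
            def str_(self):
                parts = ('{}={}'.format(n, getattr(self, n))
                         for n in self._declared_fields)
                return '{}({})'.format(type(self).__name__, ', '.join(parts))
            attrs['__str__'] = str_
        return super().__new__(cls, name, bases, attrs)

class Point(metaclass=StrMeta):
    x = Field()
    y = Field()
    def __init__(self, x, y):
        self.x, self.y = x, y

class Named(metaclass=StrMeta):
    x = Field()
    def __str__(self):
        return "custom"

print(Point(1, 2))   # Point(x=1, y=2)
print(Named())       # custom
```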

Here’s the final metaclass we’ll be using for classes with declarative fields:

import collections
noop_init = lambda *a, **kw: None


class DeclarativeMetaclass(type):
    @classmethod
    def __prepare__(cls, name, bases):
        return collections.OrderedDict()

    def __new__(cls, name, bases, attrs):
        declared_fields = collections.OrderedDict()
        for attr_name, attr_val in attrs.items():
            if isinstance(attr_val, Field):
                declared_fields[attr_name] = attr_val
                attr_val.name = attr_name
        attrs['_declared_fields'] = declared_fields

        real_init = attrs.get('__init__', noop_init)
        def fake_init(self, *args, **kwargs):
            for field_name, field in self._declared_fields.items():
                setattr(self, field_name, field.instance)
            real_init(self, *args, **kwargs)
        attrs['__init__'] = fake_init

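        if '__str__' not in attrs:
            def str_(self):
                cls_name = self.__class__.__name__
                # Read each value off the instance; the raw Field objects in
                # _declared_fields would only print as <Field object at ...>
                fmt = lambda name: '{}={}'.format(name, getattr(self, name))
                decl_iter = ', '.join(map(fmt, self._declared_fields))
                return '{}({})'.format(cls_name, decl_iter)
            attrs['__str__'] = str_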


        return super().__new__(cls, name, bases, attrs)

Kicking the tires

Let’s test it out (along with our Field class defined at the top of this post):

class Channel:
    def __init__(self, value, enabled=True):
        self.value = value
        self.enabled = enabled
    def __str__(self):
        return 'Channel({}, enabled={})'.format(self.value, self.enabled)

class Color(metaclass=DeclarativeMetaclass):
    r = f(Channel)(100)
    g = f(Channel)(101, False)
    b = f(Channel)(102)
    a = f(Channel)(103, enabled=False)

    def __str__(self):
        return 'Color(r={}, g={}, b={}, a={})'.format(self.r, self.g, self.b, self.a)

color1 = Color()
color2 = Color()

print("Created two colors, should be the same:")
print(color1)
print(color2)
print()

color1.r.value = 200
color2.a.enabled = True

print("Even though we didn't explicitly initialize different Channel objects,")
print("changes in one color object don't reflect in the other:")
print(color1)
print(color2)
print()

That’s it for metaclasses and fields. In the next post, we’ll subclass DeclarativeMetaclass to start adding our serialization layer. This will primarily mean computing the serializable_format string and creating a subset of _declared_fields that are serializable, in case some declared fields aren’t.


Written by delwinna

May 19, 2013 at 8:48 pm
