Binary Data

April 24, 2010: We are pleased to announce that Version 4 of this course is now under development. For updates and an early peek at the content, please check out the Software Carpentry blog at http://www.software-carpentry.org/blog/.

1) Introduction

2) You Can Skip This Lecture If...

3) Why Binary?

4) How Numbers Are Stored

5) Two's Complement

Figure 19.1: Two's Complement
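
The idea behind two's complement can be sketched in a few lines. This is a minimal illustration, not part of the original lecture code: the helper names `to_twos_complement` and `from_twos_complement` and the default width of 8 bits are assumptions made for the example, and it uses Python 3 syntax rather than the Python 2 of the lecture listings.

```python
def to_twos_complement(value, bits=8):
    """Return the unsigned bit pattern that stores `value` in `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError('%d does not fit in %d bits' % (value, bits))
    # Masking with 2**bits - 1 wraps negative values around.
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern, bits=8):
    """Interpret an unsigned bit pattern as a signed value."""
    if pattern & (1 << (bits - 1)):     # top bit set means negative
        return pattern - (1 << bits)
    return pattern


for n in (5, -5, -1, -128):
    pattern = to_twos_complement(n)
    print('%5d -> %s' % (n, format(pattern, '08b')))
```

With 8 bits, -1 is stored as 11111111 and -128 as 10000000, so the top bit acts as the sign.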

6) Bitwise Operators

Name  Symbol  Purpose                                             Example
And   &       1 if both bits are 1, 0 otherwise                   0110 & 1010 = 0010
Or    |       1 if either bit is 1                                0110 | 1010 = 1110
Xor   ^       1 if the bits are different, 0 if they're the same  0110 ^ 1010 = 1100
Not   ~       Flip each bit                                       ~0110 = 1001

Table 19.1: Bitwise Operators in Python
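
The rows of the table can be checked directly. The sketch below is an illustration in Python 3 syntax. The 4-bit `MASK` is an assumption made for the example: Python integers have no fixed width, so `~` on its own yields a negative number and must be masked to reproduce the 4-bit result shown in the table.

```python
MASK = 0b1111             # keep only the low 4 bits (assumed word size)

a = 0b0110
b = 0b1010

and_bits = format(a & b, '04b')
or_bits = format(a | b, '04b')
xor_bits = format(a ^ b, '04b')
not_bits = format(~a & MASK, '04b')   # ~a by itself is -7 in Python

print(and_bits, or_bits, xor_bits, not_bits)
```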

7) Bit Operator Examples

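def format_bits(val, width=1):
    '''Create a base-2 representation of a non-negative integer.'''
    # A negative val never shifts down to zero, so the loop below
    # would not terminate.
    assert val >= 0, 'format_bits requires a non-negative integer'
    result = ''
    while val: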
        if val & 0x01:
            result = '1' + result
        else:
            result = '0' + result
        val = val >> 1
    if len(result) < width:
        result = '0' * (width - len(result)) + result
    return result

tests = [
    [ 0, None, '0'],
    [ 0, 4,    '0000'],
    [ 5, None, '101'],
    [19, 8,    '00010011']
]

for (num, width, expected) in tests:
    if width is None:
        actual = format_bits(num)
    else:
        actual = format_bits(num, width)
    print '%4d %8s %10s %10s' % (num, width, expected, actual)

   0     None          0          0
   0        4       0000       0000
   5     None        101        101
  19        8   00010011   00010011

8) Shifting
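
A short sketch of what the shift operators do, in Python 3 syntax. The variable names and the 4-bit mask are assumptions chosen for illustration and do not come from the lecture.

```python
x = 0b0110                      # 6

left = x << 2                   # 24: same as x * 2**2
right = x >> 1                  # 3:  same as x // 2**1
dropped = 0b0111 >> 1           # 3:  the low bit is discarded

# Python integers never overflow, so a fixed-width register has to be
# emulated with a mask: 24 is 11000, and only 1000 (8) survives.
wrapped = (x << 2) & 0b1111

# Right-shifting a negative number rounds toward negative infinity.
negative = -5 >> 1              # -3, not -2

print(left, right, dropped, wrapped, negative)
```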

9) Cautions

10) Setting and Clearing Bits

Figure 19.2: Setting and Clearing Bits

11) Bit Flags

Figure 19.3: Using Bits to Record Sets of Flags

#            hex     binary
MERCURY    = 0x01  # 0001
PHOSPHORUS = 0x02  # 0010
CHLORINE   = 0x04  # 0100

# Sample contains mercury and chlorine
sample = MERCURY | CHLORINE
print 'sample: %04x' % sample

# Check for various elements
for (flag, name) in [[MERCURY,    "mercury"],
                     [PHOSPHORUS, "phosphorus"],
                     [CHLORINE,   "chlorine"]]:
    if sample & flag:
        print 'sample contains', name
    else:
        print 'sample does not contain', name

sample: 0005
sample contains mercury
sample does not contain phosphorus
sample contains chlorine

12) Floating Point

Figure 19.4: Floating Point Representation

13) Floating Point Spacing

Figure 19.5: Uneven Spacing of Floating-Point Numbers
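
The uneven spacing can be measured. The sketch below assumes IEEE 754 double precision (53 significant bits), which is what Python floats use on common platforms. The function name `spacing` is hypothetical, and the calculation is only meant for positive, normal values. It uses Python 3 syntax.

```python
import math
import sys


def spacing(x):
    """Gap between x and the next larger double (x positive and normal)."""
    mantissa, exponent = math.frexp(x)      # x == mantissa * 2**exponent
    return math.ldexp(1.0, exponent - 53)   # doubles carry 53 significant bits


for x in (1.0, 1000.0, 1e16):
    print('%g is followed by a gap of %g' % (x, spacing(x)))

# Near 1e16 the gap is 2.0, so adding 1.0 is lost entirely.
print(1e16 + 1.0 == 1e16)
```

The gap near 1.0 is the machine epsilon reported by `sys.float_info.epsilon`; the gap doubles every time the value crosses a power of two.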

14) Floating Point Roundoff
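
A small demonstration of roundoff, in Python 3 syntax. It is an illustration rather than part of the lecture: 0.1 has no exact binary representation, so repeated addition drifts away from the mathematically exact answer. `math.isclose` and `math.fsum` are standard library functions; the tolerance chosen here is an arbitrary assumption.

```python
import math

total = 0.0
for i in range(10):
    total += 0.1                    # each addition rounds the result

exactly_one = (total == 1.0)        # False: roundoff has accumulated
close_to_one = math.isclose(total, 1.0, rel_tol=1e-9)   # True
compensated = math.fsum([0.1] * 10)  # tracks the lost low-order bits

print(repr(total), exactly_one, close_to_one, compensated)
print(0.1 + 0.2 == 0.3)
```

The practical lesson is to compare floating-point values with a tolerance instead of `==`.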

15) Binary I/O

16) Binary I/O Mode

import sys
print sys.platform
for mode in ('r', 'rb'):
    f = open('open_binary.py', mode)
    s = f.read(40)
    f.close()
    print repr(s)

cygwin
'import sys\r\nprint sys.platform\r\nfor mode'
linux
'import sys\nprint sys.platform\nfor mode in '
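
For comparison, here is a sketch of the same experiment in Python 3, where the behavior differs from the Python 2 run shown above: text mode translates `\r\n` to `\n` on every platform by default, while binary mode returns the bytes untouched. The temporary file and the sample data are assumptions made for the example.

```python
import os
import tempfile

raw = b'line one\r\nline two\r\n'

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'wb') as f:     # binary mode: bytes written as given
        f.write(raw)
    with open(path, 'rb') as f:     # binary mode: bytes read as stored
        as_bytes = f.read()
    with open(path, 'r') as f:      # text mode: line endings normalized
        as_text = f.read()
finally:
    os.remove(path)

print(repr(as_bytes))
print(repr(as_text))
```

Binary data should always be opened with `'rb'` or `'wb'`, since a translated byte silently corrupts packed values.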

17) Packing and Unpacking

Figure 19.6: C Storage vs. Python Storage

18) Packing Data

Figure 19.7: Packing Data

19) Unpacking Data

20) The struct Module

import struct

fmt = 'hh' # two 16-bit integers
x = 31
y = 65
binary = struct.pack(fmt, x, y)
print 'binary representation:', repr(binary)
normal = struct.unpack(fmt, binary)
print 'back to normal:', normal

binary representation: '\x1f\x00A\x00'
back to normal: (31, 65)

21) Hexadecimal Characters
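
The packed string `'\x1f\x00A\x00'` mixes two notations: bytes that correspond to printable characters are shown as those characters (65 is `A`), and all others are shown as `\x` followed by two hexadecimal digits (31 is `\x1f`). The sketch below shows the same four bytes in three ways. It uses Python 3, where packed data is a `bytes` object, and the `'<'` prefix is an assumption added so the result does not depend on the machine's byte order.

```python
import struct

binary = struct.pack('<hh', 31, 65)   # '<' forces little-endian layout

as_repr = repr(binary)      # printable bytes as characters, others as \xNN
as_hex = binary.hex()       # every byte as two hexadecimal digits
as_ints = list(binary)      # every byte as a number from 0 to 255

print(as_repr)
print(as_hex)
print(as_ints)
```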

22) Format Specifiers

Format  Meaning
"c"     Single character (i.e., string of length 1)
"B"     Unsigned 8-bit integer
"h"     Short (16-bit) integer
"i"     32-bit integer
"f"     32-bit float
"d"     Double-precision (64-bit) float
"s"     String of fixed size (see below)

Table 19.2: Packing Format Specifiers

23) Notes on Binary Format Specifiers

24) Calculating Sizes
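
`struct.calcsize` reports how many bytes a format occupies. The sketch below, in Python 3 syntax, contrasts standard sizes (selected with the `'<'` prefix) with the native layout used when no prefix is given. The native size may include padding that depends on the platform and compiler, so no particular value is assumed for it here.

```python
import struct

two_shorts = struct.calcsize('<hh')     # 2 + 2 bytes
char_then_int = struct.calcsize('<ci')  # 1 + 4 bytes, no padding
int_and_text = struct.calcsize('<i5s')  # 4 bytes + 5-byte string

# Without a prefix, struct follows the C compiler's alignment rules,
# so padding may be inserted between the char and the int.
native_char_then_int = struct.calcsize('ci')

print(two_shorts, char_then_int, int_and_text, native_char_then_int)
```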

25) Endianness

import struct

packed = struct.pack('4c', 'a', 'b', 'c', 'd')
print 'packed string:', repr(packed)

left16, right16 = struct.unpack('hh', packed)
print 'as two 16-bit integers:', left16, right16

all32 = struct.unpack('i', packed)
print 'as a single 32-bit integer', all32[0]

float32 = struct.unpack('f', packed)
print 'as a 32-bit float', float32[0]

packed string: 'abcd'
as two 16-bit integers: 25185 25699
as a single 32-bit integer 1684234849
as a 32-bit float 1.67779994081e+22
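
The output above comes from a little-endian machine. A sketch of how byte order can be made explicit, in Python 3 syntax: the `'<'` and `'>'` prefixes of the `struct` module select little-endian and big-endian layouts, and `sys.byteorder` reports what the current machine uses. The variable names are chosen for illustration only.

```python
import struct
import sys

value = 1684234849                  # 0x64636261

little = struct.pack('<i', value)   # least significant byte first
big = struct.pack('>i', value)      # most significant byte first
native = struct.pack('=i', value)   # whatever this machine uses

print(sys.byteorder, little, big)
```

Writing the prefix into every format string is the simplest way to keep files portable between machines.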

26) Packing Variable-Length Data

Figure 19.8: Packing a Variable-Length Vector

def pack_vec(vec):
    buf = struct.pack('i', len(vec))
    for v in vec:
        buf += struct.pack('i', v)
    return buf

27) Unpacking Variable-Length Data

def unpack_vec(buf):

    # Get the count of the number of elements in the vector.
    int_size = struct.calcsize('i')
    count = struct.unpack('i', buf[0:int_size])[0]

    # Get 'count' values, one by one.
    pos = int_size
    result = []
    for i in range(count):
        v = struct.unpack('i', buf[pos:pos+int_size])
        result.append(v[0])
        pos += int_size

    return result
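
An alternative sketch, not the lecture's method: because the count is known before packing, the whole vector can be handled with one dynamically built format instead of a loop. The names `pack_vec_once` and `unpack_vec_once` are hypothetical, the `'<'` prefix (little-endian, 4-byte integers) is an assumption, and the code uses Python 3 syntax.

```python
import struct


def pack_vec_once(vec):
    """Pack a count followed by that many 32-bit integers."""
    fmt = '<i%di' % len(vec)            # e.g. '<i3i' for three values
    return struct.pack(fmt, len(vec), *vec)


def unpack_vec_once(buf):
    """Read the count, then read that many integers in one call."""
    int_size = struct.calcsize('<i')
    (count,) = struct.unpack_from('<i', buf, 0)
    return list(struct.unpack_from('<%di' % count, buf, int_size))


original = [1, -2, 300]
packed = pack_vec_once(original)
print(len(packed), unpack_vec_once(packed))
```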

28) Dynamic Formats

def pack_strings(strings):
    result = ''
    for s in strings:
        length = len(s)
        format = 'i%ds' % length
        result += struct.pack(format, length, s)
    return result

29) Unpacking Dynamic Formats

def unpack_strings(buf):
    int_size = struct.calcsize('i')
    pos = 0
    result = []
    while pos < len(buf):
        length = struct.unpack('i', buf[pos:pos+int_size])[0]
        pos += int_size
        format = '%ds' % length
        s = struct.unpack(format, buf[pos:pos+length])[0]
        pos += length
        result.append(s)
    return result
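
In Python 3, `struct` packs `bytes` rather than text, so strings must be encoded first and the stored length must be the byte count, not the character count. The following is a sketch under that assumption. The function names `pack_text` and `unpack_text`, the UTF-8 encoding, and the `'<'` prefix are all choices made for this example rather than part of the lecture.

```python
import struct


def pack_text(strings):
    """Store each string as a 32-bit byte count followed by UTF-8 data."""
    chunks = []
    for s in strings:
        data = s.encode('utf-8')
        fmt = '<i%ds' % len(data)
        chunks.append(struct.pack(fmt, len(data), data))
    return b''.join(chunks)


def unpack_text(buf):
    """Reverse pack_text, returning a list of strings."""
    int_size = struct.calcsize('<i')
    pos = 0
    result = []
    while pos < len(buf):
        (length,) = struct.unpack_from('<i', buf, pos)
        pos += int_size
        result.append(buf[pos:pos + length].decode('utf-8'))
        pos += length
    return result


names = ['mercury', '', 'chlorine']
print(unpack_text(pack_text(names)))
```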

30) Metadata

31) Metadata File Structure

Figure 19.9: Structure of a Binary File With Metadata

32) Packing with Metadata

def store(outf, format, values):
    '''Store a list of lists, each of which has the same structure.'''
    length = struct.pack('i', len(format))
    outf.write(length)
    outf.write(format)
    for v in values:
        temp = [format] + v
        binary = struct.pack(*temp)
        outf.write(binary)

33) Unpacking with Metadata

def retrieve(inf):
    '''Retrieve data from a self-describing file.'''
    data = inf.read(struct.calcsize('i'))
    format_length = struct.unpack('i', data)[0]
    format = inf.read(format_length)
    record_size = struct.calcsize(format)
    result = []
    while True:
        data = inf.read(record_size)
        if not data:
            break
        values = list(struct.unpack(format, data))
        result.append(values)
    return result

34) Testing
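
One way to test binary I/O code is a round trip: write known values, read them back, and compare. The sketch below adapts the `store` and `retrieve` functions shown earlier to Python 3 and runs them against an in-memory `io.BytesIO` buffer instead of a real file. The adaptation (encoding the format string as ASCII and using a `'<'` prefix) and the sample records are assumptions made for this example.

```python
import io
import struct


def store(outf, fmt, values):
    """Write the format's length, the format, then one record per row."""
    header = fmt.encode('ascii')
    outf.write(struct.pack('<i', len(header)))
    outf.write(header)
    for v in values:
        outf.write(struct.pack(fmt, *v))


def retrieve(inf):
    """Read a file written by store and return its records."""
    data = inf.read(struct.calcsize('<i'))
    (fmt_length,) = struct.unpack('<i', data)
    fmt = inf.read(fmt_length).decode('ascii')
    record_size = struct.calcsize(fmt)
    result = []
    while True:
        data = inf.read(record_size)
        if not data:
            break
        result.append(list(struct.unpack(fmt, data)))
    return result


records = [[1, 2.5], [-7, 0.25]]    # values that doubles store exactly
buffer = io.BytesIO()
store(buffer, '<id', records)
buffer.seek(0)
restored = retrieve(buffer)
print(restored == records)
```

The test values are chosen to be exactly representable; data such as 0.1 stored in a 32-bit `"f"` field would come back slightly changed and would need a tolerance-based comparison.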

35) Summary