š¼ļø Adventures in netpbm
Continuing on a journey of computing archaeology, itās time to extend Pillow with a plugin that can load images that are supported by netpbm.
The first step was a bridge for the anyto* applications, but it turned out it
wasnāt so reliable - the tools rely on file and many ancient and obscure
formats lack proper libmagic detection, some types of test data is hard to come
by. So I ended up having to write my own detection rules, which was a bit of a
pig.
I grabbed a bit of test data for formats I really care about (like seascape.iff), created synthetic test data for as many as I could using netpbm itself, pushed an alpha release to pypi and asked for it to be added to Pillowās docs, so everyone in Pythonland can open the majority of files with as little friction as possible.
This was the easy part.
Then it was time to find test data around the web, figure out detection rules
for data that file didnāt support, add functions where thatās infeasible, link
in MIME types wherever theyāre wrong around the web, expand my test coverage and
open bug reports and pull requests to help steer various projects towards
consistency, update wikidata where I could be bothered, submit new types and
amendments to PRONOM, find bugs in my bridge and CI pipeline. And of course
document the research and work here as I go, partly because I have a short
memory, partly a lack of interesting things to write about, and a short memory,
but also to link sources for future archivists, and a short memory, and also for
the trophy cabinet š.
Andrew Toolkit Raster

A nice simple file format to load, a properly registered MIME type - at least
for the data format itself (application/andrew-inset) - with test data
available in the projectās source since the
90ās. It has an
ArchiveTeam page
but didnāt have a Wikidata entry.
netpbm can load and save these, but file thinks all Andrew files are
LaTeX documents, and the example raster files are buried in the distroās source
tree.
Letās fix both of those things:
AutoCAD Slide

AutoCAD slides are vector screen dumps of AutoCAD created by MSLIDE and viewed with VSLIDE. While the specs have been removed from AutoDeskās website, Martin Reddy has a backup.
This is when I discovered Robert Schultzās awesome
test data collection,
which Iām now mining for test data. These were detected as data by file -
which means no detection rule. Its
PRONOM entry lists
3 different MIME types (application/sld, application/x-sld and
image/x-sld), and AutoDesk never actually registered an
image/vnd.sld with IANA. Unsurprising since other people registered their
other formats on their behalf, but they were registered and I always wanted to
do that. PRONOMās first entry has a weak collision with
application/vnd.ogc.sld+xml, and the bad MIME bleeds through into
Archive Teamās wiki, and across
github like in MegaMimes and
Vitam, presumably via
DNSCoreās seed data repo.
Fixing as much of this as possible, I contacted PRONOM, sent a PR to MegaMimes,
submitted a libmagic rules file, updated WikiData sources with links to existing
usage, started the IANA registration process for image/vnd.sld to replace the
image/x-sld Iāve been lobbying for. 40 years late, but better late than never
right?
Once the IANA registration is complete and/or detection rules are in libmagic, Apache Tika and freedesktop shared-mime-info will one day follow suit. If not, Iāll give them a nudge.
- šŖ libmagic rules
- š IANA submission šØ
- š MegaMimes PR
- š WikiData src links
- š PRONOM: TNA1774192312Q50
FIASCO (Fractal Image And Sequence Codec)

This novel fractal video format was created back in ā94-ā99 by Ullrich Hafner as part of his PhD thesis. It uses Weighted Finite Automata to compress the data, and the results are superb - it crushes other formats of the time at low bitrates, the above heavily compressed image would be unrecognisable as a 3.7K JPEG.
Thereās actually two file types here that share a similar magic: one is the
compressed binary data containing an image or some video, the other is an ASCII
basis dictionary used during compression. Writing an image loader means I only
care about the first type, but detection rules deserve to support both. netpbm
uses the .wfa extension for compressed files, while the standalone
FIASCO tools use .fco for both types.
So weāll document both in pillow-netpbm and detect by magic.
The files detect as data as they lack a detection rule. Thereās also no PRONOM
identifier and no MIME type, but since it is known by both
ArchiveTeam and
Wikidata, and was even mentioned in
Linux Journal, that warrants both
detection rules and a PRONOM identifier. TBH the format really deserved
wide support back in the 2000s, I suspect it would have out-performed DivX
during the era of early swashbuckling video if compression was faster.
So, letās fix the omission:
- šŖ libmagic rules
- šļø PRONOM: TNA1774396137A87
MRF (Monochrome Recursive Format)

Created by Russell Marks in 1997 for zgv, this simple monochrome bitmap compression format uses quadtree decomposition - images are divided into 64x64 tiles, each recursively subdivided until we end up with a single colour. Brian Raiter liked it enough to create a colour extension called PRF (Polychrome Recursive Format), which, unlike netpbm, is supported by both XnView and Konvertor.
MRF has no MIME type registered and nobody is using image/x-mrf anywhere in
the wild. Itās unsupported by libmagic and unknown by PRONOM. But again has a
Wikidata entry and an
ArchiveTeam page,
and of course Sembiance has
more test data
for us to play with.
So weāll pilfer his test data again and write a magic rule, use the rule in the detector, and link this one in to PRONOM like the others:
- šŖ libmagic rules
- šļø PRONOM: TNA1774653669Q59
YBM Face File

Created by Bennet Yee at CMU around 1988 for his face and xbm programs -
small monochrome portraits for UNIX user avatars. Jamie Zawinski and Jef
Poskanzer wrote the netpbm converters in 1991. The name doesnāt seem to be
official - āYBMā appears to be a netpbm convention for āYee BitMapā, to
distinguish it from X BitMap (XBM).
The format is simple: a 6-byte header (!! magic, BE 16-bit width, BE
16-bit height), then 1bpp bitmap data packed into 16-bit BE words with reversed
bit order. It has a Wikidata entry
and an ArchiveTeam page, but no
PRONOM identifier or MIME type. Sembiance has
test data, of course.
The 2-byte magic !! followed by two 16-bit big-endian ints is too generic for
a reliable libmagic rule, since it lacks a way to do bit-twiddling arithmetic to
make sure the file is the proper size. So rather than matching almost everything
starting with !! and rarely ever see a positive match, weāll skip submission
to libmagic. I did make a rule file anyway, and submitted the format details to
PRONOM for posterity.
- šŖ magic rule file
- šļø PRONOM: TNA1774654680B32
NEO (Atari NEOchrome)

NEOchrome was Atari Corporationās own paint program, released in 1985 by Dave Staugas and Jim Eisenstein. Images are planar 16 colour files very similar to Atari DEGAS but with extra header data and a size of exactly 32128 bytes.
Unfortunately, the header partially overlaps with DEGAS files. This causes false detection as DEGAS in libmagic and scrambles images in my own degas loader. netpbmās anytopnm doesnāt do automatic detection of DEGAS or NEOchrome images, which is also worth flagging.
The format has an ArchiveTeam entry
but no PRONOM identifier. Wikidata
lists both image/x-neo and image/x-neochrome as unregistered MIME types, but
I couldnāt find evidence of the latter being used; dexvert and ksquirrel use
image/x-neo. Sembianceās dexvert test data contains
14 test files for us
to plunder, including the one above.
Letās fix some of this:
- šļø release pillow-degas v0.2.0
- šŖ libmagic bug report
- šļø PRONOM: TNA1776854663K81
JBIG (Joint Bi-level Image experts Group)

JBIG1 is a black and white image
compression standard used for fax and scanned documents. The standalone file
format is called a BIE (Bi-level Image Entity). Files use .jbg, .jbig, or
.bie extensions - all the same format. The reference implementation is Markus
Kuhnās jbigkit.
image/x-jbig is the de-facto MIME used by both
freedesktop and
Apache Tika, itās in
PRONOM,
Wikidata, and
ArchiveTeamās wiki.
So, itās well-known, but files are misidentified by file as Targa image data
because like Targa thereās no magic string. I wrote a detection rule which is
too fragile for libmagic. Crappy file extension detection is all we have in
pillow-netpbm for now. Oh, and two of Sembianceās
JBIG test files are
mislabelled JPEGs, so I guess I can help there at least.
CompuServe RLE

Back in the 80s before GIF killed it, Compuserve had a 7-bit image format for use over serial links. Basically ESC G followed by M or H for medium or high resolution, then alternating lengths of off and on pixels with space being zero.
file identifies these as āASCII text with escape sequencesā, which they are.
But we can do better than that since ESC G isnāt used for anything else. Itās a
well-known format with no MIME type
and a missing magic rule, so I wonāt link to all the things, just add the magic
rule:
Usenix FaceSaver

FaceSaver was a system by Metron Computerware for capturing and distributing
grayscale face photos of USENIX
conference attendees, starting in 1987. Itās an internet message format text
file with headers like FirstName: and E-mail:, and hex-encoded pixel data as
the body. So, it detects as ASCII text. The above picture is
RMS attending USENIX 1990, the original file
contains his MIT AI lab email address - legendary piece of test data.
We have a Wikidata entry and a mention on ArchiveTeamās wiki, but no MIME, PRONOM, or libmagic rule. Easy enough to detect - letās fix the last two:
- šŖ libmagic submission
- šļø PRONOM: TNA1776855177C92
Sun Icon
![]()
The icon and cursor format for SunView
from the mid-1980s. These are identified as ASCII text by file, and have a
a C-style comment header (/* Format_version=1, Width=64, ... */) followed by
comma-separated hex values. Usually monochrome 64x64 bitmaps, though 8-bit
grayscale was added later for OpenWindows.
So, no libmagic rule, only icons in Sembianceās test data. No PRONOM identifier and no MIME type. Has a Wikidata entry and ArchiveTeam wiki page.
Letās submit a detection rule with image/x-sun-icon to match the existing
image/x-sun-rasterfile convention, add to PRONOM, steal some test data from an
early version of Solaris and update the wiki page.
- [šŖ libmagic submission] todo
- š¾ test data
- šļø PRONOM: TNA1776856152V67
SBIG CCDOPS

SBIG sold CCD
cameras for amateur astronomy starting in 1988. Their CCDOPS software used a
format with a 2048-byte ASCII header starting with ST-N Image(where N is the
camera model), followed by 14-bit or 16-bit little-endian pixel
data with optional delta compression. The header contains metadata in
Key = Value pairs separated by LF-CR (wut?!) line endings.
file identifies these as ādataā as thereās no libmagic rule. Thereās a
Wikidata entry and a sparse
ArchiveTeam page
that explicitly asks for expansion, but no PRONOM identifier or MIME type. The
separate Application Note spec has been offline for years since SBIG was bought
by Diffraction Limited, though the
CCDOPS Version 3.5 manual
is still hosted and contains the full Type 3 spec in Appendix A. A bit crappy of
them, but SBIG cameras use FITS instead nowadays - so whatever š¤·
Sembiance has a single sample to work from, and we can generate them with netpbm. So letās add a magic rule and a PRONOM entry:
- [šŖ libmagic submission] TODO
- šļø PRONOM TNA1776867677I97
Interleaf

Interleaf was an aerospace/defense
doc system bought out by BroadVision around 2000. The software is long gone, but
the image format has lived on in netpbmās leaftoppm and ppmtoleaf since ā94.
Magdir/interleaf matches the document format but not the image format, so
file thinks images are ādataā. Netpbm, Sebianceās
samples and
wikipedia
confirm a magic of \x89OPS for the images. Thereās an
ArchiveTeam page and
a Wikidata entry, but no PRONOM
identifier and no MIME type.
Letās fix it in PRONOM and file:
- [šŖ libmagic submission] TODO
- šļø PRONOM: TNA1776869372T15
GEM Raster

GEM was Digital Researchās graphical desktop environment, first released in 1985 for the IBM PC and later the Atari ST. Its raster image format (IMG) was used by GEM Paint and Ventura Publisher. The format is well-standardized for monochrome images, but color support spawned several incompatible variants: XIMG, STTT, TIMG, and HyperPaint.
GEM has an ArchiveTeam page,
a PRONOM entry, a
Wikidata entry, and libmagic rules
already exist in Magdir/images ā but theyāre broken. Every GEM
file comes back as ādataā. The root cause: GEM v1 images share a 0x0001
version word with Atari DEGAS mid-res bitmaps, and the existing rule tree nests
the GEM header-size checks as siblings of DEGAS file-size tests that use
>>-0 offset (end-of-file). This corrupts the offset context for the GEM
checks that follow ā they end up reading from past the end of the file instead
of from the header.
The fix is straightforward: give GEM v1 its own top-level 0 beshort 0x0001
entry, independent of DEGAS. Since this touches the same block as the
NEOchrome fix, Iāve combined both into a single updated
patch. Sembiance has a generous
collection of over 100
GEM files to test with, including XIMG and TIMG variants ā though netpbmās
gemtopnm only handles v1.
- šŖ libmagic patch (updated)
MacPaint

MacPaint was the original bitmap
editor for the Macintosh, created by Bill Atkinson and released in 1984. Images
are always 576x720 monochrome, PackBits compressed. The format comes in three
flavors: MacBinary-wrapped (128-byte finder header, PNTGMPNT type/creator at
offset 65), versioned (4-byte version number + 304 bytes of brush/pattern data),
and null-header (512 bytes of zeros before the pixel data).
libmagicās existing rule in Magdir/apple detects the MacBinary variant via
PNTGMPNT at offset 65, but the other two variants come back as ādataā. The
versioned files start with 0x00000002 or 0x00000003 followed by 0xFFFFFFFF
(default brush patterns) ā distinctive enough for a magic rule. The null-header
variant is indistinguishable from any other file that starts with zeros.
The format has a PRONOM entry,
a Wikidata entry, and uses
image/x-macpaint as its MIME type. Sembiance has a good
collection of 31
files across all three variants.
- [šŖ libmagic submission] TODO
Group 3 Fax

CCITT Group 3 is the standard fax compression format, using Modified Huffman coding for monochrome scan lines. Raw G3 files have no header, no magic bytes, and no signature of any kind ā just Huffman-coded pixel data starting with an EOL marker. The first byte varies depending on the content of the first scan line.
libmagicās existing rule in Magdir/modem attempts detection via 0x0100 or
0x1400 (common first EOL byte patterns), then runs a deep exclusion tree to
avoid false positives against TrueType fonts, DEGAS bitmaps, GEM images,
Panorama databases, and various other formats that happen to start with similar
bytes. It catches some files but misses most ā three of our four
Sembiance test files
come back as ādataā. Thereās no fix here that wouldnāt also produce false
positives; raw G3 is fundamentally undetectable by content.
netpbmās g3topbm only supports MH (Modified Huffman) compression, not MR or
MMR. Itās also worth noting that many ā.faxā files in the wild are actually
TIFF-wrapped G3, not raw ā Sembianceās CROW.FAX is a TIFF with
compression=bi-level group 3.
The format has a PRONOM entry,
a Wikidata entry, MIME type
image/g3fax, and an
ArchiveTeam page.
Extension-only detection is all we can do in pillow-netpbm.
The rest
So, this was the lowish hanging fruit and historical formats that kind of matter, thereās a bunch more that I couldnāt find test data for, but I think 3 weeks of shovelling reports onto other projects is enough. Itās slow work and puts an unfair burden on others, given the pace at which I work. So I think Iāll go back to silo-mode for a while - at least until some of the above are merged.