🖼️ pillow-netpbm

Continuing the journey of computing archaeology, it’s time to write a Pillow plugin that can load any image format supported by netpbm but not by Pillow.

The first step was to write a bridge for the anyto* applications, but that turned out to be less reliable than hoped: many ancient formats lack proper libmagic detection, test data for them is scarce, and being obscure, they never saw much real-world use. So let’s fix as much of that as possible.
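A minimal sketch of such a bridge, assuming a netpbm converter such as anytopnm is on the PATH and writes PNM to stdout. The helper names convert_to_pnm and open_via_netpbm are hypothetical; the point is that Pillow already decodes PBM/PGM/PPM, so the converter’s output can be handed straight to Image.open:

```python
import io
import subprocess

# PNM magic numbers: P1-P3 are plain (ASCII) PBM/PGM/PPM, P4-P6 are raw.
PNM_MAGICS = {b"P1", b"P2", b"P3", b"P4", b"P5", b"P6"}


def convert_to_pnm(path, converter=("anytopnm",)):
    """Run a netpbm-style converter on `path` and return its PNM output.

    The default command is an assumption; anything that takes a file
    argument and writes PNM to stdout will do.
    """
    result = subprocess.run(
        [*converter, str(path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the converter fails
    )
    data = result.stdout
    if data[:2] not in PNM_MAGICS:
        raise ValueError("converter did not produce PNM output")
    return data


def open_via_netpbm(path, converter=("anytopnm",)):
    """Decode the converter's output with Pillow (imported lazily)."""
    from PIL import Image

    image = Image.open(io.BytesIO(convert_to_pnm(path, converter)))
    image.load()  # decode the pixel data eagerly
    return image
```

If the converter misidentifies the input, this fails with a CalledProcessError or the PNM check, so detection quality directly limits what the bridge can load.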

Andrew Toolkit Raster

atk

A nice, easy format to load, with a properly registered MIME type (at least for the data format itself: application/andrew-inset) and test data that has shipped in the project’s source since the ’90s.

netpbm can load and save these, but file thinks all Andrew files are LaTeX documents, and the example bitmaps are buried in the distro tree. So let’s fix both of those things.
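For the detection half, a sketch assuming (as netpbm’s atktopbm does when reading the header) that a raster object opens with \begindata{raster, followed by a numeric object id and a closing brace. That leading \begin is presumably what trips libmagic’s LaTeX rules, so a raster-specific check has to look at what follows it. is_atk_raster is a hypothetical helper:

```python
# Assumption: ATK raster objects start with "\begindata{raster,<id>}".
ATK_RASTER_PREFIX = b"\\begindata{raster,"


def is_atk_raster(prefix: bytes) -> bool:
    """Return True if `prefix` looks like an Andrew Toolkit raster object."""
    if not prefix.startswith(ATK_RASTER_PREFIX):
        return False
    # The object id after the comma should be a decimal number closed by "}".
    rest = prefix[len(ATK_RASTER_PREFIX):]
    end = rest.find(b"}")
    return end > 0 and rest[:end].isdigit()
```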

AutoCAD Slide

sld

AutoCAD slides are vector screen dumps created by AutoCAD’s MSLIDE command and viewed with VSLIDE. While the spec has been removed from Autodesk’s website, Martin Reddy has documented it too.
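A header check for slides, assuming the layout in the published format notes: the file opens with the ID string AutoCAD Slide, then CR LF and a Ctrl-Z byte. is_autocad_slide is a hypothetical name. Insisting on CR LF straight after “Slide” also keeps slide libraries (.slb), whose ID string continues with “ Library”, from matching:

```python
# Assumption: a slide starts with "AutoCAD Slide" + CR LF + Ctrl-Z (0x1A).
SLD_MAGIC = b"AutoCAD Slide\r\n\x1a"


def is_autocad_slide(prefix: bytes) -> bool:
    """Return True for a single AutoCAD slide (.sld), not a slide library."""
    return prefix.startswith(SLD_MAGIC)
```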

My test files, some of which came from Robert Schultz’s collection, were reported by file as plain “data”, meaning they weren’t recognized at all. The format’s PRONOM entry lists 3 different MIME types (application/sld, application/x-sld and image/x-sld), and Autodesk never actually registered image/vnd.sld with IANA. PRONOM’s first entry weakly collides with application/vnd.ogc.sld+xml, and the ambiguity has bled into Archive Team’s wiki and across GitHub into projects like MegaMimes and Vitam, presumably via DNSCore’s seed data repo.

To fix as much of this as possible, I contacted PRONOM, asked for an account on Archive Team’s wiki, sent a PR to MegaMimes, submitted a rules file to libmagic, updated the Wikidata sources with links to existing usage, and started the IANA registration process for image/vnd.sld, replacing the image/x-sld I had been lobbying for. Once the IANA registration is complete and/or the detection rules are in libmagic, Apache Tika and freedesktop shared-mime-info will likely follow suit. If not, I’ll give them a nudge.


Fiasco Wavelet (FIASCO)

Fiasco wavelet files start with a 6-byte ASCII magic at offset 0.

0       string  FIASCO  Fiasco wavelet image data
!:ext   wfa

MRF

MRF files start with a 4-byte ASCII magic at offset 0.

0       string  MRF1    MRF image data
!:ext   mrf
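In the plugin, fixed-string magics like these end up in the accept functions Pillow calls with the first bytes of a file (registered through Image.register_open). A stdlib-only sketch of that check for both rules above, with a hypothetical sniff_format helper:

```python
# Fixed-string magics from the rules above, both at offset 0.
MAGICS = {
    b"FIASCO": "FIASCO",  # Fiasco wavelet, .wfa
    b"MRF1": "MRF",       # MRF, .mrf
}


def sniff_format(prefix: bytes):
    """Return a format name if `prefix` starts with a known magic, else None."""
    for magic, name in MAGICS.items():
        if prefix.startswith(magic):
            return name
    return None
```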

YBM Face File

The magic is only 2 bytes (!!), so matches may need extra validation to avoid false positives. A YBM file is 2 bytes of magic, 2 bytes of width (LE) and 2 bytes of height (LE), followed by the bitmap data.

0       string  =!!     YBM face image data
!:ext   ybm
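One way to add that validation, sketched under the layout described above: read the little-endian width and height, reject implausible sizes, and require at least enough bytes for a tightly packed 1-bit bitmap. MAX_SIDE is an arbitrary bound picked for illustration, and is_plausible_ybm is a hypothetical name:

```python
import struct

# Hypothetical sanity bound: face icons are small, huge sizes suggest a
# false positive.
MAX_SIDE = 1024


def is_plausible_ybm(data: bytes) -> bool:
    """Validate a YBM candidate beyond its 2-byte magic."""
    if len(data) < 6 or data[:2] != b"!!":
        return False
    width, height = struct.unpack_from("<HH", data, 2)  # little-endian
    if not (0 < width <= MAX_SIDE and 0 < height <= MAX_SIDE):
        return False
    # A 1-bit bitmap needs at least ceil(width * height / 8) bytes.
    min_payload = (width * height + 7) // 8
    return len(data) - 6 >= min_payload
```

Any row padding only adds bytes, so the size test stays a safe lower bound without committing to a particular padding scheme.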

Misidentified formats

Atari Neochrome → wrongly matched as DEGAS

The file command reports “Atari DEGAS Elite bitmap 320x200x16” for .neo files. The two share a similar Atari header layout but are distinct formats: Neochrome has a different structure, yet nothing in its first 16 bytes distinguishes it.
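Since the first 16 bytes don’t help, a heuristic has to lean on the rest of the file. This sketch assumes NEOchrome files are always a 128-byte header plus a 32000-byte screen (32128 bytes), with a zero flag word followed by the resolution word, so a DEGAS screen of a different total size won’t match. That zero flag presumably also reads as “low resolution” to a DEGAS rule, hence the 320x200x16. It further assumes the 16 palette words only use their low 12 bits. looks_like_neochrome is hypothetical:

```python
import struct

# Assumed NEOchrome layout: 128-byte header + 32000-byte screen dump.
NEO_SIZE = 128 + 32000


def looks_like_neochrome(data: bytes) -> bool:
    """Heuristic NEOchrome check based on size and header fields."""
    if len(data) != NEO_SIZE:
        return False
    flag, resolution = struct.unpack_from(">HH", data, 0)
    if flag != 0 or resolution > 2:  # resolution: 0 low, 1 medium, 2 high
        return False
    # 16 palette words follow; assume only the low 12 bits are ever used.
    palette = struct.unpack_from(">16H", data, 4)
    return all((word & 0xF000) == 0 for word in palette)
```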

JBIG → wrongly matched as Targa

Standalone JBIG/BIE files are misidentified as Targa image data. Targa detection in libmagic is notoriously loose, since TGA files have no magic number at the start. JBIG rules only exist within TIFF compression detection, not for standalone .jbig/.bie files.
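A standalone check, sketched on the assumption that a .jbig/.bie stream begins with the 20-byte bi-level image header (BIH) from ITU-T T.82: DL, D, P, a zero fill byte, XD, YD and L0 as big-endian 32-bit values, then MX, MY, order and options. The constraints below are ones a valid header should meet, not an exhaustive validation, and the reserved-bit checks are assumptions. looks_like_jbig is hypothetical:

```python
import struct

# Assumed T.82 BIH: DL, D, P, fill (uint8); XD, YD, L0 (big-endian uint32);
# MX, MY, order, options (uint8). 20 bytes in total.
BIH = struct.Struct(">4B3I4B")


def looks_like_jbig(data: bytes) -> bool:
    """Heuristic check for a standalone JBIG (.jbig/.bie) header."""
    if len(data) < BIH.size:
        return False
    dl, d, planes, fill, xd, yd, l0, mx, _my, order, options = BIH.unpack_from(data, 0)
    return (
        fill == 0                  # the fill byte must be zero
        and dl <= d                # first layer cannot exceed the last
        and planes >= 1
        and xd > 0 and yd > 0 and l0 > 0
        and mx <= 127
        and (order & 0xF0) == 0    # assumed: reserved bits are zero
        and (options & 0x80) == 0  # assumed: reserved bit is zero
    )
```

Since Targa has no leading magic at all, a rule like this would need to run before the Targa heuristics get a chance to claim the file.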

CompuServe RLE → “ASCII text with escape sequences”

It uses an ESC-based encoding, which is hard to detect without pattern matching on the escape sequences.
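A sketch of that pattern matching, assuming CompuServe RLE images open with an ESC G H (high resolution) or ESC G M (medium resolution) graphics-mode sequence and that the run lengths which follow are printable characters. Both are assumptions about the format, and looks_like_compuserve_rle is a hypothetical name:

```python
# Assumption: an image opens with ESC G H (high res) or ESC G M (medium res).
RLE_STARTS = (b"\x1bGH", b"\x1bGM")
# Assumption: what follows is printable ASCII, line breaks and further ESCs.
RLE_ALLOWED = set(range(0x20, 0x7F)) | {0x0A, 0x0D, 0x1B}


def looks_like_compuserve_rle(prefix: bytes) -> bool:
    """Check the opening escape sequence, then the bytes that follow."""
    if not prefix.startswith(RLE_STARTS):
        return False
    return all(byte in RLE_ALLOWED for byte in prefix[3:256])
```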

FaceSaver → “ASCII text”

A text-based header format; a rule could match on the FirstName: or PicData: strings.
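A sketch of such a match, assuming the usual FaceSaver layout of “Key: value” header lines, a blank line, then hex pixel data, with PicData holding width, height and bits per pixel. Requiring PicData to parse as three integers makes this stricter than a plain string match; looks_like_facesaver is hypothetical:

```python
def looks_like_facesaver(text: str) -> bool:
    """Heuristic FaceSaver check on the decoded start of a file."""
    # The header ends at the first blank line; normalise CRLF first.
    header = text.replace("\r\n", "\n").split("\n\n", 1)[0]
    fields = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            return False  # every header line should be "Key: value"
        fields[key.strip()] = value.strip()
    # PicData is assumed to be "width height bits_per_pixel".
    pic = fields.get("PicData", "").split()
    return len(pic) == 3 and all(part.isdigit() for part in pic)
```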

Sun Icon → “ASCII text”

A text-based, C-style format; a rule could match on the /* Format_version= string.
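Going a step further than matching the string, a sketch that parses the whole header line, assuming it takes the form /* Format_version=1, Width=64, Height=64, Depth=1, Valid_bits_per_item=16 with the numbers varying per icon. parse_sun_icon_header is a hypothetical helper:

```python
import re

# Assumed Sun icon header: a C comment listing the image parameters.
SUN_ICON_HEADER = re.compile(
    r"/\*\s*Format_version=(\d+),\s*Width=(\d+),\s*Height=(\d+),"
    r"\s*Depth=(\d+),\s*Valid_bits_per_item=(\d+)"
)


def parse_sun_icon_header(text: str):
    """Return (width, height, depth, bits_per_item), or None if no match."""
    match = SUN_ICON_HEADER.match(text.lstrip())
    if match is None:
        return None
    _version, width, height, depth, bits = map(int, match.groups())
    return width, height, depth, bits
```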

SBIG ST-4 → “ISO-8859 text”

Raw scientific image data misread as text.

Broken/incomplete rules

Interleaf

The existing Magdir rule matches \x88OPS, but netpbm’s leaftoppm expects \x89OPS, which is also what our test file contains. This may be a format variant that needs adding to the existing rule.
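Until it’s clear whether \x89OPS is a variant or the canonical form, a detector can simply accept both. A sketch, with is_interleaf as a hypothetical name:

```python
# libmagic's rule expects \x88OPS; leaftoppm and our test file use \x89OPS.
# Accepting both is a stopgap, not a confirmed spec.
INTERLEAF_MAGICS = (b"\x88OPS", b"\x89OPS")


def is_interleaf(prefix: bytes) -> bool:
    """True if `prefix` starts with either known Interleaf magic."""
    return prefix.startswith(INTERLEAF_MAGICS)
```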

GEM Raster

Rules exist in Magdir but compete with DEGAS detection. Our valid GEM file comes back as “data”; the condition tree seems to exclude it.
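A sketch of a GEM check that doesn’t depend on that condition tree, assuming the standard header of eight big-endian 16-bit words: version, header length in words, planes, pattern length, pixel width and height in microns, then line width and line count. The bounds are assumptions. Note that a first word of 1 is also how DEGAS marks medium resolution, which is presumably where the two rule sets collide. looks_like_gem_img is hypothetical:

```python
import struct

# Assumed GEM .IMG header layout: eight big-endian uint16 words.
GEM_HEADER = struct.Struct(">8H")


def looks_like_gem_img(data: bytes) -> bool:
    """Heuristic GEM raster check; the bounds below are assumptions."""
    if len(data) < GEM_HEADER.size:
        return False
    (version, header_words, planes, pattern_len,
     _pixel_w, _pixel_h, width, height) = GEM_HEADER.unpack_from(data, 0)
    return (
        version == 1                    # assumed: standard files use 1
        and header_words >= 8           # extended headers are longer
        and len(data) >= header_words * 2
        and 1 <= planes <= 24           # assumed upper bound
        and 1 <= pattern_len <= 8       # pattern run length in bytes
        and width > 0
        and height > 0
    )
```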

MacPaint

Rules exist, but they require PNTGMPNT at offset 65 (the MacPaint file type and creator codes in a MacBinary header). Headerless, data-fork-only MacPaint files are not detected.
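Detecting the headerless variant means looking at the data itself. A sketch, assuming the common data-fork layout of a 512-byte header followed by 720 PackBits-compressed scanlines of 72 bytes (576 pixels): walk the run headers without materialising pixels and check the output comes to exactly 720 × 72 bytes. unpackbits_length and looks_like_headerless_macpaint are hypothetical helpers:

```python
# Assumed data-fork layout: 512-byte header, then 720 PackBits rows of 72 bytes.
HEADER_SIZE = 512
ROW_BYTES, ROWS = 72, 720


def unpackbits_length(data: bytes, start: int, wanted: int) -> int:
    """Walk PackBits runs from data[start:] and return how many bytes they
    would produce, stopping at `wanted` bytes or when input runs out."""
    produced, pos = 0, start
    while produced < wanted and pos < len(data):
        n = data[pos]
        pos += 1
        if n < 128:        # literal run: copy the next n + 1 bytes
            count = n + 1
            if pos + count > len(data):
                break
            pos += count
            produced += count
        elif n > 128:      # repeat run: next byte repeated 257 - n times
            if pos >= len(data):
                break
            pos += 1
            produced += 257 - n
        # n == 128 is a no-op in PackBits
    return produced


def looks_like_headerless_macpaint(data: bytes) -> bool:
    if len(data) <= HEADER_SIZE:
        return False
    total = ROW_BYTES * ROWS
    return unpackbits_length(data, HEADER_SIZE, total) == total
```

On its own this is a fairly weak signal, since some arbitrary binary data will also decode to exactly that length; it works better combined with an extension check such as .mac or .pntg.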

Group 3 Fax

Rules exist in Magdir’s modem file, but raw G3 detection is fragile and conditional. Our test file comes back as “data”.
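A heuristic sketch of a less conditional check, assuming a raw G3 page starts with an EOL code (eleven or more zero bits followed by a one) and repeats it at the end of every scanline. Since fax files are stored in either bit order, it tries both. min_eols and the 2 KB window are arbitrary choices, and both helpers are hypothetical:

```python
def count_eols(data: bytes, lsb_first: bool):
    """Return (starts_with_eol, eol_count) for a T.4 bit stream."""
    zeros = eols = 0
    seen_one = starts_with_eol = False
    bit_order = range(8) if lsb_first else range(7, -1, -1)
    for byte in data:
        for i in bit_order:
            if (byte >> i) & 1:
                if zeros >= 11:        # eleven-plus zeros then a one: EOL
                    eols += 1
                    if not seen_one:
                        starts_with_eol = True
                seen_one = True
                zeros = 0
            else:
                zeros += 1
    return starts_with_eol, eols


def looks_like_raw_g3(data: bytes, min_eols: int = 4) -> bool:
    window = data[:2048]
    for lsb_first in (False, True):
        starts, eols = count_eols(window, lsb_first)
        if starts and eols >= min_eols:
            return True
    return False
```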