pillow-netpbm
Continuing on the journey of computing archaeology, it's time to write a Pillow plugin that can load any images supported by netpbm but not by Pillow.
The first step here was to write a bridge for the anyto* applications, but it
turned out it wasn't so reliable - many ancient formats lack proper libmagic
detection, there's a lack of test data for them, and being obscure they didn't
get a lot of real world usage. So let's fix as much of that as is possible.
Andrew Toolkit Raster

A nice easy file format to load, a properly registered MIME type - at least for
the data format itself (application/andrew-inset) - with test data available
in the project's source since the 90's.
netpbm can load and save these, but file thinks all Andrew files are
LaTeX documents, and the example bitmaps are buried in the distro tree. So let's
fix both of those things:
AutoCAD Slide

AutoCAD slides are vector screen dumps of AutoCAD created by MSLIDE and viewed with VSLIDE. While the specs have been removed from AutoDesk's website, Martin Reddy has it documented too.
My test files, which were partly taken from
Robert Schultz's collection,
were detected as data by file, which means no detection.
Its PRONOM entry lists
3 different MIME types (application/sld, application/x-sld and
image/x-sld), and AutoDesk never actually registered an
image/vnd.sld with IANA.
PRONOM's first entry has a weak collision with application/vnd.ogc.sld+xml,
and the ambiguity has bled through into
Archive Team's wiki, and across
github
like in MegaMimes and
Vitam, presumably via
DNSCore's seed data repo.
Fixing as much of this as possible, I contacted PRONOM, asked for an account on
ArchiveTeam's wiki, sent a PR to MegaMimes, submitted a libmagic rules file,
updated WikiData sources with links to existing usage, and started the IANA
registration process for image/vnd.sld to replace the image/x-sld I was
lobbying for. Once the IANA registration is complete and/or detection
rules are in libmagic,
Apache Tika and
freedesktop shared-mime-info
will likely follow suit. If not, I'll give them a nudge.
- libmagic rules submission
- IANA submission
- MegaMimes PR
- WikiData src links
- PRONOM correction ref: TNA1774192312Q50
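For a sense of what a detection rule for slides has to look at, here is a minimal Python sketch. It assumes a slide file opens with a 17-byte ID string ("AutoCAD Slide", CR, LF, ^Z, NUL); that layout is my assumption rather than something quoted above, and the function name is purely illustrative.

```python
# Assumption: a slide file starts with this 17-byte ID string
# ("AutoCAD Slide", CR, LF, ^Z, NUL). Check it against the format
# documentation before relying on it.
SLD_ID = b"AutoCAD Slide\r\n\x1a\x00"


def looks_like_autocad_slide(data: bytes) -> bool:
    """Return True if the buffer begins with the assumed slide ID string."""
    return data.startswith(SLD_ID)
```

Matching the full ID string, including the trailing control bytes, should keep the check from firing on other files that merely begin with the words "AutoCAD Slide".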
Fiasco Wavelet (FIASCO)
Fiasco Wavelets: 6-byte ASCII magic at offset 0.
0 string FIASCO Fiasco wavelet image data
!:ext wfa
MRF
4-byte ASCII magic at offset 0.
0 string MRF1 MRF image data
!:ext mrf
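The FIASCO and MRF rules are both plain prefix matches at offset 0, so the same check is easy to mirror on the Python side. This is only a sketch: the table and function names are made up for illustration, and the descriptions are copied from the rules above.

```python
from typing import Optional

# (magic prefix, description, extension), mirroring the two magic rules.
PREFIX_RULES = [
    (b"FIASCO", "Fiasco wavelet image data", "wfa"),
    (b"MRF1", "MRF image data", "mrf"),
]


def identify_by_prefix(data: bytes) -> Optional[str]:
    """Return the description of the first rule whose magic starts the buffer."""
    for magic, description, _ext in PREFIX_RULES:
        if data.startswith(magic):
            return description
    return None
```

Feeding it the first few bytes of a file is enough, since neither rule looks past offset 5.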
YBM Face File
The magic is only 2 bytes (!!), so the rule may need extra validation to avoid
false positives. YBM files are 2 bytes of magic, 2 bytes width (LE), 2 bytes
height (LE), then bitmap data.
0 string !! YBM face image data
!:ext ybm
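One way to add that extra validation outside of libmagic is to sanity-check the header against the amount of data that follows. The sketch below takes the layout exactly as described above (2-byte magic, little-endian width and height) and only requires a lower bound of one bit per pixel; it deliberately makes no assumption about row padding. The function name is hypothetical.

```python
import struct


def looks_like_ybm(data: bytes) -> bool:
    """Heuristic YBM check: magic, non-zero size, and enough bitmap data."""
    if len(data) < 6 or data[:2] != b"!!":
        return False
    # Width and height as little-endian 16-bit values, per the layout above.
    width, height = struct.unpack_from("<HH", data, 2)
    if width == 0 or height == 0:
        return False
    # Lower bound: at least one bit per pixel, whatever the row padding is.
    min_row_bytes = (width + 7) // 8
    return len(data) - 6 >= min_row_bytes * height
```

A text file that happens to start with "!!" will usually decode to absurd dimensions and fail the size check, which is the point of the exercise.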
Misidentified formats
Atari Neochrome - wrongly matched as DEGAS
file reports Atari DEGAS Elite bitmap 320x200x16 for .neo files. The Atari
header layout is similar, but it is a distinct format. Neochrome has a different
structure but no unique magic that distinguishes it within the first 16 bytes.
JBIG - wrongly matched as Targa
Standalone JBIG/BIE files are misidentified as Targa image data. Targa
detection in libmagic is notoriously loose. JBIG rules only exist within TIFF
compression detection, not for standalone .jbig/.bie files.
CompuServe RLE - "ASCII text with escape sequences"
ESC-based encoding, hard to detect without pattern matching on escape sequences.
FaceSaver - "ASCII text"
Text-based header format. Could match on FirstName: or PicData: strings.
Sun Icon - "ASCII text"
Text-based C-style format. Could match on /* Format_version= string.
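For the two text-based formats, FaceSaver and Sun Icon, matching on those strings could look something like this in Python. It is a rough sketch: the 512-byte search window is an arbitrary assumption, and the names and description strings are invented for illustration.

```python
from typing import Optional

# Marker strings taken from the notes above; descriptions are illustrative.
TEXT_MARKERS = [
    (b"FirstName:", "FaceSaver image data"),
    (b"PicData:", "FaceSaver image data"),
    (b"/* Format_version=", "Sun icon image data"),
]


def identify_text_format(data: bytes, window: int = 512) -> Optional[str]:
    """Look for a known marker string near the start of the buffer."""
    head = data[:window]  # assumed window size, not part of any spec
    for marker, description in TEXT_MARKERS:
        if marker in head:
            return description
    return None
```

Searching a window rather than anchoring at offset 0 tolerates headers whose fields appear in a different order, at the cost of a slightly higher chance of false positives on ordinary text.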
SBIG ST-4 - "ISO-8859 text"
Raw scientific image data misread as text.
Broken/incomplete rules
Interleaf
Existing Magdir rule uses \x88OPS but netpbm's leaftoppm (and our test
file) expects \x89OPS. May be a format variant that needs adding to the
existing rule.
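Until it is clear which byte is right, a loader can simply accept both variants. A small sketch, assuming the magic sits at offset 0 (the function and constant names are hypothetical):

```python
# Both observed variants: the Magdir rule's byte and the one leaftoppm expects.
INTERLEAF_MAGICS = (b"\x88OPS", b"\x89OPS")


def looks_like_interleaf(data: bytes) -> bool:
    """Return True if the first four bytes match either magic variant."""
    return data[:4] in INTERLEAF_MAGICS
```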
GEM Raster
Rules exist in Magdir but compete with DEGAS detection. Our valid GEM file comes back as "data" - the condition tree seems to exclude it.
MacPaint
Rules exist but require PNTGMPNT at offset 65 (the type and creator codes of a
MacBinary header). Headerless/data-fork-only MacPaint files are not detected.
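The existing check boils down to a fixed-offset comparison, which shows why a file without those leading bytes can never match. A sketch in Python, with a hypothetical function name:

```python
def has_pntg_at_65(data: bytes) -> bool:
    """Mirror the existing rule: the 8 bytes at offset 65 must be PNTGMPNT."""
    return data[65:73] == b"PNTGMPNT"
```

A data-fork-only file has unrelated content at offset 65, so the comparison fails and detection falls through to "data".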
Group 3 Fax
Rules exist in Magdir's modem file but raw G3 detection is
fragile/conditional. Our test file comes back as "data".