A while ago, I tested a Go-based streaming JavaScript minifier for compatibility with real-world code.
I decided to run it against the CDNjs repo (https://cdnjs.com/libraries), which collects most of the popular libraries and so makes a decent test suite for JavaScript/CSS tooling.
After narrowing the list to just JavaScript libraries, then to only the latest version of each, and then to files that don't end with .min.js to avoid obviously pre-minified ones (and speed up testing), I got 17,547 JavaScript files with a total size of 771 MB.
Then I ran them all through the latest minify binary and parsed the results with a full-featured (non-streaming) ES5 parser, comparing ASTs before and after the transformation to detect semantic mismatches or syntactically broken files.
You can see detailed results and findings in the original issue, but I'd like to dwell on one interesting case.
Minification of moxie-html4.js from the webshim library, followed by an AST (format-agnostic) comparison, revealed the following changes:
@@ -2736,12 +2736,12 @@
4: 'Flash',
9: 'Fine weather',
10: 'Cloudy weather',
11: 'Shade',
- 12: 'Daylight fluorescent (D 5700 - 7100K)',
- 13: 'Day white fluorescent (N 4600 -5400K)',
- 14: 'Cool white fluorescent (W 3900 - 4500K)',
- 15: 'White fluorescent (WW 3200 - 3700K)',
+ 12: 'Daylight fluorescent(D 5700-7100K)',
+ 13: 'Day white fluorescent(N 4600-5400K)',
+ 14: 'Cool white fluorescent(W 3900-4500K)',
+ 15: 'White fluorescent(WW 3200-3700K)',
17: 'Standard light A',
18: 'Standard light B',
19: 'Standard light C',
20: 'D55',
...
Clearly, something went wrong that caused significant spaces in strings to disappear.
But how do we figure out what exactly caused that, given that the rest of the file was correct? We can't just manually remove lines here and there in a 74 KB minified file, hoping to get lucky and reduce it quickly (believe me, I tried that first :) ).
Luckily, there is an automatic solution for that: creduce.
Wait, you might say, but its README says it's a reducer for C programs. Yes, it is primarily designed for C but, by its nature, it also works on JavaScript, Rust and other C-like languages, even if they have custom syntax.
Basically, this tool tries to remove blocks ({ ... }), expressions (identifiers, numbers, parenthesized expressions ( ... ), etc.), comments, pieces of strings and so on, as long as the provided script still fails (erm, passes), resulting in a minimal reproducible test case.
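To make the idea more concrete, here is a deliberately simplified sketch of such a reduction loop. It is not creduce's actual algorithm (creduce works on tokens, blocks and expressions, not just lines), and both reduceLines and the isInteresting callback are names made up for this illustration:

```javascript
// Simplified line-based reducer: NOT creduce's real algorithm, only the core idea.
// `isInteresting` is a hypothetical callback playing the role of the test script.
function reduceLines(source, isInteresting) {
  let lines = source.split("\n");
  // Start with big chunks and halve the chunk size once no chunk can be removed.
  let size = Math.ceil(lines.length / 2);
  while (true) {
    let removedSomething = false;
    for (let start = 0; start < lines.length; ) {
      const candidate = [...lines.slice(0, start), ...lines.slice(start + size)];
      if (isInteresting(candidate.join("\n"))) {
        lines = candidate; // still reproduces: keep the smaller variant
        removedSomething = true;
      } else {
        start += size; // this chunk is needed, move on to the next one
      }
    }
    if (!removedSomething) {
      if (size === 1) break; // nothing more can be removed line by line
      size = Math.ceil(size / 2);
    }
  }
  return lines.join("\n");
}

// Toy example: the "bug" reproduces as long as the code still calls bug().
const sample = ["let a = 1;", "let b = 2;", "bug();", "let c = 3;"].join("\n");
const reduced = reduceLines(sample, (code) => code.includes("bug()"));
console.log(reduced); // bug();
```

The real tool applies many such passes at different granularities and runs candidates in parallel, but the contract is the same: a candidate is kept only if the test still passes.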
First of all, we should store the original file for the base sample:
$ curl https://cdnjs.cloudflare.com/ajax/libs/webshim/1.16.0/minified/shims/moxie/js/moxie-html4.js -o html4.js
Now, creduce expects an "interestingness test": a script that returns 0 (succeeds) on our original example and on every reduced variant that still reproduces the issue, and fails otherwise. That means our script should fail when the code either doesn't parse or the minified output is structurally equal to its input.
Let's install, for example, Esprima to parse scripts and use it for a simple one-off script that will perform the actual comparison:
#!/usr/bin/node
const { readFileSync } = require("fs");
const { parse } = require("esprima");
const { notDeepEqual } = require("assert");
let [ast1, ast2] = process.argv
  .slice(2) // skip node executable and script filename
  .map((file) => readFileSync(file, "utf-8")) // read given files as strings
  .map((code) => parse(code)); // parse into ASTs
notDeepEqual(ast1, ast2); // ensure they're still not equal
Assume we saved it as ~/compare.js and made it executable.
Now we need to call it right after the minifier, so let's write a wrapper shell script (compare.sh) for that (we could do that from Node as well, but this might be a bit easier):
#!/usr/bin/env bash
~/minify html4.js -o html4.min.js && ~/compare.js html4.js html4.min.js
Make sure you're using absolute paths for custom executables. creduce generates reduced test cases with the same file name as the original in temporary folders (in order to parallelize reduction), so you can't expect your script to be called from the same working directory you launched creduce in.
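For completeness, here is a rough sketch of what the Node alternative to compare.sh might look like. The helper names allStepsSucceed and interestingnessSteps are made up for this illustration, and the paths assume, like the shell version, that minify and compare.js live in the home directory:

```javascript
const { spawnSync } = require("child_process");
const path = require("path");

// Runs each [command, args] step in order and stops at the first failure,
// mirroring `step1 && step2` in the shell wrapper.
function allStepsSucceed(steps) {
  return steps.every(([command, args]) => {
    const { status } = spawnSync(command, args, { stdio: "inherit" });
    return status === 0; // status is null when the command could not be started
  });
}

// Executables get absolute paths (assumed to be in `homeDir`), while html4.js
// stays relative so that it resolves inside creduce's temporary folder.
function interestingnessSteps(homeDir) {
  return [
    [path.join(homeDir, "minify"), ["html4.js", "-o", "html4.min.js"]],
    [path.join(homeDir, "compare.js"), ["html4.js", "html4.min.js"]],
  ];
}
```

A wrapper script built on this would end with process.exit(allStepsSucceed(interestingnessSteps(require("os").homedir())) ? 0 : 1), so that creduce sees the same exit codes as with the shell version.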
Time to ensure that it returns exit code 0 by invoking ./compare.sh manually, and then we're ready to go:
$ ./compare.sh
$ echo $?
0
$ creduce compare.sh html4.js
Now, after quite some time of waiting and many log lines, creduce ends up with this JavaScript in html4.js:
( function ( )
{
try
{
if ( 0 ) /, /
}
catch ( e )
{
}
}
)
( "moxie/runtime/html5/image/ExifParsersRGBUncalibratedUnknownAverageCenterWeightedAverageSpotMultiSpotPatternPartialOtherDaylightFliorescentTungstenFlashFine
longitude" )
This is much better than the original and, while there are a few more small things we could still reduce manually, it already reveals the problem. Indeed, it's the most common problem in JavaScript parsing, where context affects tokenization: a / after ) can be parsed either as a division or as the start of a regular expression, depending on the tokens before the matching (.
Depending on that, it's either safe to remove whitespace after the / in the if ( 0 ) /, / example or not. In this failure, the tested minifier removes it even though the construct is, in fact, a regular expression.
As you can see, creduce is quite a powerful tool for reducing test cases before you try to fix or report issues in language tooling, whether it's C, Rust, JavaScript or, who knows, CSS.