After introducing a second run of the reference tests (with
Windows-style newlines), the tests now run into the 10m deadline. Bump
the timeout substantially to see how much time is actually needed. The
right fix is probably to drop the race detector: we don't have any
parallelism, so it buys us little and only burns CPU cycles. That
should be addressed separately, though.