Curious line-endings in FTP

Whilst hurriedly implementing basic FTP support in a program that’s due in a couple of days, I ran into a strange phenomenon:

  • Retrieving ftp://login:password@server/data.csv, a multiline text file, will return the file intact.
  • Retrieving ftp://login:password@server/data.dat, another multiline text file, won’t: all the data will be on one line.

The code simply uses Java’s builtin URL handling:

import java.io.InputStream;
import java.net.URL;

URL url = new URL("ftp://login:password@server/data.dat");
InputStream inputStream = url.openStream(); // call on the instance, not the URL class
// read data from inputStream

And the data is hosted on a VMS server to which I have no other access.

Fetching the file using the FTP command line program works successfully. But the data returned in Java contains no end-of-line bytes of any kind.
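
A quick way to confirm the "no end-of-line bytes" observation is to count carriage returns and line feeds in the raw stream. The following is only a minimal sketch: the class and method names (LineEndingCount, countCrLf) are made up for illustration and are not part of the program described here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper (not from the original program): counts CR and LF bytes.
final class LineEndingCount {
    private LineEndingCount() {}

    /** Returns {number of CR bytes, number of LF bytes} found in the stream. */
    static int[] countCrLf(InputStream in) {
        int cr = 0;
        int lf = 0;
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\r') {
                    cr++;
                } else if (b == '\n') {
                    lf++;
                }
            }
        } catch (IOException e) {
            // Wrapped so callers are not forced to handle a checked exception.
            throw new UncheckedIOException(e);
        }
        return new int[] {cr, lf};
    }
}
```

Passing the stream from url.openStream() to countCrLf and getting two zeros back would match what I saw for the .dat file.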

The obvious culprit is the binary/ascii dichotomy. FTP can translate text files between different line-ending conventions. And in this case, it seems that the default mode is inferred from the filename in the URL: if it ends in .csv, it’s “obviously” a text file; if it ends in .dat it’s “obviously” a data (i.e. not text) file, and so on. I believe (but have not verified) that this determination is made by the client, which prefaces its get command with the binary or ascii command as appropriate. On the command line, setting binary mode before fetching results in a single-line file, matching the behaviour of Java.

The curious thing is that the .dat file is detected as binary, but the resulting data has been translated as if it were text. This seems to indicate some mismatch between what the server and the client are doing.

It could also indicate something unusual in the end-of-line indicators that are used in the original file: perhaps VMS FTP knows how to translate them to ASCII, but the recipient system doesn’t recognise them at all (not even as bytes). Does VMS use EBCDIC? Of course, there is no way I can find out what those bytes are, because my only access is via FTP and the indicator bytes are stripped in binary mode, or converted in text mode.

Anyway, at this point, it seems I just need to force an ascii mode transfer. The URL syntax for this turns out to be:

ftp://login:password@server/data.dat;type=a

(The alternative for binary transfers is ;type=i, since binaries are “obviously” images…)

This works, though it does feel a bit sneaky to alter the user-provided URLs like this. But at least I can support FTP in a few lines of code, instead of needing to rely on yet another third-party library.
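
For what it's worth, the URL rewriting can be kept fairly polite. The sketch below is one possible approach, not necessarily what the final program does: the class and method names (FtpUrls, withAsciiType) are invented for illustration, and it assumes plain string handling is good enough, i.e. that the URLs carry no query string or fragment after the path. It only touches ftp:// URLs and leaves alone any URL that already specifies a ;type= of its own.

```java
import java.util.Locale;

// Hypothetical helper (names are illustrative, not from the original program).
final class FtpUrls {
    private FtpUrls() {}

    /** Appends ";type=a" to an FTP URL unless a transfer type is already given. */
    static String withAsciiType(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        if (!lower.startsWith("ftp://")) {
            return url; // not an FTP URL: leave it untouched
        }
        if (lower.contains(";type=")) {
            return url; // the user already chose a transfer type: respect it
        }
        // Assumption: nothing (query, fragment) follows the path in the URL.
        return url + ";type=a";
    }
}
```

With this, withAsciiType("ftp://login:password@server/data.dat") yields the ;type=a form shown above, while a URL ending in ;type=i is passed through unchanged, so a user who really wants a binary transfer can still ask for one.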
