Wednesday 19 December 2018

More on reading Microsoft long file names on QLs

This time it was the turn of PC-formatted floppy disks. Those who use SMSQ/E rather than QDOS will know that floppy disks formatted on a PC, DD or HD, can be read and written by SMSQ/E. What can be frustrating is that a file written to the floppy by a PC can carry a long file name (LFN), but when the disk is read on the QL side under SMSQ/E only the DOS short file name is visible.

A long file name on a PC can have up to 255 Unicode characters, including spaces, while a short file name, or DOS 8.3 file name, has at most 8 uppercase characters for the name and 3 for the extension, and that's it. Windows has a scheme for shortening long file names to 8.3 DOS names, so every file written to disk under Windows has an 8.3 DOS name even if it also has a long file name. The shortened names cause problems: if several files have names that start with the same 6 or more characters, at most 6 of those characters are used, followed by ~ and a number. If the number runs to two digits then only 5 characters can be used, as the name part of an 8.3 DOS name can never exceed 8 characters.
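To make the shortening rule concrete, here is a rough sketch in Python (chosen only because it is easy to run on the PC side). The function name `short_name` and the `taken` set are made up for this illustration, and it only covers names that need shortening. Real Windows versions do more (illegal characters, names that already fit 8.3, other ways of resolving clashes), so treat it as a picture of the rule described above and nothing more.

```python
def short_name(long_name, taken):
    """Simplified 8.3 alias: uppercase, drop spaces and dots, add ~N.

    `taken` is the set of aliases already used in the directory; the
    chosen alias is added to it. Illustration only, not Windows' code.
    """
    stem, dot, ext = long_name.rpartition(".")
    if not dot:                      # no extension at all
        stem, ext = long_name, ""

    def clean(text):
        return "".join(ch for ch in text.upper() if ch not in " .")

    stem, ext = clean(stem), clean(ext)[:3]
    n = 1
    while True:
        tail = "~" + str(n)
        # the name part never exceeds 8 characters, tail included
        alias = stem[:8 - len(tail)] + tail
        if ext:
            alias += "." + ext
        if alias not in taken:
            taken.add(alias)
            return alias
        n += 1


used = set()
print(short_name("Quarterly report.txt", used))   # QUARTE~1.TXT
print(short_name("Quarterly results.txt", used))  # QUARTE~2.TXT
```

By the tenth clash the tail is three characters long ("~10"), so only five characters of the original name survive, which is exactly why these aliases are so hard to tell apart on the QL.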

Typically the long file name precedes the 8.3 DOS name in the directory, but in case that does not happen a checksum is calculated from the 8.3 DOS name and stored in each segment of its long file name, to confirm that they belong to the same file. This is important because when a file is deleted under Windows the first character of the 8.3 DOS name is overwritten with the value 229 (hex E5) to mark the file as deleted, but nothing happens to the long file name segments unless they are overwritten by a later write to the directory.
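For anyone who wants to poke at a raw directory, the sketch below shows one way the pairing could be done, again in Python. It assumes the usual FAT layout: 32-byte entries, the attribute byte at offset 11 (15 for an LFN segment), the checksum at offset 13 of each LFN segment, and the name characters spread over offsets 1-10, 14-25 and 28-31 as 16-bit values. The function names are invented for the example, and it takes the cautious line that an 8.3 entry only adopts the LFN segments sitting directly in front of it, and only if the checksum agrees. It is not the routine QL Heaven actually used.

```python
DELETED_MARK = 0xE5   # 229: first byte of an erased entry
ATTR_LFN = 0x0F       # attribute value marking a long file name segment


def lfn_checksum(raw_name):
    """One-byte checksum of the 11-byte 8.3 name as stored on disk."""
    total = 0
    for byte in raw_name[:11]:
        # rotate right one bit, add the next byte, keep the low 8 bits
        total = (((total & 1) << 7) + (total >> 1) + byte) & 0xFF
    return total


def lfn_text(entry):
    """Up to 13 characters carried by one long file name segment."""
    raw = entry[1:11] + entry[14:26] + entry[28:32]
    return raw.decode("utf-16-le").split("\x00")[0]


def display_83(raw):
    """Turn the padded 11-byte name back into NAME.EXT form."""
    stem = raw[:8].decode("ascii", "replace").rstrip()
    ext = raw[8:11].decode("ascii", "replace").rstrip()
    return stem + "." + ext if ext else stem


def pair_names(directory):
    """Return (long name or None, 8.3 name) for each live 8.3 entry."""
    results, pending, pending_sum = [], [], None
    for pos in range(0, len(directory), 32):
        entry = directory[pos:pos + 32]
        if len(entry) < 32 or entry[0] == 0x00:      # end of directory
            break
        if (entry[11] & 0x3F) == ATTR_LFN:
            if entry[0] == DELETED_MARK:             # erased LFN segment
                continue
            if entry[0] & 0x40:                      # first stored segment of a set
                pending, pending_sum = [], entry[13]
            if entry[13] == pending_sum:
                pending.insert(0, lfn_text(entry))   # segments are stored last-first
        elif entry[0] == DELETED_MARK:
            pending, pending_sum = [], None          # segments before it are orphans
        else:
            matches = pending and pending_sum == lfn_checksum(entry[:11])
            results.append(("".join(pending) if matches else None,
                            display_83(entry[:11])))
            pending, pending_sum = [], None
    return results
```

Fed a directory sector read from the disk, `pair_names` returns one tuple per live file, with `None` in place of the long name wherever the segments in front of an entry do not belong to it.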

So here is what happened when QL Heaven looked at an HD floppy that had been used for some time to transfer files from a PC to a QL system.


The first table contains a number of apparent files with LFNs that have no file information, as their 8.3 names have been erased.

Then, when checksums are used to link LFNs to 8.3 names, everything is tidied up.


The process of calculating the checksum is arcane and, interestingly, can only produce values from 0 to 255, as there is only a single byte to hold the value in each LFN segment. In directories with hundreds of files, what are the odds of 2 files having the same checksum?
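A rough answer can be had from the classic "birthday" calculation. The Python sketch below assumes every one of the 256 checksum values is equally likely, which real 8.3 names (all those ~1 endings and common extensions) certainly will not honour exactly, so the figures are a guide only. The function name is made up for the example.

```python
def collision_probability(files, values=256):
    """Chance that at least two of `files` names share a checksum,
    assuming each of `values` checksums is equally likely."""
    all_different = 1.0
    for k in range(files):
        all_different *= (values - k) / values
    return 1.0 - all_different


for count in (10, 20, 50, 100):
    print(count, round(collision_probability(count), 3))
```

On that assumption the chance of at least one shared checksum passes 50% at about 20 files, and with hundreds of files it is a certainty. A clash only does harm, though, when an orphaned LFN happens to be compared with the wrong 8.3 name.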

Here is the calculation in SBASIC.

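18750 DEFine FuNction MakeCheckSum (sum$)
18752 LOCal i, x$, x%, ans%, nxt%
18754 REMark sum$ holds the 11 characters of the 8.3 name as stored in the directory
18760 x$=BIN$(CODE(sum$(1)),8)
18770 FOR i= 2 TO 11
18775 REMark rotate right one bit, add the next character, keep 8 bits
18780 x$=RotRt(x$) : ans%=BIN(x$) : nxt%=CODE(sum$(i)) : x%=ans%+nxt% : x$=BIN$(x%,8)
18790 END FOR i
18800 x%=BIN(x$) : RETurn x%
18810 END DEFine
18820 :
18830 DEFine FuNction RotRt (x$)
18832 LOCal r$, m$, ans$
18834 REMark rotate an 8-character binary string one place to the right
18840 r$=x$(8) : m$=x$(1 TO 7) : ans$=r$&m$ : RETurn ans$
18850 END DEFine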
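For checking results on the PC side, here is a Python sketch that follows the same bit-string approach step by step. It also shows the form the name has to be in before it is fed to the routine: the 11 characters as they sit in the directory, with the name padded to 8 and the extension to 3 with spaces, and no dot. The helper names `pad_83` and `checksum_by_rotation` are invented for this example.

```python
def pad_83(name):
    """'README~1.TXT' -> b'README~1TXT', 'BOOT.BAT' -> b'BOOT    BAT'."""
    stem, _, ext = name.upper().partition(".")
    return (stem.ljust(8)[:8] + ext.ljust(3)[:3]).encode("ascii")


def checksum_by_rotation(raw):
    """Checksum of an 11-byte directory name, using 8-character bit strings."""
    bits = format(raw[0], "08b")               # start with the first character
    for byte in raw[1:11]:
        bits = bits[-1] + bits[:-1]            # rotate right by one place
        total = (int(bits, 2) + byte) & 0xFF   # add next character, keep 8 bits
        bits = format(total, "08b")
    return int(bits, 2)


print(checksum_by_rotation(pad_83("README~1.TXT")))
```

Running both versions over the same handful of names is a quick way to confirm that a disk-reading routine is passing the name in the right form.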

1 comment:

  1. As far as I know the checksum is not used to match up the LFN and the short filename. The short filename must _always_ follow directly in the slot after the LFN. The checksum is just an additional check to see whether the entry still matches, or whether some non-LFN-aware utility has messed with the file there.
