Fix SSH tunnel MTU with iptables

One of my servers provides two services with incompatible requirements for the MTU of its Ethernet interface.

The first service is an SSH server that many embedded devices connect to, with SSH tunnels enabled so that they can be managed remotely. This service is prone to the well-known stall that occurs when big packets are exchanged inside the tunnel: the TCP MTU is too large, and the ICMP messages that should reduce it are dropped by some firewall between the device and the server. The usual way to solve this problem is to set the MTU of the Ethernet interface a bit lower than 1500; I use 1412. It works very well, but…

The second service is a TFTP server that provides a PXE image, including SYSLINUX and the Linux kernel, to boot an embedded device in testing mode by just disengaging the CF memory card (removing it is painful due to the mechanical design) and connecting an Ethernet cable to it. Sadly, the TFTP/PXE exchange requires an MTU of 1500 on the Ethernet interface to work, at least with the BIOS of this embedded device. The quick and dirty trick I used was to set the MTU of the server's Ethernet interface to 1500 just long enough to boot the tested device, then to revert to an MTU of 1412 immediately afterwards so that the SSH tunnels work without stalling.

The point to understand is that, by default, the interface MTU determines the TCP MTU (strictly speaking the maximum segment size, or MSS, that TCP advertises): the TCP MTU can’t be bigger than the interface MTU without fragmentation, but it can be smaller than the interface MTU without any problem. This is in fact what solves my problem for both services: I want an interface MTU of 1500 and a TCP MTU of 1412. How do I get that?

The Linux kernel’s netfilter can set the TCP MTU when a TCP stream is initialized, using an iptables command. This is commonly used on routers that do NAT between interfaces with different MTUs (glossing over the details). But in my case I only want to affect the SSH service, and the machine is not a router. I finally found that this command does the job:

iptables -t mangle -A OUTPUT -p tcp --sport 22 \
-m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1412
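
One detail worth checking is the exact meaning of the number given to --set-mss: it is the maximum TCP payload per segment, not the size of the packet on the wire. Assuming IPv4 and no IP or TCP options, the headers add 40 bytes. The sketch below only does that arithmetic; the two function names are made up for the illustration.

```shell
# Assumed header sizes: IPv4 (20 bytes) and TCP (20 bytes), both without options.
ip_header=20
tcp_header=20

# MSS that keeps every packet within a given size on the wire.
mss_for_packet_size() {
    echo $(( $1 - ip_header - tcp_header ))
}

# Largest packet produced by a given MSS.
packet_size_for_mss() {
    echo $(( $1 + ip_header + tcp_header ))
}

mss_for_packet_size 1412    # prints 1372
packet_size_for_mss 1412    # prints 1452
```

Under those assumptions, --set-mss 1412 allows packets of up to 1452 bytes, while 1372 would be the value that mirrors an interface MTU of 1412. If the stall comes back on some path, lowering the value is the first thing I would try.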

Remove the trailing 0xff of a file

Curiously, this question doesn’t seem to have any simple answer on the internet, even though it is a common situation: when you dump the full flash of a microcontroller, you only want to keep the useful part of the file. To do that I use this command:

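LC_ALL=C sed -i -e '$ s/\xff*$//' file.bin

The first $ restricts the following command to the last line of the file, to avoid removing 0xff bytes in the middle of the file. The s/\xff*$// part is a usual substitute command that replaces all the consecutive 0xff bytes before the end of the line with nothing. The single quotes keep the shell from interpreting the $ signs, and LC_ALL=C makes sed treat the file as raw bytes rather than as text in the current locale.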
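
To check the command before running it on a real dump, it can be tried on a small throwaway file. This is only a sketch, assuming GNU sed (for -i and \xff) and a POSIX shell; the sample bytes are invented for the test.

```shell
# Build a 6-byte sample: 3 data bytes followed by 3 padding bytes (0xff).
sample=$(mktemp)
printf '\001\002\003\377\377\377' > "$sample"

# Strip the trailing 0xff run from the last line of the file.
LC_ALL=C sed -i -e '$ s/\xff*$//' "$sample"

# Count the remaining bytes and dump them in hex.
stripped_size=$(wc -c < "$sample" | tr -d ' ')
stripped_hex=$(od -An -tx1 "$sample" | tr -d ' \n')
echo "$stripped_size bytes: $stripped_hex"

rm -f "$sample"
```

Only the three data bytes should remain. On a real dump, comparing the size of the file before and after is a quick sanity check.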

Set Linux hostname from the device tree

Some embedded boards are manufactured in quantity. In such a situation it’s common to require that each board be uniquely identified for production, sales and maintenance tracking. A serial number is a common solution. The question is where to store it. I usually recommend storing it in the /etc/hostname file, because this makes it available to all applications and it will be displayed in most log files. If the board has something like a network interface, chances are that you already need to assign a unique address to it using the device tree. So I wondered whether it is possible to set the hostname, as a serial number, from the device tree too. I was unable to find anything like this on the internet, so I started thinking about how to do it.

Adding a ‘hostname‘ property to the device tree is really trivial. Just add the following line to the root node of the <board.dts> file corresponding to the hardware:

  hostname = "board-123456";

Then run ‘make <board.dtb>’ from the Linux kernel source tree to get a new <board.dtb> file. Once you boot the Linux kernel with this new <board.dtb> file, you should get a ‘hostname‘ entry in the sysfs filesystem. This allows you to do this:

# cat /sys/firmware/devicetree/base/hostname

The next step is to set the Linux hostname at boot time to the value of the device tree ‘hostname‘ property. At first I thought this would need a kernel patch, since the device tree is maintained in the kernel and the hostname value is stored in the ‘nodename‘ field of the kernel’s ‘uts‘ namespace. Then I realized that the hostname command might be able to read the /sys/firmware/devicetree/base/hostname file and set the hostname correctly, even if the buffer returned by sysfs contains a zero byte at the end. I tried it and it worked fine:

# hostname -F /sys/firmware/devicetree/base/hostname
# hostname

Inspecting the uname call using ‘strace hostname‘ showed that the trailing zero byte was correctly ignored:

uname({sys="Linux", node="board-123456", ...}) = 0
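
If a script needs the serial number in a variable rather than as the hostname, the trailing zero byte is better dropped explicitly. Here is a minimal sketch: read_dt_string is a hypothetical helper, and a temporary file stands in for the sysfs entry so that it can be tried on any machine.

```shell
# read_dt_string is a hypothetical helper: print a device tree string
# property without its terminating zero byte.
read_dt_string() {
    tr -d '\000' < "$1"
}

# Stand-in for /sys/firmware/devicetree/base/hostname.
fake_property=$(mktemp)
printf 'board-123456\000' > "$fake_property"

serial=$(read_dt_string "$fake_property")
echo "serial: $serial"

rm -f "$fake_property"
```

On the board itself, the same function would be called with /sys/firmware/devicetree/base/hostname as its argument.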

Since the hostname command is already called early in the system initialization process to set the hostname to the content of the /etc/hostname file, the simplest solution is to replace the /etc/hostname file with a symbolic link to the /sys/firmware/devicetree/base/hostname file:

# ln -fs /sys/firmware/devicetree/base/hostname /etc/

To get this working, the sysfs filesystem needs to be mounted before the hostname command is called. Fortunately this is the case at least in Debian jessie using systemd, and I am almost certain that this will also work with SysVinit.
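
Where the symbolic link is not an option, for example if the same image must also boot on boards without the property, a more defensive variant is to choose the source file at boot. This is only a sketch that I have not tried in any init system; pick_hostname_file is a hypothetical helper.

```shell
# pick_hostname_file is a hypothetical helper: print the first candidate
# file that exists and is readable, or fail if there is none.
pick_hostname_file() {
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer the device tree property, fall back to the regular file.
if source_file=$(pick_hostname_file /sys/firmware/devicetree/base/hostname /etc/hostname); then
    echo "hostname would be read from $source_file"
    # To apply it (as root): hostname -F "$source_file"
else
    echo "no hostname source found"
fi
```

The order of the arguments gives the priority, so a board without the ‘hostname‘ entry simply keeps the usual /etc/hostname behaviour.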

So simple after all. I hope this hack will work on other distributions too.