Set up GlusterFS on Linux

Create a WireGuard VPN

Set up a WireGuard VPN, so that the `glusterd` services on the different servers can connect to each other privately and securely.

/etc/wireguard/fs0.conf:
# This WireGuard device is for creating a VPN with other servers in the same FileSystem cluster (running GlusterFS).
#
# Update this configuration seamlessly using: wg syncconf fs0 <(wg-quick strip fs0)
# WireGuard start/stop/status usage: wg-quick up fs0 / wg-quick down fs0 / wg show fs0
# wg-quick executes PreUp/PostUp/PreDown/PostDown, and also interprets Address with mask to add the device, set the virtual IP-address, and routing.
# Endpoints cannot contain domain names, they must be IP-addresses.
#
[Interface]
# s1.example.org
PrivateKey = <s1 Private Key>
Address = 10.0.0.1/32
ListenPort = 5102
# Ensure UDP port 5102 is open:
PostUp = ! iptables-save | grep -qFx -- "-A INPUT -p udp --dport 5102 -j ACCEPT" && iptables -A INPUT -p udp --dport 5102 -j ACCEPT || true
# Set default forwarding policy to DROP:
PostUp = iptables -P FORWARD DROP
# Create a new FS0_FW chain that contains forwarding rules related to the fs0 interface:
PostUp = iptables -N FS0_FW && iptables -A FORWARD -i fs0 -j FS0_FW || iptables -F FS0_FW
# Set up forwarding rules: only traffic from 10.0.0.0/24 to 10.0.0.0/24 is allowed, and only within the fs0 device:
PostUp = iptables -A FS0_FW -m state --state INVALID -j DROP
PostUp = iptables -A FS0_FW -m state --state RELATED,ESTABLISHED -j ACCEPT
PostUp = iptables -A FS0_FW -s 10.0.0.0/24 -d 10.0.0.0/24 -o fs0 -j ACCEPT
PostUp = iptables -A FS0_FW -j DROP
# Enable IPv4 forwarding:
PostUp = sysctl -w net.ipv4.ip_forward=1
# Clean up rules:
PostDown = iptables -D INPUT -p udp --dport 5102 -j ACCEPT || true
PostDown = iptables -D FORWARD -i fs0 -j FS0_FW || true
PostDown = iptables -F FS0_FW || true
PostDown = iptables -X FS0_FW || true

[Peer]
# s1.example.org
PublicKey = <s1 Public Key>
Endpoint = <s1 public IP-address>:5102
AllowedIPs = 10.0.0.1/32

[Peer]
# s2.example.org
PublicKey = <s2 Public Key>
Endpoint = <s2 public IP-address>:5102
AllowedIPs = 10.0.0.2/32

[Peer]
# s3.example.org
PublicKey = <s3 Public Key>
Endpoint = <s3 public IP-address>:5102
AllowedIPs = 10.0.0.3/32

Generate a new private key on every server using wg genkey. Then derive the public key by pasting the private key into the stdin of wg pubkey (use CTRL+D to close stdin). Finally, run wg-quick up fs0 to enable the WireGuard VPN.
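
For example, on each server (a minimal sketch; storing the key in a file under /etc/wireguard is just an assumption, not required by the setup above):

umask 077                           # keep the key file readable by root only
wg genkey > /etc/wireguard/fs0.key  # generate this server's private key
wg pubkey < /etc/wireguard/fs0.key  # print the public key to share with the peers
wg-quick up fs0                     # enable the WireGuard VPN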

Ping one of the AllowedIPs to test the connection, and review the configuration status with wg show fs0. Note that WireGuard uses UDP, so there is no "active" connection; a handshake only shows up when the tunnel is actually being used.
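
For example, from s1 (a quick check, with the addresses from the example configuration above):

ping -c 3 10.0.0.2   # test the tunnel towards s2
wg show fs0          # shows the latest handshake and transfer counters per peer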

If a ping to a virtual address fails, check the configuration on both ends: if the configuration is incorrect on one end, communication fails in both directions.

Set up the GlusterFS server pool

Let's define some easy-to-use host names, and mark on each server which entry is the local host. The name 'virtual' has no special meaning; it is just an example to indicate that these are not actual public DNS records.

/etc/hosts:
10.0.0.1 virtual.s1.example.org virtual.localhost.example.org
10.0.0.2 virtual.s2.example.org
10.0.0.3 virtual.s3.example.org
(...)

Now make sure you can ping these virtual hostnames. The /etc/hosts file is used as long as files is listed on the hosts: line in /etc/nsswitch.conf.
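
A quick way to check (getent resolves names through the same NSS configuration that applications use):

getent hosts virtual.s2.example.org   # should print 10.0.0.2
ping -c 1 virtual.s2.example.org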

Set up the GlusterFS pool of peers. A probe automatically ensures that the peers connect to each other in both directions, so the commands below only need to be executed on one of the servers. It does not hurt to execute them more than once, or to probe the local host itself (this is detected automatically).

gluster peer probe virtual.s1.example.org
gluster peer probe virtual.s2.example.org
gluster peer probe virtual.s3.example.org
(...)
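
To verify that all peers joined the pool (a quick check; the output varies per server):

gluster peer status   # connection state of the other peers
gluster pool list     # all peers in the pool, including the localhost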

Create the virtual GlusterFS volume

Now let's create the distributed replicated volume. If you only want a volume that spans multiple disks/servers without any replication, just leave out the replica argument and its replication count. It is advised to use a replication count of at least 3, to avoid split-brain situations (for robust partition tolerance). The force argument at the end is needed here because the brick directories (/srv/example/...) are located on the root partition, which GlusterFS only accepts when forced. The volume must be started before the daemons can be enabled and before it can be mounted. Furthermore, the self-healing and bitrot detection daemons are enabled for automatic redundancy and improved consistency.

gluster volume create example replica 3 \
    virtual.s1.example.org:/srv/example/s1 \
    virtual.s2.example.org:/srv/example/s2 \
    virtual.s3.example.org:/srv/example/s3 \
    force
gluster volume start example
gluster volume heal example enable
gluster volume bitrot example enable
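
To check the result:

gluster volume info example     # volume type, brick list and options
gluster volume status example   # running brick and daemon processes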

replica 3 enables replication for every group of three servers in the complete list of servers. Say the replication count is set to 3 (the recommended minimum) and 12 servers are listed: the volume is then distributed over 4 groups of 3 servers (the order of listing matters). So if every server provides 1TB of storage, the usable capacity of the volume is 4TB, although the actually consumed storage is 12TB, because every 1TB is replicated 3 times. If the number of servers is not a multiple of the replication count, you can add arbiters or thin-arbiters, which help to decide on the majority in case of a partition, to avoid split-brain situations. A hypothetical six-server layout is sketched below.
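
For illustration (the extra server names s4-s6 are assumptions, not part of the setup above): six bricks with replica 3 form two distribution groups, (s1,s2,s3) and (s4,s5,s6), so the usable capacity is twice the size of a single brick.

gluster volume create example replica 3 \
    virtual.s1.example.org:/srv/example/s1 \
    virtual.s2.example.org:/srv/example/s2 \
    virtual.s3.example.org:/srv/example/s3 \
    virtual.s4.example.org:/srv/example/s4 \
    virtual.s5.example.org:/srv/example/s5 \
    virtual.s6.example.org:/srv/example/s6 \
    force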

Now set up the bind-address in the configuration file. This is an important step: otherwise the GlusterFS services are publicly reachable, because by default they listen on all interfaces instead of only the virtual VPN address. Verify which services are listening on which port and bind address using netstat -ntlepa | grep gluster.

/etc/glusterfs/glusterd.vol:
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option transport.socket.listen-port 24007
    option transport.socket.bind-address virtual.localhost.example.org
    option ping-timeout 0
    option event-threads 1
#   option lock-timer 180
    # Uncomment the following line, if the bind-address resolves to an IPv6 address:
    # option transport.address-family inet6
#   option base-port 49152
    option max-port 60999
end-volume

When glusterd.service is restarted, some glusterfsd processes may linger. These must be killed before glusterd is started again, otherwise the mount will fail. To ensure that the forked processes are also killed upon (re)starting glusterd, see this GitHub issue and the override below.

/etc/systemd/system/glusterd.service.d/override.conf:
[Service]
KillMode=control-group

Run systemctl daemon-reload to apply this file before restarting the glusterd service.
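
A minimal sketch of that sequence:

systemctl daemon-reload             # pick up the override.conf drop-in
systemctl restart glusterd.service  # lingering glusterfsd processes in the control group are now killed as well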

Mount the virtual GlusterFS volume

To access the distributed virtual volume manually on each server:

mkdir /mnt/example
mount -t glusterfs virtual.localhost.example.org:example /mnt/example
# To use virtual IPv6-addresses add: -o xlator-option=transport.address-family=inet6
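
To confirm the volume is mounted:

df -hT /mnt/example         # the filesystem type should show fuse.glusterfs
mount | grep /mnt/example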

Or mount it automatically at boot, using a systemd mount unit:

/etc/systemd/system/mnt-example.mount:
[Unit]
Description = Mount the virtual volume by glusterd on /mnt/example
Requires = glusterd.service network-online.target
Wants = network-online.target
Conflicts = rescue.target rescue.service shutdown.target
After = glusterd.service

[Mount]
Type = glusterfs
What = virtual.localhost.example.org:example
Where = /mnt/example
# Note: don't use quotes in the Options=
# To use virtual IPv6-addresses add: xlator-option=transport.address-family=inet6
#Options = rw,default_permissions,defaults,_netdev,allow_other,loglevel=WARNING,max_read=131072,backup-volfile-servers=virtual.s1.example.org:virtual.s2.example.org:virtual.s3.example.org,...

[Install]
WantedBy = multi-user.target
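
A minimal sketch of enabling the unit (the unit name mnt-example.mount is dictated by the /mnt/example mount point):

systemctl daemon-reload
systemctl enable --now mnt-example.mount   # mount now, and automatically on every boot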

About using IPv6-addresses with GlusterFS

In my experience, while using virtual IPv6-addresses in the WireGuard VPN, most things just work as long as transport.address-family=inet6 is set, both in the server volume configuration and as a mount option. However, daemons like self-heal and bitrot don't seem to start, due to an error while resolving the address. Looking at the code repository, there are a lot of issues relating to IPv6 functionality, which was only added later on through multiple separate patches and bug-fixes (in 2019-2021). My recommendation is to avoid IPv6, since it is not necessarily a stable feature yet (as of 2023).

RPC port 111

GlusterFS makes use of an RPC portmapper listening on port 111. If this port is publicly accessible, it may be utilized by malicious actors to amplify a DDoS attack. Therefore, rpcbind should only listen on local interfaces and/or the virtual private network addresses that belong to the host. Create the following file on the server with virtual IP-address 10.0.0.x:

/etc/systemd/system/rpcbind.socket.d/override.conf:
[Socket]
# By default rpcbind listens on all interfaces, which is a security risk as amplification for DDoS attacks
# Changes to ListenStream= or ListenDatagram= require a system reboot
# The empty assignments clear the listeners inherited from the distribution's rpcbind.socket,
# then the local unix socket and the restricted addresses are added back:
ListenStream=
ListenDatagram=
ListenStream=/run/rpcbind.sock
ListenStream=127.0.0.1:111
ListenDatagram=127.0.0.1:111
ListenStream=10.0.0.x:111
ListenDatagram=10.0.0.x:111

So on server 3, substitute 10.0.0.x with 10.0.0.3.

Since PID 1 (systemd) listens on the port, the system must be rebooted in order to apply these changes.
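
After the reboot, a quick way to confirm that rpcbind only listens on the intended addresses (ss is part of iproute2):

ss -tulpn | grep ':111'   # should only show 127.0.0.1:111 and the virtual 10.0.0.x:111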

Troubleshooting GlusterFS

GlusterFS is fairly low-level software, written in C. Some error messages can safely be ignored because they relate to unused functionality, and they might be misleading. Furthermore, errors that occur usually refer to the log files. GlusterFS stores its logs as follows.

/var/log/glusterfs/glusterd.log:
Main service daemon log.
/var/log/glusterfs/glustershd.log:
Self-heal daemon log.
/var/log/glusterfs/mnt-example.log:
FUSE-mount log. One dynamic log-file per mount location.
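
When debugging an issue, following the relevant logs often points at the cause, for example:

tail -f /var/log/glusterfs/glusterd.log /var/log/glusterfs/glustershd.log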