My Ceph cluster is running now! And it is amazingly powerful 🙂
Quite some time has passed since my last attempts to get a Ceph cluster running on ARM and to compile it on 32-bit ARM. But moles are not known for forgetting unfinished tunneling projects. It’s time again to blast the solid rock with some pieces of dynamite 🙂 Ah, wrong project… we are under water, not below ground… (Don’t get confused: Octopus was a Ceph release, and v17.2 aka Quincy is the latest version as of this writing.)
I purchased three ODROID HC4 (the P-Kit) some weeks ago. Each has a 64-bit ARM CPU, 4 GB of RAM and two slots for hard drives. The drives I bought some years ago (4 TB WD Red Standard) were still not in continuous use, so I could take two of them for my cluster; they had only served to store some data temporarily. I have now bought four more WD Reds (this time the Pro) and put them into the two remaining HC4-P Kits. So I now have three HC4s containing six WD Red hard drives in total, which means 24 TB of raw capacity.
As you may have guessed already, installing Ceph is still not an easy task. You often cannot simply stick to the manual, and this is why I started writing again. In fact, besides many wrong turns that ended very quickly, I took two long paths without success. Here is the path that was successful, with only minor headaches to crush.
After trying different things (the original HC4 image, Debian bullseye stable and unstable), I moved on to Armbian for the ODROID HC4 (Jammy CLI version as of 2023-04-27). It is based on Ubuntu and provides recent updates (it was the only distribution I found that contains cephadm and Ceph Quincy aka v17.2). Flashing is easily done with Balena Etcher.
But to get it running, you need to get rid of Petitboot, which is preinstalled in the HC4’s SPI flash. The instructions on the Armbian page did not work for me. With the latest kernel, the MTD devices no longer show up in the booted system (reached by holding the bottom button of the HC4 while powering on), and Petitboot also did not show up when a monitor was connected. I don’t know exactly why, but I soon removed the screws from the case and connected the UART (115200 8N1). Fortunately, I had an original ODROID USB-serial converter (aka USB-UART 2 Module Kit) at hand.
After power-up, a minimal system presented itself on the console. So I issued the commands to erase the SPI flash:
$ flash_eraseall /dev/mtd0
$ flash_eraseall /dev/mtd1
$ flash_eraseall /dev/mtd2
$ flash_eraseall /dev/mtd3
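If you have to repeat this on several boards, a small loop saves some typing. This is a minimal sketch, assuming the recovery shell offers a POSIX-style shell with flash_eraseall as above; the function name erase_spi_flash and the DRY_RUN switch are my own hypothetical additions, not part of any tool.

```shell
# Hypothetical helper: erase the HC4 SPI flash partitions one after another.
# Without arguments it uses /dev/mtd0../dev/mtd3 as listed above.
# Set DRY_RUN=echo to print the commands instead of running them.
erase_spi_flash() {
    [ "$#" -gt 0 ] || set -- /dev/mtd0 /dev/mtd1 /dev/mtd2 /dev/mtd3
    for mtd in "$@"; do
        if [ ! -e "$mtd" ]; then
            # Missing devices are reported on stderr and skipped.
            echo "skipping $mtd: not present" >&2
            continue
        fi
        ${DRY_RUN:-} flash_eraseall "$mtd"
    done
}

# Example: DRY_RUN=echo erase_spi_flash
```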
This took a few minutes (especially for /dev/mtd3)… After re-powering the board, all went fine: the system from the SD card came up and made its SSH service available on the network. The default user should be root with the password 1234. You then need to go through a little setup wizard, and the system should greet you accordingly.
Now update the system, change the hostname and reboot:
$ apt update
$ apt upgrade -y
$ hostname <YOUR_HOSTNAME>
$ hostname > /etc/hostname
$ sed -i "s/odroidhc4/$(hostname)/" /etc/hosts
# Enlarge the Armbian RAM log (/var/log) to 256M
$ sed -i "s/^.*\(SIZE\)=.*$/\1=256M/" /etc/default/armbian-ramlog
$ reboot
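To see what the two sed edits actually change before touching the real files, you can run them on scratch copies. A minimal sketch, assuming GNU sed (as on Armbian); preview_host_edits is a hypothetical helper, and the sample file contents are made up and only mimic the relevant lines of /etc/hosts and /etc/default/armbian-ramlog.

```shell
# Hypothetical helper: apply the hostname and ramlog edits from above to
# temporary sample files and print the result. The real files are not touched.
preview_host_edits() {
    new_name="$1"
    dir=$(mktemp -d)
    # Made-up sample contents with the lines the sed commands target.
    printf '127.0.0.1 localhost\n127.0.1.1 odroidhc4\n' > "$dir/hosts"
    printf 'ENABLED=true\nSIZE=40M\n' > "$dir/armbian-ramlog"
    sed -i "s/odroidhc4/$new_name/" "$dir/hosts"
    sed -i "s/^.*\(SIZE\)=.*$/\1=256M/" "$dir/armbian-ramlog"
    cat "$dir/hosts" "$dir/armbian-ramlog"
    rm -rf "$dir"
}

# Example: preview_host_edits hc4-node1
```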
To get a nice status display on the OLED, you can easily install sys-oled-hc4 as a user with sudo permissions:
$ git clone https://github.com/rpardini/sys-oled-hc4
$ cd sys-oled-hc4
$ sudo ./install.sh
To bring a bit of color into life (and to quickly see whether you are root or somebody else):
$ sudo curl --silent \
    -o /root/.bashrc \
    https://raw.githubusercontent.com/the78mole/the78mole-snippets/main/configs/.bashrc_root
$ curl --silent \
    -o ~/.bashrc \
    https://raw.githubusercontent.com/the78mole/the78mole-snippets/main/configs/.bashrc_user
$ sudo cp ~/.profile /root/
Now install the real ceph stuff 🙂
$ sudo apt install podman catatonit lvm2 cephadm ceph-common
$ cd ~ && mkdir bin
$ curl --silent --location -o bin/cephadm \
    https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
$ chmod +x bin/cephadm
$ . .bashrc
$ echo $PATH # Check if ~/bin/ is in your PATH
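Instead of eyeballing the echo $PATH output, a small check can tell you directly whether ~/bin is included. A minimal sketch; path_contains is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current $PATH).
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# path_contains "$HOME/bin" || echo 'add  export PATH="$HOME/bin:$PATH"  to ~/.bashrc'
```

Wrapping both sides in colons makes sure only whole entries match, so /tmp/x/bi does not count as a hit for /tmp/x/bin.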
On every host you want to include in your cluster, you need to install the following packages (I did all of this with Ansible; maybe I’ll write a post about it in the future -> leave me a comment if you are interested):
$ apt install podman catatonit lvm2 gdisk
If you are on a Debian machine (not Armbian), you need to install cephadm as described in the official documentation. Here are the commands (don’t try them on Ubuntu or Armbian, it will not work 😋):
$ curl --silent --remote-name --location \
    https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
$ chmod +x cephadm
$ ./cephadm add-repo --release quincy
$ ./cephadm install
Installing Ceph using cephadm
Cephadm helps a lot to bootstrap a Ceph cluster by preparing the hosts (including the remote ones) from one single point in your network/cluster. What I learned painfully: if you have a mixed architecture (arm64/aarch64 and amd64/x86_64), you should not start deploying your cluster from an amd64 machine. Here is why:
Why a Ceph roll-out fails when starting on amd64
When starting from amd64, cephadm installs perfectly on your amd64 hosts and also installs some of the services (Docker/Podman containers) on your arm64 hosts, but a few services just fail. A first look at their quay.io repository revealed that it has images for both arm64 and amd64. But if you dig deeper, you can see that the arm64 build actions failed on GitHub. Applied to our case, this could mean that arm64 machines build their own images when the roll-out starts on arm64, but receive a wrong image hash when cephadm runs on amd64. If you go back in the history of the Ceph container builds, you can see that the build was working 2 years ago (the last Ceph version on Docker Hub is 16.2.5), just before they moved their repositories from Docker Hub to quay.io. I believe this is because Red Hat took over Ceph and quay.io is a Red Hat product. I read somewhere that quay cannot host images for different architectures in parallel, but I stopped digging here…
It seems that if you just start the deployment from an arm64 machine, it works like a charm :-).
To prepare your deployment, you are well advised to distribute the SSH public key to all hosts you want to include in your cluster. This is quite easy… We will use the root user and an SSH key without a passphrase, since it makes things way easier…
$ sudo -i
$ ssh-keygen -t ed25519 # No passphrase, just press Enter at every prompt
$ ssh-copy-id root@<CLUSTER_HOST1>
$ ssh-copy-id root@<CLUSTER_HOST2>
...
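With more than two or three hosts, a loop is handy. Besides root’s own key, the cephadm documentation copies the cluster’s SSH key /etc/ceph/ceph.pub (created during bootstrap) to each new host with ssh-copy-id -f -i, so this sketch takes the key file as a parameter. The function name copy_key_to_hosts and the DRY_RUN switch are hypothetical additions of mine.

```shell
# Hypothetical helper: copy one public key to root@ on every given host.
# $1: public key file, remaining arguments: host names.
# Set DRY_RUN=echo to print the commands instead of running them.
copy_key_to_hosts() {
    key="$1"
    shift
    for host in "$@"; do
        # -f skips the check for already installed keys, so no private key is needed.
        ${DRY_RUN:-} ssh-copy-id -f -i "$key" "root@$host"
    done
}

# Examples (as root):
# copy_key_to_hosts ~/.ssh/id_ed25519.pub <CLUSTER_HOST1> <CLUSTER_HOST2>
# copy_key_to_hosts /etc/ceph/ceph.pub <CLUSTER_HOST1> <CLUSTER_HOST2>   # after bootstrap
```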
Now we can kick-start our cluster. I used a little Raspberry Pi 3 (1 GB RAM) running Armbian. A Pi 4 would be better, but they are hard to get currently. You could also use an ODROID-C4, but mine is intended for a display application with the ODROID-VU7C display. And for just doing the manager stuff, the little RPi3 is enough. I’ll move the heavy tasks (mon, Prometheus, Grafana, …) to a VM with 8 GB RAM on my big arm64 server. So the Pi only has to do some easy tasks until it is replaced by a Pi 4 with 8 GB RAM, as soon as they are available again.
To sow the seed, issue the following command:
$ sudo cephadm bootstrap \
    --mon-ip $(ip route get 18.104.22.168 | grep -oP 'src \K\S+')
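The grep -oP part picks the local source address out of the ip route get output. If your grep lacks -P (PCRE support), the same extraction works with sed. A minimal sketch; route_src_ip is a hypothetical helper name.

```shell
# Hypothetical helper: read "ip route get" output on stdin and print the
# address that follows "src", i.e. the local IP used for that route.
route_src_ip() {
    sed -n 's/.* src \([^ ]*\).*/\1/p'
}

# Example:
# sudo cephadm bootstrap --mon-ip "$(ip route get <ANY_ROUTABLE_IP> | route_src_ip)"
```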
Now you are done with the initial step… You can log in using the URL and the credentials printed at the end of the bootstrap output. If localhost is shown as the URL, simply replace it with the IP or the hostname of your mgr daemon.
Then log in to the web UI.
Now you are asked to provide a new password and re-login.
Ceph should greet you with the following screen. Just ignore the
Now go to Hosts. There should be only a single host: the mgr (with mon) you just bootstrapped.
Now head over to your other hosts with SSH and prepare the HDDs for use as OSD storage. You will wipe their partition tables with the following command, once per disk:
$ sgdisk --zap-all /dev/sda
$ sgdisk --zap-all /dev/sdb
<...>
This prepares the disks so that Ceph can pick them up as OSDs.
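sgdisk works on one device per call, so with several disks per host a loop is convenient. This is a minimal and destructive sketch; zap_disks and DRY_RUN are hypothetical names, and you should double-check the device names (for example with lsblk) first. Alternatively, cephadm can wipe a disk on a managed host from the admin node with ceph orch device zap <HOST> <PATH> --force.

```shell
# Hypothetical helper: destroy the GPT and MBR data structures of every given disk.
# DESTRUCTIVE! Set DRY_RUN=echo to print the commands instead of running them.
zap_disks() {
    for disk in "$@"; do
        ${DRY_RUN:-} sgdisk --zap-all "$disk"
    done
}

# Example: DRY_RUN=echo zap_disks /dev/sda /dev/sdb
```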
Now head over to your Ceph web UI again and select Add... in the Hosts section.
TODO some Screenshots
I used the model as a filter, so every HDD with the same model string on every HC4 simply gets added when Ceph bootstraps the host. Since we do not have SSDs or NVMes, it makes no sense to define WAL or DB devices… They are not available here anyway 🙂
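The same model filter can also be written down as an OSD service specification and applied from the CLI, which makes it reproducible. A minimal sketch, assuming the Quincy spec layout (placement plus spec.data_devices); write_osd_spec and the service_id hdd_by_model are hypothetical, and the model string must be the one your drives actually report.

```shell
# Hypothetical helper: write an OSD service spec that turns every drive
# matching the given model string into an OSD on all hosts.
# $1: output file, $2: drive model string (placeholder, use your own).
write_osd_spec() {
    cat > "$1" <<EOF
service_type: osd
service_id: hdd_by_model
placement:
  host_pattern: '*'
spec:
  data_devices:
    model: '$2'
EOF
}

# Example (inside "cephadm shell"); --dry-run previews which disks match:
# write_osd_spec /tmp/osd-spec.yaml '<YOUR_HDD_MODEL>'
# ceph orch apply -i /tmp/osd-spec.yaml --dry-run
```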
We can then add services to this host.
Now distributing some daemons over the cluster.
- Using labels to define the hosts to run services on
- Making Grafana and Prometheus work (SSL issues with self-signed certificate)
- Creating and mounting a CephFS
- Using the object store (Swift and S3)
- Some more details and hints on the HW infrastructure
- Adjusting the CRUSH map
- Creating own certificates (create a new tutorial post)
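For the first item, the orchestrator’s host labels can drive placement: label the hosts, then place a service by label. A minimal sketch; label_and_place is a hypothetical helper, the host and label names are placeholders, and DRY_RUN=echo only prints the ceph commands.

```shell
# Hypothetical helper: add label $1 to the given hosts, then place
# service type $2 on all hosts carrying that label.
# Run inside "cephadm shell"; set DRY_RUN=echo to only print the commands.
label_and_place() {
    label="$1"
    service="$2"
    shift 2
    for host in "$@"; do
        ${DRY_RUN:-} ceph orch host label add "$host" "$label"
    done
    ${DRY_RUN:-} ceph orch apply "$service" --placement="label:$label"
}

# Example: label_and_place mon mon <CLUSTER_HOST1> <CLUSTER_HOST2> <CLUSTER_HOST3>
```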
Creating a CephFS
Creating a CephFS with Replication
The easiest and most secure way to create a CephFS is to use replication. By default, Ceph stores data with a replication size of 3, meaning it keeps the original plus 2 copies, placed in different failure domains (usually hosts). In my setup with 3 hosts (each with 2 OSDs), this is the maximum.
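As a rough rule, a replicated pool can hold its raw capacity divided by the replication size. A minimal sketch of that arithmetic, ignoring Ceph’s full ratios and other overhead; usable_tb_replicated is a hypothetical helper. For my 24 TB raw with size 3, this gives 8 TB.

```shell
# Hypothetical helper: rough usable capacity of a replicated pool.
# $1: raw capacity in TB, $2: replication size (number of copies).
# Ignores full ratios, metadata and other overhead.
usable_tb_replicated() {
    awk -v raw="$1" -v size="$2" 'BEGIN { printf "%.1f\n", raw / size }'
}

# Example: usable_tb_replicated 24 3   # prints 8.0
```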
The easiest solution is to simply create the CephFS and let it implicitly create your pools and strategies.
$ cephadm shell
$ ceph fs volume create <NAME_OF_FS>
That’s all there is to creating it 🙂
To mount it, you need to create a keyring:
# erbw12-bigdata-fs is my data pool; put in yours (ceph fs ls lists it)
$ ceph auth get-or-create client.<CLIENT_NAME> \
    mon 'allow r' \
    mds 'allow r, allow rw path=/' \
    osd 'allow rw pool=erbw12-bigdata-fs' \
    -o /root/ceph.client.<CLIENT_NAME>.keyring
$ ceph fs authorize <NAME_OF_FS> client.<CLIENT_NAME> / rw
Now cat the keyring and copy its content to /etc/ceph/ceph.client.<CLIENT_NAME>.keyring on the host where you want to mount your CephFS. Then go to this other host, install the ceph-fuse package and execute the following:
$ sudo -i
$ mkdir -p /etc/ceph
$ cd /etc/ceph/
$ ssh-keygen -t ed25519
$ ssh-copy-id root@<CEPH_MON_HOST>
$ scp root@<CEPH_MON_HOST>:/etc/ceph/ceph.conf .
$ echo "<YOUR_COPIED_KEYRING>" > ceph.client.<CLIENT_NAME>.keyring
# Another way is to get the key through SSH from the client host,
# if your ceph command is accessible outside of the shell container
$ ssh root@<CEPH_MON_HOST> ceph auth get client.<CLIENT_NAME> \
    > ceph.client.<CLIENT_NAME>.keyring
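Since copy and paste can easily mangle the keyring, a quick sanity check before mounting can help. A minimal sketch, assuming the usual layout written by ceph auth get (a [client.NAME] section followed by a key = ... line); check_keyring is a hypothetical helper.

```shell
# Hypothetical helper: check that keyring file $1 has a [client.$2] section
# and a key line, then restrict its permissions to the owner.
check_keyring() {
    if ! grep -qxF "[client.$2]" "$1"; then
        echo "missing [client.$2] section in $1" >&2
        return 1
    fi
    if ! grep -q '^[[:space:]]*key = ' "$1"; then
        echo "no key line in $1" >&2
        return 1
    fi
    # The key is a secret: only the owner (root) should read it.
    chmod 600 "$1"
}

# Example: check_keyring /etc/ceph/ceph.client.<CLIENT_NAME>.keyring <CLIENT_NAME>
```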
Now you can mount your CephFS:
$ mkdir <YOUR_MOUNT_POINT>
$ ceph-fuse -n client.<CLIENT_NAME> \
    -m <CEPH_MON_HOST> <YOUR_MOUNT_POINT>
You can then check whether everything worked (df -h or mount) and put your data in.
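Besides df -h and mount, the check can be scripted: ceph-fuse mounts appear in /proc/mounts with the file system type fuse.ceph-fuse. A minimal sketch; is_cephfuse_mount is a hypothetical helper.

```shell
# Hypothetical helper: succeed if mount point $1 is mounted via ceph-fuse.
# $2: mounts table to read (defaults to /proc/mounts; fields: device,
# mount point, file system type, options, ...).
is_cephfuse_mount() {
    awk -v mp="$1" '$2 == mp && $3 == "fuse.ceph-fuse" { found = 1 }
        END { exit !found }' "${2:-/proc/mounts}"
}

# Example: is_cephfuse_mount <YOUR_MOUNT_POINT> && echo "CephFS is mounted"
```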
Creating a CephFS with Erasure Coding
To use an erasure-coded pool as the data pool, you need to create the file system a bit more manually. First create your pools using the web UI.
If you want to use EC for CephFS, checking EC Overwrites is mandatory. Otherwise, Ceph will not accept the pool for CephFS. When choosing the EC profile, you need to keep your cluster in mind. The following example will not work on a 3-node cluster (mine has only 3 failure domains = 3 hosts).
You also need to create a replication pool for the CephFS metadata.
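With the failure domain set to host, an erasure-code profile needs k+m hosts (one chunk per host), and only k/(k+m) of the raw space is usable. A minimal sketch of that check; ec_profile_check is a hypothetical helper, and the commented CLI commands are an alternative to the web UI steps, with placeholder names.

```shell
# Hypothetical helper: check whether an EC profile k+m fits onto the given
# number of failure domains (hosts) and print the usable fraction k/(k+m).
# $1: k (data chunks), $2: m (coding chunks), $3: number of hosts.
ec_profile_check() {
    awk -v k="$1" -v m="$2" -v fd="$3" 'BEGIN {
        if (k + m > fd) {
            printf "k=%d m=%d needs %d hosts, only %d available\n", k, m, k + m, fd
            exit 1
        }
        printf "ok: usable fraction %.2f\n", k / (k + m)
    }'
}

# Example: ec_profile_check 2 1 3   # fits on 3 hosts
#
# CLI counterpart of the web UI steps (inside "cephadm shell"):
# ceph osd erasure-code-profile set <PROFILE> k=2 m=1 crush-failure-domain=host
# ceph osd pool create <DATA_POOL_NAME> erasure <PROFILE>
# ceph osd pool set <DATA_POOL_NAME> allow_ec_overwrites true
```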
Enter the cephadm shell and issue the following command:
$ ceph fs new <FS_NAME> <META_POOL_NAME> <DATA_POOL_NAME>
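Note that Ceph discourages an erasure-coded pool as the default data pool of a file system and asks for --force in that case; the CephFS documentation instead recommends a replicated default data pool, with the EC pool added via ceph fs add_data_pool. For authorization, refer to the replication section above.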
Ceph Node Diskspace Warning
This warning can mostly be ignored, and it is not documented anywhere in the health check documentation. The warning arises because Armbian uses a RAM log (/var/log) that gets rsynced to the SD card (/var/log.hdd) every day. It is also rotated, compressed and purged on the card daily. This warning will usually resolve itself, especially with the 256M ramlog setting (40M was the Armbian default), and should not pop up too often, or only right after setting up the cluster, while a huge amount of logging is created.
If the problem persists, you can dive into the details using the health check operations documentation.
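To check whether the RAM log is the culprit, look at how full /var/log is. A minimal sketch using POSIX df output; log_usage_pct is a hypothetical helper, and which mount point you check is up to you.

```shell
# Hypothetical helper: print the usage percentage (without the % sign)
# of the file system holding $1 (defaults to /var/log).
log_usage_pct() {
    # df -P prints one line per file system; field 5 is "Capacity".
    df -P "${1:-/var/log}" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Example: [ "$(log_usage_pct)" -lt 80 ] || echo "/var/log is getting full"
```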
After one day of runtime, the Ceph GUI reported a crash of the manager daemon. To inspect it, you need the ceph command, which is included in ceph-common; we installed it earlier without needing it at that time. But for administrative purposes, it is quite handy 🙂
To inspect the crash, we will first list all crashes (not only new ones):
$ ceph crash ls
## Alternative to show only new crashes
$ ceph crash ls-new
We will now get a detailed crash report with ceph crash info <ID>, using an ID from the list.
In my case, I’m not sure whether this is just a side effect of the HEALTH_WARN state of the cluster, which was not able to pull device metrics. We will see if it persists 🙂
To get rid of the warning, just issue an archive command:
$ ceph crash archive <ID>
# Or archive all listed crashes (they no longer show up in ls-new)
$ ceph crash archive-all
To delete older crashes (and also remove them from ceph crash ls), issue the following command:
$ ceph crash prune <OLDER_THAN_DAYS>
$ ceph crash prune 3 # Will remove crashes older than 3 days
The OLED does not yet work on bullseye unstable
Now update your repository cache and do an upgrade of the system. You should also change your timezone settings for the OLED of the HC4 to show the correct local time.
$ apt update
$ dpkg-reconfigure tzdata
$ apt upgrade
#### if using the ODROID HC4
$ apt install odroid-homecloud-display wget curl gpg jq
$ reboot
I struggled a lot to install Ceph on the ARM64 ODROID HC4… Here are my misguided attempts:
$ mkdir -m 0755 -p /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/debian/gpg | \
    sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ echo \
    "deb [arch=$(dpkg --print-architecture) \
    signed-by=/etc/apt/keyrings/docker.gpg] \
    https://download.docker.com/linux/debian \
    $(lsb_release -cs) stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
$ apt update
$ apt-get install docker-ce docker-ce-cli containerd.io \
    docker-buildx-plugin docker-compose-plugin
$ sudo docker run hello-world
We need to build Ceph ourselves, because the available arm64 packages lack many of the needed components. Alternatively, you can run the management node on some x86 machine and only use arm64 for the OSDs.
$ git clone https://github.com/ceph/ceph.git
# or
$ git clone https://github.com/the78mole/ceph.git
$ cd ceph
$ git checkout quincy-release
$ git submodule update --init --recursive
$ ./install-deps.sh
# To prepare a release build... Takes some minutes
$ ./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo
$ cd build
# Next step will take many hours (maybe some days)
$ ninja -j1
To be able to distribute the packages (we will need more than a single host for Ceph to make any sense), we will set up a Debian package repository. I will make mine public so you can skip compiling the packages yourself. I used a German tutorial on creating your own repository, a tutorial on hosting a package repository using GitHub Pages, and a PPA repo hosted on GitHub.
$ mkdir ~/bin && cd ~/bin
$ curl --silent --remote-name --location \
    https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
#### or
# wget https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
$ chmod +x cephadm
$ cp cephadm /usr/sbin
#### On arm64, the cephadm package is not available, even if we have this
#### python script already at hand. Therefore, we put it in /usr/sbin and
#### fake the package to be installed with equivs. Don't do this on
#### non-ARM systems
#### vvvv Dirty Hack Start vvvv
$ apt install equivs
$ mkdir -p ~/equivs/build && cd ~/equivs
$ curl --silent --remote-name --location \
    https://raw.githubusercontent.com/the78mole/the78mole-snippets/main/ceph/cephadm-17.2.5-1.equiv
$ cd build
$ equivs-build ../cephadm-17.2.5-1.equiv
$ dpkg -i cephadm_17.2.5-1~bpo11+1_arm64.deb
#### ^^^^^ Dirty Hack End ^^^^^
#### If someone feels responsible to fix the real cephadm package for
#### all arches (it is a python tool !!!), please do it :-)
$ cephadm add-repo --release quincy
$ apt update
$ cephadm install
$ which cephadm # should give /usr/sbin/cephadm
# Tweak needed for cephadm: enable root login over SSH
$ sed -i \
    's/#PermitRootLogin.*$/PermitRootLogin yes/' \
    /etc/ssh/sshd_config
$ service ssh restart