Whether you want to provide Ceph Object Storage and/or Ceph Block Device services to Cloud Platforms, deploy a Ceph Filesystem or use Ceph for another purpose, all Ceph Storage Cluster deployments begin with setting up each Ceph Node, your network, and the Ceph Storage Cluster. A Ceph Storage Cluster requires at least one Ceph Monitor, Ceph Manager, and Ceph OSD (Object Storage Daemon). The Ceph Metadata Server is also required when running Ceph Filesystem clients.
Monitors: A Ceph Monitor (ceph-mon) maintains maps of the cluster state, including the monitor map, manager map, the OSD map, and the CRUSH map. These maps are critical cluster state required for Ceph daemons to coordinate with each other. Monitors are also responsible for managing authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.
Managers: A Ceph Manager daemon (ceph-mgr) is responsible for keeping track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The Ceph Manager daemons also host python-based plugins to manage and expose Ceph cluster information, including a web-based dashboard and REST API. At least two managers are normally required for high availability.
Ceph OSDs: A Ceph OSD (object storage daemon, ceph-osd) stores data, handles data replication, recovery, rebalancing, and provides some monitoring information to Ceph Monitors and Managers by checking other Ceph OSD Daemons for a heartbeat. At least 3 Ceph OSDs are normally required for redundancy and high availability.
MDSs: A Ceph Metadata Server (MDS, ceph-mds) stores metadata on behalf of the Ceph Filesystem (i.e., Ceph Block Devices and Ceph Object Storage do not use MDS). Ceph Metadata Servers allow POSIX file system users to execute basic commands (like ls, find, etc.) without placing an enormous burden on the Ceph Storage Cluster.
Ceph stores data as objects within logical storage pools. Using the CRUSH algorithm, Ceph calculates which placement group should contain the object, and further calculates which Ceph OSD Daemon should store the placement group. The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically.
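For illustration, on a running cluster the object-to-PG-to-OSD mapping computed by CRUSH can be inspected with the ceph CLI (the pool datapool is created later in this guide; the object name here is hypothetical and does not need to exist):

ceph osd map datapool some-object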
The principal requirements for deploying Ceph in a CloudCIX region are outlined below.
From the Ceph installation guide, the following system requirements must be met before deployment (a quick verification sketch follows the list):
Python 3
Podman or Docker for running containers
Time synchronization (such as chrony or NTP)
LVM2 for provisioning storage devices
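A quick way to verify these prerequisites on an Ubuntu host (a sketch; the exact package and service names may differ in your environment):

python3 --version
docker --version     # or: podman --version
chronyc tracking     # confirms time synchronization is active
lvm version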
To allow easy configuration of large numbers of Ceph hosts, Ansible is used.
Once the hosts are configured, Ceph is installed using cephadm, which is itself installed on the first Mon node.
The Public Network of Ceph is the Management Network of CloudCIX. In the rest of this documentation, the name Public Network will be used.
The Cluster Network of Ceph is the Private Network of CloudCIX. In the rest of this documentation, the name Cluster Network will be used.
Both networks should be IPv6.
CloudCIX Ceph is built using Ubuntu 20.04 as the underlying operating system. The Ceph user documentation lists the minimum and recommended hardware requirements for Monitors and Nodes at https://docs.ceph.com/en/latest/start/hardware-recommendations/.
| Network Name | IP Address Assignment | VLAN ID |
|---|---|---|
| Internal Cluster | fc00::<rack>.<unit>/64 (this address range is replicated in each Region's Ceph Cluster) | 2 (ports are access) |
| Public (Connected to CloudCIX Management Network) | <prefix>::60:<rack>:<unit>/64 | 3 |
<prefix> represents the first 48 bits of the IPv6 subnet assigned to the Pod.
<rack> is a decimal (not hexadecimal) representation of the rack id in the Pod.
<unit> represents the U location of the host in the rack (the bottom U for a host occupying multiple U).
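For example, with a hypothetical Pod prefix of 2001:db8:100 (illustrative only), the OSD host in rack 1 at U 2 (ceph001002 in the inventory below) would have the Public address 2001:db8:100::60:1:2.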
DNS records for Ceph Monitors and Hosts are named as outlined here.
We will deploy the cluster in the following configuration:
3 Monitors, each with one disk (/dev/sda) for the system.
6 OSD nodes, each with three disks: /dev/sda for the system, and /dev/sdb and /dev/sdc for storage.
Each host has 2 NICs, one for the Public network and one for the Cluster network.
SSH to the first node (cephmon001001) and become root.
Prepare the hosts file for Ansible:
[ceph_mon]
cephmon001001 ansible_host=<prefix>::60:1:1
cephmon002001 ansible_host=<prefix>::60:2:1
cephmon003001 ansible_host=<prefix>::60:3:1
[ceph_osd]
ceph001002 ansible_host=<prefix>::60:1:2
ceph001003 ansible_host=<prefix>::60:1:3
ceph001004 ansible_host=<prefix>::60:1:4
ceph002002 ansible_host=<prefix>::60:2:2
ceph002003 ansible_host=<prefix>::60:2:3
ceph002004 ansible_host=<prefix>::60:2:4
Save this playbook to the file prepare-ceph-nodes.yml:
---
- name: Prepare ceph nodes
  hosts: all
  become: yes
  become_method: sudo
  #vars:
  #  ceph_admin_user: cephadmin
  tasks:
    - name: Set timezone
      timezone:
        name: UTC
      tags: timezone
    - name: Set hostname
      hostname:
        name: "{{ inventory_hostname_short }}"
      tags: update_host
    - name: Update system
      apt:
        name: "*"
        state: latest
        update_cache: yes
    - name: Install common packages
      apt:
        name: [vim, git, bash-completion, wget, curl, chrony]
        state: present
        update_cache: yes
    - name: Install Docker
      shell: |
        curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
        echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable" > /etc/apt/sources.list.d/docker-ce.list
        apt update
        apt install -qq -y docker-ce docker-ce-cli containerd.io
    - name: Reboot server after update and configs
      reboot:
Run the playbook:
ansible-playbook -i hosts prepare-ceph-nodes.yml --user administrator --ask-pass --ask-become-pass
Install cephadm. The cephadm command can:
bootstrap a new cluster
launch a containerized shell with a working Ceph CLI
aid in debugging containerized Ceph daemons
apt install cephadm
Prepare a minimal network configuration in ceph.conf in your home directory:
[global]
cluster network = fc00::/64
public network = <prefix>::60/64
The first step in creating a new Ceph cluster is running the cephadm bootstrap command on the Ceph cluster's first host. Running this command creates the cluster's first monitor daemon, and that monitor daemon needs an IP address, so you must pass the IP address of the first host to the cephadm bootstrap command.
cephadm bootstrap --mon-ip <host1-ip> --allow-fqdn-hostname
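To have the network settings from the ceph.conf prepared above applied during bootstrap, the file can be passed with the --config option (the path shown is an assumption about where you saved it):

cephadm bootstrap --mon-ip <host1-ip> --allow-fqdn-hostname --config ~/ceph.conf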
Verify that the cluster is up:

ceph -s
Cephadm does not require any Ceph packages to be installed on the host. However, it is recommended to enable easy access to the ceph command.
You can install the ceph-common package, which contains all of the ceph commands, including ceph, rbd, mount.ceph (for mounting CephFS file systems), etc.:
cephadm install ceph-common
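Alternatively, if you prefer not to install any Ceph packages on the host, the same CLI is available through the containerized shell that cephadm provides, for example:

cephadm shell -- ceph -s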
To add each new host to the cluster, perform two steps:
Install the cluster’s public SSH key in the new host’s cephadm user’s authorized_keys file:
Use the Ansible playbook create_cephadmin.yml:
---
- name: Prepare ceph nodes
  hosts: all
  become: yes
  become_method: sudo
  tasks:
    - name: Create user cephadmin
      user:
        name: cephadmin
        password: long_ceph_admin_password
        generate_ssh_key: no
        state: present
    - name: sudo without password for cephadmin user
      copy:
        content: 'cephadmin ALL=(ALL:ALL) NOPASSWD:ALL'
        dest: /etc/sudoers.d/cephadmin
        mode: '0440'
        owner: root
        group: root
    - name: Set authorized key taken from file to cephadmin user
      authorized_key:
        user: cephadmin
        state: present
        key: "{{ lookup('file', '/etc/ceph/ceph.pub') }}"
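The playbook can then be run against the same inventory in the same way as before (user and options follow the earlier example and may need adjusting for your environment):

ansible-playbook -i hosts create_cephadmin.yml --user administrator --ask-pass --ask-become-pass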
Tell Ceph that the new node is part of the cluster:
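This is done with the orchestrator's host add command; for example, for the first OSD node in the inventory above (substitute each hostname and address in turn):

ceph orch host add ceph001002 <prefix>::60:1:2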
A storage device is considered available if all of the following conditions are met:
The device must have no partitions.
The device must not have any LVM state.
The device must not be mounted.
The device must not contain a file system.
The device must not contain a Ceph BlueStore OSD.
The device must be larger than 5 GB.
Ceph will not provision an OSD on a device that is not available.
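A device that fails these checks only because it was used previously can usually be wiped and returned to the available state with the orchestrator (destructive; the host and device path below follow the example layout):

ceph orch device zap ceph001002 /dev/sdb --force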
To add storage to the cluster, either tell Ceph to consume any available and unused device:
ceph orch apply osd --all-available-devices
After running the above command:
If you add new disks to the cluster, they will automatically be used to create new OSDs.
If you remove an OSD and clean the LVM physical volume, a new OSD will be created automatically.
If you want to avoid this behavior (disable automatic creation of OSDs on available devices), use the --unmanaged parameter:
ceph orch apply osd --all-available-devices --unmanaged=true
Alternatively, create an OSD from a specific device on a specific host:
ceph orch daemon add osd <host>:<device-path>
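For example, using a host and device from the layout above:

ceph orch daemon add osd ceph001002:/dev/sdb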
The --dry-run flag causes the orchestrator to present a preview of what will happen without actually creating the OSDs.
For example:
ceph orch apply osd --all-available-devices --dry-run
In order to deploy an OSD, there must be a storage device that is available on which the OSD will be deployed.
Run this command to display an inventory of storage devices on all cluster hosts:
ceph orch device ls
The output will look like this:
Hostname  Path      Type  Serial  Size   Health   Ident  Fault  Available
host2     /dev/sdb  hdd           85.8G  Unknown  N/A    N/A    Yes
host3     /dev/sdb  hdd           85.8G  Unknown  N/A    N/A    Yes
host1     /dev/sdb  hdd           85.8G  Unknown  N/A    N/A    Yes
Pools are logical partitions for storing objects. When you first deploy a cluster without creating a pool, Ceph uses the default pools for storing data.
By default, Ceph makes 3 replicas of RADOS objects. Ensure you have a realistic number of placement groups: Ceph recommends approximately 100 per OSD, and pg_num should always be a power of 2.
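As a rough sizing sketch for the layout in this guide: 6 OSD nodes with 2 storage disks each give 12 OSDs, so the target is about 12 × 100 / 3 replicas ≈ 400 placement groups in total. That budget is shared across all pools, so a single data pool of 128 or 256 PGs, as in the example below, is a reasonable starting point.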
root@host1:~# ceph osd lspools
1 device_health_metrics
root@host1:~# ceph osd pool create datapool 128 128
pool 'datapool' created
root@host1:~# ceph osd lspools
1 device_health_metrics
2 datapool
root@host1:~# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 22 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'datapool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 39 flags hashpspool stripe_width 0
root@host1:~# ceph osd pool get datapool all
On the admin node, use the rbd tool to initialize the pool for use by RBD:
rbd pool init datapool
The rbd command enables you to create, list, introspect and remove block device images. You can also use it to clone images, create snapshots, roll back an image to a snapshot, view a snapshot, etc.
rbd create --size 512000 datapool/rbdvol1
rbd feature disable datapool/rbdvol1 object-map fast-diff deep-flatten
rbd map datapool/rbdvol1
rbd showmapped
lsblk
ls -la /dev/rbd/datapool/rbdvol1
rbd status datapool/rbdvol1
rbd info datapool/rbdvol1
You can use standard Linux commands to create a filesystem on the volume and mount it for any purpose.
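A minimal sketch with ext4 (the mount point is arbitrary):

mkfs.ext4 /dev/rbd/datapool/rbdvol1
mkdir -p /mnt/rbdvol1
mount /dev/rbd/datapool/rbdvol1 /mnt/rbdvol1
df -h /mnt/rbdvol1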
Script for removing the cluster and cleaning all hosts (if something went wrong):
#!/bin/bash

display_usage() {
    echo "The ceph cluster fsid must be provided"
    echo -e "\nUsage: $0 <fsid> \n"
}

if [ -z $1 ]
then
    display_usage
    exit 1
fi

fsid=$1

# Get information about hosts in the cluster
bootstrap=$(hostname)
hosts=$(cephadm shell --fsid $fsid -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring ceph orch host ls --format yaml | grep hostname | cut -d " " -f2)

# Clean Bootstrap node
echo "Purge cluster in $bootstrap:"
cephadm rm-cluster --fsid $fsid --force
rm -rf /etc/ceph/*
rm -rf /var/log/ceph/*
rm -rf /var/lib/ceph/$fsid

# Clean the rest of hosts
for host in $hosts
do
    if [ $host != $bootstrap ]
    then
        echo "Purge cluster in $host:"
        cephadm_in_host=$(ssh -o StrictHostKeyChecking=no $host ls /var/lib/ceph/$fsid/cephadm*)
        ssh -o StrictHostKeyChecking=no $host python3 $cephadm_in_host rm-cluster --fsid $fsid --force
        # Remove ceph target
        ssh -o StrictHostKeyChecking=no $host systemctl stop ceph.target
        ssh -o StrictHostKeyChecking=no $host systemctl disable ceph.target
        ssh -o StrictHostKeyChecking=no $host rm /etc/systemd/system/ceph.target
        ssh -o StrictHostKeyChecking=no $host systemctl daemon-reload
        ssh -o StrictHostKeyChecking=no $host systemctl reset-failed
        # Remove ceph logs
        ssh -o StrictHostKeyChecking=no $host rm -rf /var/log/ceph/*
        # Remove config files
        rm -rf /var/lib/ceph/$fsid
    fi
done
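Assuming the script is saved as purge-cluster.sh (the filename is illustrative), get the fsid from the still-running cluster and run the script as root on the bootstrap node:

ceph fsid
bash purge-cluster.sh <fsid>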