(Editor's note: This article is part of a series of articles looking at the technology underlying the QCon Voting Box, a collaboration between C4Media and Thoughtworks that aimed to solve the question of how to add a “like” button to the real world.)
Our first article in this series, “Introduction to Bluetooth Meshes”, covered the basics of Bluetooth connection protocols. We described the “central” and “peripheral” roles that make up a connection, how a central device connects to a peripheral device, and how these connections may interact to form a mesh of multiple Bluetooth-capable devices. We ended by describing one of the simplest mesh algorithms, a conforming mesh, where every device in the mesh is capable of simultaneously fulfilling both the central and peripheral roles.
During our collaboration with C4Media, our first working mesh was built as a conforming mesh. We quickly realized, however, that this algorithm needed some modifications to reliably and efficiently transmit data to all devices. In this article, we will examine one of the first pitfalls we encountered with our mesh algorithm and explain how to successfully avoid this trap.
When you have a dynamic conforming mesh—meaning that connections are not orchestrated by humans or by a gateway, but are generated ad hoc by individual devices as they enter the range of others—there is a strong probability that a circular chain of connections, or a loop, will form. This will happen whenever two devices that are part of the same mesh cluster (and as such, can already share data via other connections) form a new direct connection.
A simple example of this is illustrated below: three dual-role devices within range of each other are switched on at once, and all begin both advertising for connections and scanning for advertisements. “A” as a central scans and connects to “B” as a peripheral; “B” as a central scans and connects to “C” as a peripheral; and then “C” as a central scans and connects to “A” as a peripheral, completing the loop.
[Image 1: Three dual-role devices forming a connected loop.]
At first glance, this arrangement doesn’t seem so terrible; why wouldn’t it be desirable to have multiple paths available for communication between devices, or to have “redundant” connections in case of failure?
As it turns out, the existence of a loop in a mesh has some very serious drawbacks, starting with the way it complicates the propagation of data.
The Problem with Loops
Data propagation techniques for meshes can vary wildly depending on requirements. For our purposes, since we were using basic memory-constrained System-on-Chips (SoCs) for our custom hardware, we wanted to keep the message passing logic simple and clean. One common approach is programming each device to take any message received from one of its connections, process it as appropriate, and send a copy to all other connections. This ensures that every member of the mesh receives all available data, and works for directed messages (meant for a specific device) as well as non-directed messages (meant for all devices).
However, when data travels in this fashion through a mesh with a loop, a single message may be delivered to the same device multiple times. Take, as an example, the looping mesh pictured below.
“A” is broadcasting a directed packet of data to the entire mesh that is ultimately intended for “C”. Because “A” is connected to both “B” and “C”, the packet is sent to both devices. “C”, recognizing that it is the intended recipient of the packet, accepts and reads the the packet. However, because “B” knows the packet is intended for another device, it unnecessarily forwards the packet to its connections.
In this scenario, “C” redundantly receives the same packet twice. The inefficiencies of increased traffic and redundant messages are limited in this example, but will grow with the complexity of the mesh.
[Image 2: Device C redundantly receives Device A’s message twice in this loop structure.]
If this same packet-forwarding algorithm is followed for non-directed data, “C” would then forward the packet it received from “B” to “A”, creating an infinite data spiral. Regardless of the specific destination of a message, some devices will still receive the same data more than once. Depending on the purpose of your mesh, the contents of the message, and the capabilities of the devices themselves, this could cause unwanted behavior.
One solution to the redundant message delivery caused by loops is to implement a form of message tracking, like assigning a unique identifier to every packet of data. This allows a device to recognize and appropriately handle any message that it has already received.
Alternately, if the device stores information about the structure of the mesh in on-device memory, it will be able to make decisions about how and where data is propagated—for example, sending data along the shortest route to a particular node rather than sharing with all of its connections by default. Unfortunately, either one of these mitigation strategies would also increase the overhead of on-device processing required for simple message propagation.
It may be tempting to overlook this issue, especially for smaller collections of devices, and to decide that enhancing the connection algorithm to avoid looping just isn’t worth it. However, even if your mesh can handle repetitive messages and increased traffic, the presence of loops will present an additional hindrance to the formation of the mesh itself.
The Bigger Problem with Loops
One important distinction between the central and peripheral roles is that while a central device—or a dual-role device currently playing the central role—may connect to multiple peripherals, a peripheral may only connect to a single central at a time. Although language added to the Bluetooth 4.1 specification implies that peripherals may connect to more than one central at a time, as of this writing, no BLE stack that we are aware of, including the Nordic Semiconductor “Soft Device” series we used, provides this functionality.
Given this practical limitation, once a peripheral device has accepted an incoming connection request from a central, it stops advertising for more connections because its one central connection slot is full. In a loop, all devices are maintaining a central connection (and at least one peripheral connection). Since no more central connections are allowed, no devices in the loop will be advertising, though they may continue to scan for advertising devices. An example of this can be seen above in the last panel of Image 1.
The only way to add more devices to a looped mesh is for an advertising device to enter into Bluetooth range and become a peripheral of any of the loop’s devices, at which point it will also stop advertising, maintaining the pure scanning nature of the looping mesh. From this perspective, a loop, no matter its size or structure, essentially acts as a single-role central device.
[Image 3: A looped mesh where no devices are advertising.]
And if members of a loop cannot advertise and two non-advertising entities will never connect to one another, two loops will never be able to join to form a unified mesh. Simply put, there is no way for more than one loop to exist in a mesh!
This is a huge impediment for the reliable formation of a dynamic mesh; once multiple loops have organically formed, there is no way to join them without breaking existing connections. After discovering this particular hindrance, we realized the necessity of reworking our algorithm to prevent loops altogether.
[Image 4: Two loops that are unable to connect to form a single mesh.]
No Loop For You!
The logical next question is: how do you ensure that your mesh algorithm prevents loops from occurring? There are various ways of achieving this within a conforming mesh.
The strategy that we ultimately adopted is to store a “mesh ID” on each device that identifies the cluster of connections to which it belongs. As the devices begin to form connections with one another, the mesh IDs are synchronized via message passing so that all participants in a single mesh cluster will have the same mesh ID.
Since every device begins in a non-connected state, it essentially forms its own “cluster” and generates its own mesh ID, including this information in the advertising data sent out when in peripheral mode. When a central device receives an advertisement, it compare its mesh ID to the one advertised by the peripheral device. If the mesh IDs match, then the two devices are part of the same connected cluster; they would form a loop if they shared a connection, and so the central ignores the advertisement.
However, if the IDs are different—indicating that the two devices are not part of the same cluster—the central initiates a connection. Once the connection is complete, the peripheral’s mesh ID will be updated to match the central’s, and the new mesh ID is also propagated to any other devices connected to the peripheral. In the end, all members of a cluster will have one and only one mesh ID and there will be no loops. This algorithm is illustrated below.
[Image 5: A loop-preventing algorithm. (The value in parentheses next to the device name represents the current mesh ID for that device.)]
There are other approaches to creating loop-free meshes, such as storing the entire mesh structure (or a list of all participants) on each device. However, the mesh-ID approach is fairly simple and works well for conforming meshes.
We now have a modified mesh algorithm that can swiftly join a collection of dual-role devices, and that can ensure no loops exist. However, more challenges lie ahead.
Loopless meshes have certain inherent properties that caught us by surprise when we attempted to expand beyond the conforming mesh structure. When single-role devices (purely central, or purely peripheral) came into play, a set of unique challenges forced us to take our algorithm in an entirely new direction.
We will explore these challenges, and the intriguing solutions, in the next and final article of this series.