Long Wireless links and monitoring.

2019-07-31

Intro

Some time ago I built 2 point-to-point (PtP) links between the buildings where some of my family members live.

My brother and my sister live in an area with no coverage from traditional ISPs, but quite close (5.5km in a straight line, with no obstacles) to my parents' place, which has good coverage (even FTTH) and plenty of providers to choose from.

This project has grown organically, so to speak, and the requirements kept changing.

That, and my lack of experience on the subject, makes all this far from an optimal solution.

In the end it has been working for almost 3 years now. This is an attempt to document all the infrastructure and the bits and pieces used so I do not forget about them and maybe it can be of use to somebody else.

First steps and research

As I said, I knew nothing about this before tackling the project. I have some solid knowledge about networking, but I knew little about long (for me) wireless links, antennas, propagation and a bunch of other stuff I had never heard of. So I had to do some research.

If you want to do something like this, it is better to plan ahead. See what the requirements are and start digging.

Some things to take into consideration are:

Materials

This is a list of the materials I chose and why I chose them. It is short, as it is really an easy installation.

Antennas

I ended up using Ubiquiti PowerBeams to create the 2 links. Four in total, 2 for each link.

I was looking for a reputable manufacturer, trying to avoid problems in the future. Also, I wanted something as simple as possible. This kind of antenna has the “emitter/receiver” and the antenna all in the same device. So no special connectors to be crimped, virtually no losses on cables, just an easy PoE setup from the house to the rooftop.

Also, this antenna has an easy to setup web interface and an SSH server that leaves you in a busybox with some proprietary commands that are pretty handy for automation and data collection.

There are newer models now and other manufacturers. Do your research, read on forums and all the usual stuff. I can say those work for this setup with minor issues.

If you know something about this subject you may be wondering why I did not use something with a wider angle on the “access point” side and just 3 antennas instead of 4. Truth is, I tried, but the 2nd link gave poor performance. Not being an expert on this, I can only guess that the partial obstruction of the LOS (line of sight) path for the second link was the cause, especially on bad weather days (WiFi is pretty sensitive to heavy rain) and during episodes of spectrum saturation.

Creating a separate link with a dedicated pair of antennas improved the situation a lot.

Cables

As the antennas only need a network connection, we only need Ethernet cable. Be sure that it is CAT5e or better.

Always use cable rated for outdoor use. Regular network cable will not last long exposed to rain and the sun’s UV. I went for this one because it was available at the time on Amazon.

Connectors

Don’t go extra cheap on this, but anything with reasonable quality will do here. The antennas are built in a way that the connectors are never exposed, so this part is not that critical.

Antenna pole and other hardware

I cannot say much about this. What to buy here depends a lot on your particular setup. Remember that the higher the better for the antennas, and remember wind is a thing … you do not want it to fly away like a plastic bag.

Build steps

This is a list of the build steps I took. I started by checking the list mentioned in the First steps section, specifically the location of the antennas and the clear line of sight.

I have to admit that I did a sloppy job on the second link, because I did not know about the Fresnel zone back then, but there are some things you can do to mitigate its effects.
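
For reference, the first Fresnel zone is widest at the middle of the link, where its radius is half the square root of the wavelength times the link distance. The sketch below is only a rough planning aid: the function name is made up for this example, and the numbers are the 5.2km and 5685 MHz used in the calculation example further down.

```shell
#!/bin/sh
# Radius (in metres) of the first Fresnel zone at the midpoint of a link:
#   r = 0.5 * sqrt(lambda * d), with lambda = c / f
# $1 = link distance in metres, $2 = frequency in MHz
fresnel_radius_m() {
    awk -v d="$1" -v f_mhz="$2" 'BEGIN {
        lambda = 299792458 / (f_mhz * 1e6)   # wavelength in metres
        printf "%.2f\n", 0.5 * sqrt(lambda * d)
    }'
}

fresnel_radius_m 5200 5685    # prints 8.28
```

So at the middle of a link like this one, anything within roughly 8 meters of the straight line between the antennas is inside the zone. A commonly quoted rule of thumb is to keep at least 60% of that radius clear.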

Calculate signal strength

There’s a simple way to calculate the signal strength you should see on the other side of the link (in ideal conditions). This can be taken as a reference to see if the setup is viable and what conditions and speed negotiation you can expect between the 2 endpoints of the link.

The simplified formula to calculate the signal is:

emitterPower + emitterGain - signalLoss + receiverGain

I say this is the simplified formula because it does not take into account losses on cables and connectors. I chose an “all in one package” type of antenna, so those terms make no sense in this case, which is a huge advantage for a beginner. It is also simplified because I only take into account the free space loss and not any other kind of loss, which would be a lot more difficult to calculate. That was sufficient for me anyway, as the line of sight conditions are pretty good.

To calculate signal loss, this is the formula:

loss = 20*log10((4*π*d)/λ)

Here d is the distance between the 2 endpoints in meters and λ is the wavelength, also in meters. The logarithm is base 10 and the result is in dB. If you do not remember how to calculate the wavelength from the frequency, it is just:

λ = C/f

Here C is the speed of light in meters per second and f is the frequency in Hertz.

So, as an example, let’s say I choose channel 137, which is 5685 MHz, and the 2 endpoints are 5.2km apart. That gives us a signal loss of 121.86 dB.

According to the antenna datasheet the transmission power is 5 dBm and the gain of the antenna is 25 dBi (that’s on average, I guess, across the whole range of channels). Putting all that together (5 + 25 - 121.86 + 25), I should get -66.86 dBm on the other end. This works both ways in this case, so now we have to check sensitivity. Again according to the datasheet, there’s no problem in any modulation negotiation with this kind of signal strength (in theory, so to be on the safe side subtract at least 3 dB from your results).
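
The arithmetic above is easy to double-check with a few lines of shell and awk. This is only a sketch of the simplified formula: the function names are made up for this example, and the inputs are the ones from the example (5.2km, 5685 MHz, 5 dBm, 25 dBi on each side).

```shell
#!/bin/sh
# Free space path loss in dB: 20*log10(4*pi*d/lambda), with lambda = c/f.
# $1 = distance in metres, $2 = frequency in MHz
fspl_db() {
    awk -v d="$1" -v f_mhz="$2" 'BEGIN {
        c = 299792458                  # speed of light in m/s
        pi = atan2(0, -1)
        lambda = c / (f_mhz * 1e6)     # wavelength in metres
        # awk only has the natural logarithm, so divide by log(10)
        printf "%.2f\n", 20 * log(4 * pi * d / lambda) / log(10)
    }'
}

# Expected signal at the receiver in dBm (simplified link budget).
# $1 = TX power (dBm), $2 = TX gain (dBi), $3 = RX gain (dBi), $4 = loss (dB)
rx_power_dbm() {
    awk -v p="$1" -v gt="$2" -v gr="$3" -v l="$4" 'BEGIN {
        printf "%.2f\n", p + gt - l + gr
    }'
}

loss=$(fspl_db 5200 5685)
echo "loss: $loss dB"                                 # loss: 121.86 dB
echo "signal: $(rx_power_dbm 5 25 25 "$loss") dBm"    # signal: -66.86 dBm
```

Changing the distance or the channel frequency in the last lines gives a quick idea of how much margin a different link would have.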

Physical setup and alignment

With the theoretical calculations out of the way, and knowing that the link was possible, the fun part started: I had to get on the roof and install the antennas.

Of course I won’t be saying much about this, as this is different for every single installation. Suffice to say, I had a “pretty fun time” up on ladders and climbing to places not meant to be climbed …

Before securing the antenna to the pole in its final position it has to be aligned. I did this the best I could given the lack of specialised equipment.

On the datasheet there are radiation plots for the chosen model. The principle is simple: those are 2D representations of the radiation lobes of the antenna, showing the loss relative to the total gain. So basically you want to point the antennas at one another as perfectly as possible, especially with parabolic antennas, which have a very narrow beam.

Those radiation plots confused me at first because, in the case of the PowerBeam, there are 4 of them: “Vertical Azimuth”, “Vertical Elevation”, “Horizontal Azimuth” and “Horizontal Elevation”. This did not make any sense to me in the beginning, as azimuth is a horizontal angle and elevation is a vertical one. It drove me nuts. It turns out that “Vertical” and “Horizontal” refer to the two polarisations of the signal that those devices create … Once you understand that, it is easy: they are the same two measurements, taken once for each polarisation.

Once I knew how much of an angle I had before starting to lose signal, and with a bit of good old trigonometry, I knew my margin of error when pointing the antennas at each other.

I did this standing behind the antenna and looking as if my line of sight was the beam. With some fiddling, that should be enough for the horizontal alignment. For the vertical one, it was easier, because the error margin is pretty big compared to the distance to the ground, even if you’re on a tall building (again, trigonometry, that angle at 5km is some meters …). Anyway with the help of some online tool I could calculate that easily to make it as precise as possible (search for “antenna downtilt calculator” on your favourite search engine).
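
The trigonometry mentioned above boils down to offset = distance × tan(angle). Here is a small sketch of it; the function name is made up, and the 3 degrees used below is only a placeholder, not the real PowerBeam figure (read yours from the radiation plots).

```shell
#!/bin/sh
# How far off target (in metres) the centre of the beam lands at a given
# distance when the antenna is misaligned by a given angle.
# $1 = distance in metres, $2 = angle in degrees
beam_offset_m() {
    awk -v d="$1" -v deg="$2" 'BEGIN {
        rad = deg * atan2(0, -1) / 180             # degrees to radians
        printf "%.1f\n", d * sin(rad) / cos(rad)   # awk has no tan()
    }'
}

# 3 degrees is a hypothetical margin, not a datasheet value
beam_offset_m 5200 3    # prints 272.5
```

With those placeholder numbers the beam centre could be off by a couple of hundred meters at the far end and still be within the margin, which is why aiming by eye can be good enough at this distance.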

Network diagram and configuration

With the antennas installed, it was time for some configuration.

This is a basic diagram of the network setup I came up with:

                                                               192.168.1.6/24
                                                                +--------+
                                                                | Bro.   |
                            192.168.1.2/24      192.168.1.4/24  | Router |
                            +---------+         +----------+    +--------+
                            | Antenna |         | Antenna  |   / 192.168.10.1/24
                        ----| AP1     |+++++++++| ST1      |---           
     192.168.1.1/24 ---/    +---------+         +----------+              
              +---------+                                                 
+---------+  -|  ISP    |                                                 
|Internet |-/ |  Router |                                                 
+---------+   +---------+                                                 
                 |  --\     +---------+         +-----------+             
                  \    --\  | Antenna |         |  Antenna  |             
                   \      --| AP2     |+++++++++|  ST2      |-\           
                   |        +---------+         +-----------+  -\ 192.168.1.7/24
                    \        192.168.1.3/24      192.168.1.5/24 +---------+
             +------------+                                     | Sis.    |
             | Rpi        |                                     | Router  |
             | Monitoring |                                     +---------+
             +------------+                                    192.168.10.1/24
             192.168.1.10/24

All are cable connections but the ++++ ones, which are the 5km links.

On the routers/APs at the end of the chain I used the same network segment for both, as they will be isolated and do NAT. I did this because I have little control over the ISP router. It is “reset to defaults” from time to time and that caused me problems before, so static routes would be a pain to maintain. That produces double NAT at my siblings', but that’s a small price to pay for having a stable setup.

Yes, I know that’s a shitty thing to do for an ISP (they break your dhcp reservations and port forwarding too …), but most of the ISPs where I live are the biggest idiots and do the dumbest stuff you can imagine, so that’s not even something for them.

The PowerBeams are configurable via a web interface that is pretty intuitive. They can also be configured via an SSH access and editing a text file + some commands.

Some things I did:

If you prefer the command line to configure the antennas, log into them via SSH and edit the file /tmp/system.cfg. Then save to NVRAM with the command cfgmtd -w. Then reset with /usr/etc/rc.d/rc.softrestart force.

I do not recommend that method at the beginning, until you get familiar with all the options and configurations possible. You can make a pretty big mess.
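
If you do go the command line route, the two commands can be wrapped in a small script, together with a backup of the current configuration taken beforehand. This is only a sketch: the function names are made up, and the "ubnt" user is an assumption you should replace with your own.

```shell
#!/bin/sh
# Save a copy of the running configuration of an antenna to a local file.
# $1 = antenna address, $2 = local file to write
backup_antenna_config() {
    ssh "ubnt@$1" 'cat /tmp/system.cfg' > "$2"
}

# Persist an already edited /tmp/system.cfg to NVRAM and apply it.
# $1 = antenna address
apply_antenna_config() {
    ssh "ubnt@$1" 'cfgmtd -w && /usr/etc/rc.d/rc.softrestart force'
}

# Example:
#   backup_antenna_config 192.168.1.4 st1-system.cfg
#   apply_antenna_config 192.168.1.4
```

Keeping the backup around makes it possible to compare against the previous file if the new settings turn out to be a mistake.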

As I said earlier, those antennas have a sort of spectrum analyser you can use to determine which channel is less busy. It uses a Java applet (yes, I know …) and it has been broken on 2 occasions by firmware updates. But it can be of assistance if your spectrum is really busy.

Performance tests

There are 2 ways to easily test the throughput of the links. The web interface has a “speed test” built in. You have to put the credentials of the other end and it can test TX, RX or both.

The other way (the one I like the most) is iperf(1). The antennas have a basic implementation of that tool installed, so log into the antenna on the other end and use iperf(1) either as server or as client to test both sides of the communication.

Play a bit with the channel width. More channel width allows for faster transfer rates, but a narrow channel increases stability.

I ended up using 20 MHz for one of the links and 10 MHz for the other. That last one is the one with the less than ideal LOS situation. Reducing the channel width and choosing the least busy channel did the trick and I could get a stable link.

In the end I get around 32Mbps symmetrical on the first link. The second link is a lot more variable, depending on the conditions and the interference from other stations. I get up to 17Mbps symmetrical, and it is usually more than 12Mbps, but in the worst case it can get as low as 6Mbps. That is still enough to watch online videos at 1080p with today’s compression and more than enough for any kind of browsing, email and whatever … so I guess it is enough.

Monitoring and management

For various reasons I wanted to monitor the whole thing. My brother had some network outages and I did not know why (I’m pretty sure they are related to some firmware bug introduced on a recent update, but I have no proof).

My idea for this was to put a Raspberry PI on my parents' network that I could connect to, and install all the necessary monitoring software on it.

As I said earlier, I have little control over the ISP router. Also, I did not want to set up a VPN at my house or something similar on a VPS … So I ended up using Zerotier to create a “local network” between one of the hosts at my home office and the PI at my parents'. This software creates an interface on the device with a private range, just like a VPN. The main difference in this case is that the server part is managed (you can host it yourself too) and it uses some clever tricks to find the best path between two endpoints, so latency is as low as possible. It falls back to relay servers if none of the direct strategies work. Besides, it is quite easy to add devices to or remove them from a given virtual network.

They have some documentation to make this process easy.

Having the monitoring PI on a local network segment, I could now use it as a jump box to ssh into the antennas and routers (using ProxyJump), making management easier.
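
As an illustration, a small wrapper around ssh -J (the command line form of ProxyJump) could look like the sketch below. The function name and the "pi@monitoring-pi" jump host are hypothetical; use the user and the Zerotier address of your own PI, and your own device user instead of "ubnt".

```shell
#!/bin/sh
# Run a command on a device behind the monitoring PI, using the PI as jump box.
# JUMP_HOST is a hypothetical placeholder for the PI's user and address.
JUMP_HOST=${JUMP_HOST:-pi@monitoring-pi}

# $1 = device address on the 192.168.1.0/24 segment, rest = remote command
via_pi() {
    device=$1
    shift
    ssh -J "$JUMP_HOST" "ubnt@$device" "$@"
}

# Example: via_pi 192.168.1.2 mca-status
```

The same effect can be had permanently with a ProxyJump line in a Host block of ~/.ssh/config, which saves typing the jump host every time.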

In the end I decided to have some data collection and graphing and, after some consideration, I chose influxdb + telegraf + grafana. That also gives me alerts (more on that later).

InfluxDB for the database backend, telegraf as the “agent collector” and grafana for graphing tool.

I chose influxdb because it is really easy to set up on the PI. Check that a retention policy is enabled so you do not fill up the little SD card on the PI. It is also quite easy to set up telegraf and grafana.
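
A minimal sketch of how retention could be checked and set from the shell, assuming InfluxDB 1.x and its influx command line client. The function names, the policy name and the 90 days are arbitrary choices for this example; "telegraf" is the database name telegraf uses by default.

```shell
#!/bin/sh
# Assumes InfluxDB 1.x and its "influx" command line client.

# Create a retention policy and make it the default for the database.
# $1 = database, $2 = policy name, $3 = how long to keep data (e.g. 90d)
set_retention() {
    influx -execute "CREATE RETENTION POLICY \"$2\" ON \"$1\" DURATION $3 REPLICATION 1 DEFAULT"
}

# List the retention policies of a database.
# $1 = database
show_retention() {
    influx -execute "SHOW RETENTION POLICIES ON \"$1\""
}

# Example: set_retention telegraf ninety_days 90d && show_retention telegraf
```

With a default policy like this, data older than the given duration is dropped automatically, which keeps the database size bounded.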

With that running I set up the InfluxDB data source on Grafana. I used the database named “telegraf”, which was automatically created by the telegraf process as soon as it started collecting data.

Then I configured telegraf to get snmp data from the “Access point” antennas and also from the routers at my siblings'.

To do this I had to add a file to the configuration folder (something like /etc/telegraf/telegraf.d/snmp.conf) with the snmp config parameters:

[[inputs.snmp]]
  agents = [ "192.168.1.2", "192.168.1.3", "192.168.1.6", "192.168.1.7" ]
  version = 1
  community = "mycommunity"
  interval = "60s"
  timeout = "10s"
  retries = 3

  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true

  [[inputs.snmp.field]]
    name = "uptime"
    oid = "DISMAN-EXPRESSION-MIB::sysUpTimeInstance"

  # IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true

The info that comes from this is basically network traffic for all interfaces and uptime.

I also set up telegraf to collect pings to the remote routers. That gives me info about the health of the link, and I based some alerts on that.

The needed config was:

[[inputs.ping]]
  ## List of urls to ping
  urls = ["192.168.1.6", "192.168.1.7"]

  ## Number of pings to send per collection (ping -c <COUNT>)
  count = 3
  ## Per-ping timeout, in s. 0 == no timeout (ping -W <TIMEOUT>)
  timeout = 1.0

And finally, I wanted to have some info the devices provide, but only through some internal commands. For instance, the number of connected devices.

There are 2 commands that run on those devices that provide some internal information (like signal strength, connected devices, and much more). They are mca-status and wstalist.

It turns out telegraf can execute commands and store that as metrics data, no problem. The configuration looks like this:

[[inputs.exec]]
  ## Commands array
  commands = [ "/usr/local/bin/get_connected_devices.sh router1" ]
  interval = "300s"

  name_override = "conn_devices"
  tag_keys = [ "hostname" ]
  timeout = "5s"
  data_format = "json"

The script is this:

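#!/bin/sh
# Print the number of stations connected to a device as JSON for telegraf.

set -eu

# Device to query (first argument, defaults to router1)
device=${1:-router1}
# Get the status from the device and strip carriage returns from the output
device_info=$(ssh "ubnt@$device" mca-status | tr -d "\r")
# Keep only the value of the wlanConnections=<n> line
connected_devices=$(echo "$device_info" | grep wlanConnections | cut -d'=' -f 2)

printf '{"hostname": "%s", "devices": %d }' "$device" "$connected_devices"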

It outputs some JSON that telegraf understands.

After this it was just a matter of setting up some grafana dashboards to see what I wanted to see. I think there is enough information on the internet on how to do that, so I won’t be explaining it here.

As I mentioned, my brother was having some outages that I still cannot explain. They are fixed by rebooting the “access point” part of the link (I’m pretty sure they would go away by simply kicking out the client, but I could not be bothered to look into how to do that programmatically …).

So I thought about automating the reboot process to mitigate the inconvenience. I set up an alert on grafana for the ping metric that, when it triggers, calls a webhook.

I did it that way because I wanted to be notified and also automatically take action based on those alerts. The setup I came up with may seem a bit complicated, but it works with simple tools and it has been in service for some months now.

For the webhook, I found this, which is meant to be a sort of gateway from webhook to XMPP. It only accepts grafana calls but it can be adapted pretty easily.

I did some modifications to not only send an xmpp message, but also to write a flag file on disk on a specified folder if it gets an alert with a specific string on it. Then, there’s a cron job running that checks for those flags and, if it finds any, executes the script of the same name and deletes the flag on success. All pretty simple to do with shell script.

On the ping alert case, the shell scripts just connect to the “access point” antenna and perform a reboot(8).

With that done, outages do not last more than 5 minutes, and they are pretty rare anyway. So I think it is a good solution until the day I take the time to dig into it (if I ever do …).
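
The flag handling part can be sketched in a few lines of shell. This is not the exact script in use, just an illustration of the idea: the function name and the way the two directories are passed in are choices made for this example.

```shell
#!/bin/sh
# Meant to be run from cron: for every flag file in the flag directory,
# execute the script with the same name from the script directory and
# remove the flag only if that script succeeded.
# $1 = directory with flag files, $2 = directory with handler scripts
process_flags() {
    flag_dir=$1
    script_dir=$2
    for flag in "$flag_dir"/*; do
        [ -f "$flag" ] || continue        # also skips an unmatched glob
        script="$script_dir/$(basename "$flag")"
        [ -x "$script" ] || continue      # no handler for this flag
        if "$script"; then
            rm -f "$flag"                 # handled, clear the flag
        fi
    done
}

# When called with both directories as arguments, process them once.
if [ "$#" -eq 2 ]; then
    process_flags "$1" "$2"
fi
```

A handler for the ping alert would then be a script named after the flag that does little more than ssh into the “access point” antenna and run reboot. Because the flag is only deleted on success, a failed reboot attempt is retried on the next cron run.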

I also created a custom handler with super simple payload, so I could use it from other scripts (not necessarily from this project) to just be notified via xmpp.

Conclusion

And that’s the whole setup. Without using anything too complicated or expensive I could connect those isolated flats, have some insight on what happens on the network, have alerts on the most interesting metrics and even automate responses if I need to.

I hope this may serve as a source of ideas for similar projects.