At the start of development, we had roles very similar to what we outlined in the whitepaper. There were separate client, intermediary, and gateway firmwares, each specialized for its role. The gateway, for example, had no exit tunnel to get internet access.
This turned out to be very confusing for our early users, so every Althea firmware we currently ship has the functionality of a client, gateway, and intermediary all at once.
The main problem here is that a router with a backhaul connection (aka a gateway) needs to balance between setting its own default route (to do client role stuff, like routing user traffic from a LAN) and operating the backhaul connection to peer with exits and bridge into the mesh.
To further complicate matters, you may restart Rita, at which point there is no longer a default route from DHCP to detect, yet you need to restore that missing default route in order to re-establish contact with exits and bridge into the mesh.
Here’s what we do right now.
- Tunnel manager adds routes into the routing table for each manual peer (the exits that gateways bridge to)
- The exit connection manager stores the default route if it finds one and places it into the config, where tunnel manager can use it if Rita is restarted and no default route is present.
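The two behaviors above can be sketched roughly as follows. This is a minimal illustration and not Rita’s actual code: `DefaultRoute`, `Config`, `remember_default_route`, `route_for_manual_peers`, and `manual_peer_route_args` are hypothetical names, and the config is modeled as a plain in-memory struct rather than a file.

```rust
/// A default route as it might be recorded, e.g. "default via 192.168.1.1 dev eth0".
/// (Hypothetical type for illustration.)
#[derive(Clone, Debug, PartialEq)]
struct DefaultRoute {
    via: String,
    dev: String,
}

/// Hypothetical stand-in for the persisted config.
#[derive(Default)]
struct Config {
    stored_default_route: Option<DefaultRoute>,
}

/// Exit connection manager side: if a default route is detected, remember it.
/// Note that nothing here ever clears the stored route.
fn remember_default_route(config: &mut Config, detected: Option<&DefaultRoute>) {
    if let Some(route) = detected {
        config.stored_default_route = Some(route.clone());
    }
}

/// Tunnel manager side: prefer a live default route, otherwise fall back
/// to whatever was stored before the restart.
fn route_for_manual_peers(
    config: &Config,
    detected: Option<&DefaultRoute>,
) -> Option<DefaultRoute> {
    detected
        .cloned()
        .or_else(|| config.stored_default_route.clone())
}

/// Arguments for an `ip route add` call that pins a manual peer (an exit)
/// to the backhaul gateway.
fn manual_peer_route_args(peer_ip: &str, route: &DefaultRoute) -> Vec<String> {
    vec![
        "route".to_string(),
        "add".to_string(),
        peer_ip.to_string(),
        "via".to_string(),
        route.via.clone(),
        "dev".to_string(),
        route.dev.clone(),
    ]
}
```

In this sketch the stored route is never cleared, so it outlives the backhaul connection it was learned from.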
This causes problems.
- If the exit manager starts before the tunnel manager, resolving domain names for manual peers fails and the system gets stuck in a do-nothing state: the exit tunnel waits for the mesh to come up, while the mesh waits on the exit tunnel to resolve the peers’ IPs.
- It’s possible for the system to move from gateway to client and then try to restore a default route that no longer applies, because the route was stored persistently in the config file in case of a reboot.
The current proposed solution.
- Do our own DNS resolution using trust-dns. This lets us see the error messages in more detail and realize when our manually re-added default route is invalid, at which point we can dump it. This should resolve the gateway to mesh device transition issue.
- By combining our own DNS resolution with a check of whether the default route is over wg_exit, we can know with more confidence when we need to use that stored default route.
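One possible reading of this proposal, written as a small decision function. This is a sketch under assumptions, not the planned implementation: the `DnsOutcome` enum is a simplified stand-in for whatever errors the resolver actually reports, and `State`, `RouteAction`, and `decide` are hypothetical names. Only the interface name `wg_exit` comes from the text above.

```rust
/// Interface that carries the exit tunnel.
const EXIT_IFACE: &str = "wg_exit";

/// Simplified outcome of resolving a manual peer's hostname with our own
/// resolver. (Assumption: a real resolver reports richer error detail.)
#[derive(Clone, Copy, Debug, PartialEq)]
enum DnsOutcome {
    Resolved,
    /// The DNS server could not be reached at all.
    Unreachable,
}

#[derive(Debug, PartialEq)]
enum RouteAction {
    /// A live default route over the backhaul exists: use it and store it.
    UseLive,
    /// Leave the routing table alone.
    KeepCurrent,
    /// Re-add the stored default route so manual peers can be reached.
    RestoreStored,
    /// The stored route was re-added and still does not work: drop it.
    DropStored,
}

/// Hypothetical snapshot of what Rita can observe.
struct State<'a> {
    /// Device of the current default route, if there is one.
    default_dev: Option<&'a str>,
    has_stored_route: bool,
    /// Whether the stored route has already been re-added.
    stored_route_applied: bool,
    dns: DnsOutcome,
}

fn decide(s: &State<'_>) -> RouteAction {
    // A default route that is not over the exit tunnel is a live backhaul route.
    let live_backhaul = matches!(s.default_dev, Some(dev) if dev != EXIT_IFACE);
    if live_backhaul {
        return RouteAction::UseLive;
    }
    if !s.has_stored_route {
        return RouteAction::KeepCurrent;
    }
    match (s.dns, s.stored_route_applied) {
        // Resolution works, so whatever is in place is good enough.
        (DnsOutcome::Resolved, _) => RouteAction::KeepCurrent,
        // Resolution fails and the stored route has not been tried yet.
        (DnsOutcome::Unreachable, false) => RouteAction::RestoreStored,
        // Resolution still fails with the stored route in place: it is stale.
        (DnsOutcome::Unreachable, true) => RouteAction::DropStored,
    }
}
```

The last match arm is the case meant to cover a device that has moved from gateway to client: the stored route is tried once, found not to work, and dropped instead of being restored on every restart.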
There’s one major problem with this.
- We need to make sure that the DNS server used for this is not the same as the one used by the system in /etc/resolv.conf or in dnsmasq, or we could end up leaking system operations (not so bad) or user traffic (bad) to the naked backhaul connection. If gateway devices are to act as a pay-per-byte proxy box for users, that’s not a good outcome.
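A minimal sketch of one such check, assuming the contents of the system’s resolv.conf have already been read into a string. `parse_nameservers` and `is_separate_from_system` are hypothetical helper names. The sketch only covers resolv.conf; upstream servers configured in dnsmasq would need a separate check.

```rust
use std::net::IpAddr;

/// Extract the `nameserver` addresses from resolv.conf-style text.
/// Comment lines and other directives are ignored; addresses that do not
/// parse as a plain IP (e.g. IPv6 with a scope id) are skipped.
fn parse_nameservers(resolv_conf: &str) -> Vec<IpAddr> {
    resolv_conf
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            match (parts.next(), parts.next()) {
                (Some("nameserver"), Some(addr)) => addr.parse::<IpAddr>().ok(),
                _ => None,
            }
        })
        .collect()
}

/// True if `candidate` is not one of the system's configured nameservers,
/// i.e. it is safe to pin to the backhaul without also capturing the
/// system's (and the users') DNS traffic.
fn is_separate_from_system(candidate: IpAddr, resolv_conf: &str) -> bool {
    !parse_nameservers(resolv_conf).contains(&candidate)
}
```

Rita could run a check like this before adding a backhaul route for its dedicated DNS server, and refuse to proceed if the addresses overlap.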
Other solutions.
We could use network namespaces and have Rita perform all of its operations in one namespace while the LAN network is in another. This is problematic because Rita needs to manage things on both sides of the divide.