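Azure VPN Gateway with BGP + ECMP in Practice: A FortiGate Example
This post walks through building IPsec VPN tunnels and BGP routing from an on-premises FortiGate to an Azure VPN Gateway (Active-Active + BGP + ECMP).
I thought about it and decided that using a picture of Tifa might run afoul of D&I policy, so for the sake of my thin wallet, please go home and look her up on your own computer.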
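I have recently started helping organize Kubernetes Community Days Taipei 2024. Here are the confirmed dates:
- KCD Taipei 2024 CFP opens: 2024/3/31
- KCD Taipei 2024 CFP closes: 2024/5/9
- KCD Taipei 2024 agenda published: 2024/7/7
- COSCUP 2024 event dates: 2024/8/3 - 2024/8/4
We are a bunch of nerds running on love and still scratching our heads over sponsorship. If your company is willing to play godparent to Kubernetes Community Days Taipei 2024, please contact organizers [at] cloudnative.tw
Past event record: KCD Taiwan 2023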
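Overall architecture diagram
I recommend downloading the diagram and keeping it next to the settings below; its IPs and names line up with every setting in this post.
The work is split into two sides:
- Cloud side: Azure
- On-premises side: FortiGate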
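A. Cloud side: Azure
1. Pick a home base and start digging the network
First pick an Azure region and create a new Virtual Network in it. Inside that Virtual Network, create the reserved subnet GatewaySubnet, which should be no smaller than /27.
In the diagram, the Virtual Network vnet-hub-japaneast (10.0.0.0/24) lives in Azure Region Japan East. According to Visual Subnet Calculator (Azure Edition), a 10.0.0.0/27 subnet fits inside that range, and it uses the reserved subnet name GatewaySubnet.
I usually describe this kind of environment as the Hub part of a Hub-Spoke network architecture.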
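The addressing above can be sanity-checked before clicking through the portal. This is a minimal sketch using Python's standard `ipaddress` module; the function name `check_gateway_subnet` is made up for illustration, and the /27 threshold is simply the one stated in this post.

```python
import ipaddress

def check_gateway_subnet(vnet_cidr: str, gw_cidr: str, max_prefixlen: int = 27) -> bool:
    """Return True if gw_cidr sits inside vnet_cidr and is /27 or larger."""
    vnet = ipaddress.ip_network(vnet_cidr)
    gw = ipaddress.ip_network(gw_cidr)
    # A "larger" subnet has a numerically smaller prefix length.
    return gw.subnet_of(vnet) and gw.prefixlen <= max_prefixlen

# Values taken from this post's diagram.
print(check_gateway_subnet("10.0.0.0/24", "10.0.0.0/27"))
```

A /28 inside the same Virtual Network, or a /27 outside it, would both make the function return False.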
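2. Create the Azure VPN Gateway
The parameters you choose for the Azure VPN Gateway shape the rest of the architecture, so I suggest opening with the ordinary setup below, which is also the most widely applicable one.
With this setup GatewaySubnet is 10.0.0.0/27, so let's look at what the addresses in that subnet are doing.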
Service | IP | Purpose
---|---|---
Azure | 10.0.0.1 | Gateway IP
Azure | 10.0.0.2 | Reserved for DNS
Azure | 10.0.0.3 | Reserved for DNS
VPN IN_0 | 10.0.0.4 | BGP peering IP of IN_0; used later
VPN IN_1 | 10.0.0.5 | BGP peering IP of IN_1; used later
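The address roles listed above follow a simple offset pattern from the subnet's network address. The sketch below reproduces that mapping with the standard `ipaddress` module. The function name `gateway_subnet_roles` is hypothetical, and the .4/.5 assignment only mirrors what this post observed on its own gateway; on a real deployment, read the actual BGP peer IPs from the gateway's configuration rather than assuming them.

```python
import ipaddress

def gateway_subnet_roles(cidr: str) -> dict:
    """Map the first addresses of a GatewaySubnet to the roles seen in this post."""
    base = ipaddress.ip_network(cidr).network_address
    return {
        str(base + 1): "Azure: gateway IP",
        str(base + 2): "Azure: reserved for DNS",
        str(base + 3): "Azure: reserved for DNS",
        # Assumption: instance peer IPs as observed in this post's environment.
        str(base + 4): "VPN IN_0: BGP peering IP",
        str(base + 5): "VPN IN_1: BGP peering IP",
    }

for ip, role in gateway_subnet_roles("10.0.0.0/27").items():
    print(ip, role)
```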
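3. Create the Local Network Gateway
One Local Network Gateway can describe only one on-premises public IP and one set of BGP peering information.
This page describes the basic parameters of the on-premises device. Before filling it in, work out how many Local Network Gateways you need:
Local Network Gateway count = number of WAN ports connecting up * number of BGP speakers on the device that must peer with Azure * number of sites
In the diagram above I have a single WAN port facing out, a single BGP speaker (AS 65510) that peers with Azure, and a single site, so one Local Network Gateway is enough.
I know some of you have two WAN ports on different ISPs at home. By the same logic, you need two Local Network Gateways.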
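The counting rule above is plain multiplication, sketched here so the two cases from the text can be checked. The function and parameter names are made up for illustration.

```python
def local_network_gateway_count(wan_ports: int, bgp_speakers: int, sites: int) -> int:
    """Number of Local Network Gateways per the rule of thumb in this post."""
    return wan_ports * bgp_speakers * sites

# The setup in the diagram: one WAN port, one BGP speaker, one site.
print(local_network_gateway_count(wan_ports=1, bgp_speakers=1, sites=1))
# Two WAN ports on different ISPs, everything else unchanged.
print(local_network_gateway_count(wan_ports=2, bgp_speakers=1, sites=1))
```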
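4. Use a Connection to link the Local Network Gateway and the Azure VPN Gateway
This page defines how the Azure VPN Gateway connects to the on-premises device. Barring surprises, the defaults work. If the connection will not come up, compare it against your on-premises device's IPsec / IKE policy and look for anything that differs; the usual cause is that the encryption choices on the two sides do not match and need adjusting.
B. On-premises side: FortiGate
FortiGate is the only device I have at home, so everything below is FortiGate-based. If you run another vendor's device you can still use it as a reference: I stick to basic settings and use hardly any black magic.
Also, I am no FortiGate specialist. Customer demand drove me to buy one and tinker with it at home. If any setting looks unreasonable or something is missing, corrections are welcome. I will be very grateful, and so will a lot of fellow customer-side engineers in Taiwan.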
0. Version information
1. Take stock of the interfaces
system-interface |
---|
| homecloud-fgt # show system interface
config system interface
edit "wan1"
set vdom "root"
set mode pppoe
set allowaccess ping
set type physical
set lldp-reception disable
set monitor-bandwidth enable
set role wan
set snmp-index 1
set username "[email protected]"
set password ENC tifaismywife
next
edit "loopback" # (1)
set vdom "root"
set ip 1.2.3.4 255.255.255.255 # (4)
set allowaccess ping https ssh http
set bfd enable # (5)
set type loopback
set role lan
set snmp-index 15
next
edit "azure-tunnel-1" # (2)
set vdom "root"
set allowaccess ping
set type tunnel
set snmp-index 19
set interface "wan1"
next
edit "azure-tunnel-2" # (3)
set vdom "root"
set allowaccess ping
set type tunnel
set snmp-index 20
set interface "wan1"
next
|
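- (1) A loopback interface is always up and stays reachable over any of several physical or virtual links, which suits load balancing and redundancy, so it supplies the BGP peering IP instead of an interface IP.
- (2), (3) These appear only after the IPsec tunnels have been created.
- (4) You are all professionals and you know I made this address up; don't take it seriously.
- (5) Azure VPN Gateway itself does not support BFD, but ExpressRoute Gateway does.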
2. Dig the IPsec tunnels
Just follow the official document, FortiGate - IPsec VPN to Azure with virtual network gateway. There is basically nothing tricky here.
phase 1 |
---|
| homecloud-fgt # show vpn ipsec phase1-interface
config vpn ipsec phase1-interface
edit "azure-tunnel-1"
set interface "wan1"
set ike-version 2
set keylife 28800
set peertype any
set net-device disable
set proposal aes256-sha1 3des-sha1 aes256-sha256
set dpd on-idle
set dhgrp 2
set nattraversal disable
set remote-gw <Azure VPN Public IP IN_0>
set psksecret ENC n0gam3n0l1f3
next
edit "azure-tunnel-2"
set interface "wan1"
set ike-version 2
set keylife 28800
set peertype any
set net-device disable
set proposal aes256-sha1 3des-sha1 aes256-sha256
set dpd on-idle
set dhgrp 2
set nattraversal disable
set remote-gw <Azure VPN Public IP IN_1>
set psksecret ENC n0gam3n0l1f3
next
end
|
phase 2 |
---|
| homecloud-fgt # show vpn ipsec phase2-interface
config vpn ipsec phase2-interface
edit "azure-tunnel-1"
set phase1name "azure-tunnel-1"
set proposal aes256-sha1 3des-sha1 aes256-sha256
set pfs disable
set auto-negotiate enable # (1)
set keylifeseconds 27000
next
edit "azure-tunnel-2"
set phase1name "azure-tunnel-2"
set proposal aes256-sha1 3des-sha1 aes256-sha256
set pfs disable
set auto-negotiate enable # (2)
set keylifeseconds 27000
next
end
|
- (1), (2) The official document does not include this, but I added it, mainly to get the effect of keepalive enable, which appears to be a hidden parameter.
3. Open up the firewall policies
Each tunnel needs at least two policies, one per direction. Because the Azure VPN Gateway has two public IPs that terminate tunnels, you need at least four policies.
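To make the "two per tunnel" rule concrete, here is a small sketch that enumerates the (source interface, destination interface) pairs a policy must exist for. The function name `required_policies` is made up for illustration; the interface names are the ones used in this post.

```python
def required_policies(lan_intf: str, tunnels: list) -> list:
    """List the (srcintf, dstintf) pairs that each need a firewall policy."""
    pairs = []
    for tunnel in tunnels:
        pairs.append((lan_intf, tunnel))  # on-premises -> Azure
        pairs.append((tunnel, lan_intf))  # Azure -> on-premises
    return pairs

for src, dst in required_policies("Zone-BigSwitch", ["azure-tunnel-1", "azure-tunnel-2"]):
    print(f"{src} -> {dst}")
```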
Firewall Policy |
---|
| homecloud-fgt # show firewall policy
config firewall policy
edit 7
set name "toAzure-over-azure-tunnel-1"
set srcintf "Zone-BigSwitch"
set dstintf "azure-tunnel-1"
set srcaddr "all"
set dstaddr "all"
set action accept
set schedule "always"
set service "ALL"
set tcp-mss-sender 1350 # (1)
set tcp-mss-receiver 1350 # (2)
next
edit 11
set name "toAzure-over-azure-tunnel-2"
set srcintf "Zone-BigSwitch"
set dstintf "azure-tunnel-2"
set srcaddr "all"
set dstaddr "all"
set action accept
set schedule "always"
set service "ALL"
set tcp-mss-sender 1350
set tcp-mss-receiver 1350
next
edit 10
set name "FromAzure-over-azure-tunnel-1"
set srcintf "azure-tunnel-1"
set dstintf "Zone-BigSwitch"
set srcaddr "all"
set dstaddr "all"
set action accept
set schedule "always"
set service "ALL"
set tcp-mss-sender 1350
set tcp-mss-receiver 1350
next
edit 12
set name "FromAzure-over-azure-tunnel-2"
set srcintf "azure-tunnel-2"
set dstintf "Zone-BigSwitch"
set srcaddr "all"
set dstaddr "all"
set action accept
set schedule "always"
set service "ALL"
set tcp-mss-sender 1350
set tcp-mss-receiver 1350
next
end
|
- (1), (2) TCP MSS should be 1350. If your device has no such parameter, look for the MTU setting and use 1400 bytes.
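The two numbers in the note above are related by header overhead. The sketch below does the arithmetic, assuming plain IPv4 and TCP headers of 20 bytes each with no options; the helper names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def max_mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

def clamp_fits(mss_clamp: int, mtu: int) -> bool:
    """True if segments clamped to mss_clamp fit inside the MTU."""
    return mss_clamp <= max_mss_for_mtu(mtu)

print(max_mss_for_mtu(1400))   # ceiling implied by a 1400-byte tunnel MTU
print(clamp_fits(1350, 1400))  # the MSS value used in the policies above
```

Under these assumptions a 1400-byte MTU leaves room for 1360 bytes of TCP payload, so a 1350 clamp fits with a few bytes to spare.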
4. Halftime check: IPsec tunnel status
BGP peering runs on top of the IPsec tunnels, so if IPsec is not sorted out first there is no point going further. Check it here.
From the FortiGate side
selectors(total,up): 1/1 means the tunnel is healthy.
Get VPN IPsec Tunnel Summary |
---|
| homecloud-fgt # get vpn ipsec tunnel summary
'azure-tunnel-2' <Azure VPN Public IP IN_1>:0 selectors(total,up): 1/1 rx(pkt,err): 98031/0 tx(pkt,err): 157394/0
'azure-tunnel-1' <Azure VPN Public IP IN_0>:0 selectors(total,up): 1/1 rx(pkt,err): 111535/0 tx(pkt,err): 223401/1
|
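If you check this often, the selectors test is easy to script. This is a rough sketch that parses text shaped like the summary output above; the line format is inferred only from that sample, the function name `tunnel_health` is made up, and the IPs plus the failing second line are placeholders invented for the example.

```python
import re

# Matches: 'name' <peer>:0 selectors(total,up): T/U ...
_LINE = re.compile(
    r"^'(?P<name>[^']+)'.*?selectors\(total,up\):\s*(?P<total>\d+)/(?P<up>\d+)"
)

def tunnel_health(summary: str) -> dict:
    """Map tunnel name -> True when every selector is up."""
    health = {}
    for line in summary.splitlines():
        m = _LINE.match(line.strip())
        if m:
            total, up = int(m["total"]), int(m["up"])
            health[m["name"]] = total > 0 and up == total
    return health

# Placeholder peer IPs; the second line is a made-up failure case.
sample = """\
'azure-tunnel-2' 203.0.113.2:0 selectors(total,up): 1/1 rx(pkt,err): 98031/0 tx(pkt,err): 157394/0
'azure-tunnel-1' 203.0.113.1:0 selectors(total,up): 1/0 rx(pkt,err): 0/0 tx(pkt,err): 0/1
"""
print(tunnel_health(sample))
```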
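From the Azure side
The Status column on the Local Network Gateway should show Connected.
5. Cast a perfectly ordinary BGP attack spell
To be clear once more, the settings below come from my own home setup and may not suit your environment. Adjust as needed, and suggestions are welcome.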
Config BGP |
---|
| homecloud-fgt # show router bgp
config router bgp
set as 65510
set router-id 1.2.3.4
set keepalive-timer 5 # (1)
set holdtime-timer 10 # (2)
set ebgp-multipath enable # (3)
set multipath-recursive-distance enable # (4)
set graceful-restart enable
set fast-external-failover enable # (9)
config neighbor
edit "10.0.0.5" # (5)
set advertisement-interval 1 # (6)
set ebgp-enforce-multihop enable # (7)
set link-down-failover enable # (8)
set soft-reconfiguration enable
set capability-graceful-restart enable
set interface "azure-tunnel-2" # (10)
set remote-as 65515 # (11)
set update-source "loopback" # (12)
next
edit "10.0.0.4"
set advertisement-interval 1
set ebgp-enforce-multihop enable
set link-down-failover enable
set soft-reconfiguration enable
set capability-graceful-restart enable
set interface "azure-tunnel-1"
set remote-as 65515
set update-source "loopback"
next
end
config redistribute "connected"
set status enable
end
config redistribute "rip"
end
config redistribute "ospf"
end
config redistribute "static"
set status enable # (13)
end
config redistribute "isis"
end
config redistribute6 "connected"
end
config redistribute6 "rip"
end
config redistribute6 "ospf"
end
|
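- (1) A customer of mine, far more capable than I am, tested this value and got very good failover times. I previously used 10.
- (2) Same customer, same result for this value. I previously used 30.
- (3) Because I peer from the loopback IP rather than an interface IP, the peer is at least two hops away while the default eBGP TTL is 1, so the session will not come up unless this is dealt with (the TTL itself is raised by ebgp-enforce-multihop, see (7)). See the Fortinet documentation for details.
- (4) I use BGP ECMP routing, so I turned this on deliberately. See the Fortinet documentation for details.
- (5) One of the BGP peer IPs provided by the Azure VPN Gateway.
- (6) MRAI (the advertisement interval). I did not tune it beyond the value shown here; setting it to 0 also works if you prefer.
- (7) This changes ebgp-multihop-ttl from its default of 1 to 255.
- (8) If the VPN tunnel drops unexpectedly, FortiGate automatically shuts down the BGP neighbor and removes the related routes so traffic does not take the wrong path.
- (9) For some reason this does not show up in the config dump, although it appears to be active, so I wrote it in myself.
- (10) Peer with this neighbor over azure-tunnel-2.
- (11) The BGP ASN configured on the Azure VPN Gateway.
- (12) Explicitly use loopback as the update source for the BGP peer.
- (13) I supply the BGP peering routes as static routes, so this must be enabled. If you redistribute routes some other way, adjust accordingly.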
6. Add static routes
These exist for BGP peering: I need routes that point the peer IPs at specific interfaces. If you know another way to get the same effect, use it; this is only a reference.
Check Static Route |
---|
| homecloud-fgt # show router static
config router static
edit 1
set dst 10.0.0.4 255.255.255.255
set device "azure-tunnel-1"
next
edit 2
set dst 10.0.0.5 255.255.255.255
set device "azure-tunnel-2"
next
edit 3
set dst 10.0.0.4 255.255.255.255
set device "azure-tunnel-2"
next
edit 4
set dst 10.0.0.5 255.255.255.255
set device "azure-tunnel-1"
next
end
|
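The four entries above are every combination of BGP peer IP and tunnel interface. A minimal sketch of that cross product follows; the function name `peering_static_routes` is made up for illustration, while the IPs and interface names come from this post.

```python
from itertools import product

def peering_static_routes(peer_ips: list, tunnels: list) -> list:
    """Every (host route, device) pair: each peer reachable over each tunnel."""
    return [(f"{ip}/32", dev) for ip, dev in product(peer_ips, tunnels)]

routes = peering_static_routes(
    ["10.0.0.4", "10.0.0.5"],
    ["azure-tunnel-1", "azure-tunnel-2"],
)
for dst, dev in routes:
    print(dst, "via", dev)
```

With two peers and two tunnels this yields four routes, so either peer stays reachable if one tunnel goes down.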
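7. Asymmetric routing
Following Technical Tip: Differences between asymmetric routing and auxiliary sessions, I applied the settings below. I run no IPS service on-premises, so enabling them does no harm for me. If you do run IPS, weigh carefully whether to use them, because you probably cannot build an ECMP architecture at all and can only use one path at a time (same path out and back).
I have not found a good way to test this, because traffic passes either way. If you know how to test it, please share.
Check System Settings |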
---|
| homecloud-fgt # show system settings
config system settings
set auxiliary-session enable # (2)
set asymroute enable # (1)
set sip-expectation enable
set allow-subnet-overlap enable
set ecmp-max-paths 10 # (3)
set gui-dns-database enable
set gui-explicit-proxy enable
set gui-dynamic-routing enable
set gui-sslvpn-personal-bookmarks enable
set gui-sslvpn-realms enable
set gui-policy-based-ipsec enable
set gui-wireless-controller disable
set gui-switch-controller disable
set gui-fortiextender-controller disable
set gui-advanced-policy enable
end
|
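- (1) Off by default. If it stays off, traffic that returns on the other path hits the asymmetric routing problem and the firewall drops it.
- (2) With ECMP enabled, turning this on lets a single TCP session enter and leave through different interfaces.
- (3) The default is 255, but I changed it to 10.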
8. Check that BGP is Established
From the FortiGate side
Check BGP Info |
---|
| homecloud-fgt # get router info bgp summary
VRF 0 BGP router identifier 1.2.3.4, local AS number 65510
BGP table version is 132
2 BGP AS-PATH entries
0 BGP community entries
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.0.0.4 4 65515 58875 58884 131 0 0 2d19h11m 5
10.0.0.5 4 65515 52628 52682 130 0 0 4d11h58m 5
Total number of neighbors 2
homecloud-fgt # get router info bgp neighbors 10.0.0.4
VRF 0 neighbor table:
BGP neighbor is 10.0.0.4, remote AS 65515, local AS 65510, external link
BGP version 4, remote router ID 10.0.0.4
BGP state Established, up for 2d19h21m # (1)
Last read 00:00:08, hold time is 30, keepalive interval is 10 seconds
Configured hold time is 10, keepalive interval is 5 seconds
Neighbor capabilities:
Route refresh: advertised and received (new)
Address family IPv4 Unicast: advertised and received
Address family IPv6 Unicast: advertised and received
Received 58944 messages, 0 notifications, 0 in queue
Sent 58945 messages, 10 notifications, 0 in queue
Route refresh request: received 0, sent 0
Minimum time between advertisement runs is 1 seconds
Update source is loopback
homecloud-fgt # get router info bgp neighbors 10.0.0.5
VRF 0 neighbor table:
BGP neighbor is 10.0.0.5, remote AS 65515, local AS 65510, external link
BGP version 4, remote router ID 10.0.0.5
BGP state = Established, up for 4d12h09m # (2)
Last read 00:00:01, hold time is 30, keepalive interval is 10 seconds
Configured hold time is 10, keepalive interval is 5 seconds
Neighbor capabilities:
Route refresh: advertised and received (new)
Address family IPv4 Unicast: advertised and received
Address family IPv6 Unicast: advertised and received
Received 52707 messages, 0 notifications, 0 in queue
Sent 52747 messages, 13 notifications, 0 in queue
Route refresh request: received 0, sent 0
Minimum time between advertisement runs is 1 seconds
Update source is loopback
|
- (1), (2) The BGP state must be Established.
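The summary table above can also be checked by script. In this style of output the last column shows a prefix count once a session is Established and a state name otherwise, so a rough sketch only needs to test whether that field is numeric. The function name `bgp_neighbor_states` is made up, and the second neighbor line in the sample is an invented failure case.

```python
import ipaddress

def bgp_neighbor_states(summary: str) -> dict:
    """Map neighbor IP -> True when State/PfxRcd is a prefix count."""
    states = {}
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        try:
            ipaddress.ip_address(fields[0])  # skip headers and banner lines
        except ValueError:
            continue
        states[fields[0]] = fields[-1].isdigit()
    return states

# First neighbor line is from this post; the second is hypothetical.
sample = """\
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.0.0.4 4 65515 58875 58884 131 0 0 2d19h11m 5
10.0.0.5 4 65515 0 0 0 0 0 never Active
"""
print(bgp_neighbor_states(sample))
```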
From the Azure side
9. Check the routing table
From the FortiGate side
Check RIB from FortiGate |
---|
| homecloud-fgt # get router info routing-table database
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
> - selected route, * - FIB route, p - stale info
Routing table for VRF=0
S *> 0.0.0.0/0 [5/0] via 5.6.7.254, ppp1 # (1)
B *> 10.56.0.0/24 [20/0] via 10.0.0.5 (recursive is directly connected, azure-tunnel-2), 2d19h56m # (2)
*> [20/0] via 10.0.0.4 (recursive is directly connected, azure-tunnel-1), 2d19h56m
B *> 10.0.0.0/24 [20/0] via 10.0.0.5 (recursive is directly connected, azure-tunnel-2), 2d19h56m # (3)
*> [20/0] via 10.0.0.4 (recursive is directly connected, azure-tunnel-1), 2d19h56m
S *> 10.0.0.4/32 [10/0] is directly connected, azure-tunnel-1 # (4)
*> [10/0] is directly connected, azure-tunnel-2
S *> 10.0.0.5/32 [10/0] is directly connected, azure-tunnel-2
*> [10/0] is directly connected, azure-tunnel-1
C *> 5.6.7.8/32 is directly connected, ppp1 # (5)
C *> 5.6.7.254/32 is directly connected, ppp1 # (6)
C *> 192.168.200.0/24 is directly connected, vlan200
C *> 1.2.3.4/32 is directly connected, loopback # (7)
|
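- (1) The route out through my home PPPoE link; ignore it.
- (2) The route to the Spoke network, reachable via azure-tunnel-2 or azure-tunnel-1.
- (3) The route to the Hub network, reachable via azure-tunnel-2 or azure-tunnel-1.
- (4) The next four entries are the IPsec-tunnel-level redundancy routes. Of course, if the WAN port dies, all four are useless.
- (5) WAN IP
- (6) WAN gateway IP
- (7) Loopback IP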
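To confirm ECMP is really in effect, each BGP prefix in the routing table above should list two next hops. This is a rough sketch that counts paths per prefix; the line format is inferred only from the sample output in this post, and the function name `path_counts` is made up for illustration.

```python
import re

_PRIMARY = re.compile(r"^(?P<code>[A-Z])\s+\*>\s+(?P<prefix>\S+/\d+)\s")
_CONTINUATION = re.compile(r"^\*>\s+\[")

def path_counts(rib: str, code: str = "B") -> dict:
    """Count selected next hops per prefix for routes with the given code."""
    counts = {}
    current = None
    for raw in rib.splitlines():
        line = raw.strip()
        m = _PRIMARY.match(line)
        if m:
            # A new prefix starts; track it only if the route code matches.
            current = m["prefix"] if m["code"] == code else None
            if current:
                counts[current] = 1
        elif current and _CONTINUATION.match(line):
            counts[current] += 1  # extra path for the prefix being tracked
        else:
            current = None
    return counts

sample = """\
S *> 0.0.0.0/0 [5/0] via 5.6.7.254, ppp1
B *> 10.56.0.0/24 [20/0] via 10.0.0.5 (recursive is directly connected, azure-tunnel-2), 2d19h56m
  *> [20/0] via 10.0.0.4 (recursive is directly connected, azure-tunnel-1), 2d19h56m
B *> 10.0.0.0/24 [20/0] via 10.0.0.5 (recursive is directly connected, azure-tunnel-2), 2d19h56m
  *> [20/0] via 10.0.0.4 (recursive is directly connected, azure-tunnel-1), 2d19h56m
S *> 10.0.0.4/32 [10/0] is directly connected, azure-tunnel-1
  *> [10/0] is directly connected, azure-tunnel-2
"""
print(path_counts(sample))
```

A count of 1 for a BGP prefix would suggest that only one tunnel is carrying it.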
Show BGP info for network |
---|
| homecloud-fgt # get router info bgp network
VRF 0 BGP table version is 132, local router ID is 1.2.3.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight RouteTag Path
*> 0.0.0.0/0 5.6.7.254 32768 0 ? <-/1>
* 10.0.0.0/24 10.0.0.4 0 0 0 65515 i <-/->
*> 10.0.0.5 0 0 0 65515 i <-/1>
* 10.56.0.0/24 10.0.0.4 0 0 0 65515 i <-/->
*> 10.0.0.5 0 0 0 65515 i <-/1>
*> 10.0.0.4/32 0.0.0.0 32768 0 ? <-/1>
*> 10.0.0.5/32 0.0.0.0 32768 0 ? <-/1>
*> 5.6.7.8/32 0.0.0.0 32768 0 ? <-/1>
*> 5.6.7.254/32 0.0.0.0 32768 0 ? <-/1>
*> 192.168.200.0 0.0.0.0 32768 0 ? <-/1>
*> 1.2.3.4/32 0.0.0.0 32768 0 ? <-/1>
Total number of prefixes 15
|
Check BGP info for route |
---|
| homecloud-fgt # get router info bgp network-longer-prefixes 10.56.0.0/24
VRF 0 BGP table version is 132, local router ID is 1.2.3.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight RouteTag Path
* 10.56.0.0/24 10.0.0.4 0 0 0 65515 i <-/->
*> 10.0.0.5 0 0 0 65515 i <-/1>
Total number of prefixes 1
|
From the Azure side
A word of honest advice: just download the Learned routes and read them in Excel. It is faster.
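Q&A
Q1: The link is down. What are the troubleshooting steps?
A1:
- Check the IPsec tunnel status on the cloud side. The Azure Portal makes this black and white: anything other than Connected is a problem.
- Check the IPsec tunnel status on the on-premises side. It should be Up.
- Check whether the firewall is blocking the traffic.
- Check whether the BGP state is Established.
- Check the routing table to see where the route actually goes.
Roughly 90% of problems turn out to be a missing firewall rule...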
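The order of the checklist in A1 matters, because each later check depends on the earlier ones passing. A minimal sketch of "stop at the first failing layer" follows; the names `CHECKS` and `first_failure` are made up, and the boolean results are whatever you observe by hand or with your own tooling.

```python
# Ordered from the lowest layer to the highest, as in A1.
CHECKS = [
    "Azure connection status is Connected",
    "FortiGate IPsec tunnel is Up",
    "Firewall policy allows the traffic",
    "BGP state is Established",
    "Routing table sends the prefix where expected",
]

def first_failure(results: list):
    """Return the first failing check name, or None if all passed."""
    for name, ok in zip(CHECKS, results):
        if not ok:
            return name
    return None

print(first_failure([True, True, False, True, True]))
```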
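Q2: What if I don't want both paths active at once and only want Active-Standby?
A2: See 魂系架構 Phil's Workspace, Azure VPN Gateway 實踐 BGP + 線路備援: 以 FortiGate 為例 (Azure VPN Gateway with BGP + link redundancy: a FortiGate example)
Q3: If I deploy Azure Firewall on Azure and all traffic between on-premises and the cloud must pass through it, do I still need to change on-premises routing?
A3: Nothing changes on-premises. Any routing changes happen on the cloud side.
Q4: With BGP enabled, does Azure VPN Gateway support BFD (Bidirectional Forwarding Detection)?
A4: No. Enabling it has no effect; the alternative is keepalives.
ExpressRoute Gateway does support it, though, so you can still enable BFD on the loopback.
Q5: I run IPS or similar services on-premises. Will ECMP cause problems?
A5: Traffic may go out and come back on different paths. If the firewall refuses to allow that, your only option is a symmetric-routing architecture, where traffic goes out and back on the same path. See 魂系架構 Phil's Workspace, Azure VPN Gateway 實踐 BGP + 線路備援: 以 FortiGate 為例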
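Closing ramble
I don't know why, but since I came to work at Microsoft I seem to know other vendors' products better. Out of a five-day work week, three days are people asking me Forti technical questions, which is really...
Let's see which vendor wants to sponsor an on-premises device for me to hook up next time.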
References