Zone-Based Policy Firewall (ZFW) – basic configuration example

Zone-Based Policy Firewall (ZFW) is a newer IOS feature that replaced CBAC (Context-Based Access Control), the legacy IOS firewall feature. The drawback of CBAC was that its stateful inspection policy was applied per interface, so all traffic passing through an interface was subject to the same inspection policy.
Zone-Based Policy Firewall has changed the IOS Stateful Inspection architecture from interface-based to a more flexible zone-based configuration architecture.
In ZFW, router interfaces are assigned to security zones, and the firewall inspection policy is applied to traffic moving between zones. By default, the router does not pass traffic between interfaces in different security zones until an explicit policy allowing that traffic is defined. The firewall policy defines which traffic is allowed to pass between interfaces in different security zones.
Firewall policies are configured using Class-Based Policy Language (CPL), which employs a hierarchical structure to define inspection for network protocols and the groups of hosts’ traffic to which inspection will be applied. Inter-zone policies offer considerable flexibility and granularity, so different inspection policies can be applied to hosts, host groups, or subnets connected to the same router interface.

The following tasks are required to complete the ZFW configuration using the CPL:

  1. Creating class-map(s) that identify the traffic that must have policy applied as it traverses a zone-pair
  2. Defining a policy-map that applies an action to the traffic in a class-map
  3. Defining zones
  4. Defining zone-pairs
  5. Applying a policy-map to a zone-pair
  6. Assigning interfaces to zones
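To make the zone logic concrete, here is a toy Python model of the forwarding decision described above. It is a sketch under simplifying assumptions, not how IOS implements ZFW: the class name `ZoneFirewall`, its fields and the session key `(src, dst, protocol)` are hypothetical, ports are ignored, and the router's own self zone is not modeled. Traffic between zones is dropped unless the zone-pair policy inspects its protocol, and an inspected flow opens a session that lets the return traffic back in.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneFirewall:
    """Toy model of ZFW forwarding decisions (hypothetical, not IOS code)."""
    zone_of: dict   # interface -> security zone
    inspect: dict   # (source zone, destination zone) -> protocols with 'inspect'
    sessions: set = field(default_factory=set)

    def permits(self, in_if, out_if, proto, src, dst):
        src_zone, dst_zone = self.zone_of.get(in_if), self.zone_of.get(out_if)
        if src_zone == dst_zone:
            return True                            # same zone (or both unzoned): no policy needed
        if (dst, src, proto) in self.sessions:
            return True                            # return traffic of an inspected session
        if proto in self.inspect.get((src_zone, dst_zone), set()):
            self.sessions.add((src, dst, proto))   # stateful inspection opens a session
            return True
        return False                               # class-default: drop


fw = ZoneFirewall({"Fa0/0": "INSIDE", "Fa0/1": "OUTSIDE"},
                  {("INSIDE", "OUTSIDE"): {"icmp"}})
```

The usage mirrors the lab below: with only ICMP inspected from INSIDE to OUTSIDE, an echo from R1 opens a session so the reply from R3 is allowed, while telnet falls into class-default and is dropped.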

Now I’m going to walk you through a short ZFW example.

We have three test routers connected in a row: R1, R2 and R3.

The configuration is minimal: only OSPF is running between the routers. Ping and telnet from R1 to R3 work fine.

R1#ping 10.0.23.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.23.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/57/112 ms
R1#telnet 10.0.23.3
Trying 10.0.23.3 ... Open
User Access Verification
Password:

We will configure R2 as the zone-based firewall between the inside network, where R1 resides, and the outside network, where R3 resides. The firewall will inspect only ICMP traffic from inside to outside; thanks to stateful inspection, the return traffic will be allowed back, just like in CBAC.
First, we create an inspect class-map to match ICMP traffic.
R2(config)#class-map type inspect match-all ICMP
R2(config-cmap)# match protocol icmp

Next, we create an inspect policy-map and assign the ICMP class-map to it.
R2(config-cmap)#policy-map type inspect POLICY-INSIDE>OUTSIDE
R2(config-pmap)# class type inspect ICMP
R2(config-pmap-c)# inspect

Now we create the zones and the zone-pair, which defines the source and destination zones of the traffic.
R2(config-pmap-c)#zone security INSIDE
R2(config-sec-zone)#zone security OUTSIDE
R2(config-sec-zone)#zone-pair security ZONE-PAIR-INSIDE>OUTSIDE source INSIDE destination OUTSIDE
R2(config-sec-zone-pair)#service-policy type inspect POLICY-INSIDE>OUTSIDE

The last step is to assign the interfaces to the zones.
R2(config)#int fa0/0
R2(config-if)#zone-member security INSIDE
R2(config-if)#int fa0/1
R2(config-if)#zone-member security OUTSIDE

OK, now let’s test again. First, ping.

R1#ping 10.0.23.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.23.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 32/108/188 ms

Looks fine. What about telnet?
R1#telnet 10.0.23.3
Trying 10.0.23.3 ...
% Connection timed out; remote host not responding

Good, no response, as expected, because no telnet or TCP inspection is defined. Let’s use show policy-map to see the inspection statistics.

R2#show policy-map type inspect zone-pair ZONE-PAIR-INSIDE>OUTSIDE
Zone-pair: ZONE-PAIR-INSIDE>OUTSIDE
Service-policy inspect : POLICY-INSIDE>OUTSIDE
Class-map: ICMP (match-all)
Match: protocol icmp
Inspect
Packet inspection statistics [process switch:fast switch]
icmp packets: [0:10]
Session creations since subsystem startup or last reset 1
Current session counts (estab/half-open/terminating) [0:0:0]
Maxever session counts (estab/half-open/terminating) [1:1:0]
Last session created 00:01:32
Last statistic reset never
Last session creation rate 0
Maxever session creation rate 1
Last half-open session total 0
Class-map: class-default (match-any)
Match: any
Drop (default action)
2 packets, 48 bytes

OK, let’s add another class-map for telnet.

R2(config)#class-map type inspect match-all TELNET
R2(config-cmap)# match protocol telnet
R2(config-cmap)#policy-map type inspect POLICY-INSIDE>OUTSIDE
R2(config-pmap)# class type inspect TELNET
R2(config-pmap-c)# inspect

Quick test.

R1#telnet 10.0.23.3
Trying 10.0.23.3 ... Open
User Access Verification
Password:
R3#

We are in :). Let’s look at the statistics and session details.

R2#show policy-map type inspect zone-pair ZONE-PAIR-INSIDE>OUTSIDE
Zone-pair: ZONE-PAIR-INSIDE>OUTSIDE
Service-policy inspect : POLICY-INSIDE>OUTSIDE
Class-map: ICMP (match-all)
Match: protocol icmp
Inspect
Packet inspection statistics [process switch:fast switch]
icmp packets: [0:20]
Session creations since subsystem startup or last reset 2
Current session counts (estab/half-open/terminating) [0:0:0]
Maxever session counts (estab/half-open/terminating) [1:1:0]
Last session created 00:02:10
Last statistic reset never
Last session creation rate 0
Maxever session creation rate 1
Last half-open session total 0
Class-map: TELNET (match-all)
Match: protocol telnet
Inspect
Packet inspection statistics [process switch:fast switch]
tcp packets: [0:24]
Session creations since subsystem startup or last reset 1
Current session counts (estab/half-open/terminating) [1:0:0]
Maxever session counts (estab/half-open/terminating) [1:1:0]
Last session created 00:00:08
Last statistic reset never
Last session creation rate 1
Maxever session creation rate 1
Last half-open session total 0
Class-map: class-default (match-any)
Match: any
Drop (default action)
2 packets, 48 bytes

R2#show policy-map type inspect zone-pair ZONE-PAIR-INSIDE>OUTSIDE sessions
Zone-pair: ZONE-PAIR-INSIDE>OUTSIDE
Service-policy inspect : POLICY-INSIDE>OUTSIDE
Class-map: ICMP (match-all)
Match: protocol icmp
Inspect
Class-map: TELNET (match-all)
Match: protocol telnet
Inspect
Established Sessions
Session 666D2AEC (10.0.12.1:31763)=>(10.0.23.3:23) telnet SIS_OPEN
Created 00:02:03, Last heard 00:01:59
Bytes sent (initiator:responder) [31:71]
Class-map: class-default (match-any)
Match: any
Drop (default action)
2 packets, 48 bytes

At this stage, all ICMP traffic from the inside goes through, regardless of its source address.

R1#ping 10.0.23.3 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.23.3, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 124/152/196 ms

Now let’s be more specific and allow ICMP only from 10.0.12.0/24.

R2(config)#ip access-list standard INSIDE-SUBNET
R2(config-std-nacl)# permit 10.0.12.0 0.0.0.255
R2(config-std-nacl)#class-map type inspect match-all ICMP
R2(config-cmap)#match access-group name INSIDE-SUBNET

What about now?

R1#ping 10.0.23.3 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.23.3, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
.....
Success rate is 0 percent (0/5)
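Why does the loopback-sourced ping fail? In a standard ACL, an entry written without a wildcard mask is treated as a host entry (wildcard 0.0.0.0), so `permit 10.0.12.0` matches only the single address 10.0.12.0, while `permit 10.0.12.0 0.0.0.255` matches the whole /24. 1.1.1.1 is outside the /24 either way, and the ICMP class-map is match-all, so the packet falls into class-default and is dropped. The hypothetical helper `acl_match` below sketches Cisco-style wildcard matching, where wildcard bits set to 1 mean "don't care".

```python
import ipaddress


def acl_match(addr: str, network: str, wildcard: str = "0.0.0.0") -> bool:
    """Cisco-style wildcard match (hypothetical helper): 1-bits in the wildcard are ignored."""
    a = int(ipaddress.IPv4Address(addr))
    n = int(ipaddress.IPv4Address(network))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF  # bits that must match
    return (a & care) == (n & care)
```

For example, `acl_match("10.0.12.1", "10.0.12.0")` is False (host entry), while `acl_match("10.0.12.1", "10.0.12.0", "0.0.0.255")` is True.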

OK, it works. Note that all traffic from outside to inside is still blocked. Let’s add a rule that allows telnet from 10.0.23.3 to 10.0.12.1 with inspection. For that we have to create a new class-map, policy-map and zone-pair.

First test.
R3#telnet 10.0.12.1
Trying 10.0.12.1 ...
% Connection timed out; remote host not responding

Now configuration.

R2(config)#ip access-list extended OUTSIDE-TELNET
R2(config-ext-nacl)#permit ip host 10.0.23.3 host 10.0.12.1
R2(config-ext-nacl)#exit
R2(config)#class-map type inspect match-all OUTSIDE-TELNET
R2(config-cmap)#match protocol telnet
R2(config-cmap)#match access-group name OUTSIDE-TELNET
R2(config-cmap)#exit
R2(config)#policy-map type inspect POLICY-OUTSIDE>INSIDE
R2(config-pmap)#class type inspect OUTSIDE-TELNET
R2(config-pmap-c)#inspect
R2(config-pmap-c)#zone-pair security ZONE-PAIR-OUTSIDE>INSIDE source OUTSIDE destination INSIDE
R2(config-sec-zone-pair)#service-policy type inspect POLICY-OUTSIDE>INSIDE

What about now? Second try.

R3#telnet 10.0.12.1
Trying 10.0.12.1 ... Open
User Access Verification
Password:
R1#

Cool, working.

R2#show policy-map type inspect zone-pair ZONE-PAIR-OUTSIDE>INSIDE sessions
Zone-pair: ZONE-PAIR-OUTSIDE>INSIDE
Service-policy inspect : POLICY-OUTSIDE>INSIDE
Class-map: OUTSIDE-TELNET (match-all)
Match: protocol telnet
Match: access-group name OUTSIDE-TELNET
Inspect
Established Sessions
Session 666D2AEC (10.0.23.3:38211)=>(10.0.12.1:23) telnet SIS_OPEN
Created 00:00:04, Last heard 00:00:02
Bytes sent (initiator:responder) [31:71]
Class-map: class-default (match-any)
Match: any
Drop (default action)
0 packets, 0 bytes

This was just a basic ZFW configuration. Besides features similar to CBAC, such as session limits, max-incomplete, TCP SYN and idle timers, alerts and audit trail, ZFW offers others, such as limiting the aggregate packet rate for flows between security zones, which I will try to show you in the next post. Enjoy!

Forwarding broadcast packets by Cisco router

The following post shows how a Cisco router handles broadcast IP packets.

There are two types of IP broadcast address:

  • All subnets broadcast IP (255.255.255.255)
  • Directed broadcast – specific subnet broadcast IP (e.g. 10.0.12.255 for 10.0.12.0/24 subnet)

It’s worth adding that the all-subnets broadcast (255.255.255.255) is not a directed broadcast. Directed means that the broadcast is sent to all hosts in a specific subnet (directed to a specific group of hosts).
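The two address types can be told apart with Python's standard `ipaddress` module. The sketch below is illustrative only: the function name `classify_destination` is hypothetical, and it uses the RFC term "limited broadcast" for 255.255.255.255, the address this post calls the all-subnets broadcast.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def classify_destination(dst: str, subnet: str) -> str:
    """Classify dst relative to one connected subnet (hypothetical helper)."""
    ip = ipaddress.IPv4Address(dst)
    net = ipaddress.IPv4Network(subnet)
    if ip == LIMITED_BROADCAST:
        return "limited broadcast"   # 255.255.255.255, not forwarded by routers
    if ip == net.broadcast_address:
        return "directed broadcast"  # all hosts of that specific subnet
    return "unicast"
```

For the 10.0.23.0/24 link used below, `classify_destination("10.0.23.255", "10.0.23.0/24")` returns `"directed broadcast"`.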

By default, a Cisco router does not forward IP packets addressed to any type of broadcast address. The router simply drops them or, if the packet is an ICMP echo addressed to the broadcast address of a directly connected subnet, responds to the requestor with an echo reply.

Directed broadcast example

Let’s take a look at the first example. I generated a ping from R1 to 10.0.23.255. Because R2 is directly connected to the 10.0.23.0/24 subnet, it responds to the echo with an echo reply, but it does not forward the ICMP packet over the Fa0/1 link towards R3, so R3 never receives it.

Here is the debug ip packet output from R1 after the ping:

R1#ping 10.0.23.255 repeat 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 10.0.23.255, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 60/60/60 ms
R1#
*Mar 1 00:24:54.467: IP: tableid=0, s=10.0.12.1 (local), d=10.0.23.255 (FastEthernet0/0), routed via FIB
*Mar 1 00:24:54.471: IP: s=10.0.12.1 (local), d=10.0.23.255 (FastEthernet0/0), len 100, sending
*Mar 1 00:24:54.475: ICMP type=8, code=0
*Mar 1 00:24:54.515: IP: tableid=0, s=10.0.12.2 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), routed via RIB
*Mar 1 00:24:54.519: IP: s=10.0.12.2 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), len 100, rcvd 3
*Mar 1 00:24:54.523: ICMP type=0, code=0

As you can see, R1 receives only R2’s reply.

Let’s add ip directed-broadcast under Fa0/1 on R2 and see what the debug on R1 looks like now:

R2(config-if)#int fa0/1
R2(config-if)#ip directed-broadcast

R1#ping 10.0.23.255 repeat 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 10.0.23.255, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 36/36/36 ms
R1#
*Mar 1 00:03:56.839: IP: tableid=0, s=10.0.12.1 (local), d=10.0.23.255 (FastEthernet0/0), routed via FIB
*Mar 1 00:03:56.843: IP: s=10.0.12.1 (local), d=10.0.23.255 (FastEthernet0/0), len 100, sending
*Mar 1 00:03:56.847: ICMP type=8, code=0
*Mar 1 00:03:56.863: IP: tableid=0, s=10.0.12.2 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), routed via RIB
*Mar 1 00:03:56.867: IP: s=10.0.12.2 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), len 100, rcvd 3
*Mar 1 00:03:56.871: ICMP type=0, code=0
*Mar 1 00:03:56.931: IP: tableid=0, s=10.0.23.3 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), routed via RIB
R1#
*Mar 1 00:03:56.935: IP: s=10.0.23.3 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), len 100, rcvd 3
*Mar 1 00:03:56.939: ICMP type=0, code=0

As you can see, R1 now receives replies from both R2 and R3.

Here is how it looks on R2 and R3:

R2#*Mar  1 00:10:16.995: IP: tableid=0, s=10.0.12.1 (FastEthernet0/0), d=10.0.23.255 (FastEthernet0/1), routed via RIB
*Mar  1 00:10:16.999: IP: s=10.0.12.1 (FastEthernet0/0), d=10.0.23.255 (FastEthernet0/1), g=255.255.255.255, len 100, forward directed broadcast
*Mar  1 00:10:17.007:     ICMP type=8, code=0

R3#*Mar  1 00:07:20.491: IP: s=10.0.12.1 (FastEthernet0/1), d=255.255.255.255, len 100, rcvd 2
*Mar  1 00:07:20.495:     ICMP type=8, code=0
*Mar  1 00:07:20.499: IP: tableid=0, s=10.0.23.3 (local), d=10.0.12.1 (FastEthernet0/1), routed via FIB
*Mar  1 00:07:20.499: IP: s=10.0.23.3 (local), d=10.0.12.1 (FastEthernet0/1), len 100, sending
*Mar  1 00:07:20.503:     ICMP type=0, code=0

As you can see, with ip directed-broadcast enabled, R2 forwards the packet and converts the directed broadcast destination (10.0.23.255) to the all-subnets broadcast 255.255.255.255 on the outgoing segment.

What if we want the packet to keep the subnet directed broadcast address as its destination? We can use the ip broadcast-address command for this purpose.

R2#show run int fa0/1
interface FastEthernet0/1
 ip address 10.0.23.2 255.255.255.0
 ip broadcast-address 10.0.23.255
 ip directed-broadcast

Now R3 receives the ICMP packet addressed to the subnet broadcast 10.0.23.255.

R3#*Mar  1 00:41:35.391: IP: s=10.0.12.1 (FastEthernet0/1), d=10.0.23.255 (FastEthernet0/1), len 100, rcvd 3
*Mar  1 00:41:35.395:     ICMP type=8, code=0

Here is a diagram that summarizes the tests above.

All subnets broadcast example

In the following example I will show you how a router handles all-subnets broadcast packets. The best example is the DHCP address allocation process (you can read more about it here). The first message, DHCP Discover, is sent to the 255.255.255.255 broadcast address. By default, the router ignores this packet and drops it. To relay it properly as a unicast IP packet toward its final destination (the DHCP server), we have to configure the ip helper-address command under Fa0/0 on R2, that is, on the interface that receives the broadcast packets.

Please check the following diagram and take a look at the post mentioned above. Enjoy 😉

QoS Values Calculator v2 (CoS, ToS, ToS HEX, DSCP, AF, IPP, CS, DP, ECN)

Here is NetContractor’s most popular post, about the mystery of the QoS fields.

QoS classification is done mainly based on two fields: in Ethernet it’s the CoS field (in the 802.1Q tag), and in the IP header it’s the ToS byte. The naming convention for the IP header field has evolved over the years from IP Precedence (IPP) and Class Selector (CS) to DSCP. The main reason was that there were not enough class names to classify traffic. BTW, once we classify traffic and want to send it over a provider’s MPLS cloud, we have to map our classes properly to the provider’s classes to take advantage of the QoS features we have purchased. Interestingly, the MPLS header uses a 3-bit EXP field that can address only up to 8 classes of traffic, so marking more classes (from the client perspective) makes no sense when we push traffic over MPLS.

But let’s get back to the naming. Due to demand for more classes, the naming changed. At the beginning, only the first 3 bits of the 8-bit ToS byte were used to name and mark traffic, and that would even be enough today. Then QoS features and class naming changed due to the fast growth of VoIP: QoS became popular and key to achieving better voice quality, and engineers started to use more bits to mark more classes. In the end we still have the 8-bit ToS field, with several class names depending on which part of the field we look at. For someone just starting with QoS this may be confusing, so I thought I would share the QoS Values Calculator that I created and used during my CCIE study.

I’ve added ToS in HEX to the QoS Values Calculator v2. These values are useful when you want to generate IP traffic with a specific ToS/DSCP value using the ping command from the IOS CLI. Ping with ToS is very helpful when testing a QoS configuration: you can easily generate test ICMP traffic with a specific value in the ToS field and check whether it matches the right QoS class.

Be aware that in an extended ping from the IOS CLI, the ToS value has to be entered in the 0xHH format, where HH is the hex value.
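The core conversions behind such a calculator are bit shifts on the 8-bit ToS byte: DSCP occupies the upper 6 bits, ECN the lower 2 bits, and IP Precedence is the upper 3 bits. Below is a minimal Python sketch assuming the standard code points (CSn = 8n, AFxy = 8x + 2y, EF = 46); the helper `qos_values` is hypothetical and is not the calculator itself. The `tos_hex` field gives the value in the 0xHH format mentioned above.

```python
def qos_values(dscp: int, ecn: int = 0) -> dict:
    """Derive ToS-related values from a 6-bit DSCP (hypothetical helper)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN is a 2-bit value (0-3)")
    tos = (dscp << 2) | ecn          # DSCP = upper 6 bits of ToS, ECN = lower 2 bits
    ipp = dscp >> 3                  # IP Precedence = upper 3 bits of ToS
    af_class, remainder = divmod(dscp, 8)
    if dscp == 46:
        name = "EF"
    elif remainder == 0:
        name = f"CS{ipp}"            # Class Selector: only the precedence bits are set
    elif 1 <= af_class <= 4 and remainder in (2, 4, 6):
        name = f"AF{af_class}{remainder // 2}"  # AFxy: DSCP = 8x + 2y
    else:
        name = None
    return {"dscp": dscp, "tos": tos, "tos_hex": f"0x{tos:02X}", "ipp": ipp, "name": name}
```

For example, `qos_values(46)` gives ToS 184, `tos_hex` "0xB8" and IPP 5 for EF.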

To be honest, this is the most popular post on this blog.

Please take a look; any feedback is more than welcome. Enjoy.

QoS Values Calculator v3 in PDF format here.

EIGRP – fast notes

Here are my quick notes on EIGRP.

  • IP Protocol: 88, Uses Multicast IP: 224.0.0.10
  • Protocol Dependent Modules (IP, IPX, Appletalk)

Determining Loop Free Path

  • Feasibility Condition (AD<FD) must be met
  • Split-Horizon – never advertise a route out of the interface through which you learned it

Reliable Transport Protocol (RTP)

  • Packets (reliable delivery and packets will be delivered in order – waits for ACK)

Guaranteed delivery > reliable multicast and confirmation reply as unicast ACK

Ordered delivery > two sequence numbers in each EIGRP packet (its own sequence number, incremented for each packet, and the last sequence number received from the neighbor)

  • HELLO – multicast, unreliable
  • ACK – a Hello packet with no data, unicast, unreliable
  • UPDATE – include route info, multicast/unicast, reliable
  • QUERY – manage DUAL computation, multicast or unicast, reliable
  • REPLY – manage DUAL computation, unicast, reliable
  • If a packet is reliable/multicast and an ACK is not received from the neighbor
    • Then the packet is retransmitted as a unicast to the unresponsive neighbor
    • If an ACK is not received after 16 unicast retransmissions > the neighbor is declared dead
    • Timers – calculated based on the Smoothed Round Trip Time (SRTT) > average time between sending a packet to the neighbor and receiving an ACK
    • Multicast Flow Timer – time to wait for ACK before switching from multicast to unicast
    • Retransmission Timeout (RTO) – time between subsequent unicast packets

Neighbor Discovery/Recovery

  • Hello – 5/60 seconds – ip hello-interval eigrp
  • Hold-Time – 15/180 – ip hold-time eigrp

DUAL – Diffusing Update Algorithm

  • Feasible Distance – lowest calculated metric to the destination
  • Successor – router (next-hop) with the lowest (best) metric to the destination
  • Feasible Successor – a backup for the Successor that meets the Feasibility Condition

Before DUAL computes the metric, the following has to take place:

  1. Establish adjacency between neighbors
  2. Updates exchange
  3. DUAL calculates metric based on the received Advertised Distance from the neighbor + cost to the neighbor
  4. Lowest calculated metric is Feasible Distance (FD), router that advertised this metric is Successor
  5. The Successor’s route (the best metric) is installed in the RIB
  6. The Feasibility Condition is met when AD < FD (FD of the current Successor) [loop-free condition]
  7. If a neighbor’s AD to the destination meets the FC, the neighbor becomes a Feasible Successor. An FS can be elected as Successor when the current Successor goes down, if it has the lowest metric to the destination compared with the other Feasible Successors
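The successor and feasible successor selection in steps 3 to 7 can be sketched in a few lines of Python. This is a simplified illustration, not EIGRP's implementation: the function name `dual_select` and the input format (neighbor -> advertised distance, cost to that neighbor) are assumptions, and equal-cost successors are not handled (a tie goes to the first neighbor found).

```python
def dual_select(neighbors):
    """neighbors: name -> (advertised_distance, cost_to_neighbor).

    Returns (successor, feasible_distance, feasible_successors). Hypothetical helper.
    """
    computed = {n: ad + cost for n, (ad, cost) in neighbors.items()}
    successor = min(computed, key=computed.get)   # lowest computed metric
    fd = computed[successor]                      # Feasible Distance
    feasible = sorted(
        n for n, (ad, _) in neighbors.items()
        if n != successor and ad < fd             # Feasibility Condition: AD < FD
    )
    return successor, fd, feasible


# Three neighbors advertising the same prefix
print(dual_select({"A": (100, 20), "B": (110, 50), "C": (130, 5)}))  # ('A', 120, ['B'])
```

Note that C has a better total metric than B (135 vs 160) but is not a Feasible Successor, because its AD (130) is not lower than the FD (120).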

DUAL Finite State Machine

  1. If FS can’t be found in the Topology Table, then router begins a Diffusing computation and route is Active
  2. The router sends a Query to all of its neighbors
  3. If a neighbor has one or more Feasible Successors for the destination, it sends a Reply to the querying router
  4. If Router doesn’t receive reply to query in Active time, route is declared Stuck-In-Active (SIA)
  5. Neighbors that didn’t reply will be removed from the neighbor table

 

METRIC (BW, Delay, Load, Reliability)

M = (10 000 000 / minBW [kbps] + total DLY [µs] / 10) * 256, where DLY/10 expresses the delay in tens of microseconds

traffic-share balanced

  • Bandwidth – the smallest bandwidth between the source and destination
  • Delay – the cumulative delay of the interfaces along the path
  • Reliability – the lowest (worst) reliability along the path, from 1 to 255 (255 = 100% reliable)
  • Load – the worst load on a link between the source and destination, from 1 to 255 (255 = fully loaded)
  • MTU – the smallest maximum transmission unit along the path
  • K1 = bandwidth
  • K2 = load
  • K3 = delay
  • K4 = reliability
  • K5 = reliability (used together with K4; MTU is tracked but not used in the metric calculation)

Default EIGRP metric weight K1=1, K2=0, K3=1, K4=0, K5=0
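With the default K-values, the composite formula reduces to bandwidth plus delay. The Python sketch below (the function and parameter names are hypothetical) implements the classic composite metric, 256 * [K1*BW + K2*BW/(256 - load) + K3*DLY] * [K5/(reliability + K4)], where the reliability factor is skipped when K5 = 0; integer rounding in IOS may differ slightly. With a loopback (8 000 000 kbps, 5000 µs) behind a FastEthernet link (100 000 kbps, 100 µs), it gives 156160, the same metric R2 shows for 1.0.0.0/8 in the default routing post later in this document.

```python
def eigrp_metric(min_bw_kbps, total_delay_usec, k1=1, k2=0, k3=1, k4=0, k5=0,
                 load=1, reliability=255):
    """Classic EIGRP composite metric (hypothetical helper, integer arithmetic)."""
    bw = 10**7 // min_bw_kbps           # scaled inverse of the slowest link bandwidth
    delay = total_delay_usec // 10      # cumulative delay in tens of microseconds
    metric = k1 * bw + (k2 * bw) // (256 - load) + k3 * delay
    if k5 != 0:                         # reliability term only applies when K5 is non-zero
        metric = metric * k5 // (reliability + k4)
    return metric * 256
```

Usage: `eigrp_metric(100_000, 5_000 + 100)` returns 156160.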

Stuck-In-Active (SIA)

  • timers active-time 3
  • timers active-time disabled
  • Stub routing and summarization reduce the scope of Queries sent to neighbors
  • show ip eigrp topology all-links – shows routes that are not Feasible Successors

Default routing origination in OSPF, EIGRP, RIP and BGP

Default routing is a very important feature that can be found in almost every network as a last-resort mechanism to route packets out of the organization to unknown destinations. Default route origination has a few configuration dependencies that differ by routing protocol, and these are presented in this post.

OSPF

Let’s start with the most popular IGP. In OSPF, the default prefix (0/0) can be propagated in two different ways:

  • Explicitly with default-information originate
  • Stub Area Border Router (ABR)

To originate 0/0 explicitly, issue the following command under the OSPF process:

R1(config-router)#default-information originate

Once the above command is issued, the OSPF router acts as an Autonomous System Boundary Router (ASBR). The default prefix will not appear in the ASBR’s LS database and will not be advertised to peers until a 0/0 prefix exists in the routing table.

To get a default route into the routing table we have two options:

  • Redistribute 0/0 from another routing protocol (RIP, EIGRP, BGP)
  • Add a static route for 0/0

The default-information originate command has an optional keyword, always, which originates 0/0 even if no default route exists in the routing table.

By default, the route is propagated as an E2 route with metric 1; this can be adjusted with the metric or metric-type options.

The second way to originate a default route is to configure a stub area; the ABR then generates 0/0. Please look at the OSPF Area Types and LSA Propagation post for details here. Keep in mind that an ABR does not originate 0/0 into a standard Not-So-Stubby Area (NSSA); the default-information originate or no-summary keyword is needed in that case.

EIGRP

With EIGRP we have four options to generate a default route:

  • network 0.0.0.0
  • redistribution
  • summarization
  • ip default-network

The first option is similar to OSPF. The default route needs to exist in the routing table; it is then propagated once the network 0.0.0.0 command is added under the EIGRP process.

R1(config)#router eigrp 1
R1(config-router)# network 0.0.0.0
R1(config-router)#ip route 0.0.0.0 0.0.0.0 null 0

 

R2#sh ip route
Gateway of last resort is 10.0.12.1 to network 0.0.0.0
1.0.0.0/32 is subnetted, 1 subnets
D 1.1.1.1 [90/409600] via 10.0.12.1, 00:06:32, FastEthernet0/0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
D* 0.0.0.0/0 [90/281600] via 10.0.12.1, 00:05:51, FastEthernet0/0

R2 sees the default route as an EIGRP internal route (AD 90) marked with a star. The star marks the candidate default (last-resort) route, which is used when no more specific route to a destination exists.

The second option is to use the redistribute command and take the default from a static route or from another routing protocol.

R2#show ip route
Gateway of last resort is 10.0.12.1 to network 0.0.0.0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
D*EX 0.0.0.0/0 [170/281600] via 10.0.12.1, 00:00:07, FastEthernet0/0

In this case peers will see default as EIGRP external (AD=170) route with star.

The third option generates the default route through summarization. In EIGRP, route summarization is configured per interface. It’s a very handy option that can be found only in EIGRP.

R1(config)#int fa0/0
R1(config-if)#ip summary-address eigrp 1 0.0.0.0 0.0.0.0

Peers will see default route as EIGRP internal (AD=90).

R2#sh ip route
Gateway of last resort is 10.0.12.1 to network 0.0.0.0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
D* 0.0.0.0/0 [90/307200] via 10.0.12.1, 00:00:15, FastEthernet0/0

The last option uses the ip default-network command in global configuration mode; additionally, the prefix needs to be added under the EIGRP process. The prefix must be a classful network, and of course a local interface in that network must exist and be up.

R1(config-if)#int lo1
R1(config-if)#ip add 1.0.0.1 255.0.0.0
R1(config-if)#router eigrp 1
R1(config-router)#network 1.0.0.0
R1(config-router)#ip default-network 1.0.0.0

R2#sh ip route
Gateway of last resort is not set
D* 1.0.0.0/8 [90/156160] via 10.0.12.1, 00:00:02, FastEthernet0/0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0

R2 sees the 1.0.0.0 network as a candidate default route, and peer 10.0.12.1 will be used as the default gateway.

RIP

With RIP we have four options to generate a default route:

  • network 0.0.0.0
  • default-information originate
  • redistribution
  • ip default-network

The first option propagates the default route without requiring it to exist in the routing table.

R1(config)#router rip
R1(config-router)#no auto
R1(config-router)#network 0.0.0.0

R2#sh ip route
Gateway of last resort is 10.0.12.1 to network 0.0.0.0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
R* 0.0.0.0/0 [120/1] via 10.0.12.1, 00:00:02, FastEthernet0/0

The second option propagates the default route in the same way as default-information originate always in OSPF: the prefix does not need to exist in the routing table.

R1(config)#router rip

R1(config-router)#version 2
R1(config-router)#no auto
R1(config-router)#network 10.0.0.0
R1(config-router)#default-information originate

R2#sh ip route
Gateway of last resort is 10.0.12.1 to network 0.0.0.0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
R* 0.0.0.0/0 [120/1] via 10.0.12.1, 00:00:02, FastEthernet0/0

Third option is simply redistribution.

R1(config)#ip route 0.0.0.0 0.0.0.0 Null0
R1(config)#router rip
R1(config-router)# redistribute static metric 5

R2#sh ip route
Gateway of last resort is 10.0.12.1 to network 0.0.0.0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
R* 0.0.0.0/0 [120/5] via 10.0.12.1, 00:00:01, FastEthernet0/0

The last option is similar to ip default-network in EIGRP, but interestingly, the classful network does not need to be added under the RIP process.

R1(config)#int lo1
R1(config-if)# ip add 1.0.0.1 255.0.0.0
R1(config-if)# ip default-network 1.0.0.0

The output of the show ip route command is also different: instead of the classful network marked with a star, it shows a plain 0.0.0.0/0.

R2#*Mar 10 23:39:27.912: RIP-DB: redist 0.0.0.0/0(metric 1, last interface FastEthernet0/0) to RIP
*Mar 10 23:39:27.912: RIP-DB: network_update with 0.0.0.0/0 succeeds
*Mar 10 23:39:27.912: RIP-DB: adding 0.0.0.0/0 (metric 1) via 10.0.12.1 on FastEthernet0/0 to RIP database
*Mar 10 23:39:27.912: RIP-DB: add 0.0.0.0/0 (metric 1) via 10.0.12.1 on FastEthernet0/0
*Mar 10 23:39:27.916: RIP-DB: Adding new rndb entry 0.0.0.0/0
*Mar 10 23:39:27.916: RIP-DB: Created rip ndb summary entry for 0.0.0.0/0
*Mar 10 23:39:27.916: RIP-DB: Adding new rndb entry 0.0.0.0/0
*Mar 10 23:39:31.113: RIP-DB: network_update with 0.0.0.0/0 succeeds
*Mar 10 23:39:31.113: RIP-DB: adding 0.0.0.0/0 (metric 1) via 10.0.12.1 on FastEthernet0/0 to RIP database

R2#sh ip route
Gateway of last resort is 10.0.12.1 to network 0.0.0.0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
R* 0.0.0.0/0 [120/1] via 10.0.12.1, 00:00:06, FastEthernet0/0

BGP

We have covered all IGP protocols. Let’s take a closer look at BGP.

With BGP we have three options to generate a default route:

  • default-information originate
  • network 0.0.0.0
  • default-originate to a specific neighbor

The first option is similar to OSPF and EIGRP, with one difference: besides having to exist in the routing table, 0/0 also has to be redistributed into BGP from a static route or another routing protocol. One important note: the 0/0 prefix is not visible in the BGP table until the default-information originate command is issued, strange but true. Let’s test it.

R1(config)#ip route 0.0.0.0 0.0.0.0 Null0
R1(config)#ip route 2.2.2.2 255.255.255.255 Null0
R1(config)#router bgp 1
R1(config-router)# redistribute static
R1(config-router)#exit
R1#sh ip route
Gateway of last resort is 0.0.0.0 to network 0.0.0.0
2.0.0.0/32 is subnetted, 1 subnets
S 2.2.2.2 is directly connected, Null0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
S* 0.0.0.0/0 is directly connected, Null0

R1#sh ip bgp
Network Next Hop Metric LocPrf Weight Path
*> 2.2.2.2/32 0.0.0.0 0 32768 ?

As you can see, there is no 0/0 prefix in the BGP table. Let’s add the key command.

R1(config)#router bgp 1

R1(config-router)#default-information originate

R1(config-router)#do sh ip bgp
Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0 0.0.0.0 0 32768 ?
*> 2.2.2.2/32 0.0.0.0 0 32768 ?

Here we are! Let’s confirm that R2 is receiving the route.

R2#sh ip route
Gateway of last resort is 10.0.12.1 to network 0.0.0.0
2.0.0.0/32 is subnetted, 1 subnets
B 2.2.2.2 [200/0] via 10.0.12.1, 00:00:19
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
B* 0.0.0.0/0 [200/0] via 10.0.12.1, 00:00:05

The second option, network 0.0.0.0 under BGP, also requires the 0/0 prefix in the routing table, just like the first one, but the network command ensures that the default network is in the BGP table and is advertised to all neighbors, so there is no need to redistribute it into BGP.

R1(config)#router bgp 1
R1(config-router)#network 0.0.0.0
R1(config-router)#ip route 0.0.0.0 0.0.0.0 null 0

R2#sh ip bgp
Network Next Hop Metric LocPrf Weight Path
*>i0.0.0.0 10.0.12.1 0 100 0 i
R2#sh ip route
Gateway of last resort is 10.0.12.1 to network 0.0.0.0
10.0.0.0/24 is subnetted, 1 subnets
C 10.0.12.0 is directly connected, FastEthernet0/0
B* 0.0.0.0/0 [200/0] via 10.0.12.1, 00:01:08

The third option is useful because it lets you select which neighbors receive the 0/0 prefix without any filtering. This option does not require 0/0 in the routing table to originate the default.

R1(config)#router bgp 1
R1(config-router)# neighbor 10.0.12.2 default-originate
R2#sh ip bgp
Network Next Hop Metric LocPrf Weight Path
*>i0.0.0.0 10.0.12.1 0 100 0 i

As you can see, default route generation has several dependencies that are good to know.
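To recap the dependencies, the behaviour described in this post can be encoded as a small lookup table. This is a hypothetical Python summary of the lab observations above, not an exhaustive reference for every IOS release; the table name, function name and method strings are illustrative labels. For BGP default-information originate, it assumes the 0/0 route is also redistributed into BGP.

```python
# Hypothetical summary of this post's lab observations:
# (protocol, method) -> must a 0/0 route already be in the routing table?
NEEDS_DEFAULT_IN_RIB = {
    ("ospf", "default-information originate"): True,
    ("ospf", "default-information originate always"): False,
    ("eigrp", "network 0.0.0.0"): True,
    ("eigrp", "redistribute"): True,
    ("rip", "network 0.0.0.0"): False,
    ("rip", "default-information originate"): False,
    ("rip", "redistribute"): True,
    ("bgp", "default-information originate"): True,  # assumes 0/0 is also redistributed into BGP
    ("bgp", "network 0.0.0.0"): True,
    ("bgp", "neighbor default-originate"): False,
}


def advertises_default(protocol: str, method: str, default_in_rib: bool) -> bool:
    """Return True if the method originates 0/0 given the routing table state."""
    needs_rib = NEEDS_DEFAULT_IN_RIB[(protocol.lower(), method)]
    return default_in_rib or not needs_rib
```

For example, `advertises_default("ospf", "default-information originate", False)` is False, while the always variant returns True.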

OSPF LSA filtering to routing table issue on ABR

Today I would like to show you a strange OSPF ABR behaviour that contradicts the OSPF algorithm: simple LSA filtering into the routing table on an ABR can break LSA propagation to a non-zero area.

First, let’s briefly recall the basic rules and operations of OSPF.

  • Adjacencies – routers exchange hello packets and establish neighbor adjacency.
  • Link-state advertisements – routers exchange LSAs that describe all of the router’s known links.
  • Link-state database synchronization – by flooding LSAs throughout an area, all routers build identical link-state databases.
  • Building the routing table based on the SPF tree – once the databases are complete, routers run the SPF algorithm to calculate a loop-free topology describing the shortest path to every destination. The routing table is built based on the SPF tree.

After all link-state details have been flooded to all neighbors in an area and all routers have verified that their databases are identical, the link-state databases are synchronized and the routing tables are built.
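The SPF step can be illustrated with Dijkstra's algorithm, which is what OSPF runs over the link-state database. The Python sketch below is a simplification: it assumes a single area and a hypothetical cost of 1 on every link of the four-router chain (R1, R2, R3, R4) used later in this post, and it returns both the distances and each node's parent in the shortest-path tree. The function name `spf` is illustrative.

```python
import heapq


def spf(graph, root):
    """Dijkstra shortest-path tree from root (graph: node -> {neighbor: cost})."""
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry, a shorter path was already found
        visited.add(node)
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, parent


# Hypothetical costs: every link in the R1-R2-R3-R4 chain has cost 1
topology = {
    "R1": {"R2": 1},
    "R2": {"R1": 1, "R3": 1},
    "R3": {"R2": 1, "R4": 1},
    "R4": {"R3": 1},
}
dist, parent = spf(topology, "R4")
```

Following `parent` from R1 back to the root (R1, R2, R3, R4) gives the loop-free shortest path that the routing table entry is built from.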

Here I would like to take a closer look at the ABR, its function and operation.

Area Border Routers (ABRs) connect one or more areas to the backbone and work as gateways for inter-area traffic. An ABR always has at least one interface that belongs to the backbone and maintains a separate link-state database for each connected area.

Network Summary LSAs (Type 3) are originated by ABRs. They are sent into a single area to advertise destinations outside that area: the ABR tells the internal routers of an attached area which destinations it can reach. An ABR also advertises the destinations within its attached areas into the backbone with Network Summary LSAs.

Now that the ABR's function and operation have been described, let's configure a simple OSPF topology with four routers, two of which are ABRs, based on the topology below.

R1’s loopback has been added to Area 1. R2 and R3 are ABRs.
Let's see what the R3 link-state database looks like:

R3#sh ip ospf database
OSPF Router with ID (3.3.3.3) (Process ID 1)
Router Link States (Area 0)
Link ID ADV Router Age Seq# Checksum Link count
2.2.2.2 2.2.2.2 1033 0x80000006 0x00CFFD 1
3.3.3.3 3.3.3.3 1270 0x80000005 0x009332 1
Net Link States (Area 0)
Link ID ADV Router Age Seq# Checksum
10.0.23.2 2.2.2.2 1292 0x80000003 0x00A553
Summary Net Link States (Area 0)
Link ID ADV Router Age Seq# Checksum
1.1.1.1 2.2.2.2 1033 0x80000003 0x00899A
10.0.12.0 2.2.2.2 1033 0x80000007 0x009E70
10.0.34.0 3.3.3.3 1020 0x80000005 0x009165
Router Link States (Area 34)
Link ID ADV Router Age Seq# Checksum Link count
3.3.3.3 3.3.3.3 1020 0x80000004 0x00921D 1
4.4.4.4 4.4.4.4 937 0x80000004 0x005156 1
Net Link States (Area 34)
Link ID ADV Router Age Seq# Checksum
10.0.34.3 3.3.3.3 1021 0x80000003 0x005888
Summary Net Link States (Area 34)
Link ID ADV Router Age Seq# Checksum
1.1.1.1 3.3.3.3 1800 0x80000001 0x00D344
10.0.12.0 3.3.3.3 1021 0x80000003 0x00EC18
10.0.23.0 3.3.3.3 1271 0x80000003 0x000FF4


R4 receives the 1.1.1.1/32 prefix as expected:

R4#sh ip ospf database summary 1.1.1.1
OSPF Router with ID (4.4.4.4) (Process ID 1)
Summary Net Link States (Area 34)
Routing Bit Set on this LSA
LS age: 1324
Options: (No TOS-capability, DC, Upward)
LS Type: Summary Links(Network)
Link State ID: 1.1.1.1 (summary Network Number)
Advertising Router: 3.3.3.3
LS Seq Number: 80000001
Checksum: 0xD344
Length: 28
Network Mask: /32
TOS: 0 Metric: 21
R4#sh ip route 1.1.1.1
Routing entry for 1.1.1.1/32
Known via "ospf 1", distance 110, metric 31, type inter area
Last update from 10.0.34.3 on FastEthernet0/0, 00:22:15 ago
Routing Descriptor Blocks:
* 10.0.34.3, from 3.3.3.3, 00:22:15 ago, via FastEthernet0/0
Route metric is 31, traffic share count is 1

OK, let's move on to filtering LSAs into the routing table on an ABR.

We have the following options to keep an LSA that is in the LSA database out of the routing table:

  • The distance command with an administrative distance of 255
  • A distribute list
  • A static route to Null0 (the route will appear in the routing table, but the effect is the same as above: packets are dropped)

First, let's take the R2 ABR and use a distribute list to filter the 1.1.1.1 prefix out of its routing table.

R2(config)#access-list 2 deny 1.1.1.1
R2(config)#access-list 2 permit any
R2(config)#router ospf 1
R2(config-router)#distribute-list 2 in FastEthernet0/1

Let's confirm that 1.1.1.1 still appears as a Type 3 LSA in the R2 LSA database:

R2#show ip ospf database summary 1.1.1.1
OSPF Router with ID (2.2.2.2) (Process ID 1)
Summary Net Link States (Area 0)
LS age: 763
Options: (No TOS-capability, DC, Upward)
LS Type: Summary Links(Network)
Link State ID: 1.1.1.1 (summary Network Number)
Advertising Router: 2.2.2.2
LS Seq Number: 8000000C
Checksum: 0x77A3
Length: 28
Network Mask: /32
TOS: 0 Metric: 11

And let's confirm that 1.1.1.1 has been withdrawn from the R2 routing table:

R2#sh ip route 1.1.1.1
% Network not in table

1.1.1.1 also still appears as a Type 3 LSA in the R3 LSA database:

R3#show ip ospf database summary 1.1.1.1
OSPF Router with ID (3.3.3.3) (Process ID 1)
Summary Net Link States (Area 0)
Routing Bit Set on this LSA
LS age: 939
Options: (No TOS-capability, DC, Upward)
LS Type: Summary Links(Network)
Link State ID: 1.1.1.1 (summary Network Number)
Advertising Router: 2.2.2.2
LS Seq Number: 8000000C
Checksum: 0x77A3
Length: 28
Network Mask: /32
TOS: 0 Metric: 11
Summary Net Link States (Area 34)
LS age: 4
Options: (No TOS-capability, DC, Upward)
LS Type: Summary Links(Network)
Link State ID: 1.1.1.1 (summary Network Number)
Advertising Router: 3.3.3.3
LS Seq Number: 80000001
Checksum: 0xD344
Length: 28
Network Mask: /32
TOS: 0 Metric: 21

And 1.1.1.1 still appears in the R3 and R4 routing tables:

R3#sh ip route 1.1.1.1
Routing entry for 1.1.1.1/32
Known via "ospf 1", distance 110, metric 21, type inter area
Last update from 10.0.23.2 on FastEthernet0/1, 01:09:07 ago
Routing Descriptor Blocks:
* 10.0.23.2, from 2.2.2.2, 01:09:07 ago, via FastEthernet0/1
Route metric is 21, traffic share count is 1

R4#sh ip route 1.1.1.1
Routing entry for 1.1.1.1/32
Known via "ospf 1", distance 110, metric 31, type inter area
Last update from 10.0.34.3 on FastEthernet0/0, 01:08:45 ago
Routing Descriptor Blocks:
* 10.0.34.3, from 3.3.3.3, 01:08:45 ago, via FastEthernet0/0
Route metric is 31, traffic share count is 1

OK, everything works as expected. Let's apply the same filtering rule on R3 and see the difference.

R3(config)#access-list 2 deny 1.1.1.1
R3(config)#access-list 2 permit any
R3(config)#router ospf 1
R3(config-router)#distribute-list 2 in FastEthernet0/1

As expected, 1.1.1.1 has been withdrawn from the R3 routing table.

R3#sh ip route 1.1.1.1
% Network not in table

What about the LSA database?

R3#show ip ospf database summary 1.1.1.1
OSPF Router with ID (3.3.3.3) (Process ID 1)
Summary Net Link States (Area 0)
Routing Bit Set on this LSA
LS age: 916
Options: (No TOS-capability, DC, Upward)
LS Type: Summary Links(Network)
Link State ID: 1.1.1.1 (summary Network Number)
Advertising Router: 2.2.2.2
LS Seq Number: 8000000C
Checksum: 0x77A3
Length: 28
Network Mask: /32
TOS: 0 Metric: 11

1.1.1.1 still appears in Area 0 as expected, but what about Area 34? The Type 3 LSA for 1.1.1.1 is gone from Area 34. What about R4: does this mean R4 will not get the route anymore?

Let’s see the routing table and LSA database on R4.

R4#sh ip route 1.1.1.1
% Network not in table
R4#show ip ospf database summary 1.1.1.1
OSPF Router with ID (4.4.4.4) (Process ID 1)

Exactly: the LSA has been withdrawn from the Area 34 database on R4 as soon as we filtered 1.1.1.1 out of the routing table on R3. Good to know ;).

Let's change the filtering method on R3 from the distribute list to a static route to Null0.

R3(config)#router ospf 1
R3(config-router)#no distribute-list 2 in FastEthernet0/1
R3(config-router)#ip route 1.1.1.1 255.255.255.255 null 0

Let’s see what we have on R4 now:
R4#sh ip route 1.1.1.1
% Network not in table
R4#show ip ospf database summary 1.1.1.1
OSPF Router with ID (4.4.4.4) (Process ID 1)

So even though 1.1.1.1 exists in the routing table, it is preferred via a non-OSPF source (in this case a static route), and therefore the ABR does not advertise it from Area 0 into the non-zero area.

Conclusion: an OSPF ABR propagating LSAs from Area 0 into a non-zero area advertises into that area only those prefixes that exist in its routing table with OSPF as the preferred routing source.
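The conclusion can be written down as a small decision function. This is only a model of what we observed in this lab, not IOS code; the RibEntry type and the type3_into_area function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RibEntry:
    source: str   # protocol of the installed best route: "ospf", "static", ...
    metric: int


def type3_into_area(rib: dict, prefix: str, abr_id: str) -> Optional[dict]:
    """Model of the observed ABR rule for a non-zero area: originate a
    Type 3 LSA only for a prefix that is in the RIB via OSPF."""
    entry = rib.get(prefix)
    if entry is None:
        return None   # filtered out of the RIB, e.g. by a distribute list
    if entry.source != "ospf":
        return None   # in the RIB, but installed from another source (static)
    return {"link_state_id": prefix, "adv_router": abr_id, "metric": entry.metric}


# The three R3 cases from this post.
print(type3_into_area({"1.1.1.1/32": RibEntry("ospf", 21)}, "1.1.1.1/32", "3.3.3.3"))
print(type3_into_area({}, "1.1.1.1/32", "3.3.3.3"))  # None
print(type3_into_area({"1.1.1.1/32": RibEntry("static", 0)}, "1.1.1.1/32", "3.3.3.3"))  # None
```

Only the first case produces an LSA, with metric 21 as in the R3 output for Area 34.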

Let me know if you find this described in any OSPF RFC or Cisco documentation :).
Thanks to Narbik for mentoring.

Static inside and outside NAT example

Put simply, Network Address Translation (NAT) allows us to translate the source and/or destination address of an IP packet. There are a few reasons to do it, such as management, security or saving IP addresses. NAT is one of the technologies that has successfully delayed worldwide IPv6 deployment. First of all, let's explain the NAT terminology, which at first sight looks slightly confusing. The simple diagram below will help us understand the concept.

  • Inside local IP is how an inside address is seen locally by inside hosts, so from our LAN perspective it is the real IP of our PC.
  • Inside global IP is how an inside address is seen globally by outside hosts, so for outside hosts in the Internet it is the translated (NATed) IP of our host.
  • Outside local IP is how an outside address is seen locally by inside hosts, so from the LAN perspective it is the translated (NATed) IP of a host that resides outside our network, e.g. in the Internet. Hosts in the LAN use it as the destination IP address.
  • Outside global IP is how an outside address is seen globally by outside hosts, so from our LAN perspective it is the real IP address of a host that resides outside our network.

Inside translation is frequently used in today's networks. If we have 10000 hosts in our LAN and want to allow them to connect to Internet resources, we would need to provide a public IP address for each internal host. Such a large range of public IP addresses is expensive and sometimes not even available to a company that is not a service provider. NAT solves this: we can simply translate all inside IP addresses to one public IP using the Port Address Translation (PAT) feature. The first question that comes to mind is how the router can tell the packets apart when they come back from the Internet. PAT translates the IPs to one outside IP but additionally translates the Layer 4 source ports. Initially the router simply keeps the TCP or UDP source port and changes only the source IP, but if another host initiates a session with the same Layer 4 source port, the router takes the first free port. Thanks to this simple mechanism the router can create around 64,500 sessions per public IP (there are 65535 ports and the first 1024 are reserved).

The second example of inside NAT is static one-to-one translation. It allows us to hide a server's real IP address (frequently from a private IP range) behind a public IP and publish it in the Internet as a public service. Thanks to this we can hide our network infrastructure and again save public IP addresses, because all of our public services can be hosted under one IP.
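The PAT port selection described above can be sketched as a small table. This is a simplified model, not IOS internals: the PatTable class is hypothetical, it ignores the protocol and destination, treats ports below 1024 as reserved, and picks the first free port on a clash exactly as the paragraph describes.

```python
class PatTable:
    """Simplified PAT model: many inside hosts share one public IP and
    are told apart by the translated source port."""

    FIRST_PORT = 1024   # ports below 1024 are treated as reserved
    LAST_PORT = 65535

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.used_ports = set()
        self.sessions = {}  # (inside_ip, inside_port) -> (public_ip, public_port)

    def translate(self, inside_ip, inside_port):
        key = (inside_ip, inside_port)
        if key in self.sessions:
            return self.sessions[key]  # existing session keeps its mapping
        port = inside_port
        if port in self.used_ports or not self.FIRST_PORT <= port <= self.LAST_PORT:
            # Original port already taken (or reserved): take the first free one.
            port = next(
                (p for p in range(self.FIRST_PORT, self.LAST_PORT + 1)
                 if p not in self.used_ports),
                None,
            )
            if port is None:
                raise RuntimeError("no free ports left on " + self.public_ip)
        self.used_ports.add(port)
        self.sessions[key] = (self.public_ip, port)
        return self.sessions[key]


pat = PatTable("132.0.1.100")
print(pat.translate("10.0.12.1", 31740))  # ('132.0.1.100', 31740): port kept
print(pat.translate("10.0.12.5", 31740))  # ('132.0.1.100', 1024): port clash
```

The second inside host (10.0.12.5 is a hypothetical address) uses the same source port, so it is moved to the first free port. Ports 1024 to 65535 give 64512 usable ports per public IP in this model.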
Outside NAT translates the destination IP address (as seen from the inside). It is useful when our company has a business connection to a third party that uses IP addresses already used somewhere in our network.

Let's take a first example and configure static inside and outside NAT translations based on the diagram below.

First, the inside translation, i.e. the source IP translation (from the LAN perspective):
R2(config)#ip nat inside source static ?
A.B.C.D Inside local IP address
esp IPsec-ESP (Tunnel mode) support
network Subnet translation
tcp Transmission Control Protocol
udp User Datagram Protocol
R2(config)#ip nat inside source static 10.0.12.1 ?
A.B.C.D Inside global IP address
interface Specify interface for global address
R2(config)#ip nat inside source static 10.0.12.1 132.0.1.100

Inside done. All traffic from the PC with IP 10.0.12.1 will be translated to 132.0.1.100.
Next, let's define the IP address (192.168.1.7) under which the local PC will see the server and which it will use as the destination IP; the server's real (outside global) address is 200.1.2.3.

R2(config)#ip nat outside source static ?
A.B.C.D Outside global IP address
network Subnet translation
tcp Transmission Control Protocol
udp User Datagram Protocol
R2(config)#ip nat outside source static 200.1.2.3 ?
A.B.C.D Outside local IP address
R2(config)#ip nat outside source static 200.1.2.3 192.168.1.7

Next, we have to define the inside and outside interfaces:
R2(config)#int fa0/1
R2(config-if)#ip nat outside
R2(config-if)#int fa0/0
R2(config-if)#ip nat inside
R2(config-if)#no ip route-cache
R2#show ip nat translations
Pro Inside global      Inside local       Outside local      Outside global
--- ---                ---                192.168.1.7        200.1.2.3
--- 132.0.1.100        10.0.12.1          ---                ---

OK, all done. Two static one-to-one translations have been added to the NAT table. It's worth mentioning that this type of entry is reversible: a connection can be initiated from the inside or from the outside. With dynamic NAT it is impossible to initiate a connection from the outside, unless dynamic NAT with a route-map and the reversible keyword at the end is used.
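The two static entries can be modelled as lookup tables to show which address changes in each direction. This is a sketch of the concept only, using the addresses from this example; the function names are hypothetical.

```python
# Static entries from the example:
#   inside local -> inside global, outside global -> outside local
INSIDE_STATIC = {"10.0.12.1": "132.0.1.100"}
OUTSIDE_STATIC = {"200.1.2.3": "192.168.1.7"}


def _reverse(table):
    return {v: k for k, v in table.items()}


def inside_to_outside(src, dst):
    """Packet arriving on 'ip nat inside', leaving via 'ip nat outside'."""
    new_src = INSIDE_STATIC.get(src, src)             # inside local -> inside global
    new_dst = _reverse(OUTSIDE_STATIC).get(dst, dst)  # outside local -> outside global
    return new_src, new_dst


def outside_to_inside(src, dst):
    """Return traffic, or a session initiated from outside (static is reversible)."""
    new_src = OUTSIDE_STATIC.get(src, src)            # outside global -> outside local
    new_dst = _reverse(INSIDE_STATIC).get(dst, dst)   # inside global -> inside local
    return new_src, new_dst


print(inside_to_outside("10.0.12.1", "192.168.1.7"))  # ('132.0.1.100', '200.1.2.3')
print(outside_to_inside("200.1.2.3", "132.0.1.100"))  # ('192.168.1.7', '10.0.12.1')
```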

Let's run debug ip packet and generate test traffic by telnetting from the PC to the outside local IP address, 192.168.1.7.

R2#debug ip packet detail
IP packet debugging is on (detailed)
*Mar 1 02:32:44.819: IP: s=10.0.12.1 (FastEthernet0/0), d=192.168.1.7, len 44, unroutable
*Mar 1 02:32:44.823: TCP src=35649, dst=23, seq=391750710, ack=0, win=4128 SYN
*Mar 1 02:32:44.827: IP: tableid=0, s=10.0.12.2 (local), d=10.0.12.1 (FastEthernet0/0), routed via FIB
*Mar 1 02:32:44.831: IP: s=10.0.12.2 (local), d=10.0.12.1 (FastEthernet0/0), len 56, sending
*Mar 1 02:32:44.835: ICMP type=3, code=1

Hmm, 192.168.1.7 is unroutable. Does it mean that the router makes the routing decision before translation? For traffic going from inside to outside, yes.
Here is a short list of the Cisco IOS order of operations for inside-to-outside traffic:

  1. If IPsec, then check input access list
  2. Decryption—for Cisco Encryption Technology (CET) or IPsec
  3. Check input access list
  4. Check input rate limits
  5. Input accounting
  6. Policy routing
  7. Routing
  8. Redirect to Web cache
  9. NAT
  10. Crypto (check map and mark for encryption)
  11. Check output access list
  12. Inspect context-based access control (CBAC)
  13. TCP intercept
  14. Encryption
  15. Queueing
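The consequence of steps 7 and 9 for inside-to-outside traffic can be sketched as follows: the route lookup uses the destination exactly as the PC sent it (the outside local address), and only afterwards does NAT rewrite it. The routing table here is hypothetical; 10.0.23.3 is simply the next hop visible in the debug output of this example.

```python
import ipaddress


def route_lookup(routes, dst):
    """Longest-prefix match over {prefix: next hop}; None if unroutable."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]


def forward_inside_to_outside(dst, routes, outside_local_to_global):
    # Step 7 (routing) sees the pre-NAT destination, i.e. the outside local IP...
    next_hop = route_lookup(routes, dst)
    if next_hop is None:
        return "unroutable"
    # ...and only then step 9 (NAT) rewrites it to the outside global IP.
    return outside_local_to_global.get(dst, dst), next_hop


# Hypothetical R2 routing table before and after adding a route to 192.168.1.7.
routes = {"10.0.12.0/24": "Fa0/0", "10.0.23.0/24": "Fa0/1"}
nat = {"192.168.1.7": "200.1.2.3"}
print(forward_inside_to_outside("192.168.1.7", routes, nat))  # unroutable
routes["192.168.1.7/32"] = "10.0.23.3"
print(forward_inside_to_outside("192.168.1.7", routes, nat))  # ('200.1.2.3', '10.0.23.3')
```

For outside-to-inside traffic IOS performs NAT before routing, so this model applies only to the direction tested here.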

OK, let's add a route to 192.168.1.7 and see what happens:
R2#*Mar 1 03:23:34.311: IP: tableid=0, s=10.0.12.1 (FastEthernet0/0), d=192.168.1.7 (FastEthernet0/1), routed via FIB ---Routing over Fa0/1
*Mar 1 03:23:34.315: NAT: s=10.0.12.1->132.0.1.100, d=192.168.1.7 [318] --- source NAT
*Mar 1 03:23:34.319: NAT: s=132.0.1.100, d=192.168.1.7->200.1.2.3 [318] --- destination NAT
*Mar 1 03:23:34.323: IP: s=132.0.1.100 (FastEthernet0/0), d=200.1.2.3 (FastEthernet0/1), g=10.0.23.3, len 44, forward
*Mar 1 03:23:34.327: TCP src=31740, dst=23, seq=1913933555, ack=0, win=4128 SYN

R2#show ip nat translations
Pro Inside global      Inside local       Outside local      Outside global
--- ---                ---                192.168.1.7        200.1.2.3
tcp 132.0.1.100:31740  10.0.12.1:31740    192.168.1.7:23     200.1.2.3:23
--- 132.0.1.100        10.0.12.1          ---                ---

The connection was initiated successfully. The first SYN packet is routed and will be pushed out over Fa0/1; then inside (source) and outside (destination) NAT take place. The TCP source and destination ports, 31740 and 23 respectively, have been written to the NAT translation table. Enjoy NAT.

QoS with GRE and IPsec – qos pre-classify feature

Is it an issue to achieve proper QoS for IP traffic tunneled over GRE and encrypted by IPsec? For a Cisco router it is not. With GRE/IPsec the ToS byte is copied from the original IP header to the new GRE and IPsec IP headers. This is done by default without any specific configuration, and the process is called the "ToS Byte Preservation" feature.
For example, if the router gets an IP packet with DSCP 46, the packet is encapsulated in a GRE header and the ToS byte is copied to the ToS field of the GRE IP header. If this GRE packet is subject to IPsec encryption, the same process occurs and the ToS byte from the GRE header is copied into the IPsec IP header, all automatically. But what if the QoS policy has to classify and act on the original source IP and a specific TCP or UDP port? Those fields are no longer visible once the packet is encapsulated and encrypted. The solution is the QoS pre-classify feature. It allows the router to temporarily copy and store the Layer 3 and Layer 4 headers of the original IP packet and to classify, queue and schedule on the egress interface based on these values.
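Here is a small model of ToS byte preservation. Packets are plain dicts and encryption is not modelled at all; the addresses are hypothetical documentation ranges. It shows that DSCP 46 becomes ToS byte 184 (0xB8), that the byte survives both encapsulations, and that the outermost header carries no L4 ports, which is why pre-classify is needed for port-based policies.

```python
GRE, ESP = 47, 50  # IP protocol numbers


def dscp_to_tos(dscp, ecn=0):
    """DSCP is the upper 6 bits of the ToS byte, ECN the lower 2."""
    return (dscp << 2) | ecn


def encapsulate(inner, src, dst, proto):
    """New outer header; the ToS byte is copied from the inner header."""
    return {"src": src, "dst": dst, "proto": proto,
            "tos": inner["tos"], "payload": inner}


original = {"src": "10.1.1.10", "dst": "10.2.2.20", "proto": 6, "dport": 23,
            "tos": dscp_to_tos(46)}
gre = encapsulate(original, "192.0.2.1", "198.51.100.1", GRE)
esp = encapsulate(gre, "192.0.2.1", "198.51.100.1", ESP)

print(original["tos"], gre["tos"], esp["tos"])  # 184 184 184
print(esp["proto"], "dport" in esp)             # 50 False
```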

To achieve this, the feature has to be enabled both under the GRE tunnel interface and under the IPsec crypto map, as follows:
Router(config)#int tun 0
Router(config-if)#qos pre-classify
Router(config-if)#exit
Router(config)#crypto map MAPA 100 ipsec-isakmp
Router(config-crypto-map)#qos pre-classify

Now you are ready to configure the class-map and policy-map. Remember that in this case the service policy must be applied to the physical interface where the crypto map is already configured.

CoS and DSCP marking and remarking options on Catalyst switches

By default (with QoS disabled) a Cisco Catalyst switch does not look at any bits of the Layer 2 CoS or Layer 3 ToS field. This means that packets are transmitted in their original form and the CoS/ToS fields are left untouched.
Once we enable quality of service (QoS) for the entire switch using:
SW(config)#mls qos
QoS is enabled with the default parameters on all ports in the system. This means that by default the switch remarks CoS and ToS values to 0 (zero).
Once we enable trust under an interface, the system analyzes the CoS and ToS fields. We have two options: trusting CoS or trusting ToS (DSCP). If you would like to learn more about QoS terminology, take a look at this post.

Below you can find some examples and clarifications about specific options:

Trusting CoS
SW(config-if)#mls qos trust cos

  • Switch gets a packet with CoS=5
  • The switch passes the CoS value through untouched, but DSCP is rewritten based on the map table (with the default map CoS 5 sets DSCP 40; voice designs often remap it to 46 with mls qos map cos-dscp 0 8 16 24 32 46 48 56)
  • Conclusion: the DSCP value is set based on the mls qos map cos-dscp

Trusting DSCP
SW(config-if)#mls qos trust dscp

  • Switch gets a packet with CoS=4 and DSCP=46
  • The switch passes the DSCP value through untouched, but CoS is rewritten based on the map table (by default DSCP 46 maps to CoS 5)
  • Conclusion: the CoS value is set based on the mls qos map dscp-cos
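Both trust modes can be sketched with the default Catalyst maps: CoS x maps to DSCP 8x, and DSCP maps back to CoS through its top three bits. This is a simplified model (no policers, policy-maps or mutation maps); ingress_marking is a hypothetical name.

```python
# Catalyst defaults with mls qos enabled.
DEFAULT_COS_DSCP = [0, 8, 16, 24, 32, 40, 48, 56]


def dscp_to_cos(dscp):
    return dscp >> 3  # default DSCP -> CoS map: top three DSCP bits


def ingress_marking(cos, dscp, trust, cos_dscp_map=DEFAULT_COS_DSCP):
    """Return (CoS, DSCP) after ingress classification for one trust mode."""
    if trust == "cos":
        return cos, cos_dscp_map[cos]    # CoS kept, DSCP derived from it
    if trust == "dscp":
        return dscp_to_cos(dscp), dscp   # DSCP kept, CoS derived from it
    return 0, 0                          # untrusted port: both remarked to 0


print(ingress_marking(5, 0, "cos"))             # (5, 40) with the default map
voice_map = [0, 8, 16, 24, 32, 46, 48, 56]      # mls qos map cos-dscp 0 8 16 24 32 46 48 56
print(ingress_marking(5, 0, "cos", voice_map))  # (5, 46)
print(ingress_marking(4, 46, "dscp"))           # (5, 46)
```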

Assigning CoS to port
SW(config-if)#mls qos cos 5

  • Switch gets a frame without a QoS field, i.e. an untagged frame without the 802.1p field (as in the native VLAN)
  • The switch sets the default CoS value assigned to the port, in this case CoS 5 (the default is 0). The marked value (CoS 5) is later used to mark DSCP based on the mls qos map cos-dscp.
  • Conclusion: the default CoS is applied to all frames that carry no 802.1p tag (Layer 2 QoS field)

CoS overriding
SW(config-if)#mls qos cos 5
SW(config-if)#mls qos cos override

  • Switch gets a tagged frame with a CoS value of 4
  • The switch tags the frame with CoS 5, which is then used to mark DSCP based on the mls qos map cos-dscp
  • Conclusion: the switch sets the CoS of all frames, even those that already carry a CoS value, based on the value in mls qos cos x
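The difference between a port default CoS and CoS override can be expressed as one small decision. This sketch assumes the port trusts CoS and uses the default CoS-to-DSCP map; port_cos is a hypothetical name.

```python
DEFAULT_COS_DSCP = [0, 8, 16, 24, 32, 40, 48, 56]


def port_cos(frame_cos, default_cos=0, override=False):
    """frame_cos is None for untagged frames (no 802.1p field)."""
    if override:
        return default_cos   # 'mls qos cos override': always use the port CoS
    if frame_cos is None:
        return default_cos   # untagged: fall back to 'mls qos cos <value>'
    return frame_cos         # tagged, no override: keep the received CoS


for frame_cos, override in [(None, False), (4, False), (4, True)]:
    cos = port_cos(frame_cos, default_cos=5, override=override)
    print(frame_cos, override, "->", cos, DEFAULT_COS_DSCP[cos])
# None False -> 5 40
# 4 False -> 4 32
# 4 True -> 5 40
```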

Trusting DSCP just from Cisco IP Phone
SW(config-if)#mls qos trust dscp
SW(config-if)#mls qos trust device cisco-phone

  • Switch has a Cisco IP Phone connected (the phone is visible over CDP) that sends frames with DSCP=46
  • The switch passes the DSCP value through untouched; CoS is marked based on the mls qos map dscp-cos
  • Conclusion: DSCP is trusted only when a Cisco IP Phone is connected and reported via CDP on the respective interface; works in conjunction with the mls qos trust dscp and mls qos trust cos commands

Here are the QoS settings for a port with nothing connected (based on the above configuration):
SW#sh mls qos interface gi1/0/1
GigabitEthernet1/0/1
trust state: not trusted
trust mode: trust dscp
trust enabled flag: dis
COS override: dis
default COS: 0
DSCP Mutation Map: Default DSCP Mutation Map
Trust device: cisco-phone
qos mode: port-based

And here is the output once a Cisco IP Phone has been connected to the port:
SW#sh mls qos interface gi1/0/1
GigabitEthernet1/0/1
trust state: trusted
trust mode: trust dscp
trust enabled flag: ena
COS override: dis
default COS: 0
DSCP Mutation Map: Default DSCP Mutation Map
Trust device: cisco-phone
qos mode: port-based

As you can see, the trust enabled flag has changed to ena (enabled) and the trust state has changed to trusted, so the port is ready to trust DSCP.

No DSCP/IPP to CoS rewriting (3550 only)
SW(config-if)#mls qos trust dscp pass-through cos

  • Switch gets a packet with DSCP=46 and CoS=0
  • The switch passes the DSCP and CoS values through untouched, so DSCP=46 and CoS=0
  • Conclusion: the switch does not remark the CoS value

No CoS to DSCP rewrite (2960, 3560, 3750 only)
SW(config)#no mls qos rewrite ip dscp
SW#show mls qos
QoS is enabled
QoS ip packet dscp rewrite is disabled

  • Switch gets a packet with DSCP=46
  • The switch passes the DSCP value through untouched
  • Conclusion: CoS is trusted and DSCP is preserved; the switch does not modify the DSCP value and leaves it as it is in the outgoing packet

Matching traffic with a specific DSCP value in an ACL (VLAN-based)
SW(config)#interface FastEthernet 1/1
SW(config-if)#switchport access vlan 100
SW(config-if)#switchport voice vlan 110
SW(config-if)#spanning-tree portfast
SW(config-if)#mls qos vlan-based
SW(config-if)#srr-queue bandwidth shape 10 0 0 0
SW(config-if)#srr-queue bandwidth share 10 30 40 20
SW(config-if)#queue-set 1
SW(config-if)#priority-queue out
SW(config-if)#ip access-list extended RTP
SW(config-ext-nacl)#permit udp any any range 16384 32767 dscp 46
SW(config-ext-nacl)#class-map match-any VOICE
SW(config-cmap)#match access-group name RTP
SW(config-cmap)#policy-map POLICY-VOICE
SW(config-pmap)#class VOICE
SW(config-pmap-c)#set dscp af31
SW(config-pmap-c)#interface vlan 110
SW(config-if)#service-policy input POLICY-VOICE
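The classification configured above can be sketched as a function: traffic matching the RTP ACL (UDP, destination port 16384 to 32767, DSCP 46) is remarked to AF31 (DSCP 26), and CoS follows from the default DSCP-to-CoS map. Packets are plain dicts, the port numbers in the usage lines are hypothetical, and the non-matching case simply follows the observation below that markings are preserved.

```python
AF31 = 26  # 'set dscp af31'


def dscp_to_cos(dscp):
    return dscp >> 3  # default Catalyst DSCP -> CoS map


def policy_voice(pkt):
    """Model of class VOICE / policy POLICY-VOICE; returns (DSCP, CoS)."""
    if (pkt["proto"] == "udp"
            and 16384 <= pkt["dport"] <= 32767
            and pkt["dscp"] == 46):
        pkt = dict(pkt, dscp=AF31)  # set dscp af31
    return pkt["dscp"], dscp_to_cos(pkt["dscp"])


print(policy_voice({"proto": "udp", "dport": 20000, "dscp": 46}))  # (26, 3)
print(policy_voice({"proto": "udp", "dport": 5060, "dscp": 46}))   # (46, 5)
```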

  • Switch gets packet with DSCP=46 and CoS=0
  • The switch sets DSCP to 26 (AF31) and the CoS value based on the mls qos map dscp-cos map table
  • Conclusion: mls qos vlan-based overrides interface-level trust settings; the port does not clear the CoS/DSCP fields even though no trust is configured, so CoS/DSCP are preserved and can be matched by the class-map

If you have more or better examples, please share them with us in the comments. Enjoy!

    Testing MTU with ping tool

    The ping tool is very useful for discovering and understanding network behaviour related to protocols and services running on Cisco routers or MS Windows. Before we start testing, we first have to know how the tool is implemented on both systems.

    Datagram/Packet size (IP Total Length field) = IP Header + Payload
    As the name implies, it is the size of the whole datagram/packet, i.e. IP header + payload.

    With Cisco IOS the case is simple. In the example below the packet size is 1000 bytes, which means the ICMP Echo message is generated with the standard 20-byte IP header + 980 bytes of ICMP payload (ICMP type field: 1B, code: 1B, checksum: 2B, identifier: 2B, sequence number: 2B, data: 972B).

    R1#ping 10.0.23.3 repeat 1 size 1000
    Type escape sequence to abort.
    Sending 1, 1000-byte ICMP Echos to 10.0.23.3, timeout is 2 seconds:
    !
    Success rate is 100 percent (1/1), round-trip min/avg/max = 80/80/80 ms

    With Windows it's more complicated. There we define the data size (don't mix it up with the payload), so in this example 1000B means only the ICMP data. The resulting IP packet has a 20-byte IP header + 1008 bytes of ICMP payload (ICMP type field: 1B, code: 1B, checksum: 2B, identifier: 2B, sequence number: 2B, data: 1000B), 1028B in total. On an Ethernet network the frame will be 1042B (including the 14B Ethernet II header).

    C:\>ping 10.0.10.1 -l 1000 -n 1

    Here you can find a capture of the ICMP echo request generated by the Windows ping tool.

    To recap:
    Cisco ping size = total IP packet length
    Windows ping size = ICMP data only, without the 8B of ICMP fields (type, code, checksum, identifier and sequence number).
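The recap above boils down to a few additions. This sketch assumes an IPv4 header without options and counts the Ethernet II header but not the FCS; the function names are hypothetical.

```python
IP_HEADER = 20      # IPv4 header without options
ICMP_HEADER = 8     # type 1B + code 1B + checksum 2B + identifier 2B + sequence 2B
ETH_II_HEADER = 14  # Ethernet II header, FCS not counted


def cisco_ping(size):
    """IOS 'size' is the IP Total Length."""
    return {"ip_total": size,
            "icmp_data": size - IP_HEADER - ICMP_HEADER,
            "eth_frame": size + ETH_II_HEADER}


def windows_ping(data):
    """Windows '-l' is the ICMP data only."""
    ip_total = IP_HEADER + ICMP_HEADER + data
    return {"ip_total": ip_total,
            "icmp_data": data,
            "eth_frame": ip_total + ETH_II_HEADER}


print(cisco_ping(1000))    # {'ip_total': 1000, 'icmp_data': 972, 'eth_frame': 1014}
print(windows_ping(1000))  # {'ip_total': 1028, 'icmp_data': 1000, 'eth_frame': 1042}
```

So to produce the same 1500B IP packet on both systems, you would use size 1500 on IOS and -l 1472 on Windows.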

    Fragmentation and Maximum Transmission Unit (MTU)

    For IP, the MTU defines the maximum size of an IP packet (i.e. the IP Total Length field) that can be sent through a network device interface. As packets are encapsulated into frames at the data link layer, they have to be small enough to be transmitted by the physical transmission technology. If a packet is larger than the maximum size supported by the underlying network technology, it has to be divided into smaller IP chunks. This process is called IP fragmentation; putting the chunks back together at the other end of the transmission is called reassembly, and reassembly of IP packets is done at the IP destination.

    Fragmentation issues

    Fragmentation causes more overhead for the receiver than for the sender. The device responsible for fragmentation needs to create new headers and divide the original packets into fragments. On the other side, the receiver must allocate memory to properly handle all fragments and put them back together. This is not an issue for a final destination such as a host, but it can cause problems for routers.

    Several issues related to IP fragmentation mean that it should be avoided:

    • It requires slow-path processing on routers and additionally uses the receiver's CPU to reassemble the fragments (the forwarding is done in software, i.e. process switched).
    • When a router has to reassemble fragmented packets, it reserves the largest buffer available, because it does not know what size the original IP packet will be.
    • If one fragment of an IP packet is dropped, the entire IP packet must be resent and fragmented again.

     Fragmentation with GRE and IPsec tunnel

    The biggest issue with IP fragmentation is related to GRE tunnels. Let's take a look at the following example. Router A gets a 1476B packet, adds the GRE header (1476B + 24B = 1500B) and sends it to the tunnel destination. Assume there is a router in the MPLS cloud with a link MTU of 1400B. This router will fragment the GRE packet (every fragment gets a copy of the outer IP header, but the GRE header and the original inner IP and TCP headers will be present only in the first fragment), so the GRE tunnel destination router must reassemble the fragmented packets.
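The fragmentation in this example can be worked out with a short sketch of IPv4 fragment sizing: every fragment except the last carries a multiple of 8 data bytes, and offsets are counted in 8-byte units. It assumes a 20-byte outer header without options that is copied into each fragment; fragment is a hypothetical name.

```python
def fragment(total_length, mtu, header_len=20):
    """Split one IPv4 packet into fragments that fit the MTU.
    Returns (total length, fragment offset in 8-byte units, MF flag) tuples."""
    data_left = total_length - header_len
    max_data = (mtu - header_len) // 8 * 8  # non-last fragments: multiple of 8
    fragments = []
    offset = 0
    while data_left > 0:
        # The last fragment may carry any length that still fits the MTU.
        size = data_left if data_left <= mtu - header_len else max_data
        data_left -= size
        fragments.append((header_len + size, offset // 8, data_left > 0))
        offset += size
    return fragments


# 1476B packet + 24B GRE = 1500B, crossing a 1400B link in the MPLS cloud.
print(fragment(1500, 1400))  # [(1396, 0, True), (124, 172, False)]
```

The 1500B GRE packet becomes a 1396B and a 124B fragment, and the tunnel destination has to put them back together.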

    The best way to avoid IP fragmentation is to increase the MTU along the whole path (i.e. on each router on the path). With an MPLS provider this is usually not an issue, because more and more providers have already increased this value up to 1600B to support IPsec or GRE without any problems. If the provider does not support a higher MTU, or the transit network is the Internet, we have three options: decrease the IP MTU on the tunnel interface, adjust the MSS value, or use PMTUD. Let's move on from fragmentation theory and see how ping can help us discover fragmentation issues related to the MTU along the path and how they can be avoided.

    Example – pure network

    Suppose that we have a plain network environment with two CE routers, one in branch A and the other in remote branch B, and we want to confirm that the MTU on the path, especially the minimum MTU in the provider's MPLS cloud, is 1500B.

    To test the MTU on the path, the best option is to send packets while incrementing the overall size each time. The extended ping feature with the sweep option is perfect for this. In the example below we send ICMP echo requests with an overall size starting at 1490B, set the sweep max size to 1510B and leave the sweep interval at 1, so each packet is 1B larger than the previous one. The important option in this example is setting the Don't Fragment (DF) bit.
    R1#p
    Protocol [ip]:
    Target IP address: 10.0.23.3
    Repeat count [5]:
    Datagram size [100]:
    Timeout in seconds [2]:
    Extended commands [n]: y
    Source address or interface:
    Type of service [0]:
    Set DF bit in IP header? [no]: y
    Validate reply data? [no]:
    Data pattern [0xABCD]:
    Loose, Strict, Record, Timestamp, Verbose[none]:
    Sweep range of sizes [n]: y
    Sweep min size [36]: 1490
    Sweep max size [18024]: 1510
    Sweep interval [1]:
    Type escape sequence to abort.
    Sending 105, [1490..1510]-byte ICMP Echos to 10.0.23.3, timeout is 2 seconds:
    Packet sent with the DF bit set
    !!!!!!!!!!!..........!!!!!!!!!!!..........!!!!!!!!!!!..........!!!!!!!
    !!!!..........!!!!!!.
    Success rate is 54 percent (50/91), round-trip min/avg/max = 40/99/224 ms

    The first 11 ICMP echo requests (1490B up to 1500B) got responses, but the next 10 (1501B up to 1510B) were not answered at all, which means these packets could not reach the final destination because of the MTU. In fact, packets longer than 1500B were dropped by the tested router itself, which has an MTU of 1500B configured on its interfaces. In the debug below, the 1499B and 1500B echo requests get echo replies (ICMP type 0), while the 1501B request gets none (the next probe is sent only after the 2-second timeout).
    *Mar 1 00:16:42.179: IP: tableid=0, s=10.0.12.1 (local), d=10.0.23.3 (FastEthernet0/0), routed via FIB
    *Mar 1 00:16:42.183: IP: s=10.0.12.1 (local), d=10.0.23.3 (FastEthernet0/0), len 1499, sending
    *Mar 1 00:16:42.183: ICMP type=8, code=0
    *Mar 1 00:16:42.219: IP: tableid=0, s=10.0.23.3 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), routed via RIB
    *Mar 1 00:16:42.223: IP: s=10.0.23.3 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), len 1499, rcvd 3
    *Mar 1 00:16:42.227: ICMP type=0, code=0
    *Mar 1 00:16:42.231: IP: tableid=0, s=10.0.12.1 (local), d=10.0.23.3 (FastEthernet0/0), routed via FIB
    *Mar 1 00:16:42.231: IP: s=10.0.12.1 (local), d=10.0.23.3 (FastEthernet0/0), len 1500, sending
    *Mar 1 00:16:42.231: ICMP type=8, code=0
    *Mar 1 00:16:42.283: IP: tableid=0, s=10.0.23.3 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), routed via RIB
    *Mar 1 00:16:42.287: IP: s=10.0.23.3 (FastEthernet0/0), d=10.0.12.1 (FastEthernet0/0), len 1500, rcvd 3
    *Mar 1 00:16:42.291: ICMP type=0, code=0
    *Mar 1 00:16:42.299: IP: tableid=0, s=10.0.12.1 (local), d=10.0.23.3 (FastEthernet0/0), routed via FIB
    *Mar 1 00:16:42.299: IP: s=10.0.12.1 (local), d=10.0.23.3 (FastEthernet0/0), len 1501, sending
    *Mar 1 00:16:42.303: ICMP type=8, code=0
    *Mar 1 00:16:44.295: IP: tableid=0, s=10.0.12.1 (local), d=10.0.23.3 (FastEthernet0/0), routed via FIB
    *Mar 1 00:16:44.299: IP: s=10.0.12.1 (local), d=10.0.23.3 (FastEthernet0/0), len 1502, sending

    The router sends 21 packets per sweep (from 1490B up to 1510B) and then starts the sweep again, which is why we see exclamation marks, then dots, then exclamation marks again. The first 11 packets of each sweep (1490B up to 1500B) were answered, so the last successful one was 1500B long; the larger ones were dropped because the router has a 1500B MTU configured and could not fragment them with the DF bit set. The purpose of this example was just to show the testing method.
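One pass of the sweep can be modelled to show where the pattern of exclamation marks and dots comes from. This sketch treats every probe larger than the largest size that gets through as unanswered and prints it as '.', even though IOS may print 'M' when a router on the path reports that it could not fragment; sweep is a hypothetical name.

```python
def sweep(min_size, max_size, max_ok_size, interval=1):
    """One pass of a DF-bit sweep: '!' for a reply, '.' for no reply."""
    return "".join("!" if size <= max_ok_size else "."
                   for size in range(min_size, max_size + 1, interval))


first_pass = sweep(1490, 1510, 1500)
print(len(first_pass), first_pass.count("!"))  # 21 11
```

Each pass sends 21 probes and 11 of them (1490B to 1500B) are answered, which matches the repeating pattern in the output above.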

    Example 2 – GRE over IPsec

    Fragmentation can be an issue with GRE over IPsec, where the endpoint terminating the IPsec tunnel has to add the GRE header and then encrypt everything together. The GRE header adds 24B to the packet and IPsec ESP 56B, so we have 80B of overhead. As mentioned, PMTUD is one option here, but if routers on the path block ICMP messages, the best option is to set an optimum MTU under the GRE tunnel interface. The GRE source router will then be responsible for IP fragmentation before adding the GRE and IPsec headers. To calculate the tunnel MTU, the best option is to use a sweep ping as above.

    Let's assume that we have GRE/IPsec between R1 and R3. We are experiencing slow application response times and high CPU on R3, and we suspect that IP fragmentation is the culprit. We want to find out the minimal MTU on the path in order to adjust the tunnel MTU and avoid IP fragmentation.
    We cannot simply ping with the DF bit set through the tunnel to simulate GRE/IPsec traffic, because the DF bit will not be carried into the IPsec header: the GRE tunnel clears the DF bit unless tunnel path-mtu-discovery is configured under the GRE tunnel interface. Instead, we have to ping over plain IP, for example the remote head end of the IPsec tunnel, making sure this traffic will not be encrypted.

    Test – minimal MTU on the path

    R1#ping
    Protocol [ip]:
    Target IP address: 10.0.23.3
    Repeat count [5]:
    Datagram size [100]:
    Timeout in seconds [2]:
    Extended commands [n]: y
    Source address or interface:
    Type of service [0]:
    Set DF bit in IP header? [no]: y
    Validate reply data? [no]:
    Data pattern [0xABCD]:
    Loose, Strict, Record, Timestamp, Verbose[none]:
    Sweep range of sizes [n]: y
    Sweep min size [36]: 1300
    Sweep max size [18024]: 1500
    Sweep interval [1]:
    Type escape sequence to abort.
    Sending 1005, [1300..1500]-byte ICMP Echos to 10.0.23.3, timeout is 2 seconds:
    Packet sent with the DF bit set
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!M.M.M.M.M.M.M.M.M.M.M.M.M.M.M.M.M.M.M.M
    .M.M.M.
    Success rate is 68 percent (101/147), round-trip min/avg/max = 4/19/60 ms

    After 101 successful requests, i.e. responses for ICMP packets from 1300B up to 1400B, we can see that the minimal MTU on the path is 1400B. M means "could not fragment". The recommendation is then to lower the IP MTU under the GRE tunnel to the minimal path MTU minus 80B (24B for GRE and 56B for ESP tunnel-mode IPsec), i.e. 1400B - 80B = 1320B.
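The tunnel MTU calculation can be sketched from the sweep results. The 24B GRE and 56B ESP figures are the ones used in this post; real ESP overhead varies with the cipher, authentication and padding, so treat the constant as an assumption. The sweep results here are simulated to match the test above.

```python
GRE_OVERHEAD = 24   # new IP header + GRE header
ESP_OVERHEAD = 56   # figure used in this post for ESP tunnel mode (varies in practice)


def min_path_mtu(results):
    """Largest DF-set probe that still got a reply, from (size, replied) pairs."""
    return max(size for size, replied in results if replied)


def tunnel_ip_mtu(path_mtu, overhead=GRE_OVERHEAD + ESP_OVERHEAD):
    return path_mtu - overhead


# Simulated sweep results matching the test above: replies up to 1400B.
results = [(size, size <= 1400) for size in range(1300, 1501)]
path_mtu = min_path_mtu(results)
print(path_mtu, tunnel_ip_mtu(path_mtu))  # 1400 1320
```

The resulting value would then be configured with ip mtu 1320 under the GRE tunnel interface.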

    To read more about IP fragmentation, MTU, MSS and PMTUD issues with GRE and IPsec, I recommend this very good Cisco Technology White Paper.