OpenBSD CVS

CVS log for src/sys/net/if_veb.c


Default branch: MAIN


Revision 1.35, Tue Feb 13 12:22:09 2024 UTC (3 months, 3 weeks ago) by bluhm
Branch: MAIN
CVS Tags: OPENBSD_7_5_BASE, OPENBSD_7_5, HEAD
Changes since 1.34: +1 -2 lines

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6.  Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h.  Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route.  Internet PCB
and TCP SYN cache can use a plain struct route now.  All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@
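To make the size problem concrete, here is a minimal userland sketch, assuming a union-based layout. The type sketch_route and its helpers are hypothetical and are not the definition in net/route.h. It shows why a plain struct sockaddr member cannot hold an inet6 destination, and why the header that defines sockaddr_in6 has to be visible wherever such a struct is declared.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical route holder: the destination is a union, so one type can
 * carry either an inet or an inet6 address.  A bare struct sockaddr would
 * be too small for struct sockaddr_in6, which is why <netinet/in.h> must
 * be visible wherever this struct is defined.
 */
struct sketch_route {
	union {
		struct sockaddr		sa;
		struct sockaddr_in	sin;
		struct sockaddr_in6	sin6;
	} ro_dst;
};

/* Store an IPv4 destination. */
static void
sketch_route_set_in(struct sketch_route *ro, const struct in_addr *a)
{
	memset(&ro->ro_dst, 0, sizeof(ro->ro_dst));
	ro->ro_dst.sin.sin_family = AF_INET;
	ro->ro_dst.sin.sin_addr = *a;
}

/* Store an IPv6 destination in the same storage. */
static void
sketch_route_set_in6(struct sketch_route *ro, const struct in6_addr *a)
{
	memset(&ro->ro_dst, 0, sizeof(ro->ro_dst));
	ro->ro_dst.sin6.sin6_family = AF_INET6;
	ro->ro_dst.sin6.sin6_addr = *a;
}
```

Code inspecting the route can then switch on ro_dst.sa.sa_family to decide which member is valid.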

Revision 1.34, Sat Dec 23 10:52:54 2023 UTC (5 months, 2 weeks ago) by bluhm
Branch: MAIN
Changes since 1.33: +3 -1 lines

Back out "always allocate per-CPU statistics counters for network
interface descriptor" (revision 1.33).  It panics during attach of
the em(4) device at boot.

Revision 1.33, Fri Dec 22 23:01:50 2023 UTC (5 months, 2 weeks ago) by mvs
Branch: MAIN
Changes since 1.32: +1 -3 lines

Always allocate per-CPU statistics counters for network interface
descriptor.

Network interface statistics are a mess. Only pseudo drivers
allocate per-CPU counters; all other network devices use the old
`if_data'. The network stack partially uses per-CPU counters and
partially uses `if_data', but the protection is inconsistent:
sometimes counters are accessed with the exclusive netlock, sometimes
with the shared netlock, sometimes with the kernel lock but without
the netlock, and sometimes with other locks.

To make network interface statistics more consistent, always allocate
per-CPU counters at interface attachment time and use them instead of
`if_data'. This step only moves counter allocation into the
if_attach() internals. The `if_data' removal will be done in
follow-up diffs to make review and testing easier.

ok bluhm
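A rough userland sketch of the per-CPU counter idea, assuming a fixed CPU count and a caller that passes the CPU it is running on. The sketch_pcpu_counters type and its functions are hypothetical and are not the kernel's counters API. Each CPU increments only its own row, so the packet path needs no shared lock, and a reader sums all rows.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NCPU	4	/* hypothetical CPU count */

enum sketch_counter {
	SC_IPACKETS,
	SC_OPACKETS,
	SC_NCOUNTERS
};

/* One row of counters per CPU; each CPU only ever writes its own row. */
struct sketch_pcpu_counters {
	uint64_t	c[SKETCH_NCPU][SC_NCOUNTERS];
};

/* Packet path: the caller is assumed to pass the CPU it runs on. */
static void
sketch_counter_inc(struct sketch_pcpu_counters *pc, unsigned int cpu,
    enum sketch_counter which)
{
	assert(cpu < SKETCH_NCPU);
	pc->c[cpu][which]++;
}

/* Slow path (e.g. reporting statistics): sum every CPU's row. */
static uint64_t
sketch_counter_read(const struct sketch_pcpu_counters *pc,
    enum sketch_counter which)
{
	uint64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < SKETCH_NCPU; cpu++)
		sum += pc->c[cpu][which];
	return sum;
}
```

A real kernel implementation also has to cope with preemption and with torn 64-bit reads on some architectures; the sketch ignores both.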

Revision 1.32, Thu Nov 23 23:45:10 2023 UTC (6 months, 2 weeks ago) by dlg
Branch: MAIN
Changes since 1.31: +8 -7 lines

avoid passing weird mbuf chains to pf when pushing out a veb.

pf expects the ip header to be in the first mbuf of the chain we
pass to pf_test, but in some situations the ethernet header is the
only data in the first mbuf. after we remove the ethernet header,
the first mbuf has no data in it, which confuses pf. fix this by
passing all packets to ip_check on output as well as input. ip input
handlers do all the necessary m_pullups.

found by Mark Patruck.
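The failure mode can be illustrated with a toy buffer chain. This is a hypothetical sketch: sketch_buf, sketch_adj and sketch_pullup imitate the effect of stripping the ethernet header and then pulling the IP header into the first buffer, which is what the m_pullup calls in the ip input handlers achieve. It is not how mbufs or m_pullup are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUFSZ	128

/* Toy stand-in for an mbuf: fixed storage, a fill length, a next link. */
struct sketch_buf {
	struct sketch_buf	*next;
	size_t			 len;
	unsigned char		 data[SKETCH_BUFSZ];
};

/* Drop n leading bytes from the first buffer (e.g. the ethernet header). */
static void
sketch_adj(struct sketch_buf *b, size_t n)
{
	assert(n <= b->len);
	memmove(b->data, b->data + n, b->len - n);
	b->len -= n;
}

/*
 * Make the first n bytes of the chain contiguous in the head buffer by
 * pulling bytes forward from later buffers.  Emptied buffers are unlinked
 * (their storage belongs to the caller in this sketch).  Returns 0 on
 * success, -1 if the chain is too short or n does not fit in one buffer.
 */
static int
sketch_pullup(struct sketch_buf *head, size_t n)
{
	struct sketch_buf *next;
	size_t take;

	if (n > SKETCH_BUFSZ)
		return -1;
	while (head->len < n && (next = head->next) != NULL) {
		take = n - head->len;
		if (take > next->len)
			take = next->len;
		memcpy(head->data + head->len, next->data, take);
		head->len += take;
		memmove(next->data, next->data + take, next->len - take);
		next->len -= take;
		if (next->len == 0)
			head->next = next->next;
	}
	return head->len >= n ? 0 : -1;
}
```

After sketch_adj removes a 14-byte ethernet header from a head buffer that held nothing else, the head is empty; a consumer that looks only at the head buffer, as pf does, sees no IP header until sketch_pullup runs.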

Revision 1.31, Tue May 16 14:32:54 2023 UTC (12 months, 3 weeks ago) by jan
Branch: MAIN
CVS Tags: OPENBSD_7_4_BASE, OPENBSD_7_4
Changes since 1.30: +2 -2 lines

Use separate IFCAPs for LRO and TSO.

This diff introduces separate capabilities for TCP offloading.  We split this
into LRO (large receive offloading) and TSO (TCP segmentation offloading).
LRO can be turned on/off via the tcprecvoffload option of ifconfig and is not
inherited by sub interfaces.

TSO is inherited by sub interfaces to signal this hardware offloading capability
to the network stack.

With tweaks from bluhm, claudio and dlg

ok bluhm, claudio
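A minimal sketch of the inheritance rule described above, using hypothetical bit values rather than the real IFCAP_* constants: a sub interface keeps the parent's TSO capability but not LRO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; these are not the real IFCAP_* values. */
#define SKETCH_CAP_TSO	0x1u	/* hardware can segment large TCP sends */
#define SKETCH_CAP_LRO	0x2u	/* hardware can coalesce received segments */

/*
 * Capabilities a newly created sub interface (for example a vlan on a
 * physical port) takes from its parent: TSO is passed on so the stack
 * may hand large segments down through it; LRO is not inherited.
 */
static uint32_t
sketch_subif_caps(uint32_t parent_caps)
{
	return parent_caps & ~SKETCH_CAP_LRO;
}
```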

Revision 1.30, Mon Feb 27 09:35:32 2023 UTC (15 months, 2 weeks ago) by jan
Branch: MAIN
CVS Tags: OPENBSD_7_3_BASE, OPENBSD_7_3
Changes since 1.29: +3 -1 lines

Turn off TSO if interface is added to layer 2 devices.

ok bluhm@, claudio@

Revision 1.29, Wed Jun 1 17:34:13 2022 UTC (2 years ago) by sashan
Branch: MAIN
CVS Tags: OPENBSD_7_2_BASE, OPENBSD_7_2
Changes since 1.28: +3 -3 lines

callers of pf(4) must continue to work with the packet as returned
by the firewall.

OK dlg@

Revision 1.28, Sun May 15 21:37:29 2022 UTC (2 years ago) by bluhm
Branch: MAIN
Changes since 1.27: +9 -16 lines

Use strncmp() and IFNAMSIZ for if_xname in veb(4) consistently.
OK dlg@

Revision 1.27, Sun May 15 03:54:07 2022 UTC (2 years ago) by deraadt
Branch: MAIN
Changes since 1.26: +2 -2 lines

gcc insists that the declaration of veb_ports_free also use inline

Revision 1.26, Sun May 15 03:18:41 2022 UTC (2 years ago) by dlg
Branch: MAIN
Changes since 1.25: +332 -78 lines

avoid calling if_enqueue from an smr critical section.

claudio@ is right that as a rule of thumb it is a bad idea to call
arbitrary code from an smr crit section because the scope of what
is called is very hard to keep in your head. in this particular
case sashan@ points out that if_enqueue can call vport handlers,
which call if_vinput, which will push a packet into the network
stack, which will call pf and try to take an rwlock. you can't sleep
in an smr crit section.

SMRs in this situation are protecting references to ports in the
list of span and actual ports attached to a veb. when we needed to
send an unknown unicast, broadcast, or multicast packet, the code
would SMR_TAILQ_FOREACH over all the ports, duplicating the mbuf
and calling if_enqueue against each port. span port handling is
basically the same, but we unconditionally send to them.

this replaces the SMR_TAILQ with maps (arrays) of ports. the veb
port map data structure contains a struct refcnt and the number of
ports. the forwarding paths use an SMR crit section to get a reference
to the map, increase the refcnt, and then leave the smr crit section
before iterating over the array of ports in the map. after the
iteration it releases the refcnt.

this does add a couple of atomic ops in the forwarding path, but
only in the uncommon case (most packets are (should be) to known
unicast addresses), and it's only one set of ops for all ports
instead of ops per port. the known unicast case follows this pattern
too.

reported by Barbaros Bilek on bugs@
fix tested by me and hrvoje popovski
ok claudio@ sashan@ bluhm@ (who also did a lot of the initial analysis)
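The map-plus-refcount pattern can be sketched in userland C11. Everything here is hypothetical (sketch_port_map, sketch_map_take, sketch_flood). Because userland has no SMR, the load-then-increment in sketch_map_take is only safe single-threaded; in veb that window is covered by the SMR critical section described above. The point is the shape: one reference taken per flood, the port array walked with no critical section held, and one release at the end.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct sketch_port {
	int		id;
	unsigned int	enqueued;	/* packets "sent" to this port */
};

/* Stand-in for if_enqueue(); in the kernel this may end up sleeping. */
static void
sketch_port_enqueue(struct sketch_port *p)
{
	p->enqueued++;
}

/* An immutable snapshot of the ports attached to the bridge. */
struct sketch_port_map {
	atomic_uint		 refs;
	unsigned int		 nports;
	struct sketch_port	*ports[];
};

/* The bridge's current map; a writer would swap in a new one on change. */
static _Atomic(struct sketch_port_map *) sketch_cur_map;

static struct sketch_port_map *
sketch_map_alloc(unsigned int nports)
{
	struct sketch_port_map *m;

	m = malloc(sizeof(*m) + nports * sizeof(m->ports[0]));
	if (m == NULL)
		return NULL;
	atomic_init(&m->refs, 1);	/* reference owned by sketch_cur_map */
	m->nports = nports;
	return m;
}

static struct sketch_port_map *
sketch_map_take(void)
{
	struct sketch_port_map *m;

	/*
	 * In veb this load and increment happen inside an SMR read
	 * section, which keeps the map alive in between.  Without SMR
	 * this sketch is only safe when used from a single thread.
	 */
	m = atomic_load(&sketch_cur_map);
	if (m != NULL)
		atomic_fetch_add(&m->refs, 1);
	return m;
}

static void
sketch_map_rele(struct sketch_port_map *m)
{
	if (atomic_fetch_sub(&m->refs, 1) == 1)
		free(m);
}

/* Flood one packet: pin the map once, then enqueue to every port. */
static unsigned int
sketch_flood(void)
{
	struct sketch_port_map *m;
	unsigned int i, n;

	m = sketch_map_take();
	if (m == NULL)
		return 0;
	for (i = 0; i < m->nports; i++)
		sketch_port_enqueue(m->ports[i]);
	n = m->nports;
	sketch_map_rele(m);
	return n;
}
```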

Revision 1.25, Tue Jan 4 06:32:39 2022 UTC (2 years, 5 months ago) by yasuoka
Branch: MAIN
CVS Tags: OPENBSD_7_1_BASE, OPENBSD_7_1
Changes since 1.24: +2 -2 lines

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees.  ipsp_ids_lookup() returns `ids' with a bumped reference
counter.  Original diff from mvs.

ok mvs

Revision 1.24, Tue Dec 28 23:13:20 2021 UTC (2 years, 5 months ago) by dlg
Branch: MAIN
Changes since 1.23: +1 -2 lines

whitespace tweak, no functional change.

Revision 1.23, Tue Dec 28 23:10:58 2021 UTC (2 years, 5 months ago) by dlg
Branch: MAIN
Changes since 1.22: +5 -0 lines

it doesn't make sense to configure a vport as a span port.

Revision 1.22, Tue Dec 28 23:10:30 2021 UTC (2 years, 5 months ago) by dlg
Branch: MAIN
Changes since 1.21: +47 -24 lines

move away from using the M_PROTO1 flag to prevent loops with vports

if a vlan interface is configured on a vport interface, vlan(4)
will take the packet away from ether_input before the veb bridge
input handler gets to clear M_PROTO1. this leaves the flag on the
mbuf as it goes through the l3 stacks. if it goes back out a vport
into a veb, the presence of M_PROTO1 means the packet ends up getting
dropped, which is unexpected.

this diff specialises vport handling by veb even more to avoid the
problem the flag was handling. vports get their own bridge input
handler that skips veb processing completely, because a packet can
only be received on a vport if a veb has decided to forward it
there and has already processed it. when the stack sends a packet
out a vport interface, then we do actual veb bridge input handling.

bug reported on misc@ and the fix tested by Simon Baker

Revision 1.21, Mon Nov 8 04:15:46 2021 UTC (2 years, 7 months ago) by dlg
Branch: MAIN
Changes since 1.20: +7 -4 lines

veb rules are an smr list, so traversal should be in an smr crit section

reported by stsp@
an earlier diff was tested by and ok stsp@
ok jmatthew@

Revision 1.20, Wed Jul 7 20:19:01 2021 UTC (2 years, 11 months ago) by sashan
Branch: MAIN
CVS Tags: OPENBSD_7_0_BASE, OPENBSD_7_0
Changes since 1.19: +31 -4 lines

tell ether_input() to call pf_test() outside of smr_read sections,
because smr_read sections don't play well with sleeping locks in pf(4).

OK bluhm@

Revision 1.19, Wed Jun 2 00:44:18 2021 UTC (3 years ago) by dlg
Branch: MAIN
Changes since 1.18: +32 -9 lines

use ipv4_check and ipv6_check to, well, check ip headers before running pf.

unlike bridge(4), these checks are only run when the packet is
entering the veb/tpmr topology. the assumption is that only valid
IP packets end up inside the topology so we don't have to check
them when they're leaving.

ok bluhm@ sashan@
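For reference, this is roughly the kind of IPv4 header validation such a check performs, written as a standalone sketch. sketch_ipv4_check and sketch_in_cksum are hypothetical names, and the exact set of checks in the kernel's ipv4_check may differ. It verifies the version, header length, total length and header checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 internet checksum over n bytes. */
static uint16_t
sketch_in_cksum(const uint8_t *p, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < n; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (n & 1)
		sum += (uint32_t)p[n - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Hypothetical header sanity check run before pf sees a packet.
 * buf points at the IPv4 header and len is the number of bytes
 * available.  Returns 0 if the header looks valid, -1 otherwise.
 */
static int
sketch_ipv4_check(const uint8_t *buf, size_t len)
{
	size_t hlen, total;

	if (len < 20)
		return -1;
	if ((buf[0] >> 4) != 4)			/* version */
		return -1;
	hlen = (size_t)(buf[0] & 0x0f) * 4;	/* header length in bytes */
	if (hlen < 20 || hlen > len)
		return -1;
	total = (size_t)buf[2] << 8 | buf[3];	/* ip_len, network order */
	if (total < hlen || total > len)
		return -1;
	if (sketch_in_cksum(buf, hlen) != 0)	/* sums to 0 when valid */
		return -1;
	return 0;
}
```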

Revision 1.18, Thu May 27 03:43:23 2021 UTC (3 years ago) by dlg
Branch: MAIN
Changes since 1.17: +3 -1 lines

ajacouto says i missed copying some bits from bridge for divert-to.

Revision 1.17, Wed May 26 02:38:01 2021 UTC (3 years ago) by dlg
Branch: MAIN
Changes since 1.16: +11 -1 lines

support divert-to when pf applies it to a packet.

when a divert-to rule applies to a packet, pf doesn't take the packet
away and shove it in the socket directly. pf marks the packet, and
then ip (or ipv6) input processing looks at the mark and picks the
local socket to queue it on. because veb operates at layer 2, ip
input processing only occurred if the packet was destined to go
into a vport interface.

bridge(4) handles this by checking whether the packet has the pf
divert-to mark set on it and calling ip input if it is. this copies the
semantic to veb.

this allows divert-to to steal (take?) packets going over a veb and
process them on a local socket.

reported by ajacatot@
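The decision this revision adds can be reduced to a small sketch. The mark bit and the types are hypothetical stand-ins, not pf's actual mbuf flags: after pf runs, a frame carrying the divert mark is handed to the local stack instead of being switched at layer 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mark bit that pf would set when a divert-to rule matches. */
#define SKETCH_PF_DIVERTED	0x1u

struct sketch_pkt {
	unsigned int	pf_flags;
	bool		to_vport;	/* veb chose to forward it into a vport */
};

enum sketch_next {
	SKETCH_FORWARD_L2,	/* keep switching the frame at layer 2 */
	SKETCH_LOCAL_STACK	/* ip/ipv6 input picks the local socket */
};

/*
 * Called after pf has run on a frame crossing the bridge.  A diverted
 * frame goes to the local stack even though it is not headed for a
 * vport, because only ip input knows how to queue it on the socket.
 */
static enum sketch_next
sketch_after_pf(const struct sketch_pkt *p)
{
	if (p->pf_flags & SKETCH_PF_DIVERTED)
		return SKETCH_LOCAL_STACK;
	if (p->to_vport)
		return SKETCH_LOCAL_STACK;	/* vports feed the stack anyway */
	return SKETCH_FORWARD_L2;
}
```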

Revision 1.16, Wed Mar 10 10:21:48 2021 UTC (3 years, 3 months ago) by jsg
Branch: MAIN
CVS Tags: OPENBSD_6_9_BASE, OPENBSD_6_9
Changes since 1.15: +2 -2 lines

spelling

ok gnezdo@ semarie@ mpi@

Revision 1.15, Fri Mar 5 06:44:09 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.14: +10 -6 lines

pass the uint64_t dst ethernet address from ether_input to bridges.

tested on amd64 and sparc64.

Revision 1.14, Wed Mar 3 00:00:03 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.13: +2 -3 lines

clean up span ports as span ports, not bridge ports.

the visible result of this is that span ports aren't made promisc
like bridge ports. when cleaning up a span port, trying to take
promisc off it screwed up the refs and left the underlying
interface unable to be promisc when it should be.

found by dave voutila
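Promiscuous mode is reference counted, which is why an unbalanced release matters. Here is a hypothetical model (sketch_ifnet and sketch_promisc are illustrative, not the kernel's ifpromisc): the first user turns the mode on and the last release turns it off, so a span port that releases a reference it never took can switch the mode off underneath another user.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy interface: promiscuous mode stays on while any user holds a ref. */
struct sketch_ifnet {
	unsigned int	pcount;
	bool		promisc;
};

static int
sketch_promisc(struct sketch_ifnet *ifp, bool on)
{
	if (on) {
		if (ifp->pcount++ == 0)
			ifp->promisc = true;	/* first user turns it on */
		return 0;
	}
	if (ifp->pcount == 0)
		return -1;			/* unbalanced release */
	if (--ifp->pcount == 0)
		ifp->promisc = false;		/* last user turns it off */
	return 0;
}
```

In the test scenario one legitimate user enables promisc, then a span port cleanup wrongly releases a reference, and the interface leaves promiscuous mode even though that user still needs it.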

Revision 1.13, Tue Mar 2 23:40:06 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.12: +4 -3 lines

fix an assert in veb_p_ioctl() that failed when called by a span port.

veb_p_ioctl() is used by both veb bridge and veb span ports, but
it had an assert to check that it was being called by a veb bridge
port. this extends the check so using it on a span port doesn't cause
a panic.

found by dave voutila

Revision 1.12, Fri Feb 26 01:57:20 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.11: +22 -3 lines

try to do a better job of filtering 802.1 reserved group addresses.

if the bridge is supposed to carry vlan packets, assume it's an
s-vlan component that should allow certain group addresses to cross
between "customer" bridges.

i should probably let some of these groups fall back through to the
calling ether_input rather than drop them.

Revision 1.11, Fri Feb 26 01:42:47 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.10: +26 -26 lines

use uint64_ts for ethernet addresses in the src/dst bits of rules.

Revision 1.10, Fri Feb 26 01:28:51 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.9: +8 -22 lines

use a uint64_t for the ethernet address in the etherbridge table.

testing has shown up to a 30% improvement in the veb forwarding
rate with this change.

an earlier diff was tested by hrvoje popovski
tested on amd64 and sparc64
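A sketch of the packing idea, assuming the address is stored big-endian in the low 48 bits; sketch_ether_to_u64 and sketch_u64_to_ether are hypothetical helpers. Once addresses are integers, table lookups compare and hash a single uint64_t instead of calling memcmp over six bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ETHER_ADDR_LEN	6

/* Pack a 6-byte MAC address into the low 48 bits of a uint64_t. */
static uint64_t
sketch_ether_to_u64(const uint8_t ea[SKETCH_ETHER_ADDR_LEN])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < SKETCH_ETHER_ADDR_LEN; i++)
		v = (v << 8) | ea[i];
	return v;
}

/* Unpack, e.g. for printing or writing back into a frame header. */
static void
sketch_u64_to_ether(uint64_t v, uint8_t ea[SKETCH_ETHER_ADDR_LEN])
{
	int i;

	for (i = SKETCH_ETHER_ADDR_LEN - 1; i >= 0; i--) {
		ea[i] = v & 0xff;
		v >>= 8;
	}
}
```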

Revision 1.9, Fri Feb 26 00:16:41 2021 UTC (3 years, 3 months ago) by deraadt
Branch: MAIN
Changes since 1.8: +3 -3 lines

gcc is more strict about union decls
ok dlg

Revision 1.8, Wed Feb 24 01:20:03 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.7: +49 -1 lines

add support for adding and deleting address table entries.

Revision 1.7, Tue Feb 23 23:42:17 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.6: +5 -1 lines

handle ifconfig veb0 flush with etherbridge_flush, like bpe and nvgre

Revision 1.6, Tue Feb 23 11:40:28 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.5: +287 -1 lines

make a start on transparent ipsec interception, based on bridge(4).

i found the Transparent Network Security Policy Enforcement paper
by angelos and jason useful for understanding the background
and why you'd want to do this.

the implementation is a little bit different to the bridge one
because i've tweaked the order in which pf and ipsec processing
happen, depending on which direction the packet is going over the
bridge. bridge always runs ipsec processing before pf, no matter
which direction the packet is going. for packets going into veb,
pf runs first and then ipsec input processing is allowed to happen.
in the outgoing direction ipsec happens first and then pf. pf runs
before ipsec in the inbound direction so pf can apply policy to
ipsec encapsulated packets before they hit ipsec. this allows you
to apply policy to both the encrypted and unencrypted packets in
both directions.

the code is disabled for now. this is mostly because i want veb(4)
to have a good chance of operating outside the netlock, and i'm
pretty sure the ipsec stack isn't ready for that yet. the other
reason it's disabled is that getting a test setup takes effort,
and i want to sleep.
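The ordering in this revision can be written down as a tiny table. This is only an illustration with hypothetical names; the code added here is disabled, and bridge(4), by contrast, would run ipsec before pf in both directions.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_stage { SKETCH_PF, SKETCH_IPSEC };
enum sketch_dir { SKETCH_INTO_VEB, SKETCH_OUT_OF_VEB };

/*
 * Order in which the two stages run for a packet crossing veb: inbound,
 * pf sees the encapsulated packet before ipsec input processing; outbound,
 * ipsec runs first and pf sees its output.  Writes the stages into
 * order[] and returns how many there are.
 */
static size_t
sketch_stage_order(enum sketch_dir dir, enum sketch_stage order[2])
{
	if (dir == SKETCH_INTO_VEB) {
		order[0] = SKETCH_PF;
		order[1] = SKETCH_IPSEC;
	} else {
		order[0] = SKETCH_IPSEC;
		order[1] = SKETCH_PF;
	}
	return 2;
}
```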

Revision 1.5, Tue Feb 23 07:29:07 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.4: +2 -2 lines

use link0 to allow vlans to cross the bridge.

Revision 1.4, Tue Feb 23 05:23:02 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.3: +24 -1 lines

implement support for the blocknonip port flag.
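The blocknonip flag comes from bridge(4), whose manual describes it as passing only IPv4, IPv6, ARP and reverse ARP frames. A sketch of that filter, assuming Ethernet II framing only (802.3 frames with a length field and LLC/SNAP headers are not handled); sketch_blocknonip_allows is a hypothetical name and this is not the veb code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ethertypes a blocknonip port still passes (host byte order). */
static bool
sketch_blocknonip_allows(uint16_t ethertype)
{
	switch (ethertype) {
	case 0x0800:	/* IPv4 */
	case 0x86dd:	/* IPv6 */
	case 0x0806:	/* ARP */
	case 0x8035:	/* reverse ARP */
		return true;
	default:
		return false;
	}
}
```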

Revision 1.3, Tue Feb 23 05:01:00 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.2: +48 -1 lines

add support for setting and getting bridge port flags.

Revision 1.2, Tue Feb 23 04:40:27 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN
Changes since 1.1: +21 -1 lines

filter MAC Bridge component Reserved address

i'm considering converting ethernet addresses into uint64_ts to make
comparisons (and masking) easier. i'm trialling it here, and it
doesn't seem like the worst.
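A sketch of why the uint64_t form makes masking easy: the IEEE 802.1D reserved group addresses for a MAC Bridge component are the 16 addresses 01:80:c2:00:00:00 to 01:80:c2:00:00:0f, so one mask and one compare catch all of them. The helper names are hypothetical and this is not the veb code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 01:80:c2:00:00:00, the base of the reserved group address block. */
#define SKETCH_RESERVED_BASE	0x0180c2000000ULL
/* Clearing the low 4 bits maps all 16 reserved addresses onto the base. */
#define SKETCH_RESERVED_MASK	0xfffffffffff0ULL

/* Pack a 6-byte MAC address into the low 48 bits of a uint64_t. */
static uint64_t
sketch_ether_to_u64(const uint8_t ea[6])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 6; i++)
		v = (v << 8) | ea[i];
	return v;
}

/* True if a MAC Bridge component must not forward frames sent to dst. */
static bool
sketch_is_reserved_group(const uint8_t dst[6])
{
	return (sketch_ether_to_u64(dst) & SKETCH_RESERVED_MASK) ==
	    SKETCH_RESERVED_BASE;
}
```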

Revision 1.1, Tue Feb 23 03:30:04 2021 UTC (3 years, 3 months ago) by dlg
Branch: MAIN

add veb(4), a Virtual Ethernet Bridge driver.

my intention is to replace bridge(4), but the way it works is
different enough from bridge that a name change is justified
to distinguish them. it also makes it easier to commit it to the
tree and work on it in parallel to bridge, and allows a window of
migration.

the main difference between veb(4) and bridge(4) is how they use
interfaces as ports. veb takes over interfaces completely and only
uses them to receive and transmit ethernet packets. bridge also uses
each interface as a port to the ethernet segment it's connected to,
but also tries to continue supporting the use of the interface as
a way to talk to the network stack on the local system. supporting
the use of interfaces for both external and local communication is
where most of my confusion with bridge comes from, both when i'm
trying to operate it and also understand the code. changing this
semantic is where most of the simplification in veb comes from
compared to bridge.

because veb takes over interfaces, the ethernet network set up on
a veb is isolated from the host network stack. by default veb does
not interact with pf or the ip (and mpls) stacks. to enable pf for
ip frames going over veb ports, link1 must be set on the veb
interface. to have the stack interact with a veb network, vport
interfaces must be created and added as ports to a veb.

the vport interface driver is provided as part of veb, and is handled
specially by veb. veb usually prevents the use of ports by the stack
for sending and receiving packets, but that's why vports exist, so
veb has special handling for them.

veb already supports a lot of the other features that bridge has,
including bridge rules and protected domains, but i got tired of
working out of the tree and stopped implementing them. the main
outstanding features are better address table management, the
blocknonip flag on ports, transparent ipsec interception, and
spanning tree. i may not bother with spanning tree unless someone
tells me that they actually use it.

the core ethernet learning bridge functionality is provided by the
etherbridge code that was factored out of nvgre and bpe. veb is
already (a lot) faster than bridge, and is better prepared to operate
in parallel on multiple CPUs concurrently.

thanks to hrvoje popovski for testing some earlier versions of this.
discussed with many
ok patrick@ jmatthew@
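The core learning-bridge behaviour this revision relies on (learn source addresses per port, forward known unicast to one port, flood the rest) can be sketched as follows. This uses a plain linear-probing array with hypothetical names; it is not the etherbridge data structure, and it leaves out entry ageing, deletion and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE	64	/* hypothetical fixed size */
#define SKETCH_FLOOD		(-1)	/* unknown destination: all ports */

struct sketch_entry {
	uint64_t	addr;	/* MAC address packed into 48 bits */
	int		port;
	bool		used;
};

struct sketch_table {
	struct sketch_entry	e[SKETCH_TABLE_SIZE];
};

/* Record that src was seen on port (learn it, or move it if it moved). */
static void
sketch_learn(struct sketch_table *t, uint64_t src, int port)
{
	unsigned int i, slot;

	for (i = 0; i < SKETCH_TABLE_SIZE; i++) {
		slot = (unsigned int)((src + i) % SKETCH_TABLE_SIZE);
		if (!t->e[slot].used) {
			t->e[slot].addr = src;
			t->e[slot].port = port;
			t->e[slot].used = true;
			return;
		}
		if (t->e[slot].addr == src) {
			t->e[slot].port = port;
			return;
		}
	}
	/* table full: this sketch simply does not learn the address */
}

/* Port to forward dst to, or SKETCH_FLOOD if it has not been learned. */
static int
sketch_lookup(const struct sketch_table *t, uint64_t dst)
{
	unsigned int i, slot;

	for (i = 0; i < SKETCH_TABLE_SIZE; i++) {
		slot = (unsigned int)((dst + i) % SKETCH_TABLE_SIZE);
		if (!t->e[slot].used)
			return SKETCH_FLOOD;
		if (t->e[slot].addr == dst)
			return t->e[slot].port;
	}
	return SKETCH_FLOOD;
}
```

Because entries are never deleted, an empty slot reliably ends a probe sequence; supporting deletion would need tombstones or a different table design.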
