Thursday, June 17, 2010

LDD3 notes: Networking

My notes while reading Linux Device Drivers 3rd edition, network drivers.

General and setup

Network devices have no /dev entry point
  • different namespace
  • file operations don't make sense on a network interface. Why? I think they could.

alloc_netdev variants: alloc_etherdev, alloc_fcdev, alloc_fddidev, alloc_trdev
  • separate [ltalk|fc|fddi|hippi|tr]_setup
private data is not a pointer to driver-allocated data; it is allocated along with the net_device.
  • supply the size to alloc_netdev()
  • use netdev_priv(dev) to access the data
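A userspace sketch of the "allocated along with the net_device" idea. This is not kernel code: every model_* name is made up for illustration, and the 32-byte alignment is an assumption about the kernel's constant. The point is only that the private area is found by pointer arithmetic from the device pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_ALIGN 32  /* assumed stand-in for the kernel's alignment constant */

/* Hypothetical, stripped-down stand-in for struct net_device. */
struct model_netdev {
    char name[16];
    size_t priv_size;
};

/* Example driver-private state (hypothetical). */
struct model_priv {
    unsigned long rx_packets;
    unsigned long tx_packets;
};

/* Round n up to the next multiple of MODEL_ALIGN. */
static size_t model_align(size_t n)
{
    return (n + MODEL_ALIGN - 1) & ~(size_t)(MODEL_ALIGN - 1);
}

/* One zeroed allocation holds the device structure followed by the private area. */
static struct model_netdev *model_alloc_netdev(size_t sizeof_priv, const char *name)
{
    size_t total = model_align(sizeof(struct model_netdev)) + sizeof_priv;
    struct model_netdev *dev = calloc(1, total);

    if (!dev)
        return NULL;
    strncpy(dev->name, name, sizeof(dev->name) - 1);
    dev->priv_size = sizeof_priv;
    return dev;
}

/* The private area sits right after the (aligned) device structure. */
static void *model_netdev_priv(struct model_netdev *dev)
{
    return (char *)dev + model_align(sizeof(struct model_netdev));
}
```

Usage would be model_alloc_netdev(sizeof(struct model_priv), "sn0") followed by model_netdev_priv(dev); a single free(dev) releases both parts.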
interface flags IFF_*
  • flags & IFF_DEBUG, enable debugging via ioctl: netifdebug
  • IFF_UP change -> open()/stop()
  • any flag change -> set_multicast_list()
Features, interesting ones

NETIF_F_NO_CSUM - interface needs no checksums
NETIF_F_HW_CSUM - interface hardware does checksums
NETIF_F_HIGHDMA - interface can DMA to high memory
  • by default all socket buffers are in low memory
Networking device structure
  • jiffy timestamps for last tx/rx (trans_start, last_rx), tx watchdog timeout (watchdog_timeo).
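A sketch of how a jiffy timestamp and the watchdog timeout can be compared safely when the counter wraps. The macro has the same shape as the kernel's time_after(); model_tx_timed_out() is a hypothetical helper, and the exact check the kernel watchdog performs is an assumption here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * True if a is later than b, also across counter wraparound, as long as
 * the two values are less than half the counter range apart. Relies on
 * the usual two's-complement conversion from unsigned long to long.
 */
#define model_time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical helper: has the last transmit been pending longer than the timeout? */
static bool model_tx_timed_out(unsigned long now,
                               unsigned long trans_start,
                               unsigned long watchdog_timeo)
{
    return model_time_after(now, trans_start + watchdog_timeo);
}
```

A plain "now > trans_start + watchdog_timeo" gives the wrong answer once the sum wraps past ULONG_MAX, which is why the subtraction form is used.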


functions to control transmission from the networking system
  • netif_start_queue() - at open()
  • netif_stop_queue() - at stop(), or when hard_start_xmit() sees too few buffers left for another packet
  • netif_wake_queue() - at tx completion: same as start, but also kicks the networking system back to work
  • netif_tx_disable() - similar to stop, for use outside hard_start_xmit(); on return, hard_start_xmit() is not running on another CPU
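The stop/wake handshake above can be modelled in userspace roughly like this. It is a sketch, not driver code: the model_* names and the ring size of 4 are made up, and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 4  /* hypothetical number of hardware tx descriptors */

struct model_txq {
    bool stopped;   /* models the queue-stopped state of the interface */
    int in_flight;  /* descriptors handed to the "hardware", not yet completed */
    int wakeups;    /* how often the stack was kicked back to work */
};

/* open(): allow the stack to call the transmit function. */
static void model_start_queue(struct model_txq *q)
{
    q->stopped = false;
}

/* Tell the stack to stop calling the transmit function. */
static void model_stop_queue(struct model_txq *q)
{
    q->stopped = true;
}

/* Like start, but a stopped queue also counts one kick to the stack. */
static void model_wake_queue(struct model_txq *q)
{
    if (q->stopped) {
        q->stopped = false;
        q->wakeups++;
    }
}

/* Transmit path: returns 0 on success, -1 if called while stopped. */
static int model_start_xmit(struct model_txq *q)
{
    if (q->stopped)
        return -1;
    q->in_flight++;
    if (q->in_flight == TX_RING_SIZE)
        model_stop_queue(q);  /* ring is full: stop before the next call */
    return 0;
}

/* Tx-completion "interrupt": one descriptor is free again. */
static void model_tx_complete(struct model_txq *q)
{
    if (q->in_flight > 0) {
        q->in_flight--;
        model_wake_queue(q);
    }
}
```

Stopping the queue as soon as the ring fills means the transmit function should never be entered without a free descriptor.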

Book suggests that hard_start_xmit() should free the skb at the end. With real (DMA-capable) hardware, it's probably better to free it at tx completion to avoid copying.
Book demonstrates padding short packets in a buffer on the stack - looks like a bad idea for real hardware.


The example allocates skbs in atomic context for received packets. For real hardware it's probably easier and more efficient to preallocate the skbs and run DMA right into them.

per received packet
  • determine protocol, eth_type_trans()
  • mark ip_summed HW/NONE
  • update stats
  • netif_rx()
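For the "determine protocol" step, a simplified userspace sketch of reading the type field from an Ethernet header. This is not eth_type_trans() itself: the real function returns the value in network byte order and also classifies the packet type, and all model_* names here are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_HLEN       14      /* 6 dst + 6 src + 2 type/length */
#define MODEL_ETH_P_IP       0x0800  /* IPv4 EtherType */
#define MODEL_ETH_TYPE_MIN   0x0600  /* smaller values are 802.3 length fields */

/*
 * Returns the EtherType in host byte order, or 0 for frames this sketch
 * does not classify (too short, or an 802.3 length field instead of a type).
 */
static uint16_t model_eth_type(const uint8_t *frame, size_t len)
{
    uint16_t type;

    if (len < MODEL_ETH_HLEN)
        return 0;
    /* The field is big-endian on the wire, at offset 12. */
    type = (uint16_t)((frame[12] << 8) | frame[13]);
    if (type < MODEL_ETH_TYPE_MIN)
        return 0;
    return type;
}
```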

NAPI: temporary polling mode for high throughput, bypassing interrupt overhead.

requires from the hardware
  • hardware packet buffering
  • capability to disable only the rx interrupt
at rx interrupt
  • disable further rx interrupts
  • netif_rx_schedule()
at poll()
  • loop receiving packets
  • don't exceed CPU packet budget, device packet quota
  • netif_receive_skb() instead of netif_rx()
  • netif_rx_complete(), re-enable rx interrupts and return 0 if no more packets left
  • return 1 if there were packets left
Bypasses input_pkt_queue?
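The interrupt/poll handshake above, modelled in userspace with the book-era poll(dev, budget) shape. A sketch only: the model_* names are made up, and "delivering" a packet is just a counter increment standing in for netif_receive_skb().

```c
#include <assert.h>
#include <stdbool.h>

struct model_napi_dev {
    int rx_pending;       /* packets waiting in the "hardware" buffer */
    int quota;            /* per-device packet quota */
    bool rx_irq_enabled;
    bool scheduled;       /* device is on the poll list */
    int delivered;        /* packets passed up the stack */
};

/* Rx interrupt: mask further rx interrupts and ask to be polled. */
static void model_rx_interrupt(struct model_napi_dev *dev)
{
    if (!dev->rx_irq_enabled)
        return;
    dev->rx_irq_enabled = false;
    dev->scheduled = true;  /* stands in for netif_rx_schedule() */
}

/* Returns 0 when the device is drained, 1 if packets are left. */
static int model_poll(struct model_napi_dev *dev, int *budget)
{
    int limit = dev->quota < *budget ? dev->quota : *budget;
    int done = 0;

    while (done < limit && dev->rx_pending > 0) {
        dev->rx_pending--;
        dev->delivered++;   /* stands in for netif_receive_skb() */
        done++;
    }
    *budget -= done;
    dev->quota -= done;

    if (dev->rx_pending == 0) {
        dev->scheduled = false;      /* stands in for netif_rx_complete() */
        dev->rx_irq_enabled = true;  /* back to interrupt mode */
        return 0;
    }
    return 1;
}
```

With 10 packets pending and a budget of 6, the first call delivers 6 and returns 1; a second call with a fresh budget delivers the remaining 4, returns 0 and re-enables the rx interrupt.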

Link state
  • netif_carrier_[on|off|ok]()

Socket buffers
  • networking layer by default reserves the headroom it needs, at least 32 octets
  • drivers should reserve headroom to have IP header on aligned address (NET_IP_ALIGN)

head        data                tail        end
|   head    |                   |   tail    |
|   room    |                   |   room    |

            ----->              ---->
            pull                put

            ----->   reserve    ---->
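The pointer movements in the diagram, as a userspace model. Not the real sk_buff: the model_* names and the fixed 128-byte buffer are made up, and push (growing into the headroom) is included only as the counterpart of pull.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BUF_SIZE 128

/* Hypothetical userspace model of the four sk_buff buffer pointers. */
struct model_skb {
    unsigned char buf[MODEL_BUF_SIZE];
    unsigned char *head, *data, *tail, *end;
    size_t len;
};

static void model_skb_init(struct model_skb *skb)
{
    skb->head = skb->data = skb->tail = skb->buf;
    skb->end = skb->buf + MODEL_BUF_SIZE;
    skb->len = 0;
}

static size_t model_skb_headroom(const struct model_skb *skb)
{
    return (size_t)(skb->data - skb->head);
}

static size_t model_skb_tailroom(const struct model_skb *skb)
{
    return (size_t)(skb->end - skb->tail);
}

/* reserve: move data and tail forward together; only valid while empty. */
static void model_skb_reserve(struct model_skb *skb, size_t n)
{
    assert(skb->len == 0 && model_skb_tailroom(skb) >= n);
    skb->data += n;
    skb->tail += n;
}

/* put: grow the data area at the tail; returns the start of the new bytes. */
static unsigned char *model_skb_put(struct model_skb *skb, size_t n)
{
    unsigned char *old_tail = skb->tail;

    assert(model_skb_tailroom(skb) >= n);
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* push: grow the data area into the headroom. */
static unsigned char *model_skb_push(struct model_skb *skb, size_t n)
{
    assert(model_skb_headroom(skb) >= n);
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* pull: drop n bytes from the front of the data area. */
static unsigned char *model_skb_pull(struct model_skb *skb, size_t n)
{
    assert(skb->len >= n);
    skb->data += n;
    skb->len -= n;
    return skb->data;
}
```

Reserving 2 octets before putting a 14-octet Ethernet header places the IP header at offset 16, which is the alignment NET_IP_ALIGN is meant to achieve.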

Hardware address resolution

  • details of the physical layer header are to be encapsulated in the driver
  • ethernet-specific header has a common implementation via ether_setup()
  • neighbour mechanism is used to implement ARP; not described in the book?

Custom ioctl commands
  • ioctl on a socket invokes the protocol-specific ioctl()
  • protocol delegates unknown ioctls to the device named in the ifreq (ifr_name, i.e. ifr_ifrn.ifrn_name)
  • do_ioctl() gets the ifreq already copied to kernel space, plus cmd

MII support
Book describes an obsolete interface. Write an overview of the current code?

Netpoll
  • not to be confused with the NAPI polling interface
  • bootloader-like interrupt-free operation: the driver's poll_controller() is polled and simulates interrupts in software

Net namespaces
  • not described in the book; ran across them while reading the vlan code
  • per-namespace 'global' variables
  • [un]register_pernet_gen_device() to have a 'global variable' pointer identified by 'id'
  • net_assign_generic() to set the pointer
  • net_generic() to get the pointer
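A userspace model of the "pointer identified by id, one slot per namespace" idea. This is only a sketch of the concept: the model_* names, the fixed table size and the zero-based ids are assumptions, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_GEN 8  /* hypothetical fixed table size */

/* Each namespace carries its own table of generic pointers. */
struct model_net {
    void *gen[MODEL_MAX_GEN];
};

static int model_ids_used;  /* ids handed out so far, shared by all namespaces */

/* Hand out a new id; returns 0 on success, -1 when the table is full. */
static int model_register_gen_device(int *id)
{
    if (model_ids_used >= MODEL_MAX_GEN)
        return -1;
    *id = model_ids_used++;
    return 0;
}

/* Set this namespace's pointer for the given id. */
static int model_net_assign_generic(struct model_net *net, int id, void *data)
{
    if (id < 0 || id >= model_ids_used)
        return -1;
    net->gen[id] = data;
    return 0;
}

/* Get this namespace's pointer for the given id, or NULL for an unknown id. */
static void *model_net_generic(const struct model_net *net, int id)
{
    if (id < 0 || id >= model_ids_used)
        return NULL;
    return net->gen[id];
}
```

The same id yields a different pointer in each namespace, which is what makes the variable "global" only within one namespace.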
