Thursday, April 29, 2010

LDD3 notes: Passage of time

My notes while reading Linux Device Drivers 3rd edition


Jiffies

  • get_jiffies_64()
  • comparison macros: time_after/before
  • jiffies <-> timeval / timespec
  • get_cycles()
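The comparison macros exist because the jiffies counter wraps around. Below is a minimal userspace sketch of the wrap-safe comparison, not the kernel's own definition (which also type-checks its arguments). `TOY_HZ`, `ms_to_ticks()` and `deadline_after()` are names made up for this illustration.

```c
#include <assert.h>
#include <limits.h>

#define TOY_HZ 250   /* assumed tick rate for this sketch, not read from a kernel */

/* True if a is later than b, even if the counter wrapped in between. */
#define time_after(a, b)  ((long)((b) - (a)) < 0)
#define time_before(a, b) time_after(b, a)

/* Round milliseconds up to whole ticks, so a wait is never cut short. */
unsigned long ms_to_ticks(unsigned long ms)
{
    return (ms * TOY_HZ + 999) / 1000;
}

/* Deadline `ms` milliseconds after `now`; the sum may wrap past zero. */
unsigned long deadline_after(unsigned long now, unsigned long ms)
{
    return now + ms_to_ticks(ms);
}

void jiffies_demo(void)
{
    unsigned long now = ULONG_MAX - 5;               /* counter about to wrap */
    unsigned long deadline = deadline_after(now, 100);

    assert(deadline < now);                /* a naive "<" would call it expired */
    assert(time_after(deadline, now));     /* the wrap-safe macro gets it right */
    assert(time_before(now, deadline));
}
```

The trick is the signed interpretation of the unsigned difference, which stays correct as long as the two timestamps are less than half the counter range apart.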

short busy waits

  • n/u/mdelay()

1ms/1s resolution sleeps

  • msleep(), ssleep()

Working with 1/HZ resolution sleeps

  • wait_event_timeout() - returns timeout left, never negative
  • set_current_state(TASK_INTERRUPTIBLE), schedule_timeout()
  • in_atomic(), in_interrupt()


Sleeping with waitqueues

  • macro: one init_waitqueue_head, multiple wait_event(_interruptible), one wake_up to wake them all up.
  • manual wait_event alternative: prepare_to_wait(), schedule(), finish_wait(), signal_pending()
  • 'exclusive' sleepers are woken one at a time or in specified batches, so they don't behave like a thundering herd
  • if about to schedule anyway, use wake_up_interruptible_sync(); it won't reschedule right away
  • never use sleep_on(); it is broken by a race condition

Tasklets

  • atomic context in timer interrupt handler or softirq
  • multiple schedules result in single execution
  • cannot assume process context or access user space memory

Workqueue

  • each queue has its own kernel thread, so work can sleep, but it cannot access user space
  • each work_struct in queue only once at a time
  • can be delayed for jiffies


Monday, April 26, 2010

RCU

Reading a few LWN articles on RCUs (1 2 3) really shed some light on their properties and use.

In read-intensive scenarios, it's more efficient to replace read/write locking with RCU. This is possible because all Linux platforms have atomic pointer read/write operations.

Read critical section
  • dereferences pointers through a mechanism with platform specific memory ordering guarantees: rcu_dereference()
  • may not sleep
  • may not keep or pass dereferenced pointers outside the critical section
Write critical section
  • Protects against concurrent writers using a regular spinlock or mutex
  • Makes a copy of the original structure, making updates in the copy
  • Swaps in the new version, while still keeping the old version (atomic)
  • Invokes synchronize_rcu() to wait for all readers to exit their current rcu read critical sections.
  • ( A non-pre-emptible kernel can simply wait for all CPUs to switch contexts. )
  • Free the old version
Properly used, RCU is immune to deadlocks, but synchronous updates may be very slow because they have to wait for those context switches.
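The copy, update, publish sequence can be modelled in userspace with C11 atomics. This is only a single-threaded sketch of the idea, not the kernel API. `struct config`, `reader_get_mtu()` and `writer_set_mtu()` are hypothetical names, the writer-side lock is left out, and the grace period that synchronize_rcu() would provide is replaced by the demo simply having no readers left.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct config { int mtu; int flags; };

/* Pointer that readers follow; it always points at a complete version. */
static _Atomic(struct config *) current_cfg;

/* Read side: one atomic pointer load, then plain reads through it. */
int reader_get_mtu(void)
{
    struct config *c = atomic_load_explicit(&current_cfg, memory_order_acquire);
    return c->mtu;
}

/* Update side: copy, modify the copy, publish it. Returns the old version,
 * which may be freed only after every reader that might hold it is done. */
struct config *writer_set_mtu(int mtu)
{
    struct config *old = atomic_load_explicit(&current_cfg, memory_order_relaxed);
    struct config *fresh = malloc(sizeof(*fresh));

    if (!fresh)
        return NULL;
    *fresh = *old;                       /* copy */
    fresh->mtu = mtu;                    /* update the copy */
    atomic_store_explicit(&current_cfg, fresh, memory_order_release); /* publish */
    return old;
}

int toy_rcu_demo(void)
{
    struct config *first = malloc(sizeof(*first));
    assert(first);
    first->mtu = 1500;
    first->flags = 0;
    atomic_store(&current_cfg, first);

    struct config *old = writer_set_mtu(9000);
    assert(old == first);
    int old_mtu = old->mtu;              /* old version is still intact */
    int new_mtu = reader_get_mtu();      /* new readers see the new version */

    free(old);                           /* safe here: no readers remain */
    struct config *last = atomic_load(&current_cfg);
    atomic_store(&current_cfg, NULL);
    free(last);
    return old_mtu == 1500 && new_mtu == 9000;
}
```

Readers never take a lock; the only cost is paid by the writer, who has to keep the old version alive until it can prove nobody is still looking at it.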

Thursday, April 22, 2010

LDD3 notes: Concurrency

My notes while reading Linux Device Drivers 3rd edition.

Reasons to pay attention to concurrency

  • Early kernels had no SMP, no pre-emption -> enough to protect from interrupts
  • SMP and pre-emption pose similar concurrency requirements, so handling one means handling both, even if you'd like to ignore the other.
  • shared resources -> avoid


semaphores: sema_init, up, down (DECLARE_MUTEX)

  • read/write semaphore pairs: init_rwsem, [up|down]_[read|write], downgrade_write, trylock
  • semaphores are dangerous as automatic variables


Completions

  • init_completion, wait_for_completion (uninterruptible), complete, complete_all, complete_and_exit (thread)

Spinlocks

  • higher performance than semaphores
  • disables pre-emption on current cpu
  • may not sleep while holding one
  • mutual exclusion against interrupt handlers is ok with spin_lock_irqsave(); the matching spin_unlock_irqrestore() must be in the same function
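To make the "may not sleep while holding one" rule concrete, here is a userspace model of a spinlock built on a C11 `atomic_flag`. The `toy_` names are hypothetical and this is not the kernel implementation: it does not disable pre-emption or interrupts, it only shows the busy-wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag locked; } toy_spinlock;

/* Returns true if the lock was taken, false if someone already holds it. */
bool toy_spin_trylock(toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait until the lock is free. Waiters burn CPU the whole time,
 * which is why the holder must release quickly and never sleep. */
void toy_spin_lock(toy_spinlock *l)
{
    while (!toy_spin_trylock(l))
        ;
}

void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Single-threaded walk-through; returns 1 if the lock behaved as expected. */
int toy_spin_demo(void)
{
    toy_spinlock l = { ATOMIC_FLAG_INIT };

    toy_spin_lock(&l);                      /* uncontended: returns at once */
    bool second = toy_spin_trylock(&l);     /* already held: must fail */
    toy_spin_unlock(&l);
    bool third = toy_spin_trylock(&l);      /* free again: must succeed */
    toy_spin_unlock(&l);

    assert(!second);
    assert(third);
    return 1;
}
```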

R/W spinlocks


Lockless data structures: kfifo generic circular buffer


atomic_t, an integer that can portably be counted on to hold only 24 bits


atomic bit operations: set_bit, clear_bit, change_bit, test_bit, test_and_set_bit
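The test-and-set variant is the interesting one, since it lets exactly one caller win a flag. A userspace model on a single word, using C11 atomics, might look like this. The `toy_` functions are hypothetical stand-ins, not the kernel's bit operations, which take a bit number and an address in a similar way but work on arbitrary-length bitmaps.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically set bit nr and return its previous value (0 or 1). */
int toy_test_and_set_bit(unsigned nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;
    unsigned long old = atomic_fetch_or(word, mask);
    return (old & mask) != 0;
}

/* Atomically clear bit nr. */
void toy_clear_bit(unsigned nr, atomic_ulong *word)
{
    atomic_fetch_and(word, ~(1UL << nr));
}

/* Read bit nr. */
int toy_test_bit(unsigned nr, atomic_ulong *word)
{
    return (int)((atomic_load(word) >> nr) & 1UL);
}

/* Use bit 3 as a "busy" flag; returns the final state of the bit. */
int toy_bitops_demo(void)
{
    atomic_ulong flags = 0;
    int first  = toy_test_and_set_bit(3, &flags);   /* was clear: we own it */
    int second = toy_test_and_set_bit(3, &flags);   /* already set: we lost */

    assert(first == 0 && second == 1);
    toy_clear_bit(3, &flags);                       /* release the flag */
    return toy_test_bit(3, &flags);
}
```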


seqlock: data structure versioning and retry on collision
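The versioning idea can be sketched with a counter that is odd while a write is in progress and even otherwise. This is a single-threaded illustration of the protocol only: the `toy_` names are made up, writers would still need their own mutual exclusion, and a real implementation is more careful about memory barriers than this model.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_seqlock { atomic_uint seq; };   /* even: stable, odd: write in progress */
struct toy_time { long sec; long nsec; };

/* Reader: remember the version before copying the data. */
unsigned toy_read_begin(struct toy_seqlock *l)
{
    return atomic_load_explicit(&l->seq, memory_order_acquire);
}

/* Reader: nonzero if the copy made since `start` may be torn. */
int toy_read_retry(struct toy_seqlock *l, unsigned start)
{
    return (start & 1u) ||
           atomic_load_explicit(&l->seq, memory_order_acquire) != start;
}

void toy_write_begin(struct toy_seqlock *l) { atomic_fetch_add(&l->seq, 1); } /* now odd */
void toy_write_end(struct toy_seqlock *l)   { atomic_fetch_add(&l->seq, 1); } /* even again */

/* Returns how many reads had to be thrown away. */
int toy_seqlock_demo(void)
{
    struct toy_seqlock l = { 0 };
    struct toy_time shared = { 1, 0 }, copy;
    unsigned start;
    int retries = 0;

    /* A read that overlaps a write: the version changed, so it is discarded. */
    start = toy_read_begin(&l);
    copy = shared;
    toy_write_begin(&l);
    shared.sec = 2;
    toy_write_end(&l);
    if (toy_read_retry(&l, start))
        retries++;

    /* An undisturbed read: same even version before and after. */
    do {
        start = toy_read_begin(&l);
        copy = shared;
    } while (toy_read_retry(&l, start));

    assert(copy.sec == 2);
    return retries;
}
```

Readers never block the writer, which suits small, frequently read and rarely written data.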


RCU's


SGI lockmeter to measure time spent waiting on locks


Thursday, April 15, 2010

LDD3 notes: Debugging

My notes while reading Linux Device Drivers 3rd edition.


printk_ratelimit() tells if we're not flooding the log


Kernel configuration

  • CONFIG_DEBUG_SLAB -> canary killed
  • CONFIG_DEBUG_SPINLOCK_SLEEP -> potential sleeps with spinlocks detected
  • CONFIG_DEBUG_INFO, CONFIG_FRAME_POINTER for gdb debugging

seq_file - kernel in-memory file buffer, similar to open_memstream or ostringstream.

  • a cleaner interface for implementing a /proc file
  • iterator/visitor: start, show, next, stop
  • api for the visitor: seq_printf into a seq_file
  • fops implemented for reading

debugging a live kernel through a 'dynamic' core dump

  • gdb vmlinux /proc/kcore, core-file /proc/kcore
  • cannot modify data, breakpoint, watchpoint or single-step
  • add-symbol-file for modules, using /sys/module/*/sections/.* like with a jtag emulator
  • print *(address)

kdb - debugging from SGI

  • ia32 only
  • built-in to kernel, pause/break key as 'ctrl-c' takes you into debugger
  • has breakpoints and can modify data
  • sees module symbols automatically

kgdb -mm -variant

  • serial port, ethernet kgdboe
  • x86, ppc.
  • no ARM

kgdb -sf.net variant

  • serial port only
  • x86, ppc.
  • no ARM

Linux trace toolkit

DProbes

  • vs kprobes?


Thursday, April 8, 2010

LDD3 notes: Device registration and operations

My notes while reading Linux Device Drivers 3rd edition.


register_chrdev replaced with cdev_add(), cdev_del() and struct cdev


register_chrdev_region, alloc_chrdev_region, unregister_chrdev_region to

  • statically, dynamically pick a contiguous block of dev_t's i.e. major and minor numbers.
  • assign them a device name ( /proc/devices and sysfs )
  • no need to know major/minor numbers at open
  • look at inode->i_cdev,
  • deduce filp->private_data using container_of
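The container_of step is easier to see in code. The sketch below runs in userspace with cut-down stand-ins for `struct cdev`, `struct inode` and `struct file`; `struct my_dev` and `my_open()` are hypothetical, and the macro is a simplified version without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types, with only the fields this sketch needs. */
struct cdev  { int dummy; };
struct inode { struct cdev *i_cdev; };
struct file  { void *private_data; };

/* Hypothetical per-device state with the cdev embedded in it. */
struct my_dev {
    int quantum;
    struct cdev cdev;
};

/* Step back from a member pointer to the structure that contains it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* open(): find our device from the inode and stash it for later calls. */
int my_open(struct inode *inode, struct file *filp)
{
    struct my_dev *dev = container_of(inode->i_cdev, struct my_dev, cdev);

    filp->private_data = dev;
    return 0;
}

int container_of_demo(void)
{
    struct my_dev dev = { .quantum = 4000 };
    struct inode inode = { .i_cdev = &dev.cdev };
    struct file filp = { 0 };
    int rc = my_open(&inode, &filp);

    assert(rc == 0);
    assert(filp.private_data == &dev);
    return ((struct my_dev *)filp.private_data)->quantum;
}
```

Because the cdev is embedded rather than pointed to, no lookup table from minor number to device is needed.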

Notes on proper behaviour with open/close

  • if the device cannot seek: nonseekable_open(), no_llseek
  • struct file represents an open file descriptor in kernel, can be shared by multiple processes
  • one open, fork&dup, single struct file, multiple close, one release
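The sharing is visible from user space: descriptors created by dup() (or inherited over fork()) refer to the same open file, so they share one file offset. A small sketch, assuming a writable temporary directory; `offset_seen_by_dup()` is a name invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Write n bytes through one descriptor, then report the file offset as
 * seen through a dup()ed descriptor. Returns -1 on any error. */
long offset_seen_by_dup(size_t n)
{
    char buf[64] = { 0 };
    long seen = -1;
    FILE *tmp;
    int fd, fd2;

    assert(n <= sizeof(buf));            /* sketch only handles small writes */

    tmp = tmpfile();
    if (!tmp)
        return -1;
    fd = fileno(tmp);
    fd2 = dup(fd);                       /* second descriptor, same open file */
    if (fd2 < 0) {
        fclose(tmp);
        return -1;
    }

    if (write(fd, buf, n) == (ssize_t)n)
        seen = (long)lseek(fd2, 0, SEEK_CUR);   /* offset moved for fd2 too */

    close(fd2);      /* first close: the open file lives on */
    fclose(tmp);     /* last close: this is when the driver's release() would run */
    return seen;
}
```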

Notes on proper behaviour with read/write, Select(BSD)/Poll(SystemV)/Epoll(Linux)

  • O_NONBLOCK ( == O_NDELAY) and no progress possible -> immediate -EAGAIN
  • _interruptible fails -> -ERESTARTSYS, VFS will retry or return -EINTR
  • poll reports device writable -> next write must not block
  • encountering an error in the middle of a successful transfer: return the partial result, the next attempt will return the failure
  • security implications of blindly dereferencing a user pointer
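From the user-space side, the O_NONBLOCK rule looks like this. The sketch uses an empty pipe as a stand-in for a device with no data ready, which is an assumption of the example rather than anything a driver does; `errno_from_empty_nonblocking_read()` is a made-up helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read from an empty pipe in non-blocking mode. Returns the errno that
 * read() reported, 0 if read() did not fail, or -1 on a setup failure. */
int errno_from_empty_nonblocking_read(void)
{
    int fds[2];
    int flags, result = 0;
    char c;

    if (pipe(fds) != 0)
        return -1;

    flags = fcntl(fds[0], F_GETFL);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) != 0)
        result = -1;
    else if (read(fds[0], &c, 1) < 0)
        result = errno;          /* no data and O_NONBLOCK set: fails at once */

    close(fds[0]);
    close(fds[1]);
    return result;
}

void nonblock_demo(void)
{
    /* Without O_NONBLOCK this read would have slept until a writer showed up. */
    assert(errno_from_empty_nonblocking_read() == EAGAIN);
}
```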

IOCTL

  • asm/ioctl.h + Documentation/ioctl-number.txt
  • "clueless" legacy: 8bit magic + 8bit device specific
  • capable() - permissions
  • access_ok() - plausible user memory address
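The command-number encoding can be tried out in user space with the macros from `<linux/ioctl.h>`. The `MYDEV_` names, the magic `'k'` and the command numbers below are all made up for illustration; a real driver would pick an unused magic from the kernel's ioctl-number list.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Hypothetical command set: magic, sequence number, direction and argument size. */
#define MYDEV_IOC_MAGIC    'k'
#define MYDEV_IOCRESET     _IO(MYDEV_IOC_MAGIC, 0)
#define MYDEV_IOCGQUANTUM  _IOR(MYDEV_IOC_MAGIC, 1, int)
#define MYDEV_IOCSQUANTUM  _IOW(MYDEV_IOC_MAGIC, 2, int)
#define MYDEV_IOC_MAXNR    2

/* The first sanity check an ioctl handler can do: is this command ours? */
int mydev_cmd_is_ours(unsigned int cmd)
{
    return _IOC_TYPE(cmd) == MYDEV_IOC_MAGIC &&
           _IOC_NR(cmd) <= MYDEV_IOC_MAXNR;
}

void ioctl_demo(void)
{
    /* Direction is from the caller's point of view: "get" reads from the device. */
    assert(_IOC_DIR(MYDEV_IOCGQUANTUM) == _IOC_READ);
    assert(_IOC_DIR(MYDEV_IOCSQUANTUM) == _IOC_WRITE);
    assert(_IOC_SIZE(MYDEV_IOCGQUANTUM) == sizeof(int));
    assert(mydev_cmd_is_ours(MYDEV_IOCRESET));
    assert(!mydev_cmd_is_ours(_IO('x', 0)));     /* wrong magic: reject */
}
```

Encoding direction and size into the number is what lets a handler reject foreign commands before touching any user memory.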

Asynchronous notifications from user space perspective, on sockets and tty's

  • whom to notify: F_SETOWN
  • please notify: F_SETFL(FASYNC)
  • receive SIGIO -> select()



Thursday, April 1, 2010

LDD3 notes: Compiling and loading modules

My notes while reading Linux Device Drivers 3rd edition.

out of tree module makefile boilerplate with dual purpose

  • standalone: invokes kernel tree modules target
  • referenced by M: acts like in-tree kbuild makefile

disposable sections with __init, __initdata, __exit


module loading races

  • register facilities only when really ready to take calls
  • at failure, previously registered facilities can be in use already

module_param()

  • automatically exposed in /sys/module
  • with given access permissions, can be read/written to
  • module won't be notified of writes