Debian 7 on the Samsung Series 9 Ultrabook

I recently purchased an upgrade to my aging laptop: a SAMSUNG Series 9 NP900X3C-A01US 13″ Ultrabook. I won’t go too much into aesthetics except to say that this laptop is everything the reviews say it is. It’s light, sturdy, stylish, fast, and sips power. It is, almost down to the PCB, Samsung’s answer to the 13″ MacBook Air. I am happier with it so far than I have been with any laptop I’ve owned… and I’ve owned quite a few.

At any rate, throwing Debian 7.0 (wheezy) on this laptop was trivial and almost everything “just works”. There are a few things I had to tweak as far as power saving, function keys, etc. and I wanted to outline those things here. Implement the items below to get the most out of yours if you own one.

Use the latest kernel
I am running 3.7.4 from kernel.org on this ultrabook. Always use the latest available stable kernel on laptops. This is doubly true on very new ones like the Series 9 if you want all the hardware to be well supported. Some hardware won’t work under the default wheezy kernel on this model; lid-close detection, for example, didn’t work properly for me. There are also continual improvements in power management happening in the kernel.

Use tmpfs
Debian doesn’t yet default to putting some things on tmpfs that should be. In /etc/default/tmpfs set RAMTMP=yes to mount /tmp on tmpfs. I also like to add an entry to /etc/fstab to mount /home/someuser/.cache/google-chrome on tmpfs as well. Both of these things speed up access to temporary/cache data and help to save power.

tmpfs /home/someuser/.cache/google-chrome tmpfs mode=1777,noatime 0 0

Enable discard support
This laptop comes with a 128GB SanDisk SSD U100. If your SSD supports TRIM (and this one does) and you are using ext4 (and you should be!) you can enable TRIM support in the file system by adding ‘discard’ to all the mount points in /etc/fstab.

/dev/mapper/lvm-root / ext4 discard,errors=remount-ro 0 1

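If, as in the example above, you are also using LVM then you should configure it to issue discards to the underlying physical volume. To do so, set “issue_discards=1” in the devices section of /etc/lvm/lvm.conf. LVM applies this setting when a logical volume gives up space, for example on lvremove or lvreduce.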
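Before relying on discard it can help to confirm that the kernel reports TRIM support for the drive. The following is a minimal sketch, assuming the disk shows up as sd* and the kernel exposes queue/discard_max_bytes in sysfs (a value of 0 means no discard support). The supports_discard helper and the SYSFS_ROOT override are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Report whether a block device advertises discard (TRIM) support.
# SYSFS_ROOT is a hypothetical override so the check can be tried against
# a scratch directory; by default the real /sys is used.
supports_discard() {
    dev="$1"
    file="${SYSFS_ROOT:-/sys}/block/$dev/queue/discard_max_bytes"
    [ -r "$file" ] || return 2   # attribute missing: old kernel or no such device
    [ "$(cat "$file")" -gt 0 ]   # 0 means the device does not support discard
}

# Check every sd* disk that exists on this host.
for path in "${SYSFS_ROOT:-/sys}"/block/sd*; do
    [ -d "$path" ] || continue
    dev=${path##*/}
    if supports_discard "$dev"; then
        echo "$dev: discard supported"
    else
        echo "$dev: no discard support reported"
    fi
done
```

If the SSD is reported as not supporting discard, adding the mount option gains nothing, so this is worth running once before editing /etc/fstab.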

Use NOOP scheduler
Schedulers are getting smarter these days so this might not be necessary any more. I am still in the habit of setting noop as the scheduler for non-rotational storage devices though. I like to add a udev rule that will set noop if the device advertises itself as non-rotational. You could just set the default elevator to noop but this would affect, say, a USB SATA disk that you may plug in some day.

cat > /etc/udev/rules.d/60-schedulers.rules << EOF
# set noop scheduler for non-rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"
EOF
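To confirm the rule took effect, you can read queue/scheduler for each disk; the kernel marks the active scheduler with square brackets. This is a small sketch, assuming the disks are named sd*; active_scheduler is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the active scheduler (the bracketed entry) from the contents of
# a queue/scheduler file, e.g. "noop deadline [cfq]" -> "cfq".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Print the active scheduler for every sd* disk present on this host.
for f in /sys/block/sd*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    echo "$dev: $(active_scheduler "$(cat "$f")")"
done
```

The rule is applied on the next add or change event for the device, so a reboot is the simplest way to pick it up. Reloading the rules with udevadm control --reload-rules and then running udevadm trigger --subsystem-match=block should also work.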

i915 power saving
The i915 kernel module for the Intel HD 4000 graphics chipset supports some extra power saving options that you can take advantage of. To enable them, add the following to /etc/default/grub, in the same spot where the "quiet" option for grub currently exists, then run update-grub and reboot for the change to take effect.

GRUB_CMDLINE_LINUX_DEFAULT="quiet i915.i915_enable_rc6=1 i915.i915_enable_fbc=1 i915.lvds_downclock=1"

To see more information about what those options do, execute "modinfo i915".
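After rebooting, the values the module actually picked up can be read back from sysfs. The sketch below assumes the parameters appear under /sys/module/i915/parameters and that you run it as root, since some of them are not world-readable. show_module_params and SYSFS_ROOT are hypothetical names for illustration.

```shell
#!/bin/sh
# Print each readable parameter of a loaded kernel module as name=value.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a scratch directory; by default it reads the real /sys.
show_module_params() {
    dir="${SYSFS_ROOT:-/sys}/module/$1/parameters"
    if [ ! -d "$dir" ]; then
        echo "module $1 is not loaded or exposes no parameters" >&2
        return 1
    fi
    for p in "$dir"/*; do
        [ -r "$p" ] || continue
        echo "${p##*/}=$(cat "$p")"
    done
}

show_module_params i915 || true
```

If an option still shows its old value, check /proc/cmdline to see whether the new kernel command line was actually used for this boot.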

Disable onboard LAN
This is a truly portable notebook. You shouldn’t generally be using the onboard LAN a lot. You can save some power by disabling it in the BIOS.

Extend battery life
This laptop has such good battery life that you should be able to live with it quite comfortably in "battery extender" mode. This mode only lets the battery charge up to 80% and greatly extends the useful life of the battery. Enable it in the BIOS.

If you are in a situation where you know you're going to need maximum battery life (say, while waiting to board a very long flight) you can disable battery extender mode via a file in /sys. Letting the battery charge to 100% should give you about another hour of run time.

echo 0 > /sys/devices/platform/samsung/battery_life_extender
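The same file can be read to see the current state and written with 1 to turn the limit back on once the trip is over. A minimal sketch, assuming the path above is present on your kernel and that 1 means the extender is enabled; the BLE variable and both function names are hypothetical.

```shell
#!/bin/sh
# Path from the article; BLE can be overridden to point somewhere else.
BLE="${BLE:-/sys/devices/platform/samsung/battery_life_extender}"

# Show the current state: on = charge limited to about 80%, off = charge to 100%.
battery_extender_status() {
    if [ ! -r "$BLE" ]; then
        echo "unavailable"
        return 1
    fi
    case "$(cat "$BLE")" in
        1) echo "on" ;;
        0) echo "off" ;;
        *) echo "unknown" ;;
    esac
}

# Turn the extender on (1) or off (0); needs root on a real system.
battery_extender_set() {
    echo "$1" > "$BLE"
}

battery_extender_status || true
```

For example, battery_extender_set 0 before a long flight and battery_extender_set 1 afterwards.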

Enable touchpad tapping
Xorg uses the wrong driver for the touchpad by default. If you want to enable tap / doubletap / etc. then you'll need to touch a config file for Xorg.

mkdir -p /etc/X11/xorg.conf.d
cat > /etc/X11/xorg.conf.d/50-snaptics.conf << EOF
Section "InputClass"
    Identifier "touchpad"
    Driver "synaptics"
    MatchIsTouchpad "on"
    Option "TapButton1" "1"
    Option "TapButton2" "2"
    Option "TapButton3" "3"
    #Option "VertEdgeScroll" "on"
    #Option "VertTwoFingerScroll" "on"
    #Option "HorizEdgeScroll" "on"
    #Option "HorizTwoFingerScroll" "on"
    #Option "CircularScrolling" "on"
    #Option "CircScrollTrigger" "2"
    #Option "EmulateTwoFingerMinZ" "40"
    #Option "EmulateTwoFingerMinW" "8"
    #Option "CoastingSpeed" "0"
EndSection
EOF

Coming Soon...

Enable silent mode binding
Something here about binding Fn-F11 to enable/disable silent mode.

Enable keyboard backlight bindings
Something here about enabling backlight keys Fn-F9 and Fn-F10

Enable wifi binding
Something here about Fn-F12

Turn off bluetooth radio by default
Related to the above, but only turn off bluetooth radio during boot up

Use Powertop
Something here about enabling powertop’s tunables on boot up

Deciphering Linux page allocation failures

I find myself diving into Linux kernel memory management more and more these days. I thought I’d write up some helpful tips on decoding something you might see every once in a while: page allocation failures. In this particular case, we’ll look at the following example:

Dec  6 04:30:13 host kernel: echo: page allocation failure. order:9, mode:0xd0

What you see here is the following:

  • “Dec 6 04:30:13”: the time stamp
  • “host”: the host name
  • “kernel:”: the source of the message. In this case, it was the kernel itself
  • “echo:”: the command that caused the message to be generated
  • “page allocation failure.”: the message itself
  • “order:9,”: the size of the request, expressed as a power of 2 in pages
  • “mode:0xd0”: the flags passed to the kernel memory allocator

Regarding “order:9”, the kernel allocates pages in powers of 2. order:9 simply means it requested 2^9 pages (512), of whatever size they are. To see the size of your memory pages you can issue:

getconf PAGESIZE

In the case of this host, memory pages are 4096 bytes so the kernel was attempting to allocate 512 × 4096 = 2097152 bytes (2MB). “mode:0xd0” is the set of flags passed to the kernel memory allocator. You can find all possible flags in include/linux/gfp.h.
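Both fields can be decoded with a little shell arithmetic. This is a sketch under stated assumptions: the flag values in decode_gfp are the low __GFP_* bits as I recall them from 2.6.18-era kernels, and they have changed in later releases, so check include/linux/gfp.h for the kernel that produced your message. The function names are hypothetical.

```shell
#!/bin/sh
# Convert an allocation order to bytes: 2^order pages of the given size.
# The page size defaults to what getconf reports on this host.
order_to_bytes() {
    order=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( (1 << order) * page_size ))
}

# Name the low GFP bits set in a mode value.
# ASSUMPTION: bit values from 2.6.18-era include/linux/gfp.h
# (0x10 __GFP_WAIT, 0x20 __GFP_HIGH, 0x40 __GFP_IO, 0x80 __GFP_FS).
decode_gfp() {
    mode=$(( $1 ))
    out=""
    for pair in 16:__GFP_WAIT 32:__GFP_HIGH 64:__GFP_IO 128:__GFP_FS; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( mode & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

order_to_bytes 9 4096   # prints 2097152
decode_gfp 0xd0         # prints __GFP_WAIT __GFP_IO __GFP_FS
```

Under that assumption, 0xd0 is the combination kernels of that era define as GFP_KERNEL, an ordinary allocation that is allowed to sleep and to start I/O.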

“echo” caused the page allocation failure, which led to this call trace:

Dec  6 04:30:13 host kernel: Call Trace:
Dec  6 04:30:13 host kernel:  [<ffffffff8020f895>] __alloc_pages+0x2b5/0x2ce
Dec  6 04:30:13 host kernel:  [<ffffffff80212dac>] may_open+0x65/0x22f
Dec  6 04:30:13 host kernel:  [<ffffffff8023def3>] __get_free_pages+0x30/0x69
Dec  6 04:30:13 host kernel:  [<ffffffff884a9aa0>] :ip_conntrack:alloc_hashtable+0x33/0x7a
Dec  6 04:30:13 host kernel:  [<ffffffff884a9b56>] :ip_conntrack:set_hashsize+0x49/0x12a
Dec  6 04:30:13 host kernel:  [<ffffffff8029a32b>] param_attr_store+0x1a/0x29
Dec  6 04:30:13 host kernel:  [<ffffffff8029a37f>] module_attr_store+0x21/0x25
Dec  6 04:30:13 host kernel:  [<ffffffff802fdc83>] sysfs_write_file+0xb9/0xe8
Dec  6 04:30:13 host kernel:  [<ffffffff802171a7>] vfs_write+0xce/0x174
Dec  6 04:30:13 host kernel:  [<ffffffff802179df>] sys_write+0x45/0x6e
Dec  6 04:30:13 host kernel:  [<ffffffff80260106>] system_call+0x86/0x8b
Dec  6 04:30:13 host kernel:  [<ffffffff80260080>] system_call+0x0/0x8b

The first line is where the allocation failed, in __alloc_pages, which is no surprise. Going a bit deeper in the stack trace you can see that the caller was :ip_conntrack:alloc_hashtable, so the failure happened during an attempt to allocate 2MB for the ip_conntrack hash table.

After the above, the kernel dumps a fair amount of information (Mem-info) about the memory state of the host. If you’re interested in the kernel code involved, see show_mem() in lib/show_mem.c, and show_free_areas() in mm/page_alloc.c.

Dec  6 04:30:13 host kernel: Mem-info:
Dec  6 04:30:13 host kernel: DMA per-cpu:
Dec  6 04:30:13 host kernel: cpu 0 hot: high 186, batch 31 used:32
Dec  6 04:30:13 host kernel: cpu 0 cold: high 62, batch 15 used:57
Dec  6 04:30:13 host kernel: cpu 1 hot: high 186, batch 31 used:96
Dec  6 04:30:13 host kernel: cpu 1 cold: high 62, batch 15 used:11
Dec  6 04:30:13 host kernel: cpu 2 hot: high 186, batch 31 used:90
Dec  6 04:30:13 host kernel: cpu 2 cold: high 62, batch 15 used:53
Dec  6 04:30:13 host kernel: cpu 3 hot: high 186, batch 31 used:102
Dec  6 04:30:13 host kernel: cpu 3 cold: high 62, batch 15 used:7
Dec  6 04:30:13 host kernel: cpu 4 hot: high 186, batch 31 used:136
Dec  6 04:30:13 host kernel: cpu 4 cold: high 62, batch 15 used:14
Dec  6 04:30:13 host kernel: cpu 5 hot: high 186, batch 31 used:39
Dec  6 04:30:13 host kernel: cpu 5 cold: high 62, batch 15 used:3
Dec  6 04:30:13 host kernel: cpu 6 hot: high 186, batch 31 used:163
Dec  6 04:30:13 host kernel: cpu 6 cold: high 62, batch 15 used:12
Dec  6 04:30:13 host kernel: cpu 7 hot: high 186, batch 31 used:74
Dec  6 04:30:13 host kernel: cpu 7 cold: high 62, batch 15 used:0
Dec  6 04:30:13 host kernel: DMA32 per-cpu: empty
Dec  6 04:30:13 host kernel: Normal per-cpu: empty
Dec  6 04:30:13 host kernel: HighMem per-cpu: empty
Dec  6 04:30:13 host kernel: Free pages:       19348kB (0kB HighMem)
Dec  6 04:30:13 host kernel: Active:67877 inactive:18034 dirty:110 writeback:0 unstable:0 free:5017 slab:11928 mapped-file:3125 mapped-anon:28202 pagetables:854
Dec  6 04:30:13 host kernel: DMA free:21268kB min:2916kB low:3644kB high:4372kB active:271488kB inactive:70236kB present:532480kB pages_scanned:35 all_unreclaimable? no
Dec  6 04:30:13 host kernel: lowmem_reserve[]: 0 0 0 0
Dec  6 04:30:13 host kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Dec  6 04:30:13 host kernel: lowmem_reserve[]: 0 0 0 0
Dec  6 04:30:13 host kernel: Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Dec  6 04:30:13 host kernel: lowmem_reserve[]: 0 0 0 0
Dec  6 04:30:13 host kernel: HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Dec  6 04:30:13 host kernel: lowmem_reserve[]: 0 0 0 0
Dec  6 04:30:13 host kernel: DMA: 2090*4kB 1339*8kB 99*16kB 17*32kB 5*64kB 5*128kB 0*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 24208kB
Dec  6 04:30:13 host kernel: DMA32: empty
Dec  6 04:30:13 host kernel: Normal: empty
Dec  6 04:30:13 host kernel: HighMem: empty
Dec  6 04:30:13 host kernel: 61160 pagecache pages
Dec  6 04:30:13 host kernel: Swap cache: add 428356, delete 421485, find 49760550/49818299, race 0+197
Dec  6 04:30:13 host kernel: Free swap  = 1986220kB
Dec  6 04:30:13 host kernel: Total swap = 2096472kB
Dec  6 04:30:13 host kernel: Free swap:       1986220kB
Dec  6 04:30:13 host kernel: 133120 pages of RAM
Dec  6 04:30:13 host kernel: 22508 reserved pages
Dec  6 04:30:13 host kernel: 42739 pages shared
Dec  6 04:30:13 host kernel: 6835 pages swap cached
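The line in the dump that starts with “DMA: 2090*4kB” lists how many free blocks of each size the zone holds. It shows 0*2048kB and 0*4096kB, so although about 24MB was free in total, no single contiguous 2MB block was available. A small sketch that pulls the largest available block size out of such a line; largest_free_block is a hypothetical helper and it assumes the “count*sizekB” token format shown above.

```shell
#!/bin/sh
# Print the size in kB of the largest block with a non-zero free count
# in a free-list line such as "DMA: 2090*4kB 1339*8kB ... = 24208kB".
largest_free_block() {
    printf '%s\n' "$1" | awk '{
        largest = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+\*[0-9]+kB$/) {
                star = index($i, "*")
                count = substr($i, 1, star - 1) + 0
                size = substr($i, star + 1) + 0   # numeric prefix of "2048kB"
                if (count > 0 && size > largest) largest = size
            }
        }
        print largest
    }'
}

line="DMA: 2090*4kB 1339*8kB 99*16kB 17*32kB 5*64kB 5*128kB 0*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 24208kB"
largest_free_block "$line"   # prints 1024
```

Comparing that number with the size of the failed request (2048kB here) shows quickly whether the failure was caused by fragmentation rather than by a real shortage of memory.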

Finally we see a message about falling back to vmalloc.

Dec  6 04:30:13 host kernel: ip_conntrack: falling back to vmalloc.

The kernel first tried to get a physically contiguous block of memory (via __get_free_pages() in the trace above) and failed, so it fell back to vmalloc(), which can build the allocation out of non-contiguous pages.

Linux KVM: Openvswitch on Debian Wheezy

Among a great many other things, openvswitch is an alternative to bridge-utils for managing your virtual networking stacks for KVM. It supports VLANs, LACP, QoS, sFlow, and so forth. Listed below are the steps required to get openvswitch running on Debian 7.0 (wheezy).

This article is written with the presumption that you are running a source-installed kernel (3.6.6 with the openvswitch module in this case), and want to use the latest openvswitch from git.

Install prerequisites

Apply any available updates, get all the build dependencies for openvswitch, and install module-assistant.

apt-get update && apt-get dist-upgrade
apt-get install build-essential
apt-get build-dep openvswitch
apt-get install module-assistant

Prep your environment

The bridge kernel module used by bridge-utils conflicts with the brcompat module in openvswitch. Let’s remove bridge-utils and at the same time stop libvirt and KVM for a bit.

apt-get remove --purge bridge-utils
/etc/init.d/libvirt-bin stop
/etc/init.d/qemu-kvm stop

Build openvswitch

Clone the openvswitch git repo and build Debian packages from it.

git clone git://openvswitch.org/openvswitch
cd openvswitch
dpkg-buildpackage -b

Install the packages you just built.

cd ../
dpkg -i openvswitch-switch_1.9.90-1_amd64.deb openvswitch-common_1.9.90-1_amd64.deb \
openvswitch-brcompat_1.9.90-1_amd64.deb openvswitch-datapath-source_1.9.90-1_all.deb \
openvswitch-controller_1.9.90-1_amd64.deb openvswitch-pki_1.9.90-1_all.deb

Build openvswitch-datapath for your running kernel.

module-assistant auto-install openvswitch-datapath

Configure brcompat to load on startup.

sed -i 's/# BRCOMPAT=no/BRCOMPAT=yes/' /etc/default/openvswitch-switch

Verify your configuration

At this point you should reboot and verify that the proper modules are loaded, the service starts normally, and the status output is correct.

$ lsmod | grep brcompat
brcompat               12982  0 
openvswitch            73431  1 brcompat

$ /etc/init.d/openvswitch-switch restart
[ ok ] Killing ovs-brcompatd (5439).
[ ok ] Killing ovs-vswitchd (5414).
[ ok ] Killing ovsdb-server (5363).
[ ok ] Starting ovsdb-server.
[ ok ] Configuring Open vSwitch system IDs.
[ ok ] Starting ovs-vswitchd.
[ ok ] Starting ovs-brcompatd.

$ /etc/init.d/openvswitch-switch status
ovsdb-server is running with pid 6281
ovs-vswitchd is running with pid 6332
ovs-brcompatd is running with pid 6357

And that’s it! You now have a working openvswitch installation upon which you can do all the usual things you did with bridge-utils, and so much more.
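As a first step, here is a rough sketch of creating a bridge and attaching an uplink with ovs-vsctl. The names br0 and eth0 are assumptions, as are the run and create_bridge helpers. The script only prints the commands unless DRY_RUN=0 is set, because moving a live interface onto a bridge will drop its connectivity until the IP configuration is moved to the bridge.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create an Open vSwitch bridge, attach a physical uplink to it,
# and show the resulting configuration.
create_bridge() {
    bridge=$1
    uplink=$2
    run ovs-vsctl add-br "$bridge"
    run ovs-vsctl add-port "$bridge" "$uplink"
    run ovs-vsctl show
}

create_bridge br0 eth0
```

Run it once as-is to review the commands, then again as root with DRY_RUN=0 when you are ready to apply them.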

leap seconds and Linux

On June 30, 2012 a leap second was inserted into UTC, which caused a fair amount of difficulty for companies across the Internet. What follows is some explanation of leap seconds, the problems with their handling in the Linux kernel, and the solutions to those problems.

What are leap seconds?

A leap second is a one second adjustment that is applied to UTC in order to prevent it from deviating more than 0.9 seconds from UT1 (mean solar time). It can be positive or negative and is implemented by adding 23:59:60 or skipping 23:59:59 on the last day of a given month (usually June 30 or December 31). Since the UTC standard was established in 1972, however, 25 leap seconds have been scheduled and all of them have been positive.

Since they are dependent on climatic and geologic events that affect the Earth’s moment of inertia (mostly tidal friction), leap seconds are irregularly spaced and unpredictable. The International Earth Rotation and Reference Systems Service (IERS) is responsible for deciding when leap seconds will occur, and announces them about six months in advance. The most recent leap second was inserted on June 30, 2012 at 23:59:60 UTC. It has been announced that there will not be a leap second on December 31, 2012.

What problems do leap seconds cause?

Leap seconds are problematic in computing for a number of reasons. As an example, to compute the elapsed seconds between two UTC dates in the past requires a table of leap seconds which must be updated whenever one is announced. It is also impossible to calculate accurate time intervals for UTC dates farther in the future than the interval of leap second announcements. There are more practical problems dealing with distributed systems that depend on accurate time stamping of series data.

In particular, there have been problems with the implementation of leap second handling in the Linux kernel itself. When the last leap second occurred on June 30, 2012 this caused outages at reddit (Apache Cassandra), Mozilla (Hadoop), Qantas Airlines, and other sites. Generally speaking, leap second problems on Linux hosts are characterized by high CPU usage of certain processes immediately after application of a leap second to the local clock.

In one particular case, tgtd (scsi-target-utils) on CentOS 6 hosts began generating an average of 14,000 log messages per second:

Jun 30 23:59:59 host kernel: Clock: inserting leap second 23:59:60 UTC
Jun 30 23:59:59 host tgtd: work_timer_evt_handler(89) failed to read from timerfd, Resource temporarily unavailable
Jun 30 23:59:59 host tgtd: work_timer_evt_handler(89) failed to read from timerfd, Resource temporarily unavailable
Jun 30 23:59:59 host tgtd: work_timer_evt_handler(89) failed to read from timerfd, Resource temporarily unavailable

This caused the root file system of approximately 600 hosts to become full before the issue was mitigated.

Why do these problems occur?

The last leap second exposed a kernel bug that can affect any threaded application. It is most apparent with applications that use sub-second CLOCK_REALTIME timeouts in a loop, usually connected with futexes.

On July 3, 2007 commit 746976a301ac9c9aa10d7d42454f8d6cdad8ff2b (2.6.22) removed the clock_was_set() call from second_overflow() to prevent a deadlock. Due to this patch the following occurs when a leap second is added to UTC:

  • The leap second occurs and CLOCK_REALTIME is set back by one second
  • clock_was_set() is not called by second_overflow() so the hrtimer base.offset value for CLOCK_REALTIME is not updated
  • CLOCK_REALTIME’s sense of wall time is now one second ahead of the timekeeping core’s
  • At interrupt time, hrtimer code expires all CLOCK_REALTIME timers that are set for ($interrupt_time + 1 second) and before

At this point all TIMER_ABSTIME CLOCK_REALTIME timers now expire one second early. Even worse, all sub-second TIMER_ABSTIME CLOCK_REALTIME timers will return immediately. Any applications that use such timer calls in a loop will experience load spikes. This situation persists until clock_was_set() is called, for example, via settimeofday().

On July 13, 2012, Linus merged several commits in d55e5bd0201a2af0182687882a92c5f95dbccc12 (3.5-rc7) which, beyond simply providing clock_was_set_delayed() in hrtimer to resolve the problem, included other rework of hrtimer and timekeeping.

Affected Kernels

This problem has existed since kernel 2.6.22. All kernels from 2.6.22 to 3.5-rc7 are presumably affected. All RHEL 5.x kernels already include a patch to avoid this bug. Unfortunately, Red Hat either neglected to patch, or mispatched, RHEL 6 for the same issue. All RHEL 6 kernels are vulnerable to this problem; patches are available in the following updates:

  • RHEL 6.3: kernel-2.6.32-279.5.2
  • RHEL 6.2 Extended Updates: kernel-2.6.32-220.25.1.el6
  • RHEL 6.1 Extended Updates: kernel-2.6.32-131.30.2

In Debian and its derivatives this issue is patched in the following kernel updates:

  • Debian 6.x (squeeze): linux-image-2.6.32-46
  • Debian 7.x (wheezy): linux-image-3.2.29-1

Resolution

Quite obviously, the most prudent fix is to apply a patched kernel package to the affected host, or upgrade to an upstream kernel newer than 3.5-rc7. If a given host cannot be patched, it is possible to manually call settimeofday() after a leap second is applied by issuing either of the following:

date -s "`LC_ALL=C date`"
date `date +'%m%d%H%M%C%y.%S'`

Doing so will resolve any present issues on the host in question.

Another interesting approach to solving this problem was devised by Google, which they call “Leap Smear”. Since Google runs its own stratum 2 NTP servers, they patched NTP to not issue LI (leap indicator) and instead “smear” a leap second by modulating ‘lie’ over a time window w before midnight:

lie(t) = (1.0 - cos(pi * t / w)) / 2.0

You can read more about the leap smear technique at their blog.
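To get a feel for the shape of that curve, the formula can be evaluated with awk. This is only an illustration of the arithmetic, not Google’s implementation: the 20-hour window (w=72000 seconds) is a hypothetical value, and I am reading t as the number of seconds since the start of the window.

```shell
#!/bin/sh
# Evaluate lie(t) = (1 - cos(pi * t / w)) / 2 for 0 <= t <= w.
# t: seconds since the start of the smear window (assumed interpretation)
# w: length of the smear window in seconds
lie() {
    awk -v t="$1" -v w="$2" 'BEGIN {
        pi = atan2(0, -1)
        printf "%.6f\n", (1.0 - cos(pi * t / w)) / 2.0
    }'
}

w=72000   # hypothetical 20-hour window
for t in 0 18000 36000 54000 72000; do
    echo "t=$t lie=$(lie "$t" "$w")"
done
```

The value starts at 0, reaches 0.5 halfway through the window, and ends at 1, changing slowly at both ends, so clients never see a sudden one-second step.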

diskless booting with PXE and NFS

For a long time now I’ve wanted to set up all my mythfrontends to be diskless nodes that boot via PXE using an NFS share as their root filesystem. I finally got around to doing this. I was even able to just migrate my existing installations directly into the PXE boot environment. Here is how I accomplished it…
