Custom-built Linux router, (no) thanks to Realtek

In my last post, I spelled out my requirements for a home router: dependable and not requiring babysitting or monthly rebooting, but flexible enough to let me run and control dnsmasq, tcpdump, and VLANs.

When I realized I was seeing so much weirdness at once from my OpenWrt router as to be circumstantial evidence against OpenWrt itself, I mentioned this to my officemate and he said “why don’t you stop screwing around and install full-blown Linux?” Sure, I thought, but that brings up two problems: it sounded like a huge time suck, and where am I going to find appropriate hardware to use as a router?

With help from friends I eventually solved the second one, not without paying a heavy tax in terms of the first (and not in the way I expected): this is that story.

On the time suck question, it seemed like I would have to learn a lot of new things to set up, and possibly maintain, many tasks that I was accustomed to having OpenWrt do for me. I already knew how to install and administer Linux for standard desktop or server use, but I’d never configured any advanced networking topologies, and my few interactions with iptables had been painful, so configuring NAT and firewalling and routing and dealing with multiple network interfaces was daunting (and this box is by definition going to be exposed to the Internet, so I’d better get the firewall right). I poked around and found shorewall, which exists basically to configure the parts that I didn’t already know how to do, and the more I read about it, the more it seemed a good match for what I was trying to do.

On the hardware question, I wanted something small and quiet and low-power, which would fit in my server rack and stay on all the time without running up my power bill or generating so much heat that it either fails or needs a leaf blower of a fan. (That basically describes most consumer routers, for which, generally, the closest thing you can find to a standard Linux distribution supporting them is OpenWrt. Ahem.)

I also wanted it to have multiple network interfaces, as a router should. (This may or may not be relevant to the hardware decision, though; read on.) A router needs a minimum of two interfaces by definition: one for each network it routes between, so at the minimum, one for the LAN and one for the WAN. The scenario I had in mind was more complicated, with two separate LANs (one for my family and one for guests who just want to get their tablet on the internet), and leaving room for the possibility of multiple internet providers, so I’d need at least 3 interfaces, with the option to expand to 4 or more in the future. Now, these don’t all have to be physical interfaces built into the router. If the router has USB ports, you can add more that way; also if you have other physical infrastructure supporting VLANs, you can multiplex several networks over one physical port. (Again as a comparison to OpenWrt: standard consumer routers that OpenWrt runs on tend to have 5 ports and 2 network interfaces; one network interface is connected directly to one port labeled WAN, and the other interface is connected through a switch chip to the other 4 ports, which by default are bridged onto a single VLAN but which can be configured for 4 separate VLANs if that floats your boat.)

After getting some advice from friends and discussing it ad nauseam, I ended up buying a fit-pc2i, notable because it’s a standard x86 PC (so I can choose really any standard Linux distribution or even Windows to run on it), in a tiny passively cooled case, drawing 6W, and with 2 physical network interfaces. (I didn’t like the idea of depending on a bunch of USB network adapters, and I wasn’t sure I could rely only on VLAN support to get extra ports, so I wanted a 2nd port for insurance. Now that I’ve used it, I think a single reliable physical network interface + VLAN support would work out fine.) Those 2 network ports are not enough for my scenario, so I also bought a Cisco SG200-08 switch, which I use solely to add ports, turning 1 into 8.

Having made these decisions, I bought the fit-pc2i and SG200, installed Linux (Ubuntu 11.10 Server) and dnsmasq and shorewall, configured VLANs between the router and switch so that various switch ports acted like they were connected to additional eth1.x interfaces in the router, and started testing things. It worked fine until I tried a speed test (from a client connected through the new router, which was connected behind my old router); the speed test promptly hung from the client’s point of view, and I couldn’t access the new router over the network at all. I power cycled the new router, tried again, same result. I poked through log files, tried to enable the Linux NMI watchdog, and generally looked for clues without finding anything until I visited the fit-pc forums and read “solution for freezes when scp/ftp/nfs with most Linux dist”. This pointed the blame squarely at the Realtek network interfaces, and suggested an alternate driver as a solution. Once I started investigating fixes for this, I got really pessimistic at first: a Google query for “r8169 freeze” shows a dismaying number of hits, many in distribution-specific bug reports going back years and years. I’d been under the assumption that networking is Linux’s lifeblood and that wired networking has long been a solved problem. Wireless network hardware flaky under Linux, sure; any network hardware flaky under Windows, sure; but wired network hardware flaky under Linux? That was a rude surprise.

Long story shorter, the in-tree driver (open source and provided with Linux kernels) for this class of Realtek hardware is named r8169. It actually supports a family of Realtek chips named RTL8111/RTL8168, of which there are apparently many variants with important programming differences even under the same PCI ID, so using lspci won’t necessarily tell you enough about which one you have. Realtek also has their own driver, also ostensibly open source but not included in the standard Linux kernel, called r8168. For years now, people have been writing blog posts saying “I had such and such a problem with r8169 and I switched to r8168 and it worked better.” So naturally, I tried r8168, and found it didn’t work at all. Upon further investigation, its VLAN support was completely broken (at least on my hardware, in the 8.027 driver that was current at the time, in the phase of the moon that obtained at the time). On a non-virtual interface it worked fine (and without freezing the kernel), but frames that should have had an 802.1Q tag added or removed had it done incorrectly. Outgoing frames would get ignored by the switch, and incoming frames would get ignored by the kernel. I spent hours running 3 instances of tcpdump: on the fitpc on the raw interface, on the fitpc on the virtual interface, and on a separate machine plugged into a switch port on the SG200 mapped to the same VLAN. After that I could characterize the problem: outgoing frames were transmitted with no tag, and got dropped by the switch. Incoming frames with a tag actually had it stripped and were dispatched properly. I found out about “ethtool -K” to control hardware acceleration of VLAN tagging (does this really benefit from hardware acceleration? More than it loses from the possibility of someone screwing it up?). I disabled VLAN tag hardware acceleration in both directions, and found the opposite problem. Just by luck, at this point I re-enabled hardware acceleration for VLAN tagging only on the RX path, and things started working.
But only on certain ports.

As a recap of what I found to be broken with r8168 and 802.1Q: as the driver loads by default, it improperly tags packets on the TX path. If I use “ethtool -K txvlan off”, TX works but RX packets are ignored. If I use “ethtool -K txvlan off rxvlan off” followed by “ethtool -K txvlan off rxvlan on”, TX and RX both work, but flakily. Some ports and protocols work and some don’t. I don’t know why, but I’ve spent too much time staring at packet traces and I don’t care any more. The driver is broken out of the box. It can be made to almost work through an order-dependent sequence of VLAN tag acceleration toggles reminiscent of port knocking, but it still doesn’t entirely work, and I’m not going to trust it.
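For readers who haven't looked inside a VLAN frame: the txvlan/rxvlan offloads being toggled above cover one small operation. On transmit, a 4-byte 802.1Q header is inserted right after the source MAC address. On receive, that header is removed again. The header is the TPID 0x8100 followed by a 16-bit TCI made up of a 3-bit priority, a 1-bit DEI flag and a 12-bit VLAN ID. When the offload is switched off, the kernel does this job in software instead. The Python below is only a sketch of the byte manipulation, not driver code. The VLAN ID 10 and the dummy frame are made-up example values.

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame
MACS_LEN = 12        # destination MAC (6 bytes) + source MAC (6 bytes)


def add_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the MAC addresses (the TX-side job)."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must be in 0..7")
    tci = (pcp << 13) | vid  # 3-bit priority, DEI bit left at 0, 12-bit VLAN ID
    return frame[:MACS_LEN] + struct.pack("!HH", TPID_8021Q, tci) + frame[MACS_LEN:]


def strip_vlan_tag(frame: bytes) -> Tuple[Optional[int], bytes]:
    """Remove an 802.1Q tag if present (the RX-side job).

    Returns (vlan_id, untagged_frame); vlan_id is None if the frame had no tag.
    """
    (ethertype,) = struct.unpack_from("!H", frame, MACS_LEN)
    if ethertype != TPID_8021Q:
        return None, frame
    (tci,) = struct.unpack_from("!H", frame, MACS_LEN + 2)
    return tci & 0x0FFF, frame[:MACS_LEN] + frame[MACS_LEN + 4:]


# Example: a broadcast IPv4 frame (EtherType 0x0800) with a dummy payload.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"payload"
tagged = add_vlan_tag(frame, vid=10)
print(tagged[12:16].hex())        # 8100000a
print(strip_vlan_tag(tagged)[0])  # 10
```

In the failure described above, the TX side behaved as though `add_vlan_tag` were silently skipped. The untagged frame went out and the switch dropped it, because that port expected tagged traffic.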

Then, back in r8169-land, I found an Ubuntu bug report, Network problem with the r8169 driver and RTL8111/8168B, in response to which people said the 3.1 kernel driver seems to work better than the 3.0 kernel driver, and Leann Ogasawara produced a 3.0 kernel with the 3.1 r8169 driver grafted in for people to try. So I tried it, and: lo and behold, while my repro scenario would still provoke a nasty warning and stack trace in system.log, there was no freeze.

At this point, I reported my findings to both the r8168 maintainers (Realtek) and the r8169 maintainers (the Linux netdev mailing list and Francois Romieu). Realtek didn’t respond at all. Francois did reply, saying he’d been fixing a bunch of problems in this area recently, and the 3.2 driver should work even better (this was last December, in the final throes of the 3.2 kernel release). I grabbed a 3.2RC7 kernel, installed it on the Ubuntu 11.10 system I was using, and it worked fine. No warnings, no backtraces, no freezes.

I haven’t touched the configuration since; after another week or so of testing I installed the fitpc + Ubuntu 11.10 + the 3.2RC7 kernel as our main router, and we haven’t had any problems with it. Hopefully, the Ubuntu 12.04 release (which already uses a 3.2 kernel) will install and run fine, and I won’t have to worry about this for another 5 years since 12.04 is an LTS release.

Lessons learned here:

  • VLANs are cool, and I don’t really need more than one physical interface on the router the way I’m using it. I recommend the VLAN + separate switch as port splitter technique. But you do want a gigabit network interface if you’re going to do that.
  • Realtek was the bane of my existence for a few weeks in December. It looks like I just had bad timing, and if I’d done the same setup in April using Ubuntu 12.04 with a Linux 3.2 kernel I wouldn’t have had to learn any of this r8168/r8169 business. But given the history, I wouldn’t recommend their products. I went so far as to reconsider the whole fitpc choice. But in this form factor, it seems all the alternatives (including other fitpc products) use Realtek NICs.
  • shorewall makes configuration of NAT routing, firewalling, and traffic shaping much easier, in my opinion, than raw iptables and tc.
  • Aside from dealing with the Realtek issue, this was less of a time suck than I was expecting.
  • Including dealing with the Realtek issue, this was more of a time suck than I was expecting.
  • I’m happy with the result, though. Treating the fit-pc2i and SG200 as one unit, I have something that’s about the size and power consumption of the OpenWrt router, except now it’s got a 1 GHz x86 CPU, 1GB of RAM, 32GB of flash storage, 9 individually addressable network ports, and is still entirely solid state. Those hardware specs only matter inasmuch as they give me plenty of breathing room for future expansion (I don’t think my actual usage was taxing the much-lower-specced OpenWrt router), but the real bonus is that it’s stable: OpenVPN, dnsmasq and miniupnpd are all behaving as they ought to.

OpenWrt is losing its luster

Ever since I discovered it in 2006, I’ve used, and been a huge fan of, OpenWrt for my home network router. I started out on a genuine WRT54G, at a time when I wanted to tweak some configuration options that the stock firmware didn’t give me control over. I dabbled in alternate firmware like Tomato and HyperWRT but quickly found that their filesystem layout made them too hard to customize. Then I discovered OpenWrt, found the design far saner, and started running the venerable Whiterussian release.

While my original purpose in looking past the factory firmware (power boosting to increase range) didn’t work out well (it boosted the signal and the noise alike, with little effect on range, go figure; replacing the antenna worked far better), I soon found myself addicted to several features of a fully configurable router. Probably at least half of what I loved about OpenWrt can be attributed directly to dnsmasq, a wonderfully clever piece of software that acts as both a DHCP server and a DNS server, and coordinates across the two protocols, so that in general, all DHCP clients get entered in the DNS with reasonable hostnames. (This solves the same problem that Microsoft tried to solve with NetBIOS and WINS, and that Apple tried to solve with Rendezvous/Bonjour, but the dnsmasq approach doesn’t require any new protocols or software on any of the clients: DNS just works.) Beyond dnsmasq, here is a short list of things I ended up configuring my OpenWrt router to do, which I hadn’t realized I needed but which proved very useful once available:

  • OpenVPN server: to connect a file server I set up at a friend’s house for offsite backup
  • avahi: to forward mDNS across the OpenVPN link as appropriate
  • VLAN support: in addition to my private LAN, I ran a separate semi-public network with wireless access points using a separate SSID bridged onto a separate VLAN which had access to the internet but not my LAN
  • tcpdump: whenever having network problems, the ability to sniff traffic on the router is invaluable in troubleshooting them
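The DHCP/DNS coordination credited to dnsmasq above comes down to a simple idea. When a client requests a DHCP lease and sends a hostname, the server remembers that name next to the address it handed out. Later it answers DNS queries for that name, bare or qualified with a local domain, from the same table. The Python below is a toy model of that idea, not how dnsmasq is actually implemented. The class name, the `home.lan` domain and the address pool are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


class ToyDhcpDns:
    """Hypothetical combined server in which DHCP leases feed the DNS table."""

    def __init__(self, network: str, domain: str) -> None:
        hosts = list(ipaddress.ip_network(network).hosts())
        self._free = hosts[1:]  # first host address is kept for the router itself
        self._by_mac: Dict[str, ipaddress.IPv4Address] = {}
        self._by_name: Dict[str, ipaddress.IPv4Address] = {}
        self.domain = domain

    def dhcp_request(self, mac: str, hostname: Optional[str]) -> ipaddress.IPv4Address:
        """Grant or renew a lease; record the hostname the client sent, if any."""
        addr = self._by_mac.get(mac)
        if addr is None:
            addr = self._free.pop(0)  # raises IndexError when the pool is exhausted
            self._by_mac[mac] = addr
        if hostname:
            self._by_name[hostname.lower()] = addr
        return addr

    def dns_lookup(self, name: str) -> Optional[ipaddress.IPv4Address]:
        """Resolve 'laptop' or 'laptop.home.lan'; None stands in for NXDOMAIN."""
        name = name.lower().rstrip(".")
        suffix = "." + self.domain
        if name.endswith(suffix):
            name = name[: -len(suffix)]
        return self._by_name.get(name)


server = ToyDhcpDns("192.168.1.0/24", "home.lan")
server.dhcp_request("00:11:22:33:44:55", "Laptop")
print(server.dns_lookup("laptop.home.lan"))  # 192.168.1.2
```

The point of keeping a single table is that clients need nothing beyond the hostname they already send in their DHCP request. That is why no new protocol or client software is required.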

This worked great for years, but eventually the WRT54G got flaky and started rebooting every few days. When it came time to replace it, I wanted newer and faster hardware: compared to 2002-era hardware, I wanted more RAM, more flash storage, gigabit wired networking, and N-speed wireless networking. This was 2010, and such hardware was readily available. I picked a suitable-looking model with good hardware (Netgear WNDR3700), installed the current OpenWrt release (10.03, Backfire), and at first things were great (see Note 1).

So what’s the problem, you may ask? Well, over time I started experiencing a raft of weird problems, detailed below, and slowly but surely I started associating them with the current version of OpenWrt. I tried newer versions, filed bug reports, asked for help in the forums, and even switched to different hardware, but eventually these problems piled up to the point that I decided to switch away. I still have a lot of respect for the project and its volunteer developers, but it just wasn’t working out for me. My unprovable hypothesis is this: OpenWrt originally started as a fork of the GPL-mandated open source drop of Linksys’ own WRT54G firmware. Thanks to Linksys’ own engineering and binary drivers for the specific hardware they used, the result wasn’t particularly clean but worked great. Over time, OpenWrt started running on more and more hardware, tracking closer to stock Linux instead of vendor-provided customizations, and using open source drivers instead of opaque vendor binary drivers. As a result it has had to deal with more configuration sprawl while getting less benefit from vendor QA, and the result, while much cleaner, is also flakier. I don’t know that this is true, but in terms of pure stability, I never had a problem with Whiterussian on 2002-era hardware (Linksys WRT54G and Asus WL500G-P), whereas I saw all sorts of weirdness with Backfire on 2010-era hardware (Netgear WNDR3700 and Buffalo WZR-HP-300N).

On to the problem list:

  1. On the WNDR3700, internet connectivity would occasionally just stop. My LAN would be fine, except no contact with the outside world. I first blamed this on my cable modem and provider (hence this series of articles), and at the time I’d had such a good experience with OpenWrt that it was a long time before I thought to blame the router, but eventually I looked in that direction, and found eth1 would get into a stuck state that I could reset either by replugging the cable or using mii-tool to reset the media interface. I posted about this on the OpenWrt forum, without hearing much.

  2. Because of that problem, and because I couldn’t tell whether it was hardware or software at fault, I did the extremely scientific thing of changing two variables at once, and bought a different router (Buffalo WZR-HP-300N) and installed a newer Backfire (10.03.1) on it. I didn’t have the eth1 problem any more. But I did have a different and equally annoying problem at approximately the same frequency: dnsmasq would stop serving requests. Since it’s responsible for both DHCP and DNS, this was fairly crippling. I’m loath to blame dnsmasq itself since I’ve never seen this behavior from it in any other install, and also because this was accompanied by weird system-level behavior. When it got into this state, I was unable to kill -9 the dnsmasq process, nslookup processes on the router would also become unkillable, and the router would fail to soft-reboot, so I had to hard power cycle it. I posted about this on the OpenWrt forum as well, to deafening silence.

  3. I’d see a low but present rate of DNS lookup request failures — a valid request would return NXDOMAIN, and repeated immediately would succeed. This caused a small amount of application-layer collateral damage and general flakiness. Of course, I don’t know whether to blame Backfire or dnsmasq or something else. But I saw this problem during the 2 years I used Backfire as my main router, and not before or since.

  4. The OpenVPN link would eventually stop carrying traffic. Traffic that should be routed over the VPN link would just disappear. At this point, I’d try troubleshooting first with tcpdump, then by adding verbose logging like echo-a-character-for-each-packet in OpenVPN (which requires restarting the server), then restarting the firewall a few times, and none of these changes would make a difference immediately, but after restarting both OpenVPN and the firewall a few times each (without making any configuration changes other than enabling logging), suddenly I’d start seeing the echo characters and packets would flow; then I’d disable the logging and restart OpenVPN and it would work fine for a few more weeks. Again, I have no proof that Backfire was to blame here. But as in #3, I saw this problem during the 2 years I used Backfire as my main router, and not before or since.

  5. My Xbox 360 suddenly found itself unable to sign into Xbox Live. I noticed this in January 2011 and while it’s hard to know exactly when it started or what changed, I hadn’t changed anything in the router configuration recently, so I suspect the Xbox dashboard update shortly before that made the critical change. But I blame OpenWrt, not the Xbox software, because I sniffed the traffic it was sending, and found weirdness. Specifically, the Xbox likes to use UPnP to open network ports, and I had miniupnpd enabled on the router to allow this, and in the broken state, the Xbox would make a UPnP request to forward UDP port 3074 to itself, then it would send outgoing UDP packets to something.xboxlive.com:3074, and the router would see that as eligible for the forwarding rule it just set up, and immediately reflect the Xbox’s own packet back to it. (It would also get NATted and go out on the WAN, and a few hundred milliseconds later a response would arrive from the real Xbox Live service, but by then the Xbox was already confused.) Meanwhile, the port forwarding wasn’t actually necessary because of the OpenWrt NAT implementation. So if I stopped miniupnpd, the Xbox would be able to sign into Live, but other things I run that want UPnP or NAT-PMP would break. I was able to have it both ways by blacklisting the Xbox from talking to miniupnpd, but this didn’t inspire confidence.
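The reflection in item 5 is easier to see with a toy model of how a port-forwarding rule matches packets. This is a guess that fits the symptoms. It has not been checked against miniupnpd or the OpenWrt firewall rules. The guess is that the rule effectively matched on protocol and destination port alone. In that case, the console's own outgoing packet to port 3074 on a remote server looks just like incoming traffic for the mapping. A rule that only matches packets arriving on the WAN side, addressed to the router's public address, would leave that packet alone. All addresses and function names below are hypothetical, and the model only covers the matching decision. In reality the packet was also NATted out to the WAN.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Packet:
    ingress: str  # side the packet arrived on: "lan" or "wan"
    src: str
    dst: str
    proto: str
    dport: int


@dataclass(frozen=True)
class ForwardRule:
    """A UPnP-style mapping: traffic for proto/port goes to an internal host."""
    proto: str
    port: int
    internal_host: str


WAN_ADDR = "203.0.113.7"  # hypothetical public address of the router


def loose_match(rule: ForwardRule, pkt: Packet) -> bool:
    # Protocol and destination port only: the console's own outgoing packet
    # to port 3074 on a remote server matches too.
    return pkt.proto == rule.proto and pkt.dport == rule.port


def scoped_match(rule: ForwardRule, pkt: Packet) -> bool:
    # Additionally require that the packet came in from the internet and was
    # addressed to the router's public address.
    return pkt.ingress == "wan" and pkt.dst == WAN_ADDR and loose_match(rule, pkt)


def next_hop(rule: ForwardRule, pkt: Packet,
             matcher: Callable[[ForwardRule, Packet], bool]) -> str:
    if matcher(rule, pkt):
        return rule.internal_host  # destination rewritten to the mapped LAN host
    return "wan"  # ordinary traffic: NAT it and send it upstream


xbox = "192.168.1.50"                  # hypothetical console address
rule = ForwardRule("udp", 3074, xbox)  # the mapping the console requested
to_live = Packet("lan", xbox, "198.51.100.20", "udp", 3074)  # hypothetical server
reply = Packet("wan", "198.51.100.20", WAN_ADDR, "udp", 3074)

print(next_hop(rule, to_live, loose_match))   # 192.168.1.50 (reflected back)
print(next_hop(rule, to_live, scoped_match))  # wan
print(next_hop(rule, reply, scoped_match))    # 192.168.1.50
```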

Once I mentally assembled this list into one place, I realized there was enough general weirdness here that I was no longer happy with OpenWrt, and I wanted something that would be completely stable and dependable. That made it tempting to buy a router with the right capabilities as advertised features out of the box, and stick with the stock firmware, but I didn’t want to give up on the flexibility I’d become accustomed to (especially dnsmasq, the ability to run tcpdump, and the ability to install extra services). What I actually did about this is a story for another time. Here, I’ll just say that after replacing the OpenWrt router 3 months ago, I haven’t seen any of the above problems recur.

Note 1: Actually, getting Backfire working well on the WNDR3700 took a little hacking. I initially started setting this up before the final Backfire release, so I had to compile from source. That was fine, because then, and even now, 5GHz radio support doesn’t work in the precompiled releases. Too bad, since dual radio support was my reason for choosing the WNDR3700 in the first place.

Siri, keeping the “artificial” in “artificial intelligence”

A lot has already been written on Siri, most of it pretty polarized — either of the “it’s a gimmick nobody uses”, or the “wait, but I use it!” variety. I don’t mean to throw fuel on the fire, but I keep trying to use Siri and haven’t found it to work well enough to be worth the effort; here I’ll present some examples which are both edifying and amusing.

(There’s one glaring exception to “I haven’t found it to work well enough”: for straight dictation, it’s uncannily accurate. If it starts and stops listening at the right time and doesn’t encounter a network error — that is, when it works at all — it almost always transcribes exactly what I said. It’s strange to me that Apple brands the dictation feature as part of Siri, because to my way of thinking they’re entirely separate layers — dictation or transcription for turning spoken sounds into text, and then Siri for assigning a meaning to that text and acting on the meaning — but as far as I can tell, Apple considers dictation to be a Siri function, so I have to give her credit: she’s great at understanding the words I say.)

In the following real-usage examples, Siri, while great at understanding what I say (in all these cases the transcribed query is exactly what I meant to say), is not so great at understanding what I mean.

Flight status:

[Screenshot: IMG_0326]

“I can’t help you with flights.” Actually she did understand what I meant, but doesn’t know how to help. That’s too bad; that seems like a really natural function for a personal assistant, and one that lends itself well to machine structured representation and constrained search. I expected Siri would knock this one out of the park.

Appointment reminder:

[Screenshot: IMG_0327]

“3 weeks from today” gets interpreted as “1 week from today”. (This screenshot was taken on 2/9, as you can see from the blue marking, and 2/16, where Siri put the appointment, is decidedly not 3 weeks later.) Again, Siri is often credited (and advertised) as being good at setting up appointments; I expected her to knock this one out of the park.
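For the record, here is the arithmetic Siri got wrong. The year isn't stated, so the sketch below assumes 2012. That matters, because 2012 is a leap year and February has 29 days. On that assumption, three weeks from 2/9 is 3/1, while 2/16 is exactly one week out.

```python
from datetime import date, timedelta

asked_on = date(2012, 2, 9)  # assumption: the year is not stated in the post
intended = asked_on + timedelta(weeks=3)
siri_pick = date(2012, 2, 16)

print(intended)                          # 2012-03-01
print((siri_pick - asked_on).days // 7)  # 1, i.e. one week
```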

Another appointment reminder:

[Screenshot: IMG_0364]

“5 weeks from today” gets interpreted as “tomorrow”. Otherwise, so close.

Knowledge question:

[Screenshot: IMG_0438]

This one is more of a stretch, and I wouldn’t have been surprised had Siri fallen back on generic web search, but instead she tries to interpret it and get the answer from Wolfram Alpha, so points for that. But (and this is the reason it rates inclusion in this post) the interpretation she (and/or Wolfram) chooses is straight up bizarre. (I really had to scratch my head to deconstruct this. It decided that “king” refers to a magazine which apparently has a circulation of 223,000; it decided that “bed” is an abbreviation for “banana equivalent dose”; I have no idea why or how it thinks those concepts are related. If you go to Wolfram Alpha yourself and search for “king size bed”, it actually understands the topic is “bed sizes”, though it has no information on them. If you search for “king bed”, it does take “bed” to mean the radiation unit, does not take “king” to mean the magazine, and clarifies “assuming king is a unit” while offering to “use king bed as a bed size topic instead”. So I think Siri deserves the credit for taking this from crazy to awesome.)

And finally, accidental bonus humor:

[Screenshot: IMG_0436]

OK, this wasn’t meant for Siri’s ears at all (it’s not even a question); in fact I thought I was dictating a text message in the Messages app. (Having input focus in the wrong spot is a strange and niggling complaint for a touch platform like iOS, but this is what happens.) Siri’s interpretation of why I’d be telling her about a rat eating my compost is, however, both noteworthy and disturbing, especially for the establishments it recommends (neither of which deserves this treatment, as I can vouch).

On a final note, I’ll reiterate that these are not contrived examples; excluding the compost-eating rat which was a mistake, this list represents the last 4 times I tried to use Siri with real intent, and at least 3 of these are clearly in her domain. With results like these, I don’t know that I’ll keep trying.

Headphones that let sound in but not out

When our son Dominic was born, I knew that my wife and I (and he) would start having diverging sleep schedules for a while, and I suddenly discovered a need for something I’d never considered before: headphones that let ambient sound in but not out (and also umpteen pieces of more traditional baby paraphernalia, natch). The reason being: if they’re resting or sleeping and I’m not tired but want to stay in the same room, can I work or entertain myself without bothering them, and also without completely isolating myself from them? If I’m listening to music and Vanessa talks or Dominic squawks, I want to be able to hear them.

This is the opposite of what people usually want when using headphones: usually, you want the best fidelity possible for the sound playing in the headphones, without outside noise or distractions. Allowing ambient noise to intrude on the music is not normally the goal. So, it’s not a problem with many ready solutions.

I’m familiar with the basic headphone designs: in-ear, and open- and closed-back on-ear. Open-back headphones tend to be comfortable and provide a good soundstage, but don’t isolate your ears from the environment (in either direction), so they’re not suitable for noisy environments (you’ll hear the noise mixed with your music) and they’re not suitable for quiet environments with other people nearby (they’ll hear a tinny, audible, annoying version of what you’re listening to). Closed-back and in-ear headphones tend to have a physical seal between your ears and the outside world, with the headphone drivers on the inside of that seal, so they isolate you and your music from the environment — good if you don’t want to annoy nearby people, or if you don’t want to hear ambient noise, but not so good if you need to react to what’s going on around you.

In addition to these different physical and acoustic designs for allowing or blocking sound transfer, there are also active noise-canceling headphones. These are of the closed or in-ear variety (to physically block most sound transfer), but they also have microphones and some active electronics that pick up ambient noise, invert it, and mix the inverted signal into your audio stream, to really minimize ambient noise. These are popular on airplanes and in other loud environments.

What I’m looking for here can’t be an open design because those let too much sound out, and it can’t be a passive closed design because those let too little sound in. Instead, what I want is basically a variant of the active noise-canceling design with a reverse switch. It would use the same circuitry, but instead of negating ambient noise and mixing it with the audio stream, it would amplify ambient noise (to a degree proportional to the volume of the audio stream) and mix that with the audio stream. Trouble is, I don’t know of, and couldn’t find, any noise-canceling headphones with a reverse or “uncancel” feature. (When I went looking for this, I did find a few people looking for the same thing, mostly for use while running/biking near traffic, and most of the time they were told to use open headphones, since outgoing noise transfer isn’t a concern in that case.)
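The difference between noise canceling and the hypothetical "uncancel" mode comes down to one line of mixing arithmetic per sample. The Python below is an idealized toy, not how any real product works. It assumes the microphone picks up exactly the sound that reaches the ear. It also ignores the filtering, latency and acoustic-path problems that real active noise cancellation has to solve. The function names and the gain rule (ambient gain proportional to playback volume, as described above) are illustrative assumptions, and the sample values are made up.

```python
from typing import List


def anc_speaker_feed(audio: List[float], leaked_noise: List[float]) -> List[float]:
    """Idealized noise canceling: play the inverse of the noise leaking past the seal.

    What reaches the ear is speaker output + leaked noise, i.e. just the audio.
    """
    return [a - n for a, n in zip(audio, leaked_noise)]


def uncancel_speaker_feed(audio: List[float], ambient: List[float],
                          volume: float, ratio: float = 1.0) -> List[float]:
    """Hypothetical 'uncancel': the seal blocks sound both ways, and ambient
    sound from an outside mic is mixed back in at a gain proportional to volume."""
    gain = volume * ratio
    return [volume * a + gain * n for a, n in zip(audio, ambient)]


audio = [0.5, -0.25, 0.0]      # music samples (made-up values)
noise = [0.125, 0.125, -0.25]  # ambient samples (made-up values)

heard_with_anc = [s + n for s, n in zip(anc_speaker_feed(audio, noise), noise)]
print(heard_with_anc)                            # [0.5, -0.25, 0.0]
print(uncancel_speaker_feed(audio, noise, 0.5))  # [0.3125, -0.0625, -0.125]
```

Both modes use the same microphone signal. The only difference is its sign and gain, which is why a "reverse switch" on noise-canceling hardware seems plausible in principle.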

While I couldn’t find any noise-uncanceling headphones specifically designed or marketed as such, I did find two families of products that do essentially the same thing. The first is video gaming headsets with voice chat support; in addition to headphones they have a microphone for chat, and while that’s there for the purpose of sending what you say to other people, they also mix your own microphone input back into your ears so you can hear yourself talking. The second I wouldn’t have come up with on my own, but was suggested by a gun-nut friend when I posted a question about this on Facebook: electronic ear muffs designed for use around intermittent loud noise sources like heavy machinery and, er, guns, which attempt to block dangerously loud noises but otherwise keep you immersed in your environment. By design, they’re more concerned with incoming sound than outgoing, but the best way of protecting your ears is to seal them away from external sound and then electronically add back only what they think you want to hear, and that sealed design minimizes outgoing sound leakage too.

Having learned this, I tried both of these approaches. I bought Astro’s Mixamp 5.8, which has quite the bag of tricks: it connects to any headphones and makes them wireless; it also uses signal processing tricks to try to make a Dolby Digital signal sound like it’s coming from a multi-speaker surround sound system even though the headphones only have 2 drivers; it also mixes the microphone input back into the headphone output, and even has a mixer dial to fade between “chat” and “game” audio. But you don’t have to use this with a game system; you can also connect the transmitter to a music player, in which case “chat” will just be your own voice, and “game” will be the music.

I also borrowed an Impact Sport Electronic Earmuff from another gun-nut friend. It’s battery powered and by default self contained; it feels like wearing construction earmuffs until you turn it on, at which point you hear a faint hiss as it channels ambient noise from external microphones into speakers on the inside of the sound seal. But it also has an “aux in” jack which lets you feed in music or other audio from any sound source with an analog output.

How’d they work? The Mixamp, not so well for this purpose. It has many other nice properties and does a good job of what it was actually designed for, but I couldn’t get the mic on any headset I tried to pick up enough ambient noise to un-isolate me from the room. Now, the chat headsets I tried have directional mics designed to pick up the wearer and nothing else, so they’re not at all designed for the purpose I was using them for, and I can’t really blame them for doing a poor job. I also tried a couple of separate omnidirectional mics, but couldn’t get them to pick up any sound at all when used with the Mixamp. I’m no microphone expert, and don’t know what kind of microphone you’re supposed to use with this, but I’m guessing the output level of the ones I tried was too low. Anyway, I think the theory is sound, and if I had the right mic (omnidirectional, sensitive enough, and with the right output level), the Mixamp would do what I want, but I couldn’t get it to work in practice.

On the other hand, the electronic earmuffs? Uncanny. When you first put them on, you can hear almost nothing (like passive ear protection, they’re designed to present a strong acoustic seal). Then you turn them on — and auditorily, they disappear, and you can hear what’s going on around you. Then you plug in an external audio source, and suddenly you’re marching to the beat of a drummer only you can hear, while still hearing what’s going on around you. Exactly the effect I was looking for, and, I repeat, the effect is uncanny. I had no idea things like this existed. Now, this specific set of earmuffs is not exactly perfect for my needs — they’re stiff and not comfortable to wear for long periods, and they’re certainly not audiophile sound quality — graded as headphones they’re not great, but as something that lets me hear ambient noise while mixing in a private audio track no one else can hear, they totally win. And these are just one example of a product category; there are many others available from other companies, some of which are probably more comfortable and/or better sounding. (Some of these are designed for all-day wear for police and soldiers, so they’d better be comfortable.) All at a price, of course. But they do exist.

Moral of the story: these electronic earmuffs are pretty cool. And it’s good to have gun-nut friends.

Diesel

When Vanessa and I visited France in the fall of 2010, we spent a week driving around in a rental car. I didn’t think too much about the car when renting it — one broken thing about the rental car industry is it’s really hard to know what kind of car you’re going to get until you pick it up — so I just reserved something cheap, and when we picked it up, it turned out to be a Citroën C4 Picasso with a diesel engine. Nondescript enough, at first, but after a week on the road with it, I noticed impressively high fuel mileage — somewhere above 50 mpg (approximate since I was doing the calculations in my head and converting from liters and kilometers).

This 50+ mpg result came while driving around on the good highways where the speed limit is 130 kph (80 mph). That got me thinking. The only mainstream cars in the US that get gas mileage like that are gas-electric hybrids (e.g. the Prius), and even those not at 80 mph; meanwhile, compared to the Prius the C4 has more usable space, and is more fun to drive thanks to its manual transmission. A car that beats the Prius on performance, packaging and mileage — why hasn’t this caught on in the US?
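For anyone who wants to check the mileage math rather than do it in their head, here’s a small Python sketch of the unit conversions involved. The conversion constants are the standard definitions of the mile and the US gallon; the 4.5 L/100 km figure in the usage lines is a hypothetical round number chosen to land in the 50+ mpg range, not something I measured.

```python
KM_PER_MILE = 1.609344            # exact, by definition of the international mile
LITERS_PER_US_GALLON = 3.785411784  # exact, by definition of the US gallon


def kph_to_mph(kph):
    """Convert a speed in km/h to miles per hour."""
    return kph / KM_PER_MILE


def mpg_from_trip(km, liters):
    """US miles per gallon from distance driven (km) and fuel used (liters)."""
    miles = km / KM_PER_MILE
    gallons = liters / LITERS_PER_US_GALLON
    return miles / gallons


def l_per_100km_to_mpg(l_per_100km):
    """Convert the European liters-per-100-km rating to US mpg."""
    return mpg_from_trip(100, l_per_100km)


print(round(kph_to_mph(130), 1))          # 80.8
print(round(l_per_100km_to_mpg(4.5), 1))  # 52.3 (hypothetical 4.5 L/100 km)
```

Note that L/100 km is a fuel-per-distance measure, so it converts to mpg by division (roughly 235.2 divided by the L/100 km figure), not by a simple scale factor.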

Fast forward most of a year to when Vanessa and I found out she was pregnant, and I decided that’s enough reason to sell my 2-seater convertible and get something more practical, i.e. with back seats and a hard top. And thanks to that Citroën C4, I decided it should be a manually-shifted diesel.

Once I started shopping, I realized I had a pretty short list to choose from. The US auto manufacturers think diesel is for trucks; there are no US-made diesel passenger cars sold here. The Japanese manufacturers make diesel passenger cars for sale elsewhere but don’t import them to the US. That leaves European companies: Audi, BMW, Mercedes and Volkswagen all sell diesel cars here. But only Volkswagen offers even a single stick-shift model (shame on you, Audi and BMW — I get pretty annoyed surfing audi.de or bmw.de and looking at the drivetrain combinations available in Europe and not here, but that’s a story for another time).

What I would have chosen if it were available here, or if we lived in Europe: either an Audi A3 or BMW 3-series with all-wheel-drive, stick shift and turbodiesel engine. What’s available here in the US combining a stick shift with a diesel engine: Volkswagen Golf, Jetta and Passat.

So I’m now driving a 2011 VW Golf TDI. But before I made that decision, I realized I needed to understand whether and why diesel vehicles are really more efficient. Actually, I had 3 questions:

  • What is diesel fuel, actually, and does refining it from crude oil come at the expense of usable gasoline?
  • Why do diesel engines get better fuel economy than gasoline engines?
  • Why does diesel generally cost more at the pump, compared to unleaded gasoline?

Together, these added up to a suspicion that perhaps diesel is effectively condensed gasoline, and that it costs more and drives you farther because it’s just using up more of the original crude oil’s energy.

Off to Wikipedia: I needed to read the articles on Diesel fuel, Diesel engines and petroleum and crude oil. It turns out my half-baked hypothesis was well-intentioned but completely wrong. You can read the articles yourself, but to summarize and answer my questions: crude oil is a mix of hydrocarbons of various chain lengths, ranging from heavy and less volatile (diesel) to light and more volatile (gasoline, kerosene, etc.). Refining oil is the process of separating out these different hydrocarbons; when you refine a given amount of crude oil you get some fixed quantity of diesel, gasoline, and various other products. It’s not the case that there’s a tradeoff between producing diesel vs. the other fuel products; in fact when you produce one (in a modern refinery) you produce them all, and you want to use and sell them all for best efficiency. (Contrast this with the situation in the 1800s, when the only petroleum product we knew how to use was kerosene, so they’d separate that out and throw the rest away.)

This also means there’s no fixed price relationship between diesel and gasoline; the prices are set by supply and demand, which explains why diesel (in California in 2012) generally costs more than even premium unleaded, but in other places and times it’s been cheaper than regular unleaded.

Finally, the efficiency advantage: it turns out diesel is denser than gasoline, both in terms of mass per volume and energy per volume (this falls out from the fact that they’re roughly equivalent in energy per mass). Diesel’s energy content is about 15% higher than gasoline’s, by volume, which translates directly into a 15% miles-per-gallon advantage for diesel engines. Additionally, the Diesel cycle is inherently about 20% more efficient than the gasoline-powered Otto cycle, apparently due to its higher compression ratio.

What this means is that a diesel engine will typically get 30-40% better fuel mileage than a comparable gasoline engine, about half of this advantage due to the energy density of the fuel and half due to the higher compression ratio.
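The 30-40% figure is what you get when the two advantages compound rather than simply add: more energy in each gallon, and a larger share of that energy turned into work. A quick sketch using only the approximate 15% and 20% numbers above, so the result is just as approximate:

```python
fuel_energy_gain = 0.15       # extra energy per gallon, diesel vs. gasoline (from the text)
cycle_efficiency_gain = 0.20  # Diesel cycle vs. Otto cycle efficiency (from the text)

# Each gallon carries 1.15x the energy, and the engine converts 1.20x as much
# of it into motion, so the mpg advantages multiply.
combined = (1 + fuel_energy_gain) * (1 + cycle_efficiency_gain) - 1
additive = fuel_energy_gain + cycle_efficiency_gain

print(f"compounded: {combined:.0%}, naively added: {additive:.0%}")
```

Compounding gives about 38%, versus 35% if you just added the two; either way it lands inside the 30-40% range, with each factor contributing roughly half.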

So if your goal is to use less of a scarce natural resource and generate less polluting emissions, diesels do have a real advantage over gasoline engines. A pure diesel powertrain is cheaper and lighter than a hybrid gas-electric powertrain like that of the Prius, and compares favorably for highway mileage, though not so favorably for stop-and-go driving. A hybrid diesel-electric should be substantially better than gas-electric for both city and highway driving, I would imagine (diesel-electric hybrids exist in transit buses, but at this point, there are no such passenger vehicles).

As for my Golf TDI: I get about 42 mpg door to door on my standard commute (37 miles, most of which is 75 mph highway driving). I can also typically go 500 miles on a tank of fuel, which is a real convenience advantage over most cars. I wouldn’t call it a sports car, but it is zippy and fun to drive. Compared to other high-mpg options available in the US, it strikes a pretty good balance between performance and fuel economy — I’d like to see more diesel options available here including diesel-electric hybrids, though this may be a passing phase as pure electric cars become more widely available. (Note that the Citroën C4 I started this post with was significantly more efficient, probably due to a smaller engine. I already bemoaned the lack of powertrain options that the Europeans deign to import here, and chose Volkswagen because they alone import manually shifted diesels, but they’re not immune to the curation effect either; in Europe they sell a smaller diesel engine that gets 50+ mpg, but the US gets only the 2-liter version, for which mpg figures in the low 40s are typical.)

Off-the-cuff reaction to OS X Mountain Lion

After reading about Apple’s new Mountain Lion, including coverage from Daring Fireball and Ars Technica, my immediate reaction is:

  • Address Book is being renamed Contacts? Great. I can’t tell you how many times I’ve typed Command-Space, C-O-N, trying to launch Address Book via Spotlight.
  • So long Growl? I already said “so long” to Growl months ago, since I don’t like the way old notifications pile up. Notification Center, at least in its iOS incarnation, is much less intrusive.
  • Messages (the app, with support for, er, messages via iMessage) from the desktop makes perfect sense.

But top to bottom, from today’s announcements, this feels much more like a collection of iOS apps appearing in OS X, not a revamp of the OS itself. Where are the system-level changes? OK, Notification Center is system-wide, and there’s more iCloud support including document save/load directly to iCloud, and there’s Gatekeeper (the ability to ban installation of unsigned apps). AirPlay mirroring will be useful, but it’s too bad the current AppleTV limits the output to 720p. Still, most of the changes (as documented today) are at the surface.

On the other hand, recent OS X releases (especially Snow Leopard, but also to a large degree Lion) were about laying new system infrastructure, so maybe this is perfectly appropriate. Also, compared to Windows, OS X distinguishes itself more and more by its useful collection of built-in apps, so improving and extending that collection is fair game. And finally, the name signifies it’s a new lion, not a whole new cat.

From which I conclude the meta-news here is that with a plan to update the OS annually and the ability to roll out these upgrades via the Mac App Store for $30, we should get used to regular incremental updates, not revolutions every blue moon. Which is a pretty sane way to develop quality software, really.

Nest “C” wire update

My Nest thermostat review complained that Nest doesn’t make “C” wire compatibility issues obvious up front, and Nest support replied with a tweet saying

We’ve updated the compatibility tool so there’s a note about C wire, but you’re right - it could be clearer.

I took a trip back through the compatibility wizard, and (unless there are more changes coming) it’s there but it’s still not obvious.

First, I tell it what I’ve got:

Then, it tells me what to expect:

I’m compatible, no reservations, it says. Unless I click through the “Learn more” link, which from the context sounds like it’s going to help me decide whether I can install it without help, not whether it’s compatible with my wiring. That link goes to a “Does Nest need a common wire?” article, which has some useful information, including recognition of the possible problem, a suggestion to do what I did (turn another wire into a C wire if possible), and a link to even more details.

That’s the right information, but I wouldn’t expect to find it hidden behind that “Learn more” link, and I still think anyone who clicks through the compatibility wizard without a C wire should see it. Maybe these problems are rarer than my small sample size indicates, but on the other hand, this “power stealing” problem doesn’t seem like an easy one.

C’mon, Nest — I like the product, and now that I’ve got mine working, I expect to remain a happy customer. But for an issue this subtle and important, I think you’d do well to err on the side of caution.

Nest learning thermostat review

I bought two Nest Learning Thermostats for my house, which has two furnaces. Here’s how they’re working out for me.

Ordering

I almost bought them on the day they were announced back in November, but dithered, and the initial batch sold out; I got on the waiting list and waited. In early January they sent me an email allowing me to place an order (for up to 5 units); I ordered two; they arrived Friday and I’ve spent the weekend playing (er, working) with them.

One thing that happened between the day I didn’t order them and the day my turn to order came around was Marco Arment’s post on Nest incompatibility with his existing wiring. Having read that, I semi-carefully checked my own wiring against Nest’s online compatibility wizard before ordering (which I might not have bothered doing if I hadn’t read Marco’s post, given that my furnaces and wiring are only 3 years old, so I assumed they’re modern). When I say “semi-carefully”, I mean I looked at my wires and verified them against Nest’s tool (which says I’m fine), and I considered Marco’s problem (2 wires, red and white, are enough for a basic thermostat to control heating, but not to power or recharge the thermostat without activating said heating) and made sure I had a 3rd wire and figured I wouldn’t have that problem.

In fact, when my thermostats arrived and I installed them, I also uttered a triumphant but premature tweet about having modern wiring. If only. If you’re reading carefully — more carefully than I had read Marco’s post — you’ll note I said I had 3 wires, but not which 3 wires. And, as it turns out, I had it wrong. No, I didn’t have a C (common) wire; my third wire was a G (fan) wire.
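Since counting wires turned out to be the wrong check, here’s a sketch of the check that actually matters: which terminals the wires are on. The terminal meanings below are the conventional HVAC letters as I understand them, not anything from Nest’s documentation (your furnace’s manual is the real authority), and `has_common_wire` and `describe` are hypothetical helpers for illustration.

```python
# Conventional thermostat terminal letters (typical usage; labels and wiring
# vary, so check your own furnace's manual).
TERMINALS = {
    "R": "24 VAC power from the furnace transformer",
    "Rh": "24 VAC power for the heating circuit",
    "Rc": "24 VAC power for the cooling circuit",
    "W": "call for heat",
    "Y": "call for cooling",
    "G": "call for the blower fan",
    "C": "common: transformer return, for continuous thermostat power",
}


def has_common_wire(wires):
    """True only if one of the connected wires is a dedicated C wire."""
    return "C" in wires


def describe(wires):
    """Map each connected terminal to its conventional role."""
    return {w: TERMINALS.get(w, "unknown terminal") for w in sorted(wires)}


my_wires = {"Rh", "W", "G"}  # the three wires from this post
print(describe(my_wires))
print("dedicated C wire:", has_common_wire(my_wires))
```

Three wires, but none of them is C: the thermostat can call for heat and fan, yet still has to steal its own power through the heating circuit.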

Unboxing and setup

The packaging is nice; the actual thermostat unit is nice; the base which you attach to the wall has a built-in level, and pressure-fit connectors for the wires instead of screw terminals. It’s obvious the device was carefully designed and built without taking shortcuts — which had better be true, given the price.

It was fairly easy to take my old thermostat off the wall, note how it was using the wires, reconnect the same wires (Rh, W and G, again) in the same capacity to the Nest base, attach the Nest base to the wall, and then press the Nest device onto the base. It didn’t take long to do this twice, in fact — once upstairs and once downstairs.

The next step is to connect the thermostat to my Wi-Fi network — entering Wi-Fi passwords on small devices without keyboards is usually no fun, but entering text on the Nest’s rotating dial is surprisingly satisfying. Kind of like using a combination lock from high school again, but less confusing. As soon as it connected to the network, the Nest found itself a software update, invoking the download-and-reboot cycle that’s the bane of many modern smart devices. In fact, it seemed to reboot twice. The downstairs device survived this fine and next wanted me to join it to a cloud management account; the upstairs device asked me what time it was, which is a bad sign for something that’s supposed to be connected to the internet. It turns out it had forgotten how to connect to the network, so I had to enter my Wi-Fi password again.

After this setup phase — which took a little longer than I’d expected due to all that rebooting — the thermostats were active, ready to use manually, and, ostensibly, learning my habits so I don’t need to tell them what to do.

Use

These thermostats are easy on the eyes and easy to use; you just turn the dial to change the temperature, or push in on the dial to bring up the onscreen UI with easy-to-understand options for setting home/away, turning the whole thing off, or invoking manual scheduling settings I haven’t bothered playing with.

If you create a Nest account and install the smartphone app or use their website, you can monitor and change the temperature from anywhere, which is pretty darn space age if you ask me.

One minor glitch is that the just-approach-to-wake-screen feature seems to work only sometimes; the Nest is supposed to recognize when you approach and greet you by turning on the screen, but sometimes it doesn’t turn on until I actually touch it, sometimes it only turns on when I stick my hand within inches, and sometimes it turns on when I just walk by. This would be a minor annoyance, but it also makes me wonder how well the auto-away feature will work, since that’s also based on a proximity sensor (maybe the same one, maybe a different one, I don’t know).

Two simple things my old, not-worth-blogging-about thermostat did that the Nest doesn’t: show you the time on the main screen (my wife and I are used to using the thermostat as a clock on our way out the door, as it’s conveniently located for that) and remind you to change the furnace filters. Oh well, not a dealbreaker, and maybe these features will appear in a future software update.

One design deficiency that wouldn’t be worth complaining about with any other thermostat, but is noteworthy with the Nest now that they’re trying to apply good design to elevate thermostats to the level of art object: off-angle viewing of the display is terrible. It looks great from straight on, but thermostats are usually mounted at chest level, not at eye level, and used from fairly close by, so in normal use I’m looking down at the display at a fairly steep angle. Part of the problem is the LCD display itself, and part of the problem is the convex lens sitting over it; it’s really not putting its best face forward in actual use.

Problems

The second day after I installed the two Nests, I heard a clicking sound coming from the heating ducts. Upon inspection, the downstairs furnace was working fine, but the upstairs furnace was making the clicking noises, which were resonating through the ducts. I fiddled with the thermostat; it wanted the furnace on but it wouldn’t come on. I turned the thermostat off; the clicking stopped. I turned the thermostat back on, and it turned on the furnace for about a minute, then the furnace turned off and went back to clicking. I fiddled with the thermostat for a while longer, and was able to provoke various permutations of this behavior, but not a working upstairs furnace.

I called Nest support, and got a really helpful and knowledgeable guy who suggested I swap the upstairs and downstairs units, to help rule out a problem with the thermostat itself. After the swap, I still had problems upstairs and not downstairs — apparently the problem is with the furnace, and not the specific Nest unit. His hypothesis was that my furnace was so sensitive to voltage that it would work fine if the thermostat just bridges the Rh and W wires, but if the thermostat pulls any power at all from those wires, the furnace relays get confused and start toggling instead of staying on. Just for fun, I measured the wiring at both thermostats with a multimeter and couldn’t detect much difference between upstairs and downstairs (in both cases, the multimeter saw 28 V across Rh and W with the thermostat detached and the furnace not running, and 0.485 A across Rh and W with the furnace running), but the Nest’s built-in technical details monitoring report claimed voltage dropped from 29 V upstairs when not running the furnace to 7.77 V when running the furnace (and compared to 9.19 V downstairs). Is that 1.4 V difference enough to explain why one would work and the other wouldn’t? I don’t know.

When I called back to Nest support with these results, the tech support guy said this sounds like a power stealing problem and the probable solution is to run a C wire. I asked why two nearly identical (same model, different size) furnaces that are only 3 years old would be different in this regard, and he didn’t really know.

Note this isn’t the same problem Marco wrote about, though I’d likely have that problem over time too; his problem happens if the furnace doesn’t run very often and the Nest needs to charge itself without running the furnace; in my case the fully charged Nest couldn’t even run the furnace when it should be on.

Gripes

Beyond the fact that the thermostat can’t run one of my furnaces, it’s annoying that the Nest compatibility tool doesn’t even warn of this problem.

A few other things came up while I was on the support call. First, when I was describing the problem and giving information on how I’d connected the Nest et cetera, I asked if they were already collecting diagnostic information in the cloud since the thermostat is smart and internet-connected, and was told no. I find that surprising and disappointing — a missed opportunity. (There’s maybe a privacy concern, and I don’t want them selling data on how often I run my furnace or how often I’m home vs away, but I certainly wouldn’t mind them collecting diagnostics to improve their product.)

Also while chatting with the support tech, I asked if there was a way to get the Nest to display a clock on the main screen, and to remind me when to change the furnace filter, and was told no on both counts; then I asked if I could file a feature request for those and was told that not only do they not take feature requests from customers, but that the support techs are prohibited from talking to the product designers, in both cases for legal/intellectual property reasons. I joked that this must be because Nest was founded by people from Apple — famous for being persnickety about not taking ideas from people who might then claim ownership of said ideas; what can you say about a company with an official “unsolicited idea submission policy”? — and he said yeah, if you read our privacy policy it has a lot in common with Apple’s. But come on. It’s not like there’s any intellectual property in the idea of a thermostat displaying the time of day. And more than that: a policy that says a company can’t listen to its customers is a customer-hostile policy.

Solution

Basically, to get the Nest working with my upstairs furnace, I need to give it a C wire. I did consider running new thermostat control wiring, but when I looked at where the wires run, it’s not worth the trouble — there’s no way to rerun the wires without opening walls.

But. What’s this G wire for, anyway? Following the Transtronics wiki Marco linked to, and confirmed by observation, the thermostat doesn’t need to tell the fan when to run while heating — I have gas heating and the furnace itself controls the fan. Now, the thermostat can also tell the fan to run without the heat, and it turns out that’s all the G wire is good for. After checking with my wife, I confirmed we almost never use that feature (and it was relatively easy to access before, with a switch on our old thermostats, whereas with the Nest it requires digging through the settings menu a bit, so we’re even less likely to do it).

Thus, the solution: repurpose the G wire as a C wire. We don’t really need direct control over the fan, and if I do, I can run new wiring for that, to locations that aren’t as hard to get to as the thermostat locations. (Note that this works for me because we have gas heat; electric heaters don’t control the fan themselves, so removing the G wire would not be a good idea.) So, I had to turn off the furnace and reconnect the green wire on that end too, moving it from the G terminal to the C terminal. This quick-and-dirty solution satisfies my hacker instincts, and after a couple days of use, the Nest is much happier: the Nest’s voltage monitoring reports 36 V at 100 mA, instead of 8 V at 20 mA without the C wire, and the furnace runs when it should.
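To put those voltage-monitor readings in perspective, multiply volts by amps for a rough power figure. This is a back-of-the-envelope sketch: it assumes the Nest’s reported voltage and current can be multiplied directly and ignores AC power factor, so treat it as an order-of-magnitude comparison only. `power_watts` is a hypothetical helper.

```python
def power_watts(volts, milliamps):
    """Rough power estimate P = V * I, with current given in milliamps."""
    return volts * milliamps / 1000.0


with_c = power_watts(36, 100)   # readings after repurposing G as a C wire
without_c = power_watts(8, 20)  # readings while power stealing, no C wire

print(f"with C wire: {with_c:.2f} W, without: {without_c:.2f} W, "
      f"ratio about {with_c / without_c:.1f}x")
```

Roughly 3.6 W with the C wire versus 0.16 W without it: by this crude measure, over 20 times as much power reaching the thermostat.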

Summary

I like it; I’ll keep mine.

But based on my experience, I can’t recommend the Nest thermostat unless you can give it a C wire. And I’m disappointed that all the information Nest will give you before buying doesn’t point out even a whiff of such problems.

Update: Nest support tweeted me back to say they’ve updated the compatibility tool with a note about the C wire, but in my opinion it’s still buried where it’s too easy to miss.

Observations on mobile platform speed

Over the past 5 years, I’ve used smartphones from Apple and Palm (powered by iOS and webOS), and played around with Android devices over the same timeframe though I’ve never owned one for daily use.

Still, I find the design choices at the system level to be quite different from platform to platform, and the results pretty much what you’d expect for each one:

  • webOS as Palm originally shipped it: apps are written in JavaScript and run in an interpreter; the platform doesn’t swap out virtual memory. I never saw apps crash, but it was pretty common to get out-of-memory errors (in the guise of a message that says you’re trying to do too many things at once and need to close some apps). Apps were pretty slow.
  • webOS as reinterpreted by the hacker/modder community: apps are still JavaScript running in an interpreter, but now there’s a swapfile pretending there’s more memory. (I think Palm liked this patch enough they eventually folded it back into their official kernels, actually. And if swapping caused a hit to speed, it was compensated for by the fact that the hacker kernels also overclocked the CPU.) The result with this was a platform on which apps never crashed, and also never ran out of memory, but they could be glacially slow. The poster child here was the Google Maps app, which often took so long just to launch — over a minute, no kidding — that it would apparently become confused about why it had launched in the first place, and just draw a white screen. (The bit about confusion is just my narrative, but it often would draw nothing but a white screen, and that seemed to correlate highly with the times it took forever to launch.) You can certainly make the argument that an app that takes a minute to launch to a broken state is no more helpful than an app that crashed; there’s no panacea here.
  • iOS: apps are written in Objective-C and compiled to native code; there’s no swapfile I’m aware of. There are two results here: apps run fast, and they crash often. They crash because they segv from bad code, they crash because malloc returns NULL and they segv instead of checking for that, and they crash because the platform kills them for using too much memory.
  • Android: apps are written in Java and run in a JIT-compiled VM (which has gotten faster over time, notably in the 2.3 release); I don’t know if there’s a swapfile, but I suspect stock Android releases don’t have one and mods like CyanogenMod allow one. As I said above, I don’t have deep experience with Android devices, but my general perception is that on contemporary roughly equivalent hardware, it feels significantly faster than webOS and significantly slower than iOS. (I’m talking about user perception of speed here, which has to do both with how responsive the UI is and how fast real work gets done. iOS and webOS both pay more attention than Android to keeping the UI responsive, to the best of my knowledge.)

So. This evidence is relatively unsurprising and easy to explain — interpreted code that can have as much memory as it wants never crashes but takes forever to do anything useful; native code under tight memory management is much faster and much more crash prone — what’s interesting to me is that all 3 of these approaches were considered viable in the marketplace. And I don’t even know that the dog slowness of webOS is the reason it failed in the marketplace (and if HP had poured the right resources into it and actually shipped what they said they would in 2011, a better optimized webOS 3 running on 2011-class hardware might not have even felt slow). But even discounting webOS, the other two approaches are both provably market-viable.

Space limiting your Time Machine network backups

As part of the aforementioned office-quieting project, I wanted spinning disks out of the office, so I garbage collected two 1TB drives from external enclosures that had served for Time Machine, and moved them into a NAS enclosure in the basement.

That solved the noise problem and gave me a bunch more network-attached storage, but turned off Time Machine; the next step was to re-enable Time Machine but back up to the network.

A few years ago setting up Time Machine to back up over the network, to anything but the Time Capsule mini-NAS that Apple designed for it, took some minor rocket science (and in my experience caused no shortage of kernel panics on the client machines); now it’s more stable and easy to set up, especially if the network file services are provided by Apple’s own AFP server. I have some free space on another RAID array attached to a Mac Mini also in the basement, perfect for this sort of thing, and so all I had to do was mount that drive from the client machine, then go to the Time Machine prefpane and select it for backup. Time Machine creates itself a disk image and goes to town.

The one problem with this is that it creates a disk image with the same size as the underlying physical volume. It’s a sparse image, so it doesn’t immediately fill the whole volume, but it will grow to do so over time. That’s not good, since I want multiple Time Machine backups to be able to share that volume, and they’re not the only thing that lives there.

Googling for solutions to this, I found an article on how to pre-create the sparseimage with whatever size you want. I tried that, but when I enabled Time Machine, it ignored the hostname_macaddress.sparseimage directory and just created a new hostname.sparseimage directory next to it. (Which, IMHO, is a good thing, since keying the backup name from the MAC address is not going to work well with machines with multiple network interfaces, for example a laptop which is sometimes using Ethernet and sometimes using Wi-Fi.) Maybe that’s a holdover from a previous OS version; who knows. So then I tried precreating the image file as just hostname.sparseimage, but then when I enabled Time Machine to the same volume, it noticed the existing one, decided not to use it, and created “hostname 1.sparseimage” instead.

Then I stumbled on a simpler recipe:

  1. Enable Time Machine the normal way (mount a network volume, open System Preferences, go to Time Machine preferences, click Select Disk, and choose the network volume).
  2. Let Time Machine do its thing. It will create a sparse image there with the same size as the underlying volume, and do the initial backup, and (eventually) unmount the sparse image.
  3. Later, when Time Machine is not running, use hdiutil to resize the sparseimage to something smaller. I used “hdiutil resize -size 750g hostname.sparseimage” (down from its original 3TB size).
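If you have several machines to cap, step 3 is easy to script. Here is a minimal Python sketch that wraps exactly the hdiutil command above and nothing more; `resize_sparseimage` is a hypothetical helper, the image path is a placeholder, and it defaults to a dry run that only returns the command. Only pass `dry_run=False` while Time Machine is idle and the image is unmounted.

```python
import subprocess


def hdiutil_resize_cmd(image_path, size):
    """Build the hdiutil resize command from step 3, e.g. size="750g"."""
    return ["hdiutil", "resize", "-size", size, image_path]


def resize_sparseimage(image_path, size, dry_run=True):
    """Return the resize command; actually run it only when dry_run is False."""
    cmd = hdiutil_resize_cmd(image_path, size)
    if not dry_run:
        # check=True raises CalledProcessError if hdiutil reports failure.
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: prints the command without touching any backup image.
print(" ".join(resize_sparseimage("hostname.sparseimage", "750g")))
```

Passing the arguments as a list (rather than one shell string) keeps a path like “hostname 1.sparseimage”, with its embedded space, intact.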

Boom. Seems to work fine. After having backed up a little over 400GB, Time Machine now displays the backup status with “Available: 385 GB of 3 TB”, so after backing up another 385GB, it’ll start pruning the backup set, instead of filling the volume and getting confused.
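Why 385 GB available after backing up a little over 400GB to a 750g image? My guess, and it is an assumption rather than something I verified: hdiutil’s g suffix counts binary gigabytes (2^30 bytes), while Time Machine displays decimal gigabytes (10^9 bytes), so the image is really about 805 GB in Time Machine’s units. A sketch of that arithmetic, ignoring filesystem overhead inside the image:

```python
GIB = 2**30  # assumption: hdiutil's "g" size suffix means binary gigabytes
GB = 10**9   # assumption: Time Machine reports decimal gigabytes

image_gb = 750 * GIB / GB  # the 750g image, expressed in decimal GB
available_gb = 385         # "Available: 385 GB", as Time Machine reported
used_gb = image_gb - available_gb

print(f"image: {image_gb:.0f} GB, implied backup so far: {used_gb:.0f} GB")
```

That works out to roughly 420 GB already backed up, consistent with “a little over 400GB” if the assumptions hold.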