The following document is intended as the general trip report for me at the 18th Systems Administration Conference (LISA 2004) in Atlanta, GA from November 14-19, 2004. It is going to a variety of audiences, so feel free to skip the parts that don't concern you.
While the conference started today, I was not there. To save money (personally important after being unemployed for 2 of the past 2.5 years or so), I hadn't planned to arrive until Monday night, saving 2 hotel nights and 2 days' food costs. As it turned out, work decided to pay, and told me so the week before the conference itself.
Today was my travel day. Headed off to Logan, getting there just in time to not catch an earlier flight to Atlanta. The flight was uneventful (which, given some of the horror stories I'd heard about Delta, pleasantly surprised me); I got my bag and headed off via MARTA to the hotel.
Got to the hotel and found a small problem with the room reservations. Seems that when Philip got there on Saturday, instead of my reservation (which had his name on it) they checked him in under his cancelled one (at the wrong, higher rate). So when I got there on Monday and they claimed he was checked out, I checked into a new room — but had to remind them that they needed it at the corporate (not conference) rate. Once that was done, and his name confirmed on my portfolio, I headed off to dinner with a large group of people. (22 or 24 of us, Indian, done as a pseudo-buffet thing: lots of chicken, lamb, shrimp, and veggies.) After dinner, back to the hotel, and I called Philip's room. And got him. (How can I get ahold of a guest who, according to the computer, is already checked out?) We headed down to the front desk where we had the idiot desk clerk call his manager, who was good enough to (a) get Philip and me checked into the same room at my lower rate, and (b) comp Philip for Saturday and just charge him the higher rate for Sunday. (They had no record of charges for him for a Saturday night stay, despite his obviously having checked in on Saturday. Oi. So far I'm not impressed with the staff of this hotel.)
Once that was straightened out, I hit the hottub and then hung out with folks in the lobbies for a while before bed. (There were 4 lobby levels: the garden level, with a 2-story atrium, an upstairs bar overlooking it, and access to the skybridge to the attached mall; the lobby level, where hotel registration and the street exit were; the convention level, where most of our programming was, including the Marquis and Imperial ballrooms; and the international level, accessible by a different set of elevators and escalators, where the reception would be Thursday night.) Nice conversations and the majority of my hallway tracking.
Tuesday began with the Advanced Topics Workshop, once again ably hosted by Adam Moskowitz. Unusually, we started with a quick overview of the new moderation software. We followed that with introductions around the room — in representation, businesses (including consultants) outnumbered universities by about 2 to 1, and the room included 3 past LISA program chairs and 6 past members of the USENIX Board or SAGE Executive Committee.
We had our usual interesting discussions on a variety of topics. Our first topic was introducing the concepts of disciplined infrastructure to people (e.g., it's more than just "cfengine" or "isconf" or something), or infrastructure advocacy, or getting rid of the ad hoc aspects. Some environments have solved this problem at varying levels of scale; others have the fear of change paralyzing the systems administration staff. One idea is to offload the "easy" tasks either to automation (but avoid the "one-off" problem and be careful with your naming standards) or to more junior staff so the more senior staff can spend their time on more interesting things than the grunt-work. Management buy-in is essential; exposing all concerned to LISA papers and books in the field has helped in some environments. This is, like many of our problems, a sociological one and not just a technical one. Remember that what works on systems (e.g., Unix and Windows boxes) may not work for networks (e.g., routers and switches), which may be a challenge for some of us. We also noted that understanding infrastructures and scalability is very important, regardless of whether you're in systems, network, or development. Similarly important is remembering two things: First, ego is not relevant; code isn't perfect and a developer's ego does not belong in the code. Second, the perfect is the enemy of the good; sometimes you have to acknowledge there are bugs and release it anyway.
After the morning break, we discussed self-service, where traditionally sysadmin tasks are handed off (ideally in a secure manner) to users. Ignoring for the moment special considerations (like HIPAA and SOX), what can we do about self-service? A lot of folks are using some flavor of web forms or automated emails, covering the business process (e.g., approvals), not just the request itself. One concern is to make sure the process is well-defined (all edge cases and contingencies planned for). We've also got people doing user education ("we've done the work, but if you want to do it yourself the command is..."). Constraining possibilities to do only the right thing, not the wrong thing, is a big win here.
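That last point — only exposing well-defined actions rather than a general-purpose interface — is easy to sketch. This is a hypothetical illustration, not anything presented at the workshop; the action names and the `submit_request` helper are invented:

```python
# Hypothetical sketch: a self-service dispatcher that exposes only a
# whitelist of well-defined actions, each with a fixed parameter set,
# so users can't construct a request the process doesn't cover.
ALLOWED_ACTIONS = {
    "reset_password": {"username"},
    "request_quota": {"username", "gigabytes"},
}

def submit_request(action, **params):
    """Validate a self-service request against the whitelist."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    expected = ALLOWED_ACTIONS[action]
    if set(params) != expected:
        raise ValueError(f"{action} needs exactly {sorted(expected)}")
    # A real system would hand this to the approval workflow;
    # here we just return a ticket record.
    return {"action": action, "params": params, "status": "pending-approval"}
```

The win is that the validation happens before anything touches a system, so the "wrong thing" simply can't be requested.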
Next we discussed metrics. Some managers believe you have to measure something before you can control it. What does this mean? Well, there are metrics for services (availability and reliability are the big two), with desired levels to meet, in-person meetings for when levels aren't met, and so on. Do the metrics help the SAs at all, or just management? They can help the SAs identify a flaw in procedures or infrastructure, or show an area for improvement (such as new hardware purchases or upgrades). We want to stress that you can't measure what you can't describe. Do any metrics other than "customer satisfaction" really matter? Measure what people want to know about or are complaining about; don't just measure everything and try to figure out from the (reams of) data what's wrong. Also, measuring how quickly a ticket got closed is meaningless: was the problem resolved, or was the ticket closed? Was the ticket reopened? Was it reopened because of a failure in work we did, or because the user had a similar problem and didn't open a new ticket? What's the purpose of the metrics? Are we adding people or laying them off? Quantifying behavior of systems is easy; quantifying behavior of people (which is the real problem here) is very hard. But tailor the result in the language of the audience, not just numbers. Most metrics that are managed and monitored centrally have no meaningful value; metrics cannot stand alone, but need context to be meaningful. Not all problems have technical solutions, and metrics is one of them. What about trending? How often and how long do you have to measure something before it becomes relevant? Not all metrics are immediate.
After a little bit of network troubleshooting (someone's Windows XP box was probing port 445 on every IP address in the network from the ATW), we next discussed virtualized commodities such as user-mode Linux. Virtual machines have their uses — for research, for subdividing machines, for providing easily-wiped generic systems for firewalls or DMZ'd servers where you worry about them being hacked, and so on. There are still risks, though: the physical machine is a single point of failure that can take down multiple services on multiple (virtual) machines at once.
Next we discussed how to get the most out of Wikis as internal tools. What's out there better than TWiki? We want authentication out of LDAP/AD/Kerberos, among other things. The conference used PurpleWiki which seems to be more usable. There's a lot of push-back until there's familiarity. They're designed for some specific things, but not everything. You need to be able to pause and refactor discussions if you use it as (e.g.) an email-to-Wiki gateway. (There is an email-to-Wiki gateway that Sechrest wrote.) If email is the tool most people use, merging email into a Wiki may be a big win. Leading by example — take notes in the Wiki in real time, format after the fact, organize it after you're done — may help sell it to your coworkers.
Next we listed our favorite tool of the past year, as well as shorter discussions about Solaris 10, IPv6, laptop vendors, backups, and what's likely to affect us on the technology front next year. We finished off with making our annual predictions and reviewing last year's predictions; we generally did pretty well.
After dinner at Steak and Ale with 11 others (Bob, Ted, Tom S, Michael, Peter, Amy, Greg, Nevin, Joe, Mark, and Kendall), we headed over to the GLBT BOF, known to others as the GBLTUVWXYZ BOF, the queer BOF, the motss BOF, and the Alphabet Soup BOF. We had a pretty good turnout this year and had several lively discussions about hobbies, activities for (non-geek) spouses at the conferences, and non-US work environments, though when the conversation was turning to politics I bailed.
After the BOF I hottubbed for a while then visited the Garden Lobby "Overlook" bar (mainly for conversation but also for rehydration) before heading off to bed.
Today the technical sessions began. We started with the usual announcements. We received 70 abstracts, of which 22 were accepted. (It would have been 23, but one team found their data was corrupt and so did not publish a paper. They did present anyhow.) We had on the order of 1100 attendees by the opening session and about 1200 by the end of the week. We had a conference wiki for the first time this year. Brent Chapman won the outstanding achievement award; our very own Doug Hughes won the inaugural Chuck Yerkes award (probably to be known as the cluebat award). Jeremy Blosser and David Josephsen won the $1,000 cash award and (more importantly, at least according to SAGE Interim Transition Board Thingy President Pro Tem Or Whatever Geoff Halprin) two $18 SAGE polo shirts for the best paper, "Scalable Centralized Bayesian Spam Mitigation with Bogofilter."
Howard Ginsburg of CNN Technology was our keynote speaker. He gave what I found to be an interesting talk about how CNN is getting rid of (video) tape in favor of digital media for its on-air telecasts. It's more efficient, faster to produce, easier to edit and transfer between remote bureaus and the studio sites (New York and Atlanta), and provides better access to archived media. They're also working on ingesting the 22 years' worth of archived video tape into the system. Some of the challenges include changes to workflow, maintaining quality with no encoding losses, providing ubiquitous access, sustaining 24x7 access forever, and both forwards- and backwards-compatibility (OS, applications, hardware, system architecture, media formats, and so on). Technically, they've got 2x28TB cores for high-resolution video and 2x2TB for low-resolution, or 1800 hours of broadcast-quality video, at CNN Atlanta, and are upgrading New York to the same spec by 2005 (it's at half that now). One of the goals is no system downtime even for upgrades: they use the third core, the development one, as backup for whichever production core isn't being upgraded at the moment. They've designed for peak demand, so the system is often partly idle. Storage requirements — especially when you include the archived media ingested from 22 years' worth of video tapes — run well over a petabyte.
I next went to the legal Invited Talk by John Nicholson, "What Information Security Laws Mean For You." I'd been previously familiar with some of the legal issues discussed (especially HIPAA), but I enjoyed the information behind the Sarbanes-Oxley Act (SOX), which affects my job, and some of the questions on jurisdiction. His slides are available in both HTML and PowerPoint.
Went out to lunch with Aaron, Amy, Bob, Ellen, Greg, Kendall, and Nevin on Broad Street; we grabbed various food types and ate outside in the nice sunny day. (We could tell the natives, huddling in their jackets because it was so cold, from the tourists, eating in shirtsleeves. The temperature at the time was near 70F.)
After lunch I went to the LiveJournal talk, "LiveJournal's Backend and memcached: Past, Present, and Future." Founder Brad Fitzpatrick and Lisa Phillips gave an information-rich, dense, and really enjoyable talk about the historical growth of LiveJournal from one server to well over 100. The step-by-step story of how they'd fix one bottleneck after another, in hardware and in software, was entertaining. Did you know they use custom Perl load balancers and a special homegrown (overlay) file system?
There's also a meme of sorts that got started. Steven was going to write up the talk in his LJ, so Peter posted to his LJ about Steven writing, so I posted to mine about Peter posting to his about Steven writing, and then Marybeth posted to hers about mine... Chris and Sabrina jumped on the bandwagon as well. Peter posted pictures, which helped us identify Kraig as well. Much silliness abounded.
Finally in today's technical track I attended the documentation talk. Mike Ciavarella stepped in to take over from the ailing Mark Langston who couldn't be there, and used Lewis Carroll's works (mainly Through the Looking Glass) as a metaphor for systems administration.
He gave several reasons to document:
- We tend to forget or generalize
- Dissemination
- Expected by managers
- Your own knowledge and perspective is unique
- Planes, buses, and cars (and alien abductions etc.)
- Professional aspects:
  - Best practice and completion
  - Increases awareness of the SA
  - Cover your ass
- Complex systems:
  - "What" is easy
  - "Why" is hard — document WHY things were done "this way", what constraints, what choices were discarded, what growth paths were planned, what assumptions were made, etc.
- Why not?
Writing should be just another tool in the system administrator's toolbox. It saves time in the long run, it improves perception of the sysadmin by others in the company (management and users both), it emphasizes your professionalism, and it reduces your stress.
I started the day with Ken MacInnes' talk, "Grid Computing: Just What Is It, and Why Should I Care?" He explained what grid computing is; you can think of it as utility computing (cycles for sale); as distributed computing, such as the RC5 project or SETI@home; as sharing high-performance resources, such as clusters, storage, visualization, and networking; or by its definition, "Common protocols allowing large problems to be solved in a distributed multiresource multi-user environment."
Grid computing evolved out of a need to share resources, to be flexible for virtual organizations (such as sharing data between astronomy departments at multiple universities). Traditionally, you'd have to log into each site's systems individually. In a grid architecture, there's a simpler interface (such as a web portal) using (for example) SOAP and XML and GSI, which together do the rest of the hard work. Challenges include both the technical (trust in the security sense, protocols, user interfaces, version control across multiple sites) and the political (cross-organizational authority, standardization, and so on).
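A toy sketch of that "portal" idea: one entry point hides the per-site logins and dispatches work across several organizations' resources. The site names, capacity numbers, and `submit_job` function here are all invented for illustration; real grids do this with SOAP/XML messaging and GSI credentials rather than a dict:

```python
# Hypothetical portal: the user submits one request; the portal, not
# the user, deals with each participating site individually.
SITES = {
    "astro-uni-a": {"free_cpus": 12},
    "astro-uni-b": {"free_cpus": 3},
}

def submit_job(cpus_needed):
    """Place a job on the first site with enough free CPUs."""
    for name, site in SITES.items():
        if site["free_cpus"] >= cpus_needed:
            site["free_cpus"] -= cpus_needed
            return name
    raise RuntimeError("no site has enough free CPUs")
```

Even in this toy form you can see where the political challenges come in: someone has to agree on what "free CPUs" means across organizations, and who gets to call `submit_job` at all.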
Why should I care about it? Well, in research especially, collaboration is the name of the game. Grid computing provides more efficient use of limited resources, and resource sharing is the key to funding. Finally, grid computing is a state of mind more than anything else.
I had lunch with AEleen, Bob A., Kendall, and someone who I'm blanking on (Aaron was with us to start but returned to the hotel) at the mall food court. I had a yummy bowl of onion soup and half a pastrami sandwich.
After lunch I went to Dan Klein's talk, "Flying Linux." He explained the difference between fly-by-rod (mechanical hardware which is difficult to make redundant, lightweight, and reliable) and fly-by-wire (using digital electronics over wires instead, which can be made all three). Fly-by-wire has advantages in that it's easier to make it redundant, it's very lightweight, it can translate the pilot's intent, and can therefore use one simulator to act like a heavy plane (747), or a small personal jet, or even the space shuttle ("brick with wings"). And lest you think this only applies to pilots, the by-wire technology is becoming common in cars, as well. Consider your ABS and traction control as examples. We're also beginning to see walk-by-wire (such as the Segway).
The thrust of this talk was that you wouldn't want to fly if a general-purpose operating system — be it Windows, Linux, or FreeBSD — were running the digital controls. You can't test everything: combinatorial testing (odd interrelations), stress testing (circuit breakers), and correlative debugging (temporal separation of cause and effect, e.g. food allergies) all combine to make completeness testing impossible. And while a crash is bad enough on the computer on your desktop, it's not a word you want to use when lives are on the line in flight.
I also attended the plenary session, "A System Administrator's Introduction to Bioinformatics," by Bill Van Etten. He'd previously given this talk to raves at BBLISA, but I'd missed it then. I didn't take notes, but it was an introduction to biogenetics from a technologist's point of view, showing what parts of the process various researchers were interested in and how that translated into data storage and manipulation.
I had to skip out of the Bioinformatics talk early to do Game Show preparation. We did some debugging (found a user interface bug) from 5 to 6, then headed off to the Galactic Gaming Reception from 6 to 8. (It was better than the board and card games I was expecting; we had decent food — burgers, corn dogs, nachos, quesadillas, wings, onion rings, a variety of salads including Caesar salad with real anchovies, open bar, and chocolate flowerpots for dessert — and then blackjack, poker (Texas hold-'em), roulette, and craps tables, followed by a raffle.) I spent the next hour attending BOFs (mainly the Google BOF for the free chocolate) and hallway-tracking, before spending an hour in the hottub before the Scotch BOF. I skipped out of there for a bit more than an hour to drive the laptop running the Game Show at a full-form test (where I found a new driver-interface bug and tripped over a known Tcl UI bug) before returning to the Scotch BOF for the rest of the evening.
Friday started with the sex talk, "System Administration and Sex Therapy: The Gentle Art of Debugging," given by David N. Blank-Edelman. There's a tendency to increased complexity and increased distributed interdependency. We don't control (administratively or otherwise) all of the pieces involved. This makes it harder to debug things when there's a problem. Overall, an entertaining talk, though much like Dan Klein's 1998 talk "Succumbing to the Dark Side of the Force: The Internet As Seen from an Adult Web Site," it did indeed focus on the technical aspects — in this case, the art of debugging — and not anything particularly titillating.
Debugging is cognitive, not constructive. Testing is the process of determining whether a given set of inputs causes an unacceptable behaviour in a program. Debugging is the process of determining WHY a given set of inputs causes an unacceptable behaviour in a program, and WHAT must be changed to cause the behaviour to become acceptable.
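A toy example of the distinction, mine rather than the speaker's: testing tells you THAT a function misbehaves on some input; debugging tells you WHY (here, division by a zero-length list) and WHAT to change.

```python
# Testing found that average([]) raised ZeroDivisionError (the
# unacceptable behaviour); debugging identified the cause and the fix.
def average(xs):
    if not xs:        # the fix: handle the empty-input case explicitly
        return 0.0
    return sum(xs) / len(xs)
```

The cognitive part is everything between the failing test and that two-line fix.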
We as sysadmins spend a lot of time looking at first-order changes but need to spend more time on second-order changes (definitions, processes, assumptions, practices, traditions, and so on).
Next I attended Peter Salus' talk about the history of system administration. In the 1950s, there were no SAs. The operators in white coats maintained the hardware, the punch-card readers, and so on. Things have sure changed since then. As usual, Peter peppered his slideless talk with often amusing anecdotes which have a point. What company or university today would have their servers in a glassed-in area administered by priests in white lab coats where users can't come near them?
Peter went through the history of LISA topics from LISA I (1987) to now (2004). Some topics stick with us (e.g., Storage), and others come and go (e.g., telephony in 1996). Some interesting notes: scripting didn't show up until 1998 (Perl, Tcl, and something else); mailing lists had special sections in 1997 and 1998; passwords didn't get their own session until 2000; recovery got its own special section in 2000; and SLAs and network topology arrived in 2002. So what does this mean about the sorts of things we do? It indicates the requirement of agility and flexibility in systems administration, since what we do covers such a broad area. And our job is to make it possible for users to do their work, not for us to play with the cool toys.
What's important these days? Certainly security, as well as networking and everything that goes with it (email and "teh intarweb"). This includes physical and virtual (computing) both, including the mundane necessities (cables, budgeting, logistics, procurement, and so on). But for the most part we've moved from the physicality of the machine into ideas and concepts and scripts (languages, like Perl/Python/Tcl), and hopefully more user training and education. We're much more script than screwdriver oriented nowadays. We worry about things which we may not have sufficient control over (e.g., security). We're becoming much more of a profession than (e.g.) handymen.
I had lunch today with Pat, Greg, Peter S., Ellen, and Doug. We ate in the hotel, and service was unfortunately slow (and my burger was overcooked to medium-well instead of medium-rare, but that's unsurprising).
After lunch (which unfortunately ran long), I attended the talk by our own Tom Limoncelli, "Lessons Learned from Howard Dean's Digital Campaign." Thanks to being late, I missed the first half of the talk, and a lot of what Tom spoke to is material he's covered in his own LJ and on mailing lists in the past year or two, so I'll refrain from summarizing it in depth. But at a geek level, the campaign is a big database application. With the right information (name, location, interests, preferences, support amount, donation amount, etc.), everything becomes a database query.
I had to skip out of the Q&A from Tom's talk to help carry the Game Show materials from Rob's room down to the convention floor. We then did the setup (for the electronics, the judging table, and the stage) and ran through our three regular rounds (complete with "ringer" Hal Pomeranz, who seems to have lost his touch) and finals, giving out lots of books, framed copies of the SAGE Code of Ethics, retractable Ethernet cables for travellers, and so on. No glitches in the Game Show software or hardware this time, thanks to aggressive bug fixing earlier; Dan and Rob rehearsed the hand-offs and patter more this time, and the tightness of the show really showed. And, of course, we'd listened to the feedback from 2003 and made the categories more technical (the obviously-Photoshopped picture of Kerberos, or Cerberus, notwithstanding). The audience seemed to enjoy it.
After the Game Show and tear-down, we moved the booze for the Dead Dog out of Terry's car into the suite, and Terry, Lance, and I went to dinner at the Hyatt across the street because the Marriott, in its infinite wisdom, had closed their in-house steakhouse for the week we were there. I had a delicious insalata caprese followed by the prime rib with roasted garlic mashed potatoes and correctly-cooked carrots, asparagus, and broccoli rabe. (Didn't eat the rabe or most of the carrots. Ick.) Following dinner we adjourned to the suite to set up for the party. Luckily, Bob A. brought down his supply of tonic water, since we'd neglected to add it to the shopping list (a fact which none of Adam Moskowitz, David Parter, or I caught — and Parter drinks the stuff!), oops. Nice mostly-quiet Dead Dog, made quieter by the almost complete lack of attendance by the USENIX Staff and the USENIX Board (who'd all skipped town that evening or earlier), and several others of the Usual Suspects who'd either not been at LISA or who'd had to leave early. I skipped out around 2am, dropping Adam's bartender kit supplies on his room door handle before heading to bed.
Today was checkout and travel day. Luckily I woke up around 11am or so, otherwise I'd've slept through the noon checkout time. Showered, packed (having blown off packing completely on Friday, which is very unlike me), and didn't see many folks from our group in the Lobby. (Tom S. and Michael were checking out, and Brent was coming down the escalator when we were leaving.) Philip and I headed out to the airport — he had a car and was driving to Birmingham AL for the evening, so he was kind enough to offer me a lift. We ate at a fast food place (gasp!) and I got to the airport uneventfully. Checked in, cleared Security, found the gate (at the far end of Concourse A, of course), and nobody else from LISA was apparently on my flight. (Bad Peter, switching to an earlier flight. Hrmf.)
The flight itself was fine. A little screaming baby syndrome on landing, but she quieted down pretty quickly. Got off the plane and over to baggage claim with no real delays, met Tia there, picked up the luggage, and headed out to meet Jet at the curb. (Jet did the logical thing: since you can't park-and-wait curbside at Terminal C, she headed over to a hidden part of Terminal B to do so.) She came back, we headed off to dinner, and then to my car so I could drive it home (as it was parked at Dkap's, since it'd otherwise have been towed from my place, the conference week being Street Cleaning week in my area of Cambridge).