The following document is intended as the general trip report for me at the 26th Systems Administration Conference (LISA 2012) in San Diego, CA from December 9-14, 2012. It is going to a variety of audiences, so feel free to skip the parts that don't concern or interest you.
I woke up before the alarm, showered, and packed up the last of the stuff I needed to take with me (CPAP, laptop, and toiletries, as well as the last 4 tangerines from the fridge). Headed out through the rain to Metro Airport; much of the traffic was moving slower than posted speeds, even though the roads were clear (if wet). My usual parking level was marked as full, so I wound up parking a level up, exposed to the elements; I hope it doesn't snow over the next week (or if it does, that it's melted by the time I get back to my car next weekend). Checked my bag, cleared Security (though I'd accidentally left my wallet on my person so it had to go back through the X-ray conveyor; oops), and got to the gate. Turns out my ticket class doesn't let me pay change and upgrade fees, so I couldn't buy one of the 7 still-unsold first class seats. Coach was about half full, though, and I had my entire 3-seat half-row to myself.
The video system locked up in the middle of playing the safety video, so while we heard how to fasten our seatbelts and turn off electronic devices, we didn't know about the exits, oxygen masks, or life vests. The cabin crew had to fall back to the old "safety dance in the aisle" method. The system also stopped tracking the flight about 210 miles out of Detroit; it changed the flight duration to 00:00 and didn't advance any further for the remainder of the flight. Other than that and a bit of a bumpy landing, the flight was pretty uneventful.
At baggage claim I ran into a first-time attendee who was on my flight, and we shared a shuttle back to the hotel. The first hotel shuttle driver zoomed past us without stopping, so I called and kvetched and the second shuttle stopped. Check-in went fairly efficiently, though it was delayed by Accounting not having run the faxed-on-Wednesday form through the system; it's a good thing I brought a copy with me. Room and tax are going directly to the University so I don't have to front 'em the cash for it. I got to my room, unpacked, and headed downstairs to find people to lunch with. I ran into Toni, Andrew, and Julie, and we went to the lobby sports bar; Tom joined us shortly thereafter. I had a decent enough if overpriced burger.
I schmoozed in the lobby with a bunch of folks (including but not necessarily limited to Carolyn, Dan, Geoff, Mark L, Mike C, Narayan, Nicole, Patrick, Skaar, and Tom), and said Hi to the staff setting up Registration (Andrew, Anne, Casey, Jessica, and Julie), chatted briefly with Lee, then hung out in the hotel lobby space (since there's no seating in, near, or around Conference Registration) until it was closer to time for Conference Registration to open. We did find out that the $11.95/day hotel Internet is not being comped at checkout (though it was in Boston).
Was first in line at Registration, first (other than staff) to get my t-shirt, and hung out with folks, chatting and catching up, before dropping the laptop off in my room and heading to the Welcome Reception. Nibbled some cheese and crudites, drank a Diet Coke, helped fill out some scavenger hunt cards, and eventually headed off on foot to The Boat House (about a mile away) with David for a late dinner. We shared the baked brie appetizer (with roasted garlic, fresh fruit, and blueberry-lavender jam), and I had the crab-stuffed salmon (which came with roasted garlic mashed potatoes and a jicama salad). I also used the concierge's coupon for a free mud pie (which was humongous), so I was pretty much stuffed.
Got back to the hotel, chatted briefly with Mike and Steve, and headed up to crash.
Weekend day! I slept in until 4:45am PST, so hopefully I can get a nap in later. I noticed the overnight web content sync to my disaster recovery site blew up because of a problem on the target, so I bounced the service and reran the sync, then manually published one piece that just didn't want to go. Bopped around online until it was late enough to shave and shower without overly annoying the neighbors.
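That bounce-and-rerun recovery is worth automating rather than doing by hand from a hotel room; a minimal Python sketch of the idea, with every command a placeholder rather than what I actually run:

```python
import subprocess
import time

def sync_with_retry(sync_cmd, restart_cmd, retries=2, delay=5):
    """Run a sync command; on failure, restart the target-side service
    and try again.

    Both commands are placeholders -- substitute your own rsync/publish
    invocation and whatever "bounce the service" means on your target.
    """
    for attempt in range(retries + 1):
        result = subprocess.run(sync_cmd)
        if result.returncode == 0:
            return True
        if attempt < retries:
            subprocess.run(restart_cmd)  # bounce the service on the target
            time.sleep(delay)
    return False
```

A hypothetical invocation might look like `sync_with_retry(["rsync", "-a", "htdocs/", "dr-site:/var/www/"], ["ssh", "dr-site", "systemctl", "restart", "httpd"])`; the one piece that "just didn't want to go" is exactly the kind of failure this returns False for, flagging it for manual publishing.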
I headed down to the conference area to schmooze and grab some breakfast (fresh fruit and apple turnovers). The downstairs area, outside the classrooms, had tables and chairs (and power), but the network wasn't up. Once the network came up I hung out there doing email and writing up this trip report.
During the next session, since none of us were in tutorials or workshops, Adam, Matt, Nicole and I went to Jasmine for dim sum. We had barbecue pork buns (steamed), egg rolls, roast pork, shrimp dumplings, shu mai (pork and shrimp), vegetable dumplings, and Matt got a bubble tea. Extremely tasty and with tax and tip still only $15 per head.
Hallway tracked in the afternoon on either side of the 2-3pm power nap. Finally managed to get out for a quick burger at In-N-Out with Adam and Chris, then a dip in the hot tub, then swung by Games Night to kibbitz on a game of Cards Against Humanity. Started zoning out and headed off to bed around 10pm.
I woke up early and made the mistake of checking my mail. It seems that despite someone saying there were no changes made to the LDAP cluster, either a new server was actually added or one had its IP address changed. I wound up having to add a new SSL certificate to all three tiers' trusted key store (and a second SSL certificate to my Development environment), and update a config file or two, and blow away old LDAP caches, and restart all three environments. Prod and QA were relatively painless, but Dev blew up spectacularly (one service won't come back up) and I had to hand it off to a colleague.
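When an LDAP server starts presenting an unexpected certificate, the first diagnostic step is comparing the fingerprint of what each host actually serves against what's in the trusted key store. A small Python sketch (illustrative only, not the tooling I used) for fingerprinting a PEM certificate:

```python
import base64
import hashlib

def pem_fingerprint(pem_text):
    """Return the SHA-256 fingerprint of a PEM-encoded certificate.

    Comparing this against the fingerprints in each tier's trusted key
    store is a quick way to spot which server's certificate changed.
    """
    lines = pem_text.strip().splitlines()
    body = "".join(l for l in lines if not l.startswith("-----"))
    der = base64.b64decode(body)  # the DER bytes are what get hashed
    digest = hashlib.sha256(der).hexdigest()
    # Render in the colon-separated style most TLS tools display
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2)).upper()
```

The live certificate can be grabbed with the standard library's `ssl.get_server_certificate((host, 636))`, where the host and port are whatever your LDAP cluster uses.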
From a conference standpoint today was another unstructured Hallway Track day. I wound up doing lunch offsite with Adam, Bill, and Peter at Rubio's; I had the lobster burrito which was very tasty. For dinner I'd coordinated this year's 0xDEADBEEF dinner, so I went with Adam, Bill, Brent, Janet, and Mario to The Wellington. I wound up having the house salad (mixed greens, Roquefort cheese, candied pecans, and pickled onions in a balsamic vinaigrette); a 10-oz New York strip, medium rare, with dijon-roasted potato wedges and a small side of mac-n-cheese; and the dessert trio of chocolate peanut butter ganache cake with peanut brittle, Pink Lady apple cobbler (with vanilla ice cream), and a mini-ice cream sundae (with chocolate ice cream).
After we got back from dinner, I spent very little time chatting with some folks in the lobby before heading upstairs to bed.
Tuesday's sessions began with the Advanced Topics Workshop; once again, Adam Moskowitz was our host, moderator, and referee.
[... The rest of the ATW writeup has been redacted; please check my web site for details if you care ...]
After the workshop, a bunch of us (Carson, Chris, David, Doug, Mario, Mark, Patrick, and I) went to dinner at C Level. Good food, better company.
We got back to the hotel more or less in time for the 7pm BOF sessions. At 8pm I went over to Rob's room to meet him, Adam, and John, and we packed up the Game Show equipment, moved it to meeting room 411, and set it up for rehearsal. Ran through the five rounds — three qualifying, finals, and tie-breakers — and wound up reordering several categories so that harder questions are worth more points. That took us more or less up to 11pm so when we were done and had dragged it all back to Rob's room, I just went to my own room and crashed.
I managed to sleep in 'til 6am today. Showered, made the mistake of checking email, and fixed a couple of production issues before heading down to the conference proper.
Carolyn began with the usual announcements. This is the 26th LISA, the program committee accepted 22 papers from 49 submissions, and as of shortly before the keynote we had "over a thousand" attendees. She thanked the usual suspects (program committee, invited talks coordinators, other coordinators, steering committee, USENIX staff and board, sponsors, authors, reviewers, and our employers for allowing us to show up) and gave the usual housekeeping information (speakers to meet with their session chairs 15 minutes before they go on-stage, BOFs in the evenings, and reception on a boat).
LISA 2013 (the 27th) will be November 3-8 in Washington DC and Narayan Desai and Kent Skaar will be the program chairs.
Next the regular awards were presented:
- Best Papers:
- Practice and Experience — "Lessons Learned When Building a Greenfield High Performance Computing Ecosystem," Andrew R. Keen, Dr. William F. Punch, and Greg Mason, Michigan State University
- Student — "Theia: Visual Signatures for Problem Diagnosis in Large Hadoop Clusters," Elmer Garduno, Soila P. Kavulya, Jiaqi Tan, Rajeev Gandhi, and Priya Narasimhan, Carnegie Mellon University
- Best Overall — "Preventing the Revealing of Online Passwords to Inappropriate Websites with LoginInspector," Chuan Yue, University of Colorado at Colorado Springs
- USENIX Annual:
- Lifetime Achievement — John Mashey, for his work in performance evaluation, especially the SPEC benchmark
- Software Tools User Group — Arthur David Olson for OlsonTZ (the timezone database)
- LISA (formerly known as SAGE) Outstanding Achievement — Jeffrey Snover, Bruce Payette, and James Truher for PowerShell
- LOPSA Chuck Yerkes Award — David Lange
Carolyn introduced our keynote speaker, Turing Award recipient and the first system administrator Vint Cerf, who spoke about "The Internet of Things and Sensors and Actuators."
The Internet was designed over 40 years ago (March-September 1973) and went into operation in January 1983 and there are now over 908 million visible public(ish) machines and over 2.4 billion users. Recent changes include the launch of IPv6 in parallel with IPv4, internationalization of domain names, new global top-level domains, and wider adoption of both DNSSEC and RPKI. There are also more sensor networks and mobile devices.
That's all great, but how do you configure and manage this stuff? Some devices (such as sensors) may not have a display, there may not necessarily be protocols for authentication and authorization, network discovery, and so on. There are also policy challenges (cf. the recent ITU discussions).
He alluded to his bit rot rant (see here) and noted that you can't do backwards compatibility forever. How do you read stuff that's too old a format? What about bit rot on disks?
He concluded with a mention of the "InterPlaNetary Internet." TCP wasn't designed for 40-minute round-trip times, so it's not the best choice. The Mars rovers and orbiters are using store-and-forward technology to communicate. He noted that "We turn science fiction into reality."
After the morning break, I went to Alberto Dainotti's invited talk, "Analysis of an Internet-wide Stealth Scan from a Botnet." Basically, while they were looking for something else when Egypt was off the air (due to the Arab Spring), they found a botnet scanning the entire IPv4 address space for SIP servers on UDP port 5060. They run a darknet, the UCSD network telescope: an unassigned network block advertised via BGP, so any traffic to it is unsolicited. More details are available in their paper.
Lunch was a limited selection of overpriced hotel food at the Vendor Exhibition at the Pavilion (a permanent tent between the two Sheratons). I had a beef burrito which was good enough but cost $9. Did a quick run through the vendor floor; got my Google pin collection (Chrome, Plus, Maps, and LISA 2012) plus a water bottle. Other than that and some Tim Tams, there wasn't much schwag to grab.
I hung out and Hallway Tracked with folks until the afternoon sessions started. I went upstairs to get rid of my stuff before going to Rob's to help lug the Game Show equipment down to the ballroom. (I did have a bit of a scare; my laptop refused to wake up from sleep and even force-rebooting it never brought up the monitor. I wound up zapping the PRAM to get it back. Luckily, the browser has "Reopen all tabs from last session," since I had 50+ tabs open.)
Anyhow, at 3:15pm I met the rest of the Game Show production staff in Rob's room to carry the equipment down to the ballroom and set it up. Managed to forget about gravity and clobbered one of the audience-facing monitors; luckily we had a spare. The game went smoothly enough, considering:
- The code was being tweaked up until 3:15pm.
- The contestant-facing monitors didn't display the questions.
- The buzzers (wireless mice) didn't always work.
- The buzzers sometimes didn't lock out others; often 2 and even 3 rang in at once (on the audience- and contestant-facing displays).
In the final round, John Miller won a free pass to the LISA 2013 technical sessions.
After tearing down and packing up the equipment, I met Chris, David, Lee Ann, and Michael in the hotel lobby and headed out to Todai for all-you-can-eat sushi. It was only fair; the quality's gone down over the last several years. They also had a problem with their credit card processing; it took them over 20 minutes to run the 3 credit cards.
After we got back to the hotel, I ran into Mike C. so I got the contraband (Tim Tams) for Moose. I then took a 45-minute dip in the hot tub before heading back to the room, catching up on work email, and crashing.
This morning started with Selena Deckelmann's plenary session, "Education vs. Training." The presentation itself was short, but led to some excellent Q&A with the audience.
The education world (theoretical, lecture-based) and the training world (practical, one-on-one mentoring, self-paced) don't overlap and we need to bridge the gap. There's a difference between "I covered it so they should learn" and "If they didn't learn it I didn't teach it;" that's a major shift in thinking, and most training falls into the former category. For sysadmin training and education we need to move more towards the latter, even though it's a lot of work. Some advised teaching the method (how to do things) instead of just the raw facts (things), moving away from rote memorization.
Sysadmins are often bootstrapped by learning on their own and tinkering, so there's a lot of distrust about certification and courses and so on. We can make this acceptable to them by developing a certification system that "doesn't suck." Despite the perceived hostility sysadmins are practical; RedHat's certification isn't bad (for example).
There's a lot of fear, uncertainty, and doubt in the education versus training discussions. If sysadmins can be classified or certified, there's a question as to what happens to all of the qualified people who've been doing this ("grandfathering"): If everyone needs certification, how do you force the grandfathers to do so ("I've been doing this for 30 years")?
It was noted that you have to train whoever you hire; you want to educate them on sysadmin skills but train them on the local weird proprietary stuff and legacy systems. The foundation of skills is important. However, nobody has time to work on training because other things take priority. After a sufficiently-painful disaster one place now has management backing for Cross-Training Fridays so they can defer users' requests to the next week while building each others' skillsets.
Fundamentals can be taught, but some (if not most) sysadmin skills can't be taught without on-the-job or other hands-on experience. Learning from mentors (a la apprenticeship and guild) is important if not necessary. We need to encourage more mentoring and allowing deep-dives into mysteries. We need to be more effective with leveraging learning from peers. Mentoring doesn't scale well, unfortunately.
After the break I went to Jeff Darcy's invited talk, "Dude, Where's My Data?" The problem is that compute cycles are everywhere but data isn't, so you have to move the data to go with the compute cycles. There are three Vs affecting replication behavior:
- Volume, or size (e.g. in TB), affecting both initial setup and bandwidth
- Velocity, or rate of change (e.g. in TB/hr or files/hr), affecting ongoing bandwidth but also (and more so) a latency issue
- Variety, or format, semantics, and so on
That all affects how different solutions work on that data. With bigger data we get the three Ds:
- Distance, how far does the data have to move in replication
- Domains, how many places does the data need to be
- Divergence, how asynchronous is it, buffered/cache may not be the same across sites, and so on; perhaps the most serious as it affects data consistency and data integrity.
As synchronicity increases, performance decreases (latency increases). He used rsync as an example: the initial sync is easy, but staying in sync is hard. Conflict resolution, especially with multiple nodes, is very hard.
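His rsync point generalizes: the initial copy is a bulk transfer, but staying in sync means continuously detecting (and then resolving) divergence. A toy Python sketch of divergence detection, given per-file content hashes for two replicas:

```python
# Toy divergence detection between two replicas, given {path: content_hash}
# maps for each site (the hashes stand in for file contents).

def diverged(site_a, site_b):
    """Return (only_in_a, only_in_b, conflicting_paths).

    The first two sets are the easy part (copy the missing files over);
    the conflicts are the hard part, since both sides hold different
    content and something has to decide which version wins.
    """
    paths_a, paths_b = set(site_a), set(site_b)
    conflicts = {p for p in paths_a & paths_b if site_a[p] != site_b[p]}
    return paths_a - paths_b, paths_b - paths_a, conflicts
```

With more than two replicas the conflict set can differ pairwise, which is why multi-node conflict resolution is so much harder than a one-way sync.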
After the session was supposed to be lunch. The vendor exhibition floor had another hotel-catered lunch, selling pizza (which had been sitting under heat lamps for long periods of time) by the slice for $7. No thanks. I tried to eat at the in-hotel sports bar, but they couldn't manage to bus the table or provide any service, not even a "We'll be right with you," for over ten minutes. Complained to the restaurant staff on my way out, but that didn't get me any food. Complained to the concierge so she could advise the restaurant manager, since I didn't think I could do so without ripping him or her a new one. Wound up hallway tracking until the break, when I could grab 3 soft pretzels and 2 cans of Coke Zero in lieu of lunch... and then the fire alarms went off. Whoops.
In the fourth session block I attended Matt Blaze's plenary session, "DIY NSA on the Cheap." Last year, Matt and his team discovered a number of protocol weaknesses in P25, an allegedly-secure two-way radio system used by, among others, the federal government to manage surveillance and other sensitive law enforcement and intelligence operations. Problems his team found included:
- No authentication — You can't be sure if the sender is who you think. There's no protection against replay or splicing, the displayed unitID isn't authenticated, and incoming clear traffic is always accepted.
- Identification in the clear — Every transmission includes a 24-bit unitID (identifying both the unit and the agency if not also the office or squad), groupID, and NAC, which are always sent in the clear.
- Ping response — Radios typically automatically respond to pings, so an active adversary can easily discover idle radios... transparently.
- Denial of service (in theory) — P25 has a lot of error correction, but jamming just 64 bits invalidates the entire 1728-bit voice frame. A jammer needs 14dB less energy than the transmitter. A $15 IM toy can act as a jammer!
- Usability — No user feedback about inbound encryption, and the "circle-slash" icon is ambiguous as to whether encryption is enabled. Rekeying is difficult and unreliable.
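A back-of-envelope check on those jamming numbers (my arithmetic, not the speaker's):

```python
# Back-of-envelope on the P25 jamming claim: a jammer can run at 14 dB
# less power than the transmitter, and only needs to corrupt 64 of the
# 1728 bits in a voice frame to invalidate the whole frame.

def db_to_ratio(db):
    """Convert a decibel power difference to a linear power ratio."""
    return 10 ** (db / 10)

power_ratio = db_to_ratio(-14)   # ~0.04x the transmitter's power
duty_cycle = 64 / 1728           # ~3.7% of each frame needs jamming
# Energy per frame relative to the transmitter: well under 1%,
# which is how a $15 toy can plausibly act as a jammer.
energy_fraction = power_ratio * duty_cycle
```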
This is potentially serious but the attacks require some expertise. More seriously, a lot of cleartext goes over the air that off-the-shelf units can pick up (with users unaware that it's in the clear).
What's wrong? It can include:
- Single-user error (one user in the clear to an encrypted team)
- Group error (everyone in the clear thinking they were encrypted)
- Keying failure (one member missing key so everyone in the clear)
In practice, it's an even split between single/group errors and keying failures.
The session broke a little early so people could quickly change, drop stuff off in their rooms, and get back down to catch the departing-at-5:30pm buses to head to the boat dock a mile away. The buses loaded and left pretty quickly, but then we got to stand dockside in the cold wind for 15+ minutes until we could board the ship. Once aboard they gave us champagne and we grabbed seats at the table and, shortly after pushing off, they opened the buffet lines. There was an Italian-inspired line (Caesar salad, penne with roasted garlic, bruschetta, herb-pepper chicken over spinach), a Mexican-inspired line (cheese enchiladas, beans, Spanish rice, chips, and a rock shrimp ceviche with English cucumbers and cilantro), and a roast beast carving station (with roasted red potatoes, gorgonzola, a horseradish sauce, and a chimichurri). There was also a selection of desserts (cookies, brownies, cream puffs, chocolate-dipped strawberries) and an open bar (I had a very strong Long Island Iced Tea). Never made it downstairs for the gambling, but casinos were never really my thing. Did have nice conversations over and after dinner.
When we got back to the hotel, I took a dip in the hot tub (with Alan, Darrell, Lee Ann, and Peter). They were supposed to close at 10pm but didn't actually come to kick us out until 10:30pm. Came upstairs, showered off the chlorine, rinsed out the suit, and went to bed.
I managed to sleep through the 6.3 quake and the 4.1 aftershock this morning. After I woke up I wrote up and sent out the ATW raw notes and links report to the attendees (only 2 of whom sent automated out-of-office replies). I also did the pre-packing, wrapping the fragile stuff (the electric martini glasses from the reception and the Google sunglasses that aren't Google Glasses, and the Australian contraband for Moose), and making sure everything would fit in the luggage (tight squeeze but doable).
In the first session I attended Tom Limoncelli's replacement talk on "Ganeti: Virtual Cluster Manager" (because the data integrity speaker couldn't get to the conference). Virtual desktops and servers are fun, virtual clusters more so. Benefits to sell to your boss: cost savings, faster service deployment, new services like virtual apps and temporary machines, and easier software qualification. Ganeti is used internally at Google for things like their internal DNS servers and temporary virtual machines; it's not user-facing production. It also abstracts the low-level details so you don't have to care about them.
Ganeti keeps data mirrored across nodes so if node1 dies it'll recover on node2. Downtime becomes a live migration, long outages or off-hour tasks become reboots. It has both a human-usable command line and an API; your environment can be small (1 node), medium (2-40 nodes), or huge (racks, each as a node group, one lock per group). Ganeti 2.6 works up to a few hundred nodes even with group-level locking. 2.7 will be shipping soon.
In the Q&A, Tom noted that there are some workloads where DRBD, which underlies Ganeti, is overloaded, primarily huge web servers with many queries or anything with heavy I/O requirements (such as big databases). This is a drawback of DRBD: it's slow because a write() isn't complete until it's finished on both nodes. In 2013 they want to offer plain disks without DRBD (an "unsafe configuration").
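That synchronous-write penalty is easy to model; a toy Python sketch (the model and all numbers are mine, not DRBD's actual internals):

```python
# A toy model of a synchronous replicated write path: the write is
# acknowledged only after both the local disk and the remote disk
# (reached over the network) have it.

def sync_write_latency(local_ms, remote_ms, rtt_ms):
    """Latency of one synchronously replicated write, assuming the local
    write and the network round trip to the peer proceed in parallel,
    so the slower path determines the total."""
    return max(local_ms, rtt_ms + remote_ms)

def async_write_latency(local_ms):
    """Asynchronous replication acknowledges after the local write alone."""
    return local_ms
```

With hypothetical 0.5 ms disks on both sides and a 0.2 ms round trip, the synchronous write costs 0.7 ms against 0.5 ms unreplicated; stretch the RTT and the network term dominates, which is why heavy-I/O workloads suffer most.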
In the second session, Doug Hughes spoke about "Near-Disasters: There and Back Again: A Sysadmin Tale." He opened with pictures of the Verizon data center post-Hurricane Sandy, with 3.5 stories of stairwell in standing water. He then showed some pictures of various disasters, natural or otherwise, before going through the disasters his own site had experienced.
Issue 1 was a degraded WAN. He showed a smokeping graph of a failover from the primary to the secondary data center: latency jumped to 52ms on every OC12 failover. That 52ms is the round trip between his site (in/near NY) and Denver, which is what their carrier chose for a backup path. (They've changed carriers since.) Some of the things they learned were to look for mismatched transceivers and unidirectional problems.
Issue 2 was an archive failure. They had a MegaRAID controller, and 8 adjacent disks in the array "went away" because the controller "forgot" about them; they saw random numbers instead, and almost lost everything in recovery. They tried the /etc/system magic fsck option, reseating the disks (they went from "disappeared" to "visible to the card with numeric identifiers"), reseating the cables, removing the disks from the pool, reseating the PCI card, and using dd to overwrite and recreate the labels — which led to a namespace collision and clobbered the labels of okay disks. They wound up having to use format -e to restore the factory labels, reboot, and zpool -s to find they'd only lost 2 files in snapshots. One of the spares wasn't in use, so they could remove it and do a Towers-of-Hanoi-style rotation to substitute through all the disks. Oh, and these were 2TB disks, so it took 16 hours per disk to copy/recover. Lessons learned:
- "was" in ZFS is useful only until the mappings change.
- ZFS can repair from dd-inflicted damage.
- Use RAID cards that don't require creating a single disk LUN (pass-thru).
- Locate lights are helpful.
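As an aside, that 16-hours-per-2TB-disk figure implies a fairly modest effective recovery rate; quick arithmetic:

```python
# The quoted recovery rate: 2 TB per disk, 16 hours per disk.
TB = 10 ** 12
disk_bytes = 2 * TB
seconds = 16 * 3600
throughput_mb_s = disk_bytes / seconds / 10 ** 6  # roughly 35 MB/s
```

About 35 MB/s, well below a 2TB spindle's raw sequential rate, so the copy/recovery path (presumably rebuild overhead, not the disks themselves) was the bottleneck.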
Issue 3 was an NFS problem: a subset of NFS clients lost access to the server, and a reboot got back to the same state. It wasn't the RAID cards; maybe a stuck SSD? Yep: a stuck SSD had locked up the SAS bus. Lessons learned: Keep spares on hand; machines that allow swapping RAID cards, CPUs, RAM, and other components without unracking are helpful (HA would've been better); and SSDs are relatively new and have peculiar failure modes.
Issue 4 was a storage SNAFU with their 2PB raw (600TB used) chemistry data that's essential for their work: 4 Linux boxes running GPFS in front of the storage, in 10-disk fixed RAID-6 stripes. The controller has 5 PCI SAS cards with 2 ports, 1 per shelf. (SPOF: Only one port-and-card per shelf.) So what happened? Controller 1's cards threw channel failures and users reported slowness; RAID-6 was down in half the storage. Controller 1 had failed. They replaced it (which means no journals), and then 2 hours later controller 2 failed. They replaced it, and the vendor said to power-cycle all controllers and shelves. Two technicians from the vendor shut down three of the shelves. When the units had moved between buildings, the power strips got swapped for shelves 8 and 9, and they pulled the cable labeled 9 (for shelf 8), not the cable for shelf 9 (labeled 8). They shut everything down because the console was throwing errors. They brought up the shelves, then started the Linux nodes, which could see all the disks; they got quorum and GPFS found most things were back. The vendor has a firmware feature that says "freeze" if there's a third controller failure, so the 3rd shelf didn't fail completely. They replaced a disk shelf I/O card.
Status at this point: RAID-6 degraded everywhere; 1 more disk failing (of 800) means data loss. Half the disks have journals available; half don't. There're 90 RAID-6 groups. The emergency shutdown might require a fsck. They don't know if they can rebuild from journals.
Repair: Restore journals for half the users. Takes a few hours. They can bring the file system online for (most) users. They can look for a specific I/O error in the file system to identify metadata. They apply a QOS policy to throttle reads. Both controllers rebuild, but can't do more than 6 at a time and there're 45 to do. Do they wait 2-3 weeks while they rebuild (hoping for no disk failures)? They got an idea: Because of the policy engine, a lot of the data is on MAID storage and the other rack. MAID keeps things that aren't touched for 90 days. The vendor provides a super-secret command that can copy from MAID back to the primary RAID... and it doesn't work. The problem is that the command thinks there're outstanding writes to the disk: the GPFS shutdown didn't finish the replication to some disks (e.g., superblock data). The vendor asked for debug output ("a few thousand lines" — 128K lines from each controller). They sent the stuff in to the vendor; their guru identified bits in a bitmap they could flip, so they fixed the bitmap and reran the super-secret command. Once they corrected that, they got all the disks online, so now only 30 LUNs were degraded. 5 days later, one disk out of every column was rebuilt, so they were no longer double-disk degraded and could survive another disk failure. Then they could fsck it, which after 14 hours failed with "I can't continue" in phase 2 (of what?). The vendor said they'd have to fsck the file system offline; they declined. IBM said it was a fsck bug and to upgrade. They could do that out of band with the file system online and then run another online fsck. After 24 hours they got to phase 3, then phase 4, and finally it was back up. They removed the throttles and got back to normal.
Lessons learned:
- Explore all failure modes; some may be surprising.
- Design the hardware architecture.
- Insist on on-site spares.
- Correct labels are imperative. Trust but verify. Maintain them.
- Apply throttles on the routers and switches.
Some other things to think about include:
- Only loss was "of time;" everything else was recoverable from tape or other disks (MAID). Someone had to restore 90 days' stuff from tape to additional old storage and hack the tree of symlinks-n-data to work around it.
- How much to communicate to management?
- Do you take the storage offline or leave it online?
- Test your restores. Are you optimized to BACK UP or RESTORE quickly?
- What could have caused it?
- Dirty power? (No.)
- Lightning strikes? (No.)
- Solar flares? (Probably not; later determined: No.)
- Gremlins? (Maybe.)
- Freak accident and/or bad luck.
- Subspace disturbance?
Issue 5, "So long and thanks for all the fish": They accidentally put ext4 data on top of metadata. They wound up creating a huge GPFS file, putting the fs in debug mode, getting the blocks and offsets, making it so it was only on one disk, then removing the file but leaving the blocks there. The vendor guru dd'ed each block's data to determine which was good and which was bad; he could recognize the bit pattern from GPFS. But you can't just replace the bad with the good; instead, you put the copy in one of the blocks from the deleted file and then patch the inode to use the new block copy. They did that, then fsck'ed the file system. That fsck took all of Sunday and hadn't finished by Monday morning; it was interrupted and finally finished late Monday. They got some errors, sent them to IBM, and did more patching of inodes etc. the next weekend. Another fsck cleaned up a lot of stuff and finished, and all seemed well.
But.
Now they needed to repeat all this a month later, when they manually reimaged another Linux node before they'd actually corrected the FC zoning: mdraid saw the ext4 labels and took ownership... and overwrote the patched system, so they got to do it all again.
For lunch, I went with Derrick to the deli past the Pavilion, and picked up Matt on the way.
The next and last technical session was the closing plenary, "15 Years of DevOps." David Blank-Edelman of the USENIX Board reminded us that the papers are all available online now, and thanked the USENIX staff and MSI (a/v) staff, and thanked Carolyn Rowland as program chair. Mike C introduced the speaker, Geoff Halprin.
Geoff's thesis is that software development has changed forever, and now it's our turn. He went through a history of software development from the monolithic mainframe days through the smaller interconnected client/server systems to the current 3-tier architecture. The software distribution channels have changed, leading to smaller but more frequent releases. "First to market" was the overriding concern in the 1990s, leading to "good enough" lower-quality code, but developers don't like "lower-quality." This led to Agile methodologies (described elsewhere).
However, there were three wrong assumptions in trying to apply this to operations:
- Production looks just like Development.
- Production looks like a vendor install.
- Production is static.
Early software development and release models don't mention system operations at all; all of the standards documentation assumes that developers would do operations. The assumption is that Operations is represented in the Requirements phase. DevOps by its very nature puts an Ops person in the room with Dev (and the business).
We need serviceability criteria:
- Each Production environment is unique.
- The environment is never static across its life.
- The application must expose the necessary controls to the SA to provide for maintenance.
- This is a definable set of criteria called "serviceability criteria."
Operations' requirements fall into three categories:
- Serviceability criteria (global scope)
- Standard operating environment (local scope)
- Application requirements (application scope)
Historically as an industry we left all of this for the individual sysadmin. In practice this is... less than ideal.
The system administration industry started as firefighters, or a reactive support group. Hardware was expensive and people cheap. SAGE was founded in 1982 (SAGE-AU in 1983) and LISA started in 1987. We've seen the rise of the web, thin clients, and the web/app/database structure. Then we got commodity hardware and the rise of virtualization. Next came the cloud: Linux x86 and VM wins (infrastructure on demand), the API rises (infrastructure as code), and automation is driven by scale.
Given all of that, what's DevOps? DevOps is to systems administration like Agile was to software development:
- Culture and attitude — Colocate Dev and Ops, move away from an us-vs-them mentality, Kanban walls, shared food and discussions, and crossover of teams.
- Practices and processes — Developers do infrastructure as code, continuous integration, and production support; sysadmins are embedded with Dev teams and work on Kanban walls too.
- Technology and tools — He mentioned Puppet, Chef, CFEngine, GitHub, Jenkins, Rails, Cucumber, AWS, VMware, OpenStack, CloudForms, and so on. Standards are still evolving.
In this model, developers also do after-hours support, as well as building better code, since they'll have an end-to-end system view. Continuous Integration is a Dev practice tied to test-driven development; it requires well-behaved developers and a clean code base. Continuous Release is a DevOps practice; it requires a well-defined (automated) release procedure and a well-defined target infrastructure.
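The "infrastructure as code" practice underlying all of this boils down to declaring a desired state and converging to it idempotently, so re-running a release is safe. Here's a toy sketch of that convergence loop for a single file resource; the resource model is my own illustration, not the actual API of Puppet, Chef, or CFEngine.

```python
import os
import tempfile

def converge_file(path, content, mode=0o644):
    """Idempotently bring a file to the desired state; return True only if
    something actually changed. This no-op-on-no-drift behavior is the core
    of config-management tools (illustrative model, not any tool's real API)."""
    changed = False
    current = None
    if os.path.exists(path):
        with open(path) as f:
            current = f.read()
    if current != content:
        with open(path, "w") as f:
            f.write(content)
        changed = True
    if (os.stat(path).st_mode & 0o777) != mode:
        os.chmod(path, mode)
        changed = True
    return changed

# Example: converging twice -- the second run reports no drift.
demo = os.path.join(tempfile.mkdtemp(), "motd")
first = converge_file(demo, "welcome\n")   # creates the file -> True
second = converge_file(demo, "welcome\n")  # already in desired state -> False
```

Because the second run changes nothing, automation can run it continuously — which is exactly what makes Continuous Release practical against a well-defined target infrastructure.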
DevOps isn't:
- A silver bullet.
- Complete, holistic, or fully cooked.
- Highly scalable into traditional enterprise environments.
- New.
It also doesn't solve a bunch of problems.
As sysadmins, our job is to make ourselves redundant:
- Consistency leads to reduced effort.
- Autonomous (self-repairing) systems means less to do.
- Work on the right problems.
Because we had no ballroom space for the fourth session block, we had an ice cream social in the foyer. Other than some logistical issues getting everyone lined up, it worked well: you chose chocolate, strawberry, or vanilla ice cream and topped it yourself (sprinkles, chocolate sprinkles, cherries, strawberry sauce, bananas, hot fudge, hot caramel, whipped cream, nuts, and possibly some other stuff I'm forgetting). I went with chocolate ice cream with chocolate and caramel sauces, some bananas, and a cherry, and went light, because after the ice cream and after the conference ended, a small group of us (Chris, John, Mark, and I) went to Rei Do Gado for meat on swords.
In addition to the salad bar (from which I had some asparagus, a little caprese salad, some prosciutto-wrapped honeydew, and a lot of seafood: crab legs, shrimp, and smoked salmon), they provided (and we ate) baby back pork ribs, bacon-wrapped filet, bacon-wrapped turkey, beef ribs, chicken stuffed with cheese, garlic sirloin, hanger steak, leg of lamb, pineapple, pork sausage, skirt steak, and top sirloin. I also drank two caipirinhas with it, so I decided to pass on the hot tub upon our return to the hotel.
There was a dance with the hotel to break others' $20 bills, to reimburse me for the outbound shuttle trip and Chris for the return trip: the front desk was out of change, and the bell stand could break one $20 but not the second, so they went to the sports bar for that. Once I'd been reimbursed I could actually break a $20 myself, so I swapped a $10, a $5, and five $1s for a $20 at the bell stand so they'd have change again.
Nobody had made it back to the State Suite by 9pm, so I went back to my room to pack up what could be packed. I headed up to the suite around 9:30pm; I helped move the supplies down to the corner of the lobby after we'd gotten a noise complaint — the hotel was willing to work with us rather than shut us down entirely, even to the point of allowing us to drink our own booze in their lobby. I said my goodbyes beginning around 11:30pm and was back upstairs by 11:45pm. Finished packing except for the necessities (chargers, CPAP, laptop, and toiletries) and crashed.
Despite going to bed after midnight, I was still up by 6:30am. Decided not to spend another $12 of the University's money for what would amount to only 3 hours of Internet access and just used the phone to process email (one of my production boxes reported a power failure, but seven others in the same data center, including four in the same blade enclosure, did not so I chose to ignore it, not that there's much I could do from nearly 2,000 miles away). Did take advantage of the free (if very slow) 'net in the lobby, though. Preloaded a bunch of static web pages so I'd have stuff to read on the flight.
Today was my return travel day. By the time I was booking my flight there was no direct flight from San Diego back to Detroit so I wound up flying home via Minneapolis. Got to SAN without trouble, and though they said they had free wifi services I was unable to get anywhere off their network. Had an early-for-PST lunch of a personal pepperoni pan pizza, which while not the healthiest choice did let me try to get my body back to the Eastern time zone for meals.
For the second flight in a row, the in-seat flight tracker didn't work. This time it seemed to provide valid data, but once the map came up none of the on-screen controls worked, so I couldn't zoom in or out or pull up the actual stats (e.g., ground speed, tail wind, etc.). I managed to get into Minneapolis on time, death-marched from G17 to F10, and grabbed a quick bite to eat on the way. The gate agent gave me a hassle for boarding with a zone-3 pass at the very end of zone 2 with other zone-3 people ahead of me, and then a flight attendant tried to shame me for putting my jacket in the overhead bin since she had said not to (before I was onboard to hear it).
It was bumpy out of MSP and into DTW, but I got off the plane reasonably quickly. By the time I got to baggage claim my bag was already on the carousel. Caught the next shuttle to the other terminal, found my car by 11pm, and was home by 11:40pm. Unpacked, put stuff away, sorted laundry, prepped souvenirs for work, and popped the evening meds before going to bed.