Radio Response's activities following Hurricane Katrina

Jeff R. Allen
[email protected]
February 24, 2006

Copyright 2006, Jeff R. Allen <[email protected]>. Licensed under a Creative Commons License.

This is version 1 of this report. It consists entirely of my opinions and has been reviewed by only one other person who worked on the project. Future versions of this document will incorporate feedback from my colleagues as it arrives.


Executive Summary

Hurricane Katrina devastated the Gulf Coast on 29 August 2005. In the weeks that followed, Radio Response pulled together a team of over 50 volunteers and approximately $30,000 in donated products and services from dozens of vendors to provide Internet connectivity to residents and relief workers in the Hancock County, MS area.

We proved that it is possible to deploy a wireless long-haul IP network in a disaster area. We also proved that it is possible to create a distribution network that can carry either local satellite IP bandwidth or terrestrial IP bandwidth delivered via wireless backhaul. Furthermore, we proved that we could provide the services of a small-scale wireless ISP (site surveys, installations, checkup visits, reliable IP, phone support) in a disaster environment.

We proved that our service was useful and desired, both by the residents of Hancock County, and as a tool to empower fellow aid organizations.

Our experience deploying VoIP was inconclusive. There are tantalizing parts of our experience that support the argument that wireless IP and VoIP can provide much more nimble service than legacy utilities. However, due to failures of technology and management in our project, we deployed VoIP much later than simple HTTP. As a result, VoIP telephone calls were a very small percentage of our bandwidth, and an extraordinarily small amount of the "good" we did in the area.

I found that we were very inefficient in all aspects of the project. There were plenty of reasons for this, from the fact that we were doing this for the first time, through the organization of staff, to the lack of a budget. All of them are solvable, and my overall impression is that a dedicated group that planned ahead for the next disaster could do this job much more effectively than we did.

In my opinion, to be effective the organization would need to:

All of these things cost money. Thus, to be effective, the organization would need a budget. However, it needn't cost too much. The equipment cache can be made up of donated equipment, some of which we already have. The staff member on call would not be a paid position, just someone who has made a commitment to act as a leader during a deployment. The money would mostly be spent on administrative and logistical support of the volunteers themselves. Without a detailed budget it is hard to know, but I suspect it would be possible to operate with $10,000 per year.

History of the Project

Genesis

I arrived after this chapter was complete, so the following is my understanding from talking to people involved. Any errors are due to my failure to get the story right.

The project started when evacuees from New Orleans began arriving in northern Louisiana. Mac Dearman saw them at a church in his town and realized that by simply giving the church a free connection to his existing wireless network and putting a couple of phones on a table, he could help evacuees make contact with loved ones. Within a couple of hours, Mac had done a normal customer install at the church (like any other customer on his network) and installed a few pre-activated VoIP phones he had in stock (Uniden UIP-2000 phones with service from Nuvio). Mac then received requests to hook up other shelters, both inside his network's coverage area and in neighboring towns. He put out the word to the wireless Internet Service Provider (WISP) industry that with donations of equipment and time, he could repeat his success at the first shelter. People and equipment began arriving, and the team went to work.

Press reports archived on the Radio Response website document this work.

Meanwhile, in California, CityTeam Ministries reached out to Inveneo hoping that Inveneo's expertise creating rural telecommunications systems in Southeast Asia and in Africa could be used to bring phones to a project CityTeam was undertaking at the Powerhouse of Deliverance church in Bay St. Louis. Inveneo people (under the banner of a new organization, AidPhone) assembled an Asterisk-based system which made it possible to transparently use donations of long-distance telephone service from multiple providers using only the AidPhone telephones. AidPhone also secured a donation of 1000 analog telephone adapters, and as many analog telephones.

At Mac's farm in Rayville (Northern Louisiana, about 5 hours from New Orleans) the work to connect shelters to Mac's existing network was drawing to a close. The abilities of small northern Louisiana towns to support evacuees were exhausted, and their communication needs had been met by Mac and his volunteers. There was more work to be done, but the team had to find a way to be helpful nearer the heart of Katrina's damage to do it.

At this point (approximately a week after the storm), AidPhone and Mac's team met. AidPhone had a charter to work in Hancock County as a result of their partnership with CityTeam. AidPhone had donated long-distance telephone service and IP telephones, but all the Internet service in Hancock County had been wiped out, so the IP-powered phones were useless. Mac's volunteers had the know-how and equipment to build a long-haul Internet link and a community distribution network to deliver it. Mac's team also had a desire to work in Hancock County, so the match was perfect. Mac and Mark Summer of AidPhone agreed to work as independent groups in partnership to deploy Internet service and AidPhone's phone service in Hancock County.

Hancock County project (and my role)

I joined the project on September 14, 2005, approximately 2 weeks after Katrina made landfall. The story from here on is based on my memory, my journal entries, postings on the Radio Response website, and email archives.

When I arrived, the team had just moved from Rayville to a staging area in Ponchatoula. There, we had a large air conditioned office in front of K&D Truck and Trailer Repair. The rent was paid for several months, courtesy of a donor. The office formed a useful base because Ponchatoula suffered only light wind damage from the storm, and so power and Internet were working reliably. Phones donated by Front Range Internet of Colorado were crucial at this point for coordinating donations and volunteers. We used an Internet connection shared wirelessly with a small business next door.

The first night I was there, I took on the role of back-office sysadmin. That night we renamed the organization, previously best described as "the guys at Mac's farm", to "Radio Response". We made a website and moved to it content that one of our volunteers (Paul Smith) had previously created. We made this effort because we understood early on that many eyes were on us, and we needed to be able to explain concisely who we were and what we were doing. Of course, while brand-new back-office volunteers were doing this, other folks were out in the field, making contacts with government and private organizations. It took some time to spread the word about our new name and website. We handled it as best we could, but it must have been pretty confusing to our contacts to have a group of volunteers change names overnight.

About half of the team was already in Hancock County. At this stage, they were camping at the Powerhouse of Deliverance church. They were living self-contained off of the food and water they brought with them into the disaster area. None of the team had access to an RV (with air conditioning), so the nights were hot and humid. The days were frustrating too, because at this point they were still trying to understand the situation on the ground and make contacts with the authorities. CityTeam's charter and backing were helpful, but nonetheless it simply takes time to meet the right people and gain their trust. Finding our first government liaison made all the difference.

The difference in comfort between Ponchatoula and Hancock County led some of us to commute, coming back to Ponchatoula in the evenings. That was a workable system, but not having the entire team together in the evenings made it difficult for the support folks in Ponchatoula to know what work was needed. At this time, Hancock County was a black hole of communication, at least from the outside in. Cell phones worked passably for calls out of the region, but folks there would get too busy and preoccupied to remember to call back to Ponchatoula with updates.

As the project in Hancock County was getting underway, a request for help in New Orleans came in from Joel Johnson who was working on technology at the site of the Common Ground Collective in Algiers. Joel had managed to use donated hardware to make a computer lab. For Internet access, he used an EVDO card in his Mac iBook, and used the Mac as a NAT box and router. He'd created a second Internet access point using the same technology at the health clinic. The idea was to provide Internet access to the residents of Algiers, so that they could fill out FEMA forms online and get help quicker. With better bandwidth than the EVDO card, Joel also hoped to provide free phone service.

On September 17 (2.5 weeks after the storm) I visited Joel in Algiers to assess the situation there and see how Radio Response could help him. What I found was that DSL was working in the neighborhood, so a long haul link from downtown New Orleans was not required. Further, it seemed unlikely to me that the population in the neighborhood would be well served by a set of Internet labs as we envisioned in Hancock County. This was due to the small remaining population, and to their technological illiteracy (and in some cases, English illiteracy). Making a successful project there would have required a staff of attentive "buddies" to walk people through using the computers, and the right timing with the need to fill out FEMA forms and the return of the evacuees. The Common Ground Relief folks were preoccupied with running the medical clinic, a distribution point, and doing journalism to document the perceived security threats and human rights abuses. It was unrealistic to expect them to have the interest to run the education campaign needed to make a big Internet project successful. It simply didn't all align, and I went back to Ponchatoula having decided not to attempt a second Radio Response project in Algiers. I did commit Radio Response to helping Joel by giving him donated equipment when possible. Later, the NoMesh project came into being and worked on using the existing DSL in the community to make a community mesh network. Radio Response passed some of our donated equipment to NoMesh to help them out.

(Joel also wrote up a document with his lessons learned.)

Most of the people involved in NoMesh had worked for Radio Response on earlier trips. When they were ready to come back for a second work trip, they chose to work in Louisiana instead of Hancock County, Mississippi. Having the flexibility to let people redeploy themselves like this allowed people to make themselves useful in the way that best fit their temperament. I think it was a strength of the self-organization evident in all of the Internet-related groups working in the area.

During this time, the activity at Ponchatoula was support work. Jim Patient and Kevin Cupit were constantly on the phone arranging donations, volunteers, and other logistics. Aleks Clark was working on a back-office system to help us keep inventory and volunteers organized. Then there was the refurbishment lab, operated by Ben Earnhart.

There was a huge shipment of PCs in various states of disrepair. Ben Earnhart of the University of Iowa did a fantastic job of creating a PC refurbishment lab and managing several volunteers to work through the pile, turning unknown (and pretty broken) hardware into working machines with a copy of Windows 98 installed. We ended up using about 10 person-days (the long 16-hour days typical of disaster work) to get about 40 machines prepped for delivery to clients. This huge labor cost later forced me to make a tough policy decision: I started declining donations of PCs unless they had been refurbished and had an OS installed before being shipped. This pushed the refurbishment work to the edge, outside the disaster area. The idea was to make the best use of volunteer labor both inside and outside the disaster area. The policy makes sense when described in those terms, but at least one donor went away very unhappy when I declined his donation. Two other donors were able to accept my conditions and sent 50 more computers ready to go. We passed those computers on to two clients: Morrell Foundation and St. Clare's Catholic Church and School.

For the record, we never committed the time to work the phones and get legal Windows licenses for the machines, though we were pretty sure we could have gotten Microsoft to agree to such a donation. Instead, we used a freely available tool we found on the net that can generate license keys (illegally). We would have preferred to sidestep software licensing entirely by using Free Software, but at the time FEMA's website required IE 6, so we felt we needed to deliver Windows machines regardless of the licensing or management issues. The average machine had less than 64 megs of RAM, so we felt we had to standardize on Windows 98. We did make a few XP machines and tried to place them where administrative work would be happening, as they run OpenOffice better and have dramatically better USB support (for use with digital cameras, flash drives, etc.).

While installing operating systems, we solicited a donation of Deep Freeze from Faronics. We hoped to use it to prevent the machines from being destroyed by spyware, adware, viruses, "helpful" users adding utilities, etc. The initial attempt at downloading and installing it failed, and I never committed the time it took to solve the problem. Faronics offered a direct contact with a tech support guy, but we could not find the time to make use of him. I regret that it turned out this way, because we did notice later that many machines got pretty broken pretty fast. It wasn't nearly as bad as I expected, but it was bad enough that we would have been providing better service to our customers if we'd gotten the Deep Freeze installation done right the first time.

Around September 19, I left Ponchatoula and started working full-time in Hancock County. Some support folks remained, as there was more work to do there (specifically computer refurbishment), but the situation in Hancock County seemed to be progressing, and it was time to bring in more people to augment the backbone team and start doing site installs. That turned out to be an optimistic schedule, but there was worthwhile work to be done in Hancock County anyway, so it made sense to have people down there even as the backbone folks struggled to get things working. By then the Hancock County team had secured us access to the EOC, which gave us both a comfortable base to work from and more resources to work with. We started using our hardware and skills to repair, expand, and otherwise tweak the emerging campus IP network at the EOC; when there are IT people around, people inevitably ask you to fix things! Using a port on the EOC satellite uplink, we built a small network connecting several of the on-site Search and Rescue teams to the Internet. We also helped the community radio station, WQRZ-LP, get their studio phone line working again (using the Tracstar satellite in the Public Affairs Office).

IT in general, and IP networking in particular, was pretty decentralized at the EOC. Groups were expected to arrive self-contained, and little attention was paid to integrating the parts into a whole greater than their sum. I believe this was a minor failing, but it is hard to see how it might have been different; groups come and go quickly, taking their equipment with them. Most groups are too small to bring a dedicated IT person, and those that do (like the military) might not be able to share resources easily, because reasonable policies end up tying people's hands. Because we were completely independent of all the agencies, and had carte blanche from our donors to "do good" with the hardware on hand, we were able to act as an unofficial IT organization on the edges of the EOC. The EOC itself did not take full advantage of our skills, preferring instead to use an outside contractor (NVision Solutions). We kept clear of them in order to avoid starting any political problems.

While all this was going on in my world, the folks building the long-haul link to Gulfport were struggling to meet their self-imposed deadline of "just a day or two". As I was not present for most of that work, I can't tell the story of how those links got built. My understanding from listening at meetings is that there was a delay while MCI attempted to deliver the donated link directly to Waveland via the newest and least understood wireless technology, WiMax. Eventually, Radio Response folks got a chance to do the shot with our own equipment and people familiar with that equipment. We added another tower to the route to keep the RF energy from being lost in the ground; on a long wireless shot, the curvature of the earth plays an important role. One of the keys to success was using equipment that the people on-site were familiar with. The hardware and software that make up these systems are not incredibly complex, but because of the tight profit margins in the industry, the equipment (software and hardware alike) is not of terribly high quality. It is important to know the quirks of the equipment in order to plan and install networks that work. Simply following the manual does not work.
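The curvature effect can be estimated with two standard line-of-sight formulas: the height of the earth's bulge between the two antennas, and the radius of the first Fresnel zone, which should be kept largely clear of obstructions (keeping 0.6 of it clear is a common rule of thumb). The sketch below is a generic planning aid, not the calculation the Radio Response team actually used. The 20 km hop length and 5.8 GHz frequency in the example are hypothetical, since this report does not give the real distances or frequencies of the Gulfport shot, and the sketch ignores trees, buildings, and terrain, which matter just as much in practice.

```python
import math

EARTH_RADIUS_KM = 6371.0
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def earth_bulge_m(d1_km: float, d2_km: float, k: float = 4.0 / 3.0) -> float:
    """Height (m) the earth rises above the straight antenna-to-antenna line
    at a point d1_km from one end and d2_km from the other.
    k is the effective earth-radius factor; 4/3 is the usual standard-atmosphere value."""
    return (d1_km * d2_km * 1000.0) / (2.0 * k * EARTH_RADIUS_KM)


def fresnel_radius_m(d1_km: float, d2_km: float, freq_ghz: float) -> float:
    """Radius (m) of the first Fresnel zone at the same point along the path."""
    wavelength_m = SPEED_OF_LIGHT_M_PER_S / (freq_ghz * 1e9)
    d1_m, d2_m = d1_km * 1000.0, d2_km * 1000.0
    return math.sqrt(wavelength_m * d1_m * d2_m / (d1_m + d2_m))


def midpath_clearance_m(path_km: float, freq_ghz: float, fresnel_fraction: float = 0.6) -> float:
    """Height above flat ground the beam centre needs at mid-path:
    earth bulge plus the chosen fraction of the first Fresnel zone."""
    half = path_km / 2.0
    return earth_bulge_m(half, half) + fresnel_fraction * fresnel_radius_m(half, half, freq_ghz)


if __name__ == "__main__":
    # Hypothetical 20 km hop at 5.8 GHz.
    print(f"bulge   {earth_bulge_m(10, 10):.1f} m")
    print(f"fresnel {fresnel_radius_m(10, 10, 5.8):.1f} m")
    print(f"needed  {midpath_clearance_m(20, 5.8):.1f} m")
```

Because the bulge at mid-path grows with the product of the two distances, halving a hop cuts it to a quarter, which is why splitting a long shot across an extra tower reduces the antenna height each hop needs.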

After several days of work the two-hop link from MCI in Gulfport was up and running (the day was September 21, 3 weeks after the storm and a week and a half after the Radio Response team arrived in Mississippi). Meanwhile, the team had also been preparing a distribution network centered on the Waveland water tower. However, due to dependencies built into the design of the network, we were unable to make progress on customer installations until the long-haul link to Gulfport was done. The problem is that you need functioning customer sites to shake out problems in the distribution network, and our customer sites were originally designed to depend on the DHCP server in Gulfport, on the far side of the unfinished long-haul link. Making far-flung parts of the network depend on each other in this way was obviously a mistake, one of many lessons I learned on the project.
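This kind of dependency can be checked mechanically: model the network as a graph of sites and links, mark which links are actually up, and ask which customer sites can reach the server they depend on. The sketch below does this with a breadth-first search. The site names and topology are hypothetical, loosely modelled on the two-hop Gulfport link and the Waveland water tower; it illustrates the lesson rather than describing a tool the team used.

```python
from collections import deque


def reachable(start, links):
    """Set of nodes reachable from start, following only links marked up.
    links is a list of (node_a, node_b, is_up) tuples; links are bidirectional."""
    adjacency = {}
    for a, b, up in links:
        if up:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def sites_without_service(sites, server, links):
    """Customer sites that cannot reach the server they depend on (e.g. DHCP)."""
    from_server = reachable(server, links)
    return sorted(s for s in sites if s not in from_server)


if __name__ == "__main__":
    # Hypothetical topology: the first long-haul hop is not finished yet.
    links = [
        ("gulfport", "relay_tower", False),
        ("relay_tower", "water_tower", True),
        ("water_tower", "church", True),
        ("water_tower", "kitchen", True),
    ]
    customers = ["church", "kitchen"]
    # DHCP only in Gulfport: every customer site is stranded.
    print(sites_without_service(customers, "gulfport", links))
    # DHCP on the water tower: customer sites can be tested independently.
    print(sites_without_service(customers, "water_tower", links))
```

Running the check once per shared service (DHCP, DNS, the default gateway) shows which parts of the network cannot be shaken out until some distant link is finished.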

By the time the long haul link to Gulfport neared completion, there was incredible pressure to quickly show results. The pressure came partly from inside the group (motivated people simply wanting to see success) but also from our partners, who felt they'd trusted us to keep our word and come through for them, but we were failing to do that. As the new kid on the block, technologically, we had a responsibility to manage expectations, then beat them. With many different people from different backgrounds talking to government and private organizations, it was inevitable that some of them made promises we couldn't keep. That endangered our credibility and added pressure to show results. In an environment where you need the cooperation of many other organizations to get the job done, you cannot risk losing credibility, or you risk losing cooperation.

The result of that pressure as the long-haul link was completed was a very harried day of attempted customer installs. Despite fielding four teams to at least four sites that day, to the best of my knowledge none of the installations were completed. One reason for this frustrating and fruitless day was certainly technical problems. A much bigger contribution was lack of pre-planning, equipment preparation, documentation and team training. Ideally the delay getting the long-haul link ready would have given the part of the team not involved in the long-haul link time to do this preparation work. I did not step forward and lead this work because as a newcomer to this technology, I could not understand what needed to be done. In retrospect, it is easy to see what was missing. The prime thing missing was someone to take leadership who had a background in creating a distribution network and doing customer installs on that network. The people who could have served as those leaders were focused on the long-haul work, and were unavailable.

In the next few days, we finished the customer sites we'd started on that first day, and with that, the network was up and limping. As with all operations projects, the nature of the work changed subtly from "building" to "maintaining" the network. We put time and energy into network monitoring tools (though I regret that it took me several more weeks to start using MRTG to graph traffic). Furthermore, we lost at least one install crew (and sometimes more) to maintenance tasks. The network was very flaky at that time, as we struggled with power problems at backbone and customer sites, mysterious IP address conflicts and ARP timeouts, failures to acquire addresses over DHCP, and a flaky long-haul link. Some folks wanted to move ahead on debugging VoIP problems at that point, but the network was simply not stable enough to justify using our time like that.
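Traffic graphs like the ones MRTG draws come from polling interface byte counters (typically over SNMP) at a fixed interval and turning the difference between two samples into a rate. A minimal sketch of that arithmetic is below, assuming 32-bit counters that wrap around at most once between polls; the sample values are made up. A counter reset (for example, after a radio reboots because of a power problem) looks like a wrap and produces a bogus spike, so a real poller has to treat implausible deltas with suspicion.

```python
def counter_rate_bps(prev_octets: int, curr_octets: int, interval_s: float,
                     counter_bits: int = 32) -> float:
    """Average bits per second between two octet-counter samples.

    Assumes the counter wrapped at most once during the interval; the modulo
    turns a wrapped (smaller) reading back into the true positive delta."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta_octets = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta_octets * 8 / interval_s


if __name__ == "__main__":
    # 37.5 MB transferred during a 5-minute poll interval.
    print(counter_rate_bps(0, 37_500_000, 300))
    # Counter wrapped past 2**32 between the two polls.
    print(counter_rate_bps(4_294_000_000, 1_000_000, 300))
```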

Hurricane Rita threatened the Gulf Coast about this time, and the team scattered. Some members chose to weather the storm at the EOC. Others, who were ready for a weekend anyway, traveled to Ponchatoula, LA and to Pensacola, FL. The team that stayed behind at the EOC got an incredible amount of work done, adding evidence to the theory that a small group of motivated people can accomplish much more than a large group of equally-motivated people!

When we returned after Hurricane Rita made landfall on the Texas/Louisiana state line (September 25, 3.5 weeks after the storm), the network was a little bit bigger, but that addition gave us all kinds of new options. With help from Rescue International, the team from Southern California Wireless, Bob van Zant, and Don Castella made a link from Waveland north to the EOC. In addition to making the network reachable from our lab and sleeping quarters, this also gave the network access to alternative uplink capacity. As our long-haul link continued to flake out, we now had the opportunity to use satellite uplinks at the EOC to serve as a backup link to the net.

The experience of managing a network with one reliable but low-performance link (satellite) and one high-performance but unreliable link was quite frustrating. It was made more difficult by our decision to share our limited satellite bandwidth only with customers at the EOC, and not with those in the city of Waveland. Later, we got exclusive access to satellite links from Cisco and from the EOC, which we were allowed to share outside the EOC. Since then, I've considered what kind of technical solutions I would want available to make it easier to manage a network with multiple paths out to the Internet. I will propose one such system later in this report in the section titled "Bandwidth sharing box".

During that time, those of us who had contact with customer sites were put in a very difficult position. With an incredibly unreliable network, it was hard to know, when visiting a site, whether it would be working at all. And when it was, it was hard to promise anything about performance the next day. The customers were remarkably tolerant of this bad network behavior, but our credibility was certainly hurt by it, and it hurt my morale, and the morale of other volunteers, to be faced time and again with saying the same thing: "yes it's down, no we don't know why, yes we're trying to fix it, no we don't know when it will be fixed". Failover and automatic rerouting seem like luxuries out of reach in a disaster-response network, but they are all the more important there, because every component of a disaster-response network is under more stress than in a normal network. Note: "component" above includes the operations staff!
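One common way to make failover less painful on a link that flaps is hysteresis: declare the fast link down only after several consecutive failed probes, and trust it again only after a longer run of good ones, so traffic is not bounced back and forth every time a single probe is lost. The sketch below combines that with the kind of sharing restriction described above, where only some sites were allowed onto the satellite backup. The thresholds, site names, and the representation of eligible sites as a simple set are assumptions for illustration; this is not what Radio Response actually ran.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class UplinkSelector:
    """Hypothetical failover policy: prefer the fast terrestrial link, fall back
    to satellite only for sites allowed to use it."""
    fail_after: int = 3      # consecutive failed probes before the primary is declared down
    recover_after: int = 5   # consecutive good probes before the primary is trusted again
    primary_up: bool = True
    streak: int = 0          # length of the current run that could trigger a state change

    def record_probe(self, ok: bool) -> None:
        """Feed in the result of one reachability probe across the long-haul link."""
        if self.primary_up:
            # While up, count consecutive failures; any success resets the count.
            self.streak = 0 if ok else self.streak + 1
            if self.streak >= self.fail_after:
                self.primary_up, self.streak = False, 0
        else:
            # While down, count consecutive successes; any failure resets the count.
            self.streak = self.streak + 1 if ok else 0
            if self.streak >= self.recover_after:
                self.primary_up, self.streak = True, 0

    def uplink_for(self, site: str, satellite_sites: Set[str]) -> Optional[str]:
        """Which uplink a site's traffic should use right now (None means no service)."""
        if self.primary_up:
            return "longhaul"
        if site in satellite_sites:
            return "satellite"
        return None


if __name__ == "__main__":
    allowed = {"eoc"}  # hypothetical: only EOC sites may use the satellite
    sel = UplinkSelector()
    for ok in [True, False, False, False, True, True]:
        sel.record_probe(ok)
        print(ok, sel.uplink_for("eoc", allowed), sel.uplink_for("waveland_site", allowed))
```

Making `recover_after` larger than `fail_after` biases the selector toward staying on the reliable backup once the fast link has proven flaky.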

Around the time of Hurricane Rita, we experienced a major change in staffing. Leadership and key personnel who had been in place since the arrival of the group in Hancock County were ready to go home. This turnover started pushing me towards a leadership role, as I had been present as a worker during the building of the network, and would be present for many more weeks while we operated and grew the network. I chose to let the transformation of the team happen on its own, as those folks present were not in need of strong leadership, just an informal coordinating meeting from time to time. Practically speaking, however, starting around September 24, I was the on-site project manager, and I take full responsibility for where the project went (or didn't go) after that point.

During my tenure as the project manager there were two trends: a dwindling crew, and a solidifying network. After Rita, the crew was dominated by IT generalists, with the exception of Don Castella, a very experienced WISP owner from Chicago and Bob van Zant, a wireless ISP installer. Others included Brent Chapman, Raymond MacKay, Corlus Nance, Matt Justice and Sean Head. Sean worked on installations with Bob and Don. Brent and I trained the interns (Corlus and Matt). Finally, Ray, Brent and I worked on stabilizing the network, via documentation, long debugging sessions to understand the current behavior, and by ultimately implementing a network redesign incorporating what we'd learned from trying unsuccessfully to operate the network as originally designed by the first wave of volunteers. The redesign also took into consideration new donated hardware that became available later.

Don provided valuable training to the team on how to run a solid network. His van-of-plenty continued to turn up parts we needed to do professional installations long after any reasonable person would have expected it to be exhausted. Don's generosity with his materials, and his time, made a huge difference at a time when the project was really struggling to deliver a stable network.

Bob van Zant was the remaining backbone guy on the project at that point. Bob's most important task was to fix the flaky link to Gulfport. He tried several things, but the thing that finally made the link stable was switching to 900 MHz Trango gear, which limited the link to 3 megabits. Bob also worked hard on extending the network to Port Bienville, but due to confusion on aiming the equipment, he never got that link up. After Bob had to go home, I climbed the Waveland water tower and the Port Bienville water tower in order to place equipment selected and assembled by Don Castella and me.

Brent led the charge to gather data to justify the redesign, then made it happen. His first week on the project was unwittingly spent on the data gathering stage, as he struggled to make himself useful by doing customer installs and repairing failing things.

One of the other significant events after Hurricane Rita was leaving the EOC at the vocational school's wood shop and moving in with International Aid. This was a significant disruption to the team, but I tried to keep its impact as small as possible. Having an office to work from at International Aid was very valuable to the team. Having our warehouse space reduced from half of the wood shop to our 40' container was difficult, but by packing carefully we made the best of it, using the trailer itself as our warehouse.

With the network redesign out of the way and the network behaving in a predictable and stable way, we were able to start expanding it again. Don expanded the network north to the FEMA camp at the Equestrian Center outside the Kiln. This promised to be a great location, because our team, like many volunteer groups, was housed there. Unfortunately, FEMA elected to close the camp and move the volunteers to NASA Stennis, so our bet on the FEMA camp did not pay off in the long term. We also turned Second Street Elementary into a repeater site, to give us better coverage in Old Town Bay St. Louis. We used Second Street Elementary to extend the network to the Calvary Kitchen, and to CityTeam's community center at the McDonald ballpark ("Field of Dreams"). We hoped to also provide Internet access to the school when it opened, but I didn't follow through on that after I left the project.

During this time we had a failure in the middle of the network, separating it into two pieces. The northern chunk included the FEMA camp, International Aid, and the EOC. The southern chunk was the rest of the network all the way to Gulfport. Because we still had exclusive use of a hot-spare satellite at the EOC, we were able to arrange for the northern part of the network to use it to get out to the Internet. Several days later, Don got some firemen to climb the Waveland water tower for us to repair the problem. Later, on one of my climbs, I inspected the cable that had failed and found damage on it between the ladder and a VHF repeater put in place by the fire department. I met the fire department the day they took the repeater down, and they told me they had been changing batteries every 10 days. That means they'd had several chances to accidentally cut the cable to our equipment over the course of the six weeks it was there. The outage was probably almost inevitable, as we were using indoor-rated cable which was not adequately protected, and the fire guys are not trained how to work around networking equipment without breaking things. I relate this story not to cast blame on the fire fighters, but to point out some lessons to be learned: towers are shared, not everyone is as careful as they should be, cables will be cut and you have to design for it.

Another challenge that presented itself during this time was of a political sort. Students from the Naval Postgraduate School in Monterey, CA had installed a wireless network like ours in the early days. Their network was primarily in support of government and public safety, but it also reached some private feeding centers and distribution points. When their deployment was finished, they left the equipment and went home. Apparently the Postgraduate School wanted their equipment back. Some government agencies (we don't know which ones) in Jackson, MS let a contract to replace the Naval Postgraduate School's network with another wireless network. We were told that the contract gave the contractor exclusive access to the City of Waveland water tower. A sub-contractor of the prime contractor set their sights on us as amateur intruders in the way, and either via ignorance or maliciousness, convinced Waveland's Chief of Police that our equipment was hurting his telephone service (as provided by the Naval Postgraduate School's network). They carted off some of our equipment, and we never got it back.

There is a remote possibility that our equipment was, in fact, conflicting with theirs. Due to some poorly executed plans to interconnect the networks, our network might have been connected to theirs without the knowledge of the entire team. This is why it is critical to have a clear plan for how to handle multiple uplinks; connecting to someone else's public-safety network, then breaking their network, is a very good way to get in big trouble with the authorities.

The Waveland water tower was the center of our network, and having someone else claim exclusive access to it was a huge problem for us. We resisted making any changes to the network for a while, relying on inertia to keep our equipment on the tower; after all, who's going to climb a tower just to take down someone else's equipment? We were emboldened somewhat by the fact that we were at the time providing service to influential aid groups like International Aid and the Calvary Kitchen. The Mayor of Bay St. Louis ate most of his meals at Calvary Kitchen, so we hoped we had some pull with him. The standoff went on for a while until we had a meeting with the contractor to decide how to proceed. Our major concession was that we would stay out of the way of any location where the contractor was being paid to provide Internet. Mac Dearman made a deal with the contractor that Mac felt was in our interest, but shortly after that they stopped returning our calls, making it impossible to proceed with the deal. Mac and I surmise that the contract was being renegotiated, or had fallen through. I eventually broke the deal (of my own accord) and installed a connection at McDonald ballpark, one of the places we were supposed to stay away from. I did this on my last day on the project because I was unwilling to leave the community center at McDonald ballpark off the net due to political problems. When I told Mac what I'd done, he agreed it was a good idea.

The remainder of my time in Hancock County was spent on routine visits to the customer sites to check on them, make small repairs, extend customer networks, and so on. During this time I was preparing to hand the network off to the interns. Corlus Nance could not put in enough hours to get fully trained because of schedule conflicts with another job, so it was up to Matt Justice to learn all he could to maintain and grow the network in our absence. Don left a few days before I did. Brad Jackson made a return visit and worked several days during the final hand off to Matt. I left October 28. Matt's first day running the network was October 29. In the meantime, Brad repaired the Second Street Elementary site, which had been disassembled by the renovation workers.

At the time of the hand off, Mac Dearman visited and took custody of the equipment in the trailer. I had planned to deliver him an inventory, but Mac, his wife Sharon, and Brad packed the trailer before I had a chance to do the inventory. I'd finished a renumbering of the network, making it much easier to understand. I documented the state of the network in a series of tables, and got help from Brent to update the network diagrams.

Hancock County after the Hand Off

Matt Justice documented his work exceptionally well on the Radio Response website. His first entry was the 10/29 update. Later updates tell the story of Matt's maintenance work and his work on new connections. They also mention all the help he got from Bruce Barton of Rescue International, who had helped us all along. After I left, Bruce became especially helpful as a second mentor for Matt.

The future

The future of the network is uncertain. Matt's commitment to the project expired in mid-December, when his semester ended. He's enjoyed the work, and will probably come down to Hancock County a few more times, but he will not have a regular schedule in January.

Some customer sites are disappearing. The FEMA camp closed in late October (but our network still reaches it). According to Matt, International Aid's last day was December 8. The New Waveland Cafe left the day after Thanksgiving. However, other sites will be there for the foreseeable future: the Davis store, Second Street Elementary, the CityTeam community center at McDonald ball field, and the Morrell Foundation's iCare Village.

One idea people discussed was forming a locally operated non-profit organization to take over the network. The social support networks of the county are in tatters right now. While it might have been possible before the storm to find an interested board of directors, funding, and so on, in my opinion it is not possible now. However, we did not make careful inquiries to see what folks in the community thought about this option. The high school next to the vocational school apparently has a computer teacher who might be a good resource. The community college in Bay St. Louis, which was destroyed, might be another place to look for people interested in maintaining the network.

Another idea is to select a turn-off date, and schedule a work-weekend of local team members (Brad, Mac, Sharon, Matt) to collect all the gear. Once the gear is collected, it could be used as the beginning of an equipment cache. A cache such as this is described later in this report as part of my vision for a more successful deployment. There would be a moderate cost associated with the cache, for a storage unit, or for the one-time purchase of a shed to be placed on Mac's property.

As of now (January 2006) I do not know what will become of the network. Fundamentally, determining its future falls on the shoulders of Mac Dearman, the founder of Radio Response.

Achievements

In this section, I discuss the customers we reached, the applications they used, and the network that made it possible. The first two are measures of our achievement against our goal to "be helpful". The third is a measure of our technical achievement given the situation we were faced with.

Customers

Because of the fluid nature of the network, there may be customers who I forgot (or who I never even knew we had). This list is current as of December, 2005.

How the Network was Used

Timing affects usage patterns

One thing we learned is that building a long-haul network takes time, lots of time. The incredibly quick successes Mac's team had in northern Louisiana were due in part to Mac's network already existing; the hard work was done long before the need arose to bring the churches online. In Hancock County, Radio Response was called on to build both a long-haul and a distribution network from scratch. That took time, and the delay affected the way our network ended up being used. For instance, one much-touted application of Internet technology to Katrina, finding loved ones using services like KatrinaList.net, was already a solved problem by the time our Internet access was available; people had already found their loved ones some other way. We rarely witnessed people filling out FEMA forms online using our computers. They had already done their FEMA paperwork at the FEMA field station just 100 feet away from our Internet lab. The single biggest use of the net was for ordinary "I'm hanging in there" email. One lady sat down and said, "Thank god! Now I can pay my bills!" Another young woman sat down and started looking for a new job, as the day care center she'd worked at before the storm was now closed. Finally, out at the Davis Store, I heard that they found an email address for the state unemployment insurance office that allowed them to get a question answered even though the phone lines were jammed.

Relief worker usage

It would not be a stretch to say that an equal amount of usage came from disaster relief workers themselves. First, organizations used our Internet connection to communicate with their home base, making them more effective than they would have been with cell phones only. Second, individuals used the Internet connection to explain what they were experiencing to friends back home. They sent out email to worried parents and posted to blogs. Sharing their experiences like this helped attract more volunteers and resources to get the job done. In fact, the Radio Response blog contributed to getting extra waves of volunteers that we might not have gotten otherwise. I've also seen spikes on the traffic graphs that lead me to believe large files, probably videos, are being uploaded from the network. For a sample of the kind of video being published from the Katrina damage area, search Google Video for Waveland. I don't know for sure, but it's possible some of those were published over our network.

Storm-specific uses

Out on the Internet, a huge amount of attention went to the problem of moving health and welfare messages. Frustrated engineers and other Internet users trapped by circumstance in their hometowns channeled their desire to help into systems like Katrina List. Google and other companies aggregated the data into public search engines like Google's hurricane-specific people-search page.

However, our network was up and running too late to be of any use for posting information like this. By the time our system was in the hands of residents, they had already found some other way to report their status. It is likely that users of our network were searching public databases for the status of other people, but they were not actively filing health and welfare reports over our network.

Another storm-related Internet use that had a significant amount of attention was applying for government assistance over the Internet. By observation, it seems our network was not used much for this. In all my site visits, I never saw anyone doing a FEMA application online. I don't know if SBA loan applications were possible online, but it's a moot point because the best place in the county to do SBA paperwork was at the Small Business Recovery Center created by SBA and the Chamber of Commerce at the Coast Electric conference center. There, they had public fax service and satellite Internet. I never got a chance to visit, but my understanding from radio interviews with SBA folks was that there were enough counselors on hand there to personally help each applicant. Likewise, FEMA set up a processing center in the K-Mart parking lot. With that kind of support, it's no surprise that people were not using the Internet to apply. They should have anyway; I heard a report in Algiers that well educated residents with easy access to the Internet out of the area got FEMA financial assistance within days of filing an Internet claim, while their poorer, less educated neighbors waited in Algiers for FEMA representatives to come door to door. It stands to reason that getting your application submitted as soon as possible via the Internet would be the best strategy.

I suspect there's also a self-selection effect at work. My impression of what websites people were using was only from observation of our computers. Laptop users could be expected to be more comfortable with doing financial tasks online. So they might have been using the FEMA website and I just didn't ever see anyone doing so.

As Rita came ashore, time and again I'd find people using our computers to track its progress. Access to television was severely restricted by the living situation of most residents. Radio was widely available to those who had cars, and to those who remembered to pick up a hand-held radio at one of the distribution points. Because our computers were at feeding centers, it was easy to drop by after a meal to check on the status of the approaching storm.

Internet Lab Use

The public computers were used the same way public computers anywhere (libraries, Internet cafes, hotels) are used. The most common use was web-based e-mail (Hotmail, Yahoo, Gmail, and so on). Though we did not supply the computers with IM clients installed (a simple oversight, not a policy decision), most computers sprouted IM clients immediately since we allowed users to install their own software.

One problem sometimes encountered with public Internet terminals is people viewing objectionable material. We had no reports of this problem, though some of our customers were worried about it when we brought the computers in. One reason is that we were always careful to place the computers so that the screens were visible to the public. With no privacy in which to indulge in bad behavior, people don't. I learned this trick of arranging the computers while working in an Internet cafe in Guatemala.

Here are some other things people told me they were using the computer for:

Telephone use

A loudly touted application of the network was supposed to be the rapid restoration of telephone service using Voice over IP (VoIP) technology. In reality, VoIP arrived on our network later than e-mail and HTTP access. The reason is that the e-mail and HTTP protocols were developed at a time when connectivity in the Internet was much slower and lower quality than it is today. The protocols have built into them (either explicitly or implicitly) an assumption that the underlying network will be slow and unreliable, and as a result they degrade gracefully on such a network. VoIP, in contrast, only came about in the last 5 years or so, and has mostly been developed in an environment of cheap, high-speed, high-quality (low packet loss and low jitter) networks. There are legitimate reasons why VoIP is engineered as it is, but the bottom line is that we were not able to deploy VoIP until we were able to deliver a very high-quality network. This meant that VoIP came much later than other applications, so late that alternative means of making voice contact were already in wide use. My observation is that in our deployment in Hancock County, VoIP was an order of magnitude less useful to the citizens than simple HTTP access.
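The "low jitter" requirement above can be made concrete. SIP-based VoIP generally carries voice over RTP, and RFC 3550 defines a running interarrival-jitter estimate for RTP streams. The Python sketch below implements that estimator. It uses seconds instead of RTP timestamp units for readability, and the packet timings are made-up numbers for illustration, not measurements from our network.

```python
def interarrival_jitter(send_times, recv_times):
    """Running interarrival jitter estimate, following RFC 3550 section 6.4.1.

    send_times and recv_times are per-packet timestamps in the same units
    (seconds here), listed in sending order.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D(i-1, i): change in one-way transit time between consecutive packets
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with a gain of 1/16, as the RFC specifies
        jitter += (abs(d) - jitter) / 16
    return jitter


# Hypothetical 20 ms voice frames over two paths.
send = [i * 0.020 for i in range(6)]
clean = [t + 0.050 for t in send]  # constant 50 ms delay
jittery = [t + d for t, d in zip(send, [0.050, 0.090, 0.040, 0.120, 0.060, 0.100])]

print(f"clean path:   {interarrival_jitter(send, clean) * 1000:.3f} ms")
print(f"jittery path: {interarrival_jitter(send, jittery) * 1000:.3f} ms")
```

Variable delay like that on the second path barely matters to an HTTP download. A voice endpoint, on the other hand, must either cover it with a larger jitter buffer, which adds latency, or discard packets that arrive too late to play.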

Cell phone and public telephone service was widely available and reliable by the time we were able to provide VoIP-based telephone service. This meant that our telephone service was considered by most customers as a nice touch, but of secondary importance to the public Internet services.

The one exception was the Davis store. This location was several miles outside of town, and the BellSouth public telephone banks were in town. Because it was a relatively poor neighborhood, cell phone penetration was very low, and people were unable to get new cars to replace those destroyed by the flood. As a result, the VoIP telephone at the Davis Store was the only telephone within walking distance for about 300 people in the neighborhood.

Nonetheless, a user at the Davis store told me that he was getting busy signals from the Mississippi State Department of Employment. He sent an email and got a reply within a day. This is further proof that the store-and-forward technology of the traditional Internet beats the newer VoIP technology unless the communication must involve voice and must happen in real time; virtually the only task that seemed to require VoIP was letting grandparents hear their grandchildren's voices!

Open Network, Unknown Users and Uses

As we operated the network in a maximally open manner, I'm certain we had customers and offered services that we never even knew about. It would have been possible for unseen laptop users to join the network; it would have even been possible for someone to set up a wireless bridge from one of our customer sites to their own network.

As an aside, there were many more laptops present than I expected for a rural community recovering from a category 5 hurricane. On reflection I believe it is because a laptop is easy to move. I surmise that many people packed the laptop when they evacuated, and brought it back with them when they returned. A number of people commented to me, "I used to have a desktop computer like this, but it got destroyed in the storm."

The ability to fax was a common request. We were unable to offer analog fax machines with our VoIP configuration, but it seems likely that enterprising users with laptops were able to use digital cameras or scanners and Internet fax software to make their own fax service. Huge outbound bandwidth spikes from time to time imply that people were publishing video from our network out to public hosting services, for instance Google Video. I never witnessed applications like this on our network, but the beauty of the Internet is that Radio Response did not need to plan for all the possible applications. By providing simple IP (even IP NAT'ed behind two routers) people could use the network for what they needed, when they needed it.

Network Design

The network went through two distinct phases, with two very different designs. Each was an achievement, as was the transformation from one to the other.

Initial Design

The initial network design was driven largely by the requirements of the initial VoIP hardware we had available for the project. The design also had a certain KISS (Keep It Simple, Stupid) aspect to it. In retrospect, some simplifying features of the network were untenable. Another limitation on the network design was the paucity of customer-edge equipment available to us.

Designing a network around the needs of one particular application is widely agreed to be bad practice, but because the team members arrived on scene with a vision of the project that focused on telephone service (to the exclusion of more traditional protocols like HTTP), it was somewhat inevitable. Add to the focus on VoIP the difficult requirements presented by the VoIP hardware (DHCP virtually required, and only one layer of NAT allowed), and the design was basically set in stone before any alternatives could be considered.

The design called for a flat network using Ethernet bridging from the Cisco in Gulfport all the way out to the farthest customer device (PC or VoIP telephone adapter). The Cisco was to be configured to be the only DHCP server on the network, and also to provide NAT services for the network. The network was 10.10.0.0/19. The DHCP range was originally slated to be the entire range, less the router IP address, 10.10.0.1. Radios were to be given addresses from 192.168.0.0/24, with no provision for packets to be routed between 10.10.0.0/19 and 192.168.0.0/24 as a "security measure". This set us up for a situation where routine diagnosis was impossible from behind NAT boxes, and network management tools would need to be dual-homed to monitor the entire network.
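The address plan just described can be checked with Python's standard `ipaddress` module. This sketch uses only the networks named above. It shows the size of the DHCP pool and that the customer and radio networks are disjoint, which is why a host with no route between them could only reach a radio from an interface on 192.168.0.0/24. The two example hosts at the end are hypothetical.

```python
import ipaddress

# First-design address plan, using the values given in the text.
customer_net = ipaddress.ip_network("10.10.0.0/19")
router_ip = ipaddress.ip_address("10.10.0.1")
radio_net = ipaddress.ip_network("192.168.0.0/24")

# DHCP pool: every usable host address in the /19 except the router's.
dhcp_pool = [h for h in customer_net.hosts() if h != router_ip]

print(customer_net.num_addresses, len(dhcp_pool))
print("overlap:", customer_net.overlaps(radio_net))


def on_link(interfaces, target):
    """True if target falls inside the network of one of the host's interfaces.

    With no route between the two nets, this is the only way a host can
    reach a radio's management address.
    """
    addr = ipaddress.ip_address(target)
    return any(addr in ipaddress.ip_interface(i).network for i in interfaces)


# Hypothetical hosts: a customer PC, and a dual-homed monitoring box.
print(on_link(["10.10.3.7/19"], "192.168.0.20"))
print(on_link(["10.10.3.7/19", "192.168.0.5/24"], "192.168.0.20"))
```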

An IP allocation plan in the 192.168.0.0/24 net block fairly quickly proved not to scale as the network grew, so that eventually the addresses in 192.168.0.0/24 were hopelessly scrambled, requiring that workers refer to an up-to-date network diagram in order to have any hope of understanding the network.

The first problem we found with this design was that there were no static addresses available for customer equipment that needed to be statically assigned, like print servers. We requested a range of static addresses from the router administrator to solve this problem.
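One way to carve out such a static range is to make the split explicit, so that static assignments and the DHCP pool can never overlap. In the sketch below, the choice of the first /24 as the reserved block is a hypothetical example. The actual range the router administrator gave us is not recorded here.

```python
import ipaddress

net = ipaddress.ip_network("10.10.0.0/19")
router = ipaddress.ip_address("10.10.0.1")

# Hypothetical reservation: hold the first /24 back for static devices
# such as print servers; DHCP hands out everything above it.
static_block = ipaddress.ip_network("10.10.0.0/24")

hosts = list(net.hosts())
static_addrs = [h for h in hosts if h in static_block and h != router]
dhcp_pool = [h for h in hosts if h not in static_block]

print(f"static: {static_addrs[0]} - {static_addrs[-1]} ({len(static_addrs)})")
print(f"dhcp:   {dhcp_pool[0]} - {dhcp_pool[-1]} ({len(dhcp_pool)})")
```

On Cisco IOS, a reservation like this is normally expressed with an `ip dhcp excluded-address` range. The exact configuration of the Gulfport router is not documented here.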

The next problem we quickly saw was that DHCP was flaky or outright broken at customer sites. The problem seemed to be that broadcast traffic was being blocked in various parts of the network. Because none of our customer premises equipment supported DHCP relay, we were counting on broadcast working right from end to end in the network. It didn't, but we never figured out why, exactly. The Trango firmware supports features related to clamping DHCP, as do the Nortel switches that were donated to us, and which we used at every customer site. We tried to disable all broadcast blocking, but it's clear we were not successful.
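To see why blocked broadcasts break DHCP, consider what a client sends first. It has no address yet, so its DHCPDISCOVER goes out as a limited broadcast from 0.0.0.0:68 to 255.255.255.255:67. Routers do not forward that, and any bridge or switch that clamps broadcasts silently drops it. A DHCP relay agent would instead forward it by unicast and put its own address in the `giaddr` field. The sketch below only builds the message bytes (RFC 2131 layout) and does not send anything. The MAC address and transaction ID in the test are arbitrary.

```python
import os
import struct
from typing import Optional


def build_dhcp_discover(mac: bytes, xid: Optional[int] = None) -> bytes:
    """Build a minimal DHCPDISCOVER message using the RFC 2131 field layout."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    if xid is None:
        xid = int.from_bytes(os.urandom(4), "big")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,          # op: BOOTREQUEST
        1,          # htype: Ethernet
        6,          # hlen: MAC address length
        0,          # hops: incremented by relay agents
        xid,        # transaction ID chosen by the client
        0,          # secs
        0x8000,     # flags: ask the server to broadcast its reply
        bytes(4),   # ciaddr: client has no address yet
        bytes(4),   # yiaddr
        bytes(4),   # siaddr
        bytes(4),   # giaddr: filled in only by a DHCP relay agent
        mac,        # chaddr (struct pads it to 16 bytes with zeros)
        bytes(64),  # sname
        bytes(128), # file
    )
    magic_cookie = bytes([99, 130, 83, 99])
    options = bytes([53, 1, 1])  # option 53 (message type) = 1 (DISCOVER)
    options += bytes([255])      # end option
    return header + magic_cookie + options
```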

Brent and I saw, but did not successfully diagnose, situations where two computers on one physical LAN behaved differently. For instance, one could ping the router in Gulfport, and the other could not. When dumping the ARP table on the router, the MAC address of the broken machine would appear to have been proxy ARP'd by some other part of the network. Fundamentally, our network design depended upon wide-area ARP working correctly, and in our network broadcast packets were not reliably being passed, so ARP was not reliable.
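Every ARP resolution begins with a frame sent to the Ethernet broadcast address. A flat bridged network therefore only works if that frame reaches the target across every radio and switch in between. If a device answers on the target's behalf (proxy ARP), the router learns that device's MAC instead of the real one, which matches the symptom described above. The sketch below builds an ARP request frame per RFC 826. The addresses in the test are hypothetical.

```python
import ipaddress
import struct

BROADCAST_MAC = b"\xff" * 6


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP "who-has" request (RFC 826).

    The result is 42 bytes; the network card pads it to the 60-byte
    Ethernet minimum before transmission.
    """
    # Ethernet header: broadcast destination, our source, EtherType 0x0806 (ARP)
    eth_header = struct.pack("!6s6sH", BROADCAST_MAC, sender_mac, 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,         # hardware type: Ethernet
        0x0800,    # protocol type: IPv4
        6,         # hardware address length
        4,         # protocol address length
        1,         # operation: request
        sender_mac,
        ipaddress.IPv4Address(sender_ip).packed,
        bytes(6),  # target MAC is unknown, so it is zeroed
        ipaddress.IPv4Address(target_ip).packed,
    )
    return eth_header + arp
```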

IP address conflicts on both the 10.10.0.0/19 and 192.168.0.0/24 network happened a few times because we did not have reliable record-keeping mechanisms.
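Even minimal record-keeping can catch conflicts like these before they reach the field, as long as there is one authoritative list that is checked mechanically. This sketch assumes the inventory can be exported as (device, address) pairs. The device names and addresses are hypothetical.

```python
import ipaddress
from collections import defaultdict


def find_conflicts(inventory):
    """Return {ip: [device, ...]} for each address assigned to more than one device.

    inventory is an iterable of (device_name, ip_string) pairs.
    """
    by_ip = defaultdict(list)
    for device, ip in inventory:
        # Parse the address so stray whitespace or formatting doesn't hide a duplicate
        by_ip[ipaddress.ip_address(ip.strip())].append(device)
    return {ip: devs for ip, devs in by_ip.items() if len(devs) > 1}


# Hypothetical inventory rows; device names are illustrative only.
inventory = [
    ("site-a-radio", "192.168.0.10"),
    ("site-b-radio", "192.168.0.21"),
    ("site-c-radio", "192.168.0.21 "),
    ("site-a-printer", "10.10.0.40"),
]
for ip, devices in find_conflicts(inventory).items():
    print(f"{ip} is claimed by {', '.join(devices)}")
```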

As a result of all these problems, combined with flakiness on the long-haul link to Gulfport, our network was exceptionally unreliable. Worse than simply being broken, it behaved erratically; sometimes things worked, encouraging customers to keep trying, then discouraging them when things failed again. It was a very frustrating network to work on, in part because I watched it get built, and although I had no better idea to offer the team, I had half expected the design to have these kinds of problems.

The second design

Several things worked together to make a second design possible.

First, it was necessary. The network was so unmanageable, we simply had to do something. We were unable to grow the network while chasing after bugs, and our customers were losing patience with us, going so far as asking us how they could order satellite connections to replace our failing connection.

Second, the natural turnover of staff brought people into the project (specifically Brent Chapman) with new energy, experience in situations like ours, and no history on the project. The turnover also sent home the people with a vested interest in the first design. Brent had nothing to lose by proposing and implementing a new design. Furthermore, I used my tenure on the project to give me the authority to make decisions on behalf of the project. I encouraged Brent to fix the network, and promised him I'd run whatever interference was needed so that he'd not have any political pressure on him for doing so. Some would argue that with such a small project, for such a good cause, there should be no pride or politics to overcome. To those people, I'd respectfully request they remove and discard their rosy spectacles. People are people, and people under pressure behave even less reasonably than you'd normally expect.

Another thing making a redesign possible was our realization that VoIP was not the killer application of our network; HTTP was. We could see this in the customers' willingness to brave our flaky network to get their email, simply hammering on the reload button when things didn't work right. All this time, we had been unable to devote any time to debugging the broken VoIP phones we first deployed, yet customers weren't complaining about the broken phones; they wanted reliable HTTP, not VoIP.

The last thing that figured into the redesign was that we received 10 out of an eventual 50 Linksys ATAs: consumer-grade broadband (Ethernet-to-Ethernet) routers with built-in analog telephone adapters. These gave us a cache of equipment that could be configured identically. Together with the Trango SUs we were already using, they let us create a standardized demarcation between the core network and the customer networks. Each Linksys provided a local DHCP server. To get that benefit we had to add a second layer of NAT, but that made customer sites easier to understand, because they could always be configured with the same subnet, which made training technicians easier. The second layer of NAT did not affect the VoIP implementation, as the ATA itself held the external address on the routers (though we later proved that the Linksys ATA can also work fairly reliably behind multiple layers of NAT using NAT keep-alive messages).
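The reason identical customer subnets do not collide can be shown with a toy model of two stacked port-translating NATs. Each site's router hides its LAN behind its own backbone address, and the core router hides the whole backbone behind one outside address. The model below is a simplification, not a model of the Linksys or Cisco firmware. The LAN subnet 192.168.15.0/24, the backbone addresses, and the outside address 203.0.113.10 (a documentation range) are all hypothetical. Real NATs also expire idle mappings, which is why the keep-alive messages mentioned above matter. This toy never expires anything.

```python
import itertools


class Nat:
    """Toy port-translating NAT: maps (inside_ip, inside_port) to an outside port."""

    def __init__(self, outside_ip, first_port=40000):
        self.outside_ip = outside_ip
        self._ports = itertools.count(first_port)
        self._out = {}   # (inside_ip, inside_port) -> outside_port
        self._back = {}  # outside_port -> (inside_ip, inside_port)

    def outbound(self, src):
        """Rewrite an outbound source (ip, port) to this NAT's outside address."""
        if src not in self._out:
            port = next(self._ports)
            self._out[src] = port
            self._back[port] = src
        return (self.outside_ip, self._out[src])

    def inbound(self, dst):
        """Map a reply sent to (outside_ip, port) back to the inside host, or None."""
        ip, port = dst
        if ip != self.outside_ip or port not in self._back:
            return None  # no mapping: a real NAT would drop the packet
        return self._back[port]


# Two customer sites configured identically: same LAN subnet, same PC address.
site_a = Nat("10.10.2.11")
site_b = Nat("10.10.2.12")
core = Nat("203.0.113.10", first_port=50000)

pc = ("192.168.15.100", 5060)
a_out = core.outbound(site_a.outbound(pc))
b_out = core.outbound(site_b.outbound(pc))
print(a_out, b_out)

# A reply to site A's flow is translated back through both layers.
print(site_a.inbound(core.inbound(a_out)))
```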

The final design is documented by Brent on the team wiki. The key to the new network design was to get rid of DHCP on the backbone, and carefully guard access to the backbone. Together with a new numbering scheme I implemented after Brent left, the network took on a stable form that others have been able to maintain after the team that deployed it left, an attribute that the first incarnation didn't have.
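The actual numbering scheme is not described in this report. The sketch below is a hypothetical example of the general idea behind a scheme like it: every static backbone address is derived from a site number and a device role, so a technician can work out an address without consulting a diagram. That is the opposite of the scrambled 192.168.0.0/24 plan described earlier. The /19 block, the /27 per site, and the role offsets are illustrative assumptions only.

```python
import ipaddress

# Hypothetical plan for illustration only; not the scheme actually deployed.
BACKBONE = ipaddress.ip_network("10.10.0.0/19")
SITE_PREFIX = 27                                   # one /27 (32 addresses) per site
SITE_SIZE = 2 ** (32 - SITE_PREFIX)
MAX_SITES = 2 ** (SITE_PREFIX - BACKBONE.prefixlen)
ROLE_OFFSET = {"ap": 1, "su": 2, "router_wan": 3}  # fixed host number per role


def site_subnet(site):
    """Subnet reserved for a numbered site (site 0 could be the core)."""
    if not 0 <= site < MAX_SITES:
        raise ValueError(f"site {site} does not fit in {BACKBONE}")
    base = BACKBONE.network_address + site * SITE_SIZE
    return ipaddress.ip_network(f"{base}/{SITE_PREFIX}")


def device_address(site, role):
    """Static backbone address for a device, derived from its site and role."""
    return site_subnet(site).network_address + ROLE_OFFSET[role]


for role in ROLE_OFFSET:
    print(5, role, device_address(5, role))
```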

I believe this design could be used for a pre-staged network in order to reduce the amount of configuration (and therefore, time and expertise) needed in the field. I will propose such a network later in this report.

Lessons learned

In the following sections, I put into writing lessons I learned while working on the project. They are in an order that makes sense to me, but practically speaking they all basically stand alone.

Applicability of our approach

In Hancock County, access to the Internet was needed and appreciated. Telephone service was available for affected citizens, but it was not convenient to those without cars. As a result, our VoIP service had value, but was far from the only way people could communicate via voice. Cell phone service was reliable within a few weeks, and cell phone vendors set up tents to sell new service to those who were living without landline service for the first time in their lives.

Aid agencies, both private and government, take cues from corporations on how to conduct themselves. Both FEMA and Red Cross depended to a huge extent on telephone service working. Their behavior in this regard was strange, as it seemed to disregard the reality that close to 100% of the victims from Hancock County were without reliable personal telephone service. Signs popped up on shared telephones urging the lucky few who got through to the severely overloaded FEMA or Red Cross call centers to keep the agent on the line and hand the phone off to other citizens. Since aid agencies evidently prefer to receive requests via telephone, groups like ours that seek to provide telephone service will always be welcome. It might make more sense, however, to cut the dependency on groups like ours, and simply offer the services in the field, without hiding behind a call center.

Our approach to deploying IP needs refinement. Our approach was to build a long-haul network from Gulfport to Waveland, then build a distribution network inside of Hancock County. Because the same experts who were building the long-haul network were needed to make progress on the distribution network, the two ended up being deployed in sequence. It would have been better to concentrate on distributing locally available satellite bandwidth first, then finish the terrestrial long-haul network and switch over to the higher quality terrestrial network later. The easy availability of satellite uplinks in the disaster area surprised us, making our approach of "deliver IP into the region, then distribute it" the wrong approach.

A free wireless ISP (and telco) in the middle of a disaster is useful for private organizations that are telco-challenged. Rich, well-prepared organizations bring a van with a DirectWay satellite unit on top. Organizations with connections at the EOC can get the incumbent carrier to expedite landline phone service restoration for them. But the majority of small teams, even from rich organizations, benefited from having experts take care of networking, so that they could concentrate on what they are good at. For example, we provided telephones and Internet to a team from Pfizer that was distributing drugs to local clinics. We also provided Internet service to a Navy Seabee base (probably for morale-related use, not operational use).

It is unclear how many of these observations are applicable, as they come out of Hurricane Katrina, whose scale was so huge. It could be that lessons taken from Katrina won't be useful for the next 20 years' worth of hurricanes. I was too busy with Katrina-related work to watch carefully after Rita and Wilma to see what the needs were. There were calls from people outside the disaster areas for us to go to those hurricanes too, but one thing I learned by working inside a disaster area is to ignore the people outside, and only believe the reality on the ground. Without having visited east Texas or Florida, I don't know what the needs really were.

Applicability of this technology

The technology we used (Trango Broadband long-haul and distribution equipment, outdoor 802.11b equipment, and consumer-grade home networking equipment) was appropriate for the job, but it did present some problems.

The quality of the engineering of the software (and to a lesser extent, hardware) is very low in these types of devices. Software bugs are very common, and unless you are using a particular "blessed" version of the firmware, behavior is far from predictable. Because most people on the project were not familiar with the devices (thus knowing the features to avoid, and the blessed version numbers), it was very hard to tell the difference between mistakes we were making, software bugs, and hardware failures. This was not a theoretical problem; we saw all three types of problems, sometimes two at a time (making them exponentially more difficult to debug).

Software quality problems and all, our technical approach was still the right one to take. Because these are simple, cheap devices meant to be integrated by relatively inexperienced network engineers (or in some cases, completely untrained home users), they are easy to use in an environment with lots of people of differing backgrounds. And because they are loosely coupled, they still work right when other things aren't working. A closed system that depends on a proprietary configuration server would be dead in the water when the configuration server lost power, for instance (a common occurrence in a disaster area).

Planning

Though it is difficult to remember in the heat of battle, "A good plan today is better than a great plan tomorrow." This mantra lets you make progress today, but it builds problems into the network that will stack up and bite you later. So you have to learn that when you are operating day by day on what could charitably be termed a "good plan", you must schedule time later for rework, to incorporate the unknowns the "good plan" glossed over. This is true in all network design, I think, but it is a bigger deal when the cycle time is so short; a network built last week might be ready for significant rework this week.

This is a common problem in the emergency management context. Normal management skills and techniques are not useful during the period when it is impossible to plan more than a day ahead. Leaders who are successful in this environment are grown, not trained. Thus it is important to have continuity in an organization. Holding at least one drill before hurricane season, organized and led by the person who is committed to leading an actual response, would be the ideal way to grow such a leader inside our group.

Information Management

Much has been said about the "fog of war" and the "chaos of disaster areas". It's true, all of it. And yet, it is manageable. Experienced agencies know how to make a dent in the problem, but with weak technology backgrounds, they may not be getting as far as an emergency management mindset combined with good technology could take them. People attracted to the Radio Response project were familiar with tools to manage information, but didn't know how to work in a disaster context. There is definitely room for improvement.

Below I identify some of the lessons we learned about getting and using information. I tie it together at the end with a proposal for how I'd do it differently next time.

Getting Intel

It is unrealistic to wait around for someone to tell you what to do. The authorities don't know any better than you what needs to be done. If you expect to get direction, or even accurate intelligence, from the authorities, you'll be disappointed. It's not that no one knows the answers to the questions, just that the people you have access to don't know them, and don't know how to find the data before it becomes stale. There was a daily coordination meeting in the morning, but we were not invited to listen in. Our government liaison, Bill McCusker, shared what he could from these meetings, but it wasn't very helpful.

The authorities bring Internet access with them to support themselves. Their priorities are understandably on other aspects of the relief effort, so they are not very interested in a project like ours to bring Internet to citizens. That doesn't mean that our project is unappreciated; it's just that the limited capacity of the county emergency managers did not give them the luxury of giving us detailed briefings, etc. To those who argue that this is temporary, and that eventually emergency managers will see Internet access as a necessity, I disagree. The priorities are transport (without which you can't move resources to solve any of the other problems), then communications, then survival commodities like water and (later) food. Communications is a very high priority, but the needs are met with a small set of linked VHF repeaters and standalone satellite connections, not with an Internet distribution network.

So if the authorities don't tell you where to go and what to do, how do you find out? A huge amount of it comes from chance encounters, and these are facilitated by driving around and talking to people. It seems incredible, but it works. Like-minded people sort each other out. Radio Response might have been using this "network" in the early days in Hancock County, but I don't know; I wasn't there, and information dissemination inside the organization was too spotty for me to know what information others were gathering.

Another reason you need to gather information outside of official channels is that the authorities don't know everything that is happening, and can't. American society is broken into many classes and divisions, and while soft-focus human interest stories try to tell you that disasters bring us together, the opposite is in some ways true. Traditionally disenfranchised communities get forgotten by the authorities (not out of malice, but because the authorities are overwhelmed and the disenfranchised community doesn't have the contacts to be heard). Fear of racism might keep blacks from seeking help from a city government they have long perceived as white dominated. Some groups help themselves, and do not ask for outside help. Some even end up rejecting outside help due to conflicts with authorities. People with legal judgments or arrest warrants outstanding will refuse to come to official aid stations. As a non-partisan aid group, we had a responsibility to reach out to all kinds of groups, not just the ones the Hancock County EOC knew about.

One way we found new customers was the same as in a commercial WISP: word of mouth. A volunteer with International Aid liked our service and recommended us to the Morrell Foundation. Likewise, a church pastor visited International Aid to place an order for his distribution point and we set up an appointment with him for a site survey on the spot. He came looking for bottled water and left with a promise of Internet access.

Though it was too little, too late, the EOC started trying to provide networking events to let volunteer agencies tell each other what they were doing. Those events would probably have been very helpful to us, had we not already been shifting into a maintenance-only mode by the time they started.

What Data?

Before we start talking about how to manage the flow of data better, let's list the kind of data we were dealing with:

The amount of data is substantial, and it comes in forms other than just text. However, even an up-to-date printed copy of the text information would have been useful at times. I took to carrying a printed phone list and network diagram tucked into my notebook. Other team members came up with other techniques that worked for them, more or less. But on average, I think it is safe to say that people didn't have the data they needed when they needed it.

Sharing Intel

Once you get that intel, how do you get it into the hands of people who need it? The kind of people working for Radio Response are used to using e-mail, wikis, and databases to manage information, so it was inevitable that various people would propose using them, especially people "at home", separated from Hancock County by thousands of miles, but wanting to be helpful. This was pretty much a failure, and the reason was simple: connectivity.

It should have been obvious, but when you go somewhere to build the Internet, attempts to keep yourself organized using the Internet are not going to work very well. Even if you get access long enough to enter data, you will likely not have access later when you need the data.

Sharing the information over the Internet is clearly extremely valuable, but it can't be the only way information is shared. So whatever systems the team uses need to work locally until Internet access is stable, and then need to work remotely as well. There are two choices for where to put the server the team is using: on-site in the disaster area, or out on the Internet. The latter appears attractive for a number of reasons, but since the majority of the updates come from the people on the ground, and they will need the information even when they can't reach the public net, it is best to put the server on-site with the workers, then use some kind of script to do one- or two-way replication with the public copy of the wiki.
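A minimal sketch of what such a replication script might do, assuming each wiki copy can be exported as a mapping from page title to page text plus last-edit time, and that the script keeps a snapshot of each page's text from the last successful sync. The `Page` type and `sync` function are hypothetical, not part of any existing wiki package. It compares each page three ways: a page changed on only one side is copied across, and a page changed on both sides is flagged as a conflict (the newer edit wins provisionally) so that a person can reconcile it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    text: str
    mtime: float  # time of the last edit, seconds since the epoch


def sync(local, remote, base):
    """Three-way sync of two wiki copies (hypothetical data model).

    local, remote: dict mapping page title -> Page
    base: dict mapping page title -> page text as of the last successful sync
    Returns (merged, conflicts): the page set both copies should now hold,
    and the titles edited on both sides that need a human to reconcile.
    """
    merged = {}
    conflicts = []
    for title in sorted(set(local) | set(remote)):
        lp, rp = local.get(title), remote.get(title)
        if lp is None or rp is None:
            # Page exists on one side only: copy it to the other.
            merged[title] = lp if lp is not None else rp
            continue
        old = base.get(title)
        local_changed = lp.text != old
        remote_changed = rp.text != old
        if local_changed and remote_changed and lp.text != rp.text:
            # Edited in both places: keep the newer edit for now, flag it.
            conflicts.append(title)
            merged[title] = lp if lp.mtime >= rp.mtime else rp
        elif remote_changed:
            merged[title] = rp
        else:
            # Only the local copy changed, or neither did.
            merged[title] = lp
    return merged, conflicts
```

Deletions are not handled here; a real tool would need to tell a page deleted on one side apart from a page newly created on the other, again by consulting the snapshot.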

A proposal

My proposal for how to handle this is to use a local wiki, with custom software to sync the local wiki to a remote one when possible. Such wiki-syncing software might already exist. The local wiki would need to live on a laptop dedicated to the job, not on someone's personal laptop that will leave with them.

This is much more than a technical problem, of course. It is fundamentally a management problem. It takes leadership to convince the team that investing the time in gathering and exchanging data at the end of the day will make the team more effective the next day. I would assign someone the job of interviewer and reporter. They would gather data from people in the evening into the locally hosted wiki. They would then print a packet of the fresh data (specifically a phone list, a customer list, GPS waypoints, and a map) for each team member to be handed out at the morning meeting.

This would have to be a priority from the highest levels of management (i.e. the most respected team member). When I was filling the role of the reporter, chatting at the end of the day about what had happened was considered a luxury, not a necessary debrief. Because it happened in an informal bull session "around the campfire", people did not have their notes with them, so gathering contact info and GPS coordinates was impossible.
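As an illustration of the packet-printing step, here is a sketch that renders a phone list as a fixed-width text table, sorted by organization, so that it prints legibly on any printer and carries a date so stale copies are easy to spot. The record fields (`name`, `org`, `phone`) and the function name are assumptions made for the example; the same approach would apply to the customer list and the GPS waypoints.

```python
def phone_list(contacts, date):
    """Render contacts as a printable fixed-width table.

    contacts: iterable of dicts with 'name', 'org', and 'phone' keys
    (a hypothetical record layout). date: label printed in the title.
    """
    rows = sorted(contacts, key=lambda c: (c["org"].lower(), c["name"].lower()))
    # Column widths fit the longest entry (or the header, if longer).
    org_w = max([len("Organization")] + [len(c["org"]) for c in rows])
    name_w = max([len("Name")] + [len(c["name"]) for c in rows])
    lines = [f"Phone list, {date}", ""]
    lines.append(f"{'Organization':<{org_w}}  {'Name':<{name_w}}  Phone")
    for c in rows:
        lines.append(f"{c['org']:<{org_w}}  {c['name']:<{name_w}}  {c['phone']}")
    return "\n".join(lines)
```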

Uplinks and Timing

One thing that I learned by watching the backbone guys at work is that planning and installing a microwave link is hard. It calls upon a varied set of skills, from esoteric things like RF engineering, through political maneuvering and salesmanship (to get access to towers), to hard, sweaty, dangerous work (hanging radios 200' above ground level in 100 degree temperatures). It is a job that requires a critical mass of highly trained people (ideally four people: one team of two on each end of a link). It is a job that does not go faster with more people, and sometimes is limited by things outside your control (weather, political climate, RF interference). It can be neither scheduled nor rushed to completion. Installations take longer than you'd expect, and require an array of special tools and supplies (U-bolts, antenna pigtails, waterproofing compound, cable ties). Installations have to be done carefully and to the highest standards of workmanship, because climbing the tower to fix something is arduous and makes for long outages. If any of the required items (radio, supplies, trained people) are missing or turn out to be unusable, any hope of keeping to a schedule goes out the window.

In contrast, satellite connections are fast and easy to set up. Some systems aim themselves. Higher bandwidth systems need to be professionally installed, but it can be done with one or two people in an afternoon. Most satellite connections are set up to allow visiting users with laptops to connect to them.

As a result, satellite bandwidth is fairly easy to come by during the time it takes to engineer a long-haul link. A network that can take advantage of those differing uplink technologies would be up and running faster than one that is designed with the assumption that it will be using a long-haul terrestrial link for its only connection to the Internet.
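The uplink logic can be sketched as a simple preference order: use the long-haul terrestrial link when it is up, and fall back to whatever satellite connection is available otherwise. The `Uplink` type, its fields, and `choose_uplink` are hypothetical; on real equipment this decision would more likely be expressed as routing metrics or a failover script on the gateway router.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uplink:
    name: str
    kind: str      # e.g. "terrestrial" or "satellite"
    priority: int  # lower number = more preferred
    up: bool       # result of the most recent reachability check


def choose_uplink(uplinks) -> Optional[Uplink]:
    """Return the most-preferred uplink that is currently up, or None."""
    candidates = [u for u in uplinks if u.up]
    if not candidates:
        return None
    return min(candidates, key=lambda u: u.priority)


links = [
    Uplink("long-haul microwave", "terrestrial", priority=0, up=False),
    Uplink("local satellite", "satellite", priority=1, up=True),
]
print(choose_uplink(links).name)  # prints: local satellite
```

The point of the design is that the distribution network does not care which uplink is in use, so it can go live on satellite while the microwave link is still being engineered.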

Cooperation with Other Organizations

County Government

In this incident, the Hancock County government was the prime controlling agency for the recovery effort. Such local-level control might always be the case; I haven't seen enough disasters to know for sure. In the United States, our preference for local government is encoded deep in both our culture and our laws. We have a distrust of the "feds", and laws on the books that limit federal power. There are certain things that state and federal government cannot do, even in a disaster, until the local government invites them in. On the plus side, this means that decisions about the future of a community are made by people from the community. It also means that working with the emergency management people is going to be more a matter of personal relationships than official policy. If a county employee trusts you to git'r'done, then you'll be free to do your work, without someone from Washington DC asking you why you are doing it. On the negative side, county emergency managers are likely to be less trained, and less versed in technology like community wireless networks. It wouldn't do any good to make our case to the FCC and expect the FCC to be present in every disaster; disaster response is controlled by local people. So building personal relationships inside the EOC immediately is critical to success. Finding a government liaison who was excited about our mission made all the difference.

Incident Command System

The response to Hurricane Katrina in Hancock County was facilitated by use of the Incident Command System (ICS). It is a pre-planned organization system that is designed to scale from a single house fire up to a Katrina-sized event. It is commonly used throughout the United States. Its origins are in wildland fire fighting in California in the late 1970s. ICS training is widely available on the web, and formal education is available from FEMA and through individual states. The ham radio community sponsors training in ICS via the ARES/RACES system of volunteer disaster communication teams.

It would have been useful for several Radio Response team members to have been trained in ICS, so that the operation of the EOC would have made more sense to us. Our first government liaison, Bill McCusker, did a good job of making sense of the situation for us. After Bill went home to Florida, it was up to us to integrate with ICS ourselves.

One aspect of ICS that is critical to understand is the Emergency Support Functions (ESFs). As communications providers, we were part of ESF-2, Communications. However, as volunteers, we also needed to be in touch with ESF-15, volunteer coordination. And finally, to get access to EOC facilities (for instance, to get warehouse or lab space, or access to a water tower) we needed to talk to ESF-5, Planning.

People in the EOC are of two minds when it comes to volunteer groups like ours. There's a tribal, us vs. them mentality that happens everywhere in human society. They wonder, "what are these amateurs doing getting in the way of the professionals?" This is a problem that ham radio operators have suffered for decades, and it's unclear whether it will ever get better. Luckily, in Hancock County, distrust of amateurs was at a minimum, and cooperation ruled the day. Regardless, we still felt a fair amount of pressure to prove ourselves quickly. This led to some poor quality work during the first few days, which we paid for later. Perhaps this is how it always has to be in a disaster context; I don't know.

One sticking point is RF emissions. It seems that most people who handle radios for emergency operations do not understand electronics, physics, or RF propagation. I don't know what their backgrounds are, but in my experience, they consider any non-government use of RF equipment a threat to their turf. It's important to remember, too, that a job like frequency coordination attracts controlling personalities. After all, if you get your kicks by telling people what to do, what job could possibly be more rewarding than telling a bunch of visiting police departments that they are not allowed to use their toys? Of course, personalities like this are rarely swayed by facts, or by regulations. It doesn't do any good to say, "Our devices operate in the unlicensed 900 megahertz band." All they hear is 900 MHz, and they say, "900 is already in use by the radio station, use some other band, or I'll have you arrested." Our response to this declaration was to go talk to the radio folks (who were already customers of ours, and thus loved us) and confirm that they were seeing no interference. Sara Allen of WQRZ-LP was more concerned that their studio to transmitter link was causing interference to us!

Private volunteer organizations

You can and should ask for favors from your customers. They are willing to "pay" for their Internet service by bartering. We ate many of our meals at kitchens we'd provided with Internet service. We got office space, a place to park our trailer, and even delivery services from International Aid. Towards the end of my time there, I was even living with them, sleeping on the bed in my car and using their bathrooms and showers.

Volunteer SAR teams, visiting Fire Fighters

There is a long tradition in the United States of integrating the services of volunteer groups into the operations of the professional emergency response teams. For one thing, it wasn't too long ago that literally all emergency response in the United States was done by volunteer fire departments. In rural areas, volunteer fire departments are still the rule. So there are a large number of volunteer search and rescue organizations that are deployed to help find bodies after a hurricane.

These organizations benefit from having a free wireless ISP. They were some of our most appreciative customers. The nice thing about hooking up SAR guys is that they have skills and equipment that are useful to us, so once they've sent email home to their wives (or used iSight and iChat to see their kids!), they owe you one, and you can get them to climb water towers for you, or loan you UPSes.

They are typically under-used, because they are typically over-deployed. You'll sometimes see three SAR teams camped out when there is only work enough for one team. As a result, they are interested in helping out wherever they can. If you can give them a job, they'll get it done for you. This was also true of professional fire fighters assigned to Hancock County on mutual aid contracts.

One thing to understand about SAR folks and fire fighters is that while they may be eager to work, they do not know how to do neat and tidy installs. It's a fact of life, and you live with the results, but you should at least be aware of the problem going into it. One of our biggest outages (losing AP 1, linking Waveland water tower to Stennis water tower) was probably due to fire fighters replacing batteries in a repeater. They smashed the cable our team had left unprotected, cutting several of the copper pairs. Whose fault was that outage? No one's, really. It's just reality in a disaster environment. Our equipment was not even labeled, so they had no way to contact us, even if they'd wanted to.

The team that gave us the most help was Rescue International. Bruce Barton of Rescue International loaned us lots of equipment, and arranged to have one of his guys climb Stennis International Airport's water tower. Later, once I left the project, Bruce worked with Matt Justice to help bring new sites on line. He was so helpful he became an honorary Radio Response guy.

Electric Companies

We didn't have any interactions with the electric companies, but to me, they seem to be the ideal partner for us in the future. Here's why. First, by definition, wherever they are, the power has been restored. They have lift trucks, making it easy to do tree-based installs at customer sites. They seemed to me to be far better organized for rapid response than any other organization (certainly better than the EOC). They had a comfortable place for their guys to sleep and to eat. Their camp was eventually taken over as the FEMA camp at the Kiln, but only after the electric companies had already finished their work and gone home.

Finally, their network gets repaired at approximately the rate ours grows. For instance, getting 30 miles of transmission line back into service in a week's time would not be unreasonable, just as we were able to get a 30 mile Internet link done in about a week. And like a mile of our network, each mile of power line is much simpler than the same mile of telephone equipment.

The one trick with talking to electric companies is that they already use a lot of RF and Internet technology, and you'd need to be sure they understood where we are coming from as a voluntary provider of community networks. They use RF to send SCADA (Supervisory Control and Data Acquisition) data from remote parts of their network. Some power companies are also getting involved in Broadband over Power Lines (BPL), and might consider us a competitive threat.

An open question: what can we do for them to get them to let us ride their coat tails? How do they do command and control while rebuilding the network? Could we get them to hire us to restore Internet to their facilities, then we use those facilities as distribution points?

Power structure inside the team

Because our response to the Katrina disaster was ad hoc, we were building the organization while trying to get the job done. Even harder, we were building a coalition of organizations. A certain amount of time and effort was wasted on butting heads. I suspect that folks from our partner, Inveneo/AidPhone, felt that they were not respected as much as they would have liked, and that Radio Response people "took over" the project. I'm sympathetic to such a complaint, but when you are working in partnership with volunteer groups whose membership ebbs and flows, power shifts happen.

The situation would have been easier with a single project manager that was committed to working on the project from start to finish. Less effort would have been spent on transitions and personal conflicts. The results would have been strongly dependent on the effectiveness of the manager. They would need to be able to make good priority decisions, be able to attract and motivate hard workers (while weeding out tourists), and able to enforce decisions.

The right answer is probably the organized chaos we ended up with, frustrating as it is. Using seasoned people who have been through real events and/or drills, and who have pre-existing personal relationships, would be helpful. That would tend to weed out the tourists, and let people become accustomed to each other's working styles before they get on scene.

Things we should have had

We needed a working printer available to team members. Printers were donated to the effort, but we did not make an effort to have one permanently connected in the office, with easy-to-access drivers, and so on. This was an oversight, and it would be easy to fix next time. Having a printer easily available would help with distributing info ("Let me just print out a copy of the current info packet before you go out on that job") and also getting equipment labeled.

We did not have outdoor 802.11b gear that we could trust. We had a big donation of Deliberant 1300A APs, but they seemed to be returns: some of them were already configured, and several had hardware problems. The other problem with the Deliberant 1300As is that they have no field-accessible factory reset switch. We also had a lot of different Linksys devices available to us. They occasionally exhibited bad DHCP behavior, which was difficult to diagnose and could not be repaired remotely.

We needed to use heavy-duty UPSes at backbone sites, and light-duty ones at customer sites. Donations of both arrived when the need became clear, but it was something we should have had on hand from the beginning.

We should have made name tags/badges and business cards for ourselves. They give you credibility. Also, people tend to fall back to physical systems (scribbling on little pieces of paper) when everything else is broken. You can't give someone your e-mail address when they are sleeping in a tent and are waiting for you to make their Internet connection out of spare parts.

Naming difficulties

The names for things were fluid and non-standardized. Even our group changed names when I first got there.

It is important for people to understand the benefit of using standardized names for locations in the network. Perhaps with a better documentation and team update system, naming would have been less of a problem. Disagreements on whether it was Baypoint, or the Davis Store would quickly fade away when the morning information packet showed one name only.
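One low-tech way to enforce a single name per site is an alias table that the reporter maintains and that every list in the packet passes through. The sketch below assumes Baypoint and the Davis Store refer to the same site, as the disagreement above suggests; which name is treated as canonical is an arbitrary choice for the example, as is the function name `canonical`.

```python
# Hypothetical alias table: every spelling people actually use maps to the
# one canonical name printed in the morning packet.
ALIASES = {
    "baypoint": "Davis Store",
    "davis store": "Davis Store",
}


def canonical(name):
    """Map a site name as spoken or scribbled to its canonical form.

    Case and extra whitespace are ignored. Unknown names raise KeyError so
    the reporter adds them to the table instead of letting a new variant spread.
    """
    key = " ".join(name.lower().split())
    if key not in ALIASES:
        raise KeyError(f"unknown site name {name!r}; add it to ALIASES")
    return ALIASES[key]
```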

Telco Resiliency (or lack thereof)

The telephone system behaved as well as could be expected. Several things came to light that I did not know before:

First, cellphone service, at two weeks after the storm, was acceptable. It was sometimes difficult to make a call, to be sure, but the signal strength was always quite high. Getting power to the cell sites quickly was obviously a priority. They also added a lot of cell sites, using Cells On Wheels (COWs) or temporary installations. SMS was much more reliable than voice, however. Getting more team members comfortable using SMS would have been time well spent.

Second, VoIP works around telephone network congestion. It was very difficult to call inbound to Ponchatoula two weeks after the storm. The recovery effort was in full swing, and the demand on the network was much higher than the small amount of rerouted capacity could handle. The VoIP phones in Ponchatoula never exhibited the same problems the landlines did. Why? Because the last telco hop for them was in Colorado. From Colorado to Louisiana, the call traveled on the Internet. Colorado wasn't having a disaster, so its lines were ready to take as many calls as we were receiving. This is an important feature of VoIP that's little understood or appreciated. It's unclear how a Vonage phone with a NOLA prefix would have behaved; it would likely have been partly affected, as the call would probably be delivered at least to the LATA before Vonage got a chance to move the call to VoIP. A good strategy for using VoIP for disaster relief might be to have terminations in many different LATAs ready before the disaster, and then choose which terminations to use based on which LATA is least affected.

Third, expedited service restoration in areas affected only by wind damage (not saltwater flooding) was quite fast. However, once you have service reconnected, don't count on it lasting. As the linemen start repairing things properly, the hacks they put in to make the EOC's phones work come back out. Basically, a phone line restored in an expedited manner is an outage waiting to happen as the restoration effort proceeds. No one wants it to work that way, but practically speaking, it happens. Service restoration in areas affected by flooding basically required the entire network to be rebuilt, because switch boxes corroded and overhead lines were ripped to shreds. Mississippi, unlike Florida, does not get enough hurricanes to force them to move all the telephone lines below ground.
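The termination-selection strategy from the second point above can be sketched as ranking pre-arranged terminations by how well test calls through each one complete. Everything here is an assumption for illustration: the LATA labels are placeholders, the completion rates would have to come from some periodic test-call process, and the 0.9 threshold is arbitrary.

```python
def rank_terminations(completion, min_rate=0.9):
    """Rank pre-arranged VoIP terminations, least-affected LATA first.

    completion: dict mapping a LATA label to the fraction (0.0-1.0) of
    recent test calls through that termination that completed.
    Terminations below min_rate are dropped as too congested to rely on.
    """
    usable = [(lata, rate) for lata, rate in completion.items() if rate >= min_rate]
    # sorted() is stable even with reverse=True, so ties keep their listed order.
    usable = sorted(usable, key=lambda item: item[1], reverse=True)
    return [lata for lata, _ in usable]
```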

Electrical Power

You need to have lots of UPSes. You need both big ones for backbone sites (1400 VA) and little ones (500 VA) for customer sites. The purpose of the UPS is primarily to give the equipment very clean power. Customers tolerate and understand small outages due to power cuts. It is not necessary to build the network to operate with no power at all, you just need to invest in UPSes to protect the equipment to reduce support problems.
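A back-of-the-envelope way to choose between the two UPS sizes mentioned above, as a sketch. It assumes the equipment's draw is known in watts, converts to volt-amperes using an assumed power factor of 0.6 (a conservative guess when the real figure is unknown), and keeps the UPS below 80% of its VA rating. Both numbers are assumptions, not measurements from the deployment.

```python
# The two sizes suggested in the text: small for customer sites, big for backbone sites.
UPS_CLASSES_VA = (500, 1400)


def pick_ups(load_watts, power_factor=0.6, max_loading=0.8):
    """Return the smallest UPS class (in VA) that carries the load with headroom.

    load_watts: iterable of per-device draws in watts.
    power_factor: assumed ratio of watts to volt-amperes (VA = W / PF).
    max_loading: fraction of the VA rating we are willing to use.
    Returns None if even the largest class is too small.
    """
    total_va = sum(load_watts) / power_factor
    for rating in UPS_CLASSES_VA:
        if total_va <= rating * max_loading:
            return rating
    return None
```

With these assumptions, a hypothetical 20 W load (say, a radio and a small switch) fits the 500 VA class, a 250 W load needs the 1400 VA class, and a 900 W load fits neither.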

At customer sites, do not attempt to operate on generator power unless it is a huge (75 kVA) generator. The huge generators at the New Waveland Cafe and Christian Life never gave us any problems, but the small generator we shared with power tools at Morrell ate our UPS. UPSes with sophisticated power monitoring and ground fault detection do not work well on generators. It would be better to have a simple, stupid UPS than one that is trying too hard to protect you from bad power. By definition, a generator delivers bad power!

At backbone sites, you need reliable power. There is no one available to put gas in the generator. We had this problem twice; initially at the Waveland water tower and later at the Cisco satellite uplink. The lesson was to use your resources to solve the power problem at the shared-infrastructure site before making the network depend on it. The EOC can help expedite temporary power poles to backbone sites (as Diamond Jim did for Port Bienville). Also, by choosing your backbone sites wisely (like next door to a police station) you can get reliable power faster without having to call in any favors: someone else will already be expediting the power for their own reasons and you can simply ride on their coattails.

Another possibility for reliable backbone power might be solar. Engineering a solar power system would be feasible (it has been done many times before in the wireless community). A solar power system would need to be built and tested before it was deployed. It would be important for such a system to be flexible, and not simply "we're powering an AP with a solar panel!" At backbone sites, there are almost always legitimate reasons for other power consumers than just the wireless gear. For instance, you sometimes need a switch to connect multiple segments of the network. And for long debugging sessions, you need a place to plug in the laptop. Sometimes an idling car and an inverter can provide laptop power, but physical limitations do not always permit using a car for AC power.

Frequent site surveys while you are choosing backbone sites might show that the power situation is improving on its own. We came back from surveying Port Bienville and asked Diamond Jim to get temporary power to the water tower. Several weeks later, we heard we had power and should go install the link. When we got there, we found a temporary cell site running off a generator, next to the temporary power pole. While we were there, the generator maintenance man came, and offered to let us plug in to the generator. We probably didn't need to wait for the temporary AC power pole; we probably could have plugged into the cell site generator the day it was installed, had we known about it.

Tools and Ladders

In general, people's personal tool sets were sufficient to get the job done. Wireless installs do not require much specialized equipment. The critical specialized tools are a cable crimper and a high-quality cable tester. It is easy to make bad CAT5 crimps, and to waste huge amounts of time debugging them. It is better to ensure that everyone who is making cables has a tester and uses it religiously. This should be a requirement to join the team. When running power over Ethernet, it is especially important that you have 100% continuity and no crossed wires, to prevent burning up equipment or creating mysterious faults.
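What a cable tester's wire-map check does can be sketched in a few lines: for a straight-through cable, each pin at one end should have continuity with the same pin, and only that pin, at the other end. The input format is an assumption (a mapping from near-end pin to the set of far-end pins with continuity), as is the function name. Note that a split pair (right pins, wrong twisted pairs) passes a continuity check like this one; testers catch it with crosstalk measurements, which is one more reason to insist on a good tester rather than a simple continuity beeper.

```python
def check_wire_map(wire_map):
    """Check a straight-through 8-conductor cable.

    wire_map: dict mapping each near-end pin (1-8) to the set of far-end
    pins it has continuity with, as a continuity tester would measure.
    Returns a list of human-readable faults; an empty list means pass.
    """
    faults = []
    for pin in range(1, 9):
        far = wire_map.get(pin, set())
        if not far:
            faults.append(f"pin {pin}: open")
        elif len(far) > 1:
            faults.append(f"pin {pin}: short to far-end pins {sorted(far)}")
        elif far != {pin}:
            faults.append(f"pin {pin}: miswired to far-end pin {next(iter(far))}")
    return faults
```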

Ladders are required for almost every install. This is a problem, because people who fly in and rent a car cannot bring a ladder as carry-on luggage! You can often borrow ladders, but not having your own is sometimes a problem. Having a couple of ladders in the cache would be appropriate.

Having access to a lift truck is useful for certain installs. I don't think it would be useful enough to justify the cost and trouble of keeping one around. The Part 15 folks drove one from San Antonio all the way to Gulfport. It couldn't go over 55 MPH, so it took forever to make the trip. We found that it worked OK to get access to one when we needed it by asking around for a favor. At the Morrell Foundation install, we explained that we thought we needed a lift truck to do the install right. They used their resources (favors, cash... we don't know) to get a lift truck on site for our use.

Physical Installation Issues

We did not have enough hardware for customer premises installs. It takes a baffling assortment of poles, U-bolts, and brackets (along with all the various nuts and bolts) to do a high-quality installation. We were dead in the water until Don came along with his incredibly well stocked van. Local hardware stores only reopened in the final weeks that Don was on-site. Before that, Don's van was the hardware store.

About half of our installations were in situations where it was appropriate to use quick and dirty mounting techniques. For instance, when mounting a subscriber unit on a tent pole, the mounting hardware it comes with is enough. In other situations, we connected our device to a scavenged pipe from the debris, and then used duct tape to attach the pole to a lamp post. The other half of the installations called for more careful installs, for instance putting an SU on the back of a church, or on the chimney above Second Street Elementary.

When you are installing a subscriber unit in an area where the built-in panel antenna is sufficient, it is much easier to mount it than when it has an external antenna. It is much more common to use an external antenna in the flat topography of the Gulf states, so having the antenna poles and mounting hardware you need to use them is critical. A pole, plus the antenna, plus the SU is a fairly heavy and bulky thing. You can't simply prop it up somewhere and leave it, or it will blow down. This is where having the equipment to do a professional installation becomes really important.

As we were using donated CAT5 cable, we were limited in our choice: the blue stuff or the gray stuff (both of which were relatively low-quality interior-grade cable). We traced two failures back to using interior-grade cable in a situation that called for exterior-grade cable. One was on the Waveland water tower, where the wire was crushed by another team working on a VHF repeater. The other failure was due to burying the indoor-rated cable at the Morrell Foundation.

One great mounting system we came up with at the EOC was to take a normal antenna tripod meant to be fastened to a roof and fasten it to a T-shaped set of boards instead. Then we could weigh down the T-shaped wood with sand bags, making a very steady base that is easy to tip up and down to work on the antenna. When you put the whole assembly up on top of a flat roof, you get a significant amount of height. Sandbags are easy to come by anywhere the National Guard has been. They seem to leave them behind like cow droppings.

Rescue International had a crank-up tower to use with their VHF repeater. We put some equipment on it, but found that it was not ideal. You can only access the top of the tower when it is tipped down. That means that you can't align the antenna when it is in the normal operating position. It is also very slow to raise and lower, because it operates off of a single little electric winch. Finally, for all the problems, it doesn't get you too much height. It's clear having portable towers can be useful, but there's more research and experimenting to be done as to what the most appropriate tower is for our application. One feature that would be very nice for wireless work is the ability to turn the top of the tower without climbing it or taking it down. This would make it possible to align an antenna easily.

Public Relations

We did not put enough effort into public relations and community outreach. The project's impact was reduced as a result.

We encouraged some of our customers to advertise our services, but that didn't really happen, except at the Davis Store. I put some effort into keeping a list of public Internet sites in the flier that the EOC's press office put out, but not enough to make sure the list was always complete and up to date. I regret that I didn't make time for this type of work, but when you are a technical person pulled into management, it is easy to focus on the technical work your team is doing and help them with it, instead of identifying the non-technical work that needs to be done that they are not doing.

Another PR activity that we utterly failed on was getting the local press to cover the story. That might have turned up local talent we could have used to improve our transition plan when we had to go home.

I felt like I put the right amount of effort into writing the blog. It served its purpose by keeping donors up to date on our progress, informing donors and volunteers of anticipated needs, and acting as a journal for the team. Having someone assigned (or self-assigned, in my case) to this job was a worthwhile investment.

Operating the Network

As countless operations folks have commented over the years, "this network would work fine if it weren't for the users". So I discuss user support first, then network operations issues unique to the disaster context.

Customer Service

First, a note on terminology. There was some confusion in the group for a while about how to refer to the people using our network. Terminology matters, because it sets the stage for the commitments you'll be making. I made an effort to refer to our contacts at the sites hosting us as "customers". Part of the confusion arose because they were, in fact, beneficiaries: our deal was to give them the computers and the Internet service for free in exchange for space, power, and security. I felt a slight undercurrent of disrespect developing toward the contribution our hosts were making to the network by providing space, power, and, most importantly, users. After all, when you are doing your best to provide a free service and you receive complaints that it is too flaky, it is easy to blame the beneficiary instead of treating the complaint as a legitimate customer service problem.

Once you perceive the problem as a customer service problem, you've got more options available to you to address it. The prime one, with a free service, is setting expectations.

In the disaster-response context, customers can accept outages, but they need to know what quality of service to expect so that they can make plans for alternatives if necessary. Obviously this is much more important when you are acting as an ISP to another aid agency than when you are simply providing public access. At most sites, however, we were acting as an ISP to the host organization, which was in turn acting as a public access site. The lesson I learned was that it was important to make commitments to the customers, but the promises might be astonishingly vague and still be of use to the customer for planning purposes. For instance, a commitment like, "we expect 24 hours more outage, but then we think we can keep it going 8 hours out of 24 because we are on a new uplink that's limited to evening hours" would be firm enough to be useful to customers.

Network Operations

It is important to have a NOC phone number that can easily be redirected. We tried to do that when we first published a NOC phone number, but through some confusion ended up with a number we could not easily move. The right way to handle it is to have a dedicated number, pointed at a home base outside the area, from the start of the response. The cached equipment will be labeled with the NOC number, and the team will be able to make new labels with the number on the fly (even if the labels are just a Sharpie on duct tape). Once the on-site crew has finished the initial bring-up and is transitioning to maintenance mode, the NOC number should be redirected to a VoIP or cell phone on the ground, in the area of operations. It is critical to give customers the shortest path to the ops team on the ground, without the extra layer of human message passing we ended up with. Making the NOC number toll-free would be a nice touch, and it so happens that toll-free numbers are easier to redirect to arbitrary points.

Network management will be driven predominantly by customer trouble reports. Tracking them with a structured system wasn't necessary at the scale we were working on. Instead, I simply acted as the point of contact for them, managed the to-do list in my notebook, and assigned jobs to my coworkers. This is one of many cases where we found record keeping with pen and paper was the best approach.

Where the uplink to the Internet comes from is a big deal. The distribution network is relatively easy to run. In our experience, running a stable long haul link to Gulfport was much more difficult than finding local satellite uplink bandwidth available for sharing. As a result, the single biggest lesson I learned was to plan on swapping uplinks around. As the situation changes, the distribution network will likely stay stable, but the uplink can change. For instance, you might start by using a fraction of the bandwidth from a public information office, then later get the terrestrial link working. When the terrestrial link fails and a vendor like Cisco offers satellite for a few days, you can switch onto that. When DSL starts coming back, you can find a church with DSL and make them the backup for your terrestrial link.

Technology to change the egress point of a network exists. The simplest approach is to give each egress point precisely the same config (for instance, "listen on 10.10.0.1, do NAT for 10.10/16"). Administrators then manually arrange for only one egress point to be active at a time. When the egress point changes, all the existing NAT bindings evaporate, but customers can create new ones by reloading the page in their browser or rebooting their VoIP phone. We used this manual system in the Radio Response network.
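
To make "precisely the same config" concrete, here is a minimal sketch, assuming the egress point is a Linux box configured with iproute2 and iptables. The report does not say what software ran on our egress points, and the interface names eth0 and eth1 are placeholders. The sketch only renders the commands; making sure just one egress point is live at a time remains the manual step described above.

```python
import ipaddress

# Hypothetical values, taken from the example in the text: every egress box
# answers on 10.10.0.1 and NATs the whole 10.10.0.0/16 distribution network.
ROUTER_ADDRESS = "10.10.0.1"
INSIDE_NET = "10.10.0.0/16"


def render_egress_config(outside_iface: str, inside_iface: str) -> list[str]:
    """Return the shell commands that turn a Linux box into the egress point.

    Only the interface names may differ from box to box; everything the
    distribution network depends on (address, prefix, NAT range) is identical.
    """
    net = ipaddress.ip_network(INSIDE_NET)
    addr = ipaddress.ip_address(ROUTER_ADDRESS)
    if addr not in net:
        raise ValueError(f"{addr} is not inside {net}")
    return [
        # Answer as the default router for the distribution network.
        f"ip addr add {addr}/{net.prefixlen} dev {inside_iface}",
        # Forward packets between the inside and outside interfaces.
        "sysctl -w net.ipv4.ip_forward=1",
        # Rewrite inside source addresses to whatever address the uplink gave us.
        f"iptables -t nat -A POSTROUTING -s {net} -o {outside_iface} -j MASQUERADE",
    ]


if __name__ == "__main__":
    # "eth0" (inside) and "eth1" (uplink) are placeholder interface names.
    for command in render_egress_config(outside_iface="eth1", inside_iface="eth0"):
        print(command)
```

MASQUERADE rather than a fixed source address fits the situation: the uplink address is whatever the host organization's network hands out, and it changes every time the egress point moves.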

During an outage in the middle of the distribution network that split it into two separate contiguous zones, north and south, we arranged for each segment to have its own egress point (one over satellite, one over the terrestrial link to Gulfport). Multiple egress points let us get the network up even in its partitioned state.

Technology to make switching the egress point of the network simpler and more automatic would be welcome. The obvious technology, dynamic routing (BGP, OSPF, etc.), is not. It would significantly raise the barrier to entry for doing backbone maintenance. The current design only requires the level of knowledge used in a home-networking context. Judging by the skills present in the Radio Response team (which I think is representative of the people you might expect to volunteer in the future), the lowest common denominator was the ability to work in a home-networking context. Out of over 150 person-days of work, only about 20 came from staff who would have been able to work on a system with dynamic routing in the core.

What's important in the "home networking" context? First, the devices have to act like simple appliances. Configuration should be via web user interfaces. If we are to have a dynamic routing system, it must be able to work in the home networking context. As there are none commercially available that I am aware of, we'd need to implement something in preparation for our next deployment. The best platform for a dynamic routing system would probably be a re-flashed WRT54G, followed by a Linux LiveCD. The nice thing about using Linux via a LiveCD is that team members can bring the ISO file with them, or fetch it via an EOC satellite connection, then burn the CD locally. Finally they could load it into a donated PC and have a router ready to use. Fetching, burning and running a LiveCD is practical in a disaster context. Debugging BGP is not.

Auto-configuration systems would be welcome. We wasted a significant amount of time with configuration errors. It is easy to make them in the context we were working in, and it was exceptionally difficult to find them and fix them. Many of our volunteers were unfamiliar with the normal behavior of the devices, so we had problems telling the difference between "normal" bad behavior (i.e. non-critical bugs), bad hardware (which a lot of people donated to us, accidentally or otherwise), and our own configuration errors. The VoIP devices were a particularly good example of this. The Uniden phones from Nuvio were auto-configuring. Once they got an address assigned via DHCP, they would fetch a file via TFTP and auto-configure themselves. The Linksys ATAs we were using are probably capable of the same thing, but we were configuring them by hand, and we made critical configuration mistakes and spent time debugging them quite often. (However, because the Nuvio phones prohibited setting the IP address statically, they did not work once the Linksys routers exhibited a bug that stopped them from doing DHCP.)
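
A minimal sketch of the auto-configuration idea: generate one config file per device, keyed by MAC address, into the directory a TFTP server serves, so a device that gets a DHCP lease can fetch its own settings. The file name pattern, the key=value format, the MAC addresses, the SIP server name, and the phone numbers are all hypothetical placeholders; real devices dictate their own provisioning file names and formats, and the DHCP server would also have to tell them where the TFTP server is.

```python
from pathlib import Path

# Hypothetical inventory: MAC address -> per-device settings.
INVENTORY = {
    "00:11:22:33:44:55": {"site": "davis-store", "line1_user": "2285550100"},
    "00:11:22:33:44:66": {"site": "eoc", "line1_user": "2285550101"},
}

# Settings shared by every device; only the per-device keys differ.
COMMON = {"sip_proxy": "voip.example.net", "ntp_server": "pool.ntp.org"}


def config_filename(mac: str) -> str:
    """Normalize a MAC to a lowercase, separator-free file name (hypothetical pattern)."""
    return "cfg" + mac.replace(":", "").replace("-", "").lower() + ".txt"


def render_config(settings: dict) -> str:
    merged = {**COMMON, **settings}
    # One key=value pair per line, sorted so regenerated files diff cleanly.
    return "".join(f"{key}={value}\n" for key, value in sorted(merged.items()))


def write_configs(tftp_root: Path) -> list[Path]:
    """Write one config file per device into the directory the TFTP server serves."""
    tftp_root.mkdir(parents=True, exist_ok=True)
    written = []
    for mac, settings in INVENTORY.items():
        path = tftp_root / config_filename(mac)
        path.write_text(render_config(settings))
        written.append(path)
    return written
```

Because only `INVENTORY` differs from device to device, adding a phone or ATA is a one-line change, which avoids the kind of hand-configuration mistakes we made with the Linksys ATAs.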

Back Office Needs

Immersed as I was in the details on the ground, I don't know much about what kind of "back office" support we were getting from Sharon, Mac's wife. In a perfect world, we would have a formal offer and donation tracking system. Part 15 apparently had something like this, but it lacked transparency, so it seemed like a lot of offers fell through the cracks. For instance, they never replied to my offer to work for them, so I ended up working for Radio Response. Another back office job would be to issue receipts and thank-you notes.

Scheduling volunteers takes a lot of effort. It is not a wholly back office thing, of course, since the needs are known only by those on the ground. As the on-site manager, it was very difficult for me to dedicate time to recruiting or even coordinating the arrival of people who volunteered. This is something we should have done better, but we didn't have a dedicated volunteer to assign to it.

We found out later that we should have been tracking volunteer hours. Selfishly, tracking them would have been good for our own press, because we could have shown the amount of effort it took to assemble and operate the network. More importantly, Hancock County can use our volunteer hours to help offset its share of the bill FEMA will be sending for the federal aid FEMA offered. Because our labor was highly skilled, each hour our volunteers logged could have offset more money than an hour logged by the church groups bringing teenagers down to muck out houses.
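
Tracking hours needs nothing fancier than a shift log and a tally by skill category. A minimal sketch with made-up volunteers and dates; the dollar value per hour for each category is a hypothetical placeholder, not the rate FEMA actually credits, which would need to be looked up.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical dollar value of one hour of labor per skill category.
HOURLY_VALUE = {"general": 15.00, "network_engineer": 60.00}


@dataclass
class Shift:
    volunteer: str
    day: str     # ISO date, e.g. "2005-10-03"
    hours: float
    skill: str   # key into HOURLY_VALUE


def summarize(shifts: list[Shift]) -> tuple[dict[str, float], float]:
    """Return hours per skill category and the total dollar value of the labor."""
    hours: dict[str, float] = defaultdict(float)
    for shift in shifts:
        hours[shift.skill] += shift.hours
    value = sum(h * HOURLY_VALUE[skill] for skill, h in hours.items())
    return dict(hours), value


# Made-up example log.
log = [
    Shift("volunteer A", "2005-10-03", 10, "network_engineer"),
    Shift("volunteer B", "2005-10-03", 8, "general"),
    Shift("volunteer A", "2005-10-04", 12, "network_engineer"),
]
by_skill, total = summarize(log)
print(by_skill, total)
```

Keeping the skill category on each shift is what lets the skilled hours be valued separately, which is the point made above.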

Donated Equipment Woes

Beggars can't be choosers, and we are grateful for every single donation we received. We did our best to make good use of the equipment donated to us, keeping in mind our responsibility to the donor to respect their trust in us. We received substantial contributions of both computers and network hardware. Both ended up presenting certain problems that we had to solve.

Used Computers

The used computers that were donated to us were not in working order; in fact, they were often far from it. The same was true for the monitors. It is unclear whether the machines were broken during shipping, or whether they were donated as "machines in need of some refurbishment". It's also easy to imagine that an unmarked stack of known-bad machines was accidentally donated to us. It is common at large commercial and academic sites to have computers around that were not worth fixing because they failed near the end of their planned service life. Such a pile would make a tempting donation for someone who either didn't know they were broken, or thought that donating broken hardware was better than donating nothing.

As a result of the hardware quality problems, we ended up setting up a refurbishment operation in Ponchatoula, LA. We also found we needed to set up a final testing lab in Bay St. Louis to weed out failures due to rough handling between Ponchatoula and Bay St. Louis. The amount of volunteer time the refurbishment work ate up was simply unbelievable. The operation in Ponchatoula consumed at least 10 person-days. The testing lab in Bay St. Louis consumed another 3 to 5 person-days. Working on PC refurbishment in the middle of a disaster area is simply not a good use of resources. It is finicky work that takes experience to do right. It is best done on large batches of similar or identical machines, not the mish-mash that we had. It takes a huge amount of space and benefits from special tools (hard drive copiers, motherboard diagnostic systems, etc.).

There are commercial and not-for-profit organizations dedicated to recycling PCs, both by refurbishing working ones and by recycling dead ones. Often, they get paid by large organizations to take on the liability of a large inventory of old machines; they then refurbish them, resell some, and donate the rest to projects like ours. It would have been preferable to work with a partner like that to handle the refurbishment task.

As our labor pool dwindled, I pushed the refurbishment work to the "edge" by declining donations of hardware that was not ready to use. It was a very difficult decision to make, but the results were satisfying: it kept the team in Hancock County focused on operating the network, and two significant donations of PCs still arrived. We passed one donation on to the iCare Village, and the other on to St. Clare's Catholic School, both locations where we had taken an active role in delivering Internet service.

Networking hardware

We received a mish-mash of used and new home networking hardware from private donors. Because networking appliances are less complicated than a PC, it was relatively easy to make use of these. However we did have a significant problem with misplacing power bricks, as staff dug through the inventory looking for pieces to solve whatever problem was at hand. This was a frustration, but it's unclear that it's a solvable one; enthusiastic volunteers probably are more valuable when they are allowed to dig through the inventory than when they are held back by careful inventory management.

One of Brent's many contributions to the project was several boxes of 1-gallon heavy-duty zip-lock bags. These allow you to save space by getting rid of all the paper and cardboard packaging, and you can see what's in them without opening them. Finally, you can handle the whole "unit" (router, Ethernet cable, and power brick, for example) with one grab.

We also received a significant amount of inventory that was seemingly new in original boxes. As we worked with it, however, it became clear that in two cases, manufacturers elected to send us refurbished stock. One of the manufacturers sent discontinued access points which were very difficult to find manuals for. I need to emphasize that we were grateful for the donations, but the fact we were not dealing with current hardware made us less efficient.

We found that the infant mortality rate of the refurbished hardware was noticeably higher than with the new hardware our team members were accustomed to using while professionally building networks. That meant we had to be very careful to test CPEs in the lab before setting off for a customer install, and to always carry a spare in case the pre-configured device failed during the install. We also had to visit sites to reset or replace devices that failed in the days after they were installed. When you are mounting a device with a lift truck that is only available for one day, it must work right the first time; there's no second chance to replace it. For this very reason, we ended up "wasting" an antenna up in a tree, because it was connected to a dead access point and we had no way to get the AP and antenna back down to repair them.

The radios arrived with a mix of firmware on them. This is common, even with new hardware, but it formed one more hoop we had to jump through. Since upgrading some of the firmware on these devices is a tedious and error-prone activity, it often got skipped by team members who were in a hurry, or didn't know how to do it. Running buggy firmware on some of the radios had an unknown effect on the network, but it likely wasn't good.
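
Even when the upgrade itself gets skipped, a quick audit would at least have told us which radios were off the standard. A minimal sketch, assuming we keep a list of devices with their asset label, model, and firmware version (read off each device's status page, for example); the model names and version strings are placeholders, not real releases.

```python
from dataclasses import dataclass

# Hypothetical standard firmware per model.
STANDARD_FIRMWARE = {"model-a": "4.30.2", "model-b": "1.2.0"}


@dataclass
class Radio:
    label: str     # the asset label written on the device
    model: str
    firmware: str


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "4.30.2" into (4, 30, 2) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def audit(radios: list[Radio]) -> list[str]:
    """List the devices that are not running the standard firmware for their model."""
    problems = []
    for radio in radios:
        wanted = STANDARD_FIRMWARE.get(radio.model)
        if wanted is None:
            problems.append(f"{radio.label}: no standard firmware defined for {radio.model}")
        elif parse_version(radio.firmware) < parse_version(wanted):
            problems.append(f"{radio.label}: {radio.firmware} is older than {wanted}, upgrade")
        elif parse_version(radio.firmware) > parse_version(wanted):
            problems.append(f"{radio.label}: {radio.firmware} is newer than {wanted}, untested")
    return problems
```

Comparing parsed tuples matters: as plain strings, "4.4.0" would sort after "4.30.2".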

Finally, one type of hardware that was donated to us lacked a factory reset feature. First, several of the devices arrived pre-configured, presumably because in the refurbishment lab the "re-flash NVRAM" step had been missed. Those were useless to us. Second, in an environment with high turnover of people with different levels of experience, a factory reset feature is required. It is all too easy for someone to set it to an incorrect IP address that the next guy can't guess, or a password no one else knows, or for the label to wash off in the rain, leaving us locked out of the device. We lost several devices to mistakes like this.

A Vision for Success

In this section, I lay out a program that would address a number of the problems I saw that made us inefficient.

We need to invest in a certain amount of preparedness. We need to prepare our network design, the equipment, and our team.

The network design we ended up with in Hancock County would likely work in other contexts, in particular the structure of the distribution network, and the numbering system we used towards the end. One particularly nice feature of it is that you could pre-configure many of the components and label them, then assemble them into a working network with minimal configuration work.

Gathering equipment for the cache will be a two-step job. First, we need to decide on the future of the network in Hancock County. If some of it will be recovered, it (and the leftovers from the original install) can form the core of the cache. Next, we need to decide on the inventory for the final cache (how much point to point hardware, how much distribution hardware, how many customer sites). Whatever is missing between what we have now and what we want to have, we'll have to get from donors. When acquiring equipment, we should do our best to avoid getting refurbished equipment again. If a manufacturer wants to offer a discount on new merchandise, that would be really helpful. But the cache needs to be made up of the exact same equipment that's available at retail, not refurbished merchandise, and not factory returns.

The equipment in the cache should be opened, tested, and pre-configured. It should be clearly labeled, including contact information that will be correct during a deployment. That means whatever phone number is on the labels must be redirectable. Labels should indicate how the equipment, in its preconfigured state, fits into the stock network design.
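
Label text can be generated straight from the stock design so that every cached device carries the same information in the same layout. A minimal sketch under a hypothetical numbering plan (the report does not spell out the one we used): a flat 10.10.0.0/16 distribution network with the egress point at 10.10.0.1, site N's devices at 10.10.N.x, and a fixed host number per role. The NOC number is a placeholder.

```python
import ipaddress

# Placeholder: whatever redirectable NOC number the group actually sets up.
NOC_NUMBER = "1-800-555-0199"

# Hypothetical stock design.
DIST_NET = ipaddress.ip_network("10.10.0.0/16")
GATEWAY = ipaddress.ip_address("10.10.0.1")
ROLE_HOST = {"customer router": 1, "access point": 2, "voip adapter": 3}


def device_address(site_number: int, role: str) -> ipaddress.IPv4Address:
    """Address a device of this role is pre-configured with for site N."""
    if not 1 <= site_number <= 254:
        raise ValueError("site numbers run from 1 to 254; 10.10.0.x is the backbone")
    return ipaddress.IPv4Address(f"10.10.{site_number}.{ROLE_HOST[role]}")


def make_label(site_number: int, site_name: str, role: str) -> str:
    """Text to print (or Sharpie) onto a cached, pre-configured device."""
    addr = device_address(site_number, role)
    return (
        f"SITE {site_number:03d} {site_name.upper()}\n"
        f"ROLE: {role}\n"
        f"ADDR: {addr}/{DIST_NET.prefixlen}\n"
        f"GW: {GATEWAY}\n"
        f"NOC: {NOC_NUMBER}\n"
    )


print(make_label(7, "Davis Store", "customer router"))
```

Because the address is derived from the site number and role, anyone picking a device out of the cache can tell from the label alone where it belongs in the network.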

The cache should also have CDs for Windows and for Linux. For Linux, it would be really nice to have a customized installation that creates a ready-to-use "on-site administration server". It could include MRTG and Nagios, a wiki (and a wiki-syncing system that publishes the locally maintained information out to the public network), a Samba server for file sharing, an issue tracking system, and a volunteer tracking system.

After assembling the cache, we should test our equipment and our approach by running at least one drill. The leader of the drill should also be prepared to commit to responding when the group is activated, and to acting as the project manager.

A drill could take place on a weekend. The team members would travel to the equipment cache location (Mac's farm in Rayville would be an ideal spot). The team would be given a scenario Saturday morning and spend the morning on a surveying, mapping, and planning exercise. By lunch, the team should have a plan in place that addresses the problem posed by the scenario. The team should practice some data management techniques at this point, perhaps using a local wiki to document the plan, using offline mapping software, etc. To simulate the disaster scenario, the team should not use the Internet during the planning stage until they find a satellite uplink to use. Next, the team will choose a subset of the plan to implement on Saturday afternoon and Sunday morning. Sunday afternoon would be dedicated to cleanup.

The subset of the plan to be implemented should include, at the minimum, the following things:

Motivating team members to invest their time in such a drill would be difficult, especially for those who would incur significant expenses while traveling to the drill. It seems likely that only residents of the Gulf States would be able to make it, but that's probably as it should be. It makes sense to build this disaster response capacity in the region where it will be most useful.

The bottom line is this: our approach worked, and it worked in a very difficult situation. With some preparedness, we could be much more effective, requiring fewer volunteers to make the same impact, and doing so in a more timely way.

Future Work

As I've written up this report, I have identified projects that we should undertake as we move forward. They are listed here in no particular order.

Caches

Learn how the equipment caches for command radio systems work. The federal government maintains a cache of ready-to-use radio systems in Idaho. One of these systems was in use on the Waveland water tower; it came from the National Interagency Fire Center, which maintains a radio cache.

One difficulty with operating a cache with computers in it is keeping them up to date and operating correctly. With network hardware, it would probably be enough to upgrade all the devices to a standard version, then store them. Computers need to be updated to the latest patch level as soon as possible after they are put into use. They should be stored with no public services enabled at boot, so that patching can be done before the system is compromised.

Build/find an equipment and volunteer tracking system

Aleks Clark made one, but it was not successfully implemented, by which I mean it fell into disuse as soon as Aleks stopped running it.

An effective system would need to support disconnected operation. When you are building the Internet, an Internet-based system doesn't help you any. It needs to be able to recover from data loss caused by flaky donated hardware, and to move easily when the volunteer who has it on his laptop is ready to go home.
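
One way to get disconnected operation is to give every record a globally unique id and a timestamp, let each laptop keep its own copy as a plain JSON file, and merge copies whenever two laptops meet or a copy is handed to the next volunteer on a USB stick. A minimal sketch; last-writer-wins merging is a technique I am substituting for illustration, not a description of Aleks's system, and it assumes the laptops' clocks are roughly right.

```python
import json
import uuid
from datetime import datetime, timezone


def _now() -> str:
    # Fixed-width UTC timestamps compare correctly as plain strings.
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


def new_record(kind: str, **fields) -> dict:
    """Create an equipment or volunteer record that any laptop can mint offline."""
    return {"id": str(uuid.uuid4()), "kind": kind, "updated": _now(), **fields}


def update(store: dict[str, dict], rid: str, **changes) -> dict:
    """Edit a record and bump its timestamp so the edit wins future merges."""
    record = {**store[rid], **changes, "updated": _now()}
    store[rid] = record
    return record


def merge(*stores: dict[str, dict]) -> dict[str, dict]:
    """Union several copies of the database; the newest edit of each record wins."""
    merged: dict[str, dict] = {}
    for store in stores:
        for rid, record in store.items():
            if rid not in merged or record["updated"] > merged[rid]["updated"]:
                merged[rid] = record
    return merged


def save(store: dict[str, dict], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f, indent=2, sort_keys=True)


def load(path: str) -> dict[str, dict]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Nothing is ever lost by merging two copies, so a laptop dying with flaky donated hardware only costs the edits made since its last merge.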

Watchdog for home-networking routers

Home-networking class routers are cheap and easy to build into complex systems, but they have incredibly bad software. They leak resources, they mysteriously stop doing things they were doing just fine yesterday, etc. They are essentially useless in a network unless they can be automatically reset on a regular schedule. Someone needs to make a smart power brick that power cycles the router every 12 to 24 hours.
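
The logic inside such a smart power brick is simple. A minimal sketch of the schedule, assuming a hypothetical `set_power` callback that drives the brick's relay; the 10-second off period is also an assumption. The clock and sleep functions are parameters so the schedule can be tested without waiting 12 hours.

```python
import time
from typing import Callable


class PowerCycleWatchdog:
    """Power-cycle a home router on a fixed schedule (every 12 to 24 hours).

    `set_power` is a hypothetical callback that switches the brick's relay:
    set_power(False) cuts power, set_power(True) restores it.
    """

    def __init__(
        self,
        set_power: Callable[[bool], None],
        interval_s: float = 12 * 3600,
        off_s: float = 10.0,  # assumed: long enough for the router to fully reset
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        if not 12 * 3600 <= interval_s <= 24 * 3600:
            raise ValueError("interval should be between 12 and 24 hours")
        self.set_power = set_power
        self.interval_s = interval_s
        self.off_s = off_s
        self.clock = clock
        self.sleep = sleep
        self.last_cycle = clock()

    def due(self) -> bool:
        """True once a full interval has passed since the last power cycle."""
        return self.clock() - self.last_cycle >= self.interval_s

    def tick(self) -> bool:
        """Call this periodically; power-cycles the router when it is due."""
        if not self.due():
            return False
        self.set_power(False)
        self.sleep(self.off_s)
        self.set_power(True)
        self.last_cycle = self.clock()
        return True
```

In the brick itself, `tick()` would simply be called once a minute or so from its main loop. Cutting power, rather than asking the router to reboot, is deliberate: it works even when the router's software has wedged.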

OpenWRT based image

Build a custom image for Linksys devices based on the OpenWRT toolset. This would make customer routers more manageable. It could enable auto-configuration systems, and it could allow ping on the WAN interface, which would improve network manageability.

Bandwidth sharing box (and auto uplink selection)

We found that time and again we were offered the use of bandwidth (satellite, DSL, and the T3 (later T1) in Gulfport). We were not in a position to make quick use of these offers. By engineering the system ahead of time to expect those offers, we could be ready to accept them. One challenge is that the offers of IP uplink usually come with strings attached; it's not really an offer of IP transit, but an offer to plug in to their network with your laptop. That means that to take advantage of the network connection, you'll need to use DHCP to get yourself an IP address, then somehow live with the fact that you are NAT'ed, and maybe even firewalled such that only HTTP works (and maybe only via an HTTP proxy).

Instead of envisioning our network as a distribution network downstream of a single NAT box, envision it as a set of one or more distribution networks connecting in to a remote "stable NAT environment" via VPNs. Every time someone offers us some bandwidth, we'll bring a PC over, load a Linux LiveCD with our software on it, and boot it. The PC has two interfaces, an inside and an outside. The outside interface gets an address via DHCP and then opens a TCP channel to the VPN server in the stable NAT environment, which is hosted in a managed datacenter far outside the disaster area. It measures the quality of the link back to the NAT server (if it can make the link at all). The inside interface runs VRRP and asks around on the network segment it is plugged into whether anyone out there can provide a better link to the Internet than this server can. If so, it waits as a standby. If not, it takes control of the router address, and all the customer routers that had been failing to reach their default router can now talk to the Internet via this new connection.

Later, when we get the long haul link up to a T1, the lower latency on that connection gives it a higher quality link measurement and it advertises a better link. The box that last had the link goes back to standby, and the high quality link is now the master link outbound.
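
The election part of the proposed box can be sketched in a few lines. This is not VRRP itself (no advertisements go over the wire here); it only models the decision each box would make, assuming a box that cannot open its tunnel drops out, and that link quality maps to a VRRP-style priority through a hypothetical formula weighted by round-trip time and packet loss. As in VRRP, equal priorities are broken in favor of the higher IP address. The RTT and loss figures in the example are invented.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class UplinkBox:
    name: str
    address: str           # the box's inside address; breaks priority ties
    reachable: bool        # could it open its tunnel to the NAT server at all?
    rtt_ms: float = 0.0    # measured round-trip time to the NAT server
    loss_pct: float = 0.0  # measured packet loss to the NAT server


def priority(box: UplinkBox) -> int:
    """Map a link measurement to a VRRP-style priority, 1-254, higher is better.

    The weights are hypothetical. A box with no tunnel returns 0, meaning it
    does not compete for the router address at all.
    """
    if not box.reachable:
        return 0
    score = 254 - box.rtt_ms / 10 - box.loss_pct * 5
    return max(1, min(254, int(score)))


def elect_master(boxes: list[UplinkBox]) -> Optional[UplinkBox]:
    """Pick the box that should hold the router address; the rest stay on standby."""
    candidates = [b for b in boxes if priority(b) > 0]
    if not candidates:
        return None
    # Highest priority wins; on a tie the higher IP address wins, as in VRRP.
    return max(candidates, key=lambda b: (priority(b), ipaddress.ip_address(b.address)))


satellite = UplinkBox("satellite", "10.10.0.2", True, rtt_ms=650, loss_pct=2)
t1 = UplinkBox("t1-gulfport", "10.10.0.3", True, rtt_ms=40)
church_dsl = UplinkBox("church-dsl", "10.10.0.4", False)
```

With only the satellite box and a DSL box whose tunnel is down, the satellite box holds the router address (priority 179). Once the T1 box comes up with a 40 ms RTT, its priority of 250 outranks the satellite box and it takes over, which is the handover described above.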

Because NAT for the entire network is happening in the stable NAT environment, beyond the VPN connections, the customers never even see their connections get broken. The other nice thing about implementing NAT in the stable part of the network is that it will be ready and pre-tested before the deployment. People on the ground will only need to install pre-configured devices from the cache, and run uplink sharing boxes at the edges of the net.

No one has attempted to build a system like this yet. We don't know if it would work, and if so, how much standard software would be involved, and how much custom (and thus buggy, difficult to maintain) software would be required.

As icing on the cake, it would be neat to build bandwidth throttling into this proposed box. So, when we are offered bandwidth for our personal administrative use, we can say, "How about if we share it with the whole network?" They will undoubtedly say, "No, just your laptop." Then we can say, "How about if we promise it will never use more than 20% of your bandwidth?" That would be a very valuable tool to have during negotiations. And as decoration on the icing on the cake, the bandwidth throttling could be controlled by cron, so that after work hours we get 90% of their bandwidth.
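
Both halves of that idea fit in a short sketch: a schedule function (the part cron would drive) that picks the allowed share of the host's link, and a token-bucket limiter that enforces it. The 20%/90% split comes from the example above; the 08:00 to 18:00 work hours are an assumption. On a real Linux box the enforcement would more likely be left to the kernel's traffic control (`tc`) than done in Python; this only illustrates the logic.

```python
import time
from datetime import datetime
from typing import Callable

# From the negotiation example above; the work-hour boundaries are assumed.
WORK_START_HOUR, WORK_END_HOUR = 8, 18
WORK_SHARE, AFTER_HOURS_SHARE = 0.20, 0.90


def allowed_share(now: datetime) -> float:
    """Fraction of the host's bandwidth we may use now (the part cron would drive)."""
    if WORK_START_HOUR <= now.hour < WORK_END_HOUR:
        return WORK_SHARE
    return AFTER_HOURS_SHARE


def rate_cap_bps(host_link_bps: float, now: datetime) -> float:
    """Bits per second our shared traffic may use on the host's link right now."""
    return host_link_bps * allowed_share(now)


class TokenBucket:
    """Token-bucket limiter: tokens (bytes) refill at the capped rate, up to a burst size."""

    def __init__(self, rate_bps: float, burst_bytes: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate_bytes = rate_bps / 8
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate_bytes)
        self.last = now

    def set_rate(self, rate_bps: float) -> None:
        """Change the cap, e.g. when the work-hours schedule flips."""
        self._refill()  # credit the time elapsed at the old rate first
        self.rate_bytes = rate_bps / 8

    def try_send(self, nbytes: int) -> bool:
        """Spend tokens and return True if a packet of nbytes may go out now."""
        self._refill()
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
```

A cron job at the start and end of work hours would call `set_rate(rate_cap_bps(...))`, so the promise made to the host is enforced by the box rather than by our good behavior.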


Versions of this document


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.