Lessons Learned Entry: 1463

Lesson Info:
• Lesson Number: 1463
• Lesson Date: 2003-08-31
• Submitting Organization: JSC
• Submitted by: David Lengyel

Subject: Accident Investigations/Information Technology (IT) Support Requirements for Major Mishap Investigation

Abstract: An in-place, operational
web-based Mishap Investigation Support System on hot standby, and a designated management team ready to support operational requirements and customization, are needed to support accident investigation/information technology requirements for a major mishap.

Description of Driving Event: There was no single NASA system on February 1, 2003, that could meet the requirements set for the Columbia Accident Investigation Board (CAIB) and Columbia Task Force (CTF). The Process Based Mission Assurance - Enhanced Security (PBMA-ES) system, using a proven COTS “engine,” was rapidly deployed within a dedicated stand-alone hardware/software environment to provide a high level of security, including 2-factor strong user authentication. The system provided immediate, secure, web-based operability of key functionality to support a mobile, in-the-field investigation activity, including action tracking, calendar, document management, and database capabilities. Neither the CAIB, the CTF, nor the PBMA management team could reasonably anticipate or identify, at the beginning of the investigation, all of the requirements (both functional and interface) needed to support the investigation. The requirements “critical
needs” continued to evolve from the initial request for “action tracking,” i.e., customization of JSC Form 564, to web links, to full-text search, to advanced library support capabilities. In addition, particularly early on, there was a continuous, often tension-filled negotiation between the needs of the multi-headed customer(s) (CAIB) and the IT security community. The absence of clear IT security policies exacerbated the requirements change process. Finally, the degree and complexity of the library management requirements, as driven by the Department of Justice (DOJ), did not emerge until a month after implementation of the PBMA-ES.

Provided by IHS. Not for Resale. No reproduction or networking permitted without license from IHS.

This real-time customization, with “on-the-fly” changes constantly in work, resulted in a major hardware/software configuration management challenge, organizational policy disconnects, and ultimately frustrated customers who were looking for immediate implementation of their desired changes.

Lesson(s) Learned: Real-time, crisis-mode systems engineering can never capture all potential future requirements. You do the best that you can and make the best decisions you can with the information you have at the time. The overarching lesson learned is that NASA needs to have a web-based Mishap Investigation Support System (MISS) in place and operational, “locked and loaded” on hot standby, to support major mishaps. The system must fulfill the requirement set identified in the next section. It is equally important to have a designated (hot standby) management team ready to support the myriad operational requirements and customization needs that one can expect to emerge. In the final analysis, the experienced PBMA management team provided the necessary support to implement the dynamically evolving CAIB requirements.

Recommendation(s): Establish a NASA baseline information technology Mishap Investigation Requirements Set (MIRS) to include the following elements:
• MISS Management Team: Experience dictates that a management team be available to quickly
respond to a major mishap and be able to quickly implement mishap-specific plans and procedures such as:
   - Configuration management
   - IT Security Plan modifications
   - On-site user orientation/training
   - 24x7 technical support during the early phase of an investigation
   - Data backup and archival activities
   - Work group administration
• Financial/resource management
• Sustaining engineering and operations
• Collaborative Environment: Use a web-based, fully integrated COTS knowledge management system application to provide immediate field support for the investigation team (e.g., members, contacts, calendars, task management, web links). A COTS product will ensure that best-available, market-driven capabilities are available to support mishap investigation teams.
• Independence/Portability: Expect to confront the issue of accident board independence, and ensure that the MISS is established as a stand-alone hardware/software infrastructure behind a NASA firewall but portable, so that it can be easily moved to any NASA Center.
• IT Security: A well-defined and documented NASA IT security policy applicable to all systems is essential. Recommend baselining NASA system host-center firewalls, secure socket layer encryption, and a secure but usable single-factor user-authentication policy. A generic but tailorable IT security plan should be in place for the hot standby system.
• Hardware/Software Infrastructure: Use a commercial-like, multiple-web-server front end to ensure a high level of reliability and operability.
• Interfaces: Identify mandatory interfaces for compatibility with other support tool sets. Establish interface control agreements as part of MISS development. Interface control testing should verify interoperability.
• Library Services / Massive File Manipulation and Data Archiving: Co-locate the web servers with standalone archive/library servers if it is deemed necessary to perform frequent massive file upload, download, and manipulation. It is
inefficient and time consuming to transfer massive files via the internet.
• Expect to Customize/Change Management: Experience suggests that no system can meet all potential requirements that may evolve during an investigation. Anticipating the need to customize, the MISS management team must have in place:
   - a baseline configuration management plan and process
   - a change process with clear roles and responsibilities for requirements management and change approval
   - a dedicated beta-test server to conduct verification testing for hardware/software changes

Evidence of Recurrence Control Effectiveness:
TBD NASA Response

Documents Related to Lesson: Agency Contingency Action Plan for Space Flight Operations

Mission Directorate(s):
• Space Operations
• Exploration Systems

Additional Key Phrase(s):
• Accident Investigation
• Administration/Organization
• Computers
• Configuration Management
• Information Technology/Systems
• NASA Standards
• Policy & Planning
• Safety & Mission Assurance
• Security

Additional Info:

Approval Info:
• Approval Date: 2004-06-16
• Approval Name: Ronald Montague
• Approval Organization: JSC
• Approval Phone Number: 281-483-8576
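
Illustrative sketch (not part of the original lesson): the change process recommended under Expect to Customize/Change Management, with clear roles for change approval and verification on a beta-test server before deployment, could be modeled as a small state machine. Everything named here is a hypothetical assumption made for illustration: the `ChangeRequest` class, the state names, and the role names (`change_board`, `test_engineer`, `config_manager`) do not come from the PBMA-ES system or any NASA procedure.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    BETA_VERIFIED = "beta_verified"
    DEPLOYED = "deployed"
    REJECTED = "rejected"


# Assumed mapping: which (hypothetical) role may perform each transition.
ALLOWED = {
    (State.SUBMITTED, State.APPROVED): "change_board",
    (State.SUBMITTED, State.REJECTED): "change_board",
    (State.APPROVED, State.BETA_VERIFIED): "test_engineer",
    (State.BETA_VERIFIED, State.DEPLOYED): "config_manager",
}


@dataclass
class ChangeRequest:
    title: str
    requested_by: str
    state: State = State.SUBMITTED
    history: list = field(default_factory=list)

    def advance(self, new_state: State, actor: str, role: str) -> None:
        """Move to new_state only if the transition exists and the role matches."""
        required = ALLOWED.get((self.state, new_state))
        if required is None:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        if role != required:
            raise PermissionError(
                f"{new_state.name} requires role {required!r}, got {role!r}"
            )
        # Record who moved the request, for configuration management audits.
        self.history.append((self.state, new_state, actor))
        self.state = new_state


# Usage: a request cannot reach production without approval and beta testing.
request = ChangeRequest("Add full-text search", requested_by="investigator")
request.advance(State.APPROVED, "alice", "change_board")
request.advance(State.BETA_VERIFIED, "bob", "test_engineer")
request.advance(State.DEPLOYED, "carol", "config_manager")
```

The point of the table-driven design is that an urgent "on-the-fly" request still passes through approval and beta verification; attempting to jump straight from submitted to deployed raises an error instead of silently changing the production configuration.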