Lessons Learned Entry: 1483

Lesson Info:
- Lesson Number: 1483
- Lesson Date: 2004-08-23
- Submitting Organization: JPL
- Submitted by: Mark Boyles / David Oberhettinger

Subject: MER Spirit Flash Memory Anomaly (2004)

Abstract: Shortly after the commencement of science activities on Mars, an MER rover lost the ability to execute any task that requested memory from the flight computer. The cause was incorrect configuration parameters in two operating system software modules that control the storage of files in system memory and flash memory. Seven recommendations cover enforcing design guidelines for COTS software, verifying assumptions about software behavior, maintaining a list of lower priority action items, testing flight software internal functions, creating a comprehensive suite of tests and automated analysis tools, providing downlinked data on system resources, and avoiding the problematic file system and complex directory structure.

Description of Driving Event: Shortly after the commencement of science activities on Mars, the “Spirit” rover lost the ability to execute any task that requested memory from the flight computer. The rover operated in a degraded mode until 15 days later, when normal operations were restored and science activities resumed. The root cause of the failure was traced to incorrect configuration parameters in two operating system software modules that control the storage of files in system memory (heap) and flash memory. A parameter in the dosFsLib module permitted the unlimited consumption of system memory as the flash memory space was exhausted. A parameter in the memPartLib module was incorrectly set to suspend the execution of any task employing memory when no additional memory was available. Task suspension forces a reset of the flight computer, and it is never supposed to occur. The initial reset event was triggered by the creation of a large number of files associated with MER instrument calibration that overburdened flash memory, and then system memory. The reset did not clear flash memory because flash memory is non-volatile
by design. Although the reset did delete the files in system memory, the total size of the file system structure is determined not by the number of current files but rather by the maximum number of files that has ever existed. Since neither memory was cleared by the initial reset, a cycle of repetitive computer resets and flight software re-initializations ensued. The effects of overburdened flash and system memory were neither recognized nor tested during system-level ground testing. Mission Operations recovered the mission by manually reallocating system memory, deleting unnecessary directories and files, and commanding the rover to create a new file system. Because revision of flight software was considered too risky, operational changes were implemented for both MER vehicles to improve oversight of rover file management.

Provided by IHS. Not for Resale. No reproduction or networking permitted without license from IHS.

References:
1. JPL Incident Surprise Anomaly Report (ISA) No. Z83174, January 29, 2004.
2. Glenn Reeves, Tracy Neilson & Todd Litwin, “Mars Exploration Rover Spirit Vehicle Anomaly Report,” Jet Propulsion Laboratory Document No. D-22919, May 12, 2004.
3. Mars Exploration
Rover Project Library, Collections 13788 and 13664.

Additional Key Words: flight software architecture, flight software design, flight software requirements, file system corruption, shutdown failure, autonomous shutdown, system memory space, repetitive resets, avionics, system reboot

Lesson(s) Learned: A severely compressed flight software development schedule may prevent the team from fully understanding software functions. During the MER software development process there was a continuous reprioritization of activities and focus. One impact of this dynamic process was that only the highest priority flight software issues and problems could be addressed, and memory management problems were viewed as low risk.

Recommendation(s):
1. Enforce the project-specific design guidelines for COTS software, as well as for NASA-developed software. Assure that the flight software development
team reviews the basic logic and functions of commercial off-the-shelf (COTS) software, with briefings and participation by the vendor.
2. Verify assumptions regarding the expected behavior of software modules. Do not use a module without detailed peer review, and assure that all design and test issues are addressed.
3. Where the software development schedule forestalls completion of lower priority action items, maintain a list of incomplete items that require resolution before final configuration of the flight software.
4. Place high priority on completing tests to verify the execution of flight software internal functions.
5. Early in the software development process, create a comprehensive suite of tests and automated analysis tools. Ensure that reporting of flight-computer-related resource usage is included.
6. Ensure that the flight software downlinks data on system resources (such as the free system memory) so that the actual and expected behavior of the system can be compared.
7. For future missions, implement a more robust version of the dosFsLib module, and/or use a different type of
file system and a less complex directory structure.

Evidence of Recurrence Control Effectiveness: Preventive Action Notice No. Z87148 was opened by JPL on August 2, 2005 to initiate and document appropriate Laboratory-wide corrective action on the above recommendations.

Documents Related to Lesson: “Design, Verification/Validation and Operations Principles for Flight Systems,” Rev. 2, Jet Propulsion Laboratory Document No. D-17868, Section 4.11: Flight Software System Design, March 3, 2003.

Mission Directorate(s):
- Exploration Systems
- Science
- Aeronautics Research

Additional Key Phrase(s):
- Computers
- Flight Equipment
- Flight Operations
- Ground Operations
- Hardware
- Independent Verification and Validation
- Information Technology/Systems
- Payloads
- Risk Management/Assessment
- Safety & Mission Assurance
- Software
- Spacecraft
- Test & Verification

Additional Info:

Approval Info:
- Approval Date: 2004-09-22
- Approval Name: Carol Dumain
- Approval Organization: JPL
- Approval Phone Number: 818-354-8242
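
Illustrative sketch: The reset cycle described under Description of Driving Event can be approximated with a toy model. The sketch assumes a file system whose in-memory structure is sized by the peak number of files ever created, not by the current file count. The class name, the per-entry memory cost, and the heap size are hypothetical values chosen for illustration. This is not the dosFsLib or memPartLib interface, and it does not reproduce the flight software.

```python
class ToyFlashFileSystem:
    """Toy model: memory needed to mount grows with the peak file count."""

    BYTES_PER_ENTRY = 64  # hypothetical heap cost per directory entry

    def __init__(self):
        self.current_files = 0
        self.peak_files = 0  # kept in non-volatile storage, survives resets

    def create_files(self, count):
        self.current_files += count
        self.peak_files = max(self.peak_files, self.current_files)

    def delete_files(self, count):
        # Deleting files lowers the current count but not the peak.
        self.current_files = max(0, self.current_files - count)

    def reformat(self):
        # Building a new file system is the only way to reset the peak.
        self.current_files = 0
        self.peak_files = 0

    def heap_needed_to_mount(self):
        return self.peak_files * self.BYTES_PER_ENTRY


def boot(fs, heap_bytes):
    """Return True if the file system mounts; False models a forced reset."""
    return fs.heap_needed_to_mount() <= heap_bytes


HEAP_BYTES = 64_000  # hypothetical heap available to the file system

fs = ToyFlashFileSystem()
fs.create_files(500)
ok_before = boot(fs, HEAP_BYTES)              # 32,000 bytes needed: mounts

fs.create_files(1000)                         # burst of calibration files
fails_after_burst = not boot(fs, HEAP_BYTES)  # 96,000 bytes needed: reset

fs.delete_files(1400)                         # only 100 files remain
still_failing = not boot(fs, HEAP_BYTES)      # peak unchanged: reset again

fs.reformat()
fs.create_files(100)
recovered = boot(fs, HEAP_BYTES)              # 6,400 bytes needed: mounts
```

In this model, deleting files alone does not break the reset loop; only rebuilding the file system lowers the mount cost, which is consistent with the recovery step of commanding the rover to create a new file system.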