When software designers get it fundamentally wrong.

TheGreenGoblin
Chief Pilot
Posts: 17596
Joined: Thu Aug 08, 2019 11:02 pm
Location: With the Water People near Trappist-1

#1 Post by TheGreenGoblin » Fri Apr 10, 2020 2:28 pm

Yesterday I was perusing some notes I had made some years ago for a conference on software used to automatically test software-based systems. The point was that such testing problems can be NP-hard, or even undecidable, and thus cannot practically be verified this way, even by software (as opposed to NP-complete problems, where a proposed solution can at least be checked efficiently). I noted that I had used the example of the failure of the Ariane 5 rocket during the launch of the Cluster constellation of satellites, which I had somewhat crudely entitled 'clusterf@ck'. A number of issues in the launcher's software design ultimately led to an integer overflow that caused an inertial navigation system exception and reset, which caused the vehicle to start gimballing wildly and break up, resulting in its destruction, either manually by the range safety officer or automatically by an on-board system (which itself would have needed to be software tested/verified). Anyway, I was wondering about similar issues that have been encountered in the aviation sphere and was minded to look at the record. I wondered if any system designers, ATC personnel or pilots here, or anybody else who is interested in this kind of thing, had any stories to tell that are relevant to these kinds of issues.
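For anyone who has not met this class of bug: the published inquiry report on the Ariane 501 failure traced it to a 64-bit floating-point value being converted to a 16-bit signed integer that could not hold it. The sketch below is not the flight code (that was written in Ada). It only illustrates three ways such a conversion can behave, and the function names are hypothetical.

```python
# Illustrative only: Python has no fixed-width integers, so the 16-bit
# behaviour is emulated by hand. All names here are hypothetical.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_wrapping(value: float) -> int:
    """Truncate toward zero and keep only the low 16 bits (two's complement)."""
    raw = int(value) & 0xFFFF
    return raw - 0x10000 if raw > INT16_MAX else raw


def to_int16_checked(value: float) -> int:
    """Refuse out-of-range input instead of silently corrupting it."""
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return int(value)


def to_int16_saturating(value: float) -> int:
    """Clamp to the representable range, a common defensive choice."""
    return int(max(INT16_MIN, min(INT16_MAX, value)))
```

Which of the three is right depends on what the value is used for downstream. The point is only that the choice has to be made deliberately for every conversion, which is exactly the kind of thing that is hard to prove by testing alone.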

In my ongoing search, I found this interesting case dating back to 2015.
A software vulnerability in Boeing's new 787 Dreamliner jet has the potential to cause pilots to lose control of the aircraft, possibly in mid-flight, Federal Aviation Administration officials warned airlines recently.

The bug, which is either a classic integer overflow or one very much resembling it, resides in one of the electrical systems responsible for generating power, according to a memo the FAA issued last week. The vulnerability, which Boeing reported to the FAA, is triggered when a generator has been running continuously for a little more than eight months. As a result, FAA officials have adopted a new airworthiness directive (AD) that airlines will be required to follow, at least until the underlying flaw is fixed.

"This AD was prompted by the determination that a Model 787 airplane that has been powered continuously for 248 days can lose all alternating current (AC) electrical power due to the generator control units (GCUs) simultaneously going into failsafe mode," the memo stated. "This condition is caused by a software counter internal to the GCUs that will overflow after 248 days of continuous power. We are issuing this AD to prevent loss of all AC electrical power, which could result in loss of control of the airplane."
787 Dreamliner integer overflow underflow problem
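The FAA text only says "a software counter" overflows after 248 days. A commonly suggested explanation, which is an assumption here and not something the AD states, is a signed 32-bit counter incremented every hundredth of a second. A quick check of the arithmetic, with a hypothetical helper name:

```python
SECONDS_PER_DAY = 86_400


def days_until_overflow(bits: int, ticks_per_second: int, signed: bool = True) -> float:
    """Days of continuous counting before a counter of this width overflows.

    Assumes the counter starts at zero and increments once per tick.
    """
    max_count = 2 ** (bits - 1) if signed else 2 ** bits
    return max_count / ticks_per_second / SECONDS_PER_DAY


# Assumed: signed 32-bit counter, 100 ticks per second (10 ms resolution).
print(f"{days_until_overflow(32, 100):.2f} days")
```

Under those assumptions the result lands between 248 and 249 days, which is at least consistent with the figure in the directive.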
Though you remain
Convinced
"To be alive
You must have somewhere
To go
Your destination remains
Elusive."

Boac
Chief Pilot
Posts: 17255
Joined: Fri Aug 28, 2015 5:12 pm
Location: Here

#2 Post by Boac » Fri Apr 10, 2020 2:51 pm

In the great scheme of things, having a software problem with a generator that has been "running continuously for a little more than eight months" is an issue that can be ignored. Who on earth would do that? Crew would be well out of hours................ =))

#3 Post by TheGreenGoblin » Fri Apr 10, 2020 3:03 pm

Boac wrote:
Fri Apr 10, 2020 2:51 pm
In the great scheme of things, having a software problem with a generator that has been "running continuously for a little more than eight months" is an issue that can be ignored. Who on earth would do that? Crew would be well out of hours................ =))
:)

You'd think so, wouldn't you, but stranger things have happened! I cite the ATC failure in Southern California on 14 September 2004. This occurred because the system hadn't been reset for over a month, whereas an APU running for 8 months is highly unlikely I grant you.
A bug in a Microsoft system compounded by human error was ultimately responsible for a three-hour radio breakdown that left hundreds of aircraft aloft without guidance Tuesday last week, according to a report in the LA Times.

Nearly all of Southern California's airports were shut down, and five incidents where aircraft broke separation guidelines were reported. In one case, a pilot had to take evasive action.

The newspaper said that a Microsoft-based replacement for an older Unix system needed to be reset every thirty days 'to prevent data overload', as a result of problems found when the system was first rolled out. However, a technician failed to perform the reset at the right time, and an internal clock within the system subsequently shut it down. A back-up system also failed.

Cause of the SNAFU

#4 Post by Boac » Fri Apr 10, 2020 3:18 pm

....and there is also something else SERIOUSLY wrong with the design of the aeroplane if an APU generator trip can shut down all the electrics?

#5 Post by TheGreenGoblin » Fri Apr 10, 2020 3:19 pm

Boac wrote:
Fri Apr 10, 2020 3:18 pm
....and there is also something else SERIOUSLY wrong with the design of the aeroplane if an APU generator trip can shut down all the electrics?
+1

#6 Post by Boac » Fri Apr 10, 2020 4:22 pm

It saddens me to call BS, TGG, on a venerable long-term O-N 'er, but I must: Some questions:
When/why would an APU or APU generator be run for 8 months non-stop?
What is the service interval for a 787 APU?
How often is a battery check or change done on 787?
How does OPS ensure the aircraft never shuts down at a noise-sensitive airfield with APU restrictions?
Did you mistime this post by 29 days? :))

#7 Post by TheGreenGoblin » Fri Apr 10, 2020 4:28 pm

Boac wrote:
Fri Apr 10, 2020 4:22 pm
It saddens me to call BS, TGG, on a venerable long-term O-N 'er, but I must: Some questions:
When/why would an APU or APU generator be run for 8 months non-stop?
What is the service interval for a 787 APU?
How often is a battery check or change done on 787?
How does OPS ensure the aircraft never shuts down at a noise-sensitive airfield with APU restrictions?
Did you mistime this post by 29 days? :))
I did say this Boac... =))
whereas an APU running for 8 months is highly unlikely I grant you.
This fault would have been found by testing the code itself, most likely using a software-based code testing tool! The fact that this kind of potential overflow error was found (even in this most unlikely set of circumstances) raises questions about whether similar errors have been overlooked elsewhere in the code. The FAA called this one out and they were right to do so.

Code quality and safety uber alles! :-B

Ve will NOT tolerate errors Boac, even though zo ve have an error prone human being in ze cockpit... ;)))

G-CPTN
Chief Pilot
Posts: 7644
Joined: Sun Aug 05, 2018 11:22 pm
Location: Tynedale
Age: 79

#8 Post by G-CPTN » Fri Apr 10, 2020 4:46 pm

AIUI, certain conditions (such as consecutive flights with cleaning and maintenance in between) can result in 24-7 powering.
It is usual for commercial airliners to spend weeks or more continuously powered on as crews change at airports, or ground power is plugged in overnight while cleaners and maintainers do their thing.

#9 Post by Boac » Fri Apr 10, 2020 5:01 pm

'or ground power is plugged in overnight' - ferzackerly

#10 Post by TheGreenGoblin » Fri Apr 10, 2020 5:02 pm

Lest Boac thinks I was yanking his proverbial control column...
AGENCY:
Federal Aviation Administration (FAA), DOT.

ACTION:
Final rule; request for comments.

SUMMARY:
We are adopting a new airworthiness directive (AD) for all The Boeing Company Model 787 airplanes. This AD requires a repetitive maintenance task for electrical power deactivation on Model 787 airplanes. This AD was prompted by the determination that a Model 787 airplane that has been powered continuously for 248 days can lose all alternating current (AC) electrical power due to the generator control units (GCUs) simultaneously going into failsafe mode. This condition is caused by a software counter internal to the GCUs that will overflow after 248 days of continuous power. We are issuing this AD to prevent loss of all AC electrical power, which could result in loss of control of the airplane.


DATES:
This AD is effective May 1, 2015.

The Director of the Federal Register approved the incorporation by reference of certain publications listed in this AD as of May 1, 2015.

We must receive comments on this AD by June 15, 2015.
The context to this AD is that all these glitches came out of the woodwork after the battery fires that hit the Dreamliner in early service: they did a forensic investigation of the hardware and software and were worried about the quality issues that were coming out...

I suspect the GCU counter may have continued counting if the system was permanently powered up, either by the APU or batteries or the external power source.

Now perhaps I can persuade Boac that salmon live in trees and eat pencils... =))

#11 Post by Boac » Fri Apr 10, 2020 5:09 pm

"Now perhaps I can persuade Boac that salmon live in trees and eat pencils" - I knew that, 'cos I know everything about everything, remember? (remind you of anyone?). OK - I relent and apologise. BS Charge dismissed. No leg-irons or Iron Maiden for you. (Why do I now expect another music video................... :)) )

Why the FAA went to the trouble to produce that AD I cannot fathom, especially when we look at how they treated the Max!

#12 Post by TheGreenGoblin » Fri Apr 10, 2020 5:14 pm

Boac wrote:
Fri Apr 10, 2020 5:09 pm
"Now perhaps I can persuade Boac that salmon live in trees and eat pencils" - I knew that, 'cos I know everything about everything, remember? (remind you of anyone?). OK - I relent and apologise. BS Charge dismissed. No leg-irons or Iron Maiden for you. (Why do I now expect another music video................... :)) )

Why the FAA went to the trouble to produce that AD I cannot fathom, especially when we look at how they treated the Max!
;)))

You do tempt with the Iron Maiden mind...

#13 Post by TheGreenGoblin » Fri Apr 10, 2020 5:33 pm

Another AD, with a partially similar potential root cause of failure (and a more likely one than the previous issue dissected above).
DEPARTMENT OF TRANSPORTATION
Federal Aviation Administration
14 CFR Part 39
[Docket No. FAA-2020-0205; Product Identifier 2020-NM-024-AD; Amendment 39-19883; AD 2020-06-14]
RIN 2120-AA64
Airworthiness Directives; The Boeing Company Airplanes
AGENCY: Federal Aviation Administration (FAA), DOT.
ACTION: Final rule; request for comments.
SUMMARY: The FAA is adopting a new airworthiness directive (AD) for all The Boeing Company Model 787-8, 787-9, and 787-10 airplanes. This AD requires repetitive cycling of the airplane electrical power. This AD was prompted by a report that the stale-data monitoring function of the common core system (CCS) may be lost when continuously powered on for 51 days. This could lead to undetected or unannunciated loss of common data network (CDN) message age validation, combined with a CDN switch failure. The FAA is issuing this AD to address the unsafe condition on these products.
The US Federal Aviation Administration has ordered Boeing 787 operators to switch their aircraft off and on every 51 days to prevent what it called "several potentially catastrophic failure scenarios" – including the crashing of onboard network switches.

The airworthiness directive, due to be enforced from later this month, orders airlines to power-cycle their B787s before the aircraft reaches the specified days of continuous power-on operation.

The power cycling is needed to prevent stale data from populating the aircraft's systems, a problem that has occurred on different 787 systems in the past.

According to the directive itself, if the aircraft is powered on for more than 51 days this can lead to "display of misleading data" to the pilots, with that data including airspeed, attitude, altitude and engine operating indications. On top of all that, the stall warning horn and overspeed horn also stop working.

This alarming-sounding situation comes about because, for reasons the directive did not go into, the 787's common core system (CCS) stops filtering out stale data from key flight control displays. That stale data-monitoring function going down in turn "could lead to undetected or unannunciated loss of common data network (CDN) message age validation, combined with a CDN switch failure".
https://www.theregister.co.uk/2020/04/0 ... tale_data/

The sang froid of these Boeing Captains is marvelous to behold...
The CDN is a Boeing avionics term for the 787's internal Ethernet-based network. It is built to a slightly more stringent aviation-specific standard than common-or-garden Ethernet, that standard being called ARINC 664. More about ARINC 664 can be read here.

https://www.aim-online.com/products-ove ... -tutorial/

Airline pilots were sanguine about the implications of the failures when El Reg asked a handful about the directive. One told us: "Loss of airspeed data combined with engine instrument malfunctions isn't unheard of," adding that there wasn't really enough information in the doc to decide whether or not the described failure would be truly catastrophic. Besides, he said, the backup speed and attitude instruments are – for obvious reasons – completely separate from the main displays.
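Neither the directive nor the article says why the limit is 51 days, so what follows is not a claim about the 787's actual code. It is only a minimal sketch of how "message age validation" can quietly stop working when timestamps come from a free-running counter that wraps. The counter width, the freshness limit and the function names are all assumptions.

```python
COUNTER_BITS = 32              # assumed width of the free-running tick counter
MODULUS = 1 << COUNTER_BITS
MAX_AGE_TICKS = 500            # assumed freshness limit, in ticks


def is_stale_naive(now: int, stamp: int, max_age: int = MAX_AGE_TICKS) -> bool:
    """Plain subtraction: goes negative after a wrap, so old data looks fresh."""
    return (now - stamp) > max_age


def is_stale_wrap_safe(now: int, stamp: int, max_age: int = MAX_AGE_TICKS) -> bool:
    """Modular subtraction: correct as long as the true age is below MODULUS."""
    age = (now - stamp) % MODULUS
    return age > max_age


# A message stamped 100 ticks before the counter wrapped, checked 1000 ticks
# after the wrap, is really 1100 ticks old and should be rejected.
stamp = MODULUS - 100
now = 1000
print(is_stale_naive(now, stamp), is_stale_wrap_safe(now, stamp))
```

The nasty property of the naive check is that it fails silently: nothing crashes, the monitor simply stops rejecting anything, which matches the "undetected or unannunciated" wording in the AD at least in spirit.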

boing
Chief Pilot
Posts: 2714
Joined: Thu Aug 27, 2015 6:32 am
Location: Beautful Oregon USA
Age: 77

#14 Post by boing » Sat Apr 11, 2020 2:49 am

Slight divergence because I realize that this would actually be a crew error but I could never really understand the logic of this design. In the early days of INS there was the "Jumbo Graveyard".

The early Omegas could only hold a limited number of waypoints, 8 if I remember correctly. The problem occurred if you had forgotten to enter the required new waypoints when the box reached waypoint 8. The INS was, at least initially, programmed so that if the new waypoint had not been entered the aircraft would turn to zero N/S and zero E/W and merrily proceed on its way without further warnings. Since this was, of course, off the coast of Africa and the most likely aircraft to face the error was the 747 Jumbo the zero/zero destination became known as the "Jumbo Graveyard".

I do not know if any crew actually headed to Africa but eventually the system was re-programmed so that the aircraft simply maintained present heading if no new waypoint had been entered.
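The design question in that story is what an empty waypoint slot should mean. A rough sketch of the two behaviours described above, assuming nothing about the real box: the types, names and command tuples are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Waypoint:
    lat: float
    lon: float


def target_with_zero_default(slots: List[Tuple[float, float]], index: int) -> Tuple[float, float]:
    """Original behaviour as described: an unfilled slot still holds (0.0, 0.0),
    which is indistinguishable from a deliberately entered position."""
    return slots[index]


def steering_command(slots: List[Optional[Waypoint]], index: int, present_heading: float):
    """Revised behaviour as described: an empty slot is explicit (None), and the
    fallback is to hold the present heading rather than fly to 0/0."""
    wp = slots[index] if index < len(slots) else None
    if wp is None:
        return ("HOLD_HEADING", present_heading)
    return ("FLY_TO", wp)
```

The underlying lesson is the usual one about sentinel values: zero is a perfectly valid latitude and longitude, so it cannot double as "nothing entered".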

the dreamers of the day are dangerous men, for they may act on their dreams with open eyes, to make them possible.

#15 Post by TheGreenGoblin » Sat Apr 11, 2020 11:41 am

boing wrote:
Sat Apr 11, 2020 2:49 am
Slight divergence because I realize that this would actually be a crew error but I could never really understand the logic of this design. In the early days of INS there was the "Jumbo Graveyard".

The early Omegas could only hold a limited number of waypoints, 8 if I remember correctly. The problem occurred if you had forgotten to enter the required new waypoints when the box reached waypoint 8. The INS was, at least initially, programmed so that if the new waypoint had not been entered the aircraft would turn to zero N/S and zero E/W and merrily proceed on its way without further warnings. Since this was, of course, off the coast of Africa and the most likely aircraft to face the error was the 747 Jumbo the zero/zero destination became known as the "Jumbo Graveyard".

I do not know if any crew actually headed to Africa but eventually the system was re-programmed so that the aircraft simply maintained present heading if no new waypoint had been entered.
Did the Omega give any other indication that it had run out of waypoints, save for heading off on its zero-sum way? A trumpeting elephant alarm perhaps...


I suppose the first space/aviation software overflow issue that really made it into the headlines was the Apollo 11's Infamous Landing Error Code 1202



Don Eyles

Jack Garman

#16 Post by TheGreenGoblin » Sat Apr 11, 2020 11:56 am

A trumpeting elephant alarm perhaps...
Or something like this. No space for feeble chimes in aerospace...



Never fly through the centre line again. Only the fully established will hear the silence. Better than a biting dog to a pilot and therefore far more useful than any FO. Avoid that hypoxic torpor. Be situationally and altitudinally woke! Fly right and enjoy the silence.

Don't die sucker... stay awake...

[Thread drift off/]

#17 Post by TheGreenGoblin » Sun Apr 12, 2020 10:10 pm

The F-35, all 3 variants, and their ancillary maintenance system/spares tracking system are still riddled with software bugs/faults/issues.
Anthony Capaccio at Bloomberg got an early look at the report from Robert Behler, Director, Operational Test & Evaluation. Behler’s report cites 873 software flaws related to the F-35 plus 13 “must-fix” issues, including problems with the 25-millimeter gun on the U.S. Air Force F-35A model.

With regard to the gun, Behler’s office “considers the accuracy, as installed, unacceptable” due to “misalignments” in the gun’s mount, Capaccio quoted the report as saying.

None of the problems in the DOT&E report actually are new, according to Capaccio, but they do underscore the difficulty Lockheed Martin and the F-35’s sponsor governments have had in developing the plane.

Trade publication Defense News in early June 2019 revealed lingering flaws in the F-35’s design. At high angles of attack, the F-35B and the carrier-compatible F-35C have a tendency to depart from controlled flight, Defense News reported.

“Specifically, the Marine short-takeoff-and-vertical-landing variant and the Navy’s carrier-launched version become difficult to control when the aircraft is operating above a 20-degree angle of attack, which is the angle created by the oncoming air and the leading edge of the wing,” Defense News explained.

The Pentagon wants Lockheed to fix the 13 most-critical problems before the company starts work on the JSF’s latest Block 4 software.
https://nationalinterest.org/blog/buzz/ ... ues-118651

Continuous Capability Development and Delivery process
Some of the aircraft’s lingering problems appear to be connected to the F-35 Joint Program Office and Lockheed Martin’s recently adopted Continuous Capability Development and Delivery process, a method of delivering software fixes and additional functions every six months. The process is modeled on a Silicon Valley method of delivering bite-sized chunks of code changes to customers called agile software development.

Lockheed Martin was openly optimistic in 2019 about the agile method’s ability to turn around the F-35’s troublesome software, which totals more than 8 million lines of code. However, DOT&E says the concept has been problematic.

“Software changes, intended to introduce new capabilities or fix deficiencies, often introduced stability problems and adversely affected other functionality,” says the weapons evaluator’s report. “Due to these inefficiencies, along with a large amount of planned new capabilities, DOT&E considers the program’s current Revision 13 master schedule to be high risk.”

#18 Post by TheGreenGoblin » Sun Apr 12, 2020 10:30 pm

The F-35 aircraft remains woefully unprepared against malware infections and other cyber-attacks, according to POGO – the respected non-profit watchdog Project on Government Oversight.
Dubbed the most expensive weapon system in history, the beleaguered fighter jet is plagued with problems, including a lack of protection against software nasties that would cripple its critical systems, it is claimed. Cybersecurity protections are particularly important because the aircraft relies so heavily on a network of automated systems to operate properly, we're told.

"The fully integrated nature of all F-35 systems makes cybersecurity more essential than for any other aircraft," POGO's Dan Grazier noted this month, having obtained documentation that the jet has low "fully mission capable" rates. That's military jargon meaning it's rarely fully ready for combat.

"Legacy aircraft already in service are equipped with software-enabled subsystems, and while a hacker could penetrate the GPS system in a legacy system, because the subsystems are not fully integrated, a hacker could not also access the communications system, for example," Grazier continued. "The F-35 is inherently far more vulnerable."
To give you an idea of how the interconnected nature of the F-35's computer systems is a massive vulnerability in and of itself: separate subsystems, such as the Active Electronically Scanned Array radar, Distributed Aperture System, and the Communications, Navigation, and Identification Avionics System, all share data. Thus, the GAO's auditors warned, just compromising one of these components could bring down the others.

“A successful attack on one of the systems the weapon depends on can potentially limit the weapon’s effectiveness, prevent it from achieving its mission, or even cause physical damage and loss of life,” said the GAO team.
Read the full text here...

#19 Post by TheGreenGoblin » Sun Apr 12, 2020 10:54 pm

The F-22 has had its share of software travails as well.
In December 2005 the first F-22 Raptor fighter aircraft came into service. To quote the United States Air Force (USAF), ‘The F-22 is a first-of-a-kind multi-mission fighter aircraft that combines stealth, supercruise, advanced maneuverability and integrated avionics to make it the world’s most capable combat aircraft.’ But, to be fair, this was taken from the budget statement in which the air force was trying to justify the expense. The USAF ran the numbers and estimated that, by 2009, the cost of getting each F-22 in the air was $150,389,000.

The F-22 certainly did have some really integrated avionics. In older aircraft, the pilot would be physically flying the plane with controls that used cables to raise and lower flaps, and so on. Not the F-22. Everything is done by computer. How else can you get advanced manoeuvrability and capable combat? Computers are the way forward. But, like planes, computers are all well and good – until they crash.

In February 2007 six F-22s were flying from Hawaii to Japan when all their systems crashed at once. All navigation systems went offline, the fuel systems went and even some of the communication systems were out. This was not triggered by an enemy attack or clever sabotage. The aircraft had merely flown over the International Date Line.

Everyone wants midday to be roughly when the sun is directly overhead: the moment when that part of the Earth is pointing straight at the sun. The Earth spins towards the east, so, when it is midday for you, everywhere to the east has already had midday (and has now overshot the sun), while everywhere to the west is waiting for their turn in the noon sun. This is why, as you move east, each time zone increases by an hour (or so). But this has to stop eventually; you can’t go forward in time constantly while travelling east. If you were to magically lap the planet at a super-fast rate, you wouldn’t get back to where you started and find it was a complete day in the future.

At some point, the end of one day has to meet, well, the day before it. By stepping over the International Date Line, you go back (or forward) a complete day in the calendar. If you’re finding it hard to get your head around this, you’re not alone. The International Date Line causes all sorts of confusion and whoever was programming the F-22 must have struggled to work it out. The US Air Force has not confirmed what went wrong (only that it was fixed within forty-eight hours), but it seems that time suddenly jumped by a day and the plane freaked out and decided that shutting everything down was the best course of action.

Mid-flight attempts to restart the system proved unsuccessful so, while the planes could still fly, the pilots couldn’t navigate. The planes had to limp home by following their nearby refuelling aircraft. Modern fighter jet or ancient Roman rulers: sooner or later, time catches up with everyone.
Humble Pi: A Comedy of Maths Errors by Matt Parker

#20 Post by TheGreenGoblin » Mon Apr 13, 2020 6:26 am

TheGreenGoblin wrote:
Sat Apr 11, 2020 11:41 am


I suppose the first space/aviation software overflow issue that really made it into the headlines was the Apollo 11's Infamous Landing Error Code 1202
More from Don Eyles

Then we heard the words "program alarm". In Cambridge we looked at each other. Onboard, Aldrin saw the PROG light go on and the display switch back to Verb 06 Noun 63. He quickly keyed in Verb 5 Noun 9. Alarm code 1202 appeared on the DSKY. This was an alarm issued when the computer was overloaded — when it had more work to do than it had time for. In Cambridge the word went around, "Executive alarm, no core sets". Then Armstrong said, with an edge, "Give us a reading on the 1202 program alarm"[10].

From here events moved very quickly, too fast for us to have any input from Cambridge. It was up to Mission Control in Houston. The story of what happened there has often been told: how it fell to a 26-year-old mission control guidance officer named Steve Bales to say "go" or "abort". Bales had participated in a recent review of LGC alarms that had deemed 1202 a "go" unless it occurred too often or the trajectory deviated. He was supported by Jack Garman of NASA and Russ Larson of MIT in the back room. Garman said, "go". Larson gave a thumbs-up. (He later said he was too scared to form words.) So Bales answered, "go", Flight Director Gene Kranz said "go", and capsule communicator Charlie Duke passed it up to the crew. At MIT, where we realized that something mysterious was draining time from the computer, we were barely breathing.
https://www.doneyles.com/LM/Tales.html
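"Executive alarm, no core sets" is easier to picture with a toy model. The sketch below is a loose illustration of a scheduler with a fixed pool of job slots that raises alarm 1202 when a new job arrives and every slot is taken. The class names and the pool size are hypothetical, and it does not model what the real computer did next, which by most accounts was to restart and pick up only the important work.

```python
class ProgramAlarm(Exception):
    """Raised when the executive cannot accept another job."""

    def __init__(self, code: int) -> None:
        super().__init__(f"program alarm {code}")
        self.code = code


class Executive:
    NO_CORE_SETS = 1202  # alarm code quoted in the account above

    def __init__(self, core_sets: int) -> None:
        self.core_sets = core_sets  # size of the fixed slot pool (assumed value)
        self.jobs = []              # list of (priority, name) for pending jobs

    def schedule(self, name: str, priority: int) -> None:
        """Claim a slot for a new job, or raise the alarm if none is free."""
        if len(self.jobs) >= self.core_sets:
            raise ProgramAlarm(self.NO_CORE_SETS)
        self.jobs.append((priority, name))

    def run_next(self):
        """Run (here: just remove and return) the highest-priority job."""
        if not self.jobs:
            return None
        job = max(self.jobs)
        self.jobs.remove(job)
        return job[1]
```

If jobs are scheduled faster than `run_next` can retire them, the pool fills and the alarm fires, which is the "more work to do than it had time for" situation Eyles describes.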


Steve Bales - https://en.m.wikipedia.org/wiki/Steve_Bales

Russ Larson - https://wehackthemoon.com/bios/russell-larson