Optus has shed more light on last week’s mass outage that crippled its networks and affected some 10 million customers, as its executives including CEO Kelly Bayer Rosmarin ready for a Senate probe kicking off this Friday. The telco revealed a routine software upgrade by a third-party infrastructure provider was behind last Wednesday’s outage that also impacted businesses, hospitals and rail networks.
Optus CEO Kelly Bayer Rosmarin faced criticism over her response to the outage. Credit: Natalie Boog
An Optus spokesperson said that the telco had spent the past six days investigating the incident, which left some customers unable to access triple zero emergency services.
“We now know what the cause was and have taken steps to ensure it will not happen again,” the spokesperson said in a statement.
“At around 4.05am Wednesday morning, the Optus network received changes to routing information from an international peering network following a routine software upgrade.
“These routing information changes propagated through multiple layers in our network and exceeded preset safety levels on key routers which could not handle these. This resulted in those routers disconnecting from the Optus IP Core network to protect themselves.
“The restoration required a large-scale effort of the team and in some cases required Optus to reconnect or reboot routers physically, requiring the dispatch of people across a number of sites in Australia.
“This is why restoration was progressive over the afternoon. Given the widespread impact of the outage, investigations into the issue took longer than we would have liked as we examined several different paths to restoration.”
The spokesperson added that Optus had changed the peering network to avoid the problem happening again, and would continue to work with international vendors and partners to increase the resilience of its network.
According to publicly listed information, the part of Optus’ network affected by last Wednesday’s outage peers with parent company Singtel’s network in Singapore; China Telecom; the US-headquartered global content delivery network Akamai; and Global Cloud Xchange, owned by Jersey-based 3i Infrastructure and formerly known as Flag Telecom.
This masthead revealed on Saturday that a senior Optus executive phoned an Akamai counterpart about 9am on the day of the outage believing Akamai may have been one of the peers that contributed to the outage. However, Akamai said on Saturday that there was “no present indication that this incident is related to an issue with Akamai”.
On Monday night it was more definitive: “Akamai did not trigger the outage,” an Akamai spokesperson said. “We stand ready to support Optus and our partners at all times.”
Optus pledged to fully co-operate with the reviews into the outage being undertaken by the government and the Senate.
It had previously been coy about the root cause of the outage, with CEO Kelly Bayer Rosmarin telling this masthead last week that the failure was “a network event” that “triggered a cascading failure which resulted in the shutdown of services to our customers”.
The under-fire telco is offering free data to disgruntled customers – but some commentators say it needs to do more.
Wednesday’s outage not only paralysed the nation’s telecommunication networks, but prompted long queues at Telstra and Vodafone retail stores as customers looked to shift providers.
It also affected other providers using the Optus network, including Amaysim, Vaya, Aussie Broadband, Moose Mobile, Coles Mobile, Spintel, Southern Phone, Gomo and Dodo Mobile.
Narelle Clark, who formerly worked at Optus and is now chief executive of the Internet Association of Australia, said Optus should have had router rules in place that dismissed the third party’s update once it exceeded the routers’ preset safety levels. She observed that, over the span of her career, she had seen many incidents where routing updates sent between external parties had crashed individual routers.
A simple typo in a “route map” when redistributed between internal networks can similarly overload routers. It was “so easy” to accidentally share a significant update that causes problems, as the default configuration in routing updates is “send all, even today”, she said.
“This is exactly why it is important to ensure filtering is in place on the receiving end, so that the offending session is dropped rather than the update being passed on at all.
“At any time you have to assume that everybody who’s sending you routing information is prone to error. That’s why you always set those sorts of protective filters in place,” Clark said.
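To illustrate the kind of receiving-end filter Clark describes, here is a minimal sketch in Python. It is not Optus’ configuration or any vendor’s implementation: the `PeerSession` class, the peer name and the limit of five prefixes are all hypothetical, chosen only to show the idea. On real routers this type of safeguard is commonly configured as a per-peer maximum-prefix limit.

```python
from dataclasses import dataclass, field


@dataclass
class PeerSession:
    """Hypothetical model of one routing session with an external peer."""
    name: str
    max_prefixes: int  # illustrative per-peer limit, not a real-world value
    up: bool = True
    routes: set = field(default_factory=set)

    def receive(self, announced):
        """Accept announced prefixes, or drop the session if the limit is exceeded.

        Returns the newly learned prefixes that may be passed on internally.
        """
        if not self.up:
            return set()
        candidate = self.routes | set(announced)
        if len(candidate) > self.max_prefixes:
            # Protective filter: tear down only this session and discard
            # everything learned from it, so nothing is passed further on.
            self.up = False
            self.routes = set()
            return set()
        new = candidate - self.routes
        self.routes = candidate
        return new


peer = PeerSession(name="example-peer", max_prefixes=5)

# A normal-sized update is accepted and can be passed on.
normal = peer.receive(["10.0.0.0/24", "10.0.1.0/24"])

# An oversized update trips the limit: the session drops, nothing propagates.
flood = peer.receive([f"192.0.2.{i}/32" for i in range(10)])
```

In this sketch the oversized update takes down only the session with the offending peer. The rest of the network never sees the flood, which is the outcome Clark contrasts with core routers disconnecting to protect themselves.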
Matt Tett, the managing director of Enex TestLab, which assesses everything from toasters and the internet to traffic systems, said: “At the end of the day Optus may need to shoulder some responsibility, rather than pointing a finger at an unnamed peering partner.
“What processes failed internally to allow this to occur?” he asked, “and if it was never registered as a potential risk or point of failure then what mitigation strategy have Optus now implemented to ensure it will not reoccur? What are the lessons learned and steps taken?”
He said if responsibility sat solely with a third-party supplier, Optus would’ve named who it was, like when the ABS named IBM during its 2016 Census collection failure.
As previously reported by this masthead, Optus is offering aggrieved customers a free data top-up, but the industry watchdog says it is prepared to force the telecommunications company to offer large compensation payments (up to $100,000 for a business that could prove a loss and up to $1500 for individuals with a claim) if it refuses to settle customers’ claims.
“If you can see a customer has clearly been impacted, we’d be encouraging them to really own the complaint and deal with it,” telecommunications industry ombudsman Cynthia Gebert said.
“But if we need to take a strong line with Optus to get the right outcome for their customers, that’s what we will do.”
Optus’ offer was immediately slammed by Greens communications spokesman Sarah Hanson-Young, who said the “PR play” was not enough, and tech analyst Foad Fadaghi, who said “knee-jerk offers” could prompt more customers to ditch the business.
Embattled chief executive Bayer Rosmarin is due to front a Senate inquiry into the 16-hour outage, while also answering to a separate government inquiry announced by Communications Minister Michelle Rowland. The Senate inquiry kicks off this Friday. The outage came a year after Optus suffered a massive data breach, in which more than 9 million current and former customers had their records accessed.
Source: Brisbane Times – Latest News