Deep Engineering #16: Designing Systems for Longevity with Alexander Kushnir
Practical takeaways on HAL/RTOS, SoMs, CI-gated releases, and crypto agility—grounded in current FDA and NIST guidance.
Welcome to the sixteenth issue of Deep Engineering.
This edition was created in collaboration with Alexander Kushnir, principal software engineer at Johnson & Johnson MedTech. Kushnir specializes in electrophysiology systems and brings roughly two decades of experience across medical devices, industrial controllers, and networked embedded platforms. He has worked on motion-control firmware, network switches, VoIP, and medical software; his core expertise spans embedded Linux, modern C++, cross-platform development, and HW/SW integration. He has also built and led a two-day workshop on CMake.
In today’s feature, we use Kushnir’s field experience to examine what it takes to design software that must survive regulatory cycles, hardware obsolescence, and engineering turnover. You’ll find practical, transferable habits: clear boundaries between certified and updatable code, disciplined OTA with rollback, modularity via HAL/RTOS, SoM-based hardware strategies, CI-gated changes, and a quality culture that resists bit-rot.
Designing Systems for Longevity in Safety-Critical Embedded Domains with Alexander Kushnir
The FDA’s June 2025 guidance on medical device cybersecurity explicitly requires manufacturers to provide a “reasonable assurance” of ongoing security throughout a device’s lifecycle, including verifiable secure update mechanisms and a documented process for managing vulnerabilities over at least a decade. That means devices must support signed and authenticated firmware updates, with audit logs and the ability to roll back if needed, for 10+ years after release. Regulators now expect evidence in premarket submissions that companies have planned for ten-year supportability via patch management and cybersecurity monitoring. This shift, codified in new laws like the U.S. FD&C Act §524B, makes clear that designing for longevity is a fundamental safety requirement, not just a business choice.
As Alexander Kushnir observes:
“Any update that could affect safety or compliance still requires formal review and, if necessary, re-certification. In practice, we (can) minimize disruption by:
Separating safety-critical functions into a stable, validated firmware baseline that is rarely touched.
Isolating updatable modules (non-critical logic, UI features, analytics, etc.) so they can evolve without impacting certified components.
Using risk-based change management to decide when an update is worth the cost of triggering the regulatory process — for example, prioritizing security patches and critical bug fixes, while bundling minor enhancements into larger, less frequent releases.
In this way, the need to keep embedded software up to date becomes operationally similar to maintaining conventional PC or cloud-based software, but with the extra discipline required for regulated environments.”
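Kushnir’s first two points map naturally onto a code-level boundary. The C++ sketch below is only an illustration of that shape – the interface, names, and module split are assumptions, not a design from his team: the certified core exposes a narrow, versioned interface, and updatable modules link against it without ever reaching into certified internals.

```cpp
#include <cstdint>
#include <string_view>

// ---- Certified baseline: validated once, rarely touched ----
namespace certified {

// Version of the boundary interface; bumped only under formal change control.
inline constexpr std::uint32_t kInterfaceVersion = 0x0001'0002;

class SafetyCore {
public:
    virtual ~SafetyCore() = default;
    // Safety-critical operations live behind the certified boundary.
    virtual bool startTherapy() = 0;
    virtual void stopTherapy() = 0;
    std::uint32_t interfaceVersion() const { return kInterfaceVersion; }
};

} // namespace certified

// ---- Updatable module: UI features, analytics, non-critical logic ----
namespace updatable {

// Can be revised and shipped OTA without touching the certified core,
// as long as the interface version it was built against still matches.
class UsageAnalytics {
public:
    explicit UsageAnalytics(certified::SafetyCore& core) : core_(core) {}

    bool compatible() const {
        // Refuse to run against a core this module was not built for.
        return core_.interfaceVersion() == certified::kInterfaceVersion;
    }

    void logEvent(std::string_view /*event*/) { /* queue for telemetry upload */ }

private:
    certified::SafetyCore& core_;
};

} // namespace updatable
```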
Tooling and Platform Hygiene as Survival Strategies
Meeting a 10+ year lifespan requires disciplined platform and toolchain management. Here are three strategies to achieve this:
Build on long-term-supported software baselines: For example, the Yocto Project’s Scarthgap 5.0 release (April 2024) is a Long-Term Support (LTS) Linux baseline that will receive updates until 2028. Yocto 5.0.11 was released in mid-2025, showing that even 18 months in, the LTS branch is actively maintained with security patches and kernel updates. Starting from such an LTS OS (or an RTOS whose vendor is committed to longevity) means teams can keep pulling in fixes for years. It is much easier to support a product until 2030 when your base OS isn’t stuck back in 2021.
Keep your toolchains current in a controlled way: If your firmware is built with a 2017-era compiler, you may struggle to find expertise or patches by 2027. Tooling evolution shows the risk: Arm’s Development Studio 2025.0 (released July 2025) introduced a next-generation LLVM-based compiler toolchain and shipped the final updates to the previous Arm Compiler 6. Projects that linger on older toolchain versions risk missing optimizations for new processor cores and eventually losing support entirely. The new Arm toolchain, for example, adds full support for the latest Armv9.6-A and other CPU IP, while Arm Compiler 6 is now end-of-life.
Make incremental updates a habit: CMake released version 4.1.0 in August 2025, followed just weeks later by a 4.1.1 patch fixing regressions (test timeouts, Ninja build issues, etc.). Embracing minor updates regularly – rather than a massive jump every few years – keeps your development environment fresh and prevents the “version rot” that can make future maintenance or security fixes painfully expensive.
Kushnir also recommends that teams:
“Use industry-standard and up-to-date tools: Even though it is not a hard requirement, tools keep evolving, and if you fall too far behind, then when you eventually need to investigate an issue in the field, you may find yourself forced to use newer tools you’ve never worked with—leaving you at a disadvantage.”
Long-lived projects should treat toolchain, OS, and library updates as part of the regular engineering cycle (with CI tests to catch issues early). It’s far easier to sustain a system when its underpinnings aren’t frozen in time.
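One inexpensive habit that supports this: make the build fail loudly when it is compiled against an unexpected toolchain or language level, so that a toolchain bump is always a deliberate, reviewed event rather than silent drift. A minimal C++ sketch – the version floors here are placeholder assumptions, not recommendations:

```cpp
// toolchain_guard.h - compile-time proof that a toolchain change was deliberate.
// Pin whatever floors your project has actually validated; these are examples.
#pragma once

static_assert(__cplusplus >= 201703L,
              "This codebase is validated against C++17 or newer.");

#if defined(__GNUC__) && !defined(__clang__)
static_assert(__GNUC__ >= 10,
              "GCC 10+ expected; update the CI image and revalidate first.");
#endif

#if defined(__clang_major__)
static_assert(__clang_major__ >= 14,
              "Clang 14+ expected; update the CI image and revalidate first.");
#endif
```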
Secure Updates as a First-Class Function
Robust OTA (over-the-air) update capability is now a first-class feature of any long-lived system. In fact, the FDA’s new rules require manufacturers to implement secure update mechanisms and to have a postmarket vulnerability management plan. This aligns with the best practices many teams have already learned through hard experience.
A recent OTA checklist from Memfault highlights the critical elements of a sustainable update framework: signed and validated firmware images, explicit hardware/firmware compatibility checks before deployment, support for atomic rollback if an update fails, staged rollout mechanisms, and fleet monitoring to catch issues early. This means:
Updates should be fail-safe: your device should never attempt an update if, say, it has low battery or insufficient flash space; it should verify the new firmware’s authenticity and integrity (e.g. cryptographic signature and hash) before swapping; and it must never “brick” itself – a watchdog or bootloader should detect a bad update and revert to a known-good version (see the sketch after this list).
Deployments should be phased: Roll out to a small canary group (perhaps 5% of devices) and pause to observe telemetry for crashes or anomalies. Only when the update proves stable do you ramp to wider percentages. This staged rollout with live monitoring (looking at crash reports, reboot rates, memory usage, etc.) ensures that if an issue appears, it affects at most a sliver of the fleet and can be fixed in the next update.
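Below is a minimal sketch of the fail-safe half of this checklist. The helper names (battery and flash probes, signature check, slot operations) are hypothetical stand-ins for your platform’s primitives; what matters is the ordering – verify everything before writing, commit atomically, and keep the known-good image until the new one proves itself:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical platform primitives - replace with your BSP/secure-boot API.
bool batteryAbovePercent(int pct);
bool freeFlashBytes(std::size_t needed);
bool signatureValid(const std::uint8_t* img, std::size_t len);  // e.g. Ed25519
bool hardwareCompatible(std::uint32_t imageHwRev);
void writeToInactiveSlot(const std::uint8_t* img, std::size_t len);
void markInactiveSlotForTrialBoot();
void rebootIntoInactiveSlot();

enum class UpdateResult { Applied, Refused, Invalid };

UpdateResult tryApplyUpdate(const std::uint8_t* img, std::size_t len,
                            std::uint32_t imageHwRev) {
    // 1. Never start an update the device might not survive.
    if (!batteryAbovePercent(30) || !freeFlashBytes(len))
        return UpdateResult::Refused;

    // 2. Authenticity, integrity, and compatibility before any write.
    if (!signatureValid(img, len) || !hardwareCompatible(imageHwRev))
        return UpdateResult::Invalid;

    // 3. Write to the inactive slot; the known-good image stays untouched.
    writeToInactiveSlot(img, len);

    // 4. Boot the new slot in "trial" mode. If it fails to confirm success
    //    before the watchdog fires, the bootloader reverts automatically.
    markInactiveSlotForTrialBoot();
    rebootIntoInactiveSlot();
    return UpdateResult::Applied;
}
```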
Implementing such an update system, though, is not trivial. Fortunately, tooling is rising to meet the challenge. Cloud IoT platforms and services can manage device cohorts and track update metrics. Even development tools are integrating quality gates to enforce update readiness: for example, GitHub’s August 2025 changelog expanded repository rulesets and merge queues, letting teams require that all code passes tests and security checks before it is merged and queued for release.
As Kushnir emphasizes:
“CI tests are a must. I would even say that every pull request should be gated, i.e. only if the pull request passes all the tests, should it be merged. Many developers don’t like writing tests, but as a matter of fact, the tests protect them, and provide developers the confidence to make major changes without breaking things.”
Long-lived systems will face newly discovered vulnerabilities (e.g. a cryptographic library weakness uncovered in 2028) and changing requirements, so shipping without a safe update path is simply not acceptable in 2025.
Architecture Choices to Mitigate Obsolescence
A recurring theme in longevity engineering is abstraction with discipline. Systems that endure tend to be built with clear hardware abstraction layers and modular components, so that sub-parts can be replaced or updated with minimal ripple effect. Two tactics to mitigate obsolescence:
Using a standardized Real-Time Operating System (RTOS) or Hardware Abstraction Layer (HAL)
A standardized RTOS and HAL let you swap out one microcontroller for another years later, if the original becomes obsolete, without rewriting the entire codebase. Kushnir recommends:
“Abstract all you can. Whether one is taking the OOP approach… or a procedural one, abstraction and modularity must be applied. Hardware Abstraction Layer (HAL) is an excellent example of abstraction, as the application logic is not aware of the hardware (for example Linux paradigm took abstraction to the edge - everything is a file, whether it is a network connection, hardware device, or a real file - the user reads from and writes to a file).”
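A deliberately simplified C++ illustration of that principle – the names are ours for illustration, not from any particular HAL: the application depends only on an abstract interface, and each target board supplies its own implementation, so retargeting hardware means writing a new driver, not rewriting application logic.

```cpp
#include <cstddef>
#include <cstdint>

// HAL boundary: the application sees only this interface.
class ISerialPort {
public:
    virtual ~ISerialPort() = default;
    virtual bool open(std::uint32_t baud) = 0;
    virtual std::size_t write(const std::uint8_t* data, std::size_t len) = 0;
};

// Board-specific driver; swapped per target without touching the application.
class Stm32Uart final : public ISerialPort {
public:
    bool open(std::uint32_t /*baud*/) override {
        /* program UART registers for the requested rate */
        return true;
    }
    std::size_t write(const std::uint8_t* /*data*/, std::size_t len) override {
        /* push bytes to the TX FIFO */
        return len;
    }
};

// Hardware-agnostic application logic: works with any ISerialPort.
void sendHeartbeat(ISerialPort& port) {
    static const std::uint8_t msg[] = {'H', 'B', '\n'};
    port.write(msg, sizeof msg);
}
```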
Recently, the Zephyr RTOS (an open-source, cross-platform RTOS) has gained broad adoption and ecosystem support. Zephyr’s latest 4.2 release (July 2025) was its largest ever, setting a contribution record with 810 developers and adding support for 96 new boards and 22 shields in a single release.
This kind of broad hardware support and vendor backing (Silicon Labs, NXP, Intel, etc. are all involved) means that a product built on Zephyr can more easily move to new hardware in the future – the OS abstracts the MCU details and already supports a wide range of CPUs from Arm Cortex-M to RISC-V and even Renesas RX. The modularity and community maintenance of such an RTOS reduce the burden on your team to support every low-level detail for years.
In fact, companies traditionally known for proprietary environments are embracing this: IAR Systems (known for their commercial compilers/IDEs) announced in 2025 full production-grade support for Zephyr RTOS in their toolchain, including optimized compilers, debugging, and even functional safety certification for using Zephyr in ISO 26262 or IEC 62304 contexts. This convergence of open-source OS with professional tools indicates that using open ecosystems no longer means forsaking quality or support. On the contrary, it can lower integration costs and prolong maintainability, since you benefit from upstream improvements and a larger talent pool familiar with the technology.
Leveraging system-on-modules (SoMs) or long-life silicon platforms
Another architectural tactic to fight obsolescence is to leverage system-on-modules (SoMs) or long-life silicon platforms. According to Kushnir:
“When designing a hardware platform, the engineer must ensure that the components he chooses have a “long-term support”. Having said that, I prefer to use off-the-shelf System-on-Module (SOM) integrated on a custom board, rather than developing a board with the same CPU (or FPGA) and having to address most basic interfaces such as memory or a flash storage during the board bring-up. This reduces the complexity of board bring-up and makes it easier to handle hardware obsolescence, because the SOM vendor typically manages low-level design, interface validation, and long-term component sourcing.”
Many embedded suppliers also offer product longevity programs – for example, NXP’s Product Longevity program guarantees certain microcontrollers will be manufactured for 10 or 15 years. Choosing components under such programs, or designing your PCB to accommodate multiple pin-compatible variants, can save you from a costly redesign when one chip goes end-of-life.
In a training webinar on designing for longevity, NXP experts John Terpening and Jim Hoffmann highlighted the importance of SoC selection and supply chain planning: pick parts with known long-term availability, and avoid “cutting-edge” components that might be discontinued in a few years if they don’t find a broad market. They also suggest maximizing the bill-of-materials lifecycle while minimizing sustainment activity – i.e. use parts and modules that will stay around, even if it means slightly less cutting-edge tech, so that you aren’t forced into constant component churn.
Abstraction helps here too: if you design with a clean separation between hardware-specific code (device drivers, BSPs) and your business logic, swapping out a sensor or radio module for a newer one (or a second source) is far less painful.
Longevity in hardware means expecting that something will become unavailable or outdated – be it the processor, an OS kernel, a radio chipset, or an encryption algorithm – and structuring your system so you can adapt. Loose coupling, standard interfaces, and broad community support all increase your resilience to inevitable change.
Crypto Agility and Security for Decades
Designing for longevity also means anticipating the evolution of security threats and cryptography. What is secure today may be woefully insufficient in 10-15 years (consider that 15 years ago, SHA-1 and 1024-bit RSA were regarded as fine; neither is today). As Kushnir says:
“To keep a system secure for 10+ years, the update mechanism itself must be secure: signed and verified updates, encrypted transport, and a rollback option in case an update fails.”
An illustrative development in 2025 was NIST’s finalization of its Lightweight Cryptography standard, built on the Ascon algorithm family. This standard (NIST SP 800-232) specifically targets constrained devices – IoT sensors, medical implants, wireless tags – providing modern authenticated encryption and hashing that are efficient enough for low-power microcontrollers. Ascon was chosen for its robustness – it withstood years of public cryptanalysis – and for the ease of implementing countermeasures against side-channel attacks.
The emergence of standards like this indicates three best practices:
Devices expected to function into the 2030s should use crypto algorithms designed for longevity (both in terms of security margin and performance). Engineering teams should design their security architecture to be crypto-agile – for example, abstracting the cryptographic library so that if a flaw is found in an algorithm or a new standard emerges, you can update algorithms without rewriting the whole system (see the sketch after this list).
Long-lived systems must account for secure boot and decommissioning. The FDA guidelines require plans for securely decommissioning devices (e.g. wiping sensitive data when a device is retired), and for transferring responsibility if a device outlives its official support period.
Design with security over the full life – from initial deployment to end-of-life. Incorporating hardware roots of trust, updatable cryptographic primitives, and fail-safe modes (where a device can be constrained or isolated if it’s past support and thus more vulnerable) are increasingly seen as best practices. As NIST’s Kerry McKay put it, lightweight cryptography standards are there to ensure even the smallest devices can “protect the information” they handle over their lifespan, without exhausting their limited CPU or battery resources.
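As a sketch of the crypto-agility idea from the first point above: hide the algorithm behind a stable interface and store an algorithm identifier with every protected artifact, so a future migration – say, to an Ascon-based AEAD or a post-quantum scheme – becomes an implementation swap plus a re-keying campaign rather than a rewrite. The interface and identifiers below are assumptions for illustration only:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Stable interface the rest of the system codes against.
class IAead {
public:
    virtual ~IAead() = default;
    virtual std::uint16_t algorithmId() const = 0;  // persisted with ciphertext
    virtual std::vector<std::uint8_t> seal(const std::vector<std::uint8_t>& key,
                                           const std::vector<std::uint8_t>& nonce,
                                           const std::vector<std::uint8_t>& plaintext) = 0;
};

// Today's backend. Tomorrow an Ascon-based AEAD (NIST SP 800-232) or a
// post-quantum hybrid can register under a new id with zero API change.
class AesGcmAead final : public IAead {
public:
    std::uint16_t algorithmId() const override { return 0x0001; }
    std::vector<std::uint8_t> seal(const std::vector<std::uint8_t>&,
                                   const std::vector<std::uint8_t>&,
                                   const std::vector<std::uint8_t>& pt) override {
        // Delegate to a vetted crypto library here - never hand-rolled.
        return pt;  // placeholder only, not real encryption
    }
};

// Factory driven by configuration, so the algorithm choice is data, not code.
std::unique_ptr<IAead> makeAead(std::uint16_t id) {
    if (id == 0x0001) return std::make_unique<AesGcmAead>();
    return nullptr;  // unknown algorithm: fail closed
}
```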
Longevity requires a security roadmap. Engineering teams must choose the right algorithms, build in the means to rotate keys or ciphers, and plan for secure updates.
Fostering a Culture of Maintainability and Quality
Perhaps the most important ingredient for longevity is culture. As Kushnir says:
“Code review isn’t done just because “that’s the rule”; it’s done to catch defects and improve design. The same goes for documentation and tests—they’re tools, not rituals.”
Technical strategies will fall short if the engineering culture doesn’t value quality and continuous improvement. NASA’s approach to software is a case in point: to support missions that last decades (or human-rated vehicles with zero tolerance for failure), NASA instills a culture of exhaustive testing, peer review, and learning from mistakes. In a 2025 talk on how NASA tests its software for the Space Shuttle and Orion programs, NASA’s Darrel Raines explained that they use 4 to 7 levels of testing for each change, with independent verification teams and software quality assurance groups dedicated to finding potential defects. They deliberately bring in fresh eyes – separate teams that try to validate the software – to avoid blind spots.
This multi-layer test strategy, combined with strict coding standards and oversight, is how they achieved the Shuttle’s famously low defect rate (arguably one of the most reliable software systems ever built). Raines also emphasized using a diversity of tools and methods to catch errors: simulations, hardware-in-the-loop, static analysis, formal methods, etc., each can find different classes of bugs.
The philosophy is to never trust a single approach – if something is mission-critical, test it in many ways and assume nothing. For engineers in other domains, NASA’s example underscores that preventing and catching bugs early is far cheaper and safer than troubleshooting in the field. Thus, investing in robust test automation, code reviews, static analysis, and even techniques like fuzz testing or model checking for critical modules can pay immense dividends over a product’s life.
Modern programming practices and language features can also help here. The upcoming C++26 standard, for example, is adding design-by-contract capabilities (preconditions, postconditions, and assertions built into the language) as well as compile-time checks and safer standard library features. Embracing such features in embedded code can catch errors early and make the code’s assumptions explicit, easing maintenance. C++26 also hardens the standard library – for example, with bounds-checked preconditions on common operations – which aims to reduce common sources of bugs and make code more self-documenting. Using these language improvements (once available in compilers) or similar features in other languages (Ada/SPARK contracts, Rust’s ownership model, etc.) can significantly improve software maintainability. They enable what one might call self-testing code – code that fails loudly if misused rather than silently corrupting data.
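For a flavor of what this looks like, here is a small sketch using the contract syntax adopted for C++26. Compiler support is still emerging, so treat it as illustrative; the function and its bounds are invented for the example:

```cpp
// C++26 contracts: assumptions become part of the declaration and, in
// checked builds, are enforced at run time.
// Note: parameters referenced in a postcondition must be const.
int scaleDose(const int dose_ug, const int weight_kg)
    pre(dose_ug > 0)                      // caller obligation, checked at entry
    pre(weight_kg > 0 && weight_kg < 500)
    post(r : r >= dose_ug)                // result property, checked at exit
{
    contract_assert(dose_ug <= 1'000'000);  // in-body sanity check
    return dose_ug * weight_kg;
}
```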
Finally, maintainability culture means continuous refactoring and knowledge sharing. A codebase is not a fixed asset; it’s a living one. Teams that succeed over the long run treat technical debt with the same seriousness as feature development. They allocate time in each release cycle to update stale documentation, improve code clarity, and refactor overly complex areas – especially if those areas hinder testing or pose a risk for future bugs. Long-serving products often see multiple generations of engineers; investing in clear code and design pays off when new eyes must understand the system in 5 or 8 years. In regulated industries, it also pays off during audits and recertifications if the design rationale is well-documented.
💡Key Takeaways for Engineering Teams
Plan for 10+ years from the start – choose components and OS/tool versions with known long-term support and map out an update strategy (both technical and operational) for the entire lifecycle.
Make secure, fail-safe updates a core feature – implement signed OTA updates with rollback, test them rigorously in real-world scenarios, and monitor your fleet continuously.
Stay current incrementally – continuously integrate minor updates to tools and dependencies so that you’re never more than one version behind, allowing you to benefit from fixes and avoid legacy lock-in.
Abstract and modularize – design hardware and software boundaries that allow component swaps; use standardized modules, HALs, and widely adopted RTOS or middleware to reduce the cost of change.
Invest in quality and testing – adopt a “test-first” culture with multiple layers of verification, and use the best tools available (from static analyzers to CI pipelines) to catch issues early.
Think ahead in security – be crypto-agile and design with future threats in mind (e.g. quantum-resistant crypto, secure element hardware, etc.), so your device isn’t frozen with today’s defenses decades later.
In the end, the true measure of success for safety-critical products is not just how well they work at launch, but how reliably and safely they continue to operate years and millions of hours later, when initial developers have moved on and the world around them has evolved. Achieving that kind of lasting dependability is difficult – but it has become the baseline expectation.
🧠Expert Insight
Read the complete Q&A with Alexander Kushnir for real-world insights:
Designing for Decades: A Conversation with Alexander Kushnir on Longevity, Maintainability, and Embedded Systems at Scale
In safety-critical domains, code longevity isn’t a nice-to-have; it’s a baseline constraint. Software must coexist with hardware for ten years or more, while withstanding evolving standards, team turnover, and limited upgrade paths. In this Deep Engineering Q&A, we ask industry veteran Alexander Kushnir what it takes to build systems that endure.
🛠️Tool of the Week
Zephyr RTOS 4.2 – Open-source real-time OS for production-grade embedded devices
Zephyr is a small, modular RTOS used in commercial products across Arm, RISC-V, and other MCUs. The 4.2 release (July 19, 2025) expands hardware coverage and streamlines portability work for long-lived systems.
Highlights:
Portability by design: Standardized HAL/OS abstraction and Devicetree let you retarget boards without invasive code changes.
Broader hardware support: 4.2 adds 96 new boards and 22 shields, plus a migration guide for upgrades.
Production tooling: IAR’s latest Arm toolchain ships production-ready Zephyr support with RTOS-aware debugging and code analysis, signaling mature vendor backing.
Ecosystem momentum: Active release cadence and documented upgrade paths reduce sustainment risk over multi-year lifecycles.
📎Tech Briefs
Cybersecurity in Medical Devices: Quality System Considerations and Content of Premarket Submissions by U.S. Food and Drug Administration (Final Guidance): Clarifies lifecycle cybersecurity obligations for medical devices, including signed/verified updates, documented vulnerability handling, and evidence of long-term supportability.
NIST Finalizes ‘Lightweight Cryptography’ Standard to Protect Small Devices by NIST (News): Finalizes Ascon-based authenticated encryption and hashing for constrained devices, enabling modern, efficient crypto for IoT and medical implants.
Zephyr RTOS 4.2 Now Available: Introduces Renesas RX Support, USB Video Class, and More by Benjamin Cabé (Zephyr Project): Announces a record community release (810 contributors, 96 new boards, 22 shields), underscoring RTOS standardization and portability momentum.
Product update: Arm Development Studio 2025.0 now available by Stephen Theobald (Arm Community): Debuts the next-generation Arm Toolchain for Embedded Professional and latest core support, guiding teams toward controlled compiler upgrades for long-lived products.
Improved repository creation generally available, plus ruleset & insights improvements by GitHub (Changelog): Expands repository rulesets and merge queues so organizations can enforce CI/security checks at merge time for safer, auditable releases.
That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next. Do take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.
We’ll be back next week with more expert-led content.
Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering
If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.