Building Emergency-Ready AI: A Conversation with Tony Dunsworth
How open models, synthetic data, and lightweight infrastructure are enabling secure, cost-effective AI for 911 systems and public safety operations.
For engineers building AI systems in production, scale isn't always the hard part—constraints are. In public safety, the challenge isn’t how to train the biggest model, but how to make a small one reliable. You’re dealing with scarce compute, strict privacy rules, and zero tolerance for downtime—while still trying to deliver tools that make a difference.
In this conversation, Deep Engineering speaks with Tony Dunsworth, a Database Administrator and Data Scientist with the City of Alexandria, where he helps shape the data backbone of 911 emergency response systems. With a Ph.D. in data science and over 15 years of operational experience, Tony has worked across the full data lifecycle—engineering, database administration, applied analytics, and forecasting—in some of the most mission-critical environments in government.
His recent work focuses on deploying AI with limited resources: leveraging open models, running on commodity hardware, and building reliable systems without exposing sensitive public safety data. In this interview, he shares practical strategies for architecting cost-effective AI tools, using synthetic data to train and validate models securely, and maintaining performance under stress in high-stakes domains. From real-world lessons to emerging trends in multilingual AI assistance, Tony brings a rare combination of technical depth and civic-minded design.
You can watch the full conversation below—or read on for the complete transcript.
Architecting AI Solutions on a Tight Budget
1: Public safety agencies often have minimal tech budgets and few specialist staff. For example, roughly 75% of U.S. 911 centers run with only two to five people on duty and limited resources. Given these constraints, how do you identify and implement AI solutions that are both useful and affordable?
Tony Dunsworth: I think centers have to find their own specific pain points. Each one has its own type of immediate thing that they feel they need, and there are a lot of companies that provide solutions and focus on that space.
One of the biggest spaces right now is intercepting non-emergency calls. In the United States and in Canada, when you call 911 or you call your local non-emergency line, you're going to the same pool of telecommunicators. So one of the things that we're trying to do is figure out how we can take some of those calls—especially for non-emergency services—and find ways to reroute them so they don't get through. If we have fewer calls coming into the center, having fewer telecommunicators isn't as big a deal.
My dissertation, for example, identified different libraries—LLM-based or neural network-based libraries—that could be deployed with minimal technical resources. If you just knew how to write five or six lines of Python code and had someone show you how to write the right import statements, you could get those forecasts that you needed.
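For readers who want a sense of how small such a forecasting script can be, here is a minimal sketch using the open-source Prophet library (one option among many, and not necessarily one of the libraries from Tony's dissertation). The CSV file and column names are assumptions made for the example.

```python
# Minimal daily call-volume forecast. "call_counts.csv" and its columns
# (ds = date, y = calls that day) are hypothetical stand-ins for a center's own export.
import pandas as pd
from prophet import Prophet  # pip install prophet

df = pd.read_csv("call_counts.csv")
model = Prophet(weekly_seasonality=True)          # call volume tends to follow weekly cycles
model.fit(df)
future = model.make_future_dataframe(periods=30)  # forecast the next 30 days
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```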
There are other products—like Blue Sky Analytics, which is an R-based analytics platform—that are drag-and-drop or menu-driven. You don't have to know as much about how to program in R. As long as you can tell it, “Here's the variable I want, and I want the summary of it,” it can produce that for you in two clicks. So you don't have to have the same programming logic. A lot of it is: where do you identify what you need first, and what’s out there that you can use to leverage your budget in the best way possible?
2: Can you share an example of where a specific budget-friendly analytics or ML tool can make a difference in emergency operations?
Tony Dunsworth: Blue Sky—because I've recommended it to some of my colleagues before. You can load a dataset and then, just by clicking through menus, tell it what you're looking for. You tell it, “I want a bar graph of the number of calls that we’ve taken in this time period,” like, say, “I want to know how many calls we’ve taken in a month.” You can see how that goes.
Or I can say, “How many traffic stops have we executed in the last three weeks, by week?” And then look at that even further and say, “Tell me where they’re located.” And through a couple of clicks and a little bit of looking through the tech, you can easily get those pictures. You can see a better profile of your data, so you can see what your data is telling you.
3: Over the past year or so, open-source AI models have quickly narrowed the performance gap against proprietary giants, while also offering advantages in cost control and security. To what extent has this trend influenced your approach to building AI systems? And what benefits or trade-offs have you observed while doing so?
Tony Dunsworth: I've built two AI labs. I've built one personally that I use—that I'd say is a little more forward-thinking, a little more pushing the envelope. And then I've built one in our environment at work that I'm more conservative with. But leveraging both of those gives me better control, like you said, over cost. So I'm not accidentally running up a large bill with some model provider because I didn't realize I'd run out of tokens, or I didn't realize that I had pushed the model to limits it wasn't expected to go to.
And I try to be creative. For example, I may use models in my personal lab that I can't use at the office due to some underlying security concern. But it gives me a focus on what I can work with.
The biggest trade-off is understanding that the speed is going to be a lot slower. Even with my lab having 24 gigs of RAM, or my office lab having 32 gigs of RAM, they are still noticeably slower than if I'm using an off-site LLM to do similar tasks. So you have to model your trade-off, because I have to also look at what kind of data I'm using—so that I'm not putting protected health information or criminal justice information out into an area where it doesn't belong and where it could be used for other purposes. So the on-premises local models are more appealing for me because I can do more with them—I don't have the same concern about the data going out of the networks.
4: Using open models internally can help with data privacy and cost, and these models can also be optimized to run efficiently on local hardware, letting you keep sensitive data in-house and save on cloud costs. Have you already taken advantage of such approaches in your work? For example, have you been able to fine-tune smaller open models or use lightweight ML frameworks on-premise to deploy AI without requiring too much high-end hardware or high cloud bills?
Tony Dunsworth: Yeah. Right now, I've been doing more of the work with lightweight or smaller LLMs because they're easier to get ramped up with. It's easier to get something I can put into a test lab and expose to people and say, "OK, let me know what this is doing for you—size-wise, resource-wise, token-wise."
I'm interested in working on quantizing models a little bit better so that I can try larger models and get a bit more experience with them, because some of them are very promising. I have to be cognizant of my environment, knowing that I definitely have some restrictions, but I've also got to close my own knowledge gap.
I use published resources—I have a subscription—so I have access to a lot of tools and materials to help. And I use other tutorials along the way so that I can get better at handling that type of programming. I'm relatively new to it, so I'm learning as I go how to better leverage my own engineering skills to improve the performance of those LLMs so that I can do more with them.
5: Can you briefly tell us what you mean by quantizing as a way of optimizing?
Tony Dunsworth: Quantizing is a way to optimize an LLM so it works more efficiently with fewer resources—less memory consumption. It stores the model's parameters at lower precision, so it's able to package things a little bit better. It handles your requests and tokens a little more efficiently so that the model can work a little faster and return your responses—or your data—a little quicker, so that you can be more interactive with it.
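To put that in practical terms: quantization stores a model's weights at lower numeric precision, for example 4-bit instead of 16-bit, which reduces memory use on modest hardware. Below is a minimal sketch using Hugging Face Transformers with bitsandbytes; the model ID is illustrative, and actual savings and speed depend on the model and machine.

```python
# Sketch: load an open model with 4-bit quantized weights to reduce memory use.
# The model ID is illustrative; any comparable open checkpoint would work.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # keep weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.float16,  # run the math in fp16 for speed
)

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                     # place layers on whatever hardware is available
)

prompt = "Summarize last week's call volume trends in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```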
6: Even as open-source AI becomes more capable, experts still talk about challenges around scalability and enterprise-grade reliability for these tools. What kind of obstacles or failures have you encountered so far when employing open-source or low-cost solutions in a mission-critical setting?
Tony Dunsworth: The biggest challenge is resource base. Unfortunately, none of us have infinite resources—I wish we did. So running into a situation where a model bogs down because I'm pushing the model hard for something, and at the same time I'm pushing the resources the model's running on very hard—it gets frustrating.
And that's one of the reasons I'm trying my hand at learning how to quantize models in my personal laboratory, because I recognize that I've got to make my work more responsive to my users—or it’s not worth it. That’s one of the biggest frustrations: when I can get it working reasonably well for me, and then I turn it loose to a test group and it seems like it doesn't work at all for them. You feel like you've put all this time into it, and it's not doing the things you promised it would do.
The other big challenge is upkeep. That is: do I reevaluate my models? How do I retrain? Do I bring in a new model that uses fewer resources and retrain it—and go through all that process again? It’s about continuously reevaluating my work and the work of the model behind it. How is that working? How do those fit together? Am I stressing my resources? Am I actually getting the coding right to deliver what I'm intending?
And I have to make sure I move my data—and use my data—in the best way possible to augment what I can do with the model. So there are always a few challenges. It’s not really enough to build it—I’ve got to take care of it too. And I have to plan for upgrades and enhancements, because once the users get ahold of it, they think of things that I haven't thought of. They say, “Well, can you get it to do this?” And it becomes a whole iterative process: how do I take those and fold them into the next round?
7: What are some strategies to ensure that an inexpensive solution is still reliable and maintainable? For example, that a prediction model will perform accurately under real emergency conditions—or that a homegrown tool will be supported in the long term?
Tony Dunsworth: The "supported in the long term" part is an educational experience. Right now, within the city, I'm probably at the vanguard of programming AI models. We've adopted some external software that gives us some AI interaction, as it stands right now. But as we push into what we can do, I'm out front writing more of the scripts and programs ourselves.
The challenge is finding the colleagues that are interested, and getting the time to work with them to show what I've done, listen to their experiences, see what they're doing or what they're interested in—and then putting that together so we can start building a game plan that advances it a little farther. And that’s always a challenge, because we all have our day jobs, and this isn't necessarily our immediate core function.
So it’s about finding that time in between—and how do we work it into our core functions so that we can spend more time building new things that we can then show to the city and say, “These are the things we can do. And if you give us a few more resources, or more time, or more opportunity, we can do even more.”
Synthetic Data for High-Stakes Domains
8: In high-stakes domains like emergency services, real datasets—like 911 logs—are often sensitive, proprietary, or too scarce to use freely for AI development. How can synthetic data, that is, artificially generated datasets, address this problem? What strategies do you currently use to generate synthetic emergency call data that preserves the important real-world patterns?
Tony Dunsworth: I started out a long time ago working with a commercial product, and one of the things they suggested is to take a real dataset and take it apart. Because I’m also the analyst who works with that data, I take it apart and find the things I need to see in it. For example, our center is a combined center, so it answers for police service, fire service, and medical services. I look at the ratio of the calls we receive so that when I make that dataset, it reflects those ratios properly.
Or I look at the reports I currently generate that I want to transition to maybe AI generation, or I want to be able to automate in some fashion. I look at that data. So I use real data as a seed—insofar as I find details about the data that I then want to recreate. And for me, it’s a lot of statistical recreation.
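As one sketch of what that statistical recreation can look like, the snippet below samples synthetic calls so that the police/fire/EMS mix matches ratios measured from real data. The ratios and field names here are invented for the example and are not Alexandria's actual figures.

```python
# Sketch of statistical recreation: sample synthetic calls so the service-type
# mix mirrors ratios measured from a real dataset. All values are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

observed_ratios = {"police": 0.62, "fire": 0.09, "ems": 0.29}  # measured, hypothetically
n_calls = 10_000

synthetic = pd.DataFrame({
    "call_type": rng.choice(
        list(observed_ratios), size=n_calls, p=list(observed_ratios.values())
    ),
    # Spread synthetic receive times uniformly across a 30-day window.
    "received_time": pd.Timestamp("2024-01-01")
        + pd.to_timedelta(rng.integers(0, 30 * 86_400, size=n_calls), unit="s"),
})

# The synthetic mix should echo the observed ratios.
print(synthetic["call_type"].value_counts(normalize=True))
```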
The generator I’m working on now is all Python. It’s all open-source scripts. But now that I have that down fairly well, I can feed that into an AI model and say, “OK, I need you to examine this.” This is where I make a lot of progress: I set it in my local model and say, “OK, this is the generator. What can we do with the data that comes out of it?” I feed it prompts and questions and say, “Let’s massage the model a little bit more so you can generate that dataset based off these parameters.”
And I’ve gotten some really interesting results—fairly positive. I build a lot of it, and then I pass it off to my local models to refine what I’m working on and make it a little better. Between the two, it's made it so that I can generate fairly realistic datasets that I can use in many different places.
I use it in testing my analytics models because I can do an analysis of that dataset really fast—I know what I’m looking for—and then I can have my model do the same thing. I make sure that they match. If it tells me something I didn’t see, I go back and recreate it: is it there, or is it something I need to work on because maybe my model isn’t as accurate as I thought?
One of the biggest challenges is that, when I first started building datasets, a lot of my elapsed times between events—because we calculate how many seconds it takes between one event and another—were all normally distributed. In real data, they may follow an exponential, log-normal, Poisson, or gamma distribution. So I check. I run tests against the synthetic data and say, "OK, I just want the numeric variables. I don't want the header. I don't want anything else—just the numeric variables."
I’ve written a script that analyzes the distribution and compares it. Then it tells me, “Here’s the distribution, here are its details.” And I feed that back into the model and say, “OK, make sure you’re using this distribution with these parameters when you generate these elapsed times between events.”
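A distribution check along those lines might look like the sketch below, which fits a few candidate distributions to an elapsed-time column with SciPy and reports how well each one fits. The file and column names are assumptions for the example; this is not Tony's actual script.

```python
# Sketch: fit candidate distributions to elapsed times (seconds between events)
# and report which fits best, so the generator can be told to use that one.
# File and column names are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("synthetic_calls.csv")
elapsed = df["queue_to_dispatch_sec"].dropna().to_numpy()
elapsed = elapsed[elapsed > 0]                    # elapsed times must be positive

candidates = {
    "exponential": stats.expon,
    "log-normal": stats.lognorm,
    "gamma": stats.gamma,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(elapsed)                    # maximum-likelihood fit
    ks_stat, p_value = stats.kstest(elapsed, dist.cdf, args=params)
    results[name] = (ks_stat, p_value, params)

# A smaller KS statistic means a closer match to the data.
for name, (ks_stat, p_value, params) in sorted(results.items(), key=lambda r: r[1][0]):
    print(f"{name:12s} KS={ks_stat:.4f} p={p_value:.4f} params={params}")
```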
9: Government and industry groups have started emphasizing that synthetic data is a key privacy-preserving technique. For example, a 2023 federal strategy noted that synthetic data can unlock the beneficial power of data analysis while protecting privacy. Yet it also noted adoption has been slow due to limited awareness, lack of standards, and concerns about quality. In your experience so far, what are the biggest hurdles to using synthetic data in practice?
Tony Dunsworth: The biggest hurdle was pushback from peers at first. When you start presenting synthetic data—well, the reason I started using it for presentations and for instruction was that I didn’t want to lose my audience if they recognized the events. So when I explained to them, “This is all synthetic data,” they kind of looked at it like, “Well, it looks correct.”
We started taking it apart, and that’s when we found the challenges I identified earlier—like the normal distribution showing up in places where it shouldn’t. But as I’ve iterated on and refined the generators over time—the way I do it—the closer I bring the output to a real dataset, the more they say, “Oh wait a minute, can I use the generator so that I can give a presentation without exposing real data?” And I explain how to use the generator, what the data output’s going to look like, how to do things with it.
Now people are more interested. Just a couple of weeks ago, I had a company say, “Hey, have you thought about using that generator for this purpose?” I hadn’t, but I said, “You guys can clone it—it’s a public repository. Clone it and start working with it.” I keep it under an open-source license. Just give me the improvements back—that’s the last rule.
I’m starting to see more people interested in using synthetic data so we can preserve privacy while still advancing what we’re doing. A lot of what I’m seeing now is in training, in teaching, and especially in testing—because if we can generate a dataset, or even just the pieces we need, we can plug that into the application to test the output.
10: Have you faced any challenges when it comes to validating models trained on synthetic data and how they will perform on real-world data?
Tony Dunsworth: Thankfully, not yet. Actually, I’ve been pretty successful. I think the closer I’ve made my datasets to being as realistic as possible, the better I’ve seen the model performances behave.
When I first started—yeah—I got some really weird output, especially when trying to build models that look at data analytics. One of the things we want to do is improve the use of analytics in centers. I want to make it easier for analysts, or for people, to analyze data. And I found that because my models weren’t as realistic as I thought they were, the output obviously wasn’t as accurate as I’d like it to be.
So I found programmer challenges more than application challenges, because it was the old saying: garbage in, garbage out. And when I was giving it certain types of data, it gave me exactly what I would expect—but it wasn’t what I needed. Seeing how the output wasn’t what I was looking for helped me refine my models and refine my overall analytics goals.
11: Like you’ve mentioned before, when training staff or demoing new tools, you never use real 911 call data—only synthetic data—to avoid exposing sensitive information. And you've talked about how you've dealt with the challenge of skepticism within your team or the people you work with. But when it comes specifically to trainees—those just starting off—or decision-makers, how have you found their reception to the idea of using synthetic scenarios for learning?
Tony Dunsworth: For the people I'm training, once you get over the initial explanation of why we’re using synthetic data, they’re usually on board with it. Decision-makers are a little more cautious. They’ll say, “Well, I want to make sure this is as realistic as possible,” because if we’re going to do this, we need that.
Over time, as you show them what you're working on and how it comes out, they’re the ones who—once they come on board—advocate as strongly or more strongly than I do. So that initial resistance melts away. But usually, for folks—especially when I’m training analysts—it’s like, “OK, you want to see some of the nastiest things we can put to you, to see what you can do and how well you perform?” All I have to do is change a couple of things in the model and let them go.
And they're like, “Hey, this is great,” because now they can focus on a solution. They can focus on a specific area that they want to work on or need help with. And I can make sure the data gives them that challenge, as opposed to waiting for events to line up correctly with real data. So it becomes a lot easier, especially in training—when you're training programmers or analysts—because you can really hone in on what you want them to focus on.
12: What safeguards or best practices would you say you follow to ensure that nothing about the synthetic data inadvertently results in a privacy violation—or duplicates an actual incident in an identifiable way?
Tony Dunsworth: I do two different things. One, I follow a lot of the ethical guidelines I was taught. I was very fortunate throughout my education—through my software engineering courses, my analytics and data science courses at university—that ethics was stressed as one of the most important things we needed to focus on alongside practice. So I have very solid ethical programming training.
I studied for the Certified Ethical Hacker exam years and years ago and took the practice exams and passed them. So I have a good background in knowing how to do things in an ethical way, and I keep a hold of that.
The other thing I do is I'm careful about where I pull seed data from. I'm careful about what kind of data I'm reproducing. For example, I don't reproduce narratives that come with calls—because they’d be too close. They’d be too close to maybe a triggering event or something that could come out looking like actual incident data. So when I'm working with tabular data, I'm very narrow and specific about what I do.
I also double-check when I get output from—say—a Faker library to generate names. I go through my own list of people I know to make sure that name doesn’t show up as someone I might know. Because, yeah, John Smith is fairly standard—but if I got a “Tony Dunsworth” accidentally generated from the Faker library, I’d say, “Whoa, wait a minute—we’re going to redo this.” I don’t want to see my name in there because now it may be too close.
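That kind of guardrail can be a few lines wrapped around the Faker library: generate a name, and regenerate it if it collides with a block-list of real people. The block-list file in the sketch below is a hypothetical placeholder.

```python
# Sketch: generate fake caller names, but regenerate any that collide with a
# block-list of real people. "known_names.txt" is a hypothetical placeholder.
from faker import Faker  # pip install Faker

fake = Faker()

with open("known_names.txt") as f:
    blocked = {line.strip().lower() for line in f if line.strip()}

def safe_fake_name(max_attempts: int = 20) -> str:
    """Return a generated name that does not appear on the block-list."""
    for _ in range(max_attempts):
        name = fake.name()
        if name.lower() not in blocked:
            return name
    raise RuntimeError("Could not generate a non-colliding name; review the block-list.")

names = [safe_fake_name() for _ in range(1_000)]
print(names[:5])
```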
The other thing is, I've started reviewing different frameworks—like the NIST framework we talked about in earlier conversations—so I can make sure I'm incorporating even more safeguards and rails into my practice. That way I can guide my development processes more efficiently to maintain ethical standards, maintain privacy standards, and make sure I'm doing everything as tightly as possible.
I’ve talked to my colleagues in the private sector, and I see I’m not the only one doing these things. My colleagues in the private sector—especially in the AI sphere—are doing the same. So I feel confident that the people I work with, whether it's public, private, or within my organization, are all operating from the same place: protecting as much information as we can, leveraging the metadata as best we can, but making sure we're not exposing anything we shouldn’t.
Risks, Lessons, and Future Directions in Public Safety AI
13: Deploying AI in emergency services carries very unique risks—from technical failures to ethical concerns—but also promises new capabilities. Regardless, introducing AI into emergency response is undeniably a high-stakes initiative. Lives can be affected by a bad prediction or a system failure. How do you approach risk management for AI tools in such a critical domain? You already talked about NIST’s AI Risk Management Framework, specifically when it came to privacy, but what about other kinds of risks?
Tony Dunsworth: One of the biggest risk mitigations is starting out from the beginning—knowing what you want to use AI for and how you define how it’s working well. How do you handle the data in transit? What data sources are you exposing to the AI, and what is it going to do with that data? Are you feeding that data back into model training at the same time? Is that data staying within your organization, or is it leaving?
For example, even if I build an AI in my organization, if I'm leveraging an off-site model, am I feeding that data back to that model? What data am I feeding, and how do I work with that vendor to make sure the data is being used the way I intend?
I prefer to keep my training in-house. That way, if I'm training my model off my own data to improve its accuracy, that doesn't leave my organization. But that’s a personal preference—I don’t want to expose that risk.
So a lot of it comes down to: How do you define success? What are you using it for? If you're using it just to say, “Well, we're using AI,” I'm going to be the first one to raise my hand and say, “Stop. We're not going to do it just for the sake of doing it.” But if we have a defined use case and a measurable success rate, it’s different. For example, we're using AI in our quality assurance, and it’s enabled our QA manager to process more calls, which is improving our ability to serve our community. And we can see noticeable improvements.
Now we know that the AI assistance is paying off. It's doing what we want it to do. And we know that our risk-reward of data management and governance is providing a positive result. So we continue with it.
A lot of that is what we have to do—find that trade-off. What do we want it to do? How do we define if it’s doing it? And how do we handle the data in between?
We know that data is going to contain sensitive information. In this case, we also know the company we purchased the software from showed us how they protect that data. We found that their protection was FedRAMP authorized and CJIS compliant—it had all the protections built in that protected us not only legally but ethically. So we knew what they were doing with our data.
That's one thing I’m very strident about: asking our vendors, “How do you protect our data? How do you use our data?” And teaching the executives how to ask those questions—what questions are important and how to ask them. Whether they’re negotiating with me or with an outside vendor, they need to know how to get to the core: How do we protect our data while still getting the most use out of it?
14: If you had to create a list of recommended steps, what would you say are the steps to test and validate a model to ensure it wouldn't fail in the middle of a 911 operation?
Tony Dunsworth: I always recommend stress testing. Get synthetic data together to test it—and then just, in the middle of your testing lab, hit it all at once. Hit it with everything you've got, all at the same time. Stress it. Make it work really hard and see how it performs.
If you see it bogging down—if there’s a slowdown—is the slowdown still acceptable? Are you still getting enough back that you can continue in time and see that it’s giving you accurate responses? That's the first step of testing.
The second step: stress it again. Make it do everything all at once. And if it continues to perform well, then you have some confidence that when you’re deploying it in a situation where the wheels have fallen off and everything happens at once, it’s still going to be reliable enough for you to continue to operate.
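A burst test along those lines can be as simple as firing a batch of synthetic prompts at the model concurrently and recording latency and failures. The sketch below assumes an Ollama-style local endpoint and model name; both are placeholders to adapt to whatever is actually deployed.

```python
# Sketch: burst-test a locally hosted model with concurrent synthetic prompts,
# then report latency and failure counts. Endpoint and model name assume an
# Ollama-style server and are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:11434/api/generate"
PROMPTS = [f"Summarize synthetic incident #{i} in one sentence." for i in range(50)]

def one_call(prompt: str) -> float:
    start = time.perf_counter()
    resp = requests.post(
        URL,
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

latencies, failures = [], 0
with ThreadPoolExecutor(max_workers=25) as pool:   # hit it all at once
    for future in [pool.submit(one_call, p) for p in PROMPTS]:
        try:
            latencies.append(future.result())
        except Exception:
            failures += 1

latencies.sort()
print(f"completed={len(latencies)}  failures={failures}")
if latencies:
    print(f"p50={latencies[len(latencies) // 2]:.1f}s  p95={latencies[int(len(latencies) * 0.95)]:.1f}s")
```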
We know that, with the nature of our business, anything can break at any second. So as long as we know it can handle some of that stress, and we can overstress it in test and it still works well enough, then we feel confident that—even if it breaks—it may break in a way we hadn’t anticipated, but we know it can still recover and come back to service quickly.
15: Can you share a lesson from a project that didn’t go as expected—perhaps an AI model that initially underperformed or a data initiative that faced resistance?
Tony Dunsworth: Yeah. I started building an AI-assisted analytics platform, and I thought I had gotten all of my stuff right. I started in testing—the first couple of tests worked well—and then I fed it a dataset that was a little more challenging than the ones I had used before, and it threw up. For lack of a better descriptor, it just said, “I got nothing.” It worked for 20 minutes on what I thought was a basic problem and then said, “I cannot create a solution.”
Unfortunately, I did that stress test in front of management.
Yeah. Lesson learned. I went back. I’m back at the drawing board. I threw everything out and started redoing it, because I found it was also overly complex. What seemed natural and normal to me—when I asked people to poke at it and look into it—they couldn’t get from Step 1 to Step 3, because Step 2 wasn’t obvious.
So I took all that feedback in and said, “We’re going to throw this codebase away. We’re going to start with a new codebase, but we’re going to start with engineering it in a different way.” And I’m in the middle of that process now. I’m really hopeful. I’m really optimistic that I have the workflow defined a lot better.
So what I’m looking at is not one global solution like I wanted, but micro-solutions that will do different things inside the same framework. And I think that will work a lot better.
16: If you had to pin down the most important lesson you learned from that experience, what would it be—and how has it impacted your practice moving forward?
Tony Dunsworth: Better user feedback.
Because I’m a trained analyst—I have multiple degrees in it—I assumed I could develop an analytics flow that would work for everybody. I learned really quickly: it worked well for me, but it didn’t work well for my target audience. And I failed to take my own target audience into account. I assumed they’d look at my workflow and say, “Oh, that’s the right one.” But that wasn’t what they wanted. What they wanted was a different workflow.
I learned that lesson the hard way—that the workflow didn’t work for them. So back to the drawing board.
Now, I reach out more often to power users—because really, they’re my user base—and ask: “What do you want to see? Do you want to see this? Will this work for you?” I pitch ideas, get ideas back, and that’s building a better workflow.
Ultimately, that’s the core of it. It’s the same in any software engineering: if your users aren’t going to be comfortable using it, it doesn’t matter how many bells and whistles it has. It doesn’t matter how great it works. If they’re not going to use it—it doesn’t work.
17: Looking ahead, what developments are you personally most excited about at the intersection of AI and public safety?
Tony Dunsworth: I’m excited about a couple different things. First and foremost, there’s a lot of focus in the vendor community on bringing AI to non-emergency calls so that we can intercept them and take care of them—without the telecommunicators manning our emergency lines being the ones who have to handle them.
Now we’re starting to integrate more of our—here in the United States—311 program, where it’s things like: there’s a pothole in the neighborhood, or the trash pickup didn’t happen. In the past, people would call those into the emergency lines. Now we’re fusing those together so that if someone calls in, it can be routed to the right place—whether it’s 311 staff or even a different department in the government structure that’s more suited to handle it.
So we’re providing more efficient service to the public and reducing the volume that our call-takers handle. It really is a community win-win, because now we can get those services out faster.
The other thing is: my first degree many years ago was in linguistics. So I’m excited about finding better solutions to handle multiple languages. In the city I work in, we publish our city documentation in four languages: English, Spanish, Amharic, and I believe it’s Sudanese Arabic. Because those are our major population groups.
In another community, maybe they need Vietnamese or Korean or Chinese—or French, in Canada. If we can improve the quality and speed of translation, we won’t have to put a caller on hold while we grab a language line and an interpreter. We can get that mobile response out to someone in an emergency several seconds faster.
That can save a life. That can get someone to medical treatment faster. That can calm a situation down faster. If we can do that, we’re benefiting our community and taking stress off of our telecommunicators at the same time.
Those are the most promising things to me—how we can do those things more efficiently, more quickly, and with greater benefit to our communities.