
Driving innovation in radio astronomy

Podcast hosts: Rebecca Parsons and Prem Chandrasekaran | Podcast guests: Justin Jose and Dr. Neeraj Gupta
September 07, 2023 | 38 min 45 sec


Brief summary

Radio astronomy — a subfield of astronomy that studies the sky using radio frequencies — is data-intensive. That poses a challenge for radio astronomers: building and then communicating scientific insights requires significant processing and analytical work. Thoughtworks has been working with Dr. Neeraj Gupta from the Inter-University Centre for Astronomy and Astrophysics (IUCAA) in India to develop solutions to these challenges, including a data processing pipeline, a collaborative platform for analysis and a digital catalog for publishing and communicating research.

 

In this episode of the Technology Podcast, Dr. Gupta joins Justin Jose of Thoughtworks India's Engineering for Research (E4R) team to speak to hosts Rebecca Parsons and Prem Chandrasekaran about their work together. Dr. Gupta explains the benefits of Thoughtworks' work from an astronomer's perspective, while Justin highlights the challenges of building software solutions in a highly specialized domain.

Episode transcript

 

[MUSIC PLAYING]

 

Rebecca Parsons: Hello, everybody. My name is Rebecca Parsons. I'm one of the co-hosts for the Thoughtworks Technology Podcast. And I'm joined by my colleague Prem.

 

Prem Chandrasekaran: Hello, everyone. I'm very glad to be here. Looking forward to a great conversation today.

 

Rebecca: And we are joined by two guests. First, another of my colleagues at Thoughtworks. Justin, would you please introduce yourself?

 

Justin Jose: Hello, everyone. I'm Justin, and I work as a lead developer in E4R.

 

Rebecca: And joining us from the project itself, Neeraj, would you like to introduce yourself, please?

 

Dr. Neeraj Gupta: Hi, everyone. I am Neeraj Gupta. I am an astronomer, and I work at the Inter-University Centre for Astronomy and Astrophysics. It's a research institute based in India.

 

Rebecca: So let's start with introducing the problem. What is it that you're trying to solve and how have you approached this? And let's start with the science problem we're trying to solve, because the principle of E4R — to remind people — is that we as Thoughtworks are trying to bring our software development and technology capabilities to assist scientists like Neeraj in solving science problems. So, Neeraj, can you tell us a little bit about the problem we're trying to solve?

 

Neeraj: Yeah, sure. So the central problem that we are trying to address is to understand how galaxies form and evolve. Now, what are galaxies? When we think about them, galaxies are an ensemble of stars, but they also contain a lot of gas, and these stars form from this gas. So one thing we try to understand about galaxies is how they acquire this gas from outside and then convert it into stars. But this is not so simple, because at the center of these galaxies there is usually also a massive black hole, and this black hole at times can emit huge amounts of energy. This energy can be so large that it can actually outshine all the stars in the galaxy put together, and the black hole can also eject a lot of material into the galaxy. All this feedback coming from the black hole can disrupt the process through which galaxies convert the gas they have painstakingly acquired into stars. So understanding galaxy evolution — their formation and how they evolve over a period of time — is essentially about understanding this interplay between gas, stars, and the feedback coming from the central black hole. Now, this is a very complex problem, as you can imagine, and there are several long-standing questions related to it. So what we are doing is using the most sensitive radio telescope, the MeerKAT telescope in South Africa, to obtain a lot of data on millions of galaxies in the sky. Essentially, we are taking 1,600 hours of observations. This will lead to 1.6 petabytes of data, all gathered over a period of about three years, and that's what we have to process to understand how these galaxies form and evolve.

 

Rebecca: Excellent. Justin, can you talk us through the problem from the technology perspective? What have we been doing to try to support this work?

 

Justin: OK. So from a technology perspective, the problem is twofold. One is the data — the amount of data we would be looking at, as Neeraj mentioned earlier. MeerKAT is a state-of-the-art telescope and it is going to generate a lot of data, and at the point when the collaboration started, we had no benchmark for how to process data at that volume. So from a data perspective, this was a very big problem statement to handle, and that was the first unknown: what happens when you get data on the scale of petabytes — how do you process it? The second was mostly about the domain itself: how do you build a robust pipeline to support science? From my perspective as a technologist, astrophysics is a domain I'm not so comfortable with. So how do we design software where we understand the domain while also keeping in mind the unknowns brought in by both the volume of the data and the domain itself? That was the challenge from the technology perspective: delivering a robust system that enables the science to progress, given that we were handling a scenario like this for the first time.

 

Rebecca: And how did we get involved? How did this collaboration start? What was the impetus for this collaboration?

 

Neeraj: So I can give my perspective on this and then Justin can add to it. When we started thinking about executing this project in 2015 and '16 — quite a while ago — it was clear from simple back-of-the-envelope calculations that if we wanted to process this 1.6 petabytes of data using the traditional methods of those times — where an astronomer takes data from the telescope, loads it onto their high-end workstation, and then looks at it and processes it in their own time — it would take close to 20 to 30 years. So it was obvious that we could not use just those traditional methods, which had served the field for decades, on a project of this scale. It was obvious to us that we needed to bring in the best software engineering practices to solve this problem. The second aspect that was also clear to us — and this comes from the complexity of the data itself — was that we could not use off-the-shelf tools even in the software domain. We needed to work with the best in the field in every domain. And that's where we started discussing with Thoughtworks, and we agreed that, OK, we will start working on this.

 

Prem: So can you elaborate a little bit on what kind of system you built?

 

Justin: So the entire collaboration was done in a phase-wise manner. The first phase was essentially prototyping: understanding whether this can even be done. So prototype it — get a proof of concept out — and that then becomes your base benchmark to start with. That was the initial phase. With that prototype being a success, knowing that yes, something like this can be built — the unknown just being the data volume and how it would affect things, but from a science perspective the proof of concept was done — we entered phase two, where we actually got into building the concrete system that would help us progress through this journey: where we can identify what to observe, when to observe and what to do once the data comes in, while keeping checkpoints so that if something fails, we know how to fall back and can make sure the processing goes smoothly. That was the second phase. And then came the third phase, where now we have the data, we have processed it, we have images generated out of it — how do we make it publicly accessible and move ahead with the science? The science is ultimately there to be consumed by people, so how do we make it available to people at large? So that's the overall picture of how the collaboration was structured.

 

Rebecca: So can you tell us a little bit about what the flow of information is and a bit more specifically? I mean, so far we know there's lots of data and that's a very general problem. But can you tell us a little bit more? Okay, we've got the observations, what kinds of things are you looking for in the data and what are the properties that you're trying to maintain when we're processing the data other than the fact that yes, there's a lot of it and we need to make sure we process it efficiently?

 

Neeraj: So to understand this, we need to take a step back and look at the complexity of the data. As I said, we have a lot of data — 1.6 petabytes — and what we are trying to do is look at the sky. And it's a radio telescope that we are using, which means that we want to look at the sky at radio wavelengths. The act of seeing that we are so used to with our eyes, when we look up at the stars or the moon, all happens through the lens sitting in our eye, and we take it for granted. Mathematically, that lens is performing a process we call a Fourier transform — it's actually a Fourier transform sitting in our eye. But this lens, which nature gives us so readily at optical wavelengths, does not function the same way at radio wavelengths. So what we do at radio wavelengths is build a telescope that consists of a large number of dishes, or antennas. In the case of MeerKAT, it is 64 antennas spread over an area of eight kilometers. We take voltages from each of these antennas and combine them in pairs: if there are three antennas, we will combine the signals from one and two, two and three, and three and one, and we do the same for all 64. And then a data stream flows to us every few seconds, and it comes as a function of frequency — 32 sampled frequencies coming to us.
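To give a rough sense of the scale of that pairwise combination, here is a small back-of-the-envelope sketch in Python. The 64-antenna count comes from the conversation; the indexing by time, baseline and frequency is a generic description of interferometric visibility data, not the specific MALS data layout.

```python
# Illustrative only: the scale of pairwise antenna combination, not MALS/ARTIP code.
# With N antennas, signals are correlated in pairs ("baselines"): N*(N-1)/2 pairs.
n_antennas = 64
n_baselines = n_antennas * (n_antennas - 1) // 2  # 2016 baselines for 64 antennas

# Each integration (every few seconds) yields one complex visibility per baseline
# per frequency channel, so the raw stream is naturally indexed by
# (time, baseline, frequency).
print(n_baselines)  # -> 2016
```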

 

So that's the complex data that we have. What we do with this data is take it and apply a Fourier transform, which is equivalent to making a lens at radio wavelengths through our computers and electronic data processing — and that, in turn, is equivalent to producing an image, very similar to what we would see. Now, at the processing level the complexity comes from two sources. One is the large volume itself, which needs to pass through our systems and be processed reliably and efficiently. The data is organized in three dimensions — frequency, time, and antenna separations — such that the different data processing steps cannot all be partitioned in the same way. So we cannot say: okay, I am going to take this one data partitioning strategy and apply it to all the different steps of data processing to get my image out at the end. We need a system that can work through these different stages with different partitioning or solving strategies. That is one level of complexity this pipeline has to tackle. The second is related to the nature of the project: building this pipeline is going to take time because it's complex, and the requirements of this pipeline should come from the telescope's performance, because it's supposed to process the data from this telescope. But the telescope is not yet built. And we cannot wait until the telescope is built and we know its properties completely. So we have to start building the system well before the telescope is in place and we understand it completely.
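As a rough illustration of that "computational lens," here is a minimal sketch of turning visibilities into an image with a Fourier transform. It is a toy under stated assumptions: real imaging (for example with CASA's tclean) involves calibration, weighting, deconvolution and far more careful gridding; the function name and grid parameters here are purely illustrative.

```python
# A minimal sketch: grid complex visibilities sampled on (u, v) baseline
# coordinates onto a regular grid, then inverse-FFT to get a "dirty" image.
import numpy as np

def dirty_image(u, v, vis, npix=256, cell=1.0):
    """u, v: baseline coordinates (in wavelengths); vis: complex visibilities.
    cell: uv-grid cell size in wavelengths (assumed, for illustration)."""
    grid = np.zeros((npix, npix), dtype=complex)
    iu = np.round(u / cell).astype(int) % npix   # naive nearest-cell gridding
    iv = np.round(v / cell).astype(int) % npix
    np.add.at(grid, (iv, iu), vis)               # accumulate samples per cell
    image = np.fft.fftshift(np.fft.ifft2(grid))  # Fourier transform -> sky image
    return image.real
```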

 

So Justin was talking about how we spent a lot of time prototyping. We built a system that can cater to a large number of data processing and solving scenarios, and then we tested it against a variety of telescopes that were available at that time. And we left those options open so that as soon as the telescope came online and we processed real data through it, we could quickly make those choices, optimize our pipeline, and start processing the data — so that we were ready for it when it came. Because we have to remember that the 1.6 petabytes of data is coming over a period of three years, and our system has to be efficient enough that we can process 1.6 petabytes in about three to four years. We cannot take five or ten years for that. So to meet that, the system had to be prototyped and ready well before we even started our processing.

 

Prem: So, again, just trying to understand here: even before the telescope actually started transmitting real data, the prototype used synthetic data to simulate what the actual telescope would send you, and you examined the results of that, so that when the telescope did become ready, you were ready to go in terms of being able to process the data correctly. Is that a fair representation of how we solved the problem?

 

Neeraj: So we used simulated data, that's correct. But we also used real data from the best telescopes of that time, because we wanted to ensure that our pipeline was responsive to real-world scenarios. With simulations, there is a limitation: you can only get out the result that you have put into the simulation, so you cannot be 100% sure that you are actually testing against a real scenario.

 

Rebecca: And I understand that even with this initial prototype, we actually made a real scientific discovery. So can you briefly tell us that discovery, and then Justin I'll ask you a little bit about how that came about.

 

Neeraj: Yeah. So we were describing the prototyping stage, and I mentioned that we also tested against real data from the best telescopes of that time. We were looking at galaxies that are fairly distant from us, and we ended up detecting traces of the hydroxyl molecule in one such galaxy. When I was describing the central problem — understanding how galaxies form and evolve — I said that the cold phase of gas, the atomic and molecular phase, which is as cold as 10 or 20 kelvin, occupies a very special place in that problem space. It's central to understanding how gas can be converted into stars, and this hydroxyl molecule is a very nice tracer of it. Very few of these have been detected in the past. So through the prototyping phase itself we were able to detect one such case, which was very exciting — we got it published in a very prestigious journal, that's one aspect of it, but at the same time it also gave us confidence that when we do this large-scale survey, we will surely be able to detect many more of these in the sky.

 

Rebecca: So, Justin, that must have been pretty exciting to have the work that you did result in a scientific discovery. How did you approach this from the perspective of a technologist in terms of setting up this pipeline in a way to assist Neeraj in his research?

 

Justin: So there is a funny story behind that discovery. The team back then, while they were working on this pipeline, said: yes, let's test it out — it's the testing phase of the pipeline, let's do it, and we have the data. And while testing, they said: we have this plot generated from the pipeline and it might seem like an anomaly; it might be because the pipeline is not configured properly. And I think that is the point when they got in touch with Neeraj: maybe you can just validate whether the pipeline is running properly? And voila, we had a discovery during the testing phase of the pipeline itself. So that is how the discovery came into being. And I remember a comment from back then — this was the phase when I was just joining the project. The developer at the time told me: if it had been done via a basic script, this might have been missed, because the script would not have been robust enough to capture it the way the pipeline was designed to. So from a confidence standpoint, it told us that even though the pipeline was still in a phase where we did not know exactly whether it would work properly, we were at a state where we could confidently say: yes, we can proceed; we have the confidence that it will perform properly. That was the output of the prototype. Now, keeping that as the baseline of development and building on top of it, we developed four parallel pipelines. The base systems are the same, and we change the subsystems so that they do different operations. So in total, we built four data processing pipelines, one for each of the different stages in the data preparation itself. And what the researcher at the end — what Dr. Neeraj — could do was chain these pipelines together to do the science he intended to do with them. So yeah, that was the way the overall development was followed through.

 

Prem: Very exciting. It makes me curious. What kind of technologies did you use to build this kind of system?

 

Justin: So to understand that, we need to understand the underlying astronomical tool we were using, which was CASA, and the APIs provided by CASA are in Python. So we used Python to write modules that chain together the individual aspects of the data processing into a pipeline. The overall design was configurable, in that you could switch off certain aspects of the pipeline if you did not want to run them. The entire system was flexible enough that I could choose which phases of the data processing I wanted to run: say, for example, I'm rerunning a certain data cleaning activity and I know certain cleaning steps have already been done before — I would not want to rerun those. My configuration would allow me to choose very specific parts of that process. And this was one pipeline; we had similar Python applications for the others — I would call them applications in their own right.
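As a rough sketch of that idea — and not the actual ARTIP code — here is what configurable, chainable stages might look like in Python. The stage names and the PipelineConfig class are illustrative assumptions; in the real pipeline the stages would wrap CASA tasks (for example, flagdata or tclean).

```python
# Hypothetical sketch of config-driven, chainable pipeline stages.
from dataclasses import dataclass, field
from typing import Callable, Dict

Stage = Callable[[str], str]  # each stage takes a path to data and returns a path

def flag_data(ms_path: str) -> str:
    # ... would call a CASA task such as flagdata() here ...
    return ms_path

def calibrate(ms_path: str) -> str:
    # ... would call CASA calibration tasks here ...
    return ms_path

def image(ms_path: str) -> str:
    # ... would call a CASA imaging task such as tclean() here ...
    return ms_path

@dataclass
class PipelineConfig:
    # Stage names map to on/off flags: rerun only what is needed.
    enabled: Dict[str, bool] = field(default_factory=lambda: {
        "flag": True, "calibrate": True, "image": True})

def run_pipeline(ms_path: str, config: PipelineConfig) -> str:
    stages: Dict[str, Stage] = {"flag": flag_data, "calibrate": calibrate, "image": image}
    for name, stage in stages.items():
        if config.enabled.get(name, False):
            ms_path = stage(ms_path)  # output of one stage feeds the next
    return ms_path

# e.g. skip flagging on a rerun because it was already done:
# run_pipeline("obs.ms", PipelineConfig(enabled={"flag": False, "calibrate": True, "image": True}))
```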

 

Rebecca: And so, Neeraj, would you say these modules and this configuration is something that made sense to you from the perspective of the analysis that's familiar to you as a scientist?

 

Neeraj: Oh yes, because we worked very closely and collaboratively over all these years. All the configurations that go into fine-tuning this and all the outputs that come out of it, we designed and tested together — that's one crucial aspect of our project. In addition to what Justin described, we also had a few further requirements. One is that, because we are dealing with such large volumes of data, one data set — one hour of data from the telescope — can take a week of processing on our cluster to make certain types of images. So the pipeline has to couple very nicely with the set of processes that run on the high-performance computing system we set up at IUCAA. Then, our research team is geographically distributed, so we also needed a system that the team can access seamlessly regardless of where they are physically located. That was another major requirement. And since the data volume is so large at all stages: when the data comes from South Africa to our institute in India, we load it from tapes onto the storage associated with our cluster; from that stage we have to process it and then archive it. And then there are the different data products — images of galaxies, or their spectra, which is their brightness as a function of frequency.

 

So we are talking about millions of objects and millions of spectra that we have to process and make available to our team. Since everything is so complex — large volumes, large numbers — we have to have a system that can deal with all the stages seamlessly. It's not just processing the data and stopping there, with someone else taking care of the products. We need a system where, the moment data is there, it understands: okay, this is the data I need to process; it processes it; and it should be able to tell us whether it processed successfully or not, and if not, at what stages it may have failed, so that someone can address the issue by looking at the various diagnostics produced by the pipeline. It has to be very efficient, because otherwise, if a scientist has to watch the data while it is processing — and the total processing time for this data is three years while we are observing — then during those three years we can either process the data or do the science, not both. So we have to have a system where scientists engage with the processing, prototype it, configure the pipeline, and once they fire off the process, they can think about the science. I think this is what we achieved through the design Justin described: a scientist comes in and spends time configuring it, but once that's done, they can trigger the processing, forget about it, and start thinking about the science. That's why we have been able to make these discoveries while the data is still coming in.
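Here is a hedged sketch of that "fire and forget" idea: a runner that records a per-stage checkpoint with status and diagnostics, so a failed run can be inspected and resumed without anyone babysitting it. The checkpoint file layout and function names are assumptions for illustration, not the MALS/ARTIP implementation.

```python
# Hypothetical checkpointing runner: record per-stage status and diagnostics.
import json, time, traceback
from pathlib import Path

def run_with_checkpoints(stages, dataset_id, state_dir="checkpoints"):
    """stages: list of (name, callable) pairs; callable takes dataset_id."""
    state_path = Path(state_dir) / f"{dataset_id}.json"
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    for name, stage in stages:
        if state.get(name, {}).get("status") == "ok":
            continue  # already completed in a previous run, skip it
        record = {"started": time.time()}
        try:
            stage(dataset_id)
            record["status"] = "ok"
        except Exception as exc:
            record["status"] = "failed"
            record["error"] = str(exc)
            record["diagnostics"] = traceback.format_exc()
        record["finished"] = time.time()
        state[name] = record
        state_path.parent.mkdir(parents=True, exist_ok=True)
        state_path.write_text(json.dumps(state, indent=2))  # persist progress
        if record["status"] == "failed":
            break  # stop the chain; a person inspects the diagnostics
    return state
```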

 

Rebecca: And I understand that in addition to the discovery from the prototype stage that there have been additional discoveries as you've been processing the data. So can you tell me a little bit about that?

 

Neeraj: Yeah. So this again highlights the point that whenever we look at the sky with a new telescope that has capabilities that did not exist before, every time we point it at the sky we are going to be surprised — provided the data has been processed properly. Through MeerKAT, we are getting data covering such a large range in frequency that, when we were observing this particular galaxy, the following situation arose. This galaxy contains a lot of the cold atomic and molecular gas I was talking about, so naturally it has a lot of stars in it as well. But when stars form, they also emit a lot of radiation, and this radiation can ionize the gas — destroy the cold gas from which the stars formed. It looks counterintuitive, but this is what happens in nature. And when this gas gets ionized, it emits a different kind of radiation which we call recombination lines. We call them recombination lines because the electrons that have been ejected from the atoms or molecules are combining back with them. From physics, we expect this to produce certain signatures at radio wavelengths, but these signatures are so weak that they have not been reliably detected in the past. In this particular case, we got this nice, beautiful spectrum of the object. We knew that it should contain these signatures, because that's what we expect from our understanding of basic physics — and that can never be wrong, because physics is robust. But those signatures do not appear at just one frequency: in this spectrum, which has more than 64,000 pixels, they appear at maybe 30 to 50 different locations.

 

So what we did was identify those locations based on our expectations from basic physics, and then we averaged and combined them in frequency space. When we did this, the signal really came out very significantly. And this discovery is important from two aspects. One is that it validates the basic physics that we understand and expect to work even in these distant galaxies. The other is that this detection implies we should be able to detect many more such systems with MeerKAT, with our survey, and also with future telescopes such as the Square Kilometre Array, which may be even more sensitive. So this has opened up a new field — one which we knew should exist but which was not really becoming accessible. It's one significant step in making this field accessible and open to the community.
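A toy sketch of that stacking idea in Python: cut out windows around the channels where basic physics predicts recombination lines, align them, and average, so that a signature too weak to see in any single window becomes significant. The function and parameter names are illustrative assumptions, not the analysis code used for MALS.

```python
# Hypothetical spectral stacking: average windows around predicted line positions.
import numpy as np

def stack_lines(spectrum, line_channels, half_width=50):
    """Average +/- half_width channels around each predicted line channel."""
    windows = []
    for ch in line_channels:
        lo, hi = ch - half_width, ch + half_width + 1
        if lo >= 0 and hi <= spectrum.size:
            windows.append(spectrum[lo:hi])
    # Averaging N aligned windows reduces the noise by roughly sqrt(N),
    # which is what lets the weak, repeated signature emerge.
    return np.mean(windows, axis=0)

# e.g. a ~64,000-channel spectrum and a few dozen predicted line positions:
# stacked = stack_lines(spectrum, predicted_channels, half_width=50)
```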

 

Prem: This is really, really exciting, this discovery. Can you tell us about the MeerKAT absorption line survey, please?

 

Neeraj: So the MeerKAT Absorption Line Survey (MALS) is the name of the large survey project that we are carrying out with the MeerKAT telescope, which I mentioned in the beginning. We are observing for approximately 1,600 hours, looking at a huge number of galaxies in the sky to understand how galaxies in general form and evolve. The idea is that we are observing approximately 400 different patches in the sky, selected because these are the best locations to understand the formation of galaxies and, especially, to detect the cold atomic and molecular gas phases — because that's what we are trying to look for in these galaxies. Up to this point, we have acquired close to one petabyte of data, and we have processed more than 700 terabytes of it through the pipeline. From this first batch of data processing, we have now identified half a million objects in the sky. Most of these are actually supermassive black holes, and many of them have been detected for the first time. That essentially forms the first data release of our project. By 'data release', we mean that we have organized this data in a form that not only our team can use for various science objectives, but that the astronomy community at large can also use in a very efficient manner. This is a very significant milestone for several reasons. The foremost, which is relevant for our project, is that with this release we have ticked all the boxes — all the things that our project should do, starting from carrying out observations all the way to getting these images of the sky from which science can be done. And it includes processing, archiving, everything — all the stages of the pipeline that Justin described. So that's one very important aspect. It gives us a lot of confidence that we can process the rest of the data; it's just a matter of time. And on the basis of it, we will be able to tell whatever is there in the sky, because now we know that we can do it and we can do it properly.

 

And the second important aspect is that we have also been able to make this publicly available to the astronomy community at large. That's important because this data is very rich and very complex and it contains a lot of information. Our survey team has certain objectives; we will do science based on those objectives and we will put that out in the public domain as well. But there's a lot of other science that can be done with this data which we cannot do, for various reasons. One is that we know it can be done, it must be done, but it's just too much — we are a small team with finite resources, so we cannot do everything; the community will do it. The second is that there are things that can be done with this data which our team cannot do because we don't have the expertise. Putting it out in the public domain enables all those possibilities. And of course, the third is that none of us knows at this point everything this data can do. After two or three years, maybe, someone else with a new perspective or better abilities will come along, look at this data and do those new things. This is how projects of this scale need to be executed: we not only enable what we want to do, but we also share everything that has been done publicly, so that it can be improved upon and much more can be done with it.

 

Rebecca: So that tells us a little bit about where this is going from the science perspective. Justin, where else can this pipeline go from a technology perspective? Are there things that you're looking to do, or are there different applications that you might want to take into account? What's the future for the tech? We know the future for the science, which is already very exciting, but what's the future for the tech?

 

Justin: So to answer that, I'll sidestep slightly onto the idea of cognitive burden. For a researcher, the primary goal is the science; data processing is the process through which they reach the science. Now, if a researcher requires multiple tools, switching between those tools creates a cognitive overload. It's a very easy trap to get into — losing context of where they were, or losing track of what they were thinking. A very simple case: I start with a tool thinking about something, and by the time I'm done configuring the tool, the thought chain itself is broken. That cognitive burden is something we can reduce by following the approach we took for the entire MALS survey with ARTIP — the pipeline along with the environment we provide around it. The idea is very akin to a science platform. We take the learnings from ARTIP — we know how powerful the idea is in itself — of having a unified platform where all the tools required by the researcher are available and the data flow is transparent to them. In the case of ARTIP, the researcher did not have to worry about what happens after one phase of data processing: where does the data go from there? What does the second phase do with it? They just had to prepare a config. Yes, they had to put thought into what the config would look like, but once the config was done, they did not have to worry about the management of data in depth. They knew the next pipeline, or the next toolchain, would take care of that data — it knows where the data resides and where it needs to go. Taking that idea one step further: can we take inspiration from this to build science platforms at large scale — very much with the emphasis on large scale, where the data volume is large and the processing timelines are, let's say, weeks or months? Such a system would enable the scientist or researcher to focus on the outcomes of the research rather than getting into the nitty-gritty of how individual systems interact, or what happens if, say, an API call fails. The system is built in such a way that it knows how to manage such a failure. What happens if a processing step fails — what kind of message should the researcher get? Are they getting too many verbose errors, or just what they need to know?

 

So building such large science platforms is one major learning we take from this. And there are other domains — the pharma domain, for example — where such pipelines can now help. You start with a hypothesis, put that hypothesis to the test in a robust pipeline which takes in data, goes through multiple processes and produces an output, and the researcher is involved mainly in the initial phase of designing what the pipeline looks like; after that, it is repeatable. Secondly, such a science platform allows for reproducibility of results. Take the initial discovery as an example: we thought it was an anomaly, but because we could rerun it on the pipeline efficiently within a limited span of time, we could reproduce that result over and over again, and we knew it was not an anomaly — it was a concrete result which could be used. So reproducibility comes along with it, and in science what matters is reproducibility along with knowing how to reproduce it. If I have a well-established system like a science platform which says these are the toolchains that were chained together with these configurations, and if you run it as-is, you get the result — that underpins reproducibility. If someone takes this as their baseline, they can reach that baseline pretty easily, because they have everything required to get there in the first place.

 

And from there they can develop further both on science and the tech aspect itself. So if there are other tools which can be interlinked into this science platform idea, they can interlink it because the base system is already available. So that is where this idea can grow from this point. And ARTIP has been a great inspiration in thinking in that direction. It allows you to think about what happens when you get a large volume of data, as an example. What happens if the domain is unknown — how do you interact with the researcher to understand or incorporate the domain into the technology? How do you integrate a tool? And it also opens up an arena where the idea of collaboration between a researcher and a technologist goes in a symbiotic fashion. So we make progress in the technology perspective where, let's say, in other projects in E4R, we are developing hardware which can process a large volume of data. So we can now take inspiration — so we can grow in that technology aspect while the researcher along with us grows in the science aspect. So it's a symbiotic relationship from this point on.

 

Rebecca: Excellent. Well, I'd like to thank you all once again. Another fascinating set of discoveries coming out of our Engineering for Research group, E4R. So I would like to thank you, Justin, and thank you, Neeraj, for joining us and explaining to us the joys of star formation and cold gases, and how you're making all of this information available so other scientists can build on the data set we've made available to them. So thank you very much.

 

Neeraj: Pleasure.

 

Prem: See you, folks.

 

[MUSIC PLAYING]
