Platforms
Adopt
-
Headless content management systems have become a common component of digital platforms. Contentful is still our default choice in this space, but new entrants like Strapi have impressed us too. We particularly like Contentful's API-first approach and implementation of CMS as code. It supports powerful content modeling primitives as code and content model evolution scripts, which allow it to be treated like other data store schemas and enable evolutionary database design practices to be applied to CMS development. Recently, Contentful has released an app framework to write apps that make it easier to adapt Contentful to individual business processes and to integrate with other services. Apps can be built by and for a specific organization but a marketplace for apps is emerging, too.
-
GitHub Actions has become a default starting point for many teams that need to get CI or CD up and running quickly in a greenfield environment. Among other things, it can take on more complex workflows and call other actions in composite actions. Although the ecosystem in GitHub Marketplace continues to grow, we still urge caution in giving third-party GitHub Actions access to your build pipeline. We recommend following GitHub's advice on security hardening to avoid sharing secrets in insecure ways. However, the convenience of creating your build workflow directly in GitHub next to your source code combined with the ability to run GitHub Actions locally, using open-source tools such as act, is a compelling option that has streamlined the setup and onboarding of our teams.
-
K3s remains our default Kubernetes distribution for edge computing needs and resource-constrained environments. It's a lightweight, fully compliant Kubernetes distribution with reduced operational overhead. It uses sqlite3 as the default storage backend instead of etcd, and it has a reduced memory footprint because it runs all relevant components in a single process. We've used K3s in environments like industrial control systems and point-of-sale machines, and we're quite happy with our decision. With containerd, the K3s runtime, now supporting WebAssembly (Wasm), K3s can run and manage WebAssembly workloads directly, further reducing the runtime overhead.
Trial
-
Apache Hudi is an open-source data lake platform that brings ACID transactional guarantees to the data lake. Our teams have had a great experience using Hudi in a high-volume, high-throughput scenario with real-time inserts and upserts. We particularly like the flexibility Hudi offers for customizing the compaction algorithm which helps in dealing with "small files" problems. Apache Hudi falls in the same category as Delta Lake and Apache Iceberg. They all support similar features, but each differs in the underlying implementations and detailed feature lists.
-
Arm compute instances in the cloud have become increasingly popular over the past few years due to their cost and energy efficiency compared to traditional x86-based instances. Many cloud providers now offer Arm-based instances, including AWS, Azure and GCP. The cost savings of running Arm in the cloud can be particularly significant for businesses that run large workloads or need to scale. Based on our experiences, we recommend Arm compute instances for all workloads unless there are architecture-specific dependencies. Tooling that supports multiple architectures, like multi-arch Docker images, also simplifies build and deploy workflows.
-
Faced with the challenge of exploring large configuration spaces, where it may take a significant amount of time to evaluate a given configuration, teams can turn to adaptive experimentation, a machine-guided, iterative process to find optimal solutions in a resource-efficient manner. Ax is a platform for managing and automating adaptive experiments, including machine learning experiments, A/B tests and simulations. Currently, it supports two optimization strategies: Bayesian optimization using BoTorch, which is built on top of PyTorch, and contextual bandits. Facebook, when releasing Ax and BoTorch, described use cases like increasing the efficiency of back-end infrastructure, tuning ranking models and optimizing hyperparameter search for a machine learning platform. We've had good experiences using Ax for a variety of use cases, and while tools for hyperparameter tuning exist, we're unaware of a platform that provides functionality in a scope similar to Ax.
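To make the idea of adaptive experimentation concrete, here is a minimal sketch of a Bernoulli bandit using Thompson sampling. It does not use Ax or BoTorch, and it is a simpler, non-contextual relative of the contextual bandits mentioned above. The function name, the success rates and the number of rounds are all assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_rates, rounds=2000, seed=0):
    """Allocate trials across configurations ("arms") adaptively.

    true_rates holds the hidden success probability of each configuration;
    in a real experiment these are unknown and only observed via outcomes.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [1] * n_arms  # Beta(1, 1) prior for every arm
    failures = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Sample a plausible success rate for each arm from its posterior.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        pulls[arm] += 1

        # Evaluate the chosen configuration and update its posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


# Three hypothetical configurations; the third is the best one.
pulls = thompson_sampling([0.2, 0.5, 0.8])
print(pulls)
```

The point of the sketch is the resource efficiency: most evaluations end up on the promising configuration instead of being spread evenly across all of them.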
-
DuckDB is an embedded, columnar database for data science and analytical workloads. Data analysts usually load data locally in tools like pandas or data.table to quickly analyze patterns and form hypotheses before scaling the solution on the server. However, we're now using DuckDB for such use cases because it unlocks the potential to do larger-than-memory analysis. DuckDB supports range joins, vectorized execution and multiversion concurrency control (MVCC) for large transactions, and our teams are quite happy with it.
-
Any software system needs to properly represent the given domain in which it is employed and should always be informed by key aims and goals. Machine learning (ML) projects are no different. Feature engineering is a crucial aspect of engineering and designing ML software systems. Feature Store is a related architectural concept to facilitate the identification, discovery and monitoring of the features pertinent to the given domain or business problem. Implementing this concept involves a combination of architectural design, data engineering and infrastructure management to create a scalable, efficient and reliable ML system. From a tooling perspective, you can find open-source and fully managed platforms, but they're only one part of this concept. In the end-to-end design of ML systems, implementing a feature store enables the following capabilities: the ability to (1) define the right features; (2) enhance reusability and make features consistently available regardless of the type of model, which also includes setting up the feature engineering pipelines that curate data as described in the feature store; (3) enable feature discovery and (4) enable feature serving. Our teams leverage feature stores in production to reap these benefits for end-to-end ML systems.
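The four capabilities listed above can be sketched with a small in-memory registry. This is an illustrative toy rather than any particular feature store product: the `Feature` and `FeatureStore` classes, their method names and the example features are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Feature:
    name: str
    description: str
    compute: Callable[[dict], float]  # feature engineering logic for one entity


class FeatureStore:
    """Hypothetical in-memory store: define, reuse, discover and serve features."""

    def __init__(self) -> None:
        self._features: Dict[str, Feature] = {}

    def define(self, feature: Feature) -> None:
        # (1) + (2): one shared definition per feature, reusable by any model.
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} is already defined")
        self._features[feature.name] = feature

    def discover(self, keyword: str) -> List[str]:
        # (3): find features by name or description.
        keyword = keyword.lower()
        return sorted(
            f.name
            for f in self._features.values()
            if keyword in f.name.lower() or keyword in f.description.lower()
        )

    def serve(self, names: List[str], entity: dict) -> Dict[str, float]:
        # (4): compute the requested features for a single entity.
        return {name: self._features[name].compute(entity) for name in names}


store = FeatureStore()
store.define(Feature("order_count", "Number of orders placed", lambda e: float(len(e["orders"]))))
store.define(
    Feature(
        "avg_order_value",
        "Mean value of a customer's orders",
        lambda e: sum(e["orders"]) / len(e["orders"]),
    )
)

found = store.discover("order")
served = store.serve(["order_count", "avg_order_value"], {"orders": [20.0, 40.0]})
print(found, served)
```

A production feature store adds persistence, monitoring and separate offline and online serving paths; the sketch only shows how the four capabilities relate to one another.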
-
RudderStack is a customer data platform (CDP) that makes it easy to store data in a data warehouse or data lake. This approach, increasingly known as Headless CDP, separates the CDP's features from its user interface and emphasizes configurability through APIs and the data warehouse/lake as primary storage. As expected from a product in this category, RudderStack has a rich repository of integrations with third-party products (both as source and sink) and the ability to ingest custom events. RudderStack has both a commercial offering and a self-hosted OSS version with functionality limitations.
-
Strapi is an open-source, NodeJS-based, headless content management system (CMS) similar to Contentful. It has been around for a while, and we've used it successfully in a few projects. Strapi provides both REST and GraphQL APIs, has comprehensive documentation, features an easy-to-use data model API and supports customizing both UI and logic.
-
TypeDB is a knowledge graph database designed to work with intricate data relationships, which makes it easier to query and analyze large data sets. TypeDB's TypeQL query language has a SQL-like syntax, which eases the learning curve for schema definition, querying and exploration. TypeDB comes with a variety of tools that make it easier to work with the database, including a command-line interface and a graphical user interface, TypeDB Studio, which provides features such as managing schemas, querying data, visualizing relationships and even collaborating with others. There is a good deal of documentation available and an active community for support. Our teams used it to build knowledge graphs of taxonomic concepts across different databases and took advantage of its strong inference capabilities by adding new inference rules, which increased efficiency and reduced workload. With its intuitive developer experience and supportive community, TypeDB is a good candidate to consider for any team looking to build data solutions that depend on complex data relationships, including natural language data, recommendation engines and knowledge graphs.
Assess
-
Autoware is an open-source autonomous driving software stack built on ROS (Robot Operating System), which can be used to develop and deploy advanced driver-assistance systems (ADAS) for a wide range of vehicles like cars and trucks. It provides a set of tools and algorithms for various aspects of autonomous driving, such as perception, decision-making and control. It also has a planning and control module that generates a trajectory for the vehicle based on its environment and objectives. It encourages open innovation in autonomous driving technology. We're building prototypes using Autoware to validate new product ideas and find it helpful.
-
Cozo is an embeddable relational database that uses Datalog for querying. We're intrigued by its support for time-travel queries and for modeling graph data in a relational schema. We quite like that it delegates data storage to existing popular engines, including SQLite, RocksDB, Sled and TiKV. Although Cozo is still in its early stages of development, we find it's worth assessing.
-
Dapr, short for Distributed Application Runtime, helps developers to build resilient, stateless and stateful microservices that run in the cloud. Some people may confuse it with a service mesh, because it uses a sidecar architecture that runs as a separate process alongside the application. Dapr is more application oriented and focuses on encapsulating the fault tolerance and connectivity required for building distributed applications. For example, Dapr provides multiple building blocks, from service invocation and message pub/sub to distributed lock, all of which are common patterns in distributed communication. One of our teams evaluated Dapr on a recent project; given their positive experience, they’re looking forward to bringing it to other projects in the future.
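As a rough illustration of how an application talks to its Dapr sidecar, the sketch below builds HTTP requests for two of the building blocks mentioned above, service invocation and pub/sub. It assumes the sidecar listens on its default HTTP port, 3500. The app ID, method, pub/sub component and topic names are hypothetical, and the requests are only constructed, not sent.

```python
import json
from urllib import request

DAPR_HTTP_PORT = 3500  # assumed: Dapr's default sidecar HTTP port
BASE_URL = f"http://localhost:{DAPR_HTTP_PORT}/v1.0"


def _post(url: str, payload: dict) -> request.Request:
    """Build (but do not send) a JSON POST request to the sidecar."""
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_request(app_id: str, method: str, payload: dict) -> request.Request:
    # Service invocation: the sidecar locates the target app and forwards the call.
    return _post(f"{BASE_URL}/invoke/{app_id}/method/{method}", payload)


def publish_request(pubsub: str, topic: str, event: dict) -> request.Request:
    # Pub/sub: the sidecar hands the event to the configured message broker.
    return _post(f"{BASE_URL}/publish/{pubsub}/{topic}", event)


# Hypothetical names, for illustration only.
invoke = invoke_request("orders", "checkout", {"cart_id": 42})
publish = publish_request("messagebus", "order-created", {"order_id": 7})
print(invoke.full_url)
print(publish.full_url)
```

With a sidecar running, passing either request to `urllib.request.urlopen` would send it. The application code only ever addresses localhost, which is the sense in which Dapr encapsulates connectivity concerns.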
-
Immuta is a data security platform that allows you to secure access to your data, automatically discover sensitive data and audit how data is being used in an organization. In the past, we've talked about the importance of automation, engineering practices and treating security policy as code when we think about security concerns. Data security is no different. Our teams have been exploring Immuta to manage data policies as code to allow for fine-grained access control which is beyond what role-based access control (RBAC) can offer. Version-controlled policies can be tested and then provisioned as part of a CI/CD pipeline. In a decentralized data ecosystem, like one facilitated by data mesh, having domain-specific roles can lead to role or group proliferation in the identity system. Immuta’s attribute-based access control (ABAC) capability reduces the access grant to a mathematical equation of matching an "attribute" on the user to a "tag" on the data source. This platform is still new but certainly worth highlighting for data security needs.
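The attribute-to-tag matching described above can be sketched in a few lines. This is not Immuta's API or policy language. The classes, the `can_access` function and the rule that a user's attributes must cover every tag on the data source are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset  # e.g. {"region:eu", "clearance:pii"}


@dataclass(frozen=True)
class DataSource:
    name: str
    tags: frozenset  # tags attached to the data, e.g. by sensitive-data discovery


def can_access(user: User, source: DataSource) -> bool:
    """Assumed policy: grant access when the user's attributes cover every tag."""
    return source.tags <= user.attributes


analyst = User("analyst", frozenset({"region:eu", "clearance:pii"}))
eu_sales = DataSource("eu_sales", frozenset({"region:eu"}))
us_customers = DataSource("us_customers", frozenset({"region:us", "clearance:pii"}))

eu_allowed = can_access(analyst, eu_sales)
us_allowed = can_access(analyst, us_customers)
print(eu_allowed, us_allowed)
```

Because the decision is computed from attributes and tags, adding a new domain means tagging its data and assigning attributes to users, not creating another role for every combination. A policy function like this can also be kept in version control and unit tested as part of a CI/CD pipeline.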
-
Matter is an open standard for smart home technology, launched by Amazon, Apple, Google, Comcast and the Zigbee Alliance (now Connectivity Standards Alliance, or CSA). It enables devices to work with any Matter-certified ecosystem, thus reducing fragmentation and promoting interoperability among devices and IoT platforms from different providers. Its focus on standardization at the application level, support for Wi-Fi and Thread as communication mediums and backing from major tech companies set it apart from other protocols like Zigbee. Although the number of Matter-enabled devices is still relatively low, its growing importance in the IoT space makes it worth assessing for those looking to build smart home and IoT solutions.
-
Modal is a platform as a service (PaaS) that offers on-demand compute without the need for your own infrastructure. Modal lets you deploy machine learning models, massively parallel compute jobs, task queues and web apps. It provides a container abstraction that makes the switch from local to cloud deployment seamless, with hot reload both locally and in the cloud. It even removes deployments automatically, avoiding the need for manual clean-up, but can also make them persistent.
Modal is written by the same team that developed the first recommendation engine for Spotify. It takes care of the AI/ML stack end-to-end and can provide on-demand GPU resources, which is useful if you have particularly intensive computational needs. Whether you're working on your laptop or in the cloud, Modal just works, providing an easy and efficient way to run and deploy your projects.
-
Neon is an open-source alternative to AWS Aurora PostgreSQL. Cloud-native analytical databases have embraced the technique of separating storage from compute nodes to elastically scale on demand. However, it's difficult to do the same in a transactional database. Neon achieves this with its new multi-tenant storage engine for PostgreSQL. With minimal changes to the mainstream PostgreSQL code, Neon leverages AWS S3 for long-term data storage and elastically scales the processing up or down (including scale-to-zero) for compute. This architecture has several benefits, including cheap and fast clones, copy-on-write and branching. We're quite excited to see innovations on top of PostgreSQL. Our teams are evaluating Neon, and we recommend you assess it as well.
-
OpenLineage is an open standard for lineage metadata collection for data pipelines, designed to instrument jobs as they're running. It defines a generic model of run, job and data set entities using consistent naming conventions. The core lineage model is extensible by defining specific facets to enrich those entities. OpenLineage solves the interoperability problem between producers and consumers of lineage data who otherwise would need to know how to speak to each other in various ways. Although there is a risk of it being another "standard in the middle," being a Linux Foundation AI & Data Foundation project increases its chances of gaining widespread adoption. OpenLineage currently supports data collection for multiple platforms, such as Spark, Airflow and dbt, although users need to configure its listeners. Support for OpenLineage data consumers is more limited at this time.
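The run, job and data set model with facets can be sketched with plain dataclasses. This is a simplified illustration and not the official OpenLineage client. Field names loosely follow the specification, the facet payload is reduced to its essentials, and all namespaces, names and IDs are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Job:
    namespace: str
    name: str
    facets: dict = field(default_factory=dict)


@dataclass
class Run:
    runId: str
    facets: dict = field(default_factory=dict)


@dataclass
class Dataset:
    namespace: str
    name: str
    facets: dict = field(default_factory=dict)  # extension point, e.g. a schema facet


@dataclass
class RunEvent:
    eventType: str  # e.g. "START" or "COMPLETE"
    eventTime: str
    run: Run
    job: Job
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses and lists of dataclasses.
        return json.dumps(asdict(self), sort_keys=True)


event = RunEvent(
    eventType="COMPLETE",
    eventTime="2023-04-01T00:00:00Z",
    run=Run(runId="run-001"),
    job=Job(namespace="analytics", name="daily_orders"),
    inputs=[Dataset("warehouse", "raw.orders")],
    outputs=[
        Dataset(
            "warehouse",
            "reports.daily_orders",
            facets={"schema": {"fields": [{"name": "id", "type": "BIGINT"}]}},
        )
    ],
)

payload = json.loads(event.to_json())
print(payload["job"]["name"], payload["outputs"][0]["name"])
```

A producer emits events like this while a job runs, and a consumer only needs to understand the shared model, which is how the standard addresses the interoperability problem described above.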
-
The "end of passwords" might be near, finally. Shepherded by the FIDO alliance and backed by Apple, Google and Microsoft, passkeys are nearing mainstream usability. When setting up a new login with passkeys, a key pair is generated: the website receives the public key and the user keeps the private key. Handling login uses asymmetric cryptography. The user proves that they're in possession of the private key, but, unlike passwords, it’s never sent to the website. On users' devices, access to passkeys is protected using biometrics or a PIN.
Passkeys can be stored and synced within the Big Tech ecosystems, using Apple's iCloud Keychain, Google Password Manager or Windows Hello. In most cases this works only with recent OS and browser versions. Notably, storing passkeys in Windows Hello is not supported on Windows 10. Fortunately, though, the Client to Authenticator Protocol (CTAP) makes it possible for passkeys to be kept on a device other than the one that creates the key or needs it for login. For example, a user creates a passkey for a website on Windows 10 and stores it on an iPhone by scanning a QR code. Because the key is synced via iCloud, the user can log in to the website from, say, their MacBook. Passkeys can be stored on hardware security keys, too, and support for native apps has arrived on iOS and Android.
Despite some usability issues — for example, Bluetooth needs to work because device proximity is checked when a QR code is scanned — passkeys are worth considering. We suggest you experiment with them on passkeys.io to get a feeling for their usability.
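The registration and login flow described above is a challenge-response exchange over a key pair. The sketch below illustrates it with textbook RSA built from two small, publicly known primes. It is a toy and deliberately insecure: real passkeys rely on WebAuthn, platform authenticators and vetted cryptographic libraries, and none of the function names here come from those APIs.

```python
import hashlib
import secrets

# Toy parameters (assumption for illustration): two Mersenne primes.
# Far too small and too well known for real use.
P = 2**61 - 1
Q = 2**89 - 1
E = 65537


def generate_key_pair():
    """Registration: the device creates a key pair and shares only the public key."""
    n = P * Q
    d = pow(E, -1, (P - 1) * (Q - 1))  # private exponent
    return (n, E), (n, d)  # (public key, private key)


def _digest(challenge: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(challenge).digest(), "big") % n


def sign(private_key, challenge: bytes) -> int:
    """Login, device side: prove possession of the private key."""
    n, d = private_key
    return pow(_digest(challenge, n), d, n)


def verify(public_key, challenge: bytes, signature: int) -> bool:
    """Login, website side: check the signature using only the public key."""
    n, e = public_key
    return pow(signature, e, n) == _digest(challenge, n)


public_key, private_key = generate_key_pair()  # website stores public_key only
challenge = secrets.token_bytes(32)            # website sends a fresh challenge
signature = sign(private_key, challenge)       # private key never leaves the device

login_ok = verify(public_key, challenge, signature)
replay_ok = verify(public_key, challenge + b"x", signature)
print(login_ok, replay_ok)
```

The website never sees the private key, and because each login uses a fresh challenge, a captured signature cannot be replayed against a different challenge.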
-
Spin is an open-source platform for building and running microservices in WebAssembly (WASM). In previous editions of the Radar, we talked about WebAssembly in the context of browsers, but we’re now witnessing its adoption on the server side due to its capabilities for fine-grained sandboxing, cross-language interoperability and hot reloading. With Spin CLI, you can quickly create and distribute WebAssembly microservices in Rust, TypeScript, Python and TinyGo. We're excited about Spin, and we recommend you carefully assess it as it moves out of early preview.
Hold
-
Denodo is a data virtualization tool that aims to make it easier to expose and secure transformed, consumer-friendly data (from multiple underlying data sources and through a variety of interfaces) from one platform. Data transformations within Denodo are defined by creating virtual databases and views using a SQL-like language called VQL; these are executed when a user queries the virtual database. Underneath, Denodo can delegate queries on the virtual databases to one or multiple underlying databases.
Although Denodo makes it easy to start exposing consumer-friendly data, performance degrades as layers of views and virtual databases are built on top of each other and queries with multiple joins start hitting multiple underlying databases. These problems are solvable, but they require fairly deep knowledge of the product's behavior and performance tuning options. Because of these drawbacks and given its limited support for unit testing, we recommend that you do not use Denodo as a primary data transformation tool and use tools like Spark or SQL (with dbt) for your data transformations instead.
Unable to find something you expected to see?
Each edition of the Radar features blips reflecting what we came across during the previous six months. We might have covered what you are looking for on a previous Radar already. We sometimes cull things just because there are too many to talk about. A blip might also be missing because the Radar reflects our experience; it is not based on a comprehensive market analysis.
