Most “security tools” today consist of code that consumes an API and applies predefined logic to identify issues: they fetch a known set of endpoints from a service and evaluate the responses against hard-coded checks.
Integrating third-party tools into our monitoring platform isn’t always straightforward either, as each tool comes with its own access requirements, data collection, and output format.
Additionally, tools are usually designed to fetch only the data deemed useful at a given point in time, often just what’s required to evaluate the currently implemented findings. This severely limits how useful they can be in other contexts, e.g. when conducting an incident response.
Motivated by these limitations, we have taken a different approach to building our monitoring platform and in-house tools, and it has come with several advantages. This blog post describes our tooling development strategy, in the hope of encouraging the security industry to adopt a similar approach.
Rather than fetching a subset of known-useful endpoints, we generalize consumption of the APIs in order to capture all the available data, independently of its perceived value at a point in time. For example, this could mean making requests to every single endpoint of the GitHub API. This data is then stored in its original format, including both the request and the response. Once the data is stored locally (we typically call this a snapshot), we can run queries against it and build findings on top of those queries.
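To make that concrete, a single snapshot entry might look roughly like the following; the shape and field names here are purely illustrative, not our actual storage schema:

;; One captured API call: the request that was made and the raw response,
;; stored verbatim so it can be re-queried later.
{:request  {:service  "github"
            :endpoint "/orgs/example-org/repos"
            :params   {:per_page 100}}
 :response {:status 200
            :body   [{:name "infra" :private true}
                     {:name "www"   :private false}]}}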
This approach presents a number of benefits, which we explore below.
Let’s dig into some specifics.
We mentioned that we generalize the consumption of APIs - what does this mean? Well, APIs follow a more or less strict format. If you can write code that understands that format, you can consume all of it with relatively little human intervention. Let’s look into the types of APIs we usually encounter.
The first type are usually REST APIs that may have some level of definition without a formal specification. We consider these “simple” because there are limited relationships between endpoints, which allows for relatively straightforward collection.
A good example of this are the APIs exposed by Kubernetes clusters. Clusters expose /api and /apis endpoints, which detail the resources available in the cluster in a RESTy manner. The Kubernetes cluster API is quite simple (endpoints are self-contained and multiple requests aren’t required to fetch resources), which allows using the /api and /apis endpoints to fetch all the resources.
The following figure shows the process a tool could follow to consume a Kubernetes cluster’s API:
In the above, the first request discovers the available endpoints/resources, and the second one fetches the details for the cluster’s namespaces. Following this process for all the resources returned by the initial request would allow consuming all the data exposed by the API.
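To illustrate, here is a minimal sketch of that two-step process in Clojure, using clj-http as the HTTP client; the API server address and token handling are placeholders:

(require '[clj-http.client :as http])

(def api-server "https://kubernetes.example.com")   ; placeholder
(def token (System/getenv "K8S_TOKEN"))              ; placeholder

;; 1. Discover the resources exposed under the core API group.
(def discovery
  (:body (http/get (str api-server "/api/v1")
                   {:headers {"Authorization" (str "Bearer " token)}
                    :as      :json})))

;; 2. Fetch every discovered resource that supports the "list" verb.
(def resources
  (for [{:keys [name verbs]} (:resources discovery)
        :when (some #{"list"} verbs)]
    (:body (http/get (str api-server "/api/v1/" name)
                     {:headers {"Authorization" (str "Bearer " token)}
                      :as      :json}))))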
The second type are APIs that publish definitions based on standards such as OpenAPI or GraphQL. Going back to the Kubernetes API example, in addition to the /api and /apis endpoints, clusters also expose an /openapi/v2 endpoint which provides an OpenAPI specification. Making an API request to this endpoint returns information similar to what we saw previously:
{
  "swagger": "2.0",
  "info": {
    "title": "Kubernetes",
    "version": "v1.25.3"
  },
  "paths": {
    "/api/v1/": {
      "get": {
        "description": "get available resources"
      }
    },
    "/api/v1/namespaces": {
      "get": {
        "description": "list or watch objects of kind Namespace",
        "responses": {
          "200": {
            "description": "OK",
            "schema": {
              "$ref": "#/definitions/io.k8s.api.core.v1.NamespaceList"
            }
          },
          "401": {
            "description": "Unauthorized"
          }
        },
        "x-kubernetes-action": "list",
        "x-kubernetes-group-version-kind": {
          "group": "",
          "kind": "Namespace",
          "version": "v1"
        }
      }
    }
  }
}
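As a small sketch of what consuming such a definition could look like, the spec above can be walked to enumerate every GET path that Kubernetes marks as a “list” action (using Cheshire for JSON parsing; the local file name is a placeholder):

(require '[cheshire.core :as json])

;; Parse a saved copy of the /openapi/v2 document, keeping string keys.
(def spec (json/parse-string (slurp "openapi-v2.json")))

;; Every GET endpoint the spec marks as a "list" action.
(def list-endpoints
  (for [[path methods] (get spec "paths")
        :let [get-op (get methods "get")]
        :when (= "list" (get get-op "x-kubernetes-action"))]
    {:path        path
     :description (get get-op "description")}))

;; => ({:path "/api/v1/namespaces"
;;      :description "list or watch objects of kind Namespace"})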
While the format is different, the complexity of parsing this information remains low. Things change significantly when there are relationships between the different endpoints. For example, in order to retrieve the details of all the repositories in a GitHub organization using the public REST API, you’d need to make two API calls:
- https://api.github.com/orgs/<organization>/repos to get a list of repositories in the organization
- https://api.github.com/repos/latacora/<repository> to get the details for a given repository

To accomplish this automatically, you’d need to consume the API’s definition and use it to build a relationship graph, which defines the endpoints that return data that needs to be used as input for other endpoints. This looks something like this:
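As a rough illustration, such a relationship graph can be expressed as plain data; the encoding below is hypothetical, not a format we actually publish:

;; Each entry describes an endpoint; :inputs declares which field of which
;; other endpoint's response must be fetched first and fed into the path.
(def github-call-graph
  {:org-repos    {:path   "/orgs/{org}/repos"
                  :inputs {}}                           ; only needs the org name
   :repo-details {:path   "/repos/{org}/{repo}"
                  :inputs {:repo [:org-repos :name]}}}) ; repo names come from the listing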
The third type are APIs that don’t publish definitions or don’t behave consistently. They also tend to have very loose conventions, with frequent exceptions. A good example of this is AWS, where each service has its own set of endpoints, and these endpoints follow differing structures. To further complicate things, AWS API endpoints have considerable, and sometimes complex, relationships.
For example, in order to identify EC2 instances that have instance profiles with IAM policies granting administrative-level permissions, you’d need to call the following API endpoints:
- describe-instances to list the EC2 instances and get the profiles attached to each of them
- list-instance-profiles to get the IAM role attached to each instance profile
- list-role-policies to get the IAM inline policies attached to each role
- get-role-policy to get the policy documents for the IAM inline policies
- list-policies to get the IAM managed policies
- get-policy-version to get the policy documents for the IAM managed policies
- list-entities-for-policy to identify which of those policies are attached to each IAM role

The call graph would look something like this:
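Expressed as plain data in the same hypothetical encoding, the chain of dependencies between those IAM calls would be roughly:

;; Which call consumes the output of which other call.
(def iam-call-graph
  {:describe-instances       {:depends-on []}
   :list-instance-profiles   {:depends-on [:describe-instances]}
   :list-role-policies       {:depends-on [:list-instance-profiles]}
   :get-role-policy          {:depends-on [:list-role-policies]}
   :list-policies            {:depends-on []}
   :get-policy-version       {:depends-on [:list-policies]}
   :list-entities-for-policy {:depends-on [:list-policies]}})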
While the complexity of the call graph is similar to the GitHub API example above, AWS does not publish a definition of its APIs across all services in a convenient location. Supporting these APIs therefore requires more human intervention than simple or formal APIs. But for high-value APIs such as AWS, the effort is definitely worth it; where else can you get the details of everything deployed in an account?
As mentioned above, both our investigation and “findings” logic leverage query languages rather than bespoke code. At Latacora, our primary programming language is Clojure, and the main technologies we use to query snapshots are Datalog (via Datascript) and Specter.
Thanks to our choice of query language, which is both expressed as structured data and intrinsically supports expanding “functions” (via predicate expressions), we can express complex logic with simple helpers. What does this look like? Well, let’s say we want to write a query that returns all AWS EC2 instances that have IMDSv1 enabled. The Datalog query would look something like this:
(require '[datascript.core :as d])

;; `pqd/q` is an in-house wrapper around datascript.core/q (assumed here)
;; that also supplies rule definitions such as `aws-call`.
(pqd/q '[:find ?instance-id ?vpc
         :keys instance-id vpc
         :where (aws-call "ec2" "describe-instances" ?api)
                [?api :reservation ?reservation]
                [?reservation :instance ?instance]
                [?instance :instance-id ?instance-id]
                [?instance :vpc-id ?vpc]
                [?instance :metadata-option ?metadata]
                [?metadata :http-tokens "optional"]
         :in $ %]
       db)
In the above:

- db is the Datascript database generated from the snapshot
- aws-call is a helper function that pulls a specific API call from the db
- the :where clause implements the query logic, in this case against the describe-instances data structure
- :keys define the fields that should be returned by the query

Another example would be identifying every API call (i.e. path in the snapshot) where a specific IP address appeared. Here Specter would be our tool of choice. A simple query like this one would suffice:
(require '[com.rpl.specter :as specter])

;; INDEXED and INDEXED-SEQ are helper navigators (assumed here); they follow
;; the common Specter recipe for collecting the keys/indices traversed.
(def INDEXED
  [specter/ALL (specter/collect-one specter/FIRST) specter/LAST])

(def INDEXED-SEQ
  [(specter/view #(map-indexed vector %)) INDEXED])

(def path-finder
  "Finds the entries matching `pred` in a deeply
  nested structure of maps and vectors, and collects
  the path on the way there."
  (specter/recursive-path
    [term-pred] p
    (specter/cond-path
      (specter/pred term-pred) specter/STAY
      map? [INDEXED p]
      coll? [INDEXED-SEQ p])))

(specter/select (path-finder #(= "1.3.3.7" %)) snapshot)
In the above, snapshot is the snapshot loaded into a map.
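For a sense of what this returns, here is an illustrative run against a tiny, made-up structure; each result is the collected path (map keys and sequence indices) followed by the matching value:

(specter/select (path-finder #(= "1.3.3.7" %))
                {:ec2 {:reservation [{:instance [{:private-ip "1.3.3.7"}]}]}})
;; => [[:ec2 :reservation 0 :instance 0 :private-ip "1.3.3.7"]]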
These are just a few examples of the data manipulation we routinely perform against snapshots. Since the tools are not tied to the underlying data structure being queried, they can be swapped at any time, and better-suited tools can be used for specific tasks (e.g. we love using Meander for data transformation).
While we’re very proud of our in-house tooling, there are excellent third-party tools that implement functionality we want to use, such as generating visual representations of cloud environments. We could implement these same things ourselves (and we sometimes do), but it would require a lot of work for limited added benefit. We’d rather let these tools run against our snapshots instead of the actual APIs.
That way, we get the benefit of these tools without granting them separate access to the environment or re-fetching the same data from the provider.
Since we’ve pulled all the information out of the provider, we have the ability to “play back” this information by faking an API endpoint that returns the information in the same format it was originally ingested in. This isn’t always trivial (and in some cases quite impractical), but it’s definitely a strategy we’ll want to implement more broadly in the future.
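As a very simplified sketch of what such a playback could look like, assuming a snapshot keyed by request path, one could serve the stored responses from a local Ring handler; the snapshot shape and names below are hypothetical:

(require '[ring.adapter.jetty :as jetty]
         '[cheshire.core :as json])

;; Hypothetical snapshot shape: request path -> stored response body.
(def snapshot
  {"/api/v1/namespaces" {"kind" "NamespaceList" "items" []}})

(defn replay-handler
  "Serves stored responses as if we were the original API."
  [request]
  (if-let [stored (get snapshot (:uri request))]
    {:status  200
     :headers {"Content-Type" "application/json"}
     :body    (json/generate-string stored)}
    {:status 404 :body "not captured in this snapshot"}))

;; (jetty/run-jetty replay-handler {:port 8080 :join? false})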
Good tooling is useless if you don’t have a means of acting on its findings. At Latacora, we’ve built a scheduling, reporting, and alerting pipeline that integrates all of our tools. At a high level, it looks something like this:
The reporting pipeline is the central piece of our persistent monitoring capability. It ingests daily tooling output and not only generates the point-in-time reports we share with our clients, it also creates JIRA tickets for findings that need to be validated by an engineer and potentially escalated to the impacted organization.
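For illustration only, the ticket-creation step can be thought of as a pure transformation from a tool finding to a ticket payload; every field name below is hypothetical:

(defn finding->ticket
  "Turns one tool finding into the ticket an engineer will validate."
  [{:keys [client tool title resource severity]}]
  {:project     "SECOPS"                        ; placeholder project key
   :summary     (str "[" client "] " title)
   :description (str tool " flagged " resource)
   :labels      [severity "needs-validation"]})

(finding->ticket {:client   "example-client"
                  :tool     "imdsv1-check"
                  :title    "EC2 instance allows IMDSv1"
                  :resource "i-0123456789abcdef0"
                  :severity "medium"})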
One particularity of this reporting pipeline is the approach we take to modeling our knowledge of our clients’ infrastructure and applying it to the tooling output. Once again, we take a generalist approach and describe this logic as Clojure code, divided into two “models”.
As these models are code, they can be overlaid on our tooling output, which results in an accurate representation of the environments’ overall security posture.
We hope this blog post has provided valuable insights to security tool developers and has inspired a shift in the way they approach tooling development. If you have any questions or would like to get in touch, you can reach out at hello@latacora.com.