GraphQL: Why switch to the modern API standard?

Why GraphQL?

I am convinced of this, and I want to convince you too: Any reasonably extensive REST API will become a poor imitation of a GraphQL API.

Just try writing an API to display posts in a blog. You will probably design the following methods:

GET allPosts()
GET post(postId)
GET comments(postId)

It won't be long before allPosts() evolves into something like this:

GET allPosts(skip, take, filterFields[], filterValues[], fieldsToInclude[])

To avoid a separate round trip for loading the comments, you might also modify the post() method a bit:

GET post(postId, commentsSkip, commentsTake)

This is how most APIs in enterprise environments look: highly individual, rarely uniform. Is there a way to build these recurring demands for API flexibility into a standard from the outset? Developers at Facebook asked themselves how API design could be streamlined and simplified. The result is GraphQL, the next evolutionary step for APIs.

What is GraphQL?

Note that GraphQL itself is only a specification, independent of any particular technology. Libraries such as Apollo implement much of it for you, but which library you can use does of course depend on your technology stack.

GraphQL is a query language for APIs. From the client side, a request looks more like a database query, whereas a classic REST call looks like calling a method. This may seem like a minor difference at first, but it significantly changes how we think about interacting with APIs.

A GraphQL query could look like this:

{
  posts(author: "alice", skip: 10, take: 10) {
    id
    author
    content
    headline
    tags
    comments {
      author
      comment
    }
  }
}

So we are fetching posts 11 to 20 by "alice" and selecting the fields "id," "author," "content," "headline," and "tags." In the same request, we are loading the comments for these posts, and for each comment the fields "author" and "comment."
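Over HTTP, a query like the one above is commonly sent as a POST request with a JSON body that holds the query text and, optionally, variables. The following is only a minimal sketch: the endpoint path "/graphql", the helper name buildGraphQLRequest, and the argument types (String, Int) are my assumptions, not part of any particular API.

```typescript
// Shape of a GraphQL-over-HTTP request body: the query text plus optional variables.
interface GraphQLRequestBody {
  query: string;
  variables?: Record<string, unknown>;
}

// The posts query, with the literal arguments turned into variables.
// The variable types are assumed; they depend on the actual schema.
const POSTS_QUERY = `
  query Posts($author: String, $skip: Int, $take: Int) {
    posts(author: $author, skip: $skip, take: $take) {
      id
      author
      content
      headline
      tags
      comments {
        author
        comment
      }
    }
  }
`;

// Hypothetical helper: builds the options object you would pass to fetch().
function buildGraphQLRequest(query: string, variables?: Record<string, unknown>) {
  const body: GraphQLRequestBody = { query, variables };
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Posts 11 to 20 by "alice".
const postsRequest = buildGraphQLRequest(POSTS_QUERY, { author: "alice", skip: 10, take: 10 });
// fetch("/graphql", postsRequest) would send it; "/graphql" is an assumed path.
```

Using variables instead of string concatenation keeps the query text constant, which makes it easier to validate and cache.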

In this way, GraphQL can be seen as a generic "data access layer." Where the comments come from is irrelevant to the client and is decided in the backend. In a document-based database, the comments might be stored directly in the post document. In a relational database, they might live in a separate table. In theory, the comments could even be stored in a text file or fetched from a third-party system via another API. From the perspective of the querying code, it makes no difference: all we see is that there are comments and that they are related to the posts.
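To make the "data access layer" idea a bit more concrete, here is a small sketch in plain TypeScript, without any GraphQL library. All names (CommentSource, embeddedSource, tableSource, makeCommentsResolver) are made up for illustration. The point is only that the resolver for the comments depends on an interface and not on the storage behind it.

```typescript
interface PostComment { author: string; comment: string }
interface Post { id: number; headline: string }

// Anything that can deliver the comments of a post counts as a source.
interface CommentSource {
  commentsFor(postId: number): PostComment[];
}

// Variant 1: document database, comments embedded in the post document.
function embeddedSource(docs: Map<number, { comments: PostComment[] }>): CommentSource {
  return { commentsFor: (postId) => docs.get(postId)?.comments ?? [] };
}

// Variant 2: relational database, comments in their own table.
function tableSource(rows: Array<PostComment & { postId: number }>): CommentSource {
  return {
    commentsFor: (postId) =>
      rows
        .filter((row) => row.postId === postId)
        .map(({ author, comment }) => ({ author, comment })),
  };
}

// The resolver for Post.comments only knows the interface, not the storage.
function makeCommentsResolver(source: CommentSource) {
  return (post: Post): PostComment[] => source.commentsFor(post.id);
}

const examplePost: Post = { id: 1, headline: "Hello" };
const fromDocs = makeCommentsResolver(
  embeddedSource(new Map([[1, { comments: [{ author: "bob", comment: "Nice" }] }]])),
);
const fromTable = makeCommentsResolver(
  tableSource([{ postId: 1, author: "bob", comment: "Nice" }]),
);
```

Both fromDocs(examplePost) and fromTable(examplePost) deliver the same comments, so the client cannot tell which storage is behind the field.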

Advantages and disadvantages compared to REST APIs

Optimized data queries from the client perspective

Never again overfetching and underfetching.

In the GraphQL query, we explicitly specify which fields, sub-entities, and fields of the sub-entities we want to query. The response contains exactly the requested fields, and a reasonably well-implemented backend also loads only the data needed for them. This makes it easy to use the same interface for multiple queries of varying granularity.

For example, a post overview page might only need a headline and the tags, while the detail page of a post needs to query the full content along with comments. From the GraphQL perspective, you can query "posts" in both cases - only the list of queried fields changes.
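The effect of the field list can be imitated in a few lines of plain TypeScript. This is a simplified sketch, not how a GraphQL server is implemented internally: the type FieldSelection, the function project, and the field internalScore are hypothetical and only serve the illustration.

```typescript
// A selection lists the wanted fields; nested objects select fields of sub-entities.
type FieldSelection = { [field: string]: true | FieldSelection };

// Return only the selected fields of a value (arrays are projected item by item).
function project(value: unknown, selection: FieldSelection): unknown {
  if (Array.isArray(value)) return value.map((item) => project(item, selection));
  if (value === null || typeof value !== "object") return value;
  const source = value as Record<string, unknown>;
  const result: Record<string, unknown> = {};
  for (const [field, sub] of Object.entries(selection)) {
    result[field] = sub === true ? source[field] : project(source[field], sub);
  }
  return result;
}

const storedPost = {
  id: 1,
  headline: "Why GraphQL?",
  content: "Long article text",
  tags: ["api", "graphql"],
  comments: [{ author: "bob", comment: "Nice", internalScore: 0.9 }],
};

// Overview page: headline and tags only.
const overview = project(storedPost, { headline: true, tags: true });

// Detail page: full content plus comments, but only two fields per comment.
const detail = project(storedPost, {
  headline: true,
  content: true,
  comments: { author: true, comment: true },
});
```

The same stored post serves both pages; only the selection differs, just as only the field list differs between the two GraphQL queries.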

Less effort in the backend due to stronger decoupling

The same concept has significant, often overlooked implications in the backend: As an API developer, I can define a large number of fields as return values in my API. I don't have to worry about performance issues because only the fields that are actually needed by the frontend will be returned, and only those sub-entities will be queried. (Assuming proper programming in the backend...)

This has very simple but powerful implications: If, for example, the "tags" field is already part of the API but was not needed for the first version of the overview page, and users later ask for it, a purely frontend change is enough. The backend does not need to be extended and redeployed. This can save a massive amount of effort and complexity in the long run.
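The following sketch shows why offering many fields costs nothing as long as nobody asks for them. It is plain TypeScript under my own assumptions: postResolvers, resolveRequested, and the counter tagLookups are illustrative names, and the tag lookup stands in for any expensive query.

```typescript
interface PostRow { id: number; headline: string }

// Counts how often the (assumed expensive) tag lookup runs.
let tagLookups = 0;

// One resolver per field; the backend defines all of them up front.
const postResolvers: Record<string, (row: PostRow) => unknown> = {
  id: (row) => row.id,
  headline: (row) => row.headline,
  tags: (row) => {
    tagLookups += 1; // stands in for a separate query or API call
    return ["api", "graphql"];
  },
};

// Run only the resolvers whose fields the client asked for.
function resolveRequested(row: PostRow, requested: string[]): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const field of requested) {
    const resolver = postResolvers[field];
    if (!resolver) throw new Error(`Unknown field: ${field}`);
    result[field] = resolver(row);
  }
  return result;
}

const postRow: PostRow = { id: 1, headline: "Why GraphQL?" };

// Initial overview page: tags exist in the API but are never computed.
const firstVersion = resolveRequested(postRow, ["headline"]);
const lookupsAfterFirstVersion = tagLookups;

// Later frontend change: the client adds "tags", the backend code stays untouched.
const secondVersion = resolveRequested(postRow, ["headline", "tags"]);
```

The tag lookup runs only for the second request, which is the behavior the "proper programming in the backend" remark relies on.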

Built-in parallelism

Let's assume we are writing a complex GraphQL query that queries multiple sub-entities, with individual sub-entities having different data sources. Posts and comments might be in a database. Other information might need to be retrieved from another server, again as an API, or might be stored in the form of files somewhere on a network drive connected to the server.

Good GraphQL frameworks provide convenient mechanisms for handling sub-entity queries. In Apollo under NestJS, loading a post along with comments and revisions might look like this:

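  import { Args, Int, Parent, Query, ResolveField, Resolver } from '@nestjs/graphql';

  // Post, Comment, Revision and the injected services are application code;
  // the service names and methods are placeholders for your own data access.
  @Resolver(() => Post)
  export class PostsResolver {
    constructor(
      private readonly postsService: PostsService,
      private readonly commentsService: CommentsService,
      private readonly revisionsService: RevisionsService,
    ) {}

    @Query(returns => Post)
    async post(@Args('id', { type: () => Int }) id: number): Promise<Post> {
      return this.postsService.findById(id); // Load post from the DB
    }

    @ResolveField()
    async comments(@Parent() post: Post): Promise<Comment[]> {
      return this.commentsService.findByPostId(post.id); // comment.postId == post.id
    }

    @ResolveField()
    async revisions(@Parent() post: Post): Promise<Revision[]> {
      return this.revisionsService.findByPostId(post.id); // revision.postId == post.id
    }
  }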

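Apollo now automatically checks whether comments and/or revisions are requested in the GraphQL query and, if so, resolves these sub-entities with the respective async method. The resolvers for sibling fields are started without waiting for one another, so their database or network calls overlap instead of running one after the other.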

This is just one way to approach the matter. With a relational database, you might prefer to skip ResolveField and execute a JOIN directly. In other scenarios, especially when data has to be loaded from several different sources, joining the data via ResolveField is a simple and performant solution because the sources are queried concurrently out of the box.

Conveniently, the results of the queries are then automatically inserted into the parent model (here: Post) by Apollo.
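What "in parallel" means here can be reproduced without any framework. The sketch below is my own simplification in plain TypeScript, not Apollo's actual implementation: it starts all requested resolvers first and only then waits for them together. The names subFieldResolvers, resolveSubFields, and resolverEvents are hypothetical, and the awaited Promise.resolve() merely stands in for real I/O.

```typescript
interface PostStub { id: number }
type FieldResolver = (parent: PostStub) => Promise<unknown>;

// Records the order in which resolvers start and finish.
const resolverEvents: string[] = [];

const subFieldResolvers: Record<string, FieldResolver> = {
  comments: async (post) => {
    resolverEvents.push("start comments");
    await Promise.resolve(); // stand-in for a database query
    resolverEvents.push("end comments");
    return [{ postId: post.id, author: "bob", comment: "Nice" }];
  },
  revisions: async (post) => {
    resolverEvents.push("start revisions");
    await Promise.resolve(); // stand-in for a call to another system
    resolverEvents.push("end revisions");
    return [{ postId: post.id, version: 1 }];
  },
};

// Start every requested resolver first, then wait for all of them together.
async function resolveSubFields(parent: PostStub, requested: string[]) {
  const pending = requested.map(async (field) => {
    const value = await subFieldResolvers[field](parent);
    return { field, value };
  });
  const settled = await Promise.all(pending);
  const result: Record<string, unknown> = {};
  settled.forEach(({ field, value }) => {
    result[field] = value;
  });
  return result;
}

// Both resolvers are started before either of them has finished.
const pendingPost = resolveSubFields({ id: 1 }, ["comments", "revisions"]);
```

In resolverEvents, both "start" entries come before the first "end" entry. With real I/O, that overlap is what brings the total waiting time closer to the slowest source than to the sum of all sources.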

Deprecation feature

Especially with large, long-lived APIs, good handling of deprecation and further development is often underestimated. IT systems are constantly evolving, and I don't know of any sufficiently complex and long-lived API that hasn't had issues with deprecation. Usually, the solution is that multiple different versions of APIs must be operated in parallel (at least temporarily).

The GraphQL standard offers its own @deprecated directive, which allows you to easily indicate which fields are deprecated (with an optional reason).

Many GraphQL tools, such as IDEs and linters, read this directive and issue a warning when a deprecated field is used.

This ideally prevents the need to operate multiple (different) versions of APIs in parallel. It remains a single API that is continuously modified/expanded over time.
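In the schema, a deprecation is a single annotation on the field. The sketch below embeds a hypothetical schema excerpt (the fields title and legacyScore are invented) and scans it with a deliberately naive helper, findDeprecatedFields, which is my own illustration. Real tools use a GraphQL parser or the introspection fields isDeprecated and deprecationReason instead.

```typescript
// Hypothetical schema excerpt in GraphQL SDL: "title" was renamed to "headline".
const schemaSdl = `
  type Post {
    id: ID!
    headline: String
    title: String @deprecated(reason: "Use headline instead.")
    legacyScore: Int @deprecated
  }
`;

interface DeprecatedField { field: string; reason: string | null }

// Naive line-based scan for illustration only; it assumes one field per line.
function findDeprecatedFields(sdl: string): DeprecatedField[] {
  const pattern = /^\s*(\w+)[^@\n]*@deprecated(?:\(\s*reason:\s*"([^"]*)"\s*\))?/;
  const found: DeprecatedField[] = [];
  for (const line of sdl.split("\n")) {
    const match = pattern.exec(line);
    // reason is null when the directive is used without an explicit reason
    if (match) found.push({ field: match[1], reason: match[2] ?? null });
  }
  return found;
}

const deprecatedFields = findDeprecatedFields(schemaSdl);
```

Old clients can keep querying title while new clients switch to headline, and the field is removed only once nobody uses it anymore.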

Subscription feature

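The GraphQL standard also defines a subscription operation for push notifications. From the client's perspective, subscriptions use the same query language and the same schema as queries, so no additional API technology has to be learned. The transport (often WebSockets) is not prescribed by the standard and is usually handled by the GraphQL library.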
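As a rough sketch of the idea: the client sends a subscription operation, and the server pushes a result every time the underlying event occurs. The field commentAdded and the in-memory createCommentFeed helper below are hypothetical and only imitate the server side; a real setup would leave delivery to a GraphQL library and its transport.

```typescript
// What the client sends: same language, different operation type.
// "commentAdded" is an assumed subscription field, not part of any real schema.
const COMMENT_ADDED = `
  subscription OnCommentAdded($postId: Int!) {
    commentAdded(postId: $postId) {
      author
      comment
    }
  }
`;

interface NewComment { postId: number; author: string; comment: string }
type CommentListener = (comment: NewComment) => void;

// Minimal in-memory stand-in for the server side: it pushes every new comment
// to the listeners registered for that post.
function createCommentFeed() {
  const listeners = new Map<number, Set<CommentListener>>();
  return {
    subscribe(postId: number, listener: CommentListener): () => void {
      const registered = listeners.get(postId) ?? new Set<CommentListener>();
      registered.add(listener);
      listeners.set(postId, registered);
      return () => {
        registered.delete(listener); // unsubscribe
      };
    },
    publish(comment: NewComment): void {
      listeners.get(comment.postId)?.forEach((listener) => listener(comment));
    },
  };
}

const commentFeed = createCommentFeed();
const receivedComments: NewComment[] = [];
const stopListening = commentFeed.subscribe(1, (comment) => {
  receivedComments.push(comment);
});

commentFeed.publish({ postId: 1, author: "bob", comment: "Nice" }); // delivered
commentFeed.publish({ postId: 2, author: "eve", comment: "Other post" }); // other post, ignored
stopListening();
commentFeed.publish({ postId: 1, author: "bob", comment: "Too late" }); // after unsubscribe
```

Only the first comment reaches the listener: the second belongs to another post, and the third arrives after unsubscribing.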

GraphQL as a central hub / data query layer

It was hinted at earlier, but here it is again at a glance: GraphQL changes how we interpret an API. We no longer see an API (only) as a collection of methods that return something. Rather, we see a GraphQL API as a "data access layer":

Syntax with a strong focus on data objects, their fields, and their relationships to each other
Unified view of data from various data sources/systems
Performant through query parallelism and simple control over which data is actually transmitted over the network
The frontend/client does not need to know from which data sources individual fields or sub-entities originate. The frontend simply queries what it needs - only the backend takes care of querying the appropriate sources.
In general, you save round trips to the server or can design queries more easily than with REST so that you need fewer calls overall while still achieving good API design.

Typing for pleasant & safe development

Types are "first-class citizens" in GraphQL. This not only makes development safer but also more enjoyable. In most GraphQL IDEs, the types are naturally taken into account and validated. Moreover, typing (especially of complex objects/models) enables very pleasant auto-completion features in the IDEs.
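On a TypeScript client, the typed schema can be mirrored in the code. The types below are written by hand as an assumption of what the blog schema might look like (PostModel, CommentModel, and PostOverview are my own names); in practice, such types are often derived from the schema by a code generator.

```typescript
// Hand-written mirror of an assumed schema.
interface CommentModel { author: string; comment: string }
interface PostModel {
  id: number;
  headline: string;
  content: string;
  tags: string[];
  comments: CommentModel[];
}

// The overview query selects only two fields, so its result type is narrower.
type PostOverview = Pick<PostModel, "headline" | "tags">;

function renderOverview(posts: PostOverview[]): string[] {
  // Accessing post.content here would be a compile-time error,
  // because the overview query did not select that field.
  return posts.map((post) => `${post.headline} [${post.tags.join(", ")}]`);
}

const overviewLines = renderOverview([{ headline: "Why GraphQL?", tags: ["api", "graphql"] }]);
```

The compiler then catches the typical mistake of rendering a field that the query never asked for.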

A few disadvantages & challenges

As with REST APIs, you need to take care of authentication, authorization, and parameter validation in GraphQL. Nothing prevents a client from sending a query like ...(skip:0, take: 1000000). This must be handled on the backend side.
"Too" complex queries possible: Although performance is a strength when "properly" using GraphQL queries, there is also a pitfall: Clients now have the ability to issue correspondingly complex queries that result in long loading times.
The learning curve for GraphQL is currently considered steeper than that of REST APIs.
The GraphQL community (and thus the number of examples, tutorials, etc.) is smaller than that of REST counterparts.
"Overkill for simple apps": In researching this article, I came across statements that the use of GraphQL is more worthwhile for larger APIs. I personally do not share this opinion. With the current tooling and basic training, implementing a GraphQL API is not more complex or time-consuming than developing a REST API. However, you enjoy some advantages that become apparent especially when the API is used, maintained, and expanded over a longer period.

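The first two challenges in the list above, oversized paging arguments and overly complex queries, can be mitigated with simple backend guards. The sketch below is one possible approach in plain TypeScript; the limits (MAX_TAKE, DEFAULT_TAKE, MAX_DEPTH) and all function names are my assumptions, and the nested-object representation of a query is a simplification of what a GraphQL server sees.

```typescript
const MAX_TAKE = 100; // assumed upper limit for the page size
const DEFAULT_TAKE = 10; // assumed fallback for invalid values
const MAX_DEPTH = 5; // assumed nesting limit for queries

// Clamp paging arguments to a safe range instead of trusting the client.
function clampPaging(skip: number, take: number): { skip: number; take: number } {
  const safeSkip = Number.isInteger(skip) && skip > 0 ? skip : 0;
  const safeTake = Number.isInteger(take) && take > 0 ? Math.min(take, MAX_TAKE) : DEFAULT_TAKE;
  return { skip: safeSkip, take: safeTake };
}

// Simplified picture of a query: fields map to true (leaf) or to a nested selection.
type QueryShape = { [field: string]: true | QueryShape };

// Nesting depth: { posts: { comments: { author: true } } } has depth 3.
function selectionDepth(shape: QueryShape): number {
  let deepest = 0;
  for (const sub of Object.values(shape)) {
    deepest = Math.max(deepest, sub === true ? 0 : selectionDepth(sub));
  }
  return 1 + deepest;
}

// Reject queries that nest deeper than the configured limit.
function assertDepthAllowed(shape: QueryShape): void {
  const depth = selectionDepth(shape);
  if (depth > MAX_DEPTH) {
    throw new Error(`Query depth ${depth} exceeds limit ${MAX_DEPTH}`);
  }
}

// The request from the list above is cut down to the maximum page size.
const safePaging = clampPaging(0, 1000000);
```

Whether to clamp silently or to reject such requests with an error is a design decision; rejecting is stricter, but it makes the limit visible to client developers.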