I cannot deny that I am a little tired of hearing the vedettes of programming. Lately I tend to give more credit to bloggers, friends, Reddit commenters and, in general, anonymous programmers who do their work every day in a real environment.

Because let’s be serious: for these guys there’s only one correct way to do things, which is to do them well. And that’s not what I find every day in the real world (I hate to talk like that). Sometimes (more often than I would like) you must compromise.

My environment is the trenches, you know.

That said, I know the value of these vedettes; it’s just that many times their perfect speech, issued from the ivory tower of programming, does not fit very well in a real world of deadlines and business trade-offs. I’m going to take some quick notes on what I find in this book. I do not pretend to be pedagogical; rather I want to have some notes that can help you when you build your next api. And I hope that this time you can just build the perfect api, the api you really love, without compromising. Like a real programming vedette.

The first thing you need to build apis that you do not hate is a program that generates test data: a seeder that fills your database for testing. The book recommends Faker. I will check it out.
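
For example, a minimal seeder sketch with Faker; the insertPlace() helper and the fields are hypothetical, just to show the idea:

<?php
// Minimal seeding sketch with Faker (fzaninotto/faker).
// insertPlace() is a hypothetical persistence helper.
require 'vendor/autoload.php';

$faker = Faker\Factory::create();

for ($i = 0; $i < 50; $i++) {
    insertPlace([
        'id'   => $faker->uuid,      // UUIDs instead of autoincrements (see below)
        'name' => $faker->company,
        'lat'  => $faker->latitude,
        'lon'  => $faker->longitude,
    ]);
}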

Then you have to plan and create the URLs (endpoints). To do this we will make a list of nouns and, for each noun, a list of actions that will contain the 4 typical CRUD operations plus any others we need, including possible filters for the data (parameters). For example:

Places:

  • Create
  • Read
  • Update
  • Delete
  • List (lat, lon)
  • Images

In this case we need a list of places, which can optionally be filtered by latitude and longitude. We can also upload images. If our api only allows uploading one image, which replaces the previous one, the resource will be “/image”.
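
A hypothetical mapping of that list to concrete URLs could be:

POST   /places                          (create)
GET    /places/x                        (read)
PUT    /places/x                        (update)
DELETE /places/x                        (delete)
GET    /places?lat=40.4168&lon=-3.7038  (list, filtered by coordinates)
PUT    /places/x/image                  (upload/replace the single image)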


RESTful basics

  • GET /resources
  • GET /resources/x
  • GET /resources/x,y,z
  • GET /users/x/places/y (embedded data)

Remember that it is better if x, y or z here are UUIDs and not simple autoincrement numbers. Autoincrement ids offer information to the users of the api that we normally prefer to hide.

  • DELETE /places/x
  • DELETE /places/x,y,z
  • DELETE /places
  • DELETE /places/x/image
  • DELETE /places/x/images

PUT / POST

We use PUT when we know the complete url in advance and the operation is idempotent: no matter how many times it is executed, the result will be the same. In the other cases, use POST. For example, if you want to upload a user’s image:

PUT /users/x/image

I know the complete url, and even if I use the endpoint ten times the result will always be the same: it replaces the existing image.

POST /computers/x/settings

Here I can upload a different setting each time, so the result is not necessarily the same.

Some more tips

  • Always use plural resources; it is more consistent.
  • Do not put verbs in the url, only nouns (resources).
  • Each resource / noun has its own controller.
  • The previous rule has some flexibility; for example, if a resource only concerns users, maybe we can handle it in the UsersController.

Input & Output

In other words, how to organize and structure the requests and responses of our api:

Requests:

POST /authors/4/books HTTP/1.1
Host: api.cool-authors.com
Authorization: Bearer cn389ncoiwuencr
Content-Type: application/json
{
  "user_id": 2
}

Responses:

HTTP/1.1 200 OK
Server: nginx
Content-Type: application/json
...
{
  "id":"44556",
  "book": {
    "id":"4562"
  }
}

As Content-Type we will always use application/json, which is much clearer and more readable, and it results in fewer errors and better type fidelity than, for example, application/x-www-form-urlencoded, widely used in PHP, where every value arrives as a string. There are very few cases in which xml is necessary. Personally I find XML and SOAP hateful and only work with them as a mercenary, or forced by particular circumstances that I hope will repeat themselves as little as possible in the future.
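
A quick illustration of that type fidelity; the same payload in both formats:

# application/x-www-form-urlencoded flattens everything to strings:
active=1&score=9.5

# application/json preserves the types:
{"active": true, "score": 9.5}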

KISS

On the structure of the content there is no consensus.

  • JSON api (one or more resources, no difference: it always returns an array of objects).
  • Twitter (if there are several resources it returns an array; if not, it returns the object).
  • Facebook (like Twitter, but includes resources within a namespace, which potentially makes it easy to add metadata).
  • It is recommended to always include a namespace that groups the single resource or collection:
{
  "data": {
    "id": "503294",
    "name": "defnx"
  }
}

{
  "data": [
    {
      "id": "503294",
      "name": "defnx"
    },
    {
      "id": "503296",
      "name": "yuale"
    }
  ]
}

My personal experience makes me prefer to always return a collection, although I can see advantages in both styles. Maybe it’s just a question of personal taste where, in the end, the manager’s (usually arbitrary) vision will prevail.

HTTP codes, errors and messages

Even when something goes wrong while preparing the response, we return data. Information about what happened on the server reaches the client thanks to HTTP codes and error messages. Some typical HTTP codes are:

200 - Everything went well.
201 - Something has been created well.
202 - Request accepted, processed asynchronously.
400 - Error in the structure of the request.
401 - Unauthorized (no user).
403 - The user cannot access this data.
404 - The route is not valid or the resource does not exist.
405 - The method is not allowed.
410 - The data has been deleted or disabled.
500 - An unexpected error on the server.
503 - Service not available at this time.

The http codes should be used as general categories of errors, but we will probably need more granularity in the response, creating our own error types and more explanatory messages.

{
  "error": {
    "type": "OAuthException",
    "code": "err-1234",
    "message": "sorry bro",
    "href": "http://api.example.com/docs/errors/#err-1234"
  }
}

If there are several errors, they can be grouped in a collection. The structure of an error object can contain some of the elements that json-api specifies, such as id, href, status, code, title, detail, links or path.
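
For example, a sketch of several errors grouped in a collection, borrowing those json-api fields (the values are made up):

{
  "errors": [
    {
      "code": "err-1234",
      "title": "sorry bro",
      "href": "http://api.example.com/docs/errors/#err-1234"
    },
    {
      "code": "err-5678",
      "title": "still sorry"
    }
  ]
}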

Testing

Since unit tests in an api with many endpoints often turn into a nightmare, the book proposes using Behat in a PHP development environment. It is a BDD (Behavior Driven Development) tool, more or less the PHP equivalent of Cucumber.

Feature: Users

Scenario: Finding a specific user
    When I request "GET /users/1"
    Then I get a "200" response
    Then I scope into the "data" property
        And the properties exist:
            """
            id
            name
            """
        And the "id" property is an integer

With a tool of this type we define “Features” that correspond to our resources (the nouns). Next we define “Scenarios” that correspond, in a way, to the test methods of our unit test suite. And for each scenario we define “Steps” that resemble the asserts of our unit tests. All expressed almost in natural language.

Now, while there is still no code written, is the best time to write all the tests in this way. This lets us continue analyzing the api before writing a single line of code.

To be able to execute these tests, it is necessary to give the library some details, like the host of our api. More on how to use Behat here.
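
For example, a minimal behat.yml sketch, assuming Behat 3 with the Mink extension (the host is made up):

default:
  extensions:
    Behat\MinkExtension:
      base_url: "http://api.test.loc"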

Outputting data

In a previous chapter the author recommends using a namespace to group the results, using “data”. He also returns an array of objects if there are several resources in the response:

{
  "data": [
    {
      "id": 5,
      "name": "defn.es"
    },
    {
      "id": 6,
      "name": "abrahan.io"
    }
  ]
}

Or the object itself if there is only one resource:

{
  "data": {
    "id": 5,
    "name": "defn.es"
  }
}

Regarding the controller, you should not use an ORM or any other database retrieval mechanism directly to return the data. That is, between retrieving the data and returning it there must be a selection of the specific fields we want to expose, cast appropriately to the types of data we expect. Here are some reasons:

  1. Performance.
  2. Type casting: PHP’s database extensions return everything as strings (a boolean becomes “1” or “0”).
  3. Security (api clients could potentially see compromising data).
  4. Stability.

To perform these transformations we can use the Fractal library.
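
A minimal sketch of what a transformer could look like; PlaceTransformer and its fields are my own invention, the Fractal classes are real:

<?php
use League\Fractal\Manager;
use League\Fractal\Resource\Item;
use League\Fractal\TransformerAbstract;

// Pick only the fields we want to expose and cast them to the
// types we expect, instead of dumping the database row as-is.
class PlaceTransformer extends TransformerAbstract
{
    public function transform(array $place)
    {
        return [
            'id'   => (string) $place['id'],
            'name' => (string) $place['name'],
            'lat'  => (float) $place['lat'],
            'lon'  => (float) $place['lon'],
        ];
    }
}

// $placeRow stands in for a row fetched from the database.
$placeRow = ['id' => 5, 'name' => 'defn.es', 'lat' => '40.4', 'lon' => '-3.7'];

$manager = new Manager();
echo $manager->createData(new Item($placeRow, new PlaceTransformer()))->toJson();

As a bonus, Fractal’s default serializer wraps the output in the “data” namespace recommended earlier.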

If our data model changes, we must be careful to keep those changes within the scope of our application and not let them leak outside unless it is essential.

Each type of error that we use will have its own method, and we will be able to define custom errors, if necessary, in a generic method that responds with an error. I tend to think that HTTP error codes already handle many cases, but in practice we may need more granularity.

Related data

Our database model does not have to be replicated in the structure of our api, although in most cases they will be very similar. When our resource has associated data we have to find the balance between sending all the related data or sending urls so that the client retrieves it. That is, between sending many KB with fewer http requests, or sending a few KB and making more http requests.

Some solutions:

Send the identifiers of the related objects and make a second request with all of them:

{
  "data": {
    "id": 5,
    "name": "defn.es",
    "_links": {
      "domains": ["3", "4"]
    }
  }
}

Sideloading: Prevents the duplication of embedded data:

{
  "data": [
    {
      "id": 5,
      "name": "defn.es",
      "_links": {
        "domains": ["3", "4"]
      }
    },
    {
      "id": 6,
      "name": "defn.com",
      "_links": {
        "domains": ["3", "4"]
      }
    }
  ],
  "_linked": {
    "domains": [
      {
        "id": "3",
        "name": "www.defn.com"
      },
      {
        "id": "4",
        "name": "api.defn.com"
      }
    ]  
  }
}

Embedded Documents (Nesting):

For me it is the best option the book presents, because the client gets exactly what it needs and the number of requests is reduced, optimizing the data download.

/posts?include=comments,author.image
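
With Fractal this maps quite directly onto its “includes”; a sketch where PostTransformer, CommentTransformer and AuthorTransformer are hypothetical:

<?php
use League\Fractal\Manager;
use League\Fractal\TransformerAbstract;

class PostTransformer extends TransformerAbstract
{
    // Embeds the client may ask for via ?include=...
    protected $availableIncludes = ['comments', 'author'];

    public function transform(array $post)
    {
        return ['id' => (int) $post['id'], 'title' => (string) $post['title']];
    }

    public function includeComments(array $post)
    {
        return $this->collection($post['comments'], new CommentTransformer());
    }

    public function includeAuthor(array $post)
    {
        return $this->item($post['author'], new AuthorTransformer());
    }
}

// The query string decides what gets embedded, e.g. "comments,author.image":
$manager = new Manager();
$manager->parseIncludes($_GET['include']);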

Debugging

The Chrome extension Postman does an excellent job here.


Authentication

Authentication in an api is used to track users, give them context, grant or revoke access to certain resources, activate or deactivate accounts, etc.

Sometimes our api does not need this functionality, because its resources are public and read-only, or maybe it sits in a private network, protected in other ways. For the rest of the cases it is probably convenient to implement a login.

BASIC It involves sending a name and a password. It is easy to implement but it is not secure, especially over plain http. In each request the name and password are sent in the header, which is potentially a risk.

Digest Instead of sending the password, it sends its MD5 hash. The problem is that it is still a fairly unsafe way of logging in, in which credentials must travel with each request. Over SSL it is quite safe.

OAuth 1.0a It is very secure and it does not send the password constantly in the header. But it is difficult to work with. The token never changes, so security is compromised if it is used for a long time. It is a technology falling into disuse.

OAuth 2.0 Requires SSL. The user receives an access token that he can use in each request. Implementing it by hand is very difficult, so it is better to use a library; the book recommends two (for PHP). These tokens expire after a certain time: to handle that, you can catch a “Not Authorized” error inside a wrapper and request another token there.
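
Something like this hedged sketch; apiRequest() and refreshAccessToken() are hypothetical helpers, not from the book:

<?php
// Retry-on-expiry wrapper: if the access token has expired (401),
// fetch a fresh one and repeat the request once.
function callApi($endpoint, $token)
{
    $response = apiRequest($endpoint, $token);   // hypothetical HTTP helper
    if ($response['status'] === 401) {
        $token    = refreshAccessToken();        // hypothetical OAuth2 refresh
        $response = apiRequest($endpoint, $token);
    }
    return $response;
}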


Pagination

What we are talking about here is dividing the potential result of a single imaginary request into several requests. This helps us present the data in a friendlier way for the client, as well as improving the performance of the call (http and database).

To achieve this we will send one more parameter in our query string:

/domains?number=10
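
On the server side this usually ends up as a LIMIT/OFFSET query. A minimal sketch with PDO; the table name, the cap and the defaults are assumptions of mine:

<?php
// $pdo: an existing PDO connection (assumed).
$page   = isset($_GET['page'])   ? max(1, (int) $_GET['page'])     : 1;
$number = isset($_GET['number']) ? min(100, (int) $_GET['number']) : 10;

$stmt = $pdo->prepare('SELECT * FROM domains LIMIT :lim OFFSET :off');
$stmt->bindValue(':lim', $number, PDO::PARAM_INT);
$stmt->bindValue(':off', ($page - 1) * $number, PDO::PARAM_INT);
$stmt->execute();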

The response would include information about the pagination as well as the URL that we should call to obtain the next page:

{
  "data": [
    
  ],
  "pagination": {
    "total": 100,
    "count": 10,
    "per_page": 10,
    "current_page": 1,
    "total_pages": 10,
    "previous_url": "",
    "next_url": "/domains?page=2&number=10"
  }
}

No doubt counting the totals can be heavy work on large databases. In many cases you will have to cache the result, pre-calculate it or narrow it down with other filters:

/domains?iso=es&page=2&number=120

Another way to paginate is with cursors; a response could look like this:

{
  "data": [

  ],
  "pagination": {
    "cursors": {
      "after": 10,
      "next_url": "/domains?cursor=10&number=10"
    }
  }
}

It is an efficient way to deal with a lot of data without counting it; the problem is that there will always be a last request that returns nothing.
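
Under the hood a cursor is typically a keyset query; a sketch with PDO, where the table and the page size are assumptions:

<?php
// Keyset pagination: no OFFSET to scan past and no COUNT(*) needed.
// $pdo: an existing PDO connection (assumed).
$cursor = isset($_GET['cursor']) ? (int) $_GET['cursor'] : 0;

$stmt = $pdo->prepare('SELECT * FROM domains WHERE id > :cursor ORDER BY id LIMIT :lim');
$stmt->bindValue(':cursor', $cursor, PDO::PARAM_INT);
$stmt->bindValue(':lim', 10, PDO::PARAM_INT);
$stmt->execute();

// The "after" cursor for the next page is the last id of this result.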


Content negotiation

Send the type of document you want in the Accept header and handle it in the API by inspecting the MIME type. If the type is not supported, return a 415 error.

GET /domains HTTP/1.1
Host: api.test.loc
Accept: application/json
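
On the PHP side, a naive sketch of that check, returning the 415 the book mentions:

<?php
// Very naive content negotiation: we only speak JSON.
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '*/*';

if (strpos($accept, 'application/json') === false && strpos($accept, '*/*') === false) {
    http_response_code(415); // Unsupported Media Type
    exit;
}

header('Content-Type: application/json');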

HATEOAS

An api cannot be considered RESTful if it does not include hyperlinks in its responses. Thanks to them we can explore the rest of the api and work with resources comfortably.

{
  "data": [
      ...
  ],
  "link": [
    {
      "rel": "self",
      "uri": "/domains/1"
    },
    {
      "rel": "domain.subdomains",
      "uri": "/domains/1/subdomains"
    }
  ]
}

Versioning

Finally in this long post, just a comment regarding versions (something our api will sooner or later be forced into). It seems there is no ideal solution, as all of them break REST somehow.

The approach I like the most is the one that uses a header to indicate the version in use, through the “Accept” attribute, the way it is done in the GitHub api:

Accept: application/vnd.github.v3+json

This way the URLs of our resources will always be the same, it respects HATEOAS, it is simple to use and there are no problems with the cache.
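
Extracting the version on the server is then a small match on the Accept header; a sketch where vnd.myapi is a made-up vendor string:

<?php
// Pull the version out of a GitHub-style Accept header,
// e.g. "application/vnd.myapi.v2+json" -> 2. Defaults to 1.
$accept  = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
$version = preg_match('#application/vnd\.myapi\.v(\d+)\+json#', $accept, $m)
    ? (int) $m[1]
    : 1;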