Yet another simple URL parser: GitHub.

## Motivation

Parsing a URL is neither a challenging nor even an interesting problem, and plenty of implementations already exist.

I started this side project anyway, because 1) the amount of work is moderate: it should take a few hundred lines of code; and 2) it is reasonably practical even though it is a toy project.

After all, as the old saying goes: “learning by doing”.

## Design

Since URL/URI syntax is relatively straightforward, I currently follow the description on Wiki/URL [1].

The syntax of a URI [1]:

URI = scheme:[//authority]path[?query][#fragment]
authority = [userinfo@]host[:port]

• scheme is mandatory
• authority is optional, and if authority is present:
    • user info is optional
    • host is mandatory
    • port is optional
• path is mandatory
• query is optional
• fragment is optional

Please note that the parser currently only recognizes well-formed URLs, that is, URLs that follow the syntax above. It does not try to recover from malformed input.
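To make the `authority = [userinfo@]host[:port]` rule concrete, here is a minimal sketch of how an authority could be split. The type `authority_t` and the function `split_authority` are hypothetical names invented for this illustration, not part of the project. The sketch keeps user info as a single piece and does not handle bracketed IPv6 hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: a pointer is NULL when that part is absent. */
typedef struct {
    const char *userinfo; size_t userinfo_len;
    const char *host;     size_t host_len;
    const char *port;     size_t port_len;
} authority_t;

/* Split authority = [userinfo@]host[:port]. `s` holds `n` bytes and need
 * not be NUL-terminated. Returns 0 on success, -1 if the host is empty. */
static int split_authority(const char *s, size_t n, authority_t *out)
{
    assert(s != NULL && out != NULL);
    *out = (authority_t){0};

    /* user info ends at the first '@', if there is one */
    const char *at = memchr(s, '@', n);
    if (at != NULL) {
        out->userinfo = s;
        out->userinfo_len = (size_t)(at - s);
        n -= (size_t)(at + 1 - s);
        s = at + 1;
    }

    /* the port, if any, follows the first ':' after the user info */
    const char *colon = memchr(s, ':', n);
    size_t host_len = (colon != NULL) ? (size_t)(colon - s) : n;
    if (host_len == 0)
        return -1; /* host is mandatory */
    out->host = s;
    out->host_len = host_len;

    if (colon != NULL) {
        out->port = colon + 1;
        out->port_len = n - host_len - 1;
    }
    return 0;
}
```

For an input such as `user:pw@example.com:8080`, this yields the user info `user:pw`, the host `example.com`, and the port `8080`, all as pointers into the original string.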

## Implementation

I borrow the zero-copy design of http-parser [2]: instead of duplicating the URL string, each field only points to an offset within the given URL string and carries a length.

I do not use regular expressions to parse URLs. Instead, the parser simply scans the given URL from beginning to end and looks for the delimiters of each field. This gives O(n) complexity, where n is the length of the URL.
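The single-pass, zero-copy idea can be sketched roughly as follows. This is not the project's actual code: `span_t`, `parts_t`, and `scan_url` are hypothetical names made up for the example. The authority is kept as one piece, and characters inside each field are not validated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not the project's url_t/field_t. */
typedef struct {
    const char *offset; /* start of the field inside the original string */
    size_t len;         /* number of bytes in the field */
} span_t;

typedef struct {
    span_t scheme, authority, path, query, frag;
    int has_authority, has_query, has_frag;
} parts_t;

/* Scan `url` once, left to right, without copying any bytes.
 * Returns 0 on success, -1 if there is no non-empty scheme before a ':'. */
static int scan_url(const char *url, parts_t *out)
{
    assert(url != NULL && out != NULL);
    *out = (parts_t){0};

    const char *p = url;

    /* scheme: everything before the first ':' */
    size_t n = strcspn(p, ":");
    if (n == 0 || p[n] != ':')
        return -1;
    out->scheme.offset = p;
    out->scheme.len = n;
    p += n + 1;

    /* authority: present only when "//" follows the scheme */
    if (p[0] == '/' && p[1] == '/') {
        p += 2;
        out->has_authority = 1;
        out->authority.offset = p;
        out->authority.len = strcspn(p, "/?#");
        p += out->authority.len;
    }

    /* path: runs until '?' or '#' (may be empty) */
    out->path.offset = p;
    out->path.len = strcspn(p, "?#");
    p += out->path.len;

    /* query: after '?', runs until '#' */
    if (*p == '?') {
        p++;
        out->has_query = 1;
        out->query.offset = p;
        out->query.len = strcspn(p, "#");
        p += out->query.len;
    }

    /* fragment: after '#', runs to the end of the string */
    if (*p == '#') {
        p++;
        out->has_frag = 1;
        out->frag.offset = p;
        out->frag.len = strlen(p);
    }
    return 0;
}
```

Each byte is visited a bounded number of times, so the scan stays linear in the length of the input. The caller must keep the original string alive for as long as the spans are in use.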

The parsing returns a struct as the result:

typedef struct {
    field_t *scheme;     // mandatory
    field_t *usernm;     // optional
    field_t *passwd;     // optional
    field_t *host;       // optional
    field_t *port;       // optional
    field_t *path;       // mandatory
    field_t *query;      // optional
    field_t *frag;       // optional
} url_t;


with each field defined as:

typedef struct {
    char *offset;
    unsigned int len;
} field_t;


If a field is not NULL, then

• [field_t]->offset: points to the first character of the field in the original URL
• [field_t]->len: gives the length of the field
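Because a field points into the original URL, it is not NUL-terminated, so it has to be read together with its length. The following sketch shows one way to do that. `field_to_str` and `field_eq` are hypothetical helpers written for this example and are not part of the API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the field_t shown above. */
typedef struct {
    char *offset;
    unsigned int len;
} field_t;

/* Hypothetical helper: copy a field into buf as a NUL-terminated string.
 * Follows snprintf semantics: returns the number of bytes the field needs,
 * excluding the NUL, and truncates if buf is too small. */
static int field_to_str(const field_t *f, char *buf, size_t bufsz)
{
    assert(f != NULL && buf != NULL && bufsz > 0);
    /* "%.*s" reads at most f->len bytes, so no terminating NUL is needed */
    return snprintf(buf, bufsz, "%.*s", (int)f->len, f->offset);
}

/* Hypothetical helper: return 1 if the field equals the C string s. */
static int field_eq(const field_t *f, const char *s)
{
    size_t n = strlen(s);
    return f != NULL && f->len == n && memcmp(f->offset, s, n) == 0;
}
```

The same `"%.*s"` format works directly with `printf` when a field only needs to be printed, which avoids the intermediate buffer.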

## API

url_t *
url_parse(char *url); // parse the given url, returns the url_t as result

void
url_print(url_t *url_stru); // print parsing result

void
url_del(url_t *url_stru); // delete parsing result, free memory