It is incorrect to "normalize" // in HTTP URL paths

echoangle · 2026-04-18T09:39:51.000Z 1776505191

> Wait, are there any implementations that wrongly collapse double-slashes?

> nginx with merge_slashes

How can it be wrong if it is server-side? If the server wants to treat those paths equally, it can if it wants to.

It would only be wrong if a client does it and requests a different URL than the user entered, right?

leni536 · 2026-04-18T10:14:46.000Z 1776507286

It can't be. It's the same confusion as "email address normalization" being wrong (for example when gmail ignores dots when mapping an address to an inbox).

It matters where the normalization happens, and server-side behavior is out-of-scope of these identifier RFCs.

OoooooooO · 2026-04-18T10:15:00.000Z 1776507300

Yeah I would say that falls under the origin defining both paths as equivalent.

> Therefore, collapsing // to / in HTTP URL path segments is not correct normalization. It produces a different, non-equivalent identifier unless the origin explicitly defines those two paths as equivalent.

MattJ100 · 2026-04-18T07:59:37.000Z 1776499177

URL parsing/normalisation/escaping/unescaping is a minefield. There are many edge cases where every implementation does things differently. This is a perfect example.

It gets worse if you are mapping URLs to a filesystem (e.g. for serving files). Even though they look similar, URL paths have different capabilities and rules than filesystems, and different filesystems also vary. This is also an example of that (I don't think most filesystems support empty directory names).

dale_glass · 2026-04-18T09:09:49.000Z 1776503389

But maybe you should anyway.

Because maybe you use S3, which treats `foo/bar.txt` and `foo//bar.txt` as entirely separate things. Because to S3, directories don't exist and those are literally the exact names of the keys under which data is stored.

So you have script A concatenate "foo" + "/bar" and script B concatenate "foo/" + "/bar", and suddenly you have a weird problem.

I can't imagine a real use case where you'd think this is desirable.
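
A minimal sketch of that mismatch, using a plain Python dict as a stand-in for a flat key store. This is not an S3 client; `store`, `script_a_key` and `script_b_key` are made-up names for illustration.

```python
# A flat key-value store: keys are opaque strings and "/" has no special meaning.
# The dict is only a stand-in for an object store, not a real S3 API.
store = {}


def script_a_key():
    # Script A joins "foo" and "/bar".
    return "foo" + "/bar"


def script_b_key():
    # Script B joins "foo/" and "/bar", which yields a double slash.
    return "foo/" + "/bar"


# Script A writes under its key.
store[script_a_key()] = b"written by A"

# Script B looks up its own idea of the key and finds nothing,
# because "foo/bar" and "foo//bar" are different strings.
found = store.get(script_b_key())
print(script_a_key(), script_b_key(), found)
```

Nothing in the store is wrong here; the two scripts simply disagree about the key.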

secondcoming · 2026-04-18T09:26:38.000Z 1776504398

If a user of S3 knows that directories aren't real why would they expect directory-related normalisation to happen?

PunchyHamster · 2026-04-18T08:43:29.000Z 1776501809

We cut those and a few others coz historically there were exploits relying on them

Nothing on the web is "correct", deal with it

leni536 · 2026-04-18T10:04:52.000Z 1776506692

I don't think it's incorrect for distinct paths to point to the same resource.

Of course you shouldn't assume that in a client. If you are implementing against an API don't deviate regarding // and trailing / from the API documentation.

sfeng · 2026-04-18T08:42:09.000Z 1776501729

What I’ve learned in doing this type of normalization is whatever the specification says, you will always find some website that uses some insane url tweak to decide what content it should show.

renewiltord · 2026-04-18T09:27:13.000Z 1776504433

I’m going to keep doing it.

mjs01 · 2026-04-18T08:13:30.000Z 1776500010

// is useful if the server needs to serve both static files in the filesystem, and embedded files like a webpage. // can be used for embedded files' URL because they will never conflict with filesystem paths.

PunchyHamster · 2026-04-18T08:42:58.000Z 1776501778

....just serve it from other paths

janmarsal · 2026-04-18T09:23:28.000Z 1776504208

i'm gonna do it anyway

leni536 · 2026-04-18T09:10:32.000Z 1776503432

Wait until you try http:/example.com and http://////example.com in your browser.

stanac · 2026-04-18T09:44:12.000Z 1776505452

In both cases I get https://example.com/ in FF.

WesolyKubeczek · 2026-04-18T07:55:06.000Z 1776498906

It is probably “incorrect”, but given the established actual usage over the decades, it’s most likely what you need to do nevertheless.

Not doing it is like punishing people for not using Oxford commas, or entering an hour long debate each time someone writes “would of” instead of “would have”. It grinds my gears too, but I have different hills to die on.

bazoom42 · 2026-04-18T08:39:30.000Z 1776501570

If different clients do it differently, you have incompatibilities. This punishes everybody. Since normalizing // to / removes information which may be significant, the obviously correct choice is following the spec.

PunchyHamster · 2026-04-18T08:43:53.000Z 1776501833

if it is significant, you coded your app wrong, plain and simple

jeroenhd · 2026-04-18T08:57:17.000Z 1776502637

Of course not. It's an explicit feature, part of every specification.

Plenty of websites rewrite paths like /a/b/c/d into a backend service call like /?w=a&x=b&y=c&z=d. In that scheme, /a//c/d would rewrite to /?w=a&x=&y=c&z=d, something entirely distinct from /a/c/d, which works out to /?w=a&x=c&y=d.

It's not the application's fault that the people attempting to configure web server URLs don't know how web server URLs work.
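
A rough sketch of such a rewrite rule in Python. The function name `rewrite` and the fixed parameter names are assumptions for illustration, and values are not URL-encoded to keep it short.

```python
def rewrite(path):
    """Map positional path segments onto query parameters w, x, y, z."""
    names = ["w", "x", "y", "z"]
    # Drop only the leading "/", then split on "/".
    # Empty segments (from "//") are kept as empty strings.
    segments = path[1:].split("/")
    pairs = [f"{name}={value}" for name, value in zip(names, segments)]
    return "/?" + "&".join(pairs)


print(rewrite("/a/b/c/d"))  # /?w=a&x=b&y=c&z=d
print(rewrite("/a//c/d"))   # /?w=a&x=&y=c&z=d
print(rewrite("/a/c/d"))    # /?w=a&x=c&y=d
```

Collapsing the slashes before this rule runs would turn the second request into the third one.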

bazoom42 · 2026-04-18T09:08:43.000Z 1776503323

Etheryte · 2026-04-18T08:10:23.000Z 1776499823

Not sure I agree. The correct thing is to not mess with the URL at all if you're unsure about what to be doing to it. Doing nothing is the easiest thing of them all, why not do that?

j16sdiz · 2026-04-18T08:35:22.000Z 1776501322

Because then you need some consistency or normalisation before applying ACLs or doing routing?
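
A small sketch of the kind of mismatch I mean, assuming a hypothetical front end that checks the raw path and a hypothetical backend that collapses slashes before routing (both functions are invented for this example):

```python
import re


def acl_allows(path):
    # Hypothetical front-end rule: deny anything under /admin/.
    return not path.startswith("/admin/")


def backend_route(path):
    # Hypothetical backend that collapses runs of slashes before routing.
    return re.sub(r"/{2,}", "/", path)


raw = "//admin/users"
print(acl_allows(raw))                 # True: the raw path does not start with /admin/
print(backend_route(raw))              # /admin/users
print(acl_allows(backend_route(raw)))  # False: the routed path would have been denied
```

The problem is the two layers disagreeing; either rule works as long as the ACL and the router see the same form of the path.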

jeroenhd · 2026-04-18T08:59:15.000Z 1776502755

URL normalization is defined and it doesn't include collapsing slashes.

Note that you can include custom normalization rules (like collapsing slashes, tolower()ing the entire path, removing the query part of the URL), but that's not part of the standard. If you're doing anything extra, the risk of breaking stuff is on you.
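
For reference, a partial sketch of the normalization RFC 3986 describes: lowercasing the scheme and host, and removing dot segments (section 5.2.4). Percent-encoding normalization and default-port removal are left out, `normalize` is just an illustrative name, and the host lowercasing assumes there is no userinfo in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path):
    """Remove "." and ".." segments, following RFC 3986 section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../x" becomes "/x"
            if output:
                output.pop()         # and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            # An empty segment from "//" is moved like any other segment.
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)


def normalize(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()    # assumption: no case-sensitive userinfo
    # An empty path becomes "/" (scheme-based normalization for http).
    path = remove_dot_segments(parts.path) or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(normalize("HTTP://Example.COM/a//b/./c/../d"))  # http://example.com/a//b/d
```

The dot segments go away and the case is folded, but the `//` survives because the empty segment is treated as a segment.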