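A production incident caused by nginx location priority

Incident description

Our app was showing a blank white screen when loading front-end resources. Tracing it down pointed to the ETag and Last-Modified caching headers, so I added a quick hack to disable them.

A few hours later things blew up. Why only after a few hours? Because only some of the pages were broken.

nginx configuration and reproducing the incident

Some real details have been hidden, but the structure matches production.

## The faulty configuration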
upstream api_example_loveyu_info {
    server 127.0.0.1:8088;
}

server {
    listen 80;
    server_name web.example.loveyu.info;
    root /data/htdocs/web.example.loveyu.info/dist;
    index index.html index.htm;

    location /web2/ {
        alias /data/htdocs/web2.example.loveyu.info/web2/;
    }

    location /web2/other/ {
        alias /data/htdocs/other.example.loveyu.info/;
    }

    location / {
        try_files $uri /index.html;
    }

    # The newly added block that caused the incident; commented out for now
    # location ~ .*\.(html|htm)$ {
    #     # Disable 304 caching for html
    #     if_modified_since off;
    #     etag off;
    #     add_header Last-Modified "";
    # }

    location /api/ {
        proxy_connect_timeout 180;
        proxy_send_timeout 180;
        proxy_read_timeout 180;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://api_example_loveyu_info/api/;
    }

    location ~ .*\.(js|css|gif|jpg|jpeg|png|webp|swf|svg|ttf|eot|woff|woff2)$ {
        add_header Access-Control-Allow-Origin *;
        expires 1d;
        add_header Pragma public;
        add_header Cache-Control "public";
    }
}

server {
    listen 8088;
    server_name web.example.loveyu.info;
    root /data/htdocs/api.example.loveyu.info;
    index index.html index.htm;
}

Directory layout; each directory holds three file types

htdocs
├── api.example.loveyu.info
│   └── api
│       ├── index.html
│       ├── test.css
│       └── test.php
├── other.example.loveyu.info
│   ├── index.html
│   ├── test.css
│   └── test.php
├── web2.example.loveyu.info
│   └── web2
│       ├── index.html
│       ├── test.css
│       └── test.php
└── web.example.loveyu.info
    └── dist
        ├── index.html
        ├── test.css
        └── test.php

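Testing whether the existing configuration is correct

The result is a bit grim: the original configuration was already broken, although I had guessed as much during the earlier analysis.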

#!/usr/bin/env bash
urlArr=(\
"http://web.example.loveyu.info/index.html" \
"http://web.example.loveyu.info/test.css" \
"http://web.example.loveyu.info/test.php" \
"http://web.example.loveyu.info/web2/index.html" \
"http://web.example.loveyu.info/web2/test.css" \
"http://web.example.loveyu.info/web2/test.php" \
"http://web.example.loveyu.info/web2/other/index.html" \
"http://web.example.loveyu.info/web2/other/test.css" \
"http://web.example.loveyu.info/web2/other/test.php" \
"http://web.example.loveyu.info/api/index.html" \
"http://web.example.loveyu.info/api/test.css" \
"http://web.example.loveyu.info/api/test.php"
)
for url in "${urlArr[@]}"
do
    echo "URL: ${url}"
    curl "${url}"
    echo ""
done

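Test result: with the original configuration, only the root directory is served completely correctly. The other three projects cannot serve their css files, so it is already fair to suspect the regex location. That block matches any path ending in .css and has no alias of its own, so those requests are looked up under the server root, where the files do not exist.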

URL: http://web.example.loveyu.info/index.html
web.example.loveyu.info/dist.html

URL: http://web.example.loveyu.info/test.css
web.example.loveyu.info/dist.css

URL: http://web.example.loveyu.info/test.php
web.example.loveyu.info/dist.php

------------------------------------------------------------------

URL: http://web.example.loveyu.info/web2/index.html
web2.example.loveyu.info/web2.html

URL: http://web.example.loveyu.info/web2/test.css
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.10</center>
</body>
</html>

URL: http://web.example.loveyu.info/web2/test.php
web2.example.loveyu.info/web2.php

------------------------------------------------------------------

URL: http://web.example.loveyu.info/web2/other/index.html
other.example.loveyu.info.html

URL: http://web.example.loveyu.info/web2/other/test.css
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.10</center>
</body>
</html>

URL: http://web.example.loveyu.info/web2/other/test.php
other.example.loveyu.info.php

------------------------------------------------------------------

URL: http://web.example.loveyu.info/api/index.html
api.example.loveyu.info.html

URL: http://web.example.loveyu.info/api/test.css
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.10</center>
</body>
</html>

URL: http://web.example.loveyu.info/api/test.php
api.example.loveyu.info.php
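
To make the 404 on /web2/test.css concrete, here is a minimal Python sketch of how root and alias map a URI to a file path. It is an illustration, not nginx code: the helper resolve_path and its parameters are names made up for this example, and it assumes the regex location is the one nginx selected for the request.

```python
def resolve_path(uri, root=None, alias=None, location_prefix=None):
    """Map a request URI to a file path the way root and alias do.

    root:  path = root + full URI
    alias: path = alias + URI with the location prefix stripped
    """
    if alias is not None:
        return alias + uri[len(location_prefix):]
    return root + uri


SERVER_ROOT = "/data/htdocs/web.example.loveyu.info/dist"

# Handled by "location /web2/" and its alias: this file exists on disk.
via_alias = resolve_path(
    "/web2/test.css",
    alias="/data/htdocs/web2.example.loveyu.info/web2/",
    location_prefix="/web2/",
)

# Handled by the regex location, which has no alias and inherits the
# server-level root: this path is not in the directory tree above.
via_regex = resolve_path("/web2/test.css", root=SERVER_ROOT)

print(via_alias)
print(via_regex)
```

The second path points into the dist directory of the root project, which only contains index.html, test.css and test.php at its top level, hence the 404.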

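Fixing the original configuration

First, simply remove the static-asset caching block (the regex location for js, css, image and font files).

Re-running the test script: everything now works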

URL: http://web.example.loveyu.info/index.html
web.example.loveyu.info/dist.html

URL: http://web.example.loveyu.info/test.css
web.example.loveyu.info/dist.css

URL: http://web.example.loveyu.info/test.php
web.example.loveyu.info/dist.php

------------------------------------------------------------------

URL: http://web.example.loveyu.info/web2/index.html
web2.example.loveyu.info/web2.html

URL: http://web.example.loveyu.info/web2/test.css
web2.example.loveyu.info/web2.css

URL: http://web.example.loveyu.info/web2/test.php
web2.example.loveyu.info/web2.php

------------------------------------------------------------------

URL: http://web.example.loveyu.info/web2/other/index.html
other.example.loveyu.info.html

URL: http://web.example.loveyu.info/web2/other/test.css
other.example.loveyu.info.css

URL: http://web.example.loveyu.info/web2/other/test.php
other.example.loveyu.info.php

------------------------------------------------------------------

URL: http://web.example.loveyu.info/api/index.html
api.example.loveyu.info.html

URL: http://web.example.loveyu.info/api/test.css
api.example.loveyu.info.css

URL: http://web.example.loveyu.info/api/test.php
api.example.loveyu.info.php

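Now the key question: how do we properly fix the old configuration?

Break the problem into three steps:

  1. Correct the existing configuration file
  2. Reproduce the html problem seen in production
  3. Provide a complete, correct configuration

To fix the configuration we first need to look at how nginx prioritizes location blocks. The order in which location blocks appear in the file matters little; what matters is the type of each location expression. Among prefix locations, the longest matching prefix is preferred.

In priority order:

  1. First: exact match (=). Once it matches, no other location is considered.
  2. Second: ^~ prefix match. If the longest matching prefix location carries ^~, regex locations are not checked at all.
  3. Third: regex locations (~ and ~*). If several regexes match, the first one in configuration-file order is used, not the longest expression.
  4. Fourth: plain prefix locations. The longest matching prefix is used, but only when no regex matched.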
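Fixing the regex priority problem

Following the rules above, change the three prefix locations as shown. Note that ^~ is followed by a plain prefix, not a regex, so the path must not start with ^:

# location /web2/ {
location ^~ /web2/ {

# location /web2/other/ {
location ^~ /web2/other/ {

# location /api/ {
location ^~ /api/ {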

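File contents are now served correctly. Next, check whether the css caching headers take effect. The curl output shows that they do for the root directory but not for the subdirectories served via alias, because ^~ makes nginx skip the top-level regex location that sets those headers. That still needs fixing.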

root@debian-home:~ # curl -i http://web.example.loveyu.info/test.css
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:05:20 GMT
Content-Type: text/css
Content-Length: 33
Last-Modified: Sun, 03 May 2020 17:16:50 GMT
Connection: keep-alive
ETag: "5eaefc82-21"
Expires: Mon, 04 May 2020 19:05:20 GMT
Cache-Control: max-age=86400
Access-Control-Allow-Origin: *
Pragma: public
Cache-Control: public
Accept-Ranges: bytes

web.example.loveyu.info/dist.css
root@debian-home:~ # curl -i http://web.example.loveyu.info/web2/test.css
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:05:24 GMT
Content-Type: text/css
Content-Length: 34
Last-Modified: Sun, 03 May 2020 17:16:50 GMT
Connection: keep-alive
ETag: "5eaefc82-22"
Accept-Ranges: bytes

web2.example.loveyu.info/web2.css

Fixing static-asset caching in the subdirectories

The subdirectory problem can be solved with nested locations, as follows:

location ^~ /web2/ {
    alias /data/htdocs/web2.example.loveyu.info/web2/;
    location ~ .*\.(js|css|gif|jpg|jpeg|png|webp|swf|svg|ttf|eot|woff|woff2)$ {
        add_header Access-Control-Allow-Origin *;
        expires 1d;
        add_header Pragma public;
        add_header Cache-Control "public";
    }
}

location ^~ /web2/other/ {
    alias /data/htdocs/other.example.loveyu.info/;
    location ~ .*\.(js|css|gif|jpg|jpeg|png|webp|swf|svg|ttf|eot|woff|woff2)$ {
        add_header Access-Control-Allow-Origin *;
        expires 1d;
        add_header Pragma public;
        add_header Cache-Control "public";
    }
}

Checking the caching headers for all three groups of files now shows that they take effect everywhere.

root@debian-home:~ # curl -i http://web.example.loveyu.info/web2/test.css
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:28:19 GMT
Content-Type: text/css
Content-Length: 34
Last-Modified: Sun, 03 May 2020 17:16:50 GMT
Connection: keep-alive
ETag: "5eaefc82-22"
Expires: Mon, 04 May 2020 19:28:19 GMT
Cache-Control: max-age=86400
Access-Control-Allow-Origin: *
Pragma: public
Cache-Control: public
Accept-Ranges: bytes

web2.example.loveyu.info/web2.css
root@debian-home:~ # curl -i http://web.example.loveyu.info/web2/other/test.css
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:28:24 GMT
Content-Type: text/css
Content-Length: 30
Last-Modified: Sun, 03 May 2020 17:16:50 GMT
Connection: keep-alive
ETag: "5eaefc82-1e"
Expires: Mon, 04 May 2020 19:28:24 GMT
Cache-Control: max-age=86400
Access-Control-Allow-Origin: *
Pragma: public
Cache-Control: public
Accept-Ranges: bytes

other.example.loveyu.info.css
root@debian-home:~ # curl -i http://web.example.loveyu.info/test.css
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:28:38 GMT
Content-Type: text/css
Content-Length: 33
Last-Modified: Sun, 03 May 2020 17:16:50 GMT
Connection: keep-alive
ETag: "5eaefc82-21"
Expires: Mon, 04 May 2020 19:28:38 GMT
Cache-Control: max-age=86400
Access-Control-Allow-Origin: *
Pragma: public
Cache-Control: public
Accept-Ranges: bytes

web.example.loveyu.info/dist.css

About try_files

The configuration uses try_files mainly to support front-end routes that are not hash-based. It works as it stands, but wrapping it in location / { is easy to misread.

Change it to:

# old
index index.html index.htm;
location / {  
    try_files $uri /index.html;
}

# new
index index.html index.htm;
try_files $uri /index.html;

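There are a few possible misunderstandings here, and a few behaviors worth noting:

  1. When the requested file does not exist, index.html is served. This is the default case, with a few exceptions.
  2. For requests under /web2/, /web2/other/ and /api/ where the file does not exist, the index.html fallback does not apply. This matches the business requirement and should not be changed.
  3. When the path ends in js|css|gif|jpg|jpeg|png|webp|swf|svg|ttf|eot|woff|woff2, a missing file still returns the nginx 404, which is also the intended behavior.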

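Resolving the actual production incident

The steps above fixed the incorrect routing in nginx. What remains is the original incident. Going back to the scene, the configuration we added was:

# The newly added block that caused the incident
location ~ .*\.(html|htm)$ {
    # Disable 304 caching for html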
    if_modified_since off;
    etag off;
    add_header Last-Modified "";
}

Putting this together with the analysis above, requests behaved as follows at the time of the incident:

URL: http://web.example.loveyu.info/index.html
web.example.loveyu.info/dist.html

URL: http://web.example.loveyu.info/test.css
web.example.loveyu.info/dist.css

URL: http://web.example.loveyu.info/test.php
web.example.loveyu.info/dist.php

------------------------------------------------------------------

URL: http://web.example.loveyu.info/web2/index.html
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.10</center>
</body>
</html>

URL: http://web.example.loveyu.info/web2/test.css
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.10</center>
</body>
</html>

URL: http://web.example.loveyu.info/web2/test.php
web2.example.loveyu.info/web2.php

------------------------------------------------------------------

URL: http://web.example.loveyu.info/web2/other/index.html
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.10</center>
</body>
</html>

URL: http://web.example.loveyu.info/web2/other/test.css
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.10</center>
</body>
</html>

URL: http://web.example.loveyu.info/web2/other/test.php
other.example.loveyu.info.php

------------------------------------------------------------------

URL: http://web.example.loveyu.info/api/index.html
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.10</center>
</body>
</html>

URL: http://web.example.loveyu.info/api/test.css
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.17.10</center>
</body>
</html>

URL: http://web.example.loveyu.info/api/test.php
api.example.loveyu.info.php

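The problem is obvious: location ~ .*\.(html|htm)$ takes priority over location /web2/, so html requests under the subdirectories were resolved against the wrong root. Production started returning 404s, and a chain of failures followed.

Q: One question remains. The earlier tests showed css was also unreachable, so why were there no errors in production?
A: The front-end build uploads its bundled assets to a CDN, so css files are never requested from this server directly, and the problem never surfaced.

The final configuration

Because the site relies on try_files, location ~ .*\.(html|htm)$ also needs its own try_files; the server-level directive is not applied inside that location.

# Final configuration file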

upstream api_example_loveyu_info {
    server 127.0.0.1:8088;
}

server {
    listen 80;
    server_name web.example.loveyu.info;
    root /data/htdocs/web.example.loveyu.info/dist;
    index index.html index.htm;

    try_files $uri /index.html;

    location ^~ /web2/ {
        # Priority A2
        alias /data/htdocs/web2.example.loveyu.info/web2/;
        location ~ .*\.(js|css|gif|jpg|jpeg|png|webp|swf|svg|ttf|eot|woff|woff2)$ {
            # Priority A2-1
            add_header Access-Control-Allow-Origin *;
            expires 1d;
            add_header Pragma public;
            add_header Cache-Control "public";
        }
        location ~ .*\.(html|htm)$ {
            # Priority A2-2
            # Disable 304 caching for html
            if_modified_since off;
            etag off;
            add_header Last-Modified "";
        }
    }

    location ^~ /web2/other/ {
        # Priority A1
        alias /data/htdocs/other.example.loveyu.info/;
        location ~ .*\.(js|css|gif|jpg|jpeg|png|webp|swf|svg|ttf|eot|woff|woff2)$ {
            # Priority A1-1
            add_header Access-Control-Allow-Origin *;
            expires 1d;
            add_header Pragma public;
            add_header Cache-Control "public";
        }
        location ~ .*\.(html|htm)$ {
            # Priority A1-2
            # Disable 304 caching for html
            if_modified_since off;
            etag off;
            add_header Last-Modified "";
        }
    }

    location ~ .*\.(html|htm)$ {
        # Priority A5; html still needs its own try_files here
        # -------# Pay attention #------ #
        try_files $uri /index.html;
        # Disable 304 caching for html
        if_modified_since off;
        etag off;
        add_header Last-Modified "";
    }

    location ~ .*\.(js|css|gif|jpg|jpeg|png|webp|swf|svg|ttf|eot|woff|woff2)$ {
        # Priority A3
        add_header Access-Control-Allow-Origin *;
        expires 1d;
        add_header Pragma public;
        add_header Cache-Control "public";
    }

    location ^~ /api/ {
        # Priority A3
        proxy_connect_timeout 180;
        proxy_send_timeout 180;
        proxy_read_timeout 180;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://api_example_loveyu_info/api/;
    }
}

server {
    listen 8088;
    server_name web.example.loveyu.info;
    root /data/htdocs/api.example.loveyu.info;
    index index.html index.htm;
}

Test results: the content checks are not pasted here since they all pass. The response headers are shown below.

root@debian-home:~ # curl -i http://web.example.loveyu.info/index.html
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:48:26 GMT
Content-Type: text/html
Content-Length: 34
Connection: keep-alive
Accept-Ranges: bytes

web.example.loveyu.info/dist.html

root@debian-home:~ # curl -i http://web.example.loveyu.info/web/try_files.html
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:52:25 GMT
Content-Type: text/html
Content-Length: 34
Connection: keep-alive
Accept-Ranges: bytes

web.example.loveyu.info/dist.html

root@debian-home:~ # curl -i http://web.example.loveyu.info/test.php
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:48:33 GMT
Content-Type: application/octet-stream
Content-Length: 35
Last-Modified: Sun, 03 May 2020 17:16:50 GMT
Connection: keep-alive
ETag: "5eaefc82-23"
Accept-Ranges: bytes

web.example.loveyu.info/dist.php

root@debian-home:~ # curl -i http://web.example.loveyu.info/web2/index.html
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:48:47 GMT
Content-Type: text/html
Content-Length: 35
Connection: keep-alive
Accept-Ranges: bytes

web2.example.loveyu.info/web2.html

root@debian-home:~ # curl -i http://web.example.loveyu.info/web2/other/index.html
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:48:55 GMT
Content-Type: text/html
Content-Length: 31
Connection: keep-alive
Accept-Ranges: bytes

other.example.loveyu.info.html

root@debian-home:~ # curl -i http://web.example.loveyu.info/web2/other/test.php
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 03 May 2020 19:49:08 GMT
Content-Type: application/octet-stream
Content-Length: 30
Last-Modified: Sun, 03 May 2020 17:16:50 GMT
Connection: keep-alive
ETag: "5eaefc82-1e"
Accept-Ranges: bytes

other.example.loveyu.info.php

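Final notes

Any change to nginx configuration is best verified with a test script so you do not fall into the same trap again. Postman is a good fit here and works quite well.

Postman test scripts: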

  1. Testing HTML: caching headers must be absent
pm.test("Status code is 200", function () {
    pm.expect(pm.response.code).to.be.oneOf([200]);
});

pm.test("Cache-Control is absent", function () {
    pm.response.to.not.have.header("Cache-Control");
});

pm.test("ETag is absent", function () {
    pm.response.to.not.have.header("ETag");
});
  2. Testing static assets: caching headers must be present
pm.test("Status code is 200", function () {
    pm.expect(pm.response.code).to.be.oneOf([200]);
});

pm.test("Cache-Control is present", function () {
    pm.response.to.have.header("Cache-Control");
});

pm.test("ETag is present", function () {
    pm.response.to.have.header("ETag");
});

pm.test("Cache-Control contains public", function () {
    let headerValue = pm.response.headers.get('Cache-Control');
    pm.expect(headerValue).to.have.string('public');
});

pm.test("Pragma contains public", function () {
    let headerValue = pm.response.headers.get('Pragma');
    pm.expect(headerValue).to.have.string('public');
});
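
If Postman is not at hand, the same two rule sets could be expressed as a plain function. The following is only a sketch under assumptions: check_headers and the sample header lists are names invented here, the header values are copied from the curl output earlier in this post, and the Last-Modified check for html is an addition that the Postman scripts do not make.

```python
def check_headers(kind, headers):
    """Return a list of problems for a response of the given kind.

    kind: "html" (must carry no caching headers) or "asset" (must carry them).
    headers: list of (name, value) pairs, since a header may repeat.
    """
    names = {name.lower() for name, _ in headers}

    def values(wanted):
        return [v for k, v in headers if k.lower() == wanted]

    problems = []
    if kind == "html":
        for name in ("etag", "last-modified", "cache-control"):
            if name in names:
                problems.append(f"unexpected {name}")
    else:
        for name in ("etag", "cache-control"):
            if name not in names:
                problems.append(f"missing {name}")
        if not any("public" in v for v in values("cache-control")):
            problems.append("cache-control lacks public")
        if not any("public" in v for v in values("pragma")):
            problems.append("pragma lacks public")
    return problems


# Headers of /web2/test.css after the nested-location fix.
asset_ok = [
    ("Last-Modified", "Sun, 03 May 2020 17:16:50 GMT"),
    ("ETag", '"5eaefc82-22"'),
    ("Cache-Control", "max-age=86400"),
    ("Access-Control-Allow-Origin", "*"),
    ("Pragma", "public"),
    ("Cache-Control", "public"),
]
# Headers of /web2/test.css before that fix.
asset_before_fix = [
    ("Last-Modified", "Sun, 03 May 2020 17:16:50 GMT"),
    ("ETag", '"5eaefc82-22"'),
]
# Headers of /index.html under the final configuration.
html_ok = [("Content-Type", "text/html"), ("Accept-Ranges", "bytes")]

print(check_headers("asset", asset_ok))
print(check_headers("asset", asset_before_fix))
print(check_headers("html", html_ok))
```

Feeding it real responses would still need an HTTP client of your choice; the function only encodes the expectations.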
