Using Docker to test a filebeat + logstash + elasticsearch + kibana pipeline for collecting and analyzing nginx logs

Our new system needs to run as a cluster across multiple servers. To make it easier to view and manage server and application logs, I tried building a log management platform with the Elastic family: filebeat + logstash + elasticsearch + kibana.

  • My understanding

Logstash can collect logs by itself and can also parse them, but because the Logstash service is fairly heavy on server resources, filebeat is often used for log collection instead.

filebeat is a lightweight log shipper. According to figures I found online, a running Logstash instance uses around 480 MB of memory while a running filebeat instance uses only about 40 MB, so filebeat is clearly much lighter on server resources. A setup I have seen recommended is therefore to install filebeat on each production server in the distributed cluster and Logstash on a separate server: filebeat collects the logs on its own server and ships them to Logstash, which parses them and then stores them in Elasticsearch.

Of course, filebeat can also send logs directly to Elasticsearch without going through Logstash. The point of routing them through Logstash first is, I think, to parse the log content and normalize it into a standard format before storing it in Elasticsearch.
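As a minimal sketch (not the exact configuration used later in this test, just to illustrate the two options), filebeat's output section can point either at Logstash or straight at Elasticsearch:

# ship to Logstash, which parses the events before indexing (what this test does)
output:
  logstash:
    hosts: ["logstash:5044"]

# or ship directly to Elasticsearch and skip the parsing step
# output:
#   elasticsearch:
#     hosts: ["http://elasticsearch:9200"]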

  • First, test filebeat locally on my Mac

Since this is a development environment, and to make it easy for other developers to work with the same setup, I use Docker locally, and Elasticsearch is already running in Docker. For this filebeat test I use the Dockerized Elasticsearch as the data store for now, and run filebeat in Docker as well. When this is eventually deployed to the servers, Elasticsearch will be a hosted cloud Elasticsearch service, which reduces maintenance cost and removes most concerns about service stability, while filebeat will still need to be installed separately on each server.

filebeat download page: https://www.elastic.co/cn/products/beats/filebeat

  • First, a look at the docker-compose.yml
# compose file format version
version: "3"
# services
services:
    logstash:
      depends_on:
        - elasticsearch
      volumes:
        - ./logstash.conf:/config/logstash.conf
      image: logstash:5.6.5    
      restart: always
      command: "/usr/share/logstash/bin/logstash -f /config/logstash.conf"
    filebeat:
      depends_on:
        - logstash
      image: docker.elastic.co/beats/filebeat:5.6.5
      volumes:
        - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
        - ./log:/tmp    # same host directory that the nginx service below writes its logs into
      restart: always
      privileged: true
    nginx:
      volumes:
       - ./log/:/var/log/nginx
       - ./nginx.conf:/etc/nginx/nginx.conf
      image: nginx
      ports:
        - 888:80
    elasticsearch:
      image: elasticsearch:5.6.5
      restart: always
      volumes:
       - ./jvm.options:/etc/elasticsearch/jvm.options
    kibana:
      ports:
        - 5601:5601
      environment:
        ELASTICSEARCH_URL: "http://elasticsearch:9200"
      depends_on:
        - elasticsearch
      image: kibana:5.6.5    
      restart: always
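
Once the configuration files described below are in place, the whole stack can be brought up with docker-compose (a rough usage sketch; adjust to your own environment). With the port mappings above, Kibana should be reachable at http://localhost:5601 and the test nginx at http://localhost:888.

docker-compose up -d
docker-compose ps    # check that all five containers are running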

In addition, prepare a few configuration files:

  • filebeat.yml

filebeat's configuration file. It defines which log files to read and where to send the output. In this test we read nginx's access.log and send it to Logstash; Logstash then parses the log lines and submits them to Elasticsearch for storage.

  • jvm.options

The JVM configuration file for Elasticsearch. I found this file online and have not yet dug into what each setting does.

  • logstash.conf

Logstash's configuration file. This is where the logs submitted by filebeat are parsed, and where the parsed data is written into the specified index in Elasticsearch.

  • nginx.conf

The configuration file for nginx running in Docker. To keep the test self-contained, a separate nginx container is started, with the nginx log directory mounted outside the container.

nginx.conf

user  nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" $request_time "$upstream_addr" $upstream_response_time '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;
    #gzip  on;
    include /etc/nginx/conf.d/*.conf;
}

In the nginx configuration, the main thing to note is the log_format definition: the Logstash configuration has to parse the log records based on this exact format.
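
For reference, a line written with this main format looks roughly like the following (a made-up sample, not taken from a real log):

172.18.0.1 [25/Dec/2018:10:20:30 +0000] "GET /index.html HTTP/1.1" 200 612 "-" 0.000 "-" - "curl/7.54.0" "-"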

logstash.conf

# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.

input {
  beats {
    port => 5044
  }
}
filter {
  # access log lines are tagged "nginx-access" by filebeat (see filebeat.yml below)
  if 'nginx-access' in [tags] {
    grok {
      match =>{ 
          "message" => "^%{IPV4:remote_addr} \[%{HTTPDATE:timestamp}\] \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{INT:status} %{INT:body_bytes_sent} \"%{NOTSPACE:http_referer}\" %{NUMBER:request_time} \"%{DATA:upstream_addr}%{DATA:upstream_port}\" %{DATA:upstream_response_time} \"%{DATA:http_user_agent}\" \"%{NOTSPACE:http_x_forwarded_for}\""
        }
      remove_field => ["message"]
    }
  }

  # error log lines are tagged "nginx-error"
  if 'nginx-error' in [tags] {
    grok {
      match =>{ 
          "message" => "^%{SYSLOGPROG:ues} %{TIME:time} \[error\] %{DATA:code_code}: %{DATA:code_num} %{DATA:operate} \"%{DATA:file}\" %{DATA:err_desc} %{UNIXPATH:decription} %{PROG:prog}\", host\: \"%{DATA:host}\", referrer: \"%{DATA:referrer}\""
        }
      remove_field => ["message"]
    }
  }

}

output {
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      # the index name comes from the fields.log_index value set in filebeat.yml
      index => '%{[fields][log_index]}'
    }
}

The Logstash configuration consists of three parts: input, filter, and output. The filter part is where the submitted data is filtered and parsed.

The nginx log lines must match Logstash's grok pattern, otherwise they cannot be split into fields correctly (by default, grok tags events that fail to match with _grokparsefailure).

Logstash grok patterns can be debugged online at: http://grokdebug.herokuapp.com/
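
Pasting the sample access log line from the nginx section above into that debugger together with the access-log grok pattern should yield fields roughly like these (only the leading captures are shown; the exact output depends on the pattern):

remote_addr:      172.18.0.1
timestamp:        25/Dec/2018:10:20:30 +0000
verb:             GET
request:          /index.html
httpversion:      1.1
status:           200
body_bytes_sent:  612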

In this test, to try collecting different logs on the same server and storing them in different Elasticsearch indices, I collect error.log in addition to access.log, and two corresponding indices are created in Elasticsearch to hold them. As you can see in the Logstash configuration, the index in output.elasticsearch is not a hard-coded index name; it takes the log_index value from the fields that filebeat sends along with each event.
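
As a quick sanity check, once a few events have been shipped you can list the indices from Kibana's Dev Tools console; with the configuration above, indices named nginx-access and nginx-error should show up:

GET _cat/indices?v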

filebeat.yml

filebeat:
  prospectors:
    - input_type: log
      paths:  # these paths are inside the container
          - /tmp/access.log
      tags: ["nginx-access"]
      fields: 
        log_index: nginx-access

    - input_type: log
      paths:
        - /tmp/error.log
      tags: ["nginx-error"]
      fields:
        log_index: nginx-error

  registry_file: /usr/share/filebeat/data/registry/registry  # records how far each log has been read; if the container restarts, filebeat resumes from the recorded position

output:
  logstash:  
    hosts: ["logstash:5044"] 

Analyzing nginx access logs with Kibana: https://www.centos.bz/2018/04/%E4%BD%BF%E7%94%A8kibana%E5%88%86%E6%9E%90nginx%E8%AE%BF%E9%97%AE%E6%97%A5%E5%BF%97/

jvm.options

## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms512m
-Xmx512m

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# force the server VM (remove on 32-bit client JVMs)
-server

# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}

## GC logging

#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime

# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}

# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M

# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true