Adding site-wide relevance search to a site

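I recently learned Elasticsearch and found it simple to use and well suited to content retrieval. This site, for example, can use Elasticsearch to provide fast site-wide search, with results sorted by relevance: the more relevant a post is, the higher it ranks.

Code first: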

class ElasticPostsService
{
    protected static $client;
    const INDEX = 'blog';
    const TYPE = 'posts';

    private static function getClient()
    {
        if (self::$client) {
            return self::$client;
        } else {
            $config = [
                'localhost:9200'
            ];
            return self::$client = \Elasticsearch\ClientBuilder::create()->setHosts($config)->build();
        }
    }

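    // Match the keyword against title, content, description and keywords;
    // hits come back sorted by relevance score, highest first
    public static function search($keyword)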
    {
        $param = [
            'index' => self::INDEX,
            'type' => self::TYPE,
            'body' => [
                'query' => [
                    'bool' => [
                        'should' => [
                            ['match' => ['title' => $keyword]],
                            ['match' => ['content' => $keyword]],
                            ['match' => ['description' => $keyword]],
                            ['match' => ['keywords' => $keyword]]
                        ]
                    ]
                ]
            ]
        ];
        try {
            $result = self::getClient()->search($param);
            $data = [];
            foreach ($result['hits']['hits'] as $item) {
                $data[] = $item['_source'];
            }
            return $data;
        } catch (Exception $ex) {
            return false;
        }
    }

    // Index a new document (create fails if the ID already exists)
    public static function putData($id, $data)
    {
        $param = [
            'index' => self::INDEX,
            'type' => self::TYPE,
            'id' => $id,
            'body' => $data
        ];
        return self::getClient()->create($param);
    }

    // Partially update an existing document
    public static function updateData($id, $data)
    {
        $param = [
            'index' => self::INDEX,
            'type' => self::TYPE,
            'id' => $id,
            'body' => [
                'doc' => $data
            ]
        ];
        return self::getClient()->update($param);
    }

    // Fetch a document by ID; returns false if it does not exist
    public static function getById($id)
    {
        $param = [
            'index' => self::INDEX,
            'type' => self::TYPE,
            'id' => $id
        ];
        try {
            $result = self::getClient()->get($param);
            return $result['_source'];
        } catch (\Elasticsearch\Common\Exceptions\Missing404Exception $ex) {
            // Document not found; any other exception propagates to the caller
            return false;
        }
    }

    // Full sync: push every post from MySQL into Elasticsearch (runs synchronously)
    public static function asyncAll()
    {
        $data = PostsModel::getAll([]);
        $result = [];
        foreach ($data as $item) {
            if (self::getById($item['id'])) {
                $result[] = self::updateData($item['id'], $item);
            } else {
                $result[] = self::putData($item['id'], $item);
            }
        }
        return $result;
    }

}
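  • First, call asyncAll() to sync all existing posts from MySQL into Elasticsearch. This is fine while the number of posts is still small; with a very large data set you would need a different approach.
  • After that, whenever a post is edited, calling updateData() to update its document in Elasticsearch is enough.
  • When a new post is added, call putData() to add it to Elasticsearch.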
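
To check that the sync worked, the same query can be tried by hand from the command line. This is only a minimal sketch, assuming Elasticsearch 6 is reachable at localhost:9200 and `curl` is installed. The helper names `build_search_body` and `search_posts` are made up for illustration, and the keyword is assumed to contain no double quotes or backslashes, since no JSON escaping is done.

```shell
# Hypothetical helpers for trying the query by hand; not part of the PHP service.

ES_URL="${ES_URL:-http://localhost:9200}"   # assumed address, same host as in getClient()

# Print the same bool/should query body that search() sends.
# Assumption: the keyword contains no double quotes or backslashes.
build_search_body() {
    keyword="$1"
    printf '{"query":{"bool":{"should":['
    printf '{"match":{"title":"%s"}},' "$keyword"
    printf '{"match":{"content":"%s"}},' "$keyword"
    printf '{"match":{"description":"%s"}},' "$keyword"
    printf '{"match":{"keywords":"%s"}}' "$keyword"
    printf ']}}}\n'
}

# Run the query against the blog index and posts type used by the service.
search_posts() {
    curl -s -H 'Content-Type: application/json' \
        "$ES_URL/blog/posts/_search?pretty" \
        -d "$(build_search_body "$1")"
}

# Usage:
#   curl -s "$ES_URL/blog/_count?pretty"   # how many documents were synced
#   search_posts elasticsearch             # hits are listed by descending _score
```

Comparing the `_count` result with the number of posts in MySQL is a quick way to see whether asyncAll() picked up everything.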

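Setting up Elasticsearch on the server

The full-text search above works in local testing and now needs to go to the server. Elasticsearch is not installed there yet, so the question is how to make it available on the server.

  • Option 1: buy a hosted Elasticsearch service, e.g. Alibaba Cloud
  • Option 2: install Elasticsearch directly on the server
  • Option 3: run Elasticsearch in a docker container

Option 1: after looking at Alibaba Cloud's pricing I ruled it out. At a bit over 1 yuan per hour, that is more than 20 yuan a day, which is not worth it for a small non-profit site like mine.

Option 2: I would rather not. With the whole environment installed directly on the server, moving to another server later becomes a hassle.

Option 3: this is what I will use here.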

Take an official Dockerfile for Elasticsearch 6:

# Elasticsearch 6.7.2

# This image re-bundles the Docker image from the upstream provider, Elastic.
FROM docker.elastic.co/elasticsearch/elasticsearch:6.7.2@sha256:7003e76b007510d54daf1f4fec1d032fca2b12bd2f9a7b1d12a2e99220590996

# The upstream image was built by:
#   https://github.com/elastic/dockerfiles/tree/v6.7.2/elasticsearch

# For a full list of supported images and tags visit https://www.docker.elastic.co

# For Elasticsearch documentation visit https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html

# See https://github.com/docker-library/official-images/pull/4916 for more details.

Put it in the project at /docker/elasticsearch/Dockerfile

Then configure docker-compose.yml in the project root:

version: '3'

networks:
  host:

services:
  redis:
    build: docker/redis
    networks:
      - host
    ports:
      - "6789:6379"
    container_name: redis
    volumes:
      - $PWD/docker/redis/conf/redis.conf:/etc/redis/redis.conf
      - $PWD/docker/data:/data
    command: redis-server /etc/redis/redis.conf
  elastic:
    build: docker/elasticsearch
    networks:
      - host
    ports:
      - "9200:9200"
    container_name: elk
    command: elasticsearch

After deploying the code to the server, run docker-compose up -d

On the server an error kept Elasticsearch from starting:
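
Elasticsearch needs some time before it answers on port 9200, so it helps to poll it after bringing the containers up. Below is a rough sketch of one way to do that. `wait_for` is a hypothetical helper (not a docker or Elasticsearch command), and the example assumes `curl` is installed and that the port mapping is the "9200:9200" from the compose file above.

```shell
# wait_for RETRIES DELAY COMMAND...
# Re-run COMMAND until it succeeds; give up after RETRIES attempts.
wait_for() {
    retries="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$retries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0        # the command succeeded
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1                # still failing after all attempts
}

# Usage (show the container log if Elasticsearch never comes up):
#   docker-compose up -d
#   wait_for 30 2 curl -sf http://localhost:9200 || docker-compose logs elastic
```

If the container has exited, as it did in my case, the log printed by the fallback shows the reason.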

688467e0867ca) Copyright (c) 2019 Elasticsearch BV
elk        | [2019-05-16T07:11:02,568][INFO ][o.e.d.DiscoveryModule    ] [dJ1nASz] using discovery type [zen] and host providers [settings]
elk        | [2019-05-16T07:11:03,706][INFO ][o.e.n.Node               ] [dJ1nASz] initialized
elk        | [2019-05-16T07:11:03,706][INFO ][o.e.n.Node               ] [dJ1nASz] starting ...
elk        | [2019-05-16T07:11:03,915][INFO ][o.e.t.TransportService   ] [dJ1nASz] publish_address {172.19.0.2:9300}, bound_addresses {[::]:9300}
elk        | [2019-05-16T07:11:03,934][INFO ][o.e.b.BootstrapChecks    ] [dJ1nASz] bound or publishing to a non-loopback address, enforcing bootstrap checks
elk        | ERROR: [1] bootstrap checks failed
elk        | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
elk        | [2019-05-16T07:11:03,948][INFO ][o.e.n.Node               ] [dJ1nASz] stopping ...
elk        | [2019-05-16T07:11:03,968][INFO ][o.e.n.Node               ] [dJ1nASz] stopped
elk        | [2019-05-16T07:11:03,968][INFO ][o.e.n.Node               ] [dJ1nASz] closing ...
elk        | [2019-05-16T07:11:03,994][INFO ][o.e.n.Node               ] [dJ1nASz] closed
elk exited with code 78
^CGracefully stopping... (press Ctrl+C again to force)
Stopping redis ... done
[root@iZwz910c5h9nqohlh4shj6Z blog]# sysctl -a|grep vm.max_map_count
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.br-86f239178d1f.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.docker0.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
vm.max_map_count = 65530
[root@iZwz910c5h9nqohlh4shj6Z blog]# sysctl -w vm.max_map_count=262144
vm.max_map_count = 262144
[root@iZwz910c5h9nqohlh4shj6Z blog]# sysctl -a|grep vm.max_map_count
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.br-86f239178d1f.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.docker0.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
vm.max_map_count = 262144

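Fix, for reference:

Errors encountered when starting Elasticsearch

The error means that the kernel limit on the number of memory-mapped areas a process may have (vm.max_map_count) is too low: it is 65530, and Elasticsearch requires at least 262144.

Solution:

Switch to the root user

Run: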

sysctl -w vm.max_map_count=262144

Check the result:

sysctl -a|grep vm.max_map_count

Output:

vm.max_map_count = 262144

A value set this way is lost when the virtual machine reboots, so:

Permanent fix:

Add this line at the end of /etc/sysctl.conf:

vm.max_map_count=262144

This makes the change permanent.
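
Both steps can be scripted so that running them twice does not add the line twice. This is a minimal sketch, not a definitive setup script: `persist_sysctl` is a hypothetical helper name, and it only checks for an exact copy of the line, so an existing entry with a different value would be left in place.

```shell
# persist_sysctl LINE [FILE]
# Append LINE to FILE (default /etc/sysctl.conf) unless that exact line is already there.
persist_sysctl() {
    line="$1"
    file="${2:-/etc/sysctl.conf}"
    # -x: match the whole line, -F: treat LINE as a fixed string, -q: no output
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   persist_sysctl 'vm.max_map_count=262144'
#   sysctl -p        # reload /etc/sysctl.conf now instead of waiting for a reboot
```

The optional second argument exists mainly so the function can be tried on a scratch file before touching /etc/sysctl.conf.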