elasticsearch.

flexible and powerful open source, distributed real-time
search and analytics engine for the cloud

by @timglabisch

Thanks for coming.
I am Tim Glabisch and I work for AnyMotion Graphics.
I have found more than once that searching with a relational database is no fun:
- performance
- a lot of effort
- missing features: full-text search, facets, etc.
fork me on github!
almost everything is a practical example!

is for search

has brilliant defaults

is flexible

Real time data, Real time analytics, Distributed, High availability, Multi-tenancy, Full text search, Document oriented, Conflict management, ~~Schema free~~, RESTful API, Per-operation persistence, Apache 2 Open Source License, Built on top of Apache Lucene™

lots of APIs...

Search > Query, Highlighting, Suggest, Facets, Min Score, Scroll, Explain, Version, ...

Indices > Aliases, Analyze, Create Index, Delete Index, Open/Close Index, Get Settings, Get Mapping, Put Mapping, Delete Mapping, Refresh, Optimize, Flush, Snapshot, Update Settings, Templates, Warmers, Stats, Status, Segments, Clear Cache, Indices Exists, Types Exists, ...

Cluster > Health, State, Update Settings, Nodes Info, Nodes Stats, Nodes Shutdown, Nodes Hot Threads, Cluster reroute

Complex Queries > match, multi_match, bool, boosting, ids, custom_score, custom_boost_factor, constant_score, dis_max, field, filtered, flt, flt_field, fuzzy, has_child, has_parent, match_all, mlt, mlt_field, prefix, query_string, range, span_first, span_near, span_not, span_or, span_term, term, terms, top_children, wildcard, nested, custom_filters_score, indices, text, geo_shape, and, bool, exists, ids, limit, type, geo_bbox, geo_distance, geo_distance_range, geo_polygon, geo_shape, has_child, has_parent, match_all, missing, not, numeric_range, or, prefix, query, range, script, term, terms, nested

Routings > ...

Mappings > ...

Modules > ...

River > ...

...

elasticsearch

in action...

install

1. Download and unzip the latest Elasticsearch distribution.

2. Run bin/elasticsearch -f on Unix.

bin/elasticsearch -f

org/elasticsearch/node/internal/InternalNode.java#L136

say hello.

{
    "user" : "timglabisch",
    "message" : "#elasticsearch is #awesome",
    "tags" : ["elasticsearch", "awesome"]
}
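
The document above can be sent to a node over the REST API. What follows is a minimal sketch in Python, not the exact call used in the talk. It assumes a local node on the default HTTP port 9200 and reuses the `twitter` index and `tweet` type that appear in the routing example of these slides. The document id `1` and the helper names `build_index_request` and `send` are made up for illustration.

```python
import json
import urllib.request

# Assumption: a local node listening on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


tweet = {
    "user": "timglabisch",
    "message": "#elasticsearch is #awesome",
    "tags": ["elasticsearch", "awesome"],
}
# Index name, type name and id are illustrative choices.
request = build_index_request("twitter", "tweet", 1, tweet)
```

With a node running, `send(request)` would perform the actual call; building the request separately makes it easy to inspect the URL and body first.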

src/main/java/org/elasticsearch/rest/action/search/RestSearchAction.java#L57

{
    "query" : {
        "term": {
            "user" : "timglabisch"
        }
    }
}

{
    "query" : {
        "term": {
            "user" : "timglabisch",
            "message" : "#elasticsearch"
        }
    }
}

{
    "query" : {
        "term": {
            "message" : "#elasticsearch"
        }
    }
}

be aware

- term queries use OR by default!

- know the tokenizer, there is no #elasticsearch token!
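
Why there is no `#elasticsearch` token can be illustrated with a very rough stand-in for the default analysis step. The sketch below is an approximation in Python, not the real Lucene tokenizer: it only keeps runs of word characters and lowercases them, and it ignores stop words and every other detail. The function name `rough_standard_tokens` is hypothetical.

```python
import re


def rough_standard_tokens(text):
    """Rough approximation only: keep runs of word characters
    (letters, digits, underscore), drop punctuation such as '#', lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


tokens = rough_standard_tokens("#elasticsearch is #awesome")
print(tokens)  # ['elasticsearch', 'is', 'awesome']

# A term query is not analyzed, so it only matches tokens exactly as indexed.
print("#elasticsearch" in tokens)  # False
print("elasticsearch" in tokens)   # True
```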

{
    "query" : {
        "bool": {
            "must": [
                {
                    "term": {"message" : "elasticsearch" }
                },
                {
                    "term": {"user" : "timglabisch" }
                }
            ]
        }
    }
}

{
    "query" : {
        "bool": {
            "must": [
                {
                    "term": {"message" : "elasticsearch" }
                },
                {
                    "bool": {
                        "must" : {
                            "term": {"user" : "timglabisch" }
                        }
                    }
                }
            ]
        }
    }
}

- JSON is awesome for building queries.

- easy to write complex queries

- mix different query types: match, multi_match, bool, boosting, ids, custom_score, custom_boost_factor, constant_score, dis_max, field, filtered, flt, flt_field, fuzzy, has_child, has_parent, match_all, mlt, mlt_field, prefix, query_string, range, span_first, span_near, span_not, span_or, span_term, term, terms, top_children, wildcard, nested, custom_filters_score, indices, text, geo_shape
look at the manual
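
Because queries are plain JSON, they can be composed from small functions in any language. A minimal sketch in Python, assuming nothing beyond the `term` and `bool`/`must` shapes shown on the previous slides; the helper names `term` and `bool_must` are made up for this example.

```python
import json


def term(field, value):
    """One exact-match clause on a single field."""
    return {"term": {field: value}}


def bool_must(*clauses):
    """Combine clauses so that all of them have to match."""
    return {"bool": {"must": list(clauses)}}


# Nested bool query, same idea as the nested example on the slides.
query = {
    "query": bool_must(
        term("message", "elasticsearch"),
        bool_must(term("user", "timglabisch")),
    )
}
print(json.dumps(query, indent=4))
```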

facets

{
    "size": 1,
    "facets" : {
        "i_am_just_an_identifier": {
            "terms": { "field": "source" }
        }
    }
}
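
What a terms facet computes can be pictured as a count per distinct value of the field. The Python sketch below only illustrates that idea on an in-memory list; it says nothing about how Elasticsearch implements facets. The sample documents and the function name `terms_facet` are made up, only the field name `source` comes from the slide.

```python
from collections import Counter


def terms_facet(documents, field, size=10):
    """Count how often each value of `field` occurs, most frequent first."""
    counts = Counter()
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue
        # A multi-valued field contributes one count per value.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return [{"term": t, "count": c} for t, c in counts.most_common(size)]


# Made-up sample data.
docs = [
    {"source": "web"},
    {"source": "web"},
    {"source": "mobile"},
]
print(terms_facet(docs, "source"))
# [{'term': 'web', 'count': 2}, {'term': 'mobile', 'count': 1}]
```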

multiple facets

{
    "size": 1,
    "facets" : {
        "i_am_just_an_identifier": {
            "terms": { "field": "text" }
        },
        "i_am_just_another_identifier": {
            "terms": { "field": "source" }
        }
    }
}

different search phases.

src/main/java/org/elasticsearch/search/query/QueryPhase.java#L128

a breakpoint here is great for debugging the generated Lucene query.

combine query and facets.

{
    "size": 0,
    "query" : {
        "bool": {
            "must": [
                {
                    "term": {"text" : "fuck" }
                }
            ]
        }
    },
    "facets" : {
        "i_am_just_another_identifier": {
            "terms": { "field": "source" }
        },
        "i_am_just_an_identifier": {
            "terms": { "field": "text" }
        }
    }
}

Filters

Sub-queries can be filtered independently of each other.

{
   "size":16,
   "facets":{
      "provider":{
         "terms":{
            "field":"provider",
            "size":35
         },
         "facet_filter":{
            "bool":{
               "must":[
                  {
                     "term":{
                        "level1":"Motor"
                     }
                  },
                  {
                     "term":{
                        "cars":"vw"
                     }
                  }
               ]
            }
         }
      },
      "cars":{
         "terms":{
            "field":"cars",
            "size":35
         },
         "facet_filter":{
            "bool":{
               "must":[
                  {
                     "term":{
                        "level1":"Motor"
                     }
                  },
                  {
                     "term":{
                        "provider":"Top-Angebot"
                     }
                  }
               ]
            }
         }
      },
      // ....
   },
   "query":{
      // yep, we have an empty query ...
   },
   "filter":{
      "bool":{
         "must":[
            {
               "term":{
                  "level1":"Motor"
               }
            },
            {
               "term":{
                  "provider":"Top-Angebot"
               }
            },
            {
               "term":{
                  "cars":"vw"
               }
            }
         ]
      }
   }
}

filters are awesomely flexible, cacheable, and reduce the number of requests.
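
In the request above, each facet is filtered by every active selection except the one on its own field, while the top-level filter applies all selections to the hits. A sketch in Python of how such a body could be generated, assuming selections are simple field/value pairs; the function and parameter names are hypothetical.

```python
def term_filter(field, value):
    """One exact-match filter on a single field."""
    return {"term": {field: value}}


def bool_must(filters):
    """All given filters have to match."""
    return {"bool": {"must": filters}}


def build_faceted_search(selections, facet_fields, size=16, facet_size=35):
    """selections maps field -> selected value.
    Each facet ignores the selection on its own field, so its counts
    still show the alternatives for that field."""
    facets = {}
    for field in facet_fields:
        others = [term_filter(f, v) for f, v in selections.items() if f != field]
        facet = {"terms": {"field": field, "size": facet_size}}
        if others:
            facet["facet_filter"] = bool_must(others)
        facets[field] = facet
    body = {"size": size, "facets": facets}
    if selections:
        # The hits themselves are restricted by every selection.
        body["filter"] = bool_must(
            [term_filter(f, v) for f, v in selections.items()]
        )
    return body


selections = {"level1": "Motor", "provider": "Top-Angebot", "cars": "vw"}
body = build_faceted_search(selections, ["provider", "cars"])
```

One request then returns the hits plus the counts for every facet, which is where the saving in round trips comes from.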

Shards

routing

routing by

- id (default)
- parameter ( /twitter/tweet?routing=timglabisch )
- custom field / path
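
The idea behind routing is that a routing value is hashed and mapped onto one of the shards, so documents with the same routing value end up on the same shard. The Python sketch below shows only that principle. It uses CRC32 as a stand-in hash, which is an assumption for illustration and not the hash function Elasticsearch uses; the shard count of 5 and the name `pick_shard` are also made up.

```python
import zlib


def pick_shard(routing_value, number_of_shards):
    """Map a routing value to a shard number in [0, number_of_shards)."""
    # Stand-in hash: CRC32 is chosen only because it is deterministic.
    # It is NOT the hash function Elasticsearch uses.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_shards


shards = 5  # assumed shard count for the example
first = pick_shard("timglabisch", shards)
second = pick_shard("timglabisch", shards)
print(first == second)  # True: same routing value, same shard
```

A search that passes the same routing value only has to ask that one shard instead of all of them.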

Mappings

Elasticsearch isn't schemaless.

Index an int

{
    "some_value": 1
}

try to index a string into the same field ...

{
    "some_value": "i am just a string"
}

the first value indexed for a key determines the mapped type of that field.

sounds ugly, but it makes getting started easy.
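
The behaviour can be modelled with a toy index that remembers the type of the first value it sees for each field. This is a deliberately simplified Python sketch and not how Elasticsearch works internally: the real engine can coerce some values, while this toy rejects every type mismatch. The class name `ToyIndex` is made up.

```python
class ToyIndex:
    """Toy model of dynamic mapping: the first value fixes the field type."""

    def __init__(self):
        self.mapping = {}
        self.documents = []

    def index(self, document):
        new_fields = {}
        for field, value in document.items():
            expected = self.mapping.get(field)
            if expected is None:
                # First time this field is seen: remember its type.
                new_fields[field] = type(value)
            elif not isinstance(value, expected):
                raise TypeError(
                    f"field {field!r} is mapped as {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        # Only change the mapping once the whole document was accepted.
        self.mapping.update(new_fields)
        self.documents.append(document)


toy = ToyIndex()
toy.index({"some_value": 1})
try:
    toy.index({"some_value": "i am just a string"})
except TypeError as error:
    print(error)  # field 'some_value' is mapped as int, got str
```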

add the mapping

{
    "schemaless_proof_2" : {
        "properties" : {
            "age" : {
                "type" : "integer"
            },
            "name" : {
                "type" : "string",
                "search_analyzer": "keyword",
                "index_analyzer" : "keyword"
            }
        }
    }
}

get the mapping

Proof

{
    "age": 4, "name" : 10
}

{
    "age": 22, "name" : "Tim Glabisch"
}

remapping sucks

tokenizer

by mata.gia.rwth-aachen.de

there are a bunch of tokenizers: Edge NGram, Keyword, Letter, Lowercase, NGram, Standard, Whitespace, Pattern, UAX URL Email, Path Hierarchy

this is a text

there are a bunch of tokenfilters: Standard, ASCII Folding, Length, Lowercase, NGram, Edge NGram, Porter Stem, Shingle, Stop, Word Delimiter, Stemmer, and 15 more...
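
A tokenizer splits the text into tokens and token filters then rewrite that token stream. The Python sketch below imitates a whitespace tokenizer followed by a lowercase filter and an edge n-gram filter on the sample text from the slide. It is a simplified illustration and not the Lucene implementation; the function names and the `min_gram`/`max_gram` defaults are assumptions made for this example.

```python
def whitespace_tokenizer(text):
    """Split on whitespace only."""
    return text.split()


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [token.lower() for token in tokens]


def edge_ngram_filter(tokens, min_gram=1, max_gram=3):
    """Emit the prefixes of each token, from min_gram to max_gram characters."""
    result = []
    for token in tokens:
        longest = min(max_gram, len(token))
        for length in range(min_gram, longest + 1):
            result.append(token[:length])
    return result


tokens = lowercase_filter(whitespace_tokenizer("this is a text"))
print(tokens)  # ['this', 'is', 'a', 'text']
print(edge_ngram_filter(tokens))
# ['t', 'th', 'thi', 'i', 'is', 'a', 't', 'te', 'tex']
```

Prefix tokens like these are what makes search-as-you-type style matching possible with plain term lookups.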

In the future, computers weighing
less than 1.5 tons
are conceivable.

Popular Mechanics, US technology magazine, 1949

there is so much more...

GUIs

bigdesk
elasticsearch-head