[[phrase-matching]]
=== Phrase Matching

In the same way that the `match` query is the go-to query for standard
full-text search, the `match_phrase` query((("proximity matching", "phrase matching")))((("phrase matching")))((("match_phrase query"))) is the one you should reach for
when you want to find words that are near each other:

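[source,js]
--------------------------------------------------
GET /my_index/my_type/_search
{
    "query": {
        "match_phrase": {
            "title": "quick brown fox"
        }
    }
}
--------------------------------------------------
// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json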

Like the `match` query, the `match_phrase` query first analyzes the query
string to produce a list of terms. It then searches for all the terms, but
keeps only documents that contain _all_ of the search terms, in the same
_positions_ relative to each other. A query for the phrase `quick fox`
would not match any of our documents, because no document contains the word
`quick` immediately followed by `fox`.
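
To make the "same relative positions" rule concrete, here is a minimal Python sketch that works on plain token lists. It is not how Elasticsearch performs the check (it uses positions stored in the index instead); it assumes a toy analyzer that only lowercases and splits on whitespace, and the names `analyze` and `phrase_matches` are hypothetical.

```python
def analyze(text):
    """Toy analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def phrase_matches(doc_text, phrase):
    """True if the phrase's terms occur in the document consecutively
    and in the same order."""
    doc = analyze(doc_text)
    terms = analyze(phrase)
    if not terms:
        return False
    n = len(terms)
    # Slide a window the length of the phrase across the document tokens.
    return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))


doc = "The quick brown fox"
print(phrase_matches(doc, "quick brown fox"))  # True
print(phrase_matches(doc, "quick fox"))        # False: "brown" sits in between
```

Every term of `quick fox` is present in the document, yet the check fails because `fox` is not directly after `quick`.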

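[TIP]
==================================================

The `match_phrase` query can also be written as a `match` query with type
`phrase`:

[source,js]
--------------------------------------------------
"match": {
    "title": {
        "query": "quick brown fox",
        "type":  "phrase"
    }
}
--------------------------------------------------
// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json

==================================================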

==== Term Positions

When a string is analyzed, the analyzer returns not((("phrase matching", "term positions")))((("match_phrase query", "position of terms")))((("position-aware matching"))) only a list of terms, but
also the _position_, or order, of each term in the original string:

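[source,js]
--------------------------------------------------
GET /_analyze?analyzer=standard
Quick brown fox
--------------------------------------------------
// SENSE: 120_Proximity_Matching/05_Term_positions.json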

This returns the following:

[role="pagebreak-before"]
[source,js]
--------------------------------------------------
{
   "tokens": [
      {
         "token": "quick",
         "start_offset": 0,
         "end_offset": 5,
         "type": "<ALPHANUM>",
         "position": 1 <1>
      },
      {
         "token": "brown",
         "start_offset": 6,
         "end_offset": 11,
         "type": "<ALPHANUM>",
         "position": 2 <1>
      },
      {
         "token": "fox",
         "start_offset": 12,
         "end_offset": 15,
         "type": "<ALPHANUM>",
         "position": 3 <1>
      }
   ]
}
--------------------------------------------------
<1> The position of each term in the original string.
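
The following Python sketch shows where those numbers come from. It is a toy stand-in for the `standard` analyzer, assuming that splitting on runs of word characters and lowercasing is close enough for this ASCII input; the real analyzer uses Unicode text segmentation and also assigns a token `type`. The function name `analyze_with_positions` is hypothetical, and positions are numbered from 1 to mirror the output above.

```python
import re


def analyze_with_positions(text):
    """Toy analyzer (assumption): one token per run of word characters,
    lowercased, with character offsets and a 1-based position."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),  # index of first character
            "end_offset": match.end(),      # index just past last character
            "position": position,           # order of the term in the string
        })
    return tokens


for tok in analyze_with_positions("Quick brown fox"):
    print(tok)
```

Running it prints one dictionary per term, with the same tokens, offsets, and positions as the `_analyze` response.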

Positions can be stored in the inverted index, and position-aware queries like
the `match_phrase` query can use them to match only documents that contain
all the words in exactly the order specified, with no words in between.
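
As a rough picture of what "storing positions" means, the Python sketch below builds a toy in-memory positional index that maps each term to the documents containing it and the positions where it occurs. Lucene's real postings are far more compact; the helper names and the two sample documents are hypothetical, and the analyzer is the same lowercase-word-split assumption as before.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer (assumption): lowercased runs of word characters."""
    return [m.group().lower() for m in re.finditer(r"\w+", text)]


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]}, numbering positions from 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index


docs = {  # hypothetical sample documents
    1: "The quick brown fox",
    2: "Brown fox brown dog",
}
index = build_positional_index(docs)
print(index["brown"])  # {1: [3], 2: [1, 3]}
print(index["fox"])    # {1: [4], 2: [2]}
```

A plain inverted index would only record that `brown` occurs in documents 1 and 2; keeping the position lists is what makes phrase checks possible without re-reading the original text.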

==== What Is a Phrase

For a document to be considered a((("match_phrase query", "documents matching a phrase")))((("phrase matching", "criteria for matching documents"))) match for the phrase ``quick brown fox,'' the following must be true:

* `quick`, `brown`, and `fox` must all appear in the field.

* The position of `brown` must be `1` greater than the position of `quick`.

* The position of `fox` must be `2` greater than the position of `quick`.

If any of these conditions is not met, the document is not considered a match.
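
The three conditions translate almost directly into code. This Python sketch checks them for a single document, given a hypothetical `term_positions` mapping from each term in the field to its positions (the shape a positional index would supply). It illustrates the rule; it is not Lucene's implementation.

```python
def matches_phrase(term_positions, phrase_terms):
    """Check the phrase conditions against one document's term positions,
    e.g. {"quick": [2], "brown": [3], "fox": [4]}."""
    if not phrase_terms:
        return False
    # Condition 1: every phrase term must appear in the field.
    if any(term not in term_positions for term in phrase_terms):
        return False
    first, rest = phrase_terms[0], phrase_terms[1:]
    for start in term_positions[first]:
        # Remaining conditions: the i-th following term must sit exactly
        # i positions after the first ("brown" at +1, "fox" at +2).
        if all(start + i in term_positions[term]
               for i, term in enumerate(rest, start=1)):
            return True
    return False


phrase = ["quick", "brown", "fox"]
print(matches_phrase({"the": [1], "quick": [2], "brown": [3], "fox": [4]}, phrase))  # True
print(matches_phrase({"quick": [1], "fox": [2], "brown": [3]}, phrase))              # False
```

The second document contains all three terms, but `brown` is 2 positions after `quick` rather than 1, so it fails.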

[TIP]
==================================================

Internally, the `match_phrase` query uses the low-level `span` query family to
do position-aware matching. ((("match_phrase query", "use of span queries for position-aware matching")))((("span queries")))Span queries are term-level queries, so they have
no analysis phase; they search for the exact term specified.

Thankfully, most people never need to use the `span` queries directly, as the
`match_phrase` query is usually good enough. However, certain specialized
domains, like patent search, use these low-level queries to perform very
specific, carefully constructed positional searches.

==================================================
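
For readers curious what a hand-built positional query looks like, the Python sketch below assembles a `span_near` request body in which one `span_term` clause per term, `slop` set to `0`, and `in_order` set to `true` require the terms to be adjacent and in order. The helper `span_phrase_query` is hypothetical and only builds the JSON body; because span queries skip analysis, it assumes the caller passes terms already in their indexed (here, lowercased) form.

```python
import json


def span_phrase_query(field, terms):
    """Build a span_near request body requiring `terms` to appear adjacent
    and in the given order. Terms are not analyzed, so pass them in their
    indexed form (e.g. lowercase)."""
    return {
        "query": {
            "span_near": {
                "clauses": [{"span_term": {field: term}} for term in terms],
                "slop": 0,         # no extra positions allowed between terms
                "in_order": True,  # terms must keep the given order
            }
        }
    }


print(json.dumps(span_phrase_query("title", ["quick", "brown", "fox"]), indent=2))
```

Passing `"Quick"` instead of `"quick"` would look up the unanalyzed term and most likely find nothing, which is exactly the pitfall that makes `match_phrase` the more convenient choice.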