
This document is designed to be viewed as a reveal.js slide show presentation. The contents of the presentation (and the speaker notes) are reproduced below.


Hidden Gems of Apache Solr

Lucene/Solr Revolution - 2016-10-13


https://home.apache.org/~hossman/rev2016

https://twitter.com/_hossman/

https://www.lucidworks.com/


Query Parsers

Usual Suspects

  • lucene
    • Canonical query syntax
  • dismax
    • Simplified syntax building Disjunction across configured fields
  • edismax
    • Hybrid: Full Lucene syntax with configurable Disjunction field aliases

The Godfather

          q = actor:"Ben Affleck" keywords:Boston
    defType = lucene

{!dismax} Steel

          q = ben affleck boston
         qf = title actor writer director keywords
         pf = title actor writer director
    defType = dismax

Mad {!edismax}

          q = actor:"ben affleck" boston
         qf = title actor writer director keywords
         pf = title actor writer director
        pf2 = title actor writer director
    defType = edismax

The Road Warrior

          q = person:"ben affleck" boston
f.person.qf = actor writer director
         qf = title person keywords
         pf = title person
        pf2 = title person
    defType = edismax

Beyond Thunderdome

          q = person:"Matt Damon" Geronimo:An American Legend 
         uf = person title
f.person.qf = actor writer director
         qf = title person keywords
         pf = title person
        pf2 = title person
    defType = edismax

The Dirty Two Dozen

Syntax

  • simple
  • maxscore
  • complexphrase
  • surround
  • xmlparser
  • func

Block Join

  • parent
  • child

No Syntax

  • prefix
  • field
  • raw
  • term
  • terms

Common Values

  • mlt
  • join
  • graph

Wrappers

  • boost
  • query
  • switch
  • rerank

Filtering

  • geofilt
  • bbox
  • collapse
  • frange

{!field} of Dreams

q = {!field f='title'}HELL OR HIGH WATER

{!terms} of Endearment

# Filter on both terms (AND)
facet.field = genre
         fq = {!term f='genre'}drama
         fq = {!term f='genre'}comedy

# Filter on either term (OR)
facet.field = genre
         fq = {!terms f='genre'}drama,comedy
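The AND version repeats the `fq` key, so a client has to send a list of pairs rather than a dict. Below is a minimal standard-library Python sketch that builds both request variants. It does not talk to Solr, and the helper names `and_filters` and `or_filter` are hypothetical. The comma in the `{!terms}` value is that parser's default separator.

```python
from urllib.parse import urlencode

def and_filters(field, values):
    # One {!term} filter per value; Solr intersects separate fq params (AND).
    return [("fq", "{!term f='%s'}%s" % (field, v)) for v in values]

def or_filter(field, values):
    # A single {!terms} filter; values are comma-separated by default (OR).
    return [("fq", "{!terms f='%s'}%s" % (field, ",".join(values)))]

base = [("facet.field", "genre")]
# A list of pairs (not a dict) is required because 'fq' repeats.
print(urlencode(base + and_filters("genre", ["drama", "comedy"])))
print(urlencode(base + or_filter("genre", ["drama", "comedy"])))
```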

More Like Them Apples

 q = {!mlt maxdf=1000}good-will-hunting-1997
qf = actor writer director genre keywords

Some Kind Of Wonderful

   q = {!boost b=$func v=$qq}
  qq = actor:"Matt Damon" keywords:Boston
func = prod(rating, log(popularity))
Final score = native Lucene score of qq, multiplied by the boost function (the product of the document's rating and the log of its popularity):
$$\mathrm{score}(d) = \mathrm{score}_{qq}(d) \times \mathrm{rating}(d) \times \log_{10}\big(\mathrm{popularity}(d)\big)$$
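The arithmetic behind `{!boost}` can be checked by hand. Here is a small Python sketch of the same multiplication. The document values are hypothetical, and it relies on Solr's `log()` being base 10.

```python
import math

def boosted_score(lucene_score, rating, popularity):
    # {!boost b=$func v=$qq} multiplies the score of qq by the value of func,
    # here func = prod(rating, log(popularity)); Solr's log() is base 10.
    return lucene_score * (rating * math.log10(popularity))

# Hypothetical document: native score 2.0, rating 4.0, popularity 1000.
print(boosted_score(2.0, 4.0, 1000))
```

The log has a side effect: a popularity of exactly 1 zeroes the score, and a popularity below 1 makes it negative. One possible guard is to shift the input, for example `log(sum(popularity,1))`.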

The {!lucene} Strikes Back

     q = +actor:"Matt Damon"
         +keywords:Boston
    fq = awards:oscar
    fq = awards:golden-globe

I've got a bad feeling...

     q = +actor:"Matt Damon"
         +keywords:Boston
          (awards:oscar
           awards:golden-globe)^1000

Return of the Jedi

     q = +actor:"Matt Damon"
         +keywords:Boston
          (filter(awards:oscar)
           filter(awards:golden-globe))^1000

Inception

    qq = Matt Damon Boston
     q = (+{!edismax v=$qq}
           (filter(awards:oscar)
            filter(awards:golden-globe))^1000)
    qf = title actor writer director keywords
   pf2 = title actor writer director

A Fistful of ${dollars}

    qq = Matt Damon Boston
     q = (+{!edismax v=$qq}
           (filter(awards:oscar)
            filter(awards:golden-globe))^1000)
    qf = title ${people} keywords
   pf2 = title ${people}
people = actor writer director

For A Few ${dollars:More}

    qq = Matt Damon Boston
     q = (+{!edismax v=$qq}
           (filter(awards:oscar)
            filter(awards:golden-globe))^1000)
    qf = title ${people:actor} keywords
   pf2 = title ${people:actor}
people = actor writer director
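Solr resolves `${name}` and `${name:default}` from the other request parameters. The Solr Reference Guide describes this as parameter substitution, or macro expansion. The Python sketch below models only the simple case and is not Solr's implementation: nesting and escaping are ignored. Expanding an unset parameter with no default to an empty string is an assumption of this sketch.

```python
import re

# ${name} or ${name:default}. Simplified: no nesting, no escaping.
MACRO = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")

def expand(value, params):
    """Replace each ${name[:default]} in value using the other request params."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            return params[name]
        # Assumption for this sketch: unset param with no default expands to "".
        return default if default is not None else ""
    return MACRO.sub(substitute, value)

with_people = {"people": "actor writer director"}
print(expand("title ${people:actor} keywords", with_people))  # param wins
print(expand("title ${people:actor} keywords", {}))           # default used
```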

Request Parameters

Or...

The Good, The OK, And The Ugly

Remember Me

    qq = Matt Damon Boston
     q = (+{!edismax v=$qq}
           (filter(awards:oscar)
            filter(awards:golden-globe))^1000)
    qf = title ${people} keywords
   pf2 = title ${people}
people = actor writer director

Coyote Ugly

/select?qq=Matt%20Damon%20Boston&q=%2B%7B%21edismax%20v%3D%24qq%7D%20%28filter%28awards%3Aoscar%29%20filter%28awards%3Agolden-globe%29%29%5E1000&qf=title%20%24%7Bpeople%7D%20keywords&pf2=title%20%24%7Bpeople%7D&people=actor%20writer%20director
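The URL above is not hand-written. It is the Remember Me parameters run mechanically through URL encoding, minus the redundant outer parentheses around `q`. The following standard-library Python snippet reproduces it exactly. `quote_via=quote` is used so spaces become `%20` rather than `+`.

```python
from urllib.parse import urlencode, quote

params = [
    ("qq", "Matt Damon Boston"),
    ("q", "+{!edismax v=$qq} (filter(awards:oscar) filter(awards:golden-globe))^1000"),
    ("qf", "title ${people} keywords"),
    ("pf2", "title ${people}"),
    ("people", "actor writer director"),
]
# quote encodes spaces as %20; the default quote_plus would use '+'.
url = "/select?" + urlencode(params, quote_via=quote)
print(url)
```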

A Few Good Men

/find-by-person?qq=Matt+Damon+Boston
/find-by-person?qq=Ben+Affleck+Boston&people=director
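Once the heavy lifting lives in the handler config, the client only sends what varies per request. Below is a sketch of a hypothetical client-side helper, `find_by_person_url`, that produces the two URLs above.

```python
from urllib.parse import urlencode

def find_by_person_url(qq, people=None):
    # Hypothetical helper: only user-supplied params are sent; the rest of
    # the query lives in the /find-by-person handler configuration.
    params = {"qq": qq}
    if people is not None:
        params["people"] = people
    return "/find-by-person?" + urlencode(params)  # spaces encoded as '+'

print(find_by_person_url("Matt Damon Boston"))
print(find_by_person_url("Ben Affleck Boston", people="director"))
```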

Gunfight at the O.K. Corral

<requestHandler name="/find-by-person"
                class="solr.StandardRequestHandler" >
 <lst name="defaults">
  <str name="people">actor writer director</str>
 </lst>
 <lst name="invariants">
  <str name="q">(+{!edismax v=$qq}
                  (filter(awards:oscar)
                   filter(awards:golden-globe))^1000)
  </str>
  <str name="qf">title ${people} keywords</str>
  <str name="pf2">title ${people}</str>
 </lst>
</requestHandler>
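The handler config relies on a precedence rule. `defaults` apply only when the request leaves a param unset, while `invariants` override anything the request sends. Below is a simplified, single-valued Python model of that rule. Solr also supports an `appends` section and multi-valued params, and neither is modelled here.

```python
def effective_params(defaults, request, invariants):
    # Simplified model of handler params:
    #   defaults   -> used only when the request does not set the param
    #   invariants -> always win; the request cannot override them
    merged = dict(defaults)
    merged.update(request)
    merged.update(invariants)
    return merged

defaults = {"people": "actor writer director"}
invariants = {
    "q": "(+{!edismax v=$qq} (filter(awards:oscar) filter(awards:golden-globe))^1000)",
    "qf": "title ${people} keywords",
    "pf2": "title ${people}",
}
# A client trying to replace q is silently ignored; people is honoured.
request = {"qq": "Ben Affleck Boston", "people": "director", "q": "*:*"}
print(effective_params(defaults, request, invariants))
```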

Gunfight at the O.K. Corral

<requestHandler name="/find-by-person"
                class="solr.StandardRequestHandler" 
                initParams="boostAwards" />
<initParams name="boostAwards" >
 <lst name="defaults">
  <str name="people">actor writer director</str>
 </lst>
 <lst name="invariants">
  <str name="q">(+{!edismax v=$qq}
                  (filter(awards:oscar)
                   filter(awards:golden-globe))^1000)
  </str>
  <str name="qf">title ${people} keywords</str>
  <str name="pf2">title ${people}</str>
 </lst>
</initParams>

As Good As It Gets

<requestHandler name="/find-by-person"
                class="solr.StandardRequestHandler" 
                useParams="boostAwards,queryDefaults" />
curl http://localhost:8983/solr/films/config/params -d '
{ "set":{ "queryDefaults":{
    "people": "actor writer director" }},
  "set":{ "boostAwards":{
    "_invariants_": {
      "q": "(+{!edismax v=$qq} (filter(awards:oscar) filter(awards:golden-globe))^1000)",
      "qf": "title ${people} keywords",
      "pf2": "title ${people}" }}}
}' -H 'Content-type:application/json'
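The same Request Parameters API call can be built from Python, where `json.dumps` takes care of escaping. This is a sketch only. It assumes a single `set` command may carry several named param sets, because a Python dict cannot hold the repeated `set` key used in the curl body. The actual send is left commented out.

```python
import json
import urllib.request

BOOST_Q = "(+{!edismax v=$qq} (filter(awards:oscar) filter(awards:golden-globe))^1000)"

# Assumption: one "set" command may define several named param sets.
payload = {
    "set": {
        "queryDefaults": {"people": "actor writer director"},
        "boostAwards": {
            "_invariants_": {
                "q": BOOST_Q,
                "qf": "title ${people} keywords",
                "pf2": "title ${people}",
            }
        },
    }
}

def build_request(base_url="http://localhost:8983/solr/films"):
    # json.dumps escapes any newlines, so multi-line values stay valid JSON.
    return urllib.request.Request(
        base_url + "/config/params",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )

req = build_request()
# urllib.request.urlopen(req)  # uncomment to send it to a running Solr
```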

Personalized Scoring

Two Days of the Condor

Lots of talks on relevancy tuning today & tomorrow

Memento

  • Using aggregate user behavior to rank documents is useful, but...
  • Not all users are the same.
  • If you can remember anything about a user's past behavior, you can tune your results to them next time.

Total Recall

  • IF:
    1. We know which movies are popular with the user base in general
    2. We know which categories of movies this user tends to prefer (or avoid)
  • THEN:
    1. We can bias any search any user does in favor of popular movies
    2. We can bias any search this user does for/against movies in categories the user prefers/avoids

The Clone Wars

  qq = Matt Damon
   q = {!boost b=$func v=$qq}
func = popularity

A Beautiful Mind

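Final score = native Lucene score, multiplied by the (weighted) scores the document has against the genres the user cares most about:
$$\mathrm{score}(d) = \mathrm{score}_{qq}(d) \times \prod_{g \in G_u} \mathrm{score}(\mathrm{genre}{:}g,\ d)^{z_g}$$
where $G_u$ is the set of the user's most significant genres and $z_g$ is the user's weight for genre $g$.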
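Here is a plain Python sketch of that product, useful for sanity-checking weights before wiring them into a function query. Treating an unmatched genre as a neutral factor of 1.0 is an assumption of this sketch. A factor of 0.0 would zero the whole product for a positive weight and blow up for a negative one.

```python
def personalized_score(base_score, genre_scores, weights):
    """base_score times genre_score ** z for each of the user's genres.

    genre_scores: the document's score against each genre it matches.
    weights: the user's weight (e.g. Z-score) per genre.
    Assumption: an unmatched genre contributes a neutral factor of 1.0.
    """
    factor = 1.0
    for genre, z in weights.items():
        factor *= genre_scores.get(genre, 1.0) ** z
    return base_score * factor

weights = {"action": 1.0, "kids": -1.0}                    # hypothetical weights
print(personalized_score(2.0, {"action": 2.0}, weights))  # favoured genre
print(personalized_score(2.0, {"kids": 2.0}, weights))    # avoided genre
```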

Stand and Deliver

  qq = Matt Damon
   q = {!boost b=$func v=$qq}
func = prod(popularity,
            pow( query({!term f=genre v=$ga},1), $z_ga ),
            pow( query({!term f=genre v=$gb},1), $z_gb ))   # ",1": neutral factor for docs outside the genre

  ga = action   # The user's 2 most significant genres
z_ga = 1.48     # ... and their Z-scores
  gb = kids
z_gb = -1.33
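The slides do not say how the per-genre Z-scores are computed. The Python sketch below assumes you already have one per genre and that "most significant" means the largest absolute value. It then emits the `ga`/`z_ga`/`gb`/`z_gb` params in the shape the function above expects. The `drama` value is hypothetical input.

```python
import string

def genre_boost_params(z_scores, k=2):
    """Pick the k genres with the largest |z| and emit ga/z_ga, gb/z_gb, ... params."""
    top = sorted(z_scores.items(), key=lambda item: abs(item[1]), reverse=True)[:k]
    params = {}
    for letter, (genre, z) in zip(string.ascii_lowercase, top):
        params["g" + letter] = genre
        params["z_g" + letter] = str(z)
    return params

# Hypothetical per-user Z-scores; only action and kids come from the slide.
user_z = {"action": 1.48, "kids": -1.33, "drama": 0.2}
print(genre_boost_params(user_z))
```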

Q & A


Post-Credits
Bonus Scene!

The Trouble With Harry

        qq = Harry
         q = +{!edismax v=$qq}
        qf = title actor writer director keywords
      sort = score desc
  • Dirty Harry
  • The Escape Artist (Harry Anderson, Harry Caesar, Harry Cohn)
  • When Harry Met Sally...
  • How The West Was Won (Harry Dean Stanton, Harry Morgan)
  • Harry and the Hendersons

Deconstructing Harry

        qq = Harry
         q = +{!edismax v=$qq}
        qf = title actor writer director keywords
      sort = query($title_sort,0) desc, title asc
title_sort = {!field f=title v=$qq}
  • Dirty Harry
  • When Harry Met Sally...
  • Harry and the Hendersons
  • How The West Was Won (Harry Dean Stanton, Harry Morgan)
  • The Escape Artist (Harry Anderson, Harry Caesar, Harry Cohn)