TechForce: Making a simple full-text search with Golang and Redis

With Redis’s rich support for Sets, we built a fast, realtime full-text search.

Imagine you have blog entries such as these, defined as a Go struct:

<code>type Entry struct {
    Id          string
    Title       string
    Content     string
}
</code>

You want a user to be able to search for any word, or combination of words, in the Title or Content. Two sample entries are:

<code>Entry {
        Id:      "50344415ff3a8aa694000001",
        Title:   "Organizing Go code",
        Content: "Go code is organized differently from that of other languages. This post discusses",
}
Entry {
        Id:      "50344415ff3a8aa694000002",
        Title:   "Getting to know the Go community",
        Content: "Over the past couple of years Go has attracted a lot of users and contributors",
}
</code>

To let people search for any word in these two entries, we first index the text in Redis. We segment the title and content into keywords, use each keyword as the key of a Redis Set, and add the entry Ids as the members of that Set.

<code>redis 127.0.0.1:6379> keys *
1) "entries:keywords:go"
2) "entries:keywords:community"
3) ...

redis 127.0.0.1:6379> SMEMBERS entries:keywords:go
1) "entries:entity:50344415ff3a8aa694000001"
2) "entries:entity:50344415ff3a8aa694000002"
redis 127.0.0.1:6379> SMEMBERS entries:keywords:community
1) "entries:entity:50344415ff3a8aa694000002"
</code>
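
The indexing step above can be sketched in plain Go. This is only an in-memory illustration, not the package's actual code: the `segment` function is a naive, hypothetical stand-in for a real segmenter, and the `index` map stands in for Redis Sets. Only the key layout (`entries:keywords:…`, `entries:entity:…`) is taken from the output above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// segment lowercases the text and splits it on every rune that is not
// a letter or a digit. Hypothetical helper: a real segmenter may also
// drop stop words or handle non-space-delimited languages.
func segment(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// index maps a keyword key to a set of entity keys, mimicking Redis Sets.
type index map[string]map[string]struct{}

// add segments every text and records the entry id under each keyword.
func (idx index) add(id string, texts ...string) {
	for _, t := range texts {
		for _, w := range segment(t) {
			key := "entries:keywords:" + w
			if idx[key] == nil {
				idx[key] = make(map[string]struct{})
			}
			idx[key]["entries:entity:"+id] = struct{}{}
		}
	}
}

// members returns the sorted members of one set, similar to SMEMBERS
// (Redis itself does not guarantee any order).
func (idx index) members(key string) []string {
	out := make([]string, 0, len(idx[key]))
	for m := range idx[key] {
		out = append(out, m)
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := index{}
	idx.add("50344415ff3a8aa694000001", "Organizing Go code",
		"Go code is organized differently from that of other languages. This post discusses")
	idx.add("50344415ff3a8aa694000002", "Getting to know the Go community",
		"Over the past couple of years Go has attracted a lot of users and contributors")

	fmt.Println(idx.members("entries:keywords:go"))
	fmt.Println(idx.members("entries:keywords:community"))
}
```

With a real Redis client, each assignment into the map would presumably become an SADD call on the same key.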

Then, for example, if the user types the keywords go community, we first segment the query into ["go", "community"] and then run:

<code>redis 127.0.0.1:6379> SINTER entries:keywords:go entries:keywords:community
1) "entries:entity:50344415ff3a8aa694000002"
</code>

With the Redis Set intersection command SINTER, we get the ids of the entries that contain both of the keywords go and community.
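
To make the behaviour of SINTER concrete, here is a minimal in-memory sketch of a set intersection in Go. The `set` type and the `intersect` function are hypothetical names used only for illustration. In a real deployment Redis performs this step on the server.

```go
package main

import (
	"fmt"
	"sort"
)

// set is a minimal string set standing in for a Redis Set.
type set map[string]struct{}

// newSet builds a set from the given members.
func newSet(members ...string) set {
	s := make(set, len(members))
	for _, m := range members {
		s[m] = struct{}{}
	}
	return s
}

// intersect returns the sorted members that are present in every set.
// Like SINTER, a missing (nil) or empty set makes the result empty.
func intersect(sets ...set) []string {
	if len(sets) == 0 {
		return nil
	}
	// Walk the smallest set so that fewer lookups are needed.
	smallest := sets[0]
	for _, s := range sets[1:] {
		if len(s) < len(smallest) {
			smallest = s
		}
	}
	var out []string
	for m := range smallest {
		inAll := true
		for _, s := range sets {
			if _, ok := s[m]; !ok {
				inAll = false
				break
			}
		}
		if inAll {
			out = append(out, m)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	goSet := newSet(
		"entries:entity:50344415ff3a8aa694000001",
		"entries:entity:50344415ff3a8aa694000002",
	)
	communitySet := newSet("entries:entity:50344415ff3a8aa694000002")

	fmt.Println(intersect(goSet, communitySet))
}
```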

With this basic idea in mind, we created the package redisgosearch. It can index any Go object that satisfies the Indexable interface, and it can search the indexed Redis database and unmarshal the results back into the indexed objects.

Let’s take the following entry as an example:

<code>type Entry struct {
    Id          bson.ObjectId `bson:"_id"`
    GroupId     string
    Title       string
    Content     string
    Attachments []*Attachment
}

func (this *Entry) IndexPieces() (r []string, ais []redisgosearch.Indexable) {
    r = append(r, this.Title)
    r = append(r, this.Content)

    for _, a := range this.Attachments {
        r = append(r, a.Filename)
        ais = append(ais, &IndexedAttachment{this, a})
    }

    return
}

func (this *Entry) IndexEntity() (indexType string, key string, entity interface{}) {
    key = this.Id.Hex()
    indexType = "entries"
    entity = this
    return
}

func (this *Entry) IndexFilters() (r map[string]string) {
    r = make(map[string]string)
    r["group"] = this.GroupId
    return
}
</code>

IndexPieces tells the package which text needs to be segmented and indexed. Note that for an Entry struct, you might also want to index other data connected to the entry, for example its attachments. In our case, the user can search for any filename and find out which entries those files belong to. So the second return value is a slice of Indexable objects that are indexed and connected to the entry.

IndexEntity tells the package the index type, which is used as the namespace, and the key stored in the Redis Set. The actual entity value is marshalled into JSON and stored in Redis.
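
The JSON round trip can be sketched as follows. This is an illustration under assumptions, not the package's implementation: the `store` map stands in for Redis string values, `save` and `load` are hypothetical helpers, and the simplified `Entry` uses a plain string Id. The `entries:entity:…` key layout follows the SMEMBERS output shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry is a simplified version of the struct from the article.
type Entry struct {
	Id      string
	Title   string
	Content string
}

// store stands in for Redis: entity key -> JSON document.
type store map[string]string

// save marshals the entry to JSON and stores it under
// "<indexType>:entity:<id>".
func (s store) save(indexType string, e Entry) error {
	b, err := json.Marshal(e)
	if err != nil {
		return err
	}
	s[indexType+":entity:"+e.Id] = string(b)
	return nil
}

// load unmarshals the entities stored under the given keys,
// silently skipping keys that are not present.
func (s store) load(keys []string) ([]Entry, error) {
	var out []Entry
	for _, k := range keys {
		raw, ok := s[k]
		if !ok {
			continue
		}
		var e Entry
		if err := json.Unmarshal([]byte(raw), &e); err != nil {
			return nil, err
		}
		out = append(out, e)
	}
	return out, nil
}

func main() {
	s := store{}
	err := s.save("entries", Entry{
		Id:    "50344415ff3a8aa694000002",
		Title: "Getting to know the Go community",
	})
	if err != nil {
		panic(err)
	}

	// The keys would normally come from the SINTER result.
	entries, err := s.load([]string{"entries:entity:50344415ff3a8aa694000002"})
	if err != nil {
		panic(err)
	}
	fmt.Println(entries[0].Title)
}
```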

IndexFilters lets you add metadata that can be used as filters when searching, for example to search for “go community” in the group “New York”.
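
One plausible way to support such filters is to store each filter value as one more Set and to include it in the intersection. The sketch below only builds the list of keys that would be intersected. The `entries:filters:<name>:<value>` key layout and the `searchKeys` function are assumptions made for illustration, and the package may organize its keys differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// searchKeys returns the set keys to intersect for one query:
// one key per keyword plus one key per filter.
// The ":filters:" layout is hypothetical.
func searchKeys(indexType, query string, filters map[string]string) []string {
	var keys []string
	for _, w := range strings.Fields(strings.ToLower(query)) {
		keys = append(keys, indexType+":keywords:"+w)
	}

	// Sort the filter names so that the output is deterministic.
	names := make([]string, 0, len(filters))
	for name := range filters {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		keys = append(keys, indexType+":filters:"+name+":"+filters[name])
	}
	return keys
}

func main() {
	keys := searchKeys("entries", "go community",
		map[string]string{"group": "New York"})
	fmt.Println(strings.Join(keys, "\n"))
}
```

Under this assumption, a filtered search is still a single intersection, so filters add no extra round trips.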

This is how to get the data out of the indexed Redis database:

<code>var entries []*Entry
count, err := client.Search("entries", "go community", map[string]string{"group": "New York"}, 0, 20, &entries)
</code>

The 0 and 20 are for pagination, and the count returned is the total number of entries that matched go community.
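
As a rough illustration of pagination with a total count, the sketch below treats the two numbers as an offset and a page size. That reading is an assumption, and `paginate` is a hypothetical helper rather than part of the package.

```go
package main

import "fmt"

// paginate returns the slice of ids for one page together with the
// total number of matches. skip is the number of results to skip and
// limit is the maximum page size (assumed meaning of the 0 and 20).
func paginate(ids []string, skip, limit int) (page []string, total int) {
	total = len(ids)
	if skip < 0 {
		skip = 0
	}
	if skip >= total || limit <= 0 {
		return nil, total
	}
	end := skip + limit
	if end > total {
		end = total
	}
	return ids[skip:end], total
}

func main() {
	matches := []string{"e1", "e2", "e3", "e4", "e5"}

	page, total := paginate(matches, 0, 20)
	fmt.Println(len(page), total)

	page, total = paginate(matches, 3, 20)
	fmt.Println(page, total)
}
```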

You can check the full code at: https://github.com/sunfmin/redisgosearch/blob/master/tests/search_test.go

This is just a simple package, and it is still missing a lot of features that are provided by other full-text search software.


TechForce is a weekly meeting held at The Plant for developers to present and discuss new technologies and projects they are working on. Each week a different developer presents.