*Redis数据类型介绍

Redis 不是一个简单的键值存储,它实际上是一个数据结构服务器,支持不同类型的值。这意味着,在传统的键值存储中将字符串键与字符串值相关联,而在redis中,该值不仅限于一个简单的字符串,而且还可以保存更复杂的数据结构。以下是Redis支持的所有数据结构的列表,本教程将分别介绍这些结构:

  • 二进制安全的字符串。
  • 列表(Lists):按照插入顺序排列的字符串元素的集合。从根本上说它们是 Linked Lists。
  • 集合(Sets):唯一、无序的字符串元素的集合。
  • 有序集合(Sorted sets): 类似于集合,但每个字符串元素都与一个称为score的浮点数值相关联。元素总是按分数排序,因此与集合不同,可以检索一系列元素(例如,您可能会问:给出前10个或后10个)。
  • 哈希(Hashes): 它是由字段和值组成的map。字段和值都是字符串。这和Ruby或Python的hashes非常相似。
  • 位图 Bit arrays (或bitmaps):使用特殊命令,可以像处理一组位那样处理字符串值:可以设置或清除某个位,统计所有被设置为1的位,找到第一个被设置或未设置的位,等等。 
  • 超日志(HyperLogLogs):这是一种概率数据结构,用于估计集合的基数。别害怕,这比看起来简单...请参阅本教程后面的"超级日志"部分。
  • 流(Streams0): 只追加的集合,它由类map提供抽象日志数据类型的元素组成。会在 Introduction to Redis Streams 深入介绍。

掌握这些数据类型是如何工作的,以及从命令参考中选择哪种类型来解决给定的问题并不总是很容易的,因此本文档是Redis数据类型及其最常见模式的一个速成课程。

对于所有示例,我们将使用redis cli实用程序(一个简单但方便的命令行实用程序)来针对redis服务器发出命令。

*Redis keys

Redis keys是二进制安全的,这意味着你可以使用任意的二进制序列作为key,从像'foo'这样的字符串到JPEG 文件的内容。空字符串也是有效的key。

关于key的一些规则:

  • 很长的key不是个好的选择。例如,一个1024字节的key不仅在使用内存方面是一个选择,而且因为在数据集中查找key可能需要一些代价高昂的键比较。即使手头的任务是匹配一个大值的存在,对它进行散列(例如使用sha1)是一个更好的选择,特别是从内存和带宽的角度。
  • 非常短的key通常也不是好的选择。如果你能写出"user:1000:followers",那么把"u1000flw"作为一个键就没有什么意义了。与键对象本身和值对象使用的空间相比,前者可读性更强,增加的空间较小。尽管短键显然会消耗更少的内存,但你的工作是找到正确的平衡点。
  • 试着坚持使用模式。例如,"object-type:id"是一个好主意,就像"user:1000"一样。点或破折号通常用于多单词字段,如"comment:1234:reply.to"或"comment:1234:reply-to"。
  • key最大允许的长度是512MB。

*Redis Strings

Redis 字符串类型是可以用键关联的最简单的值类型。它是Mamcached的唯一数据类型,所以对于新来者来说,在Redis中使用它是非常自然的。

由于Redis的键是字符串,当我们也用字符串作为值的时候,我们从一个字符串映射另一个字符串。字符串类型在大量的用例中都是有用的,像缓存HTML片段和页面。

使用 redis-cli 测试字符串类型:

> set mykey somevalue
OK
> get mykey
"somevalue"

上面使用  SET 和 GET 命令来设置和读取一个字符串值。需要注意的是 SET 将替换存储在key中任何已存在的值,即使key关联的是一个非字符串值。 SET 执行的是赋值。

值可以是各种类型的字符串(包括二进制数据),例如,您可以在值中存储一个jpeg图像。值不能大于512 MB。

SET 命令有很多有趣的选项,这些选项被作为额外的参数提供。例如,可以让 SET 命令失败在key已经存在的情况下,或反过来,只有在key存在的情况下成功。:

> set mykey newval nx
(nil)
> set mykey newval xx
OK

虽然字符串是Redis最基本的值类型,但是也可以在字符串上执行很多有趣的操作。例如,原子自增:

> set counter 100
OK
> incr counter
(integer) 101
> incr counter
(integer) 102
> incrby counter 50
(integer) 152

INCR 命令把字符串解析成整数,并把值加一,并最终把计算的结果作为新值。还有一些类似的命令像 INCRBYDECR 和 DECRBY。本质上它们都是同样的命令,只是表现行为上有一点不同。

"INCR是原子的"是什么意思?即使多个客户端发送INCR命令操作同一个key,也永远不会进入一个竞争状态。例如,客户端1读到10,客户端2同时都到10,它们同时各自自增为11,然后把新值设置为11的情况永远不会发生。最终的值总会是12,读-自增-写操作执行被执行的同时,所有其它客户端都没有在执行命令。 

有许多操作字符串的命令。例如, GETSET 命令设置新值,并返回旧值作为结果。 它的使用实例是:每当有新的访客都把key的值加一,使用这个命令可以准确获取每小时的访问量,通过GETSET 命令,把key的新值设置为0,返回旧值。

在单个命令中同时设置或读取多个key值的能力也是有用的。所有有了 MSET 和 MGET 命令:

> mset a 10 b 20 c 30
OK
> mget a b c
1) "10"
2) "20"
3) "30"

 使用MGET 的时候,Redis返回一个值的数组。

*修改和查询key空间

有些命令没有在特定类型上定义,但对于与键的空间交互非常有用,因此可以与任何类型的键一起使用。

例如 EXISTS 命令返回1或0来表明给定的key在数据库中是否存在,而 DEL 命令删除某个key和它关联的值,不管值是什么类型。

> set mykey hello
OK
> exists mykey
(integer) 1
> del mykey
(integer) 1
> exists mykey
(integer) 0

从例子中也可以看出 DEL 命令怎样返回1或0,它依赖于key被删除(key存在)与否(没有这个名字的key)。

有很多key空间相关的命令,但是上面这两个命令和TYPE命令一起是必须的,返回存储在特定key中的值类型。

> set mykey x
OK
> type mykey
string
> del mykey
(integer) 1
> type mykey
none

*Redis 过期(Redis expires): 带有有限生存时间的key

开始介绍更复杂的数据结构之前,我们先讨论另一个对任何类型都有作用的特性,叫做Redis 过期(Redis expires)。你可以给key设置超时,这是一个有限的生存时间。当生存期走完之后,这个key会被自动删除,跟直接调用 DEL 命令效果一样。

关于Redis超时的几个简短信息:

  • 它们可以使用秒或毫秒精度进行设置。
  • 但是,过期时间分辨率始终为1毫秒。
  • 有关过期的信息将被复制并保存在磁盘上,当您的redis服务器保持停止状态时,时间实际上会过去(这意味着redis会保存key过期的日期)。

设置超时很简单:

> set key some-value
OK
> expire key 5
(integer) 1
> get key (立刻执行)
"some-value"
> get key (5秒以后执行)
(nil)

因为第二次调用GET延迟时间超过5秒,所以key在两次 GET 之间消失。上面的例子当中使用 EXPIRE 来设置超时(可以给存在超时的key设置一个不同超时,例如PERSIST 可以用来删除超时和设置key为持久的)。也可以使用其它Redis命令来这是超时,例如使用  SET 选项:

> set key 100 ex 10
OK
> ttl key
(integer) 9

上面的例子设置了一个名为key的键,它的值是字符串格式的100,它的超时时间是10秒。接着调用 TTL 命令来检查key的剩余生存时间。

如果像设置或检查超时以毫秒为单位,可以参考 PEXPIRE 和 PTTL 命令,和完整的 SET 选项。

*Redis列表(Redis Lists)

为了解释 List 数据类型,最好从一点理论入手,因为信息技术人员经常以不正确的方式使用列表这个术语。例如,"python-lists"并不是这个名字所暗示的(链表),而是数组(同一数据类型实际上在ruby中称为array)。

从一个非常一般的角度来看,列表只是一系列有序元素:10、20、1、2、3是一个列表。但是,使用数组实现的列表的属性与使用链接列表实现的列表的属性非常不同。

Redis列表通过链接列表实现。这意味着,即使列表中有数百万个元素,在列表的头部或尾部添加新元素的操作也将在恒定时间内执行。使用 LPUSH 命令将一个新元素添加到包含10个元素的列表头的速度与向包含1000万个元素的列表头添加元素的速度相同。

通过索引访问元素在使用数组实现的列表中速度非常快(常量时间索引访问),而在由链接列表实现的列表中速度却不太快(在该操作中需要与被访问元素的索引成比例的工作量)。

Redis列表是通过链表实现的,因为对于数据库系统来说,能够以非常快的方式将元素添加到一个非常长的列表中是至关重要的。另一个优势,如您稍后将看到的,是Redis列表可以在恒定的时间内处理恒定的长度。

当快速访问大量元素集合的中间部分很重要时,可以使用不同的数据结构,叫做有序集。本教程稍后将介绍有序集。

*Redis Lists上手

LPUSH 命令向list的左边(List头部)添加一个新元素, RPUSH 命令向list的右边(list尾部)添加一个新元素。 然后 LRANGE 命令从list中提取一定范围的元素。

> rpush mylist A
(integer) 1
> rpush mylist B
(integer) 2
> lpush mylist first
(integer) 3
> lrange mylist 0 -1
1) "first"
2) "A"
3) "B"

注意 LRANGE 接收两个索引参数,要返回的第一个元素和最后一个元素的索引。这两个参数可以都是负数,表明从List的结尾开始计算:-1是最后一个元素,-2标示倒数第二个元素,以此类推。

正如你所看到的, RPUSH 把元素添加到list的右边, LPUSH 把元素添加到list的左边。

这两个命令参数都是可变的,即你可以在一次调用中自由添加多个元素到list中。

> rpush mylist 1 2 3 4 5 "foo bar"
(integer) 9
> lrange mylist 0 -1
1) "first"
2) "A"
3) "B"
4) "1"
5) "2"
6) "3"
7) "4"
8) "5"
9) "foo bar"

Redis list中定义的一个总要操作是能够删除元素(pop elements)。Popping elements操作是从list中获取元素,并同时从list中删除它。就像你可以从list的两边添加元素一样,你可以从list的左边或右边删除元素:

> rpush mylist a b c
(integer) 3
> rpop mylist
"c"
> rpop mylist
"b"
> rpop mylist
"a"

我们添加3个元素并删除3个元素,这这一系列的命令之后list是空的,list中没有元素可删除。如果我们尝试继续删除另一个元素,会返回:

> rpop mylist
(nil)

Redis 返回空值表示list中已没有元素。

*list的常用案例

Lists 可以用于完成多种任务,下面是两个非常有代表性的案例:

  • 记录用户发送到社交网络中最新的更新。
  • 进程间通信,使用生产者-消费者模式。生产者向list中增加元素,消费者消费这些元素并执行动作。Redis有特定的list命令来使这个用例更可靠和高效。

例如非常有名的Ruby库resquesidekip,在底层都使用了Redis列表来实现后台作业。

著名的社交网络Twitter使用Redis列表来获取用户发布的最新的消息。

为了一步一步地描述一个常见用例,假设要在你的主页上展示社交网络上最新分享的照片并且加速访问。

  • 每当一个用户发布了一张新的照片,我们使用 LPUSH命令把它的ID加入到列表中。
  • 当用户访问这个主页,我们使用 LRANGE 0 9 获取最新加入的10个表项。

*上限 lists

很多情况下我们只想用list来存储最新的几个条目,不论他们是哪种:社交网络更新,日志,或其它任何。

Redis 允许我们使用list作为一个上限集合,使用 LTRIM 命令,只存储最新的N个条目并且丢弃最老的条目。

LTRIM 命令类似于 LRANGE, 但是不同于显示特点范围的元素,LTRIM设置这个范围的值作为新的列表值。列表外的所有值都会被删除。 

看个例子会更直观一些:

> rpush mylist 1 2 3 4 5
(integer) 5
> ltrim mylist 0 2
OK
> lrange mylist 0 -1
1) "1"
2) "2"
3) "3"

上面的 LTRIM 命令告诉Redis只取得列表中下标为0到2的元素,其它的都要丢弃。这就是一种简单有用的模式成为了可能:列表增加(push)元素操作+列表提取(trim)元素操作=增加一个元素同时删除一个元素使得列表元素总数有限:

LPUSH mylist <some element>
LTRIM mylist 0 999

上面的操作结合增加一个元素但只是存在1000个最新的元素在列表中。通过 LRANGE 你可以获取最新的表项而不需要记住旧的数据。

注意: 由于理论上LRANGE 是 [O(N) ]{.math}命令,读取从头开始或从尾开始的小范围数据是常量时间的操作。

*列表中的阻塞型操作

列表的一些特性使它适合现实队列(queues),也通常作为进程间通信系统的一个基础组件:阻塞式操作。

假设你通过一个进程把元素增加到列表中,使用另一个进程对这些元素做些实际的操作。这是通常的生产者/消费者体系,你可以用下面这种简单的方法实现:

  • 生产者调用LPUSH,把元素加入列表T 。
  • 消费者调用 RPOP,把元素从列表中取出或处理。

然而很有可能有时候list是空的,没有东西可处理,那么  RPOP 返回 NULL。这种请情况下,消费者需要被限制等待一些时间再重试  RPOP。这叫做轮询( polling), 这种场景下轮训并不是一个好办法,它有下面的缺点:

  1. 迫使Redis和客户端处理无效的命令。(当列表为空是所有的请求都不会执行实际工作,只是返回NULL)
  2. 消费者在收到NULL之后加入一个延时,让它等待一些时间。如果让延时小一点,在两次调用 RPOP之间的等待时间会比较短,这放大了第一个问题-调用Redis更加没有意义。

所以Redis实现了 BRPOP 和 BLPOP 命令,他们是 RPOP 和 LPOP 带阻塞功能的版本。它们只有当有新元素被添加到list或者到了用户指定的超时时间才返回。

这是一个调用 BRPOP 的例子,我们可以在消费者worker中使用它:

> brpop tasks 5
1) "tasks"
2) "do_something"

它标示:"等待名为 tasks 的list中的元素,如果超过5秒钟还没有可用的元素则返回"。

你可以使用参数0做为超时时间表示永久等待元素,也可以同时指定多个list,以便同时在多个list上等待,当有一个list收到一个元素则返回。

 BRPOP需要注意的一些事情:

  1. 客户端是按顺序被服务:第一个阻塞的客户端,当有元素增加的时候第一个被服务。以此顺序类推。
  2. 返回值与 RPOP 不同: 是一个两个元素的数组,包括key的名字,因为 BRPOP 和 BLPOP 也能够阻塞等待多个list上的元素。
  3. 如果到达超时时间,返回NULL。

关于list和阻塞还有很多需要了解的地方,建议阅读下面材料了解更多:

  • 使用 RPOPLPUSH构建安全队列或循环队列
  • 一个阻塞的命令变体,叫做 BRPOPLPUSH

*自动创建和移除key

目前为止我们的例子中,我们从未在增加元素前创建空list,或者删除list当list为空的时候。当列表为空的时候删除key,或者添加元素的时候,当key不存在时创建一个空的list,例如使用 LPUSH

这并不是list特有的,它适用于所有的由多个元素组成的数据类型: Streams, Sets, Sorted Sets and Hashes。

基本上,我们可以把它们行为总结为三个规则:

  1. 当把一个元素增加到一个集合类数据类型时,如果这个键不存在,在增加前会创建一个空的集合类数据类型。
  2. 当从一个集合类数据类型中移除一个元素时,如果值保持为空,键就会被自动删除。Stream 数据类型是这个规则的唯一的例外。
  3. 在一个空的key上,调用一个只读命令如 LLEN (返回list的长度), 或者写命令删除元素, 会产生同样的结果,好像key含有一个该命令所希望找到的类型的空的集合。

例子 1:

> del mylist
(integer) 1
> lpush mylist 1 2 3
(integer) 3

如果key存在那么,我们不能在错误的类型上执行操作:

> set foo bar
OK
> lpush foo 1 2 3
(error) WRONGTYPE Operation against a key holding the wrong kind of value
> type foo
string

例子 2:

> lpush mylist 1 2 3
(integer) 3
> exists mylist
(integer) 1
> lpop mylist
"3"
> lpop mylist
"2"
> lpop mylist
"1"
> exists mylist
(integer) 0

所有元素都被移除后,key将不再存在。

例子3:

> del mylist
(integer) 0
> llen mylist
(integer) 0
> lpop mylist
(nil)

*Redis Hashes

Redis hashes 看起来完全像我们期待"hash"看起来那样, 带有 field-value 对:

> hmset user:1000 username antirez birthyear 1977 verified 1
OK
> hget user:1000 username
"antirez"
> hget user:1000 birthyear
"1977"
> hgetall user:1000
1) "username"
2) "antirez"
3) "birthyear"
4) "1977"
5) "verified"
6) "1"

hashes 擅长用来代表对象,实际上可以放到hash中的fields数量是没有特别限制的(除非超过内存容量),所以你可以你可以在你的应用中以很多不同的方式使用 hashes 。

 HMSET 命令用来设置hash的多个fields,HGET 获取一个单独的field。HMGET 类似 HGET 返回的是一个值的数组:

> hmget user:1000 username birthyear no-such-field
1) "antirez"
2) "1977"
3) (nil)

也有可以在单个field上执行操作的命令,例如 HINCRBY:

> hincrby user:1000 birthyear 10
(integer) 1987
> hincrby user:1000 birthyear 10
(integer) 1997

完整的hash命令列表 full list of hash commands in the documentation.

需要注意,对于小的hash(元素个数较少)在内存中是使用特殊方式编码的,使它非常内存高效。

*Redis Sets

Redis Sets are unordered collections of strings. The SADD command adds new elements to a set. It's also possible to do a number of other operations against sets like testing if a given element already exists, performing the intersection, union or difference between multiple sets, and so forth.

> sadd myset 1 2 3
(integer) 3
> smembers myset
1. 3
2. 1
3. 2

Here I've added three elements to my set and told Redis to return all the elements. As you can see they are not sorted --- Redis is free to return the elements in any order at every call, since there is no contract with the user about element ordering.

Redis has commands to test for membership. For example, checking if an element exists:

> sismember myset 3
(integer) 1
> sismember myset 30
(integer) 0

"3" is a member of the set, while "30" is not.

Sets are good for expressing relations between objects. For instance we can easily use sets in order to implement tags.

A simple way to model this problem is to have a set for every object we want to tag. The set contains the IDs of the tags associated with the object.

One illustration is tagging news articles. If article ID 1000 is tagged with tags 1, 2, 5 and 77, a set can associate these tag IDs with the news item:

> sadd news:1000:tags 1 2 5 77
(integer) 4

We may also want to have the inverse relation as well: the list of all the news tagged with a given tag:

> sadd tag:1:news 1000
(integer) 1
> sadd tag:2:news 1000
(integer) 1
> sadd tag:5:news 1000
(integer) 1
> sadd tag:77:news 1000
(integer) 1

To get all the tags for a given object is trivial:

> smembers news:1000:tags
1. 5
2. 1
3. 77
4. 2

Note: in the example we assume you have another data structure, for example a Redis hash, which maps tag IDs to tag names.

There are other non trivial operations that are still easy to implement using the right Redis commands. For instance we may want a list of all the objects with the tags 1, 2, 10, and 27 together. We can do this using the SINTERcommand, which performs the intersection between different sets. We can use:

> sinter tag:1:news tag:2:news tag:10:news tag:27:news
... results here ...

In addition to intersection you can also perform unions, difference, extract a random element, and so forth.

The command to extract an element is called SPOP, and is handy to model certain problems. For example in order to implement a web-based poker game, you may want to represent your deck with a set. Imagine we use a one-char prefix for (C)lubs, (D)iamonds, (H)earts, (S)pades:

>  sadd deck C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 CJ CQ CK
   D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 DJ DQ DK H1 H2 H3
   H4 H5 H6 H7 H8 H9 H10 HJ HQ HK S1 S2 S3 S4 S5 S6
   S7 S8 S9 S10 SJ SQ SK
   (integer) 52

Now we want to provide each player with 5 cards. The SPOP command removes a random element, returning it to the client, so it is the perfect operation in this case.

However if we call it against our deck directly, in the next play of the game we'll need to populate the deck of cards again, which may not be ideal. So to start, we can make a copy of the set stored in the deck key into the game:1:deck key.

This is accomplished using SUNIONSTORE, which normally performs the union between multiple sets, and stores the result into another set. However, since the union of a single set is itself, I can copy my deck with:

> sunionstore game:1:deck deck
(integer) 52

Now I'm ready to provide the first player with five cards:

> spop game:1:deck
"C6"
> spop game:1:deck
"CQ"
> spop game:1:deck
"D1"
> spop game:1:deck
"CJ"
> spop game:1:deck
"SJ"

One pair of jacks, not great...

This is a good time to introduce the set command that provides the number of elements inside a set. This is often called the cardinality of a set in the context of set theory, so the Redis command is called SCARD.

> scard game:1:deck
(integer) 47

The math works: 52 -- 5 = 47.

When you need to just get random elements without removing them from the set, there is the SRANDMEMBERcommand suitable for the task. It also features the ability to return both repeating and non-repeating elements.

*Redis Sorted sets

Sorted sets are a data type which is similar to a mix between a Set and a Hash. Like sets, sorted sets are composed of unique, non-repeating string elements, so in some sense a sorted set is a set as well.

However while elements inside sets are not ordered, every element in a sorted set is associated with a floating point value, called the score (this is why the type is also similar to a hash, since every element is mapped to a value).

Moreover, elements in a sorted sets are taken in order (so they are not ordered on request, order is a peculiarity of the data structure used to represent sorted sets). They are ordered according to the following rule:

  • If A and B are two elements with a different score, then A > B if A.score is > B.score.
  • If A and B have exactly the same score, then A > B if the A string is lexicographically greater than the B string. A and B strings can't be equal since sorted sets only have unique elements.

Let's start with a simple example, adding a few selected hackers names as sorted set elements, with their year of birth as "score".

> zadd hackers 1940 "Alan Kay"
(integer) 1
> zadd hackers 1957 "Sophie Wilson"
(integer) 1
> zadd hackers 1953 "Richard Stallman"
(integer) 1
> zadd hackers 1949 "Anita Borg"
(integer) 1
> zadd hackers 1965 "Yukihiro Matsumoto"
(integer) 1
> zadd hackers 1914 "Hedy Lamarr"
(integer) 1
> zadd hackers 1916 "Claude Shannon"
(integer) 1
> zadd hackers 1969 "Linus Torvalds"
(integer) 1
> zadd hackers 1912 "Alan Turing"
(integer) 1

As you can see ZADD is similar to SADD, but takes one additional argument (placed before the element to be added) which is the score. ZADD is also variadic, so you are free to specify multiple score-value pairs, even if this is not used in the example above.

With sorted sets it is trivial to return a list of hackers sorted by their birth year because actually they are already sorted.

Implementation note: Sorted sets are implemented via a dual-ported data structure containing both a skip list and a hash table, so every time we add an element Redis performs an [O(log(N)) ]{.math}operation. That's good, but when we ask for sorted elements Redis does not have to do any work at all, it's already all sorted:

> zrange hackers 0 -1
1) "Alan Turing"
2) "Hedy Lamarr"
3) "Claude Shannon"
4) "Alan Kay"
5) "Anita Borg"
6) "Richard Stallman"
7) "Sophie Wilson"
8) "Yukihiro Matsumoto"
9) "Linus Torvalds"

Note: 0 and -1 means from element index 0 to the last element (-1 works here just as it does in the case of the LRANGE command).

What if I want to order them the opposite way, youngest to oldest? Use ZREVRANGE instead of ZRANGE:

> zrevrange hackers 0 -1
1) "Linus Torvalds"
2) "Yukihiro Matsumoto"
3) "Sophie Wilson"
4) "Richard Stallman"
5) "Anita Borg"
6) "Alan Kay"
7) "Claude Shannon"
8) "Hedy Lamarr"
9) "Alan Turing"

It is possible to return scores as well, using the WITHSCORES argument:

> zrange hackers 0 -1 withscores
1) "Alan Turing"
2) "1912"
3) "Hedy Lamarr"
4) "1914"
5) "Claude Shannon"
6) "1916"
7) "Alan Kay"
8) "1940"
9) "Anita Borg"
10) "1949"
11) "Richard Stallman"
12) "1953"
13) "Sophie Wilson"
14) "1957"
15) "Yukihiro Matsumoto"
16) "1965"
17) "Linus Torvalds"
18) "1969"

*Operating on ranges

Sorted sets are more powerful than this. They can operate on ranges. Let's get all the individuals that were born up to 1950 inclusive. We use the ZRANGEBYSCORE command to do it:

> zrangebyscore hackers -inf 1950
1) "Alan Turing"
2) "Hedy Lamarr"
3) "Claude Shannon"
4) "Alan Kay"
5) "Anita Borg"

We asked Redis to return all the elements with a score between negative infinity and 1950 (both extremes are included).

It's also possible to remove ranges of elements. Let's remove all the hackers born between 1940 and 1960 from the sorted set:

> zremrangebyscore hackers 1940 1960
(integer) 4

ZREMRANGEBYSCORE is perhaps not the best command name, but it can be very useful, and returns the number of removed elements.

Another extremely useful operation defined for sorted set elements is the get-rank operation. It is possible to ask what is the position of an element in the set of the ordered elements.

> zrank hackers "Anita Borg"
(integer) 4

The ZREVRANK command is also available in order to get the rank, considering the elements sorted a descending way.

*Lexicographical scores

With recent versions of Redis 2.8, a new feature was introduced that allows getting ranges lexicographically, assuming elements in a sorted set are all inserted with the same identical score (elements are compared with the C memcmpfunction, so it is guaranteed that there is no collation, and every Redis instance will reply with the same output).

The main commands to operate with lexicographical ranges are ZRANGEBYLEXZREVRANGEBYLEXZREMRANGEBYLEX and ZLEXCOUNT.

For example, let's add again our list of famous hackers, but this time use a score of zero for all the elements:

> zadd hackers 0 "Alan Kay" 0 "Sophie Wilson" 0 "Richard Stallman" 0
  "Anita Borg" 0 "Yukihiro Matsumoto" 0 "Hedy Lamarr" 0 "Claude Shannon"
  0 "Linus Torvalds" 0 "Alan Turing"

Because of the sorted sets ordering rules, they are already sorted lexicographically:

> zrange hackers 0 -1
1) "Alan Kay"
2) "Alan Turing"
3) "Anita Borg"
4) "Claude Shannon"
5) "Hedy Lamarr"
6) "Linus Torvalds"
7) "Richard Stallman"
8) "Sophie Wilson"
9) "Yukihiro Matsumoto"

Using ZRANGEBYLEX we can ask for lexicographical ranges:

> zrangebylex hackers [B [P
1) "Claude Shannon"
2) "Hedy Lamarr"
3) "Linus Torvalds"

Ranges can be inclusive or exclusive (depending on the first character), also string infinite and minus infinite are specified respectively with the + and - strings. See the documentation for more information.

This feature is important because it allows us to use sorted sets as a generic index. For example, if you want to index elements by a 128-bit unsigned integer argument, all you need to do is to add elements into a sorted set with the same score (for example 0) but with an 16 byte prefix consisting of the 128 bit number in big endian. Since numbers in big endian, when ordered lexicographically (in raw bytes order) are actually ordered numerically as well, you can ask for ranges in the 128 bit space, and get the element's value discarding the prefix.

If you want to see the feature in the context of a more serious demo, check the Redis autocomplete demo.

*Updating the score: leader boards

Just a final note about sorted sets before switching to the next topic. Sorted sets' scores can be updated at any time. Just calling ZADD against an element already included in the sorted set will update its score (and position) with [O(log(N)) ]{.math}time complexity. As such, sorted sets are suitable when there are tons of updates.

Because of this characteristic a common use case is leader boards. The typical application is a Facebook game where you combine the ability to take users sorted by their high score, plus the get-rank operation, in order to show the top-N users, and the user rank in the leader board (e.g., "you are the #4932 best score here").

*Bitmaps

Bitmaps are not an actual data type, but a set of bit-oriented operations defined on the String type. Since strings are binary safe blobs and their maximum length is 512 MB, they are suitable to set up to 232 different bits.

Bit operations are divided into two groups: constant-time single bit operations, like setting a bit to 1 or 0, or getting its value, and operations on groups of bits, for example counting the number of set bits in a given range of bits (e.g., population counting).

One of the biggest advantages of bitmaps is that they often provide extreme space savings when storing information. For example in a system where different users are represented by incremental user IDs, it is possible to remember a single bit information (for example, knowing whether a user wants to receive a newsletter) of 4 billion of users using just 512 MB of memory.

Bits are set and retrieved using the SETBIT and GETBIT commands:

> setbit key 10 1
(integer) 1
> getbit key 10
(integer) 1
> getbit key 11
(integer) 0

The SETBIT command takes as its first argument the bit number, and as its second argument the value to set the bit to, which is 1 or 0. The command automatically enlarges the string if the addressed bit is outside the current string length.

GETBIT just returns the value of the bit at the specified index. Out of range bits (addressing a bit that is outside the length of the string stored into the target key) are always considered to be zero.

There are three commands operating on group of bits:

  1. BITOP performs bit-wise operations between different strings. The provided operations are AND, OR, XOR and NOT.
  2. BITCOUNT performs population counting, reporting the number of bits set to 1.
  3. BITPOS finds the first bit having the specified value of 0 or 1.

Both BITPOS and BITCOUNT are able to operate with byte ranges of the string, instead of running for the whole length of the string. The following is a trivial example of BITCOUNT call:

> setbit key 0 1
(integer) 0
> setbit key 100 1
(integer) 0
> bitcount key
(integer) 2

Common use cases for bitmaps are:

  • Real time analytics of all kinds.
  • Storing space efficient but high performance boolean information associated with object IDs.

For example imagine you want to know the longest streak of daily visits of your web site users. You start counting days starting from zero, that is the day you made your web site public, and set a bit with SETBIT every time the user visits the web site. As a bit index you simply take the current unix time, subtract the initial offset, and divide by 3600*24.

This way for each user you have a small string containing the visit information for each day. With BITCOUNT it is possible to easily get the number of days a given user visited the web site, while with a few BITPOS calls, or simply fetching and analyzing the bitmap client-side, it is possible to easily compute the longest streak.

Bitmaps are trivial to split into multiple keys, for example for the sake of sharding the data set and because in general it is better to avoid working with huge keys. To split a bitmap across different keys instead of setting all the bits into a key, a trivial strategy is just to store M bits per key and obtain the key name with bit-number/M and the Nth bit to address inside the key with bit-number MOD M.

*HyperLogLogs

A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, which in the case of the Redis implementation is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.

HLLs in Redis, while technically a different data structure, are encoded as a Redis string, so you can call GET to serialize a HLL, and SET to deserialize it back to the server.

Conceptually the HLL API is like using Sets to do the same task. You would SADD every observed element into a set, and would use SCARD to check the number of elements inside the set, which are unique since SADD will not re-add an existing element.

While you don't really add items into an HLL, because the data structure only contains a state that does not include actual elements, the API is the same:

  • Every time you see a new element, you add it to the count with PFADD.

  • Every time you want to retrieve the current approximation of the unique elements added with PFADD so far, you use the PFCOUNT.

    > pfadd hll a b c d
    (integer) 1
    > pfcount hll
    (integer) 4
    

An example of use case for this data structure is counting unique queries performed by users in a search form every day.

Redis is also able to perform the union of HLLs, please check the full documentation for more information.

*其它值得注意的特性

Redis API 有很多重要的东西不能在本文深入探索,但却值得了解:

*了解更多

本教程没办法完全覆盖所有内容,只介绍了API的基本概念。可以阅读 command reference 来了解更多。