In addition, I used the `--replSet` parameter to specify the name of the Replica Set that each mongod instance belongs to. The name itself is arbitrary, but every mongod instance in the same Replica Set must be started with the same `--replSet` value; otherwise you may see unexpected behavior.
Once these mongod instances have started successfully, we should see records like the following in the log output:
```
2015-11-14T16:25:46.060+0800 I JOURNAL  [initandlisten] journal dir=3\journal
2015-11-14T16:25:46.061+0800 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
2015-11-14T16:25:46.078+0800 I JOURNAL  [durability] Durability thread started
2015-11-14T16:25:46.078+0800 I JOURNAL  [journal writer] Journal writer thread started
2015-11-14T16:25:46.613+0800 I CONTROL  [initandlisten] MongoDB starting : pid=9812 port=27003 dbpath=3 64-bit host=mrdai-Laptop
2015-11-14T16:25:46.613+0800 I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2015-11-14T16:25:46.613+0800 I CONTROL  [initandlisten] db version v3.0.7
2015-11-14T16:25:46.614+0800 I CONTROL  [initandlisten] git version: 6ce7cbe8c6b899552dadd907604559806aa2e9bd
2015-11-14T16:25:46.614+0800 I CONTROL  [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
2015-11-14T16:25:46.614+0800 I CONTROL  [initandlisten] allocator: tcmalloc
2015-11-14T16:25:46.614+0800 I CONTROL  [initandlisten] options: { net: { port: 27003 }, replication: { replSet: "myRS" }, storage: { dbPath: "3" } }
2015-11-14T16:25:46.615+0800 I INDEX    [initandlisten] allocating new ns file 3\local.ns, filling with zeroes...
2015-11-14T16:25:47.542+0800 I STORAGE  [FileAllocator] allocating new datafile 3\local.0, filling with zeroes...
2015-11-14T16:25:47.543+0800 I STORAGE  [FileAllocator] creating directory 3\_tmp
2015-11-14T16:25:47.544+0800 I STORAGE  [FileAllocator] done allocating datafile 3\local.0, size: 64MB, took 0 secs
2015-11-14T16:25:47.551+0800 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument Did not find replica set configuration document in local.system.replset
2015-11-14T16:25:47.552+0800 I NETWORK  [initandlisten] waiting for connections on port 27003
```
Notice the second-to-last record: mongod could not find a Replica Set configuration document in its local data. This is expected, since the Replica Set is being created for the first time. The last record shows that mongod has finished starting up and is waiting for connections on its port.
In `conf`, we set `_id` to the name of the Replica Set and describe all of its members in `members`, including each member's `_id` and its host name `host`.
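For reference, a `conf` document like the one described here can be built in the mongo shell and passed to `rs.initiate()`. This is only a sketch; the replica set name and member hosts below match the values that later appear in the startup logs:

```javascript
// Run in a mongo shell connected to one of the instances,
// e.g. mongo --port 27001
conf = {
    _id: "myRS",              // must match the --replSet value of every member
    members: [
        { _id: 1, host: "localhost:27001" },
        { _id: 2, host: "localhost:27002" },
        { _id: 3, host: "localhost:27003" }
    ]
};
rs.initiate(conf);            // apply the configuration and start the Replica Set
```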
Note that although I used the `host:port` form directly to identify the mongod instances here, you should never do this in a production environment; it is a terrible practice. When building distributed clusters nowadays, people instead tend to edit the `hosts` file on every machine. Don't do that either: both approaches are bad practice. The best approach is to run a DNS server in your cluster environment. Then, when the IP address of one of your nodes changes, you only need to update the corresponding resolution record on the DNS server, instead of editing the `hosts` file on every node.
```
2015-11-14T16:41:54.946+0800 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:61875 #1 (1 connection now open)
2015-11-14T16:41:54.951+0800 I NETWORK  [conn1] end connection 127.0.0.1:61875 (0 connections now open)
2015-11-14T16:41:54.953+0800 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:61877 #2 (1 connection now open)
2015-11-14T16:41:55.013+0800 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:61882 #3 (2 connections now open)
2015-11-14T16:41:55.018+0800 I NETWORK  [conn3] end connection 127.0.0.1:61882 (1 connection now open)
2015-11-14T16:41:55.078+0800 I REPL     [WriteReplSetConfig] Starting replication applier threads
2015-11-14T16:41:55.082+0800 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "myRS", version: 1, members: [ { _id: 1, host: "localhost:27001", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "localhost:27002", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "localhost:27003", arbiterOnly: false, buildIndexes: true, hidden: false, priority...(line truncated)...
2015-11-14T16:41:55.086+0800 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:61884 #4 (2 connections now open)
2015-11-14T16:41:55.115+0800 I REPL     [ReplicationExecutor] This node is localhost:27003 in the config
2015-11-14T16:41:55.128+0800 I REPL     [ReplicationExecutor] transition to STARTUP2
2015-11-14T16:41:55.134+0800 I REPL     [rsSync] ******
2015-11-14T16:41:55.136+0800 I REPL     [rsSync] creating replication oplog of size: 6172MB...
2015-11-14T16:41:55.137+0800 I STORAGE  [FileAllocator] allocating new datafile 3\local.1, filling with zeroes...
2015-11-14T16:41:55.139+0800 I REPL     [ReplicationExecutor] Member localhost:27001 is now in state STARTUP2
2015-11-14T16:41:55.151+0800 I STORAGE  [FileAllocator] done allocating datafile 3\local.1, size: 2047MB, took 0.001 secs
2015-11-14T16:41:55.153+0800 I STORAGE  [FileAllocator] allocating new datafile 3\local.2, filling with zeroes...
2015-11-14T16:41:55.161+0800 I STORAGE  [FileAllocator] done allocating datafile 3\local.2, size: 2047MB, took 0.001 secs
2015-11-14T16:41:55.170+0800 I STORAGE  [FileAllocator] allocating new datafile 3\local.3, filling with zeroes...
2015-11-14T16:41:55.171+0800 I REPL     [ReplicationExecutor] Member localhost:27002 is now in state STARTUP2
2015-11-14T16:41:55.186+0800 I STORAGE  [FileAllocator] done allocating datafile 3\local.3, size: 2047MB, took 0.001 secs
2015-11-14T16:41:56.198+0800 I REPL     [rsSync] ******
2015-11-14T16:41:56.198+0800 I REPL     [rsSync] initial sync pending
2015-11-14T16:41:56.200+0800 I REPL     [rsSync] no valid sync sources found in current replset to do an initial sync
2015-11-14T16:41:57.139+0800 I REPL     [ReplicationExecutor] Member localhost:27001 is now in state SECONDARY
2015-11-14T16:41:57.206+0800 I REPL     [rsSync] initial sync pending
2015-11-14T16:41:57.206+0800 I REPL     [ReplicationExecutor] syncing from: localhost:27001
2015-11-14T16:41:57.221+0800 I REPL     [rsSync] initial sync drop all databases
2015-11-14T16:41:57.222+0800 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 1
2015-11-14T16:41:57.222+0800 I REPL     [rsSync] initial sync clone all databases
2015-11-14T16:41:57.229+0800 I REPL     [rsSync] initial sync data copy, starting syncup
2015-11-14T16:41:57.234+0800 I REPL     [rsSync] oplog sync 1 of 3
2015-11-14T16:41:57.239+0800 I REPL     [rsSync] oplog sync 2 of 3
2015-11-14T16:41:57.254+0800 I REPL     [rsSync] initial sync building indexes
2015-11-14T16:41:57.258+0800 I REPL     [rsSync] oplog sync 3 of 3
2015-11-14T16:41:57.265+0800 I REPL     [rsSync] initial sync finishing up
2015-11-14T16:41:57.268+0800 I REPL     [rsSync] replSet set minValid=5646f3d4:1
2015-11-14T16:41:57.274+0800 I REPL     [rsSync] initial sync done
2015-11-14T16:41:57.290+0800 I REPL     [ReplicationExecutor] transition to RECOVERING
2015-11-14T16:41:57.292+0800 I REPL     [ReplicationExecutor] transition to SECONDARY
2015-11-14T16:41:58.136+0800 I REPL     [ReplicationExecutor] could not find member to sync from
2015-11-14T16:41:58.971+0800 I REPL     [ReplicationExecutor] replSetElect voting yea for localhost:27001 (1)
2015-11-14T16:41:59.140+0800 I REPL     [ReplicationExecutor] Member localhost:27001 is now in state PRIMARY
2015-11-14T16:41:59.171+0800 I REPL     [ReplicationExecutor] Member localhost:27002 is now in state SECONDARY
```
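Once the election in the log above completes, you can confirm the state of the set from a mongo shell connected to any member using `rs.status()`; each entry in its `members` array reports the node's name and its state (PRIMARY, SECONDARY, and so on). A quick sketch:

```javascript
// In a mongo shell connected to any member, e.g. mongo --port 27001
rs.status().members.forEach(function (m) {
    // prints something like "localhost:27001 PRIMARY"
    print(m.name + " " + m.stateStr);
});
```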
```java
import static java.util.Arrays.asList;

import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

MongoClient client = new MongoClient(asList(
        new ServerAddress("localhost", 27001),
        new ServerAddress("localhost", 27002),
        new ServerAddress("localhost", 27003)
));
```
Of course, it is also possible that all of the nodes you specified happen to go down at the same time; there is no defending against that at this level. Improving the availability of the Replica Set topology, however, is a question of network architecture. When performing write operations, we also need to consider that the Primary may go down at any moment. For example, suppose we are executing a write operation like this: