Programming Assignment 4: Key-Value Store with Configurable Consistency Solution

$30.00 $24.90

Description

In this assignment, you will implement a distributed key-value store that borrows designs from Dynamo and Cassandra using Python, C/C++, Java. For communication among different entities in the system, you can either use the Apache Thrift framework we have used in Assignment 2 or Google Protobuf in Assignment 3. This assignment is worth 11% of your total score.

This is a group assignment. Every group can have at most 2 students. It is also okay if you prefer to work individually on this assignment. However, you will not receive any extra credit for working individually.

If you choose to work in a group, at least one member in the group must send the instructor and cc the TA an email listing the names and email addresses of the both members of your group by November 19th, 2018 by the end of day. If we do not receive your email by this deadline, we will assume that you prefer to work individually on this assignment. No exception will be made.

  • Key-Value Store

Each replica server will be a key-value store. Keys are unsigned integers between 0 and 255. Values are strings.

Each replica server should support the following key-value operations:

  • get key – given a key, return its corresponding value

  • put key value – if the key does not already exist, create a new key-value pair; otherwise, update the key to the new value

For simplicity, each replica only needs to store key-value pairs in its memory. That is, there is no need to flush the memory content to persistent storage.

As with Cassandra, to handle a write request, the replica must first log this write in a write-ahead log on persistent storage before updating its in-memory data structure. In this way, if a replica failed and restarted, it can restore its memory state by replaying the disk log.

  • Configurable Consistency

Your distributed key-value store will include four replicas. Each replica server is pre-configured with information

about all other replicas. Keys are assigned to replica servers using a partitioner similar to the ByteOrderedPartitioner in Cassandra. Each replica server is expected to be assigned equal portions of the key space. The replication factor

will be 3 – every key-value pair should be stored on three out of four replicas. Three replicas are selected as fol-lows: the first replica is determined by the partitioner, and the second and third replicas are determined by going clockwise on the partitioner ring.

Every client request (get or put) is handled by a coordinator. Client can select any replica server as the coordi-nator. Therefore, any replica can be a coordinator.

1

Consistency level. Similar to Cassandra, consistency level is configured by the client. When issuing a request, put or get, the client explicitly specifies the desired consistency level: ONE or QUORUM. For example, receiving a write request with consistency level QUORUM, the coordinator will send the request to all replicas for a key (may or may not include itself). It will respond successful to the client once the write has been written to quorum replicas – i.e., two in our setup. For a read request with consistency level QUORUM, the coordinator will return the most recent data from two replicas. To support this operation, when handling a write request, the coordinator should record the time at which the request was received and include this as a timestamp when contacting replica servers for writing.

When the consistency level is set to QUORUM during both get and put, we have strong consistency. However, this is not the case when the client uses ONE. With ONE consistency level, different replicas may be inconsistent. For example, due to failure, a replica misses one write for a key k. When it recovers, it replays its log to restore its memory state. When a read request for key k comes next, it returns its own version of the value, which is incon-sistent. To ensure that all replicas eventually become consistent, you will implement the following two procedures, and your key-value store will be configured to use either of the two.

Read repair. When handling read requests, the coordinator contacts all replicas. If it finds inconsistent data, it will perform “read repair” in the background.

Hinted handoff. During write, the coordinator tries to write to all replicas. As long as enough replicas have succeeded, ONE or QUORUM, it will respond successful to the client. However, if not all replicas succeeded, e.g., two have succeeded but one replica server has failed, the coordinator would store a “hint” locally. If at a later time the failed server has recovered, it might be selected as coordinator for another client’s request. This will allow other replica servers that have stored “hints” for it to know it has recovered and send over the stored hints.

If not enough replicas of a key are available, e.g., consistency level is configured to QUORUM, but only one replica of the key is available, then the coordinator should return an exception to the issuing client. NOTE that this is different from the “sloppy quorum” in Dynamo.

  • Client

You should also implement a client that issues a stream of get and put requests to the key-value store. Once started, the client should act as a console, allowing users to issue a stream of requests. The client selects one replica server as the coordinator for all its requests. That is, all requests from a single client are handled by the same coordinator. You should be able to launch multiple clients, potentially issue requests to different coordinators at the same time.

  • Demonstration

After the submission deadline, every group will sign up for a demonstration time slot with the TA. All group members have to be present during the demonstration. You will show the capabilities of the key-value store that your group has implemented.

  • Github classroom

Once you have formed your group, you can access the assignment by first logging into your Github account created with your BU email. Then go to: https://classroom.github.com/g/0ReYBIBG.

Github classroom will ask you to create a team name. Please use a team name with both students’ BU email IDs. For example, if a group’s two students’ email IDs are jdoe@binghamton.edu and jsmith@binghamton.edu, you should create a team name called jdoe-jsmith.

Github classroom will automatically create a repository (e.g., if your team name is jdoe-jsmith, the repository will be named cs457-cs557-pa4-jdoe-jsmith). This is the repository you will push your code to.

2

If your partner has already created a repository, you can select your team from a list of existing teams and join

it.

The repository created by Github classroom is a private repository. Only the group members, course instructor, and teaching assistant are able to see this repository. Follow the instruction on the Github page to create a new repository on your local directory and link it to the Github repository.

We expect each repository to have at least three commits, with the first one and the last one more than 48 hours apart. Submissions that do not meet the three commits / 48 hours requirement will not be accepted or graded.

  • How to submit

To submit, commit and push your latest code to the private Github repository Github classroom created. Your commit should contain the following files:

  1. Your source code.

  1. A Makefile to compile your source code into executables. (It is okay if this executable is a bash script that calls the Java interpreter, as long as the command line argument follows the format required.)

  1. A Readme file describing:

    • the programming language(s)/tools that you used,

    • how to compile and run your code on remote.cs.binghamton.edu computers,

    • the tasks both group members worked on in this assignment,

    • completion status of the assignment, e.g., what has been implemented and tested, what has not,

    • anything else you want the TA to be aware of while grading your assignment.

  1. Two STATEMENT files, signed by each group member individually, containing the following statement:

I have done this assignment completely on my own and in collaboration with my partners. I have not copied my portion of the assignment, nor have I given the project solution to anyone else. I understand that if I am involved in plagiarism or cheating I will have to sign an official form that I have cheated and that this form will be stored in my official university record. I also understand that I will receive a grade of 0 for the involved assignment and my grade will be reduced by one level (e.g., from A to A- or from B+ to B) for my first offense, and that I will receive a grade of “F” for the course for any additional offense of any kind.”

After pushing your final commit to the Github repository, please submit your commit hash to myCourses. This helps us know your submission is ready for grading and which of your commits we should grade. We will not grade your assignment unless you have submitted the commit hash to myCourses before the assignment submission deadline.

Your assignment will be graded on the CS Department computers remote.cs.binghamton.edu. It is your responsibility to make sure that your code compiles and runs correctly on these remoteXX computers.

Your assignment must be your original work. We will use MOSS1 to detect plagiarism in programming assignments.

1https://theory.stanford.edu/~aiken/moss/

3