The Actor Model
This post is primarily a summary of a talk given by John Murray. You may not get anything out of it that you wouldn't get out of the talk, but I think it could be helpful when you want to refer to a particular concept. In addition, writing about these concepts helps solidify them in my head.

What is the actor model?

The actor model is a conceptual model to deal with concurrent computation. It defines some general rules for how the system’s components should behave and interact with each other.
Properties of actors:
  • Unlike a thread, which no longer exists once it finishes, an actor is persistent.
  • Unlike goroutines, threads and futures, which do not by themselves encapsulate state, actors encapsulate internal state.
  • Actors are asynchronous.
What can actors do?
  • Create new actors, similar to how a main thread creates other threads.
  • Receive messages and, in response:
    • make local decisions (e.g. alter local state)
    • perform arbitrary side-effecting actions (e.g. writing to a database or a log file, or anything else that changes the global state of your application)
    • send messages to other actors
    • respond to the sender zero or more times. In the procedural programming paradigm, by contrast, a function call either returns nothing or returns exactly one value.
  • Process exactly one message at a time. (Messages are stored in the actor's mailbox until they are processed.)
Actors do not communicate by sharing memory; instead, they share memory by communicating.
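To make the list above concrete, here is a minimal sketch in Python. It is not from the talk: the names `System`, `Actor`, `Splitter` and `Collector` are made up for illustration, and the scheduler is a toy single-threaded loop, whereas real actor runtimes spread actors over threads. It shows private state, a mailbox, one message handled at a time, and a sender receiving zero or more replies.

```python
from collections import deque


class System:
    """A toy single-threaded scheduler (hypothetical, for illustration only)."""

    def __init__(self):
        self.actors = []

    def spawn(self, actor_cls):
        actor = actor_cls(self)
        self.actors.append(actor)
        return actor

    def run(self):
        # Each actor handles at most one message per round, until all mailboxes are empty.
        while any(a.mailbox for a in self.actors):
            for actor in list(self.actors):
                if actor.mailbox:
                    message, sender = actor.mailbox.popleft()
                    actor.receive(message, sender)


class Actor:
    def __init__(self, system):
        self.system = system
        self.mailbox = deque()

    def send(self, message, sender=None):
        # Asynchronous: the message is only enqueued, nothing runs yet.
        self.mailbox.append((message, sender))


class Collector(Actor):
    def __init__(self, system):
        super().__init__(system)
        self.received = []  # private state, only changed while handling a message

    def receive(self, message, sender):
        self.received.append(message)


class Splitter(Actor):
    """Replies once per word, so one request yields zero or more replies."""

    def receive(self, message, sender):
        for word in message.split():
            sender.send(word, sender=self)


def demo():
    system = System()
    collector = system.spawn(Collector)
    splitter = system.spawn(Splitter)
    splitter.send("actors reply many times", sender=collector)
    splitter.send("", sender=collector)  # this request produces zero replies
    system.run()
    return collector.received
```

Calling `demo()` returns `['actors', 'reply', 'many', 'times']`: four replies to the first request and none to the second.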
Example - Checking account:
Suppose the current balance of an account shared by Alice and Bob is $80, and we want to make sure that the balance is never negative. Alice wants to withdraw $60 and Bob wants to withdraw $50.
Approach #1:
    struct Checking {
        balance int
    }

    // Check-then-act with no synchronization
    if (Checking.balance >= withdrawAmt) {
        Checking.balance -= withdrawAmt;
        return true;
    } else {
        return false;
    }
However, it's easy to create a situation that violates the invariant: if Alice and Bob submit their requests at the same time, both checks can pass before either withdrawal is applied, and the interleaved execution leaves a negative balance.
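The bad interleaving can be replayed deterministically. The sketch below is my own illustration (the helper names `check` and `debit` are hypothetical): it splits a withdrawal into its two steps and runs them in the unlucky order by hand, so no real threads are needed.

```python
class Checking:
    def __init__(self, balance):
        self.balance = balance


def check(account, amount):
    """Step 1 of a withdrawal: is there enough money?"""
    return account.balance >= amount


def debit(account, amount):
    """Step 2 of a withdrawal: take the money out."""
    account.balance -= amount


def replay_bad_interleaving():
    account = Checking(80)
    alice_ok = check(account, 60)  # Alice sees 80 >= 60
    bob_ok = check(account, 50)    # Bob also sees 80 >= 50, before Alice debits
    if alice_ok:
        debit(account, 60)         # balance becomes 20
    if bob_ok:
        debit(account, 50)         # balance becomes -30, invariant broken
    return account.balance
```

`replay_bad_interleaving()` returns `-30`: both requests were approved against the same stale balance.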
Approach #2 - Using locks:
    struct Checking {
        balance int
        lock Mutex
    }

    // Hold the lock across both the check and the update
    Checking.lock.Lock();
    success = false;
    if (Checking.balance >= withdrawAmt) {
        Checking.balance -= withdrawAmt;
        success = true;
    }
    Checking.lock.Unlock();
    return success;
Introducing a mutex solves the problem for us: only one person can update the balance at any time. While this is correct, in the real world using locks is often expensive and complicated. You need to make sure that locks are acquired in the correct order, and deadlocks are possible.
In general, while locks seem to be the natural remedy for upholding invariants with multiple threads, in practice they are inefficient and easily lead to deadlocks in any application of real-world scale. Even worse, distributed locks, while they exist, offer limited potential for scaling out.
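For comparison, here is a runnable sketch of the lock-based approach using Python's `threading.Lock`. The class name `LockedChecking` and the helper `run_concurrent_withdrawals` are assumptions of mine, not something from the talk. Note that each caller blocks while the other holds the lock.

```python
import threading


class LockedChecking:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # The check and the debit form one critical section.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_concurrent_withdrawals():
    account = LockedChecking(80)
    results = {}

    def request(name, amount):
        results[name] = account.withdraw(amount)

    threads = [
        threading.Thread(target=request, args=("alice", 60)),
        threading.Thread(target=request, args=("bob", 50)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance, results
```

Which of the two requests wins depends on thread scheduling, but exactly one succeeds and the balance ends at either 20 or 30, never below zero.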
Approach #3 - Using actors:
    Actor Checking {
        var balance = 80

        // Handle incoming messages
        def receive = {
            case Withdraw(amt) =>
                if (balance >= amt) {
                    balance -= amt
                    // Instead of returning, we send a message to the sender
                    sender sendMsg true
                } else {
                    sender sendMsg false
                }
        }
    }
If we make two requests:
    // Alice sends a message to withdraw 60 dollars
    Checking sendMsg Withdraw(60)

    // Bob sends a message to withdraw 50 dollars
    Checking sendMsg Withdraw(50)
The two requests are stored in the mailbox, and the rule of processing only one message at a time ensures that only one request goes through; the other person receives a "false" message. Modifying the internal state of the checking actor is only possible via messages, which are processed one at a time, eliminating races when trying to keep invariants. Even better, the senders are not blocked as they would be when using locks. Millions of actors can be efficiently scheduled on a dozen threads, reaching the full potential of modern CPUs.
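The same account can be sketched as a runnable Python actor. This is an approximation under my own assumptions: a `queue.Queue` stands in for the mailbox, one thread per actor stands in for a real scheduler, and `None` is a made-up stop signal. Replies are sent to a queue supplied by the sender instead of being returned.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Withdraw:
    amount: int
    reply_to: queue.Queue  # where the actor sends its answer


class CheckingActor:
    def __init__(self, balance):
        self._balance = balance          # only touched by the actor's own thread
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        self._mailbox.put(message)       # the sender does not wait for processing

    def _run(self):
        while True:
            message = self._mailbox.get()  # exactly one message at a time
            if message is None:            # hypothetical stop signal
                break
            if isinstance(message, Withdraw):
                ok = self._balance >= message.amount
                if ok:
                    self._balance -= message.amount
                message.reply_to.put(ok)   # reply with a message, not a return value

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()
        return self._balance


def demo():
    checking = CheckingActor(80)
    alice_inbox, bob_inbox = queue.Queue(), queue.Queue()
    checking.send(Withdraw(60, alice_inbox))
    checking.send(Withdraw(50, bob_inbox))
    alice_ok = alice_inbox.get(timeout=1)
    bob_ok = bob_inbox.get(timeout=1)
    return alice_ok, bob_ok, checking.stop()
```

Because both messages are enqueued from the same thread, Alice's request is processed first and `demo()` returns `(True, False, 20)`. No lock appears anywhere in the account logic.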
Properties of communication:
  • No channels or intermediaries (unlike CSP, for example)
  • "Best effort" delivery (i.e. whatever your underlying protocol is, from the actor's perspective there are no timeouts or retries)
  • At-most-once delivery
  • A message can take arbitrarily long to be delivered. (In the actor model, there is no concept of "time".)
  • No message ordering guarantees
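These guarantees are weak, so it can help to see what a receiver may actually observe. The following is a small simulation I wrote for illustration, not a model of any real transport: the function `unreliable_deliver`, its `drop_rate` and the fixed `seed` are all assumptions. Each message arrives at most once, possibly never, and in no particular order.

```python
import random


def unreliable_deliver(messages, drop_rate=0.3, seed=0):
    """Simulate best-effort, at-most-once, unordered delivery."""
    rng = random.Random(seed)
    # Each message is either dropped or delivered once; it is never duplicated.
    arrived = [m for m in messages if rng.random() >= drop_rate]
    # Delivery order is unrelated to send order.
    rng.shuffle(arrived)
    return arrived


def demo():
    sent = [f"msg-{i}" for i in range(10)]
    received = unreliable_deliver(sent)
    missing = sorted(set(sent) - set(received))
    return sent, received, missing
```

An actor that needs stronger guarantees, such as acknowledgements, retries or sequence numbers, has to build them on top of plain messages.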

Address

An address identifies an actor. However, it may also represent a proxy/forwarder to an actor (e.g. a load balancer). Addresses contain location (e.g. IP address) and transport information (e.g. TCP/UDP). They give us the notion of location transparency: as programmers, we don't need to care where an actor lives, as long as we can send messages to it. In other words, actors can live in the same process or on different machines, but the way you communicate with them is exactly the same.
One address may represent many actors (a pool).
One actor may have many addresses (e.g. A, B and C each use a different address to reach it).
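A rough sketch of that indirection, with invented names (`Address`, `Worker`) and a round-robin policy that I chose arbitrarily: the sender only ever talks to an address, which may fan out to a pool or be one of several handles to the same actor.

```python
import itertools
from collections import deque


class Worker:
    def __init__(self, name):
        self.name = name
        self.mailbox = deque()


class Address:
    """An opaque handle; senders never see which actor sits behind it."""

    def __init__(self, targets):
        self._targets = itertools.cycle(targets)  # round-robin over the pool

    def send(self, message):
        next(self._targets).mailbox.append(message)


def demo():
    a, b = Worker("a"), Worker("b")
    pool = Address([a, b])   # one address, many actors
    direct = Address([a])    # a second address for the same actor
    for i in range(4):
        pool.send(f"job-{i}")
    direct.send("ping")
    return list(a.mailbox), list(b.mailbox)
```

`demo()` returns `(['job-0', 'job-2', 'ping'], ['job-1', 'job-3'])`. In a real system the address would also hide whether the target is local or on another machine.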

Handling failure

The running state of an actor is monitored and managed by another actor, which is called the supervisor.
Properties of supervision:
  • Constantly monitors the running state of an actor
  • Performs actions based on the state of the actor (e.g. restarting the actor)
But who supervises the supervisor? In the actor model, there is a supervision tree. Similar to the organizational structure of a company, managers manage their direct reports and are in turn managed by their own managers. At the top, there is an oracle provided by the framework (e.g. the root guardian in Akka). It never dies and has some default behaviors for handling exceptions and errors.
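As a simplified illustration of supervision, here is a sketch with a single supervisor and a single child. The `Supervisor` and `Worker` classes and the "always restart" strategy are my own assumptions; real frameworks offer several strategies and a full tree.

```python
class Worker:
    def __init__(self):
        self.processed = []

    def receive(self, message):
        if message == "boom":
            raise ValueError("worker crashed")
        self.processed.append(message)


class Supervisor:
    """Owns a child actor and replaces it with a fresh instance when it fails."""

    def __init__(self, child_factory):
        self._child_factory = child_factory
        self.child = child_factory()
        self.restarts = 0

    def deliver(self, message):
        try:
            self.child.receive(message)
        except Exception:
            # Restart strategy: drop the failing message and start from clean state.
            self.child = self._child_factory()
            self.restarts += 1


def demo():
    supervisor = Supervisor(Worker)
    for message in ["a", "boom", "b"]:
        supervisor.deliver(message)
    return supervisor.restarts, supervisor.child.processed
```

`demo()` returns `(1, ['b'])`: the crash cost the worker its earlier state, but the system kept running without the sender having to handle the failure.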

Transparent life-cycle management

  • Addresses do not change during a restart
  • Mailboxes (queues) persist outside the actor instance
Addresses "encapsulate" the mailbox and the actor.
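One way to picture this, as a sketch under my own assumptions (the `ActorRef` and `Echo` names are hypothetical): the address object owns the mailbox and merely points at the current actor instance, so a restart swaps the instance while queued messages and the address itself stay put.

```python
from collections import deque


class Echo:
    def __init__(self):
        self.seen = []

    def receive(self, message):
        if message == "boom":
            raise RuntimeError("crash")
        self.seen.append(message)


class ActorRef:
    """The stable address: owns the mailbox and points at the current instance."""

    def __init__(self, factory):
        self._factory = factory
        self._instance = factory()
        self._mailbox = deque()  # lives outside the actor instance
        self.generation = 0

    def send(self, message):
        self._mailbox.append(message)

    def process_all(self):
        while self._mailbox:
            message = self._mailbox.popleft()
            try:
                self._instance.receive(message)
            except Exception:
                # Restart: new instance, same mailbox, same address.
                self._instance = self._factory()
                self.generation += 1

    def state(self):
        return self._instance.seen


def demo():
    ref = ActorRef(Echo)
    for message in ["a", "boom", "b", "c"]:
        ref.send(message)  # all four are queued before any is processed
    ref.process_all()
    return ref.generation, ref.state()
```

`demo()` returns `(1, ['b', 'c'])`: the messages queued behind the crash were still delivered to the restarted instance through the same `ref`.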

When to use actors?

  1. Processing pipeline
  2. Streaming data
  3. Multi-user concurrency
  4. System with high up-time requirement
  5. Applications with shared state

When not to use actors?

  1. Non-concurrent systems
  2. Performance critical applications
Note: It's important to keep in mind that actors are just an abstraction. They exist on top of processes and threads. If you need fine-grained control over the running threads (e.g. interrupts), then you want to use something else.

Drawbacks:

  • "Too many actors"
  • Testing
  • Debugging

Next steps: