Yige

Livy Series - Overview of Livy's Core Functions and Modules#

Content compiled from:

  1. Apache Livy Implementation Ideas and Module Overview

Core Functionality of Livy#

  1. The client sends tasks to the Livy server via HTTP requests
  2. The Livy server receives each request and routes it to a specific method of the class responsible for that resource
  3. Authentication (can be disabled)
  4. Start a Spark application based on the task request
  5. Execute the user-specified task, with support for querying its running status, fetching results, sharing a SparkContext, and stopping gracefully
  6. Fault tolerance: a task can be recovered to the state it was in before a failure
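
To make steps 1 and 5 more concrete, here is a minimal sketch that builds (but does not send) the kind of HTTP requests a client might issue. The paths `/sessions` and `/sessions/{id}/statements` and the default port 8998 follow Livy's documented REST API; the host `localhost`, the ids `0`, and the helper name `livy_request` are assumptions made for this example.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumed host; 8998 is Livy's default port


def livy_request(method, path, payload=None):
    """Build (but do not send) a JSON request for the Livy REST API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        LIVY_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Step 1: ask the server to start an interactive PySpark session.
create_session = livy_request("POST", "/sessions", {"kind": "pyspark"})

# Step 5: submit a code snippet (statement) to session 0, then poll it.
# The ids 0 are placeholders; real ids come from the server's responses.
run_statement = livy_request("POST", "/sessions/0/statements", {"code": "1 + 1"})
poll_statement = livy_request("GET", "/sessions/0/statements/0")
```

Sending is left to the caller, for example with `urllib.request.urlopen(create_session)` against a running Livy server.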

Module Overview#

These functions map onto the following modules:

  • Client
  • Router
  • Access Management
  • Generate Spark App
  • Interactive Driver (only for session tasks, not for batch)
  • State Data Storage

Router#

The Livy server exposes a REST API, and the requests sent by the Client are CRUD operations on various resources (URIs). The router's core responsibility is to decide which function of which class handles a given operation on a given resource. The core class of this module is SessionServlet, which has two subclasses, InteractiveSessionServlet and BatchSessionServlet, that route session-related and batch-related requests, respectively.
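
The routing idea can be sketched as a table that maps an HTTP method and a path pattern to a handler. This is only an illustration of the concept, not Livy's servlet code; the handler names and the `route` function are hypothetical.

```python
import re


# Hypothetical handlers standing in for servlet methods.
def list_sessions():
    return "list sessions"


def get_session(session_id):
    return f"get session {session_id}"


def delete_session(session_id):
    return f"delete session {session_id}"


# Route table: (HTTP method, path pattern, handler).
ROUTES = [
    ("GET", re.compile(r"^/sessions$"), list_sessions),
    ("GET", re.compile(r"^/sessions/(\d+)$"), get_session),
    ("DELETE", re.compile(r"^/sessions/(\d+)$"), delete_session),
]


def route(method, path):
    """Call the first handler whose method and path pattern both match."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if route_method == method and match:
            # Captured path segments (here: numeric ids) become arguments.
            return handler(*(int(group) for group in match.groups()))
    raise LookupError(f"no route for {method} {path}")
```

For example, `route("GET", "/sessions/3")` ends up in `get_session(3)`, while an unknown path raises `LookupError`.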

Access Management#

Permissions are managed by the AccessManager class, which maintains several different levels of users:

  • superUser
  • modifyUser
  • viewUser
  • allowedUser

It also maintains different levels of ACLs (access control lists), built from those user groups:

  • viewAcls: superUsers ++ modifyUsers ++ viewUsers, corresponding to view permissions
  • modifyAcls: superUsers ++ modifyUsers, corresponding to modify permissions (including kill permissions)
  • superAcls: superUsers, with all permissions
  • allowedAcls: superUsers ++ modifyUsers ++ viewUsers ++ allowedUsers, representing the complete set of ACLs
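
The way the ACLs are composed from the user groups can be sketched with plain sets. This is a toy model of the relationships listed above, not Livy's AccessManager; the class name `AccessSketch` and its method names are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AccessSketch:
    """Toy model of the user levels and ACLs listed above (not Livy's code)."""

    super_users: set = field(default_factory=set)
    modify_users: set = field(default_factory=set)
    view_users: set = field(default_factory=set)
    allowed_users: set = field(default_factory=set)

    @property
    def super_acls(self):
        return set(self.super_users)

    @property
    def modify_acls(self):
        return self.super_users | self.modify_users

    @property
    def view_acls(self):
        return self.super_users | self.modify_users | self.view_users

    @property
    def allowed_acls(self):
        return self.view_acls | self.allowed_users

    def can_view(self, user):
        return user in self.view_acls

    def can_modify(self, user):
        return user in self.modify_acls


acl = AccessSketch(
    super_users={"admin"},
    modify_users={"alice"},
    view_users={"bob"},
    allowed_users={"carol"},
)
```

In this model `alice` can both view and modify, `bob` can only view, and `carol` appears only in the full allowed set.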

Generate Spark App#

Session and batch tasks differ both in how their Spark Apps are generated and in the Spark Apps that result.

Main classes involved in generating batch Spark Apps:

  • SparkProcessBuilder: Extracts everything needed to run a Spark App from livyConf, including mainClass, executableFile, deployMode, conf, master, queue, env, and the resource configuration for the driver and executors, and ultimately generates a spark-submit command to start the Spark App.

  • SparkYarnApp: Runs the startup command generated by SparkProcessBuilder and monitors and manages the running Spark App: fetching its status, logs, and diagnostic information, killing it, and so on. (At the time of writing, Livy supported only local and yarn modes.)
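
As an illustration of the batch path, the sketch below assembles a spark-submit argument list from a dictionary of settings. The command-line flags are standard spark-submit options; the function name `build_spark_submit` and the dictionary keys are hypothetical and do not mirror SparkProcessBuilder's actual fields.

```python
def build_spark_submit(conf):
    """Assemble a spark-submit argument list from a plain dict of settings."""
    cmd = [conf.get("executable", "spark-submit")]
    if "master" in conf:
        cmd += ["--master", conf["master"]]
    if "deploy_mode" in conf:
        cmd += ["--deploy-mode", conf["deploy_mode"]]
    if "main_class" in conf:
        cmd += ["--class", conf["main_class"]]
    if "queue" in conf:
        cmd += ["--queue", conf["queue"]]  # only meaningful on YARN
    # Arbitrary Spark properties are passed as repeated --conf key=value pairs.
    for key, value in sorted(conf.get("spark_conf", {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(conf["file"])  # the application jar or script
    cmd += conf.get("args", [])  # arguments for the application itself
    return cmd


example_cmd = build_spark_submit({
    "master": "yarn",
    "deploy_mode": "cluster",
    "main_class": "com.example.Main",  # placeholder class name
    "queue": "default",
    "spark_conf": {"spark.executor.memory": "2g", "spark.driver.memory": "1g"},
    "file": "app.jar",  # placeholder artifact
    "args": ["--date", "2020-01-01"],
})
```

The resulting list could be handed to `subprocess.Popen`, which is roughly the role the text assigns to the class that runs and monitors the command.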

Main classes involved in generating session Spark Apps:

  • ContextLauncher: Used to start a new Spark App (via SparkLauncher) and obtain information on how to connect to its driver (address, clientId, and secret).

  • RSCClient: Establishes a connection with the Spark Driver, sends requests (to create statements and jobs, query their status, results, and logs, modify them, and so on), and receives the responses.

Interactive Driver#

The core class is RSCDriver, which inherits from RpcDispatcher. RpcDispatcher receives the RPC requests sent by RSCClient and, based on the type of each request, calls the corresponding RSCDriver method to handle the information the request carries. For the most important kind of request, executing a code snippet (statement), it hands the work to repl/Session, which in turn calls the Interpreter that matches the session kind to actually run the code.
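
The dispatch-by-request-type idea can be sketched as follows. The message classes, the `handle_` naming convention, and the toy interpreters are all assumptions made for illustration; they are not the actual RSCDriver or Interpreter APIs.

```python
from dataclasses import dataclass


@dataclass
class RunStatement:  # hypothetical request: execute a code snippet
    code: str


@dataclass
class GetStatus:  # hypothetical request: report driver state
    pass


class Dispatcher:
    """Looks up a handler method named after the message's class."""

    def dispatch(self, message):
        handler = getattr(self, f"handle_{type(message).__name__}", None)
        if handler is None:
            raise NotImplementedError(type(message).__name__)
        return handler(message)


class DriverSketch(Dispatcher):
    # Toy interpreters keyed by session kind; they only tag the code
    # instead of executing it.
    INTERPRETERS = {
        "spark": lambda code: f"[scala] {code}",
        "pyspark": lambda code: f"[python] {code}",
    }

    def __init__(self, kind):
        self.interpret = self.INTERPRETERS[kind]
        self.executed = 0

    def handle_RunStatement(self, message):
        self.executed += 1
        return self.interpret(message.code)

    def handle_GetStatus(self, message):
        return {"executed": self.executed}


driver = DriverSketch("pyspark")
result = driver.dispatch(RunStatement("1 + 1"))
status = driver.dispatch(GetStatus())
```

Adding a new request type in this sketch only requires a new message class and a matching `handle_` method; the dispatcher itself stays unchanged.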

State Data Storage#

The core class is StateStore, which stores state data in key-value form. At the time of writing, there are implementations backed by the filesystem and by Zookeeper. SessionStore is built on top of it to provide higher-level APIs for storing and recovering sessions.
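
A minimal sketch of this layering: an abstract key-value store, a file-backed implementation, and a session-level helper on top. All class and method names here are hypothetical, and having the session helper wrap a store is a choice made for the sketch rather than a statement about Livy's class hierarchy.

```python
import json
import os
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """Abstract key-value state store."""

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def remove(self, key): ...


class FileStore(KeyValueStore):
    """Stores each key as one JSON file under a root directory."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Flatten the key into a single file name (good enough for a sketch).
        return os.path.join(self.root, key.replace("/", "_") + ".json")

    def set(self, key, value):
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(value, f)

    def get(self, key):
        try:
            with open(self._path(key), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def remove(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            pass


class SessionStoreSketch:
    """Session-level API layered on any KeyValueStore."""

    def __init__(self, store):
        self.store = store

    def save(self, session_type, session_id, state):
        self.store.set(f"{session_type}/{session_id}", state)

    def recover(self, session_type, session_id):
        return self.store.get(f"{session_type}/{session_id}")

    def delete(self, session_type, session_id):
        self.store.remove(f"{session_type}/{session_id}")
```

Because the session helper depends only on the abstract interface, a Zookeeper-backed store could be swapped in without changing it.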
