Migrating File Based Sentry Policies to Sentry Server

  1. Install the Sentry service.
  2. Remove the configuration for Hive and Impala to use Sentry policy files:
In Cloudera Manager for Hive:
  1. Navigate to Hive > Configuration > Service-Wide > Policy File Based Sentry > Enable Sentry Authorization using Policy Files
  2. Uncheck the box.
In Cloudera Manager for Impala:
  1. Navigate to Impala > Configuration > Service-Wide > Policy File Based Sentry > Enable Sentry Authorization using Policy Files
  2. Uncheck the box.
  3. Enable the Sentry Service:
In Cloudera Manager for Hive:
  1. Navigate to Hive > Configuration > Service-Wide > Sentry Service
  2. Click the radio button for the Sentry Service.
In Cloudera Manager for Impala:
  1. Navigate to Impala > Configuration > Service-Wide > Sentry Service
  2. Click the radio button for the Sentry Service.
  4. Stop the Sentry Service.
  5. Back up the Sentry database. The following steps write data into the Sentry database.
  6. Import the settings by running the following commands on the node where HiveServer2 is running:
    1. Set HIVE_HOME so that the Sentry commands work. It should contain bin/hive (typically /usr/lib/hive, or under /opt/cloudera/parcels):
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
  7. Validate the existing sentry-provider.ini file to make sure it does not have any errors, using the example syntax here:
    sentry --hive-config /etc/hive/conf --command config-tool -s file:///etc/sentry/conf/sentry-site.xml -i hdfs://nameservice1/user/hive/sentry/sentry-provider.ini -v
Note: If you get an error like the following:
Sentry server: HS2 Found configuration problems ERROR: Error processing file hdfs://nameservice1/user/hive/sentry/sentry-provider.ini Server name server1 in server=server1 is invalid. Expected HS2 ERROR: Failed to process global policy file hdfs://nameservice1/user/hive/sentry/sentry-provider.ini
it means that Sentry expects its server name to be HS2 by default, so you need to specify the server name as server1 (as specified in the sentry-provider.ini file).
To do that, add the snippet below to Sentry Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml and restart:
<property>
  <name>sentry.hive.server</name>
  <value>server1</value>
</property>
 
Ensure that the same setting is reflected in /etc/sentry/conf/sentry-site.xml on the host where Sentry is installed. If it does not take effect, copy the sentry-site.xml from the Cloudera Manager process directory, create a new sentry-site.xml in your home directory with that information, and reference it in the validation syntax as in the following steps.
  8. Set HIVE_CONF_DIR. This directory contains hive-site.xml and sentry-site.xml for Hive. On Cloudera Manager deployed systems it is set as follows:
    export HIVE_CONF_DIR="/var/run/cloudera-scm-agent/process/`ls -alrt /var/run/cloudera-scm-agent/process | grep HIVESERVER2 | tail -1 | awk '{print $9}'`"
  9. Run the Sentry config-tool:
    sentry --hive-config /etc/hive/conf --command config-tool --import --policyIni hdfs://nameservice1/user/hive/sentry/sentry-provider.ini -s file:///home/subbav/sentry-site.xml
    Important: The policy file must be specified as a fully qualified URI, for example:
    hdfs://namenode:8020/user/hive/sentry/sentry-provider.ini or file:///local/data/sentry/sentry-provider.ini
sentry --command config-tool --import -i <Policy_file_URI>
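The HIVE_CONF_DIR export in the step above works by picking the most recently modified HiveServer2 process directory. A minimal sketch of that same selection logic against a mock directory layout (all directory names here are invented for illustration):

```shell
# Mock a Cloudera Manager process directory (names are hypothetical).
mock=$(mktemp -d)
mkdir -p "$mock/120-hive-HIVESERVER2" "$mock/135-hive-HIVESERVER2" "$mock/99-impala-IMPALAD"
touch -t 202001010000 "$mock/120-hive-HIVESERVER2"   # older HiveServer2 dir
touch -t 202006010000 "$mock/135-hive-HIVESERVER2"   # newest HiveServer2 dir

# Same pipeline as the export above: the newest entry matching HIVESERVER2 wins.
latest=$(ls -alrt "$mock" | grep HIVESERVER2 | tail -1 | awk '{print $9}')
echo "$latest"   # prints the most recent HiveServer2 process directory
```

The `-t` sort plus `-r` reversal puts the newest matching directory last, which is why `tail -1` selects it.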
  10. Start the Sentry Service.
  11. Run commands in Beeline to test whether privileges are set correctly.
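The Beeline checks can be as simple as listing roles and grants; a sketch, where the JDBC URL and role name are placeholders for your own environment:

```shell
# Hypothetical HiveServer2 URL and role name; substitute your own.
URL="jdbc:hive2://localhost:10000/default"
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$URL" -e "SHOW ROLES;"
  beeline -u "$URL" -e "SHOW GRANT ROLE admin_role;"
else
  echo "beeline not on PATH; run the SHOW statements from a HiveServer2 client"
fi
```

If the imported grants match what the policy file defined, the migration succeeded.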

Dr Elephant Installation on Linux - Cloudera - Part 4

Dr Elephant Installation 

·        Clone Dr.elephant:

[mani@node1.manilab.com drelephant]$ pwd
/opt/drelephant

mkdir dr-ele       
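The clone command itself is not shown above; assuming the upstream LinkedIn repository (substitute an internal mirror if you use one), it would look like:

```shell
# Clone Dr.elephant into the dr-ele directory created above.
# Falls back to a message when the network (or git) is unavailable.
git clone https://github.com/linkedin/dr-elephant.git dr-ele \
  || echo "clone failed; check network access or use a local mirror"
```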

Now, inside /dr-ele/app/com/linkedin/drelephant/analysis/AnalyticJobGeneratorHadoop2.java, change the following Resource Manager values from http to https:

yarn.resourcemanager.webapp.http.address to yarn.resourcemanager.webapp.https.address

[cmndusr@cdts99hdbe01d analysis]$ cat AnalyticJobGeneratorHadoop2.java | grep https
  private static final String RESOURCE_MANAGER_ADDRESS = "yarn.resourcemanager.webapp.https.address";
  private static final String RM_NODE_STATE_URL = "https://%s/ws/v1/cluster/info";
    URL succeededAppsURL = new URL(new URL("https://" + _resourceManagerAddress), String.format(
    URL failedAppsURL = new URL(new URL("https://" + _resourceManagerAddress), String.format(

·        Compile Dr.elephant

[mani@node1.manilab.com dr-ele]$ pwd
/opt/drelephant/dr-ele

[mani@node1.manilab.com dr-ele]$ ./compile.sh ./compile.conf
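compile.conf pins the Hadoop and Spark versions the build targets; a minimal sketch (the version numbers below are placeholders — match them to the versions in your CDH parcel):

```
hadoop_version=2.6.0
spark_version=1.6.0
```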

·        Now, inside the ‘dist’ folder, dr-elephant-2.0.13.zip will be created. Extract the zip file inside /dist.

[mani@node1.manilab.com dist]$ pwd
/opt/drelephant/dr-ele/dist

[mani@node1.manilab.com dist]$ ll
total 88148
drwxr-xr-x 8 mani manidevl      131 Jan 18 07:42 dr-elephant-2.0.13
-rw-r--r-- 1 mani manidevl 90259919 Jan 18 07:33 dr-elephant-2.0.13.zip
Everything from now on happens inside the extracted folder (dr-elephant-2.0.13).

·        Now, in elephant.conf under /dr-ele/dist/dr-elephant-2.0.13/app-conf/, change the following values:

port=8083

# Database configuration
db_url=node1.manilab.com
db_name=drelephant
db_user=admin
db_password="*****"

keytab_user="mani"
keytab_location="/home/mani/mani.keytab"


jvm_args="-Devolutionplugin=disabled -DapplyEvolutions.default=false -mem 1024 -J-Xloggc:$project_root../logs/elephant/dr-gc.`date +'%Y%m%d%H%M'` -J-XX:+PrintGCDetails"
Give permission 777 to /user/history/done in HDFS so that Dr.elephant can read the job history files, e.g., hadoop fs -chmod -R 777 /user/history/done

·        Start Dr.elephant

./start.sh ../app-conf/
Make sure Dr.elephant started without any errors by checking dr.log.


·        Go to Dr.elephant UI

Change the hostname/IP according to your environment; you should be able to see the Dr.elephant dashboard at:

Hostname:<port>
In our case it is: http://node1.manilab.com:8083/
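Reachability can also be checked from the command line; a sketch using the example host above (swap in your own hostname and port):

```shell
# Prints the HTTP status code when the dashboard is up; otherwise a hint.
curl -fsS -o /dev/null -w "%{http_code}\n" "http://node1.manilab.com:8083/" \
  || echo "UI not reachable yet; check dr.log"
```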


·        Run a sample Hadoop job


After the job completes, you can see the analysis in the Dr.elephant UI.
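Any MapReduce or Spark job will do; for instance, the Pi estimator from the examples jar that ships with CDH (the jar path below is the usual parcel location — adjust it to your layout):

```shell
# Submit a small test job so Dr.elephant has something to analyze.
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
if command -v hadoop >/dev/null 2>&1; then
  hadoop jar "$EXAMPLES_JAR" pi 2 10   # 2 mappers, 10 samples each
else
  echo "hadoop not on PATH; submit the job from a cluster gateway host"
fi
```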

Dr Elephant Installation on Linux - Cloudera - Part 3



Setting Environment Variables:


Export the below environment variables 


export ELEPHANT_CONF_DIR=/opt/drelephant/dr-ele
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_CONF_DIR=$SPARK_HOME/conf
export PATH=$PATH:/opt/drelephant/activator/bin:$HADOOP_HOME/bin:/etc/hadoop/conf:/usr/local/bin/watchman
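After exporting, a quick sanity check that each path actually exists can save debugging later; a small sketch (it only reports, it does not fail the shell):

```shell
# Report any exported directory that does not exist on this host.
for d in "$HADOOP_HOME" "$HADOOP_CONF_DIR" "$SPARK_HOME" "$SPARK_CONF_DIR"; do
  [ -d "$d" ] && echo "ok: $d" || echo "missing: $d"
done
```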

Dr Elephant Installation on Linux - Cloudera - Part 2

Database Creation - Part 2

Install MySQL server and create the database, tables, and indexes.
·        Install:
yum install mysql-server
·        Start the service:
service mysqld start
·        Find the temporary password generated during installation:
sudo grep 'temporary password' /var/log/mysqld.log
2018-01-18T09:59:41.365811Z 1 [Note] A temporary password is generated for root@localhost: 1Dh3uczIw)Jd
·        Enter mysql:
mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.21

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

After that, on the mysql prompt:

·        Alter the user and provide the password:
ALTER USER 'root'@'localhost' IDENTIFIED BY '*****';
·        Create the admin user:
CREATE USER 'admin'@'node1.manilab.com' IDENTIFIED BY '*****';
·        Grant all privileges and flush:
GRANT ALL PRIVILEGES ON *.* TO 'admin'@'node1.manilab.com';
flush privileges;
·        Create the tables and indexes:
CREATE TABLE yarn_app_result (
  id               VARCHAR(50)   NOT NULL              COMMENT 'The application id, e.g., application_1236543456321_1234567',
  name             VARCHAR(100)  NOT NULL              COMMENT 'The application name',

  username         VARCHAR(50)   NOT NULL              COMMENT 'The user who started the application',
  queue_name       VARCHAR(50)   DEFAULT NULL          COMMENT 'The queue the application was submitted to',
  start_time       BIGINT        UNSIGNED NOT NULL     COMMENT 'The time in which application started',
  finish_time      BIGINT        UNSIGNED NOT NULL     COMMENT 'The time in which application finished',
  tracking_url     VARCHAR(255)  NOT NULL              COMMENT 'The web URL that can be used to track the application',
  job_type         VARCHAR(20)   NOT NULL              COMMENT 'The Job Type e.g, Pig, Hive, Spark, HadoopJava',
  severity         TINYINT(2)    UNSIGNED NOT NULL     COMMENT 'Aggregate severity of all the heuristics. Ranges from 0(LOW) to 4(CRITICAL)',
  score            MEDIUMINT(9)  UNSIGNED DEFAULT 0    COMMENT 'The application score which is the sum of heuristic scores',
  workflow_depth   TINYINT(2)    UNSIGNED DEFAULT 0    COMMENT 'The application depth in the scheduled flow. Depth starts from 0',
  scheduler        VARCHAR(20)   DEFAULT NULL          COMMENT 'The scheduler which triggered the application',
  job_name         VARCHAR(255)  NOT NULL DEFAULT ''   COMMENT 'The name of the job in the flow to which this app belongs',
  job_exec_id      VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A unique reference to a specific execution of the job/action(job in the workflow). This should filter all applications (mapreduce/spark) triggered by the job for a particular execution.',
  flow_exec_id     VARCHAR(255)  NOT NULL DEFAULT ''   COMMENT 'A unique reference to a specific flow execution. This should filter all applications fired by a particular flow execution. Note that if the scheduler supports sub-workflows, then this ID should be the super parent flow execution id that triggered the applications and sub-workflows.',
  job_def_id       VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A unique reference to the job in the entire flow independent of the execution. This should filter all the applications(mapreduce/spark) triggered by the job for all the historic executions of that job.',
  flow_def_id      VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A unique reference to the entire flow independent of any execution. This should filter all the historic mr jobs belonging to the flow. Note that if your scheduler supports sub-workflows, then this ID should reference the super parent flow that triggered all the jobs and sub-workflows.',
  job_exec_url     VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A url to the job execution on the scheduler',
  flow_exec_url    VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A url to the flow execution on the scheduler',
  job_def_url      VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A url to the job definition on the scheduler',
  flow_def_url     VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A url to the flow definition on the scheduler',

  PRIMARY KEY (id)
);

create index yarn_app_result_i1 on yarn_app_result (finish_time);
create index yarn_app_result_i2 on yarn_app_result (username,finish_time);
create index yarn_app_result_i3 on yarn_app_result (job_type,username,finish_time);
create index yarn_app_result_i4 on yarn_app_result (flow_exec_id);
create index yarn_app_result_i5 on yarn_app_result (job_def_id);
create index yarn_app_result_i6 on yarn_app_result (flow_def_id);
create index yarn_app_result_i7 on yarn_app_result (start_time);

CREATE TABLE yarn_app_heuristic_result (
  id                  INT(11)       NOT NULL AUTO_INCREMENT COMMENT 'The application heuristic result id',
  yarn_app_result_id  VARCHAR(50)   NOT NULL                COMMENT 'The application id',
  heuristic_class     VARCHAR(255)  NOT NULL                COMMENT 'Name of the JVM class that implements this heuristic',
  heuristic_name      VARCHAR(128)  NOT NULL                COMMENT 'The heuristic name',
  severity            TINYINT(2)    UNSIGNED NOT NULL       COMMENT 'The heuristic severity ranging from 0(LOW) to 4(CRITICAL)',
  score               MEDIUMINT(9)  UNSIGNED DEFAULT 0      COMMENT 'The heuristic score for the application. score = severity * number_of_tasks(map/reduce) where severity not in [0,1], otherwise score = 0',

  PRIMARY KEY (id),
  CONSTRAINT yarn_app_heuristic_result_f1 FOREIGN KEY (yarn_app_result_id) REFERENCES yarn_app_result (id)
);

create index yarn_app_heuristic_result_i1 on yarn_app_heuristic_result (yarn_app_result_id);
create index yarn_app_heuristic_result_i2 on yarn_app_heuristic_result (heuristic_name,severity);

CREATE TABLE yarn_app_heuristic_result_details (
  yarn_app_heuristic_result_id  INT(11) NOT NULL                  COMMENT 'The application heuristic result id',
  name                          VARCHAR(128) NOT NULL DEFAULT ''  COMMENT 'The analysis detail entry name/key',
  value                         VARCHAR(255) NOT NULL DEFAULT ''  COMMENT 'The analysis detail value corresponding to the name',
  details                       TEXT                              COMMENT 'More information on analysis details. e.g, stacktrace',

  PRIMARY KEY (yarn_app_heuristic_result_id,name),
  CONSTRAINT yarn_app_heuristic_result_details_f1 FOREIGN KEY (yarn_app_heuristic_result_id) REFERENCES yarn_app_heuristic_result (id)
);

create index yarn_app_heuristic_result_details_i1 on yarn_app_heuristic_result_details (name);

create index yarn_app_result_i8 on yarn_app_result (queue_name);

alter table yarn_app_result add column resource_used    BIGINT        UNSIGNED DEFAULT 0    COMMENT 'The resources used by the job in MB Seconds';
alter table yarn_app_result add column resource_wasted  BIGINT        UNSIGNED DEFAULT 0    COMMENT 'The resources wasted by the job in MB Seconds';

alter table yarn_app_result add column total_delay      BIGINT        UNSIGNED DEFAULT 0    COMMENT 'The total delay in starting of mappers and reducers';
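One step the listing above does not show is creating the drelephant database itself before running the DDL. A sketch, assuming the admin account created earlier and that the statements above were saved to a local file (ddl.sql is a hypothetical name); the mysql invocations are commented out because they prompt for a password:

```shell
# Create the database first, then load the table/index DDL.
echo 'CREATE DATABASE IF NOT EXISTS drelephant;' > create_db.sql
# mysql -u admin -p -h node1.manilab.com < create_db.sql
# mysql -u admin -p -h node1.manilab.com drelephant < ddl.sql
cat create_db.sql
```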

Dr Elephant Installation on Linux - Cloudera - Part 1

Software Required

      Java version 1.8 should be installed
$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

·        Install nodejs, npm, zip, http-parser, git, and libtool; generate a keytab

yum install -y nodejs
yum install -y npm
yum install -y zip
For http-parser, download the RPM and copy it onto the host (e.g., with WinSCP), then install it with rpm -ivh
yum install -y git
yum install -y libtool
Generate the keytab.

·        Install Watchman  :
URL : https://facebook.github.io/watchman/docs/install.html

$ git clone https://github.com/facebook/watchman.git
$ cd watchman
$ git checkout v4.9.0  # the latest stable release
$ ./autogen.sh
$ ./configure
$ make
$ make install
              
Remember to export this path later: /usr/local/bin/watchman

·        Install Activator :
wget https://downloads.typesafe.com/typesafe-activator/1.3.12/typesafe-activator-1.3.12.zip
unzip typesafe-activator-1.3.12.zip

·        Install Play framework (Not Mandatory)

Download the file from here:

Unzip the file into a folder to which you have write access; I downloaded the file into /home/<folder>.


Add activator to your PATH, and also add it to your login profile ($HOME/.profile):


export PATH=$PATH:/home/mani/activator-dist-1.3.12/bin/