Migrating File Based Sentry Policies to Sentry Server

  1. Install the Sentry service.
  2. Remove the configuration for Hive and Impala to use Sentry policy files:
In Cloudera Manager for Hive:
  1. Navigate to Hive > Configuration > Service-Wide > Policy File Based Sentry > Enable Sentry Authorization using Policy Files
  2. Uncheck the box.
In Cloudera Manager for Impala:
  1. Navigate to Impala > Configuration > Service-Wide > Policy File Based Sentry > Enable Sentry Authorization using Policy Files
  2. Uncheck the box.
  3. Enable the Sentry Service:
In Cloudera Manager for Hive:
  1. Navigate to Hive > Configuration > Service-Wide > Sentry Service
  2. Click the radio button for the Sentry Service.
In Cloudera Manager for Impala:
  1. Navigate to Impala > Configuration > Service-Wide > Sentry Service
  2. Click the radio button for the Sentry Service.
  4. Stop the Sentry Service.
  5. Back up the Sentry database. The following steps write data into the Sentry database.
  6. Import the settings by running the following commands on the node where HiveServer2 is running:
    1. Set HIVE_HOME so that the Sentry commands work. It should contain bin/hive (typically /usr/lib/hive, or under /opt/cloudera/parcels):
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
  7. Validate the existing sentry-provider.ini file to make sure it does not have any errors, using the example syntax here:
    sentry --hive-config /etc/hive/conf --command config-tool -s file:///etc/sentry/conf/sentry-site.xml -i hdfs://nameservice1/user/hive/sentry/sentry-provider.ini -v
Note: If you get an error like the following:
Sentry server: HS2 Found configuration problems ERROR: Error processing file hdfs://nameservice1/user/hive/sentry/sentry-provider.ini Server name server1 in server=server1 is invalid. Expected HS2 ERROR: Failed to process global policy file hdfs://nameservice1/user/hive/sentry/sentry-provider.ini
it means that Sentry expects its server name to be HS2 by default, so you need to specify the server name as server1 (as specified in the sentry-provider.ini file).
To do that, add the snippet below to Sentry Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml and restart:
<property>
  <name>sentry.hive.server</name>
  <value>server1</value>
</property>
 
Ensure that the same setting is reflected in /etc/sentry/conf/sentry-site.xml on the host where Sentry is installed. If it does not take effect, copy the sentry-site.xml from the Cloudera Manager process directory, create a new sentry-site.xml in your home directory with that information, and reference it in the validation syntax as in the following steps.
  8. Set HIVE_CONF_DIR. This directory contains hive-site.xml and sentry-site.xml for Hive. On Cloudera Manager deployed systems it is set as follows:
    export HIVE_CONF_DIR="/var/run/cloudera-scm-agent/process/`ls -alrt /var/run/cloudera-scm-agent/process | grep HIVESERVER2 | tail -1 | awk '{print $9}'`"
  9. Run the Sentry config-tool:
    sentry --hive-config /etc/hive/conf --command config-tool --import --policyIni hdfs://nameservice1/user/hive/sentry/sentry-provider.ini -s file:///home/subbav/sentry-site.xml
    Important: The policy file must be specified as a fully qualified URI, for example:
    hdfs://namenode:8020/user/hive/sentry/sentry-provider.ini or file:///local/data/sentry/sentry-provider.ini
sentry --command config-tool --import -i <Policy_file_URI>
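The HIVE_CONF_DIR export in the step above works by picking the most recently modified HiveServer2 process directory. A minimal sketch of that same selection logic against a mock directory layout (all directory names here are invented for illustration):

```shell
# Mock a Cloudera Manager process directory (names are hypothetical).
mock=$(mktemp -d)
mkdir -p "$mock/120-hive-HIVESERVER2" "$mock/135-hive-HIVESERVER2" "$mock/99-impala-IMPALAD"
touch -t 202001010000 "$mock/120-hive-HIVESERVER2"   # older HiveServer2 dir
touch -t 202006010000 "$mock/135-hive-HIVESERVER2"   # newest HiveServer2 dir

# Same pipeline as the export above: the newest entry matching HIVESERVER2 wins.
latest=$(ls -alrt "$mock" | grep HIVESERVER2 | tail -1 | awk '{print $9}')
echo "$latest"   # prints the most recent HiveServer2 process directory
```

The `-t` sort plus `-r` reversal puts the newest matching directory last, which is why `tail -1` selects it.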
  10. Start the Sentry Service.
  11. Run commands in Beeline to test whether privileges are set correctly.
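The Beeline checks can be as simple as listing roles and grants; a sketch, where the JDBC URL and role name are placeholders for your own environment:

```shell
# Hypothetical HiveServer2 URL and role name; substitute your own.
URL="jdbc:hive2://localhost:10000/default"
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$URL" -e "SHOW ROLES;"
  beeline -u "$URL" -e "SHOW GRANT ROLE admin_role;"
else
  echo "beeline not on PATH; run the SHOW statements from a HiveServer2 client"
fi
```

If the imported grants match what the policy file defined, the migration succeeded.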

Dr Elephant Installation on Linux - Cloudera - Part 4

Dr Elephant Installation 

·        Clone Dr.elephant:

[mani@node1.manilab.com drelephant]$ pwd
/opt/drelephant

mkdir dr-ele       
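The clone command itself is not shown above; assuming the upstream LinkedIn repository (substitute an internal mirror if you use one), it would look like:

```shell
# Clone Dr.elephant into the dr-ele directory created above.
# Falls back to a message when the network (or git) is unavailable.
git clone https://github.com/linkedin/dr-elephant.git dr-ele \
  || echo "clone failed; check network access or use a local mirror"
```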

Now, inside /dr-ele/app/com/linkedin/drelephant/analysis/AnalyticJobGeneratorHadoop2.java, change the following Resource Manager values from http to https:

yarn.resourcemanager.webapp.http.address to yarn.resourcemanager.webapp.https.address

[cmndusr@cdts99hdbe01d analysis]$ cat AnalyticJobGeneratorHadoop2.java | grep https
  private static final String RESOURCE_MANAGER_ADDRESS = "yarn.resourcemanager.webapp.https.address";
  private static final String RM_NODE_STATE_URL = "https://%s/ws/v1/cluster/info";
    URL succeededAppsURL = new URL(new URL("https://" + _resourceManagerAddress), String.format(
    URL failedAppsURL = new URL(new URL("https://" + _resourceManagerAddress), String.format(

·        Compile Dr.elephant

[mani@node1.manilab.com dr-ele]$ pwd
/opt/drelephant/dr-ele

[mani@node1.manilab.com dr-ele]$ ./compile.sh ./compile.conf
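compile.conf pins the Hadoop and Spark versions the build targets; a minimal sketch (the version numbers below are placeholders — match them to the versions in your CDH parcel):

```
hadoop_version=2.6.0
spark_version=1.6.0
```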

·        Now, inside the ‘dist’ folder, dr-elephant-2.0.13.zip will be created. Extract the zip file inside /dist.

[mani@node1.manilab.com dist]$ pwd
/opt/drelephant/dr-ele/dist

[mani@node1.manilab.com dist]$ ll
total 88148
drwxr-xr-x 8 mani manidevl      131 Jan 18 07:42 dr-elephant-2.0.13
-rw-r--r-- 1 mani manidevl 90259919 Jan 18 07:33 dr-elephant-2.0.13.zip
Everything from now on happens inside the extracted folder (dr-elephant-2.0.13).

·        Now, in elephant.conf under /dr-ele/dist/dr-elephant-2.0.13/app-conf/, change the following values:

port=8083

# Database configuration
db_url=node1.manilab.com
db_name=drelephant
db_user=admin
db_password="*****"

keytab_user="mani"
keytab_location="/home/mani/mani.keytab"


jvm_args="-Devolutionplugin=disabled -DapplyEvolutions.default=false -mem 1024 -J-Xloggc:$project_root../logs/elephant/dr-gc.`date +'%Y%m%d%H%M'` -J-XX:+PrintGCDetails"
Give permission 777 to /user/history/done in HDFS so that Dr.elephant can read the job history files, e.g., hadoop fs -chmod -R 777 /user/history/done

·        Start Dr.elephant

./start.sh ../app-conf/
Make sure Dr.elephant started without any errors by checking dr.log.


·        Go to Dr.elephant UI

Change the hostname/IP according to your environment; you should be able to see the Dr.elephant dashboard at:

Hostname:<port>
In our case it is: http://node1.manilab.com:8083/
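Reachability can also be checked from the command line; a sketch using the example host above (swap in your own hostname and port):

```shell
# Prints the HTTP status code when the dashboard is up; otherwise a hint.
curl -fsS -o /dev/null -w "%{http_code}\n" "http://node1.manilab.com:8083/" \
  || echo "UI not reachable yet; check dr.log"
```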


·        Run a sample Hadoop job


After the job completes, you can see the analysis in the Dr.elephant UI.
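Any MapReduce or Spark job will do; for instance, the Pi estimator from the examples jar that ships with CDH (the jar path below is the usual parcel location — adjust it to your layout):

```shell
# Submit a small test job so Dr.elephant has something to analyze.
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
if command -v hadoop >/dev/null 2>&1; then
  hadoop jar "$EXAMPLES_JAR" pi 2 10   # 2 mappers, 10 samples each
else
  echo "hadoop not on PATH; submit the job from a cluster gateway host"
fi
```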

Dr Elephant Installation on Linux - Cloudera - Part 3



Setting Environment Variables:


Export the below environment variables 


export ELEPHANT_CONF_DIR=/opt/drelephant/dr-ele
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_CONF_DIR=$SPARK_HOME/conf
export PATH=$PATH:/opt/drelephant/activator/bin:$HADOOP_HOME/bin:/etc/hadoop/conf:/usr/local/bin/watchman
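After exporting, a quick sanity check that each path actually exists can save debugging later; a small sketch (it only reports, it does not fail the shell):

```shell
# Report any exported directory that does not exist on this host.
for d in "$HADOOP_HOME" "$HADOOP_CONF_DIR" "$SPARK_HOME" "$SPARK_CONF_DIR"; do
  [ -d "$d" ] && echo "ok: $d" || echo "missing: $d"
done
```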

Dr Elephant Installation on Linux - Cloudera - Part 2

Database Creation - Part 2

Install MySQL server and create the database, tables, and indexes.
·        Install:
yum install mysql-server
·        Start the service:
service mysqld start
·        Find the temporary password generated during installation:
sudo grep 'temporary password' /var/log/mysqld.log
2018-01-18T09:59:41.365811Z 1 [Note] A temporary password is generated for root@localhost: 1Dh3uczIw)Jd
·        Enter mysql:
mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.21

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

After that, on the mysql prompt:

·        Alter the user and provide the password:
ALTER USER 'root'@'localhost' IDENTIFIED BY '*****';
·        Create the admin user:
CREATE USER 'admin'@'node1.manilab.com' IDENTIFIED BY '*****';
·        Grant all privileges and flush:
GRANT ALL PRIVILEGES ON *.* TO 'admin'@'node1.manilab.com';
flush privileges;
·        Create the tables and indexes:
CREATE TABLE yarn_app_result (
  id               VARCHAR(50)   NOT NULL              COMMENT 'The application id, e.g., application_1236543456321_1234567',
  name             VARCHAR(100)  NOT NULL              COMMENT 'The application name',

  username         VARCHAR(50)   NOT NULL              COMMENT 'The user who started the application',
  queue_name       VARCHAR(50)   DEFAULT NULL          COMMENT 'The queue the application was submitted to',
  start_time       BIGINT        UNSIGNED NOT NULL     COMMENT 'The time in which application started',
  finish_time      BIGINT        UNSIGNED NOT NULL     COMMENT 'The time in which application finished',
  tracking_url     VARCHAR(255)  NOT NULL              COMMENT 'The web URL that can be used to track the application',
  job_type         VARCHAR(20)   NOT NULL              COMMENT 'The Job Type e.g, Pig, Hive, Spark, HadoopJava',
  severity         TINYINT(2)    UNSIGNED NOT NULL     COMMENT 'Aggregate severity of all the heuristics. Ranges from 0(LOW) to 4(CRITICAL)',
  score            MEDIUMINT(9)  UNSIGNED DEFAULT 0    COMMENT 'The application score which is the sum of heuristic scores',
  workflow_depth   TINYINT(2)    UNSIGNED DEFAULT 0    COMMENT 'The application depth in the scheduled flow. Depth starts from 0',
  scheduler        VARCHAR(20)   DEFAULT NULL          COMMENT 'The scheduler which triggered the application',
  job_name         VARCHAR(255)  NOT NULL DEFAULT ''   COMMENT 'The name of the job in the flow to which this app belongs',
  job_exec_id      VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A unique reference to a specific execution of the job/action(job in the workflow). This should filter all applications (mapreduce/spark) triggered by the job for a particular execution.',
  flow_exec_id     VARCHAR(255)  NOT NULL DEFAULT ''   COMMENT 'A unique reference to a specific flow execution. This should filter all applications fired by a particular flow execution. Note that if the scheduler supports sub-workflows, then this ID should be the super parent flow execution id that triggered the applications and sub-workflows.',
  job_def_id       VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A unique reference to the job in the entire flow independent of the execution. This should filter all the applications(mapreduce/spark) triggered by the job for all the historic executions of that job.',
  flow_def_id      VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A unique reference to the entire flow independent of any execution. This should filter all the historic mr jobs belonging to the flow. Note that if your scheduler supports sub-workflows, then this ID should reference the super parent flow that triggered all the jobs and sub-workflows.',
  job_exec_url     VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A url to the job execution on the scheduler',
  flow_exec_url    VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A url to the flow execution on the scheduler',
  job_def_url      VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A url to the job definition on the scheduler',
  flow_def_url     VARCHAR(800)  NOT NULL DEFAULT ''   COMMENT 'A url to the flow definition on the scheduler',

  PRIMARY KEY (id)
);

create index yarn_app_result_i1 on yarn_app_result (finish_time);
create index yarn_app_result_i2 on yarn_app_result (username,finish_time);
create index yarn_app_result_i3 on yarn_app_result (job_type,username,finish_time);
create index yarn_app_result_i4 on yarn_app_result (flow_exec_id);
create index yarn_app_result_i5 on yarn_app_result (job_def_id);
create index yarn_app_result_i6 on yarn_app_result (flow_def_id);
create index yarn_app_result_i7 on yarn_app_result (start_time);

CREATE TABLE yarn_app_heuristic_result (
  id                  INT(11)       NOT NULL AUTO_INCREMENT COMMENT 'The application heuristic result id',
  yarn_app_result_id  VARCHAR(50)   NOT NULL                COMMENT 'The application id',
  heuristic_class     VARCHAR(255)  NOT NULL                COMMENT 'Name of the JVM class that implements this heuristic',
  heuristic_name      VARCHAR(128)  NOT NULL                COMMENT 'The heuristic name',
  severity            TINYINT(2)    UNSIGNED NOT NULL       COMMENT 'The heuristic severity ranging from 0(LOW) to 4(CRITICAL)',
  score               MEDIUMINT(9)  UNSIGNED DEFAULT 0      COMMENT 'The heuristic score for the application. score = severity * number_of_tasks(map/reduce) where severity not in [0,1], otherwise score = 0',

  PRIMARY KEY (id),
  CONSTRAINT yarn_app_heuristic_result_f1 FOREIGN KEY (yarn_app_result_id) REFERENCES yarn_app_result (id)
);

create index yarn_app_heuristic_result_i1 on yarn_app_heuristic_result (yarn_app_result_id);
create index yarn_app_heuristic_result_i2 on yarn_app_heuristic_result (heuristic_name,severity);

CREATE TABLE yarn_app_heuristic_result_details (
  yarn_app_heuristic_result_id  INT(11) NOT NULL                  COMMENT 'The application heuristic result id',
  name                          VARCHAR(128) NOT NULL DEFAULT ''  COMMENT 'The analysis detail entry name/key',
  value                         VARCHAR(255) NOT NULL DEFAULT ''  COMMENT 'The analysis detail value corresponding to the name',
  details                       TEXT                              COMMENT 'More information on analysis details. e.g, stacktrace',

  PRIMARY KEY (yarn_app_heuristic_result_id,name),
  CONSTRAINT yarn_app_heuristic_result_details_f1 FOREIGN KEY (yarn_app_heuristic_result_id) REFERENCES yarn_app_heuristic_result (id)
);

create index yarn_app_heuristic_result_details_i1 on yarn_app_heuristic_result_details (name);

create index yarn_app_result_i8 on yarn_app_result (queue_name);

alter table yarn_app_result add column resource_used    BIGINT        UNSIGNED DEFAULT 0    COMMENT 'The resources used by the job in MB Seconds';
alter table yarn_app_result add column resource_wasted  BIGINT        UNSIGNED DEFAULT 0    COMMENT 'The resources wasted by the job in MB Seconds';

alter table yarn_app_result add column total_delay      BIGINT        UNSIGNED DEFAULT 0    COMMENT 'The total delay in starting of mappers and reducers';
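One step the listing above does not show is creating the drelephant database itself before running the DDL. A sketch, assuming the admin account created earlier and that the statements above were saved to a local file (ddl.sql is a hypothetical name); the mysql invocations are commented out because they prompt for a password:

```shell
# Create the database first, then load the table/index DDL.
echo 'CREATE DATABASE IF NOT EXISTS drelephant;' > create_db.sql
# mysql -u admin -p -h node1.manilab.com < create_db.sql
# mysql -u admin -p -h node1.manilab.com drelephant < ddl.sql
cat create_db.sql
```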

Dr Elephant Installation on Linux - Cloudera - Part 1

Software Required

      Java version 1.8 should be installed
$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

·        Install nodejs, npm, zip, http-parser, git, and libtool; generate a keytab

yum install -y nodejs
yum install -y npm
yum install -y zip
For http-parser, download the RPM and copy it onto the host (e.g., with WinSCP), then install it with rpm -ivh
yum install -y git
yum install -y libtool
Generate the keytab.

·        Install Watchman  :
URL : https://facebook.github.io/watchman/docs/install.html

$ git clone https://github.com/facebook/watchman.git
$ cd watchman
$ git checkout v4.9.0  # the latest stable release
$ ./autogen.sh
$ ./configure
$ make
$ make install
              
Remember to export this path later: /usr/local/bin/watchman

·        Install Activator :
wget https://downloads.typesafe.com/typesafe-activator/1.3.12/typesafe-activator-1.3.12.zip
unzip typesafe-activator-1.3.12.zip

·        Install Play framework (Not Mandatory)

Download the file from here:

Unzip the file into a folder to which you have write access; I downloaded the file into /home/<folder>.


Add activator to your PATH, and also add it to your login profile ($HOME/.profile):


export PATH=$PATH:/home/mani/activator-dist-1.3.12/bin/