fatal error: Python.h: No such file or directory

    src/MD2.c:31:20: fatal error: Python.h: No such file or directory
     #include "Python.h"
    compilation terminated.
    error: command 'gcc' failed with exit status 1

Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-pfOapp/pycrypto/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ppFlB3-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-pfOapp/pycrypto/
[root@puppet ~]#

Solution :

yum install python-devel   ( for CentOS / RHEL )
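The package name differs by distro family (python-devel on CentOS/RHEL/Fedora, python-dev on Debian/Ubuntu). A minimal sketch: the helper name pydev_pkg is made up for illustration, not part of any tool.

```shell
# Hypothetical helper: print the Python headers package name for a distro ID.
pydev_pkg() {
  case "$1" in
    centos|rhel|fedora) echo "python-devel" ;;  # install with: yum install python-devel
    debian|ubuntu)      echo "python-dev"   ;;  # install with: apt-get install python-dev
    *)                  echo "unknown"      ;;
  esac
}

pydev_pkg centos   # prints: python-devel
```

After installing the headers, re-run the failed pip install and the Python.h error should go away.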

Uninstalling Cloudera Manager

1. Stop the services

2. Deactivate and remove the parcels.
                    Click on Parcels -> select the parcel -> Deactivate -> Actions -> Delete

3. Delete the cluster
             Home Page -> select the cluster -> Delete

4. Log in to the CM host and run the below command
                     sudo /usr/share/cmf/uninstall-cloudera-manager.sh

5. Uninstall Cloudera Manager Agent and Managed Software

                 a. Stop the Cloudera Manager agent on all agent hosts
                             service cloudera-scm-agent stop
                  b. Uninstall the software

                              sudo yum remove 'cloudera-manager-*'

                              yum clean all

6. Kill Cloudera Manager and Managed Processes

for u in cloudera-scm flume hadoop hdfs hbase hive httpfs hue impala llama mapred oozie solr spark sqoop sqoop2 yarn zookeeper; do sudo kill $(ps -u $u -o pid=); done
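A slightly more defensive variant of the same loop (same user list) skips users that have no running processes, so kill is never invoked with an empty argument list:

```shell
# Same cleanup as above, but only call kill when the user actually has
# processes; `|| true` keeps the loop going for users that do not exist.
for u in cloudera-scm flume hadoop hdfs hbase hive httpfs hue impala \
         llama mapred oozie solr spark sqoop sqoop2 yarn zookeeper; do
  pids=$(ps -u "$u" -o pid= 2>/dev/null || true)
  if [ -n "$pids" ]; then
    sudo kill $pids   # intentionally unquoted: one argument per PID
  fi
done
```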

7. Remove Cloudera Manager Data

           sudo umount cm_processes
           sudo rm -Rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera* /var/log/cloudera* /var/run/cloudera*

8. Remove the Cloudera Manager Lock File

            sudo rm /tmp/.scm_prepare_node.lock

9. Remove User Data

            sudo rm -Rf /var/lib/flume-ng /var/lib/hadoop* /var/lib/hue /var/lib/navigator /var/lib/oozie                 /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper

            sudo rm -rf /dfs /yarn     # replace /dfs and /yarn with the locations of your data and YARN directories

10. Reboot the machines


curl: (77) Problem with the SSL CA cert (path? access rights?)

The error is due to corrupt or missing SSL chain certificate files in the PKI directory. Make sure the files /etc/pki/tls/certs/ca-bundle.crt and /etc/pki/tls/certs/ca-bundle.trust.crt (on CentOS) exist on your server. If they do not, complete the following steps (or have your server management provider do so):

  1. mkdir /usr/src/ca-certificates && cd /usr/src/ca-certificates
  2. wget http://mirror.centos.org/centos/6/os/i386/Packages/ca-certificates-2015.2.6-65.0.1.el6_7.noarch.rpm
  3. rpm2cpio ca-certificates-2015.2.6-65.0.1.el6_7.noarch.rpm | cpio -idmv
  4. cp -pi ./etc/pki/tls/certs/ca-bundle.* /etc/pki/tls/certs/
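To confirm the copy worked, a quick existence check helps. A sketch: check_bundle is a made-up helper name, not part of any package.

```shell
# Report whether each expected CA bundle file exists and is non-empty.
check_bundle() {
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "OK  $f"
    else
      echo "MISSING  $f"
    fi
  done
}

check_bundle /etc/pki/tls/certs/ca-bundle.crt /etc/pki/tls/certs/ca-bundle.trust.crt
```

Once both files report OK, retry the failing curl command.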

Hadoop Topics

1. Introduction to Hadoop
            Enterprise Data Trends @ Scale
            What is Big Data?
            A Market for Big Data
            Characteristics of Big Data (the 3 V's, 5 V's and 7 V's of Big Data)
            Most Common New Types of Data
            Moving from Causation to Correlation
            What is Hadoop? And Why Hadoop?
            Traditional Systems vs. Hadoop
            What is Hadoop 2.0?
            Overview of a Hadoop Cluster and Core components of Hadoop
            Different distributions of Hadoop
            Hadoop Use Case
            Lab exercise: - Login to Your Cluster
2. Hadoop Architecture
            Characteristics of Hadoop
                        a. Fault tolerance
                        b. Replication
                        c. Block size
                        d. Robustness
            What is a Node, Rack, Cluster, Datacenter and Data Hub
            MapReduce Architecture
            HDFS Architecture
            Understanding Block Storage
            The NameNode
            The DataNodes
            HDFS Clients
3. Installing Hadoop Cluster using Cloudera Manager
            Minimum Hardware Requirements
            Minimum Software Requirements
            A Formidable Starter Cluster
            Lab exercise: - Setting up the Environment
            Lab exercise: - Installing Cloudera Manager and CDH
            Lab exercise: - Adding Services to Cluster
4. Configuring Hadoop
            Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, bigtop-utils, masters and slaves files)
            Configuration Considerations
            Deployment Layout
            Configuring Hadoop Ports
            Configuring HDFS
            What Does the File System Check Look For?
            Replication Factor
            Understanding Hadoop Logs
            What is Cloudera Manager
            Configuration via Cloudera Manager
            Management Monitoring
            REST API and Thrift Server Overview
            Lab exercise: - Commissioning and Decommissioning of nodes
            Lab exercise: - Stopping and Starting CDH Services
            Lab exercise: - Using HDFS Commands, hadoop fsck and hadoop dfsadmin commands
5. Ensuring Data Integrity
            Replication Placement
            Data Integrity - Writing Data
            Data Integrity - Reading Data
            Data Integrity - Block Scanning
            Running a File System Check
            What Does the File System Check Look For?
            hadoop fsck Syntax
            Data Integrity - File System Check: Commands & Output
            hadoop dfsadmin Command
            NameNode Information
            Changing the Replication Factor
            Lab exercise: - Verify Data with Block Scanner and fsck
7. MapReduce and YARN
            Understanding MapReduce
            What is YARN?
            YARN Architecture (RM, NM, AM, Container)
            Lifecycle of a YARN Application
            Configuring YARN
            Configuring MapReduce tools
            YARN application logs
            YARN CLI
            Lab exercise: - Troubleshooting a MapReduce Job

8. Job Schedulers
            Overview of Job Scheduling
            The Built-in Schedulers
            Overview of the Capacity Scheduler
            Configuring the Capacity Scheduler
            Defining Queues
            Configuring Capacity Limits
            Configuring User Limits
            Configuring Permissions
            Overview of the Fair Scheduler
            Multi-Tenancy Limits
            Lab exercise: - Configuring the Capacity Scheduler

9. Enterprise Data Movement, Backup and Recovery
            What should you backup?
            HDFS Snapshots
            HDFS Data - Backups
            HDFS Data - Automate & Restore
            Overview of BDR (Backup and Disaster Recovery)
            Lab exercise: - Using HDFS Snapshots
            Managing Resources
                        - Configuring groups with Static Service Pools
                        - The Fair Scheduler
                        - Configuring Dynamic Resource Pools
                        - YARN Memory and CPU Settings
10. Hive Administration
            Introduction and architecture of Hive
            Comparing Hive with RDBMS
            Hive Components - Hive MetaStore, HiveServer2, HCatalog
            Hive Clients - beeline

11. Sqoop
            Overview of Sqoop
            The Sqoop Import Tool
            Importing a Table
            Importing Specific Columns
            The Sqoop Export Tool
            Lab exercise: - Using Sqoop
12. Flume
            Flume Introduction
            Installing Flume
            Flume Configuration
            Monitoring Flume
            Lab exercise: - Install and Test Flume
13. Oozie
            Oozie Overview
            Oozie Components
            Jobs, Workflows, Coordinators, Bundles
            Workflow Actions and Decisions
            Oozie Job Submission
            Oozie Console
            The Oozie CLI
            Using the Oozie CLI
            Oozie Actions
            Lab exercise: - Running an Oozie Workflow

14. HBase
            Why HBase?
            HBase Components and Daemons
            HBase Administration and Cluster Management
15. Cluster Monitoring and Troubleshooting
            Cloudera Manager Monitoring Features
            Configuring Events and Alerts
            Monitoring Hadoop Clusters
            Troubleshooting Hadoop services
            Common Misconfigurations
            Monitoring Cluster services using Charts
            Using the Trigger option
            Monitoring JVM Processes
            Understanding JVM Memory
            Eclipse Memory Analyzer
            JVM Memory Heap Dump
            Java Management Extensions (JMX)
            Garbage Collection Tuning

16. Commissioning and De-commissioning of Cluster Nodes

            Decommissioning and Commissioning Nodes
            Decommissioning Nodes
            Steps for Decommissioning a Node
            Decommissioning Node States
            Steps for Commissioning a Node
            Balancer Threshold Setting
            Configuring Balancer Bandwidth
            Lab exercise :- Commissioning & Decommissioning Nodes
17. Backup and Recovery
            What should you backup?
            HDFS Snapshots
            HDFS Data - Backups
            HDFS Data - Automate & Restore
            Hive & Backup
            BDR (Backup Disaster Recovery)
            Lab exercise:- Using HDFS Snapshots

18. Rack Awareness
            Rack Awareness
            YARN Rack Awareness
            Replica Placement
            Rack Topology
            Rack Topology Script
            Configuring the Rack Topology Script
            Lab exercise: Configuring Rack Awareness
19. Name Node High Availability
            NameNode Architecture Cloudera
            NameNode High Availability
            HDFS HA Components
            Understanding NameNode HA
            NameNodes in HA
            Failover Modes
            NameNode Architectures
            hdfs haadmin Command
            Protecting Metadata Repositories
            Lab exercise :- Configure NameNode High Availability using Cloudera Manager
20. Security in Hadoop
            Security Concepts - Why is Hadoop Security required?
            Kerberos Synopsis - How does it work?
            Enabling Kerberos via Cloudera Manager
            Lab exercise: - Installing and configuring Kerberos

Overview & Architecture of the following: 

Install and configure Apache Phoenix on Cloudera Hadoop CDH5

          Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.

Step 1: Download the latest version of Phoenix using the command given below

wget http://mirror.reverse.net/pub/apache/phoenix/phoenix-4.3.1/bin/phoenix-4.3.1-bin.tar.gz

--2015-11-23 12:20:21-- http://mirror.reverse.net/pub/apache/phoenix/phoenix-4.3.1/bin/phoenix-4.3.1-bin.tar.gz
Resolving mirror.reverse.net...
Connecting to mirror.reverse.net||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 72155049 (69M) [application/x-gzip]
Saving to: “phoenix-4.3.1-bin.tar.gz.1”
100%[=====================================] 72,155,049   614K/s   in 2m 15s
2015-04-10 12:25:45 (521 KB/s) - “phoenix-4.3.1-bin.tar.gz.1” saved [72155049/72155049]

Step 2: Extract the downloaded tar file to a convenient location

[root@maniadmin ~]# tar -zxvf phoenix-4.3.1-bin.tar.gz

Step 3: Copy phoenix-4.3.1-server.jar to the HBase lib directory on the master and on each region server

On the master server and on each HBase region server, copy "phoenix-4.3.1-server.jar" to the "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase/lib/" location

Step 4: Copy phoenix-4.3.1-client.jar to each HBase region server

Please make sure phoenix-4.3.1-client.jar is present at /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase/lib/ on each region server.
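Steps 3 and 4 can be scripted in one pass. A sketch only: master1, region1 and region2 are placeholder hostnames (substitute your own hosts), and the leading echo makes this a dry run; remove the echo to actually copy.

```shell
# CDH parcel lib dir from the steps above; adjust to your parcel version.
HBASE_LIB=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase/lib

for host in master1 region1 region2; do   # placeholder hostnames
  echo scp phoenix-4.3.1-bin/phoenix-4.3.1-server.jar "$host:$HBASE_LIB/"
  echo scp phoenix-4.3.1-bin/phoenix-4.3.1-client.jar "$host:$HBASE_LIB/"
done
```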

Step 5: Restart HBase services via Cloudera Manager

Step 6: Testing - go to extracted_dir/bin and run the below command

[root@maniadmin bin]# ./psql.py localhost ../examples/WEB_STAT.sql ../examples/WEB_STAT.csv ../examples/WEB_STAT_QUERIES.sql 
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
15/11/23 13:51:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
no rows upserted
Time: 2.297 sec(s)
csv columns from database.
CSV Upsert complete. 39 rows upserted
Time: 0.554 sec(s)
DOMAIN                                     AVERAGE_CPU_USAGE    AVERAGE_DB_USAGE
-----------------------------------------  -------------------  -----------------
Salesforce.com                             260.727              257.636
Google.com                                 212.875              213.75
Apple.com                                  114.111              119.556
Time: 0.2 sec(s)
DAY                        TOTAL_CPU_USAGE    MIN_CPU_USAGE    MAX_CPU_USAGE
-----------------------    ---------------    -------------    -------------
2013-01-01 00:00:00.000    35                 35               35
2013-01-02 00:00:00.000    150                25               125
2013-01-03 00:00:00.000    88                 88               88
2013-01-04 00:00:00.000    26                 3                23
2013-01-05 00:00:00.000    550                75               475
Time: 0.09 sec(s)
HO                   TOTAL_ACTIVE_VISITORS
-- ----------------------------------------
EU                                     150
NA                                       1
Time: 0.052 sec(s)

Step 7: To get a SQL shell

[root@maniadmin bin]# ./sqlline.py localhost
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:localhost none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:localhost
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
15/11/23 14:58:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 4.3)
Driver: PhoenixEmbeddedDriver (version 4.3)
Autocommit status: true
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
77/77 (100%) Done
sqlline version 1.1.8
0: jdbc:phoenix:localhost>

Amazon in-house interview questions

Once you clear the 2-3 telephonic rounds, they will invite you for in-house interviews (face-to-face or video conference) at the nearest Amazon office.

You will meet with 5-6 Amazonians. The mix of interviewers will include managers and peers that make up the technical team.

Each meeting will be a one-on-one interview session lasting approximately 45-60 minutes (around 5 hours in total).

In my case it was a video conference round; below are a few behavioral questions covered across all 5 rounds.

1) Why Amazon?
2) What is your understanding of the role?
3) What do you wish to change in your current environment?
4) What is the customer interaction that you are most proud of?
5) Describe your most difficult customer interaction.
6) Tell me about a time you made a significant mistake. What would you have done differently?
7) Give an example of a tough or critical piece of feedback you received. What was it and what did you do about it?
8) Describe a time when you needed the cooperation of a peer or peers who were resistant to what you were trying to do. What did you do? What was the outcome?
9) Tell me about a time you saw a peer struggling. What did you do to help?
10) Give me an example of when you had to make an important decision in the absence of good data because there just wasn't any. What was the situation and how did you arrive at your decision? Did the decision turn out to be the correct one? Why or why not?
11) Tell me about a time you took a big risk. What was the situation?
12) Give me an example of a time when you were able to deliver an important project under a tight deadline. What sacrifices did you have to make to meet the deadline? How did they impact the final deliverables?

Amazon interview questions for cloud support engineer

Interview process :

2-3 telephonic rounds (45 min to 1 hr each) and 5 back-to-back rounds with 5 managers (about 5 hrs).

1st round:

1. About the role?
2. Linux boot process?
3. What is GRUB?
4. What is iptables?
5. What is a default gateway and where can we configure it?
6. What are all the parameters in the ifcfg-eth0 file?
7. Difference between TCP and UDP?
8. How will you check the free space?
9. What is HDFS?
10. File write process in Hadoop?
11. File read process in Hadoop?
12. How to run a job in Hadoop?
13. What is a loopback address? And what is 0.0.0 in it?
14. What is subnet masking?
15. Asked about some port numbers like 22 / 25 / 53 / 80 / 110 / 3306
16. What is DNS?
17. What is DHCP and how does it work?
18. What is the difference between NTFS / FAT32?
19. What is RODC?