Posts

Showing posts from 2017

Sqoop incremental import in Cloudera Hadoop

In the last blog post, I described how we can import data from an RDBMS to HDFS using Sqoop. Now I will discuss how we can do an incremental import using the Cloudera Hadoop user interface. If you know the basic functionality of Hadoop, this is a simple task!

To perform an incremental import in Sqoop, you need to consider the ‘incremental’, ‘check-column’, and ‘last-value’ options. The following syntax is used for the incremental import:

--incremental <mode> --check-column <column name> --last-value <last check column value>

Cloudera Hadoop is a commercial distribution of Hadoop. I am using the Oozie workflow UI provided by Cloudera to import data. When you define workflows in the Oozie UI, you also need to give the correct file path for the JDBC driver. If you haven't included the drivers yet, please make sure you put all of them in a folder that everyone can access.

Log in to the Hue UI -> Workflows -> Editors -> Workflows
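To make the three incremental options concrete, here is a minimal Python sketch that only assembles the argument list for a `sqoop import` call; it does not run Sqoop. The function name, the JDBC URL, the table, the target directory, and the column values are all hypothetical placeholders, not values from the original post. The flags themselves (`--connect`, `--table`, `--target-dir`, `--incremental`, `--check-column`, `--last-value`) are standard Sqoop import options.

```python
import shlex


def build_incremental_import(connect, table, target_dir,
                             check_column, last_value, mode="append"):
    """Assemble a `sqoop import` argument list for an incremental import.

    All argument values are supplied by the caller; nothing is executed here.
    """
    # Sqoop supports two incremental modes: append and lastmodified.
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", mode,
        "--check-column", check_column,
        # Note the hyphen: the option is --last-value, not "--last value".
        "--last-value", str(last_value),
    ]


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    cmd = build_incremental_import(
        connect="jdbc:mysql://db.example.com/sales",
        table="orders",
        target_dir="/user/demo/orders",
        check_column="id",
        last_value=100,
    )
    print(shlex.join(cmd))
```

In `append` mode Sqoop imports rows whose check column is greater than the last value, so the same list of options could be pasted into the Sqoop action of an Oozie workflow once the real connection details are known.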

Import relational databases to Hadoop using Sqoop

Hello there,

This time I will discuss how to import data into Hadoop from an RDBMS. We are using Sqoop as the import mechanism.

What’s Sqoop? It’s an open source software product of the Apache Software Foundation. The tool is designed to transfer data between relational databases and Hadoop. It allows users to import data to a target location inside Hadoop and to export data from Hadoop as well.

If you do not want to use Sqoop to transfer data, there are alternatives available, such as Spark. But there are some disadvantages; for example, Spark did not work well for complex data types. Before you run the commands to import data, please make sure you have installed Java, Hadoop, and Sqoop on your machine.

Source: severalnines.com

When considering the Hadoop file system, there are two types of tables you need to use in the process of importing data.

1. External tables
We do create these tab

Introduction to Hadoop

So, after a while :D

This time, let’s discuss Hadoop.

What is Hadoop? Hadoop is an open-source framework for storing data and running applications on clusters of hardware components that are relatively inexpensive and widely available. When we dig into Hadoop and check its internal processes, we can identify a few core components:

1. Open-source data storage: HDFS, which stands for Hadoop Distributed File System.
2. A processing API, which is called MapReduce.

Commonly, Hadoop deployments include more than 25 other projects or libraries. A few of the common names are HBase, Hive, Pig, and Oozie.

Let’s discuss the Hadoop distributions. There are mainly 3 types:

100% open source - Apache Hadoop
Commercial - Cloudera, Hortonworks, MapR
Cloud - Microsoft Azure HDInsight, AWS

Most enterprises stay one to two full versions behind the currently released version of Hadoop. Because they consider the open source so

How you can manage throttle out errors in WSO2 API Manager

You might face the above-mentioned error if you have an application on the Dialog Ideabiz platform. Let’s start from the beginning.

Once you create an application on Ideabiz, you need to subscribe to the APIs you require. We use WSO2 API Manager to manage the APIs. When an admin approves your API subscriptions, he/she will assign a tier value to each API (5 tps, 10 tps, etc.). They will also assign a tier value to the application.

Let’s assume you want to create an application using the SMS and Payment APIs. The team has assigned 5 tps to each API and 10 tps to your application. You use the SMS API only to send SMS messages to users and the Payment API to charge users on a daily basis.

Considering each API (as per the above example), you should stay within 5 transactions per second on the SMS API, which means you can make 5 API calls per second. Let’s assume that before you send the SMS, you want to charge Rs 1 from the user. Therefore you need to call the Payment API as well. Since the Pa
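The tier arithmetic in the example above (5 tps per API, 10 tps for the application) can be illustrated with a small Python sketch. It uses a simple fixed one-second counting window, which is an assumption made for illustration and not necessarily how WSO2 API Manager counts requests internally. The class name `TierLimiter` and the API keys `"sms"` and `"payment"` are hypothetical.

```python
from collections import defaultdict


class TierLimiter:
    """Toy fixed-window limiter with a per-API tier and an application tier."""

    def __init__(self, app_limit, api_limits):
        self.app_limit = app_limit          # e.g. 10 calls per second
        self.api_limits = api_limits        # e.g. {"sms": 5, "payment": 5}
        self._app_counts = defaultdict(int)  # second -> calls seen
        self._api_counts = defaultdict(int)  # (second, api) -> calls seen

    def allow(self, api, now):
        """Return True if the call fits both tiers, False if it is throttled."""
        second = int(now)  # all calls within the same second share a window
        if self._api_counts[(second, api)] >= self.api_limits[api]:
            return False  # the API tier is exhausted for this second
        if self._app_counts[second] >= self.app_limit:
            return False  # the application tier is exhausted for this second
        self._api_counts[(second, api)] += 1
        self._app_counts[second] += 1
        return True


if __name__ == "__main__":
    limiter = TierLimiter(app_limit=10, api_limits={"sms": 5, "payment": 5})
    # Six SMS calls inside the same second: the sixth exceeds the 5 tps tier.
    print([limiter.allow("sms", 0.0) for _ in range(6)])
    # Five Payment calls in that second still fit the 10 tps application tier.
    print([limiter.allow("payment", 0.5) for _ in range(5)])
```

Under these assumptions, a call is rejected as soon as either tier is used up for the current second, and the counters start again in the next second.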

Why is DNS important?

I wrote this to give a basic idea of DNS and how it works. Let's start from the bottom.

What is DNS? Did you know that every time you search for a website on the internet, you are searching the largest distributed database in the world? This is called DNS :) People locate websites by name, while computers identify websites by IP address. Therefore DNS is used to map each web address to its IP address.

Why is it important? If DNS were not here, you couldn't visit the blog using the name blingtechs.blogspot.com :) The only way you could visit us would be by calling the IP address directly: type http://<the IP address>. By entering the IP address, you bypass the DNS lookup. But you need to keep in mind that an IP address can change regularly and each domain name can have multiple IP addresses.

When you are looking for a website on the internet, the domain name you are trying to access is checked on a DNS server by your computer. Th
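The name-to-address mapping described above can be observed with a minimal Python sketch. It asks the operating system's resolver for the addresses behind a name using the standard library call `socket.getaddrinfo`; the resolver typically consults the local hosts file first and then DNS. The helper name `resolve` is hypothetical, and the demo uses `localhost` so that it works without a network connection.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses reported for hostname."""
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as a string.
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # One name may map to several addresses (for example IPv4 and IPv6).
    print(resolve("localhost"))
```

On a machine with internet access, calling `resolve("blingtechs.blogspot.com")` would return whatever addresses that name maps to at the time, which may differ from one lookup to the next.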