Prepare better with the best interview questions and answers, and walk away with top interview tips. These interview questions and answers will boost your core interview skills and help you perform better. Be smarter with every interview.
The four major components of Kafka are:
The role of Kafka’s Producer API is to wrap the two producers, kafka.producer.SyncProducer and kafka.producer.async.AsyncProducer.
The goal is to expose all the producer functionality through a single API to the client.
Kafka provides a single consumer abstraction, the consumer group, that generalizes both queuing and publish-subscribe messaging. Consumers label themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within every subscribing consumer group. Consumer instances can run in separate processes or on separate machines, and the choice of consumer groups determines which messaging model you get.
Every partition in Kafka has one server which plays the role of a Leader, and none or more servers that act as Followers.
The Leader performs the task of all read and write requests for the partition, while the role of the Followers is to passively replicate the leader.
In the event of the Leader failing, one of the Followers takes over as the new Leader. Because each server acts as the Leader for some partitions and as a Follower for others, load stays balanced across the cluster.
Partitions: A single piece of a Kafka topic. The number of partitions is configurable on a per-topic basis. More partitions allow for greater parallelism when reading from a topic, and the partition count caps how many consumers in a consumer group can read in parallel. The right number is hard to determine until you know how fast you are producing and consuming data; if you know a topic will be high volume, give it more partitions.
Replicas: These are copies of the partitions. They are never read from or written to directly; their only purpose is data redundancy. If your topic has n replicas, n-1 brokers can fail before there is any data loss. Additionally, a topic cannot have a replication factor greater than the number of brokers that you have.
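As a hedged illustration (not from the original article; the broker address, topic name, partition count, and replication factor below are placeholder choices), a topic with an explicit number of partitions and replicas can be created programmatically with the Java AdminClient:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions for read parallelism, replication factor 3 for redundancy
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}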
Flume is the better option when you have non-relational data sources or large files that need to be streamed into Hadoop.
Kafka is used when you need a highly reliable and scalable enterprise messaging system that connects multiple systems, Hadoop among them.
Apache Kafka has 4 main APIs: the Producer API, the Consumer API, the Streams API, and the Connector API.
A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it. For each topic, the Kafka cluster maintains a partitioned log.
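For illustration only (this sketch is not part of the original answer; the broker address, topic name, and record contents are placeholders), publishing records to a topic with the Java producer client looks roughly like this:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SimpleProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key always go to the same partition of the topic.
            RecordMetadata meta = producer.send(new ProducerRecord<>("test", "key-1", "hello")).get();
            System.out.printf("Written to partition %d at offset %d%n", meta.partition(), meta.offset());
        }
    }
}

Every consumer group subscribed to the topic independently receives the records written to it.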
Kafka MirrorMaker provides geo-replication support for your clusters. With MirrorMaker, messages are replicated across multiple datacenters or cloud regions. You can use this in active/passive scenarios for backup and recovery, or in active/active scenarios to place data closer to your users or to support data locality requirements.
By default, the maximum size of a message that a Kafka broker can receive is 1,000,000 bytes (roughly 1 MB); the limit is controlled by the broker's message.max.bytes setting.
The traditional method of message transfer includes two models: queuing and publish-subscribe.
Apache Kafka has the following benefits over traditional messaging techniques:
ISR stands for in-sync replicas: the set of message replicas that are fully synced up with the leader.
When a consumer wants to join a group, it sends a JoinGroup request to the group coordinator. The first consumer to join the group becomes the group leader. The leader receives a list of all consumers in the group from the group coordinator and is responsible for assigning a subset of partitions to each consumer. It uses an implementation of PartitionAssignor to decide which partitions should be handled by which consumer.
After deciding on the partition assignment, the consumer group leader sends the list of assignments to the group coordinator, which sends this information to all the consumers. Each consumer sees only its own assignment; the leader is the only client process that has the full list of consumers in the group and their assignments. This process repeats every time a rebalance happens.
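A minimal consumer sketch (illustrative only; the group name, topic, and broker address are placeholders) showing how calling subscribe() with a group.id triggers the JoinGroup and assignment flow described above:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "example-group");           // all consumers sharing this id join one group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe() makes this consumer join the group and receive its partition assignment.
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}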
Replication in Kafka ensures that a published message is not lost and can still be consumed in the event of a machine failure, a program error, or routine software upgrades.
If a replica remains out of the ISR for an extended time, it indicates that the follower is unable to fetch data as fast as it accumulates at the leader.
To get exactly-once messaging from Kafka, you have to address two things: avoiding duplication during data production and avoiding duplicates during data consumption. Here are two ways to get exactly-once semantics during data production:
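The article's own two ways are not reproduced above. As a hedged aside, recent versions of the Java producer can also be configured for idempotent delivery, which makes retried sends safe against duplication on the broker side; the properties below plug into the producer sketch shown earlier and the values are illustrative:

props.put("enable.idempotence", "true"); // broker de-duplicates retried sends from this producer
props.put("acks", "all");                // wait for all in-sync replicas to acknowledge
props.put("retries", Integer.toString(Integer.MAX_VALUE)); // retry transient failures instead of dropping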
You cannot do that from a class that behaves as a producer; as in most queue systems, its role is to fire and forget the messages, and the broker does the rest of the work, such as handling metadata like IDs and offsets. As a consumer of the message, you can get the offset from a Kafka broker. If you look at the SimpleConsumer class, you will notice it fetches MultiFetchResponse objects that include offsets as a list. In addition, when you iterate over the Kafka messages, you get MessageAndOffset objects that include both the offset and the message sent.
The following recommendations for Kafka configuration settings make it extremely difficult for data loss to occur.
If you have more than 3 hosts, you can increase the broker settings appropriately on topics that need more protection against data loss.
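The specific settings the article refers to are not listed above. Purely as an illustrative sketch (typical values for a three-broker cluster, not the article's exact recommendations), durability-focused deployments often combine broker/topic settings and producer settings along these lines:

# broker / topic level
default.replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=false
# producer level
acks=all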
This one comes up when a customer adds new nodes or disks to existing nodes. Partitions are not automatically balanced. If a topic already has a number of nodes equal to the replication factor (typically 3), then adding disks will not help with rebalancing.
Using the kafka-reassign-partitions command after adding new hosts is the recommended method.
There are several caveats to using this command:
You will need to set up your development environment to use both Spark libraries and Kafka libraries.
You can use Apache Maven to build Spark applications developed using Java and Scala.
Compile against the same version of Spark that you are running.
Build a single assembly JAR ("Uber" JAR) that includes all dependencies. In Maven, add the Maven assembly plugin to build a JAR containing all dependencies:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
This plug-in manages the merge procedure for all available JAR files during the build.
In Maven, specify Spark, Hadoop, and Kafka dependencies with scope provided.
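For example (the artifact ID and version property below are placeholders to adapt to your Spark and Kafka distribution), a provided-scope dependency looks like this:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>

The provided scope keeps these libraries out of the assembly JAR because they are already present on the cluster at runtime.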
Every partition in Kafka has one server that acts as the leader and zero or more servers that act as followers. The leader handles all reads and writes for the partition, while the followers passively replicate it. If the leader fails, one of the followers takes over the responsibility of the leader.
In Kafka, the communication between the clients and the servers is done with a simple, high-performance, language-agnostic TCP protocol. This protocol is versioned and maintains backwards compatibility with older versions.
The essential configurations are the following:
broker.id
log.dirs
zookeeper.connect
The log cleaner is enabled by default. This will start the pool of cleaner threads. To enable log cleaning on a particular topic you can add the log-specific property
log.cleanup.policy=compact
This can be done either at topic creation time or using the alter topic command.
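As an illustrative sketch (the topic name and broker address are placeholders), the same thing can be done programmatically at topic creation time by passing the topic-level cleanup.policy config through the Java AdminClient:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CompactedTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 1)
                    // topic-level counterpart of the log.cleanup.policy broker property
                    .configs(Map.of("cleanup.policy", "compact"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}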
Log compaction is handled by the log cleaner, a pool of background threads that recopy log segment files, removing records whose key appears in the head of the log. Each compactor thread works as follows:
It chooses the log that has the highest ratio of log head to log tail
It creates a succinct summary of the last offset for each key in the head of the log
It re-copies the log from beginning to end removing keys which have a later occurrence in the log. New, clean segments are swapped into the log immediately so the additional disk space required is just one additional log segment.
The summary of the log head is essentially just a space-compact hash table that uses exactly 24 bytes per entry. As a result, with 8 GB of cleaner buffer, one cleaner iteration can clean around 366 GB of log head: 8 GB / 24 bytes is roughly 358 million entries, and at about 1 KB per message that corresponds to roughly 366 GB of log.
Create a topic – Let’s create a topic named “test” with a single partition and only one replica:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
We can now see that topic if we run the list topic command:
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test