Copilot

GitHub Copilot is an AI-powered code completion tool developed by GitHub in collaboration with OpenAI. It helps developers by suggesting code snippets, completing entire functions, and even generating documentation as they type, improving coding efficiency. Here's how it works and what it offers:

Key Features
  1. Code Suggestions

    • GitHub Copilot can suggest entire lines or blocks of code based on the context of your current coding task.
    • It adapts its suggestions to the code you have already written, using the context of your open files.
  2. Supports Multiple Languages

    • Copilot supports a wide range of programming languages including Python, JavaScript, TypeScript, Ruby, Go, Java, C#, and more.
    • It works across multiple frameworks and libraries.
  3. Contextual Awareness

    • It understands comments and context within the code. If you describe a function in a comment, Copilot can generate a full implementation.
    • It can interpret comments and variable names, and even add the necessary import statements automatically.
  4. Integrated in Development Environments

    • GitHub Copilot is available as an extension for Visual Studio Code (VS Code), making it easy to integrate into your existing workflow.
    • It also works in other editors, such as JetBrains IDEs, Visual Studio, and Neovim.
  5. Learning from Open Source Code

    • Copilot is trained on a vast amount of publicly available open-source code, helping it suggest relevant code patterns and solutions.
  6. Limitations

    • It doesn’t always generate perfect code, and in some cases, suggestions might need refinement.
    • It can sometimes suggest code snippets that may have security vulnerabilities or outdated patterns, so developers need to verify the suggestions.
    • Copilot is not aware of your private code unless it is given access to it (for example, through the files open in your editor), so suggestions are based only on the context you share.
  7. Ethical Considerations

    • Since Copilot is trained on public repositories, some concerns have been raised about licensing, particularly whether the code snippets it generates might unintentionally include license-protected code.

How to Use GitHub Copilot

  1. Installation

    • You can install GitHub Copilot as an extension in Visual Studio Code by searching for "GitHub Copilot" in the Extensions marketplace.
  2. Subscription

    • As of 2023, GitHub Copilot requires a subscription, though it provides a free trial for users to test its features.
  3. Workflow

    • After installation, as you code in supported languages, Copilot will start suggesting code in real-time.
    • You can accept suggestions by pressing Tab, or cycle through multiple suggestions with keyboard shortcuts.

Example: Setting up and using the "GitHub Copilot" and "GitHub Copilot Chat" extensions in VS Code.

Step 1: Install the GitHub Copilot extension
  • Open VS Code.
  • Go to the Extensions panel (on the sidebar or Ctrl+Shift+X).
  • Search for "GitHub Copilot" and click Install.
  • Search for "GitHub Copilot Chat" and click Install.
Step 2: Sign in to GitHub
  • After installing the extension, you'll be prompted to sign in to your GitHub account.
  • Make sure your account has access to GitHub Copilot (it requires a paid subscription or a free trial).
Step 3: Start coding
  • Create a new file with an appropriate file extension (e.g. .js, .py, .java).
Please note that the first time you open a file of a given type (e.g. .java), VS Code will suggest installing a few essential language extensions, which you should install.


You can also press Ctrl+I to open inline chat and type a problem statement, and Copilot will generate code for it.


Copilot will start suggesting code as you type. You can accept a suggestion by pressing the Tab key.


Copilot can also suggest code from a comment. Write a comment above a class, and when you move the cursor inside the class, Copilot will offer one or more suggestions that you can accept.

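Suggestions vary, but for a comment like the one below, Copilot might offer an implementation along these lines (an illustrative sketch, not guaranteed output):

// Returns true if the given string is a palindrome, ignoring case
public static boolean isPalindrome(String input) {
    String s = input.toLowerCase();
    int left = 0, right = s.length() - 1;
    while (left < right) {
        if (s.charAt(left) != s.charAt(right)) {
            return false;
        }
        left++;
        right--;
    }
    return true;
}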

To get an explanation of your code, press Ctrl+I, type /explain, and press Enter.



👉 Similarly, you can generate documentation with /doc, generate tests with /tests, and fix code with /fix by pressing Ctrl+I and entering these slash commands.


GitHub Copilot Chat

👉 Open the "GitHub Copilot Chat" window to chat with GitHub Copilot (backed by OpenAI large language models).

Click the GitHub Copilot icon at the bottom right of the IDE -> Show Copilot status menu -> GitHub Copilot Chat -> then start chatting.

👉 Explain the code: Select the code and type /explain in Copilot Chat 


👉 Debug the exception: Copy your exception from the console and paste in Copilot Chat.

👉 Fix the code: Open the code to fix and type /fix in Copilot Chat 

👉 Generate unit tests: Select the method and type /tests in Copilot Chat.

👉 Get documentation: Select the code and type /doc in Copilot Chat.

👉 Review and refactor: Select the code and type a request such as "review and refactor this" in Copilot Chat.

👉 You can also ask Copilot Chat to generate supporting code, such as an exception-handling class, for a specific piece of code.


GitHub Copilot vs GitHub Copilot Chat

  • Copilot provides real-time code suggestions to speed up your workflow, while Copilot Chat offers deeper interactions like answering specific questions, helping with debugging, and explaining code when needed.
  • You can think of Copilot as a passive code-writing assistant and Copilot Chat as a more active, conversational partner.

Kafka

 Apache Kafka is a distributed streaming platform that is used for building real-time data pipelines and streaming applications. It is designed to handle high-throughput, fault-tolerant, and scalable messaging systems. It was originally developed at LinkedIn and later open-sourced through the Apache Software Foundation. Kafka is primarily used for three key functions:

  1. Publish and Subscribe: Kafka allows applications to publish and subscribe to streams of records, which makes it similar to a message queue or enterprise messaging system (see the minimal Java sketch after this list).

  2. Store Streams of Data: Kafka can store streams of records in a fault-tolerant, durable manner. The stored data can be persisted for a defined period, making it suitable for applications that need to process and analyze historical data as well as live data.

  3. Process Streams: Kafka allows applications to process streams of data in real-time as they are produced. This is useful for real-time analytics, monitoring systems, and event-driven architectures.

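As a minimal sketch of the publish/subscribe model using the plain Java client (the kafka-clients library), assuming a local broker on localhost:9092 and an existing topic named my-topic:

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class PubSubSketch {
    public static void main(String[] args) {
        // Producer: publish one record to the topic
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("my-topic", "key-1", "hello kafka"));
        }

        // Consumer: subscribe to the topic and poll for records
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "demo-group");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.println("Received: " + r.value() + " at offset " + r.offset()));
        }
    }
}

Multiple independent consumer groups can subscribe to the same topic, and each group receives the full stream of records.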

Key Definitions
  • Producers: Producers are applications that write data (records) to Kafka topics.

  • Consumers: Consumers are applications that read data from topics.

  • Topic: A topic is a category or feed name to which records are published. Each record in Kafka belongs to a topic.

  • Partition: A subdivision of a topic. Each partition is an ordered, immutable sequence of messages, which allows Kafka to scale horizontally.

  • Broker: A Kafka server that stores messages in topics and serves client requests. A Kafka cluster consists of multiple brokers.

  • Replication: The process of copying data across multiple brokers to ensure durability and availability. Each partition can have multiple replicas.

  • Leader and Follower: In a replicated partition, one broker acts as the leader (handling all reads and writes), while the others are followers (replicating data from the leader).

  • Offset: A unique identifier for each message within a partition, allowing consumers to track their progress.

  • Consumer Lag: The difference between the latest message offset in a topic and the offset of the last message processed by a consumer. It indicates how far a consumer is behind.

  • Schema Registry: A service for managing schemas used in Kafka messages, ensuring that producers and consumers agree on data formats. It supports Avro, Protobuf, and JSON formats and ensures that schema evolution is handled safely (e.g., forward and backward compatibility).

  • Kafka Connect: A framework for integrating Kafka with external systems (databases, file systems, cloud services, etc.). Kafka Connect provides source connectors (to pull data into Kafka) and sink connectors (to push data out of Kafka).

  • Kafka Streams: A client library for building real-time applications that process data stored in Kafka, allowing for transformations, aggregations, and more (see the sketch that follows this list).

  • Topic Retention: The policy that dictates how long messages are kept in a topic. This can be based on time (e.g., retain messages for 7 days) or size (e.g., retain up to 1 GB of messages).

  • Transactional Messaging: A feature that allows for exactly-once processing semantics, enabling producers to send messages to multiple partitions atomically (a transactional producer sketch also follows this list).

  • Log Compaction: A process that reduces the storage footprint of a topic by retaining only the most recent message for each key, useful for maintaining a state snapshot.

  • KSQL: KSQL is a SQL-like streaming engine for Apache Kafka, which allows you to query, manipulate, and aggregate data in Kafka topics using SQL commands.

  • Zookeeper: While not directly part of Kafka's core functionality, Zookeeper is used for managing cluster metadata, broker coordination, and leader election. Note that newer Kafka versions replace Zookeeper with the built-in KRaft mode.
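
A minimal Kafka Streams sketch (assuming a local broker and that the topics input-topic and output-topic already exist) that upper-cases each record value:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Topology: read from input-topic, transform each value, write to output-topic
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");
        source.mapValues(value -> value.toUpperCase()).to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the streams application cleanly on shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}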

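A hedged sketch of transactional (exactly-once) publishing with the plain producer API, assuming a transactional.id of demo-tx and two existing topics named orders and audit-log:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("transactional.id", "demo-tx"); // assumed id; enables idempotence and transactions

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            // Both sends commit atomically, even across different topics and partitions
            producer.send(new ProducerRecord<>("orders", "order-1", "created"));
            producer.send(new ProducerRecord<>("audit-log", "order-1", "order created"));
            producer.commitTransaction();
        } catch (Exception e) {
            producer.abortTransaction();
        } finally {
            producer.close();
        }
    }
}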

Features
  • Durability: Kafka guarantees durability by writing data to disk and replicating it across multiple brokers. Even if some brokers fail, the data remains safe.

  • High Throughput: Kafka can handle a high volume of data with low latency. It achieves this by batching messages, storing them efficiently, and leveraging a zero-copy optimization in modern operating systems.

  • Fault Tolerance: Kafka replicates data across brokers, ensuring that if one broker fails, the data can still be read from another broker that holds the replica.

  • Scalability: Kafka’s partition-based architecture allows horizontal scaling. You can add more brokers to the cluster, and Kafka will redistribute data to ensure balance.

  • Retention: Kafka allows for configuring the retention policy of messages. You can store messages indefinitely or delete them after a certain period or when the log reaches a specific size. This makes Kafka flexible for different use cases, whether you need short-term processing or long-term storage (see the AdminClient sketch after this list).

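An illustrative sketch (one of several ways) of changing a topic's retention with the Java AdminClient, assuming the my-topic topic used in the examples below and a 7-day retention target:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collections;
import java.util.Properties;

public class RetentionConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // retention.ms = 7 days expressed in milliseconds
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(topic, Collections.singletonList(setRetention))).all().get();
        }
    }
}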

Use Cases
  • Real-Time Analytics: Kafka is widely used in big data environments where companies want to process massive streams of events in real time. For example, LinkedIn uses Kafka for tracking activity data and operational metrics, feeding into both batch and stream processing systems.
  • Log Aggregation: Kafka can aggregate logs from multiple services or applications, making it easier to analyze them or store them for future reference. This is useful for monitoring, diagnostics, and troubleshooting.
  • Event Sourcing: Kafka is often used in event-driven architectures, where systems communicate by publishing events to Kafka topics. Consumers can process these events in real-time or later, enabling systems to handle complex workflows and state changes.
  • Messaging System: Kafka can replace traditional message brokers like RabbitMQ or ActiveMQ, especially when dealing with high-throughput messaging needs.
  • Data Pipelines: Kafka serves as a backbone for large-scale data pipelines, allowing the integration of data across multiple systems, such as databases, analytics platforms, and machine learning systems.
Companies Using Kafka
  • LinkedIn (where Kafka was originally developed)
  • Netflix (for real-time monitoring and analytics)
  • Uber (for geospatial tracking and event-based communication)
  • Airbnb (for real-time data flow management)
  • Twitter (for its log aggregation and stream processing systems)
Kafka's ability to handle large volumes of real-time data efficiently, with fault tolerance and scalability, makes it a vital tool for modern data-driven architectures.

Documentation: The official Kafka documentation is excellent and can be found at https://kafka.apache.org/documentation. It covers everything from definitions and setup to commands and configuration.


Example 1: A basic example of Kafka
- Set up an Apache Kafka server on a VM. Use two terminals: one for the producer and another for the consumer. The producer will publish a message to a topic, and the consumer will read it from the topic.

Prerequisites
  • Java: Kafka runs on JVM, so ensure that Java is installed.
  • Zookeeper: Kafka uses Zookeeper to manage brokers, topics, and other cluster-related metadata. Zookeeper comes bundled with Kafka.
Step 1: Install Java. Kafka requires Java 8 or higher
sudo apt update
sudo apt install openjdk-17-jdk -y
java -version

Step 2: Download Kafka
wget https://downloads.apache.org/kafka/3.8.0/kafka_2.12-3.8.0.tgz

tar -xzf kafka_2.12-3.8.0.tgz
cd kafka_2.12-3.8.0

Step 3: Start Zookeeper
- Kafka requires Zookeeper to run, so you must first start a Zookeeper instance. Zookeeper comes bundled with Kafka, so you can use the default Zookeeper configuration.

bin/zookeeper-server-start.sh config/zookeeper.properties


Step 4: Start Kafka Broker
Once Zookeeper is running, you can start the Kafka broker. Open another terminal and run

bin/kafka-server-start.sh config/server.properties


Step 5: Create a topic
- Kafka organizes messages into topics. You can create a new topic

bin/kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

- You can verify the created topic
bin/kafka-topics.sh --list --bootstrap-server localhost:9092


Step 6: Produce a message to the topic

bin/kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092

Type a message and press Enter.


Step 7: Consume a message from the topic
- Open another terminal and run

bin/kafka-console-consumer.sh --topic my-topic --from-beginning --bootstrap-server localhost:9092


Step 8: Managing Kafka
To scale your setup or add brokers, you'll need to configure more brokers and manage them via Zookeeper. Kafka supports various configurations for high availability, replication, and partitioning.

Additional Steps:
  • Configure Kafka for production: You’ll need to modify the server.properties file (e.g. set the broker ID, configure log retention, optimize replication, etc.); a small example follows below.
  • Monitoring and logging: Set up metrics and logging tools like Prometheus, Grafana, or Kafka’s own JMX monitoring.

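For instance, a hedged sketch of a few production-oriented settings in server.properties (the values are assumptions to adjust for your environment):

# Unique ID of this broker within the cluster
broker.id=1
# Directory where Kafka stores its log segments
log.dirs=/var/lib/kafka/logs
# Keep messages for 7 days, or until a partition log reaches ~1 GB
log.retention.hours=168
log.retention.bytes=1073741824
# Replicate each partition to 3 brokers and require 2 in-sync replicas
default.replication.factor=3
min.insync.replicas=2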

Example 2: Integrate Apache Kafka with a Spring Boot application.

Step 1: Setup Spring Boot Project
You can create a Spring Boot application using Spring Initializr (https://start.spring.io/). Include the following dependencies:
  • Spring Web
  • Spring for Apache Kafka

Step 2: Add Kafka configuration in application.properties

# 65.0.215.170 is the IP address of the Kafka server (replace with your own)
spring.kafka.bootstrap-servers=65.0.215.170:9092
spring.kafka.consumer.group-id=my-group
spring.kafka.consumer.auto-offset-reset=earliest

Step 3: Create a Kafka Producer
- A service that will send messages to a Kafka topic.

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducer {

    private final KafkaTemplate<String, String> kafkaTemplate;

    @Autowired
    public KafkaProducer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendMessage(String topic, String message) {
        kafkaTemplate.send(topic, message);
    }
}

Step 4: Create a Kafka Consumer
- A listener that will consume messages from a Kafka topic.
- This assumes that a topic named my-topic has already been created in Kafka.

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumer {

    @KafkaListener(topics = "my-topic", groupId = "my-group")
    public void listen(String message) {
        System.out.println("Received message: " + message);
    }
}


Step 5: Create a controller to test the Producer

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MessageController {

    private final KafkaProducer kafkaProducer;

    @Autowired
    public MessageController(KafkaProducer kafkaProducer) {
        this.kafkaProducer = kafkaProducer;
    }

    @GetMapping("/send")
    public String sendMessage(@RequestParam String message) {
        kafkaProducer.sendMessage("my-topic", message);
        return "Message sent: " + message;
    }
}


Step 6: Run Zookeeper, Kafka, and the Spring Boot application
- Start Zookeeper and the Kafka broker as in Example 1, then run the Spring Boot application (e.g. with ./mvnw spring-boot:run or from your IDE).

Step 7: Test the application
- Call the /send endpoint, for example http://localhost:8080/send?message=hello (8080 is Spring Boot's default port), and check the application console for "Received message: hello".