Terminologies Tech Startup Entrepreneurs Should Know
They say the only thing that is constant is change. This applies very well to the startup technology world. Also, technology grows exponentially, evolving and changing at faster and faster rates. Its development unfolds right before our eyes and we don’t even realize it.
If you happen to be a tech entrepreneur working on your startup, you should at least be aware of the latest technologies that are being adopted by other software developers around the world.
As startup founders, when we start working on an idea, most of us want to get the first version (the minimum viable product) out quickly and test the market feasibility of the idea or product. We do this with the know-how from our previous experience. At this point, you would hardly think much about long-term concerns like scaling, response time, and performance unless you come from a strong technical background.
I went through the same thing while building my previous startups. So I put together this list of new-age technologies that the most successful startups and organisations use to make their products highly scalable and performant while keeping the user experience excellent.
Just being aware of these terminologies and technologies will help any startup founder/developer make a better decision about what technology to choose when they are starting up.
- Jenkins is a powerful application that allows continuous integration and continuous delivery of projects, regardless of the platform you are working on. It is a free, open source automation server that handles any kind of build or continuous integration. You can integrate Jenkins with a number of testing and deployment technologies.
- Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
- Chef is both the name of a company and the name of a configuration management tool. Chef is used to streamline the task of configuring and maintaining a company’s servers, and can integrate with cloud-based platforms such as Internap, Amazon EC2, Google Cloud Platform, OpenStack, SoftLayer, Microsoft Azure and Rackspace to automatically provision and configure new machines.
- Docker enables users to package any application in a lightweight, portable container so that installing a server-side Linux app becomes as easy as installing a mobile app from the command line. It packages an application with all of its dependencies into a single unit. Docker containers wrap a piece of software in a complete filesystem that contains everything it needs to run on a server: code, runtime, system tools, and system libraries.
- ZooKeeper is a distributed coordination service for distributed systems. It provides a centralized infrastructure and services that enable synchronization across a cluster. ZooKeeper maintains common objects needed in large cluster environments. It is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
- Apache Flink is an open-source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.
- Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. The first step to create a data warehouse is to launch a set of nodes, called an Amazon Redshift cluster. After you provision your cluster, you can upload your data set and then perform data analysis queries. Regardless of the size of the data set, Amazon Redshift offers fast query performance.
- Amazon S3 (Simple Storage Service) is an online file storage web service offered by Amazon Web Services. Amazon S3 provides storage through web services interfaces (REST, SOAP, and BitTorrent).
- Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop consists of several parts:
  - HDFS - Hadoop Distributed File System
  - YARN - Yet Another Resource Negotiator (the resource manager)
  - MapReduce - the batch processing framework of Hadoop
- Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, analysis and managing large datasets residing in distributed storage.
- Apache Spark is an open-source big data processing framework built around speed, ease of use, and sophisticated analytics. According to the project, Spark lets applications in Hadoop clusters run up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster on disk.
- HBase is a non-relational (NoSQL) database that runs on top of HDFS (Hadoop Distributed File System). It is most suited for real-time read/write access to large datasets. HBase scales linearly to handle huge data sets with billions of rows and millions of columns, and it easily combines data sources that use a wide variety of different structures and schemas. HBase is natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.
- Scala is a general purpose programming language. Scala has full support for functional programming and a very strong static type system. Scala source code is intended to be compiled to Java bytecode, so that the resulting executable code runs on a Java virtual machine.
- MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of a Map() method that performs filtering and sorting and a Reduce() method that performs a summary operation.
- Drill is an Apache open-source SQL query engine for Big Data exploration. Drill is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data coming from modern Big Data applications, while still providing the familiarity of ANSI SQL. Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments.
- Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.
- Lambda Architecture is a generic, scalable and fault-tolerant data processing architecture. It is an approach to building stream processing applications on top of MapReduce and Storm or similar systems.
- The Advanced Message Queuing Protocol (AMQP) is an open standard for passing business messages between applications or organizations. It connects systems, feeds business processes with the information they need and reliably transmits onward the instructions that achieve their goals.
- Memcached is a general-purpose distributed memory caching system. It is used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source must be read.
- Extract, Transform and Load (ETL) refers to a process in data warehousing operations of extracting data from source systems and bringing it into the data warehouse.
- The Java Message Service (JMS) API is a Java message-oriented middleware API for sending messages between two or more clients. It is a messaging standard that allows the components of a distributed application to create, send, receive, and read messages.
- HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. Its most common use is to improve the performance and reliability of a server environment by distributing the workload across multiple servers (e.g. web, application, database).
- RabbitMQ is a complete and highly reliable enterprise messaging system based on the AMQP standard. At its core, RabbitMQ is a message broker: it accepts and forwards messages. It can be thought of as a post office: when we drop mail in the post box, we are pretty sure that the postman will eventually deliver it to our recipient. In this metaphor, RabbitMQ is the post box, the post office, and the postman.
- Apache Lucene is an extremely rich and powerful full-text search library written in Java. It is used to provide full-text indexing across both database objects and documents in various formats.
- Solr is an open source enterprise search platform, written in Java and built on top of Apache Lucene. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration and NoSQL features. It also provides distributed search and index replication, and it is designed for scalability and fault tolerance. Solr is the second-most popular enterprise search engine after Elasticsearch.
- Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine built on top of Apache Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and was originally released as open source under the terms of the Apache License.
- Redis is an open-source, in-memory data structure store used as database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
- Apache Cassandra, a top level Apache project, is a distributed database for managing large amounts of structured data across many servers, while providing highly available service and no single point of failure. It offers continuous availability, linear scale performance, operational simplicity and easy data distribution across multiple data centers and cloud availability zones.
- Pingdom is a service that tracks the uptime, downtime, and performance of websites. Based in Sweden, Pingdom monitors websites from multiple locations globally so that it can distinguish genuine downtime from routing and access problems.
- IBM Watson is a technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data. Watson Analytics combines visualization with data tagging, machine learning, and cloud storage.
- Kubernetes is an open-source system, originally developed by Google, for automating the deployment, operation, and scaling of containerized applications in a clustered environment. It aims to provide better ways of managing related, distributed components across varied infrastructure. It schedules containers to run across a cluster of machines, deploying them individually or in tightly coupled groups called pods, and keeping resource needs in mind as it distributes the work.
- Apache Mesos abstracts computational resources such as CPU, memory, and storage away from machines (physical or virtual), enabling distributed systems to be built easily and run effectively. It basically acts like an operating system for the data center, distributing work across multiple machines without your having to manage and monitor resources on those machines yourself.
- Splunk enables searching, monitoring, and analyzing machine-generated big data via a web-style interface. Splunk captures, indexes and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards and visualizations.
- OpenShift is a cloud Platform-as-a-Service (PaaS) developed by Red Hat and built on Docker and Kubernetes. It lets developers quickly develop, deploy, and run applications in a cloud environment.
- AWS Lambda is a compute service that runs your code on your behalf using AWS infrastructure. After you upload your code and create what is called a Lambda function, AWS Lambda takes care of provisioning and managing the servers that run the code.
- Gulp is a fast and intuitive streaming build tool built on Node.js that helps you automate time-consuming tasks in your development workflow.
- NGINX (pronounced “engine x”) is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. NGINX is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. It can also act as a load balancer and an HTTP cache.
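Of all the entries above, MapReduce is probably the easiest one to try out in a few lines, so here is a minimal sketch of the idea in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are made up for illustration, and everything runs in a single process, whereas a real cluster would spread the map and reduce work across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, which a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: summarise all values for one key (here, by summing them)."""
    return key, sum(values)


def word_count(documents):
    # Hypothetical driver: on a cluster, each document would be mapped on a different node.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["Kafka feeds Storm", "Storm feeds HBase"]
    print(word_count(docs))  # {'kafka': 1, 'feeds': 2, 'storm': 2, 'hbase': 1}
```

The point of splitting the work this way is that the map and reduce functions never share state, which is what lets a framework such as Hadoop run them in parallel on separate machines.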
Feel free to suggest more technologies which should be added to the list!