Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. Most results come back in seconds. With Amazon Redshift, you can start small for just $0.25 per hour with no commitments and scale out to petabytes of data for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions.
Amazon Redshift also includes Redshift Spectrum, allowing you to directly run SQL queries against exabytes of unstructured data in Amazon S3. No loading or transformation is required, and you can use open data formats, including Avro, CSV, Grok, Ion, JSON, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV. Redshift Spectrum automatically scales query compute capacity based on the data being retrieved, so queries against Amazon S3 run fast, regardless of data set size.
Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and by parallelizing queries across multiple nodes. Data load speed scales linearly with cluster size, with integrations to Amazon S3, Amazon DynamoDB, Amazon EMR, Amazon Kinesis, and any SSH-enabled host.
You only pay for what you use. You can have an unlimited number of users doing unlimited analytics on all your data for just $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions. Most customers see a 3-4x reduction in data size after compression, reducing their costs to $250-$333 per uncompressed terabyte per year.
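The per-uncompressed-terabyte figure follows from dividing the price per stored terabyte by the compression ratio. A minimal Python sketch of that arithmetic, where the function name is hypothetical and the inputs are the figures quoted above:

```python
def cost_per_uncompressed_tb(price_per_stored_tb: float, compression_ratio: float) -> float:
    """Yearly cost per TB of raw (uncompressed) data.

    price_per_stored_tb: yearly price per TB actually stored (after compression).
    compression_ratio:   raw size divided by stored size, e.g. 3.0 for "3x".
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return price_per_stored_tb / compression_ratio


PRICE_PER_STORED_TB = 1000.0  # USD per TB per year, as quoted in the text

for ratio in (3, 4):
    cost = cost_per_uncompressed_tb(PRICE_PER_STORED_TB, ratio)
    print(f"{ratio}x compression: about ${cost:.0f} per uncompressed TB per year")
```

With 3x and 4x compression this gives roughly $333 and $250, matching the range stated above.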
Redshift Spectrum enables you to run queries against exabytes of data in Amazon S3 as easily as you run queries against petabytes of data stored on local disks in Amazon Redshift, using the same SQL syntax and BI tools you use today. You can store highly structured, frequently accessed data on Redshift local disks, keep vast amounts of unstructured data in an Amazon S3 “data lake”, and query seamlessly across both.
Amazon Redshift allows you to easily automate most of the common administrative tasks to manage, monitor, and scale your data warehouse. By handling all these time-consuming, labor-intensive tasks, Amazon Redshift frees you up to focus on your data and business.
You can easily resize your cluster up and down as your performance and capacity needs change with just a few clicks in the console or a simple API call.
Security is built in. You can encrypt data at rest and in transit using hardware-accelerated AES-256 and SSL, isolate your clusters using Amazon VPC, and even manage your keys using AWS Key Management Service (KMS) and hardware security modules (HSMs).
Optimized for Data Warehousing
Amazon Redshift uses a variety of innovations to obtain very high query performance on datasets ranging in size from a hundred gigabytes to an exabyte or more. For petabyte-scale local data, it uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries. Amazon Redshift has a massively parallel processing (MPP) data warehouse architecture, parallelizing and distributing SQL operations to take advantage of all available resources. The underlying hardware is designed for high-performance data processing, using locally attached storage to maximize throughput between the CPUs and drives, and a 10GigE mesh network to maximize throughput between nodes. For exabyte-scale data in Amazon S3, Amazon Redshift generates an optimal query plan that minimizes the amount of data scanned and delegates the query execution to a pool of Redshift Spectrum instances that scales automatically, so queries run quickly regardless of data size.
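To illustrate how zone maps can cut I/O, here is a toy Python sketch. It is not Amazon Redshift's implementation: the `Block` type, the block size, and the function names are all assumptions made for illustration. The idea is that each block of a column keeps its minimum and maximum value, so a range filter can skip any block whose range cannot contain a match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One block of a single column plus its zone map (min/max of the block)."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Zone-map check: skip the block if its range cannot overlap the filter.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


column = list(range(1, 101))                  # a sorted column holding 1..100
blocks = build_blocks(column, block_size=10)  # 10 blocks of 10 values each
matches, blocks_read = scan_range(blocks, low=42, high=47)
print(f"read {blocks_read} of {len(blocks)} blocks, found {len(matches)} rows")
```

In this sorted example only one of the ten blocks has to be read. On unsorted data the min/max ranges overlap more, so fewer blocks can be skipped.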
With a few clicks in the console or a simple API call, you can easily change the number or type of nodes in your data warehouse and scale up all the way to a petabyte or more of compressed user data. Dense Storage (DS) nodes allow you to create very large data warehouses using hard disk drives (HDDs) for a very low price point. Dense Compute (DC) nodes allow you to create very high-performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs). While resizing, Amazon Redshift allows you to continue to query your data warehouse in read-only mode until the new cluster is fully provisioned and ready for use.
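As a rough sketch of what such an API call might look like from Python, the helper below wraps the `modify_cluster` operation of the boto3 Redshift client, passing `ClusterIdentifier`, `NodeType`, and `NumberOfNodes`. The helper name `request_resize`, the cluster identifier, and the target sizes are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
def request_resize(client, cluster_id: str, node_type: str, number_of_nodes: int) -> dict:
    """Ask Redshift to change the node type and node count of a cluster.

    `client` is expected to behave like boto3.client("redshift"), i.e. to
    expose a modify_cluster(**kwargs) method. Returns the service response.
    """
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    params = {
        "ClusterIdentifier": cluster_id,   # name of the existing cluster
        "NodeType": node_type,             # e.g. "ds2.xlarge" or "dc1.large"
        "NumberOfNodes": number_of_nodes,  # target node count after the resize
    }
    return client.modify_cluster(**params)


# Example usage (assumes boto3 is installed and AWS credentials are configured;
# "my-warehouse" is a placeholder cluster name):
#
#   import boto3
#   redshift = boto3.client("redshift")
#   request_resize(redshift, "my-warehouse", "ds2.xlarge", 4)
```

Taking the client as a parameter keeps the helper easy to test with a stand-in object. Check the current AWS SDK documentation for the full parameter list before relying on it.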
Query your Amazon S3 “data lake”
Redshift Spectrum enables you to run queries against exabytes of unstructured data in Amazon S3, with no loading or ETL required. When you issue a query, it goes to the Amazon Redshift SQL endpoint, which generates and optimizes a query plan. Amazon Redshift determines what data is local and what is in Amazon S3, generates a plan to minimize the amount of Amazon S3 data that needs to be read, requests Amazon Redshift Spectrum workers out of a shared resource pool to read and process data from Amazon S3, and pulls results back into your Amazon Redshift cluster for any remaining processing.
No Up-Front Costs
You pay only for the resources you provision. You can choose On-Demand pricing with no up-front costs or long-term commitments, or obtain significantly discounted rates with Reserved Instance pricing. On-Demand pricing starts at just $0.25/hour per 160 GB DC1.Large node or $0.85/hour per 2 TB DS2.XLarge node. With Partial Upfront Reserved Instances, you can lower your effective price to $0.10/hour per DC1.Large node ($5,500/TB/year) or $0.228/hour per DS2.XLarge node ($999/TB/year). Redshift Spectrum queries are priced at $5/TB scanned from S3. For more information, see the Amazon Redshift Pricing page.
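The per-terabyte figures can be reproduced from the hourly rates and node sizes quoted above. The Python sketch below assumes a 365-day year and decimal units (160 GB = 0.16 TB), and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes a non-leap year


def price_per_tb_year(hourly_rate: float, node_storage_tb: float) -> float:
    """Yearly price per TB of node storage for a node billed at hourly_rate."""
    return hourly_rate * HOURS_PER_YEAR / node_storage_tb


def spectrum_query_cost(tb_scanned: float, price_per_tb: float = 5.0) -> float:
    """Cost of a Redshift Spectrum query at the quoted $5 per TB scanned."""
    return tb_scanned * price_per_tb


# Effective Partial Upfront Reserved Instance rates from the text.
dc1_large = price_per_tb_year(hourly_rate=0.10, node_storage_tb=0.16)
ds2_xlarge = price_per_tb_year(hourly_rate=0.228, node_storage_tb=2.0)

print(f"DC1.Large:  about ${dc1_large:,.0f} per TB per year")
print(f"DS2.XLarge: about ${ds2_xlarge:,.0f} per TB per year")
print(f"Spectrum query scanning 0.5 TB: ${spectrum_query_cost(0.5):.2f}")
```

This yields about $5,475 per TB per year for DC1.Large (rounded to $5,500 in the text) and about $999 for DS2.XLarge.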
Amazon Redshift has multiple features that enhance the reliability of your data warehouse cluster. All data written to a node in your cluster is automatically replicated to other nodes within the cluster and all data is continuously backed up to Amazon S3. Amazon Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary.