AWS Big Data Blog

Official Big Data Blog of Amazon Web Services

  • Develop an application migration methodology to modernize your data warehouse with Amazon Redshift

    This post demonstrates how to develop a comprehensive, wave-based application migration methodology for a complex project to modernize a traditional MPP data warehouse with Amazon Redshift. It...
  • Simplifying and modernizing home search at Compass with Amazon ES

    Amazon Elasticsearch Service (Amazon ES) is a fully managed service that makes it easy for you to deploy, secure, and operate Elasticsearch in AWS at scale. It’s a widely popular service and...
  • Introducing Amazon EMR Managed Scaling – Automatically Resize Clusters to Lower Cost

    AWS is happy to announce the release of Amazon EMR Managed Scaling—a new feature that automatically resizes your cluster for best performance at the lowest possible cost. With EMR Managed Scaling...
  • Restrict Amazon Redshift Spectrum external table access to Amazon Redshift IAM users and groups using role chaining

    With Amazon Redshift Spectrum, you can query the data in your Amazon Simple Storage Service (Amazon S3) data lake using a central AWS Glue metastore from your Amazon Redshift cluster. This...
  • Enable fine-grained permissions for Amazon QuickSight authors in AWS Lake Formation

    This post demonstrates how to extend the Lake Formation security model to QuickSight users and groups, which allows data lake administrators to manage data catalog resource permissions centrally...
  • Enforce column-level authorization with Amazon QuickSight and AWS Lake Formation

    Amazon QuickSight is a fast, cloud-powered, business intelligence service that makes it easy to deliver insights and integrates seamlessly with your data lake built on Amazon Simple Storage...
  • How Wind Mobility built a serverless data architecture

    Guest post by Pablo Giner, Head of BI, Wind Mobility. Over the past few years, urban micro-mobility has become a trending topic. With the contamination indexes hitting historic highs, cities and...
  • Streaming web content with a log-based architecture with Amazon MSK

    Content, such as breaking news or sports scores, requires updates in near-real-time. To stay up to date, you may be constantly refreshing your browser or mobile app. Building APIs to deliver this...
  • Process data with varying data ingestion frequencies using AWS Glue job bookmarks

    We often have data processing requirements in which we need to merge multiple datasets with varying data ingestion frequencies. Some of these datasets are ingested one time in full, received...
  • Moovit embraces data lake architecture by extending their Amazon Redshift cluster to analyze billions of data points every day

    In this post, we demonstrate how Moovit, with the support of AWS, implemented a lake house architecture by unloading data into Amazon Simple Storage Service (Amazon S3), instituting a hot/cold...
  • Access web interfaces securely on Amazon EMR launched in a private subnet using an Application Load Balancer

    Amazon EMR web interfaces are hosted on the master node of an EMR cluster. When you launch an EMR cluster in a private subnet, the EMR master node doesn’t have a public DNS record. The web...
  • Best practices for Amazon Redshift Federated Query

    This post discusses 10 best practices to help you maximize the benefits of Federated Query when you have large federated data sets, when your federated queries retrieve large volumes of data, or...
  • Analyzing Google Analytics data with Amazon AppFlow and Amazon Athena

    This post demonstrates how you can transfer Google Analytics data to Amazon S3 using Amazon AppFlow, and analyze it with Amazon Athena. You no longer need to build your own application to extract...
  • Monitor Spark streaming applications on Amazon EMR

    This post demonstrates how to implement a simple SparkListener, monitor and observe Spark streaming applications, and set up some alerts. The post also shows how to use alerts to set up automatic...
  • Setting up trust between ADFS and AWS and using Active Directory credentials to connect to Amazon Athena with ODBC driver

    Amazon Athena is a serverless and interactive query service that allows you to easily analyze your raw and processed datasets in Amazon Simple Storage Service (Amazon S3) using standard SQL. The...
  • How Drop used the Amazon EMR runtime for Apache Spark to halve costs and get results 5.4 times faster

    This post details how we designed and implemented our data lake’s batch ETL pipeline to use Amazon EMR, and the numerous ways we iterated on its architecture to reduce Apache Spark runtimes from...
  • Using Random Cut Forests for real-time anomaly detection in Amazon Elasticsearch Service

    Anomaly detection is a rich field of machine learning. Many mathematical and statistical techniques have been used to discover outliers in data, and as a result, many algorithms have been...
  • Running a high-performance SAS Grid Manager cluster on AWS with Amazon FSx for Lustre

    SAS® is a software provider of data science and analytics used by enterprises and government organizations. SAS Grid is a highly available, fast processing analytics platform that offers...
  • Moving to managed: The case for the Amazon Elasticsearch Service

    You need to factor several considerations into your decision to move to a managed service. Obviously, you want your teams focused on doing meaningful work that propels the growth of your company....
  • Monitor and control the storage space of a schema with quotas with Amazon Redshift

    Yelp connects people with great local businesses. Since its launch in 2004, Yelp has grown from offering services for just one city—its headquarters home of San Francisco—to a multinational...