A Peek Into the World of Streaming

What's Streaming?
Data -> Stream processing engine -> Summarized data -> Data storage

Funny thing: Streaming in practice often started on disk!
Data storage -> Data -> Stream processing engine -> Summarized data -> Data storage

Outline
  Motivate streaming applications
  Apache Spark Streaming
  Dataflow/Apache Beam and Watermarks
  Apache Spark Structured Streaming and Watermarks

Counting hashtags: batch
Input: Timestamped Twitter messages, some of them with hashtags.
Output: For each five-minute window, the top ten hashtags along with their counts.
Example input:
  10:01 I love cats #cats
  10:04 My cat just ate a bug, gross. #cats
  10:06 My cat is so cute! #cats
Example output:
  10:00-10:05 #cats 2
  10:05-10:10 #cats 1

Counting hashtags: batch, cont.
Compute the time interval each tweet falls into (e.g., 10:00-10:05 or 10:05-10:10)
Reduce by a key of (time interval, hashtag)
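The two batch steps above can be sketched in plain Java, without any Spark or Beam dependency. This is only an illustration under stated assumptions: the class `BatchHashtagCount`, its methods, and the `"window hashtag"` key format are hypothetical names, timestamps are assumed to be `HH:mm` strings, and ranking down to the top ten is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Spark or Beam.
final class BatchHashtagCount {
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Map an "HH:mm" timestamp to its five-minute window, e.g. "10:04" -> "10:00-10:05".
    static String windowOf(String hhmm) {
        String[] parts = hhmm.split(":");
        int minutes = Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
        int start = (minutes / 5) * 5;
        int end = start + 5;
        return String.format("%02d:%02d-%02d:%02d",
                start / 60, start % 60, (end / 60) % 24, end % 60);
    }

    // Each tweet is {timestamp, text}. Reduce by the key "window hashtag".
    static Map<String, Integer> count(List<String[]> tweets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] tweet : tweets) {
            String window = windowOf(tweet[0]);
            Matcher m = HASHTAG.matcher(tweet[1]);
            while (m.find()) {
                counts.merge(window + " " + m.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> tweets = new ArrayList<>();
        tweets.add(new String[] {"10:01", "I love cats #cats"});
        tweets.add(new String[] {"10:04", "My cat just ate a bug, gross. #cats"});
        tweets.add(new String[] {"10:06", "My cat is so cute! #cats"});
        // Prints "10:00-10:05 #cats 2" then "10:05-10:10 #cats 1".
        count(tweets).forEach((key, n) -> System.out.println(key + " " + n));
    }
}
```

Because all the data is available up front, the order in which tweets are read does not affect the result.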

Counting hashtags: streaming
Input: Timestamped Twitter messages, some of them with hashtags.
Output: The top ten hashtags along with their counts.
Example input:
  10:01 I love cats #cats
  10:04 My cat just ate a bug, gross. #cats
  10:06 My cat is so cute! #cats
Example output:
  10:00-10:05 #cats 2
  10:05-10:10 #cats 1

Apache Spark Streaming
Idea: divide input up into micro-batches
Batch 10:00-10:05:
  10:01 I love cats #cats
  10:04 My cat just ate a bug, gross. #cats
Batch 10:05-10:10:
  10:06 My cat is so cute! #cats
For each batch, group by the hashtag (reduceByKey in Apache Spark), and perform the count. DONE.

Apache Spark Streaming: Batch boundary does not need to match aggregation boundary
  10:01:30 I love cats #cats
  10:01:55 RT #cats are the best.
  10:02:12 Dead mouse #cats
  10:03:52 Live mouse. Wish I had a cat. #cat
  10:04:23 My cat just ate a bug, gross. #cats
  10:04:44 My #cat had kittens!
  10:06 My cat is so cute! #cats
Batches: 10:01 10:02 10:03 10:04
Aggregate by batches and sum over five batches.
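A minimal sketch of "aggregate by batches and sum over five batches", in plain Java rather than the Spark API. The class `BatchWindowSum` and its methods are hypothetical, and the example assumes one-minute batches with an empty 10:00 batch so that five batches cover 10:00-10:05.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not the Spark Streaming API.
final class BatchWindowSum {
    // Count the hashtags seen in one micro-batch.
    static Map<String, Integer> countBatch(List<String> hashtags) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : hashtags) {
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    // Sum per-batch counts over the batches that make up one aggregation window.
    static Map<String, Integer> sumBatches(List<Map<String, Integer>> batches) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> batch : batches) {
            batch.forEach((tag, n) -> totals.merge(tag, n, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> fiveBatches = List.of(
                countBatch(List.of()),                  // 10:00 (assumed empty)
                countBatch(List.of("#cats", "#cats")),  // 10:01
                countBatch(List.of("#cats")),           // 10:02
                countBatch(List.of("#cat")),            // 10:03
                countBatch(List.of("#cats", "#cat")));  // 10:04
        System.out.println(sumBatches(fiveBatches));    // {#cat=2, #cats=4}
    }
}
```

Each batch is counted once, and the window result is derived from the small per-batch summaries instead of the raw tweets.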

Dataflow/Apache Beam
Data is in a PCollection
Programmer provides transformations on the PCollection
Helpers for basics like groupby
When done, Run!

    Pipeline mypipeline = Pipeline.create(options);
    PCollection<String> inputdata = mypipeline.apply(/* read the data */); // Nothing happens.
    PCollection<String> foo = inputdata.apply(...).apply(...); // Nothing happens.
    // more stuff...
    mypipeline.run(); // Actually does something

Dataflow: Program vs execution may be different
Program: Trim string -> To lower case -> Group By
Dataflow is given the whole program, with all operations, in advance and may reorder and rearrange them
Execution engine is essentially pluggable because what is provided is a program description
Program can be batch or streaming
Uses a watermark in streaming applications
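The split between describing a program and executing it can be illustrated with a toy pipeline in plain Java. This is not the Beam API: `LazyPipelineSketch`, `apply`, `describe`, and `run` are hypothetical names, and the final "Group By" is hard-wired as a count per distinct value to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Hypothetical toy pipeline for illustration; not the Beam or Dataflow API.
final class LazyPipelineSketch {
    private final List<String> stepNames = new ArrayList<>();
    private final List<UnaryOperator<String>> steps = new ArrayList<>();

    // Record a per-element step. Nothing is executed here.
    LazyPipelineSketch apply(String name, UnaryOperator<String> step) {
        stepNames.add(name);
        steps.add(step);
        return this;
    }

    // The program description an execution engine could inspect before running.
    List<String> describe() {
        return new ArrayList<>(stepNames);
    }

    // Only now is the recorded program executed: apply each step, then group and count.
    Map<String, Integer> run(List<String> input) {
        Map<String, Integer> groups = new TreeMap<>();
        for (String element : input) {
            String value = element;
            for (UnaryOperator<String> step : steps) {
                value = step.apply(value);
            }
            groups.merge(value, 1, Integer::sum);
        }
        return groups;
    }

    public static void main(String[] args) {
        LazyPipelineSketch pipeline = new LazyPipelineSketch()
                .apply("Trim string", String::trim)
                .apply("To lower case", String::toLowerCase);
        System.out.println(pipeline.describe());  // [Trim string, To lower case]
        System.out.println(pipeline.run(List.of(" #Cats", "#cats ", "#CAT")));  // {#cat=1, #cats=2}
    }
}
```

Because `run` sees the full list of steps before touching any data, an engine is free to fuse or reorder them, which is what makes the execution engine pluggable.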

Back to Counting #cats
Or, why a watermark is useful

What problems can arise here?
Data, Data -> Do aggregation -> #cat counts

Counting hashtags: real world
Data as seen by the stream processor (out of order, really common!):
  10:01 I love cats #cats
  10:06 My cat is so cute! #cats
  10:04 My cat just ate a bug, gross. #cats
Question: What's the count for 10:00-10:05? When do we output it?
  10:00-10:05 #cats ?
  10:05-10:10 #cats ?

Counting hashtags: real world
Data as seen by the stream processor (out of order, really common!):
  10:01 I love cats #cats
  10:06 My cat is so cute! #cats
  10:04 My cat just ate a bug, gross. #cats
Question: What's the count for 10:00-10:05? When do we output it?
  10:00-10:05 #cats 2
  10:00-10:05 #cats 1

Counting hashtags: real world
Case 1:
  10:01 I love cats #cats
  10:06 My cat is so cute! #cats
  10:04 My cat just ate a bug, gross. #cats
Case 2:
  10:01 I love cats #cats
  10:06 My cat is so cute! #cats
  three hours later...
  10:04 My cat just ate a bug, gross. #cats
Question: What's the count for 10:00-10:05? When do we output it?

What to do with out-of-order data?
Discard anything earlier than what's already been seen
  One early data item, and you miss a lot!
  You could miss an entire slow-to-arrive source!
When nothing from 10:00-10:05 has been seen for x minutes
  Depends on data input source
  Maybe the source has some idea of how out-of-order data can be
Special business logic: When fewer than x things seen and ...
Question: How do we program this?

Watermark: Data is complete up until this point
Watermark is X => all data earlier than X has arrived
In our example: Watermark is 10:05 => all data up until 10:05 has arrived => we can output the #cat count
Doneness for a Stream!
Refinement 1: Early outputs
Refinement 2: Late data treatment: drop it all, allow it all, or drop it if later than some time.
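A minimal sketch of the watermark rule for the #cats example, in plain Java. Everything here is an assumption made for illustration: the class `WatermarkWindowCounter` is hypothetical, times are minutes since midnight, only one hashtag is counted, the watermark is supplied by the caller rather than derived from a source, and late data uses the "drop it all" policy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not the Beam or Spark API.
final class WatermarkWindowCounter {
    private static final int WINDOW_MINUTES = 5;
    // Window start minute -> count, for windows not yet output.
    private final TreeMap<Integer, Integer> openWindows = new TreeMap<>();
    private int watermark = 0;   // all data earlier than this is assumed to have arrived
    private int droppedLate = 0;

    // Count an event by its event time; anything behind the watermark is dropped.
    void onEvent(int eventMinute) {
        if (eventMinute < watermark) {
            droppedLate++;
            return;
        }
        openWindows.merge((eventMinute / WINDOW_MINUTES) * WINDOW_MINUTES, 1, Integer::sum);
    }

    // Move the watermark forward and output every window that is now complete.
    List<String> advanceWatermark(int newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        List<String> output = new ArrayList<>();
        Iterator<Map.Entry<Integer, Integer>> it = openWindows.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Integer> entry = it.next();
            int end = entry.getKey() + WINDOW_MINUTES;
            if (end > watermark) {
                break;  // windows are sorted, so later ones are still open too
            }
            output.add(clock(entry.getKey()) + "-" + clock(end) + " #cats " + entry.getValue());
            it.remove();
        }
        return output;
    }

    int getDroppedLate() {
        return droppedLate;
    }

    private static String clock(int minutes) {
        return String.format("%02d:%02d", minutes / 60, minutes % 60);
    }

    public static void main(String[] args) {
        WatermarkWindowCounter counter = new WatermarkWindowCounter();
        counter.onEvent(601);  // 10:01
        counter.onEvent(606);  // 10:06
        counter.onEvent(604);  // 10:04, out of order but ahead of the watermark
        System.out.println(counter.advanceWatermark(605));  // [10:00-10:05 #cats 2]
        counter.onEvent(603);  // 10:03 arrives behind the watermark and is dropped
        System.out.println(counter.getDroppedLate());       // 1
    }
}
```

The out-of-order 10:04 tweet is still counted because the watermark had not yet reached 10:05 when it arrived.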

Dataflow with watermarks
Runtime tracks the watermark (with potential source-specific logic)
API provides way to specify whether and when to output before watermark is reached
API provides way to specify whether and when to output after watermark is reached
To see how much of a difference that can make: https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison (google spark vs dataflow)
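To make "output before" and "output after" the watermark concrete, here is a toy model of a single window in plain Java. It is not the Beam trigger API: the class `WindowPaneSketch`, the labels `EARLY`, `ON_TIME`, `LATE`, and `DROPPED`, the fire-on-every-element early policy, and the `allowedLateness` parameter are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-window model for illustration; not the Beam trigger API.
final class WindowPaneSketch {
    private final int windowEnd;        // minute at which the window closes
    private final int allowedLateness;  // minutes past windowEnd in which late data still counts
    private final List<String> panes = new ArrayList<>();
    private int watermark = 0;
    private int count = 0;
    private boolean onTimeEmitted = false;

    WindowPaneSketch(int windowEnd, int allowedLateness) {
        this.windowEnd = windowEnd;
        this.allowedLateness = allowedLateness;
    }

    // Called by the runtime as it learns that earlier data is complete.
    void advanceWatermark(int newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        if (!onTimeEmitted && watermark >= windowEnd) {
            onTimeEmitted = true;
            panes.add("ON_TIME " + count);
        }
    }

    // Called for each element that belongs to this window, whenever it arrives.
    void onElement() {
        if (watermark < windowEnd) {
            count++;
            panes.add("EARLY " + count);   // speculative output before the watermark
        } else if (watermark < windowEnd + allowedLateness) {
            count++;
            panes.add("LATE " + count);    // correction after the watermark
        } else {
            panes.add("DROPPED");          // too late, the result stays as it was
        }
    }

    List<String> panes() {
        return new ArrayList<>(panes);
    }

    public static void main(String[] args) {
        WindowPaneSketch window = new WindowPaneSketch(605, 10);  // 10:00-10:05, 10 minutes of lateness
        window.onElement();            // 10:01 tweet arrives first
        window.advanceWatermark(605);  // watermark reaches 10:05
        window.onElement();            // 10:04 tweet arrives late
        window.advanceWatermark(620);  // watermark passes 10:15
        window.onElement();            // another 10:0x tweet, far too late
        System.out.println(window.panes());  // [EARLY 1, ON_TIME 1, LATE 2, DROPPED]
    }
}
```

The same three arrivals give different visible results depending on the early and late policies, which is why the API lets the programmer choose them.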

Apache Spark 2.0 Structured Streaming
Incorporates idea of a watermark.
It maintains a table of output results, and updates them as data is processed.
Data seen so far:
  10:01 I love cats #cats
  10:06 My cat is so cute! #cats
Table of output results:
  10:00-10:05 #cats 1
  10:05-10:10 #cats 1

Apache Spark 2.0 Structured Streaming
Incorporates idea of a watermark.
It maintains a table of output results, and updates them as data is processed.
Data seen so far:
  10:01 I love cats #cats
  10:06 My cat is so cute! #cats
  10:04 My cat just ate a bug, gross. #cats
Table of output results:
  10:00-10:05 #cats 2
  10:05-10:10 #cats 1
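The updating result table can be sketched in plain Java. This is a simplification and not Spark's exact semantics: the class `ResultTableSketch` is hypothetical, the watermark is modeled as the largest event time seen minus a delay threshold and is re-evaluated on every record, records older than it are ignored, and only #cats is counted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model for illustration; not the Spark Structured Streaming API.
final class ResultTableSketch {
    private static final int WINDOW_MINUTES = 5;
    private final int delayThreshold;  // how far the watermark trails the newest event time
    private final TreeMap<Integer, Integer> table = new TreeMap<>();  // window start -> count
    private int maxEventTime = 0;

    ResultTableSketch(int delayThresholdMinutes) {
        this.delayThreshold = delayThresholdMinutes;
    }

    // Update the table in place; returns false if the record is behind the watermark.
    boolean onRecord(int eventMinute) {
        int watermark = maxEventTime - delayThreshold;
        if (eventMinute < watermark) {
            return false;
        }
        maxEventTime = Math.max(maxEventTime, eventMinute);
        table.merge((eventMinute / WINDOW_MINUTES) * WINDOW_MINUTES, 1, Integer::sum);
        return true;
    }

    // Current contents of the result table, one line per window.
    List<String> rows() {
        List<String> rows = new ArrayList<>();
        table.forEach((start, n) ->
                rows.add(clock(start) + "-" + clock(start + WINDOW_MINUTES) + " #cats " + n));
        return rows;
    }

    private static String clock(int minutes) {
        return String.format("%02d:%02d", minutes / 60, minutes % 60);
    }

    public static void main(String[] args) {
        ResultTableSketch results = new ResultTableSketch(10);
        results.onRecord(601);  // 10:01
        results.onRecord(606);  // 10:06
        System.out.println(results.rows());  // [10:00-10:05 #cats 1, 10:05-10:10 #cats 1]
        results.onRecord(604);  // 10:04 is within 10 minutes of 10:06, so the old row is updated
        System.out.println(results.rows());  // [10:00-10:05 #cats 2, 10:05-10:10 #cats 1]
    }
}
```

With a 10-minute threshold the late 10:04 record corrects the 10:00-10:05 row, while a record that trails the newest data by hours would be ignored.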

Questions?