Wednesday, October 12, 2016
Flow, FlowConnector & Cascades : 101
- Flow
  - Is a logical unit of work
  - Can combine multiple MapReduce jobs
  - Is a combination of Source + Pipes + Sink #52
- FlowConnector
  - Is the class used to create a Flow for a specific platform (local, Hadoop, ...)
  - Available FlowConnectors #54
    - LocalFlowConnector
    - HadoopFlowConnector
    - ...
- 'Cascade' #55
  - Is a connected series of Flows
Tap : 101
- Sources and sinks are specified as Taps #51
- Available Taps
  - Hfs
  - Lfs
  - Dfs
  - FileTap
- Example
  - ----- Example 1
    - Scheme sourceScheme = new TextLine( new Fields( "line" ) );
    - Tap source = new Hfs( sourceScheme, inputPath );
    - Scheme sinkScheme = new TextLine( new Fields( "department", "salary" ) );
    - Tap sink = new Hfs( sinkScheme, outputPath, SinkMode.REPLACE );
Pipe : 101
- Transformations are done through Pipes #40
- Branch: a chain of pipes with no merge or split #40
- Pipe Operations #40
  - Filter
  - Function
  - Aggregator
    - Count
  - Buffer
  - Assertion
- Examples
  - ----- Example 1
    - Pipe firstPipe = new Pipe("main");
    - Pipe eachPipe = new Each(firstPipe, new MyFunction());
    - eachPipe = new Each(eachPipe, new MyFilter()); // chain the filter after the function
- 'Each' (used in the example above) #41
  - Passes tuples one at a time through the processing chain
  - Allows a 'Function' or 'Filter' operation
  - ----- Example 1
    - Pipe payroll = new Pipe("payroll");
    - payroll = new Each(payroll, new calc_raise(), new Fields("name","division","salary","raise"));
- Splitting a Pipe #42
  - ----- Example 1
    - Pipe hrdata = new Each("hrdata", new Fields("name","address","phone"), new Identity());
    - Pipe developers = new Each(hrdata, new GetDevelopers()); // This is a filter
    - Pipe managers = new Each(hrdata, new GetManagers()); // This is a filter
- 'GroupBy' Pipe (grouping & sorting) #44
  - ----- Example 1
    - Pipe payroll = new Each("payroll", new Fields("division", "name", "salary", "raise"), new Identity());
    - // Group by division, sort by salary
    - Fields groupFields = new Fields( "division" );
    - Fields sortFields = new Fields( "salary" );
    - Pipe assembly = new GroupBy( payroll, groupFields, sortFields );
- 'Every' Pipe #44
  - Operates on groups of records (from GroupBy or CoGroup pipes)
- 'Merge' Pipe #46
  - Merges multiple streams into a single stream
- Join pipes #46
  - CoGroup Pipe
  - HashJoin Pipe
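To make the GroupBy and Every notes above a bit more concrete, here is a rough plain-Java analogy. It is not Cascading code: the `GroupBySketch` class, its `Row` type and both method names are made up for illustration. `groupBy` mimics grouping on "division" with a sort on "salary", and `countPerGroup` mimics an Every pipe running a Count-style aggregator once per group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java analogy only; none of these types belong to Cascading.
class GroupBySketch {

    // Hypothetical stand-in for a tuple with fields (division, name, salary).
    static final class Row {
        final String division;
        final String name;
        final long salary;

        Row(String division, String name, long salary) {
            this.division = division;
            this.name = name;
            this.salary = salary;
        }
    }

    // "GroupBy": bucket rows by division, then sort each bucket by salary.
    static Map<String, List<Row>> groupBy(List<Row> rows) {
        Map<String, List<Row>> groups = new TreeMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.division, k -> new ArrayList<>()).add(r);
        }
        for (List<Row> group : groups.values()) {
            group.sort(Comparator.comparingLong(r -> r.salary));
        }
        return groups;
    }

    // "Every" with a Count-style aggregator: one result per group.
    static Map<String, Integer> countPerGroup(Map<String, List<Row>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        groups.forEach((division, group) -> counts.put(division, group.size()));
        return counts;
    }
}
```

The point of the sketch is only the shape of the data: after the grouping step every operation sees a whole group at a time, which is why Every needs a GroupBy or CoGroup upstream.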
TupleEntry : 101
- Is used as a wrapper for Tuple & Fields #39
  - ----- Example 1
    - Fields selector = new Fields("name","ssn");
    - Tuple tuple = Tuple.size(2);
    - TupleEntry tupleEntry = new TupleEntry(selector, tuple);
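As a rough mental model of what "wrapper for Tuple & Fields" means, the sketch below pairs a list of names (playing the role of Fields) with a positional array (playing the role of Tuple). It is plain Java, not the Cascading class: `TupleEntrySketch` and its methods are hypothetical and only borrow the names `getObject` and `getString` for familiarity.

```java
import java.util.List;

// Plain-Java analogy only; this is not cascading.tuple.TupleEntry.
class TupleEntrySketch {
    private final List<String> names; // plays the role of Fields
    private final Object[] values;    // plays the role of Tuple

    TupleEntrySketch(List<String> names, Object[] values) {
        if (names.size() != values.length) {
            throw new IllegalArgumentException("names and values differ in size");
        }
        this.names = names;
        this.values = values;
    }

    // Resolve a field name to its position, failing loudly on unknown names.
    private int position(String name) {
        int pos = names.indexOf(name);
        if (pos < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return pos;
    }

    void set(String name, Object value) {
        values[position(name)] = value;
    }

    Object getObject(String name) {
        return values[position(name)];
    }

    // Returns the value as a String, whatever its stored type.
    String getString(String name) {
        Object value = getObject(name);
        return value == null ? null : value.toString();
    }
}
```

Looking values up by name instead of by position is the main convenience this kind of wrapper adds on top of a bare tuple.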
Defining Scheme : 101
- A Scheme contains the field definitions #34
- Is also used to parse & transform data #36
- ----- Examples (Scheme, Fields & Coercion)
  - Scheme personScheme = new TextLine( new Fields( "first_name", "last_name", "age" ) ); #35
  - Fields longFields = new Fields("age", "height").applyTypes(Long.class, Long.class);
  - ---
  - Comparable[] nameFields = new Comparable[] { "name", "age" }; #35
  - Type[] typeFields = new Type[] { String.class, Integer.class };
  - Scheme personScheme = new TextLine( new Fields( nameFields, typeFields ) );
  - ---
- ----- Fluent Interface
  - Fields inFields = new Fields("name", "address", "phone", "age");
  - inFields = inFields.applyType("age", long.class); // applyType returns a new Fields instance
- Preexisting schemes #35
  - NullScheme
  - TextLine
    - Used to read & write text files (gets offset<TAB>data) #36
  - TextDelimited
    - Used to read delimited text files #37
  - SequenceFile #38
Tuple, Fields & Data type coercion...
- Tuple #31
  - cascading.tuple.Tuple
  - Data is represented as Tuples in Cascading
  - Elements in a Tuple are described by Fields #32
  - ----- Methods
    - size(int): Tuple
      - Static; used to create a Tuple of the given size #31
    - getObject(int): Object
      - Returns the element at the given position #32
      - ----- Related
        - getBoolean(), getString(), getFloat(), getDouble(), getInteger()
- Fields #32
  - cascading.tuple.Fields
  - Provides storage for field metadata: names, types, comparators & type coercion
  - ----- Example
    - Fields people = new Fields("first_name", "last_name");
  - ----- Useful Field sets
    - Fields.ALL #34
      - A wildcard that represents all currently available fields
    - Fields.UNKNOWN
    - Fields.ARGS
    - ....
  - Can be used as both 'declarators' and 'selectors' #34
- Data typing & coercion
  - For implicit conversion of one type to another
  - ----- Example
    - pipe = new Coerce(pipe, new Fields("timestamp"), Long.class);
    - Fields simple = new Fields("age", Long.class);
  - ----- Methods
    - TupleEntry.getString()
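The idea behind coercion can be sketched in plain Java as a small conversion helper. This is only an illustration of "convert whatever was stored into the requested type": the `CoerceSketch` class and its `toLong` method are hypothetical, and the rules they apply (null stays null, numbers are narrowed or widened, anything else is parsed from its string form) are assumptions, not a statement of Cascading's exact coercion rules.

```java
// Plain-Java analogy only; the conversion rules here are assumptions.
class CoerceSketch {

    // Converts a stored value to Long, for example a "timestamp" read as text.
    static Long toLong(Object value) {
        if (value == null) {
            return null; // nothing to convert
        }
        if (value instanceof Number) {
            return ((Number) value).longValue(); // Integer, Double, ... to long
        }
        // Fall back to parsing the textual form; throws NumberFormatException on bad input.
        return Long.parseLong(value.toString().trim());
    }
}
```

For instance, `CoerceSketch.toLong(" 42 ")` and `CoerceSketch.toLong(42)` both yield the same Long value, which is the convenience typed Fields and the typed getters aim to provide.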