Ambari Views Now Available on HDInsight
Apache Ambari enables provisioning, managing, and monitoring of Hadoop clusters through a GUI and an API. Previously, views could only be plugged into Ambari through the Ambari Views Framework. Now they are available on HDInsight, allowing for the deployment and management of Linux clusters. Two of the predefined views in Ambari are the Pig and Hive views, both of which can be launched through the Ambari portal.
The Hive view lets users browse databases, write and execute Hive queries, view job history, set Hive query execution parameters, and debug Hive queries. An Ambari Views link and tab have been added to the portal to make this option easier to find. In addition, the portal supports both Hive and Pig queries, changing of settings, visual explanations of queries, the addition of user-defined functions (UDFs), and monitoring and debugging of Tez jobs.
Issues with Extensible Web Resource Loading
This method works well for most applications and webpages, but it does not work well for an extensible and performance-friendly platform. For this platform to function as developers need and users desire, the developer must be able to:
- Modify default fetch settings of all requests initiated via JS, CSS, and HTML.
- Define the preloader policy for any resource declared in CSS and HTML.
- Define the fetch dispatch policy for any resource declared in CSS and HTML.
FLARE’s pykd Project
FireEye Labs Advanced Reverse Engineering (FLARE) has built a new debugging tool: a scripting library that runs on top of pykd for WinDbg. Analysts typically deobfuscate strings in malware using either a self-decoding or a manual programming approach. With self-decoding, library call emulation must be consistent and persistent, which is both necessary and challenging.
In the self-decoding approach, the string decoder function must be detected and recorded at every call site, and the arguments passed at each of those sites must also be recorded. Ideally, this process would occur semi-automatically. To understand the function's inputs and outputs as well as its arguments, Vivisect, a Python binary analysis framework, can be used to apply heuristics, follow cross-references, and emulate and disassemble sequences of opcodes.
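To make the self-decoding idea concrete, here is a minimal sketch of emulating a string decoder once its call sites and arguments have been recovered. The single-byte XOR scheme, addresses, and encoded byte strings are all hypothetical stand-ins, not taken from the FLARE write-up:

```python
# Minimal sketch of the self-decoding approach: once the decoder
# function and its arguments are recovered at each call site, the
# decoder can be emulated to produce the plaintext strings.
# The XOR scheme, addresses, and byte strings below are hypothetical.

def xor_decode(buf: bytes, key: int) -> str:
    """Emulate a simple single-byte XOR string decoder."""
    return bytes(b ^ key for b in buf).decode("ascii")

# Pretend these (call-site address, encoded bytes, key) triples were
# recovered by locating every call to the decoder and its arguments.
call_sites = [
    (0x401000, bytes(b ^ 0x5A for b in b"kernel32.dll"), 0x5A),
    (0x401020, bytes(b ^ 0x5A for b in b"LoadLibraryA"), 0x5A),
]

for addr, encoded, key in call_sites:
    print(f"{addr:#x}: {xor_decode(encoded, key)}")
```

Real decoders are rarely this simple, which is why emulation and disassembly with Vivisect are needed to characterize them first.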
flare-dbg, which runs on top of pykd, aims to make scripting in WinDbg simple through its DebugUtils class of functions. These functions use Vivisect and provide memory and register manipulation, stack operations, debugger execution control, breakpoints, and function calling. With these functions working together, once the call list is generated, all associated strings and arguments are located and the string decoder is invoked via the DebugUtils call function. Once all strings are decoded, the utils script can be used to create IDA Python scripts that add the comments to the IDB, and the script can be fully debugged.
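The workflow can be sketched with stand-in stubs. The method names and decoding logic here are illustrative only, not the exact flare-dbg API; the real DebugUtils drives the target function inside WinDbg through pykd rather than decoding anything locally:

```python
# Illustrative sketch of the flare-dbg string-decoding workflow.
# DebugUtils here is a stand-in stub with a hypothetical API; the real
# class sets up the debuggee's stack and registers, runs the decoder
# function under WinDbg via pykd, and reads back the result.

class DebugUtils:
    """Stand-in for a debugger helper (hypothetical API)."""

    def call(self, func_addr: int, args: list) -> bytes:
        # Stub: XOR-decode locally so the sketch stays runnable.
        encoded, key = args
        return bytes(b ^ key for b in encoded)

def decode_all(dbg: DebugUtils, decoder_addr: int, call_list) -> dict:
    """Invoke the decoder for each recorded call site; collect strings."""
    results = {}
    for site_addr, args in call_list:
        results[site_addr] = dbg.call(decoder_addr, args).decode("ascii")
    return results

# Hypothetical call list: (call-site address, [encoded bytes, key]).
call_list = [
    (0x401000, [bytes(b ^ 0x33 for b in b"VirtualAlloc"), 0x33]),
]
strings = decode_all(DebugUtils(), 0x402000, call_list)
```

The decoded strings keyed by call-site address are then what an IDA Python script would turn into comments in the IDB.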
Launching Clusters in VPC Subnets Supported by Amazon EMR
Amazon EMR 4.2.0 now supports launching clusters in Amazon VPC private subnets, with "Hadoop ecosystem applications, Spark, and Presto in the subnet" of the client's choice. Clusters can be launched without public IP addresses or Internet gateways, can have direct access to data in S3, and a Network Address Translation (NAT) instance can be created so that the cluster can interact with other AWS services.
To launch Amazon EMR clusters in a private subnet, the permissions in the EMR service role and EC2 instance profile must be changed. A route to the S3 buckets must also be established so the clusters can initialize. A NAT is not necessary for routing to public endpoints if S3 is the only AWS service functionality the cluster will use.
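A launch into a private subnet might be sketched as below. The subnet ID, key name, and application list are placeholders, and the default role names are assumed; the request would be passed to boto3's `run_job_flow`:

```python
# Sketch of an EMR 4.2.0 launch request targeting a VPC private subnet.
# Subnet ID and key name are placeholders; role names assume the EMR
# defaults. The dict would be passed to boto3: emr.run_job_flow(**request)

request = {
    "Name": "private-subnet-cluster",
    "ReleaseLabel": "emr-4.2.0",
    "Applications": [{"Name": "Spark"}],
    "Instances": {
        "InstanceCount": 3,
        "MasterInstanceType": "m3.xlarge",
        "SlaveInstanceType": "m3.xlarge",
        # Launching into a private subnet: no public IPs are assigned.
        "Ec2SubnetId": "subnet-00000000",   # placeholder
        "Ec2KeyName": "my-key-pair",        # placeholder
    },
    # Both must carry the permissions EMR needs inside the VPC.
    "ServiceRole": "EMR_DefaultRole",
    "JobFlowRole": "EMR_EC2_DefaultRole",
}

# import boto3
# emr = boto3.client("emr", region_name="us-east-1")
# response = emr.run_job_flow(**request)
```

An S3 endpoint route must exist in the subnet's route table before the instances can bootstrap.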
There are three methods for securing data at rest, covering the input and output results as well as the "Hadoop Distributed Filesystem (HDFS) distributed across [the] cluster and the Local Filesystem on each instance." First, Amazon S3 with the EMR Filesystem (EMRFS) works seamlessly with encrypted data in S3. Second, HDFS transparent encryption with Hadoop KMS can be installed on the master node of the EMR cluster.
Finally, encryption can be applied to the local filesystem on each slave instance. For encryption in transit in Hadoop and Spark, Hadoop MapReduce Shuffle can be encrypted by providing SSL certificates to each node, HDFS rebalancing sends blocks between DataNode processes, and Spark Shuffle moves data between nodes during a job.
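As one concrete example of the first at-rest option, EMRFS can be told to write server-side-encrypted objects to S3 through the cluster's configuration. The sketch below builds the `emrfs-site` classification entry; treat the property as illustrative of the approach rather than a complete encryption setup:

```python
# Sketch of an EMR "Configurations" entry enabling S3 server-side
# encryption for data EMRFS writes (the first at-rest method above).
# Illustrative only; a full setup would also cover HDFS and local disks.
import json

configurations = [
    {
        "Classification": "emrfs-site",
        "Properties": {
            # Encrypt objects written to S3 via EMRFS (SSE).
            "fs.s3.enableServerSideEncryption": "true",
        },
    }
]

print(json.dumps(configurations, indent=2))
```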
API calls can be limited through Identity and Access Management (IAM) if a cluster is created with both an EMR service role and an EC2 instance profile. This limits the cluster's abilities, and the calls it makes can be monitored through AWS CloudTrail. Other existing security features can also be used. "Amazon EMR was also added to the AWS Business Associates Agreement (BAA) for running workloads which process PII data (including eligibility for HIPAA workloads)."
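A minimal sketch of limiting EMR API calls with IAM is shown below. The two allowed actions are real EMR action names, but the policy as a whole is an illustration, not a recommended production policy; the resource scope is left wide open as a placeholder:

```python
# Illustrative IAM policy allowing only two read-only EMR API calls.
# Any EMR action not listed is implicitly denied for this principal.
# The "Resource": "*" scope is a placeholder, not a recommendation.

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:ListClusters",
            ],
            "Resource": "*",
        }
    ],
}
```

Attached to a role or user, this confines the principal to inspecting clusters, and CloudTrail can then record which of the permitted calls were actually made.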