Tuesday, July 23, 2013

Split String into Fixed Length chunks using Java

I needed to split a large string into fixed-length chunks of equal size using Java. Apart from the regular method of looping and doing a substring for the required length, I was wondering what other ways there were to achieve the same. Of course, it's Mr. Regex to the rescue for the task at hand! Here is the regex I used to do this:
String largeString = "This is a very large and totally useless and meaningless string";
int chunkLength = 5;
String[] chunks = largeString.split("(?<=\\G.{" + chunkLength + "})");
System.out.println("Number of chunks: " + chunks.length); // Should print 13
System.out.println("Chunk size: " + chunks[0].length()); // Should print 5 == chunkLength
The regular expression is a Positive Look-Behind: it matches the zero-width position that is preceded by exactly chunkLength characters, counted from \G, the position where the last match ended. The first time around, \G is the beginning of the string, and after that the split point keeps moving forward by one set of chunkLength characters at a time.
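
For comparison, here is a minimal sketch of the "regular method" of looping and taking substrings. The class name ChunkSplitter and the method split are made up for this illustration. One difference worth keeping in mind: the "." in the regex does not match line terminators by default, so the two approaches may behave differently on multi-line input.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkSplitter {
    // Hypothetical helper: cuts text into pieces of at most chunkLength characters.
    static List<String> split(String text, int chunkLength) {
        if (chunkLength <= 0) {
            throw new IllegalArgumentException("chunkLength must be positive");
        }
        List<String> chunks = new ArrayList<String>();
        for (int start = 0; start < text.length(); start += chunkLength) {
            // The last chunk may be shorter than chunkLength.
            int end = Math.min(start + chunkLength, text.length());
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String largeString = "This is a very large and totally useless and meaningless string";
        List<String> chunks = split(largeString, 5);
        System.out.println("Number of chunks: " + chunks.size());
        System.out.println("Last chunk: " + chunks.get(chunks.size() - 1));
    }
}
```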
This makes me think I should write a more detailed post about regular expressions, especially on Look-Ahead and Look-Behind - stay tuned!

Wednesday, May 22, 2013

Free 15+ GB of online storage

Copy is the new online cloud storage service in town.
Sign up for a free Copy account and you will get 15 GB of storage. And if you use the referral link below, both of us will get an additional 5 GB of storage space for free!
So what are you waiting for? Let's all get some more free storage space!


Tuesday, April 30, 2013

Viewing scrolled out logger lines in Eclipse console

Some programs output a lot of text onto the console in Eclipse, especially in debug mode. To ensure that all console output for such programs is available, here are a few Eclipse settings that will be very helpful. Go to Window > Preferences > Run/Debug > Console and change the following:
  1. Fixed width console: un-check this property
  2. Limit console output: un-check this property
  3. Displayed tab width: set this to 4
Save these changes and run the program again. Now you can scroll back to the specific logger lines that you want to monitor, even after they have scrolled out of view in the console.

Thursday, April 18, 2013

Java 5 - Summary of new & important features

Here is a summary of the useful features that were introduced in Java 5.

Generics simplify code
Adds compile-time type safety and eliminates the necessity for type casting without any performance hit

Enhanced For loops - also known as For/Each
Simplifies looping code but does not make the iterator visible. Loops through each entry of the collection/array, returning one properly cast value at a time for processing - without requiring you to define an Iterator.
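
A small sketch of generics and the enhanced for loop working together. The class ForEachDemo and the method totalLength are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ForEachDemo {
    // Hypothetical helper: adds up the lengths of all strings in the list.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) { // no Iterator and no cast needed
            total += word.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        words.add("generics");
        words.add("for-each");
        // words.add(42); // would not compile: the list only accepts Strings
        System.out.println(totalLength(words));
    }
}
```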

Typesafe Enums
Better than public static final String variables - allows us to create enumerated types with arbitrary methods and fields
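
As a rough illustration, an enum can carry its own fields and methods. The Level enum and its weights below are made up for this sketch.

```java
class EnumDemo {
    // Hypothetical enum with a field, a constructor and methods.
    enum Level {
        LOW(1), MEDIUM(5), HIGH(10);

        private final int weight;

        Level(int weight) {
            this.weight = weight;
        }

        int getWeight() {
            return weight;
        }

        boolean isAtLeast(Level other) {
            return weight >= other.weight;
        }
    }

    public static void main(String[] args) {
        for (Level level : Level.values()) {
            System.out.println(level + " has weight " + level.getWeight());
        }
        System.out.println(Level.HIGH.isAtLeast(Level.MEDIUM));
    }
}
```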

Metadata Annotations
Reduces coding issues like overloading instead of overriding with the use of pre-defined annotations. Allows for a more declarative style of programming, reducing boiler-plate code with user-defined annotations.

Varargs provides flexibility
Methods can accept a variable number of arguments, decided at each call site, with a few restrictions - the varargs parameter has to be the last parameter of the method and all of its arguments have to be of the same data type. The ellipsis "..." after the type is used to indicate that the argument might appear a variable number of times.
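
A minimal varargs sketch; VarargsDemo and sum are hypothetical names chosen for this example.

```java
class VarargsDemo {
    // Hypothetical method: the varargs parameter (int...) must come last.
    static int sum(String label, int... numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        System.out.println(label + ": " + total);
        return total;
    }

    public static void main(String[] args) {
        sum("no numbers");               // zero arguments are allowed
        sum("three numbers", 1, 2, 3);   // any number of ints
        sum("array", new int[] {4, 5});  // an array works as well
    }
}
```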

Auto Boxing / Unboxing
Implicit conversion between primitives and their wrapper classes - convenient, but can have lower performance, as every boxed value is an object allocated on the heap while primitive local variables are kept on the stack.
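
A short sketch of where boxing and unboxing happen implicitly. BoxingDemo and sumBoxed are invented names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    // Hypothetical helper: each Integer is unboxed to an int before being added.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer value : values) {
            total += value; // unboxing: Integer -> int
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<Integer>();
        for (int i = 1; i <= 5; i++) {
            values.add(i); // autoboxing: int -> Integer
        }
        System.out.println(sumBoxed(values));
    }
}
```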

Synchronization changes in java.util.concurrent package
Introduction of Locks concept to provide better semantics than "synchronized" keyword and many more useful features like Read/Write locks.
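
One possible way to use a Read/Write lock from java.util.concurrent.locks is sketched below. The LockedCounter class is a made-up example, not code from any library.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockedCounter {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private int value;

    void increment() {
        lock.writeLock().lock(); // only one writer at a time
        try {
            value++;
        } finally {
            lock.writeLock().unlock(); // always release in finally
        }
    }

    int get() {
        lock.readLock().lock(); // several readers may hold this together
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final LockedCounter counter = new LockedCounter();
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) {
                    counter.increment();
                }
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println(counter.get());
    }
}
```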

Static import
Makes code more readable and reduces redundant typing for qualifying the static members (methods & fields) of a class with the class name at each occurrence.

Performance improvements with better GC, StringBuilder, etc

Formatter allows for better printing with printf
No more clumsy println's with string concatenation - use C-style printf with format strings.
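
A small sketch of format strings in action. PrintfDemo and row are hypothetical names; String.format accepts the same format strings as printf.

```java
class PrintfDemo {
    // Hypothetical helper: left-aligns the name in 8 columns, right-aligns
    // the number in 3 columns and appends a literal percent sign.
    static String row(String name, int percent) {
        return String.format("%-8s%3d%%", name, percent);
    }

    public static void main(String[] args) {
        System.out.println(row("disk", 42));
        // The same format string used directly with printf (%n is a line break).
        System.out.printf("%-8s%3d%%%n", "memory", 7);
    }
}
```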

Scanner simplifies basic parsing
Easier than String.split and Integer.parseInt
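
As a rough example, Scanner can pull whitespace-separated integers out of a string without any manual splitting or parsing. ScannerDemo and sumInts are names invented for this sketch.

```java
import java.util.Scanner;

class ScannerDemo {
    // Hypothetical helper: sums the leading integers in the input and
    // stops at the first token that is not an integer.
    static int sumInts(String input) {
        Scanner scanner = new Scanner(input);
        int total = 0;
        while (scanner.hasNextInt()) {
            total += scanner.nextInt();
        }
        scanner.close();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("10 20 30"));
    }
}
```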

These are too numerous and important to detail all of them in a single post - so each will be covered in detail with code examples in posts of their own soon.

Check out the Summary of Java 7 features that make Java even better!

Wednesday, April 10, 2013

Six Sigma - a brief introduction

Sigma is the 18th letter of the Greek alphabet and, among other things, its lower case form (σ) is used to denote the Standard Deviation of a population or Probability Distribution in Statistics. In simple terms, it measures how far the values typically deviate from the mean.

Enough already about plain Sigma, now let's look at Six Sigma.
Six Sigma is a set of data-driven tools and strategies for improving processes - originally developed by Motorola in 1985. It became more popular after Jack Welch made it the standard process to be used in GE in 1995.

The main philosophy behind Six Sigma is to use a data-driven approach to measure the number of defects in the process and then figure out a way to systematically reduce them as close to zero as possible. To achieve Six Sigma quality, the defects should be reduced to fewer than 3.4 per million opportunities. For example, if a million parts are produced by a factory in a week, then it would be running at Six Sigma if fewer than 3.4 (that is, 3 or fewer) parts turn out to be defective in that week.
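
The arithmetic behind this can be sketched as defects per million opportunities (DPMO) = defects / opportunities x 1,000,000. The Dpmo class below is a made-up illustration, and it assumes that each part produced counts as exactly one opportunity for a defect.

```java
class Dpmo {
    // Hypothetical helper: defects per million opportunities.
    static double dpmo(long defects, long opportunities) {
        if (opportunities <= 0) {
            throw new IllegalArgumentException("opportunities must be positive");
        }
        return defects * 1000000.0 / opportunities;
    }

    // Uses the "fewer than 3.4 per million" threshold described in the post.
    static boolean meetsSixSigma(long defects, long opportunities) {
        return dpmo(defects, opportunities) < 3.4;
    }

    public static void main(String[] args) {
        // Assumption: one opportunity per part, a million parts per week.
        System.out.println(meetsSixSigma(3, 1000000));
        System.out.println(meetsSixSigma(4, 1000000));
    }
}
```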

It was originally intended for improving existing manufacturing processes, but was later enhanced with a methodology for designing new processes as well.
The two methodologies which are composed of five phases each, are (quoted from Wikipedia):
DMAIC : used for improving an existing process.
DFSS / DMADV : used for creating/designing new product or process.


The DMAIC project methodology consists of the following five phases:
Define the problem and the project goals, specifically.
Measure key aspects of the current process and collect relevant data.
Analyze the data to investigate and verify cause-and-effect relationships. Determine what the relationships are, and attempt to ensure that all factors have been considered. Seek out root cause of the defect under investigation.
Improve or optimize the current process based upon data analysis using techniques such as design of experiments, poka yoke or mistake proofing, and standard work to create a new, future state process. Set up pilot runs to establish process capability.
Control the future state process to ensure that any deviations from target are corrected before they result in defects. Implement control systems such as statistical process control, production boards, visual workplaces, and continuously monitor the process.


The DMADV methodology is also known as DFSS - "Design For Six Sigma" and consists of the following five phases:
Define design goals that are consistent with customer demands and the enterprise strategy.
Measure and identify CTQs (characteristics that are Critical To Quality), product capabilities, production process capability, and risks.
Analyze to develop and design alternatives
Design an improved alternative, best suited per analysis in the previous step
Verify the design, set up pilot runs, implement the production process and hand it over to the process owner(s).

Six Sigma defines five roles for its successful implementation, as shown below:
Executive Leadership or top management who are responsible for creating the Six Sigma vision and for empowering the employees to bring about this change.
Champions who take responsibility for Six Sigma implementation across the organization in an integrated & consistent manner and also identify projects/functions for Six Sigma.
Master Black Belts who act as in-house coaches on Six Sigma & devote all their time only to Six Sigma. They assist Champions and guide Black Belts & Green Belts and work on ensuring consistent application of Six Sigma across various functions and departments.
Black Belts operate under Master Black Belts to apply Six Sigma methodology to specific projects and devote all their time to application & execution of Six Sigma for specific projects.
Green Belts are the employees who take up Six Sigma implementation along with their other job responsibilities and work under the guidance of Black Belts.

Tuesday, April 9, 2013

Enable Hibernate in Windows 7

I was very used to hibernating my Windows XP Pro laptop to save the state of my work and give me a head start the next morning. So when I upgraded to Windows 7, I was surprised to find that it did not have an option to Hibernate. The closest option was Sleep, which is a hybrid option that sleeps on very low power, but hibernates when the remaining battery is very low.

Thankfully, the hibernate option was just hidden out of sight, not removed totally, and one command took care of it as below:

  1. Log in to the laptop as Administrator
  2. Click on Start and type cmd in the search box and press enter
  3. In the command window, type the command:
 powercfg.exe /hibernate on 

Voilà! After you exit the command prompt, you should now see the option to hibernate your laptop.

Friday, March 29, 2013

Benchmarking the performance of Java & web Frameworks/Platforms

The guys over at TechEmpower put some of their time to great use for the benefit of the community and put together a meaningful test to benchmark the performance of Java and web frameworks / platforms that are popular these days. The results were very surprising - Netty, Vert.x and Servlets outperformed all others by a very big margin!

However, one of the major eye-poppers was that raw PHP (no ORM framework) with a MySQL database and multiple queries per request - which is likely the most practical business case - performed amazingly well! Add to this the ready availability of good PHP programmers and low hosting costs, and it is no wonder that many small businesses opt for rolling their own custom PHP apps & websites rather than going for big frameworks.

Check out the graphs and details of the test in the full TechEmpower blog post here.
You can also check out their code on Git here and join in on the conversation at HackerNews here.

Let the flame wars begin!  :P   ;P

Thursday, March 7, 2013

Count distinct values of a field in a text file on Unix

Here's a simple command to count the number of distinct values in a field in a delimited text file on *nix.
For example, assume that the text file is comma separated and the field being counted is the second field. The values are sorted first because uniq only collapses duplicate lines that are adjacent.

cut -d ',' -f 2 /files/data.csv | sort | uniq | wc -l

On the other hand, if you also want a count of how many times each value has occurred, then sort the values and use uniq -c as in the following command:
cut -d ',' -f 2 /files/data.csv | sort | uniq -c

Simple does it - no need for awk or sed!

Wednesday, March 6, 2013

Interesting fact about the Bluetooth logo

An interesting background about the Bluetooth logo that we see almost every day - quoted from Wikipedia.

The word "Bluetooth" is an anglicized version of the Scandinavian Blåtand/Blåtann, the epithet of the tenth-century king Harald I of Denmark and parts of Norway who united dissonant Danish tribes into a single kingdom. The idea of this name was proposed by Jim Kardach who developed a system that would allow mobile phones to communicate with computers (at the time he was reading Frans Gunnar Bengtsson's historical novel The Long Ships about Vikings and king Harald Bluetooth). The implication is that Bluetooth does the same with communications protocols, uniting them into one universal standard.
The Bluetooth logo is a bind rune merging the Younger Futhark runes ᚼ (Hagall) and ᛒ (Bjarkan), Harald's initials.
You can read more details about Bluetooth on Wikipedia or the official Bluetooth website as well.

