
Removing unwanted characters in an organic molecule diagram

Ray Phan
Jul 16, 2015
<p>A cheap way to do this would be to invert the image so that the object pixels are white instead of black, then perform a morphological closing on the image and remove the regions whose area falls below a threshold. Closing joins nearby disconnected regions together. Because the joined "structure" forms a single region with a large area, you can threshold on the area of each region and eliminate the regions that fall below that threshold.</p>
<p>You can then get back the original image by performing a logical <code>AND</code> between the inverted image and the closed result, then reinverting this intermediate result. The <code>AND</code> is needed because the closing operation artificially creates object pixels: joining the nearby parts of the structure fills in the gaps between them. Performing an <code>AND</code> with the inverted original ensures that those pixels not in common with the original get removed, so only pixels that belong to the original image are kept. As this is performed on the inverse of the original image, reinverting gets you back to the original convention of object pixels being black instead of white.
</p>
<p>Something like this:</p>
<pre><code>%// Read in image from StackOverflow
im = imread('http://i.stack.imgur.com/A7iT7.png');

%// Invert image
im = ~im;

%// Define 50 x 50 structuring element and close the image
se = strel('square', 50);
out = imclose(im, se);

%// Remove regions whose areas fall below 10000 pixels
out = bwareaopen(out, 10000);

%// Remove extraneous pixels created by the closing by ANDing with the
%// inverted image, then reinvert to bring back the original label scheme
out = ~(im &amp; out);

%// Show the image
imshow(out);
</code></pre>
<p>We get this image:</p>
<p><img src="http://i.stack.imgur.com/gLdmx.png" alt="enter image description here"></p>
<h1>Notes</h1>
<ol>
<li>The function <a href="http://www.mathworks.com/help/images/ref/imclose.html" rel="nofollow"><code>imclose</code></a> will perform the morphological closing for you with a structuring element defined by <a href="http://www.mathworks.com/help/images/ref/strel.html" rel="nofollow"><code>strel</code></a>. I used a 50 x 50 square to ensure that we have a large enough window to join neighbouring object pixels together.</li>
<li>The function <a href="http://www.mathworks.com/help/images/ref/bwareaopen.html" rel="nofollow"><code>bwareaopen</code></a> takes in a binary image and removes the regions whose pixel areas fall below a given threshold. After doing a closing, you will have two connected regions: the top of the image with the structure and the bottom with the text. By experimentation, a threshold of 10000 pixels removed the region at the bottom.</li>
</ol>
<hr>
<h1>Alternative Method</h1>
<p>An alternative method that avoids choosing an area threshold is to go with your original idea. Do the closing operation, but then assess the areas of each of the connected regions and select the one with the largest area.
What I recommend is to use <a href="http://www.mathworks.com/help/images/ref/regionprops.html" rel="nofollow"><code>regionprops</code></a> in that case, which is a function that is specifically designed to analyze characteristics of distinct image regions. The output is a structure array of <code>N</code> elements, where <code>N</code> is the total number of unique and connected objects found in the image, and each element contains fields for the properties you asked it to measure. In your case, specify the <code>'Area'</code> and the <code>'PixelIdxList'</code> attributes, which contain the area and the column-major linear pixel indices of each region.</p>
<p>You'd find the region with the maximum area, set its pixel locations in an output mask, and then logically <code>AND</code> that mask with the inverted image.</p>
<p>Something like this:</p>
<pre><code>%// Read in image from StackOverflow
im = imread('http://i.stack.imgur.com/A7iT7.png');

%// Invert image
im = ~im;

%// Define 50 x 50 structuring element and close the image
se = strel('square', 50);
out = imclose(im, se);

%// Apply regionprops
s = regionprops(out, 'Area', 'PixelIdxList');

%// Find the region with the max area
[~,id] = max([s.Area]);

%// Create a logical output mask that will hold the largest region
out = false(size(im));

%// Set pixels from largest area
out(s(id).PixelIdxList) = true;

%// Rest of the logic from before
%// Remove extraneous pixels created by the closing by ANDing with the
%// inverted image, then reinvert to bring back the original label scheme
out = ~(im &amp; out);

%// Show the image
imshow(out);
</code></pre>
<p>You should get exactly the same results as the first method.</p>
<p>This tip was originally posted on <a href="http://stackoverflow.com/questions/31356383/Removing%20unwanted%20characters%20in%20an%20organic%20molecule%20diagram/31358189">Stack Overflow</a>.</p>
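<p>To see why the final <code>AND</code> matters, here is a minimal sketch on a small synthetic image rather than the diagram from the question. The 100 x 100 canvas, the stroke positions, and the 9 x 9 structuring element are all illustrative assumptions chosen so the behaviour is easy to check by hand. Like the code in this answer, it assumes the Image Processing Toolbox is available.</p>

```matlab
%// Hypothetical test image: white background (true), black object pixels (false)
im = true(100, 100);
im(20:22, 10:40) = false;   %// first stroke of the "structure"
im(20:22, 45:80) = false;   %// second stroke, separated by a 4-pixel gap
im(90:92, 10:15) = false;   %// small "text" blob far away from the structure

%// Invert so that object pixels are white
imInv = ~im;

%// A 9 x 9 square is wide enough to bridge the 4-pixel gap
closed = imclose(imInv, strel('square', 9));

%// Keep only the connected region with the largest area
s = regionprops(closed, 'Area', 'PixelIdxList');
[~, id] = max([s.Area]);
mask = false(size(im));
mask(s(id).PixelIdxList) = true;

%// The closing filled the gap, so the mask contains pixels that are not
%// in the original. ANDing with the inverted image discards them.
out = ~(imInv & mask);
```

<p>In this sketch the closed image has the gap between the two strokes filled in, but <code>out</code> does not: the strokes come back exactly as drawn and only the small blob is gone. On a real image the structuring element would need to be tuned to the spacing of the strokes, as with the 50 x 50 square used above.</p>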