MATLAB: removing rows that have duplicates in sequence

Ray Phan
Jun 17, 2015
<p>How about an approach using strings? This is certainly not as fast as Luis Mendo's method, which works directly on the numerical array, but it's thinking a bit outside of the box. The basis of this approach is that I treat each row of <code>A</code> as a string, and I search each string for a run of 0s using regular expressions.</p>
<pre><code>A=[0 1 0 1 0 1; 0 0 0 1 1 1; 0 0 1 0 0 1; 0 1 0 0 1 0; 1 0 0 0 1 0];
t = 3;
B = sprintfc('%s', char('0' + A));
ind = cellfun('isempty', regexp(B, repmat('0', [1 t])));
B(~ind) = [];
B = double(char(B) - '0');
</code></pre>
<p>We get:</p>
<pre><code>B =

     0     1     0     1     0     1
     0     0     1     0     0     1
     0     1     0     0     1     0
</code></pre>
<hr>
<h1>Explanation</h1>
<p>The line numbers below refer to the four statements that follow the definitions of <code>A</code> and <code>t</code>.</p>
<ul>
<li><p>Line 1: Convert each row of the matrix <code>A</code> into a string consisting of 0s and 1s. Each row becomes a cell in a cell array. This uses the undocumented function <a href="http://undocumentedmatlab.com/blog/sprintfc-undocumented-helper-function" rel="nofollow"><code>sprintfc</code></a> to facilitate this cell array conversion.</p></li>
<li><p>Line 2: I use <a href="https://en.wikipedia.org/wiki/Regular_expression" rel="nofollow">regular expressions</a> to find any occurrences of a run of 0s that is <code>t</code> characters long. I first use <a href="http://www.mathworks.com/help/matlab/ref/repmat.html" rel="nofollow"><code>repmat</code></a> to create a search string that consists of <code>t</code> 0s. Next, I determine whether each string in the cell array contains this sequence of characters (i.e. <code>000....</code>). The function <a href="http://www.mathworks.com/help/matlab/ref/regexp.html" rel="nofollow"><code>regexp</code></a> performs the regular expression search and returns the locations of any matches for each cell in the cell array. Alternatively, you can use the function <a href="http://www.mathworks.com/help/matlab/ref/strfind.html" rel="nofollow"><code>strfind</code></a> in more recent versions of MATLAB to speed up the computation, but I chose <code>regexp</code> so that the solution is compatible with most MATLAB distributions out there.</p>
<p>Continuing on, the output of <code>regexp/strfind</code> is a cell array where each cell reports the locations at which the search string was found. If there is a match, at least one location is reported, so I check which cells are <strong>empty</strong>: these are the rows we <strong>don't want to remove</strong>. I want to turn this into a <code>logical</code> array for the purpose of removing the corresponding strings from <code>B</code>, so the search is wrapped in a <a href="http://www.mathworks.com/help/matlab/ref/cellfun.html" rel="nofollow"><code>cellfun</code></a> call that determines which cells are empty. Therefore, this line returns a <code>logical</code> array where a 0 means we remove the row and a 1 means we keep it.</p></li>
<li><p>Line 3: I take the <code>logical</code> array from Line 2 and <strong>invert</strong> it, so that a 1 now marks a row to remove. We use this inverted array to index into the cell array and remove those strings.</p></li>
<li><p>Line 4: The output is still a cell array, so I convert it back into a character array, and finally back into a numerical array.</p></li>
</ul>
<p>This tip was originally posted on <a href="http://stackoverflow.com/questions/30890771/MATLAB%20removing%20rows%20which%20has%20duplicates%20in%20sequence/30893068">Stack Overflow</a>.</p>
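<p>As a rough sketch of the <code>strfind</code> alternative mentioned in the explanation (this is not part of the original answer): it assumes the documented <code>cellstr</code> can stand in for <code>sprintfc</code> to build one string per row, and it uses the mask to index <code>A</code> directly instead of converting the strings back to numbers. The names <code>rows</code>, <code>hits</code> and <code>keep</code> are only illustrative.</p>
<pre><code>A = [0 1 0 1 0 1; 0 0 0 1 1 1; 0 0 1 0 0 1; 0 1 0 0 1 0; 1 0 0 0 1 0];
t = 3;

% One string per row, e.g. '010101' (assumption: cellstr replaces sprintfc here)
rows = cellstr(char('0' + A));

% For a cell array input, strfind returns one cell per row holding the
% start indices of every match of the search string
hits = strfind(rows, repmat('0', [1 t]));

% An empty cell means no run of t zeros was found, so that row is kept
keep = cellfun('isempty', hits);
B = A(keep, :);
</code></pre>
<p>For the example matrix this should keep rows 1, 3 and 4, which is the same result as the <code>regexp</code> version.</p>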