2 views (last 30 days)

Show older comments

I have a table of data. I want to find out the max values of Melatonin and Cortisol values (two columns in the table) but at the same time I want the corresponding date & time for this max value from the Date/Time column in the table. This is what I am doing but this only gives me the max value of Mel and Cort. How can I find the corresponding date/time as well?

T1_Smooth_Mel=groupsummary(T,{'Participant_ID','Sample_ID'},'max',{'Smooth_Melatonin_concentration'});

T1_Smooth_Cort=groupsummary(T,{'Participant_ID','Sample_ID'},'max',{'Smooth_Cortisol_concentration'});

Any help would be hugely appreciated. Thanks.

Peter Perkins
on 30 Jul 2021

This is a pretty common question, so I'm gonna show a bunch of stuff in hopes others can find it useful.

Let's start with a simple timetable. A timetable, because you have timestamps.

>> Time = datetime(2021,7,1:10)';

>> X = rand(10,1); Y = rand(10,1); G = randi(2,10,1);

>> tt = timetable(Time,X,Y,G)

tt =

10×3 timetable

Time X Y G

___________ _______ _______ _

01-Jul-2021 0.81472 0.15761 2

02-Jul-2021 0.90579 0.97059 1

03-Jul-2021 0.12699 0.95717 2

04-Jul-2021 0.91338 0.48538 2

05-Jul-2021 0.63236 0.80028 2

06-Jul-2021 0.09754 0.14189 2

07-Jul-2021 0.2785 0.42176 2

08-Jul-2021 0.54688 0.91574 1

09-Jul-2021 0.95751 0.79221 2

10-Jul-2021 0.96489 0.95949 1

In its simplest form, groupsummary applies a function to variables in the timetable one at a time. That function must return one value per group. You can also use a function that is applied to more than one variable at a time, but the function must still return one value. Here's a function that takes two inputs, finds the max in one, and returns the corresponding value from the other:

function tmax = myfun1(t,x)

[~,imax] = max(x);

tmax = t(imax);

end

This can't be an anonymous function defined on the fly, because it requires two statements. One more wrinkle is that groupsummary does not currently allow the row times as an input to your function, so to work around that, we can do an on-the-fly conversion to a table with a time variable (this sounds expensive, but actually is not).

>> tMax = groupsummary(timetable2table(tt),"G",@myfun1,{"Time" "X"});

>> tMax.Properties.VariableNames(end) = "TMax"

tMax =

2×3 table

G GroupCount TMax

_ __________ ___________

1 3 10-Jul-2021

2 7 09-Jul-2021

The one value returned by this function is the time at which the maximum occurs, so we can use that as in index into the original data to get the maximum value itself.

>> tMax.XMax = tt.X(tMax.TMax) % index into timetable by time

tMax =

2×4 table

G GroupCount TMax XMax

_ __________ ___________ _______

1 3 10-Jul-2021 0.96489

2 7 09-Jul-2021 0.95751

You have two variables, and the idea extends. Apply myfun1 to Time and X, and then to Time and Y:

>> tMax = groupsummary(timetable2table(tt),"G",@myfun1,{["Time" "Time"] ["X" "Y"]});

>> tMax.Properties.VariableNames(3:end) = ["TMax_X" "TMax_Y"];

>> tMax.XMax = tt.X(tMax.TMax_X);

>> tMax.YMax = tt.Y(tMax.TMax_Y)

tMax =

2×6 table

G GroupCount TMax_X TMax_Y XMax YMax

_ __________ ___________ ___________ _______ _______

1 3 10-Jul-2021 02-Jul-2021 0.96489 0.97059

2 7 09-Jul-2021 03-Jul-2021 0.95751 0.95717

You have a time variable in your data, but suppose you did not. In that case, you can temporarily add a variable that identifies rows in the original data and use it like the time variable.

>> t = table(X,Y,G);

>> t.TempRowInds = (1:height(t))'

t =

10×4 table

X Y G TempRowInds

_______ _______ _ ___________

0.81472 0.15761 2 1

0.90579 0.97059 1 2

0.12699 0.95717 2 3

0.91338 0.48538 2 4

0.63236 0.80028 2 5

0.09754 0.14189 2 6

0.2785 0.42176 2 7

0.54688 0.91574 1 8

0.95751 0.79221 2 9

0.96489 0.95949 1 10

>> tMax = groupsummary(t,"G",@myfun2,{["TempRowInds" "TempRowInds"] ["X" "Y"]})

>> tMax.Properties.VariableNames(3:end) = ["RowMax_X" "RowMax_Y"];

>> tMax.XMax = t.X(tMax.RowMax_X);

>> tMax.YMax = t.Y(tMax.RowMax_Y)

tMax =

2×6 table

G GroupCount RowMax_X RowMax_Y XMax YMax

_ __________ ________ ________ _______ _______

1 3 10 2 0.96489 0.97059

2 7 9 3 0.95751 0.95717

>> t.TempRowInds = []; % don't forget to remove the temp var

where myfun2 is

function rowmax = myfun2(row,x)

[~,imax] = max(x);

rowmax = row(imax);

end

Getting further into advanced manoeuvers, you can use two functions, one to compute the max value, one to find the index at which it occurs.

tMax = groupsummary(timetable2table(tt),"G",{@(t,x)max(x) @myfun2},{["Time" "Time"] ["X" "Y"]});

tMax.Properties.VariableNames(3:end) = ["TMax_X" "XMax" "TMax_Y" "YMax"]

tMax =

2×6 table

G GroupCount TMax_X XMax TMax_Y YMax

_ __________ _______ _______ _______ _______

1 3 0.96489 0.96489 0.97059 0.95949

2 7 0.95751 0.95751 0.95717 0.79221

This is fewer lines of code, but notice that it has to go through each group twice for each variable. Can we avoid that? Also, can we avoid have to regroup for each variable? Yes we can: rowfun.

tMax = rowfun(@myfun3,timetable2table(tt),"InputVariables",["Time" "X" "Y"],"GroupingVariables","G","OutputVariableNames",["TMax_X" "XMax" "TMax_Y" "YMax"])

tMax =

2×6 table

G GroupCount TMax_X XMax TMax_Y YMax

_ __________ ___________ _______ ___________ _______

1 3 10-Jul-2021 0.96489 02-Jul-2021 0.97059

2 7 09-Jul-2021 0.95751 03-Jul-2021 0.95717

where

function [tmax_x,xmax,tmax_y,ymax] = myfun3(t,x,y)

[xmax,imax] = max(x);

tmax_x = t(imax);

[ymax,imax] = max(y);

tmax_y = t(imax);

end

And of course the TempRowInds strategy works here too.

dpb
on 30 Jul 2021

That's what the optional return index value of max is for...

[mx_Smooth_Mel,ixMel]=max(T.Smooth_Melatonin_concentration);

[mx_Smooth_Cort,ixCort]=max(T.Smooth_Cortisol_concentration);

Then

Tmaxes=[T([ixMel;ixCort],:);

will give you a 2-row table of all the data corresponding to those two locations

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!