[tex] \tt \: Why \: is \: \sum x(1 - x) \: equal \: to \: 1 - \sum x^{2} ?[/tex]
I'm working through Python Machine Learning and I'm at the Gini impurity section, where the authors define Gini impurity as

[tex] \tt \: I_g(t) = \sum_{i=1}^c p(i|t) (1 - p(i|t))[/tex]
where p(i|t) is the proportion of the samples at a particular node t that belong to class i, and c is the number of classes. Fine, that seems reasonable enough. But then they go on to simplify the formula into this:

[tex] \tt \: I_g(t) = 1 - \sum_{i=1}^c p(i|t)^2[/tex]
And I cannot, for the life of me, figure out how they arrived at this result. Am I making some incorrect assumption about how p(i|t) works? Can I not treat p(i|t) like any general variable?
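Before doing any algebra, a numerical check can confirm that the two formulas really agree. The following is a minimal Python sketch, not code from the book: it estimates p(i|t) as the fraction of a node's samples in each class and evaluates both expressions. The function names `class_proportions`, `gini_sum_form`, `gini_simplified` and the toy labels are hypothetical.

```python
from collections import Counter


def class_proportions(labels):
    """Return p(i|t) for every class i that occurs among the samples at node t.

    Classes absent from the node have p(i|t) = 0 and add nothing to either
    formula, so they can safely be left out.
    """
    counts = Counter(labels)
    n = len(labels)
    return [count / n for count in counts.values()]


def gini_sum_form(proportions):
    """Gini impurity as sum_i p(i|t) * (1 - p(i|t))."""
    return sum(p * (1 - p) for p in proportions)


def gini_simplified(proportions):
    """Gini impurity as 1 - sum_i p(i|t)^2."""
    return 1 - sum(p ** 2 for p in proportions)


# Hypothetical node: 2 samples of class "a", 1 of "b", 1 of "c".
p = class_proportions(["a", "a", "b", "c"])  # [0.5, 0.25, 0.25]
print(gini_sum_form(p))    # 0.625
print(gini_simplified(p))  # 0.625
```

Both forms give the same value here, which hints that the simplification relies on something specific to proportions rather than on arbitrary variables.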

I'd really appreciate a proper answer, please.

❤️Thanks in advance❤️


Answer:

[tex] \bold{Note \: that \: \sum_{i=1}^c p(i|t) = 1 \: (the \: class \: proportions \: at \: node \: t \: add \: up \: to \: 1).}\\ \bold{That \: is \: how \: the \: simplification \: is \: obtained.}[/tex]
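This note is the whole trick. A small sketch using Python's standard `fractions.Fraction` (exact arithmetic, so there is no rounding noise) checks that proportions sum to exactly 1, and shows that if you plug in raw counts instead, which do not sum to 1, the two forms disagree by exactly the sum minus 1. The helper name `gap_between_forms` and the numbers are illustrative assumptions.

```python
from fractions import Fraction


def gap_between_forms(weights):
    """Return sum w(1 - w) minus (1 - sum w^2); algebraically this equals sum(w) - 1."""
    sum_form = sum(w * (1 - w) for w in weights)
    simplified = 1 - sum(w * w for w in weights)
    return sum_form - simplified


# Proportions at a node: 2 of 4 samples in one class, 1 of 4 in each of two others.
proportions = [Fraction(2, 4), Fraction(1, 4), Fraction(1, 4)]
print(sum(proportions))                # 1
print(gap_between_forms(proportions))  # 0

# Raw counts do not sum to 1, so the simplification no longer holds.
counts = [Fraction(2), Fraction(1), Fraction(1)]
print(gap_between_forms(counts))       # 3, i.e. sum(counts) - 1 = 4 - 1
```

So p(i|t) can be manipulated like any variable; the only extra fact needed is the constraint that the proportions sum to 1.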

[tex] \begin{align}I_g(t) & \bold{= \sum_{i=1}^c p(i|t) (1 - p(i|t)) }\\& \bold{= \sum_{i=1}^c \left(p(i|t) - p(i|t)^2\right) }\\& \bold{= \sum_{i=1}^c p(i|t) - \sum_{i=1}^c p(i|t)^2}\\& \boxed{ \bold{= 1 - \sum_{i=1}^c p(i|t)^2}} \end{align} \\ \bold{since} \\ \boxed{\bold{\sum_{i=1}^{c} p(i|t) = 1}}[/tex]
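As a quick worked check of the chain above, take a hypothetical node with three classes in proportions 1/2, 1/4 and 1/4 (numbers chosen only for illustration). Both forms give 5/8:

[tex] \begin{align} \sum_{i=1}^c p(i|t)(1 - p(i|t)) &= \tfrac{1}{2}\cdot\tfrac{1}{2} + \tfrac{1}{4}\cdot\tfrac{3}{4} + \tfrac{1}{4}\cdot\tfrac{3}{4} = \tfrac{4}{16} + \tfrac{3}{16} + \tfrac{3}{16} = \tfrac{5}{8} \\ 1 - \sum_{i=1}^c p(i|t)^2 &= 1 - \left(\tfrac{1}{4} + \tfrac{1}{16} + \tfrac{1}{16}\right) = 1 - \tfrac{6}{16} = \tfrac{5}{8} \end{align}[/tex]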